🐛 Partitioned Data Debugging

Testing Hive partitioning with DuckDB WASM

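Before any of the tests below can touch an https:// URL, DuckDB needs HTTP support. DuckDB-WASM handles remote HTTP reads out of the box, but when reproducing these tests in native DuckDB (CLI, Python, etc.) the httpfs extension has to be loaded first — a minimal sketch:

```sql
-- Native DuckDB only: enable reading remote files over HTTP(S).
-- DuckDB-WASM does not need this step.
INSTALL httpfs;
LOAD httpfs;
```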

Test 1: ✅ Direct File Access

Direct file access works perfectly

Query:
SELECT * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 3;
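Note that reading one file directly cannot surface the partition values: they live only in the URL path, not in the Parquet schema. A quick way to confirm that, against the same public URL:

```sql
-- Inspect the schema of a single partition file. Expect no
-- model_name / grid_name columns, since Hive partition values
-- are encoded in the directory path, not stored in the file.
DESCRIBE SELECT *
FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet');
```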

Test 2: ✅ Manual Partition Columns

Manually add partition info to single file

Query:
SELECT 'WRF-NARR_HIS' AS model_name, 'R10C29' AS grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 2;
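If the manual-column workaround is the one that sticks, wrapping it in a view keeps the partition literals out of every downstream query. A sketch (the view name is made up for illustration):

```sql
-- Hypothetical view name; bakes the partition values for one
-- file into an ordinary relation.
CREATE VIEW wrf_r10c29 AS
SELECT 'WRF-NARR_HIS' AS model_name,
       'R10C29'       AS grid_name,
       *
FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet');
```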

Test 3: ✅ Array of URLs

Test if DuckDB can read from an array of specific URLs

Query:
SELECT * FROM read_parquet(['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet']) LIMIT 3;
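Since an explicit URL list works, the next thing worth trying is letting DuckDB derive the partition columns from those URLs instead of adding them by hand. read_parquet takes a hive_partitioning flag; if it treats URL paths the same way it treats filesystem paths, it should parse the key=value segments:

```sql
-- Ask DuckDB to parse model_name=... / grid_name=... out of the
-- listed URL; the result should gain model_name and grid_name columns.
SELECT *
FROM read_parquet(
    ['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet'],
    hive_partitioning = true
) LIMIT 3;
```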

Test 4: ✅ Multiple Files with UNION

Manually combine files from different partitions

Query (branches parenthesized so each LIMIT 2 applies per partition, not to the combined result):
(SELECT 'WRF-NARR_HIS' AS model_name, 'R10C29' AS grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 2)
UNION ALL
(SELECT 'WRF-NARR_HIS' AS model_name, 'R10C30' AS grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C30/data_0.parquet') LIMIT 2);
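The UNION ALL approach scales badly as partitions multiply. Passing both files in one list does the same job in a single scan, and read_parquet's filename flag tags each row with its source file, which helps when debugging which partition a row came from:

```sql
-- One scan over both partitions; hive_partitioning recovers the
-- partition columns, filename records each row's source file.
SELECT *
FROM read_parquet(
    ['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet',
     'https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C30/data_0.parquet'],
    hive_partitioning = true,
    filename = true
) LIMIT 4;
```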

Test 5: ✅ Different Climate Model

Test accessing a different climate model partition

Query:
SELECT * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet') LIMIT 3;
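With both models reachable, the partition columns can be exercised as filters. Assuming read_parquet's hive_partitioning flag parses the URL paths, a WHERE clause on a partition column should prune the non-matching file by path alone, before any of its data is fetched:

```sql
-- Filter on a partition column; DuckDB should skip the
-- WRF-NARR_HIS file entirely based on its path.
SELECT *
FROM read_parquet(
    ['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet',
     'https://storage.googleapis.com/climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet'],
    hive_partitioning = true
)
WHERE model_name = 'access1.3_RCP85_PREC_6km'
LIMIT 3;
```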

Debug Info

Expected Partition Structure:
climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet
climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet

Key Points:
• Uses Hive partitioning (key=value in folder names)
• DuckDB can derive partition columns from file paths (read_parquet's hive_partitioning flag)
• Filter pushdown on partition columns should prune files by path, without reading them
• Partition columns: model_name, grid_name
• Caveat: glob patterns (e.g. */*.parquet) are not supported over plain http(s), because HTTP cannot list directories — which is why these tests pass explicit URLs
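To verify the pushdown claim rather than assume it, EXPLAIN shows whether the file list actually shrinks. With a filter matching one partition, the plan's Parquet scan should reference a single file:

```sql
-- If partition pruning works, the plan should show only the
-- grid_name=R10C29 file being scanned.
EXPLAIN SELECT *
FROM read_parquet(
    ['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet',
     'https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C30/data_0.parquet'],
    hive_partitioning = true
)
WHERE grid_name = 'R10C29';
```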