Partitioned Data Debugging
Testing Hive partitioning with DuckDB WASM.
DuckDB Status
Test Queries
Test 1: Direct File Access
Direct access to a single file by its full URL works.
Query
SELECT * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 3;
Test 2: Manual Partition Columns
Manually add the partition values to a single file as literal columns
Query
SELECT 'WRF-NARR_HIS' as model_name, 'R10C29' as grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 2;
Test 3: Array of URLs
Test whether DuckDB can read from an explicit list of file URLs
Query
SELECT * FROM read_parquet(['https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet']) LIMIT 3;
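The list form above extends naturally to several partitions. The following is a minimal sketch, assuming the bucket is reached over plain HTTPS (where wildcard patterns typically cannot be expanded, so each file has to be named explicitly). The helper names `partition_url` and `build_list_query` are hypothetical; the `hive_partitioning = true` option of `read_parquet` asks DuckDB to derive `model_name` and `grid_name` from the paths. The values are assumed to contain no single quotes.

```python
# Base URL taken from the test queries on this page.
BASE_URL = "https://storage.googleapis.com/climate_ts/partitioned"


def partition_url(model_name, grid_name, file_name="data_0.parquet"):
    """Build the URL of one file in the Hive-style layout (hypothetical helper)."""
    return f"{BASE_URL}/model_name={model_name}/grid_name={grid_name}/{file_name}"


def build_list_query(partitions, limit=3):
    """Return a read_parquet query over an explicit list of partition files.

    `partitions` is a list of (model_name, grid_name) pairs.
    """
    urls = [partition_url(model, grid) for model, grid in partitions]
    url_list = ", ".join(f"'{url}'" for url in urls)
    return (
        f"SELECT * FROM read_parquet([{url_list}], hive_partitioning = true) "
        f"LIMIT {limit};"
    )


query = build_list_query([("WRF-NARR_HIS", "R10C29"), ("WRF-NARR_HIS", "R10C30")])
print(query)
```

If the partition columns appear in the result of the generated query, path-based detection works for explicit URL lists and the manual literals from Test 2 are not needed.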
Test 4: Multiple Files with UNION
Manually combine files from different partitions
Query
(SELECT 'WRF-NARR_HIS' as model_name, 'R10C29' as grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet') LIMIT 2)
UNION ALL
(SELECT 'WRF-NARR_HIS' as model_name, 'R10C30' as grid_name, * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C30/data_0.parquet') LIMIT 2);
Test 5: Different Climate Model
Test accessing a different climate model partition
Query
SELECT * FROM read_parquet('https://storage.googleapis.com/climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet') LIMIT 3;
Debug Info
Expected Partition Structure:
climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet
climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet
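To check that a path follows this structure, the partition values can be read back out of it. This is a rough sketch of the idea only, not DuckDB's actual parser; `parse_hive_partitions` is a hypothetical name.

```python
def parse_hive_partitions(path):
    """Return the key=value pairs found in the directory part of a path."""
    partitions = {}
    # Skip the last segment: it is the file name, not a partition folder.
    for segment in path.split("/")[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions


example = (
    "climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/"
    "grid_name=R10C29/data_0.parquet"
)
print(parse_hive_partitions(example))
# {'model_name': 'access1.3_RCP85_PREC_6km', 'grid_name': 'R10C29'}
```

A path without any key=value folders yields an empty dictionary, which is a quick way to spot files that fall outside the partitioned layout.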
Key Points:
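• The layout uses Hive partitioning: each folder name is a key=value pair
• DuckDB should detect the partition columns from the file paths (the hive_partitioning option of read_parquet)
• Filters on partition columns should be pushed down, so files in non-matching partitions are skipped
• Partition columns: model_name, grid_name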
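The effect of filter pushdown on partition columns can be illustrated without DuckDB: a filter on `model_name` or `grid_name` can be answered from the paths alone, so non-matching files never need to be opened. The sketch below illustrates that pruning idea under the layout shown on this page; it is not how DuckDB implements it, and `partition_values` and `prune` are hypothetical names.

```python
# File paths taken from the test queries on this page.
FILES = [
    "climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C29/data_0.parquet",
    "climate_ts/partitioned/model_name=WRF-NARR_HIS/grid_name=R10C30/data_0.parquet",
    "climate_ts/partitioned/model_name=access1.3_RCP85_PREC_6km/grid_name=R10C29/data_0.parquet",
]


def partition_values(path):
    """Extract key=value pairs from the folder segments of a path."""
    pairs = (seg.partition("=") for seg in path.split("/")[:-1] if "=" in seg)
    return {key: value for key, _, value in pairs}


def prune(files, **filters):
    """Keep only files whose partition values match every equality filter."""
    return [
        path
        for path in files
        if all(partition_values(path).get(k) == v for k, v in filters.items())
    ]


# Equivalent in spirit to: WHERE model_name = 'WRF-NARR_HIS'
kept = prune(FILES, model_name="WRF-NARR_HIS")
print(len(kept), "of", len(FILES), "files need to be read")
```

When debugging, comparing the number of files a filtered query actually requests against the count predicted this way is one way to tell whether pushdown is taking effect.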