
Example Usage

All examples below use --profile gridstatus, which assumes you have configured that named profile as described in Getting Started.

List available datasets

```
aws s3 ls s3://gs-catalog-csv/ --profile gridstatus
PRE aeso_daily_average_pool_price/
PRE aeso_fuel_mix/
PRE aeso_interchange/
PRE aeso_load/
PRE aeso_load_forecast/
PRE caiso_as_prices/
PRE caiso_curtailment/
PRE caiso_fuel_mix/
PRE caiso_lmp_day_ahead_hourly/
PRE caiso_lmp_real_time_5_min/
...
```

Each PRE entry is a dataset folder. There may be hundreds of datasets depending on your export.
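If you script against this listing, the dataset names can be pulled out of the PRE lines. A minimal Python sketch, assuming the plain-text output format shown above; parse_dataset_prefixes and list_datasets are hypothetical helper names, and list_datasets needs the AWS CLI on your PATH:

```python
import subprocess
from typing import List


def parse_dataset_prefixes(listing: str) -> List[str]:
    """Return dataset names from `aws s3 ls` output, keeping only PRE (folder) lines."""
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # Folder entries look like "PRE caiso_fuel_mix/"; object lines start with a date.
        if len(parts) == 2 and parts[0] == "PRE":
            names.append(parts[1].rstrip("/"))
    return names


def list_datasets(profile: str = "gridstatus") -> List[str]:
    """Run the bucket-level listing and parse its output."""
    result = subprocess.run(
        ["aws", "s3", "ls", "s3://gs-catalog-csv/", "--profile", profile],
        capture_output=True, text=True, check=True,
    )
    return parse_dataset_prefixes(result.stdout)
```

Passing the command as a list (rather than a shell string) keeps the profile name from being reinterpreted by a shell.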

Explore available data range within a dataset

List the available years:

List months within a year:

List individual files within a month:

Get total file count and size for an entire dataset with --recursive --summarize:
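Each listing in this section is the same aws s3 ls call with a progressively longer prefix. If you drive the CLI from a script, a minimal Python sketch can build those argument lists. It assumes the hive-style year=YYYY folders seen in the DuckDB path further down this page and a month=MM folder below them; the month naming is a guess, so confirm it with a year-level listing first. dataset_prefix and ls_args are hypothetical helpers.

```python
from typing import List, Optional

BUCKET = "s3://gs-catalog-csv"


def dataset_prefix(dataset: str, year: Optional[int] = None, month: Optional[int] = None) -> str:
    """Build the S3 prefix for a dataset, optionally narrowed to a year and month.

    ASSUMPTION: "year=YYYY" folders (as in the DuckDB glob on this page) and
    "month=MM" folders (hypothetical naming).
    """
    if month is not None and year is None:
        raise ValueError("month requires year")
    prefix = f"{BUCKET}/{dataset}/"
    if year is not None:
        prefix += f"year={year}/"
    if month is not None:
        prefix += f"month={month:02d}/"
    return prefix


def ls_args(dataset: str, year: Optional[int] = None, month: Optional[int] = None,
            summarize: bool = False, profile: str = "gridstatus") -> List[str]:
    """Return argv for `aws s3 ls`; summarize=True adds --recursive --summarize."""
    args = ["aws", "s3", "ls", dataset_prefix(dataset, year, month), "--profile", profile]
    if summarize:
        args += ["--recursive", "--summarize"]
    return args
```

For example, ls_args("pjm_lmp_real_time_5_min", summarize=True) builds the dataset-wide count; run it with subprocess.run(args, capture_output=True, text=True, check=True).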

In this case, pjm_lmp_real_time_5_min contains 2,880 files totaling ~213 GB.
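To capture those totals programmatically, the footer that --summarize prints can be parsed. A minimal sketch, assuming the command was run without --human-readable so that Total Size is a raw byte count; parse_summary and average_file_mb are hypothetical names:

```python
import re
from typing import Tuple


def parse_summary(listing: str) -> Tuple[int, int]:
    """Extract (object_count, total_bytes) from `aws s3 ls --recursive --summarize` output.

    Expects raw byte counts; a --human-readable size such as "198.4 GiB" is rejected.
    """
    objects = re.search(r"Total Objects:\s*(\d+)", listing)
    size = re.search(r"Total Size:\s*(\d+)\s*$", listing, re.MULTILINE)
    if objects is None or size is None:
        raise ValueError("no --summarize footer with a byte count found")
    return int(objects.group(1)), int(size.group(1))


def average_file_mb(object_count: int, total_bytes: int) -> float:
    """Mean object size in decimal megabytes (1 MB = 10**6 bytes)."""
    return total_bytes / object_count / 1e6
```

With the figures above, average_file_mb(2880, 213 * 10**9) comes to roughly 74 MB per file, treating ~213 GB as decimal gigabytes.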

Download data

Download a single file with aws s3 cp:
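To fetch one day's file from a script, the object key can be computed from a date. A minimal sketch: the YYYY-MM-DD.csv.gz file name follows the pandas example later on this page, while the year=YYYY/month=MM folder names are an assumption to check against a real listing. daily_file_key and cp_args are hypothetical helpers.

```python
import datetime as dt
from typing import List

BUCKET = "s3://gs-catalog-csv"


def daily_file_key(dataset: str, day: dt.date) -> str:
    """Hypothetical object key for one day's file.

    ASSUMPTION: "<dataset>/year=YYYY/month=MM/YYYY-MM-DD.csv.gz"; only the file
    name pattern is shown on this page, the month folder naming is a guess.
    """
    return f"{dataset}/year={day.year}/month={day.month:02d}/{day.isoformat()}.csv.gz"


def cp_args(dataset: str, day: dt.date, dest: str = ".", profile: str = "gridstatus") -> List[str]:
    """Return argv for `aws s3 cp` that downloads that one file into dest."""
    source = f"{BUCKET}/{daily_file_key(dataset, day)}"
    return ["aws", "s3", "cp", source, dest, "--profile", profile]
```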

Sync an entire dataset to a local folder:

Sync a single year:

Sync a single month:

Use --exclude and --include to filter specific files within a year (e.g. only June 2025):
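The AWS CLI starts with every file included and applies --exclude/--include filters in the order given, with later filters taking precedence. A rough Python model of that rule (not the CLI's own code; apply_filters is hypothetical) can preview what a pattern pair would select before you download anything:

```python
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple


def apply_filters(keys: Sequence[str], filters: Sequence[Tuple[str, str]]) -> List[str]:
    """Approximate --exclude/--include selection for keys relative to the sync source.

    Every key starts out included; filters are checked in command-line order and the
    last one whose pattern matches decides. "*" also matches "/" (fnmatch semantics).
    """
    selected = []
    for key in keys:
        included = True  # the CLI includes everything by default
        for kind, pattern in filters:
            if kind not in ("include", "exclude"):
                raise ValueError(f"unknown filter kind: {kind!r}")
            if fnmatchcase(key, pattern):
                included = kind == "include"  # a later match overrides earlier ones
        if included:
            selected.append(key)
    return selected
```

With ("exclude", "*") followed by ("include", "*2025-06-*"), only files whose names contain 2025-06- survive; reversing the two filters selects nothing, because the catch-all exclude then comes last.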

Sync every dataset in the bucket at once:
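The sync variants in this section differ only in how far the source prefix is narrowed. A minimal sketch (sync_args is hypothetical; the month=MM folder name is an assumption) that mirrors the remote prefix under a local directory, so repeated runs update the same tree and a dataset-year lands at a path like data/caiso_fuel_mix/year=2025, matching the DuckDB glob below:

```python
from typing import List, Optional, Sequence, Tuple

BUCKET = "s3://gs-catalog-csv"


def sync_args(local_dir: str, dataset: Optional[str] = None, year: Optional[int] = None,
              month: Optional[int] = None, filters: Sequence[Tuple[str, str]] = (),
              profile: str = "gridstatus") -> List[str]:
    """Build `aws s3 sync` argv for the whole bucket, one dataset, a year, or a month."""
    if year is not None and dataset is None:
        raise ValueError("year requires dataset")
    if month is not None and year is None:
        raise ValueError("month requires year")
    parts = []
    if dataset is not None:
        parts.append(dataset)
    if year is not None:
        parts.append(f"year={year}")
    if month is not None:
        parts.append(f"month={month:02d}")  # ASSUMPTION: hypothetical month folder name
    rel = "/".join(parts)
    source = f"{BUCKET}/{rel}/" if rel else f"{BUCKET}/"
    # Mirror the remote layout locally so later syncs compare against the same files.
    dest = f"{local_dir.rstrip('/')}/{rel}" if rel else local_dir
    args = ["aws", "s3", "sync", source, dest, "--profile", profile]
    for kind, pattern in filters:  # order is preserved; later filters win
        if kind not in ("include", "exclude"):
            raise ValueError(f"unknown filter kind: {kind!r}")
        args += [f"--{kind}", pattern]
    return args
```

Because subprocess.run receives the arguments as a list, the * in filter patterns reaches the CLI without any shell quoting.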

Tips for bulk downloads

Resumable syncs

aws s3 sync compares each remote object to its local counterpart by size and last-modified time, so re-running the same command after an interruption only re-downloads the missing or changed files. The same property makes it the right tool for incremental refreshes — point it at the same local directory each day and it will only pull what changed.
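The comparison can be modeled in a few lines. This is a simplified sketch of the rule as described here, not the CLI's implementation; remote_size and remote_mtime stand in for the S3 object's size and LastModified, and needs_download is a hypothetical name:

```python
import os
from datetime import datetime, timezone


def needs_download(local_path: str, remote_size: int, remote_mtime: datetime) -> bool:
    """Simplified size/last-modified check in the spirit of aws s3 sync.

    remote_mtime must be timezone-aware. Returns True when the local file is
    missing, its size differs, or the remote object is newer than the local copy.
    """
    if not os.path.exists(local_path):
        return True
    stat = os.stat(local_path)
    if stat.st_size != remote_size:
        return True
    local_mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return remote_mtime > local_mtime
```

Files completed before an interruption match on both size and time and are skipped, which is why a rerun only picks up what is missing.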

Run outside the daily refresh window

The bucket is rewritten by an incremental export starting at 06:00 UTC and typically finishing within an hour. Pulling during the refresh window can give you inconsistent state across datasets — some have been rewritten with the new partition, others haven't yet. For point-in-time consistency, schedule large jobs outside the 06:00–07:00 UTC window.
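A scheduled job can check the window before it starts. A minimal sketch (in_refresh_window is hypothetical): the one-hour length is the typical duration quoted above, so the optional margin lets a cautious scheduler wait longer:

```python
from datetime import datetime, timedelta, timezone


def in_refresh_window(now: datetime, margin: timedelta = timedelta(0)) -> bool:
    """Return True if `now` falls in the 06:00-07:00 UTC export window plus `margin`.

    `now` must be timezone-aware. Only the current UTC day's window is checked,
    so keep the margin well under 17 hours.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    utc = now.astimezone(timezone.utc)
    start = utc.replace(hour=6, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1) + margin
    return start <= utc < end
```

A job would call in_refresh_window(datetime.now(timezone.utc), margin=timedelta(minutes=30)) and sleep or exit when it returns True.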

Region matters

The bucket lives in us-east-2 (Ohio). For the highest download throughput, run your client from EC2 in the same region.

Reading the gzipped CSVs

Files are gzipped CSVs (*.csv.gz). Most modern data tools handle them natively without a manual gunzip step:

  • pandas: pd.read_csv("2025-01-01.csv.gz") — auto-detected by extension.

  • polars: pl.read_csv("2025-01-01.csv.gz") — auto-detected.

  • DuckDB: SELECT * FROM read_csv_auto('data/caiso_fuel_mix/year=2025/**/*.csv.gz') — supports glob patterns, partition pruning via hive_partitioning=true, and reads gzip transparently.
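When pandas, polars, or DuckDB aren't available, the Python standard library can do the same job. A minimal sketch (hypothetical helper names) that streams a file with gzip and csv and, similar in spirit to DuckDB's hive_partitioning=true, turns key=value folders such as year=2025 into extra columns:

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, Iterator


def hive_partitions(path: str) -> Dict[str, str]:
    """Collect key=value folder names (e.g. year=2025) from a file path."""
    return dict(part.split("=", 1) for part in Path(path).parts if "=" in part)


def read_csv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows of a gzipped CSV as dicts, adding the partition columns from the path."""
    extra = hive_partitions(path)
    with gzip.open(path, "rt", newline="") as f:  # decompresses transparently
        for row in csv.DictReader(f):
            yield {**row, **extra}
```

Values come back as strings, so convert types yourself; the DataFrame libraries above infer them for you.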
