# Example Usage

{% hint style="warning" %}
Bulk CSV Downloads is in **early beta**. We'd love your feedback — please reach out with any questions or issues.
{% endhint %}

All examples below pass `--profile gridstatus`, which assumes you have configured that named profile as described in [Getting Started](/developers/bulk-csv-downloads/getting-started.md).
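If you'd rather not repeat `--profile` on every command, the AWS CLI also reads the profile name from the `AWS_PROFILE` environment variable. Here is a minimal sketch for the current shell session:

```bash
# Use the named profile for every aws command run later in this shell session.
export AWS_PROFILE=gridstatus

# Later commands can then omit the flag, e.g.:
#   aws s3 ls s3://gs-catalog-csv/
echo "Using AWS profile: ${AWS_PROFILE}"
```

An explicit `--profile` flag on a command still takes precedence over the environment variable.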

## List available datasets

```bash
aws s3 ls s3://gs-catalog-csv/ --profile gridstatus
```

```
PRE aeso_daily_average_pool_price/
PRE aeso_fuel_mix/
PRE aeso_interchange/
PRE aeso_load/
PRE aeso_load_forecast/
PRE caiso_as_prices/
PRE caiso_curtailment/
PRE caiso_fuel_mix/
PRE caiso_lmp_day_ahead_hourly/
PRE caiso_lmp_real_time_5_min/
...
```

Each `PRE` (prefix) entry is a dataset folder. Depending on your export, there may be hundreds of datasets.
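To use the dataset names in a script, for example to count them or filter by ISO prefix, you can strip the `PRE` marker and trailing slash from the listing. The sketch below assumes the `PRE name/` line format shown above. `dataset_names` is a hypothetical helper name.

```bash
# dataset_names: read `aws s3 ls` output on stdin and print one dataset name per line.
# Only prefix (PRE) lines are kept; the trailing "/" is removed.
dataset_names() {
  awk '$1 == "PRE" { name = $2; sub(/\/$/, "", name); print name }'
}

# Examples (require credentials):
#   aws s3 ls s3://gs-catalog-csv/ --profile gridstatus | dataset_names | grep '^caiso_'
#   aws s3 ls s3://gs-catalog-csv/ --profile gridstatus | dataset_names | wc -l
```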

## Explore the available date range within a dataset

List the available years:

```bash
aws s3 ls s3://gs-catalog-csv/caiso_fuel_mix/ --profile gridstatus
```

```
PRE year=2018/
PRE year=2019/
PRE year=2020/
PRE year=2021/
PRE year=2022/
PRE year=2023/
PRE year=2024/
PRE year=2025/
PRE year=2026/
```

List months within a year:

```bash
aws s3 ls s3://gs-catalog-csv/caiso_fuel_mix/year=2025/ --profile gridstatus
```

```
PRE month=01/
PRE month=02/
PRE month=03/
...
PRE month=12/
```
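The `year=` and `month=` folders follow a `key=value` naming convention, so you can extract a dataset's range from the listings without reading the prefixes by eye. Here is a sketch, where `partition_values` is a hypothetical helper that assumes the `PRE key=value/` format shown above:

```bash
# partition_values KEY: read `aws s3 ls` output on stdin and print the value of
# each KEY=VALUE/ prefix, e.g. "2018" from "PRE year=2018/".
partition_values() {
  awk -v key="$1" '$1 == "PRE" && index($2, key "=") == 1 {
    value = substr($2, length(key) + 2)   # drop "KEY="
    sub(/\/$/, "", value)                 # drop trailing "/"
    print value
  }'
}

# Example (requires credentials): earliest and latest year of a dataset.
#   aws s3 ls s3://gs-catalog-csv/caiso_fuel_mix/ --profile gridstatus \
#     | partition_values year | sort | sed -n '1p;$p'
```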

List individual files within a month:

```bash
aws s3 ls s3://gs-catalog-csv/caiso_fuel_mix/year=2025/month=01/ --profile gridstatus
```

```
2026-02-18 21:26:05       9064 2025-01-01.csv.gz
2026-02-18 21:26:08       9235 2025-01-02.csv.gz
2026-02-18 21:26:07       9171 2025-01-03.csv.gz
...
2026-02-18 21:26:33       9442 2025-01-31.csv.gz
```
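In the listings above, each file's key is derived from its date (`<dataset>/year=YYYY/month=MM/YYYY-MM-DD.csv.gz`), so you can build the S3 URI for a given day without listing first. The sketch below assumes that layout holds for every dataset. `s3_uri_for_date` is a hypothetical helper name.

```bash
# s3_uri_for_date DATASET YYYY-MM-DD: print the S3 URI of that day's file,
# assuming the year=YYYY/month=MM/YYYY-MM-DD.csv.gz layout shown above.
s3_uri_for_date() {
  local dataset="$1" day="$2"   # e.g. caiso_fuel_mix 2025-01-01
  local year="${day%%-*}"       # "2025"
  local rest="${day#*-}"        # "01-01"
  local month="${rest%%-*}"     # "01"
  echo "s3://gs-catalog-csv/${dataset}/year=${year}/month=${month}/${day}.csv.gz"
}
```

A day with no published data simply has no object at that URI, so check for errors when you copy it.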

Get total file count and size for an entire dataset with `--recursive --summarize`:

```bash
aws s3 ls s3://gs-catalog-csv/pjm_lmp_real_time_5_min/ --recursive --summarize --profile gridstatus
```

```
...
2026-02-18 07:34:20  115232137 pjm_lmp_real_time_5_min/year=2026/month=02/2026-02-17.csv.gz

Total Objects: 2880
   Total Size: 228560459449
```

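In this case, `pjm_lmp_real_time_5_min` contains 2,880 files totaling 228,560,459,449 bytes, or \~213 GiB (\~229 GB). Add `--human-readable` to report sizes in KiB, MiB, and GiB instead of bytes.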
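If you need the totals in a script, for instance to check disk space before a sync, you can pull them out of the `--summarize` footer. The sketch below assumes the `Total Objects:` and `Total Size:` lines shown above, with sizes in bytes (that is, without `--human-readable`). `summarize_totals` is a hypothetical helper name.

```bash
# summarize_totals: read `aws s3 ls --recursive --summarize` output on stdin and
# print the object count plus the total size in GiB (1024^3) and GB (1000^3).
summarize_totals() {
  awk '
    /Total Objects:/ { objects = $3 }
    /Total Size:/    { bytes = $3 }
    END {
      printf "%d objects, %.1f GiB (%.1f GB)\n", objects, bytes / 1024^3, bytes / 1000^3
    }'
}

# Example (requires credentials):
#   aws s3 ls s3://gs-catalog-csv/pjm_lmp_real_time_5_min/ --recursive --summarize \
#     --profile gridstatus | summarize_totals
```

For the footer above, this prints `2880 objects, 212.9 GiB (228.6 GB)`.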

## Download data

Download a single file with `aws s3 cp`:

```bash
aws s3 cp s3://gs-catalog-csv/caiso_fuel_mix/year=2025/month=01/2025-01-01.csv.gz ./data/ --profile gridstatus
```

```
download: s3://gs-catalog-csv/caiso_fuel_mix/year=2025/month=01/2025-01-01.csv.gz to data/2025-01-01.csv.gz
```
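To fetch a few consecutive days without syncing a whole month, you can loop over the dates and copy each file in turn. The sketch below assumes GNU `date`, because the `-d` flag differs on macOS/BSD. It also assumes the `year=YYYY/month=MM/YYYY-MM-DD.csv.gz` layout from the listings above. `day_range` is a hypothetical helper name.

```bash
# day_range START END: print every date from START to END inclusive (YYYY-MM-DD).
# Dates are compared as YYYYMMDD integers, so START > END prints nothing.
day_range() {
  local day="$1" end="$2"
  while [ "$(date -u -d "$day" +%Y%m%d)" -le "$(date -u -d "$end" +%Y%m%d)" ]; do
    echo "$day"
    day=$(date -u -d "$day + 1 day" +%F)
  done
}

# Example (requires credentials): copy 2025-01-01 through 2025-01-07.
#   for day in $(day_range 2025-01-01 2025-01-07); do
#     part=$(date -u -d "$day" +year=%Y/month=%m)
#     aws s3 cp "s3://gs-catalog-csv/caiso_fuel_mix/${part}/${day}.csv.gz" ./data/ --profile gridstatus
#   done
```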

Sync an entire dataset to a local folder:

```bash
aws s3 sync s3://gs-catalog-csv/ercot_spp_day_ahead_hourly/ ./data/ercot_spp_day_ahead_hourly/ --profile gridstatus
```

```
download: s3://...ercot_spp_day_ahead_hourly/year=2019/month=01/2019-01-01.csv.gz to data/ercot_spp_day_ahead_hourly/year=2019/month=01/2019-01-01.csv.gz
download: s3://...ercot_spp_day_ahead_hourly/year=2019/month=01/2019-01-02.csv.gz to data/ercot_spp_day_ahead_hourly/year=2019/month=01/2019-01-02.csv.gz
...
```

Sync a single year:

```bash
aws s3 sync s3://gs-catalog-csv/caiso_fuel_mix/year=2025/ ./data/caiso_fuel_mix/year=2025/ --profile gridstatus
```

Sync a single month:

```bash
aws s3 sync s3://gs-catalog-csv/caiso_fuel_mix/year=2025/month=01/ ./data/caiso_fuel_mix/year=2025/month=01/ --profile gridstatus
```

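Use `--exclude` and `--include` to filter specific files within a year (e.g. only June 2025). Filters are evaluated in order and later filters take precedence, so `--exclude "*"` must come before `--include`: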

```bash
aws s3 sync s3://gs-catalog-csv/caiso_fuel_mix/year=2025/ ./data/caiso_fuel_mix/year=2025/ \
  --exclude "*" \
  --include "month=06/*" \
  --profile gridstatus
```
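You can repeat `--include`, so a multi-month window needs one filter per month. The sketch below collects the filters in the shell's positional parameters. Each pattern stays quoted, which keeps the shell from expanding `*` against local files. The April to June window is only an illustration.

```bash
# Build "--exclude * --include month=04/* ..." in "$@" without glob expansion.
set -- --exclude "*"
for m in 04 05 06; do            # April through June (illustrative)
  set -- "$@" --include "month=${m}/*"
done

# Then pass them to sync (requires credentials):
#   aws s3 sync s3://gs-catalog-csv/caiso_fuel_mix/year=2025/ ./data/caiso_fuel_mix/year=2025/ \
#     "$@" --profile gridstatus
printf '%s\n' "$@"               # show the filter arguments that will be passed
```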

Sync everything at once:

```bash
aws s3 sync s3://gs-catalog-csv/ ./data/ --profile gridstatus
```

{% hint style="warning" %}
The full export is approximately **1 TB** (\~1 million files).
{% endhint %}
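Before a full sync, check that the destination disk can hold the export. The sketch below uses POSIX `df -P` and requires the directory to exist already. `has_free_space` is a hypothetical helper name.

```bash
# has_free_space DIR GIB: succeed if the filesystem holding DIR has at least
# GIB gibibytes available. `df -Pk` reports availability in 1024-byte blocks.
has_free_space() {
  local dir="$1" need_gib="$2"
  local avail_kib
  avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}

# Example: the export is about 1 TB, so ask for 1,000 GiB to leave some headroom.
#   mkdir -p ./data && has_free_space ./data 1000 \
#     && aws s3 sync s3://gs-catalog-csv/ ./data/ --profile gridstatus
```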

## Tips for bulk downloads

### Resumable syncs

`aws s3 sync` compares each remote object to its local counterpart by size and last-modified time, so re-running the same command after an interruption only re-downloads the missing or changed files. The same property makes it the right tool for incremental refreshes — point it at the same local directory each day and it will only pull what changed.
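Re-running `aws s3 sync` costs little, so a simple retry loop is enough to ride out transient failures in unattended jobs. The AWS CLI exits with a non-zero status when a transfer fails. The sketch below relies on that. `retry` is a hypothetical helper, and the delay (`RETRY_DELAY`, default 30 seconds) is an arbitrary choice.

```bash
# retry N CMD...: run CMD up to N times, stopping at the first success.
# Safe for `aws s3 sync` because each re-run skips files that already match.
retry() {
  local max="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Example (requires credentials):
#   retry 5 aws s3 sync s3://gs-catalog-csv/ercot_spp_day_ahead_hourly/ \
#     ./data/ercot_spp_day_ahead_hourly/ --profile gridstatus
```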

### Run outside the daily refresh window

The bucket is rewritten by an incremental export starting at **06:00 UTC** and typically finishing within an hour. Pulling during the refresh window can give you inconsistent state across datasets — some have been rewritten with the new partition, others haven't yet. For point-in-time consistency, schedule large jobs outside the 06:00–07:00 UTC window.
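A scheduled job can stay out of that window automatically by checking the current UTC hour before it starts. The sketch below treats 06:00 to 06:59 UTC as the refresh window, per the schedule above. The export only *typically* finishes within the hour, so you may want a wider margin. `in_refresh_window` is a hypothetical helper name.

```bash
# in_refresh_window [HOUR]: succeed if HOUR (00-23, UTC) falls in the
# 06:00-06:59 UTC export window. Defaults to the current UTC hour.
in_refresh_window() {
  local hour="${1:-$(date -u +%H)}"
  [ "$hour" -eq 6 ]   # test's -eq parses "06" as decimal 6
}

# Example: skip a scheduled run that lands inside the window.
#   if in_refresh_window; then echo "export in progress; retry after 07:00 UTC" >&2; exit 0; fi
```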

### Region matters

The bucket lives in `us-east-2` (Ohio). For the highest download throughput, run your client from EC2 in the same region.

### Reading the gzipped CSVs

Files are gzipped CSVs (`*.csv.gz`). Most modern data tools handle them natively without a manual `gunzip` step:

* **pandas:** `pd.read_csv("2025-01-01.csv.gz")` — auto-detected by extension.
* **polars:** `pl.read_csv("2025-01-01.csv.gz")` — auto-detected.
* **DuckDB:** `SELECT * FROM read_csv_auto('data/caiso_fuel_mix/year=2025/**/*.csv.gz')` — supports glob patterns, partition pruning via `hive_partitioning=true`, and reads gzip transparently.
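From the shell, `gzip -dc` streams a file's decompressed contents without writing them to disk. The sketch below joins several daily files into one CSV and keeps only the first file's header row. It assumes that every file in a dataset shares the same header. `concat_csv_gz` is a hypothetical helper name.

```bash
# concat_csv_gz FILE...: decompress each .csv.gz to stdout, printing the
# header row from the first file only. Assumes all files share one header.
concat_csv_gz() {
  local first=1 f
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      gzip -dc "$f"               # header + rows
      first=0
    else
      gzip -dc "$f" | tail -n +2  # rows only, header skipped
    fi
  done
}

# Example: one CSV for January 2025 (the glob expands in date order).
#   concat_csv_gz data/caiso_fuel_mix/year=2025/month=01/*.csv.gz > caiso_fuel_mix_2025-01.csv
```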

