# Getting Started

{% hint style="success" %}
Bulk CSV Downloads is a new offering. We welcome feedback as you get set up.
{% endhint %}

Bulk CSV Downloads provides our entire data catalog as compressed CSV flat files, delivered through AWS S3. The full export is approximately 1 TB compressed, and S3 gives you high-throughput parallel downloads to efficiently backfill your systems with our data.

We will share access to the export using AWS Security Token Service (STS), which grants you a temporary, limited-privilege credential to access the files in S3.

## Prerequisites

Please send us:

* Your AWS Account ID
* Confirmation that your IAM user or role is allowed to call `sts:AssumeRole`

{% hint style="info" %}
You must use an IAM user or role when downloading files. The account root user will not work, because AWS does not allow the root user to assume roles.
{% endhint %}
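
To look up the Account ID to send us and confirm you are not signed in as the root user, you can inspect the response of STS `GetCallerIdentity`. The following is a minimal Python sketch, assuming `boto3` is installed and your own credentials are already configured. The helper names `describe_caller` and `whoami` are hypothetical and used only for illustration.

```python
def describe_caller(identity: dict) -> tuple[str, bool]:
    """Return (account_id, is_root) from an STS GetCallerIdentity response."""
    # ARN layout: arn:partition:service:region:account-id:resource
    resource = identity["Arn"].split(":", 5)[5]
    # The root user's ARN ends in ":root"; IAM users and roles look like
    # "user/<name>" or "assumed-role/<role>/<session>".
    return identity["Account"], resource == "root"


def whoami() -> tuple[str, bool]:
    """Call STS with the currently configured credentials."""
    import boto3  # third-party; imported here so describe_caller has no dependencies

    return describe_caller(boto3.client("sts").get_caller_identity())
```

Calling `whoami()` returns your Account ID and a flag that is `True` only for the root user, in which case you should switch to an IAM user or role before continuing.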

We will then send you the following values, which you use to assume the role and access the data:

* `RoleArn`
* `ExternalId`

## Configure the AWS CLI (recommended option for downloading data)

First, [install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

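In `~/.aws/config`, add a `gridstatus` profile. With a named profile, the CLI assumes the role and refreshes the temporary credentials automatically. `source_profile` must name a profile that holds your own IAM credentials (`default` in this example).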

```ini
[profile gridstatus]
role_arn = <RoleArn>
external_id = <ExternalId>
source_profile = default

s3 =
  max_concurrent_requests = 64
```

`max_concurrent_requests = 64` is a good starting point for bulk downloads. The AWS CLI default of 10 leaves throughput on the table.

Verify it works by listing the available datasets:

```bash
aws s3 ls s3://gs-catalog-csv/ --profile gridstatus
```

```
PRE aeso_daily_average_pool_price/
PRE aeso_fuel_mix/
PRE aeso_interchange/
...
```

If you see dataset folders listed, your credentials are working. See [Example Usage](/developers/bulk-csv-downloads/examples.md) for more commands.

## Transfer to Google Cloud Storage

If your destination is Google Cloud Storage, [Storage Transfer Service](https://cloud.google.com/storage-transfer/docs/source-amazon-s3) can sync `gs-catalog-csv` directly into a GCS bucket. It authenticates using an [AWS IAM role with federated identity](https://docs.cloud.google.com/storage-transfer/docs/source-amazon-s3#federated_identity), so let us know if you want to use this path and send us the **Subject ID** of your Google-managed service account. We'll add it to the role's trust policy, after which you can point a transfer job at `s3://gs-catalog-csv` using the `RoleArn` we provide you.

If federated identity isn't a fit, [`rclone`](https://rclone.org/s3/) supports `sts:AssumeRole` with `ExternalId` (`role_arn` + `role_external_id`) and can sync S3 → GCS from a Compute Engine VM using the credentials we already provide.

## Refresh schedule

The bucket is refreshed once per day around **06:00 UTC** by an incremental export, typically finishing within an hour. Any daily partition with rows that were inserted or updated upstream since the previous run is rewritten in full, so historical files can change on any given day when corrections or late-arriving data flow through. To avoid pulling files mid-rewrite, schedule large `aws s3 sync` jobs **outside the 06:00–07:00 UTC window**.
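
If you schedule syncs programmatically, a small guard can keep jobs out of the refresh window described above. This is a sketch only: `in_refresh_window` is a hypothetical helper, and the constants simply mirror the 06:00 to 07:00 UTC window stated in this section.

```python
from datetime import datetime, time, timezone

# Window taken from the refresh schedule above (UTC).
REFRESH_START = time(6, 0)
REFRESH_END = time(7, 0)


def in_refresh_window(now: datetime) -> bool:
    """Return True if a timezone-aware `now` falls within 06:00-07:00 UTC."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    # Convert to UTC first so callers can pass a datetime in any timezone.
    return REFRESH_START <= now.astimezone(timezone.utc).time() < REFRESH_END
```

A scheduled job could call `in_refresh_window(datetime.now(timezone.utc))` and defer the sync when it returns `True`. Because the export only typically finishes within an hour, you may prefer a wider margin than the one used here.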

## Other Options

* **Python with** [**`s3fs`**](https://s3fs.readthedocs.io/) - Use `S3FileSystem` with `assume_role_arn` and `assume_role_kwargs` to download files.
* **Python with** [**`boto3`**](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) - Use `RefreshableCredentials` via STS `AssumeRole` to list and download objects with concurrent transfers.
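
As a starting point for the `boto3` option, the sketch below assumes the role once with STS `AssumeRole` and lists the top-level dataset folders. It is deliberately simpler than the `RefreshableCredentials` approach mentioned above: the temporary credentials are not refreshed and expire (after one hour by default), so long-running downloads would need to assume the role again. The function names and the `RoleSessionName` value are hypothetical choices for this example.

```python
def assume_role_s3_client(role_arn: str, external_id: str):
    """Assume the shared role and return an S3 client using the temporary credentials."""
    import boto3  # third-party dependency; imported lazily

    response = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="gridstatus-bulk-csv",  # arbitrary label chosen for this sketch
        ExternalId=external_id,
    )
    creds = response["Credentials"]
    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def list_datasets(s3_client, bucket: str = "gs-catalog-csv") -> list[str]:
    """Return the top-level dataset folder names in the bucket."""
    paginator = s3_client.get_paginator("list_objects_v2")
    names = []
    # Delimiter="/" groups keys by their first path segment (CommonPrefixes).
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for prefix in page.get("CommonPrefixes", []):
            names.append(prefix["Prefix"].rstrip("/"))
    return names
```

For example, `list_datasets(assume_role_s3_client("<RoleArn>", "<ExternalId>"))` should return the same folder names that `aws s3 ls` prints, using the values we send you.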


