Working with input datasets
This page documents the main input datasets used in Open Climate Risk and shows how to access them programmatically via the ocr dataset catalog. Treat this as a technical reference for dataset names, example usage, and ingestion notes. Information about data provenance is available on the Data sources page.
Accessing the catalog
from ocr import catalog
# List datasets
print(catalog)
# Load a dataset as an xarray / geopandas object
# rps_30 = catalog.get_dataset('riley-et-al-2025-2047-30m-4326').to_xarray()
Tensor data (raster / Zarr)
These are n-dimensional raster datasets stored in Zarr/Icechunk stores.
USFS Wildfire Risk to Communities
Source: Scott et al. 2024
Ingested to:
ocr/input_datasets/tensor/usfs_scott_2024.py
Typical usage:
from ocr import catalog
crps = catalog.get_dataset('scott-et-al-2024-30m-4326').to_xarray()
USFS climate runs (2011 / 2047)
Source: Riley et al. 2025
These are stored as zipped archives that the ingestion scripts expand into Icechunk stores.
climate_run_2011 = catalog.get_dataset('riley-et-al-2025-2011-30m-4326').to_xarray()
climate_run_2047 = catalog.get_dataset('riley-et-al-2025-2047-30m-4326').to_xarray()
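A common next step is to compare the two runs. The sketch below is a minimal illustration, not part of ocr: `load_climate_runs` and `climate_delta` are hypothetical helper names, and the subtraction assumes both stores share the same grid and variable names so that the xarray objects align cleanly. The catalog is passed in as an argument so the helpers only rely on the `get_dataset(...).to_xarray()` pattern shown above.

```python
# Dataset names as registered in the catalog (taken from the usage lines above).
RUN_2011 = 'riley-et-al-2025-2011-30m-4326'
RUN_2047 = 'riley-et-al-2025-2047-30m-4326'


def load_climate_runs(catalog):
    """Return (baseline, future) via the catalog's get_dataset(...).to_xarray() pattern."""
    baseline = catalog.get_dataset(RUN_2011).to_xarray()
    future = catalog.get_dataset(RUN_2047).to_xarray()
    return baseline, future


def climate_delta(catalog):
    """Future minus baseline.

    Assumption: both runs share the same grid and variable names,
    so element-wise subtraction is meaningful.
    """
    baseline, future = load_climate_runs(catalog)
    return future - baseline


# Usage (requires ocr and access to the underlying stores):
# from ocr import catalog
# delta = climate_delta(catalog)
```

If the two runs turn out to differ in extent or variable naming, align or rename them before subtracting rather than relying on this helper.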
Wind datasets
Source: Rasmussen et al. 2023
Wind datasets and versions may change; if you add or switch wind sources, update the ingestion script under input-data/ and register the new dataset with the ocr catalog.
Vector data
Vector data are building footprints, administrative boundaries, and other GIS vector layers used for exposure and aggregation.
Overture buildings
Ingested subset for CONUS in
ocr/input_datasets/vector/overture.py
conus_buildings = catalog.get_dataset('conus-overture-buildings')
Accessing private data
The catalog relies on privately hosted data. Running the codebase without access to these data currently requires updating the catalog. See Issue #367 for more detail.
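Before running a pipeline, it can help to check which tensor datasets are reachable from your environment. The following is a minimal sketch under stated assumptions: `check_access` is a hypothetical helper, not part of ocr, and it catches `Exception` broadly because this page does not document which error the catalog raises when a store is inaccessible. Opening a store may also be lazy, so a successful open does not guarantee that every chunk can be read.

```python
def check_access(catalog, names):
    """Try to open each named dataset; map name -> None (ok) or an error message."""
    results = {}
    for name in names:
        try:
            # Same access pattern as the tensor examples on this page.
            catalog.get_dataset(name).to_xarray()
            results[name] = None
        except Exception as exc:  # assumption: the exact exception type is unknown
            results[name] = f'{type(exc).__name__}: {exc}'
    return results


# Usage (requires ocr):
# from ocr import catalog
# status = check_access(catalog, ['scott-et-al-2024-30m-4326'])
# for name, error in status.items():
#     print(name, 'ok' if error is None else error)
```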