Data schema#

Open Climate Risk (OCR) produces two primary types of output data: raster (tensor) datasets and vector (polygon) datasets. This page documents the structure, variables, and schema of both output types.

Overview#

        %%{init: {'theme':'neutral', 'themeVariables': {'primaryColor':'#2563eb','primaryTextColor':'#1f2937','primaryBorderColor':'#3b82f6','lineColor':'#6b7280','secondaryColor':'#7c3aed','tertiaryColor':'#10b981','background':'#ffffff','mainBkg':'#f3f4f6','secondBkg':'#e5e7eb','tertiaryBkg':'#d1d5db','primaryTextColor':'#111827','lineColor':'#6b7280','textColor':'#374151','mainContrastColor':'#1f2937','darkMode':false}}}%%
graph TB
    %% Input Data Sources
    subgraph Inputs["<b>Input Data Sources</b>"]
        USFS[USFS Fire Risk<br/>Scott et al. 2024<br/>Riley et al. 2025]
        CONUS[CONUS404 Climate<br/>Rasmussen et al. 2023]
        Buildings[Overture Maps<br/>Building Footprints]
    end

    %% Processing Pipeline
    subgraph Pipeline["<b>OCR Processing Pipeline</b>"]
        ProcessRegion[Process Region<br/>Wind-Adjusted Risk Calculation]
        WindCalc[Wind Direction<br/>Distribution]
        Sample[Sample Risk Values<br/>at Building Locations]
    end

    %% Raster Outputs
    subgraph RasterOutputs["<b>Raster Datasets (30m resolution)</b>"]
        RiskLayers["<b>Fire Risk Variables</b><br/>• rps_2011, rps_2047<br/>• bp_2011, bp_2047<br/>• Reference: rps_scott, crps_scott<br/>• Reference: bp_2011_riley, bp_2047_riley"]
        WindDist["<b>Wind Distribution</b><br/>• wind_direction_distribution<br/>• 8 cardinal/ordinal directions<br/>• Derived from CONUS404"]
    end

    %% Vector Outputs
    subgraph VectorOutputs["<b>Vector Datasets (polygon)</b>"]
        BuildingRisk["<b>Building-Level Risk</b><br/>• Same variables as raster<br/>• Sampled at building centroids<br/>• CONUS-wide coverage<br/>"]
    end

    %% Storage Formats
    subgraph Storage["<b>Storage Formats</b>"]
        Icechunk[("<b>Icechunk</b><br/>Zarr-based<br/>S3-backed<br/>Versioned")]
        GeoParquet[("<b>GeoParquet</b><br/>Hive-partitioned")]
    end

    %% Data Flow
    USFS --> ProcessRegion
    CONUS --> ProcessRegion
    CONUS --> WindCalc
    Buildings --> Sample

    ProcessRegion --> RiskLayers
    WindCalc --> WindDist
    ProcessRegion --> Sample

    RiskLayers --> Icechunk
    WindDist --> Icechunk
    Sample --> BuildingRisk
    BuildingRisk --> GeoParquet

    %% Styling
    classDef input fill:#e0f2fe,stroke:#0284c7,stroke-width:2px,color:#075985
    classDef process fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
    classDef raster fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
    classDef vector fill:#dcfce7,stroke:#22c55e,stroke-width:2px,color:#166534
    classDef storage fill:#fce7f3,stroke:#ec4899,stroke-width:2px,color:#9f1239

    class USFS,CONUS,Buildings input
    class ProcessRegion,WindCalc,Sample process
    class RiskLayers,WindDist raster
    class BuildingRisk vector
    class Icechunk,GeoParquet storage
    

Raster (tensor) datasets#

Raster datasets are gridded geospatial layers stored at 30m resolution in the EPSG:4326 (WGS84) projection. They are organized by region and stored in the Icechunk format.

Spatial characteristics#

| Property | Value |
| --- | --- |
| Resolution | 30m (~0.00028 degrees) |
| Projection | EPSG:4326 (WGS84) |
| Extent | CONUS |
| Chunking | Regional chunks |
| Storage Format | Icechunk (Zarr-based) |

Fire risk variables#

The primary output dataset contains the following variables. To support transparency, we also include the previously published datasets that we used as inputs or for comparison. For clarity, we append a _{name} suffix to the name of any variable that describes previously published data (for example, crps_scott). We are the authors of any data without a _{name} suffix.

Core risk variables#

| Variable | Type | Units | Description |
| --- | --- | --- | --- |
| rps_2011 | float32 | % | Annual risk to potential structures (RPS) for ~2011 climate conditions. Calculated as bp_2011 × crps_scott |
| rps_2047 | float32 | % | Annual risk to potential structures (RPS) for ~2047 climate conditions. Calculated as bp_2047 × crps_scott |
| bp_2011 | float32 | dimensionless | Annual burn probability for ~2011 climate conditions |
| bp_2047 | float32 | dimensionless | Annual burn probability for ~2047 climate conditions |
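
As a worked example of the units involved: burn probability is a dimensionless annual probability and cRPS is a percentage, so their product is already an annual percentage. The sketch below uses made-up pixel values, and the function name `risk_to_potential_structures` is hypothetical rather than part of the OCR codebase.

```python
import math


def risk_to_potential_structures(bp: float, crps: float) -> float:
    """Return annual RPS [%] as burn probability [0-1] times cRPS [%].

    Hypothetical helper for illustration only.
    """
    if math.isnan(bp) or math.isnan(crps):
        return math.nan  # propagate missing data
    if not 0.0 <= bp <= 1.0:
        raise ValueError(f"burn probability out of range: {bp}")
    if not 0.0 <= crps <= 100.0:
        raise ValueError(f"cRPS out of range: {crps}")
    return bp * crps


# Made-up values: a 2% annual burn probability and a 40% conditional risk.
rps_example = risk_to_potential_structures(bp=0.02, crps=40.0)
print(f"rps = {rps_example:.2f} %")
```

With these inputs the annual risk is 0.8%, which sits inside the expected [0, 100] range for RPS.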

Reference variables (data from USFS and Wildfire Risk to Communities project)#

| Variable | Type | Units | Description |
| --- | --- | --- | --- |
| rps_scott | float32 | % | Annual risk to potential structures from Scott et al. (2024) |
| crps_scott | float32 | % | Conditional risk to potential structures (cRPS) from Scott et al. (2024) |
| bp_2011_riley | float32 | dimensionless | Burn probability for ~2011 from Riley et al. (2025) (RDS-2025-0006) |
| bp_2047_riley | float32 | dimensionless | Burn probability for ~2047 from Riley et al. (2025) |

Coordinate variables#

| Variable | Type | Description |
| --- | --- | --- |
| latitude | float64 | Latitude in decimal degrees (WGS84) |
| longitude | float64 | Longitude in decimal degrees (WGS84) |

Wind direction distribution dataset#

A separate dataset provides the statistical distribution of wind directions during fire-weather conditions:

| Variable | Type | Dimensions | Description |
| --- | --- | --- | --- |
| wind_direction_distribution | float32 | (latitude, longitude, wind_direction) | Fraction of fire-weather hours with wind coming from each of 8 cardinal/ordinal directions, derived from Rasmussen et al. (2023) |

Wind direction dimension: The wind_direction coordinate contains 8 direction labels: ['N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW']

Properties:

  • Values sum to 1.0 for all pixels (normalized probability distribution)

  • Derived from CONUS404 data (Rasmussen et al., 2023), using the 99th percentile of the Fosberg Fire Weather Index (FFWI) as the fire-weather threshold
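
The distribution described above can be sketched for a single pixel as follows. This is a minimal illustration, not the OCR implementation. It assumes hourly wind directions in degrees using the meteorological convention (the direction the wind blows from, 0° = north, increasing clockwise), 45° sectors centered on each label, a precomputed FFWI threshold, and that hours at or above the threshold count as fire-weather hours. The function name and inputs are hypothetical.

```python
from collections import Counter

DIRECTION_LABELS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def wind_direction_distribution(directions_deg, ffwi, ffwi_threshold):
    """Fraction of fire-weather hours per direction label.

    directions_deg: hourly wind directions in degrees (assumed "from" convention).
    ffwi: hourly Fosberg Fire Weather Index values, same length.
    ffwi_threshold: assumed precomputed 99th-percentile FFWI for this pixel.
    """
    counts = Counter()
    for direction, index in zip(directions_deg, ffwi):
        if index < ffwi_threshold:
            continue  # not a fire-weather hour
        # Shift by half a sector so that N covers [337.5, 22.5).
        sector = int(((direction % 360.0) + 22.5) // 45.0) % 8
        counts[DIRECTION_LABELS[sector]] += 1
    total = sum(counts.values())
    if total == 0:
        # No fire-weather hours: the distribution is undefined for this pixel.
        return {label: float("nan") for label in DIRECTION_LABELS}
    return {label: counts[label] / total for label in DIRECTION_LABELS}


# Made-up hourly samples: four fire-weather hours and one below the threshold.
dist = wind_direction_distribution(
    directions_deg=[350.0, 10.0, 45.0, 270.0, 180.0],
    ffwi=[60.0, 62.0, 70.0, 65.0, 10.0],
    ffwi_threshold=50.0,
)
print(dist)
```

Dividing by the number of fire-weather hours is what makes the eight fractions sum to 1.0 wherever at least one such hour exists.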

Data processing flow#

        %%{init: {'theme':'neutral', 'themeVariables': {'primaryColor':'#2563eb','primaryTextColor':'#1f2937','primaryBorderColor':'#3b82f6','lineColor':'#6b7280','secondaryColor':'#7c3aed','tertiaryColor':'#10b981','background':'#ffffff','mainBkg':'#f3f4f6','secondBkg':'#e5e7eb','tertiaryBkg':'#d1d5db','primaryTextColor':'#111827','lineColor':'#6b7280','textColor':'#374151','mainContrastColor':'#1f2937','darkMode':false}}}%%
flowchart LR
    %% Input Data
    subgraph Inputs["<b>Input Data</b>"]
        BP_Riley["<b>Riley et al. (2025)</b><br/>Burn Probability<br/>• bp_2011_riley<br/>• bp_2047_riley"]
        CRPS["<b>Scott et al. (2024)</b><br/>Conditional RPS<br/>• crps_scott"]
        WindDist["<b>CONUS404</b><br/>Wind Distribution<br/>• 8 directions<br/>• Fire-weather hours"]
    end

    %% Processing Steps
    subgraph Processing["<b>Wind-Adjustment Processing</b>"]
        Convolve["<b>Directional Convolution</b><br/>Apply elliptical kernel<br/>for each wind direction<br/>30m resolution"]
        Weight["<b>Weighted Composite</b><br/>Combine 8 directions<br/>using wind frequency<br/>as weights"]
    end

    %% Intermediate Results
    WindAdjBP["<b>Wind-Adjusted BP</b><br/>bp_2011, bp_2047<br/>Accounts for directional<br/>fire spread"]

    %% Final Output
    FinalRPS["<b>Risk to Potential Structures</b><br/>rps_2011, rps_2047<br/>Annual risk [%]<br/>Ready for sampling"]

    %% Data Flow
    BP_Riley --> Convolve
    WindDist --> Weight
    Convolve --> Weight
    Weight --> WindAdjBP
    WindAdjBP --> |"Multiply"| FinalRPS
    CRPS --> |"Multiply"| FinalRPS

    %% Styling
    classDef input fill:#e0f2fe,stroke:#0284c7,stroke-width:2px,color:#075985
    classDef process fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
    classDef intermediate fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
    classDef output fill:#dcfce7,stroke:#22c55e,stroke-width:2px,color:#166534

    class BP_Riley,CRPS,WindDist input
    class Convolve,Weight process
    class WindAdjBP intermediate
    class FinalRPS output
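
The weighted-composite step in the diagram can be sketched for a single pixel as below. The per-direction burn probabilities stand in for the outputs of the directional convolution, and all numbers are invented. The function name is hypothetical and this is not the OCR implementation.

```python
DIRECTION_LABELS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def weighted_composite(directional_bp, wind_weights):
    """Combine per-direction burn probabilities using wind-frequency weights."""
    weight_sum = sum(wind_weights[d] for d in DIRECTION_LABELS)
    if abs(weight_sum - 1.0) > 1e-5:
        raise ValueError(f"wind weights must sum to 1.0, got {weight_sum}")
    return sum(directional_bp[d] * wind_weights[d] for d in DIRECTION_LABELS)


# Invented single-pixel inputs: one convolved burn probability per direction
# and the matching wind-direction fractions.
directional_bp = {"N": 0.010, "NE": 0.020, "E": 0.030, "SE": 0.010,
                  "S": 0.010, "SW": 0.040, "W": 0.020, "NW": 0.010}
wind_weights = {"N": 0.0, "NE": 0.5, "E": 0.25, "SE": 0.0,
                "S": 0.0, "SW": 0.25, "W": 0.0, "NW": 0.0}

bp_adjusted = weighted_composite(directional_bp, wind_weights)
print(f"wind-adjusted bp = {bp_adjusted:.4f}")
```

Directions that never occur during fire weather carry zero weight, so they do not contribute to the wind-adjusted burn probability.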
    

Vector (polygon) datasets#

Vector datasets contain building-level risk samples stored as a consolidated GeoParquet file covering all buildings across CONUS.

Schema#

Geometry column#

| Column | Type | Description |
| --- | --- | --- |
| geometry | WKB (Polygon) | Building polygon location in EPSG:4326 |

Risk attribute columns#

Vector datasets contain the same risk variables as raster datasets, sampled at each building location:

| Column | Type | Description |
| --- | --- | --- |
| rps_2011 | float32 | Annual risk to potential structures for ~2011 at building location |
| rps_2047 | float32 | Annual risk to potential structures for ~2047 at building location |
| bp_2011 | float32 | Annual burn probability for ~2011 at building location |
| bp_2047 | float32 | Annual burn probability for ~2047 at building location |
| rps_scott | float32 | Annual risk to potential structures (Scott et al., 2024) at building location |
| crps_scott | float32 | Conditional risk to potential structures (Scott et al., 2024) at building location |
| bp_2011_riley | float32 | Annual burn probability for ~2011 (Riley et al., 2025) at building location |
| bp_2047_riley | float32 | Annual burn probability for ~2047 (Riley et al., 2025) at building location |

Storage characteristics#

| Property | Value |
| --- | --- |
| Format | GeoParquet (schema version 1.1.0) |
| Compression | zstd |
| Geometry Encoding | WKB |
| Spatial Index | Covering bounding box (bbox) |
| Coverage | CONUS-wide, single consolidated file |
| Aggregation | Consolidated from regional processing via DuckDB |

Data quality#

  • Buildings with NaN values (outside CONUS) are excluded

  • Building locations sourced from Overture Maps dataset

File location#

The consolidated building dataset is available at:

{building_geoparquet_uri}

This single-file format enables:

  • Efficient CONUS-wide spatial queries

  • Direct access for analysis tools and workflows

  • Simplified data distribution and versioning

Data validation#

Expected value ranges#

| Variable | Expected Range | Notes |
| --- | --- | --- |
| Risk to potential structures (RPS) | [0, 100] | Annual risk of loss [%] to potential structures. Product of BP and cRPS. |
| Conditional risk to potential structures (cRPS) | [0, 100] | Risk of loss [%] to a hypothetical structure if it were to burn |
| Burn probability (BP) | [0, 1] | Annual likelihood [-] of a pixel burning |
| Wind Distribution | [0, 1] | Sums to 1.0 per pixel (normalized probability distribution across 8 cardinal/ordinal directions) |

Quality checks#

  1. Spatial consistency: All raster layers share identical coordinate systems and extents

  2. Missing data: NaN values appear only in unburnable areas (water, urban, etc.)

  3. Normalization: Wind direction distributions sum to 1.0 (within tolerance of 1e-5) where valid
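
The range and normalization checks listed above could be scripted roughly as follows. This sketch works on plain Python values for a single pixel. The `EXPECTED_RANGES` mapping and the function names are hypothetical, and NaN is treated as valid missing data rather than a violation.

```python
import math

# Expected ranges from the table above, keyed by variable-name prefix.
EXPECTED_RANGES = {"rps": (0.0, 100.0), "crps": (0.0, 100.0), "bp": (0.0, 1.0)}


def in_expected_range(variable: str, value: float) -> bool:
    """Check one value against the expected range for its variable family."""
    if math.isnan(value):
        return True  # NaN marks missing data and is not a range violation
    prefix = variable.split("_")[0]  # e.g. "bp_2011_riley" -> "bp"
    low, high = EXPECTED_RANGES[prefix]
    return low <= value <= high


def is_normalized(fractions, tolerance: float = 1e-5) -> bool:
    """Check that a pixel's 8 wind-direction fractions sum to 1.0."""
    if any(math.isnan(f) for f in fractions):
        return True  # invalid pixel, normalization does not apply
    within_bounds = all(0.0 <= f <= 1.0 for f in fractions)
    return within_bounds and abs(sum(fractions) - 1.0) <= tolerance


print(in_expected_range("bp_2011", 0.5), is_normalized([0.125] * 8))
```

For full rasters the same comparisons would be applied array-wise, but the per-value logic stays the same.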

Metadata attributes#

All datasets include descriptive metadata attributes:

  • description: Human-readable description of the variable

  • long_name: Extended variable name

  • units: Physical units (if applicable)

  • composition: Method used for compositing (e.g., “weighted”)

  • direction_labels: Cardinal/ordinal direction labels for wind data

  • weights_source: Source of weights used in calculations

Access patterns#

Raster data#

  • By region: Query specific regional chunks using latitude/longitude slices

  • Full CONUS: Access complete dataset via Icechunk storage
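
A regional read might look like the sketch below, which assumes the `icechunk` and `xarray` packages. The bucket, prefix, AWS region, branch name `main`, and anonymous access are all placeholders and assumptions, since this page does not specify the store location. The helper names are hypothetical.

```python
def ordered_slice(coord_values, lower, upper):
    """Return a slice whose bounds follow the coordinate's sort order.

    Label-based slicing needs (high, low) bounds when a coordinate is stored
    in descending order, which is common for latitude.
    """
    descending = coord_values[0] > coord_values[-1]
    return slice(upper, lower) if descending else slice(lower, upper)


def open_region(bucket, prefix, aws_region, south, north, west, east):
    """Open the raster store lazily and subset it to a bounding box."""
    # Imported here so that ordered_slice can be used without these packages.
    import icechunk
    import xarray as xr

    # Assumption: a publicly readable S3-backed Icechunk repository.
    storage = icechunk.s3_storage(
        bucket=bucket, prefix=prefix, region=aws_region, anonymous=True
    )
    repo = icechunk.Repository.open(storage)
    session = repo.readonly_session(branch="main")  # assumed branch name
    ds = xr.open_zarr(session.store, consolidated=False)

    lat = ordered_slice(ds["latitude"].values, south, north)
    lon = ordered_slice(ds["longitude"].values, west, east)
    return ds.sel(latitude=lat, longitude=lon)
```

Because the dataset is opened lazily, only the chunks that intersect the requested latitude/longitude window need to be fetched when values are actually loaded.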

Vector data#

  • Full dataset: Query the consolidated CONUS-wide building dataset

  • Spatial query: Use bounding box attributes for efficient spatial filtering

  • Attribute query: Filter by risk threshold using Parquet predicate pushdown with DuckDB or similar tools

  • Regional subset: Extract specific areas using spatial predicates on latitude/longitude
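
The spatial and attribute queries above can be combined in one DuckDB statement, sketched below. It assumes the covering bounding box is exposed as a struct column named `bbox` with `xmin`, `ymin`, `xmax`, and `ymax` fields, which is the usual GeoParquet 1.1 layout but is not confirmed on this page. The function names, the example file name, and the example coordinates are hypothetical.

```python
def build_building_query(parquet_uri, west, south, east, north, min_rps_2011):
    """Build a DuckDB query for buildings in a bounding box above a risk threshold.

    Assumes a struct column `bbox` with xmin/ymin/xmax/ymax fields.
    """
    west, south, east, north = float(west), float(south), float(east), float(north)
    return (
        "SELECT rps_2011, rps_2047, bp_2011, bp_2047, geometry\n"
        f"FROM read_parquet('{parquet_uri}')\n"
        f"WHERE bbox.xmin <= {east} AND bbox.xmax >= {west}\n"
        f"  AND bbox.ymin <= {north} AND bbox.ymax >= {south}\n"
        f"  AND rps_2011 >= {float(min_rps_2011)}"
    )


def run_query(query):
    """Execute the query with DuckDB (requires the `duckdb` package)."""
    import duckdb  # imported lazily so the query builder has no dependencies

    return duckdb.sql(query).fetchall()


# Hypothetical local copy of the dataset and an arbitrary bounding box.
query = build_building_query(
    "buildings.parquet", west=-120.0, south=38.0, east=-119.0, north=39.0,
    min_rps_2011=1.0,
)
print(query)
```

Filtering on the plain numeric `bbox` fields and on `rps_2011` lets the Parquet reader skip row groups whose statistics fall outside the requested window or threshold, without decoding any geometry.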