Project structure#
This page documents the Open Climate Risk (OCR) repository layout and explains the purpose of key directories and files. Use this as a technical reference when contributing code, adding datasets, or extending documentation.
Repository overview#
The OCR platform is organized into distinct layers: the core Python package (ocr/), supporting infrastructure (configuration, deployment, testing), input data management, exploratory research notebooks, and documentation. The structure follows common conventions for scientific Python projects, with an emphasis on reproducibility, modularity, and cloud-native execution. The diagram below maps each layer to its directories and files.
graph TB
subgraph Repository["OCR Repository"]
subgraph Core["Core Package"]
OCR[ocr/]
OCR --> CONFIG[config.py - Configuration models]
OCR --> TYPES[types.py - Type definitions]
OCR --> DATASETS[datasets.py - Data catalog]
OCR --> CONUS[conus404.py - Climate data]
OCR --> UTILS[utils.py - Utilities]
OCR --> TESTING[testing.py - Test helpers]
OCR --> DEPLOY[deploy/]
DEPLOY --> CLI[cli.py - CLI app]
DEPLOY --> MANAGERS[managers.py - Orchestration]
OCR --> PIPELINE[pipeline/]
PIPELINE --> PROCESS[process_region.py]
PIPELINE --> PARTITION[partition.py]
PIPELINE --> STATS[fire_wind_risk_regional_aggregator.py]
PIPELINE --> PYRAMID[create_pyramid.py]
PIPELINE --> PMTILES[create_building_pmtiles.py<br/>create_building_centroid_pmtiles.py<br/>create_regional_pmtiles.py]
PIPELINE --> WRITERS[write_aggregated_region_analysis_files.py]
OCR --> INPUTDS[input_datasets/]
INPUTDS --> INPUTCLI[cli.py - Ingest CLI]
INPUTDS --> INPUTBASE[base.py - Base classes]
INPUTDS --> STORAGE[storage.py - Storage utils]
INPUTDS --> INPUTTENSOR[tensor/ - Tensor ingestion]
INPUTDS --> INPUTVECTOR[vector/ - Vector ingestion]
OCR --> RISKS[risks/]
RISKS --> FIRE[fire.py - Risk models]
end
subgraph Data["Data & Inputs"]
INPUT[input-data/]
INPUT --> TENSOR[tensor/ - CONUS404, USFS fire risk]
INPUT --> VECTOR[vector/ - Buildings, regions, structures]
end
subgraph Research["Research & Exploration"]
NOTEBOOKS[notebooks/]
NOTEBOOKS --> NB1[Wind analysis notebooks]
NOTEBOOKS --> NB2[Fire risk kernels]
NOTEBOOKS --> NB3[Scaling experiments]
end
subgraph Docs["Documentation"]
DOCSDIR[docs/]
DOCSDIR --> HOWTO[how-to/ - Guides]
DOCSDIR --> METHODS[methods/ - Science docs]
DOCSDIR --> REFERENCE[reference/ - API & specs]
end
subgraph Testing["Testing & QA"]
TESTS[tests/]
TESTS --> UNIT[Unit tests]
TESTS --> INTEGRATION[Integration tests]
TESTS --> SNAPSHOTS[Snapshot tests]
end
subgraph Config["Configuration & Build"]
PYPROJECT[pyproject.toml - Package config]
PIXI[pixi.lock - Environment lock]
SPHINX[docs/conf.py - Docs config]
ENV[ocr-*.env - Environment vars]
GITHUB[.github/ - CI/CD workflows]
end
subgraph Infra["Infrastructure"]
BUCKET[bucket_creation/ - S3 setup]
end
end
style Core fill:#e1f5ff
style Data fill:#fff4e1
style Research fill:#f3e8ff
style Docs fill:#e8f5e9
style Testing fill:#ffebee
style Config fill:#fafafa
style Infra fill:#fff9c4
Core package (ocr/)#
Contains all production code organized into logical modules:
Top-level modules#
| Module | Purpose |
|---|---|
| config.py | Pydantic models for storage, chunking, Coiled, and processing configuration |
| types.py | Type definitions and enums (Environment, Platform, RiskType, RegionType) |
| datasets.py | Catalog abstraction for Zarr and GeoParquet datasets in S3 storage |
| conus404.py | CONUS404 climate data helpers: load variables, compute humidity, wind transformations |
| utils.py | DuckDB utilities, S3 secrets, vector sampling, file transfer helpers |
| testing.py | Snapshot testing extensions for xarray and GeoPandas |
| | Rich console instance for pretty terminal output |
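As a rough illustration of how string-backed enums such as those in types.py can validate user input, here is a minimal sketch. The member names and values below are assumptions made for the example and are not taken from the OCR source; `parse_environment` is a hypothetical helper.

```python
from enum import Enum


class Environment(str, Enum):
    # Assumed members for illustration; the real enum may differ.
    STAGING = "staging"
    PRODUCTION = "production"


class RiskType(str, Enum):
    # Assumed members for illustration; the real enum may differ.
    FIRE = "fire"
    WIND = "wind"


def parse_environment(raw: str) -> Environment:
    """Normalize free-form input and convert it to an Environment member.

    Raises ValueError if the value is not a known environment.
    """
    return Environment(raw.strip().lower())
```

Subclassing both `str` and `Enum` lets members compare equal to their string values, which tends to be convenient when values arrive from CLI arguments or environment variables.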
Deployment (deploy/)#
Orchestration layer for local and cloud execution:
- cli.py - Typer-based CLI application (ocr command) with commands for processing regions, aggregation, PMTiles generation, and analysis file creation
- managers.py - Abstract batch manager interface with CoiledBatchManager (cloud) and LocalBatchManager (local) implementations
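The abstract-interface pattern described for managers.py can be sketched as follows. This is not the repository's code: the method names `submit` and `wait`, and the `InProcessBatchManager` class, are hypothetical stand-ins chosen to show how a local implementation might satisfy a shared interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class BatchManager(ABC):
    """Hypothetical interface: queue named jobs, then collect their results."""

    @abstractmethod
    def submit(self, name: str, job: Callable[[], object]) -> None:
        """Queue a job under a unique name."""

    @abstractmethod
    def wait(self) -> Dict[str, object]:
        """Run or await all queued jobs and return results keyed by name."""


class InProcessBatchManager(BatchManager):
    """Runs queued jobs sequentially in the current process."""

    def __init__(self) -> None:
        self._jobs: List[Tuple[str, Callable[[], object]]] = []

    def submit(self, name: str, job: Callable[[], object]) -> None:
        self._jobs.append((name, job))

    def wait(self) -> Dict[str, object]:
        results = {name: job() for name, job in self._jobs}
        self._jobs.clear()  # the queue is empty after results are collected
        return results
```

A cloud-backed implementation would keep the same two methods and replace the body of `wait` with calls to the remote scheduler, so calling code does not need to know where jobs run.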
Pipeline (pipeline/)#
Internal processing modules coordinated by the CLI. These implement the data processing workflow:
- process_region.py - Sample risk values to building locations
- partition.py - Partition GeoParquet by geographic regions
- fire_wind_risk_regional_aggregator.py - Compute regional statistics with DuckDB
- create_pyramid.py - Generate ndpyramid multiscale Zarr for web visualization
- create_building_pmtiles.py - Generate PMTiles for building footprint visualization
- create_building_centroid_pmtiles.py - Generate PMTiles for building centroid visualization
- create_regional_pmtiles.py - Generate PMTiles for regional aggregated statistics
- write_aggregated_region_analysis_files.py - Write regional summary tables for all regions
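To make the first step above concrete, the sketch below samples a gridded risk raster at point locations using a plain nearest-cell lookup. It is a simplified stand-in, not the logic in process_region.py: the function name `sample_grid`, the list-of-lists raster, and the square-cell assumption are all introduced here for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def sample_grid(
    grid: Sequence[Sequence[float]],
    origin: Tuple[float, float],
    cell_size: float,
    points: Sequence[Tuple[float, float]],
) -> List[Optional[float]]:
    """Return the grid value under each (x, y) point, or None if outside.

    `origin` is the lower-left corner of cell grid[0][0]; rows increase
    with y and columns increase with x. Cells are assumed to be square.
    """
    x0, y0 = origin
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0
    samples: List[Optional[float]] = []
    for x, y in points:
        # Floor division maps a coordinate to the index of the cell containing it.
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            samples.append(grid[row][col])
        else:
            samples.append(None)
    return samples
```

For example, `sample_grid([[0.1, 0.2], [0.3, 0.4]], (0.0, 0.0), 10.0, [(5, 5), (15, 5)])` looks up the cells containing the two points; a production pipeline would typically do the same lookup with xarray and GeoPandas on projected coordinates.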
Risk models (risks/)#
Domain-specific risk calculation logic:
- fire.py - Fire/wind risk kernels, wind classification, elliptical spread models
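One common form of wind classification is binning a direction in degrees into eight compass sectors. The sketch below shows that generic technique only; whether fire.py uses eight sectors, these labels, or this convention is an assumption, and `classify_wind_direction` is a hypothetical name.

```python
# Sector labels in clockwise order, starting from north.
SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def classify_wind_direction(degrees: float) -> str:
    """Map a direction in degrees (0 = north, clockwise) to a compass sector.

    Each sector spans 45 degrees and is centered on its nominal bearing,
    so "N" covers [337.5, 360) and [0, 22.5).
    """
    # Shift by half a sector so sector boundaries fall between the bearings.
    index = int(((degrees % 360.0) + 22.5) // 45.0) % 8
    return SECTORS[index]
```

The modulo on the input handles negative angles and values above 360, and the final `% 8` wraps the upper half of the north sector back to index 0.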
Input datasets (input_datasets/)#
Infrastructure for ingesting and processing input datasets:
- cli.py - CLI application for dataset ingestion (ocr ingest-data command)
- base.py - Abstract base classes for dataset processors
- storage.py - Storage utilities for managing dataset files
- tensor/ - Tensor (raster) dataset ingestion modules
- vector/ - Vector (GeoParquet) dataset ingestion modules
Data management (input-data/)#
Organized storage for input datasets and ingestion scripts:
Tensor data (tensor/)#
- conus404/ - CONUS404 climate reanalysis data (wind speed, direction, temperature, etc.)
Vector data (vector/)#
- alexandre-2016/ - Historical fire perimeter data
- calfire_stuctures_destroyed/ - Structure damage records from CalFire
Note
Raw data files are typically not committed. This directory contains ingestion scripts and metadata. Large datasets are stored on S3.
Research notebooks (notebooks/)#
Exploratory Jupyter notebooks for prototyping and analysis:
- conus404-winds.ipynb - Wind data exploration and CONUS404 analysis
- elliptical_kernel.ipynb - Fire spread kernel development
- evaluating_wind_spreading.ipynb - Wind spreading validation
- fire-weather-wind-mode-reprojected.ipynb - Wind mode analysis
- wind_spread.ipynb - Wind-driven fire spread modeling
- wind-spreading-kernels.ipynb - Wind spread kernel experiments
- methods-figures.ipynb - Generate figures for methodology documentation
- benchmarking.ipynb - Performance benchmarking experiments
Note
Convention: When a notebook reaches maturity and demonstrates stable workflows, consider converting it into a how-to guide under docs/how-to/.
Documentation (docs/)#
docs/
├── how-to/ # Task-oriented guides
├── reference/ # Information-oriented technical specs
├── methods/ # Explanation-oriented background
├── assets/ # Images, stylesheets, static files
├── access-data.md # Quick reference for downloads
├── terms-of-data-access.md # Terms that apply to downloads
└── index.md # Documentation home page
Documentation is built with Sphinx using the sphinx-book-theme and deployed automatically to ReadTheDocs on every PR and merge to main.
Testing (tests/)#
Comprehensive test suite with unit and integration tests:
| File | Purpose |
|---|---|
| | Pytest fixtures and configuration |
| | Configuration model validation |
| | CONUS404 data loading and transformations |
| | Dataset catalog and access patterns |
| | Batch manager orchestration logic |
| | Utility function tests |
| | Snapshot-based integration tests for pipeline outputs |
| | Risk model tests |
Test execution:
pixi run tests # Unit tests only
pixi run tests-integration # Integration tests (may require S3 access)
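The snapshot-based tests mentioned above follow a general pattern: record an output on the first run, then compare later runs against the stored copy. The sketch below shows that pattern with JSON files and the standard library only. It is not the xarray/GeoPandas extension in ocr/testing.py; the function name `assert_matches_snapshot` and the example summary keys are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def assert_matches_snapshot(value, snapshot_path: Path, update: bool = False) -> None:
    """Compare `value` with a stored JSON snapshot.

    If the snapshot file is missing (or `update` is True), write it and
    return. Otherwise raise AssertionError when the stored value differs.
    """
    if update or not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(value, sort_keys=True, indent=2))
        return
    expected = json.loads(snapshot_path.read_text())
    if value != expected:
        raise AssertionError(f"snapshot mismatch: {value!r} != {expected!r}")


# Usage: the first call records the snapshot, the second compares against it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "region_summary.json"
    summary = {"building_count": 3, "mean_risk": 0.25}  # made-up example values
    assert_matches_snapshot(summary, path)
    assert_matches_snapshot(summary, path)
```

Snapshots are most useful for outputs that are expensive to specify by hand, such as aggregated tables; an intentional change in output is accepted by rerunning with `update=True`.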
Configuration files#
Package and environment#
- pyproject.toml - Project metadata, dependencies (managed by Pixi), build config, tool settings (ruff, pytest, coverage)
- pixi.lock - Locked dependency versions for reproducible environments
- environment.yaml - Conda environment export (auto-generated from Pixi for Coiled deployments)
Documentation#
- docs/conf.py - Sphinx configuration: theme, extensions, intersphinx mappings
- .readthedocs.yaml - ReadTheDocs build configuration
Environment templates#
- ocr-local.env - Template for local development (uses local filesystem)
- ocr-coiled-s3.env - Template for cloud execution (S3 backend)
- ocr-coiled-s3-staging.env - Staging environment configuration
- ocr-coiled-s3-production.env - Production environment configuration
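Environment templates like these are plain KEY=VALUE files. As a hedged illustration of how such a file can be read, the sketch below parses env-style text with the standard library. The variable names in the usage note are invented for the example and do not reflect the actual contents of the ocr-*.env files; `parse_env_text` is a hypothetical helper.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Surrounding single or double quotes around a value are removed.
    Lines without an '=' separator are ignored.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE pair
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For instance, parsing the text `EXAMPLE_STORAGE_ROOT="s3://example-bucket/output"` yields a dictionary with that single key. Settings libraries can usually load such files directly, so a hand-written parser like this is mainly useful for inspection or debugging.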
Code quality#
- .pre-commit-config.yaml - Pre-commit hooks for linting and formatting
- .prettierrc.json - Prettier configuration for Markdown/YAML formatting
- codecov.yml - Code coverage reporting configuration
Infrastructure (bucket_creation/)#
Helper scripts for cloud infrastructure setup:
- create_s3_bucket.py - Script to create and configure S3 buckets with appropriate permissions and lifecycle policies
CI/CD (.github/)#
GitHub Actions workflows for automated testing, building, and deployment:
- workflows/ - CI/CD pipeline definitions (tests, linting, docs deployment, releases)
- scripts/ - Helper scripts for environment export and Coiled software creation
- dependabot.yaml - Automated dependency updates configuration
- release-drafter.yml - Automated release notes generation
Development workflows#
Adding new code#
1. Create a module under ocr/ (or in the appropriate subpackage).
2. Add tests under tests/ (unit tests are required; add integration tests for complex scenarios).
3. Update documentation:
   - Add a how-to guide if introducing a new user-facing workflow
   - Update the API reference if adding public functions/classes
   - Add a method explanation if introducing a new scientific approach
Adding new datasets#
1. Create an ingestion script under input-data/tensor/ or input-data/vector/.
2. Register the dataset in the ocr.datasets catalog with metadata.
3. Document provenance: add the new source information to docs/reference/data-sources.md.
4. Document ingestion: add information to docs/how-to/input-dataset-ingestion.md.
5. Document workflow: add information to docs/how-to/work-with-input-datasets.md.
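The idea of registering a dataset with metadata can be pictured with a small in-memory catalog. This sketch does not reproduce the ocr.datasets API: `DatasetEntry`, `Catalog`, their fields, and the example path are hypothetical and only illustrate the kind of metadata a registration step might carry.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical catalog record for one input dataset."""

    name: str
    kind: str  # "tensor" (raster, e.g. Zarr) or "vector" (e.g. GeoParquet)
    path: str  # storage location
    source: str  # provenance note


class Catalog:
    """Minimal registry that rejects unknown kinds and duplicate names."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        if entry.kind not in ("tensor", "vector"):
            raise ValueError(f"unknown dataset kind: {entry.kind}")
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def get(self, name: str) -> DatasetEntry:
        return self._entries[name]
```

Rejecting duplicates at registration time keeps dataset names unambiguous, and recording the provenance next to the storage path makes it easier to keep the catalog and the data-sources documentation in step.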
Updating documentation#
1. Choose the appropriate section based on the Diátaxis framework:
   - How-to guides: task-oriented, assume prior knowledge
   - Reference: information-oriented, technical specifications
   - Methods: explanation-oriented, scientific background
2. Update navigation in the docs/index.md toctree if adding new top-level pages.
3. Test locally: run pixi run docs-build && pixi run docs-serve to preview changes.
4. Submit a PR: documentation builds are tested in ReadTheDocs (a PR preview link will be posted).
Release workflow#
See Release procedure for detailed release instructions.