API reference
This page provides a structured, auto-generated reference for the ocr Python package. Each section links to the corresponding module(s) and surfaces docstrings, type hints, and signatures.
Package overview
High-level package entry points and public exports.
Core modules
Configuration
Configuration models for storage, chunking, Coiled, and processing settings.
- class ocr.config.CoiledConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, tag={'Project': 'OCR'}, forward_aws_credentials=False, spot_policy='spot_with_fallback', region='us-west-2', ntasks=1, vm_type='m8g.2xlarge', scheduler_vm_type='m8g.2xlarge')
Bases: BaseSettings
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'ocr_coiled_', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
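Each settings class on this page reads its fields from environment variables named by its env_prefix (as listed in its model_config) followed by the field name, and with case_sensitive False the match ignores case. The pure-Python sketch below only mimics that naming rule to show which variable feeds which field. The helpers are hypothetical, not pydantic-settings itself, and they ignore type coercion and nested models.

```python
import os

# env_prefix values copied from the model_config entries documented on this page.
ENV_PREFIXES = {
    "CoiledConfig": "ocr_coiled_",
    "ChunkingConfig": "ocr_chunking_",
    "VectorConfig": "ocr_vector_",
    "PyramidConfig": "ocr_vector_",
    "IcechunkConfig": "",
    "OCRConfig": "ocr_",
}


def env_var_name(config_class: str, field: str) -> str:
    """Conventional upper-case environment variable name for a settings field."""
    return (ENV_PREFIXES[config_class] + field).upper()


def lookup_setting(config_class: str, field: str, environ=None):
    """Case-insensitive lookup (mimicking case_sensitive=False); None if unset."""
    environ = os.environ if environ is None else environ
    wanted = env_var_name(config_class, field)
    for key, value in environ.items():
        if key.upper() == wanted:
            return value
    return None


print(env_var_name("CoiledConfig", "vm_type"))  # OCR_COILED_VM_TYPE
```

Under this rule, exporting OCR_COILED_REGION before constructing CoiledConfig() would be expected to override the default region='us-west-2'.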
- class ocr.config.ChunkingConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, chunks=None, debug=False)
Bases: BaseSettings
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'ocr_chunking_', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_ChunkingConfig__context)
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- property extent_as_tuple_5070
Get the extent in the EPSG:5070 projection as a tuple (xmin, xmax, ymin, ymax).
- property valid_region_ids: list
Generate valid region IDs by checking which regions contain non-null data.
- Returns:
List of valid region IDs (e.g., ‘y1_x3’, ‘y2_x4’, etc.)
- Return type:
list
- region_id_chunk_lookup(region_id)
Given a region_id (e.g., ‘y5_x14’), return the corresponding chunk indexes (5, 14).
- region_id_slice_lookup(region_id)
Given a region_id (e.g., ‘y5_x14’), return the corresponding x, y slices, e.g., (slice(np.int64(30000), np.int64(36000), None), slice(np.int64(85500), np.int64(90000), None)).
- region_id_to_latlon_slices(region_id)
Get latitude and longitude slices from a region_id.
Returns (lat_slice, lon_slice) where lat_slice.start < lat_slice.stop and lon_slice.start < lon_slice.stop (lower-left origin, latitude ascending).
- get_chunk_mapping()
Return a dict mapping region IDs to their corresponding chunk indexes.
- Returns:
chunk_mapping – Dictionary with region IDs as keys and the corresponding chunk indexes (iy, ix) as values.
- Return type:
dict
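Region IDs encode chunk indexes as ‘y{iy}_x{ix}’, as in the ‘y5_x14’ example above, which maps to chunk (5, 14). A minimal sketch of parsing and building such IDs, plus a chunk mapping for a regular grid. The helper names are hypothetical, and the real get_chunk_mapping derives its grid from the dataset rather than from a fixed shape.

```python
import re

# Hypothetical helpers illustrating the 'y{iy}_x{ix}' region ID convention;
# they are not part of the ocr package.
_REGION_ID = re.compile(r"^y(\d+)_x(\d+)$")


def parse_region_id(region_id: str) -> tuple[int, int]:
    """Return the (iy, ix) chunk indexes encoded in a region ID like 'y5_x14'."""
    match = _REGION_ID.match(region_id)
    if match is None:
        raise ValueError(f"Malformed region ID: {region_id!r}")
    return int(match.group(1)), int(match.group(2))


def format_region_id(iy: int, ix: int) -> str:
    """Inverse of parse_region_id: (5, 14) -> 'y5_x14'."""
    return f"y{iy}_x{ix}"


def build_chunk_mapping(n_y: int, n_x: int) -> dict[str, tuple[int, int]]:
    """Map every region ID on an n_y x n_x chunk grid to its (iy, ix) indexes."""
    return {format_region_id(iy, ix): (iy, ix) for iy in range(n_y) for ix in range(n_x)}


print(parse_region_id("y5_x14"))  # (5, 14)
```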
- plot_all_chunks(color_by_size=False)
Plot all data chunks across the entire CONUS, labeled with their indices.
- Parameters:
color_by_size (bool, default False) – If True, color chunks by their size (useful for identifying irregularities).
- bbox_from_wgs84(xmin, ymin, xmax, ymax)
Reference for U.S. state bounding boxes: https://observablehq.com/@rdmurphy/u-s-state-bounding-boxes
- visualize_chunks_on_conus(chunks=None, color_by_size=False, highlight_chunks=None, include_all_chunks=False)
Visualize the specified chunks on a CONUS map.
- Parameters:
chunks (list of tuples, optional) – List of (iy, ix) tuples specifying the chunks to visualize. If None, all chunks are shown.
color_by_size (bool, default False) – If True, color chunks by their size.
highlight_chunks (list of tuples, optional) – List of (iy, ix) tuples specifying chunks to highlight.
include_all_chunks (bool, default False) – If True, show all chunks in the background with low opacity.
- class ocr.config.PyramidConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, environment=Environment.QA, version=None, storage_root, output_prefix=None, debug=False)
Bases: BaseSettings
Configuration for the visualization pyramid / multiscales.
- environment: Environment
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'ocr_vector_', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_PyramidConfig__context)
Post-initialization to set up prefixes and URIs based on the environment.
- property pyramid_uri: UPath
- class ocr.config.VectorConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, environment=Environment.QA, version=None, storage_root, prefix=None, output_prefix=None, debug=False, metadata=None)
Bases: BaseSettings
Configuration for vector data processing.
- environment: Environment
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'ocr_vector_', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_VectorConfig__context)
Post-initialization to set up prefixes and URIs based on the environment.
- property building_geoparquet_uri: UPath
- class ocr.config.IcechunkConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, environment=Environment.QA, version=None, storage_root, prefix=None, debug=False, metadata=None)
Bases: BaseSettings
Configuration for Icechunk processing.
- environment: Environment
- model_post_init(_IcechunkConfig__context)
Post-initialization to set up prefixes and URIs based on the environment.
- repo_and_session(readonly=False, branch='main')
Open an Icechunk repository and return the session.
- commit_messages_ancestry(branch='main')
Get the commit message ancestry for the Icechunk repository.
- processed_regions(*, branch='main')
Get a list of region IDs that have already been processed.
- insert_region_uncooperative(subset_ds, *, region_id, branch='main')
Insert a region into the Icechunk store.
- pretty_paths()
Pretty-print key IcechunkConfig paths and URIs.
This method touches cached properties (e.g., uri, storage) to surface the real configuration and types.
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': '', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ocr.config.RegionIDStatus(provided_region_ids: set[str], valid_region_ids: set[str], invalid_region_ids: set[str], processed_region_ids: set[str], previously_processed_ids: set[str], unprocessed_valid_region_ids: set[str])
Bases: object
- class ocr.config.OCRConfig(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, environment=Environment.QA, version=None, storage_root, vector=None, icechunk=None, pyramid=None, chunking=None, coiled=None, debug=False)
Bases: BaseSettings
Configuration settings for OCR processing.
- environment: Environment
- vector: VectorConfig | None
- icechunk: IcechunkConfig | None
- pyramid: PyramidConfig | None
- chunking: ChunkingConfig | None
- coiled: CoiledConfig | None
- model_config = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': None, 'env_file_encoding': None, 'env_ignore_empty': False, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': 'ocr_', 'env_prefix_target': 'variable', 'extra': 'forbid', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_OCRConfig__context)
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- pretty_paths()
Pretty-print key OCRConfig paths and URIs.
This method intentionally touches cached properties that create directories (e.g., via mkdir) so you can verify real locations.
- resolve_region_ids(provided_region_ids, *, allow_all_processed=False)
Validate the provided region IDs against the sets of valid and already-processed region IDs.
- Parameters:
provided_region_ids (set[str]) – The set of region IDs to validate.
allow_all_processed (bool, optional) – If True, don’t raise an error when all regions are already processed. This is useful for production reruns where you want to regenerate vector outputs even if the Icechunk regions are complete. Default is False.
- Returns:
Status object with validation results.
- Return type:
RegionIDStatus
- Raises:
ValueError – If no valid unprocessed region IDs remain and allow_all_processed is False.
- select_region_ids(region_ids, *, all_region_ids=False, allow_all_processed=False)
Helper that picks the effective set of region IDs (all valid IDs or the user-provided ones) and returns the validated status object.
- Parameters:
region_ids (list[str] | None) – User-provided region IDs to process.
all_region_ids (bool, optional) – If True, use all valid region IDs instead of the user-provided ones. Default is False.
allow_all_processed (bool, optional) – If True, don’t raise an error when all regions are already processed. Passed through to resolve_region_ids. Default is False.
- Returns:
Status object with validation results.
- Return type:
RegionIDStatus
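resolve_region_ids and select_region_ids are described in terms of valid, processed, and unprocessed region sets. The sketch below shows one plausible set-algebra reading of that behavior. The function and class names are hypothetical, the stand-in dataclass covers only some RegionIDStatus fields, and the exact field semantics in ocr may differ.

```python
from dataclasses import dataclass


@dataclass
class StatusSketch:
    # Hypothetical stand-in for RegionIDStatus (subset of its fields).
    provided_region_ids: set
    valid_region_ids: set
    invalid_region_ids: set
    previously_processed_ids: set
    unprocessed_valid_region_ids: set


def resolve_sketch(provided, valid, processed, *, allow_all_processed=False):
    """Classify provided IDs using set algebra (assumed semantics)."""
    provided = set(provided)
    provided_valid = provided & set(valid)
    invalid = provided - set(valid)
    previously_processed = provided_valid & set(processed)
    unprocessed = provided_valid - set(processed)
    if not unprocessed and not allow_all_processed:
        raise ValueError("No valid unprocessed region IDs remain")
    return StatusSketch(provided, provided_valid, invalid, previously_processed, unprocessed)


def select_sketch(region_ids, valid, processed, *, all_region_ids=False, allow_all_processed=False):
    """Use every valid ID when all_region_ids is True, else the user-provided IDs."""
    candidates = set(valid) if all_region_ids else set(region_ids or [])
    return resolve_sketch(candidates, valid, processed, allow_all_processed=allow_all_processed)
```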
Type definitions
Strongly typed enums for environment, platform, and risk types.
Data access
Datasets
Dataset and Catalog abstractions for Zarr and GeoParquet on S3/local storage.
- class ocr.datasets.Dataset(*, name, description, bucket, prefix, data_format, version='v1', license=None)
Bases: BaseModel
Base class for datasets.
- to_xarray(*, is_icechunk=None, xarray_open_kwargs=None, xarray_storage_options=None)
Convert the dataset to an xarray.Dataset.
- Parameters:
is_icechunk (bool | None, default None) – Whether to use Icechunk to access the data:
  - If True: only try using Icechunk.
  - If None: try Icechunk first, and fall back to direct S3 access if it fails.
  - If False: only use direct S3 access.
xarray_open_kwargs (dict, optional) – Additional keyword arguments to pass to xarray.open_dataset.
xarray_storage_options (dict, optional) – Storage options for S3 access when not using Icechunk.
- Returns:
The opened dataset.
- Return type:
xr.Dataset
- Raises:
ValueError – If the dataset is not in ‘zarr’ format.
FileNotFoundError – If the dataset cannot be found or accessed.
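The three is_icechunk modes amount to a simple fallback strategy. The sketch below shows that control flow with two hypothetical opener callables standing in for the Icechunk and direct-S3 code paths. Which exceptions actually trigger the fallback in ocr is not documented here, so catching Exception is an assumption.

```python
from typing import Callable, Optional


def open_with_fallback(
    is_icechunk: Optional[bool],
    open_icechunk: Callable[[], object],
    open_direct: Callable[[], object],
) -> object:
    """True: Icechunk only. False: direct access only. None: Icechunk, then direct."""
    if is_icechunk is True:
        return open_icechunk()
    if is_icechunk is False:
        return open_direct()
    try:
        return open_icechunk()
    except Exception:
        # Assumption: any failure on the Icechunk path triggers the direct-access fallback.
        return open_direct()
```

With is_icechunk=True, errors from the Icechunk path propagate unchanged, which matches the "only try using Icechunk" wording above.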
- query_geoparquet(query=None, *, install_extensions=True)
Query a geoparquet file using DuckDB.
- Parameters:
- Returns:
Result of the DuckDB query.
- Return type:
duckdb.DuckDBPyRelation
- Raises:
ValueError – If dataset is not in ‘geoparquet’ format.
Example
Example of querying buildings and then converting the same query to a GeoDataFrame:
>>> buildings = catalog.get_dataset('conus-overture-buildings', 'v2025-03-19.1')
>>> result = buildings.query_geoparquet("""
... SELECT
...     id,
...     roof_material,
...     geometry
... FROM read_parquet('{s3_path}')
... WHERE roof_material = 'concrete'
... """)
>>> # Then convert to a GeoDataFrame
>>> gdf = buildings.to_geopandas("""
... SELECT
...     id,
...     roof_material,
...     geometry
... FROM read_parquet('{s3_path}')
... WHERE roof_material = 'concrete'
... """)
- to_geopandas(query=None, geometry_column='geometry', crs='EPSG:4326', target_crs=None, **kwargs)
Convert query results to a GeoPandas GeoDataFrame.
- Parameters:
query (str, optional) – SQL query to execute. If not provided, returns all data.
geometry_column (str, default 'geometry') – The name of the geometry column in the query result.
crs (str, default 'EPSG:4326') – The coordinate reference system to use for the geometries.
target_crs (str, optional) – The target coordinate reference system to convert the geometries to.
**kwargs (dict) – Additional keyword arguments passed to query_geoparquet.
- Returns:
A GeoPandas GeoDataFrame containing the queried data with geometries.
- Return type:
gpd.GeoDataFrame
- Raises:
ValueError – If dataset is not in ‘geoparquet’ format or if the geometry column is not found.
Example
Example of converting buildings to a GeoPandas GeoDataFrame (no need for ST_AsText()):
>>> buildings = catalog.get_dataset('conus-overture-buildings', 'v2025-03-19.1')
>>> gdf = buildings.to_geopandas("""
... SELECT
...     id,
...     roof_material,
...     geometry
... FROM read_parquet('{s3_path}')
... WHERE roof_material = 'concrete'
... """)
>>> gdf.head()
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ocr.datasets.Catalog(*, datasets)
Bases: BaseModel
Base class for the datasets catalog.
- get_dataset(name, version=None, *, case_sensitive=True, latest=False)
Get a dataset by name and, optionally, version.
- Parameters:
name (str) – Name of the dataset to retrieve.
version (str, optional) – Specific version of the dataset. If not provided, returns the dataset if only one version exists; if multiple versions exist, raises an error unless latest=True.
case_sensitive (bool, default True) – Whether to match dataset names case-sensitively.
latest (bool, default False) – If True and version=None, return the latest version instead of raising an error when multiple versions exist.
- Returns:
The matched dataset.
- Return type:
Dataset
- Raises:
ValueError – If multiple versions exist and version is not specified (and latest=False).
KeyError – If no matching dataset is found.
Examples
>>> # Get a dataset with a specific version
>>> catalog.get_dataset('conus-overture-buildings', 'v2025-03-19.1')
>>>
>>> # Get the latest version of a dataset
>>> catalog.get_dataset('conus-overture-buildings', latest=True)
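The version-resolution rules of get_dataset can be summarized as a small pure-Python sketch over hypothetical stand-in records. Treating "latest" as the greatest version string is an assumption (it orders zero-padded, date-stamped versions such as 'v2025-03-19.1' correctly), not necessarily how Catalog compares versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStub:
    # Hypothetical stand-in for ocr.datasets.Dataset (name and version only).
    name: str
    version: str


def get_dataset_sketch(datasets, name, version=None, *, case_sensitive=True, latest=False):
    def same_name(candidate: str) -> bool:
        return candidate == name if case_sensitive else candidate.lower() == name.lower()

    matches = [d for d in datasets if same_name(d.name)]
    if version is not None:
        matches = [d for d in matches if d.version == version]
    if not matches:
        raise KeyError(f"No dataset named {name!r} (version={version!r})")
    if len(matches) == 1:
        return matches[0]
    if version is None and latest:
        # Assumption: the lexicographically greatest version string is the latest.
        return max(matches, key=lambda d: d.version)
    raise ValueError(f"Multiple versions of {name!r}; pass version= or latest=True")
```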
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
CONUS404 helpers
Load CONUS404 variables, compute relative humidity, rotate winds, and derive wind diagnostics. Geographic selection utilities (point/bbox) with CRS-aware transforms.
- ocr.conus404.load_conus404(add_spatial_constants=True)
Load the CONUS404 dataset.
- Parameters:
add_spatial_constants (bool, optional) – If True, add the spatial constant variables (SINALPHA, COSALPHA) to the dataset.
- Returns:
ds – The CONUS404 dataset.
- Return type:
xr.Dataset
- ocr.conus404.compute_relative_humidity(ds)
Compute relative humidity from specific humidity, temperature, and pressure.
- Parameters:
ds (xr.Dataset) – Input dataset containing ‘Q2’ (specific humidity), ‘T2’ (temperature in K), and ‘PSFC’ (pressure in Pa).
- Returns:
hurs – Relative humidity as a percentage.
- Return type:
xr.DataArray
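The docstring names the inputs (Q2, T2, PSFC) but not the formula. A minimal scalar sketch, assuming one common formulation: convert specific humidity to vapor pressure, e = q·p / (0.622 + 0.378·q), and divide by the Bolton (1980) saturation vapor pressure over water. The function name is hypothetical, and ocr may use a different saturation formula (for example via xclim).

```python
import math


def relative_humidity_sketch(q2: float, t2: float, psfc: float) -> float:
    """Relative humidity (%) from specific humidity (kg/kg), temperature (K), pressure (Pa)."""
    epsilon = 0.622  # ratio of gas constants, dry air / water vapor
    vapor_pressure = q2 * psfc / (epsilon + (1.0 - epsilon) * q2)  # Pa
    t_celsius = t2 - 273.15
    # Bolton (1980) saturation vapor pressure over liquid water, converted to Pa.
    saturation_vapor_pressure = 611.2 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))
    return 100.0 * vapor_pressure / saturation_vapor_pressure
```

For example, q2 = 0.01 kg/kg at 300 K and 101325 Pa gives a relative humidity between 45 % and 46 % under this formulation.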
- ocr.conus404.rotate_winds_to_earth(ds)
Rotate grid-relative 10 m winds (U10, V10) to earth-relative components, using the SINALPHA / COSALPHA convention from WRF.
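A scalar sketch of the rotation commonly documented for WRF output, u_earth = u·cos α - v·sin α and v_earth = v·cos α + u·sin α, where SINALPHA and COSALPHA supply sin α and cos α. It assumes rotate_winds_to_earth follows this standard formula; the helper name is hypothetical.

```python
import math


def rotate_to_earth(u10: float, v10: float, sinalpha: float, cosalpha: float) -> tuple[float, float]:
    """Rotate grid-relative (u, v) to earth-relative components (WRF convention)."""
    u_earth = u10 * cosalpha - v10 * sinalpha
    v_earth = v10 * cosalpha + u10 * sinalpha
    return u_earth, v_earth


# A 5 m/s grid-relative westerly on a grid rotated by 10 degrees.
alpha = math.radians(10.0)
u_e, v_e = rotate_to_earth(5.0, 0.0, math.sin(alpha), math.cos(alpha))
```

Because this is a pure rotation, wind speed is unchanged; only the components are re-expressed in the earth frame.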
- ocr.conus404.compute_wind_speed_and_direction(u10, v10)
Derive hourly wind speed (m/s) and the direction the wind blows from (degrees) using xclim.
- Parameters:
u10 (xr.DataArray) – U component of wind at 10 m (m/s).
v10 (xr.DataArray) – V component of wind at 10 m (m/s).
- Returns:
wind_ds – Dataset containing wind speed (‘sfcWind’) and wind direction (‘sfcWindfromdir’).
- Return type:
xr.Dataset
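A scalar sketch of the standard conversion from (u, v) components to speed and the meteorological "from" direction, measured clockwise from north. The helper name is hypothetical; xclim's implementation may apply its own conventions (for example for calm winds or for how north is reported).

```python
import math


def wind_speed_and_from_direction(u10: float, v10: float) -> tuple[float, float]:
    """Return (speed in m/s, direction the wind blows from in degrees, in [0, 360))."""
    speed = math.hypot(u10, v10)
    # atan2(-u, -v) is the bearing the wind comes FROM, clockwise from north.
    direction = math.degrees(math.atan2(-u10, -v10)) % 360.0
    return speed, direction


print(wind_speed_and_from_direction(3.0, 4.0)[0])  # 5.0
```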
Utilities
General utilities
Helpers for DuckDB (extension loading, S3 secrets), vector sampling, and file transfer.
- ocr.utils.get_temp_dir()
Get optimal temporary directory path for the current environment.
Returns the current working directory if running in /scratch (e.g., on Coiled clusters), otherwise returns None to use the system default temp directory.
On Coiled clusters, /scratch is bind-mounted directly to the NVMe disk, avoiding Docker overlay filesystem overhead and providing better I/O performance and more available space compared to /tmp which sits on the Docker overlay.
- Returns:
Current working directory if in /scratch, None otherwise (uses system default).
- Return type:
Path | None
Examples
>>> import tempfile
>>> from ocr.utils import get_temp_dir
>>> with tempfile.TemporaryDirectory(dir=get_temp_dir()) as tmpdir:
...     # tmpdir will be in /scratch on Coiled, system temp otherwise
...     pass
- ocr.utils.apply_s3_creds(region='us-west-2', *, con=None)
Register AWS credentials as a DuckDB SECRET on the given connection.
- Parameters:
region (str) – AWS region used for S3 access.
con (duckdb.DuckDBPyConnection | None) – Connection to apply credentials to. If None, uses duckdb’s default connection (duckdb.sql), preserving prior behavior.
- ocr.utils.install_load_extensions(aws=True, spatial=True, httpfs=True, con=None)
Install and load DuckDB extensions.
- Parameters:
aws (bool, optional) – Install and load the AWS extension, by default True.
spatial (bool, optional) – Install and load the SPATIAL extension, by default True.
httpfs (bool, optional) – Install and load the HTTPFS extension, by default True.
con (duckdb.DuckDBPyConnection | None) – Connection to apply extensions to. If None, uses DuckDB’s default connection.
- ocr.utils.extract_points(gdf, da)
Sample an Xarray DataArray at the point locations of a GeoDataFrame.
- Parameters:
gdf (gpd.GeoDataFrame) – Input GeoPandas GeoDataFrame. Geometries should be points.
da (xr.DataArray) – Input Xarray DataArray.
- Returns:
DataArray sampled at the input geometries.
- Return type:
xr.DataArray
Notes
Computing centroids in a geographic CRS triggers the GeoPandas warning: UserWarning: Geometry is in a geographic CRS. Results from ‘centroid’ are likely incorrect. Use ‘GeoSeries.to_crs()’ to re-project geometries to a projected CRS before this operation.
Because building footprints are small, computing centroids in EPSG:4326 instead of EPSG:5070 should shift them only very slightly.
TODO: Should/can this be a DataArray for typing
- ocr.utils.bbox_tuple_from_xarray_extent(ds, x_name='x', y_name='y')
Create a bounding box from the extent of an Xarray Dataset.
- ocr.utils.copy_or_upload(src, dest, overwrite=True, chunk_size=16777216)
Copy a single file from src to dest using UPath/fsspec:
  - Uses a server-side copy when available on the same filesystem (e.g., s3->s3).
  - Falls back to a streaming copy otherwise.
  - Creates destination parent directories when supported.
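The streaming fallback copies the file in fixed-size pieces; the default chunk_size of 16777216 bytes is 16 MiB. A local-filesystem-only sketch using the standard library (not UPath/fsspec, and without the server-side copy path). The helper name is hypothetical, and raising FileExistsError when overwrite=False is an assumption about that flag.

```python
import shutil
from pathlib import Path

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB, equal to copy_or_upload's default chunk_size


def stream_copy(src: str, dest: str, overwrite: bool = True, chunk_size: int = CHUNK_SIZE) -> None:
    """Copy src to dest in chunk_size pieces, creating parent directories as needed."""
    dest_path = Path(dest)
    if dest_path.exists() and not overwrite:
        # Assumed behavior for overwrite=False; the real function may differ.
        raise FileExistsError(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with open(src, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=chunk_size)
```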
- ocr.utils.geo_sel(ds, *, lon=None, lat=None, bbox=None, method='nearest', tolerance=None, crs_wkt=None)
Geographic selection helper.
- Exactly one of:
  - lon AND lat (a single point)
  - lon AND lat as sequences (multiple points)
  - bbox=(west, south, east, north)
- Parameters:
ds (xr.Dataset) – Input dataset with x, y coordinates and a valid ‘crs’ variable with WKT
lon (float, optional) – Longitude of point to select, by default None
lat (float, optional) – Latitude of point to select, by default None
bbox (tuple, optional) – Bounding box to select (west, south, east, north), by default None
method (str, optional) – Method to use for point selection, by default ‘nearest’
tolerance (float, optional) – Tolerance (in units of the dataset’s CRS) for point selection, by default None
crs_wkt (str, optional) – WKT string for the dataset’s CRS. If None, attempts to read from ds.crs.attrs[‘crs_wkt’].
- Returns:
Single point: time dimension only.
Multiple points: adds a ‘point’ dimension.
BBox: retains the y, x subset.
- Return type:
xr.Dataset
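A hypothetical validator for the "exactly one of" rule, assuming multiple points are passed as lon and lat sequences of equal length. It only classifies the request; it does not perform the CRS transform or the xarray selection that geo_sel carries out.

```python
def selection_mode(lon=None, lat=None, bbox=None) -> str:
    """Return 'point', 'points', or 'bbox', or raise ValueError (hypothetical helper)."""
    if bbox is not None:
        if lon is not None or lat is not None:
            raise ValueError("Pass either lon/lat or bbox, not both")
        if len(bbox) != 4:
            raise ValueError("bbox must be (west, south, east, north)")
        return "bbox"
    if lon is None and lat is None:
        raise ValueError("Provide lon and lat, or bbox")
    if lon is None or lat is None:
        raise ValueError("lon and lat must be given together")
    scalar_lon = isinstance(lon, (int, float))
    scalar_lat = isinstance(lat, (int, float))
    if scalar_lon and scalar_lat:
        return "point"
    if scalar_lon or scalar_lat or len(lon) != len(lat):
        raise ValueError("lon and lat must both be scalars or equal-length sequences")
    return "points"
```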
Testing utilities
Snapshot testing extensions for xarray and GeoPandas.
- class ocr.testing.XarraySnapshotExtension
Bases: SingleFileSnapshotExtension
Snapshot extension for xarray DataArrays and Datasets stored as zarr.
Supports both local and remote (S3) storage, configured via an environment variable:
  - SNAPSHOT_STORAGE_PATH: Base path for snapshots (local or s3://bucket/path)
Default: s3://carbonplan-scratch/snapshots (configured in tests/conftest.py)
Examples
# Use default S3 storage (no env var needed)
pytest tests/test_snapshot.py --snapshot-update
# Override with local storage
SNAPSHOT_STORAGE_PATH=tests/__snapshots__ pytest tests/
# Override with a different S3 bucket
SNAPSHOT_STORAGE_PATH=s3://my-bucket/snapshots pytest tests/
- file_extension = 'zarr'
- classmethod get_snapshot_name(*, test_location, index=0)
Generate snapshot name based on test name.
Sanitizes the test name to replace problematic characters (e.g., brackets from parametrized tests) with underscores for valid file paths.
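A minimal sketch of that sanitization, assuming every character outside letters, digits, underscore, dot, and hyphen is replaced. The helper name and the exact character set are hypothetical; the real get_snapshot_name may keep or replace a different set.

```python
import re


def sanitize_snapshot_name(test_name: str) -> str:
    """Replace path-unfriendly characters (e.g. parametrize brackets) with underscores."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", test_name)


print(sanitize_snapshot_name("test_risk[region-y5_x14]"))  # test_risk_region-y5_x14_
```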
- classmethod get_location(*, test_location, index=0)
Get the full snapshot location path.
Overridden to handle S3 paths properly by using upath instead of os.path.join.
- serialize(data, **kwargs)
Convert a DataArray to a Dataset for consistent zarr storage; a Dataset is returned unchanged.
- matches(*, serialized_data, snapshot_data)
Check if serialized data matches snapshot using approximate comparison.
Uses assert_allclose instead of assert_equal to handle platform-specific numerical differences from OpenCV and scipy operations between macOS and Linux.
- read_snapshot_data_from_location(*, snapshot_location, snapshot_name, session_id)
Read zarr snapshot from disk.
- class ocr.testing.GeoDataFrameSnapshotExtension
Bases: SingleFileSnapshotExtension
Snapshot extension for GeoPandas GeoDataFrames stored as parquet.
Supports both local and remote (S3) storage, configured via an environment variable:
  - SNAPSHOT_STORAGE_PATH: Base path for snapshots (local or s3://bucket/path)
Default: s3://carbonplan-scratch/snapshots (configured in tests/conftest.py)
Examples
# Use default S3 storage (no env var needed)
pytest tests/test_snapshot.py --snapshot-update
# Override with local storage
SNAPSHOT_STORAGE_PATH=tests/__snapshots__ pytest tests/
# Override with a different S3 bucket
SNAPSHOT_STORAGE_PATH=s3://my-bucket/snapshots pytest tests/
- file_extension = 'parquet'
- classmethod get_snapshot_name(*, test_location, index=0)
Generate snapshot name based on test name.
Sanitizes the test name to replace problematic characters (e.g., brackets from parametrized tests) with underscores for valid file paths.
- classmethod get_location(*, test_location, index=0)
Get the full snapshot location path.
Overridden to handle S3 paths properly by using upath instead of os.path.join.
- serialize(data, **kwargs)
Validate that data is a GeoDataFrame. Returns the data unchanged.
- matches(*, serialized_data, snapshot_data)
Check if serialized data matches snapshot using GeoDataFrame comparison.
- read_snapshot_data_from_location(*, snapshot_location, snapshot_name, session_id)
Read parquet snapshot from disk.
Risk analysis
Fire risk
Core fire/wind risk utilities used by the pipeline (kernels, wind classification, risk composition).
- ocr.risks.fire.haversine(lon1, lat1, lon2, lat2)
Calculate the great circle distance in meters between two points on the earth (specified in decimal degrees).
Uses the haversine formula from: https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points
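A self-contained version of the standard haversine formula the docstring refers to. The 6,371 km mean Earth radius is an assumption; the constant used inside ocr.risks.fire.haversine is not shown here, and the helper name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius; ocr's constant may differ


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

With this radius, one degree of longitude along the equator comes out to about 111.2 km.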
- ocr.risks.fire.get_grid_spacing_info(da)
Extract grid spacing information from a DataArray.
Returns the center coordinates and the spacing (in degrees) between pixels for latitude and longitude dimensions.
- Parameters:
da (xr.DataArray) – DataArray with latitude and longitude coordinates.
- Returns:
latitude (float) – Latitude at the center of the grid.
longitude (float) – Longitude at the center of the grid.
latitude_increment (float) – Spacing between latitude pixels in degrees.
longitude_increment (float) – Spacing between longitude pixels in degrees.
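The sketch below shows one way to derive these four values from plain coordinate arrays and convert the degree spacing into pixel sizes in meters, the units generate_weights expects. It assumes the "center" is the midpoint of the coordinate range and uses about 111,320 m per degree of latitude with a cos(latitude) factor for longitude; the function names are hypothetical. At 0.0003° spacing near 38°N this gives roughly 33 m by 26 m, the same order as the 34 m / 25 m defaults of generate_weights.

```python
import math

import numpy as np

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_spacing_info(lat, lon):
    """Return (center_lat, center_lon, lat_increment, lon_increment) in degrees."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    center_lat = float((lat.min() + lat.max()) / 2)
    center_lon = float((lon.min() + lon.max()) / 2)
    # Mean absolute step works for ascending and descending coordinates.
    lat_inc = float(np.abs(np.diff(lat)).mean())
    lon_inc = float(np.abs(np.diff(lon)).mean())
    return center_lat, center_lon, lat_inc, lon_inc


def pixel_size_meters(center_lat, lat_inc, lon_inc):
    """Approximate pixel height and width in meters; width shrinks with cos(latitude)."""
    lat_m = lat_inc * METERS_PER_DEGREE_LAT
    lon_m = lon_inc * METERS_PER_DEGREE_LAT * math.cos(math.radians(center_lat))
    return lat_m, lon_m


lat = np.linspace(38.3, 37.7, 2001)  # descending latitude, 0.0003° steps
lon = np.linspace(-100.3, -99.7, 2001)
info = grid_spacing_info(lat, lon)
print(pixel_size_meters(info[0], info[2], info[3]))  # roughly (33.4, 26.3)
```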
- ocr.risks.fire.generate_weights(method='skewed', kernel_size=81.0, circle_diameter=35.0, direction='W', lat_pixel_size_meters=34, lon_pixel_size_meters=25)[source]#
Generate a 2D array of weights for a kernel.
- Parameters:
method (str, optional) – The method to use for generating weights. Options are ‘skewed’ or ‘circular_focal_mean’. ‘skewed’ generates an elliptical kernel to simulate wind directionality. ‘circular_focal_mean’ generates a circular kernel, by default ‘skewed’
kernel_size (float, optional) – The size of the kernel, by default 81.0
circle_diameter (float, optional) – The diameter of the circle, by default 35.0
direction (str, optional) – Wind direction (‘N’, ‘NE’, ‘E’, ‘SE’, ‘S’, ‘SW’, ‘W’, ‘NW’), by default ‘W’
lat_pixel_size_meters (float, optional) – Physical size of one pixel in the latitude direction in meters, by default 34
lon_pixel_size_meters (float, optional) – Physical size of one pixel in the longitude direction in meters, by default 25
- Returns:
weights – A 2D array of kernel weights (elliptical for 'skewed', circular for 'circular_focal_mean').
- Return type:
np.ndarray
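A minimal sketch of a circular, focal-mean style kernel, assuming kernel_size and circle_diameter are both measured in pixels and that weights are uniform inside the circle and normalized to sum to 1. The name circular_kernel is hypothetical, and the skewed (wind-directional) variant is not reproduced here.

```python
import numpy as np


def circular_kernel(kernel_size=81, circle_diameter=35.0):
    """Uniform weights inside a centered circle, normalized to sum to 1."""
    size = int(kernel_size)
    center = (size - 1) / 2
    y, x = np.mgrid[0:size, 0:size]
    # Distance of every cell from the kernel center, in pixels.
    dist = np.hypot(x - center, y - center)
    weights = (dist <= circle_diameter / 2).astype(float)
    return weights / weights.sum()


w = circular_kernel()
print(w.shape)  # (81, 81)
```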
- ocr.risks.fire.generate_wind_directional_kernels(kernel_size=81.0, circle_diameter=35.0, latitude=38.0, longitude=-100, longitude_increment=0.0003, latitude_increment=0.0003)[source]#
Generate a dictionary of 2D arrays of weights for elliptical kernels oriented in different directions.
- Parameters:
- Returns:
kernels – A dictionary of 2D arrays of weights for elliptical kernels oriented in different directions.
- Return type:
dict
- ocr.risks.fire.apply_wind_directional_convolution(da, iterations=3, kernel_size=81.0, circle_diameter=35.0, latitude=34.0, longitude=100.0, latitude_increment=0.0003, longitude_increment=0.0003)[source]#
Apply a directional convolution to a DataArray.
- Parameters:
da (xr.DataArray) – The DataArray to apply the convolution to.
iterations (int, optional) – The number of iterations to apply the convolution, by default 3
kernel_size (float, optional) – The size of the kernel, by default 81.0
circle_diameter (float, optional) – The diameter of the circle, by default 35.0
- Returns:
ds – The Dataset with the directional convolution applied
- Return type:
xr.Dataset
- ocr.risks.fire.classify_wind_directions(wind_direction_ds)[source]#
Classify wind directions into eight compass directions (classes 0-7). The classification is:
- 0: North (337.5-22.5)
- 1: Northeast (22.5-67.5)
- 2: East (67.5-112.5)
- 3: Southeast (112.5-157.5)
- 4: South (157.5-202.5)
- 5: Southwest (202.5-247.5)
- 6: West (247.5-292.5)
- 7: Northwest (292.5-337.5)
- Parameters:
wind_direction_ds (xarray.DataArray) – DataArray containing wind direction in degrees (0-360)
- Returns:
result – DataArray with wind directions classified as integers 0-7
- Return type:
xarray.DataArray
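The binning rule listed above amounts to shifting each angle by half a sector (22.5°) and dividing by the 45° sector width. A NumPy sketch under that reading, assuming lower bounds are inclusive; the name classify_degrees is hypothetical and NaN inputs are not handled.

```python
import numpy as np


def classify_degrees(degrees):
    """Map wind direction in degrees to classes 0 (N) through 7 (NW)."""
    deg = np.mod(np.asarray(degrees, dtype=float), 360.0)
    # Shift by half a sector so North spans 337.5-22.5, then bin by 45°.
    return (np.floor((deg + 22.5) / 45.0) % 8).astype(int)


print(classify_degrees([10, 100, 350]).tolist())  # [0, 2, 0]
```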
- ocr.risks.fire.create_weighted_composite_bp_map(bp, wind_direction_distribution, *, distribution_direction_dim='wind_direction', weight_sum_tolerance=1e-05)[source]#
Create a weighted composite burn probability map using wind direction distribution.
- Parameters:
bp (xr.Dataset) – Dataset containing 9 directional burn probability layers with variables named [‘N’,’NE’,’E’,’SE’,’S’,’SW’,’W’,’NW’,’circular’] produced by apply_wind_directional_convolution.
wind_direction_distribution (xr.DataArray) – Probability distribution over 8 cardinal directions with dimension ‘wind_direction’ and length 8, matching direction labels: [‘N’,’NE’,’E’,’SE’,’S’,’SW’,’W’,’NW’] (order must align). Values should sum to 1 where fire-weather hours exist; may be all 0 where none exist.
distribution_direction_dim (str, optional) – Name of the dimension in wind_direction_distribution that holds the direction labels, by default ‘wind_direction’.
weight_sum_tolerance (float, optional) – Tolerance for deviation from 1.0 in the sum of weights, by default 1e-05.
- Returns:
weighted – Weighted composite burn probability with same spatial dims as inputs. Name: ‘wind_weighted_bp’. Missing (all-zero) distributions yield NaN.
- Return type:
xr.DataArray
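A NumPy sketch of the weighting step, assuming each directional layer is a 2D array and the distribution has shape (8, ny, nx) in the N, NE, E, SE, S, SW, W, NW order listed above. It uses only the eight directional layers; how the real function uses the 'circular' layer is not described here, so it is omitted. The name weighted_composite is hypothetical.

```python
import numpy as np

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def weighted_composite(bp_layers, weights, tol=1e-5):
    """Sum of directional burn probabilities weighted by the direction distribution."""
    stack = np.stack([bp_layers[d] for d in DIRECTIONS])  # (8, ny, nx)
    w = np.asarray(weights, dtype=float)  # (8, ny, nx), same direction order
    wsum = w.sum(axis=0)
    valid = wsum > 0  # pixels with at least some fire-weather hours
    if not np.all(np.abs(wsum[valid] - 1.0) <= tol):
        raise ValueError("direction weights must sum to 1 where non-zero")
    composite = (stack * w).sum(axis=0)
    # All-zero distributions yield NaN, as the docstring describes.
    return np.where(valid, composite, np.nan)
```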
- ocr.risks.fire.create_wind_informed_burn_probability(wind_direction_distribution_30m_4326, riley_270m_5070)[source]#
Create wind-informed burn probability dataset by applying directional convolution and creating a weighted composite burn probability map.
- Parameters:
wind_direction_distribution_30m_4326 (xr.DataArray) – Wind direction distribution data at 30m resolution in EPSG:4326 projection.
riley_270m_5070 (xr.DataArray) – Riley et al. (2011) burn probability data at 270m resolution in EPSG:5070 projection.
- Returns:
smoothed_final_bp – Smoothed wind-informed burn probability data at 30m resolution in EPSG:4326 projection.
- Return type:
xr.DataArray
- ocr.risks.fire.calculate_wind_adjusted_risk(*, x_slice, y_slice, buffer=0.15)[source]#
Calculate wind-adjusted fire risk using climate run and wildfire risk datasets.
- Parameters:
x_slice (slice) – Slice object for selecting longitude range.
y_slice (slice) – Slice object for selecting latitude range.
buffer (float, optional) – Buffer size in degrees to add around the region for edge effect handling (default 0.15). For 30m EPSG:4326 data, 0.15 degrees ≈ 16.7 km ≈ 540 pixels. This buffer ensures neighborhood operations (convolution, Gaussian smoothing) have adequate context at boundaries.
- Returns:
fire_risk – Dataset containing wind-adjusted fire risk variables.
- Return type:
xr.Dataset
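The buffer figures quoted above can be checked with a little arithmetic, assuming about 111.32 km per degree of latitude and a 1 arc-second (1/3600°) pixel spacing for the 30m EPSG:4326 grid. The expand_slice helper is a hypothetical illustration of padding a coordinate slice while preserving its direction (latitude slices are often descending); it is not the function's actual code.

```python
KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude
PIXEL_DEG = 1 / 3600  # assumed 1 arc-second (~30 m) pixel spacing

buffer_deg = 0.15
buffer_km = buffer_deg * KM_PER_DEGREE_LAT
buffer_px = buffer_deg / PIXEL_DEG
print(round(buffer_km, 1), round(buffer_px))  # 16.7 540


def expand_slice(s: slice, buffer: float) -> slice:
    """Pad a coordinate slice by `buffer` degrees on both ends, keeping its direction."""
    if s.start <= s.stop:  # ascending, e.g. longitude
        return slice(s.start - buffer, s.stop + buffer)
    return slice(s.start + buffer, s.stop - buffer)  # descending, e.g. latitude
```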
- ocr.risks.fire.direction_histogram(data_array)[source]#
Compute a direction histogram on a dask-chunked xarray DataArray.
- Parameters:
data_array (xarray.DataArray) – Input data array containing direction indices (expected to be integers 0-7)
- Returns:
Normalized histogram counts as a probability distribution
- Return type:
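A minimal sketch of the counting and normalizing step for a single pixel's samples, assuming integer direction indices 0-7 and treating NaN or out-of-range values as missing. The real function operates on dask-chunked xarray data; the name direction_histogram_1d is hypothetical.

```python
import numpy as np


def direction_histogram_1d(direction_idx):
    """Normalized counts of direction classes 0-7 (NaN and out-of-range ignored)."""
    idx = np.asarray(direction_idx, dtype=float).ravel()
    idx = idx[(idx >= 0) & (idx <= 7)]  # NaN fails both comparisons and is dropped
    counts = np.bincount(idx.astype(int), minlength=8).astype(float)
    total = counts.sum()
    # No valid samples: return all zeros rather than dividing by zero.
    return counts / total if total > 0 else counts


print(direction_histogram_1d([0, 0, 2, 7]).tolist())
```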
- ocr.risks.fire.fosberg_fire_weather_index(hurs, T2, sfcWind)[source]#
Calculate the Fosberg Fire Weather Index (FFWI) from relative humidity, temperature, and wind speed. Formula taken from https://wikifire.wsl.ch/tiki-indexb1d5.html?page=Fosberg+fire+weather+index&structure=Fire; hurs, T2, and sfcWind are arrays.
- Parameters:
hurs (xr.DataArray) – Relative humidity in percentage (0-100).
T2 (xr.DataArray) – Temperature
sfcWind (xr.DataArray) – Wind speed in meters per second.
- Returns:
Fosberg Fire Weather Index (FFWI).
- Return type:
xr.DataArray
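For orientation, the standard Fosberg formulation computes an equilibrium moisture content m from relative humidity h (%) and temperature T (°F), derives a moisture damping coefficient η, and then sets FFWI = η·sqrt(1 + U²)/0.3002 with wind speed U in mph. The sketch below follows that formulation. How ocr.risks.fire.fosberg_fire_weather_index handles units internally (for example converting sfcWind from m/s, or T2 from another temperature unit) is an assumption here, so the example converts m/s to mph explicitly. The name fosberg_ffwi is hypothetical.

```python
import numpy as np

MPS_TO_MPH = 2.236936  # 1 m/s in miles per hour


def fosberg_ffwi(rh, temp_f, wind_mph):
    """FFWI from relative humidity (%), temperature (°F) and wind speed (mph)."""
    h = np.asarray(rh, dtype=float)
    t = np.asarray(temp_f, dtype=float)
    u = np.asarray(wind_mph, dtype=float)
    # Equilibrium moisture content (%), piecewise in relative humidity.
    m = np.where(
        h < 10,
        0.03229 + 0.281073 * h - 0.000578 * h * t,
        np.where(
            h < 50,
            2.22749 + 0.160107 * h - 0.01478 * t,
            21.0606 + 0.005565 * h**2 - 0.00035 * h * t - 0.483199 * h,
        ),
    )
    # Moisture damping coefficient.
    x = m / 30.0
    eta = 1 - 2 * x + 1.5 * x**2 - 0.5 * x**3
    return eta * np.sqrt(1 + u**2) / 0.3002


# Example: 15% RH, 95 °F, and an 8 m/s wind converted to mph.
print(float(fosberg_ffwi(15, 95, 8 * MPS_TO_MPH)))
```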
- ocr.risks.fire.compute_wind_direction_distribution(direction, fire_weather_mask)[source]#
Compute the wind direction distribution during fire weather conditions.
- Parameters:
direction (xr.DataArray) – Wind direction in degrees (0-360).
fire_weather_mask (xr.DataArray) – Boolean mask indicating fire weather conditions.
- Returns:
wind_direction_hist – Wind direction histogram during fire weather conditions.
- Return type:
xr.Dataset
- ocr.risks.fire.compute_modal_wind_direction(distribution)[source]#
Compute the modal wind direction from the wind direction distribution.
- Parameters:
distribution (xr.DataArray) – Wind direction distribution.
- Returns:
mode – Modal wind direction.
- Return type:
xr.Dataset
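A minimal sketch of taking the mode of a direction distribution, assuming the distribution is stored with the eight directions along the first axis in N, NE, ..., NW order and that pixels with an all-zero distribution should yield NaN. The name modal_direction and the NaN convention are assumptions, not taken from the source.

```python
import numpy as np

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def modal_direction(distribution):
    """Index (0-7) of the most frequent direction along axis 0; NaN where all zero."""
    d = np.asarray(distribution, dtype=float)
    has_data = d.sum(axis=0) > 0
    return np.where(has_data, np.argmax(d, axis=0), np.nan)
```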
- ocr.risks.fire.rps_to_score(rps)[source]#
Convert RPS (Risk Percent to Structures) value(s) to a categorical fire risk score.
The scoring system uses 11 categories (0–10) with bin boundaries designed so that higher scores are increasingly rare: each higher score encompasses a progressively smaller share of the building population.
- Parameters:
rps (float or array-like) – RPS value(s) in percent. Must be in the range [0, 100].
- Returns:
Risk score(s) in the range [0, 10].
- Return type:
int or numpy.ndarray
Examples
>>> rps_to_score(0.0)
0
>>> rps_to_score(0.005)
1
>>> rps_to_score(100.0)
10
>>> import numpy as np
>>> rps_to_score(np.array([0.0, 0.015, 3.5]))
array([ 0,  2, 10])
Internal pipeline modules#
Internal API
These modules are used internally by the pipeline and are not intended for direct public consumption. They are documented here for completeness and advanced use cases.
Batch managers#
Orchestration backends for local and Coiled execution.
- class ocr.deploy.managers.AbstractBatchManager(*, debug=False)[source]#
Bases:
BaseModel
Abstract base class for batch managers.
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ocr.deploy.managers.CoiledBatchManager(*, debug=False, status_check_int=10, job_limit=1000, job_ids=<factory>)[source]#
Bases:
AbstractBatchManager
Coiled batch manager for managing batch jobs.
- wait_for_completion(exit_on_failure=False)[source]#
Wait for all tracked jobs to complete.
- Parameters:
exit_on_failure (bool, default False) – If True, raise an Exception immediately when a job failure is detected.
- Returns:
completed, failed – A tuple of (completed_job_ids, failed_job_ids). If exit_on_failure is True and a failure is encountered, the method raises before returning.
- Return type:
tuple
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class ocr.deploy.managers.LocalBatchManager(*, debug=False, status_check_int=1, max_workers=4, jobs=<factory>)[source]#
Bases:
AbstractBatchManager
Local batch manager for running jobs locally using subprocess.
- model_post_init(_LocalBatchManager__context)[source]#
Initialize the thread pool executor after model creation.
- wait_for_completion(exit_on_failure=False)[source]#
Wait for all tracked jobs to complete.
- Parameters:
exit_on_failure (bool, default False) – If True, raise an Exception immediately when a job failure is detected.
- Returns:
completed, failed – A tuple of (completed_job_ids, failed_job_ids). If exit_on_failure is True and a failure is encountered, the method raises before returning.
- Return type:
tuple
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
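A minimal sketch of the pattern LocalBatchManager describes (subprocess jobs run on a thread pool, then sorted into completed and failed lists), using only the standard library. The function run_jobs, its arguments, and the choice of RuntimeError for exit_on_failure are hypothetical and do not mirror the class's actual methods.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_jobs(commands, max_workers=4, exit_on_failure=False):
    """Run {job_id: argv} commands concurrently; return (completed, failed) job ids."""
    completed, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(subprocess.run, argv, capture_output=True): job_id
            for job_id, argv in commands.items()
        }
        # Collect results as jobs finish, not in submission order.
        for future in as_completed(futures):
            job_id = futures[future]
            if future.result().returncode == 0:
                completed.append(job_id)
            else:
                failed.append(job_id)
                if exit_on_failure:
                    raise RuntimeError(f"job {job_id!r} failed")
    return completed, failed


jobs = {
    "ok": [sys.executable, "-c", "pass"],
    "bad": [sys.executable, "-c", "raise SystemExit(1)"],
}
print(run_jobs(jobs))  # (['ok'], ['bad'])
```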
CLI application#
Command-line interface exposed as the ocr command. For detailed usage and options, see the Data Pipeline guide.
ocr#
Run the OCR deployment pipeline on Coiled
Usage
ocr [OPTIONS] COMMAND [ARGS]...
Options
- --install-completion#
Install completion for the current shell.
- --show-completion#
Show completion for the current shell, to copy it or customize the installation.
aggregate-region-risk-summary-stats#
Generate time-horizon-based statistical summaries for county- and tract-level PMTiles creation
Usage
ocr aggregate-region-risk-summary-stats [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'm8g.16xlarge'
create-building-centroid-pmtiles#
Create building centroid PMTiles from the consolidated geoparquet file.
Usage
ocr create-building-centroid-pmtiles [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'c8g.8xlarge'
- --disk-size <disk_size>#
Disk size in GB (Coiled only).
- Default:
250
create-building-pmtiles#
Create PMTiles from the consolidated geoparquet file.
Usage
ocr create-building-pmtiles [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'c8g.8xlarge'
- --disk-size <disk_size>#
Disk size in GB (Coiled only).
- Default:
250
create-pyramid#
Create the ndpyramid / multiscale Zarr pyramid for web visualization.
Usage
ocr create-pyramid [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'm8g.16xlarge'
create-regional-pmtiles#
Create PMTiles for regional risk statistics (counties and tracts).
Usage
ocr create-regional-pmtiles [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'c8g.8xlarge'
- --disk-size <disk_size>#
Disk size in GB (Coiled only).
- Default:
250
ingest-data#
Ingest and process input datasets
Usage
ocr ingest-data [OPTIONS] COMMAND [ARGS]...
download#
Download raw source data for a dataset.
Usage
ocr ingest-data download [OPTIONS] DATASET
Options
- --dry-run#
Preview operations without executing
- Default:
False
- --debug#
Enable debug logging
- Default:
False
Arguments
- DATASET#
Required argument
Name of the dataset to download
list-datasets#
List all available datasets that can be ingested.
Usage
ocr ingest-data list-datasets [OPTIONS]
process#
Process downloaded data and upload to S3/Icechunk.
Usage
ocr ingest-data process [OPTIONS] DATASET
Options
- --dry-run#
Preview operations without executing
- Default:
False
- --use-coiled#
Use Coiled for distributed processing
- Default:
False
- --software <coiled_software>#
Software environment to use (required if --use-coiled is set)
- --debug#
Enable debug logging
- Default:
False
- --overture-data-type <overture_data_type>#
For overture-maps: which data to process (buildings, addresses, or both)
- Default:
'both'
- --census-geography-type <census_geography_type>#
For census-tiger: which geography to process (blocks, tracts, counties, or all)
- Default:
'all'
- --census-subset-states <census_subset_states>#
For census-tiger: subset of states to process (e.g., California Oregon)
Arguments
- DATASET#
Required argument
Name of the dataset to process
run-all#
Run the complete pipeline: download, process, and cleanup.
Usage
ocr ingest-data run-all [OPTIONS] DATASET
Options
- --dry-run#
Preview operations without executing
- Default:
False
- --use-coiled#
Use Coiled for distributed processing
- Default:
False
- --debug#
Enable debug logging
- Default:
False
- --overture-data-type <overture_data_type>#
For overture-maps: which data to process (buildings, addresses, or both)
- Default:
'both'
- --census-geography-type <census_geography_type>#
For census-tiger: which geography to process (blocks, tracts, counties, or all)
- Default:
'all'
- --census-subset-states <census_subset_states>#
For census-tiger: subset of states to process (e.g., California Oregon)
Arguments
- DATASET#
Required argument
Name of the dataset to process
partition-buildings#
Partition buildings geoparquet by state and county FIPS codes.
Usage
ocr partition-buildings [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'c8g.12xlarge'
process-region#
Calculate and write risk for a given region to the Icechunk CONUS template.
Usage
ocr process-region [OPTIONS] REGION_ID
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -t, --risk-type <risk_type>#
Type of risk to calculate
- Default:
<RiskType.FIRE: 'fire'>
- Options:
fire
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- --init-repo#
Initialize Icechunk repository (if not already initialized).
- Default:
False
Arguments
- REGION_ID#
Required argument
Region ID to process, e.g., y10_x2
run#
Run the OCR deployment pipeline. This will process regions, aggregate geoparquet files, and create PMTiles layers for the specified risk type.
Usage
ocr run [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -r, --region-id <region_id>#
Region IDs to process, e.g., y10_x2
- --all-region-ids#
Process all valid region IDs
- Default:
False
- -t, --risk-type <risk_type>#
Type of risk to calculate
- Default:
<RiskType.FIRE: 'fire'>
- Options:
fire
- --write-regional-stats#
Write aggregated statistical summaries for each region (one file per region type with stats like averages, medians, percentiles, and histograms)
- Default:
False
- --create-pyramid#
Create an ndpyramid / multiscale Zarr store for web visualization
- Default:
False
- -p, --platform <platform>#
Platform to run the pipeline on
- Default:
<Platform.LOCAL: 'local'>
- Options:
coiled | local
- --wipe#
Wipe the Icechunk and vector data stores before running the pipeline
- Default:
False
- --dispatch-platform <dispatch_platform>#
If set, schedule this run command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
VM type override for dispatch-platform (Coiled only).
- --process-retries <process_retries>#
Number of times to retry failed process-region tasks (Coiled only). 0 disables retries.
- Default:
2
write-aggregated-region-analysis-files#
Write aggregated statistical summaries for each region (CONUS, state, county, tract and block).
Creates one file per region type containing aggregated statistics for ALL regions, including building counts, average/median risk values, percentiles (p90, p95, p99), and histograms. Outputs in geoparquet, geojson, and csv formats.
Usage
ocr write-aggregated-region-analysis-files [OPTIONS]
Options
- -e, --env-file <env_file>#
Path to the environment variables file. These will be used to set up the OCRConfiguration
- -p, --platform <platform>#
If set, schedule this command on the specified platform instead of running inline.
- Options:
coiled | local
- --vm-type <vm_type>#
Coiled VM type override (Coiled only).
- Default:
'r8g.4xlarge'