SYMFLUENCE Plugin
CAS ships a SYMFLUENCE integration
(cas.integrations.symfluence) that lets SYMFLUENCE source per-HRU zonal
attributes from any CAS dataset — 228+ providers behind one config key.
It plugs into SYMFLUENCE at four coexisting seams — the first three deliver per-HRU statistics, the fourth delivers vector geometry via the curated mirror:
| Seam | Entry point | What it produces |
|---|---|---|
| Primary: attribute processor | symfluence.attribute_processors → CASAttributeProcessor | Attributes merged into SYMFLUENCE's per-HRU attribute table, alongside the native elevation.* / soil.* / climate.* attributes |
| Secondary: acquisition handler | symfluence.plugins → register() (CASAttributeAcquirer under key CAS) | Standalone analysis CSV in data/attributes/cas/ |
| Tertiary: attribute backend | symfluence.plugins → register() (CommunityAttributeBackend under R.attribute_backends['community']) | Per-HRU HRU_STATS_V1 CSV + acquisition manifest in data/attributes/cas/, ingested by the model-ready AttributesNetCDFBuilder as a cas group |
| Quaternary: mirror acquisition | symfluence.plugins → register() (CASMirrorAcquirer under CAS_WOKAM/CAS_HYDROLAKES/CAS_GLHYMPS) | Domain-clipped vector GeoPackages: the curated-mirror replacement for SYMFLUENCE's native WOKAM/HydroLAKES/GLHYMPS bulk downloads |
The attribute backend is the contract-0.3.0 protocol tier — the same
backend pattern as the CFS forcing backend and the CSFS observation backend.
Under DATA_ACCESS: community, SYMFLUENCE's attribute pipeline selects it
first (parity-gated; attribute parity is tolerance-based, not
bit-identical, since zonal stats depend on resampling/masking/grid alignment),
and it delivers a declared-schema per-HRU table plus a sidecar manifest that
flows straight into the model-ready attributes store. All three statistics seams wrap the
same extraction helpers (batch_extract over HRU geometries → per-HRU
stats); when the backend serves CAS, SYMFLUENCE excludes the cas processor
plugin from its plugin loop so CAS is extracted exactly once. Frameworks
predating contract 0.3.0 simply lack the attribute_backends registry and fall
back to the processor-plugin seam.
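The fallback for older frameworks amounts to a feature check on the registry. The sketch below assumes a registry object with an optional attribute_backends mapping; the pick_attribute_seam helper and both stand-in registries are hypothetical and not SYMFLUENCE code, and the parity gate is left out.

```python
from types import SimpleNamespace

def pick_attribute_seam(registry, data_access):
    """Return which seam would serve CAS attributes (illustrative only)."""
    # Frameworks predating contract 0.3.0 have no attribute_backends registry.
    backends = getattr(registry, "attribute_backends", None)
    if data_access == "community" and backends and "community" in backends:
        return "backend"
    return "processor-plugin"

# Stand-ins for a contract-0.3.0 framework and an older one (both hypothetical).
new_framework = SimpleNamespace(attribute_backends={"community": object()})
old_framework = SimpleNamespace()

print(pick_attribute_seam(new_framework, "community"))
print(pick_attribute_seam(old_framework, "community"))
```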
Install & auto-discovery
Install CAS into the same environment as SYMFLUENCE:
That is all the wiring there is. CAS declares both entry points in its
pyproject.toml:
[project.entry-points."symfluence.attribute_processors"]
cas = "cas.integrations.symfluence:CASAttributeProcessor"
[project.entry-points."symfluence.plugins"]
cas = "cas.integrations.symfluence:register"
SYMFLUENCE's discover_attribute_plugins() loads the first group and
validates that each entry subclasses its BaseAttributeProcessor; the second
group is loaded at import symfluence and registers the acquisition handler.
Verify discovery:
python -c "
from symfluence.data.preprocessing.attribute_processors.plugins import discover_attribute_plugins
print([name for name, _ in discover_attribute_plugins()])"
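If SYMFLUENCE itself is not importable yet, the declared entry points can still be inspected with the standard library. This is a minimal sketch using importlib.metadata; the entry_point_names helper is illustrative, and it only reports what is installed in the current environment, so the lists are empty when CAS is absent.

```python
from importlib.metadata import entry_points

def entry_point_names(group):
    """Names of installed entry points in one group, sorted."""
    eps = entry_points()
    # Python 3.10+ returns a selectable collection; older versions return a dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)

for group in ("symfluence.attribute_processors", "symfluence.plugins"):
    print(group, entry_point_names(group))
```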
The module imports defensively: without SYMFLUENCE installed, import cas
(and even import cas.integrations.symfluence) still works — the classes'
base classes degrade to object and register() is a no-op. CAS gains
no SYMFLUENCE dependency.
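The defensive-import pattern described above can be sketched as follows. The module path passed to load_base_class is an assumption (the real location of BaseAttributeProcessor may differ), and SketchProcessor and this register() are hypothetical stand-ins rather than CAS's actual classes.

```python
import importlib

def load_base_class(module_path, class_name):
    """Return the named class if its module imports, else plain object."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return object
    return getattr(module, class_name, object)

# Assumed location of the base class; the real module path may differ.
Base = load_base_class(
    "symfluence.data.preprocessing.attribute_processors.base",
    "BaseAttributeProcessor",
)

class SketchProcessor(Base):
    """Hypothetical processor; inherits from object without SYMFLUENCE."""

def register():
    """Report False (nothing registered) when SYMFLUENCE is absent."""
    if Base is object:
        return False
    # With SYMFLUENCE present, handler registration would happen here.
    return True

print(Base is object, register())
```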
Primary: the attribute-processor seam
No-op unless configured
Both seams do nothing until you opt in: when CAS_DATASETS is unset or
empty, CASAttributeProcessor.process() logs an info message and returns
{}. On the SYMFLUENCE side you can also switch all external attribute
plugins off with ATTRIBUTE_PLUGINS_ENABLED: false or skip just CAS with
ATTRIBUTE_PLUGINS_EXCLUDE: [cas].
Configuration
Add flat keys to your SYMFLUENCE YAML config:
# Comma-separated CAS dataset ids ({provider}:{dataset}) — required opt-in
CAS_DATASETS: "copernicus_dem:elevation,isric_soilgrids:clay_0-5cm"
# Optional: zonal aggregation method (default: mean)
# mean | median | min | max | std | sum | majority | minority | distribution
CAS_AGGREGATION: mean
# Optional: CAS runtime settings, passed to cas.configure()
CAS_API_CONFIG:
  provider_timeout_s: 60
Browse dataset ids with cas datasets <provider> or the
provider catalog.
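To make the {provider}:{dataset} format and the empty-means-no-op rule concrete, here is a small parsing sketch. The parse_cas_datasets helper is hypothetical, and raising on a malformed id is an assumption; CAS's own handling of bad ids may differ.

```python
def parse_cas_datasets(config):
    """Split CAS_DATASETS into (provider, dataset) pairs; [] means no-op."""
    raw = config.get("CAS_DATASETS") or ""
    pairs = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        provider, sep, dataset = item.partition(":")
        if not sep or not provider or not dataset:
            # Assumed behaviour: reject ids that are not provider:dataset.
            raise ValueError(f"expected provider:dataset, got {item!r}")
        pairs.append((provider, dataset))
    return pairs

config = {"CAS_DATASETS": "copernicus_dem:elevation,isric_soilgrids:clay_0-5cm"}
print(parse_cas_datasets(config))
print(parse_cas_datasets({}))
```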
What it does
SYMFLUENCE's attribute machinery
(attributeProcessor._process_plugin_attributes) discovers the processor,
constructs it with (config, logger), and calls .process(). The processor:
- locates the domain's HRU/catchment polygons through the inherited BaseAttributeProcessor path resolution (CATCHMENT_PATH, CATCHMENT_SHP_NAME, CATCHMENT_SHP_HRUID, defaulting to shapefiles/catchment/{DOMAIN_NAME}_HRUs_{discretization}.shp), the same geometry every in-tree processor reduces over,
- reprojects to EPSG:4326 if needed and builds one CAS geometry per HRU,
- batches the geometries into BatchAttributeRequests (max 1000 geometries each, all datasets per request) and calls cas.batch_extract_sync(),
- logs a quality-flag summary, and
- returns a flat attribute dict.
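The batching step above is ordinary fixed-size chunking. The sketch below shows only that part, with placeholder strings instead of real geometries; the chunked helper is illustrative and no CAS call is made.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder strings stand in for per-HRU geometries.
hru_geometries = [f"geom_{i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(hru_geometries)]
print(batch_sizes)  # [1000, 1000, 500]
```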
How results flow onward (the consumed contract)
What we verified in SYMFLUENCE's _process_plugin_attributes (and the
surrounding process_attributes()):
- A plugin that returns a non-empty dict has it merged as-is into the same results dict the in-tree elevation/soil/climate processors feed (results.update(plugin_results)). A plugin that raises is logged and skipped; it never aborts attribute processing.
- For lumped domains (DOMAIN_DEFINITION_METHOD: lumped) all keys become columns of a single attribute row. CAS emits plain cas.{dataset} keys (e.g. cas.copernicus_dem_elevation).
- For distributed domains SYMFLUENCE rebuilds one row per HRU from keys shaped HRU_{id}_{attribute}, parsing the id with int(key.split("_")[1]). CAS emits HRU_{id}_cas.{dataset} keys with integer-coerced HRU ids (non-integer ids fall back to the geometry's positional index, with a warning).
- Categorical distribution results expand to one cas.{dataset}_{class} fraction key per class. Per-dataset cas.{dataset}_quality (string) and cas.{dataset}_coverage_fraction (float) keys ride along; SYMFLUENCE's numeric-only CSV writers drop the string keys automatically, exactly as they do for climaclass's string class codes.
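The distributed-domain keying can be illustrated with a short round trip. This is a sketch, not SYMFLUENCE or CAS source: hru_key and rows_from_flat are hypothetical helpers, the values are made up, and only the int(key.split("_")[1]) parse and the positional-index fallback come from the description above.

```python
def hru_key(hru_id, position, dataset_key):
    """Build an HRU_{id}_cas.{dataset} key; non-integer ids use the position."""
    try:
        coerced = int(hru_id)
    except (TypeError, ValueError):
        coerced = position  # fallback to the geometry's positional index
    return f"HRU_{coerced}_cas.{dataset_key}"

def rows_from_flat(results):
    """Rebuild {hru_id: {attribute: value}} from HRU_{id}_{attribute} keys."""
    rows = {}
    for key, value in results.items():
        if not key.startswith("HRU_"):
            continue
        hru_id = int(key.split("_")[1])
        attribute = key.split("_", 2)[2]
        rows.setdefault(hru_id, {})[attribute] = value
    return rows

# Made-up values for two HRUs; the second id is not an integer.
flat = {
    hru_key("7", 0, "copernicus_dem_elevation"): 812.4,
    hru_key("7", 0, "esa_worldcover_land_cover_10"): 0.62,
    hru_key("north", 1, "copernicus_dem_elevation"): 455.0,
}
rows = rows_from_flat(flat)
print(rows)
```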
Because the keying is byte-for-byte the shape native processors emit, CAS
attributes ride whatever path native attribute results ride: the per-HRU
DataFrame process_attributes() rebuilds from the merged dict, and the
per-HRU numeric CSV reshaping (BaseAttributeProcessor._write_results_csv,
the format SYMFLUENCE's AttributesNetCDFBuilder and transfer-function
regionalization consume) both parse exactly these HRU_{id}_{attribute}
keys. No special-cased attributes/cas/ directory is involved — which was
the gap in the CSV-export mode below.
The replacement recipe
To source the zonal-statistics attribute layer from CAS instead of native downloads:
ATTRIBUTE_PROFILE: core # skip the extended native processors
DOWNLOAD_WORLDCLIM: false # and any other DOWNLOAD_* you replace
CAS_DATASETS: "copernicus_dem:elevation,\
isric_soilgrids:clay_0-5cm,isric_soilgrids:sand_0-5cm,\
esa_worldcover:land_cover,terraclimate:pet,terraclimate:aridity"
CAS_AGGREGATION: mean
Hard limits
Anything in SYMFLUENCE that consumes rasters cannot come from CAS: the DEM used for delineation/discretization/elevation bands, and the land-cover/soil grids used for HRU generation, stay native acquisitions. CAS replaces the zonal-statistics attribute layer only (per-HRU scalar/fraction attributes).
Starter mapping: native attributes → CAS dataset ids
All ids below are verified against CAS's provider registry (cas datasets
<provider>).
| Native attribute family | CAS dataset id(s) | Notes |
|---|---|---|
| elevation.* zonal stats | copernicus_dem:elevation, cop_dem_90:elevation | use CAS_AGGREGATION: mean/min/max/std; merit_hydro:elevation_adjusted for hydrologically conditioned elevation |
| soil.* texture / properties | isric_soilgrids:clay_0-5cm, isric_soilgrids:sand_0-5cm, isric_soilgrids:silt_0-5cm (layers to 100-200cm), isric_soilgrids:soc_0-5cm, isric_soilgrids:phh2o_0-5cm | one id per depth layer |
| soil.* derived stocks/classes | soilgrids_derived:ocs, soilgrids_derived:wrb_class | wrb_class with CAS_AGGREGATION: majority or distribution |
| landcover.* class fractions | esa_worldcover:land_cover, esa_cci_lc:land_cover | use CAS_AGGREGATION: distribution for per-class fractions |
| vegetation.* structure | canopy_height:canopy_height | canopy height, not LAI |
| climate.pet_annual_mean, climate.aridity_index, climate.prec_annual_mean | terraclimate:pet, terraclimate:aridity, terraclimate:precipitation (also aet, deficit, soil_moisture, runoff, tmax, tmin, vpd, pdsi) | TerraClimate climatologies |
| permafrost extent | permafrost:permafrost_index | |
Secondary: the acquisition-handler CSV export
register() adds CASAttributeAcquirer to SYMFLUENCE's acquisition registry
under the key CAS for explicit use — reference it from a custom
attribute profile or invoke it directly. It is not part of the built-in
profiles (core/camels_spat/full); the processor seam above superseded
that.
Given an output directory (by convention
{project_dir}/data/attributes/cas/), download() runs the same extraction
pipeline and writes {DOMAIN_NAME}_cas_attributes.csv:
- one row per HRU, sorted by HRU id, with an explicit hru_id first column;
- one numeric column per dataset, named from the sanitized dataset id (isric_soilgrids:clay_0-5cm → isric_soilgrids_clay_0_5cm); categorical distribution results expand to one {dataset}_{class} fraction column per class;
- per-dataset metadata extras: {column}_units, {column}_quality, {column}_coverage_fraction.
Re-runs skip the extraction when the CSV already exists (unless
FORCE_DOWNLOAD: true). This CSV is an analysis-oriented sidecar: it joins
on hru_id but is not folded into SYMFLUENCE's model-ready attribute store —
use the processor seam for that.
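The CSV layout can be sketched with the standard library. The sanitize rule here (every run of characters outside letters, digits, and underscore becomes one underscore) is an assumption that reproduces the documented example; CAS's actual rule may differ, write_csv is a hypothetical helper, and the numbers are made up.

```python
import csv
import io
import re

def sanitize(dataset_id):
    """Replace each run of characters outside [0-9A-Za-z_] with '_'."""
    return re.sub(r"[^0-9A-Za-z_]+", "_", dataset_id)

def write_csv(per_hru, dataset_ids, stream):
    """Write one row per HRU, sorted by id, with hru_id as the first column."""
    columns = [sanitize(d) for d in dataset_ids]
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["hru_id", *columns])
    for hru_id in sorted(per_hru):
        writer.writerow([hru_id, *(per_hru[hru_id].get(d, "") for d in dataset_ids)])

ids = ["isric_soilgrids:clay_0-5cm"]
values = {2: {ids[0]: 21.5}, 1: {ids[0]: 18.0}}  # made-up numbers
buffer = io.StringIO()
write_csv(values, ids, buffer)
print(buffer.getvalue())
```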
Quaternary: mirror-acquisition delegation
Unlike the three statistics seams, this one delivers vector geometry.
SYMFLUENCE's native wokam.py / hydrolakes.py / glhymps.py handlers each
download a global distribution and clip it to the domain — exactly what the
curated-mirror tier now does from a version-pinned, checksummed
local copy. Once the mirror-vs-native parity gate certified the two
paths feature- and geometry-equivalent, register() began wiring
CASMirrorAcquirer: drop-in acquirers that call cas.mirror_subset_sync and
write the same GeoPackage (path, projected columns, EPSG:4326) the native
handler produced, so downstream SYMFLUENCE steps are unchanged.
| CAS mirror dataset | Native handler it supersedes | Output GeoPackage |
|---|---|---|
| wokam | WOKAM / KARST / KARST_AQUIFER | attributes/geology/karst/domain_{name}_wokam_karst.gpkg |
| hydrolakes | HYDROLAKES / HYDROLAKES_V10 | attributes/lakes/domain_{name}_hydrolakes.gpkg |
| glhymps | GLHYMPS / GLHYMPS_V2 | attributes/geology/glhymps/domain_{name}_glhymps.gpkg |
The acquirers always register under additive explicit keys
(CAS_WOKAM / CAS_HYDROLAKES / CAS_GLHYMPS) for scripted use. Because
SYMFLUENCE's attribute profiles reference the native keys, routing an
unmodified config through the mirror requires overriding those keys — opt in
with:
With the flag set, register() rebinds WOKAM/HYDROLAKES/GLHYMPS (and their
aliases) to the mirror-backed acquirers — last-writer-wins over the native
decorators, which already ran when CAS imported
symfluence.data.acquisition.base. No SYMFLUENCE source is edited. The flag is
off by default, so simply installing CAS never silently changes a run's
acquisition path.
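The last-writer-wins rebinding can be pictured with a plain dict standing in for the acquisition registry. Everything here is hypothetical: register_sketch, the override_native parameter (a stand-in for the opt-in flag), and the string values used in place of acquirer classes.

```python
def register_sketch(registry, mirror_acquirers, override_native=False):
    """Add CAS_* keys always; rebind native keys only when opted in."""
    for name, acquirer in mirror_acquirers.items():
        registry[f"CAS_{name}"] = acquirer
        if override_native:
            registry[name] = acquirer  # last writer wins over the native entry
    return registry

# Strings stand in for the native and mirror-backed acquirer classes.
native = {"WOKAM": "native-wokam", "HYDROLAKES": "native-hydrolakes"}
mirror = {"WOKAM": "mirror-wokam", "HYDROLAKES": "mirror-hydrolakes"}

default_run = register_sketch(dict(native), mirror)
opted_in = register_sketch(dict(native), mirror, override_native=True)
print(default_run["WOKAM"], opted_in["WOKAM"])
```

With the override off, the native keys keep their original handlers and only the additive CAS_* keys appear, which mirrors the off-by-default behaviour described above.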
RGI 7.0 / glacier is only half-delegated. SYMFLUENCE's glacier handler also
builds rasters and catchment-intersection shapefiles, which live in SYMFLUENCE —
so its key is never overridden. Instead CAS exposes
cas.integrations.symfluence.mirror_rgi_outlines(bbox, output_dir), the
acquisition-only helper that returns a domain-clipped rgi7 outline GeoPackage
for the glacier handler to consume in place of its NSIDC/GLIMS download.