Provider Catalog
CFS ships 33 connectors — 31 live-verified against their upstream stores (the auth-gated ones with real CDS and Earthdata credentials), 2 offline-verified pending access or provider-specific credentials.
The machine-readable catalog (resolution, bbox, license, citation, canonical variables, caveats) is committed at `inventory/providers.yaml`, and the registry is discoverable at runtime (`cfs providers`, `cfs products`).
Live upstreams
CFS is a passthrough service: every fetch hits the provider's live store, so transient upstream outages (a THREDDS restart, an S3 hiccup, CDS queue congestion) can surface as fetch errors independent of CFS itself.
Connectors
| slug | product | grid | access | auth | verified |
|---|---|---|---|---|---|
| `era5_arco` | ECMWF ERA5 (0.25°, hourly) | regular | GCS Zarr | anonymous | live |
| `aorc` | NOAA AORC v1.1 (1 km, hourly) | regular | S3 Zarr | anonymous | live |
| `aorc_nwm` | NOAA AORC v1.1 NWM-projected (1 km) | 2-D LCC | S3 Zarr | anonymous | live |
| `nwm_operational` | NOAA NWM operational forcing (1 km, hourly, real-time) | 2-D LCC | S3 NetCDF | anonymous | live |
| `gfs` | NOAA GFS atmospheric forecast (0.25°, hourly, global) | regular | S3 GRIB2 (byte-range) | anonymous | live |
| `gefs` | NOAA GEFS ensemble forecast (0.25°, 3-hourly, global) | regular | S3 GRIB2 (byte-range) | anonymous | live |
| `chirps` | CHIRPS v2.0 daily precip (0.05°) | regular | HTTP NetCDF | anonymous | live |
| `chirts` | CHIRTS daily temperature (0.05°, global tropics) | regular | HTTP NetCDF | anonymous | live |
| `persiann_cdr` | PERSIANN-CDR daily satellite precip (0.25°, 1983–) | regular | HTTP NetCDF | anonymous | live |
| `rdrs` | RDRS / CaSR v3.2 (Canada, ~10 km, hourly) | 2-D rotated pole | OPeNDAP | anonymous | live |
| `barra2` | BoM BARRA-R2 (Australia, ~12 km, hourly) | regular | NCI THREDDS ncss | anonymous | live |
| `conus404` | CONUS404 (4 km WRF, hourly) | 2-D LCC | OSN Zarr | anonymous | live |
| `hrrr` | NOAA HRRR analysis + forecast (3 km) | 2-D LCC | hrrrzarr S3 | anonymous | live |
| `era5_land` | ECMWF ERA5-Land (0.1°, hourly) | regular | CDS API | CDS | live (creds) |
| `era5_cds` | ECMWF ERA5 reanalysis (0.25°, hourly) | regular | CDS API | CDS | live (creds) |
| `wfde5` | WFDE5 bias-corrected ERA5 forcing (0.5°, hourly) | regular | CDS API | CDS | live (creds) |
| `carra` | Copernicus Arctic Regional Reanalysis (2.5 km) | regular (interp.) | CDS API | CDS | live (creds) |
| `cerra` | Copernicus European Regional Reanalysis (5.5 km) | regular (interp.) | CDS API | CDS | live (creds) |
| `eobs` | E-OBS European gridded observations (0.1°/0.25° daily) | regular | CDS API | CDS | live (creds) |
| `merra2` | NASA MERRA-2 (0.5°×0.625°, hourly) | regular | OPeNDAP | Earthdata | live (creds) |
| `nldas` | NLDAS-2 (0.125°, hourly, CONUS) | regular | OPeNDAP | Earthdata | live (creds) |
| `gpm` | GPM IMERG Daily precip (Final/Early/Late) | regular | OPeNDAP | Earthdata | live (creds) |
| `cmorph` | NOAA CPC CMORPH CDR daily precip (0.25°) | regular | HTTP tar NetCDF | anonymous | live |
| `daymet` | Daymet V4R1 (1 km daily, N. America) | 2-D LCC (x/y) | OPeNDAP | Earthdata | live (creds) |
| `gldas` | NASA GLDAS-2 Noah (0.25°, 3-hourly, global land) | regular | OPeNDAP | Earthdata | live (creds) |
| `fldas` | NASA FLDAS Noah (0.1°, monthly, global/Africa land) | regular | OPeNDAP | Earthdata | live (creds) |
| `nex_gddp` | NEX-GDDP-CMIP6 (0.25° daily projections) | regular | S3 NetCDF | anonymous | live |
| `na_cordex` | NA-CORDEX (0.22°/0.44° daily projections, N. America) | regular | S3 Zarr | anonymous | live |
| `gridmet` | gridMET daily CONUS surface meteorology (~4 km) | regular | OPeNDAP | anonymous | live |
| `nclimgrid_daily` | NOAA nClimGrid-Daily (5 km, CONUS) | regular | OPeNDAP | anonymous | live |
| `narr` | NOAA NARR daily monolevel fields (32 km) | 2-D LCC | OPeNDAP | anonymous | live |
| `mswep` | MSWEP precipitation (0.1°, daily/3-hourly) | regular | rclone / GDrive | Drive access | offline |
| `em_earth` | EM-Earth (0.1° daily, global) | regular | S3 / FRDR HTTPS / local staging | AWS credentials (S3) or anonymous (FRDR) | live |
"live" means a real fetch against the upstream store returned physical values; "live (creds)" the same, using real CDS or Earthdata credentials; "offline" means the connector's path/subsetting/conversion logic is verified against synthetic data but a live fetch is pending access (see the notes below).
Per-provider notes
Rolling-window archives
- `cmorph`: the NOAA CPC daily-tar archive only hosts a rolling recent window (roughly the last couple of months). Historical years are not on that endpoint; a fetch outside the window raises a clear "no tar listed" error.
- `nwm_operational`: reads the real-time NWM forcing from `noaa-nwm-pds` (S3), generating lat/lon from the same 1 km LCC projection as `aorc_nwm`. Only the `analysis_assim` configuration is exposed (its `tm00` analysis maps cleanly to a valid time); the bucket keeps only a rolling ~4-week window, so fetches must target recent dates.
Offline-verified connectors
- `mswep`: distributed only via a GloH2O-shared Google Drive folder, reached through the external `rclone` CLI. The connector's path and conversion logic are verified offline (with a clear setup error otherwise); a real fetch needs `rclone` plus a configured Drive remote with granted access. Exact unblock sequence:
    - register at https://www.gloh2o.org/mswep/ (free for non-commercial use) with the Google account you will use for Drive;
    - wait for the GloH2O confirmation email, which shares the `MSWEP_V*` Drive folders with that account ("Shared with me");
    - install rclone and run `rclone config` → new remote, name `GoogleDrive` (the connector default; override via `MSWEP_RCLONE_REMOTE`), type `drive`, default scopes, browser auth with the same account;
    - verify with `rclone lsd --drive-shared-with-me GoogleDrive:` (the `MSWEP_V316` folder should list);
    - run the prepared validation script (`/tmp/parity-exp/validate_mswep_when_unblocked.sh`) to exercise the doc-fixed paths on both the CFS and SYMFLUENCE sides and record a parity grade.
- `em_earth`: two sources. The default S3 bucket denies anonymous reads (allows listing only), so it needs AWS credentials (`config={"anon": False}`). The FRDR route (`config={"source": "frdr"}`) needs no credentials at all: FRDR's documented stable per-file links (`https://www.frdr-dfdr.ca/repo/files/6/published/publication_542/submitted_data/EM_Earth_v1/…`) 302 to the Globus HTTPS collection (`g-f0a056.cd4fe.0ec8.data.globus.org`), which serves anonymous GETs. This was live-verified 2026-06-12 (an earlier probe of the landing page missed this; the dataset page itself only advertises Globus transfer and the email-gated Zip). The `EM_Earth_v1/` layout mirrors the S3 keys under `nc/` exactly, so the same key construction serves both. Caveats: files are whole-month globals (~100–300 MB per variable-month; set `data_dir` so they are cached and reused), and the FRDR route covers the deterministic daily product only (FRDR's probabilistic/hourly trees are continent- and member-split).

    Pre-staged files (e.g. bulk Globus transfers from collection `515c70c4-2eb8-4f2a-b406-7959b5edc28d`, path `/6/published/publication_542/submitted_data`) are picked up from `config={"data_dir": ...}`, in either the archive-relative layout `<data_dir>/deterministic_raw_daily/<var>/<file>.nc` or flat `<data_dir>/<file>.nc`, before any network access.

    Units are file-verified (2026-06-12, FRDR deterministic daily): `prcp` is `mm day-1` (→ `/86400`), temperatures `Celsius` (→ `+273.15`); the long-standing "precip units unverified" warning is retired. The files also carry `prcp_corrected` (PBCOR WorldClim-corrected); the connector ships raw `prcp`, matching the native SYMFLUENCE handler.

    Validation (exp17, Colorado box, 2018-06 ×14 days): CFS canonical output is bitwise identical to the documented derivations applied to the raw FRDR values (`tmean/tdew + 273.15` exactly; `prcp × (1/86400)` exactly, ≤ 1 float32 ulp vs the `/86400` op order). Native-vs-community parity remains pending AWS credentials: the native acquirer is S3-only (it grew an `EM_EARTH_S3_ANON: false` credentialed path after exp7, but has no FRDR route and no local-staging mode for the daily product), and the bucket 403s anonymous GETs.
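The EM-Earth staging lookup and unit handling described above can be sketched roughly as follows. This is an illustration, not the connector's code: the helper names (`find_staged_file`, `prcp_to_flux`, `celsius_to_kelvin`) are hypothetical, and the lookup order (archive-relative layout first, then flat) is an assumption, since the text only states that both layouts are accepted.

```python
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400.0


def find_staged_file(data_dir: str, var: str, filename: str) -> Optional[Path]:
    """Return a pre-staged file under data_dir, or None if nothing is staged.

    Checks the archive-relative layout, then the flat layout (order assumed).
    A caller would fall back to a network fetch only when this returns None.
    """
    root = Path(data_dir)
    candidates = [
        root / "deterministic_raw_daily" / var / filename,  # archive-relative
        root / filename,                                    # flat
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def prcp_to_flux(prcp_mm_per_day: float) -> float:
    """mm day-1 -> kg m-2 s-1 (1 mm of water over 1 m2 is 1 kg)."""
    return prcp_mm_per_day / SECONDS_PER_DAY


def celsius_to_kelvin(temp_c: float) -> float:
    """Degrees Celsius -> kelvin."""
    return temp_c + 273.15
```

A precipitation value of 8.64 mm day-1, for example, corresponds to a flux of about 1e-4 kg m-2 s-1.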
Forecasts
- `gfs`: global 0.25° GFS surface forcing from the `noaa-gfs-bdp-pds` S3 GRIB2 archive. Reads each file's `.idx` byte-offset index and HTTP byte-range fetches only the surface messages it needs (~MB, not the ~1.5 GB whole file), decoding with `cfgrib` (the `forecast` extra). For a requested range it uses the most recent 00/06/12/18 UTC cycle at/before the start; valid times map to lead hours (1-hourly to f120, 3-hourly to f384). All fields are identity SI (u/v winds, `prate` flux); radiation and precip are interval averages, absent at f000 (analysis). Live-verified (US, T 292–300 K, SW 80–228 W m⁻²).
- `gefs`: the ensemble companion to `gfs`: same `.idx`/byte-range/cfgrib machinery over the GEFS 0.25° select product (`noaa-gefs-pds`), returning a `member` dimension (control `gec00` + perturbations `gep01`–`gep30`, selected via `config={"members": [...]}`; default all 31). 3-hourly leads to f240 (the select product stops there). Instantaneous fields are identity SI. Precip (`APCP`) and radiation (`DSWRF`/`DLWRF`) ship as 6-hour-bucket quantities (`0-3`, `0-6`, `6-9`, …): `APCP` accumulates, radiation averages. They are therefore de-bucketed by lead hour (value-sign reset detection is unsafe when a fresh bucket out-rains the previous one): `cur − prev` for accumulations, `2·cur − prev` for averages, then `APCP` becomes a flux while radiation stays W m⁻². The select product ships RH, not specific humidity, so `q` is derived from 2 m RH + temperature + pressure (Bolton 1980, `cfs.derive.humidity`); RH is consumed as a derivation input, never exposed. Live-verified (non-negative precip, SW ~0 overnight after de-averaging, derived q 9–17 g/kg, member spread visible).
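The GEFS de-bucketing rule can be illustrated with a small sketch. It assumes 3-hourly leads starting at f003, with buckets alternating between a single 3-hour step (lead % 6 == 3, e.g. `0-3`, `6-9`) and a full 6-hour span (lead % 6 == 0, e.g. `0-6`, and by extension `6-12`). The function names are hypothetical and this is not the connector's implementation.

```python
from typing import List

STEP_HOURS = 3  # lead spacing of the GEFS select product, per the text above


def debucket(values: List[float], leads: List[int], kind: str) -> List[float]:
    """Convert 6-hour-bucket values into per-3-hour-step values.

    values[i] is the raw field at lead hour leads[i] (3, 6, 9, ...).
    kind is "accum" (e.g. APCP) or "avg" (e.g. DSWRF/DLWRF).
    """
    if kind not in ("accum", "avg"):
        raise ValueError(f"unknown kind: {kind!r}")
    out = []
    for i, (lead, cur) in enumerate(zip(leads, values)):
        if lead <= 0 or lead % STEP_HOURS != 0:
            raise ValueError(f"unexpected lead hour: {lead}")
        if lead % 6 == 3:
            # Bucket covers exactly one 3-hour step: use the value as-is.
            out.append(cur)
            continue
        # Bucket spans two steps: remove the first half, which is the raw
        # value at the previous lead. The decision is made from the lead
        # hour, not from whether the value dropped.
        if i == 0 or leads[i - 1] != lead - STEP_HOURS:
            raise ValueError(f"missing first-half lead before f{lead:03d}")
        prev = values[i - 1]
        out.append(cur - prev if kind == "accum" else 2 * cur - prev)
    return out


def accumulation_to_flux(step_totals: List[float]) -> List[float]:
    """kg m-2 per 3-hour step -> kg m-2 s-1."""
    return [v / (STEP_HOURS * 3600.0) for v in step_totals]
```

With raw `APCP` values `[1.0, 3.0, 5.0, 5.5]` at leads 3, 6, 9, 12, the per-step totals are `[1.0, 2.0, 5.0, 0.5]`. The value at f009 (5.0) exceeds the one at f006 (3.0) even though a new bucket started, which is the case where detecting resets from a drop in value would fail.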
Climate projections
- `nex_gddp`: NEX-GDDP-CMIP6 downscaled projections. A projection has a model × scenario × ensemble axis: the scenario is the product ID (`nex_gddp:historical`, `nex_gddp:ssp245`, `nex_gddp:ssp585`, …); the model/member are connector config, e.g. `get_connector("nex_gddp")(config={"model": "MPI-ESM1-2-HR", "member": "r1i1p1f1"})` (default `ACCESS-CM2`/`r1i1p1f1`). The choices are recorded in `FetchResult.provenance` and as dataset attrs (`cmip6_model`/`cmip6_scenario`/`cmip6_member`).
- `na_cordex`: North-American CORDEX regional projections (S3 Zarr, NCAR `ncar-na-cordex`): experiments `eval`/`hist-rcp45`/`hist-rcp85`, with grid (`NAM-22i`/`NAM-44i`) and bias-correction (`raw`/`mbcn-Daymet`/`mbcn-gridMET`) as `config` knobs; returns the CORDEX multi-model ensemble (`member_id` dimension). Relative humidity (`hurs`) is not offered because NA-CORDEX lacks the surface pressure needed to derive specific humidity.
CDS connectors
All need `~/.cdsapirc` and the relevant dataset licences accepted on the CDS account.

- `carra`/`cerra`: delivered interpolated to a regular grid via the CDS `grid` parameter (the native grids are projected). Both ship 2 m RH rather than specific humidity, so `specific_humidity` is derived (Bolton 1980). Live-confirmed naming trap: downwelling longwave is `thermal_surface_radiation_downwards` on the CARRA form but `surface_thermal_radiation_downwards` (ERA5-style) on the CERRA form. CDS also silently drops an unknown variable name instead of rejecting the request, so getting this wrong yields files with no longwave (caught and fixed in the 2026-06-12 parity campaign).
- `wfde5`: needs the required CDS `product` token (`wfde5`) and an underscore `version` (`2_1`), confirmed against the live form constraints. Downloads full half-degree monthly NetCDFs (one CDS request per variable; precipitation = `Rainf` + `Snowf`) and subsets locally.
- `eobs`: fills the European observational gap (CFS otherwise has only reanalysis there). Unlike the other CDS connectors, E-OBS has no server-side `area` subset, so the full European domain is downloaded once per variable (large, cached) and subset locally. Exposes only the cleanly convertible fields (`tg` → air_temperature, `rr` → precipitation_flux, `qq` → shortwave, `fg` → wind_speed) and defers `pp` (sea-level, not surface, pressure) and `hu` (relative humidity needs a surface pressure E-OBS lacks). Request tokens: `grid_resolution` `0_1deg`/`0_25deg`, `version` `31_0e`, `period` `full_period`; version override via `config={"version": "30_0e"}`. Live-verified once all E-OBS dataset licences were accepted on the CDS account.
- `era5_cds`: standard ERA5 single levels via the CDS API (a zip of instant + accumulated NetCDFs, merged), as a credentialed alternative to the anonymous `era5_arco`.
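For orientation, deriving specific humidity from RH, temperature and pressure with the Bolton (1980) saturation vapour pressure formula can be sketched as below. The function names are hypothetical, RH is assumed to be in percent, and the actual `cfs.derive.humidity` code may differ in inputs and details.

```python
import math

EPSILON = 0.622  # approximate ratio of molar masses, water vapour / dry air


def saturation_vapor_pressure_pa(temp_k: float) -> float:
    """Bolton (1980) saturation vapour pressure over water, in Pa."""
    temp_c = temp_k - 273.15
    return 611.2 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def specific_humidity(rh_percent: float, temp_k: float, pressure_pa: float) -> float:
    """Specific humidity (kg/kg) from RH (%), temperature (K), pressure (Pa)."""
    # Actual vapour pressure from relative humidity.
    e = (rh_percent / 100.0) * saturation_vapor_pressure_pa(temp_k)
    # q = eps * e / (p - (1 - eps) * e)
    return EPSILON * e / (pressure_pa - (1.0 - EPSILON) * e)
```

At 20 °C, 100 % RH and 1013.25 hPa this gives roughly 14.5 g/kg.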
Earthdata connectors
All need `EARTHDATA_TOKEN` (or `~/.netrc` / `EARTHDATA_USERNAME` + `EARTHDATA_PASSWORD`) and the "NASA GESDISC DATA ARCHIVE" app authorized on the URS profile.

- `nldas`: opens one OPeNDAP endpoint per hour: fine for short windows, slow for long ranges (warned in the `FetchResult`).
- `gldas`: same GES DISC `hydro1` host as `nldas`; all GLDAS-2 Noah forcing fields are already canonical SI (identity mappings). Two products: `gldas:noah025_3h` (GLDAS-2.1, 2000→present) and `gldas:noah025_3h_v20` (GLDAS-2.0, 1948–2014). Wind ships as a scalar speed only (no u/v), so it maps to `wind_speed`; opens one endpoint per 3-hour stamp (8/day), so long ranges are slow (warned in the `FetchResult`).
- `fldas`: reuses the GLDAS Earthdata/DAP4 path, all-identity SI, but the global product is monthly: good for climatology or seasonal forcing over Africa and the global land surface, but too coarse for event-scale hydrology. Wind is a scalar speed; land-only (ocean is fill).
- `gpm`: IMERG Daily; adds the Early and Late near-real-time runs alongside Final.
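To get a feel for why long ranges are slow on the per-stamp connectors, the number of endpoints opened for a request can be estimated with a short sketch. The helper name `request_stamps` and the half-open interval convention are assumptions made for illustration, not the connectors' behaviour.

```python
from datetime import datetime, timedelta
from typing import List


def request_stamps(start: datetime, end: datetime, step_hours: int) -> List[datetime]:
    """List the time stamps in [start, end) at a fixed hourly step.

    Each stamp corresponds to one remote endpoint that would be opened.
    """
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    step = timedelta(hours=step_hours)
    stamps = []
    t = start
    while t < end:
        stamps.append(t)
        t += step
    return stamps


# One day: 24 hourly stamps (NLDAS-style) versus 8 three-hourly stamps (GLDAS-style).
day_start, day_end = datetime(2020, 1, 1), datetime(2020, 1, 2)
hourly = request_stamps(day_start, day_end, step_hours=1)
three_hourly = request_stamps(day_start, day_end, step_hours=3)
```

A full year at hourly resolution is therefore on the order of 8,760 endpoint opens, which is why short windows are recommended.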
Other notes
- `barra2`: uses the anonymous NCI THREDDS NetcdfSubset service: the server does the bbox+time subset and returns a clean NetCDF, avoiding the OPeNDAP DAP2 truncation NCI's server exhibits under concurrent reads. All fields are CORDEX/CMIP CF names already in SI (identity mappings, including `pr` flux and `huss`); no dewpoint is published. Instantaneous fields are stamped on the hour and hourly means (`pr`/`rsds`/`rlds`) at the half-hour midpoint, so times are floored to the hour to share one axis. The grid is regular lat/lon on 0–360 longitudes (requests are normalized).
- `hrrr`: adds an `sfc_fcst` product (1-hour forecast) that provides precipitation flux, which the analysis product lacks.
- `aorc_nwm`: AORC v1.1 on the NWM v3.0 1 km LCC grid (S3 Zarr); lat/lon are generated from the LCC projection parameters.
- `narr`: includes `dswrf`/`dlwrf` (down short/longwave) radiation, live-verified against NOAA PSL. Carries occasional tiny-negative precipitation from the source fields (flagged by the advisory range QC).
- `chirts`: the temperature companion to `chirps` (UCSB CHC, global tropics, 0.05° daily, 1983–2016): `Tmax`/`Tmin` (°C) → `air_temperature` = mean + 273.15, read via HTTP byte-range from per-year chunked NetCDFs.
- `persiann_cdr`: global daily satellite-precip CDR (NCEI, 0.25°, 1983→present with multi-week latency). Per-day files carry an unpredictable creation-date suffix, so the connector resolves filenames from each year's directory index.
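The two BARRA-R2 normalizations mentioned above (flooring half-hour stamps to the hour, and mapping request longitudes onto the 0–360 grid) can be sketched as follows. The function names are hypothetical, and the assumption that requests arrive on a −180..180 convention is mine rather than the connector's documented behaviour.

```python
from datetime import datetime


def floor_to_hour(t: datetime) -> datetime:
    """Floor a time stamp to the start of its hour (e.g. 03:30 -> 03:00)."""
    return t.replace(minute=0, second=0, microsecond=0)


def normalize_lon_0_360(lon_deg: float) -> float:
    """Map a longitude in degrees onto the [0, 360) convention."""
    return lon_deg % 360.0


# An hourly-mean field stamped at the half-hour midpoint lands on the same
# axis as an instantaneous field stamped on the hour.
mean_stamp = floor_to_hour(datetime(2020, 1, 1, 3, 30))
instant_stamp = floor_to_hour(datetime(2020, 1, 1, 3, 0))
```

A western bound of −150° becomes 210° after normalization. A bbox that straddles the prime meridian maps to a wrapped range (west > east) and would need separate handling, which this sketch does not attempt.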
Adding a provider
Want a product that isn't here? Open a connector request, or implement it yourself — see the contributing guide for the connector pattern.