THALWAG — FOR RESEARCHERS

Data that exists
nowhere else.

The Indian Ocean is one of the most climatically significant and least-observed ocean basins on Earth. THALWAG is equipping India’s commercial fishing fleet to change that — generating observation densities in the Arabian Sea and Bay of Bengal that no single research programme approaches. This page describes what the data are, what we make open, and how to work with us.

200 000 vessel fleet
active India coastline

< 1/5 survey density
vs North Atlantic (WOD)

CC BY 4.0 all observation data
released open

01 — THE DATA

Why no single institution
can gather this.

THALWAG’s observational advantage is structural, not incremental. Dense daily coverage of the Arabian Sea and Bay of Bengal at subsurface depth — from a network of thousands of active vessels — is impossible to replicate with research cruises, Argo floats, or any current mooring programme in the Indian Ocean basin. The fleet is latent infrastructure that simply has not been used until now.

T / S / O₂ profiles

Temperature, salinity, and dissolved oxygen at depth, transmitted daily from solid-state sensors on participating vessels. Each reading carries a quality flag and field-calibration correction. Data are published at transmission time with a permanent Zenodo DOI per quarterly release.

Parameters: T (°C), S (psu), O₂ (µmol kg⁻¹)
Depth: 0–200 m (CTD-profiling; vessel-dependent)
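
As a rough illustration of how a reading and its quality flag might be handled downstream, the sketch below models one record and filters a profile on the flag. The field names, the flag values (1 for good, 4 for bad, loosely following common oceanographic QC conventions), and the sample numbers are assumptions for illustration, not the published THALWAG schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical flag values; the actual THALWAG flag scheme is not specified here.
QC_GOOD = 1
QC_BAD = 4


@dataclass(frozen=True)
class Reading:
    depth_m: float         # depth of the reading, metres
    temperature_c: float   # T, degrees Celsius
    salinity_psu: float    # S, psu
    oxygen_umol_kg: float  # O2, micromoles per kilogram
    qc_flag: int           # quality flag attached to the reading


def good_readings(profile: List[Reading]) -> List[Reading]:
    """Keep readings flagged good, ordered from shallow to deep."""
    kept = [r for r in profile if r.qc_flag == QC_GOOD]
    return sorted(kept, key=lambda r: r.depth_m)


# Made-up sample values, for illustration only.
profile = [
    Reading(50.0, 26.1, 35.4, 180.0, QC_GOOD),
    Reading(10.0, 28.7, 35.1, 205.0, QC_GOOD),
    Reading(120.0, 19.3, 35.6, 40.0, QC_BAD),
]
clean = good_readings(profile)
print([r.depth_m for r in clean])
```

Filtering on the flag before any analysis keeps rejected readings out of derived quantities while leaving the raw record untouched.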

Fleet-scale spatial density

At minimum operational density (~2 000 active vessels), the network generates observation densities in the Arabian Sea and Bay of Bengal exceeding Argo by a factor of roughly 8–12 per unit area. At full scale the comparison is not meaningful: no analogous programme exists in the basin.

Estimate derived from OSSE framework (George, 2026).

Coverage: Arabian Sea, Bay of Bengal, Lakshadweep Sea
Cadence: daily transmission per vessel

Subsurface, not skin

Satellites measure skin-layer temperature only. THALWAG sensors profile the water column to depth from moving vessels. Thermal structure below the mixed layer (where fish aggregate, where hypoxic zones form, where monsoon heat exchange occurs) is directly observed rather than inferred from poorly constrained model initialisation.

Sensor type: solid-state; field-validated against Argo matchups
Depth resolution: variable; see calibration log per dataset
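
A matchup against Argo typically means pairing a vessel cast with a float profile that is close in both space and time. The sketch below shows one plausible way to do that with a great-circle distance and a time window. The 50 km and 24 h thresholds, the tuple layout, and the function names are assumptions for illustration; the criteria actually used are documented in the calibration log of each dataset.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0

# (latitude in degrees, longitude in degrees, time in hours since an arbitrary epoch)
Cast = Tuple[float, float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_matchup(vessel: Cast, floats: List[Cast],
                    max_km: float = 50.0, max_hours: float = 24.0) -> Optional[Cast]:
    """Return the closest float profile inside both windows, or None."""
    best, best_km = None, max_km
    for f in floats:
        if abs(f[2] - vessel[2]) > max_hours:
            continue  # outside the time window
        d = haversine_km(vessel[0], vessel[1], f[0], f[1])
        if d <= best_km:
            best, best_km = f, d
    return best


# Made-up positions and times, for illustration only.
vessel_cast = (15.0, 68.0, 100.0)
argo_profiles = [(15.2, 68.1, 110.0), (15.0, 68.0, 200.0), (18.0, 70.0, 100.0)]
match = nearest_matchup(vessel_cast, argo_profiles)
print(match)
```

In this example the second float sits at the same position but 100 hours away, so it is rejected on time; the third is too far away; the first is accepted.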

Environmental DNA (eDNA)

Water-sample collection for metabarcoding of marine biodiversity is integrated into the sensor deployment protocol for participating vessels. Fleet-scale eDNA sampling across the northern Indian Ocean — with matching physical oceanographic context per sample — is a dataset that does not exist anywhere else in the literature at this spatial scale.

Protocol: seawater filtration; 0.22 µm membrane; cold-chain to shore
Taxonomic scope: prokaryotes to fish; 16S, 18S, 12S amplicons

Passive acoustics

Hydrophone arrays on selected vessels record ambient ocean sound continuously during passage. The data support cetacean presence-absence monitoring, snapping-shrimp population density as a proxy for reef health, and anthropogenic noise characterisation across fishing-ground corridors in the Arabian Sea and Bay of Bengal.

Sample rate: 96 kHz (16-bit PCM) per deployment class
Archive: raw audio + derived detections; researcher access on request
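
For planning storage or transfer of the raw audio, the stated sample rate and bit depth fix the data volume. The short calculation below works it through for uncompressed PCM. The channel count per hydrophone array is not stated above, so a single channel is assumed and exposed as a parameter.

```python
SAMPLE_RATE_HZ = 96_000   # from the stated sample rate
BITS_PER_SAMPLE = 16      # from the stated 16-bit PCM


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at this sample rate."""
    return sample_rate_hz / 2


def raw_bytes(duration_s: int, channels: int = 1) -> int:
    """Uncompressed PCM size; channels=1 is an assumption, not a THALWAG spec."""
    bytes_per_sample = BITS_PER_SAMPLE // 8
    return SAMPLE_RATE_HZ * bytes_per_sample * channels * duration_s


per_hour = raw_bytes(3600)
per_day = raw_bytes(86_400)
print(nyquist_hz(SAMPLE_RATE_HZ))  # 48000.0
print(per_hour)                    # 691200000
print(per_day)                     # 16588800000
```

That is roughly 0.69 GB per channel-hour and 16.6 GB per channel-day before compression, with usable bandwidth up to 48 kHz.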

Indian Ocean specificity

The World Ocean Database shows the Indian Ocean has received approximately one-fifth the hydrographic survey attention of the North Atlantic over the past half-century, despite comparable area (WOD, NODC/NCEI, 2023). THALWAG data are not supplements to a well-observed basin; they address a structural gap in global ocean observation. The network operates where no comparable permanent programme exists.

Primary geography: 5°S–25°N, 55°E–100°E
Seasonal coverage: year-round, including SW and NE monsoon transitions
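
To give a sense of scale, the bounding box of the primary geography can be converted to an area, and the minimum operational fleet to a daily observation density. The sketch below does this on a spherical Earth. It assumes one profile per vessel per day and an even spread of vessels, and it ignores the land inside the box, so the result is a rough lower bound on density rather than a THALWAG figure.

```python
import math

EARTH_RADIUS_KM = 6371.0


def box_area_km2(lat_s: float, lat_n: float, lon_w: float, lon_e: float) -> float:
    """Area of a latitude-longitude box on a sphere."""
    dlon = math.radians(lon_e - lon_w)
    band = math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s))
    return EARTH_RADIUS_KM ** 2 * dlon * band


# Primary geography as stated: 5°S–25°N, 55°E–100°E.
area_km2 = box_area_km2(-5.0, 25.0, 55.0, 100.0)

vessels = 2_000                # minimum operational density stated above
profiles_per_day = vessels     # assumption: one profile per vessel per day

# Profiles per day per 10 000 km² (a cell of about 100 km by 100 km).
density = profiles_per_day / area_km2 * 1e4
print(round(area_km2 / 1e6, 2), "million km²")
print(round(density, 2), "profiles per day per 10 000 km²")
```

Because fishing effort concentrates on shelves and known grounds, the real distribution will be far from uniform, with much higher densities in some cells and gaps in others.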

02 — OPEN SCIENCE

The data are open.
So is the model.

All observation data are released under CC BY 4.0 as a permanent public archive. The founding methods paper is on EarthArXiv as a preprint. Co-authorship credit is offered to data contributors by default. The state estimation model produces a living, continuously updated ocean analysis — and maintains a full reanalysis archive — that researchers can use as a tool rather than a product.

Co-authorship by design

THALWAG treats data contributors (fishing crews, vessel operators, community cooperatives) as scientific partners, not platforms. Named co-authorship credit in published research is offered to all contributing vessel operators who opt in. For academic collaborators using THALWAG data, the citation format specifies the contributing network rather than anonymising the source. This is not a courtesy; it is a design requirement for a network that must be sustained by its participants.

Format: contributor credit follows the CRediT taxonomy where applicable.
Opt-in: vessel operator controls attribution level at enrolment.

A living model, not a snapshot

The THALWAG state estimate is not a periodic product; it is a continuously running ocean analysis for the northern Indian Ocean, initialised from CMEMS global analyses and locally refined by fleet observations through variational data assimilation. It produces daily gridded fields of temperature, salinity, and dissolved oxygen with uncertainty estimates. When new observations contradict its predictions, it corrects itself. The model is designed to be falsifiable and to improve with time, because it must be.

Grid: 0.1° × 0.1°; depth levels: 5–500 m standard
Output: NetCDF, CF-compliant; released quarterly with permanent DOI
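
The way an observation corrects a background field can be seen in the simplest possible case: one grid point, one variable, one observation. The sketch below is a textbook scalar variational update, not the THALWAG assimilation scheme, and all numbers are made up. It minimises a cost that penalises departure from the background and from the observation, each weighted by its error variance.

```python
def cost(x: float, background: float, sigma_b: float, obs: float, sigma_o: float) -> float:
    """Variational cost: squared misfits scaled by their error standard deviations."""
    return ((x - background) / sigma_b) ** 2 + ((obs - x) / sigma_o) ** 2


def analysis_update(background: float, sigma_b: float, obs: float, sigma_o: float):
    """Closed-form minimiser of the scalar cost and its error standard deviation."""
    gain = sigma_b ** 2 / (sigma_b ** 2 + sigma_o ** 2)
    analysis = background + gain * (obs - background)
    sigma_a = ((1.0 - gain) * sigma_b ** 2) ** 0.5
    return analysis, sigma_a


# Illustrative values: background 28.0 °C (error 1.0), observation 27.0 °C (error 0.5).
x_a, sigma_a = analysis_update(28.0, 1.0, 27.0, 0.5)
print(round(x_a, 3), round(sigma_a, 3))
```

The analysis lands closer to whichever input has the smaller error, and its uncertainty is lower than that of either input. This is the sense in which denser observations tighten the uncertainty estimates attached to the gridded fields.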

Reanalysis archive as a research tool

Every version of the state estimate is archived permanently. As the network grows and observation density increases, historical periods can be reanalysed with denser data, so the archive captures the full trajectory of model improvement. Researchers studying decadal trends, monsoon variability, or heat content change in the Arabian Sea have a continuously improving historical record rather than a static dataset. The archive is accessible to academic collaborators before the public API is live; contact us directly.

Reanalysis methodology: George, 2026 (EarthArXiv preprint).

Archive format: Zarr (cloud-native) + NetCDF mirror on Zenodo
Versioning: semantic; each reanalysis run has a unique DOI
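
When several reanalysis runs are available, selecting the most recent one by version tag needs a numeric comparison rather than a string comparison. The sketch below assumes plain MAJOR.MINOR.PATCH tags; the tag values are hypothetical and the actual tag format of the archive may differ.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH tag into integers for numeric ordering."""
    major, minor, patch = (int(part) for part in tag.split("."))
    return (major, minor, patch)


# Hypothetical run tags, for illustration only.
runs = ["1.2.0", "1.10.0", "1.9.3"]

latest = max(runs, key=parse_version)   # numeric ordering
naive = max(runs)                       # string ordering, shown for contrast
print(latest, naive)
```

Pinning an analysis to a specific version and its DOI, rather than to "latest", keeps results reproducible as later reanalysis runs are added.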

DATA AVAILABILITY STATUS

Pilot observation datasets: Zenodo; permanent DOI; available now

Methods preprint: EarthArXiv; DOI pending submission

OSSE validation code: GitHub; public repository

Public observation API: in development; 2026 Q4 target; researcher priority access on request

Quarterly gridded model output: planned; first release at ~2 000 active vessels

eDNA metabarcoding archive: planned; protocol in trial; NCBI BioProject on submission

03 — HOW TO COLLABORATE

Three ways to work
with us.

We are open to collaboration on three levels: scientific advisory engagement, joint research projects with shared data and co-authorship, and data access arrangements for groups that need material before the public API is live. All three are arranged directly, without a form.

Scientific advisory panel

THALWAG maintains a small scientific advisory panel of working oceanographers, biogeochemists, and climate scientists. Advisory relationships are substantive: members review the observing system design, evaluate the data assimilation methodology, and receive early access to model output and pilot datasets. In exchange, advisory panellists are credited in the methods paper and in subsequent publications that draw on their input. This is not a letterhead arrangement.

→ Write to derin@thalwag.com with your institution and research focus. Advisory capacity is limited; currently open.

Joint research projects

We are interested in co-designed research projects with institutions working on Indian Ocean dynamics, monsoon predictability, marine biogeochemistry, fisheries ecology, and underwater bioacoustics. Joint projects receive priority access to pre-release datasets, dedicated data streams (e.g. eDNA or acoustic subsets), and co-authorship credit on all publications drawing on THALWAG network data. THALWAG is not a data vendor; we are a research project with an operational arm.

→ Describe the research question and what data would be needed. Joint project agreements are documented publicly once active.

Data access

Pilot datasets are available on Zenodo now. The full observation API is in development (2026 Q4 target). Researchers who need data before the API is live (for model validation, assimilation experiments, biodiversity baselines, or acoustic noise mapping) can request direct access. Access arrangements are considered individually, documented publicly, and do not require an institutional affiliation or a grant reference.

→ State your institution (if any), the research question, and the specific parameters and region you need. Turnaround is typically < 5 business days.

WRITE DIRECTLY

All collaboration enquiries go to one person. Include your institution, your research question, and what you need from us. Arrangements for early data access, advisory membership, or joint project discussions are considered individually and documented publicly once agreed.

derin@thalwag.com
