THALWAG — OPEN SCIENCE
Open by principle.
This page describes what THALWAG makes publicly available, what it does not, and why both decisions were made deliberately. It is not a press release. It is a commitment with specifics.
AT A GLANCE
Open — CC BY 4.0
- Vessel observation data (coordinates, timestamp, T, S, O2, depth)
- Science methods paper (EarthArXiv preprint)
- Data schema and API specification
- Model validation code and OSSE results
- Uncertainty estimates (bundled with every data release)
- Data ingestion pipeline (open source, MIT)
- Quarterly gridded model output (after QC)
Protected — see below
- Core state estimation engine (production model)
- Commercial forecast products for named clients
- Individual vessel identities and precise routes (by default)
- Partner sensor calibration coefficients (partner-owned IP)
01 — WHAT IS OPEN
What THALWAG makes open.
All observation data collected through the THALWAG network is published in full — coordinates, timestamps, sensor readings, and uncertainty estimates — under CC BY 4.0. The methods paper is on EarthArXiv. The data schema, API specification, ingestion pipeline code, and model validation results are all public. No observation leaves our systems before becoming a permanent public record.
The CC BY 4.0 license is the broadest attribution-only open license in standard use. It permits any reuse — commercial or non-commercial, modified or unmodified — provided the source is acknowledged. We use it rather than a more restrictive license deliberately: science built on THALWAG data should not be limited in where it can be published or how it can be used.
Uncertainty estimates are not optional additions. Every data record includes a quality flag and, where computable, a per-reading uncertainty estimate. An observation without stated uncertainty is a claim without qualification. We treat uncertainty estimates as part of the data, not metadata appended to it.
Observation data
Temperature, salinity, dissolved oxygen, and depth readings from all participating vessels, with coordinate pairs, timestamps, vessel-class ID, and quality flags. Published at transmission time after automated QC, with a permanent Zenodo DOI per quarterly release and a continuously updated archive.
Status: pilot datasets available — full API in development
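To make the shape of a published record concrete, here is a minimal sketch of one observation with its quality flag and per-reading uncertainties carried as part of the record. The field names, units, and flag convention are assumptions for illustration only; the published data schema is the authority.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One public observation record. All field names here are hypothetical."""

    lat: float            # degrees north
    lon: float            # degrees east
    time: datetime        # timezone-aware timestamp
    depth_m: float        # metres below the surface
    vessel_class: str     # vessel-class code, never an individual vessel ID
    quality_flag: int     # assumed convention: 1 = good ... 4 = bad
    temperature_c: Optional[float] = None
    temperature_unc: Optional[float] = None  # None where not computable
    salinity_psu: Optional[float] = None
    salinity_unc: Optional[float] = None
    oxygen_umol_kg: Optional[float] = None
    oxygen_unc: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject records that could not be placed in space or time.
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")
        if self.depth_m < 0:
            raise ValueError(f"depth must be non-negative: {self.depth_m}")
        if self.time.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        # An uncertainty, when present, is a non-negative magnitude.
        for name in ("temperature_unc", "salinity_unc", "oxygen_unc"):
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must be non-negative: {value}")


# Example record with a temperature reading and its uncertainty.
example = Observation(
    lat=15.2,
    lon=68.4,
    time=datetime(2026, 1, 1, 6, 30, tzinfo=timezone.utc),
    depth_m=10.0,
    vessel_class="CLASS-B",
    quality_flag=1,
    temperature_c=27.3,
    temperature_unc=0.05,
)
```

Keeping each uncertainty next to its reading, rather than in a separate metadata file, means a record cannot be copied or filtered without its qualification travelling with it.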
Methods and science
The founding thesis is on EarthArXiv as a preprint. The data assimilation methodology, sensor specifications, calibration procedures, and observing system simulation experiments are described in full and reproducible from the public code repository.
Status: preprint submitted — DOI pending
Model validation output
The OSSE results used to evaluate the network design, including cases where the simulated system underperforms expectations. Negative results are published with the same commitment as positive ones.
Status: published alongside the thesis preprint
Model gridded output
Quarterly releases of the gridded ocean state estimate for the Arabian Sea and Bay of Bengal: temperature, salinity, and dissolved oxygen fields on a regular grid, with depth levels and uncertainty fields included.
Status: planned — first release when network reaches minimum density
02 — WHY OPENNESS
Openness is not generosity. It is design.
Open data builds the kind of trust that makes THALWAG worth using. Oceanographers who can interrogate our methods become collaborators. Governments that can inspect our observations become partners. Fishers who see exactly what happens to their data continue to participate. Openness is not a policy choice; it is what makes the system function.
The ocean has a replication problem. A significant portion of oceanographic knowledge rests on model output that cannot be independently reproduced because the code, the data, and the calibration choices were never made public. THALWAG was designed from the start to be a system that anyone can interrogate. If our observations are wrong, we want to know. If our model is biased in a particular region, we want a researcher in Chennai or Cape Town to be able to demonstrate it — not to discover it in a journal review two years later.
The historical precedent here is instructive. The Argo programme's open-data policy, under which all float data are freely available within 24 hours of transmission, is widely credited with transforming global ocean observation science. Before Argo, most hydrographic data sat in national archives with varying access restrictions. After Argo, the number of peer-reviewed studies using deep-ocean temperature data rose by an order of magnitude (Roemmich et al., 2019, Frontiers in Marine Science). The lesson is not subtle: open data produces more science.
For THALWAG specifically, openness is also how we establish ourselves as a reference rather than a vendor. A proprietary ocean model cannot become a scientific standard — it can only be a product. A model whose methods are published, whose data are open, and whose validation is reproducible can become the basis on which other scientists build, the reference against which other products are compared, the record that policy rests on. That is what we are trying to build.
03 — WHAT IS PROTECTED
What we protect, and why.
The core state estimation engine, commercial forecast products, vessel identities, and partner sensor calibration data are not open. These protections are not contradictions of the open-data commitment; they are what allows THALWAG to sustain the observation network that produces open data. Each protection has a specific rationale, stated here plainly.
Core state estimation engine
The assembled, tuned, production model — the specific software that ingests new observations, runs the assimilation cycle, and produces the state estimate — is proprietary. The methods used to build it are fully described in the public paper; the implementation is not released.
Why: Maintaining the observation network requires sustained engineering effort and infrastructure. The model engine is how THALWAG generates the revenue that pays for that effort. Releasing it would not materially advance open science, because the open data and the open methods paper are what researchers need. It would, however, remove THALWAG's ability to sustain the network that generates the open data in the first place.
Commercial forecast products
Tailored ocean forecasts, decision-support dashboards, and bespoke analysis products sold to specific clients are proprietary. They are derived from model output that itself becomes open after the quarterly release cycle.
Why: Commercial clients pay for customisation, presentation, and support, not for exclusive access to underlying data, which is open. The distinction matters: THALWAG does not sell access to data. It sells products built on data that is itself freely available.
Individual vessel identities and routes
Participating vessels are identified in the public record by a vessel-class code, not by individual ID or name. The precise route taken is not published; only the observation point coordinates are. This is the default. Operators who wish to receive named attribution for their contributions to published research may opt in.
Why: Fishing routes are commercially sensitive. A fisher's route to a productive ground is intellectual property in the most direct sense. Publishing it would harm the people on whose cooperation the network depends. This protection was a design requirement, not a legal precaution.
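The default described above can be sketched as a single redaction step applied before publication. This is an illustration under assumed field names (`vessel_id`, `vessel_name`, `route_id`, `attribution`), not the actual ingestion pipeline.

```python
from typing import Any, Dict

# Hypothetical internal fields that must never reach the public record.
PRIVATE_FIELDS = frozenset({"vessel_id", "vessel_name", "route_id"})


def to_public_record(internal: Dict[str, Any], opted_in: bool = False) -> Dict[str, Any]:
    """Return a copy of an internal record that is safe to publish.

    Identifying fields are dropped by default. If the operator has opted in
    to named attribution, only the vessel name is carried over.
    """
    public = {k: v for k, v in internal.items() if k not in PRIVATE_FIELDS}
    if opted_in and "vessel_name" in internal:
        public["attribution"] = internal["vessel_name"]
    return public


internal_record = {
    "vessel_id": "IN-0042",        # made-up identifier
    "vessel_name": "Example Vessel",
    "route_id": "r-981",
    "vessel_class": "CLASS-B",
    "lat": 15.2,
    "lon": 68.4,
    "temperature_c": 27.3,
}

default_public = to_public_record(internal_record)
attributed_public = to_public_record(internal_record, opted_in=True)
```

Writing the rule as a deny-list applied at a single point makes the default easy to audit: a field either appears in `PRIVATE_FIELDS` or it is published.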
Partner sensor calibration data
Calibration coefficients and factory characterisation data for sensors supplied by manufacturing partners are owned by those partners and are not THALWAG's to release. What is published is the field calibration methodology — how THALWAG validates and corrects sensor output in deployment — which is fully documented.
Why: These are not THALWAG's data. We do not own them; we cannot release them. The open field calibration methodology is sufficient for independent validation of our data quality.
04 — RESEARCHER ACCESS
How researchers can engage.
Pilot observation datasets and validation code are available now via Zenodo. A public REST API for observation data is in development, with priority access for academic collaborators. If you are working on Indian Ocean oceanography, climate modelling, or fisheries science and would like access before the API is live, contact us directly. Access can be arranged.
Pilot datasets on Zenodo
Initial observation data from the pilot sensor deployments is archived on Zenodo with a permanent DOI. It includes raw and QC-flagged records, the field calibration log, and the deployment metadata. Cite it as you would any dataset.
See citation format →
Validation code (GitHub)
The OSSE validation code, the data ingestion pipeline, and the schema definitions are in a public repository. They are documented and runnable. If you find errors, open an issue. If you improve them, open a pull request.
Request repository link →
Public observation API
A REST API returning live and historical observation records, filterable by bounding box, time range, depth, and parameter. Authenticated access for high-volume or priority users. Rate-limited anonymous access for exploratory use.
Target: 2026 Q4 — early access for research collaborators on request
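Since the API is still in development, the following is only a sketch of how a filtered request might be assembled on the client side. The base URL and every query parameter name (`bbox`, `start`, `end`, `max_depth_m`, `parameter`) are placeholders invented for this example; the published API specification defines the real ones.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode

# Placeholder endpoint, not a real THALWAG URL.
BASE_URL = "https://api.example.org/v1/observations"


def build_query(
    bbox: Tuple[float, float, float, float],
    start: datetime,
    end: datetime,
    max_depth_m: Optional[float] = None,
    parameters: Sequence[str] = (),
) -> str:
    """Build a request URL filtered by bounding box, time range, depth, and parameter.

    bbox is (west, south, east, north) in degrees. Boxes that cross the
    antimeridian are not handled in this sketch.
    """
    west, south, east, north = bbox
    if west > east or south > north:
        raise ValueError("bbox must be (west, south, east, north)")
    if start >= end:
        raise ValueError("start must be earlier than end")

    params = [
        ("bbox", ",".join(f"{v:g}" for v in bbox)),
        ("start", start.isoformat()),
        ("end", end.isoformat()),
    ]
    if max_depth_m is not None:
        params.append(("max_depth_m", f"{max_depth_m:g}"))
    # One repeated key per requested parameter, e.g. parameter=T&parameter=S.
    params.extend(("parameter", p) for p in parameters)
    return f"{BASE_URL}?{urlencode(params)}"


# Example: temperature and salinity in part of the Arabian Sea, upper 200 m.
url = build_query(
    bbox=(60.0, 5.0, 80.0, 25.0),
    start=datetime(2026, 1, 1, tzinfo=timezone.utc),
    end=datetime(2026, 2, 1, tzinfo=timezone.utc),
    max_depth_m=200.0,
    parameters=("T", "S"),
)
```

Validating the bounding box and time range before sending the request keeps malformed queries from counting against an anonymous rate limit.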
Quarterly gridded model output
Gridded temperature, salinity, and oxygen fields for the northern Indian Ocean at standard depth levels, in CF-compliant NetCDF format. Released quarterly with a permanent DOI. Uncertainty fields included.
First release: when network reaches minimum operational density (~2 000 vessels)