Schema Evolution and Versioning
Static analysis of schema evolution in JSON Schema is something we are actively working on, and it is a genuinely complex problem.
Other schema formats like Apache Avro and Protocol Buffers have solved compatibility detection reasonably well because their type systems are constrained and predictable. JSON Schema is a different beast: its expressiveness is its strength, but it also makes automated analysis significantly harder to get right beyond trivial cases.
This page covers what is available today, practical guidance on versioning, and the roadmap for where automated analysis is heading.
Versioning strategy is an organisational decision, not a tooling decision
Sourcemeta One deliberately does not impose a versioning scheme. This is an architectural decision, not a gap, as versioning strategies across the industry are genuinely disparate:
- ISO standards combine a standard number with a publication year: `ISO 4217:2015`, `ISO 8601:2004`. The year signals the edition without implying any compatibility relationship with prior editions
- The JSON Schema specification uses date-based identifiers: `2020-12`, `2019-09`. Unambiguous, but carries no compatibility signal
- Semantic Versioning (`1.2.3`) encodes a compatibility signal in the version number itself
- SchemaVer adapts SemVer for schemas with a `MODEL-REVISION-ADDITION` structure, replacing major/minor/patch with terms that map to data compatibility
- Monotonic integers (`v1`, `v2`, `v3`) make no claim about the nature of the change. Simple and widely used in internal API governance
A registry that forces one scheme creates friction for organisations already using another. For this reason, Sourcemeta One treats version identifiers as opaque path segments and accommodates any convention.
Why SemVer is harder to apply to schemas than it looks
Semantic Versioning is tempting because it is familiar and encodes useful compatibility signals. But applying it to JSON Schema involves decisions the ecosystem has not yet settled. For example:
**Annotations.** Keywords like `default` produce annotations rather than constraints. Consumers using schemas for documentation generation or UI rendering may depend on them. Is changing a default value a breaking change? SchemaVer may say patch. A consumer whose UI breaks may disagree.
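To make the annotation problem concrete, consider a minimal, hypothetical schema where the only change between two revisions is the `default` value:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "currency": {
      "type": "string",
      "default": "USD"
    }
  }
}
```

Changing `"default": "USD"` to `"default": "EUR"` accepts and rejects exactly the same instances, yet a form generator that pre-fills the field from `default` now behaves differently for every consumer.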
**Anchors and definitions.** If another schema references a named anchor or internal definition directly and you rename it during a refactor, you have broken a consumer without touching any validation behaviour.
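For illustration, suppose a consumer schema references a producer's named anchor directly (the URI and anchor name here are hypothetical):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "https://schemas.example.com/customer/v1.json#address"
}
```

If the producer renames `"$anchor": "address"` to `"$anchor": "postal-address"` during a refactor, this reference no longer resolves, even though the producer's validation behaviour is completely unchanged.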
SchemaVer addresses some of this with MODEL/REVISION/ADDITION semantics. But even SchemaVer acknowledges grey areas requiring author judgement.
The fundamental challenge is that JSON Schema is a constraint system, not a modelling language. In most programming languages, adding something is backwards compatible. In JSON Schema, adding more typically means permitting less. What looks like an addition is often a restriction, and software versioning intuitions become actively misleading.
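As a sketch of this inversion, take a hypothetical string schema that originally declared only `"type": "string"` and then gains a single keyword:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "string",
  "maxLength": 64
}
```

Adding `maxLength` looks like an addition, but every string longer than 64 characters that validated before is now rejected. The change is a restriction, and under library versioning intuitions it would warrant a major release, not a minor one.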
Version as a dedicated path component
A JSON Schema is uniquely identified by its URI. Two versions cannot share the same URI, so a dedicated path component is the clearest convention: `/customers/customer/v1.json`, `/customers/customer/v2.json`. It makes versions visible, navigable, and directly addressable, with both versions served simultaneously and unambiguously.
In practice, we found that a simple monotonic major version is more than enough for most organisations. A single integer, incremented when a breaking change is needed, sidesteps SemVer ambiguity and gives you all the versioning room you need while keeping it simple.
Schema-level versus data model-level versioning
Should you version each schema individually, or version an entire collection together as a data model? For example:
- Per-schema versioning: `/project/customer/v1.json`, `/project/invoice/v1.json`, `/project/customer/v2.json`. Each schema evolves independently
- Data model versioning: `/project/v1/customer.json`, `/project/v1/invoice.json`, `/project/v2/customer.json`. The entire namespace is versioned together
Data model versioning makes sense when schemas are tightly coupled and genuinely always evolve together. For example, a wire protocol or standards specification. However, for most API governance scenarios, it creates a practical problem. When one schema out of twenty changes, the other nineteen are copied unchanged into the new namespace. Consumers face multiple identical copies with no way to know what actually changed, and diffs become noise.
For this reason, we suggest per-schema versioning: version only what changes, only when it changes.
Guaranteeing compatibility today: testing
Until automated analysis lands, compatibility is guaranteed the same way it is for software libraries versioned and published throughout the industry: by testing. The npm registry does not validate that your `2.0.0` is genuinely incompatible with `1.x`. The guarantee comes from the author's discipline, their test suite, and their changelog. If we treat schemas like code, this should feel familiar: the same model that governs every package you depend on governs your schemas here.
The `jsonschema test` command is the unit test runner for schemas. A compatibility test suite asserts two things: instances that must remain valid under the new version, and instances that must remain invalid. Running this against both versions tells you whether a change is backwards compatible.

```sh
jsonschema test /path/to/schema/tests/
```
Run this in CI on every proposed change to a canonical schema before it merges, just as you would for any production code.
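As a sketch of what such a suite asserts (the file layout below is illustrative rather than authoritative, and the target URI is hypothetical; consult the `jsonschema` CLI documentation for the exact test format):

```json
{
  "target": "https://schemas.example.com/customers/customer/v2.json",
  "tests": [
    {
      "description": "Instances valid under v1 must remain valid",
      "valid": true,
      "data": { "name": "Ada Lovelace", "currency": "USD" }
    },
    {
      "description": "Instances invalid under v1 must remain invalid",
      "valid": false,
      "data": { "name": 42 }
    }
  ]
}
```

Keeping one such suite per schema and running it against both the old and the new version in CI turns backwards compatibility from a hope into an asserted property.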
Redirecting consumers when schemas move
In practice, schemas move: concepts get renamed, directories restructured, versions superseded. Many do not realise that JSON Schema handles this natively via `$ref` forwarding.
When you need to move a schema, leave a minimal schema with a `$ref` to the new location and, if on 2019-09 or later, set the `deprecated` keyword to `true`:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "deprecated": true,
  "$ref": "../path/to/new/v2.json"
}
```
Consumers resolving the old URI transparently follow the reference. Tooling that honours `deprecated` surfaces a migration warning, and the old URI keeps working. The platform team can then monitor deprecated URIs still receiving traffic and track adoption of the new version.
Coming Soon
Sourcemeta One will add HTTP-level 3xx redirection at deprecated URIs so HTTP clients and caching layers handle the transition natively. This is convenient when moving directories containing a non-trivial number of schemas.
What is coming: breaking change detection and lineage
We are actively working on shipping the following features:
**Breaking change detection.** When Sourcemeta One is deployed, it will analyse changes against the previously published state and refuse to complete if any modification to an existing schema URI would break consumers: a removed required property, a narrowed type, a tightened constraint, or a deleted schema. The registry deployment becomes a compatibility gate, not just a publication step.
**Schema lineage.** There is currently no formal way for a schema to declare it is a newer version of a previous one beyond directory conventions. We are introducing an `x-supersedes` keyword that lets a schema explicitly declare which URI it supersedes, independent of its location. This survives reorganisation and allows forks: a schema can supersede two ancestors, modelling a merge of previously separate concepts. Sourcemeta One will visualise these chains in the web explorer. Snowplow's Iglu registry has pioneered a similar concept with their `$supersedes` keyword. For organisations with consistent directory-based versioning, Sourcemeta One will also support automatic lineage inference during deployment from path patterns.
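Under that design, declaring lineage might look like the following (the URIs are hypothetical, and the keyword's exact shape may change before release):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://schemas.example.com/billing/customer/v2.json",
  "x-supersedes": "https://schemas.example.com/crm/customer/v1.json",
  "type": "object"
}
```

Because the declaration names a URI rather than a directory position, the link survives moving either schema to a new path.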
Coming Soon
Schema evolution tooling is actively in development and will land incrementally. If breaking change detection or lineage tracking is critical to your evaluation, get in touch to discuss timelines and priority.