Incrementally Adopting Schema Governance
At this point, the model makes sense, and now you are looking at your actual situation: hundreds or thousands of APIs in various states of documentation, owned by different teams, some with OpenAPI specs and some without.
The key principle: you do not need to clean up your existing mess before you start. You start alongside it, one API at a time, while development continues uninterrupted.
Step 0: Know what APIs you have
Axway's 2024 State of Enterprise API Maturity report found that 78% of enterprise leaders do not know how many APIs their organisation has. If you are in that category, start here. The goal is not a perfect catalog, just a working list good enough to identify where to begin.
Ask the teams. A quick conversation with platform and engineering leads, plus a shared spreadsheet of API name, owner, traffic, and spec status, will surface the most important APIs quickly.
Check your API gateways. Kong, AWS API Gateway, Azure API Management, and similar tools hold route configurations that are often the most reliable inventory of what is actually running in production.
Scan your infrastructure. Optic can discover APIs passively from network traffic, which is often the only way to find shadow APIs.
Once you have enough visibility to identify a first candidate API, move on. The inventory can be refined in parallel.
Step 1: Get OpenAPI specs, however you can
If your APIs already have OpenAPI specs, skip to Step 2. If not, the goal is a working first draft. Quality comes later.
Use an LLM. Point Claude Code, GitHub Copilot, or any other AI code assistant at your codebase and ask it to generate an OpenAPI spec. Faster than writing from scratch, and usually good enough to start.
Generate from your framework. Most frameworks have OpenAPI generators built in or available as plugins. OpenAPI Generator covers a wide range of server frameworks.
Infer from gateway traffic. Optic can generate and maintain OpenAPI specs from observed traffic. Kong and AWS API Gateway have supported tooling for this as well.
The result will have inline schemas, inconsistent naming, and missing annotations. That is fine to start. Quality will come later.
Step 2: Verify the spec matches your running API
Before extracting anything, verify the spec actually reflects what the API does. APIContext's research, based on validating billions of API calls, found 75% of production APIs have variances from their published OpenAPI specifications. Extracting schemas from a spec that does not match reality means canonicalising fiction.
We recommend you add one of the following tools to your workflow:
- Prism. A popular validation proxy that checks live API responses against the spec; also serves as a mock server
- Schemathesis. A fantastic tool that automatically generates test cases from the spec and fires them at the running API
- Optic. A tool that diffs actual traffic against the spec in CI, flagging drift as it accumulates
Fix mismatches before moving on. If the spec is wrong, everything built on top of it is wrong too.
Step 3: Pick one API to start with
Pick a single API to work through the full cycle first. The best choice is actively maintained, reasonably well understood, and likely to share concepts with other APIs so the schemas you extract pay dividends immediately.
A good first candidate is usually a core domain API: something defining
Customer, Order, or Product that you already suspect exists in multiple
forms across your landscape.
Step 4: Set up your schema repository and deploy the registry
Unlike many registries that enforce a rigid two-level namespace, Sourcemeta One lets you define the full directory hierarchy to any depth, mirroring your domain model exactly. While convenient, that flexibility can initially feel daunting.
The answer: start simple. Structure matters less than you think, because you can always reorganise later in a backwards-compatible way. For example:
/shared
/primitives
/money
v1.json
/customers
/identity
/customer
v1.json
/finance
/invoicing
/invoice
v1.json
The only thing we strongly recommend at this stage is the use of a version in
every schema path. A JSON Schema is uniquely identified by its URI, which
means the version must be part of the file path. If you create
/customers/customer.json today and dozens of APIs point to it, you have no
room for breaking changes later. /customers/customer/v1.json gives you that
room.
Sourcemeta One supports arbitrary versioning strategies, so do not overthink it: a single number is enough to start. Versioning and schema evolution are covered later in this guide.
Let us handle $id
When a schema has an explicit identifier,
$refs inside it
resolve against the
$id rather than the
file path, which is confusing during local development where files are just
files on disk. Without
$id, relative
$refs resolve
predictably against the file system, which is more intuitive for local
development. The registry assigns
$ids automatically
based on each schema's path and deployment configuration, so you get the
best of both worlds.
Now deploy Sourcemeta One pointing at this repository. It will be mostly empty at this stage, but you need it running to assess the quality of everything you extract next.
Enterprise
If you have an Enterprise license, your registry comes pre-loaded with the Standard Library: thousands of schemas mapped to ISO, IETF, W3C, and other open standards. Before extracting schemas from your first API, browse the Standard Library first. You may find that a significant portion of what you were about to extract already exists: dates, currencies, country codes, addresses, financial data models, and more. What you need to add is only what is genuinely specific to your organisation.
Step 5: Extract the schemas from your first API
Split the OpenAPI spec into separate files using a tool such as redocly
split:
redocly split openapi.yaml --outDir ./split
Review the output, place the schema files into your central repository under the appropriate versioned paths, and commit.
Now check out the registry's health analysis. This is where the process becomes self-guiding: the health report will flag missing descriptions, overly permissive types, inconsistent patterns, and much more. Work through the issues, commit improvements, and watch the score rise. You do not need to decide what good looks like in the abstract. The health checks tell you. Trust the process: the closer you get to 100%, the better your schemas become.
Enterprise
The Enterprise edition lets you configure the registry with custom linting rules that encode your own organisation's specific guidelines: naming conventions, required annotations, structural patterns, or any constraint that matters to your governance standards. Each rule can include your own explanation and remediation guidance, so when a schema fails a check, the developer sees exactly what is expected and why. This turns the registry's health analysis into a living style guide for your data contracts.
Step 6: Update the OpenAPI spec to reference the registry
Replace inline schema definitions with
$refs pointing at the
registry:
# Before
components:
schemas:
Customer:
type: object
properties:
id:
type: string
# After
components:
schemas:
Customer:
$ref: 'https://schemas.yourcompany.com/customers/identity/customer/v1.json'
Tip
Run redocly bundle to
produce the distributable single-file spec for tooling that requires it.
If you prefer schemas on disk rather than live references, jsonschema
install
fetches schemas with integrity verification into your own repository, removing
the runtime dependency on the registry.
Finally, don't forget to re-run your conformance tests from Step 2 to confirm the API still matches the updated spec.
Step 7: Move to the next API
Before extracting schemas from your second API, check what is already in the registry. Anything that already exists should be referenced, not re-extracted.
For each schema in the second API's spec: if it already exists in the registry,
replace the inline definition with a
$ref. If not, extract it
first, then reference it.
This is where the fragmentation cycle stops. The effort for the second API is less than the first. The third less than the second. Each iteration makes the registry more comprehensive and the next iteration faster.
The cycle compounds
Early on, most schemas in each new API are novel. Over time, most already exist
in the registry. The proportion of new work shrinks with each iteration, until
designing a new API becomes largely an assembly exercise: declare the
endpoints, $ref what you
need, and ship with confidence in definitions already reviewed and proven in
production.
The platform team monitors what is being extracted: which schemas appear across APIs in different forms, which local definitions duplicate canonical ones, which newly extracted schemas should be promoted organisation-wide. The registry health analysis and web explorer provide that visibility without requiring manual review of every API.
There is no finish line. The goal is not to centralise everything by a given date. It is to make centralisation the path of least resistance for new work, and to gradually draw existing schemas toward the canonical layer over time.
Need help getting there?
Enterprise customers can engage Sourcemeta for hands-on professional services: API discovery, spec generation, schema extraction, registry setup, and team onboarding. Get in touch.