Provenance & lineage
Geopera records lineage at the moment data is created — not as an afterthought, a log, or a best-effort background job. When an operation produces an artifact, the lineage record is written in the same transaction that creates the artifact. If the lineage can’t be recorded, the operation does not succeed.
This page explains what that means in practice, what gets recorded, and why it’s a guarantee rather than a feature you have to enable.
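The same-transaction rule can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in store; the table names (`artifacts`, `lineage_edges`) and the `create_artifact` helper are hypothetical and are not Geopera's schema or API. It shows only the property described above: if the lineage write fails, the artifact row is rolled back with it.

```python
import sqlite3


def open_store():
    """Create an in-memory stand-in store (hypothetical schema, for illustration)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, kind TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE lineage_edges ("
        " src_id TEXT NOT NULL, dst_id TEXT NOT NULL, relation TEXT NOT NULL)"
    )
    return db


def create_artifact(db, artifact_id, kind, derived_from):
    """Write the artifact and its derived-from edges in one transaction.

    `derived_from` is a list of (source_id, relation) pairs. Any failure while
    writing lineage rolls back the artifact row as well.
    """
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO artifacts VALUES (?, ?)", (artifact_id, kind))
        for source_id, relation in derived_from:
            db.execute(
                "INSERT INTO lineage_edges VALUES (?, ?, ?)",
                (source_id, artifact_id, relation),
            )
```

In this sketch, calling `create_artifact` with an edge that violates a constraint (for example a missing relation) raises `sqlite3.IntegrityError` and leaves no artifact row behind.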
Provenance by construction
Most systems treat audit trails as something bolted on: a log line, an event emitted after the fact, a table someone hopefully writes to. Those drift — code paths get added that forget to log, and the trail has holes.
Geopera inverts this. The operation pipeline has a fixed shape, and for any operation that isn’t a pure read it runs:
… → open provenance → execute → validate output → seal provenance → audit

The seal step enforces a contract: if an operation declares that it produces an artifact (an item, an order, a job…), it must have emitted that artifact’s lineage, or the operation fails closed with a provenance error, even if the underlying work “succeeded.” There is no flag to turn this off, and no privileged code path that skips it. A capability that produces data and forgets to record where it came from cannot ship.
The result: every produced artifact carries a complete record of how it came to exist, by construction.
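To make the fail-closed behaviour concrete, here is a minimal sketch of a seal check. Every name in it (`ProvenanceContext`, `ProvenanceError`, `run_operation`, the `produces` argument) is an assumption made for illustration, not Geopera's kernel code, and the validate-output and audit steps are left out.

```python
from dataclasses import dataclass, field


class ProvenanceError(Exception):
    """Raised when a producing operation finishes without emitting lineage."""


@dataclass
class ProvenanceContext:
    # Hypothetical per-invocation record of emitted artifacts.
    emitted: list = field(default_factory=list)

    def emit(self, kind, entity_id, derived_from):
        """Record one artifact and the ids it was derived from."""
        self.emitted.append((kind, entity_id, tuple(derived_from)))


def run_operation(execute, produces):
    """Run `execute` between an open and a seal step; fail closed on missing lineage."""
    ctx = ProvenanceContext()      # open provenance
    result = execute(ctx)          # execute
    # seal provenance: every declared artifact kind must have been emitted
    emitted_kinds = {kind for kind, _, _ in ctx.emitted}
    missing = set(produces) - emitted_kinds
    if missing:
        raise ProvenanceError(f"no lineage emitted for: {sorted(missing)}")
    return result, ctx.emitted
```

An operation body that returns a result but never calls `ctx.emit` for a declared kind raises `ProvenanceError` here, which mirrors the "work succeeded, operation still fails" case described above.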
What gets recorded
Each producing operation emits one or more artifacts, each linked by derived-from edges to the things it came from. The artifact kinds today:
| Kind | Emitted by (examples) | Typically derived from |
|---|---|---|
| order | orders.archive.place, orders.tasking.place | the captures / AOIs it was placed for |
| item | uploads.complete, order delivery, processing output | the upload, order, or job that produced it |
| asset | asset ingestion | its parent item |
| processing_job | processing.create, clip.create_from_item | its input items/assets |
| report | reports.generate | the items/analytics it summarised |
| share_link | share.link.create | the item or collection it exports |
| collection, project | collections.create, projects.create | their parent scope |
For example:
- An archive order is recorded as an order artifact derived from each capture it ordered (relation: "ordered").
- A delivered item is recorded as an item derived from its order (relation: "delivered_for").
- An uploaded item is an item derived from its upload session (relation: "uploaded").
- A clip output is an item derived from both the processing job (relation: "produced_by") and the source item it was clipped from.
Chained together, these edges form a walkable graph: a clipped item → the job that made it → the source item → the order that delivered the source → the captures that order bought. Lineage survives even if an upstream source is later archived.
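The chain above can be walked with an ordinary breadth-first search. The sketch below is illustrative only: the ids are invented, the two relation names marked as assumed are not given on this page, and whether the real service reports the shortest depth for a node reachable by several paths is not stated here.

```python
from collections import deque

# (src_id, dst_id, relation): dst was derived from src. All ids are invented.
EDGES = [
    ("cap_1", "ord_1", "ordered"),
    ("ord_1", "it_src", "delivered_for"),
    ("it_src", "job_1", "input"),           # relation name assumed
    ("job_1", "it_clip", "produced_by"),
    ("it_src", "it_clip", "clipped_from"),  # relation name assumed
]


def ancestors(edges, root, max_depth=20):
    """Breadth-first walk upward from `root`; returns {entity_id: depth}."""
    parents = {}
    for src, dst, _relation in edges:
        parents.setdefault(dst, []).append(src)

    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth bound
        for parent in parents.get(node, []):
            if parent not in depths:  # first visit is the shortest path
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths
```

Calling `ancestors(EDGES, "it_clip")` reaches the job and the source item at depth 1, the order at depth 2, and the capture at depth 3.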
The idempotent no-op
A producing operation that legitimately produces nothing on a given call (for example, uploads.complete on an already-completed session, or an order placement that returned a cached result on an idempotent retry) marks itself an idempotent no-op. The seal contract recognises this and doesn’t demand a fresh emit. So idempotent retries stay safe and provenance stays honest: a record is written exactly when an artifact is actually created, and not otherwise.
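The three outcomes of the seal check can be written as a small decision function. This is a sketch under assumptions: `seal_passes` and its parameters are hypothetical names, and the real contract may track more state than an emit count.

```python
def seal_passes(declares_artifact, emitted_count, idempotent_noop=False):
    """Return True if the seal step would accept this invocation (sketch)."""
    if not declares_artifact:
        return True          # nothing was promised, so nothing to check
    if emitted_count > 0:
        return True          # lineage was emitted for the new artifact
    return idempotent_noop   # nothing emitted: only acceptable as a declared no-op
```

Under this reading, a retry that returns a cached order passes with zero emits because it is marked as a no-op, while the same zero-emit call without the marker fails closed.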
Why it matters
- Auditability — for any artifact you can answer “where did this come from?” with a record that was written atomically with the artifact, not reconstructed from logs.
- Reproducibility — the inputs an output was derived from are captured, so a result can be traced back to exactly what produced it.
- Trust across clients — because the guarantee lives in the kernel, it holds no matter who called the operation: the portal, an SDK, an AI agent, or a worker registering its job’s outputs all produce the same complete lineage.
Integrity of the record
The provenance store is written through a privileged, security-defined path: operations emit through the kernel, which writes the artifact and its edges atomically; application roles cannot write provenance rows directly. So the lineage graph reflects what the governed operations actually did — it can’t be forged by a client that bypasses the emit path, because there is no such path.
Reading lineage
Walk the graph over the API with provenance.get (scope provenance:read):
curl -s -X POST https://api.geopera.com/v1/op/provenance.get \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "entity_type": "item", "entity_id": "it_3a9c...", "direction": "up", "max_depth": 20 }'

{
  "root": { "entity_type": "item", "entity_id": "it_3a9c..." },
  "direction": "up",
  "edges": [
    { "depth": 1, "src_type": "item", "src_id": "it_src...", "dst_type": "item", "dst_id": "it_3a9c...", "relation": "duplicated_from", "invocation_id": "..." }
  ],
  "nodes": [ { "entity_type": "item", "entity_id": "it_3a9c...", "depth": 0 }, { "entity_type": "item", "entity_id": "it_src...", "depth": 1 } ]
}

- direction: up (ancestors: “how was this produced?”), down (descendants: “what came from this?”), or both.
- max_depth bounds the walk (1–50).
- Supported root types: item, order, processing_job, collection, project.
- Org-scoped: you can only read lineage for a root your organization owns; a cross-org or unknown root returns 404 (no existence leak), exactly like every other read.
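For callers not using curl, the same request can be assembled with Python's standard library. The sketch below only builds the request and checks the documented limits client-side; the helper names (`build_provenance_request`, `nodes_by_depth`) are hypothetical, and the endpoint, field names, and limits are taken from the example and notes above.

```python
import json
import urllib.request

API_URL = "https://api.geopera.com/v1/op/provenance.get"
ROOT_TYPES = {"item", "order", "processing_job", "collection", "project"}
DIRECTIONS = {"up", "down", "both"}


def build_provenance_request(token, entity_type, entity_id, direction="up", max_depth=20):
    """Build (but do not send) a provenance.get request, validating inputs first."""
    if entity_type not in ROOT_TYPES:
        raise ValueError(f"unsupported root type: {entity_type!r}")
    if direction not in DIRECTIONS:
        raise ValueError(f"unsupported direction: {direction!r}")
    if not 1 <= max_depth <= 50:
        raise ValueError("max_depth must be between 1 and 50")
    body = {
        "entity_type": entity_type,
        "entity_id": entity_id,
        "direction": direction,
        "max_depth": max_depth,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def nodes_by_depth(response):
    """Group entity ids from a parsed provenance.get response by their depth."""
    grouped = {}
    for node in response["nodes"]:
        grouped.setdefault(node["depth"], []).append(node["entity_id"])
    return grouped
```

To send it, pass the request to `urllib.request.urlopen` and parse the body with `json.load`; a 404 for a cross-org or unknown root surfaces as `urllib.error.HTTPError`.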