<!-- Source: https://docs.geopera.com/api-reference/guides/catalog-discovery · Markdown for LLMs -->

# Catalog discovery

Discovery is how you turn an area of interest into a list of orderable captures. Three
read operations cover it: `catalog.search` queries one commercial host, `catalog.federated_search`
fans out across every data source covering your AOI, and `stac.search` searches your
organization's own STAC items.

| Operation                  | What it searches                               | Side-effect | Scope          |
| -------------------------- | ---------------------------------------------- | ----------- | -------------- |
| `catalog.search`           | One commercial host's catalog (price-enriched) | read        | `catalog:read` |
| `catalog.federated_search` | Every registry source covering the AOI, merged | read        | `catalog:read` |
| `stac.search`              | Your organization's own STAC items             | read        | `items:read`   |

All three are invoked the same way as every Geopera operation — `POST /v1/op/{operation_id}`
with a JSON body. There are no path or query parameters; the meaning lives in the
operation name. See [Operations](/api-reference/operations) and [Concepts](/api-reference/concepts)
for the RPC-over-HTTP model, and [Authentication](/api-reference/authentication) for the
`Bearer` token (a session token or a `gpra_`-prefixed API key).
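
Where no SDK is available, the invocation pattern above can be sketched with the Python standard library alone. The helper name `build_op_request` is hypothetical, and the base URL is the one used in this guide's curl examples; treat this as a minimal sketch rather than a reference client.

```python
import json
import urllib.request

BASE_URL = "https://api.geopera.com"  # base URL as used in this guide's curl examples


def build_op_request(operation_id: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/op/{operation_id} request.

    Hypothetical helper: the operation name goes in the path and
    everything else travels in the JSON body.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/op/{operation_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_op_request(
    "catalog.search",
    {"host_name": "earthsearch-aws", "limit": 1},
    "gpra_your_api_key",
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Building the request separately from sending it keeps the helper easy to inspect and test without network access.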

## Choosing a search operation

- **You know which vendor you want** (e.g. a specific high-resolution provider) — use
  `catalog.search` with that `host_name`.
- **You just want "what imagery exists here"** across free and commercial sources at once
  — use `catalog.federated_search`. It routes the query only to sources whose coverage
  matches the AOI and returns one merged, type-tagged result set.
- **You want to search items your organization already owns** (delivered orders, uploads)
  — use `stac.search`, which is org-scoped and speaks CQL2-JSON.

## Filtering inputs

`catalog.search` and `catalog.federated_search` share the same core filter fields
(the `CatalogSearchInput` / `FederatedSearchInput` schemas). Every filter is optional
except `host_name` on `catalog.search`.

| Field         | Type     | Meaning                                                                                                                                                   |
| ------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `host_name`   | string   | **Required for `catalog.search`** — the vendor host to query (e.g. `earthsearch-aws`, `twentyoneat`, `cgstl`, `siis`, `spacewill`, `improsat`, `vantor`). |
| `collections` | string[] | Restrict to specific STAC collection ids (e.g. `sentinel-2-l2a`).                                                                                         |
| `ids`         | string[] | Fetch specific item ids directly.                                                                                                                         |
| `datetime`    | string   | An RFC 3339 instant, or a `start/end` interval. Date-only bounds like `2024-01-01` are expanded to whole days.                                            |
| `bbox`        | number[] | AOI as `[west, south, east, north]`. Mutually exclusive with `intersects`.                                                                                |
| `intersects`  | object   | AOI as a GeoJSON **Polygon**. Mutually exclusive with `bbox`.                                                                                             |
| `query`       | object   | Property filters — currently `cloudCoverage` with a comparison operator (see below).                                                                      |
| `limit`       | integer  | Max items to return, `1`–`500` (default `100`).                                                                                                           |
| `next`        | string   | `catalog.search` only — the pagination cursor from the previous page.                                                                                     |

`catalog.federated_search` adds two facets that narrow the fan-out:

| Field        | Type     | Meaning                                                                                          |
| ------------ | -------- | ------------------------------------------------------------------------------------------------ |
| `source_ids` | string[] | Only query these registry source ids.                                                            |
| `data_types` | string[] | Only query collections of these types — `raster`, `dem`, `pointcloud`, `altimetry`, or `vector`. |

### Spatial filter: bbox or intersects

Use a single spatial filter per request: passing both `bbox` and `intersects` returns a
`400`. `bbox` is a four-number array in `[west, south, east, north]` (WGS84) order:

```json
{ "bbox": [151.1, -33.92, 151.3, -33.78] }
```

`intersects` must be a GeoJSON **Polygon** (other geometry types are rejected with `422`).
The polygon may not exceed 999 vertices or roughly 50,000 km²:

```json
{
	"intersects": {
		"type": "Polygon",
		"coordinates": [
			[
				[151.1, -33.92],
				[151.3, -33.92],
				[151.3, -33.78],
				[151.1, -33.78],
				[151.1, -33.92]
			]
		]
	}
}
```
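
The two spatial forms describe the same rectangle, so a client can convert between them and sanity-check an AOI before sending it. The sketch below uses hypothetical helper names and a spherical-Earth approximation for area. How the server counts vertices and measures area is not specified in this guide, so treat the results as a rough pre-check only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def bbox_to_polygon(bbox: list) -> dict:
    """Convert [west, south, east, north] into a closed GeoJSON Polygon."""
    west, south, east, north = bbox
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def bbox_area_km2(bbox: list) -> float:
    """Approximate area of a lon/lat rectangle on a sphere, in km²."""
    west, south, east, north = bbox
    d_lon = math.radians(east - west)
    band = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


def exterior_vertex_count(polygon: dict) -> int:
    """Count positions in the exterior ring, including the closing vertex."""
    return len(polygon["coordinates"][0])


sydney = [151.1, -33.92, 151.3, -33.78]
polygon = bbox_to_polygon(sydney)
area = bbox_area_km2(sydney)
```

The Sydney AOI used throughout this guide comes out at a few hundred km² with five ring positions, well inside both limits.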

### Time filter: datetime

`datetime` accepts a single instant or a `start/end` interval. A bare date is treated as
the whole day, and an open-ended interval uses `..`:

```json
{ "datetime": "2024-06-01T00:00:00Z/2024-06-30T23:59:59Z" }
```

```json
{ "datetime": "2024-01-01/.." }
```
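
Interval strings like the ones above are easy to mistype, so it can help to build them from `date` and `datetime` objects. This is a sketch with hypothetical helper names; it assumes naive datetimes are UTC, and that an open start (`../end`) follows the same convention as the open end shown above.

```python
from datetime import date, datetime, timezone
from typing import Optional, Union

Bound = Optional[Union[date, datetime]]


def _format_bound(bound: Bound) -> str:
    if bound is None:
        return ".."  # open side of the interval
    if isinstance(bound, datetime):  # check datetime first: it subclasses date
        if bound.tzinfo is None:
            bound = bound.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return bound.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return bound.isoformat()  # bare date, e.g. 2024-01-01


def datetime_interval(start: Bound, end: Bound) -> str:
    """Build a start/end string for the `datetime` field."""
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    return f"{_format_bound(start)}/{_format_bound(end)}"


june = datetime_interval(
    datetime(2024, 6, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
)
since_new_year = datetime_interval(date(2024, 1, 1), None)
```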

### Cloud-cover filter: query.cloudCoverage

Cloud cover is filtered through the `query` object under the `cloudCoverage` key, using a
single comparison operator — `GTE`, `LTE`, `GT`, or `LT` — with a percentage value:

```json
{ "query": { "cloudCoverage": { "LTE": 20 } } }
```

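This keeps captures with cloud cover of 20% or less. The filter key is `cloudCoverage`,
although the response examples in this guide report the measured value under the STAC
property `eo:cloud_cover`.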
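
A small builder can catch an unsupported operator before the request is sent. The function name is hypothetical, and the 0 to 100 range check is an assumption based on the value being a percentage.

```python
ALLOWED_OPERATORS = {"GTE", "LTE", "GT", "LT"}  # the four operators listed above


def cloud_cover_query(operator: str, percent: float) -> dict:
    """Build the `query` object for a cloud-cover filter."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if not 0 <= percent <= 100:  # assumption: percentages run from 0 to 100
        raise ValueError("percent must be between 0 and 100")
    return {"cloudCoverage": {operator: percent}}


query = cloud_cover_query("LTE", 20)
```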

## Understanding price-enriched results

`catalog.search` returns a STAC `FeatureCollection`. Each feature is a capture you can
order, and the backend enriches commercial features in place with **server-authoritative
pricing** computed by the same engine the order path uses. Look for these added fields on
each feature's `properties`:

| Property            | Meaning                                            |
| ------------------- | -------------------------------------------------- |
| `pricePerSqKm`      | Archive price per km² in AUD.                      |
| `creditsPerSqKm`    | The same rate in credits (100 credits = A$1).      |
| `pricingResolution` | The resolution tier the price was matched against. |

The rate shown here does not depend on your AOI: it is read off a 1 km² probe. Multiply it
by your AOI area in km² to estimate the total cost, or call `orders.archive.estimate` for
the exact figure over your geometry. A capture from an unpriced vendor, or one with no
pricing config, is returned without these fields. When you authenticate as a principal with
an organization, any contracted (org-custom) rates automatically overlay the default list
pricing; a pre-workspace user simply sees default list pricing.
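
The estimate described above is a single multiplication per currency. In the sketch below, the helper name, the A$12.50 per km² rate, and the 40 km² area are hypothetical values chosen for illustration; `orders.archive.estimate` remains the authoritative figure.

```python
from typing import Optional


def estimate_archive_cost(feature: dict, area_km2: float) -> Optional[dict]:
    """Multiply the per-km² rates on a feature by an AOI area."""
    props = feature.get("properties", {})
    price = props.get("pricePerSqKm")
    credits = props.get("creditsPerSqKm")
    if price is None or credits is None:
        return None  # pricing fields absent: unpriced vendor or no pricing config
    return {
        "aud": round(price * area_km2, 2),
        "credits": round(credits * area_km2),
    }


# Hypothetical feature: A$12.50 per km², i.e. 1250 credits per km² at 100 credits = A$1.
sample = {"properties": {"pricePerSqKm": 12.5, "creditsPerSqKm": 1250}}
estimate = estimate_archive_cost(sample, area_km2=40)
```

Returning `None` when the fields are missing keeps unpriced captures distinguishable from captures priced at zero.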

`catalog.federated_search` does not price-enrich. Instead it tags each merged feature with
the source and fork metadata the viewer needs: `sourceId`, `sourceTitle`, `collectionId`,
`dataType`, `accessModel`, and `renderStrategy`. The response also carries a top-level
`sources` array summarizing each queried source's `count` and any per-source `error` —
one slow or failing upstream never sinks the whole search.

## Worked example: search one commercial host

Find recent, low-cloud Sentinel-2 captures over an AOI in Sydney from the public
`earthsearch-aws` host. Because this host serves free data, the pricing fields in the sample
response are zero.

```http
POST /v1/op/catalog.search HTTP/1.1
Host: api.geopera.com
Authorization: Bearer gpra_your_api_key
Content-Type: application/json

{
  "host_name": "earthsearch-aws",
  "collections": ["sentinel-2-l2a"],
  "bbox": [151.10, -33.92, 151.30, -33.78],
  "datetime": "2024-06-01T00:00:00Z/2024-06-30T23:59:59Z",
  "query": { "cloudCoverage": { "LTE": 20 } },
  "limit": 25
}
```

A successful response is a STAC `FeatureCollection`:

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "type": "FeatureCollection",
  "features": [
    {
      "id": "S2B_56HLH_20240612_0_L2A",
      "collection": "sentinel-2-l2a",
      "geometry": { "type": "Polygon", "coordinates": [ ... ] },
      "properties": {
        "datetime": "2024-06-12T00:02:11Z",
        "eo:cloud_cover": 8.4,
        "gsd": 10,
        "pricePerSqKm": 0,
        "creditsPerSqKm": 0,
        "pricingResolution": null
      }
    }
  ],
  "numberReturned": 1
}
```

### curl

```bash
curl -s -X POST https://api.geopera.com/v1/op/catalog.search \
  -H "Authorization: Bearer $GEOPERA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "host_name": "earthsearch-aws",
    "collections": ["sentinel-2-l2a"],
    "bbox": [151.10, -33.92, 151.30, -33.78],
    "datetime": "2024-06-01T00:00:00Z/2024-06-30T23:59:59Z",
    "query": { "cloudCoverage": { "LTE": 20 } },
    "limit": 25
  }'
```

### Python

The [Python SDK](/api-reference/sdks/python) (`pip install geopera`) exposes the operation
as `catalog_search` with a typed `CatalogSearchInput` body.

```python
from geopera import AuthenticatedClient
from geopera.api.operations import catalog_search
from geopera.models import CatalogSearchInput

client = AuthenticatedClient(
    base_url="https://api.geopera.com",
    token="gpra_...",
)

result = catalog_search.sync(
    client=client,
    body=CatalogSearchInput(
        host_name="earthsearch-aws",
        collections=["sentinel-2-l2a"],
        bbox=[151.10, -33.92, 151.30, -33.78],
        datetime="2024-06-01T00:00:00Z/2024-06-30T23:59:59Z",
        query={"cloudCoverage": {"LTE": 20}},
        limit=25,
    ),
)

for feature in result.to_dict()["features"]:
    props = feature["properties"]
    print(feature["id"], props.get("eo:cloud_cover"), props.get("creditsPerSqKm"))
```

### TypeScript

The [TypeScript SDK](/api-reference/sdks/typescript) (`@geopera/sdk`) calls every
operation through `client.invoke(operationId, body)`.

```typescript
import { GeoperaClient } from '@geopera/sdk';

const client = new GeoperaClient({ token: 'gpra_...' });

const result = await client.invoke('catalog.search', {
	host_name: 'earthsearch-aws',
	collections: ['sentinel-2-l2a'],
	bbox: [151.1, -33.92, 151.3, -33.78],
	datetime: '2024-06-01T00:00:00Z/2024-06-30T23:59:59Z',
	query: { cloudCoverage: { LTE: 20 } },
	limit: 25
});

for (const feature of result.features) {
	console.log(feature.id, feature.properties.creditsPerSqKm);
}
```

## Worked example: federated search

To answer "what imagery exists over this AOI?" across every source at once, drop the
`host_name` and call `catalog.federated_search`. Optionally narrow the fan-out with
`data_types` or `source_ids`.

```bash
curl -s -X POST https://api.geopera.com/v1/op/catalog.federated_search \
  -H "Authorization: Bearer $GEOPERA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "bbox": [151.10, -33.92, 151.30, -33.78],
    "datetime": "2024-01-01/..",
    "data_types": ["raster"],
    "limit": 50
  }'
```

The response merges features across sources (most-recent-first, capped at `limit`) and
includes a `sources` summary:

```json
{
	"type": "FeatureCollection",
	"features": [
		{
			"id": "S2B_56HLH_20240612_0_L2A",
			"properties": {
				"datetime": "2024-06-12T00:02:11Z",
				"sourceId": "earthsearch-aws",
				"sourceTitle": "Element 84 Earth Search",
				"collectionId": "sentinel-2-l2a",
				"dataType": "raster",
				"accessModel": "free",
				"renderStrategy": "..."
			}
		}
	],
	"numberReturned": 1,
	"sources": [
		{
			"source_id": "earthsearch-aws",
			"title": "Element 84 Earth Search",
			"count": 1,
			"error": null
		}
	]
}
```
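
Because per-source failures are reported inline rather than failing the call, a client usually wants to check the `sources` summary alongside the features. The sketch below assumes the field names shown in the sample response above. The helper name and the shape of a non-null `error` (treated here as any truthy value) are assumptions.

```python
from collections import Counter


def summarize_federated(response: dict) -> dict:
    """Count merged features per source and collect sources that reported an error."""
    counts = Counter(
        feature.get("properties", {}).get("sourceId")
        for feature in response.get("features", [])
    )
    failed = {
        source.get("source_id"): source["error"]
        for source in response.get("sources", [])
        if source.get("error")  # null or missing means the source succeeded
    }
    return {"features_by_source": dict(counts), "failed_sources": failed}


# Hypothetical response: one healthy source and one that failed upstream.
sample_response = {
    "features": [{"id": "item-1", "properties": {"sourceId": "earthsearch-aws"}}],
    "sources": [
        {"source_id": "earthsearch-aws", "count": 1, "error": None},
        {"source_id": "example-source", "count": 0, "error": "upstream timeout"},
    ],
}
summary = summarize_federated(sample_response)
```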

To discover which sources, collections, and data types are available before you search,
call `catalog.sources.list`.

## Worked example: search your own STAC items

`stac.search` searches the STAC items owned by the authenticated principal's organization
(it requires an organization — a principal without one gets a `403`). It is the
provider-compatible search surface and speaks CQL2-JSON through the `filter` field, with
`sortby`, `page`, and a `limit` up to `10000`.

```bash
curl -s -X POST https://api.geopera.com/v1/op/stac.search \
  -H "Authorization: Bearer $GEOPERA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "collections": ["my-deliveries"],
    "bbox": [151.10, -33.92, 151.30, -33.78],
    "datetime": "2024-01-01T00:00:00Z/..",
    "filter": { "op": "<=", "args": [ { "property": "eo:cloud_cover" }, 20 ] },
    "filter-lang": "cql2-json",
    "sortby": [ { "field": "datetime", "direction": "desc" } ],
    "limit": 50,
    "page": 0
  }'
```

Note the differences from the catalog operations: `stac.search` filters cloud cover (and
any other property) through CQL2-JSON in `filter` rather than `query.cloudCoverage`, and it
pages with a numeric `page` offset rather than a `next` cursor.
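
Offset paging can be wrapped in a generator that increments `page` until the results run out. This sketch takes a hypothetical `invoke(operation_id, body)` callable so it is independent of any particular client. It assumes pages start at `0`, as in the request above, and that a page shorter than `limit` marks the end of the results, which this guide does not confirm.

```python
from typing import Callable, Iterator


def iter_stac_items(
    invoke: Callable[[str, dict], dict],
    body: dict,
    limit: int = 50,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield features from stac.search, one page offset at a time."""
    for page in range(max_pages):  # hard stop so a bad response cannot loop forever
        response = invoke("stac.search", {**body, "limit": limit, "page": page})
        features = response.get("features", [])
        yield from features
        if len(features) < limit:
            break  # assumption: a short page means there are no more results
```

The `max_pages` guard is a defensive choice: it bounds the number of requests even if the stop condition never triggers.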

## Pagination

When more results exist, `catalog.search` includes a STAC `links` entry with `rel: "next"`
whose `href` carries the opaque cursor (`?next=...`). Pass that cursor back as the `next`
field on the following request to fetch the next page. `catalog.federated_search` does not
take a `next` cursor; its merged results are capped at `limit`. `stac.search` uses a numeric
`page` offset instead. See [Pagination](/api-reference/pagination) for the full cursor and
offset conventions. Searches are reads, so [Idempotency](/api-reference/idempotency) does
not apply to them; it covers write operations only.
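
The cursor flow for `catalog.search` can be sketched as a loop that reads the `next` link and replays the request with the cursor attached. The `invoke(operation_id, body)` callable and both helper names are hypothetical. The sketch also assumes the cursor should be passed back URL-decoded, which this guide does not state.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlparse


def next_cursor(response: dict) -> Optional[str]:
    """Extract the `next` query parameter from the rel="next" link, if any."""
    for link in response.get("links", []):
        if link.get("rel") == "next":
            values = parse_qs(urlparse(link.get("href", "")).query).get("next")
            if values:
                return values[0]
    return None


def iter_catalog_features(
    invoke: Callable[[str, dict], dict],
    body: dict,
    max_pages: int = 20,
) -> Iterator[dict]:
    """Yield features from catalog.search, following cursors page by page."""
    request = dict(body)
    for _ in range(max_pages):  # bound the number of requests
        response = invoke("catalog.search", request)
        yield from response.get("features", [])
        cursor = next_cursor(response)
        if cursor is None:
            break  # no next link: this was the last page
        request = {**body, "next": cursor}
```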

## Edge cases and gotchas

- **`bbox` and `intersects` are mutually exclusive.** Sending both returns a `400`; pick one.
- **`intersects` must be a GeoJSON Polygon.** Points, lines, and MultiPolygons are rejected
  with a `422`, as are polygons over 999 vertices or ~50,000 km².
- **Date-only datetimes are expanded for you.** `2024-01-01` becomes the whole day; you do
  not need to add the time component, though full RFC 3339 timestamps are always safe.
- **`host_name` is required for `catalog.search`.** An unknown host returns a `404` listing
  the known hosts. Use `catalog.federated_search` when you do not want to pick one.
- **`limit` caps at 500** for the catalog operations (`stac.search` allows up to 10000).
- **Upstream failures surface as problems.** A commercial host timing out returns `504`; an
  upstream error with no results returns `502`. In federated search, a single source's
  failure is isolated into its `sources[].error` and never fails the whole call.
- **Large result sets can be streamed.** For full multi-vendor coverage with progressive
  results, see `catalog.search_stream`, an NDJSON-streaming variant of `catalog.search`.

All errors are RFC 9457 `problem+json` — see [Errors](/api-reference/errors) for the shape
and status-code catalog. The scopes referenced here (`catalog:read`, `items:read`) are
documented in [Scopes](/api-reference/scopes).
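
A client can decode a problem response into a small typed object before deciding how to react. The member names `type`, `title`, `status`, and `detail` come from RFC 9457. The `Problem` class, the `parse_problem` helper, and the choice to treat `502` and `504` as retryable are assumptions made for this sketch, not documented Geopera behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional

RETRYABLE_STATUSES = {502, 504}  # upstream failures named above; retrying them is an assumption


@dataclass
class Problem:
    status: int
    title: Optional[str]
    detail: Optional[str]
    type: str = "about:blank"  # RFC 9457 default when no type is given

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_problem(status_code: int, raw_body: bytes) -> Problem:
    """Decode a problem+json body, falling back to the HTTP status code."""
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        data = {}
    if not isinstance(data, dict):
        data = {}
    return Problem(
        status=int(data.get("status", status_code)),
        title=data.get("title"),
        detail=data.get("detail"),
        type=data.get("type", "about:blank"),
    )
```

Falling back to the HTTP status keeps the caller's logic working even when a proxy or gateway returns a body that is not JSON.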
