All storefront data access goes through a central GraphQL federation gateway. This gives the storefront a single, composable API surface for all domain data without requiring it to hold per-service gRPC stubs or know which service owns which type.
Service-to-service calls continue to use gRPC directly and do not pass through the gateway. The gateway is a storefront concern, not a general-purpose internal API proxy.
Technology Choice#
The gateway runs Apollo Router — a Rust binary that implements the Apollo Federation specification.
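The JVM ecosystem does not have a production-ready federation gateway. The JVM tools listed below build or annotate subgraphs; only a gateway composes subgraph schemas and executes query plans across them: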
| Tool | What it is |
|---|---|
| Quarkus / SmallRye GraphQL | Subgraph framework — builds a federated subgraph, not a gateway |
| Netflix DGS | Subgraph framework — builds a federated subgraph, not a gateway |
| federation-jvm | Library for adding Federation directives to a JVM schema — not a gateway |
| Apollo Router | Federation gateway — composes subgraph schemas and executes query plans |
| Hive Router | Federation gateway — less mature; Apollo Router chosen for greater production history |
Apollo Router was chosen over Hive Router on the basis of maturity. Both implement the Federation specification, but Apollo Router has broader production adoption and a more complete feature set at the time of this decision.
Deployment Topology#
The router runs as a dedicated service inside the Istio mesh. All traffic between the router and subgraph services is encrypted and mutually authenticated via Istio mTLS — no additional TLS configuration is required at the application layer.
The storefront-service is the only service that calls the gateway. Browsers never call the gateway directly.
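As a rough illustration of the storefront-service to gateway hop, the sketch below builds a GraphQL-over-HTTP request using only the JDK's `java.net.http` client. The host name `graphql-gateway` is taken from the diagram; the port, path, timeout, and the sample query are assumptions made for the example, not values from the platform configuration. The URL uses plain `http` because, as described above, mTLS is handled by the Istio sidecars rather than by application code.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Sketch only: port, path, timeout, and the sample query are illustrative assumptions.
final class GatewayRequestSketch {

    // Hypothetical in-mesh address; plain HTTP because the Istio sidecars handle mTLS.
    static final URI GATEWAY_URI = URI.create("http://graphql-gateway:4000/");

    // Minimal JSON string escaping: quotes, backslashes, and common whitespace escapes.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // GraphQL over HTTP carries the query text in a JSON object under the "query" key.
    static String requestBody(String query) {
        return "{\"query\":" + jsonString(query) + "}";
    }

    // Builds the POST request; sending it would require a live gateway.
    static HttpRequest buildRequest(String query) {
        return HttpRequest.newBuilder(GATEWAY_URI)
                .timeout(Duration.ofSeconds(2))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(query)))
                .build();
    }

    public static void main(String[] args) {
        // "__typename" on the root query type is valid against any GraphQL schema.
        HttpRequest request = buildRequest("{ __typename }");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(requestBody("{ __typename }"));
    }
}
```

A real client would more likely use a JSON library and a shared `HttpClient` instance; the hand-rolled escaping here only keeps the sketch dependency-free.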
flowchart TD
SF[storefront-service] -->|GraphQL over HTTP\nistio mTLS| GW[Apollo Router\ngraphql-gateway]
GW -->|Subgraph query\nistio mTLS| CAT[catalog-service\nsubgraph]
GW -->|Subgraph query\nistio mTLS| INV[inventory-service\nsubgraph]
GW -->|Subgraph query\nistio mTLS| CST[customer-service\nsubgraph]
GW -->|Subgraph query\nistio mTLS| MKT[marketing-service\nsubgraph]
GW -->|Subgraph query\nistio mTLS| N[other subgraphs...]
note1([Browsers never call\nthe gateway directly])
Subgraph Requirements#
Every service that needs to expose data to the storefront must become a subgraph. The requirements per service are:
- Add the quarkus-smallrye-graphql extension to the service's build.
- Annotate the schema with Federation directives (@key, @external, @provides, @requires) as appropriate for the types the service owns or extends.
- _entities and _service resolvers are generated automatically by SmallRye GraphQL when Federation support is enabled. No manual implementation is required.
- Register the subgraph URL in the Apollo Router configuration. This is the only change required outside the service itself.
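To make the role of the generated `_entities` resolver more concrete, here is a conceptual sketch of what such a resolver does under the Federation specification: the router sends a list of representations, each carrying a `__typename` and the entity's key fields, and the subgraph returns the matching entities in the same order. This is not SmallRye GraphQL's implementation, and none of it needs to be written by hand; the `Product` type, its `id` key, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Conceptual sketch only: SmallRye GraphQL generates the real _entities resolver.
final class EntitiesResolverSketch {

    // One lookup function per entity type name; each receives the key fields sent by the router.
    private final Map<String, Function<Map<String, Object>, Map<String, Object>>> resolvers;

    EntitiesResolverSketch(Map<String, Function<Map<String, Object>, Map<String, Object>>> resolvers) {
        this.resolvers = resolvers;
    }

    // Resolves each representation by dispatching on __typename, preserving input order.
    List<Map<String, Object>> entities(List<Map<String, Object>> representations) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> rep : representations) {
            String typeName = (String) rep.get("__typename");
            Function<Map<String, Object>, Map<String, Object>> resolver = resolvers.get(typeName);
            if (resolver == null) {
                throw new IllegalArgumentException("Unknown entity type: " + typeName);
            }
            result.add(resolver.apply(rep));
        }
        return result;
    }

    // Hypothetical subgraph that owns a Product entity keyed by "id".
    static EntitiesResolverSketch withSampleCatalog() {
        Map<String, Object> kettle = new HashMap<>();
        kettle.put("id", "p1");
        kettle.put("name", "Kettle");
        Map<String, Map<String, Object>> productsById = new HashMap<>();
        productsById.put("p1", kettle);

        Function<Map<String, Object>, Map<String, Object>> productResolver =
                rep -> productsById.get((String) rep.get("id"));

        Map<String, Function<Map<String, Object>, Map<String, Object>>> resolvers = new HashMap<>();
        resolvers.put("Product", productResolver);
        return new EntitiesResolverSketch(resolvers);
    }

    public static void main(String[] args) {
        Map<String, Object> representation = new HashMap<>();
        representation.put("__typename", "Product");
        representation.put("id", "p1");
        List<Map<String, Object>> representations = new ArrayList<>();
        representations.add(representation);
        System.out.println(withSampleCatalog().entities(representations));
    }
}
```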
Adding a new service as a subgraph does not require a storefront-service redeployment. The router picks up schema changes through its own reload mechanism.
Persisted Queries#
Store-level code — Lua resolvers and {% fetch %} tags — can only issue registered persisted query IDs. Arbitrary GraphQL strings are rejected with a 400 response.
This restriction serves two purposes:
- Prevents schema harvesting — a malicious or compromised store-level script cannot enumerate the full schema by sending introspection queries or exploratory field selections.
- Prevents N+1 abuse — store-level code cannot construct unbounded queries at runtime; every query that reaches the gateway has been reviewed and registered by the platform team.
Platform-level DataResolvers (implemented in Java, running inside storefront-service) are not subject to this restriction. They communicate with the gateway using the full GraphQL protocol because they are platform-controlled code, not store-operator-controlled code.
| Caller | Query mode | Rationale |
|---|---|---|
| Store-level Lua resolver | Persisted query ID only | Untrusted; operator-supplied code |
| Store-level {% fetch %} tag | Persisted query ID only | Untrusted; operator-supplied code |
| Platform DataResolver (Java) | Full GraphQL | Trusted; platform-controlled code |
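The policy in the table can be sketched as a small decision function. This is a minimal illustration, not the platform's enforcement code: the source does not say where the check runs, so the class, the `Caller` enum, the in-memory registry, and the ID format are all hypothetical. Returning 400 for an unregistered ID is also an assumption; the source specifies 400 only for arbitrary GraphQL strings.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: caller kinds, registry, and ID format are illustrative assumptions.
final class PersistedQueryGateSketch {

    enum Caller { STORE_LUA_RESOLVER, STORE_FETCH_TAG, PLATFORM_DATA_RESOLVER }

    // Outcome of the check: an HTTP-style status plus the query text to forward (null when rejected).
    static final class Decision {
        final int status;
        final String query;

        Decision(int status, String query) {
            this.status = status;
            this.query = query;
        }
    }

    // Hypothetical registry of reviewed queries, keyed by persisted query ID.
    private final Map<String, String> registry;

    PersistedQueryGateSketch(Map<String, String> registry) {
        this.registry = new HashMap<>(registry);
    }

    // Store-level callers may pass only a registered ID; platform callers may pass raw GraphQL.
    Decision check(Caller caller, String persistedQueryId, String rawQuery) {
        boolean trusted = caller == Caller.PLATFORM_DATA_RESOLVER;
        if (rawQuery != null) {
            // Arbitrary GraphQL text is accepted only from platform-controlled code.
            return trusted ? new Decision(200, rawQuery) : new Decision(400, null);
        }
        String registered = persistedQueryId == null ? null : registry.get(persistedQueryId);
        // Assumption: an unknown or missing ID is rejected the same way as a raw query.
        return registered == null ? new Decision(400, null) : new Decision(200, registered);
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("pq-001", "{ __typename }");
        PersistedQueryGateSketch gate = new PersistedQueryGateSketch(registry);

        // A store-level script attempting introspection is rejected.
        System.out.println(gate.check(Caller.STORE_LUA_RESOLVER, null, "{ __schema { types { name } } }").status);
        // The same script using a registered ID is allowed.
        System.out.println(gate.check(Caller.STORE_FETCH_TAG, "pq-001", null).status);
    }
}
```

Keeping the decision in one pure function makes the trust boundary easy to unit test independently of the HTTP layer.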
Observability#
Apollo Router exports OpenTelemetry traces natively to the same OTLP endpoint used by all other platform services. No additional instrumentation adapter is required.
Per-subgraph latency and error rate metrics are included in the router’s OTel output and land in the Datadog pipeline alongside metrics from all other services. This makes it possible to identify which subgraph is contributing to elevated gateway latency without inspecting individual service logs.