ShopSTAR3 uses a split communication model: asynchronous event-driven messaging for domain events and integrations, synchronous gRPC for internal request/response calls, and REST for external and admin-facing APIs.
Async Messaging: Kafka
Apache Kafka (via AWS MSK) is the platform’s event bus.
All domain events (order placed, inventory updated, customer data changed, etc.) and integration triggers are published to Kafka topics. Kafka’s durable, replayable log model serves two platform goals simultaneously:
- Integration-ready: external systems subscribe to topics and react to events without polling.
- BI-ready: BI tools consume event streams directly from topics, with no intermediate ETL pipeline required.
Quarkus services use the SmallRye Reactive Messaging extension with the Kafka connector for producers and consumers.
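The replayable-log property described above is what lets an integration consumer and a BI consumer read the same events independently. The following is a minimal in-memory sketch of that idea, not Kafka or SmallRye code: the class `ReplayableLog`, its methods, and the group names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for one topic partition: an append-only, replayable log. */
class ReplayableLog {
    private final List<String> records = new ArrayList<>();
    // Each consumer group tracks its own offset, independently of the others.
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Appends an event and returns the offset it was stored at. */
    int publish(String event) {
        records.add(event);
        return records.size() - 1;
    }

    /** Returns the records this group has not read yet and advances its offset. */
    List<String> poll(String group) {
        int from = offsets.getOrDefault(group, 0);
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        offsets.put(group, records.size());
        return batch;
    }

    /** Rewinds (or fast-forwards) a group so retained records can be read again. */
    void seek(String group, int offset) {
        if (offset < 0 || offset > records.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        offsets.put(group, offset);
    }

    public static void main(String[] args) {
        ReplayableLog orders = new ReplayableLog();
        orders.publish("OrderPlaced:1001");
        orders.publish("OrderPlaced:1002");

        // Both groups see every event; reading does not remove anything from the log.
        System.out.println(orders.poll("erp-integration"));
        System.out.println(orders.poll("bi-dashboard"));

        // The BI group replays from the start, e.g. to rebuild a report.
        orders.seek("bi-dashboard", 0);
        System.out.println(orders.poll("bi-dashboard"));
    }
}
```

Because every group keeps its own offset, adding a new subscriber does not affect existing ones, and any subscriber can rewind to reprocess retained events.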
Synchronous Communication: gRPC / REST
| Boundary | Protocol | Serialisation |
|---|---|---|
| Service → Service (internal) | gRPC | Protobuf |
| Admin SPA → Backend | REST | JSON |
| External integrations → Platform | REST | JSON |
Internally, gRPC provides strongly typed service contracts, efficient binary serialisation, and built-in streaming. Quarkus services expose and consume gRPC via the quarkus-grpc extension.
Externally, REST keeps the admin API and any public-facing surface accessible without Protobuf tooling on the client side.
Schema & Contract Management
All Kafka event payloads and gRPC message types are defined in Protobuf. Schemas are managed in a dedicated ss3-protos Git repository and enforced through a self-hosted Apicurio Registry instance.
- Kafka event schemas use FULL compatibility: no field removals, no type changes, additions only. Enforced by `buf breaking` in CI (pre-merge) and by Apicurio server-side (post-merge).
- REST OpenAPI specs use BACKWARD compatibility. They are registered in Apicurio at build time by each service’s CI pipeline.
- gRPC protos are enforced at build time via code generation from `ss3-protos`; they are not registered in the schema registry.
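To make the "no field removals, no type changes, additions only" rule concrete, here is a simplified sketch that compares two versions of a message. It is an illustration only: it models a schema as a map from field number to type name, and the class `CompatibilityCheck` is hypothetical. The real checks performed by `buf breaking` and Apicurio cover considerably more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model: a schema is a map of Protobuf field number to declared type. */
class CompatibilityCheck {

    /** Lists violations of the "additions only" rule when moving from oldSchema to newSchema. */
    static List<String> violations(Map<Integer, String> oldSchema, Map<Integer, String> newSchema) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, ascending field-number order for the report.
        for (Map.Entry<Integer, String> field : new TreeMap<>(oldSchema).entrySet()) {
            String newType = newSchema.get(field.getKey());
            if (newType == null) {
                problems.add("field " + field.getKey() + " was removed");
            } else if (!newType.equals(field.getValue())) {
                problems.add("field " + field.getKey() + " changed type from "
                        + field.getValue() + " to " + newType);
            }
        }
        // Fields that exist only in newSchema are additions and are allowed.
        return problems;
    }

    public static void main(String[] args) {
        Map<Integer, String> v1 = Map.of(1, "string", 2, "int64");
        Map<Integer, String> v2 = Map.of(1, "string", 2, "int64", 3, "bool"); // addition only
        Map<Integer, String> v3 = Map.of(1, "string", 2, "string");           // type change

        System.out.println(violations(v1, v2)); // prints []
        System.out.println(violations(v1, v3)); // prints [field 2 changed type from int64 to string]
    }
}
```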
See Schema Registry for the full specification.
Service Mesh: Istio
Istio is the service mesh, deployed on Kubernetes across all environments.
Istio was chosen over AWS App Mesh because it runs the same way on AWS, Azure, and GCP, in line with the platform’s cloud portability requirement.
What Istio Provides
| Capability | How |
|---|---|
| mTLS | Envoy sidecars automatically encrypt and mutually authenticate all inter-service traffic. No plaintext internal calls. |
| Traffic splitting | Canary and blue/green rollouts controlled at the network layer via VirtualService rules, independent of application code. |
| Network authz | AuthorizationPolicy resources enforce which services are permitted to call which. Policy-driven, not convention-driven. |
| L7 observability | Envoy sidecars export per-route request rates, error rates, and latency into the existing OTel collector pipeline automatically. |
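In Istio, traffic splitting is declared as weights on the route destinations of a VirtualService and applied by the Envoy sidecars, so no application code is involved. The plain-Java sketch below only illustrates what a 90/10 weighted split means. The class `WeightedRouter` and the subset names are hypothetical, and the sketch uses a deterministic request counter where a real proxy would choose per request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative weighted routing table, e.g. 90% stable and 10% canary. */
class WeightedRouter {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int total;

    /** Registers (or updates) a destination subset with a non-negative weight. */
    WeightedRouter add(String subset, int weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must not be negative");
        }
        Integer previous = weights.put(subset, weight);
        total += weight - (previous == null ? 0 : previous);
        return this;
    }

    /** Maps a ticket number onto a subset by walking the cumulative weights. */
    String route(int ticket) {
        if (total == 0) {
            throw new IllegalStateException("no weighted destinations configured");
        }
        int slot = Math.floorMod(ticket, total);
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (slot < cumulative) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("unreachable: slot is always below total");
    }

    public static void main(String[] args) {
        WeightedRouter router = new WeightedRouter().add("stable", 90).add("canary", 10);
        int canary = 0;
        for (int request = 0; request < 1000; request++) {
            if (router.route(request).equals("canary")) {
                canary++;
            }
        }
        System.out.println(canary); // prints 100
    }
}
```

Shifting a rollout from 10% to 50% canary would then be a change to the weights alone, which mirrors the point in the table that rollouts are independent of application code.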
Layering with Quarkus Fault Tolerance
Istio and Quarkus SmallRye Fault Tolerance are complementary, not redundant:
- Quarkus (application layer): handles retries and circuit breaking for calls the application initiates. It is aware of business context (e.g. it can fall back to a cached value).
- Istio (network layer): handles infrastructure-level retries during rolling deploys, pod restarts, and transient network failures before the request ever reaches application code.
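The application-layer half of this split can be sketched in plain Java. In a Quarkus service the same behaviour would normally be declared with the MicroProfile Fault Tolerance annotations (`@Retry`, `@CircuitBreaker`, `@Fallback`) rather than written by hand. The helper below is a hypothetical illustration of retry-then-fallback only; it omits circuit breaking and backoff delays.

```java
import java.util.function.Supplier;

/** Hand-rolled illustration of "retry, then fall back to a cached value". */
class ResilientCall {

    /** Tries the call up to maxAttempts times; if every attempt fails, uses the fallback. */
    static <T> T withRetryAndFallback(Supplier<T> call, int maxAttempts, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Treated as transient: try again. A real policy would wait between attempts.
            }
        }
        // Business-aware fallback, e.g. the last known stock level from a cache.
        return fallback.get();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical inventory lookup that fails twice before succeeding.
        Supplier<Integer> flakyLookup = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("inventory service unavailable");
            }
            return 42;
        };

        System.out.println(withRetryAndFallback(flakyLookup, 3, () -> -1)); // prints 42

        calls[0] = 0;
        // With only two attempts the call never succeeds, so the cached value is used.
        System.out.println(withRetryAndFallback(flakyLookup, 2, () -> 7)); // prints 7
    }
}
```

The fallback is the part that only the application can provide, since the mesh has no notion of what a sensible cached value would be.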