All secrets and runtime configuration for ShopSTAR3 services are managed through a self-hosted HashiCorp Vault cluster. Services never store secrets in environment files, container images, or Git in plaintext. Vault is the single authoritative source for every credential and configuration value the platform needs at runtime.
Vault Infrastructure
| Property | Value |
|---|---|
| Deployment | Self-hosted, 3-node Raft cluster |
| Auto-unseal | AWS KMS — Vault unseals automatically on pod restart without operator intervention |
| Storage | EBS-backed PersistentVolumeClaims per node |
| HA | Raft consensus; one leader, two followers; quorum requires two nodes |
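The quorum figure in the table follows from the usual majority rule for Raft clusters. A minimal sketch of that arithmetic (the function names are illustrative and are not part of Vault):

```python
def raft_quorum(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can be lost while a majority still remains."""
    return nodes - raft_quorum(nodes)
```

For the 3-node cluster described here, `raft_quorum(3)` is 2 and `tolerated_failures(3)` is 1, so the cluster stays available through the loss of a single node.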
Config Delivery to Services
Services consume secrets through the quarkus-vault extension, which registers Vault as a MicroProfile Config source. The process is:
- The Quarkus application starts and the `quarkus-vault` extension authenticates to Vault using the pod's Kubernetes ServiceAccount token.
- Vault returns the secrets assigned to that service's role.
- The extension resolves any `${}` interpolation expressions in `application.properties` using the fetched values.
- The application receives fully resolved environment variables with no further indirection.
Application code requires no changes to pick up a rotated secret: updating the value stored at the Vault path and restarting the pod is sufficient. Business logic contains no SDK or Vault client calls.
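The interpolation step can be pictured with a small stand-alone sketch. This is not how the extension is implemented; it only illustrates the substitution idea, and the key names (`host`, `port`, `database`) and values are assumptions made for the example.

```python
import re

# Matches ${key}; the key may contain dots and dashes but no braces.
_EXPR = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, secrets: dict[str, str]) -> str:
    """Replace every ${key} in `value` with the matching fetched secret."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in secrets:
            # Fail loudly instead of leaving an unresolved expression behind.
            raise KeyError(f"no value fetched for {key!r}")
        return secrets[key]

    return _EXPR.sub(substitute, value)


# Hypothetical values, standing in for what Vault would return.
fetched = {"host": "db.internal", "port": "5432", "database": "orders"}
jdbc_url = resolve("jdbc:postgresql://${host}:${port}/${database}", fetched)
```

Here `jdbc_url` ends up as `jdbc:postgresql://db.internal:5432/orders`. Raising on an unknown key mirrors the platform's preference for a clean failure over a partially resolved configuration.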
flowchart LR
    POD([Service Pod]) -->|K8s ServiceAccount token| VA[Vault\nKubernetes Auth]
    VA -->|Vault token| KV[KV Secrets Engine]
    KV -->|Secrets| QV[quarkus-vault\nextension]
    QV -->|Resolved env vars| APP[Application\nconfig]
Path Structure
Vault paths follow a two-tier layout. Shared infrastructure configuration lives under a common prefix. Service-specific secrets live under a per-service prefix.
| Path | Contents |
|---|---|
| `ss3/kv/shared/kafka` | Kafka bootstrap server addresses (AWS MSK endpoints) |
| `ss3/kv/shared/otel` | OTel exporter endpoint (`OTEL_EXPORTER_OTLP_ENDPOINT`) |
| `ss3/kv/shared/identity` | JWKS URL published by identity-service |
| `ss3/kv/{service-name}/db` | PostgreSQL host, port, database name, username, password |
| `ss3/kv/{service-name}/api-keys` | Third-party API credentials specific to the service |
| `ss3/kv/{service-name}/*` | Any other service-specific secrets |
Every service has read access to its own path and to ss3/kv/shared/*. No service can read another service’s path.
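The access rule above can be modelled as a simple prefix check. This sketch only mirrors the rule as stated; it is not Vault's policy engine, and the service names used with it (`orders-service`, `payments-service`) are hypothetical.

```python
SHARED_PREFIX = "ss3/kv/shared/"


def readable_prefixes(service: str) -> tuple[str, str]:
    """The two path prefixes a service is allowed to read."""
    return (f"ss3/kv/{service}/", SHARED_PREFIX)


def can_read(service: str, path: str) -> bool:
    """True if `path` falls under the service's own prefix or the shared one."""
    # str.startswith accepts a tuple and matches any of its entries.
    return path.startswith(readable_prefixes(service))
```

With this model, `can_read("orders-service", "ss3/kv/shared/kafka")` is true while `can_read("orders-service", "ss3/kv/payments-service/db")` is false. The trailing slash in each prefix keeps a service from matching a sibling whose name merely starts with its own.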
Kubernetes Authentication
Each service is assigned a dedicated Kubernetes ServiceAccount. A corresponding Vault role is created that:
- Accepts tokens from that specific ServiceAccount in the correct namespace.
- Grants a read-only policy scoped to `ss3/kv/{service-name}/*` and `ss3/kv/shared/*`.
- Has no write or delete permissions.
This means a compromised service can read only its own secrets and the shared configuration. It cannot modify Vault state or read credentials belonging to another service.
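For orientation, the login exchange that the extension performs on the service's behalf looks roughly like the sketch below. It assumes the Kubernetes auth method is mounted at Vault's default path `kubernetes`; the Vault address and role name passed in are hypothetical, and services on this platform do not need to write this code themselves.

```python
import json
import urllib.request

# Default location where Kubernetes mounts the ServiceAccount token in a pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the POST that trades a ServiceAccount JWT for a Vault token."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr: str, role: str, token_path: str = TOKEN_PATH) -> str:
    """Read the pod's token, log in, and return the Vault client token."""
    with open(token_path, encoding="utf-8") as handle:
        jwt = handle.read().strip()
    with urllib.request.urlopen(build_login_request(vault_addr, role, jwt)) as response:
        return json.load(response)["auth"]["client_token"]
```

Vault checks the JWT against the role's bound ServiceAccount and namespace before issuing a token, which is what ties the read-only policy to one specific service.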
GitOps Secret Lifecycle
Secrets are managed as code but never stored in plaintext. The flow is:
flowchart TD
    ENG([Engineer]) -->|Encrypts with SOPS + AWS KMS| GIT[(Git Repository\nSOPS-encrypted files)]
    GIT -->|Merge to main triggers| JEN[Jenkins Pipeline]
    JEN -->|Decrypts via AWS KMS| PLAIN[Plaintext secrets\nin pipeline memory]
    PLAIN -->|vault kv put| VLT[(HashiCorp Vault)]
    VLT -->|Read at pod startup| SVC([Service Pod])
- Engineers encrypt secret values using SOPS with AWS KMS as the key provider.
- The encrypted files are committed to Git. Git is the source of record for the secrets lifecycle — creation, rotation, and deletion are all tracked as commits.
- On merge to `main`, a Jenkins pipeline step decrypts the SOPS-encrypted files using the KMS key (the pipeline role has KMS decrypt permission) and writes the plaintext values into Vault via `vault kv put`.
- The decrypted values exist only in pipeline memory and are never written to disk or logged.
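One way the decrypt-and-write step could keep plaintext off disk is to pipe the SOPS output straight into the Vault CLI. The sketch below is an assumption about how such a step might look, not the actual Jenkins pipeline: the file name and the mapping from file to Vault path are hypothetical, and passing `-` so that `vault kv put` reads a JSON document from stdin should be checked against the CLI version in use.

```python
import subprocess


def sops_decrypt_cmd(encrypted_file: str) -> list[str]:
    """Command that prints the decrypted file as JSON on stdout."""
    return ["sops", "--decrypt", "--output-type", "json", encrypted_file]


def vault_put_cmd(vault_path: str) -> list[str]:
    """Command that writes key/value pairs read from stdin to `vault_path`."""
    # "-" asks the Vault CLI to read the data from stdin, so secret
    # values never appear in the process argument list.
    return ["vault", "kv", "put", vault_path, "-"]


def sync_secret(encrypted_file: str, vault_path: str) -> None:
    """Decrypt one SOPS file and write its contents to Vault via a pipe."""
    decrypted = subprocess.run(
        sops_decrypt_cmd(encrypted_file), check=True, capture_output=True
    ).stdout
    # Discard stdout so nothing about the write reaches the build log.
    subprocess.run(
        vault_put_cmd(vault_path),
        input=decrypted,
        check=True,
        stdout=subprocess.DEVNULL,
    )
```

A call such as `sync_secret("orders-service/db.enc.json", "ss3/kv/orders-service/db")` would then hold the plaintext only in the pipeline process's memory, in line with the last step of the list above.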
Startup Contract
A service pod that cannot reach Vault at startup will fail to start. This is intentional. Partial configuration — where some secrets are available and others are not — is treated as an unacceptable state. A pod that starts with incomplete configuration can behave unpredictably in ways that are harder to diagnose than a clean startup failure.
Kubernetes liveness and readiness probes catch the failed pod and surface it through standard cluster health tooling. The pod will not receive traffic until it starts successfully with a complete configuration.
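The all-or-nothing rule can be expressed as a small guard that runs before anything else at startup. This is an illustrative sketch only; the exception name and the key names in the usage note are hypothetical, and in the real services the equivalent behaviour comes from configuration resolution failing at boot.

```python
class IncompleteConfigError(RuntimeError):
    """Raised when one or more required configuration values are absent."""


def require_complete(config: dict[str, str], required: list[str]) -> dict[str, str]:
    """Return `config` unchanged, or fail if any required key is missing or empty."""
    missing = sorted(key for key in required if not config.get(key))
    if missing:
        # Report every missing key at once so a single failed start is
        # enough to see the whole gap.
        raise IncompleteConfigError(
            "missing configuration keys: " + ", ".join(missing)
        )
    return config
```

Calling `require_complete({"db.host": "db.internal"}, ["db.host", "db.password"])` raises with `db.password` named in the message, so the process exits before it can serve traffic with partial configuration.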