Config Management

All secrets and runtime configuration for ShopSTAR3 services are managed through a self-hosted HashiCorp Vault cluster. Services never store plaintext secrets in environment files, container images, or Git. Vault is the single authoritative source for every credential and configuration value the platform needs at runtime.

Vault Infrastructure

| Property | Value |
| --- | --- |
| Deployment | Self-hosted, 3-node Raft cluster |
| Auto-unseal | AWS KMS: Vault unseals automatically on pod restart without operator intervention |
| Storage | EBS-backed PersistentVolumeClaims per node |
| HA | Raft consensus; one leader, two followers; quorum requires two nodes |
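
The quorum figure follows from Raft's majority rule. A minimal sketch of the arithmetic, with illustrative function names that are not part of any Vault tooling:

```python
def raft_quorum(nodes: int) -> int:
    """Smallest majority of a Raft cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can be lost while a majority remains."""
    return nodes - raft_quorum(nodes)


if __name__ == "__main__":
    print(raft_quorum(3), tolerated_failures(3))
```

For the 3-node cluster this gives a quorum of two, so the cluster stays available with one node down.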

Config Delivery to Services

Services consume secrets through the quarkus-vault extension, which registers Vault as a MicroProfile Config source. The process is:

  1. The Quarkus application starts and the quarkus-vault extension authenticates to Vault using the pod’s Kubernetes ServiceAccount token.
  2. Vault validates the token and issues a Vault token bound to that service’s role; the extension uses it to read the secrets the role permits.
  3. The extension resolves any ${} interpolation expressions in application.properties using the fetched values.
  4. The application receives fully resolved configuration values with no further indirection.
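
Step 3 can be modelled in a few lines. The sketch below is not the quarkus-vault implementation; it only illustrates how `${}` expressions are replaced with fetched values. The secret key names (`host`, `port`, `database`) and the hostname are assumptions made for the example.

```python
import re

# Matches a simple ${key} expression (no defaults or nesting in this sketch).
_EXPR = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, fetched: dict[str, str]) -> str:
    """Replace every ${key} in `value` with the matching fetched secret."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in fetched:
            # An unresolved expression is an error, never a silent blank.
            raise KeyError(f"no value for expression ${{{key}}}")
        return fetched[key]

    return _EXPR.sub(substitute, value)


if __name__ == "__main__":
    # Hypothetical secret keys, as they might be stored under a service's db path.
    fetched = {"host": "db.internal", "port": "5432", "database": "orders"}
    raw = "jdbc:postgresql://${host}:${port}/${database}"
    print(resolve(raw, fetched))
```

Raising on an unknown key, rather than substituting an empty string, mirrors the platform's preference for a clean startup failure over partial configuration.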

Application code requires no changes to pick up a rotated secret: because secrets are read at startup, updating the value in Vault and restarting the pod is sufficient. Business logic contains no SDK or Vault client calls.

```mermaid
flowchart LR
    POD([Service Pod]) -->|K8s ServiceAccount token| VA[Vault\nKubernetes Auth]
    VA -->|Vault token| KV[KV Secrets Engine]
    KV -->|Secrets| QV[quarkus-vault\nextension]
    QV -->|Resolved config values| APP[Application\nconfig]
```

Path Structure

Vault paths follow a two-tier layout. Shared infrastructure configuration lives under a common prefix. Service-specific secrets live under a per-service prefix.

| Path | Contents |
| --- | --- |
| ss3/kv/shared/kafka | Kafka bootstrap server addresses (AWS MSK endpoints) |
| ss3/kv/shared/otel | OTel exporter endpoint (OTEL_EXPORTER_OTLP_ENDPOINT) |
| ss3/kv/shared/identity | JWKS URL published by identity-service |
| ss3/kv/{service-name}/db | PostgreSQL host, port, database name, username, password |
| ss3/kv/{service-name}/api-keys | Third-party API credentials specific to the service |
| ss3/kv/{service-name}/* | Any other service-specific secrets |

Every service has read access to its own path and to ss3/kv/shared/*. No service can read another service’s path.
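
The access rule reduces to a prefix check. A minimal sketch, assuming hypothetical service names (`orders-service`, `payments-service`) and illustrative helper names; Vault itself enforces the rule through policies, not through code like this:

```python
SHARED_PREFIX = "ss3/kv/shared/"


def service_prefix(service_name: str) -> str:
    """Per-service prefix, with a trailing slash so 'orders' never matches 'orders-service'."""
    return f"ss3/kv/{service_name}/"


def can_read(service_name: str, path: str) -> bool:
    """True if `path` lies under the service's own prefix or the shared prefix."""
    return path.startswith(service_prefix(service_name)) or path.startswith(SHARED_PREFIX)


if __name__ == "__main__":
    for path in ("ss3/kv/orders-service/db", "ss3/kv/shared/kafka", "ss3/kv/payments-service/db"):
        print(path, can_read("orders-service", path))
```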

Kubernetes Authentication

Each service is assigned a dedicated Kubernetes ServiceAccount. A corresponding Vault role is created that:

  • Accepts tokens from that specific ServiceAccount in the correct namespace.
  • Grants a read-only policy scoped to ss3/kv/{service-name}/* and ss3/kv/shared/*.
  • Has no write or delete permissions.

This means a compromised service can read only its own secrets and the shared configuration. It cannot modify Vault state or read credentials belonging to another service.
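
The role's two checks, who may log in and what the resulting token may do, can be modelled as follows. This is an illustrative model rather than Vault configuration; it assumes the ServiceAccount is named after the service and uses a hypothetical namespace `ss3`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultRole:
    service_account: str
    namespace: str
    read_prefixes: tuple[str, ...]

    def accepts(self, service_account: str, namespace: str) -> bool:
        """Login succeeds only for the bound ServiceAccount in the bound namespace."""
        return (service_account, namespace) == (self.service_account, self.namespace)

    def allows(self, operation: str, path: str) -> bool:
        """Only reads are permitted, and only under the granted prefixes."""
        if operation != "read":
            return False
        return any(path.startswith(prefix) for prefix in self.read_prefixes)


def role_for(service_name: str, namespace: str) -> VaultRole:
    # Assumption: one ServiceAccount per service, named after the service.
    return VaultRole(
        service_account=service_name,
        namespace=namespace,
        read_prefixes=(f"ss3/kv/{service_name}/", "ss3/kv/shared/"),
    )
```

With `role = role_for("orders-service", "ss3")`, a write such as `role.allows("update", "ss3/kv/orders-service/db")` is rejected even though the path is the service's own.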

GitOps Secret Lifecycle

Secrets are managed as code but never stored in plaintext. The flow is:

```mermaid
flowchart TD
    ENG([Engineer]) -->|Encrypts with SOPS + AWS KMS| GIT[(Git Repository\nSOPS-encrypted files)]
    GIT -->|Merge to main triggers| JEN[Jenkins Pipeline]
    JEN -->|Decrypts via AWS KMS| PLAIN[Plaintext secrets\nin pipeline memory]
    PLAIN -->|vault kv put| VLT[(HashiCorp Vault)]
    VLT -->|Read at pod startup| SVC([Service Pod])
```

  1. Engineers encrypt secret values using SOPS with AWS KMS as the key provider.
  2. The encrypted files are committed to Git. Git is the source of record for the secrets lifecycle — creation, rotation, and deletion are all tracked as commits.
  3. On merge to main, a Jenkins pipeline step decrypts the SOPS-encrypted files using the KMS key (the pipeline role has KMS decrypt permission) and writes the plaintext values into Vault via vault kv put.
  4. The decrypted values exist only in pipeline memory and are never written to disk or logged.
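
Steps 3 and 4 might look like the sketch below, which pipes the decrypted JSON from `sops` straight into `vault kv put` without touching disk. The actual Jenkins step is not shown in this document, so the exact invocations, the function names, and the file name in the usage note are assumptions. A lone `-` argument asks the Vault CLI to read the key/value data as JSON from stdin.

```python
import subprocess


def decrypt_command(encrypted_file: str) -> list[str]:
    # Emit the decrypted document as JSON on stdout.
    return ["sops", "--decrypt", "--output-type", "json", encrypted_file]


def put_command(vault_path: str) -> list[str]:
    # "-" makes the Vault CLI read the secret data as JSON from stdin.
    return ["vault", "kv", "put", vault_path, "-"]


def sync_secret(encrypted_file: str, vault_path: str, run=subprocess.run) -> None:
    """Decrypt one SOPS file and write it to Vault, keeping plaintext in memory only."""
    decrypted = run(decrypt_command(encrypted_file), check=True, capture_output=True)
    # capture_output keeps both commands' output out of the build log.
    run(put_command(vault_path), input=decrypted.stdout, check=True, capture_output=True)
```

A pipeline would call, for example, `sync_secret("orders-service/db.enc.json", "ss3/kv/orders-service/db")`. The `run` parameter exists only so the function can be exercised without the real CLIs installed.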

Startup Contract

A service pod that cannot reach Vault at startup will fail to start. This is intentional. Partial configuration — where some secrets are available and others are not — is treated as an unacceptable state. A pod that starts with incomplete configuration can behave unpredictably in ways that are harder to diagnose than a clean startup failure.

A pod that fails to start never becomes ready, so Kubernetes keeps it out of service endpoints and it receives no traffic until it starts successfully with a complete configuration. The failure surfaces through standard cluster health tooling.
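
The all-or-nothing contract can be illustrated with a small loader that either returns every required value or raises. In the platform this behaviour comes from the quarkus-vault extension at startup; the `fetch` callable and the key names below are hypothetical stand-ins.

```python
class IncompleteConfigError(RuntimeError):
    """Raised when any required configuration value is unavailable."""


def load_config(fetch, required_keys):
    """Return all required values, or raise without exposing a partial result."""
    values, missing = {}, []
    for key in required_keys:
        value = fetch(key)
        if value is None:
            missing.append(key)
        else:
            values[key] = value
    if missing:
        # Report key names only; secret values never appear in the message.
        raise IncompleteConfigError("missing configuration: " + ", ".join(missing))
    return values


if __name__ == "__main__":
    store = {"db.host": "db.internal", "db.password": "example"}
    print(sorted(load_config(store.get, ["db.host", "db.password"])))
```

Collecting every missing key before raising gives one diagnostic per failed start instead of one per restart.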