Stop shipping secrets in Git — the AKS Workload Identity + External Secrets pattern I use
Krzysztof Kozłowski
I'll be honest. For a long time I used Kubernetes Secrets without ever stopping to ask what was actually inside them.
It looked safe. The resource is literally called Secret. The values are "encoded." There are dedicated RBAC verbs. The platform was clearly treating them as sensitive — right?
It wasn't.
A Kubernetes Secret is base64-encoded plaintext sitting in etcd. By default, etcd isn't encrypted at rest. By default, anyone with get secrets in a namespace can decode them with base64 -d. By default, every pod that mounts the secret has the plaintext value on disk inside the container.
The word "secret" in the API does not mean "encrypted." It means "we agreed not to print this to stdout."
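To make that concrete, here's the whole "attack" as a one-liner. A sketch using the secret name this post ends up creating (swap in any secret your RBAC lets you get):

```bash
# No exploit needed: RBAC 'get' on secrets plus base64 -d is full plaintext access.
kubectl get secret app-db-credentials -n app -o jsonpath='{.data.DB_PASSWORD}' | base64 -d
```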
So I stopped putting them in the cluster.
Here's the pattern I've landed on — end to end. The pod reaches into Azure Key Vault using its own federated identity, and the values it needs show up in the application without ever passing through a YAML file. No kubectl create secret. No kubeseal. No sealed-secrets sync. No vault-injector sidecar.
It's not a simple setup. But once it's running, a whole class of risk goes away.
What's actually wrong with Kubernetes Secrets
Five things, in order of how often they cause real problems:
1. They're plaintext in etcd unless you turn on encryption at rest. AKS has had envelope encryption with customer-managed keys for a while, but it's opt-in, and most clusters I've seen don't have it enabled. EKS and GKE are the same story. The default is plaintext.
2. Any kubectl get secrets reader in a namespace can decode every value. This is the killer. RBAC for get secrets ends up granted broadly — to operators, to debugging dashboards, to oncall users, to CI/CD pipelines, to ArgoCD. If any one of those bindings has too much access, it reads every secret in that scope.
3. Rotation is manual or non-existent. When a database password rotates in Key Vault, the value in the cluster doesn't change. You either rebuild the secret (and bounce pods), or you build a sidecar that polls. Neither is included with the platform.
4. They show up in places they shouldn't. Helm release history. ArgoCD diff views. CI logs of kubectl apply -f. Pod descriptions if anyone runs kubectl describe pod against the wrong thing. Backup tools. Each is a chance for a base64-encoded password to land in a Slack thread.
5. They make GitOps harder, not easier. You can't put real secrets in Git, so you reach for sealed-secrets or SOPS, both of which are fine but introduce their own key management problem. Now you have two secret stores instead of one.
Three of these five — rotation, leak surface, GitOps friction — go away with the pattern below. Two do not. The K8s Secret you end up with is still base64 plaintext in etcd, and anyone with get secrets in your namespace can still read the value.
So the goal isn't "make the K8s Secret stronger."
The goal is to make it the least important place where that value lives.
The architecture
The mental model: the pod proves who it is to Azure, and Azure hands it the secret directly. There is no shared password between cluster and cloud. There is no API key that the cluster owns.
The cluster never holds a credential it can leak, because it never held one in the first place.
The flow in words:
- AKS issues an OIDC token for the pod's ServiceAccount.
- Azure trusts that OIDC token via a federated credential attached to a User-Assigned Managed Identity.
- External Secrets Operator runs inside the cluster and uses the same Workload Identity mechanism to authenticate to Key Vault.
- ESO reads the value from Key Vault on a schedule, writes a regular Kubernetes Secret into the app namespace.
- The app pod mounts that secret like any other secret. The difference: nobody hand-wrote the value, nobody put it in Git, and rotation in Key Vault propagates automatically.
The K8s Secret still exists, as a short-lived cache at the end of the chain — that's how the app's container env vars and volume mounts work. It is derived state, not the source of truth.
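If you want to see the identity half of this flow with your own eyes, you can dump the claims of the projected token once a pod is running. A quick-and-dirty sketch using the app pod from the walkthrough below (assumes the image has cat; the unpadded base64 makes GNU base64 grumble, hence the redirect):

```bash
# Print the payload of the federated token (the second segment of the JWT).
kubectl exec -n app deploy/app -- cat /var/run/secrets/azure/tokens/azure-identity-token \
  | cut -d. -f2 | base64 -d 2>/dev/null
# Expect: "iss" = the AKS OIDC issuer URL, "sub" = system:serviceaccount:app:app-sa,
# "aud" = api://AzureADTokenExchange. These are exactly what the federated credential matches on.
```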
But wait — isn't the K8s Secret still base64?
Yes. The Secret that ESO writes is exactly as decodable as one you create by hand. Anyone with get secrets in that namespace can read the current value.
This catches most people off guard. It caught me off guard.
The setup isn't protecting the current value inside the cluster. It's removing every other place where that value used to live — and shrinking the lifetime of what's left.
Before, the database password lived in a sealed-secret blob in Git, in the CI log of the last kubectl apply, in the Service Principal client secret your app used to pull it, in a few pods that mounted it weeks ago, and in your kubectl get secret history in zsh. Forever, in all of those places.
After, the value lives in Key Vault. It syncs to one K8s Secret per namespace, refreshed hourly. There is no Git history. There is no Service Principal client secret. There is no "shared cluster credential to Azure." The K8s Secret is a one-hour-old cache. You can blow it away and ESO rebuilds it.
You didn't make base64 stronger. You made the K8s Secret the least important copy of that value in your system.
If an attacker gets kubectl get secrets in your namespace, they read the current password. Rotate in Key Vault, the leak ages out in an hour, and Azure Activity Log tells you exactly what their identity touched. Compare that to a sealed-secret leak: the attacker has the value forever, you can't tell if anyone ever decoded it, and rotation means a Git commit dance.
That's the trade.
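In practice the rotation response is two commands. A sketch with the vault and resource names used in the walkthrough below; the force-sync annotation is ESO's documented way to trigger an immediate reconcile instead of waiting out refreshInterval:

```bash
# 1. Rotate at the source of truth.
az keyvault secret set --vault-name kv-prod --name app-prod-db-password \
  --value "$NEW_PASSWORD"

# 2. Optionally nudge ESO to resync now rather than on the next interval.
kubectl annotate externalsecret app-db-credentials -n app \
  force-sync=$(date +%s) --overwrite
```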
What the app looks like before and after
Before the setup, the punchline: the same application Deployment, talking to Key Vault, before and after the switch.
Before — Service Principal with a client secret, baked into env vars:
```yaml
spec:
  template:
    spec:
      containers:
        - name: app
          env:
            - name: AZURE_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: sp-credentials
                  key: tenant-id
            - name: AZURE_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: sp-credentials
                  key: client-id
            - name: AZURE_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: sp-credentials
                  key: client-secret
```
After — Workload Identity, one label, zero credentials:
```yaml
spec:
  template:
    metadata:
      labels:
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          # no client secret env vars — DefaultAzureCredential picks
          # up the federated token from the projected volume
```
Same application code. Same Azure SDK. Same Key Vault. What changed is that the secret values are gone, and with them the Service Principal, the password rotation runbook, and the sp-credentials Secret object.
That's the destination. Here's how to get there.
End to end — Terraform first
Here's the order that worked for me, not the order you'll see in tutorials. I'll spare you the part where I installed ESO first and spent half a day debugging missing permissions before realizing OIDC wasn't even on yet.
1. Enable OIDC and Workload Identity on AKS
This is two flags on the AKS cluster. Easy to miss because the defaults are off.
resource "azurerm_kubernetes_cluster" "this" {
name = "aks-prod"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
dns_prefix = "aks-prod"
oidc_issuer_enabled = true
workload_identity_enabled = true
identity {
type = "SystemAssigned"
}
default_node_pool {
name = "system"
vm_size = "Standard_D4ds_v5"
node_count = 3
}
}
After apply, the cluster has an OIDC issuer URL. You'll need it for the federated credential. Read it from the resource:
output "oidc_issuer_url" {
value = azurerm_kubernetes_cluster.this.oidc_issuer_url
}
2. The managed identity and its Key Vault access
This is what your pod will become, from Azure's perspective.
resource "azurerm_user_assigned_identity" "app" {
name = "id-app-prod"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
}
resource "azurerm_role_assignment" "app_kv_reader" {
scope = azurerm_key_vault.this.id
role_definition_name = "Key Vault Secrets User"
principal_id = azurerm_user_assigned_identity.app.principal_id
}
A few things worth flagging:
- I'm using RBAC mode on Key Vault, not access policies. If your Key Vault is still on access policies, switch — it's been generally available for years, and access policies are slower to update, harder to audit, and don't match how the rest of Azure RBAC works. The role Key Vault Secrets User is the read-only role at the data plane.
- The identity is per-application, not per-cluster. One managed identity per service is the right level. A shared "cluster identity" with broad Key Vault access defeats the point.
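If you're migrating an existing vault, the switch itself is one flag. Note that flipping it changes the authorization model immediately, so have the role assignments lined up first or current consumers lose access:

```bash
# Flip an existing Key Vault from access policies to Azure RBAC.
az keyvault update --name kv-prod --enable-rbac-authorization true
```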
3. The federated credential
This is the part that actually makes the OIDC trust work. It maps a specific ServiceAccount in a specific namespace to this identity.
resource "azurerm_federated_identity_credential" "app" {
name = "fc-app-prod"
resource_group_name = azurerm_resource_group.this.name
parent_id = azurerm_user_assigned_identity.app.id
audience = ["api://AzureADTokenExchange"]
issuer = azurerm_kubernetes_cluster.this.oidc_issuer_url
subject = "system:serviceaccount:app:app-sa"
}
The subject field is the single thing that bites everyone the first time. The format is strict:
system:serviceaccount:<namespace>:<serviceaccount-name>
If your namespace is prod-app and your ServiceAccount is app, the subject must be exactly system:serviceaccount:prod-app:app. One character off, no useful error. The token exchange just silently returns "unauthorized" and you spend 90 minutes thinking your Key Vault permissions are wrong.
A second one I learned the hard way: the subject is case-sensitive, even though Kubernetes itself usually isn't. Lowercase everywhere.
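When the exchange fails silently, compare what Azure actually has on file against what you think you deployed, before you start touching Key Vault permissions:

```bash
# The subject column is the one to stare at, character by character.
az identity federated-credential list \
  --identity-name id-app-prod --resource-group <resource-group> \
  --query "[].{name:name, issuer:issuer, subject:subject}" -o table
```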
4. External Secrets Operator — install
ESO goes into its own namespace, cluster-wide.
```bash
helm repo add external-secrets https://charts.external-secrets.io

# --set-string keeps the pod label a string; Kubernetes rejects boolean label values
helm install external-secrets external-secrets/external-secrets \
  -n external-secrets --create-namespace \
  --set installCRDs=true \
  --set serviceAccount.annotations."azure\.workload\.identity/client-id"=$ESO_CLIENT_ID \
  --set-string podLabels."azure\.workload\.identity/use"=true
```
ESO itself uses Workload Identity to talk to Key Vault. So ESO also gets its own managed identity with read access to Key Vault. The same federated credential pattern, just pointed at ESO's ServiceAccount.
In Terraform, that's a second pair of azurerm_user_assigned_identity + azurerm_federated_identity_credential, subject system:serviceaccount:external-secrets:external-secrets.
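Before moving on, it's worth verifying that the Workload Identity webhook actually mutated the ESO pod; if the label didn't land, everything downstream fails with generic auth errors. The label selector here assumes the chart's default labels:

```bash
# The webhook injects AZURE_* env vars and a projected token volume into the pod.
kubectl get pods -n external-secrets -l app.kubernetes.io/name=external-secrets \
  -o jsonpath='{.items[0].spec.containers[0].env[*].name}'
# Expect AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST.
```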
5. The SecretStore in your app namespace
SecretStore is the ESO custom resource that describes "where to fetch from." One per namespace, usually.
```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: azure-kv
  namespace: app
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://kv-prod.vault.azure.net
      serviceAccountRef:
        name: external-secrets
        namespace: external-secrets
```
Note the serviceAccountRef — ESO uses its own SA to authenticate, not the app's. This is intentional: the app pod uses its own SA later, but it's ESO that actually pulls values from Key Vault, not the app.
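ESO surfaces connectivity on the store's status, so you can validate this step in isolation before writing a single ExternalSecret:

```bash
# STATUS should read Valid and READY True once ESO can reach the vault.
kubectl get secretstore azure-kv -n app
# On a failing store, the events usually name the broken link in the auth chain.
kubectl describe secretstore azure-kv -n app
```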
6. The ExternalSecret resource
This is the per-secret declaration. You write one of these per secret you want synced.
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
  namespace: app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: azure-kv
    kind: SecretStore
  target:
    name: app-db-credentials   # the K8s Secret that will be created
    creationPolicy: Owner      # ESO owns it; manual edits get reverted
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: app-prod-db-password
    - secretKey: DB_USERNAME
      remoteRef:
        key: app-prod-db-username
```
After this applies, ESO creates a regular Kubernetes Secret called app-db-credentials in the app namespace. Your application doesn't know anything about ESO or Key Vault. It mounts the secret like always.
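At this point it's worth confirming the sync actually happened; both checks are read-only and use the names from the manifest above:

```bash
# The ExternalSecret reports its own sync state; SecretSynced / READY True is the goal.
kubectl get externalsecret app-db-credentials -n app

# And the derived Secret should now exist with both keys.
kubectl get secret app-db-credentials -n app -o jsonpath='{.data}'
```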
7. The application pod
Two things on the app side: a ServiceAccount with the right annotation, a pod with the right label.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: app
  annotations:
    azure.workload.identity/client-id: <client-id-of-azurerm_user_assigned_identity.app>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
        azure.workload.identity/use: "true" # the magic label
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          image: ghcr.io/example/app:1.0.0
          envFrom:
            - secretRef:
                name: app-db-credentials
```
The client-id is the GUID of the User-Assigned Managed Identity from step 2. The label azure.workload.identity/use: "true" is what triggers the AKS Workload Identity webhook to inject the token volume into the pod. Without that label, no token. Silently.
If the pod also needs to call Azure APIs directly (not just via ESO), the Workload Identity token is mounted at /var/run/secrets/azure/tokens/azure-identity-token and the Azure SDK picks it up automatically via DefaultAzureCredential. No code changes — just remove your old client secret env vars.
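Two quick checks from inside the running pod confirm the webhook did its job (this assumes your image ships a shell and coreutils; distroless images won't cooperate):

```bash
# The injected env vars that DefaultAzureCredential reads.
kubectl exec -n app deploy/app -- env | grep '^AZURE_'

# The projected token itself: short-lived, refreshed by the kubelet.
kubectl exec -n app deploy/app -- ls /var/run/secrets/azure/tokens/
```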
Pitfalls I've hit
None of these is in the official docs. All of them cost me hours — here's what to look out for.
1. The federated credential subject typo. Already mentioned. 90 minutes lost. Lesson: keep a single Terraform variable service_account_subject = "system:serviceaccount:${var.namespace}:${var.sa_name}" and reference it. Don't hand-write the subject string anywhere.
2. The azure.workload.identity/use: "true" label is on the POD, not the deployment. I put it on the Deployment metadata once and it did nothing. The webhook only inspects pod labels. Put it under spec.template.metadata.labels.
3. RBAC takes time to take effect on Key Vault. A fresh Key Vault Secrets User role assignment takes 1–5 minutes to apply. If you terraform apply and immediately deploy the app, you'll see Forbidden errors that disappear on their own. I added a time_sleep in Terraform after the role assignment, 60 seconds. Not elegant, but stops the false alarm.
4. ESO version pinning. ESO's API moved from v1beta1 to v1 in newer versions. If you upgrade the chart, your SecretStore manifests may need updating. Pin the chart version, plan upgrades deliberately, and read the release notes — ESO is not a set-and-forget operator.
5. Refresh interval and rate limits. The default refreshInterval of 1h works for most cases. Don't set it to 30s "to be safe" — you'll hit Key Vault rate limits if you have hundreds of ExternalSecret resources, and you'll get 429s with no useful retries. If you really need near-real-time rotation, look at push-based rotation (Key Vault events through Event Grid driving an ESO resync), not aggressive polling.
6. The 20-credential limit per identity. Federated credentials are limited to 20 per managed identity in Azure. If you try to share one identity across many ServiceAccounts, you hit it. The right pattern is one identity per app, not one identity per namespace (there's a quick check for this right after this list).
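For pitfall 6, you can count where an identity stands before the cap bites:

```bash
# Federated credentials on one identity; the hard limit is 20.
az identity federated-credential list \
  --identity-name id-app-prod --resource-group <resource-group> \
  --query "length(@)"
```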
What you removed
Once this works end-to-end, here is what gets deleted from your cluster:
- Every kubectl create secret in your deploy scripts.
- The sealed-secrets controller and the kubeseal step in your CI.
- The vault-injector sidecar, if you had one.
- Any SOPS-encrypted YAML in your Helm values.
- The "rotate database password" runbook — Key Vault does it, ESO syncs it, pods notice it (or a rolling restart on Secret change if you're strict).
- Every Service Principal client secret stored anywhere — apps now authenticate to Azure via Workload Identity, no client secret needed.
What stays:
- A Kubernetes Secret per app, created by ESO, holding the current value.
- The ServiceAccount + label pattern in every Deployment.
- A federated credential per app in Terraform.
Three artifacts instead of fifteen.
Key takeaways
- Kubernetes Secrets are not encrypted. They're base64-encoded plaintext. Treat them as a copy, not the source of truth.
- Workload Identity replaces shared API keys with per-pod OIDC federation. Your cluster never owns a credential to Azure.
- External Secrets Operator turns Key Vault into the single source of truth for secret values. The K8s Secret is just a cached copy.
- The setup has roughly six moving pieces. None of them are hard on their own. The complexity is in how they connect — which is why nobody writes a clean end-to-end guide.
- The federated credential subject field is the single most common silent failure. Make it a Terraform variable and never hand-write it.
If I'd seen one post that walked through this end-to-end the first time I needed it, I'd have saved myself two weekends. That's what I want this blog to be.
Next post: a deeper dive into this pattern — multi-tenant identity isolation, audit logging, what to do when Key Vault itself becomes the single point of failure. If there's a specific piece of this that bit you, message me on LinkedIn — I'll write about it.