
Envelope encryption

Type: Explanation
Created:
Team: Security
draft

The problem it solves

Envelope encryption is the concept that, once it clicks, makes the rest of key management fall into place. It's worth understanding why it's built this way, not just what it does — and the why starts with what goes wrong without it.

Picture the naive approach: you have one master key, and you encrypt all your data directly with it. Three things immediately go wrong.

  • The master key is always exposed. It has to be present in your application's memory every single time you encrypt or decrypt anything. Your most valuable secret is constantly in reach — one memory dump, one compromised app server, and the whole vault is open.

  • Rotation means re-encrypting everything. If the master key changes, you must decrypt every row you've ever stored and re-encrypt it under the new key. For a real database that means hours of downtime and risk.

  • You can't delegate to a KMS. The most secure option is to hand custody of the master key to a managed service like AWS KMS, but that turns out to be incompatible with encrypting bulk data, and the reason is worth unpacking.

    "Delegating" means letting KMS hold the key inside a hardware security module (HSM), so the raw key bytes never leave that tamper-resistant boundary and not even you can extract them. You don't store the key at all; you simply ask KMS to perform crypto operations with it. That's a genuinely valuable property, because it directly fixes the first problem: the key is never sitting in your application's memory.

    The catch is mechanical. If the key never leaves KMS, the only way to use it is to send your plaintext to KMS and have it hand back the ciphertext — the data has to travel to the key. That's fine for a small secret, but it collapses for a database: each Encrypt/Decrypt call is capped at around 4 KB (rows, blobs, and documents are routinely larger), and every operation becomes a rate-limited, billed network round-trip. KMS is deliberately built as a key custodian, not a bulk encryption engine. So you're stuck — the secure delegation model is incompatible with encrypting large volumes of data, because secure delegation requires routing the data through the key holder.
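A back-of-the-envelope sketch of the round-trip problem follows. It compares two approaches:

- A hypothetical workaround that splits every object into 4,096-byte chunks and sends each chunk to KMS Encrypt.
- Envelope encryption with one GenerateDataKey call per object.

The object sizes are made-up illustration values, and the function names are hypothetical.

```python
import math

# KMS Encrypt accepts at most 4,096 bytes of plaintext per call.
KMS_MAX_PLAINTEXT = 4096


def direct_kms_calls(object_sizes: list[int]) -> int:
    """Calls needed if every object were chunked and sent through KMS Encrypt (hypothetical)."""
    return sum(math.ceil(size / KMS_MAX_PLAINTEXT) for size in object_sizes)


def envelope_kms_calls(object_sizes: list[int]) -> int:
    """Calls needed with one DEK per object: one GenerateDataKey each, whatever the size."""
    return len(object_sizes)


# Illustrative sizes: a 1 KB row, a 10 KB document, a 1 MB blob.
sizes = [1024, 10 * 1024, 1024 * 1024]
print(direct_kms_calls(sizes))    # 260
print(envelope_kms_calls(sizes))  # 3
```

Every one of those direct calls is also a billed, rate-limited network round-trip, so the gap matters well before the data gets large.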

Envelope encryption resolves all three by splitting the job into two tiers of keys.

The two keys

The split is between a key that does the work and a key that guards the working key.

  • DEK — Data Encryption Key. The workhorse. A fast symmetric key (AES-256) that actually encrypts your data, locally, inside your app. Bulk crypto, no size limits, no per-byte network round-trip.
  • KEK — Key Encryption Key. The guardian. Its only job is to encrypt ("wrap") the DEK. It lives inside KMS or an HSM and — this is the whole point — never leaves that boundary. It cannot be exported: every operation that uses it happens inside the secure hardware. You only ever send things to the KEK; you never receive the KEK itself.
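To make the two tiers concrete, here is a sketch of what an application might persist per encrypted object. The class name and field layout are assumptions for illustration, not a KMS format. The `nonce` field assumes AES-GCM is the local cipher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvelopeRecord:
    """One stored object under envelope encryption (a hypothetical layout)."""

    ciphertext: bytes   # the data, encrypted locally with the DEK
    nonce: bytes        # per-encryption AES-GCM nonce; needed to decrypt, not secret
    wrapped_dek: bytes  # the DEK encrypted by the KEK (KMS returns this as CiphertextBlob)
    kek_id: str         # identifies which KEK wrapped the DEK: a name, not key material

    # Intentionally missing: the plaintext DEK (discarded after use) and the KEK
    # itself (it never leaves KMS, so there is nothing to store).
```

The point of the layout is what it leaves out. Nothing in the record can decrypt anything without a call to KMS.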

The encrypt flow - step by step

When you want to store something sensitive:

  1. Your app calls KMS: GenerateDataKey. KMS creates a fresh DEK and hands you back two versions of it — the plaintext DEK (usable immediately) and the wrapped DEK (the same key, already encrypted by the KEK).
  2. You use the plaintext DEK to encrypt your data locally with an authenticated AES mode (typically AES-256-GCM). This is fast, has no KMS size limit, and needs no per-record network call.
  3. You immediately discard the plaintext DEK from memory. It's done its job.
  4. You store the ciphertext, the nonce (IV) the cipher used, and the wrapped DEK together in the database.

Notice what you end up with on disk: encrypted data, sitting next to an encrypted key. Neither is useful on its own, and the only thing that can turn the wrapped DEK back into a usable key — the KEK — isn't there. It's in KMS.
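The four encrypt steps can be sketched in Python as below. The sketch rests on a few assumptions:

- A boto3 KMS client, or anything exposing `generate_data_key` with boto3's keyword arguments.
- The third-party `cryptography` package's `AESGCM` for the local AES-256-GCM step.
- `envelope_encrypt` and the dict layout it returns are hypothetical.

```python
import os
from typing import Any

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def envelope_encrypt(kms: Any, kek_id: str, plaintext: bytes) -> dict:
    """Encrypt `plaintext` under a fresh DEK that KMS wraps with the KEK `kek_id`."""
    # Step 1: GenerateDataKey returns the DEK twice: in plaintext and wrapped by the KEK.
    resp = kms.generate_data_key(KeyId=kek_id, KeySpec="AES_256")
    dek = resp.pop("Plaintext")
    try:
        # Step 2: encrypt locally with AES-256-GCM. Each encryption needs a fresh
        # 96-bit nonce; it is not secret and is stored with the ciphertext.
        nonce = os.urandom(12)
        ciphertext = AESGCM(dek).encrypt(nonce, plaintext, None)
    finally:
        # Step 3: drop the plaintext DEK. This only removes the reference;
        # Python gives no guarantee the bytes are zeroed.
        del dek
    # Step 4: everything here is safe to store together in the database.
    return {
        "ciphertext": ciphertext,
        "nonce": nonce,
        "wrapped_dek": resp["CiphertextBlob"],
        "kek_id": resp["KeyId"],
    }
```

Against real AWS this would be called as `envelope_encrypt(boto3.client("kms"), "alias/app-data", b"...")`, where `alias/app-data` is a placeholder alias.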

The decrypt flow - step by step

To read it back:

  1. Read the ciphertext and the wrapped DEK from the database.
  2. Send only the wrapped DEK to KMS: Decrypt.
  3. Inside its secure boundary, KMS uses the KEK to unwrap it and returns the plaintext DEK to you.
  4. You decrypt the ciphertext locally, use the data, and discard the DEK again.

Tip: The KEK never traveled. Your bulk data never went near KMS. Only a tiny key made the round-trip.
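A matching sketch of the decrypt flow follows. It assumes:

- A boto3-style KMS client.
- The `cryptography` package's `AESGCM`.
- Each stored record is a dict with `ciphertext`, `nonce`, and `wrapped_dek` keys (a hypothetical layout).

Only `wrapped_dek` is sent to KMS.

```python
from typing import Any

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def envelope_decrypt(kms: Any, record: dict) -> bytes:
    """Recover the plaintext from a stored envelope record."""
    # Steps 1-3: only the small wrapped DEK goes to KMS; the ciphertext stays local.
    resp = kms.decrypt(CiphertextBlob=record["wrapped_dek"])
    dek = resp.pop("Plaintext")
    try:
        # Step 4: decrypt locally. AESGCM raises InvalidTag if the ciphertext,
        # nonce, or key is wrong, so tampering is detected rather than ignored.
        return AESGCM(dek).decrypt(record["nonce"], record["ciphertext"], None)
    finally:
        # Drop the plaintext DEK reference again once the data is recovered.
        del dek
```

Because GCM authenticates the data, a modified ciphertext fails loudly instead of decrypting to garbage.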

Why every concern from earlier dissolves

  • Root key exposure: gone. The KEK is never in your app's memory, never on your disk, never in a backup. A full application compromise leaks DEKs and, while the attacker is inside, the app's ability to call KMS. That is bad, but it is scoped, revocable, and logged, and the master key itself is never exposed.
  • Rotation: cheap. Rotating the KEK never touches your data. At most you re-wrap the small DEKs. With automatic rotation you don't even do that, because KMS keeps old KEK versions around and old wrapped DEKs still decrypt. What used to be a re-encrypt-the-whole-database event becomes a metadata operation.
  • Separation of duties: the payoff that ties back to your threat model. The permission to read the database and the permission to call kms:Decrypt are two different IAM grants. An attacker who breaches only the database gets ciphertext plus wrapped DEKs and can do nothing with them. To use them they'd also need to compromise a principal with kms:Decrypt rights, so you've forced them to defeat two independent controls instead of one.
  • Auditability: for free. Every unwrap is a KMS API call, logged in CloudTrail. So even though decryption happens in your app, the authorization to decrypt is centralized and recorded. You can answer "who decrypted this, and when", which is Article 32 accountability evidence at no extra cost.
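Because every unwrap is a KMS API call, "who decrypted, and when" becomes a query. Here is a hedged sketch using boto3's CloudTrail `lookup_events`. That call searches the account's management-event history, roughly the last 90 days in the client's Region. The function name is hypothetical, and longer-term evidence would normally come from a trail delivered to S3.

```python
from typing import Any, Iterator


def kms_decrypt_events(cloudtrail: Any) -> Iterator[tuple[str, Any]]:
    """Yield (username, event_time) for each KMS Decrypt call in CloudTrail event history."""
    kwargs: dict[str, Any] = {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": "Decrypt"}],
    }
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            # Keep only events emitted by KMS itself.
            if event.get("EventSource") == "kms.amazonaws.com":
                yield event.get("Username", "<unknown>"), event["EventTime"]
        # LookupEvents is paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token
```

Usage would look like `for who, when in kms_decrypt_events(boto3.client("cloudtrail")): print(who, when)`.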

Rotating the KEK

First, separate two operations that get conflated:

  • KEK rotation — change the wrapper around the DEK. Cheap; the data is never touched.
  • DEK rotation — generate a new DEK and decrypt-then-re-encrypt everything it protected. Expensive, and proportional to data volume. Only needed if the DEK or the data itself is exposed.
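The cost difference is easy to put numbers on. This is a rough sketch with made-up illustrative volumes. It counts key bytes only: the wrapped blobs KMS stores are somewhat larger, because they carry metadata, but the cryptographic work is on the 32-byte DEK.

```python
DEK_BYTES = 32  # an AES-256 DEK


def kek_rotation_bytes(num_deks: int) -> int:
    """Bytes unwrapped and re-wrapped when rotating the KEK: just the DEKs."""
    return num_deks * DEK_BYTES


def dek_rotation_bytes(object_sizes: list[int]) -> int:
    """Bytes decrypted and re-encrypted when rotating DEKs: all the protected data."""
    return sum(object_sizes)


# Illustrative: one million objects of 50 KB, one DEK each.
sizes = [50 * 1024] * 1_000_000
print(kek_rotation_bytes(len(sizes)))  # 32000000 (about 32 MB)
print(dek_rotation_bytes(sizes))       # 51200000000 (about 51 GB)
```

With one DEK per 50 KB object, DEK rotation does 1,600 times more cryptographic work than KEK rotation. The ratio grows with object size.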

Everything below is about the cheap one.

Two ways to rotate the KEK

  • Automatic (transparent) rotation. KMS keeps the same key ID and generates new key material on a schedule, retaining old versions internally. Each wrapped DEK records which version wrapped it, so KMS auto-selects the right one to unwrap. Existing DEKs keep decrypting under their old version forever; new DEKs are wrapped under the latest. You do nothing — but old versions are never separately addressable.
  • Active re-wrap. You unwrap each DEK with the old KEK, re-wrap it with the new one, and overwrite the stored wrapped-DEK blob. The DEK value never changes, so the data ciphertext stays valid and is never read or rewritten — the crypto only ever touches 32-byte keys, one per object.
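Active re-wrap can be sketched with boto3's KMS `re_encrypt`. ReEncrypt unwraps and re-wraps entirely inside KMS, so the plaintext DEK never reaches the application. The caller needs `kms:ReEncryptFrom` on the old key and `kms:ReEncryptTo` on the new one. The records here are plain dicts standing in for database rows, and the function name and layout are hypothetical. A real job would write each update back to the database.

```python
from typing import Any, Iterable


def rewrap_all(kms: Any, records: Iterable[dict], new_kek_id: str) -> int:
    """Move every wrapped DEK onto `new_kek_id`; returns how many were re-wrapped."""
    count = 0
    for record in records:
        # KMS works out the source key from the blob itself; only the destination is named.
        resp = kms.re_encrypt(
            CiphertextBlob=record["wrapped_dek"],
            DestinationKeyId=new_kek_id,
        )
        # Only the key fields change. The data ciphertext and nonce are never read.
        record["wrapped_dek"] = resp["CiphertextBlob"]
        record["kek_id"] = resp["KeyId"]  # ARN of the destination key
        count += 1
    return count
```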

The difference isn't "backward compatible or not" — in both cases the DEK and the data are unchanged. The real question is whether you still depend on the old KEK version. With automatic rotation that dependency never goes away; with active re-wrap, once everything is re-wrapped nothing relies on the old version, so you can remove it.

Can you actually retire the old KEK?

Only under the right rotation model:

  • Automatic rotation (one key, internal versions) — an old version isn't a deletable object. You can only delete the whole key, which takes the current version with it and makes all data unrecoverable. Re-wrapping buys almost nothing here; there's nothing to retire.
  • Manual rotation (each KEK is its own key) — the old KEK is a separate resource. Re-wrap onto the new key, then DisableKey or ScheduleKeyDeletion. This is the only model where retiring or deleting the old KEK is real.
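Under manual rotation, retiring the old key is only safe once nothing depends on it. Here is a hedged boto3 sketch. It assumes each stored record carries the ARN of the key that wrapped it, and `retire_kek` is a hypothetical name. Deletion is opt-in because it is the irreversible step.

```python
from typing import Any, Iterable


def retire_kek(kms: Any, records: Iterable[dict], old_kek_arn: str,
               schedule_deletion: bool = False, pending_days: int = 30) -> None:
    """Disable (and optionally schedule deletion of) a KEK that has been rotated out."""
    # Refuse to act while any stored DEK is still wrapped under the old key.
    remaining = sum(1 for r in records if r["kek_id"] == old_kek_arn)
    if remaining:
        raise RuntimeError(f"{remaining} DEK(s) still wrapped under {old_kek_arn}; re-wrap them first")
    # Disabling is reversible with EnableKey, so a dependency you missed shows up
    # as failed decrypts you can still recover from.
    kms.disable_key(KeyId=old_kek_arn)
    if schedule_deletion:
        # KMS waits 7-30 days before deleting; CancelKeyDeletion can stop it until
        # then, after which anything still wrapped under this key is unrecoverable.
        kms.schedule_key_deletion(KeyId=old_kek_arn, PendingWindowInDays=pending_days)
```

A cautious rollout disables first, watches for decrypt failures, and schedules deletion only later.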

Does rotating the KEK actually improve security?

Mostly it's a hygiene and compliance control, not an incident-response one. Against the obvious threats it does little:

  • Attacker with live kms:Decrypt access (stolen credentials) — rotation does nothing; KMS unwraps under any version. The fix is to revoke access or disable the key.
  • Stolen plaintext DEK — rotation does nothing; you need DEK rotation and re-encryption.
  • Stolen wrapped DEK with no KMS access — the attack already fails, so rotation is irrelevant.

Where it genuinely earns its place:

  • Compliance — standards like PCI-DSS mandate periodic rotation. The honest primary driver.
  • Imported key material (BYOK) — the one case where KEK material can actually leak, so rotating it limits exposure of newly wrapped data (old data still needs re-encryption).
  • Defense in depth / crypto wear-out — weak at the KEK level, since a KEK only ever encrypts tiny DEKs.

Granularity — the lever that unlocks more

The quietly powerful part: you decide how many DEKs you have.

  • One DEK for everything: simplest, but a single blast radius.
  • Per-tenant DEK: each customer's data is encrypted under its own DEK. For a multi-tenant platform this gives you cryptographic isolation between tenants, a strong story for Skapp's HR and business data.
  • Per-user DEK: this is the connection to GDPR erasure. If each user's data sits under their own DEK, you "delete" that user by destroying their wrapped DEK. The ciphertext becomes permanently unreadable everywhere it exists (primary, replicas, backups) without hunting it down, provided no copy of that wrapped DEK survives. Wrapped DEKs stored in the same database as the data are replicated and backed up along with it. Per-user DEKs therefore belong in a separate key store whose replicas and backups you can actually purge. That's the crypto-shredding technique from the earlier map, and envelope encryption is what makes it practical.
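A sketch of per-user DEKs and crypto-shredding, assuming a boto3-style KMS client. `PerUserKeyStore` and its methods are hypothetical. Its in-memory dict stands in for a dedicated key table kept apart from the encrypted data.

```python
from typing import Any


class PerUserKeyStore:
    """Hypothetical per-user DEK registry, kept apart from the encrypted data."""

    def __init__(self, kms: Any, kek_id: str) -> None:
        self._kms = kms
        self._kek_id = kek_id
        self._wrapped: dict[str, bytes] = {}  # user ID -> wrapped DEK

    def create(self, user_id: str) -> bytes:
        """Generate a DEK for a new user; returns the plaintext DEK for immediate use."""
        if user_id in self._wrapped:
            raise ValueError(f"user {user_id!r} already has a DEK")
        resp = self._kms.generate_data_key(KeyId=self._kek_id, KeySpec="AES_256")
        self._wrapped[user_id] = resp["CiphertextBlob"]
        return resp["Plaintext"]

    def unwrap(self, user_id: str) -> bytes:
        """Return the user's plaintext DEK; raises KeyError once the user is shredded."""
        wrapped = self._wrapped[user_id]
        return self._kms.decrypt(CiphertextBlob=wrapped)["Plaintext"]

    def shred(self, user_id: str) -> None:
        """Crypto-shred: destroy the wrapped DEK so the user's ciphertext is unreadable."""
        del self._wrapped[user_id]
```

After `shred`, `unwrap` raises `KeyError` instead of quietly minting a fresh DEK. A fresh DEK would look like success but could never decrypt the user's old ciphertext.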

Choosing a KEK

There are two independent choices to make about the KEK.

The three key types

This is about who controls the key:

  • Customer managed — you create it and you're in charge: who can use it, when it rotates, whether to disable or delete it. Costs about $1/month. Use this when you need control.
  • AWS managed — a service creates it for you automatically (named aws/rds, aws/s3, and so on). You can see it but can't change its policy, its rotation, or delete it. Free. Convenient, but no control.
  • AWS owned — AWS's own key, living in AWS's account. You can't see it, manage it, or audit it at all. Free. Maximum convenience, zero control.

A simple ladder: customer managed = full control, AWS managed = some visibility but no control, AWS owned = invisible.
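You can check which rung a key sits on from code. DescribeKey's `KeyMetadata` includes a `KeyManager` field that is `CUSTOMER` or `AWS`. AWS owned keys live in AWS's own accounts, so DescribeKey cannot see them at all. Here is a hedged sketch with a boto3-style client; `key_control` is a hypothetical name.

```python
from typing import Any


def key_control(kms: Any, key_id: str) -> str:
    """Report who controls a KMS key, based on DescribeKey's KeyManager field."""
    meta = kms.describe_key(KeyId=key_id)["KeyMetadata"]
    # "CUSTOMER": you own policy, rotation and deletion.
    # "AWS": created by a service on your behalf; visible here but not changeable.
    return "customer managed" if meta["KeyManager"] == "CUSTOMER" else "AWS managed"
```

For example, `key_control(boto3.client("kms"), "alias/aws/rds")` would be expected to report an AWS managed key.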

Where the key material comes from

This choice exists only on a customer-managed key:

  • KMS-generated (AWS_KMS) — KMS creates the key material itself. The default. The material never leaves KMS and it supports automatic rotation. This is what almost everyone uses.
  • BYOK (EXTERNAL) — you generate the material yourself and import it into KMS, keeping a copy outside AWS. Used only when a compliance rule says you must own the material. No automatic rotation.

So the type is about who controls the key; the origin is about where the material came from. BYOK versus KMS-generated is just that second choice — and it's available only on a customer-managed key.
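Both choices map onto `CreateKey`:

- Creating a key through the API yields a customer-managed key.
- The `Origin` parameter selects where the material comes from.

Here is a hedged boto3 sketch. The function name is hypothetical, and the BYOK import steps (`GetParametersForImport`, `ImportKeyMaterial`) are deliberately left out.

```python
from typing import Any


def create_kek(kms: Any, description: str, byok: bool = False) -> str:
    """Create a customer-managed symmetric KEK and return its key ID."""
    # AWS_KMS: KMS generates the material. EXTERNAL: the key is created empty,
    # waiting for you to import material (import steps not shown).
    origin = "EXTERNAL" if byok else "AWS_KMS"
    resp = kms.create_key(
        Description=description,
        KeySpec="SYMMETRIC_DEFAULT",
        KeyUsage="ENCRYPT_DECRYPT",
        Origin=origin,
    )
    key_id = resp["KeyMetadata"]["KeyId"]
    if not byok:
        # Automatic rotation is only offered for KMS-generated material.
        kms.enable_key_rotation(KeyId=key_id)
    return key_id
```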