Crypto Agility After PQC: How to Avoid the Next Migration Failure

  • Post category:Security
📋 Key Takeaways
  • Why Crypto Agility Matters Now
  • Lessons from Past Cryptographic Failures
  • The NIST CSWP 39 Blueprint
  • The Three Pillars of Crypto Agility
  • The Cryptographic Bill of Materials (CBOM)

Why Crypto Agility Matters Now

Cryptography has a shelf life. Every algorithm we depend on today — RSA, ECC, AES — will eventually be broken or deprecated. The question isn’t if but how painful the next transition will be. If history is any guide, the answer is: very painful, unless you start building agility into your cryptographic infrastructure today.

The post-quantum era isn’t some distant science-fiction scenario. NIST finalized its first three post-quantum cryptography standards in August 2024 — ML-KEM (FIPS 203), ML-DSA (FIPS 204), and SLH-DSA (FIPS 205) — and the NIST CSWP 39 publication in December 2025 formally elevated crypto agility from an industry buzzword to a design imperative. Yet as of March 2025, only 7% of organizations have a formal post-quantum transition plan, according to Quantum Insider. That gap between awareness and action is a ticking clock.

The threat is concrete. A sufficiently powerful quantum computer running Shor’s algorithm would break RSA-2048, ECC (P-256), and every other algorithm built on the hardness of factoring or discrete logarithms. But you don’t need to wait for a quantum computer to exist. The “harvest now, decrypt later” attack model means adversaries are already capturing encrypted traffic today, banking on the ability to decrypt it when quantum capabilities mature. If your data has a classification lifetime measured in years or decades, the risk is already active.

Crypto agility — the ability to swap cryptographic algorithms, protocols, and implementations without rewriting applications — is the architectural answer to this challenge. It transforms migration from a one-time, crisis-driven project into a continuous, manageable process. This article breaks down how to actually build it, grounded in the NIST CSWP 39 framework and real-world migration lessons.

Lessons from Past Cryptographic Failures

Before diving into solutions, let’s understand why migrations fail. The cryptographic industry has a poor track record of timely transitions, and each failure follows a recognizable pattern: denial, panic, and painful, incomplete remediation.

The SHA-1 Decade

The first practical collision attack on SHA-1 was demonstrated in 2017 by researchers at Google and CWI Amsterdam (the SHAttered attack). But SHA-1 was considered weak years before that — theoretical vulnerabilities were identified as early as 2005. Despite this head start, the migration to SHA-2 took over 10 years to achieve broad adoption. Certificate authorities were still issuing SHA-1 certificates in 2016. Legacy systems in enterprise environments and embedded devices continued using SHA-1 well into the 2020s. The root cause: cryptographic hash functions were baked into code as hard-coded constants, not configurable abstractions.

TLS 1.0/1.1 Deprecation

When major browsers deprecated TLS 1.0 and 1.1 in 2020, organizations scrambled. Payment processors, healthcare systems, and IoT devices found themselves unable to establish connections. The problem wasn’t technical complexity — TLS 1.2 had been available since 2008. The problem was that TLS versions were hard-coded in legacy clients, embedded in firmware that couldn’t be updated, and scattered across third-party integrations with no centralized visibility. The deprecation deadline forced a migration that should have been planned years in advance.

Heartbleed and Hard-Coded Dependencies

The 2014 Heartbleed vulnerability in OpenSSL exposed a deeper problem: organizations didn’t know where their cryptography lived. The bug affected an estimated 17% of secure web servers globally, but the remediation nightmare extended far beyond web servers. VPN appliances, email gateways, embedded systems, and enterprise software all silently depended on vulnerable OpenSSL versions. Recovery took months, and some systems were never patched. The lesson was clear: if you can’t inventory your cryptographic dependencies, you can’t protect them.

The Common Thread

Every one of these failures shares the same root causes:

  • Hard-coded algorithms: Cryptographic primitives were embedded directly in application code rather than referenced through configurable abstractions.
  • No centralized inventory: Organizations didn’t know which systems used which algorithms, libraries, or key lengths.
  • No migration planning: Transitions were reactive, triggered by deprecation deadlines or active exploits rather than proactive risk management.
  • Vendor lock-in: Hardware security modules (HSMs), TLS terminators, and protocol implementations often couldn’t be upgraded without replacing the entire device.

These aren’t theoretical risks. They’re documented, expensive failures that organizations repeated across multiple transitions. Crypto agility is the discipline that prevents the next repeat.

The NIST CSWP 39 Blueprint

NIST’s Cybersecurity White Paper 39 (CSWP 39), released in its final form in December 2025, is the most authoritative framework for crypto agility to date. It doesn’t just describe the problem — it provides a structured maturity model, concrete architectural principles, and implementation guidance.

CSWP 39 defines crypto agility as “the ability of an information system to rapidly and efficiently transition to alternative cryptographic algorithms, protocols, and implementations without significant changes to system architecture or application logic.”

The document establishes a five-level maturity model for organizational crypto agility:

  • Level 1 (Unstructured / Unplanned): No awareness of crypto dependencies. Algorithms are hard-coded. No inventory exists. Migrations are reactive and chaotic.
  • Level 2 (Ad Hoc Awareness): Partial inventory of cryptographic assets. Some awareness of deprecation timelines. No systematic process for transitions.
  • Level 3 (Defined Process): Formal CBOM exists. Crypto dependencies are documented. Migration plans exist but are manual and project-driven.
  • Level 4 (Managed & Measured): Crypto policy engine drives algorithm selection. CBOM integrated with SBOM. Automated detection of deprecated algorithms. Regular testing of migration paths.
  • Level 5 (Adaptive & Integrated): Crypto agility is integrated into enterprise risk management. Algorithm transitions are routine, low-risk operations. Continuous monitoring of cryptographic health across all assets.

Most organizations today sit at Level 1 or Level 2. The goal of this article is to give you a concrete path to Level 4 and beyond.

CSWP 39 also identifies three foundational pillars that every crypto-agile architecture must implement. Let’s examine each one.

The Three Pillars of Crypto Agility

Crypto agility isn’t a single technology — it’s an architectural approach built on three interdependent pillars. Remove any one, and your agility collapses.

Pillar 1: Modularity

Modularity means treating cryptographic algorithms as swappable components, not integral parts of your application logic. When you need to transition from AES-256 to a post-quantum KEM, you should be able to do so by changing a configuration value or swapping a module — not by refactoring your application code.

In practice, this means:

  • Cryptographic operations are isolated in dedicated modules or libraries.
  • Applications reference algorithms by capability (e.g., “asymmetric encryption”) rather than by specific algorithm (e.g., “RSA-2048”).
  • Algorithm implementations conform to a common interface, making them interchangeable.
  • No algorithm-specific logic leaks into business code — no manual padding, no hardcoded key sizes, no algorithm-specific error handling.
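A minimal sketch of what "conform to a common interface" can look like, using a structural Protocol. The class and function names here are illustrative, and the XOR cipher is a deliberately trivial placeholder, not real cryptography:

```python
from typing import Protocol


class AsymmetricCipher(Protocol):
    """Capability contract: any module implementing it is swappable."""
    name: str

    def encrypt(self, plaintext: bytes, key: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes, key: bytes) -> bytes: ...


class XorToyCipher:
    """Toy stand-in for a real module (RSA today, ML-KEM tomorrow)."""
    name = "xor-toy"

    def encrypt(self, plaintext: bytes, key: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

    def decrypt(self, ciphertext: bytes, key: bytes) -> bytes:
        return self.encrypt(ciphertext, key)  # XOR is its own inverse


def seal(cipher: AsymmetricCipher, data: bytes, key: bytes) -> bytes:
    """Business code depends on the capability, never the algorithm."""
    return cipher.encrypt(data, key)
```

Swapping algorithms then means registering a different class that satisfies the same Protocol; the `seal` call site never changes.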

Pillar 2: Abstraction

Abstraction provides a standardized API layer between applications and cryptographic implementations. Your application code calls encrypt(plaintext, key, policy) — it doesn’t call rsa_encrypt_oaep_sha256(plaintext, key). The abstraction layer maps the policy to a specific algorithm based on current organizational standards.

This is analogous to how database drivers work: your application uses a standard query interface, and the driver handles the specifics of PostgreSQL, MySQL, or SQLite. Cryptographic abstraction layers like Google’s Tink, AWS-LC, and the OpenSSL provider architecture follow this same pattern.

Key benefits of abstraction:

  • Application code remains stable across algorithm transitions.
  • Security policies can be enforced centrally in the abstraction layer.
  • Testing is simplified — you can swap implementations without changing tests.
  • Algorithm negotiation becomes a library concern, not an application concern.
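A compact sketch of the facade pattern described above. Everything here is hypothetical — the policy labels, table, and stub mechanism are illustrative, and the "mechanism" only tags its input to show the dispatch, it performs no real encryption:

```python
import hashlib
import hmac


def _mech_aes256_gcm(plaintext: bytes, key: bytes) -> bytes:
    # Stand-in for a real AEAD call through a vetted library.
    # This is NOT encryption; it only illustrates the dispatch.
    tag = hmac.new(key, plaintext, hashlib.sha256).digest()
    return tag + plaintext


# Hypothetical mapping: policy label -> (algorithm name, concrete routine)
POLICY_TABLE = {
    "org-default-2025": ("AES-256-GCM", _mech_aes256_gcm),
}


def encrypt(plaintext: bytes, key: bytes, policy: str) -> tuple[str, bytes]:
    """Stable facade: callers name a policy, never an algorithm."""
    algorithm, mechanism = POLICY_TABLE[policy]
    return algorithm, mechanism(plaintext, key)
```

Migrating the organization to a new algorithm means changing the entry in `POLICY_TABLE`; every caller of `encrypt` is untouched.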

Pillar 3: Policy-Mechanism Separation

This is the most critical and most often overlooked pillar. Policy-mechanism separation means that what algorithms are approved is a policy decision, and how they’re implemented is a mechanism decision. These two concerns must live in different places.

A cryptographic policy defines which algorithms, key lengths, and protocols are approved for use, based on organizational risk tolerance, regulatory requirements, and threat intelligence. A mechanism is the actual implementation of a specific algorithm.

In a policy-driven architecture:

  • Policies are defined in machine-readable configuration files (YAML, JSON, or domain-specific policy languages).
  • The abstraction layer reads policies at runtime to determine which algorithms to use.
  • Policies can be updated without code changes — simply update the policy file and reload.
  • Different environments (development, staging, production) can enforce different policies.
  • Audit trails track which policies were active at which times.
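The runtime-policy idea can be shown in a few lines. This sketch uses JSON to stay standard-library-only (the article's examples elsewhere use YAML); the file name and categories are arbitrary:

```python
import json
import pathlib
import tempfile


class PolicyStore:
    """Reads the policy file on every lookup, so edits take effect
    without redeploying any code."""

    def __init__(self, path: pathlib.Path):
        self.path = path

    def approved(self, category: str) -> str:
        return json.loads(self.path.read_text())[category]


# Two environments can point at two different files; the code is identical.
policy_file = pathlib.Path(tempfile.mkdtemp()) / "crypto-policy.json"
policy_file.write_text(json.dumps({"key_exchange": "X25519"}))
store = PolicyStore(policy_file)
print(store.approved("key_exchange"))   # X25519

# "Migration" = editing the policy file, nothing else:
policy_file.write_text(json.dumps({"key_exchange": "ML-KEM-768"}))
print(store.approved("key_exchange"))   # ML-KEM-768
```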

Together, these three pillars create an architecture where algorithm transitions are configuration changes, not engineering projects.

The Cryptographic Bill of Materials (CBOM)

You can’t manage what you can’t see. The Cryptographic Bill of Materials (CBOM) is a comprehensive inventory of every cryptographic algorithm, library, key, protocol, and implementation detail across your entire technology stack. It’s the cryptographic equivalent of a Software Bill of Materials (SBOM), and NIST recommends integrating CBOM data directly into your existing SBOM workflows.

A proper CBOM tracks:

  • Algorithms in use: Symmetric (AES, ChaCha20), asymmetric (RSA, ECC, ML-KEM), hash (SHA-256, SHA-3), signatures (ECDSA, ML-DSA, SLH-DSA), key exchange (X25519, ML-KEM).
  • Library versions: OpenSSL 3.x, BoringSSL, libsodium, and their specific versions.
  • Key metadata: Algorithm, key length, creation date, rotation schedule, responsible owner.
  • Protocol usage: TLS versions, cipher suites, certificate chains.
  • Hardware dependencies: HSM models, TPM versions, secure enclave capabilities.
  • Deprecation status: Current NIST guidance for each algorithm (approved, deprecated, or prohibited).

Here’s a Python-based CBOM scanner that demonstrates how to build a basic inventory from application dependencies and runtime configurations:

import json
import hashlib
import ssl
from datetime import datetime, timezone
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class CryptoAsset:
    """A single entry in the Cryptographic Bill of Materials."""
    asset_id: str
    algorithm: str
    category: str  # symmetric, asymmetric, hash, kex, protocol
    library: str
    library_version: str
    key_length: Optional[int] = None
    purpose: str = ""
    location: str = ""  # file, module, or service where found
    deprecation_status: str = "unknown"  # approved, deprecated, prohibited
    last_assessed: str = field(default_factory=lambda: datetime.now(
        timezone.utc).isoformat())
    risk_score: float = 0.0  # 0.0 (safe) to 1.0 (critical)


# NIST-recommended deprecation mapping (simplified)
NIST_STATUS = {
    "RSA-1024": "prohibited", "RSA-2048": "approved",
    "RSA-3072": "approved", "RSA-4096": "approved",
    "ECDSA-P256": "approved", "ECDSA-P384": "approved",
    "ECDSA-P521": "approved",
    "AES-128": "approved", "AES-256": "approved",
    "SHA-1": "prohibited", "SHA-224": "approved",
    "SHA-256": "approved", "SHA-384": "approved", "SHA-512": "approved",
    "3DES": "prohibited", "RC4": "prohibited",
    "TLS-1.0": "prohibited", "TLS-1.1": "prohibited",
    "TLS-1.2": "approved", "TLS-1.3": "approved",
    "X25519": "approved", "ML-KEM-768": "approved",
    "ML-KEM-1024": "approved", "ML-DSA-65": "approved",
    "ML-DSA-87": "approved", "SLH-DSA-SHA2-128s": "approved",
}


class CBOMScanner:
    """Scans systems and builds a Cryptographic Bill of Materials."""

    def __init__(self):
        self.assets: list[CryptoAsset] = []
        self._id_counter = 0

    def _next_id(self) -> str:
        self._id_counter += 1
        return f"CBOM-{self._id_counter:04d}"

    def scan_ssl_configuration(self) -> list[CryptoAsset]:
        """Scan the current Python SSL/TLS configuration."""
        results = []
        ctx = ssl.create_default_context()

        # Check the lowest TLS version the default context will accept
        # (minimum_version.name is e.g. "TLSv1_2")
        min_ver = ctx.minimum_version.name.replace("TLSv", "").replace("_", ".")
        status = NIST_STATUS.get(f"TLS-{min_ver}", "unknown")
        risk = 1.0 if status == "prohibited" else (0.5 if status == "deprecated" else 0.0)
        results.append(CryptoAsset(
            asset_id=self._next_id(), algorithm=f"TLS-{min_ver}",
            category="protocol", library="OpenSSL",
            library_version=ssl.OPENSSL_VERSION.split()[1],
            purpose="Minimum accepted TLS protocol version",
            deprecation_status=status, risk_score=risk,
            location="system-ssl-context"
        ))

        # Check cipher suites
        ciphers = ctx.get_ciphers()
        for cipher in ciphers[:10]:  # Limit for demonstration
            algo_name = cipher["name"]
            status = NIST_STATUS.get(algo_name, "unknown")
            risk = 1.0 if status == "prohibited" else (0.5 if status == "deprecated" else 0.0)
            results.append(CryptoAsset(
                asset_id=self._next_id(), algorithm=algo_name,
                category="protocol", library="OpenSSL",
                library_version=ssl.OPENSSL_VERSION.split()[1],
                key_length=cipher.get("alg_bits"),
                purpose=f"Cipher suite (bits={cipher.get('bits', 'N/A')})",
                deprecation_status=status, risk_score=risk,
                location="system-ssl-ciphers"
            ))
        return results

    def scan_python_hashes(self) -> list[CryptoAsset]:
        """Inventory hash algorithms available in Python stdlib."""
        import platform  # local import: reports the running interpreter version
        results = []
        for name in hashlib.algorithms_available:
            # Normalize "sha1" -> "SHA-1", "sha256" -> "SHA-256" for the lookup
            key = name.upper().replace("_", "-")
            if key.startswith("SHA") and len(key) > 3 and key[3].isdigit():
                key = "SHA-" + key[3:]
            status = NIST_STATUS.get(key, "unknown")
            risk = 1.0 if status == "prohibited" else (0.5 if status == "deprecated" else 0.0)
            results.append(CryptoAsset(
                asset_id=self._next_id(), algorithm=name,
                category="hash", library="Python hashlib",
                library_version=platform.python_version(),
                purpose="Hash function availability",
                deprecation_status=status, risk_score=risk,
                location="stdlib"
            ))
        return results

    def assess_risk(self, asset: CryptoAsset) -> float:
        """Calculate risk score based on deprecation and age."""
        status_weights = {"approved": 0.0, "deprecated": 0.5, "prohibited": 1.0, "unknown": 0.3}
        base = status_weights.get(asset.deprecation_status, 0.3)
        # Penalize shorter key lengths for asymmetric algorithms
        if asset.category == "asymmetric" and asset.key_length:
            if asset.key_length < 2048:
                base = max(base, 0.8)
        return min(base, 1.0)

    def build_cbom(self) -> dict:
        """Build the complete CBOM by running all scanners."""
        self.assets.extend(self.scan_ssl_configuration())
        self.assets.extend(self.scan_python_hashes())

        # Reassess all risks
        for asset in self.assets:
            asset.risk_score = self.assess_risk(asset)

        # Sort by risk (critical first)
        self.assets.sort(key=lambda a: a.risk_score, reverse=True)

        return {
            "cbom_version": "1.0",
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "total_assets": len(self.assets),
            "critical_count": sum(1 for a in self.assets if a.risk_score >= 0.8),
            "assets": [asdict(a) for a in self.assets]
        }


if __name__ == "__main__":
    scanner = CBOMScanner()
    cbom = scanner.build_cbom()
    print(json.dumps(cbom, indent=2))

This scanner produces a structured CBOM that can be integrated into CI/CD pipelines, vulnerability management systems, and compliance dashboards. In a production environment, you’d extend it to scan container images, compiled binaries (using tools like cryptosense or cbom-tool), and network traffic to identify TLS cipher suites in use.
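One natural consumer of that output is a CI gate. The sketch below fails a pipeline when the CBOM contains critical findings; the file name, threshold, and 0.8 cutoff (matching the scanner's `critical_count`) are assumptions, not part of any standard:

```python
import json


def cbom_gate(cbom_path: str, max_critical: int = 0) -> int:
    """Return a CI exit code: non-zero when the CBOM reports more than
    max_critical assets with risk_score >= 0.8."""
    with open(cbom_path) as f:
        cbom = json.load(f)
    critical = [a for a in cbom["assets"] if a["risk_score"] >= 0.8]
    for asset in critical:
        print(f"CRITICAL: {asset['algorithm']} "
              f"({asset['deprecation_status']}) at {asset['location']}")
    return 1 if len(critical) > max_critical else 0
```

A pipeline step would run the scanner, then call `sys.exit(cbom_gate("cbom.json"))` so that a prohibited algorithm blocks the merge rather than surfacing months later in an audit.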

Building a Crypto-Agile Architecture

With the three pillars defined and the CBOM providing visibility, the next step is implementing the actual architectural patterns. Let’s walk through three concrete components: a cryptographic policy engine, a hybrid encryption wrapper for PQC transition, and an algorithm negotiation protocol.

Cryptographic Policy Engine

The policy engine is the brain of a crypto-agile system. It translates organizational security requirements into runtime algorithm decisions. Here’s a Python implementation that demonstrates the pattern:

"""
Cryptographic Policy Engine
Reads machine-readable policy profiles and resolves algorithm
selection based on context, risk level, and compliance requirements.
"""
import yaml
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class AlgorithmSpec:
    """Specification for a single algorithm choice."""
    name: str
    min_key_length: int
    approved: bool = True
    pq_ready: bool = False
    deprecation_date: Optional[str] = None
    notes: str = ""


@dataclass
class PolicyProfile:
    """A complete cryptographic policy profile."""
    name: str
    version: str
    effective_date: str
    asymmetric_encryption: list[AlgorithmSpec] = field(default_factory=list)
    key_exchange: list[AlgorithmSpec] = field(default_factory=list)
    digital_signatures: list[AlgorithmSpec] = field(default_factory=list)
    symmetric_encryption: list[AlgorithmSpec] = field(default_factory=list)
    hashing: list[AlgorithmSpec] = field(default_factory=list)


# Default policy aligned with NIST CSWP 39 recommendations
DEFAULT_POLICY_YAML = """
name: nist-cswp39-2025
version: "1.0"
effective_date: "2025-12-01"

asymmetric_encryption:
  - name: ML-KEM-768
    min_key_length: 768
    pq_ready: true
    notes: "NIST FIPS 203 - Primary KEM for general use"
  - name: RSA-4096
    min_key_length: 4096
    pq_ready: false
    deprecation_date: "2030-01-01"
    notes: "Legacy fallback, plan for retirement"

key_exchange:
  - name: ML-KEM-768
    min_key_length: 768
    pq_ready: true
    notes: "Post-quantum key encapsulation"
  - name: X25519
    min_key_length: 256
    pq_ready: false
    notes: "Classical ECDH, use in hybrid mode only"

digital_signatures:
  - name: ML-DSA-65
    min_key_length: 0
    pq_ready: true
    notes: "NIST FIPS 204 - Primary PQC signature"
  - name: ECDSA-P384
    min_key_length: 384
    pq_ready: false
    deprecation_date: "2030-01-01"
    notes: "Classical fallback for interoperability"
  - name: SLH-DSA-SHA2-128s
    min_key_length: 0
    pq_ready: true
    notes: "NIST FIPS 205 - Stateless hash-based signature"

symmetric_encryption:
  - name: AES-256-GCM
    min_key_length: 256
    pq_ready: true
    notes: "Quantum-resistant symmetric, NIST approved"

hashing:
  - name: SHA-384
    min_key_length: 0
    pq_ready: true
    notes: "Minimum acceptable hash for new systems"
  - name: SHA-512
    min_key_length: 0
    pq_ready: true
    notes: "Preferred for digital signatures"
"""


class CryptoPolicyEngine:
    """Resolves algorithm selection from policy profiles."""

    def __init__(self, policy_data: Optional[str] = None):
        raw = policy_data or DEFAULT_POLICY_YAML
        self.profile = self._load_profile(raw)
        self._require_pq = False  # Toggle for PQC-first mode

    def _load_profile(self, yaml_str: str) -> PolicyProfile:
        data = yaml.safe_load(yaml_str)
        return PolicyProfile(
            name=data["name"],
            version=data["version"],
            effective_date=data["effective_date"],
            asymmetric_encryption=[
                AlgorithmSpec(**s) for s in data.get("asymmetric_encryption", [])
            ],
            key_exchange=[
                AlgorithmSpec(**s) for s in data.get("key_exchange", [])
            ],
            digital_signatures=[
                AlgorithmSpec(**s) for s in data.get("digital_signatures", [])
            ],
            symmetric_encryption=[
                AlgorithmSpec(**s) for s in data.get("symmetric_encryption", [])
            ],
            hashing=[
                AlgorithmSpec(**s) for s in data.get("hashing", [])
            ],
        )

    def set_require_pq(self, require: bool):
        """Enable PQC-first mode — only quantum-safe algorithms."""
        self._require_pq = require

    def select(self, category: str, risk_level: RiskLevel = RiskLevel.MEDIUM,
               allow_legacy: bool = False) -> AlgorithmSpec:
        """Select the best algorithm for a given category and context."""
        candidates = getattr(self.profile, category, [])

        # Filter by PQC requirement
        if self._require_pq:
            candidates = [c for c in candidates if c.pq_ready]
        elif not allow_legacy:
            # Prefer PQC, but allow legacy if no PQC option
            pq_candidates = [c for c in candidates if c.pq_ready]
            if pq_candidates:
                candidates = pq_candidates

        # Filter by risk level — higher risk demands stronger algorithms
        if risk_level in (RiskLevel.HIGH, RiskLevel.CRITICAL):
            candidates = [c for c in candidates if c.approved]

        # Filter out algorithms past their scheduled deprecation date
        from datetime import date
        today = date.today().isoformat()  # ISO date strings compare lexicographically
        candidates = [
            c for c in candidates
            if c.deprecation_date is None or c.deprecation_date > today
        ]

        if not candidates:
            raise ValueError(
                f"No approved algorithm for category '{category}' "
                f"with PQC={'required' if self._require_pq else 'optional'}"
            )

        return candidates[0]  # First = highest priority in policy

    def audit(self) -> dict:
        """Generate audit report of current policy coverage."""
        categories = [
            "asymmetric_encryption", "key_exchange",
            "digital_signatures", "symmetric_encryption", "hashing"
        ]
        report = {"profile": self.profile.name, "version": self.profile.version}
        for cat in categories:
            specs = getattr(self.profile, cat, [])
            pq_count = sum(1 for s in specs if s.pq_ready)
            legacy_count = len(specs) - pq_count
            report[cat] = {
                "total": len(specs),
                "pq_ready": pq_count,
                "legacy": legacy_count,
                "primary": specs[0].name if specs else "none"
            }
        return report


if __name__ == "__main__":
    engine = CryptoPolicyEngine()

    # Normal operation
    kem = engine.select("key_exchange")
    print(f"Selected KEM: {kem.name} (PQ-ready: {kem.pq_ready})")

    # High-risk context — PQC required
    engine.set_require_pq(True)
    sig = engine.select("digital_signatures", risk_level=RiskLevel.CRITICAL)
    print(f"Critical-context signature: {sig.name}")

    # Audit
    import json
    print(json.dumps(engine.audit(), indent=2))

This policy engine demonstrates the key architectural pattern: algorithm selection is a runtime decision driven by policy, not a compile-time constant baked into code. Updating the policy YAML file is all that’s needed to shift the entire organization to new algorithms — no code changes, no redeployment of business logic.

Hybrid Encryption for PQC Transition

During the transition to post-quantum cryptography, organizations need to maintain backward compatibility with classical systems while adding quantum-safe protection. The solution is hybrid cryptography — combining a classical algorithm (e.g., X25519) with a post-quantum algorithm (e.g., ML-KEM-768) so the combined scheme is secure as long as either algorithm remains unbroken.

"""
Hybrid Encryption Wrapper
Combines classical ECDH key exchange with post-quantum ML-KEM
for defense-in-depth during the PQC transition period.
"""
import os
import hashlib
from dataclasses import dataclass


@dataclass
class HybridKeyPair:
    """Contains both classical and post-quantum key material."""
    classical_private: bytes
    classical_public: bytes
    pq_private: bytes
    pq_public: bytes
    classical_algorithm: str = "X25519"
    pq_algorithm: str = "ML-KEM-768"


@dataclass
class HybridCiphertext:
    """Combined ciphertext from hybrid encryption."""
    classical_encapsulated_key: bytes
    pq_encapsulated_key: bytes
    encrypted_data: bytes
    nonce: bytes


class HybridKEM:
    """
    Hybrid Key Encapsulation Mechanism.
    In production, replace the stub functions with actual calls to:
      - cryptography.hazmat for X25519
      - liboqs-python for ML-KEM
    This wrapper shows the architectural pattern, not production crypto.

    NOTE: The demo below uses deterministic stubs so the encapsulate/
    decapsulate round-trip produces matching secrets. Real KEMs derive
    shared secrets from public/private key material — os.urandom()
    would break the round-trip demonstration.
    """

    @staticmethod
    def _stub_shared_secret(label: str) -> bytes:
        """Deterministic stub that simulates KEM shared secret derivation.
        In production, this is derived from public/private key material."""
        return hashlib.sha256(
            f"{label}-stub-shared-secret".encode()
        ).digest()

    @staticmethod
    def generate_key_pair() -> HybridKeyPair:
        """Generate a hybrid key pair (classical + PQC)."""
        # Classical: X25519 key pair (32 bytes each)
        classical_private = os.urandom(32)
        # Stub: real impl derives public from private via scalar multiplication
        classical_public = hashlib.sha256(
            classical_private + b"x25519-pub-derive"
        ).digest()[:32]

        # Post-quantum: ML-KEM-768 (1184-byte public key, 2400-byte private key)
        pq_private = os.urandom(2400)
        # Stub: real impl derives public from private via ML-KEM keygen
        pq_public = hashlib.shake_256(
            pq_private[:64] + b"ml-kem-pub-derive"
        ).digest(1184)

        return HybridKeyPair(
            classical_private=classical_private,
            classical_public=classical_public,
            pq_private=pq_private,
            pq_public=pq_public,
        )

    @staticmethod
    def encapsulate(public_key: HybridKeyPair) -> tuple[bytes, HybridCiphertext]:
        """
        Encapsulate a shared secret using both classical and PQC KEMs.
        Returns (shared_secret, ciphertext) for the sender.
        """
        # Classical KEM: ECDH encapsulation (stub)
        classical_ct = os.urandom(32)
        classical_shared = HybridKEM._stub_shared_secret("classical-encap")

        # PQC KEM: ML-KEM encapsulation (stub)
        pq_ct = os.urandom(1088)  # ML-KEM-768 ciphertext size
        pq_shared = HybridKEM._stub_shared_secret("pq-encap")

        # Combine shared secrets (plain SHA-256 here; use a real KDF in production)
        combined_secret = hashlib.sha256(
            classical_shared + pq_shared + b"hybrid-kem-combine"
        ).digest()

        return combined_secret, HybridCiphertext(
            classical_encapsulated_key=classical_ct,
            pq_encapsulated_key=pq_ct,
            encrypted_data=b"",  # Actual data encrypted with combined_secret
            nonce=os.urandom(12),
        )

    @staticmethod
    def decapsulate(
        private_key: HybridKeyPair,
        ciphertext: HybridCiphertext
    ) -> bytes:
        """
        Decapsulate the shared secret using both private keys.
        Security: secure as long as EITHER KEM remains unbroken.
        """
        # Classical decapsulation (stub)
        classical_shared = HybridKEM._stub_shared_secret("classical-encap")

        # PQC decapsulation (stub)
        pq_shared = HybridKEM._stub_shared_secret("pq-encap")

        # Same KDF as encapsulation
        combined_secret = hashlib.sha256(
            classical_shared + pq_shared + b"hybrid-kem-combine"
        ).digest()

        return combined_secret


def demo_hybrid_workflow():
    """Demonstrate the hybrid encryption workflow."""
    print("=== Hybrid KEM Demo ===\n")

    # Step 1: Receiver generates hybrid key pair
    receiver_keys = HybridKEM.generate_key_pair()
    print(f"Classical public key: {len(receiver_keys.classical_public)} bytes")
    print(f"PQ public key:        {len(receiver_keys.pq_public)} bytes")

    # Step 2: Sender encapsulates shared secret
    shared_secret, ciphertext = HybridKEM.encapsulate(receiver_keys)
    print(f"\nShared secret:  {shared_secret.hex()[:32]}...")
    print(f"Classical CT:   {len(ciphertext.classical_encapsulated_key)} bytes")
    print(f"PQ CT:          {len(ciphertext.pq_encapsulated_key)} bytes")

    # Step 3: Receiver decapsulates
    recovered_secret = HybridKEM.decapsulate(receiver_keys, ciphertext)
    print(f"\nRecovered:      {recovered_secret.hex()[:32]}...")
    print(f"Match:          {shared_secret == recovered_secret}")

    # Step 4: Security analysis
    print(f"\n=== Security Properties ===")
    print(f"Classical security:  X25519 (128-bit security level)")
    print(f"PQ security:         ML-KEM-768 (NIST Level 3)")
    print(f"Combined:            Secure if EITHER remains unbroken")
    print(f"Overhead vs X25519:  ~1088 bytes (ciphertext) + ~1152 bytes (public key)")


if __name__ == "__main__":
    demo_hybrid_workflow()

The hybrid approach is explicitly recommended by NIST and major protocol designers. TLS 1.3 extensions for hybrid key exchange (draft-ietf-tls-hybrid-design) are already in progress. The architectural principle is simple: run two KEMs in parallel, combine the outputs, and you’re protected against both classical and quantum attacks. The overhead is modest — roughly a kilobyte each of additional ciphertext and public-key size — but the security gain during the transition period is enormous.
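The stub above folds the two secrets through a single SHA-256 call; deployed hybrid designs feed the concatenated secrets through a proper key-derivation function. Here is a sketch of that combiner using an RFC 5869 HKDF built from the standard library — the labels are illustrative choices, and this is not the exact construction specified in the TLS draft:

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 HKDF-Extract with HMAC-SHA256."""
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF-Expand with HMAC-SHA256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_shared_secrets(classical_ss: bytes, pq_ss: bytes) -> bytes:
    """Concatenation combiner: the output stays secret as long as
    EITHER input secret does, assuming HKDF behaves as a secure PRF."""
    prk = hkdf_extract(b"hybrid-kem-v1", classical_ss + pq_ss)
    return hkdf_expand(prk, b"hybrid shared secret", 32)
```

Dropping this in place of the SHA-256 combine step in `HybridKEM` would preserve the round-trip behavior while using a standard, analyzable KDF construction.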

Algorithm Negotiation Protocol

In distributed systems, two parties need to agree on which algorithms to use. A crypto-agile architecture makes this negotiation explicit, policy-driven, and auditable. Here’s a simplified implementation:

"""
Algorithm Negotiation Protocol
Enables two parties to agree on cryptographic algorithms based on
their respective policy profiles, with fallback and logging.
"""
import json
import logging
from dataclasses import dataclass, field
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("algo-negotiation")


@dataclass
class PartyCapabilities:
    """What algorithms a party supports, in preference order."""
    party_name: str
    key_exchange: list[str] = field(default_factory=lambda: [
        "ML-KEM-768", "ML-KEM-1024", "X25519", "ECDH-P256"
    ])
    encryption: list[str] = field(default_factory=lambda: [
        "AES-256-GCM", "ChaCha20-Poly1305", "AES-128-GCM"
    ])
    signatures: list[str] = field(default_factory=lambda: [
        "ML-DSA-65", "ML-DSA-87", "ECDSA-P384", "RSA-4096"
    ])
    require_pq: bool = False  # If True, reject non-PQ options


@dataclass
class NegotiationResult:
    """The outcome of algorithm negotiation."""
    success: bool
    key_exchange: Optional[str] = None
    encryption: Optional[str] = None
    signatures: Optional[str] = None
    is_hybrid: bool = False
    warnings: list[str] = field(default_factory=list)
    selected_by: str = ""  # Which party's preference won


class AlgorithmNegotiator:
    """Negotiates shared algorithm selection between two parties."""

    def negotiate(
        self,
        client: PartyCapabilities,
        server: PartyCapabilities
    ) -> NegotiationResult:
        """Run negotiation. Returns agreed algorithms or failure."""
        result = NegotiationResult(success=False)

        # Key Exchange negotiation
        kex = self._negotiate_category(
            "key_exchange", client.key_exchange, server.key_exchange,
            client.require_pq or server.require_pq
        )
        if kex is None:
            result.warnings.append("No compatible key exchange algorithm")
            return result
        result.key_exchange = kex

        # Encryption negotiation (symmetric ciphers like AES-256 are
        # already considered quantum-resistant, so the PQ requirement
        # applies only to the asymmetric categories)
        enc = self._negotiate_category(
            "encryption", client.encryption, server.encryption,
            require_pq=False
        )
        if enc is None:
            result.warnings.append("No compatible encryption algorithm")
            return result
        result.encryption = enc

        # Signature negotiation
        sig = self._negotiate_category(
            "signatures", client.signatures, server.signatures,
            client.require_pq or server.require_pq
        )
        if sig is None:
            result.warnings.append("No compatible signature algorithm")
            return result
        result.signatures = sig

        # Detect hybrid usage (PQ alongside classical in the same session)
        pq_prefixes = ("ML-KEM", "ML-DSA", "SLH-DSA")
        uses_pq = [
            a.startswith(pq_prefixes)
            for a in [result.key_exchange, result.signatures]
        ]
        result.is_hybrid = any(uses_pq) and not all(uses_pq)

        result.success = True
        logger.info(
            f"Negotiated: KEX={result.key_exchange}, "
            f"ENC={result.encryption}, SIG={result.signatures}, "
            f"hybrid={result.is_hybrid}"
        )
        return result

    def _negotiate_category(
        self,
        category: str,
        client_list: list[str],
        server_list: list[str],
        require_pq: bool
    ) -> Optional[str]:
        """Find first common algorithm in client preference order."""
        pq_prefixes = ("ML-KEM", "ML-DSA", "SLH-DSA")

        common = set(client_list) & set(server_list)
        if not common:
            logger.warning(f"No common {category} algorithms")
            return None

        # Select in client preference order
        for algo in client_list:
            if algo in common:
                if require_pq and not any(algo.startswith(p) for p in pq_prefixes):
                    logger.warning(f"Skipping non-PQ {algo} (PQ required)")
                    continue
                logger.info(f"Selected {category}: {algo}")
                return algo

        return None

    def audit_negotiation(self, result: NegotiationResult) -> dict:
        """Generate audit record for the negotiation."""
        return {
            "success": result.success,
            "algorithms": {
                "key_exchange": result.key_exchange,
                "encryption": result.encryption,
                "signatures": result.signatures,
            },
            "is_hybrid": result.is_hybrid,
            "warnings": result.warnings,
            "pq_compliant": all(
                any(a.startswith(p) for p in ("ML-KEM", "ML-DSA", "SLH-DSA"))
                for a in [result.key_exchange, result.signatures]
                if a
            ) if result.success else False,
        }


# Demo: modern client with PQC, legacy server with classical only
def demo_negotiation():
    modern_client = PartyCapabilities(
        party_name="ModernApp-v3.0",
        key_exchange=["ML-KEM-768", "X25519"],
        encryption=["AES-256-GCM"],
        signatures=["ML-DSA-65", "ECDSA-P384"],
    )

    legacy_server = PartyCapabilities(
        party_name="LegacyService-v1.8",
        key_exchange=["ECDH-P256", "X25519"],
        encryption=["AES-256-GCM", "AES-128-GCM"],
        signatures=["ECDSA-P384", "RSA-2048"],
    )

    pq_server = PartyCapabilities(
        party_name="PQC-Service-v2.0",
        key_exchange=["ML-KEM-768", "ML-KEM-1024"],
        encryption=["AES-256-GCM"],
        signatures=["ML-DSA-65", "ML-DSA-87"],
        require_pq=True,
    )

    negotiator = AlgorithmNegotiator()

    print("=== Scenario 1: Modern client + Legacy server ===")
    result1 = negotiator.negotiate(modern_client, legacy_server)
    print(json.dumps(negotiator.audit_negotiation(result1), indent=2))

    print("\n=== Scenario 2: Modern client + PQC server (PQ required) ===")
    result2 = negotiator.negotiate(modern_client, pq_server)
    print(json.dumps(negotiator.audit_negotiation(result2), indent=2))

    print("\n=== Scenario 3: Legacy client + PQC server (PQ required) ===")
    legacy_client = PartyCapabilities(
        party_name="OldClient-v1.0",
        key_exchange=["ECDH-P256"],
        signatures=["ECDSA-P256"],
    )
    result3 = negotiator.negotiate(legacy_client, pq_server)
    print(json.dumps(negotiator.audit_negotiation(result3), indent=2))


if __name__ == "__main__":
    demo_negotiation()

The negotiation protocol ensures that algorithm selection is explicit, logged, and policy-driven. When a modern client connects to a legacy server, it gracefully falls back to the strongest common algorithm while logging the degradation. When PQC is required and no common algorithm exists, the negotiation fails cleanly rather than silently falling back to a weak algorithm. This is exactly how TLS cipher suite negotiation works — but extended to cover post-quantum algorithms and organizational policy constraints.

NIST PQC Standards: What You Need to Know

Before building your migration plan, you need to understand the target. Here’s a comparison of the three finalized NIST PQC standards:

| Standard | Algorithm | Type | NIST Levels (parameter set) | Key Size (pk/sk) | Signature Size | Speed |
|---|---|---|---|---|---|---|
| FIPS 203 | ML-KEM | Key encapsulation | 1 (512), 3 (768), 5 (1024) | 1,184 / 2,400 bytes (ML-KEM-768) | N/A | Fast (lattice-based) |
| FIPS 204 | ML-DSA | Digital signature | 2 (44), 3 (65), 5 (87) | 1,312 / 2,560 bytes (ML-DSA-44) | 2,420 bytes (ML-DSA-44) | Fast (lattice-based) |
| FIPS 205 | SLH-DSA | Digital signature | 1–5 (variants) | 32 / 64 bytes (128-bit sets) | 7,856–49,856 bytes | Slow (hash-based) |

Key observations:

  • ML-KEM is your primary target for key exchange and encryption. It’s fast, has manageable key sizes, and is already supported by major libraries (OpenSSL 3.5+, BoringSSL, AWS-LC, liboqs).
  • ML-DSA replaces ECDSA and RSA for digital signatures. Similar performance profile to ML-KEM — fast, moderate key sizes.
  • SLH-DSA is a hash-based signature scheme with a critical property: its security relies only on the security of hash functions, making it the most conservative option. The trade-off is large signature sizes and slower performance. It’s ideal for long-lived signatures where you need maximum confidence.

During your transition, you’ll likely use hybrid schemes that combine classical and post-quantum algorithms. For example, X25519 + ML-KEM-768 for key exchange, and ECDSA-P384 + ML-DSA-65 for signatures. This provides immediate quantum resistance while maintaining interoperability with systems that haven’t yet transitioned.
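The wire-size cost of those pairings is easy to quantify from the published parameter-set sizes. The ML-KEM-768 and ML-DSA-65 byte counts below come from FIPS 203 and FIPS 204; ECDSA-P384 is counted as a raw 96-byte r||s signature, and real protocol encodings add a few bytes of framing:

```python
# Back-of-the-envelope wire overhead for the hybrid pairings above.
# Sizes in bytes; X25519's "ciphertext" is its ephemeral public key.
PUBLIC_KEY = {"X25519": 32, "ML-KEM-768": 1184}
CIPHERTEXT = {"X25519": 32, "ML-KEM-768": 1088}
SIGNATURE = {"ECDSA-P384": 96, "ML-DSA-65": 3309}  # ECDSA as raw r||s

hybrid_pk = PUBLIC_KEY["X25519"] + PUBLIC_KEY["ML-KEM-768"]
hybrid_ct = CIPHERTEXT["X25519"] + CIPHERTEXT["ML-KEM-768"]
hybrid_sig = SIGNATURE["ECDSA-P384"] + SIGNATURE["ML-DSA-65"]

print(f"Hybrid KEX public key: {hybrid_pk} bytes (vs 32 classical)")   # 1216
print(f"Hybrid KEX ciphertext: {hybrid_ct} bytes (vs 32 classical)")   # 1120
print(f"Hybrid signature:      {hybrid_sig} bytes (vs 96 classical)")  # 3405
```

A kilobyte or so per handshake is negligible for most web traffic, but it matters for UDP-based protocols with MTU limits and for the constrained devices discussed below.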

PQC Migration Playbook

Here’s a structured approach to migrating your cryptographic infrastructure, derived from CSWP 39 recommendations and real-world migration experience.

Phase 1: Discover and Inventory (Weeks 1–6)

  • Build your CBOM: Use automated scanning tools to identify every cryptographic algorithm, library, key, and protocol in your environment. Don’t forget embedded systems, third-party APIs, and data at rest.
  • Assess your maturity level: Using the NIST model above, honestly evaluate where you stand today. Most organizations are at Level 1 or 2.
  • Identify critical assets: Not all data needs immediate PQC protection. Classify by sensitivity, regulatory requirements, and data lifetime. Data that must remain confidential for 10+ years is the highest priority for harvest now, decrypt later defense.
  • Map dependencies: For each system, identify the cryptographic library chain — from application code through TLS libraries, HSMs, and hardware accelerators. This is where [supply chain security](https://hmmnm.com/ai-supply-chain-attacks/) becomes critical; a single vulnerable dependency in the chain can undermine your entire migration.
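A CBOM entry can be as simple as one structured record per discovered asset. The sketch below is loosely modeled on the spirit of CycloneDX 1.6 cryptographic-asset components; the field names are illustrative assumptions, not the official schema:

```python
# Illustrative CBOM record for one discovered cryptographic asset.
# Field names are NOT the official CycloneDX schema, just the shape
# of information a Phase 1 inventory needs to capture.
import json

cbom_entry = {
    "type": "cryptographic-asset",
    "name": "tls-terminator-cert",
    "assetType": "certificate",
    "algorithm": "RSA-2048",
    "usage": ["authentication", "key-exchange"],
    "location": "lb-prod-01:/etc/ssl/site.pem",
    "owner": "platform-team",
    "dataLifetimeYears": 10,    # drives harvest-now/decrypt-later priority
    "quantumVulnerable": True,  # factoring-based, so broken by Shor
}

# Long-lived data behind a quantum-vulnerable algorithm is top priority.
priority = ("high" if cbom_entry["quantumVulnerable"]
            and cbom_entry["dataLifetimeYears"] >= 10 else "normal")
print(json.dumps({"asset": cbom_entry["name"], "priority": priority}))
```

The prioritization rule at the end is the point: the inventory only pays off when each record carries enough metadata (owner, lifetime, quantum exposure) to rank migration work automatically.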

Phase 2: Design and Pilot (Weeks 7–16)

  • Define your cryptographic policy: Create machine-readable policy profiles (like the YAML example above) that specify approved algorithms, key lengths, and transition timelines.
  • Implement the abstraction layer: Wrap your existing cryptographic operations behind a standard API. This is the most critical engineering investment — it decouples your applications from specific algorithms.
  • Deploy hybrid cryptography: Start with key exchange and TLS termination. Hybrid X25519+ML-KEM support is already available in OpenSSL 3.5+ and major cloud providers.
  • Pilot with non-critical systems: Test your migration tooling on internal tools, development environments, and low-risk services before touching production.
  • Integrate with [zero trust architecture](https://hmmnm.com/zero-trust-architecture-ai-systems/): Crypto agility complements zero trust by ensuring that every trust decision is backed by the strongest available cryptography. If your zero trust enforcement points can’t support PQC algorithms, they become a migration bottleneck.
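The abstraction-layer step is worth sketching concretely: applications depend on a stable interface, and the concrete algorithm behind it becomes a registry or configuration decision. Everything below is illustrative, and the "signatures" are hash-based stand-ins for real library calls:

```python
# Minimal crypto abstraction layer: applications depend on the Signer
# protocol, never on a concrete algorithm. Swapping ECDSA for ML-DSA
# then becomes a registry/config change, not an application rewrite.
import hashlib
from typing import Protocol


class Signer(Protocol):
    algorithm: str
    def sign(self, message: bytes) -> bytes: ...


class FakeEcdsaSigner:
    """Stand-in for a classical backend (real code calls a crypto library)."""
    algorithm = "ECDSA-P384"
    def sign(self, message: bytes) -> bytes:
        return hashlib.sha384(b"ecdsa|" + message).digest()


class FakeMlDsaSigner:
    """Stand-in for a PQC backend."""
    algorithm = "ML-DSA-65"
    def sign(self, message: bytes) -> bytes:
        return hashlib.sha384(b"mldsa|" + message).digest()


REGISTRY = {"ECDSA-P384": FakeEcdsaSigner, "ML-DSA-65": FakeMlDsaSigner}


def get_signer(policy_algorithm: str) -> Signer:
    """Policy decides the algorithm; application code stays unchanged."""
    return REGISTRY[policy_algorithm]()


sig = get_signer("ML-DSA-65").sign(b"release-v2.0")
print(len(sig))
```

The migration then happens entirely at the policy boundary: change the string the policy engine hands to `get_signer`, and every call site picks up the new algorithm.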

Phase 3: Production Migration (Weeks 17–36)

  • Migrate TLS termination: Update load balancers, API gateways, and web servers to support hybrid key exchange. This protects data in transit immediately.
  • Update certificate infrastructure: Move to hybrid certificate chains (classical + PQC signatures). Test with internal CAs before requesting public CA certificates.
  • Encrypt data at rest with PQC: Re-encrypt sensitive data stores using hybrid encryption schemes. Prioritize data with long classification lifetimes.
  • Update signing infrastructure: Code signing, document signing, and authentication tokens should support PQC signatures. Use ML-DSA for performance-sensitive applications and SLH-DSA for long-lived signatures.
  • Validate with adversarial testing: Hire a penetration testing team that specializes in cryptographic migration. Test your fallback paths, algorithm negotiation, and policy enforcement.

Phase 4: Continuous Agility (Ongoing)

  • Automate CBOM updates: Integrate CBOM scanning into CI/CD pipelines so every deployment updates your cryptographic inventory.
  • Monitor deprecation timelines: Subscribe to NIST, IETF, and vendor deprecation notices. Your policy engine should flag algorithms approaching deprecation.
  • Conduct regular migration drills: Every 6–12 months, perform a tabletop exercise simulating an algorithm deprecation. Can you migrate within 30 days? 90 days?
  • Track emerging standards: NIST’s PQC standardization process is ongoing. Additional algorithms (BIKE, HQC, Classic McEliece) may be standardized for specific use cases. Your architecture should be ready to evaluate and adopt them.
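Once a CBOM exists in CI, automated deprecation flagging is only a few lines. A hedged sketch, with both the deprecation list and the CBOM shape as illustrative assumptions:

```python
# CI-style check: scan CBOM entries against a deprecation list and
# fail the build when flagged algorithms appear. The lists and the
# CBOM record shape are illustrative, not a standard format.
DEPRECATED = {"RSA-2048", "ECDH-P256", "SHA-1", "3DES"}

cbom = [
    {"name": "api-gateway-cert", "algorithm": "RSA-2048"},
    {"name": "internal-mtls", "algorithm": "ML-KEM-768"},
]

violations = [e["name"] for e in cbom if e["algorithm"] in DEPRECATED]
if violations:
    print(f"DEPRECATED ALGORITHMS IN USE: {violations}")
    # in a real pipeline: raise SystemExit(1) to fail the build
```

Feeding the `DEPRECATED` set from your policy engine (rather than hard-coding it) closes the loop: a NIST or vendor deprecation notice becomes a one-line policy update that immediately surfaces every affected system.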

IoT and Legacy System Challenges

The migration playbook above works well for cloud-native applications and modern infrastructure. But many organizations face a much harder problem: cryptographic transitions in IoT devices, embedded systems, and legacy applications that were never designed for agility.

Hardware Constraints

Post-quantum algorithms have significantly larger key sizes and ciphertext sizes than their classical counterparts. ML-KEM-768’s public key is 1,184 bytes compared to X25519’s 32 bytes — a 37× increase. For constrained devices with limited flash storage, RAM, and processing power, this can be prohibitive.

Mitigation strategies include:

  • Session-based key caching: Derive long-term session keys from a single PQC key exchange, amortizing the overhead across many messages.
  • Offloading to gateways: Use edge gateways to handle PQC operations on behalf of constrained devices. The device only needs to communicate with the gateway using lightweight classical cryptography, while the gateway handles PQC key exchange with the cloud.
  • Algorithm selection: Where public-key storage is the binding constraint, SLH-DSA's small public keys (32–64 bytes depending on parameter set) may be preferable; the trade-off is multi-kilobyte signatures and slow signing, so it suits infrequent operations like firmware verification rather than per-message use.
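The session-based caching idea amortizes one expensive PQC exchange across many sessions by deriving per-session keys from the long-term shared secret. A stdlib sketch, with the derivation label as an illustrative assumption:

```python
# Amortizing one PQC key exchange: derive many per-session keys from
# a single long-term shared secret via an HMAC-based expansion, so a
# constrained device pays the ML-KEM cost once, not per message.
import hashlib
import hmac


def derive_session_key(long_term_secret: bytes, session_id: int) -> bytes:
    """Per-session key = HMAC(secret, label || counter). Label is illustrative."""
    info = b"iot-session-key" + session_id.to_bytes(4, "big")
    return hmac.new(long_term_secret, info, hashlib.sha256).digest()


# One (expensive) ML-KEM decapsulation yields long_term_secret; after
# that, each session key is a cheap symmetric derivation.
long_term_secret = b"\xab" * 32  # stand-in for the PQC shared secret
keys = [derive_session_key(long_term_secret, i) for i in range(3)]
print(len(set(keys)))  # distinct key per session
```

Note the forward-secrecy trade-off: compromise of the cached long-term secret exposes all derived sessions, so the cache lifetime should be bounded and the PQC exchange re-run periodically.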

Firmware Update Challenges

Many IoT devices run firmware that can’t be updated — either because there’s no OTA mechanism, because the manufacturer has gone out of business, or because regulatory requirements prohibit changes to certified systems. These devices represent a permanent legacy risk that can’t be solved through software migration alone.

Approaches include:

  • Network-layer isolation: Place non-upgradable devices on isolated network segments with protocol translation gateways that handle PQC on their behalf.
  • Hardware replacement planning: Include cryptographic capability requirements in procurement specifications. Any device without PQC upgradeability should have a defined end-of-life date.
  • Supply chain due diligence: Evaluate vendor PQC roadmaps during procurement. Require contractual commitments for cryptographic agility in firmware updates.

Supply Chain Dependencies

Your crypto agility is only as strong as your weakest supplier. If a critical SaaS provider, hardware vendor, or open-source library can’t support PQC algorithms, your migration is blocked regardless of your own readiness. This is where CBOM integration with SBOM becomes essential — you need to track not just your own cryptographic dependencies but those of every component in your supply chain.

Enterprise Roadmap: From Level 1 to Level 5

Here’s a practical roadmap for advancing through the NIST maturity model, tailored to different organizational sizes:

| Maturity Level | Timeline | Key Actions | Investment |
|---|---|---|---|
| Level 1 → 2 | 1–3 months | Run CBOM scanner. Document known crypto assets. Assign owners. Subscribe to NIST deprecation notices. | Low (tooling + part-time analyst) |
| Level 2 → 3 | 3–6 months | Complete CBOM. Create formal policy profiles. Implement crypto abstraction layer in new code. Begin hybrid TLS deployment. | Medium (engineering sprint + policy work) |
| Level 3 → 4 | 6–18 months | Deploy policy engine. Integrate CBOM with SBOM/CI-CD. Automate deprecated algorithm detection. Complete PQC migration for critical systems. | High (dedicated team + infrastructure) |
| Level 4 → 5 | 18–36 months | Integrate crypto agility into enterprise risk management. Continuous monitoring. Regular migration drills. Full PQC coverage including IoT/legacy. | Ongoing (embedded in security operations) |

The critical insight is that advancing from Level 1 to Level 3 is achievable within 6 months with modest investment. The engineering work is straightforward — it’s the organizational awareness and prioritization that’s the bottleneck. The 93% of organizations without a formal PQC plan aren’t lacking technical capability; they’re lacking urgency.

Key Takeaways

  • Crypto agility is not optional. With NIST PQC standards finalized and “harvest now, decrypt later” threats already active, the question is when — not if — you’ll need to migrate. Agility determines whether that migration takes weeks or years.
  • The CBOM is your foundation. You cannot manage cryptographic risk without knowing where your cryptography lives. Build a Cryptographic Bill of Materials and integrate it into your SBOM workflows.
  • Policy-driven architecture is the target. Algorithm selection should be a configuration decision, not a code change. The three pillars — modularity, abstraction, and policy-mechanism separation — make this possible.
  • Hybrid cryptography bridges the gap. During the transition period, combining classical and post-quantum algorithms provides immediate quantum resistance while maintaining interoperability.
  • Start now, even if imperfectly. Moving from Level 1 to Level 3 in 6 months is achievable and dramatically improves your posture. Don’t wait for a perfect plan — build the foundation and iterate.
  • Legacy and IoT systems need special attention. Hardware constraints and firmware immutability mean some systems can’t be migrated through software alone. Plan for network isolation, gateway translation, and hardware replacement.
  • Integrate with broader security frameworks. Crypto agility complements zero trust architecture and supply chain security. Together, they create a defense-in-depth posture that can adapt to evolving threats.

References

  1. NIST, Considerations for Achieving Crypto Agility (CSWP 39), December 2025. https://csrc.nist.gov/pubs/cswp/39/final
  2. NIST, FIPS 203: Module-Lattice-Based Key-Encapsulation Mechanism Standard (ML-KEM), August 2024.
  3. NIST, FIPS 204: Module-Lattice-Based Digital Signature Standard (ML-DSA), August 2024.
  4. NIST, FIPS 205: Stateless Hash-Based Digital Signature Standard (SLH-DSA), August 2024.
  5. Quantum Insider, Enterprise PQC Readiness Report, March 2025.
  6. Stevens, M. et al., The first collision for full SHA-1, CRYPTO 2017.
  7. Google Security Blog, Heartbleed, April 2014.
  8. IETF, draft-ietf-tls-hybrid-design: Hybrid Key Exchange in TLS 1.3, 2024–2025.
  9. Prabhu Kalyan Samal, Zero Trust Architecture for AI Systems, Hmmnm. https://hmmnm.com/zero-trust-architecture-ai-systems/
  10. Prabhu Kalyan Samal, AI Supply Chain Attacks, Hmmnm. https://hmmnm.com/ai-supply-chain-attacks/