What if HIPAA Violations Failed at Compile Time?

Mario Antonetti
2026-02-06

Somewhere right now, an engineer is deploying a service that stores patient health records in an unencrypted S3 bucket sitting in a region that violates data residency requirements. They don't know it yet. They'll find out during the audit — or worse, after the breach.

The compliance model for cloud infrastructure is backwards. We build first, audit later, and pray in between. HIPAA, GDPR, SOC 2, and FedRAMP are formal requirements with specific technical controls: encrypt this, restrict that, log everything, retain for exactly this long. They're computable. So why are we checking them with checklists and consultants instead of compilers?

Proven Architecture treats compliance requirements as types and valid architectures as proofs. If your architecture typechecks, it satisfies the spec. If it doesn't, you get a type error instead of millions of dollars in fines.

This post is written in Lean 4 using Verso. Every definition, every constraint, every proof obligation typechecks. If it didn't, you wouldn't be reading it.

Compliance Is a Relation

Compliance is a relation between data and the infrastructure that holds it. Whether a deployment is compliant depends on what data you're putting in it. The same S3 bucket in us-east-1 is perfectly fine for US patient records and a violation for German patient records with EU residency requirements.

Dependent types make this precise. In Proven Architecture, the data type itself carries its constraints. A PatientRecord has fields tagged as sensitive health data and a jurisdiction field that says where this patient's data must reside. A FileStore for PatientRecord is a dependent type — the store's valid configurations depend on the data it holds. If the data is tagged as health-sensitive, the store must be encrypted. If the patient's jurisdiction is Germany, the store must be in an EU region. These are proof obligations the type system generates. Satisfy them or the architecture doesn't compile.

Tagged Fields: Data That Knows What It Is

The fundamental idea: wrap field types with sensitivity tags. A Sensitive.Health String is a string that knows it's protected health information, and that knowledge propagates upward through any structure that contains it. I call this a tag; more precisely, it's realized as a chain of typeclass instances through the containing structures.

namespace Sensitive

structure Health (α : Type) where
  val : α
  deriving Repr

structure Identity (α : Type) where
  val : α
  deriving Repr

end Sensitive

These are zero-cost wrappers at the conceptual level — they carry no runtime data beyond the wrapped type. Their purpose is entirely type-theoretic: a structure containing a Sensitive.Health field is a structure that holds health data, and the type system can see that.

inductive Jurisdiction where
  | us | eu_de | eu_ie | eu_fr
  deriving DecidableEq, Repr

def Jurisdiction.country : Jurisdiction → Jurisdiction
  | j => j -- identity for now; in practice maps to a Country type

inductive JurisdictionRegion where
  | usEast1 | usWest2 | euWest1 | euCentral1
  deriving DecidableEq, Repr

def JurisdictionRegion.jurisdiction : JurisdictionRegion → Jurisdiction
  | .usEast1 => .us
  | .usWest2 => .us
  | .euWest1 => .eu_ie
  | .euCentral1 => .eu_de

Typeclasses: What the Data Demands

Typeclasses on the data type capture what kind of regulatory regime governs it. These are derived from the structure of the type itself:

class ContainsHealthData (α : Type) where
  requiresEncryption : Bool := true
  requiresAuditLog : Bool := true
  minRetentionDays : Nat := 2190 -- 6 years

class Locatable (α : Type) where
  jurisdiction : α → Jurisdiction

ContainsHealthData is a property of the type — every term of that type is health data, so every store holding it must be encrypted and audited. Locatable is term-dependent — different terms of the same type can have different jurisdictions, and the store constraint follows the term.

Now the architect defines their domain type:

structure PatientRecord where
  patientId : Nat
  diagnosis : Sensitive.Health String
  treatmentPlan : Sensitive.Health String
  jurisdiction : Jurisdiction
  deriving Repr

instance : ContainsHealthData PatientRecord := {} -- all fields default

instance : Locatable PatientRecord where
  jurisdiction r := r.jurisdiction

One type. The ContainsHealthData instance exists because the type has Sensitive.Health fields — in a production system, a deriving handler would generate this automatically. The Locatable instance projects the jurisdiction from each term. One type, and the constraints follow from its structure.

FileStore: Dependent on What It Stores

A file store for a particular data type is a dependent type. Its valid configurations depend on what the data demands:

inductive StorageBackend where
  | aws_s3 | gcp_gcs | azure_blob
  deriving DecidableEq, Repr

inductive EncryptionStandard where
  | aes256 | aes128 | none
  deriving DecidableEq, Repr

def EncryptionStandard.isApproved : EncryptionStandard → Bool
  | .aes256 => true
  | .aes128 => true
  | .none => false

structure FileStore (α : Type) [ContainsHealthData α] where
  backend : StorageBackend
  region : JurisdictionRegion
  encryption : EncryptionStandard
  encryptionOk : encryption.isApproved = true

Read the type: FileStore α is parameterized by the data type α, with a ContainsHealthData constraint. The encryptionOk field is a proof obligation — constructing a FileStore PatientRecord requires evidence that the chosen encryption standard is approved. With encryption := .none, the obligation EncryptionStandard.none.isApproved = true reduces to false = true. No proof term exists. The store must be encrypted because the data it holds is health data.

This is a construction constraint. The unencrypted S3 bucket from the opening paragraph is a type with no inhabitants.

def usPatientStore : FileStore PatientRecord :=
  { backend := .aws_s3
    region := .usEast1
    encryption := .aes256
    encryptionOk := rfl }

-- This does not compile:
-- def badStore : FileStore PatientRecord :=
--   { backend := .aws_s3
--     region := .usEast1
--     encryption := .none
--     encryptionOk := rfl }
-- ERROR: false ≠ true

The Binding: Where Dependent Types Earn Their Keep

The store exists. The data type exists. The compliance question is: can this particular term be stored in this particular store? This is where the term-level dependency matters — the same FileStore PatientRecord might be valid for one patient record and invalid for another, depending on jurisdiction:

structure CompliantBinding (α : Type) [ContainsHealthData α] [Locatable α] where
  record : α
  store : FileStore α
  residencyOk : store.region.jurisdiction = Locatable.jurisdiction record

residencyOk is a proof that the store's region matches this particular term's jurisdiction. The proof obligation changes with the term. A US patient in a US-region store? rfl. A German patient in a US-region store? Jurisdiction.us = Jurisdiction.eu_de — false. No proof. No binding. Type error instead of HIPAA violation.

def usPatient : PatientRecord :=
  { patientId := 1
    diagnosis := ⟨"Type 2 diabetes"⟩
    treatmentPlan := ⟨"Metformin 500mg daily"⟩
    jurisdiction := .us }

def dePatient : PatientRecord :=
  { patientId := 2
    diagnosis := ⟨"Hypertension"⟩
    treatmentPlan := ⟨"Lisinopril 10mg daily"⟩
    jurisdiction := .eu_de }

def euPatientStore : FileStore PatientRecord :=
  { backend := .aws_s3
    region := .euCentral1
    encryption := .aes256
    encryptionOk := rfl }

-- US patient in US store: compiles
def usBinding : CompliantBinding PatientRecord :=
  { record := usPatient
    store := usPatientStore
    residencyOk := rfl }

-- German patient in EU store: compiles
def deBinding : CompliantBinding PatientRecord :=
  { record := dePatient
    store := euPatientStore
    residencyOk := rfl }

-- German patient in US store: does not compile
-- def badBinding : CompliantBinding PatientRecord :=
--   { record := dePatient
--     store := usPatientStore
--     residencyOk := rfl }
-- ERROR: Jurisdiction.us ≠ Jurisdiction.eu_de

This is the dependent types argument in full. The same type CompliantBinding PatientRecord has different valid inhabitants depending on which PatientRecord term you bind. The store configuration that works for usPatient is a type error for dePatient. The proof obligation is generated from the term — and that's what dependent types uniquely express.

Provider Guarantees as Structured Evidence

Where does the evidence come from? When we say "S3 in us-east-1 supports AES-256 encryption at rest," that's a claim made by AWS. Today it lives in a whitepaper, a BAA, a documentation page. No cloud provider publishes service guarantees as structured, machine-readable data. SLAs are prose legal documents. Compliance certifications are PDFs.

inductive ProviderGuarantee where
  | encryptionSupported (backend : StorageBackend) (region : JurisdictionRegion) (std : EncryptionStandard)
  | regionAvailable (backend : StorageBackend) (region : JurisdictionRegion)
  | hipaaEligible (backend : StorageBackend)
  | slaCommitment (backend : StorageBackend) (region : JurisdictionRegion) (availability : Float)
  deriving Repr

structure ProviderEvidence where
  guarantee : ProviderGuarantee
  source : String
  retrievedAt : String
  deriving Repr

A ProviderGuarantee is a structured claim traceable to a specific provider commitment. Today, an engineer — or an LLM parsing provider documentation — constructs these terms. Tomorrow, providers could publish them directly as a machine-readable feed. When a provider updates a whitepaper and a guarantee changes, regenerate the terms. Anything depending on a removed guarantee fails to typecheck.
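As a sketch of what constructing such evidence looks like today, here is one hypothetical term. The source description and retrieval date are illustrative placeholders, not actual provider publications:

```lean
-- Hypothetical evidence term: the source and retrievedAt strings
-- are illustrative placeholders, not real provider commitments.
def s3UsEast1Aes256 : ProviderEvidence :=
  { guarantee := .encryptionSupported .aws_s3 .usEast1 .aes256
    source := "provider encryption-at-rest documentation"
    retrievedAt := "2026-02-01" }
```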

This is a trust boundary made explicit. The proof localizes trust in the provider. You trust the provider's published guarantees. Everything above that layer is verified.

Composing Architectures

A real system is a collection of bindings. Compliance composes because each binding independently carries its proof:

structure Architecture where
  bindings : List (CompliantBinding PatientRecord)

Adding a service means constructing a new CompliantBinding. The proof obligation is local — you don't re-prove the entire architecture. Swapping a store means re-proving only the bindings that reference it. The architecture is modular because each compliance judgment is self-contained.

In a richer version, Architecture would be polymorphic over multiple data types — patient records, billing data, analytics events — each with their own sensitivity tags and compliance requirements. The bindings compose because the typeclass constraints are orthogonal: health data demands encryption, PII demands access control, locatable data demands residency. A structure deriving all three must satisfy all three, and the proofs compose pointwise.
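As a minimal sketch of that composition, assembling an architecture from the bindings constructed earlier introduces no new proof obligations; each element of the list already carries its own:

```lean
-- Both bindings already typechecked; the list just collects them.
def clinicArchitecture : Architecture :=
  { bindings := [usBinding, deBinding] }
```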

SLA Composition

Every cloud architect does availability math. They're just doing it by hand and getting it wrong.

Series composition: 99.9% × 99.9% × 99.9% ≈ 99.7%. People chain ten services and wonder why their three-nines components deliver two.

structure Availability where
  nines : Float
  deriving Repr

def Availability.series (a b : Availability) : Availability :=
  ⟨a.nines * b.nines⟩

def Availability.parallel (a b : Availability) : Availability :=
  ⟨1.0 - (1.0 - a.nines) * (1.0 - b.nines)⟩

Composed availability should be computed from component SLAs. When a provider downgrades an SLA, the composition recomputes and your architecture either still meets spec or fails to typecheck.
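For example, assuming the series and parallel combinators above, with nines holding availability as a fraction, the arithmetic becomes:

```lean
def webTier : Availability := ⟨0.999⟩

-- Three 99.9% services in series: 0.999 × 0.999 × 0.999 ≈ 0.997
#eval (webTier.series webTier).series webTier

-- Two 99.9% replicas in parallel: 1 - 0.001 × 0.001 = 0.999999
#eval webTier.parallel webTier
```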

Cloud costs follow the same pattern. Estimated cost as a type constraint is weaker than compliance proofs — pricing is dynamic — but "this architecture at provisioned capacity costs no more than $X/month" catches overruns as type errors before they become invoice shocks.

structure USD_Monthly where
  amount : Nat
  deriving Repr

instance : LE USD_Monthly where
  le a b := a.amount ≤ b.amount

-- TODO: refine Float to a proper probability type
-- and derive cost bounds from provisioned resources

Who Writes Lean?

The workflow is layered:

Platform engineers and architects define data types, store specs, and service specs in Lean. These are the people who already think about compliance, SLAs, and system-level guarantees. They write the types: PatientRecord, FileStore, CompliantBinding. They construct the proofs.

Lean proves the architecture satisfies the constraints. Before any infrastructure is provisioned — "compile time" for your architecture.

Terraform, Pulumi, or CloudFormation is the output. A FileStore PatientRecord term carries enough information to generate an S3 resource definition with the right region and encryption settings. The proven architecture compiles to standard IaC that humans can read, review, and deploy through existing pipelines.
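A sketch of what that extraction could look like, assuming the definitions above. The region strings and the rendered output shape are illustrative placeholders, not a real Terraform provider mapping:

```lean
-- Illustrative only: region names and the rendered resource string
-- are placeholders, not a faithful aws_s3_bucket schema.
def JurisdictionRegion.awsName : JurisdictionRegion → String
  | .usEast1 => "us-east-1"
  | .usWest2 => "us-west-2"
  | .euWest1 => "eu-west-1"
  | .euCentral1 => "eu-central-1"

def FileStore.toTerraform [ContainsHealthData α] (store : FileStore α)
    (name : String) : String :=
  s!"resource \"aws_s3_bucket\" \"{name}\" in {store.region.awsName}"
```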

Runtime code stays in whatever language it's already in. Developers work with the schemas. The platform team proves that the infrastructure bindings, given those schemas, satisfy the regulatory requirements.

The architecture proof can also specify what contract tests must exist for each service interface. If every service has verified contract tests, and the architecture proof assumes only those contracts, the system's guarantees hold when the tests pass. The proof tells you exactly what the tests need to cover.

What This Enables

When compliance is structural rather than aspirational:

Audit becomes trivial. The proof term is the audit trail. Every CompliantBinding carries a residencyOk proof traceable to concrete terms. Every FileStore carries encryptionOk. Hand the auditor the proof. It typechecked.
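Concretely, assuming the definitions above, the audit artifact is a term in the environment that anyone can ask Lean to display:

```lean
-- #check displays the proof's type: the store's region jurisdiction
-- equals the jurisdiction of the bound patient record.
#check usBinding.residencyOk
```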

Refactoring is safe. Swap a database, change a region, add a service — if the new architecture typechecks against the same constraints, compliance is preserved by construction. Re-prove only what changed.

Provider changes are caught. Regenerate ProviderGuarantee terms from updated documentation. Anything depending on a removed guarantee fails to typecheck. The compiler tells you before the auditor does.

The spec is the documentation. New engineers read the types. FileStore PatientRecord tells you the store must be encrypted. CompliantBinding tells you the region must match the patient's jurisdiction. The compliance posture is in the types, and the types are current because the code wouldn't compile otherwise.

Honest Limitations

This is early-stage research.

Modeling fidelity. Real failure modes are messy — correlated failures, cascading failures, partial degradation, Byzantine behavior. An overly simplified model gives false confidence.

Probabilistic reasoning. Cloud systems are stochastic. SLA composition involves probability, and Float is a placeholder that needs a proper treatment — likely Rat or a dedicated probability type.

Keeping models current. Cloud providers change constantly. The ProviderGuarantee design addresses this structurally, but the pipeline to regenerate terms from updated documentation doesn't exist yet.

Adoption. Someone needs to write Lean specs. The layered workflow helps, but the learning curve can be significant. The deriving handlers that would automatically generate ContainsHealthData from Sensitive.Health fields are the kind of tooling that makes the difference between interesting research and a usable system.

None of these are fundamental obstacles. The core insight — that compliance is a relation between data and infrastructure, that the data itself carries the constraints via tagged field types, and that dependent types let the proof obligations follow the terms — is sound. The question is how far we can push it.

What's Next

I'm building out the core type definitions and working toward a minimal extraction pipeline that outputs Terraform. The ProviderGuarantee design points toward something larger — a machine-readable format for cloud providers to publish service guarantees as structured data. No provider does this today. An open standard for typed provider guarantees would make this entire approach practical at scale.

If this problem interests you — particularly if you work in formal methods, cloud platform engineering, or compliance — I'd like to hear from you. The infrastructure industry spends billions on compliance. The type theory community has spent decades building proof assistants. It's time to connect them.