In 2019, Capital One suffered a breach that exposed data on over 100 million customers in the United States and Canada. The attacker exploited a misconfigured web application firewall to obtain credentials for an AWS IAM role with excessive permissions — an infrastructure resource that had been defined in code, reviewed, approved, and deployed to production. The infrastructure was properly version-controlled. So was the security flaw.
What makes that case especially instructive isn’t that it happened, but that it was entirely preventable. The misconfiguration would have been flagged by any IaC scanner with a basic IAM permissions rule. Nobody was running one.
Why IaC amplifies the misconfiguration problem
The move toward infrastructure as code emerged as a solution to a real problem: hand-defined environments are inconsistent, hard to audit, and slow to replicate. IaC solved all of that. But by moving infrastructure into code, it also imported every error pattern we already knew from software development — with more severe consequences.
A bug in application code affects that application. A misconfiguration in IaC potentially affects the entire infrastructure — and unlike an application bug, it doesn’t require anyone to actively exploit it to cause harm. An unencrypted database is always in that state. A public S3 bucket is always exposed. The attack is optional; the exposure is not.
The other difference is persistence. A CVE in a dependency can be patched in the next update cycle. An infrastructure misconfiguration exists for as long as that resource exists — sometimes years. Organizations accumulate infrastructure resources — things get added, rarely deleted — and misconfigurations travel with them.
What IaC scanning is
Infrastructure as Code security scanning analyzes infrastructure definition files for configurations that introduce security risk — before they’re applied. The analysis happens on source code, not on deployed infrastructure, which means problems are detected at the pull request stage, not weeks after the resource has been running in production.
The formats covered include Terraform (HCL and .tf files), Kubernetes and Helm (YAML manifests for container orchestration), Dockerfile (image definitions), CloudFormation and ARM Templates (the native formats for AWS and Azure), and Ansible (configuration automation playbooks). In practice, most modern teams have at least three or four of these formats coexisting in the same repository.
Gerion’s scanner uses KICS (Keeping Infrastructure as Code Secure), the open-source project created and maintained by Checkmarx, with growing community contributions. KICS has over 2,000 detection queries covering major cloud providers and the most relevant risk categories: access control, encryption, network exposure, container configuration, and secret exposure.
The most common risk categories
Unintended network exposure
The most common pattern — and the one appearing most frequently in real breaches — is a Security Group or firewall configured with overly permissive rules. The obvious version is easy to catch in review:
```hcl
resource "aws_security_group_rule" "ingress_all" {
  type        = "ingress"
  from_port   = 0
  to_port     = 65535
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
}
```

Any port, any protocol, any IP on the planet. But the version that slips through is more dangerous precisely because it seems reasonable: a security group that only opens port 22 (SSH) to 0.0.0.0/0, justified as “temporary for staging environment debugging.” That temporary rule has been in production for two years and nobody remembers creating it.
KICS catches both variants, along with less obvious ones: database administration ports (3306, 5432, 27017) accessible from the internet, full subnets accidentally exposed instead of individual IPs, and unrestricted egress rules that allow data exfiltration without network impediment.
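For contrast, a rule scoped to a specific source network passes these checks. A sketch — the CIDR range and resource names are placeholders for whatever your VPN or bastion setup actually uses:

```hcl
# Hypothetical remediation: SSH reachable only from an internal VPN range,
# attached to a specific security group rather than open to the internet
resource "aws_security_group_rule" "ssh_from_vpn" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["10.8.0.0/16"] # internal VPN range, not 0.0.0.0/0
  security_group_id = aws_security_group.bastion.id
}
```

The narrow port range and private CIDR are exactly the attributes the KICS network-exposure queries evaluate.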
Storage with public access
Misconfigured S3 buckets have been the source of so many data breaches that AWS introduced account-level controls to block them by default. The errors persist regardless — especially when a public access configuration is defined explicitly in Terraform for a legitimate use case (a static web asset bucket, for example) and that same configuration gets copied over to a bucket holding database backups.
```hcl
resource "aws_s3_bucket_acl" "data_bucket" {
  bucket = aws_s3_bucket.data.id
  acl    = "public-read"
}
```

KICS detects buckets with public ACLs, buckets without at-rest encryption policies (SSE), buckets without versioning enabled (relevant for backup strategies), and bucket policy configurations that grant excessive permissions to IAM roles. The key is that it detects this in the .tf file, at the pull request stage, before the terraform apply executes the change in production.
Hardcoded credentials in Kubernetes manifests
Kubernetes has a Secret object specifically designed to manage credentials securely — separated from application code, mountable as environment variables or volumes, with its own access control (RBAC). Despite this, credentials frequently appear directly in Deployment manifests:
```yaml
containers:
  - name: app
    image: myapp:latest
    env:
      - name: DATABASE_URL
        value: "postgres://admin:production_password@db.internal:5432/appdb"
      - name: STRIPE_SECRET_KEY
        value: "sk_live_abc123xyz"
```

The problem isn’t just that these credentials are in the YAML applied to the cluster. It’s that the YAML lives in a git repository, potentially with dozens of collaborators, and remains visible in the full history even if removed in a later commit. KICS detects hardcoded credential patterns in Kubernetes manifests before they reach the repository — and Gitleaks catches them in the history if they already have.
Containers with excessive permissions
The security philosophy behind containers is about reducing the attack surface: a process should have exactly the permissions it needs, nothing more. In practice, containers end up running with far more permissions than necessary because it’s the option that works without debugging.
```yaml
securityContext:
  privileged: true
  runAsUser: 0
  allowPrivilegeEscalation: true
  capabilities:
    add: ["SYS_ADMIN", "NET_ADMIN"]
```

A container with privileged: true has near-complete access to the Kubernetes node it runs on — it can mount host filesystems, access hardware devices, and in many cases escape the container namespace entirely. If a container escape vulnerability exists (and they do, with regularity), the difference between a privileged container and one with minimal permissions is the difference between a contained incident and a breach of the entire node’s infrastructure.
KICS detects privileged: true, containers running as root (runAsUser: 0 or absent runAsNonRoot: true), excessive capabilities, and the absence of basic security configurations like readOnlyRootFilesystem or seccompProfile.
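The inverse of the example above — the shape these queries expect — looks roughly like this; the numeric UID is an arbitrary non-root placeholder:

```yaml
securityContext:
  privileged: false
  runAsNonRoot: true
  runAsUser: 10001              # any non-zero UID; 10001 is a placeholder
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]               # add back only what the process proves it needs
  seccompProfile:
    type: RuntimeDefault
```

Dropping all capabilities and re-adding individually is the pattern that keeps the permission set auditable.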
Disabled or absent encryption
Cloud providers offer at-rest and in-transit encryption on virtually all their storage and database services. Many are disabled by default or require explicit configuration. It’s easy to end up with production infrastructure where RDS has unencrypted backups, EBS volumes lack encryption, or traffic between internal services uses no TLS — not because anyone made a conscious decision not to encrypt, but because nobody enabled the option.
```hcl
resource "aws_db_instance" "production" {
  identifier        = "prod-db"
  instance_class    = "db.t3.medium"
  storage_encrypted = false # explicitly disabled — or simply absent
}
```

KICS detects storage and database resources without encryption configured, connections without mandatory TLS/SSL, and self-signed certificates in production contexts.
Why IaC misconfigs are harder to review than application code
Security review in application code is difficult but has methodology: an experienced reviewer looks for known patterns in business logic they control. Security review in IaC has a different problem: a 600-line Terraform file defining 40 AWS resources contains hundreds of attributes, each with security implications that depend on the values of other attributes in the same file or related files.
A human reviewer who doesn’t have a complete map of which services are in production, what data they handle, and which combination of attributes creates exposure in each specific AWS service cannot do that review reliably. It’s not lack of skill — it’s that the space of possible configurations is too large for exhaustive manual review.
KICS solves this with static analysis: it traverses all attributes of all resources, evaluates value combinations against over 2,000 detection queries, and produces a list of findings in seconds. It doesn’t replace architecture review — but it eliminates the entire class of errors detectable with deterministic rules.
Separating infrastructure repositories doesn’t solve the problem
A common pattern in mature teams is separating infrastructure code from application code — an infra/ repository managed by the platform team, separate from application repositories. It makes organizational sense. But it creates a security coverage gap: if application repositories are integrated with Gerion’s scanning pipeline but the infrastructure repository isn’t, coverage is incomplete.
The Gerion CLI runs like any other CI/CD step — on the same runner as the rest of the pipeline, with access only to the current repository’s code. Adding it to the infrastructure repository pipeline is the same process as adding it to any other:
```yaml
- name: Gerion Security Scan
  run: |
    docker run --rm -v "$PWD:/code" \
      -e GERION_API_URL=${{ secrets.GERION_API_URL }} \
      -e GERION_API_KEY=${{ secrets.GERION_API_KEY }} \
      ghcr.io/gerion-appsec/gerion-cli:latest scan-all /code
```

KICS findings from the infrastructure repository land on the same platform as SAST, SCA, and Secrets findings from application repositories — normalized, with the same branch-based prioritization model and the same financial impact calculation. A Security Group misconfiguration on the main branch of the infrastructure repository carries a 10× multiplier. The same misconfiguration on a feature branch, a 1× multiplier. The Financial Impact Engine doesn’t distinguish between repository types — it distinguishes between what’s in production and what isn’t.
The practical result is that the platform team sees their IaC findings in the same dashboard where the application team sees theirs. No separate tool for infrastructure, no different triage process, no manual tracking spreadsheet for Terraform misconfigurations. A single security backlog, prioritized by real cost, with findings from all repositories and all scanners in one place.