Open Policy Agent for Terraform: Build policy-based guardrails for your IaC deployments

While traditional Infrastructure as Code tools offer a multitude of benefits, they usually fail to meet the security and compliance requirements of modern security-focused organizations when managing infrastructure at scale.

This post will show you how you can leverage Open Policy Agent and Policy as Code to automate security and compliance procedures as well as enforce custom policies across an organization at scale.

Introduction

Everything as Code, also called EaC, is an emerging practice that aims to treat and implement every component of an IT system as code. The main focus is to switch from error-prone manual operations to codified components that follow software development best practices for versioning, scaling, and testing.

When deploying infrastructure to the cloud, Infrastructure as Code has become the de facto standard. Be it CloudFormation, AWS CDK, or cloud-agnostic open-source tools like Terraform, there are a wide variety of options to choose from when provisioning, upgrading, and managing cloud infrastructure. Using IaC offers several benefits in terms of flexibility, speed, and consistency of development and deployment. While the advantages of IaC are clearly visible, traditional IaC tools alone fail to meet the security and compliance requirements of modern security-focused organizations when managing infrastructure at scale.

One solution to this problem is the addition of Policy as Code. Policy as Code tools like Open Policy Agent allow automated and unified policy enforcement as well as security and compliance validation across a company’s technology stack. By expressing policies and guardrails as code, testing, sharing, and enforcement become possible at nearly any scale. The end result is a reduction of human error and a higher level of overall security and compliance.

In this blog, I would like to show you how you can leverage Terraform (IaC) in combination with Open Policy Agent (PaC) to ensure secure and compliant infrastructure deployments.

Workflow

Before I guide you through the example, I would like to start by giving you a general overview of the workflow when using Terraform and Open Policy Agent (OPA) to deploy infrastructure. The figure below highlights the main steps and will function as a blueprint later on.

OPA Workflow

The first step of the process starts with the developer and the development of the Terraform code itself. Without Terraform code, there is nothing to evaluate our OPA policies against. After having implemented the desired infrastructure configuration, we generate a Terraform plan output by using terraform plan -out tfplan. Terraform plan lets us create an execution plan and preview the changes Terraform intends to perform. Afterward, OPA will analyze resource creations, updates, and deletions and compare these planned changes to the permitted actions defined in custom policy documents. Policy documents can be stored either on the local machine or in a remote location like a database. If the intended changes comply with the guardrails defined in the policies, the evaluation is marked as successful. A successful evaluation by OPA leads to terraform apply being executed to deploy the infrastructure to AWS.
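Expressed as a shell script, the workflow could look roughly like the sketch below. It already uses the decision path terraform/analysis/authz that we will define later in this post, and the jq-based extraction of the decision value is an assumption on my part - the exact output shape of opa exec may differ between OPA versions, so verify it before relying on this in automation.

#!/usr/bin/env bash
# Rough sketch of the plan -> evaluate -> apply workflow (assumes jq is installed).
set -euo pipefail

# 1. Create an execution plan and store it in binary form.
terraform plan -out tfplan

# 2. Convert the binary plan into JSON so that OPA can read it.
terraform show -json tfplan > tfplan.json

# 3. Evaluate the planned changes against the policies in the policy/ directory.
result=$(opa exec --decision terraform/analysis/authz -b policy/ tfplan.json | jq -r '.result[0].result')

# 4. Deploy only if the evaluation was successful.
if [ "${result}" = "true" ]; then
  terraform apply tfplan
else
  echo "OPA evaluation failed - aborting deployment" >&2
  exit 1
fi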

Set Up Project Structure

Before we start implementing our OPA policies and Terraform configuration, I would like you to create the project structure. Please create the following files and folders.

├── policy/
│   └── policy.rego
├── main.tf

As you can see, we don’t need much to demonstrate the usage of OPA in combination with Terraform. The file main.tf will contain our Terraform code. The folder policy will contain a single policy file policy.rego that we will use to evaluate our Terraform plan output.
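On a Unix-like system, you can create this structure with two commands:

mkdir -p policy
touch policy/policy.rego main.tf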

Generate Terraform Plan Output

As already described above, the whole OPA evaluation process starts with the Terraform code itself. For that reason, we will create an example configuration containing a few resources. The actual infrastructure is not the main focus of this discussion and has been kept simple on purpose. If you are looking for a challenge, feel free to experiment and use your own Terraform configuration. The example below creates four resources: an S3 Bucket, two EC2 Instances, and an IAM Role. Please copy the code into your main.tf. (You can also download the code straight from GitHub)


################################################################################
# S3
################################################################################

resource "aws_s3_bucket" "this" {
}


################################################################################
# EC2
################################################################################

resource "aws_instance" "instance_A" {
  instance_type = "t2.large"
  ami = data.aws_ami.ubuntu.id
}

resource "aws_instance" "instance_B" {
  instance_type = "t2.large"
  ami = data.aws_ami.ubuntu.id
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}


################################################################################
# IAM
################################################################################

resource "aws_iam_role" "this" {
  name = "example-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect  = "Allow"
        Sid    = ""
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      },
    ]
  })
}

Once you have copied the code into your main.tf, initialize Terraform by running terraform init and generate a Terraform plan output by using the command terraform plan -out tfplan. This command performs a terraform plan and stores the plan output in binary form in the file tfplan. Before we can proceed, we have to convert the binary file into JSON so that it can be read by OPA. Run the command terraform show -json tfplan > tfplan.json to create a JSON version of the Terraform plan output.

Implement OPA Policy

Now that we have converted our plan to JSON, we can start implementing our OPA policy. OPA policies are written in Rego. Rego is a declarative, general-purpose policy language. Given that OPA is Policy as Code, you can implement any policy you want as long as the attributes and values you are evaluating are part of the Terraform plan JSON file.
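To give you a better feeling for that structure, here is a heavily abbreviated, hand-written sketch of what a single entry in the resource_changes section of a plan JSON file roughly looks like. The exact layout depends on your Terraform version, so inspect your own tfplan.json before writing policies against it.

{
  "resource_changes": [
    {
      "address": "aws_instance.instance_A",
      "type": "aws_instance",
      "name": "instance_A",
      "change": {
        "actions": ["create"],
        "after": {
          "instance_type": "t2.large"
        }
      }
    }
  ]
}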

In this example, we will make a couple of basic checks to ensure that only approved AWS resources can be created, deleted, or changed. The example should only be used as a foundation and starting point for your own policies. Please start by copying the policy below into the policy.rego file we created earlier. I will go over the policy step by step and explain the major components in detail later on.


##########################################
# Imports
##########################################

package terraform.analysis

import input as tfplan
import future.keywords.in


##########################################
# Parameters
##########################################

blast_radius := 30

weights := {
    "aws_instance": {"delete": 100, "create": 6, "modify": 1},
    "aws_s3_bucket": {"delete": 100, "create": 20, "modify": 1}
}


##########################################
# Changed & Created Resources
##########################################

res_changes[resource_type] := all {
    some resource_type
    weights[resource_type]
    all := [res |
        res := tfplan.resource_changes[_]
        res.type == resource_type
    ]
}

res_creations[resource_type] := num {
    some resource_type
    res_changes[resource_type]
    all := res_changes[resource_type]
    creates := [res | res := all[_]; res.change.actions[_] == "create"]
    num := count(creates)
}


##########################################
# Policies
##########################################

score := s {
    all := [x |
        some resource_type
        crud := weights[resource_type]
        new := crud["create"] * res_creations[resource_type]
        x := new
    ]
    s := sum(all)
}

deny_iam_changes {
    some resource in tfplan.resource_changes
    violations := [address |
        address := resource.address
        contains(resource.type, "iam")
    ]
    count(violations) > 0
}

check_instance_type {
    some resource in tfplan.resource_changes
    violations := [address |
        address := resource.address
        resource.type == "aws_instance"
        not resource.change.after.instance_type == "t2.micro"
    ]
    count(violations) > 0
}

default authz := false
authz {
    score < blast_radius
    not deny_iam_changes
    not check_instance_type
}

The first section of the policy declares the package terraform.analysis, imports the future keyword in, and imports the evaluation input as tfplan. When we run our OPA evaluation later on, tfplan.json will be our input. By declaring import input as tfplan, we are able to use the keyword tfplan when referencing attributes and values from the plan file instead of having to use input. Even though this step is not strictly necessary, it makes referencing easier.


##########################################
# Imports
##########################################

package terraform.analysis

import input as tfplan
import future.keywords.in

After having declared the imports, we will define two additional parameters - blast_radius and weights. Our OPA policy will analyze the total amount of created, destroyed, and modified resources and will make sure that the combined changes don’t go above a pre-defined threshold. Each resource type that we want to include in this evaluation receives an entry in the weights object as well as a numeric value for delete, create, and modify. The numeric values represent a score that is assigned to each resource type and Terraform action. Creating an S3 Bucket and an EC2 Instance, for example, would result in a score of 6 + 20 = 26. The blast_radius represents the upper boundary for the combined resource change score; the calculated score has to stay below it. Creating two S3 Buckets would not be possible, as the resulting score of 20 + 20 = 40 would exceed the blast_radius of 30.


##########################################
# Parameters
##########################################

blast_radius := 30

weights := {
    "aws_instance": {"delete": 100, "create": 6, "modify": 1},
    "aws_s3_bucket": {"delete": 100, "create": 20, "modify": 1}
}

Next, we will create two objects - res_changes and res_creations. The first object, res_changes, is a collection of all resources that will be changed by Terraform (create, delete, modify) and are included in the weights object. The resources are grouped by resource type. In the case of our example, res_changes will include all aws_instance and aws_s3_bucket resources that are either created, deleted, or modified. We use this object to create the second object, res_creations. res_creations is derived from res_changes and holds the number of resources per resource type that will be created by Terraform. If we create two S3 Buckets, for example, res_creations will map the key aws_s3_bucket to the value 2. Both objects are later used to calculate the score that is compared against the blast_radius.

To keep this example as understandable as possible, only resource creations are considered. Feel free to challenge yourself and implement a separate object for resource deletion and modification on your own; a possible starting point is sketched after the following code block.


##########################################
# Changed & Created Resources
##########################################

res_changes[resource_type] := all {
    some resource_type
    weights[resource_type]
    all := [res |
        res := tfplan.resource_changes[_]
        res.type == resource_type
    ]
}

res_creations[resource_type] := num {
    some resource_type
    res_changes[resource_type]
    all := res_changes[resource_type]
    creates := [res | res := all[_]; res.change.actions[_] == "create"]
    num := count(creates)
}
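As a starting point for the exercise mentioned above, a deletion counter could mirror res_creations almost one-to-one. The following is an untested sketch of my own and not part of the policy file used in this example:

res_deletions[resource_type] := num {
    some resource_type
    res_changes[resource_type]
    all := res_changes[resource_type]
    deletes := [res | res := all[_]; res.change.actions[_] == "delete"]
    num := count(deletes)
}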

The next section deals with the implementation of the actual policies. In the case of our example, we will implement three rules that will be evaluated together - score, deny_iam_changes, and check_instance_type. score calculates the total change score by multiplying the create weight of each resource type with the number of resources of that type that will be created. deny_iam_changes will count all planned changes to resources whose type contains the word iam. We will use this rule to disallow any changes to IAM resources. check_instance_type will flag every EC2 Instance whose instance type is not set to t2.micro.


##########################################
# Policies
##########################################

score := s {
    all := [x |
        some resource_type
        crud := weights[resource_type]
        new := crud["create"] * res_creations[resource_type]
        x := new
    ]
    s := sum(all)
}

deny_iam_changes {
    some resource in tfplan.resource_changes
    violations := [address |
        address := resource.address
        contains(resource.type, "iam")
    ]
    count(violations) > 0
}

check_instance_type {
    some resource in tfplan.resource_changes
    violations := [address |
        address := resource.address
        resource.type == "aws_instance"
        not resource.change.after.instance_type == "t2.micro"
    ]
    count(violations) > 0
}

After having defined all rules, we combine all three into a single rule called authz. authz defaults to false, which means that the evaluation is non-compliant unless proven otherwise. Only if the calculated score is lower than the pre-defined blast_radius, no changes are made to IAM resources, and the instance type of all EC2 Instances is set to t2.micro will the policy evaluation be marked as compliant.


default authz := false
authz {
    score < blast_radius
    not deny_iam_changes
    not check_instance_type
}

Evaluate Policy

Now that we have implemented our Terraform configuration as well as our policy, it is time for the evaluation. To evaluate our policy, we will run the following command.

opa exec --decision terraform/analysis/authz -b policy/ tfplan.json

opa exec will execute OPA against one or more input files. The --decision flag is used to set the rule we want to evaluate; in our case, it will be pointed at authz. The -b flag lets us define the directory that contains our policy files. tfplan.json is the evaluation input file. When running the command, you should receive the following output.

OPA evaluation with IAM
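If you are following along in a terminal, the decision printed by opa exec is a small JSON document roughly like the one below. This is an approximation; the exact shape may vary between OPA versions.

{
  "result": [
    {
      "path": "tfplan.json",
      "result": false
    }
  ]
}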

As you can see, the rule authz evaluates to false. That means that our Terraform configuration is not compliant with our policy. Let’s dig a little deeper. As discussed in the section earlier, Terraform configurations that make any changes to IAM resources will be non-compliant. To check the status of our rule deny_iam_changes, we can point the opa exec command at a different rule via the --decision flag.

opa exec --decision terraform/analysis/deny_iam_changes -b policy/ tfplan.json

As shown by the command output, the rule evaluates to true. That means that our Terraform plan includes changes to IAM resources. To make our configuration compliant, we would have to remove the resource aws_iam_role from our Terraform configuration.

Deny IAM Changes evaluation

Let’s also verify the status of the other two rules, score and check_instance_type, by adjusting the --decision flag. Run the two following commands.

opa exec --decision terraform/analysis/score -b policy/ tfplan.json

opa exec --decision terraform/analysis/check_instance_type -b policy/ tfplan.json

For the evaluation of the score rule, you should receive a result of 32. By combining the pre-defined creation weights of our resources aws_instance and aws_s3_bucket, we receive a score of 6 + 6 + 20 = 32. This means that our current Terraform configuration produces a score higher than the blast_radius of 30. In order to be compliant with our policy, we will have to remove either the S3 Bucket or one of the EC2 Instances.

Score evaluation

The rule check_instance_type also evaluates to true, which is not compliant with our policy. To make our configuration compliant, we will have to change the instance type of our EC2 Instances from t2.large to t2.micro.

Instance Type evaluation

Adjust Terraform Configuration

After having evaluated all the rules of our policy, we will make the necessary Terraform adjustments. Please replace the content of your main.tf with the following snippet.


################################################################################
# S3
################################################################################

resource "aws_s3_bucket" "this" {
}


################################################################################
# EC2
################################################################################

resource "aws_instance" "instance_A" {
  instance_type = "t2.micro"
  ami = data.aws_ami.ubuntu.id
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

As you can see, we removed the aws_iam_role resource and one aws_instance, and changed the instance type of the remaining instance to t2.micro. To evaluate our adjusted configuration, we have to regenerate our tfplan.json. Start by deleting the old tfplan and tfplan.json files. Once deleted, rerun the commands terraform plan -out tfplan and terraform show -json tfplan > tfplan.json. Afterward, reevaluate the OPA policy by executing opa exec --decision terraform/analysis/authz -b policy/ tfplan.json. As you can see, authz now evaluates to true. That means that our new Terraform configuration is compliant and could be deployed. Feel free to rerun the evaluation for the other rules to get a better feeling for how our adjustments affected the outcome; a small helper loop for this is shown below the screenshot.

OPA evaluation after adjustments
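One convenient way to rerun all rules is a small shell loop; this is merely a convenience wrapper around the opa exec command used throughout this post:

# Evaluate every rule of the policy in one go.
for rule in authz score deny_iam_changes check_instance_type; do
  echo "${rule}:"
  opa exec --decision "terraform/analysis/${rule}" -b policy/ tfplan.json
done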

Summary

As you can see, it is not that complicated to create basic guardrails for your IaC deployments with OPA. By leveraging Policy as Code, we were able to make our Terraform deployment more secure and compliant with pre-defined standards. Even though the configuration as well as the policy used as part of this example were simple in nature, I hope the power and potential of using IaC and PaC together have become clear.

Besides OPA, static code analysis tools with pre-defined rules like KICS or Checkov can also be used to handle the most common misconfigurations and security threats. A sensible approach could be to use Checkov to make a first basic scan and OPA to enforce highly customized policies afterward.
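Wired together as a single pipeline step, such a layered setup could look roughly like the sketch below. I am assuming that the checkov CLI is installed and that a failed scan exits with a non-zero status code; please verify both against the Checkov documentation before adopting this.

#!/usr/bin/env bash
set -euo pipefail

# First line of defense: scan the configuration for common misconfigurations.
checkov -d .

# Second line of defense: enforce the custom OPA policies implemented above.
terraform plan -out tfplan
terraform show -json tfplan > tfplan.json
opa exec --decision terraform/analysis/authz -b policy/ tfplan.json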

I hope you had fun and learned something new while working through this short example. I am looking forward to your feedback and questions. If you want to take a look at the complete example code, please visit my GitHub.

— Hendrik
