Hybrid DNS resolution using Route 53 Endpoints



When implementing a hybrid cloud solution and connecting your AWS VPCs with corporate data centers, setting up proper DNS resolution across the whole network is an important step to ensure full integration and functionality. To accomplish this, you can use Route53 Inbound and Outbound endpoints. In combination with forwarding rules, they allow you to forward DNS traffic between your AWS VPC and your on-premises data centers.

In this blog post, I would like to show you how you can leverage Route53 endpoints in combination with Terraform to establish seamless DNS query resolution across your entire hybrid network.

Architecture

To start, I would like to guide you through the architecture we are planning to deploy as part of this example. The diagram below displays a high-level overview of the setup we are trying to simulate. The overall goal of this example is to create a hybrid network by connecting our AWS and on-premises network using an AWS Site-to-Site VPN. Furthermore, we will set up proper DNS resolution between the sites by leveraging Route53 Inbound and Outbound endpoints.

Architecture

We will set up two VPCs as part of this example. The first VPC on the left-hand side will represent the AWS network while the VPC on the right-hand side will simulate the on-premises network. The on-premises network will be connected to the AWS network using AWS Site-to-Site VPN tunnels. Each network will also receive its own DNS zone. The AWS network will host the zone aws-site.com, while the on-premises network will host the zone on-premises.com.

The AWS VPC will consist of a single EC2 instance called Client that needs to communicate with the on-premises web server. Furthermore, we will implement a private hosted zone named aws-site.com in Route53 and a single A record called client that will point to the IP address of the Client EC2 instance. To allow DNS forwarding between AWS and the on-premises network, Route53 Inbound and Outbound endpoints will be implemented. The Inbound endpoint will be configured to only allow inbound traffic from the DNS server located in the on-premises network and will receive static private IP addresses. We will configure the on-premises DNS server to forward queries for aws-site.com to these IP addresses.

Once the Outbound endpoint has been created, we can implement forwarding rules and associate these rules with the endpoint. Forwarding rules allow you to forward DNS queries for specific domains to certain IP addresses that you define. In our example, we will create a rule that will forward queries for on-premises.com to the IP address of the on-premises DNS server. In order to establish a connection between the AWS and on-premises network via AWS Site-to-Site VPN, we will set up a Virtual Private Gateway as well as a Customer Gateway in the AWS VPC.

The on-premises environment will consist of a VPC with private and public subnets. The private subnet will contain a Web Server that the Client in the AWS VPC needs to reach. Besides the Web Server, we will also set up a custom DNS server on an EC2 instance using bind. To ensure that the on-premises instances use our custom DNS server instead of the AWS Route53 Resolver, we will implement a custom DHCP option set that configures our custom DNS server as the default VPC DNS server. We will configure a DNS zone called on-premises.com on the DNS server. In addition, we will define a single A record named server pointing to the IP of the web server in the zone file. The DNS server will be responsible for forwarding queries aimed at the zone aws-site.com to the Route53 Inbound endpoint and for responding to queries for the on-premises.com zone. A sketch of such a zone file is shown below.
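For reference, a minimal bind zone file for on-premises.com could look like the following sketch. The file path, SOA/NS values, and the web server IP are illustrative assumptions; the actual zone file is provisioned by the bootstrap module used later in this post.

; /var/named/on-premises.com.zone - illustrative sketch, values are assumptions
$TTL 300
@       IN  SOA ns1.on-premises.com. admin.on-premises.com. (
                1       ; serial
                3600    ; refresh
                900     ; retry
                604800  ; expire
                300 )   ; negative caching TTL
        IN  NS  ns1.on-premises.com.
ns1     IN  A   10.0.128.10  ; the DNS server itself
server  IN  A   10.0.0.50    ; hypothetical web server IP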

The public subnet will contain the strongSwan VPN instance that we will use to create a Site-to-Site VPN connection between the AWS network and our simulated on-premises setup. The strongSwan EC2 instance will have two Elastic Network Interfaces, a LAN ENI as well as a WAN ENI. The LAN ENI will be placed in the private subnet of the VPC and will allow EC2 instances running in the private subnets to forward VPN traffic to the virtual appliance; it will only be assigned a private IP address. The WAN ENI will be placed in the public subnet of the VPC and will function as the entry point to the on-premises network. Besides its private IP, it will also be assigned a static public IP by using an Elastic IP. The public IP will be used as the VPN peer IP and will allow the creation of an IPSec VPN connection between the two VPCs.

DNS Flow

Next, I would like to discuss the DNS traffic flow between AWS and the on-premises network when using Route53 Inbound and Outbound endpoints.

Outbound

Outbound

In our example, the flow starts with a DNS query by the AWS Client for server.on-premises.com.

  1. The DNS query is sent to the Amazon-provided DNS address at the base of the VPC network range plus two (VPC+2) and forwarded to the VPC Route53 resolver.
  2. A Route53 forwarding rule has been configured to send queries for the on-premises.com domain to the IP address of the on-premises DNS server. The DNS query is sent to the Route53 Outbound endpoint.
  3. The Route53 Outbound endpoint forwards the query to the on-premises DNS server via the AWS Site-to-Site VPN connection.
  4. The on-premises DNS server receives and resolves the DNS query for server.on-premises.com.

Inbound

Inbound

In our example, the flow starts with a DNS query by the on-premises Web Server for client.aws-site.com.

  1. The DNS query is sent to the on-premises DNS server.
  2. A forwarding rule has been configured on the DNS server to send queries for the aws-site.com domain to the IP addresses of the Route53 Inbound endpoint. The DNS query is sent to the Route53 Inbound endpoint via the AWS Site-to-Site VPN connection.
  3. The DNS query arrives at the Route53 Inbound endpoint.
  4. The Route53 Inbound endpoint forwards the query to the VPC+2 address, which in turn forwards the request to the VPC Route53 resolver. The resolver resolves the DNS query for client.aws-site.com.

Project Structure

Before we start setting up our Terraform configuration, I would like you to create the project structure. Please create the following files and folders.

├── aws
│   ├── data.tf
│   ├── outputs.tf
│   ├── route53.tf
│   ├── terraform.tfvars
│   └── variables.tf
├── bootstrap
└── on-premises
    ├── dhcp.tf
    ├── terraform.tfvars
    └── variables.tf

As this example will be quite extensive, we will split our Terraform configuration into separate files and folders. The Terraform configuration for the AWS network will be contained in the aws folder, while the configuration for the on-premises network will be contained in the on-premises folder. The folder bootstrap contains the Terraform code that will be used to bootstrap the aws and on-premises networks.
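Note that both the aws and the on-premises folder also need a provider configuration before terraform init can succeed. A minimal sketch is shown below; the file name providers.tf and the region are assumptions and should be adapted to your environment.

# providers.tf - minimal sketch; the region is an assumption
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "eu-central-1"
}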

Bootstrap Environment

Let’s start this example by bootstrapping the AWS and on-premises network. After having completed the bootstrapping process, we will have the following setup.

Bootstrap Completed

We will leverage an existing Terraform module to quickly bootstrap the AWS and on-premises environments. Please download the module from the GitHub repository of this blog post and copy it into the folder bootstrap/. The module will create the whole AWS and on-premises infrastructure including VPCs, VPN tunnels, and EC2 instances based on a few input variables.

Once downloaded, go into the bootstrap/ folder and run terraform init and terraform apply to deploy the infrastructure.
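For reference, the commands are:

cd bootstrap/
terraform init
terraform apply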

Test Connectivity

After having run the bootstrap Terraform configuration, we will test the connection between the Client and the DNS server as well as the on-premises Web server. We will start with the Client -> DNS server connection.

Open the AWS EC2 Console and select the aws-site-Client EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, execute a ping command to the private IP of the on-premises-DNS.
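Assuming the default values used in this post (the DNS server IP is set to 10.0.128.10 in the on-premises terraform.tfvars later on), the check would look as follows; substitute your instance's actual private IP if it differs.

ping 10.0.128.10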

Client DNS Ping

Next, we will test the communication between the on-premises Web server and the AWS Client. Open the AWS EC2 Console and select the on-premises-Web-Server EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, execute a ping command to the private IP of the aws-site-Client.
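With the value of aws_site_client_ip used later in this post, the check would look as follows; again, substitute the actual private IP of your instance.

ping 172.16.128.10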

Server Client Ping

AWS Configuration

Once the connectivity has been verified, we will continue with the deployment of the Route53 Inbound and Outbound endpoints.

Route53 Endpoint Deployment

Let’s start by defining the Terraform configuration for Route53. Please copy the following code into aws/route53.tf.

locals {
  dns_server_ip_cidr_notation = format("%s/32", var.on_premises_network.dns_server_ip)
}

################################################################################
# Private Hosted Zone
################################################################################

resource "aws_route53_zone" "this" {
  name = format("%s.com", var.application_name)

  vpc {
    vpc_id = data.aws_vpc.this.id
  }
}


################################################################################
# Record
################################################################################

resource "aws_route53_record" "this" {
  zone_id = aws_route53_zone.this.zone_id
  name    = "client"
  type    = "A"
  ttl     = 300
  records = [var.aws_site_client_ip]
}


################################################################################
# Inbound Endpoint
################################################################################

resource "aws_route53_resolver_endpoint" "inbound" {
  name      = "${var.application_name}-inbound"
  direction = "INBOUND"

  security_group_ids = [aws_security_group.inbound.id]

  dynamic "ip_address" {
    for_each = data.aws_subnets.private.ids
    content {
      subnet_id = ip_address.value
    }
  }
}

data "aws_route53_resolver_endpoint" "inbound" {
  resolver_endpoint_id = aws_route53_resolver_endpoint.inbound.id
}


################################################################################
# Outbound Endpoint
################################################################################

resource "aws_route53_resolver_endpoint" "outbound" {
  name      = "${var.application_name}-outbound"
  direction = "OUTBOUND"

  security_group_ids = [aws_security_group.outbound.id]

  dynamic "ip_address" {
    for_each = data.aws_subnets.private.ids
    content {
      subnet_id = ip_address.value
    }
  }
}

resource "aws_route53_resolver_rule" "this" {
  domain_name          = var.on_premises_network.domain_name
  name                 = "outbound"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id

  target_ip {
    ip = var.on_premises_network.dns_server_ip
  }
}

resource "aws_route53_resolver_rule_association" "this" {
  resolver_rule_id = aws_route53_resolver_rule.this.id
  vpc_id           = data.aws_vpc.this.id
}


################################################################################
# Inbound Endpoint Security Group
################################################################################

resource "aws_security_group" "inbound" {
  name   = "${var.application_name}-inbound-endpoint"
  vpc_id = data.aws_vpc.this.id
}

resource "aws_security_group_rule" "inbound_egress" {
  security_group_id = aws_security_group.inbound.id

  type        = "egress"
  from_port   = 0
  to_port     = 0
  protocol    = "-1"
  cidr_blocks = [data.aws_vpc.this.cidr_block]
}


resource "aws_security_group_rule" "inbound_udp_ingress" {
  security_group_id = aws_security_group.inbound.id

  type        = "ingress"
  from_port   = 53
  to_port     = 53
  protocol    = "udp"
  cidr_blocks = [local.dns_server_ip_cidr_notation]
}

resource "aws_security_group_rule" "inbound_tcp_ingress" {
  security_group_id = aws_security_group.inbound.id

  type        = "ingress"
  from_port   = 53
  to_port     = 53
  protocol    = "tcp"
  cidr_blocks = [local.dns_server_ip_cidr_notation]
}


################################################################################
# Outbound Endpoint Security Group
################################################################################

resource "aws_security_group" "outbound" {
  name   = "${var.application_name}-outbound-endpoint"
  vpc_id = data.aws_vpc.this.id
}

resource "aws_security_group_rule" "outbound_udp_egress" {
  security_group_id = aws_security_group.outbound.id

  type        = "egress"
  from_port   = 53
  to_port     = 53
  protocol    = "udp"
  cidr_blocks = [local.dns_server_ip_cidr_notation]
}

resource "aws_security_group_rule" "outbound_tcp_egress" {
  security_group_id = aws_security_group.outbound.id

  type        = "egress"
  from_port   = 53
  to_port     = 53
  protocol    = "tcp"
  cidr_blocks = [local.dns_server_ip_cidr_notation]
}

resource "aws_security_group_rule" "outbound_ingress" {
  security_group_id = aws_security_group.outbound.id

  type        = "ingress"
  from_port   = 0
  to_port     = 0
  protocol    = "-1"
  cidr_blocks = [data.aws_vpc.this.cidr_block]
}

We will deploy a private hosted zone named aws-site.com including an A record client pointing at the private IP address of the AWS Client EC2 instance. In addition, the Inbound and Outbound endpoints will be deployed. Each endpoint will receive its own security group. The Inbound endpoint security group will only allow ingress traffic on port 53 from the IP address of the on-premises DNS server, while the Outbound endpoint security group will only allow egress traffic on port 53 to the IP address of the on-premises DNS server. Setting up the security groups this way ensures that communication between the endpoints and the on-premises network is restricted.

Before we are able to deploy the endpoints, we have to define a few variables and set up Terraform data resources to retrieve values. We will start with defining the variables. Please copy the following snippet into the aws/variables.tf.

variable "application_name" {
  description = "Name of the application"
  type        = string
}

variable "aws_site_client_ip" {
  description = "IP addresse of the client EC2"
  type        = string
}

variable "on_premises_network" {
  description = "Object of On-Premises network"
  type = object({
    domain_name   = string
    cidr_range    = string
    dns_server_ip = string
  })
}

To manage the values of our defined variables, we will create an aws/terraform.tfvars file. This will help us keep our Terraform configuration clean and readable. Please define the following values for our variables in the terraform.tfvars file.

application_name   = "aws-site"
aws_site_client_ip = "172.16.128.10"

on_premises_network = {
  domain_name   = "on-premises.com"
  cidr_range    = "10.0.0.0/16"
  dns_server_ip = "10.0.128.10"
}

Besides the Terraform variables, we will retrieve the information regarding the AWS VPC and private subnets dynamically using Terraform data resources. Please copy the following Terraform configuration into the aws/data.tf.

################################################################################
# Get VPC
################################################################################

data "aws_vpc" "this" {
  filter {
    name   = "tag:Name"
    values = [var.application_name]
  }
}


################################################################################
# Get List of private Subnet IDs
################################################################################


data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.this.id]
  }
  filter {
    name   = "tag:Name"
    values = ["*private*"]
  }
}

Before we deploy the Route53 Inbound and Outbound endpoints, we will add an output to our Terraform configuration. This output will provide the private IP addresses of the Inbound endpoint, which will be needed to configure the forwarding rule on the on-premises DNS server. Let’s create the output by adding the following configuration to the aws/outputs.tf.

output "route53_inbound_endpoint_ips" {
  description = "IPs of Route53 resolver inbound endpoints"
  value       = data.aws_route53_resolver_endpoint.inbound.ip_addresses
}

Go into the aws folder and run terraform init to initialize Terraform and terraform apply to deploy the Route53 infrastructure.

Test Endpoint Connectivity

After having set up the Route53 endpoints, we will test the connection between the on-premises DNS server and the Route53 Inbound endpoint.

Open the AWS EC2 Console and select the on-premises-DNS EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, execute a telnet command on port 53 to the private IPs of the Route53 Inbound endpoint. The IPs of the Inbound endpoint can be found in the Terraform output route53_inbound_endpoint_ips.
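Your endpoint IPs will differ from deployment to deployment; using the sample addresses that AWS allocated in this walkthrough, the checks would look like this:

telnet 172.16.128.43 53
telnet 172.16.129.55 53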

DNS Telnet

If the telnet command succeeds, the DNS server is able to connect to the Route53 Inbound endpoint on port 53. This will allow the on-premises DNS server to forward queries for aws-site.com to the Route53 Inbound endpoint once the proper forwarding rules are configured on the EC2 instance.

On-Premises Configuration

To finalize the setup, we will have to configure the forwarding rule on the on-premises DNS server and implement a custom DHCP option set to define our custom DNS server as the default DNS server.

Configure Forwarding Rule

We will start by configuring the forwarding rule for the aws-site.com domain on the DNS server. Open the AWS EC2 Console and select the on-premises-DNS EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, open the file /etc/named.conf by executing the following command.

sudo vim /etc/named.conf

DNS VIM

Scroll down to the bottom of the file. You should see the following commented-out forwarding rule.

//forward zone to AWS
//zone "aws-site.com" {
//    type forward;
//    forward only;
//    forwarders { ;};
//};

Press i to enter INSERT mode. Remove the // comment markers and add the two IP addresses of the Route53 Inbound endpoint inside forwarders { ;};. In our example, the two IP addresses allocated by AWS are 172.16.128.43 and 172.16.129.55. After having removed the comments and entered the Inbound IP addresses, the forwarding rule should look as follows.

//forward zone to AWS
zone "aws-site.com" {
    type forward;
    forward only;
    forwarders { 172.16.128.43; 172.16.129.55; };
};

Press ESC and type :x to save the new configuration. Press Enter to confirm. Restart the service by using the command sudo systemctl restart named.

Make sure to NOT remove // in front of //forward zone to AWS.
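You can also validate the syntax of the edited file at any time with named-checkconf, which ships with bind. If the command produces no output, the configuration is syntactically valid.

sudo named-checkconf /etc/named.conf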

Deploy DHCP Option Set

After having configured our DNS server, we will configure a custom DHCP option set in our on-premises VPC to ensure that all on-premises EC2 instances use the custom DNS server as the default DNS server. Copy the following Terraform configuration into the on-premises/dhcp.tf.

################################################################################
# Get VPC
################################################################################

data "aws_vpc" "this" {
  filter {
    name   = "tag:Name"
    values = [var.application_name]
  }
}

################################################################################
# DHCP Option Set
################################################################################

resource "aws_vpc_dhcp_options" "this" {
  domain_name         = format("%s.com", var.application_name)
  domain_name_servers = [var.dns_server_ip]

}

resource "aws_vpc_dhcp_options_association" "this" {
  vpc_id          = data.aws_vpc.this.id
  dhcp_options_id = aws_vpc_dhcp_options.this.id
}

Before we are able to deploy the DHCP option set, we have to define a few variables. Please copy the following snippet into the on-premises/variables.tf.

variable "application_name" {
  description = "Name of the application"
  type        = string
}

variable "dns_server_ip" {
  description = "IP of the On-Premises DNS server"
  type        = string
}

To manage the values of our defined variables, we will create an on-premises/terraform.tfvars file. This will help us keep our Terraform configuration clean and readable. Please define the following values for our variables in the terraform.tfvars file.

application_name = "on-premises"
dns_server_ip    = "10.0.128.10"

Go into the on-premises folder and run terraform init to initialize Terraform and terraform apply to deploy the DHCP infrastructure.

Reboot EC2 Instances

To make sure that the running EC2 instances are configured correctly and are using the custom DNS server as the default DNS server, we will reboot the EC2 instances on-premises-DNS, on-premises-Strongswan, and on-premises-Web-Server. Open the AWS EC2 Console and right-click each of the three instances mentioned above. Select the option Reboot Instance to reboot the instance.

Reboot Option

Confirm the reboot by clicking on Reboot.

Confirm Reboot
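Alternatively, you can reboot all three instances with a single AWS CLI command. The instance IDs below are placeholders; replace them with the IDs of your on-premises-DNS, on-premises-Strongswan, and on-premises-Web-Server instances.

# instance IDs are placeholders
aws ec2 reboot-instances --instance-ids i-0123456789abcdef0 i-0123456789abcdef1 i-0123456789abcdef2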

Test Hybrid DNS Resolution

Once the reboot of the EC2 instances has been performed, we will test the hybrid DNS resolution.

We will start by querying server.on-premises.com from the AWS Client. Open the AWS EC2 Console and select the aws-site-Client EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, execute a dig command for server.on-premises.com to perform a DNS lookup for the on-premises domain.
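The two commands used in this test are shown below; the IP address returned by the dig lookup depends on the private IP of your on-premises Web Server.

dig server.on-premises.com
ping server.on-premises.com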

dig AWS Site

You will see that the IP address of the on-premises Web Server has been returned by the VPC DNS server located at the VPC+2 address. Next, issue a ping command to the DNS name of the on-premises-Web-Server.

Client Server DNS Ping

Next, we will test the communication between the on-premises Web server and the AWS Client. Open the AWS EC2 Console and select the on-premises-Web-Server EC2 instance. Click on Connect, select the Session Manager tab, and click Connect. Once you have connected to the instance via the Session Manager, execute a dig command for client.aws-site.com to perform a DNS lookup for the AWS domain.
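Again, the two commands are shown below. Based on the aws_site_client_ip value defined in the aws/terraform.tfvars, the lookup should return 172.16.128.10.

dig client.aws-site.com
ping client.aws-site.com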

dig On-Premises

You will see that the IP address of the AWS Client has been returned by the on-premises DNS server. Next, issue a ping command to the DNS name of the aws-site-Client.

Server Client DNS Ping

If the DNS resolution has been set up correctly, the pings should succeed. Congratulations, you just set up a hybrid network including Site-to-Site VPN connections and proper DNS resolution between AWS and a simulated on-premises network!

Summary

The architecture that we just implemented is meant to serve as a starting point and an example. There are multiple ways to make the solution more robust and efficient. For example, we could make the solution more robust by implementing a dynamic AWS Site-to-Site VPN including a Transit Gateway and multiple redundant VPN tunnels. Please see the blog post Multiple Site-to-Site VPN Connections in AWS Hub and Spoke Topology for a detailed explanation.

I hope you had fun and learned something new while working through this example. I am looking forward to your feedback and questions. If you want to take a look at the complete example code, please visit my GitHub.

— Hendrik


Title Photo by Ian Taylor on Unsplash
