Out-of-Band Bootstrapping with Chef on AWS Systems Manager

A modern architecture avoids opening any SSH or WinRM/RDP ports to minimize the attack surface of your systems. Instead, it relies on agent-based management channels such as the AWS SSM Agent. But some tools, especially in the configuration management sector, still rely on direct access.

Chef Infra is on track to break this limitation with its new support for out-of-band (OoB) bootstrapping using Knife and arbitrary Train transports.

The mechanics in this post depend on two pull requests that are approved but not yet merged (Chef PR #13534, Train PR #742).


“Bootstrapping” commonly relates to enrolling a system into a centralized management system. With Chef Infra, this means installing the Chef Infra agent on the system, preconfiguring it, getting certificate-based authentication set up, and doing a first run with instructions from the Chef Infra Server.

Previously, this was done via SSH or WinRM only, limiting this to systems with direct reachability. The alternative is a manual process where administrators upload the agent to the machine and set up everything by hand.

In this post, I refer to indirect access as “out-of-band,” regardless of the connection protocol. We will use an SSM-managed EC2 node as the example to illustrate the new possibilities.


Chef leans heavily on metaphorical names for its tools, and knife is the standard tool for interacting with a Chef Server. It can execute many tasks, from inventory gathering to workflow management and configuration. You can even extend it with custom plugins if you need additional functionality or commands specific to your company or project. To use knife, your workstation needs to be known and authenticated to the Chef Server.

knife is also responsible for bootstrapping new nodes:

knife bootstrap FQDN_OR_IP --node-name webserver-01 --user ec2-user --sudo --bootstrap-version 18.2.7

This command will

  • determine the connection protocol: SSH (22/tcp) or WinRM (5985/tcp or 5986/tcp)
  • connect to the machine
  • check the operating system
  • download the Chef Infra agent (in this case with a fixed version of 18.2.7)
  • ask the Chef Server for a new client identity and certificate
  • check for instructions to run (classical Chef run list or assigned policy)

So after this one-line command, the node is part of your Chef infrastructure, and you can manage it centrally along with tens of thousands or even hundreds of thousands of other nodes. You can read more about the internals of Chef bootstrapping on the documentation pages.

General Knife Train Support

Until now, knife had hardcoded SSH and WinRM protocols but already used a modular framework called Train under the hood. This framework specifies an abstract interface for command execution, file transfers, and file operations. The interface then gets implemented by different Train Transports, such as train-winrm.

With the pull requests mentioned at the beginning, this restriction will be lifted. API-based transports (like train-aws, which communicates with the AWS meta-structure) remain turned off because they talk to a platform API rather than executing commands on a machine, but any installed Train Transport for command execution can now be used. This change enables other protocols like train-telnet and out-of-band functionalities like train-awsssm.


For quite a while, an AWS-specific Train Transport has been available under the name train-awsssm. Its primary use so far has been in another project called InSpec, which does compliance checks of cloud platforms or operating systems. It is used daily to check hundreds of thousands of EC2 instances in regulated environments with limited connectivity.

Internally, train-awsssm uses SSM Run Command documents to encapsulate commands that would otherwise be sent over SSH. The SSM agent on EC2 instances polls for instructions via HTTPS, fetches these documents, executes them, and returns output and exit codes. While this approach is not ideal from a latency perspective, it does include an audit trail of any commands sent via this out-of-band connection.
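To get a feeling for this round trip, you can reproduce it by hand with the AWS CLI. The following is only a sketch of the general Run Command flow, not the code train-awsssm runs: it assumes AWS CLI v2 with working credentials and a default region, and ssm_run is a hypothetical helper name.

```shell
# Sketch only: assumes AWS CLI v2 with credentials and a default region configured.
# ssm_run is a hypothetical helper, not part of train-awsssm.
ssm_run() {
  instance_id="$1"
  remote_command="$2"

  # Queue the command as an AWS-RunShellScript document invocation.
  # The shorthand syntax below breaks on commands containing quotes or commas.
  command_id=$(aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name "AWS-RunShellScript" \
    --parameters "commands=[\"$remote_command\"]" \
    --query "Command.CommandId" \
    --output text) || return 1

  # Poll until the invocation reaches a terminal state (30 attempts, 2s apart).
  attempts=0
  status=""
  while [ "$attempts" -lt 30 ]; do
    status=$(aws ssm get-command-invocation \
      --command-id "$command_id" \
      --instance-id "$instance_id" \
      --query "Status" \
      --output text 2>/dev/null) || status=""
    case "$status" in
      Success|Failed|Cancelled|TimedOut) break ;;
    esac
    attempts=$((attempts + 1))
    sleep 2
  done

  # Print the captured standard output; succeed only if the remote command did.
  aws ssm get-command-invocation \
    --command-id "$command_id" \
    --instance-id "$instance_id" \
    --query "StandardOutputContent" \
    --output text
  [ "$status" = "Success" ]
}
```

Calling ssm_run i-1234567890 "uname -s" should print the kernel name of the instance. The polling loop is where the latency mentioned above comes from, and every invocation shows up in the Run Command history of the account.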

Development of the more advanced Session Manager connectivity is ongoing in train-awsssm but is a considerable amount of work due to the WebSocket-based proprietary binary protocol involved. This added capability will speed up out-of-band access, making the wait worth it.

EC2 Prerequisites

To enable EC2 instances for SSM, you have to associate them with an EC2 Instance Profile that has the appropriate privileges. A quick and secure way to ensure this is to attach the AmazonSSMManagedInstanceCore managed policy to the profile's role. Alternatively, you can enable SSM access on a per-region basis using Default Host Management Configuration (DHMC).
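If the instance already has a profile, attaching the managed policy to its role is a single AWS CLI call. This is a minimal sketch: the role name is a placeholder you need to replace, attach_ssm_policy is a hypothetical helper name, and the ARN is that of the AWS-managed policy AmazonSSMManagedInstanceCore.

```shell
# Sketch only: attach the AWS-managed SSM policy to an existing IAM role.
# attach_ssm_policy is a hypothetical helper; pass the role behind your instance profile.
attach_ssm_policy() {
  role_name="$1"
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```

For example, attach_ssm_policy my-webserver-role (a made-up role name) would grant the SSM Agent on instances using that role the permissions it needs to register and receive commands.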

If you associate an instance profile, you must also enable the Instance Metadata Service (IMDS). This service is reachable from the instance via a link-local IP address and should preferably be set to require version 2. The older version 1 is still supported but carries an inherent risk of leaking credentials if an application on your instance has remote file inclusion (RFI) vulnerabilities.
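As a rough sketch of what this looks like in practice, the first helper below enforces IMDSv2 from your workstation, and the second one queries the metadata service from inside the instance using the token handshake that version 2 requires. Both function names are hypothetical; the first assumes a configured AWS CLI, the second assumes curl is installed on the instance.

```shell
# Sketch only: helper names are hypothetical.

# Run on the workstation: enable IMDS and require session tokens (IMDSv2).
require_imdsv2() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-endpoint enabled \
    --http-tokens required
}

# Run on the instance: fetch a metadata path via the IMDSv2 token handshake.
imds_get() {
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}
```

On the instance, imds_get instance-id should then return the instance ID, while a plain version 1 request without the token header is rejected once tokens are required.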

If you want to use EC2 tags inside your Chef cookbooks, take care also to enable passing them into IMDS (which is not default). Chef Infra will then provide them in the node['ec2']['tags_instance_*'] attributes.
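For an existing instance, tag access can be switched on with the AWS CLI. This is a minimal sketch assuming a configured AWS CLI; enable_imds_tags is a hypothetical helper name.

```shell
# Sketch only: expose instance tags through IMDS for an existing instance.
# enable_imds_tags is a hypothetical helper name.
enable_imds_tags() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --instance-metadata-tags enabled
}
```

Once enabled, the tag keys are listed under the tags/instance path of the metadata service, which is where the tag attributes mentioned above get their data from.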

To communicate with the Chef Infra server during and after bootstrapping, enable outgoing HTTPS (443/tcp) traffic to your server address.

Workstation Prerequisites

Your local administrator workstation will need to have an updated Chef Workstation installed, which includes the updates to knife and train. You also need to connect it to your Chef Infra Server.

Then, install your out-of-band Train Transport:

  • for AWS SSM: chef gem install train-awsssm
  • for VMware Guest Operations Management: chef gem install train-vsphere-gom
  • if you intend to use a console connection: chef gem install train-serial

Using OoB Bootstrapping with AWS SSM

Of course, you need to assume an AWS profile that grants you access to the account containing the EC2 instance in question. For managing your AWS credentials, I recommend either the traditional Awsume command or Leapp.

Then, you will need the instance’s IP address or ID to bootstrap. train-awsssm will automatically discover the instance with this information but will not try to use any IP connectivity.
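Before bootstrapping, it can save time to confirm that the instance is actually registered with SSM and currently reachable. The helper below is a sketch with a hypothetical name, assuming a configured AWS CLI; it is a manual pre-flight check and not something knife or train-awsssm requires.

```shell
# Sketch only: report the SSM agent's ping status for one instance.
# ssm_ping_status is a hypothetical helper name.
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query "InstanceInformationList[0].PingStatus" \
    --output text
}
```

A result of Online means the agent is checking in. If the instance is not registered at all, the list is empty and the CLI prints None, which usually points to a missing instance profile or blocked outbound HTTPS.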

While knife has a legacy option to specify the connection protocol (--connection-protocol or -o), you can use Train’s under-documented URL notation instead.

This notation uses the Train Transport name as the URL’s scheme, and you can even add parameters if the Transport offers them:

knife bootstrap awsssm://i-1234567890/ --node-name webserver-01 --bootstrap-version 18.2.7 --user ec2-user --sudo

# Windows needs an extended timeout for execution
knife bootstrap 'awsssm://i-1234567890/?execution_timeout=600' --node-name winserver1 --bootstrap-version 18.2.7

You can check those additional parameters on the respective GitHub repositories (like the parameters for train-awsssm).
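To make the URL notation more tangible, here is a small sketch that splits such a target into its parts using plain shell parameter expansion. It only illustrates how the pieces map to transport, host, and options; Train does its own parsing in Ruby, which may differ in details, and parse_train_target is a hypothetical name.

```shell
# Sketch only: split a Train-style target URL into transport, host, and options.
# parse_train_target is a hypothetical helper; Train's real parser is written in Ruby.
parse_train_target() {
  url="$1"
  scheme="${url%%://*}"   # everything before "://" names the transport
  rest="${url#*://}"      # host, optional path, optional query string
  query=""
  case "$rest" in
    *\?*)
      query="${rest#*\?}"
      rest="${rest%%\?*}"
      ;;
  esac
  host="${rest%%/*}"      # drop the trailing path

  printf 'transport=%s\n' "$scheme"
  printf 'host=%s\n' "$host"

  # Each key=value pair in the query string is a transport option.
  old_ifs=$IFS
  IFS='&'
  for pair in $query; do
    printf 'option %s=%s\n' "${pair%%=*}" "${pair#*=}"
  done
  IFS=$old_ifs
}
```

Running parse_train_target 'awsssm://i-1234567890/?execution_timeout=600' prints the transport awsssm, the host i-1234567890, and a single option execution_timeout=600. Note the quotes around the URL: some shells treat an unquoted question mark as a glob character.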

Future Developments

It is possible to extend the out-of-band capabilities of Train to other cloud providers or hypervisors, and the plugin ecosystem makes this very easy.

Other knife functionality still relies on the non-Train connectivity options. Hence, an extension to other subcommands or unification (knife ssh vs knife winrm) is a logical next step.
