Replace Local Cronjobs with EventBridge/SSM

Every machine has recurring tasks. Backups, updates, runs of configuration management software like Chef, small scripts, …

But one of the problems in a cloud environment is visibility. Instead of scheduling dozens of cron jobs or tasks per instance, would it not be nice to have a central service for this?

You already have one. And it’s called EventBridge…

CloudWatch Events, EventBridge, what?

A long time back, scheduling and event-driven workflows were part of CloudWatch under the name “CloudWatch Events”. Since then, the service has been heavily extended and renamed to “EventBridge”. In Terraform code and older documentation you might still encounter the old name (resources such as aws_cloudwatch_event_rule still carry it) - so do not be surprised.

EventBridge modes

Without trying to duplicate the official AWS documentation on EventBridge, I want to quickly introduce some of the concepts.

First, the Scheduler allows recurring execution of tasks - either at a fixed interval or at defined points in time. The rate() syntax makes it easy to trigger regular runs every few minutes or hours, while the cron() syntax allows complex statements such as “every first Friday of a month”.
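To give a feel for both variants, here are a few example Scheduler expressions (the six cron fields are minutes, hours, day-of-month, month, day-of-week, and year; the values themselves are arbitrary illustrations):

# Every 15 minutes
rate(15 minutes)

# Every day at 06:00
cron(0 6 * * ? *)

# 08:30 on the first Friday of every month
cron(30 8 ? * FRI#1 *)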

We also have the purely event-driven mode of EventBridge, which is hidden in the “Rules” section. Here, you can specify input events (queues, notifications, even events from external AWS partners like Stripe or GitHub) and connect them to almost any AWS service. This is helpful if you want to react immediately to things like expiring ACM certificates or CloudTrail alerts.
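As a small sketch of how such a rule could look in Terraform - the detail-type follows the ACM event documentation, and the SNS topic used as a target is a hypothetical example assumed to exist elsewhere:

# React to ACM certificates that are approaching expiration
resource "aws_cloudwatch_event_rule" "acm_expiry" {
  name = "acm-certificate-expiry"

  event_pattern = jsonencode({
    source        = ["aws.acm"]
    "detail-type" = ["ACM Certificate Approaching Expiration"]
  })
}

# Forward matching events to an (assumed) SNS topic for notification
resource "aws_cloudwatch_event_target" "acm_expiry_notification" {
  rule = aws_cloudwatch_event_rule.acm_expiry.name
  arn  = aws_sns_topic.alerts.arn
}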

In addition, there is a multitude of further features like custom Event Buses, Pipes, etc. If you need serverless event processing, EventBridge is your friend.

Systems Manager

For our context, two of the numerous Systems Manager (SSM) features are relevant.

Most importantly, Run Command and its documents. These are canned documents that take parameters and then execute commands via the SSM Agent. While many are already provided by AWS, you can of course create your own and use them for custom automation.
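As a rough sketch of what a custom document could look like in Terraform (document name and shell command are made up for illustration):

# A minimal custom Command document running a shell script via the SSM Agent
resource "aws_ssm_document" "cleanup_tmp" {
  name          = "Custom-CleanupTmp"
  document_type = "Command"

  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Remove files older than a week from /tmp"
    mainSteps = [{
      action = "aws:runShellScript"
      name   = "cleanupTmp"
      inputs = {
        runCommand = ["find /tmp -type f -mtime +7 -delete"]
      }
    }]
  })
}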

It is worth noting that there is an SSM-integrated option to run actions periodically: State Manager associates instances with tasks and a time to execute them. While this overlaps significantly with EventBridge’s Scheduler, it is limited to instances managed by SSM. Still, it offers the same rate/cron variety for execution.
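A minimal sketch of such an association in Terraform, assuming the AWS-UpdateSSMAgent managed document and a placeholder instance ID:

# Update the SSM Agent every Sunday night via State Manager
resource "aws_ssm_association" "update_agent" {
  name = "AWS-UpdateSSMAgent"

  schedule_expression = "cron(0 2 ? * SUN *)"

  targets {
    key    = "InstanceIds"
    values = ["i-123456789abcdef"]
  }
}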

In contrast to this, EventBridge can also work cross-account (with a custom event bus) and can archive and replay events at a later time.
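A brief sketch of these two capabilities in Terraform (bus name and retention period are arbitrary examples):

# Custom event bus, e.g. to receive events from other accounts
resource "aws_cloudwatch_event_bus" "central" {
  name = "central-operations-bus"
}

# Archive everything on the bus for seven days so events can be replayed later
resource "aws_cloudwatch_event_archive" "central" {
  name             = "central-operations-archive"
  event_source_arn = aws_cloudwatch_event_bus.central.arn
  retention_days   = 7
}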

Wiring Up the Services

One of the core properties of regular execution via on-instance cron or Scheduled Tasks is that it happens at a specific point in time. To carry this property over, we need to set the “Flexible Time Window” option to off.

When you look at the AWS Web Console, you will find multiple predefined event targets. Lambda, Step Functions, ECS Tasks, SQS, and others are ready for selection - but SSM is sadly missing.

The solution is to choose the generic API integration, which provides access to all AWS APIs from within EventBridge. You simply select the service (“Systems Manager”) and the desired action (“SendCommand”).

Now we come to the point where some API knowledge is required, because the next field simply expects JSON. As this is a generic integration, the data schema of the specific command will match the API exactly.

For our SendCommand example, a quick search for “SSM SendCommand API” leads us to the official SendCommand API documentation. The trick is to use only the properties marked as “required” plus the ones you actually need.

For our example, we might end up with a JSON document like this:

{
  "DocumentName": "AWS-ApplyChefRecipes",
  "InstanceIds": ["i-123456789abcdef"],
  "Parameters": {
    "SourceType": "S3",
    "SourceInfo": "https://examplebucket.s3.amazonaws.com/my-cookbook.tgz",
    "RunList": "recipe[my-cookbook]"
  }
}

Of course, every SSM document has its specific parameters, which you can look up in the Systems Manager console under “Documents” (right at the bottom of the navigation).

Terraform Example

As a minimal example, the following Terraform code schedules the same command we used above.

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
}

resource "aws_scheduler_schedule" "execute_chef" {
  name       = "execute_chef"

  flexible_time_window {
    mode = "OFF"
  }

  # schedule_expression = "rate(60 minutes)"

  schedule_expression = "cron(30 8 * * ? *)"
  schedule_expression_timezone = "America/New_York"

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ssm:sendCommand"
    role_arn = aws_iam_role.execute_chef.arn

    input = jsonencode({
      DocumentName = "AWS-ApplyChefRecipes"
      InstanceIds  = ["i-123456789abcdef"]
      Parameters = {
        SourceType = "S3"
        SourceInfo = "https://examplebucket.s3.amazonaws.com/my-cookbook.tgz"
        RunList    = "recipe[my-cookbook]"
      }
    })
  }
}

resource "aws_iam_role" "execute_chef" {
  name        = "execute_chef"
  description = "Role for Scheduling"

  inline_policy {
    name = "InlinePolicy"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Sid      = "InvokeCommand"
          Effect   = "Allow"
          Action   = "ssm:SendCommand"
          Resource = "*"
        }
      ]
    })
  }

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = [
          "scheduler.amazonaws.com"
        ]
      }
    }]
  })
}

Note: Adjust this for least privilege; using wildcards is not good practice in production.
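As a starting point, the inline policy above could be narrowed to the document and the instance used in this example - this also puts the locals from the data sources to use. The ARN patterns follow the documented formats for SSM documents and EC2 instances; the instance ID is the placeholder from the example:

    # Drop-in replacement for the "policy" argument of the inline policy above
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Sid    = "InvokeCommand"
          Effect = "Allow"
          Action = "ssm:SendCommand"
          Resource = [
            # AWS-owned documents have no account ID in their ARN
            "arn:aws:ssm:${local.region}::document/AWS-ApplyChefRecipes",
            # Limit execution to the single target instance
            "arn:aws:ec2:${local.region}:${local.account_id}:instance/i-123456789abcdef"
          ]
        }
      ]
    })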

Summary

Once you know about this way of scheduling and understand how to work with the native APIs, things get very easy. Ideally, you will combine it with Infrastructure as Code to get repeatable deployments.
