Articles tagged with "level-300"

Push-Down-Predicates in Parquet and how to use them to reduce IOPS while reading from S3

Working with datasets in pandas will almost inevitably bring you to the point where your dataset doesn’t fit into memory. Especially parquet is notorious for that since it’s so well compressed and tends to explode in size when read into a dataframe. Today we’ll explore ways to limit and filter the data you read using push-down-predicates. Additionally, we’ll see how you can do that efficiently with data stored in S3 and why using pure pyarrow can be several orders of magnitude more I/O-efficient than the plain pandas version.

The beating heart of SQS - of Heartbeats and Watchdogs

Using SQS as a queue to buffer tasks is probably the most common use case for the service. Things can get tricky if these tasks have a wide range of processing durations. Today, I will show you how to implement an SQS consumer that utilizes heartbeats to dynamically extend the visibility timeout to accommodate different processing durations.

Implementing Pessimistic Locking with DynamoDB and Python

I will show you how to implement pessimistic locking using Python with DynamoDB as our backend. Before we start, we’ll review the basics and discuss some of the design criteria we’re looking for. In an earlier post, I outlined to you how to implement optimistic locking using DynamoDB. There, I explained some of the reasons why locking is useful and which issues it can prevent. If you’re unfamiliar with the topic, I suggest you check that one out first.

Glue Crawlers: No GetObject, No Problem

This is the story of how we accidentally learned more about the internals of Glue Crawlers than we ever wanted to know. Once upon a time (a few days ago), André and I were debugging a crawler that didn’t do what it was supposed to. Before we dive into that, maybe some background on Crawlers first. Glue Crawlers are used to create tables in the Glue Data Catalog. They crawl, i.

Find all Lambda-Runtimes in all Accounts: Multi Account Query with steampipe and TASFKAS (the AWS service formerly known as SSO *)

You have got some mails from AWS: [Action Required] AWS Lambda end of support for Node.js 12 [Action Required] AWS Lambda end of support for Python 3.6 [Solution Required] Search all Lambdas in multiple accounts. [Solution Found] Steampipe with AWS multi-account support. Multi-account management is like managing all the arms of a Kraken. I will show you a fast and straightforward solution for this. (* the new offical name is IAM Identity Center, but I think TASFKAS would also fit 😉)