Glue | tecRacer Amazon AWS Blog

09 Sep '22

Glue Crawlers: No GetObject, No Problem

Written by André Reinecke , Maurice Borgmeier

This is the story of how we accidentally learned more about the internals of Glue Crawlers than we ever wanted to know. Once upon a time (a few days ago), André and I were debugging a crawler that didn’t do what it was supposed to. Before we dive into that, maybe some background on Crawlers first. Glue Crawlers are used to create tables in the Glue Data Catalog. They crawl, i.


03 May '22

Glue Crawlers don't correctly recognize Ion data - here's how you fix that

Written by Maurice Borgmeier

Amazon Ion is one of the data serialization formats you can use when exporting data from DynamoDB to S3. Recently, I tried to select data from one of these exports with Athena after using a Glue Crawler to create the schema and table. It didn’t work, and I got a weird error message. In this post, I’ll show you how to fix that problem. If you’re not familiar with Ion yet, check out my recent blog post introducing it for more details.


08 Feb '22

Working around Glue's habit of dropping unsuspecting columns

Written by Maurice Borgmeier

This post explains how to work around Glue’s problem of selective amnesia when creating Dynamic Frames from the Glue Data Catalog.


25 Jan '22

Solving Hive Partition Schema Mismatch Errors in Athena

Written by Maurice Borgmeier

Working with CSV files and Big Data tools such as AWS Glue and Athena can lead to interesting challenges. In this post, I explain how to solve a particular problem that I encountered in a project: the HIVE_PARTITION_SCHEMA_MISMATCH error.


03 Dec '21

Using PySpark and AWS Glue to analyze multi-line log files

Written by Maurice Borgmeier

Log file analytics with Spark is tricky. One of the common problems is multi-line logs. In this post, I explain how you can use PySpark to get your multi-line logs into a structured data frame.


22 Jun '21

What I wish somebody had explained to me before I started to use AWS Glue

Written by Maurice Borgmeier

There are many components under the Glue umbrella that can fit together into a cohesive big picture. In this introduction to Glue, I explain my version of this big picture.

