Logfile analytics with Spark is tricky. One common problem is multi-line logs. In this post I explain how you can use PySpark to get your multi-line logs into a structured data frame.
Articles tagged with "pyspark"
There are many components under the Glue umbrella that fit together into a cohesive big picture. In this introduction to Glue, I explain my version of this big picture.