RAG (Retrieval Augmented Generation) is a way to approach the “hallucination” problem with LLMs: a contextual reference increases the accuracy of the answers. Do you want to use RAG in production? The Python langchain library may be too slow for your production services. So what about serverless RAG in a fast Go Lambda?
Articles tagged with "level-300"
“Everything fails all the time” has been preached to us by Werner Vogels for a few years now. Every engineer working on building and maintaining systems knows this to be true. Distributed systems come with their own kind of challenges, and one of the AWS services that helps deal with those is AWS Step Functions. Step Functions lets you describe workflows as JSON and executes those workflows for you. In this blog, we’ll explore what happens when things inevitably go wrong and the options the service offers for error handling and retries, using an example application.
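As a rough illustration of the retry options mentioned above: a Retry rule in a Step Functions state definition is configured with IntervalSeconds, MaxAttempts, and BackoffRate, and the wait before each retry grows by the backoff rate. The helper below, `retry_delays`, is a hypothetical name and not part of any AWS SDK; it is a minimal sketch that only computes the resulting wait schedule and ignores options such as jitter or a maximum delay.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> List[float]:
    """Return the wait (in seconds) before each retry attempt.

    The first retry waits interval_seconds; each later retry multiplies
    the previous wait by backoff_rate.
    """
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Schedule for the service defaults (IntervalSeconds=1, MaxAttempts=3, BackoffRate=2.0).
default_schedule = retry_delays(1, 3, 2.0)
print(default_schedule)
```

Summing the schedule gives a quick estimate of how long a state can spend waiting on retries before the error is passed on to a Catch rule or fails the execution.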
In a recent project, a colleague asked me to look at a HIVE_CURSOR_ERROR in Athena that they weren’t able to get rid of. Since the error message was not incredibly helpful and the way this error appeared is not that uncommon, I thought writing this may help you, dear reader, and future me when I inevitably forget about it again.
Custom runtime on Amazon Linux 2 - Go outperforms Node (3x) and Python (2x) on AWS Lambda cold-start time
The Lambda Go runtime will be deprecated at the end of 2023. The new default, the custom Amazon Linux 2 runtime, really speeds things up for Go. Now the cold start is 2x faster than Python and 3x faster than Node!
In this blog post, I would like to show you the various types of AWS permission management capabilities to give you a better understanding of where access denied API errors (403) may come from and which options there are to grant permissions. You should be familiar with IAM policies to get the most out of the blog.
Rotating credentials with grace can be challenging when the underlying service doesn’t support scheduled deletion. Today I will show you how to implement access key rotation for an IAM user while supporting a grace period where both the new and old credentials are valid.
Working with datasets in pandas will almost inevitably bring you to the point where your dataset doesn’t fit into memory. Parquet is especially notorious for that since it’s so well compressed and tends to explode in size when read into a dataframe. Today we’ll explore ways to limit and filter the data you read using push-down predicates. Additionally, we’ll see how you can do that efficiently with data stored in S3 and why using pure pyarrow can be several orders of magnitude more I/O-efficient than the plain pandas version.
Using SQS as a queue to buffer tasks is probably the most common use case for the service. Things can get tricky if these tasks have a wide range of processing durations. Today, I will show you how to implement an SQS consumer that utilizes heartbeats to dynamically extend the visibility timeout to accommodate different processing durations.
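The heartbeat idea can be sketched as follows. This is a minimal illustration, not the article's implementation: `run_with_heartbeat` and its parameters are hypothetical names, and the `extend_visibility` callback is an assumption standing in for whatever extends the timeout, for example a wrapper around boto3's `change_message_visibility` call for the message being processed.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(
    task: Callable[[], T],
    extend_visibility: Callable[[], None],
    interval_seconds: float,
) -> T:
    """Run task() while calling extend_visibility() every interval_seconds."""
    done = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once the task is done,
        # so the loop sends one heartbeat per elapsed interval and then exits.
        while not done.wait(interval_seconds):
            extend_visibility()

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        return task()
    finally:
        # Stop the heartbeat whether the task succeeded or raised.
        done.set()
        worker.join()


if __name__ == "__main__":
    # Stand-in callback; a real consumer would extend the SQS visibility timeout here.
    result = run_with_heartbeat(
        task=lambda: time.sleep(0.3) or "processed",
        extend_visibility=lambda: print("heartbeat: extending visibility timeout"),
        interval_seconds=0.1,
    )
    print(result)
```

The interval should be comfortably shorter than the visibility timeout set by each heartbeat, so that a slow or failed extension call does not let the message become visible to other consumers while it is still being processed.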