Building a deduplication machine
A few months ago we supported a customer with a data migration project. One of the most important aspects of the migration was to make sure that duplicate data was not reproduced in the new data layer: each file should be copied only once, with its duplicates listed as references to that single copy. To solve this uniqueness challenge, we built a deduplication machine, mainly using Amazon S3 and DynamoDB.
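To make the "copy once, reference the rest" idea concrete, here is a minimal sketch. It assumes duplicates are detected by a content hash (SHA-256 is chosen purely for illustration) and uses in-memory dictionaries as stand-ins for the object store and the reference table. The names `Deduplicator` and `ingest` are hypothetical and not taken from the actual project.

```python
import hashlib


class Deduplicator:
    """Toy model: store each distinct content once, track every source path."""

    def __init__(self):
        # Stand-in for the target object store (assumption): digest -> content
        self.objects = {}
        # Stand-in for the reference table (assumption): digest -> source paths
        self.references = {}

    def ingest(self, source_path: str, content: bytes) -> bool:
        """Record a file; return True only if its content was copied."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.objects
        if is_new:
            # First time we see this content: copy it exactly once.
            self.objects[digest] = content
        # Every occurrence, new or duplicate, is listed as a reference.
        self.references.setdefault(digest, []).append(source_path)
        return is_new


dedup = Deduplicator()
dedup.ingest("a/report.pdf", b"same bytes")
dedup.ingest("b/copy-of-report.pdf", b"same bytes")
dedup.ingest("c/other.pdf", b"different bytes")
```

After these three calls the store holds two objects, and the entry for the shared content lists both `a/report.pdf` and `b/copy-of-report.pdf`. In a real setup the check and the write would need to be atomic, which an in-memory dictionary does not demonstrate.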