Key Data Science

Apr 19

One line S3 cleaner

Amazon S3 Object Expiration lets you define rules that schedule the removal of your objects after a predefined time period. However, I have S3 data that I want to remove only after a data ingestion process has completed successfully.

For example, my bucket has directories with the timestamp in the name. I want to remove everything that’s older than 2 days and only if my process has successfully imported the data.

A simple combination of bash and the AWS CLI does the job. You can test the removal first with --dryrun:

aws s3 rm --dryrun s3://path-to-your-bucket/ --recursive --exclude "$(date --date='1 day ago' +%Y-%m-%d)*" --exclude "$(date +%Y-%m-%d)*"
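If you want to keep more than two days, the exclude patterns can be generated in a loop. The following is only a sketch: keep_patterns and preview_command are hypothetical helper names, and it assumes GNU date (for --date) and prefixes that start with YYYY-MM-DD. It only prints the dry-run command, it does not call aws.

```shell
#!/bin/sh
# Hypothetical helper: print one exclude pattern per day to keep,
# starting with today. Assumes GNU date and prefixes named YYYY-MM-DD*.
keep_patterns() {
  days="$1"
  i=0
  while [ "$i" -lt "$days" ]; do
    printf '%s*\n' "$(date --date="$i days ago" +%Y-%m-%d)"
    i=$((i + 1))
  done
}

# Hypothetical helper: print the dry-run command for a bucket path ($1)
# and a number of days to keep ($2). Patterns are quoted in the output
# so the shell never expands the trailing * against local files.
preview_command() {
  cmd="aws s3 rm --dryrun $1 --recursive"
  while IFS= read -r pattern; do
    [ -n "$pattern" ] || continue
    cmd="$cmd --exclude \"$pattern\""
  done <<EOF
$(keep_patterns "$2")
EOF
  printf '%s\n' "$cmd"
}

# Print the command that would keep today and yesterday
preview_command s3://path-to-your-bucket/ 2
```

Review the printed command before running it, and drop --dryrun only once the listed deletions look right.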

I use Jenkins to orchestrate my ETL jobs. I simply added the shell code below to the pipeline as a conditional build step:

aws s3 rm s3://path-to-your-bucket/ --recursive --exclude "$(date --date='1 day ago' +%Y-%m-%d)*" --exclude "$(date +%Y-%m-%d)*"
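Outside Jenkins, the same "clean up only after a successful import" rule can be sketched in plain shell. This is not my Jenkins setup, just an illustration: ingest_then_clean is a hypothetical wrapper, and the AWS variable is a hypothetical override that lets you swap in echo to see the command without touching a bucket. It assumes GNU date.

```shell
#!/bin/sh
# Hypothetical override: set AWS=echo to print the command instead of running it.
AWS="${AWS:-aws}"

# Hypothetical wrapper: run the ingestion command passed as arguments and
# remove old prefixes only if that command exits with status 0.
ingest_then_clean() {
  if "$@"; then
    # Keep today's and yesterday's prefixes, delete everything else
    "$AWS" s3 rm s3://path-to-your-bucket/ --recursive \
      --exclude "$(date --date='1 day ago' +%Y-%m-%d)*" \
      --exclude "$(date +%Y-%m-%d)*"
  else
    echo "ingestion failed, skipping S3 cleanup" >&2
    return 1
  fi
}
```

For example, AWS=echo followed by ingest_then_clean ./import.sh (where ./import.sh stands in for your own ingestion script) prints the removal command only when the import succeeds.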

Quick and easy.

AWS, Bash, Linux kk