How to clean up orphaned AWS EC2 snapshots?

22

9

We end up with a fair number of AWS EC2 snapshots where the AMI has been deleted, but the snapshot is left behind to rot. I'd like a non-manual way of identifying and deleting these orphans to save us money and space.

Ideally I'm thinking a bash script leveraging the CLI, but my AWS-fu is weak. I assume someone's done this before but I can't find a script that actually works.

In the best-case scenario this will also check volumes and clean those as well, but that may be better suited for a second question.

Alex

Posted 2017-03-22T16:45:06.300

Reputation: 3 120

My version in Python. How-to use and GitHub linkE.Big 2017-09-05T16:11:53.917

Answers

13

Largely inspired by the blog posts and gist already linked in the other answers, here is my take on the problem.

I used some convoluted JMESPath expressions to get the snapshot list with one ID per line, so that tr is not required.

Disclaimer: Use at your own risk. I did my best to avoid any problems and keep sane defaults, but I won't take any blame if it causes problems for you.

#!/bin/sh
# remove x if you don't want to see the commands
set -ex

# Some variable initialisation with sane defaults
DRUN='--dry-run'
DO_DELETE=${1:-'no'}
REGION=${2:-'eu-west-1'}
ACCOUNTID=${3:-'self'}

# Get two temporary files
SNAP_FILE=$(mktemp)
IMAGE_FILE=$(mktemp)

# Get the list of owned snapshots and the list of snapshots referenced by available images
aws --region "$REGION" ec2 describe-snapshots --owner-ids "$ACCOUNTID" --query 'Snapshots[*].[SnapshotId]' --output text > "$SNAP_FILE"
aws --region "$REGION" ec2 describe-images --owners "$ACCOUNTID" --filters Name=state,Values=available --query 'Images[*].BlockDeviceMappings[*].Ebs.[SnapshotId]' --output text > "$IMAGE_FILE"

# Check whether the generated commands should be dry-run (default) or not
if [ "$DO_DELETE" = "IAMSURE" ]
then
 DRUN=''
fi

# count each snapshot id, decrement it when an image references it, print a delete command for those no image references
awk -v REGION="$REGION" -v DRUN="$DRUN" '
FNR==NR { snap[$1]++; next } # first file: count the snapshot and move to the next line immediately

{ snap[$1]-- } # second file: decrement the counter when an image references the snapshot

END {
  for (s in snap) { # loop over the snapshots
    if (snap[s] > 0) { # the counter is still above 0, so no image references this snapshot
      cmd="aws --region " REGION " " DRUN " ec2 delete-snapshot --snapshot-id " s
      print(cmd)
    }
  }
}
' "$SNAP_FILE" "$IMAGE_FILE"
# Clean up the temp files
rm "$SNAP_FILE" "$IMAGE_FILE"

I hope the script itself is commented enough.
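
For anyone who wants to see the counting idiom on its own, here is a minimal sketch that runs offline with made-up snapshot IDs and makes no AWS calls. The function name find_orphans and the sample IDs are only illustrative, they are not part of the script above, and the output is sorted just to make it stable.

```shell
#!/bin/sh
# Sketch of the two-file awk idiom with made-up IDs; no AWS calls are made.

# find_orphans ALL_FILE USED_FILE
# Prints every ID listed in ALL_FILE that never appears in USED_FILE.
find_orphans() {
    awk '
        FNR == NR { seen[$1]++; next }   # first file: count each ID
        { seen[$1]-- }                   # second file: cancel out referenced IDs
        END { for (id in seen) if (seen[id] > 0) print id }
    ' "$1" "$2" | sort
}

all_ids=$(mktemp)
used_ids=$(mktemp)
printf '%s\n' snap-aaa snap-bbb snap-ccc > "$all_ids"
printf '%s\n' snap-bbb snap-zzz > "$used_ids"

find_orphans "$all_ids" "$used_ids"   # prints snap-aaa and snap-ccc
rm -f "$all_ids" "$used_ids"
```

Note that the idiom relies on the first file not being empty: with an empty first file, FNR==NR also holds while awk reads the second file.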

Default usage (no parameters) prints the delete commands for orphaned snapshots in the current account and the eu-west-1 region. An extract of the output:

aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-81e5856a
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-95c68c7e
aws --region eu-west-1 --dry-run ec2 delete-snapshot --snapshot-id snap-a3bf50bd

You can redirect this output to a file for review before sourcing it to execute all the commands.

If you want the script to execute the commands instead of printing them, replace print(cmd) with system(cmd).

Usage is as follows, with the script named snap_cleaner:

For dry-run commands in the us-west-1 region:

./snap_cleaner no us-west-1

For real (non-dry-run) commands in eu-central-1:

./snap_cleaner IAMSURE eu-central-1

A third parameter can be used to access another account (I prefer to switch role to the other account beforehand).

Stripped-down version of the script, with the awk program as a one-liner:

#!/bin/sh
set -ex

# Some variable initialisation with sane defaults
DRUN='--dry-run'
DO_DELETE=${1:-'no'}
REGION=${2:-'eu-west-1'}
ACCOUNTID=${3:-'self'}

# Get two temporary files
SNAP_FILE=$(mktemp)
IMAGE_FILE=$(mktemp)

# Get the list of owned snapshots and the list of snapshots referenced by available images
aws --region "$REGION" ec2 describe-snapshots --owner-ids "$ACCOUNTID" --query 'Snapshots[*].[SnapshotId]' --output text > "$SNAP_FILE"
aws --region "$REGION" ec2 describe-images --owners "$ACCOUNTID" --filters Name=state,Values=available --query 'Images[*].BlockDeviceMappings[*].Ebs.[SnapshotId]' --output text > "$IMAGE_FILE"

# Check whether the generated commands should be dry-run (default) or not
if [ "$DO_DELETE" = "IAMSURE" ]
then
 DRUN=''
fi

# count each snapshot id, decrement it when an image references it, print a delete command for those no image references
awk -v REGION="$REGION" -v DRUN="$DRUN" 'FNR==NR { snap[$1]++; next } { snap[$1]-- } END { for (s in snap) { if (snap[s] > 0) { cmd="aws --region " REGION " " DRUN " ec2 delete-snapshot --snapshot-id " s; print(cmd) } } }' "$SNAP_FILE" "$IMAGE_FILE"
# Clean up the temp files
rm "$SNAP_FILE" "$IMAGE_FILE"

Tensibai

Posted 2017-03-22T16:45:06.300

Reputation: 9 733

Magnific! And except from the 'follow' (which IMO should be 'follows'), I think this answer is to be considered as a sample of high quality posts. The only thing in it that seems a bit redundant, is the disclaimer (anything one uses from something on an SE site comes with "use it at your own risk"). I can only think of 1 additional improvement you might want to add: an indication if you did test this script and if so how to summarize its test results (something like "works as designed"?). Obviously, if you already use it yourself, that's an even better indication.Pierre.Vriens 2017-03-23T11:08:31.253

@pierre wrote it this morning, tested partially, will probably enter our pipeline this afternoon, and while I agree on the general idea of 'provided as is', the risk level of removing a 'backup' is high and I feel I should stress it even more.Tensibai 2017-03-23T11:16:22.583

Hm, so we can get you involved to start a free code writing service for these kinds of DevOps needs (with some disclaimer-strings attached) ... interesting! I suggest that later on (when time is right), you add a minor update (at the end) like "my script entered our pipeline this afternoon".Pierre.Vriens 2017-03-23T11:24:25.997

@Pierre.Vriens I said probably, not guarantee, could be next week or later also ;)Tensibai 2017-03-23T12:40:01.993

Better late than never, n'est-ce pas?Pierre.Vriens 2017-03-23T12:42:28.503

I get unexpected EOF while looking for matching ' on the line with awk -v REGION="$REGION" -v DRUN="$DRUN" ' ?Alex 2017-03-23T13:49:46.723

@Alex On which distribution? I assume the problem comes from which shell is effectively called by /bin/sh. Replacing the carriage returns with ; in the awk script should fix it, at the cost of some readability.Tensibai 2017-03-23T13:52:06.287

@Alex and just noticed I forgot a pair of {} for the commands within the if while copy/pastingTensibai 2017-03-23T13:53:08.633

I'm running bash 3.2.57(1)-release - not sure which carriage returns should be changed, could you provide an edit and/or another snippet with the change?Alex 2017-03-23T13:59:56.857

@Alex Edited with a stripped version, without the comments in the awk script and as one linerTensibai 2017-03-23T14:03:29.443

1Perfect, thanks for editing! Works exactly as intended.Alex 2017-03-23T14:19:12.057

5

I used the following script on GitHub by Rodrigue Koffi (bonclay7) and it works pretty well.

https://github.com/bonclay7/aws-amicleaner

Command:

amicleaner --check-orphans

According to the documentation blog post, it does a few more things:

It actually does a bit more than that; as of today it allows:

  • Removing a list of images and associated snapshots
  • Mapping AMIs:
    • Using names
    • Using tags
  • Filtering AMIs:
    • used by running instances
    • from autoscaling groups (launch configurations) with a desired capacity set to 0
    • from launch configurations detached from autoscaling groups
  • Specifying how many AMIs you want to keep
  • Cleaning orphan snapshots
  • A bit of reporting

StackOverFlow User

Posted 2017-03-22T16:45:06.300

Reputation: 151

3

Here is a script that can help you find orphaned snapshots:

comm -23 <(echo $(ec2-describe-snapshots --region eu-west-1 | grep SNAPSHOT | awk '{print $2}' | sort | uniq) | tr ' ' '\n') <(echo $(ec2-describe-images --region eu-west-1 | grep BLOCKDEVICEMAPPING | awk '{print $3}' | sort | uniq) | tr ' ' '\n') | tr '\n' ' '

(from here)

You can also check this article on Server Fault.

P.S. Of course you can change the region to reflect your

P.P.S. Here is updated code:

 comm -23 \
<(echo $(aws ec2 describe-snapshots --region eu-west-1 |awk '/SNAPSHOT/ {print $2}' | sort -u) | tr ' ' '\n') \
<(echo $(aws ec2 describe-images --region eu-west-1 |  awk '/BLOCKDEVICEMAPPING/ {print $3}' | sort -u) | tr ' ' '\n') | tr '\n' ' '

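A brief explanation of what the code does:

echo $(aws ec2 describe-snapshots --region eu-west-1 | awk '/SNAPSHOT/ {print $2}' | sort -u) | tr ' ' '\n'

sends the list of snapshot IDs to STDOUT, one per line. This construction:

<(...)

is process substitution: it creates a temporary file handle for the output of the enclosed command, so that comm can read the two lists as if they were two "files" and compare them. With -23, comm prints only the lines that are unique to the first list.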
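
The comparison step can be tried offline with made-up IDs. The sketch below is only an illustration: the function name only_in_first and the sample IDs are invented, and it uses temporary files instead of <(...) so that it also runs under a plain POSIX sh. In bash, the two sorted files could be replaced by process substitutions as in the one-liner above.

```shell
#!/bin/sh
# only_in_first FILE_A FILE_B
# Prints the lines found in FILE_A but not in FILE_B (a set difference).
only_in_first() {
    sorted_a=$(mktemp)
    sorted_b=$(mktemp)
    sort -u "$1" > "$sorted_a"   # comm expects sorted input
    sort -u "$2" > "$sorted_b"
    # -2 drops lines unique to the second file, -3 drops lines common to both
    comm -23 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}

snapshots=$(mktemp)
in_use=$(mktemp)
printf '%s\n' snap-ccc snap-aaa snap-bbb > "$snapshots"
printf '%s\n' snap-bbb > "$in_use"

only_in_first "$snapshots" "$in_use"   # prints snap-aaa then snap-ccc
rm -f "$snapshots" "$in_use"
```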

Romeo Ninov

Posted 2017-03-22T16:45:06.300

Reputation: 431

Did you test it? I found the same article but can't get it to work. If you can, user error on my end, but I fear it may be outdated based on the age of the article.Alex 2017-03-22T18:26:35.267

@Alex, can check it tomorrowRomeo Ninov 2017-03-22T18:27:24.897

The commands seem to have changed, use aws ec2 describe/deleteTensibai 2017-03-22T18:47:12.213

@Tensibai, thank you. Let me confirm it tomorrow and will edit my answer :)Romeo Ninov 2017-03-22T18:48:45.127

1I found the same source, but chaining grep, awk, sort and uniq makes my shell coder side sad, I'll post my version tomorrow :)Tensibai 2017-03-22T18:53:33.757

I don't understand the magic in the included script. Can you think of a way to extend your answer a bit to add some explanation about it?Pierre.Vriens 2017-03-22T19:18:00.503

@Pierre.Vriens, give me time till tomorrow, will simplify and explain the script :)Romeo Ninov 2017-03-22T19:19:58.950

1Fine for me, just wanted to provide you some (constructive) feedback to let you know that what probably looks like regular English to an expert (like you), looks pretty much like Chinese to me, ok? PS: and it doesn't sound Flemish either ... Drop me an extra comment if you want to notify me after you're done (if you want my updated feedback then).Pierre.Vriens 2017-03-22T19:22:09.020

@Pierre.Vriens, now you can check updated answer :)Romeo Ninov 2017-03-23T07:34:40.757

Looks like quite some progress. But IMO the PPS is confusing as related to what was in front of it already in the prior version. Why not just integrate it all without such PPS? Also, there are some typos and punctuation issues (like 'commend'?) that you may want to address, no?Pierre.Vriens 2017-03-23T08:46:01.097

2

Here is a GitHub Gist code snippet by Daniil Yaroslavtsev that does exactly what you are asking for.

It takes the list of all images and their snapshots and compares those IDs to the list of all snapshot IDs. Whatever remains are the orphaned ones. The code works on the same principle as the answer above, but is better formatted and slightly more readable.

The code takes advantage of JMESPath through the --query Snapshots[*].SnapshotId option (you can also use the jp command line utility for that, if it's already in your distribution). It then formats the output as text with --output text. Here is a link to the API reference and a few examples. It is slightly more elegant than a long chain of grep/awk/sort/uniq/tr pipes.

Warning by Todd Walton: Don't confuse this with the 'jq' utility, which uses a different query language to parse JSON documents.
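
As a rough sketch of how the --query shape affects text output (to the best of my knowledge of the AWS CLI): a flat projection prints all IDs on one tab-separated line, while wrapping the field in [] prints one ID per line. The helper names below are hypothetical, the region is passed as the first argument, and running them requires the AWS CLI with valid credentials.

```shell
#!/bin/sh
# Hypothetical helpers; they need the AWS CLI and valid credentials to run.

# Flat projection: text output puts all IDs on one tab-separated line.
snapshot_ids_flat() {
    aws ec2 describe-snapshots --region "$1" --owner-ids self \
        --query 'Snapshots[*].SnapshotId' --output text
}

# Wrapping the field in [] makes each ID its own row: one ID per line.
snapshot_ids_per_line() {
    aws ec2 describe-snapshots --region "$1" --owner-ids self \
        --query 'Snapshots[*].[SnapshotId]' --output text
}
```

For example, snapshot_ids_per_line eu-west-1 | sort -u would give a sorted list that is ready to be compared with comm or awk.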

Jiri Klouda

Posted 2017-03-22T16:45:06.300

Reputation: 4 827

Just FYI, the jq command line utility is not the same JSON query language as what the "aws" command uses. The "aws" command uses JMESPath.Todd Walton 2018-11-14T17:54:22.030

Thank you for pointing that out. I've learned something new today.Jiri Klouda 2018-11-14T18:58:27.317

0

I've written a snapshots.py script that iterates over all snapshots (in a defined list of regions) and generates report.csv. This file contains information about the instance, AMI and volume referenced by each snapshot.

There is also a command to interactively remove dangling snapshots.

jazgot

Posted 2017-03-22T16:45:06.300

Reputation: 101