CloudMapper "collect" - Command to inventory your AWS metadata

2018.06.05

Recently I worked with a client that has over 100 AWS accounts, with access provided to me in a variety of ways (SSO, IAM users, or cross-account roles) depending on who within the company actually owned each account. My task was to assess these accounts, looking for security misconfigurations and areas for improvement. I didn’t want to repeatedly query all of these accounts to run scripts and perform manual exploration that might hit the same APIs over and over. I wanted to download all of the metadata about the accounts so I’d have a local copy to grep and run jq against.

Duo’s CloudMapper, a tool to visualize the networks of AWS accounts, already had functionality to download network data about accounts, so I expanded that to download all of the metadata about an account that was of interest to me. This post describes the collect command, and subsequent posts will describe other new commands as I merge that code into the public repo.

Why is this useful?

Here are a few things you can do with a local copy of an account’s metadata.

Ensure you have a backup of your setup for Disaster Recovery

In a perfect world, everything about your account’s configuration would be in terraform, CloudFormation, or similar configuration tools and backed up in a git repo. In reality, many people either don’t have that at all, or only have part of their configuration there. For example, they might rarely touch their domain configurations so they’ve never bothered putting their Route53 configuration into terraform.

Understand what new things have been created

One struggle new AWS users have is figuring out what exists in their account so they can avoid going outside of the free tier. Sometimes people want to try some things out and then delete the new resources they created, but they might forget about some resources, or not know what was created if a script did it. With collect, you can snapshot the data about your account when you start, experiment, try to delete everything you created, and then collect again. Diffing the two snapshots lets you confirm you really removed everything, as sketched below.
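
For example, with a hypothetical account named sandbox (the before/after copy is my own sketch, not a CloudMapper feature):

python cloudmapper.py collect --account sandbox
cp -r account-data/sandbox account-data/sandbox-before
# ... create resources, experiment, then try to delete everything ...
python cloudmapper.py collect --account sandbox
diff -r account-data/sandbox-before account-data/sandbox

Expect a little noise from fields that change on their own (timestamps and the like), but any resource you forgot to delete will show up in the diff.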

Keep track of changes over time

Larger companies might want to maintain a record of their accounts for ITIL CMDB (configuration management database) needs, or simply to be able to see everything that has changed in an account over time.

Easily search across multiple accounts

This was my primary motivator. If I saw a Security Group allowing access to a certain IP, I wanted to easily be able to query across every AWS account at a company to see where else that IP might appear. Maybe the IP is in other Security Groups in another account with a Description that would help me understand what it is, or maybe it is an Elastic IP in another account.
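
With a local copy, this becomes a one-liner. Using an address from the documentation IP range as a stand-in:

grep -rl '203.0.113.45' account-data/

This lists every collected file, across every account and region, that mentions that IP.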

Why not AWS Config?

Some of you might immediately be wondering why I didn’t just use AWS Config. One big reason is that AWS Config is not kept up to date with the latest AWS services. For example, both AWS Config and Lambda were introduced in 2014, but AWS Config only added support for Lambda a few weeks ago, in April 2018.

Another reason is that AWS Config costs money. It’s not much at $0.003 per configuration item recorded, but it is not clear what counts as a “Configuration Item,” so you can’t predict how much this will cost until you have an inventory. The AWS way of getting an inventory is to run AWS Config, so you have a chicken-and-egg problem. Depending on your situation, it might be easier to collect this data yourself than to ask an account owner to turn on AWS Config.

Other reasons include:

  • The flexibility of having my own solution that collects exactly what I want, in the way I want.
  • Having a copy of the data that maps directly to the AWS API calls that produced it, which makes it easy to correlate a finding with the call that returned it and to document where results of interest live.

How to use this data

Run the collection with:

python cloudmapper.py collect --account prod
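
If you have access to multiple accounts, a simple shell loop collects them all (the account names here are hypothetical):

for account in prod dev staging; do python cloudmapper.py collect --account "$account"; done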

All of the data collected is written to files with names like:

account-data/<account_name>/<region_name>/<service>-<api>.json

An example is:

account-data/demo/us-east-1/ec2-describe-instances.json

This is the same data you would get by running:

aws --profile demo --region us-east-1 --output json ec2 describe-instances

So now that you have this data, you can do things like get counts of all instance types across all accounts:

jq -r '.Reservations[].Instances[].InstanceType' account-data/*/*/ec2-describe-instances.json | sort | uniq -c

For the demo data, this outputs 3 t2.micro.
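
Other queries follow the same shape. For example, this sketch lists every public IP attached to an instance across all accounts (the // empty filter skips instances that don’t have one):

jq -r '.Reservations[].Instances[].PublicIpAddress // empty' account-data/*/*/ec2-describe-instances.json | sort -u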

API calls that take parameters

Some of the APIs require parameters. For example, to get an S3 bucket policy, you have to specify the bucket name; you can’t just call a list or describe API to get all of the bucket policies in one call. For these, the results are stored in a file named:

account-data/<account_name>/<region_name>/<service>-<api>/<parameter_value>

For example, the policy for an S3 bucket named mywebsite.com in an account named prod would be at:

account-data/prod/us-east-1/s3-get-bucket-policy/mywebsite.com
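
Since the file holds the same JSON the API returns, and S3 returns the policy as a JSON-encoded string in the Policy field, you can pretty-print the policy itself with jq:

jq '.Policy | fromjson' account-data/prod/us-east-1/s3-get-bucket-policy/mywebsite.com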

Adding new APIs to collect

The code that powers this collection has some nice tricks that make it easy to add new API requests. For example, the call behind aws s3api list-buckets is recorded in a yaml file as:

- Service: s3
  Request: list-buckets

Where this becomes powerful is when working with requests that require parameters. For example, the call to get the bucket policy is:

- Service: s3
  Request: get-bucket-policy
  Parameters:
  - Name: Bucket
    Value: s3-list-buckets.json|.Buckets[].Name

What this does is call the equivalent of aws s3api get-bucket-policy --bucket <bucket_name>, where each <bucket_name> is found by running the jq query .Buckets[].Name against s3-list-buckets.json.
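
Conceptually, that yaml entry automates something like the following shell loop (a rough sketch: get-bucket-policy returns an error for buckets with no policy attached, which this naive version doesn’t handle):

for bucket in $(jq -r '.Buckets[].Name' account-data/demo/us-east-1/s3-list-buckets.json); do
  aws --profile demo --region us-east-1 s3api get-bucket-policy --bucket "$bucket" > "account-data/demo/us-east-1/s3-get-bucket-policy/$bucket"
done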

See the full list of API calls made here.

Next steps

I’ll be merging in a set of new commands that work off of this newly collected data in the coming weeks, so stay tuned!

Download CloudMapper at https://github.com/duo-labs/cloudmapper