Athena support in CloudTracker



This post describes the new AWS Athena functionality added to CloudTracker so that you can point CloudTracker at an S3 bucket containing CloudTrail logs and automatically analyze them.

CloudTracker identifies which privileges the users and roles in your account have actually been using, compares them with the IAM privileges they’ve been granted, and advises which privileges can be removed. This helps you implement a least-privilege strategy, as explained further in the intro post.
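At its core, that comparison is a set difference between what was granted and what was used. A minimal sketch, assuming both sides are already expressed as IAM action names (the helper name is hypothetical; a real tool also has to map CloudTrail events to IAM actions and expand wildcards such as s3:*, which this skips):

```python
"""Sketch: granted IAM actions that never showed up in CloudTrail (hypothetical helper)."""


def unused_privileges(granted: set[str], used: set[str]) -> set[str]:
    """Return granted actions with no recorded use; these are removal candidates."""
    # IAM action names are case-insensitive, so compare them in lower case.
    used_lower = {action.lower() for action in used}
    return {action for action in granted if action.lower() not in used_lower}


granted = {"s3:GetObject", "s3:PutObject", "iam:CreateUser", "ec2:DescribeInstances"}
used = {"s3:getobject", "ec2:DescribeInstances"}

for action in sorted(unused_privileges(granted, used)):
    print(action)
```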

When I first developed CloudTracker, it required you to load your CloudTrail logs into ElasticSearch. Some companies already do this, but if you just wanted to try CloudTracker out and had many gigabytes of logs to analyze, you could end up spending a few days downloading your logs somewhere, transforming them, and then ingesting them into ElasticSearch. ElasticSearch can also be fragile, and many people run into failures in their log ingestion or in the cluster itself.

CloudTracker now supports AWS Athena! Athena is a serverless interactive query service: you define tables that describe what your data looks like and where it lives, and then you run SQL queries against that data. CloudTracker creates the tables for you, so all you need to do is tell it which S3 bucket contains your logs, and it takes care of the rest!
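Every SQL call against Athena follows the same pattern: start a query, then poll until it finishes. A minimal sketch using boto3's Athena client (start_query_execution and get_query_execution are real boto3 methods; the run_query helper, the poll interval, and the example names are illustrative assumptions, not CloudTracker's code). The client is passed in so the polling loop can be exercised without AWS access:

```python
"""Sketch: run one Athena query and wait for it to reach a terminal state."""
import time


def run_query(client, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> str:
    """Start an Athena query, poll until it finishes, and return its execution ID."""
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        # Still QUEUED or RUNNING; even trivial statements take a few seconds.
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   run_query(athena, "CREATE DATABASE IF NOT EXISTS cloudtracker", "default",
#             "s3://aws-athena-query-results-222222222222-us-east-1/")
```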

Athena is a paid AWS service, but the costs are minimal, and it is much more cost efficient than spinning up an ElasticSearch cluster. It is also especially well suited to a common CloudTracker use case: a manual review, once per quarter, of the privileges your users and roles have used, so you don’t have to keep an ElasticSearch cluster running when it isn’t needed.
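To put "minimal" in perspective: Athena bills by the amount of data each query scans. A back-of-envelope sketch follows; the 5 USD per TB price, rounding up to whole megabytes, and the 10 MB per-query minimum reflect Athena's published pricing model as I understand it, but treat them as assumptions to check against current AWS pricing for your region. S3 request costs are ignored.

```python
"""Back-of-envelope Athena query cost (hypothetical helper, not part of CloudTracker)."""
import math

MB = 1024 ** 2
TB = 1024 ** 4


def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0,
                      minimum_mb: int = 10) -> float:
    """Estimate one query's cost when billing is by data scanned.

    Assumed model: bytes rounded up to whole megabytes, with a per-query minimum.
    """
    billed_mb = max(math.ceil(bytes_scanned / MB), minimum_mb)
    return billed_mb * MB / TB * usd_per_tb


# e.g. a query that scans 20 GiB of CloudTrail logs
print(f"${athena_query_cost(20 * 1024 ** 3):.2f}")
```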

As a bonus, once CloudTracker has set up Athena to work with the CloudTrail logs, you can use the Athena UI directly to run your own queries against that data.

How to use CloudTracker with Athena

CloudTracker was built with a particular use case in mind: a security team with its own AWS account, and multiple other AWS accounts sending their CloudTrail logs to an S3 bucket in the security team’s account. Your setup doesn’t have to look like this, but it explains some of CloudTracker’s design decisions.

Step 1: Clone and set up CloudTracker

git clone
cd cloudtracker
python3 -m venv ./venv
source venv/bin/activate
pip install -r requirements.txt

Step 2: Download your IAM data

Download a copy of the IAM data of an account using the AWS CLI:

aws iam get-account-authorization-details > account-data/demo_iam.json
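The downloaded file is JSON whose UserDetailList, GroupDetailList, and RoleDetailList arrays describe every principal in the account. A minimal sketch for peeking at which principals it contains (the helper names are hypothetical, and the inline sample stands in for real CLI output):

```python
"""Sketch: list principals in `aws iam get-account-authorization-details` output."""
import json


def principal_names(auth_details: dict) -> dict[str, list[str]]:
    """Return the user, group, and role names found in the document."""
    return {
        "users": [u["UserName"] for u in auth_details.get("UserDetailList", [])],
        "groups": [g["GroupName"] for g in auth_details.get("GroupDetailList", [])],
        "roles": [r["RoleName"] for r in auth_details.get("RoleDetailList", [])],
    }


def load_principal_names(path: str = "account-data/demo_iam.json") -> dict[str, list[str]]:
    """Read the file saved in Step 2 and summarize it."""
    with open(path) as f:
        return principal_names(json.load(f))


# Tiny inline sample shaped like the CLI output
sample = {
    "UserDetailList": [{"UserName": "alice"}, {"UserName": "charlie"}],
    "RoleDetailList": [{"RoleName": "admin"}],
}
print(principal_names(sample))
```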

Step 3: Configure CloudTracker

Create a config.yaml file with contents similar to:

athena:
  s3_bucket: my_log_bucket
  path: my_prefix
accounts:
  - name: demo
    id: 111111111111
    iam: account-data/demo_iam.json

This assumes your CloudTrail logs are at s3://my_log_bucket/my_prefix/AWSLogs/111111111111/CloudTrail/
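Because every account writes into the same bucket and prefix, each account's location is just the configured bucket and path followed by the fixed AWSLogs/<account id>/CloudTrail/ layout that CloudTrail uses. A minimal sketch of that path construction (the function name is hypothetical, not CloudTracker's own code):

```python
"""Sketch: per-account CloudTrail location inside a shared log bucket (hypothetical helper)."""


def cloudtrail_prefix(bucket: str, path: str, account_id) -> str:
    """Return s3://<bucket>/<path>/AWSLogs/<account id>/CloudTrail/ for one account."""
    # AWS account IDs are 12 digits; zfill restores leading zeros if a YAML
    # parser read the ID as an integer.
    account = str(account_id).zfill(12)
    parts = [bucket, path.strip("/"), "AWSLogs", account, "CloudTrail"]
    # Skip an empty path so a bucket without a prefix doesn't produce "//".
    return "s3://" + "/".join(p for p in parts if p) + "/"


# Several accounts sending logs to one security-team bucket
for account_id in (111111111111, "222222222222"):
    print(cloudtrail_prefix("my_log_bucket", "my_prefix", account_id))
```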

Step 4: Run CloudTracker

CloudTracker uses boto and expects AWS credentials to be available in environment variables; one way to provide them is aws-vault. Once you’re running in an aws-vault environment, you can run:

python --account demo --list users

The first run performs all of the initial setup, which takes about a minute; subsequent runs are faster. You’ll see log messages and final output like this:

INFO     Source of CloudTrail logs: s3://my_log_bucket/my_prefix/
INFO     Using AWS identity: arn:aws:iam::111111111111:user/admin
INFO     Using output bucket: s3://aws-athena-query-results-222222222222-us-east-1
INFO     Account cloudtrail log path: s3://my_log_bucket/my_prefix/AWSLogs/111111111111/CloudTrail
INFO     Checking if all partitions for the past 12 months exist
INFO     Partition groups remaining to create: 12
INFO     Partition groups remaining to create: 11
INFO     Partition groups remaining to create: 1
- charlie

This command lists the users in the account and flags the ones (e.g. charlie) that have not been used in over a year. To then see the privileges used by alice, run:

python --account demo --user alice --skip-setup

To speed things up, I’ve added the --skip-setup flag. AWS Athena takes a few seconds to perform any call, even one that only checks whether a table exists, so once setup has been done, this flag skips those checks.

CloudTracker supports a number of other flags for filtering. In this initial Athena support, however, CloudTracker cannot analyze the actions of users or roles that have assumed into other roles or across accounts. The Athena support also works only at month granularity, not day granularity. For example, you can see what happened in the past month or the past 12 months (the default), but not in the past 5 days. For more complex functionality, you’ll still have to use ElasticSearch for now.
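With monthly partitions, a time window can only be selected in whole months. A minimal sketch of which (year, month) partitions a "past N months" window touches, assuming the current month counts as the first of the N (an assumption; CloudTracker's exact boundary may differ, and the helper name is hypothetical):

```python
"""Sketch: the (year, month) partitions that a "past N months" window touches."""
from datetime import date
from typing import List, Optional, Tuple


def past_months(n: int, today: Optional[date] = None) -> List[Tuple[int, int]]:
    """Return the n most recent (year, month) pairs, newest first.

    Assumes the current month counts as the first of the n months.
    """
    today = today or date.today()
    year, month = today.year, today.month
    months = []
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:  # step back from January to December of the previous year
            year, month = year - 1, 12
    return months


print(past_months(3, date(2018, 2, 15)))
```

A 5-day window that crosses a month boundary would still select two whole month partitions, which is why day-level questions aren't supported with this layout.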

How CloudTracker uses Athena

When CloudTracker starts up, it creates an Athena database named cloudtracker. It then creates a table for the account being queried (e.g. cloudtrail_logs_975426262029) and partitions for every month. I found that every call to Athena took a few seconds, even when I tried to parallelize the calls, so creating partitions for every day in every region for an entire year would have taken over 4 hours. That’s why I only partition at month granularity.
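The arithmetic behind that decision, and the kind of statement that adds one month of partitions, can be sketched as follows. The region count, seconds per call, partition column names (region, year, month), and the one-statement-per-month grouping (which the "Partition groups remaining to create" log lines hint at) are all assumptions for illustration, not CloudTracker's exact implementation:

```python
"""Sketch: estimate setup time and build one month's ALTER TABLE ... ADD PARTITION."""

SECONDS_PER_CALL = 2.5  # assumed: "a few seconds" per Athena call
NUM_REGIONS = 16        # assumed region count, for illustration only


def add_month_partitions_sql(table: str, cloudtrail_prefix: str, regions: list[str],
                             year: int, month: int) -> str:
    """Build one statement adding a (region, year, month) partition per region.

    CloudTrail writes objects under <prefix>/<region>/<YYYY>/<MM>/<DD>/, so each
    month partition points at the <MM>/ folder and covers all of its days.
    """
    base = cloudtrail_prefix.rstrip("/")
    clauses = [
        f"PARTITION (region = '{region}', year = '{year:04d}', month = '{month:02d}') "
        f"LOCATION '{base}/{region}/{year:04d}/{month:02d}/'"
        for region in regions
    ]
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


daily_calls = 365 * NUM_REGIONS  # one call per day per region, run serially
monthly_calls = 12               # one call per month covering all regions
print(f"daily partitions:   ~{daily_calls * SECONDS_PER_CALL / 3600:.1f} hours")
print(f"monthly partitions: ~{monthly_calls * SECONDS_PER_CALL:.0f} seconds")
print(add_month_partitions_sql(
    "cloudtracker.cloudtrail_logs_111111111111",
    "s3://my_log_bucket/my_prefix/AWSLogs/111111111111/CloudTrail/",
    ["us-east-1", "us-west-2"], 2018, 3))
```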

Once CloudTracker has set things up, you can then make queries yourself against the CloudTrail logs by going to

AWS Athena querying CloudTrail
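For example, the query built below lists the distinct API calls one user made in one month; you can paste the printed SQL into the Athena query editor. A minimal sketch, assuming the table follows the standard CloudTrail schema from the AWS Athena documentation (eventsource, eventname, and a useridentity struct with an arn field) and has string partition columns named year and month; both are assumptions about CloudTracker's table, and the helper name is hypothetical:

```python
"""Sketch: build an Athena query for the API calls one principal made in a month."""


def actions_used_query(table: str, principal_arn: str, year: int, month: int) -> str:
    """Return SQL listing the distinct (eventsource, eventname) pairs for one ARN."""
    # Double any single quotes so the ARN is safe inside a SQL string literal.
    arn = principal_arn.replace("'", "''")
    return (
        "SELECT DISTINCT eventsource, eventname\n"
        f"FROM {table}\n"
        f"WHERE useridentity.arn = '{arn}'\n"
        f"  AND year = '{year:04d}' AND month = '{month:02d}'\n"
        "ORDER BY eventsource, eventname"
    )


print(actions_used_query("cloudtracker.cloudtrail_logs_111111111111",
                         "arn:aws:iam::111111111111:user/alice", 2018, 6))
```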


I hope this helps you better analyze the privilege usage of your account.