I’m excited to announce that I’ve taken a new job with Aurora and am shutting down my consulting business. This post will discuss some project ideas I never got to, but first I want to briefly discuss this move. It’s weird to move on from something I built over the past 3.5 years and that was by all definitions a success. I’ve had dozens of clients across 5 continents, was quoted in the WSJ, keynoted a conference in Switzerland, travelled to South Africa to train people, gained over 10,000 followers on Twitter, worked with Duo Security to create some of the most popular open-source cloud security tools, and generally have become one of the go-to people in the world for AWS security.
It all started with the release of flaws.cloud almost 4 years ago to this day, which motivated me to quit my job and start that new adventure. I’ve had a ton of personal and professional growth along the way. I highly recommend considering that life path and have written about how to do something similar here and here. I’m excited for this new opportunity where I can focus on challenges that require deeper integrations, architectural changes, and longer time horizons than short-term contract work.
This post will describe project ideas I didn’t get around to, as I suspect I’ll be a bit too busy for a while to get to them. If you’d like to kick start a consulting business, draw attention to the engineering talent of your organization, or possibly create a SaaS business to get some extra revenue, these are some ideas that I think the world of AWS security would benefit from.
Security Group optimizer using VPC Flow Logs
In much the same way as tools such as repokid and cloudtracker have recommended or auto-remediated changes to IAM privileges based on the privileges actually used (as evidenced by Access Advisor or CloudTrail logs) vs the privileges granted, the concept here would be to take the existing Security Groups (i.e. network access granted) and available VPC Flow Logs (i.e. network access used) and perform a diff. The resulting output would allow you to recommend changes, so you could say “This EC2 has never received traffic on port 80, therefore that Security Group can be changed”. I suspect that much like the problems I encountered in graphing network diagrams with CloudMapper, this may be more difficult in larger environments. You would also likely need to take CloudTrail logs into consideration in order to understand what EC2s (or other resources) existed at the time of the flow logs. AWS has a blog post here that mentions some analysis of VPC Flow Logs for Security Group improvements, but it only scratches the surface of what could be done.
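A minimal sketch of that diff, assuming flow log records in the default (version 2) space-separated format and a Security Group already reduced to simple protocol and port-range rules. The `IngressRule` type and the `unused_rules` helper are hypothetical names for illustration, and source CIDRs are ignored entirely. Note one known weakness: because Security Groups are stateful, return traffic for outbound connections also shows up as accepted inbound packets on ephemeral ports, so a wide port range could wrongly look "used".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one Security Group ingress rule."""
    protocol: int   # IANA protocol number, e.g. 6 for TCP
    from_port: int
    to_port: int


def parse_flow_log_line(line):
    """Parse one record in the default (version 2) VPC Flow Log format."""
    fields = line.split()
    if len(fields) != 14 or fields[0] != "2":
        return None
    (_, _, eni, _src, dst, _srcport, dstport, proto,
     _packets, _bytes, _start, _end, action, status) = fields
    if status != "OK":
        return None  # NODATA and SKIPDATA records carry "-" placeholders
    return {"eni": eni, "dstaddr": dst, "dstport": int(dstport),
            "protocol": int(proto), "action": action}


def unused_rules(rules, flow_lines, eni_ip):
    """Return the ingress rules that no accepted inbound flow matched."""
    seen = set()
    for line in flow_lines:
        rec = parse_flow_log_line(line)
        if rec and rec["action"] == "ACCEPT" and rec["dstaddr"] == eni_ip:
            seen.add((rec["protocol"], rec["dstport"]))
    return [
        rule for rule in rules
        if not any(proto == rule.protocol
                   and rule.from_port <= port <= rule.to_port
                   for proto, port in seen)
    ]
```

Given rules for TCP 80 and TCP 443 and logs that only show accepted traffic to port 443, `unused_rules` would return the port 80 rule as a candidate for removal.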
Investigate memory capture using EC2 hibernation
When AWS updated their Incident Response whitepaper in June 2020, the biggest change was the mention of using EC2 hibernation for memory capture. Historically it has been very easy to take a disk snapshot of an EC2, but doing a memory capture required third party tools. Having AWS native functionality for memory capture is very interesting, but there are a ton of limitations and requirements for this to be possible, and I’d be interested in seeing this explored. See some discussion here.
IAM privilege aggregator
I have often found myself trying to understand all the privileges a role has when taking into consideration the IAM policies, Permission Boundaries, and SCPs being applied to it. At a minimum it would be nice to just concatenate all these together into one place, especially for SCPs which can be applied to the account or multiple levels of OUs. Ideally though you’d like something that can understand how these operate together and give you the actual privileges in some minimized way. For example, if an IAM role has an admin policy, but a permission boundary that only allows access to S3 and an SCP that denies changing GuardDuty, you’d like to see that the role can only access S3. This functionality is needed for some later projects.
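The example above can be sketched as an intersection across policy layers. This is a deliberately simplified model, not AWS's full evaluation logic: it only looks at `Effect` and `Action`, ignores `Resource`, `Condition`, `NotAction`, resource policies, and session policies, and treats each layer as a plain list of statements. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(action, patterns):
    """Case-insensitive wildcard match of an action against patterns."""
    action = action.lower()
    return any(fnmatchcase(action, p.lower()) for p in patterns)


def _layer_decision(statements, action):
    """Return 'deny', 'allow', or 'none' for a single policy layer."""
    decision = "none"
    for stmt in statements:
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if _matches(action, actions):
            if stmt["Effect"] == "Deny":
                return "deny"  # an explicit deny always wins
            decision = "allow"
    return decision


def is_allowed(action, identity, boundary=None, scp_levels=()):
    """True only if every layer that applies allows the action.

    identity:   statements from the role's identity policies
    boundary:   statements from the permission boundary, if one is set
    scp_levels: one statement list per level (root, each OU, account)
    """
    layers = [identity]
    if boundary is not None:
        layers.append(boundary)
    layers.extend(scp_levels)
    return all(_layer_decision(layer, action) == "allow" for layer in layers)


admin = [{"Effect": "Allow", "Action": "*"}]
s3_only = [{"Effect": "Allow", "Action": "s3:*"}]
scp = [{"Effect": "Allow", "Action": "*"},
       {"Effect": "Deny", "Action": "guardduty:*"}]

print(is_allowed("s3:GetObject", admin, s3_only, [scp]))
print(is_allowed("ec2:RunInstances", admin, s3_only, [scp]))
```

With the admin policy, S3-only boundary, and GuardDuty-denying SCP from the example, only S3 actions come out as allowed. A real aggregator would also need to minimize the result into a readable policy rather than answer one action at a time.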
Access Denied explainer
Similar to the “IAM privilege aggregator”, a problem that is going to get increasingly worse on AWS is debugging why something was denied. When you get an Access Denied error, there is no further context, either within the API response or CloudTrail logs. Imagine trying to access an S3 bucket and getting an Access Denied. Was it because you didn’t have privileges, or because of a permission boundary, a bucket policy, or SCP denying access? This problem is made worse because from within an account you have no ability to see what SCPs have been applied to you. Is the access restricted based on some set of conditions? Maybe the tags on the bucket don’t match the tags required for your access?
As SCPs become more common and as AWS advocates greater use of ABAC, this situation is going to happen more frequently and become harder to debug. Minimally, you want to at least tell someone the relevant statements and conditions for the privilege involved. The ultimate goal would be to tie this to the CloudTrail events in EventBridge to look for Access Denied messages in real-time and then automatically send a Slack message to someone telling them “Hey, looks like you tried to create an EC2, but you’re only allowed to use t3.micro instances in that account”. There are some common conditions like that which you could have pre-planned conversational text for. If you build that, there are people I know right now that would write you checks. Another option would be to use CSM (the SDKs’ Client Side Monitoring) to identify these Access Denieds locally or in special cases, such as when unit tests are run in a CI/CD or staging environment.
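A rough sketch of the message-building half of this idea, assuming the input is the `detail` portion of a CloudTrail event as delivered by EventBridge. The set of error codes, the `HINTS` table, and the `explain` function are assumptions for illustration: services are not consistent about which error code they report, and a real tool would derive the hint from the actual SCPs and policies rather than a hardcoded table. Sending the text to Slack is left out.

```python
# Error codes treated as "access denied". This list is an assumption and
# likely incomplete; different services report different codes.
DENIED_CODES = {
    "AccessDenied",
    "AccessDeniedException",
    "Client.UnauthorizedOperation",
    "UnauthorizedOperation",
}

# EventBridge event pattern that would select these CloudTrail events.
EVENT_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"errorCode": sorted(DENIED_CODES)},
}

# Hypothetical pre-planned conversational text for common denials.
HINTS = {
    ("ec2.amazonaws.com", "RunInstances"):
        "You may only be allowed to use t3.micro instances in that account.",
}


def explain(detail):
    """Build a human-readable message for a denied call, else None."""
    if detail.get("errorCode") not in DENIED_CODES:
        return None
    source = detail.get("eventSource", "unknown")
    action = detail.get("eventName", "unknown")
    who = detail.get("userIdentity", {}).get("arn", "An unknown principal")
    account = detail.get("recipientAccountId", "unknown")
    message = (f"{who} was denied {source.split('.')[0]}:{action} "
               f"in account {account}.")
    hint = HINTS.get((source, action))
    if hint:
        message += " " + hint
    return message
```

The interesting work is in replacing `HINTS` with something that finds the relevant statements and conditions, which is why the “IAM privilege aggregator” is a prerequisite.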
SCP baseline creator
Once in an engagement I was given access to a newly created account that had been initialized with the company baseline that included IAM roles for a vendor, enabled GuardDuty, EventBridge rules, etc. I was asked to create an SCP that could protect those configurations, so for example if a developer runs aws-nuke on a sandbox account, it doesn’t modify this baseline. It should be fairly easy to generate an SCP from common baseline features like these, with built-in exception conditions so that those features can still be changed by a designated admin role that is accessible from the Organization.
Along with this, it would be nice to have a differ and some sort of linter, such that you could identify that an existing SCP, for example, does not properly protect GuardDuty.
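A minimal sketch of both halves, the generator and a very naive linter. It assumes the exception is expressed with an `ArnNotLike` condition on `aws:PrincipalARN`, which is a common pattern for SCP exceptions; the function names, the `OrgAdmin` role name, and the particular actions chosen are illustrative assumptions, not a vetted baseline. The linter only compares action names against `Deny` statements and ignores `Resource` and `Condition`, so it can miss SCPs that deny an action only in narrow circumstances.

```python
from fnmatch import fnmatchcase


def baseline_scp(protected, admin_role_pattern):
    """Build an SCP that denies changes to baseline resources.

    protected: maps a Sid to an (actions, resources) pair.
    admin_role_pattern: ARN pattern for the role that is exempt.
    """
    statements = []
    for sid, (actions, resources) in protected.items():
        statements.append({
            "Sid": sid,
            "Effect": "Deny",
            "Action": list(actions),
            "Resource": list(resources),
            # Deny everyone except the designated admin role.
            "Condition": {
                "ArnNotLike": {"aws:PrincipalARN": admin_role_pattern}
            },
        })
    return {"Version": "2012-10-17", "Statement": statements}


def unprotected_actions(scp, required_actions):
    """Return the required actions that no Deny statement covers."""
    denied = []
    for stmt in scp["Statement"]:
        if stmt["Effect"] == "Deny":
            actions = stmt["Action"]
            denied.extend([actions] if isinstance(actions, str) else actions)
    return [
        action for action in required_actions
        if not any(fnmatchcase(action.lower(), p.lower()) for p in denied)
    ]


scp = baseline_scp(
    {
        "ProtectGuardDuty": (
            ["guardduty:DeleteDetector", "guardduty:UpdateDetector"],
            ["*"],
        ),
        "ProtectEventRules": (
            ["events:DeleteRule", "events:DisableRule",
             "events:RemoveTargets"],
            ["*"],
        ),
    },
    "arn:aws:iam::*:role/OrgAdmin",  # hypothetical admin role name
)
```

Running `unprotected_actions(scp, ["guardduty:DeleteDetector", "iam:DeleteRole"])` against this SCP would flag `iam:DeleteRole`, since nothing in it protects the vendor roles yet.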
Tagging policy SCP generator
Many companies would like to ensure all their resources are tagged with certain keys, such as an Owner tag. You might think that the Organization feature Tag Policies could do this, but you would be sorely disappointed, just like everyone else that ever tried using that feature to do anything meaningful. One way of accomplishing this is to auto-remediate or otherwise detect improperly tagged resources after the fact, but I believe a better way would be to enforce this via an SCP. Before doing this, you’ll want the “Access Denied explainer” project to exist, or else you will bring sadness and frustration to your developers.
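A small sketch of what such a generator might emit, assuming the enforcement uses a `Null` condition on the `aws:RequestTag/<key>` condition key so that create calls without the tag are denied. The `require_tag_scp` function is a hypothetical name, and the caller has to supply the create actions and resource ARN patterns, because not every service supports tagging on create or the `aws:RequestTag` key. Working out that per-service mapping is the real project.

```python
def require_tag_scp(tag_key, action_resources):
    """Build an SCP that denies create calls missing a required tag.

    tag_key: the tag that must be present on the request, e.g. "Owner".
    action_resources: maps a create action to its resource ARN patterns.
    """
    # Sids are kept alphanumeric, so drop other characters from the key.
    clean_key = "".join(ch for ch in tag_key if ch.isalnum())
    statements = []
    for index, (action, resources) in enumerate(action_resources.items()):
        statements.append({
            "Sid": f"Require{clean_key}Tag{index}",
            "Effect": "Deny",
            "Action": action,
            "Resource": list(resources),
            # "Null": "true" matches when the tag is absent from the request.
            "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


owner_scp = require_tag_scp(
    "Owner",
    {"ec2:RunInstances": ["arn:aws:ec2:*:*:instance/*"]},
)
```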
AWS List/Describe proxy
Many companies have multiple tools scanning their accounts looking for security, tagging, cost, operations, and other issues. In large environments, the List and Describe calls these tools make eventually get rate limited, so companies try to make use of AWS Config, which requires you to change how you make queries and also lacks coverage across a number of services and features. It would be better to have a single tool collect this information, possibly leveraging AWS Config and also possibly leveraging the CloudTrail data to keep it updated without querying (or at least without querying as often). Netflix began down this path with their Historical project, but no longer maintains that project. Ideally, this would work fairly transparently, such that you could trick an application into visiting your server instead of amazonaws.com (such as via /etc/hosts and the SDK environment variable AWS_CA_BUNDLE so you get through HTTPS). Your server would then either respond with the data or relay the request to AWS. In environments that use AWS Config, you might convert the request into an Athena query against the data in S3 as explained here.
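The "respond with the data or relay the request" decision could be sketched as a small cache in front of whatever actually talks to AWS. Everything here is a hypothetical design: the `DescribeCache` class, the injected `fetch` callable (which in practice might wrap an SDK client or an Athena query), and the idea of invalidating a whole service when CloudTrail shows a mutating call for it. The HTTPS interception and request parsing are not shown.

```python
import json
import time


class DescribeCache:
    """Cache List/Describe responses and relay only on a miss."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable(service, operation, params)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (stored_at, response)

    def call(self, service, operation, params=None):
        params = params or {}
        key = (service, operation, json.dumps(params, sort_keys=True))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # served locally, no AWS API call
        response = self._fetch(service, operation, params)
        self._entries[key] = (now, response)
        return response

    def invalidate_service(self, service):
        """Drop cached entries for a service, e.g. after CloudTrail
        reports a mutating call for it."""
        for key in [k for k in self._entries if k[0] == service]:
            del self._entries[key]
```

With several scanners sharing one instance, a repeated `call("ec2", "DescribeInstances")` inside the TTL reaches AWS only once, and `invalidate_service("ec2")` forces the next call through.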
Improving existing tools
The ideas above were for some new projects. There are lots of features and improvements to CloudMapper that I haven’t gotten to as well, in addition to just general maintenance and bug fixes of CloudTracker. One valuable focus area for CloudMapper is on the identification and visualization of trust relationships between accounts. CloudMapper, via the weboftrust command, can identify trust relationships between accounts of IAM roles, S3 buckets, and VPC peerings. There are lots more relationships that can exist, and one could start by going through the list of aws_exposable_resources. The visualization of these could also be improved by simply creating a table as one option. In theory Access Analyzer could find many of these relationships, but that tool lacks the coverage you would expect.
I’m not going away, but I might not ever get to some of these things, and I don’t want these ideas to die, so go forth and do awesome things!