On May 7, Brandon Sherman of Twilio discovered something concerning with the AWS IAM Managed Policy that is recommended when using SageMaker.
He mentioned it to a couple of folks to get more pairs of eyes on it, and it was reported to Amazon. Rumor has it that AWS sounded the alarms all the way up to just below Bezos himself. By 9:45pm PST on the day it was reported, AWS had silently pushed out a fix for all customers (which is one of the benefits of using managed policies).
But here’s the thing: You probably never heard anything about this. There was no announcement.
I’m told AWS had somehow checked if this policy had been abused. But if a customer had made a modified copy of the policy, or used a similar pattern in a policy of their own, AWS wouldn’t know to check if it had been abused in that environment.
I was concerned about this, and it got me thinking about how many other managed policies might have issues or might have been silently fixed. So I decided to manually audit all 500+ of them, and all 1,200+ versions of these policies, looking for other issues and silent fixes.
I’ve found issues in the past with AWS recommended policies, including their policy for MFA usage and in a blog post from the IAM team.
This post will describe what I found during my research, along with the SageMaker policy issue mentioned, and how to check if these policies have been abused in your environments.
Tagging secrets
Let’s start with the issue Brandon found in the AmazonSageMakerFullAccess policy. The trouble spot of this large, 273-line policy is shown in Figure 1 below.
In November 2018, this policy added the ability to view Secrets Manager secrets, but restricted access to only those secrets with the tag SageMaker. Unfortunately, the policy also granted the ability to freely list and tag all of the other secrets, which means an attacker with these privileges could list the other secrets, tag them with the SageMaker tag, and then read them. To put it another way, instead of being restricted to the least privilege access needed, this policy indirectly granted access to all of the Secrets Manager secrets in the account.
In the correction, AWS removed the ability to tag secrets, along with a few other changes, as seen in the fix here.
// BAD - DO NOT USE
{
    "Action": [
        "secretsmanager:CreateSecret",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecrets",
        "secretsmanager:TagResource"
    ],
    "Resource": "*",
    "Effect": "Allow"
},
{
    "Action": [
        "secretsmanager:GetSecretValue"
    ],
    "Resource": "*",
    "Effect": "Allow",
    "Condition": {
        "StringEquals": {
            "secretsmanager:ResourceTag/SageMaker": "true"
        }
    }
}
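If you keep copies of policies like this one, the pattern can be checked for mechanically. The following is a minimal Python sketch, not an AWS tool: it flags a policy in which an Allow statement is gated on a resource tag while the same policy also allows tagging for that service. The function and variable names are my own, and the check is deliberately naive (it assumes actions are written out in full, and it ignores wildcards, Deny statements, and NotAction).

```python
def as_list(value):
    """IAM allows a single string or a list for Action; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def self_satisfiable_tag_gates(statements, service="secretsmanager"):
    """Return the tag keys used as a gate in Allow statements when the same
    policy also allows <service>:TagResource, so the gate can be satisfied
    by whoever holds the policy."""
    tag_action = f"{service}:tagresource"
    gate_prefix = f"{service}:resourcetag/"

    can_tag = any(
        stmt.get("Effect") == "Allow"
        and tag_action in (a.lower() for a in as_list(stmt.get("Action", [])))
        for stmt in statements
    )
    if not can_tag:
        return []

    gated_keys = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        # Condition looks like {"StringEquals": {"<condition key>": "<value>"}}
        for operator_block in stmt.get("Condition", {}).values():
            for condition_key in operator_block:
                if condition_key.lower().startswith(gate_prefix):
                    gated_keys.append(condition_key.split("/", 1)[1])
    return gated_keys


# The two statements from the figure above, as Python dicts.
FIGURE_STATEMENTS = [
    {"Effect": "Allow", "Resource": "*", "Action": [
        "secretsmanager:CreateSecret", "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecrets", "secretsmanager:TagResource"]},
    {"Effect": "Allow", "Resource": "*",
     "Action": ["secretsmanager:GetSecretValue"],
     "Condition": {"StringEquals": {
         "secretsmanager:ResourceTag/SageMaker": "true"}}},
]

print(self_satisfiable_tag_gates(FIGURE_STATEMENTS))  # ['SageMaker']
```

An empty result only means this specific pattern was not found; it does not mean the policy is safe.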
Classic privilege escalation
Classic privilege escalation with IAM policies is the ability to modify the IAM policies applied to yourself, or to modify the policies applied to some other principal and then become that principal.
Here’s an example: AWSOpsWorksRegisterCLI has the ability to create a user, add any policy to them, and create an access key for them. This provides a direct path to creating a user with AdministratorAccess and an access key to become that new user. This issue has not yet been fixed.
// BAD - DO NOT USE
{
    "Action": [
        "iam:AddUserToGroup",
        "iam:CreateAccessKey",
        "iam:CreateGroup",
        "iam:CreateUser",
        "iam:ListInstanceProfiles",
        "iam:PassRole",
        "iam:PutUserPolicy"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow"
}
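The escalation path described above is a combination of actions rather than any single one, which makes it easy to miss by eye. Here is a rough Python sketch of a combination check. The helper names and the ESCALATION_PATHS table are illustrative assumptions: the table holds only the one path from the text, and the check does not expand wildcards or consider Deny statements.

```python
def as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


# Only the path described in the text; a real audit would need many more.
ESCALATION_PATHS = {
    "create a user, attach any inline policy, create an access key": {
        "iam:createuser", "iam:putuserpolicy", "iam:createaccesskey",
    },
}


def unrestricted_actions(statements):
    """Lower-cased actions allowed on Resource "*" with no Condition."""
    actions = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow" or "Condition" in stmt:
            continue
        if "*" not in as_list(stmt.get("Resource", [])):
            continue
        actions.update(a.lower() for a in as_list(stmt.get("Action", [])))
    return actions


def escalation_paths(statements):
    """Names of the known paths whose actions are all granted unrestricted."""
    granted = unrestricted_actions(statements)
    return [name for name, needed in ESCALATION_PATHS.items()
            if needed <= granted]


# The statement from the figure above, as a Python dict.
OPSWORKS_STATEMENT = {
    "Effect": "Allow",
    "Resource": ["*"],
    "Action": [
        "iam:AddUserToGroup", "iam:CreateAccessKey", "iam:CreateGroup",
        "iam:CreateUser", "iam:ListInstanceProfiles", "iam:PassRole",
        "iam:PutUserPolicy",
    ],
}

print(escalation_paths([OPSWORKS_STATEMENT]))
```

Dropping any one of the three actions from the statement makes the list come back empty, which is the property a fix needs to have.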
Another classic privilege escalation issue was found in the AWSDeepRacerServiceRolePolicy; it had sts:*, granting it the ability to assume any role in the account via sts:AssumeRole. If a role in the account had higher privileges, it could potentially be assumed via this privilege. Apparently no aspect of this privilege was ever needed, as AWS fixed this by simply removing sts:* from the policy.
// BAD - DO NOT USE
{
    "Action": [
        "robomaker:*",
        "sagemaker:*",
        "sts:*",
        "s3:ListAllMyBuckets"
    ],
    "Resource": "*",
    "Effect": "Allow"
}
Resource policy privilege escalation
A generic problem I found in a few IAM policies was that they restricted the type of actions that could be performed on objects in S3 buckets, but at the same time allowed the resource policy of the bucket to be modified. I’m calling this technique “resource policy privilege escalation”.
As an example, the policy AWSCloudTrailFullAccess, shown partially in Figure 4, is supposed to only allow Get and List access to the objects in an S3 bucket, but due to the privilege s3:PutBucketPolicy, an attacker could make the bucket world-writable and then anonymously put new objects or delete existing ones. This issue also impacted AWSCodePipelineFullAccess and AmazonMachineLearningRoleforRedshiftDataSource.
// BAD - DO NOT USE
{
    "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:ListAllMyBuckets",
        "s3:PutBucketPolicy",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation",
        "s3:GetBucketPolicy"
    ],
    "Resource": "*",
    "Effect": "Allow"
}
The policy AWSCloudTrailFullAccess remains unfixed. AWS did apply a fix to AWSCodePipelineFullAccess by restricting which buckets the s3:PutBucketPolicy action can be applied to, but it still allows a more limited privilege escalation over the objects within that bucket.
In my opinion, the AmazonMachineLearningRoleforRedshiftDataSource policy has had the most interesting “fix”. AWS fixed this policy by deprecating it, meaning no one can start using it who wasn’t using it already. Customers that were already using that policy still have the unfixed policy and should “upgrade” to the new AmazonMachineLearningRoleforRedshiftDataSourceV2 policy. New customers can get started directly with the V2 version.
A similar issue exists with the AmazonElasticTranscoderRole policy, which attempts to deny the ability to delete S3 objects or modify the bucket policy by allowing s3:Get* and s3:Put*, but denying s3:*Policy* and s3:*Delete*.
This approach leaves an attacker at least two techniques for getting around the restriction on deleting objects. The first technique is to put a lifecycle policy on the bucket to delete all of the objects using s3:PutLifecycleConfiguration. The second technique would be to put an ACL on the bucket to grant Everyone full_control, which allows object deletion anonymously. This issue has been fixed by only allowing object-related Put calls.
// BAD - DO NOT USE
{
    "Action": [
        "s3:ListBucket",
        "s3:Put*",
        "s3:Get*",
        "s3:*MultipartUpload*"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Allow"
},
{
    "Action": [
        "s3:*Policy*",
        "sns:*Permission*",
        "sns:*Delete*",
        "s3:*Delete*",
        "sns:*Remove*"
    ],
    "Resource": [
        "*"
    ],
    "Effect": "Deny"
}
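To see why the Deny patterns above do not close the gap, it helps to evaluate a few action names against both lists. This is a small Python sketch under simplifying assumptions: it uses the standard library’s fnmatch as a stand-in for IAM’s wildcard matching (the two agree on * and ?, and action names contain no brackets), it compares case-insensitively, and it looks at action names only, ignoring resources and conditions. The function names are my own.

```python
from fnmatch import fnmatchcase

# Action patterns copied from the two statements above.
ALLOW = ["s3:ListBucket", "s3:Put*", "s3:Get*", "s3:*MultipartUpload*"]
DENY = ["s3:*Policy*", "sns:*Permission*", "sns:*Delete*",
        "s3:*Delete*", "sns:*Remove*"]


def matches_any(action, patterns):
    """True if the action matches at least one wildcard pattern."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_permitted(action):
    """An explicit Deny wins; otherwise the action needs a matching Allow."""
    return matches_any(action, ALLOW) and not matches_any(action, DENY)


for action in ["s3:DeleteObject",
               "s3:PutBucketPolicy",
               "s3:PutLifecycleConfiguration",
               "s3:PutBucketAcl"]:
    print(f"{action:32} permitted={is_permitted(action)}")
```

The direct routes (s3:DeleteObject, s3:PutBucketPolicy) come back as not permitted, while s3:PutLifecycleConfiguration and s3:PutBucketAcl are permitted, because neither name contains “Policy” or “Delete”. Deny lists built from name fragments only block the actions whose names happen to contain those fragments.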
As an aside, the Elastic Transcoder service is infamous for a previous IAM policy mishap (described here) where the policy AmazonElasticTranscoderFullAccess allowed classic privilege escalation due to the use of iam:PutRolePolicy.
This issue is infamous because AWS’s response was to simply delete the managed policy entirely!
Other findings
In this audit, I focused on looking for clear mistakes where an attempt had been made to restrict privileges, but was done incorrectly.
AWS managed policies are notoriously over-privileged, and I generally recommend against using them. For example, the policy AmazonEC2RoleforSSM, which until last week AWS recommended for all EC2 instances managed by the SSM service, allows what is basically s3:* for every EC2 instance it is applied to. This gives every such instance read and write access to every S3 bucket in the account, which means that if an attacker compromises one of these EC2 instances, they can likely walk off with any data of interest in the account.
Many people have complained about this policy. I included it as one of the findings in my list, and AWS took action: it no longer recommends using that policy. Instead, they describe how to build a policy by leveraging the more restricted policy AmazonSSMManagedInstanceCore.
Many of the additional issues that I reported, or saw had already been fixed, involved AWS not knowing the names of its own privileges. These were often innocent mistakes, like the addition of an extra “s” or the lack of one, such as ec2:DescribeCustomerGateway being corrected to ec2:DescribeCustomerGateways.
Other issues were the result of AWS using different names for privileges than the API calls you make, such as using the incorrect name s3:ListBuckets instead of s3:ListAllMyBuckets or using the incorrect name s3:ListObjects instead of s3:ListBucket.
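Misnamed privileges like these fail silently, since IAM does not reject an action name that matches nothing. A simple lint can catch them by comparing each action against a catalog of valid names. The Python sketch below assumes such a catalog exists: KNOWN_ACTIONS here is a tiny hand-written stand-in holding only the correct names mentioned in this post, and a real check would need the full list of actions for every service.

```python
import difflib

# Assumption: a tiny stand-in for a complete catalog of valid IAM actions.
KNOWN_ACTIONS = [
    "ec2:DescribeCustomerGateways",
    "s3:ListAllMyBuckets",
    "s3:ListBucket",
    "s3:GetObject",
]


def unknown_actions(actions, known=KNOWN_ACTIONS):
    """Map each action missing from `known` to its closest known names."""
    known_lower = {name.lower() for name in known}
    report = {}
    for action in actions:
        if "*" in action or "?" in action:
            continue  # wildcard patterns need matching, not exact lookup
        if action.lower() not in known_lower:
            report[action] = difflib.get_close_matches(action, known, n=2)
    return report


# The incorrect names mentioned above, plus one valid action.
for bad_name, suggestions in unknown_actions([
    "ec2:DescribeCustomerGateway",
    "s3:ListBuckets",
    "s3:ListObjects",
    "s3:GetObject",
]).items():
    print(f"{bad_name}: not a known action, closest: {suggestions}")
```

The suggestions come from string similarity only, so they are hints for a human reviewer. For example, the closest string to s3:ListObjects is not necessarily the privilege that actually governs listing objects.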
One of my favorite mistakes is the SystemAdministrator policy spelling “lambda” incorrectly as “lamdba” in a resource and having to keep that mistake forever.
One odd privilege addition was the inclusion of cloudtrail:LookupEvents in AmazonEC2ContainerRegistryFullAccess, which provides a lot of insight into an account beyond just the ECR service. This privilege is still present in that policy.
What you can do
Check whether your accounts use the policies mentioned in this article, or modified copies of them. Managed IAM policies are generally over-privileged, so people often copy them and restrict the copy further. The problem is that, in doing so, you may have unknowingly copied in the problems of the original policy.
Because these problems are fixed by AWS without any notice, there isn’t a way to know that you have a problematic policy. I monitor these policies and will announce issues I see via my Twitter account @0xdabbad00. For those interested in monitoring these changes, I maintain a repository where you can see the changes of the policies via commit differences here.
If you do have one of these policies, or a copied version, you should ensure that the privilege escalation paths have been removed. You should also review your CloudTrail logs to make sure these issues haven’t been abused in the past in your accounts.
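As one example of what such a CloudTrail review could look like for the secret-tagging issue, the Python sketch below scans already-parsed CloudTrail records for principals that called TagResource on Secrets Manager and then later called GetSecretValue. It relies on the eventSource, eventName, eventTime, and userIdentity.arn fields of CloudTrail records. The function name and the sample records (including the account ID and role names) are hypothetical, and a hit is only a lead to investigate, not proof of abuse.

```python
def principals_tagging_then_reading(records):
    """ARNs that called Secrets Manager TagResource and, at the same time or
    later, GetSecretValue. Expects parsed CloudTrail records (dicts)."""
    tagged_by = set()
    flagged = set()
    # eventTime is an ISO 8601 UTC string, so string order is time order.
    for record in sorted(records, key=lambda r: r.get("eventTime", "")):
        if record.get("eventSource") != "secretsmanager.amazonaws.com":
            continue
        arn = record.get("userIdentity", {}).get("arn", "unknown")
        event_name = record.get("eventName")
        if event_name == "TagResource":
            tagged_by.add(arn)
        elif event_name == "GetSecretValue" and arn in tagged_by:
            flagged.add(arn)
    return sorted(flagged)


# Hypothetical sample records, trimmed to the fields the function reads.
NOTEBOOK = "arn:aws:sts::111111111111:assumed-role/notebook-role/session"
APP = "arn:aws:sts::111111111111:assumed-role/app-role/session"
SAMPLE_RECORDS = [
    {"eventTime": "2019-05-01T10:00:00Z", "eventName": "GetSecretValue",
     "eventSource": "secretsmanager.amazonaws.com",
     "userIdentity": {"arn": APP}},
    {"eventTime": "2019-05-02T09:00:00Z", "eventName": "TagResource",
     "eventSource": "secretsmanager.amazonaws.com",
     "userIdentity": {"arn": NOTEBOOK}},
    {"eventTime": "2019-05-02T09:01:00Z", "eventName": "GetSecretValue",
     "eventSource": "secretsmanager.amazonaws.com",
     "userIdentity": {"arn": NOTEBOOK}},
]

print(principals_tagging_then_reading(SAMPLE_RECORDS))
```

In practice you would narrow this further, for example to roles that had the SageMaker policy (or a copy of it) attached, and then look at which secrets were tagged.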
I recommend reviewing any policies, even if they come from AWS, to reduce privileges as much as possible, limiting the actions that can be performed and the resources that can be used. You should also apply conditions where possible. For more information on what resources and conditions can be used, head over here.
What AWS can do
I believe AWS should provide better tooling in this area, including:
- An IAM linting solution (as an open source library, outside of the web console),
- Better documentation on what is recorded by CloudTrail so that third-party tooling there (e.g., RepoKid and CloudTracker) can advance, and
- Tooling to leverage Client Side Monitoring to record the API calls that CloudTrail does not.
Doing this would help AWS customers implement a least privilege strategy.
Disclosure timeline
- 2019.05.07 - Brandon Sherman identifies the AmazonSageMakerFullAccess problem.
- 2019.05.08 - The problem is reported and fixed that evening.
- 2019.06.01 - I report the other issues mentioned in this article along with more minor mistakes, impacting 21 policies total.
- 2019.06.07 - AWS makes the first fix, a minor one, removing the incorrectly named privilege s3:ListObjects from AWSOpsWorksCMServiceRole.
- 2019.06.10 - AWS begins fixing the bigger problems including AWSCodePipelineFullAccess and deprecating AmazonMachineLearningRoleforRedshiftDataSource and creating the new policy AmazonMachineLearningRoleforRedshiftDataSourceV2, which makes this concept public for anyone monitoring the IAM policy changes. More fixes continue in later days, although not all issues have yet been resolved.
- 2019.06.18 - This article is published.
Along with working with AWS, I also reached out to Mitre to obtain CVEs for these issues, but was turned down because of an assumption that problems with AWS can always be fixed by the vendor without client action. This is specifically not the case with the policy AmazonMachineLearningRoleforRedshiftDataSource, but so far Mitre’s position has not changed. I’m curious what others think with regard to when CVEs should be created for SaaS and cloud vendors.