Opting out of AWS AI data usage

2021.01.06

Update 2022.02.22: AWS has no team managing this feature, and some services now work slightly differently depending on whether it is enabled. Some caution must therefore be taken when enabling this. AWS is working on better documenting what these differences are.

This post will discuss why you should opt out of the AI data usage on AWS, how to do that, and how to confirm you did it correctly.

A general rule on AWS is that your data will not leave the region you put it in. AWS customers rely on this for compliance, data sovereignty, and other reasons. Another general rule is that AWS will not access your data. AWS customers rely on this because they want to use AWS services, but do not want Amazon to use their data to create competing products or improve their services in such a way that it would directly benefit existing competitors.

In 2017 AWS made exceptions for these rules for their Machine Learning and Artificial Intelligence services in the AWS Service Terms. Specifically, under section 50.3, by using AWS services such as Lex, Transcribe, and more, you automatically opt in to allowing AWS “to develop and improve AWS and affiliate machine-learning and artificial-intelligence technologies” and that they may store your data outside of the region you put it in. Section 50.4, as I understand it, then informs you that it is your responsibility to disclose this use by AWS to your own users in order to abide by various laws, such as the Children’s Online Privacy Protection Act (COPPA).

The AWS terms are not often read or understood, other than the regular chuckle about Amazon Lumberyard having a clause related to zombies (section 42.10). So it wasn’t until AWS added functionality to the SDK for Organizations, and announced it in July 2020, that I noticed the section about the AI services. The new functionality lets you opt out of this data usage, which is described in the docs here. Opting out still allows you to use these services. It just means that AWS won’t copy your data or use it for their own benefit. The only downside to opting out is that these services may not be improved as much as they otherwise would be.

The functionality has all sorts of complexity to allow opting out of specific services, allowing some services by default, having different policies for different accounts, and controlling whether accounts can make exceptions. This is ridiculous because anyone who learns about this and takes action is only going to ensure that all of their accounts are opted out.

The remainder of this article will explain how to turn this on and how to confirm it is on.

Opting out

You should opt out of this data usage. You can do this by following the docs to enable the opt-out ability, creating a policy, and associating that policy with your organization root. The policy you create should be:

{
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut"
            }
        }
    }
}

This policy is documented here. It opts every account in the organization out of having its data used by any of these services, and it does not allow member accounts to create exceptions. As mentioned, these policies are needlessly complex.

To opt out via Terraform you can use this gist from Christophe Tafani-Dereeper.

You can also do all of this from the command line, using a session in the Organization management account:

# Get root id
export ROOT_ID=$(aws organizations list-roots | jq -cr '.Roots[0].Id')

# Enable the feature
aws organizations enable-policy-type --root-id "$ROOT_ID" --policy-type AISERVICES_OPT_OUT_POLICY

# Create the policy
aws organizations create-policy --type AISERVICES_OPT_OUT_POLICY --name optout --description "Opt out of AI services using our data" --content "{\"services\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"default\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"opt_out_policy\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"@@assign\":\"optOut\"}}}}"

# Get the policy id, selecting by the name used above in case other opt-out policies exist
export POLICY_ID=$(aws organizations list-policies --filter AISERVICES_OPT_OUT_POLICY | jq -cr '.Policies[] | select(.Name == "optout") | .Id')

# Attach it to the root
aws organizations attach-policy --policy-id "$POLICY_ID" --target-id "$ROOT_ID"
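The inline policy string in the create step is long and easy to mistype. One alternative, sketched here, is to keep the policy in a file and pass it using the AWS CLI's file:// prefix for parameter values. The helper name write_optout_policy and the file name optout-policy.json are my own choices for illustration, not anything AWS defines.

```shell
# Hypothetical helper: writes the opt-out policy shown earlier to the path in $1.
write_optout_policy() {
    cat > "$1" <<'EOF'
{
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut"
            }
        }
    }
}
EOF
}

# Usage (requires a session in the Organization management account):
#   write_optout_policy optout-policy.json
#   aws organizations create-policy --type AISERVICES_OPT_OUT_POLICY --name optout \
#       --description "Opt out of AI services using our data" \
#       --content file://optout-policy.json
```

Keeping the policy in a file also makes it easy to review in version control before it is applied.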

Confirming you opted out

To ensure this has been applied correctly, follow the steps below. Auditors can use the same steps to verify that the opt-out has been performed. For these commands, ensure you have the latest version of the AWS CLI. When I run aws --version the response includes aws-cli/2.1.15, so ensure you are using at least that version.

Confirm feature is enabled

First, ensure the opt out functionality has been enabled for the organization. In the organization management account, run:

aws organizations list-roots

This should return something like:

{
    "Roots": [
        {
            "Id": "r-abcd",
            "Arn": "arn:aws:organizations::123456789012:root/o-abcd123456/r-abcd",
            "Name": "Root",
            "PolicyTypes": [
                {
                    "Type": "AISERVICES_OPT_OUT_POLICY",
                    "Status": "ENABLED"
                },
                {
                    "Type": "TAG_POLICY",
                    "Status": "ENABLED"
                },
                {
                    "Type": "SERVICE_CONTROL_POLICY",
                    "Status": "ENABLED"
                }
            ]
        }
    ]
}

You want to confirm AISERVICES_OPT_OUT_POLICY is set to ENABLED.
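For audits that need to be repeatable, this check can be scripted. The following is a minimal sketch that assumes jq is installed; the function name optout_type_enabled is hypothetical. It reads the list-roots output on stdin and exits non-zero unless the policy type is enabled.

```shell
# Hypothetical helper: reads `aws organizations list-roots` JSON on stdin and
# succeeds only if AISERVICES_OPT_OUT_POLICY has status ENABLED on a root.
optout_type_enabled() {
    jq -e 'any(.Roots[].PolicyTypes[]?;
               .Type == "AISERVICES_OPT_OUT_POLICY" and .Status == "ENABLED")' > /dev/null
}

# Usage (requires a session in the organization management account):
#   aws organizations list-roots | optout_type_enabled && echo "opt-out policy type enabled"
```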

Confirm the policy exists

Confirm an opt-out policy exists by running:

$ aws organizations list-policies --filter AISERVICES_OPT_OUT_POLICY
{
    "Policies": [
        {
            "Id": "p-0000000000",
            "Arn": "arn:aws:organizations::123456789012:policy/o-abcd123456/aiservices_opt_out_policy/p-0000000000",
            "Name": "optout",
            "Type": "AISERVICES_OPT_OUT_POLICY",
            "AwsManaged": false
        }
    ]
}

Then get the contents of this policy (replacing your policy id):

$ aws organizations describe-policy --policy-id p-0000000000
{
    "Policy": {
        "PolicySummary": {
            "Id": "p-0000000000",
            "Arn": "arn:aws:organizations::123456789012:policy/o-abcd123456/aiservices_opt_out_policy/p-0000000000",
            "Name": "optout",
            "Type": "AISERVICES_OPT_OUT_POLICY",
            "AwsManaged": false
        },
        "Content": "{\"services\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"default\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"opt_out_policy\":{\"@@operators_allowed_for_child_policies\":[\"@@none\"],\"@@assign\":\"optOut\"}}}}"
    }
}

You can pipe that through jq '.Policy.Content|fromjson' for a more readable policy. Confirm the policy matches the policy text shown earlier.
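Comparing the policy by eye is error-prone, so the comparison can also be sketched as a script. This assumes jq is installed; the function name policy_is_full_optout is hypothetical. jq compares objects structurally, so key order and whitespace in the stored policy do not matter.

```shell
# Hypothetical helper: reads `aws organizations describe-policy` JSON on stdin and
# succeeds only if the embedded policy equals the opt-out policy shown earlier.
policy_is_full_optout() {
    local expected='{"services":{"@@operators_allowed_for_child_policies":["@@none"],"default":{"@@operators_allowed_for_child_policies":["@@none"],"opt_out_policy":{"@@operators_allowed_for_child_policies":["@@none"],"@@assign":"optOut"}}}}'
    jq -e --argjson expected "$expected" \
        '(.Policy.Content | fromjson) == $expected' > /dev/null
}

# Usage (replacing your policy id):
#   aws organizations describe-policy --policy-id p-0000000000 | policy_is_full_optout \
#       && echo "policy matches"
```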

Confirm the policy is attached to the root

Finally, run the following (replacing your policy id):

$ aws organizations list-targets-for-policy --policy-id p-0000000000
{
    "Targets": [
        {
            "TargetId": "r-abcd",
            "Arn": "arn:aws:organizations::123456789012:root/o-abcd123456/r-abcd",
            "Name": "Root",
            "Type": "ROOT"
        }
    ]
}

Confirm that the type in the response is “ROOT”.
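This last check can be scripted in the same way. The sketch below assumes jq is installed, and the function name policy_attached_to_root is hypothetical. Besides the type, it also checks that the target id matches your root id, which is slightly stricter than the manual check.

```shell
# Hypothetical helper: reads `aws organizations list-targets-for-policy` JSON on
# stdin and succeeds only if a target of type ROOT has the root id given as $1.
policy_attached_to_root() {
    local root_id="$1"
    jq -e --arg root "$root_id" \
        'any(.Targets[]; .Type == "ROOT" and .TargetId == $root)' > /dev/null
}

# Usage (ROOT_ID and POLICY_ID as set in the opt-out commands earlier):
#   aws organizations list-targets-for-policy --policy-id "$POLICY_ID" \
#       | policy_attached_to_root "$ROOT_ID" && echo "attached to root"
```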

This confirms that you have opted out all the accounts in your organization from having their data moved to other regions or used for reasons outside of your control.