AWS Lake Formation.

Data Lake, LakeFormation, 2025

AWS Lake Formation = Scaled Data Lake + Scaled Security Provisioning

image

We will provision access for users based on the LF-Tags mapped to the database and its tables.

IAM Setup

image

Data Lake Administrator

kfn-lf-admin: These are a few of the policies the lake admin will need. The Data Lake Admin configures the policies on the lake. We are also going to use Athena, Redshift and Glue, so that we can distinguish the admin from the other roles we will configure.

image

Service Linked Role

A service-linked role is a special type of AWS IAM role that is predefined and managed by an AWS service, allowing that service to perform actions on our behalf.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": "lakeformation.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PutRolePolicy"
            ],
            "Resource": "arn:aws:iam::528454491151:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
        }
    ]
}
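The policy above hard-codes one account ID in the role ARN. As a minimal sketch, the same document can be generated for any account; the function name `service_linked_role_policy` and the account ID `123456789012` are placeholders invented for this illustration, not part of the original setup.

```python
import json


def service_linked_role_policy(account_id: str) -> dict:
    """Build the service-linked role policy shown above for a given account."""
    role_arn = (
        f"arn:aws:iam::{account_id}:role/aws-service-role/"
        "lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow creating the service-linked role, but only for Lake Formation.
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:AWSServiceName": "lakeformation.amazonaws.com"
                    }
                },
            },
            {
                # Allow attaching inline policies to that one role only.
                "Effect": "Allow",
                "Action": ["iam:PutRolePolicy"],
                "Resource": role_arn,
            },
        ],
    }


# Placeholder account ID (assumption); substitute your own.
policy_json = json.dumps(service_linked_role_policy("123456789012"), indent=4)
```

The resulting `policy_json` string can be pasted into the IAM console or passed as the policy document to an IAM API call.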

Analyst and Engineer Roles

Analyst

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}

Data Engineer. We could probably narrow this list down even further.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "lakeformation:GrantPermissions",
                "lakeformation:RevokePermissions",
                "lakeformation:BatchGrantPermissions",
                "lakeformation:BatchRevokePermissions",
                "lakeformation:ListPermissions",
                "lakeformation:AddLFTagsToResource",
                "lakeformation:RemoveLFTagsFromResource",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags",
                "lakeformation:GetWorkUnits",
                "lakeformation:GetWorkUnitResults",
                "lakeformation:StartQueryPlanning",
                "lakeformation:GetQueryState",
                "lakeformation:GetQueryStatistics"
            ],
            "Resource": "*"
        }
    ]
}

Data Lake Setup

We use the Data Lake Admin here just to show that all the tables are available to query.

S3

We are creating 2 tables.

image

AWS Glue Crawler

image
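For readers who prefer scripting over the console, the crawler setup could look roughly like the sketch below. It only assumes the boto3 Glue client methods `create_crawler` and `start_crawler`; the crawler name, role ARN, database name and S3 path are hypothetical values, since the original shows them only in the screenshot.

```python
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the keyword arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, request: dict) -> None:
    """Create the crawler, then run it once.

    glue_client is expected to be boto3.client("glue").
    """
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = crawler_request(
    name="lf-demo-crawler",
    role_arn="arn:aws:iam::123456789012:role/lf-demo-glue-role",
    database="lf_demo_db",
    s3_path="s3://lf-demo-bucket/data/",
)
```

Calling `create_and_start_crawler(boto3.client("glue"), request)` would then register the tables in the Glue Data Catalog once the crawl finishes.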

AWS Glue Database / Tables

image

Querying the Data lake via Athena

image
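The same Athena check can be scripted. This is a sketch under assumptions: it uses the boto3 Athena client method `start_query_execution`, and the database name and results location are hypothetical; only the table name `balls` comes from this walkthrough.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build the keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena_client, request: dict) -> str:
    """Submit the query and return its execution ID.

    athena_client is expected to be boto3.client("athena").
    """
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Database and output location are hypothetical placeholders.
request = athena_query_request(
    sql='SELECT * FROM "balls" LIMIT 10',
    database="lf_demo_db",
    output_location="s3://lf-demo-bucket/athena-results/",
)
```

Running this under different roles is a quick way to repeat the access tests later in the post, because Athena enforces whatever Lake Formation grants the caller holds.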

Querying the Data lake via Redshift

image

Now Lake Formation

Step 1: Set up the LF data lake administrators.

We will use the same lake admin role we have been using so far.

image
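Step 1 can also be done through the API. The sketch below assumes the boto3 Lake Formation client methods `get_data_lake_settings` and `put_data_lake_settings`. Because the put call replaces the whole settings object, the helper reads the current settings first and appends the admin. The role ARN is an assumption built from the `kfn-lf-admin` name with a placeholder account ID.

```python
import copy


def with_admin(settings: dict, principal_arn: str) -> dict:
    """Return a copy of the data lake settings with the principal added as admin."""
    updated = copy.deepcopy(settings)
    admins = updated.setdefault("DataLakeAdmins", [])
    entry = {"DataLakePrincipalIdentifier": principal_arn}
    if entry not in admins:  # keep the call idempotent
        admins.append(entry)
    return updated


def register_admin(lf_client, principal_arn: str) -> None:
    """Read-modify-write the settings so existing admins are preserved.

    lf_client is expected to be boto3.client("lakeformation").
    """
    current = lf_client.get_data_lake_settings()["DataLakeSettings"]
    lf_client.put_data_lake_settings(
        DataLakeSettings=with_admin(current, principal_arn)
    )


# Placeholder account ID; the ARN shape (role vs. user) is an assumption.
ADMIN_ARN = "arn:aws:iam::123456789012:role/kfn-lf-admin"
```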

Step 2: Data Lake Location

We can use Data Location as an additional layer of securing roles having access to the data.

image
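A rough API equivalent of this step: register the S3 path with Lake Formation, then grant a principal access to that location. The sketch assumes the boto3 Lake Formation client methods `register_resource` and `grant_permissions`; the bucket path and the role ARN are hypothetical.

```python
def register_location_request(s3_arn: str) -> dict:
    """Keyword arguments for lakeformation.register_resource."""
    # UseServiceLinkedRole lets Lake Formation access the path via its
    # service-linked role instead of a custom role.
    return {"ResourceArn": s3_arn, "UseServiceLinkedRole": True}


def data_location_grant(principal_arn: str, s3_arn: str) -> dict:
    """Keyword arguments for lakeformation.grant_permissions on a data location."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": s3_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


def secure_location(lf_client, principal_arn: str, s3_arn: str) -> None:
    """lf_client is expected to be boto3.client("lakeformation")."""
    lf_client.register_resource(**register_location_request(s3_arn))
    lf_client.grant_permissions(**data_location_grant(principal_arn, s3_arn))


# Hypothetical placeholders.
LOCATION_ARN = "arn:aws:s3:::lf-demo-bucket/data"
ENGINEER_ARN = "arn:aws:iam::123456789012:role/lf-demo-engineer"
```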

Step 3: Setting up Tags

We will use tags as an additional layer of securing roles having access to the data.

image
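Defining the tag itself is a single call. A minimal sketch, assuming the boto3 Lake Formation client method `create_lf_tag`; the key `DataSensitivity` and the values `sensitive` and `internal` are the ones used in the tests later in this post.

```python
def lf_tag_request(key: str, values: list) -> dict:
    """Keyword arguments for lakeformation.create_lf_tag."""
    return {"TagKey": key, "TagValues": list(values)}


def create_tag(lf_client, key: str, values: list) -> None:
    """lf_client is expected to be boto3.client("lakeformation")."""
    lf_client.create_lf_tag(**lf_tag_request(key, values))


# One key with the two values used in this walkthrough.
DATA_SENSITIVITY = lf_tag_request("DataSensitivity", ["sensitive", "internal"])
```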

Step 4: Assigning Tags to Tables

image

image
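The tag assignment from the console can be sketched with the boto3 Lake Formation client method `add_lf_tags_to_resource`. The database name `lf_demo_db` is a hypothetical placeholder. The tagging pattern (database tagged `sensitive`, table `match` overridden to `internal`) follows the setup described in this post.

```python
def tag_database_request(database: str, key: str, value: str) -> dict:
    """Keyword arguments to tag a whole database; its tables inherit the tag."""
    return {
        "Resource": {"Database": {"Name": database}},
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def tag_table_request(database: str, table: str, key: str, value: str) -> dict:
    """Keyword arguments to tag one table, overriding the inherited value."""
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [{"TagKey": key, "TagValues": [value]}],
    }


def apply_tags(lf_client, requests: list) -> None:
    """lf_client is expected to be boto3.client("lakeformation")."""
    for request in requests:
        lf_client.add_lf_tags_to_resource(**request)


# "lf_demo_db" is hypothetical; "match" is the table name from this post.
requests = [
    tag_database_request("lf_demo_db", "DataSensitivity", "sensitive"),
    tag_table_request("lf_demo_db", "match", "DataSensitivity", "internal"),
]
```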

Step 5: Setting up Data Permissions for Users based on Tags

Now that we have set up the tables and associated tags with them, we need to grant those tags to users so that each user can access the tables whose tags match.

image
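A tag-based grant can be sketched with the boto3 Lake Formation client method `grant_permissions` and an `LFTagPolicy` resource. The role ARNs are hypothetical; the permissions chosen here (`SELECT` and `DESCRIBE` on tables) are an assumption about what the screenshot grants.

```python
def tag_policy_grant(principal_arn: str, key: str, value: str,
                     resource_type: str = "TABLE",
                     permissions: tuple = ("SELECT", "DESCRIBE")) -> dict:
    """Keyword arguments for lakeformation.grant_permissions on an LF-Tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,  # "TABLE" or "DATABASE"
                "Expression": [{"TagKey": key, "TagValues": [value]}],
            }
        },
        "Permissions": list(permissions),
    }


def grant(lf_client, request: dict) -> None:
    """lf_client is expected to be boto3.client("lakeformation")."""
    lf_client.grant_permissions(**request)


# Hypothetical role ARNs: analyst matches "sensitive", engineer matches "internal".
analyst_grant = tag_policy_grant(
    "arn:aws:iam::123456789012:role/lf-demo-analyst", "DataSensitivity", "sensitive"
)
engineer_grant = tag_policy_grant(
    "arn:aws:iam::123456789012:role/lf-demo-engineer", "DataSensitivity", "internal"
)
```

With grants shaped like this, any table that later receives a matching tag value becomes visible to the role without a new grant, which is the main appeal of tag-based access control.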

Testing LF Analyst

Only resources with the LF Tag Data Sensitivity: sensitive should be visible.

Only the table “balls” is available to the LF Analyst user. The match table is not available.

image

Testing LF Engineer

Providing access to tables and databases with Tag DataSensitivity: internal

image

The engineer does not have access to the balls table.

image

But the engineer does have access to the Match table.

image

Some more interesting play

The database is set with tag value: sensitive, but I overrode the tag on the Match table with the value internal. That is why the value is shown as overridden on the Match table.

image

The implication is that the engineer cannot describe the database, but can still query the Match table.

image
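The behaviour seen in these tests can be reproduced with a small, purely local model. This is a simplified sketch of tag inheritance and matching, not Lake Formation's actual evaluation engine: a table inherits the database's tags unless it sets its own value for the same key, and a principal sees a resource when its granted tag value matches. The role names `analyst` and `engineer` are labels for this illustration.

```python
def effective_tags(database_tags: dict, table_tags: dict) -> dict:
    """Tags on the table override tags inherited from the database."""
    merged = dict(database_tags)
    merged.update(table_tags)
    return merged


def is_granted(resource_tags: dict, granted: dict) -> bool:
    """True when every granted key matches the resource's value for that key."""
    return all(resource_tags.get(key) in values for key, values in granted.items())


# Setup from this walkthrough: database tagged "sensitive",
# table "match" overridden to "internal", table "balls" inherits.
database_tags = {"DataSensitivity": "sensitive"}
table_tags = {"balls": {}, "match": {"DataSensitivity": "internal"}}
grants = {
    "analyst": {"DataSensitivity": {"sensitive"}},
    "engineer": {"DataSensitivity": {"internal"}},
}


def visible_tables(role: str) -> list:
    """List the tables whose effective tags match the role's grant."""
    return sorted(
        name
        for name, tags in table_tags.items()
        if is_granted(effective_tags(database_tags, tags), grants[role])
    )


print(visible_tables("analyst"))                        # ['balls']
print(visible_tables("engineer"))                       # ['match']
print(is_granted(database_tags, grants["engineer"]))    # False
```

The last line mirrors the observation above: the engineer's grant does not match the database's own tag, yet it does match the overridden tag on the Match table.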