AWS Lake Formation.
Data Lake, LakeFormation, 2025
AWS Lake Formation = Scaled Data Lake + Scaled Security Provisioning
We will provision user access based on the tags mapped to the database and its tables.
IAM Setup
Data Lake Administrator
kfn-lf-admin: these are a few of the policies the lake administrator will need. Apart from the Data Lake Admin, who configures the policies on the lake, we are going to use Athena, Redshift and Glue so that we can distinguish between the other roles we configure.
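Service-Linked Role
A service-linked role is a special type of AWS IAM role that is predefined and managed by an AWS service, allowing that service to perform actions on our behalf. The policy below lets the admin create the Lake Formation service-linked role (AWSServiceRoleForLakeFormationDataAccess) and put an inline policy on it; Lake Formation uses this role when an S3 location is registered with the service-linked role.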
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "*",
"Condition": {
"StringEquals": {
"iam:AWSServiceName": "lakeformation.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::528454491151:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
}
]
}
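When the same admin setup is repeated in another account, the JSON above can be generated rather than edited by hand. A minimal Python sketch, assuming boto3 is installed only for the step that actually attaches the policy; the inline policy name in the usage note is hypothetical.

```python
import json


def lake_admin_slr_policy(account_id: str) -> dict:
    """Build the service-linked-role policy above for a given account ID."""
    slr_arn = (
        f"arn:aws:iam::{account_id}:role/aws-service-role/"
        "lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow creating the Lake Formation service-linked role, and only that one.
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"iam:AWSServiceName": "lakeformation.amazonaws.com"}
                },
            },
            {
                # Allow putting an inline policy on that service-linked role.
                "Effect": "Allow",
                "Action": ["iam:PutRolePolicy"],
                "Resource": slr_arn,
            },
        ],
    }


def attach_inline_policy(role_name: str, policy_name: str, account_id: str) -> None:
    """Attach the policy inline to an existing role (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(lake_admin_slr_policy(account_id)),
    )
```

Usage: `attach_inline_policy("kfn-lf-admin", "lf-admin-service-linked-role", "<account-id>")`.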
Analyst and Engineer Roles
Analyst
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lakeformation:GetDataAccess",
"glue:GetTable",
"glue:GetTables",
"glue:SearchTables",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetPartitions",
"lakeformation:GetResourceLFTags",
"lakeformation:ListLFTags",
"lakeformation:GetLFTag",
"lakeformation:SearchTablesByLFTags",
"lakeformation:SearchDatabasesByLFTags"
],
"Resource": "*"
}
]
}
Data Engineer (this list could probably be narrowed down further)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lakeformation:GetDataAccess",
"lakeformation:GrantPermissions",
"lakeformation:RevokePermissions",
"lakeformation:BatchGrantPermissions",
"lakeformation:BatchRevokePermissions",
"lakeformation:ListPermissions",
"lakeformation:AddLFTagsToResource",
"lakeformation:RemoveLFTagsFromResource",
"lakeformation:GetResourceLFTags",
"lakeformation:ListLFTags",
"lakeformation:GetLFTag",
"lakeformation:SearchTablesByLFTags",
"lakeformation:SearchDatabasesByLFTags",
"lakeformation:GetWorkUnits",
"lakeformation:GetWorkUnitResults",
"lakeformation:StartQueryPlanning",
"lakeformation:GetQueryState",
"lakeformation:GetQueryStatistics"
],
"Resource": "*"
}
]
}
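When narrowing the engineer policy down, it helps to see exactly how the two policies differ. The sketch below copies the action lists from the two policies above and takes set differences. One thing it surfaces: the engineer policy has none of the glue:Get* catalog actions the analyst policy has, so if the engineer queries through Athena those catalog reads probably have to come from another policy attached to the role.

```python
# Action lists copied from the Analyst and Data Engineer policies above.
ANALYST_ACTIONS = {
    "lakeformation:GetDataAccess",
    "glue:GetTable", "glue:GetTables", "glue:SearchTables",
    "glue:GetDatabase", "glue:GetDatabases", "glue:GetPartitions",
    "lakeformation:GetResourceLFTags", "lakeformation:ListLFTags", "lakeformation:GetLFTag",
    "lakeformation:SearchTablesByLFTags", "lakeformation:SearchDatabasesByLFTags",
}

ENGINEER_ACTIONS = {
    "lakeformation:GetDataAccess",
    "lakeformation:GrantPermissions", "lakeformation:RevokePermissions",
    "lakeformation:BatchGrantPermissions", "lakeformation:BatchRevokePermissions",
    "lakeformation:ListPermissions",
    "lakeformation:AddLFTagsToResource", "lakeformation:RemoveLFTagsFromResource",
    "lakeformation:GetResourceLFTags", "lakeformation:ListLFTags", "lakeformation:GetLFTag",
    "lakeformation:SearchTablesByLFTags", "lakeformation:SearchDatabasesByLFTags",
    "lakeformation:GetWorkUnits", "lakeformation:GetWorkUnitResults",
    "lakeformation:StartQueryPlanning", "lakeformation:GetQueryState",
    "lakeformation:GetQueryStatistics",
}


def compare_actions(first: set, second: set) -> dict:
    """Split two action sets into shared actions and actions unique to each side."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }
```

`compare_actions(ANALYST_ACTIONS, ENGINEER_ACTIONS)["only_second"]` is the list to review when down-selecting the engineer role.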
Data Lake Setup
We use the Data Lake Admin here only to show that all the tables are available for querying.
S3
We are creating two tables, balls and match.
AWS Glue Crawler
AWS Glue Database / Tables
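The crawler step can also be scripted. This is a sketch with hypothetical names throughout: the crawler name, the Glue service role ARN, the my_lake_db database and the my-datalake-bucket prefixes are assumptions, not the values used in this walkthrough. Pointing the crawler at one prefix per table typically yields one table per prefix, here balls and match.

```python
def crawler_request(name: str, role: str, database: str, s3_paths: list) -> dict:
    """Keyword arguments for glue.create_crawler: one S3 target per table prefix."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


def create_and_start(request: dict) -> None:
    """Create the target database if needed, then create and start the crawler."""
    import boto3  # lazy import: only needed when talking to AWS

    glue = boto3.client("glue")
    try:
        glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    except glue.exceptions.AlreadyExistsException:
        pass  # database was created earlier
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical names, for illustration only.
REQUEST = crawler_request(
    name="lf-demo-crawler",
    role="arn:aws:iam::123456789012:role/lf-demo-glue-role",
    database="my_lake_db",
    s3_paths=["s3://my-datalake-bucket/balls/", "s3://my-datalake-bucket/match/"],
)
```

Calling `create_and_start(REQUEST)` runs the crawl; the resulting tables then show up under the database in the Glue console.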
Querying the Data lake via Athena
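The same Athena query can be issued from code. A minimal sketch, assuming a hypothetical database my_lake_db and a hypothetical S3 results location; it starts the query, polls until Athena reports a final state, and returns the raw result rows.

```python
import time


def athena_request(sql: str, database: str, output_location: str, workgroup: str = "primary") -> dict:
    """Build keyword arguments for Athena's StartQueryExecution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def run_athena_query(request: dict, poll_seconds: float = 1.0) -> list:
    """Start the query, wait for a final state, and return the result rows."""
    import boto3  # lazy import: only needed when talking to AWS

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(status.get("StateChangeReason", status["State"]))
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

For example: `run_athena_query(athena_request("SELECT * FROM balls LIMIT 10", "my_lake_db", "s3://my-athena-results/"))`.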
Querying the Data lake via Redshift
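Redshift reaches the Glue catalog through an external schema (Redshift Spectrum). The sketch below builds the CREATE EXTERNAL SCHEMA statement and submits it through the Redshift Data API against a Redshift Serverless workgroup. The schema name, Glue database, IAM role ARN and workgroup are hypothetical, and the identifier check is a deliberately conservative assumption rather than Redshift's full naming rules.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")  # conservative: lowercase letters, digits, underscores


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def external_schema_sql(schema: str, glue_database: str, iam_role_arn: str) -> str:
    """CREATE EXTERNAL SCHEMA statement that maps a Glue database into Redshift."""
    if not _IDENT.match(schema):
        raise ValueError(f"unsupported schema name: {schema!r}")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE {sql_literal(glue_database)} "
        f"IAM_ROLE {sql_literal(iam_role_arn)}"
    )


def run_on_serverless(sql: str, workgroup: str, database: str = "dev") -> str:
    """Submit a statement through the Redshift Data API and return its statement ID."""
    import boto3  # lazy import: only needed when talking to AWS

    client = boto3.client("redshift-data")
    return client.execute_statement(Sql=sql, WorkgroupName=workgroup, Database=database)["Id"]
```

Once the schema exists, the lake tables are queried as `lake.balls` and `lake.match` (assuming the schema is named `lake`).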
Now Lake Formation
Step 1: Set up the LF data lake administrators.
For this we reuse the lake admin role (kfn-lf-admin) from the IAM setup.
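Registering the admin can also be done through the API. PutDataLakeSettings replaces the whole settings object rather than appending to it, so this sketch reads the current settings first and adds the admin to the existing list instead of overwriting it. The account ID in the usage note is a placeholder.

```python
def add_admin(settings: dict, admin_arn: str) -> dict:
    """Return a copy of the data lake settings with admin_arn added once."""
    merged = dict(settings)
    admins = list(merged.get("DataLakeAdmins", []))  # copy so the input is not mutated
    if not any(a.get("DataLakePrincipalIdentifier") == admin_arn for a in admins):
        admins.append({"DataLakePrincipalIdentifier": admin_arn})
    merged["DataLakeAdmins"] = admins
    return merged


def register_admin(admin_arn: str) -> None:
    """Read the current settings, add the admin, and write the settings back."""
    import boto3  # lazy import: only needed when talking to AWS

    lf = boto3.client("lakeformation")
    current = lf.get_data_lake_settings()["DataLakeSettings"]
    lf.put_data_lake_settings(DataLakeSettings=add_admin(current, admin_arn))
```

Usage: `register_admin("arn:aws:iam::<account-id>:role/kfn-lf-admin")`. Running it twice is harmless because the admin is only appended when missing.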
Step 2: Data Lake Location
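Registering the S3 data location with Lake Formation adds another layer of control: Lake Formation then governs access to the data under that location, and data location permissions decide which roles may create catalog tables that point at it.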
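A sketch of this step through the API, assuming the location is registered with the Lake Formation service-linked role. The bucket, prefix and principal ARNs are hypothetical; DATA_LOCATION_ACCESS would typically go to principals that create tables at the location, for example the role the Glue crawler runs as.

```python
def s3_location_arn(bucket: str, prefix: str = "") -> str:
    """ARN for an S3 bucket or a prefix inside it, as Lake Formation expects."""
    prefix = prefix.strip("/")
    return f"arn:aws:s3:::{bucket}/{prefix}" if prefix else f"arn:aws:s3:::{bucket}"


def data_location_grant(principal_arn: str, location_arn: str) -> dict:
    """Keyword arguments for grant_permissions on a registered data location."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": location_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


def register_and_grant(bucket: str, prefix: str, principal_arns: list) -> None:
    """Register the location with the service-linked role, then grant access to it."""
    import boto3  # lazy import: only needed when talking to AWS

    lf = boto3.client("lakeformation")
    location = s3_location_arn(bucket, prefix)
    lf.register_resource(ResourceArn=location, UseServiceLinkedRole=True)
    for arn in principal_arns:
        lf.grant_permissions(**data_location_grant(arn, location))
```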
Step 3: Setting up Tags
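We will use LF-Tags as a further layer of control over which roles can access the data: resources are tagged, and permissions are granted on tag values rather than on individual tables.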
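The tag used in this walkthrough can be created with a single call. A minimal sketch, assuming the key DataSensitivity with the two values that appear later (sensitive and internal); any other values you need would be added to the list.

```python
TAG_KEY = "DataSensitivity"
TAG_VALUES = ["sensitive", "internal"]


def create_tag_request(key: str = TAG_KEY, values: list = TAG_VALUES) -> dict:
    """Keyword arguments for lakeformation.create_lf_tag, with values deduplicated."""
    if not values:
        raise ValueError("an LF-Tag needs at least one value")
    return {"TagKey": key, "TagValues": sorted(set(values))}


def create_tag(request: dict) -> None:
    import boto3  # lazy import: only needed when talking to AWS

    boto3.client("lakeformation").create_lf_tag(**request)
```

Values can be added to an existing tag later with `update_lf_tag` (its `TagValuesToAdd` parameter) instead of recreating it.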
Step 4: Assigning Tags to Databases and Tables
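The assignments used later in this walkthrough (the database tagged sensitive, the match table overridden to internal) can be expressed as add_lf_tags_to_resource requests. The database name my_lake_db is hypothetical; balls gets no table-level tag, so it keeps the value inherited from the database.

```python
TAG_KEY = "DataSensitivity"


def tag_assignment(database: str, value: str, table: str = None, key: str = TAG_KEY) -> dict:
    """Keyword arguments for add_lf_tags_to_resource on a database or one of its tables."""
    if table is None:
        resource = {"Database": {"Name": database}}
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {"Resource": resource, "LFTags": [{"TagKey": key, "TagValues": [value]}]}


ASSIGNMENTS = [
    tag_assignment("my_lake_db", "sensitive"),                # database level, inherited by tables
    tag_assignment("my_lake_db", "internal", table="match"),  # table-level override
]


def apply_assignments(assignments: list) -> None:
    import boto3  # lazy import: only needed when talking to AWS

    lf = boto3.client("lakeformation")
    for request in assignments:
        failures = lf.add_lf_tags_to_resource(**request).get("Failures", [])
        if failures:
            raise RuntimeError(failures)
```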
Step 5: Granting Data Permissions to Users Based on Tags
Now that the tables are set up and tagged, we grant each user permissions on a tag expression, so that users can access only the tables whose tags match.
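Tag-based grants go through grant_permissions with an LFTagPolicy resource. This sketch gives a principal DESCRIBE on matching databases and SELECT plus DESCRIBE on matching tables; the permission choice and the role ARNs in the usage note are assumptions for illustration.

```python
TAG_KEY = "DataSensitivity"


def tag_policy_grant(principal_arn: str, resource_type: str, values: list,
                     permissions: list, key: str = TAG_KEY) -> dict:
    """Keyword arguments for grant_permissions on every resource matching a tag expression."""
    if resource_type not in ("DATABASE", "TABLE"):
        raise ValueError(f"unsupported resource type: {resource_type}")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,
                "Expression": [{"TagKey": key, "TagValues": list(values)}],
            }
        },
        "Permissions": list(permissions),
    }


def grants_for(principal_arn: str, value: str) -> list:
    """Database DESCRIBE plus table SELECT/DESCRIBE for one tag value."""
    return [
        tag_policy_grant(principal_arn, "DATABASE", [value], ["DESCRIBE"]),
        tag_policy_grant(principal_arn, "TABLE", [value], ["SELECT", "DESCRIBE"]),
    ]


def apply_grants(grants: list) -> None:
    import boto3  # lazy import: only needed when talking to AWS

    lf = boto3.client("lakeformation")
    for grant in grants:
        lf.grant_permissions(**grant)
```

For the two test users this would be `apply_grants(grants_for("arn:aws:iam::<account-id>:role/lf-analyst", "sensitive"))` and the same with a hypothetical `lf-engineer` role and `"internal"`.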
Testing LF Analyst
Only resources with the LF-Tag DataSensitivity: sensitive should be visible.
Only the table “balls” is available to the LF Analyst user. The match table is not available.
Testing LF Engineer
The engineer is granted access to databases and tables tagged DataSensitivity: internal.
The engineer does not have access to the balls table, but does have access to the match table.
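Both tests can be rerun from code by assuming each role and attempting a one-row Athena query per table. A sketch under these assumptions: the roles can be assumed by the caller and have Athena access, and the database name and results location passed in are hypothetical. A FAILED query is treated as "no access", so when a result surprises you, check the query's StateChangeReason, since it may have failed for another reason.

```python
import time

# Outcome reported in the tests above: which role can query which table.
EXPECTED = {
    "analyst": {"balls": True, "match": False},
    "engineer": {"balls": False, "match": True},
}


def mismatches(observed: dict, expected: dict = EXPECTED) -> list:
    """List (role, table) pairs whose observed access differs from the expected one."""
    return sorted(
        (role, table)
        for role, tables in expected.items()
        for table, allowed in tables.items()
        if observed.get(role, {}).get(table) != allowed
    )


def session_for_role(role_arn: str, session_name: str = "lf-access-test"):
    """Assume the role under test and return a boto3 session with its credentials."""
    import boto3  # lazy import: only needed when talking to AWS

    creds = boto3.client("sts").assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def can_query(session, table: str, database: str, output_location: str, poll_seconds: float = 1.0) -> bool:
    """Run a one-row Athena query as the session's role; True only if it SUCCEEDED."""
    from botocore.exceptions import ClientError

    athena = session.client("athena")
    try:
        query_id = athena.start_query_execution(
            QueryString=f'SELECT * FROM "{table}" LIMIT 1',
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )["QueryExecutionId"]
    except ClientError:
        return False  # rejected before the query even started
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state == "SUCCEEDED"
        time.sleep(poll_seconds)
```

Collect `can_query` results into a dict shaped like `EXPECTED`; an empty list from `mismatches` means both roles behave as described above.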
Some more interesting behavior: tag overrides
The database is tagged with the value sensitive, but I overrode the tag on the match table with the value internal. That is why the console shows the value as overridden on the match table.
The implication is that the engineer cannot describe the database, but can still query the match table.
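The behavior in these tests can be reproduced with a small model of LF-Tag matching: a table inherits its database's tags, a table-level value for the same key overrides the inherited one, and a grant matches when every key in its expression matches one of the listed values. This is a simplified Python model written for illustration, not Lake Formation's evaluation logic; it ignores IAM, implicit permissions, grant options and column-level tags, and the database name my_lake_db is hypothetical.

```python
from dataclasses import dataclass, field

TAG_KEY = "DataSensitivity"


@dataclass
class Database:
    name: str
    tags: dict                                   # database-level tags, e.g. {TAG_KEY: "sensitive"}
    tables: dict = field(default_factory=dict)   # table name -> table-level tag overrides


def effective_table_tags(db: Database, table: str) -> dict:
    tags = dict(db.tags)            # start from the tags inherited from the database
    tags.update(db.tables[table])   # a table-level value for the same key overrides it
    return tags


def matches(tags: dict, expression: dict) -> bool:
    # Every key must match (AND); any one of the listed values is enough (OR).
    return all(tags.get(key) in values for key, values in expression.items())


def visible(db: Database, grants: list) -> tuple:
    """Return (can_describe_database, sorted visible tables) for one principal's grants."""
    can_describe = any(rtype == "DATABASE" and matches(db.tags, expr) for rtype, expr in grants)
    tables = sorted(
        name for name in db.tables
        if any(rtype == "TABLE" and matches(effective_table_tags(db, name), expr)
               for rtype, expr in grants)
    )
    return can_describe, tables


# Setup described above: database tagged sensitive, match overridden to internal.
DB = Database("my_lake_db", {TAG_KEY: "sensitive"}, {"balls": {}, "match": {TAG_KEY: "internal"}})
ANALYST = [("DATABASE", {TAG_KEY: {"sensitive"}}), ("TABLE", {TAG_KEY: {"sensitive"}})]
ENGINEER = [("DATABASE", {TAG_KEY: {"internal"}}), ("TABLE", {TAG_KEY: {"internal"}})]
```

In this model `visible(DB, ANALYST)` returns `(True, ['balls'])` and `visible(DB, ENGINEER)` returns `(False, ['match'])`, matching the two tests: the override makes match visible to the engineer while the database itself stays out of reach.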