Managing Azure permissions in a multi-team environment

There are a number of goals when it comes to good permissions management:

  • Minimize the number of privileged accounts (particularly those that can grant permissions)
  • Least privilege - security principals should start with no permissions and only gain the ones they need
  • Audit - we need to be able to understand which principals made which changes. This relates to number 10 on the OWASP Top 10 (2017 edition): insufficient logging and monitoring
  • Don’t kill innovation - permissions in production should be tighter than those in development environments. We don’t want to make it difficult for engineers to test and innovate when developing the product
  • Don’t kill automation - teams deploying their workloads via IaC (e.g. Terraform) shouldn’t be penalised by overzealous security measures

In a multi-team environment we need to make sure security doesn’t become a burden or bottleneck. Developer experience is important.

In this post I will discuss how permissions can be managed in Azure for a multi-team environment. I’m specifically thinking about the service principals that teams use to run their IaC, but the same principles apply to user accounts.

In summary

Assuming teams are deploying to the same subscription, and we have different subscriptions for production and development environments (ideally in different tenants), then resource groups are the security boundary we need to think about.

The following resources have to be managed centrally to achieve the goal of least privilege:

  • Resource groups
  • Azure AD objects (e.g. app registrations)
  • User-assigned identities (if permissions are required outside of the resource group)

Teams need to write their IaC (e.g. Terraform) in such a way that the above resources are created in development environments (where permissions are more open), and discovered in production environments (where a central team needs to control creation of those resources to maintain security boundaries).

In the future, ABAC might allow us to define permissions up front, meaning teams could also create resources in production, without sacrificing least privilege. See below for an example from AWS.

Alternative technology choices, such as Kubernetes, allow for more flexible permissions. However, in terms of teams creating Azure resources, we need projects like the Azure Service Operator to become generally available before this would be a complete option for most teams.

For further reading, Azure have published documentation around governance in multi-team environments that describes different ways of managing permissions - it’s quite detailed and has a lot of diagrams.

Example scenario

Let’s say we have a scenario where there’s one platform team deploying some shared concerns, and two application teams that deploy their own apps and depend on those shared concerns (e.g. a vNet). Each team requires a service principal to run their IaC under (for example, as part of their deployment pipeline).

In Azure, there are four levels at which we can scope RBAC permissions:

  • Management Group
  • Subscription
  • Resource Group
  • Resource

In a multi-team environment, each team needs to deploy their products to their own resource groups. This is because it’s the only level of scope where we can achieve our goal of least privilege without killing innovation and automation. Subscription is too high level - we don’t want individual teams creating whatever resources they like subscription wide. Resource is too low level - we don’t want teams having to request every new resource from a centralised team.

So the structure looks like this:

[Figure: structure of teams, products and resource groups]

Scoping permissions for resource groups

We can easily assign roles to our service principals at the resource group scope. For example:

az role assignment create --assignee 00000000-0000-0000-0000-000000000000 \
--role "Storage Account Key Operator Service Role" \
--scope /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-group

However, the resource group must exist for us to do this. If not, you will receive an error:

Resource group 'example-group' could not be found.

Assignable scopes in role definitions are also validated in the same way.

This creates a challenge for our aim of not killing automation. Teams running their own IaC are likely creating the resource group they need (especially in development environments). How can we allow teams to create their own resource groups, but also scope their permissions to that resource group?

Unfortunately, at least currently, we can’t. The only choice we have is to create those resource groups ahead of time. This presents a challenge for the teams. In some environments they will need to create their resource group, and in others they will need to discover the one that has been created for them.
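As a rough sketch of what "creating those resource groups ahead of time" could look like for the central team, the Terraform below pre-creates one resource group per team and scopes a role assignment to it. The variable names, resource group names and the choice of the Contributor role are my own assumptions for illustration, not something prescribed by Azure.

```hcl
provider "azurerm" {
  features {}
}

# Hypothetical input: resource group name => object ID of the service
# principal that the owning team's pipeline runs as.
variable "team_principals" {
  type = map(string)
  default = {
    "team1-rg" = "00000000-0000-0000-0000-000000000000"
    "team2-rg" = "00000000-0000-0000-0000-000000000000"
  }
}

variable "location" {
  type    = string
  default = "uksouth"
}

# Create each team's resource group centrally, before the team deploys.
resource "azurerm_resource_group" "team" {
  for_each = var.team_principals

  name     = each.key
  location = var.location
}

# Scope each team's permissions to its own resource group only.
# Contributor is illustrative; a narrower role may suit your teams better.
resource "azurerm_role_assignment" "team" {
  for_each = var.team_principals

  scope                = azurerm_resource_group.team[each.key].id
  role_definition_name = "Contributor"
  principal_id         = each.value
}
```

Because the role assignment references the resource group's ID, Terraform creates the group first, which sidesteps the "could not be found" error shown earlier.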

What about managed identities?

Managed identities provide a convenient way for developers to assign an identity to their app without having to worry about credentials.

These managed identities will likely require some permissions. If these permissions are for resources within the team’s resource group, we can grant Microsoft.Authorization/roleAssignments/write to the service principal the team are using, but scope it to the resource group.
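One possible way to express this in Terraform is a small custom role containing only that action, assigned at the resource group scope. This is a sketch under assumptions: the role name and both variables are hypothetical, and the configuration would be run by a central team with sufficient rights.

```hcl
# Hypothetical inputs supplied by the central team.
variable "resource_group_id" {
  type        = string
  description = "ID of the team's pre-created resource group"
}

variable "team_principal_id" {
  type        = string
  description = "Object ID of the team's service principal"
}

# Custom role that only allows writing role assignments.
resource "azurerm_role_definition" "assign_roles_in_group" {
  name  = "example-role-assigner" # hypothetical name
  scope = var.resource_group_id

  permissions {
    actions = ["Microsoft.Authorization/roleAssignments/write"]
  }

  assignable_scopes = [var.resource_group_id]
}

# Grant the custom role to the team's service principal, scoped to the
# resource group so it cannot assign roles anywhere else.
resource "azurerm_role_assignment" "team_can_assign" {
  scope              = var.resource_group_id
  role_definition_id = azurerm_role_definition.assign_roles_in_group.role_definition_resource_id
  principal_id       = var.team_principal_id
}
```

In practice I would expect the team's tooling to also need the matching read and delete actions in order to manage and remove assignments, but that is my assumption rather than something tested here.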

If the identity requires permissions on resources outside the resource group, we are in the same chicken-and-egg situation as the resource group. Teams could create their identity first, and then request permissions, but it’s no good for automation and developer experience.

A way around this is to create user-assigned identities centrally that teams can discover and use. For example, if team 1 need an identity for their app service, the user-assigned identity can be created centrally (like the resource group) and assigned appropriate permissions.
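A minimal sketch of that split follows, assuming hypothetical names (team1-app-identity, team1-rg) and the Reader role purely as a placeholder. The first fragment would live in the central team's configuration:

```hcl
# Central team: create the identity in the team's resource group.
resource "azurerm_user_assigned_identity" "team1_app" {
  name                = "team1-app-identity" # hypothetical
  resource_group_name = "team1-rg"           # hypothetical
  location            = "uksouth"
}

# Hypothetical ID of a resource outside the team's resource group.
variable "shared_resource_id" {
  type = string
}

# Grant the identity access to that outside resource.
resource "azurerm_role_assignment" "team1_app_shared" {
  scope                = var.shared_resource_id
  role_definition_name = "Reader" # placeholder role
  principal_id         = azurerm_user_assigned_identity.team1_app.principal_id
}
```

The second fragment would live in the team's own configuration, which only discovers the identity:

```hcl
# Team: look up the identity that was created for us.
data "azurerm_user_assigned_identity" "app" {
  name                = "team1-app-identity"
  resource_group_name = "team1-rg"
}

# The ID can then be attached to whatever compute resource needs it.
output "app_identity_id" {
  value = data.azurerm_user_assigned_identity.app.id
}
```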

What about other identity objects such as App Registrations?

A team might need to create Azure AD objects such as app registrations. This requires the microsoft.directory/applications/createAsOwner permission in Azure AD (note the important difference between this and the create permission - createAsOwner adds the creator as the first owner and counts the app against that principal’s created-object quota, which constrains the number of apps it can create and is probably a good thing).

We wouldn’t want to assign our principals something like the Application Administrator role as this allows them to do more than required - for example they could edit other app registrations they don’t own. The Application Developer role would work - this grants them createAsOwner, and as the owner they would also be able to manage the app registration.

A similar pattern would follow for other Azure AD objects. If you want to create custom roles, you will need an Azure AD Premium P1 or P2 licence.

There is an unfortunate limitation in the azuread provider for Terraform. At the time of writing it’s not possible to assign AD roles to security principals (users, groups or service principals) using the provider. It is being tracked by this GitHub issue.
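Creating the app registration itself is a different matter. Assuming the team's service principal already holds a role that grants createAsOwner, a sketch along these lines should let it create an app it owns. The display name is hypothetical, and attribute names can differ between azuread provider versions, so treat this as illustrative.

```hcl
# Details of the principal Terraform is currently running as.
data "azuread_client_config" "current" {}

# Register the app with the calling principal listed as its owner.
resource "azuread_application" "example" {
  display_name = "example-app" # hypothetical
  owners       = [data.azuread_client_config.current.object_id]
}
```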

Terraform module for create-or-discover resource groups

If teams are unable to create their own resource groups, they may need to adjust their IaC scripts to either create or discover resource groups. For example, they will likely create their own in the development subscription, but need to discover in production subscriptions.

Terraform allows us to create or discover resource groups. This pattern could potentially be used for other resources, too. My employer (at the time of writing) have made this solution open source, under the MIT licence.

The module outputs the same interface as the azurerm_resource_group resource, regardless of whether the resource group was created or discovered.
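I haven't reproduced the open-source module here, but a minimal sketch of the create-or-discover idea might look like the following. Only create_resource_group comes from this post; the other variable names and the set of outputs are my own assumptions.

```hcl
variable "create_resource_group" {
  type        = bool
  description = "true to create the resource group, false to discover it"
}

variable "name" {
  type = string
}

variable "location" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

# Created only when create_resource_group is true.
resource "azurerm_resource_group" "example" {
  count = var.create_resource_group ? 1 : 0

  name     = var.name
  location = var.location
  tags     = var.tags
}

# Looked up only when create_resource_group is false.
data "azurerm_resource_group" "example" {
  count = var.create_resource_group ? 0 : 1

  name = var.name
}

# Expose the same attributes whichever path was taken.
output "id" {
  value = var.create_resource_group ? azurerm_resource_group.example[0].id : data.azurerm_resource_group.example[0].id
}

output "name" {
  value = var.create_resource_group ? azurerm_resource_group.example[0].name : data.azurerm_resource_group.example[0].name
}

output "location" {
  value = var.create_resource_group ? azurerm_resource_group.example[0].location : data.azurerm_resource_group.example[0].location
}
```

Callers then depend only on the module's outputs, so the rest of their configuration is identical in every environment.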

Using a simple bit of Terraform that calls this module and creates a storage account, I set the create_resource_group parameter to the predicate !contains(["staging", "production"], var.environment) and run the following:

terraform plan -var environment=dev

I get a plan to create the resource group and storage account:

Terraform will perform the following actions:

  # azurerm_storage_account.example will be created
  + resource "azurerm_storage_account" "example" {
      ...
      + location            = "uksouth"
      + name                = "robblogtfdemodev"
      + resource_group_name = "example-rg-dev"
      ...
    }

  # module.resource_group.azurerm_resource_group.example[0] will be created
  + resource "azurerm_resource_group" "example" {
      + id       = (known after apply)
      + location = "uksouth"
      + name     = "example-rg-dev"
      + tags     = {
          + "environment" = "dev"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Running it for staging (where I have already created the resource group) only creates the storage account and discovers the resource group:

terraform plan -var environment=staging

Terraform will perform the following actions:

  # azurerm_storage_account.example will be created
  + resource "azurerm_storage_account" "example" {
      ...
      + location            = "uksouth"
      + name                = "robblogtfdemostaging"
      + resource_group_name = "example-rg-staging"
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
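For context, a root configuration that could produce plans like the two above might look roughly like this. The names and location are taken from the plan output. The module path, its name, location and tags inputs, and the storage account tier and replication settings are assumptions on my part.

```hcl
provider "azurerm" {
  features {}
}

variable "environment" {
  type = string
}

module "resource_group" {
  source = "./modules/resource_group" # hypothetical local path

  # Create in dev-like environments, discover in staging and production.
  create_resource_group = !contains(["staging", "production"], var.environment)

  name     = "example-rg-${var.environment}"
  location = "uksouth"
  tags = {
    environment = var.environment
  }
}

resource "azurerm_storage_account" "example" {
  name                     = "robblogtfdemo${var.environment}"
  resource_group_name      = module.resource_group.name
  location                 = module.resource_group.location
  account_tier             = "Standard" # assumed
  account_replication_type = "LRS"      # assumed
}
```

The storage account only refers to the module's outputs, which is why the same code works whether the resource group was created or discovered.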

Future solutions and alternatives

Attribute-Based Access Control (ABAC) is currently in preview and only supports some storage account operations. However, if it is extended to other resource types, it could enable us to define permissions up front and allow the teams to create and manage their resource groups when they’re ready. AWS supports this well - for example, we can create an IAM policy to allow someone to create an S3 bucket, but only with the name sometestbucketname123:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "s3:CreateBucket",
      "Resource": "arn:aws:s3:::sometestbucketname123"
    }
  ]
}
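For comparison, this is roughly what an Azure ABAC condition looks like today on a storage role assignment, expressed in Terraform. It only restricts blob reads to a named container, so it illustrates the condition format rather than the resource group scenario, which ABAC does not cover at the time of writing. The variables and container name are hypothetical, and because the feature is in preview the condition syntax may change.

```hcl
# Hypothetical inputs.
variable "storage_account_id" {
  type = string
}

variable "principal_id" {
  type = string
}

resource "azurerm_role_assignment" "blob_reader_one_container" {
  scope                = var.storage_account_id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = var.principal_id

  # Allow blob reads only in the container named below.
  condition_version = "2.0"
  condition         = <<-EOT
    (
      (
        !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
      )
      OR
      (
        @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals 'example-container'
      )
    )
  EOT
}
```

Note that the assignment can be written before the container exists, which is the "define permissions up front" property the AWS policy above demonstrates.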