Formal Specification for Authorization: Clarity Before Implementation
In large organizations, authorization can be a very complex topic. Almost always it is tightly constraint with strict compliance requirements, while at the same time it is intricately tied to the user experience. Who hasn’t had the experience of trying to access a resource they need for work only to be stumped by insufficient access, access that can sometimes take days or even weeks to be granted.
For engineering platform teams, that is an especially acute problem. As it is part of their mission, to provide a seamless experience and at the same time they need care for authorization of multiple offerings that multiples the pain of any friction.
In this blog post, I’ll demonstrate how to describe authorization requirements
using a formal specification language (TLA+). I’ll demonstrate the benefits in
clarity and specificity that allow for richer discussions in the organization
as well as the development new options that strengthen DevEx without compromising security.
I intentionally leave out specific tooling, to implement the authorization at this stage. Starting the discussion about specific tools very early can derail the discussion away from the need of your organization and towards the “features” and the “marketing” of the solution. Start looking into tool only after your requirements are clear. You don’t want to adapt your organizational process to the tool, you want the tool to make your organizational processes simple to use.
Why TLA+ for Authorization?
TLA+ is a high-level language for modeling programs and systems–especially concurrent and distributed ones.
Leslie Lamport
While authorization might not seem like a concurrent or distributed problem, it shares key characteristics:
- Multiple actors: Users, teams, systems all interact with authorization
- State changes: Permissions evolve as people join teams, projects change, or access is granted
- Safety properties: “User X should never access resource Y”
- Liveness properties: “User X should eventually get access to resource Y when they join team Z”
TLA+ excels at:
- Making implicit assumptions explicit
- Discovering edge cases you haven’t considered
- Validating that your design satisfies requirements
- Communicating precisely with both technical and non-technical stakeholders
The Example: Multi-Team, Multi-Environment Access Control
Organizational Context
In our fictional organization, the IT department has three teams:
dev_team_aanddev_team_b: Developing the flagship applicationoperations_team: Responsible for running the application in production
Technical Context
To support safe deployment, we maintain two environments:
integration: For testing and validationproduction: The live environment serving customers
Each environment runs in a separate AWS account. Users access an environment by being members of the AccessGroup IAM group in that account.
The Problem
Given this setup, we need to ensure:
Only members of the operations team can access the production environment.
This seems straightforward, but as we’ll see, formally specifying this requirement will reveal hidden assumptions and raise important questions about our authorization model.
With our problem defined, let’s model it formally. We’ll start with the organizational context.
Modeling the Organization (Constants)
Let’s start by specifying the organizational context in a formal manner with TLA+.
I’m going to defined the information about the organization as CONSTANT. As, in the scope of this specification
they are not going to be changing.
Below you see how to map complex values to CONSTANTS in TLA+. You can find more information about this syntax here
OrgPolicyEvaluation.cfg
1CONSTANTS
2 ApplicationEnvironments <- MyApplicationEnvironments
3 OrganizationalUnits <- MyOrganizationalUnits
4 OrganizationalUsers <- MyOrganizationalUsers
5 OrganizationalUnitMemberships <- MyOrganizationalUnitMemberships
OrgPolicyEvaluation.tla
1
2MyApplicationEnvironments == { "integration", "production" }
3
4MyOrganizationalUnits == {
5 "dev_team_a", "dev_team_b", "operations_team"
6}
7
8MyOrganizationalUsers == { "user1", "user2", "user3", "user4" }
9
10MyOrganizationalUnitMemberships == [ ou \in MyOrganizationalUnits |->
11 CASE ou = "dev_team_a" -> { "user1", "user2" }
12 [] ou = "dev_team_b" -> { "user3" }
13 [] ou = "operations_team" -> { "user4" }
14]
What I want to focus in this blog post in how TLA+ provides the tools to describe the organizational units, the users and also
the memberships, which user belongs to which team organizational units.
The membership is modeled using a function, that maps the name of the organizational unit to a set of its members.
Now that we’ve captured the organizational structure, let’s model the technical resources we’ll configure.
Modeling the Technical System (Variables)
The technical resource of the system, I prefer to model as variables as those are some of the resources that is one area with no hard constraints, those resources can be modified to achive the best solution.
1VARIABLES
2 awsGroups,
3 awsGroupMemberships
4
5vars == <<awsGroups, awsGroupMemberships>>
6
7AwsAccounts == ApplicationEnvironments
8
9AccessIamGroupName == "AccessGroup"
The only technical resources we know of are the awsGroups and the awsGroupMemberships.
The awsGroupMemberships maps users to awsGroups.
With our model in place, we can now formally express our requirements. As we do this, watch how the act of formalization reveals hidden assumptions.
Defining Requirements
Let’s start defining our goals and constraints,
Requirement 1: Only members of the operations team can access production
This was the only requirement explicitly discussed as part of the scope.
Here’s how you describe it in TLA+:
1OnlyMembersOfTheOperationsTeamCanAccessProduction ==
2 \/ AccessIamGroupName \notin awsGroups["production"]
3 \/ \A groupMember \in awsGroupMemberships["production"][AccessIamGroupName]:
4 groupMember \in OrganizationalUnitMemberships["operations_team"]
5]
After writing this requirement formally, it became obvious that there was also another requirement implied.
Requirement 2: All operations team members can access production
That requirement wasn’t never explicitly stated, but in the context of the organization it makes sense that this is how it should be. By making the implicit explicit you can have more meaningful conversations with the rest of the stakeholders and avoid miscommunications and nasty surpises later.
It also raise new questions to clarify:
- what are the steps to give new joiners of the
operations_teamaccess to theproductionenvironment? - Is the onboardning process seamless, is it fast?
- etc
1AllMembersOfOperationsTeamCanAccessProduction ==
2 /\ awsGroups["production"] # {}
3 /\ \A team_member \in OrganizationalUnitMemberships["operations_team"]:
4 team_member \in awsGroupMemberships["production"][AccessIamGroupName]
That requirement does not endanger the security of the system but its highly important for the operations_team to get the job done!
And as you start to think of requirements more carefully you see a third one that wasn’t discussed before.
Requirement 3: All team members of the three teams can access integration
1AllMembersOfAllTeamsCanAccessIntegration ==
2 /\ awsGroups["integration"] # {}
3 /\ \A ou \in OrganizationalUnits:
4 \A team_member \in OrganizationalUnitMemberships[ou]:
5 team_member \in awsGroupMemberships["integration"][AccessIamGroupName]
6]
Again stating the requirement explicitly you make sure to avoid misunderstandings and surprises later in the implementation phase.
This requirement also does not risk the security of the system, but rather the experience of the users and their ability to do their job. Raising similar discussion with Requirement 2.
Safety and Liveness
TLA+ allows to define Safety and Liveness properties.
Safety properties ensure that nothing bad ever happens.
Liveness properties that something good will eventually happen.
In our case, the requirement that only operation team members is a safety property. In TLA+ we write
OrgPolicyEvaluation.cfg
1INVARIANT OnlyMembersOfTheOperationsTeamCanAccessProduction
The requirements to ensure that all members of the operation team have access to production and that
all members of all teams have access to integration. Are the kind of requirements that they can be
violated for a period of time, for example it may take a while from when you onboard a new operations
team member until their access is granted. It is ok, for those two properties to be temporarily
violated so long as in the future they eventually hold again.
TLA+ can actually express and validate that kind of eventual properties:
OrgPolicyEvaluation.cfg
1PROPERTIES
2 EventuallyAllMembersOfAllTeamsCanAccessIntegration
3 EventuallyAllMembersOfOperationsTeamCanAccessProduction
OrgPolicyEvaluation.tla
1EventuallyAllMembersOfAllTeamsCanAccessIntegration == []<>AllMembersOfAllTeamsCanAccessIntegration
2EventuallyAllMembersOfOperationsTeamCanAccessProduction == []<>AllMembersOfOperationsTeamCanAccessProduction
Requirements alone don’t make a useful specification. We need to define the initial state and the possible actions that can change the system.
The Specification (Actions)
The specification, which consists of the initial conditions and the possible action, would depend strongly on how your organization looks like, as what constitutes as action can be very broad and organization depended. An action can be an HR rep, assigning a new members to a team in the HR software, it could be a manager accepting the assignment of a new member, or it could be a new joiner requesting access via a developer portal.
Below, I’ll draft some steps (actions) to demonstrate that formally describing the steps can also be beneficial.
Initial state
Initially, there are no AWS IAM Groups in environment and no memberships. It is up to the actions to create groups and add group memberships.
1Init ==
2 /\ awsGroups = [ s \in AwsAccounts |-> {} ]
3 /\ awsGroupMemberships = [ s \in AwsAccounts |-> [g \in awsGroups[s] |-> {}] ]
Actions
The following actions are possible:
- Create the AWS IAM access group in one of the environments.
- Assign an organizational user to the AWS IAM group in the integration environment, so that they can have access to that environment.
- Assign an organizational user to the AWS IAM group in the production environment, so that they can have access to that environment.
1CreateAccessIamGroup(account) ==
2 /\ awsGroups' = [awsGroups EXCEPT ![account] = @ \union {AccessIamGroupName}]
3 /\ awsGroupMemberships' = [awsGroupMemberships EXCEPT ![account] = @ @@ [ g \in {AccessIamGroupName} |-> {} ] ]
4
5AssignUserToIamGroupInIntegration(group, user) ==
6 /\ awsGroupMemberships' = [awsGroupMemberships EXCEPT !["integration"][group] = @ \union {user} ]
7 /\ UNCHANGED <<awsGroups>>
8
9AssignUserToIamGroupInProduction(group, user) ==
10 /\ awsGroupMemberships' = [awsGroupMemberships EXCEPT !["production"][group] = @ \union {user} ]
11 /\ UNCHANGED <<awsGroups>>
12
13Next ==
14 \/ \E account \in AwsAccounts:
15 CreateAccessIamGroup(account)
16 \/ \E group \in awsGroups["integration"]:
17 \E user \in OrganizationalUsers:
18 AssignUserToIamGroupInIntegration(group, user)
19 \/ \E group \in awsGroups["production"]:
20 \E user \in OrganizationalUsers:
21 AssignUserToIamGroupInProduction(group, user)
If you run the model, TLA+ will inform you that it violates the safety property OnlyMembersOfTheOperationsTeamCanAccessProduction.
It would also provide an example of the sequence of steps that caused the violation.
The violation is caused by assiging a memmber of one of the development teams in the production AccessGroup.
The action AssignUserToIamGroupInProduction, does not check that a user is a member of the operations_team before
assignment.
Let’s do an update:
1AssignUserToIamGroupInProduction(group, user) ==
2 \* Organizational measure required to ensure that only members of the operations team are added in production
3 /\ user \in OrganizationalUnitMemberships["operations_team"]
4 /\ awsGroupMemberships' = [awsGroupMemberships EXCEPT !["production"][group] = @ \union {user} ]
5 /\ UNCHANGED <<awsGroups>>
Adding a guard statement in the action AssignUserToIamGroupInProduction, makes the model validation pass!
And most importantly, as those steps represents actions your organization is taking. This is also the start of a discussion in your organization about how robust are your procedural and technical measures that gatekeep who can be assigned to the production AWS IAM Group.
💡 Key Insight: The guard condition reveals the need for either procedural or technical controls—exactly the kind of conversation formal specification enables.
What We’ve Gained
Clarity Through Explicitness
- Implicit → Explicit: Requirement 2 (all ops members can access prod) was never stated but is clearly necessary, as some requirement 3.
- Unambiguous timeliness: We distinguished safety (always true) from liveness (eventually true) properties
- Minimal complete set: Three requirements fully capture the authorization policy
Design Insights
- Implementation independence: The specification works regardless of whether you use AWS SSO, Terraform, custom tooling, or manual processes
- Gap identification: The guard condition in
AssignUserToIamGroupInProductionreveals the need for either procedural or technical controls - Discussion enabler: You can have precise discussions with non-techninal stakeholder without getting bogged down on technical details
Actionable Outcomes
You can now:
- Evaluate existing tools against these precise requirements
- Discuss trade-offs with PM, compliance, and security teams using shared vocabulary
- Design implementation strategies that provably satisfy the specification
- Leave behind unambiguous documentation that won’t be misinterpreted
Conclusion
In this post, we’ve seen how formal specification transforms vague authorization requirements into precise, verifiable constraints.
Key takeaways:
- Formal methods aren’t just for aerospace—they’re valuable wherever precision matters
- The process of specification is as valuable as the artifact (it forces clarity)
- Starting with requirements, not tools, keeps you focused on organizational needs
- TLA+ bridges technical and non-technical conversations with unambiguous language
What’s next? In my next post, I’ll demonstrate how to bridge the gap from TLA+ specification to AWS implementation, showing how to maintain traceability from requirements through deployed code.
For platform teams caught between security and DevEx, formal specification isn’t overhead—it’s how you escape the false choice and achieve both.
Resources
Resources: