TDD kata with serverless services in AWS

November 3, 2020

Last week, I re-read “Test Driven Development: By Example” by Kent Beck. I was amazed by the simplicity of his process, which consists of small, pragmatic steps. So, I decided to put the process to the test in an unfamiliar domain.

The kata

In this kata, I am going to develop a serverless service in AWS using a Lambda and the API Gateway. I chose this task for two reasons. First, it contains a fair amount of infrastructure code, which is considered hard to test. Second, I wanted a task that was more abstract and closer to a business requirement than a purely technical one like ‘deploy an AWS HTTP API Gateway’.

My goal is to understand whether TDD for infrastructure is possible and what the trade-offs are.

The TDD process

I am going to follow the TDD process as described in the book as closely as possible, working through the Red-Green-Refactor cycle.

Red: Start by writing a failing test. It may not even compile at that time.
Green: Make the test pass, committing any sin necessary in the process.
Refactor: Eliminate any duplication introduced to make the tests green.

One important piece of the process is having a To-Do list. It is going to help me keep track of what is left to do and help me discover what to work on next.

To-Do List
==========
* Publish sales API

What is new for me is an appreciation of TDD as a process to manage my uncertainty and fear, rather than a process to write tests. You can read more about this in my previous blog post, here.

Tech stack

Being cognizant of the uncertainty, I decided to use familiar tools as much as possible.

Terraform for infrastructure provisioning.
JavaScript for the Lambda.
Jest as a test runner.

There are probably better tools than terraform when it comes to testability. What I want to demonstrate is that the TDD process is independent of the tools you have to work with.

There are also aspects of this task that I am not familiar with:

Using the API Gateway
Using TDD in an infrastructure-heavy task

I will have to be careful to tackle them in small increments, so that I don’t get overwhelmed.

Putting Terraform under test harness

‘Publish sales API’ is a very big task to do in one step. So, let’s look for a more achievable intermediate step to start with. If I can have a test which applies a minimal terraform configuration, I will at least know that I can put terraform under test harness, which is a prerequisite to the TDD cycle.

To-Do List
==========
* Publish sales API
* Run terraform in test

Test snippet

The test uses the simplest approach I can think of to drive terraform. It does what I would do in the command line.

const {execSync} = require('child_process');
const https = require('https');
const AWS = require('aws-sdk')

describe('serverless', () => {
  test('run terraform', () => {
    const modulePath = './src';
    execSync('terraform init', {cwd: modulePath});
    const applyResp = execSync('terraform apply -auto-approve', {cwd: modulePath});
    expect(applyResp.toString()).toContain('Apply complete!')
  })
})

Code snippet

provider "aws" {
  region = "eu-central-1"
}

Writing this test was hard work but making it green required only 3 lines of boilerplate code! Still, it is an important milestone. It demonstrates that it is possible to use jest to drive terraform to act and then also assert on the outcome of the operation.
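
The assertion above only checks for the 'Apply complete!' marker. If a test also needs to know what actually changed, the summary line can be parsed into counts. The following is a minimal sketch, not part of the original kata: parseApplySummary is a hypothetical helper name, and it assumes the summary wording "Apply complete! Resources: N added, N changed, N destroyed.", which may differ between terraform versions.

```javascript
// Hypothetical helper: extracts the resource counts from the text printed by
// `terraform apply`. Assumes a summary line of the form
// "Apply complete! Resources: 1 added, 0 changed, 0 destroyed."
function parseApplySummary(output) {
  const pattern = /Apply complete! Resources: (\d+) added, (\d+) changed, (\d+) destroyed/;
  const match = pattern.exec(output);
  if (!match) {
    return null; // the apply failed, or the wording is different in this version
  }
  // match[0] is the whole matched text, so skip it and convert the three groups.
  const [, added, changed, destroyed] = match.map(Number);
  return { added, changed, destroyed };
}
```

In the test, this could be used as expect(parseApplySummary(applyResp.toString())).not.toBeNull(), with further assertions on the individual counts where they matter.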

To-Do List
==========
* Publish sales API
☑️ Run terraform in test

Deploying the Lambda

Even with terraform under test harness, deploying the sales API in one step is still too big a task. What I find especially challenging is doing all the infrastructure automation in one go, particularly since I am unfamiliar with the API Gateway.

I think a smaller task, one that I feel comfortable undertaking, is to deploy the AWS Lambda with my application code and make sure I can invoke it using the aws-sdk. Let’s update the To-Do List with our next steps.

To-Do List
==========
* Publish sales API
☑️ Run terraform in test
* Deploy the sales Lambda

Test snippet

I refactored the test code from before and extracted a beforeAll block, where the terraform-related code now lives.

The test itself uses the aws-sdk to invoke a Lambda function and asserts that the function returned the expected values. The name of the function comes from terraform.

const {execSync} = require('child_process');
const https = require('https');
const AWS = require('aws-sdk')

describe('lambda', () => {
    jest.setTimeout(20000)
    let lambda_name;

    beforeAll(() => {
        const modulePath = './src';
        execSync('terraform init', {cwd: modulePath});
        const applyResp = execSync('terraform apply -auto-approve', {cwd: modulePath});
        expect(applyResp.toString()).toContain('Apply complete!')

        const resp = JSON.parse(execSync('terraform output -json', {cwd: modulePath}));
        lambda_name = resp.lambda_name.value
    })

    test('have a lambda', async () => {
        const lambda = new AWS.Lambda({apiVersion: '2015-03-31', region: 'eu-central-1'});
        const resp = await lambda.invoke({
            FunctionName: lambda_name,
        }).promise()

        expect(resp.StatusCode).toBe(200)
        expect(resp.Payload).toContain('{ sales: [] }')
    })
})

Code snippet

data "archive_file" "example" {
  type = "zip"
  source_file = "${path.module}/example/index.js"
  output_path = "${path.module}/files/example.zip"
}

resource "aws_lambda_function" "example" {
  function_name = "serverless_example"
  handler = "index.handler"
  role = aws_iam_role.lambda_exec.arn
  runtime = "nodejs12.x"

  filename = data.archive_file.example.output_path
  source_code_hash = filebase64sha256(data.archive_file.example.output_path)
  reserved_concurrent_executions = 1
  timeout = 10
  publish = true
}

data "aws_iam_policy_document" "lambda_exec" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      identifiers = ["lambda.amazonaws.com"]
      type = "Service"
    }
    effect = "Allow"
  }
}

resource "aws_iam_role" "lambda_exec" {
  name = "serverless_example_lambda"
  assume_role_policy = data.aws_iam_policy_document.lambda_exec.json
}

output "lambda_name" {
  value = aws_lambda_function.example.function_name
}

There was a bit more code I had to write to make this test green. Fortunately, I was able to use jest and work through the failures one by one until my Lambda was properly deployed.

I had to make the lambda_name an output of terraform to have it available in the test.
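
For reference, terraform output -json wraps each output in an object with a value field (next to sensitive and type), which is why the test reads resp.lambda_name.value. A small helper could unwrap those values once, so the tests work with plain names. This is only a sketch: readOutputs is a hypothetical name, and the exec parameter is an assumption added so the helper can be tried without running terraform.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper: runs `terraform output -json` in modulePath and
// returns a plain { outputName: value } object.
function readOutputs(modulePath, exec = execSync) {
  const raw = exec('terraform output -json', { cwd: modulePath }).toString();
  const outputs = JSON.parse(raw);
  const values = {};
  for (const [name, output] of Object.entries(outputs)) {
    values[name] = output.value; // drop the `sensitive` and `type` metadata
  }
  return values;
}
```

With it, the beforeAll block could set lambda_name = readOutputs('./src').lambda_name.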

I don’t provide the JS code of the Lambda. I don’t think there is any educational value in it.

To-Do List
==========
* Publish sales API
☑️ Run terraform in test
☑️ Deploy the sales Lambda

Publishing the sales API

Now, I think I can go back and tackle the original task.

To-Do List
==========
* Publish sales API
☑️ Run terraform in test
☑️ Deploy the sales Lambda

Test snippet

const {execSync} = require('child_process');
const https = require('https');
const AWS = require('aws-sdk')

describe('simple http api', () => {
    jest.setTimeout(20000)
    let tf_output = {};

    beforeAll(() => {
        execSync('terraform init', {cwd: './src'});
        const applyResp = execSync('terraform apply -auto-approve', {cwd: './src'});
        expect(applyResp.toString()).toContain('Apply complete!')
        tf_output = JSON.parse(execSync('terraform output -json', {cwd: './src'}));
    })

    test('API', async () => {
        const apigatewayv2 = new AWS.ApiGatewayV2({apiVersion: '2018-11-29', region: 'eu-central-1'});
        const resp = await apigatewayv2.getApi({
            ApiId: tf_output.simple_http_api.value.id
        }).promise();

        expect(resp.ApiEndpoint).toBeTruthy()
        expect(resp.ApiId).toEqual(tf_output.simple_http_api.value.id)
    })

    test('Get response from API', (done) => {
        const req = https.request(
            tf_output.simple_http_api.value.api_endpoint,
            (res) => {
                let data = ''
                res.setEncoding('utf8');
                res.on('data', (chunk) => {
                    data += chunk
                });
                res.on('end', () => {
                    expect(res.statusCode).toBe(200)
                    expect(data).toContain('{sales: []}')
                    done()
                });
            });
        req.on('error', (e) => {
            console.error(e);
            done(e)
        });
        req.end();
    })
})

These two tests demonstrate two different approaches to writing assertions.

The first one uses the aws-sdk to inspect whether the necessary resource has been created.
The second one uses a completely outside-in approach without any knowledge of the infrastructure. It makes an HTTP request to the endpoint, demonstrating that our API is published and working.

Both tests depend on output from terraform.

Code snippet

resource "aws_apigatewayv2_api" "example" {
  name = "simple_http_example"
  protocol_type = "HTTP"
  target = aws_lambda_function.example.arn
}

resource "aws_lambda_permission" "apigw" {
  action = "lambda:InvokeFunction"
  function_name = aws_lambda_function.example.function_name
  principal = "apigateway.amazonaws.com"
  source_arn = "${aws_apigatewayv2_api.example.execution_arn}/*/*"
}

output "simple_http_api" {
  value = aws_apigatewayv2_api.example
}

With that, our main task is done!

To-Do List
==========
☑️ Publish sales API
☑️ Run terraform in test
☑️ Deploy the sales Lambda

Conclusion

All in all, I wrote 3 tests which take around 15 seconds to run including the terraform apply.

This is about one order of magnitude slower than what I am used to, when I write tests for classic applications. Still, it is one of the fastest feedback cycles I’ve experienced doing infrastructure.

I hope I have demonstrated that TDD is a viable approach to developing infrastructure code. Of course, dedicated tooling makes it easier to write tests. However, you can use simple tools to start today and reap the benefits of TDD.

Practical concerns

You will need an AWS account to apply the resources to. Either use one account per developer, or tweak the resources' names so that the resources of different developers do not conflict. Alternatively, tools like localstack may help.
You may want an afterAll hook that destroys the resources at the end.
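
Such a teardown hook could look like the sketch below. It assumes the same ./src module path as the tests above. destroyInfrastructure is a hypothetical helper name, and the injectable exec parameter is an assumption added so the function can be tried without touching a real AWS account.

```javascript
const { execSync } = require('child_process');

// Hypothetical helper: removes everything the test run provisioned by
// running `terraform destroy -auto-approve` in the given module directory.
function destroyInfrastructure(modulePath, exec = execSync) {
  return exec('terraform destroy -auto-approve', { cwd: modulePath }).toString();
}

// Usage inside a describe block (jest provides afterAll as a global):
//   afterAll(() => {
//     destroyInfrastructure('./src');
//   });
```

Keep in mind that destroying everything after each run means the next terraform apply has to create all resources again, so the feedback cycle will likely get slower.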

Lessons learned

Tests force you to write testable code. If I compare the terraform code I wrote for this kata with the code I usually write, I see a few differences:

I definitely used more terraform outputs than I would otherwise.
Terraform apply can get quite slow depending on the number and type of resources to provision. To keep the tests running fast, I would be forced to decompose my terraform code into smaller, independently deployable components sooner than I otherwise would.
Testability should be one of your major considerations when designing and building software systems. Be careful with technologies that do not make it easy for you. When it comes to serverless, make sure to weigh the potential benefits against the difficulty of testing.