Working with CloudWatch Data

This lesson references the scripts in the aws-connectedcar-common repository. If you're new to this course, see the introduction for information about setting up your workstation and getting the sample code.

The previous lesson outlined how CloudWatch is built on three core services. Let's now dig a little deeper into working with each of them, showing how each one solves a specific observability problem.

Using CloudWatch Logs

CloudWatch logs are generally the first place you look when debugging a problem in your solution. For the sample code in this course, there are log entries for each of the three APIs, as well as for each of the deployed Lambdas. Let's start by looking at some sample API Gateway logs for the Admin API:

First of all, note the data hierarchy for the logs. At the top of the hierarchy are log groups, which are persistent resources with names that relate to the services that produce the logs. For API Gateway, unfortunately, these names incorporate the generated API ID rather than the friendlier, user-defined resource names. Within each log group is a series of transient, sharded log streams, and within these streams are the individual log entries.
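To make this hierarchy concrete, here's a minimal sketch of how you might walk it with the AWS CLI. It assumes API Gateway execution logging, which names its log groups after the generated API ID and stage; the API ID and stage in the second command are placeholders for illustration:

# List the API Gateway execution log groups (named after the API ID and stage)
aws logs describe-log-groups \
    --log-group-name-prefix API-Gateway-Execution-Logs \
    --query "logGroups[].logGroupName"

# List the most recently active log streams within one of those groups
# (the API ID "abc123defg" and stage "Dev" are hypothetical placeholders)
aws logs describe-log-streams \
    --log-group-name "API-Gateway-Execution-Logs_abc123defg/Dev" \
    --order-by LastEventTime \
    --descending \
    --max-items 5 \
    --query "logStreams[].logStreamName"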

Working from the console, you can quickly drill down to these individual log entries. You can also run text searches that span all the streams within a log group, which works well when you have a fairly specific string to search for. In the example below, we're searching for the "request body before transformation" string, which matches the log entries that record incoming API request bodies before they're transformed into the Lambda invocation format:

You can also search for log entries from the command line, where you can apply more advanced query criteria. We'll go into more detail about using the command line for logs in the labs. For now, here's an example script that performs the same search shown above, using some of the CloudFormation stack-querying tricks we covered in the previous section:
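The script itself lives in the repository, but as a rough sketch of the shape it takes, the search could look something like the following. The stack name, output key, and stage here are hypothetical stand-ins rather than the actual names used in the sample code:

# Look up the generated API ID from the CloudFormation stack outputs
# (stack name "connectedcar-admin" and output key "AdminApiId" are placeholders)
API_ID=$(aws cloudformation describe-stacks \
    --stack-name connectedcar-admin \
    --query "Stacks[0].Outputs[?OutputKey=='AdminApiId'].OutputValue" \
    --output text)

# Search every stream in the execution log group for the phrase
aws logs filter-log-events \
    --log-group-name "API-Gateway-Execution-Logs_${API_ID}/Dev" \
    --filter-pattern '"request body before transformation"' \
    --query "events[].message" \
    --output text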

Here’s an example of what the output from running this script looks like in the VS Code terminal. Obviously the format is different, but you can get the same log information this way as you would from the console:

Using X-Ray Traces

CloudWatch logs are a good source of information when you already know which service is causing a problem. X-Ray traces are more useful when all you have is an error at the top of your stack and you need to follow the flow of execution to find the point where it was triggered.

To illustrate, let’s first look at a trace for a successful API call. The trace shown below displays the flow of execution from the API, through a Lambda, and into the DynamoDB service:

Here's a trace for the same API call, this time resulting in a 400 "Bad Request" HTTP response. You can see from this trace that the Lambda never called the downstream DynamoDB service, suggesting that an exception occurred in the Lambda before that call was made. The trace thus helps guide you to the source of the problem, at which point you can use the logs to uncover more information:

Looking back at the first, successful trace, you can also see how it profiles the performance of each step in the executing service stack. When you hit unexpected performance bottlenecks, this profiling, just like the error view, can help guide you to the source of the problem.
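Although the lesson works with traces in the console, they can also be queried from the command line. Here's a minimal sketch (not one of the repository scripts) that pulls summaries for traces from the last hour that recorded a 400 error:

# Window: the last hour, expressed as epoch seconds
END=$(date +%s)
START=$((END - 3600))

# List error traces, returning the trace ID, duration, and request URL
aws xray get-trace-summaries \
    --start-time "$START" \
    --end-time "$END" \
    --filter-expression 'error = true AND http.status = 400' \
    --query "TraceSummaries[].{Id:Id,Duration:Duration,Url:Http.HttpURL}" \
    --output table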

Using CloudWatch Metrics

The CloudWatch metrics service has a lot of potential uses, but two big ones are visualizing performance patterns for a service over time and capturing usage information for resources. The graph shown below is an example of the former: it visualizes the cold-start latency for an API call, produced by constructing an ad-hoc query in the console and graphing the result:
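The graph in the lesson is built interactively in the console, but you can pull comparable numbers from the command line. The sketch below retrieves the maximum Duration for a single Lambda over a fixed window; it's a simplification that doesn't isolate cold starts specifically, and the function name and timestamps are placeholders:

# Maximum Lambda duration over a one-hour window, in five-minute buckets
# (function name "ConnectedCar_CreateDealer" and the window are placeholders)
aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name Duration \
    --dimensions Name=FunctionName,Value=ConnectedCar_CreateDealer \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-01T01:00:00Z \
    --period 300 \
    --statistics Maximum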

For a fuller example of what you can do with CloudWatch data from the command line, the script shown below captures usage metrics for all the DynamoDB tables and indexes in the sample code:
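Again, the actual script is in the repository; as a rough sketch of the approach, something like the following loops over the account's tables and sums their consumed capacity for a fixed window. The timestamps are placeholders, and indexes would additionally need the GlobalSecondaryIndexName dimension:

# Sum the consumed read and write capacity for every table over a one-hour window
for TABLE in $(aws dynamodb list-tables --query "TableNames[]" --output text); do
    echo "Table: ${TABLE}"
    for METRIC in ConsumedReadCapacityUnits ConsumedWriteCapacityUnits; do
        aws cloudwatch get-metric-statistics \
            --namespace AWS/DynamoDB \
            --metric-name "$METRIC" \
            --dimensions Name=TableName,Value="$TABLE" \
            --start-time 2024-01-01T00:00:00Z \
            --end-time 2024-01-01T01:00:00Z \
            --period 3600 \
            --statistics Sum \
            --query "Datapoints[].Sum" \
            --output text
    done
done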

We'll be working with this particular script in the DynamoDB labs later in the course. For now, here's an example of its output following a test simulation, which demonstrates how you can extrapolate likely production costs from the resource usage of a given test scenario: