An independent guide to building modern software for serverless and native cloud

Instrumenting Services with CloudWatch Metrics

This lab references the scripts in the aws-connectedcar-common repository. If you're new to this course, see the introduction for information about setting up your workstation and getting the sample code.

CloudWatch logs and traces are both about understanding what’s happening at the level of individual service operations. CloudWatch metrics, on the other hand, are about getting aggregate values for many instrumented events across time. These aggregated values provide insight into the health and performance of your services, as well as the resources they’re consuming.

To give you an idea of how you can make use of CloudWatch metrics, this lab will start by showing you how to use Postman to generate a sustained flow of sample data. It will then step through using the query and graphing tools in the console followed by a demonstration of some command line techniques.

Generating Sample Metrics Data

Because metrics data is aggregated across time, demonstrating its use means generating some test activity over a sustained period as well. To do this, we're going to use the Flows feature in Postman. Unfortunately, Flows can’t be saved outside of a workspace, so you won’t see anything to import from the “aws-connectedcar-common” repository. Instead, we’ll go through the steps here to build a simple flow.

Step 1: Build the Postman Flow to send a series of requests

In Postman, start by clicking the “Flows” menu on the left. Then either click the “Create Flow” link in the left-hand panel or click the plus icon at the top left. This should open a new tab in Postman for the new flow. Give this flow a suitable name, such as “Generate Metrics Data Flow”, as shown below:

Next, click the "+ Block" button at the bottom and add four blocks to the canvas: “Number”, “Repeat”, “Delay”, and “Send Request”. For each block, you can enter the name in the search field, as shown below, and click the result displayed in the panel below, and then click again to place the block on the canvas.

Now, arrange these blocks in sequence and connect them by dragging the mouse from the output nub on the right of one block to the input nub on the left of the next block. You want to connect the blocks so that they look like the example below.

Lastly, set the properties inside the blocks as follows:

  • Number: set to 600
  • Delay: set to 1000 milliseconds
  • Send Request: Select the Get Dealers request

At this point, your flow should look like this:

Step 2: Run the Postman Flow to generate metrics test data

Once this flow is built, you can click the orange “Run” button at the bottom, and it should then execute for roughly the next ten minutes (600 requests at one per second). Once complete, you can open the console in Postman, and you should see 600 successful requests interleaved with 600 warnings about setting global variables from within a flow (which, in this case, you can safely ignore):
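If you'd rather script this than build it in Flows, the same number, repeat, delay, and send sequence can be approximated in a few lines of Python. This is only a sketch: `run_flow`, `send_request`, and `expected_minutes` are illustrative names that aren't part of the course code, and pausing before each request is an assumption about how the blocks execute.

```python
import time
from typing import Callable, List


def run_flow(send_request: Callable[[], int],
             count: int = 600,
             delay_ms: int = 1000,
             sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Call send_request `count` times, pausing delay_ms before each call.

    send_request is any zero-argument function that performs one request
    (for example, the Get Dealers call) and returns its HTTP status code.
    """
    statuses = []
    for _ in range(count):
        sleep(delay_ms / 1000)          # Delay block: milliseconds to seconds
        statuses.append(send_request())  # Send Request block
    return statuses


def expected_minutes(count: int = 600, delay_ms: int = 1000) -> float:
    """Minimum run time in minutes, ignoring the time each request takes."""
    return count * delay_ms / 1000 / 60
```

With the values used in this lab, `expected_minutes()` works out to ten minutes. To send real traffic, you would pass in a function that calls your own deployed API and returns the response status.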

Querying Metrics Data in the Console

Now you can go to the AWS console to start querying this newly generated sample data.

Step 3: Define a metrics query in the console

To get there, navigate in the console to the CloudWatch service, expand the “Metrics” menu on the left, click on the "All metrics" option, and select the "Query" tab. Then make entries in the query builder on the lower half of the page, as shown below:

  • Namespace: AWS/ApiGateway
  • Metric name: AVG(Latency)
  • Filter by: ApiName = ConnectedCar_Admin_API_Dev and Stage = api

The query builder should now look like this:
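For reference, the same selections can also be written out as text. The sketch below assembles what should be the equivalent CloudWatch Metrics Insights query, the SQL-like form used by the Query tab's editor view. `build_metrics_insights_query` is a hypothetical helper, not part of the sample code.

```python
from typing import Dict


def build_metrics_insights_query(namespace: str,
                                 metric: str,
                                 statistic: str,
                                 filters: Dict[str, str]) -> str:
    """Assemble a Metrics Insights query string from builder-style inputs.

    filters maps dimension names to the values they must equal. The same
    dimension names are listed in SCHEMA(...) so that only metrics published
    with exactly those dimensions are matched.
    """
    dimensions = ", ".join(filters)
    conditions = " AND ".join(
        f"{name} = '{value}'" for name, value in filters.items()
    )
    return (f'SELECT {statistic}({metric}) '
            f'FROM SCHEMA("{namespace}", {dimensions}) '
            f'WHERE {conditions}')


query = build_metrics_insights_query(
    "AWS/ApiGateway", "Latency", "AVG",
    {"ApiName": "ConnectedCar_Admin_API_Dev", "Stage": "api"},
)
print(query)
```

Pasting the printed text into the editor view should give the same graph as filling in the builder fields by hand.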

Step 4: Graph the query results in the console

Once the criteria are entered, run the query and then select the "Graphed metrics" tab. At the top of the page, click the "Custom" option, and select a time span that includes the ten minutes of data generated above. Also at the top, select the "Stacked area" graph type. As shown below, you should also set the sampling period to a value that's short enough to show detail in the graph but long enough that each period still contains some data. Our generated data has entries once every second, so we're using a 30 second sampling period. Here’s what these settings and graph output should look like:
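To make the effect of the sampling period more concrete, here's a small sketch that averages per-request latency samples into fixed-width periods. It only illustrates the idea of aggregating across time. It is not how CloudWatch computes its statistics internally, and `average_by_period` is a made-up name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_by_period(samples: Iterable[Tuple[int, float]],
                      period_s: int) -> Dict[int, float]:
    """Average (timestamp_seconds, value) samples into period_s-wide buckets.

    Returns a dict keyed by each bucket's start time, in ascending order.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % period_s
        buckets[bucket_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}
```

With one sample per second for 600 seconds, a 30 second period yields 20 buckets of 30 samples each. A shorter period gives more points on the graph but fewer samples behind each one.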

Here are some additional notes about how you can use metrics to instrument your services.

Filtering

Metrics queries like this can have additional filters applied. In this case, the results for API Gateway can be filtered by one or more dimensions, such as API name, method, or resource path.

Alarms

You can use a metrics query as the basis for an alarm. You might, for example, query for the count of 4XX errors for a specific resource and method of an API. You could then define a threshold as a condition to trigger an alarm that sends an email notification to a support team.
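As a rough sketch of what such an alarm definition might contain, the function below builds the keyword arguments for the CloudWatch `PutMetricAlarm` operation (`put_metric_alarm` in boto3). The alarm name, threshold, resource path, and SNS topic ARN are all placeholders you would supply. The email itself would come from a subscription on that SNS topic. Per-resource and per-method dimensions assume detailed metrics are enabled on the API stage.

```python
from typing import Any, Dict


def build_4xx_alarm(alarm_name: str, api_name: str, stage: str,
                    resource: str, method: str,
                    threshold: float, topic_arn: str) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments for a count of 4XX errors."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApiGateway",
        "MetricName": "4XXError",
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Resource", "Value": resource},
            {"Name": "Method", "Value": method},
            {"Name": "Stage", "Value": stage},
        ],
        "Statistic": "Sum",              # count of errors in each period
        "Period": 60,                    # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an error
        "AlarmActions": [topic_arn],     # SNS topic that notifies the team
    }
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.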

Dashboards

You can also take any graph that you define, such as the one you created above, and add it to a custom dashboard. This can provide a visual indication of the health of your service, using metrics tailored to your needs.
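Dashboards can also be defined as JSON rather than by clicking through the console. As a minimal sketch, the function below builds a dashboard body with a single widget for the latency graph. The widget size, title, and region are assumptions, and `build_latency_dashboard` is a hypothetical helper.

```python
import json


def build_latency_dashboard(api_name: str, stage: str, region: str,
                            period_s: int = 60) -> str:
    """Return a dashboard body (JSON text) with one average-latency widget."""
    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,  # position on the grid
        "properties": {
            "title": f"{api_name} average latency",
            "view": "timeSeries",
            "stacked": True,          # stacked area, as in the console graph
            "region": region,
            "stat": "Average",
            "period": period_s,
            "metrics": [
                ["AWS/ApiGateway", "Latency",
                 "ApiName", api_name, "Stage", stage],
            ],
        },
    }
    return json.dumps({"widgets": [widget]})
```

The returned string is what the `PutDashboard` operation expects as its dashboard body, for example `put_dashboard(DashboardName=..., DashboardBody=body)` in boto3.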

Querying Metrics Data from the Command Line

Take a look at the get-api-metrics.zsh script that’s in the /scripts/cloudwatch/zsh folder. This script uses the CloudWatch “get-metric-statistics” command, and the arguments used exactly mirror those used for the query we just ran in the console:
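The script in the repository is the authoritative version. As a rough stand-in, here's a sketch of how the arguments for a `get-metric-statistics` call covering the same namespace, metric, and dimensions could be assembled. The function name is made up, and the time range and period are left as parameters because the script's exact values aren't reproduced here.

```python
import shlex
from datetime import datetime, timezone
from typing import List


def build_get_metric_statistics(api_name: str, stage: str,
                                start: datetime, end: datetime,
                                period_s: int,
                                output: str = "table") -> List[str]:
    """Build the AWS CLI argument list for average API Gateway latency.

    start and end must be timezone-aware datetimes; they are converted
    to UTC and formatted as ISO 8601 timestamps.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return [
        "aws", "cloudwatch", "get-metric-statistics",
        "--namespace", "AWS/ApiGateway",
        "--metric-name", "Latency",
        "--dimensions",
        f"Name=ApiName,Value={api_name}",
        f"Name=Stage,Value={stage}",
        "--statistics", "Average",
        "--start-time", start.astimezone(timezone.utc).strftime(fmt),
        "--end-time", end.astimezone(timezone.utc).strftime(fmt),
        "--period", str(period_s),
        "--output", output,
    ]
```

Calling `shlex.join(...)` on the returned list gives a command line you can paste into a terminal, or you can hand the list directly to `subprocess.run`.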

Step 5: Run the get-api-metrics.zsh script in the terminal

Run this script, and you should see results in tabular form, as shown below. Note that the datapoints are not sorted by timestamp, so they can appear in what looks like random order:
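If you need the datapoints in chronological order, one option is to sort them yourself after fetching them as JSON. Here's a minimal sketch that assumes each datapoint carries an ISO 8601 `Timestamp` string, as in the CLI's JSON output. The helper names are illustrative.

```python
from datetime import datetime
from typing import Any, Dict, List


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def sort_datapoints(datapoints: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the datapoints ordered from oldest to newest."""
    return sorted(datapoints, key=lambda dp: parse_timestamp(dp["Timestamp"]))
```

You would pass in the `Datapoints` list from the parsed JSON response.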

This format can be used, with a bit of tweaking, if you want to import the metrics into a spreadsheet. As we’ve seen before, you can also use the default JSON output simply by omitting the --output argument, generating results like this:
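The JSON form is also a convenient starting point for a spreadsheet import. As a sketch, the function below converts a response containing a `Datapoints` list into CSV text. It assumes each datapoint has `Timestamp`, `Unit`, and the requested statistic (`Average` here) as keys, and `datapoints_to_csv` is a hypothetical name.

```python
import csv
import io
import json


def datapoints_to_csv(cli_json: str, statistic: str = "Average") -> str:
    """Convert get-metric-statistics JSON output into CSV text.

    Rows are written in the order the datapoints appear in the input.
    """
    data = json.loads(cli_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Timestamp", statistic, "Unit"])
    for dp in data["Datapoints"]:
        writer.writerow([dp["Timestamp"], dp[statistic], dp["Unit"]])
    return buffer.getvalue()
```

You could save the script's JSON output to a file, read it back as text, and write the returned string to a `.csv` file that a spreadsheet can open directly.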