Understanding CloudWatch
One of the challenges you face when starting to run serverless solutions in the cloud is the loss of the server-centric tools you previously relied on to log, instrument, and profile your code. With serverless, your solution is spread across an array of managed services, and you need a way to get the same visibility into what’s happening that you had on traditional servers.
This is where CloudWatch comes in. CloudWatch is the native AWS service that collects logging, metrics, and trace data from all the services in your solution, and then makes this data available through a range of console-based and command-line tools. Like CloudFormation, this is a service that’s essential to master when moving to the cloud.
Distinguishing Between CloudWatch & CloudTrail
Before going further, we should distinguish CloudWatch from the similarly named CloudTrail, which is a different tool built for a different purpose. CloudTrail records the control-plane API calls made against your AWS account, such as those that create, modify, or delete resources, whether they’re made through the REST APIs, the command-line tools, or the console. CloudWatch, on the other hand, captures run-time data from AWS services.
In other words, any action that creates, updates, or deletes a resource is logged by CloudTrail, while any action that uses an existing resource generates event data that’s captured by CloudWatch. Creating an endpoint in API Gateway, for example, is a build-time activity that’s logged in CloudTrail. Once that endpoint exists, however, sending HTTP requests to it is a run-time activity that’s logged in CloudWatch.
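To make the build-time/run-time split concrete with the API Gateway example, here’s a minimal Python sketch that asks each service its own kind of question. It assumes boto3-style clients (boto3.client("cloudtrail") and boto3.client("cloudwatch")) are passed in; the function names, API name, and one-hour window are hypothetical, and it assumes a REST API, whose creation CloudTrail records under the CreateRestApi event name.

```python
from datetime import datetime, timedelta, timezone


def creation_events(cloudtrail, event_name="CreateRestApi"):
    """Build time: ask CloudTrail who created REST APIs, and when.

    `cloudtrail` is assumed to be boto3.client("cloudtrail").
    """
    response = cloudtrail.lookup_events(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": event_name}],
        MaxResults=50,
    )
    # Username is not present on every event, so fall back to a placeholder.
    return [(e["EventTime"], e.get("Username", "unknown")) for e in response["Events"]]


def request_count(cloudwatch, api_name, hours=1):
    """Run time: ask CloudWatch how many requests an API received recently.

    `cloudwatch` is assumed to be boto3.client("cloudwatch").
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApiGateway",
        MetricName="Count",
        Dimensions=[{"Name": "ApiName", "Value": api_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # 5-minute buckets
        Statistics=["Sum"],
    )
    # Add up the per-bucket sums to get a total for the whole window.
    return sum(point["Sum"] for point in response["Datapoints"])
```

Only the second function reflects traffic through the API; the first returns the same answer no matter how many requests the endpoint has served.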
Three Core Services
When you first look into CloudWatch, you’ll find a confusing array of services that all fall under the broader CloudWatch brand. Fundamentally, though, CloudWatch is built on a foundation of three core services: logging, metrics, and tracing. All the other services build on the foundation provided by these three, which are what we’ll focus on in this section.
The concept diagram shown below illustrates the primacy of these core services. At the top, the components in your solution push data into CloudWatch. On the left, you can see the console and command-line tools through which you can query and analyze this data. Then, as shown with the CloudWatch Alarms service on the right, additional services within CloudWatch build on this foundation. In this example, the capacity utilization notifications and service auto-scaling features are built on top of the Alarms feature, which itself works by monitoring CloudWatch metrics.
Here’s a quick overview of what you can do with these three services:
CloudWatch Logs
Log data is accessible both in the console and through the command line, and you can use either to filter logs for keywords. The CloudWatch Logs Insights tool lets you write queries that can reference log entry metadata, such as timestamps and log stream names, and combine multiple query criteria. There are also commands you can use to tail log streams for near-real-time output in a local terminal.
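As an illustration of a Logs Insights query that combines several criteria, here’s a minimal Python sketch that runs one through the Logs API rather than the console. It assumes a boto3 CloudWatch Logs client (boto3.client("logs")) is passed in; the query text, log group name, and helper name are hypothetical. Insights queries run asynchronously, so the sketch starts the query and then polls until it reaches a final status.

```python
import time

# Hypothetical query: the 20 most recent entries containing "ERROR".
# @timestamp, @logStream and @message are metadata fields Logs Insights provides.
ERROR_QUERY = """
fields @timestamp, @logStream, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
"""

# Statuses after which polling can stop.
FINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout", "Unknown"}


def run_insights_query(logs, log_group, query, start, end, poll_seconds=1.0):
    """Start a Logs Insights query and poll until it finishes.

    `logs` is assumed to be boto3.client("logs"); `start` and `end` are epoch seconds.
    """
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] in FINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    if result["status"] != "Complete":
        raise RuntimeError(f"query ended with status {result['status']}")
    # Each row is a list of {"field": ..., "value": ...} pairs; flatten rows to dicts.
    return [{cell["field"]: cell["value"] for cell in row} for row in result["results"]]
```

For tailing from a terminal, version 2 of the AWS CLI provides aws logs tail, which keeps streaming new entries when run with --follow.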
CloudWatch Metrics
As with log data, you can use the console to explore metrics, create ad hoc graphs, or build custom dashboards with visualizations. The command line gives you access to the same aggregated metric values, but in data formats, such as JSON, that you can import into other tools. As noted above, you can also define alarms that fire when a metric meets a threshold condition, and use them to trigger notifications.
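Here’s a minimal Python sketch of such an alarm, defined through the CloudWatch API. It assumes boto3.client("cloudwatch") is passed in and that an SNS topic already exists to receive the notification; the helper name, function name, topic ARN, threshold, and 5-minute period are illustrative choices, not recommendations.

```python
def create_lambda_error_alarm(cloudwatch, function_name, topic_arn, threshold=1):
    """Alarm when a Lambda function reports errors, notifying an SNS topic.

    `cloudwatch` is assumed to be boto3.client("cloudwatch"); the function name
    and SNS topic ARN are placeholders you supply.
    """
    params = {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate 5-minute aggregates
        "EvaluationPeriods": 1,  # a single breaching period triggers the alarm
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # treat periods with no data as healthy
        "AlarmActions": [topic_arn],  # where the notification goes when the alarm fires
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```

Setting TreatMissingData to notBreaching keeps the alarm quiet during quiet periods, since a function that isn’t invoked may publish no Errors data points at all.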
X-Ray Traces
Metrics provide a view of instrumented service data aggregated over periods of time. The X-Ray service captures similar data for individual requests, which it then traces through the flow of execution. In a sense, metrics give you a horizontal view of how services perform over time, while X-Ray traces give you a vertical view that profiles the performance of individual calls down a service stack.
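To illustrate the vertical view, here’s a minimal Python sketch that walks one trace’s segment tree and reports how long each level took. The field names (name, start_time and end_time in epoch seconds, and subsegments) follow the X-Ray segment document format, but the trace itself is made up and the helper name is hypothetical. With real data, you’d first retrieve the segment documents, for example through the X-Ray BatchGetTraces API, which returns each document as a JSON string to parse.

```python
def profile(segment, depth=0):
    """Yield (depth, name, duration_ms) for a segment and all of its subsegments."""
    duration_ms = (segment["end_time"] - segment["start_time"]) * 1000
    yield depth, segment["name"], round(duration_ms, 1)
    # Recurse into nested work, one level deeper each time.
    for child in segment.get("subsegments", []):
        yield from profile(child, depth + 1)


# Hypothetical trace: a Lambda function that validates input, then calls DynamoDB.
trace = {
    "name": "orders-function",
    "start_time": 100.0,
    "end_time": 100.25,
    "subsegments": [
        {"name": "validate-input", "start_time": 100.0, "end_time": 100.0625},
        {"name": "DynamoDB", "start_time": 100.0625, "end_time": 100.1875},
    ],
}

for depth, name, ms in profile(trace):
    print(f"{'  ' * depth}{name}: {ms} ms")
```

Reading the output top to bottom shows where one request spent its time, which is the per-call breakdown that aggregated metrics can’t give you.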
Ingesting Data
Most AWS services push basic metrics to CloudWatch without you needing to take any extra steps. Some services, such as Lambda, offer advanced metrics that require extra configuration. Capturing logs always requires permission through a service role, sometimes applied at the account level and sometimes at the resource level. X-Ray tracing usually has to be enabled explicitly through configuration and, as we’ll see in this section, with the help of some extra code for DynamoDB clients.
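As an example of the service-role permissions involved, here’s a minimal Python sketch that builds an IAM policy document letting a Lambda function’s role write its logs and upload X-Ray trace data. The region, account ID, function name, and helper name are placeholders, and the log group ARN assumes Lambda’s default /aws/lambda/ naming convention. AWS also provides managed policies, AWSLambdaBasicExecutionRole and AWSXRayDaemonWriteAccess, that cover similar permissions.

```python
import json


def lambda_observability_policy(region, account_id, function_name):
    """Build an IAM policy letting a Lambda role write its logs and X-Ray traces."""
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow the log group to be created on first use.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                # Allow streams and entries to be written inside that log group only.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
            {
                # X-Ray trace uploads are typically granted on all resources.
                "Effect": "Allow",
                "Action": ["xray:PutTraceSegments", "xray:PutTelemetryRecords"],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(lambda_observability_policy("us-east-1", "123456789012", "orders"), indent=2))
```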
Exporting Data
If you’d rather analyze the data outside CloudWatch, you can export it. CloudWatch log data can be exported to S3 buckets, where external tools can read it. There’s also a "Metric Streams" feature that continuously sends metrics data through Amazon Data Firehose (formerly Kinesis Data Firehose) to destinations you choose, such as S3 or third-party monitoring services.
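Here’s a minimal Python sketch of starting a log export. It assumes boto3.client("logs") is passed in and that the target S3 bucket already has a bucket policy allowing CloudWatch Logs to write to it; the bucket, prefix, helper names, and task naming scheme are hypothetical. Export tasks run asynchronously, so the function returns the task ID rather than the exported data.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment):
    """Export tasks take their time range as milliseconds since the epoch."""
    return int(moment.timestamp() * 1000)


def export_log_group(logs, log_group, bucket, prefix, start, end):
    """Start exporting one log group's entries between start and end to S3.

    `logs` is assumed to be boto3.client("logs"); start and end should be
    timezone-aware datetimes.
    """
    response = logs.create_export_task(
        # Hypothetical naming scheme: "/aws/lambda/orders" -> "export-aws-lambda-orders".
        taskName=f"export-{log_group.strip('/').replace('/', '-')}",
        logGroupName=log_group,
        fromTime=to_epoch_ms(start),
        to=to_epoch_ms(end),
        destination=bucket,
        destinationPrefix=prefix,
    )
    return response["taskId"]
```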
Governing Data
Access to the data in CloudWatch is managed using Identity and Access Management (IAM), as with any other resource in AWS. Using IAM, you can control who has access to the data and what actions they can perform on it. You also have some control over how long CloudWatch data is retained. Metrics and trace data have fixed retention periods of 15 months and 30 days, respectively. (Be aware that the granularity of metrics data diminishes as the data ages.) By default, log data is retained indefinitely, but you can set retention periods ranging from 1 day up to 10 years, applied at the log group level.
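Log retention can also be set programmatically. Here’s a minimal Python sketch, assuming boto3.client("logs") is passed in; the helper name is hypothetical. The API accepts only specific day counts rather than arbitrary numbers, so the sketch checks the requested value first. The list below is our reading of the API reference at the time of writing and may change, so treat it as an assumption to verify.

```python
# Day counts accepted by PutRetentionPolicy at the time of writing (assumption:
# verify against the current CloudWatch Logs API reference).
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def set_log_retention(logs, log_group, days):
    """Apply a retention period to one log group; `logs` is boto3.client("logs")."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    logs.put_retention_policy(logGroupName=log_group, retentionInDays=days)
```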