An independent guide to building modern software for serverless and native cloud

Understanding Lambda

Now it’s time to look at Lambda, which is the service that more or less defines the term “serverless”. With Lambda you have a simple programming model where all you have to do is implement an event handling interface. It’s easy to associate Lambdas with API events in SAM templates, and easy to build and deploy Lambdas to AWS. Once deployed, you can take advantage of on-demand compute to pay only for the resources used, and your Lambdas automatically scale to meet demand with no extra work on your part.
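
To give a feel for how little wiring is involved, the SAM template fragment below associates a function with an API Gateway route. It’s a minimal sketch: the resource name, handler string, and path are illustrative rather than taken from the course’s templates.

    Resources:
      ApiFunction:
        Type: AWS::Serverless::Function
        Properties:
          Handler: Example::Example.Function::FunctionHandler   # assembly::namespace.class::method
          Runtime: dotnet8
          MemorySize: 512
          Timeout: 30
          Events:
            GetHello:
              Type: Api            # creates an API Gateway route that invokes this function
              Properties:
                Path: /hello
                Method: get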

Without question, these are significant benefits, and they explain why Lambda is such a compelling service. But there are also trade-offs involved. So, what we’ll do in this lesson is outline how Lambda works under the covers. Our goal is to understand the inherent limitations that come with the service.

Understanding the Programming Model

Let’s start by looking at some example code. With Lambda you’re implementing a programmatic interface for an event handler. Depending on the type of event being handled, the handler method will have different request and response types in its signature. In the code below, you can see a Lambda from the sample code that handles an API Gateway proxy request:
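
(The listing that follows is a representative sketch of such a handler in C#; the namespace, class name, and response body are illustrative and may differ from the actual sample code.)

    using System.Collections.Generic;
    using Amazon.Lambda.APIGatewayEvents;
    using Amazon.Lambda.Core;

    // Tell the Lambda runtime how to serialize the JSON event payloads.
    [assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

    namespace Example
    {
        public class Function
        {
            // Invoked by the Lambda service for each API Gateway proxy request.
            public APIGatewayProxyResponse FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
            {
                context.Logger.LogLine($"Handling {request.HttpMethod} {request.Path}");

                return new APIGatewayProxyResponse
                {
                    StatusCode = 200,
                    Headers = new Dictionary<string, string> { ["Content-Type"] = "application/json" },
                    Body = "{\"message\": \"Hello from Lambda\"}"
                };
            }
        }
    }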

Like any event handling code that’s intended to scale, Lambdas should be stateless. You might cache interfaces to external services, such as database clients, but you don’t want to be saving conversational state in instance variables. Note also that, as an on-demand compute service, your code runs in a sandbox: you have no access to the underlying host, and local storage is limited to a small ephemeral /tmp area that you shouldn’t rely on from one invocation to the next.
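
As a sketch of that pattern (assuming the AWS SDK for .NET; the table name and key are made up for illustration), a database client can be created once per instance and reused across invocations, while everything request-specific stays in local variables:

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Amazon.DynamoDBv2;
    using Amazon.DynamoDBv2.Model;
    using Amazon.Lambda.APIGatewayEvents;
    using Amazon.Lambda.Core;

    public class OrdersFunction
    {
        // Created once per Lambda instance and reused across invocations.
        private static readonly AmazonDynamoDBClient DynamoDb = new AmazonDynamoDBClient();

        // No conversational state in fields; request data lives in local variables only.
        public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
        {
            var orderId = request.PathParameters["orderId"];  // hypothetical path parameter

            var result = await DynamoDb.GetItemAsync(new GetItemRequest
            {
                TableName = "Orders",  // hypothetical table name
                Key = new Dictionary<string, AttributeValue> { ["orderId"] = new AttributeValue { S = orderId } }
            });

            var found = result.Item != null && result.Item.Count > 0;
            return new APIGatewayProxyResponse
            {
                StatusCode = found ? 200 : 404,
                Body = found ? "{\"status\":\"found\"}" : "{\"status\":\"not found\"}"
            };
        }
    }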

Providing On-Demand Compute

Let’s say you’ve deployed your Lambda to AWS and then you invoke it through an API Gateway integration. What happens?

There are two analogies we’ll use to explain how Lambdas work, and the first is memory management in modern programming languages. So, when writing code in older languages like C, you have to allocate memory when instantiating data structures. Then you have to make sure you deallocate the same memory when it isn't needed any more.

Once interpreted or VM-based languages with memory management came along, these concerns went away. Now your runtime environment manages a pool of pre-allocated memory for you and ensures that instantiated objects are always provided with the memory they need. When these objects are no longer in scope, the memory they take up is recovered automatically, sometimes through mark-and-sweep garbage collection, and sometimes by tracking reference counts.

Lambda provides compute resources for deployed Lambda code in a broadly similar way. As the diagram below illustrates, there’s a pool of pre-allocated compute resources in the form of servers that can run the lightweight VMs on which Lambda code is hosted. When a specific Lambda is invoked for the first time, the Lambda service finds spare compute, launches the hosting VM, loads and initializes the deployed code, and then processes the request. If no further invocations arrive within a pre-defined interval, then, as with memory management, the compute resources are released.
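
You can observe this lifecycle from inside a function: code in the constructor (or in static initializers) runs once, when the service creates a new instance, while the handler runs on every invocation. A minimal sketch (the class name and log messages are illustrative):

    using Amazon.Lambda.APIGatewayEvents;
    using Amazon.Lambda.Core;

    public class LifecycleFunction
    {
        // Runs once per instance, i.e. on a cold start, after the service has
        // found spare compute, launched the hosting VM, and loaded the code.
        public LifecycleFunction()
        {
            LambdaLogger.Log("Initializing a new function instance\n");
        }

        // Runs for every invocation, whether the instance is cold or warm.
        public APIGatewayProxyResponse FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
        {
            context.Logger.LogLine("Handling a request");
            return new APIGatewayProxyResponse { StatusCode = 200, Body = "ok" };
        }
    }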

The key concept that applies equally to memory management and to Lambda is that in both cases the runtime environment mediates access to the target resource. Memory-managed programming environments mediate access to objects (and therefore don’t expose raw pointers), and the Lambda runtime mediates access to Lambda instances.

The second analogy involves the telephone system. We all know that if there’s an emergency event, like an earthquake, we can’t all simultaneously make calls on our phones because there isn’t the capacity in the system to serve everyone at the same time. Lambda is similarly constrained by capacity limits. Even AWS does not have an unlimited supply of servers in its data centers. Lambda as a service is possible only because at any given moment not all deployed Lambdas are simultaneously executing.

Enabling On-Demand Pricing

The process that enables on-demand compute also enables on-demand pricing. As we’ve shown above, shared compute resources are used only while a Lambda is being invoked. Otherwise, they sit in the pool, available to others, which means that with Lambda, AWS can afford to charge for the compute you actually use instead of the capacity you would have to provision for conventional server resources. In addition, the fact that all calls to Lambdas are mediated by the service means that it’s possible to meter each request, the memory allocated to it, and the time spent processing it, and to charge based on these metrics.
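
As a rough, illustrative calculation (the rate is an assumption for the sake of the arithmetic; check current AWS pricing): a function configured with 512 MB of memory that runs for 200 ms per request consumes 0.5 GB × 0.2 s = 0.1 GB-seconds per invocation. A million invocations is 100,000 GB-seconds, which at around $0.0000167 per GB-second comes to roughly $1.67, plus a small per-request charge. The same workload on an always-on server would be billed for every hour the server runs, busy or idle.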

With Lambda, AWS has to provision resources to cover both the overhead of running the service itself and the surplus capacity needed to absorb spikes in traffic, as well as the lag before compute is released back to the pool when it’s no longer used. So, if you have a service with a steady traffic volume, it may be cheaper to pay for provisioned capacity. But the on-demand model gives you flexibility: you can run low-volume environments at nearly zero cost, for example, or handle environments with peaks and troughs in traffic without extra management overhead.

Highlighting Limitations

So, having outlined the programming model and how on-demand compute and pricing work, we can start to see the inherent limitations of using Lambda. First, you’re deploying code onto a runtime that’s provided for you, rather than one you set up yourself. The benefit is that you don’t have to set up and maintain servers. The trade-off is that AWS can only provide a limited set of language platforms and versions. If you need to run code that depends on an unsupported language platform or version, then you’re going to have to look at alternative technologies.

Second, in order to deliver on-demand compute that runs on shared infrastructure, AWS has to limit the size and duration of the requests it processes. As a result, the maximum size of a request payload is 6 MB for synchronous invocations (256 KB for asynchronous ones), and the maximum allowed processing time is 15 minutes. Again, if your use case doesn’t fit these constraints, you’ll have to look elsewhere for a solution.

Dealing with Cold Starts

Lastly, we should talk about the cold start issue. Once your Lambda code is deployed, requests are routed through the service to the server hosting your code and back with an overhead of no more than a few milliseconds. But the first time your code runs on a given server, whether because it’s being invoked for the very first time or because the number of running instances of your code is scaling up, you incur the overhead of launching the VM and then loading and initializing your code.

And it’s this last step that is slow for some language platforms. Interpreted languages like JavaScript or Python initialize more quickly than compiled languages that run in VMs, like Java or C#. The runtimes for the latter two do a lot of class loading, profiling, and JIT compilation at startup, reflecting the fact that they were designed not for fast initialization, but for fast execution once initialized.

We’ll dive deeper into this issue in the labs later in this section of the course. What we can note here is that .NET appears to have been optimized for somewhat faster initialization in recent years. The C# sample code in this course takes about 1.5 seconds to execute on a cold start, and sub-100 milliseconds thereafter.

It used to be worse for Java, where a Lambda that uses the DynamoDB client interface, for example, might take several seconds for a cold start. Now AWS offers Lambda SnapStart, which caches a snapshot of the initialized Java runtime and restores it on a cold start. This cuts cold start latency down to roughly what we see for .NET code. As the labs will demonstrate, you also have the option of provisioned concurrency, which keeps a specified amount of Lambda capacity initialized ahead of time. This changes the cost equation for the service, but it does largely eliminate cold starts.
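
As a sketch of how that option looks in a SAM template (an excerpt only, with the alias name and instance count chosen for illustration), provisioned concurrency is configured on a published version of the function via an alias:

    ApiFunction:
      Type: AWS::Serverless::Function
      Properties:
        Handler: Example::Example.Function::FunctionHandler
        Runtime: dotnet8
        AutoPublishAlias: live                      # provisioned concurrency applies to a published alias
        ProvisionedConcurrencyConfig:
          ProvisionedConcurrentExecutions: 5        # keep five initialized instances ready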