Courses / Learning AWS Serverless / Part 5: DynamoDB / Backgrounders

Modelling Data for DynamoDB

There are lots of different use cases for DynamoDB, many of which don’t involve relationships between tables. You might, for example, have bundles of settings for an application, all of which could go into a single table or perhaps multiple, unrelated tables. But lots of databases, even when built on a NoSQL database like DynamoDB, nonetheless are an expression of an application system model, and as such they have relationships between entities.

The sample code for this course is one such example. We touched on the limited indexing options with DynamoDB tables in the previous lesson. What we’ll do here is step through some examples from the sample code, showing ways you can model data to support table relationships, even with these limitations.

The ConnectedCar Data Model

For context, let’s start by outlining the full data model for the ConnectedCar solution, including both the DynamoDB tables and the global secondary indexes used (shaded, on the right). Note that this model is designed as a teaching exercise, not as something you would put into production. The model is shown in the diagram below, with key attributes in the second box for each table, separate from the non-key attributes in the box below that:

As you will have read in the “Getting Started” documentation, this data model is for a service that’s designed to enable vehicles to submit runtime events, and for customers to book service appointments about these events, online. What follows below is a summary of the tables shown. In accordance with the nomenclature for DynamoDB we’re using the word “item” in place of “record”:

The Dealer & Timeslot Tables

The Dealer table represents car dealership locations. And for each of these locations, the Timeslot table is populated with items representing a "serviceDateHour" and the number of service bays available for these times. The Timeslot items basically describe the service hours of the associated dealership.

The Customer, Vehicle & Registration Tables

The Customer and Vehicle items are populated independently and durably represent both real-world entities. These two entities can then be linked together in the Registration table which also has a status flag to indicate whether the link is active or not. This way, even after a vehicle is no longer "registered" with a specific customer, the history of that linkage is preserved.

The Appointment Table

This is another junction table which is designed to link an active vehicle registration to an available Timeslot item. There can only be as many appointments for a Timeslot “serviceDateHour” value as there are service bays.

The Event Table

This is the table that holds information about runtime errors and warnings submitted by a vehicle.

Note that to avoid hot partitions, the partition keys in the solution are either GUIDs or reasonably long string-formatted fields such as the customer username or vehicle vin.

Querying Child Tables

The diagram below illustrates our first modelling example, which is a simple one-to-many relationship between the parent Dealer table and its related Timeslot child table. Using the keys in the Timeslot table, with DynamoDB you can efficiently query for Timeslot items associated with a given “dealerId” value. You can also query for a subset of these associated Timeslot items by applying conditions to the “serviceDateHour” range key.

Note that you can’t include conditions for serviceBayCount values in queries for the Timeslot table because this attribute is not included in the keys. If you want to retrieve a subset of queried items based on “serviceBayCount” values, you need to perform the query first, using the keys, and then filter the results in code for the desired non-key values.

Alternatively, if you want to retrieve a result set based on the “serviceBayCount” alone, then you have to perform a scan of the table. Be aware that this is a more expensive operation because you consume capacity units for each item read, and a table scan basically has to read every item in the selected table.

Using Global Secondary Indexes

For our second example, shown in the diagram below, we have a Registration table for vehicle registrations that combines the partition keys from the Customer and Vehicle tables. Given the order of these keys, the queries that you can perform against this table are limited to retrievals of registration items for specified customers based on their username values.

To query in the other direction for registration items associated with a specified VIN number requires the use of the VehicleRegistrationIndex, as shown on the right. This index reverses the order of the keys which enables the alternative, VIN-based query to be performed.

Using Compound Key Values

The Registration table that we saw above is a junction between two tables that both have one-attribute keys. As a result, the Registration table simply combines these two keys to enable parent-child queries in one direction. It then employs a secondary index with these attributes reversed to enable queries in the other direction.

It gets more complicated with the Appointment table, shown above, which is acts as a junction of the Timeslot and Registration tables. Both these tables have two-attribute keys, so we can’t just combine these into a pair of keys for Appointment items. Instead, the strategy we’ve used with Appointments is to first define a single, GUID partition key for the table. Then we’ve added a pair of delimited string attributes whose values relate to the keys in the Timeslot and Registration tables. The timeslotKey combines the dealerId and serviceDateHour key values from the Timeslot table, while the registrationKey does the same for the key values from the Registration table.

By creating a pair of secondary indexes that use these string attributes as compound keys, we make it possible to perform queries for child items in two directions, using a pair of secondary indexes. The TimeslotAppointmentIndex enables clients to query for Appointment items for by concatenating the keys for a Timeslot item. The RegistrationAppointmentIndex, meanwhile, enables a similar query for Appointment items, but by concatenating the keys for a Registration item.

De-Normalizing for Performance

Looking at the Customer, Vehicle, and Registration tables in the model once more, let's imagine that we need to query for Customer items that have related Registration items with status codes that are "active". When working with a relational database, we would simply add a non-unique index to the statusCode column and add an additional where clause to a query that would join the Customer and Registration tables, and filter for the results we need.

With DynamoDB this is more challenging. As defined in the model above, you would have to perform a table scan of the Registration tabled to retrieve those items with an “active” status. You would then have to loop through these items in code and retrieve the related Customer items individually, which would also have to be stored in a dictionary to eliminate duplicate results. There is also no secondary index that could be applied to the Registration table, either, because you would have to use the statusCode as the partition key, and this attribute doesn’t have enough values to avoid creating a hot partition.

If this was an operation that was run frequently enough for the usage cost to be a factor, then a workaround would be to selectively denormalize the data model. You could add an “isActive” attribute to the Customer table and add client code to keep this synchronized with the values of the related statusCode attributes in the Registration table. This attribute would then support a simple scan of the Customer table to satisfy the requirement. This approach is not ideal, but given the limitations of DynamoDB, it might be the only way to streamline an operation like this.