Courses / Learning AWS Serverless / Part 5: DynamoDB / Labs

Performing Ad Hoc Queries & Scans

This lab references the scripts in the aws-connectedcar-common repository. If you're new to this course, see the introduction for information about setting up your workstation and getting the sample code.

In this lab we’re going to show how you can perform ad hoc queries and scans of table data in DynamoDB. We’ll run through some examples in the console, and then we’ll show how you can perform the same operations from the command line.

Note that we’re going to be working with the data that we populated using the Tools utility in the previous lab. If you skipped ahead to this lab, you’ll need to go back, because you won’t actually have any data to work with.

Working in the Console

We’ll start by showing how easily you can query for items in the console.

Step 1: Run a query for a single customer item in the console

Navigate to the DynamoDB service in the console, then select the “Explore items” option on the left. Select the “ConnectedCar_Customer_Table_Dev” table, expand the “Scan or query items” panel on the upper right, and select the “Query” option. You should see a page that looks like this:

Next, copy and paste “anayamorrison244” into the “username (Partition key)” field and click the orange “Run” button. You should then see one item in the results at the bottom right, as shown below:

If you click on the username for this item, and then select the “JSON view” at the top right, you should then see the DynamoDB-formatted JSON for this item:

Step 2: Run a query for a single Timeslot item in the console

The Customer table that we just queried has a partition key, but no sort key. Let’s run a query for a Timeslot item to see an example that uses two keys. On the same “Explore items” page, select the “ConnectedCar_Timeslot_Table_Dev” table, then copy to a text editor both a dealerId GUID and a serviceDateHour value for any item in the list. Then paste these values into the respective entry fields on the query panel at the top.

When you run this query, you should once again see one result in the lower right:

Step 3: Run a query for multiple Timeslot items in the console

Next, remove the serviceDateHour value from the query panel, and run the query again. You should now see all the Timeslot items for the specified dealerId partitionKey value:

Note that the initial count of items returned is an estimate. As you scroll through the pages of results, this count may increase. In the example shown above, the initial result was 50 items returned. Once the pages were scrolled to the end, the result count was 63.

Step 4: Run a query that uses a Global Secondary Index

Let’s now run a query to view vehicle registrations for a customer. Select the “ConnectedCar_Registration_Table_Dev” table and, once more, paste the “anayamorrison244” username into the username field. When you run the query, you’ll get one result, because the populate customers batch update in the previous lab created one related pair of vehicle and registration items per customer item:

As we showed in the modelling lesson, you cannot query by vin alone with this table. This field is the sort key, which can only be used to specify a single registration for a specified customer username.

However, this table has a Global Secondary Index that makes vin-based queries possible. So, click on the “Select a table or index” dropdown in the query panel and select the “VehicleRegistrationIndex”, as shown below:

Next, click on the “vin” attribute in the results at the bottom, copy the value, and paste it into the “vin” entry field that should now be visible on the query panel. When you run this query, you should, of course, get the same result as with the table query:

Step 5: Perform a table scan in the console

Sometimes you need to find items in a table using attributes that are not key values. For these cases you need to perform a table scan instead of a query. To demonstrate, select the “ConnectedCar_Dealer_Table_Dev” table, and click on the “Scan” option. As you can see, the query panel no longer provides input fields for the keys. Instead, open the “Filters” panel and enter “stateCode” for the attribute name, select Number for the type, and enter a value of “41”. (For a Java-populated table, select String and a value of “OR” instead).

When you run the scan, you should see something this, with 17 items returned:

Up to now, we’ve been performing queries that consume an extremely minimal number of capacity units. A query that returns a single item for the tables in this solution, for example, consumes only 0.5 capacity units. A table scan, however, by definition can’t take advantage of an index and so has to sequentially read every item in a table. Our populated Dealer table doesn’t have a huge number of items in it, so the capacity units consumed in the example shown above is only 52. But be aware, this would obviously be larger for a table with more data.

For context, the price for on-demand read capacity units in the US West Oregon region, as of writing, is US$0.25 per million request units. So the specific table scan shown above costs about US$0.000013.

Working from the Command Line

As we’ve noted in other labs the console is a great way to perform quick, ad hoc tasks. However, if when working with DynamoDB you need to query repeatedly for the same sets of data, like you might during development, you’ll find the command line to be a more effective tool. To demonstrate, let’s step through the same operations we just did in the console, but using sample scripts from the “aws-connectedcar-common” repository.

Step 6: Run the get-item-customer.zsh script in the terminal

open the “aws-connectedcar-common” repository in VS Code and navigate to the “/scripts/dynamodb/zsh” folder. The “get-item-customer.zsh” script is the first that we’ll run here. It reproduces the functionality of the customer query that we performed in step one above.

When you run this script, you should see JSON output in the terminal like this:

The JSON output from this script should match the JSON that you saw displayed in the console in step one, although the attributes may be sorted differently:

Step 7: Run the get-item-timeslots.zsh script in the terminal

Next, run the “get-item-timeslots.zsh” script, which reproduces the query that we ran in step two. This query specifies a partition key and a sort key. Copy and paste values for the “dealerId” and “serviceDateHour” keys, the same as you did above.

Run the script, after which you should see a result this like:

Step 8: Run the query-timeslots.zsh script in the terminal

Note that in the console you’re performing a “query” whether you’re specifying all the keys or only part of the keys. As you can see from looking at these two sample scripts, the DynamoDB CLI has a “get-item” command to use where all the keys are specified, and only one item is returned. Then there’s a separate “query” command that takes a key condition expression as an argument. With this command, you can specify only a hash key, or specify a hash key and conditions for the range key. This command can potentially return multiple items.

The “query-timeslots.zsh” script, shown below, shows the use of the “query” command combined with a key-condition-expression. This script reproduces the results of the console query from step three above:

When you run this command, you should see multiple items in the JSON output:

Step 9: Run the query-registrations.zsh and query-index.zsh scripts in the terminal

The “query-registrations.zsh” script uses the same arguments that we just saw with the “query-timeslots.zsh” script, but this time returning a single item, because that’s the extent of the data:

As with the console query performed in step four above, the “query-index.zsh” script specifies the “VehicleRegistrationIndex” (on line 7 below). This enables it to use the vin as the hash key instead of the customer username:

As we saw in step four, this query returns the same result as the previous one:

Step 10: Run the scan-table.zsh script in the terminal

Lastly, run the “scan-table.zsh” script, which performs a scan of the dealer table, filtering by the stateCode attribute. Here’s what the script looks like:

Again, note that the .Net version of the code serializes the StateCode enumerator with ordinal numeric values, while the Java version does so with string name values. So, line 7 above shows “N”:”41” as values for data written by the .Net code, while the Java equivalent would be “S”:”OR”.

Here’s some example output for this script when redirected to a file: