Using Batch Updates to Populate Data
This tutorial references the code in the aws-connectedcar-dotnet-core repository. If you're new to this course, see the introduction for information about setting up your workstation and getting the sample code.
Once again, we’ve looked at a lot of code in this section, and now we’re ready to shift to some hands-on tasks. The next few labs will demonstrate the following: how to use batch updates to populate large amounts of data in DynamoDB; how to perform ad hoc queries both in the console and from the command line; how to use CloudWatch metrics to estimate resource usage; and lastly, what measures you can take to protect your DynamoDB table data.
Getting Credentials
The first thing you’re going to do in these labs is populate some test data using the Tools utility. But before you can run this utility, you need to get some temporary credentials.
Step 1: Run the get-credentials.zsh script
To get these credentials, open the “aws-connectedcar-common” repository in VS Code. Then, open the “scripts/sts/zsh” folder, where you should see the “get-credentials.zsh” script, as shown below:
This script calls the “aws sts get-session-token” command, which uses the identity associated with your local AWS CLI configuration to issue temporary credentials. Run the script, and you should see JSON output like that shown below:
Copy this JSON and save it in a text editor for the next step.
Step 2: Add the credentials to the Tools configuration
Next, you need to add the credentials you’ve just generated to the configuration for the Tools utility. Note that the configuration files are different between the .Net and Java versions of this utility.
For the .Net version, open the “aws-connectedcar-dotnet-core” repository in VS Code. Navigate to the “serviceconfig.json” file in the “/src/ConnectedCar.Core.Tools” folder. You should see the JSON configuration file, as shown below, with four AccessConfig properties ready to be entered:
Set the “Region” property as required, e.g. “us-west-2”, and then copy and paste the next three credential values from the output of the script you just ran.
For the Java version, open the “aws-connectedcar-java-core” repository in VS Code. Then navigate to the “config.properties” file in the “/main/tools/src/main/resources” folder, as shown below:
As with the .Net version, set the AWS_REGION property to match your selected region, e.g. “us-west-2”. Then copy and paste the credentials into the corresponding config properties. Once these are saved, perform a Maven build using the Cmd-P shortcut for the “build-all” task in VS Code.
Running the Tools Utility
The Tools utility is structured the same way in both the .Net and Java versions. In both, it uses a simple key-capture routine along with a switch block to run the command class that corresponds to the user input. There are three command classes, which populate dealer, customer, or appointment items in DynamoDB, along with related data such as dealer timeslot items, and customer vehicle and registration items. Two sets of sample CSV files in the source repository are used by default: one for dealers and another for customers.
Note that you must run the language version of the Tools utility that matches the language of the deployed solution. Enum values are serialized slightly differently between the Java and .Net versions, so you can’t mix languages against the same data.
Here’s the .Net version of the PopulateCustomersCommand class, to give you some insight into how this code is written:
In this code, the CSV files are read on line 25, split into smaller chunks on line 27, and then processed on parallel threads on lines 29-44. The calls to the three data services, where the DynamoDB batch updates are performed, are on lines 41-43.
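The read, chunk, and parallel-process pattern described above can be sketched in plain Java with no AWS dependencies. This is a minimal illustration under stated assumptions, not the repository's code: the class name, the `writeBatch` placeholder, the thread count, and the sample rows are all hypothetical. The chunk size of 25 reflects the per-request limit of DynamoDB's BatchWriteItem operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchChunkSketch {

    // DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
    public static final int MAX_BATCH_SIZE = 25;

    // Split a list into consecutive chunks of at most chunkSize elements.
    public static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    // Hypothetical stand-in for a data service call that would issue one
    // batch write; here it only reports how many items it was given.
    public static int writeBatch(List<String> batch) {
        return batch.size();
    }

    // Submit every chunk to a fixed-size thread pool and return the total
    // number of items handed to writeBatch.
    public static int processInParallel(List<String> items, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : chunk(items, MAX_BATCH_SIZE)) {
                Callable<Integer> task = () -> writeBatch(batch);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that batch has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample data standing in for rows read from a CSV file.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            rows.add("customer-" + i);
        }
        System.out.println(chunk(rows, MAX_BATCH_SIZE).size() + " batches");
        System.out.println(processInParallel(rows, 4) + " items processed");
    }
}
```

With 60 sample rows, the list splits into three batches of 25, 25, and 10 items. In a real command class, the placeholder would be replaced by the data service calls that perform the batch writes.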
Step 3: Run the Tools utility to populate dealer data
When ready, select the “Run and Debug” view, on the left in VS Code, and click the “Tools Launch” option in the dropdown at the top left. When the console application starts, switch to the TERMINAL tab on the right, and type “1” for the “populate dealers” option. After a short interval, you should see the elapsed time displayed for the completed task, as shown below:
One way to quickly confirm that the expected number of items were added to the table in DynamoDB is to run the item count routine in the console. To do this, navigate to the DynamoDB service in the console, select the “Tables” option on the left, then select the table in the next panel. You should see the “Overview” tab, as shown below:
Click the “Get live item count” button in the “Items summary” panel, on the lower right. In the dialog box that appears, click the orange “Start scan” button. This performs a scan of the table to calculate how many items it contains. (Note that, like any DynamoDB data operation performed in the console, this consumes capacity units.)
The result of the scan should look something like this:
Step 4: Run the Tools utility again to populate customer data
Repeat the same VS Code debug launch steps, but this time type “2” for the “populate customers” option. This will add 25,000 customers along with related vehicle and registration items.
Here’s an example run with this option:
Now, a quick note on performance. As you can see from the screenshot above, 75,000 items were added in only 32 seconds. And we’re running this utility on a local workstation, so that number includes the latency to the AWS region. An individual item saved to DynamoDB from a warm Lambda will typically take about 5ms, which works out to a single-thread throughput of about 200 items per second. With the batch update, we have a throughput of more than 2,000 items per second, roughly ten times higher. That’s a big difference.
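The arithmetic behind this comparison can be checked with a few lines of Java. The sketch below uses only the figures quoted above (75,000 items, 32 seconds, about 5 ms per individual write); the class and method names are hypothetical.

```java
public class ThroughputEstimate {

    // Items per second, given a total item count and the elapsed seconds.
    public static double itemsPerSecond(long items, double seconds) {
        return items / seconds;
    }

    // Items per second for one thread writing items one at a time,
    // given the latency of each write in milliseconds.
    public static double singleThreadItemsPerSecond(double latencyMillis) {
        return 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        // Figures taken from the example run: 75,000 items in 32 seconds.
        double batch = itemsPerSecond(75_000, 32.0);
        // Assumed latency of roughly 5 ms per individual write.
        double single = singleThreadItemsPerSecond(5.0);

        System.out.printf("batch: %.0f items/s%n", batch);
        System.out.printf("single thread: %.0f items/s%n", single);
        System.out.printf("ratio: %.1fx%n", batch / single);
    }
}
```

At 5 ms per write, a single thread manages 1000 / 5 = 200 items per second, while the batch run averages 75,000 / 32, or about 2,344 items per second.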
Step 5: Run the Tools utility again to populate appointment data
Run the Tools utility one more time, and select the third option, to populate appointment data. Together with the previous batch updates, this will provide a good sampling of data that you can work with in the next lab.