Thanks, Leena :)
I would say that 'Primary Key' is more equivalent to index. If you have a table with a simple primary key, then partition key == index.
If you have a table with a composite primary key, then both elements (Partition key + Sort key) combine for the index.
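To make the distinction concrete, here is a minimal Python sketch that models a table as a dictionary keyed by its primary key. It is an in-memory illustration, not the DynamoDB API, and the `Table` class and the attribute names (`user_id`, `order_id`) are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


class Table:
    """Toy in-memory table: items are uniquely identified by their primary key."""

    def __init__(self, partition_key: str, sort_key: Optional[str] = None) -> None:
        self.partition_key = partition_key
        self.sort_key = sort_key  # None means a simple primary key
        self.items: Dict[Tuple[Any, ...], Dict[str, Any]] = {}

    def _primary_key(self, item: Dict[str, Any]) -> Tuple[Any, ...]:
        # Simple key: partition key only. Composite key: partition key + sort key.
        if self.sort_key is None:
            return (item[self.partition_key],)
        return (item[self.partition_key], item[self.sort_key])

    def put(self, item: Dict[str, Any]) -> None:
        # An item with the same primary key overwrites the existing one.
        self.items[self._primary_key(item)] = item


# Simple primary key: the partition key alone identifies the item.
simple = Table(partition_key="user_id")
simple.put({"user_id": "u1", "name": "first"})
simple.put({"user_id": "u1", "name": "second"})  # same key, overwrites

# Composite primary key: both elements together identify the item.
composite = Table(partition_key="user_id", sort_key="order_id")
composite.put({"user_id": "u1", "order_id": "o1"})
composite.put({"user_id": "u1", "order_id": "o2"})  # same partition, new item
```

With the simple key, the second write replaces the first. With the composite key, both items survive because their sort keys differ.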
Makes sense.
This is great, Alex. Coming from an RDB background, this is all new and this was really clear! One thing I'm struggling to figure out is how to handle multiple IDs to represent the same item.
As a fictitious example, there is a database of patients and drug prescriptions (many to many). If I have a single ID for a patient and a single ID for each drug, this seems ok, it can all live in the patient ID partition or the drug ID partition with a single GSI to query both ways: 1) patient -> patient's drugs and 2) drug -> all patients prescribed that drug
But say I want to be able to look up a patient by multiple IDs, e.g. hospital registration number OR social security number OR email address etc. Drug Ref ID 1 OR Drug Ref ID 2 OR Drug name etc. It's all querying the exact same data. How would you do this without having to create a GSI for every combination of ID or duplicating the exact same row multiple times with a different PK and SK?
Thanks! Yea, that's a tricky problem. There are some times when you will need to make multiple requests in DynamoDB, even if you generally try to avoid it.
It depends on your general requirements around latency and cost, but it might make sense to store a User record that is indexed by the different values (hospital registration, SSN, email, etc.). When you have one of the non-standard values, you can fetch the User record to find its 'canonical' identifier that is used to store the prescription info. Then you can go fetch the prescription info.
Does that work?
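As a rough sketch of that two-step lookup, the Python below uses plain dictionaries to stand in for the table. The identifier types, values, and the `find_prescriptions` helper are all made up for illustration; in DynamoDB each dictionary access would be a separate request.

```python
from typing import Dict, List, Optional, Tuple

# Lookup records: one per alternate identifier, each pointing at the canonical ID.
# Keys are (id_type, id_value); all names and values here are hypothetical.
lookup_records: Dict[Tuple[str, str], str] = {
    ("hospital_reg", "H-1001"): "patient#1",
    ("email", "pat@example.com"): "patient#1",
}

# Prescription data is stored once, under the canonical patient ID only.
prescriptions: Dict[str, List[str]] = {
    "patient#1": ["drug#A", "drug#B"],
}


def find_prescriptions(id_type: str, id_value: str) -> Optional[List[str]]:
    """Resolve an alternate ID to the canonical ID, then fetch the data."""
    canonical_id = lookup_records.get((id_type, id_value))  # request 1
    if canonical_id is None:
        return None  # no patient is known under this identifier
    return prescriptions.get(canonical_id, [])  # request 2
```

The trade-off is one extra read per non-canonical lookup in exchange for storing the prescription data only once, and a patient who lacks a given identifier simply has no lookup record for it.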
Thanks so much for your speedy reply Alex. That sounds sensible. Your solution actually prompted me to determine the amount of crossover between the IDs and actually only 40% of the records have all possible IDs which greatly reduces the amount of duplication.
I've read so much now about how you "only need one table" and your user should be able to get all the information "in a single query" that I was trying to force it to work. Of course there's always nuance...
Thanks again!
I'm wondering whether single table design is even a good fit in this case, I've tried to mock it up in NoSQL workbench and I already have 5 indexes for basically the same query (because of the need to query by different IDs). Seems really inflexible when flexibility is part of the reason I wanted to go down the DynamoDB route in the first place...
Hey Alex, I just loved this article. Every bit of it. I was working on a side project over the last month and decided to try out DynamoDB. Initially I struggled with the modelling but once I figured that out, I have developed immense respect for DynamoDB. And my search for how partitioning works in DynamoDB brought me here.
Just one question - can the request router become a bottleneck when dealing with heavy reads and/or writes, or is it serverless itself so that it can scale like a Lambda?
Glad to hear, Zohaib!
On the request router -- nope, it won't become a bottleneck. The request router for an entire AWS region is shared across all DynamoDB tables in the region, so a spike in your traffic will be barely a blip in the overall traffic. In fact, Rick Houlihan has even talked about how DynamoDB gets *faster* if you start hitting really high requests-per-second numbers, as the request router instances will have some of your table metadata cached on them.
This is great, thanks. Let me look for Rick's conversation.
Hi Alex, this was a wonderful read. I have a presentation where I am proposing a switch from an RDBMS model to DynamoDB. I've been following your videos and articles and you do one hell of a job. Quick question: for the partition key, do I lose the benefit of fast search if I use LIKE on my partition key? For example, for a partition key value EQ0000000000039108-ACQUIS-0, I want to provide only EQ0000000000039108-ACQUIS, as I can also have the values EQ0000000000039108-ACQUIS-1 and EQ0000000000039108-ACQUIS-2. I want all three values by providing only EQ0000000000039108-ACQUIS and an effective date, which is the sort key. The reason for this model is storing history: EQ0000000000039108-ACQUIS-0, -1 and -2 are just a way to store history with a sort date.
Hey, thanks! Glad you liked it.
I don't quite understand your question. Are you saying you want to do a 'LIKE' operation against your partition key? The DynamoDB API doesn't allow that -- you have to do an exact match on your partition key.
Yes, I want to do a 'Like' operation on the primary key. Does the API allow a 'Like' on the sort key? I am struggling with modelling my data from RDBMS to NoSQL. The existing table has too many combinations to make a key unique, and with DynamoDB I can only have one primary key with a sort key.
You can't do a 'Like' on a sort key, but you can do more flexible filtering -- before, after, between. Think of the sort key like a physical dictionary. It's very easy to get all the words after 'elephant' or between 'cactus' and 'dog', but it's hard to find all the words that include 'ing'.
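The dictionary analogy can be sketched in a few lines of Python. This is an in-memory illustration using a sorted list and binary search, not the DynamoDB API; the word list and the helper names (`after`, `between`, `containing`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List

# Sort keys within one partition, kept in sorted order like a dictionary.
words: List[str] = sorted(
    ["ant", "cactus", "cat", "dog", "eating", "elephant", "zebra"]
)


def after(keys: List[str], start: str) -> List[str]:
    """Everything strictly after `start`: one binary search, then a contiguous slice."""
    return keys[bisect_right(keys, start):]


def between(keys: List[str], low: str, high: str) -> List[str]:
    """Everything from `low` to `high`, inclusive: also a contiguous slice."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


def containing(keys: List[str], fragment: str) -> List[str]:
    """A 'LIKE %fragment%' match has no contiguous range, so every key is checked."""
    return [k for k in keys if fragment in k]
```

The first two helpers jump straight to a position in the sorted keys and read a contiguous run, which is why range conditions are cheap. The third has to inspect every key, which is the kind of scan a sort key condition does not offer.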
Great article! I was struggling to understand until I found your blog post. Thanks so much.
Great to hear! :)
And another one: ...
They include information about table structures and **indicies**, as well as statistics on the contents of a particular table. ... => indices
Another typo: ... This goes against DynamoDB’s core **philsophy** around ...
There is a typo: ... this table will need to **inclue** the primary key ...
D'oh! Thanks for all of these. They're fixed now :)
Great article Alex.
Just for my understanding,
Partition Key is roughly equivalent to Index in relational database...