
Anton Babenko • 5 years ago

You probably saved me a couple hours of reading about GraphQL and DynamoDB patterns!

Alex D • 5 years ago

Great to hear! :)

markkwhelan • 4 years ago

I know this article is a bit old now, but the part of the article referring to GraphQL having to make two trips to the database to resolve the User and then the Orders for that User is incorrect. The root query in the Query type would have a resolver that fetches your single-table record for a User. The User type in the GraphQL schema would then have an orders property that returns [Order] in the schema. The resolver for the orders property belongs to the User type. That resolver receives in its argument list the root (or source) object, which in this scenario is our User record pulled from the database in the root query, and which, because of its single-table design, already has the orders! So that resolver can simply return root.orders. No extra database trip is required! In a real-world design you'd check that orders isn't undefined, in which case you would do another DB query; in your design, that wouldn't be the case. Understanding the GraphQL resolver chain is very important to writing efficient queries. So happy days with single-table records, even with GraphQL.
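The resolver chain described above can be sketched with plain functions standing in for graphql-js resolvers (the `fakeDb` record, key format, and field names here are illustrative, not from the article):

```javascript
// Hypothetical single-table record: the User item already embeds its orders.
const fakeDb = {
  'USER#alexdebrie': {
    username: 'alexdebrie',
    orders: [{ orderId: 'o-1', total: 25 }, { orderId: 'o-2', total: 40 }],
  },
};

// Root Query resolver: one database trip fetches the whole record.
const queryResolvers = {
  user: (_root, { username }) => fakeDb[`USER#${username}`],
};

// Field resolver on the User type: the parent ("root") object already
// carries the orders, so no second database trip is needed.
const userResolvers = {
  orders: (root) => root.orders ?? [], // fall back only if orders weren't loaded
};

const user = queryResolvers.user(null, { username: 'alexdebrie' });
const orders = userResolvers.orders(user);
```

The only database access is in the root resolver; the child resolver just reads off the object it was handed.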

Antonio Presto • 5 years ago

Hi Alex, great job, thanks! Just one detail: there is no need to add a resolver to return an object within another object in GraphQL; the internal object can come directly from the root object.

Dina Basumatary • 4 years ago

What if you did not need the internal object? You would not want your external object to fetch the internal object if the client does not need it; that would be over-fetching.

Jon Nichols • 4 years ago

It is possible to inspect the request to determine if the internal object would be required by the caller.
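One way to do that inspection is to walk the selection set that graphql-js passes to every resolver via `info.fieldNodes`; a minimal sketch, with a hand-built object standing in for the real `GraphQLResolveInfo`:

```javascript
// Check whether a child field (e.g. "orders") appears in the query's
// selection set, using the shape graphql-js exposes on `info.fieldNodes`.
function wantsField(info, fieldName) {
  return info.fieldNodes.some((node) =>
    (node.selectionSet?.selections ?? []).some(
      (sel) => sel.kind === 'Field' && sel.name.value === fieldName
    )
  );
}

// Simulated `info` for the query: { user { username orders { total } } }
const info = {
  fieldNodes: [{
    selectionSet: {
      selections: [
        { kind: 'Field', name: { value: 'username' } },
        { kind: 'Field', name: { value: 'orders' } },
      ],
    },
  }],
};

// A user resolver could use this to skip loading orders when not requested.
const needOrders = wantsField(info, 'orders');
```

Real queries can nest fragments and aliases, so production code usually leans on a helper library rather than walking the AST by hand.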

ptejada • 4 years ago

Even then, you will still end up over-fetching some data, simply because you don't have as much control when querying data from a DDB table.

Matt Welke • 3 years ago

Great article. For me, it was my intro to single-table design. A thought I had while reading: would you only use single-table design for read-only data access patterns, or for read and write patterns as well? When I looked at the examples in the article, I saw only read use cases described. And when I read the article from Forrest Brazeal that you link to, he has a list of nine use cases, all of which are also read-only.

So where this confuses me, and where I'd appreciate some insight if you don't mind, is how one would handle updates to this OLTP data. If we have just one table, maybe we could write DynamoDB requests to update the data. Problem solved. But if we find that the single table we created a year ago no longer satisfies all of our use cases, we'd need to solve that problem. One thing I think we could do is create a second "single table" for a new set of use cases. But then our data is duplicated.

How do we ensure the data is updated no matter how many places it lives? Would we make sure the application layer is built to handle this, knowing where the data lives so that it can update it in each spot? Or would we have a separate database, maybe a relational one where the data is normalized, acting as the source of truth? And we'd have some sort of syncing in place to copy the data from it to each DynamoDB table the data lives in for read-only use cases?

Alex D • 3 years ago

Hey Matt, great question.

You absolutely need to think about your write patterns as well. I generally show the read patterns as they're easier as an introduction, but your points hold. My more recent presentations talk about how you need to be thoughtful about your write patterns, and I actually recommend designing your write patterns before your read patterns in most cases.

Matt Welke • 3 years ago

Thanks for replying. I bookmarked a few recent AWS Events videos on YouTube:

- https://www.youtube.com/wat...
- https://www.youtube.com/wat...

Are these the more recent talks you're referring to?

And, knowing that you're meant to write to these tables too: regarding my point about more complex designs where you have a separate database that all writes go to (and that you sync with many read-only databases), would you say that's more of an advanced pattern, one you would only resort to if a single table supporting both your writes and reads didn't work?

Alex D • 3 years ago

Hey Matt, here are two resources that talk about the importance of write patterns:

- https://www.youtube.com/wat... (re:Invent 2021 talk)
- https://youtu.be/YI67mWmjbZ... (A Decade of Innovation w/ DynamoDB. This is a 7-hour-long event, but my talk starts around 4:15:00 in.)

I would not advise having a separate database handling your writes and trying to duplicate into DynamoDB. DynamoDB can handle OLTP applications quite well and should be able to handle any write patterns that you have. The harder parts for DynamoDB are more ad-hoc read patterns -- flexible aggregations, complex filtering, or full-text search. I have seen people use supplementary databases in those situations.

Matt Welke • 3 years ago

Nice, thanks. Bookmarked.

Ajay Wadhawan • 4 years ago

Good article! Do you have any example where you have converted a relational database to DynamoDB?

Matt Welke • 3 years ago

The link in the article to Forrest Brazeal's article (https://www.trek10.com/blog...) is a good example of that.

Aidan Lawn • 5 years ago

Thanks Alex, helpful article. Does Aurora Serverless change your recommendations at all? It feels like it provides the best of both worlds: serverless-style pricing and scalability with good old relational data modelling. I'm trying to choose for a new project and would be interested in your opinion.

Alex D • 5 years ago

Hey Aidan Lawn, it doesn't change it too much for me. I think Aurora Serverless doesn't scale quite as well as I'd like for it to truly replace DynamoDB. It still has the horizontal scaling issues (and thus gradual performance reduction) of a relational database. Also, the 'serverless' scaling characteristics are a bit slow for most production use cases. AWS markets it more as a way to save on test environments or other infrequently used databases rather than as a rapidly scaling production database.

Connor Leech • 5 years ago

Awesome high level overview of DynamoDB single table design. Thank you!

Alex D • 5 years ago

Thanks, Connor Leech! Glad it was helpful :)

Drew • 8 months ago

I wrote an article about using dyna-record to make these patterns easy to implement

https://medium.com/@drewdavis888/unlock-relational-data-modeling-in-dynamodb-with-dyna-record-5b9cce27c3ce

Drew • 1 year ago

This is a great read! Thank you for another great article. I recently published a TypeScript npm package encapsulating many of these design patterns and abstracting away the read/write operations and complexity of these patterns using DynamoDB transactions. I am interested to know what the community thinks and if this solves any of the stated concerns regarding complexity.

https://www.npmjs.com/packa...

Nacho Martín Moreno • 1 year ago

Hi, regarding the GraphQL statement "In this flow, our backend is making multiple, serial requests to DynamoDB to fulfill our access pattern": this is not completely accurate. For a root-children resolver it is true, but with GraphQL you can parallelize and run several requests to the downstream services concurrently, especially in complex use cases.

Gary Bisaga • 2 years ago

This is a good discussion and I'm benefiting from your other presentations on DynamoDB. However I have a question that I have not seen addressed thoroughly: cardinality of the partition key. I often hear the partition key needs a high cardinality to avoid hot partition problems, and examples are always given like an on/off value or gender as "not good" partition keys. I have two questions:

1. These are clearly spectacularly bad choices, but it's unclear when other choices are good or bad. On YouTube you did a presentation to the AWS Portsmouth User Group where at 33:12 you have a table with organization as partition key and employee as sort key. Is organization high cardinality? What if you only have 100 organizations but thousands of employees per organization, would that cause a hot partition?

2. I'm confused about how you apply this to GSIs. In another video, there is an example of orders in a store, and one of the attributes is country. The presenter described wanting to query on country, and used the country as the partition key for the GSI. (I presume you'd need a sort key on the GSI since country is not unique, presumably the order id.) My understanding is that a GSI is basically another table with automatic replication. So, does the GSI get into a hot partition problem using country as its partition key?

Also, I have an unrelated question which I also have not seen addressed. In examples you often put multiple entity types in a single table, prefixed with the entity type and a hash sign. My question is, why? When you have a common partition key, I can see one benefit in that you can get data from both sides of the "join". In the Portsmouth User Group presentation I mention above, you could get information BOTH about Berkshire Hathaway AND about the employees in a single query.

But what is the advantage when the partition key is different? Let's take the example you gave in the mini-Instagram clone in your excellent presentation with Marcia Villalba, where you have five different partition keys, for users, photos, likes, etc. Is there an advantage to not putting these into five tables? The main thing I can think of is you might be able to add GSIs across these different item types.

Thank you!

Leonardo Angeli • 2 years ago

I saw this AWS office hours video with Rick Houlihan on YouTube (https://www.youtube.com/wat...) where they show a pattern that uses Lambdas as data resolvers for "query" fields. Do you think this would be a good solution? Although I'm guessing the only "advantage" of GraphQL that remains after using this approach is the data-consumption one, where you don't have to pull the entire query result down into your application, thereby saving on network traffic.

Alex D • 2 years ago

Hey Leonardo -- this is one area where Rick and I disagree a bit :)

I'm mostly against single-table design for GraphQL as I think you're losing a lot of the reason you chose GraphQL in the first place. There are different tradeoffs there that may or may not be right for you, but it's part of the point of GraphQL.

Some further thoughts here: https://www.alexdebrie.com/...

Jason W. • 2 years ago

Hi Alex! Thanks for your articles on this design pattern, they have been a HUGE help to me.

I built this NPM package to take the sting out of maintaining & applying single-table design config across entities & partition shards. Super useful within the project I built it to support, but it's generic enough to be useful outside that context.

Hope it serves!

Alex D • 2 years ago

Hey Jason, great to hear!

Thanks for sharing the library -- I'll take a look!

wcanyon • 3 years ago

I think the date on your article should be near the top of the page. If I'm trying to find when the article was published I scroll down to the bottom of the page. But on this page that gets me to the bottom of the comments, which are large. Then I gotta scroll up and hunt around for the date. I always check dates because I want to make sure I'm not reading something that's 5 years old and probably out of date.

Alex D • 3 years ago

Updated! Thanks for the note

Ramin Hbb • 3 years ago

Good job! I really enjoyed reading this article and learned a lot.

Alex D • 3 years ago

Thanks! Glad you liked it :)

HACKSCOMICON • 3 years ago

banger article. thank you. really cleared some things up for me

Alex D • 3 years ago

Great to hear! :)

David Cuthbert • 3 years ago

How do you get around the 400 kB item size limit for a user? Seems like this would be very limiting to constrain everything about a user to 400 kB. We had considered this pattern recently, but ultimately rejected it due to this DDB hard limit.

Moving to a different engine, alas, is not an option; it's for a regulation-heavy place that moves very slowly to approve new platforms, so it was DDB or RDS.

Alex D • 3 years ago

It's been pretty rare for me to see items that large, other than blob-y like things that could be stored in S3.

Could you break that User item into pieces? Is it storing related data that could be split off into separate items in the same item collection?

David Cuthbert • 3 years ago

In our case, we're storing security groups, load balancers, and related information so we could recreate applications in another region (using DynamoDB Global Tables), for users who aren't familiar with IaC tools (CFN, Terraform) and have set up a bunch of what I jokingly call hand-crafted, artisanal EC2 instances.

Prior to the advent of security group rules as objects, we were storing all of the rules in DDB directly (prefix lists and other security groups weren't expanded). I think I did the calculation and convinced myself that, given the AWS quotas, we would never run afoul of the 400 kB limit here, but I wasn't so sure.

We're about to dive into load balancers for this, but I've realized that the number of listeners, certificates, rules, etc., that can be applied explodes beyond 400 kB if my math is correct.

Having to go to S3 to fetch a JSON document separately, manage its lifetime so it's deleted when my item is deleted, etc., just feels so... undifferentiated-heavy-lifting-ish. I kind of wish I could tell DynamoDB to just stuff this attribute into S3 for me behind the scenes.

Alex D • 3 years ago

Very interesting! Does this information change regularly? And are you querying by many different elements within the information, or just a few high-level elements, like the user and/or application name?

I agree with your broad point on DynamoDB + S3, but there's some weird nuance there that could be tricky in practice. DynamoDB is meant to be a low-level tool that's pretty straightforward in how it works. The one thing I can say is that this is something you implement in one place in your application, in the data access layer, and the rest of the application shouldn't really need to worry about it.

Matt Welke • 3 years ago

In my experience, data this large is usually slow-changing (if mutable at all). Are these things like large pieces of JSON text? Article bodies? Little binaries like profile photos?

It might make sense to normalize that data. If it's binary data, the classic approach I think is to put the binary into something like an S3 bucket, giving you a URL. Then the URL becomes part of the document. This runs into the "extra round trips" problem Alex describes in this post. But usually, for big, slow-changing data, you don't mind this because you'd want to use something else to cache it, like a CDN, so you're already in a position where that data has its own URL and involves a second round trip.

If you ever need to mutate that data, you replace the data in the S3 bucket and leave the URL in the document the same, or make a new object in the bucket, delete the old one, and replace the URL in the document with the new URL. Even if it isn't binary data, I could see this approach working well.
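That replace-and-swap flow can be sketched with in-memory maps standing in for S3 and the DynamoDB table (the function names, key scheme, and `configS3Key` attribute are all hypothetical; real code would use the AWS SDK):

```javascript
// In-memory stand-ins for S3 and a DynamoDB table.
const s3 = new Map();
const table = new Map();
let objectSeq = 0;

// Write the large payload to S3 under a fresh key, then point the item at it.
function putLargeAttribute(userId, payload) {
  const key = `users/${userId}/config-${++objectSeq}.json`; // hypothetical key scheme
  s3.set(key, JSON.stringify(payload));
  const item = table.get(userId) ?? { pk: `USER#${userId}` };
  const oldKey = item.configS3Key;
  item.configS3Key = key;        // the DynamoDB item stores only the pointer
  table.set(userId, item);
  if (oldKey) s3.delete(oldKey); // clean up the superseded object
}

// Read the item, then follow the pointer to fetch the payload from S3.
function getLargeAttribute(userId) {
  const item = table.get(userId);
  return item ? JSON.parse(s3.get(item.configS3Key)) : null;
}
```

Writing the new object before swapping the pointer means readers never see a dangling key; the old object is only deleted after the item points elsewhere.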

If the data is big (bigger than 400KB) and changes frequently, this might not work so well.

Hyunil Kim • 3 years ago

A very informative and eye-opening article. Thank you.

How would you approach single-table design for a microservices architecture? Would it be one big single table for all services, or rather a table per service?

Justin Menga • 4 years ago

In my experience I have found it is best to avoid nested GraphQL resolvers if you want to use single-table design with multiple related entities. This means your resolvers are "root"-level resolvers, i.e. they must resolve the entire query, and resolvers should only be defined for fields under the Query type. This puts more work into the resolver, but ultimately you are not constrained by the serialized semantics of the GraphQL resolver model, and it allows more flexibility in how you can execute and optimize any additional DB queries that may be required to construct the full response.
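A root-level resolver along these lines might run one query over the item collection and assemble the nested response itself; a sketch with a hard-coded result standing in for the DynamoDB Query call (the key and attribute names are illustrative):

```javascript
// Simulated result of one DynamoDB Query over a user's item collection
// (entity-type prefixes on the sort key are a common convention, not a rule).
const queryResult = [
  { pk: 'USER#alexdebrie', sk: 'USER#alexdebrie', type: 'User', username: 'alexdebrie' },
  { pk: 'USER#alexdebrie', sk: 'ORDER#o-1', type: 'Order', orderId: 'o-1', total: 25 },
  { pk: 'USER#alexdebrie', sk: 'ORDER#o-2', type: 'Order', orderId: 'o-2', total: 40 },
];

// Root resolver: resolve the entire `user` query in one pass, nesting
// child entities under the parent instead of relying on field resolvers.
function userRootResolver(items) {
  const user = items.find((i) => i.type === 'User');
  return { ...user, orders: items.filter((i) => i.type === 'Order') };
}

const response = userRootResolver(queryResult);
```

Because the whole tree is built here, no nested resolver ever needs to touch the database.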

Paritosh Agrawal • 4 years ago

Good article!
Question: what would be the maximum depth of denormalization? What if I have category (PK), subcategories (SK), and product (?)? Would having an index on product be enough?
How should one balance single-table design against relational design? You could need multiple single tables as well, right?

Yaron Miro • 4 years ago

Hi Alex,
Wonderful article! You explain everything with clarity and precision; every time I encountered one of your articles or video presentations I felt the same. Thank you for sharing your wisdom and knowledge.

MK • 4 years ago

I am trying to determine whether it's still useful to use a GUID for the actual key value. I understand the value in being able to look at the key and make sense of the data, but for a lot of objects (e.g. 'File', 'Folder') it sort of locks you in if the user is able to change them later. I guess it just depends on what you are modeling? A customer's email isn't going to change. But for others, assigning a surrogate key makes more sense?

Dina Basumatary • 4 years ago

Hey Alex,
Great post. I've been implementing GraphQL applications with relational databases and it works out great! Things obviously get messy when the data model does not reflect the GraphQL schema, but it's nothing that cannot be handled.

I've also worked with DDB in the past, including single-table design. I've been wondering why AWS Amplify provided a relational (multi-table) design with DDB. It makes sense now. You'd lose money to complexity and still end up paying as if it were multi-table. I guess we could still hack around GraphQL resolvers to preprocess the query separately, understand what is required, and make a single request to DDB, but why go to such lengths when things are already so complicated.

I was wondering what you meant by this statement:

Further, if you are using DynamoDB On-Demand pricing, you won’t save any money by going to a single-table design.

Subash Adhikari • 4 years ago

I read your book and it's awesome. I was wondering how you would handle streams with single-table design. Generally I have a Lambda function processing the stream. Wouldn't it be hammered too hard with this approach, and might it require filtering logic for each entity type? What are your suggestions for this?
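The per-entity filtering that question describes often ends up as a small dispatch table in the stream handler; a sketch (the `PK` attribute name and `TYPE#id` key convention are assumptions, and stream records arrive in AttributeValue shape, e.g. `{ S: 'USER#1' }`):

```javascript
// Route DynamoDB Stream records to per-entity handlers based on the
// partition key prefix.
const handlers = {
  USER: (record) => { /* e.g. sync the user to a search index */ },
  ORDER: (record) => { /* e.g. update order aggregates */ },
};

function handleStreamEvent(event) {
  const routed = { USER: 0, ORDER: 0, skipped: 0 };
  for (const record of event.Records) {
    const pk = record.dynamodb.Keys.PK.S;
    const entityType = pk.split('#')[0];
    if (handlers[entityType]) {
      handlers[entityType](record);
      routed[entityType] += 1;
    } else {
      routed.skipped += 1; // unknown entity types are ignored
    }
  }
  return routed;
}

// A stream event touching three entity types; only two have handlers.
const routed = handleStreamEvent({ Records: [
  { dynamodb: { Keys: { PK: { S: 'USER#1' } } } },
  { dynamodb: { Keys: { PK: { S: 'ORDER#1' } } } },
  { dynamodb: { Keys: { PK: { S: 'ORDER#2' } } } },
  { dynamodb: { Keys: { PK: { S: 'WIDGET#1' } } } },
] });
```

If one function can't keep up, the same prefix test can be pushed into per-entity Lambda event filtering so each consumer only sees its own records.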

Tim Weisbrod • 4 years ago

With GraphQL, does anyone know if you could still use a single-table structure and then, in somewhat the opposite of a lazy-loading model (call it "eager loading"), have earlier steps in the resolver pipeline preload data when they can efficiently load it in one call, with later steps in the pipeline skipping their data access calls if the data has already been fetched?

Not sure if the initial resolver can make the right DynamoDB query to do so, but if it can, that might allow for single table designs to still provide performance benefits for some of the access patterns. Thoughts?
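One way to sketch that eager-loading idea is a per-request cache on the resolver context: the first resolver stashes the child data it already fetched, and later resolvers check the cache before issuing their own query (all names here are illustrative):

```javascript
// Per-request context with a primed cache shared across resolver steps.
function makeContext() {
  return { cache: new Map(), dbCalls: 0 };
}

function userResolver(ctx, username) {
  ctx.dbCalls += 1; // one query fetched the user AND its orders
  const orders = [{ orderId: 'o-1' }, { orderId: 'o-2' }];
  ctx.cache.set(`orders:${username}`, orders); // eager-load for later steps
  return { username };
}

function ordersResolver(ctx, user) {
  const cached = ctx.cache.get(`orders:${user.username}`);
  if (cached) return cached; // skip the data access call entirely
  ctx.dbCalls += 1;          // fallback: fetch as usual
  return [];
}

const ctx = makeContext();
const user = userResolver(ctx, 'alexdebrie');
const orders = ordersResolver(ctx, user);
```

The pipeline still degrades gracefully: if the parent couldn't preload, the child resolver falls back to its own query.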

Benoît Bouré • 4 years ago

I agree. As Antonio Presto also said, there is no reason to have a resolver for each "level". You could probably get away with fetching the user and all of his/her orders in one DynamoDB query (using a single table) from the parent resolver, "nesting" the records, and returning the whole thing (no need for an Orders resolver).

There are some limits to that, though. If you have deeply nested entities in your request (e.g. you also need the Orders' items), you will have to do it in an additional resolver and DynamoDB query.

That said, even though GraphQL allows it, in real-world applications your app will probably almost never query deeply nested relations in one go. You will more likely see something like: one GraphQL query for the User and Orders on one page, and when the user clicks an order, another query to get the selected Order and its items. (Otherwise, it kind of defeats one of the purposes of GraphQL: no overfetching.)

Chandan Jha • 4 years ago

Excellent piece of information. Enjoyed it thoroughly. Thank you for sharing it!!

Eric Cartman • 4 years ago

Hey the video link of the debate with Rick on twitch is broken, could you update that by any chance?

Sergey Kryvets • 4 years ago

You explained it in a way that a 5-year-old can understand. Very clear and informative. Thanks a lot! Please keep up the good work!

Alex D • 4 years ago

Thanks, Sergey Kryvets! Glad you liked it :)