Storing large amounts of data efficiently in Storage

This is a continuation of this issue on GitHub:

To give a summary, there is some JSON data stored in a storage table. Clients need to have the latest version of this data to start the game. Let’s call this data definitions.

A couple of questions arise when dealing with definitions.

1- What is the most efficient way to store definitions? Should everything be in one entry, or should we separate them as much as possible?

2- How much data, in bytes, do we need before we should even start worrying about database read performance? (e.g., if there is only 1 KB of definitions, it doesn’t really matter what you do; it’s going to be fine.)

Also, in the GitHub issue, it was mentioned that we should look at which game screens need definitions at game start. In my use case, we prefer to get all definitions for every game screen at startup to avoid any unnecessary loading time when traveling between game screens.

2 Likes

@mnarimani This is a good set of questions. I’ll answer each inline.

What is the most efficient way to store definitions? Should everything be in one entry, or we should separate them as much as possible?

It would be really helpful if you could provide more specific details about the entities that make up these definitions. For example, are they collectible card game stats, an inventory system, or a shop catalog? The details do matter for what the best logical breakdown of these data structures looks like.

How much data in bytes should we have to even start worrying about database read performance?

There’s two ways to consider this problem (though it differs slightly between AWS Aurora, Postgres, and CRDB):

  1. We limit batch reads to 100 objects per batch request, though we may lift this to 1000 in a newer release. Anything above 10,000 storage objects in a single read will start to degrade over time as the dataset grows. This is just a functional restriction on how IO is utilized by database engines on disk.

  2. In general you want to denormalize your data objects into logical groups; see the sketch below. For example, I’d put all collectible card definitions in a single storage object. The upper limit on a storage object depends on the specific database engine you use, but in Postgres it’s 1 GB, and in CRDB I’d suggest you keep it to around 64 MB max.
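Here is a minimal sketch of the “logical groups” idea, not the actual storage API. The collection name ("game_definitions"), the group keys, and the writeStorageObjects stub are hypothetical placeholders; swap in the batched storage write call of whichever server runtime or client SDK you use.

```typescript
interface DefinitionWrite {
  collection: string; // shared namespace for all definition objects
  key: string;        // one key per logical group of definitions
  value: object;      // that group's JSON payload
}

// Each logical group becomes one storage object instead of one giant blob.
const definitionWrites: DefinitionWrite[] = [
  { collection: "game_definitions", key: "cards",  value: { /* card definitions */ } },
  { collection: "game_definitions", key: "shop",   value: { /* shop catalog */ } },
  { collection: "game_definitions", key: "levels", value: { /* level configs */ } },
];

// Hypothetical stand-in for a batched storage write.
async function writeStorageObjects(writes: DefinitionWrite[]): Promise<void> {
  // ...replace with the real write call; a handful of grouped objects stays
  // far below the per-batch limits mentioned above.
  console.log(`writing ${writes.length} definition objects`);
}

async function publishDefinitions(): Promise<void> {
  await writeStorageObjects(definitionWrites);
}
```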

In my use case, we prefer to get all definitions for every game screen at startup to avoid any unnecessary loading time when traveling between game screens.

Sure, this approach is fine too. You could also consider a foreground load at game start and a background load for data which will not be “seen” by the player for a little while, perhaps because of the layout of the UI in the game client.
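Something like the following rough sketch of the foreground/background split, assuming the definitions were written as per-group objects. readDefinitions, applyDefinitions, and the group names are hypothetical placeholders for your own storage read call and game code.

```typescript
// Hypothetical stand-in for a batched storage read of the listed group keys.
async function readDefinitions(keys: string[]): Promise<Record<string, object>> {
  // ...replace with the real read call
  return Object.fromEntries(keys.map((key) => [key, {}]));
}

function applyDefinitions(defs: Record<string, object>): void {
  // ...hand the loaded groups to the game's definition registry
}

async function loadDefinitionsAtStartup(): Promise<void> {
  // Foreground: block the loading screen only on what the first screen needs.
  applyDefinitions(await readDefinitions(["cards", "player_config"]));

  // Background: fetch the rest without holding up the player.
  void readDefinitions(["shop", "levels", "events"]).then(applyDefinitions);
}
```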

Hope this helps.

3 Likes

@novabyte That was a great response. Thank you.

In the end, I chose to split the data into logical groups as much as possible.
Even though in our case (~500 KB of data) it doesn’t make much difference in performance, it makes a big difference in ease of use. With split data it’s much easier for several people to work on it simultaneously (there won’t be a lot of writes rejected due to version conflicts).
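For anyone reading later, here is a sketch of why the split helps concurrent editing: each group object carries its own version, so a conditional write only conflicts with edits to that same group. The types and the writeWithVersion stub are hypothetical placeholders for the storage engine’s actual conditional write.

```typescript
interface VersionedGroup {
  key: string;     // e.g. "cards" or "shop"
  value: object;   // the group's JSON payload
  version: string; // version observed when the group was last read
}

// Hypothetical conditional write: accepted only if the stored version still
// matches, otherwise rejected so the caller can re-read and retry.
async function writeWithVersion(group: VersionedGroup): Promise<"ok" | "rejected"> {
  // ...replace with the storage engine's conditional write
  return "ok";
}

async function saveCardDefinitions(cards: VersionedGroup): Promise<void> {
  if ((await writeWithVersion(cards)) === "rejected") {
    // Only the "cards" group needs a re-read and retry; concurrent edits to
    // "shop", "levels", etc. live in separate objects and are unaffected.
    console.warn("card definitions changed underneath us; re-read and retry");
  }
}
```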

3 Likes