Query by JSON data & Indexing

In my game, I have 3 different game states: Joinable, Ongoing & Complete. I store this state information inside value of a specific collection.

When searching for a specific type of game, I want to be able to run a query that returns games of a certain state. I understand that I must use SQL for this on the JSON column type in the nakama.storage table.

Here’s an example of an SQL query I will run:

SELECT collection, key, user_id, value FROM storage WHERE value @> '{"state":"Complete"}'

Here’s an example of another query where I look for a game that can be joined:
SELECT collection, key, user_id, value FROM storage WHERE value @> '{"state":"Joinable"}' AND value->>'minPlayers' >= $1 AND value->>'maxPlayers' <= $2

What’s the best way for me to create an index given cockroachdb?

Something like this, maybe?:
CREATE INDEX IF NOT EXISTS idx_car_model ON storage USING GIN ((value->'state'))

1 Like

Hey @Ndidi. This is a great post. Thanks for the question.

I’ll share as much detail as I can on what approach we recommend that will scale well with Nakama with these kinds of requirements but first some context.

Nakama’s storage engine API supports key-value batch gets, updates, and deletes as well as listings to return a range of objects in a collection (which optionally belong to a specific user). We don’t have an official partial update API yet or the option to filter within the contents of storage objects but this does not mean it’s not possible with the server.

As you’ve observed its possible to drop under the official API and use custom SQL queries to access your data in an optimal way. I’d definitely recommend this approach when you know your access patterns to subsets of your data like in your example and can set custom indexes.

SELECT
    collection, key, user_id, value
FROM
    storage
WHERE
    value @> '{"state":"Complete"}'

The SQL query example you’ve given needs to access a field called "state" with a value called "Complete" or similar. You’re using the "@>" (containment operator) which works well with a GIN index:

CREATE INDEX IF NOT EXISTS 
    storage_value_ginidx
ON
    storage
USING GIN(value);

Now if you’re certain that you’ll have specific states to the value for the field you want to query on within the storage object you could target your index to be more specific with a BTREE index and an index expression (NOTE: These only work in Postgres although support for them will come soon with cockroachdb).

CREATE INDEX IF NOT EXISTS
    storage_value_state_idx
ON
    storage ((value->>'state'))
WHERE
    (value->>'state') = ANY(ARRAY['Joinable', 'Complete']);

With a BTREE index you should use a different operator to access the data to take advantage of it.

SELECT
    collection, key, user_id, value
FROM
    storage
WHERE
    value->>'state' = 'Complete'

When you’ve created a custom index which will best fit the data you can verify what the performance profile will look like using an EXPLAIN statement. You want to look to avoid full table scans which are the worst performance you can achieve with any database. :slight_smile:

EXPLAIN (VERBOSE) SELECT collection, key, user_id, value FROM storage WHERE value->>'state' = 'Complete' LIMIT 100;

                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Limit  (cost=0.12..8.14 rows=1 width=596)
   Output: collection, key, user_id, value
   ->  Index Scan using storage_value_state_idx on public.storage  (cost=0.12..8.14 rows=1 width=596)
         Output: collection, key, user_id, value
         Index Cond: ((storage.value ->> 'state'::text) = 'Complete'::text)
(5 rows)

You can see that the query analyzer would use an “Index Scan” which is what you’d want rather than a sequence scan.

You should also remember to make sure that your queries use LIMIT to set an upper bound on the number of rows returned. If you return large ranges of rows in your results it will become a source of performance problems as the datasets grow.

In your later query you’re looking at more than one field.

SELECT
    collection, key, user_id, value
FROM
    storage
WHERE
    value @> '{"state":"Joinable"}'
    AND value->>'minPlayers' >= $1
    AND value->>'maxPlayers' <= $2

In this query because it includes ranges of results a composite index would not be optimal so instead you’d likely need to combine a few individual indexes. The specifics of how best to structure your indexes is one of the reasons we’ve not yet come up a generalized way to solve this in our storage engine API.

Finally for where to set these indexes you’ll want to apply them slightly differently based on whether your server-side logic is in Lua or Go.

In Lua use run_once.

local nk = require('nakama')
nk.run_once(function(context)
    local params = {}
    nk.sql_exec([[
CREATE INDEX IF NOT EXISTS ... etc
]], params)
end)

In Go you’d use the standard SQL driver in the InitModule function.

func InitModule(ctx context.Context, logger runtime.Logger, db *sql.DB, nk runtime.NakamaModule, initializer runtime.Initializer) error {
    exec, err := db.ExecContext(ctx, `
CREATE INDEX IF NOT EXISTS ... etc
`)
    if err != nil {
        return err
    }
    logger.Debug("index created: %v", exec.RowsAffected())
    return nil
}

Hope this helps.

4 Likes

Thanks @novabyte! This massively clears things up as well as highlighting BTREE indexing not being available in CockroachDB.

@Ndidi I’m glad it helped. Just one extra note. cockroachdb does support BTREE indexes but does not yet support indexing fields inside JSONB (i.e. "ON storage ((value->>'state'))") or conditional indexes (i.e. "WHERE (value->>'state') = ANY(ARRAY['Joinable', 'Complete'])") for use as part of a BTREE index. You can track their progress on this issue: https://github.com/cockroachdb/cockroach/issues/9682.

You can and should use the GIN index with the containment operator which works great. :+1:

1 Like

Thanks, @Ndidi for asking the query and @novabyte for clearing the confusion in detail.

I want to understand, as the question was answered more than two years back, do we have any APIs now for the indexing and retrieval on the basis of JSONB fields.