0

Based on this original question and the follow up question from 2015.

Suppose I have a jsonb column data within a table tracks. This column holds an array with multiple keys e.g. Artist and Title in the JSON data. So we have something like:

INSERT INTO tracks (id, data)
VALUES (1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]');

I understand that there are two Options to index the key artist:

  1. GIN Index on data as a whole

    CREATE INDEX tracks_artists_gin_idx ON tracks
    USING GIN ((data));
    
  2. A functional Index using for example json2arr described below

    CREATE INDEX tracks_artists_gin_idx ON tracks
    USING GIN (json2arr(data, 'artist'));
    

    CREATE OR REPLACE FUNCTION json2arr(_j json, _key text) RETURNS text[] LANGUAGE sql IMMUTABLE AS 'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';

    --Index Usage: SELECT * FROM tracks WHERE '{"Simple Plan"}'::text[] <@ (json2arr(data, 'artist'));

In my case, the data array is huge and has many more keys. So Option 1 is a bad idea. Is there a third option in 2021? Or is Option 2 still the way to go?

EDIT 16.02. : This is the real world WHERE condition to be optimized:

 WHERE event @> '{"data":{"allFahrtabschnitt":[{"fahrtnummer":"62786"}]}}';

EDIT 22.02. : Option 2 works so far and delivers great results. I would still be interested if other Options exist that can use the standard jsonb Syntax.

 CREATE OR REPLACE FUNCTION events.json2arr(_j jsonb, _key text)
     RETURNS text[] LANGUAGE sql IMMUTABLE AS
 'SELECT ARRAY(SELECT elem->>_key FROM jsonb_array_elements(_j) elem)';

CREATE INDEX interface_dk_fahrt_v4_fahrtnummer_gin_idx ON events.interface_dk_fahrt_v4 USING GIN (events.json2arr(event->'data'->'allFahrtabschnitt', 'fahrtnummer'::text)); -- completed in 5 m 33 s 209 ms

-- Selection - must match Index-Syntax! SELECT metaid FROM events.interface_dk_fahrt_v4_20210219 WHERE 1=1 AND '{"31650"}'::text[] <@ (events.json2arr(event->'data'- >'allFahrtabschnitt', 'fahrtnummer'::text)); -- 127ms

-- Selection with incorrect Syntax -> Index isnt used SELECT metaid FROM events.interface_dk_fahrt_v4_20210219 WHERE 1=1 AND event @> '{"data":{"allFahrtabschnitt":[{"fahrtnummer":"31650"}]}}'; -- 5 Minutes

EDIT 09.04. : Here is a new option that we learned when looking into json_path_query that needs no custom functions:

CREATE INDEX interface_dk_fahrt_v4_fahrtnummer_gin_idx ON events.interface_dk_fahrt_v4 
USING gin (jsonb_path_query_array (event, '$.data.data.allFahrtabschnitt.fahrtnummer[*]));
HolgerDB
  • 1
  • 1

0 Answers0