3

I need to model products with properties. Properties need to be dynamic, in the sense that application users can add / delete properties to products. For this reason, I am using a JSONB column storing a JSONB object to model those dynamic properties, with each field of the object a dynamic property, for instance:

{
  product_number: "snsug1",
  color: "white",
  weight: [100, 200, 300]
  ...
}

Furthermore, as can be seen in the example above, some properties are allowed to have multiple values, as some products can come in different configurations (different values for the weight property for instance, for packets of sugar of different sizes). I am modeling those as JSONB arrays.

I need to be able to select products based on some filter values for the properties.

Filtering based on comparisons on single value properties are straightforward (for instance where properties->'color' = '"white"'::jsonb) and overlap queries for multi-value properties are straightforward too when the overlap target contains discreet values, like where properties->'weight' @> any(array[to_jsonb(200), to_jsonb(300)]).

The problem comes with filtering on a multi-value property based on overlap with a range of values (instead of overlap with discreet values as above). So my question is: how can I filter on a multi-value property like weight, with criteria that any of the values in the array should fall with a range?

Something like (pseudo code):

select id from products where properties->'weight' @> range(50, 150);

should return the entry in my example above, but

select id from products where properties->'weight' @> range(350, 500);

should not.

The weight @> range(50, 150) bit in the pseudo code query above is what I am asking about. What should that look like?

If I am barking up the wrong tree altogether and this is not doable with my current model, then what would be a good alternative for solving my problem? Is there a more appropriate way of modeling the data that can lead to elegant and fast queries of the type above? The solution needs to perform well on large data sets, it needs to be able to make use of indexes for the above types of queries.

yakutori
  • 31
  • 1
  • 4

3 Answers3

4

You need to use a sub-query that unnests all array elements together with an EXISTS predicate:

select p.*
from products p
where exists (select *
              from jsonb_array_elements(p.properties->'weight') as t(weight)
              where t.weight::int <@ range(50, 150));
1

If this is dynamic, how do you know that weight exists and is an array of int in the first place, in order to write the query?

There is (strangely) no function or operator to convert a JSONB array to a PostgreSQL array, you need a helper function to do that:

CREATE OR REPLACE FUNCTION public.jsonb_to_intarray(jsonb)
 RETURNS int[]
 LANGUAGE sql
 IMMUTABLE
AS $function$
select array_agg(x::int) from
   (select jsonb_array_elements($1)) as f(x);
$function$

Once that is in place, you can use the ANY feature of PostgreSQL:

select id from products where 
    int4range(50, 150) @> any(jsonb_to_intarray(properties->'weight'));
jjanes
  • 42,332
  • 3
  • 44
  • 54
1

If you chose to operate with a Postgres array, use an efficient conversion function with an ARRAY constructor:

CREATE OR REPLACE FUNCTION pg_temp.jsonb_arr2int_arr(_js jsonb)
  RETURNS int[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT jsonb_array_elements_text($1)::int)';

(Assuming all values are indeed integer.)

See:

But you can operate on the nested JSON array in the jsonb column directly, too. The ANY construct also accepts a set. See:

So:

SELECT *
FROM   products
WHERE  int4range(50, 150) @> ANY ((SELECT jsonb_array_elements_text(properties->'weight')::int));

Either way, if keys like 'weight' are only present in relatively small subsets of all rows, the query can profit from index support. While I don't see an easy way to employ an index for the range containment itself, a default jsonb_ops GIN index should be instrumental to identify rows with a 'weight' property to begin with:

CREATE INDEX products_properties_idx ON products USING GIN (properties);

SELECT *
FROM   products
WHERE  int4range(50, 150) @> ANY ((SELECT jsonb_array_elements_text(properties->'weight')::int))
AND    properties ? 'weight';  --  logically redundant, to get idx support

If, on the other hand, 'weight' is present in most rows or even just 20 % or so, consider adding a dedicated column to your table instead. NULL storage is very cheap. This would then be cleaner, more efficient and easier to query.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633