1

We have two tables:

CREATE TABLE listing_master (
    id SERIAL PRIMARY KEY,
    feed_id character varying(20) NOT NULL,
    feed_listing_id character varying(20) NOT NULL,
    updated_time timestamp(0) without time zone NOT NULL
);

-- Indices -------------------------------------------------------

CREATE UNIQUE INDEX listing_master_pkey ON listing_master(id int4_ops); CREATE UNIQUE INDEX listing_master_feed_id_feed_listing_id_idx ON listing_master(feed_id text_ops,feed_listing_id text_ops);

And

CREATE TABLE listing (
    id SERIAL PRIMARY KEY,
    mls_num character varying(20),
    feed_listing_id character varying(20) NOT NULL,
    feed_id character varying(255) NOT NULL,
    list_price double precision,
    postal_code character varying(255) DEFAULT NULL::character varying,
    photos_count integer,
    unparsed_address character varying(255) DEFAULT NULL::character varying,
    city character varying(255) DEFAULT NULL::character varying,
    status character varying(20) NOT NULL DEFAULT 'new'::character varying,
    processing_status character varying(20) NOT NULL DEFAULT 'none'::character varying,
    last_update_from_feed timestamp(0) without time zone DEFAULT NULL::timestamp without time zone,
    images_data jsonb,
    coordinates point,
    raw_data jsonb,
    deleted_date timestamp(0) without time zone DEFAULT NULL::timestamp without time zone,
    state_or_province text,
    country text,
    type character varying(40) DEFAULT NULL::character varying,
    ownership_type character varying(40) DEFAULT NULL::character varying,
    bedrooms integer,
    living_area integer,
    lot_size integer,
    year_built integer
);
COMMENT ON COLUMN listing.images_data IS '(DC2Type:json_array)';

-- Indices -------------------------------------------------------

CREATE UNIQUE INDEX listing_pkey ON listing(id int4_ops); CREATE UNIQUE INDEX listing_feed_id_feed_listing_id_idx ON listing(feed_id text_ops,feed_listing_id text_ops); CREATE UNIQUE INDEX listing_mls_num_feed_id_state_or_province_idx ON listing(mls_num text_ops,feed_id text_ops,state_or_province text_ops) WHERE state_or_province IS NOT NULL AND status::text = 'live'::text AND mls_num IS NOT NULL AND deleted_date IS NULL;

listing_master is populated from an API call and holds all the records that eventually have to exist in the listing table.

So, let's assume listing_master has 100k records listing has somewhat less

In order to verify that I ran this query:

select * 
from listing_master 
where feed_id = 'ddf' 
 and feed_listing_id not in (select feed_listing_id 
                             from listing 
                             where feed_id = 'ddf' 
                               and deleted_date is null );

And, as promised, this gave me 136 records back.

Now, I want to populate the missing listings from listing_master into listing, and trying it this way:

insert into listing(feed_id,feed_listing_id,status,processing_status) 
select lm.feed_id, lm.feed_listing_id, 'new' as status,'none' as processing_status 
from listing_master lm 
on conflict (feed_id,feed_listing_id) do nothing;

For some reason, this inserts 0 records, while it clearly should add 136 of them. What am I missing?

I am in PostgreSQL 12.

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
Alex A.
  • 163
  • 1
  • 8

1 Answers1

2

I see an elephant in the room:

... and deleted_date is null

There can be rows with non-null deleted_date, which are ignored by your test with SELECT but still conflict in the unique index on (feed_id,feed_listing_id).

Aside, NOT IN (SELECT ...) is almost always a bad choice. Even if it's not going to break completely with all NOT NULL columns in your case, it still typically delivers bad performance. See:

Use NOT EXISTS instead:

SELECT * 
FROM   listing_master  m
WHERE  feed_id = 'ddf' 
AND    NOT EXISTS (
   SELECT FROM listing 
   WHERE  feed_id = 'ddf' 
   AND    feed_listing_id = m.feed_listing_id
   AND    deleted_date is null                 -- ?????
   );

What's more, even with deleted_date is null out of the way, this expectation of yours is strictly incorrect:

... while it clearly should add 136 of them.

You only tested for feed_id = 'ddf'. There can be any number of additional rows with a different feed_id. So it would be "at least 136 of them".


If that's not it, we need to talk about possible concurrent write access. Related:

Or possible index corruption. In this case the cure would be:

REINDEX INDEX listing_feed_id_feed_listing_id_idx;
Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633