Gitlab's praefect enormous wal generation on postgresql

Question

We have praefect's database about 500 Mb in size, but wal generation per week is relly huge - about 1Tb. Most frequent query is being executed ~8 times/minute:

WITH lock AS (
    SELECT
        id
    FROM
        replication_queue_lock
    WHERE
        id LIKE $1 || $4 || $2 || $5 AND
        NOT acquired
    FOR UPDATE SKIP LOCKED
)
, candidate AS (
    SELECT
        id
    FROM
        replication_queue
    WHERE
        id IN (
            SELECT DISTINCT
                first_value(queue.id) OVER(PARTITION BY lock_id, job ->> $6 ORDER BY queue.created_at)
            FROM
                replication_queue queue
            JOIN
                lock
                    ON queue.lock_id = lock.id
            WHERE
                queue.state IN (
                    $7
                ,   $8
                ) AND
                NOT EXISTS(
                    SELECT
                        $9
                    FROM
                        replication_queue_job_lock
                    WHERE
                        lock_id = queue.lock_id
                )
        )
    ORDER BY
        created_at
    LIMIT $3
    FOR UPDATE
)
, job AS (
    UPDATE
        replication_queue queue
    SET
        attempt = CASE
            WHEN job ->> $10 = $11 THEN
                queue.attempt
            ELSE
                queue.attempt - $12
        END
    ,   state = $13
    ,   updated_at = timezone(
            $14
        ,   now()
        )
    FROM
        candidate
    WHERE
        queue.id = candidate.id
    RETURNING
        queue.id
    ,   queue.state
    ,   queue.created_at
    ,   queue.updated_at
    ,   queue.lock_id
    ,   queue.attempt
    ,   queue.job
    ,   queue.meta
)
, track_job_lock AS (
    INSERT INTO replication_queue_job_lock(
        job_id
    ,   lock_id
    ,   triggered_at
    )
    SELECT
        job.id
    ,   job.lock_id
    ,   timezone(
            $15
        ,   now()
        )
    FROM
        job
    RETURNING
        lock_id
)
, acquire_lock AS (
    UPDATE
        replication_queue_lock lock
    SET
        acquired = $16
    FROM
        track_job_lock tracked
    WHERE
        lock.id = tracked.lock_id
)
SELECT
    id
,   state
,   created_at
,   updated_at
,   lock_id
,   attempt
,   job
,   meta
FROM
    job
ORDER BY
    id;

Is there any way to tune application to slow this thing down?

score 0 · Answer 1 · answered Jan 30 '23 at 18:22

Your "lock" CTE is run to completion, even if the LIMIT means that only a few rows of it get used in a meaningful way. That means that every row which satisfies WHERE id LIKE ... gets locked, generating WAL. And then that is done over and over again, as only a LIMITed number of rows are changed in such a way that causes them to fail the WHERE clause next time.

Now we can't actually see how big your LIMIT is, or how many rows match the WHERE id LIKE ..., because you haven't given us that information. Nor do we know what the whole purpose of this query is, nor whether you can change it all since it seems to be coming from some product outside your control.

We also don't know why this is a problem. Is it too slow? Are you charged by the IO? Are you wearing out your SSD to quickly? Is your replication failing to keep up?

Gitlab's praefect enormous wal generation on postgresql

1 Answers1