How to perform schema migration from array to many-to-many to lookup table?

Question

I currently am storing values in a column of type array like follows (this is just an example):

CREATE TABLE  blog_posts(
    id INTEGER PRIMARY KEY,
    blog_title TEXT,
    tags TEXT[]
);

With the column tags being an array of text values. I have since realized that due to the way I will be querying this data, I want to switch from using an array column to store my labels to a many-to-many lookup table instead.

CREATE TABLE  blog_posts(
    id INTEGER PRIMARY KEY,
    blog_title TEXT,
);

CREATE TABLE tags(
    id INTEGER PRIMARY KEY,
    label TEXT

);

CREATE TABLE blog_tags(
    blog_id INTEGER REFERENCES blog_posts(id),
    tag_id INTEGER REFERENCES tags(id)
);

I know how to do the alter schema commands needed to perform the migration for the schema itself but my question is how can I write a migration that will correctly transfer the data from the array to the new tables as well? (Also, I would preferably want the equivalent down migration)

Should I just write the SQL for the SELECTs and INSERTs that would be needed to accomplish this and have those run as part of the migration? Or is having any sort of SELECT or INSERT in a migration script considered a bad practice?

Should I even be trying to do a scripted migration like this at all and should this just be a manual step I have to do?

Is there some other method I have completely missed here?

Doing the needed SELECTs and INSERTs as part of the migration seems like the right approach, but I have never done a migration that required that so I find myself uncertain if that is the right approach.

This is all on PostgreSQL 9.6

score 1 · Answer 1 · answered Jan 24 '20 at 21:01

Basically, I suggest a many-to-many design like this:

CREATE TABLE blog (              -- I suggest singular terms
   blog_id serial PRIMARY KEY    -- and descriptive names
 , blog    text NOT NULL         -- NOT NULL!
);

CREATE TABLE tag (
   tag_id serial PRIMARY KEY
 , tag    text NOT NULL          -- NOT NULL!
);

CREATE TABLE blog_tag (
   blog_id int REFERENCES blog
 , tag_id  int REFERENCES tag
 , PRIMARY KEY (blog_id, tag_id) -- !
);

See:

How to implement a many-to-many relationship in PostgreSQL?

And the migration can be done efficiently with a a single command using data-modifying CTEs:

WITH org AS (TABLE blog_posts)       -- read original table once
, ins_blog AS (                      -- fill table blob ...
   INSERT INTO blog (blog_id, blog)  -- ... keeping original IDs for simplicity
   SELECT id, blog_title
   FROM   org
   )
, bt AS (
   SELECT unnest(tags) AS tag, id    -- unnest tags & keep link to blog_id
   FROM   org
   )
, ins_tag AS (                       -- fill table tag ...
   INSERT INTO tag (tag)
   SELECT DISTINCT tag               --  ... with distinct tags
   FROM   bt
   ORDER  BY 1                       -- optionally order tags while being at it
   RETURNING tag_id, tag             -- new tag_id + connected tag
   )
INSERT INTO blog_tag (blog_id, tag_id)
SELECT id, tag_id
FROM   bt
JOIN   ins_tag USING (tag)           -- join to unnested tags, voilá
ORDER BY 1,2;  -- optional

Insert data in 3 tables at a time using Postgres

Don't forget to sync the sequence for table blog:

SELECT setval(pg_get_serial_sequence('blog', 'blog_id'), max(blog_id)) FROM blog;  -- sync blog serial

See:

How to reset Postgres' primary key sequence when it falls out of sync?

Consider an additional index on (tag_id, blog_id). See:

Is a composite index also good for queries on the first field?

db<>fiddle here

How to perform schema migration from array to many-to-many to lookup table?

1 Answers1