0

I have a relation where a User has multiple dogs (as many as 15) but each dog is contained in a single row in the table, and they all have a userId in common.

For example, Table `dogs`:
|   User    |    Dog Name     |  Age
|   ABCD    |      Fido       |  7
|   ABCD    |      Gooofy     |  9
|   ABCD    |      Toto       |  4
|   ABCD    |      Roger      |  12
|   ABCD    |      Barkley    |  3

I would like to do something like SELECT * FROM dogs GROUP BY User where the result will output a csv of ABCDs dogs, as follows:

ABCD, Fido, 7, Goofy, 9, Toto, 4, Roger, 12, Barkley, 3

I'm using BigQuery but I think knowing how to do this in PostgreSQL may even be helpful. So maybe I can create some sort of view or something?

Vérace
  • 30,923
  • 9
  • 73
  • 85
Kamilski81
  • 231
  • 3
  • 13

1 Answers1

1

Discussion:

To resolve this, I did the two things - the first is that I answered the question as asked and then I answer the question in what I think is more robust way.

This "better" way is not to store age as a field, because by definition, age is mutable - it increments every year, so you should do any calculations of age on the fly relative to the date upon which the age was calculated by comparing it to something which is immutable, i.e. the birth date/year.

Part 1: Question as asked.

For the question as asked, see this fiddle for PostgreSQL code and this one for MySQL.

Create our table:

CREATE TABLE dog
(
  owner TEXT NOT NULL,
  name  TEXT NOT NULL,
  age   SMALLINT NOT NULL
);

Populate it:

INSERT INTO dog VALUES
('abc', 'fido', 4),
('abc', 'tim',  6),
('abc', 'max',  9),
('abc', 'fred', 1),
('xyz', 'joe',  4),
('xyz', 'jim',  2);

and then run this SQL:

SELECT 
  owner, 
  STRING_AGG(name || ' '  || age, ',' ORDER BY age) AS doggies
FROM dog
GROUP BY owner;

Result:

owner   doggies
abc     fred 1,fido 4,tim 6,max 9
xyz     jim 2,joe 4

It appears that Google's BigQuery system uses the STRING_AGG syntax.

Both queries work on the oldest system that I could find to test on - i.e. PostgreSQL 9.5 and MySQL 5.5 - this should get you started!

Part 2: Robust method.

Now, to answer the question in a more robust way. The dog table now has an immutable attribute - the year of birth from which we can always calculate an age now and at any time into the future. Note birth_year (which never changes) and not age.

We create a second table:

CREATE TABLE dog_2
(
  owner      TEXT NOT NULL,
  name       TEXT NOT NULL,
  birth_year SMALLINT NOT NULL
CHECK (birth_year <= DATE_PART('YEAR', CURRENT_DATE) 
   AND birth_year >= DATE_PART('YEAR', CURRENT_DATE) - 30),

-- Or the same thing written according to the SQL Standard.

CHECK (birth_year <= EXTRACT('YEAR' FROM CURRENT_DATE) 
   AND birth_year >= EXTRACT('YEAR' FROM CURRENT_DATE) - 30)

);

The 30 number is not plucked from the air - I got it from here! Then:

INSERT INTO dog_2 VALUES
('abc', 'fido', 2017),
('abc', 'tim',  2015),
('abc', 'max',  2012),
('abc', 'fred', 2020),
('xyz', 'joe',  2017),
('xyz', 'jim',  2019);

and then we run:

SELECT 
  owner,
  STRING_AGG
  (
    name || 
    ' '  || 
    (EXTRACT('YEAR' FROM CURRENT_DATE) - birth_year)::TEXT, ', ' ORDER BY birth_year DESC
  )
FROM
  dog_2
GROUP BY owner;

Result:

owner   string_agg
abc     fred 1, fido 4, tim 6, max 9
xyz     jim 2, joe 4
  • For PostgreSQL, see the same 9.5 fiddle - works for both!

  • For MySQL, there is a problem (quelle suprise)! It won't let non-deterministic function in CHECK constraints - although PostgreSQL does support some of them. To my mind, CURRENT_DATE() is deterministic, because for a given set of circumstances, it will always return the same value - unlike RAND() or UUID()... anyway, YMMV.

    However, MySQL has only had CHECK constraints since it's only had them since 2019 (a mere 27 years after the SQL Standard mandated them!!!), maybe they've some catching up to do! This means that the fiddle is also for version 8 - you can find it here.

So, the best I could do for MySQL was the following:

CREATE TABLE dog_2
(
  owner      TEXT NOT NULL,
  name       TEXT NOT NULL,
  birth_year SMALLINT NOT NULL

CHECK (birth_year < 2021 AND birth_year > 1990)

--

-- Won't work!

-- CHECK (birth_year <= YEAR(CURRENT_DATE()) AND birth_year >= YEAR(CURRENT_DATE()) - 30)

);

Note that this requires the CHECK constraint in the table definition to be updated every year - which is perhaps a bit better than having to update every record? You could perhaps even programme an EVENT?

And populate it as above, and then run the following SQL:

SELECT 
  owner, 
  GROUP_CONCAT(CONCAT
                 (
                   name, 
                   ' ',  
                   YEAR(CURRENT_DATE()) - birth_year
                 ) 
                 ORDER BY birth_year 
                 SEPARATOR ', '
              ) AS doggies
FROM dog_2
GROUP BY owner;

Result:

owner   doggies
abc     max 9, tim 6, fido 4, fred 1
xyz     joe 4, jim 2

As mentioned above, the BigQuery implementation appears to use the PostgreSQL implementation, so you might be in luck!

Vérace
  • 30,923
  • 9
  • 73
  • 85