8

I have a table that contains several keys into other tables (where each key is comprised of multiple columns). I would like to be able to group rows together that have an equal key, but I don't want to group all of them together. It's not a simple GROUP BY on the key but rather I want to be able to make groups of say 10. So if a particular key showed up 50 times I would get 5 results when I do this grouping (5 groups of 10). I also want this grouping to occur randomly within the key.

I didn't know of the direct way to do this, and the roundabout method I came up with isn't working like I think it should. The roundabout solution I came up with was to create a new column for each key that would be an integer such that value i represents the ith occurrence of that key (but in random order). I could then do integer division so that every n (say 10) rows within the key have the same value, and I could do a GROUP BY on that value.

Is there a more direct way to accomplish what I just described? It's quite awkward, and I ran into problems in creating the new index column (as I described in this question).

EDIT: First of all note that this is for MySQL. I'll add an example in case my goal is not clear. The MySQL docs show a method to get almost there:

CREATE TABLE animals (
    grp ENUM('fish','mammal','bird') NOT NULL,
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    name CHAR(30) NOT NULL,
    PRIMARY KEY (grp,id)
) ENGINE=MyISAM;

INSERT INTO animals (grp,name) VALUES
    ('mammal','dog'),('mammal','cat'),
    ('bird','penguin'),('fish','lax'),('mammal','whale'),
    ('bird','ostrich');

SELECT * FROM animals ORDER BY grp,id;

This creates a table which, although not what I want, gets close:

+--------+----+---------+
| grp    | id | name    |
+--------+----+---------+
| fish   |  1 | lax     |
| mammal |  1 | dog     |
| mammal |  2 | cat     |
| mammal |  3 | whale   |
| bird   |  1 | penguin |
| bird   |  2 | ostrich |
+--------+----+---------+

I would essentially like to GROUP BY id, except I would want the records with mammal to have one "group" for IDs 1-10, another "group" for IDs 11-20, etc. However, I would be doing this with an existing table, and I wouldn't necessarily want "dog" to show up with ID 1. I would want that initial ordering to be random, but then deterministic from then out.

Michael McGowan
  • 799
  • 2
  • 10
  • 20

4 Answers4

5

What about doing a little math against your ID column to dynamically generate the group?

SELECT grp, FLOOR(id/10) AS id_grp
FROM animals
GROUP BY grp, id_grp

This would give you groups of 10 based on the ID of the record. I used your animals table above to generate the data below.

Sample data

 INSERT INTO animals VALUES
 ('mammal',10,'dog'),('mammal',11,'dog'),('mammal',12,'dog'),
 ('mammal',21,'cat'),('mammal',22,'cat'),('mammal',23,'cat'),
 ('mammal',24,'cat'),('mammal',25,'cat'),('mammal',26,'cat'),
 ('bird',30,'penguin'),('bird',31,'penguin'),('bird',32,'penguin'),
 ('bird',33,'penguin'),('fish',44,'lax'),('fish',45,'lax'),
 ('fish',46,'lax'),('fish',47,'lax'),('fish',48,'lax'),
 ('mammal',31,'whale'),*'fish',51,'lax'),('fish',52,'lax'),
 ('fish',53,'lax'),('fish',54,'lax'),('bird',10,'ostrich');

Query Output

 +--------+--------+
 | grp    | id_grp |
 +--------+--------+
 | fish   |      4 |
 | fish   |      5 |
 | mammal |      1 |
 | mammal |      2 |
 | mammal |      3 |
 | bird   |      1 |
 | bird   |      3 |
 +--------+--------+
 7 rows in set (0.00 sec)
nabrond
  • 275
  • 1
  • 5
2

In SQL generally this would be :

  • a DISTINCT subselect
  • JOIN back to main table on DISTINCT keys
  • NTILE with PARTITION BY on DISTINCT keys and an ORDER BY to create buckets

It isn't an aggregate so GROUP BY isn't needed

Edit:

Actually, NTILE is enough by itself to create "n buckets per set of distinct values"

gbn
  • 70,237
  • 8
  • 167
  • 244
1

I'm still not seeing any complete solutions (that actually work in MySQL), so this is the solution I will probably use:

  1. Generate the random IDs outside of SQL entirely (in some sort of script)
  2. Apply integer division on those IDs to group those accordingly.

I'm still hoping somebody can beat this answer; I don't want to have to accept my own answer. I've said this before, but I knew from the beginning how to do #2; #1 is what's been troubling me. If you can answer #1 then you actually answer another question as well, but it might be possible to answer this question in some other way so as to bypass #1.

Michael McGowan
  • 799
  • 2
  • 10
  • 20
0
-- Change 'ValueField' to whatever provides your 'group' values

set @rownum := 0;
set @groupnum := 0;
set @lastGroup := 0;

select
    ValueField, 
    Grouping, 
    count(1) as Count
from
    (
        -- We have a row number for each record
        select
            -- Set the record number
            case when @lastGroup != ValueField 
                then @rownum := 0 else (@rownum := @rownum + 1) 
            end as Record, 

            -- Determine which group we are in
            case
                -- If the 'Group' changed, reset our grouping
                when @lastGroup != ValueField 
                    then @groupnum := 0

                -- Determines the grouping value; group size is set to 10
                when floor(@rownum / 10) != @groupnum 
                    then @groupnum := @groupnum + 1 
                else @groupnum
            end as Grouping,

            -- Track the last Group
            case 
                when @lastGroup != ValueField 
                    then @lastGroup := ValueField 
                else @lastGroup 
            end as LastGroup,

            -- Value field that will be aggregated
            ValueField 
        from 
            YourTable
        order by 
            ValueField
    ) as x
group by
    ValueField, 
    Grouping;
dba4life
  • 346
  • 1
  • 3