5

I was checking selectivity of some columns for an index.
Where is this "ignore what I give you" behaviour documented?

This gives 4,851,908, 4,841,060, and 1,000,052

SELECT
     COUNT(*), 
     COUNT(DISTINCT Col1), COUNT(DISTINCT Col2)
FROM Sometable;

This gives 4,843,634 unique pairs as per MySQL extension

SELECT COUNT(DISTINCT Col1, Col2) FROM Sometable

The following are wrong: the individual COUNT(DISTINCT colx) all give the 4,843,634 unique pair count regardless of any filler column or expression order.

I expected COUNT(DISTINCT Col1) = 4,841,060, and COUNT(DISTINCT Col1) = 1,000,052.

SELECT COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) FROM Sometable

SELECT COUNT(DISTINCT Col2), COUNT(DISTINCT Col1) FROM Sometable

SELECT COUNT(DISTINCT Col1), 1 AS Filler, COUNT(DISTINCT Col2) FROM Sometable

But this give correct values again with another aggregate (like with COUNT(*) above)

SELECT COUNT(DISTINCT Col1), MAX(col1) AS Filler, COUNT(DISTINCT Col2) FROM Sometable

Questions, in case it wasn't clear:

  • Why does COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) behave like COUNT(DISTINCT Col1, Col2)
  • Why is another aggregate required to make it work?
gbn
  • 70,237
  • 8
  • 167
  • 244

2 Answers2

7

It looks like you are hitting this regression bug:

select count(distinct N1), count(distinct N2) from test.AA" works incorrectly
...
"This bug happens when a unique index exists"

One of the suggested workarounds is to use sql_buffer_result

Jack Douglas
  • 40,517
  • 16
  • 106
  • 178
0

Without seeing your exact results, I'm not sure that I understand what the problem is. I tried this on a random table on my machine and got back the results that I expected.

mysql> select count(*), count(distinct location_country), count(distinct referer_name) from piwik_log_visit;
+----------+----------------------------------+------------------------------+
| count(*) | count(distinct location_country) | count(distinct referer_name) |
+----------+----------------------------------+------------------------------+
|    44176 |                              109 |                          291 |
+----------+----------------------------------+------------------------------+
1 row in set (0.81 sec)


mysql> select count(distinct location_country,referer_name) from piwik_log_visit;
+-----------------------------------------------+
| count(distinct location_country,referer_name) |
+-----------------------------------------------+
|                                           932 |
+-----------------------------------------------+
1 row in set (0.19 sec)

mysql> select count(distinct location_country), count(distinct referer_name) from piwik_log_visit;
+----------------------------------+------------------------------+
| count(distinct location_country) | count(distinct referer_name) |
+----------------------------------+------------------------------+
|                              109 |                          291 |
+----------------------------------+------------------------------+
1 row in set (0.16 sec)

mysql> select count(distinct referer_name), count(distinct location_country) from piwik_log_visit;
+------------------------------+----------------------------------+
| count(distinct referer_name) | count(distinct location_country) |
+------------------------------+----------------------------------+
|                          291 |                              109 |
+------------------------------+----------------------------------+
1 row in set (0.16 sec)

mysql> select count(distinct location_country), 1 as filler, count(distinct referer_name) from piwik_log_visit;
+----------------------------------+--------+------------------------------+
| count(distinct location_country) | filler | count(distinct referer_name) |
+----------------------------------+--------+------------------------------+
|                              109 |      1 |                          291 |
+----------------------------------+--------+------------------------------+
1 row in set (0.16 sec)

mysql> select count(distinct location_country), max(location_country) as filler, count(distinct referer_name) from piwik_log_visit;
+----------------------------------+--------+------------------------------+
| count(distinct location_country) | filler | count(distinct referer_name) |
+----------------------------------+--------+------------------------------+
|                              109 | zw     |                          291 |
+----------------------------------+--------+------------------------------+
1 row in set (0.25 sec)

You say

These all give 4,843,634 too which isn't what I asked for

SELECT COUNT(DISTINCT Col1), COUNT(DISTINCT Col2) FROM Sometable

SELECT COUNT(DISTINCT Col2), COUNT(DISTINCT Col1) FROM Sometable

SELECT COUNT(DISTINCT Col1), 1 AS Filler, COUNT(DISTINCT Col2) FROM Sometable

but that doesn't make any sense. The first two queries should return two columns, the last one should return 3.

Can you provide your actual results inline along with what you were expecting to see and maybe we can figure out if there is an actual problem or if you are simply misunderstanding something.

For reference, I ran this on Percona Server 5.5.16

mysql> select @@version;
+---------------+
| @@version     |
+---------------+
| 5.5.16-55-log |
+---------------+
1 row in set (0.00 sec)

EDIT: I also tried this on a different dataset with ~5MM rows and got back the same results...everything checked out. This was on Percona Server 5.1.43

Aaron Brown
  • 5,140
  • 25
  • 25