Pairing Row Values in Any Order

Question

I'm trying to find % of unique pairs that exist in two columns. For example

Where 1,6 and 6,1 / 2,3 and 3,2 are unique pairs. So the % of matched pairs is 33%

Vérace · Answer 1 · 2019-09-08T00:14:19.950

Doesn't make much sense:

the number of pairs is 6, -- not relevant to the question...
the number of unique permuted pairs is 4 - far more relevant...
There are 2 matched pairs so I make the number of unique to matched pairs at 4/2, i.e. 50%...

Notwithstanding the objection above, what you can do is something like this, using the LEAST and GREATEST (almost standard - see discussion below) SQL functions.

This example (see fiddle) is from PostgreSQL, however see discussion at the end.

CREATE TABLE test (col1 INTEGER, col2 INTEGER);

populate with your data:

INSERT INTO test VALUES (1, 6), (2,3), (3, 2), (4, 7), (5, 8), (6, 1);

First query:

SELECT LEAST(col1, col2) AS mn_c, GREATEST(col1, col2) AS mx_c,
COUNT(*) 
FROM test
GROUP BY 1, 2

Result:

mn_c    mx_c    count
   2       3        2
   4       7        1
   1       6        2
   5       8        1

Then:

SELECT COUNT(cnt1) AS matched_count,
       ROUND(COUNT(cnt1)/(SELECT COUNT(*) FROM test)::FLOAT * 100) AS percentage
FROM 
(
  SELECT LEAST(col1, col2) AS mn_c, GREATEST(col1, col2) AS mx_c,
  COUNT(*) AS cnt1 
  FROM test
  GROUP BY 1, 2
  HAVING COUNT(*) > 1
) AS t

Result:

matched_count   percentage
            2           33

A version of the above code should work for most servers - see here for a discussion of the LEAST and GREATEST functions in other servers - works pretty much for all of them except MS SQL Server.

p.s. have you considered what happens if you have duplicate columns as follows?

INSERT INTO test VALUES (1, 6), (2,3), (3, 2), (4, 7), (5, 8), (6, 1), (5,5), (5,5)

Note the duplication of a duplication - (5,5) - see here for the differences which emerge between my approach and that of @McNets. My solution says that there are 3/8 matched pairs, but McNets says different. Not sure if I completely understand what his SQL is doing?

Anyway, interesting question (+1) - why do you want to do this? p.s. welcome to the forum! :-)

score 2 · Answer 2 · answered Aug 21 '19 at 16:33

Using standard SQL could be:

SELECT
  COUNT(*) * 100 / (SELECT COUNT(*) FROM t)
FROM
    t t1
WHERE NOT EXISTS (SELECT 1
                  FROM   t t2
                  WHERE  t2.col1 = t1.col2
                  AND    t2.col2 = t1.col1);

| Percent |
| ------: |
|      33 |

db<>fiddle here

score 0 · Answer 3 · answered Aug 21 '19 at 17:11

You may want to do as follows (if your DB is MS SQL)

Create table #Pairs
(c1 int,
 c2 int
);
go

insert into #Pairs
values 
(1,6),
(2,3),
(3,2),
(4,7),
(5,8),
(6,1)
go

select p1.c1, p1.c2, 
        p2.PairCount,
        --(select count (*) from PairTest) as TotalPairs,
        (p2.PairCount + 0.0) / (select count (*) from PairTest) as Pair_Percent
from #Pairs as p1
    join 
        (
        select (c1+ c2) as Pair, COUNT (*) PairCount
        from #Pairs
        group by (c1+ c2)
        ) as p2 on (p1.c1 + p1.c2) = p2.pair
go

Drop table #Pairs;

Pairing Row Values in Any Order

3 Answers3

Linked