I'm trying to find % of unique pairs that exist in two columns. For example
Where 1,6 and 6,1 / 2,3 and 3,2 are unique pairs. So the % of matched pairs is 33%
I'm trying to find % of unique pairs that exist in two columns. For example
Where 1,6 and 6,1 / 2,3 and 3,2 are unique pairs. So the % of matched pairs is 33%
Doesn't make much sense:
Notwithstanding the objection above, what you can do is something like this, using the LEAST and GREATEST (almost standard - see discussion below) SQL functions.
This example (see fiddle) is from PostgreSQL, however see discussion at the end.
CREATE TABLE test (col1 INTEGER, col2 INTEGER);
populate with your data:
INSERT INTO test VALUES (1, 6), (2,3), (3, 2), (4, 7), (5, 8), (6, 1);
First query:
SELECT LEAST(col1, col2) AS mn_c, GREATEST(col1, col2) AS mx_c,
COUNT(*)
FROM test
GROUP BY 1, 2
Result:
mn_c mx_c count
2 3 2
4 7 1
1 6 2
5 8 1
Then:
SELECT COUNT(cnt1) AS matched_count,
ROUND(COUNT(cnt1)/(SELECT COUNT(*) FROM test)::FLOAT * 100) AS percentage
FROM
(
SELECT LEAST(col1, col2) AS mn_c, GREATEST(col1, col2) AS mx_c,
COUNT(*) AS cnt1
FROM test
GROUP BY 1, 2
HAVING COUNT(*) > 1
) AS t
Result:
matched_count percentage
2 33
A version of the above code should work for most servers - see here for a discussion of the LEAST and GREATEST functions in other servers - works pretty much for all of them except MS SQL Server.
p.s. have you considered what happens if you have duplicate columns as follows?
INSERT INTO test VALUES (1, 6), (2,3), (3, 2), (4, 7), (5, 8), (6, 1), (5,5), (5,5)
Note the duplication of a duplication - (5,5) - see here for the differences which emerge between my approach and that of @McNets. My solution says that there are 3/8 matched pairs, but McNets says different. Not sure if I completely understand what his SQL is doing?
Anyway, interesting question (+1) - why do you want to do this? p.s. welcome to the forum! :-)
You may want to do as follows (if your DB is MS SQL)
Create table #Pairs
(c1 int,
c2 int
);
go
insert into #Pairs
values
(1,6),
(2,3),
(3,2),
(4,7),
(5,8),
(6,1)
go
select p1.c1, p1.c2,
p2.PairCount,
--(select count (*) from PairTest) as TotalPairs,
(p2.PairCount + 0.0) / (select count (*) from PairTest) as Pair_Percent
from #Pairs as p1
join
(
select (c1+ c2) as Pair, COUNT (*) PairCount
from #Pairs
group by (c1+ c2)
) as p2 on (p1.c1 + p1.c2) = p2.pair
go
Drop table #Pairs;