3

I have a table that has path1, path2, and sha1 value. For any values of path2 and sha1, there can be multiple values of path1. I just want one of those paths. I don't really care which one.

I'm thinking I can do a group by for path2 and sha1. Now I just need to select one of the values of path1. I suppose I could select the minimum value of path1 but that would be doing extra work that isn't really needed.

Google tells me that Microsoft has "FIRST" but I don't see that in the postgres pages. Plus... I'd like to stick with normal SQL if possible.

pedz
  • 133
  • 4

2 Answers2

2

There are a bunch of ways you can do this, one of them is with DISTINCT ON as @Ypercube has suggested,

SELECT DISTINCT ON (path2, sha1) path2, sha1, path1
FROM table_name
ORDER BY path2, sha1;

You can also use an ordered-set aggregate which should generally be slower.

SELECT percentile_disc(0) WITHIN GROUP (ORDER BY path1) AS path1, path2, sha1
FROM table_name
GROUP BY path2, sha1;
Evan Carroll
  • 65,432
  • 50
  • 254
  • 507
0

A simple approach is to take the min/max of the path1:

select path2,sha1,max(path1) from table_name group by path2,sha1

This works in mysql also, where you don't have the window functions. Index on path2,sha1,path1 speeds up the query.