I have a table in Hive which looks like:
col1 col2
b 1
b 2
a 3
b 2
c 4
c 5
How do I, with hiveql, group up col1 elements together, sum them up, sort by the sum, as well as create a cumulative sum (csum) based on the sum?
id sum_all csum
a 3 3
b 5 8
c 9 17
I've only managed to come up with the grouping and sum, but am out of ideas regarding the cumulative sum. Correlated sub-queries are not supported in Hive
select col1 as id
sum(col2) as sum_all
from t
group by col1
order by sum_all
With the following result:
id sum_all
a 3
b 5
c 9