
I'm trying to imagine how one would implement summation (a reduction) on a parallel architecture, and I'm having a difficult time.

Specifically thinking in terms of WebGL arrays of vectors such as this:

[integer1, integer2, integer3, ...]

I'm wondering if there is a way to do this in parallel:

var array = [ 1, 2, 3, 4, 5, ... ];

var sum = 0;
array.forEach(function (number) {
  sum += number;
});

return sum;

The problem I am encountering is, I think, mentioned in this NVIDIA document on GPU reduction (which I'm not quite following):

how do we communicate partial results between thread blocks?

I'm wondering whether one can describe an algorithm that computes a sum in parallel at all, given that there is a shared variable (the sum) that every addition seems to need. Or perhaps this isn't possible.


The short answer is that you don't share the accumulator. You give each adder its own accumulator, each starting at 0. The final step is to sum the partial results.
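A minimal sketch of that structure, in plain JavaScript. This is sequential code that mimics the parallel layout (in real WebGL the per-chunk work would happen in a shader); the function name and worker count are made up for illustration:

```javascript
// Split the input into chunks, give each "worker" its own private
// accumulator starting at 0, then combine the partial sums at the end.
function parallelSum(array, numWorkers) {
  const chunkSize = Math.ceil(array.length / numWorkers);
  const partials = [];
  for (let w = 0; w < numWorkers; w++) {
    // Each worker reduces only its own chunk; no shared accumulator.
    const chunk = array.slice(w * chunkSize, (w + 1) * chunkSize);
    let acc = 0; // private accumulator, starts at the identity
    for (const n of chunk) acc += n;
    partials.push(acc);
  }
  // Final step: sum the partial results.
  return partials.reduce((a, b) => a + b, 0);
}
```

The chunks never communicate while they run, which is exactly what sidesteps the "communicate partial results between thread blocks" problem: communication happens only in the final combine step.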

This generalises beyond arithmetic addition: a parallel reduce assumes that its arguments form a monoid.

Suppose that S is a set and • is some binary operation S × S → S; then S with • is a monoid if it satisfies the following two axioms:

  • Associativity
    • For all a, b and c in S, the equation (a • b) • c = a • (b • c) holds.
  • Identity element
    • There exists an element e in S such that for every element a in S, the equations e • a = a • e = a hold.

Here S is the set of all values of a particular type, • is the operation, and the identity e is the initial value (or, alternatively, a default instance of S serves as e).
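To make the generalisation concrete, here is the same chunked reduction parameterised over an arbitrary monoid (the function name and signature are mine, not from any library):

```javascript
// Chunked reduce over any monoid: `op` must be associative and
// `identity` must satisfy op(identity, x) === op(x, identity) === x.
function monoidReduce(array, op, identity, numWorkers) {
  const chunkSize = Math.ceil(array.length / numWorkers);
  const partials = [];
  for (let w = 0; w < numWorkers; w++) {
    // Each chunk starts from the identity element, not from a shared value.
    let acc = identity;
    for (const x of array.slice(w * chunkSize, (w + 1) * chunkSize)) {
      acc = op(acc, x);
    }
    partials.push(acc);
  }
  // Associativity guarantees this regrouping gives the same answer
  // as a single left-to-right fold.
  return partials.reduce(op, identity);
}

// Sum: (+, 0) is a monoid.
const total = monoidReduce([1, 2, 3, 4], (a, b) => a + b, 0, 2);

// Max: (Math.max, -Infinity) is also a monoid.
const biggest = monoidReduce([3, 1, 4, 1, 5], (a, b) => Math.max(a, b), -Infinity, 2);
```

Associativity is what lets the chunks be combined in any grouping, and the identity is what lets each chunk's accumulator start without knowing any other chunk's result.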

Caleth