That STUFF FOR XML PATH string concatenation technique sure is cute, but it does not scale very well and across millions of rows it is probably not a very good idea. For larger tables, you may have to write some good old-fashioned procedural SQL with a loop, something like this:
-- Create the working table ...
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
SELECT ROW_NUMBER() OVER( PARTITION BY email ORDER BY prject_name ) rowId, email, CAST( prject_name AS VARCHAR(500 ) ) prject_name
INTO #tmp
FROM dbo.[UMG sent 2016]
GO
-- Index temp table
CREATE UNIQUE CLUSTERED INDEX _cdx ON #tmp ( rowId, email )
GO
SELECT TOP 100 'before' s, *
FROM #tmp
ORDER BY email
-- Loop through appending the projects
DECLARE @n INT = 1
WHILE @@ROWCOUNT != 0
BEGIN
IF @n > 99 BEGIN RAISERROR( 'Too many loops!', 16, 1 ) BREAK END -- Loop safety
SET @n += 1
UPDATE t
SET t.prject_name = CONCAT( t.prject_name, ', ', s.prject_name )
FROM #tmp t
INNER JOIN #tmp s ON t.email = s.email
WHERE t.rowId = 1
AND s.rowId = @n
END
GO
SELECT TOP 100 'after' s, *
FROM #tmp
WHERE rowId = 1
ORDER BY email
The concatenated result all ends up in 'bucket 1'. In my simple repro, with 2.6 million records with between 1 and 26 projects each, this script ran in a few minutes. Full repro script here.
Please bear in mind, this pattern is optimized for large tables with fewer items to concatenate. It also relies on the email/project combinations being unique, hence the primary key in my repro. There will be a tipping point where the STUFF technique is faster. There are also other techniques such as CLR, cursor even, which might suit depending on the distribution of your data.
Finally, can you please tell me more about your data so I can tweak my repro? For example, on average how many projects does each email have and what does the distribution look like?