6

I've updated this post with some test data.

I'm creating a report for my Movies database where I'd like for the end-user to be able to select movies of a certain genre. However, some movies have multiple genres and I've normalized the database so that a movie line item that had more than one genre turns into multiple line items, each pointing to the respective genre/genreID. (A similar thing has been done for directors).

Before Normalization

Movie Genre
Bride of Frankenstein Horror, Drama

After Normalization

Movie Genre
Bride of Frankenstein Horror
Bride of Frankenstein Drama

The problem that I have is, for the sake of this report, I would like to make it so that the movies do not repeat in the report if they have multiple genres. But rather, the movie title becomes one line item and the genres field is concatenated to show all of the genres within the one line (similar to the before normalization view). What I've ended up doing is creating a view where I cross apply the output of the movie selection by the genres that match the movie ID. I feel like I've overcomplicated it a bit and my stores procedure is running pretty slow, as I have a couple of other fields that I am allowing the user to filter on as well.

Here's a look at the cross apply view.

ALTER VIEW [dbo].[vwMoviesJoinedGenres] AS

WITH genreMovies_CTE AS ( SELECT M.MovieID , M.MovieTitle , G.GenreName , G.GenreID , M.TitleTypeID , TT.TitleType , M.MediaID , M.IMDBLink , M.IMDBRating , M.ReleaseDate , M.Runtime , M.ImageURL , M.MovieYear FROM [dbo].[Movies] AS M INNER JOIN GenresMovies AS GM ON GM.MovieID = M.MovieID INNER JOIN Genres AS G ON G.GenreID = GM.GenreID INNER JOIN TitleType AS TT ON TT.TitleTypeID = M.TitleTypeID )
SELECT DISTINCT MovieID, ImageURL, MovieYear, MovieTitle, TitleType, SUBSTRING(G.genres, 0, LEN(G.genres)) AS Genres, GenreID, TitleTypeID, MediaID, IMDBLink, IMDBRating, ReleaseDate, Runtime FROM genreMovies_CTE CROSS APPLY ( SELECT DISTINCT GenreName + ', ' FROM Genres AS G INNER JOIN GenresMovies AS GM ON GM.GenreID = G.GenreID WHERE genreMovies_CTE.MovieID = GM.MovieID FOR XML PATH('')
) G(genres) GO

I then use this view (and a similar view for directors) in the stored procedure below.

USE [Movies]
GO

SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO

--==================================================== ALTER PROCEDURE [dbo].[usp_MovieByGenreUsingViews] -- Add the parameters for the stored procedure here @GenreID nvarchar(255) , @MediaID nvarchar(255) , @TitleTypeID nvarchar(255) WITH RECOMPILE

AS BEGIN

SET NOCOUNT ON;

WITH genresMovies_CTE AS (SELECT DISTINCT JG.[MovieID] , JG.[MovieTitle] , JG.Genres , JG.[TitleTypeName] , JG.[TitleTypeID] , JG.[MediaID] , Me.MediaType , JD.Directors FROM [dbo].[vwMoviesJoinedGenres] AS JG --JoinedGenres INNER JOIN [dbo].[vwMoviesJoinedDirectors] AS JD ON JG.MovieID = JD.movieID INNER JOIN Media AS Me ON Me.MediaTypeID = JG.MediaTypeID WHERE GenreID IN (SELECT Number FROM fn_SplitInt) AND JG.MediaTypeID IN (SELECT Number FROM fn_SplitInt) AND JG.TitleTypeID IN (SELECT Number FROM fn_SplitInt) ) SELECT MovieTitle, Genres, Directors, TitleType, MediaType FROM genresMovies_CTE ORDER BY movietitle

END

**Updating with a very small sample data set as requested. I've simplified a lot obviously to save time and have decided to focus on Genres only.

Creating Tables and Inserting Data

USE [Test Movies];
GO
--DROP TABLE IF EXISTS MovieDetails;
CREATE TABLE MovieDetails 
(
    MovieID int NOT NULL 
    , MovieTitle nvarchar(255)
    , Constraint MovieID PRIMARY KEY (MovieID)
);

--DROP TABLE IF EXISTS Genres; CREATE TABLE Genres( GenreID tinyint NOT NULL Identity(1,1) , GenreName varchar(50) NOT NULL , CONSTRAINT GenreID PRIMARY KEY (GenreID) )

--DROP TABLE IF EXISTS MovieGenre; CREATE TABLE MovieGenre ( MovieID int NOT NULL , GenreID tinyint NOT NULL , CONSTRAINT GenresMovies PRIMARY KEY (MovieID, GenreID) );

INSERT INTO MovieDetails (MovieID, MovieTitle) VALUES (1, 'Forest Gump') , (2, 'Eyes Wide Shut') , (3, 'Kelly''s Heroes') , (4, 'Where Eagles Dare') , (5, 'Star Trek: First Contact') , (6, 'The Ten Commandments') , (7, 'Clash of the Titans') , (8, 'AVP: Alien vs. Predator') , (9, 'Batman Returns') , (10, 'Crash')

INSERT INTO Genres (GenreName) VALUES ('Drama'), ('Adventure'), ('Family'), ('Horror'), ('Crime');

INSERT INTO MovieGenre (MovieID, GenreID) VALUES (1,1), (2,1), (3,2), (4,2), (5,1), (5,2), (6,1), (6,2), (7,2), (7,3), (8,4), (9,5), (10,1);

Code to splitInt Function found online

ALTER FUNCTION [dbo].[fn_SplitInt]
(
    @List       nvarchar(4000),
    @Delimiter  char(1)= ','
)
RETURNS @Values TABLE
(
    Position int IDENTITY PRIMARY KEY,
    Number int
)

AS

BEGIN

-- set up working variables DECLARE @Index INT DECLARE @ItemValue nvarchar(100) SELECT @Index = 1

-- iterate until we have no more characters to work with WHILE @Index > 0

BEGIN

  -- find first delimiter
  SELECT @Index = CHARINDEX(@Delimiter,@List)

  -- extract the item value
  IF @Index  > 0     -- if found, take the value left of the delimiter
    SELECT @ItemValue = LEFT(@List,@Index - 1)
  ELSE               -- if none, take the remainder as the last value
    SELECT @ItemValue = @List

  -- insert the value into our new table
  INSERT INTO @Values (Number) VALUES (CAST(@ItemValue AS int))

  -- remove the found item from the working list
  SELECT @List = RIGHT(@List,LEN(@List) - @Index)

  -- if list is empty, we are done
  IF LEN(@List) = 0 BREAK

END

RETURN

END

Joining Genres using STRING_AGG

PROCEDURE [dbo].[usp_MovieByGenreStrAgg] 
    @GenreID nvarchar(255)
WITH RECOMPILE

AS BEGIN SET NOCOUNT ON;

SELECT DISTINCT movieTitleID.movieID , movieTitleID.movietitle , movieTitleID.genres FROM (SELECT MD.MovieID , MD.MovieTitle , STRING_AGG(G.GenreName, ', ') AS Genres FROM MovieDetails AS MD INNER JOIN MovieGenre AS MG ON MG.MovieID = MD.MovieID INNER JOIN Genres AS G ON G.GenreID = MG.GenreID GROUP BY MD.MovieID, MD.MovieTitle) AS movieTitleID INNER JOIN MovieGenre AS MG ON MG.MovieID = movieTitleID.MovieID INNER JOIN Genres AS G ON G.GenreID = MG.GenreID WHERE G.GenreID IN (SELECT Number FROM fn_SplitInt)

END

Joining Genres using Cross Apply

CREATE PROCEDURE [dbo].[usp_MovieByGenreCrsApply] 
    @GenreID nvarchar(255)
WITH RECOMPILE

AS BEGIN SET NOCOUNT ON;

SELECT movieTitleID.MovieID , movieTitleID.MovieTitle , SUBSTRING(G.genres, 0, LEN(G.genres)) AS genres FROM ( SELECT MovieID , MovieTitle FROM MovieDetails

) 
AS movieTitleID
CROSS APPLY
(
SELECT DISTINCT G.GenreName + ', '
FROM Genres AS G
INNER JOIN MovieGenre AS MG
    ON MG.GenreID = G.GenreID
WHERE movieTitleID.MovieID = MG.MovieID 
    AND G.GenreID IN (SELECT Number FROM [fn_SplitInt](@GenreID, ','))
FOR XML PATH('')        
) G(genres)
WHERE G.genres IS NOT NULL; 

END

When I executed the separate statements, I noticed something I hadn't before.

exec [usp_MovieByGenreStrAgg] '2,3' -- where 2 is Adventure, 3 is Family

MovieID MovieTitle Genres
3 Kelly's Heroes Adventure
4 Where Eagles Dare Adventure
5 Star Trek: First Contact Drama, Adventure
6 The Ten Commandments Drama, Adventure
7 Clash of the Titans Adventure, Family

exec [usp_MovieByGenreCrsApply] '2,3' -- where 2 is Adventure, 3 is Family

MovieID MovieTitle Genres
3 Kelly's Heroes Adventure
4 Where Eagles Dare Adventure
5 Star Trek: First Contact Adventure
6 The Ten Commandments Adventure
7 Clash of the Titans Adventure, Family

String_Agg returns all of the genres of a given movie whose genre has been selected, even if only one of the genres were selected as a parameter value. Cross apply returned only the genres that were selected. I realized I prefer when it displays only the genres I'm interested in.

However, I noticed that when I save my cross apply in a view first and then apply the parameters, it behaves effectively like my String_Agg procedure. I think I prefer the way cross applies behaves (without having to save it in a view first), but I'm pretty sure that it's much slower this way without a view, so back at square one with the sloweness. I hope this makes sense.

ck123
  • 61
  • 4

1 Answers1

1

Having your split function inside an APPLY is a bad idea. It potentially could be evaluated over and over, killing performance. You should put those values in a temp table or table variable.

Your STRING_AGG is returning categories other than those you selected because you are building the list, then filter (via the additional joins and where clause. Move the WHERE clause inside the grouping and you will only get the categories you selected.

In either case, you can simplify your queries a fair amount. Here is some POC code for either style ('just selected' or 'all'):

DROP TABLE IF EXISTS #MovieDetails
DROP TABLE IF EXISTS #Genres
DROP TABLE IF EXISTS #MovieGenre

CREATE TABLE #MovieDetails ( MovieID int NOT NULL , MovieTitle nvarchar(255) , Constraint MovieID PRIMARY KEY (MovieID) );

--DROP TABLE IF EXISTS Genres; CREATE TABLE #Genres( GenreID tinyint NOT NULL Identity(1,1) , GenreName varchar(50) NOT NULL , CONSTRAINT GenreID PRIMARY KEY (GenreID) )

--DROP TABLE IF EXISTS MovieGenre; CREATE TABLE #MovieGenre ( MovieID int NOT NULL , GenreID tinyint NOT NULL , CONSTRAINT GenresMovies PRIMARY KEY (MovieID, GenreID) );

INSERT INTO #MovieDetails (MovieID, MovieTitle) VALUES (1, 'Forest Gump') , (2, 'Eyes Wide Shut') , (3, 'Kelly''s Heroes') , (4, 'Where Eagles Dare') , (5, 'Star Trek: First Contact') , (6, 'The Ten Commandments') , (7, 'Clash of the Titans') , (8, 'AVP: Alien vs. Predator') , (9, 'Batman Returns') , (10, 'Crash')

INSERT INTO #Genres (GenreName) VALUES ('Drama'), ('Adventure'), ('Family'), ('Horror'), ('Crime');

INSERT INTO #MovieGenre (MovieID, GenreID) VALUES (1,1), (2,1), (3,2), (4,2), (5,1), (5,2), (6,1), (6,2), (7,2), (7,3), (8,4), (9,5), (10,1);

DECLARE @ids NVARCHAR(MAX) = '2,3'

DECLARE @idsXML XML = TRY_CAST('<i>' + REPLACE(ISNULL(@ids, ''), ',', '</i><i>') + '</i>' AS XML) DECLARE @idTable AS TABLE(id INT)

INSERT INTO @idTable SELECT TRY_CAST(i.value('.', 'NVARCHAR(MAX)') AS INT) FROM @idsXML.nodes('//i') X(i)

SELECT m.MovieID, m.MovieTitle, STRING_AGG(g.GenreName, ',') Genres FROM #MovieDetails m JOIN #MovieGenre mg ON m.MovieID = mg.MovieID JOIN #Genres g ON g.GenreID = mg.GenreID JOIN @idTable i ON g.GenreID = i.id GROUP BY m.MovieID, m.MovieTitle

SELECT m.MovieID, m.MovieTitle, STRING_AGG(g.GenreName, ',') Genres FROM #MovieDetails m JOIN #MovieGenre mg ON m.MovieID = mg.MovieID JOIN #Genres g ON g.GenreID = mg.GenreID WHERE m.MovieID IN (SELECT MovieID FROM #MovieGenre mg JOIN @idTable i ON mg.GenreID = i.id) GROUP BY m.MovieID, m.MovieTitle

You could fancy too...

DECLARE @allMode  BIT = 0

SELECT m.MovieID, m.MovieTitle, STRING_AGG(g.GenreName, ',') Genres FROM #MovieDetails m JOIN #MovieGenre mg ON m.MovieID = mg.MovieID JOIN #Genres g ON g.GenreID = mg.GenreID LEFT JOIN @idTable i ON g.GenreID = i.id WHERE (m.MovieID IN (SELECT MovieID FROM #MovieGenre mg JOIN @idTable i ON mg.GenreID = i.id) AND @allMode = 1) OR (i.id IS NOT NULL AND @allMode = 0) GROUP BY m.MovieID, m.MovieTitle

Graham
  • 619
  • 4
  • 12