
I have two data sets. The first has approximately 50,000 movie and song titles, and the second has 20,000 blacklist strings. I am looking for the best algorithm to detect movie/song titles that contain blacklisted word(s).

Example: Dataset #1

The Lord Of The Rings
E.T.
Star Wars
...
(50k items)

Blacklist Data set

Lord
Home Alone
Matrix
ar
...
(20k items)

Items in these data sets may be a single character or a few words. String search algorithms like Boyer-Moore do not help me here, since I have more than one needle to search for in the haystack. I (probably) need an algorithm that finds all combinations efficiently and then runs a string search (regex, maybe?) for each combination.
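To make the multi-needle problem concrete: one technique built for exactly this shape of problem is the Aho-Corasick automaton, which compiles all blacklist strings into a single trie with failure links and then scans each title once. The following is only a minimal sketch, not a tuned implementation. It assumes case-insensitive plain substring matching (so "ar" flags "Star Wars"), which the question does not actually specify, and the names `build_automaton` and `find_blacklisted` are made up for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from an iterable of patterns."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node
    for pat in patterns:
        if not pat:
            continue  # skip empty blacklist entries
        node = 0
        for ch in pat:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(pat)

    # Breadth-first pass to fill in failure links; depth-1 nodes fail to root.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_blacklisted(title, automaton):
    """Return the set of patterns that occur as substrings of title."""
    goto, fail, out = automaton
    node = 0
    found = set()
    for ch in title:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found.update(out[node])
    return found


# Sample data taken from the question; lowercasing is an assumption.
titles = ["The Lord Of The Rings", "E.T.", "Star Wars"]
blacklist = ["Lord", "Home Alone", "Matrix", "ar"]

automaton = build_automaton(p.lower() for p in blacklist)
flagged = {t: sorted(find_blacklisted(t.lower(), automaton)) for t in titles}
print(flagged)
# {'The Lord Of The Rings': ['lord'], 'E.T.': [], 'Star Wars': ['ar']}
```

With this approach the automaton is built once from the 20k blacklist entries, and each of the 50k titles is scanned in a single pass, rather than running one search per blacklist entry per title. If whole-word matching is wanted instead of substring matching, the hits would need an extra word-boundary check.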

Eray
