I have a varchar(200) column that contains entries such as,
ABC123124_A12312
ABC123_A1212
ABC123124_B12312
AC123124_AD12312
A12312_123
etc..
I want to replace a sequence of numbers with a single * so that I can group the distinct non-numeric patterns in the table.
The result for this set would be
ABC*_A*
ABC*_B*
AC*_AD*
A*_*
I have written the following primitive query below, it works correctly, but takes a long time to run on a huge table.
I need help with rewriting or editing it to improve it's performance. SQL Server 2014
-- 1. replace all numeric characters with '*'
-- 2. replace multiple consecutive '*' with just a single '*'
SELECT REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE
(REPLACE(SampleID, '0', '*'),
'1', '*'),
'2', '*'),
'3', '*'),
'4', '*'),
'5', '*'),
'6', '*'),
'7', '*'),
'8', '*'),
'9', '*')
, '*', '~*') -- replace each occurrence of '*' with '~*' (token plus asterisk)
, '*~', '') -- replace in the result of the previous step each occurrence of '*~' (asterisk plus token) with '' (an empty string)
, '~*', '*') -- replace in the result of the previous step each occurrence of '~*' (token plus asterisk) with '*' (asterisk)
AS Pattern
FROM TABLE_X
Data
The column includes letters and numbers [A-Za-z0-9] and may also include the special characters / and _. I want to replace any sequence of numbers with *, but I do not know if the entry has special characters, and if so how many special characters.
I also do not know how many sequences of numbers are in the entry. All I know is that an entry must have a minimum of 1 number sequence.