1

I am trying to write a regex that can replace '[', ']' and '].' with '_' and in cases where ']' is the last character it should be replaced with '' but I am struggling to come up with a regex for it that works in all cases because Snowflake does not support lookahead & lookbehind.

My question is, has anyone tried/achieved to do this before? or is it impossible and I should just give up??

Eg:

  • look_behind[0] --> look_behind_0
  • look_behind[0]_positive[1] --> look_behind_0_positive_1
nbk
  • 8,699
  • 6
  • 14
  • 27

1 Answers1

1

As far as I can see, you have no need to use a regex at all!

Using a search engine, I found that the functions LEFT() (manual for all string functions), RIGHT() and TRANSLATE() have the same signatures (mostly) in Snowflake (1, 2 & 3) as in PostgreSQL. All of the code below is available on the fiddle here.

Populate with your sample data with extra records:

INSERT INTO tab VALUES 
('look_behind[0]'),
('look_behind[0]_positive[1]'),
('No_[brackets]_[at]_end'),      -- two added records for testing
('[]][][_sderfrx÷=/_[]ddf]');

Then, we do this:

--
-- Works where the string <underscore><left-bracket> (_[) OR the string 
-- <right-bracket><underscore> (]_) are to be replaced by two underscores.
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--

SELECT str, LENGTH(str) AS len, TRANSLATE ( CASE WHEN (RIGHT(str, 1) = ']') THEN RIGHT(str, -1) ELSE str END, '[]', '__' -- two underscores - one for each bracket (left & right) ) AS f_str, REGEXP_COUNT ( str, '[[|]]', 1, -- default anyway - not required 'i' -- not relevant for non-alpha characters ) AS f_str_chars_cnt FROM tab;

Result:

str                       len f_str                 f_str_chars_cnt
look_behind[0]             14 look_behind_0                     2
look_behind[0]_positive[1] 26 look_behind_0__positive_1         4
No_[brackets]_[at]_end     22 No__brackets___at__end            4
[]][][_sderfrx÷=/_[]ddf]   24 _______sderfrx÷=/___ddf           9

You could also do this:

--
-- Works where the string <underscore><left-bracket> (_[) OR the string 
-- <right-bracket><underscore> (]_) are to be replaced by one underscore. 
--
-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- It is worth noting that the order of the replacements is important as
-- the operations are not commutative - i.e. x R y <> y R x (not necessarily!).
--
--

SELECT str, TRANSLATE ( CASE WHEN RIGHT (REPLACE(REPLACE(str, '[', ''), ']', ''), 1) = ']' THEN LEFT (REPLACE(REPLACE(str, '[', ''), ']', ''), -1) ELSE str END, '[]', '__' ) FROM tab;

Result (the reader can count the underscores!):

str                         translate
look_behind[0]              look_behind_0
look_behind[0]_positive[1]  look_behind_0_positive_1
No_[brackets]_[at]_end      No__brackets___at__end
[]][][_sderfrx÷=/_[]ddf]    _______sderfrx÷=/__ddf

Regexes are expensive relative to simple string functions. If you really want a regular expression, you can do the following (REGEXP_REPLACE() also has a similar signature in PostgreSQL as in Snowflake):

-- If the last character of the string is <right-bracket> (]), it is replaced
-- by the empty string.
--
-- Works where the string <underscore><left-bracket> (_[) AND the string 
-- <right-bracket><underscore> (]_) are to be replaced by two underscores (__).
--
-- The order of the replacements for the desired outcome in terms of what character/
-- string is replaced first is left up to the reader.
--

SELECT REGEXP_REPLACE ( REGEXP_REPLACE ( str, '(.*)]$', -- replace with pattern '\1' ), '[[]]', '_',
'g' -- "global" flag to replace all occurrences ) FROM tab;

Result: same as first (TRANSLATE()) snippet above.

Performance analysis (see end of fiddle):

The entire results of the fiddle aren't included here, but just to see the times is interesting.

TRANSLATE()          - Execution Time: 0.037 ms
REGEXP_REPLACE()     - Execution Time: 0.100 ms
REPLACE(...REPLACE() - Execution Time: 0.027 ms

The time taken for the two string function solutions is roughly 33% of the time taken for the regular expression solution. This is why it's always worth checking to see if an ordinary string function will suffice in cases such as this.

Vérace
  • 30,923
  • 9
  • 73
  • 85