0

I created this PostgreSQL query while troubleshooting a problem that turned out to be due to non-printing characters in my strings:

with cte as ( select 'The character between the xes is evil: xx' as my_bad_string )
select substring( s.my_bad_string, n.i, 1) char_i
  , ascii(substring( s.my_bad_string, n.i, 1)) ascii_char_i
from 
  cte as s
  cross join (
    select generate_series(1, length( 'The character between the xes is evil: xx' ))
  ) n(i)

It works. It helped me to find the evil character 141.

Question: what is a more elegant way to write this query so that my hard-coded string appears exactly once instead of twice?

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
mdahlman
  • 357
  • 3
  • 14

1 Answers1

1

Use unnest() after converting string to array with string_to_array():

SELECT char_i, ascii(char_i) AS ascii_char_i
FROM   unnest(string_to_array('The character between the xes is evil: xx', NULL)) char_i;

NULL as separator splits after every character in string_to_array().

There is also regexp_split_to_table(), it's more versatile, but also more expensive.

To guarantee original order of characters throw WITH ORDINALITY into the mix:

SELECT pos, char_i, ascii(char_i) AS ascii_char_i
FROM   unnest(string_to_array('The character between the xes is evil: xx', NULL)) WITH ORDINALITY t(char_i, pos)
ORDER  BY pos;

But while the query is that simple the order is preserved anyway - by the current implementation.

About tricky whitespace and non-printing (UNICODE) characters:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633