
It often happens that after designing my regexp (on regex101.com) I want to paste it into my program. Consider this regexp that matches numbers and strings (but keep in mind this is a general question!):

^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$

Every backslash and every double quote in it has to be escaped before the regexp can be pasted into a language that delimits strings with ".

Needless to say, doing this by hand drives me crazy. I face this problem both at work, on a C++ project, and at home, on Java and JavaScript projects.

How can I deal with this efficiently?

4 Answers


If you feel it is worth it, make your own small DSL (or maybe one already exists) so you can write (Java):

// ^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$
// @formatter:off
Pattern pattern = Patterning.start() // ^
    .group()
    .lookahead()
        ...
        .set("0-9").plus()
        .string("E=m.c^2") // \Q ... \E
    .lookaheadEnd()
    .groupEnd()
    .end()                               // $
    .build();
// @formatter:on

class Patterning { ... }

That said, most people know regex, or at least it is worth learning, if only to do powerful replaces in the editor.
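As a rough illustration of the DSL idea, here is a minimal sketch of a fluent builder. MiniRegexBuilder and BuilderDemo are hypothetical names (this is not the elided Patterning class above, and not an existing library); it supports only a handful of constructs and builds the number half of the question's regexp. Literal text goes through Pattern.quote, so metacharacters such as . never need a hand-typed backslash.

```java
import java.util.regex.Pattern;

// Hypothetical minimal builder in the spirit of the DSL above; not an existing library.
// Each call appends one regex fragment, so no backslash is typed inside a string literal.
final class MiniRegexBuilder {
    private final StringBuilder regex = new StringBuilder();

    static MiniRegexBuilder start() {                 // ^
        MiniRegexBuilder b = new MiniRegexBuilder();
        b.regex.append('^');
        return b;
    }

    MiniRegexBuilder string(String text) {            // literal text, wrapped as \Q...\E
        regex.append(Pattern.quote(text));
        return this;
    }

    MiniRegexBuilder set(String ranges) {             // character class, e.g. "0-9" -> [0-9]
        regex.append('[').append(ranges).append(']');
        return this;
    }

    MiniRegexBuilder plus()     { regex.append('+');   return this; } // one or more
    MiniRegexBuilder optional() { regex.append('?');   return this; } // zero or one
    MiniRegexBuilder group()    { regex.append("(?:"); return this; } // non-capturing group
    MiniRegexBuilder groupEnd() { regex.append(')');   return this; }
    MiniRegexBuilder end()      { regex.append('$');   return this; } // $

    Pattern build() { return Pattern.compile(regex.toString()); }
}

class BuilderDemo {
    // Builds the number half of the question's regexp: optional minus, digits, optional fraction
    static Pattern numberPattern() {
        return MiniRegexBuilder.start()
            .group().string("-").groupEnd().optional()
            .set("0-9").plus()
            .group().string(".").set("0-9").plus().groupEnd().optional()
            .end()
            .build();
    }

    public static void main(String[] args) {
        Pattern number = numberPattern();
        System.out.println(number.pattern());                  // ^(?:\Q-\E)?[0-9]+(?:\Q.\E[0-9]+)?$
        System.out.println(number.matcher("-3.14").matches()); // true
    }
}
```

Each optional piece is wrapped in a non-capturing group so the ? clearly applies to the whole quoted literal rather than relying on how a quantifier after \E is interpreted.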

Joop Eggen

In C++, use raw string literals (added in C++11). Nothing between the delimiter sequences is treated as an escape:

const char *regex = R"-regexp-(^(\"(?:[^\"]|\\\")*\"|\-?[0-9]+(?:\.[0-9]+)?)$)-regexp-";

In this case the delimiter sequence is -regexp-, so the literal opens with R"-regexp-( and closes with )-regexp-".

Useless

Use Unicode character escapes instead of literals. For example:

  • Java

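    // Double the backslash: javac would turn a single-backslash Unicode escape into a bare quote before parsing the string
    boolean b = Pattern.matches("\\u0022", "\"");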
    
  • JavaScript

    /\u0022/.test('"');
    
  • Perl

    '"' =~ /\N{U+0022}/;
    

In addition, a regex built from strings can be split into concatenated pieces on separate lines, each with its own comment, for added clarity:

  • Java

    boolean phone_mask = Pattern.matches("^[^0-9]*"/* Optional non-numeric characters */ +
                            "\\+9{3}" /* Followed by a plus sign and three nines */ +
                            "\\s9"    /* Followed by a space and one nine */  +
                            "\\s9{3}" /* Followed by a space and three nines */ +
                            "\\s9{4}" /* Followed by a space and four nines */ +
                            "$", "Phone: +999 9 999 9999");
    
  • JavaScript

    var phone_mask = RegExp("^[^0-9]*"/* Optional non-numeric characters */ +
                            "\\+9{3}" /* Followed by a plus sign and three nines */ +
                            "\\s9"    /* Followed by a space and one nine */  +
                            "\\s9{3}" /* Followed by a space and three nines */ +
                            "\\s9{4}" /* Followed by a space and four nines */ +
                            "$").test("Phone: +999 9 999 9999");
    


Deduplicator

Write a program that escapes your regexps for you. You can either use it to generate the string literal to paste into your source, or have your program load the regexp on the fly from a separate file.
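A minimal sketch of the code-generating variant in Java, assuming a hypothetical RegexToLiteral class: it reads a raw regexp from standard input and prints it as a double-quoted literal, doubling every backslash and putting a backslash in front of every double quote (newline, carriage return and tab are escaped too).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns a regexp as typed on regex101.com into a
// double-quoted string literal that can be pasted into source code.
class RegexToLiteral {
    static String toJavaLiteral(String regex) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash -> two backslashes
                case '"':  out.append("\\\""); break; // quote -> backslash + quote
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) throws IOException {
        // Read the raw regexp from standard input so nothing has to be escaped by hand
        String regex = new String(System.in.readAllBytes(), StandardCharsets.UTF_8).strip();
        System.out.println(toJavaLiteral(regex));
    }
}
```

Fed the question's regexp, it prints "^(\\\"(?:[^\\\"]|\\\\\\\")*\\\"|\\-?[0-9]+(?:\\.[0-9]+)?)$". For a regexp like this one, the same two rules also produce a valid double-quoted literal for C++ and JavaScript.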

The big downside of the file version is that your logic no longer lives with your source.
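The on-the-fly variant could look like the following sketch, assuming the regexp is stored verbatim in a text file and compiled at run time. RegexFromFile and the file name number-or-string.regex are made up for illustration. Nothing has to be escaped, at the cost of the separation between logic and source mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

class RegexFromFile {
    // The file holds the regexp exactly as designed on regex101.com; no escaping needed.
    // Assumes the pattern has no significant leading/trailing whitespace, because
    // strip() is used to drop the trailing newline most editors add.
    static Pattern compileVerbatim(String fileContents) {
        return Pattern.compile(fileContents.strip());
    }

    static Pattern load(Path file) throws IOException {
        return compileVerbatim(Files.readString(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass a different path as the first argument
        Path file = Path.of(args.length > 0 ? args[0] : "number-or-string.regex");
        Pattern p = load(file);
        System.out.println(p.matcher("-3.14").matches());
    }
}
```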

Pieter B