When I first answered this question, the rules for which characters make an identifier valid or invalid were a bit hazy (the documentation isn't entirely clear on this, and is in some ways misleading). However, since that time I have done some research into the specific rules for valid regular identifiers (i.e. those that do not need to be delimited by either "name" or [name]) related to this DBA.SE question:
How to create Unicode parameter and variable names
and have discovered the exact rules. The research into getting those exact rules led to finding a list of Unicode categories (not blocks) that can be used in combination to determine both the "first" / "start" character and any possible "subsequent" / "continue" characters. That research is towards the end of this post (Steps 7 and 8):
The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 2
Reworking your pattern, I came up with the following:
(?!\[(?i:PRIMARY|GROUP)\])\[([\p{L}\p{Nl}\u0023\u005F\uFF3F][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}\u0023\u0024\u0040]*)\](?!\])
Some notes:
You don't need to "or" the separate items together with a | in a character class (i.e. the [stuff_here]) since it is already a list. In fact, including the pipe symbol actually means you are adding | itself to the list of allowed characters ;-).
You need to separate out the exclusion list (i.e. PRIMARY|GROUP...) such that they can be wrapped in [ and ]. Otherwise you allow for something like [PRIMARYa] to get skipped and keep its delimiters.
I added (?!\]) to the end to account for embedded escaped end-delimiters: ]].
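The first note can be demonstrated directly. Here is a quick sketch in Python's stdlib re module (the behavior is the same in the .NET regex used by SSMS):

```python
import re

# A pipe inside a character class is a literal |, not alternation:
with_pipe = re.compile(r'[a|b]')  # matches 'a', '|', or 'b'
without = re.compile(r'[ab]')     # matches only 'a' or 'b'

print(bool(with_pipe.search('|')))  # True: '|' itself is now allowed
print(bool(without.search('|')))    # False
```

So the pipe did not break the original pattern; it just silently widened the set of allowed characters.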
This pattern is 99.5% effective. The only thing it does not catch is if you have both:
- an embedded leading delimiter: [, AND
- no embedded escaped end-delimiter: ]]
Meaning, [Te[st] will erroneously un-delimit it to be [Test (oops), but both [Te[s]]t] and [Tes]]t] will remain delimited as expected. And really, how often are there embedded delimiters? And even then, how often is there just the beginning delimiter and not the ending delimiter? Very rare, I would think.
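These edge cases can be checked with a small script. This is a minimal sketch using Python's stdlib re module; since it does not support the Unicode category classes (\p{L}, \p{Nl}, etc.) from the real pattern, the character classes below are narrowed to an ASCII-only approximation for illustration:

```python
import re

# ASCII-only approximation of the bracket-removal pattern
pattern = re.compile(
    r'(?!\[(?i:PRIMARY|GROUP)\])'        # skip the two reserved words
    r'\[([A-Za-z_#][A-Za-z0-9_#$@]*)\]'  # group 1: the bare identifier
    r'(?!\])'                            # reject an escaped ]] delimiter
)

print(pattern.sub(r'\1', '[Te[st]'))    # [Test  <- the known failure case
print(pattern.sub(r'\1', '[Te[s]]t]'))  # [Te[s]]t]  stays delimited
print(pattern.sub(r'\1', '[Tes]]t]'))   # [Tes]]t]   stays delimited
```

In the failure case the engine skips past the first [ (the embedded [ is not an allowed character), then happily matches the inner [st] on its own.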
Of course, I said it was 99.5% effective, but it only catches 2 reserved words. That is easy enough to fix, since we can get the list of reserved words. Adding in all 185 reserved words (well, 184, as I left out "WITHIN GROUP" since it has a space in it and is thus naturally excluded) gives us the following:
(I made it a comment instead of a code block so that it would be readable, since it is a single line [that can be copied and pasted into the "Find" box])
(?!\[(?i:ADD|ALL|ALTER|AND|ANY|AS|ASC|AUTHORIZATION|BACKUP|BEGIN|BETWEEN|BREAK|BROWSE|BULK|BY|CASCADE|CASE|CHECK|CHECKPOINT|CLOSE|CLUSTERED|COALESCE|COLLATE|COLUMN|COMMIT|COMPUTE|CONSTRAINT|CONTAINS|CONTAINSTABLE|CONTINUE|CONVERT|CREATE|CROSS|CURRENT|CURRENT_DATE|CURRENT_TIME|CURRENT_TIMESTAMP|CURRENT_USER|CURSOR|DATABASE|DBCC|DEALLOCATE|DECLARE|DEFAULT|DELETE|DENY|DESC|DISK|DISTINCT|DISTRIBUTED|DOUBLE|DROP|DUMP|ELSE|END|ERRLVL|ESCAPE|EXCEPT|EXEC|EXECUTE|EXISTS|EXIT|EXTERNAL|FETCH|FILE|FILLFACTOR|FOR|FOREIGN|FREETEXT|FREETEXTTABLE|FROM|FULL|FUNCTION|GOTO|GRANT|GROUP|HAVING|HOLDLOCK|IDENTITY|IDENTITY_INSERT|IDENTITYCOL|IF|IN|INDEX|INNER|INSERT|INTERSECT|INTO|IS|JOIN|KEY|KILL|LEFT|LIKE|LINENO|LOAD|MERGE|NATIONAL|NOCHECK|NONCLUSTERED|NOT|NULL|NULLIF|OF|OFF|OFFSETS|ON|OPEN|OPENDATASOURCE|OPENQUERY|OPENROWSET|OPENXML|OPTION|OR|ORDER|OUTER|OVER|PERCENT|PIVOT|PLAN|PRECISION|PRIMARY|PRINT|PROC|PROCEDURE|PUBLIC|RAISERROR|READ|READTEXT|RECONFIGURE|REFERENCES|REPLICATION|RESTORE|RESTRICT|RETURN|REVERT|REVOKE|RIGHT|ROLLBACK|ROWCOUNT|ROWGUIDCOL|RULE|SAVE|SCHEMA|SECURITYAUDIT|SELECT|SEMANTICKEYPHRASETABLE|SEMANTICSIMILARITYDETAILSTABLE|SEMANTICSIMILARITYTABLE|SESSION_USER|SET|SETUSER|SHUTDOWN|SOME|STATISTICS|SYSTEM_USER|TABLE|TABLESAMPLE|TEXTSIZE|THEN|TO|TOP|TRAN|TRANSACTION|TRIGGER|TRUNCATE|TRY_CONVERT|TSEQUAL|UNION|UNIQUE|UNPIVOT|UPDATE|UPDATETEXT|USE|USER|VALUES|VARYING|VIEW|WAITFOR|WHEN|WHERE|WHILE|WITH|WRITETEXT)\])\[([\p{L}\p{Nl}\u0023\u005F\uFF3F][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}\u0023\u0024\u0040]*)\](?!\])
Also, with the new pattern you will need to change the replacement to be $1 instead of $2.
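To illustrate the replacement outside of SSMS, here is a hedged sketch in Python's stdlib re module. The Unicode category classes from the real pattern are replaced with an ASCII approximation, and only two reserved words are excluded, so this is illustration only; Python's \1 plays the role of the .NET $1 replacement:

```python
import re

# ASCII-only sketch; the real pattern uses \p{L}, \p{Nl}, \p{Mn}, etc.,
# which the stdlib re module does not support.
pattern = re.compile(
    r'(?!\[(?i:PRIMARY|GROUP)\])'        # negative lookahead: reserved words
    r'\[([A-Za-z_#][A-Za-z0-9_#$@]*)\]'  # group 1: the bare identifier
    r'(?!\])'                            # not followed by another ]
)

sql = "SELECT [f], [p$o], [prImary], [12y], [a b], [PRIMARYa] FROM dbo.[_a];"
# \1 is the captured identifier without its delimiters
print(pattern.sub(r'\1', sql))
# SELECT f, p$o, [prImary], [12y], [a b], PRIMARYa FROM dbo._a;
```

Note that [prImary] keeps its brackets (the reserved-word check is case-insensitive) while [PRIMARYa] loses them, since the exclusion only applies when the reserved word is the entire identifier.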
I tested with the following:
SELECT [f], [p$o], -- match these
[prImary], [12y], [a b], [g%g], [j^6], [f¾], [u²], [h⁊], [s|w]
FROM dbo.[$a],
dbo.[#a], dbo.[@d], dbo.[_a] -- match these
WHERE f.[&col1] = N'it will also [match] <- that :-(';
/* and [this] */
--[PRIMARY]
--[PRIMARy]
--[PRIMARYg]
--[1PRIMARYg]
--[PRIMARY g]
--[GROUP]
--[GROUPBY]
--[@Test]
--[Te]]st]
--[Te]]st]]]
--SELECT 1 AS [Te]]st]]a[b]; --OOPS
--[Te]]s[t]]ab]
--[Te]]s[t]]ab]]
--SELECT 2 AS [Te]]st]]], 2.1 AS [p];
--SELECT 3 AS [Tes[tab]; --OOPS
--[Te[st] --OOPS
--[Te[s]]t]
--[Tes]]t]
--[T[e[s]]t]
--[dbo].[TableName]
--[dbo].[1TableName]
--[dbo.1TableName]
--[[t] --OOPS
--SELECT 4 AS []]t];
--[[t]]]
-- SELECT 1 AS [d], '---' AS [1f], '$' AS [g]
-- SELECT 1 AS d, '---' AS [1f], '$' AS g
--[OPTION]
--[OPTIONt]
That all being said, I will reiterate what I originally said in a comment on the question:
Is making the script "look nicer" worth any amount of risk of potential errors? I would say "no". I used to remove square brackets for pretty much the same reason, but have gotten to the place where I now always add them because "guaranteed working" beats aesthetics. For me. I realize not for everyone. Of course, if you are more comfortable with .NET RegEx, then SQL# (which I wrote) has RegEx_Replace for free.
Also, keep in mind that this is a global search and replace, so it will match strings of this pattern even if they are in comments or string literals. If you have such strings then this might not be desirable and you will either need to find a way to exclude those (especially string literals), or give up this fight ;).