
I am trying to strip odd characters from strings using PowerShell. I used the following output to attempt to learn on my own:

get-help about_regular_expressions

I am trying to take a string that is mostly ASCII, but that has one anomalous character that needs to be removed. (The registered trademark symbol; the R with a circle around it.) I'd like to strip any occurrence of that character out of a string, leaving everything else intact. What is the cleanest expression to accomplish this using PowerShell 2.0?

[EDIT]

I have done a little further digging, and I believe the problem is stemming from the Import-CSV call I'm using.

When I cut-and-paste this symbol from within notepad into the PS prompt, and assign it to a string, I match just fine:

# This code yields 'True'
$string -match "\u00ae"

However, when I use Import-CSV on a CSV file where one of the fields contains the special symbol, I believe somehow the raw bytes are getting converted, because doing something like this doesn't work:

# This code yields 'False'
$source = Import-CSV -path testing.csv
# The following extracts the entry / line containing the special symbol that was
# copy-and-pasted above
$culprit = $source[5].COMMITTEE_NAME
$culprit -match "\u00ae"

However, the following DOES work:

# This yields True
$filedata = get-content testing.csv
$filedata[6] -match "\u00ae"

So I think my followup question to all of this is:

How can I keep the strings intact through the import-csv call so that calls to -match for the individual fields will still work?

Larold
  • 812

1 Answer

It's important to note that the PowerShell console doesn't display Unicode well. You'll have to use the ISE to "see" what's happening. Have a look at this related SO question for some additional reading. You can still use the ® character in PowerShell if you don't need to watch the script in action.

In the ISE:

PS C:\Users\jscott> $string = "This string contains the ® character"
PS C:\Users\jscott> $string
This string contains the ® character

PS C:\Users\jscott> $string.Replace("®","")
This string contains the  character

PS C:\Users\jscott> $string ="This ® string ® contains ® many ® characters ®®®®"
PS C:\Users\jscott> $string
This ® string ® contains ® many ® characters ®®®®

PS C:\Users\jscott> $string.Replace("®","")
This  string  contains  many  characters 

To use the character code instead of the literal:

PS C:\Users\jscott> $string.Replace("$([char]0x00AE)","")
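Since the question used -match, the same pattern can be handed to the -replace operator, which takes a regular expression. The following is only a minimal sketch: the sample text is hypothetical, and the second pattern (stripping every non-ASCII character) is an assumption that goes beyond what was asked.

```powershell
# Build the sample text from the code point so the script file's own
# encoding does not matter. The sample value is hypothetical.
$reg = [char]0x00AE
$string = "Acme$reg Widgets$reg Committee"

# -replace takes a regular expression; \u00AE matches the registered sign.
$noMark = $string -replace '\u00AE', ''

# Broader variant (an assumption, not part of the question):
# remove every character outside the 7-bit ASCII range.
$asciiOnly = $string -replace '[^\x00-\x7F]', ''

$noMark
$asciiOnly
```

Unlike the .Replace() method, -replace treats its first argument as a pattern, so regex metacharacters in it would need escaping.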

Per your question update:

You need to convert the ASCII file to Unicode/UTF-8 before running it through Import-Csv (I didn't realize you were using it). Have a look at this and this for other examples.

You may just want to pipe the initial CSV file through Get-Content, or use Export-Csv -Encoding Unicode, to pre-process the file and make life easier.
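A rough sketch of that pre-processing, under some assumptions: the temp file names and sample row are hypothetical stand-ins for testing.csv, the COMMITTEE_NAME column comes from the question, and the system default encoding is assumed to include the ® character. Option 1 lets Get-Content decode the lines and parses them in memory with ConvertFrom-Csv; option 2 re-saves the file as UTF-8 before Import-Csv reads it.

```powershell
# Hypothetical sample file standing in for testing.csv.
$tmp = [System.IO.Path]::GetTempPath()
$src = Join-Path $tmp 'testing-sample.csv'
$dst = Join-Path $tmp 'testing-sample-utf8.csv'
$reg = [char]0x00AE
$text = "ID,COMMITTEE_NAME`r`n1,Acme$reg Committee`r`n"

# Write with the system default encoding (assumed to be able to represent
# the registered sign), mimicking a file saved from Notepad.
[System.IO.File]::WriteAllText($src, $text, [System.Text.Encoding]::Default)

# Option 1: Get-Content decodes the lines; ConvertFrom-Csv parses them.
$rows = @(Get-Content $src | ConvertFrom-Csv)

# Option 2: re-save the content as UTF-8, then import the converted file.
Get-Content $src | Set-Content -Encoding UTF8 $dst
$rows2 = @(Import-Csv -Path $dst)

# Both checks should now behave like the Get-Content test in the question.
$rows[0].COMMITTEE_NAME -match '\u00AE'
$rows2[0].COMMITTEE_NAME -match '\u00AE'
```

Option 1 avoids writing a second file, which may be the simpler choice when the CSV is small enough to process in one pass.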

jscott
  • 25,114