31

I simply need to get the match from a regular expression:

$ cat myfile.txt | SOMETHING_HERE "/(\w).+/"

The output must be only what was matched inside the parentheses.

I don't think I can use grep, because it prints the whole line.

Please let me know how to do this.

Alex L
  • 591

7 Answers

27

Use the -o option in grep.

For example:

$ echo "foobarbaz" | grep -o 'b[aeiou]r'
bar
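
Note that -o prints the whole match rather than a capture group, and it prints each match on its own line, so a line with several matches yields several output lines. A minimal sketch, assuming a made-up sample string (the variable names are illustrative only):

```shell
# Hypothetical sample input: one line containing three matches.
sample='foobarbaz bir bur'

# -o prints only the matched text, one match per output line.
matches=$(printf '%s\n' "$sample" | grep -o 'b[aeiou]r')

printf '%s\n' "$matches"
```

If only part of the match is wanted, the pattern itself has to be narrowed, since plain grep has no way to select a group.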
Amandasaurus
  • 33,461
25

Two things:

  • As stated by @Rory, you need the -o option, so that only the match is printed (instead of the whole line).
  • In addition, you need the -P option to use Perl regular expressions, which include useful elements like lookahead (?= ) and lookbehind (?<= ). These check for surrounding text but don't include it in the match, so it isn't printed.

If you want only the part inside the parentheses to be matched, do the following:

grep -oP '(?<=\/\()\w(?=\).+\/)' myfile.txt

If the file contains the string /(a)5667/, grep will print 'a', because:

  • /( are found by \/\(, but because they are in a look-behind (?<= ) they are not reported
  • a is matched by \w and is thus printed (because of -o )
  • )5667/ are found by \).+\/, but because they are in a look-ahead (?= ) they are not reported
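
The walk-through above can be checked directly by feeding the sample string on stdin. This is a sketch that assumes a grep with -P support (GNU grep built with PCRE); the variable names are illustrative. The second command is a variant that swaps the lookbehind for \K, which discards everything matched so far from the reported match.

```shell
# Sample line taken from the explanation above.
line='/(a)5667/'

# Lookbehind and lookahead: only the \w in the middle is reported.
captured=$(printf '%s\n' "$line" | grep -oP '(?<=\/\()\w(?=\).+\/)')

# Variant: \K drops the already-matched "/(" from the reported match.
captured_k=$(printf '%s\n' "$line" | grep -oP '\/\(\K\w(?=\).+\/)')

printf '%s\n%s\n' "$captured" "$captured_k"
```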
David
  • 103
  • 3
DrYak
  • 533
18
    sed -n "s/^.*\(captureThis\).*$/\1/p"

-n      don't print lines
s       substitute
^.*     matches anything before the captureThis 
\( \)   capture everything between and assign it to \1 
.*$     matches anything after the captureThis 
\1      replace everything with captureThis 
p       print it
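
As a concrete sketch of the same pattern, the following pulls a version number out of made-up input. The input text and the variable names are assumptions for illustration; the line without a match is dropped because of -n.

```shell
# Hypothetical input: only the second line contains the pattern.
input='no match on this line
version=1.2.3 build 42'

# Replace the whole line with the captured group and print only on success.
version=$(printf '%s\n' "$input" | sed -n 's/^.*version=\([0-9.]*\).*$/\1/p')

printf '%s\n' "$version"
```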
Joshua
  • 579
8

Because you tagged your question as bash in addition to shell, there is another solution besides grep:

Bash has had its own regular expression engine since version 3.0, using the =~ operator, much like Perl.

Given the following code:

#!/bin/bash
DATA="test <Lane>8</Lane>"

if [[ "$DATA" =~ \<Lane\>([[:digit:]]+)\<\/Lane\> ]]; then
        echo "$BASH_REMATCH"        # whole match: <Lane>8</Lane>
        echo "${BASH_REMATCH[1]}"   # first capture group: 8
fi
  • Note that you have to invoke the script with bash and not just sh in order to get these extensions
  • $BASH_REMATCH will give the whole string as matched by the whole regular expression, so <Lane>8</Lane>
  • ${BASH_REMATCH[1]} will give the part matched by the 1st group, thus only 8
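
A common way to keep the pattern readable is to store it in a variable and use that variable unquoted on the right-hand side of =~, which avoids the backslash escapes. The sketch below wraps this in a function; extract_lane is a hypothetical helper name, not something from the answer, and the script has to be run with bash.

```shell
#!/bin/bash
# extract_lane is a hypothetical helper, shown only to illustrate BASH_REMATCH.
extract_lane() {
    # Keeping the regex in a variable means <, > and / need no escaping.
    local re='<Lane>([[:digit:]]+)</Lane>'
    if [[ $1 =~ $re ]]; then
        # Index 1 holds the text matched by the first parenthesised group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

extract_lane 'test <Lane>8</Lane>'
```

The non-zero return status lets a caller tell "no match" apart from an empty capture.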
DrYak
  • 533
5

Assuming the file contains:

$ cat file
Text-here>xyz</more text

If you want the character(s) between > and </, you can use any of:

grep -oP '.*\K(?<=>)\w+(?=<\/)' file
sed -nE 's:^.*>(\w+)</.*$:\1:p' file
awk '{print(gensub("^.*>(\\w+)</.*$","\\1","g"))}' file
perl -nle 'print $1 if />(\w+)<\//' file

All print the string "xyz".

If you want to capture the digits of this line:

$ cat file
Text-<here>1234</text>-ends

grep -oP '.*\K(?<=>)[0-9]+(?=<\/)' file
sed -E 's:^.*>([0-9]+)</.*$:\1:' file
awk '{print(gensub(".*>([0-9]+)</.*","\\1","g"))}' file
perl -nle 'print $1 if />([0-9]+)<\//' file
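
Keep in mind that \w in sed is a GNU extension and gensub is specific to gawk. Where those are not available, a sketch using only basic regular expressions might look like the following, with [[:alnum:]_] standing in for \w and \{1,\} for +. The variable names are illustrative.

```shell
# Sample lines copied from the two examples above.
first='Text-here>xyz</more text'
second='Text-<here>1234</text>-ends'

# Basic regular expressions: groups are written \( \), repetition \{1,\}.
word=$(printf '%s\n' "$first" | sed -n 's:^.*>\([[:alnum:]_]\{1,\}\)</.*$:\1:p')
digits=$(printf '%s\n' "$second" | sed -n 's:^.*>\([0-9]\{1,\}\)</.*$:\1:p')

printf '%s\n%s\n' "$word" "$digits"
```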

4

If you want only what is in the parentheses, you need something that supports capturing sub-matches (named or numbered capturing groups). I don't think grep or egrep can do this, but perl and sed can. For example, with perl:

If a file called foo contains the following line:

/adsdds      /

And you do:

perl -nle 'print $1 if /\/(\w).+\//' foo

The letter a is returned. That might not be what you want, though. If you tell us what you are trying to match, you might get better help. $1 is whatever was captured in the first set of parentheses, $2 would be the second set, and so on.

Kyle Brandt
  • 85,693
0

This will accomplish what you are requesting, but I don't think it is what you really want. I put the .* at the front of the regex to eat up anything before the match, but that is a greedy operation, so the capture ends up being the last \w character that still has at least one character after it.

Note that without -E, sed uses basic regular expressions, so you need to escape the parens and the +.

sed 's/.*\(\w\).\+/\1/' myfile.txt
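
The effect of the greedy .* is easiest to see side by side with an anchored pattern. This is a sketch on a made-up line: the first command is the one above (it needs GNU sed for \w and \+), the second is an assumed alternative that skips leading non-word characters and captures the first word character instead.

```shell
# Hypothetical sample line.
line='/adsdds /'

# Greedy .* pushes the capture as far right as it can go.
last=$(printf '%s\n' "$line" | sed 's/.*\(\w\).\+/\1/')

# Anchored alternative: skip non-word characters, capture the first word character.
first=$(printf '%s\n' "$line" | sed 's/^[^[:alnum:]_]*\([[:alnum:]_]\).*$/\1/')

printf '%s\n%s\n' "$last" "$first"
```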