Don't need the whole line, just the match from regular expression

Question

I simply need to get the match from a regular expression:

$ cat myfile.txt | SOMETHING_HERE "/(\w).+/"

The output has to be only what was matched, inside the parentheses.

Don't think I can use grep because it matches the whole line.

Please let me know how to do this.

score 27 · Answer 1 · answered Aug 06 '09 at 16:36

Use the -o option in grep.

Eg:

$ echo "foobarbaz" | grep -o 'b[aeiou]r'
bar
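
As a small sketch of how -o behaves when one line holds several matches: each match is printed on its own line. The input string below is made up for illustration.

```shell
# Hypothetical input with two matches on a single line.
matches=$(printf 'foobarbaz birch\n' | grep -o 'b[aeiou]r')
printf '%s\n' "$matches"
# prints:
# bar
# bir
```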

Amandasaurus

score 25 · Accepted Answer · edited Jan 19 '21 at 12:59

Two things:

As stated by @Rory, you need the -o option, so that only the match is printed (instead of the whole line).
In addition, you need the -P option to use Perl regular expressions, which include useful elements like look-ahead (?= ) and look-behind (?<= ). These check for the surrounding text but don't make it part of the match, so it isn't printed.

If you want only the part inside the parentheses to be matched, do the following:

grep -oP '(?<=\/\()\w(?=\).+\/)' myfile.txt

If the file contains the string /(a)5667/, grep will print 'a', because:

/( are found by \/\(, but because they are in a look-behind (?<= ) they are not reported
a is matched by \w and is thus printed (because of -o )
)5667/ are found by \).+\/, but because they are in a look-ahead (?= ) they are not reported
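
To try this without a file, here is a minimal sketch that feeds the sample string from above through the same pattern. It assumes a GNU grep built with PCRE support (that is what -P needs); the variable names are only for illustration.

```shell
# Sample line taken from the explanation above; variable names are illustrative.
line='/(a)5667/'
inner=$(printf '%s\n' "$line" | grep -oP '(?<=\/\()\w(?=\).+\/)')
printf '%s\n' "$inner"   # prints: a
```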

score 18 · Answer 3 · answered Apr 22 '16 at 15:58

    sed -n "s/^.*\(captureThis\).*$/\1/p"

-n      don't print lines
s       substitute
^.*     matches anything before the captureThis 
\( \)   capture everything between and assign it to \1 
.*$     matches anything after the captureThis 
\1      replace everything with captureThis 
p       print it
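
A concrete sketch of the same idea, with a made-up input and pattern in place of captureThis: it prints only the number that follows "id=" and stays silent for lines that don't match.

```shell
# Hypothetical input: two lines, only the first contains "id=<digits>".
ids=$(printf 'user id=42 active\nno match here\n' |
      sed -n 's/^.*id=\([0-9][0-9]*\).*$/\1/p')
printf '%s\n' "$ids"   # prints: 42
```

Putting the literal "id=" right before the group keeps the greedy ^.* from swallowing part of the number.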

score 8 · Answer 4 · answered Jan 20 '15 at 13:47

Because you tagged your question as bash in addition to shell, there is another solution besides grep:

Bash has had built-in regular expression matching since version 3.0, using the =~ operator, much like Perl.

Given the following code:

#!/bin/bash
DATA="test <Lane>8</Lane>"

if [[ "$DATA" =~ \<Lane\>([[:digit:]]+)\<\/Lane\> ]]; then
        echo "$BASH_REMATCH"        # whole match
        echo "${BASH_REMATCH[1]}"   # first capture group
fi

Note that you have to invoke it as bash and not just sh in order to get all the extensions.
$BASH_REMATCH will give the whole string as matched by the whole regular expression, so <Lane>8</Lane>
${BASH_REMATCH[1]} will give the part matched by the 1st group, thus only 8
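
If you need this in more than one place, the same match can be wrapped in a function. This is only a sketch: extract_lane is a hypothetical name, and keeping the regex in a variable is just one way to avoid escaping < and > inside [[ ]].

```shell
#!/bin/bash
# extract_lane is a hypothetical helper name, not something bash provides.
extract_lane() {
    local re='<Lane>([[:digit:]]+)</Lane>'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # text matched by the first group
    else
        return 1                             # no match: print nothing
    fi
}

extract_lane 'test <Lane>8</Lane>'   # prints: 8
```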

score 5 · Answer 5 · 2017-07-22T20:10:11.157

Assuming the file contains:

$ cat file
Text-here>xyz</more text

And you want the character(s) between > and </ , you can use either:

grep:  grep -oP '.*\K(?<=>)\w+(?=<\/)' file
sed:   sed -nE 's:^.*>(\w+)</.*$:\1:p' file
awk:   awk '{print(gensub("^.*>(\\w+)</.*$","\\1","g"))}' file
perl:  perl -nle 'print $1 if />(\w+)<\//' file

All will print the string "xyz".

If you want to capture the digits of this line:

$ cat file
Text-<here>1234</text>-ends

grep:  grep -oP '.*\K(?<=>)[0-9]+(?=<\/)' file
sed:   sed -E 's:^.*>([0-9]+)</.*$:\1:' file
awk:   awk '{print(gensub(".*>([0-9]+)</.*","\\1","g"))}' file
perl:  perl -nle 'print $1 if />([0-9]+)<\//' file

Kyle Brandt · Answer 6 · 2009-08-06T18:19:10.477

If you want only what is in the parentheses, you need something that supports capturing sub-matches (named or numbered capturing groups). I don't think grep or egrep can do this; perl and sed can. For example, with perl:

If a file called foo has a line in it as follows:

/adsdds      /

And you do:

perl -nle 'print $1 if /\/(\w).+\//' foo

The letter a is returned. That might not be what you want, though. If you tell us what you are trying to match, you might get better help. $1 is whatever was captured in the first set of parentheses, $2 would be the second set, etc.

score 0 · Answer 7 · answered Aug 06 '09 at 18:02

This will accomplish what you are requesting, but I don't think it is what you really want. I put the .* in the front of the regex to eat up anything before the match, but that is a greedy operation, so this only captures the last \w character that still has at least one character after it (typically the penultimate character of the line).

Note that you need to escape the parens and the +, because sed uses basic regular expressions by default.

sed 's/.*\(\w\).\+/\1/' myfile.txt
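
A quick sketch of the greedy effect on a made-up line, next to an anchored variant that skips the leading .* for comparison. GNU sed is assumed here, since \w and \+ are GNU extensions.

```shell
# GNU sed assumed: \w and \+ are GNU extensions to basic regular expressions.
greedy=$(printf 'hello world\n' | sed 's/.*\(\w\).\+/\1/')
printf '%s\n' "$greedy"   # prints: l  (the leading .* swallowed "hello wor")

# Anchoring at the start instead captures the first word character.
first=$(printf 'hello world\n' | sed 's/^\(\w\).\+/\1/')
printf '%s\n' "$first"    # prints: h
```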
