106

How can I get diff to show only added and deleted lines? If diff can't do it, what tool can?

C. Ross
  • 3,125

11 Answers

125

Try comm

Another way to look at it:

  • Show lines that only exist in file a: (i.e. what was deleted from a)

      comm -23 a b
    
  • Show lines that only exist in file b: (i.e. what was added to b)

      comm -13 a b
    
  • Show lines that only exist in one file or the other: (but not both)

      comm -3 a b | sed 's/^\t//'
    

(Warning: if a line that exists only in file a starts with a TAB, that first TAB will also be removed from the output.)

Sorted files only

NOTE: Both files need to be sorted for comm to work properly. If they aren't already sorted, you should sort them:

sort <a >a.sorted
sort <b >b.sorted
comm -12 a.sorted b.sorted

If the files are extremely long, this may be quite a burden as it requires an extra copy and therefore twice as much disk space.

Or if you use a modern shell:

comm -12 <(sort a) <(sort b)
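As a worked example of the commands above, here is a minimal sketch that runs comm on two small unsorted files. The file names old.txt and new.txt and their contents are made up for illustration, and sorted temporary copies are used so that it also runs in a plain POSIX shell without process substitution.

```shell
#!/bin/sh
# Hypothetical sample data; names and contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\nfig\n' > old.txt
printf 'fig\nkiwi\napple\n' > new.txt

# comm needs sorted input, so sort into temporary copies first.
sort old.txt > old.sorted
sort new.txt > new.sorted

# Lines that exist only in old.txt (deleted).
deleted=$(comm -23 old.sorted new.sorted)
# Lines that exist only in new.txt (added).
added=$(comm -13 old.sorted new.sorted)

echo "deleted: $deleted"
echo "added: $added"
```

Because the inputs are sorted first, the output reflects which lines differ, not where in the original files they appeared.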
TomOnTime
  • 8,381
23

To show additions and deletions without context, line numbers, +, -, <, >, !, etc., you can use diff like this:

diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt 

For example, given two files:

a.txt

Common
Common
A-ONLY
Common

b.txt

Common
B-ONLY
Common
Common

The following command will show lines either removed from a or added to b:

diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt 

output:

B-ONLY
A-ONLY

This slightly different command will show lines removed from a.txt:

diff --changed-group-format='%<' --unchanged-group-format='' a.txt b.txt 

output:

A-ONLY

Finally, this command will show lines added to b.txt:

diff --changed-group-format='%>' --unchanged-group-format='' a.txt b.txt

output:

B-ONLY
15

comm might do what you want. From its man page:

DESCRIPTION

Compare sorted files FILE1 and FILE2 line by line.

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

These columns are suppressable with -1, -2 and -3 respectively.

Example:

[root@dev ~]# cat a
common
shared
unique

[root@dev ~]# cat b
common
individual
shared

[root@dev ~]# comm -3 a b
    individual
unique

And if you just want the unique lines and don't care which file they're in:

[root@dev ~]# comm -3 a b | sed 's/^\t//'
individual
unique

As the man page says, the files must be sorted beforehand.

markdrayton
  • 2,449
4

Visual comparison tools fit two files together so that a segment with the same number of lines but differing content will be considered a changed segment. Completely new lines between matching segments are considered added segments.

This is also how the sdiff command-line tool works; it shows a side-by-side comparison of two files in a terminal. Changed lines are separated by a | character. If a line exists only in file A, < is used as the separator character. If a line exists only in file B, > is used as the separator. If the files themselves contain no < or > characters, you can use this to show only added and deleted lines:

sdiff A B | grep '[<>]'
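A small sketch of that filter in action, assuming sdiff from GNU diffutils is installed. The sample contents of A and B are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample files named A and B, as in the command above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'common\nshared\nunique\n' > A
printf 'common\nindividual\nshared\n' > B

# Keep only rows whose separator is < (line only in A) or > (line only in B).
only_one_side=$(sdiff A B | grep '[<>]')
echo "$only_one_side"
```

Rows marked with | (changed lines) are not matched by this pattern, so a line that sdiff pairs up as "changed" will not appear in the result.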
3

No, diff doesn't actually show the differences between two files in the way one might think. It produces a sequence of editing commands for a tool like patch to use to change one file into another.

The difficulty for any attempt at doing what you're looking for is how to define what constitutes a line that has changed versus a deleted one followed by an added one. It is also unclear what to do when lines are added, deleted and changed adjacent to each other.
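To make the "editing commands" point concrete, here is a minimal sketch that prints diff's default output for two tiny files. The file names a and b and their contents are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sample files; contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'common\nshared\nunique\n' > a
printf 'common\nindividual\nshared\n' > b

# diff exits with status 1 when the files differ, so that is not an error here.
script=$(diff a b || true)
echo "$script"
```

The lines 1a2 and 3d3 in the output are the edit commands: "after line 1 of a, add line 2 of b" and "delete line 3 of a". The lines starting with > and < are the text those commands operate on.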

2

Thanks senarvi, your solution (not voted for) actually gave me EXACTLY what I wanted after looking for ages on a ton of pages.

Using your answer, here is what I came up with to get the list of things changed/added/deleted. The example uses 2 versions of the /etc/passwd file and prints out the username for the relevant records.

#!/bin/bash
sdiff passwd1 passwd2 | grep '[|]' | awk -F: '{print "changed: " $1}'
sdiff passwd1 passwd2 | grep '[<]' | awk -F: '{print "deleted: " $1}'
sdiff passwd1 passwd2 | grep '[>]' | awk -F\> '{print $2}' | awk -F: '{print "added: " $1}'
2

That's what diff does by default... Maybe you need to add some flags to ignore whitespace?

diff -b -B

should ignore blank lines and different numbers of spaces.
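A quick sketch of the effect of these flags, using two made-up files (spaced.txt and compact.txt are hypothetical names) that differ only in spacing and in one blank line:

```shell
#!/bin/sh
# Hypothetical sample files that differ only in whitespace.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'foo  bar\n\nbaz\n' > spaced.txt
printf 'foo bar\nbaz\n' > compact.txt

# Plain diff reports the whitespace differences (exit status 1 is expected).
plain=$(diff spaced.txt compact.txt || true)
# With -b (ignore changes in amount of whitespace) and -B (ignore blank lines)
# the same comparison should report nothing.
ignoring=$(diff -b -B spaced.txt compact.txt || true)

echo "plain diff:"
echo "$plain"
echo "with -b -B:"
echo "$ignoring"
```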

1

I find it simplest to use grep:

Added lines:

grep -xvFf filea.txt fileb.txt

Removed lines:

grep -xvFf fileb.txt filea.txt

-x: match whole lines
-v: lines NOT matching the pattern(s)
-F: treat patterns as fixed strings, not regular expressions
-f <otherfile>: read the list of patterns from a file
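Here is a minimal sketch of both commands on sample data. The contents of filea.txt and fileb.txt are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample files; contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple\nbanana\ncherry\n' > filea.txt
printf 'apple\ncherry\ndate\n' > fileb.txt

# Lines of fileb.txt that match no whole line of filea.txt (added).
added=$(grep -xvFf filea.txt fileb.txt)
# Lines of filea.txt that match no whole line of fileb.txt (removed).
removed=$(grep -xvFf fileb.txt filea.txt)

echo "added: $added"
echo "removed: $removed"
```

Note that this treats the pattern file as a set of lines: order is ignored, and a line that occurs in both files is never reported, even if it occurs a different number of times in each.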

1

I find this particular form often useful:

diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format='' f g

Example:

printf 'a\nb\nc\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
     --new-line-format=$'+%l\n' \
     --unchanged-line-format='' \
     f g

Output:

-b
-c
+B
+C
-e
-f
+E
+F

So it shows old lines with - followed immediately by the corresponding new line with +.

If c were missing from f:

printf 'a\nb\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
     --new-line-format=$'+%l\n' \
     --unchanged-line-format='' \
     f g

it looks like this:

-b
+B
+C
-e
-f
+E
+F

The format is documented at man diff:

       --line-format=LFMT
              format all input lines with LFMT

and:

       LTYPE is 'old', 'new', or 'unchanged'.
              GTYPE is LTYPE or 'changed'.

and:

              LFMT (only) may contain:

       %L     contents of line

       %l     contents of line, excluding any trailing newline

       [...]
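Since %L keeps the trailing newline, a variant that avoids bash's $'...' quoting can be sketched as follows. This assumes GNU diff (the line-format options are not part of POSIX diff) and reuses the f and g contents from the first example.

```shell
#!/bin/sh
# Same sample data as the first example, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g

# %L is the line including its newline, so no explicit \n is needed.
# diff exits with status 1 when the files differ, hence the "|| true".
marked=$(diff --old-line-format='-%L' \
              --new-line-format='+%L' \
              --unchanged-line-format='' \
              f g || true)
echo "$marked"
```

One difference to be aware of: with %L, a final line that lacks a trailing newline is printed without one, whereas the %l form always appends the newline given in the format.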

Related question: https://stackoverflow.com/questions/15384818/how-to-get-the-difference-only-additions-between-two-files-in-linux

Tested in Ubuntu 18.04.

0

We can combine diff and sed to achieve what you want. Let's take the same example from https://serverfault.com/a/68717/947477

[root@dev ~]# cat file1
common
shared
unique

[root@dev ~]# cat file2
common
individual
shared

To show added lines with + and deleted lines with - we can use

[root@dev ~]# diff -u file1 file2 | sed -n '/^\(+\|-\)/p'

--- file1 2022-03-25 18:30:57.507551352 +0530
+++ file2 2022-03-25 18:31:15.087860053 +0530
+individual
-unique

Here, -u selects the unified output format, and sed keeps only the lines that begin with - or + (which includes the two file header lines).

A more straightforward answer is

diff file1 file2
1a2
> individual
3d3
< unique
Jabir Ali
  • 101
  • 2
-1

File1:

text670_1
text067_1
text067_2

File2:

text04_1
text04_2
text05_1
text05_2
text067_1
text067_2
text1000_1

Use:

diff -y file1 file2

This shows two columns, one for each file.

Output:

text670_1                           
                                  > text04_1
                                  > text04_2
                                  > text05_1
                                  > text05_2
text067_1                           text067_1
text067_2                           text067_2
                                  > text1000_1
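To hide the lines that are the same in both columns, GNU diff also offers --suppress-common-lines. A minimal sketch, assuming GNU diff and the File1 and File2 contents listed above:

```shell
#!/bin/sh
# Recreate the two sample files from this answer in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'text670_1\ntext067_1\ntext067_2\n' > file1
printf 'text04_1\ntext04_2\ntext05_1\ntext05_2\ntext067_1\ntext067_2\ntext1000_1\n' > file2

# Side-by-side output without the rows that are identical in both files.
# diff exits with status 1 when the files differ, hence the "|| true".
differing=$(diff -y --suppress-common-lines file1 file2 || true)
echo "$differing"
```

The remaining rows still carry the <, > and | markers, so they can be filtered further if only additions or only deletions are wanted.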
Adriano
  • 97