1514

I need to get a list of human readable du output.

However, du does not have a "sort by size" option, and piping to sort doesn't work with the human readable flag.

For example, running:

du | sort -n -r 

Outputs disk usage sorted by size (descending):

du |sort -n -r
65108   .
61508   ./dir3
2056    ./dir4
1032    ./dir1
508     ./dir2

However, running it with the human readable flag does not sort properly:

du -h | sort -n -r

508K    ./dir2
64M     .
61M     ./dir3
2.1M    ./dir4
1.1M    ./dir1

Does anyone know of a way to sort du -h by size?

Tom Feiner
  • 18,598

39 Answers

2138

As of GNU coreutils 7.5 released in August 2009, sort allows a -h parameter, which allows numeric suffixes of the kind produced by du -h:

du -hs * | sort -h

If you are using a sort that does not support -h, you can install GNU Coreutils. E.g. on an older Mac OS X:

brew install coreutils
du -hs * | gsort -h

From sort manual:

-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
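To list the largest entries first, combine -h with -r (and optionally head):

du -hs * | sort -rh | head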

ptman
  • 29,862
113
du | sort -nr | cut -f2- | xargs du -hs
cadrian
  • 1,365
89

There is an immensely useful tool I use called ncdu that is designed for finding those pesky high disk-usage folders and files, and removing them. It's console based, fast and light, and has packages on all the major distributions.
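For example, assuming ncdu is installed from your distribution's packages, you can scan a tree without crossing filesystem boundaries:

ncdu -x /var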

71

@Douglas Leeder, one more answer: Sort the human-readable output from du -h using another tool. Like Perl!

du -h | perl -e 'sub h{%h=(K=>10,M=>20,G=>30);($n,$u)=shift=~/([0-9.]+)(\D)/;
return $n*2**$h{$u}}print sort{h($b)<=>h($a)}<>;'

Split onto two lines to fit the display. You can use it this way or make it a one-liner, it'll work either way.

Output:

4.5M    .
3.7M    ./colors
372K    ./plugin
128K    ./autoload
100K    ./doc
100K    ./syntax

EDIT: After a few rounds of golf over at PerlMonks, the final result is the following:

perl -e'%h=map{/.\s/;99**(ord$&&7)-$`,$_}`du -h`;die@h{sort%h}'
48
du -k * | sort -nr | cut -f2 | xargs -d '\n' du -sh
Jake Wilson
  • 9,133
25

As far as I can see you have three options:

  1. Alter du to sort before display.
  2. Alter sort to support human sizes for numerical sort.
  3. Post process the output from sort to change the basic output to human readable.

You could also do du -k and live with sizes in KiB.

For option 3 you could use the following script:

#!/usr/bin/env python3

import sys
import re

sizeRe = re.compile(r"^(\d+)(.*)$")

for line in sys.stdin:
    mo = sizeRe.match(line)
    if mo:
        size = int(mo.group(1))  # size in KiB, as produced by plain du
        if size < 1024:
            size = "%dK" % size
        elif size < 1024 ** 2:
            size = "%dM" % (size // 1024)
        else:
            size = "%dG" % (size // 1024 ** 2)
        print("%s%s" % (size, mo.group(2)))
    else:
        print(line, end="")
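A sketch of how you might use it, assuming the script is saved as humanize_du.py (the filename is just an example) and that du is reporting its default 1K blocks:

du | sort -nr | python3 humanize_du.py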
24

I've had that problem as well and I'm currently using a workaround:

du -scBM | sort -n

This will not produce scaled values but will always show the size in megabytes. That's less than perfect, but for me it's better than nothing (or than displaying the size in bytes).
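The same trick should work with other fixed block sizes accepted by GNU du, e.g. gigabytes:

du -scBG | sort -n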

21

Here's an example that shows the directories in a more compact summarized form. It handles spaces in directory/filenames.

% du -s * | sort -rn | cut -f2- | xargs -d "\n" du -sh

53G  projects
21G  Desktop
7.2G VirtualBox VMs
3.7G db
3.3G SparkleShare
2.2G Dropbox
272M apps
47M  incoming
14M  bin
5.7M rpmbuild
68K  vimdir.tgz
slm
  • 8,010
21

I found this posting elsewhere. This shell script will do what you want without calling du on everything twice. It uses awk to convert the raw bytes to a human-readable format. Of course, the formatting is slightly different (everything is printed to one decimal place of precision).

#!/bin/bash
du -B1 | sort -nr | awk '{
    sum = $1
    hum[1024**3] = "G"; hum[1024**2] = "M"; hum[1024] = "K"
    for (x = 1024**3; x >= 1024; x /= 1024) {
        if (sum >= x) { printf "%.1f%s\t\t", sum/x, hum[x]; print $2; break }
    }
}'

Running this in my .vim directory yields:

4.4M            .
3.6M            ./colors
372.0K          ./plugin
128.0K          ./autoload
100.0K          ./syntax
100.0K          ./doc

(I hope 3.6M of color schemes isn't excessive.)

21

This version uses awk to create extra columns for sort keys. It only calls du once. The output should look exactly like du.

I've split it into multiple lines, but it can be recombined into a one-liner.

du -h |
  awk '{printf "%s %08.2f\t%s\n", 
    index("KMG", substr($1, length($1))),
    substr($1, 0, length($1)-1), $0}' |
  sort -r | cut -f2,3

Explanation:

  • index("KMG", …) - use the position of the size's unit letter in the string "KMG" to substitute 1, 2, 3 for K, M, G for grouping by units; if there's no unit (the size is less than 1K), there's no match and a zero is returned (perfect!)
  • print the new fields - unit, value (to make the alpha-sort work properly it's zero-padded, fixed-length) and original line
  • index the last character of the size field
  • pull out the numeric portion of the size
  • sort the results, discard the extra columns

Try it without the cut command to see what it's doing.

Here's a version which does the sorting within the AWK script (using gawk's asorti) and doesn't need cut:

du -h |
   awk '{idx = sprintf("%s %08.2f %s", 
         index("KMG", substr($1, length($1))),
         substr($1, 0, length($1)-1), $0);
         lines[idx] = $0}
    END {c = asorti(lines, sorted);
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'
16

Sort files by size, in MiB:

du --block-size=MiB --max-depth=1 path | sort -n
11

I have a simple but useful Python wrapper for du called dutop. Note that we (the coreutils maintainers) are considering adding functionality to sort so that it can sort "human" output directly.

pixelbeat
  • 256
11

Got another one:

$ du -B1 | sort -nr | perl -MNumber::Bytes::Human=format_bytes -F'\t' -lane 'print format_bytes($F[0])."\t".$F[1]'

I'm starting to like perl. You might have to do a

$ cpan Number::Bytes::Human

first. To all the perl hackers out there: Yes, I know that the sort part can also be done in perl. Probably the du part, too.

0x89
  • 6,535
10

This snippet was shamelessly snagged from 'Jean-Pierre' at http://www.unix.com/shell-programming-scripting/32555-du-h-sort.html. Is there a way I can better credit him?

du -k | sort -nr | awk '
     BEGIN {
        split("KB,MB,GB,TB", Units, ",");
     }
     {
        u = 1;
        while ($1 >= 1024) {
           $1 = $1 / 1024;
           u += 1
        }
        $1 = sprintf("%.1f %s", $1, Units[u]);
        print $0;
     }
    '
Bozojoe
  • 635
9

Use the "-g" flag

 -g, --general-numeric-sort
              compare according to general numerical value

And on my /usr/local directory produces output like this:

$ du |sort -g

0   ./lib/site_ruby/1.8/rubygems/digest
20  ./lib/site_ruby/1.8/rubygems/ext
20  ./share/xml
24  ./lib/perl
24  ./share/sgml
44  ./lib/site_ruby/1.8/rubygems/package
44  ./share/mime
52  ./share/icons/hicolor
56  ./share/icons
112 ./share/perl/5.10.0/YAML
132 ./lib/site_ruby/1.8/rubygems/commands
132 ./share/man/man3
136 ./share/man
156 ./share/perl/5.10.0
160 ./share/perl
488 ./share
560 ./lib/site_ruby/1.8/rubygems
604 ./lib/site_ruby/1.8
608 ./lib/site_ruby
Mick T
  • 129
8

Found this one online... seems to work OK

du -sh * | tee /tmp/duout.txt | grep G | sort -rn ; cat /tmp/duout.txt | grep M | sort -rn ; cat /tmp/duout.txt | grep K | sort -rn ; rm /tmp/duout.txt
Nick Roz
  • 103
  • 4
6

I learned awk from concocting this example yesterday. It took some time, but it was great fun, and I learned how to use awk.

It runs du only once, and its output is very similar to that of du -h.

du --max-depth=0 -k * | sort -nr | awk '{ if($1>=1024*1024) {size=$1/1024/1024; unit="G"} else if($1>=1024) {size=$1/1024; unit="M"} else {size=$1; unit="K"}; if(size<10) format="%.1f%s"; else format="%.0f%s"; res=sprintf(format,size,unit); printf "%-8s %s\n",res,$2 }'

It shows numbers below 10 with one decimal place.

marlar
  • 461
6

Here is the simple method I use; it has very low resource usage and gets you what you need:

du --max-depth=1 | sort -n | awk 'BEGIN {OFMT = "%.0f"} {print $1/1024,"MB", $2}'

0 MB ./etc
1 MB ./mail
2 MB ./tmp
123 MB ./public_html
JacobN
  • 156
6

Another one:

du -h | perl -e'
@l{ K, M, G } = ( 1 .. 3 );
print sort {
    ($aa) = $a =~ /(\w)\s+/;
    ($bb) = $b =~ /(\w)\s+/;
    $l{$aa} <=> $l{$bb} || $a <=> $b
  } <>'
5

du -cka --max-depth=1 /var/log | sort -rn | head -10 | awk '{print ($1)/1024,"MB ", $2}'

Patrick
  • 81
4

If you need to handle spaces you can use the following

 du -d 1| sort -nr | cut -f2 | sed 's/ /\\ /g' | xargs du -sh

The additional sed statement helps alleviate issues with folders whose names contain spaces, such as "Application Support".

Chealion
  • 5,753
2

http://dev.yorhel.nl/ncdu

command: ncdu

Directory navigation, sorting (name and size), graphing, human readable, etc...

2

Another awk solution -

du -k ./* | sort -nr | 
awk '
{split("KB,MB,GB",size,",");}
{x = 1;while ($1 >= 1024) 
{$1 = $1 / 1024;x = x + 1} $1 = sprintf("%-4.2f%s", $1, size[x]); print $0;}'


[jaypal~/Desktop/Reference]$ du -k ./* | sort -nr | awk '{split("KB,MB,GB",size,",");}{x = 1;while ($1 >= 1024) {$1 = $1 / 1024;x = x + 1} $1 = sprintf("%-4.2f%s", $1, size[x]); print $0;}'
15.92MB ./Personal
13.82MB ./Personal/Docs
2.35MB ./Work Docs
1.59MB ./Work Docs/Work
1.46MB ./Personal/Raa
584.00KB ./scan 1.pdf
544.00KB ./Personal/Resume
44.00KB ./Membership.xlsx
16.00KB ./Membership Transmittal Template.xlsx
2

Here is an example:

du -h /folder/subfolder --max-depth=1 | sort -hr

Returns:

233M    /folder/subfolder
190M    /folder/subfolder/myfolder1
15M     /folder/subfolder/myfolder4
6.4M    /folder/subfolder/myfolder5
4.2M    /folder/subfolder/myfolder3
3.8M    /folder/subfolder/myfolder2

You could also add | head -10 to show just the top 10 (or any other number of) sub-folders in the specified directory.
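For example, combining the two:

du -h /folder/subfolder --max-depth=1 | sort -hr | head -10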

ode2k
  • 174
1

Voilà:

du -sk /var/log/* | sort -rn | awk '{print $2}' | xargs -ia du -hs "a"
weeheavy
  • 4,149
  • 1
  • 30
  • 41
1

I had been using the solution provided by @ptman, but a recent server change made it no longer viable. Instead, I'm using the following bash script:

#!/bin/bash
# File: duf.sh
# list contents of the current directory by increasing 
#+size in human readable format

# for some systems, "-d 1" will be "--max-depth=1"
du -k -d 1 | sort -g | awk '
{
if($1<1024)
    printf("%.0f KB\t%s\n",$1,$2);
else if($1<1024*1024)
    printf("%.1f MB\t%s\n",$1/1024,$2);
else
    printf("%.1f GB\t%s\n",$1/1024/1024,$2);
}'
1

du -s * | sort -nr | cut -f2 | xargs du -sh

1

There are a lot of answers here, many of which are duplicates. I see three trends: piping through a second du call, using complicated shell/awk code, and using other languages.

Here is a POSIX-compliant solution using du and awk that should work on every system.

I've taken a slightly different approach, adding -x to ensure we stay on the same filesystem (I only ever need this operation when I'm short on disk space, so why not weed out stuff I've mounted within this FS tree or moved and symlinked back?) and displaying constant units to make for easier visual parsing. In this case, I typically choose not to sort so I can better see the hierarchical structure.

sudo du -x | awk '
  $1 > 2^20 { s=$1; $1=""; printf "%7sG%s\n", sprintf("%.2f",s/2^21), $0 }'

(Since this is in consistent units, you can then append | sort -n if you really want sorted results.)

This filters out any directory whose (cumulative) content fails to exceed 512MB and then displays sizes in gigabytes. By default, du uses a 512-byte block size (so awk's condition of 2^20 blocks is 512MB and its 2^21 divisor converts the units to GB; we could use du -kx with $1 > 512*1024 and s/1024^2 to be more human-readable). Inside the awk condition, we set s to the size so we can remove it from the line ($0). This retains the delimiter (which is collapsed to a single space), so the final %s represents a space and then the aggregated directory's name. %7s aligns the rounded %.2f GB size (increase to %8s if you have >10TB).
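A sketch of that du -kx variant, using the same logic with kilobyte units:

sudo du -kx | awk '
  $1 > 512*1024 { s=$1; $1=""; printf "%7sG%s\n", sprintf("%.2f",s/1024^2), $0 }'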

Unlike most of the solutions here, this properly supports directories with spaces in their names (though every solution, including this one, will mishandle directory names containing line breaks).

Adam Katz
  • 1,082
0

Here's my solution, a simple bash script that only calls du once, and shows you only directories of size 1 MB or larger:

#!/usr/bin/env bash
# Usage: my_du.sh [subdirectory levels]
#   For efficiency, only calls "du" once, and stores results in a temp file
#   Stephen Becker, 2/23/2010

if [ $# -gt 0 ]; then
# You may prefer, as I do, to just summarize the contents of a directory
# and not view the size of its subdirectories, so use this:
    du -h --max-depth $1 > temp_du_file
else
    du -h > temp_du_file
fi


# Show all directories of size > 1 GB:
cat temp_du_file | grep "^\([0-9]\|\.\)\+G" | sort -nr
# Show all directories of size > 1 MB:
cat temp_du_file | grep "^\([0-9]\|\.\)\+M" | sort -nr

rm temp_du_file
0

Why not throw another hat into the ring... it's an old question, but here's an example that is (mostly) pure shell script (fwiw) -- i.e., just bash and no perl/python/awk/etc. So in that sense maybe it offers something new to the discussion (or not). It calculates the file sizes just once, but prints them in various units (my preference). (The un-simplified version includes a getopts option that excludes the "GB" column if unwanted.)

#!/bin/bash

printf -- ' %9s %9s %9s       %-30s\n' 'K'        'M'        'G'        'Path'
printf -- ' %9s %9s %9s       %-30s\n' '--------' '--------' '--------' '-----------'
du -sk "$@" | while read val; do
    file=$(echo "$val" | cut -f2-)
    size_k=$(echo "$val"  | cut -f1)
    printf ' %9s %9s %9s       %-30s\n' \
          ${size_k}  \
          $(( size_k / 1024 ))  \
          $(( size_k / 1024 / 1024 ))  \
          "$file"
  done | sort -n
michael
  • 404
0

At least with the usual tools, this will be hard because of the format the human-readable numbers are in (note that sort does a "good job" here as it sorts the numbers - 508, 64, 61, 2.1, 1.1 - it just can't sort floating point numbers with an additional multiplier).

I'd try it the other way round - use the output from "du | sort -n -r" and afterwards convert the numbers to human-readable format with some script or program.
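A minimal sketch of that approach, assuming du's default 1K block size and using awk for the conversion:

du | sort -nr | awk '{
    size = $1; split("K M G T", unit, " "); i = 1
    while (size >= 1024 && i < 4) { size /= 1024; i++ }
    $1 = sprintf("%.1f%s", size, unit[i]); print
}'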

schnaader
  • 121
0

What you can try is:

for i in `du -s * | sort -n | cut -f2`
do
  du -h $i;
done

Hope that helps.

0
du | sort -nr | awk '{ cmd = "du -h -d0 "$2"| cut -f1"; cmd | getline human; close(cmd); print human"\t"$2 }'
0

The following solution is similar to cadrian's original; however, this will only run two du commands as opposed to one du for each directory in the tree.

du -hs `du |sort -g |cut -f2- `

However, cadrian's solution is more robust, as the above will not work for very heavily populated trees: it could exceed the limit on the size of the arguments passed to du.

0

This is the alias I have in my .profile

alias du='sudo du -xh --max-depth=1 | sort -h'

sort -h is what really addresses the question asked here.

Other useful options: du -x stays on the same filesystem, and sudo helps avoid errors from directories that aren't world-readable. Also, I always do du --max-depth=1, then drill down further as needed.

Tagar
  • 159
0

Sorts in ascending order:

du -s ./* | sort -n| cut -f 2-| xargs -I{} du -sh {}
TrinitronX
  • 1,161
0

Loosely based on the logic in this one-liner, I wrote a script that provides sorted, human-readable du(1) output. Aside from the -h flag needed for human-readable output, it requires no non-POSIX commands.

It is available at https://github.com/pleappleappleap/sorted-human-du.

0

Yet another du script!

As there are already a lot of answers, I will just post my own script here. I have been using it for more than eight years now.

It can be run as:

/somepath/rdu.sh [-b] [/somepath] [minSize]

where

  • the optional flag -b tells it to use byte counts instead of block counts
  • the optional path given as the first argument defaults to the current directory
  • if no second argument is given, the minimum size to be printed is 256MB.

The output could look like:

\___   3.01G                 21.67%                .cache
|   \___   1.37G                 45.54%                mozilla
|   |   \___   1.37G                100.00%                firefox
|   |   |   \___ 581.71M                 41.48%                billiethek.default
|   |   |   |   \___ 522.64M                 89.85%                cache2
|   |   |   |   |   \___ 522.45M                 99.96%                entries
...

There is the script:

#!/bin/bash

if [ "$1" == "-b" ] ;then
    shift; units=(b K M G T P); duargs="-xbs"; minsize=${2:-$((256*1024**2))}
else
    units=(K M G T P); duargs="-xks"; minsize=${2:-$((256*1024))}
fi

humansize() {
    local _c=$1 _i=0
    while [ ${#_c} -gt 3 ] ;do ((_i++)); _c=$((_c>>10)); done
    _c=$(( ( $1*1000 ) >> ( 10*_i ) ))
    printf ${2+-v} $2 "%.2f%s" ${_c:0:${#_c}-3}.${_c:${#_c}-3} ${units[_i]}
}

percent() {
    local p=000$((${1}00000/$2))
    printf ${3+-v} $3 "%.2f%%" ${p:0:${#p}-3}.${p:${#p}-3}
}

device=$(stat -c %d "${1:-.}")
printf -v sep "%16s" ""

rdu() {
    local _dir="$1" _spc="$2" _crt _siz _str _tot _pct
    while read _siz _crt ;do
        if [ "$_crt" = "total" ]; then
            _tot=$_siz
        else
            [ "$_tot" ] || _tot=$_siz
            if [ $_siz -gt $minsize ] ;then
                humansize $_siz _str
                percent $_siz $_tot _pct
                printf "%s___ %7s%s%7s%s%s\n" \
                    "$_spc" $_str "$sep" $_pct "$sep" "${_crt##*/}"
                [ -d "$_crt" ] && [ $(stat -c %d "$_crt") -eq $device ] &&
                    rdu "$_crt" "|   $_spc"
            fi
        fi
    done < <(
        find "$_dir" -mindepth 1 -maxdepth 1 -xdev \
            \( -type f -o -type d \) -printf "%D;%p\n" |
            sed -ne "s/^${device};//p" | tr '\n' '\0' |
            xargs -0 du ${duargs}c | sort -nr
    )
}

rdu "${1:-.}"

You can view the script on my own site or download it from there.

-2

Instead of contorting du and friends, you can use ls alone to do what you want:

ls -1Ssh

That will print all files sorted by size, written in human-readable form. The first line it prints is the total; if you want to get rid of it, you can simply use

ls -1Ssh | tail -n +2

You can add the -r flag to ls if you want the files in the reversed order (from smallest to largest).

drrlvn
  • 129