405

I'd like to graph the size (in bytes, and # of items) of an Amazon S3 bucket and am looking for an efficient way to get the data.

The s3cmd tools provide a way to get the total file size using s3cmd du s3://bucket_name, but I'm worried about its ability to scale since it looks like it fetches data about every file and calculates its own sum. Since Amazon charges users in GB-Months it seems odd that they don't expose this value directly.

Although Amazon's REST API returns the number of items in a bucket, s3cmd doesn't seem to expose it. I could do s3cmd ls -r s3://bucket_name | wc -l but that seems like a hack.

The Ruby AWS::S3 library looked promising, but only provides the # of bucket items, not the total bucket size.

Is anyone aware of any other command line tools or libraries (prefer Perl, PHP, Python, or Ruby) which provide ways of getting this data?

mob
powdahound

27 Answers

507

This can now be done trivially with just the official AWS command line client:

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/

Official Documentation: AWS CLI Command Reference (version 2)

This also accepts path prefixes if you don't want to count the entire bucket:

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/directory
Synexis
philwills
227

The AWS CLI now supports the --query parameter, which takes a JMESPath expression.

This means you can sum the size values given by list-objects using sum(Contents[].Size) and count the objects with length(Contents[]).

This can be run with the official AWS CLI as shown below; the --query parameter was introduced in February 2014.

 aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
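
If you're working in Python rather than the shell, roughly the same JMESPath expression can be applied through boto3's paginator. A minimal sketch (assuming boto3 is installed, using list_objects_v2 rather than list-objects; the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")
pages = s3.get_paginator("list_objects_v2").paginate(Bucket="BUCKETNAME")

# PageIterator.search() applies a JMESPath expression to every page of the
# listing and yields the matched elements; filter out None for empty pages.
sizes = [size for size in pages.search("Contents[].Size") if size is not None]
print(sum(sizes), "bytes in", len(sizes), "objects")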
188

AWS Console:

As of 28 July 2015 you can get this information via CloudWatch. If you want a GUI, go to the CloudWatch console: (Choose Region > ) Metrics > S3

AWS CLI Command:

This is much quicker than some of the other commands posted here, as it does not query the size of each file individually to calculate the sum.

 aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2015-07-15T10:00:00 --end-time 2015-07-31T01:00:00 --period 86400 --statistics Average --region eu-west-1 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=toukakoukan.com Name=StorageType,Value=StandardStorage

Important: You must specify both StorageType and BucketName in the dimensions argument, otherwise you will get no results. All you need to change are --start-time, --end-time, and Value=toukakoukan.com.

Here's a bash script you can use to avoid having to specify --start-time and --end-time manually.

#!/bin/bash
# Usage: ./bucket-size.sh <bucket-name> <region>
bucket=$1
region=$2
now=$(date +%s)
aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time "$((now - 86400))" --end-time "$now" --period 86400 --statistics Average --region "$region" --metric-name BucketSizeBytes --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value=StandardStorage
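
If you'd rather pull the same metric from a library than from the CLI, here's a minimal boto3 sketch (the function name is made up; it assumes the bucket is in the region you pass and that you care about the StandardStorage class):

from datetime import datetime, timedelta, timezone

import boto3

def bucket_size_bytes(bucket_name, region="eu-west-1"):
    # Read the daily BucketSizeBytes metric that S3 publishes to CloudWatch.
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket_name},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        StartTime=now - timedelta(days=2),  # the metric is only published about once a day
        EndTime=now,
        Period=86400,
        Statistics=["Average"],
    )
    datapoints = response["Datapoints"]
    if not datapoints:
        return None  # no datapoint yet, or wrong region/StorageType
    return max(datapoints, key=lambda d: d["Timestamp"])["Average"]

print(bucket_size_bytes("toukakoukan.com"))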
Efren
Sam Martin
109

s3cmd can do this :

s3cmd du s3://bucket-name

wazoox
Stefan Ticu
30

If you want to get the size from AWS Console:

  1. Go to S3 and select the bucket
  2. Click on "Metrics" tab


By default, you should see the Total bucket size metric at the top.

26

If you download a usage report, you can graph the daily values for the TimedStorage-ByteHrs field.

If you want that number in GiB, just divide by 1024 * 1024 * 1024 * 24 (that's GiB-hours for a 24-hour cycle). If you want the number in bytes, just divide by 24 and graph away.
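
As a quick sanity check of that arithmetic, here's a tiny Python sketch (the byte-hours value is made up for illustration):

byte_hours = 259_423_469_568_000  # one day's TimedStorage-ByteHrs from the usage report

average_bytes = byte_hours / 24                # average bytes stored over that day
average_gib = byte_hours / (1024 ** 3) / 24    # the same figure in GiB

print(average_bytes, "bytes =", round(average_gib, 2), "GiB")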

26

Using the official AWS s3 command line tools:

aws s3 ls s3://bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'

A better command just adds the three parameters --summarize --human-readable --recursive after aws s3 ls. --summarize is not strictly required, but it adds a nice total size at the end:

aws s3 ls s3://bucket/folder --summarize --human-readable --recursive
dyltini
16

s4cmd is the fastest way I've found (a command-line utility written in Python):

pip install s4cmd

Now to calculate the entire bucket size using multiple threads:

s4cmd du -r s3://bucket-name
9

You can use the s3cmd utility, e.g.:

s3cmd du -H s3://Mybucket
97G      s3://Mybucket/
Giovanni Toraldo
6

I used the S3 REST/Curl API listed earlier in this thread and did this:

<?php
if (!class_exists('S3')) require_once 'S3.php';

// Instantiate the class
$s3 = new S3('accessKeyId', 'secretAccessKey');
S3::$useSSL = false;

// List your buckets:
echo "S3::listBuckets(): ";
echo '<pre>' . print_r($s3->listBuckets(), 1). '</pre>';

$totalSize = 0;
$objects = $s3->getBucket('name-of-your-bucket');
foreach ($objects as $name => $val) {
    // If you want to get the size of a particular directory, you can do
    // only that.
    // if (strpos($name, 'directory/sub-directory') !== false)
    $totalSize += $val['size'];
}

echo ($totalSize / 1024 / 1024 / 1024) . ' GB';
?>
Vic
6

After poking around the API and trying a few sample queries: S3 will return the full contents of a bucket in one request, and it doesn't need to descend into directories. The results then just need summing across the various XML elements, rather than gathering through repeated calls. I don't have a sample bucket with thousands of items, so I don't know how well it scales, but it seems reasonably simple.
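
For what it's worth, here's a rough boto3 sketch of that approach (the bucket name is a placeholder). One caveat on scale: list responses come back in pages of at most 1,000 keys, so a very large bucket still means one request per page.

import boto3

s3 = boto3.client("s3")

total_size = 0
total_count = 0

# Walk every page of the listing and sum the Size field from Contents.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket-name"):
    for obj in page.get("Contents", []):
        total_size += obj["Size"]
        total_count += 1

print(total_count, "objects,", total_size, "bytes")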

4

For large buckets I recommend using the S3 Usage Report (see my How To on how to get it). Basically, you need to download the Usage Report for the S3 service for the last day with Timed Storage - Byte Hrs and parse it to get disk usage.

cat report.csv | awk -F, '{printf "%.2f GB %s %s \n", $7/(1024**3 )/24, $4, $2}' | sort -n
4

The AWS documentation tells you how to do it:

aws s3 ls s3://bucketname --recursive --human-readable --summarize

This is the output you get:

2016-05-17 00:28:14    0 Bytes folder/
2016-05-17 00:30:57    4.7 KiB folder/file.jpg
2016-05-17 00:31:00  108.9 KiB folder/file.png
2016-05-17 00:31:03   43.2 KiB folder/file.jpg
2016-05-17 00:31:08  158.6 KiB folder/file.jpg
2016-05-17 00:31:12   70.6 KiB folder/file.png
2016-05-17 00:43:50   64.1 KiB folder/folder/folder/folder/file.jpg

Total Objects: 7

   Total Size: 450.1 KiB
4

A bit late, but the best way I found is to use the reports in the AWS portal. I made a PHP class for downloading and parsing the reports. With it you can get the total number of objects for each bucket, the total size in GB or byte-hours, and more.

Check it out and let me know if it was helpful.

AmazonTools

2

For a really low-tech approach: use an S3 client that can calculate the size for you. I'm using Panic's Transmit: click on a bucket, do "Get Info" and click the "Calculate" button. I'm not sure how fast or accurate it is compared to the other methods, but it seems to return the size I expected.

zmippie
  • 129
2

Since there are so many answers, I figured I'd pitch in with my own. I wrote my implementation in C# using LINQPad. Copy, paste, and enter the access key, secret key, region endpoint, and bucket name you want to query. Also, make sure to add the AWSSDK NuGet package.

Testing against one of my buckets, it gave me a count of 128075 and a size of 70.6GB. I know that's 99.9999% accurate, so I'm good with the result.

void Main() {
    var s3Client = new AmazonS3Client("accessKey", "secretKey", RegionEndpoint.???);
    var stop = false;
    var objectsCount = 0;
    var objectsSize = 0L;
    var nextMarker = string.Empty;

    while (!stop) {
        // Fetch the next page of the listing (ListObjects returns at most 1,000 keys per call).
        var response = s3Client.ListObjects(new ListObjectsRequest {
            BucketName = "",
            Marker = nextMarker
        });

        objectsCount += response.S3Objects.Count;
        objectsSize += response.S3Objects.Sum(o => o.Size);
        nextMarker = response.NextMarker;

        // A short page means this was the last page of results.
        stop = response.S3Objects.Count < 1000;
    }

    new {
        Count = objectsCount,
        Size = objectsSize.BytesToString()
    }.Dump();
}

static class Int64Extensions {
    public static string BytesToString(
        this long byteCount) {
        if (byteCount == 0) {
            return "0B";
        }

        var suffix = new string[] { "B", "KB", "MB", "GB", "TB", "PB", "EB" };
        var longBytes = Math.Abs(byteCount);
        var place = Convert.ToInt32(Math.Floor(Math.Log(longBytes, 1024)));
        var number = Math.Round(longBytes / Math.Pow(1024, place), 1);

        return string.Format("{0}{1}", Math.Sign(byteCount) * number, suffix[place]);
    }
}
1

I know this is an older question but here is a PowerShell example:

Get-S3Object -BucketName <bucketname> | select key, size | foreach {$A += $_.size}

$A contains the size of the bucket, and there is a keyname parameter if you just want the size of a specific folder in a bucket.

BE77Y
DCJeff
1

To check the size of all your buckets, try this bash script:

s3list=$(aws s3 ls | awk '{print $3}')
for s3dir in $s3list
do
    echo "$s3dir"
    aws s3 ls "s3://$s3dir" --recursive --human-readable --summarize | grep "Total Size"
done
1

You can use s3cmd:

s3cmd du s3://Mybucket -H

or

s3cmd du s3://Mybucket --human-readable

It gives the total objects and the size of the bucket in a very readable form.

womble
bpathak
0

I wrote a Bash script, s3-du.sh, that lists the files in a bucket with s3ls and prints the file count and the total size in several units, like this:

s3-du.sh testbucket.jonzobrist.com
149 files in bucket testbucket.jonzobrist.com
11760850920 B
11485205 KB
11216 MB
10 GB

Full script:

#!/bin/bash

if [ "${1}" ]
then
    NUM=0
    COUNT=0
    for N in $(s3ls "${1}" | awk '{print $11}' | grep '[0-9]')
    do
        NUM=$(expr $NUM + $N)
        ((COUNT++))
    done
    KB=$(expr ${NUM} / 1024)
    MB=$(expr ${NUM} / 1048576)
    GB=$(expr ${NUM} / 1073741824)
    echo "${COUNT} files in bucket ${1}"
    echo "${NUM} B"
    echo "${KB} KB"
    echo "${MB} MB"
    echo "${GB} GB"
else
    echo "Usage : ${0} s3-bucket"
    exit 1
fi

It does include subdirectory sizes, as Amazon returns the directory name and the size of all of its contents.

Deer Hunter
0

Hanzo S3 Tools also does this. Once installed, you can run:

s3ls -s -H bucketname

But I believe this is also summed on the client side and not retrieved through the AWS API.

Giacomo1968
Ville
0

There is a metadata search tool for AWS S3 at https://s3search.p3-labs.com/. This tool gives statistics about the objects in a bucket, with search on metadata.

longneck
pyth
0

With the CloudBerry program it is also possible to list the size of the bucket, the number of folders, and the total number of files by clicking "Properties" right at the top of the bucket.

Giacomo1968
KiKo
0

If you don't want to use the command line, there's a general-purpose remote file management app for Windows and OS X called Cyberduck. Log into S3 with your access/secret key pair, right-click on the directory, and click Calculate.

jpillora
0

This works for me:

aws s3 ls s3://bucket/folder/ --recursive | awk '{sz+=$3} END {print sz/1024/1024 "MB"}'
Flup
GrantO
0

CloudWatch now has a default S3 service dashboard which lists the bucket size in a graph called "Bucket Size Bytes Average". I think this link will work for anyone already logged into the AWS Console:

flickerfly
-1

The following uses the AWS SDK for PHP to get the total size of the bucket.

// Make sure you are using the correct region (where the bucket is) when creating the Amazon S3 client
$client = \Aws\S3\S3Client::factory(array('region' => $region));

// Check if the bucket exists
if (!$client->doesBucketExist($bucket, $accept403 = true)) {
    return false;
}

// List the bucket's objects (note: a single call returns at most 1000 objects)
$objects = $client->listObjects(array('Bucket' => $bucket));

$total_size_bytes = 0;
$contents = $objects['Contents'];

// Iterate through all contents to get the total size
foreach ($contents as $key => $value) {
    $total_size_bytes += $value['Size'];
}

$total_size_gb = $total_size_bytes / 1024 / 1024 / 1024;