How to determine the file size of a remote HTTP object

May 17, 2016 using tags curl, http, bash, numfmt

There are times when you need to know the file size of an HTTP object without actually downloading the file. This little trick works whenever the web server includes the object’s Content-Length in its response to a HEAD request, which returns only the headers and no body.

Let’s use the front page banner for disjoint.ca as an example.

$ curl -sI https://s3.amazonaws.com/media.disjoint.ca/disjoint-ca-banner.jpg | grep -i '^Content-Length:'
Content-Length: 233659

Nice. You can see from the response to the HEAD request that the content length for that particular object (the jpg file) is 233659 bytes.

Let’s try and make this human friendly.

$ curl -sI https://s3.amazonaws.com/media.disjoint.ca/disjoint-ca-banner.jpg | grep -i '^Content-Length:' | sed 's/[^0-9]//g' | numfmt --to=si
234K

Better. The sed here strips every character that isn’t a digit (the header name, the colon, the space and the trailing carriage return), and the numfmt utility converts the remaining byte count to human-readable SI units.
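
To see what each stage contributes, here is the same pipeline run on a canned header line rather than a live response. This is only a sketch: the header variable is made up for illustration, and it assumes GNU coreutils’ numfmt is installed.

```shell
# A canned header line, including the trailing carriage return (\r)
# that HTTP header lines end with. "header" is an illustrative name.
header=$(printf 'Content-Length: 233659\r')

# sed deletes every non-digit character: the header name, the colon,
# the space and the carriage return.
bytes=$(printf '%s\n' "$header" | sed 's/[^0-9]//g')
echo "$bytes"              # 233659

# numfmt scales the byte count to SI units (powers of 1000).
numfmt --to=si "$bytes"    # 234K
```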

Let’s take this one step further and turn this into a bash function (in ~/.bashrc) so we never have to remember these pipes and switches again!

# Determine size of a remote file via a HEAD request
function rfs() {
  local url="$1"

  if [ -z "$url" ]; then
    echo "usage: rfs <url>"
    return 1
  fi

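  # -L follows redirects and every hop prints its own headers,
  # so keep only the last Content-Length (the final response)
  curl -sIL "$url" | grep -i '^Content-Length:' | tail -n 1 | sed 's/[^0-9]//g' | numfmt --to=si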
}

This is much nicer.

$ rfs https://s3.amazonaws.com/media.disjoint.ca/disjoint-ca-banner.jpg
234K
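
One caveat: not every server sends a Content-Length (chunked responses, for example, usually don’t), and with -L each redirect hop prints its own set of headers. Here is a slightly more defensive sketch of the same idea. The names content_length and rfs_safe are hypothetical, and it again assumes GNU numfmt; treat it as one possible approach rather than a drop-in replacement.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# Content-Length of the last response block. Exits non-zero if that
# block has no Content-Length header.
content_length() {
  tr -d '\r' | awk -F': *' '
    /^HTTP\//                       { len = "" }   # new response block: reset
    tolower($1) == "content-length" { len = $2 }   # header names are case-insensitive
    END { if (len == "") exit 1; print len }
  '
}

# Hypothetical wrapper: like rfs, but reports a missing Content-Length
# on stderr instead of printing nothing.
rfs_safe() {
  local url="$1" bytes

  if [ -z "$url" ]; then
    echo "usage: rfs_safe <url>" >&2
    return 1
  fi

  if ! bytes=$(curl -sIL "$url" | content_length); then
    echo "rfs_safe: no Content-Length reported for $url" >&2
    return 1
  fi

  numfmt --to=si "$bytes"
}
```

Splitting the header parsing into its own function means it can be tried on saved or hand-written headers without touching the network.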

If you’ve never used numfmt before, this website has a pretty good primer on its various options; definitely worth a read!
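
As a quick taste, here are a few of those options applied to the byte count from earlier. This is a small sketch assuming GNU coreutils’ numfmt; note that numfmt rounds away from zero by default, which is why the 1024-based result reads 229K rather than 228K.

```shell
bytes=233659

numfmt --to=si "$bytes"                    # 234K   (powers of 1000)
numfmt --to=iec "$bytes"                   # 229K   (powers of 1024)
numfmt --to=iec-i "$bytes"                 # 229Ki  (powers of 1024, explicit "i")
numfmt --to=iec --round=nearest "$bytes"   # 228K   (233659 / 1024 is about 228.2)
numfmt --to=si --suffix=B "$bytes"         # 234KB
```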