Lab Notes

October 31, 2015

An Observation on the Size of Images on the Internet

I haven't posted anything on this site in two years. One might say I've gotten lazy or disappeared altogether, but the truth is that I've been so busy with projects for paying clients that I haven't had time to contribute much. Over time, though, I have amassed a collection of miscellaneous observations that would be nice to write down somewhere. So begin my online lab notes. There are also some new, larger articles coming soon, including some research into face recognition and ways to make cell phones save battery power.

What better way to start than with an observation about the Internet itself? Lately I've been working on a very large R&D project about pictures on the Internet, funded by a ... wealthy interested party. So, naturally, I've been doing lots of research into the properties of the pictures that people post to the Internet.

Here's one interesting thing that I found: the file sizes of the pictures that people post to the Internet are Pareto distributed. About 20 percent of the images account for approximately 80 percent of the total on-disk file size of all those images.
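For anyone who wants to check the 80/20 split on their own collection, here is a minimal sketch. It is not the code behind the measurements in this post: the function name `top_share` and the sample sizes are made up for illustration.

```python
import math

def top_share(sizes, fraction=0.2):
    """Return the share of total bytes held by the largest `fraction` of files."""
    if not sizes:
        raise ValueError("sizes must be non-empty")
    ordered = sorted(sizes, reverse=True)            # largest files first
    k = max(1, math.ceil(fraction * len(ordered)))   # how many files are in the top slice
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical file sizes in bytes, invented for this example (not the data from this post).
example_sizes = [900_000, 700_000, 80_000, 70_000, 60_000,
                 50_000, 50_000, 40_000, 30_000, 20_000]

print(f"Top 20% of files hold {top_share(example_sizes):.0%} of the bytes")
# prints "Top 20% of files hold 80% of the bytes"
```

For real files, the list of sizes could be built with `os.path.getsize` over a directory of downloaded images.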

Take a look at the file sizes of about 400 user-posted images that were collected from a popular image website.

Here it is with a log scale for file size.
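A rough text-only version of a log-scale view can be sketched by counting files per power of ten of their size. This is only an illustration: the helper `size_decades` and the sample sizes are assumptions, not the data or the plotting code used for the charts.

```python
from collections import Counter

def size_decades(sizes):
    """Count files per decade: key d covers sizes in [10**d, 10**(d+1)) bytes."""
    # For a positive integer, the number of decimal digits minus one equals floor(log10(n)).
    return Counter(len(str(int(s))) - 1 for s in sizes if s >= 1)

# Hypothetical file sizes in bytes, invented for this example.
sample = [5_000, 12_000, 48_000, 90_000, 150_000, 2_300_000]

counts = size_decades(sample)
for decade in sorted(counts):
    # One '#' per file whose size falls in this decade.
    print(f"10^{decade} bytes: {'#' * counts[decade]}")
```

With a heavy-tailed set of sizes, most of the bars sit in the lower decades while a handful of files land in the highest ones.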

An interesting little bit of information, especially if you are looking to start a file-sharing website.