Data Storage: How Many Words Is A Picture Really Worth?
September 11, 2014
They say a picture is worth a thousand words. Well, it occurs to me that the worth of an image, with words as the unit of worth, is entirely quantifiable. Using simple arithmetic based on agreed-upon definitions, I can determine exactly how many words a picture of any size is "worth."
The common points of reference are the bit and the byte. A bit is the smallest possible unit of data storage. Basically it can be described as a question of yes/no, on/off, true/false, or 1/0. A byte is precisely 8 bits. Both words and pictures (because both get stored on computers) can be quantified as a number of bits and bytes.
A single byte can describe any letter (upper or lower case) or punctuation mark that you can type on your keyboard. A string of eight ones and zeros has 2^8 = 256 possible combinations. Fifty-two are used for the letters in both upper and lower case, ten for the digits, and a few dozen for the punctuation marks, leaving many left over for other special characters that I'll bet you didn't even know you could type with the proper combination of keys on your keyboard. For the purpose of evaluating typing speed, it is assumed that the average word (including spaces and punctuation) is five characters long. Therefore, a word is "worth" five bytes.
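The arithmetic above can be sketched in a few lines of Python. The function names here are made up for illustration, and the five-bytes-per-word figure is the typing-speed convention from the text, not a property of any particular file format.

```python
BITS_PER_BYTE = 8
BYTES_PER_WORD = 5  # assumption from the text: average word is five characters


def distinct_values(bits: int) -> int:
    """Number of different patterns a run of `bits` ones and zeros can hold."""
    return 2 ** bits


def text_bytes(word_count: int) -> int:
    """Storage for `word_count` words at one byte per character."""
    return word_count * BYTES_PER_WORD


print(distinct_values(BITS_PER_BYTE))  # 256
print(len("A".encode("ascii")))        # 1 (one byte for one letter)
print(text_bytes(1000))                # 5000 (bytes in "a thousand words")
```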
The current standard in high-quality digital image storage is 24-bit color. All colors are defined as a combination of 256 shades of red, 256 shades of blue, and 256 shades of green, with each shade numbered from 0 to 255. If all three are absent (0, 0, 0), the color is black. If all three are fully present (255, 255, 255), the color is white. The number of colors in between (16.7 million) is more than most estimates of what the human eye can tell apart. Each of the three colors takes one byte, so every dot in a picture is worth 3 bytes total. The standard for high-quality printed images is 300 dots per inch. So, every square inch of a picture is 300 x 300 (90,000) dots, and if each one is "worth" three bytes we have 270,000 bytes per square inch. At five bytes per word, we get 54,000 words per square inch.
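For anyone who wants to check the numbers, here is a small Python sketch of the per-square-inch calculation. The function name and its default parameters are illustrative choices that simply mirror the figures in the paragraph above (300 dots per inch, three bytes per dot, five bytes per word).

```python
SHADES_PER_CHANNEL = 256  # one byte each for red, green, and blue


def words_per_square_inch(dpi: int = 300,
                          bytes_per_dot: int = 3,
                          bytes_per_word: int = 5) -> int:
    """Uncompressed image bytes in one square inch, expressed in words."""
    dots = dpi * dpi                    # 300 x 300 = 90,000 dots
    image_bytes = dots * bytes_per_dot  # 270,000 bytes
    return image_bytes // bytes_per_word


color_count = SHADES_PER_CHANNEL ** 3
print(color_count)              # 16777216 (the "16.7 million" colors)
print(words_per_square_inch())  # 54000
```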
Now it is a simple thing to determine how many words a picture is worth based on the size of that picture.
- Wallet Size (2" x 3") 324,000 words
- 4" x 6" 1.3 million words
- 5" x 7" 1.9 million words
- 8" x 10" 4.3 million words
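The same calculation works for any print size. The sketch below is a hypothetical helper, not part of any library, that multiplies the area in square inches by the 54,000 words per square inch worked out earlier.

```python
WORDS_PER_SQUARE_INCH = 54_000  # from the text: 300 dpi, 3 bytes per dot, 5 bytes per word


def picture_worth_in_words(width_in: float, height_in: float) -> int:
    """How many words an uncompressed 24-bit, 300 dpi print is 'worth'."""
    return round(width_in * height_in * WORDS_PER_SQUARE_INCH)


print(f"{picture_worth_in_words(2, 3):,}")   # 324,000
print(f"{picture_worth_in_words(8, 10):,}")  # 4,320,000
```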
So, what do we learn from this? "A picture is worth a thousand words" is a GROSS UNDERSTATEMENT.
Why is this important? It illustrates a point regarding data storage. Pictures, not text, are what take up the room on hard drives, websites, and so on. In a loan system, one scanned image is likely to require as much data storage space as, if not more than, all the setup and historical data for the loan to which the scanned image is linked. This is not to say that your images shouldn't be scanned and stored, but it is important to understand. As your data storage needs grow, you will at least know why. It's not the daily accruals and the transaction history, and it's not the attached text documents and spreadsheets. It is the scanned documents.
Author's note: If your documents are scanned in grayscale instead of color, the storage requirements are cut down by a factor of three (one byte per dot instead of three). Also, most documents are not scanned and stored at full print resolution, and most (but not all) graphic file types use data compression algorithms. For example, if the next 40,000 dots are all white, it takes less data to store the fact that the next 40,000 dots are all white than to store the three-byte color definition for each of the 40,000 dots. In any event, at any compression ratio, pictures will still take up more data storage space than text and numeric data.
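The "next 40,000 dots are all white" idea is known as run-length encoding. The Python sketch below is a toy illustration only: real image formats use more elaborate schemes, and the 4-byte run counter is an assumption made here just to put a number on the savings.

```python
from itertools import groupby

WHITE = (255, 255, 255)
BYTES_PER_DOT = 3    # 24-bit color
BYTES_PER_COUNT = 4  # assumption: each run length is stored in 4 bytes


def run_length_encode(dots):
    """Collapse consecutive identical dots into (count, color) runs."""
    return [(len(list(group)), color) for color, group in groupby(dots)]


dots = [WHITE] * 40_000
runs = run_length_encode(dots)

raw_bytes = len(dots) * BYTES_PER_DOT
encoded_bytes = len(runs) * (BYTES_PER_COUNT + BYTES_PER_DOT)

print(runs)           # [(40000, (255, 255, 255))]
print(raw_bytes)      # 120000
print(encoded_bytes)  # 7
```

A scanned page with large white margins compresses well under this kind of scheme, while a busy photograph, where neighboring dots rarely match exactly, would see little benefit from it.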