A not very brave new world of storage

I just bought a pair of 4TB drives. I installed them in a LaCie RAID drive enclosure to make a single 8TB drive that will become my Time Machine back-up drive.

Until I run out of space again.

I know that people occasionally wax nostalgic about the good ol’ days when everything we had on our computer could fit on a single 10 MB disk drive. Allow me to be nostalgic for a few paragraphs…

I bought my first hard drive in 1980. I don’t remember the brand, but I do remember the size – it was about the size of a small refrigerator, it weighed over 80 pounds, and it took two people to carry it. The platters inside the drive were 14 inches in diameter, and it had a drive motor about the size of a coffee can. A v-belt went from the motor to the disk hub.

Because of the overhead needed to keep track of the information on the drive, the 10 MB drive lost 1.3 MB to directory and housekeeping files, limiting the capacity of the drive to 8.7 MB. I remember the manufacturer of the drive telling me that their drives were not foolproof, so they recommended that I buy two of them, and write to each independently so that both held the same information in case one failed.

So I bought two, for about $10,000.

The drives worked pretty well, but they had a serious fragmentation problem, leaving sectors of the disk surface empty – and unusable – when we deleted files. This resulted in a false measure of the drive’s remaining capacity. The manufacturer had a solution for this fragmentation problem: you would invoke a command to copy the contents of the drive back to itself, and simultaneously recover the lost sectors. The way it worked was that the information was “lifted” from the surface into memory, then the surface was erased, then the data in memory was written back to the newly-erased sectors on the drive, eliminating the unused sectors and making the free space available again.

Big LaCie drive

This is the LaCie RAID drive I have. It originally came with two 2TB drives; now it has two 4TB drives (I confess I broke the little seal that said the warranty would be void if broken). Now the drive has 8TB of capacity to act as my Time Machine back-up disk.

The problem with the technique was that the information was erased from the disk before it was re-written, leaving important information held only by whatever memory technology was inside the cabinet. If there was a power failure or just a momentary transient electrical glitch in the power, my data would be lost, never to be recovered.

The drives, and the computer consoles that drove data to them, communicated by an electronic scheme called differential communication. This was neither serial nor parallel, but differential. I remember being told that the method was developed for the railroads to communicate over pairs of wires traveling long distances between stations. Differential communication was very reliable, and the signal strength did not suffer from line-loss as it does with serial communication wiring.

Despite their tenuous temperaments, the two 10MB hard drives behaved pretty well over the years, providing a safe storage space for hundreds of typesetting jobs that had been created in my shop. These typesetting jobs were written in a coding scheme called TTS, which originally came from paper tape punch machines. This six-level coding allowed for type and control information to be sent – mechanically – from the typist’s keyboard to the typesetting machine.

Before we had these terminals and the hard disk, we stored jobs on punched paper tapes. If a correction or rerun was needed, we would retrieve the paper tape and edit the file, then generate a new paper tape and run it on our typesetting machines.

Back to my future…

My current hard drive array is a two-drive set-up acting as a single disk. Its capacity is 8 TB, which is a pretty big number: 8,796,093,022,208 bytes. Lop off a few megabytes for housekeeping, and it’s surely a bit smaller. The Library of Congress, according to Wikipedia, adds five terabytes of data to its archive every month. I add a bit less than that.
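For anyone who wants to check that big number, here is a minimal Python sketch. It assumes the figure above counts a terabyte as 2^40 bytes (the binary convention), and it shows the decimal convention (10^12 bytes) alongside for comparison. The constant and function names are just illustrative.

```python
# Bytes per terabyte under the two common conventions.
BYTES_PER_BINARY_TB = 2 ** 40    # assumption: the convention behind the figure in the text
BYTES_PER_DECIMAL_TB = 10 ** 12  # the convention drive makers usually print on the box

def capacity_bytes(terabytes, bytes_per_tb=BYTES_PER_BINARY_TB):
    """Return the capacity in bytes for a given number of terabytes."""
    return terabytes * bytes_per_tb

binary_capacity = capacity_bytes(8)
decimal_capacity = capacity_bytes(8, BYTES_PER_DECIMAL_TB)

print(f"{binary_capacity:,}")   # 8,796,093,022,208
print(f"{decimal_capacity:,}")  # 8,000,000,000,000
```

The gap between the two conventions is part of why a drive tends to look smaller once it is formatted and mounted.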

But, the reason I am installing this drive is that four terabytes was not enough back-up space for my current computer. In the four drive slots on my Mac Pro are two-terabyte drives, totaling twice what my old 4TB back-up drive could record. Now they will be matched, and my Time Machine back-up can at least keep up with the information on my internal drives. I will have solved the problem for the short term.

In the 1980s, when I bought my first hard drive, a single terabyte of data was inconceivable. Ten megabytes seemed impossible to fill (though we did it many times). Back then we were thinking only in terms of text files. The text of a book is extraordinarily small; in a book I just printed, the 632 text pages amount to only 772,000 bytes of data. At that modest size, 12 of those books would have fit on the 10 MB hard drive in 1980. I could fit 11 million books of that size on my new RAID drive.
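The book arithmetic can be sketched in a few lines of Python. This is only a rough check: it assumes the 1980 drive's 10 MB means 10,000,000 bytes, and that the new array holds 8 × 2^40 bytes. The names are illustrative.

```python
BOOK_BYTES = 772_000              # text of the 632-page book, from the paragraph above
DRIVE_1980_BYTES = 10 * 10 ** 6   # assumption: nominal 10 MB counted as decimal megabytes
USABLE_1980_BYTES = 8_700_000     # 8.7 MB left after housekeeping, same assumption
RAID_BYTES = 8 * 2 ** 40          # assumption: 8 TB counted as binary terabytes

def books_that_fit(capacity_bytes, book_bytes=BOOK_BYTES):
    """Whole books of book_bytes that fit in capacity_bytes."""
    return capacity_bytes // book_bytes

books_1980_nominal = books_that_fit(DRIVE_1980_BYTES)
books_1980_usable = books_that_fit(USABLE_1980_BYTES)
books_raid = books_that_fit(RAID_BYTES)

print(books_1980_nominal)  # 12
print(books_1980_usable)   # 11
print(f"{books_raid:,}")
```

On the nominal capacity the count is 12 books. On the 8.7 MB that was actually usable it drops to 11. The new array holds a little over 11 million.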

In 1980 we were not thinking about photographs.

Photographs are the voracious consumers of hard drives; they eat drives for lunch and don’t leave any scraps. My current camera makes images that are 60.2 MB in size uncompressed. In their original (Raw) format, they are typically 20-25 MB each, meaning that my 1980 hard drive, filled to capacity, could have stored less than half of one image. So, when I go out to shoot at an event, and come home with 800 photos in a single day, I am consuming the equivalent of at least 1,600 of those 1980 hard drives.
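Here is a small Python sketch of the photo arithmetic, using the 20-25 MB range for Raw files and the 10 MB nominal (8.7 MB usable) capacity of the 1980 drive. It is a back-of-the-envelope check rather than a precise accounting, and the helper names are illustrative.

```python
NOMINAL_DRIVE_MB = 10   # the 1980 drive's nominal capacity, from the text
USABLE_DRIVE_MB = 8.7   # capacity left after housekeeping, from the text

def drives_needed(photo_count, photo_mb, drive_mb=NOMINAL_DRIVE_MB):
    """Whole drives required to hold photo_count photos of photo_mb each."""
    total_mb = photo_count * photo_mb
    return -(-total_mb // drive_mb)  # ceiling division on integers

def fraction_of_photo(photo_mb, drive_mb=USABLE_DRIVE_MB):
    """Fraction of a single photo that fits on one drive."""
    return drive_mb / photo_mb

low = drives_needed(800, 20)    # 800 photos at the small end of the Raw range
high = drives_needed(800, 25)   # 800 photos at the large end
print(low, high)                # 1600 2000

print(fraction_of_photo(20), fraction_of_photo(25))
```

At 20 MB per image a day's shooting fills 1,600 of the old drives, and at 25 MB it fills 2,000. In both cases a single drive holds less than half of one photo.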

And then there is video.

This never ends. My son, who is a cinematographer, shoots whole terabyte drives’ worth of video in a single day. He owns a stack of external 2TB drives on which he puts individual projects while they are edited. When I think of the challenge he faces, I am grateful that I just shoot still photos. He has consumed petabytes of data (the one after tera), and he’s barely getting started in his career. As cameras get better (higher resolution), I am sure he will be at the forefront of storage-consumption.

And then there are the GigaPan photos

My small but growing collection of GigaPan images is also taxing my storage systems. My largest file to date, the Bishop Peak panorama, consists of 2,058 individual images, each of which is about 25 MB. My workflow for completing a GigaPan image is to convert the Raw images to TIFF, then stitch the TIFFs into the final image. GigaPan images are measured in gigabytes (obviously), and that particular image is 18.4 GB in size, almost 2,000 times the capacity of my 1980 hard drive. The folder of images that make up the Bishop Peak photo is 108,648,971,239 bytes (109 GB) with over 4,000 high-resolution photos.

…at a price that I can afford!

The amazing thing about all this is that the capacity and price of hard drives have changed so dramatically in the 32 years since I bought the original hard drive. My two new 4TB drives are made by Western Digital. They are exactly the same size as the two 2TB drives I took out. The price per megabyte is an astonishingly low fraction of a penny (it has nine zeroes after the decimal). And the irony here is that these are considered “expensive” in the computer industry right now.

My next task will be to build and install a large storage network device, called a NAS. My reason for doing this is that I am fairly quickly running out of storage on my internal drives, and my back-up archive is on DVDs. I have over 730,000 files stored on these optical discs, and they are starting to fail.

We were warned!

The National Archives told us, years ago, not to trust writable optical discs. Though I did not ignore their pleas, I had no alternative, so I continued to write my archive onto commercial DVDs. Over time, light exposure and loss of dye density have made some of these discs unreadable.

To sidestep this crisis, I plan to move my archive off the seemingly reliable but unstable optical discs and back onto mechanical, moving hard drives in a storage array. As for the capacity of that NAS device, I am currently looking at a 15TB model which can easily hold the archive. All of my archived files comprise only one-tenth the capacity of that device.

It’s ironic that I now see mechanical drives as being more reliable than non-moving optical discs. These are interesting times indeed.

More on the NAS in a future post.

About Brian Lawler

Brian Lawler is an Associate Professor of Graphic Communication at California Polytechnic State University, San Luis Obispo. He writes about graphic arts processes and technologies for various industry publications, and on his blog, The Blognosticator.
