The Durability of Data
Copyright (C) 02/1998 by Howard Fuhs
The Information is Invisible
Redundancy through Innovation
The Durability of Data Media
Construction of Soft Data Media
Magnetic Metal Particles
Hard Disk Media
Optical Storage Media
Physical Write and Read Techniques
Analogue Storage Methods
High-Density Storage Media
Storage of Data Media
Unexpected Storage Problems
The Cost of Archives
Lifetime of Different Storage Media
Even Charlie Chaplin would not have been able to imagine just how complex modern times can be when the question is about the endurance of data media and digitally stored information.
A steel engineering company has been in the market for 48 years and is proud of the results it has accomplished and the constructions it has erected. Everything has been properly documented in a complete and excellently organised drawings archive. Irrespective of whether a drawing is from 1994 or from 1954, it can be found within a few minutes. However, this old-fashioned paper-based archive is now to be retired. Everything gets scanned into computer systems and stored and controlled digitally.
The advantages of computerising access to documents and information, compared to maintaining a paper-based archive, are obvious. Digital information is faster to retrieve or sort according to a multitude of criteria, and less storage space is required. While we have been able to read information stored on paper or clay tablets for hundreds or even thousands of years - think of the Gutenberg Bible and other documents important to the history of mankind - in the case of digitally stored information we can express the problem of endurance with a simple question: 48 years from now, who will be able to read information stored in 1997?
This problem, which has only started to become evident in the last few years, is already troublesome for some computer users. Nothing is as volatile as stored data. For example, NASA has lost access to rocket design data because data tapes have become unreadable and, in some cases, because the computers and programs used to process the data no longer exist. In this article we will discuss some of the problems associated with long-term storage of digital information and the lifetime limitations of media used to store data.
Let us imagine the situation of an archaeologist in the 28th century (I'm a practising optimist) who has excavated a CD-ROM from 1998. If no records exist that describe the fundamental functions of a CD-ROM, our noble successor is placed in a situation where he can only speculate about the significance of the silvery disk. Just think of the huge enigmas posed to us by the early Egyptians, the Aztecs and the Mayas, civilisations that did not even use such a complex technology as computers and digital data storage. In the case of the CD-ROM it might be an ornament, a burial gift or perhaps a symbol of fertility. But would our archaeologist be able to imagine that this silvery disk might be a digital storage medium, perhaps containing important information about our culture or this particular period of time?
A fundamental problem of long-term data storage is the fact that, after a period of time has elapsed, human beings are no longer able to read and process the stored information directly. Machines are required to extract and interpret the information in order to make it visible to humans. The information itself is not directly understandable or readable.
Information stored on paper can directly be recognised as such by humans, and if written in a language understood by the person, also directly read. Even in case of a foreign language, a translation would be easy to accomplish, thus making the information understandable.
Not so in the case of data on a storage medium. A glance at the storage medium reveals nothing about its content. Both equipment and knowledge are required to read the information. First you need a mechanism that is mechanically capable of handling the data medium. Then you need a suitable computer with suitable software and an operating system to control this mechanism. On top of this it is necessary to know how the data is organised on the storage medium as well as the actual format used to represent the information. With regard to the data organisation it is necessary to know, for example, the number of sectors and tracks, and which file system has been used (FAT16 or FAT32, HPFS or NTFS). With regard to the data format it is necessary to know which application was used to store the information and how that particular application placed the information on the storage medium. If this information is unavailable, the information comes across as a series of binary 0s and 1s and nothing else. If you consider the large number of different data formats in use in just office software packages, and the number of filters supplied with these packages to read different file formats, then you can begin to appreciate the magnitude of the problems facing the next generations when they attempt to read our digitally stored information.
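In modern terms, the point can be sketched in a few lines of code. The snippet below is illustrative only (the signatures shown are commonly published file "magic numbers", not examples from the article): without documentation of the format, the best a reader of raw bytes can do is guess by pattern matching - and most data will simply remain opaque.

```python
# Illustrative sketch: raw bytes carry no self-evident meaning. A future
# reader can at best compare leading bytes against known format signatures.
KNOWN_SIGNATURES = {
    b"PK\x03\x04": "ZIP archive",
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
}

def guess_format(raw: bytes) -> str:
    """Return a format guess based on leading bytes, or 'unknown'."""
    for magic, name in KNOWN_SIGNATURES.items():
        if raw.startswith(magic):
            return name
    return "unknown"

print(guess_format(b"%PDF-1.4 ..."))  # a recognised signature
print(guess_format(b"\x00\x01\x02"))  # opaque without documentation
```

Without the table of known signatures - that is, without preserved documentation - every file degenerates to the "unknown" case.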
According to a popular saying, it is impossible to arrest technical progress. The proliferation of computers certainly seems to bear this out. The life cycles of both hardware and software get shorter and shorter. Inherent in these innovation cycles are not only new data formats but also completely new storage technologies and new storage media that, over a somewhat longer period of time, make older storage technologies obsolete. Just think of the 8" and the 5.25" diskettes. Drives and controllers to handle these storage media can no longer be bought through normal trade channels. The same is true for hard disk systems. RLL and MFM systems were the standard at the beginning of the '90s. Whoever still owns an RLL hard disk containing important information will no longer be able to buy a suitable controller and access the data.
Even if the raw data can be accessed, the risk remains that the data format can no longer be understood. A database produced in 1987 using, for example, Open Access II software will in fact be seriously difficult to read today. Where would you find a copy of this software? Even if one were available, it might not run on a modern computer platform. Software originally developed for, say, 8086 or 80286 based machines does not automatically run on an 80486 or a Pentium.
Operating systems for old computers can also pose a problem. Programs designed to run under, for example, DOS 3.3 and Windows 2.0 will not run under present operating systems.
The problem of redundancy through innovation is both a short-term problem and also automatically a long-term problem for those who are forced to store data and information digitally over longer periods of time. Because the technology is in a permanent state of flux, it is in practice necessary to store a complete hardware setup (e.g. a UNIVAC from 1968) plus the necessary operating software for each type of data storage medium in an archive in order to be able to read and interpret the data.
This also holds true for instruction manuals for hardware and software (who will remember in a couple of decades that the directory of a diskette on a Commodore C64 is loaded with the command LOAD "$",8 ...?)
Knowledge about the durability and stability of data media is extremely important when developing archival strategies for mass storage. Only by possessing a thorough knowledge of media stability and durability is it possible to make informed selections when it comes to choosing suitable mass storage and archive media.
When discussing the durability of data media it is necessary to distinguish between the physical resilience of the medium itself and the longevity of the data stored on it. The fact that data disappear from a storage medium does not necessarily indicate that the medium itself is defective. On the other hand, any physical defect in a storage medium automatically compromises the integrity of the data stored in the defective area of the medium.
The durability of a storage medium depends fundamentally on three factors: The mix of materials used to manufacture the medium, the manner in which the storage medium is physically stored and the technology used to write/read the information.
The materials used in manufacture have a significant influence on the durability of storage media. Depending on the choice of materials, physical and chemical changes can be observed in a storage medium which over time may lead to loss of information or a shortened expected lifetime. For example, some materials can, over long periods, undergo chemical reactions leading to diffusion, deformation or oxidation.
The physical construction of diskette and tape media is almost identical; only the physical form of the medium differs in the two cases. A binder is spread on a carrier film and binds small magnetic particles to the film. These particles store the information. All three components can give rise to problems which in turn may damage the storage medium.
The substrate film is responsible for correct transport of the magnetic particles inside the storage drive. A high degree of dimensional stability is required. Currently, the most frequently used material for the film is PET (Polyethylene Terephthalate). This material has proven to be very chemically stable, both under laboratory conditions and in the field.
Furthermore, PET is very resistant to oxidation and hydrolysis (the breakdown of chemical substances because of humidity, mostly under influence of a catalyst or enzyme). With respect to chemical stability the durability of PET is longer than that of the binder and the metal particles.
It is extremely important for substrate film materials to remain dimensionally stable despite heavy and protracted use. Lack of dimensional stability may lead to tracks or sectors being shorter or narrower when a read attempt is made than when they were written. The substrate film must also be resistant to electrostatic charging.
In order to make polymers flexible, softeners (plasticisers) are added during the manufacturing process. These softeners may disappear with time, making the film brittle and thus causing it to crack or break.
The binder is responsible for attaching the magnetic particles to the film substrate. In most cases polyester polyurethane is employed. This polyester compound is susceptible to hydrolysis, which causes polymer chains to untangle from the compound because of a reaction between water and polyester. This process releases corrosive and alcoholic compounds which speed up the hydrolysis and furthermore attack the magnetic metal particles. If the binder layer gets damaged, the storage medium is rendered useless and the data in most cases irretrievable. Damage through hydrolysis increases radically and progressively with increased humidity in the environment in which the media are stored.
Some manufacturers use polyether polyurethane instead of polyester polyurethane. This material is less susceptible to hydrolysis but correspondingly more susceptible to oxidation.
Hydrolysis can often lead to a rubbery tape surface and increased friction between tape and read/write heads. The consequences of this state are a succession of worsening faults, ranging from frequent tape rewinding and tape misalignment problems, through constant fouling of the read/write heads and 'tape salad', to serious defects in the read/write heads caused by excessive wear.
The effect of hydrolysis is temporarily reversible. The result is never as good as the original product, but it may be adequate to recover data from a defective medium within a few days. Because of steady improvements in the quality of the metal pigments, the binder is the weak link in the materials chain today.
The magnetic particles bound to the film substrate by the binder are responsible for magnetically storing the actual information. If the magnetic state of a metal particle changes, the data stored at that particular spot change as well. Thus, it is important that the metal pigment retains a given magnetic state for as long as possible. Modern storage media often use particles of metal oxides such as barium ferrite (BaO·6Fe2O3) or gamma iron oxide (γ-Fe2O3) because these materials have proven particularly stable.
Pure iron (Fe) pigments, above all, have proven extremely susceptible to corrosion and oxidation. To counter this, the iron core is covered with a protective coating of aluminium oxide or silicon dioxide, which drastically reduces oxidation problems.
Hard disk media are basically constructed like diskettes. The main differences are the data density and the fact that the magnetic layer is bonded not to a film but to a rigid platter of metal or glass. This removes the problems caused by the substrate film in flexible media. The reliability of a hard disk drive is expressed by manufacturers as the mean number of operating hours before failure (MTBF, Mean Time Between Failures). Yet an MTBF of more than 100,000 hours does not mean that hard disk drives can store data for long periods without problems. It is important even for hard disks to maintain the optimum room temperature and humidity. The disks may be encapsulated in a rigid metal housing, but the encapsulation is not airtight. In the 1980s some airtight hard disk drives were produced, but this was abandoned for reasons of cost. The result is that even hard disk drives are susceptible to problems caused by oxidation and hydrolysis when humidity is elevated.
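It helps to remember that MTBF is a population statistic, not a promise about any single drive. A back-of-the-envelope conversion (illustrative arithmetic, not taken from the article) turns an MTBF figure into an expected annual failure rate for continuously running drives:

```python
# Illustrative arithmetic: converting a manufacturer's MTBF figure into the
# fraction of a continuously running drive population expected to fail per year.
HOURS_PER_YEAR = 24 * 365  # 8760 hours

def annual_failure_rate(mtbf_hours: float) -> float:
    """Fraction of a drive population expected to fail per year."""
    return HOURS_PER_YEAR / mtbf_hours

afr = annual_failure_rate(100_000)
print(f"AFR at 100,000 h MTBF: {afr:.1%}")  # about 8.8% per year
# In a hypothetical archive of 50 such drives, that means roughly
# four to five drive failures to expect every single year.
print(round(50 * afr, 1))
```

This is why an impressive-sounding MTBF is no substitute for redundancy and regular media refresh in an archive.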
A further problem is posed by the heat generated inside a hard disk drive. Depending on drive model and the rotational speed of the disk(s) it may even be necessary to provide extra cooling to obtain the MTBF specified by the manufacturer.
Because optical storage media are read from and written to by means of laser light they are not subject to the same wear mechanisms as tapes. For this reason the theoretical durability of optical media is considerably longer than that for data media brought in direct contact with read/write heads.
Even the durability of optical storage media such as CD-ROMs is limited despite the touch-free data transfer. Also in this case the materials play a large role. A CD-ROM consists of a polycarbonate substrate, which is vapour-coated with an Aluminium layer. A frequent reason for a CD-ROM to stop functioning is that the reflectivity of the aluminium layer changes. This may happen because of corrosion or oxidation, or because the metal layer flakes off the polycarbonate substrate.
Particularly in cases of corrosion or oxidation it has often been observed that the protective lacquer (acrylic or nitro-cellulose) covering the metal layer had been damaged. The title print on a CD-ROM can also negatively influence its life expectancy. The inks may interact chemically with the metal layer and gradually damage it.
With regard to the polycarbonate substrate, a gradual reduction in optical translucency may shorten the life of a CD-ROM. Polycarbonate can above all be attacked by various organic substances in the immediate environment, e.g. brought there as parts of fingerprints. In this case the result is a local change in the polycarbonate substrate which leads to read errors.
The polycarbonate substrate is obviously, just as the metal layer, vulnerable to mechanical damage and scratches.
The influence of material selections and other manufacturer choices on durability is considerable. Laboratory tests have demonstrated that even the quality of what seems to be the exact same product can vary quite widely between different production runs from a single manufacturer. This means that, for example, two tapes from the same manufacturer but from two different production runs can demonstrate differences in lifetime. This is caused by tolerances in the manufacturing process itself as well as in the raw materials processed. Normally these differences are marginal, but they must be taken into consideration when designing a project involving long-term data archiving.
Physical access techniques fall basically into those which touch the storage medium and those which do not. Effectively contactless techniques are employed for hard disks, diskettes and CD-ROMs. Accessing these media causes little or, in the case of the CD-ROM, no surface wear. The case is different for tapes. For technical reasons a contact-based read/write procedure is always employed here. In many tape drives the tape is pulled in a half-circle around a rotating read/write head. This not only causes a certain wear but also additional damage to the tape if foreign particles are present on the tape or the read/write head. These foreign particles may be flecks of dust, or particles that have loosened from the tape surface and become lodged between the tape and the surface of the rotating read/write head.
It is interesting that analogue storage methods normally offer longer durability than digital ones. Analogue tapes such as those used to record sound remain playable for many years because their signal structure is more robust and fault-tolerant than that of digital recordings. Cases in point are the old 'Datasettes' for the Commodore C64 and the cassette drives for the old CP/M computers from the same manufacturer. In these cases data were still stored in analogue form.
By considering the progress in data density so far (1986: 20 MB hard disk, 5.25"; 1993: 200 MB hard disk, 3.5"; 1997: 4 GB hard disk, 3.5") you reach the conclusion that data density is an important factor with regard to long-term data storage. Unfortunately, a higher data density does not imply increased data longevity. On the contrary, increased data density means that each bit of information occupies less physical space. Thus, the increase in density must be paid for with an increase in the data loss caused by damage to a given surface area. The consequence is that the damage caused by a dust particle or surface oxidation is much larger than on low-density media. For high-density media to display the same level of reliability as low-density media in daily use, the requirements for cleanliness of surroundings and drives, as well as for the storage of the media, have increased.
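The density argument can be made concrete with a toy calculation. The densities below are assumed round numbers chosen for illustration, not measured figures from the article: the same physical defect destroys a hundred times more bits on a medium that is a hundred times denser.

```python
# Toy calculation with assumed, illustrative densities: a fixed-size defect
# wipes out far more data on a high-density medium than on a low-density one.
def bits_lost(defect_area_mm2: float, density_bits_per_mm2: float) -> int:
    """Bits destroyed by a defect covering the given area."""
    return int(defect_area_mm2 * density_bits_per_mm2)

defect = 0.01  # a 0.01 mm^2 dust scratch, for illustration

low_density = bits_lost(defect, 1e5)    # assumed older, low-density medium
high_density = bits_lost(defect, 1e7)   # assumed modern, high-density medium
print(low_density, high_density)  # 1000 vs 100000 bits for the same defect
```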
Ensuring data integrity is an important consideration when we talk about data storage. When back-up programs and procedures are used, it should be verified that these include an integrity test after writing. The case is different for a simple write to a hard disk. Here it is necessary to use a checksum program to establish the integrity of the data and subsequently to recalculate and compare the checksums at regular intervals. A checksum program serves to quickly and reliably detect changes to data. A good checksum program can often tell something about the cause of a change, e.g. whether it was caused by surface oxidation or by a computer virus. Regular integrity tests are an important component of any archival strategy.
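The checksum-and-recheck procedure described above can be sketched in a few lines. This is a minimal illustration using SHA-256 from Python's standard library (a modern algorithm choice; the article does not prescribe any particular checksum method):

```python
# Minimal sketch of archive integrity checking: record a checksum at write
# time, then recompute and compare it at every scheduled verification.
import hashlib

def checksum(data: bytes) -> str:
    """Return a hex digest that changes if even one bit of the data changes."""
    return hashlib.sha256(data).hexdigest()

# At archive time: record the checksum alongside the data.
original = b"engineering drawing, scanned 1998"
stored_digest = checksum(original)

# At each scheduled verification: recompute and compare.
def verify(data: bytes, expected_digest: str) -> bool:
    return checksum(data) == expected_digest

print(verify(original, stored_digest))  # True: data intact
print(verify(b"engineering drawing, scanned 1990", stored_digest))  # False: changed
```

A checksum of this kind detects that data changed; diagnosing why (media decay versus malicious modification) requires the additional context the article mentions, such as where on the medium the changes cluster.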
Correct physical storage of data media is required in order to attain the longest possible storage time without data loss. Environmental factors such as air temperature and humidity should be kept constant. The greater the variations in temperature and humidity, the shorter the life-span of the stored data. Basically, manufacturers' recommendations should be adhered to with regard to temperature and humidity. It is normally safe to apply an air temperature of 21°C and a relative humidity of 50% when the data must remain available for immediate use. For long-term storage without access these values should be reduced considerably, since a temperature reduction slows down the chemical processes in the storage media. A constant temperature of 4°C and a relative humidity of 20% is normally recommended for long-term storage of data media.
The lifetime of storage media can also be shortened by other environmental influences. Chemically polluted air can lead to oxidation of the surface of a storage medium. Dust particles on the surface of a storage medium must also be avoided because they damage the data surface during read/write operations and pollute the read/write head of the drive.
UV light can also lead to premature ageing of the substrate by initiating or speeding up detrimental chemical processes. It follows that storage media should not be exposed to prolonged periods of sunlight, i.e. they should be stored in closed cabinets or dark rooms rather than on shelves in rooms with windows.
Finally, one more remark regarding the storage of data media. Massive data loss has been incurred in some catastrophic cases in spite of both security measures and the storage of the data media being exemplary. All of these cases involved storage media contained in fireproof boxes designed to protect them from fire. Fireproof boxes are guaranteed by their manufacturers to be able to maintain a specified internal maximum temperature for a specified period of time in case of fire. These boxes are designed to protect data media stored inside them for a prolonged period of time despite the influence of a large fire, so that data are not destroyed by deformation, surface oxidation or loosening of the magnetic layer from the carrier film.
In all the known cases the boxes did actually keep the fire and its effects at bay. The damage leading to total loss of data was caused by the fact that these boxes were not water tight. They let in water or chemical agents used to fight the fire. This damaged the storage media and rendered them useless. Especially older fireproof boxes tend to display this problem when they are needed the most.
When it is necessary to store information in digital form over an extended period of time, it is also necessary to consider the costs of doing so securely and in a well-organised fashion. The costs associated with data storage comprise considerably more components than just the room rent. For long-term storage it is recommended to transfer all information to new data media at regular intervals, every 2 - 5 years depending on the media in use. This procedure minimises the risk of data loss, but it carries additional costs for staff and for the purchase of new storage media. The storage and maintenance of older types of computer equipment must also be taken into consideration when budgeting an archive. It can become particularly costly when old data need to be converted into newer formats for reasons of compatibility. However, this procedure will often ensure that the data remain accessible for some years into the future. The same is true when transferring data from old storage media to modern media. It is necessary to consider the additional expense of purchasing new hardware in addition to the cost of copying the information.
Only a few scientific studies exist that specify figures for the expected longevity and durability of different storage media. The data that do exist are so imprecise that they can only be considered rough guidelines. Furthermore, it should be kept in mind that the increasing utilisation of storage media (because of system changes) will tend to push their expected lifetime downwards.
One of the most telling facts is that no storage medium manufacturer is prepared to give a legally binding guarantee with respect to the durability of its products. Some approximate values can be found in the table below.
Data Medium    Format           Lifetime
Tape           DLT              10 - 30 years
Tape           DD-2             10 - 15 years
Tape           QIC              5 - 30 years
Tape           D8 (8mm data)    2 - 39 years
CD-ROM         ISO              5 - 100 years
M-O Disk       3.5"/5.25"       5 - 100 years
WORM           -                10 - 100 years
Computers allow users to process huge amounts of data within a short span of time. Bearing this in mind, it is inevitable that historically and culturally important information that society must retain gets stored on digital storage media. We do, however, run the risk that this computer-generated information, for a variety of reasons, becomes inaccessible after a few years of storage. If we only start to think about how best to organise computer archives when that happens, it will be too late. The knowledge will be lost. If such a loss does not raise any concerns, it is relevant to ask why we now expend so much time and effort to generate and process information that is deemed unimportant and insignificant. So insignificant that the loss of this information does not raise concerns...
Copyright (C) 02/1998 by Howard Fuhs. All rights reserved.