5

I have been thinking about the permanence of information lately, and I was wondering: at what rate does a bit on a consumer spinning-disk hard drive get corrupted, and what factors speed up or slow down this rate? I am astonished that programs written to storage today may slowly decay over time as the underlying medium degrades! Will programmers in fifty years have to spend time repairing or recreating ancient underlying libraries?

edit: Thank you @clabacchio for pointing me to error correction algorithms; these largely address the problem I am anticipating here.

Sean Southern

4 Answers

7

I tell people (nobody listens) that ALL hard drives will crash, and that this occurs sometime between 10 years and 10 minutes from new.

I've not yet seen one reach 10 years of continual use, but I've met a few 10-minute ones.

CDs that I wrote 15+ years ago are generally still OK when I check them (casual checks only). The data on them is either now irrelevant or redundantly stored elsewhere.

If you want a 50+ year archive (I do) then you MUST work at it, with redundant backups, off-site storage and more.

Increased temperature kills hard drives.

Brand matters. Google knows but won't tell.

Vibration doesn't help.

Some CD/DVD media are better than others, and the top manufacturers have archival media that they claim will last 100 years if stored as recommended. Don't trust this if you care.

Drives that are powered on continuously seem to last about as well as those powered on less often, BUT it seems likely that a drive used ONLY for backup will last well if it is powered up only when needed and stored at a good temperature etc. when off. BUT if it uses aluminium wet electrolytic capacitors, those will die FASTER when the drive is unpowered! If you want really long hard-drive life with occasional use, you have to work at it, and you should still keep several redundant copies.

A 50-year backup has to survive standard electrical hazards and disk death, plus fire, theft, earthquake and operator death.

Online storage can help, as it puts the burden on those who SHOULD know what they're doing, BUT do not trust it. Do not use Megaupload :-). Multiple independent online storage sites can help, though ensuring true independence may be tricky. As Google moves to own, er, store all the world's data, you may find that vendors A, B and C all use Google behind the scenes.

I have almost 1 million photos stored. I wonder if my great-grandchildren will get to see any of them? :-). 119,204 of them (90.55 GB) are stored here. That's paid up 10 years ahead, BUT there is no guarantee it will be there 10 years from now.

Russell McMahon
  • I hope that there might one day be a better way to store data, managing backups can be a chore! – Sean Southern Mar 02 '12 at 13:06
  • Google did a pretty interesting study a while back about drive failures. Anyone who loves statistics might enjoy reading it: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf – Kellenjb Mar 02 '12 at 13:39
  • The Google report was what I had in mind when I said that they knew but would not say which brands of drives worked most reliably. Naming brands would have destroyed the company concerned overnight. – Russell McMahon Mar 02 '12 at 16:53
  • @RussellMcMahon I figured that was what you were talking about. I understand Google trying to play nice, but it sure would be nice to know. It would be interesting to try to figure out what drives google actually is buying. This is relevant: http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F9034%2F28680%2F01285441.pdf%3Farnumber%3D1285441&authDecision=-203 if you have a membership. – Kellenjb Mar 02 '12 at 17:40
  • I have a Sun E250 SPARC machine deployed in 2001 and working continuously to this day, with all 12 of its original disks. Three of my own IDE disks from the mid-90s still work, but on the whole, the lifetime of a single disk is immaterial. That's why we have RAID and backups. Information doesn't need to depend on the health of an intrinsically imperfect physical medium. – Alexios Mar 02 '12 at 21:35
  • Regarding vibration, I am reminded of a YouTube video where an engineer from Sun was literally yelling at his hard drives, and the vibration from yelling managed to significantly affect hard drive latency. http://www.youtube.com/watch?v=tDacjrSCeq4 – ajs410 Sep 28 '12 at 19:19
5

Most consumer electronics products have a lifetime of a few years or less (think of mobile phones). I can't think of anything electronic that's been designed to operate for 50 years. Even data carriers like CD-ROMs have a limited lifetime of 10, at most 20, years. Hard disks aren't meant to carry information that has to remain unaltered for tens of years.
But in practice the limited lifetime of carriers is not a problem. As technology progresses, data is copied several times to new carriers during its lifetime. Business archives on CD-ROM are being copied to DVD; DVDs will be copied to Blu-ray or its successor.

edit
Also, in cases where data integrity is at a premium, data is stored with error correction codes, which can repair the occasional error. For instance, CD-ROMs use Reed-Solomon codes.
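
To give a feel for how error correction codes work, here is a minimal single-error-correcting Hamming(7,4) sketch in Python. It is not the Reed-Solomon code CD-ROMs actually use (that one operates on bytes and corrects burst errors), but it shows how a few redundant parity bits let a decoder locate and flip back one corrupted bit:

    # Toy single-error-correcting Hamming(7,4) code: 4 data bits -> 7 stored bits.
    # This only illustrates the principle; CD-ROMs actually use Reed-Solomon
    # codes, which work on bytes and can correct whole bursts of errors.

    def hamming74_encode(d):            # d = list of 4 data bits
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4               # parity over codeword positions 3, 5, 7
        p2 = d1 ^ d3 ^ d4               # parity over codeword positions 3, 6, 7
        p3 = d2 ^ d3 ^ d4               # parity over codeword positions 5, 6, 7
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):            # c = list of 7 (possibly corrupted) bits
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # check positions 1, 3, 5, 7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # check positions 2, 3, 6, 7
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # check positions 4, 5, 6, 7
        error_pos = s1 + 2 * s2 + 4 * s3     # 0 = no error, else 1-based position
        if error_pos:
            c = c[:]
            c[error_pos - 1] ^= 1       # flip the corrupted bit back
        return [c[2], c[4], c[5], c[6]]      # the 4 recovered data bits

    data = [1, 0, 1, 1]
    stored = hamming74_encode(data)
    stored[4] ^= 1                      # simulate one bit of rot
    assert hamming74_decode(stored) == data  # the single-bit error is repaired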

stevenvh
  • When copying information to a new carrier, do defects that have already accumulated carry over to the new copy? – Sean Southern Mar 02 '12 at 12:16
  • @SeanSouthern no, because there are correction mechanisms that use buffer sectors to save data from damaged memory cells. So, unless you have an error involving several bits, it's most probably corrected by some form of redundancy. – clabacchio Mar 02 '12 at 12:23
  • @Sean - the idea is that you copy before end-of-life, i.e. before errors occur. If a device has a data retention of 20 years (like Atmel DataFlash) you don't wait until the last day (even if that 20 years will have a safety margin). – stevenvh Mar 02 '12 at 12:23
  • @clabacchio Ah! That is very interesting, I did not know that that kind of thing is used. But what do you mean by mechanism? Is the error correction that you speak of usually a process started by the OS or is there software included with the drive to perform this much faster? – Sean Southern Mar 02 '12 at 12:46
  • @SeanSouthern it depends; I don't really know the implementation, but I know (I've seen a patent too) that there is an error correction mechanism that works during the writing process (!) which checks that the data is correctly placed in a good sector; otherwise it's copied to a safety buffer, from which it is then read, with a sort of indexing table. So AFAIK it doesn't prevent losing data over time. But I know that there are also correction algorithms that work using redundancy (which drastically reduces the effective density of data). – clabacchio Mar 02 '12 at 13:08
  • @SeanSouthern You might be interested in the Reed-Solomon Error Correction. It is used on CDs/DVDs/bluray/dsl/wimax/raid6.... – Kellenjb Mar 02 '12 at 13:31
  • On any media, if you get too many errors you are unable to overcome them even with error correction. Many of the fancy data-recovery tools will read the bits at a deeper level than just 1s and 0s: they mark each bit with what they believe its value is, plus how confident they are in that value. So if you imagine 0 V meaning a 0 and 1 V meaning a 1, a bit charged to 0.6 V would be read as "we think it is a 1, but with little confidence." They can then feed that into the error correction scheme and recover a lot of data that wouldn't have been recovered otherwise.... – Kellenjb Mar 02 '12 at 13:36
  • .... This of course only applies to media where you can physically read what has happened. – Kellenjb Mar 02 '12 at 13:37
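
As a toy numeric sketch of the soft-decision idea in the last two comments (the read-back voltages are made up for illustration): with a 3x repetition code, combining the analog values before deciding can recover a bit that thresholding each copy first and majority-voting would get wrong.

    # Toy soft-decision decoding of a 3x repetition code. The three "voltages"
    # are hypothetical analog read-backs of the same stored bit (true value: 1),
    # where 0 V encodes a 0 and 1 V encodes a 1.
    reads = [0.90, 0.45, 0.40]

    # Hard decision: threshold each copy to 0/1 first, then majority-vote.
    hard_bits = [1 if v > 0.5 else 0 for v in reads]
    hard_decision = 1 if sum(hard_bits) >= 2 else 0             # -> 0 (wrong)

    # Soft decision: combine the analog evidence, then decide once.
    soft_decision = 1 if sum(reads) / len(reads) > 0.5 else 0   # -> 1 (correct)

    print(hard_decision, soft_decision)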
4

While 'bit rot' is easy to define, the probability of it occurring is rather difficult to express. There are several papers (see here) on the subject. In the case of bit rot strictly caused by something like cosmic radiation flipping a single bit (assuming everything else in the system works fine), we are talking about extremely large amounts of data and/or time spans before witnessing one such event.

IMHO, as bloatware keeps growing and since we are at the dawn of the 'Internet of Things' (another, not potential but real, biblical data flood), in a few decades this could be a headache for the everyday user. I guess large server farms (Google? Facebook?) are already experiencing such issues. There is, however, a technique called 'data scrubbing', which basically consists of a background task that periodically checks data integrity and makes the necessary corrections. Microsoft has developed a new file system for Windows 8 servers, called ReFS (Resilient File System) - a less imaginative name than the leaked working name 'Protogon' - which incorporates scrubbing.
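
As a minimal sketch of the scrubbing idea, here is what it might look like at the file level, assuming a manifest of previously recorded SHA-256 checksums and a known-good mirror copy (the paths and the repair step are hypothetical; ReFS and ZFS do this at the block level, underneath the file abstraction):

    # Minimal sketch of file-level data scrubbing: re-read every archived file,
    # recompute its SHA-256 digest, and restore anything that no longer matches
    # the digest recorded in a manifest. The paths, the manifest format and the
    # mirror directory are hypothetical; ReFS/ZFS work at the block level.
    import hashlib, json, os, shutil

    ARCHIVE  = "/data/archive"          # copy being scrubbed (hypothetical path)
    MIRROR   = "/data/mirror"           # known-good redundant copy (hypothetical)
    MANIFEST = "/data/checksums.json"   # {relative_path: sha256_hex}, recorded earlier

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def scrub():
        with open(MANIFEST) as f:
            expected = json.load(f)
        for rel, digest in expected.items():
            path = os.path.join(ARCHIVE, rel)
            if sha256(path) != digest:              # silent corruption detected
                print("bit rot in", rel, "- restoring from mirror")
                shutil.copy2(os.path.join(MIRROR, rel), path)

    if __name__ == "__main__":
        scrub()     # in practice this would run periodically, e.g. from cron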

On the perverse side of things, real bit rot is not really solved by backups. Backups are effective in the sense that errors tend to be localized, and it is highly unlikely that the same data stored in two different places gets corrupted in both locations at the same time; but the increased amount of raw data increases the probability of an accidentally flipped bit somewhere.
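
As a back-of-the-envelope illustration (the per-bit flip probability below is purely hypothetical, not a measured figure), the expected number of raw flips scales linearly with the amount of data kept:

    # Back-of-the-envelope only: p is a purely hypothetical per-bit annual flip
    # probability, not a measured rate. Expected raw flips grow linearly with
    # the amount of data kept, so mirroring doubles the expected raw error
    # count even though each individual copy is no more likely to fail.
    P_FLIP_PER_BIT_PER_YEAR = 1e-15     # hypothetical rate, for illustration only

    def expected_flips(terabytes, years, p=P_FLIP_PER_BIT_PER_YEAR):
        bits = terabytes * 8e12         # 1 TB = 8e12 bits
        return bits * years * p

    print(expected_flips(1, 10))        # one 1 TB copy kept 10 years -> 0.08
    print(expected_flips(2, 10))        # the same data mirrored      -> 0.16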

Count Zero
4

Library and information science is another field at the heart of this debate. The International Journal of Library Science has many articles about long-term data storage.

My two bits on how this affects many embedded engineering projects: seriously consider how you archive your projects! Can you install all tools and patches from local files? Do you have all the drivers for the hardware? Do you have the correct viewers for the output file types? Are there any operating system constraints?

If the data lasts 50 years but you don't have the tools, it is useless. I once needed to retrieve user data 10 years after it had been lovingly stored on 100MB ZIP disks. It took over a week to build a Windows 95 machine (we needed a 16-bit OS and DOSBox didn't emulate properly), find drivers, copy files, and track down and buy a used UV chip eraser in order to make a small change to the legacy code. Had a single Windows 95 machine been kept as the "archive" machine, none of that would have been necessary.

Data retention is only part of the equation; you must also maintain the tools that give you access to your data.

spearson