Look before you leap into Disk Encryption

Posted by tong on Tue 28 May 2013 at 16:42

Think disk encryption gives you more peace of mind? Then think again. It's well known that "failing to plan" means "planning to fail", but when it comes to disk encryption I have found no reasonable planning for disk failure, even after googling extensively.

Before I go into detail, let's outline the problem we are trying to solve here -- disk encryption for the *normal* *home* user. Such users differ from big corporations in that a big corporation will throw a disk away as soon as SMART *indicates* it is failing, while a normal home user will keep using it until it fails massively. At least I do, and I buy cheap 3TB Seagate Barracuda drives, which cost even less than 2TB Western Digital drives, because I know careful planning makes all the difference.

When I asked the question on the debian-user mailing list, the first answer I got was "have more backups". If that's the only answer that comes to mind, you are either kidding yourself or not a normal home user as outlined above. Practically speaking, how would you back up a 3TB hard drive -- onto spindle after spindle of DVDs, or by buying another 3TB drive? What if those fail as well?

So it all boils down to assessing the risk and doing proper planning.

How big is the risk with disk encryption? With one tiny error on the hard drive, your 3TB of storage could be gone forever. If someone is thinking in the back of their mind "I might still have a chance to salvage the situation, as always before", they are simply "planning" to fail: disk encryption is designed to defeat forensic analysis, so not even the pros can get the data back. Going blindly into full-disk encryption without knowing how to provide yourself with a safety net is a recipe for total disaster; when I googled for answers, all I found was incident after incident where the disk was gone forever.

The solution?

I'm still investigating the situation, and have the following three strategies so far. Please jump in if you have more suggestions.

1. Know what to back up. This is very important, especially with disk encryption. So far I haven't backed up anything on my encrypted disk (because, as a normal home user, it is impractical for me), but I will do the following immediately. Details are from http://lists.debian.org/debian-user/2013/05/msg00025.html:

"I guess what you are referring to can happen if you get bad sectors where the LUKS header resides. This is a single point of failure in LUKS whole-disk encryption, to plan for this you must have current backups (but most likely on another encrypted media, so there is always a tiny probability that this is going to happen there too), and backup the LUKS headers (see command "cryptsetup luksHeaderBackup"). See cryptsetup man for security good practice regarding the headers backups."

2. Arm yourself with better gadgets. Backups are the passive way to build a safety net, but what about active protection? Markus Gattol, who maintains a well-known page on dm-crypt and LUKS full-disk encryption, recommends using Btrfs, and has been a Btrfs fan since as early as August 2008 (http://www.markus-gattol.name/ws/dm-crypt_luks.html). Why? Because "Btrfs is a new copy on write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration" (https://help.ubuntu.com/community/btrfs). It is jointly developed by Oracle, Red Hat, Fujitsu, Intel, SUSE, STRATO and many others (https://btrfs.wiki.kernel.org/index.php/Main_Page). In 2008, the principal developer of the ext3 and ext4 file systems, Theodore Ts'o, stated that although ext4 has improved features, it is not a major advance; it uses old technology and is a stop-gap. Ts'o believes that Btrfs is the better direction because "it offers improvements in scalability, reliability, and ease of management" (http://en.wikipedia.org/wiki/Btrfs). The Btrfs fault-tolerance features that attract me are:

  • Online data scrubbing, which finds errors and automatically fixes them for files with redundant copies.
  • Checksums on data and metadata.
  • Check out the rest at http://en.wikipedia.org/wiki/Btrfs (a short usage sketch follows this list).
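
As a sketch of what that buys you in practice (device and mount point are examples; on a single disk only metadata is duplicated here, so you would want a second device and the raid1 profiles for file data to be self-healing too):

    # Keep metadata twice (DUP) so scrub can repair a damaged copy
    mkfs.btrfs -m dup /dev/mapper/secure
    mount /dev/mapper/secure /mnt/secure

    # Walk every block, verify checksums, fix what has a good copy
    btrfs scrub start /mnt/secure
    btrfs scrub status /mnt/secure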

3. Quarantine disk failures. Let me share a secret with all the normal home users like me -- you don't need to buy expensive hard drives in the hope that they will never fail. Bad sectors will happen regardless, no matter how many times more expensive your disk is than mine. Hard disks will fail, no matter what; what hardly ever happens is a massive failure all at once. What I used to do is mark the bad sectors as bad in the filesystem and stop using them. Works great. Don't believe me? Check this out from http://www.linuxforum.com/threads/3265-bad-sectors-on-disk: "I have some bad sectors on my hard drive. What I did was to make a partition on the part which has the bad sectors. Then I just do not use that particular partition. It's been two years now. The rest of the hard drive is still working well, 12-16 hours every day, seven days a week."
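
Marking the sectors as bad so the filesystem never allocates them looks roughly like this on ext2/3/4 (a sketch; the device name is an example, and the filesystem must be unmounted):

    # Read-only scan for bad blocks, saving the list to a file
    badblocks -sv -o /tmp/badblocks.txt /dev/sdb1

    # Feed the list into the filesystem's bad-block inode so
    # those sectors are never used again
    e2fsck -l /tmp/badblocks.txt /dev/sdb1

    # Or let e2fsck run the badblocks scan itself
    e2fsck -c /dev/sdb1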

Still don't feel safe enough? Here is another trick: how I control where my disk failures occur. My old 1.5TB Seagate Barracuda is nearly 10 years old now, living way past its warranty period. If you took a look at its SMART status report, you would find the astonishing "DISK FAILURE IS IMMINENT" warning, because the reallocated-sector count is more than 100 times over the disk-failing threshold. Yet I'm still confident that it's working fine for me. The trick is to quarantine the disk failures. I knew that my Seagate Barracuda would fail more easily than other brands, so I treat it like a rewritable CD/DVD. How do you prolong the life span of a rewritable CD/DVD? By minimizing the number of rewrites.
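
To read the same SMART report on your own drive (smartmontools package; the device name is an example):

    # Overall health verdict, then the raw attribute table;
    # Reallocated_Sector_Ct (attribute 5) is the one discussed here
    smartctl -H /dev/sda
    smartctl -A /dev/sda

    # Optionally start a full surface self-test and check back later
    smartctl -t long /dev/sda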

In brief, there are three kinds of partitions on my hard disk:

  • My caches: constantly written and overwritten, but the content is not important. Coping mechanism: constant bad-block checks.
  • My documents: fairly constantly written and overwritten, and the content is important. Coping mechanism: triple, quadruple backups.
  • My collections: HUGE in volume, with no way for me to back them up. Coping mechanism: put them on "write-once" partitions, i.e. the whole partition is only updated when new files come in -- no unnecessary writes whatsoever (see the sketch after this list).
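
One way to enforce the "write-once" discipline, as a sketch (device, mount point and staging path are examples): keep the collection partition mounted read-only and remount it writable only while adding files.

    # /etc/fstab: the collection partition normally mounts read-only
    # /dev/sdb3  /srv/collection  ext4  ro,noatime  0  2

    # Remount writable just long enough to add the new files
    mount -o remount,rw /srv/collection
    cp -a /staging/new-stuff /srv/collection/
    mount -o remount,ro /srv/collection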

Thus, even though SMART tells me that DISK FAILURE IS IMMINENT, and even though the reallocated-sector count is more than 100 times over the failing threshold, I know my files are safe.

Alright, enough babbling. What's your idea for coping with disk failures, especially under full-disk encryption?

Posted by Anonymous (77.56.xx.xx) on Tue 28 May 2013 at 23:44
I'd suggest using eCryptfs instead. You keep the same underlying file system and there is some information leakage, but the encryption is good enough to protect your data against theft.

You can back up the encrypted key file and the encrypted files directly to the cloud or other media.
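
For example (a sketch using the ecryptfs-utils package; the paths are the defaults it creates):

    # Set up an encrypted ~/Private directory for the current user
    sudo apt-get install ecryptfs-utils
    ecryptfs-setup-private

    # The ciphertext lives in ~/.Private as ordinary files, so it
    # can be copied to other media without exposing any plaintext
    rsync -a ~/.Private/ /media/backup/Private/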

Posted by Anonymous (113.71.xx.xx) on Fri 5 Jul 2013 at 08:26
I don't know whether eCryptfs benefits from the CPU's AES acceleration or not. Last I checked it didn't, but that might no longer be true.
In any case, I think the hardware encryption used by devices such as Intel's 500 series SSDs is a lot better than any software encryption. There is absolutely zero overhead. Even if you don't set a password, the SSD still encrypts everything, just without a password, so there is no performance hit when you enable one on the device -- though I think you do have to wipe it first.
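
Checking whether the CPU and kernel give you accelerated AES is easy enough (a sketch; the benchmark subcommand needs a recent cryptsetup):

    # Does the CPU advertise the AES-NI instruction set?
    grep -m1 -o aes /proc/cpuinfo

    # Is the accelerated kernel module loaded?
    lsmod | grep aesni

    # Measure cipher throughput as dm-crypt/LUKS would see it
    cryptsetup benchmark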

Posted by Anonymous (66.93.xx.xx) on Wed 29 May 2013 at 18:27
"how would you backup a 3TB hard drive, onto spindles and spindles of DVDs or just buy another 3TB hard drive? What if they fail as well?"

Easy-peasy. Never buy one drive. Always buy three. Like you said, they're cheap!

Posted by Anonymous (24.104.xx.xx) on Mon 8 Jul 2013 at 20:56
Linux mdadm is one of the few RAID implementations I've seen that allows three-drive RAID 1 mirroring. That would actually make it easy to pop out the third drive every once in a while and take it offsite, replacing it with the drive returned from offsite.
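
A sketch of that setup and the rotation (device names are examples):

    # Three-way RAID 1: every member holds a complete copy
    mdadm --create /dev/md0 --level=1 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 /dev/sdc1

    # Rotate the third member offsite: fail and remove it, then
    # add the drive just brought back from offsite
    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sdd1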

Posted by simonw (84.45.xx.xx) on Thu 30 May 2013 at 22:35
Backups.

Note that when hard disks fail, they commonly fail catastrophically. Google's large study noted that a significant proportion of failures showed no SMART parameters indicating gradual or predictable failure. Home users should expect higher failure rates than disks in data centres, due to coffee, children, and the like. They are also less likely to take a drive out of service because of SMART parameter changes.

Motor and circuitry failures are common. The average home user can't afford to have these repaired, because it is expensive. In the case of a head crash (which can't be that rare, since I've heard it often enough to recognize the noise above the hum of a data centre), the data is typically unrecoverable.

Thus if they don't have backups, their data will likely be lost, encryption or not.

What is the precise failure mode you are worried about? I think you'll find that the encryption folk have it reasonably well covered. TrueCrypt recommends creating a recovery disk that allows recovery from corruption of the encryption metadata, or from corruption of the operating-system files (in which case the rest of the disk can be decrypted so that normal recovery procedures can be performed).

Note that most disks do bad-block sparing internally, so any failed block visible in regular usage is usually a sign of some failure process already in action. The idea that you should carry on using a modern drive that is reporting bad blocks at the operating-system level is madness.

I wouldn't be surprised if, by forcing a write to every block when encrypting the empty space, encryption triggers bad-block sparing early, detecting problem areas before they are used in anger. It would be interesting to see a proper study comparing failure rates with and without encryption.

The biggest problem is forgetting passwords. Sometimes keeping keys in a key safe is the right approach, and that applies to some passwords too.

Where data matters, I mirror disks and take backups. I agree many home users don't, but I don't believe encryption, done right, changes this picture significantly. Ultimately we need people to make backups, and I don't think PC vendors like to mention that, because it typically adds to the cost, and they all compete on cost.

Posted by Anonymous (24.104.xx.xx) on Mon 8 Jul 2013 at 20:58
"They also are less likely to take a drive out of service because of SMART parameter changes. "
Does the newest version of Windows even monitor smart by default? Last time I used Windows, I remember that it just ignores it.

Posted by mcortese (193.78.xx.xx) on Fri 31 May 2013 at 14:49
I agree with simonw: disk encryption won't make failures happen more often (except, perhaps, on SSDs, where the need to obfuscate actual block usage may defeat the wear-leveling algorithms).

If a particular file system has a single point of failure, then of course it makes sense to take special measures to back up the relevant bits of metadata. But this is not specific to disk encryption.

And once failure happens, home users don't normally resort to forensic techniques to recover their data, encrypted or not.

Posted by Anonymous (101.98.xx.xx) on Tue 18 Jun 2013 at 03:04
As I see it there are four different issues:
1. Total disk failure
2. Encrypted container (disk) failure/corruption
3. File/block corruption
4. File system failure/corruption
The only solution I know of for total disk failure is to have more than one copy of the data, e.g. backup, or backup + RAID.
As I understand it, many encryption containers allow backing up the container headers/metadata, to prevent losing the whole container (and all the files within) to a few bit errors in the headers. So use one that does.
To protect against file/block corruption due to bit rot I know of two options: first, use error-correcting files like .par2 (en.wikipedia.org/wiki/Parchive) or ICE ECC (ice-graphics.com/ICEECC/IndexE.html) on Windows, so errors can be fixed in place; second, use grc.com's SpinRite for data recovery and preventive maintenance (fixing potential bit rot before it causes problems).
Once the bit rot has been removed, standard file-system tools can be used for any remaining file-system issues.
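
A sketch of the par2 workflow (Debian package par2; file names are examples):

    # Create recovery volumes with 10% redundancy for the archive
    par2 create -r10 archive.par2 big-archive.tar

    # Later: verify, and repair in place if bits have rotted
    par2 verify archive.par2
    par2 repair archive.par2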

When I was archiving or backing up to DVD I would always fill the remaining space on the disc with error-correcting files; that way, if I had problems reading any of the files, I could recover them without resorting to an off-site backup.

I have a large media collection that I can't afford to back up, but I can afford a little space (<10%) for error-correction files plus a copy of SpinRite. That keeps my files in shape until I move them to a new, higher-capacity drive (e.g. 10TB, when available) or the drive dies.

Posted by Anonymous (168.221.xx.xx) on Tue 2 Jul 2013 at 19:45
One work drive and three backup drives.
I work on one machine and have a backup drive for it, plus I sync all of my data to my two laptops... all of them with full-disk encryption. Should I lose my workstation, my laptops and the backup drive have my information; should my work laptop fail, my workstation, the backup drive and the other laptop have my data.

In any case, you could always back up your documents to Google Drive, your pictures to Picasa and your movies to YouTube, and you should still be good to go -- if you are OK with them knowing what you fap to... xD
That way you could theoretically store hundreds of gigs of information without having to pay hundreds of dollars.

Posted by Anonymous (144.32.xx.xx) on Tue 26 Nov 2013 at 12:02
When your disk is showing as failing, how are you checking that your data is not being corrupted?
I know that one of the major advantages of ZFS/Btrfs is that they checksum your data on read and write and can recover it if there is a data error on the disk.

If your SMART data is telling you that the disk is failing then chances are some of the data you are reading is not correct.

So maybe the solution is two or three disks: LUKS-encrypt all of them, then run Btrfs RAID on top, letting it fix the bad bits as it finds them.
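
A sketch of that stack (device and mapper names are examples; luksFormat destroys whatever is on the disks):

    # Encrypt both disks and open the mappings
    cryptsetup luksFormat /dev/sdb
    cryptsetup luksFormat /dev/sdc
    cryptsetup luksOpen /dev/sdb crypt0
    cryptsetup luksOpen /dev/sdc crypt1

    # Btrfs RAID 1 across the two encrypted devices: scrub can now
    # rebuild a corrupted block from the mirror's good copy
    mkfs.btrfs -d raid1 -m raid1 /dev/mapper/crypt0 /dev/mapper/crypt1
    mount /dev/mapper/crypt0 /mnt/data
    btrfs scrub start /mnt/data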

It would be nice if the OS had a good way of letting you know that it was seeing a lot of bad bits on a disk, so that you could replace it.
