Weblog entry #82 for dkg
I'm happy to report that despite several weeks of regular upgrade/churn from unstable and experimental, i have yet to see any data loss or other serious forms of failure.
Unfortunately, i'm not impressed with the performance. The machine feels sluggish in this configuration, compared to how i remember it running with previous non-btrfs installations. So i ran some benchmarks. The results don't look good for btrfs in its present incarnation.
UPDATE: see the comments section for revised statistics from a quieter system, with the filesystems over the same partition (btrfs is still much slower).
The simplified test system i'm running has Linux kernel 2.6.39-rc6-686-pae (from experimental), 1GiB of RAM (no swap), and a single 2GHz P4 CPU. It has one parallel ATA hard disk (WDC WD400EB-00CPF0), with two primary partitions (one btrfs and one ext3). The root filesystem is btrfs. The ext3 filesystem is mounted at /mnt.
I used bonnie++ to benchmark the ext3 filesystem against the btrfs filesystem as a non-privileged user.
Here are the results on the test ext3 filesystem:
consoleuser@loki:~$ cat bonnie-stats.ext3
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
loki 2264M 331 98 23464 11 10988 4 1174 85 39629 6 130.4 5
Latency 92041us 1128ms 1835ms 166ms 308ms 6549ms
Version 1.96 ------Sequential Create------ --------Random Create--------
loki -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 9964 26 +++++ +++ 13035 26 11089 27 +++++ +++ 11888 24
Latency 17882us 1418us 1929us 489us 51us 650us
1.96,1.96,loki,1,1305039600,2264M,,331,98,23464,11,10988,4,1174,85,39629,6,130.4,5,16,,,,,9964,26,+++++,+++,13035,26,11089,27,+++++,+++,11888,24,92041us,1128ms,1835ms,166ms,308ms,6549ms,17882us,1418us,1929us,489us,51us,650us
consoleuser@loki:~$
And here are the results for btrfs (on the main filesystem):
consoleuser@loki:~$ cat bonnie-stats.btrfs
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
loki 2264M 43 99 22682 17 10356 6 1038 79 28796 6 86.8 99
Latency 293ms 727ms 1222ms 46541us 504ms 13094ms
Version 1.96 ------Sequential Create------ --------Random Create--------
loki -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 1623 33 +++++ +++ 2182 57 1974 27 +++++ +++ 1907 44
Latency 78474us 6839us 8791us 1746us 66us 64034us
1.96,1.96,loki,1,1305040411,2264M,,43,99,22682,17,10356,6,1038,79,28796,6,86.8,99,16,,,,,1623,33,+++++,+++,2182,57,1974,27,+++++,+++,1907,44,293ms,727ms,1222ms,46541us,504ms,13094ms,78474us,6839us,8791us,1746us,66us,64034us
consoleuser@loki:~$
As you can see, btrfs is significantly slower in several categories:
- writing character-at-a-time is *much* slower: 43K/sec vs. 331K/sec
- reading block-at-a-time is slower: 28796K/sec vs. 39629K/sec
- all forms of file creation and deletion are roughly six times slower (for example, sequential create: 1623/sec vs. 9964/sec)
- random seeks are about a third slower (86.8/sec vs. 130.4/sec), and they swamp the CPU (99% vs. 5%)
I like the sound of the features we will eventually get from btrfs, but these performance figures seem like a pretty rough tradeoff.
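The comparisons above can be reproduced from the two CSV lines that bonnie++ printed. The following is a minimal sketch, not part of the original benchmark: the column positions in `FIELDS` are an assumption, inferred by lining the CSV values up against the human-readable tables in this post, and the function names are made up for illustration.

```python
# CSV summary lines copied from the ext3 and btrfs runs above.
EXT3 = ("1.96,1.96,loki,1,1305039600,2264M,,331,98,23464,11,10988,4,1174,85,"
        "39629,6,130.4,5,16,,,,,9964,26,+++++,+++,13035,26,11089,27,+++++,+++,"
        "11888,24,92041us,1128ms,1835ms,166ms,308ms,6549ms,17882us,1418us,"
        "1929us,489us,51us,650us")
BTRFS = ("1.96,1.96,loki,1,1305040411,2264M,,43,99,22682,17,10356,6,1038,79,"
         "28796,6,86.8,99,16,,,,,1623,33,+++++,+++,2182,57,1974,27,+++++,+++,"
         "1907,44,293ms,727ms,1222ms,46541us,504ms,13094ms,78474us,6839us,"
         "8791us,1746us,66us,64034us")

# Assumed column positions (0-based), inferred from the tables in this post.
# The "+++++" read columns are skipped because bonnie++ reported no number.
FIELDS = {
    "seq output per-char (K/sec)": 7,
    "seq output block (K/sec)": 9,
    "seq output rewrite (K/sec)": 11,
    "seq input per-char (K/sec)": 13,
    "seq input block (K/sec)": 15,
    "random seeks (/sec)": 17,
    "seq create (/sec)": 24,
    "seq delete (/sec)": 28,
    "random create (/sec)": 30,
    "random delete (/sec)": 34,
}


def rates(csv_line):
    """Extract the throughput figures named in FIELDS from one CSV line."""
    cols = csv_line.split(",")
    return {name: float(cols[i]) for name, i in FIELDS.items()}


def slowdown(baseline, candidate):
    """Return baseline/candidate per field; values above 1 mean candidate is slower."""
    b, c = rates(baseline), rates(candidate)
    return {name: b[name] / c[name] for name in FIELDS}


if __name__ == "__main__":
    for name, factor in slowdown(EXT3, BTRFS).items():
        print(f"{name:30s} ext3 / btrfs = {factor:5.2f}")
```

A ratio near 1 means the two filesystems are close on that test; the per-char output and the create/delete columns are where the gap is largest.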
Comments on this Entry
I just tested a bunch of disks today and confirmed this, once again.
However, where can I get this information for modern hard drives?
Does anyone publish disc sector maps?
Can this be done with LBA?
I know that I can run lots of automated speed tests on very small groups of sectors, which should show me the locations of inner vs outer tracks.
But does any manufacturer actually publish this for just this purpose of having really fast filesystems?
Does anybody really care anymore, besides me or you, that is?
Thanks for an interesting comment.
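The "automated speed tests on very small groups of sectors" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a replacement for zcav (which ships with bonnie++ and is mentioned later in this thread): `probe_read_speed` is a hypothetical helper, it reads through the page cache rather than with direct I/O (so repeated runs will look unrealistically fast), and reading a raw block device normally requires root.

```python
import os
import sys
import time


def probe_read_speed(path, samples=8, chunk_bytes=4 * 1024 * 1024):
    """Time one read of chunk_bytes at `samples` evenly spaced offsets.

    Returns a list of (offset, MiB_per_second) tuples. `path` may be a
    regular file or a block device such as /dev/sda (hypothetical example).
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size for files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        span = max(size - chunk_bytes, 0)
        for i in range(samples):
            offset = span * i // max(samples - 1, 1)
            start = time.perf_counter()
            data = os.pread(fd, chunk_bytes, offset)
            elapsed = time.perf_counter() - start
            mib = len(data) / (1024 * 1024)
            speed = mib / elapsed if elapsed > 0 else float("inf")
            results.append((offset, speed))
    finally:
        os.close(fd)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for offset, speed in probe_read_speed(sys.argv[1]):
        print(f"offset {offset:>14d}  {speed:8.1f} MiB/s")
```

On a drive with zoned recording, the expectation is that low offsets (outer tracks) report higher throughput than high offsets, but a single read per offset is noisy; averaging several passes on an otherwise idle system would give a more trustworthy curve.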
The number of remapped sectors since the time of purchase will be small when compared to the total disk capacity. Presumably some sectors are remapped before the disk leaves the factory, but zcav isn't accurate enough to measure the impact of such sectors on overall performance.
I presume that manufacturers have spare zones that are distributed throughout the disk to avoid significant seek penalties for remapped sectors.
While there's no guarantee that a single block will remain fast (it could end up having one of its sectors somewhere that requires a seek), as a general rule you can rely on a range of 1GB of data that's located at a fast part of the disk (usually the low sector numbers) remaining relatively fast, even if it has a few remapped sectors.
consoleuser@loki:~$ cat bonnie-stats.2.ext3
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
loki 2264M 290 98 24403 12 12580 5 1308 99 44483 7 140.1 5
Latency 116ms 1123ms 1692ms 26977us 168ms 6488ms
Version 1.96 ------Sequential Create------ --------Random Create--------
loki -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 15663 40 +++++ +++ 21690 39 16036 39 +++++ +++ 20955 39
Latency 22223us 1415us 2046us 355us 56us 902us
1.96,1.96,loki,1,1305056984,2264M,,290,98,24403,12,12580,5,1308,99,44483,7,140.1,5,16,,,,,15663,40,+++++,+++,21690,39,16036,39,+++++,+++,20955,39,116ms,1123ms,1692ms,26977us,168ms,6488ms,22223us,1415us,2046us,355us,56us,902us
and here's btrfs:
consoleuser@loki:~$ cat bonnie-stats.2.btrfs
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
loki 2264M 94 99 26102 18 12079 9 422 94 36563 9 126.6 53
Latency 202ms 420ms 523ms 135ms 138ms 12659ms
Version 1.96 ------Sequential Create------ --------Random Create--------
loki -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 4844 80 +++++ +++ 3615 92 5132 78 +++++ +++ 7089 89
Latency 2926us 6800us 7076us 1209us 61us 1256us
1.96,1.96,loki,1,1305054222,2264M,,94,99,26102,18,12079,9,422,94,36563,9,126.6,53,16,,,,,4844,80,+++++,+++,3615,92,5132,78,+++++,+++,7089,89,202ms,420ms,523ms,135ms,138ms,12659ms,2926us,6800us,7076us,1209us,61us,1256us
consoleuser@loki:~$
How-tos for native zfs root:
[0]Ubuntu + ZFS native + root filesystem
Excellent guide showing real-world zfs usage examples in addition to the zfs installation and zpool set up.
[1]HOWTO install Ubuntu to a Native ZFS Root Filesystem
By Darik Horn, of DBAN fame!
Re: licensing:
[3]http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue
"modified to build as a CDDL licensed kernel module which is not distributed as part of the Linux kernel. This makes a Native ZFS on Linux implementation possible if you are willing to download and build it yourself."
Source:
[4]https://github.com/behlendorf/zfs
[5]https://github.com/behlendorf/spl
PPA [maverick packages are said to be binary compatible with squeeze] :
[6]https://launchpad.net/~dajhorn/+archive/zfs
[7]https://launchpad.net/~dajhorn/+archive/zfs-grub
PPA maintained by Darik Horn, of DBAN fame!
Apt-clone:
Nifty: the folks at Nexenta have integrated zfs snapshotting with apt. See the perl script for apt-clone (and man page) within their modified apt source; usage is identical to apt-get:
[8]http://apt.nexenta.org/wip/dists/unstable/main/source/admin/apt_0.8.0nexenta8.tar.gz
I'd be very happy if zfs was licensed to work with the linux kernel and distributed by upstream, though.
Consider: you are violating the copyright of the software authors. When this is done in the context of musicians, it's called "stealing music" and is treated rather seriously.
The "keep it free" (or "viral") provisions of the GPL have specific triggers -- for example, if you redistribute the software (modified or not), you must ensure that the redistributed software is itself under the terms of the GPL. This isn't doable if the modifications make it impossible to satisfy the licenses.
I'd actually argue that the combined software that is not redistributable is not ultimately free software, since you no longer have freedom 2 -- the ability to redistribute copies. This is the same reason that debian wouldn't include such a beast in the debian repositories.
But none of this means that a user who uses a privately-made combination of mutually-incompatibly-licensed software is in fact violating the copyright of the software authors; they've already granted the user the freedom to do what they like with the tools. But the user is no longer using free software, because redistribution (or other triggering actions) are now impossible without a copyright violation.
Including the plain text output as well is good because last time I checked there was no web browser that displayed such tables in a manner that was fit for a Braille reader. But for those of us who use graphical browsers the output of bon_csv2html is much more useful.
| Version 1.96 | Size | Seq Output Per Char K/sec | % CPU | Seq Output Block K/sec | % CPU | Seq Output Rewrite K/sec | % CPU | Seq Input Per Char K/sec | % CPU | Seq Input Block K/sec | % CPU | Random Seeks /sec | % CPU | Num Files | Seq Create: Create /sec | % CPU | Seq Create: Read /sec | % CPU | Seq Create: Delete /sec | % CPU | Random Create: Create /sec | % CPU | Random Create: Read /sec | % CPU | Random Create: Delete /sec | % CPU |
| ext3 | 2264M | 331 | 98 | 23464 | 11 | 10988 | 4 | 1174 | 85 | 39629 | 6 | 130.4 | 5 | 16 | 9964 | 26 | +++++ | +++ | 13035 | 26 | 11089 | 27 | +++++ | +++ | 11888 | 24 |
| Latency | | 92041us | | 1128ms | | 1835ms | | 166ms | | 308ms | | 6549ms | | Latency | 17882us | | 1418us | | 1929us | | 489us | | 51us | | 650us | |
| btrfs | 2264M | 43 | 99 | 22682 | 17 | 10356 | 6 | 1038 | 79 | 28796 | 6 | 86.8 | 99 | 16 | 1623 | 33 | +++++ | +++ | 2182 | 57 | 1974 | 27 | +++++ | +++ | 1907 | 44 |
| Latency | | 293ms | | 727ms | | 1222ms | | 46541us | | 504ms | | 13094ms | | Latency | 78474us | | 6839us | | 8791us | | 1746us | | 66us | | 64034us | |
Also tests with much larger values for -n would be interesting, -n1024 (a million files) would be a good test if you have plenty of time.
Below are the results for -n 1024 (it took about 8 hours on this machine to run it against ext3, ext4, and btrfs). Interestingly, with the larger numbers (and compared against ext4 as well), the results don't seem as clear cut.
| Version 1.96 | Size | Seq Output Per Char K/sec | % CPU | Seq Output Block K/sec | % CPU | Seq Output Rewrite K/sec | % CPU | Seq Input Per Char K/sec | % CPU | Seq Input Block K/sec | % CPU | Random Seeks /sec | % CPU | Num Files | Seq Create: Create /sec | % CPU | Seq Create: Read /sec | % CPU | Seq Create: Delete /sec | % CPU | Random Create: Create /sec | % CPU | Random Create: Read /sec | % CPU | Random Create: Delete /sec | % CPU |
| ext3 | 2264M | 291 | 98 | 25491 | 12 | 12152 | 5 | 900 | 99 | 43078 | 6 | 140.6 | 5 | 1024 | 8921 | 30 | 2126 | 3 | 442 | 1 | 8821 | 30 | 2974 | 5 | 308 | 1 |
| Latency | | 127ms | | 1022ms | | 1964ms | | 37681us | | 283ms | | 8986ms | | Latency | 4363ms | | 477ms | | 58147ms | | 3235ms | | 289ms | | 35291ms | |
| ext4 | 2264M | 288 | 98 | 27934 | 9 | 13119 | 5 | 1297 | 97 | 42403 | 6 | 143.5 | 4 | 1024 | 9554 | 34 | 18894 | 30 | 509 | 1 | 9670 | 34 | 11519 | 20 | 368 | 1 |
| Latency | | 31764us | | 422ms | | 429ms | | 30167us | | 356ms | | 8491ms | | Latency | 3516ms | | 333ms | | 39248ms | | 3758ms | | 416ms | | 34514ms | |
| btrfs | 2264M | 93 | 99 | 27444 | 9 | 12186 | 7 | 270 | 96 | 35813 | 9 | 124.5 | 52 | 1024 | 801 | 15 | 18312 | 82 | 562 | 17 | 569 | 11 | 19251 | 90 | 94 | 5 |
| Latency | | 101ms | | 826ms | | 823ms | | 32888us | | 379ms | | 11605ms | | Latency | 26820ms | | 64847us | | 23235ms | | 29521ms | | 23745us | | 32222ms | |
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ext3 2264M 291 98 25491 12 12152 5 900 99 43078 6 140.6 5
Latency 127ms 1022ms 1964ms 37681us 283ms 8986ms
Version 1.96 ------Sequential Create------ --------Random Create--------
ext3 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
1024 8921 30 2126 3 442 1 8821 30 2974 5 308 1
Latency 4363ms 477ms 58147ms 3235ms 289ms 35291ms
1.96,1.96,ext3,1,1305221427,2264M,,291,98,25491,12,12152,5,900,99,43078,6,140.6,5,1024,,,,,8921,30,2126,3,442,1,8821,30,2974,5,308,1,127ms,1022ms,1964ms,37681us,283ms,8986ms,4363ms,477ms,58147ms,3235ms,289ms,35291ms
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ext4 2264M 288 98 27934 9 13119 5 1297 97 42403 6 143.5 4
Latency 31764us 422ms 429ms 30167us 356ms 8491ms
Version 1.96 ------Sequential Create------ --------Random Create--------
ext4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
1024 9554 34 18894 30 509 1 9670 34 11519 20 368 1
Latency 3516ms 333ms 39248ms 3758ms 416ms 34514ms
1.96,1.96,ext4,1,1305224451,2264M,,288,98,27934,9,13119,5,1297,97,42403,6,143.5,4,1024,,,,,9554,34,18894,30,509,1,9670,34,11519,20,368,1,31764us,422ms,429ms,30167us,356ms,8491ms,3516ms,333ms,39248ms,3758ms,416ms,34514ms
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
btrfs 2264M 93 99 27444 9 12186 7 270 96 35813 9 124.5 52
Latency 101ms 826ms 823ms 32888us 379ms 11605ms
Version 1.96 ------Sequential Create------ --------Random Create--------
btrfs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
1024 801 15 18312 82 562 17 569 11 19251 90 94 5
Latency 26820ms 64847us 23235ms 29521ms 23745us 32222ms
1.96,1.96,btrfs,1,1305234245,2264M,,93,99,27444,9,12186,7,270,96,35813,9,124.5,52,1024,,,,,801,15,18312,82,562,17,569,11,19251,90,94,5,101ms,826ms,823ms,32888us,379ms,11605ms,26820ms,64847us,23235ms,29521ms,23745us,32222ms
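One way to see why the -n 1024 results are less clear cut is to ask which filesystem posts the highest rate in each file-operation column. The sketch below hard-codes the /sec figures from the -n 1024 runs above; the dictionary layout and the `fastest_per_column` helper are illustrative assumptions, not anything bonnie++ provides.

```python
# File operations per second from the -n 1024 runs above.
COLUMNS = ["seq create", "seq read", "seq delete",
           "random create", "random read", "random delete"]
RATES = {
    "ext3":  [8921, 2126, 442, 8821, 2974, 308],
    "ext4":  [9554, 18894, 509, 9670, 11519, 368],
    "btrfs": [801, 18312, 562, 569, 19251, 94],
}


def fastest_per_column(rates, columns=COLUMNS):
    """Map each column name to the filesystem with the highest rate."""
    winners = {}
    for idx, column in enumerate(columns):
        winners[column] = max(rates, key=lambda fs: rates[fs][idx])
    return winners


if __name__ == "__main__":
    for column, fs in fastest_per_column(RATES).items():
        row = ", ".join(f"{name}={vals[COLUMNS.index(column)]}"
                        for name, vals in RATES.items())
        print(f"{column:14s} fastest: {fs:5s} ({row})")
```

With these numbers btrfs leads on sequential delete and random read, and is close to ext4 on sequential read, while remaining more than ten times slower than either ext filesystem at creating files.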
This algorithm makes more sense for the per-char tests than for some of the other ones.
Google 'Phoronix btrfs lzo' and check out the first link.
That article doesn't show convincing gains to me for LZO over gzip except for one or two benchmarks (some of which seem dubious, like massive write tests -- i wonder if the data being written in that particular benchmark happens to compress well with LZO), and it shows a pretty convincing failure in "multithreaded random writes", which sounds like the closest to real-world activity :/
I haven't read anything about space_cache yet; got any pointers? Maybe you want to try to run btrfs with these different options and report back your results?