[SOLVED] F17 & btrfs


tox
14th November 2011, 03:20 AM
will btrfs be made default FS in F17?

or should i make a post on the devel-list and hope Josef Bacik replies to it?


Answer to my question is in the F17 Feature List

renkinjutsu
14th November 2011, 05:36 AM
I think one of the main things holding btrfs back from being made default is the lack of a good fsck tool. The repair tool is only being worked on by one guy and there's no definite timeline. I doubt Josef Bacik would know much more than that. All we know is that btrfsck is coming... someday

tox
14th November 2011, 05:41 AM
I think one of the main things holding btrfs back from being made default is the lack of a good fsck tool. The repair tool is only being worked on by one guy and there's no definite timeline. I doubt Josef Bacik would know much more than that. All we know is that btrfsck is coming... someday

you're probably right. i haven't seen much movement in the way of a workable or decent FS check for btrfs

AdamW
16th November 2011, 12:18 AM
yup, that's exactly the status. we still aim to switch to btrfs as soon as it's plausible, but not before.

tox
18th November 2011, 12:15 AM
going by this bug (https://bugzilla.redhat.com/show_bug.cgi?id=689512) it looks like there is a working fsck? a list of other bugs: https://bugzilla.redhat.com/show_bug.cgi?id=689509

vein
18th November 2011, 12:19 AM
The fsck is in the final stages of validation I believe, or so I heard. I'll be excited to see a move to btrfs, as it's been fantastic in terms of performance/features/stability for me, and on a high-usage system to boot.

tox
18th November 2011, 12:45 AM
i ain't tried it, maybe it'll be stable enough for F17 as default FS

chrismurphy
18th November 2011, 11:46 PM
What is the btrfsck that's included with F16? It's reporting itself as v0.19.

RahulSundaram
19th November 2011, 05:38 AM
Hi

The one included now can only detect problems. Not fix them.
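A minimal sketch of that detect-only check (the device name here is just an example; run it against an unmounted filesystem):

btrfsck /dev/sdb1   # reports inconsistencies; the v0.19 tool has no repair mode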

tox
19th November 2011, 07:00 AM
Hi

The one included now can only detect problems. Not fix them.

what about in the 3.2 kernel? i'm assuming the btrfs in that kernel can fix them? or is that still being tested

renkinjutsu
19th November 2011, 09:21 AM
Were there ever plans to build fsck capabilities into the kernel?

tox
19th November 2011, 09:38 AM
Were there ever plans to build fsck capabilities into the kernel?

dunno atm, i'm too drunk to remember if there was

jvillain
19th November 2011, 11:12 PM
LinuxCon Europe

The Kernel Summit was followed by LinuxCon Europe, the first ever European offshoot of the Linux Foundation conference which has long been established on the North American Linux circuit. With around 900 delegates, numbers were better than expected. Presentations were given by a number of kernel developers, including lead btrfs developer Chris Mason.

He discussed a number of enhancements aimed at improving the performance and stability of the experimental file system which are expected to find their way into Linux 3.2. In the long term, file system level support for RAID 5 and 6 arrays is planned. Before that happens, however, he is keen to work on a program for checking and repairing btrfs file systems. He gave a brief demonstration of a prototype of this tool, known as btrfsck, which is able to resolve one of the most critical errors and should make its way into a development version soon. Mason has also merged a program known as restore, which can be used to rescue data from damaged btrfs file systems, into the git development tree for the recently updated btrfs tools. How long it will be before the major distributions start to include these new and updated tools remains to be seen.

http://www.h-online.com/open/features/Kernel-Log-more-details-on-the-kernel-org-hack-1371641.html

renkinjutsu
20th November 2011, 12:05 AM
The fsck is in the final stages of validation I believe, or so I heard. I'll be excited to see a move to btrfs, as it's been fantastic in terms of performance/features/stability for me, and on a high-usage system to boot.

Do you make regular snapshots of your filesystem? Btrfs is extremely slow for my setup.. but I make use of lots of snapshots

vein
20th November 2011, 12:14 AM
I'm making infrequent snapshots of my filesystems, as my current setup has a cron job backing up key files on a separate server.

RahulSundaram
20th November 2011, 12:28 AM
what about in the 3.2 kernel? i'm assuming the btrfs in that kernel can fix them? or is that still being tested

fsck is a user space utility. It has no connection to the version of the kernel.

tox
20th November 2011, 12:47 AM
fsck is a user space utility. It has no connection to the version of the kernel.

thanks for the explanation

chrismurphy
20th November 2011, 06:19 PM
Were there ever plans to build fsck capabilities into the kernel?

I don't know whether it was going to be put into the kernel or not, but I had read somewhere that an online self-repairing file system is not just desired but a design goal. No idea what filesystem or time frame that is, though. Maybe it was ZFS.

Does anyone know if btrfs supports lvm2-like features other than snapshots? For example, I just migrated a Linux installation from a smaller hard drive to a larger one just by cp -a for the ext4 /boot partition. But for root & swap on the LVM, I used pvcreate, vgextend, pvmove, vgreduce, lvresize (then resize2fs) to add the new, much larger partition on the new drive to the volume group, move the extents to the new pv, remove the old pv (drive) from the volume group, resize the logical volume, then resize the ext4 filesystem to fit. It took maybe 10 minutes (OK, not a lot of data) and it did not seem to piss off SELinux or fstab or any of the other things I've had to deal with after the fact. Just had to grub-install, per usual.
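For reference, a rough sketch of that LVM migration (the device, VG, and LV names here are hypothetical):

pvcreate /dev/sdb2                  # initialize the new, larger partition as a PV
vgextend vg_sys /dev/sdb2           # add it to the existing volume group
pvmove /dev/sda2                    # migrate all extents off the old drive, live
vgreduce vg_sys /dev/sda2           # drop the old drive from the volume group
lvresize -L +400G /dev/vg_sys/root  # grow the logical volume...
resize2fs /dev/vg_sys/root          # ...then grow the ext4 filesystem to fill it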

jvillain
20th November 2011, 11:02 PM
Btrfs file systems are very easy to resize.

btrfsctl -r [+-]size[gkm]: resize the FS by size amount

At the partition level you need to think of it a little differently than you would a regular volume manager.

A good place to start is Wikipedia:

http://en.wikipedia.org/wiki/Btrfs
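As a concrete sketch (mount point hypothetical), growing a mounted btrfs by 5GB with the tool above would look like:

btrfsctl -r +5g /mnt/data

while the newer btrfs-progs command for the same thing is:

btrfs filesystem resize +5g /mnt/data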

chrismurphy
21st November 2011, 12:40 AM
Yeah, I know it's possible to resize the filesystem. What's not clear to me is whether there's an equivalent to physical volumes that can be added/removed. That is, does (or will) btrfs obviate the need for lvm2?

SlowJet
25th November 2011, 04:21 PM
Yeah, I know it's possible to resize the filesystem. What's not clear to me is whether there's an equivalent to physical volumes that can be added/removed. That is, does (or will) btrfs obviate the need for lvm2?

LVM2 is for managing volumes. It may make more sense to you if you think of the cases where only whole disks are used.

1. Each disk becomes a PV.
2. One or more PV become a VG.

Now you can create LVs all over the disks - a piece of a disk, a whole disk, or 1,000 disks.
There is NO file system involved yet.

Now make some file systems on some LVs.

Whatever the management of the file system is, including its own terminology, is contained within the f/s's LV.

So to get a f/s resized you use the f/s management tools, except when the f/s has to grow onto more space than its LV provides.
Therefore one would need to use LVM to manage the volume first, then the f/s's management tools.
To regain LV space on a VG from the f/s, it would be done in reverse.

So an LVM system could have ext3,ext4, btrfs, in various converted states.
In most cases a raid would be narrowed down to a whole disk (or disks) per LV per raid node, otherwise the performance would be unpredictable and the management more complex.

So btrfs can be on LVM using raid10, for example, with each node of the raid being one or more disks (PVs) on an LV, and all nodes being in the same or different VGs.

Also, the newer versions of LVM2 have some command changes that allow subcommand params, like creating a PV and VG at the same time, and maybe LVs also (see the sketch below).
So LVM management can be much simpler on whole disks.
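A minimal sketch of that, assuming two whole disks (names hypothetical):

vgcreate vg_data /dev/sdb /dev/sdc   # initializes the bare disks as PVs itself
lvcreate -L 500G -n lv_big vg_data   # carve an LV out of the pool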

One or more disks with many LVs and filesystems that grow and shrink and use snapshots, etc., can be as messy as you like, but it all works the same, just with more management.

Personally, I don't see btrfs on a single disk, or a mirror being better than ext4.
And I think btrfs will wear out disks faster. :)

SJ
.

stevea
27th November 2011, 07:24 AM
LVM2 is for managing volumes. It may make more sense to you if you think of the cases where only whole disks are used.

Perhaps it's clearer to think of LVM and MDADM (software RAID) as block managers with a little r/w muxing/demuxing. They each are given one or more block devices (disks, partitions, /dev/loop* devices ...) and, as a result of configuration, they produce new block devices under /dev/mapper. LVM lets you slice & dice the input block devices in interesting ways - but it's just mapping blocks and allowing migration, mirrors, stripes ...

The big plus w/ LVM is that it avoids the performance penalty of (physical disk) partitioning on rotating media - but it comes with its own performance penalty.
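One way to see that mapping on a running system (a sketch; exact output varies):

ls /dev/mapper   # the block devices LVM has synthesized
dmsetup table    # how each one is assembled from ranges of the underlying devices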
=========

Personally, I don't see btrfs on a single disk, or a mirror being better than ext4.

btrfs is already a match for ext4 on most measures and much faster on several, alas slower on a few - in addition it has a load of important features that ext4 lacks. There is little doubt that btrfs will become the first choice in Linux filesystems once it is out of development.

Among features:
Integrated LVM:
/ snapshots - in fact multiple snapshots.
/ devices - you can add or delete devices from a btrfs file system much like an LVM.

/ subvolumes - (associated with directories) which can be separately mounted or snapshotted.
/ data/metadata types - you can create a btrfs as a raid0, raid1, raid10 or single across devices (raid 5 & 6 planned).
/ transparent compression (LZO or zlib)
/ file checksums.

Transparent encryption is planned.

So btrfs eliminates much of the need for LVM, MDADM (sw raid), and partitioning, and eventually the crypt file systems.
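As a hedged sketch of what a few of those features look like from the command line (device names hypothetical; option spellings per the btrfs-progs of this era):

mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc   # two-device FS, mirrored data+metadata, no MDADM
mount -o compress=lzo /dev/sdb /mnt              # transparent LZO compression is a mount option
btrfs device add /dev/sdd /mnt                   # grow the pool like an LVM vgextend, while mounted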



And I think btrfs will wear out disks faster. :)

SJ
.

More stories to scare the children? Got any evidence?
Generally a filesystem can't be faster unless it's doing less I/O - so your claim smells like BS.

SlowJet
27th November 2011, 08:38 AM
LOL, yet again. That should do until next year. :)

ext4 keeps a lot of stuff in memory compared to ext3,
and does less i/o.

btrfs is like reiserfs and has several B-tree areas to R/W in order to complete a transaction.
The disk is seeking all over the place.

btrfs on a single disk is no advantage at all except for some redundancy checks if you take the default of dual checksum areas.

btrfs does not have enough commands to manage it yet. The snapshot subvolumes for updates are going to go over like a lead balloon. All or nothing on rawhide? May as well wait until version EOL and install and update.

btrfs is too little too late.
The Google-type back-end storage, old and new, and cloud storage are going to be around for a couple of decades before btrfs replaces them.

I doubt if Stanford Uni will bother changing its peta f/s to btrfs

SJ

chrismurphy
27th November 2011, 06:33 PM
Yes, and "the cloud" (a b.s. term if there ever was one) will store all of its data in air. No filesystem required. It's magic.

This might explain how Google's "cloud" dropped 6 weeks of revisions on one of my documents. Regardless of their overall data integrity stats (which I am unfamiliar with), my own experience with "the cloud" is an extremely negative one. I have never had such data loss with my own data, and it happened through no fault of my own.

ext4 just made it into RHEL 5.6 this year (and into RHEL 6 one year ago). Is that too little too late also? I think such statements are not very forward-thinking at all. There is a need for what btrfs promises, and the work is progressing very consistently.

stevea
28th November 2011, 04:35 PM
LOL, yet again. That should do until next year. :)

Dismissive snide comments are evidence of nothing except that you have no evidence for your position.


ext4 keeps a lot of stuff in memory compared to ext3,
and does less i/o.

Where is the evidence for this? Is the amount of memory significant or trivial? btrfs can and does by default keep duplicate sets of file metadata, which improves FS integrity. Is this extra and optional feature the cause of your claim of extra I/O? The clone/COW operation should reduce I/O far below ext3/4 for most real-world uses.
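That duplicate metadata is visible with the stock tools. A sketch (device hypothetical): on a single rotating disk, mkfs.btrfs defaults the metadata profile to DUP, so two copies of every metadata block are kept:

mkfs.btrfs /dev/sdb1      # metadata defaults to DUP on a single disk
mount /dev/sdb1 /mnt
btrfs filesystem df /mnt  # shows a "Metadata, DUP" line alongside "Data, single"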

btrfs is like reiserfs and has several B-tree areas to R/W in order to complete a transaction.
The disk is seeking all over the place.

Btrfs (aka "B-tree FS") directories are based on B-trees, and yes, it absorbed a lot of ideas from Reiserfs. No, it doesn't make the disk "write all over the place". Creating a balanced B-tree structure for directories should cause a MAJOR improvement in directory accesses (which were pathetic in ext3, and modestly improved in ext4).

btrfs on a single disk is no advantage at all except for some redundancy checks if you take the default of dual checksum areas.

Yeah - faster directory access, file cloning w/ COW, LVM and MDADM features, snapshotting, per-file checksums ... that's no advantage if you have your head in the sand. The improved integrity is worth a lot even if there were no performance increase.

btrfs does not have enough commands to manage it yet. The snapshot subvolumes for updates are going to go over like a lead balloon. All or nothing on rawhide? May as well wait until version EOL and install and update.

I'm not having any trouble managing the current features. I've tested btrfs snapshots in subvols and it works very well and gives some very substantial power. For example, you can give each user a separate subvol as a home directory - then you can create a snapshot at the start of any work, and only accept or revert the changes at the end. The ability to create multiple snapshots should be great. It takes a lot of the fear out of mystery makefiles and 3rd party sw installs. I have no idea what you are saying about rawhide and waiting for EOL, but I'm pretty certain you don't understand the issue. Please explain.
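A minimal sketch of that per-user workflow (names hypothetical):

btrfs subvolume create /home/alice                        # each home is its own subvol
btrfs subvolume snapshot /home/alice /home/alice-before   # cheap COW checkpoint before risky work
# accept the changes: drop the checkpoint
btrfs subvolume delete /home/alice-before
# or revert: swap the checkpoint back into place
mv /home/alice /home/alice-broken && mv /home/alice-before /home/alice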


btrfs is too little too late.

Too late for what? There is no better alternative on the horizon. Theodore Ts'o - the creator of ext3 and ext4 and a well-known kernel contributor - says ext4 is just a stopgap effort and that btrfs is the future. Unless you have some real evidence you lose that argument badly.


I doubt if Stanford Uni will bother changing its peta f/s to btrfs

I doubt you have a clue what Stanford will do. Since btrfs is aimed at enterprise, I suspect your thinking is somehow distorted on the matter.



Again I see loads of negative innuendo and no evidence from you. Show a source or evidence for your claims please.

stevea
1st December 2011, 07:07 AM
==========================================================================

Just for grins I ran a few tests comparing ext4 and btrfs under F16, most recent kernel.
3.1.2-1.fc16.x86_64

I have an otherwise unused disk ....
Disk: Seagate ST3300831AS
330GB, not fast by today's standards, with one big partition ...
Timing buffered disk reads: 168 MB in 3.02 seconds = 55.63 MB/sec

To test, I reboot, untar a kernel source set (from ramdisk) into a newly created FS and then build a kernel with a make -j3, all on a dual-core E6700.
I measured the I/O in several ways, including using the kernel blktrace feature and the
/sys/block/sdb/sdb1/stat differences from beginning to end. I also periodically sampled the kernel
slab usage to determine how much memory is used by the kernel FS.
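A sketch of the kind of commands that measurement implies (not necessarily the exact procedure used):

blktrace -d /dev/sdb -o build-trace &   # record every I/O on the test disk during the build
cat /sys/block/sdb/sdb1/stat            # cumulative read/write sectors and ops; diff before/after
grep -i btrfs /proc/slabinfo            # kernel slab memory attributable to the filesystem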


For a first comparison I created filesystems like ....
mkfs.ext4 -j -O extent -L "ext4-test" /dev/sdb1
vs
mkfs.btrfs -L "btrfs-test" /dev/sdb1

The results are .......

BuildTime:
EXT4
real 40m5.150s
user 61m4.958s
sys 11m6.853s
BTRFS:
real 39m58.112s
user 61m15.032s
sys 11m14.589s

So btrfs creates a ~1.1% higher system load, and in this case executed about 0.3% faster in clock time. These differences are minor and within the margin of error. After a couple of repeats I get the impression that btrfs is a bit slower creating/writing files and a bit faster reading.



I traced the total amount of I/O and also recorded EVERY disk I/O on this drive, then did a post-capture analysis.

Total sectors:
READS: ext4=3.60M btrfs=4.08M sectors
WRITES: ext4=23.7M btrfs=15.1M sectors
R-iops: ext4=51.6k btrfs=52.0k ops
W-iops: ext4=39.9k btrfs=25.1k ops


These results are IMO astonishing. Btrfs read 1.13 times more sectors but wrote only 63.7% as many sectors as ext4. This implies that ext4 is re-writing blocks (4k block size) often, since both file systems end with very nearly 7.0GB in use. The btrfs number of write IOPs was significantly lower.

When I analysed the blktrace results I was able to calculate some interesting numbers.

AVERAGE I/O size in sectors:
ext4: 296 sectors
btrfs: 248 sectors

Average seek distance in sectors:
ext4: 7.33 Msect
btrfs: 2.17 Msect

Ext4 IOPs were about 19% larger, but there were 19% more of them!
Yes, btrfs seeks were 3.4 times shorter than ext4's on average! Huge win.


The kernel slab memory usage was 78.7MB for the ext4 case (root + data disk), while the btrfs case used 64.4MB plus 18.9MB for the ext4 rootfs. So it's a reasonable but crude estimate that the ext4 structures for the data disk were only about 59.8MB. IOW btrfs may use ~7.7% more slab space than ext4 (4.6MB of DRAM in this case). Minor on PCs or servers, I think.


We shouldn't generalize too much, but it's fair to say that for this particular test, between EXT4 and BTRFS;
1/ There was no significant difference in time/performance.
2/ There was no significant difference in disk space used
3/ There was a minor increase in kernel DRAM used for Btrfs.
4/ Btrfs transferred substantially fewer sectors and issued fewer IOPs for the same task.
5/ Ext4 did a better job coalescing IOPs into larger transfers, but did much more I/O.
6/ Btrfs seeks were dramatically smaller on average.

=========================================================

I ran two other btrfs tests.

In one I mounted the btrfs filesystem with ....
mount -o noacl,notreelog /dev/sdb1 /test
however, instead of improving performance it slowed the make time by ~4% and increased the number of reads and writes.

--

In another test I mounted the btrfs with lzo compression
mount -o noacl,notreelog,space_cache,compress=lzo /dev/sdb1 /test

In this case the system & user time were almost identical to the first btrfs result; however, the real time was about 2.2% slower. This 2.2% may imply some of the kernel compression/decompression wasn't evenly load-shared between cores.

The astonishing part: the final filesystem had 2.8GB in use, while the uncompressed FSs were 7.0GB - a 60% reduction for a mostly binary, mostly small-file data set. The total amount of I/O was also about halved. Note that kernel source currently takes a little over 1GB (includes firmware blobs) and the remainder is almost all binary. If your CPU isn't overloaded you can probably get a ~2X or perhaps better saving in disk space by compressing, at little or no system performance cost. Unclear if this would slow down specific apps, but on the surface it seems like a big win.


===============

Btrfs wears your disk: WRONG - btrfs required significantly fewer IOPs and a smaller number of sector I/Os for the same task.
Btrfs seeks a lot more than ext4: WRONG - btrfs seeks a lot less than ext4.
Btrfs performance is poor: WRONG - btrfs performance was very comparable to ext4, and likely exceeds ext4-on-LVM performance based on some LVM tests I did last year.
Also Btrfs has a load of features ext4 can't touch.

chrismurphy
2nd December 2011, 03:07 AM
Should've asked how he likes his crow cooked!

SlowJet
3rd December 2011, 10:30 PM
I did some testing runs, too. :)
# ext4 vs btrfs - values marked by # comments - summary at bottom.

# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
LogVol71Storage VolGroup71 owi-ao 425.00g
LogVol71swap VolGroup71 -wi-ao 4.00g
LogVol72root VolGroup71 -wi-ao 32.00g
snaphome VolGroup71 swi-a- 1.22g LogVol71Storage 0.00
LogVol72Storage VolGroup72 -wi-ao 600.00g
LogVol72btrfs VolGroup72 -wi-ao 331.50g
[root@Ruthie-07 ~]# mount /dev/VolGroup71/snaphome /mnt/snaphome
[root@Ruthie-07 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 32G 5.7G 26G 19% /
udev 1.5G 0 1.5G 0% /dev
tmpfs 1.5G 272K 1.5G 1% /dev/shm
tmpfs 1.5G 728K 1.5G 1% /run
/dev/mapper/VolGroup71-LogVol72root
32G 5.7G 26G 19% /
tmpfs 1.5G 0 1.5G 0% /sys/fs/cgroup
tmpfs 1.5G 728K 1.5G 1% /var/run
tmpfs 1.5G 728K 1.5G 1% /var/lock
tmpfs 1.5G 0 1.5G 0% /media
/dev/sda1 310M 51M 244M 18% /boot
/dev/mapper/VolGroup71-LogVol71Storage
419G 66G 333G 17% /home
/dev/mapper/VolGroup72-LogVol72Storage
591G 195G 393G 34% /mnt/storage72
/dev/mapper/VolGroup72-LogVol72btrfs
332G 36G 295G 11% /mnt/btr72
/dev/mapper/VolGroup71-snaphome
419G 66G 333G 17% /mnt/snaphome

cd /mnt/storage72
[root@Ruthie-07 storage72]# ls
BKUPS Documents DVDISOS lost+found VDIBKUPS
[root@Ruthie-07 storage72]# cd BKUPS
[root@Ruthie-07 BKUPS]# ls
fc16-stage newthisyear oldrpms tararchives
[root@Ruthie-07 BKUPS]# cd tararchives
[root@Ruthie-07 tararchives]# ls
bkupboot11192011.bz2 bkupboot11302011.tar bkuphome11242011.tar bkuproot11192011.tar bkuproot11302011.tar
bkupboot11242011.tar bkuphome11192011.tar bkuphome11302011.tar bkuproot11242011.tar
[root@Ruthie-07 tararchives]# ls
bkupboot11192011.bz2 bkupboot11302011.tar bkuphome11242011.tar bkuproot11192011.tar bkuproot11302011.tar
bkupboot11242011.tar bkuphome11192011.tar bkuphome11302011.tar bkuproot11242011.tar
[root@Ruthie-07 tararchives]# tar -c -f bkuphome12012011.tar /mnt/snaphome --preserve-permissions --preserve-order --xattrs --totals -b32 --exclude=*.iso --exclude=*.vdi --exclude=*.rpm
tar: Removing leading `/' from member names

# results of ext4 LV snapshot of /home from ata500G disk to sata1TB ext4 LV
total bytes written: 191709184 (183MiB, 9.0MiB/s)

[root@Ruthie-07 tararchives]# cd /mt/btr72
-bash: cd: /mt/btr72: No such file or directory
[root@Ruthie-07 tararchives]# cd /mnt/btr72
[root@Ruthie-07 btr72]# ls
BKUPS Documents Downloads
[root@Ruthie-07 btr72]# cd BKUPS
[root@Ruthie-07 BKUPS]# ls
tararchives
[root@Ruthie-07 BKUPS]# cd tararchives
[root@Ruthie-07 tararchives]# tar -c -f bkuphome12012011.tar /mnt/snaphome --preserve-permissions --preserve-order --xattrs --totals -b32 --exclude=*.iso --exclude=*.vdi --exclude=*.rpm
tar: Removing leading `/' from member names

# results of ext4 LV snapshot of /home from ata500G disk to sata1TB btrfs LV
# the speed is probably due to small size and two code paths.
Total bytes written: 191709184 (183MiB, 105MiB/s)

cd btr72/BKUPS/tararchives
[root@Ruthie-07 tararchives]# ls
bkupboot11192011.bz2 bkupboot11302011.tar bkuphome11242011.tar bkuphome12012011.tar bkuproot11242011.tar
bkupboot11242011.tar bkuphome11192011.tar bkuphome11302011.tar bkuproot11192011.tar bkuproot11302011.tar
[root@Ruthie-07 tararchives]# mount /dev/VolGroup71/snaproot /mnt/snaproot
[root@Ruthie-07 tararchives]# tar -c -f bkuproot11302011.tar /mnt/snaproot --preserve-permissions --preserve-order --xattrs --totals -b32
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets

tar: /mnt/snaproot/usr/share/mysql-test/var/tmp/mysqld.1.sock: socket ignored
tar: /mnt/snaproot/var/run/abrt/abrt.socket: socket ignored
tar: /mnt/snaproot/var/run/dbus/system_bus_socket: socket ignored
tar: /mnt/snaproot/var/run/portreserve/socket: socket ignored

# results of ext4 LV snapshot of /root from ata500G disk to sata1TB btrfs LV
Total bytes written: 6087999488 (5.7GiB, 6.0MiB/s)

[root@Ruthie-07 tararchives]# cd /mnt/storage72/BKUPS
[root@Ruthie-07 BKUPS]# ls
fc16-stage newthisyear oldrpms tararchives
[root@Ruthie-07 BKUPS]# cd tararchives
[root@Ruthie-07 tararchives]# ls
bkupboot11192011.bz2 bkupboot11302011.tar bkuphome11242011.tar bkuphome12012011.tar bkuproot11242011.tar
bkupboot11242011.tar bkuphome11192011.tar bkuphome11302011.tar bkuproot11192011.tar bkuproot11302011.tar
[root@Ruthie-07 tararchives]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
LogVol71Storage VolGroup71 owi-ao 425.00g
LogVol71swap VolGroup71 -wi-ao 4.00g
LogVol72root VolGroup71 owi-ao 32.00g
snaphome VolGroup71 swi-ao 1.22g LogVol71Storage 0.26
snaproot VolGroup71 swi-ao 1.22g LogVol72root 9.41
LogVol72Storage VolGroup72 -wi-ao 600.00g
LogVol72btrfs VolGroup72 -wi-ao 331.50g
[root@Ruthie-07 tararchives]# ls /mnt/
boot btr72 home root snapboot snaphome snaproot storage72

tar -c -f bkupboot12012011.tar /boot --preserve-permissions --preserve-order --xattrs --totals -b32
tar: Removing leading `/' from member names

# results of ext4 LV snapshot of /boot from ata500G disk to sata1TB ext4 LV
Total bytes written: 42057728 (41MiB, 45MiB/s)

[root@Ruthie-07 tararchives]# tar -c -f bkuproot12012011.tar /mnt/snaproot --preserve-permissions --preserve-order --xattrs --totals -b32
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets

tar: /mnt/snaproot/var/run/dbus/system_bus_socket: socket ignored
tar: /mnt/snaproot/var/run/portreserve/socket: socket ignored

# results of ext4 LV snapshot of /root from ata500G disk to sata1TB ext4 LV
Total bytes written: 6088015872 (5.7GiB, 6.1MiB/s)

[root@Ruthie-07 tararchives]# tar -c -f bkuphome12012011.tar /mnt/snaphome --preserve-permissions --preserve-order --xattrs --totals -b32 --exclude=*.iso --exclude=*.vdi --exclude=*.rpm
tar: Removing leading `/' from member names

# results of ext4 LV snapshot of /home from ata500G disk to sata1TB ext4 LV
Total bytes written: 191725568 (183MiB, 7.4MiB/s)

# In the opposite direction
# copying 4GB of the tararchives dir from the btrfs LV on sata1TB to the ext4 LV on ata500G
# using the file manager indicated 54.xMB/s
# for ext4 to ext4 the rate was 57.xMB/s
#
# summary of tests.
#
# Small datasets of less than a few hundred MB are affected by the dual code path of two filesystems, and by the CPU power and memory available.
# Otherwise ext4 reads and writes faster than btrfs on LVM.
#
# Same-disk test.
# Copying ext4 on the sata1TB disk to ext4 on the sata1TB was 34MB/s
# For btrfs it was 26 MB/s
#
# btrfs-progs-0.19-16.fc15.i686
# e2fsprogs-1.42-1.fc17.i686
# lvm2-2.02.88-1.fc17.i686
# fedora-release-15-3.noarch
#

SJ

RahulSundaram
4th December 2011, 08:04 PM
Btrfs on LVM is not that useful for benchmarks. Try it without LVM in Rawhide

SlowJet
4th December 2011, 11:20 PM
Btrfs on LVM is not that useful for benchmarks. Try it without LVM in Rawhide

I disagree.
As long as it is apples to apples, the math does not care if the apples are in a crate or not.

kernel-3.x.x does not run on my perfectly good old PentIV. (Or 2.6.40+.)

btrfs as a volume manager is limited compared to LVM2, so LVM2 may still be in the picture for a while.

SJ

chrismurphy
4th December 2011, 11:43 PM
Btrfs doesn't really have volume management per se. It does have a way to do the things I think are relatively common in LVM2, like adding a larger disk, moving data from the smaller to the larger, then removing the smaller - all while maintaining a live system. This is done with RAID in btrfs, which is not sector-identical but rather based on chunks. So you can add a drive to a volume, remove the small drive from the volume, then rebalance.

Something like that anyway.
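A hedged sketch of that live drive swap (names hypothetical; balance syntax per the btrfs-progs of this era):

btrfs device add /dev/sdc /mnt      # bring the bigger drive into the filesystem
btrfs device delete /dev/sdb /mnt   # removing the small drive migrates its chunks off it
btrfs filesystem balance /mnt       # redistribute chunks across the remaining devices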

And it's not exactly true that just because LVM is in place in both tests, this is apples to apples, because there has been quite a bit of effort put into both LVM and extX for their mutual performance benefit - unlike with btrfs, most likely. A better apples-to-apples is no LVM. Otherwise all tests conflate how well the file systems behave with LVM underneath them.

tox
4th December 2011, 11:45 PM
I disagree.
As long as it is apples to apples, the math does not care if the apples are in a crate or not.

kernel-3.x.x does not run on my perfectly good old PentIV. (Or 2.6.40+.)

btrfs as a volume manager is limited compared to LVM2, so LVM2 may still be in the picture for a while.

SJ

time for a new PC dude. but i can't see them working on LVM2 anymore now that their main focus is on BTRFS

chrismurphy
4th December 2011, 11:49 PM
Btrfs doesn't really have volume management per se.

This may be incorrect. It doesn't have the same volume management style as LVM(2), but it does clearly have some volume management capabilities as well as chunk- rather than sector-based RAID. That's kinda important, because it should prevent (or at least allow mitigation of) silent data corruption and its replication - unlike sector-based RAID, with which you're screwed if corruption creeps in.

tox
12th January 2012, 05:39 AM
since i can't be bothered posting this to the mailing lists i'm gonna ask it here, maybe someone might have an update on it

but will BTRFS make it this time as default for F17?

Haber_Nir
12th January 2012, 12:28 PM
in the fedora 17 features list it is at 0%. so i want to know too :):)

sea
12th January 2012, 12:33 PM
Don't think so, though i haven't had any issues the time i used 1 btrfs partition.
I've read that some still count it unstable, sadly; i don't recall the source or why it's declared as such.

EDIT:
But doesn't matter, i'll skip F17 anyway, just because i don't want no BSE (beefy miracle) on my devices.

tox
23rd January 2012, 10:05 PM
how correct is this story? http://www.phoronix.com/scan.php?page=news_item&px=MTA0Njk

nonamedotc
23rd January 2012, 10:11 PM
I was under the impression that Btrfs was to be included in F16. It is getting a little late I suppose ...

tox
23rd January 2012, 10:58 PM
hopefully the story is right and F17 will have BTRFS as default. i might post a message on the devel list and ask. perhaps Josef Bacik will answer it

---------- Post added at 09:58 AM ---------- Previous post was at 09:21 AM ----------

ok i just posted the question on the devel list, let's see what answers i get. i shall post them here, or follow it here: http://lists.fedoraproject.org/pipermail/devel/2012-January/thread.html

Dutchy
23rd January 2012, 11:09 PM
I've read on some mailing list that it was already almost ready in November or so, but the dev went on vacation and couldn't deliver it before that, so Phoronix could be right on it.

tox
24th January 2012, 01:58 AM
hard to say atm but looks like it won't be default, until i see a more conclusive answer from someone like Josef, since he'll know more about what's going on

another update. these bugs need to be resolved first https://bugzilla.redhat.com/show_bug.cgi?id=689509

---------- Post added at 12:58 PM ---------- Previous post was at 10:09 AM ----------

a post from Rahul on Phoronix forum about it http://phoronix.com/forums/showthread.php?68379-Error-Fixing-Btrfs-FSCK-Tool-Is-Imminent/page2

hadrons123
24th January 2012, 02:42 AM
btrfs is not stable upstream yet!

tox
24th January 2012, 02:44 AM
im gonna have a bet. i reckon BTRFS will become default in F19

tox
6th February 2012, 09:36 PM
looks like they're gonna try and make BTRFS default in F17 going by what i read on the devel list. a filesystem check should have been released today or tomorrow going by what Josef Bacik has said. Read from here: http://lists.fedoraproject.org/pipermail/devel/2012-February/162188.html

sea
6th February 2012, 09:59 PM
Cool.
Just to share this experience: for some odd reason F16 told me i'd need to be running btrfs in order to make my future root partition btrfs.
Today, it formatted btrfs without a problem.

With rawhide's anaconda (pre-alpha), just selecting btrfs caused a fatal crash.
Hope that's been fixed in the beta.

tox
7th February 2012, 09:07 PM
ok i hear now BTRFS will NOT be default in F17. it's now moved to F18

DBelton
7th February 2012, 09:23 PM
I guess the filesystem check didn't make it in by the feature freeze deadline.

I want to try out btrfs, but I would not have been using it for the default filesystem in F17 even if they had made the deadline and it was the default, except for possibly on a test box.

I wonder if the filesystem check will make it into F17 even though it's missed the feature freeze?

tox
7th February 2012, 09:29 PM
the FSCK should be in either today or tomorrow going by what Josef Bacik has said, but i think it has other issues too at this point

18:17:37 [mjg59] #topic #704 F17 Feature: BTRFS default file system https://fedoraproject.org/wiki/Features/F17BtrfsDefaultFs
18:17:41 [mjg59] .fesco 704
18:17:42 [twoerner] t8m: NetworkManager stores the zones for connections
18:17:42 [zodbot] mjg59: #704 (F17 Feature: BTRFS default file system https://fedoraproject.org/wiki/Features/F17BtrfsDefaultFs) – FESCo - https://fedorahosted.org/fesco/ticket/704
18:17:52 [mjg59] josef: You here?
18:17:54 [josef] yup
18:17:57 [twoerner] thanks, guys
18:18:07 [mjg59] josef: So really definitely actually btrfsck?
18:18:16 [josef] yup he's doing all the cherry picking now
18:18:21 [josef] says it will be ready tomorrow morning
18:18:21 [mitr] Is anaconda able to handle this right now?
18:18:30 [josef] anaconda has all the kickstart stuff and it works for hte auto partitioning
18:18:47 [josef] but heres the gotcha, you wont get any fancy stuff if you do a custom partiotn setup
18:18:53 * nirik would prefer to land in f18.
18:19:11 [josef] i dont think thats a huge deal since i wasnt planning on anaconda being able to do anything fancy at first anyway
18:19:17 [limburgher] fancy such as. . . ?
18:19:25 [josef] subvols, raid, compression etc
18:19:36 [notting] how often are we discovering any new data-eaters?
18:19:40 [pjones] yeah, we're looking at very basic support right now
18:19:50 [josef] notting: no big dataeaters since the last one
18:19:53 [drago01] what about the "omg it kills vms" bug?
18:19:53 [mjg59] josef: And we're now stable and don't have any known critical crash-my-system-and-eat-my-data issues?
18:20:02 [notting] josef: the last one?
18:20:07 [limburgher] josef: :)
18:20:15 [limburgher] josef: That's always true. :)
18:20:19 [josef] so we had a problem with our barrier code so if you crashed the box at the right second you'd lose stuff
18:20:30 [josef] but we built this nice cool tool to verify stuff
18:20:31 [pjones] limburgher: always except once, anyway ;)
18:20:37 [josef] so we're pretty confident its all ok now
18:20:48 [limburgher] pjones: Fair. :)
18:20:54 [josef] if worse comes to worse i have this really cool restore tool that will pick up the peices and get all of your data back :)
18:21:13 [limburgher] josef: So you keep the baby pictures on it currently then? :)
18:21:15 [josef] but we spent a lot of time verifying that we were doing everything right with our barriers to make sure we dont have this problem
18:21:20 [josef] i do
18:21:34 [josef] all my boxes (exception for my devel box of course) are btrfs
18:21:44 [josef] same goes for chris
18:21:49 [josef] been that way for years
18:22:04 [mjg59] josef: Ok, so you're good with people coming to your house and setting you on fire if this all goes awfully wrong?
18:22:13 [sgallagh] Just as a notable data point: I've been running with a btrfs filesystem since F16 beta. I can vouch for its stability in an assortment of workloads.
18:22:14 [josef] yup
18:22:20 [mjg59] Ok, good enough for me
18:22:28 [pjones] sgallagh: not really a useful datapoint, no.
18:22:34 * nirik had a corrupted btrfs install, but josef worked hard to get my data back, and I did.
18:23:01 [josef] worse comes to worse its pretty easy to make anaconda go back to what we had (afaik)
18:23:13 [notting] sure, but the josef-get-your-data-back service doesn't scale
18:23:17 [nirik] I'd still prefer to land in f18 (land right after branch) and give time to have the cool features and test out things much better.
18:23:31 [mmaslano] nirik: me too
18:23:41 [mjg59] pjones: Switching the default back is achievable?
18:23:44 [limburgher] nirik: nods
18:24:06 [josef] yes waiting till f18 gives us a nicer looking release with more complete anaconda support
18:24:12 [pjones] mjg59: no reason it wouldn't be.
18:24:19 [mitr] mjg59: for new installs anyway
18:24:21 [mjg59] josef: How valuable would the testing in F17 be?
18:24:22 [sgallagh] josef: Sounds like you're arguing to defer
18:24:25 [nirik] josef: what about live media?
18:24:45 [pjones] sgallagh: no, not defer - split between minor and major feature sets.
18:24:55 [josef] mjg59: i think very valuable, we're getting to the point where it's mostly stable for us, we need users to break it in new and interesting ways ;)
18:25:04 [drago01] are we planning to do btrfs convert on upgrades?
18:25:06 [josef] nirik: i havent done anything for live media
18:25:07 [notting] that sounds like an excellent opt-in feature :)
18:25:08 [sgallagh] pjones: Ah, ship all the features but not flip the "default" switch?
18:25:10 [drago01] or new installs only?
18:25:20 [josef] new installs only
18:25:27 [drago01] ok
18:25:36 [josef] afaik live media will probably have to stay on ext whatever
18:25:39 [josef] for now
18:25:41 [pjones] sgallagh: btrfs tools support more complex options that anaconda simply won't be using at this time.
18:25:47 [sgallagh] ok
18:26:05 [notting] "need users to break it in new and interesting ways" screams like a bad idea for people's data. now, if we had some large scale *system* installations where we could swap 25% of them for btrfs and test the results, sure
18:26:11 [sgallagh] Sorry, my knowledge level on btrfs is limited to "it's the Next Big Thing we should all be testing".
18:26:35 [mjg59] Do we still have time to make the change pre-Alpha?
18:26:46 [sgallagh] mjg59: Oh, HOURS at least :)
18:26:53 [notting] freeze is tomorrow, so, sure!
18:26:57 [mjg59] Ok, so
18:26:59 [sgallagh] notting: Not quite
18:27:25 [pjones] that's a question for dlehman I guess.
18:27:26 [sgallagh] notting: According to dgilmore: "plan on landing today. tomorrow is too late"
18:27:41 [mjg59] Proposal: Switch to btrfs by default for alpha. Revert if it's overly bumpy.
18:27:46 [pjones] mjg59: +1
18:27:53 * notting points dgilmore at rbergeron to get everyone on the same page w.r.t. communications
18:28:20 [mmaslano] -1 fsck is here for only a while. I'd rather see it in F-18
18:28:22 [mitr] Was the VM question answered yet?
18:28:27 [josef] so what do i do if i dont get the package built before the branch, just pull the f17 branch and update it?
18:28:31 [sgallagh] Yeah, the source of this was me asking rbergeron for clarification and getting dgilmore's answer of "tonight"
18:29:03 [mjg59] I'm +1
18:29:15 [mjg59] Any other votes?
18:29:21 [limburgher] I'm on the fence.
18:29:21 [sgallagh] I'm also -1 for F17. I think we're a bit too close to the line here.
18:29:22 [notting] i'm -1
18:29:23 * nirik ponders
18:30:06 [mitr] josef: KVM performance?
18:30:07 [notting] it would be interesting if we could enforce separation where it would be the default for system partitions but not data partitions
18:30:17 [nirik] yeah, -1 I guess. I'd like to see us use it all around with nice features and also have some time to see that fsck is working well in the wild.
18:30:24 [t8m] I'm +0 I think btrfs might be ready for general consumption now however without the full support in anaconda, does it make really sense to switch?
18:30:26 [josef] mitr: been something i'm working on, we're getting better but not quite to ext* speed yet
18:30:29 [mjg59] +2/-4 at present, then
18:30:53 [pjones] t8m: honestly from the anaconda POV, we'd rather have a release with the basic support so we can isolate bugs from it vs our big UI redesign.
18:30:53 [mjg59] And with a +0 it's not going to reach +5
18:30:55 [mitr] I'm +1 to the idea, but I think this really needs to land (including the anaconda default flip) for Alpha - is that manageable?
18:31:10 [pjones] the anaconda default flip is not a big change.
18:31:29 [t8m] pjones, ok that's good idea
18:31:34 [sgallagh] mitr: I don't think we can allow this to land post-alpha. Alpha is supposed to be feature-complete and testable
18:31:46 [t8m] ok changing my vote to +1 if it lands pre-alpha
18:31:48 [sgallagh] If it lands post-alpha, it won't be installable until beta, which isn't enough time for testing
18:31:59 [mjg59] +3/-4
18:32:14 [mitr] sgallagh: right
18:32:20 [mjg59] Just waiting for nirik and limburgher I think?
18:32:25 [limburgher] I think so.
18:32:42 [mitr] mjg59: I count +4
18:32:43 [nirik] I was -1
18:32:52 [limburgher] Reluctant +1 if pre-alpha.
18:33:20 [mjg59] Oh, yeah
18:33:21 [mjg59] Ok
18:33:26 [mjg59] So that makes +5/-4
18:33:38 [drago01] so we basically would end up with two different default file systems depending on install method / media?
18:33:43 [drago01] doesn't sound right to me
18:33:44 [mjg59] Which I guess means we're going to give this a go?
18:33:50 [pjones] I _think_ the anaconda change is roughly http://fpaste.org/p9zw/
18:33:53 [nirik] drago01: yeah, it would.
18:33:59 [sgallagh] That vote is still a little skewed
18:34:11 [sgallagh] Some were unqualified +1, others only pre-Alpha
18:34:21 [mjg59] sgallagh: I think the assumption is pre-Alpha, yes
18:34:23 [sgallagh] Do we assume that makes the whole vote "yes, if pre-Alpha"?
18:34:24 [pjones] I think pre-alpha is the working assumption there
18:34:25 [josef] and pre-alpha means today right?
18:34:29 [sgallagh] (Just wanted to clarify)
18:34:29 [pjones] josef: right
18:34:31 [nirik] yeah, asap
18:34:32 [sgallagh] josef: Yes
18:34:34 [mjg59] Ok
18:34:42 [josef] ok so if i miss that i'm screwed?
18:34:50 [pjones] only for six months!
18:34:52 [josef] haha
18:34:52 [mjg59] #agreed We'll try btrfs by default as long as it lands in alpha - if not, push to F18

---------- Post added at 08:29 AM ---------- Previous post was at 08:25 AM ----------

http://lists.fedoraproject.org/pipermail/devel/2012-February/162303.html

---------- Post added at 08:29 AM ---------- Previous post was at 08:29 AM ----------

i'll buy ya a new mouse after scrolling, leigh :dance:

jamielinux
7th February 2012, 11:05 PM
Thanks for posting the excerpt, some of the comments really made me laugh.


18:22:04 [mjg59] josef: Ok, so you're good with people coming to your house and setting you on fire if this all goes awfully wrong?


18:22:34 * nirik had a corrupted btrfs install, but josef worked hard to get my data back, and I did.
18:23:13 [notting] sure, but the josef-get-your-data-back service doesn't scale


Some of the Btrfs features look awesome, and Avi Miller's video from LCA2012 really whetted my appetite.

tox
9th February 2012, 12:08 AM
this thread could be closed or merged with this http://forums.fedoraforum.org/showthread.php?t=275100

thanks to the Mod that Merged it

Haber_Nir
9th February 2012, 05:27 PM
ok i hear now BTRFS will NOT be default in F17. it's now moved to F18

is it official?

chrismurphy
9th February 2012, 06:39 PM
If I understand correctly, branch has already occurred (Rawhide is now F18), and new btrfs-progs and btrfsck did not make it before branch. But the alpha blocker meeting is tomorrow. So maybe it'll sneak in, I'm not sure.

tox
9th February 2012, 09:38 PM
it's official that BTRFS will go to F18 as the FSCK didn't land in time

---------- Post added at 08:38 AM ---------- Previous post was at 08:37 AM ----------

http://lists.fedoraproject.org/pipermail/devel/2012-February/162303.html

DBelton
9th February 2012, 10:29 PM
even if the fsck didn't make it in before the freeze, they really should allow it once it's ready, since it really doesn't affect any other packages.

I'm not going to install BTRFS until there is a filesystem check for it. So by allowing the fsck even after the freeze, it would give more testing time for BTRFS before F18.

tox
9th February 2012, 10:39 PM
even if the fsck didn't make it in before the freeze, they really should allow it once it's ready, since it really doesn't affect any other packages.

I'm not going to install BTRFS until there is a filesystem check for it. So by allowing the fsck even after the freeze, it would give more testing time for BTRFS before F18.

my guess is they will still allow it in to be tested by users, they just won't make BTRFS the default in this release.

chrismurphy
10th February 2012, 03:32 AM
my guess is they will still allow it in to be tested by users, they just won't make BTRFS the default in this release.

This is my understanding as well. While I understand the fsck requirement for default and production use, I think in some sense it's overly conservative. There's a lot in common between btrfs and ZFS conceptually, and ZFS still does not have an fsck. Whether that's a good idea or not has been debated, but it's been considered production quality for some time now without one.

tox
10th February 2012, 03:45 AM
bit dumb not to have a fsck. imagine if MS windows came without one

jpollard
10th February 2012, 03:52 AM
At least part of that is how much memory and time it would take to actually scan a filesystem.

Scanning a filesystem requires every file and directory to be read and matched up against the inode table. Originally, this was done in memory (fast for small filesystems), now an external file (really slow for large ones). The amount of disk space required will also be significant.

Once the directory/inode is validated, the storage bitmaps have to be validated... even more time...

I don't think there IS a good way to do it (I remember it taking 10 hours to check a 5 TB filesystem). And the more files, the longer it will take.

So the effort is going to be considerable. I imagine they are trying to come up with ways to perform the scan in an incremental manner - having the entire filesystem locked, and only unlocking validated parts as they are finished. This would at least allow it to be done, and allow the unlocked parts to be used while the check continues for several days.

chrismurphy
10th February 2012, 04:04 AM
bit dumb not to have a fsck. imagine if MS windows came without one

Of course that would make zero sense, because Windows doesn't have a COW file system, which by design assures it's always consistent. Comparing btrfs to NTFS is a bit dumb; they aren't at all the same thing.

dyfet
23rd February 2012, 09:14 AM
I think a working fsck is essential. I recall recently I was updating with yum, and did not know my laptop battery was dead and the power cord was loose. In the middle it just died... with ext4, no problem, just an fsck, and then some work getting packaging and the system back into a sane state. If I had used btrfs instead, would this have been recoverable at all without a working fsck? If not, I would consider it unacceptable for real-world use. Things happen.

tox
23rd February 2012, 10:23 AM
I think a working fsck is essential. I recall recently I was updating with yum, and did not know my laptop battery was dead and the power cord was loose. In the middle it just died... with ext4, no problem, just an fsck, and then some work getting packaging and the system back into a sane state. If I had used btrfs instead, would this have been recoverable at all without a working fsck? If not, I would consider it unacceptable for real-world use. Things happen.

BTRFS probably won't be used as default till F18 due to the FSCK - without that it's useless.

boydrice
23rd February 2012, 12:54 PM
This is my understanding as well. While I understand the fsck requirement for default and production use, I think in some sense it's overly conservative. There's a lot in common between btrfs and ZFS conceptually, and ZFS still does not have an fsck. Whether that's a good idea or not has been debated, but it's been considered production quality for some time now without one.

here is an interesting argument on why ZFS doesn't need an fsck.

http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html

http://docs.oracle.com/cd/E19253-01/819-5461/gbbwa/index.html
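The short version of the ZFS position: instead of an offline fsck you scrub online, and redundancy repairs whatever the checksums catch. A sketch (pool name hypothetical):

zpool scrub tank       # walk all data, verify checksums, repair from redundancy
zpool status -v tank   # report progress and any files with unrecoverable errors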

chrismurphy
23rd February 2012, 01:26 PM
ext4's recovery from power failure has to do with the journal, not fsck. And it is not at all impervious to corruption in the exact situation you described, even if corruption didn't happen in your case. Disk drives regularly write blocks out of order from the intention of the file system, meaning a commit block can be written before journal metadata. This is why there are write barriers, but apparently there are a significant number of drives that say they've written data from cache to disk but in fact have not - they lie.
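One blunt mitigation for such lying drives, at a write-performance cost, is to turn the on-drive write cache off entirely (device hypothetical):

hdparm -W /dev/sda     # query the drive's write-cache setting
hdparm -W 0 /dev/sda   # disable the on-drive write cache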

So just because in your particular example you did not experience any corruption, doesn't mean it can't occur, isn't likely to occur, or that btrfs would have had unrecoverable corruption without an fsck. It's pure speculation.

And it's ridiculous to say btrfs is useless without an fsck. It's been used without one for going on two years now. That it's not useful for YOU without an fsck is a different argument than categorically considering it useless because of the lack of one. ZFS, once again, lacks an fsck and has been around for 6+ years.

jpollard
23rd February 2012, 02:18 PM
Oh.. zfs has a fsck. It just doesn't call it fsck.

1. what does fsck do? It puts the filesystem into a consistent state from an inconsistent state.
2. what does the journal do? It records changes made to the metadata to protect the state of the filesystem.

In the case of zfs, the rollback to a previous valid state is equivalent to running an fsck to repair a filesystem. It can also be compared to applying updates from the journal. I don't see the significant difference.

chrismurphy
23rd February 2012, 02:26 PM
OK using that logic, btrfs has an fsck which is -o recovery.

Neither ZFS nor btrfs have a journal. Both are COW file systems. If you don't see the difference, you don't understand the file system basics.
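For the record, invoking that recovery path is just (device and mount point hypothetical; the option appeared around kernel 3.2):

mount -o recovery /dev/sdb1 /mnt   # fall back to a usable tree root if the latest is damaged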

jpollard
23rd February 2012, 05:55 PM
OK using that logic, btrfs has an fsck which is -o recovery.

Neither ZFS nor btrfs have a journal. Both are COW file systems. If you don't see the difference, you don't understand the file system basics.

Copy on write is equivalent to a journal, just more flexible, without the limit of the allocated size of the journal.

The update is written to a copy (equivalent to the journal), then the metadata is committed (equivalent to flushing the journal).

So yes, I do understand.

chrismurphy
23rd February 2012, 06:36 PM
It is not equivalent to a journal, which on ext3/4 is a metadata-only journal. ext3/4 actually modifies journal metadata first, then overwrites data, then commits that change. It's not at all the same with ZFS or btrfs, which don't overwrite data.

jpollard
23rd February 2012, 10:39 PM
It accomplishes the same activity, just with a slightly different mechanism.

Both overwrite.

If they didn't, then you would never see the update, and that would cause the corruption. The difference is HOW they handle the overwrite. The advantage zfs/btrfs have is that there are fewer critical overwrites.

chrismurphy
24th February 2012, 02:54 AM
Right, and SSD and HDD are the same, just slightly different mechanisms.

The claim that both overwrite data demonstrates a fundamental misunderstanding of how these COW-based file systems work, how they maintain consistency, why an fsck would only rarely be needed rather than being a routinely needed tool, how snapshots work rather differently than in LVM2, etc.

jpollard
24th February 2012, 01:25 PM
And in most ways they are the same, though making an ssd look like a disk is clumsy and slow.

It just shows a lack of awareness of the history and evolution of these algorithms. COW is a variation of RCU as applied to disks. Journaling is also such an application.

Been there done that.

chrismurphy
24th February 2012, 04:52 PM
Haha. OK, so in effect there is no functional or meaningful difference between any file systems at all; they're all basically the same thing. And the only meaningful difference between SSD and HDD is speed. And ZFS has an fsck even though it doesn't. And btrfs mount -o recovery doesn't qualify as a ZFS-like fsck even though it functionally does what you describe ZFS doing as a functional fsck.

I get it. You're a comedian.

jpollard
24th February 2012, 09:50 PM
Didn't say anything even close to that.

Both btrfs and zfs are designed to scale to exabytes in size. To support anything near that requires multiple disks in multiple locations, not necessarily directly attached to the same host.

Raid operation doesn't even really support current 3TB disks - recovery of a failed disk is measured not in hours, but days. And that opens up potential catastrophic failure of the filesystem.

Running the traditional fsck offline is also out of the question - I have already seen fsck passes take 10-15 hours for large filesystems, and use LOTS of memory for workspace. The memory workspace got so bad that fsck now uses a different filesystem for that instead of main memory.

Such large filesystems NEED to have the capabilities of the traditional fsck built into the filesystem. It also needs them to work on only portions of the filesystem so as to not make the entire thing unusable for days. This has not yet been fully accomplished. A "mount -o recovery" doesn't qualify as it doesn't support partial recovery. It just loses data by rolling back to the last valid checkpoint... And that could lose days of work (depending on application).

That alone requires a totally different approach to the definition of "consistent filesystem structure", one which allows for a "partially consistent filesystem structure". And that needs a different approach to the implementation of that storage, the repair of that storage, and the management of that storage.

Journaling works reasonably well up to about 100GB. It has known problems under high performance due to the bottleneck of the journal size. Workarounds to date have put the journal on a totally different disk than the one holding the data. I have even seen them put on SSD. Still, the limit exists. The evolution of RCU and other database capabilities (which, by the way, is where journaling itself came from) has given ext3/4 the ability to work with 1/2 TB filesystems reasonably well (just don't do a high rate of updates or the journal will show itself as the primary limit).

And these problems only grow as both disks and filesystems get larger (I was working with a 300TB filesystem 15 years ago - deletes could take forever; cleanup of a directory with 20,000 files could take days).

The directory structures used in btrfs, at least, were known way back then, precisely to improve file search performance (just compare 64-bit name hashes; on a mismatch, move to the next file, and only on a match compare the string names). The size of the hashes may change, but the intent is the same.

And then there is the problem of memory - zfs recommends 2GB of memory for each TB of disk (and it may be worse than that). Even using SSD to try to improve caches will not work for really large filesystems (a system with a 16TB filesystem would need 32GB of memory just for the cache... not good). I've already worked with 16TB filesystems, and though the fsck was slow, it didn't require a mainframe to support. Using simple snapshots improved the fsck pass after a crash, though it could still take a couple of hours after a particularly dirty power failure. After that, application caching took care of recovery (the applications used a DIFFERENT 9TB filesystem to hold files of pending updates to the production filesystem).

Handling 50 million files (the production filesystem) and 35 million (the pending new files) was not the most fun. Worse was the backup time... a complete (portable) backup would take almost a month to do. Vendor-proprietary backups (hence nonportable) would take a bit over a week.

I still haven't seen a decent large filesystem (including zfs) that was really usable - they are all rather dependent on a single host (including Lustre-based ones, though that may be changing), and managing storage that exceeds a single host's capacity (fibre channel/infiniband help) still doesn't solve the problem... yet.

And then there are the ancillary requirements on filesystems - data backup (mentioned briefly for smaller filesystems). Data may be on a real disk, or on huge offline/nearline storage. This is how the 300TB filesystems were handled in 1990. The physical filesystem storage was on disks - 2GB each, 45 of them. This only held the metadata, cache, and files that had not yet been archived to long-term storage - tapes. 20,000 of them in the silos, and another 10,000 (or so, I wasn't responsible for those) external as "long term archives", as they were very infrequently accessed, but when they were... it took a long time to get them (1/2 hour to 48 hours depending on which rack they were in). Silo access was known to reach about 1/2 hour (lots of files being accessed at once) but usually was less than a minute, and at best 10 seconds.

zfs tries to ease this pressure with deduplication (but I wonder exactly how well it really works); I'm not sure how btrfs will deal with this yet.

Part of the problem is that storage is no longer treated as an array of blocks. It must be treated as an array of arrays of arrays of blocks, hence the use of storage pools, and complex structures to identify (via multiple levels of redirection) where the data resides. One of the purposes of an fsck (besides just the repair) is to help debug errors in the filesystem operation. This is still missing for btrfs (though that could change rather quickly, as there IS an fsck there; it just seems not to check everything).

chrismurphy
24th February 2012, 10:13 PM
That's all very interesting and many of the points are well taken, but this is spaghetti thrown at the wall to see what sticks. It's all out of scope for the original assertion that ZFS has an fsck that just isn't called fsck - one that returns the filesystem to a consistent state, which is exactly what btrfs's mount -o recovery does, yet somehow that doesn't qualify as fsck-like. Now if you are saying neither ZFS nor btrfs has a means of arriving at a satisfactory state of consistency, that's another argument, and quite a bit more subjective (not arbitrary).

Let's say I accept the possibly false argument for unacceptable time to arrive at file system consistency, or the possibly false argument that the state of consistency arrived at with ZFS or btrfs is a regression from XFS or ext4. For such large storage requirements, with huge repercussions for the file system becoming inconsistent in the first place, how is it becoming inconsistent? The example so far is power failure – for the class of storage you're talking about, surely some combination of battery backup and clustered file system is employed to prevent the very real secondary consequences you're talking about.

I mean, really? You've got petabytes of data, and if a disk or LUN dies due to power failure, you're looking at 10 days of fsck? I mean...come on. That's just....

I'm even accepting the possibly false argument that a btrfs fsck takes longer than an ext4 or XFS fsck on an array of the same size. That may not be true either, because it should take btrfs very little time to determine at what point the inconsistency begins. It does not have to scrub the disk to do that. No filesystem does that for fsck, and no other remotely common filesystem even has scrubbing as an option.

The conversation was about a laptop, and now you're expanding it to a very different scale. Entirely relevant, absolutely interesting and valid, but you're shifting the goalposts of the conversation. And I think this is ostensibly to avoid walking back the claim that ZFS has an fsck that merely isn't called fsck, and that btrfs mount -o recovery doesn't at all compare; otherwise you'd have to admit that btrfs functionally has an fsck, per your definition of one, which just so happens to also be inadequate because it "could lose days of work (depending on application)" - which is grossly speculative if not total hyperbole.

Just exactly what condition have you encountered, with either ZFS or btrfs, that even approaches these claims? Have you filed bugs, and what were the devs' responses?

jpollard
24th February 2012, 10:45 PM
I don't have these filesystems anymore, and if I did, they still wouldn't be using zfs or btrfs, as neither is fast enough to handle the mainframe/supercomputer environments I worked in.

Lustre I have seen handle lots of data quickly, as has CXFS (not XFS - too slow). Both allow multiple systems to access the hardware directly, though neither allows for distributed metadata handling. CXFS also has offline and nearline storage capability.

At the present time, I've seen nothing about a distributed capability for either zfs or btrfs. And nothing for an HSM capability.

For laptops, ext3/4 provides more than enough reliability and performance for regular use. Neither btrfs NOR zfs is designed for that environment, as both have way more overhead. When laptop internal disks reach 1TB, then ext4 would be more appropriate. But laptops do not have the performance or memory (though they are coming close) to support either btrfs or zfs, even if they can actually run them.

chrismurphy
24th February 2012, 11:14 PM
OMFG, I didn't realize I gave you that much rope to hang yourself! I'm so sorry.

Have you actually used ZFS or btrfs with systems and datasets that you've also used ext4 or XFS on? What problems, exactly, did you encounter? What reports of such problems have you made?

Supercomputing environments are out of scope for the conversation, so you can stop bringing up that straw man.

GlusterFS, which by the way Red Hat owns, can use any underlying file system you want, including ext4, XFS, or btrfs.

The entire last paragraph is crap. Have you been insufflating sriracha today? MeeGo is a mobile computing platform that has defaulted to btrfs for almost two years, and it supports ARM. Too much overhead for a laptop but OK for an ARM-based tablet? Seriously? You have no idea what you're talking about. I've been using btrfs on three laptops and in multiple VMs for two years, and overhead is not one of the issues. Performance compared to ext4 has been, but the gap has closed substantially since the 3.0 and 3.1 kernels. If you knew anything about the design of these file systems, you'd know what the performance compromises are being exchanged for. But performance != overhead.

"even if they can actually run them"

It's just insanity. You've hanged yourself totally, and crapped your pants.

stevea
25th February 2012, 08:21 AM
At the present time, I've seen nothing about a distributed capability for either zfs or btrfs. And nothing for an HSM capability.

Nor will you. Distributed FS design has quite different constraints from a local block device based FS.

For laptops, ext3/4 provides more than enough reliability and performance for regular use. Neither btrfs NOR zfs is designed for that environment, as both have way more overhead.

"Overhead' is a weasel word. Say EXACTLY what you mean wrt extra btrfs overhead. Btrfs doesn't have significantly distinct performance , memory or cpu usage than ext4.

You do know MeeGo uses btrfs in light mobile devices - right? Not just for enterprise. The issue is NOT just reliability & performance, nor the max FS or file size. Btrfs integrates almost every important feature of LVM within the FS - striping, RAID, snapshots - and can add/remove physical volumes to/from a filesystem. The subvolume concept is similar to an LV plus a filesystem. This is exactly why Slowjet's test was silly - it's unrealistic to put btrfs on LVM, and LVM adds a performance deficit of its own. Now I would be the first to argue that most of LVM's features are useless on a single-spindle machine - except for snapshots.

But btrfs includes snapshots, and in addition you can use subvolumes to act like LVs (or like physical partitions), except that subvolumes don't introduce the extra seek time between the 'partitions'. In addition, the compression performance is a huge plus on small disks.

When laptop internal disks reach 1TB, then ext4 would be more appropriate. But laptops do not have the performance or memory (though they are coming close) to support either btrfs or zfs, even if they can actually run them.

Why would anyone choose ext4 over btrfs in any application except as legacy - assuming the niggling performance deficits and tool gaps are worked out? It makes no sense.

jpollard
25th February 2012, 12:28 PM
I simply pointed out what they were designed for. Not what they were used for.

And your mention of GlusterFS shows some of the additional overhead imposed by not addressing distributed use at the foundation.

And a lot of "performance" depends on your need. Where I worked, the requirement is for 100GB/second (not that they get it yet). Applications are usually finite element analysis based - weather, chemical, and various physical simulations. Lots of data needed quickly.

And what is the time spent on, if not overhead? Is it wasted? That depends on the goal - for good long-term storage, not so much. Fast access... not there.

Not a fairly simple interactive environment.

And not insanity - just using what is appropriate for the job.

And prefer ext4 over btrfs... yes. Until btrfs leaves the research stage and runs in production for a couple of years to prove itself, with full recovery and stability demonstrated, a lot of critical systems will not use it.

Is a cell phone critical? Only to the owner. Is btrfs stable enough for single-application use? Probably. Is it certifiable for life-critical use? Not yet.

chrismurphy
25th February 2012, 06:07 PM
http://oss.oracle.com/projects/crfs/

jpollard
25th February 2012, 06:50 PM
http://oss.oracle.com/projects/crfs/

Could be interesting.

Too bad it is Oracle hosted. A number of organizations are losing their trust in them.

chrismurphy
25th February 2012, 10:57 PM
I simply pointed out what they were designed for. Not what they were used for.

You are a hideous debater. You break accepted rules of debate by constantly changing context and scope. You stated that ZFS and btrfs probably wouldn't even work on a laptop because of the memory and CPU requirements. You did not "simply" point out what they were designed for, and you're wrong on that front. I've read thousands of developer entries on ZFS and btrfs, and they were very clearly aiming for general-purpose user hardware as well as enterprise requirements. They were not looking for an HPC file system, as you claim by saying a f'n supercomputer is needed to run btrfs, while you also basically say it's nothing really that new, so why bother.


And your mention of GlusterFS shows some of the additional overhead imposed by not addressing distributed use at the foundation.

Straw man. You made the overhead claim before GlusterFS was even brought up, and now you use GlusterFS as your sole claim for overhead - a file system not even required under the original context. Your debate methods are transparently asinine, to the degree a house plant could identify your willful goalpost-shifting style of debate, which cites ZERO data to back up any of your specious claims.


And not insanity - just using what is appropriate for the job.

How would you know? You haven't even admitted you've used ZFS or btrfs. You haven't stated exactly what real-world experiences you've had that relate to your complaints about either of them. You've speculated and acted as an armchair file system nerd arguer/agitator for no real good reason.

And prefer ext4 over btrfs... yes. Until btrfs leaves the research stage and runs in production for a couple of years to prove itself, with full recovery and stability demonstrated, a lot of critical systems will not use it.

You might not think so, but words have meaning. btrfs is not in the research stage. It's an actual deployed file system for at least hundreds if not thousands of users - that it does not meet your requirements, whatever they are, notwithstanding. You do not get to call it research stage. You get to call it in active development, but it is considered stable on a stable machine. You will see it

Is a cell phone critical? Only to the owner. Is btrfs stable enough for single-application use? Probably. Is it certifiable for life-critical use? Not yet.

WTF does that even mean? Certifiable for life critical issues? Is there really such a certification? You're complaining about the need for an fsck to fix btrfs corruption that occurs when power is lost. That happens on ext3. It happens on JHFS+. It happens on many file systems. One of the reasons for COW is to AVOID having to use an fsck in the first place.

In real-world contexts, you can produce a system with either ZFS or btrfs that is more verifiable and reliable than any other filesystem, because of their design. No one ever said it was a panacea, or better than other options in all cases. You're making the claim that other options are better than ZFS or btrfs in all cases - a claim you cannot possibly prove.
