Strange problem with e2fsck


Palooka
3rd April 2012, 10:00 PM
I have a strange problem with e2fsck. I think the issue is present in F16 too.

There are two (permanently connected) external USB drives which are in my /etc/fstab. They are both ext2.

I can successfully umount either of them, but when I attempt to run e2fsck (as root) on the unmounted partition, I get:

e2fsck: Device or resource busy while trying to open /dev/sdc1 (or sdb1 if it is the other one)
Filesystem mounted or opened exclusively by another program?

It is certainly not mounted, nor mounted elsewhere. So what is causing the problem?

chrismurphy
9th April 2012, 07:17 AM
See what happens if you reboot into recovery mode and then run fsck on those partitions. I've read some reports of CUPS preventing USB sticks from being unmounted immediately, and I wonder whether this can also affect USB HDDs (I don't see why not). Recovery mode, or 'single' as a kernel parameter, should mean CUPS is not running.

george_toolan
9th April 2012, 10:31 AM

Please check df output to see if the filesystem is still mounted.

Maybe try e2fsck -n if you just want to check if there are any errors.

-n Open the filesystem read-only, and assume an answer of `no' to
all questions. Allows e2fsck to be used non-interactively.
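
The "is it still mounted" check can also be scripted rather than read off df. The following is only a sketch: the function name is_mounted is made up for illustration, and it assumes the device is listed in /proc/mounts under the same name you pass in (a device mounted through a UUID symlink or a /dev/mapper name would not match).

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]   (hypothetical helper, not a standard tool)
# Succeeds (exit 0) if DEVICE appears as the source field of any entry.
# MOUNTS_FILE defaults to /proc/mounts; the second parameter exists only so
# the function can be tried against a sample file.
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # Compare the first whitespace-separated field of each line to DEVICE.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Possible usage with the device from the first post (as root):
#   if is_mounted /dev/sdc1; then
#       echo "still mounted"
#   else
#       e2fsck -n /dev/sdc1
#   fi
```

The read-only e2fsck -n run is used here so that nothing is modified if the check does go through.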

Palooka
9th April 2012, 11:05 AM
Thanks, Chris and George.

It is definitely not mounted.

Tried in single user mode, and fsck proceeded normally. But I don't think it can be CUPS; the CUPS service is completely disabled.

One thing I noticed: it happens on fixed disk partitions too, so it isn't specifically a USB issue.

stevea
9th April 2012, 01:30 PM
I just recently saw similar behavior.

I had a btrfs partition I was playing with. I umounted it successfully, yet every attempt to reformat the partition with mkfs caused an 'in use' message. I used 'mount' and 'lsof' to verify nothing was open.

system is 3.3.0-8.fc16.x86_64

I'm fairly certain we are looking at a real bug.
We should try to characterize the problem.
Anyone care to post their fc number and kernel rev ? (uname -r output).

Doing some <mkfs, mount, ls, umount> cycles did not reproduce the problem.
Can anyone say how to reproduce reliably ?
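
For anyone posting their details, something like the following would gather both values in one go. It is a minimal sketch: report_env is a made-up name, and it assumes a Fedora system where /etc/fedora-release exists (it falls back to "unknown" otherwise).

```shell
#!/bin/sh
# report_env: print the distribution release and the running kernel.
# (hypothetical helper name; /etc/fedora-release is assumed to be present
# on Fedora systems)
report_env() {
    if [ -r /etc/fedora-release ]; then
        echo "release: $(cat /etc/fedora-release)"
    else
        echo "release: unknown (no /etc/fedora-release)"
    fi
    echo "kernel: $(uname -r)"
}

report_env
```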

Palooka
9th April 2012, 06:45 PM
3.3.1-3.fc17.x86_64 here.
It happens every time for me, as long as the partition is in fstab. It does not happen if the partition is not.

PascalC
13th April 2012, 09:26 AM
3.3.1-3.fc17.x86_64 here.
It happens every time for me, as long as the partition is in fstab. It does not happen if the partition is not.

Hello, same issue for me, with 3.3.x or 3.4 rc1/2; but this is not related to fstab ->

- open a tty (alt + F2)
- init 2 (killing user graphical session and some level 5 systemd units)

And now fsck runs properly.

The culprit is probably a systemd unit (or gnome3 tool ?)

Pascal

Edit : Fedora 16 with last updates

Edit 2 : dmesg : systemd-fsck[2550]: fsck failed with error code 8.

# /lib/systemd/systemd-fsck -h
fsck.ext4: invalid option -- 'h'
Usage: fsck.ext4 [-panyrcdfvtDFV] [-b superblock] [-B blocksize]
[-I inode_buffer_blocks] [-P process_inode_size]
[-l|-L bad_blocks_file] [-C fd] [-j external_journal]
[-E extended-options] device

Seems that systemd provides a wrapper for fsck (probably handled by fedora-storage-unit ?)

stevea
13th April 2012, 11:02 AM
I've been unable to reproduce.
The only time I've seen it, the fs was mounted from fstab. This may imply it's a result of the systemd mount unit, but ...


- open a tty (alt + F2)
- init 2 (killing user graphical session and some level 5 systemd units)
And now fsck runs properly.

The culprit is probably a systemd unit (or gnome3 tool ?)


Um - well, that's a huge leap. "init 2" changes loads of things besides the systemd state and shutting down gnome.

I suggest that IF you can reproduce it, you check the status of all the systemd units related to that mount point.
Then see if stopping them helps.
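
A rough way to find the mount unit for a given mount point is sketched below. The helper name mount_unit_name is made up, and the mapping is deliberately simplified: it only turns slashes into dashes and does not handle the escaping systemd applies to special characters in paths. The boot.mount name for /boot matches what is used later in this thread.

```shell
#!/bin/sh
# mount_unit_name MOUNTPOINT   (hypothetical helper, simplified mapping)
# Drops the leading slash, replaces the remaining slashes with dashes and
# appends ".mount". The root filesystem "/" maps to "-.mount".
# Paths containing dashes or other special characters are NOT handled.
mount_unit_name() {
    case $1 in
        /)
            printf '%s\n' '-.mount'
            ;;
        *)
            p=${1#/}      # strip leading slash
            p=${p%/}      # strip trailing slash, if any
            printf '%s.mount\n' "$(printf '%s' "$p" | tr '/' '-')"
            ;;
    esac
}

# Possible usage (as root):
#   systemctl list-units --type=mount
#   systemctl status "$(mount_unit_name /boot)"
```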

PascalC
13th April 2012, 12:02 PM
Um - well, that's a huge leap. "init 2" changes loads of things besides the systemd state and shutting down gnome.

My point was: not kernel related, not /etc/fstab related.

Of course, I'll try to determine the culprit precisely. Not many services differ between init 5 and init 2, as level 2 also seems to be a multi-user target, but systemd is really hard to debug...

Edit: to reproduce the fsck error, the filesystem must be described in /etc/fstab; I forgot to mention that in my first post.
What I meant when I said this is not fstab related: you don't need to remove the fstab entry to run fsck properly...

PascalC
18th April 2012, 12:15 PM
I was wrong : multi-user.target or graphical.target behave similarly.
Under a gnome-terminal session (graphical) or a tty session (multi_user), you can umount a filesystem described in /etc/fstab and perform a fsck, but just once; if you remount the filesystem, and then umount it, fsck fails :

[root@euler src]# umount /boot
[root@euler src]# fsck /dev/sda5
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
/dev/sda5: clean, 278/128016 files, 88123/512000 blocks
[root@euler src]# fsck /dev/sda5
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
/dev/sda5: clean, 278/128016 files, 88123/512000 blocks
[root@euler src]# systemctl start boot.mount
[root@euler src]# umount /boot
[root@euler src]# fsck /dev/sda5
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Device or resource busy while trying to open /dev/sda5
Filesystem mounted or opened exclusively by another program?
[root@euler src]# fuser -m /dev/sda5
/dev/sda5: 1 459 591 1020 1044 1045 1051 1057 1064 1068 1074
1082 1083 1084 1092m 1095 1096 1160 1163 1172 1176 1181m 1184 1188
1191 1195 1198 1216 1217 1223 1232 1233 1295 1298 1309 1390 1392
1402 1403 1405 1461 1465 1486 1487 1490 1492 1497 1499 1505 1506
1527 1530 1531m 1553 1554 1555 1561 1584 1587 1590 1594 1597 1610
1619 1627 1629 1647 1655 1659 1674 1698 1700 1706 1710 1713 1719
3352 3383 4133 4249
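
For anyone who wants to try the same sequence on their own machine, it could be wrapped roughly like this. This is a sketch under assumptions: repro_cycle and the RUN variable are made-up names, the mount point must have an /etc/fstab entry so that "mount MOUNTPOINT" works, and paths must not contain spaces. By default it only prints the commands; with RUN=1 it executes them, which needs root and will unmount the filesystem, and plain fsck may prompt if it finds errors.

```shell
#!/bin/sh
# repro_cycle DEVICE MOUNTPOINT   (hypothetical helper)
# Prints, or with RUN=1 executes, the sequence from the session above:
# unmount, check, remount, unmount, check again.
repro_cycle() {
    dev=$1
    mnt=$2
    for cmd in "umount $mnt" "fsck $dev" "mount $mnt" "umount $mnt" "fsck $dev"; do
        if [ "${RUN:-0}" = 1 ]; then
            echo "+ $cmd"
            # Unquoted on purpose so the command string is split into words.
            $cmd || echo "FAILED ($?): $cmd"
        else
            echo "$cmd"
        fi
    done
}

# Dry run:        repro_cycle /dev/sda5 /boot
# Real run:       RUN=1 repro_cycle /dev/sda5 /boot
```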


Similarly, you can remove a logical volume that was not mounted at boot, but once it has been mounted you no longer can, even after unmounting it:
[root@euler ~]# lvremove /dev/vg_euler/test
Do you really want to remove active logical volume test? [y/n]: n
Logical volume test not removed
[root@euler ~]# mount /dev/vg_euler/test /mnt
[root@euler ~]# umount /mnt
[root@euler ~]# lvremove /dev/vg_euler/test
Can't remove open logical volume "test"
[root@euler ~]# lvchange -an /dev/vg_euler/test
LV vg_euler/test in use: not deactivating
[root@euler ~]# fuser -m /dev/vg_euler/test
/dev/dm-1: 1 442 574 978 995 1007 1022 1030
1034 1041 1048 1050 1051 1058m 1062 1063 1064 1141
1142 1143 1146 1153 1155m 1161 1165 1167 1184 1185
1190 1199 1200 1262 1265 1276 1357 1359 1369 1370
1374 1396 1398 1421 1422 1425 1427 1436 1439 1440
1447 1450 1451m 1457 1458 1459 1463 1465 1467 1471
1474 1485 1494 1507 1520 1527 1548 1559 1567 1571
1591 1615 1617 1625 1629 1630 1633 1702 1708 2044 2080
2138


The processes listed by fuser don't show any obvious attachment to the device (partition or LV) in lsof, so I guess the lock that blocks fsck or lvremove comes from the socket-driven systemd mechanism?

However, is this a feature or a bug? After all, we don't need to run fsck twice.
It is more annoying for lvremove, as we have to disable the LV in /etc/fstab and reboot before it can be removed.
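
One way to see whether LVM itself still counts the volume as open, independently of fuser and lsof, is to look at the lv_attr field reported by lvs, whose sixth character is "o" when the device is open. The helper below is only a sketch (lv_is_open is a made-up name); it tests an attribute string and does not call LVM itself.

```shell
#!/bin/sh
# lv_is_open ATTR   (hypothetical helper)
# Succeeds if the sixth character of an lv_attr string is "o" (device open).
lv_is_open() {
    case $1 in
        ?????o*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Possible usage as root, with the volume from the session above:
#   attr=$(lvs --noheadings -o lv_attr vg_euler/test | tr -d ' ')
#   if lv_is_open "$attr"; then echo "LV still open"; fi
```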

Can anybody reproduce the issue ? (I am running F16 with systemd 37.17)

jpollard
18th April 2012, 01:00 PM
Interesting list of processes there... 1 is init, also known as systemd.

And you can't kill it without either a panic or shutdown.

PascalC
18th April 2012, 03:28 PM
Interesting list of processes there... 1 is init, also known as systemd.

And you can't kill it without either a panic or shutdown.

Yes.

Another test that shows that this is definitely a bug:

[root@euler ~]# df -hP /data
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_data-lv_data 300G 219G 66G 77% /data
[root@euler ~]# fuser -m /dev/mapper/vg_data-lv_data
/dev/dm-0: 1 458 585 740 745 999 1020 1021 1028 1036 1043 1047 1053 1061 1062 1063 1071m 1074 1075 1141 1144 1150 1153 1159m 1163 1164 1170 1174 1177 1194 1196 1202 1211 1212 1274 1277 1287 1368 1372 1377 1378 1386 1408 1410 1441 1466 1469 1471 1476 1478 1480 1482 1491 1494 1495m 1504 1505 1506 1511 1516 1530 1539 1550 1572 1575 1579 1580 1588 1628 1636 1643 1648 1677 1682 1686 1688 1698 1705
[root@euler ~]# umount /data
[root@euler ~]# fuser -m /dev/mapper/vg_data-lv_data
/dev/dm-0: 1 458 585 740 745 999 1020 1021 1028 1036 1043 1047 1053 1061 1062 1063 1071m 1074 1075 1141 1144 1150 1153 1159m 1163 1164 1170 1174 1177 1194 1196 1202 1211 1212 1274 1277 1287 1368 1372 1377 1378 1386 1408 1410 1441 1466 1469 1471 1476 1478 1480 1482 1491 1494 1495m 1504 1505 1506 1511 1516 1530 1539 1550 1572 1575 1579 1580 1588 1628 1636 1643 1648 1677 1682 1686 1688 1698 1705
[root@euler ~]# fsck /dev/mapper/vg_data-lv_data
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
/dev/mapper/vg_data-lv_data: clean, 201433/19644416 files, 57396087/78575616 blocks
[root@euler ~]# mount /data
[root@euler ~]# fuser -m /dev/mapper/vg_data-lv_data
/dev/dm-0: 1 458 585 740 745 999 1020 1021 1028 1036 1043 1047 1053 1061 1062 1063 1071m 1074 1075 1141 1144 1150 1153 1159m 1163 1164 1170 1174 1177 1194 1196 1202 1211 1212 1274 1277 1287 1368 1372 1377 1378 1386 1408 1410 1441 1466 1469 1471 1476 1478 1480 1482 1491 1494 1495m 1504 1505 1506 1511 1516 1530 1539 1550 1572 1575 1579 1580 1588 1628 1636 1643 1648 1677 1682 1686 1688 1698 1705
[root@euler ~]# umount /data
[root@euler ~]# fsck /dev/mapper/vg_data-lv_data
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Device or resource busy while trying to open /dev/mapper/vg_data-lv_data
Filesystem mounted or opened exclusively by another program?
[root@euler ~]# fuser -m /dev/mapper/vg_data-lv_data
/dev/dm-0: 1 458 585 740 745 999 1020 1021 1028 1036 1043 1047 1053 1061 1062 1063 1071m 1074 1075 1141 1144 1150 1153 1159m 1163 1164 1170 1174 1177 1194 1196 1202 1211 1212 1274 1277 1287 1368 1372 1377 1378 1386 1408 1410 1441 1466 1469 1471 1476 1478 1480 1482 1491 1494 1495m 1504 1505 1506 1511 1516 1530 1539 1550 1572 1575 1579 1580 1588 1628 1636 1643 1648 1677 1682 1686 1688 1698 1705
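
Comparing such long PID lists by eye is error-prone. A small filter along these lines could normalize each fuser listing so that two stages can be compared with diff. It is a sketch: pid_set is a made-up name, and it assumes input shaped like the listings above (an optional "name:" prefix followed by PIDs, some with a trailing access letter such as "m").

```shell
#!/bin/sh
# pid_set   (hypothetical helper)
# Reads a fuser-style listing on stdin and prints one PID per line,
# numerically sorted and de-duplicated. The "name:" prefix is dropped and
# trailing access-type letters (for example the "m" in "1071m") are stripped.
pid_set() {
    tr -s ' \t' '\n\n' |
        sed -e '/:$/d' -e 's/[a-zA-Z]*$//' -e '/^$/d' |
        sort -n -u
}

# Possible usage (as root), saving one listing per stage:
#   fuser -m /dev/mapper/vg_data-lv_data 2>&1 | pid_set > before.txt
#   umount /data
#   fuser -m /dev/mapper/vg_data-lv_data 2>&1 | pid_set > after.txt
#   diff before.txt after.txt && echo "same set of processes"
```

In the session above the lists appear to be identical at every stage, so the diff would be expected to come back empty there.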


Edit : I replayed the whole sequence in order to have the list of involved processes at each stage...

---------- Post added at 03:01 PM ---------- Previous post was at 02:14 PM ----------

I reported this bug to the Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=813794

---------- Post added at 04:28 PM ---------- Previous post was at 03:01 PM ----------

My bug report was a duplicate of 808795 (still not solved) :
https://bugzilla.redhat.com/show_bug.cgi?id=808795

chrismurphy
18th April 2012, 07:36 PM
I wonder if this leaking is what causes the light on my external drive, formatted ext4 and mounted, to flash 4x per second. And I'm not causing it to be used for anything. Nothing is being written or read. And lsof is unrevealing.

The behavior is not reproducible with the same disk formatted JHFSX, or XFS, or Btrfs - and also mounted. Seems to just be an ext4 thing.

And I further wonder if this business is what's causing XFS and Btrfs to appear faster than ext4 (by quite a bit) in F17. Interrupts of some sort? When I copy 15G worth of files from a faster disk to this external Firewire disk, ext4 exhibits many pauses of up to 3 seconds during which no writing occurs, whereas XFS and Btrfs saturate the drive.

And get this: if I set up a dm-crypt partition and drop ext4 on top of that, ext4 reads and writes are FASTER. It still doesn't beat XFS or Btrfs. But has anyone heard of a dm device making a file system faster?

It's weird.

jpollard
19th April 2012, 03:39 PM
I think it depends on the DM configuration...

If the metadata is on one drive, but the data is on another, then yes, dm will make a filesystem faster. It can also alter how the cache is used.