Fedora 15: NFS mount in fstab hangs boot


zack_dingbat
18th July 2011, 10:51 AM
I have been trying to work out how to set up Fedora 15 to automatically mount an NFS share at boot time. I can mount the share interactively using 'mount -t nfs server:/usr/local /usr/local'.

When I put the entry in /etc/fstab, it stops the machine booting. It offers me a shell ('Enter root password for shell or press Control-D to exit', or something close to that). However, I cannot enter maintenance mode; it hangs. Pressing Control-D does the same: it hangs and doesn't get any further.

I rescued the system by booting off a CD, mounting root, and removing the nfs entry from fstab. After that it booted fine.

The entry I had in the fstab is:

nfsserver:usr/local /usr/local nfs ro,hard,bg,intr,comment=systemd.automount 0 0

I put the 'comment=systemd.automount' entry in because of some related searches I did in forums. Someone suggested that fixed things...it doesn't :(

Anyone got any suggestions about how to make this work in Fedora 15?

Thanks very much

glennzo
18th July 2011, 10:57 AM
The system is trying to mount the share before the network is up. That's why it hangs. It's all about timing. If I remember correctly, this has been covered in a few threads here so a search may turn up some helpful threads.

zack_dingbat
18th July 2011, 12:41 PM
I have searched and not found anything useful. I don't know if I'm using the wrong search terms...I searched for 'nfs mount hang'.

A further thought I had was to try putting _netdev in the fstab line, as I do with iSCSI devices, but that doesn't seem to have helped; the system still hangs on boot.

jpollard
18th July 2011, 01:19 PM
Enter it as yet another systemd bug.

DBelton
18th July 2011, 09:36 PM
I believe you need the noauto option when using the systemd.automount.

Add this to your options: noauto,comment=systemd.automount

Also, if you are using NetworkManager for your connection, make sure you have the enable at boot option on, and that the connection is also enabled for all users.

jpollard
18th July 2011, 09:58 PM
It's still a bug.

Using noauto is fine if someone can log in to mount it.

But it doesn't work if your home directory happens to be on NFS.

DBelton
19th July 2011, 12:57 AM
It wouldn't be a systemd bug if he is using NetworkManager and doesn't have the connection set to be available to all users, since NetworkManager then waits until login to bring up the network.

If your home directory is on NFS, then you either need to use the network service or set the NetworkManager connection to be available to all users.

I just did some testing, and the noauto isn't required anymore. It was for a little while during the alpha, but it appears it's not needed anymore.

(but the noauto along with the systemd.automount would mount the share the first time anyone tried to access it)

jpollard
19th July 2011, 12:04 PM
It is still a bug.

No network means NFS mount attempts should report an error, not hang the Fedora 15 boot.

And that makes it a bug.

BTW, it still won't mount on first access if the network isn't working - login tests the home directory before the user is logged in as part of authorization tests.

And since NetworkManager doesn't connect the network until after the user is logged in... you deadlock, no login.

DBelton
19th July 2011, 03:07 PM
I guess it would be a bug in NFS if it hangs instead of reporting an error.

NetworkManager will connect the network before user login if you have it set to start at boot and set the connection to be available for all users. It then starts the network when the NetworkManager service is started.

lightman47
19th July 2011, 06:18 PM
I encounter this "hang" when the target of the mount is not available (machine off, for instance). The timeout is EXCRUCIATINGLY long, and then there is another after the error message (we're talking MINUTES, several of them). Eventually the boot-up continues, minus the failed mounts. It took me a LONG time to discern this, and I rebooted many times to no avail when I didn't have to! Mine only happens after power outages. Why? 'Cause, dope that I am, the "target" of my mount is set to remain off after a power outage, but my server still comes back on and tries to mount it. Seeing THAT pattern was tricky and elusive.

jpollard
19th July 2011, 06:44 PM
You will only get that hang if the network is up.

lightman47
19th July 2011, 07:17 PM
When target available, no hang. (At least for me). Target missing - go into town for coffee.

jpollard
19th July 2011, 08:11 PM
so?

Your network is up. Therefore you get a delay caused by a TCP timeout (assuming NFSv4/3 over TCP).

It used to be that the error handling was done by the startup scripts as part of the SysV init levels.

Now that systemd has taken over all responsibility for TCP services (including NFS), it also has the responsibility for all the error handling.

Obviously, it doesn't do it.

stevea
19th July 2011, 11:34 PM
jpollard wrote:
> so?
>
> Your network is up. Therefore you get a delay caused by a TCP timeout (assuming NFSv4/3 over TCP).
>
> It used to be that the error handling was done by the startup scripts as part of the SysV init levels.
>
> Now that systemd has taken over all responsibility for TCP services (including NFS), it also has the responsibility for all the error handling.
>
> Obviously, it doesn't do it.

Well, I know you have some personal grudge against systemd, but your supposition is wrong.

I just removed the exports on my server (F14, NFS4) and rebooted an F15 client and it came up without delay.
systemctl status 'mount-point'.mount
correctly describes the error/failure.

My server /etc/exports looks like ....
...
/home 192.168.17.0/24(rw,insecure,sync,no_subtree_check,mp=/home,fsid=0,no_root_squash)
/home/common 192.168.17.0/24(rw,insecure,sync,no_subtree_check,mp=/home,fsid=1,no_root_squash)

The client fstab contains:
...
server:/common /home/common nfs4 _netdev,rw,noatime 0 0

...
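
The unit name passed to systemctl status follows from the mount point: the leading slash is dropped and the remaining slashes become dashes, so /home/common maps to home-common.mount. A minimal sketch of that mapping, assuming a path made only of plain letters, digits and slashes (the helper name mount_unit_name is made up for illustration; systemd applies extra escaping to other characters, which is not handled here):

```shell
# Derive the systemd mount unit name for a simple mount point.
# Assumption: the path contains only letters, digits and slashes.
mount_unit_name() {
    path=${1#/}      # drop the leading slash
    path=${path%/}   # drop one trailing slash, if present
    if [ -z "$path" ]; then
        # The root file system is a special case.
        printf '%s\n' '-.mount'
        return 0
    fi
    # Remaining slashes become dashes, then the .mount suffix is added.
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

mount_unit_name /home/common
```

The result can be fed straight into the status query, for example: systemctl status "$(mount_unit_name /home/common)"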

zack_dingbat
20th July 2011, 10:05 AM
The NFS server is on, and NetworkManager is set to be up on boot (before anyone logs in). I am still having the problem of hanging on boot. And it's not just a hang for a few minutes: I left the machine on all weekend (I was sick of it and couldn't be bothered to go to the server room to turn it off) and it still hadn't booted by Monday. As soon as I removed the NFS mount from fstab, it booted in a few seconds.

Despite all the posts following my initial one (thank you all), I am still no closer to resolving this. So I conclude it's not just a simple configuration error on my part.

I find it worrying that something like this is so broken in a new distro. It's not the first time I've had this problem - Ubuntu broke NFS mounting when it introduced Upstart. Surely I'm not the only admin using NFS mounts? I know it's a bit old hat now, and there's a lot I don't like about NFS, but in the absence of a decent replacement, we're stuck with it for the time being. (I use pam_mount for some filesystems, but it only works when people log in, of course.) What happens when systemd makes it through to Redhat 7? Will this issue be fixed then? :blink:

I'm going to just accept defeat, I think, and do something else, like rsync the NFS file system to the new server.

jpollard
20th July 2011, 12:27 PM
There are other timing issues that can occur - having the mount fail before the network is up is only one of them.

Upstart had a similar problem early on, but as I recall, it was fixed in Fedora quickly.

Steve - exporting is a totally different issue. If your network is down on the F15 client, it will hang during boot because of the improperly handled failure. It doesn't matter whether the server is up or down.

stevea
20th July 2011, 07:03 PM
It's unclear that anything is broken aside from your config. I appreciate that you may not have the time or inclination to debug the issue, but let's not assign blame until/unless you do. If you can't tolerate some breakage out of the box, then Fedora is not a good choice.

As it hung so long it's clearly neither a TCP timeout nor an NFS timeout (default 5 minutes, 3x retries).
Unless the hang with the NFS entry in fstab, and the clean boot without it, is reproducible in repeated trials, it may be a coincidence.
Searching the logs and ... but you've already moved on.


jpollard wrote:
> Steve - exporting is a totally different issue. If your network is down on the F15 client, it will hang during boot because of the improperly handled failure. It doesn't matter whether the server is up or down.

Thanks for the clarification, but that's quite different from the TCP timeout you postulated.

---------- Post added at 02:03 PM ---------- Previous post was at 12:29 PM ----------


jpollard wrote:
> Steve - exporting is a totally different issue. If your network is down on the F15 client, it will hang during boot because of the improperly handled failure. It doesn't matter whether the server is up or down.

Nope - wrong!

I yanked the cable on the F15 client and it came up fine, with no delays.
Of course the NFS mount failed, showing:

[root@lycoperdon Desktop]# systemctl status home-common.mount
home-common.mount - /home/common
          Loaded: loaded
          Active: failed since Wed, 20 Jul 2011 13:51:54 -0400; 2min 15s ago
           Where: /home/common
            What: hypoxylon:/common
         Process: 776 ExecMount=/bin/mount /home/common (code=exited, status=32)
          CGroup: name=systemd:/system/home-common.mount

About 30 seconds after plugging the cable in, and without any administration, I get:
[root@lycoperdon Desktop]# systemctl status home-common.mount
home-common.mount - /home/common
          Loaded: loaded
          Active: active (mounted) since Wed, 20 Jul 2011 13:54:49 -0400; 1min 41s ago
           Where: /home/common
            What: hypoxylon:/common
         Process: 776 ExecMount=/bin/mount /home/common (code=exited, status=32)
          CGroup: name=systemd:/system/home-common.mount

The network is then up and the NFS mount is complete.

======

BTW this is a nearly vanilla F15 install; packages added include the NFS stuff (not on the live CD), and the only NFS client configuration was to create the fstab entry manually.


I'm perfectly willing to believe there are bugs in systemd's config, but this looks like an admin-induced error by the OP.

marko
20th July 2011, 07:25 PM
zack_dingbat wrote:
> I have been trying to work out how to set up Fedora 15 to automatically mount an NFS share at boot time. I can mount the share interactively using 'mount -t nfs server:/usr/local /usr/local'.
> .......
> The entry I had in the fstab is:
>
> nfsserver:usr/local /usr/local nfs ro,hard,bg,intr,comment=systemd.automount 0 0

You need a "/" (slash) in front of the 'usr' in nfsserver:/usr/local
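
A quick way to catch this kind of slip is to check that every NFS source field in fstab has the form host:/absolute/path. The following is only a sketch: the function names nfs_spec_ok and check_nfs_entries are invented for illustration, and the check covers only the shape of the first field, not whether the server or export exists.

```shell
# Succeed when an NFS source has the form host:/absolute/path.
nfs_spec_ok() {
    case $1 in
        ?*:/*) return 0 ;;   # non-empty host, a colon, then an absolute path
        *)     return 1 ;;
    esac
}

# Read fstab-formatted text on stdin and report NFS entries whose
# source field fails the check. Comments and blank lines are skipped.
check_nfs_entries() {
    while read -r spec mountpoint fstype rest; do
        case $spec in
            ''|'#'*) continue ;;
        esac
        case $fstype in
            nfs|nfs4)
                nfs_spec_ok "$spec" || echo "suspicious NFS source: $spec"
                ;;
        esac
    done
}

# Demo with the entry as it was posted in this thread.
printf '%s\n' 'nfsserver:usr/local /usr/local nfs ro 0 0' | check_nfs_entries
```

Against a real system the same function can read the file directly: check_nfs_entries < /etc/fstab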

zack_dingbat
21st July 2011, 10:36 AM
marko wrote:
> You need a "/" (slash) in front of the 'usr' in nfsserver:/usr/local

I changed the name of the server when I posted, and must have erased the slash. It would be good if the solution were that simple, albeit embarrassing on my part :)

---------- Post added at 10:36 AM ---------- Previous post was at 10:10 AM ----------

stevea wrote:
> It's unclear that anything is broken aside from your config. I appreciate that you may not have the time or inclination to debug the issue, but let's not assign blame until/unless you do. If you can't tolerate some breakage out of the box, then Fedora is not a good choice.
> As it hung so long it's clearly neither a TCP timeout nor an NFS timeout (default 5 minutes, 3x retries).
> Unless the NFS in/out of fstab is reproducible in repeated trials it may be a coincidence.
> Searching the logs and ... but you've already moved on.

OK... I've had a look in the /var/log/messages and there is no reference to the failed NFS mount. I have some messages relating to a timeout when trying to unmount the filesystem on shutdown (as I say I can manually mount the NFS file system fine).

The hang at boot is definitely reproducible and due to the fstab entry. I have tried at least 3 times removing the entry after a failed boot, and when I remove the entry it boots fine.

I did think it odd that it hung for so long as I had read that the timeout for systemd services was just a few minutes.

The current entry is this (although it's commented out):
ideptstore:/export/fedexe /usr/local nfs ro,hard,bg,intr,comment=systemd.automount,_netdev 0 0

I'm willing to have a go at resolving this, if you can suggest things to try out. As I'm sure you understand, as a sysadmin it's my job to get things working, and I have found a way around this issue. But it would be good to sort out the NFS mounting if it's possible, for the sake of satisfaction and possible usage in the future.

Cygn
21st July 2011, 11:32 AM
Hmm, if your NFS server is not version 4 but version 3, for example, as in my case, you should specify it in the mount options. That's what I have:
nas:/shares/Volume1 /mnt/nas nfs defaults,vers=3,rw,soft,intr,rsize=8192,wsize=8192 0 0

jpollard
21st July 2011, 11:52 AM
Normally, that shouldn't matter; NFSv4 clients also try 3 if 4 doesn't work.

Even then, there should be an error message about the failed mount...
and no hang.

DBelton
21st July 2011, 01:19 PM
I am trying to find the problem on my systems here, and I can't seem to get it to fail :(

I have tried with network-manager, nfs, and mounting the share in fstab

Things I have tried:

booting with the server powered down...

booting with the network down.

I tried various options in fstab, with and without the comment=systemd.automount, _netdev, auto and noauto

Setting network-manager to start at boot, and setting it to not start at boot

setting network-manager to be available for all users, and not be available

I even tried putting known errors in the fstab and in exports on the server

I couldn't get NFS to hang on the client. Several times it threw errors or timed out, but it never hung.

I did notice that you specified a background mount in your fstab entry. Did you realize that the default timeout value for bg mounts is... well... about 10000 minutes?

Also, the bg option makes the mount command, if the first attempt fails, fork a child process that keeps retrying the mount, while the parent returns a 0 exit code to its caller.


Edit:

Giving it a little thought, the bg option may be what is causing the problem. systemd may be having issues with the mount command forking a new process.

Try it using a fg mount and see if your results are different. At the least, it might give you an indication of the error if the mount is getting one.
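
For a manual foreground test it helps to shorten the retry window as well, so that a failure shows up quickly instead of being retried for a long time. A rough sketch, assuming the fg and retry=N (minutes) options described in nfs(5); the helper name foreground_test_opts is made up for illustration, and it only rewrites the option string without mounting anything:

```shell
# Turn an fstab option list into one suited to a manual foreground test.
# bg, fg and retry= are dropped and replaced by "fg,retry=<minutes>";
# comment= is dropped too because it is only meaningful inside fstab.
foreground_test_opts() {
    opts=$1
    minutes=${2:-1}    # retry limit in minutes, default 1
    result=
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        case $opt in
            bg|fg|retry=*|comment=*) ;;            # replaced or irrelevant
            *) result=${result:+$result,}$opt ;;   # keep everything else
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "${result:+$result,}fg,retry=$minutes"
}

foreground_test_opts 'ro,hard,bg,intr'
```

As root, the result can then be passed to a verbose mount on a scratch directory (the /mnt/test path here is hypothetical): mount -v -t nfs -o "$(foreground_test_opts 'ro,hard,bg,intr')" ideptstore:/export/fedexe /mnt/test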

jpollard
21st July 2011, 04:09 PM
Speaks to the problem of difficulty in debugging, doesn't it?

NFS backgrounding is normal. This is designed to allow the boot to continue without waiting for it to complete.

Errors (and warnings) that occur should still go to syslog, though with systemd it seems that many such "should still go" messages disappear.

zack_dingbat
21st July 2011, 04:25 PM
Thanks for the suggestion.

I removed bg from the options and replaced it with fg, still hangs...

...but...

I then removed all the custom mount options and left it as defaults,ro and now it boots! There were some errors regarding the mounting when it was booting, but when I look in `systemctl` it shows the service running, and my file system is indeed mounted.

So somewhere in my original fstab entry, I messed it up with the options. I guess it's now a case of trying each one to see which one is the culprit.

Anyway it's good news.
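
The one-at-a-time elimination can be made less error-prone by generating the candidate option strings instead of editing them by hand. This is only a sketch; without_option and candidates are illustrative names, and each candidate still has to be tried in fstab (or with a manual mount) to see whether the hang goes away.

```shell
# Print an option list with one named option removed.
without_option() {
    opts=$1
    drop=$2
    result=
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        [ "$opt" = "$drop" ] || result=${result:+$result,}$opt
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# For every option in the list, print the list with that option left out.
candidates() {
    all=$1
    old_ifs=$IFS
    IFS=,
    set -- $all          # split the list into positional parameters
    IFS=$old_ifs
    for o in "$@"; do
        printf 'without %s: %s\n' "$o" "$(without_option "$all" "$o")"
    done
}

candidates 'ro,hard,bg,intr'
```

If removing a single option is not enough to get a clean boot, the culprit may be a combination, in which case the same helper can be applied twice.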

jpollard
21st July 2011, 04:47 PM
My guess would be mixing "hard" and "bg", but both should be valid.

zack_dingbat
22nd July 2011, 09:51 AM
Actually, it seems the mount options have been changed quite a bit. If I do a 'man mount' I can't find bg, fg, intr, or hard, so there's no point in putting those in my fstab entry.

It's also pointless putting in _netdev, as the system boots without it.

DBelton
22nd July 2011, 12:18 PM
That is because those options are listed if you do a "man nfs" (they are options specific to NFS). Look for the section "mount options".

dwisehart
9th August 2011, 02:34 PM
This is also a CIFS mount issue. CIFS mounts are behaving just like you describe the problem with NFS mounts at boot time.

I think this is a bigger issue than NFS.

DBelton
9th August 2011, 03:34 PM
Yes, it is a bigger issue than NFS.

I have found a workaround, though. What is happening is that systemd hands off starting the network to NetworkManager, then goes on and tries to mount network shares before the network is fully up and settled.

Try enabling this service to delay the start of the mounts just a little.

From a terminal or console screen as root user:

systemctl enable NetworkManager-wait-online.service

lensman3
16th August 2011, 04:04 AM
I had a similar problem with starting Arno's firewall. It would never get started because the network wasn't up. It turned out again to be a problem with systemd. I worked around it by putting the firewall command in /etc/rc.d/rc.local. You might have to put the single mount command in that file. systemd ensures that this rc.local is run LAST. (I left the default firewalls as is because at least those firewalls would keep my machine off the Internet while the Ethernet card was brought up, so there was no timing hole.)

systemd is not ready for prime time!!!! (Mainly because of a lack of ANY reliable documentation.)
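
If the mount command does go into rc.local, wrapping it in a bounded retry loop avoids depending on the network being ready at one exact moment. A minimal sketch, assuming the fstab entry carries noauto so that the boot itself does not attempt the mount; retry_until is a made-up helper name, and the attempt count and delay are arbitrary:

```shell
# Run a command until it succeeds, at most $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if all attempts fail.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep only if another attempt is still to come.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# In rc.local this might read (hypothetical values):
#   retry_until 30 2 mount /usr/local || echo "NFS mount gave up" >&2
# Harmless demonstration with a command that always succeeds:
retry_until 3 0 true && echo "demo: command succeeded"
```

Because the loop gives up after a fixed number of attempts, a missing server delays the end of boot by a bounded amount rather than blocking it indefinitely.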

jpollard
16th August 2011, 04:30 AM
That still doesn't guarantee the timing.

Too much of systemd doesn't do anything until something requests it.

Then and only then is the requested service started... but the time it takes for that may be longer than the request is allowed to take.

It all depends on what you are doing. An NFS mount may still not work because the network has not been started... or is still in the process of being started.

I agree about the lack of documentation - I've complained about it before.