Killing NFS by setting the time.
FedoraForum.org - Fedora Support Forums and Community
  1. #1
    Join Date
    Oct 2010
    Location
    Canberra
    Posts
    3,171
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)

    Killing NFS by setting the time.

    I am trying to set up a LAN of virtual machines. The user data ($HOME) on all the machines is NFS mounted from a server (virtual) machine [apart from an admin user on each guest.]

    When working on large projects, like VICI, I like to keep the computer on for extended periods, and just use "suspend" when it's not in use. Having to log in and start up all manner of tools is a pain that I would prefer to avoid if possible.

    I tried to just suspend the host while the guests were running, but when resumed the guests went a bit crazy - 100% CPU and not very responsive. I assume some network connections got messed up.

    The next step was to use the virtual machine manager to "Pause" each guest before suspending the host. A paused guest simply gets no CPU time - it is "snap frozen". This worked a bit better in that the guests could be resumed after the host was resumed. However, the guests' clocks were not updated, so they were about 10 hours behind after an overnight suspension, and would not recover properly even when running an NTP daemon.

    After a bit of research I found a way for the clocks to be corrected from the host. It involved installing a program (qemu-guest-agent) in each of the eight guests and some configuration. With a small script to automate it all, the guests could be resumed with the correct time.
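
The script itself is not shown in the post. As a rough sketch only, a resume helper along these lines could do the job, assuming the guests are managed by libvirt and driven through the `virsh` command line. The guest names below are placeholders taken from hostnames that appear later in the thread, and `resume_commands` and `resume_all` are hypothetical helper names.

```python
import subprocess

# Placeholder guest names (assumption); use the names from `virsh list --all`.
GUESTS = ["servervm", "lumeah"]

def resume_commands(guest):
    """Return the virsh commands that unpause a guest and then set its clock."""
    return [
        ["virsh", "resume", guest],
        # Sets the guest clock to the host's current time. This only works
        # when qemu-guest-agent is running inside the guest and the domain
        # has a guest agent channel configured.
        ["virsh", "domtime", guest, "--now"],
    ]

def resume_all(guests, run=subprocess.run):
    """Run the resume and time-set commands for every guest, in order."""
    for guest in guests:
        for cmd in resume_commands(guest):
            run(cmd, check=True)
```

Calling `resume_all(GUESTS)` on the host after it wakes would unpause each guest and correct its clock. Passing a different `run` callable makes it possible to dry-run the command list without touching any guest.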

    I tested this for the first time this morning. It all seemed to go fine. Then after a few minutes everything came to a stop in the guest I was using. It seemed to be still running but was unresponsive. After reboot and a look at the logs it appeared that NFS had stopped working - since the home directory for the user(s) is NFS mounted this was not good.

    A careful look at the logs showed the following sequence: the guest noticed that the time had changed; the guest asked the DHCP server for a renewal of its IP lease; the stupid (and not very configurable) DHCP server that is managed by the virtual machine manager gave back a new, different IP address. For 480 seconds there was a grace period during which both addresses worked. After that the NFS server no longer recognised the requests from the guest. Dead.

    One fix would be to remount the NFS share, but you cannot unmount a file system when it's in use, so this option would require the user to log out, which sort of defeats the entire point.

    The normal fix would be to tell the DHCP server to use indefinite leases so that the same address is always assigned to the same client. Sadly the configuration for the DHCP server is auto-generated by the virtual machine manager and that has no option for specifying the lease intervals.

    So, this afternoon I will be updating the configuration of all the guests to use fixed IP addresses. So much for DHCP.

    User error. Please replace user and try again

  2. #2
    Join Date
    Apr 2009
    Location
    central NY, USA
    Posts
    1,284
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Re: Killing NFS by setting the time.

    I'd be wondering WHY the DHCP server assigned a new address. Perhaps the MAC is incorrect, or worse, changed??
    Change - the only constant.

  3. #3
    Join Date
    Oct 2010
    Location
    Canberra
    Posts
    3,171
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)

    Re: Killing NFS by setting the time.

    Quote Originally Posted by lightman47
    I'd be wondering WHY the DHCP server assigned a new address. Perhaps the MAC is incorrect, or worse, changed??
    There is no evidence that the MAC is changing for the virtual machines.
    I agree that the DHCP server should be handing back the same address since its default mechanism is to provide an address based on a hash of the MAC.
    This is what happened on the client:
    Code:
    Jul 26 09:10:53 lumeah systemd: Time has been changed
    ...
    Jul 26 09:20:28 lumeah NetworkManager[1049]: <info>  [1564096828.2101] dhcp4 (ens3): state changed bound -> expire
    Jul 26 09:20:28 lumeah NetworkManager[1049]: <info>  [1564096828.2106] device (ens3): DHCPv4: 480 seconds grace period started
    Jul 26 09:20:28 lumeah NetworkManager[1049]: <info>  [1564096828.2193] dhcp4 (ens3): state changed expire -> unknown
    Jul 26 09:20:28 lumeah dhclient[1267]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 4 (xid=0x48205ed5)
    Jul 26 09:20:31 lumeah dhclient[1267]: DHCPREQUEST on ens3 to 255.255.255.255 port 67 (xid=0x48205ed5)
    Jul 26 09:20:31 lumeah dhclient[1267]: DHCPOFFER from 192.168.122.1
    Jul 26 09:20:31 lumeah dhclient[1267]: DHCPACK from 192.168.122.1 (xid=0x48205ed5)
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2484] dhcp4 (ens3):   address 192.168.122.37
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2484] dhcp4 (ens3):   plen 24 (255.255.255.0)
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2485] dhcp4 (ens3):   gateway 192.168.122.1
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2485] dhcp4 (ens3):   lease time 3600
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2485] dhcp4 (ens3):   hostname 'lumeah'
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2486] dhcp4 (ens3):   nameserver '192.168.122.1'
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2486] dhcp4 (ens3):   domain name 'vm'
    Jul 26 09:20:31 lumeah NetworkManager[1049]: <info>  [1564096831.2487] dhcp4 (ens3): state changed unknown -> bound
    Jul 26 09:20:31 lumeah avahi-daemon[994]: Withdrawing address record for 192.168.122.36 on ens3.
    Jul 26 09:20:31 lumeah avahi-daemon[994]: Leaving mDNS multicast group on interface ens3.IPv4 with address 192.168.122.36.
    Jul 26 09:20:31 lumeah avahi-daemon[994]: Interface ens3.IPv4 no longer relevant for mDNS.
    Jul 26 09:20:31 lumeah avahi-daemon[994]: Joining mDNS multicast group on interface ens3.IPv4 with address 192.168.122.37.
    Jul 26 09:20:31 lumeah avahi-daemon[994]: New relevant interface ens3.IPv4 for mDNS.
    Jul 26 09:20:31 lumeah avahi-daemon[994]: Registering new address record for 192.168.122.37 on ens3.IPv4.
    ...
    Jul 26 09:20:31 lumeah dhclient[1267]: bound to 192.168.122.37 -- renewal in 1645 seconds.
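
The address change in the log above (192.168.122.36 withdrawn, 192.168.122.37 registered) is easy to miss by eye. A small sketch like the following could flag it automatically. It assumes the avahi-daemon message wording shown in this log, and the function name is hypothetical.

```python
import re

# Matches the two avahi-daemon messages seen in the log above.
ADDR_RE = re.compile(
    r"avahi-daemon\[\d+\]: (Withdrawing|Registering new) address record for (\S+) on"
)

def address_change(log_lines):
    """Return (old, new) if one address was withdrawn and a different one registered."""
    old = new = None
    for line in log_lines:
        match = ADDR_RE.search(line)
        if not match:
            continue
        if match.group(1) == "Withdrawing":
            old = match.group(2)
        else:
            new = match.group(2)
    if old and new and old != new:
        return old, new
    return None
```

Feeding it the lines from the log returns the pair of addresses, while a log in which the lease is renewed with the same address returns `None`.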


  4. #4
    Join Date
    Apr 2018
    Location
    bash shell
    Posts
    76
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Re: Killing NFS by setting the time.

    (ntp)
    About ntp, I've noticed the same problem with updating the current time when the guest was frozen for a long time. Apparently, this is normal because ntp wasn't made to perform such time updates. The theory is that ntp must be running "forever", so it stretches or squeezes the length of a second to adjust the clock gradually over long periods of time. Thus a 10 hour difference will never be fixed.

    Which is why ntp was deprecated and removed from Fedora and CentOS.
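
To put a number on "never": a back-of-the-envelope sketch, assuming the commonly documented maximum slew rate for ntpd of 500 parts per million. In practice ntpd normally gives up on offsets larger than about 1000 seconds rather than slewing them, so the figure is illustrative only.

```python
# Assumed maximum slew rate for ntpd, in parts per million.
MAX_SLEW_PPM = 500

def slew_days(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Days needed to remove an offset by slewing alone at the given rate."""
    seconds_needed = offset_seconds / (slew_ppm * 1e-6)
    return seconds_needed / 86400

# A guest that is 10 hours behind after an overnight pause:
print(round(slew_days(10 * 3600)))  # 833
```

At that rate a 10 hour gap would take more than two years to close, which is why the clock has to be stepped rather than slewed.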

    (chrony)
    The solution is to use chrony, which can step the clock immediately and also lets you set a more aggressive update schedule. It is the default on Fedora, CentOS and RHEL.

    (vm)
    For VM environments, there is support for directly updating the time of the guest, as you discovered with qemu-guest-agent, or via virtualbox-guest-additions, depending on which hypervisor you are using.

    (nfs)
    Anyway, about your NFS problem, one method is to run your own DHCP server which assigns addresses based on MAC, so that your guests always receive the same IP address. If you can't run your own DHCP server, then your solution is a good choice: just set the IP addresses manually.
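
If the guests sit on a libvirt-managed network (the 192.168.122.x addresses in the log suggest this, but it is an assumption), a MAC-to-IP reservation can also be added to libvirt's own DHCP server with `virsh net-update`. The sketch below only builds the command. The network name "default", the MAC address, and the helper names are placeholders.

```python
from xml.sax.saxutils import quoteattr

def dhcp_host_xml(mac, name, ip):
    """Build the <host> element libvirt uses for a fixed DHCP assignment."""
    return f"<host mac={quoteattr(mac)} name={quoteattr(name)} ip={quoteattr(ip)}/>"

def net_update_command(network, mac, name, ip):
    """Return the virsh command that adds the reservation live and persistently."""
    return [
        "virsh", "net-update", network, "add", "ip-dhcp-host",
        dhcp_host_xml(mac, name, ip),
        "--live", "--config",
    ]

# Placeholder values (assumptions): replace with the guest's real MAC and address.
print(net_update_command("default", "52:54:00:12:34:56", "lumeah", "192.168.122.36"))
```

The printed list can be passed to `subprocess.run` on the host. Whether this is preferable to static addresses inside the guests depends on how many guests there are and who maintains the network definition.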

  5. #5
    Join Date
    Jun 2004
    Location
    Maryland, US
    Posts
    7,663
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)

    Re: Killing NFS by setting the time.

    Quote Originally Posted by fedup4ever
    (ntp)
    About ntp, I've noticed the same problem with updating the current time when the guest was frozen for a long time. Apparently, this is normal because ntp wasn't made to perform such time updates. The theory is that ntp must be running "forever", so it stretches or squeezes the length of a second to adjust the clock gradually over long periods of time. Thus a 10 hour difference will never be fixed.

    Which is why ntp was deprecated and removed from Fedora and CentOS.
    ntpd does have a 'one shot' time update mode for any size jump, but only by running ntpd once with "ntpd -g" and then starting ntpd normally afterwards.

    So I suspect the problem is that "unpausing" a VM isn't treated like starting the OS. The VM would need to restart ntp.service at unpause to simulate rebooting the OS so that the -g would be able to work correctly.

    REF
    https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s1-understanding_the_ntpd_sysconfig_file

    Last edited by marko; 27th July 2019 at 06:27 PM.

  6. #6
    Join Date
    Apr 2018
    Location
    bash shell
    Posts
    76
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Re: Killing NFS by setting the time.

    That is correct, but needing manual intervention is a problem, so ntp always appeared to be broken and not doing its job properly, at least on systems that suspend or hibernate in some way (like laptops).

    chrony on the other hand is much more robust by default and does not require manual intervention. The best option in chrony is to set:

    /etc/chrony.conf
    Code:
    makestep 1 -1
    This allows the clock to be stepped forcefully at any time, without manual commands.
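
As a simplified model of what `makestep 1 -1` means: the first number is the offset threshold in seconds, and the second limits how many clock updates after startup may use a step, with a negative value removing the limit. The function below is an illustration of that rule, not chrony's actual code, and its name and the exact boundary handling are assumptions.

```python
def would_step(offset_seconds, updates_so_far, threshold=1.0, limit=-1):
    """Simplified model of the `makestep threshold limit` decision.

    updates_so_far is the number of clock updates already made since the
    daemon started. A negative limit means stepping is always allowed.
    """
    within_limit = limit < 0 or updates_so_far < limit
    return within_limit and abs(offset_seconds) > threshold

# With `makestep 1 -1`, a 10 hour offset is stepped even long after startup:
print(would_step(10 * 3600, updates_so_far=500))           # True
# With a limit of 3 (for example `makestep 1 3`) it would be slewed instead:
print(would_step(10 * 3600, updates_so_far=500, limit=3))  # False
```

This is the property that matters for a paused guest: the large offset appears long after the daemon started, so a setting that only steps during the first few updates would not help.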

    In addition, some people prefer to make chrony more aggressive with certain ntp servers, especially if the ntp server is on the local network, for example:

    Code:
    server 192.168.1.1 minpoll 2 maxpoll 4 polltarget 30 iburst
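
The `minpoll` and `maxpoll` values are exponents of two, in seconds, which is easy to misread. A short sketch of the arithmetic for the line above (the helper name is hypothetical):

```python
def poll_interval_seconds(poll_exponent):
    """Convert a minpoll/maxpoll exponent to a polling interval in seconds."""
    return 2 ** poll_exponent

# For `minpoll 2 maxpoll 4`:
print(poll_interval_seconds(2), poll_interval_seconds(4))  # 4 16
```

So the example polls the local server every 4 to 16 seconds. That is reasonable for a server on the same LAN but far too aggressive for a public NTP server.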

  7. #7
    Join Date
    Oct 2010
    Location
    Canberra
    Posts
    3,171
    Mentioned
    1 Post(s)
    Tagged
    1 Thread(s)

    Re: Killing NFS by setting the time.

    This is not solved yet.
    Using fixed IP addresses helps, and sometimes it all works OK. Other times the NFS system has another problem:

    NFS: state manager: check lease failed on NFSv4 server servervm.vm with error 13
    This message is from the clients. There are no simultaneous messages from the server.

    The "error 13" makes me think that the problem might be some sort of authentication issue - perhaps the time jump has made some security service a bit skittish.
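
A quick way to check that reading of "error 13", assuming the kernel is reporting a standard Linux errno value here. If it is reporting an NFSv4 status code instead, 13 is NFS4ERR_ACCESS, which points the same way. The helper name is hypothetical.

```python
import errno
import os

def describe_errno(code):
    """Return the symbolic name and the system's message for an errno value."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)

print(describe_errno(13))
```

On Linux this reports EACCES, a permission-denied error. That is consistent with an access or authentication problem rather than a network failure.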


  8. #8
    Join Date
    Apr 2018
    Location
    bash shell
    Posts
    76
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Re: Killing NFS by setting the time.

