authentication problem



William Haller
10th February 2010, 03:45 AM
I recently updated to the latest patches. One box now fails to boot to a usable graphical login: either the login list is empty or the screen is blank/blue, depending on whether or not LDAP is configured via authconfig-tui. In both cases, the mouse and keyboard are inoperable. A hard power-off followed by a boot to single user is possible.

Booting to single user shows that the following services are failing to start:

haldaemon

nfslock:
from messages - unable to register (statd, 1 udp)

netfs fails due to no statd running

rpcbind is running
rpcinfo -p returns portmapper valid
rpcinfo -p localhost (or 127.0.0.1) returns a complaint about weak credentials.

Digging further into the statd start failure, its strace also appears to point to a credentials failure, I believe.

hosts.allow and hosts.deny copied from working system. hosts.allow has been modified in several ways trying to get rpc.statd to start, but no success. The copied hosts.allow file from the working system has LOCAL, localhost, 127.0.0.1, and the full subnet info with an explicit ALLOW. hosts defines localhost correctly.
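For anyone comparing hosts.allow files between a working and a failing box, here is a minimal Python sketch that lists the rules naming a given daemon. The function name `rules_for` and the sample rule are hypothetical and only for illustration. It assumes the plain `daemon_list : client_list [: option]` layout, and it ignores EXCEPT clauses, pattern wildcards, and bracketed IPv6 addresses, so treat it as a quick aid rather than a faithful reimplementation of the tcp wrappers matching logic.

```python
import re

def rules_for(daemon, text):
    """Return (daemons, clients, options) for lines naming `daemon` or ALL."""
    # Join lines that were continued with a trailing backslash.
    joined = re.sub(r"\\\n", "", text)
    matches = []
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = [p.strip() for p in line.split(":")]
        if len(parts) < 2:
            continue  # not a daemon_list : client_list rule
        # Daemon names are separated by blanks and/or commas.
        daemons = re.split(r"[,\s]+", parts[0])
        if daemon in daemons or "ALL" in daemons:
            matches.append((parts[0], parts[1], parts[2:]))
    return matches

# Hypothetical sample rule, not taken from the actual file.
sample = "rpcbind : LOCAL, localhost, 127.0.0.1 : ALLOW\nsshd: ALL\n"
print(rules_for("rpcbind", sample))
```

Running it against the real files on both boxes, with the daemon name as registered with tcp wrappers, gives a quick side-by-side of which rules would even be considered.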

nscd is off. With nscd on, it appears that the local /etc/passwd file is never consulted. nsswitch.conf, ldap.conf, and openldap/ldap.conf were copied from the working system. With nscd on, I can't ssh into the system because sshd can't find its privilege separation user to start the daemon. With nscd off, I can ssh into the system.
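One way to sanity-check the lookup order described above is to read what nsswitch.conf actually says for passwd and confirm the daemon account is present in the local file. The Python sketch below does only that. The helper names `nss_sources` and `local_user_names` are hypothetical, the sample text is made up, and the account name `sshd` for the privilege separation user is an assumption.

```python
import re

def nss_sources(nsswitch_text, database="passwd"):
    """Return the ordered lookup sources for one nsswitch.conf database."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        name, sep, rest = line.partition(":")
        if sep and name.strip() == database:
            # Remove action items such as [NOTFOUND=return].
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []

def local_user_names(passwd_text):
    """Return the set of account names found in passwd-format text."""
    names = set()
    for line in passwd_text.splitlines():
        if line and not line.startswith("#") and ":" in line:
            names.add(line.split(":", 1)[0])
    return names

# Made-up sample content for illustration only.
sample_nss = "# comment\npasswd:     files ldap [NOTFOUND=return]\nshadow: files\n"
sample_passwd = ("root:x:0:0:root:/root:/bin/bash\n"
                 "sshd:x:74:74::/var/empty/sshd:/sbin/nologin\n")
print(nss_sources(sample_nss))                     # lookup order for passwd
print("sshd" in local_user_names(sample_passwd))   # is the account local?
```

On the real box the two functions would be fed the contents of /etc/nsswitch.conf and /etc/passwd. If `files` is listed first and the account is in the local file, yet `pwd.getpwnam("sshd")` raises `KeyError` while nscd is running, that points at the cache rather than at the files themselves.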

I've tried pulling backups onto the box from the first of the year and doing a diff to see if anything came up blazingly obvious in /etc that changed, but I don't spot anything. I'll be going back through the backup log to see what files have been modified from their original state as well, but haven't done that yet.

I'm missing something and Google hasn't given any good info. The working system is essentially identical hardware and software. It has two additional web and IM related packages which have nothing to do with authentication. The failing box has had the fprint pam module added since authconfig-tui complained about it, although the box was working before without it installed.

The first time it came up after the restart, I was able to log in until nscd (which was running at the time) timed out its user information. At that point it was toast. It was running fine before I shut it down after the kernel update (2.16). It hadn't been shut down (nor had I logged off) since the 2.13 kernel, which admittedly wasn't that long back. I currently have authconfig-tui set for ldap w/o caching. An ldapsearch -x returns the expected information without delay. However, only real users are in the LDAP database; the system daemon users and root should all come from local files. This hasn't changed from before and, again, other network boxes aren't having trouble.

Any ideas?

William Haller
10th February 2010, 08:16 PM
Further information. Not a disk space problem. Turning iptables off temporarily doesn't change anything.

William Haller
12th February 2010, 03:05 PM
Additional Information:

The rpcinfo program on the failing system can do rpcinfo requests to any other appropriate host on the network and get results. It can do an rpcinfo -p. But any time it tries to use any form of localhost, it gets a complaint about weak client credentials.

Likewise, other working hosts can do rpcinfo -p queries for any other host except the failing host. For it, they get complaints about weak credentials.

localhost resolves - it can be pinged and returns 127.0.0.1 as it should. Again - hosts.allow and hosts.deny have been copied from working hosts, and the system name matches the DNS name, IP address resolves in reverse DNS to that name - doesn't seem to be a DNS issue. I'd remove tcp wrappers, but at this point doing that would take out the world due to dependencies.

A wireshark trace on a working query shows the rpcinfo -p sending and receiving AUTH_NULL credentials. For the failing box, the trace shows AUTH_NULL being sent, but being refused as a credential.
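The exchange seen in the trace can be reproduced without rpcinfo. The Python sketch below builds an ONC RPC call to the portmapper NULL procedure with AUTH_NULL credentials and decodes whether the reply was accepted or denied, following the message layout in RFC 5531. The function names are hypothetical, and mapping the "weak credentials" complaint to `AUTH_TOOWEAK` is an assumption based on the wording of the error, not something confirmed from the trace.

```python
import socket
import struct

PMAP_PROG = 100000   # portmapper / rpcbind program number
PMAP_VERS = 2
PMAPPROC_NULL = 0    # "ping" procedure, takes no arguments
AUTH_NULL = 0        # credential flavor with an empty body

# auth_stat values from RFC 5531.
AUTH_STAT = {
    0: "AUTH_OK", 1: "AUTH_BADCRED", 2: "AUTH_REJECTEDCRED",
    3: "AUTH_BADVERF", 4: "AUTH_REJECTEDVERF", 5: "AUTH_TOOWEAK",
    6: "AUTH_INVALIDRESP", 7: "AUTH_FAILED",
}

def build_call(xid, prog=PMAP_PROG, vers=PMAP_VERS, proc=PMAPPROC_NULL):
    """Pack an RPC CALL: xid, CALL(0), rpcvers 2, prog, vers, proc,
    then an AUTH_NULL credential and an AUTH_NULL verifier (length 0)."""
    return struct.pack(">10I", xid, 0, 2, prog, vers, proc,
                       AUTH_NULL, 0, AUTH_NULL, 0)

def classify_reply(data):
    """Say whether a raw RPC reply was accepted or why it was denied."""
    xid, msg_type, reply_stat = struct.unpack_from(">3I", data, 0)
    if msg_type != 1:                      # 1 = REPLY
        raise ValueError("not an RPC reply")
    if reply_stat == 0:                    # MSG_ACCEPTED
        return "accepted"
    (reject_stat,) = struct.unpack_from(">I", data, 12)
    if reject_stat == 0:                   # RPC_MISMATCH
        return "denied: RPC_MISMATCH"
    (auth_stat,) = struct.unpack_from(">I", data, 16)   # AUTH_ERROR
    return "denied: " + AUTH_STAT.get(auth_stat, "auth_stat %d" % auth_stat)

def probe(host="127.0.0.1", port=111, timeout=3.0):
    """Send the NULL call over UDP and classify whatever comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_call(0x1234), (host, port))
        data, _ = s.recvfrom(4096)
    return classify_reply(data)
```

Calling `probe("127.0.0.1")` on the failing box and `probe()` against a working host should show the same accepted versus denied split as the trace, which makes it easy to retest after each hosts.allow or configuration change.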

So the question that Google really hasn't answered for me is how do you go about restoring a system to a state where AUTH_NULL is OK? I'm not mounting at this point. I know you can set security for mounts on the command line. This is just about rpcinfo -p on this box not taking AUTH_NULL. This appears to be what is killing the rpc.statd startup (nfslock), and is probably the root of the problem.

Thanks.

Ihatewindows
14th November 2012, 09:11 PM
I know it's been a couple of years... still have the problem?

William Haller
14th November 2012, 09:15 PM
I ended up doing a reinstall and restoration of config files from backups.