Weblog entry #39 for Utumno

Problem: Lack of entropy?
Posted by Utumno on Mon 2 Jun 2008 at 05:14
Tags: none.

We have a server here (CentOS 5.1, so this may not be the best place to ask) that has the following problem:

Every few days, one simply cannot log in to the system: not through ssh, not through ftp, not from the console. Most of the time, when I enter my username at the login prompt and press Enter, even the "Password:" prompt is not displayed. Other times the "Password:" prompt appears, but after I type the password the machine simply hangs. I have tried waiting 30 minutes, and nothing happens.

There is absolutely nothing in any of the /var/log/* files that would shed any light on this.

Could this be a lack of entropy? It's a little server with a monitor, mouse and keyboard attached, but we rarely use those. Normally we only log in through the network.

The server is not very security-critical. It's kept locked inside the intranet with no direct access to the web. Would it be a good idea to simply

rm -f /dev/random
ln -s /dev/urandom /dev/random

to find out whether it's really a problem with entropy?
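
One way to test the entropy theory without touching /dev/random might be to read the kernel's own estimate of the entropy pool. The sketch below assumes the standard Linux procfs file /proc/sys/kernel/random/entropy_avail; the function name read_entropy is made up for illustration. A value that sits near zero while logins hang would support the theory, while a healthy value would point elsewhere.

```python
from pathlib import Path

# Standard Linux procfs location of the kernel's entropy estimate (in bits).
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_FILE):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux system) or unexpected content.
        return None


if __name__ == "__main__":
    print("entropy_avail:", read_entropy())
```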

Could I leave some command running on the screen ( like 'top' ) that would hopefully shed some light on this the next time this problem happens?
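
Since the machine has to be rebooted each time, a command on the screen loses its output. A possible alternative is a small watcher that appends a timestamped line to a file and forces it to disk on every write, so the last entries before the hang survive the reboot. This is only a sketch: the log path, the one-minute interval and the fields recorded (load average, process count, entropy estimate) are assumptions, not a known-good recipe for this fault.

```python
import os
import time
from pathlib import Path


def snapshot(proc=Path("/proc")):
    """Build one timestamped line of basic system state."""
    try:
        entropy = (proc / "sys/kernel/random/entropy_avail").read_text().strip()
    except OSError:
        entropy = "n/a"
    try:
        # Every numeric directory under /proc is one running process.
        nprocs = sum(1 for p in proc.iterdir() if p.name.isdigit())
    except OSError:
        nprocs = -1
    load1, _load5, _load15 = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} load={load1:.2f} procs={nprocs} entropy={entropy}"


def watch(logfile, interval=60, rounds=None):
    """Append a snapshot every `interval` seconds (forever if rounds is None)."""
    done = 0
    with open(logfile, "a") as fh:
        while rounds is None or done < rounds:
            fh.write(snapshot() + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push to disk so a hard reboot keeps it
            done += 1
            if rounds is None or done < rounds:
                time.sleep(interval)


if __name__ == "__main__":
    watch("/var/tmp/login-hang-watch.log")  # hypothetical log path
```

Started once from a root shell (for example inside screen), it keeps running through the night; after the next hang and reboot, the tail of the file shows the last recorded state.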

Right now I am stuck - there's nothing in the logs, and when it happens, the only thing I can do is reboot it...

This:

http://www.centos.org/modules/newbb/viewtopic.php?forum=6&topic_id=4312&viewmode=flat

suggests the problem might be auditd shutting off logins when the disk is almost full, but our partitions are nowhere near full (all below 20%).
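
To rule the auditd theory in or out more firmly, it may help to look at both the partition holding the audit log and the disk-related settings in auditd.conf. The sketch below assumes the usual locations /var/log/audit and /etc/audit/auditd.conf, which may differ on a given install; the helper names are invented for this example.

```python
import shutil

# auditd.conf keys that control what auditd does as the log partition fills up.
DISK_KEYS = ("space_left", "space_left_action", "admin_space_left",
             "admin_space_left_action", "disk_full_action", "disk_error_action")


def parse_auditd_conf(text):
    """Return the disk-related settings found in auditd.conf-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() in DISK_KEYS:
            settings[key.lower()] = value
    return settings


def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumed default locations; adjust if the install differs.
    print("/var/log/audit is %.1f%% full" % percent_used("/var/log/audit"))
    with open("/etc/audit/auditd.conf") as fh:  # typically readable by root only
        for key, value in parse_auditd_conf(fh.read()).items():
            print(key, "=", value)
```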

 

Comments on this Entry

Posted by Anonymous (59.176.xx.xx) on Mon 2 Jun 2008 at 10:07
Could be a hardware problem:

suggestions, things to think about:

temperature failure (CPU fan, SMPS fan)? (hddtemp, or smartctl from smartmontools, for monitoring drive temperature)

marginal voltage from the SMPS -> CPU load from ssh -> system glitch.

memory modules? -> press them in firmly, then run memtest86(+)

waking up from sleep -> power surge related issue?

Does the problem show if you already have an ssh session and you start doing stuff in it?


best
PJ

Posted by Steve (82.41.xx.xx) on Mon 2 Jun 2008 at 11:07
I have seen lack-of-entropy problems in the past affecting Solaris machines, and the symptoms almost fit.

A good idea might be to enable telnet access - that should continue to work if the problem is SSH failing due to a lack of random numbers. (Obviously this is only an option if you're really on a trusted network, etc).

Hardware issues, as suggested above, might be more likely, but I guess if you could telnet in and not ssh then you've certainly learned something...
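
A related check that needs no telnet server: an SSH server normally sends its identification string (something starting with "SSH-") as soon as a TCP connection is accepted, before any authentication takes place. The sketch below only reads that banner with a timeout; the function name and the placeholder host name are invented for illustration. If the banner still arrives while logins hang, that would suggest sshd itself is alive and the stall happens later, during authentication or session setup.

```python
import socket


def probe_banner(host, port=22, timeout=10.0):
    """Return (banner, None) on success or (None, error_text) on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)  # the identification line is short
    except OSError as exc:  # refused, unreachable, or timed out
        return None, f"{type(exc).__name__}: {exc}"
    return data.decode("ascii", "replace").strip(), None


if __name__ == "__main__":
    # "server.example" is a placeholder; substitute the real host name.
    print(probe_banner("server.example"))
```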

Steve

Posted by Utumno (60.248.xx.xx) on Tue 3 Jun 2008 at 04:34
I have some more info.

This time I kept an active SSH session to the server, and the problem happened again sometime during the night. Unfortunately, I am only logged in as a regular user, and I cannot su to root (I type 'su', the "Password:" prompt appears, I type my password, and it hangs; I can press Ctrl+C to get back to the bash prompt, however).

Facts:

1) when I type 'ps aux' I can see a lot of hanging 'crond' processes
2) apart from the problem with logging in, the whole system appears to work normally.
3) Each time I try su'ing to root over the SSH session, I can see the following pop up on tty1:

audit(1212459844.653:196) : user pid=4535 uid=500 auid=500 subj=user_u:system_r:unconfined_t:s0 msg='PAM: authentication acct="root" : exe="/bin/su" (hostname=?, addr=?, terminal=pts1, res=success)'

4) Each time I try logging in from the console, I can see the following pop up just below the hanging login attempt:

audit(1212459844.653:196) : user pid=4638 uid=0 auid=4291967295 subj=system_u:system_r:local_login_t:s0-s0:c0.c1023 msg='PAM: authentication acct="root" : exe="/bin/login" (hostname=?, addr=?, terminal=tty3, res=success)'

5) the above 'audit' lines appear only if I type the correct password. If I type the incorrect one, the lines do not appear, but the system hangs in exactly the same way nonetheless.
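
The audit lines carry more than they seem to at first glance: the number in parentheses is a Unix timestamp plus a serial number, and res= records the PAM result. A minimal sketch of pulling those apart is shown below, using the su line quoted above as sample input; the regular expressions are a guess fitted to these two lines, not a full parser for the audit format. If res=success is logged and the prompt still hangs, the stall would seem to come after authentication rather than during it.

```python
import re
from datetime import datetime, timedelta, timezone

# Header looks like: audit(<epoch seconds>.<milliseconds>:<serial>)
HEADER = re.compile(r"audit\((\d+)\.(\d{3}):(\d+)\)")
# key=value pairs; values are either double-quoted or run to the next delimiter.
FIELD = re.compile(r"""(\w+)=("[^"]*"|[^\s,)']+)""")

SAMPLE = ('audit(1212459844.653:196) : user pid=4535 uid=500 auid=500 '
          'subj=user_u:system_r:unconfined_t:s0 '
          "msg='PAM: authentication acct=\"root\" : exe=\"/bin/su\" "
          "(hostname=?, addr=?, terminal=pts1, res=success)'")


def parse_audit_line(line):
    """Return time (UTC), serial and key/value fields, or None if no header."""
    head = HEADER.search(line)
    if head is None:
        return None
    secs, millis, serial = (int(g) for g in head.groups())
    when = (datetime.fromtimestamp(secs, tz=timezone.utc)
            + timedelta(milliseconds=millis))
    fields = {key: value.strip('"') for key, value in FIELD.findall(line)}
    return {"time": when, "serial": serial, "fields": fields}


if __name__ == "__main__":
    record = parse_audit_line(SAMPLE)
    print(record["time"].isoformat(), record["serial"])
    print(record["fields"]["exe"], record["fields"]["res"])
```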

Posted by Utumno (60.248.xx.xx) on Tue 3 Jun 2008 at 04:43
It also doesn't seem to be a problem with overheating or a fan malfunction.
'top' shows CPU usage at 0.1%. Memory usage looks normal to me. Swap is almost empty.
vmstat - looks normal.

Maybe it's something related to an HDD malfunction, but would that kind of problem show up only during logins? That seems highly improbable to me...

Posted by Anonymous (194.176.xx.xx) on Wed 4 Jun 2008 at 11:28
Is there some problem writing to /var/log/auth.log (or the equivalent on CentOS)? SSH logins, cron and sudo all write to this on my default Ubuntu install. Does CentOS limit the size of the /var/log directory in some way? Is /var mounted separately?
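
These questions can be answered quickly from a shell that is still open. The sketch below reports whether the log directory exists, whether the current user can write to it, and whether it or its parent is a separate mount point. On CentOS the authentication log is normally /var/log/secure rather than auth.log; the function name is invented for this example, and the write check only means something when run as the user the logging daemon runs as (usually root).

```python
import os


def log_dir_report(log_dir="/var/log"):
    """Summarise basic facts about the directory that holds the system logs."""
    parent = os.path.dirname(log_dir.rstrip("/")) or "/"
    return {
        "exists": os.path.isdir(log_dir),
        "writable": os.access(log_dir, os.W_OK),  # for the current user only
        "own_mount": os.path.ismount(log_dir),
        "parent_is_mount": os.path.ismount(parent),
    }


if __name__ == "__main__":
    for key, value in log_dir_report().items():
        print(key, "=", value)
```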

Posted by francois99 (82.238.xx.xx) on Thu 5 Jun 2008 at 20:07
I had exactly the same (mysterious?) problem on an IBM ThinkCentre A51 running Fedora Core 6 (kernel 2.6.18).
The computer had previously run perfectly (24/7) under Fedora Core 3; the problem appeared after installing FC6 and lasted several weeks, with unsuccessful investigations (no logs, random locking...).
It was solved (without explanation) by upgrading the kernel to version 2.6.20, available in the FC6 updates.
The same distribution with the same kernel (2.6.18) and configuration runs perfectly on some other computers.
I think that you have a 2.6.18 kernel in CentOS 5.1 (like RHEL 5.1).
Does this mean that the 2.6.18 kernel is prone to locking up on certain kinds of hardware?

Posted by Utumno (61.229.xx.xx) on Wed 11 Jun 2008 at 15:29
Yes, it's 2.6.18

I upgraded to 2.6.20 ( also from CentOS updates ) and so far, after 3 days, the problem has not reappeared...
