Debugging system freezes
Posted by niol on Thu 25 Jan 2007 at 10:40
Sometimes your Debian box hangs and, for some strange reason, no debugging information is printed on your screen. What options do you have?

System logs
The first place to look for debugging information is /var/log. The files kern.log, daemon.log, messages and dmesg often contain valuable information about what went wrong, which helps you identify the hardware or even software component that is upsetting the kernel.

Console output
Kernel oopses are usually logged in /var/log/kern.log and show up in the output of dmesg, but if the problem stalls hard drive I/O, you won't find much in the log files. And if you are running X, you won't be able to see what is printed to the console. Fortunately, there are other ways to get at that output.
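The first one is the magic SysRq key. A kernel built with CONFIG_MAGIC_SYSRQ accepts commands typed as Alt+SysRq (the Print Screen key) plus a command letter, often even when the rest of the system no longer responds.

Serial console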
Another one is to attach a serial console to the box that is giving you trouble, i.e. another computer connected to its COM port with a null-modem cable, or an antique dumb terminal. Then boot your kernel with the console=ttySX parameter, where X is the serial port number (ttyS0 is the first port, usually COM1). On the other computer, you can use gkermit to open the console. This may even work over USB, but I could not find out how.

The netconsole kernel module
If you do not have the hardware, which is common because most laptops no longer come with a COM port, you can use the very handy netconsole module. It uses very low-level network device calls to send console output across your network via UDP. It is included in the standard Debian kernel. Because it relies on the network driver itself, it can help you debug anything except your network device driver. In /etc/modprobe.d/, add a file that reads:
options netconsole netconsole=32769@192.168.1.1/eth1,32769@192.168.1.6/01:23:34:56:78:9A
- 192.168.1.1:32769 on eth1 is the IP address, port and interface to send output from.
- 192.168.1.6:32769 with MAC address 01:23:34:56:78:9A is the IP address, port and MAC address to send packets to.
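The option value follows the pattern source-port@source-ip/interface,target-port@target-ip/target-mac. As a minimal sketch, a small shell function can assemble that line from its six parts, so you are less likely to mistype it. The function name netconsole_opts is purely illustrative and not part of any tool.

```shell
#!/bin/sh
# Build the modprobe option line for netconsole from its six parts.
# Usage: netconsole_opts SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
# (netconsole_opts is a hypothetical helper name used for this sketch.)
netconsole_opts() {
    printf 'options netconsole netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example with the addresses used in this article:
netconsole_opts 32769 192.168.1.1 eth1 32769 192.168.1.6 01:23:34:56:78:9A
```

Redirect the output, as root, into a file of your choice under /etc/modprobe.d/.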
On the 192.168.1.6 box, run:
$ nc -l -p 32769 -u
Then, simply modprobe netconsole on 192.168.1.1 and output should start to appear on 192.168.1.6.
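If nothing appears, it may simply be that the kernel has nothing to say yet, or that the console log level filters the messages out. A rough sketch for provoking a test line follows; it assumes a kernel that accepts writes to /dev/kmsg and that you run it as root. The function name kmsg_test and its optional file argument are made up for this example.

```shell
#!/bin/sh
# Write one test line to the kernel log so that it is passed on to the
# registered consoles, netconsole included (subject to the console log
# level). Writing to /dev/kmsg needs root.
# The optional argument only exists so the function can be tried out
# on a regular file first; kmsg_test is a hypothetical helper name.
kmsg_test() {
    target=${1:-/dev/kmsg}
    # "<3>" is the syslog priority prefix for the "error" level.
    printf '<3>netconsole test from %s\n' "$(uname -n)" > "$target"
}

# As root on 192.168.1.1:
#   dmesg -n 8     # let messages of every priority reach the console
#   kmsg_test
```

The test line should then show up in the nc output on 192.168.1.6.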
For more information, see Using Netconsole to See Kernel Messages.

Nothing shows when my kernel hangs!
This is the worst-case scenario. Linux is usually very talkative, so at this point there is a very good chance that your problem is hardware related:
- Try to reproduce with very few peripherals connected.
- Check your CPU temperature.
- The odds are good that a RAM stick is defective (this is what happened to me), so try another one.
- Do not say that you hate hardware and try to remember what it was like back in the other OS days...
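For the CPU temperature check in the list above, here is a minimal sketch that reads the kernel's thermal zones from sysfs. It assumes a reasonably recent kernel that exposes /sys/class/thermal/thermal_zone*/temp with values in millidegrees Celsius, which is not the case on every machine. The function name print_temps and its optional directory argument are illustrative only.

```shell
#!/bin/sh
# Print the temperature of every thermal zone in degrees Celsius.
# The sysfs files hold millidegrees, e.g. 54000 for 54 degrees.
# The optional argument (default /sys/class/thermal) only exists so
# the function can be pointed at a fake directory for testing;
# print_temps is a hypothetical helper name.
print_temps() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (this also covers
        # the case where the glob matched nothing).
        [ -r "$zone/temp" ] || continue
        read -r milli < "$zone/temp" 2>/dev/null || continue
        printf '%s: %d C\n' "${zone##*/}" "$((milli / 1000))"
    done
}

print_temps
```

If it prints nothing, your hardware probably reports temperatures some other way, for example through the BIOS setup screen.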
Good luck, because I know this is very annoying!