Remotely rebooting Debian GNU/Linux machines

Posted by docelic on Mon 3 Jan 2005 at 18:26

When you reboot a system remotely after a kernel upgrade, it's a good idea to have a safety net. LILO gives you just that: the machine reboots automatically, back into the old kernel, if the new one panics or fails to come up properly.

When you reboot after an upgrade there are two things you want to make sure of:

  - That the kernel will boot properly (mount the root filesystem)
  - That the network interfaces will go up as expected

We will take care of the first problem by supplying the panic= argument at boot, so that the system reboots automatically if the kernel panics. We'll take care of the second problem by running a special script that performs a network connectivity test.

There are other things you might want to check for. For example, you could check that the SSH daemon is running and accepting connections, or you could perform some site-specific checks. These ideas are not implemented in this article. If you enhance my procedure, please notify me of the results, because at some future time I might create a Debian package dedicated to setting things up for more reliable remote reboots.

Here are the steps needed:

* Add a special bootloader entry to /etc/lilo.conf (LILO example):

  image=/vmlinuz.new
    label=Newkernel
    append="panic=5 newkernel"
    read-only
    optional
This creates a separate image entry for testing the new kernel. panic=5 makes the kernel reboot automatically 5 seconds after a panic, and newkernel is an arbitrary "newinit"-like name we chose, so that we can later check whether we're in the test phase or a normal run.
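Since newkernel ends up on the kernel command line, a script can look for it there. The following is a minimal sketch of such a check, assuming the marker is passed exactly as in the append= line above; the function name is_test_boot is made up for this illustration.

```shell
#!/bin/sh
# Return 0 if the given kernel command line contains the word "newkernel".
# The function name is hypothetical; the marker matches the append= line.
is_test_boot() {
    # Pad with spaces so only a whole word matches, not a substring.
    case " $1 " in
        *" newkernel "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a running system:
#   is_test_boot "$(cat /proc/cmdline)" && echo "this is a test boot"
```

Passing the command line as an argument, rather than reading /proc/cmdline inside the function, makes the check easy to try out on a machine that is not in the test phase.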

* Put the testnet script in /etc/init.d/ and activate it:

    update-rc.d testnet start 40 S .
This installs a script that tests network connectivity after the reboot. It handles cases where the kernel does not panic (it at least mounts the root filesystem), but then something goes wrong and the network doesn't come up properly (for example, because of an incorrect kernel driver module setup). The script can be found at my website, or locally here.
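To give an idea of what such a check can look like, here is a rough sketch. It is not the testnet script itself, only an illustration under a few assumptions: the host to ping (TEST_HOST, set here to a documentation address), the number of attempts, and the function names are all placeholders.

```shell
#!/bin/sh
# Hypothetical sketch of a connectivity check; NOT the author's testnet script.
# TEST_HOST is a placeholder (192.0.2.1 is a documentation-only address).
TEST_HOST=${TEST_HOST:-192.0.2.1}
PING=${PING:-ping}    # overridable, so the logic can be tried without a network

# Succeed if the host answers at least one of $2 ping attempts.
network_ok() {
    host=$1
    tries=$2
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$PING" -c 1 "$host" >/dev/null 2>&1 && return 0
        i=$((i + 1))
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Act only on a test boot, recognised by the "newkernel" marker.
# $1 is the kernel command line; reboot if the network is unreachable.
run_check() {
    case " $1 " in
        *" newkernel "*) ;;
        *) return 0 ;;
    esac
    network_ok "$TEST_HOST" 3 || "${REBOOT:-/sbin/reboot}"
}

# An init script would call: run_check "$(cat /proc/cmdline)"
```

Because lilo -R only affects a single boot, a reboot triggered this way brings the machine back up with the old default kernel.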

* If you installed the kernel image from the .deb package, make
sure the /vmlinuz link still points to the old kernel, and
/vmlinuz.new to the new one.
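A quick way to see where the links point is sketched below. It assumes both links live in the same directory (/ in the layout described here); the function name show_kernel_links is hypothetical.

```shell
#!/bin/sh
# Print where each kernel symlink points. The optional argument is the
# directory holding the links ("/" by default, as in this article).
show_kernel_links() {
    root=${1:-/}
    for link in vmlinuz vmlinuz.new; do
        if [ -L "$root/$link" ]; then
            echo "$link -> $(readlink "$root/$link")"
        else
            echo "$link is not a symlink"
        fi
    done
}

# Example: show_kernel_links /
```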

* Also make sure there's an account left open for the colo
facility personnel to access the system if you mess it up.
Use adduser to create the account "support", then add it to
the sudoers file with passwordless access:

    echo "support ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
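A syntax error in /etc/sudoers can lock everyone out of sudo, so it may be worth checking the line before appending it. The sketch below does that with visudo's check-only mode on a temporary file; the function name check_sudoers_line is made up, and it assumes visudo is in your PATH.

```shell
#!/bin/sh
# Check a candidate sudoers line for syntax errors before using it.
# Returns visudo's exit status (0 means the line parsed cleanly).
# The function name is hypothetical.
check_sudoers_line() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    status=0
    # -c: check only, -q: quiet, -f: check this file instead of /etc/sudoers
    visudo -c -q -f "$tmp" || status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (run as root), appending only if the check passes:
#   line='support ALL=(ALL) NOPASSWD: ALL'
#   check_sudoers_line "$line" && echo "$line" >> /etc/sudoers
```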

* Install the new lilo.conf (which we modified in step 1) by running
lilo, and make the test kernel (the Newkernel image) the default for
the next boot only:

    lilo
    lilo -R newkernel

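* Reboot and see how it plays out.
If the new kernel fails, the machine reboots into the old default kernel, since lilo -R applies to one boot only.
If you get into the new kernel, adjust the /vmlinuz and /vmlinuz.old symlinks appropriately, re-run lilo, and reboot once again into the new image. It is now the default, so no lilo -R is needed.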
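One way to adjust the symlinks is sketched below, assuming the layout used in this article: /vmlinuz points to the old kernel and /vmlinuz.new to the new one. The function name promote_new_kernel is made up for this example.

```shell
#!/bin/sh
# Make the tested kernel the default: vmlinuz.old takes over the current
# target of vmlinuz, and vmlinuz takes over the target of vmlinuz.new.
# The optional argument is the directory holding the links ("/" by default).
promote_new_kernel() {
    root=${1:-/}
    old=$(readlink "$root/vmlinuz") || return 1
    new=$(readlink "$root/vmlinuz.new") || return 1
    # -n replaces an existing link instead of following it
    ln -sfn "$old" "$root/vmlinuz.old" &&
    ln -sfn "$new" "$root/vmlinuz"
}

# Example (run as root): promote_new_kernel / && lilo
```

Reading both targets before changing anything means the function does nothing if either link is missing.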

* When the new system comes up the second time, disable the 'support'
account.


The original location of this article is http://colt.projectgamma.com/debian/remote-reboot.html . It was written by Davor Ocelic (docelic+mail.inet.hr) to help with the maintenance of The Internet Hosting Cooperative ( http://www.hcoop.net ) machines. Improvements to the process are of course welcome; please send them to the author's email address above.

 

 


Posted by unixsurfer (80.56.xx.xx) on Tue 3 Jan 2006 at 17:20
Hello
Is it possible to do the above (just the panic part) with grub?
unixsurfer


Posted by Anonymous (153.5.xx.xx) on Fri 17 Feb 2006 at 18:28
Yes, of course. Probably every bootloader supports
passing options to the kernel image.

In grub, you would simply press 'e' for edit, then modify the
kernel parameters line to include panic=XX.

-docelic


Posted by Anonymous (69.156.xx.xx) on Tue 2 May 2006 at 16:15
You could also add the panic=5 to the end of the kernel line in /boot/grub/menu.lst


Posted by Anonymous (200.61.xx.xx) on Fri 17 Feb 2006 at 15:56
When you mention "and newkernel is an arbitrary "newinit"-like name we chose", is it the label of a stable/well tested kernel?

I'm not sure what I have to put in there. TIA. Robert.


Posted by Anonymous (153.5.xx.xx) on Fri 17 Feb 2006 at 18:30
You put there the exact string "newkernel". I thought this was obvious from the context. And the remark about "newinit" is meant to remind (those who are familiar with init) of the new init installation which uses special name "newinit". OK, anyway hope it's clear now, -docelic


Posted by t-om (58.10.xx.xx) on Wed 10 Oct 2007 at 18:22
It would be useful to give instructions also for rebooting a sick machine remotely for which complicated remote access (telnet, ssh, http) is no longer available because of the problems in the machine (e.g. temporarily broken disk access) but which is still touchable with single packets remotely. So far the best approach I have been able to think of would be doing it somehow with iptables. Better ideas?


Posted by docelic (213.147.xx.xx) on Wed 5 Dec 2007 at 15:52
It's all too complicated. If the machine isn't working, it isn't working. Buy a cheap KVM solution which hooks itself to the computer's keyboard/mouse port and has a separate IP address. By SSH-ing to that IP or opening a VNC client to it, you get "localhost" access to your server.

