Weblog entry #4 for endecotp

Good ways to detect and restart failed daemons
Posted by endecotp on Sat 8 Mar 2008 at 23:45
Does anyone have any hints for automatically detecting and restarting daemons that have failed for some reason? I have a couple of concerns: my own code that could just crash, and "innocent" daemons that are killed by the OOM killer when memory is low (because my own code got into a loop eating memory...). In both cases these are things that are started by /etc/init.d scripts and there are probably /var/run/*.pid files for them.

I could easily knock together a cron script that would use the pid files to check for daemons that have gone away, and restart them. Of course automatic restart would not always be appropriate, but I'm thinking about the best thing to do when I'm on holiday and human intervention could be weeks away! I was wondering if there is an existing utility that would do this, maybe even one tied in to start-stop-daemon, for example, or to the metadata at the start of the init.d files.
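For reference, a minimal sketch of the kind of cron script I have in mind, in Python. The function names and the command-line usage are made up for illustration, and it assumes each daemon writes a single pid to its pid file. It treats a missing pid file as "dead", which would be wrong for a daemon that was stopped on purpose, and it cannot tell whether a pid has been reused by an unrelated process.

```python
#!/usr/bin/env python3
"""Watchdog sketch: restart daemons whose pid file points at a dead process."""
import errno
import os
import subprocess
import sys


def read_pid(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Probe for a live process with signal 0, which checks but delivers nothing."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM: the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def check_and_restart(daemons, run=subprocess.call):
    """Run the restart command for each daemon whose pid is missing or dead.

    daemons maps a pid file path to a restart command (a list of arguments).
    Returns the pid files for which a restart was attempted.
    """
    restarted = []
    for pidfile, command in daemons.items():
        pid = read_pid(pidfile)
        if pid is None or not is_running(pid):
            run(command)
            restarted.append(pidfile)
    return restarted


if __name__ == "__main__":
    # Hypothetical usage from cron:
    #   watchdog.py /var/run/mydaemon.pid /etc/init.d/mydaemon restart
    if len(sys.argv) >= 3:
        check_and_restart({sys.argv[1]: sys.argv[2:]})
```

Run from cron every few minutes, this would notice a daemon that has gone away and call its init.d script, but it has no notion of back-off if the daemon keeps dying straight after each restart.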

Any ideas anyone, before I roll my own?

 

Comments on this Entry

Posted by mwr (24.158.xx.xx) on Sun 9 Mar 2008 at 00:56
For the specific task of restarting processes, monit. For the more general case of restarting processes plus other configuration tasks, puppet.

Posted by Steve (82.32.xx.xx) on Sun 9 Mar 2008 at 12:03
Using monit is definitely the way forward.

Steve

Posted by Anonymous (201.208.xx.xx) on Mon 10 Mar 2008 at 03:08
I'll try it; my clamav has been dying for a few months now and I don't know why.

Posted by dkg (216.254.xx.xx) on Mon 10 Mar 2008 at 07:18
You could also use runit as a process supervision suite. It's a beautifully clean design and implementation, and the maintainer (Gerrit Pape) is reasonable, responsive and easy to work with.

Posted by endecotp (86.6.xx.xx) on Mon 10 Mar 2008 at 11:10
Thanks for the monit suggestions. I wasn't aware of its ability to restart things. I'll look into it some more.

Posted by drgraefy (128.59.xx.xx) on Thu 13 Mar 2008 at 20:10
I agree with dkg that the proper way to manage daemons is with runit. It handles daemon supervision from top to bottom, including writing to logs, and can automatically restart daemons if they die.
