Weblogs for endecotp
#4
Posted by endecotp on Sat 8 Mar 2008 at 23:45
Does anyone have any hints for automatically detecting and restarting daemons that have failed for some reason? I have a couple of concerns: my own code that could just crash, and "innocent" daemons that are killed by the OOM killer when memory is low (because my own code got into a loop eating memory...). In both cases these are things that are started by /etc/init.d scripts and there are probably /var/run/*.pid files for them.
I could easily knock together a cron script that would use the pid files to check for daemons that have gone away, and restart them. Of course automatic restart would not always be appropriate, but I'm thinking about the best thing to do when I'm on holiday and human intervention could be weeks away! I was wondering if there is any existing utility that would do this - maybe even tied in to start-stop-daemon, for example, or the metadata at the start of the init.d files.
Any ideas anyone, before I roll my own?
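For reference, a minimal sketch of the "roll my own" cron approach, assuming each daemon has a pid file and an init.d script that accepts "restart". The function names and any paths passed to them are illustrative, not an existing utility. It only checks that the recorded pid is alive, so it would not notice a pid that has been reused by an unrelated process.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog. pid_alive and check_daemon are
# hypothetical helper names, not part of any existing tool.

# Return 0 if the pid file exists and names a live process.
# kill -0 only tests for existence; run as root so that processes owned
# by other users are not reported as dead.
pid_alive() {
    pa_file=$1
    [ -r "$pa_file" ] || return 1
    pa_pid=
    read -r pa_pid < "$pa_file"
    case $pa_pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as dead
    esac
    kill -0 "$pa_pid" 2>/dev/null
}

# Restart a daemon through its init script if its pid file is missing
# or stale. Arguments: pid file, init script.
check_daemon() {
    cd_pidfile=$1
    cd_init=$2
    if pid_alive "$cd_pidfile"; then
        return 0
    fi
    echo "restarting via $cd_init (no live process for $cd_pidfile)"
    "$cd_init" restart
}
```

A script built on this would call check_daemon once per daemon, for example with /var/run/sshd.pid and the matching /etc/init.d script, and be run from cron every few minutes.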
#3
Posted by endecotp on Tue 9 Oct 2007 at 19:29
For a while I have been running a few Debian boxes from NFS root filesystems - mainly NSLU2s and similar little ARM things. But what I have never successfully done is to run them simultaneously from a single shared NFS root filesystem. I'm now trying to do this.
The idea is that they share a single root but have their own /tmp and /var. /home is also an NFS mount.
But there are files in /var/lib/dpkg, and elsewhere in /var, that presumably need to be kept in-sync with the contents of /bin, /usr and so on. So some parts of /var need to be shared too.
So, what I need is a list of the subdirectories of /var that should be shared (like /var/lib/dpkg) and the subdirectories that should be per-machine (like /var/tmp and /var/run). Is it safer to default to "shared" or "per-machine"?
I'm particularly worried about the dpkg and apt-related stuff, since its operation is "magic" to me and getting it wrong could leave everything horribly screwed up...
Here's the edited output of "tree -L 2" in /var :
|-- backups
|   |-- dpkg.status.0
|   |-- group.bak
|   |-- gshadow.bak
|   |-- infodir.bak
|   |-- passwd.bak
|   `-- shadow.bak
|-- cache
|   |-- apt
|   |-- debconf
|   |-- locate
|   `-- man
|-- lib
|   |-- apt
|   |-- aptitude
|   |-- dhcp3
|   |-- dpkg
|   |-- initramfs-tools
|   |-- initscripts
|   |-- logrotate
|   |-- misc
|   |-- urandom
|   |-- usbutils
|   `-- vim
|-- local
|-- lock
|-- log
|   [snip]
|-- mail
|-- opt
|-- run
|   |-- crond.pid
|   |-- crond.reboot
|   |-- klogd.pid
|   |-- motd
|   |-- network
|   |-- sshd
|   |-- sshd.pid
|   |-- syslogd.pid
|   `-- utmp
|-- spool
|   |-- cron
|   `-- mail -> ../mail
`-- tmp
[snip]
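One way to write the split down while working it out is a small lookup function. This is only a sketch: the "shared" list is an assumption (the directories that appear to hold package state for the shared /bin and /usr), the default of "per-machine" is also an assumption, and var_dir_policy is a made-up name. Directories such as /var/cache/debconf and /var/backups are deliberately left to the default because it is not clear where they belong.

```shell
# Hypothetical helper: print "shared" or "per-machine" for a path under /var.
# The lists are guesses to be refined, not verified Debian policy.
var_dir_policy() {
    case $1 in
        /var/lib/dpkg|/var/lib/dpkg/*|\
        /var/lib/apt|/var/lib/apt/*|\
        /var/lib/aptitude|/var/lib/aptitude/*|\
        /var/cache/apt|/var/cache/apt/*)
            echo shared ;;        # package state should match the shared /usr
        *)
            echo per-machine ;;   # runtime state, logs, locks, spools, tmp
    esac
}
```

For example, var_dir_policy /var/lib/dpkg prints "shared" and var_dir_policy /var/run prints "per-machine".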
#2
Posted by endecotp on Tue 10 Jul 2007 at 23:15
My box has a single network interface and two IP addresses. It is currently working fine with just the first of them configured.
Now I think that I should be able to bring up the second one using an alias, something like this:
ifconfig eth0:0 netmask <something> x.y.z.2
but I'm having trouble with the netmask setting. The netmask that I've been told applies to the two (neighbouring) addresses is 255.255.255.224. This is what the first interface is configured with. Trying to use the same netmask on the alias gives an error (below) - but I'm not sure that I should be using the same netmask for the alias anyway; one search result told me that the alias should always have the netmask 255.255.255.255. I tried that, but I still get this same error:
# ifconfig eth0:0 netmask 255.255.255.224 x.y.z.2
SIOCSIFNETMASK: Cannot assign requested address
Actually, although that looks like an error, the command does seem to bring up the interface alias, and it is pingable, but the alias ends up with a netmask of 255.0.0.0, which looks very wrong.
Can anyone explain what needs to happen here, and how to achieve it?
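One possible explanation, offered as an assumption rather than a confirmed diagnosis: ifconfig applies its arguments in the order given, so a netmask placed before the address is applied to an alias that has no address yet. That step fails, and the address set afterwards gets the default classful mask. The sketch below only builds the command with the address first. alias_cmd is a hypothetical helper, and 192.0.2.2 is a placeholder from the documentation address range, not the real address.

```shell
# Build (but do not run) an ifconfig command that sets the address
# before the netmask. Arguments: alias name, address, netmask.
alias_cmd() {
    ac_iface=$1
    ac_addr=$2
    ac_mask=$3
    echo "ifconfig $ac_iface $ac_addr netmask $ac_mask"
}

# Prints: ifconfig eth0:0 192.0.2.2 netmask 255.255.255.224
alias_cmd eth0:0 192.0.2.2 255.255.255.224
```

If the ordering really is the cause, running the printed command as root should bring the alias up with the intended netmask.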
One possible answer would be "just add another section to /etc/network/interfaces and ifup eth0:0". Maybe ifup can do all this automagically for me. But I hesitate to do that, because I don't have physical access to this machine. At the moment, if I accidentally "ifdown eth0" I can remotely power-cycle it. However, if I screw up editing /etc/network/interfaces then a power-cycle might not help. So I'm only going to touch that file when I'm really confident that I have it correct.
In fact that's an interesting thought: could /etc/network/interfaces, and other critical configuration files, be set up with known-good fallbacks, used a bit like those "screen mode changed, click here within 10 secs if you can read this." dialogs?
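A rough sketch of how such a "confirm or revert" scheme might look for a single file. Everything here is hypothetical: the function name, the ".known-good" suffix, and the use of a confirmation file that I would create by hand once I can log in again. The command that activates the configuration is supplied by the caller.

```shell
# Install a candidate config, activate it, and roll back unless a
# confirmation file appears within the timeout.
# Arguments: live config, candidate config, apply command,
#            confirmation file, timeout in seconds.
apply_with_fallback() {
    af_conf=$1
    af_new=$2
    af_apply=$3
    af_confirm=$4
    af_timeout=$5

    # Keep a copy of the current (presumed working) file.
    cp -p "$af_conf" "$af_conf.known-good" || return 1
    cp "$af_new" "$af_conf" || return 1
    sh -c "$af_apply"

    af_waited=0
    while [ "$af_waited" -lt "$af_timeout" ]; do
        if [ -e "$af_confirm" ]; then
            rm -f "$af_confirm"
            return 0              # confirmed: keep the new config
        fi
        sleep 1
        af_waited=$((af_waited + 1))
    done

    # No confirmation arrived: restore the old file and re-apply it.
    cp -p "$af_conf.known-good" "$af_conf"
    sh -c "$af_apply"
    return 1
}
```

For a network change this would have to run detached from the login session (under nohup, say), since the connection may drop while the new configuration is being applied.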
#1
Posted by endecotp on Fri 4 May 2007 at 00:06
Once upon a time I'm sure there used to be a "how to make your (Debian?) root filesystem read-only" document. As I recall, the guy responsible for it had submitted quite a few bugs / patches to move everything that prevented the root filesystem from being read-only into /var.
Now that I come to look for it, I can't find it.
If anyone knows what I'm talking about, please point me in the right direction. Thanks!