Help: Backporting nscd to sarge or upgrading glibc?
Posted by lollipop on Mon 24 Apr 2006 at 09:23
Our current setup consists of 25 or so servers running Debian sarge. I use OpenLDAP for managing authentication and user information. Everything works quite well when the LDAP server is up and running; however, whenever it goes down, havoc ensues across all our servers.
I assumed that nscd (Name Service Caching Daemon) would cache the important information, allowing our servers to continue to function during a brief LDAP outage. However, nscd on my Sarge servers was not caching any data.
After some investigation with strace I discovered that the resolving library was looking for the nscd socket at /var/run/nscd/socket. In Sarge, nscd creates the socket file at /var/run/.nscd_socket, and there does not seem to be a way to tell the daemon where to create the socket. This problem is fixed in unstable, but as a workaround for Sarge I just added a symlink at /var/run/nscd/socket pointing to the real nscd socket.
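For anyone who wants to script that symlink workaround across a batch of machines, here is a minimal Python sketch. The function name link_nscd_socket is made up for illustration, and the two paths in the comments are the ones reported above; the demo at the bottom runs in a scratch directory so it can be tried without root.

```python
import os
import tempfile


def link_nscd_socket(real_socket, expected_socket):
    """Point expected_socket at real_socket with a symlink.

    Returns True if a link was created, False if something already
    exists at expected_socket (in which case nothing is changed).
    """
    # lexists() is also true for a dangling symlink, which exists() misses.
    if os.path.lexists(expected_socket):
        return False
    # Create the parent directory (e.g. /var/run/nscd) if it is missing.
    os.makedirs(os.path.dirname(expected_socket), exist_ok=True)
    os.symlink(real_socket, expected_socket)
    return True


if __name__ == "__main__":
    # On an affected Sarge host (as root) the call would presumably be:
    #   link_nscd_socket("/var/run/.nscd_socket", "/var/run/nscd/socket")
    # Here the same layout is reproduced under a temporary directory.
    scratch = tempfile.mkdtemp()
    real = os.path.join(scratch, ".nscd_socket")
    open(real, "w").close()  # stand-in for the real socket file
    expected = os.path.join(scratch, "nscd", "socket")
    created = link_nscd_socket(real, expected)
    print("created symlink" if created else "path already present, left alone")
```

The function refuses to touch an existing path, so running it again after a later nscd upgrade that creates the socket in the right place should be harmless.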
So now 'nscd --statistics' showed that data was indeed being cached and that applications were successfully querying nscd. Unfortunately, running 'lsof -i @:ldap' on my web machines still showed connections to our LDAP server from the apache process.
This was due to my nsswitch.conf setup:
passwd: files ldap
group: files ldap
shadow: files ldap
hosts: files dns
networks: files
protocols: files
services: files
ethers: files
rpc: files
netgroup: nis
By default, group membership is checked against every database listed on the 'group:' line. So every time apache spawns a process, it queries both files and ldap to determine which groups the apache user (www-data) belongs to, and nscd was not caching this query.
I came to find out that the enumeration of groups is not cached by the Sarge version of nscd, 2.3.2, which makes nscd useless for caching LDAP data in Sarge.
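To make it easier to see which sources a group membership lookup will walk through on a given host, here is a rough Python sketch that parses nsswitch.conf text and lists the sources for the 'group:' database. The helper names parse_nsswitch and group_sources are hypothetical, and the sample text is simply the configuration quoted above; it only reads the file format and does not query nscd or LDAP.

```python
import re

SAMPLE = """\
passwd: files ldap
group: files ldap
shadow: files ldap
hosts: files dns
networks: files
protocols: files
services: files
ethers: files
rpc: files
netgroup: nis
"""


def parse_nsswitch(text):
    """Map each database name to its ordered list of source names."""
    databases = {}
    for line in text.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, sources = line.split(":", 1)
        # Drop action items such as [NOTFOUND=return]; keep source names only.
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        databases[name.strip()] = sources.split()
    return databases


def group_sources(text):
    """Sources consulted, in order, for the 'group' database."""
    return parse_nsswitch(text).get("group", [])


if __name__ == "__main__":
    # To inspect a live system, pass open("/etc/nsswitch.conf").read() instead.
    print(group_sources(SAMPLE))
```

With the configuration from this post, ldap shows up in the list, which matches the LDAP connections seen from the apache processes.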
Group caching was added in 2.3.3 according to this nscd changelog.
I would really like to upgrade nscd to the latest version on my Sarge boxes, so I can obtain this functionality. I haven't been able to find a backport of nscd. Upgrading to the latest in unstable would necessitate me upgrading glibc as well.
I was warned that it is not advisable to upgrade to a different version of glibc than what is in stable. Is this still the case?
I tried compiling the latest version of the nscd package on Sarge, but it appears that you have to compile glibc as well. Is there a way to compile nscd against the version of glibc in Sarge?
Any suggestions other than nscd for solving the LDAP caching problem?
That's very strange and in my opinion, it should be marked as a critical bug.
[ Parent | Reply to this comment ]
I don't know a lot about nscd. But if it is supposed to cache the data and it is not caching it at all ... that's a BIG problem.
[ Parent | Reply to this comment ]
I had looked at this before, but got sidetracked trying to determine exactly what was happening with nscd.
I just gave it a whirl and it seems to work well when the LDAP server is inaccessible.
It does not, however, handle the caching problem, which means individual servers can still flood my LDAP servers with queries.
I think nss-updatedb would work best in combination with a working nscd package.
[ Parent | Reply to this comment ]
I mean, it is standard practice wherever I've been to have 3 NIS servers, 3 ADS servers, 3 DNS servers, and to make all critical network services triply redundant.
Not that this solves the bug, but it should make it a lot less obvious.
[ Parent | Reply to this comment ]
What happens to your three redundant ldap servers when a firewall change prevents your dmz web servers from querying your internal ldap servers?
How do your redundant ldap servers help when a commit to the master ldap server database breaks group queries and this broken entry is replicated to all your slave ldap servers?
What happens when your apache servers start spawning hundreds of threads because of a broken CGI script, flooding all your LDAP servers with queries and causing LDAP timeouts across your whole LAN?
I have experienced all of these problems, and a working nscd program would help to avoid them.
[ Parent | Reply to this comment ]
Depends: libc6 (>= 2.3.5-1), libclamav1 (>= 0.86.2), libesmtp5 (>= 0.8.8), libgcc1 (>= 1:4.0.0-9), libstdc++6 (>= 4.0.1), zlib1g (>= 1:1.2.1),
I did this by installing the etch versions, and it works fine. I have two different systems doing this, and I'm still undecided about whether I should continue upgrading as new etch versions are released. My gut feeling is no, but I may change this depending on security issues.
I also pin shorewall and adzapper to testing for all my sarge systems.
[ Parent | Reply to this comment ]
I was unable to solve the caching problem, because even the latest version of nscd has problems with invalidating group data too early (bug 173019). So I decided it wasn't worth attempting to get the latest nscd sources to compile against sarge's glibc.
Instead I focused on cases where the ldap server was inaccessible. I used nss-updatedb to permanently cache the group and user data. This solution works well, since our ldap database is fairly small. In addition I added these values to my libnss-ldap.conf:
timelimit 5
bind_timelimit 1
bind_policy soft
which helps to prevent queries to ldap from just hanging indefinitely.
[ Parent | Reply to this comment ]
We are using nscd on 20+ Debian sarge machines with user data stored in LDAP and I have just checked it works without any problems (passwd + group data).
Please provide more information, because I would be tempted to say that your claims are not valid.
[ Parent | Reply to this comment ]
I did a fair amount of investigative work and I believe my claims are correct.
With regard to nscd not caching data, please see my comment on Debian bug 345168. Perhaps the nscd sockets are created correctly under some circumstances, but they were not during my install attempts.
As to my claim that nscd is not caching initgroup entries, this is fairly well documented in Debian and Red Hat bugs, as well as in the glibc changelog. Even the ability to cache initgroup data in the latest versions is still broken (bug 173019).
[ Parent | Reply to this comment ]