Using DTrace to observe nscd(1M) and nis_cachemgr(1M) interaction

I wanted to know why the NIS+ cache manager (nis_cachemgr(1M), aka NCM) was struggling on one of our internal Sun Ray servers. It appeared to be wasting time looking up non-existent directories. This is a problem for NIS+ as:

  • The NCM is single threaded, so other callers must wait,
  • the directory cache managed by the NCM doesn't cache negative hits, and
  • our internal NIS+ namespace contained a bogus domain with a single unresponsive server.

The first two aren't so bad, as the NCM deals with them quite quickly; the real problem was the last point. For some reason we kept trying to contact that broken domain.

The NCM doesn't know why it's being asked to bind a directory; only the caller knows why, and this is where DTrace came in. I knew the most common caller would be name services, and that means nscd(1M), the name service cache daemon. I wrote a D script that extracts the name being looked up:

#!/usr/sbin/dtrace -qs
#pragma D option nspec=32

/*
 * A small table of the NIS error codes we expect to see, indexed by
 * the numeric status.
 */
BEGIN
{
    nis_error[0] = " Success";
    nis_error[5] = " Name unreachable";
    nis_error[20] = " No such name";
}

/*
 * nscd is entering the NIS+ back end.  Start a speculation and record
 * the name being looked up (arg3).
 */
pid$1::_nss_nisplus_expand_lookup:entry
{
    self->timestamp = timestamp;
    self->line = 0;
    self->spec = speculation();
    speculate(self->spec);
    self->lookup = copyinstr(arg3);
    printf("%016.16#x %02d Looking up: %s\n",
        self->timestamp, self->line++, self->lookup);
}

/*
 * The lookup finished without calling the NCM: nothing of interest,
 * so throw the speculation away.
 */
pid$1::_nss_nisplus_expand_lookup:return
/self->spec && !self->called_ncm/
{
    discard(self->spec);
    self->spec = 0;
}

/*
 * The lookup called the NCM: print the total elapsed time and commit
 * the speculation so everything buffered against it reaches the output.
 */
pid$1::_nss_nisplus_expand_lookup:return
/self->spec && self->called_ncm/
{
    speculate(self->spec);
    this->time = timestamp - self->timestamp;
    this->secs = this->time / 1000000000;
    this->msecs = (this->time - this->secs * 1000000000) / 1000000;
    printf(
        "%016.16#x %02d Total time for %s: %2d.%03d secs\n%016.16#x %02d\n",
        self->timestamp, self->line++,
        self->lookup, this->secs, this->msecs,
        self->timestamp, self->line++);
    commit(self->spec);
    self->spec = 0;
    self->called_ncm = 0;
}

/*
 * Record each nis_list() call made during the lookup, together with
 * the NIS+ name being listed (arg0).
 */
pid$1:libnsl:nis_list:entry
/self->spec/
{
    speculate(self->spec);
    printf("%016.16#x %02d Calling nis_list(): %s\n",
        self->timestamp, self->line++, copyinstr(arg0));
}

/*
 * nscd is asking the NCM to bind a directory.  arg0 points at the
 * directory-name pointer (the 4-byte copyin assumes a 32-bit nscd).
 * Note when the call started and flag this speculation as worth
 * keeping.
 */
pid$1:libnsl:nis_cache_bind_replica_2:entry
/self->spec/
{
    self->called_ncm = 1;
    this->dnamep = *((uint32_t *)copyin(arg0, 4));
    self->dname = copyinstr(this->dnamep);
    self->ncm_start = timestamp - self->timestamp;
}

/*
 * The NCM call returned.  On a pid return probe arg1 is the return
 * value, here a pointer to a result whose first 32-bit word is the
 * NIS error status.  Print when the call started relative to the
 * beginning of the lookup, the directory name and the error.
 */
pid$1:libnsl:nis_cache_bind_replica_2:return
/self->spec/
{
    speculate(self->spec);
    this->err = *((uint32_t *)copyin(arg1, 4));
    this->start_secs = self->ncm_start / 1000000000;
    this->start_msecs = (self->ncm_start - this->start_secs * 1000000000) /
        1000000;
    printf("%016.16#x %02d %2d.%03d %26.26s (%d%s)\n",
        self->timestamp, self->line++,
        this->start_secs, this->start_msecs,
        self->dname, this->err, nis_error[this->err]);
}

It’s run like this:

# dtrace -o dtrace.out -q -s nis_list_cache_calls.d $(pgrep nscd)

The interesting thing for me was using speculations. We start a speculation when nscd calls the NIS+-specific code and record which lookup is taking place. We trace calls to nis_cache_bind_replica_2() and mark that speculation as worthy of interest. When we return from the NIS+ code we check whether the NCM was called: if so we commit() the speculation, otherwise we discard() it.
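
Stripped of the detail, the speculation pattern looks like this (a minimal sketch; outer_function and inner_function are placeholders for whichever entry and return points you care about, not real functions):

pid$1::outer_function:entry
{
    /* Buffer everything about this call until we know its fate. */
    self->spec = speculation();
    speculate(self->spec);
    printf("details we may or may not want to keep\n");
}

pid$1::inner_function:entry
/self->spec/
{
    /* The condition that makes this call worth keeping. */
    self->interesting = 1;
}

pid$1::outer_function:return
/self->spec && !self->interesting/
{
    /* Dull: throw the buffered output away. */
    discard(self->spec);
    self->spec = 0;
}

pid$1::outer_function:return
/self->spec && self->interesting/
{
    /* Interesting: make the buffered output visible. */
    commit(self->spec);
    self->spec = 0;
    self->interesting = 0;
}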

One thing I learnt is that speculation buffers, like other DTrace buffers, are per-CPU, so the traced output may not be in time order. I had to add a timestamp prefix and a line number to the output to make sense of it.
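
Because the timestamp is fixed-width hex and the line number increments within each lookup, the output can be put back in order after the fact with something like this (a post-processing step I'm assuming here, not part of the script above):

# sort -k 1,1 -k 2,2n dtrace.out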

Here’s some sample output:

Each line shows the timestamp of the lookup, a line number within that lookup, and the action; the NCM calls also show their start time relative to the beginning of the lookup.
0x0009c35e1ad134f0 00 Looking up: mail.google.com
0x0009c35e1ad134f0 01 Calling nis_list(): [name=mail]ipnodes.org_dir.google.com
0x0009c35e1ad134f0 02  0.000        org_dir.google.com. (20 No such name)
0x0009c35e1ad134f0 03  0.001 ipnodes.org_dir.google.com (20 No such name)
0x0009c35e1ad134f0 04  0.003 org_dir.google.com.eu.cte. (20 No such name)
0x0009c35e1ad134f0 05  0.004 ipnodes.org_dir.google.com (20 No such name)
0x0009c35e1ad134f0 06  0.005 org_dir.google.com.cte.sun (20 No such name)
0x0009c35e1ad134f0 07  0.006 ipnodes.org_dir.google.com (20 No such name)
0x0009c35e1ad134f0 08  0.008 org_dir.google.com.sun.com (20 No such name)
0x0009c35e1ad134f0 09  0.009 ipnodes.org_dir.google.com (20 No such name)
0x0009c35e1ad134f0 10 Total time for mail.google.com:  0.010 secs
...
0x0009c3697101c690 00 Looking up: foo.swiss
0x0009c3697101c690 01 Calling nis_list(): [name=foo]hosts.org_dir.swiss
0x0009c3697101c690 02  0.000 org_dir.swiss.eu.cte.sun.c (20 No such name)
0x0009c3697101c690 03  0.001 hosts.org_dir.swiss.eu.cte (20 No such name)
0x0009c3697101c690 04  0.002 org_dir.swiss.cte.sun.com. (5 Name unreachable)
0x0009c3697101c690 05  5.025 hosts.org_dir.swiss.cte.su (5 Name unreachable)
0x0009c3697101c690 06 10.045     org_dir.swiss.sun.com. (20 No such name)
0x0009c3697101c690 07 10.048 hosts.org_dir.swiss.sun.co (20 No such name)
0x0009c3697101c690 08 Total time for foo.swiss: 10.050 secs

In the first example someone is trying to resolve mail.google.com, and we can see NIS+ following its search path since the name isn't fully qualified. The whole lookup took 10ms; not too shabby.

In the second example I tried to resolve foo.swiss, which hits the broken domain. That takes a far more significant 10 seconds: each attempt to bind a directory in that domain has a 5-second timeout, and the lookup makes two such attempts (one for the directory itself, one for the hosts table), as the two 5-second gaps in the output show.

It turned out that the reason we were having this problem is that our Sun Rays now allow direct access to web sites, i.e. no proxy, which means a far more varied set of name lookups. As nisplus comes before dns in our /etc/nsswitch.conf, all of these names are checked with NIS+ first. The somewhat hesitant name service was the combined result of:

  • the wide range of names being looked up,
  • the fact that they are not usually fully qualified,
  • the short negative cache time in nscd(1M),
  • the NIS+ search path,
  • the lack of NIS+ caching of non-existent directories,
  • the single-threaded NCM, and
  • the broken NIS+ domain.

The resolution was to fix the broken domain and/or put dns first in /etc/nsswitch.conf. Some people argue that if you're using DNS for host name resolution then you shouldn't use NIS+ for host name resolution as well; this is one example of how that advice pays off.
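
For illustration, the nsswitch.conf change is just a reordering of the sources on the relevant lines (a sketch; a real file carries many other entries):

# Before: every lookup tries NIS+, and its search path, first
hosts:      nisplus dns
ipnodes:    nisplus dns

# After: DNS answers first, NIS+ is only a fallback
hosts:      dns nisplus
ipnodes:    dns nisplus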

Another day, more things learnt, I love this job 🙂
