Process exit status explored: SMF and what does shell’s $? represent?

Introduction

All UNIX shells I’ve seen allow access to the exit value of the last command through the $? special parameter. The ksh(1) man page neatly states:

The value of a simple-command is its exit status if it terminates
normally. If it terminates abnormally due to receipt of a signal,
the value is the signal number plus 128.  See signal.h(3HEAD) for
a list of signal values. Obviously, normal exit status values 129
to 255 cannot be distinguished from abnormal exit caused by
receiving signal numbers 1 to 127.
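The mapping the man page describes can be sketched in a few lines of C. This is only an illustration of the documented rule, not code taken from any shell; shell_status() is a hypothetical name, and the -1 return for "neither exited nor signalled" is my own choice.

```c
#include <sys/wait.h>

/*
 * Hypothetical helper: map a wait status to the value a shell would
 * present in $?, following the rule quoted from ksh(1).
 */
int
shell_status(int status)
{
	if (WIFEXITED(status))
		return (WEXITSTATUS(status));	/* normal termination */
	if (WIFSIGNALED(status))
		return (128 + WTERMSIG(status));	/* killed by a signal */
	return (-1);	/* stopped or continued: not a termination */
}
```

An exit status of 130 and death by signal 2 both come out as 130, which is exactly the overlap the man page warns about.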

How is exit status set? And how much can we derive from it? I started asking these questions while trying to understand SMF’s handling of process exit status.

SMF digression

I was looking at the SMF method failure messages. There are three method failed error paths in usr/src/cmd/svc/startd/method.c method_run():

signal "%s: Method \"%s\" failed due to signal %s.\n"

exit() "%s: Method \"%s\" failed with exit status %d.\n"

other "%s: Method \"%s\" failed with exit status %d.\n"

What confused me was under what conditions the third error path can ever be taken. Note the ambiguity: the last two error messages are identical.

Exit status

Exit status is documented in the wait(3c) man page. This is what we have:

In the  following,  status  is  the  object  pointed  to  by
stat_loc:
o  If the child process terminated due to an _exit() call,
the  low  order 8 bits of status will be 0 and the high
order 8 bits will contain the low order 7 bits  of  the
argument  that the child process passed to _exit(); see
exit(2).
o  If the child process terminated due to  a  signal,  the
high order 8 bits of status will be 0 and the low order
7 bits will contain the number of the signal that caused
the  termination.  In  addition, if  WCOREFLG is set, a
"core   image"   will   have   been    produced;    see
signal.h(3HEAD) and wait.h(3HEAD).

In other words:

If the lower 8 bits are zero, the process called exit() or fell off the end of main(), and the exit status is in the top 8 bits.

If the lower 8 bits are not zero and the upper 8 bits are zero, the process took a signal. The signal number is in the lower 7 bits; if the top bit of the lower 8 bits (WCOREFLG) is also set, the process dumped core as well.
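Written out by hand, without the wait.h macros, that reading of the status word might look like the sketch below. The struct, the enum and the function name are all hypothetical; only the bit layout comes from the man page text, and anything that fits neither documented case is simply reported as "other".

```c
/* Hypothetical names; the bit layout follows the wait(3C) text above. */
enum how { HOW_EXITED, HOW_SIGNALED, HOW_OTHER };

struct decoded {
	enum how how;
	int value;	/* exit status, signal number, or raw upper byte */
	int core;	/* non-zero if the core flag (0x80) was set */
};

struct decoded
decode_status(int status)
{
	struct decoded d = { HOW_OTHER, 0, 0 };
	int lo = status & 0xFF;		/* lower 8 bits */
	int hi = (status >> 8) & 0xFF;	/* upper 8 bits */

	if (lo == 0) {
		/* exit(): status lives in the upper byte */
		d.how = HOW_EXITED;
		d.value = hi;
	} else if (hi == 0) {
		/* signal: number in the low 7 bits, core flag on top */
		d.how = HOW_SIGNALED;
		d.value = lo & 0x7F;
		d.core = (lo & 0x80) != 0;
	} else {
		/* both bytes non-zero: neither documented case applies */
		d.value = hi;
	}
	return (d);
}
```

For example, decode_status(0x8B) reports signal 11 with a core dump, while decode_status(0x0300) reports a normal exit with status 3.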

What the man page doesn’t document is the case where both the lower 8 bits and the upper 8 bits are non-zero. The exit status is set by wstat():

/*
 * convert code/data pair into old style wait status
 */
int
wstat(int code, int data)
{
	int stat = (data & 0377);

	switch (code) {
	case CLD_EXITED:
		stat <<= 8;
		break;
	case CLD_DUMPED:
		stat |= WCOREFLG;
		break;
	case CLD_KILLED:
		break;
	case CLD_TRAPPED:
	case CLD_STOPPED:
		stat <<= 8;
		stat |= WSTOPFLG;
		break;
	case CLD_CONTINUED:
		stat = WCONTFLG;
		break;
	default:
		cmn_err(CE_PANIC, "wstat: bad code");
		/* NOTREACHED */
	}
	return (stat);
}

If both the lower 8 bits and the upper 8 bits are non-zero, it looks like a CLD_STOPPED or CLD_TRAPPED. Reading the source further, the upper 8 bits will be the signal that caused it.
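One way to see such a status from user code is to have a child stop itself and collect it with waitpid() and WUNTRACED. This is a minimal sketch assuming a POSIX system; stopped_child_status() is a hypothetical name, and nothing here says whether the wait call SMF uses would ever report a stopped child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that stops itself and return the raw
 * wait status the parent sees for the stop, or -1 on error.
 */
int
stopped_child_status(void)
{
	int status = -1;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		(void) raise(SIGSTOP);	/* child stops here */
		_exit(0);
	}
	/* WUNTRACED asks waitpid() to report stopped children too */
	if (waitpid(pid, &status, WUNTRACED) != pid)
		status = -1;
	/* clean up: kill the stopped child and reap it */
	(void) kill(pid, SIGKILL);
	(void) waitpid(pid, NULL, 0);
	return (status);
}
```

On the returned value WIFSTOPPED() should be true and WSTOPSIG() should give SIGSTOP, with both the lower and the upper byte non-zero.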

What you also have to bear in mind is that applications interpret the status returned from wait(3c). For ksh(1) this is as documented above, but what of SMF?

SMF handling of exit status

The SMF code looks like this (with comments from me):

if (!WIFEXITED(ret_status)) {
	/*
	 * WIFEXITED tests for ((int)((stat)&0xFF) == 0).
	 * We didn't exit cleanly, so let's find out why.
	 */
	if (WIFSIGNALED(ret_status)) {
		/*
		 * WIFSIGNALED does this test:
		 *   ((int)((stat)&0xFF) > 0 && (int)((stat)&0xFF00) == 0)
		 * We already know the first test is non-zero. 0xFF is
		 * an int constant, so ANDing it with a signed int gives
		 * an int between 0 and 255, which is never negative.
		 * The (int) cast is therefore a no-op and "> 0" means
		 * the same as "!= 0" here.
		 * The second test is more interesting as it relates to
		 * the overloading of the exit status. If the upper 8
		 * bits are zero then it's a simple signal.
		 *
		 * Log: "%s: Method \"%s\" failed due to signal %s.\n"
		 * Signal is derived using WTERMSIG(), see
		 * wait.h(3HEAD). Currently that's:
		 *   #define WTERMSIG(stat) ((int)((stat)&0x7F))
		 */
	} else {
		/*
		 * We can only reach this clause if we have something
		 * more complex than a signal, like the child stopping.
		 *
		 * Log: "%s: Method \"%s\" failed with exit status %d.\n"
		 *   WEXITSTATUS(ret_status)
		 * The exit status is: ((int)(((stat)>>8)&0xFF))
		 * As noted above, I believe the value is the signal
		 * that caused the CLD_STOPPED or CLD_TRAPPED.
		 */
	}
	/* ** Jump out ** */
}
/* Normal exit */
*exit_code = WEXITSTATUS(ret_status);
if (*exit_code != 0) {
	/* Log: "%s: Method \"%s\" failed with exit status %d.\n" */
}
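To make the overlap concrete, here is a small stand-in for that control flow. It is not the SMF source: describe_failure() and same_message() are hypothetical names, the service and method name prefix is left out, and the signal is printed as a number rather than a name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical stand-in for the three error paths above. Writes the
 * message into buf and returns 1 on failure, 0 on success.
 */
int
describe_failure(int status, char *buf, size_t len)
{
	buf[0] = '\0';
	if (!WIFEXITED(status)) {
		if (WIFSIGNALED(status))
			(void) snprintf(buf, len,
			    "failed due to signal %d", WTERMSIG(status));
		else
			(void) snprintf(buf, len,
			    "failed with exit status %d", WEXITSTATUS(status));
		return (1);
	}
	if (WEXITSTATUS(status) != 0) {
		(void) snprintf(buf, len,
		    "failed with exit status %d", WEXITSTATUS(status));
		return (1);
	}
	return (0);
}

/* Returns 1 if two wait statuses produce byte-identical messages. */
int
same_message(int a, int b)
{
	char ma[64], mb[64];

	(void) describe_failure(a, ma, sizeof (ma));
	(void) describe_failure(b, mb, sizeof (mb));
	return (strcmp(ma, mb) == 0);
}
```

With the encoding produced by wstat(), a child that exited with status 23 (23 << 8) and a child that was stopped by signal 23 ((23 << 8) | 0x7F) end up with the same message.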

Conclusion

There’s an ambiguity in the SMF method failed code: the log message doesn’t distinguish between a normal exit with a non-zero status and the child being stopped or trapped.

As far as shells are concerned, each shell’s man page needs to be checked to see how it handles the exit status and what value it presents in $?.
