Increasing UNIX group membership – easy, surely?

Now that OpenSolaris has come along, I can openly discuss some changes I’m proposing for Solaris. If you have comments I’d love to hear them. Here’s what I’m looking at:

Synopsis: UNIX users can belong to UNIX groups, and for many years the maximum number of groups in Solaris has been limited to 16. Increasing it sounds easy and of obvious benefit. It turns out to be neither; read on.

What’s the problem?

Start with bug [1] 4088757. I wrote in an internal-only section of that bug:

The bug has so many customers on it I’m surprised this hasn’t been addressed before. It’s also featured heavily on internal and external mail aliases. This makes me think that simply increasing the group limit isn’t the answer.

Why are customers putting users in an excessive number of groups? The oft suggested fix of using ACLs clearly isn’t meeting their needs or being communicated well.

I favour understanding what the customer is trying to achieve – I don’t believe UNIX groups are particularly useful in today’s networked, multi-platform IT infrastructures … but then I don’t get out much.

Of course, using ACLs to control file access isn’t particularly cross-platform either but you get the idea. I’m trying to understand why people want to solve problems using large UNIX group membership so that we can design operating system features that meet that need.

All well and good, until Samba started integrating with Windows Active Directory and dealing with huge group memberships. As Samba has to map them on to the underlying OS it relies on the group membership offered. Result? We have a problem we need to fix, especially as the Linux 2.6 kernel now allows 65536 groups.

So, fix it!

It isn’t easy. Too much stuff would/could break.

NFS, or rather AUTH_SYS, can’t handle it

The most obvious breakage is NFS. Strictly speaking NFS isn’t at fault; it’s more of a victim. The underlying problem is a limitation in AUTH_SYS, a commonly used authentication flavour that is pretty much the default. From RFC 1057, which calls it AUTH_UNIX:

9.2 UNIX Authentication
The client may wish to identify itself as it is identified on a
UNIX(tm) system.  The value of the credential's discriminant of an
RPC call message is "AUTH_UNIX".  The bytes of the credential's
opaque body encode the the following structure:
struct auth_unix {
    unsigned int stamp;
    string machinename<255>;
    unsigned int uid;
    unsigned int gid;
    unsigned int gids<16>;
};

In other words, the list of supplementary groups is a variable sized array of up to 16 entries. You simply cannot have more than 16 groups and use AUTH_SYS.

Of course, NFSv4 isn’t affected by this as there are plenty of other authentication flavours that are mandatory for clients and servers which are not affected by the group limits.

If you’ve been paying attention you might be given to wonder how the Linux 2.6 kernel handles this. Answer? It doesn’t, it just truncates the group list at NFS_NGROUPS (16).
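To make the effect of that truncation concrete, here is a minimal sketch in C. The helper name `authsys_clamp_groups` and the constant `AUTHSYS_MAX_GIDS` are made up for illustration and simply mirror the 16-entry limit described above; this is not the Linux or Solaris code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Assumption: mirrors the 16-entry AUTH_SYS limit discussed in the post. */
#define	AUTHSYS_MAX_GIDS	16

/*
 * Hypothetical helper: copy at most AUTHSYS_MAX_GIDS supplementary groups
 * from src into dst and return how many were copied. Anything beyond the
 * limit is silently dropped, so the server never sees those groups.
 */
static size_t
authsys_clamp_groups(const gid_t *src, size_t nsrc, gid_t *dst)
{
	size_t n = (nsrc > AUTHSYS_MAX_GIDS) ? AUTHSYS_MAX_GIDS : nsrc;

	assert(dst != NULL);
	assert(src != NULL || n == 0);

	if (n > 0)
		memcpy(dst, src, n * sizeof (gid_t));
	return (n);
}
```

The practical consequence is that a user in 20 groups is seen by the server as a member of only the first 16, so access granted through groups 17 to 20 quietly fails.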

Changing a well known kernel structure isn’t easy

Up until Solaris 10 our credentials structure was public and anybody could tinker with it. Then Casper introduced Least Privilege, which required making struct cred private and placing an API between kernel routines using creds and the cred structure itself.

For reference (but not use!) here is the private credential structure:

struct cred {
	uint_t		cr_ref;		/* reference count */
	uid_t		cr_uid;		/* effective user id */
	gid_t		cr_gid;		/* effective group id */
	uid_t		cr_ruid;	/* real user id */
	gid_t		cr_rgid;	/* real group id */
	uid_t		cr_suid;	/* "saved" user id (from exec) */
	gid_t		cr_sgid;	/* "saved" group id (from exec) */
	uint_t		cr_ngroups;	/* number of groups returned by */
					/* crgroups() */
	cred_priv_t	cr_priv;	/* privileges */
	projid_t	cr_projid;	/* project */
	struct zone	*cr_zone;	/* pointer to per-zone structure */
	gid_t		cr_groups[1];	/* cr_groups size not fixed */
					/* audit info is defined dynamically */
					/* and valid only when audit enabled */
	/* auditinfo_addr_t	cr_auinfo;	audit info */
};

Hurrah – it’s looking more fixable now. Anyone tinkering with the cred structure directly would have had to fix their code for Solaris 10. In addition, it may even be possible to back port the change to Solaris 10.

The group list is an array and the maximum size is controlled by ngroups_max which itself is limited as follows:

/*
 * These define the maximum and minimum allowable values of the
 * configurable parameter NGROUPS_MAX.
 */
#define	NGROUPS_UMIN	0
#define	NGROUPS_UMAX	32

/*
 * NGROUPS_MAX_DEFAULT: *MUST* match NGROUPS_MAX value in limits.h.
 * Remember that the NFS protocol must rev. before this can be increased
 */
#define	NGROUPS_MAX_DEFAULT	16

So that’s it is it? Just increase NGROUPS_MAX_DEFAULT?

Err, not quite. Not if we want 10,000+ groups. Let’s see …

  • Scalability

    • Memory

      A number of Solaris components (user and kernel) allocate structures sized according to ngroups_max.

      On my local Sun Ray the cred_cache [2] is currently 248 KB (1426 allocations of 148 bytes (sizeof (cred_t) [88] + sizeof (gid_t) [4] * (ngroups_max – 1)) + a small overhead). If we increased ngroups_max to the current Linux limit (65536) this would be in excess of 350 MB (1426 * (88 + 4 * 65535)). Having said that, this machine has 64 GB of memory 🙂

      The main point is that a common kernel structure could increase in size from 148 bytes to potentially 256 KB.

      Other subsystems outside of the cred_cache that do this include procfs.

      3rd party kernel modules may be affected.

    • Performance

      Checking group access in the kernel uses groupmember(), which currently scans the group list in a simple loop. With large group memberships that linear scan might hurt performance unless the search itself changes. An obvious resolution would be to keep the list sorted and use a binary search to reduce lookup time.

      3rd party kernel modules may be affected.

  • Compatibility

    • NFS

      As discussed.

    • 3rd party code

      • 3rd party user code

        Some poorly written user programs might make assumptions about the size of the group list. A simple interposer library using LD_PRELOAD could be used to fix this. AFAIK nothing should make assumptions about the size of the group list; programs should always check the ABI.

      • 3rd party kernel modules

        A complete unknown.
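For user code, one defensive pattern is to ask the system how many groups there are before allocating anything, rather than hard-coding an array of 16. The sketch below uses the standard POSIX `getgroups()` call; the helper name `fetch_groups` is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the caller's
 * supplementary group list and store its length in *countp.
 * Returns NULL on failure. The caller frees the result.
 */
static gid_t *
fetch_groups(int *countp)
{
	int n, got;
	gid_t *list;

	assert(countp != NULL);

	/* With a size of 0, getgroups() only reports how many groups exist. */
	n = getgroups(0, NULL);
	if (n < 0)
		return (NULL);

	/* Allocate at least one slot so malloc(0) is never relied upon. */
	list = malloc((size_t)(n > 0 ? n : 1) * sizeof (gid_t));
	if (list == NULL)
		return (NULL);

	got = getgroups(n, list);
	if (got < 0) {
		free(list);
		return (NULL);
	}
	*countp = got;
	return (list);
}
```

Code written this way keeps working whether the limit is 16, 32 or something much larger, because the array size comes from the running system and not from a compile-time constant.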

So this is what I’m thinking …

This kind of change could break many things, so we have an internal architectural review group that discusses this sort of thing. Here’s what I had in mind for them; I just need to shape it up:

We are not proposing changing the default ngroups_max value of 16. This would break AUTH_SYS. We would propose adding comments in /etc/system and our documentation explaining how to increase group membership.

Internally this will be implemented by changing the credential structure so that the list of groups is a pointer to a separate kmem_alloc() allocation. This means that the base cred structure is a (small) fixed size but the group list can vary in size.

A single kernel allocation could have been made for the whole variable sized cred structure but this would have meant dropping the use of a dedicated kernel memory cache. The observability and potential performance gain from using a dedicated cache is desired.
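A user-space sketch of that layout, with `malloc()` standing in for `kmem_alloc()` and a dedicated kmem cache. The type `sketch_cred_t`, its fields and both functions are hypothetical and heavily simplified; they are not the proposed Solaris structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical credential: fixed-size base, group list held elsewhere. */
typedef struct sketch_cred {
	uid_t		sc_uid;		/* effective user id */
	gid_t		sc_gid;		/* effective group id */
	unsigned int	sc_ngroups;	/* number of entries in sc_groups */
	gid_t		*sc_groups;	/* separate allocation, NULL if empty */
} sketch_cred_t;

/* Build a credential, copying the group list into its own allocation. */
static sketch_cred_t *
sketch_cred_create(uid_t uid, gid_t gid, const gid_t *groups,
    unsigned int ngroups)
{
	sketch_cred_t *cr = calloc(1, sizeof (*cr));

	if (cr == NULL)
		return (NULL);
	cr->sc_uid = uid;
	cr->sc_gid = gid;
	if (ngroups > 0) {
		assert(groups != NULL);
		cr->sc_groups = malloc(ngroups * sizeof (gid_t));
		if (cr->sc_groups == NULL) {
			free(cr);
			return (NULL);
		}
		memcpy(cr->sc_groups, groups, ngroups * sizeof (gid_t));
		cr->sc_ngroups = ngroups;
	}
	return (cr);
}

/* Release the group list first, then the fixed-size base. */
static void
sketch_cred_free(sketch_cred_t *cr)
{
	if (cr == NULL)
		return;
	free(cr->sc_groups);
	free(cr);
}
```

The base structure stays the same size whatever the group count, which is what lets it keep living in a fixed-size cache. The cost is a second allocation and a pointer dereference on every group lookup.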

Update crdup() (etc) to handle the new cred structure. This will also include rewriting groupmember() to efficiently search larger group lists.
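One way the rewritten lookup might look, sketched in user-space C with the standard `qsort()` and `bsearch()`. The names `sketch_sort_groups` and `sketch_groupmember` are made up, and this version only searches the supplementary list; it is an illustration of the sorted-list idea, not the kernel's groupmember().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison of two gid_t values, safe for unsigned types. */
static int
gid_cmp(const void *a, const void *b)
{
	gid_t x = *(const gid_t *)a;
	gid_t y = *(const gid_t *)b;

	return ((x > y) - (x < y));
}

/* Sort once, when the group list is installed in the credential. */
static void
sketch_sort_groups(gid_t *groups, size_t ngroups)
{
	if (ngroups > 1)
		qsort(groups, ngroups, sizeof (gid_t), gid_cmp);
}

/* Return 1 if gid is in the sorted list, 0 otherwise: O(log n) probes. */
static int
sketch_groupmember(gid_t gid, const gid_t *groups, size_t ngroups)
{
	if (ngroups == 0)
		return (0);
	assert(groups != NULL);
	return (bsearch(&gid, groups, ngroups, sizeof (gid_t),
	    gid_cmp) != NULL);
}
```

Sorting costs something each time a group list is set, but lists are set rarely and checked often, so a 65536-entry list would need around 16 comparisons per lookup where the simple loop could need 65536.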

Update ucred_t. It’s private (hurrah!) but also includes the group membership list in the structure so we’d need to change that and lots of bits of procfs too.

Hang on, what’s that about ucred_t? Ah, yes – not so much an exercise for the reader, more an exercise for me. Never a dull moment.


[1] Bug or Request For Enhancement (RFE)? There’s a long-standing internal aphorism related to fixing things in the current release: You can’t escalate an RFE. This is broadly true as we like to focus our development on the next release, be that an update or a whole new version. It’s usually blindingly obvious what is a bug and what is an RFE, except for a very small number of borderline cases … and I work in sustaining where those cases tend to cluster. In other words, I see a lot of them. There’s a danger of individual groups toggling the bug/RFE status to suit their needs and arguing about it. My take is that if you’re arguing about whether it’s a bug or an RFE you’re trying to answer the wrong question – a better question is just what is it that needs changing and why?

[2] How can you find that? Easy …

> ::kmastat ! head -3
cache                        buf    buf    buf    memory     alloc alloc
name                        size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
> ::kmastat ! grep cred_cache
cred_cache                   148   1356   1426    253952    601703     0
>

Getting the 3 top lines and the line matching cred_cache in a concise but not obfuscated single line is left as an exercise for the reader.
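The buffer size and buffer count in that output plug straight into the sizing formula from the memory discussion. A quick check in C, taking the post's figures of 88 bytes for sizeof (cred_t) and 4 bytes for sizeof (gid_t) as given; the function and macro names are made up for illustration.

```c
#include <assert.h>

/* Figures quoted in the post for the author's machine. */
#define	SKETCH_CRED_BASE	88ULL	/* sizeof (cred_t) */
#define	SKETCH_GID_SIZE		4ULL	/* sizeof (gid_t) */

/*
 * Bytes needed for one cred when the group array is embedded:
 * the base already holds one gid_t, hence (ngroups_max - 1).
 */
static unsigned long long
sketch_cred_bytes(unsigned long long ngroups_max)
{
	assert(ngroups_max >= 1);
	return (SKETCH_CRED_BASE + SKETCH_GID_SIZE * (ngroups_max - 1));
}

/* Bytes needed for nbufs such creds, ignoring allocator overhead. */
static unsigned long long
sketch_cache_bytes(unsigned long long nbufs, unsigned long long ngroups_max)
{
	return (nbufs * sketch_cred_bytes(ngroups_max));
}
```

With ngroups_max at 16 this gives 148 bytes per cred, matching the buf size column. With 65536 it gives 262,228 bytes per cred (about 256 KB) and 373,937,128 bytes for 1426 buffers, which is roughly 356 MB and consistent with "in excess of 350 MB".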


Note that Samba is a file system technology and a racy Latin-American dance. You may find some unexpected pictures on the Technorati Samba tag page from Flickr.

7 Comments

  1. Mike Felder

     /  June 28, 2005

    How would you propose that a large organization control data on Unix & Windows platforms without the use of ACLs?

  2. James R Grinter

     /  November 27, 2006

    The NFS/AUTH_SYS limitations could be overcome if it were easier (i.e. better documented, simple step-by-step notes for what to configure where) how to set up the other AUTH types to work with NFS.
    My understanding (correct me if I’m wrong!) is that all the other methods then utilise the groups that the user is a member of from the point of view of the fileserver instead of the NFS client. Whilst, in one or two cases, it would probably break e.g. where someone’s modified a user’s group memberships locally to a client, it seems a better, more secure way of running things. But it’s not obvious how to go about it.
    (Bonus points for documentation that covers back to at least Solaris 8. We can’t “big-bang” all our NFS clients, or our NFS servers!)

  3. David Collier-Brown

     /  May 23, 2007

    Could you add me to the interest list on this bug under my non-Sun email address above? I’m contracting for PS these days and don’t have Sun email, but I’m still a Samba contact for this bug…

  4. Where does this leave sites like mine that use Samba and NFS, but need to upgrade from version 3.2 of Samba which does not appear to fail with this group>16 issue?
    As this post dates back such a long time, you would think Sun would have addressed the problem by now.

  5. Casper Dik is currently working on this, see:
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4088757
    There’s a discussion going on about how far to raise the group limit.

  6. Pablo Méndez Hernández

     /  November 27, 2009

    A fix was committed by Casper Dik on Nov 20th and will be in snv_129:
    Author: Casper H.S. Dik <Casper.Dik@Sun.COM>
    Repository: /hg/onnv/onnv-gate
    Latest revision: 8aa0c4ca66399c389efebd8bbf2512983fa05378
    Total changesets: 1
    Log message:
    PSARC 2009/542 Increase the maximum value of NGROUPS_MAX to 1024
    4088757 Customer would like to increase ngroups_max more than 32
    6853435 Many files incorrectly include the private <sys/cred_impl.h>

  7. Casper Dik

     /  April 3, 2013

    The following additional changes since the putback in snv_129:

    Support of 1024 gids for secure NFS (RPC_GSS, AUTH_DH) in Solaris 11 (and Solaris 10u11)

    Support for 1024 gids using look-aside in the name service for NFS auth_sys (since Solaris 11.1)

