Good Morning Build 81, or not.

February 5, 2008

I did not even get a chance to log in to the Sun Ray server running build 82 before it had crashed twice. So all was not well. A bit of digging suggested a problem somewhere in portfs, with kmem corruption. Since the problem was easily reproducible (boot the system, log in and use it for a few hours), I got the lab staff to set kmem_flags to 0xf in /etc/system and boot again.

Sure enough this morning there were two more crash dumps with variations of this in the message buffers:

kernel memory allocator:
duplicate free: buffer freed twice
buffer=60063bfed60  bufctl=300f08886b8  cache: kmem_alloc_32
previous transaction on buffer 60063bfed60:
thread=300f43dac60  time=T-0.000269600  slab=300f08761e0  cache: kmem_alloc_32
kmem_cache_free+30
port_pcache_remove_fop+44
port_pfp_setup+198
port_associate_fop+2b8
portfs+2c8

panic[cpu512]/thread=300f43dac60:
kernel heap corruption detected

> $c
vpanic(12ac480, 5, 2c8, 1, 18de000, 12ac400)
kmem_error+0x4e8(18de000, 3000005ae08, 60063bfed60, 12ac400, 12ac478, 2afdfbc8220)
port_associate_fop+0x408(16, 7, 4a330, 16, 4a330, 2a10424d968)
portfs+0x2c8(1, 0, 7, 2a0, 0, 4a330)
syscall_trap32+0xcc(1, a, 7, 4a330, 10000006, 4a330)
>

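Looking at the code, it appears that if port_pfp_setup encounters an error, some kernel memory gets freed twice. Specifically, it is the memory pointed to by the cname local variable in port_associate_fop: port_pfp_setup frees it on its error path (the previous transaction in the trace above, via port_pcache_remove_fop), and port_associate_fop then frees it again on its own error path. Hence the random panics. The diffs for the fix are: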

*** port_fop.c  Fri Oct 26 08:58:01 2007
--- /tmp/cg13442/port_fop.c     Tue Feb  5 14:04:21 2008
***************
*** 1306,1311 ****
--- 1306,1312 ----
                  if (error = port_pfp_setup(&pfp, pp, vp, pfcp, object,
                      events, user, cname, clen, dvp)) {
                          mutex_exit(&pfcp->pfc_lock);
+                         cname = NULL;
                          goto errout;
                  }

I have just filed this bug:

6659309: port_associate_fop frees a buffer twice if port_pfp_setup returns an error.

What I don’t know is why we suddenly started seeing the bug. Is it that build 82 exercises event ports more, or has the bug been revealed by some other change? Either way it makes me nervous for my home server running, you guessed it, build 82! At least next time someone asks why we bother running a Sun Ray server on the latest and greatest Nevada bits I have a pre-prepared place to send them. It is here.


