Two line fix in init.

June 14, 2005

As I blogged a while back, I got to investigate UNIX’s second only process. Now that OpenSolaris is live, I can tell the full story of how literally weeks of work boiled down to a two-line fix of staggering simplicity.


We had bug 6183189 logged, which stated that occasionally when you shut down a system the shutdown would hang. You then get sent a crash dump of the whole system, which looks odd: everything appears to have exited, but init is just stuck in poll as if waiting for more to do.


That turned out to be exactly what it was doing. Init keeps housekeeping information about the processes it manages in an array of PROC_TABLE structures. When it shuts down, it sends each process a signal and then, in the SIGCHLD signal handler, removes the “LIVING” flag from the p_flags entry.


However, when doing this it did not check that the LIVING flag was set. So if a new process was reaped that had exactly the same PID as one of the processes being managed via inittab, init assumed it was an inittab process and that it had no more work to do.


My fix, in childeath_single, turned out to be a two-line change to make sure init still believed that the process in question was “LIVING”. If it did not, then the PIDs on the system must have wrapped, and one of the processes generated during shutdown has been given the same PID as one of the processes that init was managing via inittab and has already killed.


<                 if ((process->p_flags & OCCUPIED) == OCCUPIED &&
<                     process->p_pid == pid) {
---
>                 if ((process->p_flags & (LIVING|OCCUPIED)) ==
>                     (LIVING|OCCUPIED) && process->p_pid == pid) {


This fix would have been simple, had the length of time[1] taken to track this one down not meant that, by the time I went to put the fix back, the excellent service management facility had been added. Testing the fix highlighted two other problems that had been introduced, 6185257 and 6192173, both of which affected init’s handling of the utmpx file and how that should now be handled following the putback of smf(5). After a short discussion with some others about the right way to handle utmpx, these added some extra lines to the fix, and then I was done. I just had to backport my original fix.

Technorati Tag: OpenSolaris, Solaris

[1] In my defence, this bug was in Solaris 2.0, and was also in the releases from which this part of Solaris 2.0 was derived, so it had been around and kicking for a long time before I was tasked with finding and fixing it.


One Comment
  1. I always cringe when I see init, utmpx and bug in the same sentence! Just prior to the putback of smf, I put back changes to init, utmpd and utmp.h which implemented a new utmp record (DOWN_TIME) which tracks when a machine is shut down (either controlled or uncontrolled). That putback was my first ever RFE and boy was I nervous!

    Now, every time I run “last reboot” I smile.

    -ejo
