Will I ever escape NIS+?

April 10, 2008

Today’s blast from the past is what to do when your NIS+ (yes, I did say NIS+) namespace does not do what you expect. Contrary to popular myth, NIS+ can be reliable and can scale to large deployments; a number of customers do have large deployments, and that does not include the two that I’m aware of in Sun. That said, even I would not advocate anyone setting up a NIS+ namespace now. LDAP is the future and the way to go.

Now back to NIS+. Today’s problem was not atypical of the kind of issues you can see with NIS+, and it was also interesting in that the SGRT questions, or at least the answers to them, did not immediately lead to a resolution. The problem statement was “New users are not correctly authenticated”: when they logged in, “nisdefaults -p” would say they were “nobody”. Having them keylogin would then, it was claimed, resolve the issue.

After a bit of questioning it was clear that either I was asking the wrong questions, or the answers I was getting were not accurate, or someone had installed some randomizing function into the system. Shared Shell to the rescue. I could now see with my own eyes what was going on and then suggest the next command to run without worrying about translation. It became clear that the problem was indeed random. Successive calls to “nisdefaults -p” would give different results, and I would hazard a guess, although I did not confirm it, that this affected all the users and all the systems.
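One way to make that kind of randomness visible is to run the same command a number of times and count the distinct results. The following is a minimal sketch only; the function name count_outcomes is made up for illustration and is not part of NIS+ or of anything used on the customer's system.

```sh
#!/bin/sh
# count_outcomes N command [args...]
# Run the command N times and print each distinct output (stdout and
# stderr combined) prefixed by how often it was seen.
# Hypothetical helper, named here for illustration only.
count_outcomes() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]
    do
        "$@" 2>&1
        i=$((i + 1))
    done | sort | uniq -c
}
```

Something like "count_outcomes 20 nisdefaults -p" would then show at a glance whether every call returns the same principal or whether "nobody" turns up some of the time.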

The key to tracking this down is the NIS_OPTIONS environment variable, which lets you see each NIS+ call and its return status and, more interestingly in this case, which server served you:

    : FSS 6 $; env NIS_OPTIONS="debug_bind debug_calls" nisdefaults -p
    nis_list([auth_name=14442,auth_type=LOCAL],, 0x30003, 0x0, 0x0)
    binding to directory (parent first)
    bind succeeded
    create handle: DG
    release, status = 0
    status=Success, 1 object, [z=427, d=363, a=3327, c=4918]
    : FSS 7 $;

I got lucky with the customer and the problem fell out at the first attempt. They had half deleted a NIS+ replica server, so it was still listed in the org_dir directory object and was still running rpc.nisd, but it would respond with an error whenever it was called. If you got another NIS+ server you were OK. In a way it was a pity to get there so quickly, as I never had the chance to send them this script:

    #!/bin/ksh
    # Run a command against one named NIS+ server (-h) and/or against
    # every server listed in a directory object (-d).
    unset dom
    unset host
    verbose=0

    # Echo the arguments only when -v was given.
    vecho()
    {
        if [ $verbose -eq 1 ]
        then
            echo "$@"
        fi
    }

    while getopts vd:h: c
    do
        case $c in
        d)  dom=$OPTARG ;;
        h)  host=$OPTARG ;;
        v)  verbose=1 ;;
        \?) echo "USAGE ${0##*/} [-v] -h host -d domain -- command"
            exit 2 ;;
        esac
    done
    shift `expr $OPTIND - 1`

    if [ "${host}" = "" -a "$dom" = "" ]
    then
        echo one or both of -h and -d must be used
        exit 1
    fi

    if [ "$dom" != "" ]
    then
        # Every "Name" line from "Master" onwards in the directory
        # object names a server (the master, then the replicas).
        for server in $(niscat -o ${dom} | nawk '/Master/ { master=1 } /Name/ { if (master==1) print $3 }')
        do
            echo server=$server
            vecho NIS_OPTIONS="server=$server" "$@"
            NIS_OPTIONS="server=$server" "$@"
            x=$?
            if [ $x -ne 0 ]
            then
                niserror $x
            fi
        done
    fi

    if [ "$host" != "" ]
    then
        vecho NIS_OPTIONS="server=$host" "$@"
        NIS_OPTIONS="server=$host" "$@"
        x=$?
        if [ $x -ne 0 ]
        then
            niserror $x
        fi
    fi
    exit 0

Amongst other things, the script runs the same command against each NIS+ server for a directory in turn. Great when you think something is misbehaving but can’t quite put your finger on which server it is.

    : FSS 10 $; ./nis_server -d nismatch [auth_name=14442,auth_type=LOCAL],cred.org_dir
    ,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
    ,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
    ,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
    ,2192,14,2703,2400,2502,2705,2194,3000,2708,826:
    : FSS 11 $;  ls -l ./nis_server
    -rwxr-x--x   1 cg13442  staff        910 Mar 30  2001 ./nis_server
    : FSS 12 $;

It appears that script is 7 years old. Again the problem was not really NIS+ at all but an admin error.
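When the goal is just to spot the odd server out, the per-server loop can be boiled down to one summary line per server. This is a rough sketch rather than a replacement for the script: the function name summarise_servers is hypothetical, the server list is passed in by hand instead of being read from niscat, and the only NIS+ specific piece it relies on is the NIS_OPTIONS="server=..." setting.

```sh
#!/bin/sh
# summarise_servers "server1 server2 ..." command [args...]
# Run the command once per server with NIS_OPTIONS pinned to that
# server and print the server name, the exit status and the output.
# Hypothetical helper, named here for illustration only.
summarise_servers() {
    servers=$1
    shift
    for s in $servers
    do
        # The assignment prefix applies to this one command only.
        out=$(NIS_OPTIONS="server=$s" "$@" 2>&1)
        rc=$?
        printf '%s rc=%s %s\n' "$s" "$rc" "$out"
    done
}
```

A broken replica such as the half deleted one above should then stand out as the one line with a non-zero status or a different result from its peers.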



  1. This reminds me of a time when someone we both knew claimed every problem they saw in the lab was a NIS+ problem…

  2. Mike Smith

    One of my happiest days was the day the NIS kit was made available for 2.x!

  3. Yes YP lives on and it still sucks. You could write a paper on how not to deliver new technology around the introduction of NIS+. It was doomed from the start by first being mandatory and second not being ready for prime time when it was released. This meant all its bugs and idiosyncrasies were thrust onto an unforgiving audience. It only lives on in places that either had a reason to stick with it (Sun Service as we had to support it) or were security conscious.

