Filtering what scsi.d processes scsi.d reports IO for.

July 18, 2008

I have updated scsi.d
to allow you to see just the the commands associated with particular
processes either by execname or by process id.


To see all the scsi IO associated with a process with an exec name
of “dd”:


# /tmp/scsi.d -D EXECNAME=’"dd"’
Hit Control C to interrupt
00058.416893400 glm0:-> 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 000000000000
00058.417960685 glm0:<- 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×17 Time 1208us
00058.418159142 glm0:-> 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 000000000000
00058.419187590 glm0:<- 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×17 Time 1128us
00058.419399600 glm0:-> 0×1a MODE_SENSE(6) address 00:00, lba 0×000300, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 1a0003002400
00058.443792908 glm0:<- 0×1a MODE_SENSE(6) address 00:00, lba 0×000300, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 24492us
00058.443986521 glm0:-> 0×1a MODE_SENSE(6) address 00:00, lba 0×000400, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 1a0004002400
00058.460271773 glm0:<- 0×1a MODE_SENSE(6) address 00:00, lba 0×000400, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 16385us
00058.460509036 glm0:-> 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 080000000100
00058.464377145 glm0:<- 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 3966us
00058.464571604 glm0:-> 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 080000000100
00058.465001748 glm0:<- 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 528us
00058.465739416 glm0:-> 0×08 READ(6) address 00:00, lba 0×0186a0, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23319) cdb(6) 080186a00100
00058.471783255 glm0:<- 0×08 READ(6) address 00:00, lba 0×0186a0, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 6143us





or to see all the scsi IO associated
with a particular command:


# ./scsi.d -c "dd  if=/dev/rdsk/c0t0d0s0 of=/dev/null count=1 iseek=100000"
Hit Control C to interrupt
1+0 records in
1+0 records out
00000.046670200 glm0:-> 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 000000000000
00000.047677067 glm0:<- 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×17 Time 1151us
00000.047871553 glm0:-> 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 000000000000
00000.048759564 glm0:<- 0×00 TEST_UNIT_READY address 00:00, lba 0×000000, len 0×000000, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×17 Time 988us
00000.048979511 glm0:-> 0×1a MODE_SENSE(6) address 00:00, lba 0×000300, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 1a0003002400
00000.073204217 glm0:<- 0×1a MODE_SENSE(6) address 00:00, lba 0×000300, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 24326us
00000.073406796 glm0:-> 0×1a MODE_SENSE(6) address 00:00, lba 0×000400, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 1a0004002400
00000.089689153 glm0:<- 0×1a MODE_SENSE(6) address 00:00, lba 0×000400, len 0×000024, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 16385us
00000.089932178 glm0:-> 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 080000000100
00000.093793880 glm0:<- 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 3962us
00000.093991464 glm0:-> 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 080000000100
00000.094429110 glm0:<- 0×08 READ(6) address 00:00, lba 0×000000, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 538us
00000.095207716 glm0:-> 0×08 READ(6) address 00:00, lba 0×0186a0, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358 1 dd(23326) cdb(6) 080186a00100
00000.101200857 glm0:<- 0×08 READ(6) address 00:00, lba 0×0186a0, len 0×000001, control 0×00 timeout 60 CDBP 300e8f41358, reason 0×0 (COMPLETED) state 0×1f Time 6096us


Similarly you can see the IO associated with a particular process:


# ./scsi.d -p $(pgrep -x dd) | head -30 | sort -n
Hit Control C to interrupt
00000.002472402 glm0:-> 0×08 READ(6) address 00:00, lba 0×02b2d2, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318 1 dd(23328) cdb(6) 0802b2d20100
00000.010438300 glm0:<- 0×08 READ(6) address 00:00, lba 0×02b2d2, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318, reason 0×0 (COMPLETED) state 0×1f Time 8156us
00000.010706063 glm0:-> 0×08 READ(6) address 00:00, lba 0×02b2d3, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318 1 dd(23328) cdb(6) 0802b2d30100
00000.011154931 glm0:<- 0×08 READ(6) address 00:00, lba 0×02b2d3, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318, reason 0×0 (COMPLETED) state 0×1f Time 560us
00000.011371674 glm0:-> 0×08 READ(6) address 00:00, lba 0×02b2d4, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318 1 dd(23328) cdb(6) 0802b2d40100
00000.011814640 glm0:<- 0×08 READ(6) address 00:00, lba 0×02b2d4, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318, reason 0×0 (COMPLETED) state 0×1f Time 552us
00000.012025254 glm0:-> 0×08 READ(6) address 00:00, lba 0×02b2d5, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318 1 dd(23328) cdb(6) 0802b2d50100
00000.012468655 glm0:<- 0×08 READ(6) address 00:00, lba 0×02b2d5, len 0×000001, control 0×00 timeout 60 CDBP 300e8f40318, reason 0×0 (COMPLETED) state 0×1f Time 552us


Clearly there are plenty of possiblities here, not least the
reporting of all IO associated with a process or command and all of
it’s children.


This is still work in progress but since I’ve used it to file a
bug I thought I better put it out here so that others can enjoy it.
This is version 1.14.


Getting dtrace to exit with the same status as the traced program

July 9, 2008

Today’s dtrace one-liner is how do you get dtrace to exit with the
same exit code as the program you are tracing so you can use it in a
script?


There are going to be many ways to solve this. Mine is simply to
include a probe for the exit system call and then pass arg0 of the
exit system call to the dtrace exit() function:


while dtrace -n ‘syscall::*exit:entry /pid == $target / { trace(arg0); exit(arg0) } fbt:autofs::return / pid == $target / { stack(5); trace(arg1) }’ -c ‘ls -d /home/cjg’
do
umountall -F nfs
svcadm restart autofs
done


Simple and allows the test to run and run until the problem is seen.


Full tracing of gethostbyXXX calls.

July 7, 2008

The gethostbyname.d “one
liner”
is now no longer anywhere near a one liner as to not
be funny. It does however do all the things you would want it to
do. Printing the entire hostent1
structure on success and all the h_error values on failure.


: enoexec.eu FSS 257 $; pfexec /usr/sbin/dtrace -32 -CZs gethostbyXXXX.d -c getent hosts www.ibm.com

129.42.56.216 www.ibm.com www.ibm.com.cs186.net
Look up: www.ibm.com: took 3276803us
Host: www.ibm.com
h_alias[0]: www.ibm.com.cs186.net
h_address[0]: 129.42.56.216
: enoexec.eu FSS 258 $; pfexec /usr/sbin/dtrace -32 -CZs gethostbyXXXX.d -c getent hosts this.host.does.not.exist.sun.com
Look up: this.host.does.not.exist.sun.com: took 31789us
gethostbyname_r failed. h_errno: 1: Host not found
: enoexec.eu FSS 259 $; pfexec /usr/sbin/dtrace -32 -CZs gethostbyXXXX.d -c getent hosts 127.0.0.1
127.0.0.1 localhost
Look up: 127.0.0.1 took 247882us:
Host: localhost
h_address[0]: 127.0.0.1
: enoexec.eu FSS 260 $;


I learned some useful things while writing this:



  1. Writing userland scripts is much
    harder than kernel scripts. If your script has “#include
    <xxxx.h> in it then you are entering a world of hurt.
    Specifically make sure you set the data model of dtrace to match
    that of the application you are scripting.


  2. The various copyin() routines
    write into scratch memory. Scratch memory is only good during the
    current clause so if you want to move the data from clause to clause
    you need to store it. Thanks again to Jon
    for pointing this out.


  3. How cool would “follow fork”
    be for dtrace? Very.


  4. It would be really nice to bundle
    this script up as “getXXXbyYYY.d and include all the other
    getXXXbyYYY routines that there are (getpwbyname, getpwbyuid etc
    etc) however even this script is on the edge of producing DIF that
    is to large for the standard settings so having more probes would
    make it unworkable. Anyway they make a good exercise for the reader.



The script is here: gethostbyXXXX.d








1Well
only the first 10 host name aliases and the first 10 IP addresses
are printed. Adding more is trivial but you will need to up the size
of the DIF that the kernel is prepared to accept.



Dtracing gethostbyname_r

July 2, 2008

Today’s dtrace “one liner” is to dig into why a
process that is failing when a call to gethostbyname_r(). Now if the
application reported all that was going on this would not be needed,
meanwhile in the real world we are lucky enough to have dtrace. To
test it out I’m tracing the getent(1) command. In real life it would
use the -p flag to dtrace and the process ID of the daemon.


/usr/sbin/dtrace -n ‘pid$target::gethostbyname_r:entry {
self->name = arg0;
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
ustack(5);
}
pid$target::gethostbyname_r:return / self->name / {
printf("%s", copyinstr(self->name));
self->name = 0
}’ -c "getent hosts xxxxxdredd"
dtrace: description ‘pid$target::gethostbyname_r:entry ‘ matched 3 probes
dtrace: pid 4748 has exited
CPU ID FUNCTION:NAME
2 48759 gethostbyname_r:return
libnsl.so.1`gethostbyname_r+0xc4
getent`dogethost+0×54
getent`main+0×7c
getent`_start+0×108
2 48759 gethostbyname_r:return xxxxxdredd

It would be nice to be able to get the h_errno value as well but so far I’ve not managed that.


Update:

Jon Haslem kindly explained to me the subtleties of copyin() so that I can get the h_errno value.



/usr/sbin/dtrace -Zn ‘pid$target::gethostbyname_r:entry {
self->name = arg0;
self->errno = arg4;
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
ustack(5);
printf(“%d %s h_errno %x”, pid,
copyinstr(self->name),
*(int *)copyin(self->errno,sizeof(int)));
}’ -c “getent hosts xxxxxdredd”
dtrace: description ‘pid$target::gethostbyname_r:entry ‘ matched 2 probes
dtrace: pid 5087 has exited
CPU ID FUNCTION:NAME
2 48764 gethostbyname_r:entry errno: d4220008
2 48765 gethostbyname_r:return
libnsl.so.1`gethostbyname_r+0xc4
getent`dogethost+0×54
getent`main+0×7c
getent`_start+0×108
5087 xxxxxdredd h_errno 1
#


What is the maximum number of commands queued to a LUN

July 1, 2008

This is not quite a one liner as I’m reusing the code from a
previous
post
to print out the devices in a human readable form other wise
it is just a one liner and was when I typed it in.


The question posed here was what is the
maximum number of commands sent to a LUN at any one time? Clearly
this will max out at the throttle for the device however what was
interesting, since the customer had already tuned the throttle down
and the problem had gone away was what was their configuration
capable of sending to the LUN:



#!/usr/sbin/dtrace -qCs
#define SD_TO_DEVINFO(un) ((struct dev_info *)((un)->un_sd->sd_dev))
#define DEV_NAME(un) \
stringof(`devnamesp[SD_TO_DEVINFO(un)->devi_major].dn_name) /* ` */
#define DEV_INST(un) (SD_TO_DEVINFO(un)->devi_instance)
fbt:*sd:*sd_start_cmds:entry { @[DEV_NAME(args[0]),DEV_INST(args[0])] = max(arg
s[0]->un_ncmds_in_driver) }
END {
printa("%s%d %@d\n", @);
}





This produces a nice list of disk
devices and the maximum number of commands that have been sent to
them at anyone time:


# dtrace -qCs  /var/tmp/max_sd.d -n ‘tick-5sec { exit(0) }’
sd2 1
sd0 70
#


Combine that with the dscript from the latency
bubble
posting earlier and you can drill down on where your IO is
waiting.


Today’s Dtrace one liner

July 1, 2008

Todays dtrace one liner is part of a case investigating why
messages are not making it into the messages file. Using the divide
and concur priciple the first question you need to answer is: “Is
the process that is supposed to generate the messages calling into
syslog?”


$ dtrace -n ‘pid$target::syslog:entry { printf("%d %s", arg0, copyinstr(arg1)) }’ -p $(pgrep xxxx)
dtrace: description ‘pid$target::syslog:entry ‘ matched 1 probe
CPU ID FUNCTION:NAME
1 43227 syslog:entry 5 %s
^C


That is enough to answer the first question. You can get all flash
and pull out the string passed in as the second argument but then it
is not a one liner and is answering a different question. However it
is neat so here it is:


$ dtrace -qn ‘pid$target::syslog:entry { printf("%d %s", arg0, (this->arg1 = copy
instr(arg1))) }
pid$target::syslog:entry / this->arg1 == "%s"/ { printf(" %s\n", copyinstr(arg2))
}’ -p $(pgrep xxxx)

How about varargs and vsprintf for dtrace….