How to shoot yourself in the foot with mdb

December 15, 2006

This would be funny if it were not for the poor customer who actually did this, led by the hand by an info doc that suggested you do this:

# mdb -kw
> do_tcp_fusion/W 0
That is it. No information as to what to do next.
However, those of you steeped in adb history (remember, mdb has full backward compatibility) will know that if you type another address at this point it will repeat the previous command. So it will write 0 to the address specified.
If you were unfortunate enough to not be steeped in adb history then you may not know how to exit from the mdb session. If you were to guess that the way to do this was to type “exit”, then mdb happily looks up “exit” in the symbol table, converts that to an address and writes 0 into that address:
# mdb -kw
Loading modules: [ unix krtld genunix specfs ufs ip sctp usba s1394 nca ipc nfs audiosup random sppp sd crypto ptm lofs ]
> do_tcp_fusion/W 0
do_tcp_fusion:  0x1             =       0x0
> exit
exit:           0x9de3bf50      =       0x0
>

If you are quick, lucky, and realise what has happened you can write the instruction back before the system crashes, but on a moderately busy system you have almost no time: you have to do it before the next process exits. Hitting control-D, or exiting mdb by any other method, now results in the system crashing:


panic[cpu0]/thread=3000183c020: BAD TRAP: type=10 rp=2a10037ba50 addr=10c8a00 mmu_fsr=0

mdb: illegal instruction fault: addr=0x10c8a00
pid=2926, pc=0x10c8a00, sp=0x2a10037b2f1, tstate=0x9900001602, context=0x115b
g1-g7: 10403ac, 58692c, 10c865c, 20, 80000305cfcc0ef8, 0, 3000183c020

000002a10037b770 unix:die+9c (10, 2a10037ba50, 10c8a00, 0, 2a10037b830, c0000000 )
  %l0-3: ffffffff7f402000 0000000000000010 ffffffff7e6ebec4 0000000000000000
  %l4-7: 0000000000000000 0000000000001084 0000000000001000 000000000106b800
000002a10037b850 unix:trap+12b8 (2a10037ba50, 0, 0, 1835800, 180c000, 3000183c02 0)
  %l0-3: 0000000000000000 0000000000000010 0000030001832a98 0000000000000000
  %l4-7: 0000000000010008 0000000000010000 0000000000000001 000000000180c180
000002a10037b9a0 unix:ktl0+48 (1, 0, 100173000, 100173, 5, 5)
  %l0-3: 0000000000000003 0000000000001400 0000009900001602 0000000001013c74
  %l4-7: 0000030001832cc0 0000000000000000 0000000000000000 000002a10037ba50

syncing file systems… done
dumping to /dev/dsk/c0t0d0s1, offset 107806720, content: kernel


It is actually a better way to induce a panic than most of the ones documented in books like Panic.


I’ve changed the info doc in question to have the command specified as:

 echo 'do_tcp_fusion/W 0' | mdb -kw


That way it does not lead any more customers down that path. And yes, I’ve trawled SunSolve for all the cases where we suggest mdb -kw and updated them in a similar way.
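For anyone who has to do this often, here is a minimal sketch of how the one-shot form could be wrapped in a shell function, so that no interactive prompt is ever left open for a stray word to land in. The function name mdb_once and the MDB_ONCE_DRY_RUN variable are made up for illustration; only the piped "mdb -kw" form comes from the info doc.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): run exactly one mdb command
# non-interactively. mdb reads the command from stdin and exits at
# end-of-file, so there is no prompt at which a write could be repeated.
mdb_once() {
    if [ "$#" -ne 1 ]; then
        echo "usage: mdb_once 'symbol/W value'" >&2
        return 2
    fi
    if [ -n "${MDB_ONCE_DRY_RUN:-}" ]; then
        # Dry run (assumed convention): print the pipeline, touch nothing.
        printf "echo '%s' | mdb -kw\n" "$1"
        return 0
    fi
    # Real run: a single command on stdin to a writable kernel session.
    printf '%s\n' "$1" | mdb -kw
}
```

Setting MDB_ONCE_DRY_RUN=1 and calling mdb_once 'do_tcp_fusion/W 0' only prints the pipeline it would run, which is a cheap way to check the quoting before pointing it at a live kernel.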


Update: I also filed bug 6505499

Tags: topic:[panic] topic:[shot in the foot]

From → Solaris

9 Comments
  1. full backward compatibility ????

    $ cat foo.c
    int some_global = 0;
    main() { exit(0); }
    $
    $ cc foo.c
    $
    $ adb -w a.out
    main:b
    :r
    a.out: running
    breakpoint      ~main:          jsr     r5,csv
    some_global/o
    _some_gl:       0
    some_global/w0
    _some_gl:       0       =       0
    exit
    01:             0100360
    main
    074:            04567
    exit?i
    start+01:       bpl     0177743
    main?i
    ~main:          jsr     r5,csv
    :c
    a.out: running
    process terminated
    $
    The /w command does not repeat. Fix the bug for backward compatibility with PDP-11 Unix V7 ASAP PLEASE!
    Thank you
    Stéphane
  2. Ouch
    Of course a better way to do this would have been

    # echo do_tcp_fusion/W0 | mdb -kw
    

    Perhaps the infodoc should be thus modified.
    Alan.

  3. Alan,
    That is what I have changed it to, as I said in this entry. Clearly the Ashes are going to your head!
    Stephane,
    Interesting. Perhaps we need to file a bug against adb and mdb.

  4. 🙂

    I’m glad adb is still on Solaris, and I hope nobody will remove it, because I still use it very often, like, you know, when somebody wants to change a string in a 100 Mb a.out and is complaining that loading the file in hexl mode with emacs causes a max buffer size reached error message 😉

    I didn’t use adb when I had an account on a PDP, but I’m pretty sure /W didn’t repeat when another symbol was typed in. If somebody cares, Solaris adb should be modified. If someone wants to clear a memory area, he can do symbol,size/Wvalue .

    Launching a VAX emulator and loading a 4.3BSD could probably be done in a matter of a few minutes.

    I remember the ADB sources were written in C but Pascal-like, something like # define BEGIN { or something. Are the sources of adb for Solaris still available?
    thanks
    stephane

  5. Hi again, it seems Chris posted a comment while I was writing mine.

    Chris, are you thinking of filing a bug regarding the autorepeat of the /w command, which should probably not happen?

    Regards – Stephane – Paris – France

  6. There is good news and bad news about adb in Solaris. The bad news is that it is not really there any more. The good news is that it is now a link to mdb, which is backward compatible with our old adb implementation. (I’m going to check whether the old adb we had, when it was indeed a different program, would repeat write operations. If it does not then this is a bug in mdb; otherwise it is just a dangerous feature that I think should be fixed. That will have to wait until Monday, as all the systems I have at home run Nevada, and since it is the weekend I am trying not to fire up the Sun Ray as that leads to me doing even more work.)
    The source to mdb is on the opensolaris site.
    http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/mdb/
    but is not in a pascal style, thankfully.

  7. stephane

    It’s not Pascal like, I just checked. It’s worse than this 😉 Example below.

    endpcs()
    {
    REG BKPTR	bkptr;
    IF pid
    THEN ptrace(PT_KILL,pid,0,0); pid=0; userpc=1;
    FOR bkptr=bkpthead; bkptr; bkptr=bkptr->nxtbkpt
    DO IF bkptr->flag
    THEN bkptr->flag=BKPTSET;
    FI
    OD
    FI
    bpstate=BPOUT;
    }
    

    SunOS burton1 5.9 Generic_118558-26 sun4u sparc SUNW,Ultra-4 Solaris
    repeats the ?W or/W command at each <CR>, if that helps.

  8. In SunOS 5.9 mdb and adb are the same command; we have to go back to 5.8 for the old adb.
    I have filed bug 6505499
    (the link will work when the bug makes it through the firewall) that mdb should behave as adb did with respect to repeating commands.
    Many thanks Stephane for your help.

  9. stephane

    You’re welcome.
    Now, we have two x4600 (16 cores) which will crash after a while (few minutes) running an intensive cpu test program (some stuff from distributed.net).
    The (sun) french guy in charge of the case told us something like “the machine is fine ! it’s this stupid program which causes the crash, but you don’t really need to run this code do you ?”. Imagine I had to talk about this adb bug with this guy 😉

    BTW, we fixed the second machine by installing Windows on it and finding some weird (cpu/firmware) patch somewhere so it stops crashing. It’s now calculating keys or something (rules ?) just fine !
