A word for WinDbg (2)

Continuing on from the previous post on using WinDbg, let’s take what we learnt there and apply it to managed code.

 

WinDbg supports the debugging of managed code through an extension named SOS.DLL. The name’s origins are somewhat esoteric and I’ll not get into them here, but a quick web search will tell you where it came from.

 

This DLL ships with the latest debugger, so you’ll find a version which matches V1.1 of the .NET Framework in the debugger’s installation folder, and you’ll also find a sub-folder labelled CLR10 which contains the previous version of the DLL.

 

You interact with the SOS.DLL extension by loading it up into WinDbg as you load any other extension. So, if I want to make use of SOS.DLL I issue;

 

.load SOS.DLL

 

And the debugger loads the extension for me and I can see it in the results of a;

 

.chain

 

command and I can set it as my default extension by using;

 

.setdll SOS.DLL

 

Before we go into SOS.DLL in more depth I’ll add a quick word about the symbols for the .NET Framework. If you have the .NET Framework SDK installed on your machine then you should have symbols for DLLs such as mscorwks.dll and mscoree.dll (which make up the CLR) installed onto your machine. On my machine (with VS.NET 2003 installed) these symbol files exist in a folder named;

 

C:\program files\Microsoft Visual Studio.NET 2003\SDK\v1.1\Symbols

 

So, this means that you can add this folder to your symbol path (see the previous post) and have WinDbg pick up symbols for the CLR bits locally. Alternatively, these symbols will come down from the symbol server anyway, but the local copies might save you some download time.
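As a quick sketch, appending the folder and refreshing symbols looks like this (adjust the folder to match your own install);

```
.sympath+ C:\program files\Microsoft Visual Studio.NET 2003\SDK\v1.1\Symbols
.reload
```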

 

So, let’s see straight away what that adds. If I have a piece of managed code such as the following;

 

using System;
using System.Threading;

namespace ConsoleApplication12
{
      class BadClass
      {
            public BadClass()
            {
            }
            ~BadClass()
            {
                  Thread.Sleep(Timeout.Infinite);
            }
      }
      class EntryPoint
      {
            static void Main(string[] args)
            {
                  BadClass c1 = new BadClass();
                  BadClass c2 = new BadClass();

                  System.GC.Collect();
                  System.GC.Collect();
                  System.GC.WaitForPendingFinalizers();

                  Console.WriteLine("Done");
                  Console.ReadLine();
            }
      }
}

 

I’ve created a bit of a monster here: what you should find is that this piece of code probably runs to completion if you run it in debug mode but it’ll never complete if you run it in release mode. This is because in release mode c1 and c2 become eligible for garbage collection at an earlier point, which allows the finalizer for BadClass to run, and that finalizer blocks the finalizer thread.
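As an aside, you can see the rooting behaviour at work with GC.KeepAlive. This is a hypothetical variant of Main (not part of the sample above) which keeps c1 and c2 rooted past the collections, so the finalizers can’t run on this path even in a release build;

```csharp
static void Main(string[] args)
{
      BadClass c1 = new BadClass();
      BadClass c2 = new BadClass();

      System.GC.Collect();
      System.GC.Collect();
      System.GC.WaitForPendingFinalizers();

      // KeepAlive tells the runtime to treat the references as live
      // up to this point, even where a release-mode JIT would
      // otherwise consider them dead after their last use
      System.GC.KeepAlive(c1);
      System.GC.KeepAlive(c2);

      Console.WriteLine("Done");
      Console.ReadLine();
}
```

With this sketch in place the objects aren’t collectable when GC.Collect runs, so WaitForPendingFinalizers has nothing to wait on; the blocking finalizers don’t get a chance to hang the main path (they’ll still misbehave at process shutdown, where the runtime only gives finalizers a limited time).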

 

How to debug this with WinDbg? Run the process in release mode, attach WinDbg and have a look at the stack traces with a;

 

~* kb 200

 

command. I can see straight away from this that one of my threads looks like this;

 

  2  Id: 7b8.a08 Suspend: 1 Teb: 7ffdd000 Unfrozen

ChildEBP RetAddr  Args to Child             

06bffd50 7c90fb74 7c8023ae 00000001 06bffd84 ntdll!KiFastSystemCallRet

06bffd54 7c8023ae 00000001 06bffd84 00000000 ntdll!NtDelayExecution+0xc

06bffdac 7929703b ffffffff 00000001 ffffffff KERNEL32!SleepEx+0x61

06bffdcc 792e42d5 ffffffff 06bffe04 06bffe34 mscorwks!Thread::UserSleep+0x93

06bffddc 0095a492 06bffde8 ffffffff 06bffe3c mscorwks!ThreadNative::Sleep+0x30

WARNING: Frame IP not in any known module. Following frames may be wrong.

06bffe34 791f9118 06bffeb0 7920fc3f ffffffff 0x95a492

06bffec0 791f91dd 04a619a4 00158010 00147880 mscorwks!MethodTable::CallFinalizer+0xee

06bffed4 791e1f08 04a619a4 7c8099f2 00000000 mscorwks!CallFinalizer+0x84

06bfff6c 791bbed0 04a619a4 00000000 0012fb9c mscorwks!CallFinalizer+0x245

06bfffb4 7c812ccb 00000000 0012fb9c 003e0000 mscorwks!GCHeap::FinalizerThreadStart+0xc2

06bfffec 00000000 791d0020 00000000 00000000 KERNEL32!BaseThreadStart+0x37

 

So, I can see that this is my finalizer thread and somewhere in the call chain I can see that this thread is calling Sleep which sounds like a bad idea to me for a finalizer method.

 

Instantly, you get a good picture of what might be going on here. As an aside, it’s worth taking a look around the mscorwks, mscoree and friends DLLs to have a look at places where you can set breakpoints. You can do this with;

 

x mscoree!*

x mscorwks!*
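For instance, assuming symbols for mscorwks.dll are loaded, you could narrow the search and then set a native breakpoint on a function that shows up in our earlier stack trace;

```
x mscorwks!GCHeap::Finalizer*
bp mscorwks!GCHeap::FinalizerThreadStart
```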

 

So, our example so far has been nice but it doesn’t directly interact with managed code. If we want to do that then we need to look to SOS.DLL and the functionality that it offers. We can see this with !SOS.help which gives;

 

0:003> !sos.help

SOS : Help

COMState             | List COM state for each thread

ClrStack             | Provides true managed stack trace, source and line numbers.

                       Additional parameters: -p[arams] -l[ocals] -r[egs] -a[ll].

DumpClass <addr>     | Dump EEClass info

DumpDomain [<addr>]  | List assemblies and modules in a domain

DumpHeap [-stat] [-min 100] [-max 2000] [-mt 0x3000000] [-type <partial type name>] [-fix] [start [end]] | Dump GC heap contents

DumpMD <addr>        | Dump MethodDesc info

DumpMT [-MD] <addr>  | Dump MethodTable info

DumpModule <addr>    | Dump EE Module info

DumpObj <addr>       | Dump an object on GC heap

DumpStack [-EE] [-smart] [top stack [bottom stack] | -EE only shows managed stack items.

DumpStackObjects [top stack [bottom stack]

DumpVC <mt> <addr>   | Dump a value class object

EEHeap [-gc] [-win32] [-loader] | List GC/Loader heap info

EEStack [-short] [-EE] | List all stacks EE knows

EEVersion            | List mscoree.dll version

FinalizeQueue [-detail]     | Work queue for finalize thread

GCInfo [<MD>] [IP]   | Dump GC encoding info for a managed method

GCRoot <addr>        | Find roots on stack/handle for object

IP2MD <addr>         | Find MethodDesc from IP

Name2EE <module name> <item name> | Find memory address of EE data given a class/method name

ObjSize [<addr>]     | Find number of bytes that a root or all roots keep alive on GC heap.

ProcInfo [-env] [-time] [-mem] | Display the process info

RWLock [-all] <addr> | List info for a Read/Write lock

SyncBlk [-all|#]     | List syncblock

ThreadPool           | Display CLR threadpool state

Threads              | List managed threads

Token2EE  <module name> <mdToken> | Find memory address of EE data for metadata token

u [<MD>] [IP]        | Unassembly a managed code

 

 

Looking at “Global State”

 

So, there are some simpler “global-state” commands in here which show us information about the CLR. These would be things such as;

 

!ProcInfo

 

0:003> !ProcInfo

—————————————

Process Times

Process Started at: 2004 Aug  3 12:5:20.76

Kernel CPU time   : 0 days 00:00:00.03

User   CPU time   : 0 days 00:00:00.04

Total  CPU time   : 0 days 00:00:00.07

—————————————

Process Memory

WorkingSetSize:     7476 KB       PeakWorkingSetSize:     7476 KB

VirtualSize:      142832 KB       PeakVirtualSize:      142832 KB

PagefileUsage:      2916 KB       PeakPagefileUsage:      2916 KB

—————————————

32 percent of memory is in use.

 

Memory Availability (Numbers in MB)

 

                     Total        Avail

Physical Memory       2046         1372

Page File             3944         3350

Virtual Memory        2047         1907

 

!Threads

 

0:003> !Threads

ThreadCount: 2

UnstartedThread: 0

BackgroundThread: 1

PendingThread: 0

DeadThread: 0

                             PreEmptive   GC Alloc               Lock    

       ID ThreadOBJ    State     GC       Context       Domain   Count APT Exception

  0   cd0 0014cf38   2000020 Enabled  00000000:00000000 00147880     0 Ukn

  2   a08 00158010   2001220 Enabled  04a61d94:04a61ff4 00147880     0 Ukn (Finalizer)

 

 

!ThreadPool

 

0:003> !ThreadPool

Worker Thread: Total: 0 Running: 0 Idle: 0 MaxLimit: 0 MinLimit: 0

Work Request in Queue: 0

————————————–

Number of Timers: 0

————————————–

Completion Port Thread: Total: 0 Free: 0 MaxFree: 0 CurrentLimit: 0 MaxLimit: 1000 MinLimit: 0

 

!COMState

 

0:003> !COMState

     ID     TEB   APT    APTId CallerTID Context

  0  cd0 7ffdf000 Ukn

  1  e3c 7ffde000 Ukn

  2  a08 7ffdd000 Ukn

  3  d88 7ffdc000 Ukn

 

The last one here, COMState, is useful for seeing how your .NET threads have initialised themselves with respect to COM (i.e. are they STA threads or MTA threads) plus more detailed info on where they are with respect to COM apartments, contexts, etc.

 

!FinalizeQueue

 

0:003> !FinalizeQueue

SyncBlock to be cleaned up: 0

———————————-

generation 0 has 0 finalizable objects (0014d330->0014d330)

generation 1 has 0 finalizable objects (0014d330->0014d330)

generation 2 has 0 finalizable objects (0014d330->0014d330)

Ready for finalization 1 objects (0014d330->0014d334)

Statistics:

      MT    Count TotalSize Class Name

  935108        2        24 ConsoleApplication12.BadClass

Total 2 objects

 

In our scenario this gives a clear picture of what’s present on the queue for finalization: we have 2 instances of BadClass resident in there.

 

!EEVersion

 

0:003> !EEVersion

1.1.4322.573 retail

Workstation build

 

 

 

Looking at Managed Memory

 

We can get a picture of the “global” state of memory within the application through the dumpheap command.

 

The simplest way to use dumpheap is to provide the -stat option, which will give a complete list of what’s allocated on the managed heap. Note that for a big process you might need to go and get a cup of tea whilst this list is being created. In our small example !dumpheap -stat gives us;

 

total 57 objects

Statistics:

      MT    Count TotalSize Class Name

79c077b4        1        20 System.Collections.ArrayList

79bff0a4        1        20 System.AppDomainSetup

79c19424        1        24 System.LocalDataStoreMgr

  935108        2        24 ConsoleApplication12.BadClass

79bfcb44        1        32 System.SharedStatics

79c0d0cc        1        52 System.Collections.Hashtable

  95236c        1        56 System.Char[]

79bfc414        1        64 System.ExecutionEngineException

79bfc2dc        1        64 System.StackOverflowException

79bfc1a4        1        64 System.OutOfMemoryException

  9526b0        1        76 System.Byte[]

79bfdbec        1        80 System.AppDomain

  952c28        1       144 System.Collections.Hashtable/bucket[]

  952964        1       292 System.Int32[]

79bf9af8       26      1448 System.String

  14d0f0       11      2800      Free

  95209c        5      6328 System.Object[]

Total 57 objects       

 

So we can see that we have 57 objects here. If you examine the line for our managed type BadClass you’ll see that it has an MT column for its method table. This effectively describes the type and we can use it to restrict the output of dumpheap by doing;

 

!dumpheap -mt 935108

 

0:003> !dumpheap -mt 935108

 Address       MT     Size

04a61998 00935108       12

04a619a4 00935108       12

total 2 objects

Statistics:

      MT    Count TotalSize Class Name

  935108        2        24 ConsoleApplication12.BadClass

Total 2 objects

 

Note that this actually gives us the addresses of the instances of our type (2 in our case), which we’ll come back to in a second. But, first, how would we get the MethodTable for a particular type without sitting through !dumpheap -stat? We can use the Name2EE function as below;

 

!Name2EE ConsoleApplication12.exe ConsoleApplication12.BadClass

 

0:003> !Name2EE ConsoleApplication12.exe ConsoleApplication12.BadClass

————————————–

MethodTable: 00935108

EEClass: 06c2334c

Name: ConsoleApplication12.BadClass

 

And now we know the MethodTable address for our type, which is nice 🙂 Now, coming back to those addresses: these allow us to dump out the objects themselves using the !dumpobj command;

 

!dumpobj 04a61998

 

0:003> !dumpobj 04a61998

Name: ConsoleApplication12.BadClass

MethodTable 0x00935108

EEClass 0x06c2334c

Size 12(0xc) bytes

mdToken: 02000002  (t:\Temp\ConsoleApplication12\bin\Release\ConsoleApplication12.exe)

 

This object isn’t so interesting so I’ve gone back and dumped out a hashtable so you can see what a real object looks like;

 

0:003> !dumpobj 04a61a90

Name: System.Collections.Hashtable

MethodTable 0x79c0d0cc

EEClass 0x79c0d20c

Size 52(0x34) bytes

mdToken: 020000f9  (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)

FieldDesc*: 79c0d270

      MT    Field   Offset                 Type       Attr    Value Name

79c0d0cc  4000395        4                CLASS   instance 04a61d04 buckets

79c0d0cc  4000396       1c         System.Int32   instance        0 count

79c0d0cc  4000397       20         System.Int32   instance        0 occupancy

79c0d0cc  4000398       24         System.Int32   instance        7 loadsize

79c0d0cc  4000399       28        System.Single   instance 0.720000 loadFactor

79c0d0cc  400039a       2c         System.Int32   instance        0 version

79c0d0cc  400039b        8                CLASS   instance 00000000 keys

79c0d0cc  400039c        c                CLASS   instance 00000000 values

79c0d0cc  400039d       10                CLASS   instance 00000000 _hcp

79c0d0cc  400039e       14                CLASS   instance 00000000 _comparer

79c0d0cc  400039f       18                CLASS   instance 00000000 m_siInfo

79c0d0cc  4000394        0                CLASS     shared   static primes

    >> Domain:Value 00147880:04a61ac4 <<

 

So, in essence, we can traverse the entire managed heap from within WinDbg here and we can see the details of every single instance that is stored on that managed heap and we can track down all the members of those instances ad infinitum. This is really, really powerful stuff.

 

Looking at Stack Frames

 

What about stack frames? Well, if we look at the original stack trace that I had when I was trying to diagnose my “hung application” problem we can see the frames look like this;

 

  2  Id: 7b8.a08 Suspend: 1 Teb: 7ffdd000 Unfrozen

ChildEBP RetAddr  Args to Child             

06bffd50 7c90fb74 7c8023ae 00000001 06bffd84 ntdll!KiFastSystemCallRet

06bffd54 7c8023ae 00000001 06bffd84 00000000 ntdll!NtDelayExecution+0xc

06bffdac 7929703b ffffffff 00000001 ffffffff KERNEL32!SleepEx+0x61

06bffdcc 792e42d5 ffffffff 06bffe04 06bffe34 mscorwks!Thread::UserSleep+0x93

06bffddc 0095a492 06bffde8 ffffffff 06bffe3c mscorwks!ThreadNative::Sleep+0x30

WARNING: Frame IP not in any known module. Following frames may be wrong.

06bffe34 791f9118 06bffeb0 7920fc3f ffffffff 0x95a492

06bffec0 791f91dd 04a619a4 00158010 00147880 mscorwks!MethodTable::CallFinalizer+0xee

06bffed4 791e1f08 04a619a4 7c8099f2 00000000 mscorwks!CallFinalizer+0x84

06bfff6c 791bbed0 04a619a4 00000000 0012fb9c mscorwks!CallFinalizer+0x245

06bfffb4 7c812ccb 00000000 0012fb9c 003e0000 mscorwks!GCHeap::FinalizerThreadStart+0xc2

06bfffec 00000000 791d0020 00000000 00000000 KERNEL32!BaseThreadStart+0x37

 

So, what about the frame that WinDbg flags with the warning (the bare 0x95a492 address)? How come WinDbg can’t fathom this stack frame? Essentially what this points to is that in between the mscorwks!MethodTable::CallFinalizer function and the mscorwks!ThreadNative::Sleep function we have managed code. If we want to see the managed stack frames then we can issue;

 

!ClrStack

 

0:002> !ClrStack

Thread 2

ESP       EIP    

06bffe04  7c911444 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)

06bffe14  06d4010e [DEFAULT] [hasThis] Void ConsoleApplication12.BadClass.Finalize()

  at [+0x1e] [+0x6]

 

which works on the current thread (i.e. the one that the debugger is “focused” on) and dumps out the managed stack – you can see that we now could guess that we have a problem in BadClass.Finalize which we kind of knew all along 😉

 

We can also do !ClrStack -params -locals -regs -all to include parameter, local and register information, which is going to be really helpful for real-world debugging. For managed objects that we find we can then go and do !dumpobj to take a look at those objects.

 

Setting BreakPoints

 

Setting breakpoints isn’t quite as easy for managed code in WinDbg as it might be, but it’s far from impossible. What we have to work out, to set a breakpoint on a managed function, is the address of the code for that function.

 

For our type, BadClass, let’s say that we want to set a breakpoint in its constructor. What we do is;

 

1)      Get the MethodTable address for the type by issuing our !Name2EE command.

a.       !Name2EE ConsoleApplication12.exe ConsoleApplication12.BadClass

 

0:002> !Name2EE ConsoleApplication12.exe ConsoleApplication12.BadClass

————————————–

MethodTable: 00935108

EEClass: 06c2334c

Name: ConsoleApplication12.BadClass

 

2)      We can now get hold of the method table for this class by using the !dumpmt command as below;

a.       !dumpmt -md 00935108

 

0:002> !dumpmt -md 00935108

EEClass : 06c2334c

Module : 0015b610

Name: ConsoleApplication12.BadClass

mdToken: 02000002  (t:\Temp\ConsoleApplication12\bin\Release\ConsoleApplication12.exe)

MethodTable Flags : 180000

Number of IFaces in IFaceMap : 0

Interface Map : 00935148

Slots in VTable : 5

————————————–

MethodDesc Table

  Entry  MethodDesc   JIT   Name

79bf84eb 79bf84f0    None   [DEFAULT] [hasThis] String System.Object.ToString()

79bf8473 79bf8478    None   [DEFAULT] [hasThis] Boolean System.Object.Equals(Object)

79bf848b 79bf8490    None   [DEFAULT] [hasThis] I4 System.Object.GetHashCode()

009350fb 00935100    None   [DEFAULT] [hasThis] Void ConsoleApplication12.BadClass.Finalize()

009350eb 009350f0    None   [DEFAULT] [hasThis] Void ConsoleApplication12.BadClass..ctor()

 

3)      We can see that the address for our constructor here is 009350eb. We can provide this as a breakpoint to WinDbg

a.       bp 009350eb

b.      And when that breakpoint hits we can issue a step command (“t”) and we’ll find that we’re inside the function that we wanted to break into.
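Putting those steps together, the whole sequence looks like this (the addresses are from my session and will be different in yours);

```
!Name2EE ConsoleApplication12.exe ConsoleApplication12.BadClass
!dumpmt -md 00935108
bp 009350eb
g
```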

 

 

Determining Roots

 

Last topic for this post – how to determine what is holding on to your CLR memory. I talked about how we can use !dumpheap to take a look at what’s on the managed heap but we can also use the fantastic !gcroot command in order to determine for any managed object what “set of roots” are actually causing that object to remain “alive” rather than be garbage collected by the GC.

 

This is a particularly useful command if you suspect you’ve got a memory leak somewhere in the sense that something is holding on to a managed object longer than it should be – you can break into the process with WinDbg and take a look at what “roots” are present for your particular object and, hopefully, that’d move you forward in determining what’s going on.

 

For an example of the output from !gcroot, let’s use the following piece of code;

 

      class EntryPoint
      {
            private static ArrayList al = new ArrayList();

            static void Main(string[] args)
            {
                  for (int i = 0; i < 10; ++i)
                  {
                        switch (i % 3)
                        {
                              case 0:
                                    al.Add("Hello");
                                    break;
                              case 1:
                                    al.Add(new Exception(""));
                                    break;
                              case 2:
                                    al.Add(new object());
                                    break;
                        }
                  }
                  Console.ReadLine();
            }
      }

 

 

So, if I want to know what objects are around at the point of the Console.ReadLine() call then I can use !dumpheap -stat to show me;

 

total 132 objects

Statistics:

      MT    Count TotalSize Class Name

79c53524        1        12 System.IO.TextReader/NullTextReader

79c0c46c        1        12 System.__Filters

79c0bfa8        1        12 System.Reflection.Missing

79c03458        1        12 System.IO.Stream/NullStream

79c8d504        1        16 System.IO.TextReader/SyncTextReader

79c58114        1        20 System.Text.CodePageEncoding/Decoder

79c3667c        1        20 System.Text.UTF8Encoding/UTF8Encoder

79c363c4        1        20 System.IO.TextWriter/NullTextWriter

79c077b4        1        20 System.Collections.ArrayList

79c03efc        1        20 System.Text.UTF8Encoding

79bff0a4        1        20 System.AppDomainSetup

79c61b94        2        24 System.Text.Encoding/DefaultEncoder

79bfcb44        1        32 System.SharedStatics

79bf8364        3        36 System.Object

79c36114        2        40 System.Text.CodePageEncoding

79bfb4f4        2        40 System.Text.StringBuilder

79c61cac        2        48 System.IO.TextWriter/SyncTextWriter

79c57f8c        1        56 System.IO.StreamReader/NullStreamReader

79c53fac        1        56 System.IO.StreamReader

79c618d4        3        60 System.IO.__ConsoleStream

79c0b37c        4        64 System.RuntimeType

79bfc414        1        64 System.ExecutionEngineException

79bfc2dc        1        64 System.StackOverflowException

79bfc1a4        1        64 System.OutOfMemoryException

79bfdbec        1        80 System.AppDomain

79c0c584        3        84 System.Reflection.MemberFilter

79c35d24        3       156 System.IO.StreamWriter

79bfbcdc        3       192 System.Exception

  9526b0        6      1356 System.Byte[]

  95236c       12      2008 System.Char[]

79bf9af8       56      3280 System.String

  14d0f0        6     10220      Free

  95209c        7     10440 System.Object[]

Total 132 objects

 

And if I now want to know what instances of System.Exception are around I can do !dumpheap -mt 79bfbcdc and get;

 

0:003> !dumpheap -mt 79bfbcdc       

 Address       MT     Size

04a619fc 79bfbcdc       64

04a61a48 79bfbcdc       64

04a61a94 79bfbcdc       64

total 3 objects

Statistics:

      MT    Count TotalSize Class Name

79bfbcdc        3       192 System.Exception

Total 3 objects

 

And if I now want to know why the first of these instances is still around I can use !gcroot on it to give me;

 

0:003> !gcroot 04a619fc

Scan Thread 0 (ec4)

ESP:12f644:Root:04a6197c(System.Collections.ArrayList)->04a61990(System.Object[])->04a619fc(System.Exception)

ESP:12f658:Root:04a6197c(System.Collections.ArrayList)->04a61990(System.Object[])

ESP:12f668:Root:04a6197c(System.Collections.ArrayList)->04a61990(System.Object[])

Scan Thread 2 (eb8)

Scan HandleTable 15f008

Scan HandleTable 149c88

HANDLE(Strong):9411c0:Root:05a62000(System.Object[])->04a6197c(System.Collections.ArrayList)

 

I can see that my System.Exception instance at 04a619fc is being referenced (rooted) by an array of objects at 04a61990 and that is rooted by an ArrayList at 04a6197c.

 

So, we can chase back object references as far as we need to work out what’s going on with our managed memory and why particular instances are not yet eligible for GC.

 

Working with Exceptions

 

The WinDbg debugger does not know anything about managed exceptions. Whenever managed code throws an exception, the runtime raises a native exception with code 0xe0434f4d.

 

If you want to stop on managed exceptions then you can issue to the debugger;

 

sxe 0xe0434f4d

 

or

 

sxe clr (much easier!)

 

and it’ll break at the point where the exception is thrown. You can grab the exception record from an unmanaged point of view by issuing a display exception record command with a minus 1 as;

 

.exr -1

 

And this will give you a result such as;

 

0:000> .exr -1

ExceptionAddress: 7c82dad4 (KERNEL32!RaiseException+0x00000038)

   ExceptionCode: e0434f4d (CLR exception)

  ExceptionFlags: 00000001

NumberParameters: 0

 

I think there’s a nice way of turning this into a managed exception object but, at the time of writing, I can’t work it out. What you can do is execute a !dumpstackobjects and, in the usual case, you should find your exception living within the set of objects that you get back there.
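So a workable sequence is something like the following, where the address passed to !dumpobj is whichever exception object shows up in the !dumpstackobjects output (the placeholder is mine, not literal syntax);

```
sxe clr
g
!dumpstackobjects
!dumpobj <exception address>
```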

 

Wrap-up

 

That’s it for this posting – if you want more info on debugging with SOS.DLL check out the big guide that’s over on MSDN and also have a look at resources such as;

 

http://blogs.msdn.com/mvstanton/archive/2004/04/05/108023.aspx

http://msdn.microsoft.com/msdnmag/issues/03/06/Bugslayer/default.aspx

 

Enjoy!