In flicking through Computerworld’s “What Today’s Software Developers Need to Know” I stopped for a little while on the section around diagnostics and tracing, where Luke Kanies of Puppet Labs says:
“The most important part of an application's lifetime is when it's running in production, yet developers don't spend nearly enough time thinking about how to maintain the application while it runs. For example, is there debugging or performance instrumentation? If it's running slowly, can you tell why? If there are failures, can you trace the failure without the service being down? Can you hide failures from the user but still pass the debugging information on to the sysadmin and developers? Can you tune the application, or deploy more copies, without bringing it down?”
Visual Studio 2013
It’s long been the case that a fair old chunk of your code-base might be instrumentation code that’s devoted to diagnostic, performance and usage logging/tracing and, for as long as I can remember, there’s been quite an area of the .NET Framework devoted to those tasks (mostly housed inside of System.Diagnostics).
Alongside that, there’s also the work that Visual Studio can do for you in terms of the huge feature-set that it provides for debugging and diagnostics.
Visual Studio 2013 takes that further and there’s a great reference post over on the Visual Studio Blog from my friend and former UK colleague Daniel Moth providing a whole bunch of links on the new diagnostics areas;
and I’d specifically call out the new UI Responsiveness tools for both XAML and HTML environments – for someone who’s worked for a long time with XAML it’s nice to see the “black box” that has been XAML parsing and rendering opened up after 6-7 years and instrumented so that a developer can see exactly what’s going on. Big points to the XAML team for doing that work.
Windows Store App Instrumentation
In terms of instrumentation, if you’ve got an app out there in the Store and it’s running on (hopefully) thousands, tens of thousands or even hundreds of thousands of devices then there are perhaps diagnostics that you want to capture from that app while it’s out there in the wild. These maybe fall into three categories;
- How is the app doing?
  - Slow operations?
  - Unexpected retries?
- How is the user using the app?
  - Which features are they using?
  - Which sections of the app do they spend most time on?
- How to resolve a specific user problem?
  - User does X, Y, Z and the app does W, where W is not an expected behaviour.
For crashes, there’s work that the Store does on your behalf when it comes to crash reports so that you can visit the Store dashboard and have a look at your app crashes and download crash dumps in order to load them up into a debugger and take a look. I wrote a little about my own experience of doing that previously.
In terms of analytics – there’s a whole industry around that and you might want to bring in a fully featured analytics package to gather and do deep analysis on that data.
In all cases, it’s likely that you’re going to want to supplement those approaches with some detailed diagnostic instrumentation of your own.
Sidebar – Event Tracing for Windows (ETW) APIs/Tools
For Windows, the main API for this kind of instrumentation is the “Event Tracing for Windows” (ETW) API that was first introduced back in Windows 2000.
That’s a big, flexible set of APIs that were designed to suit pretty much every component’s needs from the OS itself through to a specific application and, over time, instrumentation via ETW has been built into tonnes of Windows components and applications. If you’ve not done much with ETW or want a refresh on it there are the overview docs, and there’s an old ETW article from MSDN Magazine (2007) that is still well worth a read today. It talks about how the system has great performance, supports the idea of collecting flexible events from multiple sources at the same time and provides for on/off instrumentation without having to restart processes or reboot a system.
For a .NET developer building desktop/server code, ETW classes showed up (I think) in .NET 3.5 under System.Diagnostics.Eventing offering specific classes to log events via ETW but also plugging into the standard Trace.WriteX infrastructure by surfacing a TraceListener to listen to Trace.WriteX calls and publish them as events through ETW.
Beyond that, .NET 4.5 recently added System.Diagnostics.Tracing with the intention of simplifying the way in which .NET apps could publish ETW instrumentation via a new EventSource which takes away the need for an application to publish/register a manifest with the system to describe its event types.
There’s a good blog post on how that came about, how to use it and what problem it’s trying to solve over here on Vance’s blog, with a follow-up here.
This kind of tracing is generally targeted at a developer/administrator rather than an end user – in order to set up tracing you’re typically using tools like logman or perhaps a tool like xperf or maybe one of the newer tools introduced with Windows 8 like the Windows Performance Recorder/Analyzer. I haven’t used those new tools much so I owe it to myself to dig a bit more into WPR/WPA because from an initial play of analysing 10-15 seconds of running Metrotwit on my system I managed to extract a whole bunch of instrumentation data that seemed fascinating to me;
There’s a whole bunch of docs on Windows Performance Recorder/Analyzer up on MSDN and there was a session on it back at BUILD 2011 which I’m going to invest some time in watching;
and my purely uneducated guess would be that the new XAML/HTML UI responsiveness tools that are in Visual Studio 2013 are in some way a subset of what’s present here working from the same ETW events to display that data in Visual Studio in a more targeted manner.
Back to Windows Store App Instrumentation
For a developer of a Windows 8.0 Store app, I think it’s fair to say that the support for high-performance instrumentation was a bit limited/patchy and there are certainly forum posts on this topic;
- In the .NET environment, the System.Diagnostics namespace is lacking the Eventing namespace so there’s no EventProviderTraceListener class but the Tracing namespace does have the EventSource class.
- In the native environment, the APIs for registering and logging from an ETW provider are there like EventRegister and EventWrite but…
In all of these environments, I’m not sure that it’s possible for a Windows 8.0 Store app to register itself as an ETW provider with the system. I can’t find a reference for that so feel free to tell me that’s wrong but I could see why that might be the case because I’m not sure that it’d make sense in the general case for an app coming from the Store to start registering itself as an ETW provider.
What I think that means is that while an app can use ETW tracing it can only do it in the sense that it can publish events effectively back to itself rather than to a session that can then be consumed by any consumer on the system. If the app is generating those events then it can choose what to do with them such as plugging in some listener for the events which periodically writes them to the filesystem.
That said, even if those scenarios can be achieved with the APIs provided, it doesn’t really feel to me like those APIs are a good fit – it seems cumbersome to have to jump through hoops just to get logging done from a Store app.
Fortunately, that changes with Windows 8.1.
Windows 8.1 – Adding Windows.Foundation.Diagnostics
There’s a video that covers these APIs from BUILD 2013;
with the section on these new APIs beginning at 17 minutes into the video although I’d encourage you to watch from the start because Harry walks through the aspects of Store diagnostics/error reports before he switches to these new APIs.
The essence of the API set is that it accommodates 2 scenarios;
- Logging to a circular memory buffer such that at some point in time when an error condition is detected (perhaps detected by your code or perhaps signified by a crash) it’s possible to get the OS to flush out the log from the buffer so that you can get something of a “flight recorder” log of what caused the error condition.
- Logging to files with the ability to keep a limit on the size of those files and automatically start new files when an old one is full.
I can see the former being used to figure out why an application is misbehaving – e.g.
- When the app crashes, flush the tracing log and submit it (manually or automatically) to some web service or email address for analysis.
- When the app is misbehaving from a functional point of view, the developer might offer a “dump log” UI option somewhere to capture recent events for automatic/manual submission.
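To make the “flight recorder” idea a bit more concrete, here’s a minimal, language-neutral sketch in Python. It is not the Windows.Foundation.Diagnostics API and says nothing about how the OS implements its buffer; the FlightRecorder class, its capacity parameter and its dump method are names I’ve made up purely for illustration. The point is just that a fixed-size buffer keeps the most recent events and drops the oldest, so that a dump taken when something goes wrong shows what led up to it.

```python
from collections import deque
from datetime import datetime, timezone


class FlightRecorder:
    """Hypothetical in-memory 'flight recorder' that keeps only the newest events."""

    def __init__(self, capacity=4):
        # A deque with maxlen discards its oldest entry once it is full,
        # which gives us circular-buffer behaviour for free.
        self._events = deque(maxlen=capacity)

    def log(self, message):
        # Store a UTC timestamp alongside each message.
        stamp = datetime.now(timezone.utc).isoformat()
        self._events.append((stamp, message))

    def dump(self):
        """Return the buffered messages, oldest first (e.g. when an error is detected)."""
        return [message for _, message in self._events]


# Assumed capacity of 3, chosen only to keep the example small.
recorder = FlightRecorder(capacity=3)
for step in ["start", "load page", "call service", "retry", "failure"]:
    recorder.log(step)

# Only the three most recent events are still in the buffer at this point.
recent = recorder.dump()
```

In a real app the dump would be written out or submitted somewhere rather than just returned, and the capacity would be much larger.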
I can see the latter perhaps being used more to analyse how an application works in regular use – e.g.
- Capture logs automatically (as part of an opt-in “improve the experience” option for the user) and submit them periodically (and perhaps automatically) to some back end service where they can be analysed.
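The second scenario, size-limited files that roll over automatically, is a pattern that exists in plenty of logging libraries. As a rough sketch of the concept only (again, not the Windows 8.1 API), this uses RotatingFileHandler from Python’s standard library. The 200-byte limit, the count of 2 older files, the logger name and the log messages are all assumptions chosen to keep the example small.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Write into a throwaway directory so the sketch doesn't touch real files.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Assumed limits for illustration: start a new file before the current one
# reaches 200 bytes, and keep at most 2 older files alongside it.
handler = RotatingFileHandler(log_path, maxBytes=200, backupCount=2)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("usage-sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the output in our files only
logger.addHandler(handler)

# Simulate an app recording which sections the user visits.
for i in range(50):
    logger.info("user opened section %d", i)

handler.close()

# The directory now holds the current log plus the retained older files.
log_files = sorted(os.listdir(log_dir))
```

Periodic submission to a back end service would then be a matter of picking up the older, completed files and uploading them, which is the part that needs the user’s opt-in.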
In terms of coding against the APIs the main conceptual pieces are the LoggingSession and the LoggingChannel and there is a sample in all 3 implementation technologies on the dev centre illustrating their use. At the time of writing, I haven’t quite got my head around exactly how I’d use sessions/channels in my own apps so, to keep this post shorter, I’ll follow up with another post on actually using those classes to get some logging done.