Published Tuesday, October 03, 2006 4:23 AM by mtaulty

Death of the TCHAR?

The Windows API, as it's implemented in all versions of Windows NT, is a Unicode API. That means that all the strings you pass into it are made of 16-bit (2-byte) characters, and as a C++ programmer you use wchar_t to represent them (there are lots of other names for that same type, such as WCHAR, in header files all over the platform).

On the older Windows 9x platforms this wasn't the case: you passed single-byte chars into APIs, and that would have caused you a problem when you came to port your code to NT. The guys made sure that didn't hurt you too much by providing two variants of each API that takes a string. So, even today, if you look at the exports from (say) kernel32.dll you'll spot (as an example);

CreateFileA

CreateFileW

and the platform headers make sure that CreateFile is #defined to be CreateFileW if you've got the UNICODE symbol defined and CreateFileA otherwise. If you end up calling CreateFileA on NT then it widens that string out to Unicode and calls CreateFileW.
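As a rough sketch of that pairing, here's the same shape built from made-up names so that it compiles without the Windows headers. OpenThingA, OpenThingW, OpenThing and MY_UNICODE are all hypothetical stand-ins (for CreateFileA, CreateFileW, CreateFile and UNICODE), char16_t stands in for the 2-byte wchar_t you get on Windows, and the widening assumes plain ASCII input rather than doing a real code page conversion.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a W/A pair such as CreateFileW / CreateFileA.
// Nothing here is the real Windows API; it only mimics the shape.

// The "real" implementation only understands wide (16-bit) strings.
inline std::u16string OpenThingW(const std::u16string& name) {
    return u"opened:" + name;
}

// The narrow variant widens its argument and forwards to the W variant.
// Assumption: the input is plain ASCII, so widening is a per-character copy.
inline std::u16string OpenThingA(const std::string& name) {
    std::u16string wide;
    for (unsigned char c : name) {
        wide.push_back(static_cast<char16_t>(c));
    }
    return OpenThingW(wide);
}

// Header-style switch: one unsuffixed name, resolved by the pre-processor.
#ifdef MY_UNICODE  // hypothetical symbol playing the role of UNICODE
#define OpenThing OpenThingW
#else
#define OpenThing OpenThingA
#endif
```

With MY_UNICODE left undefined, a call to OpenThing("a.txt") compiles to a call to OpenThingA, which ends up in OpenThingW anyway.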

If you wanted to write code that could compile both with and without the UNICODE symbol defined then you needed two different kinds of strings in your program, and one of the nice ways of doing that was to use the TCHAR type, which is typedef'd as one of these two depending on the UNICODE symbol;

typedef wchar_t TCHAR;   // when UNICODE is defined

typedef char TCHAR;      // when UNICODE is not defined

and that let you use TCHAR* and std::basic_string<TCHAR> and so on. Once again, depending on whether you compiled with or without the UNICODE symbol defined for the pre-processor, you either got 1-byte-wide strings going in to the non-Unicode functions or 2-byte-wide strings going in to the Unicode variants.
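Here's a minimal sketch of that idea which builds anywhere. MY_UNICODE, MYCHAR, MYTEXT and mystring are hypothetical names mimicking UNICODE, TCHAR, the TEXT() macro and a TCHAR-based string type; they're not the real definitions from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout: MY_UNICODE, MYCHAR and MYTEXT mimic
// UNICODE, TCHAR and TEXT() without needing the Windows headers.
#ifdef MY_UNICODE
typedef wchar_t MYCHAR;
#define MYTEXT(s) L##s
#else
typedef char MYCHAR;
#define MYTEXT(s) s
#endif

// One string type that follows whichever character type was selected.
typedef std::basic_string<MYCHAR> mystring;

// Code written against MYCHAR compiles unchanged in both configurations.
inline std::size_t SizeInBytes(const mystring& s) {
    return s.size() * sizeof(MYCHAR);
}
```

Something like mystring s = MYTEXT("log.txt"); then has 7 characters either way, and only the number of bytes behind them changes with the build setting.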

This still has relevance today. If you're looking to set breakpoints in a debugger then you need to know about CreateFileW and CreateFileA (and all the thousands of other similar APIs), and if you're writing PInvoke code in .NET then you come up against the heuristics that DllImport applies based on its CharSet field, which can be Auto/Ansi/Unicode and which determines both how strings are marshalled and which function is picked when you have something like;

[DllImport("kernel32")]

private static extern IntPtr CreateFile(......)
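To make the "which function is picked" part a bit more concrete, here's a small model of the name probing as I understand it from the documentation. It's a sketch and not the runtime's actual code: CharSetModel and ResolveEntryPoint are hypothetical names, the export table is just a set of strings, it assumes no exact spelling has been demanded, and it assumes Auto behaves like Unicode (which is the case on NT-based Windows).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the three CharSet choices; not the .NET enum itself.
enum class CharSetModel { Ansi, Unicode, Auto };

// Returns the export the probe would bind to, or "" if nothing matches.
// Assumptions: no exact-spelling requirement, and Auto behaves like Unicode.
inline std::string ResolveEntryPoint(const std::set<std::string>& exports,
                                     const std::string& name,
                                     CharSetModel cs) {
    const bool ansi = (cs == CharSetModel::Ansi);
    const std::string suffixed = name + (ansi ? "A" : "W");
    // Ansi: plain name first, then name+"A".
    // Unicode: name+"W" first, then the plain name.
    const std::string first = ansi ? name : suffixed;
    const std::string second = ansi ? suffixed : name;
    if (exports.count(first) != 0) return first;
    if (exports.count(second) != 0) return second;
    return std::string();
}
```

Given exports of CreateFileA and CreateFileW, asking for "CreateFile" lands on CreateFileA under Ansi and CreateFileW under Unicode. A function exported under a single unsuffixed name is found under either setting, and only the string marshalling differs.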

I've noticed that new APIs being added for Vista no longer seem to follow this pattern, and I guess that ultimately what that signifies is the death of the Windows 9x platform: there's no longer a need for APIs to have an A and a W variant because those APIs are added for Vista only and, consequently, they're not going to be called from code that isn't Unicode.

I'm not sure if this happened for XP, but I think some of the 9x platforms were still supported at that point so I don't think it did, which would make Vista the first of the NT variants that finally shrugs off this bit of heritage and moves forward :-)

So, bear in mind that with Vista there are APIs (e.g. the P2P APIs) that have only a single name and only take Unicode strings, as this whole (useful but confusing) period drifts off into history :-)

Update

I'm not 100% sure on this one yet, but Daniel was talking to me about problems he was having PInvoking across to one of the new Windows Error Reporting functions, WerRegisterFile. He was finding that if he did something like;

[DllImport("kernel32")]

private static extern int WerRegisterFile(...)

then that didn't work, whereas if he went back and explicitly chose CharSet.Auto or CharSet.Unicode then it worked.

I had a bit of a poke around in the sample that he'd got and my thoughts were;

  1. There is no WerRegisterFileA and WerRegisterFileW, there's just WerRegisterFile and that's a Unicode version.
  2. CharSet on the DllImport attribute defaults to CharSet.Ansi for C# and VB.
  3. At runtime, with CharSet.Ansi the PInvoke goo looks for the plain name WerRegisterFile first (it would only try WerRegisterFileA if that were missing), so it binds to WerRegisterFile.
  4. The PInvoke goo still treats WerRegisterFile as an ANSI function, so it marshals the Unicode string to ANSI and passes that ANSI string into the Unicode function WerRegisterFile, which then doesn't like the string very much.
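The last step in that list is easier to see with actual bytes. The sketch below isn't the marshaller or WerRegisterFile itself: MarshalToAnsi and ReadNarrowBytesAsWide are hypothetical helpers, char16_t stands in for the 2-byte Windows wchar_t, and the narrowing assumes ASCII-only text. It just shows what a function expecting 16-bit characters ends up reading when it's handed 8-bit ones.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// First half of the problem: the marshaller narrows the caller's 16-bit text.
// Assumption: ASCII-only input, so narrowing just keeps the low 7 bits.
inline std::string MarshalToAnsi(const std::u16string& wide) {
    std::string narrow;
    for (char16_t c : wide) {
        narrow.push_back(static_cast<char>(c & 0x7F));
    }
    return narrow;
}

// Second half: the Unicode-only callee reads that buffer two bytes at a time,
// so every pair of ANSI characters gets fused into one 16-bit unit.
inline std::u16string ReadNarrowBytesAsWide(const std::string& narrow) {
    std::u16string seen(narrow.size() / 2, u'\0');
    std::memcpy(&seen[0], narrow.data(), seen.size() * sizeof(char16_t));
    return seen;
}
```

Feeding an 8-character path such as C:\a.txt through both helpers gives back 4 16-bit units, none of which is the character the caller meant, which is about what you'd expect a path-taking function to choke on.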

So, I think that if you want to DllImport one of these new Vista functions from C# or VB code then you need to be explicit about your CharSet (e.g. CharSet = CharSet.Unicode on the DllImport attribute) rather than letting it default to CharSet.Ansi.

Daniel will no doubt post something more complete about this in the next day or two and I'll link this post to his post when it's up.