
Posts Tagged ‘C++’

How to embed data into a resource

Saturday, August 7th, 2010

In this article I will demonstrate how you can embed data into a Windows PE format executable (EXE or DLL). At the end I will also provide a working example which you can use to embed data into your executable as custom resources.

The problem

Software often needs ancillary data to support it. This data can reside in files on your hard disk, on a network computer or on a computer accessed across the Internet. Or the data can be embedded in your executable. No one solution is correct for all cases; you have to choose the right solution for the task at hand. I’ll briefly describe the four methods, outlining the potential pitfalls of each.

  • Loading the data from disk. You need to locate the file and read its contents. What happens if the file is missing? If the file is present and readable, has it been modified by accident or deliberately tampered with? You will need a mechanism to detect this if appropriate.
  • Loading the data from a network computer. This is similar to loading the file from disk, except that you also need to know the name of the network computer.
  • Loading the data from a computer on the Internet. This is more complex: now you need to use some protocol to download the file. What if the Internet connection is not available or the connection is refused?
  • Embedding the data in your executable. Embedding the data is harder than creating a file, and reading the data is harder than reading a file. However, the data will always be available. If your application uses checksums (MD5, etc.) or is digitally signed, then you will know if the embedded data has been modified or tampered with.

Embedding data

Sometimes it would be more convenient if the data was embedded right into the executable we are creating.

Visual Studio does provide ways to embed data, but none of them is convenient. You could transcribe the data by hand, but that would be time consuming, expensive, error prone and tedious. Alternatively, you can add a custom resource, edit the properties for the custom resource, and identify the file that contains the data you wish to embed into the executable. We have tried this, but no error message is reported when the file cannot be found (for example, because you mistyped the filename), and there is no way to conditionally change which custom resource is embedded depending on the build.

Fortunately, Windows provides an API for adding data to the resource section of an executable (.exe or .dll). The API also provides mechanisms for finding this data. With the use of the API we can create a helper application to embed as many custom resources as you want after you have built your executable.

For this example I will assume the data we are adding to the executable is not data you would normally find in a resource. This means we will be adding a custom resource.

Let us say we want to add a Java class file to our executable so that we can find this class file at runtime without knowing anything about the current Java CLASSPATH or the file system. Once we’ve extracted the class file we could use it to define a class that would then be used by the Java Virtual Machine to do the work we want (presumably somewhere else we’ll be instrumenting Java class files so they know about the Java class we just defined).

We need a few things first, which we will also need when we come to extract the resource from the executable.

  • Executable to add the resource to.
  • Type name for the custom resource.
  • Name for the custom resource.
  • Data for the custom resource.

For our Java class file example, type could be “CLASSFILE”, name could be “myJavaSpy” and data would be the byte code for the class myJavaSpy which we would load from the file myJavaSpy.class (having previously compiled it from myJavaSpy.java).
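
Loading the data for the resource is ordinary file I/O. The sketch below assumes the whole file fits comfortably in memory; readFileBytes is a hypothetical helper, not part of the Windows API or of addResourceToDLL. It returns false when the file cannot be opened, so a missing file can be reported instead of silently embedding nothing.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read the whole of 'path' (for example myJavaSpy.class)
// into 'bytes'. Returns false if the file cannot be opened or a read error
// occurs, so the caller can report the problem.
bool readFileBytes(const std::string &path, std::vector<unsigned char> &bytes)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return false;                       // missing or unreadable file

    // Copy every byte of the file into the vector.
    bytes.assign(std::istreambuf_iterator<char>(file),
                 std::istreambuf_iterator<char>());
    return !file.bad();
}
```

The resulting vector supplies the data pointer (bytes.data()) and length (bytes.size()) needed by the resource update API described next.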

The API

BeginUpdateResource

    HANDLE BeginUpdateResource(const TCHAR *executableName,
                               BOOL        fDeleteExistingResources);

Call BeginUpdateResource() to open the specified executable for resource updating. It returns an update handle, or NULL on failure. Pass TRUE for the second argument to erase all existing resources; pass FALSE to keep any existing resources in the executable.

UpdateResource

    BOOL UpdateResource(HANDLE  hUpdate,
                        LPCTSTR lpType,
                        LPCTSTR lpName,
                        WORD    wLanguage,
                        LPVOID  lpData,
                        DWORD   cbData);

Call UpdateResource() to update a resource in the executable represented by the handle hUpdate. Specify the type, name, language (locale) and data with the remaining arguments. For our example above, lpType would be “CLASSFILE” and lpName would be “myJavaSpy”. Pass MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL) for the language. Pass the Java byte code and the length of the byte code for the last two arguments.

EndUpdateResource

    BOOL EndUpdateResource(HANDLE hUpdate,
                           BOOL   fDiscard);

Call EndUpdateResource() to finish updating the resource. If you wish to discard your changes, pass TRUE as the second argument. If you wish to keep your changes, pass FALSE as the second argument.

Putting it together

    HANDLE hUpdateRes;

    // Open the executable (EXE or DLL) to which you want to add the custom resource.

    hUpdateRes = BeginUpdateResource(executableName,
                                     FALSE);          // do not delete existing resources
    if (hUpdateRes != NULL)
    {
        BOOL   result;

        // Add the custom resource to the update list.

        result = UpdateResource(hUpdateRes,
                                customType,
                                customName,
                                MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL),
                                bytes,
                                numBytes);
        if (result)
        {
            // Write the changes to the executable and close the handle.

            EndUpdateResource(hUpdateRes, FALSE);
        }
        else
        {
            // UpdateResource() failed: discard the changes, but still close the handle.

            EndUpdateResource(hUpdateRes, TRUE);
        }
    }

First we call BeginUpdateResource() to open the executable for resource updating. We pass FALSE as the second argument to make sure we keep the existing resources and only add our new resource. This call returns an update handle.

If the call to BeginUpdateResource() is successful, we receive a non-NULL update handle. We use it to call UpdateResource(), passing the type and name of the resource we wish to update along with the data and its length. In this example we have specified a neutral locale.

Finally we call EndUpdateResource() to finish updating the resource and to write the results back to the executable (pass FALSE as the second argument).

addResourceToDLL

addResourceToDLL.exe is a command line program that you can add to your post-build process to embed custom resources into your EXE/DLL as you build. It has a quiet mode so that you can suppress any information and/or error messages it may emit. I don’t use the quiet mode; I like to see the confirmation message that it succeeded in embedding data into the DLL. Run it without arguments to get the help message.

Help summary

All arguments are mandatory unless noted otherwise.

  • -moduleName pathToDLL (or EXE)
  • -customResource pathToCustomResource
  • -customType type
  • -customName name
  • -quiet (optional)

Example:

    addResourceToDLL.exe -moduleName c:\myJavaDetective\myJavaDetective.dll -customResource c:\myJavaDetective\myJavaSpy.class -customType CLASSFILE -customName myJavaSpy

The example above embeds the myJavaSpy.class file into myJavaDetective.dll with the type “CLASSFILE” and name “myJavaSpy”.
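
As an illustration of how the options in the help summary might be parsed, here is a minimal sketch. ToolOptions and parseArguments are hypothetical names; this is not the addResourceToDLL source code, and it assumes options may appear in any order and that the program name has already been removed from the argument list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option set mirroring the help summary above.
struct ToolOptions
{
    std::string moduleName;       // -moduleName pathToDLL (or EXE)
    std::string customResource;   // -customResource pathToCustomResource
    std::string customType;       // -customType type
    std::string customName;       // -customName name
    bool        quiet = false;    // -quiet (optional)
};

// Returns false if an option is unknown, an option is missing its value,
// or one of the mandatory options was not supplied.
bool parseArguments(const std::vector<std::string> &args, ToolOptions &opts)
{
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        const std::string &arg = args[i];

        if (arg == "-quiet")
        {
            opts.quiet = true;
            continue;
        }

        std::string *target = nullptr;
        if (arg == "-moduleName")          target = &opts.moduleName;
        else if (arg == "-customResource") target = &opts.customResource;
        else if (arg == "-customType")     target = &opts.customType;
        else if (arg == "-customName")     target = &opts.customName;
        else
            return false;                  // unknown option

        if (i + 1 >= args.size())
            return false;                  // option given without a value
        *target = args[++i];               // consume the value
    }

    // Every option except -quiet is mandatory.
    return !opts.moduleName.empty() && !opts.customResource.empty() &&
           !opts.customType.empty() && !opts.customName.empty();
}
```

Reporting a failure here, rather than carrying on with an empty path, is what makes a post-build step like this safer than the silent custom resource editor.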

Download

Download the addResourceToDLL source code.

Download the addResourceToDLL executable.

In the next article I will show you how to read the embedded data from the resource.

Doing good work can make you feel a bit stupid

Monday, July 19th, 2010

Doing good work can make you feel a bit stupid. Well, that’s my mixed bag of feelings for this weekend. Here is why…

Last week was a rollercoaster of a week for software development at Software Verification.

Off by one, again?

First off, we found a nasty off-by-one bug in our nifty memory mapped performance tools, specifically Performance Validator. The off-by-one didn’t cause any crashes, errors or bad data. But it did cause us to eat memory like nobody’s business, and for various reasons it hadn’t been found, as it didn’t trigger any of our tests.

Then along comes a customer with a huge monolithic executable which won’t profile properly. He had already thrown us a curve ball by supplying it as a mixed mode app – half native C++, half C#. That in itself causes problems with profiling – the native profiler has to identify and ignore any functions that are managed (.Net). He was pleased with that turnaround, but then surprised we couldn’t handle his app, as we had handled previous (smaller) versions of it. The main reason he was using our profiler was that he had tried others and they couldn’t handle his app – and now neither could we! Unacceptable – well, that was my first thought – though I was half resigned to the fact that maybe there wasn’t a bug and this was just a goliath of an app that couldn’t be profiled.

I spent a day adding logging to every place, no matter how insignificant, in our function tree mapping code. This code uses shared memory mapped space exclusively, so you can’t refer to other nodes by address, as the address in one process won’t be valid in the other processes reading the data. We had previously reorganised this code to give us a significant improvement in handling large data volumes and thus were surprised at the failure presented to us. Then came a long series of tests, each of which was very slow (the logging writes to files and it’s a large executable to process). The logging data was huge; some of the log files were GBs in size. It’s amazing what Notepad can open if you give it a chance!
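
To make the shared memory point concrete: a node cannot store a pointer to another node, because each process may map the region at a different base address. A minimal sketch, assuming links are stored as byte offsets from the start of the region with offset 0 reserved for “none”. TreeNode, placeNode, nodeAt and countChildren are hypothetical names, not Performance Validator’s actual data structures, and a std::vector stands in for a mapped view.

```cpp
#include <cstdint>
#include <new>
#include <vector>

// A node stored inside a shared region. Links are offsets, not pointers,
// so they remain valid whatever base address a process maps the region at.
struct TreeNode
{
    std::uint32_t functionId;
    std::uint32_t firstChild;    // offset of the first child, 0 = none
    std::uint32_t nextSibling;   // offset of the next sibling, 0 = none
};

// Construct a node at 'offset' bytes into the region starting at 'base'.
inline TreeNode *placeNode(unsigned char *base, std::uint32_t offset, std::uint32_t id)
{
    return new (base + offset) TreeNode{id, 0, 0};
}

// Turn an offset into a pointer that is valid in this process's view only.
inline TreeNode *nodeAt(unsigned char *base, std::uint32_t offset)
{
    return offset ? reinterpret_cast<TreeNode *>(base + offset) : nullptr;
}

// Count the children of the node at 'parent' by following offsets only.
inline int countChildren(unsigned char *base, std::uint32_t parent)
{
    int count = 0;
    for (TreeNode *child = nodeAt(base, nodeAt(base, parent)->firstChild);
         child != nullptr;
         child = nodeAt(base, child->nextSibling))
        ++count;
    return count;
}
```

Copying the bytes to a second buffer at a different address (as a second process mapping the region would see them) leaves every link intact, which would not be true of raw pointers.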

Finally about 10 hours in I found the first failure. Shortly after that I found the root cause. We were using one of our memory mapped APIs for double duty. And as such the second use was incorrect – it was multiplying our correctly specified size by a prefixed size offset by one. This behaviour is correct for a different usage. Main cause of the problem – in my opinion, incorrectly named methods. A quick edit later and we have two more sensibly named methods and a much improved memory performance. A few tests later and a lot of logging disabled and we are back to sensible performance with this huge customer application (and a happy customer).

So chalk up one “how the hell did that happen?” followed by feelings of elation and pleasure as we fixed it so quickly.

I’m always amazed by off-by-one bugs. It doesn’t seem to matter how experienced you are – they do seem to reappear from time to time. Maybe that is one of the perils of logic for you, or of tiredness.

I guess there is a Ph.D. for someone in studying CVS commits, file modification timestamps and off-by-one bugs and trying to map them to time-of-day/tiredness attributes.

That did eat my Wednesday and Thursday evenings, but it was worth it.

Not to be outdone…

I had always thought .Net Coverage Validator was a bit slow. It was good in GUI interaction tests (which is part of what .Net Coverage Validator is about – realtime code coverage feedback to aid testing) but not good on long running loops (a qsort() for example). I wanted to fix that. So following on from the success with the C++ profiling I went exploring an idea that had been rattling around in my head for some time. The Expert .Net 2.0 IL Assembler book (Serge Lidin, Microsoft Press) was an invaluable aid in this.

What were we doing that was so slow?

The previous (pre V3.00) .Net Coverage Validator implementation calls a method for each line that is visited in a .Net assembly. That method is in a unique DLL and has a unique ID. We were tracing application execution and when we found our specific method we’d walk up the callstack one item and that would be the location of a coverage line visit. This technique works, but it has a high overhead:

  1. ICorProfiler / ICorProfiler2 callback overhead.
  2. Callstack walking overhead.

The result is that for GUI operations, code coverage is fast enough that you don’t notice any problems. But for long running functions or loops, code coverage is very slow.

This needed replacing.

What are we doing now that is so fast?

The new implementation doesn’t trace methods or call a method of our choosing. For each line we modify a counter. The location of the counter and the modification of it are placed directly into the ilAsm code for each C#/VB.Net method. Our first implementation of .Net Coverage Validator could not do this because our shared memory mapped coverage data architecture did not allow it – the shared memory may have moved during the execution run and thus the embedded counter location would be invalidated. The new architecture allows the pointer to the counter to be fixed.
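
The difference between the two approaches can be illustrated at C++ source level. In this sketch each original line is preceded by a single increment of that line’s counter, instead of a call into the profiler followed by a callstack walk. The real tool injects equivalent instructions into the IL rather than editing source; lineHits, kLines and sumTo are hypothetical, and a static array stands in for the fixed-location counters in shared memory.

```cpp
#include <cstddef>

// One counter per instrumented line, at an address that never moves.
constexpr std::size_t kLines = 3;
unsigned long lineHits[kLines];   // static storage, so zero-initialised

// A function as it might look after instrumentation.
long sumTo(long n)
{
    ++lineHits[0];                // line 1: long total = 0;
    long total = 0;

    for (long i = 1; i <= n; ++i)
    {
        ++lineHits[1];            // line 2: total += i;
        total += i;
    }

    ++lineHits[2];                // line 3: return total;
    return total;
}
```

Each visit costs one memory increment, which is why a tight loop barely slows down compared with a per-line callback.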

The implementation and testing for this only took a few hours. Amazing. I thought it was going to be fraught with trouble, not having done much serious ilAsm for a year or so.

Result?

The new architecture is so lightweight that you barely notice the performance overhead. Less than 1%. Your code runs just about at full speed even with code coverage in place.

As you can imagine, getting that implemented, working and tested in less than a day is an incredible feeling. Especially compared to the previous performance level we had.

So why feel stupid?

Having achieved such good performance (and naturally feeling quite good about yourself for a while afterwards), it’s hard not to look back on the previous implementation and think “Why did we accept that? We could have done so much better.” And that is where the feeling stupid comes in. You’ve got to be self critical to improve. Pat yourself on the back for the good times, and reflect on the past to recognise where you could have done better, so that you don’t make the same mistake in the future.

And now for our next trick…

The inspiration for our first .Net Coverage Validator implementation came from our Java Coverage Validator tool. Java opcodes don’t allow you to modify memory directly like .Net ilAsm does, so we had to use the method calling technique for Java. However given our success with .Net we’ve gone back to the JVMTI header files (which didn’t exist when we first wrote the Java tools) and have found there may be a way to improve things. We’ll be looking at that soon.

How to prevent a memory tool from monitoring your C/C++ allocations

Saturday, July 10th, 2010

A little known fact is that the Microsoft C Runtime (CRT) has a feature which allows some allocations (in the debug runtime) to be tagged with flags that cause these allocations to be ignored by the built-in memory tracing routines. A good memory allocation tool will also use these flags to determine when to ignore memory allocations – thus not reporting any allocations that Microsoft think should remain hidden.

A customer problem

The inspiration for this article was a customer reporting that Memory Validator was not reporting any allocations in a particular DLL of his mixed mode .Net/native application. The application was interesting in that it was a combination of C#, C++ written with one version of Visual Studio and some other DLLs also written in C++ with another version of Visual Studio. Only the memory for one of the DLLs was not being reported by Memory Validator and the customer wanted to know why and could we please fix the problem?

After some investigation we found the problem was not with Memory Validator but with the DLL in question making a call to _CrtSetDbgFlag(0), which turned off all memory tracking for that DLL. Memory Validator honours the memory tracking flags built into Visual Studio and thus did not report these memory allocations. Armed with this information the customer did some digging into their code base and found that someone had deliberately added this call into their code. Removing the call fixed the problem.

The rest of this article explains how Microsoft tags data to be ignored and what flags are used to control this process.

Why does Microsoft mark these allocations as ignore?

The reason for this is that these allocations are for internal housekeeping and sometimes also for one-off allocations that will exist until the end of the application lifetime. Such allocations could show up as memory leaks at the end of the application – that would be misleading as they were intended to persist. Better to mark them as “ignore” and not report them during a memory leak report.

Microsoft debug CRT header block

Microsoft’s debug CRT prefixes each allocation with a header block. That header block looks like this:

#define nNoMansLandSize 4

typedef struct _CrtMemBlockHeader
{
    struct _CrtMemBlockHeader * pBlockHeaderNext;
    struct _CrtMemBlockHeader * pBlockHeaderPrev;
    char *                      szFileName;
    int                         nLine;
#ifdef _WIN64
    /* These items are reversed on Win64 to eliminate gaps in the struct
     * and ensure that sizeof(struct)%16 == 0, so 16-byte alignment is
     * maintained in the debug heap.
     */
    int                         nBlockUse;
    size_t                      nDataSize;
#else  /* _WIN64 */
    size_t                      nDataSize;
    int                         nBlockUse;
#endif  /* _WIN64 */
    long                        lRequest;
    unsigned char               gap[nNoMansLandSize];
    /* followed by:
     *  unsigned char           data[nDataSize];
     *  unsigned char           anotherGap[nNoMansLandSize];
     */
} _CrtMemBlockHeader;

How does Microsoft tag an allocation as ignore?

When the CRT wishes an allocation to be ignored for memory tracking purposes, six values in the debug memory allocation header for each allocation are set to specific values.

Member              Value        #define
nLine               0xFEDCBABC   IGNORE_LINE
nBlockUse           0x3          IGNORE_BLOCK
lRequest            0x0          IGNORE_REQ
szFileName          NULL
pBlockHeaderNext    NULL
pBlockHeaderPrev    NULL

The Microsoft code goes out of its way to ensure no useful information can be gained from the header block for these ignored items.

When we first created Memory Validator we noticed that items marked as ignored should be ignored; otherwise you can end up with false positive noise reported at the end of a memory debugging session due to the internal housekeeping of MFC/CRT.
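
A memory tool can recognise these blocks by checking for the whole signature listed in the table above. A minimal sketch, assuming you already have a pointer to a block’s header: DebugBlockHeader is a hypothetical mirror of _CrtMemBlockHeader (a real tool must match the layout of the CRT version it inspects), and kIgnoreLine, kIgnoreBlock and kIgnoreReq are hypothetical stand-ins for IGNORE_LINE, IGNORE_BLOCK and IGNORE_REQ.

```cpp
#include <cstddef>

// Hypothetical mirror of the debug CRT header, including the _WIN64 swap.
struct DebugBlockHeader
{
    DebugBlockHeader *pBlockHeaderNext;
    DebugBlockHeader *pBlockHeaderPrev;
    char             *szFileName;
    int               nLine;
#ifdef _WIN64
    int               nBlockUse;
    std::size_t       nDataSize;
#else
    std::size_t       nDataSize;
    int               nBlockUse;
#endif
    long              lRequest;
    unsigned char     gap[4];
};

// Values from the table: IGNORE_LINE, IGNORE_BLOCK, IGNORE_REQ.
const int  kIgnoreLine  = static_cast<int>(0xFEDCBABC);
const int  kIgnoreBlock = 3;
const long kIgnoreReq   = 0;

// True if the header carries the CRT's "ignore" signature, in which case
// the block should not be reported (for example as a leak).
bool isIgnoredBlock(const DebugBlockHeader &h)
{
    return h.nLine == kIgnoreLine &&
           h.nBlockUse == kIgnoreBlock &&
           h.lRequest == kIgnoreReq &&
           h.szFileName == nullptr &&
           h.pBlockHeaderNext == nullptr &&
           h.pBlockHeaderPrev == nullptr;
}
```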

How can you use this information in your application?

Microsoft also provides some flags that you can control to influence whether any memory is reported as leaked. This is in addition to the CRT marking its own allocations as “ignore”. You can set these flags using the _CrtSetDbgFlag(int) function.

The following flags can be passed to _CrtSetDbgFlag() in any combination.

Flag                        Default  Meaning
_CRTDBG_ALLOC_MEM_DF        On       On: Enable debug heap allocations and use of memory block type identifiers.
_CRTDBG_CHECK_ALWAYS_DF     Off      On: Call _CrtCheckMemory at every allocation and deallocation request. (Very slow!)
_CRTDBG_CHECK_CRT_DF        Off      On: Include _CRT_BLOCK types in leak detection and memory state difference operations.
_CRTDBG_DELAY_FREE_MEM_DF   Off      On: Keep freed memory blocks in the heap’s linked list, assign them the _FREE_BLOCK type, and fill them with the byte value 0xDD. CAUTION! Using this option will use lots of memory.
_CRTDBG_LEAK_CHECK_DF       Off      On: Perform automatic leak checking at program exit via a call to _CrtDumpMemoryLeaks and generate an error report if the application failed to free all the memory it allocated.

How do I disable memory tracking for the CRT?

If you call _CrtSetDbgFlag(0); any memory allocated after that point will not be tracked.

With the flag set to 0, every block that is not a _CRT_BLOCK is marked as ignore. You can see the code for this in the Microsoft C runtime.

The code that marks the block as “ignore” is at line 404 in dbgheap.c in the Microsoft C runtime (also used by MFC). When your code arrives here, nBlockUse == 1 (_NORMAL_BLOCK) and _crtDbgFlag == 0.

dbgheap.c line 404 (line number will vary with Visual Studio version)
                if (_BLOCK_TYPE(nBlockUse) != _CRT_BLOCK &&
                     !(_crtDbgFlag & _CRTDBG_ALLOC_MEM_DF))
                     fIgnore = TRUE;

This sets fIgnore to TRUE. From this point onwards the memory tracking code ignores the memory and sets the values mentioned above in the memory block header.
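
The effect of _CrtSetDbgFlag(0) can be modelled outside the CRT. The constants below are hypothetical mirrors of the values in Visual C++’s crtdbg.h (check them against the header that ships with your compiler), and shouldIgnore() reproduces only the dbgheap.c test shown above.

```cpp
// Hypothetical mirrors of crtdbg.h values.
const int kNormalBlock = 1;      // _NORMAL_BLOCK
const int kCrtBlock    = 2;      // _CRT_BLOCK
const int kAllocMemDf  = 0x01;   // _CRTDBG_ALLOC_MEM_DF
const int kLeakCheckDf = 0x20;   // _CRTDBG_LEAK_CHECK_DF

// Same as the CRT's _BLOCK_TYPE() macro: the block type is in the low 16 bits.
inline int blockType(int nBlockUse)
{
    return nBlockUse & 0xFFFF;
}

// A block that is not a _CRT_BLOCK is ignored whenever the
// _CRTDBG_ALLOC_MEM_DF bit is clear in the debug flag.
inline bool shouldIgnore(int nBlockUse, int crtDbgFlag)
{
    return blockType(nBlockUse) != kCrtBlock && !(crtDbgFlag & kAllocMemDf);
}
```

Note that clearing only _CRTDBG_ALLOC_MEM_DF has the same effect as passing 0. With the real CRT, the usual way to change one flag without disturbing the others is to read the current value with _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG), modify the bits you care about, and pass the result back to _CrtSetDbgFlag().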

Default values

The default value for _crtDbgFlag is set elsewhere in the Microsoft code with this line:

extern "C"
int _crtDbgFlag = _CRTDBG_ALLOC_MEM_DF | _CRTDBG_CHECK_DEFAULT_DF;

Don’t use srand(clock()), use srand((unsigned)time(NULL)) instead

Friday, July 9th, 2010

Typically you use srand() when you need to start the random number generator in a random place. You may do this because you are going to generate some keys or coupons and want them to start in an unpredictable place.

From time to time we provide special offers to customers in the form of a unique coupon code that can be used at purchase to get a specific discount. These coupons are also used to provide discounts to customers upgrading from say Performance Validator to C++ Developer Suite so that they do not pay for Performance Validator twice.

When the coupon management system was written, we used srand(clock()), thinking that would be an acceptable random seed for generating coupons. The thinking was that the management system would be running all the time, and thus clock() would return a value that was unlikely to be hit twice for the number of valid coupons at any one time. However, in practice users close the coupon management system when it is not in use, and thus clock() returns values close to the starting time (start the app, navigate to the appropriate place, generate a coupon).

Result: Sooner or later a duplicate coupon is created. And that is when we noticed this problem.

This resulted in a confused customer (“My coupon has already been used”), a confused member of customer support (“That shouldn’t be possible!”) followed by some checking of the coupon files and then the code to see how it happened. Easy to fix, but better selection of the seed in the first place would have prevented the problem.

So if you want better random numbers don’t use clock() to seed srand().

Better seeds

  • Use time(NULL) to get the time of day and cast the result to seed srand().
    time(NULL) returns the number of seconds elapsed since midnight (UTC) on January 1st, 1970.
  • Use the __rdtsc() intrinsic (the rdtsc instruction) to get the CPU timestamp and cast the result to seed srand().
    __rdtsc() is unlikely to return duplicate values, as it returns the processor’s timestamp counter, which counts clock ticks since the processor was reset.
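
The underlying problem is that srand() with the same seed always produces the same sequence from rand(). A minimal sketch: makeCoupon is a hypothetical coupon generator, not our coupon management code, and seedFromTimeOfDay applies the time(NULL) seed suggested above. Note that time(NULL) only changes once per second, so two sessions started within the same second would still share a seed.

```cpp
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical coupon generator: eight characters chosen with rand().
std::string makeCoupon()
{
    static const char alphabet[] = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    std::string coupon;
    for (int i = 0; i < 8; ++i)
        coupon += alphabet[std::rand() % (sizeof(alphabet) - 1)];
    return coupon;
}

// Seed from the time of day instead of clock(), which measures time since
// the program started and so is small and similar shortly after every start.
void seedFromTimeOfDay()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));
}
```

Seeding twice with the same value and calling makeCoupon() after each reproduces the duplicate coupon exactly.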

How to do cpuid and rdtsc on 32 bit and 64 bit Windows

Friday, June 4th, 2010

The 64 bit C++ compiler introduced with WIN64 brings many improvements and certain restrictions. One of those restrictions is that inline assembly is not supported. For those few of us who write hooking software this is a real inconvenience. Inline assembly is also useful for adding little snippets of code to access hardware registers that are not so easy to access from C or C++.

The 64 bit compiler also introduces some intrinsics, which are declared in intrin.h. These intrinsics allow you to add calls to low level functionality in your 64 bit code. Such functionality includes setting breakpoints, getting CPU and hardware information (the cpuid instruction) and reading the hardware timestamp counter.

In this article I’ll show you how you can use the same code for both 32 bit and 64 bit builds to have access to these intrinsics on both platforms.

__debugbreak()

The 64 bit compiler provides a convenient way for you to hard code breakpoints into your code, which is often very useful during testing. The __debugbreak() intrinsic provides this functionality.

No __debugbreak() on older 32 bit compilers

For 32 bit systems you have to know 80386 assembly. The breakpoint instruction has opcode 0xCC. The inline assembly for this is __asm int 3;

	#define __debugbreak()				__asm { int 3 }

cpuid – 32 bit

Some older 32 bit inline assemblers do not recognise the cpuid mnemonic, so you have to emit its opcode bytes (0x0F 0xA2) yourself using the emit directive.

	#define cpuid	__asm __emit 0fh __asm __emit 0a2h

and then you can use cpuid anywhere you need to.

	void doCpuid()
	{
		__asm pushad;		// save all the registers - cpuid trashes EAX, EBX, ECX, EDX
		__asm mov eax, 0; 	// get simplest cpuid data

		cpuid;			// call cpuid, results returned in EAX, EBX, ECX, EDX

		// read cpuid results here before restoring the registers...

		__asm popad;		// restore registers
	}

cpuid – 64 bit

On 64 bit systems you have to use the intrinsic __cpuid(registers, 0) declared in intrin.h.

	#include <intrin.h>

	void doCpuid()
	{
	    int registers[4];

	    __cpuid(registers, 0);
	}
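
To see what the registers array contains, here is a sketch that decodes the vendor string returned by leaf 0. It assumes the __cpuid() layout registers[0..3] = EAX, EBX, ECX, EDX; the 12 vendor characters are held in EBX, EDX and ECX, in that order, four little-endian bytes each. vendorFromCpuid is a hypothetical helper that uses only standard C++, so it can be tested with known register values.

```cpp
#include <cstdint>
#include <string>

// Decode the vendor string from the results of cpuid leaf 0.
std::string vendorFromCpuid(const int registers[4])
{
    const int order[3] = {1, 3, 2};   // EBX, EDX, ECX
    std::string vendor;
    for (int r : order)
    {
        std::uint32_t value = static_cast<std::uint32_t>(registers[r]);
        for (int byte = 0; byte < 4; ++byte)             // lowest byte first
            vendor += static_cast<char>((value >> (8 * byte)) & 0xFF);
    }
    return vendor;
}
```

On Windows you would call int registers[4]; __cpuid(registers, 0); and then pass the array to vendorFromCpuid(), which returns a string such as “GenuineIntel” or “AuthenticAMD”.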

rdtsc – 32 bit

Some older 32 bit inline assemblers do not recognise the rdtsc mnemonic, so you have to emit its opcode bytes (0x0F 0x31) yourself using the emit directive.

	#define rdtsc	__asm __emit 0fh __asm __emit 031h

and then you can use rdtsc anywhere you need to.

	#include <windows.h>	// LARGE_INTEGER

	__int64 getTimeStamp()
	{
	    LARGE_INTEGER li;

	    rdtsc;

	    __asm	mov	li.LowPart, eax;
	    __asm	mov	li.HighPart, edx;
	    return li.QuadPart;
	}
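
The LARGE_INTEGER step works because rdtsc leaves the 64 bit counter split across two 32 bit registers: EDX holds the high half and EAX the low half. Storing them into HighPart and LowPart is equivalent to the shift-and-or below; combineEdxEax is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Rebuild the 64 bit timestamp from the EDX (high) and EAX (low) halves.
inline std::uint64_t combineEdxEax(std::uint32_t edx, std::uint32_t eax)
{
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}
```

The number of ticks between two readings is then simply the later value minus the earlier one.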

rdtsc – 64 bit

On 64 bit systems you have to use the intrinsic __rdtsc() declared in intrin.h.

	#include <intrin.h>

	__int64 getTimeStamp()
	{
	    return __rdtsc();
	}

That’s not very portable, is it?

The problem with the above approach is that you end up with two implementations of these functions – one for your 32 bit build and one for your 64 bit build. It would be much more elegant to have a drop-in replacement for your 32 bit code that compiles in the same manner as the 64 bit code that uses the intrinsics declared in intrin.h.

Here is how you do it. Put all of the code below into a header file and #include that header file wherever you need access to __debugbreak(), __cpuid() or __rdtsc().

#ifdef _WIN64
#include <intrin.h>
#else	// _WIN64
	// x86 architecture

	#include <windows.h>	// LARGE_INTEGER, used by __rdtsc() below

	// __debugbreak()

	#if     _MSC_VER >= 1300
		// Win32, __debugbreak defined for VC2005 onwards
	#else	//_MSC_VER >= 1300
		// define for before VC 2005

		#define __debugbreak()				__asm { int 3 }
	#endif	//_MSC_VER >= 1300

	// __cpuid(registers, type)
	//		registers is int[4],
	//		type = 0

	// DO NOT add ";" after each instruction - it screws up the code generation

	#define rdtsc	__asm __emit 0fh __asm __emit 031h
	#define cpuid	__asm __emit 0fh __asm __emit 0a2h

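	inline void __cpuid(int	cpuInfo[4],
						int	cpuType)
	{
		if (cpuInfo == NULL)
			return;

		__asm pushad;			// cpuid trashes EAX, EBX, ECX, EDX
		__asm mov	eax, cpuType;

		cpuid;				// results returned in EAX, EBX, ECX, EDX

		// cpuInfo is a pointer parameter, so load it into EDI and store
		// through it. Inline assembly offsets are in bytes (4 per int);
		// cpuInfo[1] in inline assembly would address the parameter itself
		// plus 1 byte, not the second array element.

		__asm mov	edi, cpuInfo;
		__asm mov	[edi], eax;
		__asm mov	[edi + 4], ebx;
		__asm mov	[edi + 8], ecx;
		__asm mov	[edi + 12], edx;

		__asm popad;			// restore registers, including EDI
	}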

	// __rdtsc()

	inline unsigned __int64 __rdtsc()
	{
		LARGE_INTEGER	li;

		rdtsc;

		__asm	mov	li.LowPart, eax;
		__asm	mov	li.HighPart, edx;
		return li.QuadPart;
	}

#endif	// _WIN64

Now you can just use the 64 bit style intrinsics in your code and not have to worry about any messy conditional code for doing the 32 bit inline assembly or the 64 bit intrinsics. Much neater, more elegant, more readable and more maintainable.

Additional Information

If you wish to know more about __cpuid(), the parameters it takes and the values returned in the registers array, Microsoft have a __cpuid() intrinsic description which explains everything in great detail.

Cupid instruction

When I wrote this article I kept typing cupid instead of cpuid. I’m sure my mind was on something else :-) . How would the cupid instruction be implemented…

How to replace IsBadReadPtr?

Wednesday, May 26th, 2010

The Microsoft Win32 API contains various functions that look useful at first sight but which have now come to be regarded as pariahs. A good example of this is the IsBadReadPtr() function.

Under the hood this function uses structured exception handling to catch any exceptions that are thrown when the memory location is read. If an exception is thrown the function returns TRUE, otherwise FALSE. So far so good.

So what could be wrong with that? A simplistic or naive interpretation would be “nothing”. But that ignores the fact that the stacks of your application’s threads grow on demand. This is done to avoid committing the full (default) 1MB up front for each thread. This places a lower demand on application virtual memory and provides slightly faster startup for each thread. To allow each thread’s stack to grow on demand, the stack has guard pages. Accessing a guard page raises a guard page exception, which the OS handles gracefully: it extends your stack space by an appropriate amount and then returns execution to your application.

The problem with IsBadReadPtr() is that the exception handler inside IsBadReadPtr() eats the exception and thus the OS will not see it. So for the case where you end up using IsBadReadPtr() on a guard page you break the on-demand stack extension mechanism.

Raymond Chen of Microsoft has written a passionate post on this topic.

Raymond (and a few other folks) say that you should never use IsBadReadPtr(). I think that’s a bit strong.

There are a few occasions where you may know what the data structure is but you also know that it may have various memory protections on it. Such a case is when inspecting a DLL. Various parts are read-only. We have found during the last 10 years of writing tools like Memory Validator that it is not uncommon for a DLL loaded by LoadLibrary() to have memory protections on parts of the DLL that you don’t expect. We can’t control what DLLs our customers’ applications choose to load, so we have to handle all eventualities. We can’t just allow a crash to happen because we read a data location (in a customer DLL) that should be valid but isn’t.

It’s also worth noting that the members of the team that wrote BoundsChecker came to the same conclusion and also tested certain DLL headers this way. You can find such code examples in the BugSlayer column in issues of Microsoft Systems Journal (MSJ) before it morphed into MSDN Magazine.

One argument would be “Put an exception handler around it. It’s an exceptional condition, handle it that way”.

The problem with that is that it sometimes breaks the flow of the code and causes all manner of problems because C++ objects and SEH cannot be mixed in the same function. Sometimes it’s much easier and simpler just to test for readability and abandon the function if you encounter one of these unusually constructed DLLs.

We are not advocating that you routinely use IsBadReadPtr() to hide the fact that you don’t know which objects to free, so you just call free on anything that passes IsBadReadPtr(). If you do that you will end up with exactly the problems that Raymond Chen describes.

But for the case where you do want IsBadReadPtr() functionality but you don’t want to use IsBadReadPtr(), what do you do? Here are drop in replacements for IsBadReadPtr() and IsBadWritePtr() that will not affect guard pages etc.

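#include <windows.h>

// Drop-in replacement for IsBadReadPtr(): returns non-zero if any byte in
// [ptr, ptr + size) is not readable. It only queries page attributes with
// VirtualQuery() and never touches the memory, so guard pages are not tripped.
int isNotOKToReadMemory(void    *ptr,
                        DWORD   size)
{
	MEMORY_BASIC_INFORMATION	mbi;
	BYTE                            *p = (BYTE *)ptr;
	BYTE                            *end = p + size;

	while (p < end)
	{
		// VirtualQuery() returns 0 on failure, leaving mbi undefined
		if (VirtualQuery(p, &mbi, sizeof(mbi)) == 0)
			return TRUE;

		// Protect is only meaningful for committed pages; reject
		// guard pages and no-access pages explicitly
		if (mbi.State != MEM_COMMIT ||
		    (mbi.Protect & (PAGE_GUARD | PAGE_NOACCESS)))
			return TRUE;

		if (!(mbi.Protect & (PAGE_READONLY | PAGE_READWRITE | PAGE_WRITECOPY |
		                     PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE |
		                     PAGE_EXECUTE_WRITECOPY)))
			return TRUE;

		// advance to the first byte after this region
		p = (BYTE *)mbi.BaseAddress + mbi.RegionSize;
	}

	return FALSE;	// an empty range (size == 0) is treated as OK
}

// Drop-in replacement for IsBadWritePtr(): same region walk, but only
// writable protections are accepted.
int isNotOKToWriteMemory(void   *ptr,
                         DWORD  size)
{
	MEMORY_BASIC_INFORMATION	mbi;
	BYTE                            *p = (BYTE *)ptr;
	BYTE                            *end = p + size;

	while (p < end)
	{
		if (VirtualQuery(p, &mbi, sizeof(mbi)) == 0)
			return TRUE;

		// check the pages are committed and are not guard or no-access pages
		if (mbi.State != MEM_COMMIT ||
		    (mbi.Protect & (PAGE_GUARD | PAGE_NOACCESS)))
			return TRUE;

		if (!(mbi.Protect & (PAGE_READWRITE | PAGE_WRITECOPY |
		                     PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)))
			return TRUE;

		p = (BYTE *)mbi.BaseAddress + mbi.RegionSize;
	}

	return FALSE;
}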
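Because the protection test is plain bit-masking, it can be pulled out into pure helpers and checked without calling VirtualQuery() at all. This is a sketch under stated assumptions: protectionAllowsRead() and protectionAllowsWrite() are hypothetical names, they mirror the flag logic of the functions above (so PAGE_EXECUTE on its own is not treated as readable), and the k-prefixed constants copy the winnt.h values so the code compiles without <windows.h>.

```cpp
#include <cstdint>

// Protection values copied from winnt.h, renamed so they do not clash with
// the PAGE_* macros when <windows.h> is also included.
constexpr std::uint32_t kPageNoAccess         = 0x01;
constexpr std::uint32_t kPageReadOnly         = 0x02;
constexpr std::uint32_t kPageReadWrite        = 0x04;
constexpr std::uint32_t kPageWriteCopy        = 0x08;
constexpr std::uint32_t kPageExecute          = 0x10;
constexpr std::uint32_t kPageExecuteRead      = 0x20;
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;
constexpr std::uint32_t kPageExecuteWriteCopy = 0x80;
constexpr std::uint32_t kPageGuard            = 0x100;

// Protections the original functions accept for reading and for writing.
constexpr std::uint32_t kReadable = kPageReadOnly | kPageReadWrite | kPageWriteCopy |
                                    kPageExecuteRead | kPageExecuteReadWrite |
                                    kPageExecuteWriteCopy;
constexpr std::uint32_t kWritable = kPageReadWrite | kPageWriteCopy |
                                    kPageExecuteReadWrite | kPageExecuteWriteCopy;

// True if a committed page with this protection value may be read.
constexpr bool protectionAllowsRead(std::uint32_t protect)
{
    return (protect & (kPageGuard | kPageNoAccess)) == 0 && (protect & kReadable) != 0;
}

// True if a committed page with this protection value may be written.
constexpr bool protectionAllowsWrite(std::uint32_t protect)
{
    return (protect & (kPageGuard | kPageNoAccess)) == 0 && (protect & kWritable) != 0;
}
```

Keeping the mask logic in one place also means the read and write checks cannot drift apart when a protection flag needs adding.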

Remember: Use with caution, use sparingly and only if you need to. If you are routinely using IsBadReadPtr() or an equivalent to avoid keeping track of which data you should or should not use, you should think again about your software design.

The cost of using OutputDebugString

Friday, May 21st, 2010

Should you use OutputDebugString? Is that an unusual question to ask? How often have you thought about the potential cost of using OutputDebugString (or TRACE, which uses OutputDebugString under the hood)?

Benefits of using OutputDebugString

The benefit of using OutputDebugString is the ability to leave a relatively benign function in your code that can output useful textual information which can be monitored by a debugger or a suitable utility (such as DebugView).

The TRACE() macro allows you to output information, but only in Debug builds. OutputDebugString() allows you to output the information in Debug and Release builds.

Problems caused by using OutputDebugString

The problem with using OutputDebugString is its performance overhead: minimal outside a debugger, but much higher inside one. In a busy loop, this overhead can become an unwanted burden.

It may be that the information being output by OutputDebugString is not information that you want your customers (or competitors!) to see.

Finally, depending on your software application, it may be that your customers will be using your software component (for example our injected monitoring DLLs) with their software in their debugger, debugging their software. In that situation, your customer may not appreciate the extra OutputDebugString() information filling up their Output tab with information and obscuring whatever information their own OutputDebugString() usage is providing.

I’m sorry to say it, but we have been guilty of this in the past! You may want to check your code to ensure you are not doing this by accident. It’s all too easy to let things like this happen: after all, there is no obvious adverse effect (like a crash or undefined behaviour) to make your tests fail.
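If you do want to leave diagnostic output in shipping code, one option is to funnel every call through a single wrapper that stays silent unless explicitly switched on. The sketch below assumes a hypothetical global flag, g_debugOutputEnabled, which your application would set from its own configuration; debugTrace() is likewise a hypothetical name. On Windows it forwards to OutputDebugStringA(), and elsewhere it writes to stderr so the sketch builds on any platform.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical switch: off by default, so customers never see the output
// unless it is deliberately turned on.
bool g_debugOutputEnabled = false;

// Emits msg only when enabled. Returns true if anything was sent, which
// makes the gate easy to unit test.
bool debugTrace(const char *msg)
{
    assert(msg != nullptr);
    if (!g_debugOutputEnabled)
        return false;            // disabled: OutputDebugString is never called
#ifdef _WIN32
    OutputDebugStringA(msg);     // shown in the debugger's output pane
#else
    std::fputs(msg, stderr);     // portable fallback for non-Windows builds
#endif
    return true;
}
```

Gating on IsDebuggerPresent() instead would not help with the situation described above, because a customer debugging their own application is running under a debugger too.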

Performance cost of using OutputDebugString

I’ve noticed questions about the cost of OutputDebugString() on a few forums, but have never seen any hard numbers to go with the received opinions. Being a curious animal, I decided to investigate. The benefits: I get to scratch this itch, and if the news is bad we get to make a few modifications to the software at Software Verification.

OutputDebugString() comes in two flavours: OutputDebugStringA() for ANSI builds and OutputDebugStringW() for Unicode builds. The tests exercised both of these Win32 APIs on Windows XP 64-bit, on a dual quad-core Intel Xeon running at 2.83GHz. All tests were run under the same load.

Testing OutputDebugString

To test the APIs we need to test a few scenarios:

  • Calling the API when running the test application outside of a debugger. This is equivalent to leaving OutputDebugString() calls in your release mode application and shipping it to your customer.
  • Calling the API when running the test application in a debugger. This tests the overhead of OutputDebugString() communicating with the debugger so that the debugger can display the message in its output pane (assuming the debugger does that).
  • Calling the API when running the test application in a debugger, adding a \r\n at the end of each line. This tests whether terminating each line changes that overhead, because the Visual Studio output pane handles unterminated lines differently in different versions.

We chose to test with Visual Studio 6.0, Visual Studio 2005 and Visual Studio 2010. We chose VS6 because it is the old pre-.NET IDE that is still well loved by a lot of developers, and VS2005 because, based on what we can tell from our customer base, it is a very popular IDE/debugger. VS2010 is the newest of the three.

The test for each scenario consists of outputting strings of three different sizes 4000 times each: a zero-length string, a short string and a longer string. The test is then repeated with \r\n appended to the non-zero-length strings. We added this second test when we realized that the Visual Studio output pane behaves differently in VS6 and VS2005 for lines that do not end with \r\n.
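The timing loop behind tests like these can be sketched portably with std::chrono. This is not the downloadable test application: timeCalls() is a hypothetical helper, and on Windows you would pass it a lambda that calls OutputDebugStringA() or OutputDebugStringW() with the string under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Calls fn() `count` times and returns the total elapsed time in seconds.
// Dividing the result by count gives a per-call average.
template <typename Fn>
double timeCalls(Fn fn, std::size_t count)
{
    assert(count > 0);
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
        fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, timeCalls([] { OutputDebugStringA("short string\r\n"); }, 4000) would time 4,000 calls for one row of the results. steady_clock is used because, unlike the system clock, it is never adjusted while the test runs.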

You can download full source code and project files for both VS6 and VS2005 so that you can build and run these tests for yourself. For VS2010 you can load the VS2005 solution file and convert it automatically during the load.

timeOutputDebugString screenshot

Results

We have 7 groups of test results spanning no debugger, Visual Studio 6, Visual Studio 2005, and Visual Studio 2010.

Test 1 – No Debugger

OutputDebugString() called 4000 times, no debugger monitoring the process. No \r\n per output.
What we can see is that OutputDebugStringW() is 9% slower than OutputDebugStringA() and that both calls are very fast.

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              0.0112s
OutputDebugStringA   short string    0.0198s
OutputDebugStringA   long string     0.0255s
Average per call                     0.00000470s

OutputDebugStringW   ""              0.0121s
OutputDebugStringW   short string    0.0214s
OutputDebugStringW   long string     0.0281s
Average per call                     0.00000513s
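The Average rows appear to be the total time for the three strings divided by the 12,000 calls made (3 strings × 4,000 calls each), and the percentage comparisons are the ratio of the two averages. Here is a small compile-time check using the Test 1 figures; averagePerCall() and percentSlower() are illustrative names, not part of the test application.

```cpp
// Per-call average: total time for the three strings / (3 * callsPerString).
constexpr double averagePerCall(double t1, double t2, double t3, int callsPerString)
{
    return (t1 + t2 + t3) / (3.0 * callsPerString);
}

// How much slower `b` is than `a`, as a percentage.
constexpr double percentSlower(double a, double b)
{
    return (b / a - 1.0) * 100.0;
}

// Test 1 (no debugger) figures from the table above.
constexpr double avgA = averagePerCall(0.0112, 0.0198, 0.0255, 4000);  // ~0.00000471s
constexpr double avgW = averagePerCall(0.0121, 0.0214, 0.0281, 4000);  // ~0.00000513s
```

percentSlower(avgA, avgW) comes out at roughly 9%, matching the figure quoted for Test 1.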

Test 2 – Visual Studio 6

OutputDebugString() called 4000 times, Visual Studio 6 monitoring the process. No \r\n per output.
What we can see is that OutputDebugStringW() is 1% slower than OutputDebugStringA() and that both calls are over 19 times slower than without the debugger.

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              0.03631s
OutputDebugStringA   short string    0.03837s
OutputDebugStringA   long string     0.3885s
Average per call                     0.00009641s

OutputDebugStringW   ""              0.3693s
OutputDebugStringW   short string    0.3977s
OutputDebugStringW   long string     0.4068s
Average per call                     0.00009782s

Test 3 – Visual Studio 6

OutputDebugString() called 4000 times, Visual Studio 6 monitoring the process. One \r\n per output.
What we can see is that OutputDebugStringW() is 2% slower than OutputDebugStringA() and that both calls are over 22 times slower than without the debugger.

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              0.4048s
OutputDebugStringA   short string    0.4247s
OutputDebugStringA   long string     0.4267s
Average per call                     0.00010468s

OutputDebugStringW   ""              0.4127s
OutputDebugStringW   short string    0.4346s
OutputDebugStringW   long string     0.4419s
Average per call                     0.00010743s

Test 4 – Visual Studio 2005

OutputDebugString() called 4000 times, Visual Studio 2005 monitoring the process. No \r\n per output.
What we can see is that OutputDebugStringW() is 54% slower than OutputDebugStringA() and that both calls are over 65 times slower than without the debugger (95 times slower for OutputDebugStringW).

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              1.0270s
OutputDebugStringA   short string    1.2200s
OutputDebugStringA   long string     1.3982s
Average per call                     0.00030377s

OutputDebugStringW   ""              1.5850s
OutputDebugStringW   short string    1.8874s
OutputDebugStringW   long string     2.1672s
Average per call                     0.00046997s

Test 5 – Visual Studio 2005

OutputDebugString() called 4000 times, Visual Studio 2005 monitoring the process. One \r\n per output.
What we can see is that OutputDebugStringW() is 48% slower than OutputDebugStringA() and that both calls are over 68 times slower than without the debugger (92 times slower for OutputDebugStringW).

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              1.1133s
OutputDebugStringA   short string    1.2766s
OutputDebugStringA   long string     1.4455s
Average per call                     0.00031962s

OutputDebugStringW   ""              1.6444s
OutputDebugStringW   short string    1.9108s
OutputDebugStringW   long string     2.1501s
Average per call                     0.00047543s

Test 6 – Visual Studio 2010

OutputDebugString() called 4000 times, Visual Studio 2010 monitoring the process. No \r\n per output.
What we can see is that OutputDebugStringW() is 2% slower than OutputDebugStringA() and that both calls are over 133 times slower than without the debugger (142 times slower for OutputDebugStringA).

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              2.8112s
OutputDebugStringA   short string    2.6041s
OutputDebugStringA   long string     2.6408s
Average per call                     0.00067134s

OutputDebugStringW   ""              2.5735s
OutputDebugStringW   short string    2.5891s
OutputDebugStringW   long string     3.0845s
Average per call                     0.00068727s

Test 7 – Visual Studio 2010

OutputDebugString() called 4000 times, Visual Studio 2010 monitoring the process. One \r\n per output.
What we can see is that OutputDebugStringW() is 2% slower than OutputDebugStringA() and that both calls are over 132 times slower than without the debugger (141 times slower for OutputDebugStringA).

Function             String          Time (4,000 calls)
OutputDebugStringA   ""              2.6517s
OutputDebugStringA   short string    2.6604s
OutputDebugStringA   long string     2.6423s
Average per call                     0.00066287s

OutputDebugStringW   ""              2.6675s
OutputDebugStringW   short string    2.7529s
OutputDebugStringW   long string     2.7410s
Average per call                     0.00068011s

Conclusion

Calling OutputDebugString() when the application is not being monitored by a debugger does not incur significant overhead, although in tight loops this could be problematic.

Calling OutputDebugString() when the application is monitored by Visual Studio 6 can result in OutputDebugString() running between 19 and 22 times slower than without Visual Studio 6 monitoring the application.

Calling OutputDebugString() when the application is monitored by Visual Studio 2005 can result in OutputDebugString() running between 65 and 95 times slower than without Visual Studio 2005 monitoring the application.

Calling OutputDebugString() when the application is monitored by Visual Studio 2010 can result in OutputDebugString() running between 132 and 142 times slower than without Visual Studio 2010 monitoring the application.

The most surprising aspect is that the newer Visual Studio 2005 and Visual Studio 2010 are so much slower at handling OutputDebugString() than the old Visual Studio 6 IDE, which is now 12 years old. The difference is a factor of 3x or 4x for VS2005 and 6x or 7x for VS2010, depending on the test. Our initial tests ran OutputDebugString 100,000 times, but the Visual Studio 2005 runs were so slow that we reduced the count to 4,000 so that we could get results in a reasonable time.

The other interesting aspect is that with Visual Studio 2005 monitoring the application, the disparity in performance between OutputDebugStringA() and OutputDebugStringW() is even greater. With Visual Studio 2010, the disparity is not so great, but the actual performance level is worse than for Visual Studio 2005.

It may be tempting to dismiss all of the above with the remark that this test does not reflect real-world usage of OutputDebugString because we call it in a tight loop. Our loop is exceptionally tight, because we are trying to time just the function call, but in our own development work we do see occasions where OutputDebugString is called often enough to cause a serious drop-off in performance under the debugger. Take, for example, the interprocess communications that send data from our monitoring stub to the GUI: when we are debugging some items, we enable either logging or OutputDebugString. The data arrives from the monitored application at a rate and volume determined by what that application is doing. For Memory Validator monitoring certain applications, that could be 500,000 events just to start the target application, which means 500,000 OutputDebugString calls in our GUI reporting comms activity. In such a case, logging may be more efficient. I mention this to show that although our test uses a tight loop, the call count (4,000) is well within real-world usage.
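When call volumes are this high, one way to act on the logging suggestion is to batch messages and hand them over in fewer, larger calls. This is a minimal sketch: BatchedDebugOutput and its character threshold are hypothetical, and the sink is a std::function so it could wrap OutputDebugStringA(), a log file writer or a test stub. Whether batching actually reduces the debugger overhead measured above is an assumption you would need to measure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Collects messages and passes them to `sink` in a single call once the
// buffer holds at least `threshold` characters, or when flush() is called.
class BatchedDebugOutput
{
public:
    BatchedDebugOutput(std::function<void(const char *)> sink, std::size_t threshold)
        : sink_(std::move(sink)), threshold_(threshold)
    {
        assert(threshold_ > 0);
    }

    ~BatchedDebugOutput() { flush(); }   // emit anything still buffered

    BatchedDebugOutput(const BatchedDebugOutput &) = delete;
    BatchedDebugOutput &operator=(const BatchedDebugOutput &) = delete;

    void write(const std::string &msg)
    {
        buffer_ += msg;
        if (buffer_.size() >= threshold_)
            flush();
    }

    void flush()
    {
        if (buffer_.empty())
            return;
        sink_(buffer_.c_str());          // one sink call for many messages
        buffer_.clear();
    }

private:
    std::function<void(const char *)> sink_;
    std::size_t threshold_;
    std::string buffer_;
};
```

On Windows the sink could be [](const char *s) { OutputDebugStringA(s); }. The trade-off is latency: buffered messages only appear when the buffer fills or flush() is called, so a crash can lose the last batch.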

Thread Lock Checker now available

Thursday, April 22nd, 2010

We’ve just released Thread Lock Checker.

It took a bit longer than we anticipated (sorry about that) due to some website maintenance work. Anyway, it’s available now: go and give your source code some TLC and find any latent lock errors in your code!

Thread Lock Checker

Improving how you use CSingleLock

Friday, April 2nd, 2010


This posting gives a brief background to:

  • Win32 critical sections.
  • How CCriticalSection and CSingleLock can be used instead of Win32 critical sections.
  • An improved way to use CSingleLock.
  • Some ways CSingleLock can be used that do not have the desired effect.

Critical Sections in Win32

The Win32 API uses InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection and DeleteCriticalSection to manage critical sections (CRITICAL_SECTION). Using these APIs is not particularly hard, but nonetheless it is possible to use critical sections that have not been initialized or that have been deleted. It is also possible to forget to leave a critical section that has been entered. In addition, any exceptions that get thrown may result in a critical section being left in its locked state.

This can cause serious performance problems as locks are held for too long, or in the case of a lock not being released, it can prevent other threads gaining access to the resource the lock was protecting, possibly resulting in a deadlock.

Example Win32 usage (assume the critical section cs was initialized in a different function):

void someFunc()
{
	doWork();

	EnterCriticalSection(&cs);

	doWorkEx();

	LeaveCriticalSection(&cs);
}

Why use CSingleLock and CMultiLock?

When using critical sections in MFC you use the CCriticalSection class instead of CRITICAL_SECTION objects.

You can directly call Lock() and Unlock() on the CCriticalSection, but it is recommended that you use CSingleLock and CMultiLock to manage your CCriticalSection objects.

The benefits of using a class such as CSingleLock (and its related class CMultiLock) are that:

  • The CSingleLock manages the activities of entering and leaving the critical section – you do not have to think about the critical section at all.
  • Any CCriticalSection object used with CSingleLocks will automatically be initialized before the CSingleLock gets to work with it.
  • The CSingleLock is automatically unlocked (if it was locked) when the CSingleLock is destroyed, so the CCriticalSection associated with it is not left locked.
  • If an exception is thrown, C++ objects are cleaned up by the exception handling chain, thus automatically deleting any CSingleLock objects and releasing any locks they hold.
  • CSingleLock can be used to lock and unlock critical sections just like the old Win32 methods, allowing for easy conversion of code from Win32 style to CSingleLock style.
  • It is possible to create a CSingleLock that is automatically locked. This is very useful for set-and-forget critical section management. Just put the CSingleLock in the right place and you can ignore it in the rest of the code. Very neat, convenient and elegant.
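The exception-safety benefit can be illustrated without MFC. In the sketch below, CountingLock is a hypothetical stand-in for a critical section that simply counts how deeply it is held, and ScopedLock plays the role of a CSingleLock created in the locked state: it releases in its destructor, so the lock is freed even when a C++ exception unwinds the stack. This shows the RAII principle CSingleLock relies on; it is not MFC’s implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a critical section: counts how deeply it is held.
struct CountingLock
{
    int held = 0;
    void lock()   { ++held; }
    void unlock() { assert(held > 0); --held; }
};

// RAII guard in the style of CSingleLock lock(&cs, TRUE).
class ScopedLock
{
public:
    explicit ScopedLock(CountingLock &cs) : cs_(cs) { cs_.lock(); }
    ~ScopedLock() { cs_.unlock(); }      // also runs during stack unwinding
    ScopedLock(const ScopedLock &) = delete;
    ScopedLock &operator=(const ScopedLock &) = delete;
private:
    CountingLock &cs_;
};

// Hypothetical work function that fails part way through.
void doWorkEx() { throw std::runtime_error("doWorkEx failed"); }

// Win32 style: the unlock is skipped when doWorkEx() throws.
void manualStyle(CountingLock &cs)
{
    cs.lock();
    doWorkEx();
    cs.unlock();
}

// CSingleLock style: the destructor releases the lock on every exit path.
void raiiStyle(CountingLock &cs)
{
    ScopedLock lock(cs);
    doWorkEx();
}
```

After catching the exception, the lock used by manualStyle() is still held once, while the one used by raiiStyle() has been released.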

One way of using CSingleLock

As described above, a typical style of using CSingleLocks echoes the Win32 style of using critical sections.

void someFunc()
{
	CSingleLock	lock(&csSect);

	doWork();

	lock.Lock();
	doWorkEx();
	lock.Unlock();
}

As you can see, the CSingleLock lock manager is created, the doWork() function is called outside of the protected area, the lock is locked, doWorkEx() is called, then the lock is unlocked. This closely mirrors the Win32 equivalent.

A better way of using CSingleLock

The problem with the previous way of using CSingleLock is that most of the power and convenience of CSingleLock is ignored. Lock management has been made explicit via calls to Lock() and Unlock(). This means there is potential for forgetting to lock the CSingleLock, or for unlocking the CSingleLock later than desirable.

An improved way of using CSingleLock is to always create CSingleLocks in the locked state, and to create them as close as possible to the code that uses the resource they need to protect.

The following example shows the same function written using a CSingleLock that is automatically locked, created just before it is required and automatically destroyed at the end of the function.

void someFunc()
{
	doWork();

	CSingleLock	lock(&csSect, TRUE);

	doWorkEx();
}

If I wanted to do some more work after doWorkEx() but didn’t want it protected by the lock, I could use C++’s scoping capabilities. I simply create a new scope and place the CSingleLock in there. At the end of the scope the CSingleLock is destroyed and the lock is unlocked.

void someFunc()
{
	doWork();

	{
		CSingleLock	lock(&csSect, TRUE);

		doWorkEx();
	}

	doMoreWork();
}

Some problems we have seen…

During the development of the software tools at Software Verification and our private tools, we’ve found a few interesting mistakes. They are often made not through poor design but through a simple typing oversight, possibly due to tiredness: the type of mistake you can only put down to the fact that humans make mistakes, no matter how talented they are in any given field.

Where possible we like to use the CSingleLock lock(&csSect, TRUE) automatic locking style, coupled with tight scoping to keep the lock lifetime short. As a result, we want to find the following coding constructs, which make our software behave differently from what we expect:

  • CSingleLock created without a lock argument. This defaults to an unlocked CSingleLock.
    CSingleLock	lock(&csSect);
  • CSingleLock created with a FALSE lock argument. This is an unlocked CSingleLock.
    CSingleLock	lock(&csSect, FALSE);
  • CSingleLock created without a variable name. This compiles but creates a lock that is immediately destroyed. All three of these variants are interesting because none of them is useful, yet all of them compile OK.
    CSingleLock(&csSect);
    CSingleLock(&csSect, FALSE);
    CSingleLock(&csSect, TRUE);
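Why the unnamed variants are useless shows up clearly with a lock that logs what it does. TracingLock is a hypothetical class that behaves like a CSingleLock created with TRUE, and std::mutex stands in for the CCriticalSection. A named object lives until the end of its scope, but an unnamed temporary is destroyed at the end of the statement that creates it, so the work that follows runs unprotected. The temporary is written with braces so that the statement cannot be parsed as a declaration.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

std::vector<std::string> g_events;   // order of lock/unlock/work for the demo
std::mutex csSect;                   // stands in for the CCriticalSection

// Hypothetical lock that records when it locks and unlocks.
class TracingLock
{
public:
    explicit TracingLock(std::mutex &m) : m_(m) { m_.lock(); g_events.push_back("lock"); }
    ~TracingLock() { g_events.push_back("unlock"); m_.unlock(); }
    TracingLock(const TracingLock &) = delete;
    TracingLock &operator=(const TracingLock &) = delete;
private:
    std::mutex &m_;
};

void unnamedTemporary()
{
    TracingLock{csSect};          // like CSingleLock(&csSect, TRUE): destroyed at once
    g_events.push_back("work");   // runs WITHOUT the lock
}

void namedLock()
{
    TracingLock lock{csSect};     // like CSingleLock lock(&csSect, TRUE);
    g_events.push_back("work");   // runs while the lock is held
}
```

The recorded order is lock, unlock, work for the temporary, and lock, work, unlock for the named object.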

Thread Lock Checker


The problem with the examples shown above is that looking for them is hard work, because humans often read what they expect to read (this is part of the predictive pattern recognition built into how we process shapes and text). As a result, you may be looking right at an error and not see it, yet see it the next time you come to the code (having forgotten all about it).

To aid in the discovery of these types of lock usage (for CSingleLock, CMultiLock and any other named classes with the same style of behaviour), we have written a software tool, Thread Lock Checker.

We use Thread Lock Checker before we release any software, scanning our codebase for any mistakes not identified by our software engineers. It’s a very useful tool. We hope that you will also find Thread Lock Checker useful. Please check back next week for your free download.

We will be releasing Thread Lock Checker during the week of 5 April to 9 April.

Delete memory 5 times faster

Tuesday, March 2nd, 2010

Memory management in C and C++ is typically done using either the malloc/realloc/free C runtime functions or the C++ operators new and delete. Typically the C++ operators call down to the underlying malloc/free implementation to do the actual memory allocations.

This is great, it’s useful, it works, BUT it puts all the allocations in the same heap. So when you come to deallocate, the heap manager needs to take into account all the other objects and allocations that are also in the heap but unrelated to the data you are deallocating. That adds overhead to the heap manager, causing memory fragmentation and slower heap management.

There is a different way you can handle this situation – you can use your own heap for a given group of allocations.

	HANDLE	hHeap;

	hHeap = HeapCreate(0, 0x00010000, 0); // 64K growable heap

The downside to this is that you have to remember to use the correct allocators and deallocators for these objects and not to use malloc/free etc. You can mitigate this in C++ by giving the class its own operator new and operator delete that use your heap.

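#include <windows.h>
#include <new>

void *myClass::operator new(size_t size)
{
	void	*ptr = HeapAlloc(hHeap, 0, size);	// allocate from the private heap

	if (ptr == NULL)
		throw std::bad_alloc();		// operator new must not return NULL
	return ptr;
}

void myClass::operator delete(void *ptr)
{
	if (ptr != NULL)			// deleting a null pointer must be a no-op
		HeapFree(hHeap, 0, ptr);
}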

The upside to this technique is that you can release all of these allocations in one call by destroying the heap, rather than freeing each allocation individually. This is also about 5 times faster.

Old style:

	for(i = 0; i < count; i++)
	{
		HeapFree(hHeap, 0, ptrs[i]);
	}

New style:

	HeapDestroy(hHeap);
	hHeap = NULL;

There is another benefit to this technique: By deleting the heap and then re-creating a heap for new allocations you remove all fragmentation from the heap and start the new heap with 0% fragmentation.
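The same release-everything-at-once idea can be sketched portably with a simple arena (bump) allocator. This is a swapped-in technique for illustration, not HeapCreate()/HeapDestroy(): Arena is a hypothetical class that hands out memory from one block, never frees individual allocations, and makes the whole block reusable with a single reset() call, leaving no fragmentation behind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity arena. Individual allocations are never freed;
// reset() releases all of them at once, like destroying and re-creating a heap.
class Arena
{
public:
    explicit Arena(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns nullptr when the arena cannot satisfy the request.
    void *allocate(std::size_t size, std::size_t align = alignof(std::max_align_t))
    {
        // power-of-two alignment, no stricter than ordinary ::operator new
        // provides for the vector's storage
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));

        std::size_t offset = (used_ + align - 1) & ~(align - 1);   // round up
        if (offset > storage_.size() || size > storage_.size() - offset)
            return nullptr;
        used_ = offset + size;
        return storage_.data() + offset;
    }

    void reset() { used_ = 0; }          // "frees" every allocation in O(1)
    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

Objects with non-trivial destructors that are placed in the arena with placement new must be destroyed before reset(); the same caveat applies to HeapDestroy(), which releases the memory without running any C++ destructors.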

HeapDestroy timing demonstration

Download the source of the demonstration application. Project and solution files for Visual Studio 6.0 and Visual Studio 2008 are provided.

We use this technique for some of our tools where we want a high performance heap and zero fragmentation.

You will also want to ensure that whatever software tool you are using to monitor memory allocations will mark all entries in a heap that is destroyed as deallocated.


Perfect imprecision, thoughts on memory leaks, performance profiling, code coverage, deadlock detection and flow tracing is proudly powered by WordPress

About Us | Site Map | Legal | Contact Us | ©2002-2006: Software Verification Ltd : all rights reserved: registered in England No. 3939098 112
Design by ITS GUI