August 8th, 2010
In the previous article I showed you how to embed data into a custom resource in your executable.
In this article I’m going to show you how to extract the same data using the Win32 API for use in your executable at runtime.
To extract data from a resource in an executable we need some information:
- Executable name.
- Custom resource type name.
- Custom resource name.
In our previous example, the executable name was myJavaDetective.dll, the custom resource type name was “CLASSFILE” and the custom resource name was “myJavaSpy”.
The API
FindResource
HRSRC FindResource(HMODULE hModule,
LPCTSTR lpName,
LPCTSTR lpType);
Call FindResource() to find a resource in an executable and return a resource handle. The executable is specified using a module handle that represents a module loaded in the current program. If the module is not currently loaded you can load it with LoadLibrary(). The resource is identified by its custom resource name and custom resource type.
LoadResource
HGLOBAL LoadResource(HMODULE hModule,
HRSRC hResInfo);
Call LoadResource() to load the resource specified by the module handle and the resource handle. The returned handle should not be passed to any Global memory function for deallocation.
LockResource
LPVOID LockResource(HGLOBAL hResData);
Call LockResource() to lock the resource in memory. Pass the handle returned by LoadResource() as the input parameter. If the call succeeds a pointer to the data represented by the handle is returned.
SizeofResource
DWORD SizeofResource(HMODULE hModule,
HRSRC hResInfo);
Call SizeofResource() to determine the size of a resource. Pass the module handle and the handle returned from FindResource() as input parameters.
Putting it together
In the previous article, our example DLL myJavaDetective.dll had the class myJavaSpy.class embedded in a resource with the type “CLASSFILE” and name “myJavaSpy”. I will now show you how to extract the myJavaSpy.class bytecode from the resource.
First we need to get the module handle of the executable (myJavaDetective.dll) containing myJavaSpy.class. For this example we will assume that myJavaDetective.dll is already loaded into memory.
HMODULE hModJavaDetective;
hModJavaDetective = GetModuleHandle(_T("myJavaDetective.dll"));
Once we have the module handle we can attempt to find the resource in the executable. We do need to check for a NULL module handle first: if FindResource() is passed NULL it does not fail, but searches the module used to create the current process (the EXE) instead. If the resource is not embedded in the module, FindResource() returns a NULL resource handle.
jbyte *classBytes = NULL;
DWORD classBytesLength = 0;
HRSRC hResource = NULL;
if (hModJavaDetective != NULL)
{
hResource = FindResource(hModJavaDetective,
_T("myJavaSpy"),
_T("CLASSFILE"));
}
if (hResource != NULL)
{
If FindResource() returns a non-NULL handle, the resource has been found. Now we must load the resource using LoadResource().
HGLOBAL hResourceMemory;
hResourceMemory = LoadResource(hModJavaDetective, hResource);
if (hResourceMemory != NULL)
{
If LoadResource() returns a non-NULL handle, the resource has been correctly loaded from the executable. The returned handle is of type HGLOBAL, but caution: you must not pass it to any HGLOBAL-related functions such as GlobalFree() or GlobalReAlloc(), because this handle does not represent a memory allocation. The HGLOBAL type is used for backward compatibility with earlier versions of the Windows API.
Before we can use the data we must convert the returned handle into a pointer to the data by calling LockResource(). We also want to know the size of the data in the resource so we call SizeofResource() to determine the size. The pointer returned by LockResource() must not be passed to any memory deallocation functions – it does not need to be deallocated or unlocked.
void *ptr;
DWORD size;
ptr = LockResource(hResourceMemory);
size = SizeofResource(hModJavaDetective, hResource);
if (ptr != NULL)
{
If LockResource() returns a non-NULL pointer, it points to the data embedded in the executable.
Now that we have the data, we make a copy for our own use and continue as normal. This step is optional: you can use the data directly from the returned pointer, as long as the module stays loaded.
classBytes = new (std::nothrow) jbyte [size]; // std::nothrow (from <new>) returns NULL instead of throwing
if (classBytes != NULL)
{
memcpy(classBytes, ptr, size);
classBytesLength = size;
}
}
}
// CAUTION! LoadResource() and LockResource() DO NOT allocate handles or locks,
// read the documentation
}
Now that we have extracted the data from the resource embedded in the executable, we can use the data as normal. For this example I will conclude by using the extracted Java class bytecodes to define a Java class in a Java Virtual Machine.
if (classBytes != NULL)
{
// define our class, must have same name as class file bytes
// pass NULL for the class loader - use default class loader
jclass klass = 0;
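klass = jniEnv->DefineClass("myJavaSpy", NULL, classBytes, (jsize) classBytesLength);
if (klass != 0)
{
// class defined correctly
}
else
{
// DefineClass() failed and left a pending Java exception: report and clear it
jniEnv->ExceptionDescribe();
jniEnv->ExceptionClear();
}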
// tidy up
delete [] classBytes;
}
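If you prefer not to manage the copy with new[] and delete[], the copy step above can be written with std::vector instead. The following is a minimal sketch in portable C++, assuming `ptr` and `size` are the values returned by LockResource() and SizeofResource(). The helper name copyResourceBytes is hypothetical, and signed char stands in for jbyte (which the JNI headers define as signed char) so the sketch compiles without jni.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `size` bytes of resource data into a vector that
// owns its memory, so no explicit delete[] is needed afterwards.
// signed char stands in for jbyte, which jni_md.h defines as signed char.
std::vector<signed char> copyResourceBytes(const void *ptr, std::size_t size)
{
    // A NULL pointer is only acceptable for an empty resource.
    assert(ptr != nullptr || size == 0);
    std::vector<signed char> bytes;
    if (ptr != nullptr && size > 0)
    {
        bytes.resize(size);
        std::memcpy(bytes.data(), ptr, size);
    }
    return bytes;
}
```

With the JNI code above you would then pass `bytes.data()` and `static_cast<jsize>(bytes.size())` to DefineClass(), and the vector releases the copy when it goes out of scope.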
Wrap up
Now you know how to embed data in an executable after it has been built (with the utility presented in the previous article) and how to extract that data at runtime. The techniques are straightforward to master and let you embed data for use at runtime without worrying about distributing and locating extra data files.
Tags: C, embed, Java, resource Posted in Hints and tips | No Comments » Posted by Stephen Kellett
August 7th, 2010
In this article I will demonstrate how you can embed data into a Windows PE format executable (EXE or DLL). At the end I will also provide a working example which you can use to embed data into your executable as custom resources.
The problem
Software often requires ancillary data. This data can reside in files on your hard disk, on a network computer or on a computer accessed across the Internet, or it can be embedded in your executable. There is no correct solution for all cases; you have to choose the right solution for the task at hand. I’ll briefly describe the four methods, outlining the potential pitfalls of each.
- Loading the data from disk. You need to locate the file and read its contents. What happens if the file is missing? If the file is present and readable, has it been modified by accident or deliberately tampered with? You will need a mechanism to detect this if appropriate.
- Loading the data from a network computer. This is similar to loading the file from the disk except that you need to know the network computer name.
- Loading the data from a computer on the Internet. This is more complex: now you need to use some protocol to download the file. What if the Internet connection is not available or is refused?
- Embedding the data in your executable. Embedding the data is harder than creating a file, and reading the data is harder than reading a file. However, the data will always be available. If your application uses checksums (MD5, etc.) or is digitally signed, then you will know if the embedded data has been modified or tampered with.
Embedding data
Sometimes it would be more convenient if the data was embedded right into the executable we are creating.
There may be no convenient method for embedding the data. You could transcribe the data by hand into your source code, but that would be time consuming, expensive, error prone and tedious. Visual Studio does provide a means to embed data: you can add a custom resource, then edit the properties of the custom resource to identify the file that contains the data you wish to embed. We have tried this, but there is no error message when the file cannot be found (for example, if you mistyped the filename) and there is no way to conditionally change which custom resource is embedded depending on the build.
Fortunately, Windows provides an API for adding data to the resource section of an executable (.exe or .dll), and for finding this data later. With this API we can create a helper application that embeds as many custom resources as you want after you have built your executable.
For this example I will assume the data we are adding to the executable is not data you would normally find in a resource, which means we will be adding a custom resource.
Let us say we want to add a Java class file to our executable so that we can find this class file at runtime without knowing anything about the current Java CLASSPATH or the file system. Once we’ve extracted the class file we could use it to define a class that would then be used by the Java Virtual Machine to do the work we want (presumably somewhere else we’ll be instrumenting Java class files so they know about the Java class we just defined).
We need a few things first, which we will also need when we come to extract the resource from the executable.
- Executable to add the resource to.
- Type name for the custom resource.
- Name for the custom resource.
- Data for the custom resource.
For our Java class file example, type could be “CLASSFILE”, name could be “myJavaSpy” and data would be the byte code for the class myJavaSpy which we would load from the file myJavaSpy.class (having previously compiled it from myJavaSpy.java).
The API
BeginUpdateResource
HANDLE BeginUpdateResource(const TCHAR *executableName,
BOOL fDeleteExistingResources);
Call BeginUpdateResource() to open the specified executable and return an update handle (NULL on failure). Pass TRUE for the second argument to erase all existing resources, pass FALSE to keep any existing resources in the executable.
UpdateResource
BOOL UpdateResource(HANDLE hUpdate,
LPCTSTR lpType,
LPCTSTR lpName,
WORD wLanguage,
LPVOID lpData,
DWORD cbData);
Call UpdateResource() to update a resource in the executable represented by the handle hUpdate. Specify the type, name, language (locale) and the data with the remaining arguments. For our example above, lpType would be “CLASSFILE” and lpName would be “myJavaSpy”. Pass MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL) for the language. Pass the Java bytecode and the length of the bytecode for the last two arguments.
EndUpdateResource
BOOL EndUpdateResource(HANDLE hUpdate,
BOOL fDiscard);
Call EndUpdateResource() to finish updating the resource. If you wish to discard your changes, pass TRUE as the second argument. If you wish to keep your changes, pass FALSE as the second argument.
Putting it together
HANDLE hUpdateRes;
// Open the executable to which you want to add the custom resource.
hUpdateRes = BeginUpdateResource(executableName,
FALSE); // do not delete existing resources
if (hUpdateRes != NULL)
{
BOOL result;
// Add the custom resource to the update list.
result = UpdateResource(hUpdateRes,
customType,
customName,
MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL),
bytes,
numBytes);
if (result)
{
// Write changes to the executable and then close it
EndUpdateResource(hUpdateRes, FALSE);
}
else
{
// UpdateResource() failed: discard the changes but still close the handle
EndUpdateResource(hUpdateRes, TRUE);
}
}
First we call BeginUpdateResource() to open the executable for resource updating. We pass FALSE as the second argument to make sure we keep the existing resources and only add our new resource. This call returns an update handle.
If the call to BeginUpdateResource() is successful we receive a non-NULL update handle. We pass it to UpdateResource() along with the type and name of the resource we wish to update, the data itself and its length. In this example we have specified a neutral locale.
Finally we call EndUpdateResource() to finish updating the resource and to write the results back to the executable (pass FALSE as the second argument).
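The code above assumes `bytes` and `numBytes` already hold the data to embed. Here is a minimal sketch of loading them from a file such as myJavaSpy.class using only standard C++. The helper name readFileBytes is hypothetical; it simply reports failure when the file cannot be opened, which addresses the missing-file problem mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file (e.g. myJavaSpy.class) into memory.
// Returns false if the file cannot be opened or read, so a mistyped
// filename produces an error instead of silently embedding nothing.
bool readFileBytes(const std::string &path, std::vector<unsigned char> &bytes)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    bytes.assign(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>());
    return !in.bad();
}
```

Passing `bytes.data()` and `static_cast<DWORD>(bytes.size())` as the last two arguments of UpdateResource() would then embed the whole file.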
addResourceToDLL
addResourceToDLL.exe is a command line program that you can add to your post-build process to embed custom resources into your EXE/DLL as you build. It has a quiet mode so that you can suppress any information and/or error messages it may emit. I don’t use the quiet mode; I like to see the confirmation message that it succeeded in embedding data into the DLL. Run it without arguments to get the help message.
Help summary
All arguments are mandatory unless noted otherwise.
- -moduleName pathToDLL (or EXE)
- -customResource pathToCustomResource
- -customType type
- -customName name
- -quiet (optional)
Example:
addResourceToDLL.exe -moduleName c:\myJavaDetective\myJavaDetective.dll -customResource c:\myJavaDetective\myJavaSpy.class -customType CLASSFILE -customName myJavaSpy
The example above embeds the myJavaSpy.class file into myJavaDetective.dll with the type “CLASSFILE” and name “myJavaSpy”.
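The help summary above maps naturally onto a small option parser. This is a sketch only: it is not taken from the addResourceToDLL source, and the names Options and parseArgs are hypothetical. It accepts the four mandatory "-name value" pairs plus the optional -quiet flag.

```cpp
#include <cassert>
#include <string>

// Hypothetical option set mirroring the help summary above.
struct Options
{
    std::string moduleName;
    std::string customResource;
    std::string customType;
    std::string customName;
    bool quiet = false;
};

// Parse "-name value" pairs plus the optional -quiet flag.
// Returns false if an option is unknown, a value is missing,
// or any mandatory option was not supplied.
bool parseArgs(int argc, const char *const argv[], Options &opts)
{
    for (int i = 1; i < argc; ++i)
    {
        const std::string arg = argv[i];
        if (arg == "-quiet")
        {
            opts.quiet = true;
            continue;
        }
        std::string *target = nullptr;
        if (arg == "-moduleName")          target = &opts.moduleName;
        else if (arg == "-customResource") target = &opts.customResource;
        else if (arg == "-customType")     target = &opts.customType;
        else if (arg == "-customName")     target = &opts.customName;
        if (target == nullptr || i + 1 >= argc)
            return false; // unknown option, or option with no value
        *target = argv[++i];
    }
    return !opts.moduleName.empty() && !opts.customResource.empty() &&
           !opts.customType.empty() && !opts.customName.empty();
}
```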
Download
Download the addResourceToDLL source code.
Download the addResourceToDLL executable.
In the next article I will show you how to read the embedded data from the resource.
Tags: C++, embed, Java, resource Posted in Hints and tips | 1 Comment » Posted by Stephen Kellett
August 6th, 2010
Thread Validator x64 is now available for beta testing.

Thread Validator x64 is the 64 bit version of our successful 32 bit Thread Validator software tool that runs on Microsoft Windows operating systems. Thread Validator x64 is a deadlock detection and thread analysis software tool, running on Windows 7 64 bit, Windows Vista 64 bit and Windows XP 64 bit.
Thread Validator has multiple displays to provide you with different perspectives onto the data you have collected.
What does Thread Validator do?
Thread Validator x64 identifies thread deadlocks, potential deadlocks and locks with a high contention rate.
Thread deadlocks usually mean that one or more threads can no longer function correctly because they are waiting on a lock that will never be released. This is an error condition and usually manifests as an unresponsive computer program.
Potential deadlocks are locking sequences that have not triggered a deadlock but may lead to a deadlock under slightly different conditions.
High contention rate locks result in your program spending too much time waiting for access to a lock. A different program design can often reduce a high contention rate to a less demanding contention rate.
How does Thread Validator work?
Thread Validator instruments your computer program so that it can monitor the appropriate synchronization APIs used to control access to locks, mutexes, semaphores and wait conditions. Using the information gained from monitoring these APIs, Thread Validator can calculate deadlock conditions, potential deadlock conditions and detect locks with high contention rates.
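Thread Validator’s internals are not described here, but one common way to find potential deadlocks from monitored lock calls is a lock-order graph. Below is a minimal sketch of that generic technique, not Thread Validator’s implementation. The onAcquire/onRelease hooks are hypothetical stand-ins for the monitored synchronization APIs, and threads and locks are identified by plain integers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical sketch of lock-order tracking. Each time a thread acquires
// lock L while already holding lock H, record the edge H -> L. A cycle in
// this graph means two code paths take the same locks in opposite orders:
// a potential deadlock, even if no deadlock has happened yet.
class LockOrderGraph
{
public:
    void onAcquire(int thread, int lock)
    {
        for (int held : held_[thread])
            if (held != lock) // re-entrant acquisition adds no edge
                edges_[held].insert(lock);
        held_[thread].push_back(lock);
    }

    void onRelease(int thread, int lock)
    {
        std::vector<int> &h = held_[thread];
        for (auto it = h.begin(); it != h.end(); ++it)
            if (*it == lock) { h.erase(it); break; }
    }

    bool hasPotentialDeadlock() const
    {
        std::set<int> visiting, done;
        for (const auto &entry : edges_)
            if (hasCycleFrom(entry.first, visiting, done))
                return true;
        return false;
    }

private:
    // Depth-first search: reaching a lock already on the current path is a cycle.
    bool hasCycleFrom(int lock, std::set<int> &visiting, std::set<int> &done) const
    {
        if (done.count(lock)) return false;
        if (!visiting.insert(lock).second) return true;
        auto it = edges_.find(lock);
        if (it != edges_.end())
            for (int next : it->second)
                if (hasCycleFrom(next, visiting, done))
                    return true;
        visiting.erase(lock);
        done.insert(lock);
        return false;
    }

    std::map<int, std::vector<int>> held_; // locks currently held, per thread
    std::map<int, std::set<int>> edges_;   // lock-order edges: held -> acquired
};
```

The key design point is that the graph records orderings, not timings, which is why a potential deadlock can be reported from a run in which the threads never actually collided.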
Thread Validator gathers data for all locks, all threads, all mutexes, all semaphores and all wait conditions. The data is organised into various displays allowing you to view information:
- All active locks.
- All active locks, organized by thread.
- All locks that are locked at a given time.
- Allocation information for all allocated synchronization objects, showing callstack and source code.
- Thread locking history. View all threads, see what each thread is doing and when.
- Thread lock order. View the order locks are acquired across threads for a given lock sequence.
- List of all application objects that can be used in wait conditions.
How Thread Validator helps you be more productive
Thread Validator x64 can help you:
- Identify deadlocks in your application – quickly identify and fix hard threading problems.
- Identify potential deadlocks in your application – prevent problems before they get serious.
- Identify busy contended critical sections in your application – improve performance.
- View thread locking behaviour in real time.
- Improve your software quality by modifying your threading behaviour.
- View all open handles that your application can wait on.
Join the beta test
If you are developing 64 bit software and have some multi-threading problems you would like to analyze, please join the beta, analyze your multi-threading problems and let us know your thoughts.
Tags: C Posted in Porting to Win64, Thread | No Comments » Posted by Stephen Kellett
August 5th, 2010
Give up caffeine, improve productivity – yes I am serious.
Just recently I found out that I was allergic to several things, one of them being caffeine.
My history with caffeine
I really like tea, but dislike coffee, having announced to my mother at age 5 that I didn’t like coffee. That seems to have stuck with me. I drink tea with no milk and have done for years. A little bit of sugar takes the edge off the black tea and it’s fine. And I loved the caffeine. I could never see the point of caffeine free tea, until I found out I was allergic to caffeine.
Giving up caffeine
At the same time, I had noticed that a lot of the time I was distracted, unable to relax, always casting about for something to do. Granted, folks with active minds are a bit like this; I guess that’s why I like to write software. But this was different: even when too tired to write software I’d still be this coiled spring.
Then I gave up drinking caffeine in any drinks. Apparently if you drink more than a few cups of tea a day you are classed as addicted to caffeine. I guess you could say I was easily addicted to caffeine. According to Wikipedia there are caffeine withdrawal symptoms but I can’t say I noticed any.
A few days after I stopped drinking caffeinated tea, my distracted state of mind went away. Easier to focus on software, on bugs, reading books, watching movies, whatever the thing was.
Caffeine also affects your blood sugar levels, causing a boost. This in turn can lead to up and down swings in your blood sugar with a possible change of mood.
The problem with energy drinks and software
It’s not uncommon to see physically active people consuming lots of calories, either as food or drink, or even drinking an energy drink which may also contain caffeine. That is fine because the physical activity will burn the calories, leaving your blood sugar levels relatively normal.
However if you are sitting at your desk (or in your car) then an energy drink or high carbohydrate food is just going to put a big spike into your blood sugar to which your body will have to react with some insulin to regulate it. Not so long later (hour or two) you’ll feel lethargic as you get the counter effects of the insulin kicking in.
As such I’ve never understood the idea of consuming energy drinks if you are writing software – you are winding yourself up, and also setting yourself up for a blood sugar trough after the spike. If you are taking an energy drink so you can stay awake and code, that is a sign you are too tired anyway. You should take 20 minutes out and have a short sleep. Drink half a pint to a pint of water before you go to sleep. It’s surprising how much this short break can help. The water is to rehydrate you while you sleep – tiredness can be a sign of dehydration.
It is not uncommon for me to wake from a short nap with the solution to a problem and also the correct approach to implementing the solution. Try it for yourself.
You can have a similar problem with food
The same problem with energy drinks applies to fast acting carbohydrate foods: basically anything filled with sugar (energy bars, cakes, sweets…). You’ll get a blood sugar spike followed by a trough as your body overcompensates with insulin. These foods are great if you are active and on the go and need a boost, but totally counterproductive if you are not physically active (typing does not count!).
You will be much better served eating something that is more slowly processed by your body. Namely protein. Vegetable protein (beans, pulses) or meat protein, it does not matter which. Protein takes time for your body to convert into energy. As a result the energy is released in a much slower, more controlled manner, supplying you with energy without any blood sugar spikes or troughs.
What do I drink instead of caffeinated tea?
I now drink the redbush caffeine free tea, various herbal fruit teas and water. I drink water because a 5% drop in your body hydration leads to a significant drop in your ability to concentrate.
Recap
- Avoid caffeine and other stimulants. More focus, less distractedness, better productivity.
- Drink water, do not work dehydrated.
- Drinking caffeinated drinks will dehydrate you – caffeine is a diuretic.
- Keep your blood sugar even, improve your productivity.
Tags: caffeine, energy drinks, water Posted in Life | No Comments » Posted by Stephen Kellett
July 20th, 2010
Is that headline a bit too strong? I don’t think so. Allow me to explain.
This article applies to any company where your main product, or a substantial part of it, is software and the software is a key part of your product. For example, if you made top end oscilloscopes with an embedded PC inside them, the software to run the oscilloscope on top of the embedded OS would be important. This is unlike, say, AOL, where the dialup software is important but can easily be outsourced without impacting the business at all.
We receive approaches via email, via LinkedIn, etc., from outsourcing firms on a regular basis. Their messages extol the virtues of their multi-talented legion of software developers who can be hired for as little as $5.00/hour. On the face of it, getting staff at $5.00/hour sounds great, if you assume the staff know how to do their job. The list of technologies, computer languages and computing platforms they support is vast. You almost wonder how they do it. And at $5.00/hour. What’s the catch?
Cost
$5/hour. $10/hour. Either of these rates is really attractive. No surprise: that is pretty much their unique selling point.
In pure financial terms this looks like a win. But the true cost of using outsourced software development is not the expense of paying for it. The cost is elsewhere. I’ll come back to this later.
Location and Language
Typically these operations are set up in low cost parts of the world, usually developing countries, ex-Eastern Bloc countries, India, China. Often with a nice shiny headquarters in the US or the UK.
Communication
Everyone (well, almost) has access to the Internet these days. Distributed teams are not new. Lots of people do it. We do it. The most famous web product team on the planet does it. It is quite possible your outsourcing company will work this way. If that is important to you, you should find out.
You will need a core spoken/written language for communications. That will probably be English, just because that is the way the world works and software tends to be written that way.
You’ll want to make sure any source code comments, version control commits and design specs/documents/presentations are also in English. I’ve seen comments in French in source code bases that were substantially English. No problem with French, except that practically no one in the building knew what the comments said, except for the mathematician who wrote them. Is that useful? I don’t think so. Neither did the management when they found out.
You probably have to assume that most of the people who end up developing software for you will have English as a second or third language. This generally isn’t a problem, but I’ve found a few exceptions. Different languages have different rules on sentence construction: where the verb goes, how adjectives and nouns are used and so on. I’ve found over the years, from corresponding with our customer base, that I struggle with the verb/noun/adjective ordering used in some parts of the world, which does not follow English rules. When translated from their native language to English you get a strange beast which looks like English but reads very differently. Everyone is different, and you may not have that problem, but I struggle with it.
Staff
Is it reasonable to assume these people have good staff? I think it is. I think it’s also reasonable to assume they probably have some useless staff too; most companies do, so I don’t see why these companies would be any different.
Sounds OK so far, so what is the problem?
The main problem with outsourced software development is that as the development proceeds the following happens:
- You are paying the outsourcing firm to learn the technologies required to build your product.
- You are building up a reservoir of knowledge in the brains of their employees, not your employees.
- At some point the outsourcing company will know more about the internals of your product than you do. At that point you become dependent upon the outsourcing company: a perfect time for them to raise their costs. I don’t know if this happens, but logic dictates that it should.
- Once your product is complete, you have a lot of software that works and has no bugs (well, you can live in hope), but you don’t know anything about it. Is that a good position to be in? Sure, you’ve got documentation on it and the full version control history. But fundamentally the knowledge is in the brains of people who don’t work for your company. How are you going to handle bug reports? Are you going to let your staff loose on a codebase they don’t know, or are you going to hire the outsourcing company again? At what point can you break free from the outsourcing company?
- The staff at the outsourcing company do know how your product works. If they want to go and write their own product in the same market space they can, and they will do it better, as they will know all the pitfalls and mistakes made creating your product. Of course you can mitigate this with a clause in your contract forbidding such competition.
- Fundamentally, the software development business is about developing intellectual property. You want as much of that IP as possible in the heads of the people who work for you, not in the heads of people who do not. Outsourcing this work puts this knowledge outside of your company and is ultimately commercial suicide. Of course, staff can leave your company and move on to forge their careers. But that happens over time, and you can manage that. It is not the same as having all the knowledge in the brains of a separate company’s employees.
All the above reasons put you at a substantial disadvantage compared to your competitors that do not outsource their software development.
Exceptions
I can think of one exception to the above, but it is not really outsourcing. The exception is when you have one small discrete component of your software that you would like implemented and all you want is a functioning piece of code that can live in a DLL and get called to do its job. Examples would be a GUI widget library, a data compression library, a disassembler for a specific microprocessor etc. Typically you can find companies or lone developers that do things like this. They often have working examples you can try with very reasonable terms for use.
This is outsourcing in that you didn’t do the work, but it falls into the “not core activity” classification I mentioned at the start of the article, so I don’t think this really counts.
Conclusion
I always reply to people that approach us asking us to outsource our work to them. I explain that I regard their offer as asking us to commit commercial suicide. Normally I don’t get a reply. But I did get a reply from one gentleman, who shall remain nameless. Here is the edited exchange. Edited just to get to the nitty gritty. I have not edited any words or punctuation in the sentences.
Software Verification: "We are not interested in outsourcing. Software development is a core activity here – outsourcing it is equivalent to outsourcing the core intellectual property. For our business that is suicide. For other businesses it makes sense."
Outsourcing reply:
"Honestly, I totally agree with you. But such is the reality nowadays.
…"
He then explained that he thought there were valid reasons for us to use his services. Good salesman, I guess.
Tags: outsourcing, software development Posted in Development | No Comments » Posted by Stephen Kellett
July 20th, 2010
I’ve been building Firefox on Windows recently. Without problems. But then I wanted a build with symbols and that is when the problems started. It is meant to be simple, and it is, so long as you don’t do a
make -f client.mk clean
If you do that, then a subsequent build will fail when it comes to updater.exe.manifest.
Setup
Assuming you have downloaded and installed MozillaBuild and have also installed a suitable version of Visual Studio and the latest Microsoft Windows SDK, you can start the build environment by running c:\mozilla-build\start_msvc9.bat (change the number for the version of Visual Studio you are using).
Read these instructions first.
Download the appropriate source tree with a mercurial command like this:
hg clone http://hg.mozilla.org/releases/mozilla-1.9.2/ src192
Basic Build
I’m building on drive M: and have downloaded the source into folder src192.
cd M:/src192
Configure the build with:
echo '. $topsrcdir/browser/config/mozconfig' > mozconfig
Run the build with
make -f client.mk build
This gives you a firefox release build. Strangely the build is created without debug symbols. I don’t understand this as debug symbols are really useful even with release builds. The symbols don’t have to ship with the executable code, so there is no good reason for not creating them.
Given that you’ve downloaded the source to build the image yourself, it’s fair to assume that you are curious about its internal workings, or that you may be interested in modifying the source and/or writing an extension for it. Either way, you are going to want the symbols so that you can see what is happening in a debugger. Thus I don’t understand the lack of symbols by default.
Creating symbols
You can create symbols in several ways, by adding the following lines to your mozconfig.
echo 'ac_add_options --enable-debug' >> mozconfig
echo 'export MOZ_DEBUG_SYMBOLS=1' >> mozconfig
echo 'ac_add_options --enable-debugger-info-modules=yes' >> mozconfig
Getting a working build
At this point you’ve already done a build and have created executables. Thus it seems the appropriate thing to do is a make -f client.mk clean followed by a make -f client.mk build. This won’t work. The problem is that the clean deletes the manifest files that are present in the original mercurial source download.
In the end I wiped the whole source tree, downloaded from the source control again, modified mozconfig with the debug symbol options and did a build. That works.
This took me several attempts to understand – I thought that my various attempts at configuring the debug symbol options were breaking something, so I would change the options, clean, then build, each time taking several hours. I had this running in a virtual machine while I worked on other tasks. I probably spent about a day, including the build times, before I decided to try again by starting from scratch.
Solution
I don’t know if this is true for non-Windows firefox builds, but if you want your builds to succeed don’t do a make -f client.mk clean prior to make -f client.mk build as it will prevent you from successfully building firefox.
I hope this information helps prevent you from wasting your time with this particular build problem.
Tags: firefox Posted in Test Setup | No Comments » Posted by Stephen Kellett
July 19th, 2010
Doing good work can make you feel a bit stupid; well, that’s my mixed bag of feelings for this weekend. Here is why…
Last week was a rollercoaster of a week for software development at Software Verification.
Off by one, again?
First off, we found a nasty off-by-one bug in our nifty memory mapped performance tools, specifically Performance Validator. The off-by-one didn’t cause any crashes, errors or bad data. But it did cause us to eat memory like nobody’s business. For various reasons it hadn’t been found, as it didn’t trigger any of our tests.
Then along comes a customer with his huge monolithic executable which won’t profile properly. He had already thrown us a curveball by supplying it as a mixed mode app: half native C++, half C#. That in itself causes problems with profiling, as the native profiler has to identify and ignore any functions that are managed (.Net). He was pleased with that turnaround, but then surprised that we couldn’t handle his app, as we had handled previous (smaller) versions of it. The main reason he was using our profiler was that he had tried others and they couldn’t handle his app, and now neither could we! Unacceptable. Well, that was my first thought, although I was half resigned to the fact that maybe there wasn’t a bug and this was just a goliath of an app that couldn’t be profiled.
I spent a day adding logging to every place, no matter how insignificant, in our function tree mapping code. This code uses shared memory mapped space exclusively, so you can’t refer to other nodes by address, as an address in one process won’t be valid in the other processes reading the data. We had previously reorganised this code to give us a significant improvement in handling large data volumes, so we were surprised at the failure presented to us. Then came a long series of tests, each of which was very slow (the logging writes to files and it’s a large executable to process). The logging data was huge; some of the log files were GBs in size. It’s amazing what Notepad can open if you give it a chance!
Finally, about 10 hours in, I found the first failure. Shortly after that I found the root cause. We were using one of our memory mapped APIs for double duty, and the second use was incorrect: it was multiplying our correctly specified size by a prefixed size, offset by one. That behaviour is correct for a different usage. The main cause of the problem, in my opinion, was incorrectly named methods. A quick edit later we had two more sensibly named methods and much improved memory performance. A few tests later, with a lot of logging disabled, we were back to sensible performance with this huge customer application (and a happy customer).
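One common way round the “addresses aren’t valid in other processes” problem is to link nodes by offset rather than by pointer. The following is a minimal C sketch, not our actual implementation: a plain array stands in for the shared memory mapping, and every name (`region`, `tree_node`, `node_alloc` and so on) is hypothetical. Each node stores the byte offsets of its first child and next sibling from the start of the region, so the tree stays valid wherever a process happens to map it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t node_off;      /* byte offset from the start of the region */
#define NO_NODE ((node_off)0)   /* offset 0 is reserved to mean "no node" */

typedef struct {
    uint32_t func_id;           /* function this call-tree node represents */
    node_off first_child;       /* offset of first child, or NO_NODE */
    node_off next_sibling;      /* offset of next sibling, or NO_NODE */
} tree_node;

typedef struct {
    unsigned char *base;        /* where THIS process sees the region */
    size_t size;                /* capacity in bytes */
    size_t used;                /* bump-allocator high-water mark */
} region;

/* Start a region over caller-supplied, suitably aligned memory.
   The first node-sized slot is reserved so no real node has offset 0. */
static void region_init(region *r, void *mem, size_t size)
{
    assert(size >= sizeof(tree_node));
    r->base = (unsigned char *)mem;
    r->size = size;
    r->used = sizeof(tree_node);
}

/* Simulate another process mapping the same bytes at a different address. */
static region region_copy(const region *src, void *dst)
{
    region r = { (unsigned char *)dst, src->size, src->used };
    memcpy(dst, src->base, src->used);
    return r;
}

/* Turn an offset into a pointer that is only valid in this process. */
static tree_node *node_at(const region *r, node_off off)
{
    assert(off != NO_NODE && off + sizeof(tree_node) <= r->used);
    return (tree_node *)(r->base + off);
}

/* Allocate a node inside the region and return its offset, not its address. */
static node_off node_alloc(region *r, uint32_t func_id)
{
    node_off off = (node_off)r->used;
    assert(r->used + sizeof(tree_node) <= r->size);
    r->used += sizeof(tree_node);
    tree_node *n = node_at(r, off);
    n->func_id = func_id;
    n->first_child = NO_NODE;
    n->next_sibling = NO_NODE;
    return off;
}

/* Link 'child' in as the first child of 'parent'. */
static void node_add_child(region *r, node_off parent, node_off child)
{
    tree_node *p = node_at(r, parent);
    node_at(r, child)->next_sibling = p->first_child;
    p->first_child = child;
}

/* Count the nodes reachable from root, following offsets only. */
static size_t tree_count(const region *r, node_off root)
{
    size_t count = 1;
    for (node_off c = node_at(r, root)->first_child; c != NO_NODE;
         c = node_at(r, c)->next_sibling)
        count += tree_count(r, c);
    return count;
}
```

Copying the region to a different address, as `region_copy()` does, stands in for a second process mapping the same data elsewhere: the stored offsets still resolve correctly, whereas stored pointers would not.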
So chalk up one “how the hell did that happen?” followed by feelings of elation and pleasure as we fixed it so quickly.
I’m always amazed by off-by-one bugs. It doesn’t seem to matter how experienced you are; they do reappear from time to time. Maybe that is one of the perils of logic, or of tiredness.
I guess there is a Ph.D. in it for someone: studying CVS commits, file modification timestamps and off-by-one bugs, and trying to map them to time-of-day/tiredness attributes.
That did eat my Wednesday and Thursday evenings, but it was worth it.
Not to be outdone…
I had always thought .Net Coverage Validator was a bit slow. It was good in GUI interaction tests (which is part of what .Net Coverage Validator is about – realtime code coverage feedback to aid testing) but not good on long running loops (a qsort() for example). I wanted to fix that. So following on from the success with the C++ profiling I went exploring an idea that had been rattling around in my head for some time. The Expert .Net 2.0 IL Assembler book (Serge Lidin, Microsoft Press) was an invaluable aid in this.
What were we doing that was so slow?
The previous (pre V3.00) .Net Coverage Validator implementation called a method for each line visited in a .Net assembly. That method was in a unique DLL and had a unique ID. We traced application execution, and whenever we found our specific method we walked up the callstack one frame; that frame was the location of the coverage line visit. This technique works, but it has a high overhead:
- ICorProfiler / ICorProfiler2 callback overhead.
- Callstack walking overhead.
The result is that for GUI operations, code coverage is fast enough that you don’t notice any problems. But for long running functions or loops, code coverage is very slow.
This needed replacing.
What are we doing now that is so fast?
The new implementation doesn’t trace methods or call a method of our choosing. For each line we modify a counter. The location of the counter and the code that modifies it are placed directly into the ilAsm code for each C#/VB.Net method. Our first implementation of .Net Coverage Validator could not do this because our shared memory mapped coverage data architecture did not allow it: the shared memory could move during the execution run, which would invalidate the embedded counter location. The new architecture allows the pointer to the counter to be fixed.
The implementation and testing for this only took a few hours. Amazing. I thought it was going to be fraught with trouble, not having done much serious ilAsm for a year or so.
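To make the difference concrete, here is a C analogue of the two styles. It is only an analogy: the real tool rewrites the IL of each C#/VB.Net method, the names below are hypothetical, and it assumes the per-line modification is a simple increment. The old style makes a call for every line visited; the new style bumps a counter whose address is fixed for the whole run, so no call, callback or callstack walk is needed.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LINES 3   /* instrumented "lines" in the example functions */

/* Old style: every line visit calls a hook. In the real tool the hook also
   walked the callstack to find the caller's location; here the line id is
   simply passed in. */
static unsigned long g_hook_counts[NUM_LINES];

static void coverage_hook(int line_id)
{
    assert(line_id >= 0 && line_id < NUM_LINES);
    g_hook_counts[line_id]++;
}

long sum_old_style(const int *v, size_t n)
{
    coverage_hook(0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        coverage_hook(1);
        total += v[i];
    }
    coverage_hook(2);
    return total;
}

/* New style: the counters live at a fixed address for the whole run, so the
   instrumentation is just an inline increment with no call at all. */
static unsigned long g_line_counters[NUM_LINES];

long sum_new_style(const int *v, size_t n)
{
    g_line_counters[0]++;
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        g_line_counters[1]++;
        total += v[i];
    }
    g_line_counters[2]++;
    return total;
}
```

Both versions record the same visit counts; the difference is that in the new style the per-line cost inside a loop is a single memory increment rather than a function call.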
Result?
The new architecture is so lightweight that you barely notice the performance overhead. Less than 1%. Your code runs just about at full speed even with code coverage in place.
As you can imagine, getting that implemented, working and tested in less than a day is an incredible feeling. Especially compared to the previous performance level we had.
So why feel stupid?
Having achieved such good performance (and naturally feeling quite good about yourself for a while afterwards), it’s hard not to look back on the previous implementation and think “Why did we accept that? We could have done so much better.” That is where the feeling stupid comes in. You’ve got to be self-critical to improve. Pat yourself on the back for the good times, and reflect on the past to recognise where you could have done better, so that you don’t make the same mistake in the future.
And now for our next trick…
The inspiration for our first .Net Coverage Validator implementation came from our Java Coverage Validator tool. Java opcodes don’t allow you to modify memory directly like .Net ilAsm does, so we had to use the method calling technique for Java. However, given our success with .Net, we’ve gone back to the JVMTI header files (which didn’t exist when we first wrote the Java tools) and found there may be a way to improve things. We’ll be looking at that soon.
Tags: C++, Coverage, Memory, profiler .net Posted in Coverage, Profiler | No Comments » Posted by Stephen Kellett
July 10th, 2010
A little known fact is that the Microsoft C Runtime (CRT) has a feature which allows some allocations (in the debug runtime) to be tagged with flags that cause these allocations to be ignored by the built-in memory tracing routines. A good memory allocation tool will also use these flags to determine when to ignore memory allocations, and thus not report any allocations that Microsoft think should remain hidden.
A customer problem
The inspiration for this article was a customer reporting that Memory Validator was not reporting any allocations in a particular DLL of his mixed mode .Net/native application. The application was interesting in that it was a combination of C#, C++ written with one version of Visual Studio and some other DLLs also written in C++ with another version of Visual Studio. Only the memory for one of the DLLs was not being reported by Memory Validator and the customer wanted to know why and could we please fix the problem?
After some investigation we found the problem was not with Memory Validator but with the DLL in question making a call to _CrtSetDbgFlag(0), which turned off all memory tracking for that DLL. Memory Validator honours the memory tracking flags built into Visual Studio and thus did not report these memory allocations. Armed with this information, the customer did some digging into their code base and found that someone had deliberately added this call into their code. Removing the call fixed the problem.
The rest of this article explains how Microsoft tags data to be ignored and what flags are used to control this process.
Why does Microsoft mark these allocations as ignore?
The reason for this is that these allocations are for internal housekeeping and sometimes also for one-off allocations that will exist until the end of the application lifetime. Such allocations could show up as memory leaks at the end of the application – that would be misleading as they were intended to persist. Better to mark them as “ignore” and not report them during a memory leak report.
Microsoft debug CRT header block
Microsoft’s debug CRT prefixes each allocation with a header block. That header block looks like this:
#define nNoMansLandSize 4
typedef struct _CrtMemBlockHeader
{
struct _CrtMemBlockHeader * pBlockHeaderNext;
struct _CrtMemBlockHeader * pBlockHeaderPrev;
char * szFileName;
int nLine;
#ifdef _WIN64
/* These items are reversed on Win64 to eliminate gaps in the struct
* and ensure that sizeof(struct)%16 == 0, so 16-byte alignment is
* maintained in the debug heap.
*/
int nBlockUse;
size_t nDataSize;
#else /* _WIN64 */
size_t nDataSize;
int nBlockUse;
#endif /* _WIN64 */
long lRequest;
unsigned char gap[nNoMansLandSize];
/* followed by:
* unsigned char data[nDataSize];
* unsigned char anotherGap[nNoMansLandSize];
*/
} _CrtMemBlockHeader;
How does Microsoft tag an allocation as ignore?
When the CRT wishes an allocation to be ignored for memory tracking purposes, six members of that allocation’s debug header block are set to specific values:
| Member | Value | #define |
|---|---|---|
| nLine | 0xFEDCBABC | IGNORE_LINE |
| nBlockUse | 0x3 | IGNORE_BLOCK |
| lRequest | 0x0 | IGNORE_REQ |
| szFileName | NULL | |
| pBlockHeaderNext | NULL | |
| pBlockHeaderPrev | NULL | |
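Given a header laid out like the structure above, a tool can recognise a CRT-ignored block by checking all six markers. The sketch below is a hypothetical helper, not Memory Validator’s code: it declares a local mirror of `_CrtMemBlockHeader` so it compiles without the CRT sources, and takes its constant names from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Marker values from the table above. */
#define IGNORE_LINE  0xFEDCBABC
#define IGNORE_BLOCK 0x3
#define IGNORE_REQ   0L

/* Local mirror of the debug header shown earlier, for illustration only. */
typedef struct crt_header {
    struct crt_header *pBlockHeaderNext;
    struct crt_header *pBlockHeaderPrev;
    char  *szFileName;
    int    nLine;
#ifdef _WIN64
    int    nBlockUse;
    size_t nDataSize;
#else
    size_t nDataSize;
    int    nBlockUse;
#endif
    long   lRequest;
    unsigned char gap[4];
} crt_header;

/* Hypothetical helper: returns 1 only if every "ignore" marker is present. */
int is_ignored_block(const crt_header *h)
{
    assert(h != NULL);
    return (unsigned int)h->nLine == IGNORE_LINE   /* nLine is an int field */
        && h->nBlockUse == IGNORE_BLOCK
        && h->lRequest == IGNORE_REQ
        && h->szFileName == NULL
        && h->pBlockHeaderNext == NULL
        && h->pBlockHeaderPrev == NULL;
}
```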
The Microsoft code goes out of its way to ensure no useful information can be gained from the header block for these ignored items.
When we first created Memory Validator (MV) we found that items marked as ignored really should be ignored; otherwise you can end up with false positive noise reported at the end of a memory debugging session, due to the internal housekeeping of MFC/CRT.
How can you use this information in your application?
Microsoft also provides some flags that let you influence whether any memory is reported as leaked. This is in addition to the CRT marking its own allocations as “ignore”. You can set these flags using the _CrtSetDbgFlag(int) function.
The following flags can be passed to _CrtSetDbgFlag() in any combination.
| Flag | Default | Meaning |
|---|---|---|
| _CRTDBG_ALLOC_MEM_DF | On | On: Enable debug heap allocations and use of memory block type identifiers. |
| _CRTDBG_CHECK_ALWAYS_DF | Off | On: Call _CrtCheckMemory at every allocation and deallocation request. (Very slow!) |
| _CRTDBG_CHECK_CRT_DF | Off | On: Include _CRT_BLOCK types in leak detection and memory state difference operations. |
| _CRTDBG_DELAY_FREE_MEM_DF | Off | On: Keep freed memory blocks in the heap’s linked list, assign them the _FREE_BLOCK type, and fill them with the byte value 0xDD. CAUTION! This option uses lots of memory. |
| _CRTDBG_LEAK_CHECK_DF | Off | On: Perform automatic leak checking at program exit via a call to _CrtDumpMemoryLeaks and generate an error report if the application failed to free all the memory it allocated. |
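Because _CrtSetDbgFlag() replaces the whole flag word, passing a bare value such as 0 switches off every option at once, including _CRTDBG_ALLOC_MEM_DF. The usual pattern is to read the current flags by passing _CRTDBG_REPORT_FLAG, change only the bits you care about, and write the result back. A minimal sketch follows; the function name is hypothetical, and the calls are only compiled against the Microsoft debug CRT (elsewhere the function just returns 0).

```c
#include <assert.h>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

/* Hypothetical helper: add leak checking at exit without clobbering the
   other debug-heap options. Returns the flag word now in effect. */
int enable_leak_check_at_exit(void)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG); /* query, don't change */
    flags |= _CRTDBG_LEAK_CHECK_DF;                  /* add one option */
    _CrtSetDbgFlag(flags);                           /* write all flags back */
    assert(flags & _CRTDBG_LEAK_CHECK_DF);
    return flags;
#else
    return 0;                                        /* no debug CRT here */
#endif
}
```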
How do I disable memory tracking for the CRT?
If you call _CrtSetDbgFlag(0), any memory allocated after that point will not be tracked.
With that setting, every block allocated afterwards, other than the CRT’s own _CRT_BLOCK allocations, is marked as ignore. You can see the code for this in the Microsoft C runtime.
The code that marks the block as “ignore” is at line 404 of dbgheap.c in the Microsoft C runtime (also used by MFC). When your code arrives here, nBlockUse == 1 (_NORMAL_BLOCK) and _crtDbgFlag == 0.
dbgheap.c line 404 (line number will vary with Visual Studio version)
if (_BLOCK_TYPE(nBlockUse) != _CRT_BLOCK &&
!(_crtDbgFlag & _CRTDBG_ALLOC_MEM_DF))
fIgnore = TRUE;
This sets fIgnore to TRUE. From this point onwards the memory tracking code ignores the memory and sets the values mentioned above in the memory block header.
Default values
The default value for _crtDbgFlag is set elsewhere in the Microsoft code with this line:
extern "C"
int _crtDbgFlag = _CRTDBG_ALLOC_MEM_DF | _CRTDBG_CHECK_DEFAULT_DF;
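The decision in dbgheap.c can be modelled in portable C to see which blocks end up ignored under which flags. This is a hypothetical restatement of the three lines quoted above, not the CRT’s code; the constants are local copies of the crtdbg.h values (_NORMAL_BLOCK 1, _CRT_BLOCK 2, _CRTDBG_ALLOC_MEM_DF 0x01, _CRTDBG_CHECK_DEFAULT_DF 0), given a `MODEL_` prefix so they cannot clash with the real header.

```c
#include <assert.h>

/* Local copies of crtdbg.h values, renamed for this model. */
#define MODEL_NORMAL_BLOCK            1
#define MODEL_CRT_BLOCK               2
#define MODEL_CRTDBG_ALLOC_MEM_DF     0x01
#define MODEL_CRTDBG_CHECK_DEFAULT_DF 0x00
#define MODEL_BLOCK_TYPE(block)       ((block) & 0xFFFF)

/* Returns 1 if a new block of type nBlockUse would be tagged "ignore"
   under the given value of _crtDbgFlag (mirrors the dbgheap.c test). */
int would_ignore(int nBlockUse, int crtDbgFlag)
{
    return MODEL_BLOCK_TYPE(nBlockUse) != MODEL_CRT_BLOCK &&
           !(crtDbgFlag & MODEL_CRTDBG_ALLOC_MEM_DF);
}

/* The default flag word keeps normal blocks tracked; a flag word of 0
   (as set by _CrtSetDbgFlag(0)) makes them ignored. */
void check_model(void)
{
    const int default_flags =
        MODEL_CRTDBG_ALLOC_MEM_DF | MODEL_CRTDBG_CHECK_DEFAULT_DF;
    assert(!would_ignore(MODEL_NORMAL_BLOCK, default_flags));
    assert( would_ignore(MODEL_NORMAL_BLOCK, 0));
    assert(!would_ignore(MODEL_CRT_BLOCK, 0));
}
```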
Tags: C, C++ Posted in Memory | No Comments » Posted by Stephen Kellett
July 9th, 2010
Typically you use srand() when you need to start the random number generator in a random place. You may do this because you are going to generate some keys or coupons and want them to start in an unpredictable place.
From time to time we provide special offers to customers in the form of a unique coupon code that can be used at purchase to get a specific discount. These coupons are also used to provide discounts to customers upgrading from say Performance Validator to C++ Developer Suite so that they do not pay for Performance Validator twice.
When the coupon management system was written, we used srand(clock()), thinking that would provide an acceptable random seed for generating coupons. The thinking was that the management system would be running all the time, so clock() would return a value that was unlikely to be hit twice for the number of valid coupons at any one time. In practice, however, users close the coupon management system when it is not in use, so clock() returns values close to the starting time (start the app, navigate to the appropriate place, generate a coupon).
Result: Sooner or later a duplicate coupon is created. And that is when we noticed this problem.
This resulted in a confused customer (“My coupon has already been used”), a confused member of customer support (“That shouldn’t be possible!”) followed by some checking of the coupon files and then the code to see how it happened. Easy to fix, but better selection of the seed in the first place would have prevented the problem.
So if you want better random numbers don’t use clock() to seed srand().
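The failure is easy to reproduce: srand() with the same seed always produces the same rand() sequence, and clock() values taken shortly after start-up are small and can easily repeat between sessions. The sketch below uses a hypothetical 8-character coupon format; seeding twice with the same value yields the same coupon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical coupon format: 8 characters drawn with rand(). */
void make_coupon(char out[9])
{
    static const char alphabet[] = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    for (int i = 0; i < 8; i++)
        out[i] = alphabet[rand() % (int)(sizeof alphabet - 1)];
    out[8] = '\0';
}

/* Seed, then generate one coupon, as a freshly started tool would. */
void coupon_after_seed(unsigned int seed, char out[9])
{
    srand(seed);
    make_coupon(out);
}

/* Two "sessions" that happen to see the same clock() value collide. */
int same_coupon_for_same_seed(unsigned int seed)
{
    char first[9], second[9];
    coupon_after_seed(seed, first);
    coupon_after_seed(seed, second);
    return strcmp(first, second) == 0;
}
```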
Better seeds
- Use time(NULL) to get the time of day and cast the result to seed srand().
time(NULL) returns the number of seconds elapsed since midnight January 1st, 1970 (UTC).
- Use the RDTSC instruction (the __rdtsc() intrinsic in Visual C++) to get the CPU timestamp and cast the result to seed srand().
The timestamp is unlikely to repeat, as it is the processor’s time-stamp counter, which counts clock ticks since the processor was reset.
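One way to apply these suggestions is to combine both sources, so that two seeds only collide if both the second and the tick count match. This is a sketch under assumptions: `make_seed()` and `HAVE_RDTSC` are hypothetical names, the counter is read with the `__rdtsc()` intrinsic (from `<intrin.h>` in Visual C++, or `<x86intrin.h>` in GCC and Clang on x86), and on other processors it falls back to time(NULL) alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#if defined(_MSC_VER)
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

/* Mix the time of day with the CPU time-stamp counter into one seed. */
unsigned int make_seed(void)
{
    unsigned long long ticks = 0;
#ifdef HAVE_RDTSC
    ticks = __rdtsc();                        /* clock ticks since reset */
#endif
    unsigned long long secs = (unsigned long long)time(NULL);
    unsigned long long mixed = secs ^ ticks;
    return (unsigned int)(mixed ^ (mixed >> 32)); /* fold 64 bits into 32 */
}

void seed_rng(void)
{
    srand(make_seed());
}
```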
Tags: C, C++, clock, srand Posted in Hints and tips | No Comments » Posted by Stephen Kellett
June 17th, 2010
Datatype misalignment: there’s a topic so interesting you’d probably prefer to watch paint dry.
But! There are serious consequences for getting it wrong. So perhaps you’d better read about it after all.
The problem that wasted my time
Why am I writing about datatype misalignment? Because it’s just eaten two days of my time, and if what I share with you helps save you from such trouble, all the better.
The problem I was chasing was that three calls to CreateThread() were failing. All calls were failing with ERROR_NOACCESS. They would only fail if called from functions in the Thread Validator x64 profiling DLL injected into the target x64 application. If the same functions were called later in the application (via a Win32 API hook or directly from the target application) the functions would work. That meant that the input parameters were correct.
Lots of head scratching, trying many, many variations of input parameters and asking questions on Stack Overflow later, we were still stuck. I could only think it was to do with the callstack, but I had no idea why. So I started investigating the value of RSP during the various calls. Then I investigated what happened if I pushed more data onto the stack to change the stack pointer. After some trial and error I found a combination that worked. I then experimented with that combination to determine whether it was the values being pushed that mattered, or the actual value of the stack pointer.
At this point I was confused, as I didn’t know about any stack alignment requirements; I only knew about data alignment requirements. I then went searching for information about stack alignment and found a handy document from Microsoft that clarifies it.
What is datatype misalignment?
Datatype alignment is when the data read by the CPU falls on the natural datatype boundary of the datatype. For example, when you read a DWORD and the DWORD is aligned on a 4 byte boundary.
In the following code examples, let us assume that data points to a location aligned on a four byte boundary.
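#include <windows.h> // BYTE, DWORD

DWORD aligned(BYTE *data)
{
    // data is 4-byte aligned, so &data[4] is too: one aligned DWORD read
    DWORD dw = *(DWORD *)&data[4];
    return dw;
}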
Datatype misalignment is when the data read by the CPU does not fall on the natural datatype boundary of the datatype. For example, when you read a DWORD and the DWORD is not aligned on a 4 byte boundary.
DWORD misaligned(BYTE *data)
{
    // &data[5] is 1 byte past a 4-byte boundary: a misaligned DWORD read
    DWORD dw = *(DWORD *)&data[5];
    return dw;
}
Why should I care about datatype misalignment?
Aligned data reads and data writes happen at the maximum speed the memory subsystem and processor can provide. For example to read an aligned DWORD, one 32 bit data read needs to be performed.
DWORD BYTE [ 0];
BYTE [ 1];
BYTE [ 2];
BYTE [ 3];
DWORD BYTE [ 4]; // ignored
BYTE [ 5]; // read
BYTE [ 6]; // read
BYTE [ 7]; // read
DWORD BYTE [ 8]; // read
BYTE [ 9]; // ignored
BYTE [10]; // ignored
BYTE [11]; // ignored
DWORD BYTE [12];
BYTE [13];
BYTE [14];
BYTE [15];
Misaligned data reads and writes do not happen at the maximum speed the memory subsystem and processor can provide. For example, to read a misaligned DWORD the processor has to fetch both 32 bit words that the misaligned data straddles.
In the misaligned example shown above, the read happens at offset 5 in the input array. The diagram shows the first 16 bytes of the input array, marking where each DWORD starts and which bytes are read and which are ignored. If we assume the input array is aligned, the DWORD being read has 3 bytes in the first DWORD it straddles (offsets 5 to 7) and 1 byte in the second (offset 8). The processor has to read both DWORDs, then shuffle the bytes around, discarding the first byte from the first DWORD and the last 3 bytes from the second DWORD, then combine the remaining bytes to form the requested DWORD.
Performance tests on 32 bit x86 processors have shown performance drops of between 2x and 3x. That’s quite a hit. On some other architectures the performance hit can be much worse. This largely depends on whether the processor does the rearrangement itself (as x86 and x64 processors do) or an operating system exception handler does it for you (much slower).
I’ve shown the example with DWORDs because they are short enough to be easily shown in a diagram, whereas 8 byte or larger values would be unwieldy.
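The byte shuffling described above can be written out in C. This is a sketch of what the processor effectively does, not how you would read data in practice; it assumes a little-endian processor such as x86 or x64, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One aligned 32-bit load; offset must be a multiple of 4. */
static uint32_t load_aligned32(const uint8_t *base, size_t offset)
{
    uint32_t w;
    assert(offset % 4 == 0);
    memcpy(&w, base + offset, sizeof w);
    return w;
}

/* Emulate a misaligned 32-bit read at 'offset' using two aligned reads.
   Assumes little-endian byte order and a 4-byte aligned 'base'; the caller
   must ensure both straddled words lie inside the buffer. */
uint32_t read_misaligned32(const uint8_t *base, size_t offset)
{
    size_t   lo_off = offset & ~(size_t)3;           /* first straddled DWORD */
    unsigned shift  = (unsigned)(offset & 3) * 8;    /* bits to discard */
    uint32_t lo = load_aligned32(base, lo_off);
    if (shift == 0)
        return lo;                                   /* aligned: one read */
    uint32_t hi = load_aligned32(base, lo_off + 4);  /* second DWORD */
    return (lo >> shift) | (hi << (32 - shift));     /* combine the pieces */
}
```

For offset 5, `lo` holds bytes 4 to 7 and `hi` holds bytes 8 to 11; shifting drops byte 4 from the first word and bytes 9 to 11 from the second, exactly as in the diagram.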
The above comments also apply to 8 byte values such as doubles, __int64, DWORD_PTR (on x64), etc.
Clearly, getting your datatype alignments right can be very handy in performance terms. Naive porting from 32 bit to 64 bit will not necessarily get you there. You may need to reorder some data members in your structures. We’ve had to do that with Thread Validator x64.
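As an illustration of that kind of reordering, the hypothetical structures below hold the same three members, but placing the 8-byte member first removes most of the padding the compiler must insert to keep it naturally aligned. The sizes in the comments are for x64; other targets may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Naive order: the 8-byte member sits between two 1-byte members. */
struct PaddedRecord {
    char   flag;    /* offset 0, then 7 bytes of padding */
    double value;   /* offset 8: naturally aligned */
    char   kind;    /* offset 16, then 7 bytes of tail padding */
};                  /* sizeof == 24 on x64 */

/* Same members, largest first. */
struct ReorderedRecord {
    double value;   /* offset 0 */
    char   flag;    /* offset 8 */
    char   kind;    /* offset 9, then 6 bytes of tail padding */
};                  /* sizeof == 16 on x64 */

void check_layouts(void)
{
    assert(offsetof(struct ReorderedRecord, value) == 0);
    assert(sizeof(struct ReorderedRecord) < sizeof(struct PaddedRecord));
}
```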
Not just performance problems either!
In addition to the performance problems mentioned above, there is another, more important consideration to be aware of: on x64 Windows operating systems you must have the stack frame correctly aligned. Correct stack alignment on x64 systems means that the stack pointer (RSP) must be aligned on a 16 byte boundary whenever a function is called, that is, immediately before the CALL instruction pushes the return address.
Failure to ensure this will mean that a few Windows API calls fail with the cryptic error code ERROR_NOACCESS (hex: 0x3E6, decimal: 998), which means “Invalid access to memory location”.
The problem with ERROR_NOACCESS in this case is that the real error code, which gets converted into ERROR_NOACCESS, is STATUS_DATATYPE_MISALIGNMENT (0x80000002), which tells you a lot more. I spent quite a bit of time digging around until I found the true error code for the bug I was chasing, which led me to write this article.
If you are writing code using a compiler, the compiler will sort the stack alignment details out for you. However if you are writing code in assembler, or writing hooks using dynamically created machine language, you need to be aware of the 16 byte stack alignment requirement.
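For hand-written or generated x64 hooks, the arithmetic can be captured in a helper. The sketch below is hypothetical and assumes the Microsoft x64 calling convention, where the caller also reserves 32 bytes of shadow (home) space for the four register arguments. Given RSP after any pushes the hook has made, and the bytes of local storage wanted, it returns how much to subtract so that RSP is 16-byte aligned at the hook’s own CALL.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to SUB from RSP so that 'locals' bytes plus the 32-byte shadow
   space fit below the current RSP and the result is 16-byte aligned. */
uint64_t x64_stack_adjust(uint64_t rsp, uint64_t locals)
{
    const uint64_t shadow = 32;                  /* home space: RCX, RDX, R8, R9 */
    uint64_t target = (rsp - (locals + shadow)) & ~(uint64_t)15; /* round down */
    assert(target % 16 == 0);
    return rsp - target;
}
```

On entry to a function RSP is 8 bytes past a 16-byte boundary, because CALL has just pushed the return address; with no locals the helper gives 40 bytes, the familiar `sub rsp, 28h`.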
x64 datatype alignment requirements
Stack:
Correct stack alignment on x64 systems means that the stack frame must be aligned on a 16 byte boundary.
Data:
| Size (bytes) | Alignment (bytes) |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
| 8 | 8 |
| 10 | 16 |
| 16 | 16 |
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
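The rule in the table can be written as a small helper, together with a check of whether an address is naturally aligned for a given size. Both functions are hypothetical illustrations that assume the “next power of two” rule stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural alignment of an x64 scalar of 'size' bytes: 1, 2, 4 and 8 align
   on their own size; larger sizes round up to the next power of two. */
size_t x64_natural_alignment(size_t size)
{
    size_t align = 1;
    assert(size > 0);
    while (align < size)
        align <<= 1;
    return align;
}

/* 1 if an object of 'size' bytes at 'addr' is naturally aligned. */
int is_naturally_aligned(uintptr_t addr, size_t size)
{
    return addr % x64_natural_alignment(size) == 0;
}
```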
Conclusion
Correct stack frame alignment is essential to ensure calling functions works reliably.
Correct datatype alignment is essential for maximum speed when accessing data.
Failure to align stack frames correctly could lead to Win32 API calls failing, and/or program failure or incorrect behaviour.
Failure to align data correctly will lead to slow speed when accessing data. This could be disastrous for your application, depending upon what it is doing.
References
Microsoft x86/x64/IA64 alignment document.
Tags: alignment Posted in Porting to Win64 | No Comments » Posted by Stephen Kellett