Code Performance: An Inside Conversation

Internally at Microsoft, there are a zillion different e-mail lists.  Sure, there are wikis, Sharepoint sites, portals, etc., but the e-mail lists make up the hive mind of the collective.  The difficult part is "attaching" and finding conversations of interest ... and every now and then one catches my eye, as this one did.  (And of course, while I'm re-reading this several times, another is setting up a rule to delete the thread automatically, but hey, that's the fun part.)

The topic started out as a way to profile performance of 2 different algorithms -- one written in C++ and the other in C#.  The consensus was that the two are difficult to compare (apples and oranges), as of course, the idea of "better" doesn't necessarily mean performance (in fact, in today's software world, I'd argue that performance is seldom the most important thing).  But, the topic then turned to hand-coding some assembly to make up for deficiencies in the language or compiler's ability to make full use of the processor's capabilities.

Robert Ragno, a developer in Microsoft Research, chimed in with some great comparisons of several different functions calculating, for the most part, the same thing.  (Posted with his permission -- thanks Robert!) 

Suppose you wanted to natively calculate an integer log_2 -- this may look like:

int log2_simple(int n)
{
  int i = -1;
  while (n != 0)
  {
    i++;
    n >>= 1;
  }
  return i;
}

time: 20.35

You then decide that floating point units are faster (and less accurate, but you figure that's ok) so you use something like this:

#define loge2 (0.693147180559945F)
int log2_math(int n)
{
  return (int)(((float)log((double)n))/loge2);
}

time: 7.05

Perhaps the original C version can be optimized a bit since the compiler may not be optimizing as well as it could:

__inline int log2_fastc(int n)
{
  register int i,m;
  for (i=-1,m=1;m<=n;i++,m<<=1);
  return i;
}

time: 10.23

This ended up being twice as fast as the original.  However, if we're going to go to this degree, we might as well simply code this inline with assembly:

__inline int log2_asm(int n)
{
  __asm
  {
    bsr eax, n
    mov n, eax
  }
  return n;
}

time: 1.14

And of course, there's our favorite Math.Log function in .NET, which is about 35% slower than our first C example.  (And worth pointing out that it is also accurate.)  These low level math operations are often the worst performers in managed code, but we already knew that, right?

Anyway, an interesting thread made all the better by the benchmarks.  Thanks, Robert, for letting me share!
Comments are closed

My Apps

Dark Skies Astrophotography Journal Vol 1 Explore The Moon
Mars Explorer Moons of Jupiter Messier Object Explorer
Brew Finder Earthquake Explorer Venus Explorer  

My Worldmap

Month List