Ip2Location (and IPinfoDB) Performance Tips

by Brian Hitney 9. March 2010 14:00

I’ve done a number of talks lately on Worldmaps and typically in side conversations/emails, people are curious about the databases and converting IP addresses to geographic locations.   And, often when you dive into using the data, it seems there are a number of performance considerations and I thought I’d share my input on these topics.

First up, the data.  Worldmaps uses two databases for IP resolution.  The primary/production database is Ip2Location.  I’ve found this database to be very accurate.  For development/demo purposes, I use IPinfoDB.  I haven’t had too much time to play with this database yet, but so far seems accurate also.   The latter is free, whereas Ip2Location is not.

In either case, the schema is nearly identical:

image

The BeginIp and EndIp columns are a clustered primary key.  In the case of IPinfoDB, there is no EndIp field (and it’s not really needed).  When performing a resolution, a string IP address is converted into a 64 bit integer and then used in searching the table.  That’s why having a clustered key on the BeginIp (and optionally EndIp) is crucial to performance.

But it doesn’t stop there.   The examples posted in the database’s respective home pages are accurate and simple, but need to be refactored for performance.  For example, to do a simple resolution on Ip2Location, according to their docs:

SELECT * FROM dbo.Ip2Location WHERE @IpNum BETWEEN BeginIp and EndIp

And for IPInfoDB:

SELECT TOP 1 * FROM IPInfoDB where BeginIp <= @IpNum ORDER BY BeginIp DESC

Both of these methods are perfectly fine, particularly for use as generic samples.  The second one is on the right track, but it doesn’t work for joins so if you’re querying over a range, you’d need to refactor.  And in the first example, using a BETWEEN operator forces a clustered index scan when joining, killing the performance.

If we run the first example across my minified Ip2LocationSmall table, we’ll see something like this (and this is running off of SQL Azure – the perf is pretty great compared to localhost!):

image

We can also look at the time:

image

Ouch!  Now, it doesn’t seem too bad, but imagine doing thousands of these requests per minute, or doing large joins. 

The goal then is to provide some hints that will optimize the query, particularly for joins.  Our indexes are correct, so we can rework the query to get rid of the BETWEEN operator – we can sacrifice a little readability and do something like:

SELECT *   
FROM (
    select
        ( select MAX(beginip)
          from dbo.Ip2LocationSmall
          where BeginIp <= @IpNum
        ) as IP_Begin           
    ) as foo
INNER JOIN dbo.Ip2LocationSmall iploc
ON iploc.BeginIp = foo.IP_Begin

The result:

image

And the time shows some improvement:

image

But the REAL benefit comes when we need to join.   Suppose I’d like to get a list of the countries for a given map (which is a parameter called MapId):

SELECT DISTINCT(ip.CountryCode)
FROM MapHits hits
INNER JOIN dbo.Ip2LocationSmall ip
    ON hits.IpNum BETWEEN ip.BeginIp AND ip.EndIp
WHERE MapId = @MapId

The query returns 95 rows, and executes in 16 seconds:

image

image

In this case, we can refactor this using the method above to something like:

SELECT
    DISTINCT(CountryCode)   
FROM (
     select IpNum,
        ( select MAX(beginip)
          from Ip2LocationSmall
          where BeginIp <= IpNum
        ) as IP_Begin
      from dbo.MapHits as hits
      where MapId = @MapId
  ) as foo
INNER JOIN Ip2LocationSmall iploc
ON iploc.BeginIp = foo.IP_Begin

Again, not as pretty looking, but boy what a difference:

image

image

We went from 16,500 milliseconds to 260 – over 60x the performance!  Mike @AngryPets would be proud.   The reason for the perf gain is we were able to eliminate the nested loop, which is (in this case) scanning the entire clustered index for the matching rows.

The second benefit is the ability to switch schemas easily between IP2Location and IPinfoDB, and we can additionally lose the EndIp column which trims the size of the table.

Tags: , , ,

SQL Azure | SQL | Tech Tips | Development

Thoughts on Windows Azure Pricing…

by Brian Hitney 9. February 2010 08:59

There are a LOT of posts out there talking about Azure pricing.  There’s the Azure TCO Calculator, and some good practices scattered out there that demystify things.  Some of these bear repeating here, but I also wanted to take you through my math on expenses – how you design your app can have serious consequences on your pricing.  So let’s get the basic pricing out of the way first (just for the Azure platform, not AppFabric or SQL Azure):

  • Compute = $0.12 / hour
  • Storage = $0.15 / GB stored / month
  • Storage transactions = $0.01 / 10K
  • Data transfers = $0.10 in / $0.15 out / GB - ($0.30 in / $0.45 out / GB in Asia)

Myth #1:  If I shut down my application, I won’t be charged.

Fact:  You will be charged for all deployed applications, even if they aren’t running.  This is because the resources are allocated on deployment, not when the app is started.  Therefore, always be sure to remove deployments that aren’t running (unless you have a good reason to keep them there).

Myth #2:  If my application is less CPU intensive or idle, I will be charged less.

Fact:  For compute hours, you are charged the same whether your app is at 100% CPU or idle.  There’s some confusion (and I was surprised by this, too) because Azure and Cloud provisioning is often referred to as "consumption based” and (in this case, incorrectly) compared to a utility like electricity.  A better analogy is that of a hotel room.  An Azure deployment is reserving a set of resources.  Like the hotel room, whether you use it or not doesn’t change the rate.

On the plus side, Compute hours are a fairly easy thing to calculate.  It’s the number of instances in all of your roles * $.12 for small VM instances.  A medium instance (2 core) is $.24, and so on.

Myth #3:  There’s no difference between a single medium instance and two small instances.

Fact:  While there is no difference in compute price, there is significant difference in that the two small instances offer better redundancy and scalability.  It’s the difference between scaling up vs scaling out.  The ideal scenario is for an application that can add additional instances on demand, but the reality is that applications need to written to support this.

In Azure, requests are load balanced across all instances of a given webrole.   This complicates session and state management.  Many organizations do what is called sticky persistence or sticky sessions when implementing their own load balancing solution in their applications.  When a user visits a site, they will continue to visit the same server for their entire session.  The downside of this approach is that should the server go down, the user is redirected to another server and loses all state information.  However, it’s a viable solution in many scenarios, but not one that Azure load balancing offers.

Scaling up is done by increasing your VM size to medium (2 core), large (4 core), or XL (8 core), with more RAM allocated at each level.  The single instance becomes much more powerful, but your limited by the hardware of a single machine.

In Azure, the machine keys are synchronized among instances so there is no problem with cookies and authentication tokens, such as those in the ASP.NET membership providers.  If you need session state information, this is where things get more complicated.  I will probably get zinged for saying this, but there is currently no good Azure-based session management solution.  The ASP Providers contained in the SDK does have a Table Storage Session State demo, but the performance isn’t ideal.   There are a few other solutions out there, but currently the best bet is to not rely on session state and instead use cookies whenever possible.

Now, having said all this, the original purpose of the post:  I wanted to make sure folks understood transactions costs with Azure Storage.  Any time your application so much as thinks about Storage, it’s a transaction.  Let’s use my Worldmaps site as an example.  This is not how it works today, but very easy could have been.  A user visits a blog that pulls an image from Worldmaps.  Let’s follow that through:

Step Action Transaction #
1 User’s browser requests image.  
2 Worker roll checks queue. (empty) 1
3 If first hit for map (not in cache), stats/data pulled from Storage. 2
4 Application enqueues hit to Azure Queue. 3
5 Application redirects user to Blob Storage for map file. 4
6 Worker dequeues hit. 5
7 Worker deletes message from queue. 6

While #3 is only on first hit for a given map, there are other transactions going on behind the scenes and if you are using the Table Storage Session State provider … well, it’s another transaction per hit (possibly two, if session data is changed and needs to be written back to storage).

If Worldmaps does 200,000 map hits per day (not beyond the realm of possibility but currently a bit high), then 200,000 * 6 = 1,200,000 storage transactions.  They are sold in 10,000 transactions for $.01, so that’s 120 “units” or $1.20 per day.  Multiply that by 30 days, and that’s about $36/mo for storage transactions alone – not counting the bandwidth or compute time.

I realized this early on and as a result I significantly changed the way the application works.  Tips to save money:

  1. If you don’t need durability, don’t use Azure queues.  Worldmaps switches between in-memory queues and Azure queues based on load, configuration, and task.  Since queues are REST calls, you could also make a WCF call directly to another Role.
  2. Consider scaling down worker roles by multithreading particularly for IO heavy roles.  Also, a webrole’s run method (not implemented) simply calls Thread.Sleep(-1), so why not override it to do processing?  More on this soon…
  3. SQL Azure may be cheaper, depending on what you’re doing.  And potentially faster because of connection pooling.
  4. If you aren’t interested in CDN, use Azure Storage only for dynamic content.
  5. Don’t forget about LocalStorage.  While it’s volatile, you can use it as a cache to serve items from the role, instead of storage.
  6. Nifty backoff algorithms are great, but implement only to save transaction cost.  It won’t affect compute charge.
  7. Take advantage of the many programs out there, such as hours included in MSDN subscriptions, etc.

Next up will be some tips on scaling down and maximizing the compute power of each instance.

Tags: ,

Azure | Development | Tech Tips

New VS Tools for Windows Azure, Feb 2010

by Brian Hitney 8. February 2010 13:52

The latest Visual Studio Tools for Windows Azure (v 1.1, Feb 2010) was just released and can be downloaded here.  It does not support VS2010 Beta 2, so you’ll need to either use VS2008, or wait until VS2010 RC in a few weeks.

I’m really excited about this release for 1 immediate to-do in my code, and 1 would-be-fun to-do in my code.  First, Windows Azure Drive beta is available in this release (called XDrive at PDC).   Windows Azure Drive lets you mount an NTFS virtual hard disk (VHD) between one or more roles (onle 1 can mount the drive as for writing).  The drive itself is stored in Azure Blob Storage, but behind the scenes there are some nice features (like caching) to make it a great option for Azure storage, particularly if you’re migrating an application that made extensive use direct disk I/O.

Now, since the November release, Queue storage included a dequeue count property that allows you to have visibility into how many times a message has been dequeued.   But, the StorageClient included with the VS tools didn’t have this property, so until now you’d have to do your own implementation to get the value.

Seeing the dequeue count is pivotal in dealing with poison messages.  A poison message, in queue parlance, is a message that contains malformed data that ultimately causes the queue processor to throw an exception.  The result is that message isn’t processed, stays in the queue, only to be processed again and with the same result.  Depending on the visibility timeout of the message (which is 30 seconds by default) this can be a disaster. 

Looking at the dequeue count can be key to discovering a poison message.  For example, suppose you have the following exception handler:

catch (Exception e)
{
    if (msg != null && msg.DequeueCount > 2)
    {
        queue.DeleteMessage(msg);
    }

If an exception was raised in attempting to process the message that wasn’t otherwise handled, we’ll get here.  This is my outermost exception handler.  In a workerrole, you _always_ want to make sure you have the opportunity to catch any exception, because otherwise the role will exit and recycle. 

In this case, I can check the dequeue count and simply delete the message if it’s already been dequeued 3 or more times (an arbitrary number on my end, chosen because of the relatively high timeout which means the message will live for an hour before being discarded).  If we hit that number, I’m going to discard the message.  Optionally, I could log it to an invalid queue table, or put it in another queue, etc.  The important thing is that we recognize a poison message and deal with it.  With this particular queue, missing a message isn’t that critical so I can just delete the message and move on. 

Tags: , ,

Azure | Development

Windows Azure Role Communication

by Brian Hitney 3. February 2010 11:07

I have a few posts along the lines of scaling an Azure application, but thought I’d throw this tidbit out.  You might be developing your own management system for your Azure deployments, and in doing so, having visibility into the roles running within your deployment is very handy.   The idea behind this series of posts is to ultimately explore how to get visibility into roles, and then a few ways we can tackle inter-role communication. 

The first task is to define internal endpoints for each of our roles.  If you don’t do this, you will only have visibility into the current role.  So, if you have a single role solution (any number of instances) than you don’t have to do this. 

Let's take a pretty simple example where we want to enumerate all the roles in our deployment, and their instances:

foreach (var appRole in RoleEnvironment.Roles)
{
    var instances = RoleEnvironment.Roles[appRole.Key].Instances;

    foreach (var instance in instances)
    {
        Response.Write(instance.Id);
    }
}

While this code will work fine, by default it will only show you the instances for the current role – other roles will show an instance count of zero.  Running this in my workerrole and trying to enumerate my webroles, I see this:

image

What we can do is define and internal endpoint for our roles that will allow this data to be discoverable.  In Visual Studio, it’s simply a matter of going into the properties of the role and defining an internal endpoint:

image

And we’re done.  Debugging the solution again:

image

We can do the same for our workerroles to be discoverable by other webroles, etc. 

In some future posts, I’ll build off this idea for doing some lightweight service management and role communication.

Tags: , ,

Azure | Development | Tech Tips

SQL Azure Transport-Level Error

by Brian Hitney 3. February 2010 10:34

If you’ve been playing with SQL Azure, you might have run into this error when opening a connection to the database:

A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine.)

Ick!  Anything at that transport level surely can’t be a problem with my code, can it?  :)

The goods news is that you aren’t going crazy and the problem isn’t with your code.  There’s a difference in the way SQL Azure manages its connections compared to .NET connection pooling, and the result is that the connection pool is giving out connections that have already been closed.  When you try to do something with that connection, you get this error.   It’s sporadic in that only happens when you get an idle connection from the pool that has already been dropped by SQL Azure.

There are a couple of workarounds until a fix is implemented by Microsoft (I’ve been told it’s coming soon).  One method is to retry the connection (I hate this one, but it’s a viable option nonetheless).  It’s just messy and I’d have to do this in a few dozen places.  The amusing fix is to have a thread that continually pings the database, keeping every connection alive.  The best fix that I’ve found to date is to simply turn off connection pooling temporarily by adding a pooling=false option to the connection string:

image

I tested this on my webrole, leaving my workerroles as-is, and the webrole has been running for a week or two without a single error, whereas the workerrole (without disabling pooling) will get a couple errors every day.

I haven’t done any performance tests but UA testing (which is me) sees no appreciable hit, so I’ll go with this option until the permanent fix is deployed. 

Tags: , , ,

Azure | Development | SQL | Tech Tips

Leaderboard Tweaks and T-SQL Fun

by Brian Hitney 21. January 2010 06:01

With the chaos of the Worldmaps to Azure migration winding down, I decided to take a look at some the low-pri issues on my to-do list.   Those close to the project or who follow their leaderboard ranking closely know that last year, the leaderboard was ranked by Total Hits.  This worked pretty well at the time, but it made it impossible for those who joined the site weeks or months after others to compete unless their site had considerable more volume to close the gap.

This year, I changed the formula to base the data off of average hits per day.  This adds a bit more excitement because it’s a level playfield regardless of join date, for the most part.  The one remaining problem I have with hits per day is that if a site gains popularity over time, the resulting average encourages the user to delete and recreate their map for better rankings.  At some point in time, I’ll further refine this so it looks at hits per day for the last month or two (for example).

In the meantime, though, I got some heat from Chris Eargle who pointed out some problems with my implementation when he looked at his numbers.  The way the implementation worked was that it grabbed the earliest hit, the most recent hit, and divided by the number of days.   To prevent division by zero and give a minimum of 1 day, 1 was added to the number of days like so:

coalesce(sum(hits.NumHits) / (DATEDIFF(d, MIN(hits.CreateDate), MAX(hits.ModifyDate))+1) ,0) as HitsPerDay

While this worked “okay,” it wasn’t perfect.  For starters, using DATEDIFF returns an integer based on the dates passed in.  If you pass in “1/21/2010 23:59” and “1/22/2010 00:01,” the DATEDIFF (by days) is 1, even though only 2 minutes have passed.  If you pass in “1/22/2010 00:01” and “1/22/2010 23:59,” a zero is returned even though it’s just 2 minutes short of a day.  This isn’t wrong, but it’s limited for what we need because the leaderboard is generated throughout the day. 

Adding 1 can be incorrect because, using the first example, it will now return 2 days (!) when only 2 minutes have expired.  It’s a pretty extreme example but possible.  I wanted an implementation that, using T-SQL alone, would be a bit more consistent.  The criteria I had was: 1) had to support partial days and 2) had to give a minimum of 1 day.  #2 is because I don’t want a map that was just created (as in the first time example) to end up getting (again, for example) 50 hits in 2 minutes and due to the math, getting averaged at say 100,000 hits per day.  Using a minimum of 1 day makes the leaderboard a little more stable.  The resulting query:

CAST(FLOOR(
    SUM(NumHits)/
      CASE 
        WHEN (DATEDIFF(hh, MIN(hits.CreateDate), MAX(hits.ModifyDate))/cast(24 as float)) < 1 THEN 1        
        ELSE (DATEDIFF(hh, MIN(hits.CreateDate), MAX(hits.ModifyDate))/cast(24 as float))
      END
      )
as int) as HitsPerDay

This works out really well.  First, we use hours instead of days and simply divide by 24.  It’s essentially the same as days, but gives a fractional value.  If the value is less than 1, we’ll use just 1 as the minimum.  This solves the second criteria above and also any division by zero cases.  The ELSE of the CASE gives us the fractional days, so hits per day is more accurate and no more +1 tomfoolery.  Even though the FLOOR returns an whole number, the data type is unchanged so a cast to an int is necessary for the application.

If we look at the results in query analyzer, you can see the results are more true:

image

The “days” and “days_partial” shows the difference between using hours and days in DATEDIFF, giving us more accuracy.  The “HitsPerDay_New” compared to “HitsPerDay_Original” shows the more accurate result.  In some cases unchanged, in some cases much more accurate as you can see by looking at the “FirstHit” and “MostRecentHit” columns.

Anyway, just some fun playing with T-SQL and if you see some adjustments in the Leaderboard, this is why. 

But Chris, it won’t help you get into the Top 10. :)

Tags: , ,

Worldmaps | Development | SQL

Podcast with Michael Kimsal

by Brian Hitney 19. December 2009 02:51

The other day, I had the honor of sitting down and rambling endlessly with Michael Kimsal in this podcast.  It was a lot of fun, despite the overly loud announcement system in the background (“…table 3, your order is ready” type of thing).   Michael is one of those insanely smart kind of guys that has a really balanced view of technology and a lot of fun to talk to.   Michael publishes (among other things) jsmag, a must read for javascript developers.  Check it out!   Also, if you use coupon HITNEY on the site, you can get a free issue!  (Thanks Michael!)

Tags: ,

Development | Technology | Babble

Worldmaps Migration Underway

by Brian Hitney 19. December 2009 02:40

For those of you with Worldmap accounts, the Azure migration is (hopefully) underway.  This process will take awhile, in part due to the holiday vacation :) but also testing and what not.  Starting today, the accounts section will be closed for changes until the migration is complete.

Hopefully, the migration will not cause any breaking changes.  However, the biggest possible change is to make sure you are accessing the service using www in the URL … for example, “http://www.myworldmaps.net….” – if you’re leaving that out, it will have to be added in or the map will not work after the migration due to limitations with the DNS CNAMEing in Azure.

Stay tuned, and hope for a smooth ride and management approval :)

Tags: ,

Development | Worldmaps

Pay Attention to your Azure Blobs

by Brian Hitney 13. December 2009 15:35

I have a fairly large Windows Azure migration I’m working on, and there are dozens of tips, recommendations, gotchas, etc. I’ve learned in the process.  This is one small item that cost me quite a bit of time, and it’s so simple I’m detailing it here because someone will run into this one.

First a bit of background: if you’re deploying a Windows Azure application, the package is uploaded and deployed as a whole.  If you have dynamic content or individual files that are subject to change, it’s a good idea to consider placing it in Azure Storage, otherwise you’ll have to redeploy your entire application to update one of these files.  In this case, I’d like to put a Silverlight XAP file in Azure storage, instead of the typical /ClientBin folder.

There are a number of references on the net for setting this up – searching for “XAP” and “Azure” returns a lot of good references including this one.  After checking my Silverlight project’s app manifest file, ensuring everything was correct, and uploading the XAP to my Azure storage account, my Silverlight app would refuse to run.  The page would load, but then … nothing.  I also checked Fiddler – the XAP file _was_ getting downloaded (and was downloading fine via a browser).  This is typically a tell-tale sign of a policy issue – yet I was certain everything was correct.  Here’s a screenshot (click for larger) of Fiddler downloading the file.  Can you spot the problem here?

image

The problem was indirectly with Cerebrata’s Cloud Storage Studio.   Cerebrata’s product is really nice and I enjoy working with it quite a bit to work with Azure storage (uploading/downloading files).  CloudBerry’s Explorer for Azure Blob Storage is another good one – I typically work with Cerebrata’s simply because it was easier to connect to development storage (not sure if this is possible in CloudBerry’s app).

Fast-forward and hour or two of frustration.  Staring at Cloud Storage Studio, I see:

image

Zip file as a content type?  This was in the Fiddler screenshot above, too.  I figured this was fine because after all, a XAP file is a zip file.  But as it turns out, this was the problem.   For kicks, I tried re-uploading the XAP file from CloudBerry to see how it handled the content type, and:

image 

Cloud Storage Studio does allow you to alter the content types, but truthfully I didn’t think this was all that important.  When I reloaded my application, though, the app loads and Fiddler comes to life as it should:

image

For kicks I changed the content type back to Zip, and the app would fail to load.  So, lesson learned!  Don’t forget about the content types.

Tags: , ,

Development | Microsoft | Tech Tips | Azure

Worldmaps Update & EOY Changes

by Brian Hitney 6. December 2009 16:05

Hi folks!  I wanted to take the opportunity and outline a few changes for Worldmaps – both changes in the service, the backend, and new features.

image First up: Stumbler.  A link to Stumbler was added to the nav menu at the top of the page.  If you haven’t played around with Stumbler, check it out.  It maps users in near real time to websites.

One of the biggest changes is the update to the new Bing Maps Silverlight control.  This doesn’t change much from the end-user point of view, but a few new features have been added so the UI in Stumbler is a bit cleaner around pushpins and effects.  In addition, there’s a new setting in the settings dialog to choose whether or not to scroll the map automatically.  Normally you zoom around the map automatically, but now you can turn that off to stay focused on a certain area.

image The other big new feature is multiple/extensible leaderboards.  The way it worked until now was there a single leaderboard for all users.  This “master leaderboard” is still there, but having sub leaderboards is a lot more interesting.  I defined a few leaderboards like “Tech Blogs” and “Personal Sites,” but will add more over time (leave feedback on new leaderboards!).  On the leaderboard page, the default view is the all-up leaderboard, while each leaderboard is displayed on the left nav. 

To pick a leaderboard, log in to your account and edit your maps.  In the detail section, you’ll see a drop down that allows youimage to select a leaderboard (see image to the right).  I should also point out that the sub-boards are sorted based on hits per day, not total hits. 

The other big change to outline is the end-of-year change.  At the end of the year, the individual hit data is reset.  This means all the red dots on the map will be wiped clean and start over again.  Data will be archived and available in some fashion, but this hasn’t been implemented yet.  Additionally, next year (2010) the main leaderboard will be sorted based on average hits per day. 

Now to address the biggest architectural change:  if you’ve created an account on Worldmaps but haven’t gotten an email yet with the approval, fear not.  Unfortunately the volume is such that with limited infrastructure, there’s not much more that can be done.  This will soon change.  Worldmaps has been moved to Windows Azure, Microsoft’s cloud computing platform.  This should give Worldmaps a nice bump in scalability (limited really only by funds).  So stay tuned in early 2010 for more info.

Tags: ,

Development | Technology | Worldmaps

your host...

Brian Hitney
Developer Evangelist
Microsoft Corp.

About Me

My Worldmap