Geographic Reporting, Part 2

In part 1, I discussed the prerequisite data we needed as a foundation for developing geographic reports. In this part, I'll discuss how we map those IP addresses to geographic locations, and finally, in part 3, we'll plot the data on a map.

When it comes to figuring out where a user is located, there are only two pieces of information we need: latitude and longitude. Other data, like country or currency, may be useful for other reports, but all of that can be derived from the latitude and longitude. Getting those two numbers is the hard part, and that's not surprising: IP addresses (obviously) are not assigned by geographic location (though IPs in the same subnet are frequently geographically close, and we can use that to our advantage), so we can't simply read location off the request the way we read other information, like the user agent. And even if we knew where an IP was located, how would we handle proxy servers or NAT?

The answer is: we don't. Proxies and NAT are, generally speaking, not transparent (meaning we can't see behind them). In some rare instances, a proxy may forward the original client address in an X-Forwarded-For header (which ASP.NET exposes as the HTTP_X_FORWARDED_FOR server variable), but most proxy servers don't, and some even put random IP addresses in there. If I'm on the east coast and VPN into my home network (or use a proxy server) on the west coast, and then visit some websites, it will look like I'm visiting from the west coast.
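When a proxy does pass the original address along, X-Forwarded-For holds a comma-separated list with the original client first, followed by any proxies. Below is a minimal sketch of pulling a candidate client IP out of that value; the ForwardedFor class and ClientIp method are hypothetical names, not part of any framework. It falls back to the connection's own address when the header is missing or unusable, and since clients can put anything in this header, the result should be treated as a hint rather than proof.

```csharp
using System;
using System.Net;

// Hypothetical helper: extracts a candidate client IP from an X-Forwarded-For value.
public static class ForwardedFor
{
    // Returns the left-most parseable address from a value such as
    // "203.0.113.7, 10.0.0.1", or the fallback (typically the socket's remote
    // address) if the header is absent or contains nothing usable.
    // The header is client-controlled, so the result is only a hint.
    public static string ClientIp(string forwardedFor, string fallback)
    {
        if (string.IsNullOrWhiteSpace(forwardedFor))
        {
            return fallback;
        }

        foreach (string part in forwardedFor.Split(','))
        {
            IPAddress parsed;
            if (IPAddress.TryParse(part.Trim(), out parsed))
            {
                return parsed.ToString();
            }
        }

        return fallback;
    }
}
```

In an ASP.NET page this might be called as ForwardedFor.ClientIp(Request.ServerVariables["HTTP_X_FORWARDED_FOR"], Request.UserHostAddress).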

So there's a certain amount of inaccuracy we'll have to live with. It's the way of the internet: perfect, 100% accuracy is simply not possible. Geolocation of IP addresses will be particularly inaccurate for mobile or satellite-based connections.

There are a few possible approaches to resolving the data. One is data mining your visitors. For example, based on the IP address alone, we can look up the IP's owner through a WHOIS search at one of the regional internet registries, like ARIN or RIPE. If your site requires registration, you can cross-reference that registration information with the WHOIS lookup and likely make a good guess. Of course, users may be visiting from work, an internet cafe, or a friend's house, so you'd need to evaluate IPs on a case-by-case basis. But with a little logic, time, and volume, your guesses would likely be reasonably accurate. As I pointed out in my original post, this doesn't require laser precision: each pixel on a world map that's 720 pixels wide covers half a degree of longitude, roughly 35 miles across at the equator and less toward the poles.
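The pixel arithmetic is easy to check. A full-world map spreads 360 degrees of longitude across its width, so a 720-pixel map gives half a degree per pixel, and a degree of longitude shrinks by the cosine of the latitude as you move away from the equator. Here's a short sketch assuming an equirectangular projection and a spherical Earth (mean radius 3,958.8 miles); MapScale is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper for estimating how much ground one map pixel covers.
public static class MapScale
{
    // Approximate statute miles per degree along a great circle
    // (mean Earth radius 3,958.8 mi, spherical-Earth assumption).
    private const double MilesPerDegree = 3958.8 * Math.PI / 180.0;

    // Degrees of longitude covered by one pixel on a full-world
    // equirectangular map (360 degrees spread across the map's width).
    public static double DegreesPerPixel(int mapWidthPixels)
    {
        return 360.0 / mapWidthPixels;
    }

    // East-west width of one pixel, in miles, at the given latitude:
    // a degree of longitude shrinks by cos(latitude) away from the equator.
    public static double PixelWidthMiles(int mapWidthPixels, double latitudeDegrees)
    {
        double latRadians = latitudeDegrees * Math.PI / 180.0;
        return DegreesPerPixel(mapWidthPixels) * MilesPerDegree * Math.Cos(latRadians);
    }
}
```

For a 720-pixel map, PixelWidthMiles(720, 0) comes out to roughly 34.5 miles, and at Seattle's latitude (about 47.6 degrees north) roughly 23 miles, so a guess that lands in the right metro area is usually within a pixel or two.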

Of course, we don't want to spend the time investigating each IP address ourselves. Fortunately, a LOT of people want this information, and because of that demand (and because e-commerce sites often sell customer information), it's pretty easy to get. As with credit bureaus, once a company collects enough information from enough sources, the resolution and accuracy begin to ramp up. My IP resolves as Bellevue, WA: not exactly correct, but close enough, and technically pixel-accurate on my 720-pixel map.

In fact, IP resolution works very much like credit scoring. Individuals are scored based on their own history; if there is no history, a person can thank their neighbors, and the number of liquor stores in their neighborhood, for their credit score, because that's what the bureaus use to determine creditworthiness. It's profiling, and research shows it's accurate. Likewise, these providers combine and cross-reference data from multiple sources to profile someone, even if they've never surfed the internet before.

So let's do what everyone else does: buy the information and, to a certain degree, get it for free. In my quest, I found a few companies. The first is CDYNE, which offers a web service called IP2Geo that accepts an IP address and returns a bunch of really cool information. Want to give it a try? Head to their ASMX test page and see if it works for you. While I liked the convenience of a web service, the accuracy was way off (by up to half the globe) in a few of my test cases; for example, one IP that I know is in Seattle resolves as being in Tel Aviv. Some feedback I saw online reported the same thing.

The second I looked at was IP2Location. Unfortunately, their database offering that includes latitude and longitude was a bit pricey, so I had to pass. However, while searching for information on this company, I noticed that another company, FraudLabs, offers the IP2Location database as a web service. You can sign up for a trial key and then subscribe at reasonable rates (about 4 cents per resolution, depending on volume).

The third provider I looked at (and ultimately selected) was GeoBytes. They sell a server-side solution (like IP2Location), but I liked the flexibility of MapBytes, a pay-per-resolution model they offer. Of the three, I felt GeoBytes' website wasn't quite as professional or clean, but they nailed my test cases, answered two questions almost immediately (one on a Saturday and the other on a Sunday), and there's a support forum to boot. Plus, there's another incentive: they offer 20 free lookups per hour (and I don't know about you, but I rarely get more than 20 new and unique IPs per hour ... for that matter, I rarely get more than 20 requests per hour!).

It would be ideal if they offered a web service, but instead the data is retrieved with a plain HTTP GET. The response is formatted using a template you specify (I use the XML template); I then parse the XML and store the result. I purchased a block of MapBytes, which let me go above the 20/hour limit to seed the bulk of my data (each resolution costs about half a cent), and I let the rest resolve over time within the 20/hour limit.
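Letting the backlog resolve over time means something has to keep the lookup rate under the free quota. One way to do that is a sliding-window limiter that remembers when each recent lookup happened; this is a minimal sketch, and the LookupThrottle class is a hypothetical helper, not something GeoBytes provides. It isn't thread-safe, so it assumes lookups are made from a single background job.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sliding-window limiter: allows at most maxRequests lookups
// in any span of length 'window' (e.g. 20 per hour for the free quota).
// Not thread-safe; intended for a single background worker.
public class LookupThrottle
{
    private readonly int maxRequests;
    private readonly TimeSpan window;
    private readonly Queue<DateTime> recent = new Queue<DateTime>();

    public LookupThrottle(int maxRequests, TimeSpan window)
    {
        this.maxRequests = maxRequests;
        this.window = window;
    }

    // Returns true (and records the lookup) if a lookup may be made at 'now';
    // returns false without recording anything if the quota is used up.
    public bool TryAcquire(DateTime now)
    {
        // Forget lookups that have slid out of the window.
        while (recent.Count > 0 && now - recent.Peek() >= window)
        {
            recent.Dequeue();
        }

        if (recent.Count >= maxRequests)
        {
            return false;
        }

        recent.Enqueue(now);
        return true;
    }
}
```

Unresolved IPs can sit in a queue and be sent to GeoBytes whenever throttle.TryAcquire(DateTime.UtcNow) returns true. Tracking individual timestamps rather than resetting a counter on the hour never allows more than 20 lookups in any hour-long span, so it stays under the limit whether the quota is counted per clock hour or over a rolling hour.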

Using MapBytes with a programmatic HTTP GET is a bit complicated. The request requires a session token in the URL, which means you have to log in to get the token. While this can probably be done programmatically, it's a pain and requires parsing the response headers for the token. The token isn't needed for the free 20/hour lookups, only for going above that number, and it expires after 90 minutes of inactivity. GeoBytes told me they're working on a web service, so hopefully they'll release it soon and resolutions will be much easier. In the meantime, their forums have a lot of information to help with this process. So far, though, I can highly recommend MapBytes without reservation. In fact, to put it to the test, I'm going to add a "Show Me" page that will try to guess your location, along with a little survey to see how accurate it is.
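Because the token expires after 90 minutes of inactivity, a long-running resolver needs to notice when it has gone stale and log in again. Here's a minimal sketch of that bookkeeping. The actual MapBytes sign-in request isn't shown here, so the login step is passed in as a delegate, and SessionTokenCache is a hypothetical name.

```csharp
using System;

// Hypothetical cache for a MapBytes session token that expires after
// 90 minutes of inactivity. The login itself is supplied as a delegate
// because the real sign-in request (and header parsing) is not shown here.
public class SessionTokenCache
{
    private static readonly TimeSpan IdleLimit = TimeSpan.FromMinutes(90);
    private readonly Func<string> login;
    private string token;
    private DateTime lastUsed;

    public SessionTokenCache(Func<string> login)
    {
        this.login = login;
    }

    // Returns a token for a request made at 'now', logging in again if there
    // is no token yet or the cached one has been idle for 90 minutes or more.
    public string GetToken(DateTime now)
    {
        if (token == null || now - lastUsed >= IdleLimit)
        {
            token = login();
        }

        lastUsed = now;
        return token;
    }
}
```

In practice you might refresh a few minutes before the 90-minute mark to allow for clock differences between your server and theirs.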

Once you have the latitude and longitude and can map them to the data we gathered in part 1, you're almost home. If you're looking to get off the ground with GeoBytes, here's some test C# code to make the request. It will request any URL and return a string containing the data received (note that it assumes the ISO-8859-1 encoding GeoBytes uses).

using System;
using System.IO;
using System.Net;
using System.Text;

public string FetchAddress(string address)
{
    try
    {
        HttpWebRequest httpReq = (HttpWebRequest)WebRequest.Create(address);
        httpReq.AllowAutoRedirect = true;
        httpReq.Timeout = 30000; // milliseconds

        // The using blocks close the response and its stream even if reading fails.
        using (HttpWebResponse httpRsp = (HttpWebResponse)httpReq.GetResponse())
        using (Stream resStream = httpRsp.GetResponseStream())
        using (StreamReader reader = new StreamReader(resStream, Encoding.GetEncoding("ISO-8859-1")))
        {
            // GeoBytes returns ISO-8859-1 text, so decode the whole body with that encoding.
            return reader.ReadToEnd();
        }
    }
    catch (Exception)
    {
        // do whatever handling is appropriate, or just...
        return null;
    }
}

Getting results from GeoBytes is then simply a matter of calling FetchAddress, where the string IpAddress is the IP address to query:

string IpAddress = "1.1.1.1";
string Url = string.Format(
   "http://www.geobytes.com/IpLocator.htm?GetLocation&template=xml.txt&IpAddress={0}",
   IpAddress
   );
string result = FetchAddress(Url);

//oops, not all data coming back is encoded...
if (result != null)
{
    result = result.Replace("&", "&amp;");
}

You'll note that the line above works around a minor encoding issue on their end: in one case, the comment field in their dataset contained an unencoded ampersand, which of course caused an exception to be raised when I loaded the response as XML. They're correcting the issue, so this hack is just a temporary stopgap, and the problem only happened in one out of several thousand lookups.
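Replacing every & with &amp; also re-escapes ampersands that were already encoded, turning a correct &amp; into &amp;amp;. A slightly more careful stopgap only touches ampersands that don't already begin an XML entity or character reference. This is a sketch assuming stray ampersands are the only problem in the response; XmlRepair is a hypothetical helper name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical stopgap for responses containing unencoded ampersands.
public static class XmlRepair
{
    // Matches an '&' that does NOT already start one of XML's predefined
    // entities or a numeric character reference (&amp; &lt; &#38; &#x26; ...).
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Escapes only the bare ampersands, leaving already-encoded text alone.
    public static string EscapeBareAmpersands(string xml)
    {
        return xml == null ? null : BareAmpersand.Replace(xml, "&amp;");
    }
}
```

With this in place, the workaround becomes result = XmlRepair.EscapeBareAmpersands(result);, which also passes a null result straight through.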

Copy and paste the URL from the code above into your browser to give it a try (replacing the {0} at the end with an IP address, of course). The XML you get back should look something like this:

<?xml version="1.0" encoding="iso-8859-1" ?>
<info>
  <IP>141.150.42.139</IP>
  <countryid>119</countryid>
  <country>Italy</country>
  <fips>IT</fips>
  <iso2>IT</iso2>
  <iso3>ITA</iso3>
  <ison>380</ison>
  <internet>IT</internet>
  <comment></comment>
  <regionid>2246</regionid>
  <region>Lombardia</region>
  <code>LO</code>
  <adm>IT09</adm>
  <cityid>11937</cityid>
  <city>Milano</city>
  <latitude>45.4670</latitude>
  <longitude>9.2000</longitude>
  <timezone>+01:00</timezone>
  <dmaid></dmaid>
  <dma></dma>
  <market></market>
  <certainty>50</certainty>
  <locationcode>ITLOMILA</locationcode>
  <ipaddress>141.150.42.139</ipaddress>
</info>

Parsing the XML is fairly trivial; for a description of each field, see GeoBytes' tag information page. Note the certainty field: the vendor provides it to indicate how conclusive the lookup is, and a higher number means a greater chance that the data is accurate (some of the data in the example above is fictional). In some cases, though, the results will be empty. When I investigated these, they were almost always some type of spider or bot that slipped past ReverseDOS and other log filtering.
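For reference, here's one way the parsing step could look with LINQ to XML, pulling out the fields a map needs and treating an empty latitude or longitude as "no result". The GeoResult and GeoBytesParser types are hypothetical names for illustration, and only a handful of the returned fields are kept.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical container for the fields the map report needs.
public class GeoResult
{
    public string City;
    public string Country;
    public double Latitude;
    public double Longitude;
    public int Certainty;
}

public static class GeoBytesParser
{
    // Parses a response produced by the XML template shown above. Returns null
    // when there is no response or latitude/longitude are empty (e.g. for bots).
    public static GeoResult Parse(string xml)
    {
        if (string.IsNullOrEmpty(xml))
        {
            return null;
        }

        XElement info = XDocument.Parse(xml).Root;

        // Casting a missing element to string yields null, and TryParse then fails.
        double lat, lon;
        if (!double.TryParse((string)info.Element("latitude"), NumberStyles.Float,
                CultureInfo.InvariantCulture, out lat) ||
            !double.TryParse((string)info.Element("longitude"), NumberStyles.Float,
                CultureInfo.InvariantCulture, out lon))
        {
            return null;
        }

        // Certainty stays 0 if the field is missing or empty.
        int certainty;
        int.TryParse((string)info.Element("certainty"), NumberStyles.Integer,
            CultureInfo.InvariantCulture, out certainty);

        return new GeoResult
        {
            City = (string)info.Element("city"),
            Country = (string)info.Element("country"),
            Latitude = lat,
            Longitude = lon,
            Certainty = certainty
        };
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps "45.4670" from being misread on servers whose locale uses a comma as the decimal separator. If you want to drop weak matches, compare Certainty against a threshold of your choosing before storing the result.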

If you're going to pursue geolocation, GeoBytes' website has a lot of great information on the technology and process, and on what you can expect in terms of accuracy.

To recap, we now know which IPs visited the site, the dates/times and pages (if you've chosen to capture that info), and a likely location for each. With this step complete, we can move on to part 3: plotting that info on a map!