Wednesday, May 30, 2018

IP to country mapping in under 1MB (node js)

Hi all,
I recently wrote some Node.js code to do IP to country code mapping that uses less than 1MB of RAM for data storage. It uses the registry data published by ARIN and the other regional Internet registries directly, and so doesn't require attribution, unlike some of the commercial sources out there. Because it takes up so little memory, it's feasible to keep the table in memory on my tiny web server, which means I avoid any RPC costs associated with IP lookup.

The gist is to first pull a copy of the IP registry data from ARIN and the other four regional registries:

wget ftp://ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest
wget ftp://ftp.ripe.net/pub/stats/ripencc/delegated-ripencc-latest
wget ftp://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest
wget ftp://ftp.apnic.net/pub/stats/apnic/delegated-apnic-latest
wget ftp://ftp.afrinic.net/pub/stats/afrinic/delegated-afrinic-latest

Then do a bit of filtering and sorting:

cat delegated-* | grep '|ipv4' | sort -t '|' -k 4,4 -k 5 -V > ./comb3
cat delegated-* | grep '|ipv6' | sort -t '|' -k 4,4 -k 5 -V > ./comb3.ipv6
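
For reference, here is a rough sketch of how one line of the combined files might be parsed in Node.js. It is not the code from my project. The function name parseDelegatedLine is made up for illustration, and the sketch assumes the usual pipe-separated field order registry|cc|type|start|value|date|status, where value is an address count for ipv4 and a prefix length for ipv6. It also assumes that summary lines and records without a country code should be skipped, since the grep above lets them through.

```javascript
// Parse one line of a "delegated" statistics file (hypothetical helper).
// Assumed field order: registry|cc|type|start|value|date|status[|extensions]
function parseDelegatedLine(line) {
  const fields = line.trim().split('|');
  if (fields.length < 7) return null; // header and summary lines are shorter
  const [registry, cc, type, start, value, , status] = fields;
  if (type !== 'ipv4' && type !== 'ipv6') return null; // e.g. asn records
  if (cc.length !== 2) return null; // '*' or empty: no country to map to
  return { registry, cc, type, start, value: Number(value), status };
}

// Made-up sample lines using documentation address ranges.
console.log(parseDelegatedLine('arin|US|ipv4|192.0.2.0|256|20180101|assigned'));
console.log(parseDelegatedLine('ripencc|FR|ipv6|2001:db8::|32|20180101|allocated'));
```

For ipv4 records the value field is a count of addresses, so it needs different handling from the ipv6 prefix length later on.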

Then build the memory structures and serialize them to disk. I handle IPv4 and IPv6 differently. Both use the raw Uint8Array type.

IPv4

The registry data specifies IPv4 address ranges as a starting IP followed by a length, and the length is not necessarily a power of two. Since IP addresses are pretty densely allocated, the data structure I use is a 65kB array mapping the first two numbers in an IP address to a run-length-coded list of (range + country code) blocks and gaps between ranges. The country code takes up one byte and the range or gap length takes up one byte (storing the log of the length), plus a bunch of special cases for non-power-of-two lengths and so forth.
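
To make the idea concrete, here is a simplified sketch of the two-level lookup. It is not my actual encoding: it uses plain JavaScript objects in a Map instead of a packed Uint8Array, it stores explicit start/end offsets instead of run-length-coded log lengths, and the names ipv4ToInt, buildIndex and lookup are made up for this example. What it does show is indexing by the first two numbers of the address and then scanning a short list of ranges, including ranges whose length is not a power of two or that spill over into the next bucket.

```javascript
// Convert dotted-quad text to a number (plain arithmetic avoids
// JavaScript's signed 32-bit bitwise operators).
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// records: [{ start: '192.0.2.0', count: 768, cc: 'AA' }, ...]
function buildIndex(records) {
  const buckets = new Map(); // first two octets -> [{ lo, hi, cc }]
  for (const { start, count, cc } of records) {
    let first = ipv4ToInt(start);
    const last = first + count - 1;
    while (first <= last) {
      const key = Math.floor(first / 65536);
      // Clip the range at the end of this bucket; the rest goes in the next.
      const end = Math.min(last, key * 65536 + 65535);
      if (!buckets.has(key)) buckets.set(key, []);
      buckets.get(key).push({ lo: first % 65536, hi: end % 65536, cc });
      first = end + 1;
    }
  }
  return buckets;
}

function lookup(buckets, ip) {
  const n = ipv4ToInt(ip);
  const list = buckets.get(Math.floor(n / 65536)) || [];
  const low = n % 65536;
  for (const r of list) { // linear scan, like walking the run-length list
    if (low >= r.lo && low <= r.hi) return r.cc;
  }
  return null; // address falls in a gap
}

// Made-up records with placeholder country codes.
const index = buildIndex([
  { start: '192.0.2.0', count: 768, cc: 'AA' },     // not a power of two
  { start: '198.51.0.0', count: 131072, cc: 'BB' }, // spans two buckets
]);
console.log(lookup(index, '192.0.4.255'), lookup(index, '198.52.1.1'));
```

The real table replaces each bucket's list with a few bytes per block, which is where the space savings come from.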

IPv6

ARIN specifies IPv6 ranges in CIDR notation, so each range is a prefix followed by the number of significant bits in the prefix. Because of that, and because the IPv6 range is very sparse, I use a simple trie with one level for each IP group (hextet, e.g. "2a0a"). Each level is a hash table with a 2-byte key (for the hextet) and a 3-byte value. The value is either one byte of CIDR prefix length (e.g. the '20' in '/20') followed by two bytes of country code, or it is 3 bytes of offset indicating the location of the next level down in the trie. There's enough extra space in there to have a special bit to indicate an empty entry.
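
Here is one way a 3-byte value like that could be packed. This is a hypothetical layout for illustration, not necessarily the exact one my code uses. It assumes the top bit of the first byte marks an offset, that three zero bytes mean an empty entry, and that prefix lengths stay below 128 so they never collide with the marker bit. The function names are made up as well.

```javascript
// Leaf: [prefixLen, ccChar0, ccChar1], with prefixLen in 1..127.
function encodeLeaf(buf, pos, prefixLen, cc) {
  if (prefixLen < 1 || prefixLen > 127) throw new RangeError('bad prefix length');
  buf[pos] = prefixLen;
  buf[pos + 1] = cc.charCodeAt(0);
  buf[pos + 2] = cc.charCodeAt(1);
}

// Pointer: top bit set, remaining 23 bits hold the offset of the next level.
function encodePointer(buf, pos, offset) {
  if (offset < 0 || offset >= 0x800000) throw new RangeError('offset too large');
  buf[pos] = 0x80 | (offset >>> 16);
  buf[pos + 1] = (offset >>> 8) & 0xff;
  buf[pos + 2] = offset & 0xff;
}

function decodeValue(buf, pos) {
  const b0 = buf[pos], b1 = buf[pos + 1], b2 = buf[pos + 2];
  if (b0 & 0x80) {
    return { kind: 'pointer', offset: ((b0 & 0x7f) << 16) | (b1 << 8) | b2 };
  }
  if (b0 === 0 && b1 === 0 && b2 === 0) return { kind: 'empty' };
  return { kind: 'leaf', prefixLen: b0, cc: String.fromCharCode(b1, b2) };
}

const demo = new Uint8Array(9); // three 3-byte values, initially all empty
encodeLeaf(demo, 0, 32, 'FR');
encodePointer(demo, 3, 0x012345);
console.log(decodeValue(demo, 0), decodeValue(demo, 3), decodeValue(demo, 6));
```

With 23 bits of offset this layout can address 8MB, which is far more than the table needs.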

Because this data structure is read-only in production use, each hash table is resized to something like 1.3 times the number of entries.
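
As a sketch of what such a fixed-size table can look like, here is a small open-addressing hash table packed into a Uint8Array with 5-byte slots (2-byte key, 3-byte value). The linear probing, the key % capacity hash, the exact 1.3 factor, and the convention that a value of 0 marks an empty slot are all assumptions made for this example, as are the function names.

```javascript
const SLOT = 5; // 2-byte key + 3-byte value

function readValue(buf, i) {
  const p = i * SLOT;
  return (buf[p + 2] << 16) | (buf[p + 3] << 8) | buf[p + 4];
}

// entries: array of [key16, value24] with unique keys and non-zero values.
function buildTable(entries) {
  // About 1.3 slots per entry, so there is always at least one empty slot.
  const capacity = Math.max(1, Math.ceil(entries.length * 1.3));
  const buf = new Uint8Array(capacity * SLOT);
  for (const [key, value] of entries) {
    let i = key % capacity;
    while (readValue(buf, i) !== 0) i = (i + 1) % capacity; // linear probing
    const p = i * SLOT;
    buf[p] = key >>> 8;
    buf[p + 1] = key & 0xff;
    buf[p + 2] = value >>> 16;
    buf[p + 3] = (value >>> 8) & 0xff;
    buf[p + 4] = value & 0xff;
  }
  return { buf, capacity };
}

function find(table, key) {
  const { buf, capacity } = table;
  let i = key % capacity;
  for (let probes = 0; probes < capacity; probes++) {
    const value = readValue(buf, i);
    if (value === 0) return undefined; // hit an empty slot: key is absent
    const p = i * SLOT;
    if (((buf[p] << 8) | buf[p + 1]) === key) return value;
    i = (i + 1) % capacity;
  }
  return undefined;
}

// Three hextet keys with made-up values; two of them collide on purpose.
const table = buildTable([[0x2a0a, 1], [0x2001, 0x123456], [0x2a0e, 7]]);
console.log(table.capacity, find(table, 0x2a0e), find(table, 0xffff));
```

Since nothing is ever inserted or deleted after the build, there is no need for tombstones or resizing logic, and the load factor can be fairly high.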

Space usage and performance

As of May 30, 2018, the IPv4 table is 634kB and the IPv6 table is 302kB.

With some dumb benchmarking, it looks like a lookup takes about 30µs for either IPv4 or IPv6. I will note that there are certain IPv4 blocks where the list gets long. I think I noted a max length of 512 entries for one, so user beware. One could probably add some skip-list-like indexes to the longer lists to limit the worst-case lookup time. Better yet, one could use the fact that most ranges have a power-of-two length, which suggests that a list is probably not the best representation.

The Node.js source code for all this is about 600 lines. I could probably open-source it if there's enough interest.

Wednesday, May 9, 2018

Testing old Safari browser (6.1.6) on current Linux / Ubuntu 18.04

I run Ubuntu Bionic 18.04, and I recently needed to test my website on Safari 6.1.6, released back in 2014. The closest I got was by installing an old version of the Epiphany/Web browser, since both Epiphany and Safari are based on the WebKit browser engine.  The steps I followed, after much trial and error, were:
  1. Find the release date of the desired Safari version by looking at the wiki page.  Safari 6.1.6 was released around August, 2014.
  2. Find a version of the epiphany-browser package from a similar date by browsing the Debian package history.  If you click on the links there it'll show the date the package appeared.  epiphany-browser 3.12.1-1 is from around July, 2014.
  3. Look in the Ubuntu history of the epiphany package to see if there's an Ubuntu release that has a version close to what you want.  I'm currently running Bionic, and saw that Trusty has a close version, 3.10.3-0ubuntu2.
  4. Add the old Trusty repository to your /etc/apt/sources.list by adding the following lines:
    deb http://us.archive.ubuntu.com/ubuntu trusty universe
    deb http://us.archive.ubuntu.com/ubuntu trusty multiverse
    deb http://security.ubuntu.com/ubuntu trusty-security main
    deb http://cz.archive.ubuntu.com/ubuntu trusty main
    
  5. Then run some apt commands to update and get the correct version.  I found the right packages to include by trial and error, as the apt-get install command would fail without them:
    sudo apt-get remove epiphany-browser epiphany-browser-data
    sudo apt-get update
    sudo apt-get install epiphany-browser=3.10.3-0ubuntu2 epiphany-browser-data=3.10.3-0ubuntu2 libwebkit2gtk-3.0-25 libjavascriptcoregtk-3.0-0=2.4.10-0ubuntu0.14.04.1
    sudo apt-mark hold epiphany-browser epiphany-browser-data
    
And voila! epiphany-browser was available for testing / debugging.
My guess is that it's not a good idea to leave the trusty source lines in sources.list long-term, since they might confuse future upgrades and may also confuse apt-cache. I plan to remove them and try to clean up once I'm done testing.