openNAC Analytics Custom GeoIP Database Creation

One of the main benefits of having a custom GeoIP database used by openNAC Analytics is the ability to include customers' private network ranges in the set of GeoIP information corresponding to the physical location of those networks, which affects how visualizations are created and presented in dashboards.

The creation of a custom GeoIP database is performed by running a Perl script that reads a CSV file as input and writes an MMDB database file as output. This document describes the steps to install the necessary software dependencies on the CentOS 7 operating system, how to use the Perl script, and the required format of the input CSV file.

Installation of Software Dependencies

As a prerequisite for the successful installation of all software dependencies, the operating system must be able to connect to the software repositories on the Internet. Most of the software consists of Perl modules installed directly from the CPAN repository with the Perl installation tool cpanm. Another prerequisite is having the EPEL yum repository enabled. All installation steps must be performed as the root user. A quick verification sketch is provided after the installation steps below.

Installation Steps

  1. Some RPM packages are core dependencies for the other installation steps. Install them by running:
yum install perl-core perl-LWP-Protocol-https.noarch gcc libmaxminddb-devel
  2. Install the cpanm tool with the following one-liner:
curl -L https://cpanmin.us | perl - App::cpanminus
  3. Install the dependencies of the MaxMind DB Perl modules using cpanm:
cpanm --installdeps Devel::Refcount MaxMind::DB::Reader::XS MaxMind::DB::Writer::Tree Net::Works::Network GeoIP2 Data::Printer
  4. Run the cpanm command again without the --installdeps option in order to finish the installation of all modules:
cpanm Devel::Refcount MaxMind::DB::Reader::XS MaxMind::DB::Writer::Tree Net::Works::Network GeoIP2 Data::Printer
  5. Reading CSV files requires one more Perl module, Text::CSV_XS. Its installation can be achieved with the cpanm command:
cpanm Text::CSV_XS
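
A quick way to confirm that all modules were installed correctly is to try loading each of them from Perl. The short sketch below (a hypothetical check-dependencies.pl, not part of the opennac-analytics package) simply attempts to require every module used by the GeoIP scripts and reports the result:

#!/usr/bin/env perl
# check-dependencies.pl - hypothetical helper, not shipped with opennac-analytics.
# Attempts to load each module required by the GeoIP scripts and reports the result.
use strict;
use warnings;

my @modules = qw(
    Devel::Refcount
    MaxMind::DB::Reader::XS
    MaxMind::DB::Writer::Tree
    Net::Works::Network
    GeoIP2::Database::Reader
    Data::Printer
    Text::CSV_XS
);

for my $module (@modules) {
    if ( eval "require $module; 1" ) {
        print "OK      $module\n";
    }
    else {
        print "MISSING $module\n";
    }
}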

Considerations about GeoLite2 fields and Logstash

MaxMind makes two GeoLite2 databases of interest available: “GeoLite2 City” and “GeoLite2 Country”. They are freely available for download from: https://dev.maxmind.com/geoip/geoip2/geolite2/.

For the purposes of this document, these two databases in CSV format can be easily queried as a reference and source of IP geolocation information, such as country and city names and codes, which will be used in the custom CSV to be processed by the script.
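
As an illustration of querying those CSV files, the hedged sketch below assumes the standard GeoLite2-City-Locations-en.csv file from the GeoLite2 City CSV download (with columns such as geoname_id, continent_code, country_iso_code, city_name and time_zone) and prints the geoname id and related codes for a given city name:

#!/usr/bin/env perl
# find-geoname-id.pl - hypothetical helper; assumes the standard column layout
# of GeoLite2-City-Locations-en.csv shipped in the GeoLite2 City CSV download.
use strict;
use warnings;
use Text::CSV_XS qw( csv );

my ( $locations_csv, $wanted_city ) = @ARGV;
die "Usage: perl $0 GeoLite2-City-Locations-en.csv city_name\n"
    unless defined $wanted_city;

# Load the locations file as an array of hashrefs keyed by the header names.
my $rows = csv( in => $locations_csv, headers => 'auto', encoding => 'UTF-8' );

for my $row (@$rows) {
    next unless defined $row->{city_name} && $row->{city_name} eq $wanted_city;
    printf "%s  %s/%s  %s (%s)\n",
        $row->{geoname_id}, $row->{continent_code}, $row->{country_iso_code},
        $row->{city_name},  $row->{time_zone};
}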

According to the official Logstash 6.x documentation, the permitted fields are:

  • city_name
  • continent_code
  • country_code2
  • country_code3
  • country_name
  • dma_code
  • ip
  • latitude
  • longitude
  • postal_code
  • region_name
  • timezone

However, the Logstash error log messages indicate that more fields are permitted, so we will take them into consideration:

  • autonomous_system_number
  • autonomous_system_organization
  • city_name
  • country_name
  • continent_code
  • continent_name
  • country_code2
  • country_code3
  • ip
  • isp
  • postal_code
  • dma_code
  • region_name
  • region_code
  • timezone
  • location
  • latitude
  • longitude
  • organization

The custom MMDB database must be compliant with them; otherwise, Logstash will raise an illegal field value exception when you try to use an unsupported field in the geoip plugin configuration within the Logstash filter section.

Custom Input CSV Format

The custom CSV file that will be the input for the Perl script should consist of the following fields:

Field Name | Description | Example
ip_network_range | IP network address in CIDR format. | 180.239.130.0/24
city_geoname_id | Numeric geoname id of the city. | 3467865
city_name | Name of the city, according to the locale code. | Campinas
country_geoname_id | Numeric geoname id of the country. | 3469034
country_iso_code | Country code according to the ISO standard. | BR
country_name | Name of the country, according to the locale code. | Brazil
continent_geoname_id | Numeric geoname id of the continent. | 6255147
continent_code | Continent code according to the ISO standard. | SA
continent_name | Name of the continent, according to the locale code. | South America
accuracy_radius | The approximate accuracy radius, in kilometers, around the latitude and longitude for the geographical entity (country, subdivision, city or postal code) associated with the IP address. | 50
latitude | The approximate latitude of the postal code, city, subdivision or country associated with the IP address. | -22.8035
longitude | The approximate longitude of the postal code, city, subdivision or country associated with the IP address. | -47.0596
time_zone | The time zone associated with the location, as specified by the IANA Time Zone Database. | America/Sao_Paulo
postal_code | The postal code associated with the IP address. | 13083100
subdivision_1_geoname_id | Numeric geoname id of the country subdivision. | 3448433
subdivision_1_iso_code | Code of the country subdivision according to the ISO standard. | SP
subdivision_1_name | Name of the country subdivision. | Sao Paulo

Important observations about the CSV file:

  • It must include the header line. The headers are referenced by the CSV reader module used by the Perl script.
  • There is no need to enclose the values of the fields with any type of quotes (single or double).
  • The fields must be separated by commas.

Example of custom GeoIP database CSV file with three records (plus the header line):

ip_network_range,city_geoname_id,city_name,country_geoname_id,country_iso_code,country_name,continent_geoname_id,continent_code,continent_name,accuracy_radius,latitude,longitude,time_zone,postal_code,subdivision_1_geoname_id,subdivision_1_iso_code,subdivision_1_name
180.239.130.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
180.239.131.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
192.168.56.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
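
Before generating the MMDB file, it can be helpful to check the custom CSV for obvious mistakes. The sketch below (a hypothetical validate-custom-csv.pl, not part of the opennac-analytics package) verifies that the header line contains all of the fields described above and that every ip_network_range parses as a CIDR network:

#!/usr/bin/env perl
# validate-custom-csv.pl - hypothetical pre-check for the custom GeoIP CSV.
# Verifies the header line and that every ip_network_range is a valid CIDR.
use strict;
use warnings;
use Text::CSV_XS qw( csv );
use Net::Works::Network;

my @required = qw(
    ip_network_range city_geoname_id city_name country_geoname_id
    country_iso_code country_name continent_geoname_id continent_code
    continent_name accuracy_radius latitude longitude time_zone
    postal_code subdivision_1_geoname_id subdivision_1_iso_code
    subdivision_1_name
);

my $csv_file = shift or die "Usage: perl $0 custom-geoip-database.csv\n";
my $rows = csv( in => $csv_file, headers => 'auto', encoding => 'UTF-8' );

# Check that the header line produced all of the required keys.
my %have = map { $_ => 1 } keys %{ $rows->[0] || {} };
my @missing = grep { !$have{$_} } @required;
die "Missing header field(s): @missing\n" if @missing;

# Check that every network range parses in CIDR notation.
for my $row (@$rows) {
    eval {
        Net::Works::Network->new_from_string( string => $row->{ip_network_range} );
        1;
    } or die "Invalid network range: $row->{ip_network_range}\n";
}

print "$csv_file looks consistent\n";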

Reading the custom CSV file and writing the custom MMDB database

As of revision 7461, two Perl scripts are available as part of the opennac-analytics package, located at /usr/share/opennac/analytics/scripts:

  • create-custom-geoip-database.pl - Reads a CSV file in the format described in the previous topic and writes an MMDB GeoIP database file.
  • read-custom-geoip-database.pl - Receives an IP address as an argument and reads an MMDB GeoIP database file, displaying the GeoIP information corresponding to the network range that contains that IP address.

Running the Scripts

  1. The Perl script create-custom-geoip-database.pl expects to read a CSV file and write an MMDB file. The name of the input CSV file must be passed as an argument. The name of the output MMDB file will always be custom-geoip-database.mmdb. With the CSV file in the same directory as the Perl script, run:

    perl create-custom-geoip-database.pl custom-geoip-database.csv
    

    In the above example, the CSV file name is custom-geoip-database.csv. Replace it with the actual name of the CSV file you are providing.

    The output should be similar to the line below, which confirms the successful creation of the custom MMDB database file:

    custom-geoip-database.mmdb has now been created from custom-geoip-database.csv
    
  2. The Perl script read-custom-geoip-database.pl expects to receive as arguments an IP address within a network range recorded in the MMDB database and the name of the MMDB database file previously created. Run the script read-custom-geoip-database.pl in order to read a record from the MMDB database, as in the example below:

    perl read-custom-geoip-database.pl 180.239.130.2 custom-geoip-database.mmdb
    

    The output will display the GeoIP information about the network range to which the IP address belongs:

    Description: Custom Geolite2 City Database
    {
        city                 {
            geoname_id   3467865,
            names        {
                en      "Campinas",
                pt-BR   "Campinas"
            }
        },
        continent            {
            code         "SA",
            geoname_id   6255147,
            names        {
                en   "South America"
            }
        },
        country              {
            geoname_id   3469034,
            iso_code     "BR",
            names        {
                en   "Brazil"
            }
        },
        location             {
            accuracy_radius   50,
            latitude          -22.8034992218018,
            longitude         -47.0596008300781,
            time_zone         "America/Sao_Paulo"
        },
        postal               {
            code   13083100
        },
        registered_country   {
            geoname_id   3469034,
            iso_code     "BR",
            names        {
                en   "Brazil"
            }
        },
        subdivisions         [
            [0] {
                geoname_id   3448433,
                iso_code     "SP",
                names        {
                    en      "Sao Paulo",
                    pt-BR   "São Paulo"
                }
            }
        ]
    }
    

    There is no need for fields like startIP or endIP, since the correct network range for the given IP address is calculated automatically. A sketch for listing every network range stored in the custom database follows below.
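
To verify every network range that was written into the custom database (and not just one IP address), the MMDB search tree can be walked. The sketch below is a hypothetical example based on the iterate_search_tree method of MaxMind::DB::Reader; it assumes the documented data callback arguments (the IP as an integer, the prefix length and the stored record):

#!/usr/bin/env perl
# list-custom-networks.pl - hypothetical sketch that walks the MMDB search tree
# and prints every stored network range together with its city name. Assumes
# the iterate_search_tree() data callback receives (integer IP, prefix length,
# record), as documented for MaxMind::DB::Reader.
use strict;
use warnings;
use feature qw( say );

use MaxMind::DB::Reader;
use Net::Works::Address;

my $mmdb_file = shift || 'custom-geoip-database.mmdb';
my $reader    = MaxMind::DB::Reader->new( file => $mmdb_file );

$reader->iterate_search_tree(
    sub {
        my ( $ip_as_integer, $prefix_length, $record ) = @_;

        # Convert the integer back to dotted-quad notation (IPv4 database).
        my $address = Net::Works::Address->new_from_integer(
            integer => $ip_as_integer,
            version => 4,
        );

        say $address->as_string . "/$prefix_length => "
            . ( $record->{city}{names}{en} // 'unknown city' );
    }
);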

Testing the Custom GeoLite2 Database with Logstash 6.x

The simple configuration below is enough to test the loading of the custom GeoLite2 database by Logstash, using the geoip filter on the IP address 180.239.130.2 and passing a fake Apache log line as input via stdin. A pre-defined grok pattern that parses the combined Apache log format will be used in order to extract the clientip field, which will be the source field:

input {
  stdin {}
}

filter {
  grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  geoip {
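    # "database" must point to the custom MMDB file created earlier, and
    # "fields" may only contain field names accepted by the geoip filter
    # (see the list of permitted fields earlier in this document).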
    default_database_type => "City"
    source => "clientip"
    database => "/tmp/custom-geoip-database.mmdb"
    fields => [ "continent_code", "continent_name", "country_code2", "country_name", "region_code", "region_name", "city_name", "postal_code", "timezone", "latitude", "longitude"]
    target => "geoip"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

During startup, Logstash will display a log line informing that the specified MMDB database file is being used:

[2018-11-02T00:40:59,068][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.3.0"}
[2018-11-02T00:41:04,069][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-11-02T00:41:04,517][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"/tmp/custom-geoip-database.mmdb"}
[2018-11-02T00:41:04,734][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x490dc557 run>"}
The stdin plugin is now waiting for input:
[2018-11-02T00:41:04,913][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-11-02T00:41:05,315][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

The following fake Apache log line was used as input:

180.239.130.2 - - [01/Apr/2017:16:21:15 -0300] "GET /favicon.ico HTTP/1.1" 200 3638 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

The final Logstash output to stdout, with the geoip information, is:

{
        "message" => "180.239.130.2 - - [01/Apr/2017:16:21:15 -0300] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
    "httpversion" => "1.1",
          "bytes" => "3638",
           "verb" => "GET",
       "response" => "200",
       "@version" => "1",
       "referrer" => "\"-\"",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
       "clientip" => "180.239.130.2",
           "auth" => "-",
          "geoip" => {
        "continent_name" => "South America",
           "region_name" => "Sao Paulo",
           "region_code" => "SP",
             "longitude" => -47.059600830078125,
              "timezone" => "America/Sao_Paulo",
          "country_name" => "Brazil",
         "country_code2" => "BR",
           "postal_code" => "13083100",
              "latitude" => -22.803499221801758,
             "city_name" => "Campinas",
        "continent_code" => "SA"
    },
           "host" => "localhost",
      "timestamp" => "01/Apr/2017:16:21:15 -0300",
          "ident" => "-",
        "request" => "/favicon.ico",
     "@timestamp" => 2018-11-02T03:56:31.688Z
}

Scripts Implementation

create-custom-geoip-database.pl

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );
use local::lib 'local';

use Data::Printer; # exports the np() function
use Text::CSV_XS qw( csv );

use MaxMind::DB::Writer::Tree;
use Net::Works::Network;


my $num_args = $#ARGV + 1;
if ($num_args != 1) {
    print "\nUsage: perl $0 csv_file\n";
    exit;
}

my $mmdb_file = 'custom-geoip-database.mmdb';
my $csv_file = $ARGV[0];
my $csv = Text::CSV_XS->new( { binary => 1 } );
open my $fh, "<:encoding(utf8)", $csv_file or die "$csv_file: $!";

# Read the header line, then bind each column to the corresponding key of the
# $row hashref, so every subsequent getline() call fills %$row keyed by header.
my @cols = @{ $csv->getline( $fh ) };
our $row = {};
$csv->bind_columns( \@{$row}{@cols} );

my %types = (
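    # Each entry maps a key name that may appear in a record to the MMDB data
    # type used when writing it; map_key_type_callback below looks types up here.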
    names                  => 'map',
    city                   => 'map',
    continent              => 'map',
    registered_country     => 'map',
    represented_country    => 'map',
    country                => 'map',
    location               => 'map',
    postal                 => 'map',
    traits                 => 'map',

    geoname_id             => 'uint32',

    type                   => 'utf8_string',
    en                     => 'utf8_string',
    de                     => 'utf8_string',
    es                     => 'utf8_string',
    fr                     => 'utf8_string',
    ja                     => 'utf8_string',
    'pt-BR'                => 'utf8_string',
    ru                     => 'utf8_string',
    'zh-CN'                => 'utf8_string',

    locales                => [ 'array', 'utf8_string' ],
    code                   => 'utf8_string',
    ip_address             => 'utf8_string',
    subdivisions           => [ 'array' , 'map' ],
    iso_code               => 'utf8_string',
    environments           => [ 'array', 'utf8_string' ],
    expires                => 'uint32',
    name                   => 'utf8_string',
    time_zone              => 'utf8_string',
    accuracy_radius        => 'uint32',
    latitude               => 'float',
    longitude              => 'float',
    metro_code             => 'uint32',
    is_in_european_union   => 'utf8_string',
    is_satellite_provider   => 'utf8_string',
    is_anonymous_proxy     => 'utf8_string',
    );

my $tree = MaxMind::DB::Writer::Tree->new(

    # "database_type" is some arbitrary string describing the database.
    database_type => 'GeoIP2-City',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description =>
    { en => 'Custom Geolite2 City Database' },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # "record_size" is the record size in bits.  Either 24, 28 or 32.
    record_size => 28,

    # Allow use of reserved network ranges
    remove_reserved_networks => 0,
    );

# %custom_ip_ranges maps each CIDR network range from the CSV to the GeoIP2
# City-style data structure that will be stored for that network in the tree.
my %custom_ip_ranges;

while ( $csv->getline( $fh ) ) {
    $custom_ip_ranges{ $main::row->{ip_network_range} } = {
        city => {
            geoname_id => $main::row->{city_geoname_id},
            names      => {
                en => $main::row->{city_name},
            },
        },
        continent => {
            code       => $main::row->{continent_code},
            geoname_id => $main::row->{continent_geoname_id},
            names      => {
                en => $main::row->{continent_name},
            },
        },
        country => {
            geoname_id => $main::row->{country_geoname_id},
            iso_code   => $main::row->{country_iso_code},
            names      => {
                en => $main::row->{country_name},
            },
        },
        location => {
            accuracy_radius => $main::row->{accuracy_radius},
            latitude        => $main::row->{latitude},
            longitude       => $main::row->{longitude},
            time_zone       => $main::row->{time_zone},
        },
        postal => {
            code => $main::row->{postal_code},
        },
        registered_country => {
            geoname_id => $main::row->{country_geoname_id},
            iso_code   => $main::row->{country_iso_code},
            names      => {
                en => $main::row->{country_name},
            },
        },
        subdivisions => [ {
            geoname_id => $main::row->{subdivision_1_geoname_id},
            iso_code   => $main::row->{subdivision_1_iso_code},
            names      => {
                en => $main::row->{subdivision_1_name},
            },
        } ],
    };
}

for my $address ( keys %custom_ip_ranges ) {

    # Create one network and insert it into our database
    my $network = Net::Works::Network->new_from_string( string => $address );

    $tree->insert_network( $network, $custom_ip_ranges{$address} );
}

# Write the database to disk.
open my $csv_fh, '>:raw', $mmdb_file;
$tree->write_tree( $csv_fh );
close $csv_fh;

say "$mmdb_file has now been created from $csv_file";

read-custom-geoip-database.pl

#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );
use local::lib 'local';

use Data::Printer;
use MaxMind::DB::Reader;
use Net::Works::Address;

my $num_args = $#ARGV + 1;
if ($num_args != 2) {
    print "\nUsage: perl $0 ip_address mmdb_file\n";
    exit;
}
my $ip = $ARGV[0];
my $mmdb_file = $ARGV[1];

# Open the MMDB database and print the English description from its metadata.
my $reader = MaxMind::DB::Reader->new( file => $mmdb_file );
say 'Description: ' . $reader->metadata->{description}->{en};

# Look up the record for the network range that contains $ip and pretty-print
# it with Data::Printer's np() function.
my $record = $reader->record_for_address( $ip );
say np $record;