openNAC Analytics Custom GeoIP Database Creation¶
One of the main benefits of having a custom GeoIP database used by openNAC Analytics is that it enables the inclusion of customers' private network ranges in the set of GeoIP information corresponding to the physical location of those networks, which affects how visualizations are created and presented in dashboards.
The creation of a custom GeoIP database is performed by running a Perl script that reads a CSV file as input and writes an MMDB database file as output. This document describes the steps to install the necessary software dependencies on the CentOS 7 operating system, how to use the Perl script, and the required format of the input CSV file.
Installation of Software Dependencies¶
As a prerequisite for the successful installation of all software dependencies, the operating system must be able to connect to the software repositories on the Internet. Most of the software consists of Perl modules installed directly from the CPAN repository using the Perl installation tool cpanm. Another prerequisite is having the EPEL yum repository enabled. All installation steps must be performed as the root user.
Installation Steps¶
- Some RPM packages are core dependencies for the remaining installation steps. Install them by running:
yum install perl-core perl-LWP-Protocol-https.noarch gcc libmaxminddb-devel
- Install the cpanm tool with the following one-liner:
curl -L https://cpanmin.us | perl - App::cpanminus
- Install the dependencies of the MaxMind DB Perl modules using cpanm:
cpanm --installdeps Devel::Refcount MaxMind::DB::Reader::XS MaxMind::DB::Writer::Tree Net::Works::Network GeoIP2 Data::Printer
- Run the cpanm command again without the --installdeps option in order to finish installing all modules:
cpanm Devel::Refcount MaxMind::DB::Reader::XS MaxMind::DB::Writer::Tree Net::Works::Network GeoIP2 Data::Printer
- Reading CSV files requires one more Perl module, Text::CSV_XS. Install it with the cpanm command:
cpanm Text::CSV_XS
Considerations about GeoLite2 fields and Logstash¶
MaxMind makes available two GeoLite2 databases of interest, "GeoLite2 City" and "GeoLite2 Country". They are freely available for download from: https://dev.maxmind.com/geoip/geoip2/geolite2/.
For the purposes of this document, these two databases in CSV format can be easily queried as a reference and source of IP geolocation information, such as country and city names and codes, which will be used in the custom CSV processed by the script.
According to the official Logstash 6.x documentation, the permitted fields are:
- city_name
- continent_code
- country_code2
- country_code3
- country_name
- dma_code
- ip
- latitude
- longitude
- postal_code
- region_name
- timezone
However, the Logstash error log messages indicate that more fields are permitted, so we will take them into consideration:
- autonomous_system_number
- autonomous_system_organization
- city_name
- country_name
- continent_code
- continent_name
- country_code2
- country_code3
- ip
- isp
- postal_code
- dma_code
- region_name
- region_code
- timezone
- location
- latitude
- longitude
- organization
The custom MMDB database must be compliant with these fields; otherwise, Logstash will raise an illegal field value exception when an unsupported field is used in the geoip plugin configuration of the Logstash filter section.
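As a sanity check before editing a Logstash filter, the combined list above can be treated as a whitelist. The following sketch (Python is used here purely for illustration; it is not part of the openNAC package) validates a candidate `fields` list against it:

```python
# Fields accepted by the Logstash 6.x geoip filter, per the combined list above.
PERMITTED_FIELDS = {
    "autonomous_system_number", "autonomous_system_organization",
    "city_name", "country_name", "continent_code", "continent_name",
    "country_code2", "country_code3", "ip", "isp", "postal_code",
    "dma_code", "region_name", "region_code", "timezone", "location",
    "latitude", "longitude", "organization",
}

def check_geoip_fields(fields):
    """Return the (sorted) subset of requested fields Logstash would reject."""
    return sorted(set(fields) - PERMITTED_FIELDS)

# The fields used later in this document's test configuration:
requested = ["continent_code", "continent_name", "country_code2",
             "country_name", "region_code", "region_name", "city_name",
             "postal_code", "timezone", "latitude", "longitude"]

print(check_geoip_fields(requested))       # → []  (all fields are permitted)
print(check_geoip_fields(["city_names"]))  # → ['city_names']  (a typo is caught)
```

Running such a check before restarting Logstash avoids a pipeline crash caused by a misspelled field name.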
Custom Input CSV Format¶
The custom CSV file that will be the input for the Perl script should consist of the following fields:
| Field Name | Description | Example |
|---|---|---|
| ip_network_range | IP network address in CIDR format. | 180.239.130.0/24 |
| city_geoname_id | Numeric geoname id of the city. | 3467865 |
| city_name | Name of the city, according to the locale code. | Campinas |
| country_geoname_id | Numeric geoname id of the country. | 3469034 |
| country_iso_code | Country code according to the ISO standard. | BR |
| country_name | Name of the country, according to the locale code. | Brazil |
| continent_geoname_id | Numeric geoname id of the continent. | 6255147 |
| continent_code | Continent code according to the ISO standard. | SA |
| continent_name | Name of the continent, according to the locale code. | South America |
| accuracy_radius | The approximate accuracy radius, in kilometers, around the latitude and longitude for the geographical entity (country, subdivision, city or postal code) associated with the IP address. | 50 |
| latitude | The approximate latitude of the postal code, city, subdivision or country associated with the IP address. | -22.8035 |
| longitude | The approximate longitude of the postal code, city, subdivision or country associated with the IP address. | -47.0596 |
| time_zone | The time zone associated with the location, as specified by the IANA Time Zone Database. | America/Sao_Paulo |
| postal_code | The postal code associated with the IP address. | 13083100 |
| subdivision_1_geoname_id | Numeric geoname id of the country subdivision. | 3448433 |
| subdivision_1_iso_code | Code of the country subdivision according to the ISO standard. | SP |
| subdivision_1_name | Name of the country subdivision. | Sao Paulo |
Important observations about the CSV file:
- It must have a header line. The headers are referenced by the CSV reader module used by the Perl script.
- There is no need to enclose field values in any type of quotes (single or double).
- The fields must be separated by commas.
Example of a custom GeoIP database CSV file with three records (plus the header line):
ip_network_range,city_geoname_id,city_name,country_geoname_id,country_iso_code,country_name,continent_geoname_id,continent_code,continent_name,accuracy_radius,latitude,longitude,time_zone,postal_code,subdivision_1_geoname_id,subdivision_1_iso_code,subdivision_1_name
180.239.130.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
180.239.131.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
192.168.56.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,3448433,SP,Sao Paulo
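Before feeding a hand-written CSV file to the script, it can help to validate each row up front. The sketch below (Python, for illustration only; it is not part of the openNAC package) checks the header line, that each CIDR range parses, and that the coordinates are in range:

```python
import csv
import io
import ipaddress

# Header expected by create-custom-geoip-database.pl, per the table above.
EXPECTED_HEADER = [
    "ip_network_range", "city_geoname_id", "city_name",
    "country_geoname_id", "country_iso_code", "country_name",
    "continent_geoname_id", "continent_code", "continent_name",
    "accuracy_radius", "latitude", "longitude", "time_zone",
    "postal_code", "subdivision_1_geoname_id", "subdivision_1_iso_code",
    "subdivision_1_name",
]

def validate_custom_csv(text):
    """Raise ValueError on the first malformed row; return the record count."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError("unexpected header line")
    count = 0
    for row in reader:
        ipaddress.ip_network(row["ip_network_range"])  # CIDR must parse
        if not -90 <= float(row["latitude"]) <= 90:
            raise ValueError("latitude out of range")
        if not -180 <= float(row["longitude"]) <= 180:
            raise ValueError("longitude out of range")
        count += 1
    return count

sample = ",".join(EXPECTED_HEADER) + "\n" + (
    "192.168.56.0/24,3467865,Campinas,3469034,BR,Brazil,6255147,SA,"
    "South America,50,-22.8035,-47.0596,America/Sao_Paulo,13083100,"
    "3448433,SP,Sao Paulo\n"
)
print(validate_custom_csv(sample))  # → 1
```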
Reading the custom CSV file and writing the custom MMDB database¶
From revision 7461, two Perl scripts are available as part of the opennac-analytics package, located at /usr/share/opennac/analytics/scripts:

create-custom-geoip-database.pl
- Reads a CSV file in the format described in the previous topic and writes an MMDB GeoIP database file.

read-custom-geoip-database.pl
- Receives an IP address as an argument and reads an MMDB GeoIP database file, displaying the GeoIP information corresponding to the IP network range of that IP address.
Running the Scripts¶
The Perl script create-custom-geoip-database.pl expects to read a CSV file and write an MMDB file. The name of the input CSV file must be passed as an argument; the name of the output MMDB file will always be custom-geoip-database.mmdb. With the CSV file in the same directory as the Perl script, run:

perl create-custom-geoip-database.pl custom-geoip-database.csv
In the above example, the CSV file name is custom-geoip-database.csv. Replace it with the actual name of the CSV file you are providing.
The output should be similar to the line below, which confirms the successful creation of the custom MMDB database file:
custom-geoip-database.mmdb has now been created from custom-geoip-database.csv
The Perl script read-custom-geoip-database.pl expects to receive as arguments an IP address belonging to a network range recorded in the MMDB database, and the name of the MMDB database file previously created. Run the script to read a record from the MMDB database, as in the example below:

perl read-custom-geoip-database.pl 180.239.130.2 custom-geoip-database.mmdb
The output will display the GeoIP information about the network range to which the IP address belongs:
Description: Custom Geolite2 City Database
{
    city {
        geoname_id   3467865,
        names {
            en      "Campinas",
            pt-BR   "Campinas"
        }
    },
    continent {
        code         "SA",
        geoname_id   6255147,
        names {
            en   "South America"
        }
    },
    country {
        geoname_id   3469034,
        iso_code     "BR",
        names {
            en   "Brazil"
        }
    },
    location {
        accuracy_radius   50,
        latitude          -22.8034992218018,
        longitude         -47.0596008300781,
        time_zone         "America/Sao_Paulo"
    },
    postal {
        code   13083100
    },
    registered_country {
        geoname_id   3469034,
        iso_code     "BR",
        names {
            en   "Brazil"
        }
    },
    subdivisions [
        [0] {
            geoname_id   3448433,
            iso_code     "SP",
            names {
                en      "Sao Paulo",
                pt-BR   "São Paulo"
            }
        }
    ]
}
There is no need for fields like startIP or endIP, since the calculation of the correct network for the given IP address is done automatically.
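The lookup behaviour described above can be illustrated with Python's standard ipaddress module (a conceptual sketch only, not the reader script's actual implementation): given the stored CIDR ranges, the right record is found by testing which network contains the queried address, preferring the most specific prefix.

```python
import ipaddress

# The three example networks stored in the custom database above.
stored_networks = [
    ipaddress.ip_network("180.239.130.0/24"),
    ipaddress.ip_network("180.239.131.0/24"),
    ipaddress.ip_network("192.168.56.0/24"),
]

def find_network(ip):
    """Return the stored network containing ip, or None.

    With overlapping ranges, an MMDB lookup resolves to the longest
    (most specific) prefix, hence the max() on prefix length below.
    """
    addr = ipaddress.ip_address(ip)
    candidates = [n for n in stored_networks if addr in n]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.prefixlen)

print(find_network("180.239.130.2"))  # → 180.239.130.0/24
print(find_network("10.0.0.1"))       # → None
```

This is why storing only the CIDR range per record is sufficient: membership is derived from the prefix, not from explicit start and end addresses.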
Test of the GeoLite2 Custom Database with Logstash 6.x¶
The simple configuration below is enough to test the loading of the custom GeoLite2 database by Logstash, using the geoip filter for the IP address 180.239.130.2 and passing a fake Apache log line as input via stdin. A predefined grok pattern that parses the combined Apache log format is used to extract the clientip field, which serves as the geoip source field:
input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    default_database_type => "City"
    source => "clientip"
    database => "/tmp/custom-geoip-database.mmdb"
    fields => [ "continent_code", "continent_name", "country_code2", "country_name", "region_code", "region_name", "city_name", "postal_code", "timezone", "latitude", "longitude" ]
    target => "geoip"
  }
}
output {
  stdout { codec => rubydebug }
}
During startup, Logstash will display a log line informing that the specified MMDB database file is being used:
[2018-11-02T00:40:59,068][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.3.0"}
[2018-11-02T00:41:04,069][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-11-02T00:41:04,517][INFO ][logstash.filters.geoip   ] Using geoip database {:path=>"/tmp/custom-geoip-database.mmdb"}
[2018-11-02T00:41:04,734][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x490dc557 run>"}
The stdin plugin is now waiting for input:
[2018-11-02T00:41:04,913][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-11-02T00:41:05,315][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
The following fake Apache log line was used as input:
180.239.130.2 - - [01/Apr/2017:16:21:15 -0300] "GET /favicon.ico HTTP/1.1" 200 3638 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
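The %{COMBINEDAPACHELOG} grok pattern begins by capturing the client IP at the start of the line. A rough Python approximation of that first capture (a simplified stand-in, not the real grok implementation) shows how clientip is obtained from the fake log line:

```python
import re

# Capture the leading client IP the way %{COMBINEDAPACHELOG} does;
# this regex is a simplified stand-in for the real grok pattern.
CLIENTIP_RE = re.compile(r'^(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})\s')

log_line = (
    '180.239.130.2 - - [01/Apr/2017:16:21:15 -0300] '
    '"GET /favicon.ico HTTP/1.1" 200 3638 "-" "Mozilla/5.0"'
)

match = CLIENTIP_RE.match(log_line)
clientip = match.group("clientip") if match else None
print(clientip)  # → 180.239.130.2
```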
The final Logstash output to stdout, with the geoip information, is:
{
        "message" => "180.239.130.2 - - [01/Apr/2017:16:21:15 -0300] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
    "httpversion" => "1.1",
          "bytes" => "3638",
           "verb" => "GET",
       "response" => "200",
       "@version" => "1",
       "referrer" => "\"-\"",
          "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"",
       "clientip" => "180.239.130.2",
           "auth" => "-",
          "geoip" => {
        "continent_name" => "South America",
           "region_name" => "Sao Paulo",
           "region_code" => "SP",
             "longitude" => -47.059600830078125,
              "timezone" => "America/Sao_Paulo",
          "country_name" => "Brazil",
         "country_code2" => "BR",
           "postal_code" => "13083100",
              "latitude" => -22.803499221801758,
             "city_name" => "Campinas",
        "continent_code" => "SA"
    },
           "host" => "localhost",
      "timestamp" => "01/Apr/2017:16:21:15 -0300",
          "ident" => "-",
        "request" => "/favicon.ico",
     "@timestamp" => 2018-11-02T03:56:31.688Z
}
Scripts Implementation¶
create-custom-geoip-database.pl
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );
use local::lib 'local';
use Data::Printer;    # exports the np() function
use Text::CSV_XS qw( csv );
use MaxMind::DB::Writer::Tree;
use Net::Works::Network;

my $num_args = $#ARGV + 1;
if ($num_args != 1) {
    print "\nUsage: perl $0 csv_file\n";
    exit;
}

my $mmdb_file = 'custom-geoip-database.mmdb';
my $csv_file  = $ARGV[0];

# Read the header line and bind each column to a key of $row,
# so every subsequent getline() refreshes the bound values.
my $csv = Text::CSV_XS->new( { binary => 1 } );
open my $fh, "<:encoding(utf8)", $csv_file or die "$csv_file: $!";
my @cols = @{ $csv->getline( $fh ) };
our $row = {};
$csv->bind_columns( @{$row}{@cols} );

# Map each key that may appear in a record to its MMDB data type.
my %types = (
    names                 => 'map',
    city                  => 'map',
    continent             => 'map',
    registered_country    => 'map',
    represented_country   => 'map',
    country               => 'map',
    location              => 'map',
    postal                => 'map',
    traits                => 'map',
    geoname_id            => 'uint32',
    type                  => 'utf8_string',
    en                    => 'utf8_string',
    de                    => 'utf8_string',
    es                    => 'utf8_string',
    fr                    => 'utf8_string',
    ja                    => 'utf8_string',
    'pt-BR'               => 'utf8_string',
    ru                    => 'utf8_string',
    'zh-CN'               => 'utf8_string',
    locales               => [ 'array', 'utf8_string' ],
    code                  => 'utf8_string',
    ip_address            => 'utf8_string',
    subdivisions          => [ 'array', 'map' ],
    iso_code              => 'utf8_string',
    environments          => [ 'array', 'utf8_string' ],
    expires               => 'uint32',
    name                  => 'utf8_string',
    time_zone             => 'utf8_string',
    accuracy_radius       => 'uint32',
    latitude              => 'float',
    longitude             => 'float',
    metro_code            => 'uint32',
    is_in_european_union  => 'utf8_string',
    is_satellite_provider => 'utf8_string',
    is_anonymous_proxy    => 'utf8_string',
);

my $tree = MaxMind::DB::Writer::Tree->new(
    # "database_type" is some arbitrary string describing the database.
    database_type => 'GeoIP2-City',

    # "description" is a hashref where the keys are language names and the
    # values are descriptions of the database in that language.
    description => { en => 'Custom Geolite2 City Database' },

    # "ip_version" can be either 4 or 6
    ip_version => 4,

    # add a callback to validate data going in to the database
    map_key_type_callback => sub { $types{ $_[0] } },

    # "record_size" is the record size in bits. Either 24, 28 or 32.
    record_size => 28,

    # Allow use of reserved network ranges
    remove_reserved_networks => 0,
);

my %custom_ip_ranges;
while ( $csv->getline( $fh ) ) {
    $custom_ip_ranges{ $main::row->{ip_network_range} } = {
        city => {
            geoname_id => $main::row->{city_geoname_id},
            names      => { en => $main::row->{city_name} },
        },
        continent => {
            code       => $main::row->{continent_code},
            geoname_id => $main::row->{continent_geoname_id},
            names      => { en => $main::row->{continent_name} },
        },
        country => {
            geoname_id => $main::row->{country_geoname_id},
            iso_code   => $main::row->{country_iso_code},
            names      => { en => $main::row->{country_name} },
        },
        location => {
            accuracy_radius => $main::row->{accuracy_radius},
            latitude        => $main::row->{latitude},
            longitude       => $main::row->{longitude},
            time_zone       => $main::row->{time_zone},
        },
        postal => {
            code => $main::row->{postal_code},
        },
        registered_country => {
            geoname_id => $main::row->{country_geoname_id},
            iso_code   => $main::row->{country_iso_code},
            names      => { en => $main::row->{country_name} },
        },
        subdivisions => [
            {
                geoname_id => $main::row->{subdivision_1_geoname_id},
                iso_code   => $main::row->{subdivision_1_iso_code},
                names      => { en => $main::row->{subdivision_1_name} },
            },
        ],
    };
}

for my $address ( keys %custom_ip_ranges ) {
    # Create one network and insert it into our database
    my $network = Net::Works::Network->new_from_string( string => $address );
    $tree->insert_network( $network, $custom_ip_ranges{$address} );
}

# Write the database to disk.
open my $out_fh, '>:raw', $mmdb_file;
$tree->write_tree( $out_fh );
close $out_fh;

say "$mmdb_file has now been created from $csv_file";
read-custom-geoip-database.pl
#!/usr/bin/env perl

use strict;
use warnings;
use feature qw( say );
use local::lib 'local';
use Data::Printer;
use MaxMind::DB::Reader;
use Net::Works::Address;

my $num_args = $#ARGV + 1;
if ($num_args != 2) {
    print "\nUsage: perl $0 ip_address mmdb_file\n";
    exit;
}

my $ip        = $ARGV[0];
my $mmdb_file = $ARGV[1];

my $reader = MaxMind::DB::Reader->new( file => $mmdb_file );

say 'Description: ' . $reader->metadata->{description}->{en};

# Look up the record for the network range containing the IP address
# and pretty-print it with Data::Printer's np() function.
my $record = $reader->record_for_address( $ip );
say np $record;