4.1.8. Troubleshooting

The following sections cover common problems in the Visibility Use Case. Review the topics below to see whether they solve your problem.

4.1.8.1. Dashboard does not show data

When a dashboard does not show data, the ON Analytics component is not working as expected. Follow the flow below, detailed step by step in the following sections, to find the issue:

../../_images/troub_dashboard.png


4.1.8.1.1. Healthcheck

The first thing to do is to run the ON Analytics healthcheck. SSH into the machine and execute:

php /usr/share/opennac/healthcheck/healthcheck.php
../../_images/troub_healthcheck_console.png


Make sure every check shows the “OK” state. You can also view the healthcheck from the OpenNAC Web Administration Portal (look at the ON Analytics healthcheck):

../../_images/troub_healthcheck_portal.png


If you do not see any errors here and/or the problem persists, you can go to the next step.

4.1.8.1.2. Check the logs

ON Analytics runs three important services: Logstash, Elasticsearch, and Kibana. To check their behavior, review the logs of each one.

../../_images/troub_check_logs.png


  1. Logstash: check the log file /var/log/logstash/logstash-plain.log on ON Analytics

    tail -f /var/log/logstash/logstash-plain.log

  2. Elasticsearch: check the log file /var/log/elasticsearch/elasticsearch.log on ON Analytics

    tail -f /var/log/elasticsearch/elasticsearch.log

  3. Kibana: check the log file /var/log/kibana/kibana.log on ON Analytics

    tail -f /var/log/kibana/kibana.log

If you do not see any errors here and/or the problem persists, you can go to the next step.
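
The three log checks in this step can also be summarized with a small script instead of reading each file by hand. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the assumption that problems appear as lines containing "error" or "fatal" may not hold for every log format.

```shell
# Count the lines of a log file that mention "error" or "fatal" (case-insensitive).
# Prints 0 and warns on stderr when the file cannot be read.
count_log_errors() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        echo 0
        return 0
    fi
    # grep -c exits with status 1 when the count is 0, so ignore its status.
    grep -Eic 'error|fatal' "$log_file" || true
}

# Print one "file: count" line for each log file given as an argument.
report_log_errors() {
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(count_log_errors "$f")"
    done
}
```

On ON Analytics it could be called as `report_log_errors /var/log/logstash/logstash-plain.log /var/log/elasticsearch/elasticsearch.log /var/log/kibana/kibana.log`. A non-zero count only points to the file worth reading in detail.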

4.1.8.1.3. Check the API Key

The API key configured in the OpenNAC Web Administration Portal under ON CMDB -> Security -> API Key must be bound to the ON Analytics IP address, and the same key must be configured on the ON Analytics server.

../../_images/troub_api_key.png


Go to the ON CMDB -> Security -> API Key section and check that the “IP” field contains the ON Analytics IP address. The generated key shown there is the API key to be used.

../../_images/troub_api_key_portal.png


In this example, the ON Analytics IP is 10.250.101.104. Access this machine and open the /etc/default/opennac file; the “OPENNAC_API_KEY” field must contain the same API key:

../../_images/troub_api_key_console.png


If you create a new API key or modify any field, make sure to restart the services:

systemctl restart logstash
systemctl restart elasticsearch

If you do not see any errors here and/or the problem persists, you can go to the next step.
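
The comparison between the key shown in the portal and the key stored on ON Analytics can be scripted. This is a minimal sketch under one assumption: that /etc/default/opennac stores the key on a single shell-style line of the form OPENNAC_API_KEY=value, optionally quoted. The function names are hypothetical.

```shell
# Print the value of OPENNAC_API_KEY from a KEY=value style file.
# Assumption: one "OPENNAC_API_KEY=..." line, with optional quotes around the value.
read_api_key() {
    conf_file="${1:-/etc/default/opennac}"
    sed -n 's/^[[:space:]]*OPENNAC_API_KEY[[:space:]]*=[[:space:]]*//p' "$conf_file" \
        | tr -d "\"'" | head -n 1
}

# Succeed only when the key stored in the file equals the expected key.
# Usage: api_key_matches <key copied from the portal> [file]
api_key_matches() {
    expected_key="$1"
    [ "$(read_api_key "$2")" = "$expected_key" ]
}
```

For example, `api_key_matches "<key from the portal>" && echo "API key matches"` prints the message only when both values are identical.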

4.1.8.1.4. Check the /etc/hosts of the ON Analytics

On the ON Analytics server, the /etc/hosts file must be correctly configured.

../../_images/troub_hosts.png


  1. Check that the hostnames of ON Analytics, ON Sensor, ON Aggregator, ON Core and ON Principal exist.

../../_images/troub_hosts_console.png


  2. Check that the IP assigned to every hostname is correct, and check that the interface can correctly resolve requests.

  3. Check that an nslookup of any host returns the IP corresponding to the domain.

  4. Restart the services

    systemctl restart logstash
    systemctl restart elasticsearch
    

If you do not see any errors here and/or the problem persists, you can go to the next step.
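
The hostname check in this step can be automated by scanning the hosts file for each expected name. The sketch below is illustrative only. The hostname onanalytics appears elsewhere in this guide, while onsensor, onaggregator, oncore and onprincipal are assumed names that must be replaced with the ones used in your deployment.

```shell
# Print every hostname from the argument list that has no entry in the hosts file.
# Usage: missing_hosts <hosts file> <hostname>...
missing_hosts() {
    hosts_file="$1"
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field after the IP column.
        if ! sed 's/#.*//' "$hosts_file" \
            | awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                                END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

For example, `missing_hosts /etc/hosts onanalytics onsensor onaggregator oncore onprincipal` prints nothing when every name has an entry. The script does not validate that the IP addresses are correct, so that still has to be reviewed by hand.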

4.1.8.1.5. Check the ON Analytics /var disk capacity

The /var disk on ON Analytics must keep more than 25% free space.

../../_images/troub_disk_var.png


To check this percentage, SSH into the ON Analytics console and run:

df -h
../../_images/troub_dfh.png


If the percentage in use is above 75%, follow these steps:

  1. Check the index size

    curl -X GET 'localhost:9200/_cat/indices/?&h=health,status,index,store.size'

    ../../_images/troub_index_size.png


  2. Add disk space (optional): you can increase the disk size.

  3. Edit the /etc/elastCurator/action.yaml file to change the number of days that indexes are kept.

    ../../_images/troub_action.png


  4. Purge the indexes according to the modified action.yaml

    /usr/share/opennac/analytics/scripts/elasticsearch_purge_index_curator.sh

  5. Execute the read_only.sh script

    /usr/share/opennac/analytics/scripts/read_only.sh

  6. Restart the services

    systemctl restart logstash
    systemctl restart elasticsearch
    

If you do not see any errors here and/or the problem persists, you can go to the next step.
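
The 75% threshold used in this step can be checked from a script rather than by reading the df output. The following is a minimal sketch with hypothetical function names. It relies on the POSIX output format of `df -P`, where the fifth column of the second line is the used percentage.

```shell
# Read `df -P` output on stdin and print the used percentage without the % sign.
parse_used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the used percentage of the filesystem that holds the given path.
used_percent() {
    df -P "$1" | parse_used_percent
}

# Succeed when the given percentage is above the threshold (default: 75).
over_threshold() {
    [ "$1" -gt "${2:-75}" ]
}
```

On ON Analytics, `over_threshold "$(used_percent /var)" && echo "/var is above 75% used"` prints the warning only when the cleanup steps above are needed.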

4.1.8.1.6. Check ON Analytics Index Status

The last step in solving the dashboard data issue is to check the status of the indexes.

../../_images/troub_index_status.png


  1. Make sure the status of all indexes is green

    curl -X GET 'localhost:9200/_cat/indices/?&h=health,status,index,store.size'

    ../../_images/troub_index_size.png


  2. If they are not green, execute the read_only.sh script

    /usr/share/opennac/analytics/scripts/read_only.sh

  3. Check whether the indexes have been created today. If they have, everything should work correctly (skip all following steps)

    curl -X GET 'localhost:9200/_cat/indices/?&h=health,status,index,store.size'

    ../../_images/troub_index_size.png


  4. Check the sources in /etc/hosts

    Check that the onanalytics entry has the correct IP on the ON Core and ON Sensor machines.

  5. Check input traffic with tcpdump on the ON Sensor

    tcpdump -i <interface>


    If you do not have input traffic, contact the network manager; the problem must be external.

    If you do have input traffic, check the sensor services zeek, filebeat and dhcp-helper-reader:

    systemctl status zeek
    systemctl status filebeat
    systemctl status dhcp-helper-reader

  6. On the ON Core machine, check the filebeat status.

    systemctl status filebeat
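
The index health check at the start of this list can be reduced to a filter that prints only the indexes needing attention. This is a minimal sketch with hypothetical function names. It assumes that Elasticsearch answers on localhost:9200 without authentication, as in the commands above, and that every line of output has the health value in its first column.

```shell
# Read `_cat/indices` output (columns: health status index ...) on stdin and
# print "index (health)" for every index whose health is not green.
non_green_indices() {
    awk 'NF > 0 && $1 != "green" { print $3 " (" $1 ")" }'
}

# Query the local Elasticsearch node and list the indexes that are not green.
list_non_green() {
    curl -s 'localhost:9200/_cat/indices?h=health,status,index,store.size' \
        | non_green_indices
}
```

Running `list_non_green` on ON Analytics prints nothing when all indexes are green.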
    

4.1.8.2. Business Profile entry with IP 0.0.0.0

Another common problem is an entry with the IP 0.0.0.0 in a Business Profile. Follow the steps below to find the cause:

../../_images/troub_business.png


4.1.8.2.1. DHCP

On the ON Core machine, check the DHCP behavior:

../../_images/troub_dhcp.png


  1. Check the service status

    systemctl status dhcp-helper-reader

  2. Check the DHCP input packets

    tcpdump -vnes0 -i <interface> port 67 or port 68

    ../../_images/troub_dhcp_tcpdump.png


  3. View the IpMac events with source dhcp. In the OpenNAC Enterprise Web Administration Portal, go to ON NAC -> Business Profile, select the Business Profile with the issue, click on the eye icon of the event to open the policy evaluation, and select IpMac events until you see “dhcp” as the source module

    ../../_images/troub_policy_evaluation.png


  4. Repeat the same process on the ON Sensor machine

  5. If the problem is not solved, see the DHCP Helper Reader Advanced Troubleshooting section (4.1.8.3).

If you do not see any errors here and/or the problem persists, contact your network manager; the problem must be external.

4.1.8.2.2. IP Static

On the ON Sensor and ON Analytics machines, follow the flow shown below:

../../_images/troub_ip_static.png


  1. Check that the ARP Plugin is enabled. On the ON Sensor machine, open /opt/zeek/share/zeek/site/local.zeek and make sure the line “@load scripts/arp_main” is uncommented.

    ../../_images/troub_arp_main.png


  2. Restart the zeek service

    systemctl restart zeek

  3. Check the ARP log file on the ON Sensor

    tail -f /opt/zeek/logs/current/arp.log

  4. In ON NAC -> Business Profiles, the event window should show the IpMac event with the tag IPT_STATIC.

    ../../_images/troub_iptstatic.png


  5. Check the Logstash logs on the ON Analytics machine:

    tail -f /var/log/logstash/logstash-plain.log
    

If you do not see any errors here and/or the problem persists, you can go to the next step.
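
The ARP plugin check on the ON Sensor can be done with a single grep instead of opening the file. The following is a minimal sketch with a hypothetical function name. It treats the plugin as enabled when local.zeek contains a “@load scripts/arp_main” line that does not start with a comment character.

```shell
# Succeed when the Zeek site file contains an uncommented "@load scripts/arp_main" line.
# Usage: arp_plugin_enabled [path to local.zeek]
arp_plugin_enabled() {
    zeek_conf="${1:-/opt/zeek/share/zeek/site/local.zeek}"
    grep -Eq '^[[:space:]]*@load[[:space:]]+scripts/arp_main([[:space:]]|$)' "$zeek_conf"
}
```

For example, `arp_plugin_enabled || echo "ARP plugin is not loaded"` prints the message when the line is missing or commented out.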

4.1.8.2.3. MACDISCOVER

On the ON Core, check the network configuration:

../../_images/troub_macdiscover.png


  1. Go to ON CMDB -> Networks to check the defined networks. You can add a new one or edit an existing one.

    ../../_images/troub_oncmdb_networks.png


  2. Check the SNMP configuration in Configuration -> Configuration vars -> NetDev

    ../../_images/troub_confvars_snmp.png


  3. Query the network with snmpwalk

    snmpwalk -v2c -c cal2kmar <ipNetToMediaPhysAddress>


    Piping the output through “head” limits it to the first entries of the ARP table

    snmpwalk -v2c -c cal2kmar <ipNetToMediaPhysAddress> | head

  4. If the problem persists, contact your network manager; the problem must be external.

4.1.8.3. DHCP Helper Reader Advanced Troubleshooting

DHCP can be used to discover a device or to get complementary information about a previously discovered one. To check the behavior of the DHCP Helper Reader, its requests, and the forwarded responses, refer to the following diagram. The main components that take part in the flow are:

  • DHCP Helper Reader Service (ON Core/ON Sensor)

  • ON Principal - Redis, Gearman, Minions (former workers), and OpenNAC Enterprise API

../../_images/troub_dhcp_helper_reader.png


  1. When traffic is received on the ON Sensor, it detects the DHCP request (ON Core can also receive requests).

To check that the DHCP Helper Reader is receiving the requests, execute the following command, indicating the interface where the traffic is being received:

tcpdump -i <INTERFACE> -nn port 67
../../_images/troub_dhcp_helper_reader1.png


  2. The DHCP Helper Reader checks against Redis whether the device has already been discovered. If not, it registers the new entry in Redis.

To check that the traffic is being received by Redis, execute the following command:

tcpdump -i <INTERFACE> -nn port 6379
../../_images/troub_dhcp_helper_reader2.png


  3. Once the DHCP Helper Reader has parsed the device information, it pushes the job to Gearmand.

To check that Gearmand is receiving jobs from the DHCP Helper Reader, execute the following command:

tcpdump -i <INTERFACE> -nn port 4730
../../_images/troub_dhcp_helper_reader3.png


  4. Gearmand then assigns the job to a minion.

To check that the minion has received the job, look at the following log on ON Principal:

grep -i ipmac /var/log/opennac/opennac-job.log
../../_images/troub_dhcp_helper_reader4.png


  5. The assigned minion receives the information and pushes it to the OpenNAC Enterprise API.

To check whether the API is receiving the policy evaluation (poleval), look at the following log on ON Principal:

/var/log/httpd/opennac-access_log
../../_images/troub_dhcp_helper_reader5.png


  6. Based on the information received, the OpenNAC Enterprise API adds the newly collected information to the new device.

These events can be seen in the Business Profile. The following image shows the expected result:

../../_images/troub_dhcp_helper_reader6.png
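
The three packet captures in this flow differ only in the port, so they can be wrapped in one helper. This is a minimal sketch with hypothetical function and stage names. The ports are the ones used in this section (67 for DHCP, 6379 for Redis, 4730 for Gearmand), while the limits of 5 packets and 30 seconds are arbitrary choices.

```shell
# Map a stage of the DHCP Helper Reader flow to the port captured in this section.
stage_port() {
    case "$1" in
        dhcp)    echo 67 ;;
        redis)   echo 6379 ;;
        gearman) echo 4730 ;;
        *)       echo "unknown stage: $1" >&2; return 1 ;;
    esac
}

# Capture up to 5 packets for a stage on an interface, giving up after 30 seconds.
# Usage: capture_stage <interface> <dhcp|redis|gearman>
capture_stage() {
    iface="$1"
    port="$(stage_port "$2")" || return 1
    timeout 30 tcpdump -i "$iface" -nn -c 5 port "$port"
}
```

For example, `capture_stage <INTERFACE> redis` shows whether any traffic reaches Redis. If `timeout` ends the capture with no packets seen, the flow is probably broken at or before that stage.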