5.2.3.2.2. Healthcheck checks

This section describes the automated checks that report the healthcheck status of an environment and help identify any related errors.

These automated checks are defined in the all_checks.yml file, located in the following directory:

cd /usr/share/opennac/ansible/

The following points detail the checks that Ansible performs automatically, along with the tags available when running them.

5.2.3.2.2.1. Checks

  • Verify that every healthcheck item is green (OK status) by fetching the healthcheck from the principal API

5.2.3.2.2.2. Tags

  • healthcheck

5.2.3.2.2.3. Data Structure

"healthcheck": {
    "{{ node_id }}": {
        "error": {
            "{{ healthcheck_id }}": "{{ healthcheck_msg }}"
        },
        "hostname": "{{ node_hostname }}",
        "ip": "{{ node_ip }}",
        "ok": {
            "{{ healthcheck_id }}": "{{ healthcheck_msg }}"
        }
    }
}

The data structure shown is explained below:

  • node_id: key that identifies the node (it corresponds to the node hostname); its value holds the healthcheck status of the node’s services.

  • hostname: node hostname.

  • ip: node IP address.

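  • ok: services in OK status, each listed as a healthcheck_id with its healthcheck_msg.

  • error: services in ERROR status, each listed as a healthcheck_id with its healthcheck_msg.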
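The structure above can be walked programmatically to find which nodes report failing checks. The following is a minimal sketch, not part of the product: it assumes the "healthcheck" fragment has been wrapped in braces and loaded as a dictionary, and the function name failing_checks and the sample values are hypothetical.

```python
from typing import Dict, List


def failing_checks(report: dict) -> Dict[str, List[str]]:
    """Map each node_id to the sorted healthcheck IDs found under its "error" key.

    Nodes without any entry under "error" are left out of the result.
    """
    failures: Dict[str, List[str]] = {}
    for node_id, node in report.get("healthcheck", {}).items():
        errors = node.get("error") or {}
        if errors:
            failures[node_id] = sorted(errors)
    return failures


# Hypothetical sample that follows the documented structure.
sample = {
    "healthcheck": {
        "02-sensor-08": {
            "error": {"DISK_VAR": "ERROR: Partition incorrectly specified"},
            "hostname": "02-sensor-08",
            "ip": "10.10.39.17",
            "ok": {"CACHE": "Service REDIS is UP"},
        }
    }
}

print(failing_checks(sample))  # {'02-sensor-08': ['DISK_VAR']}
```

An empty result means that no node has entries under "error", which matches the "all green" condition that the check looks for.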

5.2.3.2.2.4. Example

Command:

ansible-playbook -i inventory all_checks.yml --tags "healthcheck"

Output:

"healthcheck": {
    "02-sensor-08": {
        "error": {
            "DISK_VAR": "ERROR: Partition incorrectly specified\n\n\tThis plugin shows the % of used space of a mounted partition, using the 'df' utility\n\n\t/usr/share/opennac/healthcheck/libexec/check_disk.sh:\n\t\t-c <integer>\tIf the % of used space is above <integer>, returns CRITICAL state\n\t\t-w <integer>\tIf the % of used space is below CRITICAL and above <integer>, returns WARNING state\n\t\t-d <device>\tThe partition or mountpoint to be checked. eg. /dev/sda1, /home, /"
        },
        "hostname": "02-sensor-08",
        "ip": "10.10.39.17",
        "ok": {
            "CACHE": "Service REDIS is UP",
            "COLLECTD": "PROCS OK: 1 process with args '/usr/sbin/collectd' | procs=1;1:1;1:1;0;",
            "DHCPHELPERREADER": "PROCS OK: 1 process with args 'dhcp-helper-reader' | procs=1;;1:1;0;",
            "DISK_ROOT": "OK - / space used=2% | '/ usage'=2%;90;95;",
            "FILEBEAT": "PROCS OK: 1 process with args '/usr/share/filebeat/bin/filebeat' | procs=1;;1:1;0;",
            "SYSTEM_INFO": "role: SENSOR",
            "SYSTEM_LOAD": "OK - load average: 4.37, 4.97, 4.97|load1=4.370;100.000;125.000;0; load5=4.970;100.000;125.000;0; load15=4.970;100.000;125.000;0;",
            "TIME_SYNC": "Service TIME_SYNC is OK",
            "ZEEK": "ZEEKCTL STATUS OK - All 1 instances are running: zeek is running,"
        }
    }
}
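Error messages such as the DISK_VAR entry above can span several lines, because the plugin appends its usage text. As an illustration only, the sketch below condenses a report into one summary line per node plus the first line of each error message. It assumes the output has been loaded as a dictionary; the function name summarize and the shortened example data are hypothetical.

```python
from typing import List


def summarize(report: dict) -> List[str]:
    """Return one summary line per node, then the first line of each error message."""
    lines: List[str] = []
    for node_id, node in sorted(report.get("healthcheck", {}).items()):
        ok = node.get("ok") or {}
        errors = node.get("error") or {}
        hostname = node.get("hostname", node_id)
        ip = node.get("ip", "unknown")
        lines.append(f"{hostname} ({ip}): {len(ok)} ok, {len(errors)} error")
        for check_id, msg in sorted(errors.items()):
            # Keep only the first line; the rest is typically plugin usage text.
            stripped = msg.strip()
            first_line = stripped.splitlines()[0] if stripped else ""
            lines.append(f"  {check_id}: {first_line}")
    return lines


# Shortened version of the example output (only two of the "ok" entries kept).
example = {
    "healthcheck": {
        "02-sensor-08": {
            "error": {
                "DISK_VAR": "ERROR: Partition incorrectly specified\n\n\tThis plugin shows the % of used space of a mounted partition"
            },
            "hostname": "02-sensor-08",
            "ip": "10.10.39.17",
            "ok": {
                "CACHE": "Service REDIS is UP",
                "TIME_SYNC": "Service TIME_SYNC is OK",
            },
        }
    }
}

for line in summarize(example):
    print(line)
# 02-sensor-08 (10.10.39.17): 2 ok, 1 error
#   DISK_VAR: ERROR: Partition incorrectly specified
```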