5.1.1. ON Core Monitoring
We strongly recommend having a monitoring process in place for each role (Sensor, Core, Analytics) in any production environment.
We classify the monitoring methods as follows:
Trending: Monitors the system resources to track hardware performance and status.
External Network Services: Checks the availability of services from outside the server.
Processes and Events to be monitored: Checks that processes are up and running, along with their related events.
Healthcheck: ON Core has multiple internal checks to make sure services are up and running as expected.
Monitor User: Checks whether RADIUS requests work properly.
To better understand how to monitor the ON Core, we recommend reviewing the ON Core Architecture section.
5.1.1.1. Trending
The status of the system resources is available in Status -> Trending. The monitored system resources are:
CPU
OpenNAC
Disk
Interface
Load
Memory
Mysql
Redis
Other
Conntrack
For more information about this topic, see the full trending explanation.
5.1.1.2. External Network Services
Check service availability:
DNS server (port TCP/53 and UDP/53), if this service is enabled.
DHCP server (port UDP/67), if this service is enabled.
DHCP-HELPER-READER service (port UDP/67), if this service is enabled.
- RADIUS server (port UDP/1812 and UDP/1813)
We recommend running the RADIUS connection check with a valid user and credentials.
- MySQL server (port TCP/3306)
The iptables firewall must be modified to allow access from the monitoring server to this service.
- Queues server (port TCP/4730)
The iptables firewall must be modified to allow access from the monitoring server to this service.
HTTP/HTTPS server (port TCP/80 and TCP/443): In addition to checking the HTTP/HTTPS service, a status page is available at http://openNACServer/status, which returns JSON like the following:
{"db":1,"queue":{"pending_jobs":0,"running_jobs":0,"available_workers":5}}
The db field must be 1; the queue values depend on your queue configuration and usage.
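As an illustration, an external monitor could probe the TCP ports listed above and validate the status page output. The following is a minimal sketch, not part of the product: the function names `tcp_port_open`, `fetch_status` and `check_status`, and the `min_workers` threshold, are assumptions introduced for this example. Only the JSON layout and the rule that db must be 1 come from the text above.

```python
import json
import socket
import urllib.request


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_status(url: str, timeout: float = 5.0) -> str:
    """Download the status page body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_status(payload: str, min_workers: int = 1) -> list:
    """Return a list of problems found in the status JSON (empty if healthy)."""
    try:
        status = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(status, dict):
        return ["status payload is not a JSON object"]

    problems = []
    # Documented rule: db must be 1. Accept the number 1 or the string "1".
    if str(status.get("db")) != "1":
        problems.append(f"db is {status.get('db')!r}, expected 1")

    queue = status.get("queue")
    if not isinstance(queue, dict):
        problems.append("queue section is missing")
    else:
        # Assumption: at least min_workers workers should be available.
        # Adjust this to your own queue configuration and usage.
        workers = queue.get("available_workers")
        if not isinstance(workers, int) or workers < min_workers:
            problems.append(f"available_workers is {workers!r}")
    return problems
```

For example, `check_status(fetch_status("http://openNACServer/status"))` returns an empty list when the page reports a healthy state. The UDP services (DNS, DHCP, RADIUS) cannot be verified with a plain connection attempt and need a protocol-specific probe instead.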
5.1.1.3. Processes and Events to be monitored
The following services can be externally monitored:
httpd
krb5kdc
radiusd
opennac
mysqld
- RADIUS log events monitoring:
Alert when authentication failures exceed 100 per minute.
Alert when duplicate request errors exceed 50 per minute.
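The two per-minute limits above can be evaluated with a simple counter. This is a sketch under stated assumptions: the RADIUS log format is not described here, so the example expects the log lines to have been parsed already into (timestamp, kind) pairs, and the kind labels `auth_fail` and `duplicate_request` as well as the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Per-minute limits taken from the list above; the keys are hypothetical labels.
THRESHOLDS = {"auth_fail": 100, "duplicate_request": 50}


def minutes_over_threshold(
    events: Iterable[Tuple[datetime, str]],
    thresholds: Dict[str, int] = THRESHOLDS,
) -> List[Tuple[datetime, str, int]]:
    """Return (minute, kind, count) for each minute whose count exceeds its limit."""
    counts = Counter()
    for timestamp, kind in events:
        if kind in thresholds:
            # Truncate to the start of the minute so events share one bucket.
            minute = timestamp.replace(second=0, microsecond=0)
            counts[(minute, kind)] += 1
    return sorted(
        (minute, kind, count)
        for (minute, kind), count in counts.items()
        if count > thresholds[kind]
    )
```

An empty result means that no minute in the analyzed window crossed either limit.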
5.1.1.4. Healthcheck
Out-of-the-box ON Core instances check different modules depending on their role. The modules checked for each ON Core role are:
5.1.1.4.1. ON Principal
To configure the ON Principal healthcheck, visit the healthcheck configuration:
BACKEND
HTTP_CERTIFICATE
RADIUS
RADIUS_CERTIFICATE
LDAP
UDS
DNS
CACHE
QUEUES
LOGCOLLECTOR
PORTAL
DB
COLLECTD
FILEBEAT
RAM
SWAP
MEMCACHED
NTLM
AD_DOMAIN_MEMBER
WINDBIND
NXLOG
TIME_SYNC
BACKUP
DISK_ROOT
DISK_VAR
DISK_VAR_LOG
DISK_TMP
DISK_BACKUP
SYSTEM_LOAD
SYSTEM_INFO
5.1.1.4.2. ON Worker
To configure the ON Worker healthcheck, visit the healthcheck configuration:
BACKEND
HTTP_CERTIFICATE
RADIUS
RADIUS_CERTIFICATE
UDS
DHCP
DNS
CACHE
QUEUES
LOGCOLLECTOR
PORTAL
DB
DBREPLICATION
COLLECTD
FILEBEAT
DHCPHELPERREADER
RAM
SWAP
MEMCACHED
NTLM
AD_DOMAIN_MEMBER
WINDBIND
NXLOG
TIME_SYNC
BACKUP
DISK_ROOT
DISK_VAR
DISK_VAR_LOG
DISK_TMP
DISK_BACKUP
SYSTEM_LOAD
SYSTEM_INFO
5.1.1.4.3. ON Proxy
To configure the ON Proxy healthcheck, visit the healthcheck configuration:
RADIUS
RADIUS_CERTIFICATE
DNS
LOGCOLLECTOR
RAM
SWAP
TIME_SYNC
BACKUP
DISK_ROOT
DISK_VAR
DISK_VAR_LOG
DISK_TMP
DISK_BACKUP
SYSTEM_LOAD
SYSTEM_INFO
5.1.1.4.4. ON Portal
To configure the ON Portal healthcheck, visit the healthcheck configuration:
HTTP_CERTIFICATE
DNS
LOGCOLLECTOR
PORTAL
RAM
SWAP
TIME_SYNC
BACKUP
DISK_ROOT
DISK_VAR
DISK_VAR_LOG
DISK_TMP
DISK_BACKUP
SYSTEM_LOAD
SYSTEM_INFO
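The four lists above overlap heavily, and expressing them as sets makes the differences between roles easier to see. The sketch below only restates the module names listed in this section; the role keys, the `EXPECTED` mapping and the `missing_checks` helper are hypothetical names for illustration, and the format in which a healthcheck run reports its modules is an assumption.

```python
# Modules checked by every role listed in this section.
COMMON = {
    "DNS", "LOGCOLLECTOR", "RAM", "SWAP", "TIME_SYNC", "BACKUP",
    "DISK_ROOT", "DISK_VAR", "DISK_VAR_LOG", "DISK_TMP", "DISK_BACKUP",
    "SYSTEM_LOAD", "SYSTEM_INFO",
}

# Modules shared by ON Principal and ON Worker on top of the common set.
CORE_SHARED = COMMON | {
    "BACKEND", "HTTP_CERTIFICATE", "RADIUS", "RADIUS_CERTIFICATE", "UDS",
    "CACHE", "QUEUES", "PORTAL", "DB", "COLLECTD", "FILEBEAT", "MEMCACHED",
    "NTLM", "AD_DOMAIN_MEMBER", "WINDBIND", "NXLOG",
}

EXPECTED = {
    "principal": CORE_SHARED | {"LDAP"},
    "worker": CORE_SHARED | {"DHCP", "DBREPLICATION", "DHCPHELPERREADER"},
    "proxy": COMMON | {"RADIUS", "RADIUS_CERTIFICATE"},
    "portal": COMMON | {"HTTP_CERTIFICATE", "PORTAL"},
}


def missing_checks(role: str, reported) -> list:
    """Return the modules expected for a role that are absent from a report."""
    return sorted(EXPECTED[role] - set(reported))
```

Seen this way, ON Worker differs from ON Principal only by adding DHCP, DBREPLICATION and DHCPHELPERREADER and by omitting LDAP.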
5.1.1.5. Monitor User
The Monitor user is a utility integrated into the tool that helps determine whether RADIUS server authentications are working.
This user carries out an authentication against the RADIUS server every minute by sending polevals that simulate the authentication process.
The result of this authentication is shown in the ON NAC > Default view > Unassigned window. This user always has the MAC address 00:00:00:00:00:00, the IP address 0.0.0.0 and the user name “monitor”.
If the Last Access value of this user is more than 1 minute old, either the RADIUS server is down or the authentication process is not working as it should.
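The rule above can be expressed as a small check. This is a minimal sketch: the record field names (`mac`, `ip`, `user`) and both function names are hypothetical, since this section does not describe how the Last Access value is exposed outside the administration window.

```python
from datetime import datetime, timedelta

# Fixed identity of the monitor user, as described above.
MONITOR_IDENTITY = {
    "mac": "00:00:00:00:00:00",
    "ip": "0.0.0.0",
    "user": "monitor",
}


def is_monitor_record(record: dict) -> bool:
    """Return True if the record carries the monitor user's fixed identity."""
    return all(record.get(key) == value for key, value in MONITOR_IDENTITY.items())


def radius_check_failed(
    last_access: datetime,
    now: datetime,
    max_age: timedelta = timedelta(minutes=1),
) -> bool:
    """Return True if the last monitor authentication is older than max_age."""
    return now - last_access > max_age
```

Because the authentication runs once per minute, a monitoring system may want to pass a slightly larger `max_age` to avoid alerting on small scheduling delays.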
Important
To prevent unwanted access to the system through this monitor user, it is important to create a policy that restricts its access. This way, the monitoring tool can be used without compromising security. An example of such a policy is shown below; it must be placed as #1 in the list of policies.
When creating this user, it is also important to configure a complex password and not to write it down.
Name: Monitor
Enabled: YES
- Preconditions: Users (the user must be created if it does not exist)
User ID: monitor
E-mail: monitor@monitor.es
Password: <Random>
TTL (in minutes): 0
- Preconditions: Sources
Supplicant User: YES
User: YES
- Postconditions
- VLAN
ID: 4095
Type: Service
VLAN by default: false
Name: ACCESS DENIED
This username and password will be used by the different network devices to check the status of the RADIUS server.