5.1.1.1. ON Core Monitoring
We strongly recommend having a monitoring process in place for each role (Sensor, Core, Analytics) in any production environment.
Monitoring methods are classified as follows:
Trending: Monitors system resources, hardware performance, and status.
External Network Services: Checks service availability from outside the ON Core.
Processes and Events to be monitored: Verifies that processes are up and running and watches their related events.
Healthcheck: ON Core runs multiple internal checks to make sure services are up and running as expected.
Monitor User: Checks whether RADIUS requests work properly.
To better understand how to monitor the ON Core, we recommend reviewing the ON Core Architecture section.
5.1.1.1.1. Trending
The status of the system resources is available under Status -> Trending. The monitored system resources are:
CPU
OpenNAC
Disk
Interface
Load
Memory
Mysql
Redis
Other
Conntrack
5.1.1.1.2. External Network Services
Check the availability of the following services:
DNS server (ports TCP/53 and UDP/53), if this service is enabled.
DHCP server (port UDP/67), if this service is enabled.
DHCP-HELPER-READER service (port UDP/67), if this service is enabled.
RADIUS server (ports UDP/1812 and UDP/1813). We recommend a RADIUS connection check with a valid user and credentials.
MySQL server (port TCP/3306). The iptables firewall must be modified to allow access from the monitoring server to this service.
Queues server (port TCP/4730). The iptables firewall must be modified to allow access from the monitoring server to this service.
HTTP/HTTPS server (ports TCP/80 and TCP/443). Apart from checking the HTTP/HTTPS service, a status page is available at http://openNACServer/status, which returns JSON like the following:
{"db":1,"queue":{"pending_jobs":0,"running_jobs":0,"available_workers":5}}
The db field must be 1; the queue values depend on your queue configuration and usage.
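The status page check described above could be automated along the following lines. This is a minimal sketch, not part of the product: the function names and the `min_workers` threshold are assumptions made for illustration, and `openNACServer` is the placeholder host name used in the text.

```python
import json
from urllib.request import urlopen


def evaluate_status(payload, min_workers=1):
    """Return a list of problems found in the status JSON (empty if healthy)."""
    status = json.loads(payload)
    problems = []
    # The text requires db to be 1; accept both the number and the string form.
    if str(status.get("db")) != "1":
        problems.append("db is not 1")
    # Assumption: fewer available workers than min_workers counts as a problem.
    # The right threshold depends on your queue configuration and usage.
    queue = status.get("queue", {})
    if queue.get("available_workers", 0) < min_workers:
        problems.append("too few available workers")
    return problems


def fetch_status(url, timeout=5.0):
    """Download the status page body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A monitoring job would typically call `evaluate_status(fetch_status("http://openNACServer/status"))` and raise an alert when the returned list is not empty.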
5.1.1.1.3. Processes and Events to be monitored
The following services can be externally monitored:
httpd
krb5kdc
radiusd
opennac
mysqld
RADIUS log events to monitor:
More than 100 authentication failures per minute.
More than 50 duplicate-request errors per minute.
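The two per-minute thresholds above can be evaluated once the relevant log events have been reduced to timestamps. The sketch below assumes that this extraction has already been done; the RADIUS log format is not described here, so no parsing is attempted, and the function and constant names are hypothetical.

```python
from collections import Counter
from datetime import datetime

AUTH_FAIL_LIMIT = 100  # from the text: authentication failures per minute
DUPLICATE_LIMIT = 50   # from the text: duplicate-request errors per minute


def minutes_over_limit(timestamps, limit):
    """Return the minutes (as datetimes truncated to the minute) whose
    event count is strictly greater than limit, in ascending order."""
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(minute for minute, count in per_minute.items() if count > limit)
```

Calling `minutes_over_limit(auth_failure_times, AUTH_FAIL_LIMIT)` yields the minutes that should trigger an alert; the same function works for duplicate-request errors with `DUPLICATE_LIMIT`.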
5.1.1.1.4. Healthcheck
Out-of-the-box ON Core instances check different modules. For each ON Core role, the checked modules are listed below:
5.1.1.1.4.1. ON Principal
To configure the ON Principal healthcheck, see the healthcheck configuration section. The following modules are checked:
AD_DOMAIN_MEMBER
ADM_USER_PASSWD_EXPIRATION
BACKEND
BACKUP
CACHE
CAPTIVE_PORTAL
CAPTIVE_PORTAL_THEMES
COLLECTD
DB
DISK_BACKUP
DISK_ROOT
DISK_TMP
DISK_VAR
DISK_VAR_LOG
DNS
FILEBEAT
HTTP_CERTIFICATE
LDAP
LOGCOLLECTOR
NTLM
PORTAL
QUEUE
RADIUS
RADIUS_CERTIFICATE
RAM
SYSTEM_INFO
SYSTEM_LOAD
TIME_SYNC
UDS
VPN_NODES
WINBIND
5.1.1.1.4.2. ON Worker
To configure the ON Worker healthcheck, see the healthcheck configuration section. The following modules are checked:
AD_DOMAIN_MEMBER
BACKEND
BACKUP
CACHE
CAPTIVE_PORTAL
CAPTIVE_PORTAL_THEMES
COLLECTD
DB
DBREPLICATION
DHCPHELPERREADER
DISK_BACKUP
DISK_ROOT
DISK_TMP
DISK_VAR
DISK_VAR_LOG
DNS
FILEBEAT
HTTP_CERTIFICATE
LOGCOLLECTOR
NTLM
NXLOG
PORTAL
QUEUE
RADIUS
RADIUS_CERTIFICATE
RAM
SYSTEM_INFO
SYSTEM_LOAD
TIME_SYNC
UDS
VPN_NODES
WINBIND
5.1.1.1.4.3. ON Proxy
To configure the ON Proxy healthcheck, see the healthcheck configuration section. The following modules are checked:
BACKUP
COLLECTD
DISK_BACKUP
DISK_ROOT
DISK_TMP
DISK_VAR
DISK_VAR_LOG
DNS
LOGCOLLECTOR
RADIUS
RADIUS_CERTIFICATE
RAM
SYSTEM_INFO
SYSTEM_LOAD
TIME_SYNC
5.1.1.1.4.4. ON Portal
To configure the ON Portal healthcheck, see the healthcheck configuration section. The following modules are checked:
BACKUP
CACHE
CAPTIVE_PORTAL_THEMES
COLLECTD
DISK_BACKUP
DISK_ROOT
DISK_TMP
DISK_VAR
DISK_VAR_LOG
DNS
HTTP_CERTIFICATE
LOGCOLLECTOR
PORTAL
RAM
SYSTEM_INFO
SYSTEM_LOAD
TIME_SYNC
5.1.1.1.5. Monitor User
The monitor user is a utility integrated into the product that lets us know whether RADIUS server authentications are working.
This user carries out an authentication process against the RADIUS server every minute by sending polevals that simulate the authentication process.
The result of this authentication process is shown in the ON NAC > Default view > Unassigned window. This user always has the MAC address 00:00:00:00:00:00, the IP address 0.0.0.0, and the user name “monitor”.
If the Last Access value of this user is more than 1 minute old, we can conclude that either the RADIUS server is down or the authentication process is not working as it should.
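The Last Access rule can be expressed as a simple age comparison. The sketch below assumes the Last Access value has already been obtained as a datetime by some other means; the function name is hypothetical, and the one-minute default comes from the text.

```python
from datetime import datetime, timedelta


def monitor_user_is_stale(last_access, now, max_age=timedelta(minutes=1)):
    """Return True when the monitor user's last access is older than max_age."""
    return now - last_access > max_age
```

In practice a small tolerance above one minute may be preferable, since the check itself runs every minute and a reading taken just before the next run would otherwise look stale.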
Important
To prevent unwanted access to the system through this monitor user, it is important to create a policy that restricts its access. This way, the monitoring tool can be used without compromising security. An example of such a policy is shown below; it must be placed at position #1 in the list of policies.
When creating this user, it is also important to configure a complex password and not to write it down.
Name: Monitor
Enabled: YES
- Preconditions: Users (the user must be created if it does not exist)
User ID: monitor
E-mail: monitor@monitor.es
Password: <Random>
TTL (in minutes): 0
- Preconditions: Sources
Supplicant User: YES
User: YES
- Postconditions
- VLAN
ID: 4095
Type: Service
VLAN by default: false
Name: ACCESS DENIED
This username and password will be used by the different network devices to check the status of the RADIUS server.