4.7.8. Troubleshooting

This section explains how to troubleshoot issues related to the VPN connection. To do this correctly, keep in mind the flow of the VPN connection, from the moment the user connects until the connection is established. This process consists of 8 steps, all of which are shown in the following diagram.

Each part of the flow can be described as follows:

The initial setup of the agent, that is, its download and installation.
The initial connection between the agent and the Core, to carry out the profiling of the device.
The generation of the user QR Token.
The establishment of the VPN tunnel.
The sending of the user credentials.
The validation of the 2FA Token.
The validation of the user’s password, which involves communication with the AD.
Once the VPN tunnel is established, the connectivity to the corporate network and the access to the desired servers/services.

To solve problems quickly, it is necessary to understand:

In which part of the flow described above the problem occurs.
Which sources of information are available to find where the problem comes from:
- Business Profiles (seen in the previous section).
- Dashboards (seen in the previous section).
- Logs.

4.7.8.1. Flows

4.7.8.1.1. User fails to download/install ON Agent

In this case, the user is not able to download or install the Agent. Check the following points:

../../_images/flow_user_fails_download.png

On the Windows device:
- If the problem is related to the download: verify that there is connectivity to the VPN gateway.
- If the problem is related to the installation: check the errors reported by Windows.
On the VPN gateway:
- Verify that requests are being received on port 443. For this, you can use the tcpdump command from the command line (remember that, by default, the command line can only be accessed from the CMI).
In the ON Core:
- It is possible that the remote client reaches the download portal correctly, but the download itself fails. Check the Apache logs.
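The port 443 check on the VPN gateway can be sketched as a small shell helper. This is only an illustration: the function name `capture_https`, the default interface `any`, and the packet limit are assumptions and not part of the product. The tcpdump options used (`-n`, `-i`, `-c`) are standard ones.

```shell
#!/usr/bin/env bash
# Capture a limited number of packets to or from TCP port 443.
# Usage: capture_https [interface] [packet_count]
# The interface name and the packet count are illustrative defaults.
capture_https() {
    local iface="${1:-any}"   # "any" listens on all interfaces (Linux)
    local count="${2:-20}"    # stop after this many packets
    # -n: do not resolve addresses or ports to names
    tcpdump -n -i "$iface" -c "$count" 'tcp port 443'
}
```

For example, `capture_https eth0 50` would stop after 50 packets seen on a hypothetical interface `eth0`. If nothing is captured while the user retries the download, the requests are probably not reaching the gateway.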

4.7.8.1.2. The Visibility input does not appear in the system (Agent installed)

The Agent, after its download and installation, periodically analyzes the device to send information to the Backend CORE. This type of communication is shown in the Dashboard as “Visibility” and is used for profiling, that is, checking compliance with the security policy of the device that wants to access the VPN.

If the Agent is installed, and no visibility events are received, it means that there is a problem with the connection between the Agent and the Core. If this communication does not exist, the user will not be able to connect to the VPN since one of the requirements is to know the security compliance status of the device. It would be necessary to review the following points:

On the Windows device:
- Check that the agent is active. If it is not, restart it.
- Look in the agent logs inside the %ProgramData%\OpenCloudFactory\opennac-agent\logs\ folder and check if it is sending the requests correctly to the Core.

We can force the agent to send its information to the Core: right-click the agent icon and select the “Refresh Status” option from the agent menu.

Check that the IP configured in the agent configuration file, %ProgramData%\OpenCloudFactory\opennac-agent\config.ini, correctly points to the Core. If it is not correct, modify it.

Check whether there is communication between the remote device and the Core. The connection is made through HTTPS using port 443/tcp, and it is possible that it is being blocked by an intermediate firewall.

4.7.8.1.3. Problems with the user’s QR token

If the user experiences problems with the authentication token, there are two main possible causes:

A failure when the token is used, that is, when the user tries to connect to the VPN.
A failure in receiving the QR code, which leaves the user without access to the OTP.

Check the following points:

In the IDP (AD/LDAP):
- Check that the user’s email address is correct. NOTE: this can also be verified from the Core.
In the Core:
- Check that the user is registered in the Core.
- Verify that the Core can send emails.
- Check that the user’s email address is correct by following the steps in “Verify user”. When a user is registered, the address used to send the email with the Token for Google Authenticator is saved in a local database, so you can compare the user’s email in the local database with the one in the IDP (AD/LDAP).
- Check that the Token is correct by following the steps in “Verify Token”.
On the user device:
- Verify that the email sent with the Token is not in the SPAM folder.

4.7.8.1.4. Trouble starting the tunnel

When the user cannot start the VPN connection, we need to check:

On the Windows device:
- Check that the agent is active. If it is not, restart it.
- If there are no visibility entries in the Core, repeat the actions described in the section “The Visibility input does not appear in the system (Agent installed)” on this page.
On the VPN gateway:
- Verify that the OpenVPN service is active from the command line (remember that, by default, the command line can only be accessed from the CMI).
  If the service is not running, start it.
  
  If necessary, restart it. Note that restarting the service disconnects any other connected users, so restart it only if absolutely necessary.
- Verify that traffic is being received through the VPN tunnel interface. For this, you can use the tcpdump command from the command line (remember that, by default, the command line can only be accessed from the CMI).

4.7.8.1.5. User credentials

The user has credential issues (OTP or wrong password) so they cannot start the VPN connection. Check the following points:

In the Core:
- Check whether the connection error is related to the credentials or to the OTP:
  Go to the menu “ON NAC -> Business Profiles -> Default View” to review the connection error message.
  
  Go to the “Analytics -> Use Cases -> VPN” menu and look at the bottom of the Dashboard for the list of connections.
- This narrows down whether the problem is with the credentials or with the OTP. If it is with the OTP, check the actions in the next chapter.
- Check the RADIUS log from the command line: /var/log/radius/radius.log.
At the VPN gateway:
- Check the logs from the command line, starting with /var/log/viapps/openvpn/cmix-openvpn-<VPN_NAME>.log (remember that, by default, the command line can only be accessed from the CMI).
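When reviewing the RADIUS log for a credential problem, it usually helps to look only at the entries for the affected user. The following is a minimal sketch of that lookup. The function name `radius_user_log` and its defaults are assumptions for illustration, and it makes no assumption about the log line format beyond the username appearing in the line.

```shell
#!/usr/bin/env bash
# Show the most recent RADIUS log entries that mention a given user.
# Usage: radius_user_log <username> [logfile] [max_lines]
radius_user_log() {
    local user="$1"
    local logfile="${2:-/var/log/radius/radius.log}"
    local max_lines="${3:-20}"
    # -F treats the username as a fixed string, not a regular expression.
    grep -F -- "$user" "$logfile" | tail -n "$max_lines"
}
```

For example, `radius_user_log jdoe` prints the last 20 entries for a hypothetical user `jdoe`, which can then be compared with the error shown in the Default View.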

4.7.8.1.6. Wrong token detection on connection

The user has problems with the 2FA (OTP managed with Google Authenticator), so they cannot initiate the VPN connection. Check the following points:

In the Core:
- Check whether the connection error is related to the OTP:
  Go to the menu “ON NAC -> Business Profiles -> Default View” to review the connection error message.
  
  Go to the “Analytics -> Use Cases -> VPN” menu and look at the bottom of the Dashboard for the list of connections.
- This determines whether the problem is an incorrect OTP or something else. If the problem is with the OTP, repeat the actions described in the section “Problems with the user’s QR token” on this page.
- Check the RADIUS log from the command line: /var/log/radius/radius.log.
At the VPN gateway:
- Check the logs from the command line, starting with /var/log/viapps/openvpn/cmix-openvpn-<VPN_NAME>.log (remember that, by default, the command line can only be accessed from the CMI).

4.7.8.1.7. AD password error

The user is having problems with credentials. Check the following points:

In the Core:
- Check whether the connection error is related to the user credentials:
  Go to the menu “ON NAC -> Business Profiles -> Default View” to review the connection error message.
  
  Go to the “Analytics -> Use Cases -> VPN” menu and look at the bottom of the Dashboard for the list of connections.
- If the credentials have been identified as incorrect, check whether the username and password are being entered correctly:
  Verify that the same credentials are used in the IDP (AD/LDAP).
  
  Check the RADIUS log from the command line: /var/log/radius/radius.log.

If the identified error is “no logon servers”, review the actions described in the section “No logon servers (communication problem with the AD)” on this page.

4.7.8.1.8. The user does not get connectivity against the expected servers

In this case, the user establishes the VPN connection successfully but is unable to reach the servers they are authorized to access. Examine the following points:

../../_images/flow_user_expected_conn.png

In the Backend Core:
- Verify in the Business Profiles which policy the user matched:
  If the user matched the wrong policy, they will not be able to access the desired servers. The next step is to find out why the user matched that policy, for example, because the device does not meet the compliance requirements.
  
  If the user matched the correct policy, check why the user is not being associated with the dynamic zone configured in the VPN gateway. For example, the zone may not be sent, or may be sent incorrectly, in the “Extra Radius Param” configuration.
On the VPN gateway:
- Check the policies.
- Check the filter rules.
- Verify the names of the dynamic zones: they must exactly match the ones configured in the Backend policy within “Extra Radius Param” (in our example, “vpnusr” and “vpnadm”).

4.7.8.2. Logs

This section describes the logs that are relevant for obtaining information about problems related to the VPN connection.

4.7.8.2.1. At the VPN gateway

Depending on the VPN provider in use, we can find different logs:

4.7.8.2.1.1. OpenVPN

This log contains all the information related to OpenVPN, including the actions performed and the errors that occur. It is located at /var/log/viapps/openvpn/cmix-openvpn-<NAME_VPN>.log

It can also be accessed via the web portal by going to Manage -> VPN RoadWarrior and selecting the configured VPN tunnel and clicking View Logs.

../../_images/openvpn_road_warrior_logs.png

../../_images/openvpn_road_warrior_logs_example.png
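From the command line, the OpenVPN log described in this section can be filtered for problem entries. The sketch below is illustrative only: the function name `vpn_log_errors` is hypothetical, and the keywords `error`, `fail` and `denied` are an assumed starting filter, not a list of messages documented for the product.

```shell
#!/usr/bin/env bash
# Print the most recent lines of an OpenVPN tunnel log that look like problems.
# Usage: vpn_log_errors <vpn_name> [max_lines] [log_dir]
vpn_log_errors() {
    local vpn_name="$1"
    local max_lines="${2:-20}"
    local log_dir="${3:-/var/log/viapps/openvpn}"
    local logfile="${log_dir}/cmix-openvpn-${vpn_name}.log"
    if [ ! -r "$logfile" ]; then
        echo "cannot read ${logfile}" >&2
        return 1
    fi
    # Case-insensitive match on a few generic keywords (an assumption).
    grep -i -E 'error|fail|denied' "$logfile" | tail -n "$max_lines"
}
```

An empty result does not prove that the tunnel is healthy. It only means that none of the assumed keywords appear, so the full log should still be reviewed when the problem persists.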

4.7.8.2.1.2. Wireguard

This log contains all the information related to WireGuard, including the actions performed and the errors that occur. It is located at /var/log/viapps/wireguard/<NAME_VPN>.log

It can also be accessed via the web portal by going to Manage -> VPN RoadWarrior and selecting the configured VPN tunnel and clicking View Logs.

../../_images/wireguard_road_warrior_logs.png

../../_images/wireguard_road_warrior_logs_example.png

4.7.8.2.2. Backend Core

The RADIUS log contains all the information related to user authentications, for example, the results of authentications and authorizations. It is located at /var/log/radius/radius.log

4.7.8.2.3. ON Agent

In this section, we will find the paths of the different logs related to the execution of the OpenNAC Agent. In these logs, we can detect service execution errors, see the processes that are executed, see the information collected by the agent service, the payload sent by the agent to the server, etc.

Windows

C:\ProgramData\OpenCloudFactory\OpenNAC Agent\OpenNACService\Logs\OpenCloudFactory.Agent.Service.Windows.log

Mac

/Library/Logs/OpenCloudFactory/OpenNAC\ Agent/OpenNACService/Logs/OpenCloudFactory.Agent.Service.Osx.log

Linux

/var/log/OpenCloudFactory/OpenNAC\ Agent/OpenNACService/Logs/OpenCloudFactory.Agent.Service.Linux.log

For the locations of the various configuration files, the Agent’s UI application logs, and other relevant resources on different operating systems, refer to the Agent’s troubleshooting section.
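On Linux and macOS devices, the service log paths listed in this section can be selected automatically from the operating system name. The following is a minimal sketch. The function name `agent_log_path` is hypothetical, and the Windows path is left out because this helper targets POSIX shells.

```shell
#!/usr/bin/env bash
# Print the ON Agent service log path for a given operating system.
# Usage: agent_log_path [os_name]
# os_name defaults to the output of `uname -s` (Linux or Darwin).
agent_log_path() {
    local os="${1:-$(uname -s)}"
    case "$os" in
        Linux)
            echo "/var/log/OpenCloudFactory/OpenNAC Agent/OpenNACService/Logs/OpenCloudFactory.Agent.Service.Linux.log"
            ;;
        Darwin)
            echo "/Library/Logs/OpenCloudFactory/OpenNAC Agent/OpenNACService/Logs/OpenCloudFactory.Agent.Service.Osx.log"
            ;;
        *)
            echo "unsupported operating system: ${os}" >&2
            return 1
            ;;
    esac
}
```

A typical use is `tail -n 50 "$(agent_log_path)"`. The quotes are needed because the path contains a space.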

4.7.8.3. Restart services

In some cases, we may need to restart the services.

To restart the VPN service:

Access the Firewall administration portal.
Go to Manage -> VPN RoadWarrior.
Select the VPN and click Manage VPN Status -> Restart.

To restart the ON Agent service:

Open the Windows services console: type Services in the Windows search box.

In the Windows services console, search for the openNAC Agent.

../../_images/windows_opennac_service.png

Select the Start or Restart option.

4.7.8.4. Identified Cases

This section explains some identified errors and their solutions.

4.7.8.4.1. VPN Agent - Please check your internet connection

The user enters the data in the form to connect to the VPN. The agent tries to connect and displays the following error to the user.

The user can verify that the device does have an internet connection.

This error can be caused by several reasons:

VPN service is NOT active.
There is a “dead” openvpn process.
The Windows Device does NOT meet compliance requirements.

4.7.8.4.1.1. VPN service is not active

It is possible that the openvpn service is not active and for this reason the timeout occurs.

Access the administration portal of the VPN gateway
Go to “Manage -> VPN RoadWarrior”
Check whether the name of the VPN connection appears “shadowed”, which means that the tunnel is not enabled.
Check that the “Start on Boot” switch is enabled. If it is not, enable it by clicking Edit to open the configuration.

In the form you can see that the “Start On Boot” is disabled:

Enable the Start On Boot and click Accept, so that at the next reboot it will boot automatically.
Start the tunnel manually by selecting it and clicking on Start in the top menu.

4.7.8.4.1.2. There is a “dead” OpenVPN process

OpenVPN uses three processes for its operation. If for any reason any of these processes dies, the service will not work properly.

To verify that all three OpenVPN processes are active:
- Access the VPN gateway via SSH (remember that, by default, this can only be done from the CMI).
- Check that the openvpn processes are running:
  ps aux | grep openvpn

In the image you can see that there are two processes running and one is “defunct”. Therefore, the service is NOT behaving properly.

Restart the OpenVPN service by following the steps seen in section Restart services
Access the VPN service again (from the Windows device).
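The process check in this section can also be summarized automatically. The sketch below reads the output of `ps -eo pid,stat,comm` and counts the openvpn processes, flagging those in the zombie ("defunct") state, which `ps` reports with a STAT value starting with `Z`. The function name `openvpn_process_report` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Summarize openvpn processes from `ps -eo pid,stat,comm` output on stdin.
# Usage: ps -eo pid,stat,comm | openvpn_process_report
openvpn_process_report() {
    awk '
        # Column 1 is the PID, column 2 the state, column 3 the command name.
        $3 ~ /openvpn/ {
            total++
            if ($2 ~ /^Z/) {          # Z = zombie, shown as <defunct>
                defunct++
                print "defunct pid: " $1
            }
        }
        END { printf "openvpn processes: %d, defunct: %d\n", total, defunct }
    '
}
```

According to this section, three processes are expected. A lower total, or any defunct entry, suggests that the service should be restarted as described above.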

4.7.8.4.1.3. Windows device does not meet the requirements

If the device does NOT comply with the configured compliance policy (by default, antivirus and firewall activated and updated), defined by the Tag EPC_SECURITY_COMPLIANCE of the Tag Policies, it will not be able to connect. Look for the connection in the Business Profiles or the Dashboards, where a message indicates that there is NO compliance.

To solve it, it is necessary to find out what is not being fulfilled in the user’s device.

To do this, expand the connection by clicking the + on the right side. At the bottom, the Security table shows which of the Tags associated with the compliance policy the device is missing. Fix the problem on the remote device and try to connect again.

4.7.8.4.2. VPN Agent - Invalid credentials

The user enters the VPN connection data, the agent tries to connect and displays the following error message to the user.

This error message can be displayed for various reasons:

The user entered an incorrect password
Wrong OTP error
No logon servers (Communication problem with AD/LDAP configured in Backend)

To search for the specific problem:

Access the Backend web portal.
Go to Analytics -> Use Cases -> VPN or ON NAC -> Business Profiles -> Default View.
Analyze the error that is shown.

Once the possible cause has been identified, it will be possible to carry out the corresponding troubleshooting.

4.7.8.4.2.1. The user entered an incorrect password

If the user has entered the wrong password, check the following:

Whether the user has entered the password incorrectly several times, causing the account to be locked in the IDP (AD/LDAP).
Whether the password being entered is the one stored in the AD/LDAP, that is, that it has not been modified by the administrator or the user after the first connection.

Note

In these cases we will have to talk to the AD/LDAP administrator

4.7.8.4.2.2. OTP error

When we encounter the wrong OTP error, it can be caused by several reasons:

The user entered the wrong OTP code
The user does NOT have 2FA activated
The user does NOT exist in the AD/LDAP

THE USER ENTERED THE WRONG OTP CODE
In this case, make sure that the code entered by the user is correct. To check the code inside the Backend Core:
1. Go to Configuration -> OTP
2. Click on Check Token

Enter the OTP code shown in the user’s Google Authenticator to check whether the Token is valid. If it is still invalid after several attempts, regenerate the Token and send it again. If the problem persists, check the other cases: “The user does NOT have 2FA activated” or “The user does NOT exist in the AD/LDAP”.

THE USER DOES NOT HAVE 2FA ACTIVATED
In this step, it is necessary to verify that the user has 2FA enabled in the Backend Core. To do so we will have to:
1. Go to the menu Configuration -> OTP
2. Enter the username in the Search input.

If the user exists, clicking the + button displays the following form with the user information.

THE USER DOES NOT EXIST IN THE AD/LDAP
In this case, it is necessary to verify that the user exists in the AD/LDAP.
1. Go to ON CMDB -> Security -> Admin Users.
2. Choose the data source corresponding to AD/LDAP.
3. Using the search box, search for the corresponding user

In case the user does not exist, no results will be seen.

4.7.8.4.2.3. No logon servers (communication problem with the AD)

This error is shown when the Backend Core cannot authenticate against the AD. To verify whether the communication is working:

Connect via SSH to the Core.
Examine the /etc/resolv.conf file.
The DNS is probably not pointing to the AD, so the domain cannot be resolved and the AD cannot be found.
Make sure there is a line similar to the one in the following example. Otherwise, change it to the IP of your AD or of a DNS server that correctly resolves the AD domain:

nameserver 172.16.30.20

Once the IP is changed, restart the Kerberos, Samba, and Winbind services:

/etc/init.d/krb5kdc restart && service smb restart && service winbind restart

Check that there is a connection with the AD. You can use the following command:

ntlm_auth --username=<username>

<username>: the username of a registered user. The command asks for the user’s password.

In the following image, you can see how after restarting the services it is possible to establish the connection with the AD. (If a wrong password is entered, it gives an error, and if the password is valid, it shows the success message).

Finally, verify that the user is able to connect through the VPN.
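The /etc/resolv.conf check described in this section can be sketched as two small helpers: one that lists the configured DNS servers and one that tests whether a given IP is among them. The function names `list_nameservers` and `has_nameserver` are hypothetical, and the IP in the usage example is the one from the example line above, not a value to copy.

```shell
#!/usr/bin/env bash
# List the DNS servers configured in a resolv.conf-style file.
# Usage: list_nameservers [file]
list_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    # Only lines whose first field is exactly "nameserver" are considered,
    # so comments and "search"/"domain" lines are ignored.
    awk '$1 == "nameserver" { print $2 }' "$file"
}

# Succeed (exit status 0) if the given IP is configured as a nameserver.
# Usage: has_nameserver <ip> [file]
has_nameserver() {
    local ip="$1"
    local file="${2:-/etc/resolv.conf}"
    # -x: whole-line match, -F: fixed string, -q: no output
    list_nameservers "$file" | grep -q -x -F -- "$ip"
}
```

For example, `has_nameserver 172.16.30.20 || echo "AD DNS is not configured"` prints a warning when the expected server is missing from /etc/resolv.conf.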

4.7.8.4.3. VPN connected, but with incorrect access

The user enters the data to connect to the VPN and the agent connects successfully.

Through the VPN gateway web interface, we see the remote user connected to the tunnel.
Also through the VPN gateway web interface, we observe that the remote user is connected to the tunnel but is NOT in the correct dynamic zone.
In the Default View of the Core, we can see whether the user matched the correct or an incorrect policy.

This error can occur for several reasons:

The device does not comply with the security compliance policy.

The Extra Params have not been configured.

To find out what the specific problem is:

Go to Analytics -> Use Cases -> VPN or ON NAC -> Business Profiles -> Default View

Go to the radius log to see what could be happening

Go to the VPN gateway log to see if the parameter is correct

Go to ON NAC -> Policies

Once the possible cause has been identified, it is possible to carry out the corresponding troubleshooting.

4.7.8.4.3.1. The security compliance policy is not followed

If the device that connects to the VPN does not comply with the required security posture (EPC_SECURITY_COMPLIANCE), the user’s access may be validated with the Default policy (product policy). In that case, the user brings up the tunnel but has no access to the network, because no dynamic zone is applied in the VPN gateway and therefore none of the access policies is applied.

The user appears connected to the VPN.

But it does NOT appear in any of the dynamic zones.

In the Radius log, we see Login OK, but we also see that it sends a VLAN that we don’t know about.

In the VPN gateway log, we have access, but we see that it does not have a VSA Attribute.

As a summary, we have validated a domain user, which is within the group that we selected in the filter, but something is wrong.

If we go to the Default View of the Core (ON NAC -> Business Profiles -> Default View) and look for the connection, we will see that it entered through the Default policy. Click + in the right column to see more information about it and look for the TAGS: the device complies with some of the “ISS” type TAGS but not all, so it does not comply with EPC_SECURITY_COMPLIANCE. In other words, a valid user has been allowed to access the VPN, but their device is not compliant.

Check the policies in the Backend Core. If the Reject Policy has not been created, create it to avoid this type of incorrect access: it blocks access for a valid user whose device is not compliant.

4.7.8.4.3.2. No extra params have been configured

If the Extra Params have not been configured, or are misconfigured, the remote client will not be associated with the dynamic zone configured in the VPN gateway, despite matching the correct policy. As a result, the associated ACLs are not applied and the client does not have the correct access to the network.

The user appears connected to the VPN:

But it does NOT appear in any of the dynamic zones.

In the RADIUS log, we see Login OK, and we see that it sends a correct VLAN.

In the VPN gateway log, we have access, but we see that it does not have a VSA Attribute.

If we go to the Default View of the Core (ON NAC -> Business Profiles -> Default View) and look for the connection, we will see that the user entered through the correct policy.

Click + to see more information about it and look for the TAGS. The device complies with EPC_SECURITY_COMPLIANCE. A valid user has been allowed to access the VPN, but is not placed in the expected dynamic zone, so something is still missing.

As a summary, we have validated a domain user who is within the group that we selected in the filter and whose device complies with the compliance policy, but who is not associated with the dynamic zone.

Review the access policies.

Access ON NAC -> Policies and click on the Edit button.

The correct parameter name is Cisco-AVPair, written with a hyphen (“-”).