3.2.2.2. ON Analytics cluster
In this section, we are going to see how to configure an Elasticsearch cluster. First, we need to understand the following concepts:
ON Analytics clusters are normally configured without an aggregator, so we must first configure the analytics nodes without an aggregator.
3.2.2.2.1. Cluster
A cluster is a collection of connected nodes identified by its cluster name. Each node in a cluster knows about the other nodes. Each node can accept a client request and forward it to the appropriate node.
It is important to have a resilient, redundant cluster. If a node goes down, the replicas of the primary shards on that node are promoted on the nodes that remain up, and new replicas are created.
To have a resilient cluster we need the following (a quick health check is sketched after this list):
At least three master nodes
At least two nodes of each role
At least one replica per primary shard
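As a quick sanity check of these requirements, the cluster health API reports the number of nodes, data nodes, and unassigned shards. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200:
curl -XGET "localhost:9200/_cluster/health?pretty"
A resilient cluster should report "status": "green" and zero unassigned shards.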
3.2.2.2.2. Node
A node is a single Elasticsearch instance composed of compute and disk.
It can take multiple roles. The most important roles are:
Master-eligible node: A node that has the master role, which makes it eligible to be elected as the master node that controls the cluster. This role is in charge of the following actions in the cluster:
Cluster settings
Deleting or creating indices and settings
Adding or removing nodes
Shard allocation to the nodes
Data node: A node that has the data role. Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations. A node with the data role can fill any of the specialized data node roles.
Ingest node: A node that has the ingest role. Ingest nodes can apply an ingest pipeline to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to remove the ingest role from nodes that have the master or data roles.
Small clusters allow some nodes to hold the data and master roles at the same time. Large clusters, in contrast, need dedicated master nodes separated from the data nodes. The dedicated masters do not need the same computing resources as the data nodes.
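To see which node currently holds the elected master role, the _cat/master API can be queried from any node. A minimal sketch, assuming local access on port 9200:
curl -XGET "localhost:9200/_cat/master?v"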
3.2.2.2.3. Index
An index is used to store JSON documents in dedicated data structures corresponding to the data type of fields. An Elasticsearch index is divided into shards.
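As an illustration, the following commands create a hypothetical index named example-index with two shards and one replica, then list its shards; the index name and numbers are placeholders:
curl -XPUT "localhost:9200/example-index" -H 'Content-Type: application/json' -d '{"settings": {"number_of_shards": 2, "number_of_replicas": 1}}'
curl -XGET "localhost:9200/_cat/shards/example-index?v"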
3.2.2.2.5. One analytics
In this case, we only have one Elasticsearch node, so all the information will be ingested in it.

* mdi = master, data and ingest roles
If the onanalytics node goes down, data will be lost because it cannot be ingested.
This kind of architecture does not require the cluster configuration. The default configuration should be valid; we only need to configure the following parameter correctly.
vi /etc/elasticsearch/elasticsearch.yml
Set the onanalytics IP:
cluster.initial_master_nodes: ["10.10.36.175"]
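For reference, a minimal single-node elasticsearch.yml could look like the following; the cluster name, node name, and IP are illustrative placeholders:
cluster.name: onanalytics
node.name: onanalytics01
network.host: 10.10.36.175
cluster.initial_master_nodes: ["10.10.36.175"]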
3.2.2.2.6. Two analytics
2 nodes 1 replica
Warning
Do not use this setup in client deployments.
In this case we have a cluster with two nodes. Both nodes have the data, master and ingest roles. In this scenario, it is necessary to configure one replica so that there is always a backup shard.

* mdi = master, data and ingest roles
It is not recommended to have only two nodes. Imagine that onanalytics01 is the elected master. If onanalytics02 goes down, its shards are promoted on onanalytics01, and the cluster keeps working without data loss. But if the node that goes down is the elected master, onanalytics01 in this case, the cluster fails, because onanalytics02 has no other active master-eligible node with which to hold a master election. In this second case, the data ingested while the cluster is down is lost.
2 nodes 1 replica with a tiebreaker
Warning
If possible, use example 3.
In this case we have the same scenario as in example 1. The difference is that we add a voting-only master node (a tiebreaker) that helps keep the cluster up when a node fails. This avoids data loss. A minimal configuration for the tiebreaker is sketched below.

* mdi = master, data and ingest roles
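A minimal sketch of the tiebreaker node's role settings, assuming an Elasticsearch version that supports the node.voting_only setting (7.3 or later). A voting-only node participates in master elections but never becomes master and holds no data:
node.master: true
node.voting_only: true
node.data: false
node.ingest: false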
3.2.2.2.7. Three analytics
In this case, we have a cluster with three nodes. All three nodes have the data, master and ingest roles. In this scenario, it is necessary to configure one replica so that there is always a backup shard.

* mdi = master, data and ingest roles
In this case, if one node goes down, two masters remain up, so no data is lost and the cluster does not go down.
3.2.2.2.8. Four analytics
The minimum number of analytics nodes for a multizone cluster is four. In this case we have two zones with two master nodes each.

* mdi = master, data and ingest roles
If one of the zones goes down, the other zone still has two masters, so a master election is possible and the cluster stays up.
3.2.2.2.9. Five analytics
In this case, we have two zones with two masters each, plus one data node in one of the zones. One zone will therefore have more disk space than the other.

* mdi = master, data and ingest roles
The necessary disk space must be considered, because if the zone with three nodes goes down, the zone with two nodes must have enough capacity to store all the information.
3.2.2.2.10. Six analytics
2 zones with 3 nodes
In this example, we can see two different zones. It is recommended to separate the Elasticsearch nodes into zones when they are connected to different network devices: when a zone's device goes down, all the nodes in that zone may go offline.

* mdi = master, data and ingest roles
It is necessary to have two masters in every zone because, if one zone goes down, the other zone keeps two master nodes and the cluster stays up.
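To make Elasticsearch spread primaries and replicas across zones, rather than merely prefer it, shard allocation awareness can be forced. A minimal sketch, assuming the zone attribute is named rack as in the cluster configuration section below, with illustrative zone IDs zone1 and zone2:
node.attr.rack: zone1
cluster.routing.allocation.awareness.attributes: rack
cluster.routing.allocation.awareness.force.rack.values: zone1,zone2
With forced awareness, if one zone goes down, the replicas belonging to that zone stay unassigned instead of overloading the surviving zone.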
2 zones with 3 nodes and 1 voting master
In this example, each zone has one master, and a third node in a separate zone provides an additional voting master. This third zone helps avoid cluster failure when one zone goes down, because the cluster keeps two active masters.

* mdi = master, data and ingest roles
An example of the shard distribution with one replica and two zones looks similar to the following:

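To inspect the actual shard placement on a running cluster, the _cat/shards API lists each shard together with the node that holds it:
curl -XGET "localhost:9200/_cat/shards?v"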
3.2.2.2.11. More than six analytics
If we have more than six analytics nodes, it is necessary to create zones corresponding to the number of client sites. Then, we need to distribute two master nodes to each zone. The remaining nodes will be data nodes, distributed evenly among the zones.

* mdi = master, data and ingest roles
3.2.2.2.11.1. Iptables
There are some iptables rules needed for ELK communication on the onanalytics nodes. We need to edit the file:
vi /etc/sysconfig/iptables
The Logstash pipelines send data to Elasticsearch on port 9200. The ON Cores access the Elasticsearch data on port 9200. And the Elasticsearch cluster needs port 9300 to be accessible from the other onanalytics nodes in the cluster:
-A INPUT -s {{ onaggregatorXX_ip }} -p tcp -m state --state NEW -m tcp --dport 9200 -j ACCEPT
-A INPUT -s {{ oncoreXX_ip }} -p tcp -m state --state NEW -m tcp --dport 9200 -j ACCEPT
-A INPUT -s {{ onanalyticsXX_ip }} -p tcp -m state --state NEW -m tcp --dport 9300 -j ACCEPT
It is necessary to restart the service:
systemctl restart iptables
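To verify that the rules are active, we can list the INPUT chain and filter for the Elasticsearch ports:
iptables -nL INPUT | grep -E '9200|9300'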
3.2.2.2.11.2. Cluster Configuration
Note
Remember to perform the Basic Configuration before proceeding.
Before creating a cluster, we need to delete all the Elasticsearch data. If there is important information in Elasticsearch, we will need to export the data before deleting it and import it once the cluster is up again (an export sketch follows the commands below). If we want to add a node to an existing cluster, we also need to delete the data on the new node.
systemctl stop elasticsearch
rm -rf /var/lib/elasticsearch/nodes
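If the existing data must be preserved, one possible export path is the built-in snapshot API. A minimal sketch, assuming a filesystem repository whose location is listed under path.repo in elasticsearch.yml; the repository name, path, and snapshot name are illustrative:
curl -XPUT "localhost:9200/_snapshot/backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mnt/es_backup"}}'
curl -XPUT "localhost:9200/_snapshot/backup/pre_cluster?wait_for_completion=true"
After the cluster is rebuilt, the same repository can be registered again and the snapshot restored with POST _snapshot/backup/pre_cluster/_restore.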
3.2.2.2.12. Master
First, we need to edit the elasticsearch configuration file:
vi /etc/elasticsearch/elasticsearch.yml
For a Master node we need to edit the following lines:
The cluster name. It needs to be the same on all the analytics in the cluster.
cluster.name: {{CLUSTER_NAME}}
The node name.
node.name: {{NODE_NAME}}
The zone configuration, if necessary. All the nodes in the same zone need to have the same zone ID.
node.attr.rack: {{ZONE_ID}}
cluster.routing.allocation.awareness.attributes: rack
The discovery seed hosts. We need to add the masters' IPs, excluding the node's own IP.
discovery.seed_hosts: ["{{ MASTER2_IP }}","{{ MASTER3_IP }}"]
The cluster initial master nodes. We need to add the masters' node names.
cluster.initial_master_nodes: ["{{ MASTER1_NODE_NAME }}","{{ MASTER2_NODE_NAME }}","{{ MASTER3_NODE_NAME }}"]
We need to add the following lines to enable the master, data, and ingest roles:
node.master: true
node.data: true
node.ingest: true
Save the file and restart the Elasticsearch service.
systemctl restart elasticsearch
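Putting the fragments together, a complete master-node elasticsearch.yml could look like the following; all names and IPs are illustrative placeholders:
cluster.name: onanalytics-cluster
node.name: onanalytics01
node.attr.rack: zone1
cluster.routing.allocation.awareness.attributes: rack
discovery.seed_hosts: ["10.10.36.176","10.10.36.177"]
cluster.initial_master_nodes: ["onanalytics01","onanalytics02","onanalytics03"]
node.master: true
node.data: true
node.ingest: true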
3.2.2.2.13. Data
First, we need to edit the elasticsearch configuration file:
vi /etc/elasticsearch/elasticsearch.yml
For a Data node, we need to edit the following lines:
The cluster name. It needs to be the same on all the analytics in the cluster.
cluster.name: {{CLUSTER_NAME}}
The node name.
node.name: {{NODE_NAME}}
The zone configuration, if necessary. All the nodes in the same zone need to have the same zone ID.
node.attr.rack: {{ZONE_ID}}
cluster.routing.allocation.awareness.attributes: rack
The discovery seed hosts, which provide a list of the addresses of the master-eligible nodes in the cluster. We need to add the masters' IPs, or names resolvable via /etc/hosts, excluding the node's own IP.
discovery.seed_hosts: ["{{ MASTER2_IP }}","{{ MASTER3_IP }}"]
The cluster initial master nodes, which set the initial set of master-eligible nodes in a new cluster. We need to add the masters' node names.
cluster.initial_master_nodes: ["{{ MASTER1_NODE_NAME }}","{{ MASTER2_NODE_NAME }}","{{ MASTER3_NODE_NAME }}"]
We need to add the following lines to set the master, data, and ingest roles; for a data node, only the data role is enabled:
node.master: false
node.data: true
node.ingest: false
Save the file and restart the Elasticsearch service.
systemctl restart elasticsearch
To check that all the roles and nodes are configured properly, we can execute the following command on any node in the cluster (note the quotes, which keep the shell from interpreting the &):
curl -XGET "localhost:9200/_cat/nodes?v=true&pretty=true"
We will see something like the following:

3.2.2.2.13.1. Number of shards and replicas
To set the default number of shards and replicas, we need to go to a master node's CLI and execute the following curl command:
curl -XPUT localhost:9200/_template/index_defaults -H "Content-Type: application/json" -d '
{
  "template": "*",
  "settings": {
    "number_of_shards": {{ NUMBER_OF_SHARDS }},
    "number_of_replicas": {{ NUMBER_OF_REPLICAS }}
  }
}'
All the indices created in the future will be divided into the given number of shards, and every shard will have the given number of replicas.
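To confirm that the template was stored as expected, it can be retrieved from any node:
curl -XGET "localhost:9200/_template/index_defaults?pretty"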
If an index already exists, its number of shards will not be modified, but we can reindex it to get the number of shards we want. To reindex an index, we need to execute the following steps:
Copy the index into a new index. It will create a new index with the new shards and replica number.
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "<INDEX_NAME>"
  },
  "dest": {
    "index": "new_index"
  }
}'
Remove the old index.
curl -XDELETE http://localhost:9200/<INDEX_NAME>
Copy the new index into the index we are reindexing:
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "new_index"
  },
  "dest": {
    "index": "<INDEX_NAME>"
  }
}'
Delete the auxiliary index created:
curl -XDELETE http://localhost:9200/new_index
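Throughout this procedure, it is prudent to compare document counts after each copy and before each delete; a mismatch means the reindex has not finished or has failed. A minimal check with the _cat/count API:
curl -XGET "localhost:9200/_cat/count/new_index?v"
curl -XGET "localhost:9200/_cat/count/<INDEX_NAME>?v"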
With the open-source tool Cerebro (Annex A - Install Cerebro), we can see that the new changes have been applied:

3.2.2.2.13.2. Elasticsearch networking
There are some parameters that can be useful in the elasticsearch.yml file:
vi /etc/elasticsearch/elasticsearch.yml
By default, Elasticsearch only binds to loopback addresses such as 127.0.0.1 and [::1]. This is sufficient to run a cluster of one or more nodes on a single server for development and testing, but a resilient production cluster must involve nodes on other servers. There are many network settings, but usually, all you need to configure is network.host:
network.host: 192.168.1.10
When you provide a value for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system start-up checks from warnings to exceptions. See the differences between development and production modes.
The network.host setting explained in Commonly used network settings is a shortcut that sets the bind host and the publish host at the same time. In advanced use cases, such as when running behind a proxy server, you may need to set these settings to different values:
network.bind_host: This specifies which network interface(s) a node should bind to in order to listen for incoming requests. A node can bind to multiple interfaces, e.g. two network cards, or a site-local address and a local address. Defaults to network.host.
network.publish_host: The publish host is the single interface that the node advertises to other nodes in the cluster, so that those nodes can connect to it. Currently, an Elasticsearch node may be bound to multiple addresses, but only publishes one. If not specified, this defaults to the “best” address from network.host, sorted by IPv4/IPv6 stack preference, then by reachability. If you set a network.host that results in multiple bind addresses yet rely on a specific address for node-to-node communication, you should explicitly set network.publish_host.
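For example, a node behind a proxy or with several interfaces could bind to all of them while advertising one routable address to the rest of the cluster; the addresses below are illustrative:
network.bind_host: 0.0.0.0
network.publish_host: 192.168.1.10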
When the parameters are changed, save the file and restart the Elasticsearch service.
systemctl restart elasticsearch
3.2.2.2.13.3. Import Data to Elasticsearch
When the Elasticsearch cluster is created and up, we need to add the following template mappings by executing the following curl commands:
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/identities' -d@/usr/share/opennac/analytics/elasticsearch/index_template_identities.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/bro' -d@/usr/share/opennac/analytics/elasticsearch/index_template_bro.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/radius' -d@/usr/share/opennac/analytics/elasticsearch/index_template_radius.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac_captive' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac_captive.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac_macport' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac_macport.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac_net_dev' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac_nd.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac_nd-' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac_nd-.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/opennac_ud' -d@/usr/share/opennac/analytics/elasticsearch/index_template_opennac_ud.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/misc' -d@/usr/share/opennac/analytics/elasticsearch/index_template_misc.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/external_syslog' -d@/usr/share/opennac/analytics/elasticsearch/index_template_external_syslog.json
curl -s --noproxy "*" -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_template/third_party_vpn' -d@/usr/share/opennac/analytics/elasticsearch/index_template_third_party_vpn.json
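To confirm that all the templates were loaded, they can be listed from any node:
curl -s "localhost:9200/_cat/templates?v"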
Then, we need to import all the Kibana customized data to the .kibana index in Elasticsearch:
bash /usr/share/opennac/analytics/scripts/elk8_update.sh -o update
3.2.2.2.13.3.1. Annex A - Install Cerebro
Cerebro is an open-source (MIT License) Elasticsearch web admin tool built with Scala, Play Framework, AngularJS and Bootstrap.
https://github.com/lmenezes/cerebro
It is important to install Cerebro on a server that has access to the Elasticsearch cluster.
To install Cerebro, we need to download the latest release from:
https://github.com/lmenezes/cerebro/releases
In our case, we are going to download the following release:
wget https://github.com/lmenezes/cerebro/releases/download/v0.9.4/cerebro-0.9.4-1.noarch.rpm
It is necessary to install it:
rpm -i cerebro-0.9.4-1.noarch.rpm
And then we need to execute it:
/bin/cerebro
The default port for Cerebro is 9000. If we want to run Cerebro on another port, we can use the following command:
bin/cerebro -Dhttp.port={{ NEW_PORT }}
When Cerebro is running, we can access it from a browser at the following URL:
http://{{ SERVER_IP }}:{{ CEREBRO_PORT }}
We will see the following:

By entering an Elasticsearch server IP, we will see all the information that Cerebro gives us about the cluster. For example:
http://{{ ELASTIC_SERVER_IP }}:9200
