ELK Stack (Elasticsearch, Logstash and Kibana) on FreeBSD - Part 3

This article is the third of a three-piece how to.

You can find the first part here and the second part here

To create alerts for our ELK setup, we can use different methods.

The one I will show you is based on ElastAlert from Yelp.

Let's install ElastAlert (no port is available, so I will install it manually in a virtualenv).

We need to be root (and use bash - for the virtualenv)

sudo su

Install py-virtualenv

portmaster devel/py-virtualenv

Create and use a virtualenv

virtualenv /usr/local/elastalert
source /usr/local/elastalert/bin/activate
mkdir -p /usr/local/elastalert/etc

Download and install the repo

mkdir /tmp/elastalert
cd /tmp/elastalert
git clone https://github.com/Yelp/elastalert.git
cd elastalert
python setup.py build
pip install setuptools --upgrade
python setup.py install
pip install -r requirements.txt
# the first time it will (probably) fail due to an error related to argparse
pip install -r requirements.txt

Create the elastalert config file in /usr/local/elastalert/etc/config.yml

rules_folder: /usr/local/elastalert/etc/rules

# The unit can be anything from weeks to seconds
run_every:
  minutes: 1

# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 15
# The elasticsearch hostname for metadata writeback
# Note that every rule can have it's own elasticsearch host
es_host: 127.0.0.1
# The elasticsearch port
es_port: 9100
# Optional URL prefix for elasticsearch
#es_url_prefix: elasticsearch
# Connect with SSL to elasticsearch
use_ssl: False

# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run

# elastalert-create-index to set a mapping
writeback_index: elastalert_status

# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2

Create the rules directory

mkdir -p /usr/local/elastalert/etc/rules

Create an alert (frequency based) that will send an email if more than 9 events will happen in 1 hour with status: 404 and type: nginx

In /usr/local/elastalert/etc/rules/frequency_nginx_404.yaml

name: Large Number of 404 Responses
es_host: 127.0.0.1
es_port: 9100
index: logstash-*
filter:
  - term:
      - status: 404
  - term:
      - type: nginx
type: frequency
num_events: 10
timeframe:
  hours: 1
alert:
  - "email"
email:
- "[email protected]"

Create an index for metadata storage

(elastalert)[dave@elk /usr/local/elastalert]$ ./bin/elastalert-create-index
Enter elasticsearch host: 127.0.0.1
Enter elasticsearch port: 9100
Use SSL? t/f: f
Enter optional basic-auth username:
Enter optional basic-auth password:
Enter optional Elasticsearch URL prefix:
New index name? (Default elastalert_status)
Name of existing index to copy? (Default None)
New index elastalert_status created
Done!

Test our rule

(elastalert)[dave@elk /usr/local/elastalert]# ./bin/elastalert-test-rule etc/rules/frequency_nginx_404.yaml
[...]

Launch ElastAlert (in a tmux session, maybe?)

(elastalert)[dave@elk /usr/local/elastalert]$ ./bin/elastalert --config etc/config.yml --debug
INFO:elastalert:Starting up
INFO:elastalert:Queried rule Large Number of 404 Responses from 2016-02-16 17:22 CET to 2016-02-16 17:37 CET: 9 hits
[...]
INFO:elastalert:Ran Large Number of 404 Responses from 2016-02-16 17:22 CET to 2016-02-16 17:37 CET: 9 query hits, 0 matches, 0 alerts sent

Let's generate some http/404 (again, it's time to let the world know how much you agree with the systemd architecture)

INFO:elastalert:Sleeping for 59 seconds
INFO:elastalert:Queried rule Large Number of 404 Responses from 2016-02-16 17:23 CET to 2016-02-16 17:38 CET: 10 hits
[...]
INFO:elastalert:Alert for Large Number of 404 Responses at 2016-02-16T16:38:00.680Z:
INFO:elastalert:Large Number of 404 Responses

At least 10 events occurred between 2016-02-16 16:38 CET and 2016-02-16 17:38 CET

@timestamp: 2016-02-16T16:38:00.680Z
@version: 1
_id: AVLq8hzNIkvyITAb373u
_index: logstash-2016.02.16
_type: nginx
agent: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36"
bytes: 564
file: /var/log/nginx/nginx.access.log
host: [
    "blog.gufi.org"
]
message: xxx.xxx.xxx.xxx - blog.gufi.org [16/Feb/2016:17:37:57 +0100] "GET /test/foo/bar HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36" "141.101.98.223" [-] "-" "-" "0.000"
offset: 1853436
remote_addr: xxx.xxx.xxx.xxx
request_httpversion: 1.1
request_time: 0.000
request_url: /test/foo/bar
request_verb: GET
status: 404
time_local: 16/Feb/2016:17:37:57 +0100
type: nginx
upstream_addr: -
xforwardedfor: "xxx.xxx.xxx.xxx"

Once removed the --debug parameter, we will start receiving e-mails. (yay!)

Before leaving

Few words to answer the question 'Ok, so how can this be useful to me/my company/my fiancee?'

Many of us, sysadmins (or gods), have used Nagios and its derivatives (Zabbix/Icinga) for years but it's time now to say Nagios goodbye (and thanks for all the fish).

Tools like ELK or OpenTSDB (or InfluxDB/KairosDB) and Bosun/Prometheus have been created to give us a new generation of more suitable tools in environments that become bigger and bigger: I know, to create and manage an ELK stack (or a OpenTSDB/Grafana/Bosun stack) requires more effort than to manage a Nagios box, but it's an overhead you will soon get used to (and probably you already have an hadoop/hbase installation to manage, right?).

In this case, having an in house tool to parse your application logs will allow you:

to blame developers if something goes wrong (just in case you need further reasons to)
to not give them access to any production machines (yes, they will ask to, anyway)
to be able to search all your logs at once (like grepping on a syslog-ng basedir with improved superpowers) or with a better semantics (i.e. spotting trends)
to create dashboards (for your management, you know...) or alerts (because you need a good reason to skip that boring meeting, right?)

ELK engineers suggest to use ELK not only for DEBUG/ERROR messages, but also for the application ones: this will add a great value to your logs and, once again, the world will be a safer place thanks to you, bro.

There were no screenshots in this article, so that's a potato for you here