Filebeat and AWS Elasticsearch

First published 12 May 2019

Elasticsearch, Logstash and Kibana (or ELK) are standard tools for aggregating and monitoring server logs. This post details the steps I took to integrate Filebeat (Elastic's lightweight log shipper) with an AWS-managed Elasticsearch instance running within the AWS free tier.

Motivation

My goal was to have Elasticsearch aggregate my server logs, with a Kibana front-end for monitoring and searching. I had already set up and played with AWS CloudWatch, but found it cumbersome and slow: adding metrics to the dashboard or exploring anomalies was painful at best.

There are three options for running Elasticsearch and Kibana:

Deploy your own instance on your own hardware or EC2 instances. There are several Terraform scripts ready-to-go for this.
Use the Elastic Cloud hosted Elasticsearch. This is a paid-for managed service run by the developers of Elasticsearch.
Use the AWS Elasticsearch service.

The last option was the only reasonable one for me, as I did not want to spend any money and the AWS service fits within the free tier. (The Elastic Cloud offering provides only a 14-day trial, after which it costs a minimum of $16/month, far too much for a toy project.)

Note that Logstash is not part of my solution; it is not included in AWS Elasticsearch and would need to be deployed separately. Given that I only need trivial parsing of log fields before ingestion, the Filebeat modules are sufficient.

Summary of issues

These are the issues I ran into while setting this up, due to obscure or missing documentation, or incompatibilities between AWS Elasticsearch and Filebeat:

Unclear guidance on security group and VPC setup
AWS Elasticsearch runs on a non-standard port (443 instead of 9200)
Elastic-licensed Filebeat is incompatible with AWS; must use OSS-licensed version
Filebeat fails setup due to X-Pack unavailability on AWS; need to skip this
Filebeat Apache module requires the GeoIP and User-Agent ingest plugins in Elasticsearch, both of which can easily be omitted

I eventually found guidance on all but the last of these by searching support forums; for the last, I just hacked away until it worked.

Process

Here's my full process for setting this up. I started with an existing EC2 instance in a VPC, running Ubuntu and Apache.

AWS Elasticsearch

Before setting up an Elasticsearch instance (or “domain”), create an EC2 security group that the Elasticsearch instance can use to allow ingress from other instances in the VPC. Add an ingress rule on port 443 for each EC2 instance that will be providing log data.
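A rough AWS CLI equivalent of this step, offered as a sketch rather than a tested recipe. VPC_ID, WEB_SG (the security group already attached to the web server) and the group name es-ingress are placeholders; instead of adding one rule per instance, this version admits port 443 from any instance carrying WEB_SG. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: security group for the Elasticsearch domain that admits HTTPS (443)
# from the web server's security group. VPC_ID and WEB_SG are placeholders.
set -eu

VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
WEB_SG="${WEB_SG:-sg-0123456789abcdef0}"   # group on the log-shipping instance
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*" >&2                        # show the command, don't run it
  else
    "$@"
  fi
}

# --query/--output reduce the JSON response to the bare group ID.
ES_SG=$(run aws ec2 create-security-group \
  --group-name es-ingress \
  --description "HTTPS from log shippers to Elasticsearch" \
  --vpc-id "$VPC_ID" --query GroupId --output text)
ES_SG="${ES_SG:-sg-DRYRUN}"                # placeholder ID in dry-run mode

# Allow TCP 443 from any instance that carries the web server's group.
run aws ec2 authorize-security-group-ingress \
  --group-id "$ES_SG" --protocol tcp --port 443 --source-group "$WEB_SG"
echo "Elasticsearch security group: $ES_SG"
```

Referencing a source security group rather than instance addresses means new log shippers only need to join WEB_SG.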

Now create the Elasticsearch domain, accepting the default values. I selected a t2.small.elasticsearch instance in order to fit within the free tier. Ensure that the domain has VPC access only (not public access), and select the previously created security group. For the access policy, select the “Do not require signing request with IAM credential” template: as far as I can tell, Filebeat does not support signing requests, which is why the domain must be protected by the VPC.
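The console steps can also be approximated with the AWS CLI. In this sketch the region, account ID, domain name, Elasticsearch version, subnet and security group ID are all placeholders, and the access policy is assumed to match what the unsigned-request template produces: any principal may call any API on the domain, which is only acceptable because the security group limits who can reach the endpoint. The instance type and 10 GB gp2 volume are intended to stay within the free tier. Commands are printed rather than run unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: VPC-only domain with an open (unsigned) access policy.
# Every value below is a placeholder; substitute your own.
set -eu

REGION="${REGION:-us-east-1}"
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
DOMAIN="${DOMAIN:-logs}"
ES_VERSION="${ES_VERSION:-6.5}"
SUBNET_ID="${SUBNET_ID:-subnet-0123456789abcdef0}"
ES_SG="${ES_SG:-sg-0123456789abcdef0}"     # the group created for the domain
POLICY_FILE="${POLICY_FILE:-./es-access-policy.json}"
DRY_RUN="${DRY_RUN:-1}"

# Any principal may call any API on this domain; acceptable only because the
# endpoint is reachable solely through the VPC security group.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "*"},
      "Action": "es:*",
      "Resource": "arn:aws:es:${REGION}:${ACCOUNT_ID}:domain/${DOMAIN}/*"
    }
  ]
}
EOF

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*" >&2; else "$@"; fi
}

run aws es create-elasticsearch-domain \
  --domain-name "$DOMAIN" \
  --elasticsearch-version "$ES_VERSION" \
  --elasticsearch-cluster-config InstanceType=t2.small.elasticsearch,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=10 \
  --vpc-options "SubnetIds=${SUBNET_ID},SecurityGroupIds=${ES_SG}" \
  --access-policies "file://${POLICY_FILE}"
```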

Filebeat

Follow the directions to install Filebeat, ensuring that you use the OSS-licensed version. Initially I had installed the default Elastic-licensed version, but this cannot authenticate with AWS Elasticsearch.
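On Ubuntu, the OSS packages come from a separate path in Elastic's apt repository (for example oss-6.x rather than 6.x). A sketch, assuming apt-key is still available on your release and that ES_MAJOR is set to the major version of your AWS domain (6 is only an example). Nothing runs unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: install the Apache-2.0 (OSS) Filebeat build from Elastic's apt repo.
# ES_MAJOR should match your AWS domain's major version; 6 is an example.
set -eu

ES_MAJOR="${ES_MAJOR:-6}"
DRY_RUN="${DRY_RUN:-1}"
# The "oss-" prefix selects the OSS-licensed packages.
REPO="deb https://artifacts.elastic.co/packages/oss-${ES_MAJOR}.x/apt stable main"

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*" >&2; else "$@"; fi
}

run sh -c 'wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -'
run sh -c "echo '$REPO' | sudo tee /etc/apt/sources.list.d/elastic-oss-${ES_MAJOR}.x.list"
run sudo apt-get update
run sudo apt-get install -y filebeat
```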

Edit /etc/filebeat/filebeat.yml to set up both the Elasticsearch and Kibana URLs (these are shown on the AWS Elasticsearch dashboard). In both cases you will need to modify the URL to give it an explicit port of 443.
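For reference, the relevant settings might look like the fragment below, written by a small script to a scratch file so it can be merged into /etc/filebeat/filebeat.yml by hand. The endpoint is an invented example; the Kibana URL is assumed to be the dashboard's /_plugin/kibana/ URL with :443 inserted after the hostname.

```shell
#!/bin/sh
# Sketch: the filebeat.yml settings for an AWS domain, with explicit :443.
# ES_URL is an invented example endpoint; use the one from your dashboard.
set -eu

ES_URL="${ES_URL:-https://vpc-logs-abc123.us-east-1.es.amazonaws.com}"
OUT="${OUT:-./filebeat-aws-fragment.yml}"

cat > "$OUT" <<EOF
setup.kibana:
  # AWS serves Kibana under /_plugin/kibana/ on the same endpoint.
  host: "${ES_URL}:443/_plugin/kibana/"

output.elasticsearch:
  hosts: ["${ES_URL}:443"]
EOF

cat "$OUT"
```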

Now let Filebeat set up its index template, ingest pipelines and dashboards with:

sudo filebeat setup --pipelines --template --dashboards

By explicitly passing --pipelines --template --dashboards we omit the --machine-learning step, which is otherwise included by default and causes an error with AWS Elasticsearch (machine learning is an X-Pack feature).

Test the connection to Elasticsearch, start Filebeat, and then watch the systemd journal for errors:

sudo filebeat test output
sudo service filebeat start
journalctl -f

(Press Ctrl+C to stop watching the log).

Apache logs

The Filebeat Apache module provides the necessary logic for scraping error and access logs from the web server; however, it depends on Elasticsearch plugins being installed, and installing plugins isn't possible with AWS Elasticsearch.

Thankfully, the Filebeat modules are just a collection of YAML and JSON files that are easily modified. I copied the existing /usr/share/filebeat/module/apache directory to /usr/share/filebeat/module/apache-aws, and then edited the files inside to remove any use of the geoip or user_agent ingest processors.
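Finding every file that needs editing is easier with grep. This sketch copies the module (as root) if the copy doesn't exist yet, then lists files in the copy that mention geoip, user_agent or user-agent. The match is deliberately broad and also catches harmless references such as field names, so each hit still needs a human look. SRC and DST default to the paths above.

```shell
#!/bin/sh
# Sketch: copy the stock Apache module, then list files in the copy that still
# mention the GeoIP or User-Agent processors/plugins so they can be edited.
set -eu

SRC="${SRC:-/usr/share/filebeat/module/apache}"
DST="${DST:-/usr/share/filebeat/module/apache-aws}"

# Print each file under $1 mentioning geoip, user_agent or user-agent.
# Broad on purpose: field names match too, so review each hit by hand.
plugin_refs() {
  grep -rlE 'geoip|user[-_]agent' "$1" || true
}

if [ -d "$SRC" ] && [ ! -e "$DST" ]; then
  cp -r "$SRC" "$DST"                      # needs root under /usr/share
fi
if [ -d "$DST" ]; then
  plugin_refs "$DST"
fi
```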

To enable the new module, copy /etc/filebeat/modules.d/apache.yml.disabled to /etc/filebeat/modules.d/apache-aws.yml.disabled and edit it so that it refers to the apache-aws module rather than apache. Then enable it, restart Filebeat and check for errors:
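The modules.d edit amounts to renaming the module that the config file loads. A sketch, assuming the stock file declares the module on a line reading exactly "- module: apache" (check yours first); run it as root so it can write into /etc/filebeat/modules.d.

```shell
#!/bin/sh
# Sketch: make the modules.d config for the copied module by renaming the
# module it loads. Assumes the stock file has the line "- module: apache".
set -eu

# $1: stock config file, $2: config file to write for apache-aws
make_module_config() {
  sed 's/^- module: apache$/- module: apache-aws/' "$1" > "$2"
}

MODULES_D=/etc/filebeat/modules.d
if [ -f "$MODULES_D/apache.yml.disabled" ]; then
  # Writes apache-aws.yml.disabled next to the stock file (needs root).
  make_module_config "$MODULES_D/apache.yml.disabled" \
    "$MODULES_D/apache-aws.yml.disabled"
fi
```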

sudo service filebeat stop
sudo filebeat module enable apache-aws
sudo service filebeat start
journalctl -f

Kibana

We've now got Apache logs being read by Filebeat and ingested into Elasticsearch; time to look at them in Kibana. Because the AWS Elasticsearch instance is running in a VPC, your web browser has no access to it.

There are three possible solutions:

Create a VPN to access your VPC. I don't know how to do this.
Have your Apache instance proxy Kibana requests into the VPC. This would expose Kibana publicly, so it would need a login in front of it (doable).
Create an SSH tunnel into the VPC using your existing EC2 instance.

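I chose the last of these, as it's quite simple and only a minor inconvenience. Create an entry in your ~/.ssh/config (on your desktop) along the lines of the following, replacing 1.2.3.4 with your EC2 instance's public IP address and YOUR-VPC-DOMAIN with your domain's VPC endpoint: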

Host kibana
    HostName 1.2.3.4
    User ubuntu
    LocalForward 9200 YOUR-VPC-DOMAIN.es.amazonaws.com:443

Then open the tunnel with (on your desktop):

ssh kibana -N

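Then browse to https://localhost:9200/_plugin/kibana/ (AWS serves Kibana under /_plugin/kibana/). Expect a certificate warning, since the certificate is issued for the AWS hostname rather than localhost.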

partially­disassembled