Adding more fields to Filebeat
First published 14 May 2019
In the previous post I wrote up my setup of Filebeat and AWS Elasticsearch to monitor Apache logs. This time I add a couple of custom fields extracted from the log and ingested into Elasticsearch, suitable for monitoring in Kibana.
This did not turn out to be straightforward—while all the required plumbing and customisation is already supported, the process of getting fields to be interpreted with the correct data type is convoluted and badly documented. Barclay Howe's blog was very useful in figuring this out.
The two fields I'm adding are:
- Request URL domain name (`url.domain`)
- Response time per request, in microseconds (`http.response.time`)
Ingesting an extra field
My web server hosts pages for a few domains, using Apache's VirtualHosts. By default Filebeat provides a `url.original` field from the access logs, which does not include the host portion of the URL, only the path. My goal here is to add a `url.domain` field, so that I can distinguish requests that arrive at different domains.
First of all, edit `/etc/apache2/apache2.conf` to add an extra field to the `LogFormat`. In my case I added `\"%V\"` to the end of the `combined` log format directive, in order to have it output the canonical host name. Restart Apache and tail `/var/log/apache2/access.log` to check that this is working.
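For reference, on a Debian-style configuration the modified directive looks something like this (your existing `combined` line may differ, e.g. stock Apache uses `%b` where Debian uses `%O`; the only change is the trailing `\"%V\"`):

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" \"%V\"" combined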
Now we need Filebeat to parse this field from the log line. This is pretty simple; just edit the `/usr/share/filebeat/module/apache/access/ingest/default.json` file, which begins with:
{
  "description": "Pipeline for parsing Apache HTTP Server access logs. Requires the geoip and user_agent plugins.",
  "processors": [{
    "grok": {
      "field": "message",
      "patterns": [
        "%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",
        "%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"-\" %{NUMBER:http.response.status_code:long} -",
        "\\[%{HTTPDATE:apache.access.time}\\] %{IPORHOST:source.address} %{DATA:apache.access.ssl.protocol} %{DATA:apache.access.ssl.cipher} \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.body.bytes:long}"
      ],
      "ignore_missing": true
    }
Whitespace added for clarity to show that there are three patterns.
For me, the first of these patterns is the one that matches the Apache log format, so I deleted the others for readability, then added a `%{DATA:url.domain}` token at the end of the pattern to match the format I'd defined in the Apache configuration:
"%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-) \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\" \"%{DATA:url.domain}\""
In order to have this new pattern take effect, it must be uploaded into Elasticsearch. I found that I needed to add this line to my `/etc/filebeat/filebeat.yml`:
filebeat.overwrite_pipelines: true
which suggests that it will overwrite existing pipelines, but in practice I also had to delete the existing pipeline for this to work:
sudo service filebeat stop
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/_ingest/pipeline/filebeat-*"
sudo filebeat setup --pipelines
sudo service filebeat start
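To confirm the rebuilt pipeline actually landed in Elasticsearch, you can fetch it back (the exact pipeline ID embeds the Filebeat version and module, something like `filebeat-7.0.0-apache-access-default`):

curl "https://YOUR-ELASTICSEARCH-DOMAIN:443/_ingest/pipeline/filebeat-*?pretty"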
Wait for some events to be harvested, then check in Kibana that new events have the `url.domain` field populated.
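If you'd rather verify from the command line than Kibana, an existence query against the indices works too:

curl "https://YOUR-ELASTICSEARCH-DOMAIN:443/filebeat-*/_search?q=_exists_:url.domain&size=1&pretty"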
Ingesting a custom field
Because the `url.domain` field is defined by the default Filebeat index template, we did not have to do any work to define it ourselves. Now we'll go through the process of adding a brand-new field that Filebeat and Elasticsearch know nothing about.
In my case, I wanted telemetry on the total response processing time per request, as an indicator of server performance.
Follow the same steps as with the previous field to add `%D` (response time in microseconds) to the Apache log format and parse it as `%{DATA:http.response.time:long}` in the ingest pipeline. This field, `http.response.time`, is made up, and if we stop at this point it will not have any data type associated with it; this causes Elasticsearch to import it as text by default, which means we can't do useful things like compute percentiles in Kibana. The steps below add the field type.
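For reference, assuming `%D` was appended unquoted after the `\"%V\"` token in the LogFormat, the full grok pattern ends up looking like this (shown with `%{NUMBER:http.response.time:long}`, a stricter alternative to the `%{DATA:...}` token above, since `%D` emits a bare integer):

"%{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-) \"%{DATA:http.request.referrer}\" \"%{DATA:user_agent.original}\" \"%{DATA:url.domain}\" %{NUMBER:http.response.time:long}"

Start by stopping the Filebeat service: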
sudo service filebeat stop
Add the following magic to `/etc/filebeat/filebeat.yml`:
setup.template.name: "filebeat"
setup.template.fields: "fields.yml"
setup.template.overwrite: true
Add the field definition to `/etc/filebeat/fields.yml`, under the `response.status_code` definition (around line 1137, and be wary of indentation):
- name: response.time
level: extended
type: long
description: Time to process the request, in microseconds
Tell Filebeat to regenerate its index template (effectively just converting this YAML file to JSON):
sudo filebeat setup --template
You can verify the result of the above by examining the resulting JSON:
sudo filebeat export template
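A quick way to check that the new mapping made it into the template (other `time` fields elsewhere in the template may also match; look for the one nested under `http` and `response`):

sudo filebeat export template | grep -A 2 '"time"'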
Delete the existing template from Elasticsearch (again, this seems like something that's meant to be overwritten but in my experience was not):
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/_template/filebeat-*"
You also need a new index; by default Filebeat creates a fresh one each day. Since I don't care about the existing history, I just delete all the existing indices:
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/filebeat-*"
Restart Filebeat and you will see it recreating the template and index in the `journalctl` log:
sudo service filebeat start
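To follow that log as the service starts (assuming Filebeat runs as the `filebeat` systemd unit):

sudo journalctl -u filebeat -f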
The final step is to refresh the field list in Kibana, from the Management tab. This prompts with a warning that it “resets the popularity count of each field”, but more importantly it also discards the previously cached type information for each field, which is what we need.
When iterating on getting this set up correctly, it is necessary to reset/delete the template, indices, pipeline and Kibana's field cache. Discovering all of these was the main impediment to getting the field interpreted by Kibana as the correct type, and the probable cause of the apparent voodoo in the directions above.
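For convenience, here is the whole reset sequence from the steps above collected in one place. It is destructive (it drops all existing Filebeat indices), and you still need to refresh Kibana's field list by hand afterwards:

sudo service filebeat stop
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/_ingest/pipeline/filebeat-*"
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/_template/filebeat-*"
curl -XDELETE "https://YOUR-ELASTICSEARCH-DOMAIN:443/filebeat-*"
sudo filebeat setup --pipelines
sudo filebeat setup --template
sudo service filebeat start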