Running ELK locally to interrogate container logs
I recently needed to investigate an issue on a live environment that had been highlighted by visualisations in Kibana based on application-specific logs. The logs themselves were structured as JSON and contained some important metrics about the application’s performance. In order to investigate this issue locally, I needed to run the application under similar conditions to the live environment and analyse the logs with a similar visualisation to production.
Unfortunately, although running the application locally was reasonably easy, replicating the log ingestion pipeline was less so. Our setup involved integration between AWS, Fluentd and Logz. All I really wanted to do was pick up the stdout logs from my application, parse their JSON and ingest them into an Elasticsearch database so that I could visualise them in Kibana.
In other words, I just wanted to run a local ELK stack.
It turns out, this was quite easy to achieve, and - whilst there are plenty of examples out there on the internet - this post ties together my learnings in a simple way. If you just want to jump to the implementation, you can clone https://github.com/andykuszyk/local-elk and run docker-compose up. Don’t forget to check out the README.md.
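Assuming you already have Docker and docker-compose installed, getting started looks something like this:

# Clone the example repo and bring the whole stack up.
git clone https://github.com/andykuszyk/local-elk
cd local-elk
docker-compose up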
Running a local ELK stack using docker-compose
It’s pretty easy to get a local ELK stack up and running using docker-compose. The following docker-compose.yml file demonstrates this:
version: "3.3"
services:
  elasticsearch:
    image: elasticsearch:7.2.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - esdata1:/usr/share/elasticsearch/data
The /usr/share/elasticsearch/data directory is mounted into a named volume here (see the end of the docker-compose.yml file) so that the data stored in Elasticsearch is persisted between instances of the container. This is useful if you’re starting up and tearing down the compose file regularly and don’t want to re-create things like Kibana configuration. It also preserves all your previous logs.
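If you ever want to start completely afresh, docker-compose can remove this named volume along with the containers. A quick sketch (note that this deletes all previously ingested logs and any Kibana configuration):

# Tear the stack down and remove named volumes, including esdata1.
docker-compose down --volumes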
  kibana:
    image: kibana:7.2.0
    ports:
      - "5601:5601"
No additional config is required for Kibana; the vanilla Docker image is fine.
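Once the stack is up, Kibana should be reachable on the mapped port at http://localhost:5601. A quick, purely illustrative check from the command line:

# Any HTTP response here means Kibana is listening on its published port.
curl -I http://localhost:5601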
  logstash:
    build: logstash
    ports:
      - "5044:5044"
A custom configuration for Logstash is useful here, so build: logstash instructs docker-compose to use the Dockerfile in the ./logstash directory. See later for details.
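One consequence of building the image locally is that changes to the Logstash config are only picked up when the image is rebuilt. Something along these lines should do it:

# Rebuild the custom Logstash image after editing its config, then restart the service.
docker-compose build logstash
docker-compose up -d logstash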
  filebeat:
    build: filebeat
    user: root
    environment:
      - setup.kibana.host=kibana:5601
      - output.elasticsearch.hosts=["elasticsearch:9200"]
      - strict.perms=false
    volumes:
      - type: bind
        source: /var/lib/docker/containers
        target: /var/lib/docker/containers
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true
As with Logstash, a custom configuration for Filebeat is useful here. Furthermore, we’re giving Filebeat access to the Docker daemon on your local host so that it can interrogate information about containers directly and retrieve their logs.
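If you want to check that Filebeat is actually discovering your containers, its own output is a good place to look, for example:

# Tail Filebeat's output to see autodiscover events and log harvesters starting up.
docker-compose logs -f filebeat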
volumes:
  esdata1:
Finally, this is the named volume in use by the elasticsearch service.
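With all of that in place, the whole stack can be started and sanity-checked. For example (the exact volume name depends on your compose project name, typically the directory name):

# Start everything in the background.
docker-compose up -d

# Elasticsearch should answer on its published port with some cluster information.
curl http://localhost:9200

# The named volume will be listed with a project-name prefix, e.g. local-elk_esdata1.
docker volume ls | grep esdata1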
Configuring Logstash for use locally
It’s useful to do two things to configure Logstash for your local ELK setup:
- Provide a custom Logstash pipeline definition for any specific log parsing you might want to do;
- Override the default Logstash Docker entrypoint to reduce the amount of noise in your logs.
This is achieved through three files in a ./logstash directory.
1. Dockerfile
FROM docker.elastic.co/logstash/logstash:7.2.0
COPY pipeline.conf /usr/share/logstash/pipeline/pipeline.conf
COPY entrypoint.sh ./entrypoint.sh
CMD ./entrypoint.sh
This file uses the base Logstash Docker image and copies in the two other files mentioned here, overriding the entrypoint.
2. entrypoint.sh
#!/bin/bash
# To prevent the logs from logstash itself from spamming filebeat, we re-direct
# the stdout from logstash to /dev/null here. If you need to see the output from
# logstash when debugging, remove this re-direct.
logstash > /dev/null
This file simply redirects the Logstash output to /dev/null. By default, Logstash outputs information for every message that it parses, which adds a lot of noise to the logs ingested into Elasticsearch.
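If you’d rather keep Logstash’s own output around for debugging without it being picked up again by Filebeat, an alternative (purely illustrative) entrypoint could redirect it to a file instead:

#!/bin/bash
# Alternative: keep Logstash's own output available on disk for debugging,
# but out of the container's stdout (the file path here is just an example).
logstash > /tmp/logstash.log 2>&1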
3. pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Your custom expressions here.
}

output {
  elasticsearch { hosts => ["elasticsearch:9200"] }
}
For now, this pipeline definition does nothing more than pass on your log messages from Filebeat to Elasticsearch; however, it can be useful for more advanced processing of your log messages. See later for details.
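After editing the pipeline, it can be handy to check that the config parses before restarting everything. Logstash has a built-in config test which can be run inside the running container; a sketch (the extra --path.data avoids clashing with the already-running instance):

# Validate pipeline.conf without actually starting a pipeline.
docker-compose exec logstash logstash --config.test_and_exit \
    --path.data /tmp/logstash-config-test \
    -f /usr/share/logstash/pipeline/pipeline.conf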
Configuring Filebeat for use locally
Filebeat needs some basic configuration to allow it to automatically read information from Docker about containers and their logs, as well as to work with Logstash to send the log messages to Elasticsearch. This is achieved through two files in the ./filebeat directory.
1. Dockerfile
FROM docker.elastic.co/beats/filebeat:7.2.0
COPY filebeat.yml /usr/share/filebeat/
This Dockerfile simply uses the base Filebeat Docker image and copies in the configuration file from this directory.
2. filebeat.yml
filebeat.config:
modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
filebeat.autodiscover:
providers:
- type: docker
hints.enabled: true
output.logstash:
hosts: 'logstash:5044'
The guts of this file are in the filebeat.autodiscover directive, which instructs Filebeat to source its logs from Docker. The output directive simply tells Filebeat to send its logs to Logstash, rather than directly to Elasticsearch.
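An easy way to see this working end-to-end is to run a throwaway container that writes a line to stdout, and then look for it in Kibana. For example (the JSON fields here are just made up):

# Run a short-lived container that logs a JSON line to stdout; Filebeat's
# Docker autodiscover should ship it via Logstash into Elasticsearch.
docker run --rm alpine echo '{"message": "hello from a test container", "duration_ms": 42}'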
If your logs are structured - for example, as JSON - this configuration can be extended to parse them. See later for details.
What if my logs need parsing? e.g. they’re JSON.
If your logs need parsing, this can be achieved in the ./filebeat/filebeat.yml config or in the ./logstash/pipeline.conf, depending on which approach you’d like to take (Filebeat vs. Logstash).
If your logs are structured as JSON, the simplest thing to do is get Filebeat to parse them. An example filebeat.yml is as follows:
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      templates:
        - condition:
            contains:
              docker.container.image: YOUR_CONTAINER_NAME
          config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
              json.keys_under_root: true
              json.add_error_key: true

output.logstash:
  hosts: 'logstash:5044'
In this example, replace YOUR_CONTAINER_NAME with part of your container’s image name. This instructs Filebeat to only attempt to parse structured logs for your particular container (and avoids it trying to parse unstructured logs from other containers).
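If you’re not sure what image name to match on, Docker can list the image names of your running containers, for example:

# List the image names of running containers; use a distinctive part of
# yours in place of YOUR_CONTAINER_NAME above.
docker ps --format '{{.Image}}'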
What if my logs are not parsed correctly?
Often, Filebeat does an alright job of parsing your logs, but it might get things like datatypes wrong. Converting fields to the correct datatypes (or anything else more complicated) cannot be done in Filebeat, but a simple Logstash pipeline can be used instead. An example pipeline.conf demonstrates this:
input {
  beats {
    port => 5044
  }
}

filter {
  mutate {
    convert => {
      "YOUR_NUMERIC_FIELD" => "integer"
    }
  }
}

output {
  elasticsearch { hosts => ["elasticsearch:9200"] }
}
In this example, the field YOUR_NUMERIC_FIELD in your JSON log message has been converted to an integer by Logstash.
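Once some converted logs have been ingested, you can confirm that Elasticsearch has mapped the field numerically by inspecting the index mapping. For example (assuming the default logstash-* index naming used by the Logstash Elasticsearch output):

# Inspect the mappings of the Logstash-created indices; the converted field
# should appear with a numeric type rather than text/keyword.
curl 'http://localhost:9200/logstash-*/_mapping?pretty'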
Conclusion
That’s it - with the above config and Docker files, it’s pretty easy to get a local ELK stack running with docker-compose up. All of the config files referenced in this post can be found at https://github.com/andykuszyk/local-elk, which you can also clone and use to run docker-compose up directly.