Running ELK locally to interrogate container logs

I recently needed to investigate an issue on a live environment that had been highlighted by visualisations in Kibana based on application-specific logs. The logs themselves were structured as JSON and contained some important metrics about the application’s performance. In order to investigate the issue locally, I needed to run the application under conditions similar to the live environment and analyse the logs with a visualisation similar to the one in production.

Unfortunately, although running the application locally was reasonably easy, replicating the log ingestion pipeline was less so. Our setup involved integration between AWS, Fluentd and Logz. All I really wanted to do was pick up the stdout logs from my application, parse their JSON and ingest them into an Elasticsearch database so that I could visualise them in Kibana.

In other words, I just wanted to run a local ELK stack.

It turns out this was quite easy to achieve, and - whilst there are plenty of examples out there on the internet - this post ties together what I learned in a simple way. If you just want to jump to the implementation, you can clone https://github.com/andykuszyk/local-elk and run docker-compose up. Don’t forget to check out the README.md.

Running a local ELK stack using docker-compose

It’s pretty easy to get a local ELK stack up and running using docker-compose. The following docker-compose.yml file demonstrates this:

version: "3.3"

services:
  elasticsearch:
    image: elasticsearch:7.2.0
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - esdata1:/usr/share/elasticsearch/data

The /usr/share/elasticsearch/data directory is mounted into a named volume here (see the end of the docker-compose.yml file) so that the data stored in Elasticsearch is persisted between instances of the container. This is useful if you’re starting up and tearing down the compose file regularly and don’t want to re-create things like Kibana configuration. It also preserves all your previous logs.
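
Once you’ve run docker-compose up, a quick sanity check that Elasticsearch is running is to query its HTTP API on the mapped port. This is just a verification step - the exact cluster name and version in the response will depend on your setup:

# Check that Elasticsearch is up and responding on the mapped port.
curl -s 'http://localhost:9200/_cluster/health?pretty'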

  kibana:
    image: kibana:7.2.0
    ports:
      - "5601:5601"

No additional config is required for Kibana; the vanilla Docker image is fine.
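
With the stack running, Kibana should be available in a browser at http://localhost:5601 (the port mapped above). If you prefer to check from the command line, something like the following should work:

# Kibana's status API reports whether it has connected to Elasticsearch yet.
curl -s http://localhost:5601/api/status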

  logstash:
    build: logstash
    ports:
      - "5044:5044"

A custom configuration for Logstash is useful here, so build: logstash instructs docker-compose to use the Dockerfile in the ./logstash directory. See later for details.

  filebeat:
    build: filebeat
    user: root
    environment:
      - setup.kibana.host=kibana:5601
      - output.elasticsearch.hosts=["elasticsearch:9200"]
      - strict.perms=false
    volumes:
      - type: bind
        source: /var/lib/docker/containers
        target: /var/lib/docker/containers
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        mode: ro

As with Logstash, a custom configuration for Filebeat is useful here. Furthermore, we’re giving Filebeat access to the Docker daemon on the local host so that it can interrogate information about containers directly and retrieve their logs.
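
The reason for the /var/lib/docker/containers bind mount is that, with Docker’s default json-file logging driver, each container’s stdout ends up in a JSON log file under that directory. If you’re curious where a particular container’s logs live on your host, you can ask Docker directly (the container name here is just a placeholder):

# Show the path of the JSON log file Docker writes for a given container.
docker inspect --format '{{.LogPath}}' my-app-container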

volumes:
  esdata1:

Finally, this is the named volume in use by the elasticsearch service.
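
If you ever want to start again with a completely fresh Elasticsearch - for example, to clear out old indices and Kibana configuration - removing the named volume along with the containers does the trick. The exact volume name is prefixed with your compose project name, so it may differ on your machine:

# List the named volumes created by docker-compose.
docker volume ls

# Tear down the stack and remove its named volumes, including esdata1.
docker-compose down -v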

Configuring Logstash for use locally

It’s useful to do two things to configure Logstash for your local ELK setup:

  1. Provide a custom Logstash pipeline definition for any specific log parsing you might want to do;
  2. Override the default Logstash Docker entrypoint to reduce the amount of noise in your logs.

This is achieved through three files in a ./logstash directory.

1. Dockerfile

FROM docker.elastic.co/logstash/logstash:7.2.0
COPY pipeline.conf /usr/share/logstash/pipeline/pipeline.conf
COPY entrypoint.sh ./entrypoint.sh
CMD ./entrypoint.sh

This file uses the base Logstash Docker image and copies in the two other files mentioned here, overriding the entrypoint.

2. entrypoint.sh


#!/bin/bash

# To prevent the logs from logstash itself from spamming filebeat, we re-direct
# the stdout from logstash to /dev/null here. If you need to see the output from
# logstash when debugging, remove this re-direct.
logstash > /dev/null

This file simply redirects the Logstash output to /dev/null. By default, Logstash outputs information for every message that it parses, which adds a lot of noise to the logs ingested into Elasticsearch.
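
If you do need to see what Logstash is doing - for example, while debugging the pipeline - remove the redirect in entrypoint.sh as the comment suggests, rebuild the image, and follow the container’s output:

# Rebuild the Logstash image after editing entrypoint.sh, then follow its logs.
docker-compose build logstash
docker-compose up -d logstash
docker-compose logs -f logstash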

3. pipeline.conf


input {
  beats {
    port => 5044
  }
}
filter {
  # Your custom expressions here.
}
output {
  elasticsearch { hosts => ["elasticsearch:9200"] }
}

For now, this pipeline definition does nothing more than pass your log messages on from Filebeat to Elasticsearch; however, it is the place to add any more advanced processing of your log messages. See later for details.
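
A quick way to check that messages are flowing all the way from Filebeat, through Logstash, and into Elasticsearch is to list the indices Logstash has created. With the default output settings this is typically a logstash-* index, although the exact name depends on your Logstash version and ILM settings:

# List indices and their document counts; a logstash-* index should appear
# once log messages have been ingested.
curl -s 'http://localhost:9200/_cat/indices?v'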

Configuring Filebeat for use locally

Filebeat needs some basic configuration to allow it to automatically read information from Docker about containers and their logs as well as to work with Logstash to send the log messages to Elasticsearch. This is achieved through two files in the ./filebeat directory.

1. Dockerfile

FROM docker.elastic.co/beats/filebeat:7.2.0
COPY filebeat.yml /usr/share/filebeat/

This Dockerfile simply uses the base Filebeat image and copies in the configuration file from this directory.

2. filebeat.yml


filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

output.logstash:
  hosts: 'logstash:5044'

The guts of this file are in the filebeat.autodiscover directive, which instructs Filebeat to source its logs from Docker. The output directive simply tells Filebeat to send its logs to Logstash, rather than directly to Elasticsearch.
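
With hints enabled, Filebeat will collect logs from every container by default, and individual containers can influence that behaviour through co.elastic.logs/ labels. For example, a particularly noisy container can be opted out of collection entirely (the image name below is just a placeholder):

# Example: exclude one container from log collection using a hint label.
docker run --rm --label co.elastic.logs/enabled=false my-noisy-app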

If your logs are structured - for example, as JSON - this configuration can be extended to parse them. See later for details.

What if my logs need parsing? e.g. they’re JSON.

If your logs need parsing, this can be achieved in the ./filebeat/filebeat.yml config or in the ./logstash/pipeline.conf depending on which approach you’d like to take (Filebeat vs. Logstash).

If your logs are structured as JSON, the simplest thing to do is get Filebeat to parse them. An example filebeat.yml is as follows:


filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      templates:
        - condition:
            contains:
              docker.container.image: YOUR_CONTAINER_NAME 
          config:
          - type: docker
            containers.ids:
              - "${data.docker.container.id}"
            json.keys_under_root: true
            json.add_error_key: true

output.logstash:
  hosts: 'logstash:5044'

In this example, replace YOUR_CONTAINER_NAME with part of your container’s image name. This instructs Filebeat to only try parsing structured logs for your particular container (and avoids it trying to parse unstructured logs from other containers).
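
If you want something to test this against, a throwaway container that prints JSON to stdout works well - for example, using an alpine image and setting the condition above to match on alpine. This is purely an illustrative log generator:

# Emit a simple JSON log line every second so Filebeat has something to parse.
docker run --rm alpine sh -c "while true; do echo '{\"level\": \"info\", \"duration_ms\": 42}'; sleep 1; done"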

What if my logs are not parsed correctly?

Often, Filebeat does a reasonable job of parsing your logs, but it might get things like data types wrong. Converting fields to the correct data types (or anything else more complicated) cannot be done in Filebeat, but a simple pipeline in Logstash can handle it. An example pipeline.conf demonstrates this:

input {
  beats {
    port => 5044
  }
}
filter {
  mutate {
    convert => {
      "YOUR_NUMERIC_FIELD" => "integer"
    }
  }
}
output {
  elasticsearch { hosts => ["elasticsearch:9200"] }
}

In this example, the field YOUR_NUMERIC_FIELD in your JSON log message has been converted to an integer by Logstash.
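
You can confirm the conversion has taken effect by looking at the field’s mapping in Elasticsearch. Assuming the default logstash-* index naming, something like this will show whether the field is mapped as a numeric type (note that a field’s mapping is fixed once an index has been created, so you may need to check a newly created index):

# Inspect the mappings of the Logstash indices to check the field's type.
curl -s 'http://localhost:9200/logstash-*/_mapping?pretty'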

Conclusion

That’s it - with the above config and Docker files, it’s pretty easy to get a local ELK stack running with docker-compose up. All of the config files referenced in this post can be found at https://github.com/andykuszyk/local-elk, which you can also clone and use to run docker-compose up directly.
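
For reference, the whole thing boils down to:

# Clone the example repo and start the stack.
git clone https://github.com/andykuszyk/local-elk
cd local-elk
docker-compose up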