Centralized logging is the foundation of observability in a Kubernetes cluster. Without it, understanding what’s happening across distributed workloads quickly becomes a pain: you have to hunt down the outputs of all the different interacting parts.
We want to provide our users and customers with a more convenient way to have an overview of what’s going on across deployed Stackable operators and the products they manage.
As a first step towards this goal, we tried out Vector – a tool that can be configured to handle many types of observability data – logs among them.
This writeup was created as a by-product of an early proof of concept. The resulting instructions were so detailed that we decided to publish them on their own, so more people could find them.
This started as a comment in this issue. All of Stackable’s code is open-source!
Read on, and feel free to follow the instructions to set up a proof-of-concept Vector + Stackable setup. Mind you, the result is not meant to be production-ready, but it could be helpful if you are looking to check out Vector and learn more about cluster-wide log aggregation.
Note: if you are more curious about Stackable than logs, check out our demos.
Collecting Stackable logs with Vector – a tutorial
In this tutorial, the Stackable Data Platform is set up with an Apache Superset and Trino cluster. The logs of Superset and Trino, as well as the logs from the corresponding Stackable operators, are gathered by Vector and stored in OpenSearch. These logs are then made visible in the OpenSearch Dashboards.
The Vector agents are deployed as a DaemonSet. The agents transform the log data into a uniform format and forward it directly to OpenSearch.
This tutorial is written for the release 22.09 of the Stackable Data Platform. Later releases will provide a more integrated approach.
Note: If you’re interested in learning more about the journey towards a more integrated approach, you can follow along! Check out this GitHub issue where we track the progress.
Install a Superset and Trino cluster in kind
If you haven’t set up stackablectl yet, you can follow these instructions to do so. Now you can create a kind cluster with the trino-taxi-data demo:
stackablectl demo install --kind-cluster trino-taxi-data
Be aware that this installs the latest version of the Stackable Data Platform.
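The demo takes a few minutes to come up. If you want to watch the progress, you can keep an eye on the pods in the default namespace until everything is running (the exact pod names depend on the demo and platform version):
kubectl get pods --watch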
OpenSearch prerequisites
OpenSearch uses an mmapfs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low – usually 65530 – which may result in out-of-memory exceptions. The Linux setting vm.max_map_count on the host machine where kind is running must therefore be set to at least 262144.
Note: if you are on a Mac, the commands in this section won’t work for you. You can skip it, and continue with the rest of the tutorial. They are not strictly necessary to get a working prototype going.
To check the current value, run this command:
sysctl vm.max_map_count
The limit can be temporarily increased with:
sudo sysctl --write vm.max_map_count=262144
To permanently increase the value, add the following line to /etc/sysctl.conf:
vm.max_map_count=262144
Then run sudo sysctl --load to reload the settings.
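Because kind nodes are containers that share the host kernel, the increased limit applies inside the cluster as well. If you want to double-check from within a node, you can read the value from /proc (the container name below is a placeholder; look it up with docker ps):
docker exec <kind-node-container> cat /proc/sys/vm/max_map_count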
Install OpenSearch
Create the file opensearch-values.yaml to configure the OpenSearch Kubernetes deployment:
config:
  opensearch.yml: |
    plugins:
      security:
        # Use default security settings
        allow_default_init_securityindex: true
        # Allow communication between the nodes which use the
        # certificates generated by the secret-operator
        nodes_dn:
          - CN=generated certificate for pod
        # Use the certificate generated by the secret-operator
        ssl:
          http:
            # Enable TLS on the REST layer
            enabled: true
            pemcert_filepath: certs/tls.crt
            pemkey_filepath: certs/tls.key
            pemtrustedcas_filepath: certs/ca.crt
          transport:
            pemcert_filepath: certs/tls.crt
            pemkey_filepath: certs/tls.key
            pemtrustedcas_filepath: certs/ca.crt
            # Disable the verification of hostnames because internal IPs
            # are used which are not included in the certificates
            # generated by the secret-operator.
            enforce_hostname_verification: false
extraEnvs:
  # Disable the creation of demo certificates
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
extraVolumeMounts:
  # Mount the certificate generated by the secret-operator
  - name: tls
    mountPath: /usr/share/opensearch/config/certs
extraVolumes:
  # Request a TLS certificate from the secret-operator
  - name: tls
    ephemeral:
      volumeClaimTemplate:
        metadata:
          annotations:
            secrets.stackable.tech/class: tls
            # Add the service opensearch-cluster-master to the
            # distinguished names because this service is used by Vector
            secrets.stackable.tech/scope: |-
              service=opensearch-cluster-master
        spec:
          storageClassName: secrets.stackable.tech
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1
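If you would like to inspect the manifests that the chart generates from these values before installing anything, an optional dry run with helm template works; the output is lengthy but shows where the values end up:
helm template opensearch opensearch \
  --repo https://opensearch-project.github.io/helm-charts \
  --values opensearch-values.yaml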
Install OpenSearch:
helm install opensearch opensearch \
--repo https://opensearch-project.github.io/helm-charts \
--values opensearch-values.yaml \
--wait
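Once the release is up, you can optionally check that OpenSearch answers on the REST layer. This sketch assumes the chart’s default service name opensearch-cluster-master and the default admin/admin credentials; --insecure is needed because the certificate is issued for the cluster-internal service name, not for localhost:
kubectl port-forward service/opensearch-cluster-master 9200
# in a second terminal:
curl --insecure --user admin:admin https://localhost:9200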
Install Vector with the log transformation rules
Create the file vector-transforms.yaml containing a config map with the necessary transformation steps for the log output:
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-transforms
data:
  raw.vrl: |
    # Create shortcuts for frequently used fields.
    # Short header names in OpenSearch Dashboards are also beneficial.
    .component = .kubernetes.pod_labels."app.kubernetes.io/name"
    .container = .kubernetes.container_name
  operators.vrl: |
    # Remove the colors from the log entry
    .message = strip_ansi_escape_codes!(.message)
    # Parse the log entry which consists of timestamp, level, and
    # message separated by whitespace.
    . |= parse_regex!(.message,
        r'^(?P<timestamp>\S+)\s+(?P<level>\S+)\s+(?P<message>.*)')
    # Parse the timestamp in ISO 8601 / RFC 3339 date & time format
    .timestamp = parse_timestamp!(.timestamp, "%+")
  superset.vrl: |
    # The log entry is either from Superset which consists of a
    # timestamp, level, logger, and message separated by colons ...
    log, err = parse_regex(.message,
        r'^(?P<timestamp>\d+-\d+-\d+ \d+:\d+:\d+,\d+):(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL):(?P<logger>[\w\.]+):(?P<message>.*)')
    if err == null {
        .origin = "Superset"
        . |= log
        .timestamp = parse_timestamp!(.timestamp, "%Y-%m-%d %H:%M:%S,%3f")
    } else {
        # ... or from gunicorn which consists of a timestamp, process,
        # level, and message enclosed in brackets ...
        log, err = parse_regex(.message,
            r'^\[(?P<timestamp>\d+-\d+-\d+ \d+:\d+:\d+ [+-]\d+)\] \[(?P<process>\d+)] \[(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\] (?P<message>.*)')
        if err == null {
            .origin = "gunicorn"
            . |= log
            .timestamp = parse_timestamp!(.timestamp, "%Y-%m-%d %H:%M:%S %z")
        } else {
            # ... or standard output which also exists as a log entry and
            # can therefore be discarded.
            .origin = "stdout"
        }
    }
    # Align the log levels with the ones used in the operators
    if .level == "WARNING" {
        .level = "WARN"
    } else if .level == "CRITICAL" {
        .level = "ERROR"
    }
  superset_metrics.vrl: |
    # Parse the log line which consists of keys and values in the
    # format `key1=value1 key2=value2`. The values are enclosed in
    # quotes if necessary and are properly escaped.
    .structured = parse_key_value!(.message)
    .timestamp = parse_timestamp!(.structured.ts, "%+")
    .level = upcase!(.structured.level)
    .message = .structured.msg
  trino.vrl: |
    # Parse the log entry which consists of timestamp, level, thread,
    # logger, and message separated by tabs.
    . |= parse_regex!(.message,
        r'^(?P<timestamp>[^\t]+)\t(?P<level>[^\t]+)\t(?P<thread>[^\t]+)\t(?P<logger>[^\t]+)\t(?P<message>.*)')
    # Parse the timestamp in ISO 8601 / RFC 3339 date & time format
    .timestamp = parse_timestamp!(.timestamp, "%+")
Then apply this file:
kubectl apply --filename=vector-transforms.yaml
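A quick way to verify that the config map was created with all five VRL files is to describe it:
kubectl describe configmap vector-transforms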
Create the file vector-values.yaml with the following content to configure the Helm deployment:
# Deploy the agent as a DaemonSet
role: Agent
service:
  # Disable Vector's service because it is not used here and would
  # require additional configuration.
  enabled: false
customConfig:
  # Set a directory used for persisting state which is writable by
  # Vector
  data_dir: /vector-data-dir
  sources:
    # Collect all log data for Kubernetes nodes enriched with metadata
    # from the Kubernetes API
    k8s_all:
      type: kubernetes_logs
  transforms:
    # Raw Kubernetes logs with a component field
    raw_logs:
      type: remap
      inputs:
        - k8s_all
      file: /vrl/raw.vrl
    # Raw operator logs
    raw_operators:
      type: filter
      inputs:
        - raw_logs
      condition: >
        includes([
          "superset-operator",
          "trino-operator",
        ],
        .component
        )
    # Operator logs with timestamp, level, and message
    structured_operators:
      type: remap
      inputs:
        - raw_operators
      file: /vrl/operators.vrl
    # Raw Superset logs from the superset container
    raw_superset:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "superset" && .container == "superset"
    # Superset logs with timestamp, level, logger, and message but also
    # with unstructured content from the standard output
    semistructured_superset:
      type: remap
      inputs:
        - raw_superset
      file: /vrl/superset.vrl
    # Superset logs with timestamp, level, logger, and message
    structured_superset:
      type: filter
      inputs:
        - semistructured_superset
      condition: |-
        # Discard the standard output because it also exists as a log
        # entry
        .origin != "stdout"
    # Raw Superset logs from the metrics container
    raw_superset_metrics:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "superset" && .container == "metrics"
    # Superset metrics logs with timestamp, level, and message
    structured_superset_metrics:
      type: remap
      inputs:
        - raw_superset_metrics
      file: /vrl/superset_metrics.vrl
    # Raw Trino logs
    raw_trino:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "trino"
    # Raw multi-line Trino logs
    raw_multiline_trino:
      type: reduce
      inputs:
        - raw_trino
      merge_strategies:
        message: concat_newline
      starts_when: |-
        # The next entry starts when the message contains a timestamp,
        # level, thread, logger, and message separated by tabs.
        # Multi-line messages and stacktraces are merged into one log
        # entry. The newline is preserved with the merge strategy
        # "concat_newline".
        match(string!(.message), r'^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t.*')
    # Trino logs with timestamp, level, thread, logger, and message
    structured_trino:
      type: remap
      inputs:
        - raw_multiline_trino
      file: /vrl/trino.vrl
  sinks:
    opensearch_out:
      # Write to OpenSearch/Elasticsearch
      type: elasticsearch
      inputs:
        - structured_*
      endpoint: |-
        https://opensearch-cluster-master.default.svc.cluster.local:9200
      mode: bulk
      # Do not send the type because it was removed in OpenSearch 2.0.0 /
      # Elasticsearch 8.0
      suppress_type_name: true
      tls:
        # Add the certificate of the Certificate Authority generated by
        # the secret-operator so that the OpenSearch service can be
        # verified
        ca_file: /certs/ca.crt
      # For the sake of simplicity, the credentials provided by the
      # OpenSearch Helm chart are used. This must be replaced with
      # certificate-based authentication when used in production.
      auth:
        strategy: basic
        user: admin
        password: admin
extraVolumeMounts:
  # Use the certificate generated by the secret-operator
  - name: tls
    mountPath: /certs
  # Use the Vector transformations stored in the config map
  - name: transforms
    mountPath: /vrl
extraVolumes:
  - name: tls
    ephemeral:
      volumeClaimTemplate:
        metadata:
          annotations:
            secrets.stackable.tech/class: tls
            secrets.stackable.tech/scope: pod
        spec:
          storageClassName: secrets.stackable.tech
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1
  - name: transforms
    configMap:
      name: vector-transforms
Deploy Vector:
helm install vector vector \
--repo https://helm.vector.dev \
--values vector-values.yaml \
--wait
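Vector should now run as one agent pod per node. To check that the DaemonSet rolled out and that the agents start without configuration errors, you can look at its status and logs (the resource name vector comes from the Helm release name used above):
kubectl rollout status daemonset/vector
kubectl logs daemonset/vector --tail=20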
Inspect the logs in the OpenSearch Dashboards
Install the OpenSearch Dashboards:
helm install opensearch-dashboards opensearch-dashboards \
--repo https://opensearch-project.github.io/helm-charts \
--wait
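The Dashboards pod can take a minute or two to become ready. If you want to wait for it before forwarding the port, the following should work, assuming the deployment is named after the Helm release:
kubectl rollout status deployment/opensearch-dashboards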
Forward the OpenSearch Dashboards service to your local host:
kubectl port-forward service/opensearch-dashboards 5601
The OpenSearch Dashboards server takes a while to become ready. If the port forwarding aborts due to a lost connection to the pod, try it again.
Open the dashboards in your browser (http://localhost:5601) and log in with the username admin and the password admin.
On the “Welcome” page, click “Explore on my own”.
Select the private tenant.
In the pop-up menu, select “Stack Management”, click on “Index Patterns” and “Create index pattern”.
Insert vector-* into “Index pattern name” and click on “Next step”.
Choose timestamp as “Time field” and click on “Create index pattern”.
Now open the pop-up menu again and go to “Discover”.
Ensure that vector-* is the selected index pattern. If necessary, increase the time range to see the logs produced by the clusters.
In the “Available fields” list, add the fields “component”, “container”, “level”, and “message” as columns.
Now you should have an overview of what is going on in your clusters. Add filters and select a time range to further narrow the logs down.
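If you prefer the command line over the Dashboards UI, you can also run a quick sanity check against the indices directly. The sketch below reuses a port-forward to the OpenSearch service and the default admin/admin credentials; Vector writes to daily indices matching the vector-* pattern:
kubectl port-forward service/opensearch-cluster-master 9200
# in a second terminal:
curl --insecure --user admin:admin "https://localhost:9200/_cat/indices/vector-*?v"
You should see at least one index with a growing document count while the demo clusters are running.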
Conclusion
We hope that this writeup has been interesting to you!
You have seen how easy it is to install a Stackable demo, and the nitty-gritty of configuring a Vector & OpenSearch stack around it to collect and view all the logs produced by those applications.
If you care about data platforms, or want to learn more about Stackable, check out one of our demos, or look into the one you might have already spun up on your cluster while following this writeup.