Centralized logging is the foundation of observability in a Kubernetes cluster. If you want to understand what’s happening across distributed workloads, hunting down the outputs of all the different interacting parts quickly becomes a pain.
We want to provide our users and customers with a more convenient way to get an overview of what’s going on across deployed Stackable operators and the products they manage.
As a first step towards this goal, we tried out Vector – a tool that can be configured to handle many types of observability data – logs among them.
This writeup was created as a by-product of an early proof of concept. The resulting instructions were so detailed that we decided to publish them on their own, so more people could find them.
This started as a comment in this issue. All of Stackable’s code is open-source!
Read on, and feel free to follow the instructions to set up a proof-of-concept Vector + Stackable setup. Mind you, the result is not meant to be production-ready, but it could be helpful if you are looking to check out Vector and learn more about cluster-wide log aggregation.
Note: if you are more curious about Stackable than logs, check out our demos.
Collecting Stackable logs with Vector – a tutorial
In this tutorial, the Stackable Data Platform is set up with an Apache Superset and Trino cluster. The logs of Superset and Trino, as well as the logs of the corresponding Stackable operators, are gathered by Vector and stored in OpenSearch. These logs can then be viewed in OpenSearch Dashboards.
The Vector agents are deployed as a DaemonSet. The agents transform the log data into a uniform format and forward it directly to OpenSearch.
This tutorial is written for the release 22.09 of the Stackable Data Platform. Later releases will provide a more integrated approach.
Note: If you’re interested in learning more about the journey towards a more integrated approach, you can follow along! Check out this GitHub issue where we track the progress.
Install a Superset and Trino cluster in kind
If you haven’t set up stackablectl yet, you can follow these instructions to do so. Now, you can create a kind cluster with the trino-taxi-data demo:
stackablectl demo install --kind-cluster trino-taxi-data
Be aware that this installs the latest version of the Stackable Data Platform.
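The demo pulls several container images, so it can take a few minutes until everything is up. If you want to keep an eye on the progress (this is optional and not part of the demo itself), watching the pods is the simplest way:
kubectl get pods --watch
Once the Superset and Trino pods report a Running status, you are good to go.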
OpenSearch prerequisites
OpenSearch uses a mmapfs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low – usually 65530 – which may result in out-of-memory exceptions. Therefore the Linux setting vm.max_map_count on the host machine where kind is running must be set to at least 262144.
Note: if you are on a Mac, the commands in this section won’t work for you. You can skip it, and continue with the rest of the tutorial. They are not strictly necessary to get a working prototype going.
To check the current value, run this command:
sysctl vm.max_map_count
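On a default installation you will most likely see the value mentioned above, so the output should look roughly like this:
vm.max_map_count = 65530
If the value is already 262144 or higher, you can skip the next two steps.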
The limit can be temporarily increased with:
sudo sysctl --write vm.max_map_count=262144
To permanently increase the value, add the following line to /etc/sysctl.conf:
vm.max_map_count=262144
Then run sudo sysctl --load to reload the settings.
Install OpenSearch
Create the file opensearch-values.yaml to configure the OpenSearch Kubernetes deployment:
config:
  opensearch.yml: |
    plugins:
      security:
        # Use default security settings
        allow_default_init_securityindex: true
        # Allow communication between the nodes which use the
        # certificates generated by the secret-operator
        nodes_dn:
          - CN=generated certificate for pod
        # Use the certificate generated by the secret-operator
        ssl:
          http:
            # Enable TLS on the REST layer
            enabled: true
            pemcert_filepath: certs/tls.crt
            pemkey_filepath: certs/tls.key
            pemtrustedcas_filepath: certs/ca.crt
          transport:
            pemcert_filepath: certs/tls.crt
            pemkey_filepath: certs/tls.key
            pemtrustedcas_filepath: certs/ca.crt
            # Disable the verification of hostnames because internal IPs
            # are used which are not included in the certificates
            # generated by the secret-operator.
            enforce_hostname_verification: false
extraEnvs:
  # Disable the creation of demo certificates
  - name: DISABLE_INSTALL_DEMO_CONFIG
    value: "true"
extraVolumeMounts:
  # Mount the certificate generated by the secret-operator
  - name: tls
    mountPath: /usr/share/opensearch/config/certs
extraVolumes:
  # Request a TLS certificate from the secret-operator
  - name: tls
    ephemeral:
      volumeClaimTemplate:
        metadata:
          annotations:
            secrets.stackable.tech/class: tls
            # Add the service opensearch-cluster-master to the
            # distinguished names because this service is used by Vector
            secrets.stackable.tech/scope: |-
              service=opensearch-cluster-master
        spec:
          storageClassName: secrets.stackable.tech
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1
Install OpenSearch:
helm install opensearch opensearch \
--repo https://opensearch-project.github.io/helm-charts \
--values opensearch-values.yaml \
--wait
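If you want to verify that OpenSearch is up before continuing (an optional check, not required for the rest of the tutorial), you can forward its service to your local host in a second terminal and query the REST API with the default admin credentials. The --insecure flag is only needed because the CA generated by the secret-operator is not in your local trust store:
kubectl port-forward service/opensearch-cluster-master 9200
curl --insecure --user admin:admin https://localhost:9200
The response should be a small JSON document containing the cluster name and the OpenSearch version.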
Install Vector with the log transformation rules
Create the file vector-transforms.yaml containing a config map with the necessary transformation steps for the log output:
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-transforms
data:
  raw.vrl: |
    # Create shortcuts for frequently used fields
    # Short header names in OpenSearch Dashboards are also beneficial.
    .component = .kubernetes.pod_labels."app.kubernetes.io/name"
    .container = .kubernetes.container_name
  operators.vrl: |
    # Remove the colors from the log entry
    .message = strip_ansi_escape_codes!(.message)
    # Parse the log entry which consists of timestamp, level, and
    # message separated by whitespaces.
    . |= parse_regex!(.message,
      r'^(?P<timestamp>\S+)\s+(?P<level>\S+)\s+(?P<message>.*)')
    # Parse the timestamp in ISO 8601 / RFC 3339 date & time format
    .timestamp = parse_timestamp!(.timestamp, "%+")
  superset.vrl: |
    # The log entry is either from Superset which consists of a
    # timestamp, level, logger, and message separated by colons ...
    log, err = parse_regex(.message,
      r'^(?P<timestamp>\d+-\d+-\d+ \d+:\d+:\d+,\d+):(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL):(?P<logger>[\w\.]+):(?P<message>.*)')
    if err == null {
      .origin = "Superset"
      . |= log
      .timestamp = parse_timestamp!(.timestamp, "%Y-%m-%d %H:%M:%S,%3f")
    } else {
      # or from gunicorn which consists of a timestamp, process, level,
      # and message enclosed in brackets ...
      log, err = parse_regex(.message,
        r'^\[(?P<timestamp>\d+-\d+-\d+ \d+:\d+:\d+ [+-]\d+)\] \[(?P<process>\d+)] \[(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\] (?P<message>.*)')
      if err == null {
        .origin = "gunicorn"
        . |= log
        .timestamp = parse_timestamp!(.timestamp, "%Y-%m-%d %H:%M:%S %z")
      } else {
        # or a standard output which also exists as a log entry and can
        # therefore be discarded.
        .origin = "stdout"
      }
    }
    # Align the log levels with the ones used in the operators
    if .level == "WARNING" {
      .level = "WARN"
    } else if .level == "CRITICAL" {
      .level = "ERROR"
    }
  superset_metrics.vrl: |
    # Parse the log line which consists of keys and values in the
    # format `key1=value1 key2=value2`. The values are enclosed in
    # quotes if necessary and are properly escaped.
    .structured = parse_key_value!(.message)
    .timestamp = parse_timestamp!(.structured.ts, "%+")
    .level = upcase!(.structured.level)
    .message = .structured.msg
  trino.vrl: |
    # Parse the log entry which consists of timestamp, level, thread,
    # logger, and message separated by tabs.
    . |= parse_regex!(.message,
      r'^(?P<timestamp>[^\t]+)\t(?P<level>[^\t]+)\t(?P<thread>[^\t]+)\t(?P<logger>[^\t]+)\t(?P<message>.*)')
    # Parse the timestamp in ISO 8601 / RFC 3339 date & time format
    .timestamp = parse_timestamp!(.timestamp, "%+")
Then apply this file:
kubectl apply --filename=vector-transforms.yaml
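To get a feeling for what these remap programs do, here is a made-up Superset log line together with the fields the first regular expression in superset.vrl would capture from it (the values are purely illustrative and not actual output from the demo):
# Raw message:
#   2022-11-15 10:23:45,123:INFO:superset.app:Database connected
# Captured fields:
#   timestamp = "2022-11-15 10:23:45,123"
#   level     = "INFO"
#   logger    = "superset.app"
#   message   = "Database connected"
# In addition, .origin is set to "Superset" and the timestamp string is
# converted into a proper timestamp value.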
Create the file vector-values.yaml with the following content to configure the Helm deployment:
# Deploy the agent as a DaemonSet
role: Agent
service:
  # Disable Vector's service because it is not used here and would
  # require additional configuration.
  enabled: false
customConfig:
  # Set a directory used for persisting state which is writable by
  # Vector
  data_dir: /vector-data-dir
  sources:
    # Collect all log data for Kubernetes nodes enriched with metadata
    # from the Kubernetes API
    k8s_all:
      type: kubernetes_logs
  transforms:
    # Raw Kubernetes logs with a component field
    raw_logs:
      type: remap
      inputs:
        - k8s_all
      file: /vrl/raw.vrl
    # Raw operator logs
    raw_operators:
      type: filter
      inputs:
        - raw_logs
      condition: >
        includes([
          "superset-operator",
          "trino-operator",
        ],
        .component
        )
    # Operator logs with timestamp, level, and message
    structured_operators:
      type: remap
      inputs:
        - raw_operators
      file: /vrl/operators.vrl
    # Raw Superset logs from the superset container
    raw_superset:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "superset" && .container == "superset"
    # Superset logs with timestamp, level, logger, and message but also
    # with unstructured content from the standard output
    semistructured_superset:
      type: remap
      inputs:
        - raw_superset
      file: /vrl/superset.vrl
    # Superset logs with timestamp, level, logger, and message
    structured_superset:
      type: filter
      inputs:
        - semistructured_superset
      condition: |-
        # Discard the standard output because it also exists as a log
        # entry
        .origin != "stdout"
    # Raw Superset logs from the metrics container
    raw_superset_metrics:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "superset" && .container == "metrics"
    # Superset metrics logs with timestamp, level, and message
    structured_superset_metrics:
      type: remap
      inputs:
        - raw_superset_metrics
      file: /vrl/superset_metrics.vrl
    # Raw Trino logs
    raw_trino:
      type: filter
      inputs:
        - raw_logs
      condition: .component == "trino"
    # Raw multi-line Trino logs
    raw_multiline_trino:
      type: reduce
      inputs:
        - raw_trino
      merge_strategies:
        message: concat_newline
      starts_when: |-
        # The next entry starts when the message contains a timestamp,
        # level, thread, logger, and message separated by tabs.
        # Multi-line messages and stacktraces are merged into one log
        # entry. The newline is preserved with the merge strategy
        # "concat_newline".
        match(string!(.message), r'^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t.*')
    # Trino logs with timestamp, level, thread, logger, and message
    structured_trino:
      type: remap
      inputs:
        - raw_multiline_trino
      file: /vrl/trino.vrl
  sinks:
    opensearch_out:
      # Write to OpenSearch/Elasticsearch
      type: elasticsearch
      inputs:
        - structured_*
      endpoint: |-
        https://opensearch-cluster-master.default.svc.cluster.local:9200
      mode: bulk
      # Do not send the type because it was removed in OpenSearch 2.0.0 /
      # Elasticsearch 8.0
      suppress_type_name: true
      tls:
        # Add the certificate of the Certificate Authority generated by
        # the secret-operator so that the OpenSearch service can be
        # verified
        ca_file: /certs/ca.crt
      # For the sake of simplicity, the credentials provided by the
      # OpenSearch Helm chart are used. This must be replaced
      # with certificate based authentication when used in production.
      auth:
        strategy: basic
        user: admin
        password: admin
extraVolumeMounts:
  # Use the certificate generated by the secret-operator
  - name: tls
    mountPath: /certs
  # Use the Vector transformations stored in the config map
  - name: transforms
    mountPath: /vrl
extraVolumes:
  - name: tls
    ephemeral:
      volumeClaimTemplate:
        metadata:
          annotations:
            secrets.stackable.tech/class: tls
            secrets.stackable.tech/scope: pod
        spec:
          storageClassName: secrets.stackable.tech
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1
  - name: transforms
    configMap:
      name: vector-transforms
Deploy Vector:
helm install vector vector \
--repo https://helm.vector.dev \
--values vector-values.yaml \
--wait
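To confirm that the agents are running and not constantly logging errors, you can inspect the DaemonSet and its logs. The resource name vector is assumed here because it matches the Helm release name used above:
kubectl get daemonset vector
kubectl logs daemonset/vector | grep --ignore-case error
Ideally, the second command prints nothing. Errors about an unreachable endpoint or failing TLS verification usually point back to the OpenSearch setup from the previous sections.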
Inspect the logs in the OpenSearch Dashboards
Install the OpenSearch Dashboards:
helm install opensearch-dashboards opensearch-dashboards \
--repo https://opensearch-project.github.io/helm-charts \
--wait
Forward the OpenSearch Dashboards service to your local host:
kubectl port-forward service/opensearch-dashboards 5601
The OpenSearch Dashboards server takes a while to become ready. If port forwarding aborts due to a lost connection to the pod, just try it again.
Open the dashboards in your browser (http://localhost:5601) and log in with the username admin and the password admin.
On the “Welcome” page, click “Explore on my own”.
Select the private tenant.
In the pop-up menu, select “Stack Management”, click on “Index Patterns” and “Create index pattern”.
Insert vector-* into “Index pattern name” and click on “Next step”.
Choose timestamp as “Time field” and click on “Create index pattern”.
Now open the pop-up menu again and go to “Discover”.
Ensure that vector-* is the selected index pattern. If necessary, increase the time range to see the logs produced by the clusters.
In the “Available fields”, add the fields “component”, “container”, “level”, and “message” as columns.
Now you should have an overview of what is going on in your clusters. Add filters and select a time range to further narrow the logs down.
Conclusion
We hope that this writeup has been interesting to you!
You have seen how easy it is to install a Stackable demo, and the nitty-gritty of configuring a Vector & OpenSearch stack around it to collect and view all logs produced by those applications.
If you care about data platforms, or want to learn more about Stackable, check out one of our demos, or look into the one you might have already spun up on your cluster while following this writeup.