In most systems, logs are crucial for maintaining system health and troubleshooting issues. While application-specific log records are valuable, they often fall short when it comes to gaining comprehensive insights. To achieve a deeper understanding, you must gather and analyze logs from various sources, including Docker containers, syslog, databases, and more. This is where a log aggregator comes into play. A log aggregator is a tool designed to collect, transform, and route logs from diverse sources to a central location, enhancing your ability to analyze and troubleshoot effectively. Many log aggregators are available, including Vector, Fluentd, and Filebeat. In this article, we will focus on Vector.
Vector is a robust open-source log aggregator developed by Datadog. It empowers you to build observability pipelines by seamlessly fetching logs from many sources, transforming the data as needed, and routing it to your preferred destination. Vector stands out for its lightweight nature, exceptional speed, and memory efficiency, mainly owing to its implementation in Rust, a programming language renowned for its memory safety and performance.
Vector offers a rich set of features commonly found in log aggregators, including support for plugins that enable integration with various data sources and destinations, real-time monitoring, and robust security features. Additionally, Vector can be configured for high availability, ensuring it can handle substantial volumes of logs without compromising performance.
This comprehensive guide will explore how to leverage Vector to collect, forward, and manage logs effectively. We'll start by building a sample application that writes logs to a file. Next, we'll walk you through using Vector to read the logs and direct them to the console. Finally, we'll delve into log transformation, centralization, and monitoring to ensure the health and reliability of your Vector-based log management setup.
Prerequisites
To complete this tutorial, you will need a system with a non-root user that has sudo privileges. Optionally, you can install Docker and Docker Compose on your system.
Once you’ve met these requirements, create a root project directory to house your application, configurations, and Dockerfiles:
mkdir log-processing-stack
This directory will serve as the foundation for your project as you progress through the tutorial.
Afterward, move into the directory:
cd log-processing-stack
Next, create a directory dedicated to your demo application. Then move into the newly created directory:
mkdir logify && cd logify
Developing a demo logging application
In this section, you will create a sample Bash script that generates logs at regular intervals.
In the logify directory, create a new file named logify.sh with the text editor of your choice:
nano logify.sh
In your logify.sh file, add the following code:
log-processing-stack/logify/logify.sh
#!/bin/bash

filepath="/var/log/logify/app.log"

create_log_entry() {
    # Pick one of the info messages at random
    local info_messages=("Connected to database" "Task completed successfully" "Operation finished" "Initialized application")
    local random_message=${info_messages[$RANDOM % ${#info_messages[@]}]}
    local http_status_code=200
    local ip_address="127.0.0.1"
    local emailAddress="user@mail.com"
    local level=30
    local pid=$$
    local ssn="407-01-2433"
    local time=$(date +%s)
    # Assemble the fields into a JSON-formatted log entry
    local log='{"status": '$http_status_code', "ip": "'$ip_address'", "level": '$level', "emailAddress": "'$emailAddress'", "msg": "'$random_message'", "pid": '$pid', "ssn": "'$ssn'", "time": '$time'}'
    echo "$log"
}

# Generate a new log entry every three seconds
while true; do
    log_record=$(create_log_entry)
    echo "${log_record}" >> "${filepath}"
    sleep 3
done
The create_log_entry() function creates a log entry in JSON format, which includes fields such as the HTTP status code, IP address, a random log message, the process ID, a social security number, and a timestamp. The script then enters an infinite loop, repeatedly calling this function to generate log entries and appending them to the specified log file in the /var/log/logify directory.
Note that while this example includes personal information, such as email addresses, social security numbers, and IP addresses, it is intended purely for demonstration purposes. Vector can filter out sensitive data by either removing personal information fields or redacting them, which is crucial for maintaining data privacy and security. You'll learn how to implement this later in the tutorial.
Once you are finished, save the changes you've made to the file. Run the following command to make the script executable:
chmod +x logify.sh
Next, create the /var/log/logify directory where the application will store the logs:
sudo mkdir /var/log/logify
Change the directory ownership to the user specified in the $USER environment variable, which contains the currently logged-in user:
sudo chown -R $USER:$USER /var/log/logify/
Now, execute the script in the background by adding & at the end:
./logify.sh &
The bash job control system yields output that includes the process ID:
[1] 2933
The process ID, which is 2933 in this case, will be used to terminate the script later.
Next, view the contents of the log file using the tail command:
tail -n 4 /var/log/logify/app.log
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 12655, "ssn": "407-01-2433", "time": 1694551048}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 12655, "ssn": "407-01-2433", "time": 1694551051}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551072}
{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Initialized application", "pid": 12665, "ssn": "407-01-2433", "time": 1694551075}
Installing Vector
Now that you can generate logs, you will install the latest version of Vector. In this article, we will install Vector on Ubuntu 22.04 through the apt package manager. If you're using a different system, you can select the appropriate option for your operating system on the Vector documentation page.
To add the Vector repository, use the following command:
curl -1sLf \
  'https://repositories.timber.io/public/vector/cfg/setup/bash.deb.sh' \
  | sudo -E bash
Install Vector with the following command:
sudo apt install vector
Next, confirm that the installation was successful by checking the version:
vector --version
vector 0.32.1 (x86_64-unknown-linux-gnu 9965884 2023-08-21 14:52:38.330227446)
When you install Vector, it automatically launches in the background as a systemd service. However, in this tutorial we will run Vector manually, so the service should not be left running; running Vector manually while the background service is active can lead to conflicts.
To stop the Vector service, use the following command:
sudo systemctl stop vector
How Vector works
With Vector now installed, let’s explore how it works.
To understand Vector, imagine it as a pipeline. At one end, Vector ingests raw logs and standardizes them into a unified log event format. As a log event travels through Vector, it can be reshaped and enriched by "transforms". Finally, at the end of the pipeline, the log event can be sent to one or more destinations for storage or analysis.
You can define the data sources, transforms, and destinations in a configuration file at /etc/vector/vector.toml. This configuration file is organized into the following components:
[sources.<source_id>]
. . .

[transforms.<transform_id>]
. . .

[sinks.<sink_id>]
. . .
This structure allows you to configure and customize Vector to suit your specific log aggregation and processing needs.
Let's analyze the components (a concrete example follows the list):
- [sources.<source_id>]: defines the data sources that Vector should read.
- [transforms.<transform_id>]: specifies how the data should be manipulated or transformed.
- [sinks.<sink_id>]: defines the destinations where Vector should route the data.
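To make this structure concrete, here is a minimal sketch of a complete pipeline that uses Vector's built-in demo_logs source to generate sample events and print them to the console. The component names are arbitrary, and since transforms are optional, this sketch omits them:

[sources.generate_demo]
type = "demo_logs"
# Emit sample events encoded as JSON
format = "json"

[sinks.print_demo]
type = "console"
# Read events from the demo_logs source defined above
inputs = ["generate_demo"]
encoding.codec = "json"

Running Vector against a file containing this configuration prints a stream of generated JSON events, which is a handy way to verify an installation before wiring up real sources.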
Each component requires you to specify a plugin. For sources, the following are some of the inputs you can use:
- File: fetch logs from files.
- Docker Logs: gather logs from Docker containers.
- Socket: collect logs sent via the socket client.
- Syslog: fetch logs from Syslog.
To process the data, here are some of the transforms that can come in handy (an example follows the list):
- Remap with VRL: an expression-oriented language designed to transform your data.
- Lua: use the Lua programming language to transform log events.
- Filter: filter events according to the specified conditions.
- Throttle: rate limit log streams.
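For instance, a minimal filter transform might look like the following sketch. It assumes a source named app_logs whose events carry a numeric status field (as our demo logs will once parsed); the component name and condition are illustrative:

[transforms.errors_only]
type = "filter"
# Read events from the app_logs source
inputs = ["app_logs"]
# Keep only events whose status field is not 200
condition = '.status != 200'

Any sink that lists errors_only in its inputs would then receive only the non-200 events.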
Finally, let's look at some of the sinks available for Vector (an example follows the list):
- HTTP: forward logs to an HTTP endpoint.
- WebSocket: deliver observability data to a WebSocket endpoint.
- Loki: forward logs to Grafana Loki.
- Elasticsearch: deliver logs to Elasticsearch.
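As an illustration, a minimal HTTP sink could be sketched as follows. The endpoint URL is a placeholder you would replace with your log management service's ingest address, and app_logs is again an assumed source name:

[sinks.http_output]
type = "http"
inputs = ["app_logs"]
# Placeholder endpoint; substitute your service's ingest URL
uri = "https://example.com/ingest"
encoding.codec = "json"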
In the next section, you will use the file source to read logs from a file and forward the records to the console using the console sink.
Getting started with Vector
Now that you know how Vector works, you will configure it to read log records from the /var/log/logify/app.log file and redirect them to the console.
Open the /etc/vector/vector.toml file, ensuring you have the necessary superuser privileges:
sudo nano /etc/vector/vector.toml
Remove all the existing contents and add the following lines:
/etc/vector/vector.toml
[sources.app_logs]
type = "file"
include = ["/var/log/logify/app.log"]
[sinks.print]
type = "console"
inputs = ["app_logs"]
encoding.codec = "json"
The sources.app_logs component reads logs from a file using the file source. The source is specified with the type option, and you define the include option, which contains the path to the file that should be read.
The sinks.print component specifies the destination to send the logs. To redirect them to the console, you set the type to the console sink. Next, you specify the source component from which the logs will originate, which is the app_logs component in this case. Finally, you specify that logs should be in JSON format using encoding.codec.
Once you have made these configurations, save the file and validate your changes in the terminal:
sudo vector validate /etc/vector/vector.toml
2023-09-04T17:58:20.187534Z WARN vector::app: DEPRECATED The openssl legacy provider provides algorithms and key sizes no longer recommended for use. Set `--openssl-legacy-provider=false` or `VECTOR_OPENSSL_LEGACY_PROVIDER=false` to disable. See https://vector.dev/highlights/2023-08-15-0-32-0-upgrade-guide/#legacy-openssl for details.
√ Loaded ["/etc/vector/vector.toml"]
√ Component configuration
√ Health check "print"
------------------------------------
Validated
Now you can run Vector:
sudo vector
Upon starting, it will pick up the default configuration file at /etc/vector/vector.toml automatically.
If you defined vector.toml in a different location, you need to pass the full path to the configuration file:
sudo vector --config /path/to/vector.toml
When Vector starts, you will see output confirming the launch:
2023-09-12T05:56:41.803796Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-09-12T05:56:41.804202Z WARN vector::app: DEPRECATED The openssl legacy provider provides algorithms and key sizes no longer recommended for use. Set `--openssl-legacy-provider=false` or `VECTOR_OPENSSL_LEGACY_PROVIDER=false` to disable. See https://vector.dev/highlights/2023-08-15-0-32-0-upgrade-guide/#legacy-openssl for details.
2023-09-12T05:56:41.805079Z INFO vector::app: Loaded openssl provider. provider="legacy"
2023-09-12T05:56:41.805287Z INFO vector::app: Loaded openssl provider. provider="default"
2023-09-12T05:56:41.806105Z INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
2023-09-12T05:56:41.809530Z INFO vector::topology::running: Running healthchecks.
2023-09-12T05:56:41.810125Z INFO vector: Vector has started. debug="false" version="0.32.1" arch="x86_64" revision="9965884 2023-08-21 14:52:38.330227446"
2023-09-12T05:56:41.810335Z INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
...
After a few seconds, you will start seeing log messages in JSON format appear at the end:
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 12655, "ssn": "407-01-2433", "time": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Connected to database", "pid": 12655, "ssn": "407-01-2433", "time": 1694551051}","source_type":"file","timestamp":"2023-09-12T20:40:21.582980072Z"}
...
The output confirms that Vector can successfully read the log files and route the logs to the console. Vector has automatically added several fields, such as file, host, message, source_type, and timestamp, to each log entry for further context.
You can now press CTRL + C to exit Vector.
Transforming the logs
It’s uncommon to send logs without processing them in some way. Often, you may need to enrich them with important fields, redact sensitive data, or transform plain text logs into a structured format like JSON, which is easier for machines to parse.
Vector offers a powerful language for data manipulation called Vector Remap Language (VRL). VRL is a high-performance, expression-oriented language designed for transforming data. It provides functions for parsing data and converting data types, and it includes conditional statements, among other capabilities.
In this section, you will use VRL to process data in the following ways (a combined sketch follows the list):
- Parsing JSON logs.
- Removing fields.
- Adding new fields.
- Converting timestamps.
- Redacting sensitive data.
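To preview where we're headed, here is a sketch of a single remap transform that combines all five operations on the demo app's logs. Treat it as illustrative rather than the tutorial's final configuration; the component name is arbitrary, and the tutorial builds up these steps one at a time:

[transforms.preview_processing]
type = "remap"
inputs = ["app_logs"]
source = '''
# Parse the JSON string held in the message field into the event itself
., err = parse_json(.message)

# Remove a field containing personal information
del(.emailAddress)

# Add a new field
.environment = "dev"

# Convert the Unix timestamp (in seconds) to a native timestamp
.time = from_unix_timestamp!(.time)

# Redact the social security number
.ssn = redact(.ssn, filters: ["us_social_security_number"])
'''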
Vector Remap Language (VRL) dot operator
Before we dive into transforming logs with VRL, let’s cover some fundamentals that will help you understand how to use it efficiently.
To get familiar with the syntax, Vector provides a vector vrl subcommand, which starts a read-eval-print loop (REPL). To use it, you need to provide the --input option, which accepts a JSON file with log events.
First, make sure you are in the log-processing-stack/logify directory and create an input.json file:
nano input.json
In your input.json file, add the following log event from the output in the last section:
log-processing-stack/logify/input.json
{"file":"/var/log/logify/app.log","host":"vector-test","message":"{"status": 200, "ip": "127.0.0.1", "level": 30, "emailAddress": "user@mail.com", "msg": "Task completed successfully", "pid": 12655, "ssn": "407-01-2433", "time": 1694551048}","source_type":"file","timestamp":"2023-09-12T20:40:21.582883690Z"}
Make sure there are no trailing spaces at the end to avoid errors.
Then, start the REPL:
vector vrl --input input.json
Type a single dot into the REPL prompt:
.
When Vector reads the log event in the input.json file, the dot operator will return the following:
{ "file": "/var/log/logify/app.log", "host": "vector-test", "message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}", "source_type": "file", "timestamp": "2023-09-12T20:40:21.582883690Z" }
The . references the incoming event, and every event that Vector processes can be accessed using this dot notation. To access a property, you prefix its name with a ., like so:
.host
"vector-test"
You can also reassign the value of . to another property:
. = .host
Now, type the . again:
.
It will no longer refer to the original object but to the "host" property:
"vector-test"
Now you can exit the REPL by typing exit:
exit
Now that you are familiar with the dot operator, you will explore VRL in more detail in the upcoming sections, starting with parsing JSON logs.
Parsing JSON logs using Vector
To begin, if you examine the message property in the output closely, you will notice that even though the log entry was originally in JSON format, Vector has converted it into a string:
{
  "file": "/var/log/logify/app.log",
  "host": "vector-test",
  "message": "{\"status\": 200, \"ip\": \"127.0.0.1\", \"level\": 30, \"emailAddress\": \"user@mail.com\", \"msg\": \"Task completed successfully\", \"pid\": 12655, \"ssn\": \"407-01-2433\", \"time\": 1694551048}",
  "source_type": "file",
  "timestamp": "2023-09-12T20:40:21.582883690Z"
}
However, our goal is to have Vector parse the JSON logs. To achieve this, open the configuration file again:
sudo nano /etc/vector/vector.toml
Next, define a transform and set it to use the remap transform:
/etc/vector/vector.toml
...
[transforms.app_logs_parser]
inputs = ["app_logs"]
type = "remap"
source = '''
# Parse JSON logs
., err = parse_json(.message)
'''
[sinks.print]
type = "console"
inputs = ["app_logs_parser"]
encoding.codec = "json"
You define a transform named transforms.app_logs_parser to process the logs. You specify that the input for this component should come from the source reading the records, which is app_logs here. Next, you configure the component to use the remap transform by setting the type option.