Logstash is a data pipeline that can take data input from various sources, filter it, and output it to various destinations; these sources and destinations can be files, Kafka, or databases. Logstash is a very important tool in the Elastic Stack, as it's primarily used to pull data from various sources and push it to Elasticsearch; from there, Kibana can use that data for analysis or visualization. Logstash can ingest any type of data, structured or unstructured, coming from a variety of sources. The data can be transformed using Logstash's filter section, which has different plugins for working with different kinds of data. For example, if we get an IP address in our data, the geoip filter plugin can look up the geolocation for that IP address and add it to the output; that additional geolocation information can then be used in Kibana to plot a map.
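As a minimal sketch of that idea, the following filter block uses the geoip plugin to enrich each event with geolocation fields; the clientip field name is an assumption and must already contain the parsed IP address (for example, after parsing an Apache access log with grok):
filter
{
  geoip
  {
    # Assumed field name: "clientip" must already hold the client's IP address,
    # for example after an Apache access log line has been parsed by grok.
    source => "clientip"
  }
}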
The following is a complete example of a Logstash configuration file:
input
{
  file
  {
    path => "/var/log/apache2/access.log"
  }
}
filter
{
  grok
  {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output
{
  elasticsearch
  {
    hosts => "localhost"
  }
}
In the preceding example, we have three sections: input, filter, and output. In the input section, we're reading the Apache access log file. The filter section extracts the Apache access log data into different fields using the grok filter. The output section is quite straightforward, as it pushes the data to the local Elasticsearch cluster. We can configure the input and output sections to read from or write to different sources, while in the filter section we can apply different plugins to transform the input data; for example, we can mutate a field, transform a field value, or add geolocation from an IP address, as sketched below.
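The following mutate filter is a minimal sketch of such transformations; the field names are hypothetical and only illustrate the rename, convert, and add_field options:
filter
{
  mutate
  {
    # Hypothetical field names, for illustration only.
    rename    => { "host" => "hostname" }     # rename a field
    convert   => { "bytes" => "integer" }     # change a field's type
    add_field => { "environment" => "test" }  # add a static field
  }
}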
Grok is a tool that we can use to generate structured and queryable data by parsing unstructured data.
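For example, the following sketch (the field names are illustrative) parses an unstructured line such as 55.3.244.1 GET /index.html 15824 0.043 into separate, queryable fields:
filter
{
  grok
  {
    # Turns "55.3.244.1 GET /index.html 15824 0.043" into the fields
    # client, method, request, bytes, and duration.
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}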