Log files contain a lot of important data about how your infrastructure works, but when they run to thousands of lines, it can be difficult to extract useful insight from them. Log management tools help solve this problem.
Why should I care about log files?
Every connection to your web server is logged; whenever a user requests a resource, a line is written to the log file. You can use these logs to get a very accurate picture of the traffic coming into your site. They do not show how users interact with pages once loaded (that is the job of analytics tools), but they do tell you how your web server handles each request.
The HTTP response code of each request is usually logged, so these logs are useful for tracking broken links and requests that return 404 errors (which can hurt the site's ranking when search engines like Google crawl it). This is something most analytics tools cannot capture, because the page never loaded and the analytics script never ran.
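To see how little work it takes to pull response codes out of an access log, here is a minimal sketch in Python. It assumes the common "combined" log format used by Apache and nginx; the sample lines are hypothetical.

```python
import re
from collections import Counter

# Regex for the common "combined" access log format used by
# Apache and nginx (sample lines below are hypothetical).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_statuses(lines):
    """Tally HTTP response codes across access-log lines."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            counts[m.group("status")] += 1
    return counts

sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
]
print(count_statuses(sample))  # Counter({'200': 1, '404': 1})
```

Pointing the same function at a real access log (one line per request) gives you a quick 404 report without any external tooling.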
Applications create error logs, which are useful for tracking backend issues. If a certain API call causes errors, it will show up in the log files very quickly. For your own applications you have to implement logging yourself, but there are plenty of logging libraries that make the process easier.
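As a sketch of what that looks like in practice, Python's standard `logging` module gives you timestamps and severity levels with a few lines; the "payments" component name and `charge` function here are hypothetical examples.

```python
import logging

# Configure the root logger once at application start-up; in
# production you would typically log to a file or a log shipper.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("payments")  # hypothetical component name

def charge(amount):
    if amount <= 0:
        # Errors land in the log with a timestamp and severity,
        # making backend issues easy to trace later.
        logger.error("rejected charge of %s", amount)
        return False
    logger.info("charged %s", amount)
    return True

charge(25)
charge(-1)
```

Because each entry carries a level, a log management tool can later filter for `ERROR` lines only.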
Unix keeps logs of everything that happens on the system. Every command you enter is logged in ~/.bash_history, every login attempt (including failed, potentially malicious ones) is logged in /var/log/auth.log, and most other system events generate their own log files, usually stored in /var/log/.
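Those auth logs are worth mining: the sketch below counts failed SSH logins per source IP from auth.log-style lines. The sample entries mimic real sshd output; in practice you would read /var/log/auth.log itself, which usually requires root.

```python
import re
from collections import Counter

# Matches sshd's "Failed password" entries, capturing the source IP.
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")

def failed_logins(lines):
    """Count failed login attempts per source address."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "Oct 10 13:55:36 host sshd[1042]: Failed password for invalid user admin from 198.51.100.7 port 52144 ssh2",
    "Oct 10 13:55:40 host sshd[1042]: Failed password for root from 198.51.100.7 port 52150 ssh2",
    "Oct 10 13:56:01 host sshd[1050]: Accepted password for alice from 192.0.2.4 port 50022 ssh2",
]
print(failed_logins(sample))  # Counter({'198.51.100.7': 2})
```

A single address with hundreds of failures is a strong hint of a brute-force attempt.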
The problem: too many servers, too many logs
Most applications leave logs, a paper trail of what that application has done. Some applications, such as web servers, generate so many logs that the files can fill up the server's hard drive and must be rotated regularly.
A single server is difficult enough to manage, but logs spread across multiple servers can be nearly impossible to keep up with, requiring you to authenticate on each server and manually inspect the log files on that specific machine.
Log management tools solve this problem by centralizing your logs in one place where you can view them much more easily. Many services also provide visualization tools, so you do not have to dig through ten thousand lines of text to find useful data.
How do log management tools work?
A log management tool like Fluentd runs on a server somewhere, whether in the cloud behind a managed web interface or self-hosted on your own systems. The machine running it is called the collection server, and it gathers logs from multiple external sources.
The process begins with input: log files from client systems are fed into the collection server by a program called a log shipper. Log shippers such as rsyslog are lightweight programs that sit on client systems and forward logs to the central server.
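As a concrete example, rsyslog can be pointed at a collection server with a single line in /etc/rsyslog.conf (the hostname here is a hypothetical placeholder):

```conf
# Forward all facilities and priorities to the central collector.
# A single "@" would send over UDP; "@@" uses TCP for reliable delivery.
*.* @@logs.example.com:514
```

After restarting the rsyslog service, every local syslog message is shipped to the collector as well as being written locally.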
Once the log files have been collected, what happens next depends on the log management tool. For some tools, simply collecting the logs is enough; they can then be sorted and loaded into a time series database such as InfluxDB for further analysis. Others, like Graylog, build their service around the quality of their visualization and analysis tools.
Which tool should I use?
Elastic Stack (also called the ELK stack) is a very popular logging platform. It consists of four different applications, all open source and from the same developer. It is completely free, but you have to host it yourself.
- Beats are lightweight log shippers designed to be installed on client machines, sending data to the other applications in the stack.
- Logstash is the ingestion engine, which can take data from Beats or other programs such as rsyslog and prepare it for submission to Elasticsearch (or another analytics engine).
- Elasticsearch is the engine at the center of the Elastic Stack (for which the stack is named). It acts as a database that stores your logs (and other objects) and exposes a RESTful API for other applications to use.
- Kibana is the front end of the Elastic Stack, providing all the visualizations, charts, graphs, and search capabilities for the end user.
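To make the RESTful API concrete, here is a sketch of a search body you could POST to Elasticsearch's `_search` endpoint. The Query DSL structure is real, but the index name `web-logs` and the `status` field are hypothetical examples of how web logs might be indexed.

```python
import json

# A request body for Elasticsearch's Query DSL: the 10 most recent
# documents whose "status" field matches 404. Index and field names
# are assumptions about how your logs were indexed.
query = {
    "query": {"match": {"status": 404}},
    "size": 10,
    "sort": [{"@timestamp": {"order": "desc"}}],
}

body = json.dumps(query)
print(body)
# This JSON would be sent to an endpoint such as
#   POST http://localhost:9200/web-logs/_search
# with a Content-Type: application/json header.
```

Kibana builds exactly these kinds of queries for you behind its search bar and dashboards.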
Many of the tools in the Elastic Stack are fairly plug-and-play with other log management tools, so if you prefer something else, you can probably swap that component out of the stack. Overall, though, most tools and frameworks follow the same general structure as the Elastic Stack: log shipper > ingestion engine > database > visualization tool.
Fluentd and Filebeat are alternative ingestion engines that would replace Logstash in the stack. These can feed data into a time series database such as InfluxDB, which Grafana, an analytics and visualization platform, supports as a built-in data source.
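Writing log-derived metrics into InfluxDB uses its line protocol, a compact text format of measurement, tags, fields, and a timestamp. The sketch below builds such a line; the measurement name `http_requests` and its tags are hypothetical examples.

```python
# Sketch of InfluxDB's line protocol: "measurement,tags fields timestamp".
# The measurement and tag names here are hypothetical examples of
# metrics an ingestion engine might derive from web logs.
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Serialize one data point as an InfluxDB line-protocol string."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "http_requests",
    {"host": "web01", "status": "404"},
    {"count": "3i"},  # trailing "i" marks an integer field
    1696945200000000000,  # timestamp in nanoseconds
)
print(line)
# http_requests,host=web01,status=404 count=3i 1696945200000000000
```

Each such line becomes one point in the time series, which Grafana can then chart (for example, 404s per host over time).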
Logwatch is a very simple command-line tool that monitors your log files and sends a daily report. It does not do any kind of collection, so it is perfect for single-server setups that want a little more insight into their server logs.
Graylog replaces the Elastic Stack entirely and only requires external log shippers to get data in. Its web interface supports building custom charts and dashboards for monitoring your logs, though it may fall short compared to a dedicated database and Grafana setup.
SolarWinds Papertrail is a fully managed service that displays logs in real time, which can be very useful when troubleshooting problems on your servers. Their plans are fairly cheap, priced per GB, and start at just $7.
Splunk monitors almost everything about your applications, including logs. If you want a comprehensive analytics package, Splunk may be for you.
LogDNA is a simple log analysis tool with very cheap plans. If you are looking for a straightforward alternative to configuring an ELK stack, LogDNA is quick to set up.