The beginning of the real challenge
When I started working with Zabbix and deployed our first project with it, several questions kept coming to mind: is this tool reliable, can it be used in large environments, how many users can work in the Zabbix GUI at the same time, and how many hosts or new values per second (NVPS) can I manage with Zabbix?
Of course, we started working with Zabbix after a lot of tests and simulations, but a test environment isn't the same as most customers' environments.
In our first project with Zabbix in a large environment (Zabbix version 1.4), we had no Zabbix proxy, no caches, and no buffers of any kind inside the Zabbix server. Of course, we ran into a lot of performance problems. We started this project on an Oracle database (because our customer wanted it). After working on the project for some weeks, we began to realize that performance was degrading. The Zabbix GUI was unresponsive, and we were getting on-screen errors related to table locks. At those times, the Zabbix database was executing a lot of SQL update operations on a table called ids. The ids table is very small, with only a handful of columns and rows. So why did we get these errors? How was Zabbix doing its work?
At this point, we asked Zabbix SIA about this behavior. They told us that the performance issues were not in the Zabbix server but in the Oracle database. We took this information at face value: if the application (the Zabbix server) had no performance issues, then we should tune the Oracle database. So we worked hard on tuning a lot of Oracle parameters, and our Oracle DBAs adjusted every parameter they could to improve performance. We still had performance issues, although fewer than before. At this point (2007 to 2008), we were stuck with the project and went back to the drawing board.
Our Zabbix-certified guys began a deep investigation to find out exactly how (through SQL statements or the TCP/IP stack), when (while gathering new values or accessing gathered data), why (to clean up old data or to create trend data), by whom (the Zabbix server poller processes or the Zabbix server trapper processes), and with how much effort Zabbix executed all of its tasks.
Of course, we have new features nowadays, and it is easier to manage performance. When we started working with Zabbix, we used to read the Zabbix forum threads, looking for a magical solution to our errors. But our environment was not the same as the ones described there, and any tuning is specific to the environment it was made for. I mean, a zabbix_server.conf that works like a charm in my environment can be bad for yours.
The Zabbix forums offer a lot of tips and tricks on how to improve Zabbix's performance. Some users say they are happy with Zabbix's performance in a large environment, while others say they are unhappy with it in a small one.
But you really need to know about Zabbix's internal tasks, flows, and processes. You also need to know your own environment. Once you put all of this knowledge to use, you will experience the best monitoring tool you have ever known.