This weekend I found some time to upgrade my little Check_MK Notification bot a bit.
After a good fight with the Perl POE framework and learning a thing or two (which taught me the price of not using my own proven bot framework :p), I did manage to get some new features built into the bot.
This bot has been in use by me and the company I work at for about a year now, proving to be a nice-to-have notification channel.
One of the things that sometimes annoyed me was that the bot(s) would spam tons of messages for a while whenever someone put a ton of services in downtime, or when something really broke and a ton of alerts went off. This led to the first new feature, called MUTE.
The bot can be silenced for a custom amount of time (defaulting to 5 minutes if you omit it) by simply saying “mute” to it. See the screenshot below for a demonstration.
If you feel lonely immediately after this or botched the time you can use unmute to immediately cancel the mute.
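To make the mute behaviour concrete, here is a minimal Python sketch of how such a command could be handled. The real bot is written in Perl, so this is not its code; the function and class names are made up for illustration, and I am assuming the optional argument is a number of minutes.

```python
import re

DEFAULT_MUTE_MINUTES = 5  # default mentioned in the post


def parse_mute(message):
    """Return the mute duration in minutes, or None if this is not a mute command.

    Assumption: the optional argument is a whole number of minutes.
    """
    match = re.fullmatch(r"mute(?:\s+(\d+))?", message.strip(), re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)) if match.group(1) else DEFAULT_MUTE_MINUTES


class MuteState:
    """Tracks until when notifications are suppressed (hypothetical helper)."""

    def __init__(self):
        self.until = 0.0

    def mute(self, minutes, now):
        self.until = now + minutes * 60

    def unmute(self):
        self.until = 0.0

    def is_muted(self, now):
        return now < self.until
```

Passing the current time in as `now` (for example `time.time()`) keeps the sketch easy to test without waiting for a real timer.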
Another nice new feature is filters. The command “problems” would already show -all- problems, but I implemented a filter feature so now you can also search for specific host issues or maybe issues for a specific contact group such as “SLA”.
For example, you can now ask the bot: problems host=web;contact=SLA and it will return all hosts that report to the SLA contact group and have web in their hostname.
Following up on this, it is now also possible to acknowledge all these problems using the same filter technique by issuing a command like ack all host=web;contact=SLA || We are fixing stuff. Useful filter columns right now are host_name, service_description, notes, comments and contact_groups; the filter matches on a partial key, so host is enough to select host_name.
If you don’t feel like typing a key because you have a specific enough keyword to search on, you can also simply filter like this: problems webserverhost.name, which will search in both host and service names, notes, and comments.
Another itch that needed scratching was the need for multiple IRC connections. These days we use Slack in addition to other communication tools, so a lot of colleagues are no longer found on IRC but only linger on, for instance, Slack. Previously this meant either the bots were no longer seen, or you needed to run it twice.
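The filter syntax above can be sketched in a few lines of Python. This is not the bot's Perl code, just an illustration of the idea under some assumptions: problems are plain dicts keyed by the column names from the post, matching is a case-insensitive substring test, and all function names are hypothetical.

```python
# Column names taken from the post.
FILTER_COLUMNS = ["host_name", "service_description", "notes", "comments", "contact_groups"]
# Columns searched when no key is given (host and service names, notes, comments).
FREE_TEXT_COLUMNS = ["host_name", "service_description", "notes", "comments"]


def parse_filter(text):
    """Turn 'host=web;contact=SLA' into a list of (columns, needle) terms."""
    terms = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, needle = part.split("=", 1)
            # Partial key: 'host' selects 'host_name', 'contact' selects 'contact_groups'.
            columns = [col for col in FILTER_COLUMNS if key.strip() in col]
        else:
            needle, columns = part, FREE_TEXT_COLUMNS
        terms.append((columns, needle.strip()))
    return terms


def matches(problem, terms):
    """A problem matches when every term is found in at least one of its columns."""
    return all(
        any(needle.lower() in str(problem.get(col, "")).lower() for col in columns)
        for columns, needle in terms
    )


def parse_ack(message):
    """Split 'ack all <filter> || <comment>' into (terms, comment).

    Assumption: the message starts with the literal text 'ack all'.
    """
    body = message[len("ack all"):]
    filter_text, _, comment = body.partition("||")
    return parse_filter(filter_text), comment.strip()
```

With this shape, the "problems" and "ack all" commands can share one filter parser, which is what makes the same filter usable for both.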
Well, the bot can now make multiple IRC connections! 🙂
Simply add another [irc] and [channels] block with a (unique) number appended to it and the config parser should add a connection.
Because I wanted to have Slack working I also added support for IRC server Username and Password, but do note that I needed to set the nickname to username to get Slack to accept the connection. Also be mindful of the channels that the user you use for the bot may automatically be subscribed to, since it will report to -all- of the channels it is in.
NEW (version 1.3a): You can prevent this by setting the regonly option to 1 in the configuration file for that IRC connection. This option makes the bot ignore channels that are not in the channel list in the configuration file. Very useful for Slack, Bitlbee, etc.
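As a rough illustration of the numbered blocks and the regonly option, here is a Python sketch that reads such a config and pairs each [irc] block with its [channels] block. The bot's real config parser is Perl and its key names are not shown in this post, so everything except the section names, the number suffix and regonly is an assumption (including the server, nick and channel1 keys and the example hostnames).

```python
import configparser
import re

# Hypothetical configuration; only the [irc]/[channels] section names, the
# numeric suffix and the regonly option come from the post.
SAMPLE_CONFIG = """
[irc]
server = irc.example.org
nick = monitorbot

[channels]
channel1 = #monitoring

[irc2]
server = slack-gateway.example.org
nick = monitorbot
regonly = 1

[channels2]
channel1 = #alerts
"""


def irc_connections(config_text):
    """Return one entry per [irc<N>] block, paired with its [channels<N>] block."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    connections = {}
    for section in parser.sections():
        match = re.fullmatch(r"irc(\d*)", section)
        if match is None:
            continue
        channels_section = "channels" + match.group(1)
        channels = []
        if parser.has_section(channels_section):
            channels = list(parser[channels_section].values())
        connections[section] = {
            "server": parser[section].get("server"),
            "regonly": parser[section].getboolean("regonly", fallback=False),
            "channels": channels,
        }
    return connections


def should_report(connection, channel):
    """With regonly set, only report to channels listed in the config."""
    if connection["regonly"]:
        return channel in connection["channels"]
    return True
```

In this sketch a connection without regonly reports to every channel the bot happens to be in, which mirrors the behaviour described above for Slack's auto-joined channels.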
Here’s a screenie to show off some of the new things:
Obviously there are a bunch of fixes and improvements (*cough*) in the new version as well, so new bug reports are welcome 🙂
The new 1.3a version can be downloaded here:
irc_notify-1.3a.mkp (30 downloads)
It should also be up on the Check_MK Plugin exchange soon:
Another day, another Check_MK plugin!
This one is inspired by smokeping, but different because it doesn’t need smokeping. It does need the tool formerly known as Matt’s TraceRoute, aka mtr. It’s installed on all my machines by default and easily available in all distros that are worthy. Even pokemon OS has it 😉
The reason I wanted to build this plugin was first of all because of pretty graphs (of course!). The second reason was that my girlfriend had some network issues to figure out, and ping and DNS resolve times alone don’t paint a complete picture. This plugin makes some graphs that hopefully fill that void a bit 🙂
Now that you’ve skipped the last 2 paragraphs, here are some example graphs that I made while testing the plugin:
This is the plugin status per host on the service overview page of Check_MK. As you can see I configured multiple hosts. (continue reading…)
One of the cool things Check_MK offers these days is the option for custom notifications. Email notifications are of course fine, but a lot of people are also interested in Pagerduty or their own SMS service or whatnot. Personally I was interested in an IRC based notification system where alerts would simply be sent as a message into a specific channel on my IRC server.
Let’s see how we can implement that 🙂
Here’s another small plugin for Check_MK – this one keeps track of Nullmailer queues.
For installation check out one of my older plugin posts 😉
Have fun with this new plugin! 🙂
V1.1: Updated agent to check different queue location for Debian etc. No other changes.
V1.0: Initial version
A while ago, when faced with ‘why is my disk slow’, I realized “hej, I have an SSD… let’s use it as cache!”.
Easier said than done, because these days you have tons of options. A quick glance at them shows BCache, DM-Cache, Facebook’s Flash-Cache or what I went for, which is based on Flash-Cache: EnhanceIO. There are probably more of them; while writing this I ran into this article on LVM cache, which sounds interesting too.
Here’s a little comparison between a few of the above options: different ssd to hdd cacheing options on askubuntu.com. (continue reading…)
This one has been on my todo list for a while, so today I took a stab at it: a fail2ban plugin for Check_MK.
My previous plugin (LMSensors plugin for Check_MK) still gets quite a few hits, so I figured you guys might like this one as well.
Why? Pretty graphs of course 😉
Another reason might be that you want to keep an eye on how many ssh bots etc fail2ban keeps out. (continue reading…)
Hej look! A new WordPress release…. 3.4…. and it automagically updates, nice going guys 🙂
What’s new? A bunch of stuff I don’t care about, a few more rounded corners … meh.
And apparently they’re green. Oh well. I can be green too, see? 😉
Back to stuff I do care about: Check MK released a new major version a few days ago – it’s now on version 1.2p1.
Among the new stuff are some shiny interface updates (you know, rounded corners and the like), a ton of fixes and new agents/checks (postgresql is among them), a Logwatch Pattern Analyzer and tons more.
Another day, another plugin (aka package) for Check MK.
This time it’s a check for Daemontools, the daemon that keeps other daemons up and running.
Personally I use it for a lot of services on my servers, like Apache and Tinydns. However, sometimes a service is flapping in Daemontools because of a configuration error or something similar and you fail to notice for hours because the service seems to be ‘up’ (but only for a few seconds before it restarts again).
This check makes sure all services that are supposed to be up (“normally up”) are up, and it also checks how long they’ve been up.
If the uptime is only a few seconds it’ll issue a warning because it might be flapping.
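The logic of the check can be sketched like this in Python. It is not the code from the package, just the idea: it parses a line in the format that Daemontools' svstat prints and returns a Nagios-style state. The 10 second threshold and the choice of CRIT for a service that is down are my assumptions here, and the function name is hypothetical.

```python
import re

# Matches svstat lines such as:
#   /service/apache: up (pid 1234) 86400 seconds
#   /service/tinydns: down 12 seconds, normally up
SVSTAT_LINE = re.compile(
    r"^(?P<service>\S+): (?P<state>up|down)(?: \(pid \d+\))? "
    r"(?P<seconds>\d+) seconds(?P<rest>.*)$"
)

OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3  # Nagios state codes


def check_service(line, min_uptime=10):
    """Return (state, message) for one svstat line.

    Assumptions: min_uptime of 10 seconds, CRIT for 'down ... normally up'.
    """
    match = SVSTAT_LINE.match(line.strip())
    if match is None:
        return UNKNOWN, "cannot parse svstat output"
    service = match.group("service")
    seconds = int(match.group("seconds"))
    if match.group("state") == "down" and "normally up" in match.group("rest"):
        return CRIT, "%s is down but should be up" % service
    if match.group("state") == "up" and seconds < min_uptime:
        return WARN, "%s up for only %d seconds, might be flapping" % (service, seconds)
    return OK, "%s %s for %d seconds" % (service, match.group("state"), seconds)
```

A service that restarts every few seconds never gets past the uptime threshold, so it keeps showing up as a warning instead of looking healthy.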
Just a tiny update for my Check_MK qmail package.
It still checks the qmail queue size 🙂
New in this version is the added Perf-O-Meter. Shiny shiny!
On the subject of Pretty Graphs (see my earlier post), I decided to write a plugin (a.k.a. ‘Package’) for Check_MK in order to monitor the sensor output of lmsensors (and make pretty graphs of it!).
Most machines support this out of the box these days, and it’s always interesting to see the conditions of your machine. In case you don’t know, it gives the temperature and voltage of your CPU and mainboard. Thus it’s a good source for making pretty graphs 🙂
Since these plugins are very easy to install (once you’ve got check_mk up and running that is) and still nobody had written one for lmsensors, I decided to do it myself. Writing Python isn’t my strongest point (yet), but these are good opportunities to learn.
One of the issues I ran into while writing the plugin was that PNP4Nagios fails on service names that have a plus character in them. For instance, I had Sensor +12V. This created the files Sensor_+12V.rrd and corresponding xml, but when one would go to the PNP4Nagios graph of that sensor it would request a file called Sensor__12V.rrd, which obviously failed.
Therefore I mangled the names a bit, so your sensor might now be simply called MB12V instead of M/B+12V.
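To give an idea of what that name cleanup amounts to, here is a small Python sketch. The plugin itself does this in the agent script, not like this, and the exact set of characters it strips is an assumption on my part; the only thing taken from above is that M/B+12V should end up as MB12V.

```python
import re


def safe_sensor_name(label):
    """Drop every character that is not a letter, digit, space or underscore.

    Assumption: this character set is illustrative, not the plugin's own rule.
    """
    return re.sub(r"[^A-Za-z0-9 _]", "", label).strip()
```

For example, safe_sensor_name("M/B+12V") gives "MB12V", so the resulting service name no longer contains the plus character that trips up PNP4Nagios.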
Configuration of LM-Sensors
For my plugin to work you need to make sure that you have the ‘sensors’ tool. This normally comes in a package called “lmsensors” or “lm-sensors”. Note that you obviously need some kind of hardware sensor on your machine that’s supported by lm-sensors for it to work, including the required kernel module. Fortunately this will often work out of the box.
Next, make sure that running ‘sensors’ gives output like this:
it8720-isa-0228
Adapter: ISA adapter
in0: +1.04 V (min = +0.00 V, max = +4.08 V)
in1: +1.66 V (min = +0.00 V, max = +4.08 V)
in2: +3.39 V (min = +0.00 V, max = +4.08 V)
+5V: +3.04 V (min = +0.00 V, max = +4.08 V)
in4: +3.10 V (min = +0.00 V, max = +4.08 V)
in5: +1.90 V (min = +0.00 V, max = +4.08 V)
in6: +4.08 V (min = +0.00 V, max = +4.08 V)
5VSB: +3.04 V (min = +0.00 V, max = +4.08 V)
Vbat: +3.30 V
fan1: 2235 RPM (min = 10 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 2500 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
temp1: +41.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +36.0°C (low = +127.0°C, high = +90.0°C) sensor = thermal diode
temp3: +38.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
cpu0_vid: +1.250 V
Adapter: PCI adapter
temp1: +32.0°C (high = +70.0°C)
You might see things like ‘in0’ instead of ‘CPU Voltage’. Don’t ask me what voltage corresponds with what sensor on your mainboard, but you can rename the sensor output by editing /etc/sensors.conf or /etc/sensors3.conf depending on your flavor of Linux.
In order for the check_mk plugin templates to recognize the sensor type, you’ll need to make sure the sensor labels have some kind of indication of the type of sensor. For instance, label the temperature sensors with ‘temp’ or ‘temperature’. The default names like ‘in0’ will also work, but something like ‘Pizza sensor’ obviously won’t.
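Here is a rough Python sketch of both steps: pulling label, value and unit out of a line of ‘sensors’ output, and guessing the sensor type from the label. The real plugin and its PNP templates use their own regular expressions, so the patterns and function names below are assumptions that only illustrate why ‘temp1’ or ‘in0’ can be recognized and ‘Pizza sensor’ cannot.

```python
import re

# Label, numeric value and unit at the start of a sensors output line.
SENSOR_LINE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>°C|V|RPM)"
)

# Hypothetical label patterns per sensor type.
TYPE_PATTERNS = [
    ("temp", re.compile(r"temp", re.IGNORECASE)),
    ("fan", re.compile(r"fan", re.IGNORECASE)),
    ("volt", re.compile(r"volt|^in\d+$", re.IGNORECASE)),
]


def parse_sensor_line(line):
    """Return (label, value, unit), or None for lines without a reading."""
    match = SENSOR_LINE.match(line)
    if match is None:
        return None
    return match.group("label").strip(), float(match.group("value")), match.group("unit")


def classify(label):
    """Guess the sensor type from its label, or None if nothing matches."""
    for sensor_type, pattern in TYPE_PATTERNS:
        if pattern.search(label):
            return sensor_type
    return None
```

Lines such as ‘Adapter: ISA adapter’ have no number after the colon, so they are skipped rather than treated as a sensor.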
To change labels or ignore certain sensors because they give bogus data (not connected etc), first find your adapter type.
In the example above this is it8720-isa-0228. Now edit the sensors.conf file and add a section for this adapter if it isn’t already there.
Here’s an example for renaming in0 to “CPU Voltage” and turning off the second fan since it’s not connected.
We’ll also change the minimum and maximum voltage for the CPU Voltage sensor, which determines when Nagios sends out an alarm:
chip "it8720-isa-0228"
set in0_min 1.0
set in0_max 2.0
label in0 "CPU Voltage"
ignore fan2
After changing the sensors file you’ll need to make lmsensors aware of the configuration change by running ‘sensors -s’ (this might need root). Running ‘sensors’ again then shows the new label and limits:
Adapter: ISA adapter
CPU Voltage: +1.31 V (min = +1.00 V, max = +2.00 V)
fan1: 2235 RPM (min = 10 RPM)
fan3: 2500 RPM (min = 0 RPM)
# some stuff deleted to save space :)
Tada. Now repeat this process for all sensors 🙂
There are two parts to installing a Check MK plugin. First, on the host that actually runs check_mk, we need to install the package. This is quickly done:
root@checkmk# md5sum lmsensors-1.4.mkp
root@checkmk# check_mk -vP install lmsensors-1.4.mkp
Installing lmsensors version 1.4.
Checks man pages:
root@checkmk# check_mk -II
lmsensors.fan 2 new checks
lmsensors.volt 4 new checks
root@checkmk# check_mk -O
Done. Soon there will be pretty graphs for this machine 🙂
Now for a remote machine you will need to put the agent in place. Since this is only a single file it’s trivial to do:
Note that the place you want to put that thing in is the $MK_LIBDIR/plugins directory. In my case, this was /usr/lib/check_mk_agent/plugins, but it could very well be somewhere else on your system. You can find it in the check_mk_agent script if you don’t know:
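If you would rather script that lookup, a small Python sketch could look like this. It assumes the agent script sets the variable with a line like export MK_LIBDIR="...", which is how it looks on my system but may differ on yours; the function names are made up.

```python
import re


def find_mk_libdir(agent_script_text):
    """Return the value assigned to MK_LIBDIR in the agent script, or None.

    Assumption: the script contains a line like: export MK_LIBDIR="/some/dir"
    """
    match = re.search(
        r"""^\s*(?:export\s+)?MK_LIBDIR=["']?([^"'\s]+)""",
        agent_script_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def plugins_dir(agent_script_text):
    """Return the plugins directory below MK_LIBDIR, or None if it is not set."""
    libdir = find_mk_libdir(agent_script_text)
    return None if libdir is None else libdir + "/plugins"
```

You would feed it the contents of your check_mk_agent script, for instance by reading the file and passing the text to plugins_dir.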
Let Check_MK do an inventory on your remote machine [check_mk -II $machine] and the rest goes automagically! 🙂
And now we have pretty graphs for my sensors.
Comments and/or suggestions are welcome.
Version 1.1: now has pnp templates to put graphs of the same type together.
Here’s an example:
Version 1.2: changed sed to perl in agent plugin; sensor names with more than one space (among other things) were giving issues. Thanks to Cyril Pawelko for finding the issue and helping with testing!
Version 1.3: minor change to PNP templates — Nico Weinreich informed me that his fan templates weren’t working correctly so I updated the regular expression used to match corresponding sensor types. If you didn’t have this issue this update won’t do anything useful for you 🙂
Version 1.4: Seems like I was a dumbass and didn’t check the 1.2 package properly. This package really makes it work with perl instead of sed.
Also updated the voltage pnp template to hopefully match more voltage sensors.
Note: if the pnp4nagios template doesn’t work for you, check your pnp4nagios perfdata dir, for example /var/lib/pnp4nagios/perfdata/ and see what .rrd files exist for your host. They are based on the sensor name, so if your sensor name is “St John”, it will not match the voltage template. These names come directly from your sensors.conf (or, if you don’t have one, from the default names for the sensors).
See above on how to rename your sensors.
Version 1.5: Another bug spotted by Cyril! The pnp4nagios temperature template had a botched variable name.
lmsensors-1.5.mkp (1570 downloads)
lmsensors-1.4.mkp (407 downloads)
lmsensors-1.3.mkp (367 downloads)
lmsensors-1.2.mkp (440 downloads)
lmsensors-1.1.mkp (414 downloads)
lmsensors-1.0.mkp (482 downloads)