This weekend I found some time to upgrade my little Check_MK Notification bot a bit.
After a good fight with the Perl POE framework and learning a thing or two (which taught me the price of not using my own proven bot framework :p) I did manage to get some new features built into the bot.
This bot has been in use by me and the company I work at for about a year now, proving to be a nice-to-have notification channel.
One thing that sometimes annoyed me: when someone put a ton of services in downtime, or when something really broke and a ton of alerts went off, the bot(s) would spam tons of messages for a while. This led to the first new feature, called MUTE.
The bot can be silenced for a custom amount of time (defaulting to 5 minutes if you omit it) by simply saying “mute” to it. See the screenshot below for a demonstration.
If you feel lonely immediately after this, or botched the time, you can use “unmute” to immediately cancel the mute.
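To make the mute behaviour concrete, here is a minimal sketch in Python (the bot itself is written in Perl, so this is not its actual code). The class name `MuteState`, and the assumption that the custom amount is given in minutes, are mine; only the 5 minute default and the mute/unmute commands come from the description above.

```python
import time

# Default from the post: 5 minutes when no time is given.
DEFAULT_MUTE_SECONDS = 5 * 60


class MuteState:
    """Tracks until when notifications should be suppressed (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._muted_until = None     # None means "not muted"

    def mute(self, minutes=None):
        # Assumption: the custom amount is expressed in minutes.
        seconds = DEFAULT_MUTE_SECONDS if minutes is None else minutes * 60
        self._muted_until = self._clock() + seconds

    def unmute(self):
        # Cancels a running mute immediately.
        self._muted_until = None

    def is_muted(self):
        return self._muted_until is not None and self._clock() < self._muted_until
```

A bot loop would then simply skip sending a notification while `is_muted()` returns true.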
Another nice new feature is filters. The command “problems” would already show -all- problems, but I implemented a filter feature so now you can also search for specific host issues or maybe issues for a specific contact group such as “SLA”.
For example, you can now ask the bot: problems host=web;contact=SLA and it will return all hosts that report to the SLA contact group and have web in their hostname.
Following up on this, it is now also possible to acknowledge all these problems using the same filter technique by issuing a command like ack all host=web;contact=SLA || We are fixing stuff. Useful filter columns right now are host_name, service_description, notes, comments and contact_groups. The filter matches on a partial key, so host is enough to select host_name and contact is enough to select contact_groups.
If you don’t feel like typing a key because your keyword is specific enough, you can also simply filter like this: problems webserverhost.name which will search in both host and service names, notes, and comments.
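The filter syntax can be sketched roughly like this. It is an illustration in Python, not the bot's Perl implementation: the function names are made up, and case-insensitive substring matching is my assumption. The column names and the two example filters are the ones mentioned above.

```python
# Columns named in the post; a partial key such as "host" resolves to one of these.
FILTER_COLUMNS = ["host_name", "service_description", "notes", "comments", "contact_groups"]
# Columns searched when only a bare keyword is given.
KEYWORD_COLUMNS = ["host_name", "service_description", "notes", "comments"]


def resolve_column(partial_key):
    """Return the first known column whose name contains the partial key."""
    for column in FILTER_COLUMNS:
        if partial_key.lower() in column:
            return column
    raise ValueError(f"unknown filter key: {partial_key}")


def parse_filter(text):
    """Parse 'host=web;contact=SLA' into (column, needle) pairs.

    A bare keyword without '=' becomes a single (None, keyword) pair.
    """
    if "=" not in text:
        return [(None, text.strip())]
    pairs = []
    for part in text.split(";"):
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        pairs.append((resolve_column(key.strip()), value.strip()))
    return pairs


def matches(problem, filters):
    """True if the problem (a dict of column -> text) satisfies every filter."""
    for column, needle in filters:
        columns = KEYWORD_COLUMNS if column is None else [column]
        haystacks = (str(problem.get(c, "")).lower() for c in columns)
        if not any(needle.lower() in h for h in haystacks):
            return False
    return True
```

With this, `parse_filter("host=web;contact=SLA")` yields pairs for host_name and contact_groups, and a problem only matches when both substrings are found.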
Another itch that needed scratching was the need for multiple IRC connections. These days we use Slack in addition to other communication tools, so a lot of colleagues are no longer found on IRC but only on, for instance, Slack. Previously this meant either the bots were no longer seen, or you needed to run the bot twice.
Well, the bot can now make multiple IRC connections! 🙂
Simply add another [irc] and [channels] block with a (unique) number appended to it and the config parser should add a connection.
Because I wanted to have Slack working I also added support for an IRC server username and password, but do note that I needed to set the nickname to the username to get Slack to accept the connection. Also be mindful of the channels that the user you use for the bot may automatically be subscribed to, since it will report to -all- of the channels it is in.
NEW (version 1.3a): You can avoid this by setting the regonly option to 1 in the configuration file for that IRC connection. This option makes the bot ignore channels that are not in the channel list in the configuration file. Very useful for Slack, Bitlbee etc.
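As a rough picture of how numbered blocks and regonly could fit together, here is a Python sketch. Everything about the file layout is an assumption on my part: the key names (server, nick, username, password, names), the hostnames and the INI-style syntax are illustrative only, and the real configuration file of the bot may look different. Only the numbered [irc] and [channels] blocks and the regonly option come from the text above.

```python
import configparser
import re

# Illustrative config only; key names and hosts are assumptions, not the bot's real format.
SAMPLE = """
[irc]
server = irc.example.org
nick = monitorbot

[channels]
names = #monitoring

[irc2]
server = slack-gateway.example.org
nick = botuser
username = botuser
password = secret
regonly = 1

[channels2]
names = #alerts, #sla
"""


def load_connections(text):
    """Pair every [ircN] block with its [channelsN] block."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    connections = []
    for section in parser.sections():
        match = re.fullmatch(r"irc(\d*)", section)
        if not match:
            continue
        suffix = match.group(1)
        names = parser.get(f"channels{suffix}", "names", fallback="")
        connections.append({
            "server": parser.get(section, "server"),
            "nick": parser.get(section, "nick"),
            "regonly": parser.getboolean(section, "regonly", fallback=False),
            "channels": [c.strip() for c in names.split(",") if c.strip()],
        })
    return connections


def should_report(connection, channel):
    """With regonly set, only channels listed in the config get notifications."""
    if connection["regonly"]:
        return channel in connection["channels"]
    return True
```

In this sketch the second connection would stay quiet in any channel the account was auto-joined to, while the first one keeps reporting everywhere.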
Here’s a screenie to show off some of the new things:
Obviously there are a bunch of fixes and improvements (*cough*) in the new version as well, so new bug reports are welcome 🙂
The new 1.3a version can be downloaded here:
irc_notify-1.3a.mkp
It should also be up on the Check_MK Plugin exchange soon.
Another day, another Check_MK plugin!
This one is inspired by smokeping, but different because it doesn’t need smokeping. It does need the tool formerly known as Matt’s TraceRoute, aka mtr. It’s installed on all my machines by default and easily available in all distros that are worthy. Even pokemon OS has it 😉
The reason I wanted to build this plugin was first of all because of pretty graphs (of course!). The second reason was that my girlfriend had some network issues to figure out, and ping and DNS resolve times alone don’t paint a complete picture. This plugin makes some graphs that hopefully fill that void a bit 🙂
Now that you’ve skipped the last 2 paragraphs, here are some example graphs that I made while testing the plugin:
This is the plugin status per host on the service overview page of Check_MK. As you can see I configured multiple hosts.
Here’s another small plugin for Check_MK – this one keeps track of Nullmailer queues.
For installation check out one of my older plugin posts 😉
Have fun with this new plugin! 🙂
V1.1: Updated agent to check different queue location for Debian etc. No other changes.
V1.0: Initial version
A while ago, when faced with ‘why is my disk slow’, I realized “hej, I have an SSD… let’s use it as cache!”.
Easier said than done, because these days you have tons of options. A quick glance at them shows BCache, DM-Cache, Facebook’s Flash-Cache or what I went for, which is based on Flash-Cache: EnhanceIO. There are probably more of them; while writing this I ran into this article on LVM cache, which sounds interesting too.
Here’s a little comparison between a few of the above options: different ssd to hdd cacheing options on askubuntu.com.
This one has been on my todo list for a while, so today I took a stab at it: a fail2ban plugin for Check_MK.
My previous plugin (LMSensors plugin for Check_MK) still gets quite a few hits, so I figured you guys might like this one as well.
Why? Pretty graphs of course 😉
Another reason might be that you want to keep an eye on how many ssh bots etc fail2ban keeps out.
Hej look! A new WordPress release…. 3.4…. and it automagically updates, nice going guys 🙂
What’s new? A bunch of stuff I don’t care about, a few more rounded corners … meh.
And apparently they’re green. Oh well. I can be green too, see? 😉
Back to stuff I do care about: Check_MK released a new major version a few days ago – it’s now on version 1.2p1.
Among the new stuff are some shiny interface updates (you know, rounded corners and the like), a ton of fixes and new agents/checks (postgresql is among them), a Logwatch Pattern Analyzer and tons more.
Another day, another plugin (aka package) for Check_MK.
This time it’s a check for Daemontools, the daemon that keeps other daemons up and running.
Personally I use it for a lot of services on my servers, like Apache and Tinydns. However, sometimes a service is flapping in Daemontools because of a configuration error or something similar and you fail to notice for hours because the service seems to be ‘up’ (but only for a few seconds before it restarts again).
This check makes sure all services that are supposed to be up (“normally up”) are up, and it also checks how long they’ve been up.
If the uptime is only a few seconds it’ll issue a warning because it might be flapping.
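The idea behind the check can be sketched as follows. This is a Python illustration, not the code of the actual plugin: it assumes the usual one-line `svstat` output format, and the 10 second flapping threshold as well as the function name are made up for the example. The return codes follow the Nagios convention (0 OK, 1 WARN, 2 CRIT, 3 UNKNOWN).

```python
import re

# Assumed svstat line format, e.g. "/service/apache: up (pid 1234) 42 seconds"
SVSTAT_LINE = re.compile(
    r"^(?P<service>\S+): (?P<state>up|down)(?: \(pid \d+\))? "
    r"(?P<seconds>\d+) seconds(?P<flags>.*)$"
)

# Hypothetical threshold: uptimes below this are treated as possible flapping.
FLAP_SECONDS = 10


def check_service(line):
    """Return (nagios_state, message) for one svstat output line."""
    match = SVSTAT_LINE.match(line.strip())
    if not match:
        return 3, f"cannot parse: {line.strip()}"
    service = match.group("service")
    seconds = int(match.group("seconds"))
    flags = match.group("flags")
    if match.group("state") == "down":
        # svstat appends "normally up" when a down service is supposed to run.
        if "normally up" in flags:
            return 2, f"{service} is down but should be up"
        return 0, f"{service} is down (as configured)"
    if seconds < FLAP_SECONDS:
        return 1, f"{service} up for only {seconds}s, might be flapping"
    return 0, f"{service} up for {seconds}s"
```

Feeding it a line such as `/service/tinydns: up (pid 99) 2 seconds` gives a warning, because a service that has only just started may be restarting in a loop.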
Just a tiny update for my Check_MK qmail package.
It still checks the qmail queue size 🙂
New in this version is the added Perf-O-Meter. Shiny shiny!
So today I noticed my pretty Check_MK graphs were broken. Trying to view a random Check_MK service’s PNPGraphs gave this error:
Warning: preg_match() [function.preg-match]: Compilation failed: unknown option bit(s) set at offset 0 in /usr/lib/kohana/system/core/utf8.php on line 30
Fatal error: PCRE has not been compiled with UTF-8 support. See PCRE Pattern Modifiers for more information. This application cannot be run without UTF-8 support. in /usr/lib/kohana/system/core/utf8.php on line 38
Of course this is after I had upgraded some Slackware packages in the daily upgrades (I still run Slackware Current on non-production machines, keeps things interesting) including PHP and Perl, so I wasn’t really surprised.
I reinstalled RRDTool since the Perl bindings were gone (and of course I once again had to fight it) and decided to upgrade PNP4Nagios while I was at it.
However, no dice. Searching for the error gave nothing, hence this post.
To be sure it wasn’t what they (Google) claimed it would be, I checked for UTF-8 support in my libpcre:
benv@graphs$ pcretest -C
PCRE version 8.12 2011-01-15
Unicode properties support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
Jup, UTF-8 support is there, along with Unicode stuff. Yay.
Next thing I noticed was that the new PHP version was complaining about extensions that wouldn’t load.
For instance dbase.so didn’t exist anymore, and some other junk failed as well. It’s possible that this error has been around for a while on that machine, but it was time to fix it!
Since my php.ini was from 2008 I decided to simply take the stock /etc/httpd/php.ini-production for now.
After that change PHP was again bitching about not being able to load extensions, but this time they were different ones.
Apparently they now have ‘libenchant‘ for spell checking, so that was one of the other failures.
To fix that problem use ‘slackpkg install enchant‘. (that’ll teach me to run the slackpkg install-new every once in a while :p)
Restarting Apache the hard way (not “apachectl restart” but /etc/rc.d/rc.httpd restart) helped for the PNP4Nagios error.
However, upgrading it introduced a new gimmick called:
Please check the documentation for information about the following error.
Undefined index: auth_enabled
Right. Upgrading my Check_MK to 1.1.10p3 didn’t help.
However, checking the pnp4nagios config.php file made me aware that they added some options.
After merging in the new config.php options from their sample dir it finally worked again.
The new options I had to add when going from pnp4nagios version 0.6.7 to 0.6.13 were:
$conf['zgraph_width'] = "750";
$conf['zgraph_height'] = "450";
$conf['auth_enabled'] = FALSE;
# Adjust the next one to your configuration, it's probably different :)
$conf['livestatus_socket'] = "unix:/var/lib/nagios/rw/live";
$conf['allowed_for_all_services'] = "";
$conf['allowed_for_all_hosts'] = "";
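If you want to spot such missing options without eyeballing two files, something like the following Python sketch can help. It is not part of pnp4nagios: the function names are made up, and it only looks for plain `$conf['key'] = ...;` assignments at the start of a line, so anything more exotic in a config.php would be missed.

```python
import re

# Matches lines such as: $conf['auth_enabled'] = FALSE;
CONF_KEY = re.compile(r"""^\s*\$conf\[['"]([^'"]+)['"]\]\s*=""", re.MULTILINE)


def conf_keys(php_source):
    """Return the $conf keys assigned in a PHP config, in order, without duplicates."""
    return list(dict.fromkeys(CONF_KEY.findall(php_source)))


def missing_keys(current_source, sample_source):
    """Keys that the sample config sets but the current config does not."""
    have = set(conf_keys(current_source))
    return [key for key in conf_keys(sample_source) if key not in have]


if __name__ == "__main__":
    import sys
    # Usage: python missing_conf.py /path/to/config.php /path/to/sample/config.php
    with open(sys.argv[1]) as current, open(sys.argv[2]) as sample:
        for key in missing_keys(current.read(), sample.read()):
            print(key)
```

Run against the old config.php and the one from the sample dir, it prints the option names that still need to be merged in.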
Hooray, pretty graphs are back 🙂
If you have one or more servers, you probably have a few things that you want to be up and running all the time. And when they aren’t working for some reason, you want to know about that as soon as possible and not 2 weeks later, when you finally find out the hard way because your RAID array has crashed completely.
Basically you want some kind of software that monitors the state of your services/servers. Well, most competent system administrators already have this up and running.
Nagios is capable of doing this.