This weekend I found some time to upgrade my little Check_MK Notification bot a bit.
After a good fight with the perl POE framework and learning a thing or two (teaching me the price of not using my own proven bot framework :p) I did manage to get some new features built into the bot.
This bot has been in use by me and the company I work at for about a year now, proving to be a nice to have notification channel.
One of the things that sometimes annoyed me was when someone would put a ton of services in downtime – or when something really breaks and a ton of alerts go off that the bot(s) would spam tons of messages for a while. This lead to the first new feature called MUTE.
The bot can be silenced for a custom amount of time (defaulting to 5 minutes if you omit it) by simply saying “mute” to it. See the screenshot below for a demonstration.
If you feel lonely immediately after this or botched the time you can use unmute to immediately cancel the mute.
Another nice new feature are filters. The command “problems” would already show -all- problems, but I implemented a filter feature so now you can also search for specific host issues or maybe issues for a specific contact group such as “SLA”.
For example, you can now ask the bot: problems host=web;contact=SLA and it will return all hosts that report to the SLA contactgroup and have web in their hostname.
Following up on this, it is now also possible to acknowledge all these problems using the same filter technique by issuing a command like ack all host=web;contact=SLA || We are fixing stuff. Useful filter columns right now are host_name service_description notes comments contact_groups, but the filter matches on a partial key.
If you don’t feel like typing a key since you have a specific enough keyword to search on you can also simply filter like this: problems webserverhost.name which will search in both host and service names, notes, and comments.
Another itch that needed scratching was the need for multiple IRC connections. These days we use Slack in addition to other communication tools, so a lot of colleagues are no longer found on IRC but only linger on for instance Slack. Previously this meant either the bots were no longer seen, or you needed to run it twice.
Well, the bot can now make multiple IRC connections! 🙂
Simply add another [irc] and [channels] block with a (unique) number appended to it and the config parser should add a connection.
Because I wanted to have Slack working I also added support for IRC server Username and Password, but do note that I needed to set the nickname to username to get Slack to accept the connection. Also be mindful of the channels that the user you use for the bot may automatically be subscribed to, since it will report to -all- of the channels it is in.
NEW (version 1.3a): Unless you set the regonly option to 1 in the configuration file for that IRC connection. This option will make the bot ignore channels that are not in the channel list in the configuration file. Very useful for Slack and Bitlbee etc.
Here’s a screenie to show off some of the new things:
Obviously there are a bunch of fixes and improvements (*cough*) in the new version as well, so new bug reports are welcome 🙂
The new 1.3a version can be downloaded here:
irc_notify-1.3a.mkp (30 downloads)
It should also be up soon on the Check_MK Plugin exchange soon:
Hej look! A new WordPress release…. 3.4…. and it automagically updates, nice going guys 🙂
What’s new? A bunch of stuff I don’t care about, a few more rounded corners … meh.
And apparently they’re green. Oh well. I can be green too, see? 😉
Back to stuff I do care about: Check MK released a new major version a few days ago – it’s now on version 1.2p1.
Among the new stuff some shiny interface updates (you know, rounded corners and the like), a ton of fixes and new agents/checks (postgresql is among them), a Logwatch Pattern Analyzer and tons more.
So today I noticed my pretty Check MK graphs were broken. Trying to view a random Check_MK service’s PNPGraphs gave this error:
Fatal error: PCRE has not been compiled with UTF-8 support. See PCRE Pattern Modifiers for more information. This application cannot be run without UTF-8 support. in /usr/lib/kohana/system/core/utf8.php on line 38
Of course this is after I had upgraded some Slackware packages in the daily upgrades (I still run Slackware Current on non production machines, keeps things interesting) including PHP and Perl, so I wasn’t really surprised.
I reinstalled RRDTool since the perl bindings were gone (and of course I once again had to fight it) and decided to upgrade PNP4Nagios while I was at it.
However, no dice. Searching for the error gave nothing, hence this post.
To be sure it wasn’t what they (google) claimed it would be, I checked for UTF-8 support in my libpcre:
PCRE version 8.12 2011-01-15
Unicode properties support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
Jup, UTF-8 support is there, along with Unicode stuff. Yay.
Next thing I noticed was that the new PHP version was complaining about extensions that wouldn’t load.
For instance dbase.so didn’t exist anymore, and some other junk failed as well. It’s possible that this error has been around for a while on that machine, but it was time to fix it!
Since my php.ini was from 2008 I decided to simply take the stock /etc/httpd/php.ini-production for now.
After that change php was again bitching about not being able to load extensions, but this times it were different ones.
Apparently they now have ‘libenchant‘ for spell checking, so that was one of the other failures.
To fix that problem use ‘slackpkg install enchant‘. (that’ll teach me to run the slackpkg install-new every once in a while :p)
Restarting Apache the hard way (not “apachectl restart” but /etc/rc.d/rc.httpd restart) helped for the PNP4Nagios error.
However, upgrading it introduced a new gimmick called:
Undefined index: auth_enabled
Right. Upgrading my Check_MK to 1.1.10p3 didn’t help.
However, checking the pnp4nagios config.php file made me aware that they added some options.
After merging in the new config.php options from their sample dir it finally worked again.
The new options I had to add when going from pnp4nagios version 0.6.7 to 0.6.13 were:
$conf['zgraph_height'] = "450";
$conf['auth_enabled'] = FALSE;
# Adjust the next one to your configuration, it's probably different :)
$conf['livestatus_socket'] = "unix:/var/lib/nagios/rw/live";
$conf['allowed_for_all_services'] = "";
$conf['allowed_for_all_hosts'] = "";
Hooray, pretty graphs are back 🙂
If you have one or more servers, you probably have a few things that you want to be up and running all the time. And when they aren’t working for some reason, you want to know about that as soon as possible and not 2 weeks later when you finally find out the hard way because your raid array has crashed completely.
Basically you want some kind of software that monitors the state of your services/servers. Well, most competent system administrators already have this up and running.
Nagios is capable of doing this. (continue reading…)