
Monitoring your systems: Nagios and Check_MK

by BenV on Sep.22, 2010, under Software

If you have one or more servers, you probably have a few things that you want to be up and running all the time. And when they aren’t working for some reason, you want to know about that as soon as possible and not 2 weeks later when you finally find out the hard way because your raid array has crashed completely.
Basically you want some kind of software that monitors the state of your services/servers. Well, most competent system administrators already have this up and running.
Nagios is capable of doing this.

So to get it up and running: install nagios, define your contacts (email addresses), define your services (ntp, dns, www, mail, …), define your servers (including printers, switches and whatnot), install the plugins on all servers and/or SNMP including some firewalling/tunneling to get the data to your monitoring machine … yeesh, is this getting tiresome yet? And that’s only the basics.
If you want pretty graphs based on the monitored data (like ping times) you’ll be cursing for quite a while.
Wouldn’t it be great if you could simply install, define some contacts and say “Here’s my list of hostnames, go figure it out!”?

Fortunately I’m not the only one tired of it. Mathias Kettner wrote Check_MK, which is a plugin that makes the process a bit easier, and more efficient too! Plus a fancy new web interface!
It needs a working install of Nagios, but he does provide a script to set that up if you’re running a clean install of Debian/SLES.
Obviously I run Slackware, so time to do it the hard way.

Installation

First we’ll install Nagios:
root@z:~> wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.2.tar.gz
root@z:~> tar zxf nagios-3.2.2.tar.gz
root@z:~> cd nagios-3.2.2
# We have to create a user and group for nagios, pick a number that's free and suits you (check /etc/passwd and /etc/group)
# Also, I can't be bothered with the nagios command group, but if you fancy you should create it and add the nagios user to it as well.
root@z:~/nagios-3.2.2> groupadd -g 300 nagios
root@z:~/nagios-3.2.2> useradd -g nagios -u 300 -d /dev/null -s /bin/false nagios
root@z:~/nagios-3.2.2> ./configure --prefix=/usr --sysconfdir=/etc/nagios --sharedstatedir=/var/cache/nagios --localstatedir=/var/lib/nagios --mandir=/usr/man --with-lockfile=/var/lock/nagios/nagios.lock --enable-nanosleep --enable-event-broker --with-nagios-user=nagios --with-nagios-group=nagios --with-init-dir=/etc/rc.d --with-checkresult-dir=/var/spool/nagios/checkresults --with-httpd-conf=/etc/httpd/extra --libexecdir=/usr/libexec --libdir=/usr/lib --datadir=/usr/share/nagios
# rattle rattle, overview! Make sure it's what you want :)
root@z:~/nagios-3.2.2> make all
root@z:~/nagios-3.2.2> mkdir pkg
root@z:~/nagios-3.2.2> export DESTDIR=`pwd`/pkg
root@z:~/nagios-3.2.2> make install
# Note that for some idiot reason nagios dumps the cgi-bin stuff in sbin.... move it!
root@z:~/nagios-3.2.2> mkdir -p pkg/usr/share/nagios/cgi-bin
root@z:~/nagios-3.2.2> mv pkg/usr/sbin/*.cgi pkg/usr/share/nagios/cgi-bin
root@z:~/nagios-3.2.2> make install-init
root@z:~/nagios-3.2.2> make install-commandmode
root@z:~/nagios-3.2.2> make install-config
# Since the makefile is retarded we have to fake our httpd/extra dir
root@z:~/nagios-3.2.2> mkdir -p pkg/etc/httpd/extra
root@z:~/nagios-3.2.2> make install-webconf
# For a decent package we want the config stuff renamed to .new files
root@z:~/nagios-3.2.2> find pkg/etc -name '*.cfg' -exec mv "{}" "{}".new \;
root@z:~/nagios-3.2.2> mkdir -p pkg/var/lock/nagios
root@z:~/nagios-3.2.2> mv pkg/etc/rc.d/nagios pkg/etc/rc.d/rc.nagios
root@z:~/nagios-3.2.2> cd pkg ; makepkg /usr/src/packages/nagios-3.2.2-i386-1.txz
root@z:~/nagios-3.2.2/pkg> installpkg !$
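The "pick a number that's free" step can be automated. A minimal sketch, assuming the usual colon-separated /etc/passwd and /etc/group layout where the third field is the numeric id; `find_free_id` is a hypothetical helper, not something that ships with Nagios or Slackware. It starts at a given number (300, as above) and counts up until neither a UID nor a GID uses it.

```bash
#!/bin/sh
# Print the lowest id >= $1 that is used neither as a UID (field 3 of the
# passwd file) nor as a GID (field 3 of the group file).
# Hypothetical helper; files default to /etc/passwd and /etc/group.
find_free_id() {
    local id=${1:-300}
    local passwd=${2:-/etc/passwd}
    local group=${3:-/etc/group}
    # awk exits 0 when some line's third field equals $id, i.e. it is taken.
    while awk -F: -v id="$id" '$3 == id { found = 1 } END { exit !found }' \
            "$passwd" "$group"; do
        id=$((id + 1))
    done
    echo "$id"
}

# Usage:
#   id=$(find_free_id 300)
#   groupadd -g "$id" nagios
#   useradd -g nagios -u "$id" -d /dev/null -s /bin/false nagios
```

Using the same number for both the UID and the GID is not required, it just keeps things tidy.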
That’s the first step.

Then the Nagios Plugins that normally do the detection work:
root@z:~> wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz # Leech leech
root@z:~> tar zxf !$:t
root@z:~> chown -R root:root !$:r:r && chmod -R o-w !$:r:r
root@z:~> cd !$
root@z:~/nagios-plugins-1.4.15> ./configure --prefix=/usr --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --sysconfdir=/etc/nagios --sharedstatedir=/var/cache/nagios --localstatedir=/var/lib/nagios --libexecdir=/usr/libexec --libdir=/usr/lib --datadir=/usr/share/nagios --mandir=/usr/man
# junk junk
root@z:~/nagios-plugins-1.4.15> make
# some more junk
root@z:~/nagios-plugins-1.4.15> export DESTDIR=`pwd`/pkg-np
root@z:~/nagios-plugins-1.4.15> mkdir pkg-np
root@z:~/nagios-plugins-1.4.15> make install && make install-root
root@z:~/nagios-plugins-1.4.15> cd $DESTDIR ; makepkg /usr/src/packages/nagios-plugins-1.4.15-i386-1BnV.txz
# Make sure to say no to the permissions question! Otherwise some plugins that require suid (icmp ping check) won't work.
root@z:~/nagios-plugins-1.4.15/pkg-np> installpkg !$
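To double-check that the setuid bits survived packaging, here's a small sketch. `plugin_is_suid_root` is a hypothetical helper; it assumes GNU stat (which Slackware ships) and the /usr/libexec plugin path from the configure line above. The plugins that make install-root sets up as setuid root are check_icmp and check_dhcp.

```bash
#!/bin/sh
# Succeed if the given file exists, is owned by root (uid 0) and has the
# setuid bit set. Relies on GNU stat's -c format option.
plugin_is_suid_root() {
    [ -f "$1" ] && [ -u "$1" ] && [ "$(stat -c %u "$1")" -eq 0 ]
}

# Usage: check the plugins that need root after installpkg.
# for p in /usr/libexec/check_icmp /usr/libexec/check_dhcp; do
#     plugin_is_suid_root "$p" && echo "$p ok" || echo "$p NOT setuid root"
# done
```

If one of them reports "NOT setuid root", you probably answered yes to the makepkg permissions question.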
That’s another one done.

Next we need RRDTool which creates pretty graphs.
One little issue I ran into a while ago is documented here in my notes.
root@z:~> wget http://oss.oetiker.ch/rrdtool/pub/rrdtool.tar.gz # Version 1.4.4 while typing this
root@z:~> tar zxf !$:t
root@z:~> chown -R root:root rrdtool-* && chmod -R o-w rrdtool-*
root@z:~> cd rrdtool-*
root@z:~/rrdtool-1.4.4> perl -i~ -p -e 's/(Requires.*?)\s*xrender.*/$1/' /usr/lib/pkgconfig/cairo.pc # Otherwise configure will fail.
root@z:~/rrdtool-1.4.4> ./configure --prefix=/usr --with-perl-options='INSTALLDIRS=vendor'
root@z:~/rrdtool-1.4.4> sed -r -i 's|/usr/share/man|/usr/man|g' bindings/perl-piped/Makefile bindings/perl-shared/Makefile
# Yadayada, ordering a hitman to kill Tobi Oetiker ... just kidding ;-)
root@z:~/rrdtool-1.4.4> make # Go get coffee, this takes a while
root@z:~/rrdtool-1.4.4> mkdir pkg ; make install DESTDIR=`pwd`/pkg
# Note that DESTDIR has to be passed on the make command line, just setting the environment variable will fuck up the perl bindings!
root@z:~/rrdtool-1.4.4> find pkg \( -name perllocal.pod -o -name .packlist -o -name "*.bs" \) -delete # Get rid of these, we already have them
root@z:~/rrdtool-1.4.4> cd pkg ; makepkg /usr/src/packages/rrdtool-1.4.4-i386-1BnV.txz
root@z:~/rrdtool-1.4.4/pkg> installpkg !$
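A word on that find cleanup: find's implicit AND binds tighter than -o, so `find pkg -name perllocal.pod -o -name .packlist -o -name "*.bs" -delete` only deletes the *.bs files and quietly leaves the other two in the package. Grouping the alternatives with escaped parentheses makes -delete apply to all of them. A minimal sketch as a function; `clean_perl_junk` is just a name made up for this example.

```bash
#!/bin/sh
# Remove Perl install artifacts that shouldn't end up in a Slackware package.
# The \( ... \) grouping makes -delete apply to every -name alternative,
# not just the last one.
clean_perl_junk() {
    find "$1" \( -name perllocal.pod -o -name .packlist -o -name '*.bs' \) -delete
}

# Tip: swap -delete for -print first to preview what would be removed.
# Usage: clean_perl_junk pkg
```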

Next up, the part that draws pretty graphs for nagios. Also known as PNP4Nagios.
We need at least version 0.6 for check_mk to work with it. Bah, I hate that site…. terrible documentation.
root@z:~> wget http://sourceforge.net/projects/pnp4nagios/files/PNP-0.6/pnp4nagios-0.6.6.tar.gz/download # Leech leech
root@z:~> tar zxf !$:h:t # BOOM
root@z:~> chown -R root:root !$:r:r && chmod -R o-w !$:r:r && cd !$:r:r
root@z:~/pnp4nagios-0.6.6> unset CONFIG_SITE # configure likes the CONFIG_SITE settings more than my command line options. Screw that.
root@z:~/pnp4nagios-0.6.6> ./configure --prefix=/usr --sysconfdir=/etc/pnp4nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-init-dir=/etc/rc.d --with-perfdata-logfile=/var/log/pnp4nagios --with-perfdata-dir=/var/lib/pnp4nagios/perfdata --with-perfdata-spool-dir=/var/spool/pnp4nagios --with-httpd-conf=/etc/httpd/extra --datarootdir=/usr/share/pnp4nagios --localstatedir=/var/lib/pnp4nagios
root@z:~/pnp4nagios-0.6.6> make all
root@z:~/pnp4nagios-0.6.6> mkdir pkg ; export DESTDIR=`pwd`/pkg
root@z:~/pnp4nagios-0.6.6> make install install-config install-html install-processperfdata install-plugins
# Rename config files
root@z:~/pnp4nagios-0.6.6> mv pkg/etc/pnp4nagios/config.php pkg/etc/pnp4nagios/config.php.new
root@z:~/pnp4nagios-0.6.6> find pkg/etc/pnp4nagios -name '*.cfg' -exec mv "{}" "{}".new \;
root@z:~/pnp4nagios-0.6.6> cd pkg ; makepkg /usr/src/packages/pnp4nagios-0.6.6-i386-1BnV.txz
root@z:~/pnp4nagios-0.6.6/pkg> installpkg !$
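The !$:t, !$:h:t and !$:r:r bits in these sessions are bash history modifiers (:h keeps the head of a path, :t the tail, :r strips one extension). History expansion is off in scripts, so if you want to put these steps in a build script, the same names can be derived with plain parameter expansion. A minimal sketch, assuming the SourceForge URL from above; saving with `wget -O "$tarball"` also sidesteps the question of what name wget gives a file whose URL ends in /download.

```bash
#!/bin/sh
# Derive the tarball and source directory names from the download URL,
# mirroring the interactive history modifiers used above.
url=http://sourceforge.net/projects/pnp4nagios/files/PNP-0.6/pnp4nagios-0.6.6.tar.gz/download
head=${url%/*}         # like :h   -> .../PNP-0.6/pnp4nagios-0.6.6.tar.gz
tarball=${head##*/}    # like :h:t -> pnp4nagios-0.6.6.tar.gz
srcdir=${tarball%.*}   # like :r   -> pnp4nagios-0.6.6.tar
srcdir=${srcdir%.*}    # like :r:r -> pnp4nagios-0.6.6

# In a real build script you would continue with:
#   wget -O "$tarball" "$url"
#   tar zxf "$tarball"
#   chown -R root:root "$srcdir" && chmod -R o-w "$srcdir" && cd "$srcdir"
```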

Whew.
Optionally you can install NagVis for some fancy network maps.
Check the screenshots on the NagVis homepage to get an idea.
It requires graphviz though, and I couldn’t be bothered (yet).

Now before we go on to install Check_MK, we first need to get Nagios up and running.

Configuration

First we need to put our configuration files in place. Most important is the nagios.cfg file:
root@z:~> cd /etc/nagios
root@z:/etc/nagios> mv nagios.cfg.new nagios.cfg
root@z:/etc/nagios> vim nagios.cfg
Let’s walk through the configuration file. (Most default settings are fine, I will only mention a few important ones).
By default the configuration is split up into several smaller config files per section. They have one for commands, one for contacts, etc.
Be sure to create/copy all the config files that you specify in nagios.cfg, and edit them so they make sense.
Again: defaults are fine, but it’s good to define at least a contact so you can be notified of problems (after all, that’s what nagios is for).
For commands.cfg make sure you check the notify-host-by-email and notify-service-by-email commands.

As you can see you can define stuff just the way you want in nagios.
Personally I got rid of most of the samples (local checks, windows templates, etc), we’ll be using check_mk for most checks anyway.
One thing that is important for Check_MK is the external commands processing.
Make sure it is enabled in your nagios.cfg, it should look something like:
check_external_commands=1
command_check_interval=-1
command_file=/var/lib/nagios/rw/nagios.cmd
external_command_buffer_slots=4096
Note that I didn’t change anything there.
One thing we should change is for pnp4nagios. In order for it to make graphs we should enable the processing of performance data:
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
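Since Check_MK and PNP4Nagios both quietly depend on these lines, it can be handy to check them from the shell instead of scrolling through nagios.cfg. A minimal sketch; `nagios_cfg_get` is a hypothetical helper that only understands plain key=value lines and ignores anything commented out with #.

```bash
#!/bin/sh
# Print the value of a key=value directive from nagios.cfg.
# Commented-out lines (starting with #) never match; if the key occurs
# more than once, only the last occurrence is printed.
nagios_cfg_get() {
    local key=$1
    local file=${2:-/etc/nagios/nagios.cfg}
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1
}

# Example: show the settings Check_MK and PNP4Nagios rely on.
# for k in check_external_commands command_file process_performance_data \
#          host_perfdata_command service_perfdata_command; do
#     printf '%s=%s\n' "$k" "$(nagios_cfg_get "$k")"
# done
```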
Not only that, but we should make sure those commands are defined. In my configuration (pretty default) I added those in objects/commands.cfg.
If you followed my install garbage from above you should make it look something like this:
define command{
    command_name    process-host-perfdata
    command_line    /usr/bin/perl /usr/libexec/process_perfdata.pl -d HOSTPERFDATA
}

define command{
    command_name    process-service-perfdata
    command_line    /usr/bin/perl /usr/libexec/process_perfdata.pl
}

Phew. Let’s see if Nagios agrees with our garbage:
root@z:~> nagios -v /etc/nagios/nagios.cfg
Nagios Core 3.2.2
Blabla dumbass yadieyada

Processing object config file '/etc/nagios/objects/commands.cfg'...
# and more junk
Now either Nagios says you are awesome and your configuration files make sense (at least in syntax), or Nagios will point you at your stupid failure to write a trivial configuration like this. If you failed, at least Nagios is polite enough to point out exactly where you made your mistake, so fix it!

However, since I refused to let localhost stay in my configuration, Nagios still complained even though my syntax was fine.
And I sort of agree: why should Nagios run if there’s nothing to monitor?
So for the heck of testing our configuration, I define a host.
In nagios.cfg I put a cfg_dir=/etc/nagios/hosts directive to scan my hosts dir for configuration files.
So let’s put a host there: /etc/nagios/hosts/jemoeder.cfg
define host {
    host_name            jemoeder
    use                  generic-host
    address              127.0.0.1
    alias                hoer
    contact_groups       admins
    check_command        check_ping!100.0,20%!500.0,60%
    max_check_attempts   3
}
Running the configuration test again will now give you the finger because it still doesn’t have any services defined.
So we also add a service (let’s ping!) to jemoeder.cfg:
define service{
    use                  generic-service
    host_name            jemoeder
    service_description  Ping
    check_command        check_ping!100.0,20%!500.0,60%
}
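Once you have more than a couple of hosts, typing these blocks by hand gets old fast. Here's a minimal sketch of a hypothetical `nagios_host_cfg` helper that prints a host plus a Ping service in the same shape as jemoeder.cfg. It assumes the generic-host and generic-service templates and the admins contact group from the sample configuration are still around, and it uses the host name as alias unless you pass a third argument.

```bash
#!/bin/sh
# Print a Nagios host definition plus a Ping service for it, modelled on
# jemoeder.cfg. Arguments: host name, address, optional alias.
nagios_host_cfg() {
    local name=$1 addr=$2 host_alias=${3:-$1}
    cat <<EOF
define host {
    host_name            $name
    use                  generic-host
    address              $addr
    alias                $host_alias
    contact_groups       admins
    check_command        check_ping!100.0,20%!500.0,60%
    max_check_attempts   3
}

define service {
    use                  generic-service
    host_name            $name
    service_description  Ping
    check_command        check_ping!100.0,20%!500.0,60%
}
EOF
}

# Usage: nagios_host_cfg jemoeder 127.0.0.1 hoer > /etc/nagios/hosts/jemoeder.cfg
```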

Now the configuration check should work and say:
Total Warnings: 0 Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
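If you script your config changes, you can make this pre-flight check gate the restart. A minimal sketch: `preflight_ok` is a hypothetical helper that reads the output of nagios -v on stdin and succeeds only when the summary says Total Errors: 0, so it works on live output as well as on a saved log.

```bash
#!/bin/sh
# Succeed only if the nagios -v summary reports zero errors.
preflight_ok() {
    grep -q '^Total Errors:[[:space:]]*0[[:space:]]*$'
}

# Usage:
#   nagios -v /etc/nagios/nagios.cfg | preflight_ok && echo "config OK, safe to restart"
```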

Note: for later, when installing Check_MK, you might want to do this:
In nagios.cfg put a cfg_dir=/etc/nagios/check_mk directive as the first cfg_dir. Check_MK will then use this dir to put its own garbage in later (if you don’t do this, it will use the first cfg_dir it finds).

Let’s fire up nagios!
As usual I run stuff in daemontools. A trivial run file:
#!/bin/bash
exec 2>&1
sleep 1 # Prevent super fast respawning on errors
exec setuidgid nagios /usr/bin/nagios /etc/nagios/nagios.cfg
(or run the init script that it installed in /etc/rc.d, whatever you like)
If everything went ok your /var/lib/nagios/nagios.log or wherever you pointed your log_file should show a startup banner and no errors (and that also goes for the output of nagios).

Apache configuration

Now obviously it would be nice if we could monitor the status of Nagios.
Soon we’ll add Check_MK for a better interface, but let’s start with the basic “comes with the package” Nagios interface.
To get it running you’ll need to configure either a few aliases (see /etc/httpd/extra/nagios.conf), or as I prefer: add a vhost configuration.
The vhost config should be something like:
<VirtualHost *:80>
ScriptAlias /nagios/cgi-bin "/usr/share/nagios/cgi-bin"
Alias /pnp4nagios "/usr/share/pnp4nagios"
Alias /nagios "/usr/share/nagios"

    <Directory "/usr/share/nagios/cgi-bin">
       Options ExecCGI
       AllowOverride None
       Order allow,deny
       Allow from all
       AuthName "Nagios Access"
       AuthType Basic
       AuthUserFile /etc/nagios/htpasswd.users
       Require valid-user
    </Directory>

    <Directory "/usr/share/pnp4nagios">
       Options None
       AllowOverride None
       Order allow,deny
       Allow from all
       AuthName "Nagios Access"
       AuthType Basic
       AuthUserFile /etc/nagios/htpasswd.users
       Require valid-user
       # Mod Rewrite stuff
       <IfModule mod_rewrite.c>
        RewriteEngine On
        Options FollowSymLinks
        # Installation directory
        RewriteBase /pnp4nagios/
        # Protect application and system files from being viewed
        RewriteRule ^(application|modules|system) - [F,L]
        # Allow any files or directories that exist to be displayed directly
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        # Rewrite all other URLs to index.php/URL
        RewriteRule .* index.php/$0 [PT,L]
       </IfModule>
    </Directory>

ServerAdmin webmaster@nagios.jemoeder.nl
DocumentRoot /www/vhosts/nagios.jemoeder.nl
ServerName nagios.jemoeder.nl
ErrorLog /www/logs/nagios.jemoeder.nl-error_log
CustomLog /www/logs/nagios.jemoeder.nl-access_log combinedio
DirectoryIndex index.php
</VirtualHost>
For bonus points, make port 80 a permanent redirect to https://yourvhost/ and add the vhost with SSL enabled.
Don’t forget to create a nagios password file:
root@z:~> htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
Point your browser to your vhost (I’ll use http://nagios/ for this example), and you should be greeted by a login after which Nagios says hello.
If nagios bitches about permissions, make sure your user (default nagiosadmin) matches the user in the authorized_for_* lines of /etc/nagios/cgi.cfg.

Fixing pnp4nagios

When you go to http://nagios/pnp4nagios/ you should be greeted by an overview screen that shows you either that you didn’t install php (go fix!), didn’t enable mod rewrite (go fix!) or, when everything is ok, something like:

Your environment passed all requirements. Remove or rename the install.php file now.

So we do that (or you can choose to rename it I guess):
root@z:~> rm /usr/share/pnp4nagios/install.php
Reloading it now will probably give you some blabla about missing performance data, which is ok for now.

Check_MK

Now finally we get to installing Check_MK. Let’s give it a shot:
root@z:~> wget http://mathias-kettner.de/download/check_mk-1.1.7i5.tar.gz # I like the latest version, if you're a coward take the stable 1.6 version
root@z:~> tar zxvf !$:t
root@z:~> chown -R root:root !$:r:r && chmod -R o-w !$:r:r && cd !$:r:r
root@z:~/check_mk-1.1.7i5> ./setup.sh
# blabla
Sorry: Cannot find Nagios/Icinga process. Is it running?
# Questions
(and such horrific colors…)
Mhrm, seems like the autodetect.py script assumes we’re running nagios with the -d option. Which isn’t the case because I run it in daemontools.
Worse, even after running it with the -d switch (or patching the find_pid_and_configfile function to skip the -d check) it fails to detect the PNP4Nagios settings.
Then again, the pnp4nagios detection seems broken on Debian as well, so that’s no big deal.
One other thing that needs attention is the check_icmp detection. The autodetect assumes that you have a command_line with the full path to check_icmp somewhere, which you normally won’t. Failing that it will check some weird places and then give up.

To fix these issues I have a little patch for autodetect.py here.

Patch it and run it again, and you should see something like:
root@z:~/check_mk-1.1.7i5> wget -O autodetect.slackware13.diff 'http://notes.benv.junerules.com/wp-content/plugins/download-monitor/download.php?id=autodetect.slackware13.diff'
root@z:~/check_mk-1.1.7i5> patch -p0 < autodetect.slackware13.diff
root@z:~/check_mk-1.1.7i5> ./setup.sh
# blabla
* Found running Nagios process, autodetected 18 settings.
# ctrl-c
Much better. Now to build a package:
root@z:~/check_mk-1.1.7i5> mkdir pkg
root@z:~/check_mk-1.1.7i5> export DESTDIR=`pwd`/pkg
root@z:~/check_mk-1.1.7i5> ./setup.sh # CTRL-C --- ABORT!!!
… and what do we see? RETARDED ASSUMPTIONS!
For instance, what the hell is this:
Nagios binary /usr/sbin/nagios
I’m pretty sure the autodetect.py said:
nagios_binary='/usr/bin/nagios'
… *RAGE* … WHAT’S THE POINT IN AUTODETECTION IF YOU DON’T USE IT?!

Gah. Fine, here’s my patch for setup.sh, if you want to have a package. If not, simply don’t set DESTDIR and answer the setup questions.
In fact, answering the questions might be the best way to install Check_MK, since the setup script also tries to molest nagios.cfg which isn’t there in the DESTDIR.

We try again!
root@z:~/check_mk-1.1.7i5> wget -O setup.sh.slackware13.diff 'http://notes.benv.junerules.com/wp-content/plugins/download-monitor/download.php?id=setup.sh.slackware13.diff'
root@z:~/check_mk-1.1.7i5> patch -p0 < setup.sh.slackware13.diff
root@z:~/check_mk-1.1.7i5> ./setup.sh # Note that the paths etc are now more sane. Ignore the error about nagios.cfg
root@z:~/check_mk-1.1.7i5> cd pkg
root@z:~/check_mk-1.1.7i5/pkg> makepkg /usr/src/packages/check_mk-1.1.7i5-i386-1.txz
root@z:~/check_mk-1.1.7i5/pkg> installpkg !$
That finally worked.
Note that this package will delete/overwrite your configuration files in its current form! Either change the pkg so they are renamed to .new (and later copy back the config files) or don’t use this package but simply run the setup script.
Also note that you need to add this to your nagios.cfg, since the setup tried and failed:
# Load Livestatus Module
broker_module=/usr/lib/check_mk/livestatus.o /var/lib/nagios/rw/live
event_broker_options=-1
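The livestatus module answers simple text queries on that socket, which is a quick way to see whether it actually loaded. A minimal sketch; `lql_query` is a hypothetical helper that prints a query: a GET line naming a table, an optional Columns: header, then the empty line that ends the query. Piping it into the socket assumes you have socat installed; the unixcat tool from the livestatus sources does the same job if you have it.

```bash
#!/bin/sh
# Build a Livestatus query for a table, optionally limited to some columns.
lql_query() {
    local table=$1
    shift
    printf 'GET %s\n' "$table"
    if [ "$#" -gt 0 ]; then
        printf 'Columns: %s\n' "$*"
    fi
    printf '\n'    # an empty line terminates the query
}

# Usage (socket path from the broker_module line above):
#   lql_query hosts name state | socat - UNIX-CONNECT:/var/lib/nagios/rw/live
```

If the module is loaded you get one line per host back, with the columns separated by semicolons.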

Now that we’ve got the thing installed, let’s see if it works.
The installation added a configuration file for Apache, but since I use a vhost config I threw the thing out and added this to my vhost inside the VirtualHost directive:
Alias /check_mk /usr/share/check_mk/web/htdocs
<Directory /usr/share/check_mk/web/htdocs>
        AddHandler mod_python .py
        PythonHandler index
        PythonDebug On
        DirectoryIndex index.py

        Order deny,allow
        allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /etc/nagios/htpasswd.users
        require valid-user

        ErrorDocument 403 "Authentication Problem: Either you've entered an invalid password or the authentication configuration of your check_mk web pages is incorrect. Please make sure that you've edited the file /etc/apache/vhosts/check_mk and made it use the same authentication settings as your Nagios web pages. Restart Apache afterwards."
        ErrorDocument 500 "Server or Configuration Problem: A Server problem occurred. You'll find details in the error log of Apache. One possible reason is, that the file /etc/nagios/htpasswd.users is missing. You can create that file with htpasswd or htpasswd2. A better solution might be to use your existing htpasswd file from your Nagios installation. Please edit /etc/apache/vhosts/check_mk and change the path there. Restart Apache afterwards."
</Directory>
Be sure to enable mod_python if you haven’t (run apachectl configtest and see if it barfs; if it does, it’s probably mod_python :)).

Installing mod_python

Oh… you don’t have mod_python? Do I need to show everything?
root@z:~> wget http://mirrors.supportex.net/apache//httpd/modpython/mod_python-3.3.1.tgz
root@z:~> tar zxf !$:t
root@z:~> chown -R root:root !$:r && chmod -R o-w !$:r && cd !$:r
# Patch a bug
root@z:~/mod_python-3.3.1> sed -i -e 's/APR_BRIGADE_SENTINEL(b)/APR_BRIGADE_SENTINEL(bb)/g' src/connobject.c
root@z:~/mod_python-3.3.1> ./configure --prefix=/usr
root@z:~/mod_python-3.3.1> make
root@z:~/mod_python-3.3.1> mkdir pkg ; export DESTDIR=`pwd`/pkg ; make install
root@z:~/mod_python-3.3.1> cd pkg ; makepkg /usr/src/packages/mod_python-3.3.1-i386-1.txz
root@z:~/mod_python-3.3.1/pkg> installpkg /usr/src/packages/mod_python-3.3.1-i386-1.txz
Easy.
Oh, you want it enabled as well?
root@z:~> echo 'Include /etc/httpd/mod_python.conf' >> /etc/httpd/httpd.conf
root@z:~> echo 'LoadModule python_module /usr/lib/httpd/modules/mod_python.so' > /etc/httpd/mod_python.conf
root@z:~> apachectl restart
Tada.

Check_MK Configuration

After you’ve restarted both Nagios AND Apache you should be able to go to http://nagios/check_mk and get some fancy interface like the screenshot below. If you get “No such user ” then you should edit /etc/check_mk/multisite.mk and add your username to the admin_users statement.

[Screenshot: Check_MK Welcome]

Now for the fun part.

Getting hosts added to Nagios using Check_MK

There are basically two options for getting monitoring data. One is through SNMP, the other is the check_mk_agent, which gathers info like most of the check_bla plugins from nagios normally do. So either set up SNMP or the check_mk_agent for the hosts you want to monitor. I’ll show the agent for localhost, and a remote host through ssh.

First localhost.
We create a server that listens on localhost port 6556 and runs /usr/share/check_mk/agents/check_mk_agent.linux.
For this I use DJB’s tcpserver, but if you feel like it you can use inetd or xinetd.
My daemontools run file looks like this:
#!/bin/sh
exec 2>&1
sleep 1
exec tcpserver -v 127.0.0.1 6556 /usr/share/check_mk/agents/check_mk_agent.linux
You can test it by running telnet 127.0.0.1 6556 and you should get pages of statistics.
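The agent output is plain text split into sections, each starting with a header line like <<<df>>>. If telnet isn't around, bash can open the TCP connection itself through its /dev/tcp pseudo-path. A minimal sketch; `agent_sections` is a hypothetical helper that just lists the section names, which is a quick sanity check that the agent runs and a preview of what the inventory will have to work with.

```bash
#!/bin/sh
# List the section names (<<<name>>> header lines) in check_mk_agent
# output read from stdin.
agent_sections() {
    sed -n 's/^<<<\([^>]*\)>>>$/\1/p'
}

# Usage (the /dev/tcp redirection is a bash feature, so run it from bash):
#   agent_sections < /dev/tcp/127.0.0.1/6556
```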
Add localhost to the check_mk configuration file:
root@z:~> vim /etc/check_mk/main.mk
# some comments
all_hosts = [ 'localhost' ]
Next we let check_mk do an inventory of your hosts:
root@z:~> check_mk -I tcp
cpu.loads 1 new checks
cpu.threads 1 new checks
df 3 new checks
diskstat 2 new checks
# and more
Note that this doesn’t actually add any checks yet, it just shows what new checks check_mk could find on your defined hosts.
If no checks are found you probably messed up the agent part. Scroll back up and try again.
Note the argument ‘tcp’: it simply indicates what checks to scan for now. It does not imply that it will scan localhost for tcp; that is defined in your main.mk configuration file.

Next we add the newly found checks to Nagios by running:
root@z:~> check_mk -O
Generating Nagios configuration...OK
Validating Nagios configuration...OK
Precompiling host checks...OK
Reloading Nagios...OK
If you’re running the older version of check_mk (below 1.1.7i5) you need a different syntax:
check_mk -U -C -R.
As you can see, it restarted Nagios for you after creating some new nagios definitions.
If you go back to your browser you will suddenly notice that the numbers went up. An extra host, and a ton of extra services to check.

While you’re at it, make sure in the left pane under ‘Master Control’ that ‘Performance Data’ is enabled (click it if it isn’t). This will allow the pnp4nagios pretty graphs to work.
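The Master Control switches correspond to Nagios external commands, so you can also flip this one from the shell by writing to the command pipe configured earlier (command_file). A minimal sketch; `nagios_cmd_line` is a hypothetical helper that formats one command in the `[timestamp] COMMAND;arguments` form Nagios reads from that pipe. ENABLE_PERFORMANCE_DATA is the standard Nagios command for the global performance data switch.

```bash
#!/bin/sh
# Format a Nagios external command: "[<unix timestamp>] COMMAND;arg1;arg2..."
nagios_cmd_line() {
    printf '[%s] %s\n' "$(date +%s)" "$1"
}

# Usage (as root or another user allowed to write the pipe):
#   nagios_cmd_line ENABLE_PERFORMANCE_DATA > /var/lib/nagios/rw/nagios.cmd
```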
After a few minutes you should be able to see pretty graphs if you click the star icon or the Perf-O-Meter bargraph, like this:

[Screenshot: Check_MK Pretty Graphs]

For getting your data through SSH it’s still simple enough.
First, generate an SSH key pair and put the private key in /etc/check_mk/keys or something, readable by user nagios.
Then put that key into your target host under /root/.ssh/authorized_keys with a forced command. Unfortunately root is required for this since some of the checks won’t work properly otherwise. (of course you can opt out, use SNMP or figure out a better method of getting the data).
It should be safe enough though, since you’re using a forced command and a key. The authorized_keys should look something like:
command="/usr/bin/check_mk_agent" ssh-rsa AAAAsomegibberishheremakesurenottofuckupthecopypastewithspaceslalatmw+== nagios@Z
As you can see I copied the check_mk_agent.linux script to that machine as well and installed it into /usr/bin.
Before you continue, make sure the ssh setup works. A command like this: ssh -l root -i /etc/check_mk/keys/myhost myhost should return the blurb of information like telnet did before. If it doesn’t, you need to fix it first.
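For reference, a minimal sketch of the key handling. The commented ssh-keygen line creates a passphrase-less RSA pair at the path used in the main.mk example below (the exact file names and the -C comment are assumptions), and `authorized_keys_line` is a hypothetical helper that turns the public key into the forced-command line for the target host. The extra no-port-forwarding, no-X11-forwarding, no-agent-forwarding and no-pty options go beyond the example above: they are standard OpenSSH authorized_keys options that further limit what the key can be used for.

```bash
#!/bin/sh
# On the Nagios server (paths are assumptions matching the text):
#   ssh-keygen -t rsa -N '' -C nagios@z -f /etc/check_mk/keys/myhost
#   chown nagios /etc/check_mk/keys/myhost
#   chmod 600 /etc/check_mk/keys/myhost

# Print an authorized_keys line that forces the agent command for this key.
authorized_keys_line() {
    local pubkey_file=$1
    local agent=${2:-/usr/bin/check_mk_agent}
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$agent" "$(cat "$pubkey_file")"
}

# Usage: append the output to /root/.ssh/authorized_keys on myhost:
#   authorized_keys_line /etc/check_mk/keys/myhost.pub
```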

When it works, you need to add the host to main.mk like you did before with localhost. This time you also need to tell check_mk where to get the data using a datasource definition. The /etc/check_mk/main.mk file looks a bit like this afterwards:
all_hosts = [
    'localhost',
    'myhost',
]

datasource_programs = [
    ( "ssh -l root -i /etc/check_mk/keys/myhost myhost", [ 'myhost' ] ),
]
Let’s do a new inventory and see if it works!
root@z:~> check_mk -I tcp
cpu.loads 1 new checks
cpu.threads 1 new checks
df 3 new checks
diskstat 2 new checks
# and more
root@z:~> check_mk -O
Generating Nagios configuration...OK
Validating Nagios configuration...OK
Precompiling host checks...OK
Reloading Nagios...OK
Voila. Another host added.

Well, that’s that for today. I’m surprised you kept reading until here. Either that or you hit the END button by accident.
For lots more tricks, options and fun on check_mk check out the Online Documentation.
(Host groups, checking windows machines, scanning parents, writing your own checks… etc)

Thanks for reading! 🙂

Tags: Check_MK, monitoring, nagios, pnp4nagios, slackware

6 Comments for this entry

staypuft
October 17th, 2010 on 20:22
Hi,

This is a fantastic tutorial. pnp4nagios was the hard part for me using Nagios3 distro package. I built a new system and followed these instructions and it worked fine.

One question I can’t seem to figure out. If I change the memory / hard disk size of a physical or virtual machine, the pnp4nagios graphs do not seem to update the installed memory value. For instance I just bumped memory up from 3gb to 6gb on a box, and the graph says 3gb ram installed… They will auto extend themselves however. This is a minor annoyance, but it would be nice to know how to regenerate the values the pnp4nagios graphs are looking at.

Thanks again for the great tutorial

cterry195
August 12th, 2012 on 03:20
Ben,

Excellent post! I found some very useful information for fixing some of my issues with nagios, pnp4nagios, check_mk.

I do have one puzzling problem is that for some reason just a few weeks ago I started getting lots of Warning Status in Nagios and Check_Mk consoles that say ERROR – you did an active check on this service – please disable active checks. If I disable the “Last Check” information will not update anymore. Any ideas? Thanks in Advance Charles

BenV
August 12th, 2012 on 16:19
Good to hear my notes were useful for you cterry195 🙂
Haven’t had your problem with the active checks though, the way it’s supposed to work CheckMK does all the checking and reports back to nagios as a passive check. Did you change anything in your configuration before it changed? Or click one of those enable buttons somewhere? 🙂

Log in to Reply
SirLaban
September 7th, 2012 on 13:38
Excellent how-to, I already had nagios/check_mk running but needed to monitor a host at a customer site which i only had ssh access to, this made my day.

engr_cat
September 20th, 2012 on 21:32
Thank you for this awesome walk-thru! I have a seemingly simple — probably stupid question. So, here goes…
I am running nagios and check_mk on RHEL 6.1 running all on VMs. The default is for the systemtime check to run — the VMs it is checking are both RHEL and Windows. They have the check_mk client running on them, of course. The systemtime check never worked correctly for either O/S — (which was reported back to the RHEL 6.1 system). The manpage/website for systemtime says it only works on Windows agents. So, does this mean that the check_mk systemtime check will not work because the system the agents are reporting to is RHEL? Thank you in advance.

Log in to Reply
BenV
September 21st, 2012 on 00:31
@engr_cat: for linux agents there is the NTP check. The systemtime check for windows works fine on one windows 2008 r2 server I have running, but other than that I haven’t tested it.
Have you checked the logs to see if the agent is reporting data from windows?
In my nagios logs I see something like:
nagios3: PASSIVE SERVICE CHECK: win2008;System Time;0;OK – Offset is -0.7 sec (levels at 30/60 sec)
(you can try telnetting to the windows server on the check_mk port — default 6556 — and see if there’s a <<< systemtime >>> section with a number below it)

I don’t think the issue has anything to do with RHEL, should work just fine 🙂

Log in to Reply