LXC container monitoring

From OpenNMS

Normally, an LXC container is monitored directly. It has its own IP address and its own running services, which can be monitored like those of any other node, so you will see when a service or a node goes offline.

But imagine you have a lot of LXC hosts with a lot of containers running. If the hosts go offline and the containers are not properly configured to start automatically after a reboot, you need really good documentation to bring them back online, because you don't know which container was running on which server.

Alternatively, you can let the monitoring system tell you on which host a container is not running.

To configure this kind of monitoring in OpenNMS, we need to use the HostResourceSwRunMonitor. From the point of view of the LXC host, the containers are just processes.

As you can see below, the usual process names won't help us, because all container processes have the same name, lxc-start.

Example output for process table:

[08:02]root@opennms:/root# snmpwalk -c COMMUNITY -v2c 10.10.10.10 .1.3.6.1.2.1.25.4.2.1.2
...
HOST-RESOURCES-MIB::hrSWRunName.1210 = STRING: "lxc-start"
HOST-RESOURCES-MIB::hrSWRunName.1217 = STRING: "corosync"
HOST-RESOURCES-MIB::hrSWRunName.1241 = STRING: "lxc-start"
HOST-RESOURCES-MIB::hrSWRunName.1242 = STRING: "init"
HOST-RESOURCES-MIB::hrSWRunName.1313 = STRING: "lxc-start"
...

But the process monitor is also able to query hrSWRunParameters, so it is possible to identify each container by its name.

Example output for process parameters table:

[08:02]root@opennms:/root# snmpwalk -c COMMUNITY -v2c 10.10.10.10 .1.3.6.1.2.1.25.4.2.1.5
...
HOST-RESOURCES-MIB::hrSWRunParameters.1210 = STRING: "-n LXC-Container-1"
HOST-RESOURCES-MIB::hrSWRunParameters.1217 = ""
HOST-RESOURCES-MIB::hrSWRunParameters.1241 = STRING: "-n LXC-Container-2"
HOST-RESOURCES-MIB::hrSWRunParameters.1242 = ""
HOST-RESOURCES-MIB::hrSWRunParameters.1313 = STRING: "-n LXC-Container-3"
...
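
The matching idea can be tried offline before touching the poller. The following is a minimal Python sketch, not the monitor's actual implementation: it assumes the snmpwalk text format shown above and containers started as `lxc-start -n NAME`. The function names `parse_parameters` and `missing_containers` are hypothetical helpers made up for this illustration.

```python
import re

# Matches one line of snmpwalk output for hrSWRunParameters, for example:
# HOST-RESOURCES-MIB::hrSWRunParameters.1210 = STRING: "-n LXC-Container-1"
# Empty values are printed without the "STRING: " prefix, so it is optional.
PARAM_LINE = re.compile(
    r'hrSWRunParameters\.(?P<index>\d+) = (?:STRING: )?"(?P<params>[^"]*)"'
)


def parse_parameters(walk_output):
    """Return {process index: parameter string} from snmpwalk text output."""
    table = {}
    for line in walk_output.splitlines():
        match = PARAM_LINE.search(line)
        if match:
            table[int(match.group("index"))] = match.group("params")
    return table


def missing_containers(walk_output, expected):
    """Return the expected container names that have no '-n NAME' entry."""
    running = set(parse_parameters(walk_output).values())
    return [name for name in expected if f"-n {name}" not in running]


# Sample data copied from the walk above (assumption: same output format).
SAMPLE = '''\
HOST-RESOURCES-MIB::hrSWRunParameters.1210 = STRING: "-n LXC-Container-1"
HOST-RESOURCES-MIB::hrSWRunParameters.1217 = ""
HOST-RESOURCES-MIB::hrSWRunParameters.1241 = STRING: "-n LXC-Container-2"
'''

if __name__ == "__main__":
    # LXC-Container-3 is not in SAMPLE, so it is reported as missing.
    print(missing_containers(SAMPLE, ["LXC-Container-1", "LXC-Container-3"]))
```

Comparing the whole parameter string, rather than searching for a substring, avoids confusing LXC-Container-1 with a container named LXC-Container-10.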

And that's all we need. Create as many monitors as you need, one per container, based on the config below.

Example poller config:

<service name="Proc-LXC-Container-1" interval="300000" user-defined="true" status="on">
   <parameter key="retry" value="3"/>
   <parameter key="timeout" value="3000"/>
   <parameter key="service-name" value="-n LXC-Container-1"/>
   <parameter key="service-name-oid" value=".1.3.6.1.2.1.25.4.2.1.5"/>
   <parameter key="run-level" value="3"/>
   <parameter key="match-all" value="false"/>
</service>
<monitor service="Proc-LXC-Container-1" class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>

To enable the services, you have to restart OpenNMS. Once OpenNMS has restarted, assign the service Proc-LXC-Container-1 to an SNMP-enabled IP interface of your node in OpenNMS.
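
Since every container needs its own service and monitor entry, writing them by hand gets tedious on hosts with many containers. The following Python sketch generates the entries from a list of container names. It simply repeats the example poller config from this article with the name swapped in; the `Proc-` naming scheme and the helper `poller_fragments` are assumptions for illustration, not part of OpenNMS.

```python
from xml.sax.saxutils import quoteattr

# Mirrors the example poller config in this article; only the name varies.
SERVICE_TEMPLATE = """\
<service name={service} interval="300000" user-defined="true" status="on">
   <parameter key="retry" value="3"/>
   <parameter key="timeout" value="3000"/>
   <parameter key="service-name" value={match}/>
   <parameter key="service-name-oid" value=".1.3.6.1.2.1.25.4.2.1.5"/>
   <parameter key="run-level" value="3"/>
   <parameter key="match-all" value="false"/>
</service>"""

MONITOR_TEMPLATE = (
    "<monitor service={service} "
    'class-name="org.opennms.netmgt.poller.monitors.HostResourceSwRunMonitor"/>'
)


def poller_fragments(container_names):
    """Return (service XML, monitor XML) for the given container names."""
    services, monitors = [], []
    for name in container_names:
        # quoteattr escapes the value and adds the surrounding quotes.
        service = quoteattr(f"Proc-{name}")
        match = quoteattr(f"-n {name}")
        services.append(SERVICE_TEMPLATE.format(service=service, match=match))
        monitors.append(MONITOR_TEMPLATE.format(service=service))
    return "\n".join(services), "\n".join(monitors)


if __name__ == "__main__":
    services, monitors = poller_fragments(["LXC-Container-1", "LXC-Container-2"])
    print(services)
    print(monitors)
```

The two outputs are kept separate so that they can be pasted next to the existing service and monitor entries of your poller configuration.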