SNMP Informant How-To

From OpenNMS

People using OpenNMS often wonder how to get SNMP information, such as traps and OIDs for data collection, into OpenNMS.

Recently, I did a rather thorough examination of the standard SNMP-Informant MIB for a client, so I thought I would share the process on this wiki page.

The first step in adding MIB information to OpenNMS is to find the MIB (grin). For SNMP-Informant, there is a MIBs directory in the folder that comes with the distribution. It contains both version 1 and version 2 MIBs; it really doesn't matter which one we use.

There are two MIBs for the standard SNMP Informant agent: INFORMANT-STD.MIB and WTCS.MIB.

The second step is to determine exactly what you want to get out of the MIB. There are two distinctly different things: traps to convert to events, and OIDs to use in the collection of performance data.

A quick search for TRAP-TYPE and NOTIFICATION-TYPE in these two MIBs shows that neither contains traps, so we can ignore them here. Should you want to get trap information into OpenNMS, use the mib2opennms tool, discussed elsewhere.
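That quick search can be scripted with grep. The sketch below is a hypothetical helper (`count_traps` is not part of OpenNMS), and it assumes each trap definition starts on a line with its name followed by the macro, so a macro that is merely listed in an IMPORTS clause is not counted.

```shell
#!/usr/bin/env bash
# Count trap definitions (SMIv1 TRAP-TYPE and SMIv2 NOTIFICATION-TYPE) per MIB file.
# Only lines of the form "someName TRAP-TYPE" / "someName NOTIFICATION-TYPE" count,
# so "IMPORTS ... NOTIFICATION-TYPE, ..." lines are ignored.
# A count of 0 means there is nothing for mib2opennms to convert.
count_traps() {
  local mib
  for mib in "$@"; do
    printf '%s: %s\n' "$mib" \
      "$(grep -cE '^[[:space:]]*[a-zA-Z][-a-zA-Z0-9]*[[:space:]]+(TRAP-TYPE|NOTIFICATION-TYPE)' "$mib")"
  done
}
# Usage: count_traps INFORMANT-STD.MIB WTCS.MIB
```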

For data collection, there is another tool, the mibparser, that converts the OIDs in a MIB into a format that can be placed in the datacollection-config.xml file. There is even a convenient wrapper script to run it:

(Note: This wrapper works with Java 1.4.x ONLY)

<source lang="Bash"> $OPENNMS_HOME/contrib/mibparser/dist/ INFORMANT-STD.MIB </source>

This gives me the error:

 ERROR: can't find parent 'informant' for textOid 'standard'
 Find which MIB the parent is defined in and add that to the command line
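One way to find which MIB defines a missing parent is to grep the MIBs directory for a line that starts with the parent's name followed by a definition keyword. This is a minimal sketch: `find_defining_mib` is a hypothetical helper, and it assumes the name and the keyword sit on the same line, as they usually do.

```shell
#!/usr/bin/env bash
# List the MIB files in which NAME is defined, i.e. where a line starts with
# NAME followed by OBJECT IDENTIFIER, MODULE-IDENTITY, OBJECT-IDENTITY or OBJECT-TYPE.
# Arguments: NAME FILE...
find_defining_mib() {
  local name=$1
  shift
  grep -l -E "^[[:space:]]*${name}[[:space:]]+(OBJECT[[:space:]]+IDENTIFIER|MODULE-IDENTITY|OBJECT-IDENTITY|OBJECT-TYPE)" "$@"
}
# Usage: find_defining_mib informant *.MIB
```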

Since "informant" is defined in the WTCS.MIB file, I need to add that to my command:

<source lang="Bash"> $OPENNMS_HOME/contrib/mibparser/dist/ WTCS.MIB INFORMANT-STD.MIB </source>

This returns a lot of output in a format that can be used in the datacollection-config.xml file.

Rather than post it all here at once (most of it appears below anyway), I'll break it out section by section later in the document.

Once I have successfully produced output from a MIB, I examine it to see how easy it will be to add to OpenNMS. The things to look for are whether or not the data is in a table, and whether or not the data type is numeric.

This MIB provides five main areas of information: disk, memory, network, processes/threads and CPU.

Since the memory and process information is not stored in a table, it's really easy to configure, and it is already included in the basic datacollection-config.xml file.

For example, the mibparser output for memory looks like this:

<source lang="XML">

<mibObj oid="." instance="0" alias="memoryAvailableBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryAvailableKBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryAvailableMBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryCommittedBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryCacheBytes" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryCacheBytesPeakTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPageFaultsPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPagesInputPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPagesOutputPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPagesPerSec" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPoolNonpagedBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPoolPagedBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memoryPoolPagedResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memorySystemCacheResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memorySystemCodeResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memorySystemCodeTotalBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memorySystemDriverResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid="." instance="0" alias="memorySystemDriverTotalBytesTOOLONG" type="Gauge32" />
</source>

Note that the instance is numeric ("0"), which means the data is not in a table. Since RRDTool/JRobin can only store numeric data, it also helps that the data type of all of these values is "Gauge32".

You'll note that the alias for most of these OIDs contains the letters "TOOLONG". RRDTool limits data source names to 19 characters, and this is the parser's way of indicating that something needs to be changed. I also like to indicate in the alias name which device/MIB the data comes from, so this ends up in datacollection-config.xml as:

<source lang="XML"> <group name="snmpinformant-memory" ifType="ignore">

       <mibObj oid="." instance="0" alias="sinfMemAvailMB" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemComBytes" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemCacheBytes" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemCacheBytesPk" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPageFaultsPS" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPagesInputPS" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPagesOutPS" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPagesPerSec" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPNonpagedByt" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPPagedBytes" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemPPagedResByt" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemSysCacheResB" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemSysCodeResB" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemSysCodeTotB" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemSysDrvResB" type="Gauge" />
       <mibObj oid="." instance="0" alias="sinfMemSysDrvTotB" type="Gauge" />

</group> </source>

Note that each alias is 19 characters or fewer, and that "sinf" (for SNMP-Informant) has been prefixed to each one.
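Before pasting a group into datacollection-config.xml, the 19-character limit can be checked mechanically. A minimal sketch, assuming the fragment uses `alias="..."` attributes as above; `long_aliases` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print every alias value longer than 19 characters (the RRDTool data source
# name limit). Reads the files given as arguments, or stdin if there are none.
long_aliases() {
  grep -h -o 'alias="[^"]*"' "$@" \
    | sed 's/^alias="//; s/"$//' \
    | awk 'length($0) > 19'
}
```

For example, `long_aliases datacollection-config.xml` prints nothing when every alias fits.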

The other three groups in this MIB, which reside in tables, are not so easy. The problem lies with how SNMP-Informant uses instances. For example, this is the available information for disks (output from the mibparser):

<source lang="XML">
<mibObj oid="." instance="lDiskInstance" alias="lDiskInstance" type="InstanceName" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskPercentDiskReadTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskPercentDiskTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskPercentDiskWriteTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskPercentFreeSpaceTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskPercentIdleTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskReadQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskWriteQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskSecPerReadTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskSecPerTransferTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskAvgDiskSecPerWriteTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskCurrentDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskReadBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskReadsPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskTransfersPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskWriteBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskDiskWritesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskFreeMegabytes" type="Gauge32" />
<mibObj oid="." instance="lDiskInstance" alias="lDiskSplitIOPerSec" type="Gauge32" />
</source>

You'll see that "lDiskInstance" is the index into the table. This is where things get really weird.

First, you'll need to run "diskperf -y" as an Administrator on the command line of each target Windows box, and then reboot, to get any disk information at all. On my lone Windows box, I have two disk drives, C: and D:. If I run:

<source lang="Bash">
$ snmpwalk -v 1 -c public .
SNMPv2-SMI::enterprises.9600. = STRING: "C:"
SNMPv2-SMI::enterprises.9600. = STRING: "D:"
SNMPv2-SMI::enterprises.9600. = STRING: "_Total"
</source>

You'll see that there are three instances listed: C:, D: and _Total.

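Here's the weird part. Note that the instance for the first one is "2.67.58". The leading 2 is not a character at all: it is the length of the string, because SNMP encodes a variable-length string index as its length followed by one sub-identifier per character. The rest is plain ASCII: 67.58 is "C:" and 68.58 is "D:". Thus it is pretty easy to work out which instance you'll need to collect, although it could get weird for oddly named drives.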
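Because this follows SNMP's general rule for table indexes that are variable-length strings (RFC 2578, section 7.7), the instance can be computed rather than read off snmpwalk output. A minimal bash sketch; `encode_index` and `decode_index` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Convert between a Windows instance name and the SNMP table index used for it:
# the string length, then one sub-identifier (ASCII code) per character.
# Requires bash (arrays, arithmetic for-loops, +=).

# encode_index NAME   e.g. "C:" -> 2.67.58
encode_index() {
  local name=$1 out=${#1} i
  for ((i = 0; i < ${#name}; i++)); do
    # printf '%d' "'X" prints the character code of X
    out+=.$(printf '%d' "'${name:i:1}")
  done
  printf '%s\n' "$out"
}

# decode_index INDEX  e.g. 2.68.58 -> "D:"
decode_index() {
  local IFS=.
  local -a parts=($1)
  local len=${parts[0]} out="" i
  for ((i = 1; i <= len; i++)); do
    # Turn the decimal code into an octal escape and let printf emit the character
    out+=$(printf "\\$(printf '%03o' "${parts[i]}")")
  done
  printf '%s\n' "$out"
}
```

For example, `encode_index _Total` prints `6.95.84.111.116.97.108`, and `decode_index 2.67.58` prints `C:`.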

Now it becomes an exercise in cut and paste. Rather than paste the whole disk group for SNMP-Informant, let's take a look at one OID.

I looked at "lDiskPercentFreeSpace" and figured that would be a good place to start, since many people want to know when their disks are full.

<source lang="Bash">
$ snmpwalk -v 1 -c public .
SNMPv2-SMI::enterprises.9600. = Gauge32: 3
SNMPv2-SMI::enterprises.9600. = Gauge32: 98
SNMPv2-SMI::enterprises.9600. = Gauge32: 75
</source>

It is pretty much dead on: my C: drive is nearly full (3% free) while my D: drive is nearly empty (98% free). Note that a total free-space percentage is also available (although I am not sure how it is calculated).

If I wanted to collect this information, I would need to edit datacollection-config.xml and add something like:

<source lang="XML"> <group name="snmpinformant-disk" ifType="ignore">

      <mibObj oid="." instance="58" alias="sinfDskPtFreeSpcC" type="Gauge32" />
      <mibObj oid="." instance="58" alias="sinfDskPtFreeSpcD" type="Gauge32" />
      <mibObj oid="." instance="58" alias="sinfDskPtFreeSpcE" type="Gauge32" />
      <mibObj oid="." instance="108" alias="sinfDskPtFreeSpcTl" type="Gauge32" />

</group> </source>

Then add the "snmpinformant-disk" entry to the system definitions at the bottom of the file. Note that I changed the alias names to reflect SNMP-Informant and to fit within 19 characters.

Now, adding this to datacollection-config.xml and restarting OpenNMS should start data collection.
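With more drives, the disk group can be generated instead of cut and pasted. This sketch assumes OpenNMS appends the `instance` value to the `oid` when it builds the request, so the full multi-part index can go in `instance`. `disk_mibobjs`, `string_index` and the `COLUMN_OID` placeholder are hypothetical; the real column OID has to be copied from the mibparser output.

```shell
#!/usr/bin/env bash
# Print one <mibObj> line per drive for a single SNMP-Informant disk column,
# computing each instance from the drive name (length, then ASCII codes).

# string_index NAME   e.g. "C:" -> 2.67.58
string_index() {
  local name=$1 out=${#1} i
  for ((i = 0; i < ${#name}; i++)); do
    out+=.$(printf '%d' "'${name:i:1}")
  done
  printf '%s\n' "$out"
}

# disk_mibobjs COLUMN_OID ALIAS_PREFIX NAME...
disk_mibobjs() {
  local column_oid=$1 prefix=$2 name suffix
  shift 2
  for name in "$@"; do
    case $name in
      _Total) suffix=Tl ;;        # same "Tl" suffix as sinfDskPtFreeSpcTl
      *) suffix=${name%:} ;;      # "C:" -> "C"
    esac
    printf '<mibObj oid="%s" instance="%s" alias="%s%s" type="Gauge32" />\n' \
      "$column_oid" "$(string_index "$name")" "$prefix" "$suffix"
  done
}
# Usage: disk_mibobjs "$COLUMN_OID" sinfDskPtFreeSpc C: D: _Total
```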

The next step is to add reports for these variables. Editing the file and finding the SNMP Informant section, I added the following report, "Disk Space (Drive C) (SNMP-Inf)":
 report.sinf.diskfreeC.command=--title="Windows Available Space Disk Drive C (SNMP-Informant)" \
 DEF:availspace={rrd1}:sinfDskPtFreeSpcC:AVERAGE \
 LINE2:availspace#ff0000:"% Avail." \
 GPRINT:availspace:AVERAGE:"Avg \\: %10.2lf %s" \
 GPRINT:availspace:MIN:"Min \\: %10.2lf %s" \
 GPRINT:availspace:MAX:"Max \\: %10.2lf %s\\n"

This needs to be repeated for each of the other disks, and each new report must also be added to the "reports=" line at the top of the file.
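Since the report is identical apart from the drive letter, a loop can stamp out the `command` entries. This is a minimal sketch that copies the drive C example verbatim and changes only the letter; `disk_reports` is a hypothetical name, and any other report keys your file needs are not generated here.

```shell
#!/usr/bin/env bash
# Print a report.sinf.diskfreeX.command entry for each drive letter given,
# then the report IDs to append to the "reports=" line.
disk_reports() {
  local d id ids=""
  for d in "$@"; do
    id="sinf.diskfree$d"
    ids+="${ids:+, }$id"
    # Unquoted here-document: "\\" becomes "\", so "\\\\:" comes out as "\\:"
    cat <<EOF
report.$id.command=--title="Windows Available Space Disk Drive $d (SNMP-Informant)" \\
 DEF:availspace={rrd1}:sinfDskPtFreeSpc$d:AVERAGE \\
 LINE2:availspace#ff0000:"% Avail." \\
 GPRINT:availspace:AVERAGE:"Avg \\\\: %10.2lf %s" \\
 GPRINT:availspace:MIN:"Min \\\\: %10.2lf %s" \\
 GPRINT:availspace:MAX:"Max \\\\: %10.2lf %s\\\\n"

EOF
  done
  printf '# append to reports=: %s\n' "$ids"
}
# Usage: disk_reports D E
```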

Finally, we want to know when the available disk space drops to, say, 5%, so edit the thresholds.xml file and add:

<source lang="XML">
<threshold type="low" ds-name="sinfDskPtFreeSpcC" ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcD" ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcE" ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcTl" ds-type="node" value="5" rearm="10" trigger="1"/>
</source>
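The same element is repeated for every data source, so it can be generated as well. A minimal sketch; `thresholds` is a hypothetical helper, and its argument order (type, value, rearm, trigger, then data source names) is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Print one <threshold> element per data source name.
# Arguments: TYPE VALUE REARM TRIGGER DS-NAME...
thresholds() {
  local type=$1 value=$2 rearm=$3 trigger=$4 ds
  shift 4
  for ds in "$@"; do
    printf '<threshold type="%s" ds-name="%s" ds-type="node" value="%s" rearm="%s" trigger="%s"/>\n' \
      "$type" "$ds" "$value" "$rearm" "$trigger"
  done
}
# Usage: thresholds low 5 10 1 sinfDskPtFreeSpcC sinfDskPtFreeSpcD
```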

The next thing to look at is the CPU statistics:

<source lang="XML">
<mibObj oid="." instance="cpuInstance" alias="cpuInstance" type="InstanceName" />
<mibObj oid="." instance="cpuInstance" alias="cpuPercentDPCTime" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuPercentInterruptTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuPercentPrivilegedTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuPercentProcessorTimeTOOLONG" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuPercentUserTime" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuAPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuDPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuDPCRate" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuDPCsQueuedPerSec" type="Gauge32" />
<mibObj oid="." instance="cpuInstance" alias="cpuInterruptsPerSec" type="Gauge32" />
</source>

I only have one CPU on my machine, but I get:

<source lang="Bash">
$ snmpwalk -v 1 -c public .
SNMPv2-SMI::enterprises.9600. = STRING: "0"
SNMPv2-SMI::enterprises.9600. = STRING: "_Total"
</source>

As you can see, we get both the single CPU (instance "0") and _Total.

The statistic most people are interested in is how busy the CPU is. From the SNMP Informant MIB:

 cpuPercentProcessorTime OBJECT-TYPE
    SYNTAX     Gauge32
    MAX-ACCESS read-only
    STATUS     current
    DESCRIPTION
            "% Processor Time is the percentage of time
            that the processor is executing a non-Idle
            thread.  This counter was designed as a primary
            indicator of processor activity.  It is
            calculated by measuring the time that the
            processor spends executing the thread of the
            Idle process in each sample interval, and
            subtracting that value from 100%.  (Each
            processor has an Idle thread which consumes
            cycles when no other threads are ready to run).
            It can be viewed as the percentage of the
            sample interval spent doing useful work.  This
            counter displays the average percentage of busy
            time observed during the sample interval.  It
            is calculated by monitoring the time the
            service was inactive, and then subtracting that
            value from 100%."
    ::= { processorEntry 5 }

I especially liked "This counter was designed as a primary indicator of processor activity" since that is what we are looking for. So a value of 100% would be bad if sustained.

So off to modify datacollection-config.xml again:

<source lang="XML">
<mibObj oid="." instance="48" alias="sinfCpuPtProcTime0" type="Gauge32" />
<mibObj oid="." instance="49" alias="sinfCpuPtProcTime1" type="Gauge32" />
<mibObj oid="." instance="50" alias="sinfCpuPtProcTime2" type="Gauge32" />
<mibObj oid="." instance="51" alias="sinfCpuPtProcTime3" type="Gauge32" />
<mibObj oid="." instance="108" alias="sinfCpuPtProcTimeTl" type="Gauge32" />
</source>

This will collect the values we want.

And now for a sample graph, "CPU 0 Percent Processor Time (SNMP-Inf)", to add to the same file as the disk report:
 report.sinf.cpu0percent.command=--title="Windows CPU 0 Utilization (SNMP-Informant)" \
  DEF:utilization={rrd1}:sinfCpuPtProcTime0:AVERAGE \
  LINE2:utilization#ff0000:"% util." \
  GPRINT:utilization:AVERAGE:"Avg \\: %10.2lf %s" \
  GPRINT:utilization:MIN:"Min \\: %10.2lf %s" \
  GPRINT:utilization:MAX:"Max \\: %10.2lf %s\\n"

Remember to add it to the "reports=" line at the top of the file.

Thresholds are similar to the disk thresholds above:

<source lang="XML">
<threshold type="high" ds-name="sinfCpuPtProcTime0" ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime1" ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime2" ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime3" ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTimeTl" ds-type="node" value="100" rearm="90" trigger="3"/>
</source>

This requires three consecutive polls with the CPU at 100% before the alarm is raised; the threshold rearms once utilization drops back to 90%.
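To see how `trigger` and `rearm` interact, here is a simplified model. It is an assumption based on the behaviour described above, not OpenNMS's actual implementation: a high threshold fires after `trigger` consecutive polls at or above `value`, then stays quiet until a poll drops to `rearm` or below. `simulate_high` is a hypothetical name, and samples must be integers.

```shell
#!/usr/bin/env bash
# Simplified model of a "high" threshold (not OpenNMS source code).
# Arguments: VALUE REARM TRIGGER SAMPLE...
simulate_high() {
  local value=$1 rearm=$2 trigger=$3
  shift 3
  local count=0 armed=1 poll=0 sample
  for sample in "$@"; do
    poll=$((poll + 1))
    if ((armed)); then
      if ((sample >= value)); then
        count=$((count + 1))          # another consecutive poll at/above VALUE
        if ((count >= trigger)); then
          echo "poll $poll: exceeded"
          armed=0
          count=0
        fi
      else
        count=0                       # streak broken
      fi
    elif ((sample <= rearm)); then
      echo "poll $poll: rearmed"
      armed=1
    fi
  done
}
```

With the CPU settings, `simulate_high 100 90 3 100 100 99 100 100 100 95 85` prints `poll 6: exceeded` and then `poll 8: rearmed`: the 99 at poll 3 resets the count, and 95 is not low enough to rearm.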