Data Collection Configuration How-To

Originally written by Tarus Balog tarus@opennms.org.

Introduction

Purpose

This How-To is one in a series designed to serve as a reference for getting started with OpenNMS. Eventually, these documents will cover everything necessary to get OpenNMS installed and running in your environment.

Copyright

Content is available under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Overview

OpenNMS is an enterprise-grade network management platform developed under the open-source model. Unlike traditional network management products which are very focused on network elements such as interfaces on switches and routers, OpenNMS focuses on the services network resources provide: web pages, database access, DNS, DHCP, etc. (although information on network elements is also available).

There are two major ways that OpenNMS gathers data about the network. The first is through polling. Processes called monitors connect to a network resource and perform a simple test to see if the resource is responding correctly. If not, events are generated. The second is through data collection using collectors. Currently data can be collected by:

  • SNMP,
  • NSClient (the Nagios Agent),
  • JMX,
  • HTTP

Getting data collection configured properly seems to be one of the more difficult tasks in OpenNMS, but it's just a matter of "getting all your ducks in a row". There are several things that have to happen in order for this to work. For all data collection methods:

Provisiond 
During the scanning process, Provisiond discovers whether the various collectable services exist on the discovered node. More specifically for SNMP collection, Provisiond must be able to access SNMP information on that interface and to form some basic mappings, such as IP Address to ifIndex.
collectd-configuration.xml 
Just as in the poller-configuration.xml file (covered elsewhere), interfaces are mapped to packages for collection in this file. If data collection is required on an interface, it needs to exist in a package in this file. The default configuration is suitable for most initial purposes.

SNMP

For SNMP data collection, the following files must be configured correctly:

snmp-config.xml 
For each interface, a valid community string must exist in this file.
datacollection-config.xml 
Each package in the collectd configuration file points to an snmp-collection definition in this file. Each snmp-collection defines what information to collect via SNMP, and it is pretty powerful as far as configuration goes. The default configuration is fairly complete for basic purposes, and will probably not require much changing initially.

NSClient

For NSClient data collection, you need to install the NSClient agent on the Windows servers (http://nsclient.ready2run.nl/), configure it with a port/password, and then configure OpenNMS:

nsclient-config.xml 
This is where you configure passwords, timeouts and ports to connect on. Each interface you want to collect on must have a valid password specified in this file (although you can specify a default set of parameters to simplify configuration).
nsclient-datacollection-config.xml 
This file configures named sets of collections which correspond to names specified in the configuration of collectd. These collection sets define which Windows Perfmon counters to collect, and how to identify which servers they should be collected from.

JMX

For JMX data collection, the following file must be configured:

jmx-datacollection-config.xml 
As with the other datacollection-config files, this file specifies which data points should be collected. In this case, it's MBeans, and which beans/attributes should be collected. Again, these are grouped into named sets which correspond to the names used in packages in collectd.

HTTP

For HTTP data collection, configure:

http-datacollection-config.xml 
In this config file you specify URLs and the regular expressions to use to extract the data points from the returned pages. Again, collections are grouped by names, corresponding to the names used in packages in collectd.

XML

For XML data collection, configure:

xml-datacollection-config.xml 
This module is similar to the HTTP data collection module, but it is able to parse XML responses. It also has a few extra tricks; however, it is not yet installed by default.

The best part about data collection is that if everything goes smoothly, it is completely automated. Particularly, the out of the box configuration requires relatively little customisation (usually just providing SNMP community strings or NSClient passwords) to be usefully functional.

Data Collection

snmp-config.xml

The parameters used to connect with SNMP agents are defined in the snmp-config.xml file. Here is an example:

<snmp-config retry="3" timeout="800" read-community="public" write-community="private">
     <definition version="v2c">
          <specific>192.168.0.5</specific>
     </definition>
     <definition retry="4" timeout="2000">
          <range begin="192.168.1.1" end="192.168.1.254"/>
          <range begin="192.168.3.1" end="192.168.3.254"/>
     </definition>
     <definition read-community="bubba" write-community="zeke">
          <range begin="192.168.2.1" end="192.168.2.254"/>
     </definition>
     <definition port="1161">
          <specific>192.168.5.50</specific>
     </definition>
</snmp-config>

The common attributes for the snmp-config tag are as follows:

retry 
The number of attempts that will be made to connect to the SNMP agent. Default is 1
timeout 
The amount of time, in milliseconds, that OpenNMS will wait for a response from the agent. Default is 3000
read-community 
The default "read" community string for SNMP queries. If not specified, defaults to "public"
write-community 
The default "write" community string for SNMP queries. Note that this is for future development - OpenNMS does not perform SNMP "sets" at the moment.
port 
This overrides the default port of 161.
version 
Here you can force either SNMP version 1 by specifying "v1", version 2c with "v2c", or version 3 with "v3". Default is "v1"

For SNMPv3 authentication and collection (only available when using SNMP4J):

security-name 
A security name for SNMP v3 authentication
auth-passphrase 
The passphrase to use for SNMP v3 authentication
auth-protocol 
The authentication protocol for SNMP v3. Either "MD5" or "SHA". Default is MD5
privacy-passphrase 
A privacy pass phrase used to encrypt the contents of SNMP v3 packets
privacy-protocol 
The privacy protocol used to encrypt the contents of SNMP v3 packets. Either "DES", "AES","AES192" or "AES256". Default is DES.
engine-id 
The engine id of the target agent
context-name 
The name of the context to obtain data from on the target agent.
context-engine-id 
The context engine id of the target entity on the agent.
enterprise-id 
An enterprise id for SNMP v3 collection
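
As an illustration, an SNMPv3 definition that combines several of the attributes above might look like the following sketch. The IP address, security name and passphrases are placeholders invented for this example, not defaults, and the definition is assumed to sit inside the snmp-config tag like the other definitions shown earlier.

```xml
<!-- Hypothetical SNMPv3 definition; every value here is a placeholder -->
<definition version="v3"
            security-name="exampleUser"
            auth-passphrase="exampleAuthPass"
            auth-protocol="SHA"
            privacy-passphrase="examplePrivPass"
            privacy-protocol="AES">
     <specific>192.168.6.10</specific>
</definition>
```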

More rarely used attributes in the snmp-config tag are:

proxy-host 
A proxy host to use to communicate with the specified agent(s)
max-vars-per-pdu 
Number of variables per SNMP request. Default is 10
max-request-size 
If using SNMP4J as the SNMP library, the maximum size of outgoing SNMP requests. Defaults to 65535, must be at least 484

All of the global parameters can be overridden with definition tags. These new SNMP definitions can apply to ranges or specific IP addresses.

Note that if an interface matches multiple ranges in this file, the first one found will be used.
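
A small sketch of what this means in practice, assuming that "first one found" refers to the order of the definitions in the file. The community strings "alpha" and "beta" and the address ranges are made up for the example.

```xml
<snmp-config retry="3" timeout="800" read-community="public">
     <!-- Listed first: 192.168.1.50 falls in this range and uses "alpha" -->
     <definition read-community="alpha">
          <range begin="192.168.1.1" end="192.168.1.100"/>
     </definition>
     <!-- Also covers 192.168.1.50, but the definition above is found first -->
     <definition read-community="beta">
          <range begin="192.168.1.1" end="192.168.1.254"/>
     </definition>
</snmp-config>
```

To avoid surprises, it is probably simplest to keep ranges from overlapping at all.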

nsclient-config.xml

This is the NSClient equivalent of snmp-config.xml, where parameters for connecting to the NSClient agent are defined. An example of such a file is:

<?xml version="1.0"?>
<nsclient-config port="1248" retry="3" timeout="800" password="apassword">
</nsclient-config>

The parameters that can be configured are:

retry 
The number of attempts that will be made to connect to the NSClient agent. Default is 1
timeout 
The amount of time, in milliseconds, that OpenNMS will wait for a response from the agent. Default is 3000
port 
This overrides the default port of 1248.
password 
The password (if any) required to authenticate to the NSClient agent. Default is the string "None"

As with snmp-config.xml, all of the global parameters can be overridden with definition tags. These new definitions can apply to ranges or specific IP addresses. Note that if an interface is matched in multiple ranges in this file, the first one found will be used.
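
The following sketch shows what such overrides might look like. It assumes that the definition, range and specific tags take the same form as in snmp-config.xml, as the paragraph above suggests; the passwords, port and addresses are placeholders.

```xml
<?xml version="1.0"?>
<nsclient-config port="1248" retry="3" timeout="800" password="apassword">
     <!-- Hypothetical override: these servers use a different password -->
     <definition password="anotherpassword">
          <range begin="192.168.10.1" end="192.168.10.254"/>
     </definition>
     <!-- Hypothetical override: one server with its own port and a longer timeout -->
     <definition port="12489" timeout="2000">
          <specific>192.168.11.5</specific>
     </definition>
</nsclient-config>
```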

Capabilities

As explained in the Discovery How-To, the capabilities check process starts with a newSuspect event (generated either manually or through the discovery process). This newSuspect event is received by the provisioning daemon (Provisiond).

The Provisiond process is responsible for scanning IP addresses for particular services. Each service that can be detected on a discovered node is defined in the default foreign-source definition. Upon receipt of a newSuspect event, Provisiond begins to test each configured service detector to see if it exists on that device.

When testing SNMP, Provisiond attempts to retrieve the System Object ID (sysObjectID, called the systemOID elsewhere in this How-To) for the device using the community string and port defined in snmp-config.xml.

If the sysObjectID is successfully retrieved, Provisiond gathers additional SNMP attributes from the system group, the ipAddressTable (if present), ipAddrTable (if ipAddressTable is not present), ifTable, and ifXTable.

If the ipAddressTable (or ipAddrTable) or ifTable are unavailable, the scan aborts (but the SNMP system data may show up on the node page - this happens frequently with Net-SNMP agents where only the system tree is available by default to a query using the "public" community string).

Second, all of the sub-target IP addresses in the ipAddressTable or ipAddrTable have all the configured service detectors run against them.

Third, every IP address in the ipAddressTable or ipAddrTable that supports SNMP is tested to see if it maps to a valid ifIndex in the ifTable. Each one that does is marked as a secondary SNMP interface and is a contender for becoming the primary SNMP interface.

Finally, all secondary SNMP interfaces are tested to see if they match a valid package in the collectd-configuration file. If more than one valid IP address meets all three criteria (supports SNMP, has a valid ifIndex and is included in a collection package), then the lowest-numbered IP address is marked as primary. All SNMP data collection is performed via the primary SNMP interface.

(Note: in the future we will have the ability to change to a secondary SNMP interface should the primary become unavailable).

When the Provisiond node scan and service detectors are completed, events are generated, including nodeGainedService events.

collectd-configuration.xml

Data collection is handled via the collectd process. collectd listens for nodeGainedService events for the SNMP "service". When this happens, it checks to see if the primary SNMP interface for that node exists in a collection package (which it should by definition). If so, the SNMP collector is instantiated for that IP address.

Let's look at the collectd-configuration.xml file:

<collectd-configuration
        threads="5">

        <package name="example1">
                <filter>IPADDR IPLIKE *.*.*.*</filter>
                <specific>0.0.0.0</specific>
                <include-range begin="192.168.0.1" end="192.168.0.254"/>
                <include-url>file:/opt/OpenNMS/etc/include</include-url>

                <service name="SNMP" interval="300000" user-defined="false" status="on">
                        <parameter key="collection" value="default"/>
                        <parameter key="port" value="161"/>
                        <parameter key="retry" value="3"/>
                        <parameter key="timeout" value="3000"/>
                </service>

                <outage-calendar>zzz from poll-outages.xml zzz</outage-calendar>
        </package>

        <collector service="SNMP"       class-name="org.opennms.netmgt.collectd.SnmpCollector"/>
</collectd-configuration>

If you are familiar with the poller configuration file, you can probably figure out what this file does.

The threads attribute limits the number of threads that will be used by the data collection process. You can increase or decrease this value based upon your network and the size of your server.

Just like pollers have poller packages, collectors have collection packages. Each package determines how often the device will be polled for SNMP data, and through the collection key, what will be polled and how it will be stored. The example1 package is the default included out of the box.

What Interfaces are Included in a Package?

The package name is followed by a list of tags that define what interfaces will be included in the package. All of the tags, except for filter, are optional and unbounded. There are five types of these tags:

filter 
Specify a filter that matches the interfaces to be included in the package.
<filter>IPADDR IPLIKE *.*.*.*</filter>

Each package must have a filter tag that performs the initial test to see if an interface should be included in a package. Filters operate on interfaces (not nodes) and are discussed in depth in this How-To. Only one filter statement can exist per package.

specific  
Specify a specific IP address to include in the package.
<specific>192.168.1.59</specific>
include-range 
This specifies a particular range of IP addresses to include in a package.
<include-range begin="192.168.0.1" end="192.168.0.254"/>
exclude-range
This specifies a particular range of IP addresses to exclude in a package. This will override an include-range tag.
<exclude-range begin="192.168.0.100" end="192.168.0.104"/>
include-url 
Specify a file that contains a list of IP addresses to include.
<include-url>file:/opt/OpenNMS/etc/include</include-url>

This tag will point to a file that consists of a list of IP addresses, one to a line, that will be included in the package. Comments can be embedded in this file. Any line that begins with a "#" character will be ignored, as will the remainder of any line that includes a space followed by "#".
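
Putting these tags together, a package that collects from one subnet but skips a handful of addresses could be sketched as follows. The package name and addresses are made up for illustration, and the element order simply follows the order in which the tags are listed above.

```xml
<package name="example-subnet">
        <!-- Initial test: only interfaces matching the filter are considered -->
        <filter>IPADDR IPLIKE 192.168.*.*</filter>
        <specific>192.168.1.59</specific>
        <include-range begin="192.168.0.1" end="192.168.0.254"/>
        <!-- Overrides the include-range for these five addresses -->
        <exclude-range begin="192.168.0.100" end="192.168.0.104"/>

        <service name="SNMP" interval="300000" user-defined="false" status="on">
                <parameter key="collection" value="default"/>
        </service>
</package>
```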

Services

Again, drawing on the analogy with pollers: just as each poller package has a set of protocols that it monitors, collectors have a set of services on which they collect data. At the time this section was written there was only one: SNMP.

The service tag names the service and also specifies various parameters:

name 
This is the name of the service.
interval 
This specifies the polling interval (5 minutes by default).
user-defined 
In the future, users may be able to define new collection sources (like from a log file) through a GUI, but at the moment this is set to "false".
status 
Also in the future, there will be an admin GUI for collectors just as there is for pollers, and users will be able to turn SNMP data collection on or off from a web page. At the moment, this can only be done by editing this file and setting status to either "off" or "on" (default).

Service Parameters

There are three parameters available common to all services:

timeout 
The timeout, in milliseconds, to wait for a response to an SNMP request.
retries 
If a timeout does occur, this controls the number of attempts to make before giving up.
port 
This allows you to override the default port for SNMP data collection.

SNMP

In addition, the SNMP service can have the following parameters specified:

collection 
This points to an SNMP collection in the datacollection-config.xml file that determines what Object IDs (OIDs) will be collected.
oid 
The collector will test to see if the interface supports SNMPv2 by doing a GET-BULK request on this OID. By default it is set to the systemOID. If the GET-BULK is successful, then the rest of the polling for this device will take advantage of SNMPv2. Otherwise, SNMPv1 will be used. The intent was to allow for this parameter to override the systemOID value, but it was never implemented, so you can ignore this parameter for now.

JBoss

OpenNMS comes with libraries for JBoss 4.0.2. If you need the JBossCollector in order to collect data from JBoss 4.2.2, these will cause a silent failure. In that case, delete ${OPENNMSHOME}/lib/jboss*4.0.2.jar and ${OPENNMSHOME}/lib/jnp-client-4.0.2.jar, and place your own jbossall-client.jar in ${OPENNMSHOME}/lib/jboss/jbossall-client.jar.

The JBoss4 and JBoss32 services can have the following additional parameter specified:

factory 
Specifies the method of connecting to the JBoss server. It can be either HTTP or RMI.

NSClient

The NSClient service has the following additional parameter:

nsclient-collection 
This points to a collection in the nsclient-datacollection-config.xml file that determines what perfmon counters will be collected.
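
For illustration, an NSClient service entry in a collectd package might be sketched like this. The collection name "default" and the parameter values are assumptions for the example; the collection name must match one defined in your nsclient-datacollection-config.xml.

```xml
<service name="NSClient" interval="300000" user-defined="false" status="on">
        <!-- Name of a collection in nsclient-datacollection-config.xml (assumed here) -->
        <parameter key="nsclient-collection" value="default"/>
        <parameter key="port" value="1248"/>
        <parameter key="retry" value="3"/>
        <parameter key="timeout" value="3000"/>
</service>
```

As with SNMP, the package also needs a collector definition for the NSClient service at the end of collectd-configuration.xml.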

Outage Calendar

In order to keep servers operating properly, it is often necessary to bring them down for scheduled maintenance. Instead of having these maintenance outages reflected as a true service outage, they can be included in a "Poller Outage Calendar" and then referenced by the poller package using the outage-calendar tag. This tag contains the name of a valid outage in the poll-outages.xml file.

The outage-calendar tag is optional and unbounded (i.e. you can reference more than one outage).

Since version 1.5.91 you can configure scheduled outages from the GUI: go to Admin -> Scheduled Outages.

Before version 1.5.91, there were three types of outages: weekly, monthly and specific. Since 1.5.91 there is also the possibility to configure daily outages.

If nodes are reported as down even though they fall within a daily outage that runs past midnight, try defining two timespans within the outage, one ending at midnight and the other starting after midnight. For example, instead of an outage of 22:00:00-01:00:00, define 22:00:00-23:59:59 and 00:00:00-01:00:00.
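
The midnight workaround might be written as in the sketch below. It assumes that a daily outage uses the same time and interface tags as the other outage types, just without a day attribute; the outage name and address are placeholders.

```xml
<outage name="nightly backup" type="daily">
  <!-- Instead of a single span of 22:00:00-01:00:00 -->
  <time begins="22:00:00" ends="23:59:59"/>
  <time begins="00:00:00" ends="01:00:00"/>
  <interface address="192.168.0.1"/>
</outage>
```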


Examples from the poll-outages file:

<outage name="global" type="weekly">
  <time day="sunday" begins="12:30:00" ends="12:45:00"/>
  <time day="sunday" begins="13:30:00" ends="14:45:00"/>
  <time day="monday" begins="13:30:00" ends="14:45:00"/>
  <time day="tuesday" begins="13:00:00" ends="14:45:00"/>
  <interface address="192.168.0.1"/>
  <interface address="192.168.0.36"/>
  <interface address="192.168.0.38"/>
</outage>

This defines an outage calendar called "global" that is run every week. It specifies four outage times: Sunday starting at 12:30 pm and lasting 15 minutes, Sunday starting at 1:30 pm and lasting an hour and fifteen minutes, the same outage on Monday, and one on Tuesday from 1:00 pm to 2:45 pm. This is to demonstrate that you can have multiple outages on a given day and the same outage on different days. Three interfaces will be affected.

<outage name="hub maintenance" type="monthly">
  <time day="1" begins="23:30:00" ends="23:45:00"/>
  <time day="15" begins="21:30:00" ends="21:45:00"/>
  <time day="15" begins="23:30:00" ends="23:45:00"/>
  <interface address="192.168.100.254"/>
  <interface address="192.168.101.254"/>
  <interface address="192.168.102.254"/>
  <interface address="192.168.103.254"/>
  <interface address="192.168.104.254"/>
  <interface address="192.168.105.254"/>
  <interface address="192.168.106.254"/>
  <interface address="192.168.107.254"/>
</outage>

This outage calendar, called "hub maintenance", is run every month. On the first of the month the outage begins at 11:30 pm and lasts 15 minutes. The same outage occurs on the 15th of the month in addition to another outage from 9:30 pm to 9:45 pm. Thus you can have the same outage on different dates as well as more than one outage on a particular date. Eight interfaces are affected by this outage.

<outage name="proxy server tuning" type="specific">
  <time begins="10-Nov-2001 17:30:00" ends="11-Nov-2001 08:00:00"/>
  <interface address="192.168.0.1"/>
</outage>

It is also possible to include an outage on a specific date and time. This outage named "proxy server tuning" began on November 10th, 2001 at 5:30 pm and lasted until 8:00 am the next day. This affected one interface. You can have more than one "time" entry per specific outage.

If a particular outage calendar is included in a collection package, then collection will not occur during this time.

Final Tags in collectd-configuration.xml

Just like in the poller configuration file, each service that is collected on must reference the class that is to be used for this collection. Therefore there should be one or more of the following definitions (or your own if you've implemented your own collector class):

<collector service="SNMP"       class-name="org.opennms.netmgt.collectd.SnmpCollector"/>
<collector service="NSClient"       class-name="org.opennms.netmgt.collectd.NSClientCollector"/>
<collector service="JBoss4"       class-name="org.opennms.netmgt.collectd.JBossCollector"/>
<collector service="JBoss32"      class-name="org.opennms.netmgt.collectd.JBossCollector"/>
<collector service="JVM"          class-name="org.opennms.netmgt.collectd.Jsr160Collector"/>
<collector service="HttpDocCount" class-name="org.opennms.netmgt.collectd.HttpCollector" />

Data Collection Configuration

datacollection-config.xml

This is one of the more complex files in the product. It determines what values will be collected for a given interface and package.

At this point in time, it is probably best to review the structure of things and expand upon them somewhat. Okay, just like there are poller packages for monitoring service levels, there are collection packages that control data collection. Poller packages can monitor numerous protocols, and collection packages can collect on numerous data sources, but for now the only one is SNMP.

The SNMP data collection service points to an SNMP data collection "scheme". I am out of synonyms for "package", and I don't want to get confused between the packages in the collectd configuration file and the SNMP collections in the data collection configuration file, so for the purpose of this How-To, we'll call them schemes. These schemes bring together OIDs for collection into groups and the groups are mapped to systems. The systems are mapped to interfaces by the systemOID. In addition, each "scheme" controls how the data will be collected and stored.

It becomes clearer as we move on.

First, let's check out the datacollection-config.xml file. Outside of the snmp-collection definition (the "scheme"), there is only one parameter:

<datacollection-config
        rrdRepository = "/var/opennms/rrd/snmp/">

This determines in which directory the collected information will be stored. If you change this value, you must also change the rrdRepository values in the following files:

poller-configuration.xml
thresholds.xml
http-datacollection-config.xml
jmx-datacollection-config.xml
nsclient-datacollection-config.xml

snmp-collection General Set Up

After the repository has been defined, the next tag starts the snmp-collection definition:

<snmp-collection name="default"
                 maxVarsPerPdu = "50"
                 snmpStorageFlag = "all">

The name attribute is pretty self-explanatory. This is the name that must be matched to the key="collection" value in the collectd configuration file. The maxVarsPerPdu places a limit on the number of SNMP variables that will be retrieved with a GET-BULK request in one packet. You should not need to adjust this, but if you have some SNMP agents that are somewhat slow, you could reduce this to ease the load on them.

The snmpStorageFlag is a pretty important attribute. It can be set to "all" (default), "primary", or "select". What this does is determine if SNMP data collection will occur on all interfaces for a particular node or just the interface marked as "primary". This can greatly affect the size of your Round Robin Database (RRD) if you have a number of multi-interface devices like switches, but it won't have much effect on a network consisting mainly of servers (which tend to only have a single interface). I don't know what storage behaviour is used when the snmpStorageFlag is set to "select".

This is one instance where you may want to have two collection packages and two collection schemes. You could build a collection package for just routers where snmpStorageFlag is set to "all" in the collector scheme and then have everything else in another package where it is set to "primary" in the scheme.
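
A rough sketch of that arrangement is shown below, as fragments from the two files. The package names, the scheme name "routers" and the filter expressions are invented for the example, and the rrd, groups and systems sections that each snmp-collection needs are left out.

```xml
<!-- collectd-configuration.xml (fragment): one package per scheme -->
<package name="routers">
        <filter>IPADDR IPLIKE 10.1.*.*</filter>
        <service name="SNMP" interval="300000" user-defined="false" status="on">
                <parameter key="collection" value="routers"/>
        </service>
</package>
<package name="everything-else">
        <filter>IPADDR IPLIKE 10.2.*.*</filter>
        <service name="SNMP" interval="300000" user-defined="false" status="on">
                <parameter key="collection" value="default"/>
        </service>
</package>

<!-- datacollection-config.xml (fragment): rrd, groups and systems omitted -->
<snmp-collection name="routers" maxVarsPerPdu = "50" snmpStorageFlag = "all">
</snmp-collection>
<snmp-collection name="default" maxVarsPerPdu = "50" snmpStorageFlag = "primary">
</snmp-collection>
```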

RRD Configuration

<rrd step = "300">
  <rra>RRA:AVERAGE:0.5:1:8928</rra>
  <rra>RRA:AVERAGE:0.5:12:8784</rra>
  <rra>RRA:MIN:0.5:12:8784</rra>
  <rra>RRA:MAX:0.5:12:8784</rra>
</rrd>

The next section of the scheme configuration specifies RRD (round robin database) parameters for storing and rolling up the collected data samples. RRDTool is a product that grew out of MRTG. It creates a very compact database structure for the storage of periodic data, such as is gathered by OpenNMS. RRD data is stored in files that are created when initialized to hold data for a certain amount of time. This means that with the first data collection these files are as large as they will ever get, but it also means that you will see an initially large decrease in disk space as collection is first started. Once the RRD file is full, the oldest data is discarded.

OpenNMS releases up to and including 1.2.9 used RRDTool proper by default via a JNI (Java Native Interface), meaning that the resulting files could be read by other applications capable of consuming RRDTool's file format. The files written by OpenNMS via the JNI RRD strategy have a .rrd extension by default. Beginning with the 1.3.2 release, the default is to use JRobin, a pure-Java implementation of RRDTool 1.0's functionality. The files produced via the JRobin RRD strategy have a .jrb extension by default, and are not compatible with RRDTool proper. See the JRobin site for the motivation behind this decision.

The first line, the rrd step size, determines the granularity of the data. By default this is set to 300 seconds, or five minutes, which means that the data will be saved once every five minutes per step. Note that this is also one of the few places where time in OpenNMS is referenced in seconds instead of milliseconds.

Each RRD is made up of Round-Robin Archives. An RRA consists of a certain number of steps. All of the data that is collected in those steps is then consolidated into a single value that is then stored in the RRD. For instance, if I poll a certain SNMP variable once a minute, I could have an RRA that would collect all samples over a step of five minutes, average the (five) values together, and store the average in the RRD.

The RRA statements take the form:

RRA:Cf:xff:steps:rows
RRA 
This string defines the line as an RRA configuration command. It does not change, and is always the text "RRA".
Cf 
This field represents the "consolidation function". It can take one of four values, AVERAGE, MAX, MIN, or LAST. They are detailed below.
xff 
This is the "x-files factor". If we are trying to consolidate a number of samples into one, there is a chance that there could be gaps where a value wasn't collected (the device was down, etc.). In that case, the value would be UNKNOWN. This factor determines how many of the samples can be UNKNOWN before the consolidated sample is considered UNKNOWN. By default this is set to 0.5 or 50%.
steps 
This states the number of "steps" that make up the RRA. For example, if the step size is 300 seconds (5 minutes) and the number of steps is 12, then the RRA is 12 x 5 minutes = 60 minutes = 1 hour long, and it will store the consolidated value for that hour.
rows 
The rows field determines the number of values that will be stored in the RRA.

Consolidation Functions

These are used in the "Cf" part of an RRA statement.

AVERAGE 
Average all the values over the number of steps in the RRA.
MAX 
Store the maximum value collected over the number of steps in the RRA.
MIN 
Store the minimum value collected over the number of steps in the RRA.
LAST 
Store the last value collected over the number of steps in the RRA.

Let's bring this all together with some more examples. Take the first RRA line in the configuration:

RRA:AVERAGE:0.5:1:8928

This says to create an archive consisting of the AVERAGE value collected over 1 step and store up to 8928 of them. If, for any step, more than 50% of the values are UNKNOWN, then the average value will be UNKNOWN. Since the default step size is 300 seconds, or five minutes, and the default polling cycle (in the collectd configuration) is five minutes, we would expect there to be one value per step, and so the AVERAGE should be the same as the MIN or MAX or LAST. 8928 five minute samples at 12 samples per hour and 24 hours per day is 31 days. Thus this RRA will hold five minute samples for 31 days before discarding data.

The next lines get a little more interesting:

RRA:AVERAGE:0.5:12:8784
RRA:MIN:0.5:12:8784
RRA:MAX:0.5:12:8784

The only difference between these lines is the consolidation function. We are going to "roll up" the step 1 samples (5 minutes) into 12 step samples (1 hour). We are also going to store three values: the average of all samples during the hour, the minimum value of those samples and the maximum value. This data is useful for various reports (the AVERAGE shows throughput whereas MAX and MIN show peaks and valleys). These will be stored as one hour samples 8784 times, or 366 days.

So, to summarize, by default the SNMP collector will poll once every five minutes. This value will be stored as collected for 31 days. Also, hourly samples will be stored which include the MIN, MAX and AVERAGE.

You can easily change these numbers to increase or decrease the amount of data stored. A few caveats. First, increasing the amount and/or frequency of samples will have a direct effect on the amount of disk space required. You could add a MIN and MAX RRA for the single step RRA, which would increase necessary disk space by up to 50%, but since by default there is only one value, MIN, MAX and AVERAGE will be the same, so it is not really necessary unless you also increase the polling rate. Second, you cannot change these numbers once collection has started without losing all of the collected data up to that point. So it is important to set your values early. When you change these numbers, you must delete all .jrb files in order for them to be re-created.
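
As a sketch of the first caveat, the default rrd block could be extended with single-step MIN and MAX archives as shown here. The two added lines are an illustration, not a recommended default; they only differ from AVERAGE when data is collected more often than once per step.

```xml
<rrd step = "300">
  <!-- 8928 rows x 1 step x 300 s = 31 days of five-minute samples -->
  <rra>RRA:AVERAGE:0.5:1:8928</rra>
  <!-- Added for illustration: single-step minimum and maximum -->
  <rra>RRA:MIN:0.5:1:8928</rra>
  <rra>RRA:MAX:0.5:1:8928</rra>
  <!-- 8784 rows x 12 steps x 300 s = 366 days of hourly samples -->
  <rra>RRA:AVERAGE:0.5:12:8784</rra>
  <rra>RRA:MIN:0.5:12:8784</rra>
  <rra>RRA:MAX:0.5:12:8784</rra>
</rrd>
```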

A note for international users: if your LOCALE is set to something other than "en_US" you may need to use a "comma" instead of a "period" in the xff, for example:

RRA:AVERAGE:0,5:12:8784
RRA:MIN:0,5:12:8784
RRA:MAX:0,5:12:8784

You have to do this if you see a "can't parse argument 'RRA:AVERAGE:0.5:1:8928'" in the collectd log file.

Resource Types

If you wish to collect tabular or "columnar" data from MIB tables that are indexed on some instance identifier other than ifIndex, you will need to have a custom resourceType element for each unique table-indexing strategy. Details on this type of collection are available separately in this article.

Groups

If you are still with me, let's talk about something a little more intuitive with respect to SNMP data collection, the SNMP variables themselves. OpenNMS comes with a utility (OPENNMS_HOME/contrib/mibparser/dist/parseMib.sh) that automates much of the work involved in importing OIDs for collection, but its output almost always requires some amount of work by a human operator. Each value is spelled out in a group entry:

<groups>
  <group  name = "mib2-interfaces" ifType = "all">
    <mibObj oid=".1.3.6.1.2.1.2.2.1.10" instance="ifIndex" alias="ifInOctets" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.13" instance="ifIndex" alias="ifInDiscards" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.14" instance="ifIndex" alias="ifInErrors" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.16" instance="ifIndex" alias="ifOutOctets" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.19" instance="ifIndex" alias="ifOutDiscards" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.20" instance="ifIndex" alias="ifOutErrors" type="counter"/>
  </group>

SNMP variable collections are placed into groups to make it easier to associate with specific kinds of devices. A group consists of a group name and the types of interfaces (ifType) for which the member objects should be collected.

The ifType attribute can take on the following values:

all 
This means that all interface types will be polled for the OIDs included in the group.

ignore 
This is used for scalar values, i.e. those that appear only once on a device, such as the "load average" for a router. This value will be collected and stored once for the device.
[specific numeric value] 
You may want to poll certain values from ATM interfaces, others from point-to-point WAN links, and still others from Ethernet interfaces. For example:
<group  name = "my-ATM-example" ifType = "37">
<group  name = "ethernet-example" ifType = "6,62">

See http://www.iana.org/assignments/ianaiftype-mib for a comprehensive list of ifType values.

As a special case, groups containing object definitions for tabular (aka "columnar") data from tables indexed by any instance identifier other than ifIndex must have an ifType of all. This type of data is referred to as generic index data, and is described in more detail in the article Collecting SNMP data from tables with arbitrary indexes.

It is important never to mix scalar data, interface-level data, and generic-index data in the same group.
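The ifType rules above can be summarized in a few lines of code. This is only a sketch of the selection logic as described in this section, not the code OpenNMS runs; the function name is hypothetical, and the generic index special case is not modelled.

```python
from typing import Optional


def group_applies(if_type_attr: str, if_type: Optional[int]) -> bool:
    """Decide whether a group is collected for a given target.

    if_type is the interface's ifType number, or None when the target is
    the node itself (scalar data) rather than one of its interfaces.
    """
    value = if_type_attr.strip().lower()
    if value == "ignore":
        # Scalar values: collected once per device, never per interface.
        return if_type is None
    if if_type is None:
        return False
    if value == "all":
        return True
    # Otherwise a comma-separated list of numeric ifType values, e.g. "6,62".
    wanted = {int(v) for v in value.split(",") if v.strip()}
    return if_type in wanted


print(group_applies("all", 6))
print(group_applies("ignore", None))
print(group_applies("6,62", 62))
print(group_applies("37", 6))
```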

Each SNMP MIB variable consists of an OID plus an instance. Usually, that instance is either zero (0) or an index to a table. At the moment, OpenNMS only understands the ifIndex index to the ifTable. All other instances have to be explicitly configured. The alias must be no more than 19 characters in length (a limitation stemming from the design of RRDTool), unique per combination of device type and resource type, and usually should be unique per OID. The RRD file that is created will have the alias as its filename.
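Because an over-long or reused alias is easy to miss, it can help to check a group definition before loading it. The sketch below, using only the Python standard library, flags aliases longer than 19 characters and aliases reused for a different OID with the same instance. The uniqueness check is a simplification of the rule stated above, and the sample XML (including the deliberately long alias) is invented for illustration.

```python
import xml.etree.ElementTree as ET

MAX_ALIAS_LEN = 19  # limit stated above for RRD data source names


def check_aliases(groups_xml):
    """Return a list of human-readable problems found in the mibObj aliases."""
    problems = []
    seen = {}  # (instance, alias) -> first OID that used it
    for mib_obj in ET.fromstring(groups_xml).iter("mibObj"):
        alias = mib_obj.get("alias", "")
        oid = mib_obj.get("oid")
        key = (mib_obj.get("instance"), alias)
        if len(alias) > MAX_ALIAS_LEN:
            problems.append("alias too long (%d chars): %s" % (len(alias), alias))
        if key in seen and seen[key] != oid:
            problems.append("alias %s used for both %s and %s" % (alias, seen[key], oid))
        seen.setdefault(key, oid)
    return problems


SAMPLE = """
<groups>
  <group name="mib2-interfaces" ifType="all">
    <mibObj oid=".1.3.6.1.2.1.2.2.1.10" instance="ifIndex" alias="ifInOctets" type="counter"/>
    <mibObj oid=".1.3.6.1.2.1.2.2.1.16" instance="ifIndex" alias="ifOutOctetsOnThisInterface" type="counter"/>
  </group>
</groups>
"""

for problem in check_aliases(SAMPLE):
    print(problem)
```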

OpenNMS understands four types of numeric variables to collect: gauge, timeticks, integer, counter. Since RRD only understands numeric data, any string types encountered will be parsed to a number before being persisted in RRD storage. If the conversion cannot be made (perhaps you are trying to collect on systemName, for example), a log message will be generated. Starting with OpenNMS 1.3.2, a type of string can be used to collect string values and store their values separately from RRD files.

Systems

Once the groups are defined, the last step is to associate them with the systems to be monitored. The SNMP systemOID (.1.3.6.1.2.1.1.2, instance 0) returns another OID that is meant to uniquely identify the type of equipment being used.

<systems>
  <systemDef name = "Net-SNMP">
    <sysoidMask>.1.3.6.1.4.1.2021.250.</sysoidMask>
    <collect>
      <includeGroup>mib2-interfaces-net-snmp</includeGroup>
      <includeGroup>mib2-host-resources-storage</includeGroup>
      <includeGroup>mib2-host-resources-system</includeGroup>
      <includeGroup>mib2-host-resources-memory</includeGroup>
      <includeGroup>ucd-loadavg</includeGroup>
    </collect>
  </systemDef>

In this system definition, any device that is being used for SNMP data collection and whose systemOID starts with ".1.3.6.1.4.1.2021.250." will collect on five MIB groups: mib2-interfaces-net-snmp, mib2-host-resources-storage, mib2-host-resources-system, mib2-host-resources-memory and ucd-loadavg.

If you want to match against one specific OID rather than a prefix, use <sysoid> instead of <sysoidMask>.
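The difference between the two elements comes down to exact match versus prefix match. A minimal sketch of that comparison follows; the function name and the sample system OIDs are illustrative, and any normalization OpenNMS may apply to the OID strings is not modelled.

```python
def system_def_matches(sys_object_id, sysoid=None, sysoid_mask=None):
    """Return True when a device's system OID selects this systemDef."""
    if sysoid is not None:
        # <sysoid>: the whole OID must be identical.
        return sys_object_id == sysoid
    if sysoid_mask is not None:
        # <sysoidMask>: the OID only has to start with the mask.
        return sys_object_id.startswith(sysoid_mask)
    return False


mask = ".1.3.6.1.4.1.2021.250."
print(system_def_matches(".1.3.6.1.4.1.2021.250.10", sysoid_mask=mask))
print(system_def_matches(".1.3.6.1.4.1.9.1.122", sysoid_mask=mask))
print(system_def_matches(".1.3.6.1.4.1.9.1.122", sysoid=".1.3.6.1.4.1.9.1.122"))
```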

So, to review once again - you set up collection packages, similar to poller packages, in the collectd-configuration.xml file. A key in that file points to a particular snmp-collection tag in datacollection-config.xml (this is what I have referred to as a scheme). For each scheme, you set up how the data will be stored, whether all interfaces will be collected on or just the primary interface for each node, what MIB OIDs are included in each MIB group, and what MIB groups are associated with what systems, based on the system definition.

Got it? Whew.

Modular Configuration

As of OpenNMS 1.8.4 and 1.9.1, it is now possible to modularly include multiple configuration files into datacollection-config.xml, much like eventconf.xml.

First, make sure you have an $OPENNMS_HOME/etc/datacollection directory. If not, make it. Then, create one or more configuration files in that directory. The opening tag should be "<datacollection-group>" with a name set, which then can contain any number of resourceType, group, and systemDef definitions, just like the main datacollection-config.xml file. For example:

 <?xml version="1.0"?>
 <datacollection-group name="Cisco">
 
     <resourceType name="cbgpPeerAddrFamilyPrefixEntry" label="Cisco BGP Peer / Address Family"
                   resourceLabel="Peer ${subIndex(0,4)}">
       <persistenceSelectorStrategy class="org.opennms.netmgt.collectd.PersistAllSelectorStrategy"/>
       <storageStrategy class="org.opennms.netmgt.dao.support.IndexStorageStrategy"/>
     </resourceType>
 
       <group name="cisco-bgp-peer-addr-family-prefix-stats" ifType="all">
         <mibObj oid=".1.3.6.1.4.1.9.9.187.1.2.4.1.1" instance="cbgpPeerAddrFamilyPrefixEntry"
                   alias="cbgpPeerAcceptedPfx" type="gauge" />
       </group>
 
     <systemDef name="Cisco Routers">
       <sysoidMask>.1.3.6.1.4.1.9.1.</sysoidMask>
       <collect>
         <includeGroup>adsl-line</includeGroup>
         <includeGroup>rfc1315-frame-relay</includeGroup>
         <includeGroup>mib2-X-interfaces</includeGroup>
         <includeGroup>ietf-bgp4-peer-stats</includeGroup>
         <includeGroup>cisco-bgp-peer-addr-family-prefix-stats</includeGroup>
       </collect>
     </systemDef>
 
 </datacollection-group>

Then, add it near the end of datacollection-config.xml within an snmp-collection tag, using the group name you defined in the individual XML file:

 <?xml version="1.0"?>
 <datacollection-config ...>
   <snmp-collection ...>
 ...
     <include-collection dataCollectionGroup="Cisco" />
   </snmp-collection>
 
 </datacollection-config>

Any group of mibObjs defined in any datacollection-group can be referenced from any systemDef, no matter which file it lives in. This eases maintenance: a specific set of mibObjs can be defined within a group only once and then referenced from many systemDefs, which may themselves be defined in different files. For this reason, this is not a simple import; the parsing behaves differently from other files like eventconf.xml.

Each include-collection element used inside snmp-collection indicates which systemDef, or set of systemDefs, must be imported. You can import:

- All systemDefs defined on a specific file (as the above example).

- A specific systemDef

  <include-collection systemDef="Cisco Routers"/>

- All systemDefs except those whose names match one of a list of regular expressions.

  <include-collection dataCollectionGroup="Cisco">
     <exclude-filter>^Cisco PIX.*</exclude-filter>
     <exclude-filter>^Cisco AS.*</exclude-filter>
  </include-collection>
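To preview which systemDefs a set of exclude-filter patterns would leave in place, the filtering can be tried out offline. This sketch uses Python's re module and a made-up list of systemDef names; whether OpenNMS requires a full match or a partial match is an assumption here, although the anchored patterns in the example behave the same either way.

```python
import re


def filter_system_defs(names, exclude_patterns):
    """Return the systemDef names that are not hit by any exclude pattern."""
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [name for name in names if not any(c.search(name) for c in compiled)]


# Hypothetical systemDef names from a "Cisco" datacollection-group.
names = ["Cisco Routers", "Cisco PIX 506", "Cisco AS5300", "Cisco Catalyst"]
excludes = ["^Cisco PIX.*", "^Cisco AS.*"]

print(filter_system_defs(names, excludes))
```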

All elements directly defined inside the SNMP collection will have precedence over the content, with the same name, defined on external files. This is a way to “override” certain information from the external files. For example, if the snmp-collection contains a systemDef called "Cisco Routers", this will take precedence over the same systemDef defined on datacollection/cisco.xml when importing the datacollection-group named "Cisco".

It is important to use different names for systemDefs and groups across all datacollection-group files to avoid inclusion problems, because the order in which those files are read is not fixed. The only precedence rule is the one explained above; if two or more groups or systemDefs with the same name exist in different datacollection-group files, there is no deterministic way to know which one will take precedence.
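Since name clashes are silent, a quick scan of the datacollection directory can catch them before they cause confusion. The following sketch reports any group or systemDef name that appears in more than one file. It assumes the files carry no XML namespace, as in the example above; the helper names and sample file contents are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def load_group_files(directory):
    """Read every *.xml file in a directory into a {file name: text} mapping."""
    return {p.name: p.read_text() for p in Path(directory).glob("*.xml")}


def find_duplicate_names(files):
    """Map (tag, name) to the sorted list of files that define it more than once."""
    seen = defaultdict(set)
    for filename, text in files.items():
        root = ET.fromstring(text)
        for tag in ("group", "systemDef"):
            for element in root.iter(tag):
                seen[(tag, element.get("name"))].add(filename)
    return {key: sorted(found) for key, found in seen.items() if len(found) > 1}


sample_files = {
    "cisco.xml": '<datacollection-group name="Cisco">'
                 '<group name="g1" ifType="all"/>'
                 '<systemDef name="Cisco Routers"/>'
                 '</datacollection-group>',
    "other.xml": '<datacollection-group name="Other">'
                 '<group name="g1" ifType="ignore"/>'
                 '</datacollection-group>',
}

print(find_duplicate_names(sample_files))
```

On a real system the input would come from something like load_group_files("/opt/opennms/etc/datacollection"), with the path adjusted to your OPENNMS_HOME.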

If the feature storeByGroup is being used, keep in mind that it is not recommended to change an existing group of mibObj because that will prevent the update of the RRDs. This can only be done when storeByGroup is defined as "false" inside opennms.properties.

nsclient-datacollection-config.xml

First, a simple example:

<nsclient-datacollection-config rrdRepository="/opt/opennms/share/rrd/snmp/">
  <nsclient-collection name="default">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>

    <wpms>
      <!--  A group for collecting processor stats.
        Check the keyvalue "% Processor Time" - if it's there (should be) collect this whole group.
      Check every recheckInterval milliseconds (3600000 = 1hr) -->
      <wpm name="Processor" keyvalue="\Processor(_Total)\% Processor Time" recheckInterval="3600000">
      	<!--  Collect these attributes.  Name is the name to pass to NSClient.  
      	Alias is the local name for the RRD file 
      	Type is used to convert values around
      	maxval/minval are optional-->
      	<attrib name="\Processor(_Total)\% Processor Time" alias="cpuProcTime" type="Gauge"/>
      	<attrib name="\Processor(_Total)\% Interrupt Time" alias="cpuIntrTime" type="Gauge"/>
      	<attrib name="\Processor(_Total)\% Privileged Time" alias="cpuPrivTime" type="Gauge"/>
      	<attrib name="\Processor(_Total)\% User Time" alias="cpuUserTime" type="Gauge"/>
      </wpm>
    </wpms>
  </nsclient-collection>
</nsclient-datacollection-config>

As for datacollection-config.xml, the name attribute specifies a name that must be matched to the key="collection" value in the collectd configuration file. Similarly, the RRD section has the same syntax and meaning as in datacollection-config.xml; see the earlier section on that for details.

The performance monitor counters to collect are defined in the <wpms> section. Groups of counters are defined within a <wpm> tag. Each <wpm> has:

name 
arbitrary and for your own purposes
keyvalue 
if the keyvalue perfmon counter can be obtained from the agent, then the rest of the counters in the group are collected as well.
recheckInterval 
The presence of the key value is rechecked every recheckInterval milliseconds, to avoid causing undue load on the server checking for non-existent values.

Note that the value obtained from the keyvalue is not stored unless it is explicitly mentioned in an additional attrib.

Perfmon counters that should actually be collected and stored are defined in an attrib tag, which has the following parameters:

name 
The performance counter to collect. This name is the full path to the counter, which is typically \<section>(<specific_instance>)\<counter>. The specific instance is only used where more than one instance of a counter is available. For example, in the Processor section the specific instance can be either 0 to (number of processors - 1), or _Total to see the total of counters across all instances. See the example configuration files for other examples of syntax in specifying the counter name.
alias 
This is the same as in the mibObj tag in datacollection-config.xml, and defines the name of the RRD data item that will be stored. RRD limitations require it to be 19 characters or less in length.
type 
Again the same as for mibObj in datacollection-config.xml, defining the interpretation of the data point. A "gauge" is a point-in-time value, e.g. processor usage, whereas a "counter" is for monotonically increasing counter values such as "number of http requests".
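The counter path syntax described for the name parameter can be checked mechanically. Here is a small sketch that splits a path into section, optional instance, and counter; the regular expression is an approximation written for this example, not the parser NSClient or OpenNMS uses.

```python
import re

# \<section>(<specific_instance>)\<counter>, where the instance part is optional.
COUNTER_PATH = re.compile(
    r"^\\(?P<section>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$"
)


def parse_counter(path):
    """Split a perfmon counter path into (section, instance or None, counter)."""
    found = COUNTER_PATH.match(path)
    if found is None:
        raise ValueError("not a counter path: %r" % path)
    return found.group("section"), found.group("instance"), found.group("counter")


print(parse_counter(r"\Processor(_Total)\% Processor Time"))
print(parse_counter(r"\Memory\Available Bytes"))
```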

jmx-datacollection-config.xml

Again, we start with an example:

<?xml version="1.0"?>
<jmx-datacollection-config
    rrdRepository = "/opt/opennms/rrd/snmp/">
    <jmx-collection name="jboss"
        maxVarsPerPdu = "50">
        <rrd step = "300">
            <rra>RRA:AVERAGE:0.5:1:8928</rra>
            <rra>RRA:AVERAGE:0.5:12:8784</rra>
            <rra>RRA:MIN:0.5:12:8784</rra>
            <rra>RRA:MAX:0.5:12:8784</rra>
        </rrd>
      
        <mbeans>   
          <mbean name="SystemInfo" objectname="jboss.system:type=ServerInfo">  
              <attrib name="FreeMemory"   alias="FreeMemory"       type="gauge"/> 
              <attrib name="TotalMemory"  alias="TotalMemory"      type="gauge"/>  
          </mbean> 
        </mbeans>
   </jmx-collection>
</jmx-datacollection-config>

The initial tags have the same layout and meaning as for SNMP (datacollection-config.xml) and NSClient (nsclient-datacollection-config.xml). The top level tag defines where RRD data is stored, the jmx-collection tag has a name that matches a service configuration in collectd-configuration.xml, and the RRD configuration has exactly the same syntax and meaning.

Actual data values to collect are defined within the mbeans tag. This tag has a list of mbean tags that represent the MBeans to collect. Each mbean tag has:

name 
An arbitrary name for your own use
objectname 
The object name used to identify the desired object to the JMX agent

Within each mbean tag, the attributes of that obtained object that should be collected are defined in attrib tags. Each attrib has:

name 
The name of the attribute to get out of the mbean object
alias 
This is the same as in the mibObj tag in datacollection-config.xml, and defines the name of the RRD data item that will be stored. RRD limitations require it to be 19 characters or less in length.
type 
Again the same as for mibObj in datacollection-config.xml, defining the interpretation of the data point. A "gauge" is a point-in-time value, e.g. processor usage, whereas a "counter" is for monotonically increasing counter values such as "number of http requests".
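The practical difference between the two types is what ends up being graphed: a gauge is stored as sampled, while a counter is turned into a per-second rate between two samples. The sketch below illustrates the idea only; it treats a decreasing counter as unknown, whereas real RRD storage has its own handling for counter wraps.

```python
def to_stored_value(prev_value, curr_value, interval_seconds, kind):
    """Illustrate how a sample is interpreted for 'gauge' versus 'counter'."""
    if kind == "gauge":
        # Point-in-time value: the current sample is used directly.
        return float(curr_value)
    if kind == "counter":
        delta = curr_value - prev_value
        if delta < 0:
            # Counter went backwards (reset or wrap): no rate in this sketch.
            return None
        return delta / interval_seconds
    raise ValueError("unknown type: %s" % kind)


print(to_stored_value(1000, 4000, 300, "counter"))  # requests per second
print(to_stored_value(0, 42, 300, "gauge"))
```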

http-datacollection-config.xml

Again, an example:
<?xml version="1.0" encoding="UTF-8"?>
<http-datacollection-config  
    xmlns:http-dc="http://xmlns.opennms.org/xsd/config/http-datacollection" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/http-datacollection http://www.opennms.org/xsd/config/http-datacollection-config.xsd" 
    rrdRepository="@install.share.dir@/rrd/snmp/" >
  <http-collection name="doc-count">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>
    <uris>
      <uri name="document-counts">
        <url path="/test/resources/httpcolltest.html"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches=".*([0-9]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="documentCount" match-group="1" type="counter32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>
</http-datacollection-config>
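The matches and match-group attributes work together: the regular expression is applied to the response and the numbered capture group becomes the stored value. When writing such patterns, watch out for greedy quantifiers. The sketch below uses Python's re module and a made-up response body to show the effect; greedy matching is also the default in Java regular expressions, so it is worth testing a pattern against a real response before relying on it.

```python
import re


def extract(pattern, body, match_group):
    """Return the requested capture group, or None when the body does not match."""
    found = re.match(pattern, body)
    return found.group(match_group) if found else None


body = "Document Count: 5123"  # hypothetical single-line response body

print(extract(r".*([0-9]+).*", body, 1))   # greedy leading .* leaves one digit: 3
print(extract(r".*?([0-9]+).*", body, 1))  # lazy leading .*? keeps the number: 5123
```

In this example the leading greedy ".*" swallows all but the last digit, while the lazy ".*?" variant captures the whole number.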

Where Does All the Data Go?

In the last section, RRD files were mentioned pretty often. Where do they go? Well, they go into the RRD repository, defined in datacollection-config.xml, which by default is /var/opennms/rrd/snmp.

For each node for which data is collected, there will exist a directory that consists of the node number. Thus, if the system was collecting data on node 18, there would be a directory called /var/opennms/rrd/snmp/18.

RRDs that are collected for that node (i.e., the node OID matches the system's sysoidMask, and the mibObj elements of the groups included in that system have ifType = ignore) will be present in this directory. The files will be named with the alias defined in the mibObj element, plus ".rrd" (JNI) or ".jrb" (JRobin). For example: cpuPercentBusy.rrd and memorySize.rrd. The extension depends on whether RRD storage is configured to use JNI or JRobin (the default now).

For each interface on the node that is being used for data collection, a subdirectory will exist consisting of the interface description (ifDescr) and the MAC address. The MAC address was added because on some switches, multiple ports will have the same ifDescr. There have been tales of devices where the interfaces had both the same ifDescr and MAC address, and at the moment no solution exists for that case. So, if on node 18 there was an interface described as "eth0", its RRD directory would be /var/opennms/rrd/snmp/18/eth0-[MAC Addr.]. Into that directory would go all interface specific RRD files such as ifInOctets.rrd and ifOutOctets.rrd.
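The layout described above can be expressed as a couple of path helpers, which is handy when scripting checks against the repository. This is a sketch of the naming scheme as described here; the exact way OpenNMS formats the ifDescr and MAC address in the directory name is not modelled, and the MAC value is invented.

```python
from pathlib import PurePosixPath

REPOSITORY = "/var/opennms/rrd/snmp"  # default repository named above


def node_dir(node_id, repository=REPOSITORY):
    """Directory holding node-level RRD files."""
    return PurePosixPath(repository) / str(node_id)


def interface_dir(node_id, if_descr, mac, repository=REPOSITORY):
    """Directory holding interface-level RRD files: <ifDescr>-<MAC>."""
    return node_dir(node_id, repository) / ("%s-%s" % (if_descr, mac))


def rrd_file(directory, alias, extension="jrb"):
    """Path of the file for one alias; use 'rrd' when JNI storage is configured."""
    return directory / ("%s.%s" % (alias, extension))


print(rrd_file(node_dir(18), "cpuPercentBusy"))
print(rrd_file(interface_dir(18, "eth0", "0011223344ff"), "ifInOctets"))
```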

Troubleshooting

Here are a few tips to help troubleshoot SNMP data collection issues.

verify snmp access to device

Verify that the node supports SNMP and is reachable from your OpenNMS server. First try to ping the device. If the ping succeeds, routing to the device is working.

Then try an snmpwalk such as

snmpwalk -v 2c -c secret nodename

from the server OpenNMS is running on. Results from another machine can differ because of firewall or routing issues.

If this fails, check that SNMP access is configured on the device as expected, check the SNMP community and SNMP version (might be version 1, 2c or 3), and make sure that no firewalls, access lists or similar are denying access to the device.
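When several devices need checking, the same snmpwalk test can be wrapped in a short script. This sketch assumes the Net-SNMP snmpwalk command is installed on the OpenNMS server and only uses the options shown above; the function names are hypothetical.

```python
import shutil
import subprocess


def snmpwalk_command(host, community, version="2c"):
    """Build the argument list for the snmpwalk test shown above."""
    return ["snmpwalk", "-v", version, "-c", community, host]


def can_walk(host, community, version="2c", timeout=30):
    """Return True when snmpwalk exits cleanly and prints at least one line."""
    if shutil.which("snmpwalk") is None:
        raise RuntimeError("snmpwalk not found on PATH")
    result = subprocess.run(
        snmpwalk_command(host, community, version),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode == 0 and bool(result.stdout.strip())


# Example (run on the OpenNMS server itself):
#   can_walk("nodename", "secret")
```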

verify opennms snmp access to device

If still no SNMP information shows up on the node page in the WebUI, check the snmp-config.xml file to ensure that the proper community name is configured (and as above ensure that a given address is not included in multiple ranges, as only the first match will be used).

The next thing to check is the provisiond.log file. If this is a new installation, look to see where Provisiond tested that device. If it is an old installation, you can force a rescan from the node page, and this should create new logs.

Look to see that the SNMP service was detected for that IP address. If not, check the SNMP community name once again and adjust it until a rescan detects the service.

If you have gotten this far, then SNMP information from the system tree should show up on the node page.

verify snmp oid to collect

The next error to look for will be something like:

IfTable: snmpTimeoutError for: ipaddress

This would indicate that something is wrong as we try to get the ipAddrTable and ifTable information.

Two things to try here:

  1. Run "snmpwalk -c community_name ipaddress". This should walk the entire SNMP MIB for that device. Some UCD SNMP agents by default will only return the system tree.
  2. Try forcing the version to version 1 in snmp-config.xml and doing a rescan on the node. The ifTable and ipAddrTable can be large, and thus benefit from using the SNMPv2 GET-BULK command. However, we have seen on at least one device that something gets fragmented with the command and we never get to see the tables. If this happens and is fixed by setting the version to 1, please, please, please report it and if possible get a tcpdump of the SNMP packets sent during the Provisiond scan. Note that the snmpwalk command from the command line uses SNMPGET from version 1 and will not reproduce a problem with version 2.

If you have a valid ifIndex (it will be displayed on the interface page of the WebUI), then you should be able to collect SNMP information. Check the database:

  1. Run "psql -U opennms opennms".
  2. At the command prompt, run "SELECT * FROM ipInterface WHERE nodeid=x;" where "x" is the node's ID number.
  3. Check to see if at least one interface is marked as primary ("P").
  4. To exit, type "\q"

If no IP addresses are listed as primary, check your collectd configuration file to ensure that at least one IP address that supports SNMP is included in a package. Correct the omission and rescan the node.

Up to this point, you should be checking the provisiond.log for errors. For the next steps, start looking at collectd.log.

verify collectd is collecting your data

Looking at collectd.log for the primary interface of your node, you should see attempts being made to collect via that interface. While the datacollection-config.xml file controls data collection, by default any sysObjectID that starts ".1.3.6.1.4.1", which is to my knowledge all of them, will match the mib-2 group which collects on ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, and ifOutDiscards. See if there are any useful log messages (such as timeouts, etc.) that can give you a clue.

You may also enable debugging in $OPENNMS_HOME/etc/log4j2.xml (log4j.properties before OpenNMS 14) to get more detailed logs:

 <KeyValuePair key="collectd"             value="DEBUG" />

If nothing in the log shows that OpenNMS is trying to collect the desired data, review the documentation and your configuration in $OPENNMS_HOME/etc/datacollection-config.xml and $OPENNMS_HOME/etc/collectd-configuration.xml.

check if collected data is written to rrd files

Finally, look in the /var/opennms/rrd/snmp/nodeid directory where nodeid is the node ID number for the device you are interested in. You should see *.jrb files being updated, and you can use this command to see if the RRD actually contains data:

$OPENNMS_HOME/bin/jrobin-inspector

If the files in the /var/opennms/rrd/snmp/nodeid directory end in ".rrd", you have configured OpenNMS to use RRDtool instead of JRobin, which has a slightly different file format (JRobin is the default since version 1.3.2). Use

rrdtool dump RRDfilename

to view collected data.

If there is still no data, check your RRD configuration in $OPENNMS_HOME/etc/opennms.properties. If there is a line like

org.opennms.rrd.storeByGroup=false

then there should be a *.rrd or *.jrb file for every MIB variable you collect. If the line looks like this

org.opennms.rrd.storeByGroup=true

different MIB variables are written to a common file. If you add new MIB values to collect, you have to delete this file so that OpenNMS creates a new one that includes the new values. You will lose all data that is in this file!
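To find out quickly which of the two modes an installation is in, the property can be read with a few lines of script. This sketch parses the properties text directly and returns None when the key is absent, since the default in that case is not stated here; the function name is hypothetical.

```python
def store_by_group(properties_text):
    """Return True/False for org.opennms.rrd.storeByGroup, or None if not set."""
    value = None
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "org.opennms.rrd.storeByGroup":
            value = val.strip().lower() == "true"  # last occurrence wins
    return value


sample = "# RRD settings\norg.opennms.rrd.storeByGroup=true\n"
print(store_by_group(sample))
```

On a live system the text would come from reading $OPENNMS_HOME/etc/opennms.properties.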

Conclusion

SNMP Data Collection in OpenNMS is one of the more difficult things to set up. Once configured, however, the process can be completely automatic. It is hoped that this How-To has proved useful. Please direct corrections and comments to the author.

What Now?