FAQ-Configuration

Q: What monitors are currently shipped with OpenNMS?

Here is a list of the monitors.

Q: How Do I Get OpenNMS to Collect Data from All SNMP Interfaces?

A: OpenNMS has the concept of primary and secondary SNMP interfaces. These interfaces must by definition have an IP address. By default the interface with the lowest IP address is designated as primary, and becomes the interface on which all SNMP data collection is performed for the node.

By default, performance data is collected only for primary and secondary SNMP interfaces. You can choose other interfaces (particularly non-IP ones) whose data should be collected from the web UI by selecting "Configure SNMP data collection per interface". If you would like to collect data on all of the interfaces on the node, you need to make a change to the /opt/OpenNMS/etc/datacollection-config.xml file:

Change:

snmpStorageFlag = "select"

to

snmpStorageFlag = "all"

...and restart OpenNMS.

Note: Depending on the network, this will result in far more RRD files in /var/opennms/rrd (default). If you are low on disk space or if your server's disk subsystem is not very fast, you may want to reconsider this option.
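If you would rather script the change than edit the file by hand, here is a minimal Python sketch of the substitution. It is only an illustration: the function name set_storage_flag and the sample fragment are made up, and it works on a text string, leaving the reading and writing of the real file (and the restart) to you.

```python
import re

def set_storage_flag(config_text, new_value="all"):
    """Return config_text with every snmpStorageFlag value replaced."""
    # Allow optional spaces around "=", as in the lines shown above.
    pattern = re.compile(r'(snmpStorageFlag\s*=\s*")[^"]*(")')
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), config_text)

# Hypothetical fragment; a real datacollection-config.xml contains much more.
sample = '<snmp-collection name="default" snmpStorageFlag="select">'
updated = set_storage_flag(sample)
print(updated)
```

Keep a backup copy of the original file before writing any modified text back.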

Q: How Can I Change the Size of the RRD Files for Data Collection?

A: While this is not meant to be a tutorial on RRD, here is some information on how the data is stored in OpenNMS.

When a collection is first started, RRD reserves disk space to store all of the values it plans to collect. This has the benefit that the RRD file will never grow beyond its initial size, but the downside is that if you are collecting over a long period of time, this file will be large.

In the datacollection-config.xml file there is the following set of statements (in 0.9.2):

<step>300</step>
<rra>RRA:AVERAGE:0.5:1:8928</rra> 
<rra>RRA:AVERAGE:0.5:12:8784</rra>
<rra>RRA:MIN:0.5:1:8928</rra>
<rra>RRA:MIN:0.5:12:8784</rra>
<rra>RRA:MAX:0.5:1:8928</rra>
<rra>RRA:MAX:0.5:12:8784</rra>

The step defines the "unit" of collection in seconds: 300 secs or 5 minutes.

The <rra> tags define how the data will be stored. The RRA part stands for "Round Robin Archive". The next field states whether what will be stored is the Average, the Min or the Max of the samples collected. The 0.5 is the RRD "xfiles factor": the fraction of a consolidation period that may be unknown while the consolidated value still counts as known. It is best left at its default. The last two fields state how many steps are consolidated into each stored value and how many such values are kept.

Suppose I am polling once every minute. With a step size of 300 seconds, I will have five values per step. The tag:

<rra>RRA:AVERAGE:0.5:1:8928</rra>

Says: Store the average of those five samples in the RRD file, and do this 8928 times. This is equal to 31 days of 5 minute samples.

Then the tag:

<rra>RRA:AVERAGE:0.5:12:8784</rra>

Says: After 31 days, store the average of 12 steps (or 12 five minute samples: an hour) 8784 times. This is equal to 366 days of 1 hour samples.

Now, since our default polling interval is one value every 5 minutes, the Min, Max and Average of a single step will be the same. You can get rid of the one-step MIN and MAX archives and save a bunch of disk space (and this will be the default in future releases):

<rra>RRA:AVERAGE:0.5:1:8928</rra>
<rra>RRA:AVERAGE:0.5:12:8784</rra>
<rra>RRA:MIN:0.5:12:8784</rra>
<rra>RRA:MAX:0.5:12:8784</rra>

Note: If you poll more frequently than the step size, you may want to keep MIN and MAX.

Plus, if you didn't need a year's worth of data, you could change that as well.

For daily samples (one value per day) for xxx days, you would use:

<rra>RRA:AVERAGE:0.5:288:xxx</rra>

etc.
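To check the arithmetic for your own values, the calculation above can be sketched in a few lines of Python. This is just a helper for illustration (the function name rra_retention is made up); it only multiplies the fields out and does not touch any RRD file.

```python
def rra_retention(step_seconds, rra):
    """Return (seconds per stored row, total seconds covered) for one RRA string."""
    # Format: RRA:<function>:<xff>:<steps per row>:<rows>
    _, _function, _xff, steps, rows = rra.split(":")
    row_seconds = step_seconds * int(steps)
    return row_seconds, row_seconds * int(rows)

STEP = 300  # seconds, from <step>300</step>
for rra in ("RRA:AVERAGE:0.5:1:8928", "RRA:AVERAGE:0.5:12:8784"):
    row_seconds, total_seconds = rra_retention(STEP, rra)
    print(rra, "->", row_seconds, "s per row,", total_seconds / 86400.0, "days")
```

With the default step this gives 31 days for the first archive and 366 days for the second, matching the figures above.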

Q: How do I use the new Java Mail API for notifications and Availability reports?

A: See Notification enhancement.

Q: How can I remove (purge) old events?

A: This message discusses the history of this query.

DELETE FROM events WHERE NOT EXISTS (
    SELECT svclosteventid
    FROM outages
    WHERE svclosteventid = events.eventid
  UNION
    SELECT svcregainedeventid
    FROM outages
    WHERE svcregainedeventid = events.eventid
  UNION
    SELECT eventid
    FROM notifications
    WHERE eventid = events.eventid
);
 

As of 1.1.4, the vacuumd process will do most of this work automatically. Here is the statement that vacuumd executes (from HEAD as of 2006-03-16):

<statement>
  <!-- this deletes any events that are not associated with outages
       - Thanks to Chris Fedde for this -->
  DELETE FROM events WHERE NOT EXISTS (
    SELECT svclosteventid
      FROM outages
      WHERE svclosteventid = events.eventid  
    UNION 
      SELECT svcregainedeventid
      FROM outages
      WHERE svcregainedeventid = events.eventid 
    UNION 
      SELECT eventid
      FROM notifications
      WHERE eventid = events.eventid
  )
  AND eventtime < now() - interval '6 weeks';
</statement>

Q: I have an absurd number of open notices outstanding - how can I mass-acknowledge them all?

A: Extracted from http://marc.theaimsgroup.com/?l=opennms-discuss&m=114083961321614&w=2

The fastest method is to use psql and issue:

    UPDATE notifications 
       SET respondtime='now', 
           ANSWEREDBY='admin' 
     WHERE respondtime IS NULL;

like this (I prefer using the name sql):


### Replace opennms with the PostgreSQL user you created for the opennms database, if it differs.
### Replace sql with your username if you like; I prefer sql for scripted acknowledgements.
psql -U opennms -c "UPDATE notifications SET respondtime='now', ANSWEREDBY='sql' WHERE respondtime IS NULL;"

Replace 'admin' with your username if appropriate.

Q: How does SNMP Data Collection Work?

A: The purpose of this note is to explain in detail how OpenNMS performs SNMP data collection. Knowing how this works is key to troubleshooting SNMP data collection problems.

Discovery

The discovery process in OpenNMS is really simple. We send out a "ping" to see whether an IP address exists and is responsive (the IP addresses we try are set in the discovery-configuration.xml file). When an address responds, a NewSuspect event is generated. It is also possible to use the send-event.pl script to generate NewSuspect events, bypassing discovery altogether.

Service Detection

The Provisiond process uses detectors to scan IP addresses for particular capabilities. Each service that can be detected (and later monitored) has a detector configured in the appropriate foreign-source definition. Upon receipt of a newSuspect event, Provisiond invokes each configured detector to see if the named service exists on any interface of the new node.

When testing SNMP, Provisiond makes an attempt to retrieve the sysObjectID for the device using the community string and port defined in snmp-config.xml. Note that it takes the first valid match in snmp-config.xml for that IP address, something to look for if the address is included in multiple ranges.
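The "first valid match" rule is easy to trip over, so here is a small Python sketch of it. It is a simplified model, not Provisiond's code: the ranges, community strings and the function name first_matching_community are all hypothetical, and a real snmp-config.xml carries more settings than a community string.

```python
import ipaddress

def first_matching_community(ip, definitions, default=None):
    """Return the community of the first definition whose range contains ip."""
    addr = ipaddress.ip_address(ip)
    for begin, end, community in definitions:
        if ipaddress.ip_address(begin) <= addr <= ipaddress.ip_address(end):
            return community  # first match wins; later ranges are never consulted
    return default

# Hypothetical ranges, in file order; 10.0.0.5 falls inside both.
definitions = [
    ("10.0.0.1", "10.0.0.254", "first-string"),
    ("10.0.0.1", "10.0.0.10", "second-string"),
]
print(first_matching_community("10.0.0.5", definitions))
```

Here the narrower second range is never used for 10.0.0.5, which is exactly the situation to look for when an address is included in multiple ranges.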

If the sysObjectID is successfully retrieved, Provisiond gathers additional SNMP attributes from the system group, the ipAddressTable (if present), ipAddrTable (if ipAddressTable is not present), ifTable, and ifXTable.

If the ipAddressTable (or ipAddrTable) or ifTable are unavailable, the scan aborts (but the SNMP system data may show up on the node page - this happens frequently with Net-SNMP agents where only the system tree is available by default to a query using the "public" community string).

Second, all of the sub-target IP addresses in the ipAddressTable or ipAddrTable have all the configured service detectors run against them.

Third, every IP address in the ipAddressTable or ipAddrTable that supports SNMP is tested to see if it maps to a valid ifIndex in the ifTable. Each one that does is marked as a secondary SNMP interface and is a contender for becoming the primary SNMP interface.

Finally, all secondary SNMP interfaces are tested to see if they match a valid package in the collectd-configuration file. If more than one valid IP address meets all three criteria (supports SNMP, has a valid ifIndex and is included in a collection package), then the lowest-numbered IP address is marked as primary. All SNMP data collection is performed via the primary SNMP interface.
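The selection rule can be summed up in a short Python sketch. This is an illustration of the three criteria described above, not the actual Provisiond logic; the dictionary keys and the function name choose_primary are made up for the example.

```python
import ipaddress

def choose_primary(candidates):
    """Pick the lowest-numbered IP that meets all three criteria, or None."""
    eligible = [
        c["ip"] for c in candidates
        if c["supports_snmp"] and c["has_ifindex"] and c["in_package"]
    ]
    if not eligible:
        return None
    # Compare numerically; a plain string sort would put 10.0.0.10 before 10.0.0.2.
    return min(eligible, key=ipaddress.ip_address)

# Hypothetical interfaces on one node.
candidates = [
    {"ip": "10.0.0.10", "supports_snmp": True, "has_ifindex": True, "in_package": True},
    {"ip": "10.0.0.2", "supports_snmp": True, "has_ifindex": True, "in_package": True},
    {"ip": "10.0.0.1", "supports_snmp": True, "has_ifindex": False, "in_package": True},
]
print(choose_primary(candidates))
```

In this example 10.0.0.1 is the lowest address but has no valid ifIndex, so 10.0.0.2 would become primary.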

(Note: in the future we will have the ability to change to a secondary SNMP interface should the primary become unavailable).

When the Provisiond node scan and service detectors are completed, events are generated, including nodeGainedService events.

Collection

Data collection is handled via the collectd process. collectd listens for nodeGainedService events for the SNMP "service". When this happens, it checks to see if the primary SNMP interface for that node exists in a collection package (which it should by definition). If so, the SNMP collector is instantiated for that IP address.

The SNMP collector in each collectd package will have a parameter key called "collection" with a value that points to an snmp-collection defined in the datacollection-config.xml file. It is beyond the scope of this note to describe in detail this file, but I will hit on the highlights.

The datacollection-config.xml file:

  1. Determines if SNMP data will be collected on "all" interfaces or just the "primary" via the snmpStorageFlag. Note that all SNMP requests will still be sent via the primary SNMP interface's IP address.
  2. Is fed by "tributary" files found in the etc/datacollection directory
  3. Determines the structure of the RRD files that will be produced.
  4. Matches "systems" defined by their sysObjectID values to "groups" which define which SNMP OIDs will be collected upon.

Once the OIDs that will be collected are determined, SNMP data collection should start and files will be created in the rrdRepository, which by default is /var/opennms/rrd/snmp.

Under this directory, a sub-directory will be created by nodeid. Thus information for node number "3" will be in the /var/opennms/rrd/snmp/3 directory.

For each interface on the node, another sub-directory will be created labeled as the ifDescr plus the physical (MAC) address of the interface (to separate two interfaces with the same ifDescr).

All "node" level information (where ifType="ignore" in the data collection configuration) is stored in the node subdirectory. All interface level information (such as ifInOctets, ifOutErrors, etc.) is stored in the subdirectory for the particular interface. Data for generic indexed resources is stored in subdirectories of subdirectories which are named for the resource's instance identifier and the resource-type name, respectively.

The RRD files are in the format of alias.rrd, where the "alias" is defined in the data collection configuration. These files can be deleted at any time and will be recreated as needed (of course, the data in the deleted files is lost).

Reporting

On the OpenNMS home page there is a Performance Report pull down menu, where every node that is collecting SNMP information is listed. This is determined in two steps:

  1. The snmp-graph.properties file defines a set of "standard" reports (it can be modified to include your own). For each report, the report "columns" are defined.
  2. The WebUI searches through the /var/opennms/rrd/snmp directory tree to see if RRD files exist that match all of the columns for one or more reports. If so, that node will be displayed in the pull-down list.

So, if you want to see the "Bytes In/Out" report, your node must have an interface that contains RRDs for both ifInOctets and ifOutOctets. If the created RRDs do not match any reports, you can still run custom performance reports on the data, but the node will not be accessible from the main page pull down menu.
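The second step can be pictured with a few lines of Python. This is only a sketch of the matching idea, assuming a report is offered when every one of its columns has an RRD file of the same name; the report names, the file listing and the function name available_reports are hypothetical.

```python
def available_reports(report_columns, rrd_files):
    """Return names of reports whose columns all have a matching .rrd file."""
    present = {name[:-4] for name in rrd_files if name.endswith(".rrd")}
    return sorted(
        report for report, columns in report_columns.items()
        if set(columns) <= present
    )

# Hypothetical report definitions and directory listing for one interface.
report_columns = {
    "bytes-in-out": ["ifInOctets", "ifOutOctets"],
    "errors-in-out": ["ifInErrors", "ifOutErrors"],
}
rrd_files = ["ifInOctets.rrd", "ifOutOctets.rrd", "ifInErrors.rrd"]
print(available_reports(report_columns, rrd_files))
```

The errors report is left out here because ifOutErrors.rrd is missing, even though ifInErrors.rrd exists.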

Notes:

  1. The columns must match the RRD file names (which are defined in data collection); in the report definitions they are referred to by the special keywords {rrd1}, {rrd2}, etc.
  2. If you want to understand how to define reports, the rrdgraph and rrdgraph_data reference pages and related pages are a must-see.

Thresholding

The threshd process is in charge of thresholding. Like collectd, it listens for nodeGainedService events. If the primary IP address for a node that gains SNMP as a service is in a thresholding package, threshd will then search the RRD repository directory for that node (or its interfaces) to see if any RRDs exist in the form ds-name.rrd, where ds-name is defined in thresholds.xml.

If so, the process will scan the RRD to see if a threshold has been exceeded and generate events accordingly.

Troubleshooting

The whole purpose of this note was to aid in troubleshooting SNMP data collection problems.

If a node supports SNMP (as verified by an "snmpwalk") but no SNMP information shows up on the node page in the WebUI, check the snmp-config.xml file to ensure that the proper community name is configured (and, as above, ensure that a given address is not included in multiple ranges, as only the first match will be used).

The next thing to check is the provisiond.log file. If this is a new installation, look to see where Provisiond scanned that device. If it is an old installation, you can force a rescan from the node page, and this should create new logs.

Look to see that the SNMP service was detected for that IP address. If not, check the SNMP community name once again. Play with it until a rescan does lead to its detection.

If you have gotten this far, then SNMP information from the system tree should show up on the node page.

The next error to look for will be something like

Aborting node scan : Agent timed out while scanning the system table

Or

Aborting node scan : Agent timed out while scanning the IP address tables

This would indicate that something is wrong as we try to get the ipAddrTable and ifTable information.

Two things to try here:

  1. Run "snmpwalk -c community_name ipaddress". This should walk the entire SNMP MIB for that device. Some UCD SNMP agents by default will only return the system tree.
  2. Try forcing the version to version 1 in snmp-config.xml and doing a rescan on the node. The ifTable and ipAddrTable can be large, and thus benefit from using the SNMPv2 GET-BULK command. However, we have seen on at least one device that something gets fragmented with the command and we never get to see the tables. If this happens and is fixed by setting the version to 1, please, please, please report it and if possible get a tcpdump of the SNMP packets sent during the Provisiond scan. Note that the snmpwalk command from the command line uses SNMPGET from version 1 and will not reproduce a problem with version 2.

If you have a valid ifIndex (it will be displayed on the interface page of the WebUI), then you should be able to collect SNMP information. Check the database:

  1. Run "psql -U opennms opennms".
  2. At the command prompt, run "select * from ipinterface where nodeid=x;", where "x" is the node's ID number.
  3. Check to see if at least one interface is marked as primary (P).
  4. To exit, type "\q".

If no IP addresses are listed as primary, check your collectd configuration file to ensure that at least one IP address that supports SNMP is included in a package. Correct the omission and rescan the node.

Up to this point, you should be checking the provisiond.log for errors. For the next steps, start looking at collectd.log.

Looking at collectd.log for the primary interface of your node, you should see attempts being made to collect via that interface. While the datacollection-config.xml file controls data collection, by default any sysObjectID that starts with ".1.3.6.1.4.1", which is to my knowledge all of them, will match the mib-2 group, which collects ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, and ifOutDiscards. See if there are any useful log messages (such as timeouts) that can give you a clue.

Finally, look in the /var/opennms/rrd/snmp/nodeid directory where nodeid is the node ID number for the device you are interested in. You should see RRD files being updated, and you can use

rrdtool dump RRDfilename

to see if the RRD actually contains data.

Q: How Do I Create Custom Reports in OpenNMS?

A: See SNMP Reports How-To.

Q: How Can I Speed Up the service detection Process?

A: There are a few things you can do to speed up service detection.

First, note that the "discovery.log" only reflects those things that have responded to "ping". This will generate newSuspect events, which are then received by the Provisioning Daemon (Provisiond).

Service detectors for Provisiond are configured in foreign-source definitions. For nodes added via discovery, Provisiond uses the default foreign-source which you can edit from the web by going to Admin / Manage Provisioning Requisitions and clicking the Edit Default Foreign Source Definition button at the top of the page. This same default foreign-source is used for nodes in requisitions whose corresponding foreign-source definition has not been customized.

The pristine default foreign-source definition probably contains detectors for services that you don't care about. You can remove any of these that you know are not useful to you.

Also, if you have discrete firewalls or are using portsentry or iptables in your network, and your rules tend toward silently dropping TCP connections rather than sending RST, this will slow things down as well. The main delay is in the timeouts detecting services that aren't there.

Note that Provisiond will create a node before all service detection is done, so if you're not seeing services that you expect to be present, wait a few moments and refresh the node details page.

Q: How Can I make OpenNMS show all times in my local timezone?

A: See OpenNMS_time_settings.

Q: What are the possible parameters in events and notifications?

A: See event substitutions.

Q: What does the event mask mean?

A: When an event gets into the OpenNMS event subsystem, it typically contains only information specific to that occurrence of the event. Other information for the event (severity, description, autoaction, etc.) is picked up from the eventconf.xml.

To pick up relevant information, the UEI (Universal Event Identifier) is used by default. However, this does not work in some cases, such as SNMP traps, which get converted into events and are sent into the event subsystem. Event masks are used to pick up the relevant information for traps and other similar events.

The event mask for a trap would be set up as follows (for example, for the warm start trap):

<event>
  <mask>
    <maskelement>
      <mename>id</mename>
      <mevalue>.1.3.6.1.6.3.1.1.5.2</mevalue>
    </maskelement>
  </mask>
  <uei>uei.opennms.com/traps/SNMP_Warm_Start</uei>
  <descr>
    A warmStart trap signifies that the sending protocol
    entity is reinitializing itself such that neither the
    agent configuration nor the protocol entity
    implementation is altered.
  </descr>
  <logmsg dest="logndisplay">
    Agent Up with No Changes (warmStart Trap) enterprise:%id%
    (%id%) args(%parm[##]%):%parm[all]%
  </logmsg>
  <severity>Normal</severity>
</event>

The 'mask' contains multiple mask elements each with a name and value. The 'mename' can only be one of the following sub-elements of an event:

  • uei
  • source
  • host
  • snmphost
  • nodeid
  • interface
  • service
  • id (this is the SNMP enterprise ID)

For traps, the name would be the id (the SNMP enterprise id).

Note that the values (mevalue) can be set up to be an exact match (as is the case above) or can end with a '%', in which case the configured value just needs to be a substring of the actual value in the event. For example, if you wanted to simply ignore all extraneous events generated for interfaces in your internal network, you could use:

<event>
  <mask>
    <maskelement>
      <mename>interface</mename>
      <mevalue>192.168.0.%</mevalue>
    </maskelement>
  </mask>
  <uei>http://uei.opennms.com/events/internalnetwork</uei>
  <descr>
    An event occurred on interface %interface% on the internal
    network
  </descr>
  <logmsg dest="logndisplay">
    An event occurred on interface %interface% on the internal network
  </logmsg>
  <severity>Normal</severity>
</event>

Note the word 'extraneous': events like 'nodeGainedService' that have a matching UEI entry in eventconf will get that information. Only events that did not match any other entry will fall through to this mask.

No attempt is made to 'order' event masks for the same UEI for a match in the eventconf. A best fit from the eventconf.xml is basically the first event that fits. The ordering of the events in the eventconf.xml is the responsibility of the user. For example, if one mask for a UEI has interface and service and another mask for the same UEI has just interface, it is the user's responsibility to order them so that the one whose mask has both interface and service occurs before the one with just the interface (if that is the functionality required).
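To see why the ordering matters, here is a simplified Python model of "first event that fits". It is a sketch under stated assumptions, not the OpenNMS matcher: a trailing '%' is treated as a prefix match, the definitions are reduced to a label plus a mask, and the labels and function names are made up so the two entries can be told apart.

```python
def value_matches(configured, actual):
    """Exact match, or prefix match when the configured value ends with '%'."""
    if configured.endswith("%"):
        return actual.startswith(configured[:-1])
    return configured == actual

def first_matching_event(event, definitions):
    """Return the label of the first definition whose mask elements all fit."""
    for label, mask in definitions:
        if all(value_matches(v, event.get(name, "")) for name, v in mask.items()):
            return label
    return None

# Hypothetical definitions in file order: the more specific mask comes first.
definitions = [
    ("interface-and-service", {"interface": "192.168.0.%", "service": "ICMP"}),
    ("interface-only", {"interface": "192.168.0.%"}),
]
event = {"interface": "192.168.0.7", "service": "ICMP"}
print(first_matching_event(event, definitions))
```

If the two definitions were listed the other way round, the interface-only entry would fit first and the more specific one would never be reached.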

Q: What can I use in "filters" and "rules"?

A: See Filters.

Q: Are There Any Options I Can Use with the Java JDK?

Note: this FAQ entry is out of date. I do not believe that any of these options work today. Dgregor 19:38, 15 February 2006 (CST)

A: OK this is kind of minor... but this patch to the opennms start script (as it exists pre-build) adds the following env variables to configure garbage collecting...

USE_CONCGC 
Use Concurrent Mark Sweep GC
USE_PARALLELGC 
Use Parallel GC (multi-processor only, untested by me)
LOG_GC 
Log GC to @root.install.logs@/gc.log (might be helpful for dev)

I was adding concurrent mark sweep and decided to include the other two while I was at it. Here are descriptions of the JVM options I use. Note that the top two require Sun J2SDK 1.4.1; the bottom I believe requires 1.4.0.

-XX:+UseConcMarkSweepGC 
This flag turns on concurrent garbage collection. This collector executes mostly concurrently with the application. It trades the utilization of processing power that would otherwise be available to the application for shorter garbage collection pause times.
-XX:+UseParallelGC 
This flag enables garbage collection to occur on multiple threads for better performance on multiprocessor machines.

From [1]:

On all platforms, new option -Xloggc:file logs each garbage-collection event in the specified file.
http://java.sun.com/j2se/1.4/docs/relnotes/features.html#tools
--- /home/nick/src-orig/source/tools/run/opennms.sh     Thu Sep 12 16:12:44 2002
+++ /home/nick/opennms-1.0.1-1/source/tools/run/opennms.sh      Mon Nov 11 16:46:57 2002
@@ -83,6 +83,15 @@
 if [ -n "$USE_INCGC" -a "$USE_INCGC" = true ] ; then
        MANAGER_OPTIONS="-Xincgc $MANAGER_OPTIONS"
 fi
+if [ -n "$USE_CONCGC" -a "$USE_CONCGC" = true ] ; then
+        MANAGER_OPTIONS="-XX:+UseConcMarkSweepGC $MANAGER_OPTIONS"
+fi
+if [ -n "$USE_PARALLELGC" -a "$USE_PARALLELGC" = true ] ; then
+        MANAGER_OPTIONS="-XX:+UseParallelGC $MANAGER_OPTIONS"
+fi
+if [ -n "$LOG_GC" -a "$LOG_GC" = true ] ; then
+        MANAGER_OPTIONS="-Xloggc:@root.install.logs@/gc.log $MANAGER_OPTIONS"
+fi
 if [ -n "$HOTSPOT" -a "$HOTSPOT" = true ] ; then
        JAVA_CMD="$JAVA_CMD -server"
 fi

Q: How do I use the Alamin SMS Gateway with OpenNMS?

A: First, get and install the SMS gateway from Alamin.

Then modify your notificationCommands.xml file to read:

  <command>
     <name>mobilePhoneSMS</name>
     <execute>/usr/bin/gsgc</execute>
     <comment>for sending GSM messages (SMS)</comment>
     <argument streamed="false">
        <substitution>--send</substitution>
     </argument>
     <argument streamed="false">
        <switch>-np</switch>
     </argument>
     <argument streamed="false">
        <switch>-tm</switch>
     </argument>
  </command>

Q: How Do I Configure OpenNMS/Tomcat to Use SSL?

A: From the OpenNMS Discuss List:

First create a certificate keystore for Tomcat by executing $JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA

The password should be "changeit".

Then uncomment the "SSL HTTP/1.1 Connector" entry in $CATALINA_HOME/conf/server.xml (and move it to an obscure port if desired).

Restart httpd, tomcat and opennms just to be sure and after what seemed too long, I was prompted for user name and password. It seems only the first time connecting is real slow, subsequent logins are quite speedy. I wanted to add a few comments to this.

First, if you want to use a password other than "changeit", add the keystorePass attribute:

   <Connector className="org.apache.coyote.tomcat4.CoyoteConnector"
              port="8443" minProcessors="5" maxProcessors="75"
              enableLookups="true"
              acceptCount="100" debug="0" scheme="https" secure="true"
              useURIValidationHack="false" disableUploadTimeout="true">
     <Factory className="org.apache.coyote.tomcat4.CoyoteServerSocketFactory"
              clientAuth="false" protocol="TLS" keystorePass="notchangeit"/>
   </Connector>

Second, you will still need to have a non-SSL connection on 8080 in order for RTC to post the category information to Tomcat.

Q: How can I use OpenNMS to send pages?

A: From Rajesh Bhandari on the discuss list:

Any recommendations on how to configure text paging of notifications to use a modem and phone line to an external service i.e. Skytel?

OpenNMS already has configurations to use qpage (http://www.qpage.org). You can see them in $OPENNMS_HOME/etc/notificationCommands.xml. I set up users on onms, then setup the users (with the same name) in /etc/qpage.cf. That's pretty much all I had to do in onms.

On the qpage side, you need to configure the pager-ids and the service. The key here is that the number used to send alpha pages is not the pager number but a different number talking something called TAP. So when you look for the number for Skytel, remember to look for the TAP number.

And finally, the PC I installed onms on had a built-in modem using the Conexant/Rockwell HCF chipset. I spent a week trying to get this to work (it would dial, and then hang up, or connect and give errors) before getting an old-style external USR modem. It worked the first time! Of course, YMMV.

Q: How Do I Delete an Interface?

A: Note: From version 1.1.4 you can delete an interface from the webUI as an admin user.

To delete an interface, do the following, replacing [ipaddr] with the IP address of the interface you wish to delete:

stop opennms
psql -U opennms opennms
delete from usersnotified where notifyid in (select notifyid from notifications where eventid in (select eventid from events where ipaddr='[ipaddr]'));
delete from notifications where eventid in (select eventid from events where ipaddr='[ipaddr]');
delete from outages where svclosteventid in (select eventid from events where ipaddr='[ipaddr]');
delete from events where ipaddr='[ipaddr]';
delete from snmpinterface where ipaddr='[ipaddr]';
delete from ifservices where ipaddr='[ipaddr]';
delete from ipinterface where ipaddr='[ipaddr]';
\q

Then you may want to run:

sudo -u postgres vacuumdb opennms

before restarting OpenNMS.

Alternate Method To delete an interface

This can also be done with send-event.pl and a Python script. Put the interface and nodeid of each device in a file called my_device_list, one device per line and with as many lines as you like, in this format:

ipaddress,nodeid

To get this format you can do this in psql:

\f,
\a
\t
\o my_device_list
select ipaddr,nodeid from ipinterface where ipaddr::cidr <<='192.168.5.0/24'::cidr;
\o

Then run this Python script from the directory that contains my_device_list:

#!/usr/bin/env python
# Send delete events for every "ipaddress,nodeid" line in my_device_list.
import subprocess

# Services to remove from each interface; adjust to what you monitor.
SERVICES = ['ICMP', 'StrafePing', 'HTTP']

with open('my_device_list') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # psql wrote the fields comma-separated (\f,), so split on ','.
        dev_ip, dev_nodeid = line.split(',')
        for service in SERVICES:
            subprocess.call(['send-event.pl', 'uei.opennms.org/nodes/deleteService',
                             '--nodeid', dev_nodeid, '--interface', dev_ip,
                             '--service', service])
        subprocess.call(['send-event.pl', 'uei.opennms.org/internal/capsd/deleteInterface',
                         '--nodeid', dev_nodeid, '--interface', dev_ip])
        subprocess.call(['send-event.pl', 'uei.opennms.org/internal/capsd/deleteNode',
                         '--nodeid', dev_nodeid])

Replace the services or service string with whatever services are being monitored.

Q: Can I run OpenNMS as a non-root user?

A: Currently (as of version 1.1.3) only tomcat can run as a non-root user (with some twiddling).

The difficulty with the core of OpenNMS is that these components need to run as root to be able to bind to low-numbered ports or generate network traffic that requires root:

  • the ICMP poller (root is needed to open up a raw socket to send/receive ICMP);
  • the DHCP poller (it needs to bind to a port < 1024);
  • the SNMP trap daemon (it needs to bind to a port < 1024).

Here is an example of the error you will get if you try to startup OpenNMS as a non-root user:

2004-08-28 09:47:14,405 ERROR [main] Discovery: Failed to create ping manager: \
 java.net.SocketException: System error creating ICMP socket (1, Operation not permitted)

Q: How To Configure Thresholds Within OpenNMS?

A: Let's assume that we want to send an alert when hrSystemProcesses exceeds 250. The numeric OID is .1.3.6.1.2.1.25.1.6. In this case, the collection definition for it is already listed in my "/opt/OpenNMS/etc/datacollection-config.xml" file.

The first thing to do is edit "/opt/OpenNMS/etc/thresholds.xml". Under the "default-snmp" group, add:

<threshold type="high" ds-name="hrSystemProcesses" ds-type="node"
                value="250" rearm="200" trigger="3"/>

It means the following:

value="250"
The threshold: the maximum number of processes running on the system.
rearm="200"
Rearm value. The alert status is removed once the number of processes drops below this value.
trigger="3"
Trigger value. How many consecutive times OpenNMS has to find the value over the threshold before it raises the event.

Edit "/opt/OpenNMS/etc/threshd-configuration.xml" and ensure that the package = example1 definition is uncommented and includes your monitored node range.

Next, from the OpenNMS web console go to Admin -> Configure Notifications -> Event Notifications. Add a new event with the type of 'highThresholdExceeded'. The values to be inserted here are rather trivial.

Finally, start running loads of applications so that your hrSystemProcesses goes over 250. You can check this variable by using the snmpwalk utility from the Net-SNMP package.

snmpwalk -c community_string -v 2c 127.0.0.1 hrSystemProcesses

Substitute your community string and IP address accordingly. Check your /var/log/opennms/{threshd,notifd}.log logfiles for debugging output. Once you have managed to get in excess of 250 processes running, wait for about 15-20 minutes and you should receive an alert. Remember that the trigger value (3) multiplied by OpenNMS's default collection interval (300 seconds) means that the threshold must be exceeded for 15 minutes prior to a (low|high)ThresholdExceeded event being generated.
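The interplay of value, rearm and trigger can be sketched as a small state machine in Python. This is a model of the behaviour described above, not threshd's implementation: the strict comparisons, the "exceeded" and "rearmed" labels and the function name high_threshold_events are assumptions made for the example.

```python
def high_threshold_events(samples, value, rearm, trigger):
    """Yield (sample index, label) pairs for a high threshold with rearm and trigger."""
    exceeded_count = 0
    armed = True
    for i, sample in enumerate(samples):
        if armed:
            if sample > value:
                exceeded_count += 1
                if exceeded_count >= trigger:
                    armed = False  # stay quiet until the value drops below rearm
                    yield i, "exceeded"
            else:
                exceeded_count = 0  # the run of consecutive high samples is broken
        elif sample < rearm:
            armed = True
            exceeded_count = 0
            yield i, "rearmed"

# One hypothetical sample per 300-second collection.
samples = [260, 270, 240, 260, 265, 270, 210, 190]
print(list(high_threshold_events(samples, value=250, rearm=200, trigger=3)))
```

In this run the first two high samples do not count because the third one falls back under 250; the event only fires after three high samples in a row, and the rearm only happens once the value goes under 200.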

In certain cases where I want multiple different nodes to have different threshold values, I have found no easy way to accommodate this within OpenNMS itself. What I have done in such cases is write an external bash script which does the SNMP polling and, when a threshold is exceeded, sends an SNMP trap to OpenNMS for its event consolidation purposes.

This worked for me after a bit of trial and error, HTH.

See the Thresholding page for more information - http://www.opennms.org/wiki/Thresholding

Q: How can I remove the OpenNMS database and recreate it without reinstalling OpenNMS?

A: First make sure that there are no connections to the opennms database. This you usually achieve by stopping the OpenNMS service:

service opennms stop

Then you can use the install script to remove the database:

/opt/opennms/bin/install -dZ

Or you can drop the database with one of the PostgreSQL commands:

dropdb -U opennms -W opennms

You probably also want to remove all collected data:

rm -fr /opt/opennms/share/rrd/{response,snmp}/*

and definitions of all Provisioning Groups and Foreign Sources:

rm -fr /opt/opennms/etc/{imports,foreign-sources}/*

Then you can re-create a clean database by this command:

/opt/opennms/bin/install -dis

Finally, start the OpenNMS service:

service opennms start

Q: How do I configure Net-SNMP to work with OpenNMS?

A: By default, net-snmp is configured to allow only a very small amount of information to be accessed by the default community name of "public". The easiest way to change this is to find a line like:

view systemview included .1.3.6.1.2.1.1

and change it to:

view systemview included .1

Then restart snmpd.

  • NOTE: This appears to work with some implementations of net-snmp 5.4.1 (sunfreeware packages) without modifications.
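The same edit can be sketched as a sed filter. The function name widen_systemview is hypothetical, and the pattern assumes the view line looks like the one shown above (any amount of whitespace between fields); review the output before replacing your real snmpd.conf.

```shell
# Hypothetical helper: rewrites the restricted systemview line so that the
# view covers the whole .1 subtree. Reads stdin, writes stdout.
widen_systemview() {
  sed -E 's/^(view[[:space:]]+systemview[[:space:]]+included[[:space:]]+)\.1\.3\.6\.1\.2\.1\.1[[:space:]]*$/\1.1/'
}

# Example, assuming the configuration lives in /etc/snmp/snmpd.conf:
# widen_systemview < /etc/snmp/snmpd.conf > /tmp/snmpd.conf.new
```

Only the line ending exactly in .1.3.6.1.2.1.1 is rewritten; other view lines pass through unchanged.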

Q: How do I configure LDAP authentication with OpenNMS?

See LDAP Authentication

Q: How do I grant admin privileges to a user?

See User

Q: How can I use jabber (XMPP) notifications?

A: You'll need a Jabber (XMPP) server; jabberd2, for example, works well. You'll need an account on this server for the OpenNMS daemon and for every user who should receive XMPP notifications.

  1. Edit $OPENNMS_HOME/etc/xmpp-configuration.properties and add lines like these:
xmpp.server = xmpp.yourdomain.com
xmpp.user   = opennms-user
xmpp.pass   = opennms-password
  2. Add XMPP addresses for your users in users.xml (or via the WebUI - Home > Admin > Users and Groups > User List > Modify User). These addresses are of the format username@xmpp.yourdomain.com
  3. Create a destination path for XMPP notifications that uses the xmppMessage notification command, and configure your notices to use that destination path.

Q: How can I use jabber (XMPP) Group notifications?

  1. First follow:
    1. Jabberd2 and OpenNMS docs: http://www.opennms.org/index.php/How_to_configure_jabberd2_to_work_with_OpenNMS
    2. and Bottom of: http://www.opennms.org/index.php/FAQ-Configuration#Q:_How_can_I_use_jabber_.28XMPP.29_notifications.3F
  2. Go to Admin -> Configure Users and Groups -> Configure Users -> Add New User
    1. Give it a username, a password, and for XMPP Address the CHAT_ROOM_NAME@conference.server.ext
  3. Go to Admin -> Configure Notifications -> Configure Destination Paths -> New Path
    1. Give it a name, hit Edit, and add the username you just added (or the group if it's in a group)
  4. Go to Admin -> Configure Notifications -> Configure Event Notifications -> ... at the last step you can choose a path, select the one you created!

Q: Someone told me to use snmp4j instead of joesnmp, how do I do that?

A: If you don't already have one, create an opennms.conf file in your $OPENNMS_HOME/etc directory. Then add a line like:

ADDITIONAL_MANAGER_OPTIONS="-Dorg.opennms.snmp.strategyClass=org.opennms.netmgt.snmp.snmp4j.Snmp4JStrategy"

If you already have an opennms.conf file that sets ADDITIONAL_MANAGER_OPTIONS, just add -Dorg.opennms.snmp.strategyClass=org.opennms.netmgt.snmp.snmp4j.Snmp4JStrategy within the existing quotes. You will then have to restart OpenNMS.

snmp4j has some better log messages and handles bad SNMP agents better.

Note: This only works for OpenNMS 1.3.1 and later versions.

Q: Someone told me to configure the OpenNMS data collector to "store by group", how do I do that?

Create a file called opennms.conf in the $OPENNMS_HOME/etc directory (if you haven't already) and add this entry:

ADDITIONAL_MANAGER_OPTIONS=-Dorg.opennms.rrd.storeByGroup=true

Note: In unstable versions prior to the 1.3.2 release timeframe, you need to omit the org.opennms, so that the property you define is instead rrd.storeByGroup.
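If you want both the snmp4j strategy and store-by-group, the two properties presumably need to share one ADDITIONAL_MANAGER_OPTIONS assignment, since a second assignment of the same variable would replace the first. A sketch, assuming opennms.conf is read as plain shell variable assignments:

```shell
# $OPENNMS_HOME/etc/opennms.conf (sketch: both properties from this FAQ
# combined in one quoted, space-separated value)
ADDITIONAL_MANAGER_OPTIONS="-Dorg.opennms.snmp.strategyClass=org.opennms.netmgt.snmp.snmp4j.Snmp4JStrategy -Dorg.opennms.rrd.storeByGroup=true"
```

Restart OpenNMS after changing the file.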

Q: I don't get Availability Reports. Why isn't this working?

Check out Debugging Availabilty Reports

Q: How do I configure OpenNMS to poll my radius server?

A: Try RadiusPoller.

Q: How do I set up the PostgreSQL database so that users can access it via Microsoft Access with an ODBC connection and have read-only rights?

A: You must set up the group and create the users. Also make sure that the pg_hba.conf file has the correct security permissions.


Setup the group “access”:

psql -U opennms opennms -c "CREATE GROUP access;"

Then run psql interactively and issue these GRANT statements for the group access:

GRANT EXECUTE ON FUNCTION iplike(text, text) TO GROUP "access";
GRANT SELECT ON TABLE assets TO GROUP "access";
GRANT SELECT ON TABLE events TO GROUP "access";
GRANT SELECT ON TABLE ifservices TO GROUP "access";
GRANT SELECT ON TABLE ipinterface TO GROUP "access";
GRANT SELECT ON TABLE node TO GROUP "access";
GRANT SELECT ON TABLE notifications TO GROUP "access";
GRANT SELECT ON TABLE outages TO GROUP "access";
GRANT SELECT ON TABLE servermap TO GROUP "access";
GRANT SELECT ON TABLE service TO GROUP "access";
GRANT SELECT ON TABLE servicemap TO GROUP "access";
GRANT SELECT ON TABLE snmpinterface TO GROUP "access";
GRANT SELECT ON TABLE usersnotified TO GROUP "access";
GRANT SELECT ON TABLE vulnerabilities TO GROUP "access";
GRANT SELECT ON TABLE vulnplugins TO GROUP "access";

Then for each user run this…

CREATE USER user1 WITH PASSWORD 'userspassword' NOCREATEDB NOCREATEUSER;
ALTER GROUP access ADD USER user1 ;
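When there are several users, the two statements above can be generated in a loop. This is only a sketch: the function name emit_readonly_user_sql, the user names and the password are placeholders, and the names are assumed to be simple identifiers that need no SQL quoting.

```shell
# Hypothetical helper: prints the two SQL statements shown above for one user.
emit_readonly_user_sql() {
  user=$1
  password=$2
  printf "CREATE USER %s WITH PASSWORD '%s' NOCREATEDB NOCREATEUSER;\n" "$user" "$password"
  printf "ALTER GROUP access ADD USER %s;\n" "$user"
}

# Print the statements for a few placeholder users.
for u in user1 user2; do
  emit_readonly_user_sql "$u" "userspassword"
done

# To apply them, pipe the output into psql, for example:
# emit_readonly_user_sql user1 userspassword | psql -U opennms opennms
```

Printing the statements first lets you review them before they are run against the database.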

Q: I have a device that is spamming OpenNMS with traps that I don't care about. How do I discard these traps?

A: Find the event configuration entry (in eventconf.xml or one of the referenced files) for the event UEI that you are getting spammed with. Make a copy of the <event>...</event> configuration and add it above the one that is already there. In the <mask>...</mask> section, add a <maskelement> element to match on the troublesome device, such as by node ID or the IP address on the device that is sending the traps. For example, for node ID 123:

<mask>
  ... other mask elements ...
  <maskelement>
    <mename>nodeid</mename>
    <mevalue>123</mevalue>
  </maskelement>
</mask>

Lastly, change the "dest" attribute on the <logmsg> element for this event configuration to "discardtraps" if you are running 1.3.0 or later, or "donotpersist" otherwise. The event configuration matching is done on a first-match basis, so having an entry to match and discard these troublesome traps (or at least not persist them to the database in the case of "donotpersist") before the normal event configuration entry will allow you to throw away the traps that you don't care about, while having the same traps from other systems processed normally.

You can see more information on this at internal events in the event configuration how-to.

Q: How do I prevent unwanted nodeDown notifications for nodes behind a router or switch when the router or switch goes down?

Check out Path Outage How-To.

Q: How do I delete a node?

A: You can delete a node from the web UI as an admin user.

Go to the node page and click on "Admin" and then on "Delete Node".

Notice: Deletion runs as a scheduled process. After you click "Delete Node", the node is only marked for removal and no longer shown in the web UI. Some time later, the vacuumd daemon removes the node from the database.
Note: The collected data is not removed - you must delete the information from the /opt/opennms/share/rrd directory manually.

Be sure the node isn't in your discovery-configuration.xml anymore, or it will get rediscovered!

Q: Why are some services listed as Not Monitored?

A: Certain services are either useful primarily for data collection (e.g. SNMP), are pseudo-services (e.g. Router) that exist purely for informational purposes, or tend to be problematic, sometimes crashing or hanging devices when polled frequently (e.g. "Telnet"). These services are deliberately not monitored by the pollers in the default configuration. If you know that you definitely want to monitor such services, you may need to set the status attribute of the service definition to on and/or add a corresponding monitor definition in poller-configuration.xml.

Q: How can I monitor running processes on a client system?

A: Process Monitoring and Collection outlines the configuration modifications required to accomplish this. You need to modify the datacollection-config.xml file and then restart OpenNMS.

Q: How can I monitor an APC Ambient Temperature Sensor?

A: APC Ambient Temperature outlines the configuration steps required and also includes a graph for the sensor. A restart of OpenNMS is required after the change.

Q: How can I configure OpenNMS to use RRDTool instead of JRobin for latency and performance data?

A: Do not do this unless you fully understand the implications, which include losing all your historical latency and performance data and possibly decreased performance of OpenNMS on your system. You will need the appropriate platform-specific JNI library to allow OpenNMS to communicate with the C-language RRDTool libraries. Instructions for making this change are in the comments at the top of the file rrd-configuration.properties.

Q: How can I monitor more resources of a network appliance?

Q: Upgrading is difficult! Custom configuration must be hand-merged with upstream changes.

A: This is a known issue.