User talk:Ski98033

From OpenNMS
⚠
This page is obsolete. Please see Mailing lists instead.

Newbie view of how OpenNMS works

  1. Once a day (or when OpenNMS is restarted), it discovers (discovery process) nodes on the defined networks by pinging them.
  2. For each possible node found by discovery, it checks the capabilities of the device (capsd process), e.g. web service or SNMP.
  3. It then starts to monitor the capabilities (poller process) it found on the node by checking every 5 minutes. If it gets a failure, it checks every 30 seconds for 5 minutes and then starts checking at longer intervals until the capability is back up. Please note that we do NOT want to check SNMP via the poller process. We do it in the next step.
  4. Performance data (as opposed to up/down data) is collected via SNMP (collectd process) and is independent of up/down data.
  5. Events are generated based on up/down status and on performance data thresholds (e.g. a web page should be returned in less than 1 second).
  6. Events can then generate notifications to users and groups depending on the event severity and what generated the event.
  7. Event severities are:
    • Critical (red): This event means numerous devices on the network are affected by the event. Everyone who can should stop what they are doing and focus on fixing the problem.
    • Major (orange): A device is completely down or in danger of going down. Attention needs to be paid to this problem immediately.
    • Minor (yellow): A part of a device (a service, an interface, a power supply, etc.) has stopped functioning. The device needs attention.
    • Warning (cyan): An event has occurred that may require action. This severity can also be used to indicate a condition that should be noted (logged) but does not require direct action.
    • Normal (green): Informational message. No action required.
    • Cleared (white): This event indicates that a prior error condition has been corrected and service is restored.
    • Indeterminate (light blue): The severity of the event cannot be determined.
  8. Events can be made to send a notification by email, pager, Jabber, SMS, etc.
    • Notifications are grouped by type (node up, node down, service up, service down, etc.) and by IP address. This means it is somewhat difficult to filter on Windows vs Unix machines (at least until we get SNMP running and can key off an OID specific to Windows or Unix). It is easier to key off the network equipment, as it is on a separate subnet (10.254.254.*).
    • To avoid service flapping (e.g. the service going down and then back up 30 seconds later), I set a delay of 1 minute on all notifications, so if OpenNMS gets a down and an up within a minute it will not send out a notification (there will still be events in the event log, so you can track these).
    • Notifications can be sent to a group or a specific user.
    • To avoid two notifications when a service comes back up, make sure you turn off the service regained notification.
  9. The server generates a very large number of events (hundreds of traps from the switches alone). Whenever the database starts running slowly, I run a script, /usr/share/opennms/bin/archive_events.sh, to generate an archive file of events and keep the number of events held online down. As we tighten up our network, we should be able to keep more events online in the database.
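
The one-minute notification delay from step 8 can be illustrated with a small sketch. This is not how OpenNMS implements it internally; it only models the rule described above: a down event is notified unless an up event for the same service arrives within the delay. The `Event` type and the service key format are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

DELAY_SECONDS = 60  # the 1-minute delay described in step 8


@dataclass
class Event:
    time: float    # seconds on any common clock
    service: str   # hypothetical key, e.g. "10.254.254.1/HTTP"
    status: str    # "down" or "up"


def notifications_to_send(events, delay=DELAY_SECONDS):
    """Return the down events that would still notify after the delay."""
    ordered = sorted(events, key=lambda e: e.time)
    notify = []
    for i, ev in enumerate(ordered):
        if ev.status != "down":
            continue
        # A matching "up" inside the delay window cancels the notification.
        recovered = any(
            later.service == ev.service
            and later.status == "up"
            and later.time - ev.time <= delay
            for later in ordered[i + 1:]
        )
        if not recovered:
            notify.append(ev)
    return notify
```

Both the down and the up event remain in the input list in this model, which mirrors the point that flapping is still visible in the event log even when no notification goes out.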

Notes

  • After a node is discovered and its capabilities determined, you may decide not to monitor some of the services (e.g. ssh) by going to the admin page, Manage/Unmanage interfaces and services.
  • You cannot delete the last person or group from a destination path for a notice via the GUI. You need to edit the /etc/opennms/destinationPaths.xml file.
  • The best way to edit OpenNMS config files is to use an XML editor such as Exchanger_XML_Lite. This is because OpenNMS saves the files as one big blob that is very hard to read. Remember to restart OpenNMS after editing a file so it will re-read the configuration.
  • To monitor a specific TCP service do the same as above, but use the tcp protocol when setting up the poller.
  • To monitor a more complex service, use the general purpose poller. This poller runs a script on the OpenNMS server (the script gets the IP address for the service and a timeout). OpenNMS then checks the result of the script to determine up/down for the service. See CheckWebSite.pl for an example.
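
As a rough illustration of the general purpose poller contract in the last note, here is a minimal stand-in script. It is not CheckWebSite.pl. The `--hostname` and `--timeout` option names, the timeout being in seconds, and the use of a first extra argument as a port are all assumptions that should be checked against your OpenNMS version. The only detail taken from this page is that the script prints a string ("Success") which is then compared with the configured banner.

```python
import argparse
import socket
import sys


def parse_args(argv):
    """Parse the options the poller is assumed to pass to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--hostname", required=True)  # IP address of the service
    parser.add_argument("--timeout", type=float, default=3.0)  # unit assumed: seconds
    parser.add_argument("extra", nargs="*")  # anything from the "args" property
    return parser.parse_args(argv)


def probe(host, port, timeout_s):
    """Try a TCP connect and return the text the banner is matched against."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "Success"
    except OSError as exc:
        return f"Failure: {exc}"


def main(argv):
    args = parse_args(argv)
    port = int(args.extra[0]) if args.extra else 80  # hypothetical convention
    print(probe(args.hostname, port, args.timeout))
    return 0


# Only act as a command-line check when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A manual run might look like `python check_tcp.py --hostname 10.254.254.1 --timeout 3 8100`, where the script name and address are placeholders.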

Web Server Monitoring

  • To monitor a specific web server do the following:
    1. Get the CheckWebSite.pl script
    2. Log on to opennms
    3. Start up your xml editor
    4. Edit /etc/opennms/capsd-configuration.xml file.
      1. Copy and paste the contents for the plugin for HTTP-www.nsd.org (see config section)
      2. Rename the pasted one to HTTP-<your service name>.
      3. Edit the args property to have the port:path and the string to search for in the web page. Replace all spaces with %20. Examples are:
        • "/ District%20-%20NSD%20Web"
        • "/wiki/TechWiki NSD%20Technology%20Doc%20Wiki"
        • ":8100 nsd.org%20Email"
    5. Edit /etc/opennms/poller-configuration.xml file.
      1. Copy and paste the contents for the plugin for HTTP-www.nsd.org.
      2. Rename the pasted one to HTTP-<your service name>.
      3. Edit the args property to have the port:path and the string to search for in the web page. Replace all spaces with %20. Examples are:
        • "/ District%20-%20NSD%20Web"
        • "/wiki/TechWiki NSD%20Technology%20Doc%20Wiki"
        • ":8100 nsd.org%20Email"
      4. Copy and paste the monitor line for HTTP-www.nsd.org at the bottom of the file
      5. Rename the pasted monitor line to HTTP-<your service name>
    6. Restart opennms (/etc/init.d/opennms restart)
    7. Either wait 24 hours or do a rescan on the server with that service running.
  • To monitor an HTTPS service, follow the above directions, but use CheckSecureWebSite.pl.
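
The args values used in the steps above follow a simple pattern: an optional port and path, a space, then the search string with every space replaced by %20. The helper below is a hypothetical convenience for building such a value; `build_args` and its parameters are not part of OpenNMS or CheckWebSite.pl, and the way a port and a path combine when both are present is an assumption.

```python
def build_args(search_text, path="", port=None):
    """Build an args value such as '/ District%20-%20NSD%20Web'."""
    location = (f":{port}" if port is not None else "") + path
    if not location:
        location = "/"  # assume the site root when nothing is given
    # The separator between location and search string is the only literal
    # space allowed, so spaces inside either part become %20.
    location = location.replace(" ", "%20")
    return f"{location} {search_text.replace(' ', '%20')}"


print(build_args("District - NSD Web", path="/"))
print(build_args("NSD Technology Doc Wiki", path="/wiki/TechWiki"))
print(build_args("nsd.org Email", port=8100))
```

The three calls reproduce the three example values listed in the steps.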


Config

Capsd is Deprecated

This rather old page instructs the user to make changes to the capsd-configuration.xml file. In OpenNMS 1.12.x, making these changes will have no effect. The same goals can be accomplished using Provisiond, which is configurable from the web UI without requiring a restart.

capsd:

	<ns179:protocol-plugin protocol="HTTP-www.nsd.org" class-name="org.opennms.netmgt.capsd.GpPlugin" scan="on" user-defined="true">
		<ns179:property key="script" value="/opt/opennms/scripts/CheckWebSite.pl"/>
		<ns179:property key="banner" value="Success"/>
		<ns179:property key="args" value="/ District%20-%20NSD%20Web"/>
		<ns179:property key="timeout" value="3000"/>
		<ns179:property key="retry" value="1"/>
	</ns179:protocol-plugin>

poller:

	<ns178:service name="HTTP-www.nsd.org" interval="300000" user-defined="true" status="on">
		<ns178:parameter key="script" value="/opt/opennms/scripts/CheckWebSite.pl"/>
		<ns178:parameter key="timeout" value="3000"/>
		<ns178:parameter key="banner" value="Success"/>
		<ns178:parameter key="retry" value="1"/>
		<ns178:parameter key="args" value="/ District%20-%20NSD%20Web"/>
		<ns178:parameter key="rrd-repository" value="/var/lib/opennms/rrd/response"/>
		<ns178:parameter key="ds-name" value="http-www.nsd.org"/>
	</ns178:service>

	<ns178:monitor service="HTTP-www.nsd.org" class-name="org.opennms.netmgt.poller.monitors.GpMonitor"/>

Scripts

Public Only Access to Custom Reports

I needed to allow public access to the custom KSC reports, but not to anything else in OpenNMS. I did this as follows:

  • Modify magic-users.properties to add in a special user for KSC reports
   users=rtc, ksc
   user.ksc.username=ksc
   user.ksc.password=ksc
  • Modify the custom_view.jsp to uniquely identify the view page using the location parameter (in the call to header.jsp) and to get the remote user logged in (just before the html tag):
   <% String user = request.getRemoteUser(); %>
   <jsp:param name="location" value="KSC and Node Reports View" />
  • Modify the custom_view.jsp to hide buttons that I did not want the public user to see (e.g. they can only resize the graph time scale; they cannot change graph types, exit, or customize the reports). Essentially, I just wrap the items I do not want the ksc user to access in an if statement like so:
   <% if( ! user.equals("ksc")) { %>
       <input type="button" value="Exit Report Viewer" onclick="exitReport()">
   <% } %>
  • Now that the custom_view page is uniquely identified, I need to create an error page for when the ksc user tries to go where they shouldn't. I copied unknownexception.jsp to autherror.jsp and changed the text. The only trick is to set the location as was done in custom_view.jsp so the ksc user has access to the error page.
  • Now, to tie everything together, modify the header.jsp file, as it is included on all the pages (at least I hope so). First get the logged-in user and make sure the location parameter has a value (the location is an optional parameter).
   String user = request.getRemoteUser();
   if( location == null ) {
       location = " ";
   }
  • Then insert the following just before the header comment line. This redirects the ksc user to the autherror.jsp page unless they are accessing the report page they are allowed to:
   <% if (user.equals("ksc") && (! location.equals("KSC and Node Reports View"))) { %>
   <jsp:forward page="/errors/autherror.jsp" />
   <% } else { %>
  • Finally hide the top and bottom button bars from the ksc user as they should not be wandering around OpenNMS:
   <% if( ! user.equals("ksc")) { %>
           
           <a href="index.jsp">Home</a>
           <% for( int i = 0; i < breadcrumbs.length; i++ ) { %> > <%=breadcrumbs[i]%> <% } %>
           
   <%  } %>
   <%-- Node List --%>
   <% if( ! user.equals("ksc")) { %>
   ... all the button bar nodes listed here ...
   <% } %>
   
  • Do the same last step for footer.jsp to hide the bottom button bar.
  • The final step is to create a simple web page that tells the user the login name and password and has links directly to the reports (I could not figure out a way to do this without Tomcat/OpenNMS asking for a password). This is similar to the demo pages you see on the internet.
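
The access rule that the header.jsp changes implement can be summarized in a few lines. The sketch below restates the logic in Python purely for illustration; the real checks live in the JSP scriptlets shown above, and the function names here are hypothetical.

```python
RESTRICTED_USER = "ksc"
# custom_view.jsp and autherror.jsp both set this location value.
ALLOWED_LOCATION = "KSC and Node Reports View"


def normalize_location(location):
    """Mirror header.jsp: a missing location parameter becomes a single space."""
    return " " if location is None else location


def must_forward_to_error(user, location):
    """True when header.jsp would forward the request to autherror.jsp."""
    return user == RESTRICTED_USER and normalize_location(location) != ALLOWED_LOCATION


def show_button_bars(user):
    """The top and bottom button bars are hidden only for the ksc user."""
    return user != RESTRICTED_USER
```

Under this rule, any page that does not pass the location parameter is blocked for the ksc user by default, which is why the error page has to set the same location value as the report view.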