Detecting Missed Events

Tested for Versions
The instructions in this article have been tested against the following versions of OpenNMS.
Tested Against:
Version 15.0.0 tested by Unicoletti

One exciting feature of OpenNMS is that since around version 1.10 it embeds the Drools rule processing engine. Drools programs can then be used to extend the event handling logic in novel and powerful ways.

An interesting use case for Drools is detecting missed events, for example skipped backups, heartbeats, jobs or Passive Status Keeper events.

In this use case we are going to implement monitoring of a generic recurring job like a daily backup. The job at the end of its run communicates its exit code or status to OpenNMS. It can do so in various ways:

  • syslog messages (these require proper formatting and some regex-fu on the OpenNMS side). I wouldn't recommend this for cases where the message MUST be delivered reliably, because OpenNMS at the moment only supports syslog over UDP
  • native Eventd events. If you feel uncomfortable opening eventd to the public network then perhaps you can use this send-event web hook (feedback welcome)
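As an illustration of the syslog option, here is a minimal Java sketch of a job reporting its status over UDP. The bracketed token follows the format matched by the regular expressions in the Syslog section of this article. The class name, the `backup.sh` process tag, the host and the port are assumptions made for this example, and depending on the parser configured in Syslogd a timestamp and host header may also be required (not verified here).

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OpenNMS: reports a job status via syslog over UDP.
final class BackupStatusSyslog {

    /** Builds "[backup/<status>/<backupset>/every/<interval>]"; external events always carry every=every. */
    static String buildToken(boolean success, String backupset, String interval) {
        String status = success ? "success" : "failure";
        return "[backup/" + status + "/" + backupset + "/every/" + interval + "]";
    }

    /** Prefixes the token with a syslog priority: facility user (1) * 8 + severity (6 = info, 4 = warning). */
    static String buildMessage(boolean success, String backupset, String interval) {
        int priority = 1 * 8 + (success ? 6 : 4);
        return "<" + priority + ">backup.sh: " + buildToken(success, backupset, interval);
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";          // assumption: OpenNMS host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10514;  // assumption: Syslogd listen port
        byte[] payload = buildMessage(true, "nightly", "26").getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            // UDP is fire-and-forget: a lost datagram is exactly the "missed event" case.
            socket.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(host), port));
        }
    }
}
```

Because UDP gives no delivery guarantee, a dropped status message looks the same to OpenNMS as a job that never ran, which is one more reason to watch for missed events.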

Depending on the reported status OpenNMS can be configured to send a notification so that someone or some process can be activated and fix the issue. This is standard event configuration which is already covered by the documentation.

When the job fails to start or run at all, or when for whatever reason (network issue, etc.) no event is fired, OpenNMS has no way of knowing that a problem occurred, and the failure can go unnoticed until it is too late.

Fortunately the Drools Correlation Engine can be used to detect this situation.

Overview

OpenNMS must be configured to handle two custom events: one for failure and one for success. The events must be configured in their own XML file so that a success event clears a pending warning.

If using syslog as the event transport then syslogd must be configured too.

At this point we need to implement a set of Drools rules as follows:

  1. when an event comes in (be it a failure or success) and no timer exists yet for that job, the engine starts a timer for a specific amount of time. To keep things simple the timer will expire after exactly 26 hours (to allow for some variation in job duration)
  2. when an event comes in (be it a failure or success) that is associated with an existing timer, that timer must be extended by the same amount of time (26 more hours in this example)
  3. when a timer expires because 26 hours have passed without receiving any event, the engine triggers a failure event, which should open an alarm and optionally send a notification
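The three rules above can be sketched outside Drools as a small deadline table. This is only a plain-Java model of the intended behaviour, with an explicit clock so it can be reasoned about; the class and method names are hypothetical and none of this is OpenNMS or Drools API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of the three rules; time is passed in explicitly (milliseconds).
final class MissedEventWatchdog {
    private final long timeoutMillis;
    private final Map<String, Long> deadlines = new HashMap<>();

    MissedEventWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Rules 1 and 2: any event (success or failure) starts or extends the timer for its key. */
    void onEvent(String key, long nowMillis) {
        deadlines.put(key, nowMillis + timeoutMillis);
    }

    /** Rule 3: returns the keys whose timer has expired and forgets them until a new event arrives. */
    List<String> expire(long nowMillis) {
        List<String> missed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                missed.add(entry.getKey());
                it.remove();
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L;
        MissedEventWatchdog watchdog = new MissedEventWatchdog(26 * hour);
        watchdog.onEvent("node1:nightly", 0);          // first run starts the timer
        watchdog.onEvent("node1:nightly", 24 * hour);  // next run extends it to hour 50
        System.out.println(watchdog.expire(49 * hour)); // prints []
        System.out.println(watchdog.expire(51 * hour)); // prints [node1:nightly]
    }
}
```

The key combines node and job name, which mirrors how the rules later in this article correlate events by node id and tag.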

Note 1: at timer expiration the Drools engine fires a failure event, so next time the job runs it will also clear the missed execution alarm.

Note 2: this example has been kept intentionally simple; the implementation also handles custom timer durations and weekend-aware timers.

Note 3: since all Drools state is kept in memory, it will not survive an OpenNMS restart.

Events

The following is the configuration that must go into its own XML file in ${OPENNMS_HOME}/etc/events:

<?xml version="1.0" encoding="UTF-8"?>
 <events>
  <event>
   <uei>uei.opennms.org/backup/recurring/Warning</uei>
   <event-label>Backup Event: recurring backup scripts warning</event-label>
   <descr>Event generated by backup scripts.</descr>
   <logmsg dest="logndisplay">Details: %parm[all]%, backupset: %parm[backupset]%, every: %parm[every]%, interval: %parm[interval]%</logmsg>
   <severity>Warning</severity>
   <alarm-data reduction-key="%uei%:%nodeid%:%parm[backupset]%" alarm-type="1"/>
  </event>
  <event>
   <uei>uei.opennms.org/backup/recurring/Normal</uei>
   <event-label>Backup Event: recurring backup scripts normal</event-label>
   <descr>Event generated by backup scripts.</descr>
   <logmsg dest="logndisplay">Details: %parm[all]%, backupset: %parm[backupset]%, every: %parm[every]%, interval: %parm[interval]%</logmsg>
   <severity>Normal</severity>
   <alarm-data reduction-key="%uei%:%nodeid%:%parm[backupset]%" clear-key="uei.opennms.org/backup/recurring/Warning:%nodeid%:%parm[backupset]%" alarm-type="2"/>
  </event>
 </events>

and remember to import it in eventconf.xml.
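For the Normal event to clear a Warning alarm, its clear-key has to expand to exactly the same string as the reduction-key of that alarm, UEI prefix included. The following Java sketch illustrates the comparison with a simplified placeholder substitution; it is not the OpenNMS expansion code, and the class and method names are hypothetical.

```java
// Hypothetical illustration of reduction-key / clear-key matching.
final class ReductionKeySketch {

    /** Expands the three placeholders used in this article; real OpenNMS supports many more. */
    static String expand(String template, String uei, long nodeId, String backupset) {
        return template
                .replace("%uei%", uei)
                .replace("%nodeid%", Long.toString(nodeId))
                .replace("%parm[backupset]%", backupset);
    }

    /** A resolution event clears a problem alarm only when both keys are identical. */
    static boolean clears(String clearKey, String problemReductionKey) {
        return clearKey.equals(problemReductionKey);
    }

    public static void main(String[] args) {
        String warningKey = expand("%uei%:%nodeid%:%parm[backupset]%",
                "uei.opennms.org/backup/recurring/Warning", 42, "nightly");
        String clearKey = expand("uei.opennms.org/backup/recurring/Warning:%nodeid%:%parm[backupset]%",
                "uei.opennms.org/backup/recurring/Normal", 42, "nightly");
        System.out.println(warningKey);                   // uei.opennms.org/backup/recurring/Warning:42:nightly
        System.out.println(clears(clearKey, warningKey)); // true
    }
}
```

A clear-key written with a different prefix, or for a different backupset, would leave the Warning alarm open.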

Each event carries three additional params:

  • backupset : carries the job/backup name, because one host can execute multiple jobs/backups. It must be used in the event reduction key to achieve the correct resolution of warnings
  • every : set to 'every' for externally submitted events, while 'missed' is assigned to events generated internally by expired Drools timers (missed executions). It can be used as a varbind filter to implement different notifications for regular failures and missed executions
  • interval : positive integer indicating the repeat interval in hours (24 for daily jobs, 1 for hourly jobs, and so on). Remember to add some extra hours to compensate for faster or slower job runs, such as full versus differential backups. Interval accepts the N:weekdays modifier, meaning that the job runs every N hours but only on weekdays (Mon-Fri inclusive)
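To make the format of the interval parameter concrete, here is a small Java sketch that parses both the plain form ("26") and the modifier form ("36:weekdays"). It is a standalone illustration with hypothetical names, not the code used by the rules; like the rules, it ignores modifiers it does not know.

```java
// Hypothetical value class for the "interval" event parameter.
final class IntervalParam {
    final int hours;
    final boolean weekdaysOnly;

    private IntervalParam(int hours, boolean weekdaysOnly) {
        this.hours = hours;
        this.weekdaysOnly = weekdaysOnly;
    }

    /** Accepts "N" or "N:modifier"; throws NumberFormatException when N is not an integer. */
    static IntervalParam parse(String raw) {
        String[] parts = raw.trim().split(":", 2);
        int hours = Integer.parseInt(parts[0].trim());
        if (hours <= 0) {
            throw new IllegalArgumentException("interval must be a positive number of hours: " + raw);
        }
        boolean weekdays = parts.length == 2 && "weekdays".equals(parts[1].trim());
        return new IntervalParam(hours, weekdays);
    }

    /** Timers are set in milliseconds, so the hours have to be converted. */
    long toMillis() {
        return hours * 60L * 60L * 1000L;
    }

    public static void main(String[] args) {
        IntervalParam daily = parse("26");
        IntervalParam workdays = parse("36:weekdays");
        System.out.println(daily.hours + "h weekdaysOnly=" + daily.weekdaysOnly + " ms=" + daily.toMillis());
        System.out.println(workdays.hours + "h weekdaysOnly=" + workdays.weekdaysOnly);
    }
}
```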

Syslog

If receiving status via syslog, drop this content in an XML file in ${OPENNMS_HOME}/etc/syslog:

<?xml version="1.0"?>
 <syslogd-configuration-group>
  <ueiList>
   <ueiMatch>
    <match type="regex" expression="^.*\[(backup\/success)\/(.*)\/(.*)\/(.*)\].*$" />
    <uei>uei.opennms.org/backup/recurring/Normal</uei>
    <parameter-assignment matching-group="1" parameter-name="status" />
    <parameter-assignment matching-group="2" parameter-name="backupset" />
    <parameter-assignment matching-group="3" parameter-name="every" />
    <parameter-assignment matching-group="4" parameter-name="interval" />
   </ueiMatch>
   <ueiMatch>
    <match type="regex" expression="^.*\[(backup\/failure)\/(.*)\/(.*)\/(.*)\].*$" />
    <uei>uei.opennms.org/backup/recurring/Warning</uei>
    <parameter-assignment matching-group="1" parameter-name="status" />
    <parameter-assignment matching-group="2" parameter-name="backupset" />
    <parameter-assignment matching-group="3" parameter-name="every" />
    <parameter-assignment matching-group="4" parameter-name="interval" />
   </ueiMatch>
  </ueiList>
 </syslogd-configuration-group>

and remember to import it in syslogd-configuration.xml.

Also remember to configure Syslogd so that the service is started and can be reached from the outside.
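Before deploying the syslog configuration it can help to try the regular expression against a sample message. The Java sketch below uses the same expression as the "success" ueiMatch above (backslashes doubled for the Java string literal) and maps the capture groups to the same parameter names. The sample message and the class name are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness for the syslog match expression.
final class SyslogMatchSketch {
    static final Pattern SUCCESS =
            Pattern.compile("^.*\\[(backup\\/success)\\/(.*)\\/(.*)\\/(.*)\\].*$");

    /** Returns the extracted parameters, or an empty map when the message does not match. */
    static Map<String, String> extract(String message) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = SUCCESS.matcher(message);
        if (m.matches()) {
            params.put("status", m.group(1));
            params.put("backupset", m.group(2));
            params.put("every", m.group(3));
            params.put("interval", m.group(4));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(extract("backup.sh: [backup/success/nightly/every/26] finished"));
        // prints {status=backup/success, backupset=nightly, every=every, interval=26}
    }
}
```

Note that the groups are greedy: if the backupset name itself contains a slash, the extra segments end up in the backupset group.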

Drools Engine

The following are the rule definitions. Place them in a file called ${OPENNMS_HOME}/etc/drools-engine.d/RepeatingBackupRules.drl:

package org.opennms.netmgt.correlation.drools;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Iterator;
import org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine;
import org.opennms.netmgt.model.events.EventBuilder;
import org.opennms.netmgt.xml.event.Event;
import org.opennms.netmgt.xml.event.Parm;
import org.opennms.netmgt.xml.event.Parms;
import org.opennms.netmgt.xml.event.Value;

global org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine engine;
global org.opennms.netmgt.correlation.drools.NodeService nodeService;
global java.lang.Integer REPEATING_INTERVAL; // default timer duration, in milliseconds (see drools-engine.xml)

declare Execution
	nodeid : Long
	uei : String
        tag : String
        interval : String
	expireTimerId : Integer
end

/*
 * Initial event for a node and tag: record the execution and start its expiration timer
 */
rule "initial backup received"
    when
		$e : Event( $uei : uei, $nodeid : nodeid )
		eval( "every".equals($e.getParm("every").getValue().getContent()) ) // filter out internally generated 'missed' events
    then
		Execution execution = new Execution();
		execution.setNodeid( $nodeid );
		execution.setUei( $uei );
                execution.setTag( getTag($e) );
                execution.setInterval( $e.getParm("interval").getValue().getContent() );
		execution.setExpireTimerId( engine.setTimer( getInterval($e, REPEATING_INTERVAL) ) );
		insert( execution );
                // retract the event, otherwise the "subsequent backup completed" rule would fire as well
		retract( $e );
		println( "Initial backup tag="+getTag($e)+" event " + $uei + " for node " + $nodeid );
end

/*
 * Subsequent backup completed
 */
rule "subsequent backup completed"
    when
		$e : Event( $uei : uei, $nodeid : nodeid )
                $execution : Execution( nodeid == $nodeid, $expireTimerId : expireTimerId )
		eval( $execution.getTag().equals( getTag($e) ) )
		eval( "every".equals($e.getParm("every").getValue().getContent()) ) // filter out internally generated 'missed' events
    then
	   retract( $e );
	   engine.cancelTimer($expireTimerId);
	   $execution.setExpireTimerId( engine.setTimer( getInterval($e, REPEATING_INTERVAL) ) );
	   update( $execution );
	   println( "Subsequent execution event " + $uei + " for node " + $nodeid +" suppressed." );
end

/*
 * Expiration timer expires: warn user that another backup event was not received in the expected interval
 */
rule "timer expired"
	when
		$execution : Execution( $tag: tag, $nodeid : nodeid, $expireTimerId : expireTimerId, $uei : uei, $interval: interval)
		$expire : TimerExpired( id == $expireTimerId )
	then
                sendExecutionMissedEvent(engine, $nodeid, $uei, $tag , $interval);
		retract( $execution );
		retract( $expire );
		println( "Backup execution expiration for " + $uei + " for node " + $nodeid +"["+$tag+"]." );
end

/*
 * Utility to send a (failed) execution event.
 */
function void sendExecutionMissedEvent( DroolsCorrelationEngine engine, Long nodeId, String uei, String tag, String interval) {
        EventBuilder bldr = new EventBuilder(uei.replaceAll("Normal","Warning"), "Drools"); // reuse the UEI of the last event, forcing the Warning variant
        bldr.setNodeid(nodeId.intValue());
	bldr.addParam("correlationEngineName", "Drools");
	bldr.addParam("correlationRuleSetName", engine.getName());
	bldr.addParam("correlationComments", "RepeatingBackupRules");
	if(uei.indexOf("job")!=-1) {
		bldr.addParam("job", tag);
	} else {
		bldr.addParam("backupset", tag);
	}
	bldr.addParam("tag", tag);
	bldr.addParam("interval", interval);
	bldr.addParam("every", "missed"); // this will be used to discriminate between normal failures (->"every") and missed executions (->"missed")
        engine.sendEvent(bldr.getEvent());
}

function String getTag(Event e) {
   String tag=null;

   Parm p=e.getParm("backupset");
   if(p!=null) {
	tag=p.getValue().getContent();
   }
   p=e.getParm("job");
   if(p!=null) {
        tag=p.getValue().getContent();
   }

   return tag;
}

function long getInterval(Event e, Integer defaultInterval) {
   long interval = defaultInterval.longValue(); // default, in milliseconds (see drools-engine.xml)
   long msPerHour = 60L * 60L * 1000L;

   Parm p=e.getParm("interval");
   if(p!=null) {
	String value=p.getValue().getContent();
	if(value!=null && value.indexOf(":")!=-1) {
		println("This event has an interval modifier: "+value);
		// value contains a modifier (N:modifier), parse the hours first
		String[] values=value.split(":", 2);
		int hours;
		try {
			hours=Integer.parseInt(values[0].trim());
		} catch(Exception exc) {
			println("Error parsing interval value="+value+" to integer. Using default REPEATING_INTERVAL");
			return interval;
		}
		interval = hours * msPerHour; // used as-is when the modifier is not recognized

		// now find out which modifier was used
		if("weekdays".equals(values[1])) {
			println("This event has a weekdays modifier");
			// the job is only meant to be run on weekdays: this means that any execution whose timer
			// would expire on weekends (sat,sun) should be moved up to monday + interval
			// example: on friday night at 23:00 a daily backup runs, interval is: 36:weekdays
			// since the timer would expire on sunday at 11:00 it is moved up to wednesday at 11:00 (monday 23:00 + 36h)
			Calendar cal = Calendar.getInstance();
			cal.add(Calendar.HOUR_OF_DAY, hours);
			int dow = cal.get(Calendar.DAY_OF_WEEK);

			if( ( dow == Calendar.SATURDAY || dow == Calendar.SUNDAY || dow == Calendar.MONDAY ) && hours >= 24 ) {
				// timer would expire on saturday, sunday or monday: move it up to monday, then add interval again
				println("Timer would expire on sat/sun/mon, moving it up to monday and adding interval");
				cal = Calendar.getInstance();
				// add() rather than roll(): roll() never changes the year and would wrap around on Dec 31
				cal.add(Calendar.DAY_OF_YEAR, 1);
				while(cal.get(Calendar.DAY_OF_WEEK) != Calendar.MONDAY) {
					cal.add(Calendar.DAY_OF_YEAR, 1);
				}
				// now add interval
				cal.add(Calendar.HOUR_OF_DAY, hours);
			}
			println("Event timer has been moved up until: "+cal.getTime());
			interval = (cal.getTimeInMillis() - Calendar.getInstance().getTimeInMillis());
			println("Interval in ms: "+interval);
		}
	} else {
		try {
			interval = Integer.parseInt(value) * msPerHour; // hours -> milliseconds
		} catch(Exception exc) {
			println("Error parsing interval value="+value+" to integer. Using default REPEATING_INTERVAL");
		}
	}
   }

   return interval;
   //return 60 * 1000; // 1 minute, uncomment for faster timer expiry, useful for debugging
}


/*
 * println utility
 */
function void println(Object msg) {
	System.err.println(new Date() + " RepeatingBackups : " + msg);
}
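The weekdays branch of getInterval above is easier to check when the current time is passed in as a parameter. The following Java sketch restates that branch with java.time; it is a restatement for illustration only, is not meant to go into the rule file, and its class and method names are hypothetical. Unlike Calendar with the default time zone, LocalDateTime ignores daylight saving changes.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

// Hypothetical restatement of the N:weekdays timer adjustment.
final class WeekdayExpirySketch {

    /** Returns when the timer should expire for an event received at 'now' with interval 'hours'. */
    static LocalDateTime expiry(LocalDateTime now, int hours) {
        LocalDateTime expiry = now.plusHours(hours);
        DayOfWeek dow = expiry.getDayOfWeek();
        boolean tooEarly = dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY || dow == DayOfWeek.MONDAY;
        if (tooEarly && hours >= 24) {
            // Restart from the next Monday at the same time of day, then add the interval again.
            LocalDateTime monday = now.plusDays(1);
            while (monday.getDayOfWeek() != DayOfWeek.MONDAY) {
                monday = monday.plusDays(1);
            }
            expiry = monday.plusHours(hours);
        }
        return expiry;
    }

    public static void main(String[] args) {
        LocalDateTime friday = LocalDateTime.of(2015, 4, 3, 23, 0); // a Friday, 23:00
        System.out.println(expiry(friday, 36)); // prints 2015-04-08T11:00 (a Wednesday)
    }
}
```

Stepping forward one day at a time with plusDays also behaves correctly across a year boundary, for example from Friday 30 December 2016 to Monday 2 January 2017.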

Then activate the rule set by adding it to drools-engine.xml:

<?xml version="1.0" encoding="UTF-8"?>
 <engine-configuration
   xmlns="http://xmlns.opennms.org/xsd/drools-engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://xmlns.opennms.org/xsd/drools-engine /opt/opennms/share/xsds/drools-engine.xsd ">
  <rule-set name="repeatingBackupRules">
   <rule-file>drools-engine.d/RepeatingBackupRules.drl</rule-file>
   <event>uei.opennms.org/backup/recurring/Normal</event>
   <event>uei.opennms.org/backup/recurring/Warning</event>
    <global name="REPEATING_INTERVAL" type="java.lang.Integer" value="172800000"/> <!-- every 2 days-->
   </rule-set>
 </engine-configuration>

Remember to activate the OpenNMS:Name=Correlator service in service-configuration.xml, or the Drools engine will not be started.

Restart OpenNMS and check the logs for errors.
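A note on units: the REPEATING_INTERVAL value of 172800000 in drools-engine.xml agrees with its "every 2 days" comment only when read as milliseconds, whereas the interval event parameter is given in hours and converted by the rules. The short Java sketch below just checks that arithmetic; the class name is hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that checks the unit conversions used in this article.
final class IntervalUnits {

    /** Converts the hours of an "interval" parameter into the milliseconds used for timers. */
    static long hoursToMillis(long hours) {
        return TimeUnit.HOURS.toMillis(hours);
    }

    /** Largest whole number of hours that still fits in a java.lang.Integer of milliseconds. */
    static long maxDefaultHours() {
        return Integer.MAX_VALUE / TimeUnit.HOURS.toMillis(1);
    }

    public static void main(String[] args) {
        System.out.println(Duration.ofMillis(172_800_000L).toHours()); // prints 48
        System.out.println(hoursToMillis(26));                          // prints 93600000
        System.out.println(maxDefaultHours());                          // prints 596
    }
}
```

Since the global is declared as java.lang.Integer, a default longer than about 596 hours (roughly 24 days) cannot be expressed in milliseconds.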


Notifications

The following snippet must be added to notifications.xml. Note that Notifd handles the failure of a backup and its missed execution separately thanks to the varbind filter on every. Adjust text and destinations according to your environment.

Always check your XML files with xmllint before applying the changes.

 <notification name="recurring backup warning" status="on" writeable="yes">
  <uei>uei.opennms.org/backup/recurring/Warning</uei>
  <description>A warning was received for a recurring backup</description>
  <rule>(IPADDR IPLIKE *.*.*.*)</rule>
  <destinationPath>Email-sysadmin-wo-escalation</destinationPath>
  <text-message>BACKUP %parm[backupset]% : exited with errors&#xd;
 &#xd;
 111-%noticeid% [%nodelabel% - %interface%] a backup script reported an error, details below.&#xd;
 &#xd;
 Full message: %logmsg%&#xd;
 &#xd;
 Params:&#xd;
  backupset=%parm[backupset]%&#xd;
  every=%parm[every]%&#xd;
  interval=%parm[interval]%&#xd;
  status=%parm[status]%&#xd;
  tag (might be null)=%parm[tag]%&#xd;
 &#xd;
  </text-message>
  <subject>[OPENNMS] [%nodelabel% - %interface%] recurring BACKUP %parm[backupset]%: failed</subject>
  <numeric-message>[OPENNMS] [%nodelabel% - %interface%] recurring BACKUP %parm[backupset]%: failed</numeric-message>
        <varbind>
            <vbname>every</vbname>
            <vbvalue>every</vbvalue>
        </varbind>
 </notification>
 
 <notification name="recurring backup warning: missed exec" status="on" writeable="yes">
  <uei>uei.opennms.org/backup/recurring/Warning</uei>
  <description>A warning was received for a recurring backup because it skipped its interval</description>
  <rule>(IPADDR IPLIKE *.*.*.*)</rule>
  <destinationPath>Email-sysadmin-wo-escalation</destinationPath>
  <text-message>BACKUP %parm[backupset]% : did not receive an event within the following interval since last execution: %parm[interval]%h &#xd;
 &#xd;
 111-%noticeid% [%nodelabel% - %interface%] backup did not run within the programmed interval since last execution.&#xd;
 &#xd;
 Full message: %logmsg%&#xd;
 &#xd;
 Params:&#xd;
  backupset=%parm[backupset]%&#xd;
  every=%parm[every]%&#xd;
  interval=%parm[interval]%&#xd;
  status=%parm[status]%&#xd;
  tag (might be null)=%parm[tag]%&#xd;
 &#xd;
 </text-message>
 <subject>[OPENNMS] [%nodelabel% - %interface%] BACKUP %parm[backupset]%: missed an execution since last: %parm[interval]%h</subject>
        <numeric-message>[OPENNMS] [%nodelabel% - %interface%] BACKUP %parm[backupset]%: missed an execution since last : %parm[interval]%h</numeric-message>
        <varbind>
            <vbname>every</vbname>
            <vbvalue>missed</vbvalue>
        </varbind>
 </notification>

Passive Status Keeper

The Drools rule file above can easily be made to work with the Passive Status Keeper (PSK) with a couple of modifications to accommodate the data required by PSK.

Overview of the changes:

  1. extend the Event Translator config to enrich the event with the params required by the Drools rules
  2. extend the Drools rules to save passive* event attributes
  3. add the uei to drools-engine.xml so that the passive status events reach the Drools rules

We will use the example cited in the Passive Status Keeper documentation and extend it.

(1) Event translator changes

Edit translator-configuration.xml so that the passiveStatusEvents are enriched with attributes required by Drools. The final event translation spec is as follows:

    <event-translation-spec uei="uei.opennms.org/vendor/pixelmetrix/traps/tspEventPCRRepetitionError">
      <mappings>
        <mapping>
          <assignment type="parameter" name="passiveNodeLabel">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.5.0" matches="^(.*)" result="${1}" />
          </assignment>
          <assignment type="parameter" name="passiveIpAddr">
            <value type="constant" result="169.254.1.1" />
          </assignment>
          <assignment type="parameter" name="passiveServiceName">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches="^([A-z]+): .*" result="${1}" />
          </assignment>
          <assignment type="parameter" name="passiveStatus" >
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches="((?!exceeded).)*" result="Up" />
          </assignment>
          <assignment type="field" name="uei">
            <value type="constant" result="uei.opennms.org/services/passiveServiceStatus" />
          </assignment>
          
          <!-- MISSED EVENTS config -->
          <assignment type="parameter" name="every">
            <value type="constant" result="every" />
          </assignment>
          <assignment type="parameter" name="interval">
            <value type="constant" result="1" /> <!-- should get one event every hour -->
          </assignment>
          <assignment type="parameter" name="job">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches="^([A-z]+): .*" result="${1}"/> <!-- the string used by drools to correlate the events with the timer -->
          </assignment>
          <!-- EOF:MISSED EVENTS config -->

        </mapping>
      </mappings>
    </event-translation-spec>
 
    <event-translation-spec uei="uei.opennms.org/vendor/pixelmetrix/traps/tspEventPCRRepetitionError">
      <mappings>
        <mapping>
          <assignment type="parameter" name="passiveNodeLabel">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.5.0" matches="^(.*)" result="${1}" />
          </assignment>
          <assignment type="parameter" name="passiveIpAddr">
            <value type="constant" result="192.168.159.129" />
          </assignment>
          <assignment type="parameter" name="passiveServiceName">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches="^([A-z]+): .*" result="${1}" />
          </assignment>
          <assignment type="parameter" name="passiveStatus" >
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches=".*exceeded.*" result="Down" />
          </assignment>
          <assignment type="field" name="uei">
            <value type="constant" result="uei.opennms.org/services/passiveServiceStatus" />
          </assignment>
          
          <!-- MISSED EVENTS config -->
          <assignment type="parameter" name="every">
            <value type="constant" result="every" />
          </assignment>
          <assignment type="parameter" name="interval">
            <value type="constant" result="1" /> <!-- should get one event every hour -->
          </assignment>
          <assignment type="parameter" name="job">
            <value type="parameter" name=".1.3.6.1.4.1.6768.6.2.2.7.0" matches="^([A-z]+): .*" result="${1}"/> <!-- the string used by drools to correlate the events with the timer -->
          </assignment>
          <!-- EOF:MISSED EVENTS config -->

        </mapping>
      </mappings>
    </event-translation-spec>

It is important to set the interval parameter to the maximum amount of time within which an event must be received or the service will be declared DOWN.
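The regular expressions in the two translation specs above can be tried out in isolation. The Java sketch below applies them to two made-up varbind strings, assuming the translator requires the whole value to match (full-match semantics); the sample strings and the class name are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness for the expressions used in translator-configuration.xml.
final class TranslatorRegexSketch {
    // Copied verbatim from the specs. Note that [A-z] also accepts the few punctuation
    // characters that sit between 'Z' and 'a' in ASCII; [A-Za-z] would be stricter.
    static final Pattern SERVICE = Pattern.compile("^([A-z]+): .*");
    static final Pattern UP = Pattern.compile("((?!exceeded).)*");
    static final Pattern DOWN = Pattern.compile(".*exceeded.*");

    /** Value assigned to passiveServiceName and job, or null when the varbind does not match. */
    static String serviceName(String varbind) {
        Matcher m = SERVICE.matcher(varbind);
        return m.matches() ? m.group(1) : null;
    }

    /** "Up" when the text never contains "exceeded", "Down" when it does. */
    static String passiveStatus(String varbind) {
        if (UP.matcher(varbind).matches()) {
            return "Up";
        }
        if (DOWN.matcher(varbind).matches()) {
            return "Down";
        }
        return null;
    }

    public static void main(String[] args) {
        String bad = "PCR: repetition interval exceeded";       // made-up sample
        String good = "PCR: repetition interval back to normal"; // made-up sample
        System.out.println(serviceName(bad) + " -> " + passiveStatus(bad));   // prints PCR -> Down
        System.out.println(serviceName(good) + " -> " + passiveStatus(good)); // prints PCR -> Up
    }
}
```

Both specs derive the job parameter from the same expression as passiveServiceName, so Up and Down events for one service share the same timer.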

(2) Drools rules changes

As said before the changes are minor and limited to storing and retrieving the event passive* parameters in the timer. The rule names have been changed to reflect the generalization that has been done.

Link to gist showing changes between versions. The full rule file is below:

package org.opennms.netmgt.correlation.drools;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Iterator;
import org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine;
import org.opennms.netmgt.model.events.EventBuilder;
import org.opennms.netmgt.xml.event.Event;
import org.opennms.netmgt.xml.event.Parm;
import org.opennms.netmgt.xml.event.Parms;
import org.opennms.netmgt.xml.event.Value;

global org.opennms.netmgt.correlation.drools.DroolsCorrelationEngine engine;
global org.opennms.netmgt.correlation.drools.NodeService nodeService;
global java.lang.Integer REPEATING_INTERVAL; // default timer duration, in milliseconds (see drools-engine.xml)

declare Execution
	nodeid : Long
	uei : String
        tag : String
        interval : String
	expireTimerId : Integer

        // for passive status events
	passiveIpAddr: String
	passiveServiceName: String
	passiveNodeLabel: String
end

/*
 * Initial event for a node and tag: record the execution and start its expiration timer
 */
rule "initial event received"
    when
		$e : Event( $uei : uei, $nodeid : nodeid )
		eval( "every".equals($e.getParm("every").getValue().getContent()) ) // filter out internally generated 'missed' events
    then
		Execution execution = new Execution();
		execution.setNodeid( $nodeid );
		execution.setUei( $uei );
                execution.setTag( getTag($e) );
                execution.setInterval( $e.getParm("interval").getValue().getContent() );
		execution.setExpireTimerId( engine.setTimer( getInterval($e, REPEATING_INTERVAL) ) );

		// handle passive status events
		if($uei.indexOf("passiveServiceStatus")!=-1) {
			execution.setPassiveIpAddr($e.getParm("passiveIpAddr").getValue().getContent());
			execution.setPassiveServiceName($e.getParm("passiveServiceName").getValue().getContent());
			execution.setPassiveNodeLabel($e.getParm("passiveNodeLabel").getValue().getContent());
		}
		insert( execution );
                // retract the event, otherwise the "subsequent event completed" rule would fire as well
		retract( $e );
		println( "Initial event tag="+getTag($e)+" event " + $uei + " for node " + $nodeid );
end

/*
 * Subsequent event completed
 */
rule "subsequent event completed"
    when
		$e : Event( $uei : uei, $nodeid : nodeid )
                $execution : Execution( nodeid == $nodeid, $expireTimerId : expireTimerId )
		eval( $execution.getTag().equals( getTag($e) ) )
		eval( "every".equals($e.getParm("every").getValue().getContent()) ) // filter out internally generated 'missed' events
    then
	   retract( $e );
	   engine.cancelTimer($expireTimerId);
	   $execution.setExpireTimerId( engine.setTimer( getInterval($e, REPEATING_INTERVAL) ) );
	   update( $execution );
	   println( "Subsequent execution event " + $uei + " for node " + $nodeid +" suppressed." );
end

/*
 * Expiration timer expires: warn user that another event was not received in the expected interval
 */
rule "timer expired"
	when
		$execution : Execution( $tag: tag, $nodeid : nodeid, $expireTimerId : expireTimerId, $uei : uei, $interval: interval, $passiveNodeLabel: passiveNodeLabel, $passiveIpAddr: passiveIpAddr, $passiveServiceName: passiveServiceName)
		$expire : TimerExpired( id == $expireTimerId )
	then
                sendExecutionMissedEvent(engine, $nodeid, $uei, $tag , $interval, $passiveIpAddr, $passiveServiceName, $passiveNodeLabel);
		retract( $execution );
		retract( $expire );
		println( "Event execution expiration for " + $uei + " for node " + $nodeid +"["+$tag+"]." );
end

/*
 * Utility to send a (failed) execution event.
 */
function void sendExecutionMissedEvent( DroolsCorrelationEngine engine, Long nodeId, String uei, String tag, String interval, String passiveIpAddr, String passiveServiceName, String passiveNodeLabel) {
        EventBuilder bldr = new EventBuilder(uei.replaceAll("Normal","Warning"), "Drools"); // reuse the UEI of the last event, forcing the Warning variant
        bldr.setNodeid(nodeId.intValue());
	bldr.addParam("correlationEngineName", "Drools");
	bldr.addParam("correlationRuleSetName", engine.getName());
	bldr.addParam("correlationComments", "RepeatingBackupRules");
	if(uei.indexOf("job")!=-1) {
		bldr.addParam("job", tag);
	} else {
		bldr.addParam("backupset", tag);
	}
	if(uei.indexOf("passiveServiceStatus")!=-1) {
		bldr.addParam("passiveStatus", "Down");
		bldr.addParam("passiveIpAddr", passiveIpAddr);
		bldr.addParam("passiveServiceName", passiveServiceName);
		bldr.addParam("passiveNodeLabel", passiveNodeLabel);
		bldr.addParam("passiveReasonCode", "no events received and timer expired");
	}
	bldr.addParam("tag", tag);
	bldr.addParam("interval", interval);
	bldr.addParam("every", "missed"); // this will be used to discriminate between normal failures (->"every") and missed executions (->"missed")
        engine.sendEvent(bldr.getEvent());
}

function String getTag(Event e) {
   String tag=null;

   Parm p=e.getParm("backupset");
   if(p!=null) {
	tag=p.getValue().getContent();
   }
   p=e.getParm("job");
   if(p!=null) {
        tag=p.getValue().getContent();
   }
   p=e.getParm("tag");
   if( p!=null && tag==null) {
        tag=p.getValue().getContent();
   }

   return tag;
}

function long getInterval(Event e, Integer defaultInterval) {
   long interval = defaultInterval.longValue(); // default, in milliseconds (see drools-engine.xml)
   long msPerHour = 60L * 60L * 1000L;

   Parm p=e.getParm("interval");
   if(p!=null) {
	String value=p.getValue().getContent();
	if(value!=null && value.indexOf(":")!=-1) {
		println("This event has an interval modifier: "+value);
		// value contains a modifier (N:modifier), parse the hours first
		String[] values=value.split(":", 2);
		int hours;
		try {
			hours=Integer.parseInt(values[0].trim());
		} catch(Exception exc) {
			println("Error parsing interval value="+value+" to integer. Using default REPEATING_INTERVAL");
			return interval;
		}
		interval = hours * msPerHour; // used as-is when the modifier is not recognized

		// now find out which modifier was used
		if("weekdays".equals(values[1])) {
			println("This event has a weekdays modifier");
			// the job is only meant to be run on weekdays: this means that any execution whose timer
			// would expire on weekends (sat,sun) should be moved up to monday + interval
			// example: on friday night at 23:00 a daily backup runs, interval is: 36:weekdays
			// since the timer would expire on sunday at 11:00 it is moved up to wednesday at 11:00 (monday 23:00 + 36h)
			Calendar cal = Calendar.getInstance();
			cal.add(Calendar.HOUR_OF_DAY, hours);
			int dow = cal.get(Calendar.DAY_OF_WEEK);

			if( ( dow == Calendar.SATURDAY || dow == Calendar.SUNDAY || dow == Calendar.MONDAY ) && hours >= 24 ) {
				// timer would expire on saturday, sunday or monday: move it up to monday, then add interval again
				println("Timer would expire on sat/sun/mon, moving it up to monday and adding interval");
				cal = Calendar.getInstance();
				// add() rather than roll(): roll() never changes the year and would wrap around on Dec 31
				cal.add(Calendar.DAY_OF_YEAR, 1);
				while(cal.get(Calendar.DAY_OF_WEEK) != Calendar.MONDAY) {
					cal.add(Calendar.DAY_OF_YEAR, 1);
				}
				// now add interval
				cal.add(Calendar.HOUR_OF_DAY, hours);
			}
			println("Event timer has been moved up until: "+cal.getTime());
			interval = (cal.getTimeInMillis() - Calendar.getInstance().getTimeInMillis());
			println("Interval in ms: "+interval);
		}
	} else {
		try {
			interval = Integer.parseInt(value) * msPerHour; // hours -> milliseconds
		} catch(Exception exc) {
			println("Error parsing interval value="+value+" to integer. Using default REPEATING_INTERVAL");
		}
	}
   }

   return interval;
}


/*
 * println utility
 */
function void println(Object msg) {
	System.err.println(new Date() + " RepeatingBackups : " + msg);
}

(3) drools-engine.xml changes

Just add the passiveServiceStatus event uei to the list of UEIs that will reach the rule file:

<?xml version="1.0" encoding="UTF-8"?>
<engine-configuration
        xmlns="http://xmlns.opennms.org/xsd/drools-engine"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://xmlns.opennms.org/xsd/drools-engine /opt/opennms/share/xsds/drools-engine.xsd ">
  <rule-set name="repeatingBackupRules">
    <rule-file>drools-engine.d/RepeatingBackupRules.drl</rule-file>
   <event>uei.opennms.org/backup/recurring/Normal</event>
   <event>uei.opennms.org/backup/recurring/Warning</event>
    <event>uei.opennms.org/services/passiveServiceStatus</event>
    <global name="REPEATING_INTERVAL" type="java.lang.Integer" value="172800000"/> <!-- every 2 days-->
  </rule-set>
</engine-configuration>

References

Original post: http://unicolet.blogspot.it/2015/04/detect-missed-execution-with-opennms.html

Drools samples: https://github.com/brozow/opennms-drools-sample

Another Drools example which inspired this solution: http://marc.info/?l=opennms-discuss&m=134262979429216&w=2