Dell OpenManage Storage

Tested for Versions
The instructions in this article have been tested against the following versions of OpenNMS.
Tested against: OpenNMS 1.12.8 (tested by Fuhrmann)

HOWTO: Monitoring Dell PERC RAID Controllers using OpenNMS

I work in an environment where we have thousands of Dell PowerEdge servers, most of which have PERC4 and PERC5 RAID controllers. Because hard drives are easily the most common piece of hardware to fail in our environment, we had a strong desire to monitor them using OpenNMS.

Prerequisites

To make this work, this how-to assumes you have a relatively recent version of Dell OpenManage (5.1 as of this writing) installed on the target server(s), including the storage management component. Luckily, this exists for both Windows and Linux platforms, so the RAID monitoring goodness is available for most typical server installs. For the sake of this document, I will assume you have installed the Dell OpenManage Server Administrator (OMSA) Managed Node package, available from http://support.dell.com.

Be sure to set matching SNMP community strings in OpenNMS and in the OMSA SNMP agent configuration. On Windows, be sure to restart the SNMP Service afterwards so the change takes effect.
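
On the OpenNMS side, the community string lives in snmp-config.xml. The following is only a minimal sketch: the "dellservers" community and the address range are examples for this document, not values you should keep.

<snmp-config xmlns="http://xmlns.opennms.org/xsd/config/snmp"
             version="v2c" read-community="public" retry="3" timeout="3000">
 <!-- example override for a block of Dell servers; adjust community and range to your site -->
 <definition read-community="dellservers">
  <range begin="10.1.1.1" end="10.1.1.254"/>
 </definition>
</snmp-config>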

You will also need to be running OpenNMS 1.3.x or later.

Finally, you need to have RAID arrays configured on the PERCs. Duh. :)

Configuration

First off, you'll need to know the maximum number of logical disks on any single machine you want to monitor. Even if most of your servers have one RAID 5 array, if you have one oddball machine you want to monitor that has two RAID 1 arrays and one RAID 5 array, you'll need to configure OpenNMS to be able to monitor three logical disks.

So, using this example of three possible logical disks:

The first step would be to modify capsd-configuration.xml like so:

<protocol-plugin protocol="OMSA_Storage_Disk_1" class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on" user-defined="false">
 <property key="vbname" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.1" />
 <property key="vbvalue" value="1" />
</protocol-plugin>
<protocol-plugin protocol="OMSA_Storage_Disk_2" class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on" user-defined="false">
 <property key="vbname" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.2" />
 <property key="vbvalue" value="2" />
</protocol-plugin>
<protocol-plugin protocol="OMSA_Storage_Disk_3" class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on" user-defined="false">
 <property key="vbname" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.3" />
 <property key="vbvalue" value="3" />
</protocol-plugin>

Instead of capsd, you can use provisiond detectors to detect your RAID logical disks (tested with OpenNMS 1.12.x):

<detector name="OMSA_Storage_Disk_1" class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector">
 <parameter key="vbvalue" value="1"/>
 <parameter key="oid" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.1"/>
</detector>
<detector name="OMSA_Storage_Disk_2" class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector">
 <parameter key="vbvalue" value="2"/>
 <parameter key="oid" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.2"/>
</detector>
 <detector name="OMSA_Storage_Disk_3" class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector">
 <parameter key="vbvalue" value="3"/>
 <parameter key="oid" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.3"/>
</detector>
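
These detectors go into the foreign-source definition used by your requisition (via the provisioning requisitions page in the web UI, or by editing the foreign-source XML directly). Below is a minimal sketch, assuming your Dell servers are in a requisition that uses the default foreign-source definition; adjust the name and the other detectors to match your setup.

<foreign-source xmlns="http://xmlns.opennms.org/xsd/config/foreign-source" name="default">
 <scan-interval>1d</scan-interval>
 <detectors>
  <detector name="ICMP" class="org.opennms.netmgt.provision.detector.icmp.IcmpDetector"/>
  <detector name="SNMP" class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector"/>
  <!-- the OMSA_Storage_Disk detectors from above go here, one per possible logical disk -->
  <detector name="OMSA_Storage_Disk_1" class="org.opennms.netmgt.provision.detector.snmp.SnmpDetector">
   <parameter key="vbvalue" value="1"/>
   <parameter key="oid" value=".1.3.6.1.4.1.674.10893.1.20.140.1.1.1.1"/>
  </detector>
 </detectors>
 <policies/>
</foreign-source>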


Next, add the following to poller-configuration.xml. The <service> entries go inside the appropriate <package> element (the one your Dell servers are polled by), and the <monitor> entries go alongside the other <monitor> declarations near the bottom of the file:

<service name="OMSA_Storage_Disk_1" interval="300000" user-defined="false" status="on">
 <parameter key="retry" value="3"/>
 <parameter key="timeout" value="6000"/>
 <parameter key="virtualDiskNumber" value="1"/>
</service>
<service name="OMSA_Storage_Disk_2" interval="300000" user-defined="false" status="on">
 <parameter key="retry" value="3"/>
 <parameter key="timeout" value="6000"/>
 <parameter key="virtualDiskNumber" value="2"/>
</service>
<service name="OMSA_Storage_Disk_3" interval="300000" user-defined="false" status="on">
 <parameter key="retry" value="3"/>
 <parameter key="timeout" value="6000"/>
 <parameter key="virtualDiskNumber" value="3"/>
</service>
      
<monitor service="OMSA_Storage_Disk_1" class-name="org.opennms.netmgt.poller.monitors.OmsaStorageMonitor"/>
<monitor service="OMSA_Storage_Disk_2" class-name="org.opennms.netmgt.poller.monitors.OmsaStorageMonitor"/>
<monitor service="OMSA_Storage_Disk_3" class-name="org.opennms.netmgt.poller.monitors.OmsaStorageMonitor"/>

That should be it! Restart OpenNMS and see if it picks up the new services after a rescan.

Notifications

Once you have this configured and your servers monitored, depending on how your notifications are set up, you'll see something similar to the following whenever a disk fails or the array becomes degraded due to a consistency check:

OMSA_Storage_Disk_1 outage identified on interface 10.1.1.50 with reason code: log vol(1) degraded phy drv(\0\0\3) Failed.

This means that logical disk 1 is degraded because the physical drive at adapter 0, channel 0, SCSI ID 3 has failed.
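
If you want a dedicated notice for these outages instead of relying on a generic nodeLostService notification, a minimal notifications.xml entry might look like the sketch below. The Email-Admin destination path and the message wording are assumptions, and the is<ServiceName> filters in the rule should list whichever OMSA_Storage_Disk services you actually defined.

<notification name="OMSA Storage Disk outage" status="on">
 <uei>uei.opennms.org/nodes/nodeLostService</uei>
 <rule>IPADDR != '0.0.0.0' &amp; (isOMSA_Storage_Disk_1 | isOMSA_Storage_Disk_2 | isOMSA_Storage_Disk_3)</rule>
 <destinationPath>Email-Admin</destinationPath>
 <text-message>The %service% service on interface %interface% went down at %time%. Check the PERC array on this host.</text-message>
 <subject>Notice #%noticeid%: %service% down on %interface%</subject>
</notification>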

Final Thoughts

Using the OMSA Storage Monitor is an easy way to monitor your Dell PERC RAID arrays. However, it doesn't need to stop there. Dell's OpenManage storage software can also send SNMP traps on events such as predictive drive failures, so with a little fiddling around you can build a very in-depth Dell PERC array monitoring strategy.
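
To act on those traps, OpenNMS needs an event definition for them in eventconf.xml (or a file included from it). The snippet below is only a sketch: the enterprise OID, the specific-trap number (9999), and the UEI are placeholders based on the Storage Management branch used above, so check the Dell storage MIB (and the Dell event files shipped with newer OpenNMS releases) for the real values.

<event>
 <mask>
  <maskelement>
   <mename>id</mename>
   <!-- assumed Storage Management enterprise OID; verify against your Dell MIB -->
   <mevalue>.1.3.6.1.4.1.674.10893.1.20</mevalue>
  </maskelement>
  <maskelement>
   <mename>generic</mename>
   <mevalue>6</mevalue>
  </maskelement>
  <maskelement>
   <!-- hypothetical specific-trap number; look up the real one in the Dell MIB -->
   <mename>specific</mename>
   <mevalue>9999</mevalue>
  </maskelement>
 </mask>
 <!-- hypothetical UEI chosen for this example -->
 <uei>uei.opennms.org/vendor/dell/omsaPredictiveFailure</uei>
 <event-label>OMSA: predictive drive failure</event-label>
 <descr>Dell OpenManage reported a predictive failure on a physical drive.</descr>
 <logmsg dest="logndisplay">OMSA predictive failure trap received from %interface%.</logmsg>
 <severity>Major</severity>
</event>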

Finally, there is a similar PERCMonitor, also available in the 1.3.x branch of OpenNMS. It uses the now-outdated percsnmpd agent to provide similar functionality, and its configuration is identical to that of the OMSA Storage Monitor.