Monitoring a Dell PowerEdge Expandable RAID Controller 3/Di

In principle, OpenNMS can monitor anything whose status can be retrieved over the network. For the Dell PERC 3/Di, this is straightforward once the proper software is installed.

This controller was originally manufactured by Adaptec. Several models of Dell PERC devices are made by LSI Logic, and a similar procedure could be followed to monitor the status of those controllers using the proper SNMP agent. The following procedure was carried out on CentOS 5, but with a little tweaking it should work on almost any Linux distribution.

The first thing that is needed is the afa-linux-app-A01.tar.gz tarball available from Dell's website. This contains an extended SNMP agent that can be accessed by the Net-SNMP agent that is standard on most Linux distros. OpenNMS will use this to check the status of the attached drives.

This software has not been updated in quite a few years, so a few workarounds are needed to get it to install and run. These instructions are borrowed from a post about setting this up for Nagios.

Untar the file and install the RPM:

# tar zxvf afa-linux-app-A01.tar.gz
# rpm -Uvh --nodeps afasnmp-2.7-1.i386.rpm

You have to use "--nodeps" since the libssl and libcrypto versions that the original code was compiled against are very old. The current ones are backward compatible, however, so all that is necessary is to create some symlinks:

# cd /lib
# ln -s
# ln -s
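
Before creating the links, it can help to see which shared libraries the agent binary cannot resolve. The following is a minimal sketch, assuming the RPM installed the binary as /usr/sbin/afasnmpd (the path used later in snmpd.conf) and that ldd is available. The helper name missing_libs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the libraries that the dynamic linker reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on the monitored host (the binary path is an assumption):
#   ldd /usr/sbin/afasnmpd | missing_libs
```

Each name printed is a library the agent was linked against that is missing from the system, which is a candidate for a compatibility symlink. An empty result after the links are created suggests the agent should be able to start.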

The next step is to create an "afasnmpd.conf" file in the location where the downloaded code expects to find it:

# mkdir -p /usr/local/etc/snmp
# cat >> /usr/local/etc/snmp/afasnmpd.conf <<'EOF'
trapcommunity "community_name"
trapsink localhost
trapsink ip_addr_of_monitoring_host
EOF
# chmod 400 /usr/local/etc/snmp/afasnmpd.conf

The afasnmpd agent expects to find the standard MIB files in /usr/local/share/snmp/mibs. The Net-SNMP agent keeps them in /usr/share/snmp/mibs, so a symbolic link needs to be created:

# mkdir -p /usr/local/share/snmp
# cd /usr/local/share/snmp/
# ln -s /usr/share/snmp/mibs .

The last piece of configuration is to tell the Net-SNMP agent to act as a master agent and where to find the afasnmpd OIDs:

# vi /etc/snmp/snmpd.conf


# afasnmp data will be exported on port 161 by the main snmpd
master agentx
pass . /usr/sbin/afasnmpd

This assumes that snmpd has already been configured on this system.
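
Before restarting the agents, a quick sanity check of the edited file can catch a missed or commented-out directive. This is a minimal sketch; the helper name check_snmpd_conf is made up for illustration, and it only tests that the two directives are present, not that the OID on the "pass" line is correct.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given snmpd.conf contains both
# directives added above (a "master agentx" line and a "pass" line that
# points at afasnmpd). Commented-out lines do not match.
check_snmpd_conf() {
    conf=$1
    grep -Eq '^[[:space:]]*master[[:space:]]+agentx' "$conf" &&
    grep -Eq '^[[:space:]]*pass[[:space:]].*afasnmpd' "$conf"
}

# Usage:
#   check_snmpd_conf /etc/snmp/snmpd.conf && echo "directives present"
```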

Start the two agents:

# service snmpd restart
# service afasnmpd start

To ensure that they start at boot:

# chkconfig snmpd on
# chkconfig afasnmpd on

If everything has gone as planned, the following test should result in values being returned:

# snmpwalk  localhost -c community_name -v2c .
SNMPv2-SMI::enterprises.795. = INTEGER: 3
SNMPv2-SMI::enterprises.795. = INTEGER: 3
SNMPv2-SMI::enterprises.795. = INTEGER: 3
SNMPv2-SMI::enterprises.795. = INTEGER: 3
SNMPv2-SMI::enterprises.795. = INTEGER: 3
SNMPv2-SMI::enterprises.795. = INTEGER: 3

Once the agent is running properly, the next task is to determine what to monitor. The Adaptec MIB file for this device, afa-MIB.txt, is included in the Dell tarball, so use mibparser to examine what is available:

# /opt/opennms/contrib/mibparser/dist/ afa-MIB.txt

This will stream a large amount of information. One of the values in the MIB is adaptecArrayControllerDevStatus, which happens to be the OID used above to test the agent. According to the MIB the values mean:

1       Device in unexpected state.
2       Device in unknown state.
3       Device functioning properly.
4       Device in warning state.
5       Device in error state.
6       Device in fatal error state.
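
To make raw walk output easier to read, the table above can be turned into a small lookup. This is a minimal sketch, assuming snmpwalk prints values in the numeric form shown earlier ("<oid> = INTEGER: <n>"); the helper names decode_status and describe_walk are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate an adaptecArrayControllerDevStatus value
# into the description listed in the table above.
decode_status() {
    case "$1" in
        1) echo "Device in unexpected state." ;;
        2) echo "Device in unknown state." ;;
        3) echo "Device functioning properly." ;;
        4) echo "Device in warning state." ;;
        5) echo "Device in error state." ;;
        6) echo "Device in fatal error state." ;;
        *) echo "Value not defined in the table." ;;
    esac
}

# Hypothetical helper: read snmpwalk output on stdin and print
# "<oid>: <description>" for every line of the form "<oid> = INTEGER: <n>".
describe_walk() {
    while IFS= read -r line; do
        case "$line" in
            *" = INTEGER: "*)
                oid=${line%% = INTEGER: *}
                value=${line##* = INTEGER: }
                echo "$oid: $(decode_status "$value")"
                ;;
        esac
    done
}
```

Piping the output of the snmpwalk test command into describe_walk would print one description per device entry.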

OpenNMS should therefore walk the MIB starting at . and report a problem if any value other than "3" is found.

The OpenNMS SnmpPlugin and SnmpMonitor can be used to implement this. In capsd-configuration.xml create the plugin to discover this service (it can easily be modeled on the "Router" plugin):

<protocol-plugin protocol="afaRAID" class-name="org.opennms.netmgt.capsd.plugins.SnmpPlugin" scan="on" user-defined="false">
    <property key="vbname" value="." />
    <property key="timeout" value="2000" />
    <property key="retry" value="1" />
</protocol-plugin>

If that OID exists on the remote device the "afaRAID" service will be added to it.

The final step is to modify poller-configuration.xml to monitor the "afaRAID" service:

<service name="afaRAID" interval="300000" user-defined="false" status="on">
        <parameter key="retry" value="1"/>
        <parameter key="timeout" value="3000"/>
        <parameter key="port" value="161"/>
        <parameter key="oid" value="."/>
        <parameter key="operator" value="="/>
        <parameter key="operand" value="3"/>
        <parameter key="walk" value="true"/>
</service>

This configuration says to walk all of the values starting at . and test to see that the values returned all equal "3". If just one does not, the service will be marked as down.

Remember to define the monitor class at the bottom of that file:

<monitor service="afaRAID" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
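
The walk test configured in poller-configuration.xml can be reproduced by hand to predict what the poller will report. This is a minimal sketch of the same "every value must equal the operand" rule, not the SnmpMonitor implementation. The helper name walk_status is made up for illustration, and treating an empty walk as "down" is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper mirroring the poller settings above: every INTEGER
# value in the snmpwalk output read from stdin must equal the operand ($1).
# Prints "up" or "down". An empty walk counts as "down" here, which is an
# assumption of this sketch rather than documented monitor behavior.
walk_status() {
    awk -v operand="$1" '
        / = INTEGER: / { seen++; if ($NF != operand) bad++ }
        END { if (seen > 0 && bad == 0) print "up"; else print "down" }
    '
}

# Usage: pipe the output of the snmpwalk test command into
#   walk_status 3
```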