Tom Lanyon
tom@netspot.com.au
[fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted:
Nov 4, 2009 9:02 PM
Hi all,
I'm trying to discover two things regarding my test system which has a bunch of SATA disks attached to a SAS expander:
* if I have a drive error, how do I know which cXtYdZ logical device maps to which physical disk/bay?
* how can I read temperature information from the drives?
There seems to be some work done in this area by Eric Schrock and Rob Johnston[1], which has led me to the disk-monitor FMA plugin. I am assuming that this plugin will automatically handle temperature monitoring and lighting the fault/locate LEDs but am not entirely sure of this.
I attempted to load the module but received:
# fmadm load /usr/lib/fm/fmd/plugins/disk-monitor.so
fmadm: failed to load /usr/lib/fm/fmd/plugins/disk-monitor.so: module failed to load (consult fmd(1M) log)
I checked the log as instructed, but no errors or warnings were recorded. I know the log is working because when I accidentally tried to load the plugin's .conf file instead of the .so, I *did* receive an error in the fmd log:
Nov 05 2009 15:08:36.125443460 ereport.fm.fmd.mod_init
nvlist version: 0
    version = 0x0
    class = ereport.fm.fmd.mod_init
    ena = 0x751b2ea5e2305401
    msg = failed to load /usr/lib/fm/fmd/plugins/disk-monitor.conf: Operation not supported
    __ttl = 0x1
    __tod = 0x4af256cc 0x77a1d84
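For anyone who wants to post-process such a log entry, here is a minimal Python sketch that splits the dump into name/value pairs and decodes __tod. It assumes the layout shown above (one "name = value" pair per line) and that __tod holds seconds since the epoch followed by nanoseconds, which is consistent with this sample; the function names parse_ereport and decode_tod are made up for illustration and are not part of any FMA tool.

```python
from datetime import datetime, timezone

# The ereport from the post, laid out one field per line.
SAMPLE = """\
Nov 05 2009 15:08:36.125443460 ereport.fm.fmd.mod_init
nvlist version: 0
    version = 0x0
    class = ereport.fm.fmd.mod_init
    ena = 0x751b2ea5e2305401
    msg = failed to load /usr/lib/fm/fmd/plugins/disk-monitor.conf: Operation not supported
    __ttl = 0x1
    __tod = 0x4af256cc 0x77a1d84
"""


def parse_ereport(text):
    """Collect every 'name = value' line into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # header lines without ' = ' are skipped
            fields[key.strip()] = value.strip()
    return fields


def decode_tod(tod):
    """Assumes __tod is '<seconds> <nanoseconds>' in hex, as in the sample."""
    secs_hex, nsecs_hex = tod.split()
    when = datetime.fromtimestamp(int(secs_hex, 16), tz=timezone.utc)
    return when, int(nsecs_hex, 16)


if __name__ == "__main__":
    fields = parse_ereport(SAMPLE)
    when, nsec = decode_tod(fields["__tod"])
    print(fields["class"])
    print(fields["msg"])
    print(when.isoformat(), nsec)
```

The decoded time is in UTC, so it will differ from the local timestamp in the header line by the machine's timezone offset.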
Can anyone suggest whether this is indeed what I should be doing, and if so, why can't I load this FMA plugin?
Additionally, even if I get this running, what methods are there to manually identify a drive in the enclosure? That is, how do I send a command to the SES device? There needs to be some level of manual control available for this, as I can think of multiple scenarios where I'd need to identify and extract a non-faulty drive from an enclosure.
Regards, Tom
[1] - http://blogs.sun.com/eschrock/entry/ses_sensors
_______________________________________________
fm-discuss mailing list
fm-discuss at opensolaris dot org
Posts:
61
From:
US
Registered:
3/9/05
Re: [fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted:
Nov 4, 2009 11:08 PM
in response to: Tom Lanyon
Hi Tom,
The disk-monitor module is not actually used to detect or diagnose disk faults, but rather is a response agent designed for the thumper/thor platforms. The disk-monitor module subscribes to FMA diagnosis and repair events and monitors changes in the disk topology (by listening to hotplug sysevents). In response to these events, it will send requests to the service processor (via IPMI) to update FRU information and flip the disk bay LEDs on/off, as appropriate.
In order for the disk-monitor module to operate, it needs to know the disk topology of the system, including, as you alluded to, the mapping of Solaris disk devices to physical disk bays. For internal SATA/SAS disks, the code that constructs the disk topology currently relies on a set of XML files where we've hard-coded the mapping of drive bays to device nodes for a subset of Sun X64 platforms. For many (but not all[1]) external storage enclosures which support SES, we're able to dynamically derive the disk topology without the aid of any hard-coded information.
In the absence of this disk topology, disk-monitor will bail out during initialization, which is likely happening on your system.
That said, disk error telemetry is actually fed into the Fault Manager from two sources:
1) The disk-transport module, which uses libdiskstatus to check for three failure conditions via uSCSI interfaces:
   * over temperature
   * predictive failure
   * self-test failure
2) The sd driver, which will generate error telemetry for problems detected at the target driver level.
Unfortunately, even though your system will be capable of generating error telemetry for your disks, the system that diagnoses faults from the error telemetry also needs to consume information in the disk topology, so you're still probably out of luck.
Hope this helps,
rob
[1] The full answer as to why we can't derive the topology on all SES storage enclosures is a bit too involved to dive into here, but it basically depends on the complexity of the internal SAS topology of the array in question. If the array presents a single root target at the top of the topology, then libses will do the right thing. However, if the topology uses SAS expanders either to multi-attach the disks or to talk to different subsets of disks, then SES will present multiple targets at the top of the topology. To libses this may appear as multiple storage enclosures, which causes us to generate an inaccurate topology.
There is a workaround for the latter case: libses provides a means of overriding the interpretation by delivering a small plugin module to ses (either a model-specific plugin for a specific array or a single vendor-specific plugin).
There are a couple projects underway
Posts:
24
From:
Registered:
1/26/09
Re: [fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted:
Nov 8, 2009 3:55 PM
in response to: Tom Lanyon
Hmm..
Is there any info on how to actually use all the existing parts to "make it work"?
I have visibility to my backplane, but how do I test/map/configure/debug to bridge the gap from visible to functional?
fmtopo already sees all the SAS backplanes and devices in detail (I have even tried it with multiple enclosures connected), since this hardware (a Supermicro SAS backplane) has drivers already.
hc://:product-id=LSILOGIC-SASX36-A.1:chassis-id=50030480004b617f:server-id=:serial=WD-WCAVA0867698:part=WDC-WD20EADS-00R6B0:revision=01.00A01/ses-enclosure=0/bay=15/disk=0
So, what do I need to do to detect a failed fan with Fault Management?
The LSI controller's lsiutil can map a Solaris device to a bay, so it is also a useful tool.
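An hc-scheme FMRI like the one above already carries the bay number and the disk serial, so it can be split mechanically. The following is only a rough Python sketch under some assumptions: the authority fields are separated by ":" and contain no ":" or "/" themselves, and every path component has the form name=instance. The function name parse_hc_fmri is invented for this example and is not part of fmtopo or libtopo.

```python
# The FMRI from the post, with the line-wrap spaces removed.
EXAMPLE_FMRI = (
    "hc://:product-id=LSILOGIC-SASX36-A.1:chassis-id=50030480004b617f"
    ":server-id=:serial=WD-WCAVA0867698:part=WDC-WD20EADS-00R6B0"
    ":revision=01.00A01/ses-enclosure=0/bay=15/disk=0"
)


def parse_hc_fmri(fmri):
    """Return (authority dict, [(node name, instance), ...]) for an hc FMRI.

    Hypothetical helper: assumes no ':' or '/' inside authority values.
    """
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)
    authority_part, _, path_part = fmri[len(prefix):].partition("/")

    authority = {}
    for field in authority_part.split(":"):
        if not field:
            continue  # the leading ':' produces one empty field
        key, _, value = field.partition("=")
        authority[key] = value

    path = []
    for component in path_part.split("/"):
        name, _, instance = component.partition("=")
        path.append((name, int(instance)))
    return authority, path


if __name__ == "__main__":
    authority, path = parse_hc_fmri(EXAMPLE_FMRI)
    nodes = dict(path)
    print("enclosure", nodes["ses-enclosure"], "bay", nodes["bay"],
          "holds disk serial", authority["serial"])
```

Running this over each disk FMRI that fmtopo prints would give a bay-to-serial table, which can then be matched against the serial numbers reported for the cXtYdZ devices.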
Mark.