Thread: [fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?


Replies: 2 - Last Post: Nov 8, 2009 3:55 PM by: mark0001
Tom Lanyon
tom@netspot.com.au
[fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted: Nov 4, 2009 9:02 PM

Hi all,

I'm trying to discover two things regarding my test system which has a
bunch of SATA disks attached to a SAS expander:

* if I have a drive error, how do I know which cXtYdZ logical device
maps to which physical disk/bay?

* how can I read temperature information from the drives?

There seems to be some work done in this area by Eric Schrock and Rob
Johnston[1], which has led me to the disk-monitor FMA plugin. I am
assuming that this plugin will automatically handle temperature
monitoring and lighting the fault/locate LEDs but am not entirely sure
of this.

I attempted to load the module but received:

# fmadm load /usr/lib/fm/fmd/plugins/disk-monitor.so
fmadm: failed to load /usr/lib/fm/fmd/plugins/disk-monitor.so: module
failed to load (consult fmd(1M) log)


I checked the log as instructed, but no errors or warnings were
recorded. I know the log is working because when I accidentally tried
to load the plugin's .conf file instead of the .so, I *did* receive an
error in the fmd log:

Nov 05 2009 15:08:36.125443460 ereport.fm.fmd.mod_init
nvlist version: 0
version = 0x0
class = ereport.fm.fmd.mod_init
ena = 0x751b2ea5e2305401
msg = failed to load /usr/lib/fm/fmd/plugins/disk-monitor.conf: Operation not supported

__ttl = 0x1
__tod = 0x4af256cc 0x77a1d84
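
For anyone reading the raw ereport, the `__tod` member can be decoded by hand. The sketch below assumes the two hex words are seconds since the Unix epoch followed by nanoseconds; that assumption is consistent with the `.125443460` fraction in the header line above, but it is inferred from this one record rather than taken from fmd documentation. The function name `decode_tod` is made up for illustration.

```python
from datetime import datetime, timezone


def decode_tod(tod_field):
    """Decode an ereport __tod value such as '0x4af256cc 0x77a1d84'.

    Assumption: first word = seconds since the Unix epoch,
    second word = nanoseconds within that second.
    """
    sec_hex, nsec_hex = tod_field.split()
    seconds = int(sec_hex, 16)
    nanoseconds = int(nsec_hex, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanoseconds


when, nsec = decode_tod("0x4af256cc 0x77a1d84")
print(when.isoformat(), nsec)
# prints: 2009-11-05T04:38:36+00:00 125443460
```

The UTC result is 10 hours 30 minutes behind the 15:08:36 shown in the log header, which suggests the header is printed in the machine's local time zone.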


Can anyone suggest whether this is indeed what I should be doing, and
if so, why can't I load this FMA plugin?

Additionally, even if I get this running - what methods are there to
manually identify a drive in the enclosure? ie, how do I send a
command to the SES device? There needs to be some level of manual
control available for this as I can think of multiple scenarios where
I'd need to identify and extract a non-faulty drive from an enclosure.

Regards,
Tom

[1] - http://blogs.sun.com/eschrock/entry/ses_sensors
_______________________________________________
fm-discuss mailing list
fm-discuss at opensolaris dot org


robj

Posts: 61
From: US

Registered: 3/9/05
Re: [fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted: Nov 4, 2009 11:08 PM   in response to: Tom Lanyon

Hi Tom,

The disk-monitor module is not actually used to detect or diagnose disk faults;
rather, it is a response agent designed for the thumper/thor platforms. The
disk-monitor module subscribes to FMA diagnosis and repair events and monitors
changes in the disk topology (by listening to hotplug sysevents). In response
to these events, it sends requests to the service processor (via IPMI) to
update FRU information and turn the disk bay LEDs on or off, as appropriate.

In order for the disk-monitor module to operate, it needs to know the disk
topology of the system, including, as you alluded to, the mapping of Solaris
disk devices to physical disk bays. For internal SATA/SAS disks, the code that
constructs the disk topology currently relies on a set of XML files in which
we've hard-coded the mapping of drive bays to device nodes for a subset of Sun
x64 platforms. For many (but not all[1]) external storage enclosures which
support SES, we're able to derive the disk topology dynamically, without the
aid of any hard-coded information.

In the absence of this disk topology, disk-monitor will bail out during
initialization, which is likely happening on your system.

That said, disk error telemetry is actually fed into the Fault Manager from two
sources:

1) The disk-transport module, which uses libdiskstatus to check for three
failure conditions via uSCSI interfaces:

over temperature
predictive failure
self-test failure

2) The sd driver, which will generate error telemetry for problems detected at
the target driver level.

Unfortunately, even though your system will be capable of generating error
telemetry for your disks, the system that diagnoses faults from that error
telemetry also needs to consume information from the disk topology, so you're
still probably out of luck.

Hope this helps,

rob


[1] The full answer as to why we can't derive the topology on all SES storage
enclosures is a bit too involved to dive into here, but it basically depends on
the complexity of the internal SAS topology of the array in question. If the
array presents a single root target at the top of the topology then libses will
do the right thing. However, if the topology uses SAS expanders either to
multi-attach the disks or to talk to different subsets of disks, then SES will
present multiple targets at the top of the topology. To libses this may look
like multiple storage enclosures, which causes us to generate an inaccurate
topology.

There is a workaround for the latter case - libses provides a means of
overriding the interpretation by delivering a small plugin module to ses (either
a model-specific plugin for a specific array or a single vendor-specific plugin).

There are a couple projects underway


mark0001

Posts: 24
From:

Registered: 1/26/09
Re: [fm-discuss] Unable to load disk-monitor plugin / how to change SES indicators?
Posted: Nov 8, 2009 3:55 PM   in response to: Tom Lanyon

Hmm..

Is there any info on how to actually use all the existing parts to "make it work"?

I have visibility of my backplane, but how do I test/map/configure/debug to bridge the gap from visible to functional?

fmtopo already sees all the SAS backplanes and devices in detail (I have even tried it with multiple enclosures connected), since this hardware (a Supermicro SAS backplane) already has drivers.

hc://:product-id=LSILOGIC-SASX36-A.1:chassis-id=50030480004b617f:server-id=:serial=WD-WCAVA0867698:part=WDC-WD20EADS-00R6B0:revision=01.00A01/ses-enclosure=0/bay=15/disk=0
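
An FMRI like the one above already carries the bay number and the disk serial, so it can be picked apart with a few lines of script. This is only a rough sketch: it assumes the authority section is a colon-separated list of key=value pairs and that each path component is name=instance, which holds for this example but is not a full parser for the hc scheme. The function name `parse_hc_fmri` is made up for illustration, and any whitespace introduced by line wrapping is stripped first.

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI into (authority properties, path nodes).

    Assumptions (true for the example, not a general hc parser):
      - authority is ':key=value:key=value...' up to the first '/'
      - every path component has the form 'name=instance'
    """
    fmri = "".join(fmri.split())  # drop whitespace left by line wrapping
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)

    authority, _, path = fmri[len(prefix):].partition("/")

    props = {}
    for item in authority.split(":"):
        if item:  # the authority starts with ':', so skip the empty piece
            key, _, value = item.partition("=")
            props[key] = value

    nodes = []
    for component in path.split("/"):
        name, _, instance = component.partition("=")
        nodes.append((name, int(instance)))
    return props, nodes


fmri = ("hc://:product-id=LSILOGIC-SASX36-A.1:chassis-id=50030480004b617f"
        ":server-id=:serial=WD-WCAVA0867698:part=WDC-WD20EADS-00R6B0"
        ":revision=01.00A01/ses-enclosure=0/bay=15/disk=0")
props, nodes = parse_hc_fmri(fmri)
print(props["serial"], dict(nodes)["bay"])
# prints: WD-WCAVA0867698 15
```

Running this over every disk FMRI in the fmtopo output would give a serial-to-bay table for the enclosure; matching those serials back to cXtYdZ device names is a separate step that this sketch does not cover.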

So, what do I need to do to detect a failed fan with Fault Management?

The LSI controller's lsiutil can map Solaris devices to bays, so it is also a useful tool.

Mark.



