OpenSolaris

Thread: [zfs-discuss] Workaround for mpt timeouts in snv_127

Replies: 49 - Last Post: Feb 4, 2010 2:08 AM by: tonmaus
Carson Gaspar
carson@taltos.org
[zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 21, 2009 12:49 AM

For all of those suffering from mpt timeouts in snv_127, I decided to
give the ancient itmpt driver a whirl. It works fine, and in my brief
testing a zfs scrub that would generate about 1 timeout every 2 minutes
or so now runs with no problems.

The downside is that lsiutil and raidctl both fail to work :-(

I also tried the mpt driver from Solaris 10 x86 patch 143129-01, but
that fails to load with undefined symbols, as do the mpt drivers from
snv_111b and snv_118 (unless I got something wrong with my bootadm
update-archive invocation...)

For reference, I'm running firmware 1.29.00.00 (the latest), and the old
errors were of the form:

Nov 19 03:34:43 gandalf.taltos.org scsi: [ID 107833 kern.warning]
WARNING: /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 19 03:34:43 gandalf.taltos.org Disconnected command timeout for
Target 13

With the target number changing. When the hangs would happen, iostat
would show "stuck" I/Os on all 8 SATA disks attached to my controller
(which are all in the same raidz2 pool)
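Editor's sketch: since the target number changes from one warning to the next, it can help to tally the timeouts per target before reporting them. The Python below is a minimal illustration, not part of the original post; the function name `count_timeouts` is hypothetical, and it assumes the message text is exactly as quoted above.

```python
import re
from collections import Counter

# Matches the "Disconnected command timeout" message quoted above. Mail
# archives wrap the target number onto the next line, so the text is
# whitespace-normalised before matching.
TIMEOUT_RE = re.compile(r"Disconnected command timeout for\s+Target (\d+)")


def count_timeouts(log_text):
    """Return a Counter mapping target number -> number of timeouts seen."""
    normalised = " ".join(log_text.split())
    return Counter(int(t) for t in TIMEOUT_RE.findall(normalised))


# The log excerpt from the post, used as sample input.
sample = """\
Nov 19 03:34:43 gandalf.taltos.org scsi: [ID 107833 kern.warning]
WARNING: /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 19 03:34:43 gandalf.taltos.org Disconnected command timeout for
Target 13
"""

if __name__ == "__main__":
    for target, hits in sorted(count_timeouts(sample).items()):
        print(f"target {target}: {hits} timeout(s)")
```

Feeding it the contents of the system log instead of `sample` would show whether the timeouts cluster on particular targets or are spread across all disks.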

--
Carson
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 21, 2009 1:08 AM   in response to: Carson Gaspar

Carson Gaspar wrote:
> For all of those suffering from mpt timeouts in snv_127, I decided to
> give the ancient itmpt driver a whirl. It works fine, and in my brief
> testing a zfs scrub that would generate about 1 timeout every 2 minutes
> or so now runs with no problems.
> The downside is that lsiutil and raidctl both fail to work :-(

... and that you don't have FMA support, or MPxIO.

> I also tried the mpt driver from Solaris 10 x86 patch 143129-01, but
> that fails to load with undefined symbols, as do the mpt drivers from
> snv_111b and snv_118 (unless I got something wrong with my bootadm
> update-archive invocation...)

The version of mpt(7d) that's in Solaris 10, Solaris 10 Updates and
Solaris 10 patches will NOT work on OpenSolaris or SXCE. The codebase
has diverged between Solaris 10 and now.

Likewise, the mpt(7d) driver has diverged from the version that's in
snv_111b / OpenSolaris 2009.06, and even in the 10 builds between 118
and 127. Frankly, I'm surprised you didn't panic your system, especially
with the Solaris 10 version.


We currently have two bugs open on what I believe to be the same
issue, namely

6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

If you and everybody else who is seeing this problem could provide
details about your configuration (output from cfgadm -lva, raidctl
-l, prtconf -v, what your zpool configs are, and the firmware rev
of each disk in each zpool) that would help us sort through and find
any commonalities and hopefully a fix.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


Jeremy Kitchen
kitchen@scriptkitche...
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 21, 2009 8:30 AM   in response to: James C. McPher...

On Nov 21, 2009, at 1:08 AM, James C. McPherson wrote:
> We currently have two bugs open on what I believe to be the same
> issue, namely
>
> 6894775 mpt driver timeouts and bus resets under load
> 6900767 Server hang with LSI 1068E based SAS controller under load
>
> If you and everybody else who is seeing this problem could provide
> details about your configuration (output from cfgadm -lva, raidctl
> -l, prtconf -v, what your zpool configs are, and the firmware rev
> of each disk in each zpool) that would help us sort through and find
> any commonalities and hopefully a fix.

I will give you all of this information on monday. This is great
news :)

-Jeremy


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 3:21 PM   in response to: Jeremy Kitchen

> I will give you all of this information on monday.
> This is great news :)


Indeed. I will also be posting this information when I get to the server tonight. Perhaps it will help. I don't think I want to try using that old driver, though; it seems too risky for my taste.

Is there a command to get the disk firmware rev from OpenSolaris while booted up? I know of some boot CDs that can get to it, but I'm unsure about accessing it while the server is running.

James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 3:38 PM   in response to: ttabbal

Travis Tabbal wrote:
>> I will give you all of this information on monday.
>> This is great news :)
>
>
> Indeed. I will also be posting this information when I get to the server
> tonight. Perhaps it will help. I don't think I want to try using that old
> driver though, it seems too risky for my taste.

Definitely not recommended.

> Is there a command to get the disk firmware rev from OpenSolaris while
> booted up? I know of some boot CDs that can get to it, but I'm unsure
> about accessing it while the server is running.

Of course. Use prtconf -v and look for the disk node hardware
properties. Example:



Hardware properties:
name='devid' type=string items=1
value='id1,sd@SATA_____SAMSUNG_HM320JI_______S19FJ10PC45360'
name='inquiry-device-type' type=int items=1
value=00000000
name='inquiry-revision-id' type=string items=1
value='2SS00_01'
name='inquiry-product-id' type=string items=1
value='HM320JI'
name='inquiry-vendor-id' type=string items=1
value='SAMSUNG'


which tells you that this is a direct-access device (device type 0)
with a non-standard revision id field (2SS00_01), of which we take
the last 4 bytes as the actual revision field, plus the vendor and
product ids. The devid information helps here too.
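Editor's sketch: the properties above can be pulled out of saved `prtconf -v` output with a few lines of Python. This is an illustration only, assuming the `name=... type=... items=... value=...` layout shown in the example; the helper names `parse_properties` and `disk_summary` are hypothetical. It handles a single property block and keeps only the first item of multi-item values, so real output covering many device nodes would need to be split per node first.

```python
import re

# One property: a name/type/items line followed by a value that is either
# quoted (strings) or bare (ints such as 00000000).
PROP_RE = re.compile(
    r"name='([^']+)'\s+type=\w+\s+items=\d+\s+value=(?:'([^']*)'|(\S+))"
)


def parse_properties(text):
    """Return {property name: value string} for one block of properties."""
    props = {}
    for match in PROP_RE.finditer(text):
        name, quoted, bare = match.groups()
        props[name] = quoted if quoted is not None else bare
    return props


def disk_summary(props):
    """Pick out the inquiry fields discussed in the thread."""
    revision = props.get("inquiry-revision-id", "")
    return {
        "vendor": props.get("inquiry-vendor-id"),
        "product": props.get("inquiry-product-id"),
        "revision_field": revision,
        # The post says the last 4 bytes are treated as the revision.
        "revision_last4": revision[-4:],
    }


# The example block from the post, used as sample input.
sample = """\
Hardware properties:
name='devid' type=string items=1
value='id1,sd@SATA_____SAMSUNG_HM320JI_______S19FJ10PC45360'
name='inquiry-device-type' type=int items=1
value=00000000
name='inquiry-revision-id' type=string items=1
value='2SS00_01'
name='inquiry-product-id' type=string items=1
value='HM320JI'
name='inquiry-vendor-id' type=string items=1
value='SAMSUNG'
"""

if __name__ == "__main__":
    print(disk_summary(parse_properties(sample)))
```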


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 6:46 PM   in response to: James C. McPher...
Attachment cfgadm.txt (6.8 K)
Attachment prtconf.txt (338.6 K)

> If you and everybody else who is seeing this problem
> could provide
> details about your configuration (output from cfgadm
> -lva, raidctl
> -l, prtconf -v, what your zpool configs are, and the
> firmware rev
> of each disk in each zpool) that would help us sort
> through and find
> any commonalities and hopefully a fix.

-----------------------

Small ones are inline, larger ones are attached as text files. Let me know if there is more I can get for you.

-----------------------

root@nas:~# raidctl -l
Controller: 9
        Disk: 0.0.0
        Disk: 0.1.0
        Disk: 0.2.0
        Disk: 0.3.0
Controller: 10
        Disk: 0.4.0
        Disk: 0.5.0
        Disk: 0.6.0
        Disk: 0.7.0

root@nas:~# zpool status
  pool: raid
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        raid         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c9t0d0   ONLINE       0     0     0
            c9t1d0   ONLINE       0     0     0
            c9t2d0   ONLINE       0     0     0
            c9t3d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c10t4d0  ONLINE       0     0     0
            c10t5d0  ONLINE       0     0     0
            c10t6d0  ONLINE       0     0     0
            c10t7d0  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
            c8t2d0s0  ONLINE       0     0     0

errors: No known data errors

tru

Posts: 3
From: FR

Registered: 6/7/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 1:24 AM   in response to: James C. McPher...

On Sat, Nov 21, 2009 at 07:08:20PM +1000, James C. McPherson wrote:
> If you and everybody else who is seeing this problem could provide
> details about your configuration (output from cfgadm -lva, raidctl
> -l, prtconf -v, what your zpool configs are, and the firmware rev
> of each disk in each zpool) that would help us sort through and find
> any commonalities and hopefully a fix.
>
On a supermicro board, with 3 hw raid6 vdev joined in a single pool,
random hangs (<weekly) which required hardware reset, nothing on the logs.

symptoms: rpool fine, zfs status hangs on the other volume
all nfs shares stalled on all linux clients (local "share" -> nothing).

Attached the requested files on 2 machines having the same issue.

Thanks

Tru
--
Dr Tru Huynh | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:tru at pasteur dot fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France


James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 1:59 AM   in response to: tru

Tru Huynh wrote:
> On Sat, Nov 21, 2009 at 07:08:20PM +1000, James C. McPherson wrote:
>> If you and everybody else who is seeing this problem could provide
>> details about your configuration (output from cfgadm -lva, raidctl
>> -l, prtconf -v, what your zpool configs are, and the firmware rev
>> of each disk in each zpool) that would help us sort through and find
>> any commonalities and hopefully a fix.
>>
> On a supermicro board, with 3 hw raid6 vdev joined in a single pool,
> random hangs (<weekly) which required hardware reset, nothing on the logs.
>
> symptoms: rpool fine, zfs status hangs on the other volume
> all nfs shares stalled on all linux clients (local "share" -> nothing).
>
> Attached the requested files on 2 machines having the same issue.


Two things here:

(1) your hba is a MegaRAID SAS ELP, which uses the mega_sas driver
not mpt, and
(2) If you've got nothing in the logs, then you need to do more
investigation to work out where the problem lies.

Since your system is not using mpt, you have a different problem.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 7:28 PM   in response to: Carson Gaspar

I have a possible workaround. Mark Johnson <Mark dot Johnson at sun dot com> has been emailing me today about this issue and he proposed the following:

> You can try adding the following to /etc/system, then rebooting...
> set xpv_psm:xen_support_msi = -1

I have been able to format a ZVOL container from a VM 3 times while other activity is going on in the system, and it's working. I think performance is down a bit, but it's still acceptable. More importantly, it does so without killing the server. I would get the stall every time I tried this test before. So at least 1 case seems to be helped by doing this. I'll watch the server over the next few days to see if it stays improved. He mentioned that there is a fix being worked on for MSI handling in XVM that might make it into b129 and could fix this problem.

Carson Gaspar
carson@taltos.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 8:04 PM   in response to: ttabbal

Travis Tabbal wrote:
> I have a possible workaround. Mark Johnson <Mark dot Johnson at sun dot com> has
> been emailing me today about this issue and he proposed the
> following:
>
>> You can try adding the following to /etc/system, then rebooting...
>> set xpv_psm:xen_support_msi = -1

I am also running XVM, and after modifying /etc/system and rebooting, my
zpool scrub test is running along merrily with no hangs so far, where
usually I would expect to see several by now.

Can the other folks who have seen this please test and report back? I'd
hate to think we solved it only to discover there were overlapping bugs.

Fingers crossed, and many thanks to those who have worked to track this
down!

--
Carson


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 24, 2009 10:34 AM   in response to: Carson Gaspar

> Travis Tabbal wrote:
> > I have a possible workaround. Mark Johnson
> <Mark dot Johnson at sun dot com> has
> > been emailing me today about this issue and he
> proposed the
> > following:
> >
> >> You can try adding the following to /etc/system,
> then rebooting...
> >> set xpv_psm:xen_support_msi = -1
>
> I am also running XVM, and after modifying
> /etc/system and rebooting, my
> zpool scrub test is running along merrily with no
> hangs so far, where
> usually I would expect to see several by now.
>
> Can the other folks who have seen this please test
> and report back? I'd
> hate to think we solved it only to discover there
> were overlapping bugs.
>
> Fingers crossed, and many thanks to those who have
> worked to track this
> down!


Nice to see we have one confirmed report that things are working. Hopefully we get a few more! Even if it's just a workaround until a real fix makes it in, it gets us running.

James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 24, 2009 10:47 AM   in response to: ttabbal

Thank you to all who've provided data about this. I've updated
the bugs mentioned earlier and I believe we can now make progress
on diagnosis.

The new synopsis (should show up on b.o.o tomorrow) is as follows:

6894775 mpt's msi support is suboptimal with xVM




James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


acheal

Posts: 28
From:

Registered: 6/24/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 24, 2009 1:15 PM   in response to: James C. McPher...
Attachment config.tar.gz (47.2 K)

>
> Thank you to all who've provided data about this.
> I've updated
> the bugs mentioned earlier and I believe we can now
> make progress
> on diagnosis.
>
> The new synopsis (should show up on b.o.o tomorrow)
> is as follows:
>
> 6894775 mpt's msi support is suboptimal with xVM
>

FYI, as the original submitter of 6894775 I can tell you that we don't use XVM so the title change is misleading; we are using a simple physical server with a LSI3801E attached to 2 JBODs. My case has really gone nowhere and the original engineer in charge of it only made a few suggestions. Namely:

- Use the latest OS build (we've tried all versions from b118, when CIFS became a usable feature, and up and it made no difference)
- Use the latest FW available from LSI for the card and leave it at default settings (which we were already doing)
- Try disabling IOMMU via /etc/system or rootnex.conf entry, in case it interferes with MPT (which we tried and it didn't help)

So, we are still looking for the root cause of this problem. I have attached all of our config, as you requested, including our system file and interrupt listing. I even updated to b127 to try and keep the comparison as "apples-to-apples" as possible.

In summary, here is what we've tried (to no avail):
- Throttling IO to vdevs using zfs_vdev_max_pending=10 and zfs_scrub_limit=1
- Using older and newer LSI firmware releases
- Using all "stable" builds from b118 up
- Using the older LSI itmpt driver instead of mpt

Note that we have multiple systems with this exact same config and we can replicate the problem on all of them; the only real requirement is to have about a TB or so of data in the zpool so that the scrub can work for at least 5 minutes before the timeouts etc. begin. The more data on the zpool, the more IO the scrub generates and therefore the faster the problem starts appearing.

If you need anything else, please ask as I've been banging my head against a wall with this problem for months now.

James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 29, 2009 7:34 PM   in response to: acheal

Adam Cheal wrote:
>> Thank you to all who've provided data about this. I've updated the
>> bugs mentioned earlier and I believe we can now make progress on
>> diagnosis.
>>
>> The new synopsis (should show up on b.o.o tomorrow) is as follows:
>>
>> 6894775 mpt's msi support is suboptimal with xVM
>>
>
> FYI, as the original submitter of 6894775 I can tell you that we don't
> use XVM so the title change is misleading; we are using a simple physical
> server with a LSI3801E attached to 2 JBODs. My case has really gone
> nowhere and the original engineer in charge of it only made a few
> suggestions. Namely:


Hi Adam,
thanks for this info. I've talked with my colleagues in Beijing (since
I'm in Beijing this week) and we'd like you to try disabling MSI/MSI-X
for your mpt instances. In /etc/system, add

set mpt:mpt_enable_msi = 0

then regen your boot archive and reboot.

I've added this to the public comments field of the CR, and removed
the reference to xVM from the synopsis - hopefully the mail gateway
will send your copy reasonably soon :-)


Best regards,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


acheal

Posts: 28
From:

Registered: 6/24/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 29, 2009 10:05 PM   in response to: James C. McPher...

> Hi Adam,
> thanks for this info. I've talked with my colleagues
> in Beijing (since
> I'm in Beijing this week) and we'd like you to try
> disabling MSI/MSI-X
> for your mpt instances. In /etc/system, add
>
> set mpt:mpt_enable_msi = 0
>
> then regen your boot archive and reboot.
>

I had already done this at Mark Johnson's request, though I had just added:

set mpt:mpt_enable_msi=0
set mptsas:mptsas_enable_msi=0

...to the /etc/system file and did a full reboot. I didn't know I had to regen the boot archive manually for those new settings to take effect: how/why would I do this? The fact the IO rate changed for me during the test indicated that the new settings had "taken". Long story short, I still had the problems after making this change though they took longer to appear.

Longer story:

After making the change, rebooting and starting a scrub on the pool, I watched iostat for hints of trouble (i.e. error column changes). The IO rate to the disk was definitely slower after this change, with individual disks never getting more than 50% busy and 2 active commands. About three hours later, the read errors/bus resets started to appear. I assume the longer delay before errors was just because the reduced IO was putting less of a strain on the driver/hardware.

Let me know if you want me to refine this test or any other diagnostics that would help you out.

James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 29, 2009 10:29 PM   in response to: acheal

Adam Cheal wrote:
>> Hi Adam,
>> thanks for this info. I've talked with my colleagues
>> in Beijing (since
>> I'm in Beijing this week) and we'd like you to try
>> disabling MSI/MSI-X
>> for your mpt instances. In /etc/system, add
>>
>> set mpt:mpt_enable_msi = 0
>>
>> then regen your boot archive and reboot.
>>
>
> I had already done this at Mark Johnson's request, though I had just added:
>
> set mpt:mpt_enable_msi=0
> set mptsas:mptsas_enable_msi=0
>
> ...to the /etc/system file and did a full reboot. I didn't know I had to
> regen the boot archive manually for those new settings to take effect:
> how/why would I do this? The fact the IO rate changed for me during the test
> indicated that the new settings had "taken". Long story short, I still had
> the problems after making this change though they took longer to appear.

I thought you had just set

set xpv_psm:xen_support_msi = -1

which is different, because that sets the xen_support_msi variable
which lives inside the xpv_psm module.

Setting mptsas:* will have no effect on your system if you do not
have an mptsas card installed. The mptsas cards are not generally
available yet (they're 2nd generation), so I would be surprised if
you had one.
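Editor's sketch: with several `set module:variable = value` tunables now mentioned in the thread, written both with and without spaces around `=`, a quick check of what /etc/system actually contains can avoid confusion. The Python below is an illustration only, not an official tool; `parse_system_file` is a hypothetical helper, and it assumes comment lines start with `*` and handles only `set` lines.

```python
import re

# "set module:variable = value" or "set variable=value"; the module prefix
# is optional and whitespace around "=" may or may not be present.
SET_RE = re.compile(r"^set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)$")


def parse_system_file(text):
    """Return {(module, variable): value} for every 'set' line.

    module is None when the line has no 'module:' prefix.
    """
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # blank line or comment
        match = SET_RE.match(stripped)
        if match:
            module, variable, value = match.groups()
            settings[(module, variable)] = value
    return settings


# Sample input built from the settings quoted in this thread.
sample = """\
* MSI-related tunables discussed on zfs-discuss
set mpt:mpt_enable_msi = 0
set mptsas:mptsas_enable_msi=0
set xpv_psm:xen_support_msi = -1
"""

if __name__ == "__main__":
    for (module, variable), value in parse_system_file(sample).items():
        print(f"{module}:{variable} -> {value}")
```

Reading the real file with `open("/etc/system").read()` in place of `sample` would list which of the tunables are set on a given machine; whether a setting has any effect still depends on the named module being loaded, as noted above for mptsas.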

> Longer story:
>
> After making the change, rebooting and starting a scrub on the pool I
> watched iostat for hints of trouble (i.e. error column changes). The IO
> rate to the disk was definitely slower after this change, with individual
> disks never getting more than 50% busy and 2 active commands. About three
> hours later, the read errors/bus resets started to appear. I assume the
> longer delay before errors was just because the reduced IO was putting
> less of a strain on the driver/hardware. Let me know if you want me to
> refine this test or any other diagnostics that would help you out.

I think that's sufficient to go on for the moment, thank you.


cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


acheal

Posts: 28
From:

Registered: 6/24/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 29, 2009 10:47 PM   in response to: James C. McPher...

>
> I thought you had just set
>
> set xpv_psm:xen_support_msi = -1
>
> which is different, because that sets the
> xen_support_msi variable
> which lives inside the xpv_psm module.
>
> Setting mptsas:* will have no effect on your system
> if you do not
> have an mptsas card installed. The mptsas cards are
> not generally
> available yet (they're 2nd generation), so I would be
> surprised if
> you had one.

No...I had set the other two variables after Mark contacted me offline to do some testing mainly to verify the problem was, indeed, not xVM specific. I had added the mptsas line as well, as per his recommendations, because I wasn't sure if there was some crossover between it and using the MPT driver for a SAS card. Thanks for clearing that up though...obviously we don't need it for the LSI3801E we are using.

Can you explain the "regen the boot archive" request in more detail though? This has me wondering if there were additional steps we needed to take when testing out other /etc/system tweaks, such as the vdev queue limitation. I want to make sure we eliminate sources of the problem, if possible, as the chain of possible blame is still quite long right now.

James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 29, 2009 11:09 PM   in response to: acheal

Adam Cheal wrote:
>> I thought you had just set
>>
>> set xpv_psm:xen_support_msi = -1
>>
>> which is different, because that sets the
>> xen_support_msi variable
>> which lives inside the xpv_psm module.
>>
>> Setting mptsas:* will have no effect on your system
>> if you do not
>> have an mptsas card installed. The mptsas cards are
>> not generally
>> available yet (they're 2nd generation), so I would be
>> surprised if
>> you had one.
>
> No...I had set the other two variables after Mark contacted me offline to
> do some testing mainly to verify the problem was, indeed, not xVM specific.
> I had added the mptsas line as well, as per his recommendations, because I
> wasn't sure if there was some crossover between it and using the MPT driver
> for a SAS card. Thanks for clearing that up though...obviously we don't need
> it for the LSI3801E we are using.


Ah, ok



> Can you explain the "regen the boot archive" request in more detail
> though? This has me wondering if there were additional steps we needed to
> take when testing out other /etc/system tweaks, such as the vdev queue
> limitation. I want to make sure we eliminate sources of the problem, if
> possible, as the chain of possible blame is still quite long right now.

The reboot command should have automatically run bootadm update-archive
for you. I just have a habit of running it by hand whenever I change a
driver or /etc/system, to make sure that I have an up-to-date boot archive
from that point in time onwards.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


mrj

Posts: 590
From: US

Registered: 3/9/05
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 5:29 AM   in response to: James C. McPher...


James C. McPherson wrote:
> Adam Cheal wrote:
>>> I thought you had just set
>>>
>>> set xpv_psm:xen_support_msi = -1
>>>
>>> which is different, because that sets the
>>> xen_support_msi variable
>>> which lives inside the xpv_psm module.
>>>
>>> Setting mptsas:* will have no effect on your system
>>> if you do not
>>> have an mptsas card installed. The mptsas cards are
>>> not generally
>>> available yet (they're 2nd generation), so I would be
>>> surprised if
>>> you had one.
>>
>> No...I had set the other two variables after Mark contacted me offline to
>> do some testing mainly to verify the problem was, indeed, not xVM specific.
>> I had added the mptsas line as well, as per his recommendations, because I
>> wasn't sure if there was some crossover between it and using the MPT driver
>> for a SAS card. Thanks for clearing that up though...obviously we don't
>> need it for the LSI3801E we are using.
>
> Ah, ok

I think there are two different bugs here...

I think there is a problem with MSIs and some variant of mpt
card on xVM. These seem to be showing up as timeout errors.
Disabling MSIs for this adapter seems to fix this problem.
For folks seeing this problem, what HBA adapter are you using
that you see this problem on?

The second problem is that there appears to be a problem with mpt
and the LSI3801E. These seem to be command errors rather than
timeouts? Not sure if they were seeing timeouts too?
I believe the following things were stated across the thread..
Can folks confirm/deny each of these?

o The problems are not seen with Sun's version of this card

o The problems are not seen with LSI's version of the driver

o The problems are seen with the latest LSI firmware

o Errors still occur if MSIs are disabled. They seem to
occur less frequently. Were timeouts being seen before
MSIs were disabled? If so, are timeouts being seen after
MSIs were disabled? i.e. are there two different problems
here, timeouts, and then commands failing.. If so, maybe
these are unrelated problems?

For folks seeing the command failures, what are you using for
a jbod? Is there firmware on the jbod, and if so, is it up
to date? Have you tried a different jbod? Are the command failures
tied to a subset of the disks or do they affect all of them? Have you
tried a different length cable?



Thanks,


MRJ




acheal

Posts: 28
From:

Registered: 6/24/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 7:45 AM   in response to: mrj

> Can folks confirm/deny each of these?
>
> o The problems are not seen with Sun's version of
> this card

On the Thumper x4540 (which uses 6 of the same LSI 1068E controller chips), we do not see this problem. Then again, it uses a one-to-one mapping of controller PHY ports to internal disks; no JBODs or expanders here. 1 controller per 8 disks is a much more performance-oriented ratio, so I don't expect to see the problem there.

We have not tried using a Sun re-branded LSI controller for the external JBODs. My understanding is that Sun uses a custom firmware derived from LSI's public offerings.

> o The problems are not seen with LSI's version of
> the driver

Incorrect. We have tried using the latest itmpt driver from LSI and see the same problem.

> o The problems are seen with the latest LSI
> firmware

Correct. We've tried Phase 15, 16 and 17. All exhibit the same problem.

> o Errors still occur if MSIs are disabled. They
> seem to
> occur less frequently. Were timeouts being seen
> before
> MSIs were disabled? If so, are timeouts being
> seen after
> MSIs were disabled?

Correct: disabling the MSIs did not resolve the problem, although it did slow the IO on the system down enough to delay the onset of the problem by a few hours. Timeouts were being seen before disabling MSIs and they are usually coupled with bus resets, which is standard behaviour for the sd driver if an IO is timed out for too long, I believe.

> folks seeing the command failures, what are you using
> for
> a jbod? Is there firmware on the jbod, and if so, is
> it up
> to date? Have you tried a different jbod? Are the
> command failures
> tied to a subset of the disks or effect all of them?
> Have you tried
> a different length cable?
>

We use Dell DCS J23 JBODs (23-disk enclosures), 2 per LSI3801E, fully populated with enterprise-grade WD SATA drives. We've tried both R105 and R106 firmware (both latest production-grade firmware) on them with no difference. The problem affects all disks in the JBOD(s), not specific ones. Usually one or two disks start to time out, which snowballs into all of them when the bus resets. We have 15 of these systems running, all with the same config using 2-foot external cables; changing cables doesn't help. We have not tried using a different JBOD.

- Adam

ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 9:27 AM   in response to: mrj

> o The problems are not seen with Sun's version of
> this card

Unable to comment as I don't have a Sun card here. If Sun would like to send me one, I would be willing to test it compared to the cards I do have. I'm running Supermicro USAS-L8i cards (LSI 1068e based).

> o The problems are not seen with LSI's version of
> the driver

I haven't tried it as comments from Sun staff here have indicated that it's not a good idea.

> o The problems are seen with the latest LSI
> firmware

Yes. When I checked, the LSI site was listing the version I see at boot.

> o Errors still occur if MSIs are disabled.

I haven't seen any command timeout errors since disabling MSIs. I tried using the command to disable MSI only for the MPT driver, but I get a similar error from the NVidia driver at that point as it has my boot drives. It seems to me that the issue has more to do with MSIs than with the drivers themselves. I do have a scrub scheduled for 12/1, so I can check the logs after that to see whether the scrub triggers it. My other tests have not triggered the issue since disabling MSIs. I'm currently running with "set xpv_psm:xen_support_msi = -1".

I am not using any jbod enclosures. My setup uses SAS to SATA breakout cables and connects directly to the drives. I have tried different cables and lengths. The timeouts affected drives in a seemingly random fashion. I would get timeouts on both controllers and every drive over time.

I have never had command errors here. Just the timeouts.

ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 1, 2009 7:59 AM   in response to: ttabbal

Just an update, my scrub completed without any timeout errors in the log. XVM with MSI disabled globally.

Carson Gaspar
carson@taltos.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 2:14 PM   in response to: mrj


Mark Johnson wrote:

> I think there are two different bugs here...
>
> I think there is a problem with MSIs and some variant of mpt
> card on xVM. These seem to be showing up as timeout errors.
> Disabling MSIs for this adapter seems to fix this problem.
> For folks seeing this problem, what HBA adapter are you using
> that you see this problem on?

I have just confirmed that adding "set mpt:mpt_enable_msi = 0" and
removing "set xpv_psm:xen_support_msi = -1" in /etc/system also fixes
the problem for me.
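
To check which of the two settings mentioned in this thread is actually active on a given box, something like the following Python sketch may help. It assumes the usual /etc/system conventions (comment lines start with "*", tunables are written as "set module:variable = value"); the function names and the sample text are made up for illustration.

```python
import re

# Matches lines such as "set mpt:mpt_enable_msi = 0".
SET_RE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")

# The two MSI workarounds discussed in this thread and the values people used.
THREAD_TUNABLES = {
    ("mpt", "mpt_enable_msi"): "0",
    ("xpv_psm", "xen_support_msi"): "-1",
}

def parse_system_tunables(text):
    """Return {(module, variable): value} for every uncommented 'set' line."""
    tunables = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # '*' starts a comment line in /etc/system
        m = SET_RE.match(line)
        if m:
            module, var, value = m.groups()
            tunables[(module, var)] = value
    return tunables

def active_workarounds(text):
    """List the thread's MSI workarounds that are set to the discussed value."""
    found = parse_system_tunables(text)
    return sorted(
        f"{module}:{var}"
        for (module, var), wanted in THREAD_TUNABLES.items()
        if found.get((module, var)) == wanted
    )

# Hypothetical /etc/system contents: one setting active, one commented out.
sample = """\
* MSI workaround from zfs-discuss
set xpv_psm:xen_support_msi = -1
* set mpt:mpt_enable_msi = 0
"""
print(active_workarounds(sample))
```

On a real system you would pass in the contents of /etc/system instead of the sample string; remember that changes to that file only take effect after a reboot.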

I am running an LSI branded SAS3081E-R with directly attached SATA
disks. See my previous email for full system info.

For the record, I am _not_ seeing the other command error problem. But I
don't have an external chassis, expanders, etc.

--
Carson



Jeremy Kitchen
kitchen@scriptkitche...
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 3:32 PM   in response to: Carson Gaspar



On Nov 30, 2009, at 2:14 PM, Carson Gaspar wrote:

> Mark Johnson wrote:
>
>> I think there are two different bugs here...
>> I think there is a problem with MSIs and some variant of mpt
>> card on xVM. These seem to be showing up as timeout errors.
>> Disabling MSIs for this adapter seems to fix this problem.
>> For folks seeing this problem, what HBA adapter are you using
>> that you see this problem on?
>
> I have just confirmed that adding "set mpt:mpt_enable_msi = 0" and
> removing "set xpv_psm:xen_support_msi = -1" in /etc/system also
> fixes the problem for me.
>
> I am running an LSI branded SAS3081E-R with directly attached SATA
> disks. See my previous email for full system info.
>
> For the record, I am _not_ seeing the other command error problem.
> But I don't have an external chassis, expanders, etc.

I'm using a LSI Logic SAS1068E controller (according to lsiutil) and
NOT using XVM and seeing these problems. I just put 'set
mpt:mpt_enable_msi = 0' into my /etc/system and rebooted, we'll see
how that works. If this seems to solve the issue for this machine
I'll do it on all of our others as well (we currently have about 12 of
these running, about 1.5PB of data online on them) and report back.

I just basically wanted to chime in and mention that I'm also having
these problems but NOT using XVM as other people are.

-Jeremy




Carson Gaspar
carson@taltos.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 6:08 PM   in response to: Carson Gaspar


Carson Gaspar wrote:
> Mark Johnson wrote:
>
>> I think there are two different bugs here...
>>
>> I think there is a problem with MSIs and some variant of mpt
>> card on xVM. These seem to be showing up as timeout errors.
>> Disabling MSIs for this adapter seems to fix this problem.
>> For folks seeing this problem, what HBA adapter are you using
>> that you see this problem on?
>
> I have just confirmed that adding "set mpt:mpt_enable_msi = 0" and
> removing "set xpv_psm:xen_support_msi = -1" in /etc/system also fixes
> the problem for me.
>
> I am running an LSI branded SAS3081E-R with directly attached SATA
> disks. See my previous email for full system info.
>
> For the record, I am _not_ seeing the other command error problem. But I
> don't have an external chassis, expanders, etc.

I spoke too soon, I came back home to:

Nov 30 15:52:19 gandalf.taltos.org scsi: [ID 107833 kern.warning]
WARNING: /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 30 15:52:19 gandalf.taltos.org Disconnected command timeout for
Target 11
Nov 30 15:52:19 gandalf.taltos.org scsi: [ID 365881 kern.notice]
/pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 30 15:52:19 gandalf.taltos.org Log info 0x31140000 received for
target 11.
Nov 30 15:52:19 gandalf.taltos.org scsi_status=0x0,
ioc_status=0x8048, scsi_state=0xc
Nov 30 15:52:19 gandalf.taltos.org scsi: [ID 365881 kern.notice]
/pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 30 15:52:19 gandalf.taltos.org Log info 0x31140000 received for
target 11.
Nov 30 15:52:19 gandalf.taltos.org scsi_status=0x0,
ioc_status=0x8048, scsi_state=0xc
Nov 30 15:52:19 gandalf.taltos.org scsi: [ID 365881 kern.notice]
/pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 30 15:52:19 gandalf.taltos.org Log info 0x31130000 received for
target 11.
Nov 30 15:52:19 gandalf.taltos.org scsi_status=0x0,
ioc_status=0x8048, scsi_state=0xc

Reverting to "set xpv_psm:xen_support_msi = -1" now...
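
For anyone comparing before/after a change, a rough way to tally these warnings per target is sketched below in Python. It only assumes the message wording shown in the log excerpt above; the function name is invented for the example, and the regex tolerates the line wrapping that mail clients introduce.

```python
import re
from collections import Counter

# "\s+" lets the match span a line break inserted by mail wrapping.
TIMEOUT_RE = re.compile(r"Disconnected command timeout for\s+Target\s+(\d+)")

def count_timeouts(log_text):
    """Count 'Disconnected command timeout' warnings per target number."""
    return Counter(int(target) for target in TIMEOUT_RE.findall(log_text))

# Excerpt copied from the log above.
sample = """\
Nov 30 15:52:19 gandalf.taltos.org scsi: [ID 107833 kern.warning]
WARNING: /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Nov 30 15:52:19 gandalf.taltos.org Disconnected command timeout for
Target 11
"""

for target, hits in sorted(count_timeouts(sample).items()):
    print(f"target {target}: {hits} timeout(s)")
```

Feeding it the system log (typically /var/adm/messages on Solaris, which is an assumption about your setup) gives a quick picture of whether the timeouts cluster on particular targets or wander across all of them.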

--
Carson


James C. McPher...
jmcp@opensolaris.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 6:36 PM   in response to: mrj


Hi all,
I believe it's an accurate summary of the emails on this thread
over the last 18 hours to say that

(1) disabling MSI support in xVM makes the problem go away

(2) disabling MSI support on bare metal when you only have
disks internal to your host (no jbods), makes the problem
go away
(several reports of this)

(3) disabling MSI support on bare metal when you have a non-Sun
jbod (and cables) does _not_ make the problem go away.
(several reports of this)

(4) the problem is not seen with a Sun-branded jbod and cables
(only one report of this)

(5) problem is seen with both mpt(7d) and itmpt(7d).

(6) mpt(7d) without MSI support is sloooooow.


For those who've been suffering this problem and who have non-Sun
jbods, could you please let me know what model of jbod and cables
(including length thereof) you have in your configuration.

For those of you who have been running xVM without MSI support,
could you please confirm whether the devices exhibiting the problem
are internal to your host, or connected via jbod. And if via jbod,
please confirm the model number and cables.

Please note that Jianfei and I are not making assumptions about the
root cause here, we're just trying to nail down specifics of what
seems to be a likely cause.


thankyou in advance,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


Chad Cantwell
chad@iomail.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 7:06 PM   in response to: James C. McPher...


Hi,

I just posted a summary of a similar issue I'm having with non-Sun hardware.
For the record, it's in a Chenbro RM41416 chassis with 4 chenbro SAS backplanes
but no expanders (each backplane is 4 disks connected by SFF-8087 cable). Each
of my LSI brand SAS3081E PCI-E cards is connected to two backplanes with 1m
SFF-8087 (both ends) cables. For more details if they are important see my
other post. I haven't tried the MSI workaround yet (although I'm not sure what
MSI is) but from what I've read the workaround won't fix the issues in my case
with non-sun hardware.

Thanks,
Chad

On Tue, Dec 01, 2009 at 12:36:33PM +1000, James C. McPherson wrote:
> Hi all,
> I believe it's an accurate summary of the emails on this thread
> over the last 18 hours to say that
>
> (1) disabling MSI support in xVM makes the problem go away
>
> (2) disabling MSI support on bare metal when you only have
> disks internal to your host (no jbods), makes the problem
> go away
> (several reports of this)
>
> (3) disabling MSI support on bare metal when you have a non-Sun
> jbod (and cables) does _not_ make the problem go away.
> (several reports of this)
>
> (4) the problem is not seen with a Sun-branded jbod and cables
> (only one report of this)
>
> (5) problem is seen with both mpt(7d) and itmpt(7d).
>
> (6) mpt(7d) without MSI support is sloooooow.
>
>
> For those who've been suffering this problem and who have non-Sun
> jbods, could you please let me know what model of jbod and cables
> (including length thereof) you have in your configuration.
>
> For those of you who have been running xVM without MSI support,
> could you please confirm whether the devices exhibiting the problem
> are internal to your host, or connected via jbod. And if via jbod,
> please confirm the model number and cables.
>
> Please note that Jianfei and I are not making assumptions about the
> root cause here, we're just trying to nail down specifics of what
> seems to be a likely cause.
>
>
> thankyou in advance,
> James C. McPherson
> --
> Senior Kernel Software Engineer, Solaris
> Sun Microsystems
> http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 9:08 PM   in response to: James C. McPher...

> (1) disabling MSI support in xVM makes the problem go
> away

Yes here.


> (6) mpt(7d) without MSI support is sloooooow.


That does seem to be the case. It's not so bad overall, and at least the performance is consistent. It would be nice if this were improved.


> For those of you who have been running xVM without
> MSI support,
> could you please confirm whether the devices
> exhibiting the problem
> are internal to your host, or connected via jbod. And
> if via jbod,
> please confirm the model number and cables.


Direct connect. The drives are in hot-swap racks, but they are passive devices. No expanders or anything like that in there. In case it's interesting, the racks are StarTech HSB430SATBK devices. I'm using SAS to SATA breakout cables to connect them. I have tried different lengths with the same result.

picker

Posts: 137
From: US

Registered: 12/1/05
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 30, 2009 10:30 PM   in response to: ttabbal



> Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via an SFF-8087 cable
> StarTech HSB430SATBK

hmm, both are passive backplanes with one SATA tunnel per link...
no SAS Expanders (LSISASx36) like those found in SuperMicro or J4x00 with 4 links per connection.
wonder if there is a LSI issue with too many links in HBA mode?

Rob



ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 1, 2009 8:18 AM   in response to: picker

Perhaps. As I noted though, it also occurs on the onboard NVidia SATA controller when MSI is enabled. I had already put a line in /etc/system to disable MSI for that controller per a forum thread and it worked great. I'm now running with all MSI disabled via XVM as the mpt controller is giving me the same problems. As it's happening on totally different controller types, cable types, and drive types, I have to go with software issues. I know for sure the NVidia issue didn't come up on 2009.06. It makes the system take forever to boot, so it's very noticeable. It happened when I first went to dev builds, I want to say it was around b118. I updated for better XVM support for newer Linux kernels.

The NVidia controller causes similar log messages. Command timeouts. Disabling MSIs fixes it as well. Motherboard is an Asus M4N82 Deluxe. NVIDIA nForce 980a SLI chipset.

I expect the root cause is the same, and I would guess that something is causing the drivers to miss or not receive some interrupts. However, my programming at this level is limited, so perhaps I'm misdiagnosing the issue.

calvinm

Posts: 3
From:

Registered: 12/5/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 5, 2009 2:47 PM   in response to: James C. McPher...

I found this thread after fighting the same problem in Nexenta which uses the OpenSolaris kernel from b104. Thankfully, I think I have (for the moment) solved my problem.

Background:

I have an LSI 3081e-R (1068E based) adapter which experiences the same disconnected command timeout error under relatively light load. This card connects to a Supermicro chassis using 2 MiniSAS cables to redundant expanders that are attached to 18 SAS drives. The card ran the latest IT firmware (1.29?).

This server is a new install, and even installing from the CD to two disks in a mirrored ZFS root would randomly cause the disconnect error. The system remained unresponsive until after a reboot.

I tried the workarounds mentioned in this thread, namely using "set mpt:mpt_enable_msi = 0" and "set xpv_psm:xen_support_msi = -1" in /etc/system. Once I added those lines, the system never really became unresponsive; however, partial read and partial write messages littered dmesg. At one point there appeared to be a disconnect error (cannot confirm) that the system recovered from.

Eventually, I became desperate and flashed the IR (Integrated Raid) firmware over the top of the IT firmware. Since then, I have had no errors in dmesg of any kind.

I even removed the workarounds from /etc/system and still have had no issues. The mpt driver is exceptionally quiet now.

I'm interested to know if anyone who has a 1068E based card is having these problems using the IR firmware, or if they all seem to be IT (initiator target) related.

Chad Cantwell
chad@iomail.org
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 5, 2009 3:37 PM   in response to: calvinm


I was under the impression that the problem affecting most of us was introduced much later than b104,
sometime between ~114 and ~118. When I first started using my LSI 3081 cards, they had the IR firmware
on them, and it caused me all kinds of problems. The disks showed up but I couldn't write to them, I
believe. Eventually I found that I needed the IT firmware for it to work properly, which is what I
have used ever since, but maybe some builds do work with IR firmware? I remember, then, when I was
originally trying to set them up with the IR firmware, Opensolaris saw my two cards as one device,
whereas with the IT firmware they were always mpt0 and mpt1. Could also be the IR works with one card
but not well when two cards are combined...

Chad

On Sat, Dec 05, 2009 at 02:47:55PM -0800, Calvin Morrow wrote:
> I found this thread after fighting the same problem in Nexenta which uses the OpenSolaris kernel from b104. Thankfully, I think I have (for the moment) solved my problem.
>
> Background:
>
> I have an LSI 3081e-R (1068E based) adapter which experiences the same disconnected command timeout error under relatively light load. This card connects to a Supermicro chassis using 2 MiniSAS cables to redundant expanders that are attached to 18 SAS drives. The card ran the latest IT firmware (1.29?).
>
> This server is a new install, and even installing from the CD to two disks in a mirrored ZFS root would randomly cause the disconnect error. The system remained unresponsive until after a reboot.
>
> I tried the workarounds mentioned in this thread, namely using "set mpt:mpt_enable_msi = 0" and "set xpv_psm:xen_support_msi = -1" in /etc/system. Once I added those lines, the system never really became unresponsive, however there were partial read and partial write messages that littered dmesg. At one point there appeared to be a disconnect error ( can not confirm ) that the system recovered from.
>
> Eventually, I became desperate and flashed the IR (Integrated Raid) firmware over the top of the IT firmware. Since then, I have had no errors in dmesg of any kind.
>
> I even removed the workarounds from /etc/system and still have had no issues. The mpt driver is exceptionally quiet now.
>
> I'm interested to know if anyone who has a 1068E based card is having these problems using the IR firmware, or if they all seem to be IT (initiator target) related.
> --
> This message posted from opensolaris.org


calvinm

Posts: 3
From:

Registered: 12/5/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 11, 2009 9:52 PM   in response to: Chad Cantwell

Can't say when the problems may have been introduced, but it looks like we've got my report (b104) and another report from b111 of issues with the 1068E.

The IR firmware seems to do some sort of internal multipathing while the IT firmware doesn't do any. With the IT firmware, I enabled multipathing (in-between frequent reboots due to the card hanging under load) and had very good performance with the round-robin policy.

When I reverted to the IR firmware to fix my issues, I found that round-robin slowed down my disk benchmark tests (IOZone). My guess is that it was trying to use pathways that the IR firmware had marked standby internally. Disabling the round-robin policy almost doubled throughput under the IR firmware.

seibert

Posts: 1
From: US

Registered: 12/6/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 6, 2009 1:33 PM   in response to: calvinm

I've spent all weekend fighting this problem on our storage server after installing a ZFS log device, and your suggestion fixed it!

I also have an LSI 3081E-R adapter (B3 revision) connected to a SAS expander backplane with 7 drives on it. None of the /etc/system options mentioned in this thread worked, but after switching to the LSI 1.29 IR firmware, the controller no longer hangs constantly. I'm using OpenSolaris 2009.06 (snv_111b).

I still see warnings like this in dmesg occasionally:

Dec 6 14:15:04 wid scsi: [ID 365881 kern.info] /pci@0,0/pci1166,142@9/pci1000,3080@0 (mpt0):
Dec 6 14:15:04 wid Log info 0x31123000 received for target 26.
Dec 6 14:15:04 wid scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

but I have not observed any problems yet while I scrub the zpool.
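
The status triplet in those warnings can be pulled apart with a few lines of Python. This is only a sketch: the constants below (0x8000 as a "log info available" flag, 0x7FFF as the status mask, and the names for 0x0048 and 0x004B) are my assumptions about LSI's MPI header definitions and should be checked against the mpi.h shipped with the driver before relying on them.

```python
import re

# Assumed values from LSI's MPI headers; verify against mpi.h.
LOG_INFO_AVAILABLE = 0x8000  # assumed MPI_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE
IOCSTATUS_MASK = 0x7FFF      # assumed MPI_IOCSTATUS_MASK

# Assumed names for the two base codes that appear in this thread.
KNOWN_CODES = {
    0x0048: "SCSI_TASK_TERMINATED",
    0x004B: "SCSI_IOC_TERMINATED",
}

FIELD_RE = re.compile(r"(scsi_status|ioc_status|scsi_state)=(0x[0-9a-fA-F]+)")

def decode_status_line(line):
    """Split an mpt status line into integers and separate the ioc_status flag bit."""
    fields = {name: int(value, 16) for name, value in FIELD_RE.findall(line)}
    ioc = fields["ioc_status"]
    base = ioc & IOCSTATUS_MASK
    return {
        **fields,
        "log_info_available": bool(ioc & LOG_INFO_AVAILABLE),
        "ioc_base": base,
        "ioc_name": KNOWN_CODES.get(base, "unknown"),
    }

# Status line copied from the warning above.
line = "scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc"
print(decode_status_line(line))
```

Under those assumptions, 0x804b here and the 0x8048 seen earlier in the thread differ only in the base code, which would suggest two different termination paths rather than two unrelated errors.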

calvinm

Posts: 3
From:

Registered: 12/5/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Dec 11, 2009 9:44 PM   in response to: seibert

I'm glad I was able to help someone.

My card is also a 3081E-R (B3). It shipped to me with the IR firmware, and I immediately flashed the IT firmware on it because I had heard it was supposed to be (better, faster, stable, shiny) with Solaris and ZFS.

The motherboard on that server has an LSI 2008 (SAS 2.0) chip onboard. My hope is that I'll be able to upgrade in the future with less excitement than I had trying to get the 1068E to work.

Jeremy Kitchen
kitchen@scriptkitche...
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 23, 2009 9:58 PM   in response to: ttabbal



On Nov 23, 2009, at 7:28 PM, Travis Tabbal wrote:

> I have a possible workaround. Mark Johnson <Mark dot Johnson at sun dot com>
> has been emailing me today about this issue and he proposed the
> following:
>
>> You can try adding the following to /etc/system, then rebooting...
>> set xpv_psm:xen_support_msi = -1

would this change affect systems not using XVM? we are just using
these as backup storage.

Thanks!

-Jeremy


ttabbal

Posts: 76
From: US

Registered: 9/4/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Nov 24, 2009 10:33 AM   in response to: Jeremy Kitchen

>
> On Nov 23, 2009, at 7:28 PM, Travis Tabbal wrote:
>
> > I have a possible workaround. Mark Johnson
> <Mark dot Johnson at sun dot com>
> > has been emailing me today about this issue and he
> proposed the
> > following:
> >
> >> You can try adding the following to /etc/system,
> then rebooting...
> >> set xpv_psm:xen_support_msi = -1
>
> would this change affect systems not using XVM? we
> are just using
> these as backup storage.

Probably not. Are you seeing the issue without XVM installed? We had one other user report that the issue went away when they removed XVM, so I had thought it wouldn't affect other users. If you are getting the same issue without XVM, there may be overlapping bugs in play. Someone at Sun might be able to tell you how to disable MSI on the controller. Someone told me how to do it for the NVidia SATA controller when there was a bug in that driver. I would think there is a way to do it for the MPT driver.

mark0001

Posts: 43
From:

Registered: 1/26/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Jan 25, 2010 8:42 PM   in response to: ttabbal

I can produce the timeout error on multiple, similar servers.
These are storage servers, so no zones or gui running.
Hardware:
Supermicro X7DWN with AOC-USASLP-L8i controller
E1 (single port) backplanes (16 & 24 bay)
(LSILOGICSASX28 A.0 and LSILOGICSASX36 A.1)
up to 36 1 TB WD SATA disks

This server has 2 x quad-core Intel CPUs and 16 GB RAM.
Disks: WD 1 TB c4t12d0 to c4t47d0 as single raidz pool. (6 disks per set)
Running dev 131.
I see problem on 2009.06 as well.

I note that the latest AOC-USASLP-L8i firmware is LSI Rev 1.26.00.00, which I believe does not support MSI. (working on Supermicro to update the firmware)

I have an LSI controller to swap for the AOC-USASLP-L8i with latest firmware, which I can retest with.

After a few hours of light load, no errors appear unless I initiate a scrub.

iostat -X -e -n
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 fd0
    0   9   0   9 c5t1d0
    0   0   0   0 c4t8d0
    0   0   0   0 c4t9d0
    0   0   0   0 c4t12d0
    0   0   0   0 c4t13d0
    0   0   0   0 c4t14d0
    0   0   0   0 c4t15d0
    0   0   0   0 c4t16d0
    0   0   0   0 c4t17d0
    0   0   0   0 c4t18d0
    0   0   0   0 c4t19d0
    0   0   0   0 c4t20d0
    0   0   0   0 c4t21d0
    0   0   0   0 c4t22d0
    0   0   0   0 c4t23d0
    0   0   0   0 c4t30d0
    0   1  10  11 c4t31d0
    0   2  20  22 c4t32d0
    0   0   0   0 c4t33d0
    0   0   0   0 c4t34d0
    0   0   0   0 c4t35d0
    0   0   0   0 c4t36d0
    0   0   0   0 c4t37d0
    0   0   0   0 c4t38d0
    0   0   0   0 c4t39d0
    0   0   0   0 c4t40d0
    0   0   0   0 c4t41d0
    0   0   0   0 c4t42d0
    0   1  10  11 c4t43d0
    0   3  31  34 c4t44d0
    0   1  10  11 c4t45d0
    0   2  20  22 c4t46d0
    0   1  10  11 c4t47d0
    0   0   0   0 c4t48d0
    0   0   0   0 c4t49d0
    0   0   0   0 c4t50d0
    0   0   0   0 c4t51d0
    0   0   0   0 c4t52d0

In this instance, all errors are on the same (24 bay) backplane.
I have also had them on the 16 bay backplane with this 2 chassis configuration.
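
With this many devices it is easier to filter the error summary than to read it by eye. The Python sketch below keeps only the rows with a non-zero total; it assumes the five-column layout shown above (s/w, h/w, trn, tot, device), and the function name is made up for the example.

```python
def devices_with_errors(iostat_text):
    """Return (device, soft, hard, transport, total) for rows with a non-zero total."""
    rows = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Skip headers such as '---- errors ---' and 's/w h/w trn tot device'.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[:4]):
            continue
        soft, hard, transport, total = map(int, fields[:4])
        if total:
            rows.append((fields[4], soft, hard, transport, total))
    return rows

# A few rows copied from the output above.
sample = """\
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 fd0
    0   9   0   9 c5t1d0
    0   1  10  11 c4t31d0
    0   3  31  34 c4t44d0
"""

for device, soft, hard, transport, total in devices_with_errors(sample):
    print(f"{device}: hard={hard} transport={transport} total={total}")
```

Because the parser splits on whitespace, it works equally well on column-aligned output and on output that has been flattened to single spaces by a mail client.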

The problem becomes more of a pain when drives drop off for a short period, then reconnect and resilver, or occasionally just stop until a reboot or hot plug.
The robustness of ZFS certainly helps keep things running.


Mark.

nipsy

Posts: 7
From: US

Registered: 12/1/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Jan 26, 2010 9:12 AM   in response to: mark0001

I would definitely be interested to see if the newer firmware fixes the problem for you. I have a very similar setup to yours, and finally forcing the firmware flash to 1.26.00 of my on-board LSI 1068E on a SuperMicro H8DI3+ running snv_131 seemed to address the issue. I'm still waiting to see if that's entirely the case, but so far so good (even with a LOT of disk activity to clean up the very messy zpool which had resulted from all of the disk timeouts/errors previously).

mark0001

Posts: 43
From:

Registered: 1/26/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Jan 26, 2010 9:55 PM   in response to: nipsy

An update:

Well things didn't quite turn out as expected.
I decided to follow the path right to the disks for clues.
Digging into the adapter diags with LSIUTIL revealed an Adapter Link issue.

Adapter Phy 5: Link Down
Invalid DWord Count 5,969,575
Running Disparity Error Count 5,782,581
Loss of DWord Synch Count 0
Phy Reset Problem Count 0

After replacing cables, I eventually replaced the controller and then things really went pear shaped.
It turns out the backplane, which ran without major issues on the Supermicro controller, refused to operate with the LSI SAS3081E-R (with latest code): the card wouldn't initialise, links only ran at 1.5 Gb/s, most disks were offline, etc.
Replacing the backplane (whole jbod) fixed the Adapter Link problems, but timeouts still occur when scrubbing.
Oh look, the dev names moved. They used to start at c4t8d0, but it has "made it right" all by itself. EYHOBG!

iostat -X -e -n
  s/w h/w trn tot device
    0   0   0   0 c4t0d0
    0   0   0   0 c4t1d0
    0   2   8  10 c4t2d0
    0   3  18  21 c4t3d0
    0   0   0   0 c4t4d0
    0   2  12  14 c4t5d0
    0   1   8   9 c4t6d0
    0   2  15  17 c4t7d0
    0   0   0   0 c4t8d0
    0   0   0   0 c4t9d0
    0   0   0   0 c4t10d0
    0   0   0   0 c4t11d0
    0   0   0   0 c4t12d0
    0   0   0   0 c4t13d0
    0  11  84  95 c4t41d0
    0   8  62  70 c4t42d0
    0  10  72  82 c4t43d0
    0  19 147 166 c4t44d0
    0  12 102 114 c4t45d0
    0  19 145 164 c4t46d0
    0  13 108 121 c4t47d0
    0   7  62  69 c4t48d0
    0  14 113 127 c4t49d0
    0  11  96 107 c4t50d0
    0  11  91 102 c4t51d0
    0   8  64  72 c4t52d0
    0  13 108 121 c4t53d0
    0  11 106 117 c4t54d0
    0  10  82  92 c4t55d0
    0  10  88  98 c4t56d0
    0  12  85  97 c4t57d0
    0   6  38  44 c4t58d0
and
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
c4t2d0 ONLINE 0 0 1 25.5K repaired
c4t55d0 ONLINE 0 0 4 102K repaired

I do note that after these errors, there are no errors in the lsi adapter diag logs.

Data disks are all new WD10EARS.

If the OpenSolaris and ZFS combination wasn't so robust, this would have ended badly.

Next step will be trying different timeout settings on the controller and see if that helps.

P.S. I have a client with a "suspect", nearly full, 20 TB zpool to try to scrub, so this is a big issue for me. A resilver of a 1 TB disk takes up to 40 hours, so I expect a scrub to take a week (or two), and at present it would probably result in multiple disk failures.

Mark.

mark0001

Posts: 43
From:

Registered: 1/26/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 1, 2010 6:04 PM   in response to: mark0001

The results are in:

My timeout issue is definitely the WD10EARS disks.
Although differences in the error rate were seen with different LSI firmware revisions, the errors persisted. The more disks on the expander, the higher the number with iostat errors.
This then causes zpool issues (disk failures, resilvering etc.)

I replaced 24 of them with ST32000542AS (f/w CC34), and the problem departed with the WD disks.
Full scrub of 1.5 TB, not one error seen anywhere.

WD has chosen to cripple their consumer grade disks when used in quantities greater than one.

I'll now need to evaluate alternative suppliers of low cost disks for low end high volume storage.

Mark.

typo ST32000542AS not NS


Message was edited by: mark0001

sbreden

Posts: 202
From:

Registered: 3/26/08
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 2, 2010 5:58 AM   in response to: mark0001

> My timeout issue is definitely the WD10EARS disks.
> WD has chosen to cripple their consumer grade disks
> when used in quantities greater than one.
>
> I'll now need to evaluate alternative suppliers of low
> cost disks for low end high volume storage.
>
> Mark.
>
> typo ST32000542AS not NS

This was the conclusion I came to. I'm also on the hunt for some decent consumer-priced drives for use in a ZFS RAID setup, and I created a thread to try to find which ones people recommend. See here:
http://opensolaris.org/jive/thread.jspa?threadID=121871

So far, I'm inclined to think that the Samsung HD154UI 1.5TB, and possibly the Samsung HD203WI 2TB drives might be the most reliable choices at the moment, based on the data in that thread and checking user reports.

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

tonmaus

Posts: 18
From:

Registered: 1/27/10
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 2, 2010 6:52 AM   in response to: sbreden

Hi Simon,

I am running 5 WD20EADS in a raidz-1+spare on an AHCI controller without any problems I could relate to TLER or head parking.

Cheers,

Tonmaus

sbreden

Posts: 202
From:

Registered: 3/26/08
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 2, 2010 8:08 AM   in response to: tonmaus

Hi Tonmaus,

That's good to hear. Which revision are they: 00R6B0 or 00P8B0? It's marked on the drive top.

From what I've seen elsewhere, people seem to be complaining about the newer 00P8B0 revision, so I'd be interested to hear from you. These revision numbers are listed in the first post of the thread below, and refer to the 1.5TB model (WD15EADS), but might also apply to the WD20EADS model.

http://opensolaris.org/jive/thread.jspa?threadID=121871

Cheers,
Simon

nipsy

Posts: 7
From: US

Registered: 12/1/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 2, 2010 9:17 AM   in response to: sbreden

> That's good to hear. Which revision are they: 00R6B0
> or 00P8B0? It's marked on the drive top.

Interesting. I wonder if this is also the issue with the 01U1B0 2.0TB drives? I have 24 WD2002FYPS-01U1B0 drives under OpenSolaris on an LSI 1068E controller that have weird timeout issues, and 4 more on a 3ware 9650SE (a 9650SE-4LPML to be exact) under Linux which often show up with a status of DEVICE-ERROR even though the drive otherwise appears to be fine. I think all of this could be explained by these drives not responding to the controller in a timely manner.

I sent in a request to Western Digital support pointing out this portion of the thread from a few posts above. Hopefully some kind of response is forthcoming.

sbreden

Posts: 202
From:

Registered: 3/26/08
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 2, 2010 11:22 AM   in response to: nipsy

If I'm not mistaken then the WD2002FYPS is an enterprise model: WD RE4-GP (RAID Edition, Green Power), so you almost certainly have the firmware that allows (1) the idle time before spindown to be modified with WDIDLE3.EXE and (2) the error reporting time to be modified with WDTLER.EXE.

So I expect your drives are spinning down to save power as they are Green series drives. But if this spindown is causing odd things to happen you could see if it's possible to increase the spindown time with WDIDLE3.EXE.

Let us know if you get any news back from WD.

Cheers,
Simon

nipsy

Posts: 7
From: US

Registered: 12/1/09
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 3, 2010 9:17 AM   in response to: nipsy

Looks like I got the textbook response from Western Digital:
---
Western Digital technical support only provides jumper configuration and physical installation support for hard drives used in systems running the Linux/Unix operating systems. For setup questions beyond physical installation of your Western Digital hard drive, please contact the vendor of your Linux/Unix operating system.

Please install the drive in an XP or Vista system to test the drive following the information below.
<snipped>
---

etc. So it doesn't look like I'll get any kind of reasonable response personally.

tonmaus

Posts: 18
From:

Registered: 1/27/10
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 3, 2010 12:39 PM   in response to: sbreden

Hi Simon,

They are the new revision.
I also got the impression that the complaints you mentioned were mainly related to embedded Linux systems, probably running LVM / mda (Thecus, QNAP, ...). Other reports I have seen related to typical hardware RAIDs. I don't think the situation is comparable to ZFS.
I have also followed some TLER-related threads here. I am not sure there was ever a clear statement on whether consumer-drive error correction affects a ZFS pool or not. Statistically, if it did, we should have a lot of "restrictive TLER settings helped me to solve my ZFS pool issues" success reports here. That all points to isolated issues with firmware bugs or similar rather than to a systematic issue, doesn't it?

Cheers,

Tonmaus

sbreden

Posts: 202
From:

Registered: 3/26/08
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 3, 2010 3:11 PM   in response to: tonmaus

Hi Tonmaus,

> they are the new revision.

OK.

> I got the impression as well that the complaints you
> reported were mainly related to embedded Linux
> systems probably running LVM / mda. (thecus, Qnap,
> ....) Other reports I had seen related to typical HW
> raids. I don't think the situation is comparable to
> ZFS.

That could be the case, but maybe I'll have to create a specific thread along the lines of "Anyone having success / problems with WD Green drives?" to get a bit more detail. There were Mac users also complaining -- see the WDC links in the "Best 1.5TB drives" thread.

> I have also followed some TLER related threads here.
> I am not sure if there was ever a clear assertion if
> consumer drive related Error correction will affect a
> ZFS pool or not. Statistically we should have a lot
> of "restrictive TLER settings helped me to solve my
> ZFS pool issues" success reports here, if it were.

IIRC, Richard said he thought that a troublesome non-RAID drive would affect MTTR and not reliability. I.e. you'll have to manually intervene if a consumer drive causes the system to hang, and replace it, whereas the RAID edition drives will probably report the error quickly and then ZFS will rewrite the data elsewhere, and thus maybe not kick the drive.

So it sounds preferable to have TLER in operation, if one can find a consumer-priced drive that allows it, or just take the hit and go with whatever non-TLER drive you choose and expect to have to manually intervene if a drive plays up. That is OK for a home user, who is not too affected, but not good for businesses that need things recovered quickly.

> That all rather points to singular issues with
> firmware bugs or similar than to a systematic issue,
> doesn't it?

I'm not sure. Some people in the WDC threads seem to report problems with pauses during media streaming etc. These drives have 32MB+ caches, so if one of the complaining users were streaming an AVI file, 32MB could theoretically equate to around 5 to 10 minutes of compressed video. In that time their Green drive may have gone into sleep mode, and it may take a while to revive itself (spin up, unpark heads etc.). Maybe this only happens when the 32MB+ cache is empty, at which point the drive loads another 32MB into cache, and so on? That could explain that class of complaint, but it's just a guess. There were also reports of stalling writes, so I don't know. If this sleeping is the cause of their complaints, then maybe some 'wake up' task could be run every 5 seconds or so to keep the drives from sleeping? I think someone else in this thread reported using something like this earlier.
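
To put rough numbers on that guess, here is a small sketch. It is not a tested fix. `playback_seconds` just divides the cache size by an assumed video bitrate, and `keep_awake` is a hypothetical helper that reads one block from a random offset of a device or file every few seconds so the drive sees regular I/O. The path, size, interval and block size are all assumptions, and reads that are served from an OS cache (such as the ZFS ARC) may never reach the disk at all.

```python
import os
import random
import time


def playback_seconds(cache_mb, bitrate_mbit_s):
    """Seconds of video that fit in a cache of cache_mb megabytes."""
    return cache_mb * 8 / bitrate_mbit_s


def read_random_block(path, size_bytes, block=512):
    """Read one block at a random block-aligned offset; return bytes read."""
    offset = random.randrange(0, max(size_bytes // block, 1)) * block
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return len(os.read(fd, block))
    finally:
        os.close(fd)


def keep_awake(path, size_bytes, interval_s=5.0, rounds=None):
    """Read a random block every interval_s seconds; rounds=None loops forever."""
    done = 0
    while rounds is None or done < rounds:
        read_random_block(path, size_bytes)
        done += 1
        time.sleep(interval_s)
    return done
```

At an assumed 1 Mbit/s, `playback_seconds(32, 1.0)` gives 256 seconds, a little over 4 minutes; 5 to 10 minutes would correspond to a bitrate of roughly 0.85 down to 0.43 Mbit/s.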

Cheers,
Simon

tonmaus

Posts: 18
From:

Registered: 1/27/10
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Posted: Feb 4, 2010 2:08 AM   in response to: sbreden

Hi Simon

> I.e. you'll have to manually intervene
> if a consumer drive causes the system to hang, and
> replace it, whereas the RAID edition drives will
> probably report the error quickly and then ZFS will
> rewrite the data elsewhere, and thus maybe not kick
> the drive.

IMHO the relevant aspects are whether ZFS is able to give an accurate account of cache flush status, and whether it even realizes that a drive is not responsive. That being said, I have not seen a specific report that ZFS would kick green drives at random or in a pattern, as the poor SoHo storage enclosure users experience all the time.

>
> So it sounds preferable to have TLER in operation, if
> one can find a consumer-priced drive that allows it,
> or just take the hit and go with whatever non-TLER
> drive you choose and expect to have to manually
> intervene if a drive plays up. OK for home user where
> he is not too affected, but not good for businesses
> which need to have something recovered quickly.

One point about TLER is that two error correction schemes compete when you run a consumer drive on an active RAID controller that has its own mechanisms. When you run ZFS on a RAID controller, contrary to the best practice recommendations, an analogous question arises. On the other hand, if you run a green consumer drive on a dumb HBA, I wouldn't know what is wrong with it in the first place.
As for manual interventions, the only one I am aware of would be to re-attach a single drive. That is not an option if you are really affected, like those miserable Thecus N7000 users who see the entire array of only a handful of drives drop out within hours, over and over again, or who do not even get to finish formatting the stripe set.
The dire consequences of the rumoured TLER problems lead me to believe that there would be many more, and quite specific, reports in this place if this were a systematic issue with ZFS. Other than that, we are operating outside supported specs when running consumer level drives in large arrays. That, at least, is the position of Seagate and WD so far.

>
> > That all rather points to singular issues with
> > firmware bugs or similar than to a systematic
> issue,
> > doesn't it?
>
> I'm not sure. Some people in the WDC threads seem to
> report problems with pauses during media streaming
> etc.

This was again for SoHo storage enclosures - not for ZFS, right?

> when the
> 32MB+ cache is empty, then it loads another 32MB into
> cache etc and so on?

I am not sure any current disk has such simplistic cache management that it relies on completely cycling the buffer content, let alone for reads that belong to a single file (a disk is basically agnostic of files). Moreover, such buffer management would be completely useless for a striped array. I don't know much more about what a disk cache does either, but I am afraid that direction is probably not helpful for understanding certain phenomena people have reported.

I think we are currently seeing quite a lot of change in disk storage, in which many established assumptions are being abandoned while backwards compatibility is not always taken care of. SAS 6G (will my controller really work in a PCIe 1.1 slot?) and 4k clusters are certainly only the prominent examples. In such times it is probably truer than ever that one should fall back on established technologies, including biting the bullet of a cost premium on occasion.

Best regards

Tonmaus



