Replies:
30
-
Last Post:
Nov 11, 2009 1:23 PM
by: kebabber
[zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 1, 2009 9:27 PM
|
|
I've sent this to the driver list as well, but since the zfs folks tend to be intimately involved with the marvell driver stack, I figured I'd give you guys a shot too. Does anyone happen to know if there was a driver change with build 126?
I had a pool made up of two 5+1 raidz vdevs. I moved all the data off
temporarily, rebuilt it as a single 10+2 raidz2 vdev, and am in the process
of moving all the data back.
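For anyone comparing the two layouts, here is a rough Python sketch of the arithmetic. The `Layout` class and its field names are made up for illustration; the only facts taken from the post are the two layouts themselves, and the tolerance figure is the usual raidz/raidz2 rule of one or two parity disks per vdev.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of a pool built from identical raidz vdevs."""
    name: str
    vdevs: int         # number of top-level raidz vdevs
    data_disks: int    # data disks per vdev
    parity_disks: int  # parity disks per vdev (1 = raidz, 2 = raidz2)

    @property
    def total_disks(self) -> int:
        return self.vdevs * (self.data_disks + self.parity_disks)

    @property
    def usable_disks(self) -> int:
        return self.vdevs * self.data_disks

    @property
    def guaranteed_tolerance(self) -> int:
        # Failures survivable no matter which disks fail: losing more than
        # `parity_disks` disks inside a single vdev loses the whole pool.
        return self.parity_disks


old = Layout("2 x (5+1) raidz", vdevs=2, data_disks=5, parity_disks=1)
new = Layout("1 x (10+2) raidz2", vdevs=1, data_disks=10, parity_disks=2)

for layout in (old, new):
    print(f"{layout.name}: {layout.total_disks} disks, "
          f"{layout.usable_disks} data disks, "
          f"survives any {layout.guaranteed_tolerance} failure(s)")
```

Both layouts use 12 disks and leave 10 for data; the difference is that the raidz2 layout survives any two failures, which is why the pool further down shows DEGRADED rather than faulted with two devices removed.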
I've had two drives "fail" in the last 3 hours that have been running
fine for over a year, and presented absolutely no issues moving the
data out of the original zpool. My first inclination is this is a
driver issue.
I'm currently running 2xMarvell SAT2-MV8 SATA controllers. 6 disks on the first controller, 7 on the second (one hot spare).
zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed after 1h38m with 0 errors on Sun Nov 1 18:42:16 2009
config:
        NAME                        STATE      READ WRITE CKSUM
        fserv                       DEGRADED      0     0     0
          raidz2-0                  DEGRADED      0     0     0
            c8t0d0                  ONLINE        0     0     0
            c8t1d0                  ONLINE        0     0     0
            spare-2                 DEGRADED      0     0 2.83M
              c8t2d0                REMOVED       0     0     0
              c7t6d0                ONLINE        0     0     0  35.6G resilvered
            c8t3d0                  ONLINE        0     0     0
            c8t4d0                  ONLINE        0     0     0
            c8t5d0                  ONLINE        0     0     0
            c7t0d0                  ONLINE        0     0     0
            c7t1d0                  ONLINE        0     0     0
            c7t2d0                  ONLINE        0     0     0
            c7t3d0                  ONLINE        0     0     0
            c7t4d0                  REMOVED       0     0     0
            c7t5d0                  ONLINE        0     0     0
        spares
          c7t6d0                    INUSE     currently in use
Nov 1 16:21:34 fserv sata: [ID 801593 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6:
Nov 1 16:21:34 fserv SATA device at port 2 - device failed
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv Command failed to complete...Device is gone
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 16:21:40 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:40 fserv drive offline
Nov 1 16:21:40 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:40 fserv drive offline
Nov 1 16:21:40 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:40 fserv drive offline
Nov 1 16:21:40 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:40 fserv drive offline
Nov 1 17:03:38 fserv marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx2:device on port 4 failed to reset
Nov 1 17:04:08 fserv sata: [ID 801593 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4:
Nov 1 17:04:08 fserv SATA device at port 4 - device failed
Nov 1 17:04:08 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:08 fserv Command failed to complete...Device is gone
Nov 1 17:04:08 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:08 fserv drive offline
Nov 1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:09 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:09 fserv drive offline
Nov 1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:09 fserv drive offline
Nov 1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:09 fserv drive offline
Nov 1 17:04:09 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:09 fserv drive offline
Nov 1 18:31:59 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:31:59 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 18:32:11 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:32:11 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 18:35:00 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:35:00 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 18:35:12 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:35:12 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 18:35:21 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:35:21 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 18:38:36 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 18:38:36 fserv SYNCHRONIZE CACHE command failed (5)
Nov 1 21:06:31 fserv pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 2 irq 0xe vector 0x44 ioapic 0x4 intin 0xe is bound to cpu 3
Nov 1 21:06:31 fserv pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 3 irq 0xf vector 0x44 ioapic 0x4 intin 0xf is bound to cpu 0
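A log like the one above is easier to read once it is tallied per device. The following is only a sketch: the function name `count_messages` and the `SAMPLE` excerpt are illustrative, and it assumes every `scsi: ... WARNING: <path> (sdN):` header line is followed by exactly one line carrying the message text, as in this excerpt.

```python
import re
from collections import Counter

# Matches the sd instance, e.g. "(sd26):", at the end of a WARNING header line.
SD_RE = re.compile(r"\((sd\d+)\):\s*$")


def count_messages(lines):
    """Count (device, message) pairs in sd warning output.

    Assumption: each header line ending in "(sdN):" is followed by one
    line of the form "<month> <day> <time> <host> <message>".
    """
    counts = Counter()
    current = None
    for line in lines:
        match = SD_RE.search(line)
        if match:
            current = match.group(1)
            continue
        if current is not None:
            parts = line.split(None, 4)  # month, day, time, host, message
            if len(parts) == 5:
                counts[(current, parts[4].strip())] += 1
            current = None
    return counts


SAMPLE = """\
Nov 1 16:21:34 fserv sata: [ID 801593 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6:
Nov 1 16:21:34 fserv SATA device at port 2 - device failed
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv Command failed to complete...Device is gone
Nov 1 16:21:34 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6/disk@2,0 (sd26):
Nov 1 16:21:34 fserv drive offline
Nov 1 17:04:08 fserv scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4/disk@4,0 (sd30):
Nov 1 17:04:08 fserv drive offline
"""

counts = count_messages(SAMPLE.splitlines())
for (device, message), n in sorted(counts.items()):
    print(f"{device}: {message} x{n}")
```

The `sata:` warning lines carry no sd instance, so this sketch skips them; they would need their own pattern if the port numbers matter.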
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 2, 2009 5:34 AM
in response to: tcook
|
|
I have the same card and might have seen the same problem. Yesterday I upgraded to b126 and started to migrate all my data to 8 disc raidz2 connected to such a card. And suddenly ZFS reported checksum errors. I thought the drives were faulty. But you suggest the problem could have been the driver? I also noticed that one of the drives had resilvered a small amount, just like yours.
I now use b125 and there are no checksum errors. So, is there a bug in the new b126 driver?
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 3, 2009 10:14 AM
in response to: kebabber
|
|
On Mon, Nov 2, 2009 at 6:34 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo dot com> wrote:
> I have the same card and might have seen the same problem. Yesterday I upgraded to b126 and started to migrate all my data to 8 disc raidz2 connected to such a card. And suddenly ZFS reported checksum errors. I thought the drives were faulty. But you suggest the problem could have been the driver? I also noticed that one of the drives had resilvered a small amount, just like yours.
> I now use b125 and there are no checksum errors. So, is there a bug in the new b126 driver?
Can any of you Sun folks comment on this?
--Tim
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 6, 2009 8:38 AM
in response to: tcook
|
|
No one has noticed this?
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 6, 2009 11:38 AM
in response to: kebabber
|
|
> Nov 1 16:21:34 fserv Command failed to complete...Device is gone
> Nov 1 17:04:08 fserv Command failed to complete...Device is gone
kinda looks like drive FW or cable issue... if it was a driver issue it might be a lost command or reset for phase resync.
> driver change with build 126?
not for the SATA framework, but for HBAs there is: http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001
Rob
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 6, 2009 12:10 PM
in response to: picker
|
|
Right now I do not dare to use builds later than 125, because in b126 the problem showed up. Maybe a coincidence, maybe not. But I think it is best to not use b126 or later, until someone has confirmed there are no driver changes.
So, to confirm, there are no driver changes in b126 for the marvell88sx2, right? So I should safely be able to use b126 and later?
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 6, 2009 3:39 PM
in response to: kebabber
|
|
On Fri, Nov 6, 2009 at 2:10 PM, Orvar Korvar <knatte_fnatte_tjatte at yahoo dot com> wrote:
> Right now I do not dare to use builds later than 125, because in b126 the problem showed up. Maybe a coincidence, maybe not. But I think it is best to not use b126 or later, until someone has confirmed there are no driver changes.
> So, to confirm, there are no driver changes in b126 for the marvell88sx2, right? So I should safely be able to use b126 and later?
Let me know what your results are if you decide to upgrade. I've already replaced both drives that were having issues. I'll do the cables later, but I'm still having a hard time believing my cables magically went bad right when I upgraded to build 126. The new drives, a different brand and model, have the same issues the old drives did.
And from what I can tell, I'm getting checksum errors through the roof on the replace as well...
  pool: fserv
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 0h34m, 22.60% done, 1h57m to go
config:
        NAME                        STATE      READ WRITE CKSUM
        fserv                       DEGRADED      0     0     0
          raidz2-0                  DEGRADED      0     0     0
            c8t0d0                  ONLINE        0     0     0
            c8t1d0                  ONLINE        0     0     0
            spare-2                 DEGRADED      0     0     0
              14340903866396142118  UNAVAIL       0     0     0  was /dev/dsk/c8t2d0s0
              c7t6d0                ONLINE        0     0     0
            c8t3d0                  REMOVED       0     0     0
            c8t4d0                  ONLINE        0     0     0
            c8t5d0                  ONLINE        0     0     0
            c7t0d0                  ONLINE        0     0     0
            c7t1d0                  ONLINE        0     0     0
            c7t2d0                  ONLINE        0     0     0
            c7t3d0                  ONLINE        0     0     0
            replacing-10            DEGRADED      0     0  816K
              15401866802517339500  FAULTED       0     0     0  was /dev/dsk/c7t4d0s0/old
              c7t4d0                ONLINE        0     0     0  52.3G resilvered
            c7t5d0                  ONLINE        0     0     0
        spares
          c7t6d0                    INUSE     currently in use
--Tim
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 7, 2009 2:27 AM
in response to: tcook
|
|
Ok, so you changed drives and you still see errors? Are the drives brand new or used? What kind of drives, which brand? 2TB? And if you reboot into an earlier build such as b125 you don't see any errors, right?
Right now I am running b125. I don't dare to run b126, if your observation is correct. Could you just rip out the drivers from b125? I could post those drivers here for you, if you tell me which files you need. Then you can see whether it is the drivers causing the problem or not.
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 7, 2009 9:06 AM
in response to: kebabber
|
|
On Sat, Nov 7, 2009 at 4:27 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo dot com> wrote:
> Ok, so you changed drives and you still see errors? Are the drives brand new or used? What kind of drives, which brand? 2TB? And if you reboot into an earlier build such as b125 you dont see any errors, right?
Brand new. I've tried both 1TB hitachi and 1.5TB seagate (not the "bad" ones).
I can't boot into an older version because the last version I had was b118 which doesn't have zfs version 19 support. I've been looking to see if there's a way to downgrade via IPS but that's turned up a lot of nothing.
> Right now I am running b125. I dont dare to run b126, if your observation is correct. Could you just rip out drivers from b125? I could post them drivers here for you, if you tell me which files you need. And then you can see if it is the drivers causing the problem or not.
It's tough to say what exactly is causing the problems. I would imagine ripping something like sd from the older version would break more than it would fix.
--Tim
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 7, 2009 10:02 AM
in response to: tcook
|
|
Hi Tim and all,
I believe you are saying that marvell88sx2 driver error messages started in build 126, along with new disk errors in RAIDZ pools.
Is this correct? If so, please send me the following information:
1. Hardware you are running
2. If you are also seeing new disk errors in your RAIDZ pools include your zpool status output.
I'm not the right person to be diagnosing driver-level issues but I will investigate.
Thanks,
Cindy
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 7, 2009 10:44 AM
in response to: cindys
|
|
On Sat, Nov 7, 2009 at 12:02 PM, Cindy Swearingen <Cindy dot Swearingen at sun dot com> wrote:
> Hi Tim and all,
> I believe you are saying that marvell88sx2 driver error messages started
> in build 126, along with new disk errors in RAIDZ pools.
> Is this correct? If so, please send me the following information:
Yes.
> 1. Hardware you are running
Motherboard: SUPERMICRO MBD-H8DAE-2-O
2xAMD Opteron 22xx CPUs (forget the exact model, they're 2010 MHz)
8GB Crucial ECC DDR2 memory
2xSupermicro AOC-SAT2-MV8 SATA adapters
Supermicro SC932T-R760B case with 15xSATA passthrough backplane
I also have an nvidia video card in it, but I'm not sure of the model, and doubt it has any role in this troubleshooting.
> 2. If you are also seeing new disk errors in your RAIDZ pools
> include your zpool status output.
Well, I can give you a current one, but I've done about a hundred things troubleshooting, so it isn't representative of what the issues were a few days ago. I'm still trying to figure out why it's choking on any drive I put into c8t2d0. It's stopped generating errors on c7t4d0, but I haven't changed a thing with that slot outside of stopping the zpool replace and restarting it a few times... which is also extremely odd to me.
r00t@fserv:~$ zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 2h53m with 0 errors on Fri Nov 6 22:09:08 2009
config:
        NAME                        STATE      READ WRITE CKSUM
        fserv                       DEGRADED      0     0     0
          raidz2-0                  DEGRADED      0     0     0
            c8t0d0                  ONLINE        0     0     0
            c8t1d0                  ONLINE        0     0     0
            spare-2                 DEGRADED      0     0     0
              14340903866396142118  UNAVAIL       0     0     0  was /dev/dsk/c8t2d0s0
              c7t6d0                ONLINE        0     0     0
            c8t3d0                  ONLINE        0     0     0  2.68G resilvered
            c8t4d0                  ONLINE        0     0     0
            c8t5d0                  ONLINE        0     0     0
            c7t0d0                  ONLINE        0     0     0
            c7t1d0                  ONLINE        0     0     0
            c7t2d0                  ONLINE        0     0     0
            c7t3d0                  ONLINE        0     0     0
            c7t4d0                  ONLINE        0     0     0  231G resilvered
            c7t5d0                  ONLINE        0     0     0
        spares
          c7t6d0                    INUSE     currently in use
errors: No known data errors
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 7, 2009 1:33 PM
in response to: cindys
|
|
I saw the same checksum error problem when I booted into b126. I haven't dared try b126 again; I use b125 now, without problems. Here is my hardware:
Intel Q9450 + P45 Gigabyte EP45-DS3P motherboard + ATI 4850
I have the same AOC SATA controller card, and some Samsung Spinpoint F1 1TB drives. Brand new.
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 9, 2009 12:51 PM
in response to: kebabber
|
|
Hi,
I can't find any bug-related issues with marvell88sx2 in b126.
I looked over Dave Hollister's shoulder while he searched for marvell in his webrevs of this putback and nothing came up:
> driver change with build 126?
not for the SATA framework, but for HBAs there is:
http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001
I will find a thumper, load build 125, create a raidz pool, and upgrade to b126.
I'll also send the error messages that Tim provided to someone who works in the driver group.
Thanks,
Cindy
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 9, 2009 11:59 PM
in response to: cindys
|
|
On Mon, Nov 9, 2009 at 2:51 PM, Cindy Swearingen <Cindy dot Swearingen at sun dot com> wrote:
> Hi,
> I can't find any bug-related issues with marvell88sx2 in b126.
> I looked over Dave Hollister's shoulder while he searched for
> marvell in his webrevs of this putback and nothing came up:
> > driver change with build 126?
> not for the SATA framework, but for HBAs there is:
> http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001
> I will find a thumper, load build 125, create a raidz pool, and
> upgrade to b126.
> I'll also send the error messages that Tim provided to someone who
> works in the driver group.
> Thanks,
> Cindy
I tried the build 125 driver and it didn't make a difference. The odd part I've just noticed is that it's port 4 on both cards that has been giving me issues. I guess it's possible that's just a coincidence or bad luck.
I've grabbed the b125 ISO from genunix and am going to try booting off the livecd to see if it produces different results.
--Tim
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 10, 2009 1:25 AM
in response to: cindys
|
|
Does this mean that there are no driver changes in marvell88sx2 between b125 and b126? If there are no driver changes, then it means that we both had extremely bad luck with our drives, because we both had checksum errors? And my discs were brand new.
How probable is this? Something is weird here. What is your opinion on this? Should we agree that there was a hardware error, and it was just a coincidence?
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 10, 2009 7:56 AM
in response to: kebabber
|
|
Hi Orvar,
Correct, I don't see any marvell88sx2 driver changes between b125 and b126.
So far, only you and Tim are reporting these issues.
Generally, we see bugs filed by the internal test teams if they see similar problems.
I will try to reproduce the RAIDZ checksum errors separately from the marvell88sx2 issue.
Thanks,
Cindy
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 10, 2009 8:55 AM
in response to: kebabber
|
|
On Nov 10, 2009, at 1:25 AM, Orvar Korvar wrote:
> Does this mean that there are no driver changes in marvell88sx2,
> between b125 and b126? If no driver changes, then it means that we
> both had extreme unluck with our drives, because we both had
> checksum errors? And my discs were brand new.
There are other drivers in the software stack that may have changed.
-- richard
> How probable is this? Something is weird here. What is your opinion
> on this? Should we agree that there was a hardware error, and it was
> just a coincidence?
> --
> This message posted from opensolaris.org
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 10, 2009 3:15 PM
in response to: relling
|
|
On Tue, Nov 10, 2009 at 10:55 AM, Richard Elling <richard dot elling at gmail dot com> wrote:
> On Nov 10, 2009, at 1:25 AM, Orvar Korvar wrote:
> > Does this mean that there are no driver changes in marvell88sx2, between b125 and b126? If no driver changes, then it means that we both had extreme unluck with our drives, because we both had checksum errors? And my discs were brand new.
> There are other drivers in the software stack that may have changed.
> -- richard
> > How probable is this? Something is weird here. What is your opinion on this? Should we agree that there was a hardware error, and it was just a coincidence?
So... I do appear to have reached somewhat of a truce with the system and b126 at the moment. I'm now going through and replacing the last of my old maxtor 300GB drives with brand new hitachi 1TB drives. One thing I'm noticing is a lot of checksum errors being generated during the resilver. Is this normal? Furthermore, since I see "no known data errors", is it safe to assume it's all being corrected, and I'm not losing any data? I still do have a separate copy of this data on a box at work that should be completely consistent... but I will need to re-purpose that storage soon, and will be without a known good backup for a while (I know, I know). I'd rather do a fresh zfs send/receive than find out 6 months from now I lost something.
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h8m, 0.89% done, 15h14m to go
config:
        NAME                        STATE      READ WRITE CKSUM
        fserv                       DEGRADED      0     0     0
          raidz2-0                  DEGRADED      0     0     0
            c8t0d0                  ONLINE        0     0     0
            c8t1d0                  ONLINE        0     0     0
            c8t2d0                  ONLINE        0     0     0
            c8t3d0                  ONLINE        0     0     0
            c8t4d0                  ONLINE        0     0     0
            c8t5d0                  ONLINE        0     0     0
            c7t0d0                  ONLINE        0     0     0
            c7t1d0                  ONLINE        0     0     0
            c7t2d0                  ONLINE        0     0     0
            replacing-9             DEGRADED      0     0  161K
              14274451003165180679  FAULTED       0     0     0  was /dev/dsk/c7t3d0s0/old
              c7t3d0                ONLINE        0     0     0  2.05G resilvered
            c7t4d0                  ONLINE        0     0     0
            c7t5d0                  ONLINE        0     0     0
        spares
          c7t6d0                    AVAIL
errors: No known data errors
--Tim
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 10, 2009 9:01 PM
in response to: tcook
|
|
Anyone? It's up to 7.35M checksum errors and it's rebuilding extremely slowly (as evidenced by the 10 hour time). The errors are only showing on the "replacing-9" line, not the individual drive.
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 6h56m, 39.61% done, 10h34m to go
config:
        NAME                        STATE      READ WRITE CKSUM
        fserv                       DEGRADED      0     0     0
          raidz2-0                  DEGRADED      0     0     0
            c8t0d0                  ONLINE        0     0     0
            c8t1d0                  ONLINE        0     0     0
            c8t2d0                  ONLINE        0     0     0
            c8t3d0                  ONLINE        0     0     0
            c8t4d0                  ONLINE        0     0     0
            c8t5d0                  ONLINE        0     0     0
            c7t0d0                  ONLINE        0     0     0
            c7t1d0                  ONLINE        0     0     0
            c7t2d0                  ONLINE        0     0     0
            replacing-9             DEGRADED      0     0 7.35M
              14274451003165180679  FAULTED       0     0     0  was /dev/dsk/c7t3d0s0/old
              c7t3d0                ONLINE        0     0     0  91.9G resilvered
            c7t4d0                  ONLINE        0     0     0
            c7t5d0                  ONLINE        0     0     0
        spares
          c7t6d0                    AVAIL
errors: No known data errors
--Tim
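To double-check which rows of a status listing actually carry checksum errors, something like the Python sketch below could be run over saved `zpool status` output. It is an illustration only: the function name and the `STATUS` excerpt are made up here, the list of states in the pattern is an assumption that covers just the states seen in this thread, and the counters are compared as text so no guess is made about what the K/M suffixes scale to.

```python
import re

# Device rows look like: <name> <state> <read> <write> <cksum> [note...]
# The state list is an assumption covering the states seen in this thread.
ROW_RE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|REMOVED|UNAVAIL|OFFLINE)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def rows_with_checksum_errors(status_text):
    """Return (name, cksum) for every row whose CKSUM column is not "0"."""
    hits = []
    for line in status_text.splitlines():
        match = ROW_RE.match(line)
        if match and match.group(5) != "0":
            hits.append((match.group(1), match.group(5)))
    return hits


STATUS = """\
NAME STATE READ WRITE CKSUM
fserv DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
c7t2d0 ONLINE 0 0 0
replacing-9 DEGRADED 0 0 7.35M
14274451003165180679 FAULTED 0 0 0 was /dev/dsk/c7t3d0s0/old
c7t3d0 ONLINE 0 0 0 91.9G resilvered
c7t4d0 ONLINE 0 0 0
spares
c7t6d0 AVAIL
"""

hits = rows_with_checksum_errors(STATUS)
for name, cksum in hits:
    print(f"{name}: CKSUM {cksum}")
```

On this excerpt only the `replacing-9` row is reported; the leaf devices underneath it all show a CKSUM of 0.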
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 11, 2009 5:24 AM
in response to: tcook
|
|
On Nov 11, 2009, at 12:01 AM, Tim Cook wrote:
> On Tue, Nov 10, 2009 at 5:15 PM, Tim Cook <tim at cook dot ms> wrote:
>> One thing I'm noticing is a lot of checksum errors being generated during the resilver.
>> Is this normal?
> Anyone? It's up to 7.35M checksum errors and it's rebuilding extremely
> slowly (as evidenced by the 10 hour time). The errors are only showing on
> the "replacing-9" line, not the individual drive.
I've only replaced a drive once, but it didn't show any checksum errors during the resilver. This was a 2 TB WD Green drive in a mirror pool that had started to show write errors. It was attached to a SuperMicro AOC-SAT2-MV8.
Good luck,
Ware
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 11, 2009 9:23 AM
in response to: tcook
|
|
The checksum errors are fixed in build 128 with:
6807339 spurious checksum errors when replacing a vdev
No; you're not losing any data due to this.
- Eric
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 11, 2009 1:23 PM
in response to: taylor
To: Communities » zfs » discuss
|
|
So he did actually hit a bug? But the bug is not dangerous, as it doesn't destroy data?
But I did not replace any devices and it still showed checksum errors. I think I did a zfs send | zfs receive? I don't remember. But I just copied things back and forth, and the checksum errors showed up. So what does that mean?
|
|
|
|
Posts:
778
From:
Registered:
2/14/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 11, 2009 1:38 AM
in response to: relling
To: Communities » zfs » discuss
|
|
Other drivers in the stack? Which drivers? And have any of them been changed between b125 and b126?
|
|
|
|
Posts:
778
From:
Registered:
2/14/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 3:57 AM
in response to: tcook
To: Communities » zfs » discuss
|
|
"I can't boot into an older version because the last version I had was b118 which doesn't have zfs version 19 support. I've been looking to see if there's a way to downgrade via IPS but that's turned up a lot of nothing."
If someone can tell me which files are needed for the driver, I can extract them from my b125 and post them here for you, so you can try them out. Then we can know whether the problem is in the b126 drivers or not. If the b125 drivers work, we know the problem is in b126. Otherwise there might be some other problem.
Another solution could be to install SXCE b125. There are links to the b125 DVD. And from SXCE you can upgrade to later OpenSolaris builds. I think.
Is it possible to upgrade to a specific build via IPS? When I use the Update Manager, I always upgrade to the latest build. Can I target, say, bXXX? Or is the only way to get bXXX, by installing SXCE?
|
|
|
|
Posts:
877
From:
GB
Registered:
10/24/07
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 6:31 AM
in response to: kebabber
To: Communities » zfs » discuss
|
|
> "I can't boot into an older version because the last version I had was b118 which doesn't have zfs version 19 support. I've been looking to see if there's a way to downgrade via IPS but that's turned up a lot of nothing."
>
> If someone can tell me which files are needed for the driver I can extract them from my b125 and post them here for you, so you can try out. Then we can know if the problem is in b126 drivers or not. If b125 drivers work, we know the problem is in b126. Otherwise there might be some other problem.
>
> Another solution could be that you install SCXE b125. There are links to that DVD b125. And from SCXE you can upgrade to later Opensolaris builds. I think.
>
> Is it possible to upgrade to a specific build via IPS? When I use the Update Manager, I always upgrade to the latest build. Can I target, say, bXXX? Or is the only way to get bXXX, by installing SXCE?
Here are some notes I stole from the list earlier. I think they might be on a wiki somewhere now, but it seems relatively easy to upgrade to a specific version:
Starting from OpenSolaris 2009.06 (snv_111b) active BE.
1) beadm create snv_111b-dev
2) beadm activate snv_111b-dev
3) reboot
4) pkg set-authority -O http://pkg.opensolaris.org/dev opensolaris.org
5) pkg install SUNWipkg
6) pkg list 'entire*'
7) beadm create snv_118
8) beadm mount snv_118 /mnt
9) pkg -R /mnt refresh
10) pkg -R /mnt install entire@0.5.11-0.118
11) bootadm update-archive -R /mnt
12) beadm umount snv_118
13) beadm activate snv_118
14) reboot
Now you have a snv_118 development environment.
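For a different build, steps 7 to 13 above could be generated from the build number. The sketch below is only a dry run that prints the commands for review; it does not execute them. The helper names build_fmri and print_steps are made up for illustration, and it assumes (as step 10 suggests) that the "0.5.11" prefix stays fixed and only the trailing build number changes.

```shell
#!/bin/sh
# Hypothetical helpers, not part of beadm or pkg: they only print text.

# Build the package name used in step 10 for a given build number.
# Assumption: the "0.5.11" prefix is constant across dev builds.
build_fmri() {
    printf 'entire@0.5.11-0.%s\n' "$1"
}

# Print steps 7-13 for the given build; mount point defaults to /mnt.
print_steps() {
    build=$1
    mnt=${2:-/mnt}
    be="snv_$build"
    cat <<EOF
beadm create $be
beadm mount $be $mnt
pkg -R $mnt refresh
pkg -R $mnt install $(build_fmri "$build")
bootadm update-archive -R $mnt
beadm umount $be
beadm activate $be
EOF
}

# Example: show what would be run for build 125.
print_steps 125
```

Check the printed commands against the build you actually want before running any of them by hand, then reboot as in step 14.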
|
|
|
|
Posts:
778
From:
Registered:
2/14/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 7:47 AM
in response to: myxiplx
To: Communities » zfs » discuss
|
|
Great! So if I want another build, for instance b125, I just change step 10?
10) pkg -R /mnt install entire@0.5.11-0.125
Yes?
What is this "0.5.11" thing? Should that be changed too, if I try to install b125? Like "0.5.12-0.125"?
|
|
|
|
Posts:
591
From:
US
Registered:
8/21/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 8:54 AM
in response to: kebabber
|
|
On Sun, Nov 8, 2009 at 9:47 AM, Orvar Korvar <knatte_fnatte_tjatte at yahoo dot com> wrote:
Great! So if I want another build, for instance b125, I just change step 10?
10) pkg -R /mnt install entire@0.5.11-0.125
Yes?
What is this "0.5.11" thing? Should that be changed too, if I try to install b125? Like "0.5.12-0.125"?
|
|
|
|
Posts:
778
From:
Registered:
2/14/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 2:04 PM
in response to: nwsmith
To: Communities » zfs » discuss
|
|
|
|
Ok, here I attached the 64 bit variant. You can try it if you wish and see if the checksum errors disappear.
|
|
|
|
Posts:
778
From:
Registered:
2/14/06
|
|
|
|
Re: [zfs-discuss] marvell88sx2 driver build126
Posted:
Nov 8, 2009 2:05 PM
in response to: kebabber
To: Communities » zfs » discuss
|
|
This is from build 125.
|
|
|
|
|