OpenSolaris

Thread: Checksum errors...

Replies: 4 - Last Post: Jan 4, 2007 11:21 AM by: goo
scrming

Posts: 19
Registered: 10/12/06
Checksum errors...
Posted: Dec 28, 2006 3:49 AM
To: Communities » zfs » discuss

Background:
Large ZFS pool built on a couple of Sun 3511 SATA arrays. RAID-5 is done in the 3511s. ZFS is non-redundant. We have been using this setup for a couple of months now with no issues.

Problem:
Yesterday afternoon we started getting checksum errors. There have been no hardware errors reported at either the Solaris level or the hardware level. 3511 logs are clean. Here is the zpool status:

tsmsun1 - /home/root >zpool status -xv
  pool: z_tsmsun1_pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                                        STATE     READ WRITE CKSUM
        z_tsmsun1_pool                              ONLINE       0     0   180
          c22t600C0FF00000000000678A0A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A0A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068190A86F3D901d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068190A86F3D900d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068191A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A1A598ED500d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068191A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681943A7223100d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681943A7223101d0    ONLINE       0     0     0
          c22t600C0FF00000000000681932BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681932BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A43A7223100d0s0  ONLINE       0     0   180
          c22t600C0FF00000000000678A2055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A2055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A32BBD24401d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A1A598ED501d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A32BBD24400d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A43A7223101d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068192055211B00d0s0  ONLINE       0     0     0
          c22t600C0FF0000000000068192055211B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A44F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000678A44F3D81B01d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681944F3D81B00d0s0  ONLINE       0     0     0
          c22t600C0FF00000000000681944F3D81B01d0s0  ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET                      OBJECT  RANGE
          z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904

It looks like I possibly have a single corrupted file. My question is: how do I find the file? Is it as simple as running find with "-inum 2620"?
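For reference, the -inum lookup asked about here can be wrapped in a small shell function. This is only a sketch: it assumes ZFS exposes a file's object number as its inode number (which is why the find approach works later in this thread), and both the function name find_object and the /tsmsrv1_pool mount point are illustrative assumptions, the latter taken from the shell prompt shown further down.

```shell
#!/bin/sh
# Hypothetical helper: map the OBJECT number from 'zpool status -v'
# to a path name. This relies on ZFS reporting a file's object number
# as its inode number, so find(1) -inum can do the lookup.
# -xdev keeps find on one file system: object numbers are only unique
# within a dataset, so start from that dataset's mount point.
find_object() {
    # $1: mount point of the dataset shown in the DATASET column
    # $2: number shown in the OBJECT column
    find "$1" -xdev -inum "$2" -print
}

# Example for this thread (mount point /tsmsrv1_pool is an assumption):
#   find_object /tsmsrv1_pool 2620
```

On a live system the dataset's mount point can be confirmed with 'zfs get mountpoint z_tsmsun1_pool/tsmsrv1_pool' before searching. Limiting the search with -xdev also keeps find out of other datasets mounted below that point, where the same number may belong to an unrelated file.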

TIA,
john

scrming

Posts: 19
Registered: 10/12/06
Re: Checksum errors...
Posted: Dec 28, 2006 3:59 AM   in response to: scrming

Ok... guess I answered my own question... LOL!

I did the find with -inum... it gave me a file name... so I did:

tsmsun1 - /tsmsrv1_pool >dd if=000203db.bfs of=/dev/null bs=128k
read: I/O error
64581+0 records in
64581+0 records out

So.... it would appear the file is poo-poo....
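The dd output also lines up with the RANGE reported by zpool status. A quick check in shell arithmetic, using only the numbers posted above (the variable names are illustrative):

```shell
#!/bin/sh
# Compare where dd stopped with the RANGE from 'zpool status -xv'.
# All figures are copied from this thread.
bs=131072                  # dd bs=128k, in bytes
records_ok=64581           # "64581+0 records in" before the read error
range_start=8464760832     # RANGE start from zpool status
range_end=8464891904       # RANGE end from zpool status

# dd read 64581 full records, so the failing read began at this offset.
fail_offset=$(( records_ok * bs ))
# Size of the damaged range.
range_len=$(( range_end - range_start ))

echo "dd stopped at byte offset: $fail_offset"
echo "damaged range length: $range_len bytes"
[ "$fail_offset" -eq "$range_start" ] && echo "dd failed exactly at the start of the damaged range"
```

Run as-is, it reports that dd stopped at byte 8464760832, exactly where the damaged range starts, and that the range is 131072 bytes (128 KB) long. That matches the default ZFS recordsize, which suggests, though does not prove, that a single ZFS block in the file is bad.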

Now the interesting thoughts... These 3511s have been around for a couple of years. We were using them with Veritas VxFS... We only recently switched over to ZFS to take advantage of compression.... So is it safe to say that I was lucky using VxFS and never had any corruption, or was I suffering from silent corruption under VxFS? hmmm.....

thanks!
john

Robert Milkowski
rmilkowski@task.gda.pl
Re: Re: Checksum errors...
Posted: Dec 28, 2006 6:05 AM   in response to: scrming

Hello John,

Thursday, December 28, 2006, 12:59:34 PM, you wrote:

J> Ok... guess I answered my own question... LOL!

J> I did the find with the -inum... gave me a file name... so i did:

J> tsmsun1 - /tsmsrv1_pool >dd if=000203db.bfs of=/dev/null bs=128k
J> read: I/O error
J> 64581+0 records in
J> 64581+0 records out

J> So.... it would appear the file is poo-poo....

J> Now the interesting thoughts... These 3511's have been around for
J> a couple of years. We were using them with Veritas VxFS... We
J> only recently switched over to ZFS to take advantage of
J> compression.... So is it safe to say that I was lucky using VxFS
J> and never had any corruption or was I suffering from silent
J> corruption under VxFS.... hmmm.....

I guess you got silent corruption before.
With 3511s with SATA drives I also get checksum errors from time to
time, while on 3510s with FC drives I haven't seen them (yet).
And it explains why we had to run fsck every few months before...

--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



scrming

Posts: 19
Registered: 10/12/06
Re: Re: Checksum errors...
Posted: Dec 28, 2006 6:47 AM   in response to: Robert Milkowski

Thanks for the reply!

As it turns out, I ran a parity check on the suspect 3511... sure enough, it popped an error! So ZFS did detect the problem with the 3511...

goo

Posts: 370
From: US

Registered: 6/13/05
Re: Checksum errors...
Posted: Jan 4, 2007 11:21 AM   in response to: scrming

> errors: The following persistent errors have been detected:
>
> DATASET OBJECT RANGE
> z_tsmsun1_pool/tsmsrv1_pool 2620 8464760832-8464891904
>
> Looks like I have possibly a single file that is corrupted. My question is how do I find the file. Is it as simple as doing a find command using "-inum 2620"?
>

FYI, I'm finishing up:
6410433 'zpool status -v' would be more useful with filenames

Which will give you the complete path to the file (if applicable), so
you don't have to do a 'find' on the inum.

eric

Copyright © 1995-2005 Sun Microsystems, Inc.