tracking error to file
Posted: May 19, 2006 12:23 PM
In my testing, I've found the following error:
zpool status -v
  pool: local
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        local       ONLINE       0     0     0
          c0d1p0    ONLINE       0     0     0
          c2d0p1    ONLINE       0     0     0
          c3d0p1    ONLINE       0     0     0
          c0d0s7    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          1b       2402    lvl=0 blkid=1965
I haven't found a way to report in human terms what the above object refers to. Is there such a method?
I can clear the error using existing tools, but I'd like to know what is broken before I destroy it.
Thanks!
-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273  Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382, Louisville, CO 80028-4382
greg dot shaw at sun dot com (work)
shaw at fmsoft dot com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Matthew Ahrens
ahrens@eng.sun.com
Re: tracking error to file
Posted: May 21, 2006 11:25 PM
in response to: shawga
On Fri, May 19, 2006 at 01:23:02PM -0600, Gregory Shaw wrote:
> DATASET OBJECT RANGE
> 1b 2402 lvl=0 blkid=1965
>
> I haven't found a way to report in human terms what the above object
> refers to. Is there such a method?
There isn't any great method currently, but you can use 'zdb' to find this information. The quickest way would be to first determine the name of dataset 0x1b (=27):
# zdb local | grep "ID 27,"
Dataset local/ahrens [ZPL], ID 27, ...
Then get info on that particular object in that filesystem:
# zdb -vvv <dataset_name> 2402
...
    Object  lvl   iblk   dblk  lsize  asize  type
      2402    1    16K  3.50K  3.50K  2.50K  ZFS plain file
                                        264  bonus  ZFS znode
        path    /raidz/usr/src/uts/common/fs/zfs/dmu.c
...
The "path" listed is relative to the filesystem's mountpoint.
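For anyone scripting the first step, here is a minimal sketch of the hex-to-decimal conversion. It assumes the DATASET column of 'zpool status -v' is printed in hexadecimal without a 0x prefix, as the 0x1b (=27) example above implies. The helper names hex_to_dec and dataset_grep_pattern are made up for illustration, and the zdb pipeline is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: convert a hex ID (with or without a leading 0x)
# to decimal.
hex_to_dec() {
    printf '%d\n' "0x${1#0x}"
}

# Hypothetical helper: build the string to grep for in 'zdb <pool>' output.
dataset_grep_pattern() {
    printf 'ID %s,\n' "$(hex_to_dec "$1")"
}

# Print (do not run) the lookup command for pool "local", dataset 1b.
printf 'zdb %s | grep "%s"\n' local "$(dataset_grep_pattern 1b)"
```

The same conversion turns dataset 13 into 19, which matches the "ID 19," pattern used further down this thread.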
--matt
Re: tracking error to file
Posted: May 22, 2006 8:21 AM
in response to: Matthew Ahrens
Thanks! I will do as you describe.
I brought it up on the alias because I thought a user would eventually run into the same problem. They'll want the same information: what does the error impact?
-----
Gregory Shaw
Wout Mertens
wmertens@cisco.com
Re: tracking error to file
Posted: May 23, 2006 2:49 AM
in response to: Matthew Ahrens
Can that same method be used to figure out what files changed between snapshots?
Wout.
Matthew Ahrens
ahrens@eng.sun.com
Re: tracking error to file
Posted: May 23, 2006 9:44 AM
in response to: Wout Mertens
On Tue, May 23, 2006 at 11:49:47AM +0200, Wout Mertens wrote:
> Can that same method be used to figure out what files changed between
> snapshots?
To figure out what files changed, we need to (a) figure out what object numbers changed, and (b) do the object number to file name translation.
The method I described (using zdb) will not be involved in either step. zdb is an undocumented interface, and using it for this purpose is only a workaround. However, the same algorithms implemented in zdb will be used to do step (b), the object number to file name translation.
--matt
Re: tracking error to file
Posted: Sep 27, 2006 3:32 PM
in response to: Matthew Ahrens
The zdb object -> path trick doesn't give me a path name:
errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          13       a51b    lvl=0 blkid=9

bash-3.00# zdb mypool | grep "ID 19,"
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects
bash-3.00# zdb -vvv mypool/rab a51b
Dataset mypool/rab [ZPL], ID 19, cr_txg 6, last_txg 4391649, 80.3G, 41883 objects, rootbp [L0 DMU objset] 400L/200P DVA[0]=<1:4408daa00:200> DVA[1]=<0:8d7323200:200> DVA[2]=<1:6a1c4ee00:200> fletcher4 lzjb LE contiguous birth=4391649 fill=41883 cksum=b79e8d8b0:469ba0a4696:e05ec517a391:1ea5669d90270d

    ZIL header: claim_txg 0, seq 0

        first block: [L0 ZIL intent log] 20000L/20000P DVA[0]=<1:31c560000:20000> zilog uncompressed LE contiguous birth=4030488 fill=0 cksum=7e20922ee4d68bf1:e4a75d71f8cd7cb5:13:1

        Block seqno 1, won't claim

    Object  lvl   iblk   dblk  lsize  asize  type
         0    6    16K    16K  22.1M  15.2M  DMU dnode
Should I be concerned? If the corruption isn't in my data, and ZFS metadata is self-consistent at all times, does the corruption matter?
bash-3.00# uname -a
SunOS xxxx 5.11 onnv-gate:2006-09-26 i86pc i386 i86pc
Re: Re: tracking error to file
Posted: Sep 27, 2006 3:55 PM
in response to: rab
Russell Blaine wrote:
> The zdb object -> path trick doesn't give me a path name:
>
> errors: The following persistent errors have been detected:
>
> DATASET OBJECT RANGE
> 13 a51b lvl=0 blkid=9

> bash-3.00# zdb -vvv mypool/rab a51b
Try 0xa51b.
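A small sketch of the same fix for scripted lookups. It assumes the OBJECT column is hexadecimal without a prefix, which the suggestion above implies, and that zdb only reads the argument as hex when it carries the 0x prefix. with_hex_prefix is a hypothetical helper, and the zdb command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: add a 0x prefix unless one is already present.
with_hex_prefix() {
    case $1 in
        0x*|0X*) printf '%s\n' "$1" ;;
        *)       printf '0x%s\n' "$1" ;;
    esac
}

# Print (do not run) the object lookup for dataset mypool/rab, object a51b.
printf 'zdb -vvv %s %s\n' mypool/rab "$(with_hex_prefix a51b)"
```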
--matt
Re: Re: tracking error to file
Posted: Sep 27, 2006 6:45 PM
in response to: ahrens
That was it. Thanks, Matt.
Re: tracking error to file
Posted: Feb 18, 2007 9:19 PM
in response to: Matthew Ahrens
I have one that looks like this:

  pool: preplica-1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        preplica-1    ONLINE       2     0     2
          c2t0d0      ONLINE       0     0     0
          c2t1d0      ONLINE       0     0     0
          c2t2d0      ONLINE       2     0     2
          c2t3d0      ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          36       3a2939  lvl=0 blkid=0

% uname -a
SunOS preplica01 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-V210

% zpool list
NAME         SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
preplica-1  9.06T  8.78T   291G  96%  ONLINE  -
This is a replicated filesystem, that is kept up to date with zfs send/recv, and is never even mounted locally. Originally the error was in a regular inode. So I did the find -inum thing, and found the filename. I cp'ed the file and deleted the old copy on the original filesystem, and did some incremental zfs send|recv's to propagate the fix here. And I expected the problem to go away.
But instead it started looking like that above.
I tried the trick with zdb listed here, but zdb preplica-1 | grep "ID 36," is taking forever to complete. But none of the filesystems listed near the front of the output have ID 36.
So I tried the zdb -vvv of 0x3a2939 on each of the filesystems that I have, and none of them was ID 36! Not even the one in which the bad inode had originally been reported.
Any suggestions?
I know that it's a relatively old version of Solaris 10, with a fairly old patchset.
Should I be concerned about this error? I do know what caused it (a bad disk in the underlying hardware raid5 storage - yes... I know... I know... :-) - which was removed). So I'm not concerned about ongoing corruption from this specific problem. I just want to know what file is impacted by it.
Thanks! Davin.
Re: Re: tracking error to file
Posted: Feb 20, 2007 9:27 AM
in response to: davin
If you run a 'zpool scrub preplica-1', then the persistent error log will be cleaned up. In the future, we'll have a background scrubber to make your life easier.
eric
Re: Re: tracking error to file
Posted: Feb 20, 2007 10:43 AM
in response to: goo
> If you run a 'zpool scrub preplica-1', then the persistent error log
> will be cleaned up. In the future, we'll have a background scrubber
> to make your life easier.
>
> eric
Eric,
Great news! Are there any details yet about how this will be implemented? I am most curious about how tunable it will be in terms of system resources (CPU, I/O, etc.).
-Wade
Re: Re: tracking error to file
Posted: Feb 20, 2007 11:54 AM
in response to: wstuart
No details yet, still working those out along with the infrastructure to make it happen.
eric