Replies: 5 - Last Post: Jan 17, 2008 11:59 AM by: benr
Panic on Zpool Import (Urgent)
Posted: Jan 12, 2008 11:15 PM
Today, suddenly, and for no reason that I can find, I'm getting panics during zpool import. The system panicked earlier today and has been suffering since. This is snv_43 on a Thumper. Here's the stack:
panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL, file: ../../common/fs/zfs/space_map.c, line: 145
fffffe8000a240a0 genunix:assfail+83 ()
fffffe8000a24130 zfs:space_map_remove+1d6 ()
fffffe8000a24180 zfs:space_map_claim+49 ()
fffffe8000a241e0 zfs:metaslab_claim_dva+130 ()
fffffe8000a24240 zfs:metaslab_claim+94 ()
fffffe8000a24270 zfs:zio_dva_claim+27 ()
fffffe8000a24290 zfs:zio_next_stage+6b ()
fffffe8000a242b0 zfs:zio_gang_pipeline+33 ()
fffffe8000a242d0 zfs:zio_next_stage+6b ()
fffffe8000a24320 zfs:zio_wait_for_children+67 ()
fffffe8000a24340 zfs:zio_wait_children_ready+22 ()
fffffe8000a24360 zfs:zio_next_stage_async+c9 ()
fffffe8000a243a0 zfs:zio_wait+33 ()
fffffe8000a243f0 zfs:zil_claim_log_block+69 ()
fffffe8000a24520 zfs:zil_parse+ec ()
fffffe8000a24570 zfs:zil_claim+9a ()
fffffe8000a24750 zfs:dmu_objset_find+2cc ()
fffffe8000a24930 zfs:dmu_objset_find+fc ()
fffffe8000a24b10 zfs:dmu_objset_find+fc ()
fffffe8000a24bb0 zfs:spa_load+67b ()
fffffe8000a24c20 zfs:spa_import+a0 ()
fffffe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fffffe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fffffe8000a24d20 genunix:cdev_ioctl+55 ()
fffffe8000a24d60 specfs:spec_ioctl+99 ()
fffffe8000a24dc0 genunix:fop_ioctl+3b ()
fffffe8000a24ec0 genunix:ioctl+180 ()
fffffe8000a24f10 unix:sys_syscall32+101 ()
syncing file systems... done
This is almost identical to a post to this list over a year ago titled "ZFS Panic". There was follow-up on it, but the results didn't make it back to the list.
I spent time doing a full sweep for hardware failures and pulled two drives that I suspected were problematic but that weren't flagged as such, etc, etc, etc. Nothing helps.
Bill suggested a 'zpool import -o ro' on the other post, but that's not working either.
I _can_ use 'zpool import' to see the pool, but I have to force the import. A simple 'zpool import' returns output in about a minute. 'zpool import -f poolname' takes almost exactly 10 minutes every single time, like it hits some timeout and then panics.
I did notice that while 'zpool import' is running, 'iostat' is useless; it just hangs. I still want to believe this is some device misbehaving, but I have no evidence to support that theory.
Any and all suggestions are greatly appreciated. I've put around 8 hours into this so far and I'm getting absolutely nowhere.
Thanks
benr.
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 8:34 AM
in response to: benr
Your system seems to have hit bug 6458218:
http://bugs.opensolaris.org/view_bug.do?bug_id=6458218
It is fixed in snv_60. As far as ZFS goes, snv_43 is quite old.
-- Prabahar.
On Jan 12, 2008, at 11:15 PM, Ben Rockwood wrote:
> panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL,
> file: ../../common/fs/zfs/space_map.c, line: 145
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 9:34 AM
in response to: prabahar
As has been pointed out, it's likely 6458218, but a 'zdb -e poolname' will tell you a little more.
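A minimal sketch of that suggestion, in case it helps anyone landing here later. The pool name and output path are placeholders (assumptions, not from the thread), and the script assumes a POSIX-style shell. zdb runs in user space, so a failed assertion there should abort the zdb process rather than panic the box, but treat that as an expectation rather than a guarantee on a build this old.

```shell
#!/bin/sh
# Hypothetical pool name; replace with the name shown by a plain 'zpool import'.
POOL="poolname"
# Hypothetical location for the captured output.
OUT="/var/tmp/zdb-${POOL}.out"

if command -v zdb >/dev/null 2>&1; then
    # -e tells zdb to examine a pool that is not currently imported.
    # The output can be long, so save it to a file for later review.
    zdb -e "$POOL" > "$OUT" 2>&1
    echo "zdb exit status: $?; output saved to $OUT"
else
    echo "zdb not found in PATH" >&2
fi
```

Saving the output to a file makes it easier to attach to a follow-up post on the list.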
Rob
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 9:11 AM
in response to: benr
Hi Ben
Not that I know much, but while monitoring the posts I read some time ago that there was a bug/race condition in the slab allocator that results in a panic on double free (ss != NULL).
I think the zpool is fine but your system is tripping on this bug. Since it is snv_43, I'd suggest upgrading. Is a Live Upgrade (LU) or a fresh install possible? Can you quickly try importing it on a BeleniX live CD/USB?
- Akhilesh
PS: I'll post the bug# if I find it.
Re: Panic on Zpool Import (Urgent)
Posted: Jan 17, 2008 11:59 AM
in response to: benr
The solution here was to upgrade to snv_78. By "upgrade" I mean re-jumpstart the system.
I tested snv_67 via net-boot but the pool panicked just as below. I also attempted using zfs_recover without success.
I then tested snv_78 via net-boot, used both "aok=1" and "zfs:zfs_recover=1" and was able to (slowly) import the pool. Following that test I exported and then did a full re-install of the box.
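For anyone unfamiliar with those two tunables, a rough sketch of how they are commonly written in /etc/system. The post does not say where they were set for the net-boot test, so the file placement here is an assumption. Both relax safety checks (aok=1 turns failed assertions into warnings instead of panics, and zfs_recover=1 lets ZFS try to continue past some otherwise-fatal errors), so they would normally be removed once the pool has been recovered.

```
* /etc/system fragment; lines beginning with '*' are comments.
* Assumption: set here for the recovery boot only, then removed.

* Do not panic on a failed kernel assertion; log a warning instead.
set aok=1

* Let ZFS attempt to continue past errors it would otherwise treat as fatal.
set zfs:zfs_recover=1
```

Changes to /etc/system take effect at the next boot.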
A very important note to anyone upgrading a Thumper! Don't forget about the NCQ bug. After upgrading to a release more recent than snv_60 add the following to /etc/system:
set sata:sata_max_queue_depth = 0x1
If you don't, life will be highly unpleasant, and you'll believe that disks are failing everywhere when in fact they are not.
benr.
Ben Rockwood wrote:
> panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL, file:
> ../../common/fs/zfs/space_map.c, line: 145