Thread: Panic on Zpool Import (Urgent)

Replies: 5 - Last Post: Jan 17, 2008 11:59 AM by: benr
benr

Posts: 917
From:

Registered: 4/28/05
Panic on Zpool Import (Urgent)
Posted: Jan 12, 2008 11:15 PM

Today, suddenly, without any apparent reason that I can find, I'm
getting panics during zpool import. The system panicked earlier today
and has been suffering since. This is snv_43 on a Thumper. Here's the
stack:

panic[cpu0]/thread=ffffffff99adbac0: assertion failed: ss != NULL, file:
../../common/fs/zfs/space_map.c, line: 145

fffffe8000a240a0 genunix:assfail+83 ()
fffffe8000a24130 zfs:space_map_remove+1d6 ()
fffffe8000a24180 zfs:space_map_claim+49 ()
fffffe8000a241e0 zfs:metaslab_claim_dva+130 ()
fffffe8000a24240 zfs:metaslab_claim+94 ()
fffffe8000a24270 zfs:zio_dva_claim+27 ()
fffffe8000a24290 zfs:zio_next_stage+6b ()
fffffe8000a242b0 zfs:zio_gang_pipeline+33 ()
fffffe8000a242d0 zfs:zio_next_stage+6b ()
fffffe8000a24320 zfs:zio_wait_for_children+67 ()
fffffe8000a24340 zfs:zio_wait_children_ready+22 ()
fffffe8000a24360 zfs:zio_next_stage_async+c9 ()
fffffe8000a243a0 zfs:zio_wait+33 ()
fffffe8000a243f0 zfs:zil_claim_log_block+69 ()
fffffe8000a24520 zfs:zil_parse+ec ()
fffffe8000a24570 zfs:zil_claim+9a ()
fffffe8000a24750 zfs:dmu_objset_find+2cc ()
fffffe8000a24930 zfs:dmu_objset_find+fc ()
fffffe8000a24b10 zfs:dmu_objset_find+fc ()
fffffe8000a24bb0 zfs:spa_load+67b ()
fffffe8000a24c20 zfs:spa_import+a0 ()
fffffe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fffffe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fffffe8000a24d20 genunix:cdev_ioctl+55 ()
fffffe8000a24d60 specfs:spec_ioctl+99 ()
fffffe8000a24dc0 genunix:fop_ioctl+3b ()
fffffe8000a24ec0 genunix:ioctl+180 ()
fffffe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled
"ZFS Panic". There was follow up on it but the results didn't make it
back to the list.

I spent time doing a full sweep for any hardware failures, pulled 2
drives that I suspected as problematic but weren't flagged as such, etc,
etc, etc. Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not
working either.

I _can_ use 'zpool import' to see the pool, but I have to force the
import. A simple 'zpool import' returns output in about a minute.
'zpool import -f poolname' takes almost exactly 10 minutes every single
time, like it hits some timeout and then panics.

I did notice that while the 'zpool import' is running 'iostat' is
useless, just hangs. I still want to believe this is some device
misbehaving but I have no evidence to support that theory.

Any and all suggestions are greatly appreciated. I've put around 8
hours into this so far and I'm getting absolutely nowhere.

Thanks

benr.
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


prabahar

Posts: 26
From: US

Registered: 3/9/05
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 8:34 AM   in response to: benr

Your system seems to have hit bug 6458218:

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

It is fixed in snv_60. As far as ZFS goes, snv_43 is quite old.

--
Prabahar.



picker

Posts: 125
From: US

Registered: 12/1/05
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 9:34 AM   in response to: prabahar

As has been pointed out, it's likely 6458218,
but a 'zdb -e poolname'
will tell you a little more.

Rob



mritun

Posts: 330
From: Mumbai, India

Registered: 10/8/05
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 9:11 AM   in response to: benr

Hi Ben

Not that I know much, but while monitoring the posts I read some time ago that there was a bug/race condition in the slab allocator that results in a panic on double free (ss != NULL).

I think the zpool is fine but your system is tripping on this bug. Since it is snv_43, I'd suggest upgrading. Is LU or a fresh install possible? Can you quickly try importing it on a BeleniX live CD/USB?

- Akhilesh

PS: I'll post the bug# if I find it.

mritun

Posts: 330
From: Mumbai, India

Registered: 10/8/05
Re: Panic on Zpool Import (Urgent)
Posted: Jan 13, 2008 9:18 AM   in response to: benr

Most probable culprit (close, but not identical stack trace):

http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

Fixed since snv_60.

benr

Posts: 917
From:

Registered: 4/28/05
Re: Panic on Zpool Import (Urgent)
Posted: Jan 17, 2008 11:59 AM   in response to: benr

The solution here was to upgrade to snv_78. By "upgrade" I mean
re-jumpstart the system.

I tested snv_67 via net-boot but the pool panicked just as before. I also
attempted using zfs_recover without success.

I then tested snv_78 via net-boot, used both "aok=1" and
"zfs:zfs_recover=1" and was able to (slowly) import the pool. Following
that test I exported and then did a full re-install of the box.
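
For anyone landing here later, a minimal sketch of how those two tunables are
commonly written in /etc/system. This is an assumption about placement: the
post does not say whether they were set in /etc/system or from the kernel
debugger at boot. Both are last-resort recovery switches, not something to
leave enabled.

```
* Recovery-only settings (assumed /etc/system form of "aok=1" and
* "zfs:zfs_recover=1"); remove them once the pool has been imported.
set aok=1
set zfs:zfs_recover=1
```

After a reboot with these in place, the import itself is the same
'zpool import -f poolname' used earlier in the thread.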

A very important note to anyone upgrading a Thumper! Don't forget about
the NCQ bug. After upgrading to a release more recent than snv_60 add
the following to /etc/system:

set sata:sata_max_queue_depth = 0x1

If you don't, life will be highly unpleasant and you'll believe that disks are failing everywhere when in fact they are not.

benr.








