OpenSolaris

Discussions Communities Projects Download Source Browser

Home » OpenSolaris Forums » zfs » discuss

Thread: Snapshots impact on performance

Welcome, Guest Help
Login Login
Guest Settings Guest Settings
Reply to this Thread Reply to this Thread Search Forum Search Forum Back to Thread List Back to Thread List

Permlink Replies: 25 - Last Post: Aug 24, 2007 6:49 AM by: kangurek
Robert Milkowski
rmilkowski@task.gda.pl
Snapshots impact on performance
Posted: Oct 16, 2006 5:17 AM

  Click to reply to this thread Reply

Hello zfs-discuss,

S10U2+patches. ZFS pool of about 2TB in size. Each day snapshot is
created and 7 copies are kept. There's quota set for a file system
however there's always at least 50GB of free space in a file system
(and much more in a pool). ZFS file system is exported over NFS.

Snapshots consume about 280GB of space.

We have noticed so performance problems on nfs clients to this file
system even during times with smaller load. Rising quota didn't
help. However removing oldest snapshot automatically solved
performance problems.

I do not have more details - sorry.

Is it expected for snapshots to have very noticeable performance
impact on file system being snapshoted?


--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



ahrens

Posts: 424
From: US

Registered: 3/9/05
Re: Snapshots impact on performance
Posted: Oct 16, 2006 8:07 AM   in response to: Robert Milkowski

  Click to reply to this thread Reply

Robert Milkowski wrote:
> Hello zfs-discuss,
>
> S10U2+patches. ZFS pool of about 2TB in size. Each day snapshot is
> created and 7 copies are kept. There's quota set for a file system
> however there's always at least 50GB of free space in a file system
> (and much more in a pool). ZFS file system is exported over NFS.
>
> Snapshots consume about 280GB of space.
>
> We have noticed so performance problems on nfs clients to this file
> system even during times with smaller load. Rising quota didn't
> help. However removing oldest snapshot automatically solved
> performance problems.
>
> I do not have more details - sorry.
>
> Is it expected for snapshots to have very noticeable performance
> impact on file system being snapshoted?

No, this behavior is unexpected. The only way that snapshots should
have a performance impact on access to the filesystem is if you are
running low on space in the pool or quota (which it sounds like you are
not).

Can you describe what the performance problems were? What was the
workload like? What problem did you identify? How did it improve when
you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
snapshot wasn't pushing you close to your quota?

--matt
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



perrin

Posts: 228
From: US

Registered: 3/9/05
Re: Snapshots impact on performance
Posted: Oct 16, 2006 9:50 AM   in response to: ahrens

  Click to reply to this thread Reply



Matthew Ahrens wrote On 10/16/06 09:07,:
> Robert Milkowski wrote:
>
>> Hello zfs-discuss,
>>
>> S10U2+patches. ZFS pool of about 2TB in size. Each day snapshot is
>> created and 7 copies are kept. There's quota set for a file system
>> however there's always at least 50GB of free space in a file system
>> (and much more in a pool). ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280GB of space.
>> We have noticed so performance problems on nfs clients to this file
>> system even during times with smaller load. Rising quota didn't
>> help. However removing oldest snapshot automatically solved
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have very noticeable performance
>> impact on file system being snapshoted?
>
>
> No, this behavior is unexpected. The only way that snapshots should
> have a performance impact on access to the filesystem is if you are
> running low on space in the pool or quota (which it sounds like you are
> not).
>
> Can you describe what the performance problems were? What was the
> workload like? What problem did you identify? How did it improve when
> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
> snapshot wasn't pushing you close to your quota?
>
> --matt

I could well believe there would be a hiccup when the snapshot is
taken on the rest of the pool. Each snapshot calls txg_wait_synced
four times. A few related to the zil and one from dsl_sync_task_group_wait
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Robert Milkowski
rmilkowski@task.gda.pl
Re[2]: Snapshots impact on performance
Posted: Oct 17, 2006 6:55 AM   in response to: ahrens

  Click to reply to this thread Reply

Hello Matthew,

Monday, October 16, 2006, 5:07:50 PM, you wrote:

MA> Robert Milkowski wrote:
>> Hello zfs-discuss,
>>
>> S10U2+patches. ZFS pool of about 2TB in size. Each day snapshot is
>> created and 7 copies are kept. There's quota set for a file system
>> however there's always at least 50GB of free space in a file system
>> (and much more in a pool). ZFS file system is exported over NFS.
>>
>> Snapshots consume about 280GB of space.
>>
>> We have noticed so performance problems on nfs clients to this file
>> system even during times with smaller load. Rising quota didn't
>> help. However removing oldest snapshot automatically solved
>> performance problems.
>>
>> I do not have more details - sorry.
>>
>> Is it expected for snapshots to have very noticeable performance
>> impact on file system being snapshoted?

MA> No, this behavior is unexpected. The only way that snapshots should
MA> have a performance impact on access to the filesystem is if you are
MA> running low on space in the pool or quota (which it sounds like you are
MA> not).

MA> Can you describe what the performance problems were? What was the
MA> workload like? What problem did you identify? How did it improve when
MA> you 'zfs destroy'-ed the oldest snapshot? Are you sure that the oldest
MA> snapshot wasn't pushing you close to your quota?

I wasn't on-site every-time it happened... but it looks like all NFS
clients to that file system have problem accessing data for 2-5s and
iostat on clients shows 100% busy to the disk.

There was well over 100gb of free space in a file system, then rising
quote to add another 100gb didn't help. After removing snapshots all
problems disappeared almost immediately. No new snapshots were created
during this time. It happened second time - and again, quite a lot of
free space and destroying one snapshot helps.

If it happens again I'll try to get some more specific data - however
it depends on when it happens as during peak hours I'll probably just
destroy a snapshot to get it working.


--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



ahrens

Posts: 424
From: US

Registered: 3/9/05
Re: Snapshots impact on performance
Posted: Oct 17, 2006 8:05 AM   in response to: Robert Milkowski

  Click to reply to this thread Reply

Robert Milkowski wrote:
> If it happens again I'll try to get some more specific data - however
> it depends on when it happens as during peak hours I'll probably just
> destroy a snapshot to get it working.

If it happens again, it would be great if you could gather some data
before you destroy the snapshot so we have some chance of figuring out
what's going on here. 'iostat -xnpc 1' will tell us if it's CPU or disk
bound. 'lockstat -kgiw sleep 10' will tell us what functions are using
CPU. 'echo "::walk thread|::findstack" | mdb -k' will tell us where
threads are stuck.

Actually, if you could gather each of those both while you're observing
the problem, and then after the problem goes away, that would be helpful.

--matt
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



milek

Posts: 785
From: GB

Registered: 4/27/05
Re: Snapshots impact on performance
Posted: Oct 24, 2006 12:48 PM   in response to: ahrens
To: Communities » zfs » discuss
  Click to reply to this thread Reply

Hi.

On nfs clients which are mounting file system f3-1/d611 I can see 3-5s periods of 100% busy (iostat) and almost no IOs issued to nfs server, on nfs server at the same time disk activity is almost 0 (both iostat and zpool iostat). However CPU activity increases in SYS during that periods.

There's quite a lot of free space in a f3-1/d611 as you can see:

# zfs list | egrep "611|NAM"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 381G 219G 151G //f3-1/d611
f3-1/d611@auto-2006-10-18 48.2G - 156G -
f3-1/d611@auto-2006-10-19 19.4G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.5G - 154G -

And much more in a pool itself:

# zfs list | egrep "f3-1 |NAM"
NAME USED AVAIL REFER MOUNTPOINT
f3-1 563G 642G 9.18G //f3-1

# zfs get all f3-1/d611
NAME PROPERTY VALUE SOURCE
f3-1/d611 type filesystem -
f3-1/d611 creation Tue Aug 29 2:07 2006 -
f3-1/d611 used 381G -
f3-1/d611 available 219G -
f3-1/d611 referenced 151G -
f3-1/d611 compressratio 1.00x -
f3-1/d611 mounted yes -
f3-1/d611 quota 600G local
f3-1/d611 reservation none default
f3-1/d611 recordsize 128K default
f3-1/d611 mountpoint //f3-1/d611 default
f3-1/d611 sharenfs off default
f3-1/d611 checksum on default
f3-1/d611 compression off default
f3-1/d611 atime off inherited from f3-1
f3-1/d611 devices on default
f3-1/d611 exec on default
f3-1/d611 setuid on default
f3-1/d611 readonly off default
f3-1/d611 zoned off default
f3-1/d611 snapdir hidden default
f3-1/d611 aclmode groupmask default
f3-1/d611 aclinherit secure default


Access to this file system is only over NFS. Right now nfsd has less than 400 active threads.

Now output from several commands you requested during the times when disk activity is low, CPU usage higher and nfs clients see problem.

# zpool iostat 1
[...]
f3-1 562G 662G 20 0 156K 0
f3-1 562G 662G 29 6 842K 887K
f3-1 562G 662G 12 0 238K 0
f3-1 562G 662G 927 25 11.0M 2.38M
f3-1 562G 662G 1.96K 221 19.8M 5.13M
f3-1 562G 662G 110 112 2.09M 1.72M


# iostat -xnzp 1
[...]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0
44.0 0.0 819.4 0.0 0.0 0.3 0.0 7.2 0 22 c5t600C0FF000000000098FD57F9DA83C00d0s0
26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0
26.0 0.0 179.5 0.0 0.0 0.2 0.0 6.8 0 14 c5t600C0FF000000000098FD55DBA4EA000d0s0
26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0
26.0 0.0 224.0 0.0 0.0 0.2 0.0 7.2 0 15 c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0
12.0 0.0 216.5 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD57F9DA83C00d0s0
10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0
10.0 0.0 70.5 0.0 0.0 0.1 0.0 6.4 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0
8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0
8.0 0.0 63.0 0.0 0.0 0.0 0.0 6.0 0 5 c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0
0.0 1.0 0.0 0.5 0.0 0.0 0.0 12.3 0 1 c1t0d0s6
0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0
0.0 1.0 0.0 0.5 0.0 0.0 0.0 9.5 0 1 c1t1d0s6
7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0
7.0 0.0 55.8 0.0 0.0 0.1 0.0 7.8 0 5 c5t600C0FF000000000098FD57F9DA83C00d0s0
6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0
6.0 7.0 66.8 893.2 0.0 0.1 0.0 6.4 0 5 c5t600C0FF000000000098FD55DBA4EA000d0s0
9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0
9.0 0.0 665.4 0.0 0.0 0.1 0.0 6.5 0 4 c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0
5.0 0.0 36.6 0.0 0.0 0.0 0.0 7.3 0 4 c5t600C0FF000000000098FD57F9DA83C00d0s0
11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0
11.0 0.0 84.3 0.0 0.0 0.1 0.0 6.1 0 6 c5t600C0FF000000000098FD55DBA4EA000d0s0
5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0
5.0 0.0 181.1 0.0 0.0 0.0 0.0 7.1 0 4 c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0
229.9 10.0 2795.9 639.9 0.0 1.9 0.0 7.8 0 49 c5t600C0FF000000000098FD57F9DA83C00d0s0
243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0
243.9 13.0 3790.1 1151.7 0.0 2.0 0.0 7.6 0 54 c5t600C0FF000000000098FD55DBA4EA000d0s0
198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0
198.0 0.0 2355.5 0.0 0.0 1.5 0.0 7.7 0 51 c5t600C0FF000000000098FD516E4403200d0s0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0
712.1 71.0 6038.5 1469.6 0.0 8.4 0.0 10.8 0 98 c5t600C0FF000000000098FD57F9DA83C00d0s0
759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0
759.1 164.0 7574.6 2283.2 0.0 10.2 0.0 11.1 0 100 c5t600C0FF000000000098FD55DBA4EA000d0s0
716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0
716.1 40.0 8526.2 1956.2 0.0 8.7 0.0 11.5 0 98 c5t600C0FF000000000098FD516E4403200d0s0


# mpstat 1
[...]
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 7 72 61 563 6 103 13 0 253 1 20 0 79
1 0 0 3 57 31 359 0 70 11 1 14 0 82 0 18
2 0 0 8 369 356 755 3 144 18 0 140 0 3 0 97
3 0 0 9 321 218 334 1 18 8 0 32 0 57 0 43
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1 21 16 192 0 13 3 0 40 0 55 0 45
1 0 0 1 54 27 324 2 50 8 0 195 0 15 0 85
2 0 0 2 165 161 148 0 39 2 0 49 0 1 0 99
3 0 0 7 260 158 144 1 36 4 0 31 0 60 0 40
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 15 36 29 113 1 35 6 0 18 0 87 0 13
1 0 0 5 60 33 497 1 59 11 0 108 0 48 0 52
2 0 0 5 491 483 477 25 72 9 0 214 1 12 0 87
3 0 0 7 288 183 328 2 30 4 0 45 0 46 0 54
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 7 15 12 270 0 33 6 0 175 0 12 0 88
1 0 0 3 47 25 178 0 42 4 0 47 1 5 0 94
2 0 0 1 64 60 70 0 22 3 0 75 0 0 0 100
3 0 0 2 267 162 2 2 1 1 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 112 493 428 3068 7 638 279 91 24 0 50 0 50
1 0 0 55 87 30 1541 4 402 167 71 154 0 70 0 30
2 0 0 145 4074 3932 4564 5 998 490 497 106 0 51 0 49
3 0 0 139 764 587 4220 12 798 334 461 17 0 66 0 34
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 281 1189 902 9050 22 1867 1173 1573 139 0 62 0 38
1 0 0 136 357 27 7267 11 1645 985 1128 42 0 76 0 24
2 0 0 168 7332 7079 7255 21 1608 952 1138 80 0 63 0 37
3 0 0 331 1199 1058 1589 19 272 253 58 34 0 84 0 16



Different time period when disk activity is small:

# lockstat -kgIw sleep 10 | less

Profiling interrupt: 3904 events in 10.055 seconds (388 events/sec)

Count genr cuml rcnt nsec CPU+PIL Hottest Caller
----------------------------------------------------------------------------- --
8925 229% ---- 0.00 5067 cpu[1] nfssys
7386 189% ---- 0.00 2610 cpu[3] thread_start
6566 168% ---- 0.00 4404 cpu[2] thread_start
6453 165% ---- 0.00 3325 cpu[0] thread_start
1259 32% ---- 0.00 1756 cpu[0]+11 thread_start
1161 30% ---- 0.00 2247 cpu[2]+11 thread_start
1159 30% ---- 0.00 1472 cpu[3]+11 thread_start
939 24% ---- 0.00 2459 cpu[1]+11 thread_start
190 5% ---- 0.00 2880 cpu[0]+6 pci_intr_wrapper
181 5% ---- 0.00 2914 cpu[3]+6 ql_response_pkt
103 3% ---- 0.00 4591 cpu[2]+6 intr_thread
7 0% ---- 0.00 1118 cpu[3]+12 rmc_comm_serdev_receive
5 0% ---- 0.00 1022 cpu[3]+13 cyclic_softint
----------------------------------------------------------------------------- --




Now during another period when disk activity is low and nfs clients see problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt =0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
page_sub_common 840400
mutex_vector_enter 880400
callout_schedule_1 1106200
kmem_cache_alloc 1182400
lock_set_spl_spin 1240000
thread_lock 1269600
syscall_mstate 1359200
set_freemem 1575800
page_trylock 1587500
hwblkpagecopy 1885100
set_anoninfo 1904400
page_pptonum 2044900
xc_serv 2147600
lzjb_decompress 3209600
resume 4783500
hwblkclr 7101100
page_next_scan_large 23648600
generic_idle_cpu 69234100
disp_getwork 139261800
avl_walk 669424900


Lets see how it looks when there;s no problem:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt =0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
avl_insert 8591400
sleepq_insert 8660800
fletcher_4_native 9789400
buf_hash 10454600
copy_pattern 12395200
hat_getpfnum 14970100
ddi_get_instance 16348900
ddi_get_soft_state 19676500
pci_pbm_dma_sync 21993300
space_map_seg_compare 25628100
lock_set_spin 31066500
ip_cksum 31528900
kmem_cache_alloc 34203500
kmem_free 35418800
page_next_scan_large 35833700
fletcher_2_native 48868100
kmem_cache_free 54456400
avl_destroy_nodes 63271000
lzjb_decompress 72163800
lock_set_spl_spin 72655100
lzjb_compress 91146600
mutex_vector_enter 113553300
resume 2836183200


No avl_walk :)

Heh, I'm sure I have seen avl_walk consuming lot of CPU before...

So wait for another such period and (6-7seconds):

# dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
[...]
zfs`space_map_add+0x160
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
1259

zfs`space_map_add+0x174
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
183895

zfs`space_map_add+0x160
zfs`space_map_load+0x218
zfs`metaslab_activate+0x3c
zfs`metaslab_group_alloc+0x1c0
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
196122

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_commit+0x21c
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_remove+0x218
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
854743

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
1042372

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
1415957


And another "bad" time period:

# dtrace -n fbt::avl_walk:entry'{self->vt=vtimestamp;}' -n fbt::avl_walk:return'/self->vt/{@[stack()]=sum(vtimestamp-self->vt);self- >vt=0;}' -n tick-6s'{printa(@);exit(0);}'
[...]
zfs`vdev_queue_io_to_issue+0xc4
zfs`vdev_queue_io_done+0x4c
zfs`vdev_disk_io_done+0x4
genunix`taskq_thread+0x1a4
unix`thread_start+0x4
11700

zfs`vdev_queue_io_to_issue+0x58
zfs`vdev_queue_io_done+0x4c
zfs`vdev_disk_io_done+0x4
genunix`taskq_thread+0x1a4
unix`thread_start+0x4
20800

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_dva_allocate+0x4c
zfs`zio_write_compress+0x1ec
zfs`arc_write+0xe4
zfs`dbuf_sync+0x6d8
zfs`dnode_sync+0x35c
zfs`dmu_objset_sync_dnodes+0x6c
zfs`dmu_objset_sync+0x54
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x104
zfs`spa_sync+0xe4
zfs`txg_sync_thread+0x134
unix`thread_start+0x4
382903900

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_write+0x6c4
genunix`fop_write+0x20
nfssrv`rfs3_write+0x368
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
475080700

zfs`metaslab_ff_alloc+0x9c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x1e4
zfs`metaslab_alloc_dva+0x114
zfs`metaslab_alloc+0x2c
zfs`zio_alloc_blk+0x70
zfs`zil_lwb_write_start+0x8c
zfs`zil_lwb_commit+0x1ac
zfs`zil_commit+0x1b0
zfs`zfs_fsync+0xa8
genunix`fop_fsync+0x14
nfssrv`rfs3_create+0x700
nfssrv`common_dispatch+0x444
rpcmod`svc_getreq+0x154
rpcmod`svc_run+0x198
nfs`nfssys+0x1c4
unix`syscall_trap32+0xcc
481455800




So lets destroy oldest snapshot:

# zfs destroy f3-1/d611@auto-2006-10-18
[it took about 4 minutes!]

# mdb -k
Loading modules: [ unix krtld genunix dtrace specfs ufs sd md ip sctp usba fcp fctl qlc ssd lofs logindmux ptm cpc fcip random crypto zfs nfs ]
> ::ps!grep zfs
R 20905 20464 20905 20458 0 0x4a004000 00000301ec367c20 zfs
> 00000301ec367c20::walk thread|::findstack -v
stack pointer for thread 301d1a02fe0: 2a103a28b71
[ 000002a103a28b71 cv_wait+0x38() ]
000002a103a28c21 txg_wait_synced+0x54(60007327290, 1a7bda, 1a7bd9, 600073272d0, 600073272d2, 60007327288)
000002a103a28cd1 dsl_dir_sync_task+0x124(60003867d00, 0, 60003f6800a, 0, 0, 1a7bda)
000002a103a28d81 dsl_dataset_destroy+0x98(60003f68000, 5, 3, 60003f68009, 600073271c0, 0)
000002a103a28f61 dmu_objset_destroy+0x3c(60003f68000, 0, 0, 2, 0, 54)
000002a103a29021 zfsdev_ioctl+0x160(70370c00, 54, ffbfed38, 1c, 70, e68)
000002a103a290d1 fop_ioctl+0x20(301ee6220c0, 5a1c, ffbfed38, 100003, 60003dcebf8, 12019f0)
000002a103a29191 ioctl+0x184(4, 3005b7c2ba8, ffbfed38, ff38db68, 40350, 5a1c)
000002a103a292e1 syscall_trap32+0xcc(4, 5a1c, ffbfed38, ff38db68, 40350, 0)


After snapshot was destroyed problem is completly gone.
And right now no avl_walk function at the top:

# dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt =0;}' -n tick-5s'{printa(@);exit(0);}'
[...]
cv_broadcast 11204800
vdev_cache_offset_compare 11977900
avl_walk 12568500
hat_getpfnum 13012900
zio_next_stage 13649700
page_next_scan_large 15850300
ddi_get_instance 16842800
kmem_free 20050300
pci_pbm_dma_sync 22854300
ddi_get_soft_state 25803500
space_map_seg_compare 32883300
kmem_cache_free 34151000
ip_cksum 34840900
kmem_cache_alloc 41557900
resume 47279100
fletcher_2_native 55060000
lzjb_compress 94374200
lzjb_decompress 97102700
mutex_vector_enter 115330500


Lets try small experiment - see if lowering quota for the fs will make things bad again:

# zfs list | egrep "NAME|f3-1/d611"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 334G 266G 151G //f3-1/d611
f3-1/d611@auto-2006-10-19 25.7G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.7G - 154G -

# zfs get quota f3-1/d611
NAME PROPERTY VALUE SOURCE
f3-1/d611 quota 600G local

# zfs set quota=400g f3-1/d611

# zfs list | egrep "NAME|f3-1/d611"
NAME USED AVAIL REFER MOUNTPOINT
f3-1/d611 334G 66.1G 151G //f3-1/d611
f3-1/d611@auto-2006-10-19 25.7G - 145G -
f3-1/d611@auto-2006-10-20 21.9G - 155G -
f3-1/d611@auto-2006-10-21 22.9G - 151G -
f3-1/d611@auto-2006-10-22 18.5G - 145G -
f3-1/d611@auto-2006-10-23 17.6G - 148G -
f3-1/d611@auto-2006-10-24 14.7G - 154G -

milek

Posts: 785
From: GB

Registered: 4/27/05
Re: Snapshots impact on performance
Posted: Oct 24, 2006 1:16 PM   in response to: milek
To: Communities » zfs » discuss
  Click to reply to this thread Reply

forgot to mention - after quota was lowered still no problem - everything works ok.

ahrens

Posts: 424
From: US

Registered: 3/9/05
Re: Re: Snapshots impact on performance
Posted: Oct 24, 2006 10:08 PM   in response to: milek

  Click to reply to this thread Reply

Robert Milkowski wrote:
> Hi.
>
> On nfs clients which are mounting file system f3-1/d611 I can see 3-5s periods of 100% busy (iostat) and almost no IOs issued to nfs server, on nfs server at the same time disk activity is almost 0 (both iostat and zpool iostat). However CPU activity increases in SYS during that periods.

> Different time period when disk activity is small:
>
> # lockstat -kgIw sleep 10 | less

Did you happen to get 'lockstat -kgIW' output while the problem was
occurring? (note the capital W) I'm not sure how to interpret the -w
output... (and sorry I gave you the wrong flags before).

> Now during another period when disk activity is low and nfs clients see problem:
>
> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt =0;}' -n tick-5s'{printa(@);exit(0);}'
> [...]
> page_next_scan_large 23648600
> generic_idle_cpu 69234100
> disp_getwork 139261800
> avl_walk 669424900

Hmm, that's a possibility, but the method you're using to gather this
information (tracing *every* function entry and exit) is a bit
heavy-handed, and it may be distorting the results.

> Heh, I'm sure I have seen avl_walk consuming lot of CPU before...
>
> So wait for another such period and (6-7seconds):
>
> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
> [...]

> zfs`metaslab_ff_alloc+0x9c
> zfs`space_map_alloc+0x10
> zfs`metaslab_group_alloc+0x1e4
> zfs`metaslab_alloc_dva+0x114
> zfs`metaslab_alloc+0x2c
> zfs`zio_alloc_blk+0x70
> zfs`zil_lwb_write_start+0x8c
> zfs`zil_lwb_commit+0x1ac
> zfs`zil_commit+0x1b0
> zfs`zfs_fsync+0xa8
> genunix`fop_fsync+0x14
> nfssrv`rfs3_create+0x700
> nfssrv`common_dispatch+0x444
> rpcmod`svc_getreq+0x154
> rpcmod`svc_run+0x198
> nfs`nfssys+0x1c4
> unix`syscall_trap32+0xcc
> 1415957

Hmm, assuming that avl_walk() is actually consuming most of our CPU
(which the lockstat -kgIW will confirm), this seems to indicate that
we're taking a long time trying to find free chunks of space. This may
happen if you have lots of small fragments of free space, but no chunks
large enough to hold the block we're trying to allocate. We try to
avoid this situation by trying to allocate from the metaslabs with the
most free space, but it's possible that there's an error in this algorithm.

> So lets destroy oldest snapshot:
>
> # zfs destroy f3-1/d611@auto-2006-10-18
> [it took about 4 minutes!]

> After snapshot was destroyed problem is completly gone.

FYI, destroying the snapshot probably helped simply by (a) returning
some big chunks of space to the pool and/or (b) perturbing the system
enough so that we try different metaslabs which aren't so fragmented.

--matt

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Robert Milkowski
rmilkowski@task.gda.pl
Re[2]: Re: Snapshots impact on performance
Posted: Oct 25, 2006 4:32 AM   in response to: ahrens

  Click to reply to this thread Reply

Hello Matthew,

Wednesday, October 25, 2006, 7:08:56 AM, you wrote:

MA> Robert Milkowski wrote:
>> Hi.
>>
>> On nfs clients which are mounting file system f3-1/d611 I can see 3-5s periods of 100% busy (iostat) and almost no IOs issued to nfs server, on nfs server at the same time disk activity is almost 0 (both iostat and zpool iostat). However CPU activity increases in SYS during that periods.

>> Different time period when disk activity is small:
>>
>> # lockstat -kgIw sleep 10 | less

MA> Did you happen to get 'lockstat -kgIW' output while the problem was
MA> occurring? (note the capital W) I'm not sure how to interpret the -w
MA> output... (and sorry I gave you the wrong flags before).


Sorry, just blindly copied (changed i to I), next time I'll try.

>> Now during another period when disk activity is low and nfs clients see problem:
>>
>> # dtrace -n fbt:::entry'{self->vt=vtimestamp;}' -n fbt:::return'/self->vt/{@[probefunc]=sum(vtimestamp-self->vt);self->vt =0;}' -n tick-5s'{printa(@);exit(0);}'
>> [...]
>> page_next_scan_large 23648600
>> generic_idle_cpu 69234100
>> disp_getwork 139261800
>> avl_walk 669424900

MA> Hmm, that's a possibility, but the method you're using to gather this
MA> information (tracing *every* function entry and exit) is a bit
MA> heavy-handed, and it may be distorting the results.

That could be the case - however during normal periods avl_walk when
measured the same way is not on the top.


>> Heh, I'm sure I have seen avl_walk consuming lot of CPU before...
>>
>> So wait for another such period and (6-7seconds):
>>
>> # dtrace -n fbt::avl_walk:entry'{@[stack()]=count();}'
>> [...]

>> zfs`metaslab_ff_alloc+0x9c
>> zfs`space_map_alloc+0x10
>> zfs`metaslab_group_alloc+0x1e4
>> zfs`metaslab_alloc_dva+0x114
>> zfs`metaslab_alloc+0x2c
>> zfs`zio_alloc_blk+0x70
>> zfs`zil_lwb_write_start+0x8c
>> zfs`zil_lwb_commit+0x1ac
>> zfs`zil_commit+0x1b0
>> zfs`zfs_fsync+0xa8
>> genunix`fop_fsync+0x14
>> nfssrv`rfs3_create+0x700
>> nfssrv`common_dispatch+0x444
>> rpcmod`svc_getreq+0x154
>> rpcmod`svc_run+0x198
>> nfs`nfssys+0x1c4
>> unix`syscall_trap32+0xcc
>> 1415957

MA> Hmm, assuming that avl_walk() is actually consuming most of our CPU
MA> (which the lockstat -kgIW will confirm), this seems to indicate that
MA> we're taking a long time trying to find free chunks of space. This may
MA> happen if you have lots of small fragments of free space, but no chunks
MA> large enough to hold the block we're trying to allocate. We try to
MA> avoid this situation by trying to allocate from the metaslabs with the
MA> most free space, but it's possible that there's an error in this algorithm.


>> So lets destroy oldest snapshot:
>>
>> # zfs destroy f3-1/d611@auto-2006-10-18
>> [it took about 4 minutes!]

>> After snapshot was destroyed problem is completly gone.

MA> FYI, destroying the snapshot probably helped simply by (a) returning
MA> some big chunks of space to the pool and/or (b) perturbing the system
MA> enough so that we try different metaslabs which aren't so fragmented.


Well, that's scaring - it means when I'll fill up the pool a little
bit more I could get big problem even without using snapshots which is
much worse as there would be no easy and quick method to solve
symptoms like just destroying snapshot.

--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



milek

Posts: 785
From: GB

Registered: 4/27/05
Re: Re: Snapshots impact on performance
Posted: Oct 25, 2006 6:35 AM   in response to: ahrens
To: Communities » zfs » discuss
  Click to reply to this thread Reply

Hi.

Today again the same problem - however not that severe so I'm not sure if below lockstat is representative.

# lockstat -kgIW sleep 5 | less
Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)

Count genr cuml rcnt nsec Hottest CPU+PIL Caller
----------------------------------------------------------------------------- --
1560 79% ---- 0.00 2520 cpu[3] thread_start
1018 52% ---- 0.00 2069 cpu[2]+11 idle
851 43% ---- 0.00 2093 cpu[2]+11 disp_getwork
591 30% ---- 0.00 3647 cpu[1] metaslab_alloc
591 30% ---- 0.00 3647 cpu[1] metaslab_alloc_dva
591 30% ---- 0.00 3647 cpu[1] metaslab_group_alloc
404 21% ---- 0.00 3488 cpu[1] space_map_load
404 21% ---- 0.00 3488 cpu[1] metaslab_activate
382 19% ---- 0.00 3564 cpu[1] zio_dva_allocate
382 19% ---- 0.00 3564 cpu[1] zio_write_compress
382 19% ---- 0.00 3564 cpu[1] txg_sync_thread
382 19% ---- 0.00 3564 cpu[1] spa_sync
382 19% ---- 0.00 3564 cpu[1] dsl_pool_sync
382 19% ---- 0.00 3564 cpu[1] dsl_dataset_sync
382 19% ---- 0.00 3564 cpu[1] dnode_sync
382 19% ---- 0.00 3564 cpu[1] dmu_objset_sync
382 19% ---- 0.00 3564 cpu[1] dmu_objset_sync_dnodes
382 19% ---- 0.00 3564 cpu[1] dbuf_sync
382 19% ---- 0.00 3564 cpu[1] arc_write
363 18% ---- 0.00 3510 cpu[1] syscall_trap32
361 18% ---- 0.00 3514 cpu[1] nfssys
361 18% ---- 0.00 3514 cpu[1] svc_run
350 18% ---- 0.00 3532 cpu[1] svc_getreq
349 18% ---- 0.00 3533 cpu[1] common_dispatch
251 13% ---- 0.00 3813 cpu[1] avl_find
238 12% ---- 0.00 3320 cpu[1] space_map_remove
211 11% ---- 0.00 3798 cpu[1] zil_commit
209 11% ---- 0.00 3800 cpu[1] zio_alloc_blk
209 11% ---- 0.00 3800 cpu[1] zil_lwb_write_start
198 10% ---- 0.00 4256 cpu[1] metaslab_ff_alloc
196 10% ---- 0.00 3867 cpu[1] zfs_putpage
[...]

ahrens

Posts: 424
From: US

Registered: 3/9/05
Re: Re: Re: Snapshots impact on performance
Posted: Oct 25, 2006 9:58 AM   in response to: milek

  Click to reply to this thread Reply

Robert Milkowski wrote:
> Hi.
>
> Today again the same problem - however not that severe so I'm not sure if below lockstat is representative.
>
> # lockstat -kgIW sleep 5 | less
> Profiling interrupt: 1968 events in 5.072 seconds (388 events/sec)
>
> Count genr cuml rcnt nsec Hottest CPU+PIL Caller
> -------------------------------------------------------------------------------
> 1560 79% ---- 0.00 2520 cpu[3] thread_start
> 1018 52% ---- 0.00 2069 cpu[2]+11 idle
> 851 43% ---- 0.00 2093 cpu[2]+11 disp_getwork
> 591 30% ---- 0.00 3647 cpu[1] metaslab_alloc
> 591 30% ---- 0.00 3647 cpu[1] metaslab_alloc_dva
> 591 30% ---- 0.00 3647 cpu[1] metaslab_group_alloc
> 404 21% ---- 0.00 3488 cpu[1] space_map_load
> 404 21% ---- 0.00 3488 cpu[1] metaslab_activate

Nice, this is definitely pointing the finger more definitively. Next
time could you try:

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'

(just send the last 10 or so stack traces)

In the mean time I'll talk with our SPA experts and see if I can figure
out how to fix this...

--matt
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



cjg

Posts: 160
From: GB

Registered: 7/28/05
Re: Re: Re: Snapshots impact on performance
Posted: Dec 6, 2006 9:23 AM   in response to: ahrens
To: Communities » zfs » discuss
  Click to reply to this thread Reply

One of our file servers internally to Sun that reproduces this running nv53 here is the dtrace output:

unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x78
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
8

unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_fsync+0xa4
genunix`fop_fsync+0x14
nfssrv`rfs3_setattr+0x4ec
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8

genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8

genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8

genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
8

zfs`fletcher_2_native+0x2c
zfs`arc_cksum_verify+0x64
zfs`arc_buf_thaw+0x38
zfs`dbuf_dirty+0x10c
zfs`dmu_write_uio+0xc4
zfs`zfs_write+0x3ac
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8

zfs`fletcher_2_native+0x2c
zfs`arc_cksum_verify+0x64
zfs`arc_buf_destroy+0x1c
zfs`arc_evict+0x1f0
zfs`arc_adjust+0xf8
zfs`arc_kmem_reclaim+0x100
zfs`arc_kmem_reap_now+0x20
zfs`arc_reclaim_thread+0xdc
unix`thread_start+0x4
8

zfs`fletcher_2_native+0x2c
zfs`arc_cksum_compute+0x6c
zfs`dbuf_rele+0x40
zfs`dmu_buf_rele_array+0x34
zfs`dmu_write_uio+0x13c
zfs`zfs_write+0x3ac
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
8

unix`disp_getwork+0x7c
unix`idle+0xd0
unix`thread_start+0x4
9

unix`mutex_vector_enter+0x134
zfs`metaslab_free_dva+0xb0
zfs`metaslab_free+0x78
zfs`zio_dva_free+0x14
zfs`zio_wait+0xc
zfs`arc_free+0x13c
zfs`dsl_dataset_block_kill+0x168
zfs`dbuf_do_kill+0x14
genunix`taskq_thread+0x19c
unix`thread_start+0x4
9

genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
9

10

unix`mutex_vector_enter+0x120
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
10

unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_fsync+0xa4
genunix`fop_fsync+0x14
nfssrv`rfs3_setattr+0x4ec
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
10

unix`disp_getwork+0x8c
unix`idle+0xd0
unix`thread_start+0x4
11

unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x78
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
12

genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
12

zfs`metaslab_ff_alloc+0x7c
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
12

genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_dva_allocate+0x54
zfs`zio_write_compress+0x248
zfs`arc_write+0xec
zfs`dbuf_sync+0x698
zfs`dnode_sync+0x2ec
zfs`dmu_objset_sync_dnodes+0x60
zfs`dmu_objset_sync+0x50
zfs`dsl_dataset_sync+0xc
zfs`dsl_pool_sync+0x64
zfs`spa_sync+0x1b4
zfs`txg_sync_thread+0x120
unix`thread_start+0x4
13

unix`mutex_vector_enter+0x120
zfs`metaslab_free_dva+0xb0
zfs`metaslab_free+0x78
zfs`zio_dva_free+0x14
zfs`zio_wait+0xc
zfs`arc_free+0x13c
zfs`dsl_dataset_block_kill+0x168
zfs`dbuf_do_kill+0x14
genunix`taskq_thread+0x19c
unix`thread_start+0x4
18

genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
18

zfs`metaslab_ff_alloc+0x74
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
19

genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
20

genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
22

unix`mutex_vector_enter+0x134
zfs`metaslab_group_alloc+0x1a0
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_putpage+0x1d8
genunix`fop_putpage+0x1c
nfssrv`rfs3_commit+0x130
nfssrv`common_dispatch+0x4ec
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
26

genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_commit_writer+0x2ac
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
27

zfs`metaslab_ff_alloc+0x74
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
38

genunix`avl_walk+0x50
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
45

genunix`avl_walk+0x40
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
46

genunix`avl_walk+0x3c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
57

genunix`avl_walk+0x4c
zfs`metaslab_ff_alloc+0x90
zfs`space_map_alloc+0x10
zfs`metaslab_group_alloc+0x200
zfs`metaslab_alloc_dva+0x10c
zfs`metaslab_alloc+0x3c
zfs`zio_alloc_blk+0x34
zfs`zil_lwb_write_start+0xdc
zfs`zil_lwb_commit+0x94
zfs`zil_commit_writer+0x1e4
zfs`zil_commit+0x68
zfs`zfs_write+0x648
genunix`fop_write+0x20
nfssrv`rfs3_write+0x3d8
nfssrv`common_dispatch+0x3c8
rpcmod`svc_getreq+0x20c
rpcmod`svc_run+0x1ac
nfs`nfssys+0x18c
unix`syscall_trap32+0xcc
57

Robert Milkowski
rmilkowski@task.gda.pl
Re: Re: Re: Re: Snapshots impact on performance
Posted: Dec 12, 2006 9:28 AM   in response to: cjg

  Click to reply to this thread Reply

Hello Chris,

Wednesday, December 6, 2006, 6:23:48 PM, you wrote:

CG> One of our file servers internally to Sun that reproduces this
CG> running nv53 here is the dtrace output:

Any conclusions yet?

--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



cjg

Posts: 160
From: GB

Registered: 7/28/05
Re: Re: Re: Re: Snapshots impact on performance
Posted: Dec 13, 2006 3:25 AM   in response to: Robert Milkowski

  Click to reply to this thread Reply

Robert Milkowski wrote:
> Hello Chris,
>
> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>
> CG> One of our file servers internally to Sun that reproduces this
> CG> running nv53 here is the dtrace output:
>
> Any conclusions yet?
>


Not yet. We had to delete all the "automatic" snapshots we had so that
the system became usable again. An engineer is looking into reproducing
this on a lab system. I hope they will turn up on this thread with a
progress report.

--
Chris Gerhard. __o __o __o
Principal Engineer _`\<,`\<,`\<,_
Sun Microsystems Limited (*)/-http://blogs.sun.com/chrisg
-----------------------------------------------------------
NOTICE: This email message is for the sole use of the
intended recipient(s) and may contain confidential and
privileged information. Any unauthorized review, use,
disclosure or distribution is prohibited.
If you are not the intended recipient, please contact
the sender by reply email and destroy all copies of the
original message.
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Robert Milkowski
rmilkowski@task.gda.pl
Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Jan 5, 2007 2:59 PM   in response to: cjg

  Click to reply to this thread Reply

Hello Chris,

Wednesday, December 13, 2006, 12:25:40 PM, you wrote:

CG> Robert Milkowski wrote:
>> Hello Chris,
>>
>> Wednesday, December 6, 2006, 6:23:48 PM, you wrote:
>>
>> CG> One of our file servers internally to Sun that reproduces this
>> CG> running nv53 here is the dtrace output:
>>
>> Any conclusions yet?
>>


CG> Not yet. We had to delete all the "automatic" snapshots we had so that
CG> the system became usable again. An engineer is looking into reproducing
CG> this on a lab system. I hope they will turn up on this thread with a
CG> progress report.

The problem here is that even without snapshots we're seeing the same
problem. It's like snapshots only make it worse.

Hope it'll be solved quickly (I've also escalating it with P1 for some
time...).


--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



gino

Posts: 168
From:

Registered: 7/20/06
Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Jun 27, 2007 1:59 AM   in response to: Robert Milkowski
To: Communities » zfs » discuss
  Click to reply to this thread Reply

Same problem here (snv_60).
Robert, did you find any solutions?

gino

iktorn

Posts: 221
From: RU

Registered: 3/27/06
Re: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Jun 27, 2007 5:19 AM   in response to: gino

  Click to reply to this thread Reply

Gino wrote:
> Same problem here (snv_60).
> Robert, did you find any solutions?

Couple of week ago I put together an implementation of space maps which
completely eliminates loops and recursion from space map alloc
operation, and allows to implement different allocation strategies quite
easily (of which I put together 3 more). It looks like it works for me
on thumper and my notebook with ZFS Root though I have almost no time to
test it more these days due to year end. I haven't done SPARC build yet
and I do not have test case to test against.

Also, it comes at a price - I have to spend some more time (logarithmic,
though) during all other operations on space maps and is not optimized now.

victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Robert Milkowski
rmilkowski@task.gda.pl
Re: Snapshots impact on performance
Posted: Jul 26, 2007 4:31 AM   in response to: iktorn

  Click to reply to this thread Reply

Hello Victor,

Wednesday, June 27, 2007, 1:19:44 PM, you wrote:

VL> Gino wrote:
>> Same problem here (snv_60).
>> Robert, did you find any solutions?

VL> Couple of week ago I put together an implementation of space maps which
VL> completely eliminates loops and recursion from space map alloc
VL> operation, and allows to implement different allocation strategies quite
VL> easily (of which I put together 3 more). It looks like it works for me
VL> on thumper and my notebook with ZFS Root though I have almost no time to
VL> test it more these days due to year end. I haven't done SPARC build yet
VL> and I do not have test case to test against.

VL> Also, it comes at a price - I have to spend some more time (logarithmic,
VL> though) during all other operations on space maps and is not optimized now.

Lukasz (cc) - maybe you can test it and even help on tuning it?


--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


kangurek

Posts: 35
From: Poland, Gdańsk

Registered: 3/22/07
Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Jul 27, 2007 5:41 AM   in response to: Robert Milkowski

  Click to reply to this thread Reply

Dnia 26-07-2007 o godz. 13:31 Robert Milkowski napisał(a):
> Hello Victor,
>
> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>
> VL> Gino wrote:
> >> Same problem here (snv_60).
> >> Robert, did you find any solutions?
>
> VL> Couple of week ago I put together an implementation of space maps
> which
> VL> completely eliminates loops and recursion from space map alloc
> VL> operation, and allows to implement different allocation strategies
> quite
> VL> easily (of which I put together 3 more). It looks like it works for me
> VL> on thumper and my notebook with ZFS Root though I have almost no
> time to
> VL> test it more these days due to year end. I haven't done SPARC build
> yet
> VL> and I do not have test case to test against.
>
> VL> Also, it comes at a price - I have to spend some more time
> (logarithmic,
> VL> though) during all other operations on space maps and is not
> optimized now.
>
> Lukasz (cc) - maybe you can test it and even help on tuning it?
>
Yes, I can test it. I'm building environment to compile opensolaris
and test zfs. I will be ready next week.

Victor, can you tell me where to look for your changes ?
How to change allocation strategy ?
I can see that changing space_map_ops_t
I can declare diffrent callback functions.

Lukas

----------------------------------------------------
Tylko od nich zależy czy przeżyją tę noc. Jak uciec, gdy
oni widzą wszystko? Kate Beckinsale w mrocznym thrillerze
MOTEL - kinach od 3 sierpnia!
http://klik.wp.pl/?adr=http%3A%2F%2Fadv.reklama.wp.pl%2Fas%2Fmotel.html&sid=1236


_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


iktorn

Posts: 221
From: RU

Registered: 3/27/06
Re: Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Aug 24, 2007 5:57 AM   in response to: kangurek

  Click to reply to this thread Reply

Hi Lukasz and all,

I just returned from month long sick-leave, so I need some time to sort
pile of emails, do SPARC build and some testing and then I'll be able to
provide you with my changes in some form. Hope this will happen next week.

Cheers,
Victor

Łukasz K wrote:
> Dnia 26-07-2007 o godz. 13:31 Robert Milkowski napisał(a):
>> Hello Victor,
>>
>> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
>>
>> VL> Gino wrote:
>>>> Same problem here (snv_60).
>>>> Robert, did you find any solutions?
>> VL> Couple of week ago I put together an implementation of space maps
>> which
>> VL> completely eliminates loops and recursion from space map alloc
>> VL> operation, and allows to implement different allocation strategies
>> quite
>> VL> easily (of which I put together 3 more). It looks like it works for me
>> VL> on thumper and my notebook with ZFS Root though I have almost no
>> time to
>> VL> test it more these days due to year end. I haven't done SPARC build
>> yet
>> VL> and I do not have test case to test against.
>>
>> VL> Also, it comes at a price - I have to spend some more time
>> (logarithmic,
>> VL> though) during all other operations on space maps and is not
>> optimized now.
>>
>> Lukasz (cc) - maybe you can test it and even help on tuning it?
>>
> Yes, I can test it. I'm building environment to compile opensolaris
> and test zfs. I will be ready next week.
>
> Victor, can you tell me where to look for your changes ?
> How to change allocation strategy ?
> I can see that changing space_map_ops_t
> I can declare diffrent callback functions.
>
> Lukas
>
> ----------------------------------------------------
> Tylko od nich zależy czy przeżyją tę noc. Jak uciec, gdy
> oni widzą wszystko? Kate Beckinsale w mrocznym thrillerze
> MOTEL - kinach od 3 sierpnia!
> http://klik.wp.pl/?adr=http%3A%2F%2Fadv.reklama.wp.pl%2Fas%2Fmotel.html&sid=1236
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris dot org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


kangurek

Posts: 35
From: Poland, Gdańsk

Registered: 3/22/07
Re: Odp: Re[2]: Re: Re[2]: Re: Re: Re: Snapshots impacton performance
Posted: Aug 24, 2007 6:49 AM   in response to: iktorn

  Click to reply to this thread Reply

Great, I have latest snv_70 sources and I'm working on it.

Lukas


Dnia 24-08-2007 o godz. 14:57 Victor Latushkin napisał(a):
> Hi Lukasz and all,
>
> I just returned from month long sick-leave, so I need some time to sort
> pile of emails, do SPARC build and some testing and then I'll be able to
> provide you with my changes in some form. Hope this will happen next week.
>
> Cheers,
> Victor
>
> Łukasz K wrote:
> > Dnia 26-07-2007 o godz. 13:31 Robert Milkowski napisał(a):
> >> Hello Victor,
> >>
> >> Wednesday, June 27, 2007, 1:19:44 PM, you wrote:
> >>
> >> VL> Gino wrote:
> >>>> Same problem here (snv_60).
> >>>> Robert, did you find any solutions?
> >> VL> Couple of week ago I put together an implementation of space maps
> >> which
> >> VL> completely eliminates loops and recursion from space map alloc
> >> VL> operation, and allows to implement different allocation strategies
> >> quite
> >> VL> easily (of which I put together 3 more). It looks like it works for me
> >> VL> on thumper and my notebook with ZFS Root though I have almost no
> >> time to
> >> VL> test it more these days due to year end. I haven't done SPARC build
> >> yet
> >> VL> and I do not have test case to test against.
> >>
> >> VL> Also, it comes at a price - I have to spend some more time
> >> (logarithmic,
> >> VL> though) during all other operations on space maps and is not
> >> optimized now.
> >>
> >> Lukasz (cc) - maybe you can test it and even help on tuning it?
> >>
> > Yes, I can test it. I'm building environment to compile opensolaris
> > and test zfs. I will be ready next week.
> >
> > Victor, can you tell me where to look for your changes ?
> > How to change allocation strategy ?
> > I can see that changing space_map_ops_t
> > I can declare diffrent callback functions.
> >
> > Lukas
> >
> > ----------------------------------------------------
> > Tylko od nich zależy czy przeżyją tę noc. Jak uciec, gdy
> > oni widzą wszystko? Kate Beckinsale w mrocznym thrillerze
> > MOTEL - kinach od 3 sierpnia!
> > http://klik.wp.pl/?adr=http%3A%2F%2Fadv.reklama.wp.pl%2Fas%2Fmotel.html&sid=1236
> >
> >
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss at opensolaris dot org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

----------------------------------------------------
Najnowszy album światowej klasy DJ'a Paula van Dyka
"In Between" już w sklepach! Kompilacja zawiera singiel
"White Lies" z udziałem Jessica Sutta z Pussycat Dolls
http://klik.wp.pl/?adr=http%3A%2F%2Fadv.reklama.wp.pl%2Fas%2Fpaulvandyk.html&sid=1272


_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


kangurek

Posts: 35
From: Poland, Gdańsk

Registered: 3/22/07
Re: Re[2]: Re: Re: Re: Snapshots impact on performance
Posted: Jul 27, 2007 7:21 AM   in response to: gino
To: Communities » zfs » discuss
  Click to reply to this thread Reply

> Same problem here (snv_60).
> Robert, did you find any solutions?
>
> gino

check this http://www.opensolaris.org/jive/thread.jspa?threadID=34423&tstart=0

Check spa_sync function time
remember to change POOL_NAME !

dtrace -q -n fbt::spa_sync:entry'/(char *)(((spa_t*)arg0)->spa_name) == "POOL_NAME"/{ self->t = timestamp; }' -n fbt::spa_sync:return'/self->t/{ @m = max((timestamp - self->t)/1000000); self->t = 0; }' -n tick-10m'{ printa("%@u",@m); exit(0); }'

If you have long spa_sync times, try to check if you have
problems with finding new blocks in space map with this script:

#!/usr/sbin/dtrace -s

fbt::space_map_alloc:entry
{
self->s = arg1;
}

fbt::space_map_alloc:return
/arg1 != -1/
{
self->s = 0;
}

fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
@s = quantize(self->s);
self->s = 0;
}

tick-10s
{
printa(@s);
}

Then change zfs set recordsize=XX POOL_NAME. Make sure that all filesystem inherits
recordsize.
#zfs get -r recordsize POOL_NAME

Other thing is space map size.

check map size

echo '::spa' | mdb -k | grep 'POOL_NAME' \
| while read pool_ptr state pool_name
do
echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
| mdb -k \
| nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

The value you will get is space map size on disk. In memory space map will have
about 4 *size_on_disk. Sometimes during snapshot remove kernel will have to load
all space maps to memory. For example
if space map on disk takes 1GB then:
- kernel in spa_sync funtion will read 1GB from disk ( or from cache )
- allocate 4GB for avl trees
- do all operations on avl trees
- save maps

It is good to have enough free memory for this operations.

You can reduce space map by coping all filesystems on other pool. I recommend zfs send.

regards

Lukas

Message was edited by:
kangurek

milek

Posts: 785
From: GB

Registered: 4/27/05
Re: Re: Re: Snapshots impact on performance
Posted: Dec 22, 2006 4:33 AM   in response to: ahrens
To: Communities » zfs » discuss
  Click to reply to this thread Reply

Hi.

The problem is getting worse... now even if I destroy all snapshots in a pool I get performance problem even with zil_disable set to 1.

Despite that I have limit for maximum nfs threads set to 2048 I get only about 1700.
If I want to kill nfsd server it takes 1-4 minutes untill all threads are finished (and most of the time above 1000 threads are stil there). During nfsd is stopping I can see IOs (99% reads) to the pool.

Also simple zfs commands (like changing quote for a file system, etc.) take too much time to complete (like 3-5 minutes to set quota for a file system).

There's still over 700gb free storage in a pool, setting quota to none doesn't help.

Using iostat it looks like disks aren't saturated. It looks almost like lot of nfsd threads are spinning (probably in zfs) - I don't recall nfsd stopping so long on UFS file systems.


bash-3.00# iostat -xnzC 1
[... first output]
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1782.3 0.0 12700.7 0.0 0.0 11.1 0.0 6.2 0 293 c5
584.3 0.0 4102.2 0.0 0.0 3.6 0.0 6.2 0 96 c5t600C0FF000000000098FD57F9DA83C00d0
572.1 0.0 4144.8 0.0 0.0 3.7 0.0 6.5 0 99 c5t600C0FF000000000098FD55DBA4EA000d0
625.8 0.0 4453.7 0.0 0.0 3.7 0.0 5.9 0 98 c5t600C0FF000000000098FD516E4403200d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2381.9 0.0 17242.1 0.0 0.0 16.7 0.0 7.0 0 291 c5
800.0 0.0 5827.9 0.0 0.0 5.5 0.0 6.9 0 96 c5t600C0FF000000000098FD57F9DA83C00d0
690.0 0.0 4991.9 0.0 0.0 4.9 0.0 7.1 0 97 c5t600C0FF000000000098FD55DBA4EA000d0
892.0 0.0 6422.3 0.0 0.0 6.3 0.0 7.1 0 98 c5t600C0FF000000000098FD516E4403200d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1890.2 0.0 12826.0 0.0 0.0 11.4 0.0 6.0 0 292 c5
611.1 0.0 3998.5 0.0 0.0 3.4 0.0 5.6 0 96 c5t600C0FF000000000098FD57F9DA83C00d0
604.1 0.0 4259.5 0.0 0.0 3.9 0.0 6.4 0 98 c5t600C0FF000000000098FD55DBA4EA000d0
675.1 0.0 4568.0 0.0 0.0 4.1 0.0 6.0 0 98 c5t600C0FF000000000098FD516E4403200d0
^C



Now dtrace output you've requested.

dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
[...]

unix`i_ddi_splhigh
unix`disp_getwork+0x38
unix`idle+0xd4
unix`thread_start+0x4
35

unix`i_ddi_splx
unix`disp_getwork+0x160
unix`idle+0xd4
unix`thread_start+0x4
38

unix`disp_getwork+0x70
unix`idle+0xd4
unix`thread_start+0x4
38

unix`disp_getwork+0x158
unix`idle+0xd4
unix`thread_start+0x4
38

unix`i_ddi_splhigh+0x14
unix`disp_getwork+0x38
unix`idle+0xd4
unix`thread_start+0x4
39

unix`disp_getwork+0x8c
unix`idle+0xd4
unix`thread_start+0x4
39

unix`idle+0x12c
unix`thread_start+0x4
44

unix`disp_getwork+0x1a8
unix`idle+0xd4
unix`thread_start+0x4
47

unix`disp_getwork+0x7c
unix`idle+0xd4
unix`thread_start+0x4
49

unix`disp_getwork+0x90
unix`idle+0xd4
unix`thread_start+0x4
56

unix`disp_getwork+0x10c
unix`idle+0xd4
unix`thread_start+0x4
62

unix`i_ddi_splx+0x1c
unix`disp_getwork+0x160
unix`idle+0xd4
unix`thread_start+0x4
117
bash-3.00#

milek

Posts: 785
From: GB

Registered: 4/27/05
Re: Re: Re: Snapshots impact on performance
Posted: Dec 22, 2006 4:42 AM   in response to: milek
To: Communities » zfs » discuss
  Click to reply to this thread Reply

bash-3.00# lockstat -kgIW sleep 100 | head -30

Profiling interrupt: 38844 events in 100.098 seconds (388 events/sec)

Count genr cuml rcnt nsec Hottest CPU+PIL Caller
----------------------------------------------------------------------------- --
32081 83% ---- 0.00 2432 cpu[1] thread_start
24347 63% ---- 0.00 2116 cpu[1]+11 idle
20724 53% ---- 0.00 2125 cpu[1]+11 disp_getwork
5525 14% ---- 0.00 3571 cpu0 syscall_trap32
5470 14% ---- 0.00 3570 cpu0 nfssys
5470 14% ---- 0.00 3570 cpu0 svc_run
5225 13% ---- 0.00 3595 cpu0 svc_getreq
5142 13% ---- 0.00 3600 cpu0 common_dispatch
4488 12% ---- 0.00 3040 cpu[2] taskq_thread
3658 9% ---- 0.00 2946 cpu[2] zio_vdev_io_assess
2943 8% ---- 0.00 2874 cpu[2] zio_read_decompress
2846 7% ---- 0.00 2120 cpu[1] splx
2654 7% ---- 0.00 4105 cpu[1] putnext
2541 7% ---- 0.00 2785 cpu[2] lzjb_decompress
2382 6% ---- 0.00 2138 cpu[2] i_ddi_splhigh
2056 5% ---- 0.00 3481 cpu0 fop_lookup
2031 5% ---- 0.00 3485 cpu0 zfs_lookup
1979 5% ---- 0.00 3482 cpu0 zfs_dirlook
1935 5% ---- 0.00 3492 cpu0 zfs_dirent_lock
1910 5% ---- 0.00 3835 cpu[1] txg_sync_thread
1910 5% ---- 0.00 3835 cpu[1] spa_sync
1896 5% ---- 0.00 3842 cpu[1] dsl_pool_sync
1896 5% ---- 0.00 3842 cpu[1] dmu_objset_sync
1895 5% ---- 0.00 3843 cpu[1] dmu_objset_sync_dnodes
1894 5% ---- 0.00 3843 cpu[1] dnode_sync
bash-3.00#

Jeff Bonwick
bonwick@zion.eng.sun...
Re: Re: Re: Snapshots impact on performance
Posted: Oct 29, 2006 5:03 PM   in response to: ahrens

  Click to reply to this thread Reply

> Nice, this is definitely pointing the finger more definitively. Next
> time could you try:
>
> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>
> (just send the last 10 or so stack traces)
>
> In the mean time I'll talk with our SPA experts and see if I can figure
> out how to fix this...

By any chance is the pool fairly close to full? The fuller it gets,
the harder it becomes to find long stretches of free space.

Jeff

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Robert Milkowski
rmilkowski@task.gda.pl
Re[2]: Re: Re: Snapshots impact on performance
Posted: Oct 30, 2006 12:54 AM   in response to: Jeff Bonwick

  Click to reply to this thread Reply

Hello Jeff,

Monday, October 30, 2006, 2:03:52 AM, you wrote:

>> Nice, this is definitely pointing the finger more definitively. Next
>> time could you try:
>>
>> dtrace -n 'profile-97{@[stack(20)] = count()}' -c 'sleep 5'
>>
>> (just send the last 10 or so stack traces)
>>
>> In the mean time I'll talk with our SPA experts and see if I can figure
>> out how to fix this...

JB> By any chance is the pool fairly close to full? The fuller it gets,
JB> the harder it becomes to find long stretches of free space.

Nope - at least 600GB free all the time in the pool.
Also it's not a quota - if I rise a quote for the file system (+100GB
for example) it doesn't help. If I remove oldest snapshot it always
helps immediately. Sometimes I have to remove two oldest snapshots.

--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss






Terms of Use | Privacy | Trademarks | Copyright Policy | Site Guidelines
Your use of this web site or any of its content or software indicates your agreement to be bound by these Terms of Use.
© 2010, Oracle Corporation and/or its affiliates

Oracle