OpenSolaris


Thread: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!


Replies: 7 - Last Post: Feb 21, 2007 9:09 PM by: ahrens
napobo3

Posts: 157
From: IL

Registered: 6/14/05
SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 12, 2007 3:32 PM
To: Communities » zfs » discuss
Cc: Communities » nfs » discuss

Hello,
I am running the SPEC SFS benchmark [1] on a dual Xeon 2.80GHz box with 4GB of memory. More details:
snv_56, zil_disable=1, zfs_arc_max = 0x80000000 #2GB
Configurations that were tested:
160 dirs/1 zfs/1 zpool/4 SAN LUNs
160 zfs'es/1 zpool/4 SAN LUNs
40 zfs'es/4 zpools/4 SAN LUNs
The SAN storage array used doesn't honor flush cache commands.
NFSD_SERVERS=1024, NFS3 via UDP was used.
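
For readers who want to reproduce the setup, the tunables listed above are commonly set as in the following fragment. This is only a sketch: the post does not say how the values were applied, so the file locations (/etc/system and /etc/default/nfs) are assumptions. 0x80000000 bytes is 2GB, matching the comment in the post.

```
* /etc/system (assumed location; takes effect after a reboot)
set zfs:zil_disable = 1
set zfs:zfs_arc_max = 0x80000000

* /etc/default/nfs (assumed location) would carry the line:
*   NFSD_SERVERS=1024
```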
Max. number of SPEC NFS IOPS obtained: 5K
Max. number of SPEC NFS IOPS obtained a year ago for the SVM/VxFS configuration: 24K [2]
So we have almost a fivefold difference. Can we improve this? How can we accelerate this NFS/ZFS setup?
Two serious problems were observed:
1. Degradation of benchmark results on the same setup. The same benchmark gave 4030 IOPS the first time and 2037 IOPS when it was run a second time.
2. When 4 zpools were used instead of 1, the result degraded by about 4 times.

The benchmark report shows an abnormally high share of [b]readdirplus[/b] operations, which reached 50% of the test time. Their share in the SFS mix is 9%. Does this point to some known problem? Increasing the DNLC size doesn't help in the case of ZFS; I checked this.
I will appreciate your help very much. This testing is part of the preparation for a production deployment. I will provide any additional information that may be needed.

Thank you,
[i]-- leon[/i]

[1] http://www.spec.org/osg/sfs/
[2] http://napobo3.blogspot.com/2006/01/turbocharging-nfs-server.html

napobo3

Posts: 157
From: IL

Registered: 6/14/05
Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 14, 2007 1:35 AM   in response to: napobo3
To: Communities » zfs » discuss

An update:

I'm not sure whether it is related to fragmentation, but I can say that the serious performance degradation in my NFS/ZFS benchmarks is a result of the on-disk ZFS data layout.
Read operations on directories (NFS3 readdirplus) are abnormally time-consuming. That kills the server. After a cold restart of the host the performance is still on the floor.
My conclusion: it's not CPU, not memory, it's the ZFS on-disk structures.

Robert Milkowski
rmilkowski@task.gda.pl
Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 14, 2007 1:43 AM   in response to: napobo3


Hello Leon,

Wednesday, February 14, 2007, 10:35:05 AM, you wrote:

LK> An update:

LK> I'm not sure whether it is related to fragmentation, but I can say
LK> that the serious performance degradation in my NFS/ZFS benchmarks
LK> is a result of the on-disk ZFS data layout.
LK> Read operations on directories (NFS3 readdirplus) are abnormally
LK> time-consuming. That kills the server. After a cold restart of the
LK> host the performance is still on the floor.
LK> My conclusion: it's not CPU, not memory, it's the ZFS on-disk structures.
LK>

Before jumping to any conclusions, first try to eliminate NFS and do
the readdirs locally; I guess that would be quite fast. Then check on a
client (with dtrace) the time distribution of NFS requests and send us
the results.

You may also want to fiddle with async clusters on an NFS client to see
if it makes any difference.

--
Best regards,
Robert mailto:rmilkowski at task dot gda dot pl
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss at opensolaris dot org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



napobo3

Posts: 157
From: IL

Registered: 6/14/05
Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 18, 2007 11:29 AM   in response to: Robert Milkowski
To: Communities » zfs » discuss
Cc: Communities » nfs » discuss

Robert wrote:

> Before jumping to any conclusions - first try to
> eliminate nfs and do readdirs locally - I guess that would be quite fast.
> Then check on a client (dtrace) the time distribution of nfs requests
> and send us the results.

We used this test program, which does readdirs and takes one argument, the name of the directory to dig into, as in this example:
[b]rdir /mnt[/b]
The program can be downloaded here:
http://tinyurl.com/ywcyyp/rdir.c (source code)
http://tinyurl.com/ywcyyp/rdir (executable for sparc)
http://tinyurl.com/ywcyyp/rdir.x86 (executable for x86)
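
The rdir source is not reproduced in the thread. As a rough stand-in for readers who cannot fetch it, a similar kind of walk (read every directory, stat every entry) can be approximated with find. This is a sketch, not the rdir program: the function name walk_count is made up for illustration, and it assumes file names contain no newlines because entries are counted by line.

```shell
# walk_count DIR: read every directory under DIR, stat every entry,
# and print the number of entries seen (DIR itself included).
# Hypothetical helper, not the rdir program from the post.
walk_count() {
    # -ls makes find stat each entry; only the line count is kept
    find "$1" -ls 2>/dev/null | wc -l | tr -d ' '
}
```

It could be timed in the same way as rdir, once against the local ZFS tree and once against the NFS mounts.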

Results:
1. Local ZFS: we have 160 zfs'es under /tank1 (/tank1/1 ... /tank1/160); each one contains the directory tree that was created during the SFS benchmark run

# ptime /var/tmp/rdir /tank1
real [b]1:37.824[/b]
user 1.637
sys 38.498

again:
real 1:27.001
user 1.595
sys 32.146

To avoid any influence of the local runs on the NFS runs:
# zfs unmount -a
# zfs mount -a
# zfs share -a (160 shares)

2. NFS
ssh to the NFS client, create 160 dirs under /mnt, and mount /tank1/i from the NFS server on /mnt/i (i=1...160)
> ptime /var/tmp/rdir /mnt
real [b]1:48.983[/b]
user 1.096
sys 17.265

again:
real [b]3:51.001[/b]
user 1.657
sys 27.468

There is definitely a problem: the 2nd NFS run takes more than twice as long! What is the reason?
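
The client-side setup in step 2 (160 directories, one NFS mount each) could be scripted roughly as follows. This is a sketch under assumptions: the server name passed in is a placeholder, vers=3 reflects the NFS V3 runs mentioned in the thread, and any other mount options used in the original test are not stated. The function only prints the commands so they can be reviewed before being run on the client.

```shell
# print_mount_cmds SERVER COUNT: print (do not run) the client setup commands.
# SERVER is a placeholder host name; the function name is hypothetical.
print_mount_cmds() {
    server=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "mkdir -p /mnt/$i"
        echo "mount -F nfs -o vers=3 $server:/tank1/$i /mnt/$i"
        i=$((i + 1))
    done
}
```

For example, `print_mount_cmds nfsserver 160` prints 320 lines, two per filesystem.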

rbourbon

Posts: 505
From: Grenoble, France

Registered: 3/9/05
Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 19, 2007 8:42 AM   in response to: napobo3



Leon Koll writes:
> An update:
>
> I'm not sure whether it is related to fragmentation, but I can say that the serious performance degradation in my NFS/ZFS benchmarks is a result of the on-disk ZFS data layout.
> Read operations on directories (NFS3 readdirplus) are abnormally time-consuming. That kills the server. After a cold restart of the host the performance is still on the floor.
> My conclusion: it's not CPU, not memory, it's the ZFS on-disk structures.
>


As I understand the issue, a readdirplus is
2X slower when data is already cached in the client than
when it is not.

Given that the on-disk structure does not change between the
2 runs, I can't really place the fault on it.

-r




napobo3

Posts: 157
From: IL

Registered: 6/14/05
Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 20, 2007 3:21 AM   in response to: rbourbon
To: Communities » zfs » discuss
Cc: Communities » nfs » discuss

>
> As I understand the issue, a readdirplus is
> 2X slower when data is already cached in the client
> than when it is not.

Yes, that's the issue. It's not always 2X slower, but it is ALWAYS SLOWER.
Two more of my runs on NFS/ZFS show:
1.
real 2:56.760
user 2.270
sys 33.247

2.
real 4:43.397
user 2.663
sys 40.872

>
> Given that the on-disk structure does not change
> between the
> 2 runs, I can't really place the fault on it.

You mixed up two different tests described in this thread. The first is the spec.org SFS benchmark, which shows bad results on NFS/ZFS even after a reboot. The second is our own "rdir", which was written to understand the problem; it was run on the directories created by the SFS test and exposed the weird/erroneous behaviour of the NFS/ZFS combination.

Thank you for your attention.
[i]-- leon[/i]

napobo3

Posts: 157
From: IL

Registered: 6/14/05
Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 21, 2007 4:14 AM   in response to: napobo3
To: Communities » zfs » discuss
Cc: Communities » nfs » discuss

A more detailed description of the readdir test, with a conclusion at the end:

Roch asked me:
> Is this a NFS V3 or V4 test or don't care ?

I am running NFS V3, but a short test of NFS V4 showed that the
problem is there too.

Then Roch asked:
> I've run rdir on a few of my large directories. However, my
> large directories are not much larger than ncsize; maybe
> yours are. Do I understand that you hit the issue only upon
> the first large rdir after reboot?

After reboot of the NFS client (see below).

Then Roch added:
> If so, it might be that we get a speedup from the part of
> the run in which we are initially filling the dnlc cache.
> That could explain the increase in sys time. But the real
> time increase seems too much to be due to this.
>
> Anyway I'm interested in the directory size rdir reports and
> the ncsize/D from mdb -k. Also a third pass through might
> yield a lead.
>
> -r

ncsize has its default value. People told me "don't increase the DNLC size when running ZFS".
# echo 'ncsize/D' | mdb -k
ncsize:
ncsize: 129675

Directory size? There are 160 ZFS'es under zpool tank1; each ZFS is
202MB, for a total of 31.5GB and 1224000 files.

# zpool list
NAME    SIZE   USED    AVAIL   CAP   HEALTH   ALTROOT
tank1   382G   31.5G   351G    8%    ONLINE   -
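
As a quick sanity check on the figures quoted above, and on Roch's question about how the file count compares with ncsize, the arithmetic can be sketched as below. The variable names are made up for illustration, and 1GB is taken as 1024MB; the per-directory sizes that rdir would report are not given in the thread, so only totals are compared.

```shell
# Figures quoted in the post
filesystems=160
mb_per_fs=202
total_files=1224000
ncsize=129675

# Average number of files per ZFS filesystem
files_per_fs=$((total_files / filesystems))
# Total data size in GB (assuming 1GB = 1024MB)
total_gb=$(awk -v n="$filesystems" -v mb="$mb_per_fs" 'BEGIN { printf "%.2f", n * mb / 1024 }')
# How many times the total file count exceeds ncsize
files_per_ncsize=$(awk -v f="$total_files" -v n="$ncsize" 'BEGIN { printf "%.2f", f / n }')

echo "files per filesystem: $files_per_fs"
echo "total size (GB): $total_gb"
echo "total files / ncsize: $files_per_ncsize"
```

This gives 7650 files per filesystem, about 31.56GB in total (consistent with the 31.5G that zpool list reports), and a total file count roughly 9.4 times the default ncsize.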

More detailed results:
ZFS local runs - "normal behavior":

1. 2:33.406
2. 2:25.353
3. 2:27.033

NFS V3/ZFS runs - first is ok, then jumped up:

1. 3:14.185
2. 4:47.681
3. 4:52.213
4. 4:49.841
5. 4:53.069
6. 4:45.290

after reboot of the NFS client:

1. 2:56.760
2. 4:43.397

after reboot of both client and server:

1. 3:12.841
2. 4:50.869

after reboot of the NFS server only:

1. 5:15.048
2. 4:54.686
3. 4:48.713

This means the problem is on the NFS client: after a reboot of the client the first run is "ok", and all the rest are "bad". Rebooting the server didn't help; the results stayed "bad".
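
To put a number on "ok" versus "bad", the elapsed times listed above can be converted to seconds and the first run compared with the mean of the later runs. The helper names below are made up for illustration; the input is the list of NFS V3/ZFS times from this post.

```shell
# to_seconds: convert "m:ss.mmm" (or "h:mm:ss.mmm") values, one per line, to seconds
to_seconds() {
    awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.3f\n", s }'
}

# first_vs_rest: read elapsed times on stdin; print the first run,
# the mean of the remaining runs, and the ratio between them
first_vs_rest() {
    to_seconds | awk '
        NR == 1 { first = $1; next }
        { sum += $1; n++ }
        END {
            if (n > 0 && first > 0)
                printf "first=%.3f mean_rest=%.3f ratio=%.2f\n", first, sum / n, (sum / n) / first
        }'
}

# NFS V3/ZFS runs 1 to 6 from the post
printf '%s\n' 3:14.185 4:47.681 4:52.213 4:49.841 4:53.069 4:45.290 | first_vs_rest
```

For the six NFS V3/ZFS runs this prints `first=194.185 mean_rest=289.619 ratio=1.49`, that is, the later runs take about 1.5 times as long as the first one.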

Roch replied :
> I'd hypothesize that when the client doesn't know about a file he
> just gets the data and boom. But once he's got a cached copy
> he needs more time to figure out if the data is up to date.
>
> This seems to have been a tradeoff of metadata operations in favor of
> faster data op (!?).
>
> Note also that SFS doesn't use the client's NFS code. It
> runs its own user space client.

Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation.
And the SFS problem we observed (see the first message in this thread) has nothing in common with this one. Unfortunately, the abnormal behavior of NFS/ZFS during the SFS test didn't get much attention, so I don't have any clue. Anyway, I'll update this thread when I have more information on the problem.

ahrens

Posts: 413
From: US

Registered: 3/9/05
Re: Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Posted: Feb 21, 2007 9:09 PM   in response to: napobo3


Leon Koll wrote:
> The fact that the described problem is 100%-NFS-client-problem, there
> is nothing to do with ZFS code to improve the situation.

You may want to see if the folks over at nfs-discuss at opensolaris dot org
have any ideas on your NFS problem.

--matt





