7.2 upgrades...
Post 7.2 upgrades... 
Now that the scrub has finished, this is typical I/O on the Thumper:

# zpool iostat 1
              capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        22.1T  18.8T  3.32K    212   413M  25.1M
pool        22.1T  18.8T    775     30  95.8M  3.87M
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    771      0  95.8M      0
pool        22.1T  18.8T    514      0  63.8M      0
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    761      0  94.5M      0
pool        22.1T  18.8T    524      0  65.1M      0
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    509      0  63.3M      0
pool        22.1T  18.8T    179  4.77K  22.3M   598M
pool        22.1T  18.8T    503  2.77K  62.4M   305M
pool        22.1T  18.8T    509      0  63.2M      0
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    604      0  74.9M      0
pool        22.1T  18.8T    680      0  84.5M      0
pool        22.1T  18.8T    513      0  63.7M      0
pool        22.1T  18.8T    508      0  63.0M      0
pool        22.1T  18.8T    646      0  80.3M      0
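To summarize a capture like the one above, a small parser can pull out the read-bandwidth column. This is a minimal sketch, assuming the seven-column layout shown here and K/M/G/T suffixes in powers of 1024; `parse_size` and `read_bandwidth` are hypothetical helpers, not part of any ZFS tool. It drops the first data row, which appears to be the since-import average that `zpool iostat` prints before the per-interval samples.

```python
# Hypothetical helper: summarize the read-bandwidth column of `zpool iostat 1`.
# Assumes the 7-column layout above and binary (1024-based) K/M/G/T suffixes.
SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

SAMPLE = """\
              capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        22.1T  18.8T  3.32K    212   413M  25.1M
pool        22.1T  18.8T    775     30  95.8M  3.87M
pool        22.1T  18.8T    514      0  63.9M      0
pool        22.1T  18.8T    771      0  95.8M      0
"""

def parse_size(field: str) -> float:
    """Convert a zpool iostat field such as '63.9M' or '0' to a plain number."""
    if field[-1] in SUFFIX:
        return float(field[:-1]) * SUFFIX[field[-1]]
    return float(field)

def read_bandwidth(text: str) -> list[float]:
    """Return read bandwidth (MiB/s) for each per-interval row."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Data rows have 7 columns and a numeric read-bandwidth field;
        # this skips the two header lines and the dashed separator.
        if len(cols) == 7 and cols[5][0].isdigit():
            rows.append(parse_size(cols[5]) / 1024**2)
    return rows[1:]  # drop the first row (apparently the since-import average)

samples = read_bandwidth(SAMPLE)
mean = sum(samples) / len(samples)
print(f"{len(samples)} samples, mean read {mean:.1f} MiB/s, "
      f"min {min(samples):.1f}, max {max(samples):.1f}")
```

Fed all 21 rows from the capture above, the same calculation gives a mean read rate of about 68.8 MiB/s, with most samples sitting near 64M.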


On Sun, Mar 14, 2010 at 6:46 PM, Anacreo <anacreo < at > gmail.com> wrote:
Ok, so how are you accessing the Thumper: as an adv_file over NFS or as an
iSCSI LUN?

Neither. It is a Networker storage node (running 7.5.2). All clients
are configured to send directly to it. This eliminates all NFS/iSCSI
tuning issues.


Have you been able to clock your read speed off the Thumper through to
the T1000? If you can write at 100MB/s, can you read at least that fast
over X connections, where X is the number of devices you're trying to
clone to simultaneously?

Since the Thumper is a storage node, the nsrmmd on it reads the save
set from disk and sends it over TCP/IP to the nsrmmd on the T1000, which
is in charge of the relevant tape drive. As I said, this is limited to
~60MBps regardless of the number of sessions. So if there are 3 sessions
running at ~23MBps and a fourth session starts, all four sessions will
drop to ~17MBps. The amazing thing is that all clone sessions write at
the same speed, which indicates a ~60MBps bottleneck that all sessions
hit and therefore have to share.
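The reasoning here is that a single shared bottleneck keeps the aggregate (per-session rate times number of sessions) roughly constant as sessions are added. A minimal sketch of that check, using the approximate figures quoted above; `fair_share` assumes the cap is split evenly, which is an idealization rather than documented nsrmmd behavior.

```python
def aggregate(per_session_mbps: float, sessions: int) -> float:
    """Total throughput implied by `sessions` streams each at `per_session_mbps`."""
    return per_session_mbps * sessions

def fair_share(cap_mbps: float, sessions: int) -> float:
    """Per-session rate if a fixed cap is split evenly (idealized assumption)."""
    return cap_mbps / sessions

# Approximate per-session clone rates reported in the thread (MBps).
observed = {3: 23.0, 4: 17.0}

totals = {n: aggregate(rate, n) for n, rate in observed.items()}
for n, total in totals.items():
    print(f"{n} sessions: {observed[n]:.0f} MBps each, {total:.0f} MBps total")

# If the totals barely move as sessions are added, a shared cap is the likely cause.
spread = (max(totals.values()) - min(totals.values())) / max(totals.values())
print(f"relative spread of totals: {spread:.1%}")
```

Both totals land in the high 60s, so adding a fourth session redistributes the bandwidth rather than adding to it.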


Alec

On Sun, Mar 14, 2010 at 6:30 PM, Yaron Zabary <yaron < at > aristo.tau.ac.il> wrote:


Anacreo wrote:

Yaron,
What version of Solaris are you running on the Thumper? Update 8 is
significantly faster than, say, update 3.

The Thumper is U8 with recommended patches from November 2009 (kernel is
Generic_141445-09).


Do you have any SSDs in the Thumper to handle L2ARC?

No.



What kind of performance are you getting?

As I said the problem is when staging from an AFTD on the Thumper to an
LTO4 drive (with LTO3 media) on the T1000. I can get ~30MBps per clone
session. If I run a few of them (there are four drives on the T1000), the
total will be ~60MBps. Staging from an AFTD which is local to the T1000 can
do ~70MBps. The Thumper and the T1000 are both connected via a 4-port
aggregate (dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1)
to the same Cisco 3560 switch, so, in theory, if I get unlucky and all
sessions hit the same interface, the network should limit me to 125MBps.
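The 125MBps figure is simply 1Gb/s divided by 8. The sketch below, assuming each aggregated port is shared evenly by the flows hashed onto it, shows why even the unlucky case (every clone session on one port) should still allow more than the observed ~60MBps total. The port assignment is supplied by hand; it does not model the aggregation's actual hashing policy, and `per_flow_share` is a hypothetical helper.

```python
from collections import Counter

# One 1 GbE port: 1000 Mb/s / 8 bits per byte = 125 MB/s of raw line rate
# (decimal megabytes, ignoring Ethernet/IP/TCP overhead).
PORT_MBPS = 1000 / 8

def per_flow_share(assignment: list[int]) -> list[float]:
    """assignment[i] is the aggregate port that flow i was hashed to.

    Assumes each port's capacity is split evenly among the flows on it
    and that every flow can use its full share.
    """
    flows_on_port = Counter(assignment)
    return [PORT_MBPS / flows_on_port[port] for port in assignment]

# Unlucky case: four clone sessions all hashed to port 0.
worst = per_flow_share([0, 0, 0, 0])
# Lucky case: one session per port.
best = per_flow_share([0, 1, 2, 3])

print("worst case:", worst, "total", sum(worst))
print("best case: ", best, "total", sum(best))
```

Under these assumptions the aggregate never drops below one port's 125MB/s, which is why a flat ~60MBps total does not look like a raw link-capacity limit.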



Have you tried a few tests, like backing up /dev/random, to see where
you're bottlenecking?

All backups go to the Thumper, and with 64 sessions I can get ~100MBps,
which is OK because all clients are connected via a single 1GigE link to the
above-mentioned 3560 at the campus. So there is no performance problem with
the Thumper when doing backups. Also, zpool iostat 1 does not show any heavy
load on the pool (I cannot send any output right now because a scrub is
running).


Feel free to make more suggestions.




Alec

On 3/14/10, Yaron Zabary <yaron < at > aristo.tau.ac.il> wrote:

tkimball wrote:

We went from 7.1.3 + AlphaStor to 7.4.4 and no AlphaStor *cheer*. The
version choice was made over a year ago, based on Stan's experience with
it on Sun hardware.

I've been overall pleased with the new version, in particular how much
easier library management is (compared to AlphaStor anyway). I'm still
poking and prodding at the GUI to see how far I can take it, and how to
document procedures for our Ops group.

Right now my only gripe is that 7.6 came out at the wrong time (final
eval before rollout); otherwise my NMC server would have been that instead
of 7.4.4. Now I'm waiting until at least June before getting back
something similar to the old nwadmin.

Most of our troubles come from old Windows boxes, even before the
upgrade (W2K Server and AdvServer), though we've now had one incident
where the Adv_file devices started unmounting but would not re-mount
(said it was not in media db!). Bouncing the software fixed that; it had
been running for almost a month.

Yaron, can you give details regarding what your DBO issues are? I've not
seen any throughput issues (actually that's been better, now that the
server itself also went from an E450 to a T2000). However, our disk array
is 1Gig FC, so I may not be able to help.

Our setup is an AFTD located on a Sun X4500 (Thumper), and the tape
library is connected to a T1000. Staging from the X4500 to the T1000
performs poorly compared to the old setup (a Clariion AX150 directly
connected to the T1000). I suspected that this was related to LGTsc30475
(aka 30475nw, "Cloning is slow from the local to remote device"). I was
hoping that this would be solved by 7.5.2, but after upgrading this
morning, things are much the same.

The issues we had (on previous versions) were:

- "duplicate name; pick new name or delete old one" (on 7.2.2).
  Upgraded to 7.3.4.

- Owner notification bug (on 7.4.3).

- LGTsc24106 (on 7.4.4) (volretent). Patched a few binaries.

- Some nsrck bug (on 7.4.5) (nsrck hang on unknown clients, unrelated
  to AFTD). Patched some binaries.

- "Failed to fetch the saveset(ss_t) structure for ssid" (on 7.5.1).
  Moved to 7.5.1.7.

--TSK


evilensky < at > gmail.com wrote:

Hi,

So what's the latest word on upgrades from 7.2? Is 7.6 a viable
option or is 7.4/7.5 more "fully cooked"? We're not really looking
for features so much as support for the latest client platforms and
stability. We're not going to be spending much money on upgraded
hardware either, so an in-place upgrade is the most likely. Thanks in
advance for any opinions/observations.


+----------------------------------------------------------------------
|This was sent by t.s.kimball < at > gmail.com via Backup Central.
|Forward SPAM to abuse < at > backupcentral.com.
+----------------------------------------------------------------------


type
"signoff networker" in the body of the email. Please write to
networker-request < at > listserv.temple.edu if you have any problems with
this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

--

-- Yaron.


Post 7.2 upgrades... 
The single-thread issue was supposed to be fixed in U6 (I am at U8), so
I hope this is not the problem. Anyhow, I have no problem getting 100MBps
when writing, so I guess I should be OK reading as well (with respect to
the CPU cost of calculating sha256 checksums).

But keep those ideas coming; I have been trying to figure this out for
a couple of months without much success.

Anacreo wrote:
In either case, please read below. I've seen the effects of this first hand,
and it is easy to check whether it's causing your performance degradation:

From:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Tuning ZFS Checksums

End-to-end checksumming is one of the great features of ZFS. It allows ZFS
to detect and correct many kinds of errors other products can't detect and
correct. Disabling checksum is, of course, a very bad idea. Having file
system level checksums enabled can alleviate the need to have application
level checksums enabled. In this case, using the ZFS checksum becomes a
performance enabler.

The checksums are computed asynchronously to most application processing and
should normally not be an issue. However, each pool currently has a single
thread computing the checksums (RFE below) and it's possible for that
computation to limit pool throughput. So, if disk count is very large (>>
10) or a single CPU is weak (< 1 GHz), then this tuning might help. If a system
is close to CPU saturated, the checksum computations might become
noticeable. In those cases, do a run with checksums off to verify if
checksum calculation is a problem.

If you tune this parameter, please reference this URL in shell script or in
an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums

Verify the type of checksum used:

zfs get checksum <filesystem>

Tuning is achieved dynamically by using:

zfs set checksum=off <filesystem>

And reverted:

zfs set checksum=on <filesystem>    # or fletcher2, fletcher4, sha256
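Because the guide's test means turning checksums off and then remembering to turn them back on, a small wrapper can guarantee the revert. This is a hypothetical sketch, not a Sun tool: it shells out to the `zfs get`/`zfs set` commands above, using `-H -o value` to get just the property value, and always restores the original setting. One caveat worth keeping in mind: changing the checksum property only affects newly written blocks, so this tests write-side cost; data already on the AFTD is still verified with the checksum it was written with when it is read back.

```python
import subprocess
from contextlib import contextmanager

# Values accepted by the quoted guide's revert command.
KNOWN_CHECKSUMS = {"on", "off", "fletcher2", "fletcher4", "sha256"}

def get_checksum(fs: str, run=subprocess.run) -> str:
    """Return the current checksum property of `fs` (scripted output, value only)."""
    result = run(["zfs", "get", "-H", "-o", "value", "checksum", fs],
                 check=True, capture_output=True, text=True)
    return result.stdout.strip()

def _zfs_set_checksum(fs: str, value: str, run=subprocess.run) -> None:
    run(["zfs", "set", f"checksum={value}", fs], check=True)

def set_checksum(fs: str, value: str, run=subprocess.run) -> None:
    """Set the checksum property after checking the value against KNOWN_CHECKSUMS."""
    if value not in KNOWN_CHECKSUMS:
        raise ValueError(f"unexpected checksum value: {value!r}")
    _zfs_set_checksum(fs, value, run)

@contextmanager
def checksum_off(fs: str, run=subprocess.run):
    """Temporarily set checksum=off on `fs`, always restoring the original value.

    The original value is restored verbatim (no validation), so cleanup
    cannot fail just because the dataset uses a value not listed above.
    """
    original = get_checksum(fs, run)
    set_checksum(fs, "off", run)
    try:
        yield original
    finally:
        _zfs_set_checksum(fs, original, run)
```

Usage would be `with checksum_off("pool/aftd") as previous:` around the test run, where `pool/aftd` is a placeholder dataset name; the `run` parameter exists only so the command sequence can be checked without a real pool.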

Fletcher2 checksum (the default) has been observed to consume roughly 1 GHz
of a CPU when checksumming 500 MByte per second.
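Scaling that fletcher2 figure linearly (an assumption; the guide gives only this one data point) yields a rough CPU budget for a given pool throughput. Yaron mentions sha256 checksums, which cost considerably more per byte than fletcher2, so for his pool this sketch understates the load; it does not attempt to model sha256.

```python
# Reference point from the guide text quoted above:
# fletcher2 ~ 1 GHz of one CPU per 500 MB/s checksummed.
GHZ_PER_MBPS_FLETCHER2 = 1.0 / 500

def fletcher2_cpu_ghz(throughput_mbps: float) -> float:
    """Rough single-CPU GHz needed to fletcher2-checksum `throughput_mbps` MB/s.

    Assumes cost scales linearly with throughput (extrapolated from one data point).
    """
    return throughput_mbps * GHZ_PER_MBPS_FLETCHER2

for rate in (60, 100, 500):  # clone rate, backup rate, and the guide's reference rate
    print(f"{rate:>4} MB/s -> ~{fletcher2_cpu_ghz(rate):.2f} GHz of one CPU")
```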

Post 7.2 upgrades... 
Hmm... your situation has some uniqueness to it. I'd keep harping on
isolating each component. I'm curious what your total throughput is
between your two servers: have you tried simply doing a few parallel FTP
runs between them and getting some timings from that? What's sticking in
my head is that 60MB/s is basically half of a gigabit pipe, which could be
tied to some sort of Ethernet issue in your core, etc.

I'm now reading up on the ASYNCH_IO issues, where NetWorker states that
async I/O is not available on Solaris 10 and that there will be a
performance degradation for I/O-intensive operations such as cloning. It
looks like Solaris 10 does support ASYNCH_IO, but it's now handled as a
user thread instead of a kernel thread unless it's a raw device. Combine
this with the fact that the T2000 has a slower processor (1.0-1.2 GHz) and
the effects could be compounded.

Also, is your T2000 at update 8 as well? A lot of "auto-tuning" network
features were tweaked for update 8.

OK, I'm out of ideas, but I'm really curious whether you solve this issue.
Good luck!

Alec

