OpenSolaris


Thread: improved zones/RM integration


Replies: 24 - Last Post: Nov 2, 2006 8:24 AM by: gjelinek
gjelinek

Posts: 470
From: US

Registered: 3/9/05
improved zones/RM integration
Posted: Jun 26, 2006 11:32 AM


Attached is a description of a project we have been refining for
a while now. The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.
I would appreciate hearing any suggestions or questions. I'd
like to submit this proposal to our internal architectural review
process by mid-July. I have also posted a few slides that give an
overview of the project. Those are available on the zones files
page (http://www.opensolaris.org/os/community/zones/files/).

Thanks,
Jerry

pfreund

Posts: 32
From: Cleveland OH

Registered: 3/21/06
Re: improved zones/RM integration
Posted: Jun 26, 2006 5:49 PM   in response to: gjelinek


I'm just trying to get my head around how to set up the whole pool/resource capping arena for my server/zone configurations, and I am finding it a bit confusing and arcane. Your proposal looks pretty good to me since most of what I need would be covered by the defaults. It's certainly more understandable. Unfortunately, I have to go ahead with what's available now and will have to retrofit the changes based on this proposal if/when they get implemented.

Like many others, my biggest current need is to meet licensing restrictions on the number of CPUs. At the moment this is mostly needed in the global zone, since I have to run Oracle 8i with Quick I/O there on a CPU-limited license; in the future, though, I will be able to move to Oracle 10g hosted in a non-global zone, so the temporary pools in the proposal look pretty good. I can also see a big use for swap sets and memory sets on a per-zone basis.

Overall, simplicity is good. Yet another GUI certainly would not be good. That being said, management of the proposed changes should be possible in Container Manager in SMC.

One suggestion that may only be tangential to this effort is that a mechanism be put in place to allow the global zone sysadmin to set the default pool and resource values (such as cpu shares) for the global zone without having to write a script to set them on boot. It doesn't look particularly hard but it is one more thing to maintain that really should be handled by the OS.

I also like the idea of activating FSS if the zone.cpu-shares parameter is set. Given that activating FSS requires a reboot of the global zone, you should add a check to see if FSS is already active and if it is not, put out a warning message indicating that a reboot of the global is required to make FSS active. For that matter, there should be a warning message on zone boot that FSS is not active and the configured zone.cpu-shares value will be ignored.

Phil

Phil Freund
Lead Systems and Storage Administrator
Kichler Lighting

mgerdts

Posts: 1,262
From: US

Registered: 8/5/05
Re: Re: improved zones/RM integration
Posted: Jun 26, 2006 6:33 PM   in response to: pfreund


On 6/26/06, Phil Freund <pfreund at kichler dot com> wrote:
> I'm just trying to get my head around how to set up the whole pool/resource capping arena for my server/zone configurations, and I am finding it a bit confusing and arcane. Your proposal looks pretty good to me since most of what I need would be covered by the defaults. It's certainly more understandable. Unfortunately, I have to go ahead with what's available now and will have to retrofit the changes based on this proposal if/when they get implemented.

I found Brendan Gregg's page
http://users.tpg.com.au/adsln4yb/zones.html very helpful. He
condensed dozens (hundreds?) of pages down to something digestible in
a few minutes.

> One suggestion that may only be tangential to this effort is that a mechanism be put in place to allow the global zone sysadmin to set the default pool and resource values (such as cpu shares) for the global zone without having to write a script to set them on boot. It doesn't look particularly hard but it is one more thing to maintain that really should be handled by the OS.

I have created a replacement for svc:/system/zones and its associated
/lib/svc/method/* script to activate the appropriate pools during zone
booting. It does other things too - adds a default router if
necessary, allows an override for zone shutdown timeout, makes sure
that the zone's IP address isn't pingable, etc. You may want to
consider going this route if you have a similar set of concerns.

>
> I also like the idea of activating FSS if the zone.cpu-shares parameter is set. Given that activating FSS requires a reboot of the global zone, you should add a check to see if FSS is already active and if it is not, put out a warning message indicating that a reboot of the global is required to make FSS active. For that matter, there should be a warning message on zone boot that FSS is not active and the configured zone.cpu-shares value will be ignored.

No reboot is required. See poolbind in this example:
http://users.tpg.com.au/adsln4yb/zones.html#resource_cpu1

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zones-discuss mailing list
zones-discuss at opensolaris dot org



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: Re: improved zones/RM integration
Posted: Jun 27, 2006 7:27 AM   in response to: mgerdts


Mike Gerdts wrote:
>> I also like the idea of activating FSS if the zone.cpu-shares
>> parameter is set. Given that activating FSS requires a reboot of the
>> global zone, you should add a check to see if FSS is already active
>> and if it is not, put out a warning message indicating that a reboot
>> of the global is required to make FSS active. For that matter, there
>> should be a warning message on zone boot that FSS is not active and
>> the configured zone.cpu-shares value will be ignored.
>
> No reboot is required. See poolbind in this example:
> http://users.tpg.com.au/adsln4yb/zones.html#resource_cpu1

Phil,

Mike has already responded with a bunch of useful information.
However, I looked at this URL, and on the one point of setting
FSS as the default, it doesn't look like it actually shows you how to
do this.

You don't have to reboot to make FSS the default and have all
of the global zone's processes running under FSS. Instead, you can
do something like this:

# dispadmin -d FSS
# dispadmin -u
# priocntl -s -c FSS -i all
# priocntl -s -c FSS -i pid 1

I see the -u option on dispadmin is undocumented. I'll have to
look at that and see if it might make sense to raise its stability.
In the meantime, be aware that it could change at any time, although
that is not very likely.

Also, just to clarify things a bit with regard to our proposal: if you
have a zone with cpu-shares set, we won't be changing the global zone's
scheduling class to FSS. While that would be useful for many
scenarios, it would change the behavior of the system and it might not
be what you want in all cases. Instead, we will set up the zone
so that it uses FSS for all of its processes, while the processes in
the global zone continue to run with whatever scheduling class they
are using. For a lot of cases with cpu-shares, it would probably be a good
idea to run with FSS as the default, but we can't just make this change
to the system automatically.

Thanks,
Jerry



Renaud Manus
Renaud.Manus@Sun.COM
Re: Re: improved zones/RM integration
Posted: Jun 29, 2006 9:32 AM   in response to: gjelinek




Jerry Jelinek wrote:
> You don't have to reboot to make FSS the default and have all
> of the global zone's processes running under FSS. Instead, you can
> do something like this:
>
> # dispadmin -d FSS
> # dispadmin -u
> # priocntl -s -c FSS -i all
> # priocntl -s -c FSS -i pid 1
>

Since snv_29, you can also restart the scheduler service:

# dispadmin -d FSS
# svcadm restart system/scheduler

-- Renaud



dp

Posts: 807
From: US

Registered: 3/9/05
Re: Re: improved zones/RM integration
Posted: Jul 17, 2006 10:43 AM   in response to: gjelinek


On Tue 27 Jun 2006 at 08:27AM, Jerry Jelinek wrote:
> Mike Gerdts wrote:
> >>I also like the idea of activating FSS if the zone.cpu-shares
> >>parameter is set. Given that activating FSS requires a reboot of the
> >>global zone, you should add a check to see if FSS is already active
> >>and if it is not, put out a warning message indicating that a reboot
> >>of the global is required to make FSS active. For that matter, there
> >>should be a warning message on zone boot that FSS is not active and
> >>the configured zone.cpu-shares value will be ignored.
> >
> >No reboot is required. See poolbind in this example:
> >http://users.tpg.com.au/adsln4yb/zones.html#resource_cpu1
>
> Phil,
>
> Mike has already responded with a bunch of useful information.
> However, I looked at this URL and on this one point regarding setting
> FSS as the default, it doesn't look like it actually shows you how to
> do this.
>
> You don't have to reboot to make FSS the default and have all
> of the global zone's processes running under FSS. Instead, you can
> do something like this:
>
> # dispadmin -d FSS
> # dispadmin -u
> # priocntl -s -c FSS -i all
> # priocntl -s -c FSS -i pid 1
>
> I see the -u option on dispadmin is undocumented. I'll have to
> look at that and see if it might make sense to raise its stability.

Jerry,

Just FYI, please thoroughly digest PSARC/2004/471 before recommending
the use of -u or considering documenting it. In short, your steps
should work without needing the -u option, since -d causes an update to
both the persistent file-based setting and to the kernel:

# dispadmin -d FSS
# priocntl -s -c FSS -i all
# priocntl -s -c FSS -i pid 1

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng dot sun dot com - blogs.sun.com/dp



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: Re: improved zones/RM integration
Posted: Jul 17, 2006 10:44 AM   in response to: dp


Dan Price wrote:
> Just FYI, please thoroughly digest PSARC/2004/471 before recommending
> the use of -u or considering documenting it. In short, your steps
> should work without needing the -u option, since -d causes an update to
> both the persistent file-based setting and to the kernel:
>
> # dispadmin -d FSS
> # priocntl -s -c FSS -i all
> # priocntl -s -c FSS -i pid 1

Dan,

Yes, I had looked at this a bit more after I sent the email and realized
that -u was unnecessary.

Thanks,
Jerry



jeffv

Posts: 409
From:

Registered: 6/16/05
Re: improved zones/RM integration
Posted: Jun 27, 2006 8:24 PM   in response to: gjelinek


1) General comment: I agree that this will provide needed clarity to the seemingly
unorganized RM features that we have scattered through Solaris during the last
decade. The automation of certain activities (e.g. starting rcapd in the GZ when
needed) will also be extremely beneficial.

2) Terminology: Although your use of the phrase "hard partition" should be clear
to most people, through experience with other partitioning technologies, the use
of "soft partition" is less clear. To many people, a "soft" limit is one that can
be exceeded occasionally or in certain situations, or is merely an advisory limit.
It also conflicts with SVM "soft partitions."

I suggest the phrases "dedicated partition" (or "private partition") and "shared
partition" instead to clarify the intent. Choosing "dedicated partition" might
then require renaming "dedicated-cpu" to "guaranteed-cpu" or "private-cpu" and
renaming "dedicated-memory" to "guaranteed-memory" or "private-memory".

Or we could leave "hard partition" alone and simply change "soft partition" to
"shared partition."

3) Lwps context: Why is the lwps alias defined in the context of dedicated-cpu?
Lwps seem to be unrelated to hard partitions. Further, ncpus represents a subset
of a finite resource. Lwps are not finite. The two should be separate.

4) Aliases: The notion of aliases also creates redundant output which could be
confusing. I like the simplification of aliases as described, but I wish I had a
good solution that wouldn't break existing tools that parse "zonecfg info" output
- if such tools exist.

5) Another RM setting: While we're integrating RM settings, I think we should
consider adding project.max-address-space using the same semantics as
{project,zone}.max-lwps. A zone-specific setting, zone.max-address-space, could
be added along with a zonecfg alias, max-address-space. This would allow the GZ
admin to cap the virtual memory space available to that zone. This would not take
the place of swap sets, but would be valuable if this RM integration work might be
complete before swap sets.

6) From your "Project Alternatives" slide:
Should we have yet another standalone GUI? Absolutely not. Unless it was a
wizard - IMO Solaris adoption would accelerate greatly if it had wizards to aid
potential adopters.

However this ends up looking, I look forward to seeing this integration!


Gerald A. Jelinek wrote:
> Attached is a description of a project we have been refining for
> a while now. The idea is to improve the integration of zones
> with some of the existing resource management features in Solaris.
> I would appreciate hearing any suggestions or questions. I'd
> like to submit this proposal to our internal architectural review
> process by mid-July. I have also posted a few slides that give an
> overview of the project. Those are available on the zones files
> page (http://www.opensolaris.org/os/community/zones/files/).
>
> Thanks,
> Jerry
>
>
> This message posted from opensolaris.org
>
>
> ------------------------------------------------------------------------
>
> SUMMARY:
>
> This project enhances Solaris zones[1], pools[2-4] and resource
> caps[5,6] to improve the integration of zones with resource
> management (RM). It addresses existing RFEs[7-10] in this area and
> lays the groundwork for simplified, coherent management of the various
> RM features exposed through zones.
>
> We will integrate some basic pool configuration with zones, implement
> the concept of "temporary pools" that are dynamically created/destroyed
> when a zone boots/halts and we will simplify the setting of resource
> controls within zonecfg. We will enhance rcapd so that it can cap
> a zone's memory while rcapd is running in the global zone. We will
> also make a few other changes to provide a better overall experience
> when using zones with RM.
>
> Patch binding is requested for these new interfaces and the stability
> of most of these interfaces is "evolving" (see interface table for
> complete list).
>
> PROBLEM:
>
> Although zones are fairly easy to configure and install, it appears
> that many customers have difficulty setting up a good RM configuration
> to accompany their zone configuration. Understanding RM involves many
> new terms and concepts, along with a lot of documentation to read.
> This leads to the problem that many customers either do not configure
> RM with their zones, or configure it incorrectly, leading them to be
> disappointed when zones, by themselves, do not provide all of the
> containment that they expect.
>
> This problem will just get worse in the near future with the
> additional RM features that are coming, such as cpu-caps[11], memory
> sets[12] and swap sets[13].
>
> PROPOSAL:
>
> There are 7 different enhancements outlined below.
>
> 1) "Hard" vs. "Soft" RM configuration within zonecfg
>
> We will enhance zonecfg(1M) so that the user can configure basic RM
> capabilities in a structured way.
>
> The various existing and upcoming RM features can be broken down
> into "hard" vs. "soft" partitioning of the system's resources.
> With "hard" partitioning, resources are dedicated to the zone using
> processor sets (psets) and memory sets (msets). With "soft"
> partitioning, resources are shared, but capped, with an upper limit
> on their use by the zone.
>
>            | Hard  | Soft
>     -------+-------+----------
>     cpu    | psets | cpu-caps
>     memory | msets | rcapd
>
> There are also some existing rctls (zone.cpu-shares, zone.max-lwps)
> which will be integrated into this overall concept.
>
> Within zonecfg we will organize the various RM features into four
> basic zonecfg resources so that it is simple for a user to understand
> and configure the RM features that are to be used with their zone.
> Note that zonecfg "resources" are not the same as "resource
> management". Within zonecfg, a "resource" is the name of a top-level
> property of the zone (see zonecfg(1M) for more information).
>
> The four new zonecfg resources are:
>     dedicated-cpu
>     capped-cpu (future, once cpu-caps are integrated)
>     dedicated-memory (future, once memory sets are integrated)
>     capped-memory
>
> Each of these zonecfg resources will have properties that are
> appropriate to the RM capabilities associated with that resource.
> Zonecfg will only allow one instance of each of these resources to be
> configured and it will not allow conflicting resources to be added
> (e.g. dedicated-cpu and capped-cpu are mutually exclusive).
>
> The mapping of these new zonecfg resources to the primary underlying RM
> feature is:
>     dedicated-cpu    -> temporary pset
>     dedicated-memory -> temporary mset
>     capped-cpu       -> cpu-cap rctl [11]
>     capped-memory    -> rcapd running in GZ
>
> Temporary psets and msets are described below, in section 2.
> Rcapd enhancements for running in the global zone are described below,
> in section 4.
>
> The valid properties for each of these new zonecfg resources will be:
>
>     dedicated-cpu
>         ncpus (a positive integer or range, default value 1)
>         importance (a positive integer, default value 1)
>         max-lwps (an integer >= 100)
>     capped-cpu
>         cpu-cap (a positive integer, default value 100 which
>                  represents 100% of one cpu)
>         max-lwps (an integer >= 100)
>         cpu-shares (a positive integer)
>     dedicated-memory
>         TBD - once msets [12] are completed
>     capped-memory
>         cap (a positive decimal number with optional k, m, g,
>              or t as a modifier, no modifier defaults to units
>              of megabytes(m), must be at least 1m)
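
To make the 'cap' rule quoted above concrete, here is a minimal Python sketch of how such a value might be parsed into bytes. The function name parse_cap and the use of binary multiples (1k = 1024 bytes) are assumptions of this example; the proposal does not say how zonecfg would implement the check.

```python
from decimal import Decimal, InvalidOperation

# Assumed binary multipliers; the proposal only names the suffix letters.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_cap(text):
    """Convert a capped-memory 'cap' string to bytes.

    A bare number is taken as megabytes, and anything below 1m is rejected,
    as the property description states.
    """
    s = text.strip().lower()
    if not s:
        raise ValueError("empty cap value")
    unit = "m"  # no modifier defaults to megabytes
    if s[-1] in _UNITS:
        unit, s = s[-1], s[:-1]
    try:
        number = Decimal(s)
    except InvalidOperation:
        raise ValueError("cap is not a decimal number: %r" % text)
    if not number.is_finite() or number <= 0:
        raise ValueError("cap must be a positive number: %r" % text)
    size = int(number * _UNITS[unit])
    if size < _UNITS["m"]:
        raise ValueError("cap must be at least 1m: %r" % text)
    return size
```

With this sketch, parse_cap("512") and parse_cap("512m") give the same result, while parse_cap("512k") is rejected as too small.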
>
> Some of these properties actually correspond to rctls. See section 3
> below for a description of how this will work.
>
> Zonecfg will also be enhanced to check for invalid combinations.
> This means it will disallow a dedicated-cpu resource and the
> zone.cpu-shares rctl being defined at the same time. It also means
> that explicitly specifying a pool name via the 'pool' resource, along
> with either a 'dedicated-cpu' or 'dedicated-memory' resource is an
> invalid combination.
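
The invalid combinations listed so far can be sketched as a small check. This is only an illustration in Python of the three rules the proposal states; the function name find_conflicts and its arguments are hypothetical, and the real zonecfg logic may well differ.

```python
def find_conflicts(resources, rctls=(), pool=None):
    """Return a list of conflict descriptions; an empty list means no conflict.

    resources: names of the new zonecfg resources configured for the zone
    rctls:     names of rctls set on the zone (e.g. "zone.cpu-shares")
    pool:      explicit pool name from the existing 'pool' resource, or None
    """
    resources = set(resources)
    problems = []
    # Rule 1: dedicated-cpu and capped-cpu are mutually exclusive.
    if {"dedicated-cpu", "capped-cpu"} <= resources:
        problems.append("dedicated-cpu and capped-cpu are mutually exclusive")
    # Rule 2: dedicated-cpu cannot coexist with the zone.cpu-shares rctl.
    if "dedicated-cpu" in resources and "zone.cpu-shares" in rctls:
        problems.append("dedicated-cpu conflicts with the zone.cpu-shares rctl")
    # Rule 3: an explicit pool cannot be combined with a dedicated-* resource.
    dedicated = sorted(resources & {"dedicated-cpu", "dedicated-memory"})
    if pool is not None and dedicated:
        problems.append(
            "pool '%s' conflicts with %s" % (pool, ", ".join(dedicated))
        )
    return problems
```

Returning every conflict rather than stopping at the first one is a choice made for this example, so that a user could fix them all in one pass.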
>
> These new zonecfg resource names (dedicated-cpu, capped-cpu,
> dedicated-memory & capped-memory) are chosen so as to be reasonably
> clear what the objective is, even though they do not exactly align
> with our existing underlying (and inconsistent) RM naming schemes.
>
> 2) Temporary Pools.
>
> We will implement the concept of "temporary pools" within the pools
> framework.
>
> To improve the integration of zones and pools we are allowing the
> configuration of some basic pool attributes within zonecfg, as
> described above in section 1. However, we do not want to extend
> zonecfg to completely and directly manage standard pool configurations.
> That would lead to confusion and inconsistency regarding which tool to
> use and where configuration data is stored. Temporary pools sidestep
> this problem and allow zones to dynamically create a simple pool/pset
> configuration for the basic case where a sysadmin just wants a
> specified number of processors dedicated to the zone (and eventually a
> dedicated amount of memory).
>
> We believe that the ability to simply specify a fixed number of cpus
> (and eventually a mset size) meets the needs of a large percentage of
> zones users who need "hard" partitioning (e.g. to meet licensing
> restrictions).
>
> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated to the zone's use. Zoneadmd will
> dynamically create a pool & pset (or eventually a mset) and assign the
> number of cpus specified in zonecfg to that pset. The temporary pool
> & pset will be named 'SUNWzone{zoneid}'.
>
> Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
> well as the 'pool.importance' pool property, based on the values
> specified for dedicated-cpu's 'ncpus' and 'importance' properties
> in zonecfg.
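
As a rough illustration of the mapping just described, the Python sketch below turns a zone id and the dedicated-cpu properties into the temporary pool's name and the 'pset.min', 'pset.max' and 'pool.importance' values. The function name is hypothetical, and treating a single integer as min = max is an assumption; the proposal only shows the range form.

```python
def temporary_pool_settings(zoneid, ncpus="1", importance=1):
    """Derive temporary pool/pset settings from dedicated-cpu properties.

    ncpus is either a positive integer ("2") or a range ("2-4").
    """
    low, sep, high = str(ncpus).partition("-")
    pset_min = int(low)
    # Assumption: a single value means the pset has exactly that many cpus.
    pset_max = int(high) if sep else pset_min
    if pset_min < 1 or pset_max < pset_min:
        raise ValueError("invalid ncpus: %r" % ncpus)
    if importance < 1:
        raise ValueError("importance must be a positive integer")
    return {
        "name": "SUNWzone%d" % zoneid,  # shared by the pool and the pset
        "pset.min": pset_min,
        "pset.max": pset_max,
        "pool.importance": importance,
    }
```

For a zone with id 3 and ncpus=2-4, this yields the name SUNWzone3 with pset.min of 2 and pset.max of 4.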
>
> If the cpu (or memory) resources needed to create the temporary pool
> are unavailable, zoneadmd will issue an error and the zone won't boot.
>
> When the zone is halted, the temporary pool & pset will be destroyed.
>
> We will add a new boolean property ('temporary') that can exist on
> pools and any resource set. The 'temporary' property indicates that
> the pool or resource set should never be committed to a static
> configuration (e.g. pooladm -s) and that it should never be destroyed
> when updating the dynamic configuration from a static configuration
> (e.g. pooladm -c). These temporary pools/resources can only be managed
> in the dynamic configuration. These changes will be implemented within
> libpool(3LIB).
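
The intended effect of the 'temporary' property can be modeled in a few lines. This Python sketch is not libpool code; it represents a configuration as a plain dictionary (an assumption of the example) to show which elements survive each operation.

```python
def save_static(dynamic):
    """Model of saving the dynamic configuration (e.g. pooladm -s).

    Elements marked temporary are never written to the static configuration.
    """
    return {
        name: props
        for name, props in dynamic.items()
        if not props.get("temporary")
    }


def commit_static(dynamic, static):
    """Model of updating the dynamic configuration (e.g. pooladm -c).

    The static configuration replaces the dynamic one, except that
    temporary elements already present are kept rather than destroyed.
    """
    merged = dict(static)
    merged.update(
        (name, props) for name, props in dynamic.items() if props.get("temporary")
    )
    return merged
```

In this model a temporary pool created at zone boot stays in the dynamic configuration across both operations and never appears in the saved file.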
>
> It is our expectation that most users will never need to manage
> temporary pools through the existing poolcfg(1M) commands. For users
> who need more sophisticated pool configuration and management, the
> existing 'pool' resource within zonecfg should be used and users
> should manually create a permanent pool using the existing mechanisms.
>
> 3) Resource controls in zonecfg will be simplified.
>
> Within zonecfg the existing rctls (zone.cpu-shares and zone.max-lwps)
> take a 3-tuple value where only a single component usually has any
> meaning (the 'limit'). The other two components of the value (the
> 'priv' and 'action') are not normally changed but users can be confused
> if they don't understand what the other components mean or what values
> can be specified.
>
> Here is a zonecfg example:
> > add rctl
> rctl> set name=zone.cpu-shares
> rctl> add value (priv=privileged,limit=5,action=none)
> rctl> end
>
> Within zonecfg we will introduce the idea of rctl aliases. The alias
> is a simplified name and template for the existing rctls. Behind the
> scenes we continue to store the data using the existing rctl entries
> in the XML file. Thus, the alias always refers to the same underlying
> piece of data as the full rctl.
>
> The purpose of the rctl alias is to provide a simplified name and
> mechanism to set the rctl 'limit'. For each rctl/alias pair we will
> "know" the expected values for the 'priv' and 'action' components of
> the rctl value. If an rctl is already defined that does not match this
> "knowledge" (e.g. it has a non-standard 'action' or there are multiple
> values defined for the rctl), then the user will not be allowed to use
> an alias for that rctl.
>
> Here are the aliases we will define for the rctls:
>     alias       rctl
>     -----       ----
>     max-lwps    zone.max-lwps
>     cpu-shares  zone.cpu-shares
>     cpu-cap     zone.cpu-cap (future, once cpu-caps integrate)
>
> Here is an example of the max-lwps alias used as a property within the
> new 'dedicated-cpu' resource:
>
>     > add dedicated-cpu
>     dedicated-cpu> set ncpus=2-4
>     dedicated-cpu> set max-lwps=500
>     dedicated-cpu> end
>     > info
>     ...
>     dedicated-cpu:
>         ncpus: 2-4
>         max-lwps: 500
>     rctl:
>         name: zone.max-lwps
>         value: (priv=privileged,limit=500,action=deny)
>
> In the example, you can see the use of the alias when adding the
> 'dedicated-cpu' resource and you can also see the full rctl output
> within the 'info' command. If the 'max-lwps' property had not been set
> within the 'dedicated-cpu' resource, then the corresponding rctl would
> not be defined.
>
> If you update the rctl value through the 'rctl' resource within
> zonecfg, then the corresponding value within the 'dedicated-cpu'
> resource would also be updated since both the rctl and its alias refer
> to the same piece of data.
>
> If an rctl was already defined that did not match the expected value
> (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
> alias will be disabled. An attempt to set 'max-lwps' within
> 'dedicated-cpu' would print the following error:
> "One or more incompatible rctls already exist for this
> property"
>
> This rctl alias enhancement is fully backward compatible with the
> existing rctl syntax. That is, zonecfg output will continue to display
> rctl settings in the current format (in addition to the new aliased
> format) and zonecfg will continue to accept the existing input syntax
> for setting rctls. This ensures full backward compatibility for any
> existing tools/scripts that parse zonecfg output or configure zones.
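
The alias behavior in this section can be sketched as follows. This is a Python model only: the data layout and function names are hypothetical, and the expected 'priv' and 'action' values are taken from the two zonecfg examples in the proposal rather than from any stated table.

```python
# alias -> (rctl name, expected priv, expected action); values inferred
# from the examples in the proposal, so treat them as assumptions.
ALIASES = {
    "max-lwps": ("zone.max-lwps", "privileged", "deny"),
    "cpu-shares": ("zone.cpu-shares", "privileged", "none"),
}

INCOMPATIBLE = "One or more incompatible rctls already exist for this property"


def set_alias(rctls, alias, limit):
    """Set an rctl limit through its alias.

    rctls maps an rctl name to a list of (priv, limit, action) tuples and is
    updated in place, so the alias and the full rctl share one piece of data.
    """
    name, priv, action = ALIASES[alias]
    values = rctls.get(name, [])
    # The alias is disabled if there are multiple values or a
    # non-standard priv/action on the existing rctl.
    if len(values) > 1 or any((p, a) != (priv, action) for p, _, a in values):
        raise ValueError(INCOMPATIBLE)
    rctls[name] = [(priv, limit, action)]


def get_alias(rctls, alias):
    """Return the limit seen through the alias, or None if it does not apply."""
    name, priv, action = ALIASES[alias]
    values = rctls.get(name, [])
    if len(values) == 1 and (values[0][0], values[0][2]) == (priv, action):
        return values[0][1]
    return None
```

Because both functions read and write the same rctl entry, changing the limit through the full rctl is immediately visible through the alias, which matches the behavior described above.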
>
> 4) Enable rcapd to limit zone memory while running in the global zone
>
> Currently, to use rcapd(1M) to limit zone memory consumption, the
> rcapd process must be run within the zone. This exposes a loophole
> since the zone administrator, who might be untrusted, can change the
> rcapd limit.
>
> We will enhance rcapd so that it can limit a zone's memory consumption
> while it is running in the global zone. This closes the rcapd
> loophole and allows the global zone administrator to set memory
> caps that can be enforced by a single, trusted process.
>
> The rcapd limit for a zone will be configured using the new
> 'capped-memory' resource and 'cap' property within zonecfg.
> When a zone with 'capped-memory' boots, zoneadmd will automatically
> start rcapd in the global zone, if necessary. The interfaces to
> communicate memory cap information between zoneadmd and rcapd
> are project private.
>
> As part of this overall project, we will be enhancing the internal
> rcapd rss accounting so that rcapd will have a more accurate
> measurement of the overall rss for each zone.
>
> 5) Use FSS when zone.cpu-shares is set
>
> Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
> Scheduler (FSS) is not the default scheduling class so this rctl
> frequently has no effect, unless the user also sets FSS as the
> default scheduler or changes the zone's processes to use FSS with the
> priocntl(1M) command. This means that users can easily think
> they have configured their zone for a behavior that they are not
> actually getting.
>
> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.
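
The decision described here is small enough to write down. The Python sketch below is only a model of the rule; the function name and the way the default scheduling class is passed in are assumptions, not zoneadmd interfaces.

```python
def scheduling_class_for_zone(zone_rctls, default_class):
    """Return the class to force on the zone's processes, or None.

    zone_rctls:    mapping of rctl names set on the zone to their limits
    default_class: the system default scheduling class, or None if unset
    """
    # Only act when shares are configured and would otherwise be ignored.
    if "zone.cpu-shares" in zone_rctls and default_class != "FSS":
        return "FSS"
    # Either no shares are set, or FSS is already the default, so the
    # zone's processes are left with whatever class they would get anyway.
    return None
```

Note that the global zone's scheduling class is not touched in either branch; only the zone's own processes are affected.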
>
> 6) Add RM templates for zone creation
>
> Zonecfg already supports templates on the 'create' subcommand using
> the '-t' option. We will update the documentation which currently
> states that a template must be the name of an existing zone. We
> already deliver two existing templates (SUNWblank and SUNWdefault).
>
> We will deliver at least four new templates that configure
> reasonable default properties for the four new resources in zonecfg:
>     fully dedicated:              dedicated-cpu & dedicated-memory
>     cpu dedicated, memory capped: dedicated-cpu & capped-memory
>     cpu capped, memory dedicated: capped-cpu & dedicated-memory
>     cpu & memory capped:          capped-cpu & capped-memory
>
> We may also deliver other templates that only pre-configure one of
> the new resources (e.g. only configures dedicated-cpu and leaves
> memory with the default handling).
>
> We will enhance the 'create' help command to briefly describe the
> templates and why you would use one vs. another.
>
> The names of all new templates will begin with SUNW. This namespace
> was already reserved by [1].
>
> This zonecfg change will primarily impact the documentation.
>
> 7) Pools system objective defaults to weighted-load (wt-load)[4]
>
> Currently pools are delivered with no objective set. This means that
> if you enable the poold(1M) service, nothing will actually happen on
> your system.
>
> As part of this project, we will set weighted load
> (system.poold.objectives=wt-load) to be the default objective.
> Delivering this objective as the default does not impact systems out
> of the box since poold is disabled by default.
>
> EXPORTED INTERFACES
>
> New zonecfg resource names
>     dedicated-cpu         Evolving
>     capped-cpu            Evolving
>     dedicated-memory      Evolving
>     capped-memory         Evolving
>
>     The capped-cpu and dedicated-memory resource names are
>     being reserved now in anticipation of the future integration
>     of the cpu-caps and memory sets projects. However, we do
>     not want to make this project dependent on [11] & [12].
>
> New zonecfg property names
>     ncpus                 Evolving
>     importance            Evolving
>     cap                   Evolving
>
> New zonecfg rctl alias names
>     max-lwps              Evolving
>     cpu-shares            Evolving
>     cpu-cap               Evolving
>
>     The cpu-cap rctl alias is being reserved now in anticipation
>     of the future integration of the cpu-caps projects. However,
>     we do not want to make this project dependent on [11].
>
> Temporary pool & resource names
>     SUNWzone{id}          Stable
>
> New temporary pool & resource boolean properties
>     'pool.temporary'      Evolving
>     'pset.temporary'      Evolving
>     '*.temporary'         Evolving
>         (for future resources such as mset)
>
> rcapd/zoneadmd interface
>     zone_getattr          Project Private
>
> wt-load as default        Evolving
>
> IMPORTED INTERFACES
>
>     libpool(3LIB)         unstable
>
> REFERENCES
>
> 1. PSARC 2002/174 Virtualization and Namespace Isolation in Solaris
> 2. PSARC 2000/136 Administrative support for processor sets and extensions
> 3. PSARC 1999/119 Tasks, Sessions, Projects and Accounting
> 4. PSARC 2002/287 Dynamic Resource Pools
> 5. PSARC 2002/519 rcapd(1MSRM): resource capping daemon
> 6. PSARC 2003/155 rcapd(1M) sedimentation
> 7. 6421202 RFE: simplify and improve zones/pool integration
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421202
> 8. 6222025 RFE: simplify rctl syntax and improve cpu-shares/FSS interaction
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6222025
> 9. 5026227 RFE: ability to rcap zones from global zone
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5026227
> 10. 6409152 RFE: template support for better RM integration
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6409152
> 11. PSARC 2004/402 CPU Caps
> 12. PSARC 2000/350 Physical Memory Control
> 13. PSARC 2002/181 Swap Sets
>
>

--
--------------------------------------------------------------------------
Jeff VICTOR Sun Microsystems jeff.victor @ sun.com
OS Ambassador Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--------------------------------------------------------------------------



Wences Michel
Wences.Michel@Sun.COM
Re: Applications Support in Zones Question?
Posted: Jun 27, 2006 9:21 PM   in response to: jeffv


Howdy!

I have a customer who wants to migrate to Solaris 10. I have told them that the following applications they are running are supported on Solaris 10, but they want to know whether they are supported in zones. Has anyone found or seen any issues running these apps in zones? Whom should I contact to find out whether these apps are supported in zones?

IBM WebSphere MQ 5.3 and 6.0
IBM WebSphere Application Server Network Deployment 6.0.2
Oracle 10g with RAC
IBM HTTP Server (IHS), the version that ships with the app server
BEA WebLogic 8.1 or 9.0 Enterprise

Thanks!

Wences




Neil Garthwaite
Neil.Garthwaite@Sun....
Re: Applications Support in Zones Question?
Posted: Jun 27, 2006 11:10 PM   in response to: Wences Michel

Wences,

I can answer for IBM WebSphere MQ,

"Installation of WebSphere MQ is supported in Global and non-Global
Whole Root zone environments." Please read the link below for more info.

http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg21233258

While I'm here: since WebSphere MQ can be a single point of failure, you
may also want to consider Sun Cluster. Currently, with SC 3.1 08/05, you
can deploy WMQ v5.3 and v6.0.1 into separate whole root "failover" zones
using the Sun Cluster Data Service for Solaris Containers together with
the Sun Cluster Data Service for WebSphere MQ.

An interesting option with Sun Cluster is a single-node cluster that
restarts WebSphere MQ if the app has failed or is wedged
within the Global or non-Global Whole Root zone. Of course, further
protection against SPOFs is gained when more than one node is used within
Sun Cluster.

Finally, the above applies to S10 + SC 3.1 08/05 on SPARC (v5.3 &
v6.0.1) and x86-64 (WMQ v6.0.1 only).

Regards
Neil

Wences Michel wrote:

> Howdy!
>
> I have a customer that wants to migrate to Solaris 10. I have told
> them that the applications they are running, listed below, are supported
> on Solaris 10, but they want to know whether they are supported in zones.
> Has anyone found or seen any issues running these apps in zones? Whom
> should I contact to find out whether these apps are supported in zones?
>
> IBM WebSphere MQ 5.3 and 6.0
> IBM WebSphere Application Server Network Deployment 6.0.2
> Oracle 10g with RAC
> IBM HTTP Server (IHS), the version that ships with the app server
> BEA WebLogic 8.1 or 9.0 Enterprise
>
> Thanks!
>
> Wences





mgerdts

Posts: 1,262
From: US

Registered: 8/5/05
Re: Applications Support in Zones Question?
Posted: Jun 28, 2006 5:13 AM   in response to: Wences Michel

On 6/27/06, Wences Michel <Wences dot Michel at sun dot com> wrote:
> Oracle 10g with RAC

If you are using Veritas cluster volume manager and VxFS for this, the
only supported mechanism to mount the file system is via loopback from
the global zone. Also, the entire Veritas storage and cluster
framework would need to run in the global zone. This implies that
there would be a fair amount of rework between the CFSmount and Oracle
(is that the right name?) resources to add loopback mounts and zone
resources.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/



van30

Posts: 12
From:

Registered: 4/14/06
Re: Applications Support in Zones Question?
Posted: Jul 5, 2006 3:12 PM   in response to: Wences Michel

Oracle 10g with RAC is not supported inside local zones. See Oracle MetaLink note 317257.1; excerpts follow.

"Oracle RAC does not work in non-global Solaris 10 Containers."

and

7.1 Oracle RAC

An Oracle RAC installation is composed of several nodes, shared storage, and a private interconnect. A local Solaris Container cannot be used as an Oracle RAC node, mainly because of the following two reasons:

Cluster manager: One of the software components that must be present in a RAC installation is a cluster manager. The cluster manager is responsible for isolating failed systems from the shared storage, to avoid corruption of data, and enabling communication between nodes in the cluster via a private interconnect. In order to use local Solaris Containers as RAC nodes the cluster manager would need to become capable of managing local Containers. As of this writing no cluster solution is capable of using local containers as cluster members or nodes.

Oracle RAC VIP: Another limitation to running Oracle RAC in a local Solaris Container is the Virtual IP. Oracle RAC uses a virtual interface in each node to distribute client connections to all the active nodes in the cluster. When a node (node A) leaves the cluster one of the other active members of the cluster (node B) will take over its virtual IP address by bringing up an extra virtual interface with the IP address used by node A. In this way node B will service all new connections sent to node A until node A re-joins the cluster. For security reasons, a local Container does not have the privileges required for managing virtual interfaces. Therefore, the VIP mechanism used by Oracle RAC conflicts with the local container security limitations.

mgerdts

Posts: 1,262
From: US

Registered: 8/5/05
Re: improved zones/RM integration
Posted: Jun 28, 2006 5:06 AM   in response to: jeffv

On 6/27/06, Jeff Victor <Jeff dot Victor at sun dot com> wrote:
> 1) General comment: I agree that this will provide needed clarity to the seemingly
> unorganized RM features that we have scattered through Solaris during the last
> decade. The automation of certain activities (e.g. starting rcapd in the GZ when
> needed) will also be extremely beneficial.

Most definitely.

> 2) Terminology: Although your use of the phrase "hard partition" should be clear
> to most people, through experience with other partitioning technologies, the use
> of "soft partition" is less clear. To many people, a "soft" limit is one that can
> be exceeded occasionally or in certain situations, or is merely an advisory limit.
> It also conflicts with SVM "soft partitions."
>
> I suggest the phrases "dedicated partition" (or "private partition") and "shared
> partition" instead to clarify the intent. Choosing "dedicated partition" might
> then require re-naming "dedicated-cpu" to "guaranteed-cpu" or "private-cpu" and
> re-naming "dedicated-memory" to "guaranteed-memory" or "private memory".
>
> Or we could leave "hard partition" alone and simply change "soft partition" to
> "shared partition."

I think that "soft partition" is clearer. "Shared partition"
suggests to me that you are sharing a single hard or soft partition among
multiple workloads.

> 3) Lwps context: Why is the lwps alias defined in the context of dedicated-cpu?
> Lwps seem to be unrelated to hard partitions. Further, ncpus represents a subset
> of a finite resource. Lwps are not finite. The two should be separate.

Something seemed odd with that to me too. I didn't see it as terribly
harmful there, but it is an excellent point. This is analogous to
maxuprc, which is typically set in /etc/system. What are the chances that each
zone in the future gets file descriptor limits or other similar limits
that should be in the same section as the lwp limit?

> 4) Aliases: The notion of aliases also creates redundant output which could be
> confusing. I like the simplification of aliases as described, but I wish I had a
> good solution that wouldn't break existing tools that parse "zonecfg info" output
> - if such tools exist.

Because key features are missing from zones, I have been writing
scripts that sometimes parse the output of "zonecfg info". Changing
this format stands a good chance of breaking my scripts. It sounds
like if I set pool and rctl resources, they would still be displayed
as rctl and not translated to the syntax associated with temporary
pools. Is this correct?

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jun 28, 2006 6:32 AM   in response to: mgerdts

Mike,

I think most of your comments were addressed in my response to
Jeff but I did want to make sure one thing was clear.

Mike Gerdts wrote:
> On 6/27/06, Jeff Victor <Jeff dot Victor at sun dot com> wrote:
>> 4) Aliases: The notion of aliases also creates redundant output which
>> could be
>> confusing. I like the simplification of aliases as described, but I
>> wish I had a
>> good solution that wouldn't break existing tools that parse "zonecfg
>> info" output
>> - if such tools exist.
>
> Because key features are missing from zones, I have been writing
> scripts that sometimes parse the output of "zonecfg info". Changing
> this format stands a good chance of breaking my scripts. It sounds
> like if I set pool and rctl resources, they would still be displayed
> as rctl and not translated to the syntax associated with temporary
> pools. Is this correct?

Yes, we considered lots of alternatives but we wanted to make sure
that any scripts would continue to work. So, if your scripts are setting
or looking at the rctl entries, then they should continue to work,
even if you also start to use the new resources.
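To illustrate the kind of script being discussed, here is a minimal Python sketch that pulls only the rctl entries out of "zonecfg info"-style text and skips any resource it does not recognize. The sample text is modeled on the 'info' example in the proposal. The tab indentation of property lines, the zone name, and the helper name parse_rctls are assumptions made for illustration, not details taken from zonecfg itself.

```python
import re

# Sample modeled on the proposal's 'info' example. The tab indentation and
# the zone name "myzone" are assumptions for illustration.
SAMPLE_INFO = """\
zonename: myzone
dedicated-cpu:
\tncpus: 2-4
\tmax-lwps: 500
rctl:
\tname: zone.max-lwps
\tvalue: (priv=privileged,limit=500,action=deny)
"""

# Matches the 3-tuple rctl value syntax shown in the proposal.
VALUE_RE = re.compile(r"\(priv=(\w+),limit=(\d+),action=([\w=]+)\)")


def parse_rctls(info_text):
    """Return {rctl_name: [(priv, limit, action), ...]} from info-style text."""
    rctls = {}
    in_rctl = False
    name = None
    for line in info_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new top-level property or resource.
            in_rctl = line.strip() == "rctl:"
            name = None
            continue
        if not in_rctl:
            # Property of some other resource (known or unknown): ignore it.
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "name":
            name = value
            rctls.setdefault(name, [])
        elif key == "value" and name is not None:
            match = VALUE_RE.fullmatch(value)
            if match:
                priv, limit, action = match.groups()
                rctls[name].append((priv, int(limit), action))
    return rctls


if __name__ == "__main__":
    print(parse_rctls(SAMPLE_INFO))
```

Because the parser only reacts to "rctl:" blocks, a new block such as dedicated-cpu passes through without affecting the result, which is the compatibility property described above.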

Thanks,
Jerry



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jun 28, 2006 6:25 AM   in response to: jeffv

Jeff,

Thanks for your comments. I have a few responses in-line.

Jeff Victor wrote:
> 1) General comment: I agree that this will provide needed clarity to the
> seemingly unorganized RM features that we have scattered through Solaris
> during the last decade. The automation of certain activities (e.g.
> starting rcapd in the GZ when needed) will also be extremely beneficial.
>
> 2) Terminology: Although your use of the phrase "hard partition" should
> be clear to most people, through experience with other partitioning
> technologies, the use of "soft partition" is less clear. To many people,
> a "soft" limit is one that can be exceeded occasionally or in certain
> situations, or is merely an advisory limit. It also conflicts with SVM
> "soft partitions."

We started using "hard" and "soft" to describe the general idea amongst
ourselves when we first started talking about this but we were never
very satisfied with those terms either. These two terms will not necessarily
be used in the final documentation and are not used in the resource names
themselves. Naming always seems to be a difficult area. The key technical
issue to focus on is the resource names and properties being proposed as
opposed to the overall words we are using to describe the general ideas.
I am guessing we were able to communicate the general idea using "hard" and
"soft" so I think we are ok there. We will have to figure out the best way
to document this when the time comes. It is hard to find good terms that
are not already used by other parts of the system. The word "resource" is
a good example and is probably more confusing than "soft partition".

> 3) Lwps context: Why is the lwps alias defined in the context of
> dedicated-cpu? Lwps seem to be unrelated to hard partitions. Further,
> ncpus represents a subset of a finite resource. Lwps are not finite.
> The two should be separate.

Being able to set max-lwps is a useful limit for both processor sets
and cpu-caps which is why it is available in both resources. The global
zone still manages all processes, even when you are using a processor
set, so a fork bomb can still affect the responsiveness of the system
as a whole. However, max-lwps is optional in both the dedicated-cpu and
capped-cpu resources. I should have made that clearer. I will
update the document to clarify that.
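As a rough illustration of the dedicated-cpu property rules (ncpus is a positive integer or range defaulting to 1, importance is a positive integer defaulting to 1, and max-lwps is an optional integer of at least 100), a validation sketch in Python could look like this. The function names and the dictionary-based input are hypothetical; this is not how zonecfg is implemented.

```python
import re


def parse_ncpus(text):
    """Parse 'N' or 'MIN-MAX' into a (min, max) tuple of positive integers."""
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", text.strip())
    if not match:
        raise ValueError(f"ncpus must be an integer or a range: {text!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    if low < 1 or high < low:
        raise ValueError(f"ncpus must be positive and ordered: {text!r}")
    return low, high


def validate_dedicated_cpu(props):
    """Validate a dict of dedicated-cpu properties given as strings.

    Defaults follow the proposal: ncpus=1, importance=1, max-lwps unset.
    """
    ncpus = parse_ncpus(props.get("ncpus", "1"))
    importance = int(props.get("importance", "1"))
    if importance < 1:
        raise ValueError("importance must be a positive integer")
    max_lwps = props.get("max-lwps")
    if max_lwps is not None:
        # max-lwps is optional, but if present it must be at least 100.
        max_lwps = int(max_lwps)
        if max_lwps < 100:
            raise ValueError("max-lwps must be an integer >= 100")
    return {"ncpus": ncpus, "importance": importance, "max-lwps": max_lwps}


if __name__ == "__main__":
    print(validate_dedicated_cpu({"ncpus": "2-4", "max-lwps": "500"}))
    print(validate_dedicated_cpu({}))
```

The (min, max) pair returned for ncpus corresponds to the 'pset.min' and 'pset.max' values the proposal says zoneadmd will set on the temporary pset.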

> 4) Aliases: The notion of aliases also creates redundant output which
> could be confusing. I like the simplification of aliases as described,
> but I wish I had a good solution that wouldn't break existing tools that
> parse "zonecfg info" output - if such tools exist.

This is part of the proposal which we struggled with a lot. In the
end, we decided that we needed to maintain compatibility so that
we did not break scripts that talked to the CLI directly. This was the
compromise we came up with that allows us to do that.

> 5) Another RM setting: While we're integrating RM settings, I think we
> should consider adding project.max-address-space using the same
> semantics as {project,zone}.max-lwps. A zone-specific setting,
> zone.max-address-space, could be added along with a zonecfg alias,
> max-address-space. This would allow the GZ admin to cap the virtual
> memory space available to that zone. This would not take the place of
> swap sets, but would be valuable if this RM integration work might be
> complete before swap sets.

We are looking at additional rctls; they are just not part of this project.

Thanks again for your comments,
Jerry



amolchip

Posts: 17
From:

Registered: 6/12/06
Re: improved zones/RM integration
Posted: Jun 28, 2006 6:24 AM   in response to: gjelinek

These are very exciting features!

Some comments.

> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated for the zones use.

The temp pool created is going to show up in pooladm output, right?
If someone uses such a pool in a zone configuration done in the traditional
style (set pool=SUNWzone{zoneid}), is zonecfg going to reject it?
If not, it won't be "dedicated" anymore.

Also, is something going to stop poolbind from moving zones between
temp pools and permanent pools? What if booting a zone created a temp pool
and poolbind then moves the zone to a permanent pool? Is the temp pool deleted?

> Resource controls in zonecfg will be simplified.

Do you think we need prctl enhancements that would allow setting rctls
using the new aliases directly? That would be good for consistency.

Also, you mention that backward compatibility ensures that existing
tools that parse zonecfg info / export output are unaffected. That's true
to some extent. But some of them treat "unknown" resources as no-ops and
display them as is, which means some tools will show the new resources
as unknown. That is harmless but could sometimes be confusing.

Maybe a command-line option such as zonecfg info -legacy that
suppresses the new resources could help.
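Until something like that exists, a tool could post-process the output itself. The following Python sketch drops the four new resource blocks from "zonecfg info"-style text. It assumes that resource headers are unindented lines ending in a colon and that their properties are indented beneath them. The sample text, the zone name, and the function name strip_new_resources are hypothetical, and this is only a stand-in for the suggested option, not an existing zonecfg feature.

```python
# The four new zonecfg resource names from the proposal.
NEW_RESOURCES = {
    "dedicated-cpu",
    "capped-cpu",
    "dedicated-memory",
    "capped-memory",
}

# Sample modeled on the proposal's 'info' example; indentation is assumed.
SAMPLE_INFO = """\
zonename: myzone
dedicated-cpu:
\tncpus: 2-4
\tmax-lwps: 500
rctl:
\tname: zone.max-lwps
\tvalue: (priv=privileged,limit=500,action=deny)
"""


def strip_new_resources(info_text):
    """Return info_text without the blocks for the new RM resources."""
    kept = []
    skipping = False
    for line in info_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line: decide whether the block it starts is skipped.
            skipping = line.strip().rstrip(":") in NEW_RESOURCES
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    print(strip_new_resources(SAMPLE_INFO))
```

The rctl block survives the filter, so a legacy tool would still see zone.max-lwps in the format it already understands.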

> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.

On the fly? That is, if a zone didn't have zone.cpu-shares when it was booted
and someone used prctl to set it, zoneadmd will change the sched class of all
processes in the zone to FSS? That's cool.

On the other hand, what about when this zone shares a pool with another zone
that has no cpu-shares, and the pool's sched class is also not FSS?
We don't recommend that processes running in different scheduling classes
share CPUs, right? This feature may lead to that situation.
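To make the concern concrete, here is a small Python sketch that flags pools which would end up with mixed scheduling classes if every zone with cpu-shares were switched to FSS. The data model (tuples of zone name, pool name, and a cpu-shares flag, plus a pool-to-class map), the zone and pool names, and the "TS" fallback for pools with no class set are all assumptions for illustration. It does not describe what zoneadmd actually does.

```python
from collections import defaultdict


def mixed_class_pools(zones, pool_sched):
    """Return the sorted names of pools that would host more than one class.

    zones: iterable of (zone_name, pool_name, has_cpu_shares) tuples.
    pool_sched: {pool_name: scheduling_class}; missing pools fall back to
    "TS" (an assumption for this sketch).
    """
    classes = defaultdict(set)
    for _zone, pool, has_cpu_shares in zones:
        # A zone with cpu-shares is assumed to be moved to FSS; any other
        # zone keeps the scheduling class of the pool it is bound to.
        effective = "FSS" if has_cpu_shares else pool_sched.get(pool, "TS")
        classes[pool].add(effective)
    return sorted(pool for pool, seen in classes.items() if len(seen) > 1)


if __name__ == "__main__":
    zones = [
        ("web", "pool_a", True),   # has zone.cpu-shares
        ("db", "pool_a", False),   # no cpu-shares, same pool
        ("app", "pool_b", True),   # has zone.cpu-shares, pool already FSS
    ]
    pool_sched = {"pool_a": "TS", "pool_b": "FSS"}
    print(mixed_class_pools(zones, pool_sched))
```

In this made-up configuration only pool_a is reported, because it would host one zone in FSS and one zone in the pool's own class.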

thanks
- Amol


Gerald A. Jelinek wrote:
> Attached is a description of a project we have been refining for
> a while now. The idea is to improve the integration of zones
> with some of the existing resource management features in Solaris.
> I would appreciate hearing any suggestions or questions. I'd
> like to submit this proposal to our internal architectural review
> process by mid-July. I have also posted a few slides that give an
> overview of the project. Those are available on the zones files
> page (http://www.opensolaris.org/os/community/zones/files/).
>
> Thanks,
> Jerry
>
> ------------------------------------------------------------------------
>
> SUMMARY:
>
> This project enhances Solaris zones[1], pools[2-4] and resource
> caps[5,6] to improve the integration of zones with resource
> management (RM). It addresses existing RFEs[7-10] in this area and
> lays the groundwork for simplified, coherent management of the various
> RM features exposed through zones.
>
> We will integrate some basic pool configuration with zones, implement
> the concept of "temporary pools" that are dynamically created/destroyed
> when a zone boots/halts and we will simplify the setting of resource
> controls within zonecfg. We will enhance rcapd so that it can cap
> a zone's memory while rcapd is running in the global zone. We will
> also make a few other changes to provide a better overall experience
> when using zones with RM.
>
> Patch binding is requested for these new interfaces and the stability
> of most of these interfaces is "evolving" (see interface table for
> complete list).
>
> PROBLEM:
>
> Although zones are fairly easy to configure and install, it appears
> that many customers have difficulty setting up a good RM configuration
> to accompany their zone configuration. Understanding RM involves many
> new terms and concepts along with lots of documentation to understand.
> This leads to the problem that many customers either do not configure
> RM with their zones, or configure it incorrectly, leading them to be
> disappointed when zones, by themselves, do not provide all of the
> containment that they expect.
>
> This problem will just get worse in the near future with the
> additional RM features that are coming, such as cpu-caps[11], memory
> sets[12] and swap sets[13].
>
> PROPOSAL:
>
> There are 7 different enhancements outlined below.
>
> 1) "Hard" vs. "Soft" RM configuration within zonecfg
>
> We will enhance zonecfg(1M) so that the user can configure basic RM
> capabilities in a structured way.
>
> The various existing and upcoming RM features can be broken down
> into "hard" vs. "soft" partitioning of the system's resources.
> With "hard" partitioning, resources are dedicated to the zone using
> processor sets (psets) and memory sets (msets). With "soft"
> partitioning, resources are shared, but capped, with an upper limit
> on their use by the zone.
>
>          | Hard  | Soft
> ---------------------------------
>   cpu    | psets | cpu-caps
>   memory | msets | rcapd
>
> There are also some existing rctls (zone.cpu-shares, zone.max-lwps)
> which will be integrated into this overall concept.
>
> Within zonecfg we will organize the various RM features into four
> basic zonecfg resources so that it is simple for a user to understand
> and configure the RM features that are to be used with their zone.
> Note that zonecfg "resources" are not the same as "resource
> management". Within zonecfg, a "resource" is the name of a top-level
> property of the zone (see zonecfg(1M) for more information).
>
> The four new zonecfg resources are:
>     dedicated-cpu
>     capped-cpu (future, once cpu-caps are integrated)
>     dedicated-memory (future, once memory sets are integrated)
>     capped-memory
>
> Each of these zonecfg resources will have properties that are
> appropriate to the RM capabilities associated with that resource.
> Zonecfg will only allow one instance of each of these resources to be
> configured and it will not allow conflicting resources to be added
> (e.g. dedicated-cpu and capped-cpu are mutually exclusive).
>
> The mapping of these new zonecfg resources to the primary underlying RM
> feature is:
>     dedicated-cpu    -> temporary pset
>     dedicated-memory -> temporary mset
>     capped-cpu       -> cpu-cap rctl [11]
>     capped-memory    -> rcapd running in GZ
>
> Temporary psets and msets are described below, in section 2.
> Rcapd enhancements for running in the global zone are described below,
> in section 4.
>
> The valid properties for each of these new zonecfg resources will be:
>
>     dedicated-cpu
>         ncpus (a positive integer or range, default value 1)
>         importance (a positive integer, default value 1)
>         max-lwps (an integer >= 100)
>     capped-cpu
>         cpu-cap (a positive integer, default value 100 which
>             represents 100% of one cpu)
>         max-lwps (an integer >= 100)
>         cpu-shares (a positive integer)
>     dedicated-memory
>         TBD - once msets [12] are completed
>     capped-memory
>         cap (a positive decimal number with optional k, m, g,
>             or t as a modifier, no modifier defaults to units
>             of megabytes(m), must be at least 1m)
>
> Some of these properties actually correspond to rctls. See section 3
> below for a description of how this will work.
>
> Zonecfg will also be enhanced to check for invalid combinations.
> This means it will disallow a dedicated-cpu resource and the
> zone.cpu-shares rctl being defined at the same time. It also means
> that explicitly specifying a pool name via the 'pool' resource, along
> with either a 'dedicated-cpu' or 'dedicated-memory' resource is an
> invalid combination.
>
> These new zonecfg resource names (dedicated-cpu, capped-cpu,
> dedicated-memory & capped-memory) are chosen so as to be reasonably
> clear what the objective is, even though they do not exactly align
> with our existing underlying (and inconsistent) RM naming schemes.
>
> 2) Temporary Pools.
>
> We will implement the concept of "temporary pools" within the pools
> framework.
>
> To improve the integration of zones and pools we are allowing the
> configuration of some basic pool attributes within zonecfg, as
> described above in section 1. However, we do not want to extend
> zonecfg to completely and directly manage standard pool configurations.
> That would lead to confusion and inconsistency regarding which tool to
> use and where configuration data is stored. Temporary pools sidestep
> this problem and allow zones to dynamically create a simple pool/pset
> configuration for the basic case where a sysadmin just wants a
> specified number of processors dedicated to the zone (and eventually a
> dedicated amount of memory).
>
> We believe that the ability to simply specify a fixed number of cpus
> (and eventually a mset size) meets the needs of a large percentage of
> zones users who need "hard" partitioning (e.g. to meet licensing
> restrictions).
>
> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated for the zone's use. Zoneadmd will
> dynamically create a pool & pset (or eventually a mset) and assign the
> number of cpus specified in zonecfg to that pset. The temporary pool
> & pset will be named 'SUNWzone{zoneid}'.
>
> Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
> well as the 'pool.importance' pool property, based on the values
> specified for dedicated-cpu's 'ncpus' and 'importance' properties
> in zonecfg.
>
> If the cpu (or memory) resources needed to create the temporary pool
> are unavailable, zoneadmd will issue an error and the zone won't boot.
>
> When the zone is halted, the temporary pool & pset will be destroyed.
>
> We will add a new boolean property ('temporary') that can exist on
> pools and any resource set. The 'temporary' property indicates that
> the pool or resource set should never be committed to a static
> configuration (e.g. pooladm -s) and that it should never be destroyed
> when updating the dynamic configuration from a static configuration
> (e.g. pooladm -c). These temporary pools/resources can only be managed
> in the dynamic configuration. These changes will be implemented within
> libpool(3LIB).
>
> It is our expectation that most users will never need to manage
> temporary pools through the existing poolcfg(1M) commands. For users
> who need more sophisticated pool configuration and management, the
> existing 'pool' resource within zonecfg should be used and users
> should manually create a permanent pool using the existing mechanisms.
>
> 3) Resource controls in zonecfg will be simplified.
>
> Within zonecfg the existing rctls (zone.cpu-shares and zone.max-lwps)
> take a 3-tuple value where only a single component usually has any
> meaning (the 'limit'). The other two components of the value (the
> 'priv' and 'action') are not normally changed but users can be confused
> if they don't understand what the other components mean or what values
> can be specified.
>
> Here is a zonecfg example:
>     > add rctl
>     rctl> set name=zone.cpu-shares
>     rctl> add value (priv=privileged,limit=5,action=none)
>     rctl> end
>
> Within zonecfg we will introduce the idea of rctl aliases. The alias
> is a simplified name and template for the existing rctls. Behind the
> scenes we continue to store the data using the existing rctl entries
> in the XML file. Thus, the alias always refers to the same underlying
> piece of data as the full rctl.
>
> The purpose of the rctl alias is to provide a simplified name and
> mechanism to set the rctl 'limit'. For each rctl/alias pair we will
> "know" the expected values for the 'priv' and 'action' components of
> the rctl value. If an rctl is already defined that does not match this
> "knowledge" (e.g. it has a non-standard 'action' or there are multiple
> values defined for the rctl), then the user will not be allowed to use
> an alias for that rctl.
>
> Here are the aliases we will define for the rctls:
>     alias        rctl
>     -----        ----
>     max-lwps     zone.max-lwps
>     cpu-shares   zone.cpu-shares
>     cpu-cap      zone.cpu-cap (future, once cpu-caps integrate)
>
> Here is an example of the max-lwps alias used as a property within the
> new 'dedicated-cpu' resource:
>
>     > add dedicated-cpu
>     dedicated-cpu> set ncpus=2-4
>     dedicated-cpu> set max-lwps=500
>     dedicated-cpu> end
>     > info
>     ...
>     dedicated-cpu:
>         ncpus: 2-4
>         max-lwps: 500
>     rctl:
>         name: zone.max-lwps
>         value: (priv=privileged,limit=500,action=deny)
>
> In the example, you can see the use of the alias when adding the
> 'dedicated-cpu' resource and you can also see the full rctl output
> within the 'info' command. If the 'max-lwps' property had not been set
> within the 'dedicated-cpu' resource, then the corresponding rctl would
> not be defined.
>
> If you update the rctl value through the 'rctl' resource within
> zonecfg, then the corresponding value within the 'dedicated-cpu'
> resource would also be updated since both the rctl and its alias refer
> to the same piece of data.
>
> If an rctl was already defined that did not match the expected value
> (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
> alias will be disabled. An attempt to set 'max-lwps' within
> 'dedicated-cpu' would print the following error:
> "One or more incompatible rctls already exist for this
> property"
>
> This rctl alias enhancement is fully backward compatible with the
> existing rctl syntax. That is, zonecfg output will continue to display
> rctl settings in the current format (in addition to the new aliased
> format) and zonecfg will continue to accept the existing input syntax
> for setting rctls. This ensures full backward compatibility for any
> existing tools/scripts that parse zonecfg output or configure zones.
>
> 4) Enable rcapd to limit zone memory while running in the global zone
>
> Currently, to use rcapd(1M) to limit zone memory consumption, the
> rcapd process must be run within the zone. This exposes a loophole
> since the zone administrator, who might be untrusted, can change the
> rcapd limit.
>
> We will enhance rcapd so that it can limit a zone's memory consumption
> while it is running in the global zone. This closes the rcapd
> loophole and allows the global zone administrator to set memory
> caps that can be enforced by a single, trusted process.
>
> The rcapd limit for a zone will be configured using the new
> 'capped-memory' resource and 'cap' property within zonecfg.
> When a zone with 'capped-memory' boots, zoneadmd will automatically
> start rcapd in the global zone, if necessary. The interfaces to
> communicate memory cap information between zoneadmd and rcapd
> are project private.
>
> As part of this overall project, we will be enhancing the internal
> rcapd rss accounting so that rcapd will have a more accurate
> measurement of the overall rss for each zone.
>
> 5) Use FSS when zone.cpu-shares is set
>
> Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
> Scheduler (FSS) is not the default scheduling class so this rctl
> frequently has no effect, unless the user also sets FSS as the
> default scheduler or changes the zone's processes to use FSS with the
> priocntl(1M) command. This means that users can easily think
> they have configured their zone for a behavior that they are not
> actually getting.
>
> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.
>
> 6) Add RM templates for zone creation
>
> Zonecfg already supports templates on the 'create' subcommand using
> the '-t' option. We will update the documentation which currently
> states that a template must be the name of an existing zone. We
> already deliver two existing templates (SUNWblank and SUNWdefault).
>
> We will deliver at least four new templates that configure
> reasonable default properties for the four new resources in zonecfg:
>     fully dedicated:               dedicated-cpu & dedicated-memory
>     cpu dedicated, memory capped:  dedicated-cpu & capped-memory
>     cpu capped, memory dedicated:  capped-cpu & dedicated-memory
>     cpu & memory capped:           capped-cpu & capped-memory
>
> We may also deliver other templates that only pre-configure one of
> the new resources (e.g. only configures dedicated-cpu and leaves
> memory with the default handling).
>
> We will enhance the 'create' help command to briefly describe the
> templates and why you would use one vs. another.
>
> The names of all new templates will begin with SUNW. This namespace
> was already reserved by [1].
>
> This zonecfg change will primarily impact the documentation.
>
> 7) Pools system objective defaults to weighted-load (wt-load)[4]
>
> Currently pools are delivered with no objective set. This means that
> if you enable the poold(1M) service, nothing will actually happen on
> your system.
>
> As part of this project, we will set weighted load
> (system.poold.objectives=wt-load) to be the default objective.
> Delivering this objective as the default does not impact systems out
> of the box since poold is disabled by default.
>
> EXPORTED INTERFACES
>
> New zonecfg resource names
>     dedicated-cpu       Evolving
>     capped-cpu          Evolving
>     dedicated-memory    Evolving
>     capped-memory       Evolving
>
>     The capped-cpu and dedicated-memory resource names are
>     being reserved now in anticipation of the future integration
>     of the cpu-caps and memory sets projects. However, we do
>     not want to make this project dependent on [11] & [12].
>
> New zonecfg property names
>     ncpus               Evolving
>     importance          Evolving
>     cap                 Evolving
>
> New zonecfg rctl alias names
>     max-lwps            Evolving
>     cpu-shares          Evolving
>     cpu-cap             Evolving
>
>     The cpu-cap rctl alias is being reserved now in anticipation
>     of the future integration of the cpu-caps projects. However,
>     we do not want to make this project dependent on [11].
>
> Temporary pool & resource names
>     SUNWzone{id}        Stable
>
> New temporary pool & resource boolean properties
>     'pool.temporary'    Evolving
>     'pset.temporary'    Evolving
>     '*.temporary'       Evolving
>         (for future resources such as mset)
>
> rcapd/zoneadmd interface
>     zone_getattr        Project Private
>
> wt-load as default      Evolving
>
> IMPORTED INTERFACES
>
> libpool(3LIB) unstable
>
> REFERENCES
>
> 1. PSARC 2002/174 Virtualization and Namespace Isolation in Solaris
> 2. PSARC 2000/136 Administrative support for processor sets and extensions
> 3. PSARC 1999/119 Tasks, Sessions, Projects and Accounting
> 4. PSARC 2002/287 Dynamic Resource Pools
> 5. PSARC 2002/519 rcapd(1MSRM): resource capping daemon
> 6. PSARC 2003/155 rcapd(1M) sedimentation
> 7. 6421202 RFE: simplify and improve zones/pool integration
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6421202
> 8. 6222025 RFE: simplify rctl syntax and improve cpu-shares/FSS interaction
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6222025
> 9. 5026227 RFE: ability to rcap zones from global zone
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5026227
> 10. 6409152 RFE: template support for better RM integration
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6409152
> 11. PSARC 2004/402 CPU Caps
> 12. PSARC 2000/350 Physical Memory Control
> 13. PSARC 2002/181 Swap Sets
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> zones-discuss mailing list
> zones-discuss at opensolaris dot org

--
--------------------------------------
Amol Chiplunkar
Bangalore, India
Phone : +91-80-56927774

http://www.sun.com
http://blogs.sun.com/chiplunkar
--------------------------------------
There are 10 types of people.
One who understand binary,
And the others who don't -- Anonymous

_______________________________________________
zones-discuss mailing list
zones-discuss at opensolaris dot org



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jun 28, 2006 6:51 AM   in response to: amolchip

Amol,

Thanks for your comments. I have some responses in-line.

Amol A Chiplunkar wrote:
>
> These are very exciting features !!
>
> Some comments.
>
> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated for the zone's use.
>
> The temp pool created is going to show up in pooladm output, right?
> If someone uses such a pool in a zone configuration done in the traditional
> style (set pool=SUNWzone{zoneid}), is zonecfg going to reject it?
> If not, it won't be "dedicated" anymore.

That is a good point. We will make sure that zonecfg disallows that and I
will clarify that in the proposal. One thing I wanted to point out is that
the name of the pool/pset will actually change when you reboot the zone since
the zoneid will change at that time. There is no easy way to predict what
the name will be or if any particular name will exist, since the zoneid is
fairly dynamic. We are not going too far out of our way to try to prevent
you from using the temp. pool for something else but I will make sure zonecfg
doesn't allow that.
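
As a rough sketch of the check described above (not the actual zonecfg code), the 'pool' property could be validated against the temporary pool naming pattern. The function name, the error text, and the exact pattern are assumptions for illustration; the thread only says that zonecfg will refuse such a setting.

```python
import re

# Assumed pattern for temporary pool names: 'SUNWzone' followed by a numeric zoneid.
TEMP_POOL_PATTERN = re.compile(r"SUNWzone\d+")


def check_pool_property(pool_name):
    """Hypothetical validation for 'set pool=...': reject temporary pool names."""
    if TEMP_POOL_PATTERN.fullmatch(pool_name):
        raise ValueError(
            f"{pool_name}: temporary pools cannot be used as a zone's pool"
        )
    return pool_name
```

Because the zoneid changes on every boot, a check like this can only match the naming pattern; it cannot know which temporary pools will actually exist.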

> Also, is there something that's going to stop poolbind from moving
> zones between temp pools and permanent pools?
> What if creating a zone resulted in a temp pool being created,
> and poolbind then moves the zone to a permanent pool? Is the temp pool deleted?

That is a good question. I need to think about what we should do there
but I am inclined to say that we should disallow that. Once you start
allowing stuff like that, then we are back to the problem of where the
configuration data is stored and managed and that is what we are trying to
avoid with the whole idea of the temporary pool.

> Resource controls in zonecfg will be simplified.
> Do you think we need prctl enhancements that will allow setting rctls
> using the new aliases directly ? that would be good for consistency.

That is another good idea. I'd like to keep that separate for now but
we'll keep it on our plate.

> Also you mention that the backward compatibility ensures the existing
> tools that parse zonecfg info / export output are unaffected. It's true
> to some extent. But some of them treat "unknown" resources as a no-op and
> display them as is. This means some tools will show the new resources
> as unknown, which is harmless but could sometimes be confusing.

We are continuing to add new resources. 'limitpriv' and 'bootargs' are
two recent ones. We won't stop adding new resources; that would be
too constraining, but we will continue to try to make sure we don't
break scripts that depend on resources that they know about.

> Maybe having a command line option such as zonecfg info -legacy that would
> suppress the new resources could help.

I think it would be better to plan for new resources. Otherwise, what would
legacy be? Just the resources that were in the original S10 release?

> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.
>
> On the fly? i.e. if a zone didn't have zone.cpu-shares when it was booted
> and someone did prctl to set it, zoneadmd will change the sched class of all
> processes in the zone to FSS? That's cool.

That is not what we plan on doing. What we are planning is to set
FSS when the zone boots, if it has cpu-shares. These enhancements are
really targeted at people who don't know a lot about the existing RM
features. If you know enough to use prctl to do this kind of thing, then
we will expect you to be able to fully manage your system using all of
the existing features.

Thanks again for your comments,
Jerry



dp

Posts: 807
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jul 17, 2006 5:42 PM   in response to: gjelinek


Very belatedly, I'm just getting around to reviewing this. Overall
I think it looks good. Comments in-line.

> 1) "Hard" vs. "Soft" RM configuration within zonecfg
>
...
> dedicated-cpu
> ncpus (a positive integer or range, default value 1)
> importance (a positive integer, default value 1)
> max-lwps (an integer >= 100)

why >= 100? I can envision a minimized zone where this is too many.

> capped-cpu
> cpu-cap (a positive integer, default value 100 which
> represents 100% of one cpu)

I'm scared of this default. To put it another way, why did you pick
100? Should there be a value which represents infinity? What is
the meaning of specifying 0, or is that an error?

> max-lwps (an integer >= 100)
> cpu-shares (a positive integer)
> dedicated-memory
> TBD - once msets [12] are completed
> capped-memory
> cap (a positive decimal number with optional k, m, g,
> or t as a modifier, no modifier defaults to units
> of megabytes(m), must be at least 1m)

I think this set of rules is too complex and too confusing for users--
it's weird to have the default units be larger than the smallest
available units. Let's mandate that the user *always* specify units.

>
> 2) Temporary Pools.
>
...
> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated for the zones use. Zoneadmd will
> dynamically create a pool & pset (or eventually a mset) and assign the
> number of cpus specified in zonecfg to that pset. The temporary pool
> & pset will be named 'SUNWzone{zoneid}'.

Could we somehow work the zone name into this? It would be nice for
e.g. poolstat(1) observability. Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.

> Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
> well as the 'pool.importance' pool property, based on the values
> specified for dedicated-cpu's 'ncpus' and 'importance' properties
> in zonecfg.

Is importance mandatory? Will it have a default value? What values can
it have? What does it mean? Please be a little more specific.

> If the cpu (or memory) resources needed to create the temporary pool
> are unavailable, zoneadmd will issue an error and the zone won't boot.
>
> When the zone is halted, the temporary pool & pset will be destroyed.

What about during a reboot? It seems like it'd be good not to tear
down the temporary pool during reboot, but maybe that's hard. It would
seem weird to me if my pool was 2-4 CPUs, and I had 2, then rebooted and
had 4.

> We will add a new boolean property ('temporary') that can exist on
> pools and any resource set. The 'temporary' property indicates that
> the pool or resource set should never be committed to a static
> configuration (e.g. pooladm -s) and that it should never be destroyed
> when updating the dynamic configuration from a static configuration
> (e.g. pooladm -c). These temporary pools/resources can only be managed
> in the dynamic configuration. These changes will be implemented within
> libpool(3LIB).
>
> It is our expectation that most users will never need to manage
> temporary pools through the existing poolcfg(1M) commands. For users
> who need more sophisticated pool configuration and management, the
> existing 'pool' resource within zonecfg should be used and users
> should manually create a permanent pool using the existing mechanisms.
>
> 3) Resource controls in zonecfg will be simplified.
...
> Here are the aliases we will define for the rctls:
>     alias        rctl
>     -----        ----
>     max-lwps     zone.max-lwps
>     cpu-shares   zone.cpu-shares
>     cpu-cap      zone.cpu-cap (future, once cpu-caps integrate)

You've mentioned that you will substitute in sort of "the right"
defaults for the privileged and action fields. It seems like you should
spell out what those will be...

    alias           rctl
    --------------------------------------------------------------
    cpu-shares=X    zone.cpu-shares(privileged, X, none)
    ...

> If an rctl was already defined that did not match the expected value
> (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
> alias will be disabled. An attempt to set 'max-lwps' within
> 'dedicated-cpu' would print the following error:
> "One or more incompatible rctls already exist for this
> property"
>
> This rctl alias enhancement is fully backward compatible with the
> existing rctl syntax. That is, zonecfg output will continue to display
> rctl settings in the current format (in addition to the new aliased
> format) and zonecfg will continue to accept the existing input syntax
> for setting rctls. This ensures full backward compatibility for any
> existing tools/scripts that parse zonecfg output or configure zones.

Maybe I missed it-- but what is the behavior of 'zonecfg export' going to be?

> 4) Enable rcapd to limit zone memory while running in the global zone
>
> Currently, to use rcapd(1M) to limit zone memory consumption, the
> rcapd process must be run within the zone. This exposes a loophole
> since the zone administrator, who might be untrusted, can change the
> rcapd limit.

Suggest rewording: "While useful in some configurations, in situations
where the zone administrator is untrusted, this is ineffective, since
the zone administrator could simply change the rcapd limit."

> We will enhance rcapd so that it can limit a zone's memory consumption
> while it is running in the global zone. This closes the rcapd
> loophole and allows the global zone administrator to set memory
> caps that can be enforced by a single, trusted process.

Ditto on the rewording (basically, I think "loophole" is too vague).

> The rcapd limit for a zone will be configured using the new

Here you say "a zone"-- can you be precise? Does that include the
global zone?

> 'capped-memory' resource and 'cap' property within zonecfg.
> When a zone with 'capped-memory' boots, zoneadmd will automatically
> start rcapd in the global zone, if necessary. The interfaces to

Would it be better to say "enable the rcap service"?

> communicate memory cap information between zoneadmd and rcapd
> are project private.

At an architectural level, it'd be nice to summarize them; for example,
does one need to reboot the zone to get the new setting? Is there
any way to do online tuning of the value? Should this just be done
with SMF properties?

> As part of this overall project, we will be enhancing the internal
> rcapd rss accounting so that rcapd will have a more accurate
> measurement of the overall rss for each zone.

More detail would be appreciated.

> 5) Use FSS when zone.cpu-shares is set
>
> Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
> Scheduler (FSS) is not the default scheduling class so this rctl
> frequently has no effect, unless the user also sets FSS as the
> default scheduler or changes the zones processes to use FSS with the
> priocntl(1M) command. This means that users can easily think
> they have configured their zone for a behavior that they are not
> actually getting.
>
> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.

Just for that zone? This to me still seems confusing to users-- you
could have 3 zones with FSS on, and two without. How about *also* issuing a
warning at zone boot if FSS is not the machine-wide default?

Apropos my earlier (today) comment about dispadmin, should we have
some sort of 'dispadmin -d -do-it-now' option?

> 7) Pools system objective defaults to weighted-load (wt-load)[4]
>
> Currently pools are delivered with no objective set. This means that
> if you enable the poold(1M) service, nothing will actually happen on
> your system.
>
> As part of this project, we will set weighted load
> (system.poold.objectives=wt-load) to be the default objective.
> Delivering this objective as the default does not impact systems out
> of the box since poold is disabled by default.

What happens if you boot a zone which uses temporary pools, but pools
are not enabled? Should booting zones enable poold?

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng dot sun dot com - blogs.sun.com/dp



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jul 18, 2006 6:06 AM   in response to: dp

Dan,

Thanks for your detailed comments. My responses are in-line.

Dan Price wrote:
> Very belatedly, I'm just getting around to reviewing this. Overall
> I think it looks good. Comments in-line.
>
>> 1) "Hard" vs. "Soft" RM configuration within zonecfg
>>
> ...
>> dedicated-cpu
>> ncpus (a positive integer or range, default value 1)
>> importance (a positive integer, default value 1)
>> max-lwps (an integer >= 100)
>
> why >= 100? I can envision a minimized zone where this is too many.

I picked 100 since I had a hard time getting a zone to boot with much less.
Obviously this will vary somewhat depending on the services enabled.
Is 100 really a problem as a lower limit? Part of what we are trying to
do here is help the user configure a reasonable RM configuration, especially
if they don't know a lot about RM. Allowing them to set a limit which
prevents the zone from booting seems bad. However, we could also just let
them do that if 100 seems too high for some reason. Unfortunately, it is
hard to know in advance what exact number of threads will be needed to
boot the zone.

>> capped-cpu
>> cpu-cap (a positive integer, default value 100 which
>> represents 100% of one cpu)
>
> I'm scared of this default. To put it another way, why did you pick
> 100? Should there be a value which represents infinity? What is
> the meaning of specifying 0, or is that an error?

We don't have to have a default here I guess. I picked 100 because it seemed
to be symmetrical with the dedicated cpu case where the lower number is 1.
As far as the other values (infinity and 0) that should be covered by the
cpu-caps ARC case. I am not sure if Andrei has finished that case yet.

>> max-lwps (an integer >= 100)
>> cpu-shares (a positive integer)
>> dedicated-memory
>> TBD - once msets [12] are completed
>> capped-memory
>> cap (a positive decimal number with optional k, m, g,
>> or t as a modifier, no modifier defaults to units
>> of megabytes(m), must be at least 1m)
>
> I think this set of rules is too complex and too confusing for users--
> it's weird to have the default units be larger than the smallest
> available units. Let's mandate that the user *always* specify units.

OK.
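
To make the agreed rule concrete, here is a minimal sketch of how a 'cap' value could be parsed once units are mandatory. It is illustrative only: the function name is hypothetical, and treating k/m/g/t as powers of 1024 is an assumption, since the proposal does not say whether the units are binary or decimal.

```python
import re

# Multipliers for the k, m, g, t modifiers (binary units are an assumption).
UNIT_BYTES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_memory_cap(text):
    """Parse a cap such as '512m' or '1.5g' into bytes; the unit is mandatory."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmgt])", text.strip().lower())
    if match is None:
        raise ValueError(
            f"cap must be a decimal number followed by k, m, g, or t: {text!r}"
        )
    size = int(float(match.group(1)) * UNIT_BYTES[match.group(2)])
    # The proposal requires the cap to be at least 1m.
    if size < UNIT_BYTES["m"]:
        raise ValueError("cap must be at least 1m")
    return size
```

With this rule a bare number such as "512" is rejected instead of silently being read as megabytes.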

>> 2) Temporary Pools.
>>
> ...
>> If a dedicated-cpu (or eventually a dedicated-memory) resource is
>> configured for the zone, then when the zone boots zoneadmd will create
>> a temporary pool dedicated for the zones use. Zoneadmd will
>> dynamically create a pool & pset (or eventually a mset) and assign the
>> number of cpus specified in zonecfg to that pset. The temporary pool
>> & pset will be named 'SUNWzone{zoneid}'.
>
> Could we somehow work the zone name into this? It would be nice for
> e.g. poolstat(1) observability. Otherwise the user experience is going
> to be all about trying to work out what 'SUNWzone34' maps to, which
> seems poor.

We need to have the name begin with SUNW or we could have collisions with
existing pools. I suppose instead of zone{id}, it could be SUNW{zonename}
although you lose the visibility that the pool is associated with a zone.
Maybe SUNWzone_{zonename}?
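
The difference between the two naming schemes can be sketched as follows. Both helper names are hypothetical; only the 'SUNWzone{zoneid}' and 'SUNWzone_{zonename}' formats come from this thread.

```python
def temp_pool_name_by_id(zone_id):
    """Scheme from the proposal: the name changes whenever the zoneid changes."""
    return f"SUNWzone{zone_id}"


def temp_pool_name_by_name(zone_name):
    """Scheme suggested here: the name is stable and shows which zone owns the pool."""
    return f"SUNWzone_{zone_name}"
```

For a zone named "web" that boots with zoneid 34 and later with zoneid 35, the first scheme gives 'SUNWzone34' and then 'SUNWzone35', while the second gives 'SUNWzone_web' both times.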

>> Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
>> well as the 'pool.importance' pool property, based on the values
>> specified for dedicated-cpu's 'ncpus' and 'importance' properties
>> in zonecfg.
>
> Is importance mandatory? Will it have a default value? What values can
> it have? What does it mean? Please be a little more specific.

Yes, it has a default value of 1. I will add more details referring to the
pools documentation on importance.

>> If the cpu (or memory) resources needed to create the temporary pool
>> are unavailable, zoneadmd will issue an error and the zone won't boot.
>>
>> When the zone is halted, the temporary pool & pset will be destroyed.
>
> What about during a reboot? It seems like it'd be good not to tear
> down the temporary pool during reboot, but maybe that's hard. It would
> seem weird to me if my pool was 2-4 CPUs, and I had 2, then rebooted and
> had 4.

We won't destroy the pool on reboot; it is preserved. Although it is not
a big deal right now, it will become more of an issue when we have memory
sets, so I made sure the pool is preserved across reboot. I'll clarify that.

>> We will add a new boolean property ('temporary') that can exist on
>> pools and any resource set. The 'temporary' property indicates that
>> the pool or resource set should never be committed to a static
>> configuration (e.g. pooladm -s) and that it should never be destroyed
>> when updating the dynamic configuration from a static configuration
>> (e.g. pooladm -c). These temporary pools/resources can only be managed
>> in the dynamic configuration. These changes will be implemented within
>> libpool(3LIB).
>>
>> It is our expectation that most users will never need to manage
>> temporary pools through the existing poolcfg(1M) commands. For users
>> who need more sophisticated pool configuration and management, the
>> existing 'pool' resource within zonecfg should be used and users
>> should manually create a permanent pool using the existing mechanisms.
>>
>> 3) Resource controls in zonecfg will be simplified.
> ...
>> Here are the aliases we will define for the rctls:
>>     alias        rctl
>>     -----        ----
>>     max-lwps     zone.max-lwps
>>     cpu-shares   zone.cpu-shares
>>     cpu-cap      zone.cpu-cap (future, once cpu-caps integrate)
>
> You've mentioned that you will substitute in sort of "the right"
> defaults for the privileged and action fields. It seems like you should
> spell out what those will be...

I'll add that.

>     alias           rctl
>     --------------------------------------------------------------
>     cpu-shares=X    zone.cpu-shares(privileged, X, none)
>     ...
>
>> If an rctl was already defined that did not match the expected value
>> (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
>> alias will be disabled. An attempt to set 'max-lwps' within
>> 'dedicated-cpu' would print the following error:
>> "One or more incompatible rctls already exist for this
>> property"
>>
>> This rctl alias enhancement is fully backward compatible with the
>> existing rctl syntax. That is, zonecfg output will continue to display
>> rctl settings in the current format (in addition to the new aliased
>> format) and zonecfg will continue to accept the existing input syntax
>> for setting rctls. This ensures full backward compatibility for any
>> existing tools/scripts that parse zonecfg output or configure zones.
>
> Maybe I missed it-- but what is the behavior of 'zonecfg export' going to be?

No different than it is now. That is, we still export with the traditional
rctl syntax. I'll clarify that.
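
The alias handling discussed for section 3 might look roughly like the sketch below. It is not the zonecfg implementation: the names are hypothetical, the cpu-shares mapping (privileged, X, none) is the one Dan spelled out, and the 'deny' action for max-lwps is an assumption that the proposal has yet to confirm.

```python
from collections import namedtuple

# One rctl value in the traditional (priv, limit, action) form.
RctlValue = namedtuple("RctlValue", "priv limit action")

# alias -> (rctl name, expected action).
# cpu-shares/none is from the thread; max-lwps/deny is an assumption.
ALIASES = {
    "cpu-shares": ("zone.cpu-shares", "none"),
    "max-lwps": ("zone.max-lwps", "deny"),
}


def expand_alias(alias, limit):
    """Turn 'alias=limit' into the traditional rctl name and value."""
    name, action = ALIASES[alias]
    return name, RctlValue("privileged", int(limit), action)


def alias_limit(alias, existing_values):
    """Return the limit an alias would display, or fail if the rctl is incompatible."""
    _, action = ALIASES[alias]
    if (
        len(existing_values) != 1
        or existing_values[0].priv != "privileged"
        or existing_values[0].action != action
    ):
        raise ValueError(
            "One or more incompatible rctls already exist for this property"
        )
    return existing_values[0].limit
```

Since export keeps the traditional syntax, only the expanded form from expand_alias would ever need to be written out; the alias is a view on top of it.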

>> 4) Enable rcapd to limit zone memory while running in the global zone
>>
>> Currently, to use rcapd(1M) to limit zone memory consumption, the
>> rcapd process must be run within the zone. This exposes a loophole
>> since the zone administrator, who might be untrusted, can change the
>> rcapd limit.
>
> Suggest rewording: "While useful in some configurations, in situations
> where the zone administrator is untrusted, this is ineffective, since
> the zone administrator could simply change the rcapd limit."

I'll add that.

>> We will enhance rcapd so that it can limit a zone's memory consumption
>> while it is running in the global zone. This closes the rcapd
>> loophole and allows the global zone administrator to set memory
>> caps that can be enforced by a single, trusted process.
>
> Ditto on the rewording (basically, I think "loophole" is too vague).

OK.

>> The rcapd limit for a zone will be configured using the new
>
> Here you say "a zone"-- can you be precise? Does that include the
> global zone?

I'll clarify. It is not currently for the GZ, but that could be a future
enhancement (i.e. make zonecfg manage some of the GZ too).

>> 'capped-memory' resource and 'cap' property within zonecfg.
>> When a zone with 'capped-memory' boots, zoneadmd will automatically
>> start rcapd in the global zone, if necessary. The interfaces to
>
> Would it be better to say "enable the rcap service"?

I'll change that.

>> communicate memory cap information between zoneadmd and rcapd
>> are project private.
>
> At an architectural level, it'd be nice to summarize them; for example,
> does one need to reboot the zone to get the new setting? Is there
> any way to do online tuning of the value? Should this just be done
> with SMF properties?

I'll clarify this.

>> As part of this overall project, we will be enhancing the internal
>> rcapd rss accounting so that rcapd will have a more accurate
>> measurement of the overall rss for each zone.
>
> More detail would be appreciated.

OK.

>> 5) Use FSS when zone.cpu-shares is set
>>
>> Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
>> Scheduler (FSS) is not the default scheduling class so this rctl
>> frequently has no effect, unless the user also sets FSS as the
>> default scheduler or changes the zones processes to use FSS with the
>> priocntl(1M) command. This means that users can easily think
>> they have configured their zone for a behavior that they are not
>> actually getting.
>>
>> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
>> and FSS is not already the default scheduling class, zoneadmd will set
>> the scheduling class to be FSS for processes in the zone.
>
> Just for that zone? This to me still seems confusing to users-- you
> could have 3 zones with FSS on, and two without. How about *also* issuing a
> warning at zone boot if FSS is not the machine-wide default?

Yes, just for that zone. We will print the warning too. That did not
seem architectural, but I'll add that just to be clear.
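
As a sketch of the boot-time decision described above (the function name and warning text are made up for illustration, and the real logic lives in zoneadmd):

```python
def boot_scheduling(zone_has_cpu_shares, system_default_class):
    """Return (scheduling class for the zone's processes, warning or None)."""
    if not zone_has_cpu_shares:
        # Nothing to do: the zone simply uses the system default class.
        return system_default_class, None
    if system_default_class == "FSS":
        return "FSS", None
    # cpu-shares is set but FSS is not the default: use FSS for this zone
    # only, and warn that the rest of the system is not using FSS.
    warning = (
        "warning: zone.cpu-shares is set but FSS is not the "
        "system-wide default scheduling class"
    )
    return "FSS", warning
```

This check happens only at zone boot; setting zone.cpu-shares later with prctl does not change the scheduling class.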

> Apropos my earlier (today) comment about dispadmin, should we have
> some sort of 'dispadmin -d -do-it-now' option?

We could propose that or we could just enhance the docs to describe the
procedure. Why don't we chat about this later today?

>> 7) Pools system objective defaults to weighted-load (wt-load)[4]
>>
>> Currently pools are delivered with no objective set. This means that
>> if you enable the poold(1M) service, nothing will actually happen on
>> your system.
>>
>> As part of this project, we will set weighted load
>> (system.poold.objectives=wt-load) to be the default objective.
>> Delivering this objective as the default does not impact systems out
>> of the box since poold is disabled by default.
>
> What happens if you boot a zone which uses temporary pools, but pools
> are not enabled? Should booting zones enable poold?

Yes, pools will be enabled. I'll clarify that since it is one of the important
points about the whole proposal. That is, the right things will just happen
when you are using temp. pools so that you don't have to know and run all of
the extra RM commands.

Thanks again for all of your comments. I'll roll them into the proposal
along with the other comments I received.

Jerry



comay

Posts: 962
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Jul 18, 2006 1:54 PM   in response to: gjelinek

>> Could we somehow work the zone name into this? It would be nice for
>> e.g. poolstat(1) observability. Otherwise the user experience is going
>> to be all about trying to work out what 'SUNWzone34' maps to, which
>> seems poor.
>
> We need to have the name begin with SUNW or we could have collisions with
> existing pools. I suppose instead of zone{id}, it could be SUNW{zonename}
> although you lose the visibility that the pool is associated with a zone.
> Maybe SUNWzone_{zonename}?

Or perhaps SUNWtemp_{zonename}?

dsc



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Aug 16, 2006 9:10 AM   in response to: gjelinek

I received a lot of good feedback from the community when I sent
out the first draft of the proposal for improved zones/RM integration.
Based on this feedback I have made several changes. Attached
is a new draft of the proposal we will be going forward with. The
major changes we made are:

* rctl aliases will only be top-level properties in zonecfg
* enhance zones to manage the global zone's RM configuration
* new zonecfg clear subcommand and updated remove behavior
* new zonecfg scheduling-class property
* new options for rcapadm and rcapstat

The details of these changes are in the attached proposal. I also
added more details throughout the proposal.

Please send me any comments or questions,
Jerry

mgerdts

Posts: 1,262
From: US

Registered: 8/5/05
Re: improved zones/RM integration
Posted: Nov 2, 2006 6:16 AM   in response to: gjelinek

On 6/26/06, Gerald A. Jelinek <Gerald dot Jelinek at sun dot com> wrote:
> Attached is a description of a project we have been refining for
> a while now. The idea is to improve the integration of zones
> with some of the existing resource management features in Solaris.

In the proposal you say:

As part of this overall project, we will be enhancing the internal
rcapd rss accounting so that rcapd will have a more accurate
measurement of the overall rss for each zone.

Does this spill over to prstat such that there may finally be a fix for:

4754856 prstat -atJTZ should count shared segments only once

As I am looking forward to using zfs, I am trying to figure out how I
can tell how much memory is being used aside from the zfs buffer
cache.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Nov 2, 2006 6:43 AM   in response to: mgerdts

Mike,

Mike Gerdts wrote:
> On 6/26/06, Gerald A. Jelinek <Gerald dot Jelinek at sun dot com> wrote:
>> Attached is a description of a project we have been refining for
>> a while now. The idea is to improve the integration of zones
>> with some of the existing resource management features in Solaris.
>
> In the proposal you say:
>
> As part of this overall project, we will be enhancing the internal
> rcapd rss accounting so that rcapd will have a more accurate
> measurement of the overall rss for each zone.
>
> Does this spill over to prstat such that there may finally be a fix for:
>
> 4754856 prstat -atJTZ should count shared segments only once

Yes, we are addressing this bug as part of this work. prstat will
be able to report an accurate rss number for processes, users, projects
and tasks as well as zones. prstat and rcapd will use the same,
new underlying rss counting code we have developed.

Jerry



jeffv

Posts: 409
From:

Registered: 6/16/05
Re: improved zones/RM integration
Posted: Nov 2, 2006 7:44 AM   in response to: gjelinek

Jerry Jelinek wrote:
> Mike,
>
> Mike Gerdts wrote:
>
>> In the proposal you say:
>>
>> As part of this overall project, we will be enhancing the internal
>> rcapd rss accounting so that rcapd will have a more accurate
>> measurement of the overall rss for each zone.
>>
>> Does this spill over to prstat such that there may finally be a fix for:
>>
>> 4754856 prstat -atJTZ should count shared segments only once
>
> Yes, we are addressing this bug as part of this work. prstat will
> be able to report an accurate rss number for processes, users, projects
> and tasks as well as zones. prstat and rcapd will use the same,
> new underlying rss counting code we have developed.

Just curious: which process(es) gets "billed" for shared text pages?


--------------------------------------------------------------------------
Jeff VICTOR Sun Microsystems jeff.victor @ sun.com
OS Ambassador Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--------------------------------------------------------------------------



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: improved zones/RM integration
Posted: Nov 2, 2006 8:24 AM   in response to: jeffv

Jeff Victor wrote:
> Just curious: which process(es) gets "billed" for shared text pages?

Jeff,

We keep track of shared COW (copy-on-write) segments so they are not double
counted. Off the top of my head I think we just credit that to the first
process we see using the segment; however, I'll let Steve chime in here since
he did the implementation of this piece.
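
Purely as an illustration of the "credit the first process we see" idea (this is not the actual rcapd/prstat code, and the data layout and names are assumptions), counting each shared segment once could look like this:

```python
def zone_rss(processes):
    """Sum resident bytes for a zone, counting each shared segment only once.

    processes: iterable of (pid, [(segment_id, resident_bytes), ...]).
    Returns (per-process totals, zone total).
    """
    seen = set()
    per_process = {}
    for pid, segments in processes:
        total = 0
        for segment_id, resident in segments:
            if segment_id in seen:
                # Already credited to an earlier process; do not count again.
                continue
            seen.add(segment_id)
            total += resident
        per_process[pid] = total
    return per_process, sum(per_process.values())
```

With two processes that both map a 100-byte shared segment plus private segments of 50 and 30 bytes, the first process is credited 150, the second 30, and the zone total is 180 rather than 280.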

Jerry





