Thread: zones/SNAP design

Replies: 8 - Last Post: Sep 3, 2008 12:56 PM by: gjelinek
gjelinek

zones/SNAP design
Posted: Aug 25, 2008 8:22 AM

Apologies for any duplicates you might receive. This
was sent to caiman-discuss, but for those who don't follow
that list, I wanted to send this out here.

Thanks,
Jerry

----

Zones/SNAP Design
8/25/2008

I. Overview

This specification describes how ZFS datasets will be used with zones for
supporting software management operations with ipkg branded zones on
OpenSolaris. Software management includes tasks such as a SNAP upgrade or
installing/removing pkgs.

Issues are summarized in Part III.

One goal is that sw management behavior within the zone should be as similar
as possible to the behavior in the global zone. That is, when the zone admin
does sw management, the tools running within the zone should be able to take a
snapshot of the zone root, clone it, and the zone admin should be able to
roll back if the sw installation was problematic. This implies that the zone
root itself must be a delegated dataset. Zones do not currently support
this, so we must extend zones to provide this capability.

(Note that these software management features within a zone are not a
requirement for the 2008.11 release; however, this proposal lays the
groundwork to enable that capability moving forward.)

There are some issues with delegating the zone root dataset to the zone. The
way zones work, it is fundamental that the zone root be mounted in the global
zone so that the system can set up the chrooted environment when the zone
boots or when a process enters the zone. However, the ZFS mountpoint property
cannot be interpreted from the global zone when a dataset is delegated,
since this could lead to security problems.

(Note that the zone root is {zonepath}/root as seen in the global zone.)

To address this, the zones code must be enhanced to explicitly manage the zone
root mounts. This naturally falls out of the basic design described here.

Two alternative proposals were considered: a two-level zone layout and a
single-level zone layout. After much discussion, the single-level approach
was chosen. The two-level approach is not described further here.

II. Description

To support the management of boot environments (BEs) for zones within both the
global zone context as well as the non-global zone context, a nested,
two-level, BE dataset naming scheme is used so that there is a top-level
global zone BE namespace, as well as a zone-level BE namespace for the zone's
own use. Properties on the zone's datasets are used to manage which dataset
is active and which datasets are associated with a specific global zone BE.

There are two properties on a zone's datasets that are used to manage the
datasets: "org.opensolaris.libbe:parentbe" and "org.opensolaris.libbe:active".

The "org.opensolaris.libbe:parentbe" property is set to the UUID of the global
zone BE that is active when the non-global zone BE is created. The
"org.opensolaris.libbe:active" property is set to "on" or "off" to indicate
whether the dataset is the one that should be mounted.

The "org.opensolaris.libbe:active" property is left set to "on" on the zone
datasets that correspond to older, inactive global zone BEs. The combination
of this property and a matching "org.opensolaris.libbe:parentbe" determines
which dataset to mount on the zone root. Thus, rolling back to an earlier
global zone BE causes the zone BE that was last active for that global zone
BE to be mounted.

When a global zone BE is deleted, all corresponding zone-specific BEs with the
matching "org.opensolaris.libbe:parentbe" property should also be deleted.
When a non-global zone is deleted, all of its zone-specific BEs with an
"org.opensolaris.libbe:parentbe" property matching the current global zone
BE UUID should be deleted; however, any zone-specific BEs for other
global zone BEs must be preserved.
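
The matching behind these deletion rules can be sketched as a filter over
"zfs list -H" output, with tab-separated name, parentbe and active columns.
This is an illustration only. "zbes_for_gbe" is a hypothetical helper, not
part of the zones code or libbe. It assumes the ZBEs are the direct children
of the zone's rpool container. Datasets below a ZBE inherit its user
properties, so the filter skips them.

```sh
# Hypothetical helper: print the zone BEs (direct children of the
# container dataset $1) whose parentbe property equals $2.
# stdin: name<TAB>parentbe<TAB>active, one dataset per line.
zbes_for_gbe() {
    awk -v c="$1/" -v g="$2" 'BEGIN { FS = "\t" }
        # Keep only "<container>/<zbe>" with no further "/" in the name.
        index($1, c) == 1 && substr($1, length(c) + 1) !~ /\// && $2 == g {
            print $1
        }'
}
```

Its input would come from "zfs list -H -r -o
name,org.opensolaris.libbe:parentbe,org.opensolaris.libbe:active
rpool/export/zones/z1/rpool". Destroying the listed ZBEs can still fail if one
of them is the origin of a clone that belongs to a different global zone BE.
That is the promote concern raised later in this section.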

The following example illustrates the dataset layout and management for zone
z1 while the global zone is running GBE1. The zones code must be enhanced to
automatically create these datasets when the zone is installed.

(For clarity, the string "GBE" is used in place of the global zone BE UUID
in the following examples. Likewise, the string "ZBE" is used for the
non-global zone BE name.)

The zone has zonepath: /export/zones/z1

The relevant dataset that exists before the zone is installed:

dataset                             mountpoint prop.   zoned prop.
rpool/export/zones                  /export/zones      off

The datasets that must be automatically created during zone installation:

dataset                             mountpoint prop.   zoned prop.
rpool/export/zones/z1/rpool         legacy             on
rpool/export/zones/z1/rpool/ZBE1    legacy             on

The equivalent commands to create these datasets:

# zfs create -o mountpoint=legacy -o zoned=on rpool/export/zones/z1/rpool
# zfs create -o org.opensolaris.libbe:active=on \
-o org.opensolaris.libbe:parentbe=GBE1 rpool/export/zones/z1/rpool/ZBE1
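
The install-time step can be parameterized by the zonepath dataset and the
running global zone BE UUID. The following is a minimal sketch.
"zone_create_datasets" is a hypothetical helper, and it assumes the zonepath
is already its own dataset (see issue #4). Setting RUN=echo prints the
commands instead of running them.

```sh
# Hypothetical helper: create the zone's BE container and first ZBE.
#   $1 = dataset backing the zonepath (e.g. rpool/export/zones/z1)
#   $2 = UUID of the running global zone BE (GBE1 in the examples)
# Set RUN=echo to print the commands instead of running them.
zone_create_datasets() {
    zds=$1
    gbe=$2
    # Container dataset: legacy mountpoint, delegated (zoned) to the zone.
    $RUN zfs create -o mountpoint=legacy -o zoned=on "$zds/rpool" || return 1
    # First ZBE: inherits mountpoint/zoned and is active for this GBE.
    $RUN zfs create -o org.opensolaris.libbe:active=on \
        -o org.opensolaris.libbe:parentbe="$gbe" "$zds/rpool/ZBE1"
}
```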

During and after zone installation, the global zone explicitly mounts the
zone's ZBE1 dataset onto the zone root. The dataset uses a ZFS legacy
mountpoint, so the zones infrastructure itself must manage the mount. It uses
the dataset properties to determine which dataset to mount, as described
below. For example:

# mount -F zfs rpool/export/zones/z1/rpool/ZBE1 /export/zones/z1/root

The zone's rpool dataset (and by default, its child datasets) will be implicitly
delegated to the zone. That is, the zonecfg for the zone does not need to
explicitly mention it as a delegated dataset. The zones code must be
enhanced to delegate this dataset automatically:

rpool/export/zones/z1/rpool

Once the zone is booted, running a sw management operation within the zone
does the equivalent of the following sequence of commands:
1) Create the snapshot and clone
# zfs snapshot rpool/export/zones/z1/rpool/ZBE1@mysnap
# zfs clone rpool/export/zones/z1/rpool/ZBE1@mysnap \
rpool/export/zones/z1/rpool/ZBE2
2) Mount the clone
# mount -F zfs rpool/export/zones/z1/rpool/ZBE2 /a
3) Install sw into ZBE2
4) Finish
# umount /a
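
The same sequence can be written as a parameterized shell function. This is a
hypothetical sketch that can be dry-run with RUN=echo. The install step is
left as a comment because the packaging command is not part of this design.

```sh
# Hypothetical sketch of the in-zone sw management steps.
#   $1 = zone BE container (e.g. rpool/export/zones/z1/rpool)
#   $2 = current ZBE, $3 = new ZBE, $4 = snapshot name
#   $5 = alternate root for the new BE (e.g. /a)
# Set RUN=echo to print the commands instead of running them.
zone_sw_update() {
    c=$1 old=$2 new=$3 snap=$4 alt=$5
    # 1) Snapshot the current ZBE and clone it as the new ZBE.
    $RUN zfs snapshot "$c/$old@$snap" || return 1
    $RUN zfs clone "$c/$old@$snap" "$c/$new" || return 1
    # 2) Mount the clone on the alternate root.
    $RUN mount -F zfs "$c/$new" "$alt" || return 1
    # 3) Install sw into $alt here (packaging command not specified).
    # 4) Finish.
    $RUN umount "$alt"
}
```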

Within the zone, the admin then makes the new BE active by the equivalent of
the following sequence of commands:

# zfs set org.opensolaris.libbe:active=off rpool/export/zones/z1/rpool/ZBE1
# zfs set org.opensolaris.libbe:active=on rpool/export/zones/z1/rpool/ZBE2

Note that these commands will not need to be explicitly performed by the
zone admin. Instead, a utility such as beadm does this work (see issue #2).
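
A beadm-style tool's work at this point can be sketched as generating the
"zfs set" commands from the current property values. "activate_zbe" is
hypothetical. "zfs clone" does not copy properties set locally on the origin
dataset. The sketch therefore also assumes that the new ZBE must be tagged
with the running global zone BE as its parentbe. The commands above leave
that step implicit.

```sh
# Hypothetical helper: print the commands that make ZBE $2 the active
# zone BE for global zone BE $3 within container $1.
# stdin: name<TAB>parentbe<TAB>active (zfs list -H output).
# Review the output, then pipe it to sh to apply it.
activate_zbe() {
    awk -v c="$1/" -v new="$1/$2" -v g="$3" 'BEGIN { FS = "\t" }
        # Deactivate the ZBE currently active for this GBE (direct
        # children of the container only).
        index($1, c) == 1 && substr($1, length(c) + 1) !~ /\// &&
        $2 == g && $3 == "on" && $1 != new {
            print "zfs set org.opensolaris.libbe:active=off " $1
        }
        END {
            # Assumption: tag the new ZBE for this GBE, then activate it.
            print "zfs set org.opensolaris.libbe:parentbe=" g " " new
            print "zfs set org.opensolaris.libbe:active=on " new
        }'
}
```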

When the zone boots, the zones infrastructure code in the global zone will look
for the zone's dataset that has the "org.opensolaris.libbe:active" property set
to "on" and explicitly mount it on the zone root, as with the following
commands to mount the new BE based on the sw management task just performed
within the zone:

# umount /export/zones/z1/root
# mount -F zfs rpool/export/zones/z1/rpool/ZBE2 /export/zones/z1/root

Note that the global zone is still running GBE1 but the non-global zone is
now using its own ZBE2.

If more than one dataset has a matching "org.opensolaris.libbe:parentbe"
property and the "org.opensolaris.libbe:active" property set to "on", the
zone won't boot. Likewise, the zone won't boot if none of the matching
datasets has the "org.opensolaris.libbe:active" property set to "on".
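
The boot-time rule can be sketched as a filter that prints the one dataset to
mount and fails otherwise. "find_active_zbe" is a hypothetical helper that
reads tab-separated "zfs list -H" output. The actual zones code may implement
this quite differently.

```sh
# Hypothetical helper: apply the boot-time selection rule.
#   $1 = zone BE container, e.g. rpool/export/zones/z1/rpool
#   $2 = UUID of the running global zone BE (GBE1 in the examples)
# stdin: name<TAB>parentbe<TAB>active, one dataset per line.
# Prints the single matching ZBE; exits 1 if there are none or several.
find_active_zbe() {
    awk -v c="$1/" -v g="$2" 'BEGIN { FS = "\t"; n = 0 }
        # Count direct children of the container only, since datasets
        # below a ZBE inherit its user properties.
        index($1, c) == 1 && substr($1, length(c) + 1) !~ /\// &&
        $2 == g && $3 == "on" { n++; ds = $1 }
        END {
            if (n != 1) {
                printf("%d active zone BEs for %s; zone will not boot\n", n, g) | "cat 1>&2"
                exit 1
            }
            print ds
        }'
}
```

At zone boot, the global zone could then do the equivalent of:

# ds=$(zfs list -H -r -o name,org.opensolaris.libbe:parentbe,org.opensolaris.libbe:active \
rpool/export/zones/z1/rpool | find_active_zbe rpool/export/zones/z1/rpool GBE1) && \
mount -F zfs "$ds" /export/zones/z1/root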

When global zone sw management takes place, the following will happen.

Only the active zone BE will be cloned. This is the equivalent of the
following commands:

# zfs snapshot -r rpool/export/zones/z1/rpool/ZBE2@mysnap
# zfs clone rpool/export/zones/z1/rpool/ZBE2@mysnap \
rpool/export/zones/z1/rpool/ZBE3

(Note that this is using the zone's ZBE2 dataset created in the previous
example to create a zone ZBE3 dataset, even though the global zone is
going from GBE1 to GBE2.)
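
The global-zone side can be sketched the same way with a hypothetical
"ngz_clone_for_gbe" helper (RUN=echo prints the commands instead of running
them). The last two commands are an assumption. The design shows only the
snapshot and clone, but ZBE3 must carry parentbe=GBE2 and active=on for the
boot-time rule to select it once GBE2 is running.

```sh
# Hypothetical sketch of the zone side of a global zone sw update.
#   $1 = zone BE container, $2 = active ZBE, $3 = new ZBE,
#   $4 = snapshot name, $5 = UUID of the new global zone BE
# Set RUN=echo to print the commands instead of running them.
ngz_clone_for_gbe() {
    c=$1 cur=$2 new=$3 snap=$4 gbe=$5
    $RUN zfs snapshot -r "$c/$cur@$snap" || return 1
    $RUN zfs clone "$c/$cur@$snap" "$c/$new" || return 1
    # Assumption: tag the clone for the new global zone BE and mark it
    # active, since the clone does not carry the origin's local properties.
    $RUN zfs set org.opensolaris.libbe:parentbe="$gbe" "$c/$new" || return 1
    $RUN zfs set org.opensolaris.libbe:active=on "$c/$new"
}
```

ZBE2 keeps active=on with parentbe=GBE1. Booting back to GBE1 would therefore
still mount ZBE2, as the rollback rule described earlier requires.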

When the new global zone BE (GBE2) is activated and the system reboots, the
zone root must be explicitly mounted by the zones code:

# mount -F zfs rpool/export/zones/z1/rpool/ZBE3 /export/zones/z1/root

Note that the global zone and non-global zone BE names advance independently,
since sw management operations are performed, and BEs are activated,
separately in the global zone and in the non-global zone.

One concern with this design is that the zone has access to its datasets that
correspond to a global zone BE which is not active. The zone admin could
delete the zone's inactive BE datasets which are associated with a non-active
global zone BE, causing the zone to be unusable if the global zone boots back
to an earlier global BE.

One solution is for the global zone to turn off the "zoned" property on
the datasets that correspond to a non-active global zone BE. However, there
seems to be a bug in ZFS, since these datasets can still be mounted within
the zone. This is being looked at by the ZFS team. If necessary, we can work
around this by using a combination of a mountpoint along with turning off
the "canmount" property, although a ZFS fix is the preferred solution.
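
The zoned-off approach could look like the following sketch, which prints
commands rather than running them. "hide_inactive_zbes" is hypothetical.
Re-enabling a ZBE with "zfs inherit zoned" when its global zone BE becomes
active again is also an assumption. As noted above, zoned=off alone may not
yet stop the zone from mounting these datasets.

```sh
# Hypothetical helper: given container $1 and the running GBE UUID $2,
# print commands that hide ZBEs tagged for other GBEs from the zone.
# stdin: name<TAB>parentbe<TAB>active (zfs list -H output).
hide_inactive_zbes() {
    awk -v c="$1/" -v g="$2" 'BEGIN { FS = "\t" }
        # Direct children of the container that carry a parentbe tag.
        index($1, c) == 1 && substr($1, length(c) + 1) !~ /\// && $2 != "-" {
            if ($2 == g) {
                # Tagged for the running GBE: restore the inherited zoned=on.
                print "zfs inherit zoned " $1
            } else {
                # Tagged for an inactive GBE: take it away from the zone.
                print "zfs set zoned=off " $1
            }
        }'
}
```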

Another concern is that the zone must be able to promote one of its datasets
that is associated with a non-active global zone BE. This can occur if the
global zone boots back to one of its earlier BEs. This would then cause an
earlier non-global zone BE to become the active BE for that zone. If the zone
then wants to destroy one of its inactive zone BEs, it needs to be able to
promote any clones that depend on that dataset's snapshots. We must make sure
that any restrictions we use with the ZFS "zoned" attribute don't prevent
this. This may require an enhancement in ZFS itself.
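
The following is a minimal sketch of promote-before-destroy. It assumes the
snapshot list is in creation order. Promoting a clone of the newest snapshot
that has clones moves that snapshot, and all older ones, to the clone. After
that, nothing depends on the original ZBE and it can be destroyed.
"plan_destroy_zbe" is hypothetical. It prints commands rather than running
them and ignores datasets below the ZBE.

```sh
# Hypothetical helper: print the commands to destroy a ZBE whose
# snapshots may have dependent clones.
#   $1 = ZBE to destroy, e.g. rpool/export/zones/z1/rpool/ZBE1
#   $2 = file of snapshot names, oldest first, e.g. from
#        zfs list -H -t snapshot -o name -s creation -r <ZBE>
#   $3 = file of name<TAB>origin lines, e.g. from zfs list -H -o name,origin
plan_destroy_zbe() {
    awk -v ds="$1" 'BEGIN { FS = "\t"; n = 0; p = ds "@" }
        # First file: keep only snapshots of the ZBE itself, in order.
        FILENAME == ARGV[1] { if (index($1, p) == 1) snaps[++n] = $1; next }
        # Second file: remember a clone for each origin snapshot of the ZBE.
        index($2, p) == 1 { clone[$2] = $1 }
        END {
            # Promote a clone of the newest snapshot that has clones.
            for (i = n; i >= 1; i--) {
                if (snaps[i] in clone) {
                    print "zfs promote " clone[snaps[i]]
                    break
                }
            }
            print "zfs destroy -r " ds
        }' "$2" "$3"
}
```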

III. Issues and Tasks:
-----------------------

1) We will not allow zones in the ROOT dataset. When installing a zone,
this will be validated and result in an error if the zonepath is under
the ROOT dataset.

If necessary, we can always relax this restriction later. This proposal
does not discuss a namespace or layout for zones in the ROOT dataset.

2) We need some sort of tool to manage BE activation within the zone. This
would automatically set the value for the ZFS "org.opensolaris.libbe:active"
property for the different datasets. This would most likely be an extension
to beadm to make it work inside a zone. (Not required for 2008.11)

3) The sw management commands must be extended to be zone aware so that
they can create the correct snapshot and clone when they are running
inside the zone. (Not required for 2008.11)

4) Zones must always live in ZFS datasets. When installing a zone,
this will be validated and result in an error if the zonepath is not under
a dataset.

If necessary, we can always relax this restriction later. This proposal
does not discuss how to handle zones that do not have their own BE dataset.

5) We must either fix the zone's /dev so that it is part of the zone root
rather than a separate mount, or manage it as its own dataset that is
cloned and mounted as part of global zone BE management. Fixing /dev
implies some devfs changes. The example above assumes the first option.

6) Global zone BEs need a UUID.

7) The zones code needs some way to determine the active global zone BE UUID
for use in creating the zone datasets and for managing the explicit mounts.

_______________________________________________
zones-discuss mailing list
zones-discuss at opensolaris dot org


stevelaw

Re: zones/SNAP design
Posted: Aug 25, 2008 6:30 PM   in response to: gjelinek

> During the zone installation and after the zone is installed, the zone's ZBE1
> dataset is explicitly mounted by the global zone onto the zone root (note, the
> dataset is a ZFS legacy mount so zones infrastructure itself must manage the
> mounting. It uses the dataset properties to determine which dataset to
> mount, as described below.): e.g.
>
> # mount -f zfs rpool/export/zones/z1/rpool/ZBE1 /export/zones/z1/root
>
> The rpool dataset (and by default, its child datasets) will be implicitly
> delegated to the zone. That is, the zonecfg for the zone does not need to
> explicitly mention this as a delegated dataset. The zones code must be
> enhanced to delegate this automatically:

Is there any requirement to have a flag to disallow a zone from doing zfs/BE
operations? I'm not sure when an admin may want to make this restriction.

>
> rpool/export/zones/z1/rpool
>
> Once the zone is booted, running a sw management operation within the zone
> does the equivalent of the following sequence of commands:
> 1) Create the snapshot and clone
> # zfs snapshot rpool/export/zones/z1/rpool/ZBE1@mysnap
> # zfs clone rpool/export/zones/z1/rpool/ZBE1@mysnap \
> rpool/export/zones/z1/rpool/ZBE2
> 2) Mount the clone and install sw into ZBE2
> # mount -f zfs rpool/export/zones/z1/rpool/ZBE2 /a
> 3) Install sw
> 4) Finish
> # unmount /a
>
> Within the zone, the admin then makes the new BE active by the equivalent of
> the following sequence of commands:
>
> # zfs set org.opensolaris.libbe:active=off rpool/export/zones/z1/rpool/ZBE1
> # zfs set org.opensolaris.libbe:active=on rpool/export/zones/z1/rpool/ZBE2
>
> Note that these commands will not need to be explicitly performed by the
> zone admin. Instead, a utility such as beadm does this work (see issue #2).

Inside a zone, beadm should "fix" this.

From the global zone, beadm should be able to "fix" a (halted?) zone in this
state so that it may be booted.

I think this means that the global zone should be able to do some explicit
beadm operations on a zone (perhaps only when it is halted?), in addition
to the automatic ones that happen when the GBE is manipulated.

> Another concern is that the zone must be able to promote one of its datasets
> that is associated with a non-active global zone BE. This can occur if the
> global zone boots back to one of its earlier BEs. This would then cause an
> earlier non-global zone BE to become the active BE for that zone. If the zone
> then wants to destroy one of its inactive zone BEs it needs to be able to
> promote any children of that dataset. We must make sure that any restrictions
> we use with the ZFS "zoned" attribute doesn't prevent this. This may require
> an enhancement in ZFS itself.

I think it would be generally useful if zfs had a "destroy and promote as
necessary" operation. Otherwise, this will just be re-implemented by various
higher level software in annoyingly different ways.

-Steve


gjelinek

Re: zones/SNAP design
Posted: Aug 26, 2008 6:15 AM   in response to: stevelaw

Steve,

Thanks for looking this over, responses in-line.

Steve Lawrence wrote:
>> During the zone installation and after the zone is installed, the zone's ZBE1
>> dataset is explicitly mounted by the global zone onto the zone root (note, the
>> dataset is a ZFS legacy mount so zones infrastructure itself must manage the
>> mounting. It uses the dataset properties to determine which dataset to
>> mount, as described below.): e.g.
>>
>> # mount -f zfs rpool/export/zones/z1/rpool/ZBE1 /export/zones/z1/root
>>
>> The rpool dataset (and by default, its child datasets) will be implicitly
>> delegated to the zone. That is, the zonecfg for the zone does not need to
>> explicitly mention this as a delegated dataset. The zones code must be
>> enhanced to delegate this automatically:
>
> Is there any requirement to have a flag go disallow a zone from doing zfs/BE
> operations? I'm not sure when an admin may want to make this restrction.

There has been no discussion about disallowing a zone from installing sw,
which is what I think you are asking for. Would you want that to
be a general new feature or specific to ipkg branded zones?

>> rpool/export/zones/z1/rpool
>>
>> Once the zone is booted, running a sw management operation within the zone
>> does the equivalent of the following sequence of commands:
>> 1) Create the snapshot and clone
>> # zfs snapshot rpool/export/zones/z1/rpool/ZBE1@mysnap
>> # zfs clone rpool/export/zones/z1/rpool/ZBE1@mysnap \
>> rpool/export/zones/z1/rpool/ZBE2
>> 2) Mount the clone and install sw into ZBE2
>> # mount -f zfs rpool/export/zones/z1/rpool/ZBE2 /a
>> 3) Install sw
>> 4) Finish
>> # unmount /a
>>
>> Within the zone, the admin then makes the new BE active by the equivalent of
>> the following sequence of commands:
>>
>> # zfs set org.opensolaris.libbe:active=off rpool/export/zones/z1/rpool/ZBE1
>> # zfs set org.opensolaris.libbe:active=on rpool/export/zones/z1/rpool/ZBE2
>>
>> Note that these commands will not need to be explicitly performed by the
>> zone admin. Instead, a utility such as beadm does this work (see issue #2).
>
> Inside a zone, beadm should "fix" this.

This is already noted here.

> From the global zone, beadm should be able to "fix" a (halted?) zone in this
> state so that it may be booted.

I am not sure that is possible. Because a sysadmin made a deliberate
effort to get the zone into this state, it might be difficult for a tool
to undo it automatically in a reliable way. I don't see this as a
requirement, since you can always manually undo whatever the sysadmin did
to set up the properties incorrectly, assuming you can figure out which
ZBEs are which.

> I think this means that the global zone should be able to do some explict
> beadm operations on a zone (perhaps only when it is halted?), in addition
> to the automatic ones that happen when the GBE is manipulated.
>
>> Another concern is that the zone must be able to promote one of its datasets
>> that is associated with a non-active global zone BE. This can occur if the
>> global zone boots back to one of its earlier BEs. This would then cause an
>> earlier non-global zone BE to become the active BE for that zone. If the zone
>> then wants to destroy one of its inactive zone BEs it needs to be able to
>> promote any children of that dataset. We must make sure that any restrictions
>> we use with the ZFS "zoned" attribute doesn't prevent this. This may require
>> an enhancement in ZFS itself.
>
> I think it would be generally useful if zfs had a "destroy and promote as
> necessary" operation. Otherwise, this will just be re-implemented by various
> higher level software in annoyingly different ways.

Yes, we may need some zfs enhancements here. We'll have to see as this
moves forward, but this sounds like a good idea.

Thanks again,
Jerry

gfaden

Re: zones/SNAP design
Posted: Aug 26, 2008 12:27 PM   in response to: gjelinek

Jerry,

For Trusted Extensions in OpenSolaris, we plan to use a new "labeled"
brand for zones which will be derived from the new ipkg brand. So your
proposal also affects "labeled" branded zones. I would like to explore
the plan to delegate these datasets to the zone. Since the zone's root
dataset must be mounted in the global zone before the zone can be
booted, this is distinct from the normal dataset delegation, in which
the mount is done from within the zone. For TX there are some issues
with ensuring that these datasets are properly protected and labeled.
There may also be issues with encryption keys.

For TX we would like a way to associate a label with a dataset even if
it isn't mounted. Today we derive the label from the ZFS mount attribute
which corresponds to the zonepath defined in zonecfg. Given a zone name,
we can find the label in /etc/security/tsol/tnzonecfg. However, with
multiple datasets being delegated to zones, and the use of the legacy
mount setting, the link between the dataset and its label becomes a bit
weaker. For greater assurance, I would like to store the label as a
dataset property, but this implies it must not be delegated to the zone
since labels are only managed from the global zone.

I'm not clear about the security policy of properties, such as
org.opensolaris.libbe:parentbe. Is the ability to modify these
properties granted when the dataset is delegated? Can there be
properties that don't get delegated?

There is a tangential issue with respect to NFS sharing of directories
within delegated zone datasets. Currently there is a TX extension to
zoneadmd which allows per-zone shares to be interpreted when zones are
booted or halted. Since this code must execute in the global zone, the
dataset must be previously mounted in the global zone. Delegated
datasets that aren't mounted from the global zone can't be shared today.
This issue isn't completely relevant to your proposal, but I wanted to
mention it for clarity.

--Glenn


Jerry Jelinek wrote:
> Apologies for any duplicates you might receive. This
> was sent to caiman-discuss, but for those who don't follow
> that list, I wanted to send this out here.
>
> Thanks,
> Jerry
>
> ----
>
> Zones/SNAP Design
> 8/25/2008
>
> I. Overview
>
> This specification describes how ZFS datasets will be used with zones for
> supporting software management operations with ipkg branded zones on
> OpenSolaris. Software management includes tasks such as a SNAP upgrade or
> installing/removing pkgs.
>
> Issues are summarized in Part III.
>
> One goal is that sw management behavior within the zone should be as similar
> as possible to the behavior in the global zone. That is, when the zone admin
> does sw management, the tools running within the zone should be able to take a
> snapshot of the zone root, clone it, and the zone admin should be able to
> roll-back if the sw installation was problematic. This implies that the zone
> root itself must be a delegated dataset. Up to now, zones does not support
> this, so we must extend zones to provide this capability.
>
> (Note that these software management features within a zone are not a
> requirement for the 2008.11 release, however, this proposal lays the
> groundwork to enable that capability moving forward.)
>
> There are some issues with delegating the zone root dataset to the zone. The
> way zones work, it is fundamental that the zone root be mounted in the global
> zone so that the system can set up the chrooted environment when the zone
> boots or when a process enters the zone. However, the ZFS mountpoint property
> cannot be interpreted from the global zone when a dataset is delegated,
> since this could lead to security problems.
>
> (Note that the zone root is {zonepath}/root as seen in the global zone}
>
> To address this, the zones code must be enhanced to explicitly manage the zone
> root mounts. This naturally falls out of the basic design described here.
>
> There were two alternative proposal considered; a two-level zone layout and a
> single-level zone layout. After much discussion, the single-level approach
> was chosen. The two-level approach is not described further here.
>
> II. Description
>
> To support the management of boot environments (BEs) for zones within both the
> global zone context as well as the non-global zone context, a nested,
> two-level, BE dataset naming scheme is used so that there is a top-level
> global zone BE namespace, as well as a zone-level BE namespace for the zone's
> own use. Properties on the zone's datasets are used to manage which dataset
> is active and which datasets are associated with a specific global zone BE.
>
> There are two properties on a zone's datasets that are used to manage the
> datasets; "org.opensolaris.libbe:parentbe" and "org.opensolaris.libbe:active".
>
> The "org.opensolaris.libbe:parentbe" property is set to the UUID of the global
> zone BE that is active when the non-global zone BE is created. The
> "org.opensolaris.libbe:active" property is set to "on" or "off" to indicate
> that the dataset is the one that should be mounted.
>
> We leave the "org.opensolaris.libbe:active" property set to "on" to
> correspond with older, inactive global zone BEs, and use the combination of
> this property, along with the matching "org.opensolaris.libbe:parentbe" to
> determine which dataset to mount on the zone root. Thus, rolling back to an
> earlier global zone BE would cause the last active zone BE to be mounted for
> that global zone BE.
>
> When a global zone BE is deleted, all corresponding zone-specific BEs with the
> matching "org.opensolaris.libbe:parentbe" property should also be deleted.
> When a non-global zone is deleted, all of zone-specific BEs with the
> matching "org.opensolaris.libbe:parentbe" property for the current global zone
> BE UUID should be deleted, however, any zone-specific BEs for other
> global zone BEs must be preserved.
>
> The following example illustrates the dataset layout and management for zone
> z1 while the global zone is running BE1. The zones code must be enhanced to
> automatically create these datasets when the zone is installed.
>
> (For clarity, the global zone "GBE" string is used instead of the global zone
> BE UUID in the following examples. Likewise, the "ZBE" string is used to for
> the non-global zone BE name.)
>
> The zone has zonepath: /export/zones/z1
>
> The relevant dataset that exists before the zone is installed:
> dataset mountpoint prop. zoned prop.
> rpool/export/zones /export/zones off
>
> The datasets that must be automatically created during zone installation:
> dataset mountpoint prop. zoned prop.
> rpool/export/zones/z1/rpool legacy on
> rpool/export/zones/z1/rpool/ZBE1 legacy on
>
> The equivalent commands to create these datasets:
>
> # zfs create -o mountpoint=legacy -o zoned=on rpool/export/zones/z1/rpool
> # zfs create -o org.opensolaris.libbe:active=on \
> -o org.opensolaris.libbe:parentbe=GBE1 rpool/export/zones/z1/rpool/ZBE1
>
> During the zone installation and after the zone is installed, the zone's ZBE1
> dataset is explicitly mounted by the global zone onto the zone root (note, the
> dataset is a ZFS legacy mount so zones infrastructure itself must manage the
> mounting. It uses the dataset properties to determine which dataset to
> mount, as described below.): e.g.
>
> # mount -f zfs rpool/export/zones/z1/rpool/ZBE1 /export/zones/z1/root
>
> The rpool dataset (and by default, its child datasets) will be implicitly
> delegated to the zone. That is, the zonecfg for the zone does not need to
> explicitly mention this as a delegated dataset. The zones code must be
> enhanced to delegate this automatically:
>
> rpool/export/zones/z1/rpool
>
> Once the zone is booted, running a sw management operation within the zone
> does the equivalent of the following sequence of commands:
> 1) Create the snapshot and clone
> # zfs snapshot rpool/export/zones/z1/rpool/ZBE1@mysnap
> # zfs clone rpool/export/zones/z1/rpool/ZBE1@mysnap \
> rpool/export/zones/z1/rpool/ZBE2
> 2) Mount the clone
> # mount -F zfs rpool/export/zones/z1/rpool/ZBE2 /a
> 3) Install sw into ZBE2
> 4) Finish
> # umount /a
>
> Within the zone, the admin then makes the new BE active by the equivalent of
> the following sequence of commands:
>
> # zfs set org.opensolaris.libbe:active=off rpool/export/zones/z1/rpool/ZBE1
> # zfs set org.opensolaris.libbe:active=on rpool/export/zones/z1/rpool/ZBE2
>
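The in-zone activation step can be modeled as below. This is a sketch, not beadm code: datasets are represented as a dict of property values, the function name is hypothetical, and it assumes the new clone has already been tagged with the same "org.opensolaris.libbe:parentbe" value as the BE it was cloned from (the snapshot/clone commands above do not set it). It generalizes the two zfs set commands slightly by turning off any other active zone BE that shares the target's parentbe.

```python
# Sketch: compute the zfs set commands that make one zone BE the active one.
ACTIVE = "org.opensolaris.libbe:active"
PARENT = "org.opensolaris.libbe:parentbe"


def activation_commands(datasets, target):
    """Return commands making `target` the only active zone BE among the
    datasets that share its parentbe value."""
    gbe = datasets[target][PARENT]
    cmds = []
    for name, props in sorted(datasets.items()):
        if name != target and props[PARENT] == gbe and props[ACTIVE] == "on":
            cmds.append(f"zfs set {ACTIVE}=off {name}")
    cmds.append(f"zfs set {ACTIVE}=on {target}")
    return cmds


z1 = {
    "rpool/export/zones/z1/rpool/ZBE1": {PARENT: "GBE1", ACTIVE: "on"},
    # The fresh clone, assumed already tagged with ZBE1's parentbe.
    "rpool/export/zones/z1/rpool/ZBE2": {PARENT: "GBE1", ACTIVE: "off"},
}
for cmd in activation_commands(z1, "rpool/export/zones/z1/rpool/ZBE2"):
    print(cmd)
```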
> Note that these commands will not need to be explicitly performed by the
> zone admin. Instead, a utility such as beadm does this work (see issue #2).
>
> When the zone boots, the zones infrastructure code in the global zone will look
> for the zone's dataset that has the "org.opensolaris.libbe:active" property set
> to "on" and explicitly mount it on the zone root, as with the following
> commands to mount the new BE based on the sw management task just performed
> within the zone:
>
> # umount /export/zones/z1/root
> # mount -F zfs rpool/export/zones/z1/rpool/ZBE2 /export/zones/z1/root
>
> Note that the global zone is still running GBE1 but the non-global zone is
> now using its own ZBE2.
>
> If more than one dataset has a matching "org.opensolaris.libbe:parentbe"
> property and the "org.opensolaris.libbe:active" property set to "on", the
> zone won't boot. Likewise, the zone won't boot if no dataset matches both
> conditions.
>
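The boot-time rule can be sketched as a small function that the zones code might apply before mounting the zone root. The dict-of-properties model, the function names and the ZoneRootError exception are hypothetical; only the two property names and the exactly-one-match rule come from the text.

```python
# Sketch of the boot-time zone root selection rule.
ACTIVE = "org.opensolaris.libbe:active"
PARENT = "org.opensolaris.libbe:parentbe"


class ZoneRootError(Exception):
    """Raised when the zone root cannot be chosen unambiguously."""


def select_zone_root(datasets, current_gbe):
    """Return the single dataset whose parentbe matches the running global BE
    and whose active property is "on"; refuse if there are zero or several."""
    matches = [name for name, props in sorted(datasets.items())
               if props.get(PARENT) == current_gbe and props.get(ACTIVE) == "on"]
    if len(matches) != 1:
        raise ZoneRootError(
            f"{len(matches)} active zone BEs for {current_gbe}; zone will not boot")
    return matches[0]


def mount_command(dataset, zonepath):
    """Legacy-mount the chosen dataset on {zonepath}/root."""
    return f"mount -F zfs {dataset} {zonepath}/root"


z1 = {
    "rpool/export/zones/z1/rpool/ZBE1": {PARENT: "GBE1", ACTIVE: "off"},
    "rpool/export/zones/z1/rpool/ZBE2": {PARENT: "GBE1", ACTIVE: "on"},
}
print(mount_command(select_zone_root(z1, "GBE1"), "/export/zones/z1"))
```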
> When global zone sw management takes place, the following will happen.
>
> Only the active zone BE will be cloned. This is the equivalent of the
> following commands:
>
> # zfs snapshot -r rpool/export/zones/z1/rpool/ZBE2@mysnap
> # zfs clone rpool/export/zones/z1/rpool/ZBE2@mysnap \
> rpool/export/zones/z1/rpool/ZBE3
>
> (Note that this is using the zone's ZBE2 dataset created in the previous
> example to create a zone ZBE3 dataset, even though the global zone is
> going from GBE1 to GBE2.)
>
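A sketch of the global-zone side: only the zone BE that is active under the current global BE is snapshotted and cloned. The proposal does not spell out how the new clone gets its properties, so this sketch assumes they are set right after the clone, tying ZBE3 to the new global BE (GBE2) while ZBE2 keeps active=on for a rollback to GBE1. Function and parameter names are hypothetical.

```python
# Sketch: commands for cloning a zone's active BE during global zone sw
# management. Datasets are modeled as {name: {property: value}}.
ACTIVE = "org.opensolaris.libbe:active"
PARENT = "org.opensolaris.libbe:parentbe"


def global_update_commands(container, datasets, old_gbe, new_gbe, new_zbe,
                           snap="mysnap"):
    """Snapshot/clone only the zone BE that is active under old_gbe."""
    active = [name for name, props in sorted(datasets.items())
              if props[PARENT] == old_gbe and props[ACTIVE] == "on"]
    if len(active) != 1:
        raise ValueError(f"expected one active zone BE for {old_gbe}, "
                         f"found {len(active)}")
    src = active[0]
    dst = f"{container}/{new_zbe}"
    return [
        f"zfs snapshot -r {src}@{snap}",
        f"zfs clone {src}@{snap} {dst}",
        # Assumption (not in the proposal): tag the clone so the boot-time
        # rule selects it under new_gbe; src is left untouched.
        f"zfs set {PARENT}={new_gbe} {dst}",
        f"zfs set {ACTIVE}=on {dst}",
    ]


container = "rpool/export/zones/z1/rpool"
z1 = {
    f"{container}/ZBE1": {PARENT: "GBE1", ACTIVE: "off"},
    f"{container}/ZBE2": {PARENT: "GBE1", ACTIVE: "on"},
}
for cmd in global_update_commands(container, z1, "GBE1", "GBE2", "ZBE3"):
    print(cmd)
```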
> When the new global zone BE is activated and the system reboots, the zone root must
> be explicitly mounted by the zones code:
>
> # mount -F zfs rpool/export/zones/z1/rpool/ZBE3 /export/zones/z1/root
>
> Note that the global zone and non-global zone BE names move along independently
> as sw management operations are performed in the global and non-global
> zone and the different BEs are activated, again by the global and non-global
> zone.
>
> One concern with this design is that the zone has access to its datasets that
> correspond to a global zone BE which is not active. The zone admin could
> delete the zone's inactive BE datasets which are associated with a non-active
> global zone BE, causing the zone to be unusable if the global zone boots back
> to an earlier global BE.
>
> One solution is for the global zone to turn off the "zoned" property on
> the datasets that correspond to a non-active global zone BE. However, there
> seems to be a bug in ZFS, since these datasets can still be mounted within
> the zone. This is being looked at by the ZFS team. If necessary, we can work
> around this by using a combination of a mountpoint along with turning off
> the "canmount" property, although a ZFS fix is the preferred solution.
>
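The "zoned" approach amounts to a per-dataset decision based on "org.opensolaris.libbe:parentbe". A minimal sketch, assuming datasets are modeled as name to parentbe value; switching zoned back on for the running BE's datasets (for example after the global zone boots back to an earlier BE) is an assumption of this sketch. Given the ZFS bug noted above, these settings alone may not yet keep the datasets out of reach.

```python
# Sketch: decide the "zoned" setting for each zone BE dataset.
# Datasets are modeled as {name: parentbe_value}; the helper is hypothetical.

def zoned_commands(datasets, current_gbe):
    """Return zfs set commands: zoned=on for zone BEs tied to the running
    global BE, zoned=off for those tied to any other (inactive) global BE."""
    cmds = []
    for name, parentbe in sorted(datasets.items()):
        # Assumption: the running BE's datasets are (re)set to zoned=on.
        value = "on" if parentbe == current_gbe else "off"
        cmds.append(f"zfs set zoned={value} {name}")
    return cmds


z1 = {
    "rpool/export/zones/z1/rpool/ZBE1": "GBE1",
    "rpool/export/zones/z1/rpool/ZBE2": "GBE1",
    "rpool/export/zones/z1/rpool/ZBE3": "GBE2",
}
for cmd in zoned_commands(z1, "GBE2"):
    print(cmd)
```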
> Another concern is that the zone must be able to promote one of its datasets
> that is associated with a non-active global zone BE. This can occur if the
> global zone boots back to one of its earlier BEs. This would then cause an
> earlier non-global zone BE to become the active BE for that zone. If the zone
> then wants to destroy one of its inactive zone BEs it needs to be able to
> promote any children of that dataset. We must make sure that any restrictions
> we use with the ZFS "zoned" attribute don't prevent this. This may require
> an enhancement in ZFS itself.
>
> III. Issues and Tasks:
> -----------------------
>
> 1) We will not allow zones in the ROOT dataset. When installing a zone,
> this will be validated and result in an error if the zonepath is under
> the ROOT dataset.
>
> If necessary, we can always relax this restriction later. This proposal
> does not discuss a namespace or layout for zones in the ROOT dataset.
>
> 2) We need some sort of tool to manage BE activation within the zone. This
> would automatically set the value for the ZFS "org.opensolaris.libbe:active"
> property for the different datasets. This most likely would be an extension
> to beadm to make it work inside of a zone. (Not required for 2008.11)
>
> 3) The sw management commands must be extended to be zone aware so that
> they can create the correct snapshot and clone when they are running
> inside the zone. (Not required for 2008.11)
>
> 4) Zones must always live in ZFS datasets. When installing a zone,
> this will be validated and result in an error if the zonepath is not under
> a dataset.
>
> If necessary, we can always relax this restriction later. This proposal
> does not discuss how to handle zones that do not have their own BE dataset.
>
> 5) We must either fix the zone /dev to be part of the zone root and not
> a separate mount or we need to manage it as its own dataset that is
> cloned and mounted as part of the global zone BE management. Fixing this
> implies some devfs changes. This is assumed in the example above.
>
> 6) Global zone BEs need a UUID.
>
> 7) The zones code needs some way to determine the active global zone BE UUID
> for use in creating the zone datasets and for managing the explicit mounts.
>

_______________________________________________
zones-discuss mailing list
zones-discuss at opensolaris dot org


gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: zones/SNAP design
Posted: Aug 26, 2008 2:02 PM   in response to: gfaden


Glenn,

Glenn Faden wrote:
> Jerry,
>
> For Trusted Extensions in OpenSolaris, we plan to use a new "labeled"
> brand for zones which will be derived from the new ipkg brand. So your
> proposal also affects "labeled" branded zones. I would like to explore
> the plan to delegate these datasets to the zone.

OK. This document is currently what we have. Let me know if you
have questions about it. We'll be working on implementing toward
this over the next few months. Also, it is worth noting that the
"ipkg" brand is pretty fluid at this point and I would expect to
continue to see a fair number of changes for a while.

> Since the zone's root
> dataset must be mounted in the global zone before the zone can be
> booted, this is distinct from the normal dataset delegation, in which
> the mount is done from within the zone. For TX there are some issues
> with ensuring that these datasets are properly protected and labeled.
> There may also be issues with encryption keys.
>
> For TX we would like a way to associate a label with a dataset even if
> it isn't mounted. Today we derive the label from the ZFS mount attribute
> which corresponds to the zonepath defined in zonecfg. Given a zone name,
> we can find the label in /etc/security/tsol/tnzonecfg. However, with
> multiple datasets being delegated to zones, and the use of the legacy
> mount setting, the link between the dataset and its label becomes a bit
> weaker. For greater assurance, I would like to store the label as a
> dataset property, but this implies it must not be delegated to the zone
> since labels are only managed from the global zone.

Currently, with user defined properties, the values can be changed
inside the zone. The proposal actually depends on this feature.
It sounds like you might need to talk to the zfs team about some
enhancements here so we can have properties that cannot be changed
on delegated datasets. That might be generally useful beyond tx, so
that would be a good conversation to have.

> I'm not clear about the security policy of properties, such as
> org.opensolaris.libbe:parentbe. Is the ability to modify these
> properties granted when the dataset is delegated? Can there be
> properties that don't get delegated?
>
> There is tangential issue with respect to NFS sharing of directories
> within delegated zone datasets. Currently there is a TX extension to
> zoneadmd which allows per-zone shares to be interpreted when zones are
> booted or halted. Since this code must execute in the global zone, the
> dataset must be previously mounted in the global zone. Delegated
> datasets that aren't mounted from the global zone can't be shared today.
> This issue isn't completely relevant to your proposal, but I wanted to
> mention it for clarity.

Most of this is just on paper now, so we can make enhancements as
it is implemented, as long as the basic design is sound.

Thanks,
Jerry


darrenm

Posts: 3,793
From: GB

Registered: 3/9/05
Re: zones/SNAP design
Posted: Aug 27, 2008 2:04 AM   in response to: gjelinek


Jerry Jelinek wrote:
> Currently, with user defined properties, the values can be changed
> inside the zone. The proposal actually depends on this feature.
> It sounds like you might need to talk to the zfs team about some
> enhancements here so we can have properties that cannot be changed
> on delegated datasets. That might be generally useful beyond tx, so
> that would be a good conversation to have.

The label should not be done as a user property but a "real" property.
Doing that means that we should be able to control whether or not it is
delegated to a zone. I'll see if I can whip up a quick
prototype of this part - I've got a good amount of experience with ZFS
properties from the ZFS crypto project.

--
Darren J Moffat


gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: zones/SNAP design
Posted: Aug 27, 2008 6:16 AM   in response to: darrenm


Darren J Moffat wrote:
> Jerry Jelinek wrote:
>> Currently, with user defined properties, the values can be changed
>> inside the zone. The proposal actually depends on this feature.
>> It sounds like you might need to talk to the zfs team about some
>> enhancements here so we can have properties that cannot be changed
>> on delegated datasets. That might be generally useful beyond tx, so
>> that would be a good conversation to have.
>
> The label should not be done as a user property but a "real" property.
> Doing that means that we should be able to control whether or not it is
> delegated to a zone. I'll see if I can whip up a quick
> prototype of this part - I've got a good amount of experience with ZFS
> properties from the ZFS crypto project.

Darren,

Yes, a real property gives you this control. I agree that for
this problem, it's a better solution.

Thanks,
Jerry


edp

Posts: 599
From: US

Registered: 3/9/05
Re: [zones-discuss] zones/SNAP design
Posted: Sep 3, 2008 12:32 PM   in response to: gjelinek


hey jerry,
nice proposal.
sorry i didn't reply to this earlier, i managed to miss it last
week. :(
some comments are below.

- you proposed:
The "org.opensolaris.libbe:parentbe" property is set to the
UUID of the global zone BE that is active when the non-global
zone BE is created.
when you first mention this it might make sense to say that no
such UUID currently exists (i know you say that later, but i
immediately started thinking of the existing zone UUIDs).

also, at some point we had talked about having some core-bits
versioning that would restrict which packages a zone can update.
wouldn't it make more sense to have this property set to that
identifier? this way a global zone BE can boot any non-global
zone BE which has matching core bits. this is less restrictive
than a UUID. (also, if some stuff in the global zone is updated,
but the core-bits don't change there is no reason to snapshot
all the non-global zones.)

- as mentioned in other emails, i agree that ROOT is a better name
than rpool. also, i think that since we're snapshotting all zone
filesystems, the zone really needs a default dataset that is not
contained within ROOT. (ethan mentioned this in other mails.)
i don't really have a strong opinion on the name (export, share,
common, etc) or the location (../z1/share or .../z1/data/share)
but i think this proposal is incomplete without specifying a
default common location for zone admins to place data that spans
all zone BEs.

- you said:
One concern with this design is that the zone has access to its
datasets that correspond to a global zone BE which is not active.

fixing bugs in zfs aside, i think that this issue could be addressed
by telling users not to manipulate filesystems in ROOT directly with
the zfs command. instead these filesystems will always be manipulated
indirectly via tools like beadm and pkg, which can detect and prevent
this situation (and other invalid or inconsistent configurations).
this would also address the second issue of providing an interface
for promoting non-active BEs for zones.

- wrt a zones /dev filesystem, in nevada <zonepath>/root/dev is
not a re-mount of <zonepath>/dev. instead the devnames
filesystem is mounted on <zonepath>/root/dev and <zonepath>/dev
is only used as an attribute backing store by the devnames
filesystem. this was done to allow upgrades from s10 where
devnames doesn't exist. given that currently there is no
upgrade path from s10 to opensolaris, <zonepath>/dev really
should have just been eliminated from the first opensolaris
release. all this would have taken was removing:
opt="attrdir=%R/dev"
from
/usr/lib/brand/ipkg/platform.xml

of course doing that now would break upgrades from 2008.05 to
some future version of opensolaris that changes this behavior.
so we'll probably need special code to handle this upgrade.

- another issue wrt /dev and multiple BEs is that devnames
saves attributes in the backing store in a passthrough
manner. this means that items in the underlying filesystems
are actually device nodes. so imagine the following bad
scenario:
1) zonecfg specified access to usb devices /dev/usb/*
2) zone is snapshotted and cloned
3) zonecfg is changed to remove access to usb devices /dev/usb/*
4) inside the zone, the old zone BE (and its /dev attribute
   backing store) are still accessible, so the raw
   usb device nodes are still accessible in the zone.
i can't really think of a good way to protect against this other
than to have the /dev attribute store live outside of any zoned
filesystems. so perhaps something like:
<zonepath>/DEV/ZBE1
where none of the above filesystems are zoned and where there is
a 1:1 mapping between snapshots/clones of the DEV/XXX
attribute backing stores and the zone boot environments.
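the point above about having tools like beadm and pkg detect and prevent invalid operations could look roughly like the check below. this is a hypothetical sketch, not existing beadm behavior: it assumes a zone BE may only be destroyed if it belongs to the running global BE and is not the active one. ZBE4 is a made-up extra zone BE for illustration.

```python
# Hypothetical destroy guard for a zone-aware BE tool (not real beadm logic).
ACTIVE = "org.opensolaris.libbe:active"
PARENT = "org.opensolaris.libbe:parentbe"


def destroy_refusal(datasets, name, current_gbe):
    """Return None if zone BE `name` may be destroyed, else the reason the
    tool should refuse."""
    props = datasets.get(name)
    if props is None:
        return f"{name}: no such zone BE"
    if props[PARENT] != current_gbe:
        # Still needed if the global zone boots back to props[PARENT].
        return f"{name}: belongs to inactive global BE {props[PARENT]}"
    if props[ACTIVE] == "on":
        return f"{name}: is the active zone BE"
    return None


zroot = "rpool/export/zones/z1/rpool"
z1 = {
    f"{zroot}/ZBE2": {PARENT: "GBE1", ACTIVE: "on"},
    f"{zroot}/ZBE3": {PARENT: "GBE2", ACTIVE: "on"},
    f"{zroot}/ZBE4": {PARENT: "GBE2", ACTIVE: "off"},
}
for zbe in ("ZBE2", "ZBE3", "ZBE4"):
    print(zbe, destroy_refusal(z1, f"{zroot}/{zbe}", "GBE2"))
```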

ed



gjelinek

Posts: 470
From: US

Registered: 3/9/05
Re: [zones-discuss] zones/SNAP design
Posted: Sep 3, 2008 12:56 PM   in response to: edp


Ed,

Thanks for taking the time to go through this. I have
a few responses to your comments in-line.

Edward Pilatowicz wrote:
> hey jerry,
> nice proposal.
> sorry i didn't reply to this earlier, i managed to miss it last
> week. :(
> some comments are below.
>
> - you proposed:
> The "org.opensolaris.libbe:parentbe" property is set to the
> UUID of the global zone BE that is active when the non-global
> zone BE is created.
> when you first mention this it might make sense to say that no
> such UUID currently exists (i know you say that later, but i
> immediately started thinking of the existing zone UUIDs).

We probably don't need to rev the proposal for this since the
caiman team is aware of the issue and adding this already.

> also, at some point we had talked about having some core-bits
> versioning that would restrict which packages a zone can update.
> wouldn't it make more sense to have this property set to that
> identifier? this way a global zone BE can boot any non-global
> zone BE which has matching core bits. this is less restrictive
> than a UUID. (also, if some stuff in the global zone is updated,
> but the core-bits don't change there is no reason to snapshot
> all the non-global zones.)

We don't have a design for how this would work yet, so I am not sure
if we can express it in a simple attribute value or not.

> - as mentioned in other emails, i agree that ROOT is a better name
> than rpool. also, i think that since we're snapshotting all zone
> filesystems, the zone really needs a default dataset that is not
> contained within ROOT. (ethan mentioned this in other mails.)
> i don't really have a strong opinion on the name (export, share,
> common, etc) or the location (../z1/share or .../z1/data/share)
> but i think this proposal is incomplete without specifying a
> default common location for zone admins to place data that spans
> all zone BEs.

Yes, we will be using ROOT. Once we settle on a name for a shared
dataset, it'll be easy to add that since it is orthogonal to the
rest of this stuff.

> - you said:
> One concern with this design is that the zone has access to its
> datasets that correspond to a global zone BE which is not active.
>
> fixing bugs in zfs aside, i think that this issue could be addressed
> by telling users not to manipulate filesystems in ROOT directly with
> the zfs command. instead these filesystems will always be manipulated
> indirectly via tools like beadm and pkg, which can detect and prevent
> this situation (and other invalid or inconsistent configurations).
> this would also address the second issue of providing an interface
> for promoting non-active BEs for zones.

Yes, we'll tell users, but does anybody listen? :-) If someone
thinks they can just 'zfs destroy' to get some space, they will.

> - wrt a zones /dev filesystem, in nevada <zonepath>/root/dev is
> not a re-mount of <zonepath>/dev. instead the devnames
> filesystem is mounted on <zonepath>/root/dev and <zonepath>/dev
> is only used as an attribute backing store by the devnames
> filesystem. this was done to allow upgrades from s10 where
> devnames doesn't exist. given that currently there is no
> upgrade path from s10 to opensolaris, <zonepath>/dev really
> should have just been eliminated from the first opensolaris
> release. all this would have taken was removing:
> opt="attrdir=%R/dev"
> from
> /usr/lib/brand/ipkg/platform.xml
>
> of course doing that now would break upgrades from 2008.05 to
> some future version of opensolaris that changes this behavior.
> so we'll probably need special code to handle this upgrade.

We are already disallowing upgrades so that is not a factor.
I do have the ipkg brand already using the dev backing store
inside the root and I have a prototype if we wanted to put
this into s10. For that case, the code would have to handle
the migration on the fly, so we may just not want to do that.

> - another issue wrt /dev and multiple BEs is that devnames
> saves attributes in the backing store in a passthrough
> manner. this means that items in the underlying filesystems
> are actually device nodes. so imagine the following bad
> scenario:
> 1) zonecfg specified access to usb devices /dev/usb/*
> 2) zone is snapshotted and cloned
> 3) zonecfg is changed to remove access to usb devices /dev/usb/*
> 4) inside the zone, the old zone BE (and its /dev attribute
>    backing store) are still accessible, so the raw
>    usb device nodes are still accessible in the zone.
> i can't really think of a good way to protect against this other
> than to have the /dev attribute store live outside of any zoned
> filesystems. so perhaps something like:
> <zonepath>/DEV/ZBE1
> where none of the above filesystems are zoned and where there is
> a 1:1 mapping between snapshots/clones of the DEV/XXX
> attribute backing stores and the zone boot environments.

Yes, this one is a bit of a problem. I'll have to think about it some.
We don't have a way to associate zonecfg changes with a specific
snapshot/clone either.

Thanks again,
Jerry





Copyright © 1995-2005 Sun Microsystems, Inc.