Replies: 24 - Last Post: Sep 29, 2005 2:20 PM by: meem
Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 23, 2005 10:27 AM
In response to several requests, and in light of the size of the document, I have extended the timer for feedback to Thursday, 9/29. As before, the document is available here:
http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf
The document has also been slightly updated in response to the feedback received so far. Note that there are no changes to the original design, but several ambiguous statements have been clarified, and the interaction between DHCP and IPMP has been expanded in Section 4.12 (thanks, dme!). Accordingly, the version number of the document has been bumped to 1.2.
Thanks again for your feedback -- and keep it coming!
--
meem
_______________________________________________
networking-discuss mailing list
networking-discuss at opensolaris dot org
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 26, 2005 12:13 PM in response to: meem
Peter Memishian writes:
> http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf
I reviewed 1.0, but I've checked all of my following comments against 1.2. They still seem mostly relevant.
(Comments by page and section number.)
p1, section 1.1:
- A bit of a nit, but it's really not about "sockets." The applications that benefit from IPMP include those that don't use sockets at all (such as RPC), and there are sockets-using applications that don't benefit from it (such as AppleTalk). Instead, the real term should be just "TCP/IP based."
- The lack of Sun Trunking across the board isn't really a failing of the technology, but rather of our implementation of it.
p3, first bullet:
- The "failsafe" mechanisms referred to here are probably route flap dampening and hold-downs.
- Some other things that could be added to this list:
. IPMP's probing mechanism is incompatible with standards-based VRRP. The former normally requires routers to respond to ICMP Echo messages, but the latter prohibits the back-up router from responding to any packets addressed directly to the virtual address, even simple ping requests. This means that a fail-over of VRRP triggers fail-over (and eventual failure) in IPMP.
. Some routers are configured not to reply to 'ping' at all, as it's sometimes viewed as a "security problem."
. Having large numbers of these probing hosts on a single network can cause high amounts of ICMP Echo traffic, thus triggering ICMP rate-limiting in many routers, and resulting in false failure detection.
. At least in previous releases (not sure about today), the code assumed that the fast-path M_DATA header was the same on all member links, even though there's no mechanism that actually causes this to be true.
. The use of source addresses is confusing to users, and often results in RFEs filed asking for "same interface" semantics.
. The detailed behavior of general multicast (not the well-known link-local multicast addresses) is less clear. In particular, the behavior necessary to accommodate IGMP-snooping switches is probably missing.
p4, requirement 4:
- Should mention here that packet filtering should be done on a per-group basis to maintain state tracking.
p5, section 3.1:
- What is the MTU on an IPMP interface? Is it the minimum of all member links?
p8, section 3.7:
- I'm not sure the described behavior here really represents FAILBACK=no. (But, then, I'm not sure what behavior would really represent that.) It would help to have some examples worked out here.
For example, imagine a group of three active interfaces with FAILBACK=no. Using "A" for active, "I" for inactive, and "F" for failed, I can come up with differing end results with only minor changes in timing. For example:
(AAA) -> (FAA) -> (IAA) -> (AFA) -> (AIA)
occurs when interface 1 fails and is repaired, followed by a failure and repair of interface 2. But if we have interface 2 fail right before the interface 1 repair, we get:
(AAA) -> (FAA) -> (FFA) -> (IFA) -> (IIA)
Maybe that's actually right, and is just a consequence of an unusual usage model (FAILBACK=no without a designated STANDBY interface), but the result seems odd to me. And I'm really concerned about what happens if *all* the interfaces fail and then recover.
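The two timelines above can be reproduced with a small state-machine sketch. The rules it encodes are assumptions inferred only from the two sequences in this comment, not taken from the design document: a failing active interface hands its role to the first inactive interface if one exists, and a repaired interface comes back inactive whenever some other interface is still active. The function names are hypothetical.

```python
# Toy model of FAILBACK=no, inferred from the two example timelines.
# A = active, I = inactive, F = failed. The rules are assumptions.

def fail(state, i):
    """Mark interface i failed; promote the first inactive one if i was active."""
    was_active = state[i] == "A"
    state[i] = "F"
    if was_active and "I" in state:
        state[state.index("I")] = "A"

def repair(state, i):
    """A repaired interface stays inactive if any other interface is active."""
    state[i] = "I" if "A" in state else "A"

def run(events, n=3):
    """Apply (operation, index) events to n active interfaces; return the history."""
    state = ["A"] * n
    history = ["".join(state)]
    for op, i in events:
        op(state, i)
        history.append("".join(state))
    return history

# Interface 1 fails and is repaired, then interface 2 fails and is repaired.
first = run([(fail, 0), (repair, 0), (fail, 1), (repair, 1)])
# Interface 2 fails right before the interface 1 repair.
second = run([(fail, 0), (fail, 1), (repair, 0), (repair, 1)])
print(" -> ".join(first))
print(" -> ".join(second))
```

Under these assumed rules the end state depends only on event ordering, which is the oddity being pointed out; the same model can be fed an "all fail, then all recover" event list to see what the rules would imply there.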
p9, section 3.9:
- It would help a bit, I think, to segregate out the flags that are on logical interfaces (address flags) from those that are on the underlying physical interface. (The break is after the 5th entry in the first table, and after the second in the second table.)
- DL_NOTE_LINK_DOWN causes the kernel to clear out IFF_RUNNING. Does it now result in the kernel setting IFF_FAILED as well? Or does in.mpathd (when and only when it monitors a given interface) set IFF_FAILED in response to IFF_RUNNING?
I can perhaps understand having the IFF_RUNNING flag set by the kernel to be the logical AND of having the hardware report the interface up, and software reporting the interface not failed, but I'm not sure I see why the two (IFF_FAILED and IFF_RUNNING) must be just mirror images of each other.
In fact, I'm not sure why IFF_RUNNING on the member links would be cleared out by IFF_FAILED. It's not as though ordinary applications would ever see those interfaces, so they cannot be confused by the meaning of the extra bits; they need to set special Solaris-specific flags to see them at all. So why the interlock?
p10, section 3.10:
- Could note that in.mpathd already has DLPI open, so this isn't a significant change. (But based on the above, I don't see why special link up/down handling is actually needed.)
p10, section 3.11:
- It would be possible to move IP addresses from one L2 address (or member link) to another from within in.mpathd, perhaps using the existing ARP ioctls. Does this really need to be sent to the kernel? (No problem if you _want_ it there, just that it doesn't seem to _need_ to be there.)
p13, section 3.16.4:
- I had a lot of trouble reading this section.
First, I don't see why "usesrc" is specifically senseless with IPMP. OSPF-MP assumes that the "real" interfaces themselves do have addresses, but that those addresses just aren't supposed to be used by normal applications.
I suspect part of the confusion here might be between the source address selected, and the way IPMP does inbound load spreading. However, the load spreading is done a different way when OSPF-MP is in use, and isn't incompatible with IPMP. Peers will see those IPMP addresses as independent, equal-cost next-hop addresses, and will establish routes to each one. The return packets will be load-spread according to those ECMP routes rather than by the destination address (which almost certainly isn't on-link anyway).
Secondly, I don't quite see why the new model helps or hurts here, at least in the ways described. It certainly does help by getting rid of the distracting test addresses, but I see no particular relationship with "usesrc."
Thirdly, the claim about network utilization doesn't seem to be substantiated. Why exactly would IPMP improve utilization? Certainly, all interfaces in an OSPF-MP group are expected to be able to send packets. (Is that the underlying problem that this text is referencing -- the lack of ECMP support on Solaris? If so, then I don't think that recommending IPMP to solve that problem is the right path.)
Finally, I'm not sure I understand why we would want to make our implementation of "usesrc" less uniform than it is now. We did try in the original design to make sure that it didn't have odd interactions with other technologies, and could be reused as needed, and it seems odd that we're roping it off in this case.
p14, section 4.1.1:
- Who does the "next available" selection when a ~NOFAILOVER address is transferred to an ipmp logical interface? Is this done in the kernel itself, in in.mpathd, or in ifconfig? (If it's the latter, what happens to existing applications that plumb IP interfaces?)
p15, section 4.1.4:
- There's a new semantic implied here. It's no longer possible to set the group name first and then set the NOFAILOVER flag. That is, attempts to do this will produce unexpected results:
# ifconfig ce0 10.0.0.1 netmask + broadcast + up
# ifconfig ce0 group foo
# ifconfig ce0 -failover
After that second command, the interface that's being configured will (presumably) migrate to the ipmp interface, leaving that last command to uselessly set the flag on the unused ce0 member link.
- I would recommend keeping the "is ipmp" notion separate from the group name establishment just for the purpose of clarity. In other words, I'd prefer something like this instead:
# ifconfig outside0 ipmp group b
Or even this:
# ifconfig outside0 plumb ipmp group b
- It's specifically against the rules to have an option that takes an optional parameter, as the proposed "[ipmp [groupname]]" syntax would allow. (The problem is that it makes subsequent keywords ambiguous, and ifconfig is tortured enough as it is. ;-})
- Is it possible to give an ipmp interface the name of a real interface on the system? What happens if I do that? (I assume that the "ipmp" keyword causes an error because the interface already exists and can't switch types.)
What happens if I give it the name of an interface that _later_ is established as a real one, as with DR? Does the new interface get rejected by the system?
p16, section 4.1.5:
- The lack of symmetry between the "ifconfig ipmp1 ipmp b" and "ifconfig ipmp1 unplumb" operations doesn't look too pretty.
p16, section 4.1.6:
- Who does this address migration?
- Administrative issue to be documented: accidentally setting a group name (or the wrong group name) on an up interface is now highly toxic. It means that the address slips out from the administrator's control, and doesn't come back unless he does a series of unintuitive commands. (I.e., just clearing out the group name won't fix the problem.)
- An IPMP group with no member links has IFF_RUNNING cleared, right? Is it also IFF_FAILED?
p17, section 4.2:
- What's the privilege model for the new command?
- I like the flag-verbs, but I'm nearly certain that xDesign won't.
- I suggest that you create a machine parseable output format _now_, since Explorer-consumers are growing rampant and are nailing all the other utilities that don't have parseable forms.
p18, section 4.2.1:
- What shows in "-g" output under the 'fdt' column if the group doesn't have probe targets or doesn't have test addresses?
- "Degraded" might need a tighter definition. In particular, if I have a group with one stand-by interface, and one of the main interfaces has failed over to the stand-by, is that group now in "degraded" mode? It's not degraded from the bandwidth or availability point of view, though perhaps it is from a hardware maintenance point of view.
p18, section 4.2.2:
- Should there be a "-n" option to suppress address-to-name translation? (And should "names" be the default the way they are most everywhere else?)
- Suggest using "--" rather than "n/a" for consistency with other existing tools.
- Can "-a" or some other tool show which interface is the current multicast/broadcast "lead" interface for duplicate suppression? This part still isn't observable.
- How does the utility get this information? Via ARP ioctls or some other mechanism?
p19, section 4.2.3:
- What permissions does "-i" need? Won't it need to be root to look at the DLPI driver for the "probe" column? (Or is some other magic afoot?)
- What does the "active" column represent? It's not really explained. (It doesn't just appear to be the inverse of IFF_INACTIVE.)
- Why is it impossible to report link up/down status when the link is offline? Shouldn't the system still monitor link up/down status, even while the link is administratively offlined? Or is there some interference here with DR?
p20, section 4.2.5:
- Is it really possible to get probes that march backwards in time, as shown by the 1438 to 1439 transition? Or is that just a cut-and-paste issue?
- Should the column header be "seq" instead of "probe?"
- Why does ipmpstat need to query the IPMP subsystem periodically? Can't it just block awaiting notification from IPMP?
- It might be helpful to have the time displayed in some way that's aligned with snoop. (Though exactly how, I'm not sure.)
p21, top of page:
- When exactly is a packet declared "lost?" Is it when we go to send another and the previous hasn't arrived yet? Or is it related to the "FDT?"
p22, section 4.3.2:
- There seems to be some surprising (and unintended) new functionality here. If I manage things by group name, then I don't need to remember which ipmp group is which in order to add a new address. I can just do something like this:
# ifconfig foobar0 plumb group a 10.0.0.1 up
and since "foobar0" will never exist, this will add the address to the named group.
p23, 'route' changes:
- If this functionality is implemented in the 'route' command itself, rather than in the kernel, what does that mean for existing utilities? It seems like the "add static route" feature in Zebra and the like will be harmed by this.
(For what it's worth, I think those utilities are probably blown out of the water by removing "ce0" from the SIOCGIFCONF data, and will need manual intervention to convert their configurations over. I hope that there's not much mixed IPMP/Zebra usage.)
p23, section 4.6.1:
- "duplicate address detection will be used to ensure that no other on-link hosts are currently using it." On *what* link? I assume this means that one will be chosen arbitrarily and just used.
- Why is the link-local address unreachable if there are no interfaces in the group? Aren't local addresses always reachable?
- Why would an IPv6 ipmp interface have the BROADCAST flag set?
p24, section 4.6.2:
- NumAddrs: ew. This should really be based on the number of member links in the IPMP group. You'll want to have at least one data address per member link in order to get the inbound load-spreading right. It'd be better still if in.ndpd just did the right thing.
p25, top of page:
- Need to know exactly how statistics (particularly errors) are handled. At a guess, I think you'll need to keep a record of what the error counter for a member link was at the time it joined, so that you can add the delta during membership to the total. Otherwise, removing a link from a group could cause the counters to roll backwards, and that's a big No-No.
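The baseline idea in this comment can be sketched as follows. This is only an illustration of the bookkeeping being guessed at, not how the project implements it; the class and method names are hypothetical.

```python
# Sketch: a group-level error counter that never decreases when links leave.
# Class and method names are hypothetical, not from the design document.

class GroupCounter:
    def __init__(self):
        self.retired = 0      # deltas banked from links that have left
        self.baseline = {}    # link name -> counter value at join time

    def join(self, link, current):
        """Remember the link's counter at the moment it joins the group."""
        self.baseline[link] = current

    def leave(self, link, current):
        """Bank the delta accumulated during membership, then forget the link."""
        self.retired += current - self.baseline.pop(link)

    def total(self, current_values):
        """Group total, given {link: current counter} for present members."""
        live = sum(current_values[name] - base
                   for name, base in self.baseline.items())
        return self.retired + live

g = GroupCounter()
g.join("ce0", 100)                          # ce0 had 100 errors before joining
g.join("ce1", 0)
before = g.total({"ce0": 105, "ce1": 2})    # 5 + 2 errors during membership
g.leave("ce0", 105)
after = g.total({"ce1": 2})                 # ce0's 5 errors stay in the total
print(before, after)
```

Because the departing link's delta is banked rather than dropped, the group total reads the same immediately before and after the removal.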
p25, section 4.8:
- What needs to change in ARP so that multiple L2 addresses are accepted as local by the system? It seems to me that, since ARP is plumbed over each real interface, ARP will need to be in on the game that IPMP is playing. There need to be signals between IPMP and ARP to accomplish this.
p26, section 4.10:
- "Sent to and received from" in the context of /dev/ipnet/ipmp* actually means sent or received on any member link, and not just filtered based on address. Right?
p28, section 5.1:
- One of the applications affected is SNMP.
p29, section 5.2:
- 1: If an address is up and marked IFF_NOFAILOVER, can I cause the address to migrate by clearing IFF_NOFAILOVER?
- 2: Is IFF_INACTIVE modified both by the kernel and by applications?
- 4: Why isn't IFF_COS_ENABLED set? This will likely break IPQoS. Shouldn't it just be the logical AND of all the IFF_COS_ENABLED bits on the underlying interfaces?
- 6: When adding a new interface to a group, what happens? Is the interface's IFF_ROUTER flag changed to match the flag used for the existing group?
- 7: What does IFF_XRESOLV mean on an ipmp interface?
p30, table 2:
- Might be nice to indicate which ones are physical and which are logical.
- Why would IFF_MIPRUNNING appear on any underlying interface?
- Why can't IFF_MULTI_BCAST be visible on an ipmp interface?
p31, top of page:
- Is the automatic IFF_DEPRECATED logic done in the kernel or user space?
- As before, I'm not really sure why clearing IFF_RUNNING on the member links in response to IFF_FAILED makes sense. I'd much rather see these two flags retain their original meaning, and have the ipmp aggregate interface just show ~IFF_RUNNING when all member interfaces have *either* IFF_FAILED set *or* IFF_RUNNING cleared. That would better illustrate the layering, as the "failed" concept is really an artifact of how IPMP works internally.
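To make the proposed layering concrete, here is a minimal sketch of the aggregation rule suggested in the last comment: the member flags keep their own meanings, and the group is shown as not running only when every member has either FAILED set or RUNNING cleared. The flag bit values are placeholders for illustration and are not the real header values; the function names are hypothetical.

```python
from enum import IntFlag, auto

class Flag(IntFlag):
    # Placeholder bit values for illustration; not the real <net/if.h> values.
    RUNNING = auto()
    FAILED = auto()

def member_usable(flags):
    """A member link is usable if it is RUNNING and not marked FAILED."""
    return bool(flags & Flag.RUNNING) and not (flags & Flag.FAILED)

def group_running(member_flags):
    """The group shows RUNNING only while at least one member link is usable."""
    return any(member_usable(f) for f in member_flags)

members = [Flag.RUNNING | Flag.FAILED,   # link up, but probes failed
           Flag(0),                      # link down, not marked failed
           Flag.RUNNING]                 # healthy
print(group_running(members))
print(group_running(members[:2]))
print(group_running([]))
```

With this rule a group with no member links also comes out as not running, without the member flags needing to mirror each other.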
p31, section 5.4.1:
- There's no such thing as a "routing socket associated with the IPMP interface." There are just global routing sockets; they're associated with IP itself, not any particular interface.
p31, section 5.4.2:
- This section is a little confusing, because it talks about the possibility of seeing the IPMP member links well before it says how this might happen.
p32, top of page:
- I had trouble reading this. I assume it means just that RTM_NEWADDR will occur during address transfer, and not that IFF_NOFAILOVER is the only possible way this message could be sent.
- Is any message sent to the user when the SO_RTSIPMP flag is set or cleared? If not, then how does the user know that he's got a consistent view of the world? (I suspect the answer is that the user should set the flag before doing SIOCGLIFCONF with LIFC_IPMP, and then just _never_ clear it.)
p32, section 5.5:
- If we did allow routes to interfaces, then the behavior in section 4.5 would become a bit weird.
p32, section 5.6:
- What happens if an ifindex of a member link is used? (I assume it results in an error.)
p33, section 5.7:
- It would be nice if SIOCLIFADDIF worked to add the zeroth address on the interface when the address configured there is 0.0.0.0. This would remove one bit of asymmetry from the current design.
p33, section 5.9:
- What do SIOC[GS]LIFLNKINFO and SIOC[GS]LIFMUXID mean on ipmp interfaces? The latter seems senseless, as no real plumbing can occur there.
- What do SIOC[GS]LIFMETRIC mean on member link interfaces? This doesn't make sense, as routing (the consumer of interface metrics) won't use them. Does something in IPMP itself use them?
p34, section 5.10:
- Probably need more detailed behavior for SIOCDARP.
p34, section 5.12:
- What happens if I set the zone ID to a non-global zone first, and then set the IFF_NOFAILOVER flag?
p35, section 5.14:
- It might be worth mentioning that IGMP operation under IPMP is really unclear. (Should membership messages be echoed out all interfaces so that switches know that all interfaces are participating? Or just over the "lead" interface? If it's the latter, shouldn't the IGMP messages be repeated if a new "lead" is chosen?)
p35, section 5.15:
- "Reverse-engineered" might be a little strong. The same interface exists on other Mentat-derived systems, and (like fast-path) is probably documented there.
p36, section 5.16:
- If the vni driver goes away, does the vni(7D) man page go too?
p36:
- Any changes for PSARC 2002/137 (IPMP Asynchronous Event Definitions) due to this project?
--
James Carlson, KISS Network <james dot d dot carlson at sun dot com>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 27, 2005 10:07 PM in response to: carlsonj
Jim,
Thanks as always for your thorough comments -- the document is quite a bit improved thanks to them. My replies are inline. In most cases, I have gone ahead and updated the document, which is now available as version 1.3. I suspect subsequent reads will find some typos in my revisions, thus don't be startled if it's 1.3.1 or 1.3.2 by the time you get a chance to look at it again ;-)
When replying, please remove anything non-controversial so that we can converge on the remaining issues.
> p1, section 1.1:
>
> - A bit of a nit, but it's really not about "sockets." The applications that benefit from IPMP include those that don't use sockets at all (such as RPC), and there are sockets-using applications that don't benefit from it (such as AppleTalk).
The intent is to make it clear that the focus is on sockets (hence all of the changes to the socket-level APIs). We're not putting effort into ensuring that XTI or TLI applications will benefit, though I believe they will since AFAIK all of the TLI/XTI interface and local address discovery operations are implemented in terms of sockets.
> Instead, the real term should be just "TCP/IP based."
So UDP and SCTP don't benefit? :-P I can say "AF_INET[6]-based" if you prefer. But claiming that IPMP will work with all IP-based applications, including those using TLI/XTI, seems a bit too bold to me (and I'm not convinced it's time well spent to guarantee it).
> - The lack of Sun Trunking across the board isn't really a failing of the technology, but rather of our implementation of it.
Sure; clarified in the document (also true with 802.3ad).
> p3, first bullet:
>
> - The "failsafe" mechanisms referred to here are probably route flap dampening and hold-downs.
Yes; clarified.
> - Some other things that could be added to this list:
>
> . IPMP's probing mechanism is incompatible with standards-based VRRP. The former normally requires routers to respond to ICMP Echo messages, but the latter prohibits the back-up router from responding to any packets addressed directly to the virtual address, even simple ping requests. This means that a fail-over of VRRP triggers fail-over (and eventual failure) in IPMP.
>
> . Some routers are configured not to reply to 'ping' at all, as it's sometimes viewed as a "security problem."
>
> . Having large numbers of these probing hosts on a single network can cause high amounts of ICMP Echo traffic, thus triggering ICMP rate-limiting in many routers, and resulting in false failure detection.
Sadly, these three problems are inherent in the probe-based failure detection mechanism. I'm not sure what we can do about them from a technical standpoint.
> . The use of source addresses is confusing to users, and often results in RFEs filed asking for "same interface" semantics.
Agreed, and incorporated.
> . At least in previous releases (not sure about today), the code assumed that the fast-path M_DATA header was the same on all member links, even though there's no mechanism that actually causes this to be true.
This seems like an implementation flaw that has no impact on the design.
> . The detailed behavior of general multicast (not the well-known link-local multicast addresses) is less clear. In particular, the behavior necessary to accommodate IGMP-snooping switches is probably missing.
I'm not convinced this is a problem that shapes the *high-level* design.
> p4, requirement 4:
>
> - Should mention here that packet filtering should be done on a per-group basis to maintain state tracking.
Sure; mentioned.
> p5, section 3.1:
>
> - What is the MTU on an IPMP interface? Is it the minimum of all member links?
Yes. I've added a new section, 3.10, which covers how MTU used to be handled, and how it will be handled in the future. In retrospect, the omission of this topic is glaring; apologies.
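As a minimal illustration of the "minimum of all member links" rule confirmed here (the function name, the sample MTU values, and the choice to raise an error for an empty group are all assumptions, not taken from section 3.10):

```python
# Sketch: the group MTU as the smallest MTU among member links.
# Names and the empty-group behavior are hypothetical.

def group_mtu(member_mtus):
    """Return the MTU for the group: the smallest MTU among member links."""
    if not member_mtus:
        raise ValueError("group has no member links")
    return min(member_mtus.values())

mtus = {"ce0": 1500, "ce1": 9000, "ce2": 1500}
print(group_mtu(mtus))
```

One consequence worth noting is that adding a link with a smaller MTU to an existing group lowers the value for the whole group.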
> p8, section 3.7:
>
> - I'm not sure the described behavior here really represents FAILBACK=no. (But, then, I'm not sure what behavior would really represent that.) It would help to have some examples worked out here.
>
> For example, imagine a group of three active interfaces with FAILBACK=no. Using "A" for active, "I" for inactive, and "F" for failed, I can come up with differing end results with only minor changes in timing. For example:
>
> (AAA) -> (FAA) -> (IAA) -> (AFA) -> (AIA)
>
> occurs when interface 1 fails and is repaired, followed by a failure and repair of interface 2. But if we have interface 2 fail right before the interface 1 repair, we get:
>
> (AAA) -> (FAA) -> (FFA) -> (IFA) -> (IIA)
>
> Maybe that's actually right, and is just a consequence of an unusual usage model (FAILBACK=no without a designated STANDBY interface), but the result seems odd to me. And I'm really concerned about what happens if *all* the interfaces fail and then recover.
Yes, it's a bit of an odd bird. Personally, I'd love to get rid of this feature, but I know there are customers who hate unnecessary rebinding of addresses to interfaces (because of the effect it has on other hosts) and thus want to have that happen as little as possible.
I've added a little more rationale behind the feature, but I don't want to devote too much space to this wart (I'd like to kill it, but I can't).
> p9, section 3.9:
>
> - It would help a bit, I think, to segregate out the flags that are on logical interfaces (address flags) from those that are on the underlying physical interface. (The break is after the 5th entry in the first table, and after the second in the second table.)
Segregate how -- with an extra line in the table? And for what purpose?
> - DL_NOTE_LINK_DOWN causes the kernel to clear out IFF_RUNNING. Does it now result in the kernel setting IFF_FAILED as well? Or does in.mpathd (when and only when it monitors a given interface) set IFF_FAILED in response to IFF_RUNNING?
I'm uncomfortable with the idea of a flag that is sometimes set by the kernel, and sometimes by an application. Since there are situations where IFF_FAILED must be set by in.mpathd, I think I'd rather have in.mpathd always set it. This also keeps all the policy of setting IFF_FAILED in one place (even if the policy is rigid, I prefer it centralized).
> I can perhaps understand having the IFF_RUNNING flag set by the kernel to be the logical AND of having the hardware report the interface up, and software reporting the interface not failed, but I'm not sure I see why the two (IFF_FAILED and IFF_RUNNING) must be just mirror images of each other.
The mirror-image must be maintained to ensure that naive applications behave correctly: if there was a situation where IFF_FAILED was set and IFF_RUNNING was set, then an application would try to use an unusable interface. (The other case, where IFF_FAILED was clear and IFF_RUNNING was clear makes no semantic sense: how can the interface not be IFF_RUNNING, but not be IFF_FAILED?)
> In fact, I'm not sure why IFF_RUNNING on the member links would be cleared out by IFF_FAILED. It's not as though ordinary applications would ever see those interfaces, so they cannot be confused by the meaning of the extra bits; they need to set special Solaris-specific flags to see them at all. So why the interlock?
To make it clear to someone using ifconfig or other administrative tools.
> p10, section 3.10:
>
> - Could note that in.mpathd already has DLPI open, so this isn't a significant change. (But based on the above, I don't see why special link up/down handling is actually needed.)
It does?
> p10, section 3.11:
>
> - It would be possible to move IP addresses from one L2 address (or member link) to another from within in.mpathd, perhaps using the existing ARP ioctls. Does this really need to be sent to the kernel? (No problem if you _want_ it there, just that it doesn't seem to _need_ to be there.)
I feel it's more natural in the kernel.
> p13, section 3.16.4:
>
> - I had a lot of trouble reading this section.
We talked offline about this. I have updated the text to make it clear that I was referring to usesrc being too sharp a knife with the current IPMP administrative model, and to clarify that the network utilization comments assume the lack of ECMP.
> Finally, I'm not sure I understand why we would want to make our implementation of "usesrc" less uniform than it is now. We did try in the original design to make sure that it didn't have odd interactions with other technologies, and could be reused as needed, and it seems odd that we're roping it off in this case.
It's not less uniform -- both now and in the future, IPMP is not supported. I'm roping it off because I can't see a compelling reason to support the configuration, especially vs. OSPF-MP with ECMP.
> p14, section 4.1.1:
>
> - Who does the "next available" selection when a ~NOFAILOVER address is transferred to an ipmp logical interface? Is this done in the kernel itself, in in.mpathd, or in ifconfig? (If it's the latter, what happens to existing applications that plumb IP interfaces?)
The kernel will do it as part of bringing the interface IFF_UP. This was intended to be implied by list item 1 on page 30.
> p15, section 4.1.4:
>
> - There's a new semantic implied here. It's no longer possible to set the group name first and then set the NOFAILOVER flag. That is, attempts to do this will produce unexpected results:
>
> # ifconfig ce0 10.0.0.1 netmask + broadcast + up
> # ifconfig ce0 group foo
> # ifconfig ce0 -failover
Actually, that has always potentially led to unexpected results (e.g., if ce0 was failed). However, I agree that some systems may have been misconfigured this way and thus we should have a release note explaining the situation. As such, I have updated the document to discuss the issue.
> - I would recommend keeping the "is ipmp" notion separate from the group name establishment just for the purpose of clarity. In other words, I'd prefer something like this instead:
>
> # ifconfig outside0 ipmp group b
>
> Or even this:
>
> # ifconfig outside0 plumb ipmp group b
>
> - It's specifically against the rules to have an option that takes an optional parameter, as the proposed "[ipmp [groupname]]" syntax would allow. (The problem is that it makes subsequent keywords ambiguous, and ifconfig is tortured enough as it is. ;-})
To cover all of the above: based on our offline discussion, I've changed the syntax to be "ifconfig outside0 ipmp group b" (or, as a shorthand, "ifconfig outside0 ipmp"). I have also relaxed the constraint on changing the group name: it is now permitted as long as there are no underlying interfaces in the group.
The document has been updated.
> - Is it possible to give an ipmp interface the name of a real interface on the system? What happens if I do that? (I assume that the "ipmp" keyword causes an error because the interface already exists and can't switch types.)
Right, this is covered in section 4.1.4.
> What happens if I give it the name of an interface that _later_ is established as a real one, as with DR? Does the new interface get rejected by the system?
The interface isn't rejected, but it won't be plumbed by IP (and an error will be logged). C'est la vie.
> p16, section 4.1.5:
>
> - The lack of symmetry between the "ifconfig ipmp1 ipmp b" and "ifconfig ipmp1 unplumb" operations doesn't look too pretty.
The asymmetry is annoying, but the alternatives are:
* Use "plumb ipmp" for creation: this is problematic because leaving out the word "ipmp" would do something *totally* different, which I found unacceptable.
* Invent a synonym for "unplumb" which must be used with IPMP interfaces. That seemed gratuitous and a bit user-hostile (who wants to remember a second command?)
Anyway, I suspect the xDesign guys will have an opinion on this, so let's see what they have to say.
> p16, section 4.1.6:
>
> - Who does this address migration?
It hasn't been decided -- either the kernel, or ifconfig.
> - Administrative issue to be documented: accidentally setting a group name (or the wrong group name) on an up interface is now highly toxic. It means that the address slips out from the administrator's control, and doesn't come back unless he does a series of unintuitive commands. (I.e., just clearing out the group name won't fix the problem.)
Yes, that's a risk. Footnote added.
> - An IPMP group with no member links has IFF_RUNNING cleared, right? Is it also IFF_FAILED?
Yes -- it's not usable. I've updated section 5.3 to cover this.
> p17, section 4.2: > > - What's the privilege model for the new command?
As per the prompt, any user can run it. It may internally require some privileges to work, but the specifics aren't known yet. I've updated the document to mention this.
> - I like the flag-verbs, but I'm nearly certain that xDesign won't.
:-) It seems wrong to end up with a command that consists of nothing but show-* subcommands, so maybe I have an argument. We'll see.
> - I suggest that you create a machine parseable output format _now_, > since Explorer-consumers are growing rampant and are nailing all > the other utilities that don't have parseable forms.
Agreed; I have now defined one -- see section 4.2.6. Better ideas are welcome, but I'd prefer not to have to rope off too many meta-characters (right now, just "=" and "\n" are roped off).
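For consumers such as Explorer, a parser for this kind of output could be quite small. The sketch below is only an illustration: the thread does not show the format from section 4.2.6, so it assumes one record per line made of space-separated name=value fields, and the field names in the sample are hypothetical. The only facts taken from the discussion are that "=" and "\n" are reserved.

```python
def parse_record(line):
    # Parse one line of (assumed) space-separated name=value fields into a dict.
    fields = {}
    for token in line.split():
        # "=" is reserved, so the first "=" always separates name from value.
        name, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {token!r}")
        fields[name] = value
    return fields


def parse_output(text):
    # The newline is reserved as the record separator; blank lines are skipped.
    return [parse_record(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample; the field names are illustrative only.
sample = "group=a fdt=10.00s state=ok\ngroup=b fdt=n/a state=degraded\n"
records = parse_output(sample)
for record in records:
    print(record)
```

Reserving only two characters keeps the consumer side to a split and a partition, at the cost of not allowing whitespace inside values under the assumption made here.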
> p18, section 4.2.1: > > - What shows in "-g" output under the 'fdt' column if the group > doesn't have probe targets or doesn't have test addresses?
"n/a" -- updated.
> - "Degraded" might need a tighter definition. In particular, if I > have a group with one stand-by interface, and one of the main > interfaces has failed over to the stand-by, is that group now in > "degraded" mode? It's not degraded from the bandwidth or > availability point of view, though perhaps it is from a hardware > maintenance point of view.
Yes, that group would now be "degraded". I agree a tighter definition is needed, and I will talk to the FMA guys about this. (Thanks for reminding me about this!)
> p18, section 4.2.2: > > - Should there be a "-n" option to suppress address-to-name > translation? (And should "names" be the default the way they are > most everywhere else?)
What name translation?
> - Suggest using "--" rather than "n/a" for consistency with other > existing tools.
Can't say I have a strong preference here. Let's wait to see what the xDesign guys have to say.
> - Can "-a" or some other tool show which interface is the current > multicast/broadcast "lead" interface for duplicate suppression? > This part still isn't observable.
Good point. I've added it as a "flags" field member to ipmpstat -i.
> - How does the utility get this information? Via ARP ioctls or some > other mechanism?
The design of ipmpstat will be covered in a separate document, but: it will be a mix of ioctl's and calls through libipmp into in.mpathd.
> p19, section 4.2.3: > > - What permissions does "-i" need? Won't it need to be root to look > at the DLPI driver for the "probe" column? (Or is some other > magic afoot?)
As implied by the prompt, none will be required by the user. However, it may need some subset of privileges to actually work. TBD.
> - What does the "active" column represent? It's not really > explained. (It doesn't just appear to be the inverse of > IFF_INACTIVE.)
Check the glossary (the original definition is back in section 3.8).
> - Why is it impossible to report link up/down status when the link > is offline? Shouldn't the system still monitor link up/down > status, even while the link is administratively offlined? Or is > there some interference here with DR?
Once an interface is offlined, it cannot be attached to with DLPI. So, there is no way to access the link up/down status. Footnote added.
> p20, section 4.2.5: > > - Is it really possible to get probes that march backwards in time, > as shown by the 1438 to 1439 transition? Or is that just a > cut-and-paste issue?
It depends on some aspects of the output format that I haven't decided on. If things remain ordered by sequence number (which I prefer), then it's entirely possible that responses on some interfaces arrived before those on others. However, that raises the question of whether to delay all output "waiting" for lost probes -- so I may end up changing things to be sorted by time.
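The ordering trade-off can be seen with a few made-up probe records. This sketch assumes a single sequence space shared by the interfaces in the group; the interface names, sequence numbers, and times are hypothetical.

```python
from collections import namedtuple

# seq: probe sequence number; recv_time: seconds at which the response arrived.
Probe = namedtuple("Probe", ["seq", "interface", "recv_time"])

probes = [
    Probe(1438, "ce0", 0.52),
    Probe(1439, "ce1", 0.31),  # sent later, but answered sooner
    Probe(1440, "ce0", 0.74),
]

by_seq = sorted(probes, key=lambda p: p.seq)
by_time = sorted(probes, key=lambda p: p.recv_time)


def goes_backwards(rows):
    # True if any row has an earlier timestamp than the row printed before it.
    return any(later.recv_time < earlier.recv_time
               for earlier, later in zip(rows, rows[1:]))


print(goes_backwards(by_seq))   # sequence order can show time stepping back
print(goes_backwards(by_time))  # time order never does
```

Sorting by time avoids the backwards step and never has to wait for a lost probe, but the sequence numbers then appear out of order.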
> - Should the column header be "seq" instead of "probe?"
That seems too geeky.
> - Why does ipmpstat need to query the IPMP subsystem periodically? > Can't it just block awaiting notification from IPMP?
It could. I've changed the text to be more vague, as the details of that are really a topic for another document.
> - It might be helpful to have the time displayed in some way that's > aligned with snoop. (Though exactly how, I'm not sure.)
Yeah, dunno how to do that.
> p21, top of page: > > - When exactly is a packet declared "lost?" Is it when we go to > send another and the previous hasn't arrived yet? Or is it > related to the "FDT?"
Yes, that's when (and the rate of sending packets is related to the FDT). I've updated the document (we also need to update the public IPMP documentation to cover this -- sigh).
> p22, section 4.3.2: > > - There seems to be some surprising (and unintended) new > functionality here. If I manage things by group name, then I > don't need to remember which ipmp group is which in order to add a > new address. I can just do something like this: > > # ifconfig foobar0 plumb group a 10.0.0.1 up > > and since "foobar0" will never exist, this will add the address to > the named group.
I don't really see how this is new -- I can craft up arbitrary hostname.<if> files today and achieve similar results. Note that if group "a" doesn't exist at all by the time the system gets to handling missing interfaces, the above will be ignored.
> p23, 'route' changes: > > - If this functionality is implemented in the 'route' command > itself, rather than in the kernel, what does that mean for > existing utilities? It seems like the "add static route" feature > in Zebra and the like will be harmed by this.
I'd prefer to isolate this to route. Why would zebra be adding routes to the underlying interfaces?
> (For what it's worth, I think those utilities are probably blown > out of the water by removing "ce0" from the SIOCGIFCONF data, and > will need manual intervention to convert their configurations > over. I hope that there's not much mixed IPMP/Zebra usage.)
Why would they want to know about ce0? Please elaborate.
> p23, section 4.6.1: > > - "duplicate address detection will be used to ensure that no other > on-link hosts are currently using it." On *what* link? I assume > this means that one will be chosen arbitrarily and just used.
Yes; clarified.
> - Why is the link-local address unreachable if there are no > interfaces in the group? Aren't local addresses always reachable?
The intent was to state that it's not reachable via another host. Of course it can be locally used. Updated.
> - Why would an IPv6 ipmp interface have the BROADCAST flag set?
Clearly a mistake; fixed.
> p24, section 4.6.2: > > - NumAddrs: ew. This should really be based on the number of member > links in the IPMP group. You'll want to have at least one data > address per member link in order to get the inbound load-spreading > right. It'd be better still if in.ndpd just did the right thing.
I'm fine with having in.ndpd try to initially configure as many global addresses as there are interfaces, but I'm not sure what to do if an interface is removed -- is it really okay to blow away a global address at that point? It *feels* wrong to do that.
Document updated.
> p25, top of page: > > - Need to know exactly how statistics (particularly errors) are > handled. At a guess, I think you'll need to keep a record of what > the error counter for a member link was at the time it joined, so > that you can add the delta during membership to the total. > Otherwise, removing a link from a group could cause the counters > to roll backwards, and that's a big No-No.
Right. I haven't decided on the implementation yet. I think this is too low-level for this document.
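For reference, the baseline-and-delta bookkeeping guessed at in the quoted comment could look roughly like this. It is a sketch of that suggestion only, not a decided implementation; the class and method names are hypothetical.

```python
class GroupCounter:
    """Group-wide counter that never moves backwards when links come and go."""

    def __init__(self):
        self._baseline = {}  # link name -> link counter value when it joined
        self._retired = 0    # deltas folded in from links that have left

    def join(self, link, current):
        # Remember where the link's own counter stood at join time.
        self._baseline[link] = current

    def leave(self, link, current):
        # Keep the delta accumulated during membership after the link is gone.
        self._retired += current - self._baseline.pop(link)

    def total(self, current_values):
        # Sum of retired deltas plus the deltas of the links still in the group.
        live = sum(current_values[link] - base
                   for link, base in self._baseline.items())
        return self._retired + live


group = GroupCounter()
group.join("ce0", 100)  # ce0 had already counted 100 errors before joining
group.join("ce1", 5)
before = group.total({"ce0": 130, "ce1": 9})
group.leave("ce0", 130)
after = group.total({"ce1": 9})
print(before, after)
```

Because the departing link's delta moves into the retired total, the group value is the same immediately before and after the removal.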
> p25, section 4.8: > > - What needs to change in ARP so that multiple L2 addresses are > accepted as local by the system? It seems to me that, since ARP > is plumbed over each real interface, ARP will need to be in on the > game that IPMP is playing. There need to be signals between IPMP > and ARP to accomplish this.
Yep, this will be covered in a low-level document. But first, we need to get some code running to see what approach makes the most sense.
> p26, section 4.10: > > - "Sent to and received from" in the context of /dev/ipnet/ipmp* > actually means sent or received on any member link, and not just > filtered based on address. Right?
That depends on whether it's in promiscuous-mode or not. If in promiscuous-mode: yes. I've updated the document to be clearer.
> p28, section 5.1: > > - One of the applications affected is SNMP.
Footnote added.
> p29, section 5.2: > > - 1: If an address is up and marked IFF_NOFAILOVER, can I cause the > address to migrate by clearing IFF_NOFAILOVER?
Yes. In response to other review feedback, this is indirectly covered in section 4.1.3.
> - 2: Is IFF_INACTIVE modified both by the kernel and by > applications?
No, only by applications (specifically in.mpathd, though I suppose anything could have a whack at it if it really wanted).
> - 4: Why isn't IFF_COS_ENABLED set? This will likely break IPQoS. > Shouldn't it just be the logical AND of all the IFF_COS_ENABLED > bits on the underlying interfaces?
IPQoS is already broken [rimshot]. "Fixed."
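The logical-AND rule from the quoted comment is simple to state in code. In this sketch the bit value is a placeholder rather than the real header value, and the result for an empty group is an assumption.

```python
IFF_COS_ENABLED = 0x1  # placeholder bit, not the value from the system headers


def group_cos_enabled(member_flags):
    """Return True only if every underlying interface has the CoS bit set."""
    # Assumption: a group with no underlying interfaces reports the flag clear.
    if not member_flags:
        return False
    return all(flags & IFF_COS_ENABLED for flags in member_flags)


print(group_cos_enabled([IFF_COS_ENABLED, IFF_COS_ENABLED]))
print(group_cos_enabled([IFF_COS_ENABLED, 0]))
```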
> - 6: When adding a new interface to a group, what happens? Is the > interface's IFF_ROUTER flag changed to match the flag used for the > existing group?
Yes. Clarified.
> - 7: What does IFF_XRESOLV mean on an ipmp interface?
You're right, it will not be supported. Likewise with IFF_NOARP and IFF_NONUD. Fixed.
> > p30, table 2: > > - Might be nice to indicate which ones are physical and which are > logical.
It's not that simple -- there are four levels of hierarchy:
* Per ill_t * Per ill_t address family (IPv6) * Per ipif_t * Per IPMP group
I thought the table was overwhelming as-is, and I couldn't see a clean way to incorporate this information.
> - Why would IFF_MIPRUNNING appear on any underlying interface?
Probably no reason, but nothing would prevent it and I can't see a point in stopping it.
> - Why can't IFF_MULTI_BCAST be visible on an ipmp interface?
Are you making the same logical-AND argument as for IFF_COS_ENABLED? If so, okay.
> p31, top of page: > > - Is the automatic IFF_DEPRECATED logic done in the kernel or user > space?
Kernel; clarified.
> - As before, I'm not really sure why clearing IFF_RUNNING on the > member links in response to IFF_FAILED makes sense. I'd much > rather see these two flags retain their original meaning, and have > the ipmp aggregate interface just show ~IFF_RUNNING when all > member interfaces have *either* IFF_FAILED set *or* IFF_RUNNING > cleared. That would better illustrate the layering, as the > "failed" concept is really an artifact of how IPMP works > internally.
See my earlier response.
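The aggregate rule in the quoted comment (the IPMP interface shows ~IFF_RUNNING when every member has either IFF_FAILED set or IFF_RUNNING cleared) can be sketched as below. The bit values are placeholders, and the empty-group result follows the earlier answer that a group with no member links has IFF_RUNNING cleared.

```python
IFF_RUNNING = 0x1  # placeholder bits, not the values from the system headers
IFF_FAILED = 0x2


def member_usable(flags):
    # A member counts only if it is running and has not been marked failed.
    return bool(flags & IFF_RUNNING) and not (flags & IFF_FAILED)


def group_running(member_flags):
    # any() over an empty list is False, so an empty group is not running.
    return any(member_usable(flags) for flags in member_flags)


print(group_running([]))
print(group_running([IFF_RUNNING | IFF_FAILED, 0]))
print(group_running([IFF_RUNNING | IFF_FAILED, IFF_RUNNING]))
```

The result for the group is the same whether or not IFF_RUNNING is also cleared on failed members, since a failed member is excluded either way.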
> p31, section 5.4.1: > > - There's no such thing as a "routing socket associated with the > IPMP interface." There are just global routing sockets; they're > associated with IP itself, not any particular interface.
Wow, I must've been really tired when I wrote that section. Fixed.
> p31, section 5.4.2: > > - This section is a little confusing, because it talks about the > possibility of seeing the IPMP member links well before it says > how this might happen.
I removed the first sentence; hopefully this makes it clearer.
> p32, top of page: > > - I had trouble reading this. I assume it means just that > RTM_NEWADDR will occur during address transfer, and not that > IFF_NOFAILOVER is the only possible way this message could be > sent.
How else could it happen? Any UP data addresses have already migrated to the IPMP interface.
> - Is any message sent to the user when the SO_RTSIPMP flag is set or > cleared? If not, then how does the user know that he's got a > consistent view of the world? (I suspect the answer is that the > user should set the flag before doing SIOCGLIFCONF with LIFC_IPMP, > and then just _never_ clear it.)
How does the user know he's got a consistent state of the world with "normal" routing sockets? I think it's the same -- an application would open a routing socket, set SO_RTSIPMP, do an SIOCGLIFCONF, build its state, and listen for routing socket messages to update that state.
> p32, section 5.5: > > - If we did allow routes to interfaces, then the behavior in section > 4.5 would become a bit weird.
It *already* is a bit weird ;-)
> p32, section 5.6: > > - What happens if an ifindex of a member link is used? (I assume it > results in an error.)
Yes; updated (in.mpathd makes use of the undocumented IP_DONTFAILOVER_IF option to guarantee that probes go out a specific interface).
> p33, section 5.7: > > - It would be nice if SIOCLIFADDIF worked to add the zeroth address > on the interface when the address configured there is 0.0.0.0. > This would remove one bit of asymmetry from the current design.
Yes, but fixing that is out-of-scope.
> p33, section 5.9: > > - What do SIOC[GS]LIFLNKINFO and SIOC[GS]LIFMUXID mean on ipmp > interfaces? The latter seems senseless, as no real plumbing can > occur there.
Since in.ndpd operates on IPMP interfaces, my understanding is that it would use SIOC[GS]LIFLNKINFO. SIOC[GS]LIFMUXID is just provided for completeness.
> - What do SIOC[GS]LIFMETRIC mean on member link interfaces? This > doesn't make sense, as routing (the consumer of interface metrics) > won't use them. Does something in IPMP itself use them?
Could you explain more about how routing currently makes use of LIFMETRIC? Since routing daemons will run over the IPMP interface, I'd (clearly incorrectly) assumed that it would use this ioctl.
> p34, section 5.10: > > - Probably need more detailed behavior for SIOCDARP.
Please elaborate -- what more would you like to see?
> > p34, section 5.12: > > - What happens if I set the zone ID to a non-global zone first, and > then set the IFF_NOFAILOVER flag?
It will also fail. Updated.
> p35, section 5.14: > > - It might be worth mentioning that IGMP operation under IPMP is > really unclear. (Should membership messages be echoed out all > interfaces so that switches know that all interfaces are > participating? Or just over the "lead" interface? If it's the > latter, shouldn't the IGMP messages be repeated if a new "lead" is > chosen?)
I agree it should be discussed. Let me look into the issues involved; I will update the document once I have answers.
> p35, section 5.15: > > - "Reverse-engineered" might be a little strong. The same interface > exists on other Mentat-derived systems, and (like fast-path) is > probably documented there.
I was not aware of that; changed to "discovered".
> p36, section 5.16: > > - If the vni driver goes away, does the vni(7D) man page go too?
Yes; made explicit.
> p36: > > - Any changes for PSARC 2002/137 (IPMP Asynchronous Event > Definitions) due to this project?
Probably. That's not a documented interface, so it was not discussed. However, this is of little concern, as it seems that Sun Cluster never made use of the definitions.
Thanks again, -- meem

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 2:54 AM
in response to: meem

Hi Peter,
Footnote 9 is confusing, does it refer to new IPMP or old? The footnote is tacked onto new, but seems to refer to 'old' style IPMP?
On Wed, 28 Sep 2005, Peter Memishian wrote: > Jim wrote:
> > - What is the MTU on an IPMP interface? Is it the minimum of all > > member links?
In a similar vein what about baud_rate and metric? (I don't think baud_rate is set to anything useful at the moment, but I wouldn't mind seeing it made useful in future. See further below for metric).
> > In fact, I'm not sure why IFF_RUNNING on the member links would be > > cleared out by IFF_FAILED. It's not as though ordinary > > applications would ever see those interfaces, so they cannot be > > confused by the meaning of the extra bits; they need to set > > special Solaris-specific flags to see them at all. So why the > > interlock? > > To make it clear to someone using ifconfig or other administrative tools.
But the interface may well be functioning fine, it may be the probe target(s) alone which have failed, while other on-link hosts are still reachable. An application that specifically asked to see the (otherwise hidden) underlying interfaces might well be interested in the difference.
Further, clearing IFF_RUNNING due to IFF_FAILOVER is going to cause problems for routing socket listeners where:
1. There are both IPMP-member logical IP interfaces and non-IPMP-member IP logical interfaces bound to the same physical interfaces
2. The 0th logical interface is an IPMP member.
3. The application uses the if_msghdr if_flags field to retrieve physical interface related flags (rather than GLIFFLAGS)
See further below.
> > p23, 'route' changes: > > > > - If this functionality is implemented in the 'route' command > > itself, rather than in the kernel, what does that mean for > > existing utilities? It seems like the "add static route" feature > > in Zebra and the like will be harmed by this. > > I'd prefer to isolate this to route. Why would zebra be adding routes to > the underlying interfaces?
> > (For what it's worth, I think those utilities are probably blown > > out of the water by removing "ce0" from the SIOCGIFCONF data, and > > will need manual intervention to convert their configurations > > over. I hope that there's not much mixed IPMP/Zebra usage.) > > Why would they want to know about ce0? Please elaborate.
They wouldn't want to, but they may already have definitions in their configuration for these interfaces from pre-new-IPMP which would then need to be migrated over to the new ipmpX interface. However, I don't think it'd be a problem (for 'zebra' at least, given the incompatibilities with the older model).
> Could you explain more about how routing currently makes use of > LIFMETRIC? Since routing daemons will run over the IPMP interface, I'd > (clearly incorrectly) assumed that it would use this ioctl.
It could also try to acquire the metric from the IFINFO message. It's an administrative metric used to seed any originated routes deriving from that route and also influence route calculation.
I can't think of any way how you'd reconcile differing metrics of member links though to arrive at an 'aggregate' metric. Hence I'd suggest that the only compatible way would be to only use the member interfaces with the same best metric which are active, and treat any other lower-metric interfaces as STANDBY.
> > - If the vni driver goes away, does the vni(7D) man page go too? > > Yes; made explicit.
Hmm, the VNI driver is useful, eg for hosting addresses on - if you wanted more than 8192 addresses. ;)
I have a question about routing socket behaviour:
- Is IPMP group membership a per-logical interface thing? Ie is it possible to have a set of logical interfaces where some addresses are members of an IPMP group (NOFAILOVER, and hence meant to be hidden) and where some are not (and hence not meant to be hidden)?
If this is the case could I suggest the following:
- Do *not* suppress RTM_INFO events (listeners very likely key internal interface creation/deletion events on this event, and/or use it to 'grab' PHYINT flags - IFINFO is only ever sent for 0th interface)
- Suppress/fake *just* the address related events, RTM_{NEW,DEL}ADDR pertaining to the IPMP address, as appropriate.
- Clear IPMP related flags (IFF_FAILOVER particularly) from IFINFO for all interfaces bar IFF_IPMP interfaces. However, this is going to have issues with the proposed IFF_FAILOVER/IFF_RUNNING mirroring scheme.
Otherwise applications will potentially receive RTM_NEWADDR's for addresses (the normal, not FAILOVER address) on interfaces which they never received an IFINFO for.
The worst case with this modification is that you send IFINFO for interfaces with only IFF_FAILOVER addresses and hence you never actually send any RTM_NEWADDRs for that ifindex, but an application /must/ be able to handle this already anyway.
Note that the answer might well be to fix routing sockets, rather than change the IPMP proposal.
regards,
--paulj

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 4:40 AM
in response to: paulj

On Wed, 28 Sep 2005, Paul Jakma wrote:
> But the interface may well be functioning fine, it may be the probe target(s) > alone which have failed, while other on-link hosts are still reachable. An > application that specifically asked to see the (otherwise hidden) underlying > interfaces might well be interested in the difference. > > Further, clearing IFF_RUNNING due to IFF_FAILOVER is going to cause problems > for routing socket listeners where: > > 1. There are both IPMP-member logical IP interfaces and non-IPMP-member > IP logical interfaces bound to the same physical interfaces > > 2. The 0th logical interface is an IPMP member. > > 3. The application uses the if_msghdr if_flags field to retrieve > physical interface related flags (rather than GLIFFLAGS)
Ah, and even if the app does go do a GLIFFLAGS, it only has the name of the 0th interface anyway (at least via IFINFO), so it won't get the logical interface flags.
> - Clear IPMP related flags (IFF_FAILOVER particularly) from IFINFO for > all interfaces bar IFF_IPMP interfaces. However, this is going to have > issues with the proposed IFF_FAILOVER/IFF_RUNNING mirroring scheme.
One answer to both this and the first paragraph (apps that want to know link-state of member interfaces for some strange reason) would be to introduce support for the BSD if_link_state - it would /explicitly/ reflect link-state and link-state only.
That would also solve the routing-socket problems for applications which are updated to look at link-state instead of the RUNNING flag.
--paulj

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 8:55 AM
in response to: paulj

> Footnote 9 is confusing, does it refer to new IPMP or old? The footnote > is tacked onto new, but seems to refer to 'old' style IPMP?
It refers to the new model. I will update it to be clearer.
> > > - What is the MTU on an IPMP interface? Is it the minimum of all > > > member links? > > In a similar vein what about baud_rate and metric? (I don't think > baud_rate is set to anything useful at the moment, but I wouldn't mind > seeing it made useful in future. See further below for metric).
IPMP does not work over IFF_POINTOPOINT interfaces -- so I do not see the relevance of baud_rate.
> > > In fact, I'm not sure why IFF_RUNNING on the member links would be > > > cleared out by IFF_FAILED. It's not as though ordinary > > > applications would ever see those interfaces, so they cannot be > > > confused by the meaning of the extra bits; they need to set > > > special Solaris-specific flags to see them at all. So why the > > > interlock? > > > > To make it clear to someone using ifconfig or other administrative tools. > > But the interface may well be functioning fine, it may be the probe > target(s) alone which have failed, while other on-link hosts are still > reachable. An application that specifically asked to see the (otherwise > hidden) underlying interfaces might well be interested in the > difference.
Right. In that case, the underlying interface will not be marked IFF_FAILED. However, if the underlying interface is IFF_FAILED, then it should also have IFF_RUNNING cleared to make it clear to applications that it's not usable.
> Further, clearing IFF_RUNNING due to IFF_FAILOVER is going to cause > problems for routing socket listeners where:
You mean IFF_FAILED, not IFF_FAILOVER, right?
> 1. There are both IPMP-member logical IP interfaces and non-IPMP-member > IP logical interfaces bound to the same physical interfaces > > 2. The 0th logical interface is an IPMP member. > > 3. The application uses the if_msghdr if_flags field to retrieve > physical interface related flags (rather than GLIFFLAGS)
None of this is possible because IPMP membership is an interface property, not a logical interface property.
> > Why would they want to know about ce0? Please elaborate. > > They wouldn't want to, but they may already have definitions in their > configuration for these interfaces from pre-new-IPMP which would then > need to be migrated over to the new ipmpX interface. However, I don't > think it'd be a problem (for 'zebra' at least, given the > incompatibilities with the older model).
Since IPMP and routing do not currently work together, I don't think we need to worry about migration. However, we will need to explain to folks how to take their existing non-IPMP configurations and make them work in an IPMP environment.
> > Could you explain more about how routing currently makes use of > > LIFMETRIC? Since routing daemons will run over the IPMP interface, I'd > > (clearly incorrectly) assumed that it would use this ioctl. > > It could also try acquire the metric from the IFINFO message. It's an > administrative metric used to seed any originated routes deriving from > that route and also influence route calculation. > > I can't think of any way how you'd reconcile differing metrics of member > links though to arrive at an 'aggregate' metric. Hence I'd suggest that > the only compatible way would be to only use the member interfaces with > the same best metric which are active, and treat any other lower-metric > interfaces as STANDBY.
The applications will not be aware of member interfaces, so I don't see how that would work.
> > > - If the vni driver goes away, does the vni(7D) man page go too? > > > > Yes; made explicit. > > Hmm, the VNI driver is useful, eg for hosting addresses on - if you > wanted more than 8192 addresses. ;)
The VNI IP interface will still exist. We are only talking about the implementation.
> I have a question about routing socket behaviour: > > - Is IPMP group membership a per-logical interface thing?
No.
> Ie is it > possible to have a set of logical interfaces where some addresses are > members of an IPMP group (NOFAILOVER, and hence meant to be hidden) > and where some are not (and hence not meant to be hidden)?
No.
> Note that the answer might well be to fix routing sockets, rather than > change the IPMP proposal.
If you want to fix routing sockets, by all means go ahead :-) That work is too large and too tangential to the IPMP rearchitecture to be done as part of this work. Nothing in this proposal precludes a sane rearchitecture of routing sockets.
-- meem

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 9:59 AM
in response to: meem

On Wed, 28 Sep 2005, Peter Memishian wrote:
> IPMP does not work over IFF_POINTOPOINT interfaces -- so I do not see the > relevance of baud_rate.
baud_rate isn't specific to PtP though. However, we don't set it at all - yet. (But if I happen to figure out where/when MII information is available and get a chance to store it in the phyint, I'd love to do so ;) ).
> Right. In that case, the underlying interface will not be marked > IFF_FAILED. However, if the underlying interface is IFF_FAILED, then it > should also have IFF_RUNNING cleared to make it clear to applications that > it's not usable.
Ok.
> > Further, clearing IFF_RUNNING due to IFF_FAILOVER is going to cause > > problems for routing socket listeners where: > > You mean IFF_FAILED, not IFF_FAILOVER, right?
Yes :).
> None of this is possible because IPMP membership is an interface property, > not a logical interface property.
Ah ok. Grand so - no problems with route-sock listeners if the member interfaces will be completely invisible.
> Since IPMP and routing do not currently work together, I don't think > we need to worry about migration. However, we will need to explain to > folks how to take their existing non-IPMP configurations and make them > work in an IPMP environment.
Yep.
> The applications will not be aware of member interfaces, so I don't > see how that would work.
The application won't be, but if there are differing metrics between the member interfaces, maybe IPMP should try to honour the metrics?
> > - Is IPMP group membership a per-logical interface thing? > > No.
Ah, ok.
So logical subnets will not be possible on such physical interfaces? If you wanted that, you'd create additional IPMP groups, right?
> If you want to fix routing sockets, by all means go ahead :-) That > work is too large and too tangential to the IPMP rearchitecture to be > done as part of this work. Nothing in this proposal precludes a sane > rearchitecture of routing sockets.
:)
I plan to try some experiments later in the year, might be difficult to do and remain fully backward compatible though :(. We'll see.
--paulj

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 10:35 AM
in response to: paulj

> > IPMP does not work in over IFF_POINTOPOINT -- so I do not see the > > relevance of baud_rate. > > baud_rate isn't specific to PtP though. However, we don't set it at all > - yet. (But if I happen to figure out where/when MII information is > available and get a chance to store it in the phyint, I'd love to do so > ;) ).
Could you say more about what broadcast networks operate over a modem? In any case, once you have specifics, I'm sure we can work it in.
> > The applications will not be aware of member interfaces, so I don't > > see how that would work. > > The application won't be, but if there are differing metrics between the > member interfaces, maybe IPMP should try honour the metrics?
I don't see how that could be used properly by an application, as an application has no idea which interface (in the group) a given packet will be sent over. So, I think the metric of the IPMP group interface needs to be the maximum metric associated with any interface in the group. It could also be argued that configuring different metrics on different underlying interfaces is an administrative error.
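As a small illustration of that rule, the group metric is the largest metric found on any underlying interface. The function name and the mismatch warning are hypothetical additions; the warning simply reflects the view that differing metrics within a group may be a configuration error.

```python
def group_metric(member_metrics):
    """Return (metric, consistent) for a dict of interface name -> metric."""
    if not member_metrics:
        raise ValueError("group has no underlying interfaces")
    values = list(member_metrics.values())
    # The group advertises the maximum metric of any underlying interface.
    metric = max(values)
    # Differing metrics are flagged so an administrator could be warned.
    consistent = len(set(values)) == 1
    return metric, consistent


print(group_metric({"ce0": 1, "ce1": 3}))
print(group_metric({"ce0": 2, "ce1": 2}))
```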
> So logical subnets will not be possible on such physical interfaces? If > you wanted that, you'd create additional IPMP groups, right?
The addresses on each physical interface need not be on the same subnet (e.g., you could have multiple subnets on the same link).
-- meem

Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 28, 2005 10:54 AM
in response to: meem

Peter Memishian writes: > > > IPMP does not work in over IFF_POINTOPOINT -- so I do not see the > > > relevance of baud_rate. > > > > baud_rate isn't specific to PtP though. However, we don't set it at all > > - yet. (But if I happen to figure out where/when MII information is > > available and get a chance to store it in the phyint, I'd love to do so > > ;) ). > > Could you say more about what broadcast networks operate over a modem? In > any case, once you have specifics, I'm sure we can work it in.
"baud_rate" is a lousy name, as it clearly makes people think "modems." "Bit rate" is better.
Interfaces do have various important metrics: nominal speed and delay are two of the more important ones, but there are certainly others, such as indications of shared facilities for those concerned about path diversity.
In an aggregate interface, such as ipmp0, you often have to represent both the aggregate speed (sum of all the member links) as well as the largest reservable chunk (speed of fastest link).
But that's probably way overkill given the lack of constraint-based routing. Merely showing the speed of the fastest link (as the 'ifspeed' kstat from "MIB-II KSTATS" [PSARC 1997/198]) might be sufficient.
-- James Carlson, KISS Network <james dot d dot carlson at sun dot com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
|
|
|
|
Posts:
3,045
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 11:07 AM
in response to: carlsonj
|
|
> "baud_rate" is a lousy name, as it clearly makes people think > "modems." "Bit rate" is better.
Ah, I see what was meant now.
> In an aggregate interface, such as ipmp0, you often have to represent > both the aggregate speed (sum of all the member links) as well as the > largest reservable chunk (speed of fastest link). > > But that's probably way overkill given the lack of constraint-based > routing. Merely showing the speed of the fastest link (as the > 'ifspeed' kstat from "MIB-II KSTATS" [PSARC 1997/198]) might be > sufficient.
Why not slowest (i.e., the most we can guarantee)?
-- meem
|
|
|
|
Posts:
6,810
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 11:10 AM
in response to: meem
|
|
Peter Memishian writes: > > But that's probably way overkill given the lack of constraint-based > > routing. Merely showing the speed of the fastest link (as the > > 'ifspeed' kstat from "MIB-II KSTATS" [PSARC 1997/198]) might be > > sufficient. > > Why not slowest (i.e., the most we can guarantee)?
Hence the problem ...
Any of those answers would actually be fine by me.
-- James Carlson, KISS Network <james dot d dot carlson at sun dot com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
|
|
|
|
Posts:
3,045
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 11:15 AM
in response to: carlsonj
|
|
> Hence the problem ... > > Any of those answers would actually be fine by me.
Under what situation would it be ideal to report the fastest link speed?
-- meem
|
|
|
|
Posts:
6,810
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 12:22 PM
in response to: meem
|
|
Peter Memishian writes: > > > Hence the problem ... > > > > Any of those answers would actually be fine by me. > > Under what situation would it be ideal to report the the fastest link > speed?
Not sure about "ideal," but it would help in distinguishing a group that has all low-speed interfaces from one that has mostly high-speed interfaces.
-- James Carlson, KISS Network <james dot d dot carlson at sun dot com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
|
|
|
|
Posts:
215
From:
Scotland
Registered:
9/15/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 1:20 PM
in response to: meem
|
|
On Wed, 28 Sep 2005, Peter Memishian wrote:
> Could you say more about what broadcast networks operate over a modem?
Ethernets have a baud rate too, I'd be hard-pressed to think of a network type which didn't.
It has become corrupted, though, to generally mean "bit rate", despite the fact that often baud-rate != bit-rate; e.g., GigE has a bit-rate of 1Gb/s (b = bit), but a baud rate of ~125Mb/s (b = baud) per pair, 500Mbaud in total, or somesuch.
But generally it's taken to be bit/s.
> In any case, once you have specifics, I'm sure we can work it in.
See above. It's the "band-width" of the interface :). And it would be useful to report it if possible - so that OSPF would not need to have interface 'bandwidth' administratively defined. (Another corrupted use of a term, I know ;) ).
> I don't see how that could be used properly by an application, as an > application has no idea interface (in the group) a given packetwill be > sent over.
The application wouldn't have a use, no. But IPMP shouldn't allow interfaces with different metrics to be joined together, at least - the non-best metric interfaces should be STANDBY or somesuch.
> So, I think the IPMP group interface needs to be the maximum metric > associated with any interface in the group.
Hmm, no. That would clash if there were some other interface with a metric 'in between'. Remember, routing protocols /will/ make use of this metric if it is present to decide which interfaces to install routes out of, and possibly with what protocol cost to advertise certain addresses/routes to others. If a metric is set, it is set by the administrator and presumably for good reason.
eg a system with:
ipmp0: with members bge0 (metric 100) and bge1 (metric 1000)
bge2: metric X
If X is 1000, a routing application might consider ipmp0 and bge2 to be wholly equal - which clearly they are not. Lacking ECMP it might decide (arbitrarily) to use bge2, when clearly ipmp0 is the better interface (for underlying interface bge1 has same metric, and bge0 a much better metric).
Similar problems if you report the best-member-metric instead.
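The clash can be sketched in a few lines of Python. This is only an illustration of the example above: the function name and the "max"/"min" policy labels are made up here, and nothing in it reflects how IPMP or any routing daemon is actually implemented.

```python
def reported_metric(member_metrics, policy):
    """Metric a group interface would report under a given (hypothetical) policy."""
    if policy == "max":
        return max(member_metrics)   # worst member metric
    if policy == "min":
        return min(member_metrics)   # best member metric
    raise ValueError("unknown policy: " + policy)

# The example system: ipmp0 has members bge0 (100) and bge1 (1000).
ipmp0_members = {"bge0": 100, "bge1": 1000}

# With "max", a standalone bge2 at metric 1000 looks equal to ipmp0,
# even though ipmp0 also contains the much better bge0.
tie_with_max = reported_metric(ipmp0_members.values(), "max") == 1000

# With "min", a standalone bge2 at metric 100 looks equal to ipmp0,
# even though ipmp0 may end up sending over the much worse bge1.
tie_with_min = reported_metric(ipmp0_members.values(), "min") == 100
```

Either policy produces a tie that hides the real difference between the group and the standalone interface, which is the argument for not deriving the group metric from the members at all.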
> It could also be argued that configuring different metrics on > different underlying interfaces is an administrative error.
That seems a simple and perfectly fine answer.
> The addresses on each physical interface need not be on the same subnet > (e.g., you could have multiple subnets on the same link).
Neat :).
--paulj
|
|
|
|
Posts:
6,810
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 1:27 PM
in response to: paulj
|
|
Paul Jakma writes: > > In any case, once you have specifics, I'm sure we can work it in. > > See above. It's the "band-width" of the interface :). And it would be > useful to report it if possible - so that OSPF would not need to have > interface 'bandwidth' administratively defined. (Another corrupted use > of a term, I know ;) ).
I think the discussion is getting pretty far off the topic of IPMP redesign ... but we already have an interface speed reporting mechanism as part of the existing MIB-II family of kstats. It's called "ifspeed." We won't need another.
> > So, I think the IPMP group interface needs to be the maximum metric > > associated with any interface in the group. > > Hmm, no. That would clash if there were some other interface with a > metric 'in between'. Remember, routing protocols /will/ make use of this > metric if it is present to decide which interfaces to install routes out > of, and possibly with what protocol cost to advertise certain > addresses/routes to others. If a metric is set, it is set by the > administrator and presumably for good reason.
I agree. I think the ipmp interface ought to have its own metric, because this is an administratively-assigned value, not something that is inherent in the underlying interface. (Thus, it's not really the same as MTU or speed.)
And, in fact, simply disallowing the user from ever setting or querying the metric on the underlying links ought to underscore the issue.
(If you want to copy the metric over from the first physical link that establishes the group in the case where someone doesn't create ipmp0 explicitly, that might make sense, but I don't think it's really necessary.)
-- James Carlson, KISS Network <james dot d dot carlson at sun dot com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 _______________________________________________ networking-discuss mailing list networking-discuss at opensolaris dot org
|
|
|
|
Posts:
215
From:
Scotland
Registered:
9/15/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 1:41 PM
in response to: carlsonj
|
|
On Wed, 28 Sep 2005, James Carlson wrote:
> I think the discussion is getting pretty far off the topic of IPMP > redesign
Sorry :).
> ... but we already have an interface speed reporting mechanism as part > of the existing MIB-II family of kstats. It's called "ifspeed." We > won't need another.
Well, ifi_baud_rate exists already, just needs to be updated with this 'ifspeed' - but another matter indeed.
What /is/ of concern to IPMP is what to report for this ifspeed/ifi_baud_rate if members have differing values. I'd agree it's fairly arbitrary, lowest speed would probably be the most conservative though (would be best choice for OSPF at least).
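For reference, the candidate values raised in this thread (aggregate, fastest, slowest) can be written down as a minimal Python sketch. The function name and the link speeds are made up for illustration; this is not taken from the design document or from the kstat code.

```python
def group_speeds(member_bps):
    """Return the candidate speed values for a group of member links (bits/s)."""
    if not member_bps:
        raise ValueError("group has no member links")
    return {
        "aggregate": sum(member_bps),  # sum of all the member links
        "fastest": max(member_bps),    # largest reservable chunk
        "slowest": min(member_bps),    # most conservative: what can be guaranteed
    }

# Hypothetical group with one 1 Gb/s member and one 100 Mb/s member.
speeds = group_speeds([1_000_000_000, 100_000_000])
```

For a group whose members all run at the same speed, "fastest" and "slowest" coincide, so the choice only matters for mixed groups.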
> I agree. I think the ipmp interface ought to have its own metric, > because this is an administratively-assigned value, not something that > is inherent in the underlying interface.
> And, in fact, simply disallowing the user from ever setting or > querying the metric on the underlying links ought to underscore the > issue.
0 unless set explicitly on the ipmp interface? Even better, yes.
--paulj
|
|
|
|
Posts:
3,045
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 2:37 PM
in response to: paulj
|
|
> > I agree. I think the ipmp interface ought to have its own metric, > > because this is an administratively-assigned value, not something that > > is inherent in the underlying interface. > > > And, in fact, simply disallowing the user from ever setting or > > querying the metric on the underlying links ought to underscore the > > issue. > > 0 unless set explicitely on the ipmp interface? Even better, yes.
Done; see section 5.7 of version 1.3.1 of the document, which I just posted. The link is the same:
http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf
Thanks guys. -- meem
|
|
|
|
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 2:54 PM
in response to: meem
|
|
On Wed, 28 Sep 2005, Peter Memishian wrote:
> Done; see section 5.7 of version 1.3.1 of the document, which I > just posted. The link is the same: > > http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf
Ah, could the networking community page link to this? :) Also to the tunnel doc?
> Thanks guys.
No worries! Night!
regards, -- Paul Jakma paul at clubi dot ie paul at jakma dot org Key ID: 64A2FF6A Fortune: mummy, n.: An Egyptian who was pressed for time.
|
|
|
|
Posts:
3,045
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 29, 2005 2:20 PM
in response to: Paul Jakma
|
|
> > http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf > > Ah, could the networking community page link to this? :) Also to > the tunnel doc?
Done.
-- meem
|
|
|
|
Posts:
6,810
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 10:32 AM
in response to: meem
|
|
Peter Memishian writes: > > Instead, the real term should be just "TCP/IP based." > > So UDP and SCTP don't benefit?
UDP, SCTP, ICMP, and many other protocols are typically considered part of the "TCP/IP protocol suite."
> :-P I can say "AF_INET[6]-based" if you > prefer. But claiming that IPMP will work with all IP-based applications, > including those using TLI/XTI, seems a bit too bold to me (and I'm not > convinced it's time well spent to guarantee it).
It also doesn't work with _all_ sockets-based applications, so I'm not sure what the point is.
What I'm trying to say is that where this solution works at all, it works with IPv4 and IPv6, and not anything else, and that there are non-sockets applications (such as NFS/RPC) that work fine with it, so "sockets" isn't the right peg for that hat.
[ping problems elided] > Sadly, these three problems are inherent in the probe-based failure > detection mechanism. I'm not sure what we can do about them from a > technical standpoint.
True. As long as it's clear that the list has been trimmed to include only those things that are fixable by this project (and leaves some things on the table), I suppose that's ok.
The list just read strangely to me because I was _looking_ to see those things. (Yes, I realize that one of the issues is the length of the document. No, I don't think that means that all the issues need to be included.)
> > . The detailed behavior of general multicast (not the > > well-known link-local multicast addresses) is less clear. > > In particular, the behavior necessary to accomodate > > IGMP-snooping switches is probably missing. > > I'm not convinced this is a problem that shapes the *high-level* design.
One of the criticisms leveled against routing is that there's no good solution for multicast. I'm pointing out that multicast isn't completely solved here, either.
> > - I'm not sure the described behavior here really represents > > FAILBACK=no. (But, then, I'm not sure what behavior would really [...] > Yes, it's a bit of an odd bird. Personally, I'd love to get rid of this > feature, but I know there are customers who hate unnecessary rebinding of > addresses to interfaces (because of the affect it has on others hosts) and > thus want to have that happen as little as possible. > > I've added a little more rationale behind the feature, but I don't want to > devote too much space to this wart (I'd like to kill it, but I can't).
My point here is that I don't think the new behavior really represents very well what the old code did. In particular, this case:
(AA) -> (FA) -> (FF) -> (IF) -> (II)
seems to be quite problematic. The old code would have failed over the addresses to the second interface at that first event, but then performed no other changes. This would have left the equivalent of (IA) as the end result, but that's not what this proposed implementation seems to do. Instead, it leaves the whole group failed out as "inactive."
Did I read that correctly?
> > p9, section 3.9: > > > > - It would help a bit, I think, to segregate out the flags that are > > on logical interfaces (address flags) from those that are on the > > underlying physical interface. (The break is after the 5th entry > > in the first table, and after the second in the second table.) > > Segregate how -- with an extra line in the table?
A bold line between the two groups would do it.
> And for what purpose?
The implications of the two sets of flags are very different, and it's clear that many people looking at the flags (and likely quite a few reading this document) are just unclear on the difference -- or that there even is any.
> The mirror-image must be maintained to ensure that naive applications > behave correctly: if there was a situation where IFF_FAILED was set and > IFF_RUNNING was set, then an application would try to use an unusable > interface. (The other case, where IFF_FAILED was clear and IFF_RUNNING > was clear makes no semantic sense: how can the interface not be > IFF_RUNNING, but not be IFF_FAILED?)
That latter case does in fact happen -- when in.mpathd isn't running.
I can mostly understand having the kernel clear IFF_RUNNING when IFF_FAILED is set by the application. But I suspect that you need to maintain the "real" state underneath so that IP can turn IFF_RUNNING back on when IFF_FAILED is cleared *AND* the hardware state is copacetic, and not turn it back on if the hardware state isn't right.
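One way to picture keeping the "real" state underneath is to store the hardware state and the FAILED flag separately and derive the visible RUNNING bit from both. The following is only a sketch in Python with hypothetical names (MemberLink, hw_running, failed); it does not describe how IP actually stores these flags.

```python
from dataclasses import dataclass

@dataclass
class MemberLink:
    hw_running: bool = True  # "real" link/hardware state, kept underneath
    failed: bool = False     # set and cleared by the failure-detection daemon

    @property
    def running(self) -> bool:
        # The visible RUNNING bit is derived, never stored, so clearing
        # FAILED cannot turn RUNNING back on while the hardware is down.
        return self.hw_running and not self.failed

link = MemberLink()
link.failed = True        # probes fail: RUNNING disappears
link.hw_running = False   # the hardware also goes down in the meantime
link.failed = False       # probes recover, but RUNNING must stay off
still_down = not link.running
```

Because the two inputs are tracked independently, the order in which they change does not matter; RUNNING comes back only once both conditions hold.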
> > In fact, I'm not sure why IFF_RUNNING on the member links would be > > cleared out by IFF_FAILED. It's not as though ordinary > > applications would ever see those interfaces, so they cannot be > > confused by the meaning of the extra bits; they need to set > > special Solaris-specific flags to see them at all. So why the > > interlock? > > To make it clear to someone using ifconfig or other administrative tools.
Seeing "FAILED" in the ifconfig output looks pretty clear to me. And you're proposing changes that make it *certain* that any administrative tools that can see these underlying interfaces at all must already be updated to support IPMP.
In addition to that, clearing out RUNNING down at the member link level means that IPMP-aware applications are *compelled* to use DLPI to figure out what's going on at the physical layer. That seems unfortunate, as we previously used IFF_RUNNING for exactly that purpose.
Moreover, the "FAILED" flag up at the ipmp bundle level doesn't seem to me to add a lot more value over clearing out RUNNING (which is what non-IPMP-aware applications will look at).
So, I don't think the FAILED->~RUNNING functionality is actually needed. (And we've had a bit of a history in getting bits tangled together, so if it can be avoided, it'd be nice.)
> > p18, section 4.2.2: > > > > - Should there be a "-n" option to suppress address-to-name > > translation? (And should "names" be the default the way they are > > most everywhere else?) > > What name translation?
I'm just asking about parallelism with other *stat commands, such as netstat. Those tend to print out _names_ rather than raw addresses by default, and use a "-n" flag to suppress it.
But if you want this one to be different, and always print numeric addresses, that's fine by me.
> > # ifconfig foobar0 plumb group a 10.0.0.1 up > > > > and since "foobar0" will never exist, this will add the address to > > the named group. > > I don't really see how this is new -- I can craft up arbitrary > hostname.<if> files today and achieve similar results. Note that if group > "a" doesn't exist at all by the time the system gets to handling missing > interfaces, the above will be ignored.
The difference is that it fails from ifconfig today:
# ifconfig foobar0 plumb group a 10.0.0.1 up
ifconfig: plumb: foobar0: No such file or directory
#
> > p23, 'route' changes: > > > > - If this functionality is implemented in the 'route' command > > itself, rather than in the kernel, what does that mean for > > existing utilities? It seems like the "add static route" feature > > in Zebra and the like will be harmed by this. > > I'd prefer to isolate this to route. Why would zebra be adding routes to > the underlying interfaces?
Because (a) Zebra and other daemons allow you to add static routes and (b) users' existing configuration files will already mention those interfaces.
> > (For what it's worth, I think those utilities are probably blown > > out of the water by removing "ce0" from the SIOCGIFCONF data, and > > will need manual intervention to convert their configurations > > over. I hope that there's not much mixed IPMP/Zebra usage.) > > Why would they want to know about ce0? Please elaborate.
I don't think they "want" to know about it. Today, they need to specify "ce0" (or its ifindex) if they want to tie the route to the interface group. There's no "group" representation, so the interface name is what they're using.
On upgrade, those configurations will now become unusable because the interface names have disappeared.
I agree that it's a bit of a corner case -- someone has to be using the belt-and-suspenders approach of having both IPMP and some routing daemon on the system. We can just hope this doesn't happen (and document around it if it does).
> > p24, section 4.6.2: > > > > - NumAddrs: ew. This should really be based on the number of member > > links in the IPMP group. You'll want to have at least one data > > address per member link in order to get the inbound load-spreading > > right. It'd be better still if in.ndpd just did the right thing. > > I'm fine with having in.ndpd try to initially configure as many global > addresses as there are interfaces, but I'm not sure what to do if an > interface is removed -- is it really okay to blow away a global address at > that point? It *feels* wrong to do that.
Agreed. I think you end up having to support using the maximum number that were configured at any one time.
In practice, that shouldn't be too bad. It's hard to add an unbounded amount of hardware to one group.
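The "maximum number that were configured at any one time" rule amounts to a high-water mark. A minimal Python sketch follows, with hypothetical names (GroupAddrCount, num_addrs) that are not taken from in.ndpd:

```python
class GroupAddrCount:
    """Track how many global addresses to keep for a group (illustrative only)."""

    def __init__(self):
        self.members = set()
        self.num_addrs = 0  # high-water mark of the member count

    def add_member(self, name):
        self.members.add(name)
        # Grow the address count when the group reaches a new maximum size.
        self.num_addrs = max(self.num_addrs, len(self.members))

    def remove_member(self, name):
        # Removing a member never lowers num_addrs, so no global
        # address has to be blown away at that point.
        self.members.discard(name)

group = GroupAddrCount()
group.add_member("ce0")
group.add_member("ce1")
group.remove_member("ce1")
```

After the removal the group has one member but still keeps two addresses, and the count only grows again if the group later exceeds two members.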
> > - I had trouble reading this. I assume it means just that > > RTM_NEWADDR will occur during address transfer, and not that > > IFF_NOFAILOVER is the only possible way this message could be > > sent. > > How else could it happen? Any UP data addresses have already migrated to > the IPMP interface.
ifconfig ipmp0:2 10.0.0.1 up?
> > - What do SIOC[GS]LIFMETRIC mean on member link interfaces? This > > doesn't make sense, as routing (the consumer of interface metrics) > > won't use them. Does something in IPMP itself use them? > > Could you explain more about how routing currently makes use of LIFMETRIC?
Sure. It's taken to be the administrator's reported "expense" of sending or receiving a packet through that interface, and is used to adjust the preference of routes within the RIB. (Preferred routes are selected out and injected into the FIB -- the kernel's forwarding table.)
For RIP-2, we just add the SIOCLIFMETRIC to the hop count when figuring the attractiveness of routes we learn over that interface and when advertising to others.
It provides a way for the administrator to say, "you shouldn't normally use this interface because it's slow, but if you have no other choice, go ahead."
It's pretty primitive, but the usage is an ancient BSD-ism.
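As a rough illustration of the RIP case (a Python sketch of the idea only, not the routing daemon's code; the function name and the example numbers are made up), the interface metric is simply added to the hop count, capped at RIP's "infinity" of 16:

```python
RIP_INFINITY = 16  # RIP treats a metric of 16 as unreachable

def rip_route_metric(hop_count, if_metric):
    """Hop count adjusted by the administrator-assigned interface metric."""
    return min(hop_count + if_metric, RIP_INFINITY)

# Two hypothetical routes to the same destination:
via_slow = rip_route_metric(hop_count=2, if_metric=3)  # shorter path, penalized interface
via_fast = rip_route_metric(hop_count=4, if_metric=0)  # longer path, default metric
preferred = "fast" if via_fast < via_slow else "slow"
```

Here the route over the penalized interface loses even though it has fewer hops, but it remains available if the other route goes away.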
> Since routing daemons will run over the IPMP interface, I'd (clearly > incorrectly) assumed that it would use this ioctl.
Right. They will -- on the ipmp interface, not on the underlying member links.
> > p34, section 5.10: > > > > - Probably need more detailed behavior for SIOCDARP. > > Please elaborate -- what more would you like to see?
I should have written "SIOCDXARP." That can take an interface name, which we currently use with ill_lookup_on_name. It's not clear to me whether that ought to take a member link name or the ipmp interface name or perhaps both in different contexts.
> > p36: > > > > - Any changes for PSARC 2002/137 (IPMP Asynchronous Event > > Definitions) due to this project? > > Probably. That's not a documented interface, so it was not discussed. > However, this is of little concern, as it seems that Sun Cluster never > made use of the definitions.
Yoiks! Thanks for the update.
-- James Carlson, KISS Network <james dot d dot carlson at sun dot com> Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
|
|
|
|
Posts:
3,045
From:
US
Registered:
3/9/05
|
|
|
|
Re: Clearview IPMP Rearchitecture: high-level
design: extended to 9/29
Posted:
Sep 28, 2005 6:05 PM
in response to: carlsonj
|
|
> > > Instead, the real term should be just "TCP/IP based." > > > > So UDP and SCTP don't benefit? > > UDP, SCTP, ICMP, and many other protocols are typically considered > part of the "TCP/IP protocol suite."
Hmm, I find that too easy to misunderstand. I've gone with "IP-based networking applications". Is that acceptable?
> > Sadly, these three problems are inherent in the probe-based failure > > detection mechanism. I'm not sure what we can do about them from a > > technical standpoint. > > True. As long as it's clear that the list has been trimmed to include > only those things that are fixable by this project (and leaves some > things on the table), I suppose that's ok. > > The list just read strangely to me because I was _looking_ to see > those things. (Yes, I realize that one of the issues is the length of > the document. No, I don't think that means that all the issues need > to be included.)
I see. The intent is to focus on technical problems that can be fixed.
> > > . The detailed behavior of general multicast (not the > > > well-known link-local multicast addresses) is less clear. > > > In particular, the behavior necessary to accomodate > > > IGMP-snooping switches is probably missing. > > > > I'm not convinced this is a problem that shapes the *high-level* design. > > One of the criticisms leveled against routing is that there's no good > solution for multicast. I'm pointing out that multicast isn't > completely solved here, either.
Agreed, but it's not flawed from a high-level design standpoint. As we get into more of the detailed design, I think there will be room to cover this.
> > > > - I'm not sure the described behavior here really represents > > > FAILBACK=no. (But, then, I'm not sure what behavior would really > [...] > > Yes, it's a bit of an odd bird. Personally, I'd love to get rid of this > > feature, but I know there are customers who hate unnecessary rebinding of > > addresses to interfaces (because of the affect it has on others hosts) and > > thus want to have that happen as little as possible. > > > > I've added a little more rationale behind the feature, but I don't want to > > devote too much space to this wart (I'd like to kill it, but I can't). > > My point here is that I don't think the new behavior really represents > very well what the old code did. In particular, this case:
The old code doesn't work at all, so I'd hope we don't represent that ;-)
> > (AA) -> (FA) -> (FF) -> (IF) -> (II) > > seems to be quite problematic. The old code would have failed over > the addresses to the second interface at that first event, but then > performed no other changes. This would have left the equivalent of > (IA) as the end result, but that's not what this proposed > implementation seems to do. Instead, it leaves the whole group failed > out as "inactive." > > Did I read that correctly?
Yes, I see the problem now. It seems that upon repair, in.mpathd should only set INACTIVE if there is another usable interface in the group. Otherwise, it should leave the interface active. That then should cause:
(AA) -> (FA) -> (FF) -> (AF) -> (AI)
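The revised repair rule can be replayed with a small Python sketch. This is a simplification under stated assumptions: it models FAILBACK=no only, treats "usable" as "currently active", and uses made-up function names; it is not in.mpathd's logic.

```python
def on_failure(states, i):
    states[i] = "F"  # failed

def on_repair(states, i):
    # Only mark the repaired interface INACTIVE if some other interface
    # in the group is usable (assumed here to mean "active").
    other_usable = any(s == "A" for j, s in enumerate(states) if j != i)
    states[i] = "I" if other_usable else "A"

def replay(events, n=2):
    """Apply (kind, index) events to a group of n active interfaces."""
    states = ["A"] * n
    trace = ["".join(states)]
    for kind, i in events:
        (on_failure if kind == "fail" else on_repair)(states, i)
        trace.append("".join(states))
    return trace

trace = replay([("fail", 0), ("fail", 1), ("repair", 0), ("repair", 1)])
```

Replaying the four events walks through the same five states as the sequence above, ending with one active and one inactive interface rather than a group that is entirely inactive.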
Let me know if this fully addresses your concern, or whether you have deeper issues.
> > > p9, section 3.9: > > > > > > - It would help a bit, I think, to segregate out the flags that are > > > on logical interfaces (address flags) from those that are on the > > > underlying physical interface. (The break is after the 5th entry > > > in the first table, and after the second in the second table.) > > > > Segregate how -- with an extra line in the table? > > A bold line between the two groups would do it.
Sadly, a bold line seems to be more challenging in LaTeX than one would think. If I stumble on a good way to do it, I'll update.
> > And for what purpose? > > The implications of the two sets of flags are very different, and it's > clear that many people looking at the flags (and likely quite a few > reading this document) are just unclear on the difference -- or that > there even is any.
Agreed.
> > The mirror-image must be maintained to ensure that naive applications > > behave correctly: if there was a situation where IFF_FAILED was set and > > IFF_RUNNING was set, then an application would try to use an unusable > > interface. (The other case, where IFF_FAILED was clear and IFF_RUNNING > > was clear makes no semantic sense: how can the interface not be > > IFF_RUNNING, but not be IFF_FAILED?) > > That latter case does in fact happen -- when in.mpathd isn't running.
in.mpathd should be viewed as a critical system component -- what happens with IPMP when it's not running is no more relevant than what happens to DR when rcm_daemon isn't running.
> I can mostly understand having the kernel clear IFF_RUNNING when > IFF_FAILED is set by the application. But I suspect that you need to > maintain the "real" state underneath so that IP can turn IFF_RUNNING > back on when IFF_FAILED is cleared *AND* the hardware state is > copacetic, and not turn it back on if the hardware state isn't right. > > [ ... ] > > Seeing "FAILED" in the ifconfig output looks pretty clear to me. And > you're proposing changes that make it *certain* that any > administrative tools that can see these underlying interfaces at all > must already be updated to support IPMP. > > In addition to that, clearing out RUNNING down at the member link > level means that IPMP-aware applications are *compelled* to use DLPI > to figure out what's going on at the physical layer. That seems > unfortunate, as we previously used IFF_RUNNING for exactly that > purpose. > > Moreover, the "FAILED" flag up at the ipmp bundle level doesn't seem > to me to add a lot more value over clearing out RUNNING (which is what > non-IPMP-aware applications will look at). > > So, I don't think the FAILED->~RUNNING functionality is actually > needed. (And we've had a bit of a history in getting bits tangled > together, so if it can be avoided, it'd be nice.)
We talked a bit offline about this -- it really comes down to how one interprets the RUNNING flag. You clearly feel that it represents the link and hardware state, and I can certainly understand why. My take is that it represents IP's notion of whether the interface is usable -- and that is based on both the link/hardware state, *and* the probe state.
I'm leaning towards agreeing with you, but I want some more time to think about it and talk with some other folks.
> > > p18, section 4.2.2: > > > > > > - Should there be a "-n" option to suppress address-to-name > > > translation? (And should "names" be the default the way they are > > > most everywhere else?) > > > > What name translation? > > I'm just asking about parallelism with other *stat commands, such as > netstat. Those tend to print out _names_ rather than raw addresses by > default, and use a "-n" flag to suppress it. > > But if you want this one to be different, and always print numeric > addresses, that's fine by me.
For "ipmpstat -a", using hostnames rather than addresses means that the table key (the first column) is no longer guaranteed to be unique -- ick. The only other context in which addresses come up is with regard to probe targets. I'd be willing to go either way on that one.
> > > # ifconfig foobar0 plumb group a 10.0.0.1 up > > > > > > and since "foobar0" will never exist, this will add the address to > > > the named group. > > > > I don't really see how this is new -- I can craft up arbitrary > > hostname.<if> files today and achieve similar results. Note that if group > > "a" doesn't exist at all by the time the system gets to handling missing > > interfaces, the above will be ignored. > > The difference is that it fails from ifconfig today: > > # ifconfig foobar0 plumb group a 10.0.0.1 up > ifconfig: plumb: foobar0: No such file or directory > #
I'm confused why this won't fail after the rearchitecture.
> > > p23, 'route' changes: > > > > > > - If this functionality is implemented in the 'route' command > > > itself, rather than in the kernel, what does that mean for > > > existing utilities? It seems like the "add static route" feature > > > in Zebra and the like will be harmed by this. > > > > I'd prefer to isolate this to route. Why would zebra be adding routes to > > the underlying interfaces? > > Because (a) Zebra and other daemons allow you to add static routes and > (b) user's existing configuration files will already mention those > interfaces.
But this will only affect folks migrating to an IPMP-based configuration. I'd much rather provide clear documentation that static routes should not be associated with underlying interfaces than support the remapping in the kernel.
> > > (For what it's worth, I think those utilities are probably blown
> > > out of the water by removing "ce0" from the SIOCGIFCONF data, and
> > > will need manual intervention to convert their configurations
> > > over. I hope that there's not much mixed IPMP/Zebra usage.)
> >
> > Why would they want to know about ce0? Please elaborate.
>
> I don't think they "want" to know about it. Today, they need to
> specify "ce0" (or its ifindex) if they want to tie the route to the
> interface group. There's no "group" representation, so the interface
> name is what they're using.
If they were using IPMP with any routing daemon today, bless their souls. (In the case of Quagga, we document that it is not supported.)
> On upgrade, those configurations will now become unusable because the
> interface names have disappeared.
>
> I agree that it's a bit of a corner case -- someone has to be using
> the belt-and-suspenders approach of having both IPMP and some routing
> daemon on the system. We can just hope this doesn't happen (and
> document around it if it does).
In this case, that seems reasonable to me.
> > > - I had trouble reading this. I assume it means just that
> > >   RTM_NEWADDR will occur during address transfer, and not that
> > >   IFF_NOFAILOVER is the only possible way this message could be
> > >   sent.
> >
> > How else could it happen? Any UP data addresses have already migrated to
> > the IPMP interface.
>
> ifconfig ipmp0:2 10.0.0.1 up?
The RTM_NEWADDR discussion you refer to is in the context of removing an underlying interface from an IPMP group. In this case, there cannot be any UP data addresses. Of course, adding a data address directly to an IPMP interface will trigger an RTM_NEWADDR, but I fail to see what that has to do with the behavior of underlying interfaces.
> > > - What do SIOC[GS]LIFMETRIC mean on member link interfaces? This
> > >   doesn't make sense, as routing (the consumer of interface metrics)
> > >   won't use them. Does something in IPMP itself use them?
> >
> > Could you explain more about how routing currently makes use of LIFMETRIC?
Sorry, I misread your initial response (I thought you were talking about the IPMP interface metric). As per our other discussion, I've added section 5.7, covering our agreed-upon semantics for interface metrics.
> > > p34, section 5.10:
> > >
> > > - Probably need more detailed behavior for SIOCDARP.
> >
> > Please elaborate -- what more would you like to see?
>
> I should have written "SIOCDXARP." That can take an interface name,
> which we currently use with ill_lookup_on_name. It's not clear to me
> whether that ought to take a member link name or the ipmp interface
> name or perhaps both in different contexts.
I see; added (and I believe it should always take an IPMP interface name -- if I'm wrong, SIOC[GS]XARP will need to be revisited as well).
-- meem
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 29, 2005 4:13 AM, in response to: meem
Peter Memishian writes:
> Hmm, I find that too easy to misunderstand. I've gone with "IP-based
> networking applications". Is that acceptable?
Yep.
> > One of the criticisms leveled against routing is that there's no good
> > solution for multicast. I'm pointing out that multicast isn't
> > completely solved here, either.
>
> Agreed, but it's not flawed from a high-level design standpoint. As we
> get into more of the detailed design, I think there will be room to cover
> this.
OK.
> Yes, I see the problem now. It seems that upon repair, in.mpathd should
> only set INACTIVE if there is another usable interface in the group.
> Otherwise, it should leave the interface active. That then should cause:
>
>     (AA) -> (FA) -> (FF) -> (AF) -> (AI)
>
> Let me know if this fully addresses your concern, or whether you have
> deeper issues.
No, I think that solves the problem.
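The repair rule quoted above can be sketched in a few lines of Python. This is only an illustration, not in.mpathd code: the `Interface` class, the `on_repair` function, and the reading of each letter in the trace (A = active, F = failed, I = inactive, one letter per interface in a two-interface group) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical stand-in for one underlying interface in an IPMP group."""
    name: str
    failed: bool = False
    inactive: bool = False

    @property
    def usable(self):
        # Assumption: "usable" means neither FAILED nor INACTIVE.
        return not self.failed and not self.inactive


def on_repair(group, repaired):
    """Clear FAILED; set INACTIVE only if another interface is usable."""
    repaired.failed = False
    repaired.inactive = any(i.usable for i in group if i is not repaired)


def state(group):
    """Render the group as one letter per interface (A, F, or I)."""
    return "".join("F" if i.failed else "I" if i.inactive else "A" for i in group)


first, second = Interface("if0"), Interface("if1")
group = [first, second]

trace = [state(group)]
first.failed = True
trace.append(state(group))
second.failed = True
trace.append(state(group))
on_repair(group, first)    # nothing else is usable, so if0 stays active
trace.append(state(group))
on_repair(group, second)   # if0 is usable, so if1 becomes inactive
trace.append(state(group))

print(" -> ".join(trace))  # AA -> FA -> FF -> AF -> AI
```

Under these assumptions the sketch reproduces the sequence given in the quoted text: the first repaired interface is left active because the group would otherwise have no usable interface.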
> > A bold line between the two groups would do it.
>
> Sadly, a bold line seems to be more challenging in LaTeX than one would
> think. If I stumble on a good way to do it, I'll update.
A doubled line isn't too hard ...
> > That latter case does in fact happen -- when in.mpathd isn't running.
>
> in.mpathd should be viewed as a critical system component -- what happens
> with IPMP when it's not running is no more relevant than what happens to
> DR when rcm_daemon isn't running.

I disagree with that. in.mpathd isn't running on systems that don't use IPMP. This means that the resulting interface is non-uniform. On systems where IPMP is in use, ~RUNNING becomes FAILED. But on systems where it's not, ~RUNNING is just on its own. So the symmetry of the two flags exists only in _some_ cases.
> We talked a bit offline about this -- it really comes down to how one
> interprets the RUNNING flag. You clearly feel that it represents the link
> and hardware state, and I can certainly understand why. My take is that
> it represents IP's notion of whether the interface is usable -- and that
> is based on both the link/hardware state, *and* the probe state.
>
> I'm leaning towards agreeing with you, but I want some more time to think
> about it and talk with some other folks.
OK.
> For "ipmpstat -a", using hostnames rather than addresses means that the
> table key (the first column) is no longer guaranteed to be unique -- ick.
> The only other context that addresses come up is with regard to probe
> targets. I'd be willing to go either way on that one.
I'd say names are even more important for the probe targets. In general, though, I'm just suggesting that our commands ought to be consistent from one to another. If the tradition (ping, netstat, traceroute) is to translate numbers to names unless specifically disabled, then new commands ought to do the same.
I think the argument that might work here is that this is more like ifconfig than like any of those other commands, and ifconfig doesn't print names. I'm not sure I _agree_ with that, but it's at least plausible.
As for uniqueness, I don't see how that really matters. If someone has multiple addresses mapping to a single name, then the output of other commands on that same system (e.g., "netstat -i") is going to show the same lack of uniqueness unless "-n" is used.
> > The difference is that it fails from ifconfig today:
> >
> > # ifconfig foobar0 plumb group a 10.0.0.1 up
> > ifconfig: plumb: foobar0: No such file or directory
> > #
>
> I'm confused why this won't fail after the rearchitecture.
Maybe I misunderstood. I read this section as implying that ifconfig itself would be changed to deal with interface plumbing failure by transferring the address.
Or are we sticking with the existing ifparse-based machinery in net_include.sh?
> But this will only affect folks migrating to an IPMP-based configuration.
> I'd much rather provide clear documentation that static routes should not
> be associated with underlying interfaces than support the remapping in the
> kernel.
It seems odd that we'd try harder with /sbin/route, but OK.
> > > > - I had trouble reading this. I assume it means just that
> > > >   RTM_NEWADDR will occur during address transfer, and not that
[...]
> The RTM_NEWADDR discussion you refer to is in the context of removing an
> underlying interface from an IPMP group. In this case, there cannot be
> any UP data addresses. Of course, adding a data address directly to an
> IPMP interface will trigger an RTM_NEWADDR, but I fail to see what that
> has to do with the behavior of underlying interfaces.
Returning to my original comment: the text wasn't clear. It said that the "only" way RTM_NEWADDR happens is with transfer, and that's not the case.
> > I should have written "SIOCDXARP." That can take an interface name,
> > which we currently use with ill_lookup_on_name. It's not clear to me
> > whether that ought to take a member link name or the ipmp interface
> > name or perhaps both in different contexts.
>
> I see; added (and I believe it should always take an IPMP interface name
> -- if I'm wrong, SIOC[GS]XARP will need to be revisited as well).
Not sure. There are certainly ARP entries for the underlying interfaces as well. If there weren't, then you couldn't actually probe any of those targets.
And when manipulating ARP entries for the IPMP interface, things get a bit confusing. Do you select the underlying interface with which a given entry is associated by specifying its MAC address? How do I say, "add it to the group and set its MAC address to be the same as some appropriate member?" What's the MAC address of an IPMP interface?
(Note that published ARP entries _can_ have MAC addresses that don't match any local address, so the solution can't restrict the interface to handle only matching addresses.)
-- 
James Carlson, KISS Network                    <james dot d dot carlson at sun dot com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 29, 2005 9:07 AM, in response to: carlsonj
> A doubled line isn't too hard ...
Indeed, Phil Kirk showed me a way to do it. I've updated the document accordingly.
> > > That latter case does in fact happen -- when in.mpathd isn't running.
> >
> > in.mpathd should be viewed as a critical system component -- what happens
> > with IPMP when it's not running is no more relevant than what happens to
> > DR when rcm_daemon isn't running.
>
> I disagree with that. in.mpathd isn't running on systems that don't
> use IPMP. This means that the resulting interface is non-uniform. On
> systems where IPMP is in use, ~RUNNING becomes FAILED. But on systems
> where it's not, ~RUNNING is just on its own. So the symmetry of the
> two flags exists only in _some_ cases.

Hence "what happens with IPMP" in my above statement. Since FAILED will never be set when IPMP is not in use, they will clearly be asymmetric in that case.
Anyway, as previously discussed, I will think about the FAILED/RUNNING interplay a bit more and make a decision shortly.
> As for uniqueness, I don't see how that really matters. If someone
> has multiple addresses mapping to a single name, then the output of
> other commands on that same system (e.g., "netstat -i") is going to
> show the same lack of uniqueness unless "-n" is used.
But it will be harder for scripts to parse the machine-parseable format, because we will not be able to guarantee the uniqueness of each key (see section 4.2.6 of revision 1.3 or later of the document).
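To make the concern concrete, here is a small Python sketch of a script that indexes rows by their first column. The row layout, the group and state values, and the address-to-name mapping are all hypothetical; they are not taken from section 4.2.6, and only the idea of "first column as table key" comes from the discussion.

```python
# Hypothetical rows: (address, group, state). Only the first column,
# the table key, matters for this illustration.
rows = [("10.0.0.1", "a", "up"), ("10.0.0.2", "a", "up")]

# Hypothetical name service in which both addresses map to one hostname.
names = {"10.0.0.1": "host-a", "10.0.0.2": "host-a"}


def index_by_key(table):
    """Index rows by their first column, as a parsing script might."""
    return {row[0]: row for row in table}


by_address = index_by_key(rows)
by_name = index_by_key([(names[addr], grp, st) for addr, grp, st in rows])

print(len(by_address), len(by_name))  # 2 1
```

With numeric keys both rows survive; once the key is translated to a hostname, the second row silently overwrites the first.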
> > > The difference is that it fails from ifconfig today:
> > >
> > > # ifconfig foobar0 plumb group a 10.0.0.1 up
> > > ifconfig: plumb: foobar0: No such file or directory
> > > #
> >
> > I'm confused why this won't fail after the rearchitecture.
>
> Maybe I misunderstood. I read this section as implying that ifconfig
> itself would be changed to deal with interface plumbing failure by
> transferring the address.
That was not the intended implication.
> Or are we sticking with the existing ifparse-based machinery in
> net_include.sh?
Sort of. What will happen is that the boot scripts will first try to plumb everything, and collect a list of the plumb operations that failed. For the set of failed interfaces, it will use ifparse to determine what IPMP group they were supposed to be part of, and add those addresses to the relevant IPMP group. This may potentially create the IPMP group along the way.
Does this address your concern? If so, I will update the document to make this explicit.
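As a rough illustration of the boot-time flow described above, the following Python sketch simulates it. It is not the net_include.sh logic: the `configure_at_boot` function, the `configs` mapping (standing in for what ifparse would extract from each hostname.<if> file), and the `plumb` callable are all hypothetical, and skipping interfaces that name no group is an assumption.

```python
def configure_at_boot(configs, plumb):
    """Simulate the two-pass boot flow.

    configs: interface name -> (group name or None, list of addresses);
             a stand-in for what ifparse would report per interface.
    plumb:   callable taking an interface name, returning True on success.
    Returns: group name -> addresses placed directly on that IPMP group.
    """
    # Pass 1: try to plumb everything, remembering what failed.
    failed = [name for name in configs if not plumb(name)]

    # Pass 2: move the addresses of failed interfaces to their groups.
    groups = {}
    for name in failed:
        group, addrs = configs[name]
        if group is None:
            continue  # assumption: no group means nothing to fall back to
        # setdefault creates the group entry if it does not exist yet.
        groups.setdefault(group, []).extend(addrs)
    return groups


configs = {
    "ce0": ("a", ["10.0.0.2"]),
    "foobar0": ("a", ["10.0.0.1"]),  # this interface does not exist
}
result = configure_at_boot(configs, plumb=lambda name: name != "foobar0")
print(result)  # {'a': ['10.0.0.1']}
```

In this simulation the address configured for the missing "foobar0" ends up on group "a", while "ce0" is plumbed normally and keeps its own configuration.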
> > But this will only affect folks migrating to an IPMP-based configuration.
> > I'd much rather provide clear documentation that static routes should not
> > be associated with underlying interfaces than support the remapping in the
> > kernel.
>
> It seems odd that we'd try harder with /sbin/route, but OK.
The difference is that many sites today use /sbin/route with IPMP.
> > > > > - I had trouble reading this. I assume it means just that
> > > > >   RTM_NEWADDR will occur during address transfer, and not that
> [...]
> > The RTM_NEWADDR discussion you refer to is in the context of removing an
> > underlying interface from an IPMP group. In this case, there cannot be
> > any UP data addresses. Of course, adding a data address directly to an
> > IPMP interface will trigger an RTM_NEWADDR, but I fail to see what that
> > has to do with the behavior of underlying interfaces.
>
> Returning to my original comment: the text wasn't clear. It said that
> the "only" way RTM_NEWADDR happens is with transfer, and that's not
> the case.
This comment is regarding section 5.4.2, which is explicitly about routing socket messages associated with the underlying physical interfaces. The routing socket message you're talking about would be associated with the IPMP group interface (section 5.4.1). However, 5.4.1 does not explicitly discuss the behavior of RTM_NEWADDR or RTM_DELADDR on the IPMP group interface; I will add this explanation.
> Not sure. There are certainly ARP entries for the underlying
> interfaces as well. If there weren't, then you couldn't actually
> probe any of those targets.
I don't see any reason why an application should be mucking with the ARP entries for test addresses -- thus, the expectation is that those ARP entries will be maintained by the kernel, and will not be directly modifiable by applications (they could be indirectly modified by changing a hardware address).
> And when manipulating ARP entries for the IPMP interface, things get a
> bit confusing. Do you select the underlying interface with which a
> given entry is associated by specifying its MAC address? How do I
> say, "add it to the group and set its MAC address to be the same as
> some appropriate member?"
The only time this should happen is proxy ARP, right? In that case, as we discussed, I think the most reasonable behavior is to treat the proxied address as if it belongs to the IPMP interface itself, and migrate it between interfaces in the group according to failure and repair. Thus, if an application asks to establish a binding from IP address I to hardware address H1, but H1 is associated with a failed interface, and H2 (also in the group) is functioning, we will establish a binding from I to H2. Likewise, if the interface associated with H2 later fails, but the interface associated with H1 is working, then the binding will be changed to be from I to H1.
I will update the document if you agree.
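The proposed proxy ARP behavior can be sketched as a small selection function in Python. This is a simulation of the idea only: the `rebind` function and its arguments are hypothetical, and keeping the current binding when no interface in the group is working is an assumption the discussion does not cover.

```python
def rebind(current, group_hwaddrs, failed):
    """Pick the hardware address that IP address I should be bound to.

    current:       hardware address the binding points at (or was requested with)
    group_hwaddrs: hardware addresses of the interfaces in the IPMP group
    failed:        set of hardware addresses whose interfaces have failed
    """
    if current not in failed:
        return current  # the bound interface still works; leave it alone
    for hwaddr in group_hwaddrs:
        if hwaddr not in failed:
            return hwaddr  # migrate to a functioning interface in the group
    return current  # assumption: with nothing working, keep the binding


group = ["H1", "H2"]

# The application asks for I -> H1 while H1's interface is failed.
binding = rebind("H1", group, failed={"H1"})
print(binding)  # H2

# Later H2's interface fails and H1's interface is working again.
binding = rebind(binding, group, failed={"H2"})
print(binding)  # H1
```

The two calls mirror the two cases in the paragraph above: the requested binding to H1 is established on H2, and it moves back to H1 once H2 fails and H1 is working.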
> What's the MAC address of an IPMP interface?
It doesn't have one -- but from the perspective of SIOCG[X]ARP, the IP addresses it hosts are associated with a hardware address associated with one of the interfaces in the group.
-- meem
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 29, 2005 9:19 AM, in response to: meem
Peter Memishian writes:
> > As for uniqueness, I don't see how that really matters. If someone
> > has multiple addresses mapping to a single name, then the output of
> > other commands on that same system (e.g., "netstat -i") is going to
> > show the same lack of uniqueness unless "-n" is used.
>
> But it will be harder for scripts to parse the machine-parseable format,
> because we will not be able to guarantee the uniqueness of each key (see
> section 4.2.6 of revision 1.3 or later of the document).
I still don't see a problem here. For those who really care about the issue, turning off address-to-name mapping is the right answer.
Not all users manage their systems in exactly the same way or necessarily do the same things in every single script.
> > Or are we sticking with the existing ifparse-based machinery in
> > net_include.sh?
>
> Sort of. What will happen is that the boot scripts will first try to
> plumb everything, and collect a list of the plumb operations that failed.
> For the set of failed interfaces, it will use ifparse to determine what
> IPMP group they were supposed to be part of, and add those addresses to
> the relevant IPMP group. This may potentially create the IPMP group
> along the way.
>
> Does this address your concern? If so, I will update the document to make
> this explicit.
The above sounds like a "yes."
> This comment is regarding section 5.4.2, which is explicitly about routing
> socket messages associated with the underlying physical interfaces. The
> routing socket message you're talking about would be associated with the
> IPMP group interface (section 5.4.1). However, 5.4.1 does not explicitly
> discuss the behavior of RTM_NEWADDR or RTM_DELADDR on the IPMP group
> interface; I will add this explanation.
OK.
> > And when manipulating ARP entries for the IPMP interface, things get a
> > bit confusing. Do you select the underlying interface with which a
> > given entry is associated by specifying its MAC address? How do I
> > say, "add it to the group and set its MAC address to be the same as
> > some appropriate member?"
>
> The only time this should happen is proxy ARP, right? In that case, as we
> discussed, I think the most reasonable behavior is to treat the proxied
> address as if it belongs to the IPMP interface itself, and migrate it
> between interfaces in the group according to failure and repair. Thus, if
> an application asks to establish a binding from IP address I to hardware
> address H1, but H1 is associated with a failed interface, and H2 (also in
> the group) is functioning, we will establish a binding from I to H2.
> Likewise, if the interface associated with H2 later fails, but the
> interface associated with H1 is working, then the binding will be changed
> to be from I to H1.
>
> I will update the document if you agree.
Sounds good.
> > What's the MAC address of an IPMP interface?
>
> It doesn't have one -- but from the perspective of SIOCG[X]ARP, the IP
> addresses it hosts are associated with a hardware address associated with
> one of the interfaces in the group.
OK.
-- 
James Carlson, KISS Network                    <james dot d dot carlson at sun dot com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Re: Clearview IPMP Rearchitecture: high-level design: extended to 9/29
Posted: Sep 29, 2005 1:40 PM, in response to: carlsonj
> > But it will be harder for scripts to parse the machine-parseable format,
> > because we will not be able to guarantee the uniqueness of each key (see
> > section 4.2.6 of revision 1.3 or later of the document).
>
> I still don't see a problem here. For those who really care about the
> issue, turning off address-to-name mapping is the right answer.
>
> Not all users manage their systems in exactly the same way or
> necessarily do the same things in every single script.
True. I think I will make this change, but I need to think about it a little more.
I've updated the document to version 1.4. This includes a rewrite of the description of the ARP handling to take into account the issues you brought up, and a host of other smaller clarifications.
At this point, I believe I've addressed all of your feedback, with the following exceptions:
* Changing the handling of FAILED vs ~RUNNING for underlying interfaces (still thinking about this).
* Changing the output of ipmpstat to default to hostnames, with an option to show IP addresses (as per the discussion above).
* Covering IGMP handling -- and, as per Ramesh's earlier feedback, the behavior associated with all-nodes multicasts (and some other minor multicast issues).
* Covering "degraded" in more depth (need to talk to the FMA team about this one).
* Covering the high-level interaction with existing IPMP APIs (asynchronous events and the query interface).
As always, the latest version is here:
http://opensolaris.org/os/community/networking/ipmp-highlevel-design.pdf
Please let me know if I've overlooked anything else.

-- meem