Posts:
303
From:
Dublin, Ireland
Registered:
10/17/05
quagga/SMF routing management design review -> due 9 October 2005
Posted:
Oct 26, 2005 12:22 AM
hi folks
the quagga/routing management design doc has just been posted at
https://www.opensolaris.org/os/community/networking/quagga-design.pdf
the project aims to replace the SFWzebra routing protocol suite with Quagga (http://www.quagga.net) and, on a related subject (since Quagga for Solaris includes an SMF manifest), update routing management (and routing daemons) to fit with SMF.
whether your interests lie in routing, SMF or elsewhere, we're interested in hearing your comments, and though the project is pretty straightforward, there are collaboration opportunities if people are interested.
as mentioned above, deadline for comments is in 2 weeks' time (9 October). thanks!
-- alan maguire (alan dot maguire at sun dot com)
Posts:
3
From:
tulsa
Registered:
12/27/05
Re: quagga/SMF routing management design review -> due 9 October 2005
Posted:
Mar 1, 2006 7:34 AM
in response to: amaguire
> hi folks
>
> the quagga/routing management design doc has just been posted at
>
> https://www.opensolaris.org/os/community/networking/quagga-design.pdf
>
> the project aims to replace the SFWzebra routing protocol suite with
> Quagga (http://www.quagga.net) and, on a related subject (since Quagga
> for Solaris includes an SMF manifest), update routing management (and
> routing daemons) to fit with SMF.
>
> whether your interests lie in routing, SMF or elsewhere, we're
> interested in hearing your comments, and though the project is pretty
> straightforward, there are collaboration opportunities if people are
> interested.
>
> as mentioned above, deadline for comments is in 2 weeks time
> (9 October). thanks!
>
> --
> alan maguire (alan dot maguire at sun dot com)
Has anyone looked at OpenBSD's design with OpenBGPD?
Posts:
215
From:
Scotland
Registered:
9/15/05
Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 1, 2006 6:40 PM
in response to: art
On Wed, 1 Mar 2006, art wrote:
> Has anyone looked at openbsd's design with openbgpd
Yes, it's interesting. They seem to have made a lot of progress, though they're still missing a few things (as-path regex matching, route-refresh or soft-reconfig[1]).
One critique I'd have of OpenBGPd is they seem to have a tendency to ignore the RFC, for instance its connection collision detection procedure. OpenBGPd does the following when accept()ing connections ('p' represents the remote peer):
    if (p->fd != -1) {
        if (p->state == STATE_CONNECT)
            session_close_connection(p);
        else {
            close(connfd);
            return;
        }
    }
Now, that's a whole lot simpler than the RFC admittedly, and it probably works for 99.999% of cases, however it's not really robust AFAICT (would be /great/ if it was). The worst case scenario is where both sides repeatedly connect to each other at the same time - each time the connections cross, each time both sides close(). I have a vague memory a well-known vendor tried this trick before and had to back it out due to interoperability problems.
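For contrast, the convention stated in the RFC (section 6.8 of RFC 1771, carried over into RFC 4271) is to break the tie by comparing BGP Identifiers and to keep only the connection initiated by the speaker with the higher one. A minimal sketch of that tie-break follows; the enum and function names are hypothetical and this is not code from Quagga or OpenBGPd.

```c
#include <stdint.h>

/* Which of two colliding connections to a peer should survive. */
enum conn_choice {
    KEEP_OUTBOUND, /* the connection we initiated with connect() */
    KEEP_INBOUND   /* the connection the peer initiated, seen via accept() */
};

/*
 * Tie-break in the style of RFC 1771/4271 section 6.8. Both arguments are
 * 4-octet BGP Identifiers converted to host byte order. The connection
 * initiated by the speaker with the higher identifier is kept, so both
 * ends reach the same decision and exactly one connection survives.
 * Equal identifiers are a configuration error and are not handled here.
 * (Hypothetical helper for illustration only.)
 */
enum conn_choice resolve_collision(uint32_t local_id, uint32_t remote_id)
{
    return (local_id < remote_id) ? KEEP_INBOUND : KEEP_OUTBOUND;
}
```

Because both peers compare the same two numbers, the side with the lower identifier drops its own outbound connection while the other side drops the inbound one, so the two ends cannot both close everything.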
I've had an email discussion with Henning about this, he swears it's impossible for both sides to repeatedly and continuously connect() at the same time. I reckon he just simply hasn't put OpenBGPd under enough load. He seemed to agree it might be an idea to add deliberate jitter to the connect() (no idea whether he did).
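The "deliberate jitter" idea can be sketched as scaling the retry interval by a random factor, so that two peers which collided once are unlikely to retry at the same instant again. The 0.75 to 1.0 range below is the one RFC 4271 suggests for BGP timers; the function name and its parameters are assumptions for illustration, not anything taken from OpenBGPd.

```c
/*
 * Scale a base retry interval (in milliseconds) by a factor in
 * [0.75, 1.0]. 'r' is a raw random sample in [0, r_max], for example
 * from rand() with r_max = RAND_MAX; it is passed in so the function
 * stays deterministic and easy to test.
 * (Hypothetical helper for illustration only.)
 */
unsigned long jittered_delay_ms(unsigned long base_ms, unsigned long r,
                                unsigned long r_max)
{
    /* Integer form of base * (0.75 + 0.25 * r / r_max). */
    unsigned long fixed = base_ms - base_ms / 4;
    unsigned long spread = base_ms / 4;

    if (r_max == 0)
        return base_ms;
    return fixed + (unsigned long)(((unsigned long long)spread * r) / r_max);
}
```

A caller would use something like jittered_delay_ms(120000, (unsigned long)rand(), RAND_MAX) each time it re-arms its connect retry timer.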
To be fair, Quagga's handling of dual-connections isn't great either. It falls into the trap of how 1771 /seems/ to specify collision detection (ie the wrong way), and it gets collision detection wrong sometimes because of this.
However, OpenBGPd seems to play very loose with a part of BGP that historically seems rife with interoperability problems - I wonder how wise their decision was.
Also, contrary to apparently popular opinion amongst OpenBGPd people, GNU Zebra/Quagga memory usage does /not/ scale with the number of feeds. It scales according to the number of distinct attributes received, regardless of how many times they were received.
So memory usage tends to follow the number of /ASNs/ who give you full feeds, not the number of sessions. (We could probably change the as-path cache to be tree based, but to be honest memory is cheap these days, and most new hardware comes with way /more/ than enough memory to accommodate a /lot/ of sessions. I've never had a bgpd user complain to me about memory usage other than for slow memory leaks. We've better things to do at the moment.)
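The reason usage follows distinct attributes rather than sessions is that each attribute value is stored once and shared by every route that carries it. A minimal sketch of such an interning cache is below; the structure names, the fixed-size table and the linear scan are all simplifications assumed for illustration, not Quagga's actual data structures.

```c
#include <stdlib.h>
#include <string.h>

#define ATTR_CACHE_MAX 64  /* tiny fixed table, enough for a demo */

struct attr_entry {
    char *aspath;       /* the single stored copy of the attribute value */
    unsigned refcnt;    /* number of routes sharing this copy */
};

struct attr_cache {
    struct attr_entry slot[ATTR_CACHE_MAX];
    size_t used;
};

/*
 * Return the shared copy of 'aspath', storing it only if it has not been
 * seen before. Returns NULL if the table is full or allocation fails.
 * A linear scan keeps the sketch short; a real cache would hash.
 */
const char *attr_intern(struct attr_cache *c, const char *aspath)
{
    size_t i, len;
    char *copy;

    for (i = 0; i < c->used; i++) {
        if (strcmp(c->slot[i].aspath, aspath) == 0) {
            c->slot[i].refcnt++;          /* already stored: just share it */
            return c->slot[i].aspath;
        }
    }
    if (c->used == ATTR_CACHE_MAX)
        return NULL;
    len = strlen(aspath) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, aspath, len);
    c->slot[c->used].aspath = copy;
    c->slot[c->used].refcnt = 1;
    c->used++;
    return copy;
}
```

Receiving the same AS path over ten sessions costs one stored string plus ten references, which is why a second full feed from the same upstream ASN adds comparatively little.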
This presumes soft-reconfig is *not* enabled (storing copies of received routes), which it shouldn't be as nearly everyone supports route-refresh (dynamically asking peer to resend routes), except of course for OpenBGPd. ;)
Be interesting to see if they can manage to retain their reputed (at least amongst the OpenBGPd community) memory usage and performance as they start to add some of the more demanding features. (I have a feeling their memory usage mightn't scale well with the number of sessions.)
I don't know of any objective comparisons between Quagga and OpenBGPd unfortunately (particularly on performance). I haven't used it myself or looked at it too much, so I can't really give much in the way of useful comparisons between the two, other than the (mostly) unlikely nit mentioned above (FSM handling is the only thing I've really gone and looked at in OpenBGPd).
That said, it's great to see competition. Choice is good for users. OpenBGPd does seem to have made great strides in the last (what?) year and a half / two years. Good to see IMHO.
At present OpenBGPd isn't much of a choice for anyone but OpenBSD users due to it using OpenBSD specific kernel interfaces (You can run it as a non-forwarding peer on Linux and FreeBSD though). Presumably someone could port the kernel interfaces of OpenBGPd if they wished.
1. Though, apparently that's in the works:
http://undeadly.org/cgi?action=article&sid=20060126160334
Interesting they're trying to do it with one RIB.
regards,
--
Paul Jakma, Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190
_______________________________________________
networking-discuss mailing list
networking-discuss at opensolaris dot org
Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 2, 2006 8:23 AM
in response to: paulj
On Thu, 2 Mar 2006, Paul Jakma wrote:
> Be interesting to see if they can manage to retain their reputed
> (at least amongst OpenBGPd community) memory usage and performance as
> they start to add some of the more demanding features.
Ok, just for kicks, see the following:
http://archives.neohapsis.com/archives/openbsd/2006-02/0994.html
(note, I picked that because it shows the 'bgpctl' output[1], ignore the '481MB' - they had just integrated soft-reconfig support into CURRENT and they were still shaking out regressions. That leak presumably is fixed.).
According to bgpctl's own output, OpenBGPd uses 132MB of memory. This is for a router with just one full feed:
"OpnBSD-current, 1 IPv4 full mesh eBGP, 1 IPv6 eBGP (681 routes), 1 iBGP to Box 2, and 10-12 peers (2 or 3 routes per peer)"
It's got:
140820 attributes
29625 AS_PATH attributes.
The actual memory usage will be slightly greater than 132MB due to overheads, but let's ignore that.
Here's Quagga on a FreeBSD 4 box at a webhosting facility:
6192 root 2 0 117M 117M select 59:52 12.65% 12.65% bgpd
6071 root 2 0 57332K 56936K select 1:25 0.00% 0.00% zebra
Roughly 173MB total: 117MB for bgpd plus about 56MB for zebra. The zebra figure will remain mostly constant once you have a full feed, regardless of how many BGP sessions/announcements you get. The composite RIB kept by 'zebra' only ever sees the best prefixes, so its RAM usage scales with the number of distinct prefixes you receive, i.e. with the size of the global internet routing table (which is just under 180k prefixes at the moment).
Quagga's bgpd here has:
- 2 full-feed+ upstream connections, (180k and 200k prefixes)
- 2 peering connections (28k and 1k prefixes received)
- 63665 BGP AS-PATH entries (more than twice the number of the OpenBGPd case)
- At least 200k distinct prefixes in its RIB (at least 10% more than the OpenBGPd case)
And uses significantly less RAM than OpenBGPd, if you discount zebra's usage - which is replicated information between bgpd and zebra, due to Quagga's architecture. It's required for being able to choose between routes from different protocols. Something OpenBGPd does not support - unless it relies on the kernel to act as RIB between OpenBGPd's routes and (say) OpenOSPFd's - kernel memory however can be more precious than userspace memory.
"Aha, but OpenBGPd is configured for soft-reconfiguration here, Quagga is not!"
Well, fair enough, however Quagga supports "dynamic route refresh". Further, the reporter above reports OpenBGPd was using up to 80MB /before/ the upgrade to soft-reconfig. That doesn't seem out of line with the difference in the number of routes and prefixes between the two cases.
Further, I have a suspicion Quagga bgpd's memory usage might scale better than OpenBGPd's. That's without a question of a doubt while 'soft-reconfig' is their only dynamic reconfig option (and enabled by default too).
I'd love to see someone objectively compare memory usage, and how it scales, between the two though. I'd be surprised if Quagga didn't scale as well as or better than OpenBGPd.
Finally, note that their memory usage requirements go up and up as they add features. In its early days OpenBGPd took about 10MB to 15MB for a full feed. That seems to have turned into 60 to 80MB as they added more support for attributes and filtering (and modifying attributes). Now it's at least 130MB for the exact same case because of soft-reconfig (and soft-reconfig scales *horribly*).
Note the trend.
I can address performance next, at least from Quagga's POV.
--paulj
1. I honestly could not find any other example of this output, I googled for 'bgpctl "RDE memory statistics"', if I had found other examples I would have used those instead. I'm not deliberately "picking on" a memory-leak report, honest. ;) (ignore the leak - it's happened to Quagga too ;) )
regards,
--
Paul Jakma
Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 2, 2006 5:08 PM
in response to: paulj
On Thu, 2 Mar 2006, Paul Jakma wrote:
> "Aha, but OpenBGPd is configured for soft-reconfiguration here, Quagga is
> not!"
And for apples-apples my bgpd guinea-pig went and enabled soft-reconfig and then hard-cleared his sessions on that machine just for the fun of it:
" # grep ^route-map /usr/local/etc/quagga/bgpd.conf|wc -l
18
18 route maps for those 2 full feeds which means it needs some processing and route switching when one comes back up so actually, i think 70sec for a full flap with an exotic config is quite impressive;)
6192 root 2 0 132M 132M select 71:00 5.27% 5.27% bgpd
still only 132MB"
The 70s figure is apparently the time it took for the /slowest/ feed to clear, reconnect *and* completely sync up again (he cleared all 4 of his sessions on that router). The machine apparently is a low-end P4 box with one channel of DDR-400 RAM (SiS chipset).
Note that Quagga 0.99 *remains responsive* during this. Which brings me on to performance:
The major failing of GNU Zebra bgpd, and hence inherited by Quagga bgpd, was its inability to respond to IO while dealing with large-scale BGP events (ie clearing routes of a peer, due to manual command, keepalive timeout or connection drop). GNU Zebra's, and hence Quagga's, bgpd would synchronously do all the processing work required to remove the route from its RIB, update zebra (synchronously), pick a new one, update zebra, put the new route in the Adj-Out of peers (according to filters), etc - not responding to network IO while doing this. GNU Zebra therefore has gained a reputation for dropping sessions.
GNU Zebra's performance in terms of /throughput/ was really good, it could process a /lot/ of BGP RIB entries in maybe 20% or less of the time it would take more widely used implementations to do the same work, on cheaper hardware. However, processing all the RIB updates in 120s where others might take 600s or more is not much good if you can't service required protocol IO. In short the performance was *awful* from the POV of responsiveness.
I can't speak for OpenBGPd's performance, however by its design it should be really good at remaining responsive, having split network IO and route-processing between two separate processes (the "Routing Decision Engine" and the "Session Engine").
In the Quagga 0.99 development cycle (which we're currently stabilising for a 1.0.0 release hopefully within the next few months), we went and fixed this GNU Zebra shortcoming. We have made several key improvements (and probably sacrificed a small amount of throughput performance in the process - well worth it):
- the 'zserv' protocol, used for communication between zebra and clients such as bgpd, is now extensively buffered, both on the input side of zebra, and on the output side of bgpd.
- eliminated the long 'update zebra (synchronously)' delay.
This change was responsible for bringing the interactivity 'blocking' of bgpd down from approx 60s on a test case of an 800MHz machine with two full feeds (then 160k prefixes), where the peer whose routes were nearly all 'best' was cleared, such that bgpd had to update the best route for all prefixes (ie the worst case), to 20s or less.
- Paths within bgpd which did a lot of work sequentially, such as:
- the entry point to rib_process (where new routes are picked and propagated to zebra and BGP peers)
- paths which walk the entire RIB (e.g. to remove routes from a peer which went down)
have been modified to 'packetise' their work into small chunks, via workqueues.
On the same testcase, this brought the 'interactivity blocking' down further from '< 20s' to less than 4s worst case.
I'm pretty sure we can 'packetise' this ~4s block due to the RIB walk too (very easy to do now, but I need to think about any possible races - I think there are none, but need to be sure). Which should bring worst-case "blocks" down to about 100msec, or lower.
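The 'packetising' described above amounts to doing a bounded amount of work per pass and then handing control back to the event loop. A minimal sketch under that assumption follows; the structure, the function names and the callback signature are hypothetical and are not Quagga's workqueue API.

```c
#include <stddef.h>

/* Hypothetical cursor over a table of routes that still need processing. */
struct rib_walk {
    size_t next;   /* index of the next route to process */
    size_t total;  /* number of routes in the table */
};

/*
 * Process at most 'budget' routes, then return so the caller's event loop
 * can service sockets and timers. Returns the number of routes still
 * pending; the caller reschedules the walk while this is non-zero.
 * 'process' is called once per route index with the caller's context.
 */
size_t rib_walk_slice(struct rib_walk *w, size_t budget,
                      void (*process)(size_t idx, void *ctx), void *ctx)
{
    while (budget > 0 && w->next < w->total) {
        process(w->next, ctx);
        w->next++;
        budget--;
    }
    return w->total - w->next;
}

/* Example callback: just counts how many routes were visited. */
void count_route(size_t idx, void *ctx)
{
    (void)idx;
    (*(size_t *)ctx)++;
}
```

The worst-case pause then depends on the budget and the per-route cost rather than on the size of the whole table, which is the trade of a little throughput for responsiveness mentioned earlier.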
In short, Quagga 0.99 has solved, and hence 1.0.0 will solve, GNU Zebra's primary deficiency. It's done so in an evolutionary manner, without resorting to reimplementation (and hence no doubt reimplementing bugs..).
See also:
http://blogs.sun.com/roller/page/paulj?entry=peer_pressure
In summary:
Compared to OpenBGPd, Quagga's:
- memory usages appear, by available reports, to be in-line, for low number of feeds.
Quagga has a once-off additional deficit, scaled to the number of best prefixes (ie just under 60MB with the current DFZ size, on 32bit i386, just under 100MB for 64bit) due to zebra maintaining a composite 'best route' RIB in userspace.
Quagga's bgpd memory usage is not incomparable to OpenBGPd's. Further, until OpenBGPd implements dynamic route-refresh, Quagga's memory usage requirements undoubtedly will scale *much* better than OpenBGPd's, for the same effective operational capability (ie ability to soft-clear).
- the chronic responsiveness problem inherited from GNU Zebra has been eliminated. If any such problems remain, we also now have the infrastructure in Quagga to fix them relatively easily.
We've achieved this without destabilising Quagga, and without sacrificing its throughput performance to any significant extent, which should still be *way* in excess of low-powered commercial routers and (presumably) not incomparable to OpenBGPd.
Finally, as noted before, no thorough and/or objective comparisons exist[1] of the relative performance of Quagga 0.99 (equivalent of -CURRENT) against OpenBGPd CURRENT. I'd love to see one, and to see whether or not we still fall behind anywhere, and if so where.
If someone were to take the time to do that, it would make a very nice 'one-pager' paper. ;)
Hope this helps..
1. Closest is probably Hasso Tepper's comparison at:
http://hasso.linux.ee/doku.php/english:network:openbgpd
Replicating his test with Quagga 0.99 would be very useful.
regards,
--
Paul Jakma
Posts:
20
From:
CH
Registered:
6/19/05
Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 3, 2006 12:56 PM
in response to: paulj
Paul,
very interesting posts about Quagga (in comparison with OpenBGPd and in general), thanks.
One small note: While cooperative route filtering is great, "operationally" many ISPs like to be able to look at routes that the other side announced to us, but that were trapped in our filters. It would be great if we could have our cake and eat it too, by using the memory-saving ORF /most of the time/, but being able to (non-disruptively) make the peer send us all routes from time to time for debugging/monitoring. The unfiltered set of routes wouldn't need to go into a long-lived buffer - just log and/or analyze them for statistics. Do you think that would be possible?
To be honest, I don't think we'd use Quagga for /external/ BGP anytime soon (we do use it for OSPFv2/v3 to announce anycast routes from Linux and Solaris servers), so I should really ask our router vendor. But since you know the code I'd like to hear your opinion on the feasibility of this.
Re: Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 5, 2006 5:53 PM
in response to: sleinen
On Fri, 3 Mar 2006, Simon Leinen wrote:
> Paul,
>
> very interesting posts about Quagga (in comparison with OpenBGPd and
> in general), thanks.
Welcome. User critique by way of comparing both in practice would be good too.
> One small note: While cooperative route filtering is great,
> "operationally" many ISPs like to be able to look at routes that the
> other side announced to us, but that were trapped in our filters.
Sure. Then just enable soft-reconfig. In Quagga, any attributes common to routes in both the Local-RIB and the Adj-In are stored only once (we have a cache). So it's pretty low overhead, unless you modify the bulk of the attributes as part of filtering into the Local-RIB.
> It would be great if we could have our cake and eat it too, by using
> the memory-saving ORF /most of the time/, but being able to
> (non-disruptively) make the peer send us all routes from time to time
> for debugging/monitoring. The unfiltered set of routes wouldn't need
> to go into a long-lived buffer - just log and/or analyze them for
> statistics. Do you think that would be possible?
Sort of.
With Quagga you have:
- ability for bgpd to log updates (including in 'MRT' format)
- ability to run 'tcpdump', 'snoop', 'ethereal', etc.. depending on the capabilities of the host. (tcpdump -w / snoop -o can be useful obviously)
So you can initiate route-refresh and capture the updates, even without soft-reconfig.
You can't tell though exactly when the 'refresh' ends, unfortunately. There is a feature in BGP for this, "End of RIB", which could potentially be used to signal "Finished sending you my refresh" but it's tied in with the BGP Graceful-Restart RFC at the moment (unfortunately).
I guess you could just provide an option to buffer received prefixes (pre-filtering), to some maximum number of prefixes or until user asks for it to be stopped.
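A minimal sketch of what such a bounded pre-filter buffer could look like is below. Nothing like this exists in Quagga as far as this thread says; every name and the tiny CAPTURE_MAX limit are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CAPTURE_MAX 4  /* small limit for illustration; would be configurable */

/* An IPv4 prefix as received, before any inbound filter is applied. */
struct prefix4 {
    uint32_t addr;   /* network address, host byte order */
    uint8_t  len;    /* prefix length, 0..32 */
};

struct capture {
    struct prefix4 buf[CAPTURE_MAX];
    size_t count;    /* prefixes stored so far */
    int active;      /* non-zero while the user wants capture running */
};

/*
 * Record one pre-filter prefix. Returns 1 if stored, 0 if capture is
 * stopped or the limit was reached (capture then switches itself off).
 */
int capture_add(struct capture *c, struct prefix4 p)
{
    if (!c->active)
        return 0;
    if (c->count == CAPTURE_MAX) {
        c->active = 0;
        return 0;
    }
    c->buf[c->count++] = p;
    return 1;
}
```

The user-initiated stop is simply clearing 'active'; the cap bounds memory even if the peer never signals that its refresh has finished.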
Soft-reconf is probably easier.
> To be honest, I don't think we'd use Quagga for /external/ BGP
> anytime soon
That's understandable for now.
> (we do use it for OSPFv2/v3 to announce anycast routes from Linux and
> Solaris servers),
Neat :)
> so I should really ask our router vendor. But since you know the code
> I'd like to hear your opinion on the feasibility of this.
See above.
regards,
--
Paul Jakma
Re: Re: quagga/SMF routing management design
review -> due 9 October 2005
Posted:
Mar 3, 2006 6:31 AM
in response to: paulj
Corrections:
On Thu, 2 Mar 2006, Paul Jakma wrote:
> And uses significantly less RAM than OpenBGPd, if you discount zebra's
> usage - which is replicated information between bgpd and zebra, due to
> Quagga's architecture. It's required for being able to choose between
> routes from different protocols. Something OpenBGPd does not support -
> unless it relies on the kernel to act as RIB between OpenBGPd's routes
Ok, OpenBGPd /does/ maintain a userspace copy of the RIB. 5MB for 155k apparently. However, it doesn't support:
- recursive nexthops for BGP, to allow BGP routes to follow through IGP routes. (Now I understand why the OpenBGPd presentations make such a huge deal about having link-state available to BGP.)
- preferences between different kinds of protocols ('administrative distance')
- recording of protocol metrics
- statistics on route changes ("look in the logs" is the answer instead I believe)
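For readers unfamiliar with 'administrative distance', the idea is that when several protocols offer a route for the same prefix, the RIB prefers the one from the most trusted protocol before looking at anything else. A minimal sketch follows; the structure and function names are hypothetical rather than zebra's, and using the metric as a tie-break is a simplification that only makes sense between routes from the same protocol.

```c
#include <stddef.h>

/* One candidate route for a prefix, as offered by some protocol. */
struct candidate {
    const char *proto;       /* label only, e.g. "ospf" */
    unsigned char distance;  /* administrative distance, lower wins */
    unsigned long metric;    /* protocol metric, tie-break within a distance */
};

/*
 * Pick the preferred candidate: lowest administrative distance first,
 * then lowest metric. Returns NULL if there are no candidates.
 */
const struct candidate *select_best(const struct candidate *c, size_t n)
{
    const struct candidate *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (best == NULL
            || c[i].distance < best->distance
            || (c[i].distance == best->distance
                && c[i].metric < best->metric))
            best = &c[i];
    }
    return best;
}
```

With the conventional defaults of 110 for OSPF and 200 for iBGP, an OSPF route would be chosen over an iBGP route to the same prefix regardless of their metrics.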
That said, there is some silly 'fat' in Quagga's zebra (storing unimportant information per route we can retrieve elsewhere) and our recursive nexthops are a bolt-on and wasteful of space - has to be overhauled at some stage.
So we'll see what we can do there post-1.0.
The rest of the comparison should stand.
> than OpenBGPd's. That's without a question of a doubt while 'soft-reconfig'
> is their only dynamic reconfig option (and enabled by default too).
Interesting thing here, I drew the conclusion that they did not support RR from looking at their documentation, and from the fact they went to the effort of implementing stored-route soft-reconfig. But on looking at the code they actually do /respond/ to RR messages, and will resend /others/ their best routes in response. However OpenBGPd does not appear able to /send/ RR, AFAICT.
Which is strange, I've either missed where they're generating RR (and how the user initiates route-refresh), or else perhaps there is some technical reason for OpenBGPd not being able to handle routes being resent to it.
Curious :).
regards,
--
Paul Jakma