OpenSolaris

Thread: quagga/SMF routing management design review -> due 9 October 2005

Replies: 7 - Last Post: Mar 5, 2006 5:53 PM by: paulj
amaguire

Posts: 303
From: Dublin, Ireland

Registered: 10/17/05
quagga/SMF routing management design review -> due 9 October 2005
Posted: Oct 26, 2005 12:22 AM

hi folks

the quagga/routing management design doc has just been posted at

https://www.opensolaris.org/os/community/networking/quagga-design.pdf

the project aims to replace the SFWzebra routing protocol suite with Quagga (http://www.quagga.net) and, on a related subject (since Quagga for Solaris includes an SMF manifest), update routing management (and routing daemons) to fit with SMF.

whether your interests lie in routing, SMF or elsewhere, we're interested in hearing your comments, and though the project is pretty straightforward, there are collaboration opportunities if people are interested.

as mentioned above, deadline for comments is in 2 weeks time (9 October). thanks!

--
alan maguire (alan dot maguire at sun dot com)

art

Posts: 3
From: tulsa

Registered: 12/27/05
Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 1, 2006 7:34 AM   in response to: amaguire

> hi folks
>
> the quagga/routing management design doc has just
> been posted at
>
> https://www.opensolaris.org/os/community/networking/quagga-design.pdf
>
> the project aims to replace the SFWzebra routing
> protocol suite with Quagga (http://www.quagga.net)
> and, on a related subject (since Quagga for Solaris
> includes an SMF manifest), update routing management
> (and routing daemons) to fit with SMF.
>
> whether your interests lie in routing, SMF or
> elsewhere, we're interested in hearing your comments,
> and though the project is pretty straightforward,
> there are collaboration opportunities if people are
> interested.
>
> as mentioned above, deadline for comments is in 2
> weeks time (9 October). thanks!
>
> --
> alan maguire (alan dot maguire at sun dot com)

Has anyone looked at OpenBSD's design with OpenBGPd?

paulj

Posts: 215
From: Scotland

Registered: 9/15/05
Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 1, 2006 6:40 PM   in response to: art

On Wed, 1 Mar 2006, art wrote:

> Has anyone looked at OpenBSD's design with OpenBGPd?

Yes, it's interesting. They seem to have made a lot of progress, though
they're still missing a few things (as-path regex matching,
route-refresh or soft-reconfig[1]).

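One critique I'd have of OpenBGPd is they seem to have a tendency to
ignore the RFC. The RFC lays out a collision detection procedure for the
case where both peers connect to each other at the same time. OpenBGPd
however does the following when accept()ing connections ('p' represents
the remote peer):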

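    if (p->fd != -1) {
            /* a socket is already open for this peer */
            if (p->state == STATE_CONNECT)
                    /* still in Connect: close the existing connection */
                    session_close_connection(p);
            else {
                    /* otherwise: close the newly accepted connection */
                    close(connfd);
                    return;
            }
    }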

Now, that's a whole lot simpler than the RFC admittedly, and it probably
works for 99.999% of cases, however it's not really robust AFAICT (would
be /great/ if it was). The worst case scenario is where both sides
repeatedly connect to each other at the same time - each time the
connections cross, each time both sides close(). I have a vague memory a
well-known vendor tried this trick before and had to back it out due to
interoperability problems.
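
For contrast, the tie-break the RFC describes for colliding connections is to keep only the connection initiated by the speaker with the higher BGP Identifier. A minimal sketch of that comparison follows. It is an illustration only, not code from Quagga or OpenBGPd: the names `conn_origin` and `collision_survivor` are made up here, and it assumes both identifiers have already been converted to host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Which side opened the TCP connection that survives a collision. */
enum conn_origin {
    INITIATED_BY_LOCAL,
    INITIATED_BY_REMOTE
};

/*
 * Hypothetical helper: both arguments are BGP Identifiers in host byte
 * order, compared as unsigned 32-bit integers. The connection opened by
 * the speaker with the numerically higher identifier is kept; the other
 * connection is the one to close.
 */
enum conn_origin collision_survivor(uint32_t local_id, uint32_t remote_id)
{
    /* Two peers sharing one identifier is a misconfiguration and is
     * assumed to have been rejected before this point. */
    assert(local_id != remote_id);

    return (local_id > remote_id) ? INITIATED_BY_LOCAL : INITIATED_BY_REMOTE;
}
```

Because both ends evaluate the same comparison on the same two identifiers, they agree on which connection to drop, so a pair of crossing connections cannot keep cancelling each other out.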

I've had an email discussion with Henning about this, he swears it's
impossible for both sides to repeatedly and continuously connect() at
the same time. I reckon he just simply hasn't put OpenBGPd under enough
load. He seemed to agree it might be an idea to add deliberate jitter to
the connect() (no idea whether he did).
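
To make the jitter idea concrete, here is a rough sketch of a jittered connect delay. It borrows the 0.75 to 1.0 random factor that RFC 4271 suggests for BGP timers; whether either daemon does anything like this is not claimed. The function name `jittered_delay_ms` is hypothetical, and the caller is assumed to supply `rnd` from whatever random source it has.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale base_ms by a factor in [0.75, 1.0], chosen by rnd
 * (0 maps to 0.75, UINT32_MAX maps to 1.0). Integer arithmetic only.
 */
uint32_t jittered_delay_ms(uint32_t base_ms, uint32_t rnd)
{
    uint64_t floor_ms = ((uint64_t)base_ms * 3) / 4;  /* 0.75 * base */
    uint64_t span_ms = (uint64_t)base_ms - floor_ms;  /* remaining 0.25 */
    uint64_t delay = floor_ms + (span_ms * rnd) / UINT32_MAX;

    assert(delay >= floor_ms && delay <= base_ms);
    return (uint32_t)delay;
}
```

With independent random values on each side, two peers that collided once are unlikely to retry at exactly the same instant again.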

To be fair, Quagga's handling of dual-connections isn't great either. It
falls into the trap of how 1771 /seems/ to specify collision detection
(ie the wrong way), and it gets collision detection wrong sometimes
because of this.

However, OpenBGPd seems to play very loose with a part of BGP that
historically seems rife with interoperability problems - I wonder how
wise their decision was.

Also, contrary to apparently popular opinion amongst OpenBGPd people,
GNU Zebra/Quagga memory usage does /not/ scale with the number of feeds.
It scales according to the number of distinct attributes received,
regardless of how many times they were received.
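
The "stored once, however many times received" behaviour can be sketched as a reference-counted intern table. This is a simplified illustration, not Quagga's actual code: the types `attr` and `attr_cache` and the functions `attr_intern` and `attr_unintern` are invented for the example, the attribute is reduced to an AS-path string, and lookup is a linear list walk where a real implementation would use a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One distinct attribute value, shared by every route that carries it. */
struct attr {
    char *aspath;
    unsigned long refcnt;
    struct attr *next;
};

struct attr_cache {
    struct attr *head;
    size_t distinct;   /* number of distinct values currently stored */
};

/* Return the shared copy of aspath, creating it on first sight. */
struct attr *attr_intern(struct attr_cache *cache, const char *aspath)
{
    struct attr *a;
    size_t len;

    assert(cache != NULL && aspath != NULL);

    for (a = cache->head; a != NULL; a = a->next) {
        if (strcmp(a->aspath, aspath) == 0) {
            a->refcnt++;          /* seen before: no new memory */
            return a;
        }
    }

    a = malloc(sizeof(*a));
    if (a == NULL)
        return NULL;
    len = strlen(aspath) + 1;
    a->aspath = malloc(len);
    if (a->aspath == NULL) {
        free(a);
        return NULL;
    }
    memcpy(a->aspath, aspath, len);
    a->refcnt = 1;
    a->next = cache->head;
    cache->head = a;
    cache->distinct++;
    return a;
}

/* Drop one reference; free the value when nobody uses it any more. */
void attr_unintern(struct attr_cache *cache, struct attr *a)
{
    struct attr **pp;

    assert(cache != NULL && a != NULL && a->refcnt > 0);

    if (--a->refcnt > 0)
        return;

    for (pp = &cache->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == a) {
            *pp = a->next;
            cache->distinct--;
            free(a->aspath);
            free(a);
            return;
        }
    }
}
```

In this scheme a second feed carrying the same AS-paths only bumps reference counts, so memory follows `distinct` rather than the number of sessions.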

So memory usage tends to follow the number of /ASNs/ who give you full
feeds, not the number of sessions. (We could probably change the as-path
cache to be tree based, but to be honest memory is cheap these days;
most new hardware comes with way /more/ than enough memory to
accommodate a /lot/ of sessions. I've never had a bgpd user complain to
me about memory usage other than for slow memory leaks. We've better
things to do at the moment.)

This presumes soft-reconfig is *not* enabled (storing copies of received
routes), which it shouldn't be as nearly everyone supports route-refresh
(dynamically asking peer to resend routes), except of course for
OpenBGPd. ;)

It will be interesting to see if they can manage to retain their reputed
(at least amongst the OpenBGPd community) memory usage and performance as
they start to add some of the more demanding features. (I have a feeling
their memory usage mightn't scale well with number of sessions).

I don't know of any objective comparisons between Quagga and OpenBGPd
unfortunately (particularly performance). I haven't used it myself or
looked at it too much, so I can't really give much in the way of useful
comparisons between the two, other than the (mostly) unlikely nit
mentioned above (FSM handling is the only thing I've really gone and
looked at in OpenBGPd).

That said, it's great to see competition. Choice is good for users.
OpenBGPd does seem to have made great strides in the last (what?) year
and a half / two years. Good to see IMHO.

At present OpenBGPd isn't much of a choice for anyone but OpenBSD users
due to it using OpenBSD-specific kernel interfaces (you can run it as a
non-forwarding peer on Linux and FreeBSD though). Presumably someone
could port the kernel interfaces of OpenBGPd if they wished.

1. Though, apparently that's in the works:

http://undeadly.org/cgi?action=article&sid=20060126160334

Interesting they're trying to do it with one RIB.

regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190
_______________________________________________
networking-discuss mailing list
networking-discuss at opensolaris dot org



paulj

Posts: 215
From: Scotland

Registered: 9/15/05
Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 2, 2006 8:23 AM   in response to: paulj

On Thu, 2 Mar 2006, Paul Jakma wrote:

> It will be interesting to see if they can manage to retain their reputed
> (at least amongst OpenBGPd community) memory usage and performance as
> they start to add some of the more demanding features.

Ok, just for kicks, see the following:

http://archives.neohapsis.com/archives/openbsd/2006-02/0994.html

(note, I picked that because it shows the 'bgpctl' output[1], ignore the
'481MB' - they had just integrated soft-reconfig support into CURRENT
and they were still shaking out regressions. That leak presumably is
fixed.).

According to bgpctl's own output, OpenBGPd uses 132MB of memory. This is
for a router with just one full feed:

"OpnBSD-current, 1 IPv4 full mesh eBGP, 1 IPv6 eBGP (681 routes), 1
iBGP to Box 2, and 10-12 peers (2 or 3 routes per peer)"

It's got:

140820 attributes
29625 AS_PATH attributes.

The actual memory usage will be slightly greater than 132MB due to
overheads, but let's ignore that.

Here's Quagga on a FreeBSD 4 box at a webhosting facility:

6192 root 2 0 117M 117M select 59:52 12.65% 12.65% bgpd
6071 root 2 0 57332K 56936K select 1:25 0.00% 0.00% zebra

About 174MB total: 117MB for bgpd plus 57MB for zebra. The 57MB for
zebra will remain mostly constant once you have a full feed, regardless
of how many BGP sessions/announcements you get. The composite RIB kept by
'zebra' only ever sees the best prefixes, so its RAM usage scales with
the number of distinct prefixes you receive, i.e. with the size of the
global internet routing table (which is just under 180k prefixes at the
moment).

Quagga's bgpd here has:

- 2 full-feed+ upstream connections,
(180k and 200k prefixes)
- 2 peering connections
(28k and 1k prefixes received)
- 63665 BGP AS-PATH entries
(more than twice the number of the OpenBGPd case)
- At least 200k distinct prefixes in its RIB
(at least 10% more than the OpenBGPd case)

And uses significantly less RAM than OpenBGPd, if you discount zebra's
usage - which is replicated information between bgpd and zebra, due to
Quagga's architecture. It's required for being able to choose between
routes from different protocols. Something OpenBGPd does not support -
unless it relies on the kernel to act as RIB between OpenBGPd's routes
and (say) OpenOSPFd's - kernel memory however can be more precious than
userspace memory.

"Aha, but OpenBGPd is configured for soft-reconfiguration here, Quagga
is not!"

Well, fair enough, however Quagga supports "dynamic route refresh".
Further, the reporter above reports OpenBGPd was using up to 80MB
/before/ the upgrade to soft-reconfig. That doesn't seem out of line
with the difference in the number of routes and prefixes between the two
cases.

Further, I have a suspicion Quagga bgpd's memory usage might scale
better than OpenBGPd's. That's without a question of a doubt while
'soft-reconfig' is their only dynamic reconfig option (and enabled by
default too).

I'd love to see someone objectively compare memory usage, and how it
scales, between the two though. I'd be surprised if Quagga didn't scale
as well as or better than OpenBGPd.

Finally, note that their memory usage requirements go up and up as they
add features. In its early days OpenBGPd took about 10MB to 15MB for a
full feed. That seems to have turned into 60 to 80MB as they added more
support for attributes and filtering (and modifying attributes). Now
it's at least 130MB for the exact same case because of soft-reconfig
(and soft-reconfig scales *horribly*).

Note the trend.

I can address performance next, at least from Quagga's POV.

--paulj

1. I honestly could not find any other example of this output, I googled
for 'bgpctl "RDE memory statistics"', if I had found other examples I
would have used those instead. I'm not deliberately "picking on" a
memory-leak report, honest. ;) (ignore the leak - it's happened to
Quagga too ;) )

regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190



paulj

Posts: 215
From: Scotland

Registered: 9/15/05
Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 2, 2006 5:08 PM   in response to: paulj

On Thu, 2 Mar 2006, Paul Jakma wrote:

> "Aha, but OpenBGPd is configured for soft-reconfiguration here, Quagga is
> not!"

And for apples-apples my bgpd guinea-pig went and enabled soft-reconfig
and then hard-cleared his sessions on that machine just for the fun of
it:

" # grep ^route-map /usr/local/etc/quagga/bgpd.conf|wc -l
18

18 route maps for those 2 full feeds which means it needs some
processing and route switching when one comes back up so actually, i
think 70sec for a full flap with an exotic config is quite impressive;)

6192 root 2 0 132M 132M select 71:00 5.27% 5.27% bgpd

still only 132MB"

The 70s figure is apparently the time it took for the /slowest/ feed to
clear, reconnect *and* completely sync up again (he cleared all 4 of his
sessions on that router). The machine apparently is a low-end P4 box
with one channel of DDR-400 RAM (SiS chipset).

Note that Quagga 0.99 *remains responsive* during this. Which brings me
on to performance:

The major failing of GNU Zebra bgpd, and hence inherited by Quagga bgpd,
was its inability to respond to IO while dealing with large-scale BGP
events (ie clearing routes of a peer, due to manual command, keepalive
timeout or connection drop). GNU Zebra's, and hence Quagga's, bgpd would
synchronously do all the processing work required to remove the route
from its RIB, update zebra (synchronously), pick a new one, update
zebra, put the new route in the Adj-Out of peers (according to filters),
etc - not responding to network IO while doing this. GNU Zebra therefore
gained a reputation for dropping sessions.

GNU Zebra's performance in terms of /throughput/ was really good: it
could process a /lot/ of BGP RIB entries in maybe 20% or less of the
time it would take more widely used implementations to do the same work,
on cheaper hardware. However, processing all the RIB updates in 120s
where others might take 600s or more is not much good if you can't
service required protocol IO. In short, the performance was *awful* from
the POV of responsiveness.

I can't speak for OpenBGPd's performance, however by its design it
should be really good at remaining responsive, having split network IO
and route-processing between two separate processes (the "Routing
Decision Engine" and the "Session Engine").

In the Quagga 0.99 development cycle (which we're currently stabilising
for a 1.0.0 release hopefully within the next few months), we went and
fixed this GNU Zebra shortcoming. We have made several key improvements
(and probably sacrificed a small amount of throughput performance in the
process - well worth it):

- the 'zserv' protocol, used for communication between zebra and
  clients such as bgpd, is now extensively buffered, both on the
  input side of zebra and on the output side of bgpd.

- eliminated the long 'update zebra (synchronously)' delay.

  This change was responsible for bringing the interactivity
  'blocking' of bgpd down from approx 60s on a test case of an 800MHz
  machine with two full feeds (then 160k prefixes), where the peer
  whose routes were nearly all 'best' was cleared, such that bgpd had
  to update the best route for all prefixes (ie the worst case), to
  20s or less.

- Paths within bgpd which did a lot of work sequentially, such as:

  - the entry point to rib_process (where new routes are picked
    and propagated to zebra and BGP peers)

  - paths which walk the entire RIB (e.g. to remove routes from
    a peer which went down)

  have been modified to 'packetise' their work into small chunks, via
  workqueues.

On the same testcase, this brought the 'interactivity blocking' down
further from '< 20s' to less than 4s worst case.
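
The 'packetise' idea can be illustrated with a resumable table walk that handles a bounded number of entries per call and then returns control to the event loop. This is only a sketch of the general technique, not Quagga's workqueue API: `rib_walk`, `rib_walk_step`, `rib_visit_fn` and `count_visit` are hypothetical names, and the RIB is reduced to a range of indices.

```c
#include <assert.h>
#include <stddef.h>

/* Progress of one long-running walk over a table of entries. */
struct rib_walk {
    size_t next;   /* index of the next entry to visit */
    size_t total;  /* number of entries in the table */
};

typedef void (*rib_visit_fn)(size_t index, void *arg);

/*
 * Visit at most 'budget' entries, then return so the caller can go
 * back to servicing network IO. Returns 1 if entries remain (the
 * caller should reschedule the walk), 0 once the walk is complete.
 */
int rib_walk_step(struct rib_walk *w, size_t budget,
                  rib_visit_fn visit, void *arg)
{
    size_t done = 0;

    assert(w != NULL && visit != NULL && budget > 0);

    while (w->next < w->total && done < budget) {
        visit(w->next, arg);
        w->next++;
        done++;
    }
    return w->next < w->total;
}

/* Example callback: counts visited entries in a size_t behind arg. */
void count_visit(size_t index, void *arg)
{
    (void)index;
    (*(size_t *)arg)++;
}
```

An event loop would call `rib_walk_step()` once per iteration while it keeps returning 1, so keepalives and other IO get serviced between chunks. The worst-case block is then set by the budget rather than by the size of the table.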

I'm pretty sure we can 'packetise' this ~4s block due to the RIB walk
too (very easy to do now, but I need to think about any possible races -
I think there are none, but need to be sure). Which should bring
worst-case "blocks" down to about 100msec, or lower.

In short, Quagga 0.99 has solved, and hence 1.0.0 will solve, GNU Zebra's
primary deficiency. It's done so in an evolutionary manner, without
resorting to reimplementation (and hence no doubt reimplementing
bugs..).

See also:

http://blogs.sun.com/roller/page/paulj?entry=peer_pressure

In summary:

Compared to OpenBGPd, Quagga's:

- memory usage appears, by available reports, to be in line, for a low
  number of feeds.

  Quagga has a once-off additional deficit, scaled to the number of
  best prefixes (ie just under 60MB with the current DFZ size, on
  32bit i386, just under 100MB for 64bit) due to zebra maintaining a
  composite 'best route' RIB in userspace.

  Quagga's bgpd memory usage is not incomparable to OpenBGPd's.
  Further, until OpenBGPd implements dynamic route-refresh, Quagga's
  memory usage requirements undoubtedly will scale *much* better than
  OpenBGPd's, for the same effective operational capability (ie ability
  to soft-clear).

- the chronic responsiveness problem inherited from GNU Zebra has been
  eliminated. If any such problems remain, we also now have the
  infrastructure in Quagga to fix them relatively easily.

  We've achieved this without destabilising Quagga, and without
  sacrificing its throughput performance to any significant extent,
  which should still be *way* in excess of low-powered commercial
  routers and (presumably) not incomparable to OpenBGPd.

Finally, as noted before, no thorough and/or objective comparisons
exist[1] of the relative performance of Quagga 0.99 (equivalent of
-CURRENT) against OpenBGPd CURRENT. I'd love to see one, and to see
whether or not we still fall behind anywhere, and if so where.

If someone were to take the time to do that, it would make a very nice
'one-pager' paper. ;)

Hope this helps..

1. Closest is probably Hasso Tepper's comparison at:

http://hasso.linux.ee/doku.php/english:network:openbgpd

Replicating his test with Quagga 0.99 would be very useful.

regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190



sleinen

Posts: 20
From: CH

Registered: 6/19/05
Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 3, 2006 12:56 PM   in response to: paulj

Paul,

very interesting posts about Quagga (in comparison with OpenBGPd and in general), thanks.

One small note: While cooperative route filtering is great, "operationally" many ISPs like to be able to look at routes that the other side announced to us, but that were trapped in our filters. It would be great if we could have our cake and eat it too, by using the memory-saving ORF /most of the time/, but being able to (non-disruptively) make the peer send us all routes from time to time for debugging/monitoring. The unfiltered set of routes wouldn't need to go into a long-lived buffer - just log and/or analyze them for statistics. Do you think that would be possible?

To be honest, I don't think we'd use Quagga for /external/ BGP anytime soon (we do use it for OSPFv2/v3 to announce anycast routes from Linux and Solaris servers), so I should really ask our router vendor. But since you know the code I'd like to hear your opinion on the feasibility of this.

paulj

Posts: 215
From: Scotland

Registered: 9/15/05
Re: Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 5, 2006 5:53 PM   in response to: sleinen

On Fri, 3 Mar 2006, Simon Leinen wrote:

> Paul,
>
> very interesting posts about Quagga (in comparison with OpenBGPd and
> in general), thanks.

Welcome. User critique by way of comparing both in practice would be
good too.

> One small note: While cooperative route filtering is great,
> "operationally" many ISPs like to be able to look at routes that the
> other side announced to us, but that were trapped in our filters.

Sure. Then just enable soft-reconfig. In Quagga, any attributes common
to routes in both the Local-RIB and the Adj-In are stored only once (we
have a cache). So it's pretty low overhead, unless you modify the bulk of
the attributes as part of filtering into the Local-RIB.

> It would be great if we could have our cake and eat it too, by using
> the memory-saving ORF /most of the time/, but being able to
> (non-disruptively) make the peer send us all routes from time to time
> for debugging/monitoring. The unfiltered set of routes wouldn't need
> to go into a long-lived buffer - just log and/or analyze them for
> statistics. Do you think that would be possible?

Sort of.

With Quagga you have:

- ability for bgpd to log updates (including in 'MRT' format)
- ability to run 'tcpdump', 'snoop', 'ethereal', etc.. depending on the
capabilities of the host. (tcpdump -w / snoop -o can be useful
obviously)

So you can initiate route-refresh and capture the updates, even without
soft-reconfig.

You can't tell though exactly when the 'refresh' ends, unfortunately.
There is a feature in BGP for this, "End of RIB", which could
potentially be used to signal "Finished sending you my refresh" but it's
tied in with the BGP Graceful-Restart RFC at the moment (unfortunately).
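
For reference, the End-of-RIB marker for IPv4 unicast in the Graceful Restart specification is simply an UPDATE message of the minimum length (23 octets): no withdrawn routes and no path attributes. A rough sketch of how a capture-analysis tool might recognise one is below. The function name `is_ipv4_end_of_rib` is made up for the example, and it assumes `msg` points at one complete BGP message starting at the marker field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BGP_MARKER_LEN      16
#define BGP_TYPE_UPDATE     2
#define BGP_MIN_UPDATE_LEN  23  /* 19-octet header + two zero length fields */

/* Return 1 if msg is a minimum-length (empty) IPv4 unicast UPDATE. */
int is_ipv4_end_of_rib(const uint8_t *msg, size_t len)
{
    size_t i;

    assert(msg != NULL);

    if (len != BGP_MIN_UPDATE_LEN)
        return 0;
    for (i = 0; i < BGP_MARKER_LEN; i++) {
        if (msg[i] != 0xff)            /* marker must be all ones */
            return 0;
    }
    if (((msg[16] << 8) | msg[17]) != BGP_MIN_UPDATE_LEN)
        return 0;                      /* length field, big-endian */
    if (msg[18] != BGP_TYPE_UPDATE)
        return 0;
    if (msg[19] != 0 || msg[20] != 0)  /* withdrawn routes length */
        return 0;
    if (msg[21] != 0 || msg[22] != 0)  /* total path attribute length */
        return 0;
    return 1;
}
```

Whether a given peer actually sends such a marker after a route-refresh is a separate question; as noted above, the marker is defined as part of graceful restart.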

I guess you could just provide an option to buffer received prefixes
(pre-filtering), to some maximum number of prefixes or until user asks
for it to be stopped.

Soft-reconf is probably easier.

> To be honest, I don't think we'd use Quagga for /external/ BGP
> anytime soon

That's understandable for now.

> (we do use it for OSPFv2/v3 to announce anycast routes from Linux and
> Solaris servers),

Neat :)

> so I should really ask our router vendor. But since you know the code
> I'd like to hear your opinion on the feasibility of this.

See above.

regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190



paulj

Posts: 215
From: Scotland

Registered: 9/15/05
Re: Re: quagga/SMF routing management design review -> due 9 October 2005
Posted: Mar 3, 2006 6:31 AM   in response to: paulj

Corrections:

On Thu, 2 Mar 2006, Paul Jakma wrote:

> And uses significantly less RAM than OpenBGPd, if you discount zebra's
> usage - which is replicated information between bgpd and zebra, due to
> Quagga's architecture. It's required for being able to choose between
> routes from different protocols. Something OpenBGPd does not support -
> unless it relies on the kernel to act as RIB between OpenBGPd's routes

Ok, OpenBGPd /does/ maintain a userspace copy of the RIB. 5MB for 155k
apparently. However, it doesn't support:

- recursive nexthops for BGP, to allow BGP routes to follow through IGP
routes.
(Now I understand why the OpenBGPd presentations make such a huge deal
about having link-state available to BGP.)
- preferences between different kinds of protocols ('administrative
distance')
- recording of protocol metrics
- statistics on route changes ("look in the logs" is the answer instead
I believe)
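
As an illustration of the 'administrative distance' item, here is a minimal sketch of how a composite RIB might choose between candidate routes for the same prefix that were learned from different protocols. It is not zebra's actual code: `route_candidate` and `select_route` are hypothetical names, and any distance values a caller passes in are whatever the operator has configured.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate route for a prefix, as offered by some protocol. */
struct route_candidate {
    const char *proto;     /* label only, e.g. "ospf" */
    unsigned distance;     /* administrative distance: lower wins */
    unsigned metric;       /* protocol metric: tie-break only */
};

/*
 * Pick the candidate with the lowest administrative distance. The
 * metric is consulted only between candidates of equal distance,
 * which in practice usually means candidates from the same protocol.
 * Returns NULL when there are no candidates.
 */
const struct route_candidate *
select_route(const struct route_candidate *cands, size_t n)
{
    const struct route_candidate *best = NULL;
    size_t i;

    assert(cands != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct route_candidate *c = &cands[i];

        if (best == NULL ||
            c->distance < best->distance ||
            (c->distance == best->distance && c->metric < best->metric))
            best = c;
    }
    return best;
}
```

The point of the separate distance field is that metrics from different protocols are not comparable with each other, so something outside the protocols has to rank them.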

That said, there is some silly 'fat' in Quagga's zebra (storing
unimportant information per route we can retrieve elsewhere) and our
recursive nexthops are a bolt-on and wasteful of space - has to be
overhauled at some stage.

So we'll see what we can do there post-1.0.

The rest of the comparison should stand.

> than OpenBGPd's. That's without a question of a doubt while 'soft-reconfig'
> is their only dynamic reconfig option (and enabled by default too).

Interesting thing here, I drew the conclusion that they did not support
route-refresh (RR) from looking at their documentation, and from the
fact they went to the effort of implementing stored-route soft-reconfig.
But on looking at the code they actually do /respond/ to RR messages,
and will resend /others/ their best routes in response. However OpenBGPd
does not appear able to /send/ RR, AFAICT.

Which is strange, I've either missed where they're generating RR (and
how the user initiates route-refresh), or else perhaps there is some
technical reason for OpenBGPd not being able to handle routes being
resent to it.

Curious :).

regards,
--
Paul Jakma,
Network Approachability, KISS. http://quagga.ireland.sun.com/
Sun Microsystems, Dublin, Ireland. tel: EMEA x19190 / +353 1 819 9190





