Better Enterprise Multihoming

31

9

I would like to get some opinions regarding ways in which I can improve a BGP dual-provider, dual-router design. Each provider supplies a /24 public subnet. I will refer to the routers, circuits, subnets, HSRP groups and providers as A and B, respectively. The bandwidth on each circuit is adequate for the entire load.

Current Design

The current design attempts to achieve per-provider symmetry. In a steady state, the intended routing logic is that traffic to/from subnet A transits only circuit A and traffic to/from subnet B transits only circuit B. The circuits would back each other up in a failed state.

The providers advertise only the default route. Outbound routing entails a mix of PBR and HSRP. The routers have no routing between them: no iBGP, no OSPF, no static routing. Instead, there are two HSRP groups tracking the default route. Router A is primary for HSRP group A and router B is primary for HSRP group B. Downstream devices have a default route pointing to HSRP group A and PBR which directs traffic sourced from subnet B to HSRP group B. Inbound routing is influenced with prepending and communities: subnet A is prepended and tagged with communities on circuit B, and subnet B is prepended and tagged with communities on circuit A.

I see much room for improvement in this design. The lack of Internet topology awareness combined with circuit affinity completely eliminates best path selection. There are concerns about the tier designation of the providers, and the design has been rationalized as providing ‘acceptable performance’ and being simpler to troubleshoot. Indeed, the design couldn’t possibly get any simpler. I have demonstrated that transiting an extra AS adds 6 hops and 63 ms (+421%) to the RTT. I would prefer not to settle for acceptable.
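As a quick check of those figures: if the extra 63 ms is a 421% increase, the baseline RTT works out to roughly 15 ms and the longer path to roughly 78 ms. A minimal sketch of the arithmetic, assuming the percentage is measured relative to the shorter path (the variable names are illustrative only):

```python
# Assumption: "+421%" means the 63 ms increase is 421% of the baseline RTT.
added_ms = 63.0
increase_ratio = 4.21

baseline_ms = added_ms / increase_ratio  # RTT via the shorter path
longer_ms = baseline_ms + added_ms       # RTT when transiting the extra AS

print(f"baseline ~{baseline_ms:.1f} ms, longer path ~{longer_ms:.1f} ms")
```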

Better Design

The better design provides the routers with as much Internet topology awareness as possible. The best path algorithm is left to determine the inbound and outbound routing logic. The circuits would back each other up in a failed state.

The providers advertise the full view. The routers run iBGP and OSPF. HSRP is eliminated. The outbound routing would be purely destination-based best path and the inbound routing would be left to the best path algorithm and transit provider whims.

Now that I type it out, it does seem simpler. At the very least, it took fewer words to explain. There are concerns about asymmetry, but I have seen plenty of asymmetry in the current design. I would think they are probably equally prone to asymmetry and it really doesn’t worry me. We have never seen problems as a result. Presently it is relegated to the realm of ifs, “What ‘if’ we had to troubleshoot something?”

Am I way off base here or did I hit the nail on the head? How have others solved this problem? What would Google do?

Dennis Olvany

Posted 2013-05-08T01:54:05.750

Reputation: 1 625

Great detail and explanation. Welcome! – Pandom – 2013-05-08T02:16:42.460

Traditionally, "I would like to get some opinions on my design" questions aren't really great SE questions... But this can be discussed on meta – Aaron – 2013-05-08T03:11:59.063

Answers

14

Yes, you did hit the nail on the head.

You will get asymmetry in the improved design, but asymmetry is a fact of life on the Internet, and there's really no good reason to expect symmetric routing of traffic to and from a given destination. Shoot, the whole concept of packet routing is that separate packets are routed independently of each other and may take different paths, even packets going in the same direction.

Personally, I loathe PBR. It's one of those technologies where, when I decide that it's the best solution to the problem, I stop and take a step back to see if I really understand the real nature of the problem, even back to figuring out what the business problem to be solved is. When I do so, I almost always find that there is a way to solve the problem without using a technology like that.

Having full Internet routes in your routers will take some getting used to, but once you get used to it, it is indeed very easy to understand and troubleshoot. Certainly there are fewer "moving parts" of different protocols to worry about.

You don't want to have full Internet routes in your OSPF database, so you'll want to advertise a default via OSPF into the interior of your network (or perhaps use a static default; personally I prefer a default in OSPF). That will move traffic towards the BGP-speaking Internet routers, which, having the full Internet routes, can make the more fully informed decision.

That will give you close to "destination based best path". There will still be cases where the traffic will do things you don't quite expect, so you'll want to get familiar with the BGP route selection process.
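To get a feel for why traffic sometimes does unexpected things, here is a deliberately simplified sketch of the route selection order, not any vendor's actual implementation. It assumes MED is compared across all neighbors (real routers by default compare it only between routes from the same neighboring AS) and omits vendor-specific steps such as Cisco's weight as well as the final router-ID tie-breakers. The `Route` class, its field names, and the AS numbers (taken from the documentation range) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    """One candidate path to the same destination prefix (illustrative model)."""
    via: str
    as_path: Tuple[int, ...]
    local_pref: int = 100   # higher is preferred
    origin: int = 0         # 0 = IGP, 1 = EGP, 2 = incomplete; lower is preferred
    med: int = 0            # lower is preferred (simplified: always compared)
    ebgp: bool = True       # eBGP-learned is preferred over iBGP-learned
    igp_metric: int = 0     # lower IGP cost to the next hop is preferred


def preference_key(route: Route):
    # Tuples compare element by element, so earlier entries dominate later ones.
    return (
        -route.local_pref,
        len(route.as_path),
        route.origin,
        route.med,
        not route.ebgp,
        route.igp_metric,
    )


def best_path(routes: Iterable[Route]) -> Route:
    return min(routes, key=preference_key)


# Router A hears the prefix directly from provider A and, via iBGP, from router B.
via_a = Route(via="provider A", as_path=(64496, 64497, 64499))
via_b = Route(via="provider B", as_path=(64498, 64499), ebgp=False)
print(best_path([via_a, via_b]).via)        # shorter AS path wins despite iBGP

# Raising local preference overrides the AS-path comparison.
preferred_a = Route(via="provider A", as_path=(64496, 64497, 64499), local_pref=200)
print(best_path([preferred_a, via_b]).via)
```

The first lookup picks provider B because AS-path length is evaluated before the eBGP-over-iBGP step; the second picks provider A because local preference is evaluated before either.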

Jeff McAdams

Posted 2013-05-08T01:54:05.750

Reputation: 2 056

Thanks, Jeff. I agree with your disposition on PBR. I have seen it implemented in nightmarish ways. I have ripped more PBR out of networks than I care to remember. I once managed a tiered environment where PBR was deployed as a virtual routing mechanism with a unique route-map per SVI (100's). The PBR also contained permit/no set clauses which resulted in process switching. In hardcopy, it was like 60 pages of config. Needless to say, I took the wrecking ball to it; replaced it with VRF. – Dennis Olvany – 2013-05-09T00:35:29.270

4

To offer a different approach to the others given already, which may or may not be better than the existing ideas, but primarily to throw some extra ideas out there:

I would say two easy steps you can take to improve on your current situation are as follows:

Step 1:

Get full BGP tables from both providers. Now you will have more optimal outbound routing, because you will be routing via the transit provider with the shortest AS path to your destination. As you said, you can remove HSRP and simply advertise a default route in OSPF and run iBGP between your two edge routers.

Step 2:

Set up AS prepends, communities, etc. on your two edge routers to control outbound traffic as granularly as you require. So ISP B may have a better route to some subnet, but you may buy more transit from ISP A and would rather it went via them, and so on.


Assuming the two /24s you mentioned are PI (provider-independent) address space, so you are announcing them via both providers, or both providers have agreed to announce the same IP address space for you, you can now announce both prefixes to both ISPs from both routers without prepends or communities and get better inbound routing as well (unless, of course, you have some CDRs you need to adhere to or similar, in which case you can tune as required).

jwbensley

Posted 2013-05-08T01:54:05.750

Reputation: 3 558

Thanks, javano. I think we agree that the inbound and outbound routing policies are detrimental. I totally want to do away with PBR, prepending and communities! – Dennis Olvany – 2013-05-09T00:27:48.380

2

Start simple, then add complexity only when needed. I would question if there's even a need to run OSPF on your Internet edge routers. Boot PBR to the curb and only use it on your interior network.

  1. Take full Internet routes if your routers have the memory, but do filtering! Toss anything longer (more specific) than a /24.
  2. Take a default route from A and B.
  3. You must run iBGP to let your routers make the best path decisions considering all prefixes received from A and B.
  4. If you plan to use both A's and B's /24s with both providers, then you can better influence inbound traffic by prepending A's /24 on B's network and vice versa. Both /24s must be advertised! Check with your ISP for their communities that set the prepends for you.
  5. Use two different HSRP groups for your outbound traffic from your firewall; you could set up equal-cost load balancing (ECLB) to load-share across your two routers.
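The filtering rule in item 1 can be sketched as a simple prefix-length check. This is only an illustration of the logic, not router configuration; the function name and the sample prefixes (documentation ranges) are made up for the example.

```python
import ipaddress

MAX_PREFIX_LEN = 24  # anything more specific than a /24 is dropped


def accept(prefix: str) -> bool:
    """Return True if an IPv4 prefix is a /24 or shorter."""
    net = ipaddress.ip_network(prefix)
    return net.version == 4 and net.prefixlen <= MAX_PREFIX_LEN


# Hypothetical prefixes received from a provider.
received = ["0.0.0.0/0", "198.51.100.0/24", "203.0.113.0/25", "192.0.2.128/26"]
kept = [p for p in received if accept(p)]
print(kept)
```

Here the default route and the /24 are kept, while the /25 and /26 are discarded.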

All of this can be simplified if you're just using a single /24 advertised to both A and B.

Later, look into more complexity for better traffic engineering and protection:

  1. Become familiar with A and B's communities as you may prefer to use peer route-maps to set localpref to determine which routes use A vs. B.
  2. Set a floating static default route on both routers as an emergency backup to everything else in case your BGP blows up.

    ip route 0.0.0.0 0.0.0.0 a.b.c.d 254
    
  3. Look into more complex ways of advertising to control your inbound policy such as half your IP space going through A and the other half through B. For a given /24, you could advertise the /24 to both A and B, but split that into two /25's and advertise the lower /25 to A and the upper /25 to B.

  4. Use soft-reconfiguration so you can tweak your policies and do a soft reset on the BGP session. A complete reset (or clear) of the session can cause dampening of your prefixes on the other side; a soft reset avoids that. Changes in policy require a reset.
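The /25 split described in item 3 can be worked out with Python's `ipaddress` module. This is just a sketch of the address arithmetic; the /24 shown is a documentation prefix standing in for your real block, and the `announcements` mapping is a hypothetical way to write down the plan.

```python
import ipaddress

# Placeholder /24; substitute your own block.
block = ipaddress.ip_network("198.51.100.0/24")

# Splitting by one extra bit yields the lower and upper /25.
lower, upper = block.subnets(prefixlen_diff=1)

# Both providers get the covering /24; each also gets one more-specific /25.
announcements = {
    "A": [block, lower],
    "B": [block, upper],
}

for provider, prefixes in announcements.items():
    print(provider, [str(p) for p in prefixes])
```

Because the longest match wins, inbound traffic for each half should follow its /25 while the /24 remains as a backup through the other provider. Whether the /25s are accepted upstream is something to confirm with the providers, since filtering anything longer than a /24 (as suggested earlier in this answer) is common.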

generalnetworkerror

Posted 2013-05-08T01:54:05.750

Reputation: 5 144

1

If the full BGP table is too much for you, I think you could consider receiving just a portion. Perhaps providers A and B each advertise a default route and their local AS routes. You would need to run iBGP internally. That way you would have the shortest route for anything directly connected to the providers and would take either provider for downstream AS routes.
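A rough sketch of how forwarding would behave with such a partial table: destinations inside a provider's own prefixes match the more specific route, and everything else falls back to the default. The table contents and exit labels below are invented for illustration.

```python
import ipaddress

# Hypothetical partial table: a default plus each provider's own routes.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "default (A or B)",
    ipaddress.ip_network("198.51.100.0/24"): "provider A",
    ipaddress.ip_network("203.0.113.0/24"): "provider B",
}


def lookup(destination: str) -> str:
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


print(lookup("198.51.100.10"))  # inside provider A's own space
print(lookup("192.0.2.1"))      # not in either provider's routes
```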

Kelly McDowell

Posted 2013-05-08T01:54:05.750

Reputation: 66

Thanks, Kelly. The better design would run iBGP. A hardware refresh is prompting the architecture review, so I am not too worried about the routers being able to handle it. The sales team says transitioning from IOS to JUNOS is a cakewalk. I am not so sure I agree, thus far. – Dennis Olvany – 2013-05-09T00:15:42.130

I don't know that I'd say it's a cakewalk...it's daunting to learn, not just the new syntax, but a new concept of syntax. What I will say, though, is that I believe it is well worth it. JunOS will leave you head scratching for a while, but at some point, it will click and it will all start to make sense. You'll still have to look stuff up, of course (knowing the syntax of a language isn't the same as knowing the vocabulary), but by and large it will make sense. – Jeff McAdams – 2013-05-09T00:50:27.073

1

So what I understand from the write-up is that you don't really need to make decisions based on AS paths to reach external subnets, and the only purpose of dual homing to two ISPs is to buy redundancy to reach the Internet. If that's the case, then you don't really need to run BGP. You can just accept the same default routes that you are already receiving from both of your service providers. For the local side of the network, run a single OSPF area on the routers that connect to the ISPs, on the interfaces facing your LAN (do not include the ISP-facing interface under the OSPF process). Depending on how simple the design needs to be, you can add routers in different areas and summarize the subnets at network boundaries. For two subnets, though, I think the size of the OSPF database and the amount of LSA flooding are not a huge concern, so you can put all the nodes in a single OSPF area and divide it into different areas as you grow.

On each OSPF router that connects to the ISP, redistribute the learned default routes into OSPF by using a "default-information originate" statement.

Couple of advantages:

  1. With this design, when you grow the network you can enable BGP with the service providers and just accept the default route without touching any downstream devices. As long as you verify that you are receiving a default route from BGP, you are good.

  2. Whenever you need to route traffic off of an ISP for maintenance, just remove the "default-information originate" from underneath the OSPF process on that router and proceed with the maintenance. Nothing else is needed.

And I agree with the previous answer in that symmetric routing is overrated; I would rather go for scalability and ease of maintenance.

Vinny

Posted 2013-05-08T01:54:05.750

Reputation: 19

Thanks, Vinny. Failover of outbound traffic is not a problem, but would I not need BGP to fail over inbound? If this was just users getting PAT'd to the internet, it may be feasible, but this is a web hosting environment. – Dennis Olvany – 2013-05-09T00:21:42.670

@user161: Absolutely, if you need inbound failover for your originated subnets then you do need to run BGP. Check with your ISPs to see if they support the ORF capability for BGP peering. If so, you can advertise locally originated subnets via BGP, with an inbound filter on the border routers that accepts just a default route and/or a select few subnets from the ISP routers. If the ISPs do not support ORF, then there are really no better choices than buying a router with more juice. – Vinny – 2013-05-09T01:14:30.347

If I understand @user161's plans, the goal is more intelligent outbound path selection. How do you achieve that in your OSPF-based solution? – Paul Gear – 2013-05-08T07:51:50.687