The Odd World of the DFZ - Fundamentals

Foreword§

This is probably going to be a series of regularly updated posts trying to pass on some knowledge about how to do your best at being a member of the Internet. These posts explicitly talk about eBGP with other networks rather than internal routing, though some information might still be applicable there.

As I learn more about that myself, I'll update this post with some general advice. There will be other posts describing routing platforms/vendors and their quirks.

Be warned however that you need to know some basic networking terminology, for example what Network Prefixes and Routes are.

Trash or Treasure?§

The millions of routes covering "the whole internet" are what make up the DFZ - the Default-free Zone. A router in the DFZ has no default route to fall back on: when it needs to forward a packet to a destination it doesn't know about, it can't simply hand it to another router. Thus, getting all of the routes is crucial to ensure reachability.

Adjacent networks (ASNs) announce to each other a subset of the routes they know about and receive the same in return. Sometimes just the few routes they are responsible for, sometimes all routes they know about.

However, to make that work, the Internet is built on varying degrees of mutual trust. This mesh relies heavily on the idea that almost all routing information is correct, up to date and not malicious. If that's not the case, traffic towards some destinations might be slower than it has to be, might not reach the destination or might even get intercepted.

When you are a network operator in the DFZ, you'll most likely end up with an ASN and BGP to route your prefixes. It does not matter if you are a big global ISP - like Lumen or DTAG - or a small research network (like me, AS213342).

Trusting is nice, but mistakes happen. And sometimes, Route Hijacking happens on purpose.

Trust but Verify.§

It is clear that it is probably not a good idea to let anyone pretend to be anyone. Luckily for all Netizens, it is not quite Wild West out there.

Thanks to the IRR and RIRs, it's rather clear who owns what prefixes. Clearly we need to apply Route Filtering to make sure the implicit trust given to us by others is not violated. We don't want mistakes to be propagated to others. That's what stops misconfiguration from becoming Route Hijacking.

Route Filtering§

When a BGP peer announces a prefix to another, there are a lot of reasons why it may be incorrect or not desired.

Some reasons why a route announcement may be "incorrect" in the DFZ are listed below. This is not everything, it's really just meant as general examples to help you get started.

Invalid Next-Hops§

Sometimes a router might announce routes with invalid next-hops to you. Say, internal IP addresses that you have no way to reach.

If your router receives a route that says X.Y.Z.0/24 is reachable via A.B.C.5, it has to resolve that last IP to a MAC address to forward the Layer 3 information to it. When it can't figure out a way to reach the last IP, that route is invalid.

Most platforms will reject it or hide it and simply not consider it a valid route. This should be automatic.

Enforcing first AS§

Unless you're peering with a Route Server on an IXP or something similar, routes announced to you via eBGP will contain the AS of the network that announced that route to you.

This makes a lot of sense, as the traffic will flow through that network to reach the routes announced to you.

However, routers are designed for flexibility and don't know what type of peer you're peering with, so they usually don't default to enforcing that. Turn it on for any eBGP peering session that isn't to a Route Server, as routes announced to you that don't have the peer AS prepended in that case are most likely accidental and/or invalid.

The option is usually called enforce-first-as or a variation of that.
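To make the check concrete, here is a minimal Python sketch of what such an option does conceptually. The function name and the list-of-integers path representation are my own illustration, not any vendor's implementation, and the ASNs in the usage lines are documentation ASNs.

```python
def first_as_matches_peer(as_path, peer_asn):
    """Return True if the leftmost (most recently prepended) AS is the peer's ASN.

    as_path is assumed to be a list of integers, leftmost AS first.
    An empty path never matches, since an eBGP peer must have prepended itself.
    """
    return bool(as_path) and as_path[0] == peer_asn


# A direct peer (AS64500) announcing a route it learned from AS64501: accepted.
print(first_as_matches_peer([64500, 64501], 64500))
# The same peer sending a path that starts with someone else's AS: rejected.
print(first_as_matches_peer([64501, 64500], 64500))
```

On a Route Server session this check would reject nearly everything, because a transparent route server does not prepend its own AS. That is why the option has to stay off there.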

Default Routes§

It might seem obvious that the DFZ - the Default-free Zone - shouldn't have defaults. In the grand scheme of things, this is correct!

However, upstreams may announce a Default Route in addition to a Full Table or instead of one. And there are several reasons why you might want that!

Perhaps you just want a default route from a transit provider and peerings in addition to that. This might be the case when you only have one upstream or two "equal" feeds. Or when you are using a Layer 3 Switch or a Router with a small FIB instead of a full-blown router capable of carrying the millions of routes.

But, to save you some time, if you can accept a Full Table, you probably should.

This ensures that you have the best chance at finding the best path to a network instead of just a reasonable one. If you don't have feeds from the same ISP twice - limiting your capabilities in terms of redundancy and high availability - it is highly likely that one has better routes to some network than the other. If you plan on becoming an upstream, not carrying the entire DFZ becomes infeasible as well. Therefore, if you're serious about that Internet thing, it is probably a wise choice to carry a Full Table.

Note that when you do so, you essentially become your own source of the Default Routes, which you can advertise to your own internal equipment or customers if they so desire.

Bogon Prefixes§

Bogon Filtering is a common practice and, as far as filtering goes, low-hanging fruit. There are lots of network prefixes that are not supposed to be carried through the Internet.

[edit]
[email protected]# show policy-options prefix-list BOGONS_V4
apply-flags omit;
/* RFC1122 'this' network */
0.0.0.0/8;
/* RFC1918 Private-Use */
10.0.0.0/8;
/* RFC6598 Shared Address Space/CGNAT */
100.64.0.0/10;
/* RFC1122 Loopback */
127.0.0.0/8;
/* RFC3927 Link Local */
169.254.0.0/16;
/* RFC1918 Private-Use */
172.16.0.0/12;
/* RFC6333 DS-Lite/IETF Protocol Assignments */
192.0.0.0/29;
/* RFC5737 TEST-NET-1 */
192.0.2.0/24;
/* RFC7526 6to4 Relay Anycast */
192.88.99.0/24;
/* RFC1918 Private-Use */
192.168.0.0/16;
/* RFC2544 Benchmarking */
198.18.0.0/15;
/* RFC5737 TEST-NET-2 */
198.51.100.0/24;
/* RFC5737 TEST-NET-3 */
203.0.113.0/24;
/* RFC5771 Multicast */
224.0.0.0/4;
/* RFC1112 Reserved */
240.0.0.0/4;
/* RFC8190 Limited Broadcast */
255.255.255.255/32;


[edit]
[email protected]# show policy-options prefix-list BOGONS_V6
apply-flags omit;
/* RFC4291 IPv4-compatible, loopback, et al */
::/8;
/* RFC6666 Discard-Only Address Block */
0100::/8;
/* RFC4048 OSI NSAP IPv6 mapping */
0200::/7;
/* RFC4291 Reserved by IETF */
0400::/6;
/* RFC4291 Reserved by IETF */
0800::/5;
/* RFC4291 Reserved by IETF */
1000::/4;
/* RFC4380 Teredo */
2001::/32;
/* RFC5180 Benchmarking */
2001:2::/48;
/* RFC7450 Automatic Multicast Tunneling */
2001:3::/32;
/* RFC4843 ORCHID */
2001:10::/28;
/* RFC7343 ORCHIDv2 */
2001:20::/28;
/* RFC3849 Documentation */
2001:db8::/32;
/* RFC3056 6to4 (anycast relays deprecated by RFC7526) */
2002::/16;
/* RFC2471 6bone */
3ffe::/16;
/* RFC4193 Unique Local Unicast */
fc00::/7;
/* RFC4291 Link Local Unicast */
fe80::/10;
/* RFC3879 old site local unicast */
fec0::/10;
/* RFC4291 Multicast */
ff00::/8;

These are not to be announced and not to be routed into the DFZ. Never. Always invalid.

Well, with the exception of Teredo, depending on your understanding, but honestly, that ship has sailed. Reject it when coming from external sources.
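If you want to sanity-check a prefix against such a list outside of a router, a small Python sketch with the standard `ipaddress` module could look like this. The `BOGONS` list is deliberately a short subset of the lists above and `is_bogon` is a name I made up. The sketch treats a prefix as a bogon when it is equal to or more specific than a listed block.

```python
import ipaddress

# Abbreviated subset of the bogon lists above, for illustration only.
BOGONS = [
    ipaddress.ip_network(p)
    for p in (
        "10.0.0.0/8",
        "100.64.0.0/10",
        "192.168.0.0/16",
        "2001:db8::/32",
        "fc00::/7",
    )
]


def is_bogon(prefix):
    """Return True if prefix equals or is more specific than a listed bogon."""
    net = ipaddress.ip_network(prefix)
    return any(
        # subnet_of() raises TypeError across address families, so compare
        # the IP version first.
        net.version == bogon.version and net.subnet_of(bogon)
        for bogon in BOGONS
    )


print(is_bogon("10.1.2.0/24"))      # inside RFC1918 space
print(is_bogon("2001:db8:1::/48"))  # inside the documentation block
print(is_bogon("8.8.8.0/24"))       # not covered by this subset
```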

Bogon ASNs§

Like Bogon Prefixes, Bogon ASNs are ASNs that are not supposed to be in the DFZ.

Job Snijders has done the Internet a favour and compiled Bogon ASN Filter examples for several vendors. See bogon-asn-filters for up to date examples.

Another Juniper example:

[edit]
[email protected]# show policy-options as-path-group BOGON_ASNS
/* RFC7607 */
as-path zero ".* 0 .*";
/* RFC4893 AS_TRANS */
as-path AS_TRANS ".* 23456 .*";
/* RFC5398 Documentation/Example ASNs */
as-path examples1 ".* [64496-64511] .*";
as-path examples2 ".* [65536-65551] .*";
/* RFC6996 Private ASNs */
as-path reserved1 ".* [64512-65534] .*";
as-path reserved2 ".* [4200000000-4294967294] .*";
/* RFC6996 Last 16-bit ASN */
as-path last16 ".* 65535 .*";
/* RFC6996 Last 32-bit ASN */
as-path last32 ".* 4294967295 .*";
/* IANA Reserved ASNs */
as-path iana-reserved ".* [65552-131071] .*";

When you receive a route with any of them in the path, you should reject it. You should also make sure your announcements strip all your private ASNs, but don't strip them from routes learned from your customers or peers.
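The same logic as the as-path group above, written out as a Python sketch for offline checks. The ranges are copied from the Juniper example. The function name and the list-of-integers path representation are assumptions of mine.

```python
# (first, last) ASN ranges, inclusive, mirroring the as-path group above.
BOGON_ASN_RANGES = [
    (0, 0),                    # RFC7607
    (23456, 23456),            # AS_TRANS
    (64496, 64511),            # documentation/example ASNs
    (64512, 65534),            # private ASNs (16-bit)
    (65535, 65535),            # last 16-bit ASN
    (65536, 65551),            # documentation/example ASNs
    (65552, 131071),           # IANA reserved
    (4200000000, 4294967294),  # private ASNs (32-bit)
    (4294967295, 4294967295),  # last 32-bit ASN
]


def has_bogon_asn(as_path):
    """Return True if any ASN anywhere in the path falls into a bogon range."""
    return any(
        low <= asn <= high
        for asn in as_path
        for low, high in BOGON_ASN_RANGES
    )


print(has_bogon_asn([3320, 64512, 213342]))  # a private ASN leaked into the path
print(has_bogon_asn([3320, 213342]))         # clean path
```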

Too Big or Small Networks§

Some prefix sizes just have no business in the DFZ.

For IPv4, a longer prefix length than a /24 is not considered routable. So anything /25-/32 has no business being advertised. A shorter prefix length than a /8 does not make much sense either. So anything /1-/7 should be rejected.

For IPv6, a longer prefix than a /48 is not considered routable. So anything /49-/128 belongs in the trash. Defining the shortest prefix length to accept is a bit more difficult. /12s are the largest blocks assigned to RIRs, so that's a very safe bet. /29s are pretty much the shortest prefix length blocks they will allocate. I chose /19, because there are currently only two prefixes announced with that size: DTAG's 2003::/19 and Orange S.A.'s Opentransit 2a01:c000::/19. There is only one prefix I know of that is shorter: 2002::/16, the 6to4 prefix, which I chose to reject anyway. See the Bogon Prefixes. So, I'll reject any IPv6 prefix /1-/18.

Keep in mind that I explicitly left out /0 prefixes - the default routes. That one has its own section.

[edit]
[email protected]# show policy-options policy-statement REJECT_ODD_SIZE_V4
term too-small {
    from {
        route-filter 0.0.0.0/0 prefix-length-range /25-/32;
    }
    then reject;
}
term too-big {
    from {
        route-filter 0.0.0.0/0 prefix-length-range /1-/7;
    }
    then reject;
}

[edit]
[email protected]# show policy-options policy-statement REJECT_ODD_SIZE_V6
term too-small {
    from {
        route-filter ::/0 prefix-length-range /49-/128;
    }
    then reject;
}
term too-big {
    from {
        route-filter ::/0 prefix-length-range /1-/18;
    }
    then reject;
}
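For testing prefixes against these bounds outside of a router, a minimal Python sketch could look like this. The accepted ranges (/8 to /24 for IPv4, /19 to /48 for IPv6) are my own choices from above, not a standard, and `is_odd_size` is a made-up name. Default routes are deliberately not matched, just like in the policies.

```python
import ipaddress

# Shortest and longest accepted prefix length per IP version (my choices).
LENGTH_BOUNDS = {4: (8, 24), 6: (19, 48)}


def is_odd_size(prefix):
    """Return True if the prefix is too big or too small for the DFZ."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 0:
        return False  # default routes are handled by their own policy
    shortest, longest = LENGTH_BOUNDS[net.version]
    return net.prefixlen < shortest or net.prefixlen > longest


print(is_odd_size("192.0.2.0/25"))  # too small for IPv4
print(is_odd_size("2003::/19"))     # shortest IPv6 length still accepted
print(is_odd_size("2000::/18"))     # too big for IPv6
```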

Reject Transit ASNs§

When not peering with a transit provider, you usually don't expect huge Transit ASNs to appear in the path. If you don't expect it, it is most likely a Transit Leak!

In practice, if you peer with a route server of an IXP and you receive a path containing a big Transit network, it'll either be announcing itself or it's a leak. If it's just a regular peering session with some random network, it most definitely is a leak.

This is the as-path regex list I came up with:

/* List of ASNs to filter when peering with RS or not-T1s */
as-path-group TRANSIT_LEAKS {
    as-path cogent ".* 174 .*";
    as-path centurylink ".* 209 .*";
    /* Verizon/UUNET is at IXPs, doesn't peer with RS */
    as-path verizon ".* 701 .*";
    /* Vodafone is at IXPs, never peers with RS */
    as-path vodafone ".* 1273 .*";
    as-path arelion ".* 1299 .*";
    /* Verizon is at IXPs, doesn't peer with RS */
    as-path verizon2 ".* 2828 .*";
    /* NTT is at IXPs, never peers with RS */
    as-path ntt ".* 2914 .*";
    /* GTT is at IXPs, never peers with RS */
    as-path gtt ".* 3257 .*";
    /* DTAG is barely at IXPs, never peers with RS */
    as-path dtag ".* 3320 .*";
    /* Lumen is at IXPs, doesn't peer with RS */
    as-path lumen ".* 3356 .*";
    /* PCCW is at IXPs, never peers with RS */
    as-path pccw ".* 3491 .*";
    /* ChinaNet peers with RS */
    as-path chinanet "[^4134]+ 4134 .*";
    /* Telstra peers with RS */
    as-path telstra "[^4637]+ 4637 .*";
    /* ChinaNet peers with RS */
    as-path chinanet2 "[^4809]+ 4809 .*";
    /* Orange is at IXPs, doesn't peer with RS */
    as-path orange ".* 5511 .*";
    as-path tata ".* 6453 .*";
    /* Zayo peers with RSes */
    as-path zayo "[^6461]+ 6461 .*";
    /* Seabone is at IXes, but doesn't peer with RS */
    as-path ti-seabone ".* 6762 .*";
    /* LG is at IXPs, never peers with RS */
    as-path libertyglobal ".* 6830 .*";
    /* HE peers with RS */
    as-path hurricane "[^6939]+ 6939 .*";
    as-path atnt ".* 7018 .*";
    /* Singtel is at IXPs, doesn't peer with RS */
    as-path singtel ".* 7473 .*";
    as-path comcast ".* 7922 .*";
    /* ReTN is at IXPs and with RS! */
    as-path retn "[^9002]+ 9002 .*";
    /* Telxius is at IXes, but doesn't peer with RS */
    as-path telxius ".* 12956 .*";
}

It drops routes whose path contains a carrier on the list that never peers with a route server, as well as routes through big networks that should only ever announce themselves.

I doubt I'll ever get the chance to do settlement free peering with most of these on that list and if that happens, I'll most likely classify it as "transit" anyway. That way I don't fully overload my poor QFXes that just do default routes + peers.

Keep in mind that this is for my network. The list might be different, you might have sensible additions or changes.
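The list above uses two kinds of patterns: `".* N .*"` matches N anywhere in the path, while `"[^N]+ N .*"` only matches when N shows up behind some other AS, so N announcing itself directly is still fine. A small Python sketch of that distinction follows. The two sets are just a subset of the list for illustration and the names are my own.

```python
# ASNs that should never show up on these sessions at all (subset of the list).
NEVER_EXPECTED = {174, 1299, 3356}
# ASNs that peer with route servers: fine as the first AS, a leak otherwise.
DIRECT_ONLY = {6939, 9002}


def is_transit_leak(as_path):
    """Mimic the two regex styles: 'anywhere in the path' and 'not as first AS'."""
    if not as_path:
        return False
    if NEVER_EXPECTED.intersection(as_path):
        return True  # same idea as ".* N .*"
    # Same idea as "[^N]+ N .*": N is in the path but the path starts with
    # a different AS.
    return any(asn in as_path and as_path[0] != asn for asn in DIRECT_ONLY)


print(is_transit_leak([64500, 174, 64501]))   # Cogent behind a random peer
print(is_transit_leak([6939, 64501]))         # HE announcing its own customer
print(is_transit_leak([64500, 6939, 64501]))  # HE behind a random peer
```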

RPKI§

Oh boy, this one is great! This is about RFC6480.

A bunch of people realized that while there are a bunch of measures against bad announcements, none of them were easily deployable or maintainable. Before RPKI, the best way to filter was mostly prefix lists generated from IRR data with tools like bgpq4. Another thing Job Snijders has helped with.

While feasible for filtering Eyeball Networks that generally have a fixed list of prefixes and stick to them for a while, for huge service providers this list changes very often. At some point, there is at least some degree of necessary trust or you'll risk not being able to reach a lot of prefixes.

If say, Amazon would announce a prefix of Google, I'm sure there are a bunch of routers that'd accept it.

RPKI is meant to solve issues like this by implementing a cryptographically verifiable way to assert ownership of resources. Much like Root DNS servers, RIRs enable a signed certificate trust chain for proof of origin. Thus, it is much harder to fake.

Thanks to software support, it is also much easier to keep this information up to date, and the work of "knowing the ownership of all resources in the internet" can be offloaded to another server. That server (a validator) does the cryptographic verification, and routers just fetch the resulting list of valid prefix and origin pairs from it instead of having to process the entire certificate tree themselves.

The thing that has the most impact in today's world is rejecting RPKI Invalid prefixes. We can also consider accepting RPKI Valid prefixes even if they are not in our generated prefix list for a peer.

While I hope for a future where every prefix is signed, we are unfortunately not at that point. When RPKI does not know about a prefix, we'll just have to rely on prefix lists once again. All I can do to do my part is to ensure all my customers have RPKI records and that those records are valid.
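To show what Valid, Invalid and "does not know about a prefix" mean in practice, here is a rough Python sketch of route origin validation along the lines of RFC6811. The `Roa` tuple, the function name and the sample data (documentation prefixes and ASNs) are made up for illustration. A real router gets this data from a validator and does the lookup itself.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    prefix: str       # prefix the resource holder signed
    max_length: int   # longest prefix length they allow to be announced
    origin_asn: int   # ASN authorized to originate it


def validate_origin(prefix, origin_asn, roas):
    """Return 'valid', 'invalid' or 'not-found' for a route's prefix and origin."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue  # this ROA says nothing about the announced prefix
        covered = True
        if net.prefixlen <= roa.max_length and roa.origin_asn == origin_asn:
            return "valid"
    # Covered by at least one ROA but matched by none: somebody else owns it,
    # or it is more specific than allowed.
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500), Roa("2001:db8::/32", 48, 64501)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # right origin
print(validate_origin("192.0.2.0/24", 64510, roas))     # wrong origin
print(validate_origin("198.51.100.0/24", 64500, roas))  # no ROA at all
```

Mapped to the filtering above: "invalid" gets rejected, "valid" may be accepted even without a prefix list entry, and "not-found" falls back to the prefix lists.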

Maximum AS_PATH Length§

Sometimes there are really ridiculous paths being advertised thanks to path prepending being used as a traffic engineering mechanism.

Realistically, you'd only see a maximum AS_PATH length of somewhere around 10 for like 99.9% of the routes. It's probably safe to drop routes with an AS_PATH longer than 20. That'd give it a 10 AS safety margin.

A BIRD example for this could look something like this:

function reject_long_aspath() {
    if (bgp_path.len > 20) then {
        print "Reject: AS_PATH too long: ", net, " ", bgp_path, " protocol: ", proto;
        reject;
    }
}

Cisco has bgp maxas-limit.

Maximum Prefix Limit§

While not strictly filtering routes itself, it is the last line of defense: The dreaded Maximum Prefix Limit.

It is intended as a way to forcibly bring down a peering session when the other side is clearly misconfigured, and it does just that. If the other side is trying to send you a Full Table and you expect a few dozen prefixes instead, this might overload your router and kill it! Or you end up routing traffic to that peer, which will just drop the packets.

Either way, it's just one less thing to worry about. Usually your peering partners mention a sane limit either directly or have it somewhere, for example on PeeringDB.

Packet Filtering§

Route Filtering is only half the story. You need to filter the actual traffic, too.

Just because you only announce certain routes does not mean you don't get packets that are destined for others.

Discarding Bogons§

Like I mentioned in the Bogon Prefixes section, Bogons are not welcome in the Internet. They must be discarded. You can do this in a plethora of ways; what matters is that they are welcome neither as source nor as destination.

Dropping packets destined for bogons is simple enough. On Junos and other routing platforms, you simply install discard routes for the prefixes.

Handling packets with a bogon source address varies more between vendors. With Junos, you simply set up rpf-loose-mode-discard and enable uRPF loose, and it discards packets whose source address points to a discard next-hop. See the next section.

Unicast Reverse-Path Forwarding (uRPF)§

Unicast Reverse-Path Forwarding is a way to prevent IP address spoofing.

uRPF has three modes, two of which are commonly implemented:

  1. Strict Mode
  • Each incoming packet's source is checked against the FIB, which contains only the best path.
  • If the FIB doesn't contain an entry for the source or it points to a different interface than the one you received it on, the packet gets discarded.
  • This only works with symmetric routing, so it is only applicable for your own infrastructure or Eyeball Networks.
  2. Feasible-Paths Mode
  • Each incoming packet's source is checked against the FIB, which carries all sane paths, not just the best one.
  • If the FIB doesn't contain an entry for the source or if no entry points to the interface the packet was received on, the packet gets discarded.
  • This works with asymmetric routing, however, it is not always implemented on lower-grade routers.
  3. Loose Mode
  • Each incoming packet's source is checked against the FIB.
  • If the FIB doesn't contain an entry for the source (or points to a discard interface on some platforms), the packet gets discarded.
  • This shouldn't cause trouble to implement but doesn't help that much (except for bogon filtering).

These all help to some degree, strict mode being the most helpful but only sometimes applicable and loose mode being generally useful only if discard routes get honored. My general recommendation is the following:

  • Transit gets uRPF loose.
  • Peers get uRPF feasible-paths if supported or uRPF loose otherwise.
  • Customers get uRPF feasible-paths if supported or strict.

Note that this will only go well for customers who actually export their prefixes to you and not just send traffic to you. Tell them about it, mention that they can add no-export communities, etc.

They are usually set on a per-interface basis, but some vendors/platforms might allow you to set loose mode globally.
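The three modes boil down to one decision per packet. Here is a simplified Python sketch of that decision. It assumes the route lookup for the packet's source address has already happened and simply takes the result as parameters. The function, its parameters and the interface names are my own illustration, and treating a discard route as a failure in every mode is an assumption that only holds on platforms which honor discard routes.

```python
def urpf_accepts(mode, in_iface, best_iface, feasible_ifaces=frozenset(),
                 is_discard=False):
    """Decide whether a packet passes the uRPF check.

    mode            -- "strict", "feasible" or "loose"
    in_iface        -- interface the packet arrived on
    best_iface      -- interface of the best route back to the source,
                       or None if there is no route at all
    feasible_ifaces -- interfaces of all other acceptable routes to the source
    is_discard      -- True if the best route is a discard route
    """
    if best_iface is None or is_discard:
        return False  # no way back to the source: fails every mode
    if mode == "loose":
        return True   # any route to the source is good enough
    if mode == "feasible":
        return in_iface == best_iface or in_iface in feasible_ifaces
    if mode == "strict":
        return in_iface == best_iface
    raise ValueError("unknown uRPF mode: " + mode)


# Asymmetric routing: best path is via xe-0/0/1, the packet came in on xe-0/0/2.
print(urpf_accepts("strict", "xe-0/0/2", "xe-0/0/1", {"xe-0/0/2"}))
print(urpf_accepts("feasible", "xe-0/0/2", "xe-0/0/1", {"xe-0/0/2"}))
# Bogon source with a discard route installed, loose mode.
print(urpf_accepts("loose", "xe-0/0/2", "dsc", is_discard=True))
```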

Remote Triggered Black Hole (RTBH)§

RTBH is a way to route a specific prefix (usually a more specific) explicitly to the void. It is usually a last resort when that prefix is targeted by a DDoS attack and mitigation is not otherwise possible.

Sometimes this is signalled by attaching the well-known BLACKHOLE community (65535:666, RFC7999) to a route. It is usually an option on Route Servers at an IXP or on peerings with bigger providers.

Inbound Filtering§

When accepting routes from your upstreams and peers, there are some things you should always reject, like the Bogons, too big/small networks and RPKI Invalids. Depending on your needs, you should also filter for your desired default route behaviour. Discard traffic from and to Bogons, too.

Doing any more filtering on your upstreams is a losing battle. If you see them announcing RPKI Invalids, do tell them, though. :)

Filtering peers that don't provide you a Full Table is much more feasible. You can generate prefix lists with bgpq4, for example, for their ASN or an AS-SET of their choosing. Make sure to update that regularly though. If you can't generate prefix lists or keep them up to date for whatever reason, make sure to at least drop Bogons, too big/small networks, RPKI Invalids and transit leaks. Configure the Maximum Prefix Limit to something sane, too.

Filtering routes from your downstream customers is the best case for you. You should filter strictly with a prefix list, possibly (only) allowing RPKI Valids anyway, and maybe apply uRPF feasible-paths/strict (but think about it).

If they complain, ask them why. If they don't have a good answer, tell them to Do Things Correctly™: They are either abusing you or doing things they almost certainly shouldn't.

Outbound Filtering§

"Right, after making clear that there is a bunch we can't accept, what stops me from just announcing my prefixes statically? After all, I know what my prefixes are!"

Well, nothing! If you don't provide upstream to another ASN, that's just fine. But once that changes, you'll probably need to do some dynamic filtering or you could interfere with the operations of your downstream.

After filtering your downstreams inbound - accepting only what you know to be correct - and making sure you're allowed to announce the prefixes according to the IRR and RPKI, you can announce them to your peers and upstreams. One of the simplest ways to do that is to attach a BGP Community of some kind to those routes and have your outbound policies check that it is attached before announcing a route. You should also make sure you strip that community when it comes from external sources. Unless you wanna become a free upstream provider, that is.
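The community approach can be sketched in a few lines of Python. The community value 64500:100 and both function names are hypothetical, and communities are modelled as a plain set of (ASN, value) tuples rather than any router's data structure.

```python
# Hypothetical community meaning "allowed to be announced to peers/upstreams".
ANNOUNCE = (64500, 100)


def on_import(communities, from_customer):
    """Return the communities to keep on a route that passed the inbound filters."""
    kept = set(communities) - {ANNOUNCE}  # never trust the tag from outside
    if from_customer:
        kept.add(ANNOUNCE)  # only filtered customer routes get tagged
    return kept


def may_export(communities):
    """Outbound policy towards peers and upstreams: announce tagged routes only."""
    return ANNOUNCE in communities


# A peer tries to sneak in a route carrying our tag: it gets stripped.
print(may_export(on_import({ANNOUNCE}, from_customer=False)))
# A customer route that passed the inbound filters gets tagged and exported.
print(may_export(on_import(set(), from_customer=True)))
```

Your own prefixes would get the same tag wherever you originate them, so the outbound policy stays a single check.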

If you provide someone downstream the Full Table of yours, you should make sure you did your best to filter your inbound routes first and don't announce internal or invalid routes alongside.

So be a good egg in the chicken coop, make sure you don't announce crap and RPKI sign all your routes. You should also tell others to set a sane Maximum Prefix Limit (usually 10x the actual number of prefixes expected to be announced for that safety margin), just in case.

Make sure to not route Bogon sources or destinations and only route packets for destinations your peers announce. Best way to ensure sane outgoing traffic is to discard bad incoming traffic.

Notes§

Again, this is not complete and just my understanding of best practices mixed in with my own opinion. If you have suggestions, feel free to contact me.

FYI: I mentioned a lot of RFCs here, I wanted to generate markdown footnotes for the references to those, but failed to realize that in code blocks, they don't work. Here's what I attempted regardless.

# Unify RFC references and delete old ones. Breaks thanks to codeblocks.
sed -Ei -e 's/\[?RFC ?([0-9]+)\]?/[RFC\1]/ig' -e '/^\[RFC/d' content/odd-world-fundamentals.md

# Generate the references. This part works, but with no links to point to it, rather useless.
sed -En 's@.*(RFC ?)([0-9]+).*@[\1\2]: https://datatracker.ietf.org/doc/html/rfc\2@p' content/odd-world-fundamentals.md | uniq >> content/odd-world-fundamentals.md

So I undid that. Sorry!

Hope you can make some sense of this and it helped you.

Update 2022-04-18§

On Twitter Daryll Swer pointed out that I didn't mention traffic filtering at all and suggested some things.

Initially, I didn't want to touch this subject as this post was basically inspired by a colleague who wanted to know more about BGP filtering, but not mentioning it seems negligent. So, I added some sections.

Update 2022-04-24§

Mention filtering of Transit Leaks and add my own as-path filter list. Does drop some routes, but you should have better routes to those anyway.