RFC 9136 | EVPN Prefix Advertisement | October 2021 |
Rabadan, et al. | Standards Track |
The BGP MPLS-based Ethernet VPN (EVPN) (RFC 7432) mechanism provides a flexible control plane that allows intra-subnet connectivity in an MPLS and/or Network Virtualization Overlay (NVO) (RFC 7365) network. In some networks, there is also a need for dynamic and efficient inter-subnet connectivity across Tenant Systems and end devices that can be physical or virtual and do not necessarily participate in dynamic routing protocols. This document defines a new EVPN route type for the advertisement of IP prefixes and explains some use-case examples where this new route type is used.¶
This is an Internet Standards Track document.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9136.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
[RFC7365] provides a framework for Data Center (DC) Network Virtualization over Layer 3 and specifies that the Network Virtualization Edge (NVE) devices must provide Layer 2 and Layer 3 virtualized network services in multi-tenant DCs. [RFC8365] discusses the use of EVPN as the technology of choice to provide Layer 2 or intra-subnet services in these DCs. This document, along with [RFC9135], specifies the use of EVPN for Layer 3 or inter-subnet connectivity services.¶
[RFC9135] defines some fairly common inter-subnet forwarding scenarios where Tenant Systems (TSs) can exchange packets with TSs located in remote subnets. In order to achieve this, [RFC9135] describes how the Media Access Control (MAC) and IP addresses encoded in TS RT-2 routes are used to populate not only MAC Virtual Routing and Forwarding (MAC-VRF) and overlay Address Resolution Protocol (ARP) tables but also IP-VRF tables with the encoded TS host routes (/32 or /128). In some cases, EVPN may advertise IP prefixes and therefore provide aggregation in the IP-VRF tables, as opposed to propagating individual host routes. This document complements the scenarios described in [RFC9135] and defines how EVPN may be used to advertise IP prefixes. Interoperability between EVPN and Layer 3 Virtual Private Network (VPN) [RFC4364] IP Prefix routes is out of the scope of this document.
Section 2.1 describes the inter-subnet connectivity requirements in DCs. Section 2.2 explains why a new EVPN route type is required for IP prefix advertisements. Sections 3, 4, and 5 will describe this route type and how it is used in some specific use cases.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document also assumes familiarity with the terminology of [RFC7365], [RFC7432], and [RFC8365].¶
This section describes the inter-subnet connectivity requirements in DCs and why a specific route type to advertise IP prefixes is needed.¶
[RFC7432] is used as the control plane for an NVO solution in DCs, where NVE devices can be located in hypervisors or Top-of-Rack (ToR) switches, as described in [RFC8365].¶
The following considerations apply to TSs that are physical or virtual systems identified by MAC (and possibly IP addresses) and are connected to BDs by Attachment Circuits:¶
The Tenant Systems may be VA entities that forward traffic to/from IP addresses of different end devices sitting behind them.¶
Figure 1 illustrates some of the examples described above.¶
Where:¶
NVE1, NVE2, NVE3, NVE4, NVE5, DGW1, and DGW2 share the same BD for a particular tenant. BD-10 comprises the collection of BD instances defined in all the NVEs. All the hosts connected to BD-10 belong to the same IP subnet. The hosts connected to BD-10 are listed below:
For a BD to which an ingress NVE is attached, "Overlay Index" is defined as an identifier that the ingress EVPN NVE requires in order to forward packets to a subnet or host in a remote subnet. As an example, vIP23 (Figure 1) is an Overlay Index that any NVE attached to BD-10 needs to know in order to forward packets to SN1. The IRB3 IP address is an Overlay Index required to get to SN4, and ESI4 is an Overlay Index needed to forward traffic to SN5. In other words, the Overlay Index is a next hop in the overlay address space that can be an IP address, a MAC address, or an ESI. When advertised along with an IP prefix, the Overlay Index requires a recursive resolution to find out the egress NVE to which the EVPN packets need to be sent.¶
All the DC use cases in Figure 1 require inter-subnet forwarding; therefore, the individual host routes and subnets:¶
[RFC7432] defines a MAC/IP Advertisement route (also referred to as "RT-2") where a MAC address can be advertised together with an IP address length and IP address (IP). While a variable IP address length might have been used to indicate the presence of an IP prefix in a route type 2, there are several specific use cases in which using this route type to deliver IP prefixes is not suitable.¶
One such use case is the "floating IP" example described in Section 2.1. In this example, it is necessary to decouple the advertisement of the prefixes from the advertisement of a MAC address of either M2 or M3; otherwise, the solution becomes highly inefficient and does not scale.
For example, if 1,000 prefixes are advertised from M2 (using RT-2) and the floating IP owner changes from M2 to M3, 1,000 routes would be withdrawn by M2 and readvertised by M3. However, if a separate route type is used, 1,000 routes can be advertised as associated with the floating IP address (vIP23), and only one RT-2 can be used for advertising the ownership of the floating IP, i.e., vIP23 and M2 in the route type 2. When the floating IP owner changes from M2 to M3, a single RT-2 withdrawal/update is required to indicate the change. The remote DGW will not change any of the 1,000 prefixes associated with vIP23 but will only update the ARP resolution entry for vIP23 (now pointing at M3).¶
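The difference in churn can be illustrated with a small, simplified model. The sketch below is not part of the specification: the function names, the dictionary-based tables, and the prefix values are assumptions made for illustration, and a "touched" route stands for one that has to be withdrawn and readvertised.

```python
def failover_prefixes_in_rt2(prefixes, old_mac, new_mac):
    """Prefixes tied directly to the owner's MAC: every route is touched."""
    rib = {prefix: old_mac for prefix in prefixes}
    touched = 0
    for prefix in rib:
        rib[prefix] = new_mac  # withdrawn by the old owner, readvertised by the new one
        touched += 1
    return rib, touched


def failover_prefixes_in_rt5(prefixes, vip, old_mac, new_mac):
    """Prefixes tied to the floating IP: only the RT-2 for that IP changes."""
    prefix_to_overlay_index = {prefix: vip for prefix in prefixes}  # from RT-5s
    arp = {vip: old_mac}  # from the single RT-2 advertising vIP ownership
    arp[vip] = new_mac    # the only state that changes on failover
    return prefix_to_overlay_index, arp, 1


# Hypothetical set of 1,000 distinct /24 prefixes.
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
_, touched_rt2 = failover_prefixes_in_rt2(prefixes, "M2", "M3")
table, arp, touched_rt5 = failover_prefixes_in_rt5(prefixes, "vIP23", "M2", "M3")
print(touched_rt2, touched_rt5)
```

In the second function the prefix table is never modified; only the ARP entry for vIP23 moves from M2 to M3, which mirrors the behavior described for the remote DGW.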
An EVPN route (type 5) for the advertisement of IP prefixes is described in this document. This new route type has a differentiated role from the RT-2 route and addresses the inter-subnet connectivity scenarios for DCs (or NVO-based networks in general) described in this document. Using this new RT-5, an IP prefix may be advertised along with an Overlay Index, which can be a GW IP address, a MAC, or an ESI. The IP prefix may also be advertised without an Overlay Index, in which case the BGP next hop will point at the egress NVE, Area Border Router (ABR), or ASBR, and the MAC in the EVPN Router's MAC Extended Community will provide the inner MAC destination address to be used. As discussed throughout the document, the EVPN RT-2 does not meet the requirements for all the DC use cases; therefore, this EVPN route type 5 is required.¶
The EVPN route type 5 decouples the IP prefix advertisements from the MAC/IP Advertisement routes in EVPN. Hence:¶
The following sections describe how EVPN is extended with a route type for the advertisement of IP prefixes and how this route is used to address the inter-subnet connectivity requirements existing in the DC.¶
The BGP EVPN NLRI as defined in [RFC7432] is shown below:¶
This document defines an additional route type (RT-5) in the IANA "EVPN Route Types" registry [EVPNRouteTypes] to be used for the advertisement of EVPN routes using IP prefixes:¶
According to Section 5.4 of [RFC7606], a node that doesn't recognize the route type 5 (RT-5) will ignore it. Therefore, an NVE following this document can still be attached to a BD where an NVE ignoring RT-5s is attached. Regular procedures described in [RFC7432] would apply in that case for both NVEs. In case two or more NVEs are attached to different BDs of the same tenant, they MUST support the RT-5 for the proper inter-subnet forwarding operation of the tenant.¶
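As a rough illustration of this coexistence, the following sketch walks a sequence of EVPN NLRIs and sets aside those whose route type the receiver does not recognize. It assumes the [RFC7432] NLRI layout of a 1-octet Route Type, a 1-octet Length, and a variable-length value; the function name, the `known_types` parameter, and the placeholder payloads are hypothetical, and real BGP UPDATE parsing involves considerably more than this.

```python
RFC7432_ROUTE_TYPES = frozenset({1, 2, 3, 4})  # route types defined in RFC 7432


def split_evpn_nlris(data, known_types=RFC7432_ROUTE_TYPES):
    """Return (accepted, ignored) lists of (route_type, value) tuples."""
    accepted, ignored = [], []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NLRI header")
        route_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated NLRI value")
        # An unrecognized route type is skipped rather than treated as an error.
        target = accepted if route_type in known_types else ignored
        target.append((route_type, value))
        offset += 2 + length
    return accepted, ignored


# Placeholder payloads: a 3-octet "type 2" value and a 34-octet "type 5" value.
update = bytes([2, 3]) + b"abc" + bytes([5, 34]) + bytes(34)
legacy = split_evpn_nlris(update)
rt5_aware = split_evpn_nlris(update, known_types=RFC7432_ROUTE_TYPES | {5})
```

Here `legacy` models an NVE that ignores RT-5s and still accepts the type 2 entry, while `rt5_aware` accepts both.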
The detailed encoding of this route and associated procedures are described in the following sections.¶
An IP Prefix route type for IPv4 has the Length field set to 34 and consists of the following fields:¶
An IP Prefix route type for IPv6 has the Length field set to 58 and consists of the following fields:¶
Where:¶
The RD, Ethernet Tag ID, IP prefix length, and IP prefix are part of the route key used by BGP to compare routes. The rest of the fields are not part of the route key.¶
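The two Length values follow from the field sizes: RD (8 octets), ESI (10), Ethernet Tag ID (4), IP Prefix Length (1), IP Prefix (4 or 16), GW IP Address (4 or 16), and MPLS Label (3) add up to 34 octets for IPv4 and 58 for IPv6. The sketch below packs those fields and extracts the route key. It is an illustration only: the function names are hypothetical, and the label is written as a raw 24-bit value, which glosses over how an MPLS label or a VNI is actually placed in the 3-octet field.

```python
import ipaddress


def encode_rt5(rd, esi, eth_tag, prefix, gw_ip, label):
    """Pack an IP Prefix route NLRI: route type, length, then the fields."""
    net = ipaddress.ip_network(prefix)
    gw = ipaddress.ip_address(gw_ip)
    if net.version != gw.version:
        raise ValueError("IP Prefix and GW IP must be the same address family")
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    body = (
        rd
        + esi
        + eth_tag.to_bytes(4, "big")
        + bytes([net.prefixlen])
        + net.network_address.packed
        + gw.packed
        + label.to_bytes(3, "big")  # simplification: raw 24-bit value
    )
    return bytes([5, len(body)]) + body


def route_key(nlri):
    """Return (RD, Ethernet Tag ID, IP prefix length, IP prefix)."""
    addr_len = {34: 4, 58: 16}[nlri[1]]  # KeyError on any other Length
    body = nlri[2:]
    return (body[0:8], body[18:22], body[22], body[23 : 23 + addr_len])


rd, esi = bytes(8), bytes(10)  # all-zero placeholders
v4 = encode_rt5(rd, esi, 0, "192.0.2.0/24", "192.0.2.1", 0)
v4_other = encode_rt5(rd, esi, 0, "192.0.2.0/24", "0.0.0.0", 5000)
v6 = encode_rt5(rd, esi, 0, "2001:db8::/64", "::", 100)
```

Because the ESI, GW IP, and label are outside the route key, `v4` and `v4_other` compare as the same route even though their Overlay Index information differs.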
An IP Prefix route MAY be sent along with an EVPN Router's MAC Extended Community (defined in [RFC9135]) to carry the MAC address that is used as the Overlay Index. Note that the MAC address may be that of a TS.¶
As described in Section 3.2, certain data combinations in a received route would imply a treat-as-withdraw handling of the route [RFC7606].¶
RT-5 routes support recursive lookup resolution through the use of Overlay Indexes as follows:¶
In order to enable the recursive lookup resolution at the ingress NVE, an NVE that is a possible egress NVE for a given Overlay Index must originate a route advertising itself as the BGP next hop on the path to the system denoted by the Overlay Index. For instance:¶
Note that the RT-1 or RT-2 routes needed for the recursive resolution may arrive before or after the given RT-5 route.¶
The indirection provided by the Overlay Index and its recursive lookup resolution is required to achieve fast convergence in case of a failure of the object represented by the Overlay Index (see the example described in Section 2.2).¶
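The recursive resolution can be sketched as a two-step lookup. The code below is a simplified model under several assumptions: the dictionaries stand in for state learned from RT-2 and Ethernet A-D per EVI (RT-1) routes, the field and function names are hypothetical, and details such as best-path selection or multiple egress NVEs per Overlay Index are left out.

```python
def resolve_egress_nve(rt5, rt2_by_ip, rt2_by_mac, rt1_by_esi):
    """Return the egress NVE for an RT-5, or None while it is unresolved."""
    kind, value = rt5["overlay_index"]
    if kind is None:
        # No Overlay Index: the RT-5's own BGP next hop is the actual next hop.
        return rt5["bgp_next_hop"]
    tables = {"GW_IP": rt2_by_ip, "MAC": rt2_by_mac, "ESI": rt1_by_esi}
    # The BGP next hop of the matching RT-2 or RT-1 identifies the egress NVE.
    return tables[kind].get(value)


rt5 = {"prefix": "SN1/24", "overlay_index": ("GW_IP", "vIP23"), "bgp_next_hop": "NVE2"}
rt2_by_ip = {}

before = resolve_egress_nve(rt5, rt2_by_ip, {}, {})    # RT-2 not yet received
rt2_by_ip["vIP23"] = "NVE2"                            # RT-2 for vIP23 arrives
after = resolve_egress_nve(rt5, rt2_by_ip, {}, {})
rt2_by_ip["vIP23"] = "NVE3"                            # ownership moves
failover = resolve_egress_nve(rt5, rt2_by_ip, {}, {})
```

The RT-5 entry is never modified in this example; it is unresolved until the RT-2 arrives, and a change in the RT-2 alone redirects traffic for the prefix.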
Table 1 shows the different RT-5 field combinations allowed by this specification and what Overlay Index must be used by the receiving NVE/PE in each case. Cases where there is no Overlay Index are indicated as "None" in Table 1. If there is no Overlay Index, the receiving NVE/PE will not perform any recursive resolution, and the actual next hop is given by the RT-5's BGP next hop.¶
ESI | GW IP | MAC* | Label | Overlay Index |
---|---|---|---|---|
Non-Zero | Zero | Zero | Don't Care | ESI |
Non-Zero | Zero | Non-Zero | Don't Care | ESI |
Zero | Non-Zero | Zero | Don't Care | GW IP |
Zero | Zero | Non-Zero | Zero | MAC |
Zero | Zero | Non-Zero | Non-Zero | MAC or None** |
Zero | Zero | Zero | Non-Zero | None*** |
Table Notes:¶
If the combination of ESI, GW IP, MAC, and Label in the received RT-5 differs from the combinations shown in Table 1, the router will process the route as per the rules described at the beginning of this section (Section 3.2).
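Table 1 can be read as a decision function over four booleans. The sketch below is one possible transcription, not a normative algorithm: the function name and the `prefer_mac` flag are hypothetical, the flag merely stands in for whatever local choice decides the "MAC or None" row, and combinations outside the table are reported with a marker string rather than handled.

```python
def select_overlay_index(esi_nonzero, gw_ip_nonzero, mac_nonzero, label_nonzero,
                         prefer_mac=True):
    """Map an RT-5 field combination to the Overlay Index listed in Table 1."""
    if esi_nonzero and not gw_ip_nonzero:
        return "ESI"  # MAC may be zero or non-zero; Label is "Don't Care"
    if not esi_nonzero and gw_ip_nonzero and not mac_nonzero:
        return "GW IP"  # Label is "Don't Care"
    if not esi_nonzero and not gw_ip_nonzero and mac_nonzero:
        if not label_nonzero:
            return "MAC"
        # "MAC or None" row: modeled here as a local preference (assumption).
        return "MAC" if prefer_mac else "None"
    if not (esi_nonzero or gw_ip_nonzero or mac_nonzero) and label_nonzero:
        return "None"
    return "not in Table 1"


rows = [
    select_overlay_index(True, False, False, False),
    select_overlay_index(True, False, True, True),
    select_overlay_index(False, True, False, True),
    select_overlay_index(False, False, True, False),
    select_overlay_index(False, False, True, True, prefer_mac=False),
    select_overlay_index(False, False, False, True),
]
```

Evaluating the six rows of the table in order gives ESI, ESI, GW IP, MAC, None (with `prefer_mac=False`), and None.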
Table 2 shows the different inter-subnet use cases described in this document and the corresponding coding of the Overlay Index in the route type 5 (RT-5).¶
Section | Use Case | Overlay Index in the RT-5 |
---|---|---|
4.1 | TS IP address | GW IP |
4.2 | Floating IP address | GW IP |
4.3 | "Bump-in-the-wire" | ESI or MAC |
4.4 | IP-VRF-to-IP-VRF | GW IP, MAC, or None |
The above use cases are representative of the different Overlay Indexes supported by the RT-5 (GW IP, ESI, MAC, or None).¶
This section describes some use cases for the Overlay Index types used with the IP Prefix route. Although the examples use IPv4 prefixes and subnets, the descriptions of the RT-5 are valid for the same cases with IPv6, except that IP Prefixes, IPL, and GW IP are replaced by the corresponding IPv6 values.¶
Figure 5 illustrates an example of inter-subnet forwarding for subnets sitting behind VAs (on TS2 and TS3).¶
An example of inter-subnet forwarding between subnet SN1, which uses a 24-bit IP prefix (written as SN1/24 hereafter), and a subnet sitting in the WAN is described below. NVE2, NVE3, DGW1, and DGW2 are running BGP EVPN. TS2 and TS3 do not participate in dynamic routing protocols, and they only have a static route to forward the traffic to the WAN. SN1/24 is dual-homed to NVE2 and NVE3.
In this case, a GW IP is used as an Overlay Index. Although a different Overlay Index type could have been used, this use case assumes that the operator knows the VA's IP addresses beforehand, whereas the VA's MAC address is unknown and the VA's ESI is zero. Because of this, the GW IP is the suitable Overlay Index to be used with the RT-5s. The NVEs know the GW IP to be used for a given prefix by policy.¶
NVE2 advertises the following BGP routes on behalf of TS2:¶
Similarly, NVE3 advertises the following BGP routes on behalf of TS3:¶
DGW1 and DGW2 import both received routes based on the Route Targets:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
The IP packet destined to IPx is encapsulated with:¶
When the packet arrives at NVE2:¶
Note that in the opposite direction, TS2 will send traffic based on its static-route next-hop information (IRB1 and/or IRB2), and regular EVPN procedures will be applied.¶
Sometimes TSs work in active/standby mode where an upstream floating IP owned by the active TS is used as the Overlay Index to get to some subnets behind the TS. This redundancy mode, already introduced in Sections 2.1 and 2.2, is illustrated in Figure 6.¶
In this use case, a GW IP is used as an Overlay Index for the same reasons as in Section 4.1. However, this GW IP is a floating IP that belongs to the active TS. Assuming TS2 is the active TS and owns vIP23:¶
NVE2 advertises the following BGP routes for TS2:¶
NVE3 advertises the following BGP route for TS3 (it does not advertise an RT-2 for M3/vIP23):¶
DGW1 and DGW2 import both received routes based on the Route Target:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
The IP packet destined to IPx is encapsulated with:¶
When the packet arrives at NVE2:¶
Figure 7 illustrates an example of inter-subnet forwarding for an IP Prefix route that carries subnet SN1. In this use case, TS2 and TS3 are Layer 2 VA devices without any IP addresses that can be included as an Overlay Index in the GW IP field of the IP Prefix route. Their MAC addresses are M2 and M3, respectively, and they are connected to BD-10. Note that IRB1 and IRB2 (in DGW1 and DGW2, respectively) have IP addresses in a subnet different from SN1.
Since TS2 and TS3 cannot participate in any dynamic routing protocol and neither has an IP address assigned, there are two potential Overlay Index types that can be used when advertising SN1:¶
The advantage of using an ESI as the Overlay Index as opposed to the VA's MAC address is that the forwarding to the egress NVE can be done purely based on the state of the AC in the Ethernet segment (notified by the Ethernet A-D per EVI route), and all the EVPN multihoming redundancy mechanisms can be reused. For instance, the mass withdrawal mechanism described in [RFC7432] for fast failure detection and propagation can be used. It is assumed per this section that an ESI Overlay Index is used in this use case, but this use case does not preclude the use of the VA's MAC address as an Overlay Index. If a MAC is used as the Overlay Index, the control plane must follow the procedures described in Section 4.4.3.¶
The model supports VA redundancy in a similar way to the one described in Section 4.2 for the floating IP Overlay Index use case, except that it uses the EVPN Ethernet A-D per EVI route instead of the MAC advertisement route to advertise the location of the Overlay Index. The procedure is explained below:¶
Assuming TS2 is the active TS in ESI23, NVE2 advertises the following BGP routes:¶
NVE3 advertises the following BGP route for TS3 (no AD per EVI route is advertised):¶
DGW1 and DGW2 import the received routes based on the Route Target:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
The IP packet destined to IPx is encapsulated with:¶
When the packet arrives at NVE2:¶
This use case is similar to the scenario described in Section 9.1 of [RFC9135]; however, the new requirement here is the advertisement of IP prefixes as opposed to only host routes.¶
In the examples described in Sections 4.1, 4.2, and 4.3, the BD instance can connect IRB interfaces and any other Tenant Systems connected to it. EVPN provides connectivity for:¶
In order to provide connectivity for (1), MAC/IP Advertisement routes (RT-2) are needed so that IRB or TS MACs and IPs can be distributed. Connectivity type (2) is accomplished by the exchange of IP Prefix routes (RT-5) for IPs and subnets sitting behind certain Overlay Indexes, e.g., GW IP, ESI, or TS MAC.¶
In some cases, IP Prefix routes may be advertised for subnets and IPs sitting behind an IRB. This use case is referred to as the "IP-VRF-to-IP-VRF" model.¶
[RFC9135] defines an asymmetric IRB model and a symmetric IRB model based on the required lookups at the ingress and egress NVE. The asymmetric model requires an IP lookup and a MAC lookup at the ingress NVE, whereas only a MAC lookup is needed at the egress NVE; the symmetric model requires IP and MAC lookups at both the ingress and egress NVE. From that perspective, the IP-VRF-to-IP-VRF use case described in this section is a symmetric IRB model.¶
Note that in an IP-VRF-to-IP-VRF scenario, out of the many subnets that a tenant may have, it may be the case that only a few are attached to a given IP-VRF of the NVE/PE. In order to provide inter-subnet connectivity among the set of NVE/PEs where the tenant is connected, a new SBD is created on all of them if a recursive resolution is needed. This SBD is instantiated as a regular BD (with no ACs) in each NVE/PE and has an IRB interface that connects the SBD to the IP-VRF. The IRB interface's IP or MAC address is used as the Overlay Index for a recursive resolution.¶
Depending on the existence and characteristics of the SBD and IRB interfaces for the IP-VRFs, there are three different IP-VRF-to-IP-VRF scenarios identified and described in this document:¶
Inter-subnet IP multicast is outside the scope of this document.¶
Figure 8 depicts the Interface-less IP-VRF-to-IP-VRF model.¶
In this case:¶
In order to meet the above requirements, the EVPN route type 5 will be used to advertise the IP prefixes, along with the EVPN Router's MAC Extended Community as defined in [RFC9135] if the advertising NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will advertise an RT-5 for each of its prefixes with the following fields:¶
Each RT-5 will be sent with a Route Target identifying the tenant (IP-VRF) and may be sent with two BGP extended communities:¶
The following example illustrates the procedure to advertise and forward packets to SN1/24 (IPv4 prefix advertised from NVE1):¶
NVE1 advertises the following BGP route:¶
DGW1 imports the received routes from NVE1:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
When the packet arrives at NVE1:¶
The model described above is called an "interface-less" model since the IP-VRFs are connected directly through tunnels, and those tunnels do not need to be terminated in SBDs, unlike in Sections 4.4.2 and 4.4.3.
Figure 9 depicts the Interface-ful IP-VRF-to-IP-VRF with SBD IRB model.¶
In this model:¶
EVPN type 5 routes will be used to advertise the IP prefixes, whereas EVPN RT-2 routes will advertise the MAC/IP addresses of each SBD IRB interface. Each NVE/DGW will advertise an RT-5 for each of its prefixes with the following fields:¶
Each RT-5 will be sent with a Route Target identifying the tenant (IP-VRF). The EVPN Router's MAC Extended Community should not be sent in this case.¶
The following example illustrates the procedure to advertise and forward packets to SN1/24 (IPv4 prefix advertised from NVE1):¶
NVE1 advertises the following BGP routes:¶
DGW1 imports the received routes from NVE1:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
When the packet arrives at NVE1:¶
The model described above is called an "interface-ful with SBD IRB" model because the tunnels connecting the DGWs and NVEs need to be terminated into the SBD. The SBD is connected to the IP-VRFs via SBD IRB interfaces, and that allows the recursive resolution of RT-5s to GW IP addresses.¶
Figure 10 depicts the Interface-ful IP-VRF-to-IP-VRF with unnumbered SBD IRB model. Note that this model is similar to the one described in Section 4.4.2, only without IP addresses on the SBD IRB interfaces.¶
In this model:¶
This model will also make use of the RT-5 recursive resolution. EVPN type 5 routes will advertise the IP prefixes along with the EVPN Router's MAC Extended Community used for the recursive lookup, whereas EVPN RT-2 routes will advertise the MAC addresses of each SBD IRB interface (this time without an IP).¶
Each NVE/DGW will advertise an RT-5 for each of its prefixes with the same fields as described in Section 4.4.2, except:¶
Each RT-5 will be sent with a Route Target identifying the tenant (IP-VRF) and the EVPN Router's MAC Extended Community containing the MAC address associated with the SBD IRB interface. This MAC address may be reused for all the IP-VRFs in the NVE.¶
The example is similar to the one in Section 4.4.2:¶
NVE1 advertises the following BGP routes:¶
Route type 5 (IP Prefix route) containing the same values as in the example in Section 4.4.2, except:¶
Route type 2 (MAC route for the SBD IRB) with the same values as in Section 4.4.2, except:¶
DGW1 imports the received routes from NVE1:¶
When DGW1 receives a packet from the WAN with destination IPx, where IPx belongs to SN1/24:¶
When the packet arrives at NVE1:¶
The model described above is called an "interface-ful with unnumbered SBD IRB" model because it works as in Section 4.4.2 but without the SBD IRB interfaces having an IP address.
This document provides a set of procedures to achieve inter-subnet forwarding across NVEs or PEs attached to a group of BDs that belong to the same tenant (or VPN). The security considerations discussed in [RFC7432] apply to the intra-subnet forwarding or communication within each of those BDs. In addition, the security considerations in [RFC4364] should also be understood, since this document and [RFC4364] may be used in similar applications.¶
Contrary to [RFC4364], this document does not describe PE/CE route distribution techniques but rather considers the CEs as TSs or VAs that do not run dynamic routing protocols. This can be considered a security advantage, since dynamic routing protocols can be blocked on the NVE/PE ACs, not allowing the tenant to interact with the infrastructure's dynamic routing protocols.¶
In this document, the RT-5 may use a regular BGP next hop for its resolution or an Overlay Index that requires a recursive resolution to a different EVPN route (an RT-2 or an RT-1). In the latter case, it is worth noting that any action that ends up filtering or modifying the RT-2 or RT-1 routes used to convey the Overlay Indexes will modify the resolution of the RT-5 and therefore the forwarding of packets to the remote subnet.¶
IANA has registered value 5 in the "EVPN Route Types" registry [EVPNRouteTypes] defined by [RFC7432] as follows:¶
Value | Description | Reference |
---|---|---|
5 | IP Prefix | RFC 9136 |
The authors would like to thank Mukul Katiyar, Jeffrey Zhang, and Alex Nichol for their valuable feedback and contributions. Tony Przygienda and Thomas Morin also helped improve this document with their feedback. Special thanks to Eric Rosen for his detailed review, which really helped improve the readability and clarify the concepts. We also thank Alvaro Retana for his thorough review.¶