RFC 9062 | EVPN OAM Requirements/Framework | June 2021
Salam, et al. | Informational
This document specifies the requirements and reference framework for Ethernet VPN (EVPN) Operations, Administration, and Maintenance (OAM). The requirements cover the OAM aspects of EVPN and Provider Backbone Bridge EVPN (PBB-EVPN). The framework defines the layered OAM model encompassing the EVPN service layer, network layer, underlying Packet Switched Network (PSN) transport layer, and link layer but focuses on the service and network layers.¶
This document is not an Internet Standards Track specification; it is published for informational purposes.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9062.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
This document specifies the requirements and defines a reference framework for Ethernet VPN (EVPN) Operations, Administration, and Maintenance (OAM) [RFC6291]. In this context, we use the term "EVPN OAM" to loosely refer to the OAM functions required for and/or applicable to [RFC7432] and [RFC7623].¶
EVPN is a Layer 2 VPN (L2VPN) solution for multipoint Ethernet services with advanced multihoming capabilities that uses BGP for distributing Customer/Client Media Access Control (C-MAC) address reachability information over the core MPLS/IP network.¶
PBB-EVPN combines Provider Backbone Bridging (PBB) [IEEE-802.1Q] with EVPN in order to reduce the number of BGP MAC advertisement routes; provide client MAC address mobility using C-MAC [RFC7623] aggregation and Backbone MAC (B-MAC) [RFC7623] sub-netting; confine the scope of C-MAC learning to only active flows; offer per-site policies; and avoid C-MAC address flushing on topology changes.¶
This document focuses on the fault management and performance management aspects of EVPN OAM. It defines the layered OAM model encompassing the EVPN service layer, network layer, underlying Packet Switched Network (PSN) transport layer, and link layer but focuses on the service and network layers.¶
This document leverages concepts and draws upon elements defined and/or used in the following documents:¶
[RFC6136] specifies the requirements and a reference model for OAM as it relates to L2VPN services, pseudowires, and associated Packet Switched Network (PSN) tunnels. That document focuses on Virtual Private LAN Service (VPLS) and Virtual Private Wire Service (VPWS) solutions and services.¶
[RFC8029] defines mechanisms for detecting data plane failures in MPLS Label Switched Paths (LSPs), including procedures to check the correct operation of the data plane as well as mechanisms to verify the data plane against the control plane.¶
[IEEE-802.1Q] specifies the Ethernet Connectivity Fault Management (CFM) protocol, which defines the concepts of Maintenance Domains, Maintenance Associations, Maintenance End Points, and Maintenance Intermediate Points.¶
[Y.1731] extends Connectivity Fault Management in the following areas: it defines fault notification and alarm suppression functions for Ethernet and specifies mechanisms for Ethernet performance management, including loss, delay, jitter, and throughput measurement.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document uses the following terminology, much of which is defined in [RFC6136]:¶
Multiple layers come into play for implementing an L2VPN service using the EVPN family of solutions as listed below. The focus of this document is the service and network layers.¶
This layering extends to the set of OAM protocols that are involved in the ongoing maintenance and diagnostics of EVPN networks. Figure 1 below depicts the OAM layering and shows which devices have visibility into what OAM layer(s).¶
Service OAM and Network OAM mechanisms have visibility to the PE nodes but not to the P nodes. As such, they can be used to deduce whether the fault is in the customer's own network, the local CE-PE segment, the PE-PE segment, or the remote CE-PE segment(s). EVPN Transport OAM mechanisms can be used for fault isolation between the PEs and P nodes.¶
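The visibility rules described in this section can be summarized as a small lookup table. The following is a minimal sketch rather than part of the framework: the layer names and node roles come from the text, while the `VISIBILITY` table and both helper functions are hypothetical names introduced here for illustration.

```python
# Which node roles can see each OAM layer, as described in the text.
# The table and helper names are illustrative assumptions.
VISIBILITY = {
    "service": {"CE", "PE"},    # Service OAM is visible to CEs and PEs
    "network": {"PE"},          # EVPN Network OAM is visible to PEs only
    "transport": {"PE", "P"},   # Transport OAM spans PEs and P nodes
    "link": {"PE", "P"},        # Link OAM runs on links between PE and P nodes
}


def layers_visible_to(role: str) -> list[str]:
    """Return the OAM layers in which a node with the given role participates."""
    return sorted(layer for layer, roles in VISIBILITY.items() if role in roles)


def can_isolate_p_node_fault(layer: str) -> bool:
    """Only a layer that P nodes can see is able to localize a fault at a P node."""
    return "P" in VISIBILITY[layer]


print(layers_visible_to("PE"))
print(can_isolate_p_node_fault("service"))
```

Under these assumptions, a fault that Service OAM or Network OAM narrows down to the PE-PE segment still needs Transport OAM to identify the P node or link involved.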
Figure 2 below shows an example network where Ethernet domains are interconnected via EVPN using MPLS, and it shows the OAM mechanisms that are applicable at each layer. The details of the layers are described in the sections below.¶
The EVPN Service OAM protocol depends on what service-layer technology is being interconnected by the EVPN solution. In the case of [RFC7432] and [RFC7623], the service layer is Ethernet; hence, the corresponding Service OAM protocol is Ethernet CFM [IEEE-802.1Q].¶
EVPN Service OAM is visible to the CEs and EVPN PEs but not to the P nodes. This is because the PEs operate at the Ethernet MAC layer in [RFC7432] and [RFC7623], whereas the P nodes do not.¶
The EVPN PE MUST support Maintenance Intermediate Point (MIP) functions in the applicable Service OAM protocol (for example, Ethernet CFM). The EVPN PE SHOULD support Maintenance End Point (MEP) functions in the applicable Service OAM protocol, including both Up and Down MEP functions.¶
As shown in Figure 3, the MIP and MEP functions being referred to are logically located within the device's port operating at the customer level. (There could be MEPs/MIPs within PE ports facing the provider network, but they would not be relevant to EVPN Service OAM as the traffic passing through them will be encapsulated/tunneled, so any customer-level OAM messages will just be treated as data.) Down MEP functions are away from the core of the device while Up MEP functions are towards the core of the device (towards the PE forwarding mechanism in the case of a PE). OAM messages between the PE Up MEPs shown are a type of EVPN Network OAM, while such messages between the CEs or from a PE to its local CE or to the remote CE are Service OAMs.¶
The EVPN PE MUST, by default, learn the MAC addresses of locally attached CE MEPs by snooping on CFM frames and advertise them to remote PEs in MAC/IP Advertisement routes. Some means to limit the number of MAC addresses that a PE will learn SHOULD be implemented.¶
The EVPN PE SHOULD advertise any MEP/MIP local to the PE as a MAC/IP Advertisement route. Since these are not subject to mobility, they SHOULD be advertised with the static (sticky) bit set (see Section 15.2 of [RFC7432]).¶
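The learning and advertisement behavior above might be sketched as follows. This is an illustrative model only: `MacRoute` and `CfmMacLearner` are hypothetical names, the learning limit is assumed to be an operator-configured count, and no real BGP encoding is attempted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacRoute:
    """Hypothetical stand-in for an EVPN MAC/IP Advertisement route."""
    mac: str
    sticky: bool  # True models the static (sticky) bit being set


@dataclass
class CfmMacLearner:
    max_learned: int                          # assumed operator-configured cap
    learned: set = field(default_factory=set)

    def snoop_cfm_frame(self, src_mac: str):
        """Learn a CE MEP's MAC from a snooped CFM frame.

        Returns the route to advertise, or None if nothing new is learned.
        """
        if src_mac in self.learned:
            return None                       # already learned and advertised
        if len(self.learned) >= self.max_learned:
            return None                       # limit reached: learn no more
        self.learned.add(src_mac)
        # Sticky bit left clear: the text calls for it only on PE-local MEPs/MIPs.
        return MacRoute(mac=src_mac, sticky=False)

    @staticmethod
    def advertise_local_mep(mac: str) -> MacRoute:
        """MEPs/MIPs local to the PE are not subject to mobility: set the sticky bit."""
        return MacRoute(mac=mac, sticky=True)


learner = CfmMacLearner(max_learned=1)
print(learner.snoop_cfm_frame("00:11:22:33:44:55"))
print(learner.snoop_cfm_frame("00:11:22:33:44:66"))  # over the limit
```

How an implementation reacts when the limit is reached (silently ignoring, logging, or raising an alarm) is a design choice that this sketch does not address.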
EVPN Network OAM is visible to the PE nodes only. This OAM layer is analogous to Virtual Circuit Connectivity Verification (VCCV) [RFC5085] in the case of VPLS/VPWS. It provides mechanisms to check the correct operation of the data plane as well as a mechanism to verify the data plane against the control plane. This includes the ability to perform fault detection and diagnostics on:¶
EVPN Network OAM mechanisms MUST provide in-band monitoring capabilities. It is desirable, to the extent practical, for OAM test messages to share fate with data messages. Details of how to achieve this are beyond the scope of this document.¶
EVPN Network OAM SHOULD provide both proactive and on-demand mechanisms of monitoring the data plane operation and data plane conformance to the state of the control plane.¶
The Transport OAM protocol depends on the nature of the underlying transport technology in the PSN. MPLS OAM mechanisms [RFC8029] [RFC6425] as well as ICMP [RFC0792] and ICMPv6 [RFC4443] are applicable, depending on whether the PSN employs MPLS or IP transport, respectively. Furthermore, Bidirectional Forwarding Detection (BFD) mechanisms per [RFC5880], [RFC5881], [RFC5883], and [RFC5884] apply. Also, the BFD mechanisms pertaining to MPLS-TP LSPs per [RFC6428] are applicable.¶
Link OAM depends on the data-link technology being used between the PE and P nodes. For example, if Ethernet links are employed, then Ethernet Link OAM ([IEEE-802.3], Clause 57) may be used.¶
When interworking two networking domains, such as actual Ethernet and EVPN to provide an end-to-end emulated service, there is a need to identify the failure domain and location, even when a PE supports both the Service OAM mechanisms and the EVPN Network OAM mechanisms. In addition, scalability constraints may not allow the running of proactive monitoring, such as Ethernet Continuity Check Messages (CCMs) [IEEE-802.1Q], at a PE to detect the failure of an EVI across the EVPN domain. Thus, the mapping of alarms generated upon failure detection in one domain (e.g., actual Ethernet or EVPN network domain) to the other domain is needed. There are also cases where a PE may not be able to process Service OAM messages received from a remote PE over the PSN even when such messages are defined, as in the Ethernet case, thereby necessitating support for fault notification message mapping between the EVPN Network domain and the Service domain.¶
OAM interworking is not limited, though, to scenarios involving disparate network domains. It is possible to perform OAM interworking across different layers in the same network domain. In general, alarms generated within an OAM layer, as a result of proactive fault detection mechanisms, may be injected into its client-layer OAM mechanisms. This allows the client-layer OAM to trigger event-driven (i.e., asynchronous) fault notifications. For example, alarms generated by the Link OAM mechanisms may be injected into the Transport OAM layer, and alarms generated by the Transport OAM mechanism may be injected into the Network OAM mechanism, and so on.¶
EVPN OAM MUST support interworking between the Network OAM and Service OAM mechanisms. EVPN OAM MAY support interworking among other OAM layers.¶
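The layer-to-layer alarm injection described above can be illustrated with a short sketch. The server-to-client ordering follows the example in the text; the function name and the representation of enabled interworking as a set of `(server, client)` pairs are assumptions made for this example.

```python
# Server-to-client layer order, taken from the example in the text.
LAYER_ORDER = ["link", "transport", "network", "service"]


def propagate_alarm(origin: str, interworking: set) -> list:
    """Return the layers that see an alarm raised at `origin`.

    `interworking` holds (server, client) pairs for which alarm injection is
    enabled; propagation stops at the first pair that is not enabled.
    """
    reached = [origin]
    idx = LAYER_ORDER.index(origin)
    for server, client in zip(LAYER_ORDER[idx:], LAYER_ORDER[idx + 1:]):
        if (server, client) not in interworking:
            break
        reached.append(client)
    return reached


# Only the Network-to-Service pairing is a MUST; the others are optional.
mandatory_only = {("network", "service")}
print(propagate_alarm("network", mandatory_only))
print(propagate_alarm("link", mandatory_only))
```

With only the mandatory pairing enabled, a Link OAM alarm stays within the link layer, whereas a Network OAM alarm reaches the Service OAM layer.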
This section discusses the EVPN OAM requirements pertaining to fault management and performance management.¶
The network operator configures proactive fault management functions to run periodically. Certain actions (for example, protection switchover or alarm indication signaling) can be associated with specific events, such as entering or clearing fault states.¶
Proactive fault detection is performed by periodically monitoring the reachability between service end points, i.e., MEPs in a given Maintenance Association (MA), through the exchange of CCMs [IEEE-802.1Q]. The reachability between any two arbitrary MEPs may be monitored for:¶
The fact that MPLS/IP networks do not enforce congruency between unicast and multicast paths means that the proactive fault detection mechanisms for EVPN networks MUST provide procedures to monitor the unicast paths independently of the multicast paths. This applies to EVPN Service OAM and Network OAM.¶
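A minimal sketch of proactive continuity monitoring that tracks unicast and multicast paths separately is given here. It is not taken from this document: the class and parameter names are hypothetical, and the timeout of 3.5 times the CCM interval is an assumption based on common CFM practice.

```python
LOC_MULTIPLIER = 3.5  # assumption: CFM-style timeout of 3.5 x CCM interval


class ContinuityMonitor:
    """Tracks the last CCM arrival per (remote MEP, path type)."""

    def __init__(self, ccm_interval_s: float):
        self.timeout_s = LOC_MULTIPLIER * ccm_interval_s
        self.last_seen = {}  # (remote_mep_id, path_type) -> time in seconds

    def on_ccm(self, remote_mep_id: int, path_type: str, now_s: float) -> None:
        """Record a CCM received from a remote MEP over the given path type."""
        self.last_seen[(remote_mep_id, path_type)] = now_s

    def loc_defects(self, now_s: float) -> list:
        """Return the monitored (MEP, path type) keys whose CCMs have timed out."""
        return sorted(
            key for key, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


mon = ContinuityMonitor(ccm_interval_s=1.0)
mon.on_ccm(2, "unicast", now_s=0.0)
mon.on_ccm(2, "multicast", now_s=0.0)
mon.on_ccm(2, "unicast", now_s=3.0)   # unicast path keeps delivering
print(mon.loc_defects(now_s=4.0))     # only the multicast path has timed out
```

Keying the state on the path type is what lets a multicast failure be detected even while the unicast path to the same MEP remains healthy.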
Defect indications can be categorized into two types: forward and reverse, as described below. EVPN Service OAM MUST support at least one of these types of event-driven defect indications upon the detection of a connectivity defect.¶
A Forward Defect Indication (FDI) is used to signal a failure that is detected by a lower-layer OAM mechanism. A server MEP (i.e., an actual or virtual MEP) transmits a forward defect indication in a direction away from the direction of the failure (refer to Figure 4 below).¶
Forward defect indication may be used for alarm suppression and/or for the purpose of interworking with other layer OAM protocols. Alarm suppression is useful when a transport-level or network-level fault translates to multiple service- or flow-level faults. In such a scenario, it is enough to alert a network management station (NMS) of the single transport-level or network-level fault in lieu of flooding that NMS with a multitude of Service or Flow granularity alarms. EVPN PEs SHOULD support forward defect indication in the Service OAM mechanisms.¶
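The alarm suppression idea can be sketched as a simple filter. The alarm record layout (a dict with `level` and `path` keys) and the function name are assumptions made for illustration; real systems would key suppression on whatever identifies the shared server-layer resource.

```python
def suppress_alarms(alarms: list) -> list:
    """Drop service-level alarms whose transport path already has its own alarm.

    Each alarm is a dict with keys 'level' ('transport' or 'service') and
    'path' (an identifier of the transport path the alarm relates to).
    """
    faulted_paths = {a["path"] for a in alarms if a["level"] == "transport"}
    return [
        a for a in alarms
        if a["level"] == "transport" or a["path"] not in faulted_paths
    ]


alarms = [
    {"level": "transport", "path": "lsp-1"},
    {"level": "service", "path": "lsp-1", "evi": 10},
    {"level": "service", "path": "lsp-1", "evi": 20},
    {"level": "service", "path": "lsp-2", "evi": 30},
]
for alarm in suppress_alarms(alarms):
    print(alarm)
```

In this example, the two service alarms riding on `lsp-1` are suppressed in favor of the single transport alarm, while the unrelated service alarm on `lsp-2` is still reported.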
A Reverse Defect Indication (RDI) is used to signal that the advertising MEP has detected a Loss of Continuity (LOC) defect. RDI is transmitted in the direction of the failure (refer to Figure 5).¶
RDI allows single-sided management, where the network operator can examine the state of a single MEP and deduce the overall health of a monitored service. EVPN PEs SHOULD support reverse defect indication in the Service OAM mechanisms. This includes both the ability to signal a LOC defect to a remote MEP as well as the ability to recognize RDI from a remote MEP. Note that, in a multipoint MA, RDI is not a useful indicator of unidirectional fault. This is because RDI carries no indication of the affected MEP(s) with which the sender had detected a LOC defect.¶
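The limitation of RDI in a multipoint MA can be made concrete with a toy model. Here RDI is modelled as a single boolean, which is an assumption of this sketch; the function name and MEP identifiers are hypothetical.

```python
def advertised_rdi(peers_in_loc: set) -> bool:
    """RDI modelled as one flag: set when this MEP has a LOC defect toward any peer."""
    return len(peers_in_loc) > 0


# Two different failure scenarios seen by the same MEP in a four-MEP MA:
only_mep2_lost = advertised_rdi({2})
mep3_and_mep4_lost = advertised_rdi({3, 4})

# Remote MEPs receive the same indication in both cases, so they cannot tell
# which peer(s) are affected or whether they themselves are among them.
print(only_mep2_lost == mep3_and_mep4_lost)
```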
On-demand fault management functions are initiated manually by the network operator and continue for a bounded time period. These functions enable the operator to run diagnostics to investigate a defect condition.¶
EVPN Network OAM MUST support on-demand connectivity verification mechanisms for unicast and multicast destinations. The connectivity verification mechanisms SHOULD provide a means for specifying and carrying the following in the messages:¶
EVPN Network OAM MUST support connectivity verification at per-flow granularity. This includes both user flows (to test a specific path between PEs) as well as test flows (to test a representative path between PEs).¶
EVPN Service OAM MUST support connectivity verification on test flows and MAY support connectivity verification on user flows.¶
For multicast connectivity verification, EVPN Network OAM MUST support reporting on:¶
EVPN OAM MUST support an on-demand fault localization function. This involves the capability to narrow down the locality of a fault to a particular port, link, or node. The characteristic of forward/reverse path asymmetry in MPLS/IP makes fault isolation a direction-sensitive operation. That is, given two PEs A and B, localization of continuity failures between them requires running fault-isolation procedures from PE A to PE B as well as from PE B to PE A.¶
EVPN Service OAM mechanisms have visibility to the PEs but not to the MPLS or IP P nodes. As such, they can be used to deduce whether the fault is in the customer's own network, the local CE-PE segment, or a remote CE-PE segment. EVPN Network and Transport OAM mechanisms can be used for fault isolation between the PEs and P nodes.¶
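Because forward and reverse paths may differ, fault localization has to be run once per direction. The sketch below illustrates this under stated assumptions: `probe` is a caller-supplied callable standing in for whatever hop-by-hop test the OAM layer provides, and all function and node names are hypothetical.

```python
from typing import Callable, Optional


def first_unreachable_hop(
    probe: Callable[[str, str], bool], src: str, hops: list
) -> Optional[str]:
    """Probe hops in forwarding order from `src`; return the first that fails, if any."""
    for hop in hops:
        if not probe(src, hop):
            return hop
    return None


def localize_bidirectional(probe, pe_a, pe_b, hops_a_to_b, hops_b_to_a) -> dict:
    """Test each direction separately, since the two paths need not be congruent."""
    return {
        f"{pe_a}->{pe_b}": first_unreachable_hop(probe, pe_a, hops_a_to_b),
        f"{pe_b}->{pe_a}": first_unreachable_hop(probe, pe_b, hops_b_to_a),
    }


# Simulated network in which only the reverse path is broken, at node P3.
broken = {("PE-B", "P3")}
result = localize_bidirectional(
    lambda src, hop: (src, hop) not in broken,
    "PE-A", "PE-B",
    hops_a_to_b=["P1", "P2", "PE-B"],
    hops_b_to_a=["P3", "P4", "PE-A"],
)
print(result)
```

A test run only from PE-A would report a healthy path in this scenario, which is why both directions are exercised.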
Performance management functions can be performed both proactively and on demand. Proactive management involves a recurring function, where the performance management probes are run continuously without a trigger. We cover both proactive and on-demand functions in this section.¶
EVPN Network OAM SHOULD provide mechanisms for measuring packet loss for a given service; see, for example, [RFC7680] and [RFC6673].¶
Given that EVPN provides inherent support for multipoint-to-multipoint connectivity, packet loss cannot be accurately measured by means of counting user data packets. This is because user packets can be delivered to more PEs or more ports than are necessary (e.g., due to broadcast, unpruned multicast, or unknown unicast flooding). As such, a statistical means of approximating the packet loss rate is required. This can be achieved by sending "synthetic" OAM packets that are counted only by those ports (MEPs) that are required to receive them. This provides a statistical approximation of the number of data frames lost, even with multipoint-to-multipoint connectivity.¶
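The synthetic-frame approach reduces to a ratio of counters kept per pair of MEPs. The following sketch shows that arithmetic only; the function and parameter names are hypothetical, and how the counters are exchanged between MEPs is out of scope here.

```python
def synthetic_loss_ratio(tx_synthetic: int, rx_synthetic: int) -> float:
    """Approximate frame loss ratio from synthetic OAM frame counters.

    tx_synthetic: synthetic frames sent toward one specific peer MEP.
    rx_synthetic: synthetic frames that peer reports having received.
    """
    if tx_synthetic <= 0:
        raise ValueError("at least one synthetic frame must be sent")
    if not 0 <= rx_synthetic <= tx_synthetic:
        raise ValueError("received count must be between 0 and the sent count")
    return (tx_synthetic - rx_synthetic) / tx_synthetic


print(synthetic_loss_ratio(tx_synthetic=1000, rx_synthetic=990))
```

Because only the intended receiver counts the synthetic frames, flooded or replicated user traffic does not distort the result; the accuracy of the estimate depends on how many synthetic frames are sent.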
EVPN Service OAM SHOULD support measurement of one-way and two-way packet delay and delay variation (jitter) across the EVPN network. Measurement of one-way delay requires clock synchronization between the probe source and target devices. Mechanisms for clock synchronization are outside the scope of this document. Note that Service OAM performance management mechanisms defined in [Y.1731] can be used. See also [RFC7679], [RFC2681], and [RFC3393].¶
EVPN Network OAM MAY support measurement of one-way and two-way packet delay and delay variation (jitter) across the EVPN network.¶
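The delay metrics mentioned above can be sketched from timestamps as follows. This is an illustrative calculation with hypothetical function and parameter names; it assumes a four-timestamp request/reply exchange for the two-way case and treats delay variation as the difference between consecutive delay samples.

```python
def two_way_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Round-trip delay with the responder's processing time removed.

    t1: request sent by initiator     t2: request received by responder
    t3: reply sent by responder       t4: reply received by initiator
    t1/t4 come from the initiator clock and t2/t3 from the responder clock,
    so the two clocks do not need to be synchronized.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delay(t_sent: float, t_received: float) -> float:
    """One-way delay; only meaningful when both clocks are synchronized."""
    return t_received - t_sent


def delay_variation(delays: list) -> list:
    """Differences between consecutive delay samples (a simple jitter series)."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# Responder clock is offset by 100 time units; the offset cancels out.
print(two_way_delay(t1=0.0, t2=105.0, t3=107.0, t4=12.0))
print(delay_variation([10.0, 12.0, 11.0]))
```

The example shows why only the one-way measurement depends on clock synchronization: in the two-way case, each subtraction uses timestamps from a single clock.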
EVPN OAM MUST prevent OAM packets from leaking outside of the EVPN network or outside their corresponding Maintenance Domain. This can be done for CFM, for example, by having MEPs implement a filtering function based on the Maintenance Level associated with received OAM packets.¶
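Level-based filtering at a MEP might look like the following sketch. It assumes conventional CFM handling, in which Maintenance Levels range from 0 to 7 and a MEP compares the level of each received frame with its own; the function name and the returned action labels are hypothetical.

```python
def mep_action(frame_level: int, mep_level: int) -> str:
    """Classify a received CFM frame by comparing Maintenance Levels (0-7)."""
    if not (0 <= frame_level <= 7 and 0 <= mep_level <= 7):
        raise ValueError("Maintenance Levels range from 0 to 7")
    if frame_level < mep_level:
        return "drop"      # lower-level OAM must not cross this MEP
    if frame_level == mep_level:
        return "process"   # terminate and handle the frame at this MEP
    return "forward"       # higher-level OAM passes through transparently


for level in (2, 4, 6):
    print(level, mep_action(frame_level=level, mep_level=4))
```

Placing MEPs at the edge of a Maintenance Domain therefore keeps that domain's OAM frames, and those of any lower-level domain, from leaking beyond it.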
EVPN OAM SHOULD provide mechanisms for implementation and optional use to:¶
This document has no IANA actions.¶
The authors would like to thank the following for their review of this work and their valuable comments: David Black, Martin Duke, Xiao Min, Gregory Mirsky, Zaheduzzaman Sarker, Dave Schinazi, John Scudder, Melinda Shore, Robert Wilton, Alexander Vainshtein, Stig Venaas, and Éric Vyncke.¶