summaries-se-ost

Computer Networks 2

1.ROUTING


1.1.ADMINISTRATIVE DISTANCE (AD)

When multiple sources exist for routing information, such as static routes and BGP, a Cisco router uses the concept of administrative distances to prefer one routing source to the others. The protocol with the lowest administrative distance wins. The accepted best route is then installed in the routing table.

The administrative distances of the most common routing protocols are shown in the table below.

When running show ip route, the output is structured as follows:

[Protocol] [Type] [Net] [AD/Cost] [Via] [Updated] [If]

The first number in the brackets (e.g. [160/...]) is the administrative distance of the information source; the second number (e.g. [.../5]) is the metric for the route.
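The bracketed pair can be pulled out of a route line with a small parser. This is only a sketch: the sample line is hypothetical and not taken from a real device, and the function name is an assumption made for illustration.

```python
import re

# Matches the bracketed "[AD/metric]" pair in a route line.
AD_METRIC = re.compile(r"\[(\d+)/(\d+)\]")

def parse_ad_metric(line: str) -> tuple[int, int]:
    """Return (administrative distance, metric) from one route line."""
    match = AD_METRIC.search(line)
    if match is None:
        raise ValueError("no [AD/metric] pair found")
    return int(match.group(1)), int(match.group(2))

# Hypothetical sample line, made up for this example.
sample = "O 10.1.1.0/24 [110/20] via 192.168.1.2, 00:00:12, GigabitEthernet0/0"
print(parse_ad_metric(sample))  # (110, 20)
```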

Protocol AD
RIP v1 and v2 120
EIGRP Internal 90
EIGRP External 170
OSPF 110
Integrated ISIS 115
BGP Internal 200
BGP External 20
Static to Next Hop 1
Static to Interface 1
Connected 0
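
The "lowest administrative distance wins" rule can be sketched as a lookup over the table above. The dictionary keys are hypothetical labels chosen for this example; only the numeric values come from the table.

```python
# Administrative distances copied from the table above.
# The key names are illustrative labels, not Cisco keywords.
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp_internal": 90,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "eigrp_external": 170,
    "ibgp": 200,
}

def preferred_source(sources):
    """Pick the routing source with the lowest administrative distance."""
    return min(sources, key=lambda name: ADMIN_DISTANCE[name])

print(preferred_source(["ospf", "rip", "ebgp"]))  # ebgp
```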
2.INTERIOR GATEWAY PROTOCOLS (IGP)

Establish the global connectivity between routers, within an AS.

Link–state routing protocols use a two-layer area hierarchy composed of one backbone area and multiple regular areas which have to be connected to the backbone area.

2.1.OPEN SHORTEST PATH FIRST (OSPF)

OSPF is an instance of a link state protocol designed for intra-domain routing in an IP network. OSPF gathers link state information from available routers and constructs a topology map of the network. The version of OSPF used in IPv4 networks is known as OSPF version 2 (OSPFv2). OSPF for IPv6 networks is known as OSPFv3.

2.1.1.SPF calculation

Every time there is a change in the network topology, OSPF needs to reevaluate its shortest path calculations.

  • For each intra-area topology change, routers must rerun SPF.
  • An inter-area topology change does not trigger the SPF recalculation.

    • The router determines the best paths for interarea routes based on the calculation of the best path towards the ABR.
    • The changes that are described in type 3 LSAs do not influence how the router reaches the ABR.

    SPF recalculation is not needed.

2.1.2.Network hierarchy

OSPF provides the functionality to divide an intra-domain network into sub-domains (areas). Areas are identified through a 32-bit area field. Area ID 0 is the same as 0.0.0.0. Every intra-domain must have a core area with area ID 0 (backbone area). All other areas connected to the backbone area are referred to as low-level areas. The backbone area is in charge of summarizing the topology of one area to another area and vice versa.

2.1.3.Router classification

The routers are classified into four different types according to RFC 2328

Term Definition
Internal Routers (IR)
A router with all directly connected networks belonging to the same area. These routers run a single copy of the basic routing algorithm.
Area Border Routers (ABR)
A router that attaches to multiple areas. Area border routers run multiple copies of the basic algorithm, one copy for each attached area. Area border routers condense the topological information of their attached areas for distribution to the backbone. The backbone in turn distributes the information to the other areas.
Backbone Routers (BR)
A router that has an interface to the backbone area. This includes all routers that interface to more than one area (i.e., area border routers). However, backbone routers do not have to be area border routers. Routers with all interfaces connecting to the backbone area are supported.
AS Boundary Routers (ASBR)
A router that exchanges routing information with routers belonging to other Autonomous Systems. Such a router advertises AS external routing information throughout the Autonomous System. The paths to each AS boundary router are known by every router in the AS. This classification is completely independent of the previous classifications: AS boundary routers may be internal or area border routers, and may or may not participate in the backbone.
2.1.4.Network types

OSPF is designed to address four different types of networks:

2.1.4.1.Point-to-point networks

Point-to-point networks refer to connecting a pair of routers directly by an interface/link.

2.1.4.2.Broadcast networks

Broadcast networks are multi-access where all routers in a broadcast network can receive a single transmitted packet. In such networks, a router is elected as a Designated Router (DR) and another as a Backup Designated Router (BDR).

2.1.4.3.Non-broadcast multi-access networks (NBMA)

Non-broadcast multi-access networks (NBMA) are networks where more than two routers may be connected without broadcast capability. Such networks require an extra configuration to emulate the operation of OSPF on a broadcast network. Like broadcast networks, NBMA networks elect a DR and a BDR.

2.1.4.4.Point-to-multipoint networks

Point-to-multipoint networks are also non-broadcast networks much like NBMA networks. However, OSPF’s mode of operation is different and is similar to point-to-point links.

2.1.4.5.Optimization on Non-Point-to-Point Networks
  • A Designated Router (DR) and a Backup Designated Router (BDR) are elected based on priority, with the router ID as tie-breaker
  • DR performs the LSA forwarding and LSDB synchronization tasks on behalf of all routers on the broadcast domain
  • Each router establishes a FULL adjacency with the DR and the BDR by using the IPv4 multicast address 224.0.0.6
  • The BDR performs the DR tasks only if the DR fails.
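
The election criterion can be sketched as a ranking by priority, with the router ID as tie-breaker. This simplification assumes all routers start at the same time. It also assumes that priority 0 makes a router ineligible and that the higher value wins in both comparisons, neither of which is stated above. The real election is non-preemptive and has more steps. Router IDs and priorities in the example are made up.

```python
import ipaddress

def elect_dr_bdr(routers):
    """routers: list of (router_id, priority) tuples.

    Returns (dr, bdr) router IDs; either may be None.
    """
    # Assumption: a priority of 0 means the router never becomes DR or BDR.
    eligible = [r for r in routers if r[1] > 0]
    ranked = sorted(
        eligible,
        key=lambda r: (r[1], int(ipaddress.IPv4Address(r[0]))),
        reverse=True,  # highest priority first, then highest router ID
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr

routers = [("1.1.1.1", 1), ("2.2.2.2", 1), ("3.3.3.3", 0), ("4.4.4.4", 5)]
print(elect_dr_bdr(routers))  # ('4.4.4.4', '2.2.2.2')
```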
2.1.5.Virtual links
  • Virtual links cannot go through more than one area.
  • Virtual links can only run through standard non-backbone areas. (not over stubby areas for example)
2.1.6.Passive Interfaces

The passive interface is used on interfaces where the router is not expected to form any OSPF neighbor adjacency. On a passive interface, the router stops sending and receiving OSPF Hello packets.

2.1.7.Link State Advertisement (LSA) Types

OSPF floods routing information such as link state advertisements. The scope of flooding of OSPF packets depends on the LSA types. The six different LSA types are:

2.1.7.1.Router LSA (Type 1)

Every router generates a Router LSA that lists all of the router’s outgoing interfaces. For each interface, the state and cost of the link are included. Such LSAs are generated for point-to-point links.

  • Type: All routers in Area
  • Scope: Flooding of Router LSAs is restricted to the area where they originate. [AREA]
2.1.7.2.Network LSA (Type 2)

Network LSAs are applicable in broadcast and non-broadcast networks where they are generated by the DR. A Network LSA represents a LAN. All attached routers and the DR are listed in the Network LSA.

  • Type: Only DRs
  • Scope: Flooding of Network LSAs is also restricted to the area where they originate. [AREA]
2.1.7.3.Network Summary LSA (Type 3)

Area Border Routers (ABR) generate Network Summary LSAs that are used for advertising destinations outside an area.

  • Type: Only ABRs
  • Scope: Flooded in all the areas that are not totally stubby. [AREA]
2.1.7.4.ASBR Summary LSA (Type 4)

Identifies the ASBR and provides a route to the ASBR. All traffic that is destined to an external AS requires routing table knowledge of the ASBR that originated the external routes.

The ASBR sends a type 1 router LSA with a bit (known as the external bit) that is set to identify itself as an ASBR. When the ABR (identified with the border bit in the router LSA) receives this type 1 LSA, the ABR builds a type 4 LSA and floods it to the subsequent areas.

  • Type: Only ABRs
  • Scope: [AREA]
2.1.7.5.AS External LSA (Type 5)

AS External LSAs are generated by ASBRs and propagate the external networks within the OSPF domain. Destinations external to an OSPF AS are advertised using AS external LSAs.

  • Type: ASBR and ABR
  • Scope: AS external LSAs are flooded in all the areas that are neither stub nor totally stubby. [DOMAIN]
2.1.7.6.NSSA External LSA (Type 7)

Type 7 LSAs also carry external networks, but only inside a not-so-stubby area (NSSA), because NSSAs do not allow type 5 external LSAs.

  • Type: ASBRs
  • Scope: [AREA]
2.1.7.7.Summary

OSPF Accepted LSAs per Area Type

2.1.8.Route types
Term Definition
Intra-area routes Are originated and learned in the same local area (O)
Inter-area routes Originate in other areas and are inserted into the local area to which your router belongs (O IA)
External routes (O E1 or O E2)
2.1.9.Area types
Term Definition
Backbone Area
  • Has to be connected to all of the other areas.
  • Must always be contiguous
  • Generally, end users are not found within a backbone area
Stub Area
  • Eliminates all external routes (type 5)
  • Creates a default route
  • Cannot contain ASBRs
Standard Area
  • All routes present
Totally Stubby Area
  • Eliminates all external routes (type 5)
  • Eliminates all inter-area routes (type 3)
  • Creates a default route
  • Cannot contain ASBRs
Not so Stubby Area (NSSA)
  • Allows injection of external routes into a stub area
  • Eliminates all external routes* (type 5)
  • No default route
  • Creates own area type 7 LSA

    * advertises area 4 ASBR redistributed external routes as LSA type 7 in area 4 itself. When traversing to a different OSPF area, it transforms them to a regular type 5 LSA

Totally Not so Stubby Area (Totally NSSA)
  • Eliminates all external routes* (type 5)
  • Eliminates all inter-area routes (type 3)
  • Creates a default route
  • Creates own area type 7 LSA

    * advertises area 5 ASBR redistributed external routes as LSA type 7 in area 5 itself. When traversing to a different OSPF area, it transforms them to a regular type 5 LSA

2.1.9.1.Stub areas and stub networks
  • Stubby = Eliminates all external routes (type 5)
  • Totally stubby = Additionally eliminates all inter-area routes (type 3)
2.1.10.Flooding

OSPF sits directly on top of IP in the TCP/IP stack by using the IP protocol number 89. OSPF packets use the multicast destination IP address 224.0.0.5. Because it cannot rely on a reliable transport protocol such as TCP, OSPF is required to provide its own reliability mechanism. OSPF addresses reliable delivery of packets through use of either an implicit or explicit acknowledgment.

Since a router may not receive acknowledgment from its neighbor to whom it sent a link state update message, a router is required to track a link state retransmission list of outstanding updates.

  • An LSA is retransmitted, always as unicast, on a periodic basis until an acknowledgment is received, or the adjacency is no longer available.
  • A router floods all its LSAs every 30 minutes, regardless of whether the content of the LSA such as the metric value has changed. Hence, the Link State Database (LSDB) is always synchronized between all routers in an area
2.1.11.Packet format

OSPF has 5 packet types:

Packet Type Function Transmission Mode Address
Hello Discovery/Maintain Usually Multicast 224.0.0.5*
DBD Database Summary Unicast Neighbor IP
LSR Request LSA Unicast Neighbor IP
LSU Link State Update Multicast/Unicast 224.0.0.5/.6* or Unicast
LSAck Acknowledgement Multicast/Unicast 224.0.0.5/.6* or Unicast

* 224.0.0.5 for all routers, 224.0.0.6 for DR/BDR

2.1.11.1.Hello Packet (Hello)

The primary purpose of the hello packet is to establish and maintain adjacencies. The hello packet is also used in the election process of the Designated Router and Backup Designated Router in broadcast networks. Moreover, it is used for negotiating optional capabilities.

Network Mask
Hello Interval Options Priority
Router Dead Interval
Designated Router
Backup Designated Router
Neighbors (4 bytes each)
Field Definition
Network Mask (32 bit) Address mask of the router interface from which this packet is sent.
Hello Interval (16 bit) Time difference in seconds between any two hello packets. The sending and the receiving routers are required to maintain the same value. Otherwise, a neighbor relationship is not established. For point-to-point and broadcast networks, the default value is 10s, for other network types it’s 30s.
Options (8 bit) Allow compatibility with a neighboring router to be checked.
Priority (8 bit) Used when electing the DR and the BDR.
Router Dead Interval (32 bit) Length of time in which a router will declare a neighbor to be dead if it does not receive a hello packet. Needs to be larger than the hello interval. The neighbors also need to agree on the value of this parameter. A routing packet that is received and does not match this field on a receiving router’s interface is dropped. The default value is four times the default value for the hello interval (40s and 120s).
Designated Router (32 bit) DR/BDR field lists the IP address of the interface of the DR/BDR on the network, but not its router ID. If the DR/BDR field is 0.0.0.0, this means that there is no DR/BDR.
Neighbors (4 bytes each) This field is repeated for each router from which the originating router has received a valid Hello recently, meaning in the past Router Dead Interval.
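
The field widths listed above map directly onto a fixed binary layout. The following sketch packs and parses only the Hello body (not the common OSPF header) and checks the two timer fields that must agree between neighbors. All function names and the sample values are assumptions made for illustration.

```python
import socket
import struct

# Fixed part of the Hello body: mask, hello interval, options, priority,
# dead interval, DR, BDR (20 bytes, network byte order).
HELLO_FIXED = struct.Struct("!4sHBBI4s4s")

def pack_hello(mask, hello, options, priority, dead, dr, bdr, neighbors):
    """Build the Hello body (without the common OSPF header)."""
    fixed = HELLO_FIXED.pack(
        socket.inet_aton(mask), hello, options, priority, dead,
        socket.inet_aton(dr), socket.inet_aton(bdr),
    )
    # One 4-byte entry per neighbor recently heard from.
    return fixed + b"".join(socket.inet_aton(n) for n in neighbors)

def unpack_hello(body):
    """Parse a Hello body back into a dictionary of fields."""
    mask, hello, options, priority, dead, dr, bdr = HELLO_FIXED.unpack_from(body)
    rest = body[HELLO_FIXED.size:]
    return {
        "mask": socket.inet_ntoa(mask),
        "hello": hello,
        "options": options,
        "priority": priority,
        "dead": dead,
        "dr": socket.inet_ntoa(dr),
        "bdr": socket.inet_ntoa(bdr),
        "neighbors": [socket.inet_ntoa(rest[i:i + 4]) for i in range(0, len(rest), 4)],
    }

def timers_match(a, b):
    """Hello and dead intervals must agree for a neighbor relationship."""
    return a["hello"] == b["hello"] and a["dead"] == b["dead"]

# Made-up values: broadcast defaults (10s / 40s), no BDR yet, one neighbor.
body = pack_hello("255.255.255.0", 10, 0x02, 1, 40, "10.0.0.1", "0.0.0.0", ["2.2.2.2"])
print(len(body), unpack_hello(body)["neighbors"])  # 24 ['2.2.2.2']
```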
2.1.11.2.Database Description Packet (DBD)

The database description packet contains a summary of all the LSAs (not the entire LSAs) that the neighboring router has in its LSDB. It includes:

  • Link-state type
  • Address of advertising router
  • Link-cost
  • Sequence number
Interface MTU Options 0 0 0 0 0 I M MS
DD Sequence Number
LSA Headers
Field Definition
Interface MTU (16 bit) Indicates the size of the largest transmission unit the interface can handle without fragmentation.
Options (8 bit) Consist of several bit-level fields. The most interesting one is the E-bit which is set when the attached area is capable of processing AS-external-LSAs.
I (1 bit) I-bit (initial-bit) is initialized to 1 for the initial packet that starts a database description session; for other packets for the same session, this field is set to 0.
M (1 bit) M-bit (more-bit) is used to indicate that this packet is not the last packet for the database description session by setting it to 1; the last packet for this session is set to 0.
MS (1 bit) MS-bit (master-slave bit) is used to indicate that the originator is the master by setting this field to 1, while the slave sets this field to 0.
DD Sequence Number (32 bit) Used for incrementing the sequence numbers of packets from the side of the master during a database description session. The master sets the initial value for the sequence number.
LSA Headers (20 bytes each) Lists headers of the link state advertisements in the originator’s link state database.
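
The three flag bits sit in the low-order bits of the flags byte, in the order shown in the layout above (0 0 0 0 0 I M MS). A minimal decoding sketch, with constant and function names chosen for illustration:

```python
FLAG_MS = 0x01  # master/slave bit (lowest-order bit)
FLAG_M = 0x02   # more bit
FLAG_I = 0x04   # initial bit

def decode_dbd_flags(flags: int) -> dict:
    """Split the DBD flags byte into its I, M and MS bits."""
    return {
        "I": bool(flags & FLAG_I),
        "M": bool(flags & FLAG_M),
        "MS": bool(flags & FLAG_MS),
    }

# First packet of a session sent by a router claiming to be master.
print(decode_dbd_flags(0x07))  # {'I': True, 'M': True, 'MS': True}
# Last packet sent by the slave.
print(decode_dbd_flags(0x00))  # {'I': False, 'M': False, 'MS': False}
```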
2.1.11.3.Link State Request Packet (LSR)
  • Typically triggered after DBD
  • Requests specific LSAs from neighbors (unicast)

The link state request packet is used for pulling information. Once the database description has been received from a neighbor, a router knows which LSAs are not in its LSDB and will request the entire missing LSAs from that neighbor. The fields are repeated for each unique entry:

Link State Type
Link State ID
Advertising Router
Field Definition
Link State Type (32 bit) Identifies a link state type such as a router or network.
Link State ID (32 bit) Dictated by the link state type.
Advertising Router (32 bit) Address of the router that has generated this LSA.
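
Deciding which LSAs to request can be sketched as a comparison between the local LSDB and the headers learned from the neighbor's DBD packets. This is a simplification: it assumes sequence numbers are plain integers where a larger value is newer, and it ignores the checksum and age comparisons a real implementation also performs. The data structures are hypothetical.

```python
def lsas_to_request(local_lsdb, neighbor_headers):
    """Return the LSA keys that are missing locally or newer at the neighbor.

    Both arguments map (ls_type, link_state_id, advertising_router)
    to a sequence number.
    """
    wanted = []
    for key, neighbor_seq in neighbor_headers.items():
        local_seq = local_lsdb.get(key)
        if local_seq is None or neighbor_seq > local_seq:
            wanted.append(key)
    return wanted

local = {
    (1, "1.1.1.1", "1.1.1.1"): 5,
    (2, "10.0.0.1", "1.1.1.1"): 3,
}
neighbor = {
    (1, "1.1.1.1", "1.1.1.1"): 5,   # same version, nothing to do
    (2, "10.0.0.1", "1.1.1.1"): 4,  # newer at the neighbor
    (1, "2.2.2.2", "2.2.2.2"): 1,   # unknown locally
}
print(lsas_to_request(local, neighbor))
```

Each returned key corresponds to one (Link State Type, Link State ID, Advertising Router) entry in the request packet.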
2.1.11.4.Link State Update Packet (LSU)

This packet is the answer to a Link State Request Packet. Its first field is the number of LSAs, followed by the LSAs themselves in the LSA packet format. A link state update packet can contain one or more LSAs. It ensures that all routers have the same view.

  • Implicit acknowledgement
  • Flooding of LSAs (multicast)
  • Sending LSA responses to LSRs (unicast)
Number of LSAs
LSAs
...
LSAs
2.1.11.5.Link State Acknowledgement Packet (LSAck)
  • Explicit acknowledgement
  • Make LSA flooding reliable (multicast)
  • Acknowledging direct LSU (unicast)

Each newly received LSA must be acknowledged. This is usually done by sending Link State Acknowledgment packets. However, acknowledgments can also be accomplished implicitly by sending Link State Update packets.

Many acknowledgments may be grouped together into a single Link State Acknowledgment packet. Such a packet is sent back out the interface which received the LSAs.

A Link State Acknowledgment Packet contains a regular OSPF header with the type field set to 5 and a set of one or more LSA headers as payload.

Version Type Packet Length
Router ID
Area ID
CheckSum AuType
Authentication
Authentication
LSA Headers...
2.1.12.Neighbor states
  1. Down: Initial state of a neighbor conversation when no Hello packets have been received from the neighbor. If a router doesn’t receive a Hello packet from a neighbor within the RouterDeadInterval time, the neighbor state changes from Full to Down.
  2. Init: The router has received a Hello from the neighbor but has not yet seen its own router ID in the neighbor Hello packet.
  3. 2-Way: The router has seen its own router ID in the Hello packet received from the neighbor. The DR and BDR election is taking place if necessary.
  4. Exstart: Neighboring routers establish a master/slave relationship and determine the initial database descriptor (DBD) sequence number to use while exchanging DBD packets.
  5. Exchange: The routers exchange DBD packets, which describe their entire link-state database.
  6. Loading: The routers exchange full link-state information. Based on the DataBase Descriptors (DBD) provided by its neighbors, the OSPF router sends Link State Requests (LSR) and receives Link State Updates (LSU) containing the requested Link State Advertisements (LSAs).
  7. Full: Normal operating state that indicates everything is functioning normally. In this state, routers are fully adjacent with each other and all the router and network Link State Advertisements (LSAs) are exchanged and the routers’ databases are fully synchronized.
2.1.13.Sub-Protocols
2.1.13.1.Hello Protocol

During initialization/activation, the hello protocol is used for neighbor discovery as well as to agree on several parameters before two routers become neighbors.

  • When using the hello protocol, logical adjacencies are established for point-to-point, point-to-multipoint, and virtual link networks.
  • For broadcast and NBMA networks, not all routers become logically adjacent. The hello protocol is used for electing Designated Routers and Backup Designated Routers.

After initialization, for all network types the hello protocol is used for keeping alive connectivity which ensures bidirectional communication between neighbors. If the keep alive hello messages are not received within a certain time interval that was agreed upon during initialization, the link/connectivity between the routers is assumed to be not available.

2.1.13.2.Database Synchronization Protocol

Beyond basic initialization to discover neighbors, two adjacent routers need to build adjacencies. A complete link state advertisement of all links in the database of each router can be exchanged, but a special database description process is used to optimize this step. During the database description phase, only headers of link state advertisements are exchanged. Headers serve as adequate information to check if one side has the latest LSA. Since such a synchronization process may require exchange of header information about many LSAs, the database synchronization process allows for such exchanges to be split into multiple chunks.

These chunks are communicated using database description packets by indicating whether a chunk is an initial packet (using I-bit), or a continuation/more packet or last packet (using M-bit). One side needs to serve as a master (MS-bit) while the other side serves as a slave. The neighbor with the lower router ID becomes the slave.
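
The master/slave decision reduces to a numeric comparison of router IDs. A minimal sketch (the function name is an assumption); note that the comparison is on the 32-bit value, not on the dotted-decimal string:

```python
import ipaddress

def is_master(own_router_id: str, neighbor_router_id: str) -> bool:
    """The neighbor with the higher router ID becomes the master."""
    own = int(ipaddress.IPv4Address(own_router_id))
    other = int(ipaddress.IPv4Address(neighbor_router_id))
    return own > other

print(is_master("2.2.2.2", "1.1.1.1"))         # True
print(is_master("9.255.255.255", "10.0.0.1"))  # False
```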

2.1.14.Routing computation and Equal-Cost MultiPath

Default costs:

Bandwidth (b/s) Cost
128K 781
10M 10
100M 1
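
The default costs above are consistent with dividing a reference bandwidth of 100 Mb/s by the interface bandwidth and truncating to an integer. The sketch below assumes that reference value and assumes a minimum cost of 1 for faster links.

```python
# Assumed reference bandwidth in b/s; it reproduces the table above.
REFERENCE_BANDWIDTH = 100_000_000

def ospf_cost(bandwidth_bps: int) -> int:
    """Integer cost for an interface; never lower than 1."""
    return max(1, REFERENCE_BANDWIDTH // bandwidth_bps)

for bandwidth in (128_000, 10_000_000, 100_000_000, 1_000_000_000):
    print(bandwidth, ospf_cost(bandwidth))
# 128000 781
# 10000000 10
# 100000000 1
# 1000000000 1
```

With this reference value, every link of 100 Mb/s or faster ends up with the same cost of 1.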

LSAs type 1 and 2 are flooded throughout an area. This allows every router in an area to build link state databases with identical topological information.

  • Shortest path computation based on Dijkstra’s algorithm is performed at each router for every known destination based on the directional graph determined from the link state database.
  • The link cost used for each link is the metric value advertised in the link state advertisement packet. The value can be between 1 and 65'535
  • Dijkstra-based shortest path computation using link state information is applied only within an area. For routing updates between areas, information from one area is summarized using Summary LSAs without providing detailed link information.

The next hop is extracted from the shortest path computation to update the routing table and subsequently, the forwarding table.

  • Routing table entries are for destinations identified through hosts or subnets or simply IP prefixes with CIDR notation, not in terms of end routers.
  • Because of CIDR, multiple similar route entries are possible, eg. 10.1.64.0/24 vs 10.1.64.0/18. To select the route preferred by an arriving packet, OSPF uses a best route selection process (most specific match).
  • In case there are multiple paths available after this step, the second step selects the route where an intra-area path is given preference over an inter-area path, which in turn gives preference over external paths for routes learned externally.
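
The most-specific-match step can be sketched with the standard ipaddress module, using the two prefixes from the example above. The function name and the destination addresses are made up for illustration.

```python
import ipaddress

def longest_prefix_match(destination, prefixes):
    """Return the most specific prefix containing the destination, or None."""
    address = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p) for p in prefixes]
    matching = [n for n in networks if address in n]
    if not matching:
        return None
    return str(max(matching, key=lambda n: n.prefixlen))

routes = ["10.1.64.0/24", "10.1.64.0/18"]
print(longest_prefix_match("10.1.64.7", routes))   # 10.1.64.0/24
print(longest_prefix_match("10.1.100.1", routes))  # 10.1.64.0/18
print(longest_prefix_match("10.2.0.1", routes))    # None
```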
2.1.14.1.Equal-cost multipath (ECMP)

ECMP means that if two paths have the same lowest cost, then the outgoing link (next hop) for both can be listed in the routing table so that traffic can be equally split. The original Dijkstra’s algorithm generates only one shortest path even if multiple shortest paths are available. To capture multiple shortest paths, where available, Dijkstra’s algorithm is slightly modified.
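
One possible form of this modification is to keep a set of first hops per destination and to merge the sets whenever a path of equal cost is found. The following is a sketch under the assumption that all link costs are at least 1 (as in OSPF). It is not necessarily how a particular router implements it, and the topology is made up.

```python
import heapq

def spf_with_ecmp(graph, source):
    """graph: {node: {neighbor: cost}}. Returns (dist, next_hops).

    next_hops[dest] is the set of first hops on all shortest paths.
    """
    dist = {source: 0}
    next_hops = {source: set()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            # First hops used when reaching `neighbor` through `node`.
            hops = {neighbor} if node == source else next_hops[node]
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                next_hops[neighbor] = set(hops)
                heapq.heappush(heap, (candidate, neighbor))
            elif candidate == dist[neighbor]:
                # Equal-cost alternative: keep both sets of first hops.
                next_hops[neighbor] |= hops
    return dist, next_hops

# Made-up square topology: two equal-cost paths from A to D.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1},
}
dist, next_hops = spf_with_ecmp(graph, "A")
print(dist["D"], sorted(next_hops["D"]))  # 2 ['B', 'C']
```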

The router implementation handles the ECMP path selection on a per-flow basis rather than on a per-packet basis. The ECMP path selection is based on the hash of certain fields of the IP packet without having to maintain states at routers.
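
Per-flow selection can be sketched as hashing a flow identifier and using the result as an index into the list of equal-cost next hops. Which header fields are hashed and which hash function is used are assumptions here (a 5-tuple and SHA-256); real routers typically use cheaper hardware hash functions.

```python
import hashlib

def pick_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Map a flow to one of the equal-cost next hops, statelessly."""
    flow = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(flow).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

hops = ["10.0.0.1", "10.0.0.2"]
first = pick_next_hop("192.0.2.1", "198.51.100.7", 6, 40000, 443, hops)
again = pick_next_hop("192.0.2.1", "198.51.100.7", 6, 40000, 443, hops)
print(first == again)  # True: packets of one flow always take the same path
```

Because the choice depends only on the packet's own fields, the router needs no per-flow state, and packets of the same flow are not reordered across paths.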

2.1.14.2.Route selection

When using OSPF routing hierarchy, the following rules apply:

  • If the source and destination addresses of a packet reside within the same area, intra-area routing is used. Intra-area routes in OSPF are described by router (type 1) and network (type 2) LSAs. When displayed in the OSPF routing table, these types of intra-area routes are designated with an O.
  • If the source and destination addresses of a packet reside within different areas, but are still within the AS, inter-area routing is used. These types of routes are described by network summary (type 3) LSAs. When routing packets between two nonbackbone areas, the backbone is used. This means that inter-area routing has pieces of intra-area routing along its path, for example:

    1. An intra-area path is used from the source router to the area border router.
    2. The backbone is then used from the source area to the destination area.
    3. An intra-area path is used from the destination area’s area border router to the destination.

    When you put these three routes together, you have an inter-area route. Of course, the SPF algorithm calculates the lowest cost between these two points. When displayed in the OSPF routing table, these types of routes are indicated with an O IA.

  • If the destination address of a packet resides outside the AS, external routing is used. External routing information is injected into OSPF through redistribution from another routing protocol. The AS boundary routers (ASBRs) flood the external route information throughout the AS. Every router receives this information, with the exception of routers in stub areas. The types of external routes used in OSPF are as follows:

    • E1 routes: E1 route’s costs are the sum of internal and external (remote AS) OSPF metrics. If a packet is destined for another AS, an E1 route takes the remote AS metric and adds all internal OSPF costs. They are identified by the E1 designation within the OSPF routing table.
    • E2 routes: E2 routes are the default external routes for OSPF. They do not add the internal OSPF metrics. Multiple routes to the same destination use the following order of preference: intra-area, inter-area, E1, and E2.

OSPF first looks at the path type and only then at the metric: a more preferred path type wins regardless of cost, and between routes of the same type the lower path cost decides. The order of preference by path type is:

  1. Intra-Area (O)
  2. Inter-Area (O IA)
  3. External Type 1 (E1)
  4. NSSA Type 1 (N1)
  5. External Type 2 (E2)
  6. NSSA Type 2 (N2)
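
This two-step comparison can be sketched as sorting on a (path type rank, cost) pair. The rank table mirrors the list above; the tuple layout and sample routes are hypothetical.

```python
# Lower rank means more preferred; order taken from the list above.
PATH_TYPE_RANK = {"O": 0, "O IA": 1, "E1": 2, "N1": 3, "E2": 4, "N2": 5}

def best_route(candidates):
    """candidates: list of (path_type, cost, next_hop) for one prefix."""
    return min(candidates, key=lambda r: (PATH_TYPE_RANK[r[0]], r[1]))

candidates = [
    ("E2", 20, "10.0.0.1"),
    ("O IA", 300, "10.0.0.2"),
    ("O IA", 120, "10.0.0.3"),
]
print(best_route(candidates))  # ('O IA', 120, '10.0.0.3')
```

The E2 route loses despite its lower metric, because the path type is compared before the cost.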
2.1.14.3.Route summarization

Route summarization helps solve two major challenges

  • Large routing tables
  • Frequent LSA flooding throughout the autonomous system

With route summarization, the ABRs or ASBRs consolidate multiple routes into a single advertisement

  • Route summarization requires a good addressing plan
  • Subnets in areas should be assigned contiguously to ensure that these addresses can be summarized into a minimal number of summary addresses
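
The effect of contiguous addressing can be illustrated with the standard ipaddress module, which merges adjacent networks into the smallest set of covering prefixes. This only shows the arithmetic; it is not how summarization is configured on a router, and the prefixes are made up.

```python
import ipaddress

def summarize(prefixes):
    """Collapse prefixes into the minimal list of exactly covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Contiguous subnets collapse into a single summary.
print(summarize(["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]))
# ['10.1.0.0/22']

# A gap in the addressing plan prevents summarization.
print(summarize(["10.1.0.0/24", "10.1.2.0/24"]))
# ['10.1.0.0/24', '10.1.2.0/24']
```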

Summarization is only allowed on ASBRs and ABRs.

ABRs ASBRs
Summarize type 3 LSAs Summarize type 5 LSAs

An internal summary route is generated if at least one subnet within the area falls in the summary address range

The summarized route metric is equal to the lowest cost of all the subnets within the summary address range

Summarization of external routes can be done on the ASBR for redistributed routes before injecting them into the OSPF domain
2.1.15.Synchronizing the LSDB
2.1.16.Extending OSPF
  • Classical OSPF is not easy to extend to add new features

    • They require the creation of a new LSA
    • OSPF version 2 was developed exclusively for IPv4
    • RFC 7684 builds on Opaque LSAs (RFC 5250) to carry additional prefix and link attributes
2.2.INTERMEDIATE SYSTEM TO INTERMEDIATE SYSTEM (IS-IS)
  • Widely used (especially in ISP networks / as an intra-domain routing protocol)
  • Fast convergence
  • Equal Cost Multipath (ECMP) Load Balancing
  • IS-IS supports different protocol suites (RFC 1195)
  • Originally developed for ISO OSI environments (CLNS) but integrated IS-IS can be used to support pure-IP or dual environments. In modern IP-only environments:

    • There are no CLNP-based user applications.
    • The routing process is the primary user of the underlying CLNS mechanisms.
    • The only ISO packets typically observed are ES-IS and IS-IS control messages.
    • CLNP node-based addresses are still used to identify routers
Term Definition
ES End-host devices are called End Systems (ES)
IS Routers are called Intermediate Systems (IS)
IIH IS-IS Hello Packets
TLV Type Length Value
2.2.1.Connectionless Network Service (CLNS)

Unlike the TCP/IP suite, the OSI architecture makes a strict distinction between services and protocols. Services are defined as the functions provided by one layer to the layer above it, while protocols are the specific implementations of these services. In the OSI model, each layer provides services to the layer above it and relies on services from the layer below it.

OSI Model TCP/IP Model
L3 Service CLNS (Connectionless Network Service) No separate name
Service Type Connectionless Connectionless
Data-Plane Protocol CLNP (Connectionless Network Protocol) IP (Internet Protocol)
Control-Plane Protocol IS-IS / ES-IS OSPF / IS-IS / BGP / etc.
Addressing NSAP (Network Service Access Point) IP Address
2.2.1.1.Protocol Suite

The connectionless network service defined by CLNS is realized and supported within the ISO architecture by several protocols, including CLNP, ES-IS, and IS-IS. They are specified as separate network layer protocols, coexisting at Layer 3 of the OSI reference model.

2.2.1.2.Connectionless Network Protocol (CLNP)

The CLNP is the OSI equivalent of the IP.

  • Both are connectionless.
  • Both provide best-effort delivery.
  • Both rely on separate routing protocols for path calculation.

CLNP operates at Layer 3 of the OSI model and provides connectionless, best-effort packet delivery between systems.

CLNP provides network-layer services to ISO transport protocols, rather than to TCP and UDP as in the TCP/IP architecture.

At the data-link layer, CLNP packets are identified by the LLC SAP value 0xFE (DSAP and SSAP: 0xFE FE).

2.2.1.3.End System to Intermediate System (ES-IS)

Operates between hosts (End Systems) and routers (Intermediate Systems) in an ISO CLNS environment. Its primary function is adjacency and reachability discovery within a shared network segment (for example, a LAN). ES-IS automates the exchange of addressing and presence information between connected systems.

The protocol operates using two message types: End System Hello (ESH), sent by hosts, and Intermediate System Hello (ISH), sent by routers.

These messages allow systems to discover neighboring devices and their network layer addresses. From a functional perspective, ES-IS can be loosely compared to the combined roles of:

  • ARP (address resolution)
  • ICMP (reachability signaling)
  • DHCP (host configuration assistance)

When IS-IS is configured on certain router platforms, ES-IS functionality operates automatically in the background to support adjacency formation.

2.2.1.4.Intermediate System to Intermediate System (IS-IS)

IS-IS is a link-state routing protocol operating between routers (Intermediate Systems). Originally developed for routing CLNP traffic within ISO CLNS networks, IS-IS dynamically exchanges topology and reachability information between routers. It builds a link-state database and computes shortest paths using the SPF algorithm.

IS-IS operates in conjunction with ES-IS:

  • ES-IS supports neighbor discovery.
  • IS-IS establishes and maintains router adjacencies.
  • IS-IS distributes routing information across the routing domain.

On multi-access networks, routers learn the data-link addresses of adjacent systems (for example, MAC addresses, also referred to as SNPAs, Subnetwork Points of Attachment) and store this information in the adjacency database.

2.2.2.Network Service Access Point (NSAP)

An NSAP address identifies a network-layer entity within the OSI architecture. It is the CLNS equivalent of an IP address, although its structure differs significantly.

An NSAP address:

  • Is variable in length (up to 20 bytes)
  • Is hierarchically structured
  • Identifies a system (node) rather than a specific interface
  • Is used by routing protocols such as IS-IS
2.2.3.Similarities between IS-IS and OSPF
  • Standardized
  • Link-state protocol
  • Similar sync mechanism
  • Use Dijkstra SPF algorithm
  • Similar update/flooding process
  • Quick convergence
2.2.4.Advantages of IS-IS over OSPF
  • Detects a failure faster
  • Simpler than OSPF
  • Well-positioned for IPv6
  • Scalability

    • Less “chatty”
    • TLVs instead of new LSPs
  • IS-IS operates directly over the data link layer

    • No technical need for IP to establish adjacencies
    • Security: immune to remote IP-based attack vectors. IS-IS packets are directly encapsulated over the data link and are not carried in IP packets or even CLNP packets. Therefore, to maliciously disrupt the IS-IS routing environment, an attacker has to be physically attached to a router in the IS-IS network
  • Multiple protocols supported, but treated as metadata attributes

IS-IS is considered to be more scalable and better suited for large and complex networks. OSPF might struggle with very large networks, especially in one single area.

  • IS-IS: groups updates into one LSP
  • OSPF: many small LSA updates
2.2.5.Integrated IS-IS

Although IS-IS was not originally designed for IP routing, it was later extended to support IP networks. This extension is commonly referred to as Integrated IS-IS. In modern IP-only environments:

  • There are no CLNP-based user applications.
  • The routing process is the primary user of the underlying CLNS mechanisms.
  • The only ISO packets typically observed are ES-IS and IS-IS control messages.
2.2.5.1.Addressing

The following list consists of requirements and caveats that must be followed to define NSAP for IS-IS routing in general and particularly on Cisco routers:

  • Each node in an IS-IS routing area must have a unique SysID
  • The SysID of all nodes in an IS-IS routing domain must be of the same length
  • The length of the SysID is 6 bytes (fixed) on Cisco routers

You can use one of the LAN MAC addresses on a router as its SysID, essentially embedding a MAC address (a Layer 2 address) in the NSAP. Another popular way to define unique SysIDs is by padding a dotted-decimal loopback IP address with zeros to transform it into a 12-digit address, which can then be easily rearranged to represent a 6-byte SysID in hexadecimal, by regrouping the digits in fours and separating them with dots.
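The loopback-based method can be sketched in a few lines of Python. This is only an illustration of the padding and regrouping described above; the function names and the example area `49.0001` are assumptions, not part of any router CLI.

```python
def sysid_from_loopback(ip: str) -> str:
    """Derive a 6-byte IS-IS SysID (written as 12 digits) from an IPv4 loopback."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {ip!r}")
    # Pad every octet with zeros to three digits: 192.168.1.1 -> 192168001001
    digits = "".join(f"{int(o):03d}" for o in octets)
    # Regroup the 12 digits in fours and separate them with dots
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def net_from_loopback(ip: str, area: str = "49.0001") -> str:
    """Build a NET: area + SysID + NSEL 00 (the area value is a made-up example)."""
    return f"{area}.{sysid_from_loopback(ip)}.00"


print(sysid_from_loopback("192.168.1.1"))  # 1921.6800.1001
print(net_from_loopback("192.168.1.1"))    # 49.0001.1921.6800.1001.00
```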

2.2.5.2.Network Entity Title (NET)
  • Identifies the router itself (not a specific application running on it)
  • NSAP address with an NSEL (NSAP Selector) of 0
  • Included in LSP header
  • Area part starts with 49

    • Authority and Format Identifier
    • Stands for private/local address
2.2.6.Router types
2.2.6.1.Level 1 router (L1)
  • Knows the topology only of its own area
  • All L1 routers have same LSPDB within area

L1 routers form adjacencies only with other L1 routers that belong to the same area. During the hello process, the routers verify that their Area IDs match. If the Area IDs differ, no L1 adjacency is established.

After adjacency establishment, L1 routers exchange Level-1 Link-State Packets (L1 LSPs). LSPs are the IS-IS counterpart of OSPF’s LSAs and carry the routing information. These LSPs contain:

  • Information about directly connected neighbors within the area
  • Reachable IP prefixes within the area
  • Associated metrics

Through reliable flooding, each L1 router distributes its LSPs to all other routers in the same area so that all L1 routers have a consistent view of the area topology.

Based on the synchronized L1 LSDB, each router independently runs the SPF algorithm to compute optimal paths to all destinations within the area.

2.2.6.2.Level 2 router (L2)
  • Knows about other areas
  • All L2 routers have same LSPDB

An L2 router operates at the inter-area level and is responsible for routing between different IS-IS areas.

After adjacency establishment, Level-2 routers exchange Level-2 Link-State Packets (L2 LSPs). These LSPs contain:

  • Information about neighboring Level-2 routers
  • Reachable prefixes from their attached areas
  • Associated metrics

Through reliable flooding, all Level-2 routers build a synchronized Level-2 LSDB that represents the inter-area topology.

Based on the synchronized Level-2 LSDB, each router independently runs a separate Level-2 SPF calculation.

2.2.6.3.Level 1/Level 2 router (L1-L2)
  • Has a Level 1 link-state database for intra-area routing and a Level 2 link-state database for interarea routing
  • L1-L2 routers maintain a separate L1 and L2 LSPDB

An L1-L2 router operates simultaneously at both routing levels and acts as the border router between an area and the Level-2 backbone, enabling communication between different IS-IS areas.

The router participates independently in both flooding domains, Level-1 LSPs within its local area and Level-2 LSPs across the backbone.

An L1-L2 router also runs two independent SPF calculations, one per level.

2.2.7.Adjacencies

An adjacency must be in an up state for a router to send or process received LSPs:

  • A Level 1 adjacency is formed when the area addresses match unless configured otherwise.
  • A Level 2 adjacency is formed alongside the Level 1 unless the router is configured to be Level 1-only.
  • If no matching areas exist between the configuration of the local router and the area addresses information in the received hello, only a Level 2 adjacency is formed.
  • If the transmitting router is configured for Level 2-only, the receiving router must be capable of forming a Level 2 adjacency. Otherwise, no adjacency forms.

When designing IS-IS networks, always remember that the backbone must be contiguous. In other words, a Level 1-only router should never be inserted between any two Level 2 routers (Level 2-only or Level 1-2).

2.2.7.1.Point-to-point

IS-IS adjacencies on point-to-point links are initialized by receipt of ISHs through the ES-IS protocol. This is followed by the exchange of point-to-point IIHs. In the default mode of operation, IIHs are padded to the MTU size of the outgoing interface. Routers match the size of IIHs received to their local MTUs to ensure that they can handle the largest possible packets from their neighbors before completing an adjacency.

  • No pseudonode is needed, so no unnecessary pseudonode LSP ends up in the LSPDB of the routers in that level.
  • CSNPs are not continuously flooded into a segment
  • CSNPs are sent only once during start
2.2.7.2.Adjacency states for broadcast network segments
2.2.7.3.Multiaccess

The process of building adjacencies is not triggered by receipt of ISHs. A router sends IIHs on broadcast interfaces as soon as the interface is enabled.

Routers include the MAC addresses of all neighbors on the LAN that they have received hellos from, allowing for a simple mechanism to confirm two-way communication.

  • Two-way communication is confirmed when subsequent hellos received contain the receiving router’s MAC address (SNPA) in an IS Neighbors TLV field.
  • Otherwise, communication between the nodes is deemed one-way, and the adjacency stays at the initialized state.

The broadcast medium is modeled as a node, called the pseudonode. The pseudonode role is played by an elected DIS.

Multicast Addresses:

  • All L1 ISs: 01-80-C2-00-00-14
  • All L2 ISs: 01-80-C2-00-00-15
2.2.7.3.1.DIS election
  • Designated Intermediate System
  • Similar to the DR in OSPF Protocol
  • Responsibility

    • Creating and updating pseudonode LSPs
    • Flooding LSPs over the LAN
  • L1 and L2 DIS may not be the same router
  • Selection of the DIS

    • The highest priority. Configurable priority from 0 to 127.
    • Highest SNPA (Subnetwork Point of Attachment)

      • Highest MAC address of router’s interface
2.2.7.3.2.Pseudonodes

To minimize the complexity of managing multiple adjacencies on multiaccess media, such as LANs, while enforcing efficient LSP flooding to minimize bandwidth consumption, IS-IS models multiaccess links as nodes, referred to as pseudonodes.

As the name implies, this is a virtual node, whose role is played by an elected DIS for the LAN. Separate DISs are elected for Level 1 and Level 2 routing.

  • Election of the DIS is based on the highest interface priority, with the highest SNPA address (MAC address) breaking ties.
  • The default interface priority on Cisco routers is 64.
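The election rule (highest priority first, highest SNPA as tie-breaker) can be expressed as a single comparison. The following is a minimal sketch; the `LanInterface` class and its field names are hypothetical, and the default priority of 64 is the Cisco default mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanInterface:
    name: str
    mac: str            # SNPA on a LAN segment, e.g. "00:00:0c:00:00:01"
    priority: int = 64  # Cisco default interface priority

    def __post_init__(self):
        if not 0 <= self.priority <= 127:
            raise ValueError("IS-IS interface priority must be between 0 and 127")


def snpa_value(mac: str) -> int:
    """Turn a MAC address into an integer so that two SNPAs can be compared."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def elect_dis(interfaces):
    """Return the interface that wins the DIS election on one LAN and level."""
    # Highest priority wins; the highest SNPA (MAC) breaks ties.
    return max(interfaces, key=lambda i: (i.priority, snpa_value(i.mac)))


r1 = LanInterface("R1", "00:00:0c:00:00:01")
r2 = LanInterface("R2", "00:00:0c:00:00:02")
print(elect_dis([r1, r2]).name)  # R2 (same priority, higher MAC)
```

Because the election is preemptive, rerunning `elect_dis` whenever a router joins or leaves the LAN mirrors the behaviour described in this section.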

The responsibilities of LAN Level 1 and Level 2 DISs include the following:

  • Generating pseudonode link-state packets to report links to all systems on the LAN
  • Carrying out flooding over the LAN for the corresponding routing level

Despite the critical role of the DIS in LSP flooding, no backup DIS is elected for either Level 1 or Level 2. If the current DIS fails, another router is immediately elected to play the role.

An elected router is not guaranteed to remain the DIS if a new router with a higher priority shows up on the LAN. Any eligible router at the time of connecting to the LAN immediately takes over the DIS role, assuming the pseudonode functionality.

2.2.7.4.Passive interfaces

Passive interfaces provide a method of advertising network prefixes into IS-IS, while preventing an adjacency from forming on that interface.

A passive interface does not send out IS-IS traffic and will not process any received IS-IS packets.

2.2.8.PDU

Each PDU has two packet types, one for each level.

Each type of IS-IS packet is made up of:

  • A header with the common fields shared by all IS-IS packets
  • A number of optional variable-length fields containing specific routing-related information. Each field is encoded as Type, Length, and Value (TLV), a term that has become a synonym for variable-length fields.

Enhancements to the original IS-IS protocol are normally achieved through the introduction of new TLV fields. A key strength of the IS-IS protocol design lies in the ease of extension through the introduction of new TLVs rather than new packet types.
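To illustrate why TLVs make the protocol easy to extend, here is a minimal sketch of a TLV walker. It assumes the usual IS-IS encoding of one byte for the type and one byte for the length, followed by the value; the function name is hypothetical and the sample bytes are hand-made, not a captured packet.

```python
def parse_tlvs(data: bytes):
    """Split the variable-length part of an IS-IS PDU into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type = data[offset]       # 1 byte: type code
        length = data[offset + 1]     # 1 byte: length of the value in bytes
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, value))
        offset += 2 + length
    return tlvs


# Type 1 (Area Addresses) carrying area 49.0001, then type 129 (Protocols
# Supported) carrying the IPv4 NLPID 0xCC.
sample = bytes([1, 4, 3, 0x49, 0x00, 0x01, 129, 1, 0xCC])
for tlv_type, value in parse_tlvs(sample):
    print(tlv_type, value.hex())
```

A receiver that does not know a type code can still skip it using the length byte, which is what allows new TLVs to be introduced without new packet types.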

2.2.8.1.Hello Packet (Hello)

Used to establish adjacencies between IS-IS neighbors. Once the neighbors are discovered, hello packets act as keepalive messages to maintain the adjacency. In addition to the Level 1 and Level 2 LAN hellos, a point-to-point hello packet exists.

Hello packets are sent periodically and serve three purposes for IS-IS neighbor adjacencies:

  1. discover
  2. build
  3. maintain

2.2.8.2.Link-State Packet (LSP)

Used to distribute and exchange routing information between IS-IS nodes. An IS-IS router floods an LSP throughout an area to identify its adjacencies and their states, path costs as well as reachable address prefixes.

  • Carries associated networks (IPv4/IPv6) as metadata TLVs
  • Sequenced to prevent duplication
  • Unicast on Point-to-point Links, Multicast on broadcast media
2.2.8.3.Sequence number Packets (SNPs)

Used to control the distribution of link-state packets, providing mechanisms for synchronizing the distributed link-state databases of the routers in an IS-IS routing area.

2.2.8.3.1.Complete Sequence Number PDU (CSNP)

Describes a summary of the LSPs in the LSDB. Similar to the DBD in OSPF.

2.2.8.3.2.Partial Sequence Number PDU (PSNP)

Used to request and acknowledge missing pieces of link-state information. Like OSPF LSR and LSAck in one packet.

2.2.9.Hello process

Routers periodically send hello packets to adjacent peers, every hello interval. On Cisco routers, the default value of the hello interval is:

  • 10s for ordinary routers
  • 3.3s for the DIS on a multi-access link

IS-IS uses the concept of hello multiplier to determine how many hello packets can be missed from an adjacent neighbor before declaring it “dead”.

  • The maximum time allowed between the receipt of two consecutive hello packets is referred to as the holdtime.
  • The holdtime is defined as the product of the hello interval and the hello multiplier.
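As a quick worked example of the holdtime formula, assuming a hello multiplier of 3 (the multiplier value is an assumption here; it is configurable per interface):

```python
def holdtime(hello_interval: float, hello_multiplier: int = 3) -> float:
    """Holdtime in seconds = hello interval x hello multiplier."""
    return hello_interval * hello_multiplier


print(holdtime(10))   # 30 seconds for an ordinary router
print(holdtime(3.3))  # roughly 10 seconds for the DIS
```

With these values a failed DIS is detected about three times faster than a failed ordinary neighbor.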
2.2.10.Routing
2.2.10.1.Level 1 routing

Level 1 routing is routing within an area

  • L1 routers follow a default route to the closest L1/L2 router.

    • L1-L2 routers do not advertise L2 routes into the L1 area
    • Default Route to the closest L1/L2 router
    • An IS-IS L1 area is equivalent to an OSPF totally stubby area.
2.2.10.2.Level 2 routing

Level 2 routing is routing between different areas

  • L1-L2 routers inject L1 prefixes into the L2 topology.

    • Routes from the L1 level are advertised to the L2 topology populating the L1 topology metric into the L2 link-state packet (LSP) metric.
  • L1-L2 routers flag connectivity to the backbone to Level 1 routers by setting the attached bit in their Level 1 LSP, which is flooded throughout the area.
2.2.10.3.Interface metrics

Narrow metric:

  • 6-bit field (value between 1 and 63)
  • IS-IS assigns a default metric of 10 to all interfaces regardless of the interface bandwidth

    • A 1-Mbps link uses the same path metric as a 10-Gbps link by default

Wide metric:

  • 24-bit field
  • It should be used for large networks

    • The narrow-style metric can accommodate only 64 metric values, which is typically insufficient in modern networks
2.2.10.4.Path selection route types

IS-IS best-path selection uses the following processing order, identifying the route with the lowest path metric for each stage

External routes are no longer treated as a separate category for path selection; they are integrated based on their redistribution level and metric.

2.2.10.5.Route leaking

Even though the selected default router might be the closest in the area, it might not be the best exit out of the area when the overall cost to the destination is considered. There is a possibility of suboptimal path selection, which can be corrected by route-leaking.

  • Route-leaking is a technique that redistributes the L2 level routes into the L1 level
  • Route leaking uses a restrictive route map or route policy to control which routes are leaked
  • Set the Up/Down bit to mark routes leaked from Level 2 to Level 1, preventing routing loops by ensuring they aren’t readvertised back into the backbone
2.2.10.6.IS-IS summarization

Because all routers within a level must maintain an identical copy of the LSPDB, summarization occurs when routers enter an IS-IS level, such as

  • L1 routes entering the L2 backbone
  • L2 routes leaking into an L1 area
  • Redistribution of routes into an area

The default metric for the summary range is the smallest metric associated with any matching network prefix
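The metric rule for a summary range can be sketched as follows. The prefixes and metrics are made-up example values and `summary_metric` is a hypothetical helper, not a router command.

```python
import ipaddress


def summary_metric(summary, routes):
    """Return the smallest metric among the prefixes covered by the summary.

    routes maps prefix strings to their IS-IS metrics. Returns None if no
    prefix falls inside the summary range.
    """
    summary_net = ipaddress.ip_network(summary)
    matching = [
        metric
        for prefix, metric in routes.items()
        if ipaddress.ip_network(prefix).subnet_of(summary_net)
    ]
    return min(matching) if matching else None


l1_routes = {"172.16.1.0/24": 20, "172.16.2.0/24": 15, "10.0.0.0/24": 5}
print(summary_metric("172.16.0.0/16", l1_routes))  # 15
```

The route `10.0.0.0/24` has the lowest metric overall but is ignored because it does not match the summary range.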

Configure only the network that needs a different route, and do so on the L1/L2 router that is the better border router (BR) for it, not on the default BR.

3.BORDER GATEWAY PROTOCOL (BGP)

RFC 1654 defines the Border Gateway Protocol (BGP-4, currently specified in RFC 4271) as a standardized path-vector EGP that provides scalability, flexibility, and network stability.

Unlike OSPF or IS-IS, BGP does not periodically refresh its network advertisements: after the initial exchange, it sends updates only when a route changes. It prefers stability within the network, because a flapping link could result in the recomputation of thousands of routes.

3.1.COMPARISON TO IGP
| IGP | BGP |
|---|---|
| Neighbors typically discovered using multicast packets on the connected subnets | Neighbor IP address is explicitly configured and may not be on common subnet |
| Does not use TCP | Uses a TCP connection between neighbors (port 179) |
| Advertises prefix/length | Advertises prefix/length, called Network Layer Reachability Information (NLRI) |
| Advertises metric information | Advertises a variety of path attributes (PA) that BGP uses instead of a metric to choose the best path |
| Emphasis on fast convergence to the truly most efficient route | Emphasis on scalability; might not always choose the most efficient route |
| Link-state logic | Path-vector logic |
3.2.INTERNET ROUTE AGGREGATION

Problem:

  • The Internet routing table keeps growing

    • Many small, unaggregated routes each occupy their own entry in the routing table

Idea/Solution/Mitigation:

  • Route Summarization

    • Allocate consecutive address blocks by geography and ISP so that they can be advertised as a single route
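The effect of summarization can be tried out with Python's standard `ipaddress` module. This is only an illustration using documentation prefixes; the `summarize` helper is hypothetical.

```python
import ipaddress


def summarize(prefixes):
    """Collapse contiguous prefixes into the smallest set of covering routes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Four consecutive /26 blocks can be advertised as one /24.
customer_blocks = [
    "198.51.100.0/26",
    "198.51.100.64/26",
    "198.51.100.128/26",
    "198.51.100.192/26",
]
print(summarize(customer_blocks))  # ['198.51.100.0/24']
```

If the blocks are not consecutive, they cannot be merged and each one remains a separate entry, which is why address allocation matters.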
3.3.AUTONOMOUS SYSTEMS (AS)

An AS is a network under the same administrative domain, using one or more IGPs. An IGP is not strictly required within an AS: iBGP alone could be used, but it would not scale well. Routing and security policies are under the control of one service provider or one company.

3.3.1.Autonomous System Numbers (ASN)

Organizations requiring connectivity to the internet must obtain an ASN. ASNs were originally 2 bytes long (0 – 65'535). This limited range was exhausted rather quickly, prompting the expansion of the ASN range to 4 bytes in RFC 4893 (0 – 4'294'967'295). Backward compatibility with routers that only support 2-byte ASNs is provided by the reserved AS 23456 (AS_TRANS).

Two blocks of private ASNs are available to any organization. These can be used as long as the companies do not exchange them on the internet (similar to the private IPv4 addresses specified in RFC 1918 ). They are defined in RFC 6996 (Autonomous System Reservation for Private Use):

  • 16-bit range: 64'512 – 65'534
  • 32-bit range: 4'200'000'000 – 4'294'967'294

Note that RFC 7300 (Reservation of Last Autonomous System Numbers) defines 65'535 (last 16-bit ASN) and 4'294'967'295 (last 32-bit ASN) as reserved, but not explicitly for private use.
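The ranges above translate directly into a small classifier. This is a sketch; the function name and the returned labels are illustrative choices.

```python
PRIVATE_16_BIT = range(64512, 65534 + 1)
PRIVATE_32_BIT = range(4200000000, 4294967294 + 1)
RESERVED_LAST = {65535, 4294967295}   # last 16-bit and last 32-bit ASN
AS_TRANS = 23456


def classify_asn(asn: int) -> str:
    """Classify an ASN using the ranges listed in this section."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("an ASN must fit into 4 bytes")
    if asn in PRIVATE_16_BIT or asn in PRIVATE_32_BIT:
        return "private"
    if asn in RESERVED_LAST:
        return "reserved"
    if asn == AS_TRANS:
        return "as_trans"
    return "other"  # everything this sketch does not classify further


print(classify_asn(64512))       # private
print(classify_asn(65535))       # reserved
print(classify_asn(4200000000))  # private
```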

3.4.SESSIONS

A BGP session refers to the established adjacency between two BGP routers. BGP sessions are always point-to-point and are categorized into two types, iBGP and eBGP.

3.4.1.Internal BGP (iBGP)

iBGP sessions are established between BGP routers within the same AS. They are considered more secure, so some of BGP’s security measures are relaxed in comparison to eBGP sessions. iBGP prefixes are assigned an AD of 200 upon being installed into the router’s RIB.

  • AS-Path not modified
  • Next-hop not modified

The need for BGP within an AS typically occurs when transit connectivity is provided between autonomous systems.

Advertising the full BGP table into an IGP is not a viable solution for the following reasons:

  • Scalability: In January 2024, the internet had 943 000+ IPv4 networks, and it’s still growing. IGPs do not scale to such a high number of routes.
  • Path Attributes: IGP protocols do not know about BGP path attributes. Only BGP is capable of maintaining the path attribute as the prefix is advertised from one edge of the AS to the other edge.
3.4.1.1.Full Mesh Requirement

iBGP peers do not prepend their ASN to the AS_PATH: the receiving iBGP peer would find its own ASN in the path, the NLRI would fail the validity check, and the prefix would not be installed into the IP routing table.

No other method exists to detect loops with iBGP sessions, and RFC 4271 prohibits the advertising of NLRI received from an iBGP peer to another iBGP peer (split horizon). It also states that all BGP routers within a single AS must be fully meshed to provide a complete loop-free routing table and prevent traffic blackholing.

3.4.1.2.Peering via Loopback Addresses

BGP sessions are sourced by the outbound interface toward the BGP peer’s IP address by default.

It is preferable to configure the BGP neighbours to establish a session between their loopback addresses. The loopback interface is virtual and always stays up. In the event of link failure, the session remains intact if the IGP finds another path to the loopback address.

3.4.1.3.Scalability

The inability of BGP to advertise a prefix learned from one iBGP peer to another can lead to scalability issues within an AS: Let $n$ be the number of iBGP speakers. A full mesh then requires $\frac{n(n-1)}{2}$ sessions. In asymptotic notation, this is $O(n^2)$.
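A full mesh needs one session per pair of iBGP speakers, i.e. n(n-1)/2 sessions for n speakers. A few example values make the quadratic growth visible (the function name is just for illustration):

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions needed to fully mesh n speakers."""
    if n < 0:
        raise ValueError("the number of speakers cannot be negative")
    return n * (n - 1) // 2


for speakers in (5, 10, 100):
    print(speakers, full_mesh_sessions(speakers))
# 5 10
# 10 45
# 100 4950
```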

3.4.1.3.1.Route Reflectors

RFC 1966 introduces the concept of route reflection, which allows an iBGP speaker to advertise routes learned from one iBGP peer to other iBGP peers. The router performing this function is called a route reflector (RR).

An RR forms iBGP sessions with two types of peers: route-reflector clients and non-client peers.

The following rules govern route reflection:

  1. If an RR receives an NLRI from a non-client peer, the RR advertises the NLRI to all clients. It does not advertise the NLRI to other non-client peers.
  2. If an RR receives an NLRI from a client, the RR advertises the NLRI to all non-client peers and to all other clients (except the originating client).
  3. If an RR receives an NLRI from an eBGP peer, the RR advertises the NLRI to all clients and all non-client peers, subject to normal BGP rules.

Only route reflectors need to be aware of this modified advertisement behaviour. Route-reflector clients require no special configuration beyond establishing the iBGP session with the RR. By introducing route reflectors, the requirement for a full iBGP mesh can be relaxed: each client only needs to peer with the RR to receive the routes from the rest of the AS.
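The three reflection rules can be written down as a small decision function. This is a simplified model that only answers the question "to which iBGP peers is the NLRI passed on"; the function and peer names are hypothetical, and best-path selection and policy are ignored.

```python
def reflect_targets(learned_from, sender, clients, non_clients):
    """Return the set of iBGP peers a route reflector advertises an NLRI to.

    learned_from is "non-client", "client" or "ebgp"; sender is the peer the
    NLRI was received from.
    """
    clients, non_clients = set(clients), set(non_clients)
    if learned_from == "non-client":
        # Rule 1: to all clients, but not to other non-client peers
        return clients
    if learned_from == "client":
        # Rule 2: to all non-clients and all other clients
        return (clients - {sender}) | non_clients
    if learned_from == "ebgp":
        # Rule 3: to all clients and all non-client peers
        return clients | non_clients
    raise ValueError(f"unknown peer type: {learned_from!r}")


clients = {"C1", "C2"}
non_clients = {"N1"}
print(sorted(reflect_targets("client", "C1", clients, non_clients)))  # ['C2', 'N1']
```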

3.4.2.External BGP (eBGP)

eBGP sessions are established between BGP routers that are in different ASes. eBGP prefixes are assigned an AD of 20 upon being installed into the router’s RIB.

  • Each eBGP device modifies the AS-Path attribute with its own AS.
  • Each eBGP device modifies the next-hop attribute
3.4.2.1.Comparison to iBGP
  • TTL on BGP packets is set to one by default. BGP packets drop in transit if a multihop BGP session is attempted.
  • The advertising router modifies the BGP next-hop to the IP address sourcing the BGP connection.
  • The advertising router prepends its ASN to the existing AS_PATH. The receiving router verifies that the AS_PATH does not contain its own ASN. BGP discards the NLRI if it fails the AS_PATH loop prevention check.
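The AS_PATH handling on eBGP sessions can be modelled with plain lists. This is a minimal sketch with private example ASNs; the function names are illustrative.

```python
def advertise_ebgp(as_path, local_asn):
    """Sender side: prepend the local ASN before advertising to an eBGP peer."""
    return [local_asn] + list(as_path)


def accept_ebgp(as_path, local_asn):
    """Receiver side: reject the NLRI if the local ASN is already in the path."""
    return local_asn not in as_path


# A prefix originated in AS 65001 travels via AS 65002 to AS 65003.
path = advertise_ebgp([], 65001)      # [65001]
path = advertise_ebgp(path, 65002)    # [65002, 65001]
print(accept_ebgp(path, 65003))       # True: AS 65003 accepts the route
print(accept_ebgp(path, 65001))       # False: AS 65001 detects a loop
```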
3.4.3.Combining iBGP and eBGP

Combining eBGP sessions with iBGP sessions can cause problems. The most common issue is a next-hop address that is not reachable. iBGP peers do not modify the next-hop address if the NLRI has a next-hop address other than 0.0.0.0.

The next-hop address must be resolvable in the global RIB for it to be valid and advertised to other BGP peers.

3.4.3.1.Next hop behavior

Problem: A router passes the next-hop learned from its eBGP neighbor unchanged to its iBGP neighbor, but the iBGP neighbor is not able to reach that next-hop.

To avoid this issue, the next-hop IP address can be modified; NEXT_HOP is a BGP attribute that can be manipulated like any other. Configuring the next-hop-self feature for an iBGP neighbour rewrites the next-hop address of all external NLRIs advertised to it to the advertising router’s own address (the IP address sourcing the BGP session).

3.4.4.Multihop Sessions

TCP allows for handling of fragmentation, sequencing, and reliability (acknowledgement and retransmission) of communication packets. While BGP can form neighbour adjacencies that are directly connected, it can also form adjacencies that are multiple hops away. Multihop sessions require that the routers use an underlying route installed in the RIB (static or from any routing protocol) to establish the TCP session with the remote endpoint.

3.5.MESSAGES

BGP utilizes 4 different types of messages to establish, maintain and tear down BGP peers: OPEN, KEEPALIVE, UPDATE and NOTIFICATION.

3.5.1.OPEN

The OPEN message is used to establish a BGP adjacency. It contains the BGP version number, ASN of the originating router, hold time, the BGP identifier, and other optional parameters that describe the session capabilities.
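The fixed part of an OPEN message body can be packed and unpacked with Python's `struct` module. The sketch below assumes the field layout of RFC 4271 (version, 2-byte AS, hold time, BGP identifier, optional parameter length) and leaves out the common BGP message header; the helper names are hypothetical.

```python
import ipaddress
import struct

# version (1 byte), my AS (2), hold time (2), BGP identifier (4), opt. param length (1)
OPEN_FIXED = struct.Struct("!BHH4sB")


def build_open_body(asn, hold_time, bgp_id, version=4, optional=b""):
    """Pack an OPEN body. asn must fit into 2 bytes in this fixed field."""
    identifier = ipaddress.IPv4Address(bgp_id).packed
    return OPEN_FIXED.pack(version, asn, hold_time, identifier, len(optional)) + optional


def parse_open_body(body):
    """Unpack an OPEN body into a dictionary of its fields."""
    version, asn, hold_time, identifier, opt_len = OPEN_FIXED.unpack_from(body)
    return {
        "version": version,
        "asn": asn,
        "hold_time": hold_time,
        "bgp_id": str(ipaddress.IPv4Address(identifier)),
        "optional": body[OPEN_FIXED.size:OPEN_FIXED.size + opt_len],
    }


body = build_open_body(asn=65001, hold_time=180, bgp_id="192.0.2.1")
print(len(body))              # 10 bytes without optional parameters
print(parse_open_body(body))
```

The session capabilities mentioned above would travel in the optional parameters, which this sketch treats as opaque bytes.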

3.5.2.KEEPALIVE

A BGP process does not rely on the TCP connection state to ensure that its neighbours are still alive. KEEPALIVE messages are exchanged every third of the hold timer agreed upon between the two BGP routers. Cisco devices have a default hold time of 180 s, so the default KEEPALIVE interval is 60 s.

3.5.3.NOTIFICATION

A NOTIFICATION message is sent when an error is detected with the BGP session, such as a hold timer expiring, neighbour capabilities change, or when a BGP session reset is requested. It causes the BGP connection to close.

3.5.4.UPDATE

An UPDATE message advertises feasible routes, withdraws previously advertised routes, or does both. It contains Network Layer Reachability Information (NLRI), which includes the prefix, along with associated BGP path attributes when advertising those prefixes. Withdrawn NLRIs include only the prefix. An UPDATE message can also act as a KEEPALIVE message to reduce unnecessary traffic.

A BGP update message is composed of:

  • A list of routes to explicitly withdraw.
  • The attributes associated with the new prefixes being advertised in this update.

    • The attributes include AS path, MED, community, and many others.
  • The new prefixes

    • The routes include both a network and a mask.
3.6.PATH ATTRIBUTES
  • Per RFC 4271 , well-known attributes must be recognized by all BGP implementations.
  • Optional attributes do not have to be recognized by all BGP implementations.

BGP attributes can be classified in 4 categories:

Well-known mandatory:

  • AS Path
  • Origin
  • Next Hop

Well-known discretionary:

  • Atomic Aggregate
  • Local Preference

Optional transitive:

  • Community
  • Aggregator

Optional non-transitive:

  • MED
  • Weight
  • Cluster ID
  • Originator ID
  • Cluster List

NLRI (Network Layer Reachability Information) is the format used to represent the prefixes a BGP speaker advertises to its peers (informing them of networks reachable via specific paths). It consists of a network prefix, prefix length, and any BGP prefix attributes for that specific route.

3.7.PATH CALCULATION

A route advertisement consists of the NLRI and its path attributes. A BGP router may learn multiple paths to the same destination network. The attributes associated with each path influence the desirability of the route when the router selects the best path. A BGP router advertises only its selected best path to its peers.

Within the BGP table, all learned routes and their associated path attributes are maintained, and the best path is calculated. The selected best path is then installed in the router’s routing table (RIB). If the best path becomes unavailable, the router can evaluate the remaining known paths to quickly select a new best path.

BGP recalculates the best path for a prefix when one of the following events occurs:

  • A change in reachability of the BGP next hop.
  • Failure of an interface connected to an eBGP peer.
  • A change in redistributed routes.
  • When receiving new paths for the same prefix.

Some router configurations modify the BGP attributes to influence inbound traffic or outbound traffic. BGP path attributes can be modified upon receipt or advertisement to influence routing in the local AS or neighbouring AS. A basic rule for traffic engineering with BGP is that modifications in outbound routing policies influence inbound traffic, and modifications to inbound routing policies influence outbound traffic.

3.7.1.Route Selection

BGP installs the first received path as the best path automatically. When additional paths are received, the newer paths are compared against the current best path. If there is a tie, then processing continues onto the next step, until the best path winner is identified.

BGP uses 11 steps to determine the best path:

  1. Prefer highest weight (local to router) CISCO specific - do not use!
  2. Prefer highest local preference (global within AS)
  3. Prefer routes that the router originated
  4. Prefer shorter AS paths (only length is compared)
  5. Prefer lowest origin code (IGP < EGP < Incomplete)
  6. Prefer lowest MED (also called metric)
  7. Prefer external (EBGP) paths over internal (IBGP)
  8. For IBGP paths, prefer path through closest IGP neighbor
  9. For EBGP paths, prefer oldest (most stable) path
  10. Prefer paths from router with the lower BGP router-ID
  11. Prefer the path that comes from the lowest neighbor address
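The step-by-step comparison can be modelled as a sort key in which each tuple element corresponds to one step. This is a simplified sketch, not a router implementation: the `Path` class and its field names are hypothetical, the defaults (weight 0, local preference 100) are assumptions, and MED is compared unconditionally, whereas real implementations usually compare it only between paths from the same neighbouring AS.

```python
import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    neighbor: str                 # neighbor IP address
    router_id: str                # BGP router-ID of the advertising router
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0           # IGP cost to the next hop
    age_seconds: int = 0          # how long the path has been known


def preference_key(p: Path):
    """Smaller key = more preferred; one tuple element per selection step."""
    return (
        -p.weight,                                  # 1. highest weight
        -p.local_pref,                              # 2. highest local preference
        not p.locally_originated,                   # 3. locally originated first
        len(p.as_path),                             # 4. shortest AS path
        ORIGIN_RANK[p.origin],                      # 5. lowest origin code
        p.med,                                      # 6. lowest MED
        not p.is_ebgp,                              # 7. eBGP over iBGP
        0 if p.is_ebgp else p.igp_metric,           # 8. iBGP: closest IGP neighbor
        -p.age_seconds if p.is_ebgp else 0,         # 9. eBGP: oldest path
        int(ipaddress.IPv4Address(p.router_id)),    # 10. lowest router-ID
        int(ipaddress.IPv4Address(p.neighbor)),     # 11. lowest neighbor address
    )


def best_path(paths):
    return min(paths, key=preference_key)


a = Path("10.0.0.1", "1.1.1.1", local_pref=200, as_path=(65001, 65002, 65003))
b = Path("10.0.0.2", "2.2.2.2", local_pref=100, as_path=(65004,))
print(best_path([a, b]).neighbor)  # 10.0.0.1: local preference beats AS path length
```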
3.8.LOOP PREVENTION

As a path-vector routing protocol, BGP does not contain a complete topology of a network (as opposed to link-state routing protocols). BGP behaves similarly to distance-vector protocols to ensure a path is loop free.

The BGP attribute AS_PATH is a well-known mandatory attribute that includes a complete listing of all the ASNs that the prefix advertisement has traversed from its source AS. If the AS_PATH of a received advertisement includes the router’s own ASN, the advertisement is ignored.

3.9.NETWORK STATEMENTS

BGP Network statements identify a specific network prefix to be installed into the BGP table. After configuring a BGP network statement, the BGP process searches the global RIB for an exact network prefix match. The network prefix can be a connected network, secondary connected network, or any route from a routing protocol. After verifying that the network statement matches a prefix in the global RIB, the prefix is installed into the BGP table.

The following BGP Origin path attribute is set depending on the RIB prefix type:

3.10.ROUTE FILTERING AND MANIPULATION

Route filtering is a method to select routes to receive from (import) or advertise to (export) neighbouring routers. This feature can be used to manipulate traffic flows, reduce memory utilization, or to improve security.

It is common for ISPs to deploy route filters on BGP peerings to customers: They want to ensure that only the customer’s routes are allowed over the peering link, preventing the customer from accidentally becoming a transit AS on the internet.

Filtering of routes within BGP is accomplished with filter-lists, prefix-lists, or route-maps on Cisco IOS.

In Cisco IOS, regular expressions can be used in show commands and AS path access lists to match BGP prefixes based on the information contained in their AS path.

3.10.1.Path Announcement

ASes announce paths to destination prefixes; data flows in the opposite direction, back toward the announcing AS.

3.11.COMMUNITIES

BGP communities provide additional capability for tagging routes and for modifying BGP routing policy on upstream and downstream routers.

Communities can be appended, removed, or modified selectively on each attribute as the route travels from router to router. They are an optional transitive BGP attribute that can traverse from autonomous system to autonomous system. A BGP community is a 32-bit number that can be included with a route. It can be displayed as a full 32-bit number (0 − 4'294'967'295) or as two 16-bit numbers (0 − 65'535:0 − 65'535), commonly referred to as new-format.
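The two display formats describe the same 32-bit value, so converting between them is a matter of bit shifting. A minimal sketch with hypothetical helper names:

```python
def community_to_new_format(value: int) -> str:
    """Render a 32-bit community as two 16-bit numbers (new-format)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a community is a 32-bit number")
    return f"{value >> 16}:{value & 0xFFFF}"


def community_from_new_format(text: str) -> int:
    """Parse 'high:low' back into the full 32-bit number."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("both halves must be 16-bit numbers")
    return (high << 16) | low


print(community_from_new_format("65000:100"))   # 4259840100
print(community_to_new_format(4259840100))      # 65000:100
```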

3.12.CONNECTIVITY OPTIONS
3.12.1.Single-Homed without BGP
  • The customer doesn’t use BGP.
  • Static default route on the customer side to reach outside networks
  • Specific static route on the ISP side to reach the customer IP address prefix
3.12.2.Single-Homed with BGP
  • The customer uses BGP
  • Changes in the customer topology will then be sent to the provider.
  • Provider may redistribute the changes to the internet.
3.12.3.Dual-Homed
  • One or two customer router(s)
  • Customer optimally runs BGP between routers
  • ISP announces default route
  • Usually used in a primary/backup design

    • Local Preference to steer traffic
    • First Hop Redundancy Protocols used to ensure correct traffic routing

      • e.g. Hot Standby Router Protocol (HSRP)
  • Load-Sharing possible
3.12.4.Multi-Homed

Used in an active/active design

  • By receiving the full routing table, the path through the internet can be optimized

A customer AS never wants to be transit

  • Only advertise routes originating in own AS to ISPs
3.12.5.Dual Multi-Homed
Provides the most redundancy (ISPs as well as links).
3.12.6.Traffic engineering
3.12.6.1.Outbound

How can you influence how traffic is leaving the network?

3.12.6.2.Inbound

How can you influence how traffic is entering the network?

3.12.6.3.Caveats

An AS has direct control over egress traffic but lacks absolute control over ingress paths.

Example: an ISP’s Local Preference settings will take over and effectively ignore any MED or AS_PATH attributes.

3.12.6.4.Aggregate

Used in Multi-Homed systems.

  • Customer prefers Primary provider
  • Using Alternate only as backup
  • Primary provider advertises aggregated networks
  • Alternate provider advertises individual network
  • Remote autonomous systems prefer longest-match prefix
  • Result: Traffic toward the customer flows through Alternate provider
  • Solution: Don’t use Provider-aggregate public IP address
4.THE INTERNET
4.1.STRUCTURE

The internet is a network of networks.

  • The Internet is an interconnection of tens of thousands of autonomous service providers and customers.
  • There is no central co-ordination for the management of interconnections, services and tariffs.

Who controls the internet?

  • The control over paths is completely distributed. It is all based on trust.

Assumption:

  • The Internet was based on a well-ordered provider client hierarchy.

Reality:

  • Unordered subset of interconnects
  • Driven by business requirements underpinned by performance
  • Non-disclosure and bilateral agreements
  • Peering is now considered a corporate asset & legal concern
4.2.PUBLIC IP ADDRESS ASSIGNMENT
Term Definition
ICANN Internet Corporation for Assigned Names and Numbers
IANA Internet Assigned Numbers Authority
IR Internet Registry
RIR Regional Internet Registry
NIR National Internet Registry
LIR Local Internet Registry
  1. ICANN and IANA group public addresses by major geographic region.
  2. IANA allocates those address ranges to Regional Internet Registries (RIR).
  3. Each RIR further subdivides the address space by allocating public address ranges to National Internet Registries (NIR) or Local Internet Registries (LIR). (ISPs are typically LIRs.)
  4. Each type of Internet Registry (IR) can assign a further subdivided range of addresses to the end-user organization to use.
4.3.PEERING VS. TRANSIT

The nature of the linking between these ISPs is governed by a series of agreements known as peering arrangements.

Term Definition
Transit

Business relationship where one ISP provides its customers reachability to all destinations in its routing table.

  • Transit fees, usually paid by a smaller ISP to a larger
Peering

Business relationship where ISPs provide each other reachability to pre-defined portions of their routing tables

  • Peers are equals and pass traffic from one to another without worrying about payments.
  • They treat smaller ISPs as just another customer.
4.4.ROUTING POLICIES
  • Each ISP has a unified routing policy framework
  • The decision on which routes to advertise and which routes to accept is determined by routing policy.

    • Routes, or prefixes, not only need to be advertised to another AS, but need to be accepted.
  • ISPs do not provide free transit services and generally are either peers or customers of other ISPs.

    • Unless “arrangements” are made, transit ISPs will routinely block transit
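
The effect of such a policy on route advertisement can be sketched as follows. This is a minimal illustration, assuming the widely used convention that routes learned from customers are advertised to every neighbour, while routes learned from peers or transit providers are advertised only to customers. The relationship labels and the function name `should_export` are hypothetical; real policies are configured per BGP session.

```python
# Hypothetical relationship labels; real deployments express this in BGP policy.
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"


def should_export(learned_from: str, advertise_to: str) -> bool:
    """Return True if a route learned from one neighbour type may be
    advertised to another neighbour type without giving away free transit."""
    if learned_from == CUSTOMER:
        # The customer pays for reachability, so its routes go to everyone.
        return True
    # Routes from peers or providers are only passed on to paying customers.
    return advertise_to == CUSTOMER
```

Under this assumption, a route learned from one peer is never re-advertised to another peer or to a provider, which is the "blocked transit" described above.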
4.5.INTERNET EXCHANGE POINT (IXP)
Public IXP

Members will generally peer with a route server.

  • Generally, over a common switched infrastructure

The route server announces the members' routes to all peers

  • Generally, the Route Server (RS) does not insert its own AS into the AS_PATH
  • The NEXT_HOP is preserved and keeps the RS out of the data path

Lack of policy control

  • Single legal contract to manage

Private IXP

Members will peer on a one-to-one basis

  • Generally, over a common switched infrastructure - Public peering
  • Can be over private interconnects – Private peering

Peers implement policy towards each other

  • One BGP session per neighbour
  • Multiple legal contracts to manage
4.6.ROUTING SECURITY
  • Internet Routing Registry (IRR)

    • unreliable
4.6.1.Resource Public Key Infrastructure (RPKI)

RFC 8210 (The Resource Public Key Infrastructure (RPKI) to Router Protocol, Version 1)

RFC 6480 (An Infrastructure to Support Secure Internet Routing)

RPKI is a robust security framework for verifying the association between a resource holder and its Internet resources. It helps to secure Internet routing by validating route origins, answering the following question:

“Is this AS number (ASN) authorized to announce this IP range?”

  • RPKI-capable routers can fetch the validated Route Origin Authorization (ROA) data set from a validating cache
State Meaning
VALID Indicates that the prefix and ASN pair has been found in the database
INVALID Indicates that the prefix is found, but the received ASN did not match or the prefix length is longer than the maximum length
NOT FOUND / UNKNOWN Indicates that the prefix does not match any in the database
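
The three states can be illustrated with a small origin-validation sketch. It assumes the validated data is available as a list of (prefix, maximum length, ASN) entries; the `VRP` class and `validate_origin` function are hypothetical names for illustration and not the API of any router or validator.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VRP:
    """One validated entry: prefix, maximum prefix length and origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: list) -> str:
    """Classify a BGP announcement as VALID, INVALID or NOT FOUND."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # Only entries that cover the announced prefix are relevant.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "VALID"
    # Covered by at least one entry, but ASN or length did not match.
    return "INVALID" if covered else "NOT FOUND"
```

For example, with a single entry `VRP("192.0.2.0/24", 24, 64500)`, an announcement of 192.0.2.0/25 from AS 64500 is classified as INVALID because the prefix is longer than the maximum length.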
4.6.1.1.Trust Anchors (TA)

A TA is a certificate authority (CA) in RPKI terms. The five Regional Internet Registries (RIR) are the TAs and have the following responsibilities:

  1. Provide the infrastructure so that resource holders can sign their prefixes and ASNs.
  2. Provide a public list so that others can verify these prefixes and ASNs.

Resource certificates are based on the X.509 v3 certificate format defined in RFC 5280 and extended by RFC 3779, which binds a list of resources (IP, ASN) to the subject of the certificate.

X.509 certificates are typically used for authenticating either an individual or, for example, a website. In RPKI, certificates do not include identity information, as their only purpose is to transfer the right to use Internet number resources.

4.6.1.2.Route Origin Authorization (ROA)
  • Specifies which ASNs are authorized to originate certain IP prefixes, enabling routers to verify the legitimacy of BGP announcements
  • A signed digital object that contains a list of IP address prefixes and one AS number
  • Created by a prefix holder to authorize an AS number to originate one or more specific route advertisements
  • A ROA is valid if the associated certificate can be validated up to the TA of the corresponding RIR (e.g., RIPE, APNIC)
4.6.1.3.RPKI Validators
  • RPKI validators are also called Relying Party software
  • Run independently by an organization
  • Synchronizes and validates resource records
  • Preloaded with trust anchor locators (TALs) for all RIRs.
  • Retrieves data with a dedicated protocol (RRDP, RFC 8182)

    • Follows chain of trust top to bottom
  • Validates cryptographic signatures on all objects.

    • Outputs a list of Validated ROA Payloads (VRPs).
  • Periodically retrieves updates

Currently, RPKI only provides origin validation. While BGPsec path validation is a desirable characteristic and standardised in RFC 8205, real-world deployment may prove limited for the foreseeable future. However, RPKI origin validation functionality addresses a large portion of the problem surface.

4.6.2.BGP Monitoring

BGP monitoring is a process that helps network operators detect and troubleshoot issues in their routing infrastructure. By understanding and analyzing BGP data, operators can optimize network performance, minimize downtime, and maintain the overall health of their networks.

5.UNICAST
Unicast

  • Source sends unicast datagrams, one for each receiver
  • Only the individual targets receive the packets

Broadcast

  • Source sends one datagram for each network (summarizing)
  • Hosts that are not receivers still receive the packets

Multicast

  • Routers actively participate in multicast, making copies as needed
  • Hosts that are not receivers do not receive the packets
6.BROADCAST
6.1.LAYER 2 - ETHERNET BROADCAST FRAMES

At Layer 2, the all-hosts broadcast MAC address is ffff.ffff.ffff. Switches must replicate such frames when forwarding them so that all devices within the VLAN receive the traffic; this process is known as flooding. Example: ARP request.

Switches forward broadcast traffic out of every interface in the same VLAN that is in a forwarding state, except for the port on which the frame was received.

6.2.LAYER 3 - IP BROADCAST PACKETS

A local broadcast (255.255.255.255) is restricted to the local network segment and is never forwarded by routers. The router accepts it, but only for the local interface. This is the default behavior on routers from all major vendors. If routers were to forward local broadcast packets, they would have to send them out of every interface with IP enabled, because the destination address 255.255.255.255 is considered reachable through all IP-enabled interfaces on the router.

In contrast, a directed broadcast (e.g., 192.168.1.255) is the broadcast address for a specific subnet and can be routed to that subnet by a router. This works for local networks (e.g., 10.1.1.255) and for remote networks (e.g., 10.3.3.255).
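
The difference between the two broadcast types can be checked with Python's standard `ipaddress` module. The following is a small sketch; the function name `broadcast_kind` and its return labels are chosen for illustration only.

```python
import ipaddress

LOCAL_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def broadcast_kind(destination: str, subnet: str) -> str:
    """Classify a destination address relative to the given subnet."""
    addr = ipaddress.IPv4Address(destination)
    net = ipaddress.IPv4Network(subnet)
    if addr == LOCAL_BROADCAST:
        # Never forwarded by routers, stays on the local segment.
        return "local broadcast"
    if addr == net.broadcast_address:
        # Highest address of the subnet, can be routed toward that subnet.
        return "directed broadcast"
    return "not a broadcast"
```

Calling `broadcast_kind("10.3.3.255", "10.3.3.0/24")` identifies the address as the directed broadcast of the remote network from the example above.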

This distinction highlights the fundamental difference between bridging traffic at Layer 2 and routing traffic at Layer 3.

7.MULTICAST

Broadcast traffic is confined to a single broadcast domain and is received by all devices within that segment, regardless of whether they require the data. In contrast, multicast is designed for efficient group communication across potentially multiple network segments, where traffic is delivered only to explicitly interested receivers. Furthermore, multicast traffic can be routed through the network, while broadcast traffic is typically not forwarded by routers and therefore remains limited to a single Layer 2 domain.

7.1.LAYER 2 - MAC MULTICAST

While Layer 3 multicast enables packets to be routed across multiple network segments, the final distribution to end hosts is performed at Layer 2 using multicast MAC addresses. This distinction ensures that multicast traffic reaches the correct networks while avoiding unnecessary flooding within each segment. Without Layer 2 multicast mechanisms, switches would treat multicast traffic similarly to broadcast traffic, leading to inefficient use of bandwidth and increased processing overhead on end devices.

7.2.LAYER 3 - IP MULTICAST

UDP-based

One-to-many Many-to-many
  • Music-on-hold services
  • Sensor updates
  • Stock exchanges
  • Group chat applications
7.2.1.Benefits
  • Enhanced Efficiency

    • Controls network traffic and reduces server and CPU loads
  • Optimized Performance

    • Eliminates traffic redundancy
  • Distributed Applications

    • Makes multipoint applications possible
7.2.2.Inner workings
Best effort delivery

  • Drops are to be expected
  • Multicast applications should not expect reliable delivery of data and should be designed accordingly

No congestion avoidance

  • Lack of TCP windowing and “slow-start” mechanisms can result in network congestion
  • Multicast applications should attempt to detect and avoid congestion conditions

Duplicated packets

  • Out-of-order delivery: Some protocol mechanisms may also result in out-of-order delivery of packets
  • Multicast applications should be designed to expect occasional duplicate packets
7.2.3.Source
  • Sends packets to multicast group IP address
  • All group members will receive multicast traffic
  • Source doesn’t have to be a member of group
7.2.4.Receiver
  • Any multicast capable device
  • Express interest in particular multicast group
  • Receive traffic destined for the multicast group IP address
7.2.5.Addressing
Type Address range Details
Link-local multicast 224.0.0.0/24 (to 224.0.0.255)

  • Used for control protocols (e.g., routing protocols, IGMP)
  • Packets are not forwarded by routers and remain within the local network segment
  • Transmitted with TTL = 1

Internetwork control 224.0.1.0/24 (to 224.0.1.255)

  • Used for protocol control that must be forwarded through the Internet (e.g., NTP)

SSM (Source-Specific Multicast) 232.0.0.0/8 (to 232.255.255.255)

  • Used when both source and group are known
  • RFC 4607

Globally scoped multicast 235.0.0.0 to 238.255.255.255

  • Can be routed across networks
  • Used for general multicast applications
  • RFC 5771

Administratively scoped addresses 239.0.0.0/8 (to 239.255.255.255)

  • Intended for private multicast domains
  • Similar to private IPv4 addresses
  • RFC 2365
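
As a rough illustration, the CIDR ranges from the table above can be turned into a lookup. This sketch only covers the four ranges given in CIDR notation and reports everything else in 224.0.0.0/4 as "other multicast"; the names `MULTICAST_RANGES` and `classify_group` are hypothetical.

```python
import ipaddress

# CIDR ranges taken from the table above (illustrative subset).
MULTICAST_RANGES = [
    ("link-local", ipaddress.ip_network("224.0.0.0/24")),
    ("internetwork control", ipaddress.ip_network("224.0.1.0/24")),
    ("source-specific (SSM)", ipaddress.ip_network("232.0.0.0/8")),
    ("administratively scoped", ipaddress.ip_network("239.0.0.0/8")),
]


def classify_group(address: str) -> str:
    """Return the name of the multicast range an IPv4 group address falls into."""
    addr = ipaddress.IPv4Address(address)
    if not addr.is_multicast:
        raise ValueError(f"{address} is not an IPv4 multicast address")
    for name, network in MULTICAST_RANGES:
        if addr in network:
            return name
    return "other multicast"
```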

7.2.5.1.Link-local multicast

The link-local multicast range has special significance, as traffic in this range is handled differently from other multicast traffic. Packets sent to these addresses are not routed and always remain within the local network segment.

At Layer 2, link-local multicast traffic is typically flooded to all ports, similar to broadcast traffic. However, unlike broadcast, multicast frames are only processed by hosts that support multicast and listen to the corresponding multicast MAC addresses.

Switch behavior for this range is defined such that mechanisms like IGMP snooping do not restrict forwarding, ensuring that essential control traffic is always delivered (RFC 4541).

7.2.5.2.IPv4 MAC address mapping

There is a specific reserved range (25 fixed bits) for multicast MAC addresses: 0100.5E00.0000 to 0100.5E7F.FFFF. A multicast MAC address consists of this reserved prefix and parts of the IP address.

Multicast IPv4 addresses belong to the Class D range and always begin with the binary prefix 1110:

0b1110 0000 - 0b1110 1111 (0d224 - 0d239)

This corresponds to the address range:

224.0.0.0 – 239.255.255.255

The lower-order 23 bits of the IP multicast address are mapped to the lower-order 23 bits of the multicast MAC address. The remaining 5 variable bits of the IP address are not mapped.

Limitations: Since only the last 23 of the 32 bits of the IP multicast address are used in the MAC address, and 4 bits are reserved as a fixed prefix, 5 bits (bits 5-9) are lost during the mapping process. This means that 2^5 = 32 different IP multicast addresses map to the same MAC address.
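
The mapping and its ambiguity can be reproduced with a few lines of Python. This is a minimal sketch; the function name `ipv4_multicast_mac` is chosen for illustration, and the MAC address is printed in the dotted notation used in this section.

```python
import ipaddress

MAC_PREFIX = 0x01005E000000  # fixed 25-bit prefix 0100.5E00.0000


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # keep only the lower 23 bits
    mac = f"{MAC_PREFIX | low23:012x}"
    return ".".join(mac[i:i + 4] for i in (0, 4, 8))
```

Because the 5 bits above the lower 23 are dropped, `ipv4_multicast_mac("224.1.1.1")` and `ipv4_multicast_mac("239.129.1.1")` return the same MAC address.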

7.2.5.3.IPv6 MAC address mapping

IPv6 follows the same schema. The IPv6 multicast address range is ff00::/8 (the first 8 bits are fixed). The corresponding Ethernet multicast MAC address range is 3333.0000.0000 – 3333.FFFF.FFFF (the first 16 bits are fixed).

Mapping is performed by taking the lower 32 bits of the IPv6 multicast address and inserting them into the lower 32 bits of the Ethernet multicast MAC address, resulting in a many-to-one mapping between IPv6 multicast addresses and Ethernet multicast MAC addresses.
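
The IPv6 variant of the mapping can be sketched in the same way. The function name `ipv6_multicast_mac` is hypothetical and used for illustration only.

```python
import ipaddress


def ipv6_multicast_mac(group: str) -> str:
    """Map an IPv6 multicast address to its Ethernet multicast MAC (3333.xxxx.xxxx)."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    low32 = int(addr) & 0xFFFFFFFF        # lower 32 bits of the group address
    mac = f"{(0x3333 << 32) | low32:012x}"
    return ".".join(mac[i:i + 4] for i in (0, 4, 8))
```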

7.2.6.Internet Group Management Protocol (IGMP)

IGMP is the protocol used to manage group subscriptions for IPv4 multicast. On the router, IGMP tracks multicast group memberships on each segment. The operation can be summarized as follows:

  • The router sends query messages to discover hosts that are members of a multicast group.
  • Hosts send membership report messages to indicate interest in joining or leaving a multicast group, and also respond to router queries with report messages.

The selection of which IGMP version to run on your network depends on the operating systems and behavior of the multicast application. There are three IGMP versions: 1, 2, and 3. Each of these has unique characteristics.

7.2.6.1.IGMPv1

IGMPv1 offers a basic query-and-response mechanism to determine which multicast streams should be sent to a particular network segment.

IGMPv1 lacks an explicit leave-signaling mechanism. When a host silently leaves a group, the router continues forwarding the multicast stream until the group membership timer expires. To maintain the group state, the router sends periodic Membership Queries to the All-Hosts address (224.0.0.1) every 60 seconds. If no Membership Reports are received after several consecutive query cycles, the router removes the group from the interface and prunes the stream.

7.2.6.2.IGMPv2

One of the most significant improvements of IGMPv2 over IGMPv1 was the addition of the leave process. A host using IGMPv2 can send a leave-group message to the router indicating that it is no longer interested in receiving a particular multicast stream. This operation eliminates a significant amount of unneeded multicast traffic by not having to wait for the group to time out.

IGMPv2 added the capability of group-specific queries. This feature allows the router to send a query only to the hosts belonging to a specific multicast group, so hosts that are not members of that group no longer have to process it.

7.2.6.3.IGMPv3

The most significant addition in IGMPv3 is support for source filtering. In IGMPv1 and IGMPv2, a host could not specify the source of a multicast stream. Source filtering allows a host to signal membership using include or exclude source lists, indicating from which senders it wants or does not want to receive traffic. This provides finer control and can improve security at the application level.

IGMPv3 enables hosts to signal source-specific multicast membership, allowing PIM Source-Specific Multicast (SSM) to be used for IP multicast routing. In this context, IGMPv3 hosts send membership reports to the multicast address 224.0.0.22 (all IGMPv3 routers), replacing the earlier 224.0.0.2.

7.2.6.4.IGMP Snooping

IGMP Snooping is a Layer 2 switch feature that listens for IGMP conversations between hosts and routers to intelligently map multicast traffic only to the ports that have requested it, preventing it from flooding the entire VLAN like a broadcast.

When IGMP snooping is not enabled on a VLAN, any multicast will be treated as broadcast and will be sent to all end-hosts connected to the VLAN.
When IGMP snooping is enabled on a VLAN, a Layer 2 switch listens for IGMP messages. During the snooping process, the switch learns which end-hosts want to receive which groups and builds an IGMP snooping table. The switch is now using that table to forward the multicast traffic to the port that requested it. When an end-host leaves a group, the switch will also read the IGMP Leave message and will update the IGMP snooping table accordingly.
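
The forwarding decision described above can be modelled with a small table that maps groups to ports. This is a simplified sketch of a single VLAN: the class `SnoopingSwitch` and its methods are hypothetical, and details such as router ports, timers and the always-flooded link-local range are deliberately left out.

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy model of one VLAN on a Layer 2 switch (illustration only)."""

    def __init__(self, ports, snooping_enabled=True):
        self.ports = set(ports)
        self.snooping_enabled = snooping_enabled
        self.table = defaultdict(set)  # group address -> ports with receivers

    def on_report(self, group, port):
        """A host on `port` sent an IGMP membership report for `group`."""
        self.table[group].add(port)

    def on_leave(self, group, port):
        """A host on `port` sent an IGMP leave for `group`."""
        self.table[group].discard(port)
        if not self.table[group]:
            del self.table[group]

    def egress_ports(self, group, ingress_port):
        """Ports a multicast frame for `group` is sent out of."""
        if not self.snooping_enabled:
            # Without snooping, multicast is flooded like broadcast.
            return self.ports - {ingress_port}
        return set(self.table.get(group, set())) - {ingress_port}
```

With snooping enabled, a frame for a group is only sent to ports that reported interest; with snooping disabled, the same frame leaves every port except the one it arrived on.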

7.3.PROTOCOL INDEPENDENT MULTICAST (PIM)

PIM is the most widely used multicast routing protocol. PIM does not build its own routing table; instead, it relies entirely on the existing unicast routing table to make forwarding decisions.

This means that PIM can operate with any underlying unicast routing protocol, such as static routing, OSPF, or IS-IS. In contrast, older multicast routing protocols like DVMRP or MOSPF maintain their own separate routing information.

PIM uses the concept of the Reverse Path Forwarding (RPF) check to ensure that multicast traffic follows the correct path through the network.

PIM supports three different operating modes:

7.3.1.PIM Dense Mode

PIM dense mode is a multicast routing protocol that floods multicast traffic across all network links until it receives prune messages from routers not interested in receiving the traffic. This is called a “push” model where we flood multicast traffic everywhere and then prune it when it’s not needed.

7.3.1.1.Dense-Mode mechanism

The PIM dense mode operation can be summarized in four steps:

7.3.1.2.IGMP and the Querier

Although PIM dense mode handles multicast routing between routers, it relies on IGMP to learn about receivers on directly connected networks.

On each local network segment, one router is elected as the IGMP querier. This router periodically sends IGMP query messages to discover interested receivers. Hosts respond with IGMP membership reports for the multicast groups they wish to join. These IGMP messages are not only used by routers, but are also essential for switches running IGMP snooping. IGMP snooping relies on the presence of a querier to observe query and report messages in order to build multicast forwarding tables. If no IGMP querier is present, switches with IGMP snooping enabled may not learn any group memberships and can therefore drop multicast traffic. In contrast, if IGMP snooping is disabled, multicast traffic is flooded similarly to broadcast, and no querier is required.

Based on the received reports, the router determines whether there are active receivers for a given multicast group on an interface. If no host responds, the router assumes that no receivers are present and can prune that interface from the multicast distribution tree.

7.3.1.2.1.Role in Dense Mode

Even though dense mode uses a flood-and-prune approach, IGMP is essential to:

  • Detect whether receivers exist on a local network
  • Prevent unnecessary multicast traffic on access links
  • Trigger prune behavior when no receivers are present

Thus, IGMP complements PIM dense mode by providing receiver awareness at the network edge, enabling more efficient multicast forwarding.

7.3.1.3.Challenges

PIM dense mode suffers from inefficient bandwidth utilization and resource allocation, particularly in large networks. The flooding nature of dense mode can result in unnecessary congestion, especially in scenarios where only a fraction of devices require multicast traffic. Dense mode is better suited for smaller networks or environments where multicast traffic is universally sought.

7.3.2.PIM Sparse Mode

Protocol Independent Multicast sparse mode (PIM-SM) is a protocol that is used to route multicast packets in the network more efficiently. It interacts with IGMP to recognize networks that have members of a multicast group, and with the routing table to forward the traffic along the desired path. This is called a “pull” model, because multicast traffic is only forwarded on request.

The basic mechanism of PIM-SM can be summarized as follows:

  • Receivers join a multicast group by sending join messages toward a Rendezvous Point (RP) in ASM, forming a shared distribution tree. In SSM, receivers instead join directly toward the source, creating a source-specific distribution tree.
  • Routers forward multicast traffic only on interfaces for which they have received explicit join messages from downstream routers.
7.3.2.1.Any Source Multicast (ASM)

When IGMPv1 or IGMPv2 is used, the multicast source is unknown to receivers, as they can only join a group (*,G). IGMPv3 allows receivers to specify a source (S,G), which is why it is typically used for Source-Specific Multicast (SSM), although it can also operate in ASM.

Any-Source Multicast (ASM) is a many-to-many communication model in which any sender can transmit data to a multicast group, and receivers do not need to know the sender in advance. From the receiver’s perspective, the goal is simply to receive traffic sent to a specific multicast group, denoted as G, regardless of the source. This is expressed using the notation (*,G), where * represents any source. In ASM, the network is responsible for automatically discovering active sources and delivering their traffic to interested receivers.

7.3.2.2.Rendezvous Point (RP)

If a receiver sends an IGMP Join (*,G) to its last-hop router (the router directly attached to the receiver), that router forwards a PIM Join (*,G) hop-by-hop towards the RP.

The RP acts as the meeting place for sources and receivers. A source initially sends its traffic to the RP using a PIM Register tunnel.

For this process to work, every router in the network must know the location of the RP. This is achieved through a mapping that assigns each multicast group to a specific RP. The mapping can either be configured statically on all routers or learned dynamically via mechanisms such as the Bootstrap Router (BSR, RFC 5059) or Auto-RP (Cisco-proprietary).

However, the RP is only required during the initial phase of multicast communication. Once the receiver learns about the active source via the RP, the last-hop router can determine the optimal path to the source using the unicast routing table. It then builds a shortest-path tree (SPT) directly towards the source. As a result, multicast traffic no longer needs to traverse the RP and instead follows the most efficient path from source to receiver.

7.3.2.2.1.IPv6

An advantage of IPv6 multicast compared to IPv4 multicast is that the Rendezvous Point address can be included in the multicast address.

Below is illustrated how an IPv6 address of a Rendezvous Point is included in an IPv6 multicast address. The flags are set to 0111. This means that the Network Prefix defines the IPv6 address of the Rendezvous Point and the multicast address is dynamically assigned. With the Network Prefix and the RPaddr field, the address of the Rendezvous Point can be calculated.

The advantage of this method is that the administrator only has to configure the multicast address.

Field Type Flags Scope Rsvd RpAddr Plen Network Prefix GroupID
Value FF 7 8 0 1 40 2001:DB8:12:0 1111:2222
Size 8 Bits 4 Bits 4 Bits 4 Bits 4 Bits 8 Bits 64 Bits 32 Bits
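
The calculation of the Rendezvous Point address from the fields above can be sketched as follows. The sketch assumes the field layout shown in the table (as specified for embedded RP in RFC 3956) and that the Plen value 40 is hexadecimal, i.e. a 64-bit prefix; the function name `embedded_rp` is hypothetical.

```python
import ipaddress


def embedded_rp(group: str) -> ipaddress.IPv6Address:
    """Derive the RP address embedded in an IPv6 multicast group address."""
    value = int(ipaddress.IPv6Address(group))
    flags = (value >> 116) & 0xF
    if (value >> 120) != 0xFF or (flags & 0x7) != 0x7:
        raise ValueError("not an embedded-RP multicast address")
    rp_addr = (value >> 104) & 0xF            # RpAddr field (4 bits)
    plen = (value >> 96) & 0xFF               # prefix length (8 bits)
    if not 1 <= plen <= 64:
        raise ValueError("invalid prefix length")
    prefix_field = (value >> 32) & 0xFFFFFFFFFFFFFFFF
    # Keep only the first `plen` bits of the 64-bit network prefix field.
    mask = ((1 << plen) - 1) << (64 - plen)
    prefix = prefix_field & mask
    # RP address = network prefix followed by zeros and the RpAddr value.
    return ipaddress.IPv6Address((prefix << 64) | rp_addr)
```

For the example values in the table, the group address is FF78:140:2001:DB8:12:0:1111:2222 and the derived RP address is 2001:DB8:12::1.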
7.3.2.3.Reverse Path Forwarding (RPF)

To ensure that multicast packets follow correct and loop-free paths, routers rely on the concept of Reverse Path Forwarding (RPF).

RPF is a mechanism that verifies whether a multicast packet has arrived on the correct interface. A router performs an RPF check by consulting its unicast routing table and determining the interface it would use to reach the source (or RP). If the multicast packet arrives on that interface, it is accepted and forwarded; otherwise, it is discarded. This ensures efficient forwarding and prevents routing loops.
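
The RPF check can be sketched as a longest-prefix lookup in the unicast routing table followed by a comparison with the arrival interface. The routing table is modelled here as a plain dictionary from prefix to outgoing interface, and the function names and interface names are hypothetical.

```python
import ipaddress


def rpf_interface(unicast_routes: dict, source: str):
    """Return the interface used to reach `source` (longest prefix match)."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, interface in unicast_routes.items():
        net = ipaddress.ip_network(prefix)
        if src.version == net.version and src in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, interface)
    return best[1] if best else None


def rpf_check(unicast_routes: dict, source: str, arrival_interface: str) -> bool:
    """Accept a multicast packet only if it arrived on the RPF interface."""
    expected = rpf_interface(unicast_routes, source)
    return expected is not None and expected == arrival_interface
```

A packet from 10.1.1.10 is accepted only on the interface that the unicast table would use to send traffic back to 10.1.1.10; the same packet arriving on any other interface is discarded.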

7.3.2.4.Source Specific Multicast (SSM)

In SSM, the receiver knows the exact source from which it wants to receive multicast traffic. This simplifies the multicast process, as a shortest-path tree can be built directly toward the source without the need for a RP.

The receiver subscribes to a channel using IGMPv3, which provides the last-hop router with both the source IP address and the multicast group address. As a result, PIM can immediately construct a source-specific tree (S,G).

In SSM, no shared trees (*,G) are used. Multicast state is built exclusively as shortest-path trees toward the source. An SSM channel is therefore identified by an (S,G) pair, where S is the source address and G is the group address.

IANA has reserved the IPv4 address range of 232.0.0.0/8 for PIM SSM. It is recommended to allocate SSM multicast groups using that range.

In PIM-SSM, the successful establishment of an (S,G) forwarding path from the source S to any receiver depends on hop-by-hop forwarding of the explicit join request from the receiver toward the source. Receivers can send an explicit join toward the source because their join message carries the source IP address together with the multicast address of the group. PIM-SSM leverages the unicast routing topology to maintain the loop-free behaviour.

The receivers can receive traffic only from designated (S,G) channels to which they are subscribed, which is in contrast to ASM, where receivers need not know the IP addresses of sources from which they receive their traffic.

7.3.3.PIM Sparse-Dense Mode

In PIM sparse-dense mode we can use sparse or dense mode for each multicast group.

7.4.MULTICAST ROUTING CONCEPTS

Unicast routing protocols like OSPF are forward-looking: the routes stored in the routing database provide information on the destination.

The opposite is true for multicast routing. The objective is to send multicast messages away from the source towards all branches or segments of the network interested in receiving those messages. Messages must always flow away from the source, and never back on a segment from where the transmission originated. This means rather than tracking only destinations, multicast routers must also track the location of sources, the inverse of unicast routing. This method is called reverse path forwarding (RPF).

7.4.1.RPF Check

Even though multicast uses the exact inverse logic of unicast routing protocols, you can leverage the information obtained by those protocols for multicast forwarding.

In the case of a Shared Tree with a Rendezvous Point, the RPF check is carried out against the Rendezvous Point, because it is the one that knows the source.

7.4.2.The (*,G) multicast routing table entry

A host sends an IGMP membership report, also called an IGMP Join. The router adds the (*,G) entry to the mroute table.

7.4.3.The (S,G) multicast routing table entry

To build a tree for an (S,G) entry, the router needs to receive an (S,G) join or an (S,G) membership report from hosts via IGMP.

After a source for a group is known by the router, it adds the (S,G) entry to the multicast routing table.

8.NETWORK DESIGN

RFC 1925

8.1.AVAILABILITY
Term Definition
MTBF Mean Time Between Failures
MTTR Mean Time To Repair (Resolve)
MTTD Mean Time To Detect
MTTI Mean Time To Identify
MTRS Mean Time To Restore Service
MTBSI Mean Time Between Service Incidents
Availability

MTBF combined

MTBF parallel
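
The formulas belonging to these entries are not reproduced in the text. As a hedged illustration, the following sketch uses the textbook definitions, which are an assumption here: availability is MTBF / (MTBF + MTTR), components in series multiply their availabilities, and parallel (redundant) components fail only if all of them fail. The function names are hypothetical.

```python
from math import prod


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(availabilities) -> float:
    """All components are needed: the availabilities multiply."""
    return prod(availabilities)


def parallel_availability(availabilities) -> float:
    """Redundant components: unavailable only if every component is down."""
    return 1.0 - prod(1.0 - a for a in availabilities)
```

Under these assumptions, two devices with 99% availability each give about 98.01% in series and about 99.99% in parallel, which shows why redundancy raises availability while longer chains lower it.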

8.1.1.Reasons for downtime
8.1.2.Redundancy

Redundancy is a tradeoff

  • Adds complexity

    • Source of errors
    • Network Addressing
    • Routing / Traffic Flow
    • Which redundancy mechanism is appropriate?

Adding links (paths) increases the MTBF

  • However, it also increases the MTTR

    • Increasing parallelism increases routing complexity
    • Thus, increasing convergence time
    • Convergence time is directly tied to the MTTR

Network Design is not trivial

Consider Backup Paths vs. Load Balancing

  • Backup Paths:

    • Duplicate devices/links on the primary path
    • Build extra link to provide redundancy
    • Questions arise:
    • How much capacity does the backup link need?
    • How quickly will the network begin to use the backup path?
  • Load Balancing:

    • ECMP
    • Ether- / Port-Channel
8.2.SCALABILITY

Things to consider when planning for expansion:

  • What is the bandwidth need and future growth?
  • How many more sites will be added in the next years?
  • How many more users?
  • How many more servers?
  • How many more tenants (customers)?

Scalability constraints

  • Broadcasts
  • Limitations

    • Addresses
  • Separations of applications/tenants/customers
8.3.TOPOLOGY

A topology describes how a network is connected.

Star Bus
Ring Full Mesh
Hierarchical Flat
Eg. Tree Eg. Ring

Divide and conquer

  • Each layer has clearly defined tasks
  • Allow summarization

    • Addressing
    • Traffic / Capacity Planning

Improve Scalability

  • Think about adding components

    • For Capacity
    • For More Ports
  • Small networks
  • Any-to-Any communication

    • MPLS VPN
    • LAN
8.4.CAMPUS
  • Enterprise network with hundreds/thousands of users
  • More than one LAN (Local Area Network)
  • Limited geographical area

    • connects multiple buildings
  • Connected via Ethernet (and Wireless)
  • One company owns the hardware
Traditional / Hierarchical Design Model

  • Most common design
  • Typically consists of three layers

    • Core
    • Distribution
    • Access

Emerging Technologies / Fabric

  • Newer technologies
  • Underlay/Overlay networks

    • Software-Defined Networking (SDN), Software-Defined Access (SDA)
    • EVPN
    • Segment Routing (SR)
    • (MPLS)
8.4.1.Hierarchical
  • Each layer has specific functions/capabilities

    • Simplifies network design, deployment and management
  • Design elements can be changed easily

    • Adds scalability/modularity
    • Use the same “building blocks”
  • Changes in the network affect a small subset of the network
8.4.1.1.Core
  • Backbone: connects different distribution layer switches

    • Large networks
    • Geographical reasons, e.g. several buildings
    • Simplifies design, otherwise full-mesh between distribution layer
  • Simple but fast
  • Only layer 3

    • Drives resiliency and stability
  • Design criteria:

    • Scalability
    • Capacity
    • Redundancy
8.4.1.2.Distribution
  • Aggregate data from multiple access switches and connect to the core
  • Design Simplification

    • Scalability
    • Smaller Fault Domains
    • Redundancy
  • Usually, Layer 3 Boundaries

    • Load Balancing
    • Inter-VLAN Routing
    • Optimizations: Route summarization and fast convergence, loop protection
    • Security Policies
    • QoS enforcement
8.4.1.3.Access
  • Connects user devices/end-points to network
  • High port density but low cost
  • Power over Ethernet (PoE)
  • Network Access Control
  • QoS classification
  • L2 features

    • VLAN
    • STP
    • IGMP snooping
    • DHCP snooping
    • Etc.
8.4.1.3.1.Issues
8.4.1.3.2.Simplified access
  • Create a switch stack at the distribution layer
  • Easier management

    • No FHRP
    • Single Point of Management
  • No loop
  • Scalability

    • Add switches to stack easily
8.4.1.4.Collapsed core
  • Dual Role:

    • Core and distribution combined
  • Access Layer Aggregation
  • Service Connectivity e.g.

    • External Services: WAN/Internet
    • WLAN
  • Reduces complexity
  • Limited Scalability
8.4.2.First Hop Redundancy
  • Provide a resilient default gateway/first hop address to end-stations
  • Different First Hop Redundancy Protocols (FHRP)
  • Leverage Timers for Fast Failover
  • Optimize Timers for Smooth Transitions

    • Goal: As little blackholed traffic as possible
8.4.2.1.Virtual Router Redundancy Protocol (VRRP)
  • A group of routers function as one virtual router by sharing one virtual IP address and one virtual MAC address
  • One (master) router performs packet forwarding for local hosts
  • The rest of the routers act as “backup” in case the master router fails
  • Backup routers stay idle as far as packet forwarding from the client side is concerned
  • IETF Standard RFC 3768
  • Use VRRP if you need multivendor interoperability
8.4.2.2.Hot Standby Router Protocol (HSRP)
  • A group of routers function as one virtual router by sharing one virtual IP address and one virtual MAC address
  • One (active) router performs packet forwarding for local hosts
  • The rest of the routers provide “hot standby” in case the active router fails
  • Standby routers stay idle as far as packet forwarding from the client side is concerned
  • Cisco proprietary
8.4.2.3.Gateway Load Balancing Protocol (GLBP)
  • All the benefits of HSRP plus load balancing of default gateway

    • Utilizes all available bandwidth
  • A group of routers function as one virtual router by sharing one virtual IP address but using multiple virtual MAC addresses for traffic forwarding

    • Active Virtual Gateway (AVG) responds to ARP
    • Active Virtual Forwarder (AVF) is used for forwarding
    • AVG sends virtual MAC addresses of AVFs
  • Cisco proprietary
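
The interplay of AVG and AVFs can be sketched as follows. The sketch assumes that the AVG hands out the forwarders' virtual MAC addresses in round-robin order; the class name, the method name and the placeholder MAC strings are hypothetical and do not reflect real GLBP address formats.

```python
from itertools import cycle


class ActiveVirtualGateway:
    """Toy AVG: answers ARP requests for the virtual IP address."""

    def __init__(self, virtual_ip: str, forwarder_macs: list):
        if not forwarder_macs:
            raise ValueError("at least one forwarder MAC is required")
        self.virtual_ip = virtual_ip
        self._next_mac = cycle(forwarder_macs)  # round-robin over the AVFs

    def arp_reply(self, requested_ip: str):
        """Return the virtual MAC handed to the next client, or None."""
        if requested_ip != self.virtual_ip:
            return None
        return next(self._next_mac)
```

Each client that resolves the shared gateway address receives a different forwarder's virtual MAC address, so the clients' traffic is spread across the routers in the group.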
8.4.3.Software Defined Access
  • “One controller to rule them all”
  • Example: Overlay Protocols

    • VXLAN / LISP
    • VXLAN / EVPN
  • Decoupling Layer 2 / Layer 3
  • Anycast Gateway
  • Automation
  • Simplification
8.4.3.1.Ethernet VPN (EVPN)

EVPN is a technology for carrying Layer 2 Ethernet traffic as a virtual private network using wide area network protocols. EVPN technologies include Ethernet over Multiprotocol Label Switching (MPLS) and Ethernet over Virtual Extensible LAN (VXLAN).

  • Anycast Gateway
8.5.DATA CENTER
  • Predominant East-West traffic
  • Campus Requirements apply
  • Additional Requirements

    • Agility
    • Multitenancy
    • Scalability
  • Depends on the data center type

    • Enterprise Data Center differs from Cloud Data Center
  • IP network
  • Storage network
8.5.1.Data Center Tiers
Parameters | Tier 1 | Tier 2 | Tier 3 | Tier 4
Uptime guarantee | 99.671% | 99.741% | 99.982% | 99.995%
Downtime per year | <28.8 hours | <22 hours | <1.6 hours | <26.3 minutes
Price | $ | $$ | $$$ | $$$$
Compartmentalization | No | No | No | Yes
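The downtime figures follow from the uptime guarantees. As a rough check, the sketch below converts each percentage into a yearly downtime budget, assuming a 365-day year (8760 hours). The function name is hypothetical. For Tier 2 the result comes out near 22.7 hours, slightly above the rounded figure of 22 hours usually quoted.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def max_downtime_hours(uptime_percent):
    """Yearly downtime budget implied by an uptime guarantee."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


tiers = {"Tier 1": 99.671, "Tier 2": 99.741, "Tier 3": 99.982, "Tier 4": 99.995}
downtime = {tier: max_downtime_hours(pct) for tier, pct in tiers.items()}

for tier, hours in downtime.items():
    print(f"{tier}: {hours:.1f} hours ({hours * 60:.0f} minutes)")
```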
8.5.2.Traffic patterns
North-South traffic
  • Traffic traveling between different network scopes (e.g., internal and external)
  • Entering or Leaving Data Center
  • Client-Server traffic
East-West traffic
  • Traffic that stays within the same network scope
  • Does not leave data center(s)
  • Traffic between servers
  • Traffic between server and storage
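The distinction can be expressed as a simple classification rule: a flow is East-West if both endpoints sit inside the data center, and North-South otherwise. The sketch below assumes the data center is described by a list of prefixes. The prefixes and function names are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical data center prefixes, chosen only for illustration.
DATA_CENTER_PREFIXES = [ip_network("10.10.0.0/16"), ip_network("10.20.0.0/16")]


def in_data_center(address):
    """True if the address falls inside one of the data center prefixes."""
    addr = ip_address(address)
    return any(addr in prefix for prefix in DATA_CENTER_PREFIXES)


def classify_flow(src, dst):
    """Label a flow by whether it stays inside the data center scope."""
    # Assumes at least one endpoint is inside the data center.
    if in_data_center(src) and in_data_center(dst):
        return "east-west"
    return "north-south"


server_to_storage = classify_flow("10.10.1.5", "10.20.3.7")
client_to_server = classify_flow("203.0.113.9", "10.10.1.5")
```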
8.5.3.Logical Topology

Web Tier

Application Tier

Database Tier

8.5.4.Three-Tier Data Center Architecture
  • Similar to Access-Distribution-Core “Hierarchical” (Page 1)

    • Distribution is called Aggregation
  • Also called Fat Tree Data Center Network
  • Servers connect to the Access layer

Problems:

  • Server Virtualization increased East-West traffic

    • VMs / Containers
  • East-West traffic goes over all layers
  • Three-Tier Data Center designed for North-South traffic

    • Failed to adapt to modern workloads
  • Nowadays we need Layer 2 connectivity
8.5.5.Leaf Spine Architecture
  • Sometimes called two-tier architecture
  • Solution for modern workloads
  • Brings a lot of advantages

    • Scalability – easy expansion
    • Reduced latency
    • Load-Balancing through ECMP
    • Fault Tolerance
    • No L2 problem thanks to Fabric

      • EVPN / VXLAN
      • Or similar technologies
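ECMP load balancing in a leaf-spine fabric can be sketched as a hash over the flow's 5-tuple that selects one of the equal-cost uplinks. This is a simplified model: real switches use their own hardware hash functions, and SHA-256 is only a stand-in here. The spine names and the function name are hypothetical.

```python
import hashlib

SPINES = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical fabric


def pick_spine(src_ip, dst_ip, protocol, src_port, dst_port, spines=SPINES):
    """Choose one of the equal-cost uplinks by hashing the flow's 5-tuple."""
    flow_key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(flow_key).digest()
    # Reduce the hash to an index into the list of equal-cost next hops.
    index = int.from_bytes(digest[:4], "big") % len(spines)
    return spines[index]


# Packets of the same flow always hash to the same spine, which avoids reordering.
first = pick_spine("10.10.1.5", "10.20.3.7", "tcp", 40000, 443)
second = pick_spine("10.10.1.5", "10.20.3.7", "tcp", 40000, 443)

# Different flows (here: different source ports) spread across the spines.
used_spines = {
    pick_spine("10.10.1.5", "10.20.3.7", "tcp", port, 443)
    for port in range(40000, 41000)
}
```

Because every leaf connects to every spine, any leaf-to-leaf path is leaf, spine, leaf, so all candidate paths have equal cost and the hash only decides which spine carries a given flow.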
8.5.6.Top of Rack (ToR)

[1]

8.5.7.End of Row (EoR)

[2]

8.5.8.Comparison
BIBLIOGRAPHY