summaries-se-ost

Computer Networks 2

yoinked from https://github.com/Nicicalu/OST-CN2-Spick

Supports Equal-Cost Multipath (ECMP) load balancing; fast convergence; widely used. Runs directly over IP (protocol number 89, Layer 3). Uses multicast 224.0.0.5 (AllSPFRouters, all OSPF routers) and 224.0.0.6 (AllDRouters, all DR/BDR).

DR Election
Highest interface priority wins; on a tie, highest Router ID. Priority 0 = ineligible for DR/BDR. No preemption. BDR = same rules, second-best candidate.
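A minimal sketch of the election rule above, assuming a fresh election with no incumbent DR/BDR (real OSPF is non-preemptive and first elects the BDR, which this simplification ignores). The `Candidate` class and `elect_dr_bdr` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    router_id: str  # dotted-quad Router ID, e.g. "10.0.0.3"
    priority: int   # interface priority; 0 = ineligible


def rid_key(router_id: str) -> Tuple[int, ...]:
    """Compare Router IDs numerically, octet by octet."""
    return tuple(int(octet) for octet in router_id.split("."))


def elect_dr_bdr(candidates: List[Candidate]) -> Tuple[Optional[Candidate], Optional[Candidate]]:
    """Return (DR, BDR): highest priority first, Router ID as tie-breaker."""
    eligible = [c for c in candidates if c.priority > 0]
    ranked = sorted(
        eligible,
        key=lambda c: (c.priority, rid_key(c.router_id)),
        reverse=True,
    )
    dr = ranked[0] if ranked else None
    bdr = ranked[1] if len(ranked) > 1 else None
    return dr, bdr
```

Because there is no preemption, a router that joins later with a higher priority would not replace the result of this function until the next election.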
  • Establish neighbor adjacencies and exchange LSAs
  • Build the Link-State Database (LSDB)
  • Run Dijkstra’s SPF algorithm:

    • Intra-area change: SPF recalculation
    • Inter-area change: No SPF needed — ABR handles updates
  • Build the routing table from SPF results
  • AS split into areas (sub-domains), each with a 32-bit Area ID (e.g., 0.0.0.0 = Area 0)
Backbone Area (Area 0)
Core of the OSPF domain (must exist). Must connect to all other areas (directly or via virtual links), must be contiguous (no disjointed segments), and should not contain end-user networks.
Non-Backbone Areas
Connect end users and local resources, All inter-area traffic must transit the backbone
Area Border Router (ABR)
Connects two or more OSPF areas, must have 1 interface in backbone. 1 OSPF DB per Area
Internal Router
All interfaces belong to the same area (non-backbone)
Backbone Router
At least one interface in Area 0, Includes ABRs and routers internal to the backbone
AS Boundary Router (ASBR)
Connects to external AS/Network (e.g., BGP), Advertises external routes into OSPF, Can be in backbone or non-backbone area
Rule 1
Backbone (Area 0) must be contiguous — no partitions allowed
Rule 2
Every non-backbone area must connect to Area 0
Type 1 – Router-LSA
Sent by all routers, lists directly connected links (so all outgoing interfaces with state and cost) (intra-area only)
Type 2 – Network-LSA
Sent by DR, lists all routers and DR in broadcast/multi-access network (intra-area only)
Type 3 – Summary-LSA
Sent by ABRs, advertises networks from other areas, flooded in all the areas that are not “totally stubby” (inter-area)
Type 4 – ASBR-Summary
Sent by ABRs, advertises path to ASBRs (inter-area)
Type 5 – AS-External-LSA
Advertises external routes (e.g., BGP); flooded only into normal areas (not into stub areas or NSSAs)
Type 7 – NSSA External
Like Type 5 but used inside NSSA; converted to Type 5 by ABR
Standard (1,2,3,4,5)
Normal
Stub Area (1,2,3)
  • Blocks external LSAs (Type 5)
  • ABR injects a default route (0.0.0.0)
  • Supports LSA Types 1–3
Totally Stubby Area (1,2)
  • Blocks external (Type 5) and summary (Type 3/4)
  • Only allows default route from ABR
Not-So-Stubby Area – NSSA (1,2,3,7)
  • Like stub area, but allows one ASBR inside
  • External routes use LSA Type 7 (converted to Type 5 by ABR)
Totally NSSA (1,2,7)
  • Like NSSA but blocks Type 3 so gets default route from ABR
Type 1 – Hello
Used to discover, maintain, and verify neighbors; forms adjacencies. Also used for DR/BDR election in broadcast networks. Contains network mask (of the sending router’s interface), Hello interval (P2P, broadcast: default = 10s), Options, Priority (for election), Router dead interval (default = 40s), DR/BDR IP, Neighbors.
Type 2 – Database Description (DBD/DD)
Exchange summaries of LSAs during adjacency formation (headers only). Contains Interface Max. MTU, Options, I/M/MS bits (Initial, More, Master-slave bit), DD Sequence num, LSA Header
Type 3 – Link State Request (LSR)
Sent when a router needs specific LSAs listed in the DBD. Contains Link State Type (router/network), Link State ID, Advertising Router (sender address)
Type 4 – Link State Update (LSU)
Used to flood new or updated LSAs. Contains Number of LSAs, full LSAs information
Type 5 – Link State Acknowledgment (LSAck)
Confirms receipt of LSAs to ensure reliable flooding. For this you send LSAck or implicitly by sending LSU with same info back. Many acks may be grouped together to a single LSAck.
Hello Protocol
  • Used for neighbor discovery and parameter negotiation.
  • Maintains logical adjacencies on P2P, P2MP, and virtual links.
  • Elects DR/BDR on broadcast and NBMA networks.
  • Continuously sends hello packets to maintain bidirectional connectivity; if no hello is received within the router dead interval (agreed at initialization), the neighbor is declared down
Database Sync Protocol
  • Syncs LSDB using Database Description (DBD) packets with only LSA headers.
  • Uses I-bit (initial), M-bit (more), and MS-bit (master/slave).
  • ExStart: Bi-dir comm; highest Router-ID = master. Determine initial seq nr
  • Exchange: Exchange of DBD packets (LSA headers).
  • Loading: Missing LSAs are requested.
  • Full: Databases fully synchronized.
  • Each router runs Dijkstra per area; link cost = metric from LSAs (1–65535)
  • OSPF prefers the most specific match (longest prefix, CIDR); if several routes remain: intra-area > inter-area > external
  • Routes added to RIB/FIB based on computed next hops
  • ECMP: modified Dijkstra supports Equal-Cost MultiPath; if multiple paths have the same cost, routes are added with multiple next hops for load balancing
Intra-Area (O)
Source and dest in same area; routes from Type 1 and 2 LSAs
Inter-Area (O IA)
Source and dest in different areas within same AS; via Type 3 LSAs through backbone
External (E1/E2)

Dest outside AS; info injected by ASBR via redistribution

  • E1: Total = external + internal OSPF cost
  • E2: Only external cost (default)
Preference order
More specific route > Intra-area > Inter-area > E1 > E2
Cost calculation
Cost = Reference Bandwidth(default 100 Mbps)/Interface Bandwidth
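The formula can be sketched as follows. The flooring, the minimum of 1 and the cap at 65535 are assumptions based on common vendor behavior and on the metric range mentioned earlier; `ospf_cost` is a hypothetical helper name.

```python
def ospf_cost(interface_bw_mbps: float, reference_bw_mbps: float = 100.0) -> int:
    """Cost = reference bandwidth / interface bandwidth, clamped to 1..65535."""
    if interface_bw_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    raw = int(reference_bw_mbps // interface_bw_mbps)  # assumption: result is floored
    return max(1, min(65535, raw))
```

With the default reference of 100 Mbps, every link of 100 Mbps or faster ends up with cost 1, which is why the reference bandwidth is usually raised in networks with faster links.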
CLNS
ISO Layer 3 datagram service; supports CLNP, ES-IS, IS-IS.
CLNP
Connectionless Network Protocol, similar to IP, used in ISO stack (on Ethernet identified by LLC DSAP/SSAP 0xFE, not by an EtherType).
IS-IS
Link-state routing protocol (Layer 3); forms adjacencies with ES-IS; designed for CLNP but extended (Integrated IS-IS) to support IP.
Integrated IS-IS
Allows IP routing with IS-IS; used widely by service providers even without CLNP.
NSAP
Network Service Access Point
NET = Network Entity Title
Unique router identifier in IS-IS
Format
49.AAAA.BBBB.BBBB.BBBB.00
49.AAAA…
Area ID (variable length)
BBBB.BBBB.BBBB
System ID (usually 6 bytes = unique router ID)
00
N-Selector (NSEL) (always 00 for routers)
System ID
Must be unique per router (often based on lo-IP/MAC)
Example: 49.0001.1921.6800.1024.00 based on IP 192.168.1.24
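A small sketch of the IP-to-System-ID convention used in the example: each octet is zero-padded to three digits and the 12 digits are regrouped in blocks of four. The function name `net_from_ipv4` and the default area `49.0001` are illustrative assumptions.

```python
def net_from_ipv4(ip: str, area: str = "49.0001") -> str:
    """Derive an IS-IS NET from an IPv4 address (e.g. a loopback)."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    # 192.168.1.24 -> "192168001024"
    digits = "".join(f"{int(octet):03d}" for octet in octets)
    # regroup into three blocks of four digits -> "1921.6800.1024"
    system_id = ".".join(digits[i:i + 4] for i in range(0, 12, 4))
    return f"{area}.{system_id}.00"  # NSEL 00 for routers
```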
Area ID
Used for routing hierarchy (like OSPF areas)
IIH (Hello)

Builds and maintains adjacencies; includes system ID, holding time, prio

  • Built from 3 functions: discover, build, maintain
  • Interval: 10s (default); DIS sends every 3.3s on LANs
  • Multiplier: missed-hello limit; Holdtime = Interval × Multiplier (default = 3)
  • Multicast:

    • 01-80-C2-00-00-14 (AllL1ISs)
    • 01-80-C2-00-00-15 (AllL2ISs)
LSP (Link State PDU)
Contains topology info including prefixes with costs; flooded throughout the area similar to OSPF LSA Type 1
CSNP (Complete SNP)
Sent by DIS; lists all known LSPs (used for database sync)
PSNP (Partial SNP)
Used to request missing LSPs or acknowledge received LSPs
IS-IS Packet Structure
Common header + TLVs
L1 Router
Only within one area; no inter-area routing
L2 Router
Backbone router; routes between areas
L1/L2 Router
Acts as both; separates databases, redistributes between levels
  • Required on broadcast links (no DIS on p2p)
  • Sends periodic CSNPs to ensure DB sync, creates pseudonode LSP
  • No backup DIS in IS-IS
    1. Highest interface priority (0–127), Cisco default = 64
    2. Highest SNPA (MAC address!)
Preemption
Enabled – a higher-priority router automatically takes over the DIS role
CSNP
Sent once at adjacency startup (on point-to-point links, where there is no DIS)
LSP
Advertises topology changes (link-state info)
PSNP
Acknowledges received LSPs or requests missing ones

Path Selection Order (in IS-IS) - Lower Metric better:

  • L1 intra-area routes
  • L2 intra-area routes
  • Leaked L2 → L1 (internal metric)
  • L1 external (external metric)
  • L2 external (external metric)
  • Leaked L2 → L1 (external metric)
  • Intra-area routing only (like OSPF intra-area)
  • L1 routers use the closest L1/L2 router for inter-area traffic
  • L1/L2 routers:

    • Do not advertise L2 routes into the L1 area (unless route leaking is active)
    • Set Attached bit to signal L2 connectivity to backbone
  • L1 routers install a default route to nearest L1/L2
  • L1 area like OSPF Totally Stubby Area
  • Distribution Bit: Set to ‘up’ (1) on L2 → L1 leaks; blocks re-advertisement L1 → L2.
  • Route-Leaking injects a more specific route into L1 to improve routing
  • Routing between areas (inter-area)
  • L1/L2 routers inject L1 routes into L2 topology
  • L1 routes are redistributed into L2 with L1 metric preserved in L2 LSP
Feature | IS-IS | OSPF
Layer | L2 (CLNS) | L3 (IP, proto 89)
Encapsulation | No IP, uses TLVs | IP packets
Hello Type | IIH | Hello packet
Area Model | L1/L2 | Backbone + Areas
Metric | Cost (default 10) | Cost (bandwidth)
Router ID | System ID (6B) | 32-bit Router ID
Adj. Types | L1, L2, L1/L2 | DR/BDR, P2P
LSDB | Per level (L1/L2) | Per area
Scaling | Large-scale ISP core | Enterprise/campus
Routing Info | TLVs (flexible) | Fixed LSA types
  • next-hop-self fixes unreachable next hops in iBGP: neighbor (neighbor IP) next-hop-self (overrides the next hop with the router’s own address)

Point-to-point adjacencies between BGP routers

iBGP
Between routers in the same AS, AD=200, more trusted (lower security overhead)
eBGP
Between routers in different ASes, AD=20, stricter policy enforcement
  • Unique ID for each AS; required for Internet routing with BGP
  • Private ranges:

    • 64′512–65′534 (legacy 16-bit; 65′535 is reserved)
    • 4′200′000′000–4′294′967′294 (32-bit)
  • Two routers with a BGP TCP session (port 179) are called peers or neighbors
  • Each BGP router is a BGP speaker
  • BGP exchanges routing info between ASes (loop-free, policy-based)
  • Supports CIDR, route aggregation; decisions based on policies/rules
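The private ASN ranges listed above can be checked with a short helper. The bounds used here follow RFC 6996 (64512–65534 and 4200000000–4294967294); `is_private_asn` is a hypothetical name.

```python
# Private-use ASN ranges (inclusive), per RFC 6996.
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 16-bit private range
    (4200000000, 4294967294),  # 32-bit private range
)


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls into a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)
```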

Used for route control and policy enforcement

Well-known mandatory
Always present (e.g., AS-Path, Origin, Next Hop)
Well-known discretionary
Optional but recognized by all (e.g., Local Pref, Atomic Aggregate)
Optional transitive
Passed between ASes (e.g., Community, Aggregator)
Optional non-transitive
Not passed across ASes (e.g., MED, Weight, Originator ID, Cluster ID/List)
NLRI
Routing table info: prefix, prefix length, and associated path attributes

BGP uses AS-Path (list of ASNs) to detect loops. If a router sees its own ASN in a received route, it discards it.

AS-Override
Allows reuse of same ASN across different customer sites (e.g., Swisscom); rewrites ASN to avoid loop detection
OPEN

Establishes session; includes version, ASN, Hold Time, BGP Identifier, optional params

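  • Hold Time: maximum time in seconds without a KEEPALIVE/UPDATE before the session is declared down (default 180s, Cisco); reset by each KEEPALIVE/UPDATE; 0 = keepalives disabled, session never times out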
  • BGP Identifier: 32-bit Router-ID, manually set or highest loopback/active IP; used for loop prevention
KEEPALIVE
Sent every 1/3 of Hold Time (default 180s); ensures neighbor liveness (BGP doesn’t rely on TCP keepalive)
UPDATE
Advertises new routes, withdraws old ones, or both; includes NLRI (prefix + path attributes); can act as KEEPALIVE
NOTIFICATION
Sent on session error (e.g. hold timer expired); terminates session immediately
  • Purpose: Advertise specific prefixes to BGP peers (does not activate interfaces)
  • Prefix must exist exactly in the RIB (from static, connected, or learned route)
  • Attributes (e.g., origin, next-hop, MED) depend on how the route exists in RIB
  • BGP advertises only the best path for a prefix to peers, even if multiple exist
  • BGP maintains all received paths per prefix but advertises only the best one
  • Best path is installed in RIB; recalculated on:

    • Next-hop reachability change
    • Interface failure to eBGP peer
    • Redistribution change
    • New/withdrawn path received
  • Influence:

    • Outbound BGP policy → inbound traffic behavior
    • Inbound BGP policy → outbound traffic behavior
  1. Prefer highest Weight (Cisco-specific, local to router)
  2. Prefer highest Local Preference (global within AS)
  3. Prefer routes originated by the router (only small i in path, NH 0.0.0.0)
  4. Prefer shorter AS path (only length is compared)
  5. Prefer lowest origin type: IGP < EGP < Incomplete (I on origin)
  6. Prefer lowest MED (Multi-Exit Discriminator) (also called metric)
  7. Prefer external (EBGP) over internal (IBGP)
  8. For iBGP: prefer path with lowest IGP metric to next-hop
  9. For eBGP: Prefer oldest (more stable) path
  10. Prefer lowest BGP router ID
  11. Prefer path from lowest neighbor IP address
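The decision steps above can be approximated by a single sort key, as in this rough sketch. It simplifies several rules: MED is compared across all paths (routers normally compare it only between paths from the same neighbor AS), and the IGP-metric and path-age steps are applied to every path rather than only to iBGP or eBGP paths. The `Path` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: Tuple[int, ...] = ()
    origin: str = "igp"
    med: int = 0
    is_ebgp: bool = False
    igp_metric: int = 0
    age_seconds: int = 0
    router_id: int = 0    # Router ID as integer
    neighbor_ip: int = 0  # neighbor address as integer


def preference_key(p: Path) -> tuple:
    """Smaller tuple = more preferred; one element per decision step."""
    return (
        -p.weight,                # 1. highest Weight
        -p.local_pref,            # 2. highest Local Preference
        not p.locally_originated, # 3. locally originated first
        len(p.as_path),           # 4. shortest AS path
        ORIGIN_RANK[p.origin],    # 5. lowest origin type
        p.med,                    # 6. lowest MED (simplified)
        not p.is_ebgp,            # 7. eBGP over iBGP
        p.igp_metric,             # 8. lowest IGP metric to next hop
        -p.age_seconds,           # 9. oldest path (simplified)
        p.router_id,              # 10. lowest BGP router ID
        p.neighbor_ip,            # 11. lowest neighbor IP
    )


def best_path(paths: List[Path]) -> Path:
    return min(paths, key=preference_key)
```

Tuple comparison stops at the first differing element, which mirrors how the decision process only moves to the next step on a tie.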
  • Filters control which routes are received/advertised
  • Used for security, traffic shaping, memory optimization
  • Tools: prefix-list (IP), filter-list (AS-path), route-map (flexible match/set)
  • 32-bit optional, transitive tag (e.g. ASN:value, 65000:100)
  • Used to mark routes for policy control across ASes
  • Can be added, modified, or removed at each hop
  • iBGP does not re-advertise routes between iBGP peers → full mesh required (because there is no loop prevention inside the AS)
  • Session count = n(n−1)/2 (e.g., 5 routers = 10 sessions, 10 = 45)
  • Solves iBGP full-mesh scaling by allowing selective route reflection
  • Clients only peer with RR; unaware they’re clients
  • RR Rules:

    • From non-client → advertise to clients only
    • From client → advertise to all (clients & non-clients)
    • From eBGP peer → advertise to all (clients & non-clients)
  • Only the RR needs special config - clients remain unaware of route reflection. This eliminates the need for full iBGP mesh.
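As a sketch of the two ideas above, the following computes the full-mesh session count n(n−1)/2 and encodes the reflection rules as a lookup. The names `full_mesh_sessions` and `reflect_to`, and the string labels for peer types, are assumptions made for illustration.

```python
from typing import Set


def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: n(n-1)/2."""
    return routers * (routers - 1) // 2


def reflect_to(learned_from: str) -> Set[str]:
    """Which iBGP peer types a route reflector advertises a route to."""
    if learned_from == "non-client":
        return {"client"}
    if learned_from in ("client", "ebgp"):
        return {"client", "non-client"}
    raise ValueError(f"unknown peer type: {learned_from}")
```

With a single route reflector, n routers need only n−1 sessions instead of n(n−1)/2.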
Transit
ISP provides full reachability (paid relationship)
Peering
ISPs exchange selected routes; equal relationship, usually unpaid
AS Path Filtering
To avoid becoming a transit AS, advertise only locally originated routes: ip as-path access-list 10 permit ^$
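The effect of the `^$` pattern can be illustrated with Python's `re` module: it matches only an empty AS path, i.e. routes originated inside the local AS. Router regex engines differ from Python in details, so this is only an approximation; `is_locally_originated` is a hypothetical helper.

```python
import re

# ^$ = start of string immediately followed by end of string: empty AS path.
EMPTY_AS_PATH = re.compile(r"^$")


def is_locally_originated(as_path: str) -> bool:
    """True if the AS path (space-separated ASNs) is empty."""
    return EMPTY_AS_PATH.search(as_path) is not None
```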
  • Facility where networks exchange traffic via BGP peering
  • Reduces transit costs, latency, and offloads upstream links
  • Members peer via shared switch fabric and a route server
  • Route server distributes routes but stays out of data path (NEXT_HOP unchanged)
  • Minimal policy control; one BGP session to route server
  • Simplified setup (one legal contract)
  • Direct BGP sessions between two parties (1:1)
  • May use public or private interconnects
  • Full policy control per neighbor
  • Requires one session and legal contract per peer

Examples: Equinix, SwissIX (non-profit)

Single-Homed
One ISP, one link (BGP or static); simple but no redundancy
Dual-Homed
One ISP, two links (or routers); redundancy within same provider
Multihomed
Multiple ISPs; improved redundancy and routing control, but avoid being a transit — advertise only customer-owned prefixes
Dual-Multihomed
Multihomed, but two links per ISP
Outbound TE (Local Pref)
Set higher local pref to prefer exit path; affects outbound traffic; highest wins
Inbound TE (MED)
Signal entry preference with MED; lowest wins; only works if peer honors it
Inbound TE (AS-Path Prepending)
Add own ASN multiple times on backup path; shortest AS-path wins
TE Limitation
AS controls outbound (e.g. local pref); inbound control limited — ISPs may ignore MED
TE with Aggregate
Prefer primary ISP with summarized routes; advertise specific prefixes on backup for failover
Aggregate Impact
Longest-match wins → specific prefixes may steer traffic to alternate ISP; avoid provider-owned aggregates
  • Framework to validate that a prefix is legitimately originated by a specific ASN
  • Prevents route hijacking and accidental mis-originations (route leaks)
  • RPKI validates origin only — not the AS-path
Trust Anchors (TAs)
Root CAs of the 5 RIRs; issue certs for resource holders
ROA – Route Origin Authorization
Digitally signed object that authorizes ASN to announce a prefix (contains AS, prefix that AS can originate, max prefix length)
RPKI Validators
Software that downloads, verifies, and stores Validated ROA Payloads (VRPs)
Valid
Prefix + ASN match ROA
Invalid
Prefix found, but ASN mismatch or prefix too long
Not Found / Unknown
No matching ROA
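The three validation states can be sketched with the standard `ipaddress` module. This is a simplified illustration of route origin validation, not a validator implementation; the `VRP` class and `validate_origin` function are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VRP:
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ASN may announce
    asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for an announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # Only VRPs whose prefix covers the announced route are relevant.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by some ROA but no match: wrong ASN or prefix too long.
    return "invalid" if covered else "not-found"
```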
  • Use TALs (Trust Anchor Locators) to fetch data from RIRs
  • Validate cryptographic signatures (via X.509 certs with RFC 3779)
  • Outputs VRPs; invalid objects are discarded
  • Update frequency: >=24h (recommended), ~30–60min (practice)
  • RRDP (RFC 8182) replacing rsync (uses HTTPS)

Event tracking, BGP hijack detection, Route leak detection, RPKI status check, Reachability tracking, AS path change tracking, AS path visualization

Pillars: Scalability, Speed, Availability, Security, Manageability, overall Cost

MTBF
Mean Time Between Failures, MTTR: Mean Time to Repair
  • MTBF combined: parallel:
  • Lower MTTR + higher MTBF = more availability
  • Adds reliability, decreases MTBF but increases MTTR and complexity
  • Balance: resilience vs. manageability
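The relationship between MTBF, MTTR and availability can be made concrete with the standard formulas: availability = MTBF / (MTBF + MTTR), serial components multiply, and parallel (redundant) components fail only if all of them fail. The function names below are illustrative.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial(*availabilities: float) -> float:
    """All components must work: A1 * A2 * ..."""
    return math.prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one component must work: 1 - (1 - A1)(1 - A2)..."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)
```

Two 99% components in series give about 98.01%, while the same two in parallel give about 99.99%, assuming independent failures.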
Backup Paths
Duplicate devices/links on primary path, build extra links for redundancy, consider backup link capacity, consider failover speed
Load Balancing
ECMP, EtherChannel, Port-Channel
Access
Connect end devices, high port count, port security, L2, QoS marking
Distribution
L3, policy control, HSRP/VRRP, loop protection, small fault domain
Core
High-speed backbone, L3 only, no policy, scalable/redundant, no security
Collapsed Core
Combines Core + Distribution (small/medium networks)
  • Any-to-any; small networks, MPLS/LAN setups
  • Lacks scalability and control

Modern design using Underlay/Overlay

Underlay
transport (e.g., IP, MPLS)
Overlay
logical virtual topology (e.g., EVPN)
  • 100s/1000s of users, multiple buildings, one physical location
  • Multiple interconnected LANs, connected via Ethernet and Wireless
  • Ensures default gateway is always reachable, PCs can only have one
  • Enables fast failover during router failure
VRRP – Virtual Router Redundancy Protocol (Multivendor)
Shared virtual IP, real MACs per router, Master router handles forwarding
HSRP – Hot Standby Router Protocol (Cisco)
Shared virtual IP + virtual MAC, One active, others in standby faster
GLBP – Gateway Load Balancing Protocol (Cisco)
Load balancing + redundancy; shared virtual IP + multiple virtual MACs. Roles: AVG (answers ARP and hands out the virtual MAC addresses of the AVFs), AVF (forwards traffic)
ToR - Top of Rack
switches per rack, less cabling, easy expansions/exchanges per ‘rack’, scalable glass fiber, ideal for high service density (full racks). But more switches, more ports, more L2 Srv-2-Srv traffic, more STP to be managed
EoR - End of Row
1 switch per row, less switches, higher utilization of ports, switches all at one place, better L2 availability between racks. But more cabling
North–South
Between external networks (client-server, in/out DC)
East–West
Within DC (e.g., server-server, storage)
  • Access – Aggregation – Core (like Hierarchical)
  • Optimized for North–South traffic
  • Not ideal for East–West communication
  • Two-tier: Leaf switches (access) connect to Spines (core)
  • High performance, low latency
  • Scalable, ideal for East–West traffic
Use-cases 1-to-Many
Streaming, software updates, music-on-hold
Use-cases Many-to-Many
Gaming, VR, stock data, group chat
Benefits
Efficient bandwidth, lower server/CPU load, no redundant (duplicate) traffic, supports multipoint apps
Properties
UDP-based (no delivery guarantee, congestion control, or ordering). Apps must handle drops, duplicates, out-of-order packets
Source
Sends to group IP; doesn’t need to join
Receiver
Must explicitly join group to receive traffic
224.0.0.0 – 224.0.0.255
Link-local, TTL = 1 (not forwarded by routers)
224.0.1.0 – 224.0.1.255
Reserved by IANA, routable
232.0.0.0 – 232.255.255.255
Source-Specific Multicast (SSM)
239.0.0.0 – 239.255.255.255
Administratively scoped (private multicast space)
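A short sketch that classifies an IPv4 address into the ranges listed above, using the standard `ipaddress` module. The labels and the function name `classify_multicast` are assumptions for illustration.

```python
import ipaddress

MULTICAST_RANGES = [
    (ipaddress.ip_network("224.0.0.0/24"), "link-local"),
    (ipaddress.ip_network("224.0.1.0/24"), "IANA reserved, routable"),
    (ipaddress.ip_network("232.0.0.0/8"), "SSM"),
    (ipaddress.ip_network("239.0.0.0/8"), "administratively scoped"),
]


def classify_multicast(address: str) -> str:
    """Map an IPv4 address to one of the multicast ranges above."""
    ip = ipaddress.IPv4Address(address)
    if not ip.is_multicast:
        return "not multicast"
    for network, label in MULTICAST_RANGES:
        if ip in network:
            return label
    return "other multicast"
```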

L3 routes between subnets; L2 floods within same subnet

L2 (Bridging)
MAC ffff.ffff.ffff; switches flood to all ports in VLAN
L3 (Routing)
  • 255.255.255.255: local broadcast, not routed
  • Directed broadcast (e.g. 10.1.1.255): can be routed if enabled
  1. Get IP: Example multicast IP address: 239.5.5.5
  2. Convert to Binary: 239.5.5.5 = 11101111.00000101.00000101.00000101 Take only the last 23 bits: 0000101.00000101.00000101
  3. Map to MAC Prefix: Use fixed MAC multicast prefix 0100.5E (the following bit is always 0) and append the last 23 bits: 0100.5E + 05.05.05
  4. Final MAC Address: 0100.5E05.0505
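The mapping steps above can be sketched in a few lines. Because only 23 of the 28 variable group bits are copied, 32 different group addresses share one MAC address. The function name `multicast_mac` is hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (Cisco dotted format)."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(ip) & 0x7FFFFF       # keep only the last 23 bits
    mac = 0x01005E000000 | low23     # fixed prefix 01-00-5E, next bit 0
    hex_digits = f"{mac:012X}"
    return ".".join(hex_digits[i:i + 4] for i in (0, 4, 8))
```

For example, 239.5.5.5 and 239.133.5.5 differ only in a bit that is dropped, so both map to the same MAC address.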

Purpose: Manages group membership for IPv4 multicast on each segment

IGMPv1
  • Basic join via query-response mechanism
  • No way for a host to leave a group explicitly
  • Router sends general membership queries every 60s to 224.0.0.1
  • If no report is received, router removes group after timeout
  • Receiver has no knowledge of the multicast source
IGMPv2
  • Adds Leave Group message (faster pruning of unused traffic)
  • General queries to 224.0.0.1 every 125s ‘Any hosts interested in any groups?’
  • Supports Group-Specific Queries e.g. when someone leaves group (‘anyone still interested in group xy?’), reducing broadcast overhead
  • Still source-agnostic: receivers don’t know who the source is
IGMPv3
  • Adds source filtering (Include/Exclude lists)
  • Enables Source-Specific Multicast (SSM) – receiver requests traffic only from selected source(s), no need for Rendezvous Point (RP) anymore
  • Adds support for application-level access control and filtering
  • Can also be used in ASM (Any Source Multicast), but mainly with SSM
Without snooping
multicast = broadcast on VLAN
With snooping
switch listens to IGMP messages and builds a forwarding table
Default
snooping is enabled; switch needs IGMP Query to operate
RPF Check
To avoid loops, verifies that a multicast packet arrives on the interface that a unicast packet destined for the multicast source would be forwarded out of.
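A minimal sketch of the RPF check, assuming the unicast routing table is given as a plain dictionary from prefix to outgoing interface and that the best route is chosen by longest-prefix match. `rpf_interface` and `rpf_check` are hypothetical names.

```python
import ipaddress
from typing import Dict, Optional


def rpf_interface(source: str, rib: Dict[str, str]) -> Optional[str]:
    """Interface the unicast RIB would use to reach the source (longest match)."""
    src = ipaddress.ip_address(source)
    best = None  # (prefix length, interface)
    for prefix, interface in rib.items():
        network = ipaddress.ip_network(prefix)
        if src in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, interface)
    return best[1] if best else None


def rpf_check(source: str, incoming_interface: str, rib: Dict[str, str]) -> bool:
    """Accept the packet only if it arrived on the interface toward the source."""
    return rpf_interface(source, rib) == incoming_interface
```

Packets failing the check are dropped, which prevents multicast traffic from looping back toward its source.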
  • Used in Shared Tree (*,G) setups with PIM-SM
  • RP acts as the common meeting point for sources and receivers
  • RPF check is performed toward the RP (not the source)
  • Once the source is known, routers may switch to a Source Tree (S,G)
Shared Tree (*,G)
  • IGMP host sends a membership report (IGMP Join)
  • Router adds (*,G) entry to multicast routing table
  • * means ‘any source’: source is unknown/unspecified
Source-Based Tree (S,G)
  • Built when router receives an (S,G) join/report from IGMP host
  • S = known multicast source; G = group
  • Router adds (S,G) to mroute table once source is known
  • Relies entirely on the unicast routing table (RIB) for multicast forwarding decisions
  • Protocol-independent: works with static routes, OSPF, IS-IS, etc.
Push Model
Floods multicast traffic to all interfaces; then prunes where no receivers exist.
  1. Flooding: Source sends traffic → forwarded out all multicast-enabled links using unicast RIB
  2. Distribution Tree: Initially includes entire network (source tree rooted at the source)
  3. Prune Messages: Routers without interested receivers send prunes upstream to remove themselves from the tree
  4. State Maintenance: Routers track source, receivers, interfaces to forward/prune per group
Pull Model
Multicast traffic is only sent where requested. Works with IGMP to detect interested receivers and uses unicast routing for forwarding.
  1. Join/Prune: Routers send explicit join/prune messages to request or stop receiving multicast for group (G) to other routers
  2. Forwarding: Routers only forward multicast packets for group (G) on interfaces from which explicit joins were received
  • Works with IGMPv1 or IGMPv2 → receiver does not know the source
  • Receiver sends IGMP Join (*,G) to its first-hop router
  • First-hop router forwards PIM Join (*,G) hop-by-hop toward the Rendezvous Point (RP)
  • RP acts as a common meeting point for sources and receivers
  • Sources send multicast traffic to the RP via a PIM Register tunnel
  • All routers in the multicast domain must know the RP location
  • Receiver subscribes using IGMPv3, providing both source (S) and group (G) to the first-hop router
  • No Rendezvous Point (RP) required → PIM-SSM builds only (S,G) Shortest Path Trees (SPT)
  • No shared tree (*,G) model used; SSM is source-directed
  • IANA reserved 232.0.0.0/8 for SSM in IPv4
  • Join messages are forwarded hop-by-hop toward the source to establish forwarding path
  • Uses unicast routing table (RPF) to maintain loop-free delivery
PIM Sparse Mode
Pull model – multicast traffic is forwarded only on request
PIM Dense Mode
Push model – traffic is flooded everywhere, then pruned
Sparse-Dense Mode
Supports both modes per multicast group; choice depends on RP availability

PIM is protocol-independent: it relies on the unicast routing table for RPF checks and to forward joins toward the source or Rendezvous Point (RP).

  • PIM Dense Mode (DM)

    • “Push” model
    • Floods multicast traffic throughout the network
    • Prunes back where traffic unwanted
  • PIM Sparse Mode (SM)

    • “Pull” model
    • Traffic sent only on request
    • Requires explicit Join messages

Usage Recommendation:

Dense Mode
Best for small or tightly scoped networks where most devices need multicast
Sparse Mode
Preferred for large-scale or distributed environments where multicast receivers are few or spread out

Issues of L2: STP, Max amount of VLANs (4094), Large MAC Address tables

VXLAN (Virtual Extensible LAN)
Tunnels Ethernet (Layer 2) over IP using MAC-in-UDP encapsulation (Port 4789). For flexible and scalable network segmentation.
VNID (VXLAN Network Identifier)
24-bit identifier (up to 16 million segments) that defines the VXLAN broadcast domain.
VTEP (Virtual Tunnel Endpoint)
Device (switch, router, or host) responsible for encapsulating/de-encapsulating VXLAN traffic.
NVE (Network Virtualization Edge)
Logical interface on a VTEP used for VXLAN tunnel operations.
  • VXLAN establishes IP tunnels between VTEPs to extend Layer 2 networks across Layer 3 boundaries.
  • VXLAN enables both L2 and L3 VPN functionality in overlay networks.
  • VXLAN traffic is encapsulated in UDP (default port: 4789).
  • Encapsulation (inner to outer): Ethernet frame → VXLAN Header → UDP → Outer IP Header.
  • The VXLAN header contains the 24-bit VNID and flags.
  • Outer headers allow Layer 2 frames to traverse IP underlay networks.
  • 24-bit VXLAN Network Identifier uniquely defines VXLAN segments.
  • Replaces traditional VLAN IDs (12-bit), enabling ~16 million logical segments.
  • Used by VTEPs to map traffic into corresponding Layer 2 domains.
  • Connects the overlay (VXLAN) and underlay (IP) networks.
  • Types:

    • Software VTEP: Located on hypervisors using virtual switches.
    • Hardware VTEP: Located on routers/switches with ASICs for performance.
  • Interfaces:

    • VTEP IP Interface: Connects to the underlay network and handles encapsulation.
    • VNI Interface: Virtual interface per segment (like SVI); handles segregation of Layer 2 domains.
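To make the VXLAN header described above concrete, here is a sketch that packs and parses the 8-byte header layout from RFC 7348: 8 flag bits (0x08 = VNI valid), 24 reserved bits, the 24-bit VNI, and 8 reserved bits. The function names are hypothetical, and the outer UDP/IP headers are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08
VXLAN_UDP_PORT = 4789


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # word 1: flags (8 bits) + reserved (24 bits)
    # word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the VNI-valid flag."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word2 >> 8
```

A VTEP would prepend this header to the original Ethernet frame and send the result as the payload of a UDP datagram to port 4789 of the remote VTEP.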
On control plane
happens proactively
On data plane
ad-hoc with flooding
  • Each VTEP maintains a VXLAN mapping table linking destination MAC addresses to remote VTEP IPs.
  • Learning via ARP:

    • Host H1 sends ARP request, switches learn H1’s MAC.
    • ARP request is flooded to H2.
    • H2 responds; switches learn H2’s MAC.
  • Learning Methods:

    • Static VXLAN: Manual MAC-to-VTEP mappings. Doesn’t scale well; BUM traffic is inefficient.
    • Multicast VXLAN: VTEPs join multicast groups per VNI. Scales better, offloads BUM replication. 20+ VTEPs = there is too much traffic, doesn’t scale well
    • MP-BGP EVPN: Modern solution using BGP as control plane. Dynamically learns MAC/IP info.

Overcome flood-and-learn limitations, doesn’t rely on data plane learning, utilizes robust control plane MP-BGP, works with different encapsulation techniques (VXLAN, MPLS), excellent scalability, l2 and l3 Support.

  • Enables protocol-based VTEP discovery and host reachability via control-plane learning
  • Reduces flooding by replacing data-plane learning
  • Extends BGP with multiprotocol capabilities (AFI/SAFI)
  • Uses MP_REACH_NLRI and MP_UNREACH_NLRI for route advertisement and withdrawal
Type 2 – Host Advertisement
Advertises host MAC (mandatory), optionally IP, along with L2VNI and optionally L3VNI. Used for MAC learning, ARP suppression, and host mobility. Sent when host connects to VTEP.
Type 5 – Subnet Advertisement
Advertises IP prefix + prefix length with L3VNI. Used for inter-subnet routing. VTEP redistributes connected/static/dynamic IP routes. Additional attributes: L3VNI, extended communities.
Host Deletion
When a host detaches, its ARP (default: 1500s) and MAC entry (default: 1800s) time out on the VTEP. Upon aging, the VTEP withdraws the host’s MAC/L2VNI and IP/L3VNI advertisements.
Host Move
When a host moves to a new VTEP, the new VTEP advertises updated reachability with a higher move sequence number. The old VTEP withdraws its entry, completing the migration.
  • Route Distinguisher (RD): Uniquely identifies VPN routes; allows the same IP prefix to be used in different VPNs. Can be based on an IPv4 address or an ASN. Used to make routes unique in BGP (VPNv4/v6). Forms VPNv4 NLRI: RD:IPv4 prefix
  • Route Target (RT): Controls route import/export between VRFs. Used as extended BGP community.
  • How RTs Work:

    • A route is tagged with an RT when advertised by BGP.
    • Other VRFs import the route if the RT matches their import policy.
    • Allows overlapping or shared connectivity between tenants (e.g., shared services).
  • Format: Typically in the form ASN:nn or IP:nn, e.g., 65000:100, 1:10
  • Multiple RTs can be used: A route can have multiple RTs for flexible policies (e.g., one RT for VPN, another for shared services)
  • L2 bridging across L3 networks
BGP Control Plane
Distributes MAC info (no flooding)
VXLAN Overlay
Encapsulates L2 in L3 UDP (data plane)
Multi-Tenancy
via VNI segmentation
Redundancy
All-active multihoming, ECMP, fast convergence
  • Multi-tenant datacenter interconnects (DCI)
  • Extending L2 over WAN between remote sites
  • Scalable, segmented L2 fabrics
  • PEs learn MACs from local CEs (data plane)
  • MACs advertised via BGP (control plane)
  • Uses Route Distinguishers and MPLS labels
  • Remote PEs update L2 RIB/FIB with MAC and next-hop info
  • Enables seamless L2 across IP/MPLS backbone
  • EVPN uses MP-BGP with specific AFI/SAFI
  • Supports multiple route types and attributes
  • Unsupported routes are dropped by BGP
  • Route Reflector (RR) avoids full-mesh iBGP
  • RR reflects EVPN routes to other PEs
  • RR doesn’t participate in EVPN or pseudowires
  • RR needs only address-family l2vpn evpn
  • L2VPN RIB stores endpoint/VFI info for control plane
  • BGP_UPDATE from spines contain ORIGINATOR_ID (origin leaf)
  • Host connects to VTEP → MAC learned locally
  • VTEP advertises MAC + L2VNI via BGP EVPN
  • MAC learning follows normal Ethernet semantics
  • Ingress replication handles BUM (multi-destination) traffic when no multicast underlay is used (e.g. ARP is replicated as unicast)
  • Avoids flooding ARP requests
  • VTEP queries control plane for MAC/IP/VNI mapping
  • If known → direct unicast (no broadcast)
  • If IP/MAC unknown → ARP sent via ingress replication
  • Replicated ARP request goes to remote VTEPs
  • Only correct host responds → update reflected to all VTEPs
  • Future traffic uses updated BGP mapping
  • Multiple isolated routing tables on one device
  • Each tenant = one VRF → traffic isolation
  • Supports independent policies per tenant
  • Key for scaling and multi-customer separation
  • Enables inter-VLAN routing inside EVPN
  • Avoids central gateway → no ‘traffic tromboning’
  • Two modes: Symmetric and Asymmetric
  • Routing/bridging on ingress + egress VTEPs
  • Uses L3 Transit VNI (same in both directions), One L3 VNI per VRF (Tenant)
  • Scales well; clean separation of MAC and IP
  • Routing only on ingress, bridging on egress
  • VXLAN uses destination VNI in both directions
  • One L2 VNI per VLAN/Subnet
  • Simple config, but requires all VLANs/VNIs on all VTEPs
  • Same gateway IP+MAC on all VTEPs
  • Enables local default gateway for hosts
  • Supports mobility + optimal forwarding
  • Host sends ARP/ND to local VTEP
  • VTEP learns MAC/L2VNI and IP/L3VNI
  • Info is advertised in EVPN (control plane)
  • Label Switched Path (LSP): pre-determined path across the MPLS network
  • Advantage of eBGP between PE and CE: no mutual redistribution, same routing process
  • Encrypt traffic flowing over MPLS L3VPN backbone? Yes (e.g. bank)
  • Unicast Reverse Path Forwarding (uRPF): checks the source of each packet and verifies that the source is in the routing table
  • control plane (e.g. OSPF) to learn labels
  • iBGP used to exchange NLRI (RD, RT, IPv4 Prefix, NextHop & VPN Label) between PEs
  • imp-null = networks are directly connected, no more label switching

Connects remote LANs via SPs for data/voice/video; key needs: bandwidth, control, design, resilience, mgmt.

Requirements:

Bandwidth
App needs, peak usage, reserve for VoIP
Control/Security
Trust provider? No full control
Availability
Redundancy, SLA for failures
Mgmt
Inband vs out-of-band
Point-to-Point
Leased L2 line (Ethernet); monthly fee; private circuit
Dark Fiber
Physical fiber lease; costly; ISPs prefer selling lambdas
Connection-oriented
Predefined path, packets carry IDs (ATM, Frame Relay)
Connectionless
No setup; full address in each packet (Ethernet, MPLS VPN)
CE - Customer Edge
no knowledge of MPLS, no labels; connected to PE
PE - Provider Edge
connected to CE; runs iBGP and LDP; uses VRFs
P - Provider or LSR (Label Switch Router)
inside MPLS VPN, no CE connection; forwards labels
RIB (Routing IB (Information Base))
Learned prefixes from routing protocols
FIB (Forwarding IB)
Built from RIB; only best routes for forwarding
LIB (Label IB)
All label mappings; 1 label per prefix
LFIB (Label Forwarding IB)
Built from LIB; used for actual forwarding decisions. (L)FIB only contains currently best LSP (decision: Routing Protocol)
Control Plane
Builds routing/label tables (RIB, LIB)
Data Plane
Forwards packets (FIB, LFIB); pushes/swaps/pops labels (see below)

4-byte header before IP:

  • Label (20b) – actual MPLS label
  • EXP (3b) – QoS/CoS, Now called Traffic Class (TC)
  • S-bit (1b) – bottom of label stack indicator, 1 = True = last label before IP header
  • TTL (8b) – time-to-live (equal to IP TTL)
  • Ingress PE router decrements the IP TTL field and copies the packet’s IP TTL into the new MPLS TTL
  • P routers decrement the MPLS TTL
  • Egress PE router decrements the MPLS TTL, pops the final MPLS header and copies the MPLS TTL into the IP TTL
  • Traceroute receives ICMP Time Exceeded from routers inside the core, but the provider does not want to expose the MPLS network. Fix: disable MPLS TTL propagation (on PE); ingress PE sets MPLS TTL = 255, egress PE leaves the original IP TTL unchanged
  • Result: the MPLS network appears as a single router hop from a TTL perspective
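As a rough sketch of the 4-byte header layout and the TTL handling listed above: `pack_mpls`, `unpack_mpls` and `ttl_through_core` are hypothetical helper names, and the TTL function follows the simplified hop model of these notes (no PHP effects considered).

```python
import struct

def pack_mpls(label, tc, s, ttl):
    """Encode one MPLS header: Label(20) | TC/EXP(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # network byte order, 32 bits

def unpack_mpls(data):
    """Decode the first 4 bytes of `data` into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

def ttl_through_core(ip_ttl, p_hops, propagate=True):
    """IP TTL after the packet has crossed ingress PE, P routers and egress PE."""
    ip_ttl -= 1                               # ingress PE decrements IP TTL
    mpls_ttl = ip_ttl if propagate else 255   # copy IP TTL, or hide the core
    mpls_ttl -= p_hops                        # every P router decrements MPLS TTL
    mpls_ttl -= 1                             # egress PE decrements MPLS TTL
    # With propagation the MPLS TTL is copied back; without it the IP TTL is kept.
    return mpls_ttl if propagate else ip_ttl

header = pack_mpls(label=16, tc=0, s=1, ttl=255)
fields = unpack_mpls(header)
```

With three P routers, `ttl_through_core(64, 3)` yields 59 (five visible hops), while `ttl_through_core(64, 3, propagate=False)` yields 63, so the core looks like a single hop.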
  • Distributes labels to neighbors using control plane
  • Hello messages: Sent via UDP (Port 646) to 224.0.0.2 to discover neighbors
  • TCP (Port 646) connection is used to exchange label bindings (prefix to local label)
  • Routers advertise all local bindings after TCP session is up
  • Label mappings are used to build LIB → LFIB
  • LDP router ID must be reachable (via routing table)
  • Each router manages local labels independently
  • VPN traffic uses 2 labels (stacked):
Outer label
Transport label (LDP); identifies LSP between ingress/egress PE
Inner label
VPN label (MP-BGP); identifies customer VRF
Push
Ingress PE; classify and label packets
Swap
P router; replaces label, forwards based on new label
Pop
Egress PE removes label; sends original packet to CE
Penultimate Hop Popping
MPLS feature: the penultimate router removes the outer MPLS label before forwarding to the egress PE; enabled by default, ISPs disable it
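The push / swap / pop operations on a two-label VPN stack can be illustrated with a plain list, where index 0 is the top (outer) label. The label values are made up for the example, and the last steps assume Penultimate Hop Popping is enabled.

```python
def push(stack, label):
    """Ingress PE: put a new label on top of the stack."""
    return [label] + stack

def swap(stack, new_label):
    """P router: replace the top label, leave inner labels untouched."""
    return [new_label] + stack[1:]

def pop(stack):
    """Remove the top label."""
    return stack[1:]

# Hypothetical labels: 500 = VPN label (MP-BGP), 100/200 = transport labels (LDP).
packet = []
packet = push(packet, 500)       # inner label: identifies the customer VRF
packet = push(packet, 100)       # outer label: LSP towards the egress PE
packet = swap(packet, 200)       # P router swaps the transport label
packet_php = pop(packet)         # penultimate hop pops the outer label (PHP)
after_egress = pop(packet_php)   # egress PE pops the VPN label, forwards plain IP
```

With PHP the egress PE only has to look up the VPN label, which saves it one label lookup.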
  • VRF = Virtual Routing and Forwarding table (Virtual router inside a PE. Maintains isolated RIB + FIB per customer.)
  • Stores separate routing info per customer (VPN isolation)
  • Exists per MPLS-aware PE router; one per attached customer
  • Contains: RIB, FIB, and separate routing process per CE
  • 64-bit RD + 32-bit IPv4 = 96-bit VPNv4 prefix
  • VPNv4 prefixes are transferred between PE routers via Multiprotocol iBGP (MP-iBGP)
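To make the 64 + 32 = 96 bit arithmetic concrete, here is a small sketch that builds a VPNv4 prefix from a type-0 RD (2-byte type, 2-byte ASN, 4-byte assigned number) and an IPv4 address. The helper names and the RD value 65000:10 are assumptions for illustration.

```python
import ipaddress
import struct

def rd_type0(asn, number):
    """64-bit Route Distinguisher, type 0: type(2 B) | ASN(2 B) | number(4 B)."""
    return struct.pack("!HHI", 0, asn, number)

def vpnv4(rd, ipv4):
    """Prepend the RD to the 32-bit IPv4 address to make it globally unique."""
    return rd + ipaddress.IPv4Address(ipv4).packed

# Two customers using the same IPv4 prefix stay distinct thanks to different RDs.
prefix_a = vpnv4(rd_type0(65000, 10), "10.10.0.100")
prefix_b = vpnv4(rd_type0(65000, 20), "10.10.0.100")
bits = len(prefix_a) * 8
```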
MPLS
Label-based forwarding (fast, scalable)
LDP
Distributes labels for MPLS paths
IGP
Underlay routing (e.g., OSPF, IS-IS)
MP-BGP
Extends BGP to carry VPNv4/v6, EVPN routes
Control Plane
LDP/RSVP-TE adds complexity
Scalability
Per-flow/path state limits growth; LSP and signaling overhead increase rapidly
OAM
  • Troubleshooting: Traceroute less useful in MPLS; labels hide topology
  • Traffic Eng.: LDP lacks TE; relies only on IGP cost
Fast Reroute
Limited coverage; microloops possible
Source routing
Sender defines full path using Segment List (Segment = Instruction)
SID = Segment Identifier
Each SID = 1 instruction (e.g., forward via ECMP, specific iface, or to a service)
State in packet
No per-flow state in network; intermediate nodes follow SID instructions
No new control plane
Uses existing protocols (OSPF, IS-IS, BGP) with extensions; no LDP or RSVP-TE needed
Segment List
Ordered SID list carried in packet header; defines full route
Simple but powerful
Enables TE, fast reroute, policy routing
Push
Insert SIDs into packet; set active SID (top of list)
Continue
Active SID not yet completed; keep processing it
Next
Current SID completed; activate next SID in list
Global Segments
Known and supported by all SR nodes in the domain, Installed in forwarding tables across the network (e.g. “Forward packet according to shortest path to Node1”)
Local Segments
Defined and installed only on originating node, Not forwarded by others, but must be understood network-wide (e.g. “Forward packet on interface to Node2”)
Global segments
Defined in the SR Global Block (SRGB) and should be consistent across all nodes; local segments are defined in the SR Local Block (SRLB) and are specific to the local SR node.
IGP Prefix Segment
Global SID tied to IGP prefix (multi-hop); all nodes install forwarding entries
IGP Node Segment
Global SID for a specific node (shortest-path forwarding)
IGP Anycast Segment
Global SID for a group of nodes; traffic sent to nearest
IGP Adjacency Segment
Local SID; direct link to neighbor
L2 Adjacency SID
Local SID for Layer-2 segment (e.g., Ethernet link)
Combining Segments
  • End-to-end paths can mix IGP and BGP segments
  • Traffic to BGP Anycast → more ECMP in data centers
  • Reuses existing MPLS data plane — no hardware change needed
  • Segments = MPLS labels; Segment List = label stack (top = active)
  • Segments distributed via IGP/BGP; no LDP required (interoperable if needed)
  • Supports both IPv4 and IPv6 networks
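The Push / Continue / Next operations on a segment list (label stack, top = active SID) can be sketched like this. The SID values, node names and the visited node sequence are all hypothetical; a real network would derive the hops from the IGP shortest path towards the active SID.

```python
# Hypothetical node SIDs (global segments) and the node each one identifies.
SID_TO_NODE = {16001: "R1", 16004: "R4", 16007: "R7"}

def sr_step(stack, here):
    """Apply CONTINUE or NEXT at node `here`; return (operation, new_stack)."""
    if not stack:
        return ("deliver", stack)        # no segments left: normal forwarding
    if SID_TO_NODE[stack[0]] == here:
        return ("NEXT", stack[1:])       # active segment completed, activate next
    return ("CONTINUE", stack)           # keep forwarding towards the active SID

# PUSH at the source: go via R4 first, then to R7.
stack = [16004, 16007]
trace = []
for node in ["R2", "R4", "R5", "R7"]:    # assumed path, not computed here
    op, stack = sr_step(stack, node)
    trace.append((node, op))
```

Only the source holds the path state; the transit nodes R2 and R5 just follow the active SID.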

Benefits: Simplification (removes protocols, simple operations, admin and mgmt), enhanced Traffic Eng. (Delay, Bandwidth, Packet Loss, TE metric, Controller, Source-Node), Seamless deployment, Robust, Network Innovation (e.g. Container Networking)

Source Routing
Balances distributed intelligence with centralized optimization
TI-LFA
Fast reroute technique; protects against link/node failure with microloop avoidance and no pre-calculation dependency
Traffic Engineering (TE)
Optimizes network performance by analyzing and controlling data flow to reduce congestion and improve QoS
Service Function Chaining (SFC)
Chains SDN services in order; automates traffic between VNFs and optimizes routing for performance

Internet is best effort: no guarantees, no QoS; all traffic treated equally (net neutrality); simple, scalable, but no delivery/order assurance or prioritization

QoE – Quality of Experience
Perceived service quality from user perspective
Route Pinning
Keeps flow on a fixed path to prevent oscillation (don’t switch immediately to “better” path)
Latency / Delay [ms]
Time for packets to travel src → dest (VoIP < 150 ms)
End-to-End Delay
Total time sender to receiver
One-Way Delay
From first bit sent to last bit received
Delay Components
Transmission delay (time to push onto link), Processing delay (lookup, queuing), Propagation delay (physical travel time)
Jitter [ms]
Variation in delay between packets, caused by re-routing/queuing (VoIP < 30 ms), Calc: delay with queuing minus delay without queuing
Throughput
Rate of successfully delivered data
Packet Loss [%]
Dropped packets due to congestion or errors (VoIP < 1%)
Bandwidth [Gbit/s]
Maximum transfer capacity of a link
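A few of the metrics above as small calculations. This is a sketch only: the function names are made up, the propagation speed of 200,000 km/s (roughly light in fiber) is an assumption, and jitter is computed here as the mean absolute difference between consecutive packet delays, which is one common simplification.

```python
def transmission_delay_ms(size_bytes, bandwidth_bps):
    """Time to push all bits of the packet onto the link."""
    return size_bytes * 8 / bandwidth_bps * 1000

def propagation_delay_ms(distance_km, speed_km_s=200_000):
    """Physical travel time; the default speed is an assumed value for fiber."""
    return distance_km / speed_km_s * 1000

def mean_jitter_ms(delays_ms):
    """Mean absolute variation between consecutive packet delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)

tx = transmission_delay_ms(1500, 100_000_000)   # 1500-byte packet on 100 Mbit/s
prop = propagation_delay_ms(1000)               # 1000 km of fiber
jitter = mean_jitter_ms([20, 25, 22, 30])       # measured one-way delays in ms
voip_ok = jitter < 30                           # VoIP jitter target from above
```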
FIFO (First-In First-Out)
Basic, no prioritization
Priority Queuing (PQ)
Multiple queues, serve highest first; others may starve
Round-Robin
One packet per queue in turn (fair, but ignores priority)
Weighted Fair Queuing (WFQ)
Round-Robin with weights, e.g., 2 packets from Q1, 4 from Q2
Class-Based WFQ (CBWFQ)
WFQ with user-defined classes, queue limits, max bandwidth guaranteed or max % of bandwidth (logical queues based on IP Precedence only)
Low Latency Queuing (LLQ)
Adds strict priority queue (priority class) to CBWFQ for delay-sensitive traffic (e.g. voice) (based on IP Precedence, DSCP, src, port, protocol…)
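The scheduler descriptions above can be approximated with a toy model: a strict priority queue that is always emptied first (as in LLQ), followed by weighted round-robin over the remaining queues using packet-count weights, as in the simplified WFQ description of these notes. `llq_schedule` and the queue contents are hypothetical, and no packets arrive while the queues drain.

```python
from collections import deque

def llq_schedule(priority_q, weighted_qs, weights):
    """Return the send order: priority queue first, then weighted round-robin."""
    order = []
    while priority_q or any(weighted_qs):
        while priority_q:                       # strict priority: drain completely
            order.append(priority_q.popleft())
        for q, w in zip(weighted_qs, weights):  # one weighted round over the classes
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order

voice = deque(["v1", "v2"])
q1 = deque(["a1", "a2", "a3"])
q2 = deque(["b1", "b2", "b3", "b4", "b5"])
order = llq_schedule(voice, [q1, q2], weights=[1, 2])
```

Dropping the priority queue gives plain weighted round-robin; setting all weights to 1 gives Round-Robin.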
Tail Drop
Drops packets when the queue is full; huge interruption of traffic → same as no connectivity
TCP Global Sync
Many TCP flows back off and restart simultaneously → link underutilization
TCP Starvation
TCP slows down after drops, UDP does not → queues filled with UDP, TCP squeezed out
RED
Random early drops before the queue is full to prevent global sync and TCP collapse. Dropped TCP segments cause TCP sessions to reduce their window sizes
WRED
RED + DSCP/EXP-based drop logic, prioritizes higher-marked traffic
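A minimal sketch of the classic RED drop curve: no drops below a minimum threshold, a linearly rising drop probability up to `max_p`, and tail drop above the maximum threshold. WRED is modelled as one such profile per marking. The thresholds and the profile table are invented values, not defaults of any platform.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue depth."""
    if avg_queue < min_th:
        return 0.0                      # queue short enough: never drop
    if avg_queue >= max_th:
        return 1.0                      # behaves like tail drop
    return max_p * (avg_queue - min_th) / (max_th - min_th)

# WRED: per-marking profiles (min_th, max_th, max_p); values are hypothetical.
WRED_PROFILES = {"AF41": (30, 40, 0.1), "BE": (10, 40, 0.1)}

def wred_drop_probability(avg_queue, marking):
    return red_drop_probability(avg_queue, *WRED_PROFILES[marking])

p_red = red_drop_probability(30, min_th=20, max_th=40, max_p=0.1)
p_af41 = wred_drop_probability(25, "AF41")   # higher-marked traffic: not yet dropped
p_be = wred_drop_probability(25, "BE")       # best effort: already dropped randomly
```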
DSCP / EXP
DSCP (6-bit in IP header) marks packets for QoS; used in DiffServ for classifying traffic. EXP (3-bit in MPLS label) serves same purpose within MPLS networks; often mapped from DSCP.
Policing (Inbound mostly)
Drops packets that exceed configured rate limits
Shaping (Outbound)
Buffers packets to smooth traffic bursts and conform to profile
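Policing is commonly explained with a token bucket, sketched below under that assumption: tokens refill at the configured rate up to a burst size, and a packet is forwarded only if enough tokens are available. The function name and the numbers are illustrative. A shaper would use the same bucket but queue the non-conforming packet until enough tokens have accumulated instead of dropping it.

```python
def police(arrivals, rate_bytes_s, burst_bytes):
    """arrivals: list of (time_s, size_bytes); returns 'forward' or 'drop' per packet."""
    tokens, last = float(burst_bytes), 0.0
    result = []
    for t, size in arrivals:
        # Refill tokens for the elapsed time, capped at the burst size.
        tokens = min(burst_bytes, tokens + (t - last) * rate_bytes_s)
        last = t
        if size <= tokens:
            tokens -= size
            result.append("forward")
        else:
            result.append("drop")       # exceeds the configured rate
    return result

decisions = police([(0.0, 1000), (0.1, 1000), (1.0, 1000)],
                   rate_bytes_s=1000, burst_bytes=1500)
```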
Best Effort
No guarantees, all traffic treated equally (follows Internet neutrality)
Integrated Services (IntServ)
End-to-end QoS, per-flow resource reservation, precise but not scalable (uses RSVP)
Differentiated Services (DiffServ)
Class-based, scalable approach using marking (e.g., DSCP), no hard guarantees
L3 Marking
ToS byte → DSCP (6 bits) / IP Precedence (3 bits)
L2 Marking
Dot1q header → 802.1p CoS bits
Class Map
Define traffic classes (e.g., match voice or video)
Policy Map
Define actions for each class (e.g., limit, shape, priority)
Service Policy
Apply policies to interfaces or directions (in/out)
Origin Server
Central content source (original files), usually in a datacenter
Edge / CDN Server (POP - Point of Presence)
Geographically distributed, caches content
DNS Infrastructure
Directs users to optimal edge server (e.g. via Geo-Routing)
Latency Reduction
Nearby edge servers reduce round-trip time
Availability
Failover and redundancy in case of node failure
Scalability
Handles traffic spikes via load balancing
Cost Optimization
Reduces backend and transit load on origin
DDoS Protection
Edge servers absorb attacks; not all traffic hits one server
Global Load Reduction
Less long-distance traffic across the Internet
  • Decides which edge server should serve a client request
  • Goal: Best performance (e.g. proximity, load, responsiveness)
  • Each edge has a unique IP
  • DNS server picks closest/optimal edge server based on:

    • Resolver IP location (not user!)
    • GeoIP DBs (MaxMind, IP2Location), load, latency, business rules
  • Limitation: DNS resolver location != user location → can cause wrong choice
  • Resolver includes part of client IP in DNS request (e.g. /24 subnet)
  • Authoritative DNS makes better decision based on actual client region
  • Improves accuracy without revealing full IP
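The client-subnet idea can be sketched with the standard `ipaddress` module: the resolver forwards only a truncated prefix of the client address, and the authoritative side maps that prefix to a region. The geo table, region names and edge IPs below are invented (documentation address ranges), and `pick_edge` is a hypothetical helper.

```python
import ipaddress

def ecs_subnet(client_ip, prefix_len=24):
    """Truncate the client address to the prefix that is sent upstream."""
    return ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)

# Hypothetical GeoIP data and edge servers.
GEO_DB = {ipaddress.ip_network("203.0.113.0/24"): "eu"}
EDGES = {"eu": "198.51.100.10", "default": "192.0.2.10"}

def pick_edge(client_ip):
    """Choose an edge server from the truncated client subnet only."""
    subnet = ecs_subnet(client_ip)
    for net, region in GEO_DB.items():
        if subnet.subnet_of(net):
            return EDGES[region]
    return EDGES["default"]

subnet = ecs_subnet("203.0.113.57")
edge_known = pick_edge("203.0.113.57")
edge_unknown = pick_edge("8.8.8.8")
```

The last octet of the client address never leaves the resolver, which is the privacy trade-off mentioned above.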
  • Same IP (e.g. 7.7.7.7) advertised from multiple locations
  • BGP routing decides which path is “best” (AS-path, local pref, etc.)
  • No DNS logic or per-client decision — pure BGP convergence
Pros
Fast failover, simple, no app logic needed
Cons
Less control, BGP != best latency, route flapping risk
  • Caching is controlled via HTTP headers between clients, proxies, and servers
Cache-Control
Main directive (no-cache, no-store, max-age, must-revalidate, etc.)
Expires
Absolute expiration time (older method, replaced by Cache-Control)
ETag
Validator tag (version/hash), used with If-None-Match
Last-Modified
Timestamp used with If-Modified-Since for revalidation
Age
Time (in seconds) since response was fetched from origin
Validation
Client uses ETag or Last-Modified; server returns 304 if unchanged
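A simplified model of the freshness check and ETag revalidation described above. It covers only a handful of directives, and `is_fresh` and `revalidate` are hypothetical helpers rather than part of any HTTP library.

```python
def is_fresh(age_s, cache_control):
    """Can a cached response be reused without contacting the server?"""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return False                       # must not be stored or must be revalidated
    for d in directives:
        if d.startswith("max-age="):
            return age_s < int(d.split("=", 1)[1])
    return False                           # no lifetime given: revalidate

def revalidate(request_headers, current_etag, body):
    """Server side: answer 304 if the client's validator still matches."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag, "Cache-Control": "max-age=60"}, body

fresh = is_fresh(30, "public, max-age=60")
stale = is_fresh(90, "max-age=60")
status_same, _, _ = revalidate({"If-None-Match": '"v1"'}, '"v1"', b"content")
status_changed, _, payload = revalidate({"If-None-Match": '"v1"'}, '"v2"', b"content")
```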
AD
Inter-protocol choice (e.g., OSPF vs RIP); lower wins.
Cost/Metric
Intra-protocol choice (e.g., OSPF path A vs B); lower wins.
Routing Preference Order (across protocols)
  • Most specific prefix
  • Lowest Administrative Distance
  • Static default route
Administrative Distances
(Smallest Administrative Distance wins)
Protocol Distance
Connected 0
Static (Interface) 1
Static (Next Hop) 1
BGP External 20
EIGRP Internal 90
OSPF 110
ISIS 115
RIP v1/v2 120
EIGRP External 170
BGP Internal 200
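The selection order (most specific prefix, then lowest administrative distance, then lowest metric) can be sketched as a sort key. The distances are taken from the list above; the route entries, protocol keys and the `best_route` helper are made up for the example.

```python
import ipaddress

# Administrative distances from the list above.
AD = {"connected": 0, "static": 1, "ebgp": 20, "eigrp": 90, "ospf": 110,
      "isis": 115, "rip": 120, "eigrp-ext": 170, "ibgp": 200}

def best_route(routes, dest):
    """Pick the route used to forward a packet to `dest`."""
    addr = ipaddress.ip_address(dest)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    # Longest prefix first, then lowest AD, then lowest metric.
    return min(candidates, key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen,
                                          AD[r["proto"]], r["metric"]))

# Hypothetical routing information.
routes = [
    {"prefix": "0.0.0.0/0", "proto": "static", "metric": 0},
    {"prefix": "10.0.0.0/8", "proto": "static", "metric": 0},
    {"prefix": "10.1.0.0/16", "proto": "rip", "metric": 2},
    {"prefix": "10.1.0.0/16", "proto": "ospf", "metric": 20},
]
r1 = best_route(routes, "10.1.2.3")   # /16 beats /8; OSPF (110) beats RIP (120)
r2 = best_route(routes, "10.9.9.9")   # only /8 and the default match
r3 = best_route(routes, "8.8.8.8")    # falls back to the static default route
```

Note that the RIP route loses despite its lower metric: metrics are only compared within one protocol.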

BGP Routing Table

* = Would not be there if it was an L2 VNI only

Route Distinguisher
172.16.255.101:32777
Route Type
2
MAC Address Length
48
MAC Address
5254.00f8.29a8
*IP Address Length
32
*IP Address
10.10.0.100
L2 VNI
30010
*L3 VNI
50000
Remote VTEP IP Address
172.16.254.101
L2 Route Target
1:10
*L3 Route Target
65000:50000
leaf-03# show bgp l2vpn evpn 10.10.0.100
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.255.101:32777
BGP routing table entry for
[2]:[0]:[0]:[48]:[5254.00f8.29a8]:[32]:[10.10.0.100]/272, version 19897
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop
Imported to 2 destination(s)
AS-Path: NONE, path sourced internal to AS
172.16.254.101 (metric 81) from 172.16.255.1 (172.16.255.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 30010 50000
Extcommunity: RT:1:10 RT:65000:50000 ENCAP:8 Router MAC:5254.00ca.69ae
Originator: 172.16.255.101 Cluster list: 172.16.255.1
3-tier campus network
Default Gateway (D), QoS marking (A), STP Root Port (A), HSRP, VRRP or GLBP (D), “Simple” (C), OSPF Totally Stub Area (D), High availability (C)
Campus Design
used to reduce size of L2 domain: EVPN, MPLS
MP_REACH_NLRI
Next hop, MAC Address
  • VXLAN is a data plane technology which encapsulates Ethernet frames in UDP datagrams to tunnel layer 2 frames over a layer 3 network.
  • The underlay network is unaware of VXLAN; devices that connect to the physical switches are also unaware of VXLAN.
  • A route distinguisher is used to uniquely identify a route in combination with the destination prefix.