The Genetic Lottery

I finished this book today. To be honest, I struggled a bit with it, from the scientific side to the social one.

DNA is our instruction set, but we also depend on our environment to develop those instructions. That DNA is a random mix from our parents, so we may look like them but we are not them. Each person is unique.

It is difficult to accept that whatever I have achieved is based on genes, environment and luck. Many times, looking in the mirror, I tell myself that I am the luckiest person in the world, so I have come to terms with that point. That doesn’t mean we are pre-programmed, that there is nothing to do or that there is no “free” will. This is a tough philosophical topic and, again, I had a hard time reading about it in the book, but the author says there is elbow room between our genes/environment and what we become or do with them. So it was a bit of a relief, as your ego is not totally destroyed.

“People vary in ability, energy, health, character, and other socially important traits, and there is good, although not absolutely conclusive, evidence that the variance of all these traits is in part genetically conditioned. Conditioned, mind you, not fixed or predestined”

Theodosius Dobzhansky

In the book there is a strong emphasis on the fact that the genetic lottery defines much of the inequality in the society we live in. We live in a system where educational success, work success, etc. follow one standard. The book wants to change the idea that one system fits all: we are all different, so we need different ways to evolve, learn, etc. To achieve a fairer society we need to provide different education methods to children who can’t learn/develop in the “standard” way. For that, we need a better understanding of our DNA, and obviously that is a bit scary because it can be misused by companies, governments, etc. It means a change of mentality in social policy: being more “socialist” instead of a cut-throat capitalist society. I think that makes sense. The social improvements that came from “socialist” policies like free education, a free health system, holidays, workers’ rights, etc. have improved our societies compared with the early ones of the Industrial Revolution. But at some point the inequality gap started increasing again. So accepting that genetics is a lottery, and that we need a new approach to close that gap, is the first step.

These are two pictures that helped me understand what the book was trying to achieve. Equality is giving everyone the same. Equity is giving each one what they need, and that is a win-win situation for all.

The author refers to three types of positions when dealing with social policies:

  • eugenic: we are defined by our genetics and we should do nothing to change it.
  • genome-blind: ignore genetic differences, wasting time/money without really closing the inequality gap (or even making it worse)
  • anti-eugenic: use genetic data to search for effective processes that improve people’s lives and reduce inequality in society.

Something that surprised me is the mention of deaf couples who wanted their children to be deaf, as they don’t see deafness as a defect, and who used genetic help for that. I struggle to accept that this is right.

I still have the feeling that I am not explaining everything properly, or that I have not understood everything properly. I need to take notes, highlight things, etc. How difficult can it be to have a pencil around when most of the time I am reading at home 🙂

Chocolate Fondant

Based on this video:

Ingredients:

  • 130g dark chocolate (+70%)
  • 130g butter + extra for greasing
  • 3 eggs
  • 1 egg yolk
  • 100g caster sugar
  • 70g plain flour
  • 15g cocoa powder + extra for dusting
  • 2.5g baking powder
  • 4 aluminium moulds

Process:

  • Pre-heat the oven to 180C
  • Melt the chocolate and butter in a “bain-marie”. Let it cool down a bit to use later.
  • Grease the moulds with butter and dust the sides with cocoa powder
  • Make the “raw sabayon”: whisk the eggs and sugar until pale in color.
  • Fold the chocolate (be sure it is not too hot) into the sabayon. Be sure the mix is uniform and there are no lumps.
  • Sieve flour, cocoa and baking powder. Then add to the mix bit by bit, folding with a spatula and checking there are no lumps.
  • Fill the moulds to approx. 90%. They will rise in the oven
  • Bake the moulds at 180C for approx. 9 minutes.
  • Use a toothpick to be sure they are still creamy inside. The idea is that the chocolate should flow out once opened.
  • Unmould and present on a plate with a bit of fresh mint, strawberries. Dust with a bit of icing sugar.
  • Be sure you serve it hot! (optionally you can add a ball of vanilla ice cream).

Taste

I decided to buy and read this book one day while checking out some stands in a library. I am not keen on biographies of “famous” people. It rang a bell that he was an actor, so I was surprised to find a book about food. On the cover there was a sentence that sold me: “he grew up in an Italian American family that spent every night around the table”. That’s what I like: building a culture around food, preparing and enjoying it.

The book is not a recipe book; it is a history around food. It has some funny moments and, more importantly, some recipes to try. I will do my best to do so.

There is also an important reference to “Big Night”, and I actually watched it this weekend. Nothing really special, but I got his point about food and the enjoyment of it. It reminded me of the movie “Chef”, which I liked more.

There is another important reference to Julia Child, who was a famous TV cook/chef/presenter. Some weeks ago, having lunch with friends and talking about cooking and nice food, somebody mentioned her and a movie about her life, “Julie & Julia”. I am trying to convince myself to watch it.

A good point is the criticism of people on TV who taste food and always say it is amazing when they haven’t even had time to swallow! I always thought it was a bit fake, so I am glad I am not the only one thinking that.

I am surprised by the outcome of his recovery from tongue cancer. His metabolism and allergies were “reset”, so no more lactose issues and an improved digestive system.

Anyway, it was entertaining and I hope I can take some recipes for my own repository.

The New Silk Road

I finished the second part of “The Silk Road” book. It mainly focuses on 2005-2009 and the Trump administration. It is a bit of the same but more up to date, with the push from China to “build” the new silk roads and the challenges, like USA rejection. It shows all the chaos caused by Trump and how easy he made life for Russia, Iran, China, etc. The same goes for using tariffs to stop trade: in the medium to long term, the other side is going to win. This reminds me of “Chip wars”. There are also countless examples of investment agreements between countries of the Middle East and Asia, something that, from a western point of view, we don’t really have visibility of or plainly ignore. The summary, according to the author, is that the centre of the world is moving to Asia, although in the EU/USA we don’t want to believe it or look at it. I think it is a time of opportunities, but nothing is perfect. Personally, I think each empire lasts less and less, so how long will we see China as leader at some point?

The Power Of Regret

Just the day before I started reading this book, I was in a place that at some point was playing the best hits of Edith Piaf (although I prefer this). Funnily enough, the book starts almost immediately by referring to her song about “regrets”… and how that is not good advice.

One of the best sentences is “Feeling is for thinking” and “thinking is for doing”.

I don’t believe in the absolute “no regrets”, nor in “being blocked by regrets”. At the end of the day, virtue is in the middle, as Aristotle said.

BTW, ChatGPT confirms it 🙂

Aristotle's idea of virtue being the "middle ground" or the "golden mean" is discussed in his book "Nicomachean Ethics".

In Book II, Chapter 6, Aristotle writes: "Virtue, then, is a state of character concerned with choice, lying in a mean, i.e., the mean relative to us, this being determined by a rational principle, and by that principle by which the man of practical wisdom would determine it."

He goes on to explain that virtue is the mean between two vices, one of excess and one of deficiency, and that the right amount of any given quality or behavior depends on the situation and the individual involved.

Source: https://www.gutenberg.org/files/8438/8438-h/8438-h.htm#chap02

Regret has to be a way to improve ourselves, to move forward. All of us make mistakes, so deal with it.

I mainly regret my lack of balls with girls and not standing up for myself for many years. But I am trying to learn about it: girls, feelings, relationships, etc. I didn’t know better either so, as with most things, it is “on-the-job training”.

At some point, I felt the data provided was a bit “weak”. Based on “Calling bullshit” I thought the stats were not really representative for the whole world.

In general, I think the moral of the book has more impact in western cultures, judging by most of the quotes from people. But, as said earlier, regret is something that makes us human, so very likely it affects everybody wherever they are from.

The author divides regrets into four categories:

  • Foundation Regrets: family, education, work, health, money, etc
  • Boldness Regrets: fail to jump for the opportunity: chat with a girl, work abroad, trip to X place, etc
  • Moral Regrets: theft, infidelity, betrayal, etc
  • Connection Regrets: meaningful relationships, etc

For example, in most cases we regret “not doing things” more than “doing something”. That lack of action comes from our risk-averse nature.

In summary, the book is easy and quick to read. It is a good reminder of what regret should be: look at the past, learn from it, and move on to the future.

Forward The Foundation

So this is the end of the Foundation series. I liked it: nice twists in the story, some drama and deaths. Robots and mental powers are the key! So everything is connected in the end, although the last sentence of the book seems to leave something open. To be honest, I don’t like the typical “Hollywood ending” where everybody lives happily ever after.

Dune6: Chapter House

This is the last book of the original Dune series. Compared with the first books, it looks like a different world, although there are references to the beginning. The end is quite open, so you can imagine the story continuing in more books, as with time there is always a new plot/drama. There are many things I don’t understand, though. The couple of “Face Dancers” in contact with Duncan? I like the references to Van Gogh paintings like “Thatched Cottages at Cordeville”.

I would never have thought that Duncan would be present in all the books and be such a critical character.

But in the end, it is all about love and the repression of it, as the Bene Gesserit do.

And the author’s last words about his wife’s death are quite moving.

Calling Bullshit

This is an interesting book about the flood of data we need to go through and how difficult it is to figure out what is true and what is not. I feel it many times: you read something “scientific” with many numbers, stats, etc. and you kind of believe it has to be true. The same goes for those new pharmaceutical drugs that are so amazing, or the latest paper with a dramatic breakthrough.

Interesting points:

With the hype about machine learning, the algorithm may be beyond our understanding, but the critical thing is the training data fed into that algorithm. GIGO = Garbage In, Garbage Out. If the training data is “biased” or not relevant, imagine what the result is going to be.

Correlation is not causation. This is a difficult topic because we very easily see causation everywhere, or find one that matches our theory.

Goodhart’s law adapted to normal people: “When a measure becomes a target, it ceases to be a good measure”. That’s so true. Think of your performance review at work, the GPU tests, etc.

Regarding the stats, it is important to pay attention to the axes: do they start at zero? Do they use the same proportions/scales? Be mindful of 3D charts and of “ducks” that decorate or obscure the meaningful data.

If it seems too good or too bad to be true, then it probably isn’t true.

“Mathiness”: formulas and expressions that look like “good” math but lack logical coherence and formal rigor. This is very typical for things that are not really easy to quantify (e.g. healthcare quality management): how are things measured? In which units? etc.

One of my favourite examples is the paper about the fMRI of the brain of a dead(!) salmon that was shown pictures of people in different moods. It was important to make clear that MRI images are maybe not as perfect as you would expect. I assume that has improved nowadays…

Prosecutor’s fallacy: you need to prove your client is innocent although there is a DNA match in a database. There is an error rate of 1 in 10,000,000.

            Match   No Match
Guilty      1       0
Innocent    5       50,000,000

You are the defence attorney and you want to focus on the left column (“Match”). That means there are 5 chances out of 6 (5+1) that your client is innocent despite having a DNA match.
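The arithmetic behind that 5-in-6 figure can be sketched in a few lines of Python. This is only an illustration: the database size of 50,000,000 is read from the table above, and the function name is made up.

```python
from fractions import Fraction

def innocent_given_match(db_size, false_match_rate, guilty_matches=1):
    """Chance that a person who matches the DNA sample is innocent."""
    # Expected number of innocent people who match purely by error.
    false_matches = db_size * false_match_rate
    return false_matches / (false_matches + guilty_matches)

# 50,000,000 innocent profiles, error rate of 1 in 10,000,000.
p = innocent_given_match(50_000_000, Fraction(1, 10_000_000))
print(p)  # 5/6
```

The point is that a tiny error rate still produces several false matches once the database is big enough.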

p-values: null hypothesis and alternative hypothesis. Most papers are based on a p-value of 0.05 (now you have Goodhart’s law)

Refuting Bullshit:

  • Use “reductio ad absurdum”
  • Be memorable (dead salmon example)
  • Find counterexamples (immune system theory vs trees)
  • Provide analogies (74M$ -> 2sec faster)
  • Redraw figures
  • Deploy a null model

I have left out a lot of things that I don’t remember, but the book is worth reading (and more than once).

In summary, the goal is to be a “smart” sceptic and not believe everything that is thrown at us.

Other Minds

This is a book recommended by a good friend. He had watched some documentaries about octopuses and was amazed. So I was curious about it and gave it a go.

The book is not just about octopuses and cuttlefish but about intelligence based on the evolution of nervous systems. It seems octopuses developed their nervous system in a different way from mammals. And even among cephalopods it seems to have evolved in more than one way.

Another thing that quite surprised me is that the life span of an octopus is around 2 years! There is a quite interesting part of the book about ageing. Why are there organisms like sequoias that can live over 3000 years, while octopuses, with a very advanced nervous system, only last 2? I need to re-read it. As per my understanding, this is related to evolution: we reap the benefits quickly, but there are always drawbacks that turn up later.

There were parts where I didn’t engage enough, but I think it was worth it just for the two points above.

Building DCs with VXLAN BGP EVPN

VXLAN/EVPN is a technology that I have been trying to understand in more detail and depth since I started my current job. All my networking theory/knowledge comes from books, so this one is a good base. Keep in mind that it is a bit “old”, as it was released in 2017. In the last months I have built my confidence with VXLAN/EVPN via some issues and testing designs (Arista EVPN L3 Gateway).

As I used to do in the past, I made notes of the book and I will put them here too so it is a good refresh.

1 INTRO

STP for DC issues:

  • Convergence: tree recalculation
  • Unused links: follow tree…
  • Suboptimal forwarding: follow tree…
  • No ECMP
  • Traffic storm (no TTL in L2)
  • Scale: only 4k vlans (12 bits tag)

Leaf/Spine improvements as per above (Clos Network):

  • Scalability
  • Simple
  • Resilience
  • Efficiency
  • No oversubscription
  • ECMP
  • Deterministic latency
  • scale out -> + leaves // scale up (+bw) -> + spines

BUM = Broadcast, Unknown Unicast, Multicast

Fabric Path = MAC in MAC (technology that predates VXLAN). Proprietary to Cisco

VXLAN = Standard, MAC in IP/UDP, VNI = 24 bits -> 16M segments (vs 4k VLANs)! Flood & Learn (F&L): each network has its L2VNI + multicast group (control-plane). BGP EVPN doesn’t need F&L, so better control-plane

Border Leaf or Border Spine = for external connectivity.

Route-Reflector (RR) or RP (multicast) in Spine

VXLAN – dataplane / EVPN – control-plane

2 BASICS

In DC, most traffic is east-west.

Limit vlan 12 bits (4k) + multi-tenancy? -> overlay: (indirection) abstraction of existing network tech + extend classic network capabilities. (David Wheeler: problem -> indirection)

Underlay -> increase MTU! (overlay overhead)

Handle BUM in underlay? Multicast (PIM – VNI mapped to multicast group = dst IP outer header) or Ingress Replication (head-end replication)

VTEP = Edge device, encap/decap, build overlay

VNI – Virtual Network ID

VXLAN header: original inner 802.1q header of l2 frame is removed and mapped to a VNI, to complete vxlan header. UDP dst port = 4789, src port = based on inner header
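To make the header layout concrete for myself, here is a minimal Python sketch that packs and parses the 8-byte VXLAN header (flags byte, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are made up for illustration, and it only covers the VXLAN header itself, not the outer Ethernet/IP/UDP headers.

```python
import struct

VXLAN_UDP_PORT = 4789  # destination port noted above
VNI_FLAG = 0x08        # "I" bit: the VNI field is valid

def build_vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32 bits: flags byte + 24 reserved bits.
    # Second 32 bits: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", VNI_FLAG << 24, vni << 8)

def parse_vni(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8

hdr = build_vxlan_header(30001)
print(len(hdr), parse_vni(hdr))  # 8 30001
```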

overhead = 50 bytes (14 (l2) +20 (l3) + 8 (l4) + 8 (vxlan) = 50). 54 bytes if optional 802.1q tag (4bytes) is added.

ECMP: 5-tuple: src IP, dst IP, proto, src port, dst port -> but only src port changes in vxlan -> that’s the entropy!

F&L: Doesn’t scale, no control-plane. Multicast replication has a limit, so Ingress Replication (IR) is the alternative. Every VTEP must be aware of the other VTEPs in the same VNI. Source VTEP has to replicate each packet to each VTEP.
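A rough sketch of where that entropy comes from: the ingress VTEP hashes the inner headers and uses the result as the outer UDP source port, so different inner flows take different ECMP paths. CRC32 is only a stand-in here (real VTEPs use their own hardware hash), and the function name and the choice of inner fields are my assumptions.

```python
import zlib

def vxlan_source_port(inner_src_mac, inner_dst_mac, inner_src_ip, inner_dst_ip):
    """Derive the outer UDP source port from a hash of the inner headers."""
    key = "|".join([inner_src_mac, inner_dst_mac, inner_src_ip, inner_dst_ip])
    # CRC32 is an illustrative hash, not what any particular switch does.
    digest = zlib.crc32(key.encode())
    # Keep the result inside the dynamic port range 49152-65535.
    return 49152 + digest % 16384

flow_a = vxlan_source_port("00:00:00:00:00:0a", "00:00:00:00:00:0b", "10.0.0.1", "10.0.0.2")
flow_b = vxlan_source_port("00:00:00:00:00:0a", "00:00:00:00:00:0c", "10.0.0.1", "10.0.0.3")
print(flow_a, flow_b)
```

The same inner flow always maps to the same source port, so packets of one flow stay on one path and are not reordered.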

BGP EVPN: solution for F&L. Eliminates unnecessary flooding. EVPN carries host MAC, IP, network, VRF and VTEP info. If a VTEP detects a host and doesn’t send an EVPN update -> remote VTEP doesn’t age out entry for that host.

IMPORTANT! Broadcast traffic (ARP, DHCP, etc) is still flooding!!!

Tenant = VRF

eBGP = implicit next-hop-self for originated

RD = 8 bytes

— type 0: 2 byte ASN + 4 byte value

— type 1: 4 byte IP + 2 byte value

— type 2: 4 byte ASN + 2 byte value
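The three RD layouts above can be sketched with Python's struct module. The 8 bytes are a 2-byte type field followed by the 6-byte value split as listed. The function name and the example values are made up.

```python
import ipaddress
import struct

def encode_rd(rd_type, admin, assigned):
    """Pack a route distinguisher: 2-byte type + 6-byte value = 8 bytes."""
    if rd_type == 0:    # 2-byte ASN + 4-byte value
        return struct.pack("!HHI", 0, admin, assigned)
    if rd_type == 1:    # 4-byte IPv4 address + 2-byte value
        return struct.pack("!HIH", 1, int(ipaddress.IPv4Address(admin)), assigned)
    if rd_type == 2:    # 4-byte ASN + 2-byte value
        return struct.pack("!HIH", 2, admin, assigned)
    raise ValueError("unknown RD type")

rd = encode_rd(1, "10.0.0.1", 32777)
print(len(rd), rd.hex())  # 8 00010a0000018009
```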

RT: control import/export prefixes in VRF (auto derivation ASN:VNI)

VXLAN EVPN: RFC7432. Focus NVO (Network Virtualization Overlay) -> Route-Types:

— type 2: MAC/IP (host: /32 or /128). Sent once host is learnt. Info about IP is optional. Ext Community: RMAC = Router MAC = source VTEP.

— type 3: Inclusive multicast ethernet tag route -> create distribution list for IR. Generated and sent out immediately as a VNI is configured. Need ASIC support !!!

— type 5: IP prefix route (L3VNI)

“show bgp l2vpn evpn MAC” ->

[Route Type]:[Eth Segment ID]:[Eth Tag Id]:[MAC length]:[MAC]:[IP prefix]:[bit count]

bit count:

— 216: type2 only MAC

— 272: type2 IP/MAC

— 224: type5 IPv4

ExtCommunity: ENCAP:8 -> it is VXLAN

ARP request triggers an IP-MAC.

MAC learnt via BGP is not aged-out via normal process: only BGP delete message deletes the MAC

L3 learnt: depends on hw (FIB)

— HRT (Host Route Table): only for /32 or /128 (big)

— LPM (Longest Prefix Match): TCAM (small)

FIB: [Bridge Domain, RMAC] -> BD maps to L3VPN and RMAC maps to dst VTEP MAC

Type5:

— advertises first-hop-routing: prefix where VTEP is default gateway (IP anycast gateway)

— advertise prefixes from other protocols

Host detection:

ARP aging = 1500 sec -> If ARP request fails -> type2 deletes are sent. ** Even when ARP entry is deleted, MAC only type2 is still in BGP EVPN CP until MAC aging expires (1800 sec) (sent BGP withdraw)

ARP aging < MAC aging -> avoid unnecessary flooding

Host mobility: VM to send GARP (gratuitous ARP): Highest MAC mobility seq ext community => Best

3 FORWARDING

  • Handling BUM or multidestination traffic:

— MC replication in the underlay:

Use MC in UL => 1xL2VNI = 1xMC IP => problem: 2^24 VNI available -> is a stretch for MC IPs available, sw/hw limits (1000’s PIM, IGMP, etc) -> doesnt scale

How to manage VNI-MC mapping? VNI randomly assigned to MC or MC is localized for a set of VNIs.

— Ingress Replication (IR = HER = Head-End-Replication): Unicast mode. VTEP makes n-1 copies of BUM packet and send them as unicast to the n-1 VTEPs of that VNI

replication list? dynamic with BGP EVPN. type 3 (IMET). Replication list is updated when config of a L2VNI in a VTEP occurs –> Big overhead compared with MC.
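Ingress replication as described above is simple enough to sketch: the source VTEP sends one unicast copy of the BUM frame to every other VTEP in the replication list of that VNI. The replication list here is a hand-written dictionary standing in for what EVPN type 3 routes would build, and the IPs are invented.

```python
def ingress_replicate(local_vtep, vni, replication_lists, frame):
    """Return one unicast copy of a BUM frame per remote VTEP in the VNI."""
    peers = replication_lists.get(vni, set())
    return [(remote, frame) for remote in sorted(peers) if remote != local_vtep]

# Hypothetical replication list for L2VNI 30001 with n = 4 VTEPs.
replication_lists = {30001: {"10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4"}}
copies = ingress_replicate("10.1.1.1", 30001, replication_lists, b"ARP request")
print(len(copies))  # 3
```

With n VTEPs the source does n-1 transmissions per BUM frame, which is the overhead compared with multicast in the underlay.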

  • ARP Suppression:

— Use ARP snooping. ARP request -> populates BGP EVPN CP. 1) If VTEP knows dst MAC, then it responds (this is ARP suppression). If not, using IR or MC, send ARP to all VTEPs. Egress VTEP that has the host connected receives the ARP reply, makes an EVPN type2 announcement to all VTEPs + sends ARP reply (as unicast) to avoid any delay.

NX-OS uses MC for BUM by default = flood L2 locally and to all VTEPs in VNI.

MC group for overlay != MC group for underlay.

IGMP snooping (if supported), optional solution, it doesn’t depend on hw, just sw.

  • Distributed IP Anycast Gateway: Implemented at each VTEP, reduces traffic transit. Anycast = one-to-the-nearest association.

Anycast GW VTEPs share the same MAC -> prevent black-holing for host-mobility (AGM = Anycast GW MAC address). Same AGM is used in all default gw IPs -> no hairpinning.

  • Integrated Routing and Bridging (IRB)

— Asymmetric:

bridge-route-bridge at local VTEP

traffic egressing towards a remote VTEP uses a different VNI than the return traffic from the remote VTEP

requires consistent VNI config in all VTEPS

— Symmetric (NXOS):

bridge-route-route-bridge

egress and return use same L3VNI. L2VNI are not used for routing in symmetric IRB

Not all VNIs need to be configured in all VTEPS but for a VRF, L3VNI needs to be configured in all VTEPs.

Inter VRF routing -> route leaking -> external router or firewall.

  • End Point Mobility:

BGP extended community = MAC mobility seq. Higher wins. With each move, seq++

End point move triggered by (update via BGP EVPN CP)

— Reverse ARP: only advertises new MAC

— Gratuitous ARP: adverts new MAC/IP

VTEP verifies if endpoint has actually moved.
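The "higher sequence wins, seq++ on each move" rule can be sketched like this. It is a toy model: the class and function names are made up, and tie-breaking between equal sequence numbers is not modelled.

```python
from dataclasses import dataclass

@dataclass
class MacRoute:
    mac: str
    vtep: str
    mobility_seq: int = 0

def best_route(routes):
    """Pick the route with the highest MAC mobility sequence number."""
    return max(routes, key=lambda r: r.mobility_seq)

def advertise_move(current, new_vtep):
    """Build the route a VTEP would originate after seeing the host locally."""
    return MacRoute(current.mac, new_vtep, current.mobility_seq + 1)

old = MacRoute("00:00:00:00:00:0a", "10.1.1.1")
new = advertise_move(old, "10.1.1.2")
print(best_route([old, new]).vtep)  # 10.1.1.2
```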

  • VPC: MCLAG + LACP: Cisco -> vPC: 2 devices: 1 peer link + 1 keepalive link.

— PIP: primary IP. individual per VPC member per VTEP

— VIP: secondary IP in nve interface. Virtual IP = anycast VTEP at VPC level. It is the next-hop used in EVPN typ2/5. ** anycast VTEP != anycast gw

–orphans: blackhole if using VIP -> solution: “advertise-pip” VPC members use PIP instead of VIP for NH in originated EVPN type5 (type2 still uses VIP)

— Router MAC ext community in typ2/5:

—- PIP uses switch RMAC

—- VIP uses local derived MAC based on VIP. Both VPC members derive the same MAC because they share the same VIP. As RMAC ext community is non-transitive and VIP are unique, no issue

  • DHCP: discover, offer, request, ack. DHCP relay: configured in default gw: relay agent uses default gw IP in the giaddr field of DHCP payload. DHCP server uses the giaddr field to find the correct scope. As well, uses giaddr as dst IP for the answer. Problem with anycast gw because all VTEPs use the same IP -> sol: each VTEP dhcp relay uses a unique IP (loX) and must be routable. how to choose scope? DHCP option 82.

4 UNDERLAY

  • Considerations

— Clos network = each port equidistant + consistent latency => multistage.

— MTU: vxlan -> avoid fragmentation. vxlan overhead = 50 bytes (14 outer MAC header + 20 outer IP header + 8 outer UDP header + 8 vxlan header; extra 4 if QinQ in VNI). Normal ethernet MTU 1500 -> Ethernet Frame = 1518 (or 1522 if 802.1q): 18 = 6 MAC src + 6 MAC dst + 2 ether type + 4 FCS. If underlay MTU stays at 1500 => host MTU 1450. If hosts use jumbo frames 9000 => underlay needs 9050. Most network kit supports up to 9216 MTU

— IP Addressing: RID = lo. Use /31 or unnumbered (lo is used for RID) as much as possible. Lo0 (BGP) and Lo1 (VTEP) on IGP. Leaf = Lo0 + Lo1. Spine = only Lo0 because it is not vtep (if using multisite gateway need lo1 as it is vtep). Be sure your ip schema aggregates!!! -> reduce routing table (1x/24 all lo0, 1x/23 all p2p, etc)
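The MTU arithmetic from the considerations above, as a quick Python check. It follows the 50-byte figure from these notes (no optional 802.1q tag), and the function names are made up.

```python
OUTER_ETHERNET = 14  # outer MAC header
OUTER_IP = 20        # outer IPv4 header
OUTER_UDP = 8
VXLAN_HEADER = 8
OVERHEAD = OUTER_ETHERNET + OUTER_IP + OUTER_UDP + VXLAN_HEADER

def max_host_mtu(underlay_mtu):
    """Largest host MTU that avoids fragmentation for a given underlay MTU."""
    return underlay_mtu - OVERHEAD

def required_underlay_mtu(host_mtu):
    """Underlay MTU needed to carry a given host MTU inside VXLAN."""
    return host_mtu + OVERHEAD

print(OVERHEAD)                      # 50
print(max_host_mtu(1500))            # 1450
print(required_underlay_mtu(9000))   # 9050
print(max_host_mtu(9216))            # 9166
```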

  • Unicast Routing

— IGP is OSPF or BGP -> ECMP.

OSPF: use p2p type instead of broadcast -> only LSA-1 !!! low convergence time !!! and small LSDB. If ipv6 -> ospfv3 -> dual stack… two protocols!

ISIS: no IP, works on L2 (CLNS). SPF algo. TLV. NSAP addressing. IP independent.

BGP: path vector (no SPF) if eBGP -> next-hop unchanged (if spine not a vtep). underlay eBGP -> phy to phy // overlay eBGP -> lo to lo (multihop!). If eBGP “route reflector” => “retain route-target all”. If using “Two AS” design (if Spine no vtep) -> spine: ipv4 + evpn => “disable-peer-as-check” // leaf: ipv4 + evpn => allowas-in

  • Multicast Routing: more efficient than unicast but needs one extra protocol

— BUM traffic: unicast mode = ingress replication in underlay // multicast mode = use multicast in underlay.

— Unicast: VTEP host to generate n-1 copies of packet. Replication of data traffic is data plane operation. VTEP-VNI membership distribution is dynamic via CP BGP EVPN or static via FnL (doesnt scale!).

— Multicast: PIM Any Source Multicast ASM (PIM SM) or PIM BiDir (depends on hw). Can’t mix PIM modes. RP in Spines!

— PIM ASM Anycast RP: in each spine. 1 IP for all spines -> load balancing. (S,G) at VTEP.

— PIM BiDIR: (*,G) at RP = Spines. Difference with Anycast, BiDir creates only a shared tree (*,G) on a per multicast group instead of creating a source tree (S,G) per VTEP per multicast group. Redundancy achieved with “phantom” RP that uses lo with different prefix length.

5 MULTITENANCY (L2-> vlan / L3 -> vrf)

  • Bridge Domain: Broadcast domain that represents the scope of a L2 network (vlan). Way of stretching a vlan -> vlan (12bits), vni (24 bits), switch.

  • VLANS in VXLAN: vlan local significant, vni is global significant (per switch, per port)

— L2VNI: RD -> RID: vlan+32767. RT -> autogenerate / AS:l2vni (RT+eBGP is manual at underlay)

  • L2 Multitenancy:

— VLAN mode: restriction 4K to VNI mapping per switch.

vlan 10
  vn-segment 30001

— Bridge domain mode: BD is used instead of vlan-mode. BD implements a BDI instead of a SVI. No restrictions of 4k VNI mapping -> hw restriction:

  • VRF in VXLAN BGP EVPN: VRF-Lite doesnt scale. L3 at Leaf. EVPN -> scale CP -> RD+RT

  • L3 Multitenancy: L3VNI global scope, vrf name is local significant. Auto: RD= RID:VRF_ID / RT= AS:L3VNI (RT+eBGP is manual at underlay)

— Summary: 1) Associate L3VNI into VTEP interface 2) core-vlan associated with L3VNI 3) SVI created in VRF

router bgp X
 vrf VRF-A
  address-family ipv4 unicast
    advertise l2vpn evpn
---
interface nve1
  member vni 50001 associate-vrf
---
vlan 2501
  vn-segment 50001
---
interface vlan 2501
  vrf member VRF-A
  no shut
  mtu 9216
  ip forward
---
vrf context VRF-A
  vni 50001
  rd auto
  address-family ipv4 unicast
  route-target both auto
  route-target both auto evpn
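The auto-derivation rules noted in this section (L2VNI RD = RID:vlan+32767, L3VNI RD = RID:VRF_ID, RT = AS:VNI) can be sketched as plain string formatting. Router ID 10.0.0.1, AS 65001 and VRF id 3 are made-up example values, and the function names are hypothetical helpers, not NX-OS commands.

```python
def l2vni_rd(router_id, vlan_id):
    """Auto RD for an L2VNI: router ID plus (VLAN id + 32767)."""
    return f"{router_id}:{vlan_id + 32767}"

def l3vni_rd(router_id, vrf_id):
    """Auto RD for an L3VNI: router ID plus the VRF id."""
    return f"{router_id}:{vrf_id}"

def auto_rt(asn, vni):
    """Auto route-target: AS number plus the VNI."""
    return f"{asn}:{vni}"

print(l2vni_rd("10.0.0.1", 10))   # 10.0.0.1:32777
print(l3vni_rd("10.0.0.1", 3))    # 10.0.0.1:3
print(auto_rt(65001, 30001))      # 65001:30001
print(auto_rt(65001, 50001))      # 65001:50001
```

Because the RT contains the AS number, leaves in different ASes derive different RTs, which is why the notes say RT with eBGP has to be set manually.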

6 UNICAST FORWARDING

  • Intra-Subnet Unicast Forwarding (Bridging) (Classic Ethernet)

— ARP suppression disabled: ARP request -> BUM mode => Multicast or IR -> BGP EVPN for source MAC

— ARP suppression enabled: ARP snooping -> source MAC -> generated EVPN type2. If dst MAC is known by ingress VTEP then it generates ARP reply (ARP proxy)

— commands:

show bgp l2vpn evpn vni-id 30001
show l2route evpn mac all        <--|-- verifies FIB is updated
show mac address-table vlan X    <--|

// Announce IP L3 GW manually
interface vlan 10
 vrf member X
 ip address a/b tag 12345
---
route-map RM permit 10
 match tag 12345
---
router bgp Z
 vrf X
  address-family ipv4 unicast
     advertise l2vpn evpn
     redistribute direct route-map RM
  • Inter-subnet unicast forwarding (routing)

Symmetric IRB (bridge-routing-routing-bridge): VXLAN-router traffic uses same L3VNI in each direction. VRF -> l3vni -> mapping in all VTEPs.

— Distributed IP Anycast GW: anycast GW MAC (AGM) It is a VTEP. local routing in a VTEP -> no vxlan is used.

— Distributed behind remote VTEP (routing) -> vxlan > inner MAC header (SMAC = VTEP1 router MAC / DMAC = VTEP2 router MAC). RMAC is encoded in BGP EVPN NLRI as extended community.

— Silent Hosts:

— Dest IP unknown + dst bridge domain is local to ingress VTEP => IP lookup hits LPM (ie /24) -> because of L3 distributed IP Anycast GW -> choose local route (lowest AD) -> trigger ARP request for dst IP (because unknown) in different VNI !! -> BUM forwarding -> reach other VTEP.

— No L2 extension present:

show bgp l2vpn evpn vni-id X  <-- 1) verify BGP RIB
show bgp ip unicast vrf Y         2) verify RIB (RT worked fine)
show ip arp vrf Y                 3) verify FIB
  • Forwarding with dual-home endpoints: VPC -> anycast VTEP = VIP. Egress (outer src IP = VIP when traffic leaving ingress VTEP). Ingress (outer dst IP = VIP when return traffic leaves egress VTEP -> ECMP to either of VTEP behind VIP)

— orphan: traffic may cross VPC peer-link because NH=VIP. L2/L3 announcements in VPC -> NH=VIP. If routing needed between VTEP1 and VTEP2 (both belong to same VPC) -> BGP or VRF-lite or advertise type5 with “PIP” from each VTEP instead of VIP (preferred)

  • IPv6: Anycast GW MAC (AGM) is shared between ipv4/ipv6. Underlay only ipv4 -> overlay ipv6 communication => NH VTEP=ipv4.

7 MULTICAST FORWARDING

Handling MC in overlay.

EVPN Type3 -> (unicast is used to handle BUM) VTEP announces interest in a L2VNI

Initially VXLAN had no IGMP snooping for L2 MC => L2 MC flooded to all VTEPs in that VNI even if not interested.

  • L2 MC forwarding = Intra-subnet MC. Same VNI = broadcast domain. In MC mode, underlay maps L2VNI to MC group.

— IGMP in VXLAN BGP EVPN:

— Classic IGMP snooping: Traffic is still flooded unconditionally as long as VTEPs are member of that VNI. MC is dropped at VTEPs egress.

— Improved IGMP snooping: “ip igmp snooping disable-nve-static-route-port” -> conditional addition of a VTEP to the Outgoing Interface List (OIL) for a given VNI.

  • L2 MC forwarding in VPC: one of the two peers of VPC -> elected DF (lowest cost to RP). Election process: Both VPC peers send PIM join to RP using Anycast VTEP IP (secondary IP in lo1). RP sends only 1 reply to anycast IP, this is hashed to one VPC peer -> the peer with the (S,G) is the DF (S=VTEP anycast IP, G=MC VNI mapping)
  • L3 MC forwarding = inter-subnet MC. Not much info, something expected in 2017.

8 EXTERNAL CONNECTIVITY

  • Placement:

— Border Leaf: VTEP, few flows N-S. Extra hop. No end-points. SS doesn’t need to be a VTEP.

— Border Spine: Spine becomes VTEP. Most flows N-S

— Extended L3 connectivity (L3 handoff):

— Wiring:

—- Full mesh: most resilient, no sync required between border nodes.

—- U-shape: sync link between border nodes.

— VRF-Lite/Inter-AS opt-A: BGP + redist + summarization, 802.1q. VRF-Lite-> SVI (needs BFD), subinterface (recommended) + ebgp

— Extended L2 connectivity: End-point mobility -> RARP (non-IP)

  • Classic Ethernet + VPC: VPC -> anycast VTEP IP (secondary IP in lo1) -> NH = anycast VIP (type2). “advertise pip” for type5 NH = VPC physical IP (primary lo1).

* BPDU not transported in VXLAN -> Use VPC between STP switch and VTEPs.

  • Extranet + Shared Services: Internet, DNS, DHCP, etc.

— VRF route-leaking: tenant VRF <-> shared VRF (dhcp, dns, etc) -> route leaking: CP leaking at ingress VTEP, DP leaking at egress VTEP. VXLAN uses VNI associated with source VRF for remote traffic. Problem: force consistent config in VTEP with leaking. Scalability (asymmetric IRB)

— Downstream VNI assignment: egress VTEP dictates the VNI to be used by ingress VTEP with downstream VNI-assignment via CP

9 MULTIPOD, MULTIFABRIC, DCI

  • OTV vs VXLAN: VXLAN frame similar to OTV. OTV is transport agnostic IP-based solution.

— OTV includes CP and DP. VXLAN only DP (it needs BGP EVPN for CP)

— OTV provides multihoming (redundancy) using DF on per VLAN, doesnt need VPC. VXLAN needs VPC to provide multihoming.

— OTV has loop prevention. VXLAN needs BPDU guards + storm control.

— ARP suppression enabled in both. Unknown unicast is dropped in OTV. VXLAN+EVPN doesn’t stop unknown unicast.

  • Multipod: LS-SS + super spine layer. Prefix scale MAC/IP? Spine or Super-Spine needs to be BGP RR. MC -> scale Output Interface list (OIF). Max 65k LS. Single DP extends pod to pod = single fabric.

  • Multifabric: Difference from multi-pod, complete segregation CP and DP -> interconnect at border -> stitching VNIs, -> DCI design.

  • Interpod / Interfabric: Broadcast storm in overlay reaches all pods if L2 extended to all pods.

— opt-1: Multipod, single DP end to end. problem: failure domain, no separation pods (vxlan encap end to end)

— opt-2: Multifabric: DCI at border of fabric using classic Ethernet (VRF-lite + 802.1q). Better scale, MAC/IP not spread across all VTEPs (VXLAN encap only inside fabric). VXLAN ends at border device. Problem: DCI is bottleneck.

— opt-3: Multisite: option2 + re-originate L3 routing info (MPLS L3EVPN) VXLAN ends at border fabric -> DCI encap in MPLS -> other end removes MPLS and then back to VXLAN.

— opt-4: Multisite L2: option 3 for L2. OTV or EVPN. VNI-VNI stitching.

* Multisite EVPN VXLAN using BGW -> IETF draft-sharma-multi-site-evpn 2016

10 L4-7 SERVICES INTEGRATION

  • Firewalls in VXLAN BGP EVPN:

— routing mode: use L3

— bridging mode: “bump in the wire”, VLAN stitching

— FW redundancy with static routing: ok if HA FW connected to same LS pair (VPC). If FW in different LS -> suboptimal routing -> 2 solutions: 1) static route tracking, 2) static route at remote LS -> static route in ALL LS that need to reach the FW -> LS will learn type2 of FW via active LS.

  • Inter-Tenant / Tenant-Edge FW: security enforcement at edge/exit of a tenant/VRF. VRF stitching located at Border LS.

— Intra-tenant FW: E-W firewall = FW inside VRF.

— deployment:

—-FW route mode + default GW for all VLANs => VXLAN only at L2 => no VRFS, no anycast gw.

—- FW bridge mode: all network belong to same subnet. VXLAN + distributed IP anycast gw. FW connected to distributed IP anycast GW LS.

—- PBR: Policy-Base Routing

— Mixing intra-tenant and inter-tenant:

— Intra-tenant:

—-L2 (E-W): FW is GW. LS only extends L2 -> vxlan only l2, no distributed IP anycast gw. BL trunk to FW to extend L2.

—- L3: LS uses distributed IP anycast gw.

— Inter-tenant: default route pointing to FW -> redistribute via BGP EVPN

  • Load Balancer: “stateful”

— one-arm source-NAT: LB connected with 1 link / PO to LS.

— Direct VIP subnet approach: LB VIP + LB physical IP in same range. VIP advertised via type2

— Indirect VIP subnet approach: needs static route (like FW example) -> type5.

— source-NAT -> client IP is hidden, servers return traffic to LB

— service chains: LB+FW: FW belongs to BL, LB belongs to Service Leaf. If 2-Arm LB -> VRF-transit between FW-LB. If 1-arm LB -> no transit-vrf, source NAT.

11 FABRIC MANAGEMENT

  • POAP: out-of-band (mgmt port) needs dhcp relay. inband (front panel ports)
  • NRFU
  • OAM:
show mac address-table
show l2route evpn mac all
show vlan id X vn-segment
show bgp l2vpn evpn vni-id Z
show bgp l2vpn evpn MAC
show ip arp vrf Y
show forwarding vrf Y adj
show forwarding up local-host-db vrf Y
show l2route evpn mac-ip all
show bgp l2vpn evpn IP
show ip route vrf Y IP
show nve internal bgp remote database
show nve peers detail

ping nve up unknown vrf X payload IP DST SRC port SRC DST proto 6 payload-end vni 50000 verbose
traceroute ...
pathtrace ...