How Do Routers Work, Really?

(kamila.is)

398 points · turingbook · 5y ago · 98 comments

78 comments · 13 top-level

pfarrell5y ago· 21 in thread

I would suggest expanding your terminology section. I know almost nothing about routers and I'm lost in the first sentence of the High Level Overview section.

  "A switch (or an L2 switch :-) ) is an L2-only thing."

I don't know what L2 means. I suspect a definition of the various levels would expand the audience for this post.

msla5y ago

It's important to keep layering in mind when talking to people outside the IETF, but the IETF itself is not impressed:

https://en.wikipedia.org/wiki/Internet_protocol_suite#Compar...

> The IETF protocol development effort is not concerned with strict layering. Some of its protocols may not fit cleanly into the OSI model, although RFCs sometimes refer to it and often use the old OSI layer numbers. The IETF has repeatedly stated that Internet protocol and architecture development is not intended to be OSI-compliant. RFC 3439, referring to the Internet architecture, contains a section entitled: "Layering Considered Harmful".

Anyway: People sometimes like to pretend that OSI is a model and TCP/IP implements the model, forgetting that OSI is/was a protocol stack and TCP/IP has no interest in being "compliant" with any other protocol stack to the extent it mimics its layering architecture.

jlmcguire5y ago

This is one of those cases where both sides have some insight depending on viewpoint. The OSI model is like every other model. It isn't reality (at least in TCP/IP) but instead is a helpful abstraction esp. around troubleshooting and understanding networking concepts. There comes a point where the model breaks down but that doesn't mean it's an unhelpful model just that it isn't a complete picture. I try and work networking problems through the OSI layer model but am aware when things don't really fit well into it (MPLS, MSS, ARP, Layer 5-7).

1 more reply

_jal5y ago

For me the OSI tends to come up at work to talk about scope or areas of control. People will say "that happens in layer 3" (for instance) as shorthand, not as a referent that corresponds to any actual thing.

Johnny5555y ago

I don't think the post is meant to be a beginners level introduction to networking, the author writes:

This is the inside view of how exactly a router operates. You only need to know this if you are poking inside a router implementation. If that is the case, my condolences.

If you're poking inside a router implementation, it seems fair to expect that you have a basic understanding of OSI networking layers.

hinkley5y ago

Reading the replies, I somewhat doubt whether you still know what L2 means. The danger of being a nerd is sometimes you say a lot of words but they don’t mean anything.

Ethernet. L2 means Ethernet (or WiFi). Ethernet is the envelope we put Internet traffic in (L3) and the layers above that are about nailing down how exactly a conversation is managed. Sometimes people get upset about what constitutes Layers 5-7, especially since that Tim Berners-Lee joker ruined all the pretty pictures with HTTP. So mostly we only talk about 2,3,4 and 7, in the same way you don’t bring up religion or politics at a family reunion.

peanutz4545y ago

"Tim Berners-Lee joker ruined all the pretty pictures with HTTP"

This is the first time I am reading this, I interpret this to mean HTTP is badly designed and Tim Berners-Lee caused it. Need more...

2 more replies

AlphaSite5y ago

I think you need to know your audience and cater to them, trying to explain everything just ends in a book. L2 is especially googleable.

pfarrell5y ago

This is a good point. You have to have some assumptions of what your audience brings.

I'm aware there are levels of information in an IP packet, but I don't know them offhand. If I have to google something in the first sentence of a high-level overview, then I'm likely not going to read the piece and the author has lost me as a reader. Maybe I'm not the target audience, though I was interested. I'm providing that as feedback for the original author since the piece mentions that it's still a work in progress.

samatman5y ago

If only we were using a technology where you could turn the word "L2"[0] into a link to a page explaining what it means.

Then the author wouldn't have to, and we wouldn't need to use a search engine!

[0]: https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_L...

hinkley5y ago

To be fair, L2 could be Layer 2 or Level 2 (cache) and it might be a crapshoot what you get. You might get confused trying to answer your own questions.

Discoverability lives in the space between overexplaining and underexplaining.

2 more replies

inopinatus5y ago

For me, that book was W. Richard Stevens's TCP/IP Illustrated, volumes 1 & 2 particularly.

monadic25y ago

Surely they would be aware the audience would know what ethernet is. To me, L2 refers to the level 2 cpu cache.

tejohnso5y ago

https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_L...

Cerium5y ago

The IP stack has the concept of layers, which function as abstractions that hide the implementation of lower layers from the upper layers. Layer 2 (L2) is the physical link layer - it only cares about getting a packet between two devices. Layer 3 (L3) is where IP addresses live. As the article describes a router has functionality to send a packet towards its final destination as well as get it between ports.

josteink5y ago

> The IP stack has the concept of layers, which function as abstractions that hide the implementation of lower layers from the upper layers

Correction: the network stack has layers, where IP is one of them, near the top.

Which is why most software targets IP. It’s a good abstraction and it’s portable.

1 more reply

varjag5y ago

L1 is (naturally) physical. L2 is data link.

Cyph0n5y ago

L1 is the physical layer. L2 is the MAC layer.

IncRnd5y ago

This refers to Layer 2 in the OSI model of the network stack. See https://en.wikipedia.org/wiki/OSI_model

1. physical layer, 2. data link, 3. network, 4. transport, 5. session, 6. presentation, 7. application layer.

So, many switches are layer 2, but layer 3 switches are often referred to as switching routers. This can cause two different switches to act differently from each other in certain network environments. It isn't that one switch "doesn't work" but that it isn't a router.

A router is nominally a L3 device, though most actually are L1-7. To work, you need L1 & L2, but in today's world, there are applications and interfaces that move the router across L1-7, though not to the same depth as purpose built application devices for example. Topping this off, some routers will switch and some will not. It's the same wide-world of words that we see across the whole computer industry.

The OSI model differs from the TCP model of networking, even though both use numbered layers.

zakki5y ago

You may want to read about the OSI 7-layer model. The L1, L2, L3, L4 and L7 concepts are derived from that model. L1 is the physical access: the cable, the fiber or the WiFi itself. L2 is the data link. We use Ethernet for IP networks. The device that mainly handles communication at this layer is called a switch. L3 is the network layer. In an IP network it handles routing between IP networks. The device is usually called a router.

Some devices can do L2 and L3 at the same time. That’s why another term came up: L2 only switch.

And so on, you can read it more on [1].

[1] https://en.m.wikipedia.org/wiki/OSI_model

dreamcompiler5y ago

https://duckduckgo.com/?q=osi+7+layer

mav3rick5y ago

L3 => IPs L2 => MAC addresses

xg155y ago· 18 in thread

> Note that the next hop’s IP address is in the router’s memory only: it does not appear in the packet at any time.

This clears some points that always puzzled me:

If the gateway is identified by an IP address, but the destination host is also an IP address, which address exactly is put into the packet? And how can a packet be routed if the gateway's IP is itself part of the subnet that's supposed to be routed to it. (E.g. 192.168.0.0/24 with default gateway 192.168.0.1)

So the answer is, if I send the packet to host 1.1.1.1 but the routing table has 2.2.2.2 as the next hop, the packet will have 1.1.1.1 as the destination in the IP part but the MAC of 2.2.2.2 as destination of the Ethernet part (or equivalent). It doesn't matter which subnet the next hop's IP is in, as the routing table isn't consulted for it anyway - it's only used in ARP)

This leaves the question, why the indirection and why the mucking around with ARP and IPs that are never used as the destination to anything?

Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?
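The forwarding step described in this comment (destination IP left untouched, destination MAC set to that of the next hop) can be sketched roughly as follows. This is a toy model, not how any particular router implements it: the `ROUTES` table, the `ARP_CACHE` contents, the MAC addresses and the function names are all made up for illustration, using the 192.168.0.0/24 example from above.

```python
import ipaddress

# Hypothetical routing table: (prefix, next-hop IP, or None if directly connected).
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.0.1")),
]

# Hypothetical ARP cache: neighbour IP -> MAC address (values invented).
ARP_CACHE = {
    ipaddress.ip_address("192.168.0.1"): "aa:aa:aa:aa:aa:01",
    ipaddress.ip_address("192.168.0.7"): "aa:aa:aa:aa:aa:07",
}


def next_hop_for(dst_ip):
    """Longest-prefix match; returns the IP whose MAC must be resolved."""
    matches = [(net, hop) for net, hop in ROUTES if dst_ip in net]
    _net, hop = max(matches, key=lambda m: m[0].prefixlen)
    # A directly connected destination is its own next hop.
    return hop if hop is not None else dst_ip


def build_frame(dst):
    """Return the two destination fields that end up on the wire."""
    dst_ip = ipaddress.ip_address(dst)
    hop = next_hop_for(dst_ip)
    # The next hop's IP is only used as the ARP lookup key;
    # it never appears in the packet itself.
    return {"eth_dst": ARP_CACHE[hop], "ip_dst": str(dst_ip)}


print(build_frame("1.1.1.1"))      # MAC of the gateway, IP of 1.1.1.1
print(build_frame("192.168.0.7"))  # MAC and IP of the same on-link host
```

Note that the gateway's own address (192.168.0.1) falls inside the connected /24, which is why resolving its MAC never needs another routing lookup.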

jcrawfordor5y ago

To give a simplified but largely accurate summation: IP and Ethernet were each designed in different time periods and largely without knowledge of the other. Ethernet was historically used in such a fashion that multiple hosts (more than 2) occupied the same collision domain, that is, they were physically connected to the same cable, or through hubs that repeated frames to all interfaces without routing. This means that Ethernet required an addressing scheme so that hosts on the same media knew which frames were for them (higher-level protocols at the time did not necessarily handle this).

Ethernet's addressing scheme was not designed to accommodate large hierarchical networks and so is unsuitable for the IP use case, but more importantly, IP was designed completely separately from Ethernet, and was not used primarily with Ethernet until later, so IP could not "assume" that the layer below it handled addressing (typically there was either no layer below [point-to-point] or only a very simple one).

The result is that Ethernet and IP duplicate functionality to some extent. It is theoretically possible, although not common, to build a network which uses only layer 3 routing without any reliance on Ethernet addressing. A significant reason this is rare, arguably the most significant reason, is that IP is now carried over Ethernet a significant majority of the time and L2 Ethernet devices (like switches) require the use of Ethernet addressing for the network to function. You usually see "pure IP" in virtual networking environments where the IP is encapsulated in, well, more IP, but even then Ethernet frames are sometimes used because, well, just like network hardware, operating system network stacks generally expect them (examine, e.g., the linux bridge implementation). It is completely possible to build network stacks and network appliances which do not require the use of Ethernet but it is expensive and there's not much of a motivation to do so, and you'd run into issues with any kind of equipment not so designed.

Addressing is not the only duplicate functionality between Ethernet and IP, and it's one of the less significant ones since Ethernet addressing does provide utility even if not strictly required. Ethernet frames are checksummed, and IP headers are also checksummed, even though the Ethernet checksum is already over them. The IP header checksum exists because IP was historically carried over lower layers that did not provide integrity checking. This is basically pure wasted space in typical networks, so IPv6 drops the header checksum to remove the overhead.
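For readers who have not seen it, the IPv4 header checksum mentioned here is the 16-bit ones' complement sum used across the Internet protocols. A minimal sketch, assuming a 20-byte header with no options; the function name and the sample header bytes are only illustrative:

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the header.

    The checksum field (bytes 10-11) must be zero when computing a new
    checksum. Running this over a header that already contains a valid
    checksum yields 0.
    """
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header: 192.168.0.1 -> 192.168.0.199, UDP, checksum field zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(header)))
```

Because the TTL changes at every hop, a router has to update this field on every forwarded IPv4 packet, which is part of the overhead the comment refers to.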

In general, though, network protocols tend to make more sense when you have some awareness of the history of their development, as when you try to view the modern internet as an elegant, monolithic design as some authors attempt, a lot of things won't make sense because they simply are that way for historic reasons. Ethernet and IP were each designed in the '70s, but separately, and their use has accumulated significant cruft since then, including some radical changes in the ways that they were used (for example the transition of Ethernet from shared media to point-to-point, which occurred de facto earlier but became largely formalized with the introduction of GbE which prohibits more than two hosts in a collision domain, and of course ironically the introduction of multiple hosts in a collision domain as an even larger issue with wireless protocols, which requires additional handling below, or actually in lieu of, the ethernet layer, 802.11 being a replacement for ethernet that happens to behave similarly in many ways for compatibility).

Finally, the OSI model is something that tends to add complexity and confusion to these discussions, which is why I doggedly discourage its use in teaching. The OSI Model describes the OSI protocols, which were contemporary competitors to the TCP/IP protocols. Arguably, one of the reasons that the OSI protocols fell out of use (in favor of IP) is exactly because they assumed seven layers, and each was fairly complex. Some OSI protocols are still in use, for example IS-IS (a routing protocol carried directly over layer 2) in the telecom industry and some backbone IP transit, but in niches and generally being replaced with IP. IP is intentionally simpler, and can be fully described using four layers, what's usually referred to as the TCP/IP model.

The OSI layers do not map 1:1 to the TCP/IP layers, even if you simply ignore the ones that map more poorly as instructors often do. Even worse, many instructors and textbook authors feel such a strong compulsion to map modern networks to the obsolete OSI model that they cram application-layer protocols into OSI layers 5 and 6 in order to have examples of them. I have seen cases as extreme as an instructor claiming that HTTP cookies represent the session layer. This kind of thing is nonsense and hinders understanding rather than contributing to it. If the OSI model is taught (not a bad idea at all as students should realize that TCP/IP is merely the popular way, and certainly not the only way), it should be taught specifically by contrasting it to the different TCP/IP model. Unfortunately few instructors and website authors today seem to even be aware that the OSI protocol stack existed separately from IP.

And, if you are wondering, yes, Ethernet can be used in a switched network completely independently from IP (although not really in a routed network unless you are generous about how you define routing). This was more common decades ago, the only equipment I have ever personally encountered that used bare Ethernet was a very outdated CNC setup.

jwatzman5y ago

Along with the above fantastic comment, I found https://apenwarr.ca/log/20170810 an interesting (if inflammatory/divisive) essay on the subject and its history.

1 more reply

therealcamino5y ago

Besides the choice between using IP or "bare" Ethernet, there are alternatives to IP as the layer on top of Ethernet that are used in routed networks. Two of the more-common examples historically are Novell Netware (IPX/SPX) and DECnet.

1 more reply

bnjms5y ago

Beautiful rant.

Request. Do TLS next (if it’s in your wheelhouse). I’ve been looking for a good summary of ECC and selected curves in tls 1.2

1 more reply

swinglock5y ago

> It doesn't matter which subnet the next hop's IP is in, as the routing table isn't consulted for it anyway - it's only used in ARP)

You can only ARP for hosts on the same subnet as you, terrible hacks excluded.
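The "same subnet" condition is just a prefix membership test on the interface's own address. A small sketch using Python's standard `ipaddress` module; the function name and the addresses are illustrative assumptions:

```python
import ipaddress


def can_arp_for(interface: str, target: str) -> bool:
    """True if `target` is on-link for an interface given as 'address/prefixlen'."""
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(target) in iface.network


print(can_arp_for("192.168.0.10/24", "192.168.0.1"))  # on-link gateway
print(can_arp_for("192.168.0.10/24", "2.2.2.2"))      # off-link, needs a router
```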

> This leaves the question, why the indirection and why the mucking around with ARP and IPs that are never used as the destination to anything?

Because it was designed in layers so that different layers could be replaced. We didn't know we'd end up with mostly only IP and Ethernet in LANs back then.

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

It could have been done in any number of ways. It's not that much complexity though, and it would bake Ethernet MACs into everything IP, even in the cases where it's not needed.

AlphaSite5y ago

Fiddling with ARP comes up more often than you’d think, especially as a quick, easy way to handle HA.

james4125y ago

IP addresses sharing a route have a common prefix. This is not true of MAC addresses. They are allocated essentially randomly. If you wanted to route solely using MAC addresses, every router in the world would need a lookup table containing every MAC address, route aggregation would be impossible

That's not /the/ reason why a MAC address is involved. It's because that's the address for a physical device at a lower layer in the stack. As others mention, IP is media-independent, it cannot depend on a lower tier addressing scheme without becoming fused to that medium
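The aggregation point can be shown in a few lines: contiguous IP prefixes collapse into one routing entry, whereas MAC addresses carry no topological structure to collapse on. A sketch with made-up prefixes, using the standard library's `ipaddress.collapse_addresses`:

```python
import ipaddress

# Four hypothetical /24 networks that all sit behind the same next hop.
specific = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# Because they share a common prefix, one /22 entry can stand in for all four.
aggregated = list(ipaddress.collapse_addresses(specific))
print(aggregated)

# Any host in the original networks is still covered by the single entry.
print(ipaddress.ip_address("10.1.2.77") in aggregated[0])
```

A table keyed on MAC addresses has no equivalent operation, so it would need one entry per device.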

mrkstu5y ago

In an alternative universe where Novell continued to dominate networking, we'd be talking about how IPX uses the MAC directly to ID the host and had a separate network ID to uniquely identify the LAN the host is connected to.

It is actually a pretty reasonable way of integrating hardware MACs directly into the internetworking stack.

yabones5y ago

The reason for that is because IP is not 'integrated' with layer-2 tech like Ethernet. In fact, for a very long time Ethernet was only really used on local networks. Point-to-Point Protocol (PPP) [1] is a completely separate data link layer technology with no real concept of MAC addresses, because there can only be two devices on the bus.

Most of the very expensive 'multilayer' switches [2] do a form of this where they associate a next-hop IP with a MAC address entry and store that in the TCAM or data layer. It's not used as much because Cisco has a ton of patents on this type of technology, and also because general purpose hardware has gotten quick enough that it's not as important as it was ~15 years ago...

[1] https://en.wikipedia.org/wiki/Point-to-Point_Protocol

[2] https://en.wikipedia.org/wiki/Multilayer_switch#Layer-3_swit...

yardstick5y ago

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

One reason why using an IP is still important is the IP can move to a different router, so the MAC for that IP can change. Eg if a hardware swapout was performed, or the network admin manually moved the IP, or some HA system that dynamically moves IPs to other routers (and isn’t VRRP, which uses a virtual MAC).

Usability: it’s a lot easier imo to read a routing table with IP next hop than MAC as you don’t have to remember what MAC every machine is. The IP also conveys visually which port the traffic is (probably) going out. Eg Port 1 - 192.168.1.0/24 Port 2 - 192.168.2.0/24

If my next hop for 1.1.1.1 is via 192.168.2.254 I know immediately it’s going out port 2. If it was a MAC I’d have no clue unless I memorised all MACs in my networks.

w75y ago

You can have network segments which do not use ethernet and therefore have no MAC addresses, but still use IP addressing and need to be routable. It doesn't make sense to tie the next-hop in a table to MAC addresses which are an implementation detail on a lower layer. A good, popular, example of this you can test yourself without obscure hardware is wireguard.

monocasa5y ago

A lot of protocols don't end up using Ethernet as the physical layer, even ones you still use today.

Qemu (and I think Docker too?) use SLIRP internally for access between VMs which is ultimately an IP layer bridge.

On the WAN side (at least at one point, I could be out of date here) they didn't use Ethernet, but instead IP layer routing as well, on top of stuff like PPP and SONET.

starfallg5y ago

>Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

This is exactly what Cisco Express Forwarding (and similar layer 3 switching technology) does. The adjacency table keeps all of the layer 2 information to be used for fast routing of packets. This was implemented on the CPU back in the day, but now usually done in the switching ASICs.

However, you still need layer 3 next-hop information in the routing table (and dynamic routing protocols). The reason being 1. ethernet is one of many layer 2 technologies that IP supports and 2. MAC addresses can change for a particular IP address due to various reasons including hardware replacement and HA.

wmf5y ago

Historically, some links didn't have MAC addresses and different link types have different address types so it's easier for the routing protocols to work in terms of IP addresses.

jlgaddis5y ago

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Several others have already answered your question -- the key points being "the OSI model" (e.g., layer 2 vs. layer 3) and the multitude of other layer 2 protocols which don't use MAC addresses -- so I'll mention one other important detail.

---

Although the Ethernet protocol itself has been around for ~40 years now, for the majority of that time it mostly only existed "in the LAN".

In fact, when it comes to "on the WAN", Ethernet is still a relative newcomer. Before ~15 years or so ago, pretty much no one was using Ethernet "on the WAN" -- instead, it was X.25 and frame relay and HDLC and PPP and ATM and POS on analog "leased lines" and ISDN and DS-{1,3}s and OC-{3,12,48,192}s.

Along came MPLS, MetroE, EoMPLS, Carrier Ethernet, etc., and soon enough everyone was "tunneling" Ethernet between sites but we were still mostly using those "legacy" protocols "on the WAN".

Over time, technology advanced to the point that "native" Ethernet eventually became feasible "on the WAN" -- in no small part because 1) Ethernet speeds kept increasing by an order of magnitude (!) every few years, 2) standardizing on Ethernet everywhere drove the costs down, and 3) Ethernet was "easy" (compared to all of those "WAN" protocols we were using up until this point) -- everybody already "knew" Ethernet because, by this time, everybody had been using it in their LANs for a decade or more!

Although ATM and SONET (at least) are still around in (some parts of) some service provider networks, they are now the exception and Ethernet -- to butcher a phrase -- "has eaten the world" but, as I mentioned, Ethernet "on the WAN" is still a relatively new thing.

---

So, I'll offer an alternative answer to your question:

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Sure, if you had done it about 30 years earlier!

notyourday5y ago

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

No, because a MAC address only makes sense for Ethernet-like layer 2 protocols, and IP can run over any number of layer 2 protocols, including point-to-point protocols.

rmetzler5y ago

If you put the next hop's MAC address in the routing table and the device fails and needs to be replaced, all the routing tables would need to be rewritten, because MACs are supposed to be unique. You couldn’t just take a spare device, configure it accordingly and be done with it.

bluecmd5y ago

IPv6 commonly does that. Your next hop is installed as a link-local fe80 entry, which is derived from the MAC address. Not exactly what you're after, but it removes the IP numbering need.
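The derivation referred to here is the traditional modified EUI-64 scheme: flip the universal/local bit of the MAC, insert ff:fe in the middle, and prepend fe80::/64. A sketch follows; the function name and sample MAC are illustrative, and many current systems generate random or stable-private interface identifiers instead, so a real link-local address may not match this.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build an fe80::/64 address from a MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80:0000:0000:0000
    return ipaddress.IPv6Address(prefix + interface_id)


print(link_local_from_mac("00:11:22:33:44:55"))
```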

geerlingguy5y ago· 12 in thread

I learned how routers really work from Ericsson's seminal video on the matter, The Good Warriors of the Net: https://www.youtube.com/watch?v=x9XWxD6cJuY

Though I always thought the "router switch" was much more fun.

lelandbatey5y ago

Slightly higher quality version here: https://youtu.be/PBWhzz_Gn10

jpxw5y ago

Just watched the whole video, amazing, nostalgic but also subtly wrong in a number of annoying ways!

whoisburbansky5y ago

For someone with only a passing understanding of router innards, what should I watch out for from this talk to avoid coming away with an incorrect understanding of how things work?

1 more reply

hoten5y ago

"accidents happen [in LAN]", "at least the router is exact (for the most part)"

What does this mean?

Then towards the end... "the packet is recycled". What?

geerlingguy5y ago

I don't know about packet recycling, but at least with the 'for the most part', packet collision and packet loss used to be a lot more common for some reason. Nowadays the only times I see them on local networks is when cables get badly kinked or terminations are poorly done.

Spare_account5y ago

I watched this decades ago and forgot just enough about it that I couldn't find it again recently when I tried. Thank you

dec0dedab0de5y ago

Haha I forgot about this video. It was required viewing at my first job.

sgillen5y ago

Haha thanks for sharing. Interesting how much emphasis there is on "the ping of death" compared to literally any other exploit. Does anyone know if this was really such a big problem when this video came out?

schoen5y ago

What I remember is that the ping of death was extremely surprising in terms of the number of OSes affected, the ease of exploiting it, and the super-noticeable consequence of instantly crashing the target machine. And it came out at a time when there wasn't as much vulnerability research and very few extensively cross-platform vulnerabilities.

Also, with the ping of death, the only way to use it was to very noticeably crash systems -- not to secretly build a botnet or something, as might have been done with RCE vulnerabilities.

kitteh5y ago

It was popular for booting people off IRC, but there were other exploits around the same era that did the same such as land and teardrop.

It wasn't super notable. What was more horrific was the amount of windows machines that had tcp ports for various windows services open to the internet that led to not only crashing but remote compromise and rootkits/botnet stuff. That went on for years and only got mitigated by people deploying routers with fw/Nat functionality.

geerlingguy5y ago

I do remember hearing about it causing issues here and there in the 90s/early 00s, but rarely. Never hear about it anymore.

But I do remember AppleTalk causing issues more frequently on a network I helped manage that had radio studios with two Macs per studio, but mostly Windows PCs through the rest of the building.

That place also had a Macintosh 512K running its phone system until around 2010!

methou5y ago

Someone gotta make this in Factorio

Cyph0n5y ago· 4 in thread

> If that is the case, my condolences.

As a software engineer working on IOS-XR, that gave me a chuckle :p

In the case of enterprise- and SP-grade routers, the data-plane - i.e., where the actual forwarding and lookups take place - runs entirely on a dedicated network processor (NP), mainly for performance reasons. Information on the NP is populated by the router's operating system in response to user configuration, network topology changes, or protocol state updates. On the other hand, the control plane runs mainly on the CPU(s). This is required so that the protocols running on the router OS (e.g., BGP) can receive and send out updates based on their state machines.
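As a rough illustration of that split, here is a toy model in which a control plane collects routes and programs a lookup table, and a data plane does nothing but look up and forward. The class names, the `preference` field (a stand-in for something like administrative distance) and all addresses are hypothetical; a real NP uses specialised hardware tables rather than a Python dict.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str
    out_port: str
    preference: int  # lower wins (hypothetical tie-breaker)


class ControlPlane:
    """Runs on the CPU: learns routes from protocols and configuration."""

    def __init__(self):
        self.rib = []

    def learn(self, route):
        self.rib.append(route)

    def compile_fib(self):
        """Keep only the preferred route for each prefix."""
        best = {}
        for route in self.rib:
            current = best.get(route.prefix)
            if current is None or route.preference < current.preference:
                best[route.prefix] = route
        return best


class DataPlane:
    """Stands in for the NP: only lookups, no protocol logic."""

    def __init__(self):
        self.fib = {}

    def program(self, fib):
        self.fib = dict(fib)

    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        candidates = [r for p, r in self.fib.items() if addr in p]
        if not candidates:
            return None  # no route: drop
        best = max(candidates, key=lambda r: r.prefix.prefixlen)
        return best.out_port, best.next_hop


cp, dp = ControlPlane(), DataPlane()
cp.learn(Route(ipaddress.ip_network("10.0.0.0/8"), "192.0.2.1", "port1", 110))
cp.learn(Route(ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2", "port2", 20))
cp.learn(Route(ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3", "port3", 110))
dp.program(cp.compile_fib())
print(dp.forward("10.1.2.3"), dp.forward("10.9.9.9"), dp.forward("8.8.8.8"))
```

The design point is that `forward` never touches the `rib`; it only sees whatever the control plane last programmed.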

anotherkamila_5y ago

> As a software engineer working on IOS-XR, that gave me a chuckle :p

Good good :D

Thanks for the clear data plane / control plane explanation, that's a good way to summarise the distinction. May I link to it from the article?

Cyph0n5y ago

Thanks! Sure, go ahead!

peterwwillis5y ago

I think the simplest way for people familiar with PCs to visualize it is the FirePOWER devices. Network cards plugged into some slot have embedded chips which can be programmed to, say, filter specific kinds of traffic, or pass it on to the host CPU for more advanced logic, while the machine's central CPU runs a web interface, manages local databases, downloads updates, manages clusters, records metrics, etc. And either can even be hot-pluggable, interchangeable blades in a larger machine chassis.

Protocol-wise, isn't it common now for the NP on higher end stuff to handle L4 and higher protocols? Or are those still largely managed by the CPU?

Cyph0n5y ago

Yeah, NPs can handle L4 protocols, but I believe it’s usually a hybrid approach where the logic is split between CPU and NP.

1 more reply

teleforce5y ago· 4 in thread

I teach a computer networking class with a lab using Linux Switch Appliance (LISA) and the Quagga router (based on Zebra) on an embedded computer running an x86 CPU with multi-port Ethernet. The embedded router needs to be dual-boot for each function because LISA is based on a custom Linux kernel but Quagga just uses a normal/vanilla kernel.

I am looking for a "layer 3 switch" that has switching and routing functionality without rebooting. If anyone knows of a software-based open source solution for this, it would be very helpful. A Cisco IOS-like command interface is preferred but not mandatory.

The article explains router internals based on P4. Perhaps I should try to use P4 for the above-mentioned requirements?

mitchs5y ago

For labbing with quagga you can get pretty far with Linux containers to emulate multiple routers on a single host. (I've used both lxc and docker to manage containers.) You can create virtual ethernet device pairs (ip link add veth0 type veth peer name veth1) , and drop either end into running containers (ip link set veth0 netns <container process ID>.) Make sure to turn on the ip forwarding sysctls inside the containers and Linux will behave quite nicely as a virtual router.
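The steps above could be scripted along these lines. This is only a sketch: it builds the command lines and prints them by default rather than executing anything, since running them needs root. The `ip link` commands are the ones quoted in the comment; enabling forwarding via `nsenter -t <pid> -n sysctl -w net.ipv4.ip_forward=1` is my assumption about one way to set the sysctl inside each container's network namespace, and the function names and PIDs are hypothetical.

```python
import subprocess


def veth_link_commands(pid_a: int, pid_b: int, a: str = "veth0", b: str = "veth1"):
    """Commands to create a veth pair and move one end into each container."""
    commands = [
        ["ip", "link", "add", a, "type", "veth", "peer", "name", b],
        ["ip", "link", "set", a, "netns", str(pid_a)],
        ["ip", "link", "set", b, "netns", str(pid_b)],
    ]
    for pid in (pid_a, pid_b):
        # Assumed approach: enter the container's network namespace by PID
        # and turn on IPv4 forwarding there.
        commands.append(
            ["nsenter", "-t", str(pid), "-n", "sysctl", "-w", "net.ipv4.ip_forward=1"]
        )
    return commands


def run(commands, dry_run: bool = True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(veth_link_commands(1234, 5678))
```

The interfaces still have to be brought up and addressed inside each container before the routing daemon can use them.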

Also, consider upgrading to the more active fork called Free Range Routing.

zamadatix5y ago

Use GNS3 and run actual vendor virtual images if you want the actual vendor interface; it's made for this scenario.

wmf5y ago

VyOS supports bridging and routing although the config is more like a Linux host and unlike a real Cisco/Arista switch.

snuxoll5y ago

The Vyatta/VyOS/EdgeOS CLI took heavy inspiration from Juniper’s JunOS, so saying the config is unlike a “real” switch is factually incorrect.

It’s still a little odd, but as somebody quite comfortable with JunOS (I run Juniper switches in my homelab) it’s pretty easy to pick up any of the Vyatta forks and hit the ground running.

anotherkamila_5y ago· 2 in thread

Hi, I'm the author. Uh hi w00t how why what's it doing here?! :D

I promise to make it better and actually finish it now! Check back in a day or two I guess? Also I should post the code I promised. Hello from the ADHD squirrel!

anotherkamila_5y ago

Also thanks a ton for your suggestions, I really appreciate them!

coolgeek5y ago

Love the URL!

boryas5y ago· 2 in thread

I believe this piece does a good job with forwarding, but would be improved by a discussion of termination.

Routing is only triggered when the packet is L2 terminated: the destination MAC of the packet is one of the router's own MACs.

If the packet's destination MAC does not belong to the router, it doesn't matter what is in its IP header, it will be switched in the LAN it came in on.

This design also generalizes nicely to the case when the destination IP of a routed packet is one of the router's IPs.
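The termination rule described in this comment can be written down as a small decision function. This is a simplified sketch: it ignores broadcast and multicast frames, and the frame layout, function name and addresses are invented for illustration.

```python
def handle_frame(frame: dict, router_macs: set, router_ips: set) -> str:
    """Decide what a combined switch/router does with an incoming frame."""
    if frame["eth_dst"] not in router_macs:
        # Not L2-terminated here: the IP header is never examined,
        # the frame is simply switched within the LAN it arrived on.
        return "switch"
    if frame["ip_dst"] in router_ips:
        # L2-terminated and addressed to the router itself.
        return "deliver-local"
    # L2-terminated but destined elsewhere: do the routing lookup.
    return "route"


ROUTER_MACS = {"aa:aa:aa:aa:aa:01"}  # hypothetical
ROUTER_IPS = {"192.168.0.1"}         # hypothetical

print(handle_frame({"eth_dst": "aa:aa:aa:aa:aa:07", "ip_dst": "1.1.1.1"}, ROUTER_MACS, ROUTER_IPS))
print(handle_frame({"eth_dst": "aa:aa:aa:aa:aa:01", "ip_dst": "1.1.1.1"}, ROUTER_MACS, ROUTER_IPS))
print(handle_frame({"eth_dst": "aa:aa:aa:aa:aa:01", "ip_dst": "192.168.0.1"}, ROUTER_MACS, ROUTER_IPS))
```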

anotherkamila_5y ago

Good point. Incorporating that would require more brain than I have right now (bad timezone :D), but you're right, I completely left that out. May I update the article with a link to this comment?

boryas5y ago

sure!

rabuse5y ago· 1 in thread

I learned a lot about networking when setting up servers in racks. Had to deal with issues arising from terrible UI's on a lot of the routers out there, so I just kept digging deeper and deeper into how it all works. Also, if more are looking into how packets are actually routed, look into BGP, and how CDN's work. Great stuff.

walshemj5y ago

I would start with how internal routing works before starting on WAN routing.

I'd look at the Cisco Press and CCNA training materials.

bogomipz5y ago· 1 in thread

>"It needs to be routed: the router, based on L3 information, decides where it needs to go, in L3 speak – it will decide which host to send it to, but not how. This corresponds to the routing table (or FIB)."

This is not correct. The FIB (forwarding information base) is concerned with layer 2. The RIB (routing information base) determines the next hop. The RIB is what is used to populate entries in the FIB with the correct outgoing interface. These two terms are basic router terms. It was kind of surprising to see this statement in a post titled "How Do Routers Work, Really?"
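One simplified way to picture the RIB/FIB split is sketched below. Everything here is an assumption for illustration: `Route`, `build_fib`, and `lookup` are hypothetical names, the `distance` values are arbitrary preference numbers, and a real router would use a trie or hardware table rather than a linear scan.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "10.0.0.0/8"
    next_hop: str    # IP of the neighbouring router
    out_iface: str   # interface the packet leaves on
    distance: int    # preference of the route source; lower wins


def build_fib(rib):
    """Collapse the RIB (all candidate routes) to one best entry per prefix."""
    fib = {}
    for route in rib:
        net = ipaddress.ip_network(route.prefix)
        if net not in fib or route.distance < fib[net].distance:
            fib[net] = route
    return fib


def lookup(fib, dst):
    """Longest-prefix match against the FIB; returns None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


rib = [
    Route("10.0.0.0/8", "192.168.1.1", "eth0", 110),
    Route("10.0.0.0/8", "192.168.2.1", "eth1", 1),
    Route("10.1.0.0/16", "192.168.3.1", "eth2", 110),
]
fib = build_fib(rib)
print(lookup(fib, "10.1.2.3"))
```

The RIB keeps every candidate, including the losing route via `eth0`; the FIB holds only what the forwarding path needs per packet.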

anotherkamila_5y ago

You're right, I noticed it about an hour ago -- no idea what was going on in my head then :-/ Fixed already. Thank you!

icedchai5y ago

Maybe a mention of other, non-Ethernet links? Serial PPP? Frame Relay? I realize these are mostly historical curiosities these days, but it might help to reinforce the differences between L2 and L3.

When I first started working with routers, over 25 years ago, it was all Ethernet LAN to serial WAN, usually point-to-point T1 or Frame Relay. One site had a dual T1, load balanced on both ports of a Cisco 2501. Fun times.

wbsun5y ago

Click is a very good software router to read and learn: https://github.com/kohler/click

It can be more than a router though.

dnautics5y ago

This is great, if for no other reason than that section 1 explains the difference between a switch and a router (which took me a decade? to really understand). I really wish someone could have laid it out clearly for me.

mrburton5y ago

I just have to say this "magnets how do they work"? ;) Anyone get the reference?


