Everything was a battle. The configuration is finicky and poorly documented, and every time I went to Edgecore or SONiC for support, each would blame the other.
We did eventually get everything working, but it took hours of tinkering with the configuration. There's all sorts of undocumented magic about how ports, interfaces, lanes, etc. all have to line up with each other. If it doesn't like how you've configured BGP, it will silently fail and refuse to link. And God help you if you have breakout cables. And for some reason, all the services run in docker containers, which makes it even more painful to debug. Reapplying configuration is itself slow and buggy, so you have to reboot the switch constantly.
Ultimately I find SONiC hard to recommend. Yes, it's free, but I'm really not sure that's a benefit. Any home enthusiast will buy a consumer-grade switch with an OS preinstalled, or use OpenWRT. Any business using this for production would be wise to go with a stable product with good support, like Cumulus. As much as I hate how Cisco does business, at least their products will mostly work how you need out of the box.
Let's talk a bit more about SONiC...
For one, SONiC explicitly states that pull requests that aren't already planned and approved will not be accepted[1]. This defeats a good chunk of the value of having a community project. People will want to contribute and extend your platform in ways you never thought of, and they'll do it in a completely decentralized fashion.
Another issue I see is that SONiC holds everything back on an old Linux kernel and ships random, unvetted BSP blobs. This is a nasty combination for anyone who wants to consider their NOS trusted or secure. They're on a 4.9.x kernel, and while that is still maintained, it is far from the best option if you want to take advantage of innovation in Linux networking.
I'm also generally confused about why this whole project isn't just "let's get the networking tools and hardware support into standard Linux distributions and leverage their tooling and communities". This was also a problem I had with Cumulus. When I tore apart Cumulus, I figured out that it was less than a dozen unique tools and a distribution rebuilt for 32-bit MIPS and PowerPC. It was pretty trivial to rebase to standard Fedora or Debian and get a better platform out of it.
And finally, I don't think this provides any real innovation. It's not really different from Cumulus, Open Network Linux, and others. And ONL is actually using more up-to-date kernels (5.4.x as of right now!) and offers better networking tools!
What I would love to see is all these people who keep doing this crap working in the actual Linux distribution communities to build and integrate with upstream projects so that everyone downstream gets all kinds of flexibility.
Imagine if you had a flavor of Fedora CoreOS for your network gear! The immutable OS, updated with RPM-OSTree, fresh software stack, and broad hardware support, all in one neat package.
If we treated the network gear like weaker servers, instead of specialty equipment, there's so many more interesting things you can do!
[1]: https://github.com/Azure/SONiC/wiki/Sonic-Roadmap-Planning
Almost all of which are open source (or at least "source available"), with the exception of switchd, which cannot be open sourced because it links with proprietary ASIC SDKs. I don't see how having very few custom tools on top of a vanilla Linux distribution is a bad thing.
>It was pretty trivial to rebase to standard Fedora or Debian and get a better platform out of it.
If you enable upstream Debian apt sources in your sources.list then it effectively is standard Debian - plus switchd.
Of course it is entirely possible to take all of the components of Cumulus Linux and use them on a separate operating system - enter sonic, vyos, etc - so if you build out such a system which can also drive ASICs and that you prefer over Cumulus, you can take full advantage of all of Cumulus's open source contributions.
>What I would love to see is all these people who keep doing this crap working in the actual Linux distribution communities to build and integrate with upstream projects so that everyone downstream gets all kinds of flexibility
If I read you correctly, Cumulus works upstream as much as it can:
~/linux$ git log --author "cumulusnetworks.com" --oneline | wc -l
773
~/ifupdown2$ git log --author "cumulusnetworks.com" --oneline | wc -l
1265
~/frr$ git log --author "cumulusnetworks.com" --oneline | wc -l
8107
I like to believe Cumulus is quite active in the communities of projects it uses. I feel I may have misunderstood your point, though.

>If we treated the network gear like weaker servers, instead of specialty equipment, there's so many more interesting things you can do!
I completely agree, that's the dream!
Disclaimer: I work at Cumulus
The problem historically with Cumulus on this was that it was heavily obfuscated. In the past, when I talked to Cumulus sales folks, it was not quite as honest as what you've said.
I don't have a problem with the "shipping a Linux distribution you can support" thing. I have a problem with "not making it so the stuff you have is available everywhere (i.e. push into Fedora _and_ Debian to feed into all distros and ecosystems)".
> If I read you correctly, Cumulus works upstream as much as it can. I like to believe Cumulus is quite active in the communities of projects it uses. I feel I may have misunderstood your point, though.
Cumulus is actually a nice exception to this rule. Most Linux-based network operating systems do not bother (including SONiC, VyOS, EOS, etc), but Cumulus does good work here. My only complaint is the focus on ifupdown2 instead of helping make cross-distro tools like NetworkManager support these things. It's been a long time since NetworkManager was only for desktop-only use-cases and only did Wi-Fi. It's the standard tool on a wide range of distributions and supports server use-cases very well. I personally use it over ifupdown and netconfig on my systems.
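As a sketch of what I mean, the kind of interface config ifupdown2 handles can already be expressed with stock nmcli (standard NetworkManager commands; the interface names and addresses here are made up for illustration):

```
# create a bridge, enslave a front-panel port, and give it an address
$ nmcli connection add type bridge ifname br0 con-name br0
$ nmcli connection add type bridge-slave ifname eth1 master br0
$ nmcli connection modify br0 ipv4.method manual ipv4.addresses 192.0.2.1/24
$ nmcli connection up br0
```

If the switch-specific pieces (lane setup, ASIC offload) lived in a NetworkManager plugin instead of a bespoke tool, every distro would get them for free.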
Random note. I worked at a few switching startups. At one, we always ran our own latest code. After an update to a core switch everything looked good, but then people started to complain things were very slow. Went looking. The switch looked fine, but it was dropping traffic towards the CPU, which should not happen. Checking the cacti graphs for that switch (10-second polling), all the graphs for the ports between the different networks were exactly the same flat line, maxed at 134 MB/s on 10G ports. Hmm, strange. Hold on, that sounds like the max bandwidth between the ASIC and the CPU port! Let's check some bits in the ASIC configuration. Yup. The new build forgot to set hardware routing on in the pipeline, so every packet was punted to the CPU for route processing. Luckily the control-plane policy had STP etc. packets at a higher queue. Tweak the bit, blam, graphs go to 11 :) File bug.
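To make the failure mode concrete, here's a toy model of it. The numbers are illustrative, taken from the ~134 MB/s observation above rather than any datasheet: with the hardware-routing bit unset, every packet crosses the ASIC-to-CPU port, so forwarding is capped at that link instead of line rate.

```python
LINE_RATE_MBPS = 10_000      # 10G front-panel port
CPU_PORT_MBPS = 134 * 8      # ~134 MB/s ASIC-to-CPU path, as observed

def forwarding_rate_mbps(offered_mbps: float, hw_routing: bool) -> float:
    """Achievable forwarding rate: line rate if the ASIC routes in
    hardware, otherwise capped by the punt path to the CPU."""
    cap = LINE_RATE_MBPS if hw_routing else CPU_PORT_MBPS
    return min(offered_mbps, cap)

# With the bit set, a 10G port forwards at line rate; without it,
# anything above ~1 Gbps is lost on the punt path.
print(forwarding_rate_mbps(10_000, hw_routing=True))   # 10000
print(forwarding_rate_mbps(10_000, hw_routing=False))  # 1072
```

That ~1 Gbps ceiling is exactly the "same flat line on every graph" signature that gave the bug away.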
The bottleneck affects not only networking but the GPU industry as well. That's probably the main reason Nvidia bit the bullet and bought the major InfiniBand player Mellanox in a deal worth close to USD 7 billion. The bottleneck is only just bearable for video and games, but not when you have to scale the processing of big-data AI and machine-learning applications.
[1] https://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120....
Right now, I believe it’s limited to fixed systems, but work is ongoing to get SONiC running on the distributed chassis. Exciting stuff!
[1]: https://www.cisco.com/c/en/us/products/routers/8000-series-r...
[2]: https://blogs.cisco.com/sp/cisco-goes-sonic-on-cisco-8000
I teach a computer networking lab, but for now we have to resort to dual-booting between the Linux Switch Appliance (LISA), which uses a custom kernel, and a vanilla kernel running Quagga, on a multi-port Ethernet embedded PC. The good thing is that both LISA and Quagga use a CLI environment similar to the familiar Cisco switch/router IOS. I really wish there were an open source alternative that could seamlessly support layer 2 and layer 3 without dual booting, perhaps using Software Defined Networking (SDN) concepts with an intuitive CLI.
Shortest Path Bridging (SPB) has been integrated into 802.1Q as of 2018 (bye-bye TRILL). I reckon any reasonably good Linux-based open-source layer 2 and 3 network OS would become extremely popular overnight for enterprise, consumer, and education use. Together with eBPF, this thing should fly on the new off-the-shelf whiteboxes supporting multi-port 40 Gbps Ethernet. Imagine a LAN or metro-LAN party with this beast :-)
[1]: https://archive.fosdem.org/2019/schedule/event/from_closed_t...
More like a "switch" operating system - it looks like SONiC is Microsoft's answer to ONL, Stratum, OpenSwitch, Cumulus et al. Basically open source software to run on your cheap whitebox (or expensive greybox) switch.
https://www.servethehome.com/get-started-with-40gbe-sdn-with...
Here is a cheatsheet:
https://cumulusnetworks.com/learn/resources/cheatsheets/nclu
Also, Cumulus runs FRR (as does SONiC), but on Cumulus you can do sudo vtysh and pretty much be at a routing CLI like you are at an IOS prompt.
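For example, a vtysh session feels very IOS-like (illustrative transcript; the commands are standard FRR, but exact output and supported syntax depend on the FRR version):

```
cumulus@switch:~$ sudo vtysh

switch# show ip bgp summary
switch# show ip route
switch# configure terminal
switch(config)# router bgp 65001
switch(config-router)# neighbor swp1 interface remote-as external
switch(config-router)# end
switch# write memory
```

Anyone who knows the Cisco idiom can sit down at this prompt and be productive immediately.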
SONiC sticks all its configuration in JSON files spread across different docker containers, and it can be a real pain. Also, not all commands are hitless, i.e. some will restart forwarding. That is being worked on. SONiC is pretty much "what MS wanted for large-scale ops" and is still rough around the edges for enterprise IT.
There are a number of companies that support SONiC in production enterprise environments, such as Dell and Apstra.
All configuration goes in a single JSON file, which is used to configure the docker containers that manage the switching hardware.
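For reference, a minimal config_db.json fragment looks roughly like this (field names are based on public SONiC examples; exact tables and attributes vary by release and platform):

```json
{
    "PORT": {
        "Ethernet0": {
            "alias": "Eth1/1",
            "lanes": "65,66,67,68",
            "speed": "40000",
            "admin_status": "up"
        }
    },
    "INTERFACE": {
        "Ethernet0|192.0.2.1/31": {}
    }
}
```

The lanes here also have to agree with the platform's port definitions, which is part of the "lining everything up" pain mentioned upthread.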
https://medium.com/@Meela349588204/using-cows-to-explain-the...
If it runs on a $10 raspberry pi it will run fine on a $20,000 switch.
And then there are the high-availability (HA) requirements which typically lead to redundancy in software and hardware.
Cisco NXOS is Yocto. They planned to move to Fedora at some point. Might have by now.
Cumulus is Debian based. switchd is closed sourced ASIC driver.
Junos is now a FreeBSD VM running on a Linux-based host; not sure what version.
Sonic is Debian based IIRC.
OS10 (Dell / Force10) was NetBSD-based, but I think with OPX (the open-source OS10 that SONiC will replace - personal opinion) it moved to Linux.
FoundryOS was custom (VxWorks?). The current version is Broadcom Strata.
Extreme original was VXworks. Current is Linux based.
Cisco IOS XE/XR is Linux (Debian IIRC).
SwitchLight is Linux as well, as is BSNOS with their OpenFlow stuff on top.
Ubiquiti is Vyatta running on Linux.
That is a quick dump from meat cache.
Linux (and other Unix or Unix-like) kernels (and indeed full OS distributions) run fine on many low-end and embedded CPUs and hardware, and network switches are no exception.
OpenWRT is Linux-based and runs on extremely low-end switches such as home routers and access points.
Arista EOS is based on Fedora. (Of course Arista switches have real server CPUs and lots of memory. People do crazy things like running KVM on them.)
Juniper's Junos is based on BSD.
Remember that on a high-speed switch packets usually pass through the switching hardware without touching the switch CPU. Programmable switching chips like Tofino typically run pre-compiled pipelines that execute on-chip at line rate. The switch OS is primarily used for running management software that programs the hardware, runs the CLI, and/or provides other services. The OS can also run software to provide higher-level protocols and services such as BGP, DNS, or DHCP.
Most network equipment runs its control plane on standard Intel CPUs, so running Linux isn't much of a stretch.
Ten years ago switches were using 800 MHz single-core PowerPCs which was adequate to run Linux (although many were using VxWorks or whatever). Now the $400 switches are still wimpy but more expensive "disaggregated" switches are using Atoms or low-end Xeons.