Security is an Architectural Issue
Security is a topic that we introduced to Computer Networks: A Systems Approach back in the second edition, i.e., in the late 1990s. We took a network-centric approach to the topic, which made sense for a networking textbook, but clearly network security is only a small part of the total cybersecurity landscape. A book on the Systems Approach to security sits high on our to-do list. This week’s newsletter is an effort to capture some of the ideas that will eventually form the basis for such a book.
I’ve been interested in architecture–of the physical building variety, as distinct from computer or network architecture–for as long as I can remember. So I was pretty excited when I got to work in a Frank Gehry-designed building at MIT in the late 2000s. As it turns out, the building is something of a case study in the perils of high-profile architecture, with a litany of defects including mold, ice falling on passers-by from the roof, and a conference room that made people (including both me and Frank Gehry) seasick. While MIT eventually settled a lawsuit against Gehry and the builders, it was never entirely clear how many of the issues were a matter of design versus implementation. But it was pretty clear that architectural decisions have significant implications for those who have to live with them.
Which brings us to the Internet and its architectural shortcomings. While the Internet has been hugely successful in almost every dimension, even those most closely associated with it have pointed out that it lacked a solid architectural foundation on the matter of security. Vint Cerf, for example, argued that the Internet’s original architecture had two basic flaws: too little address space, and no security. David Clark, the “architect of the Internet”, suggested that how we apply the principle known as the “end-to-end argument” to the Internet should be rethought in the light of what we now know about security and trust (among other things).
To paraphrase the concerns raised by Internet pioneers, the Internet has done really well at connecting billions of people and devices (now that the address space issues are dealt with in various ways), but it remains quite flawed in terms of security. The original design goal of making it easy for a distributed set of researchers to share access to a modest number of computers didn’t require much security. The users mostly trusted each other, and security could be managed on end-systems rather than being a feature of the network. In 1988, the Morris Worm famously illustrated the limitations of depending on end-system security alone. So today we have an architecture where the default is that every device can talk to every other device, and any time we want to enforce some other behavior, we need to take some specific action–like inserting a firewall and explicitly blocking all traffic except some specified subset. And that approach of adding point fixes, like firewalls, has led to a proliferation of security devices and technologies, none of which really changes the architecture, but which does increase the overall complexity of managing networks.
A few significant developments in the last decade give me cause for optimism. One is the emergence of “zero trust” approaches to security, which pretty much inverts the original security approach of the Internet. The term was coined at Forrester in 2009 and can be thought of as a corollary to the principle of least privilege laid out by Saltzer and Schroeder in 1975:
“Every program and every user of the system should operate using the least set of privileges necessary to complete the job.”
Rather than letting every device talk to every other device, zero trust starts from the assumption that no device should be trusted a priori, but only after some amount of authentication does it get access to a precisely scoped set of resources–just the ones necessary to complete the job.
Zero trust implies that you can no longer establish a perimeter firewall and let everything inside that perimeter have unfettered access to everything else. This idea has been adopted by approaches such as Google’s BeyondCorp, in which there is no concept of a perimeter and every system access is controlled by strict authentication and authorization procedures. From my perspective, the ability to enforce zero trust has also been one of the major benefits of software-defined networking (SDN) and network virtualization.
In the early days of network virtualization, my Nicira colleagues had a vision that everything in networking could eventually be virtualized. At the time I joined the team, the Nicira product had just virtualized layer 2 switching and layer 3 routing was about to ship. It took a little while after the VMware acquisition of Nicira for us to make our way up to layer 4 with the distributed firewall, and in my mind that was the critical step to making a meaningful impact on security. Now, rather than putting a firewall at some choke point and forcing traffic to pass through it, we could specify a precise set of policies about which devices (typically virtual machines in those days) could communicate with each other and how. Rather than operate with “zones” in which lots of devices that didn’t need access to each other nevertheless could communicate, it was now a relatively simple matter to specify precise and fine-grained security policies regarding how devices should communicate.
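To make the contrast with zones concrete, here is a minimal Python sketch of a default-deny, allow-list policy of the kind a distributed firewall enforces. The VM names, groups, and ports are hypothetical, and this illustrates the least-privilege principle only; it is not the NSX distributed firewall's actual rule model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An explicit allow rule: VMs in src_group may reach dst_group on port."""
    src_group: str
    dst_group: str
    port: int


# Hypothetical inventory: which security group each VM belongs to.
VM_GROUPS = {
    "web-1": "web",
    "web-2": "web",
    "app-1": "app",
    "db-1": "db",
}

# Only the flows the application actually needs (least privilege).
ALLOW_RULES = {
    Rule("web", "app", 8080),
    Rule("app", "db", 5432),
}


def is_allowed(src_vm: str, dst_vm: str, port: int) -> bool:
    """Default deny: a flow passes only if an explicit rule permits it."""
    src = VM_GROUPS.get(src_vm)
    dst = VM_GROUPS.get(dst_vm)
    if src is None or dst is None:
        return False  # unknown VMs are never trusted
    return Rule(src, dst, port) in ALLOW_RULES


if __name__ == "__main__":
    print(is_allowed("web-1", "app-1", 8080))  # True
    print(is_allowed("web-1", "db-1", 5432))   # False: web may not reach db directly
    print(is_allowed("web-1", "web-2", 22))    # False: sharing a "zone" grants nothing
```

Note that two web servers cannot talk to each other at all unless a rule says so, which is exactly what a perimeter or zone model cannot express.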
A similar story played out with SD-WAN. There are lots of reasons SD-WAN found a market, but one of them was that you no longer had to backhaul traffic from branch offices to some central firewall to apply your security policy. Instead you could specify the security policy centrally but implement it out at the branches–a significant win as more and more traffic headed for cloud services rather than centralized servers in a corporate data center.
This paradigm of specifying policy centrally and having software systems that implement it in a distributed manner also applies to securing modern, distributed applications. Service meshes are an emerging technology that applies this paradigm, and a topic that we’ll go deeper on in an upcoming post.
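The “specify centrally, enforce everywhere” pattern can be sketched as a small compiler that expands group-level rules into concrete rules for each enforcement point. This is a hypothetical illustration: the inventory, host names, and the choice to enforce each rule on the destination VM's host are assumptions made for the sketch, not how any particular SD-WAN product or service mesh works.

```python
from collections import defaultdict

# Centrally specified policy: (source group, destination group, port).
CENTRAL_POLICY = [
    ("web", "app", 8080),
    ("app", "db", 5432),
]

# Hypothetical inventory: VM -> (group, host where the VM runs).
INVENTORY = {
    "web-1": ("web", "host-a"),
    "app-1": ("app", "host-a"),
    "app-2": ("app", "host-b"),
    "db-1": ("db", "host-b"),
}


def compile_policy(policy, inventory):
    """Expand group-level rules into concrete per-host rule sets.

    Each concrete rule is installed only on the host where its destination
    VM runs, so enforcement is distributed and nothing has to be
    backhauled to a central firewall.
    """
    per_host = defaultdict(set)
    for src_group, dst_group, port in policy:
        srcs = [vm for vm, (group, _) in inventory.items() if group == src_group]
        for dst_vm, (group, host) in inventory.items():
            if group != dst_group:
                continue
            for src_vm in srcs:
                per_host[host].add((src_vm, dst_vm, port))
    return dict(per_host)


if __name__ == "__main__":
    for host, rules in sorted(compile_policy(CENTRAL_POLICY, INVENTORY).items()):
        print(host, sorted(rules))
```

Adding a VM to a group changes only the inventory; recompiling pushes the right rules to the right places, rather than requiring someone to hand-edit a firewall at a choke point.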
So while it is too early to declare success on the security front, I do think there are reasons for optimism. We don’t just have an ever-expanding set of point fixes to an architectural issue. We actually have some solid architectural principles (least privilege, zero trust) and significant technological advances (SDN, intent-based networking, etc.) that are helping to reshape the landscape of security.
Earlier this month I wrote a blog post recounting the history of OpenFlow at ONF. It got me thinking about how to have impact, and potentially change the course of technology. The OpenFlow experience makes for an interesting case study.
For starters, the essential idea of OpenFlow was to codify what at the time was a hidden but critical interface in the network software stack: the interface that the network control plane uses to install entries in the forwarding information base (FIB) that implements the switch data plane. The over-the-wire details of OpenFlow don’t matter in the least (although there were plenty of arguments about them at the time). The important contributions were (a) to recognize that the control/data-plane interface is pivotal, and (b) to propose a simple abstraction (match/action flow rules) for that interface.
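The match/action abstraction is simple enough to capture in a few lines. The sketch below is a hypothetical model of a single flow table, not OpenFlow's wire format or message set: field names like `ip_dst` are illustrative, absent match fields act as wildcards, the highest-priority matching entry wins, and a table miss is handed to the controller, loosely following early OpenFlow behavior.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    priority: int
    match: dict    # field name -> required value; absent fields are wildcards
    actions: list  # e.g. ["output:2"] or ["drop"]


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        """What the control plane does through the control/data-plane interface."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: dict) -> list:
        """Data-plane behavior: the highest-priority matching entry wins."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        return ["controller"]  # table miss: punt to the control plane


if __name__ == "__main__":
    table = FlowTable()
    table.install(FlowEntry(10, {"ip_dst": "10.0.0.2"}, ["output:2"]))
    table.install(FlowEntry(20, {"ip_dst": "10.0.0.2", "tcp_dst": 23}, ["drop"]))
    print(table.lookup({"ip_dst": "10.0.0.2", "tcp_dst": 80}))  # ['output:2']
    print(table.lookup({"ip_dst": "10.0.0.2", "tcp_dst": 23}))  # ['drop']
    print(table.lookup({"ip_dst": "10.0.0.9"}))                 # ['controller']
```

Everything a control application does reduces to calling `install`; everything the switch does reduces to `lookup`. That narrow seam is what made it possible to innovate on either side independently.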
In retrospect, one of the secrets of OpenFlow’s success was its seemingly innocuous origins. The original paper, published in ACM SIGCOMM’s CCR (2008), was a Call-to-Action for the network research community, proposing OpenFlow as an experimental open interface between the network’s control and data planes. The goal was to enable innovation, which at the time included the radical idea that anyone—even researchers—should be able to introduce new features into the network control plane. Care was also taken to explain how such a feature could be deployed into the network without impacting production traffic, mitigating the risks such a brazen idea could inflict on the network.
It was a small opening, but a broad range of organizations jumped into it. A handful of vendors added an “OpenFlow option” to their routers; the National Science Foundation (NSF) funded experimental deployments on university campuses; and Internet2 added an optional OpenFlow substrate to their backbone. ONF was formed to provide a home for the OpenFlow community and ON.Lab started releasing open source platforms based on OpenFlow. With these initiatives, the SDN transformation was set in motion.
Commercial adoption of SDN was certainly an accelerant, with VMware acquiring the startup Nicira and cloud providers like Google and Microsoft talking publicly about their SDN-based infrastructures (all in 2012), but this was a transformation that got its start in the academic research community. Over time some of the commercial successes have adapted SDN principles to other purposes—e.g., VMware’s NSX supports network virtualization through programmatic configuration, without touching the control/data plane interface of physical networking equipment—but the value of disaggregating the network control/data planes and logically centralizing control decisions proved long lasting, with OpenFlow and its SDN successors running in datacenter switching fabrics and Telco access networks today.
The original proposal did not anticipate where defining a new API would take the industry, but the cascading of incremental impact is impressive (and perhaps the most important takeaway from this experience). Originally, OpenFlow was conceived as a way to innovate in the control plane. Over time, that flexibility put pressure on chip vendors to also make the data plane programmable, with the P4 programming language (and a toolchain to auto-generate the control/data plane interface) now becoming the centerpiece of the SDN software stack. It also put pressure on switch and router vendors to make the configuration interface programmable, with gNMI and gNOI now replacing (or at least supplementing) the traditional router CLI.
OpenFlow was also originally targeted at L2/L3 switches, but the idea is now being applied to the cellular network. This is putting pressure on the RAN vendors to open up and disaggregate their base stations. The 5G network will soon have a centralized SDN controller (renamed RAN Intelligent Controller), hosting a set of control applications (renamed xApps), using a 3GPP-defined interface (in lieu of OpenFlow) to control a distributed network of Radio Units. SD-RAN is happening, and has the potential to be a killer app for SDN.
One of the more interesting aspects of all of this is what happened to OpenFlow itself. The specification iterated through multiple versions, each enriching the expressiveness of the interface, but also introducing vendor-championed optimizations. This led to data plane dependencies, an inherent risk in defining what is essentially a hardware abstraction layer on top of a diverse hardware ecosystem. P4 is a partial answer to that. By coding the data plane’s behavior in a P4 program (whether that program is compiled into an executable image that can be loaded into the switching chip or merely descriptive of a fixed-function chip’s behavior) it is possible to auto-generate the control/data plane interface (known as P4Runtime) in software, instead of depending on a specification that evolves at the pace of standardization. (This transition to P4 as a more effective embodiment of the control/data plane interface is covered in our SDN book.)
It is now the case that the network—including the control/data plane interface—can be implemented, from top to bottom, entirely in software. OpenFlow served its purpose bootstrapping SDN, but even the Open Networking Foundation is shifting its focus from OpenFlow to P4-based SDN in its new flagship Aether project. Marc Andreessen's famous maxim that "software is eating the world" is finally coming true for the network itself!
A well-placed and smartly-defined interface is a powerful catalyst for innovation. OpenFlow has had that effect inside the network, with the potential to replicate the success of the Socket API at the edge of the network. Sockets defined the demarcation point between applications running on top of the Internet and the details of how the Internet is implemented, kickstarting a multi-billion dollar industry writing Internet (now cloud) applications. Time will tell how the market for in-network functionality evolves, but re-architecting the network as a programmable platform (rather than treating it as plumbing) is an important step towards improving feature velocity and fostering the next generation of network innovation.
The Accidental SmartNIC
The moment when the current generation of SmartNICs really captured my attention was during a demo at VMworld 2019. At the time, ESXi was formally supported on x86 processors only, but there had been a skunkworks project to run ESXi on ARM for several years. Since most SmartNICs have an ARM processor, it was now possible to run ESXi on it. I do remember thinking “just because you can do something doesn’t mean you should” but it made for a fun demo.
This certainly wasn’t my first exposure to SmartNICs. As a member of the networking team at VMware, I was periodically visited by SmartNIC vendors who wanted to offer their hardware as a way to improve the performance of virtual switching. And AWS had been subtly incorporating them into their EC2 infrastructure since about 2014 via the “Nitro” System. But as I looked more closely at SmartNIC architectures, I realized that I had actually been involved in an earlier incarnation of the technology in the 1990s–not that we called them SmartNICs then. Even the term NIC was not yet standard terminology. Below is a slightly prettified diagram from a paper I published in SIGCOMM in 1991.
If you compare this to the block diagram of a current generation SmartNIC (e.g., here), you will see some pretty remarkable similarities. Of course you need to connect to a host bus on one side; that’s likely to be PCIe today. (That choice was much less obvious in 1990.) And you need the necessary physical and link layer hardware to connect to your network of choice; today that’s invariably some flavor of Ethernet, whereas in 1990 it still seemed possible that ATM would take off as a local area network technology (it didn’t). In between the host and the network, there’s one or more CPUs, and some programmable hardware (FPGA). It’s the programmability of the system, delivered by the CPU and FPGA, that makes it “Smart”.
To be clear, I definitely didn’t invent the SmartNIC. The earliest example that I can find was described by Kanakia and Cheriton in 1988. Other researchers around this time took a similar approach. There was a reason we gravitated towards designs that were relatively expensive but highly programmable: we didn’t yet know which functions belonged on the NIC. So we kept our options open. This gave us the ability to move functions between the host and the NIC, to experiment with new protocols, and to explore new ways of delivering data efficiently to applications. This was essentially my introduction to the systems approach to networking: building a system to experiment with various ways of partitioning functionality among components, and seeking an approach that would address end-to-end concerns such as reliability and performance. I was fortunate to be influenced in the design of my “SmartNIC” by David Clark, the “architect of the Internet” and co-author of the end-to-end argument, and this work also led to my collaboration with Larry Peterson.
The 1990s, in retrospect, was a time when a lot of questions about networking were still up for debate. As we tried to achieve the then-crazy goal of delivering a gigabit per second to a single application, there was a widespread concern that TCP/IP would not be up to the task. Perhaps we needed completely new transport protocols, or a new network layer (e.g., ATM). Perhaps transport protocols were so performance-intensive that they needed to be offloaded to the NIC. With so many open questions, it made sense to design NICs with maximum flexibility. Hence the inclusion of a pair of CPUs and some of the largest FPGAs available at the time.
By the 2000s, many of these networking questions were addressed by the overwhelming success of the Internet. TCP/IP (with Ethernet as the link layer) became the dominant networking protocol stack. There turned out to be no problem getting these protocols from the 1970s to operate at tens of gigabits per second. Moore’s law helped, as did the rise of switched Ethernet and advances in optical transmission. As the protocols stabilised, there wasn’t so much need for flexibility in the NIC, and hence fixed-function NICs became the norm.
Jump ahead another ten years, however, and fixed-function NICs became a liability as new approaches to networking emerged. By 2010 NICs frequently included some amount of “TCP offload”, echoing one of the concerns raised in the 1990s. These offloads left hosts free to transfer large chunks of data to or from the NIC while the NIC added the TCP headers to segments on transmit and parsed them on receipt. This was a performance win, unless you wanted anything other than a simple TCP/IP header on your packets, such as an extra encapsulation header to support network virtualization. The optimization of performance for the common case turned into a huge handicap for innovative approaches that couldn’t leverage that optimization. (My colleagues at Nicira found some creative solutions to this problem, ultimately leading to the GENEVE encapsulation standard).
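The sketch below illustrates both halves of that story under simplifying assumptions. It models a hypothetical fixed-function NIC that splits a large host buffer into MSS-sized TCP segments, but only when the packet carries exactly the header stack it was built to parse. Headers are represented as strings and each segment as a (sequence number, payload) pair; real segmentation offload and the GENEVE format are considerably more involved.

```python
from typing import List, Optional, Tuple

# The only header stack this hypothetical fixed-function NIC can parse.
SUPPORTED_STACK = ("eth", "ipv4", "tcp")


def segment(payload: bytes, mss: int, start_seq: int) -> List[Tuple[int, bytes]]:
    """Split one large host buffer into MSS-sized TCP segments.

    Each tuple stands in for a segment the NIC would emit: the sequence
    number it would write into the TCP header, plus the payload chunk.
    """
    return [
        (start_seq + off, payload[off:off + mss])
        for off in range(0, len(payload), mss)
    ]


def transmit(headers: Tuple[str, ...], payload: bytes, mss: int,
             seq: int) -> Optional[List[Tuple[int, bytes]]]:
    """Offload segmentation only when the header stack is the expected one."""
    if headers != SUPPORTED_STACK:
        # e.g. an extra encapsulation header for network virtualization:
        # the NIC cannot parse it, so the host must segment in software.
        return None
    return segment(payload, mss, seq)


if __name__ == "__main__":
    segs = transmit(("eth", "ipv4", "tcp"), b"x" * 4000, mss=1460, seq=1000)
    print([(s, len(d)) for s, d in segs])  # [(1000, 1460), (2460, 1460), (3920, 1080)]
    tunneled = ("eth", "ipv4", "udp", "geneve", "eth", "ipv4", "tcp")
    print(transmit(tunneled, b"x" * 4000, 1460, 1000))  # None
```

The fast path is only fast for the exact case it was designed for; anything new, such as a tunnel header, falls off it entirely.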
As networking became more dynamic with the rise of SDN and network virtualization (and the parallel rise of software-defined storage) it started to become clear that once again the functions of a NIC could not be neatly tied down and committed to fixed-function hardware. And so the pendulum swung back to where it had been in the 1990s, where the demand for flexibility warranted NIC designs that could be updated at software speeds–leading to what we might call the second era of SmartNICs. This time, it’s the need to efficiently support network virtualization, security features, and flexible approaches to storage that demands highly capable NICs. While all these functions can be supported on x86 servers, it’s increasingly cost-effective to move them onto a SmartNIC that is optimized for those tasks and still flexible enough to support rapid innovation in cloud services. This is why you see projects like AWS Nitro, Azure Accelerated Networking, and VMware’s Project Monterey all moving functions that you expect to see in a hypervisor to the new generation of SmartNICs.
Why did I title this post “The Accidental SmartNIC”? Because I wasn’t trying to make a SmartNIC, there was just so much uncertainty about the right way to partition our system that I needed a high degree of flexibility in my design. (It’s also a nod to the excellent film “The Accidental Tourist”.) Determining how best to distribute functionality across components is a core aspect of the systems approach. Today’s SmartNICs exemplify that approach by allowing complex functions to be moved from servers to NICs, meeting the goals of high performance, rapid innovation, and cost-effective use of resources. Building a platform that supports innovation is a common goal in systems research and we see that playing out today as SmartNICs take off in the cloud.
Defining "A Systems Approach"
Last week we noticed that our book, Computer Networks: A Systems Approach, was discussed in a thread on Hacker News. It was nice to see mostly positive commentary, but we also noticed a fairly involved debate about the meaning of “Systems Approach”. Some readers had a pretty good idea of what we meant, others mistakenly took it for a reference to “Systems and Cybernetics”, which we definitely never intended. Others interpreted it as an empty, throw-away term. “Don’t these people read prefaces?” we thought, before remembering that we dropped the definition from the latest edition, thinking it was old news. Clearly, we had been making some assumptions that left many of our readers in the dark. Rather than just rescuing the old preface from the recycling bin, we thought it would be timely to revisit the meaning of “Systems Approach” as we’re now building a whole series of books around that theme.
The term “Systems” is used commonly by computer science researchers and practitioners who study the issues that arise when building complex computing systems such as operating systems, networks, distributed applications, and so on. At MIT, for example, there is a famous class 6.033: Computer System Design (with an excellent accompanying book) that is a typical introduction to the systems field. The required reading list is a tour through some of the most influential systems papers. The key to the systems approach is a "big picture" view – you need to look at how the components of a system interact with each other to achieve an overall result, rather than fixating on a single component (either unnecessarily optimizing it or trying to solve too many problems in that one component). This is one of the important takeaways of the End-to-End Argument, a landmark paper for system design.
A systems approach also has a strong focus on real-world implementation, with the Internet being the obvious example of a widely-deployed, complex networking system. This seems incredible now, but when we wrote our first edition in 1995, it was not yet obvious that the Internet would be the most successful networking technology of all time, and organising our book around the principles that underlie the design and implementation of the Internet was a novel idea.
The Systems Approach is a methodology for designing, implementing, and describing computer systems. It involves a specific set of steps:
In following this methodology, there are requirements that come up again and again. Scalability is an obvious example, and appears as a key design principle throughout networking, e.g., in the partitioning of networks into subnets, areas, and autonomous systems to scale the routing system. A good example of cross-disciplinary systems thinking is the importing of techniques developed to scale distributed systems such as Hadoop to solve scaling challenges in software-defined networking.
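As a small illustration of how hierarchy helps routing scale, Python's standard `ipaddress` module can show several contiguous subnets collapsing into a single aggregate prefix. The addresses are made up; the point is that routers outside the area need to carry one route instead of four.

```python
import ipaddress

# Hypothetical: four subnets allocated contiguously within one area.
subnets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# Outside the area, routers need only the aggregate, not every subnet.
aggregate = list(ipaddress.collapse_addresses(subnets))
print(aggregate)  # [IPv4Network('10.1.0.0/22')]

# Any host in any of the subnets is still covered by the single route.
print(ipaddress.ip_address("10.1.3.77") in aggregate[0])  # True
```

Aggregation only works because addresses were assigned with the hierarchy in mind, which is a design decision made long before any router sees a packet.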
Generality is another common requirement: the way that the Internet was designed to be completely agnostic to the applications running over it and the class of devices connected to it distinguishes it from networks like the phone network and the cable TV network, whose functionality has now been largely subsumed by the Internet.
And there are a set of system-agnostic design principles that are used extensively to guide systems designers. They are not mathematically rigorous (compared to, say, Maxwell’s Equations or the Shannon-Hartley theorem) but are considered best practices:
To see how the Systems Approach applies to networking, and to our books, notice that we start every chapter in Computer Networks with a problem statement. In chapter 1 we go on to develop requirements for a global network that meets the needs of various stakeholders, satisfies scaling objectives, manages resources cost-effectively, and so on. Even though the Internet is already built, we’re walking the reader through the system design process that led to it being a certain way, so that they are learning systems principles and best practices like those mentioned above. We call many of these out explicitly in “Bottom Line” comments such as this one.
One of the most challenging aspects of teaching people about networking is deciding how to handle layering. On the one hand, layering is a form of abstraction–a fine system design principle. On the other hand, layering can sometimes prevent us from thinking about how best to implement the system as a whole. For example, in recent years it’s become clear that HTTP, an application layer protocol, and TCP, a transport layer protocol, don’t work terribly well together from a performance perspective. Optimizing each independently could only take us so far. Ultimately by looking at them as parts of a system that needs to deliver reliability, security, and performance to applications, both HTTP and the transport layer evolved, with QUIC being the new entrant to the transport layer. What we have tried to do is give readers the tools to see where such system-level thinking can be applied, rather than just teach them that the 7-layer model was handed down from on high and can’t be touched.
Hopefully this helps give some clarity around what we mean by “A Systems Approach”. It’s certainly a way of thinking that becomes natural over time, and we hope that as you read our books it will become part of your thinking as well.
Congestion control has been on my mind a lot over the last year, given our dependence on a smooth-running Internet to work from home, connect with friends, and watch a lot of streaming entertainment. But it is particularly on our minds at Systems Approach, LLC as we (Larry Peterson, Larry Brakmo and I) have started the latest book in our series, focussed on Congestion Control. With that in mind, I wanted to revisit some content I worked on over the past few months.
Back in another era, at the end of 2019, I was writing one of those end-of-year prediction posts, and I decided to say something about the future of the Internet. I don’t regard myself as especially bold when it comes to predictions – unlike the inspiration for my post, the inventor of Ethernet, Bob Metcalfe. Metcalfe famously predicted the collapse of the Internet in 1995 and publicly ate his printed words when proven wrong. My prediction was “I confidently predict that the Internet is not going to collapse in my lifetime”, which on reflection was a bolder statement than I realised at the time.
Little did I know how much the world would change within a few months of that prediction. As COVID-19 spread across the globe, a large proportion of the world’s workforce moved to working from home, Zoom and Slack became primary means of communication for knowledge workers, and online video streaming – which was already 60% of Internet traffic pre-COVID – suddenly became massively more important to hundreds of millions of people.
Adding capacity in advance
To the surprise of many, the Internet has handled this incredibly well. The Atlantic did a good piece on this, pointing out that it was built to withstand a wide range of failures (although the claim that it was supposed to withstand nuclear war has been well debunked). There are a lot of reasons the Internet has fared so well, including the pioneering work on congestion control that I touched on in my pre-COVID post. But another aspect jumped out at me as I read The Atlantic: The Internet has generally been built with a lot of free capacity. This might seem wasteful, and it’s not how we build most other systems. Highways near major cities, for example, are normally full or over capacity at rush hour, and whenever a new lane is added to increase capacity, it’s not long before new traffic arrives to use up that capacity.
We were delighted to receive the foreword for our recent book Software-Defined Networks: A Systems Approach from SDN pioneer Nick McKeown. It is reproduced below. Also, if you're the sort of person who likes physical books, you can now purchase a print edition of this book on Amazon.
I got goosebumps when I saw the first Mosaic web browser in 1993. Something big was clearly about to happen; I had no idea how big. The Internet immediately exploded in scale, with thousands of new ISPs (Internet Service Providers) popping up everywhere, each grafting on a new piece of the Internet. All they needed to do was plug interoperable pieces together—off-the-shelf commercial switches, routers, base-stations, and access points sold by traditional networking equipment vendors—with no need to ask permission from a central controlling authority. The early routers were simple and streamlined—they just needed to support the Internet protocol. Decentralized control let the Internet grow rapidly.
The router manufacturers faced a dilemma: It’s hard to maintain a thriving profitable business selling devices that are simple and streamlined. What's more, if a big network of simple devices is easy to manage remotely, all the intelligence (and value) is provided by the network operator, not the router manufacturer. So the external API was kept minimal (“network management” was considered a joke) and the routers were jam-packed with new features to keep all the value inside. By the mid 2000s, routers used by ISPs were so complicated that they supported hundreds of protocols and were based on more than 100 million lines of source code—ironically, more than ten times the complexity of the largest telephone exchange ever built. The Internet paid a hefty price for this complexity: routers were bloated, power hungry, unreliable, hard to secure, and crazy expensive. Worst of all, they were hard to improve (ISPs needed to beg equipment vendors to add new capabilities) and it was impossible for an ISP to add their own new features. Network owners complained of a “stranglehold” by the router vendors, and the research community warned that the Internet was “ossified.”
This book is the story of what happened next, and it’s an exciting one. Larry, Carmelo, Brian, Thomas and Bruce capture clearly, through concrete examples and open-source code, how those who own and operate big networks started to write their own code and build their own switches and routers. Some chose to replace routers with homegrown devices that were simpler and easier to maintain; others chose to move the software off the router to a remote, centralized control plane. Whichever path they chose, open-source became a bigger and bigger part. Once open-source had proved itself in Linux, Apache, Mozilla and Kubernetes, it was ready to be trusted to run our networks too.
This book explains why the SDN movement happened. It was essentially about a change in control: the owners and operators of big networks took control of how their networks work, grabbing the keys to innovation from the equipment vendors. It started with data center companies because they couldn’t build big-enough scale-out networks using off-the-shelf networking equipment. So they bought switching chips and wrote the software themselves. Yes, it saved them money (often reducing the cost by a factor of five or more), but it was control they were after. They employed armies of software engineers to ignite a Cambrian explosion of new ideas in networking, making their networks more reliable, quicker to fix, and with better control over their traffic. Today, in 2021, all of the large data center companies build their own networking equipment: they download and modify open-source control software, or they write or commission software to control their networks. They have taken control. The ISPs and 5G operators are next. Within a decade, expect enterprise and campus networks to run on open-source control software, managed from the cloud. This is a good change, because only those who own and operate networks at scale know how to do it best.
This change—a revolution in how networks are built, towards homegrown software developed and maintained by the network operator—is called Software Defined Networking (SDN). The authors have been part of this revolution since the very beginning, and have captured how and why it came about.
They also help us see what future networks will be like. Rather than being built by plugging together a bunch of boxes running standardized interoperability protocols, a network system will be a platform we can program ourselves. The network owner will decide how the network works by programming whatever behavior they wish. Students of networking will learn how to program a distributed system, rather than study the arcane details of legacy protocols.
For anyone interested in programming, networks just got interesting again. And this book is an excellent place to start.
In 2012 we published the fifth edition of Computer Networks: A Systems Approach. As with all the prior editions, we worked with a traditional publisher who retained full copyright to the material. At the time, open source software was becoming ever more important to the networking industry. Larry had decades of involvement in open source projects such as PlanetLab and was embarking on a new initiative with the Open Networking Foundation to build an open source SDN stack, while Bruce was at Nicira contributing to the Open vSwitch project. It seemed to us that the time was right to consider how to leverage open source for networking education materials, not just for the code.
Like many textbook authors, we embarked on writing a book mostly as a labor of love. That is, we wanted to contribute to the development of the next generation of networking students, researchers, and professionals. We recognized that this desire to educate, and to reach the widest possible audience, aligned well with an open source approach. We also recognized that the book could be better if we drew on the input of the community more effectively than we were able to do in a traditional proprietary textbook.
Over the next several years we negotiated with our publisher an arrangement in which we would continue to furnish them with new editions—ideally leveraging community contributions as well as our own work—while also gaining the rights to make the entire contents of the book freely available under an open source license.
This post, originally published at the 2020 SIGCOMM Education Workshop, reports on our experiences and progress to date in developing the open source version of Computer Networks. We also offer some thoughts on future directions for networking texts, which are likely applicable to many other technical fields.
5G and the End-to-End Argument
One of the most surprising things I learned while writing the 5G Book with Oguz Sunay is how the cellular network’s history, starting 40+ years ago, parallels that of the Internet. And while from an Internet-centric perspective the cellular network is just one of many possible access network technologies, the cellular network in fact shares many of the “global connectivity” design goals of the Internet. That is, the cellular network makes it possible for a cell phone user in New York to call a cell phone in Tokyo, then fly to Paris and do the same thing again. In short, the 3GPP standard federates independently operated and locally deployed Radio Access Networks (RAN) into a single logical RAN with global reach, much as the Internet federated existing packet-switched networks into a single global network.
For many years, the dominant use case for the cellular network has been access to cloud services. With 5G expected to connect everything from home appliances to industrial robots to self-driving cars, the cellular network will be less and less about humans making voice calls and increasingly about interconnecting swarms of autonomous devices working on behalf of those humans to the cloud. This raises the question: Are there artifacts or design decisions in the 3GPP-defined 5G architecture working at cross-purposes with the Internet architecture?
Another way to frame this question is: How might we use the end-to-end argument, which is foundational to the Internet’s architecture, to drive the evolution of the cellular network? In answering this question, two issues jump out at me: identity management and session management, both of which are related to how devices connect to (and move throughout) the RAN.
The 5G architecture leverages the fact that each device has an operator-provided SIM card, which uniquely identifies the subscriber with a 15-digit International Mobile Subscriber Identity (IMSI). The SIM card also specifies the radio parameters (e.g., frequency band) needed to communicate with that operator’s Base Stations, and includes a secret key that the device uses to authenticate itself to the network. The IMSI is a globally unique id and plays a central role in devices being mobile across the RAN, so in that sense it plays the same role as an IP address in the Internet architecture. But if you instead equate the 5G network with a layer 2 network technology, then the IMSI is effectively the device’s “ethernet address.”
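To make the IMSI’s structure concrete, here is a minimal Python sketch that splits one into the fields defined in 3GPP TS 23.003: a 3-digit Mobile Country Code (MCC), a 2- or 3-digit Mobile Network Code (MNC), and the subscriber-specific MSIN. The MCC and MNC together name the subscriber’s home operator, which is what lets a visited network work out which operator holds a roaming subscriber’s records. The class and function names are hypothetical, and the MNC length is passed in as an assumption because it cannot be derived from the digits themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Imsi:
    """Components of an IMSI (hypothetical helper type for illustration)."""
    mcc: str   # Mobile Country Code: always 3 digits
    mnc: str   # Mobile Network Code: 2 or 3 digits, identifies the operator
    msin: str  # Mobile Subscription Identification Number: the remaining digits

    @property
    def home_network(self) -> str:
        # MCC + MNC together identify the subscriber's home operator network.
        return self.mcc + self.mnc


def parse_imsi(imsi: str, mnc_digits: int) -> Imsi:
    """Split an IMSI string into MCC, MNC, and MSIN.

    Assumption: the caller knows the MNC length (2 or 3), since in practice
    it comes from operator configuration rather than from the IMSI itself.
    """
    if not (imsi.isascii() and imsi.isdigit()) or len(imsi) > 15:
        raise ValueError("an IMSI is a string of at most 15 decimal digits")
    if mnc_digits not in (2, 3):
        raise ValueError("the MNC must be 2 or 3 digits long")
    return Imsi(
        mcc=imsi[:3],
        mnc=imsi[3:3 + mnc_digits],
        msin=imsi[3 + mnc_digits:],
    )
```

For example, the made-up value `001010123456789` with a 2-digit MNC splits into MCC `001`, MNC `01`, and MSIN `0123456789`. Nothing in this structure says where the device currently is; tracking that is the job of the Mobile Core’s global bookkeeping discussed next.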
Ethernet addresses are also globally unique, but the Internet architecture makes no attempt to track them with a global registry or treat them as a globally routable address. The 5G architecture, on the other hand, does, and it is a major source of complexity in the 3GPP Mobile Core. Doing so is necessary for making a voice call between two cell phones anywhere in the world, but is of limited value for cloud-connected devices deployed on a manufacturing floor, with no aspiration for global travel. Setting aside (for the moment) the question of how to also support traditional voice calls without tracking IMSI locations, the end-to-end argument suggests we leave global connectivity to IP, and not try to also provide it at the link layer.
Let’s turn from identity management to session management. Whenever a mobile device becomes active, the nearest Base Station initiates the establishment of a sequence of secure tunnels connecting the device back to the Mobile Core, which in turn bridges the RAN to the Internet. (You can find more details on this process here.) Support for mobility can then be understood as the process of re-establishing the tunnel(s) as the device moves throughout the RAN, where the Mobile Core’s user plane buffers in-flight data during the handover transition. This avoids dropped packets and subsequent end-to-end retransmissions, which may make sense for a voice call, but not necessarily for a TCP connection to a cloud service. As before, it may be time to apply the end-to-end argument to the cellular network’s architecture in light of today’s (and tomorrow’s) dominant use cases.
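The trade-off in the paragraph above can be sketched as a toy model. It compares the two places loss recovery can live during a handover: in the network, where the Mobile Core’s user plane buffers packets that arrive while the tunnel is being re-established, or at the endpoints, where those packets are dropped and an end-to-end protocol such as TCP retransmits them. This is a hypothetical illustration with made-up names, not a model of the actual 3GPP handover procedure.

```python
from typing import Iterable, List, Set, Tuple


def handover(
    seqs: Iterable[int], arrive_mid_handover: Set[int], buffer_in_core: bool
) -> Tuple[List[int], List[int]]:
    """Toy model of one handover (hypothetical, for illustration only).

    seqs: sequence numbers sent toward the device, in order.
    arrive_mid_handover: packets that reach the network after the old tunnel
        is torn down but before the new one is established.
    buffer_in_core: True if the core holds those packets and forwards them
        over the new tunnel; False if it simply drops them.

    Returns (received_by_device, needs_end_to_end_retransmit).
    """
    sent = list(seqs)
    if buffer_in_core:
        # Network-level recovery: nothing is lost, but the core must keep
        # per-device buffers and state for the duration of the handover.
        return sent, []
    # Endpoint recovery: the network stays simple and drops the packets;
    # the sender later notices the gap (e.g., via missing ACKs) and resends.
    received = [s for s in sent if s not in arrive_mid_handover]
    missing = [s for s in sent if s in arrive_mid_handover]
    return received, missing


print(handover(range(1, 6), {3, 4}, buffer_in_core=True))   # ([1, 2, 3, 4, 5], [])
print(handover(range(1, 6), {3, 4}, buffer_in_core=False))  # ([1, 2, 5], [3, 4])
```

Either way the device eventually gets every packet. The difference is whether the network carries state to hide the loss, and that placement of function is exactly what the end-to-end argument asks us to reconsider.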
To complicate matters, sessions are of limited value. The 5G network maintains the session only when the same Mobile Core serves the device and only the Base Station changes. This is often the case for a device moving within some limited geographic region, but moving between regions, and hence between Mobile Cores, is indistinguishable from power cycling the device. The device is assigned a new IP address and no attempt is made to buffer and subsequently deliver in-flight data. This is important because any time a device becomes inactive for a period of time, it also loses its session. A new session is established and a new IP address assigned when the device becomes active. Again, this makes sense for a voice call, but not necessarily for a typical broadband connection, or worse yet, for an IoT device that powers down as a normal course of events. It is also worth noting that cloud services are really good at accommodating clients whose IP addresses change periodically (which is to say, when the relevant identity is at the application layer).
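The session rules described above can be summarized as a small state model. The Python sketch below is a hypothetical simplification: the class and method names, the single address pool shared by every Mobile Core, and the idle timeout value are all assumptions, and each `attach` call stands in for the device being active. It keeps the session (and its IP address) only when the same Mobile Core serves the device within the idle timeout; otherwise it behaves like a power cycle and assigns a new address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class Session:
    core: str                  # Mobile Core currently anchoring the session
    base_station: str          # Base Station currently serving the device
    ip: ipaddress.IPv4Address  # address assigned when the session was created
    last_active: float         # time of the device's most recent activity


class ToySessionModel:
    """Hypothetical model of when a device keeps or loses its session."""

    def __init__(self, pool_cidr: str, idle_timeout: float) -> None:
        # Simplification: one address pool shared by every Mobile Core.
        self._pool: Iterator[ipaddress.IPv4Address] = ipaddress.ip_network(pool_cidr).hosts()
        self.idle_timeout = idle_timeout
        self.sessions: Dict[str, Session] = {}

    def attach(self, device: str, core: str, base_station: str, now: float) -> ipaddress.IPv4Address:
        s = self.sessions.get(device)
        if s is not None and s.core == core and now - s.last_active <= self.idle_timeout:
            # Same Mobile Core and recently active: only the Base Station may
            # have changed, so the session and its IP address survive the move.
            s.base_station = base_station
            s.last_active = now
            return s.ip
        # First attach, a different Mobile Core, or idle too long: treated like
        # a power cycle, so a fresh session and a new IP address are created.
        s = Session(core, base_station, next(self._pool), now)
        self.sessions[device] = s
        return s.ip


m = ToySessionModel("10.0.0.0/24", idle_timeout=30.0)
print(m.attach("sensor-1", "core-east", "bs-1", now=0))    # 10.0.0.1
print(m.attach("sensor-1", "core-east", "bs-2", now=10))   # 10.0.0.1 (handover within one core)
print(m.attach("sensor-1", "core-west", "bs-9", now=20))   # 10.0.0.2 (moved to another core)
print(m.attach("sensor-1", "core-west", "bs-9", now=100))  # 10.0.0.3 (idle for 80 > 30)
```

A cloud service that identifies this sensor at the application layer, rather than by its source address, would not care that the address changed three times.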
This is all to say that the cellular network’s approach, which can be traced to its roots as a connection-oriented voice network, is probably not how one would design the system today. Instead, we can use IP addresses as the globally routable identifier, lease IP addresses to often-sleeping and seldom-moving IoT devices, and depend on end-to-end protocols like TCP to retransmit packets dropped during handovers. Standardization and interoperability will still be needed to support global phone calls, but with the ability to implement voice calls entirely on top of IP, it’s not clear the Mobile Core is the right place to solve that problem. And even if it is, this could potentially be implemented as little more than legacy APIs supported for backward compatibility. In the long term, it will be interesting to see if 3GPP-defined sessions hold up well as the foundation for an architecture that fully incorporates cellular radio technology into the cloud.
We conclude by noting that while we have framed this discussion as a thought experiment, it illustrates the potential power of the software-defined architecture being embraced by 5G. With the Mobile Core in particular implemented as a set of micro-services, an incremental evolution that addresses the issues outlined here is not only feasible, but actually quite likely. This is because history teaches us that once a system is open and programmable, the dominant use cases will eventually correct for redundant mechanisms and sub-optimal design decisions.
Why 5G Matters
Over the last month I undertook a detailed review of a new book in the Systems Approach series, 5G Mobile Networks: A Systems Approach by Larry Peterson and Oguz Sunay. Talking to people outside the technology world about my work, I soon found myself trying to explain why 5G matters to all sorts of folks without a technical background. At this point in 2020, we can generally assume people know two things about 5G: the Telcos are marketing it as the greatest innovation ever (here's a sample); and conspiracy theorists are having a field day telling us all the things that 5G is causing or covering up (which has in turn led to more telco ads like this one). By the end of reviewing the new book from Larry and Oguz, I felt I had finally grasped why 5G matters. Spoiler alert: I'm not going to bother debunking conspiracy theories, but I do think there is something quite important going on with 5G. And frankly, there is plenty of hype around 5G, but behind that hype are some significant innovations.
What is clear about 5G, technically, is that there will be a whole lot of new radio technologies and new spectrum allocation, which will enable yet another upgrade in speeds and feeds. If you are a radio person that's quite interesting–there is plenty of innovation in squeezing more bandwidth out of wireless channels. It's a bit harder to explain why more bandwidth will make a big difference to users, simply because 4G generally works pretty well. Once you can stream video at decent resolution to your phone or tablet, it's a bit hard to make a case for the value of more bandwidth alone. A more subtle issue is bandwidth density–the aggregate bandwidth that can be delivered to many devices in a certain area. Think of a sporting event as a good example (leaving aside the question of whether people need to watch videos on their phones at sporting events).
Lowering the latency of communication starts to make the discussion more interesting–although less for human users than as an enabler of machine-to-machine or Internet-of-things applications. If we imagine a world where cars might communicate with each other, for example, to better manage road congestion, you can see a need for very low latency coupled with very high reliability–which is another dimension that 5G aims to address. And once we start to get to these scenarios, we begin to see why 5G isn't just about new radio technology, but actually entails a whole new mobile network architecture. Lowering latency and improving availability aren't just radio issues, they are system architecture issues. For example, low latency requires that a certain set of functions move closer to the edge–an approach sometimes called edge computing or edge clouds.
The Importance of Architecture
The high points of the new cellular architecture for 5G are all about leveraging trends from the broader networking and computing ecosystems. Three trends stand out in particular:
If you want to know more about the architecture of 5G, the application requirements that are driving it, and how it will enable innovation, you should go read the book as I did!
Having recently received the foreword for the sixth edition of Computer Networks: A Systems Approach from David Clark, we thought it would be fun to pull out his original foreword from 1996, which will also be republished in the forthcoming book.
"Plus ça change, plus c'est la même chose." (The more things change, the more they stay the same.)
Foreword to the First Edition
The term “spaghetti code” is universally understood as an insult. All good computer scientists worship the god of modularity, since modularity brings many benefits, including the all-powerful benefit of not having to understand all parts of a problem at the same time in order to solve it. Modularity thus plays a role in presenting ideas in a book, as well as in writing code. If a book’s material is organized effectively—modularly—the reader can start at the beginning and actually make it to the end.
The field of network protocols is perhaps unique in that the “proper” modularity has been handed down to us in the form of an international standard: the seven-layer reference model of network protocols from the ISO. This model, which reflects a layered approach to modularity, is almost universally used as a starting point for discussions of protocol organization, whether the design in question conforms to the model or deviates from it.
It seems obvious to organize a networking book around this layered model. However, there is a peril to doing so, because the OSI model is not really successful at organizing the core concepts of networking. Such basic requirements as reliability, flow control, or security can be addressed at most, if not all, of the OSI layers. This fact has led to great confusion in trying to understand the reference model. At times it even requires a suspension of disbelief. Indeed, a book organized strictly according to a layered model has some of the attributes of spaghetti code.
Which brings us to this book. Peterson and Davie follow the traditional layered model, but they do not pretend that this model actually helps in the understanding of the big issues in networking. Instead, the authors organize discussion of fundamental concepts in a way that is independent of layering. Thus, after reading the book, readers will understand flow control, congestion control, reliability enhancement, data representation, and synchronization, and will separately understand the implications of addressing these issues in one or another of the traditional layers.
This is a timely book. It looks at the important protocols in use today—especially the Internet protocols. Peterson and Davie have a long involvement in and much experience with the Internet. Thus their book reflects not just the theoretical issues in protocol design, but the real factors that matter in practice. The book looks at some of the protocols that are just emerging now, so the reader can be assured of an up-to-date perspective. But most importantly, the discussion of basic issues is presented in a way that derives from the fundamental nature of the problem, not the constraints of the layered reference model or the details of today’s protocols. In this regard, what this book presents is both timely and timeless. The combination of real-world relevance, current examples, and careful explanation of fundamentals makes this book unique.