Congestion control has been on my mind a lot over the last year, given our dependence on a smooth-running Internet to work from home, connect with friends, and watch a lot of streaming entertainment. But it is particularly on our minds at Systems Approach, LLC as we (Larry Peterson, Larry Brakmo and myself) have started the latest book in our series, focussed on Congestion Control. With that in mind, I wanted to revisit some content I had worked on over the last few months.
Back in another era, at the end of 2019, I was writing one of those end-of-year prediction posts, and I decided to say something about the future of the Internet. I don’t regard myself as especially bold when it comes to predictions – unlike the inspiration for my post, the inventor of the Ethernet, Bob Metcalfe. Metcalfe famously predicted the collapse of the Internet in 1995 and publicly ate his printed words when proven wrong. My prediction was “I confidently predict that the Internet is not going to collapse in my lifetime”, which on reflection was a bolder statement than I realised at the time.
Little did I know how much the world would change in the few months after I made the above prediction. As COVID-19 spread across the globe, a large proportion of the world’s workforce moved to working from home, Zoom and Slack became primary means of communication for knowledge workers, and online video streaming – which was already 60% of Internet traffic pre-COVID – suddenly became massively more important to hundreds of millions of people.
Adding capacity in advance
To the surprise of many, the Internet has handled this incredibly well. The Atlantic did a good piece on this, pointing out that it was built to withstand a wide range of failures (although the claim that it was supposed to withstand nuclear war has been well debunked). There are a lot of reasons the Internet has fared so well, including the pioneering work on congestion control that I touched on in my pre-COVID post. But another aspect jumped out at me as I read The Atlantic: The Internet has generally been built with a lot of free capacity. This might seem wasteful, and it’s not how we build most other systems. Highways near major cities, for example, are normally full or over capacity at rush hour, and whenever a new lane is added to increase capacity, it’s not long before new traffic arrives to use up that capacity.
We were delighted to receive the foreword for our recent book Software-Defined Networks: A Systems Approach from SDN pioneer Nick McKeown. It is reproduced below. Also, if you're the sort of person who likes physical books, you can now purchase a print edition of this book on Amazon.
I got goosebumps when I saw the first Mosaic web browser in 1993. Something big was clearly about to happen; I had no idea how big. The Internet immediately exploded in scale, with thousands of new ISPs (Internet Service Providers) popping up everywhere, each grafting on a new piece of the Internet. All they needed to do was plug interoperable pieces together—off-the-shelf commercial switches, routers, base-stations, and access points sold by traditional networking equipment vendors—with no need to ask permission from a central controlling authority. The early routers were simple and streamlined—they just needed to support the Internet protocol. Decentralized control let the Internet grow rapidly.
The router manufacturers faced a dilemma: It’s hard to maintain a thriving profitable business selling devices that are simple and streamlined. What's more, if a big network of simple devices is easy to manage remotely, all the intelligence (and value) is provided by the network operator, not the router manufacturer. So the external API was kept minimal (“network management” was considered a joke) and the routers were jam-packed with new features to keep all the value inside. By the mid 2000s, routers used by ISPs were so complicated that they supported hundreds of protocols and were based on more than 100 million lines of source code—ironically, more than ten times the complexity of the largest telephone exchange ever built. The Internet paid a hefty price for this complexity: routers were bloated, power hungry, unreliable, hard to secure, and crazy expensive. Worst of all, they were hard to improve (ISPs needed to beg equipment vendors to add new capabilities) and it was impossible for an ISP to add their own new features. Network owners complained of a “stranglehold” by the router vendors, and the research community warned that the Internet was “ossified.”
This book is the story of what happened next, and it’s an exciting one. Larry, Carmelo, Brian, Thomas and Bruce capture clearly, through concrete examples and open-source code, how those who own and operate big networks started to write their own code and build their own switches and routers. Some chose to replace routers with homegrown devices that were simpler and easier to maintain; others chose to move the software off the router to a remote, centralized control plane. Whichever path they chose, open-source became a bigger and bigger part. Once open-source had proved itself in Linux, Apache, Mozilla and Kubernetes, it was ready to be trusted to run our networks too.
This book explains why the SDN movement happened. It was essentially about a change in control: the owners and operators of big networks took control of how their networks work, grabbing the keys to innovation from the equipment vendors. It started with data center companies because they couldn’t build big-enough scale-out networks using off-the-shelf networking equipment. So they bought switching chips and wrote the software themselves. Yes, it saved them money (often reducing the cost by a factor of five or more), but it was control they were after. They employed armies of software engineers to ignite a Cambrian explosion of new ideas in networking, making their networks more reliable, quicker to fix, and with better control over their traffic. Today, in 2021, all of the large data center companies build their own networking equipment: they download and modify open-source control software, or they write or commission software to control their networks. They have taken control. The ISPs and 5G operators are next. Within a decade, expect enterprise and campus networks to run on open-source control software, managed from the cloud. This is a good change, because only those who own and operate networks at scale know how to do it best.
This change—a revolution in how networks are built, towards homegrown software developed and maintained by the network operator—is called Software Defined Networking (SDN). The authors have been part of this revolution since the very beginning, and have captured how and why it came about.
They also help us see what future networks will be like. Rather than being built by plugging together a bunch of boxes running standardized interoperability protocols, a network system will be a platform we can program ourselves. The network owner will decide how the network works by programming whatever behavior they wish. Students of networking will learn how to program a distributed system, rather than study the arcane details of legacy protocols.
For anyone interested in programming, networks just got interesting again. And this book is an excellent place to start.
In 2012 we published the fifth edition of Computer Networks: A Systems Approach. As with all the prior editions, we worked with a traditional publisher who retained full copyright to the material. At the time, open source software was becoming ever more important to the networking industry. Larry had decades of involvement in open source projects such as PlanetLab and was embarking on a new initiative with the Open Networking Foundation to build an open source SDN stack, while Bruce was at Nicira contributing to the Open vSwitch project. It seemed to us that the time was right to consider how to leverage open source for networking education materials, not just for the code.
Like many textbook authors, we embarked on writing a book mostly as a labor of love. That is, we wanted to contribute to the development of the next generation of networking students, researchers, and professionals. We recognized that this desire to educate, and to reach the widest possible audience, aligned well with an open source approach. We also recognized that the book could be better if we drew on the input of the community more effectively than we were able to do in a traditional proprietary textbook.
Over the next several years we negotiated with our publisher an arrangement in which we would continue to furnish them with new editions—ideally leveraging community contributions as well as our own work—while also gaining the rights to make the entire contents of the book freely available under an open source license.
This post, originally published at the 2020 SIGCOMM Education Workshop, reports on our experiences and progress to date in developing the open source version of Computer Networks. We also offer some thoughts on future directions for networking texts, which are likely applicable to many other technical fields.
One of the most surprising things I learned while writing the 5G Book with Oguz Sunay is how the cellular network’s history, starting 40+ years ago, parallels that of the Internet. And while from an Internet-centric perspective the cellular network is just one of many possible access network technologies, the cellular network in fact shares many of the “global connectivity” design goals of the Internet. That is, the cellular network makes it possible for a cell phone user in New York to call a cell phone in Tokyo, then fly to Paris and do the same thing again. In short, the 3GPP standard federates independently operated and locally deployed Radio Access Networks (RAN) into a single logical RAN with global reach, much as the Internet federated existing packet-switched networks into a single global network.
For many years, the dominant use case for the cellular network has been access to cloud services. With 5G expected to connect everything from home appliances to industrial robots to self-driving cars, the cellular network will be less and less about humans making voice calls and increasingly about interconnecting swarms of autonomous devices working on behalf of those humans to the cloud. This raises the question: Are there artifacts or design decisions in the 3GPP-defined 5G architecture working at cross-purposes with the Internet architecture?
Another way to frame this question is: How might we use the end-to-end argument—which is foundational to the Internet’s architecture—to drive the evolution of the cellular network? In answering this question, two issues jump out at me, identity management and session management, both of which are related to how devices connect to (and move throughout) the RAN.
The 5G architecture leverages the fact that each device has an operator-provided SIM card, which uniquely identifies the subscriber with a 15-digit International Mobile Subscriber Identity (IMSI). The SIM card also specifies the radio parameters (e.g., frequency band) needed to communicate with that operator’s Base Stations, and includes a secret key that the device uses to authenticate itself to the network. The IMSI is a globally unique id and plays a central role in devices being mobile across the RAN, so in that sense it plays the same role as an IP address in the Internet architecture. But if you instead equate the 5G network with a layer 2 network technology, then the IMSI is effectively the device’s “Ethernet address.”
Ethernet addresses are also globally unique, but the Internet architecture makes no attempt to track them with a global registry or treat them as a globally routable address. The 5G architecture, on the other hand, does, and it is a major source of complexity in the 3GPP Mobile Core. Doing so is necessary for making a voice call between two cell phones anywhere in the world, but is of limited value for cloud-connected devices deployed on a manufacturing floor, with no aspiration for global travel. Setting aside (for the moment) the question of how to also support traditional voice calls without tracking IMSI locations, the end-to-end argument suggests we leave global connectivity to IP, and not try to also provide it at the link layer.
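To make the IMSI a little more concrete, here is a minimal sketch of how one might split it into its fields. The layout used here (a 3-digit Mobile Country Code, a 2- or 3-digit Mobile Network Code, and the remaining digits as the subscriber number) comes from the ITU-T E.212 numbering plan rather than from anything stated in this post, and the names `Imsi` and `parse_imsi`, along with the example value, are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Imsi:
    mcc: str   # Mobile Country Code, always 3 digits
    mnc: str   # Mobile Network Code, 2 or 3 digits depending on the country
    msin: str  # subscriber number within that operator's network


def parse_imsi(imsi: str, mnc_digits: int = 3) -> Imsi:
    """Split an IMSI string into its three fields.

    mnc_digits is a parameter because the IMSI itself does not say whether
    the network code is 2 or 3 digits long; the caller has to know that.
    """
    if mnc_digits not in (2, 3):
        raise ValueError("the MNC is either 2 or 3 digits long")
    if not (imsi.isascii() and imsi.isdigit()):
        raise ValueError("an IMSI contains only decimal digits")
    if not 3 + mnc_digits < len(imsi) <= 15:
        raise ValueError("an IMSI is at most 15 digits long")
    return Imsi(
        mcc=imsi[:3],
        mnc=imsi[3:3 + mnc_digits],
        msin=imsi[3 + mnc_digits:],
    )


# Illustrative value only (country code 001 with network code 01).
example = parse_imsi("001010123456789", mnc_digits=2)
print(example)
```

The first two fields identify an operator, not a location, which is why a globally routable IMSI needs the kind of registry and tracking machinery described above.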
Let’s turn from identity management to session management. Whenever a mobile device becomes active, the nearest Base Station initiates the establishment of a sequence of secure tunnels connecting the device back to the Mobile Core, which in turn bridges the RAN to the Internet. (You can find more details on this process here.) Support for mobility can then be understood as the process of re-establishing the tunnel(s) as the device moves throughout the RAN, where the Mobile Core’s user plane buffers in-flight data during the handover transition. This avoids dropped packets and subsequent end-to-end retransmissions, which may make sense for a voice call, but not necessarily for a TCP connection to a cloud service. As before, it may be time to apply the end-to-end argument to the cellular network’s architecture in light of today’s (and tomorrow’s) dominant use cases.
To complicate matters, sessions are of limited value. The 5G network maintains the session only when the same Mobile Core serves the device and only the Base Station changes. This is often the case for a device moving within some limited geographic region, but moving between regions (and hence between Mobile Cores) is indistinguishable from power cycling the device. The device is assigned a new IP address and no attempt is made to buffer and subsequently deliver in-flight data. This is important because any time a device becomes inactive for a period of time, it also loses its session. A new session is established and a new IP address assigned when the device becomes active. Again, this makes sense for a voice call, but not necessarily for a typical broadband connection, or worse yet, for an IoT device that powers down as a normal course of events. It is also worth noting that cloud services are really good at accommodating clients whose IP addresses change periodically (which is to say, when the relevant identity is at the application layer).
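The session behavior described above can be captured in a toy model. This is only a sketch of the rules as stated in this post, not of the 3GPP procedures themselves: the classes `Session`, `Device`, and `CloudService`, the made-up 10.0.0.x addresses, and the core and Base Station names are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional

_ip_counter = itertools.count(1)


def _lease_ip() -> str:
    """Hand out a fresh, made-up address for each new session."""
    return f"10.0.0.{next(_ip_counter)}"


@dataclass
class Session:
    core: str          # Mobile Core anchoring the session
    base_station: str  # Base Station currently serving the device
    ip: str            # IP address assigned for the life of the session


class Device:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id  # application-layer identity, never changes
        self.session: Optional[Session] = None

    def attach(self, core: str, base_station: str) -> None:
        """Become active at a Base Station that belongs to a Mobile Core."""
        if self.session is not None and self.session.core == core:
            # Same core: only the Base Station changes, the IP survives.
            self.session.base_station = base_station
        else:
            # New core, or no session at all: like a power cycle, new IP.
            self.session = Session(core, base_station, _lease_ip())

    def go_idle(self) -> None:
        """Inactivity tears the session down."""
        self.session = None


class CloudService:
    """Tracks clients by device_id, so a changed IP is just an update."""

    def __init__(self) -> None:
        self.last_seen_ip: Dict[str, str] = {}

    def report(self, device: Device) -> None:
        if device.session is None:
            raise RuntimeError("device has no active session")
        self.last_seen_ip[device.device_id] = device.session.ip


sensor = Device("sensor-42")
service = CloudService()

sensor.attach("core-east", "bs-1")
ip_first = sensor.session.ip
sensor.attach("core-east", "bs-2")   # handover within one core
ip_after_handover = sensor.session.ip
sensor.attach("core-west", "bs-9")   # moved into another core's region
ip_after_move = sensor.session.ip
sensor.go_idle()
sensor.attach("core-west", "bs-9")   # woke up again in the same place
ip_after_sleep = sensor.session.ip
service.report(sensor)
```

In this model the address survives only the handover within one core. The cloud service never notices the difference, because it keys on the device's own identifier rather than on its current IP address.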
This is all to say that the cellular network’s approach, which can be traced to its roots as a connection-oriented voice network, is probably not how one would design the system today. Instead, we can use IP addresses as the globally routable identifier, lease IP addresses to often-sleeping and seldom-moving IoT devices, and depend on end-to-end protocols like TCP to retransmit packets dropped during handovers. Standardization and interoperability will still be needed to support global phone calls, but with the ability to implement voice calls entirely on top of IP, it’s not clear the Mobile Core is the right place to solve that problem. And even if it is, this could potentially be implemented as little more than legacy APIs supported for backward compatibility. In the long term, it will be interesting to see if 3GPP-defined sessions hold up well as the foundation for an architecture that fully incorporates cellular radio technology into the cloud.
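As a rough illustration of leaving recovery to the end points, the sketch below has a sender retransmit each segment until it gets through, while the network simply drops anything sent during a handover. It is a deliberately simplified stop-and-wait loop, not TCP's actual algorithm (there are no windows, timers, or congestion control), and the function name and the handover timing are assumptions made for the example.

```python
from typing import Callable, Dict, List


def send_reliably(
    segments: List[str],
    path_is_up: Callable[[int], bool],
    max_ticks: int = 1000,
) -> Dict[str, object]:
    """Stop-and-wait sender: retransmit each segment until it is delivered.

    Each tick stands for one transmission attempt followed by either an
    acknowledgment or a timeout.
    """
    delivered: List[str] = []
    transmissions = 0
    tick = 0
    for segment in segments:
        while True:
            if tick >= max_ticks:
                raise RuntimeError("path never recovered")
            transmissions += 1
            up = path_is_up(tick)
            tick += 1
            if up:
                delivered.append(segment)  # receiver got it and acked it
                break
            # Otherwise the segment was dropped in the network; the sender
            # times out and tries again. Nothing in the network buffers it.
    return {"delivered": delivered, "transmissions": transmissions}


# Hypothetical handover: the radio path is down during ticks 2, 3 and 4.
HANDOVER_TICKS = {2, 3, 4}
result = send_reliably(
    ["a", "b", "c", "d"],
    path_is_up=lambda tick: tick not in HANDOVER_TICKS,
)
print(result)
```

All four segments arrive in order, at the cost of three extra transmissions of the segment that was in flight during the handover. That cost is the trade-off being weighed against buffering in the Mobile Core's user plane.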
We conclude by noting that while we have framed this discussion as a thought experiment, it illustrates the potential power of the software-defined architecture being embraced by 5G. With the Mobile Core in particular implemented as a set of micro-services, an incremental evolution that addresses the issues outlined here is not only feasible, but actually quite likely. This is because history teaches us that once a system is open and programmable, the dominant use cases will eventually correct for redundant mechanisms and sub-optimal design decisions.
Over the last month I undertook a detailed review of a new book in the Systems Approach series, 5G Mobile Networks: A Systems Approach by Larry Peterson and Oguz Sunay. Talking to people outside the technology world about my work, I soon found myself trying to explain "why does 5G matter" to all sorts of folks without a technical background. At this point in 2020, we can generally assume people know two things about 5G: the Telcos are marketing it as the greatest innovation ever (here's a sample); and conspiracy theorists are having a field day telling us all the things that 5G is causing or covering up (which has in turn led to more telco ads like this one). By the end of reviewing the new book from Larry and Oguz, I felt I had finally grasped why 5G matters. Spoiler alert: I'm not going to bother debunking conspiracy theories, but I do think there is something quite important going on with 5G. And frankly, there is plenty of hype around 5G, but behind that hype are some significant innovations.
What is clear about 5G, technically, is that there will be a whole lot of new radio technologies and new spectrum allocation, which will enable yet another upgrade in speeds and feeds. If you are a radio person that's quite interesting–there is plenty of innovation in squeezing more bandwidth out of wireless channels. It's a bit harder to explain why more bandwidth will make a big difference to users, simply because 4G generally works pretty well. Once you can stream video at decent resolution to your phone or tablet, it's a bit hard to make a case for the value of more bandwidth alone. A more subtle issue is bandwidth density–the aggregate bandwidth that can be delivered to many devices in a certain area. Think of a sporting event as a good example (leaving aside the question of whether people need to watch videos on their phones at sporting events).
Lowering the latency of communication starts to make the discussion more interesting–although not so much to human users, but as an enabler of machine-to-machine or Internet-of-things applications. If we imagine a world where cars might communicate with each other, for example, to better manage road congestion, you can see a need for very low latency coupled with very high reliability–which is another dimension that 5G aims to address. And once we start to get to these scenarios, we begin to see why 5G isn't just about new radio technology, but actually entails a whole new mobile network architecture. Lowering latency and improving availability aren't just radio issues, they are system architecture issues. For example, low latency requires that a certain set of functions move closer to the edge–an approach sometimes called edge computing or edge clouds.
The Importance of Architecture
The high points of the new cellular architecture for 5G are all about leveraging trends from the broader networking and computing ecosystems. Three trends stand out in particular:
If you want to know more about the architecture of 5G, the application requirements that are driving it, and how it will enable innovation, you should go read the book as I did!
Having recently received the foreword for the sixth edition of Computer Networks: A Systems Approach from David Clark, we thought it would be fun to pull out his original foreword from 1996, which will also be republished in the forthcoming book.
"Plus ça change, plus c'est la même chose."
Foreword to the First Edition
The term *spaghetti code* is universally understood as an insult. All good computer scientists worship the god of modularity, since modularity brings many benefits, including the all-powerful benefit of not having to understand all parts of a problem at the same time in order to solve it. Modularity thus plays a role in presenting ideas in a book, as well as in writing code. If a book’s material is organized effectively, that is, modularly, the reader can start at the beginning and actually make it to the end.
The field of network protocols is perhaps unique in that the “proper” modularity has been handed down to us in the form of an international standard: the seven-layer reference model of network protocols from the ISO. This model, which reflects a layered approach to modularity, is almost universally used as a starting point for discussions of protocol organization, whether the design in question conforms to the model or deviates from it.
It seems obvious to organize a networking book around this layered model. However, there is a peril to doing so, because the OSI model is not really successful at organizing the core concepts of networking. Such basic requirements as reliability, flow control, or security can be addressed at most, if not all, of the OSI layers. This fact has led to great confusion in trying to understand the reference model. At times it even requires a suspension of disbelief. Indeed, a book organized strictly according to a layered model has some of the attributes of spaghetti code.
Which brings us to this book. Peterson and Davie follow the traditional layered model, but they do not pretend that this model actually helps in the understanding of the big issues in networking. Instead, the authors organize discussion of fundamental concepts in a way that is independent of layering. Thus, after reading the book, readers will understand flow control, congestion control, reliability enhancement, data representation, and synchronization, and will separately understand the implications of addressing these issues in one or another of the traditional layers.
This is a timely book. It looks at the important protocols in use today—especially the Internet protocols. Peterson and Davie have a long involvement in and much experience with the Internet. Thus their book reflects not just the theoretical issues in protocol design, but the real factors that matter in practice. The book looks at some of the protocols that are just emerging now, so the reader can be assured of an up-to-date perspective. But most importantly, the discussion of basic issues is presented in a way that derives from the fundamental nature of the problem, not the constraints of the layered reference model or the details of today’s protocols. In this regard, what this book presents is both timely and timeless. The combination of real-world relevance, current examples, and careful explanation of fundamentals makes this book unique.
David Clark of MIT has written the foreword for every edition of "Computer Networks: A Systems Approach" since its inception. Here is his foreword to the soon-to-be-released sixth edition.
Readers: before you start the book, first take a moment and set your time machine to 1996. That is when the first edition of this book was published. Do you remember 1996? Were you alive then? People forget how long ago the foundations of the Internet were laid.
In 1996, the NSFNET had just been decommissioned, and the commercial phase of the Internet was just beginning. The first search engine (Alta Vista; do you remember?) had just been demonstrated. Content delivery networks did not exist: Akamai was founded two years later in 1998, the same year Google was officially born. Cloud was only a distant haze on the horizon. And there was no such thing as residential broadband or consumer wireless. We used dialup modems; the 56K modem had just been invented. There were packet radios before then, but they were slower than dialup and the size of a beer fridge. You needed a truck or at least a Jeep to be mobile.
And in 1995 or so, Larry and Bruce decided to write this book. It may be hard, from today’s perspective, to remember how important a book like this was in 1996. It captured a lot of tacit knowledge and made it available to anyone who would read. And rather than just reciting a series of protocol descriptions, it taught how the parts fit together. It taught how the Internet worked, not just what the parts were.
One way to think about how the Internet has evolved is through the lens of the application designer. After all, the purpose of the Internet as a packet transport system is to support apps. Only geeks and performance freaks send packets for the fun of it. In 1996, if you wanted to build an application, the ecosystem included the IP packet transport service, TCP to smooth out the losses at the Internet layer, the DNS, and that was about it. Anything else the application designer needed had to be built from scratch.
Now an application designer has lots of resources to build on: cloud and cloud networks, other global networks that can hook services together, CDNs, app development environments and so on. Some of these may seem quite different from what we had in 1996 and in detail they are. Consider cloud. (I hate the choice of the term—to me “cloud” suggests something soft and fluffy, but if you have ever seen a data center the size of a football field that sucks megawatts, you would not think soft and fluffy. But never mind…) Data centers have become very sophisticated about cost, energy efficiency, performance and resilience. There is a lot to learn about how to build a modern data center. But the fundamentals are the same: packet forwarding, statistical capacity sharing, transport protocols, routing protocols, the pursuit of generality and broad utility, and the like.
Looking forward, technologies such as cloud are clearly central and this edition devotes considerable attention to cloud. Requirements such as improving security are critical, and the book discusses additional issues related to security: trust, identity, and the latest hot topic—blockchain. However, if you were to look at the first edition, many of the foundational concepts are the same. But this edition is the modern version of the story, with up to date examples and modern technology. Enjoy.
We recently sat down for a conversation with Martin Casado of Andreessen Horowitz. The transcript below has been lightly edited for clarity.
Bruce Davie (BD) - Okay, so I'm here today with Martin Casado. We're doing another episode of Coffee with Bruce. Martin, welcome.
Martin Casado (MC) - Happy to be here with my chrysanthemum tea.
BD - Alright, yeah. Dealing with time zones and all, it's kind of late for coffee there, I guess. So, Martin Casado really needs very little introduction but in case you've been living under a rock, Martin was one of the founders of Nicira, was the person who successfully recruited me to go and work at Nicira, starting off an eight year journey into network virtualisation for me. And these days he's a general partner at Andreessen Horowitz. And so Martin, I've got a few things I wanted to talk about today. First of all, I want to talk a little bit about some of the things we experienced at Nicira in the early days of network virtualisation.
I think when I joined, you guys had already established that you had this plan to disrupt networking and change it in a pretty significant way. So I guess the first thing I wanted to say looking back now, I think it was, what, 2007 that you founded Nicira. It's a lot of years to look back on now. Are you happy with where things ended up?
MC - So yes. Yes, I'm quite happy.
It's interesting though, when we started the company, if you would have looked at what our aspirations were, it was "start with the data center and then change all of networking". And every new bit of networking is another feature on the same technical platform.
And it turns out, every new bit of networking is actually a separate industry, right?
So I think originally we had pretty broad ambitions, which it's fun to see the industry play out totally independent of what we're doing. I mean, that's been fantastic.
But if you look at, I was actually looking back. If you look at our Series A deck, when we're like, okay, we're going to start, and we're going to focus on the data center and network virtualisation. If you look at the timelines that we played out and what actually happened, it was actually pretty much in line.
And so I think in the area that we decided to focus on, you know, it's been great to see the realisation.
BD - Yeah, and honestly, SD-WAN for me is just the continuation of the Nicira vision. It's just that it got done by other companies. And the first time somebody explained SD-WAN to me, it was like, oh, you're doing what Nicira did, but you're gonna do it for the WAN.
And so to that extent, the vision has actually played out quite, quite broadly. It's just, as you say, it wasn't on a single technical platform.
MC - I know that's exactly, yeah. I think that's exactly right. The lower you go in the stack, the more general the technology, isn't it? So at some level a compiler company is basically every software company on the planet, right? Because you can build anything with a compiler.
So I think that the industry - and we were a very important part of the discourse - the industry at the time was kind of grappling with what does software-defined networking and network virtualisation mean? I think that what we distilled in that discourse was the right way to build networks, all aspects of networks.
But when it comes to the brass tacks of building a company, who you sell to dictates the kind of company you are, and the specific problems you're tackling dictate the type of engineering organisation that you build.
And what I didn't appreciate that I appreciate now is, absolutely SD-WAN is very much part of the original vision that we had, but how much effort it takes to execute in these separate verticals.
Which is why it's actually so exciting to see companies pull this off, companies that have both data center and WAN solutions, because they actually have so much potential synergy, but for a startup that's a hard thing to execute on.
BD - Yeah, also you're sort of getting onto another thing I want to talk about here, which is actually what kind of company you wanted to build. I thought the culture at Nicira was pretty amazing. My biggest regret is I didn't join earlier just because it was so much fun.
But what was in your mind about how to build the culture? What did you do to go and create the culture at Nicira? And I mean, what are your thoughts on that more broadly?
MC - In some ways, one of our greatest challenges was also our greatest gift. And that was, we started the company in 2007 and then the world ended about a year later, right? If you'll remember with the great recession. And it was like the nuclear winter had set in.
And you probably had two options, which is you band together or you die. And I think that in many ways our culture was galvanised in this tough time. And what it forced in particular is for the company to have a very clear vision that you can march towards. And if the vision wasn't clear, people wouldn't have done it. I mean, they just wouldn't have, just because it was such a scary time.
And so I think 2008, 2009, we just really, really sharpened a vision that everybody believed in. And if someone didn't believe in it they'd speak up because they didn't have anything to lose.
And then we had iterated by the time things were starting to improve, that vision was very clear. And then I think what happened is once you have a vision that you believe in, the work is really realising that vision and an implementation of building product, which it's something that we've just been desperate to do anyways. And so I do think a lot of this was galvanised by, having a tough time.
And I think that being mission focused is something people talk about a lot and it's like, these are the three things on my badge or whatever. And I think that kind of misses the point, which is if you don't have a very high definition kind of like the mountain top that you're tackling, I think people start just meandering in the woods. And I think that was very, very core to the cultural success of the company.
BD - Yeah and to be honest, I've talked about that in terms of how you led the NSBU when we were at VMware together was, I think every staff meeting you would tell us, you know, these are the three things that we're shooting for this year. And you would repeat it every week so that nobody could have any doubt what direction the organisation was moving.
MC - It's always been amazing to me how easy it is to lose focus of even very simple things. We're a room of very professional adults at the peak of our careers. And we're talking about very simple things like KPIs.
But I do think in many ways being an operator is managing your own psychology and the hardest part of managing your own psychology is just staying focused on the things that are really important and taking a long view.
And so one thing that I loved as a group that we did is, everybody had the same expectation that we're going to make sure that we're staying on track for a broader goal, and everything is subservient to that.
And I mean, listen I sit on 17 boards now, and I can't tell you how rare that actually is, that people will actually distill all of the trouble and all of the pain, and all of the excitement until you get fairly simple to articulate goals.
BD - What about now that we're moving to this world of much more distributed teams, given the COVID impact on remote working, does that affect the way you think about building a team culture in a startup?
MC - This is one of the big questions that we're trying to answer right now. I think maintaining culture and building culture are actually two different things. In my experience, maintaining culture is something where you can kind of do the same motions, but you may have to do more of them and you may have to adapt them online, but it's not that interesting of a conversation. Does that make sense?
Which is like, yes, you should do it online. Yes, you should have more touch points. Yes, you should be sensitive to people and their personal issues. But these are fairly straightforward conversations to have and you can maintain a strong culture. And quite frankly, we've seen cultures getting even stronger because there's a notion of solidarity through that.
What I don't know the answer to and I wish I did have an answer because I think everybody's trying to figure it out is, can you create a new culture, right? Can you change a culture?
And I think about - a lot of times - Andreessen Horowitz as a venture firm, for example. I mean, it's pretty remarkable to create a tier one venture firm in 10 years, right? It just doesn't happen very often.
And then I ask the question, could you do that in the time of COVID, right? Like for sure you can stay being a firm. And so I think it's a great question. I wish I had a better answer, but I would encourage anybody thinking about this to really decouple the two issues of maintaining culture and creating culture.
BD - Yeah, it's going to be fascinating to see how this plays out in the next few years.
So, one of the areas that I've been really interested in for the last few years, and I know you've written about it quite a bit, is AI and machine learning. And also, I love the fact that when you talk about it, you often talk about it in terms of economics. Because I go all the way back to listening to David Clark at MIT talk about the idea that we should all be economists because you can't really understand networking unless you think about the economic impact of your design decisions.
What's your thinking about the economics of AI businesses?
MC - Let me actually start with the economics of cloud businesses because it's very interesting. And then we'll back into the economics of AI.
So let's talk about the life cycle of a company that's built in the cloud. So, you know, Bruce Davie creates Bruce Davie Inc., and comes to Martin Casado and says, "Hey, Martin, you know, give me $2 million to do my company."
BD - The scenario could totally happen by the way.
MC - So, we're like here, Bruce here's 2 million bucks and you're off looking for product market fit. Now you come back and like, there's some interest here, give me $10 million. So I'll give you $10 million and I join your board.
And so now what happens for every board meeting, the questions I ask are: what's growth look like? What features are we shipping?
And so you're telling your R&D organisation, we need to grow, we need new features. And that's basically your entire focus. Never once do we talk about things like gross margin or COGS, right? And COGS is cost of goods sold, like how much it costs to run this stuff. So you're writing horribly suboptimal code on AWS or wherever.
So, you're doing this and then you've got a real business, let's say you're at 50 million in ARR. You're feeling good and now you've got a real financial investor. Like, I'm an investor that invests in ideas and people. And like, right, Bruce is the smartest architect I've ever worked with. Like, this is a great space. Like whatever, I'll invest just in that.
But financial investors actually care about financial metrics, like unit economics, and things like margins matter because they directly impact profitability. So then they look at your company and they're like, hey, listen, this is all great. You've got great growth but basically for every dollar you're making 50 cents goes to Amazon.
You got this huge issue now, which is you spent four years building an R&D team and a business practice around growth because that's what all of the economic incentives including the board have told you to do. And you've got a company now that has low multiples because you've got low margin, right? And then you've got this paradox, what do you do?
So more and more, what we're seeing is that as soon as companies slow down, they do what's called repatriation. They're like, listen, you can't fix that much code. I mean, it took Dropbox years to do it. So we're going to have to find some way to move it onto our own infrastructure or something else to improve those margins.
And I would say there's probably, in SaaS companies as they slow down, billions of dollars trapped in this margin issue for these companies that are going to the cloud.
Okay, so this is the life cycle: you're focusing on growth. You write poor code, you have these kind of relatively thin margins, and now you have to do something about it because it's impacting your valuation. Instead of being valued at $20 billion, you're valued at $10 billion. We're talking about a lot of money that's being impacted.
Okay, so the question with AI/ML to bring it back to that, is it's not clear in the case of the cloud, you could be like, well, I'm paying somebody else, if I do it myself, I can do it cheaper. And maybe if I re-architect it, I do it cheaper.
There's a very open question for AI/ML, which is, is there a way to actually build the company where you, no matter what can have good margins or software level margins.
So I feel like we're going through this escalation of margin crises. You had traditional software, like in the Microsoft era, say, where you ship software at 80% margins because you write it and you ship it. And copying bits is basically free. You have 80% margins. Now we have the cloud era, which I just walked through, where you have a margin crisis where it is possible to run it on good margins.
But it's something that you have to tackle later on, which is why people think about re-architecting and repatriation.
Now we've got the AI/ML one, which is there's a fundamental question. And it's interesting to talk about is, is it even possible, no matter what you do, to do this on good margins? And the reason that question comes up is if you look at what it takes to create a model, and I'm going to shut up in a second, but if you look at what it takes to create one of these data companies, you have to create a model per customer.
BD - Yeah.
MC - And that's incredibly compute intensive to do it. And then to improve the accuracy of any single model, for example, you require just a ton more data. So you basically have a dis-economy of scale, which is in order to stay ahead, you've got to do more and it costs a lot more.
And so I feel like we're having a margin crisis on top of a margin crisis, like all the cloud margin costs and now the actual structural algorithm cost per AI/ML.
BD - So Martin there's one thing I wanted to drill into. There was this thing about repatriation, because I think that's a bit of a controversial topic. The idea that somehow you can run your private infrastructure more cost-effectively than public-cloud infrastructure.
So maybe you could just give us a little bit of insight on why that's true or back up that assertion.
MC - Yeah, so we've now seen a number of cases I think in the industry to really understand what's going on here. And it's an economic argument. It's strictly an economic argument and I kind of penciled it out, but I want to be a little bit clearer which is, if you look at dollars out the door on any given month, it never makes sense to build a data center, right? Because it takes a lot of dollars out the door in order to do that. However, if you look at multiples of revenue to the valuation of a company, all of a sudden it makes sense, right?
So if you've got 80% gross margins and you're growing at 2X, it may be 10X revenue, for the value of the company. And if your COGS, if your margins are say 40%, it could be 5X. Now that could be the difference between a $10 and a $20 billion company. Now, can you build a data center for $10 billion? The answer is yes, you can buy a whole bunch of them, right?
And then if you purpose built a data center for your app, can you make it more efficient than the cloud? Absolutely, because this is standard. The cloud is built for the long tail of small apps because that's what it caters to. This is nobody's fault, here, it's just the system.
So if you have a sufficiently large workload that you've been focusing on growth where the company slows down, let's take any online SaaS service and that's impacting your margins and therefore it's impacting the valuation of the company. It's actually quite easy to make the argument that, from a shareholder standpoint, it's far, far cheaper to build a data center than not.
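As a rough worked example of the arithmetic Martin describes, the sketch below applies the two revenue multiples he quotes (10X at 80% gross margin, 5X at 40%) to a single company. The annual revenue figure of $2 billion is an assumption, chosen only because it reproduces the $10 billion versus $20 billion comparison in the conversation; the names `ANNUAL_REVENUE`, `MULTIPLE_BY_GROSS_MARGIN`, and `valuation` are hypothetical and purely illustrative.

```python
# Illustrative only: the multiples are the ones quoted in the conversation;
# the revenue figure is an assumption picked to match the $10B/$20B example.
ANNUAL_REVENUE = 2_000_000_000  # assumed: $2B in annual revenue

# Revenue multiple associated with each gross margin, as quoted above.
MULTIPLE_BY_GROSS_MARGIN = {0.80: 10, 0.40: 5}


def valuation(revenue: float, gross_margin: float) -> float:
    """Return revenue times the multiple quoted for this gross margin."""
    return revenue * MULTIPLE_BY_GROSS_MARGIN[gross_margin]


high = valuation(ANNUAL_REVENUE, 0.80)  # software-like margins
low = valuation(ANNUAL_REVENUE, 0.40)   # margins eroded by cloud costs
valuation_gap = high - low              # value at stake in fixing margins

print(f"Valuation at 80% margin: ${high / 1e9:.0f}B")
print(f"Valuation at 40% margin: ${low / 1e9:.0f}B")
print(f"Valuation gap: ${valuation_gap / 1e9:.0f}B")
```

Under these assumptions, the gap in valuation is the figure to weigh against the cost of building purpose-built infrastructure, which is the shareholder-level comparison being made, as opposed to comparing monthly dollars out the door.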
BD - Wow, that's brilliant analysis.
We will give people a chance to go and find other ways to dig deeper into some of these topics, but a super interesting discussion Martin.
And I want to say thanks for your time.
MC - Thank you so much.
BD - Alright, cheers. And we'll see you soon.
Earlier this month, Bruce and Larry sat down over Zoom to discuss their shared history researching and developing networking, how "Computer Networks: A Systems Approach" came to be written, and their thoughts on the future of networking. The interview is on YouTube and the transcript follows.
Bruce Davie: So I'm here today with Larry Peterson. I'm drinking coffee as is my normal habit. I'm guessing it's a bit late in the day for you for coffee.
Larry Peterson: Well it isn't too late for a brew of some kind.
Bruce Davie: All right, well, if you want to go and pop up and get a beer, feel free. So, for those of you who don't know, this is Larry Peterson, my co-author on "Computer Networks: A Systems Approach" and Larry and I have known each other since I think early 1990s, when we collaborated on a networking project.
I guess before I go into giving more background about you, I want to know about your cat, since the cat theme has been showing up a lot.
Larry Peterson: He's going to be interrupting here at some point. This is Toby, if you can see him. Yeah, so we live in the desert, which is full of coyotes. Toby's used up a couple of his lives right outside in our backyard. There's a wall, and on the other side of that is open desert. And when he was a few months old a pack of coyotes came into the yard and got him and took him over the wall. My wife went out and started yelling at the coyotes. We think he has some coyote DNA now.
Bruce Davie: So I have a lot of literally warm memories of visiting you in Arizona. I also remember working with you on a joint research paper sometime in the early 90s, and I was locked in my house during an ice storm in New Jersey, and it was literally unsafe to go outside because there was so much ice on the roads. You were probably soaking up the sun in Arizona. But it was one of those early experiences, similar to what we're going through now, where we actually could leverage networking to collaborate from a pretty remote distance.
So I think you've actually been in networking longer than I have. And you're ageing very well, by the way. Can you maybe tell me a little bit about your early experiences with networking?
Larry Peterson: Yeah, let's see. Well, I was a grad student at Purdue. I remember the transition from the ARPANET to the Internet, and my advisor actually handed me a nine-track tape for our VAX that had this thing called TCP/IP on it. So, yeah, that was my first exposure to it.
We are officially shutting down PlanetLab at the end of May, with our last major user community (MeasurementLab) having now migrated to new infrastructure. It was 18 years ago this month (March 2002) that 30 systems researchers got together at the Intel Lab in Berkeley to talk about how we could cooperate to build out a distributed testbed to support our research. There were no funding agencies in the room, no study group, and no platinum sponsors. Just a group of systems people that wanted to get their research done. We left the meeting with an offer from David Tennenhouse, then Director of Research at Intel, to buy 100 servers to bootstrap the effort. In August, the day before SIGCOMM, a second underground meeting happened in Pittsburgh, this time drawing 80 people. The first machines came online at Princeton and Berkeley in July, and by October, we had the 100 seed machines up and running at 42 sites. The rest, as they say, is history.
In retrospect, it was a unique moment in time. The distributed systems community, having spent the previous 15 years focused on the LAN, was moving on to wide-area networking challenges. The networking community, having architected the Internet, was ruminating about how it had become ossified. Both lacked a realistic platform to work on. My own epiphany came during an Internet End-to-End Research Group meeting in 2001, when I found myself in a room full of the Internet’s best-and-brightest, trying to figure out how we could possibly convince Cisco to interpret one bit in the IP header differently. I realized we needed to try a different approach.
PlanetLab enabled a lot of good research, much of which has been documented in the website’s bibliography. Those research results are certainly important, but from my point of view, PlanetLab has had impact in other, more lasting ways. One was a model for how computer scientists can share research infrastructure. Many of the early difficulties we faced deploying PlanetLab had to do with convincing University CIOs that hosting PlanetLab servers had an acceptable risk/reward tradeoff. A happy mistake we made early on was asking the VP for Research (not the University CIO) for permission to install servers on their campus. By the time the security-minded folks figured out what was going on, it was too late. They had no choice but to invent Network DMZs as a workaround.
A second was to expose computer scientists to real-world operational issues that are inevitable when you’re running Internet services. Researchers that had been safely working in their labs were suddenly exposed to all sorts of unexpected user behavior, both benign and malicious, not to mention the challenges of keeping a service running under varied network conditions. There were a lot of lessons learned under fire, with unexpected traffic bursts (immediately followed by email from upset University system admins) a common rite of passage for both grad students and their advisors. I’m not surprised when I visit Google and catch up with former faculty colleagues to hear that they now spend all their time worrying about operational challenges. Suddenly, network management is cool.
Then there were the non-technical, policy-related issues, forcing us to deal with everything from DMCA take-down notices to FBI subpoenas to irate web-surfers threatening to call the local Sheriff on us. These and similar episodes were among the most eye-opening aspects of the entire experience. They were certainly the best source of war stories, and an opportunity to get to know Princeton’s General Counsel quite well. Setting policy and making judgements about content is really hard… who knew.
Last, but certainly not least, is the people. In addition to the fantastic and dedicated group of people that helped build and operate PlanetLab, the most gratifying thing that happens to me (even still today) is running into people--usually working for an Internet company of one sort or another--who tell me that PlanetLab was an important part of their graduate student experience. If you are one of those people and I haven’t run into you recently (or even if I have) please leave a comment and let me know what you’re up to. It will be good to hear from you.