Data Center Fabric: How Facebook Redesigned its Data Center Network
Facebook has built a reputation as a pioneer and innovator, introducing new concepts that eventually stick. In fact, it is safe to say that Mark Zuckerberg’s baby was instrumental in changing the face of the internet as we know it today, fully realizing the potential and meaning of the word “social” in social networking. Ever since its founding in 2004, Facebook has kept introducing new ideas and concepts, and one of the more recent is its new data center networking architecture, which it calls the “data center fabric”.
In this article, you’ll learn 1) why Facebook changed its data center architecture, 2) how the new data center fabric works, and 3) what its benefits are.
A CALL FOR CHANGE
Humans, by nature, are never satisfied. They are bound to look for something more – bigger, better, faster. This is actually a good thing, since it also forces developers to look for better ways to offer their services and products. Despite the fact that Facebook is now one of the largest and most successful businesses in the world, the brilliant team behind it is continuously working hard to improve its features and offerings.
Facebook reported a total of 1.44 billion monthly active users in its First Quarter report for 2015. That is a lot of people using Facebook, so we can only imagine the demands on performance. That number is expected to keep growing, so Facebook is taking steps to optimize its internal application efficiency and keep up with that growth.
The end user normally does not care about what goes on behind the scenes. As long as users can log on to their Facebook accounts, post status updates, read whatever appears on their Timelines, stream videos and do whatever else one normally does on Facebook, they could not care less about the technical stuff.
However, it pays to have at least a rough idea of what makes Facebook work, and much of that comes down to its data center networking architecture.
The Old Design
The previous data center network architecture used by Facebook was hierarchical.
According to Najam Ahmad, Director of Network Engineering at Facebook, the company’s data center network used to follow a server cluster approach, in which many machines and pieces of equipment were grouped into large clusters. There was nothing really wrong with it, and it actually worked quite well, but there were several points the team thought could be better. One particular point he cited was the limit imposed by the size of the available switches.
As usage increased, so did the demand for bigger switches. The relationship was direct: more users meant more bandwidth, which meant more racks added to the cluster, which in turn called for a bigger switch. Facebook could always buy a bigger switch, but eventually it might already own the biggest switch on the market and still need a bigger one. Sooner or later, it would hit what Ahmad called a “dead end”.
There is also the issue of Facebook being at the mercy of the vendors supplying its networking devices. Only a limited number of vendors make very large networking devices, and there is a real possibility that none of them would be able to supply a switch big enough to accommodate Facebook’s ever-growing clusters.
Speed could also be compromised. When too many ports share limited uplink capacity, the links become overloaded, or oversubscribed, and as most of us know from personal experience, overloads degrade performance and can lead to hardware and software failures.
Thus, a change was needed: something that would address these issues and actually provide more benefits to all users. Hence, the Data Center Fabric.
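The “dead end” can be made concrete with a little arithmetic. The sketch below is a hypothetical scale-up model, not Facebook’s actual numbers: it assumes each rack needs a fixed number of ports on one cluster switch, so the largest possible cluster is capped by the port count of the biggest switch any vendor sells (the 576-port figure is purely illustrative).

```python
def max_racks(largest_switch_ports: int, ports_per_rack: int) -> int:
    """Largest cluster the biggest available switch can serve (scale-up model)."""
    return largest_switch_ports // ports_per_rack


def fits(racks: int, ports_per_rack: int, largest_switch_ports: int) -> bool:
    """True while a single switch can still terminate every rack's uplinks."""
    return racks * ports_per_rack <= largest_switch_ports


if __name__ == "__main__":
    LARGEST_SWITCH_PORTS = 576  # hypothetical: the biggest switch on the market
    print(max_racks(LARGEST_SWITCH_PORTS, ports_per_rack=2))  # 288
    # Growing past that point has no answer except a switch that does not exist.
    print(fits(300, ports_per_rack=2, largest_switch_ports=LARGEST_SWITCH_PORTS))  # False
```

Required ports grow linearly with racks, while the ceiling is fixed by what vendors ship, which is exactly the dependency the pod design removes.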
THE DATA CENTER FABRIC
Instead of a hierarchical system that risks oversubscription, the new design treats the whole Facebook data center as a single high-performance network. The role of the data center fabric is simple: to let Facebook handle all the traffic it gets without the system slowing down. Basically, it distributes traffic across Facebook’s data center and, in the event that something goes wrong, helps keep the data center running while the problem is fixed.
In the new design, Facebook has decided to veer away from its server cluster approach and, instead, make use of a core-and-pod approach where the groupings of machines and equipment are replaced by smaller physical pods. This is not really an entirely new approach, since it is also used by Google and eBay. Facebook just came up with its own redesign concept that serves its specific needs and user demands.
In this design, “core” refers to the data center network core. The standard unit of network, compute and storage is the server pod, which Ahmad describes as “just like a layer3 micro-cluster”, and each pod contains racks of servers. These identical pods are interconnected with high-performance links within the data center, and each pod is served by four fabric switches, also known as “Six-Packs”. These fabric switches retain the features and advantages that the old design offered for server rack TOR uplinks. Thanks to them, every node on the network is connected and can share data easily, and there is room to scale in the future if needed. Essentially, one server pod has 4 fabric switches and 48 top of rack (TOR) switches. That is an impressive drop from the previous Facebook data center design, whose clusters consisted of 255 racks.
Instead of moving to a larger switch to accommodate increasing bandwidth and power usage, Facebook deploys more pods, as many as needed. This setup is preferable because there is no inherent limit to how many pods can be deployed or how often, unless, of course, power or physical space runs out.
As explained by the designers of the data center fabric, they created four independent “planes” of spine switches, and each plane can scale up to 48 independent spine devices. One pod has four fabric switches, and each of these fabric switches connects to every spine switch within its own plane. Together, these pods and planes make up a modular network that can contain hundreds of thousands of servers.
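The pod layout can be sketched as a small graph. This Python sketch uses the article’s counts (4 fabric switches, 48 TOR switches) and assumes, as a simplification, that every TOR switch has exactly one uplink to each of its pod’s fabric switches; the naming scheme (`pod0-tor5`, `pod0-fsw2`) is made up for illustration.

```python
from collections import Counter
from itertools import product

FABRIC_SWITCHES_PER_POD = 4  # per the article
TORS_PER_POD = 48            # per the article


def pod_links(pod_id: int) -> list[tuple[str, str]]:
    """Return (tor, fabric_switch) links for one pod.

    Assumption: each TOR has one uplink to every fabric switch in its pod,
    so the two tiers form a full mesh.
    """
    tors = [f"pod{pod_id}-tor{t}" for t in range(TORS_PER_POD)]
    fabrics = [f"pod{pod_id}-fsw{f}" for f in range(FABRIC_SWITCHES_PER_POD)]
    return list(product(tors, fabrics))


if __name__ == "__main__":
    links = pod_links(0)
    uplinks_per_tor = Counter(tor for tor, _ in links)
    downlinks_per_fsw = Counter(fsw for _, fsw in links)
    print(len(links))                      # 192 TOR-to-fabric links per pod
    print(uplinks_per_tor["pod0-tor0"])    # 4 uplinks per TOR
    print(downlinks_per_fsw["pod0-fsw0"])  # 48 downlinks per fabric switch
```

Because every pod comes from the same template, adding capacity means calling `pod_links` with a new pod ID rather than redesigning anything.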
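A rough way to see the new growth model: the number of pods that can still be added is bounded only by power and floor space, never by switch size. The power figures and slot counts below are hypothetical and only illustrate which constraint binds first.

```python
def deployable_pods(power_budget_kw: float, pod_power_kw: float,
                    free_floor_slots: int) -> int:
    """How many more pods fit: whichever of power or floor space runs out first.

    There is no switch-size term: each pod is a fixed, self-contained unit.
    """
    if pod_power_kw <= 0:
        raise ValueError("pod_power_kw must be positive")
    by_power = int(power_budget_kw // pod_power_kw)
    return max(0, min(by_power, free_floor_slots))


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(deployable_pods(5000, 600, free_floor_slots=10))  # 8  (power-bound)
    print(deployable_pods(5000, 400, free_floor_slots=10))  # 10 (space-bound)
```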
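The plane wiring rule can be expressed in a few lines. The limits of 4 planes and 48 spine switches per plane come from the article; the sketch assumes that fabric switch number f in every pod belongs to plane f (the natural way four fabric switches per pod map onto four planes) and uses made-up device names.

```python
PLANES = 4                 # per the article: four independent spine planes
MAX_SPINES_PER_PLANE = 48  # per the article: up to 48 spine devices per plane


def spine_links(num_pods: int, spines_per_plane: int) -> list[tuple[str, str]]:
    """Return (fabric_switch, spine_switch) links for the whole building.

    Assumption: fabric switch f of every pod sits in plane f and connects
    to every spine switch in that plane, and to no other plane.
    """
    if not 0 < spines_per_plane <= MAX_SPINES_PER_PLANE:
        raise ValueError("a plane holds between 1 and 48 spine switches")
    links = []
    for pod in range(num_pods):
        for plane in range(PLANES):
            fsw = f"pod{pod}-fsw{plane}"
            for s in range(spines_per_plane):
                links.append((fsw, f"plane{plane}-ssw{s}"))
    return links


if __name__ == "__main__":
    links = spine_links(num_pods=10, spines_per_plane=48)
    print(len(links))  # 1920 = 10 pods x 4 planes x 48 spines
    # Each spine switch sees exactly one fabric switch from every pod.
    print(sum(1 for _, ssw in links if ssw == "plane2-ssw0"))  # 10
```

Adding a pod adds one link to every spine in each plane, and adding a spine adds one link from every pod’s fabric switch in that plane, so both dimensions grow independently.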
Facebook’s infrastructure design philosophy is pretty simple: move fast and support the rapid growth of both “machine to user” and “machine to machine” traffic. The changes to Facebook’s data centers primarily address the latter.
BENEFITS OF THE DATA CENTER FABRIC
Change is good, but how good is this one, exactly? What improvements can we expect from the redesigned architecture?
Data center space is maximized.
This applies to both network capacity and physical floor space.
The new design helps Facebook make better use of its existing data center space. It is no longer limited by the size of the available switches, because it deploys pods instead.
Unlike the clusters, the pods are much smaller, so they can efficiently fit into pretty much any data center floor plan. It also dispenses with the need to look for and secure larger switches, because the four fabric switches that are required are only of basic or average size.
Worries about physical space are also considerably lessened. Because every pod is identical, there is no need to work out a custom floor plan for each expansion, figuring out where to physically put which machine: the pod design serves as a repeatable layout, and all that needs to be done is follow it.
Provision of room for growth.
Scalability was taken into consideration in every aspect of the data center fabric’s design. It was designed to be highly modular, so capacity can be scaled within the framework by adding server pods, edge pods, spine switches or edge switches, or by scaling uplinks.
In order to make room for gradual scalability, in anticipation of increased user demand (and it will happen, if the current trend continues), the entire network was designed as an end-to-end non-oversubscribed environment. This allows Facebook to increase its capacity bit by bit, or in greater leaps, depending on what the circumstances demand.
Simple, robust and modular internal architecture.
Contrary to initial reactions to Facebook’s new data center fabric, the new design is actually simple and robust. The engineering team built the data center fabric with standard BGP4 as its routing protocol. The connectivity requirements between the core and the pods are kept as generic as possible to preserve design freedom, in case the technology used in the pods needs to change in the future. (At the rate technology is changing, that is quite likely.) The in-house developers also proudly claim to have stuck to using “only the minimum necessary features”, again for scalability and to keep performance optimized.
Port density was one of the issues in the previous design. The core-and-pod approach of the data center fabric addresses it by giving the fabric switches a significantly smaller port density, which keeps their internal architecture simple.
In the cluster approach, one of the pet peeves of anyone maintaining the physical data center is a complex and often confusing cabling infrastructure. Too much equipment connected by wires and cables can be a source of headaches. The new Data Center Fabric eases this problem: cables are shorter, fewer machines are involved, and the web of wires and cables is far less complicated.
This also shortens the time it takes to bring the network up. The Data Center Fabric was first implemented in Facebook’s data center facility in Altoona, and the results show that site network turn-up took considerably less time there.
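BGP4 on its own advertises which paths exist; a fabric with many parallel uplinks also needs a way to spread traffic across them. A common technique in BGP-based fabrics, assumed here rather than taken from the article, is equal-cost multipath (ECMP) routing, where a switch hashes each flow’s 5-tuple to pick one of several equally good next hops. The CRC32 hash and device names below are illustrative choices.

```python
import zlib


def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  proto: int, next_hops: list[str]) -> str:
    """Choose an equal-cost next hop by hashing the flow's 5-tuple.

    Every packet of a flow hashes to the same path, so packets are not
    reordered, while different flows spread across all available paths.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    uplinks = [f"fsw{i}" for i in range(4)]  # a TOR's four uplinks (hypothetical names)
    used = {ecmp_next_hop("10.0.0.1", "10.0.1.9", port, 443, 6, uplinks)
            for port in range(10000, 11000)}
    print(sorted(used))
```

If an uplink fails, removing it from `next_hops` re-hashes flows over the remaining uplinks, which is one reason a design with many identical paths tolerates failures well.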
Better network performance.
Network performance also improves, since for every downlink port to a top of rack switch, an equal amount of uplink capacity is reserved on the pod’s fabric switches. The uniform design of the pods and the way they connect together also contribute greatly to network performance. This modular network can scale to “multi-petabit bisection bandwidth”, so oversubscription is not expected to be an issue.
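Oversubscription is simply the ratio of capacity entering a switch from below to capacity leaving it upward. The sketch below uses the article’s counts (48 TOR downlinks per fabric switch, and one uplink to each of up to 48 spine switches in its plane) and assumes 40G ports throughout, since the article gives no port speeds. The last example is a hypothetical oversubscribed switch for contrast.

```python
from fractions import Fraction


def oversubscription(downlink_ports: int, downlink_gbps: int,
                     uplink_ports: int, uplink_gbps: int) -> Fraction:
    """Downlink capacity divided by uplink capacity; 1 means non-oversubscribed (1:1)."""
    return Fraction(downlink_ports * downlink_gbps, uplink_ports * uplink_gbps)


if __name__ == "__main__":
    # Fully built plane: 48 TOR downlinks, 48 spine uplinks (40G assumed).
    print(oversubscription(48, 40, 48, 40))  # 1
    # Half-populated plane: only 24 spines installed so far.
    print(oversubscription(48, 40, 24, 40))  # 2
    # Hypothetical older aggregation switch: 96 x 10G down, 8 x 40G up.
    print(oversubscription(96, 10, 8, 40))   # 3
```

The middle case shows why the design allows capacity to grow bit by bit: spines can be added to a plane until the ratio reaches 1:1.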
Improved external connectivity.
The data center fabric also pays attention to external connectivity: a number of edge pods provide up to 7.68 Tbps of capacity on Facebook’s data center sites, and these edge pods are scalable to 100G port speeds. External connectivity can be further enhanced by moving to higher port speeds.
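The trade-off behind higher port speeds is easy to quantify: for a fixed capacity target such as the 7.68 Tbps quoted above, faster ports mean fewer of them. The 40G and 100G speeds below are only comparison points; the article does not say how the edge pods’ ports are actually split.

```python
EDGE_CAPACITY_GBPS = 7_680  # 7.68 Tbps, per the article


def ports_needed(capacity_gbps: int, port_speed_gbps: int) -> int:
    """Minimum number of ports of a given speed to carry the target capacity."""
    return -(-capacity_gbps // port_speed_gbps)  # integer ceiling division


if __name__ == "__main__":
    for speed in (40, 100):
        print(speed, ports_needed(EDGE_CAPACITY_GBPS, speed))
    # 40 192
    # 100 77
```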
Better performance of applications.
An application can only perform as well as the network it runs on. Even if the application is excellent, network problems will keep users from making the most of its features.
The new data center fabric addresses this limitation by letting application developers build apps without the constraints of a cluster-type environment. The result should be more flexible, faster and better applications.
Simple to operate and low-maintenance.
This is primarily an advantage for the engineers of Facebook. By nature, systems and networks are complex. However, thanks to automation and design principles, the level of complexity can be greatly reduced.
Traditionally, when problems arise within the network, engineers have to step in and fix them. The Data Center Fabric was built with in-house configuration-management software that can automatically configure a white-box switch according to Facebook’s specifications, so engineers no longer need to be on hand to tweak or adjust each device.
The same is true of the scaling process. Engineers still have to physically install new pods, switches or devices, but they no longer have to set each one up by hand: the software recognizes the new addition and configures it automatically, again conforming to Facebook’s specs.
The engineers are also spared from having to manually troubleshoot problematic boxes. The configuration-management software can simply take a problematic box out of service and bring a freshly configured replacement online. It’s like wiping the slate clean and starting afresh.
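The workflow described above (detect a new box, configure it from a standard template, and pull a bad box out of service instead of debugging it) can be sketched as a toy inventory. Facebook’s actual configuration-management software is not described in detail here, so every class, method and hostname format below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    hostname: str
    role: str
    in_service: bool = True


@dataclass
class Fabric:
    """Toy inventory (hypothetical): configures newly detected boxes from a
    template and takes failed ones out of service."""
    devices: dict[str, Device] = field(default_factory=dict)

    @staticmethod
    def render_config(role: str, pod: int, index: int) -> dict:
        # Every box of a given role gets the same template; only identifiers differ.
        hostname = f"{role}{index:02d}.pod{pod:03d}"
        return {"hostname": hostname, "role": role, "routing": "bgp"}

    def on_detected(self, role: str, pod: int, index: int) -> dict:
        """Called once a new box is racked, cabled and seen on the network."""
        config = self.render_config(role, pod, index)
        self.devices[config["hostname"]] = Device(config["hostname"], role)
        return config

    def drain(self, hostname: str) -> None:
        """Take a misbehaving box out of service instead of debugging it by hand."""
        self.devices[hostname].in_service = False


if __name__ == "__main__":
    fabric = Fabric()
    cfg = fabric.on_detected("fsw", pod=1, index=2)
    print(cfg["hostname"])  # fsw02.pod001
    fabric.drain(cfg["hostname"])
    print(fabric.devices[cfg["hostname"]].in_service)  # False
    # A replacement box in the same slot receives the identical configuration.
    fabric.on_detected("fsw", pod=1, index=2)
    print(fabric.devices[cfg["hostname"]].in_service)  # True
```

Because configurations are generated rather than hand-edited, a replacement box comes up identical to the one it replaces, which is what makes “drain and replace” cheaper than troubleshooting.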
Cost and functions streamlining.
Think of all the money that goes into purchasing or leasing space for your data center. If you have many large machines, you naturally need a large space to hold them all. The Data Center Fabric’s small pods use physical space more efficiently, which is a source of cost savings.
Similarly, engineers no longer have to devote most of their working hours to monitoring the data center. Since the Data Center Fabric is largely self-regulating, thanks to the configuration-management software, they can concentrate on newer projects instead.
The devices are smaller, and they are simpler. Thus, they are easier to troubleshoot. There is really no need to bring in the “big guns” and spend a lot of money to fix massive problems, since they can be taken care of at the lower level.
Ready sources of devices and components.
In the previous architecture, one of the stumbling blocks was the limited number of sources of the large-sized networking devices for the clusters. Since the components in the Data Center Fabric are small to mid-size, there are multiple sources or vendors to choose from.
Normally, upgrading your data center also entails upgrading to high-end hardware. The simplicity of the new Data Center Fabric made it possible for Facebook to stick to simple (and less expensive) hardware.
Facebook’s network engineering team succeeded in creating a data center networking architecture that is effective and efficient while keeping things simple. Although implementation is still on a limited scale, it is expected to be rolled out across Facebook’s data centers. Many people’s initial reaction is that the new design is specific to Facebook. For now, perhaps, it is, but it may well be applicable to other enterprises and companies too. After all, the principles behind it apply to pretty much any data center, not just Facebook’s.