The Modern Data Center - Are Blade Servers Dead?

Back in the late '90s and early 2000s, as companies started to purchase Intel CPU based servers in larger and larger quantities, they started to experience some pain.

That pain came in two forms. The first was "server sprawl" of their x86 compute, and the second was managing all the power and networking requirements that the sprawl created.

Back then, most 'rack servers' were 2U in size - with 4U and 6U units that supported larger local storage or 4 CPU sockets. If the application you were building needed dozens or hundreds of servers, you were eating rack space, power and cooling capacity pretty quickly. And it was hard to get to "full density" - meaning completely filling a 42U rack.

And while the 1U or pizza box rack server form factor helped with rack server density, the power and cabling often became challenging when trying to get 30 or more of these servers in a 42U rack.

Companies like RLX came up with the concept of the "blade server". With blade servers, a common chassis provided power, networking and management to a portfolio of servers. Chassis ranged from 6U to 10U in size and could hold 8 to 16 or more blades. Compared to using 2U rack servers, compute density was greatly improved. For example, you could have 16 two-socket servers in 10U, compared to 32U for the same number of 2U rack servers. A huge density boost.
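The density comparison above is simple arithmetic, and a quick Python sketch makes it concrete. This is only a back-of-the-envelope illustration: the function name `rack_units_needed` is made up for this post, and the inputs (16 blades in a 10U chassis versus 2U per rack server) are just the example figures from the paragraph above.

```python
def rack_units_needed(servers: int, units_per_server: int) -> int:
    """Rack units consumed by standalone rack servers of a given height."""
    return servers * units_per_server


# Example figures from the text: 16 two-socket blades in one 10U chassis.
blades = 16
blade_chassis_u = 10

# The same 16 servers built as 2U rack servers.
rack_server_u = rack_units_needed(blades, 2)

# How many times more rack space the 2U servers need than the blade chassis.
density_gain = rack_server_u / blade_chassis_u

print(f"2U rack servers: {rack_server_u}U, blade chassis: {blade_chassis_u}U")
print(f"Blades fit the same servers in 1/{density_gain:.1f} of the space")
```

Swap in your own chassis size and blade count to see how the comparison shifts for a 6U or 8-blade design.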

And by consolidating the power and networking functions you could also reduce some of the complexity - while still providing excellent performance.   

Soon all of the major players had a "blade" offering - HP (who bought RLX), Dell, IBM, Hitachi, Fujitsu. But blades also made a concession to the other big growing standard, and that was SAN. To get all of that compute density, most blade servers had 1, maybe 2, HDDs that were primarily used for the OS. For all other storage, you needed the blade servers to connect to external storage via either iSCSI or Fibre Channel, or to mount remote volumes via CIFS or NFS.

And while blades did increase density, depending on your configuration they may not have reduced network cabling as much as some wanted. And because of the compute density, they needed expensive networking modules to provide connectivity - especially for SAN.

A further advancement came with Cisco's UCS Blade Solutions when they added Converged Networking - combining Ethernet and SAN traffic on the same wire running from the blade chassis to what they called a Fabric Interconnect at top of rack.  

And so while blades did provide value in increased density - they were also expensive. When you added in the chassis, power supplies, networking modules and then the blades themselves, plus management, licensing, support, etc. - a fully populated blade chassis could easily run $150K - $200K or more.

And often customers would buy the chassis but not fill it - in theory, to provide for future expansion. I would often see a 10U chassis holding 4 blades. In those cases neither the cost nor the density value worked.
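To put rough numbers on the density side of that, here is a small Python sketch. The helper `servers_per_u` is a made-up name for illustration, and the figures are the ones used in this post: a 10U chassis that holds up to 16 blades, the 4-blade case I described, and a plain 2U rack server for comparison.

```python
def servers_per_u(servers: int, rack_units: int) -> float:
    """Servers delivered per rack unit of space consumed."""
    return servers / rack_units


full_chassis = servers_per_u(16, 10)    # fully populated 10U blade chassis
partial_chassis = servers_per_u(4, 10)  # same chassis holding only 4 blades
rack_2u = servers_per_u(1, 2)           # a single 2U rack server

print(f"Full chassis:    {full_chassis:.2f} servers/U")
print(f"4-blade chassis: {partial_chassis:.2f} servers/U")
print(f"2U rack server:  {rack_2u:.2f} servers/U")
```

With only 4 blades installed, the chassis delivers fewer servers per U than ordinary 2U rack servers would, so the density argument for buying blades disappears until the chassis is filled further.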

Even with that - blades became hugely successful and it is rare to walk into a corporate data center today and not see them....

At the same time that blades were becoming popular, another new approach to data center compute and storage was starting to take shape. And that was coming from the big web companies like Google, Facebook, Yahoo, etc. These companies had compute and storage demands that were orders of magnitude higher than anything seen in the corporate world.

One thing these companies quickly realized was that the standard offerings from folks like HP, Dell, IBM, Cisco, etc. weren't going to work for them. They wouldn't scale to the thousands of nodes they needed at a price that would work for their business models. They needed something different.

So starting with Google - companies started to develop their own servers, by working with Original Design Manufacturers (ODMs).  Often these were simple 1U servers with no chassis that could be installed in very high density. The blade concept morphed into what many started calling sleds.  These were very simple 2 socket servers with fixed components like HDD/SSD and LOM designed to run Linux and effectively be disposable.  

In addition the software they were developing took a completely different approach from what was the norm in Corporate IT.  Their software didn't require all sorts of hardware redundancy for high availability.   The app itself provided the redundancy and if a node failed - yawn - just replace the node.  So large numbers of cheap nodes were better.  They didn't want or need all of the hardware high availability complexity or cost.

One of the configurations that came out of this approach is what is commonly called a 4-in-2. This configuration provides 4 compute sleds in a 2U chassis, with no multi-channel backplanes or large networking modules. This approach increased density again - so versus 16 blades in 10U of rack space, you could have 20 sleds - a 25% increase. Many of these 4-in-2 form factors also support up to 24 x 2.5" HDDs. And while that was great for certain applications, there was still a need for dense, cheap storage as well.
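The 25% figure falls out of a couple of lines of arithmetic. Here is a minimal Python sketch of it; `sleds_in_space` is a hypothetical helper name, and the defaults (4 sleds per 2U chassis) and the 16-blade, 10U baseline are the example numbers from this post rather than any particular vendor's spec.

```python
def sleds_in_space(rack_units: int, chassis_u: int = 2, sleds_per_chassis: int = 4) -> int:
    """Number of sleds that fit when whole chassis are stacked in the given space."""
    return (rack_units // chassis_u) * sleds_per_chassis


blade_count = 16                       # blades in a 10U blade chassis
sled_count = sleds_in_space(10)        # 4-in-2 chassis stacked in the same 10U

increase_pct = (sled_count - blade_count) / blade_count * 100

print(f"{sled_count} sleds vs {blade_count} blades in 10U: {increase_pct:.0f}% more servers")
```

Because the helper uses whole chassis only, an odd amount of space (say 11U) still gives the same answer as 10U.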

Continuing on that theme, Intel developed what they called the Rack Scale Architecture (RSA - now called Rack Scale Design), and companies like Facebook and Microsoft formed the Open Compute Project (OCP) to try and address a standard approach and hardware specification for high density computing and storage. The most recent OCP 2.1 specification utilizes a 12U "chassis" that can support up to 24 compute or storage sleds.

The one thing you'll notice about all of these designs is the lack of Fibre Channel SAN Networking & Storage Arrays.  Storage is either local to a node or the node is direct attached to a storage sled to create a "storage server" - typically running an open source object store.  Scale and availability is provided by the software - running across many nodes.  

Most recently, the next generation of the sled model has a 2U chassis that can support up to 8 one-socket servers, 4 two-socket servers, 2 four-socket servers or 16 microservers. And unlike the 4-in-2 design I mentioned above, it can also hold storage sleds - some designs allow for 3 sleds of 16 x 2.5" drives each per chassis. Using 3.8TB MLC SSDs, for example, you could have up to 182TB of SSD in a 2U chassis.
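For anyone who wants to check the 182TB number or try other drive sizes, here is a short Python sketch. The function name `raw_capacity_tb` is made up for illustration, and the inputs (3 storage sleds, 16 drives per sled, 3.8TB per SSD) are just the example values from this post. The result is raw capacity only, before any overhead from formatting, RAID or replication.

```python
def raw_capacity_tb(sleds: int, drives_per_sled: int, drive_tb: float) -> float:
    """Raw (unformatted, unprotected) capacity of one chassis in terabytes."""
    return sleds * drives_per_sled * drive_tb


total_drives = 3 * 16
capacity = raw_capacity_tb(sleds=3, drives_per_sled=16, drive_tb=3.8)

print(f"{total_drives} drives, roughly {capacity:.0f}TB raw in one 2U chassis")
```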

And versus the large chassis, these new models have a much lower entry cost - since the chassis is much less complex than a traditional blade chassis, and you also don't need large and expensive networking modules.

Bottom line is this - sled-based compute and storage offerings are coming into the market that are inspired by the experiences of the large web-scale companies like Google, Facebook, AWS and MS Azure. They are lower cost per unit and provide extremely high density - with the ability to support thousands of compute cores and/or petabytes of storage per rack. This model will eventually kill off what we now think of as blade servers.
