Open Compute Project - Web Scale Computing

The Open Compute Project (OCP) was formed in 2011 by Jonathan Heiliger at Facebook as a way for the largest web-scale players - cloud providers, hardware vendors, and others - to share designs and to drive energy efficiency and innovation across this unique industry.

Members today include Facebook, Intel, Google, Microsoft, Apple, Lenovo, Cisco, Juniper, and others.  These are not your normal data center consumers - Microsoft Azure alone runs over 1 million servers and counting.  For these folks, metrics like PUE (Power Usage Effectiveness) are critical, not only from a business standpoint but also for overall environmental impact.  Saving 1-2 watts per server is huge when you multiply it across server farms of that size.
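
As a quick refresher, PUE is simply total facility power divided by the power that actually reaches the IT equipment, with 1.0 being the theoretical ideal. A minimal sketch with made-up numbers for illustration (nothing here comes from an OCP spec):

```python
# PUE = total facility power / IT equipment power (1.0 is the ideal).
# The wattage figures below are illustrative only.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

# A traditional enterprise data center might land around 1.7,
# while web-scale facilities report figures much closer to 1.1.
print(pue(1700, 1000))   # 1.7
print(pue(1100, 1000))   # 1.1
```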

The goal of the group is to develop standards across servers, storage, networking, racks, and the data center itself that drive PUE down and costs down.  The expectation is that the innovations coming out of this group will find their way into the traditional SKU-based servers that corporate customers buy.

Prior to the formation of the OCP, each of these web-scale companies approached Original Design Manufacturer (ODM) companies with their own unique specs for servers, storage, and so on, and had those ODMs build their solutions.  Customers like these do not really buy traditional Original Equipment Manufacturer (OEM) servers from the likes of Dell or HP.  In fact, some of the traditional OEM companies have spun off separate ODM divisions to address the needs of this market.  Think about buying 10K or more servers at a time - every month.

The current OCP server "standard", known as the Open CloudServer (OCS), is a 12U chassis that can support 24 compute modules or a mix of compute and storage modules.  The most recent 2.1 specification was submitted by Microsoft on Feb 9, 2016, and the details are here: http://www.opencompute.org/wiki/Server/SpecsAndDesigns

Compute modules use the latest Intel Haswell architecture in a 2-socket design with 16 x DDR4 DIMMs, plus up to 4 PCIe risers, each of which can support 2 x M.2 SSDs along with 4 x HDDs or SSDs.

Storage modules provide 10 x HDDs and support SAS expansion of up to 8 modules per server.

The chassis is supported by 6 x 1600W battery-backed power supplies and a centralized management module. There are also chassis I/O modules that can support 10Gbps/40Gbps uplinks and SAS, which are combined with mezzanine I/O cards on the modules.
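
As a rough sanity check on the power envelope - a tiny sketch, assuming all six supplies are active and ignoring whatever redundancy scheme the spec actually prescribes:

```python
# Rough chassis power math; assumes all 6 supplies active, no redundancy margin.
supplies = 6
watts_per_supply = 1600
chassis_watts = supplies * watts_per_supply      # 9,600W per chassis
modules = 24
print(chassis_watts, chassis_watts / modules)    # 9600, 400.0W per module slot
```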

Using the OCS specification, up to 4 chassis can be installed in a single rack - 96 compute modules in total.  If you max it out with the prescribed E5-2600 v3 14-core CPUs and 16 x 32GB DIMMs (512GB per module), that works out to 2,688 cores and 49,152GB of RAM per rack.  By comparison, filling a rack with 42 x 1U servers using the same CPU and RAM spec would only provide 1,176 cores and 21,504GB of RAM.
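
The math behind those rack numbers is straightforward - a quick sketch using the module counts and per-node specs from the paragraph above:

```python
# Rack density math for the figures above.
cores_per_cpu = 14          # E5-2600 v3, 14 cores
sockets_per_node = 2
ram_per_node_gb = 16 * 32   # 16 x 32GB DIMMs = 512GB

# OCS: 4 chassis x 24 compute modules = 96 nodes per rack
ocs_nodes = 4 * 24
print(ocs_nodes * sockets_per_node * cores_per_cpu)  # 2,688 cores
print(ocs_nodes * ram_per_node_gb)                   # 49,152 GB

# Traditional: 42 x 1U servers per rack
u1_nodes = 42
print(u1_nodes * sockets_per_node * cores_per_cpu)   # 1,176 cores
print(u1_nodes * ram_per_node_gb)                    # 21,504 GB
```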

The OCS team has also submitted a 1.0 specification for a quad server module that provides 4 x 1S (single-socket) SoC compute components per module. While each 1S compute component is less powerful than the 2S version and supports fewer DIMMs, they offer the potential for even higher-density, light-load compute.

On the storage front there are some very interesting designs for Vault and Cold storage, and for the use of Open Storage with Ethernet Devices (OSED) - think HDDs with an Ethernet interface as a potential replacement for SAS/SATA. http://www.opencompute.org/wiki/Storage#Specs.2C_Designs_and_Presentations

From a networking standpoint, the OCP standards are based on a spine/leaf topology using open networking switches that support ONIE, run an open Linux distribution as their base OS, and are then managed via SDN.  Many of the switches themselves come from vendors you may never have heard of - Alpha Networks, EdgeCore, Inventec - alongside Dell and Broadcom.
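
To make the spine/leaf idea concrete, here is a minimal sketch of the kind of sizing math involved. The port counts below (48 x 10G downlinks and 6 x 40G uplinks per leaf, 32-port spines) are hypothetical and not taken from any OCP switch spec:

```python
# Hypothetical leaf switch: 48 x 10G server-facing ports, 6 x 40G uplinks to spines.
downlink_gbps = 48 * 10
uplink_gbps = 6 * 40

# Oversubscription ratio: server-facing bandwidth vs. uplink bandwidth.
print(f"oversubscription {downlink_gbps / uplink_gbps:.1f}:1")   # 2.0:1

# Every leaf connects to every spine, so the spine count is bounded by the
# uplinks per leaf, and the leaf count by the ports per spine.
spine_ports = 32             # hypothetical 32 x 40G spine switch
max_leaves = spine_ports     # one uplink from each leaf to each spine
print(f"up to {max_leaves} leaves, {max_leaves * 48} x 10G server ports")
```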

As a layer in the overall switch stack, Microsoft, Dell, Facebook, Broadcom, Intel, and Mellanox introduced the Switch Abstraction Interface (SAI).  SAI is a standardized API that lets network hardware vendors develop innovative hardware architectures to achieve great speeds while keeping the programming interface consistent.
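
To illustrate the idea (not the actual SAI C API - the class and method names below are invented for this sketch), the pattern is a vendor-neutral interface that the network OS programs against, with each ASIC vendor supplying its own implementation underneath:

```python
from abc import ABC, abstractmethod

# Illustrative only: SAI itself is a C API; these names are made up here
# to show the abstraction pattern, not real SAI calls.

class SwitchAbstraction(ABC):
    """Vendor-neutral interface the network OS programs against."""

    @abstractmethod
    def create_vlan(self, vlan_id: int) -> None: ...

    @abstractmethod
    def add_route(self, prefix: str, next_hop: str) -> None: ...

class VendorXAsic(SwitchAbstraction):
    """Each ASIC vendor ships its own implementation of the same interface."""

    def create_vlan(self, vlan_id: int) -> None:
        print(f"programming VLAN {vlan_id} into vendor X tables")

    def add_route(self, prefix: str, next_hop: str) -> None:
        print(f"installing {prefix} -> {next_hop} in vendor X FIB")

# The network OS only ever sees SwitchAbstraction, so swapping ASIC
# vendors does not change the control-plane code above it.
asic: SwitchAbstraction = VendorXAsic()
asic.create_vlan(100)
asic.add_route("10.0.0.0/24", "192.168.1.1")
```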

Additionally, at this year's OCP Summit, Microsoft presented a piece of software known as SONiC, or Software for Open Networking in the Cloud.  SONiC is a collection of software and tools that provides L2/L3 functionality in a standard fashion, so that individual controllers in an SDN design can work with a number of switch vendors.

The bottom line is that what used to be considered the great mystery of how these web-scale providers design, build, and configure their data centers is now available to anyone. And, if you are so inclined, you can buy these solutions - folks like Hyve and Stack Velocity are great third-party shops that handle all the racking, cabling, and so on.

Finally, some of the design concepts driven by this group are making their way into consumable servers for corporations.  To me, the best example is Dell's FX2 line.  While the FX2 is only a 2U chassis versus 12U, many of the same principles apply, with a similar collection of compute and storage modules and networking options.



