Continuous Hardware Software Validation: Flex–Mirantis Use Case

By Dharmesh Jani, VP of Ciii Labs and Strategy, Flex

Many would agree that, until recently, Amazon and Microsoft were the primary choices for public cloud infrastructure-as-a-service and platform-as-a-service. VMware has been the primary cloud vendor on the enterprise side, with its vSphere suite and vCloud Air for hybrid deployments. Over the past six years, OpenStack has emerged as the only viable alternative to these choices.

According to Forrester, OpenStack is positioned to become the fifth major public cloud camp, along with AWS, Azure, Google, and VMware. Multiple OpenStack production deployments are already running cloud infrastructure, both at large hosters like Rackspace and at companies like PayPal and Walmart that use it for internal private clouds. A number of service providers, including AT&T, are deploying OpenStack to leverage open source technology in managing operational and capital expenses. Most of these deployments require considerable engineering resources to ensure that a production distribution is validated against a targeted hardware platform. This effort is often duplicated across multiple end users, each rediscovering the problems of others. This duplication across global adopters creates barriers to rapid distribution upgrades across the industry as OpenStack evolves.

OPSV - Platform Validation

It helps to separate the challenge of running CI/CD from the validation work of running the latest OpenStack software changes against the latest hardware components and drivers as they are updated. To do that, we need a common platform on which hardware vendors and OpenStack vendors can drop their respective updates for CI/CD. We can classify this as a new kind of activity called continuous validation (CV).

The moving parts on the hardware side have to be continuously validated against the moving parts of the software stack that runs on them, to ensure that nothing breaks as both iterate. OpenStack’s official release cadence is six months; however, it takes users almost that long to get the distribution certified on the next hardware SKU. This often delays the validation cycle and, with it, end user timelines. Vendors of host bus adapters (HBAs), NICs, or motherboards frequently release driver updates that may not be compatible with the latest release of OpenStack, because OpenStack developers did not have access to these drivers during the development cycle. This disconnect between the ecosystem of hardware vendors and OpenStack developers leads to integration issues that take a while to iron out after each official OpenStack release.
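As a rough illustration of the bookkeeping this implies, the sketch below tracks which hardware/software combinations still need a validation run whenever either side ships an update. It is only a minimal sketch: the `ValidationMatrix` class, its method names, and the version strings are hypothetical and do not describe the tooling Flex and Mirantis actually use.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ValidationMatrix:
    """Hypothetical tracker of (hardware update, software build) pairs."""
    hardware: set = field(default_factory=set)   # e.g. driver or firmware versions
    software: set = field(default_factory=set)   # e.g. OpenStack builds
    validated: set = field(default_factory=set)  # pairs that already passed

    def add_hardware(self, item: str) -> None:
        # A vendor drops a new driver or firmware update.
        self.hardware.add(item)

    def add_software(self, item: str) -> None:
        # A new build of the software stack becomes available.
        self.software.add(item)

    def record(self, hw: str, sw: str) -> None:
        # Mark one combination as validated.
        self.validated.add((hw, sw))

    def pending(self) -> list:
        # Every known combination that has not been validated yet.
        all_pairs = set(product(self.hardware, self.software))
        return sorted(all_pairs - self.validated)


matrix = ValidationMatrix()
matrix.add_hardware("nic-driver-1")   # made-up identifiers
matrix.add_software("build-1")
matrix.record("nic-driver-1", "build-1")
matrix.add_hardware("hba-fw-2")       # a new update reopens validation work
print(matrix.pending())
```

The point of the design is that an update on either side automatically creates new pending combinations, which is the work a continuous validation platform would pick up.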

Mirantis and Flex are working together to address this disconnect by providing a live development and integration environment. It consists of a fully populated rack of hardware running at a Flex facility, with Mirantis having full remote access to the system. Live testing of OpenStack during the development phase provides a quick feedback loop, which helps ensure tighter hardware and software integration and a well-controlled end user experience as OpenStack evolves. Such an environment also provides continuous insight into the health and performance of a distribution as it moves toward its next release. Initially, validation tests will be run by Flex and Mirantis teams working together, but over time we plan to extend this to a wider ecosystem of infrastructure component providers, as well as operators who can quickly validate their stack on top of this accessible system.

OPSV - Racks of servers

Continuous validation is currently running in the Flex DataCenter Showcase (Flex DCS) using Ciii white box 1U and 2U servers based on Intel E5-2600 series processors and Intel, Micron, and Transcend flash drives. The Flex DCS can hold and power 11 racks, and the integration lab increases that number up to 30 racks. Because this is an R&D lab, network adapters, flash, and NVMe drives from partner vendors are inserted into compute and storage servers to understand the cluster benefits under specific use cases. Mirantis OpenStack is deployed across multiple racks of servers, including Ceph on the storage servers. Using an automation framework based on Ansible and open source tools such as Stacki, Zabbix, Grafana, and the ELK stack, we spawn VMs, applications, and benchmarks, execute them, and measure their reliability under various stress-inducing scenarios, such as server and drive failures. Common compute and storage workloads include various database access patterns and Hadoop/Spark processing. In the networking and telecommunications space, we are currently testing and performance benchmarking NFV applications. We are also working with partners contributing to OPNFV and CORD to set up SDN/NFV reference architectures and solutions using the latest open source software stack and commodity hardware designs. This approach enables faster adoption and validation of new open hardware architectures such as OCP, Open19, and OCP-Telco.
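To make the "spawn, execute, measure" loop a little more concrete, here is a minimal sketch of how pass rates under failure scenarios could be tallied. Everything in it is an assumption for illustration: the `run_scenarios` helper and the two stub scenarios are hypothetical stand-ins, and a real harness would drive the actual Ansible-based automation rather than local Python functions.

```python
from typing import Callable, Dict


def run_scenarios(
    scenarios: Dict[str, Callable[[], bool]], repeats: int = 3
) -> Dict[str, float]:
    """Run each scenario `repeats` times and return its pass rate (0.0 to 1.0)."""
    results = {}
    for name, check in scenarios.items():
        passes = 0
        for _ in range(repeats):
            try:
                if check():
                    passes += 1
            except Exception:
                # A scenario that raises counts as a failed run.
                pass
        results[name] = passes / repeats
    return results


# Stub scenarios standing in for real failure-injection tests (hypothetical).
def drive_failure_recovers() -> bool:
    return True  # pretend the workload kept running after a drive failure


def server_failure_recovers() -> bool:
    raise RuntimeError("workload did not recover")  # pretend the check failed


summary = run_scenarios(
    {
        "drive_failure": drive_failure_recovers,
        "server_failure": server_failure_recovers,
    },
    repeats=4,
)
print(summary)
```

Repeating each scenario several times and reporting a rate, rather than a single pass or fail, is one simple way to surface intermittent problems.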

We’re looking forward to gathering with the OpenStack community to discuss this next week at OpenStack Silicon Valley. See you there!
