In this post I present reasons why OpenStack could benefit from a new transfer service component. This component would receive VM images and write them to nova instance storage on behalf of nova-compute, providing the following benefits:
- A more flexible architecture.
- A more predictable quality of service for end users’ VM instances.
- A cleaner separation of concerns for nova-compute.
IaaS clouds must transfer VM images from the repositories in which they reside to the compute nodes where they are booted. In OpenStack today, images are downloaded over HTTP by a nova-compute client speaking to the Glance image service. The Glance v2 protocol introduces the notion of a direct URL. Looking forward, clients can leverage this URL to bypass Glance as a transfer service and access the backend store directly.
This feature helps because, depending on the architecture of a given OpenStack deployment, a download through Glance’s HTTP interface may perform suboptimally. In some cases a direct image pull from Swift transfers data faster than any other option. In other cases a simple file copy excels. In more complicated, yet real-world, examples a BitTorrent swarm propagates the image best. Other IaaS offerings use advanced protocols like GridFTP and LANTorrent for fast image propagation. Because the best option varies, the Glance community created a blueprint to expose the multiple locations of a single object to the user.
The idea is that Glance will operate in part as an image replica service, translating logical names (image IDs) into physical locations (URLs). The client receives the list of physical locations along with the responsibility of choosing the best one. Unfortunately, this can be a complicated decision to make.
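To make the difficulty concrete, here is a minimal sketch of the naive approach a client might take, assuming Glance v2 hands back a list of location URLs for an image. The scheme ordering below is purely illustrative; a static preference ignores exactly the global information (topology, current load) that a good choice requires:

```python
# Illustrative only: rank an image's physical locations by URL scheme.
SCHEME_PREFERENCE = ['file', 'swift', 'http']  # hypothetical ordering

def choose_location(locations, preference=SCHEME_PREFERENCE):
    """Pick a URL from the list Glance could expose for an image.

    A real chooser would need network topology and per-endpoint demand;
    this one just prefers earlier schemes in `preference`.
    """
    def rank(url):
        scheme = url.split('://', 1)[0]
        return preference.index(scheme) if scheme in preference else len(preference)

    return min(locations, key=rank)
```

Even this toy version shows the problem: every client that wants to do better than a fixed ordering must grow protocol support and policy logic of its own.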
Advantages Of A Separate Service
Simply being able to speak every protocol puts an unnecessary burden on the client. As the set of protocols grows, so must the dependencies of the client. Moreover, the client may need global information in order to select the best URL, for example the network topology between endpoints or the current demand on each endpoint. Advanced optimizations like queueing and reordering requests could have dramatic performance benefits in certain use cases and thus should be possible. Such complicated logic burdens components like nova-compute, whose primary concern has nothing to do with data transfer.
Virtual machine images commonly range in size from hundreds of megabytes to gigabytes. Downloading such files can require substantial resources. Not only does a download place demands on the network and the system’s NIC, but it also taxes memory, bus, and CPU resources, particularly if the stream is encrypted or highly compressed. If the system doing the download shares resources with co-located services, or if users expect a certain quality of service, then resource management is important. This is the case on nova-compute nodes. In addition to running OpenStack services, nova-compute nodes also run end-user VMs. These VMs expect a reasonable, predictable, and consistent quality of service. If a download dominates resources, it can disrupt the processing of nova-compute control messages or, worse, act as a noisy neighbor to user VMs.
Creating a separate service to handle the downloading of images would not only help with the above two problems, but it would also add flexibility. Some OpenStack deployments consist of a rack of compute nodes that mount a SAN for use as an instance store. Instead of provisioning each node with the resources necessary to download images, a separate node could mount the file system and be used strictly to transfer images into the instance store. As more transfer resources are needed, additional transfer nodes could be added. The decoupling of transfer services and compute services not only provides the flexibility needed to meet this deployment scenario, but it also allows the transfer of images to be well-managed and provides a more predictable quality of service with horizontal scalability.
Third Party Transfers
This idea centers on the concept of the third-party transfer. The most familiar case is the two-party transfer, in which a client uploads a file to, or downloads a file from, a server directly. All of the bytes stream through the client, but the server handles most of the heavy lifting involved with resource management, security, DoS attacks, and the like. The client is typically (though not necessarily) thin and simple. This is the case for GET and PUT operations in Glance. In a third-party transfer the client contacts two servers, a source and a destination. It routes enough information between the two of them to negotiate a means of transferring the data. It then steps out of the way and lets the two servers stream the data. The client does not perform any of the protocol interpretation needed to stream the bytes, nor does it ever see the bytes; the two endpoint servers handle that part of the transfer entirely. The cost to the client of every transfer is therefore fixed and does not grow with the size of the data set being transferred. A detailed description of third-party transfers can be found in the FTP RFC. An animated GIF of this process can be found here.
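As a concrete illustration, the classic FTP form of this (defined in RFC 959) can be sketched with Python’s standard ftplib. This is a simplified sketch: the hosts and path are placeholders, error handling is omitted, and real code would also have to consume the final completion replies from both servers:

```python
from ftplib import FTP

def parse_pasv(reply):
    """Extract (host, port) from a PASV reply such as
    '227 Entering Passive Mode (10,0,0,5,19,136)'."""
    nums = reply[reply.index('(') + 1:reply.index(')')].split(',')
    host = '.'.join(nums[:4])
    port = int(nums[4]) * 256 + int(nums[5])
    return host, port

def third_party_copy(src: FTP, dst: FTP, path: str):
    """Route control messages so src streams `path` straight to dst.

    The client sees only the PASV/PORT replies, never the file data.
    """
    # Ask the source server to listen for a data connection...
    host, port = parse_pasv(src.sendcmd('PASV'))
    # ...and tell the destination server where to connect.
    dst.sendcmd('PORT %s,%d,%d' % (host.replace('.', ','), port // 256, port % 256))
    dst.sendcmd('STOR ' + path)   # destination stores what arrives
    src.sendcmd('RETR ' + path)   # source sends the file directly to dst
```

The data connection is opened between the two servers themselves, which is exactly the property the proposed transfer service would exploit.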
To put this in OpenStack terms, nova-compute would be a client making requests to two transfer services: one with access to the source data store and another with access to the destination data store (the nova instance-storage file system). The nova-compute transfer client would route enough information between the two services to instruct them to perform the transfer. It could then either poll asynchronously for status or let its initial request block until completion.
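The routing role described above can be modeled in a few lines. Everything here is invented for illustration; a real transfer service would expose a network API and move image bytes between stores, not Python dicts:

```python
# Toy in-process model of the third-party transfer flow. All class and
# method names are hypothetical. The property being illustrated is that
# the client routes only metadata, never the image bytes themselves.

class TransferService:
    """Stands in for a transfer service with access to one data store."""

    def __init__(self, store):
        self.store = store      # maps image name -> bytes
        self.jobs = {}

    def negotiate(self, name):
        # Return just enough information for a peer to fetch the object.
        return {'name': name, 'size': len(self.store[name])}

    def pull_from(self, peer, offer):
        # The two services exchange the data directly with each other.
        name = offer['name']
        self.store[name] = peer.store[name]
        job_id = len(self.jobs)
        self.jobs[job_id] = 'complete'
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]


def third_party_transfer(source, destination, name):
    """What nova-compute would do: route metadata, then check status."""
    offer = source.negotiate(name)
    job_id = destination.pull_from(source, offer)
    return destination.status(job_id)
```

Note that `third_party_transfer` does the same fixed amount of work whether the image is one megabyte or ten gigabytes, which is the fixed-cost property claimed above.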
With such a system in place, the transfer service is in a position to provide a more graceful, and more optimal, degradation of service. Because requests to any given endpoint flow through a single (though possibly replicated) service, that service can queue requests and make the most efficient use of the network possible. The benefits are noteworthy when compared against the current situation, where every image boot triggers a download and TCP back-off algorithms are the only means of fair sharing (congestion events rarely lead to optimal network usage).
New Service Design
Hopefully the OpenStack community will buy this pitch, because this effort will take a village. For now I will have to mark the most fun part (designing and writing software) as TBD.