Posted by: John Bresnahan | January 11, 2013

An Image Transfers Service For OpenStack

In this post I present reasons why OpenStack could benefit from a new transfer service component. This component would receive VM images and write them to nova instance storage on behalf of nova-compute, providing the following benefits:

  1. A more flexible architecture.
  2. A more predictable quality of service for end users’ VM instances.
  3. A cleaner separation of concerns for nova-compute.

Introduction

IaaS clouds must transfer VM images from the repositories in which they reside to compute nodes where they are booted.  In the current state of OpenStack, images are downloaded via HTTP by a nova-compute client speaking to the Glance image service.  Within the Glance v2 protocol a notion of a direct URL exists.  Looking forward, clients can leverage this URL to bypass Glance as a transfer service, allowing direct access to the backend store.

This feature helps because, depending on the architecture of a given OpenStack deployment, a download using Glance HTTP may perform suboptimally.  In some cases, a direct image pull from Swift transfers data faster than any other option.  In other cases a simple file copy excels.  In more complicated, yet real world, examples a BitTorrent swarm propagates the image ideally.  Other IaaS offerings use advanced protocols like GridFTP and LANTorrent for fast image propagation.  Because the best possible option varies, the Glance community created a blueprint to expose multiple locations of a single object to the user.

The idea is that Glance will operate in part as an image replica service, translating logical names (image IDs) into physical locations (URLs).  The client receives the list of physical locations and the responsibility of choosing the best one.  Unfortunately this can be a complicated decision to make.

Advantages Of A Separate Service

Simply being able to speak every protocol puts an unnecessary burden on the client.  As the set of protocols grows, so must the dependencies of the client.  Moreover the client may need global information in order to select the best URL, for example the network topology between endpoints or the current demand on each endpoint.  Advanced optimizations like queueing and reordering requests could have dramatic performance benefits in certain use cases and thus should be possible.  Such complicated logic burdens components like nova-compute whose primary concern has nothing to do with data transfer.
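To make the selection problem concrete, here is a minimal sketch of the simplest thing a client could do with a list of locations: rank them by protocol.  The URLs, the `SCHEME_RANK` table, and the `choose_location` function are all hypothetical names made up for illustration, not part of Glance or nova.

```python
from urllib.parse import urlparse

# Hypothetical, static preference order for one deployment (lower is better).
# A real deployment would need its own ordering.
SCHEME_RANK = {"file": 0, "swift": 1, "http": 2}


def choose_location(urls, rank=SCHEME_RANK):
    """Return the most preferred URL whose protocol the client can speak."""
    candidates = [u for u in urls if urlparse(u).scheme in rank]
    if not candidates:
        raise ValueError("no location uses a supported protocol")
    return min(candidates, key=lambda u: rank[urlparse(u).scheme])


# Made-up locations for a single image.
locations = [
    "http://glance.example.com/v2/images/abc/file",
    "swift://swift.example.com/images/abc",
    "file:///mnt/images/abc",
]
print(choose_location(locations))  # file:///mnt/images/abc
```

Note what this sketch cannot see: the network topology between endpoints and the current load on each one.  A static table like this is about the best a client can do on its own, which is part of the argument for moving the decision into a service.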

Virtual machine images commonly range in size from hundreds of megabytes to gigabytes.  Downloading such files can require substantial resources.  Not only does it place demands on the network and the system’s NIC, but it also taxes memory, bus, and CPU resources, particularly if it is an encrypted or highly compressed data stream.  If the system doing the download shares resources with co-located services, or if users expect a certain quality of service, then resource management is important.  This is the case with nova-compute nodes.  In addition to running OpenStack services, nova-compute nodes also run end-user VMs.  These VMs expect a reasonable, predictable, and consistent quality of service.  If the download dominates resources it can disrupt the processing of the nova-compute control messages, or worse, it can act as a noisy neighbor to user VMs.

Creating a separate service to handle the downloading of images would not only help with the above two problems, but it would also add flexibility.  Some OpenStack deployments consist of a rack of compute nodes that mount a SAN for use as an instance store. Instead of provisioning each node with the resources necessary to download images, a separate node could mount the file system and be used strictly to transfer images into the instance store.  As more transfer resources are needed, additional transfer nodes could be added.  The decoupling of transfer services and compute services not only provides the flexibility needed to meet this deployment scenario, but it also allows the transfer of images to be well-managed and provides a more predictable quality of service with horizontal scalability.

Third Party Transfers

This idea centers around the concept of the third-party transfer.  The most familiar transfer case is a two-party transfer.  In this case a client either uploads a file to, or downloads a file from, the server directly.  All of the bytes stream through the client, but the server handles most of the heavy lifting involved with resource management, security, DoS attacks, and the like.  The client is typically (though not necessarily) thin and simple.  This is the case for GET and PUT operations in Glance.  In a third-party transfer the client contacts two servers, a source and a destination.  It routes enough information between the two of them to negotiate a means of transferring the data.  It then steps out of the way and lets the two servers stream the data.  The client does not perform any protocol interpretation needed to stream the bytes, nor does it see the bytes.  The two endpoint servers entirely handle that part of the transfer.  The cost to the client of every transfer is fixed and does not scale with the size of the data set being transferred.  A detailed description of third-party transfers can be found in the FTP RFC (RFC 959).  An animated gif of this process can be found here.

To put this in OpenStack terms, nova-compute would be a client which would make requests to two transfer services, one with access to the source data store and another with access to the destination data store (the nova instance storage file system).  The nova-compute transfer client would route enough information between the two services so that it could instruct them to perform the transfer.  It could then asynchronously poll for status or its initial request could block until completion.
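A toy, in-memory sketch of that flow might look like the following.  Nothing here is a proposed API: `TransferService`, `prepare_receive`, `send`, and the ticket scheme are assumptions invented for illustration, and the "stores" are plain dictionaries standing in for an image repository and the instance storage file system.

```python
import itertools


class TransferService:
    """Hypothetical transfer endpoint with access to exactly one data store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store                  # dict: object name -> bytes
        self._tickets = itertools.count()
        self._pending = {}                  # ticket -> destination object name

    def prepare_receive(self, dest_name):
        """Destination side: agree to accept data and hand back a ticket."""
        ticket = f"{self.name}:{next(self._tickets)}"
        self._pending[ticket] = dest_name
        return ticket

    def receive(self, ticket, data):
        """Invoked by the peer service, never by the client."""
        self.store[self._pending.pop(ticket)] = data

    def send(self, src_name, peer, ticket):
        """Source side: move the object straight to the peer service."""
        data = self.store[src_name]
        peer.receive(ticket, data)
        return len(data)                    # only a status value goes back


def third_party_transfer(source, src_name, dest, dest_name):
    """Client role (e.g. nova-compute): route control information only.

    In a real system the client would pass the destination's address rather
    than an object reference; the object stands in for that address here.
    """
    ticket = dest.prepare_receive(dest_name)
    return source.send(src_name, dest, ticket)


image_store = TransferService("image-store", {"image-abc": b"x" * 1024})
instance_store = TransferService("instance-store", {})
moved = third_party_transfer(image_store, "image-abc",
                             instance_store, "instance-1/disk")
print(moved)  # 1024
```

The point to notice is that `third_party_transfer` never touches the image bytes.  Its cost is two small control calls no matter how large the image is.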

With such a system in place, the transfer service is in a position to provide a more graceful, and more optimal, degradation of service.  Because requests to any given endpoint come to a single (yet possibly replicated) service, the service can queue requests and make the most efficient use of the network possible.  Compare this with the case where every image booted results in a download and TCP backoff algorithms are the only means of providing fair sharing.  Congestion events rarely lead to optimal network usage, so the benefits are noteworthy.
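As a rough sketch of the queueing idea, the service below caps the number of transfers that run at once and makes the rest wait their turn.  The class name, the `max_concurrent` knob, and the `fake_transfer` stand-in are all hypothetical; a real service would also want priorities, reordering, and awareness of which endpoint each request touches.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class QueuedTransferService:
    """Hypothetical service that runs at most `max_concurrent` transfers."""

    def __init__(self, max_concurrent=2):
        # The executor queues submitted work once all workers are busy.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._lock = threading.Lock()
        self.active = 0     # transfers running right now
        self.peak = 0       # highest concurrency observed so far

    def _run(self, transfer_fn, *args):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            return transfer_fn(*args)
        finally:
            with self._lock:
                self.active -= 1

    def submit(self, transfer_fn, *args):
        """Queue a transfer and return a Future the caller can poll or block on."""
        return self._pool.submit(self._run, transfer_fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_transfer(image_id):
    time.sleep(0.01)        # stand-in for moving the image bytes
    return image_id


service = QueuedTransferService(max_concurrent=2)
futures = [service.submit(fake_transfer, f"image-{i}") for i in range(6)]
results = [f.result() for f in futures]
service.shutdown()
print(results, service.peak)
```

Six requests arrive at once, but the service never runs more than two of them concurrently.  The returned Future also covers both client styles mentioned earlier: poll it for status, or block on `result()` until completion.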

New Service Design

Hopefully the OpenStack community will buy this pitch, because this effort will take a village.  For now I will have to mark the most fun part (designing and writing software) as TBD.


Responses

  1. I’m definitely intrigued by the idea of third-party transfers a la FTP, but can you give me some examples of what other protocols we’d use to communicate the data in such a way? I’m having trouble wrapping my brain around several examples of external storage systems working in this way.

    My 2 cents: The transfer logic has to live somewhere. I like that you’re trying to get it out of the client and simplify nova-compute (and any other service that uses images). You’ve identified at least 4 pieces in play here:

    1) Something that needs data
    2) Something that has data
    3) Something that knows where the data is
    4) Knowledge of how to transfer the data

    Right now we have an external service (glance) that knows where the data is, and it knows how to get it. What I think we are going towards in Glance is to move that transfer logic to the client, thus skipping the extra hop / network transfer at the glance-api layer. (Sorry, I realized I’m just trying to rehash your post to make sure I understand!).

    What you seem to be suggesting is more of a “push” model where the transfer logic is contained in a daemon on the data source…is that right? The “push” would be initiated by a third service telling storage_node01 to push image XXX to compute_node01 — or did I miss everything? My apologies if so!

    Brian

    • Brian,

      Thanks for your insightful comments. You certainly have not missed everything 😉

      Another example of a protocol that could be used to transfer images is BitTorrent. There has been some effort to get that working with OpenStack in the past.

      However, any protocol could be used in this way: simply wrap the traditional client in a service. Take HTTP, for example. A service could be created that simply does a GET on behalf of its client (like nova-compute).

      I am not necessarily suggesting a push model (where the client tells the source to push to the destination) or a pull model (where the client tells the destination to pull from the source). There is even a third case, which is how FTP does it, in which the client contacts both the source and the destination and tells them how and when to contact each other. I am not, at this point, advocating or rejecting any one of those possibilities. At this point I am just trying to make a case for the need to abstract the transfer logic into a service.

      John

  2. Great post John,

    This makes a lot of sense, and a service of this nature is definitely something the OpenStack community should consider.

    The current way Glance allows for direct URLs certainly makes the optimizations possible, but it would be great to make them easier to implement. Currently each end of the equation has to know so much about whatever data store the image data is in and how to get it. This logic would need to be replicated in all of the virt drivers, and adding complex logic to choose from multiple locations would just be a mess.

    I would like to see all of the responsibility of data transfer removed from Glance and have Glance focus on its job of being an image metadata service.

    • Alex,

      The idea of separating data transfer from image replica/metadata lookup makes sense to me as well. They are quite different things.

      Thanks for the comments!

      John

  3. In this post I only directly speak of downloading images to compute nodes. Alex Meade referenced this issue (https://review.openstack.org/#/c/17803/) and pointed out to me the importance of uploading backups and snapshots as well, and I wanted to make sure that aspect is specifically brought into this discussion. This architecture should work well for that situation too, and I would love to hear any thoughts on that.

  4. […] a previous post I discussed the importance of managing the resource consumption of large file transfers.  Here I […]

  5. […] argue that such actions are often best left to a service dedicated to that job (as I described in a previous post). The storage systems control domain ends at its API.  After that, the all bytes coming and going […]

