The Need For A Cache In Glance
Glance can be configured with a variety of back-end storage modules. It can be backed by Swift, Ceph, a local file system, S3, another HTTP service, etc. When a client goes to download an image, Glance first finds the back-end store where the image resides and connects to it. The image is streamed from that back-end store to Glance, where it is then routed to the client. If the back-end store is a local file system then this is fast and easy. If, however, it is S3, a remote Swift service, HTTP, or anything else that could potentially be across a WAN, a significant price is paid for the extra network hop needed to relay it through Glance.
In an effort to alleviate this problem, caching functionality exists in Glance. When an image is transferred from a back-end store to Glance on its way to the client, it is written to a local file system as well as being sent to the client. In this way, if that image is ever requested in the future, Glance does not need to contact the back-end store. Instead it can open the local file and send it without that potentially expensive WAN hop.
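The write-through behavior described above can be sketched in a few lines. This is a minimal illustration and not Glance's actual cache code: the function names `stream_with_cache` and `read_cached` are hypothetical, and the temp-file-then-rename step is my own assumption about how to keep half-downloaded images out of the cache.

```python
import os
import tempfile


def stream_with_cache(chunks, cache_path):
    """Yield each chunk to the caller while also writing it to cache_path.

    The data is written to a temporary file first and renamed into place
    only after the whole image has been read, so an interrupted download
    never looks like a valid cache entry.
    """
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    complete = False
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                yield chunk
        complete = True
    finally:
        if complete:
            os.replace(tmp_path, cache_path)
        else:
            os.unlink(tmp_path)


def read_cached(cache_path, chunk_size=65536):
    """Yield the cached image in chunks without touching the back-end store."""
    with open(cache_path, "rb") as cached:
        while True:
            chunk = cached.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Here `chunks` stands in for whatever iterator the back-end store driver returns. On a later request the server would check whether `cache_path` exists and, if so, serve from `read_cached` instead of opening a connection across the WAN.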
The cache described above does its job well. However, a small problem is introduced when Glance API services are scaled out horizontally over many nodes. Say you have several Glance API servers running on different nodes that are all backed by the same Swift service. The pool of Glance servers is behind a single VIP or DNSRR, so to the client they all appear to be the same endpoint. However, because they have different disks, each will have its own cache. If a client downloads an image, it may be routed to Glance-API server A. The image will be retrieved from Swift to A, then cached on A's disk as it is sent to the client. Now if another client goes to download that same image, DNSRR may send them to Glance-API server B. B does not have this image in its cache, so it will have to go back to Swift and download it.
Such is life with cache, sometimes you hit and sometimes you miss. Not a problem. What is a problem is how administrators are currently able to manage the cache of each node.
Currently within Glance there are some cache management API calls that an administrator can use to do things like see what is in the cache, delete images from the cache, etc. Obviously this is a problem if these calls go through DNSRR. One call may be directed to host A, while the next is directed to host B. Each host has an entirely different dataset and thus there is no consistency between calls. Further, the caller has no control over which endpoint they actually contact.
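A toy simulation makes the inconsistency concrete. Everything here is made up for illustration: `GlanceNode` is a stand-in for an API server with its own disk cache, and `itertools.cycle` plays the role of DNSRR.

```python
import itertools


class GlanceNode:
    """Stand-in for one Glance API server with a private disk cache."""

    def __init__(self, name):
        self.name = name
        self.cache = set()

    def download(self, image_id):
        """Return True on a cache hit; cache the image either way."""
        hit = image_id in self.cache
        self.cache.add(image_id)
        return hit

    def list_cached(self):
        return sorted(self.cache)


node_a, node_b = GlanceNode("A"), GlanceNode("B")
round_robin = itertools.cycle([node_a, node_b])  # plays the part of DNSRR

hits = [
    next(round_robin).download("image-1"),  # lands on A: miss
    next(round_robin).download("image-1"),  # lands on B: miss again
    next(round_robin).download("image-2"),  # lands on A: miss
]

# The same admin call, issued twice in a row, reaches different hosts.
listing_from_b = next(round_robin).list_cached()
listing_from_a = next(round_robin).list_cached()

print(hits)            # [False, False, False]
print(listing_from_b)  # ['image-1']
print(listing_from_a)  # ['image-1', 'image-2']
```

Two identical "what is in the cache?" calls return two different answers, and the administrator has no way to tell which host produced which.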
One proposed solution to this problem is to separate out the administrator interface to cache management from the rest of the API calls and into its own service. This service would then be run on each node and there would be no DNSRR in front of it. The administrator would contact and manage each cache separately and thus consistency would be achieved. This certainly solves the problem and is a reasonable thing to do.
However, I would like to look at this problem from a different angle.
There is an active blueprint in the works for Glance called Multiple Image Locations. It describes a means for Glance to return to a client a list of locations where the image is stored. From there, the client may be able to access those locations directly, instead of routing the data through Glance. This is a potentially large optimization.
Glance has always been an image registry (as well as a transfer service). With this enhancement it also becomes a replication service. Not only can clients find images, but they can also discover specifically where those images are and from there select the best location for their needs.
For example, say that an image is downloaded from a third-party anonymous HTTP site and put into a Glance service which is backed by Swift. With Multiple Locations the meta-data for that image can now present the client with 3 options for download: a Swift URL, a Glance URL, and the original HTTP URL. The client knows where it is located in the network topology, and the workload of its system, much better than Glance does, thus it is in the best place to choose the ideal source of the image. Pushing on this a bit more, let's say that Glance was backed by a file system instead of Swift and that the client wishing to get the image has access to the same file system. In that case a system copy could be done, which would avoid a lot of extra cycles. (Note that this is not a far-fetched case; options in nova-compute exist for this right now.)
The cached copy may be the best location from which a client can download, but it may not be. The client may be in a position on the network where direct access to Swift is faster, or a copy from a shared file system like Gluster is possible. There are many reasons why the cached copy may not be the best place, and it will always be the client (or more specifically the downloader of that image) that is in the best position to make that choice.
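To make the client-side choice concrete, here is a minimal sketch of a downloader ranking the locations it was handed. The URLs, the `choose_location` helper, and the cost numbers are all hypothetical; the only point is that the ranking lives in the client, which knows its own topology, and not in Glance.

```python
from urllib.parse import urlsplit


def choose_location(locations, cost):
    """Return the cheapest usable location.

    `cost` is supplied by the client and returns a number for a URL it can
    use, or None for a URL it cannot reach at all.
    """
    scored = [(cost(url), url) for url in locations]
    usable = [(c, url) for c, url in scored if c is not None]
    if not usable:
        raise LookupError("no usable location for this client")
    return min(usable)[1]


def compute_node_cost(url):
    """Example policy for one particular client (numbers are invented)."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        return 0   # shared file system is mounted here: plain copy
    if parts.scheme == "swift":
        return 1   # this client sits next to the Swift cluster
    if parts.scheme in ("http", "https"):
        return 5 if parts.hostname == "glance.example.com" else 10
    return None    # unknown scheme: cannot use it


locations = [
    "swift://swift.example.com/images/image-1",
    "http://glance.example.com/images/image-1",
    "http://mirror.example.org/image-1.img",
]
print(choose_location(locations, compute_node_cost))
```

A client on a different network segment would pass a different `cost` function and could just as easily prefer the Glance URL, cached or not.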
What is Cache Anyway?
Ultimately a cached copy of an image is simply another replicated location of that image. It may be a more transient location, but really that is all defined by the SLAs of any given store and is a policy outside of the control of Glance. There is little difference between the cached copy and a copy on some remote Swift service or some HTTP server. So why treat it any differently?
Is the cache going to need tools to manage consistency that other replicas will not? If we have a service designed specifically for managing cached copies (verifying, removing, and generally maintaining consistency), won't we also need similar tools to accomplish the same thing for the other replicated copies of the image held in its meta-data? Ideally we could generalize this into a single solution for both.
I suggest that we fold the idea of caching into the idea of multiple locations. When an image is routed through the Glance service it should (or could) still be written to a local disk cache. Once the copy is made, Glance would update the meta-data for that image with an HTTP URL that points to this host's IP address (not the VIP or the DNS name). Thus when a client looks up all of the locations of the image, that cached point is listed with the others and the client is empowered to pick the best location for itself. When downloading an image from Glance in the traditional way, Glance could check the list of locations in a similar way. If there is a cached copy that is deemed the best location, it would open it and stream it to the client, and business would be conducted as it is today.
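Roughly, the bookkeeping could look like the sketch below. Nothing here is an existing Glance interface: `ImageRecord` stands in for the image meta-data, the `/cache/` URL layout is invented, and the addresses come from the documentation range 192.0.2.0/24.

```python
class ImageRecord:
    """Stand-in for an image's meta-data, including its location list."""

    def __init__(self, image_id, locations):
        self.image_id = image_id
        self.locations = list(locations)


def cache_url(host_ip, image_id):
    # Hypothetical URL layout. It names the node's own IP address,
    # never the VIP or the round-robin DNS name.
    return f"http://{host_ip}/cache/{image_id}"


def register_cached_copy(record, host_ip):
    """Add this node's cached copy to the image's list of locations."""
    url = cache_url(host_ip, record.image_id)
    if url not in record.locations:
        record.locations.append(url)
    return url


def local_cached_location(record, host_ip):
    """Return this node's cached location if one is listed, else None."""
    url = cache_url(host_ip, record.image_id)
    return url if url in record.locations else None


record = ImageRecord("image-1", ["swift://swift.example.com/images/image-1"])
register_cached_copy(record, "192.0.2.10")       # node A cached the image
print(record.locations)
print(local_cached_location(record, "192.0.2.11"))  # node B has no copy: None
```

With this in place a traditional download through node A can find its own entry and stream from disk, while a client reading the location list sees the cached copy as just one more replica.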
Let's get back to the problem at hand: the administrator's service for managing the image cache. I argue that this should be generalized to a set of tools for managing replicas. Let's deprecate the current service and instead create tools which will solve the problems of the future as well. An admin is going to need a way to verify the consistency of all the locations listed for a given image, not just the ones on local disk. The details of how this tool (or service) would work should be hashed out by the community, but in general it would contact Glance, get a list of replicated location points for a given image, and then perform the needed operations on them based upon their URL. It would ensure the consistency of all the points of replication and not just the one special case.
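As a starting point for that discussion, here is one possible shape for such a tool. It is only a sketch under my own assumptions: the function names are hypothetical, the location list is passed in directly where a real tool would ask Glance for it, SHA-256 is an arbitrary choice of digest, and only a `file` fetcher is shown. Other stores would plug in by URL scheme.

```python
import hashlib
from urllib.parse import urlsplit


def fetch_file(url, chunk_size=65536):
    """Yield the bytes behind a file:// URL in chunks."""
    with open(urlsplit(url).path, "rb") as replica:
        while True:
            chunk = replica.read(chunk_size)
            if not chunk:
                break
            yield chunk


def verify_replicas(locations, expected_digest, fetchers):
    """Check every listed location against the expected digest.

    `fetchers` maps a URL scheme to a function that yields the replica's
    bytes. Each location is reported as 'ok', 'mismatch', 'unreachable',
    or 'unsupported'.
    """
    report = {}
    for url in locations:
        fetch = fetchers.get(urlsplit(url).scheme)
        if fetch is None:
            report[url] = "unsupported"
            continue
        digest = hashlib.sha256()
        try:
            for chunk in fetch(url):
                digest.update(chunk)
        except OSError:
            report[url] = "unreachable"
            continue
        matches = digest.hexdigest() == expected_digest
        report[url] = "ok" if matches else "mismatch"
    return report
```

A cached copy on a Glance node would be checked through an HTTP fetcher in exactly the same way as a copy on a remote mirror, which is the whole point: the cache stops being a special case.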