In my last post I talked about the need to sanitize a virtual machine (VM) instance before creating an image (especially a publicly available image) from it. The Dell Cloud Manager (DCM) agent team decided this was an important task and that it should be automated. To help with this we created the tool dcm-agent-scrubber (the scrubber). This is a CLI that helps the owner of a VM instance remove dangerous files so that they are not shared on a child image. It can search for and remove RSA private keys, history files, system logs, cloud-init caches, and various other things. It can also create a recovery file, an important detail discussed below. The source code for the scrubber can be found here.
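To make the "search for RSA private keys" step concrete, here is a minimal sketch of one way such a search could work. It is not taken from the scrubber's source: the function name `find_private_keys`, the `max_bytes` limit, and the choice to match only the traditional PEM header are all assumptions made for illustration.

```python
import os

# Header used by traditional PEM-encoded RSA private keys.
RSA_PEM_MARKER = b"-----BEGIN RSA PRIVATE KEY-----"


def find_private_keys(root, max_bytes=4096):
    """Return a sorted list of regular files under root that look like RSA private keys.

    Hypothetical helper: only the first max_bytes of each file are inspected.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    head = handle.read(max_bytes)
            except OSError:
                # Unreadable files are ignored in this sketch.
                continue
            if RSA_PEM_MARKER in head:
                hits.append(path)
    return sorted(hits)
```

A real tool would likely also look for other key formats; this sketch only reports candidates and leaves the decision to delete them to the caller.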
When running the scrubber you can elect to create a recovery file. This is very useful if you plan on using the parent VM instance again after you create a child image from it. The scrubber may delete some files that the parent instance needs in order to run properly. Thus, once the child image creation has completed and the parent instance returns to processing, errors may occur in the absence of those files. Because of this, the scrubber places files into a tarball before it deletes them. Once it has finished removing files, it encrypts the tarball using your public key. While it is still best practice to remove that recovery tarball before the child image is created, it is still safe if it is not removed, because only someone with access to the matching private key can read the information inside of it. Once the child image is created, the recovery tarball can be decrypted with the owner's matching private key and then untarred in the root directory of the parent instance, thereby reinstating everything that was removed.
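The archive-then-delete flow described above can be sketched with Python's standard library. This is a simplified illustration, not the scrubber's actual code: `archive_then_remove` and `restore` are hypothetical names, the sketch handles only files (not directories), and the encryption step is left out.

```python
import os
import tarfile


def archive_then_remove(paths, tarball_path):
    """Copy each existing file into a gzip tarball, then delete the originals.

    Returns the absolute paths that were archived. Hypothetical helper.
    """
    existing = [os.path.abspath(p) for p in paths if os.path.isfile(p)]
    with tarfile.open(tarball_path, "w:gz") as tar:
        for path in existing:
            # tarfile strips the leading "/" so members are stored relative
            # to the filesystem root.
            tar.add(path)
    # Delete only after the tarball has been written and closed.
    for path in existing:
        os.remove(path)
    return existing


def restore(tarball_path, root="/"):
    """Unpack a recovery tarball under root, putting the files back in place."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        tar.extractall(root)
```

Because the members are stored relative to the root, calling `restore(tarball_path)` with the default `root="/"` corresponds to untarring in the root directory of the parent instance. Only extract tarballs you created yourself, since extracting an untrusted archive as root is dangerous.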
Automation from DCM
Encrypting this recovery file was particularly important to our use case. DCM controls VMs running in IaaS clouds and can run an agent inside of those VMs for additional control. When a DCM customer wishes to create an image of their instance, we would like to make this as safe and convenient as possible, and thus we would like to help them sanitize the image. We can do this by sending a command to the agent telling it to run the scrubber on the customer's instance. However, one of the things that the scrubber removes is a secret that the local agent needs to authenticate with DCM. Therefore, once the instance is scrubbed, DCM central command may no longer have that link to the agent. The scrubber can create the recovery file, but the agent may lose contact with DCM before the recovery file can be safely pulled off the parent VM. This is why the asymmetric encryption is critical for us. We can create the recovery file and be assured that, even if it is burnt onto the child image, no secrets have been leaked.
- We do not actually encrypt the entire tarball with a public key. Instead we create a symmetric key that is used to encrypt the tarball and then encrypt that symmetric key with an RSA public key. This is a common practice for large data streams.
- The exact architecture of DCM and the DCM agent will be discussed in a later post.
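The hybrid scheme from the first note above (a symmetric key encrypts the data, and the RSA public key encrypts only that symmetric key) can be sketched as follows. This uses the third-party `cryptography` package, and the specific choices here (Fernet for the symmetric layer, OAEP with SHA-256 for the RSA layer, and the function names) are assumptions for illustration. The post does not say which ciphers the scrubber actually uses.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa


def _oaep():
    # OAEP padding with SHA-256; an assumed choice, not the scrubber's.
    return padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None,
    )


def encrypt_recovery(data, public_key):
    """Encrypt data with a fresh symmetric key, then wrap that key with RSA."""
    sym_key = Fernet.generate_key()
    ciphertext = Fernet(sym_key).encrypt(data)
    wrapped_key = public_key.encrypt(sym_key, _oaep())
    return wrapped_key, ciphertext


def decrypt_recovery(wrapped_key, ciphertext, private_key):
    """Unwrap the symmetric key with the RSA private key, then decrypt the data."""
    sym_key = private_key.decrypt(wrapped_key, _oaep())
    return Fernet(sym_key).decrypt(ciphertext)


if __name__ == "__main__":
    # Throwaway key pair for demonstration only.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    wrapped, blob = encrypt_recovery(b"tarball bytes", private_key.public_key())
    assert decrypt_recovery(wrapped, blob, private_key) == b"tarball bytes"
```

Only the short symmetric key goes through RSA, which is why the approach suits large tarballs. Fernet works on in-memory bytes, so a real tool handling very large archives would likely use a streaming cipher instead.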