We define customer data as “all the data, including all text, sound, software or image files that a user provides, or are provided on the user’s behalf, to the cloud provider through use of the online services” [[i]]. Data protection is the process of safeguarding important information from corruption and/or loss [[ii]]. Cloud providers should commit to protecting customer data and limiting its use: the data that Public Authorities host in cloud services belongs to them, and should not be used by a cloud provider for purposes other than providing the customer’s service, and in particular not for unrelated purposes such as advertising. Additionally, each service has established a set of standards for storing and backing up data, and for securely deleting data upon request from the customer.
The best-designed and implemented service cannot protect customer data and privacy if it is deployed to an environment that is not secure. Customers expect that their data will not be exposed to other cloud customers. They also assume that the processes used at the datacentre, and the people who work there, all contribute to keeping their data private and secure.
The term data protection is also used to describe the operational backup of data, which usually takes the form of incremental backups. The aim of the backup procedure is to keep data from being lost through intentional or unintentional actions.
The STORM CLOUDS approach
The STORM CLOUDS backup process is based on the data requirements of the services and the architecture of the SCP. The backup process aims to best exploit the features implemented by the IaaS cloud where the VMs are hosted, and more specifically Swift, the Object Storage service implemented by OpenStack. The main steps needed for backing up the applications’ data are presented below.
1st Step: Design a Backup Strategy
During this step, several aspects of the data and/or the application(s) managing them were analysed in order to put together a list of what needs to be backed up, when to back up, how long to keep the backup data, and how long a restore takes. It includes the following tasks:
- Analysis of current data usage that reveals:
- Types of data used.
- Data locations, including folders and/or databases.
- Approximate amount of data.
- How often data changes, as this affects our decision on how often the data should be backed up.
- Data sensitivity. For critical data, such as a database, we should have redundant backup sets that extend back for several backup periods. For sensitive data, we should ensure that backup data is encrypted, using public/private key-pair technology.
- How quickly we need to recover the data.
- What’s the best time to schedule backups (scheduling backups when system use is as low as possible will speed up the backup process).
- Set an upper limit for the backup volume, as the amount of data we need to back up will only increase as time goes by.
- Identify the software tools that will be used.
- Select the appropriate backup type/policy (full or incremental). Typically, one of the following approaches is used: (a) full daily, or (b) full weekly + incremental daily. An incremental backup stores only the data that has changed since the previous backup, so a chain of incrementals following an initial full backup consumes considerably less storage than repeated full backups. The final choice depends on the required performance and data-protection levels, the total amount of data retained, and the associated cost, since cloud storage space comes at a price that depends on the service provider.
- Choose where to store the backups. Using the cloud environment to store backup data is arguably more resilient to disaster than on-premise solutions, because the backups are not physically located at the same place as the organisation. Moreover, since the applications are themselves hosted in the cloud, we also save the bandwidth and time needed to transfer the files required to restore an application correctly. However, the cost of storing backup data in the cloud is a significant factor in the decision.
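The storage cost of the two candidate policies can be compared with a quick calculation. The figures below (20 GB per full backup, 1 GB per daily incremental, a 28-day retention window) are hypothetical assumptions for illustration, not measurements from the SCP:

```shell
#!/bin/sh
# Hypothetical sizes (assumptions, not SCP measurements):
FULL_GB=20   # size of one full backup
INC_GB=1     # size of one daily incremental
DAYS=28      # retention window in days

# Policy (a): a full backup every day.
full_daily=$(( FULL_GB * DAYS ))

# Policy (b): a full backup weekly, incrementals on the other days.
weeks=$(( DAYS / 7 ))
weekly_plus_incr=$(( FULL_GB * weeks + INC_GB * (DAYS - weeks) ))

echo "Full daily:          ${full_daily} GB"
echo "Weekly full + incr:  ${weekly_plus_incr} GB"
```

Under these assumptions the incremental policy retains the same window of history in roughly a fifth of the storage, which is why the amount of daily change in the data is a key input to the decision.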
2nd Step: Generate a Key-Pair on the Client Machine
Although we can create the key pair directly on the VM, it is good practice to keep a copy of the keys outside the VMs that use them. The reason is that VMs are “ephemeral”: once a VM is deleted, any keys stored only on it are lost, and we would no longer be able to decrypt our backup data when restoring it. Moreover, creating key pairs requires a sufficient level of “entropy” to ensure randomness in the generation.
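A minimal sketch of this step using GnuPG is shown below. The identity `storm-backup@example.org` and the output file names are placeholders, and a throwaway `GNUPGHOME` is used so the sketch does not touch the client machine’s real keyring:

```shell
#!/bin/sh
# Use a throwaway keyring for this sketch (assumes GnuPG >= 2.1).
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Generate a key pair non-interactively; the identity is a placeholder.
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "storm-backup@example.org" default default never

# Export both halves; keep the private key OUTSIDE the VMs that use it.
gpg --armor --export storm-backup@example.org > backup-pub.asc
gpg --armor --batch --pinentry-mode loopback --passphrase '' \
    --export-secret-keys storm-backup@example.org > backup-priv.asc
```

The public key is copied to the VMs so backups can be encrypted there; the private key stays on the client machine and is only needed to decrypt restored data.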
3rd Step: Prepare the VMs for Backup
Install and configure the backup software (Duplicity and its prerequisites) on each VM whose data is to be backed up.
4th Step: Implement the Backup Strategy
The backup scripts that address all aspects of the backup strategy are created and executed using the Duplicity tool.
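A sketch of such a script, assuming Duplicity’s Swift backend, is shown below. The container name, key identity, source directory, and OpenStack credentials are all placeholders that would come from the actual SCP deployment:

```shell
#!/bin/sh
# Placeholder OpenStack/Swift credentials for Duplicity's swift:// backend.
export SWIFT_USERNAME="tenant:backup-user"
export SWIFT_PASSWORD="secret"
export SWIFT_AUTHURL="https://keystone.example.org:5000/v2.0"
export SWIFT_AUTHVERSION="2"

GPG_KEY="storm-backup@example.org"   # placeholder key identity (Step 2)
SRC="/var/www/app-data"              # placeholder data directory
DEST="swift://app-backups"           # placeholder Swift container

# Incremental backup, promoted to a full backup once a week,
# encrypted with the public key generated in Step 2.
duplicity --full-if-older-than 7D --encrypt-key "$GPG_KEY" "$SRC" "$DEST"

# Enforce the retention limit: keep the last four full backup chains.
duplicity remove-all-but-n-full 4 --force "$DEST"
```

The `--full-if-older-than` / `remove-all-but-n-full` pair implements the “full weekly + incremental daily” policy and the upper limit on backup volume decided in Step 1.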
5th Step: Validation tests
Validation includes tests of the restore mechanism: more specifically, both incremental and full backups were used to bring the applications back to a previous operational state successfully. The backup solution should be tested repeatedly after it has been implemented to ensure that it works as intended. Moreover, the applications should be re-tested periodically to ensure they are functional and that data is being backed up appropriately. Validation not only helps us identify problems in the backup process, but also trains the Municipalities’ IT personnel to recover the files quickly and efficiently should this become necessary.
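Restore tests can be driven with the same tool; a sketch follows, where the Swift container and target paths are placeholders and the private key from Step 2 is assumed to be imported in the local keyring:

```shell
#!/bin/sh
DEST="swift://app-backups"   # placeholder Swift container

# Compare the latest backup against the live data without restoring it.
duplicity verify "$DEST" /var/www/app-data

# Restore the latest state to a scratch directory for inspection.
duplicity restore "$DEST" /tmp/restore-test

# Restore the state as of three days ago, which exercises the
# incremental chain rather than only the most recent full backup.
duplicity restore --time 3D "$DEST" /tmp/restore-3d
```

Restoring to a scratch directory (rather than over the live data) lets the applications be checked against the recovered files before anything is overwritten.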
After the initial setup, the backup process is scheduled according to the backup strategy.
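For example, the schedule could be expressed as a cron entry on each VM; the script path, log file, and the 03:00 slot are assumptions chosen to match the low-usage window identified in Step 1:

```shell
# /etc/cron.d/storm-backup  (placeholder path and schedule)
# Run the backup script daily at 03:00, when system use is expected to be low.
0 3 * * * root /usr/local/bin/backup-to-swift.sh >> /var/log/backup.log 2>&1
```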
[i] Microsoft, 2014, Protecting Data and Privacy in the Cloud