Research on the Architecture of Cloud Host Autonomous Backup System in a Cloud Data Center

To centralize data control, improve resource utilization, and reduce the operation and maintenance cost of information systems, enterprises often use a private cloud data center to provide virtual cloud host services to users. Data backup for cloud host users is therefore particularly important. In this paper, a technical solution is proposed according to the actual requirements of enterprise cloud host data backup. First, based on the performance requirements of data backup and recovery for an enterprise cloud host, the three basic elements of data backup are briefly analyzed. Then the logical architecture of the cloud host's autonomous backup system is built. Finally, the current mainstream backup networking modes, backup strategies, and snapshot strategies are analyzed in detail. This architecture research can provide effective technical support for an enterprise building a cloud host autonomous backup system.


Introduction
An enterprise cloud data center provides enterprises with comprehensive virtual resource services. One of these is real-time online virtual host services delivered to employees as remote cloud desktops. Users remotely access data center virtual hosts through thin clients to browse the web, process documents, do graphic design and software development, and even run web services. The cloud host business serves a large number of users across many departments and involves a huge amount of data. The importance of cloud hosts and their data varies from user to user: some users have extremely high requirements for data security and reliability, some have only general requirements, and some employees have no data reliability requirements at all. If the data center operation and maintenance staff back up all the data of all cloud hosts, it is difficult to adopt different backup strategies for different users; doing so occupies excessive storage resources, increases the time taken for backup and recovery, and adds to the workload of the operation and maintenance staff. It is therefore necessary for cloud data centers to choose an appropriate backup solution based on actual needs, build an autonomous backup system for cloud hosts, and offer backup as a service, leaving cloud users to choose the backup frequency, restore points, and backup strategy that suit their own needs. On the one hand this meets the backup needs of all kinds of users; on the other hand it reduces wasted system resources and the workload of operation and maintenance staff [1].
Therefore, when planning a data center backup system, it is necessary to make a trade-off between the enterprise's actual performance requirements and the cost of the backup system, and then choose a suitable technical solution.

Three basic elements of data backup
(1) Backup window (BW). BW refers to the time from the start to the end of a backup operation. An application system, especially one online 7x24, continues to serve users while its data is being backed up. Backup not only consumes host hardware resources; data that changes in real time during the backup also affects it. To avoid degrading the system's external services, the BW should be as small as possible, and in theory it tends toward zero.
When actually planning a backup system, it is usually necessary to consider factors such as the application system's running schedule and data volume, and select the backup technology and backup period accordingly.
(2) Recovery point objective (RPO). RPO reflects the fact that data newly written between the last backup and a subsequent failure is lost because it was never backed up. On one hand, RPO is a design goal of the backup system: the lower the RPO, the better, and in theory real-time backup loses no data at all. On the other hand, RPO reflects the user's tolerance for data loss: the lower the tolerance, the smaller the RPO must be designed and the higher the cost; the higher the tolerance, the larger the RPO can be and the cheaper the backup system is to build. The value is tied to the backup frequency, so if the data is very important and the user's tolerance for loss is low, the backup frequency can be increased accordingly.
(3) Recovery time objective (RTO). RTO refers to the time from the occurrence of a failure until the IT system resumes normal service, and mainly reflects the time needed for business recovery. It includes the response time from the failure to the start of recovery, the data recovery time from the start of recovery until the data is fully restored, and the application recovery time from the completion of data recovery until the application starts normally. In theory, the smaller the RTO the better, but a smaller RTO means a more complex and more expensive backup system. In practice, the importance of the business and the user's tolerance for interruption are analyzed: the more important the business and the lower the tolerance, the smaller the RTO target and the more complex the backup system.
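As a concrete illustration, the relationship between these three elements can be modeled with a small sketch. The function names and the target values used here are hypothetical, chosen only to show how achieved values compare against designed RPO/RTO targets:

```python
from datetime import timedelta

def data_loss_window(last_backup_age: timedelta) -> timedelta:
    """Worst-case data loss: everything written since the last backup
    is lost if a failure happens right now."""
    return last_backup_age

def recovery_time(response: timedelta, data_restore: timedelta,
                  app_restart: timedelta) -> timedelta:
    """RTO = failure response time + data recovery time
           + application recovery time."""
    return response + data_restore + app_restart

def meets_targets(loss: timedelta, rto: timedelta,
                  rpo_target: timedelta, rto_target: timedelta) -> bool:
    """Check an incident against the designed RPO/RTO targets."""
    return loss <= rpo_target and rto <= rto_target

# A daily backup cycle with a 24 h RPO target and a 4 h RTO target:
loss = data_loss_window(timedelta(hours=20))
rto = recovery_time(timedelta(minutes=30), timedelta(hours=2),
                    timedelta(minutes=15))
print(meets_targets(loss, rto, timedelta(hours=24), timedelta(hours=4)))
```

Raising the backup frequency shrinks `last_backup_age` and hence the achievable RPO, which is exactly the trade-off described above.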
An enterprise cloud data center covers many business application systems of varying types and importance, with correspondingly different requirements for BW, RPO, and RTO. For example, some extremely important real-time business systems cannot tolerate even a brief interruption, while some ordinary business systems can tolerate interruptions of minutes or even hours. Therefore, when planning disaster recovery for an enterprise data center, the business must first be classified. For extremely important business, an active-active solution can be adopted, so that when a failure occurs the real-time business system switches resources transparently to users. The cloud host autonomous backup system serves ordinary business applications, so a time-based, policy-driven periodic backup system can be used to ensure the reliability and recoverability of business data.

Logical architecture of cloud host autonomous backup system
The cloud host autonomous backup system is built on the cloud data center and provides users with cloud host backup services through the cloud management platform. Its logical structure is shown in Figure 1. The backup system adopts a distributed deployment: two backup servers and multiple backup agents are deployed to realize a highly reliable system. Load sharing: at task peaks, backup agents can be scaled out so that backup tasks are automatically balanced and the workload is distributed evenly across agents. Failover: if a backup agent node fails while executing a task, the task is automatically switched to another available backup agent. High availability: the two backup servers are configured as an active-standby HA pair, so that when the active server fails the service can switch to the standby server at any time.
The system periodically copies cloud host data from the production center storage to the backup storage medium to form a data copy. The backup server acts as the console: it issues backup policies to the business hosts and the media servers. The media server connects to the backup storage and performs read and write operations on it: during backup it writes the backup data into the backup storage according to the instructions issued by the backup server, and during restore it reads out the data that needs to be recovered.
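The load-sharing and failover behavior described above can be sketched as follows. This is a minimal illustration with hypothetical agent and task names; a real backup scheduler would also track agent capacity, running tasks, and retry state:

```python
import itertools

def dispatch(tasks, agents, is_healthy):
    """Assign backup tasks round-robin across healthy agents only.
    A failed agent is skipped entirely, so its tasks are reassigned
    to the surviving agents on the next dispatch (automatic failover)."""
    alive = [a for a in agents if is_healthy(a)]
    if not alive:
        raise RuntimeError("no backup agent available")
    rr = itertools.cycle(alive)          # even round-robin load sharing
    return {task: next(rr) for task in tasks}

# agent2 has failed, so all tasks fall over to the remaining agents
plan = dispatch(["vm-01", "vm-02", "vm-03"],
                ["agent1", "agent2", "agent3"],
                is_healthy=lambda a: a != "agent2")
print(plan)  # every task lands on agent1 or agent3
```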

Backup network analysis
Common data backup system networking modes mainly include three modes: LAN-Base, LAN-Free, and Server-Free [2].
(1) LAN-Base networking mode. The LAN-Base structure is shown in Figure 2(a). The production system and the backup system are connected only through the LAN, and reads and writes to the business storage and backup storage all travel over the LAN. The basic workflow is: the backup server issues a backup instruction; the backup agent on the business host reads the corresponding data from the business storage system over the business storage network and sends it across the LAN to the media server; the media server then writes the received data into the backup storage. This backup network structure is simple: a backup agent is installed directly on the production server and a backup server is deployed to complete the backup. However, because the backup data and all other traffic share the LAN, backup consumes LAN bandwidth; when the data volume is large, network performance may degrade and backup and recovery performance may suffer as well. In addition, the application server must perform the backup work alongside its production workload, so its load is heavy.
(2) LAN-Free networking mode. The LAN-Free structure is shown in Figure 2(b). In the LAN-Free structure, the production agent and media agent software are installed directly on the server, and the storage is attached via a SAN. During backup, the data stream travels directly from the file server to the tape library through the FC switch without crossing the LAN, so it does not occupy the main network's bandwidth and relieves pressure on the LAN. However, the data still passes through the file server's local disk, memory, and FC switch, so the file server's resources are still consumed; the Server-Free mode described next reduces the pressure on the production server as far as possible. In this mode, the pressure on the LAN is reduced and backup and recovery performance improves, but the production server is under greater pressure because it must interact with both the production storage and the backup storage at the same time.
(3) Server-Free networking mode. The Server-Free structure is shown in Figure 3. In the Server-Free structure, the client agent (CA) on the production server only generates data snapshots; the CA and media agent (MA) are installed on a master control server, which interacts directly with the production storage and the backup storage. During backup, the data snapshot formed on the production server is copied to the backup storage, and the data does not flow through the production server's bus and memory. The file server uses storage space on the SAN, so backing up the file server only requires copying its SAN storage data directly to tape. This greatly relieves the file server, allowing it to concentrate on providing file services instead of spending large amounts of CPU, memory, and I/O on backup.
In the cloud hosting service, the application servers mainly provide users with common cloud host services for general-purpose applications, so the application load is relatively light; at the same time, most use occurs on working days and the amount of backup data is relatively small.

Analysis of Backup Strategy
Backup strategies mainly come in three modes: full backup, differential backup, and incremental backup.
(1) Full backup. Each backup copies all of the data at that moment. The advantage is that each backup is independent of the others, so data can be restored quickly. The disadvantage is that each run backs up the entire data set and therefore occupies the most storage.
(2) Differential backup. A differential backup is based on a base-point backup, which is a full backup; each subsequent backup copies only the data that differs from that base point. The advantage is that a restore involves only the base-point backup and one differential, so recovery is relatively fast. The disadvantage is that it occupies more space than incremental backup, though less than full backup.
(3) Incremental backup. An incremental backup is also based on a base-point backup, but each subsequent backup copies only the data that has changed since the previous backup. The disadvantage is that restoring a point in time depends on every backup in the chain since the base point, so recovery is the slowest; the advantage is that it occupies the least space.
Given the characteristics of cloud host data in the data center, incremental backup is selected. When implementing the strategy, a full backup is performed once a week, followed by incremental backups for the rest of the week.
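The restore dependencies of the three modes can be sketched as follows. This is a minimal illustration with hypothetical labels; a real backup catalog would also track timestamps and media locations:

```python
def restore_chain(backups):
    """backups: chronological list of 'full', 'diff', or 'incr' labels.
    Returns the indices that must be applied, oldest first, to restore
    the state captured by the most recent backup.  Assumes the chain
    after the last full backup is homogeneous."""
    last = len(backups) - 1
    last_full = max(i for i, kind in enumerate(backups) if kind == "full")
    if backups[last] == "full":
        return [last]                  # a full backup restores by itself
    if backups[last] == "diff":
        return [last_full, last]       # base full + latest differential
    # incremental: base full + every backup after it, in order
    return list(range(last_full, last + 1))

# weekly full backup followed by incrementals (the chosen strategy)
print(restore_chain(["full", "incr", "incr", "incr"]))  # [0, 1, 2, 3]
print(restore_chain(["full", "diff", "diff"]))          # [0, 2]
```

The lengthening incremental chain is exactly why recovery is slowest in that mode, and why a fresh full backup each week keeps the chain bounded.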

Analysis of Snapshot Strategy
A data backup is a complete copy of the data at a certain point in time, yet the data being backed up is constantly updated by online business applications, and the backup window cannot be zero. Snapshot technology is therefore needed to freeze the data at the chosen point in time, forming a point-in-time copy; this frozen copy is then copied to the backup storage.
The Storage Networking Industry Association (SNIA) defines a snapshot as a fully usable copy of a specified data set that includes an image of the corresponding data at a certain point in time (the point at which the copy begins) [3]. A snapshot may be a full duplicate of the data it represents or a replica that references it; either way, it is a complete and usable copy of the data set at that point in time. Common snapshot strategies divide into full-copy and incremental techniques: the full-copy category mainly includes split-mirror, while the incremental category mainly includes copy-on-write (COW) and redirect-on-write (ROW).
(1) Split-mirror. Split-mirror technology creates and maintains a complete physical mirror volume of the source data volume before the snapshot time point, with the source volume and mirror volume updated synchronously. At the snapshot time point, the mirror is split off to become the snapshot volume and serves as the source for data backup. After the backup completes, the snapshot volume can be released, or it can be converted back into a mirror of the source volume and resynchronized, ready to serve as the snapshot volume at the next snapshot time point.
The advantage of split-mirror is that the snapshot forms extremely quickly, has almost no impact on the business application, and yields strong RPO and RTO figures. Its shortcomings are equally obvious: it lacks flexibility and cannot create a snapshot of an arbitrary data volume at an arbitrary point in time, and it requires one or more mirror volumes of the same capacity as the source volume, whose synchronization reduces the overall performance of the storage system [4].

(2) Copy-on-write (COW). Copy-on-write is an incremental snapshot technology. In short, COW creates only a pointer table to the source data at snapshot time, and copies source data to the snapshot volume only when that data changes.
When COW creates a snapshot, it allocates a relatively small storage space for holding source data that is about to be overwritten, and copies the metadata (pointers) referencing the source volume's data to form a snapshot pointer table; together these constitute the snapshot volume, which is logically equivalent to a copy of the source volume. After the snapshot is created, COW tracks changes to the source volume. When a block of source data changes for the first time, COW copies the original block into the snapshot volume's storage space, updates the corresponding entry in the snapshot pointer table to point at the copied block (preserving the integrity of the snapshot), and only then overwrites the source block with the new data, keeping the source volume current.
As the COW working principle shows, creating a snapshot only requires allocating a small storage space and a source-data pointer table, without copying any source data, so COW snapshots are created very quickly and have a short backup window. However, the first change to a source block requires one read and two writes, which reduces write performance to some extent [5]. COW is therefore suited to real-time online services in which only a small portion of the source data is expected to change and write-performance demands are not stringent.
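The copy-on-write mechanism described above can be sketched as a minimal block-level simulation. The block addressing and class names are illustrative only, not taken from any particular storage product:

```python
class CowSnapshot:
    """Copy-on-write snapshot over a list of blocks.  The snapshot space
    starts empty; the first write to a source block first copies the old
    block into the snapshot space (one read + one extra write), then
    updates the source volume in place."""

    def __init__(self, source):
        self.source = source   # the live volume, updated in place
        self.saved = {}        # block index -> original block contents

    def write(self, idx, data):
        if idx not in self.saved:                # first change to this block
            self.saved[idx] = self.source[idx]   # copy the old data out
        self.source[idx] = data                  # source stays current

    def read_snapshot(self, idx):
        # unchanged blocks are read through to the source volume
        return self.saved.get(idx, self.source[idx])

volume = ["a", "b", "c"]
snap = CowSnapshot(volume)
snap.write(0, "A")              # triggers the copy-out of block 0
print(volume)                   # ['A', 'b', 'c'] - source holds new data
print(snap.read_snapshot(0))    # 'a' - snapshot keeps the old block
```

Note how `write` performs the extra read-and-write only on the first change to a block, which is the source of the write-performance cost mentioned above.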
(3) Redirect-on-write (ROW). Redirect-on-write is also an incremental snapshot technology. In short, ROW likewise creates only a pointer table to the source data at snapshot time, but when source data changes, the new data is written into the snapshot space while the source data volume remains unchanged [6].
ROW begins in the same way as COW: when creating a snapshot, it allocates a relatively small storage space and copies the metadata (pointers) referencing the source volume's data to form a snapshot pointer table, the two together constituting the snapshot volume, logically a copy of the source volume. The main difference is what happens afterwards: ROW tracks changes to the source volume and writes the new data into the snapshot space, updating the corresponding pointer in the pointer table to reference the newly written block, while the source volume itself remains unchanged. Thus the snapshot space holds the real-time updated data while the source volume holds the original data. If multiple snapshots are created, ROW maintains multiple snapshot volumes linked by pointers back to the source volume; to restore the data of a particular snapshot, the data along the chain must be traced and merged back into the source volume.
As the ROW working principle shows, creating a snapshot again only requires allocating a small storage space and a pointer table, without copying any source data, so ROW snapshots are also created very quickly and have a short backup window. When source data changes, ROW needs only a single write operation, so write performance is good. However, reads must traverse the multi-layer pointer links created by redirected writes, so read performance is slightly worse than COW's. ROW is therefore better suited to business applications that demand high write performance.
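For contrast with the COW mechanism, the redirect-on-write behavior can be simulated in the same minimal style (again purely illustrative):

```python
class RowSnapshot:
    """Redirect-on-write snapshot over a list of blocks.  The source
    volume is frozen at snapshot time; every new write is redirected
    into the snapshot space, costing a single write per change."""

    def __init__(self, source):
        self.frozen = list(source)  # original blocks, never modified
        self.redirected = {}        # block index -> newest data

    def write(self, idx, data):
        self.redirected[idx] = data        # one write, source untouched

    def read_current(self, idx):
        # the live view layers redirected blocks over the frozen volume
        return self.redirected.get(idx, self.frozen[idx])

    def read_snapshot(self, idx):
        return self.frozen[idx]            # the point-in-time copy

snap = RowSnapshot(["a", "b", "c"])
snap.write(0, "A")              # redirected; frozen volume unchanged
print(snap.read_current(0))     # 'A' - new data lives in snapshot space
print(snap.read_snapshot(0))    # 'a' - original data is preserved
```

Here `write` is a single operation, while `read_current` must consult the redirection table first, which mirrors the good-write, slightly-slower-read trade-off described above.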

Conclusion
Cloud host autonomous backup lets users customize differentiated backup strategies according to their actual needs and perform backups autonomously. This effectively meets the varied backup needs of different users while also reducing the workload of operation and maintenance staff and the cost of occupied system resources. The cloud host autonomous backup system is one component of the cloud data center's disaster recovery backup system. The logical architecture and solution presented in this article form a basic design; it must still be coordinated with the other subsystem solutions in overall design and planning to improve the performance of the data center's disaster recovery backup system and meet requirements in all respects.