What is a Disaster Recovery Plan?

Disaster recovery (DR) is an organization’s ability to restore access and functionality to IT infrastructure after a disaster event, whether natural or caused by human action (or error). DR is considered a subset of business continuity, explicitly focusing on ensuring that the IT systems that support critical business functions are operational as soon as possible after a disruptive event occurs.

Today, disaster recovery planning is crucial for any business, especially those operating either partially or entirely in the cloud. Disasters that interrupt service and cause data loss can happen anytime without warning—your network could have an outage, a critical bug could get released, or your business might have to weather a natural disaster. Organizations with robust and well-tested disaster recovery strategies can minimize the impact of disruptions, achieve faster recovery times, and resume core operations rapidly when things go awry.   

Learn more about Google Cloud backup and disaster recovery features and products and how they can be used to build the right DR solution for your business.

IT disaster recovery defined

IT disaster recovery is a portfolio of policies, tools, and processes used to recover or continue operations of critical IT infrastructure, software, and systems after a natural or human-made disaster.

Cloud is now central to many disaster recovery plans. It is widely considered a strong option for both business continuity and disaster recovery because it can eliminate the need to run a separate disaster recovery data center (or recovery site).

What is a disaster recovery site? 

It’s a second, physical data center that’s costly to build and maintain, and one that the cloud can make unnecessary.

What is considered a disaster?

DR planning and strategies focus on responding to and recovering from disasters: events that disrupt or completely stop a business from operating.

While these events can be natural disasters like a hurricane, they can also be caused by a severe system failure, an intentional attack, or even human error. 

Types of disasters can include: 

  • Natural disasters (for example, earthquakes, floods, tornados, hurricanes, or wildfires)
  • Pandemics and epidemics
  • Cyber attacks (for example, malware, DDoS, and ransomware attacks)
  • Other intentional, human-caused threats such as terrorist or biochemical attacks
  • Technological hazards (for example, power outages, pipeline explosions, and transportation accidents)
  • Machine and hardware failure 

Importance of disaster recovery

Technology plays an increasingly important role in every aspect of business, with applications and services enabling companies to be more agile, available, and connected. This trend has contributed to the widespread adoption of cloud computing by organizations to drive growth, innovation, and exceptional customer experience. 

However, the migration to cloud environments—public, private, hybrid, or multicloud—and the rise of remote workforces are introducing more infrastructure complexity and potential risks. Disaster recovery for cloud-based systems is critical to an overall business continuity strategy. A system breakdown or unplanned downtime can have serious consequences for enterprises that rely heavily on cloud-based resources, applications, documents, and data storage to keep things running smoothly. 

In addition, many data privacy laws and industry standards require organizations to have a disaster recovery strategy. Failure to follow DR plans can result in compliance violations and steep regulatory fines.

Every business needs to be able to recover quickly from any event that stops day-to-day operations, no matter what industry or size. Without a disaster recovery plan, a company can suffer data loss, reduced productivity, out-of-budget expenses, and reputational damage that can lead to lost customers and revenue. 

How disaster recovery works

Disaster recovery relies on having a solid plan to get critical applications and infrastructure up and running after an outage, ideally within minutes.

An effective DR plan addresses three different elements for recovery: 

  • Preventive: Ensuring your systems are as secure and reliable as possible, using tools and techniques to prevent a disaster from occurring in the first place. This may include backing up critical data or continuously monitoring environments for configuration errors and compliance violations. 
  • Detective: For rapid recovery, you’ll need to know when a response is necessary. These measures focus on detecting or discovering unwanted events as they happen in real time. 
  • Corrective: These measures are aimed at planning for potential DR scenarios, ensuring backup operations to reduce impact, and putting recovery procedures into action to restore data and systems quickly when the time comes. 

Typically, disaster recovery involves securely replicating and backing up critical data and workloads to a secondary location or multiple locations—disaster recovery sites. A disaster recovery site can be used to recover data from the most recent backup or a previous point in time. Organizations can also switch to using a DR site if the primary location and its systems fail due to an unforeseen event until the primary one is restored.

Types of disaster recovery

The types of disaster recovery you’ll need will depend on your IT infrastructure, the type of backup and recovery you use, and the assets you need to protect.

Here are some of the most common technologies and techniques used in disaster recovery: 

  • Backups: With backups, you back up data to an offsite system or ship an external drive to an offsite location. However, backups do not include any IT infrastructure, so they are not considered a full disaster recovery solution. 
  • Backup as a service (BaaS): Similar to remote data backups, BaaS solutions provide regular data backups offered by a third-party provider. 
  • Disaster recovery as a service (DRaaS): Many cloud providers offer DRaaS, along with cloud service models like IaaS and PaaS. A DRaaS service model allows you to back up your data and IT infrastructure and host them on a third-party provider’s cloud infrastructure. During a crisis, the provider will implement and orchestrate your DR plan to help recover access and functionality with minimal interruption to operations.
  • Point-in-time snapshots: Also known as point-in-time copies, snapshots replicate data, files, or even an entire database at a specific point in time. Snapshots can be used to restore data as long as the copy is stored in a location unaffected by the event. However, some data loss can occur depending on when the snapshot was made. 
  • Virtual DR: Virtual DR solutions allow you to back up operations and data or even create a complete replica of your IT infrastructure and run it on offsite virtual machines (VMs). In the event of a disaster, you can reload your backup and resume operation quickly. This solution requires frequent data and workload transfers to be effective. 
  • Disaster recovery sites: These are locations that organizations can temporarily use after a disaster event, which contain backups of data, systems, and other technology infrastructure.

Benefits of disaster recovery

Stronger business continuity

Every second counts when your business goes offline, impacting productivity, customer experience, and your company’s reputation. Disaster recovery helps safeguard critical business operations by ensuring they can recover with minimal or no interruption. 

Enhanced security

DR plans use data backup and other procedures that strengthen your security posture and limit the impact of attacks and other security risks. For example, cloud-based disaster recovery solutions offer built-in security capabilities, such as advanced encryption, identity and access management, and organizational policy. 

Faster recovery

Disaster recovery solutions make restoring your data and workloads easier so you can get business operations back online quickly after a catastrophic event. DR plans leverage data replication and often rely on automated recovery to minimize downtime and data loss.

Reduced recovery costs

The monetary impacts of a disaster event can be significant, ranging from loss of business and productivity to data privacy penalties to ransoms. With disaster recovery, you can avoid, or at least minimize, some of these costs. Cloud DR processes can also reduce the operating costs of running and maintaining a secondary location.

High availability

Many cloud-based services come with high availability (HA) features that can support your DR strategy. HA capabilities help ensure an agreed level of performance and offer built-in redundancy and automatic failover, protecting data against equipment failure and other smaller-scale events that may impact data availability. 

Better compliance

DR planning supports compliance requirements by considering potential risks and defining a set of specific procedures and protections for your data and workloads in the event of a disaster. This usually includes strong data backup practices, DR sites, and regularly testing your DR plan to ensure that your organization is prepared. 

Planning a disaster recovery strategy

A comprehensive disaster recovery strategy should include detailed emergency response requirements, backup operations, and recovery procedures. DR strategies and plans often help form a broader business continuity strategy, which includes contingency plans to mitigate impact beyond IT infrastructure and systems, allowing all business areas to resume normal operations as soon as possible. 

When it comes to creating disaster recovery strategies, you should carefully consider the following key metrics: 

  • Recovery time objective (RTO): The maximum acceptable length of time that systems and applications can be down without causing significant damage to the business. For example, some applications can be offline for an hour, while others might need to recover in minutes.
  • Recovery point objective (RPO): The maximum age of data you need to recover to resume operations after a major event. RPO helps to define the frequency of backups.
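
To make the link between RPO and backup frequency concrete, here is a minimal Python sketch. It assumes the worst case: a failure strikes just before a running backup completes, so the newest usable copy is one backup interval plus the backup's run time old. The function names and sample numbers are illustrative assumptions, not taken from any product.

```python
from datetime import timedelta


def worst_case_data_age(backup_interval: timedelta,
                        backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Oldest the newest usable backup can be when a failure hits.

    Assumption: a backup starts every `backup_interval` and needs
    `backup_duration` to finish. A failure just before a backup completes
    leaves only the previous backup available.
    """
    return backup_interval + backup_duration


def meets_rpo(rpo: timedelta,
              backup_interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the schedule keeps the worst-case data age within the RPO."""
    return worst_case_data_age(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    one_hour_rpo = timedelta(hours=1)
    # Nightly backups cannot satisfy a one-hour RPO.
    print(meets_rpo(one_hour_rpo, timedelta(hours=24)))
    # Backups every 30 minutes that take 10 minutes to complete can.
    print(meets_rpo(one_hour_rpo, timedelta(minutes=30), timedelta(minutes=10)))
```

A check like this is only a starting point: continuous replication, discussed later, is the usual way to push the achievable recovery point toward zero.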

These metrics are particularly useful when conducting risk assessments and business impact analysis (BIA) for potential disasters, from moderate to worst-case scenarios. Risk assessments and BIAs evaluate all functional areas of a business and the consequences of any risks, which can help define DR goals and the actions needed to achieve them before or after an event occurs. 

When creating your recovery strategy, it’s useful to consider your RTO and RPO values and pick a DR pattern that will enable you to meet those values and your overall goals. Typically, the smaller your values (or the faster your applications need to recover after an interruption), the higher the cost to run your application. 

Cloud disaster recovery can greatly reduce the cost of meeting your RTO and RPO targets compared with fulfilling on-premises requirements for capacity, security, network infrastructure, bandwidth, support, and facilities. A highly managed service on Google Cloud can help you avoid most, if not all, complicating factors and allow you to reduce many business costs significantly.

For more guidance on using Google Cloud to address disaster recovery, you can read our Disaster recovery planning guide or contact your account manager for help with creating a DR plan.

What is disaster recovery used for?

Ensure business resilience

No matter what happens, a good DR plan can ensure that the business can return to full operations rapidly, without losing data or transactions.

Maintain competitiveness

When a business goes offline, customers are rarely loyal. They turn to competitors to get the goods or services they require. A DR plan helps prevent this.

Avoid regulatory risks

Many industries have regulations dictating where data can be stored and how it must be protected. Heavy fines result if these mandates are not met.

Avoid data loss

The longer a business’s systems are down, the greater the risk that data will be lost. A robust DR plan minimizes this risk.

Keep customers happy

Meeting customer service level agreements (SLAs) is always a priority. A well-executed DR plan can help businesses achieve SLAs despite challenges.

Maintain reputation

A business that has trouble resuming operations after an outage can suffer brand damage. For that reason, a solid DR plan is critical.

Related products and services

Google offers many products that can be used as building blocks when creating a secure and reliable DR plan, including Cloud Storage.

Disaster recovery options in the cloud

Disaster recovery strategies available to you within AWS can be broadly categorized into four approaches, ranging from the low cost and low complexity of making backups to more complex strategies using multiple active Regions. Active/passive strategies use an active site (such as an AWS Region) to host the workload and serve traffic. The passive site (such as a different AWS Region) is used for recovery. The passive site does not actively serve traffic until a failover event is triggered.

It is critical to regularly assess and test your disaster recovery strategy so that you have confidence in invoking it, should it become necessary. Use AWS Resilience Hub to continuously validate and track the resilience of your AWS workloads, including whether you are likely to meet your RTO and RPO targets.


Figure 6 - Disaster recovery strategies (graph showing the strategies and highlights of each)

For a disaster event based on disruption or loss of one physical data center for a well-architected, highly available workload, you may only require a backup and restore approach to disaster recovery. If your definition of a disaster goes beyond the disruption or loss of a physical data center to that of a Region or if you are subject to regulatory requirements that require it, then you should consider Pilot Light, Warm Standby, or Multi-Site Active/Active.
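
The guidance above can be written as a small decision helper. This is a rough sketch that encodes only the rule stated in the paragraph, and it assumes the workload is already well-architected and highly available within its Region. The enum, the function name, and the strategy labels are illustrative assumptions.

```python
from enum import Enum
from typing import List


class DisasterScope(Enum):
    DATA_CENTER = "data_center"  # loss of one physical data center
    REGION = "region"            # loss of an entire Region


def candidate_strategies(scope: DisasterScope,
                         regulation_requires_more: bool = False) -> List[str]:
    """Return the DR strategies worth evaluating for a given disaster scope.

    Assumption: the workload is well-architected and highly available,
    so a single data center loss needs only backup and restore.
    """
    if scope is DisasterScope.REGION or regulation_requires_more:
        return ["pilot light", "warm standby", "multi-site active/active"]
    return ["backup and restore"]


if __name__ == "__main__":
    print(candidate_strategies(DisasterScope.DATA_CENTER))
    print(candidate_strategies(DisasterScope.REGION))
```

Choosing among the three Region-level candidates still depends on your RTO, RPO, and budget, which this helper deliberately leaves out.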

When choosing your strategy, and the AWS resources to implement it, keep in mind that within AWS, we commonly divide services into the data plane and the control plane. The data plane is responsible for delivering real-time service while control planes are used to configure the environment. For maximum resiliency, you should use only data plane operations as part of your failover operation. This is because the data planes typically have higher availability design goals than the control planes.

Backup and restore

Backup and restore is a suitable approach for mitigating against data loss or corruption. This approach can also be used to mitigate against a regional disaster by replicating data to other AWS Regions, or to mitigate lack of redundancy for workloads deployed to a single Availability Zone. In addition to data, you must redeploy the infrastructure, configuration, and application code in the recovery Region. To enable infrastructure to be redeployed quickly without errors, you should always deploy using infrastructure as code (IaC) using services such as AWS CloudFormation or the AWS Cloud Development Kit (AWS CDK). Without IaC, it may be complex to restore workloads in the recovery Region, which will lead to increased recovery times and possibly exceed your RTO. In addition to user data, be sure to also back up code and configuration, including Amazon Machine Images (AMIs) you use to create Amazon EC2 instances. You can use AWS CodePipeline to automate redeployment of application code and configuration.


Figure 7 - Backup and restore architecture (architecture diagram)

AWS services

Your workload data will require a backup strategy that runs periodically or is continuous. How often you run your backup will determine your achievable recovery point (which should align to meet your RPO). The backup should also offer a way to restore it to the point in time in which it was taken. Backup with point-in-time recovery is available through the following services and resources:

  • Amazon Elastic Block Store (Amazon EBS) snapshot
  • Amazon DynamoDB backup
  • Amazon RDS snapshot
  • Amazon Aurora DB snapshot
  • Amazon EFS backup (when using AWS Backup)
  • Amazon Redshift snapshot
  • Amazon Neptune snapshot
  • Amazon DocumentDB
  • Amazon FSx for Windows File Server, Amazon FSx for Lustre, Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS

For Amazon Simple Storage Service (Amazon S3), you can use Amazon S3 Cross-Region Replication (CRR) to asynchronously copy objects to an S3 bucket in the DR region continuously, while providing versioning for the stored objects so that you can choose your restoration point. Continuous replication of data has the advantage of being the shortest time (near zero) to back up your data, but may not protect against disaster events such as data corruption or malicious attack (such as unauthorized data deletion) as well as point-in-time backups. Continuous replication is covered in the AWS Services for Pilot Light section.

AWS Backup provides a centralized location to configure, schedule, and monitor AWS backup capabilities for the following services and resources:

  • Amazon Elastic Block Store (Amazon EBS) volumes
  • Amazon EC2 instances
  • Amazon Relational Database Service (Amazon RDS) databases (including Amazon Aurora databases)
  • Amazon DynamoDB tables
  • Amazon Elastic File System (Amazon EFS) file systems
  • AWS Storage Gateway volumes

AWS Backup supports copying backups across Regions, such as to a disaster recovery Region.

As an additional disaster recovery strategy for your Amazon S3 data, enable S3 object versioning. Object versioning protects your data in S3 from the consequences of deletion or modification actions by retaining the original version before the action. Object versioning can be a useful mitigation for human-error type disasters. If you are using S3 replication to back up data to your DR Region, then, by default, when an object is deleted in the source bucket, Amazon S3 adds a delete marker in the source bucket only. This approach protects data in the DR Region from malicious deletions in the source Region.
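
The interaction between versioning, delete markers, and replication is easier to see in a toy model. The sketch below is plain Python and does not call Amazon S3. The class and function names are hypothetical, and it models only the default behavior described above: a delete adds a marker in the source bucket and is not propagated to the replica.

```python
class VersionedBucket:
    """Toy model of a versioned bucket: nothing is overwritten in place."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, data):
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A delete stacks a marker on top; earlier versions are retained.
        self._versions.setdefault(key, []).append(self.DELETE_MARKER)

    def get(self, key):
        """Return the current version, or None if missing or deleted."""
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is self.DELETE_MARKER:
            return None
        return versions[-1]

    def retained_versions(self, key):
        """All data versions still recoverable, ignoring delete markers."""
        return [v for v in self._versions.get(key, [])
                if v is not self.DELETE_MARKER]


def replicated_put(source, replica, key, data):
    """Write to the source and copy the new version to the replica."""
    source.put(key, data)
    replica.put(key, data)


def delete_in_source(source, replica, key):
    """Default behavior modeled here: the marker stays in the source only."""
    source.delete(key)
    # Intentionally no call on `replica`.


if __name__ == "__main__":
    primary, dr = VersionedBucket(), VersionedBucket()
    replicated_put(primary, dr, "report.csv", b"v1")
    delete_in_source(primary, dr, "report.csv")
    print(primary.get("report.csv"))                # deleted in the source
    print(primary.retained_versions("report.csv"))  # but still recoverable
    print(dr.get("report.csv"))                     # replica is untouched
```

The model shows two separate protections: versioning keeps the old data recoverable in the source, and the non-replicated delete marker keeps the replica's current version intact.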

In addition to data, you must also back up the configuration and infrastructure necessary to redeploy your workload and meet your Recovery Time Objective (RTO). AWS CloudFormation provides Infrastructure as Code (IaC), and enables you to define all of the AWS resources in your workload so you can reliably deploy and redeploy to multiple AWS accounts and AWS Regions. You can back up Amazon EC2 instances used by your workload as Amazon Machine Images (AMIs). The AMI is created from snapshots of your instance's root volume and any other EBS volumes attached to your instance. You can use this AMI to launch a restored version of the EC2 instance. An AMI can be copied within or across Regions. Or, you can use AWS Backup to copy backups across accounts and to other AWS Regions. The cross-account backup capability helps protect from disaster events that include insider threats or account compromise. AWS Backup also adds capabilities for EC2 backup. In addition to the instance’s individual EBS volumes, it stores and tracks the following metadata: instance type, configured virtual private cloud (VPC), security group, IAM role, monitoring configuration, and tags. However, this additional metadata is only used when restoring the EC2 backup to the same AWS Region.

Any data stored in the disaster recovery Region as backups must be restored at time of failover. AWS Backup offers restore capability, but does not currently enable scheduled or automatic restoration. You can implement automatic restore to the DR Region using the AWS SDK to call APIs for AWS Backup. You can set this up as a regularly recurring job or trigger restoration whenever a backup is completed. The following figure shows an example of automatic restoration using Amazon Simple Notification Service (Amazon SNS) and AWS Lambda. Implementing a scheduled periodic data restore is a good idea because restoring data from backup is a control plane operation. If this operation were not available during a disaster, you would still have operable data stores created from a recent backup.


Figure 8 - Restoring and testing backups (diagram of the restore and test workflow)
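
The restore step in that workflow might look roughly like the Python sketch below. It assumes `backup_client` behaves like the boto3 AWS Backup client (`boto3.client("backup")`) and uses its `get_recovery_point_restore_metadata` and `start_restore_job` operations. The function name and its parameters are hypothetical, and the client is passed in so the function can be exercised with a stub. In practice the restore metadata may need adjusting for each resource type before it is submitted.

```python
import uuid


def restore_recovery_point(backup_client, vault_name, recovery_point_arn,
                           iam_role_arn):
    """Start a restore job for one recovery point and return the job ID.

    Assumption: `backup_client` exposes the same operations as
    boto3.client("backup"). All four arguments are supplied by the
    caller, for example a scheduled job or a function triggered when
    a backup completes.
    """
    # Look up the metadata the recovery point was created with.
    described = backup_client.get_recovery_point_restore_metadata(
        BackupVaultName=vault_name,
        RecoveryPointArn=recovery_point_arn,
    )
    # Submit the restore. The idempotency token guards against duplicate
    # jobs if the same request is retried.
    response = backup_client.start_restore_job(
        RecoveryPointArn=recovery_point_arn,
        Metadata=described["RestoreMetadata"],
        IamRoleArn=iam_role_arn,
        IdempotencyToken=str(uuid.uuid4()),
    )
    return response["RestoreJobId"]
```

Running this on a schedule in the DR Region, rather than only at failover time, is what keeps a recently restored data store available even if the control plane is impaired during the disaster.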

Your backup strategy must include testing your backups. See the Testing Disaster Recovery section for more information. Refer to the AWS Well-Architected Lab: Testing Backup and Restore of Data for a hands-on demonstration of implementation.

Pilot light

With the pilot light approach, you replicate your data from one Region to another and provision a copy of your core workload infrastructure. Resources required to support data replication and backup, such as databases and object storage, are always on. Other elements, such as application servers, are loaded with application code and configurations, but are "switched off" and are only used during testing or when disaster recovery failover is invoked. In the cloud, you have the flexibility to deprovision resources when you do not need them, and provision them when you do. A best practice for “switched off” is to not deploy the resource, and then create the configuration and capabilities to deploy it (“switch on”) when needed. Unlike the backup and restore approach, your core infrastructure is always available and you always have the option to quickly provision a full scale production environment by switching on and scaling out your application servers.


Figure 9 - Pilot light architecture (reference architecture diagram)

A pilot light approach minimizes the ongoing cost of disaster recovery by minimizing the active resources, and simplifies recovery at the time of a disaster because the core infrastructure requirements are all in place. This recovery option requires you to change your deployment approach. You need to make core infrastructure changes to each Region and deploy workload (configuration, code) changes simultaneously to each Region. This step can be simplified by automating your deployments and using infrastructure as code (IaC) to deploy infrastructure across multiple accounts and Regions (full infrastructure deployment to the primary Region and scaled down/switched-off infrastructure deployment to DR Regions). It is recommended you use a different account per Region to provide the highest level of resource and security isolation (in case compromised credentials are among the scenarios your disaster recovery plan covers).

With this approach, you must also mitigate against a data disaster. Continuous data replication protects you against some types of disaster, but it may not protect you against data corruption or destruction unless your strategy also includes versioning of stored data or options for point-in-time recovery. You can back up the replicated data in the disaster Region to create point-in-time backups in that same Region.

In addition to using the AWS services covered in the Backup and Restore section to create point-in-time backups, also consider the following services for your pilot light strategy.

For pilot light, continuous data replication to live databases and data stores in the DR region is the best approach for low RPO (when used in addition to the point-in-time backups discussed previously). AWS provides continuous, cross-region, asynchronous data replication for data using the following services and resources:

  • Amazon Simple Storage Service (Amazon S3) Replication
  • Amazon RDS read replicas
  • Amazon Aurora global databases
  • Amazon DynamoDB global tables
  • Amazon DocumentDB global clusters
  • Global Datastore for Amazon ElastiCache for Redis

With continuous replication, versions of your data are available almost immediately in your DR Region. Actual replication times can be monitored using service features like S3 Replication Time Control (S3 RTC) for S3 objects and management features of Amazon Aurora global databases.

When failing over to run your read/write workload from the disaster recovery Region, you must promote an RDS read replica to become the primary instance. For DB instances other than Aurora, the process takes a few minutes to complete and rebooting is part of the process. For Cross-Region Replication (CRR) and failover with RDS, using Amazon Aurora global database provides several advantages. Global database uses dedicated infrastructure that leaves your databases entirely available to serve your application, and can replicate to the secondary Region with typical latency of under a second (within an AWS Region, latency is much less than 100 milliseconds). With Amazon Aurora global database, if your primary Region suffers a performance degradation or outage, you can promote one of the secondary Regions to take read/write responsibilities in less than one minute, even in the event of a complete regional outage. You can also configure Aurora to monitor the RPO lag time of all secondary clusters to make sure that at least one secondary cluster stays within your target RPO window.

A scaled-down version of your core workload infrastructure with fewer or smaller resources must be deployed in your DR Region. Using AWS CloudFormation, you can define your infrastructure and deploy it consistently across AWS accounts and across AWS Regions. AWS CloudFormation uses predefined pseudo parameters to identify the AWS account and AWS Region in which it is deployed. Therefore, you can implement condition logic in your CloudFormation templates to deploy only the scaled-down version of your infrastructure in the DR Region. For EC2 instance deployments, an Amazon Machine Image (AMI) supplies information such as hardware configuration and installed software. You can implement an Image Builder pipeline that creates the AMIs you need and copy these to both your primary and backup Regions. This helps to ensure that these golden AMIs have everything you need to redeploy or scale out your workload in a new Region, in case of a disaster event. Amazon EC2 instances are deployed in a scaled-down configuration (fewer instances than in your primary Region). To scale out the infrastructure to support production traffic, see Amazon EC2 Auto Scaling in the Warm Standby section.
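
The condition logic described above can be sketched as a CloudFormation template fragment, built here as a Python dictionary and printed as JSON. It relies on the `AWS::Region` pseudo parameter with the `Fn::Equals` and `Fn::If` intrinsic functions. The Region name, parameter names, capacities, and the resource's logical ID are illustrative assumptions, and the resource is deliberately incomplete, so this fragment is not deployable as written.

```python
import json

PRIMARY_REGION = "us-east-1"  # assumption: substitute your own primary Region


def build_template(primary_region: str = PRIMARY_REGION) -> dict:
    """Build a template fragment that sizes capacity by deployment Region."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical capacities; DesiredCapacity is a string property.
            "ProductionCapacity": {"Type": "String", "Default": "6"},
            "StandbyCapacity": {"Type": "String", "Default": "1"},
        },
        "Conditions": {
            # True only when the stack is deployed in the primary Region.
            "IsPrimaryRegion": {
                "Fn::Equals": [{"Ref": "AWS::Region"}, primary_region]
            }
        },
        "Resources": {
            "AppServerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    # Full capacity in the primary Region, scaled down elsewhere.
                    "DesiredCapacity": {
                        "Fn::If": [
                            "IsPrimaryRegion",
                            {"Ref": "ProductionCapacity"},
                            {"Ref": "StandbyCapacity"},
                        ]
                    },
                    # MinSize, MaxSize, launch template, and subnets are
                    # omitted to keep the sketch short.
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Because the same template is deployed to every Region, the primary and DR environments cannot drift apart in anything except the values the condition selects.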

For an active/passive configuration such as pilot light, all traffic initially goes to the primary Region and switches to the disaster recovery Region if the primary Region is no longer available. This failover operation can be initiated either automatically or manually. Automatically initiated failover based on health checks or alarms should be used with caution. Even using the best practices discussed here, recovery time and recovery point will be greater than zero, incurring some loss of availability and data. If you fail over when you don’t need to (false alarm), then you incur those losses. Manually initiated failover is therefore often used. In this case, you should still automate the steps for failover, so that the manual initiation is like the push of a button.

There are several traffic management options to consider when using AWS services.

One option is to use Amazon Route 53. Using Amazon Route 53, you can associate multiple IP endpoints in one or more AWS Regions with a Route 53 domain name. Then, you can route traffic to the appropriate endpoint under that domain name. On failover you need to switch traffic to the recovery endpoint, and away from the primary endpoint. Amazon Route 53 health checks monitor these endpoints. Using these health checks, you can configure automatically initiated DNS failover to ensure traffic is sent only to healthy endpoints, which is a highly reliable operation done on the data plane.

To implement this using manually initiated failover, you can use Amazon Route 53 Application Recovery Controller. With Route 53 ARC, you can create Route 53 health checks that do not actually check health, but instead act as on/off switches that you have full control over. Using the AWS CLI or AWS SDK, you can script failover using this highly available, data plane API. Your script toggles these switches (the Route 53 health checks), telling Route 53 to send traffic to the recovery Region instead of the primary Region.

Another option for manually initiated failover that some have used is a weighted routing policy, changing the weights of the primary and recovery Regions so that all traffic goes to the recovery Region. However, be aware that this is a control plane operation and therefore not as resilient as the data plane approach using Amazon Route 53 Application Recovery Controller.
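
A scripted "push of a button" failover with Route 53 ARC could look something like the following Python sketch. It assumes `cluster_client` behaves like the boto3 `route53-recovery-cluster` client and uses its `update_routing_control_states` operation to flip both switches in one request. The function name and the two ARN parameters are hypothetical. In a real deployment the client would typically need to be pointed at one of the ARC cluster's endpoints.

```python
def fail_over_to_recovery(cluster_client, primary_control_arn,
                          recovery_control_arn):
    """Switch traffic from the primary Region to the recovery Region.

    Assumption: each Region's Route 53 health check is tied to one ARC
    routing control, identified by the ARNs passed in. Both states are
    changed in a single batch call so traffic is never sent to neither
    Region or to both.
    """
    cluster_client.update_routing_control_states(
        UpdateRoutingControlStateEntries=[
            {
                "RoutingControlArn": primary_control_arn,
                "RoutingControlState": "Off",
            },
            {
                "RoutingControlArn": recovery_control_arn,
                "RoutingControlState": "On",
            },
        ]
    )
```

Keeping the decision manual while automating the action is the pattern recommended earlier: a person decides that failover is warranted, and the script carries it out consistently.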

Another option is to use AWS Global Accelerator. Using AnyCast IP, you can associate multiple endpoints in one or more AWS Regions with the same static public IP address or addresses. AWS Global Accelerator then routes traffic to the appropriate endpoint associated with that address. Global Accelerator health checks monitor endpoints. Using these health checks, AWS Global Accelerator checks the health of your applications and routes user traffic automatically to the healthy application endpoint. For manually initiated failover, you can adjust which endpoint receives traffic using traffic dials, but note this is a control plane operation. Global Accelerator offers lower latencies to the application endpoint since it makes use of the extensive AWS edge network to put traffic on the AWS network backbone as soon as possible. Global Accelerator also avoids caching issues that can occur with DNS systems (like Route 53).

Amazon CloudFront offers origin failover, where if a given request to the primary endpoint fails, CloudFront routes the request to the secondary endpoint. Unlike the failover operations described previously, all subsequent requests still go to the primary endpoint first, and failover is done per request.
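
Per-request failover is a different shape from the Region-level switches above, and a toy model makes the difference visible. The Python sketch below does not call CloudFront. Origins are modeled as plain callables, a failure is modeled as a `ConnectionError`, and all names are hypothetical. In the real service, failover is decided by the criteria configured for the origin group.

```python
def fetch_with_origin_failover(request, primary, secondary):
    """Try the primary origin; on failure, retry this one request on the secondary.

    No state is kept between calls, so the next request starts at the
    primary again. That is what makes the failover per request.
    """
    try:
        return primary(request)
    except ConnectionError:
        return secondary(request)


if __name__ == "__main__":
    primary_attempts = []

    def failing_primary(request):
        primary_attempts.append(request)
        raise ConnectionError("primary origin unavailable")

    def healthy_secondary(request):
        return "secondary served " + request

    print(fetch_with_origin_failover("/a", failing_primary, healthy_secondary))
    print(fetch_with_origin_failover("/b", failing_primary, healthy_secondary))
    # Both requests were attempted against the primary first.
    print(primary_attempts)
```

The trade-off is that every request pays for a failed attempt while the primary is down, in exchange for recovering automatically the moment the primary returns.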

AWS Elastic Disaster Recovery

AWS Elastic Disaster Recovery (DRS) continuously replicates server-hosted applications and server-hosted databases from any source into AWS using block-level replication of the underlying server. Elastic Disaster Recovery enables you to use a Region in AWS Cloud as a disaster recovery target for a workload hosted on-premises or on another cloud provider, and its environment. It can also be used for disaster recovery of AWS hosted workloads if they consist only of applications and databases hosted on EC2 (that is, not RDS). Elastic Disaster Recovery uses the Pilot Light strategy, maintaining a copy of data and “switched-off” resources in an Amazon Virtual Private Cloud (Amazon VPC) used as a staging area. When a failover event is triggered, the staged resources are used to automatically create a full-capacity deployment in the target Amazon VPC used as the recovery location.


Figure 10 - AWS Elastic Disaster Recovery architecture (architecture diagram)

Warm standby

The warm standby approach involves ensuring that there is a scaled down, but fully functional, copy of your production environment in another Region. This approach extends the pilot light concept and decreases the time to recovery because your workload is always-on in another Region. This approach also allows you to more easily perform testing or implement continuous testing to increase confidence in your ability to recover from a disaster.



Figure 11 - Warm standby architecture

Note: The difference between pilot light and warm standby can sometimes be difficult to understand. Both include an environment in your DR Region with copies of your primary Region assets. The distinction is that pilot light cannot process requests without additional action taken first, whereas warm standby can handle traffic (at reduced capacity levels) immediately. The pilot light approach requires you to “turn on” servers, possibly deploy additional (non-core) infrastructure, and scale up, whereas warm standby only requires you to scale up (everything is already deployed and running). Use your RTO and RPO needs to help you choose between these approaches.

All of the AWS services covered under backup and restore and pilot light are also used in warm standby for data backup, data replication, active/passive traffic routing, and deployment of infrastructure including EC2 instances.

AWS Auto Scaling is used to scale resources including Amazon EC2 instances, Amazon ECS tasks, Amazon DynamoDB throughput, and Amazon Aurora replicas within an AWS Region. Amazon EC2 Auto Scaling scales deployment of EC2 instances across Availability Zones within an AWS Region, providing resiliency within that Region. Use Auto Scaling to scale out your DR Region to full production capability, as part of a pilot light or warm standby strategy. For example, for EC2, increase the desired capacity setting on the Auto Scaling group. You can adjust this setting manually through the AWS Management Console, automatically through the AWS SDK, or by redeploying your AWS CloudFormation template using the new desired capacity value. You can use AWS CloudFormation parameters to make redeploying the CloudFormation template easier. Ensure that service quotas in your DR Region are set high enough so as to not limit you from scaling up to production capacity.
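As a rough sketch of the SDK route, the function below raises the desired capacity of a DR Auto Scaling group after checking it against a quota. The group name, capacity numbers, and the `RecordingClient` stand-in are hypothetical. In real use the client would be an Auto Scaling client from the AWS SDK for Python (boto3), whose `set_desired_capacity` call takes these keyword arguments, and the group's maximum size would already need to allow the new value.

```python
class RecordingClient:
    """Stand-in for an Auto Scaling client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def set_desired_capacity(self, **kwargs):
        # Record the request instead of sending it to AWS.
        self.calls.append(kwargs)


def scale_out_dr_group(client, group_name, production_capacity, instance_quota):
    """Scale the DR group to production capacity, refusing to exceed the quota."""
    if production_capacity > instance_quota:
        raise ValueError(
            f"quota {instance_quota} is below production capacity {production_capacity}"
        )
    client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=production_capacity,
        HonorCooldown=False,  # assumption: during a disaster, do not wait for cooldown
    )
    return production_capacity


client = RecordingClient()
scale_out_dr_group(client, "dr-web-asg", production_capacity=12, instance_quota=20)
print(client.calls)
```

Checking the quota before the call mirrors the advice above: a quota that is too low in the DR Region is better discovered in a drill than during a real failover.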

Because Auto Scaling is a control plane activity, taking a dependency on it will lower the resiliency of your overall recovery strategy. It is a trade-off. You can choose to provision sufficient capacity such that the recovery Region can handle the full production load as deployed. This statically stable configuration is called hot standby (see the next section). Or you may choose to provision fewer resources which will cost less, but take a dependency on Auto Scaling. Some DR implementations will deploy enough resources to handle initial traffic, ensuring low RTO, and then rely on Auto Scaling to ramp up for subsequent traffic.

Multi-site active/active

You can run your workload simultaneously in multiple Regions as part of a multi-site active/active or hot standby active/passive strategy. Multi-site active/active serves traffic from all Regions to which it is deployed, whereas hot standby serves traffic only from a single Region, and the other Region(s) are only used for disaster recovery. With a multi-site active/active approach, users are able to access your workload in any of the Regions in which it is deployed. This approach is the most complex and costly approach to disaster recovery, but it can reduce your recovery time to near zero for most disasters with the correct technology choices and implementation (although recovery from data corruption may need to rely on backups, which usually results in a non-zero recovery point). Hot standby uses an active/passive configuration where users are only directed to a single Region and DR Regions do not take traffic. Most customers find that if they are going to stand up a full environment in the second Region, it makes sense to use it active/active. Alternatively, if you do not want to use both Regions to handle user traffic, then Warm Standby offers a more economical and operationally less complex approach.



Figure 12 - Multi-site active/active architecture (change one Active path to Inactive for hot standby)

With multi-site active/active, because the workload is running in more than one Region, there is no such thing as failover in this scenario. Disaster recovery testing in this case would focus on how the workload reacts to loss of a Region: Is traffic routed away from the failed Region? Can the other Region(s) handle all the traffic? Testing for a data disaster is also required. Backup and recovery are still required and should be tested regularly. It should also be noted that recovery times for a data disaster involving data corruption, deletion, or obfuscation will always be greater than zero and the recovery point will always be at some point before the disaster was discovered. If the additional complexity and cost of a multi-site active/active (or hot standby) approach is required to maintain near zero recovery times, then additional efforts should be made to maintain security and to prevent human error to mitigate against human disasters.

All of the AWS services covered under backup and restore, pilot light, and warm standby are also used here for point-in-time data backup, data replication, active/active traffic routing, and deployment and scaling of infrastructure including EC2 instances.

For the active/passive scenarios discussed earlier (Pilot Light and Warm Standby), both Amazon Route 53 and AWS Global Accelerator can be used to route network traffic to the active Region. For the active/active strategy here, both of these services also enable the definition of policies that determine which users go to which active regional endpoint. With AWS Global Accelerator you set a traffic dial to control the percentage of traffic that is directed to each application endpoint. Amazon Route 53 supports this percentage approach, as well as multiple other policies, including geoproximity and latency-based ones. Global Accelerator automatically leverages the extensive network of AWS edge servers to onboard traffic to the AWS network backbone as soon as possible, resulting in lower request latencies.
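The percentage approach can be sketched as weighted selection, where each endpoint receives a share of traffic equal to its weight divided by the sum of all weights. This is a simplified model written for illustration, not the behavior of either service in full; the Region names and weights are hypothetical.

```python
import random
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of traffic each endpoint receives for the given weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one endpoint needs a positive weight")
    return {endpoint: weight / total for endpoint, weight in weights.items()}


def route_request(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one endpoint at random, in proportion to its weight."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


weights = {"us-east-1": 3, "eu-west-1": 1}
print(traffic_shares(weights))

# Setting a weight to zero drains that endpoint, as in a manual failover.
drained = {**weights, "eu-west-1": 0}
rng = random.Random(0)
picks = {route_request(drained, rng) for _ in range(100)}
print(picks)
```

Setting one endpoint's weight to zero is the same lever used for a manually initiated failover: traffic shifts entirely to the remaining endpoints.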

Asynchronous data replication with this strategy enables near-zero RPO. AWS services like Amazon Aurora global database use dedicated infrastructure that leaves your databases entirely available to serve your application, and can replicate to up to five secondary Regions with typical latency of under a second. With active/passive strategies, writes occur only to the primary Region. The difference with active/active is designing how data consistency with writes to each active Region is handled. It is common to design user reads to be served from the Region closest to them, known as read local. With writes, you have several options:

A write global strategy routes all writes to a single Region. In case of failure of that Region, another Region would be promoted to accept writes. Aurora global database is a good fit for write global, as it supports synchronization with read-replicas across Regions, and you can promote one of the secondary Regions to take read/write responsibilities in less than one minute. Aurora also supports write forwarding, which lets secondary clusters in an Aurora global database forward SQL statements that perform write operations to the primary cluster.

A write local strategy routes writes to the closest Region (just like reads). Amazon DynamoDB global tables enable such a strategy, allowing reads and writes from every Region your global table is deployed to. Amazon DynamoDB global tables use a last writer wins reconciliation between concurrent updates.

A write partitioned strategy assigns writes to a specific Region based on a partition key (like user ID) to avoid write conflicts. Amazon S3 replication configured bi-directionally can be used for this case, and currently supports replication between two Regions. When implementing this approach, make sure to enable replica modification sync on both buckets A and B to replicate replica metadata changes like object access control lists (ACLs), object tags, or object locks on the replicated objects. You can also configure whether or not to replicate delete markers between buckets in your active Regions. In addition to replication, your strategy must also include point-in-time backups to protect against data corruption or destruction events.
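The three write strategies above can be sketched as routing functions, together with a last writer wins merge of the kind a write local design relies on. This is an illustrative model only: the Region names, the hash-based partitioning, and the tie-break on Region name are assumptions, not how any AWS service implements them.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence

REGIONS = ("us-east-1", "eu-west-1")  # hypothetical active Regions


def write_global(primary_region: str) -> str:
    """Write global: every write goes to the single writer Region."""
    return primary_region


def write_local(client_region: str) -> str:
    """Write local: the write goes to the Region closest to the client."""
    return client_region


def write_partitioned(partition_key: str, regions: Sequence[str] = REGIONS) -> str:
    """Write partitioned: a stable hash of the key (such as a user ID) picks the Region."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return regions[int.from_bytes(digest[:8], "big") % len(regions)]


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float
    region: str


def last_writer_wins(a: Version, b: Version) -> Version:
    """Keep the version with the later timestamp; break ties on Region name
    so that every replica picks the same winner (an assumption of this sketch)."""
    return max(a, b, key=lambda v: (v.timestamp, v.region))


east = Version("shipped", timestamp=100.0, region="us-east-1")
west = Version("cancelled", timestamp=100.5, region="eu-west-1")
winner = last_writer_wins(east, west)
print(winner.value)  # cancelled
```

Note the trade-off the merge makes visible: the earlier concurrent write is silently discarded, which is why write partitioned designs avoid conflicts by giving each key a single home Region. With the simple modulo used here, changing the Region list would reassign keys.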

AWS CloudFormation is a powerful tool to enforce consistently deployed infrastructure among AWS accounts in multiple AWS Regions. AWS CloudFormation StackSets extends this functionality by enabling you to create, update, or delete CloudFormation stacks across multiple accounts and Regions with a single operation. Although AWS CloudFormation uses YAML or JSON to define Infrastructure as Code, AWS Cloud Development Kit (AWS CDK) allows you to define Infrastructure as Code using familiar programming languages. Your code is converted to CloudFormation which is then used to deploy resources in AWS.


TechRepublic


How to Get the Most Out of Your Cloud Disaster Recovery Plan


Image of Scott Matteson

On the surface, it would seem cloud computing was made for disaster recovery, a “set it and forget it” concept due to the breadth and robust features of cloud resources.

However, the concept isn’t cut and dried. While redundancy and data protection are the core elements of maintaining uptime and recovering from disasters, it’s important to focus on the individual trees in the forest for the best cloud operational results.

Amitabh Sinha, co-founder and CEO of Workspot; Ofer Maor, co-founder and chief technology officer at Mitiga; and Or Aspir, cloud security research team leader at Mitiga, shared advice on cloud disaster recovery best practices with TechRepublic.

No. 1 challenge: Maintaining uptime in cloud environments

Alleviating cloud challenges, how disaster recovery factors in.

Amitabh Sinha: The number one challenge is the level of availability the cloud provides. Today, the major public clouds — AWS, Google and Azure — offer 99.9% availability, which means more than eight hours a year of downtime, a number that significantly hinders operations for most mission-critical workloads and can cost organizations millions of dollars in lost productivity.
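The downtime figure quoted above follows from simple arithmetic, shown here as a short sketch. The function name is illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability level."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for level in (99.9, 99.99):
    print(level, round(annual_downtime_hours(level), 2))
```

At 99.9% the result is about 8.76 hours per year, which matches the "more than eight hours" in the text. Each additional nine cuts the allowance by a factor of ten.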

The second major challenge is about cloud capacity. An organization might try to optimize cloud costs by shutting down some of their virtual machines when not in use, but what happens when you need to bring them back up? Even if the cloud is available, there may not be capacity in that cloud region or cloud to accommodate bringing those machines back up again, and that has another chilling effect on productivity.

In a disaster recovery scenario, capacity constraints are an even greater risk if you can’t get the capacity you need to get your business back up and running.


Ofer Maor: The notion of the cloud and its shared responsibility model is that the responsibility for maintenance and availability of the environment lies on the cloud vendor. The reality is more complex.

The cloud vendor does not commit to 100% availability, only close to it, and while most of the time the environments are up, we have seen multiple outages in various cloud vendors over the last couple of years.

Furthermore, other aspects of availability revolve around the specific applications and utilization of resources, which are already the responsibility of the user and not the cloud vendor.

Finally, as attacks are moving to the cloud, security breaches can often lead to disruption of service through various means, from DoS to abuse of resources and ransomware attacks.

Or Aspir: Moving to the cloud requires organizations to acquire new skills, adapt existing processes and familiarize themselves with the intricacies of cloud infrastructure and services. This learning curve can slow down deployment, configuration and troubleshooting processes, potentially impacting uptime as teams navigate the complexities of cloud technologies.

Despite the availability of multi-zone or multi-region redundancies provided by cloud providers, many companies opt for centralized regions/zones due to compliance and cost considerations. However, this centralized approach makes them susceptible to power outages, network disruptions and physical damage within a specific zone, posing risks to their uptime and service availability.

Amitabh Sinha: Particularly for end-user computing (EUC), a multi-cloud and multi-region approach is critical. Running EUC workloads across cloud regions and across major clouds can drastically reduce the amount of downtime businesses experience.

Information technology leaders should expect capabilities that enable automatic failover, for example, from a primary virtual desktop to a secondary desktop — whether the secondary desktop is in another cloud region or an alternative cloud — in a way that is completely transparent to the end user. This always-available virtual desktop is now a reality. Virtual desktop deployment should be spread across multiple regions and clouds to ensure uptime.

Or Aspir: Effective monitoring and incident response mechanisms are essential for identifying and addressing issues promptly. Use proactive planning to understand your company’s recovery time objective (RTO) and recovery point objective (RPO).

Explore cloud providers’ offerings for ensuring uptime and implementing effective disaster recovery strategies. One good example is the AWS disaster recovery blog posts.

Amitabh Sinha: RTO is the metric everyone considers in a DR context. How long will it take you to get your business back up and running after a disruption? In the legacy, on-premises data center world, RTO was typically measured in days — with potentially catastrophic consequences for the business.

Consider the two dimensions we talked about earlier: cloud availability and cloud capacity. In a DR context, as well as in a day-to-day operational context, the organization must have the agility to recover from a business disruption, whether a cloud outage, a weather event, or a ransomware attack, in a few minutes. An RTO of days is no longer acceptable. Instead, the multi-cloud approach anticipates the cloud availability and cloud capacity constraints and solves them proactively.

Ofer Maor: Disaster recovery is a crucial aspect of this. While some uptime issues may be a result of a timed event, such as an outage of a CSP region (in which case not much DR is needed, as it will come back on its own), other cases may include the destruction of cloud environments and, in more extreme cases, of the data itself, requiring disaster recovery measures to take place.

Naturally, backups are a crucial piece of the puzzle that must be done by the cloud (and SaaS) customers as they cannot rely on the cloud vendor to do them (at least in most shared responsibility models). One of the areas where most organizations are still lagging behind is SaaS backup and recovery, but if an organization is breached and their entire SharePoint or Google Drive is held ransom by an attacker, the vendor may not be able to help.

How cloud disaster recovery compares to on-premises

Amitabh Sinha: With on-prem, it can take days or weeks to be back up and running again; it is a costly endeavor and very time-consuming for teams. In a cloud DR scenario companies can be up and running in minutes if they have chosen the right solutions.

How weather events factor in and related recommendations

Or Aspir: Severe weather conditions like hurricanes, floods, or storms can disrupt data centers within a specific availability zone in the cloud. These disruptions can cause power outages, network disruptions or physical damage, resulting in service interruptions and affecting the availability of cloud resources within that zone. An example of such a case is the outage of multiple Google Cloud services in Europe on April 25, 2023. This outage occurred due to a combination of a flood and fire incident.

Our recommendations are to verify cloud services’ availability zone redundancy for resilience against severe weather conditions.

How do more eyes on the end user decrease the costly downtime of outages?

Amitabh Sinha: Getting real-time visibility into the end user is crucial to mitigate any downtime. End-user observability allows IT teams to understand the problems users are having. By leveraging that data, teams can understand the level of the problem, from trouble accessing a single desktop or app to the performance of those resources.

They can figure out if there is a more significant problem, such as a trend with a specific location, if it is impacting only a subset of end-users or if it has the potential to become a widespread issue. They can determine if it is a network issue or if a pattern is emerging in terms of cloud availability and access that could affect productivity and then they can take action in real time to resolve the problem.

In data center environments, IT teams only have control and visibility inside that data center itself. These legacy systems do not have the levels of end-user visibility that cloud environments do. By running cloud end-user observability tools IT teams can take real-time action to quickly identify and resolve any existing issues.

What else do you recommend IT professionals focus on here?

Amitabh Sinha: Create direct, in-product end-user feedback mechanisms for all end user applications (e.g., surveys at the end of a Teams or Zoom session).

Leverage workload-specific cloud-native observability tools, like Datadog for server workloads, and Workspot and ControlUp for end-user computing workloads.

Define people and processes to act on insights derived from the observability tools so problems are rapidly solved.

Or Aspir: Expanding the focus beyond natural disasters or malfunctions is crucial to address the potential impact of security incidents on disaster recovery. It is important to understand that under the shared-responsibility model, customers are responsible for the security of using their own cloud or SaaS instance, and any breach resulting from a misconfiguration or a compromised user is their responsibility. They will therefore be responsible for dealing with the repercussions of such an event.

This includes scenarios where compromised identities possess permissions not only on production systems but also on backup systems. By recognizing and preparing for such security-related disasters, organizations can enhance their overall disaster recovery strategies and mitigate the risks associated with unauthorized access and compromised identities.

Having a robust incident response plan, which may include collaboration with third-party entities, can significantly aid in addressing disaster recovery in the event of security incidents.



Disaster Recovery In Cloud Computing: All You Need To Know

Data is the most valuable asset of modern-day organizations. Its loss can result in irreversible damage to your business, including the loss of productivity, revenue, reputation, and even customers. It is hard to predict when a disaster will occur and how serious its impact will be. However, what you can control is the way you respond to a disaster and how successfully your organization will recover from it. Read this post to discover how you can use disaster recovery in cloud computing for your benefit.


Backup and Disaster Recovery in Cloud Computing

Cloud computing is the on-demand delivery of computing services over the internet (more often referred to as ‘the cloud’) which operates on a pay-as-you-go basis. Cloud computing vendors generally provide access to the following services:

  • Infrastructure as a service (IaaS) allows you to rent IT infrastructure, including servers, storage, and network components, from the cloud vendor.
  • Platform as a service (PaaS) allows you to rent a computing platform from the cloud provider for developing, testing, and configuring software applications.
  • Software as a service (SaaS) allows you to access software applications which are hosted on the cloud.

As you can see, each cloud computing service is designed to help you meet different business needs. Moreover, cloud computing can considerably improve the data security and high availability of your virtualized workloads. Let’s discuss how you can approach disaster recovery in the cloud computing environment.

Cloud disaster recovery vs. traditional disaster recovery

Cloud disaster recovery is a cloud computing service which allows for storing and recovering system data on a remote cloud-based platform. To better understand what disaster recovery in cloud computing entails, let’s compare it to traditional disaster recovery.

The essential element of traditional disaster recovery is a secondary data center, which can store all redundant copies of critical data, and to which you can fail over production workloads. A traditional on-premises DR site generally includes the following:

  • A dedicated facility for housing the IT infrastructure, including maintenance employees and computing equipment.
  • Sufficient server capacity to ensure a high level of operational performance and allow the data center to scale up or scale out depending on your business needs.
  • Internet connectivity with sufficient bandwidth to enable remote access to the secondary data center.
  • Network infrastructure, including firewalls, routers, and switches, to ensure a reliable connection between the primary and secondary data centers, as well as provide data availability.

However, traditional disaster recovery can often be too complex to manage and monitor. Moreover, support and maintenance of a physical DR site can be extremely expensive and time-consuming. When working with an on-premises data center, you can expand your server capacity only by purchasing additional computing equipment, which can require a lot of money, time, and effort.

Disaster Recovery in Cloud Computing

Disaster recovery in cloud computing can effectively deal with most issues of traditional disaster recovery. The benefits include the following:

  • You don’t need to build a secondary physical site or buy additional hardware and software to support critical operations. With disaster recovery in cloud computing, you get access to cloud storage, which can be used as a secondary DR site.
  • Depending on your current business demands, you can easily scale up or down by adding or removing cloud computing resources.
  • With its affordable pay-as-you-go pricing model, you are required to pay only for the cloud computing services you actually use.
  • Disaster recovery in cloud computing can be performed in a matter of minutes from anywhere. The only thing you need is a device that is connected to the internet.
  • You can store your backed up data across multiple geographical locations, thus eliminating a single point of failure. You can always have a backup copy, even if one of the cloud data centers fails.
  • State-of-the-art network infrastructure ensures that any issues or errors can be quickly identified and taken care of by a cloud provider. Moreover, the cloud provider ensures 24/7 support and maintenance of your cloud storage, including hardware and software upgrades.

Why Choose Disaster Recovery in Cloud Computing

The primary goal of disaster recovery is to minimize the overall impact of a disaster on business performance. Disaster recovery in cloud computing can do just that. In case of disaster, critical workloads can be failed over to a DR site in order to resume business operations. As soon as your production data center gets restored, you can fail back from the cloud and restore your infrastructure and its components to their original state. As a result, business downtime is reduced and service disruption is minimized.

Backup and Disaster Recovery Planning in Cloud Computing

Due to its cost-efficiency, scalability, and reliability, disaster recovery in cloud computing has become the most attractive option for small and medium-sized businesses (SMBs). Generally, SMBs don’t have a sufficient budget or resources to build and maintain their own DR site. Cloud providers offer you access to cloud storage, which can become a cost-effective and long-lasting solution to data protection as well as disaster recovery.

How to Design a Cloud-Based Disaster Recovery Plan

After considering the benefits of cloud computing in disaster recovery, it is time to design a comprehensive DR plan. In fact, you can read one of our blog posts, which walks you through the entire process of creating a DR plan. Below, we are going to discuss how to create a DR plan which works in the cloud environment.

As a rule, an effective cloud-based DR plan should include the following steps:

  • Perform a risk assessment and business impact analysis.
  • Choose prevention, preparedness, response, and recovery measures.
  • Test and update your cloud-based DR plan.

Let’s discuss how disaster recovery planning works in cloud computing.

Perform a risk assessment and business impact analysis

The first step in disaster recovery planning in cloud computing is to assess your current IT infrastructure, as well as identify potential threats and risk factors that your organization is most exposed to.

A risk assessment helps you discover vulnerabilities of your IT infrastructure and identify which business functions and components are most critical. At the same time, a business impact analysis allows you to estimate how unexpected service disruption might affect your business.

Based on these estimations, you can also calculate the financial and non-financial costs associated with a DR event and set two key objectives: the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). The RTO is the maximum amount of time that IT infrastructure can be down before any serious damage is done to your business. The RPO is the maximum amount of data which can be lost as a result of service disruption, usually expressed as a period of time. Understanding the RTO and RPO can help you decide which data and applications to protect, how many resources to invest in achieving DR objectives, and which DR strategies to implement in your cloud-based DR plan.
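As a minimal sketch of how these two objectives can be checked against a backup schedule, the code below treats the RPO as a time window, which is a common convention. The worst-case estimate (backup interval plus backup duration) is a simplifying assumption, and all names and figures are illustrative.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Approximate the oldest point you might have to restore to.

    Assumption: a failure just before a backup completes forces a restore from
    the previous backup, so the loss window is one interval plus one backup run.
    """
    return backup_interval + backup_duration


def check_objectives(backup_interval, backup_duration, measured_recovery, rpo, rto):
    """Compare the current schedule and a measured recovery time with the RPO and RTO."""
    loss = worst_case_data_loss(backup_interval, backup_duration)
    return {
        "worst_case_loss": loss,
        "meets_rpo": loss <= rpo,
        "meets_rto": measured_recovery <= rto,
    }


report = check_objectives(
    backup_interval=timedelta(hours=24),
    backup_duration=timedelta(hours=1),
    measured_recovery=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(report)
```

In this example a nightly backup cannot meet a four-hour RPO even though the three-hour recovery is within the RTO, which suggests more frequent backups or continuous replication.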

Implement prevention, preparedness, response, and recovery measures

The next step is to decide which prevention, preparedness, response, and recovery (PPRR) measures should be implemented in disaster recovery of the cloud computing environment. In a nutshell, PPRR measures can accomplish the following:

  • Prevention allows you to reduce possible threats and eliminate system vulnerabilities in order to prevent a disaster from occurring in the first place.
  • Preparedness entails creating the outline of a DR plan which states what to do during an actual DR event. Remember to document every step of the process to ensure that the DR plan is properly executed during a disaster.
  • Response describes which DR strategies should be implemented when a disaster strikes in order to address an incident and mitigate its impact.
  • Recovery determines what should be done to successfully recover your infrastructure in case of a disaster and how to minimize the damage.

After you have determined which approach to disaster recovery to implement, you should choose a data protection solution capable of putting your DR plan into action and achieving DR objectives. Choose the solution which meets your business needs and complies with your infrastructure requirements. For this purpose, consider the following criteria:

  • Available services
  • Hardware capacity
  • Data security
  • Ease of use
  • Service scalability

Test and update your cloud-based DR plan

After you have created and documented the DR plan, you should run regular tests to see if your plan actually works. You can test whether business-critical data and applications can be recovered within the expected time frame.

Testing a cloud-based DR plan can help you identify any issues and inconsistencies in your current approach to disaster recovery in cloud computing. After the test run, you can decide what your DR plan lacks and how it should be updated in order to achieve the required results and eliminate existing issues.
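The outcome of a test run can be summarized with a small script. The following Python sketch compares measured restore times from a drill with each application's RTO; the `DrillResult` type, the application names, and the timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrillResult:
    app: str                        # hypothetical application name
    rto_minutes: int                # target agreed in the DR plan
    measured_restore_minutes: int   # time observed during the test run


def failed_objectives(results: List[DrillResult]) -> List[str]:
    """Return the applications whose measured restore time exceeded the RTO."""
    return [r.app for r in results if r.measured_restore_minutes > r.rto_minutes]


# Example drill data, invented for illustration.
drill = [
    DrillResult("orders-db", rto_minutes=60, measured_restore_minutes=45),
    DrillResult("web-frontend", rto_minutes=30, measured_restore_minutes=50),
    DrillResult("reporting", rto_minutes=480, measured_restore_minutes=200),
]
print(failed_objectives(drill))  # ['web-frontend']
```

Any application that appears in the output points to a part of the plan that needs updating before the next test.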

Disaster recovery (DR) consists of IT technologies and best practices designed to prevent or minimize data loss and business disruption resulting from catastrophic events—everything from equipment failures and localized power outages to cyberattacks, civil emergencies, criminal or military attacks, and natural disasters.

Many businesses—especially small- and mid-sized organizations—neglect to develop a reliable, practicable disaster recovery plan. Without such a plan, they have little protection from the impact of significantly disruptive events.

Infrastructure failure can cost as much as USD 100,000 per hour, and critical application failure costs can range from USD 500,000 to USD 1 million per hour. Many businesses cannot recover from such losses. More than 40% of small businesses will not re-open after experiencing a disaster, and among those that do, an additional 25% will fail within the first year after the crisis. Disaster recovery planning can dramatically reduce these risks.

Disaster recovery planning involves strategizing, planning, deploying appropriate technology, and continuous testing. Maintaining backups of your data is a critical component of disaster recovery planning, but a backup and recovery process alone does not constitute a full disaster recovery plan.

Disaster recovery also involves ensuring that adequate storage and compute is available to maintain robust failover and failback procedures. Failover is the process of offloading workloads to backup systems so that production processes and end-user experiences are disrupted as little as possible. Failback involves switching back to the original primary systems.

Read our article to learn more about the important distinction between backup and disaster recovery planning.

Business continuity planning creates systems and processes to ensure that all areas of your enterprise will be able to maintain essential operations or be able to resume them as quickly as possible in the event of a crisis or emergency. Disaster recovery planning is the subset of business continuity planning that focuses on recovering IT infrastructure and systems.

Business impact analysis

The creation of a comprehensive disaster recovery plan begins with business impact analysis. When performing this analysis, you’ll create a series of detailed disaster scenarios that can then be used to predict the size and scope of the losses you’d incur if certain business processes were disrupted. What if your customer service call center was destroyed by fire, for instance? Or an earthquake struck your headquarters?

This will allow you to identify the areas and functions of the business that are the most critical and enable you to determine how much downtime each of these critical functions could tolerate. With this information in hand, you can begin to create a plan for how the most critical operations could be maintained in various scenarios.

IT disaster recovery planning should follow from and support business continuity planning. If, for instance, your business continuity plan calls for customer service representatives to work from home in the aftermath of a call center fire, what types of hardware, software, and IT resources would need to be available to support that plan?

Risk analysis

Assessing the likelihood and potential consequences of the risks your business faces is also an essential component of disaster recovery planning. As cyberattacks and ransomware become more prevalent, it’s critical to understand the general cybersecurity risks that all enterprises confront today as well as the risks that are specific to your industry and geographical location.

For a variety of scenarios, including natural disasters, equipment failure, insider threats, sabotage, and employee errors, you’ll want to evaluate your risks and consider the overall impact on your business. Ask yourself the following questions:

  • What financial losses due to missed sales opportunities or disruptions to revenue-generating activities would you incur?
  • What kinds of damage would your brand’s reputation undergo? How would customer satisfaction be impacted?
  • How would employee productivity be impacted? How many labor hours might be lost?
  • What risks might the incident pose to human health or safety?
  • Would progress towards any business initiatives or goals be impacted? How?

Prioritizing applications

Not all workloads are equally critical to your business’s ability to maintain operations, and downtime is far more tolerable for some applications than it is for others. Separate your systems and applications into three tiers, depending on how long you could stand to have them be down and how serious the consequences of data loss would be.

  • Mission-critical:  Applications whose functioning is essential to your business’s survival.
  • Important:  Applications for which you could tolerate relatively short periods of downtime.
  • Non-essential:  Applications you could temporarily replace with manual processes or do without.
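The three tiers above can be captured in a simple data structure. In this Python sketch, the threshold values and the rule of classifying by the stricter of the two tolerances (downtime and data loss) are assumptions chosen for illustration; every business will set its own cut-offs.

```python
from enum import Enum


class Tier(Enum):
    MISSION_CRITICAL = "mission-critical"
    IMPORTANT = "important"
    NON_ESSENTIAL = "non-essential"


# Hypothetical cut-offs, in hours; adjust to your own business impact analysis.
CRITICAL_MAX_HOURS = 1.0
IMPORTANT_MAX_HOURS = 24.0


def classify(tolerable_downtime_hours: float, tolerable_data_loss_hours: float) -> Tier:
    """Assign a tier based on the stricter of the two tolerances."""
    strictest = min(tolerable_downtime_hours, tolerable_data_loss_hours)
    if strictest <= CRITICAL_MAX_HOURS:
        return Tier.MISSION_CRITICAL
    if strictest <= IMPORTANT_MAX_HOURS:
        return Tier.IMPORTANT
    return Tier.NON_ESSENTIAL


# Example inventory with invented names and tolerances.
inventory = {
    "payment-gateway": (0.5, 0.1),
    "internal-wiki": (72.0, 168.0),
    "crm": (8.0, 12.0),
}
for app, (downtime, data_loss) in inventory.items():
    print(app, classify(downtime, data_loss).value)
```

Using the stricter tolerance means an application that can be down for days but cannot lose recent data is still treated as mission-critical for backup purposes.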

Documenting dependencies

The next step in disaster recovery planning is creating a complete inventory of your hardware and software assets. It’s essential to understand critical application interdependencies at this stage. If one software application goes down, which others will be affected?

Designing resiliency, and disaster recovery models, into systems as they are initially built is the best way to manage application interdependencies. It’s all too common in today’s microservices-based architectures to discover processes that can’t be initiated when other systems or processes are down, and vice versa. This is a challenging situation to recover from, and it’s vital to uncover such problems while you still have time to develop alternate plans for your systems and processes, before an actual disaster strikes.
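One way to reason about interdependencies is to record them as a graph. The Python sketch below uses the standard library's `graphlib` module (Python 3.9 or later) to derive a safe restart order and to detect circular dependencies, and a small helper to answer the question "if this goes down, what else is affected?". The service names and the dependency map are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical inventory: each application maps to the set of services it needs.
depends_on: Dict[str, Set[str]] = {
    "web-frontend": {"orders-api", "auth-service"},
    "orders-api": {"orders-db", "auth-service"},
    "auth-service": {"user-db"},
    "orders-db": set(),
    "user-db": set(),
}


def recovery_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a restart order in which every dependency comes up first.

    Raises graphlib.CycleError if two services each require the other,
    which is exactly the situation that is hard to recover from.
    """
    return list(TopologicalSorter(deps).static_order())


def affected_by(failed: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Return every application that directly or transitively needs `failed`."""
    affected: Set[str] = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for app, needs in deps.items():
            if current in needs and app not in affected:
                affected.add(app)
                frontier.append(app)
    return affected


print(recovery_order(depends_on))
print(sorted(affected_by("user-db", depends_on)))

try:
    recovery_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("circular dependency found:", err.args[1])
```

Running a check like this against the asset inventory during planning surfaces circular dependencies while there is still time to redesign them.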

Establishing recovery time objectives, recovery point objectives, and recovery consistency objectives

By considering your risk and business impact analyses, you should be able to establish objectives for how quickly you’d need to bring systems back up, how much data you could stand to lose, and how much data corruption or deviation you could tolerate.

Your recovery time objective (RTO) is the maximum amount of time it should take to restore application or system functioning following a service disruption.

Your recovery point objective (RPO) is the maximum age of the data that must be recovered in order for your business to resume regular operations. For some businesses, losing even a few minutes’ worth of data can be catastrophic, while those in other industries may be able to tolerate longer windows.

A recovery consistency objective (RCO) is established in the service-level agreement (SLA) for continuous data protection services. It is a metric that indicates how many inconsistent entries in business data from recovered processes or systems are tolerable in disaster recovery situations, describing business data integrity across complex application environments.

Regulatory compliance issues

All disaster recovery software and solutions that your enterprise has established must satisfy any data protection and security requirements that you’re mandated to adhere to. This means that all data backup and failover systems must be designed to meet the same standards for ensuring data confidentiality and integrity as your primary systems.

At the same time, several regulatory standards stipulate that all businesses must maintain disaster recovery and/or business continuity plans. The Sarbanes-Oxley Act (SOX), for instance, requires all publicly held firms in the U.S. to maintain copies of all business records for a minimum of five years. Failure to comply with this regulation (including neglecting to establish and test appropriate data backup systems) can result in significant financial penalties for companies and even jail time for their leaders.

Choosing technologies

Backups serve as the foundation upon which any solid disaster recovery plan is built. In the past, most enterprises relied on tape and spinning disks (HDD) for backups, maintaining multiple copies of their data and storing at least one at an offsite location.

In today’s always-on digitally transforming world, tape backups in offsite repositories often cannot achieve the RTOs necessary to maintain business-critical operations. Architecting your own disaster recovery solution involves replicating many of the capabilities of your production environment and will require you to incur costs for support staff, administration, facilities, and infrastructure. For this reason, many organizations are turning to cloud-based backup solutions or full-scale Disaster-Recovery-as-a-Service (DRaaS) providers.

Choosing recovery site locations

Building your own disaster recovery data center involves balancing several competing objectives. On the one hand, a copy of your data should be stored somewhere that’s geographically distant enough from your headquarters or office locations that it won’t be affected by the same seismic events, environmental threats, or other hazards as your main site. On the other hand, backups stored offsite always take longer to restore from than those located on-premises at the primary site, and network latency can be even greater across longer distances.

Continuous testing and review

Simply put, if your disaster recovery plan has not been tested, it cannot be relied upon. All employees with relevant responsibilities should participate in the disaster recovery test exercise, which may include maintaining operations from the failover site for a period of time.

If performing comprehensive disaster recovery testing is outside your budget or capabilities, you can also schedule a “tabletop exercise” walkthrough of the test procedures, though you should be aware that this kind of testing is less likely to reveal anomalies or weaknesses in your DR procedures—especially the presence of previously undiscovered application interdependencies—than a full test.

As your hardware and software assets change over time, you’ll want to be sure that your disaster recovery plan gets updated as well. Review and revise the plan on a regular schedule.

The IBM Knowledge Center provides an example of a disaster recovery plan.

Disaster-Recovery-as-a-Service (DRaaS) is one of the most popular and fast-growing managed IT service offerings available today. Your vendor will document RTOs and RPOs in a service-level agreement (SLA) that outlines your downtime limits and application recovery expectations.

DRaaS vendors typically provide cloud-based failover environments. This model offers significant cost savings compared with maintaining redundant dedicated hardware resources in your own data center. Contracts are available in which you pay a fee for maintaining failover capabilities plus the per-use costs of the resources consumed in a disaster recovery situation. Your vendor will typically assume all responsibility for configuring and maintaining the failover environment.

Disaster recovery service offerings differ from vendor to vendor. Some vendors define their offering as a comprehensive, all-in-one solution, while others offer piecemeal services ranging from single application restoration to full data center replication in the cloud. Some offerings may include disaster recovery planning or testing services, while others will charge an additional consulting fee for these offerings.

Be sure that any enterprise software applications you rely on are supported, as are any public cloud providers that you’re working with. You’ll also want to ensure that application performance is satisfactory in the failover environment, and that the failover and failback procedures have been well tested.

If you have already built an on-premises disaster recovery (DR) solution, it can be challenging to evaluate the costs and benefits of maintaining it versus moving to a monthly DRaaS subscription instead.

Most on-premises DR solutions will incur costs for hardware, power, labor for maintenance and administration, software, and network connectivity. In addition to the upfront capital expenditures involved in the initial setup of your DR environment, you’ll need to budget for regular software upgrades. Because your DR solution must remain compatible with your primary production environment, you’ll want to ensure that your DR solution has the same software versions. Depending upon the specifics of your licensing agreement, this might effectively double your software costs.

Not only can moving to a DRaaS subscription reduce your hardware and software expenditures, it can lower your labor costs by moving the burden of maintaining the failover site to the vendor.

If you’re considering third-party DRaaS solutions, you’ll want to make sure that the vendor has the capacity for cross-regional multi-site backups. If a significant weather event like a hurricane impacted your primary office location, would the failover site be far enough away to remain unaffected by the storm? Also, would the vendor have adequate capacity to meet the combined needs of all its customers in your area if many were impacted at the same time? You’re trusting your DRaaS vendor to meet RTOs and RPOs in times of crisis, so look for a service provider with a strong reputation for reliability.

Read “Disaster Recovery as a Service (DRaaS) vs. Disaster Recovery (DR): Which Do You Need?” for a comparative overview of both solutions.

Disaster recovery solutions based in the IBM Cloud are resilient and reliable. You can provision a failover site in any of the more than 60 data centers located in six regions and in 18 global availability zones for low latency and in order to meet geographically-specific business requirements.

Cloud Disaster Recovery: Planning and Approaches

For many businesses today, the cloud is an essential component of disaster recovery planning. But integrating cloud-based resources into your disaster recovery strategy can be challenging, because there are so many approaches you can take. There are multiple ways to benefit from cloud backup and disaster recovery, as well as multiple strategies for integrating the cloud into such a plan.

Read on for tips on getting the most out of the cloud as part of your disaster recovery strategy. This article explains how cloud-based environments can support disaster recovery workflows, as well as how to build a plan for disaster recovery in the cloud tailored to your needs.

Cloud Disaster Recovery

Cloud disaster recovery is the process of restoring data and infrastructure to a working state using cloud-based resources. There are three basic ways to use the cloud as part of disaster recovery.

Approach 1. Disaster Recovery from the Cloud

This approach to cloud-based disaster recovery allows you to take advantage of the low-cost data storage options available from cloud vendors. It also simplifies your backup routines by allowing you to store backup data from all of your systems in a single location within the cloud.

The downside of disaster recovery from the cloud, however, is that it doesn’t work well if your on-premises IT systems are unavailable when you need to perform disaster recovery. If your local servers and storage media were disrupted by whichever event caused the data loss (which is likely if, for example, a fire or flood impacted your local data center), you won’t have any infrastructure to which you can recover data.

Approach 2. Disaster Recovery to the Cloud

This approach eliminates the need for physical on-site infrastructure to remain available following a disaster. Instead, you can quickly recover data to virtual environments running in the cloud.

The major risk, however, is that, if you store data backups on-premises, your backups may be destroyed if a disaster impacts your local environment.

Approach 3. Cloud-to-cloud Disaster Recovery

Under this approach, you would spin up virtual machines and databases in the cloud, then populate them with data from your cloud-based backups in the event of a disaster that impacts on-premises resources.

In addition to separating both your backup data and backup infrastructure from your local data center, this strategy may speed disaster recovery, because it will typically take less time to transfer backup data from cloud storage to cloud VMs and databases than it would to move data between the cloud and an on-premises environment, or vice versa. That’s because networks within the same cloud offer much more bandwidth than the public Internet that connects a cloud to external environments.

The downside of disaster recovery in the cloud is that it is likely to cost the most, because it requires you to maintain both backup storage and backup infrastructure in the cloud.

Configuring Cloud-based Disaster Recovery

That said, you can optimize the costs of cloud-based disaster recovery by choosing the disaster recovery configuration that is best suited to your needs and budget. There are four basic options to choose from, each of which caters to different RTO and RPO requirements.

Simple Backup and Recovery

The most straightforward cloud disaster recovery configuration is to back up data from on-premises to the cloud, then recover it from the cloud when needed. This approach will cost the least and is the simplest to administer. The major limitation to consider, however, is whether you’ll be able to move and recover data quickly enough from the cloud to your on-premises environment to meet RTO and RPO needs.

Pilot Light

A “pilot light” disaster recovery configuration involves setting up backup infrastructure (in other words, VMs, databases, and any other resources you require) within the cloud, but leaving it turned off until you need it. Because most cloud providers don’t bill compute charges for instances that are stopped (although the storage attached to them is typically still billed), this approach allows you to keep costs low while still giving you a ready-to-go, cloud-based infrastructure that you can use for disaster recovery when you need it.

Warm Standby

If you have more budget, you can use a warm standby configuration. In warm standby, you set up a backup environment in the cloud and actually keep it running at all times. That way, you don’t need to waste time starting it up. Warm standby will therefore help you meet tighter RPO and RTO needs.

Multi-Site

If you have truly strict RTO and RPO requirements, you may opt for a multi-site cloud disaster recovery configuration. Under this approach, you keep live copies of backup infrastructure in multiple cloud availability zones at all times. By keeping redundant copies of the infrastructure running, you ensure your ability to perform disaster recovery even if one availability zone happens to go down when the disaster occurs. Of course, this configuration also costs the most.
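The trade-off between these four configurations can be sketched as a simple selection problem: pick the cheapest option that still meets your RTO within your budget. In the Python example below, the monthly costs and expected recovery times are invented placeholder numbers, not vendor pricing or benchmarks; only their relative ordering follows the descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DrConfig:
    name: str
    monthly_cost_usd: float       # placeholder figure, not real pricing
    expected_rto_minutes: float   # placeholder figure, not a benchmark


# Hypothetical numbers that only preserve the ordering described in the text:
# cost rises and recovery time falls from simple backup to multi-site.
CONFIGS: List[DrConfig] = [
    DrConfig("simple backup and recovery", 200, 1440),
    DrConfig("pilot light", 600, 240),
    DrConfig("warm standby", 2500, 30),
    DrConfig("multi-site", 6000, 5),
]


def cheapest_config(required_rto_minutes: float, budget_usd: float) -> Optional[DrConfig]:
    """Return the cheapest configuration that meets the RTO within budget, if any."""
    candidates = [
        c for c in CONFIGS
        if c.expected_rto_minutes <= required_rto_minutes
        and c.monthly_cost_usd <= budget_usd
    ]
    return min(candidates, key=lambda c: c.monthly_cost_usd, default=None)


choice = cheapest_config(required_rto_minutes=60, budget_usd=3000)
print(choice.name if choice else "no configuration fits; relax the RTO or raise the budget")
```

When the function returns `None`, the RTO requirement and the budget conflict, and one of them has to change.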

Cloud Disaster Recovery Planning

When choosing the best plan for disaster recovery in the cloud, you should take several factors into account:

  • RPO and RTO needs: How long can your business operate without normally functioning systems? The more quickly you need data restored, the more you will probably need to invest in cloud-based disaster recovery that can get your workloads up and running again quickly.
  • Cost: The more money you have to invest in cloud backup and disaster recovery, the more sophisticated your cloud disaster recovery plan can be. More financial resources will enable you to take advantage of the warm standby or multi-site disaster recovery configurations described above, which provide faster and more reliable recovery, albeit at a steeper price.
  • Administration: The more cloud environments and resources you have running as part of your disaster recovery plan, the more time you’ll have to spend managing them (not to mention keeping them secure). Think about how much time your team can afford to spend on administration as you create your cloud disaster recovery plan.

Building a Cloud Disaster Recovery Solution

By putting together all of the pieces described above, you can build a cloud disaster recovery solution that reflects your needs. There are three basic steps to follow here.

Step 1. Choose the Approach

Step 2. Choose the Cloud Vendor

Step 3. Choose the Cloud Backup and Disaster Recovery Solution

For this need, MSP360 Managed Backup offers unparalleled reliability and flexibility. Because MSP360 supports all of the major cloud platforms, it will work no matter which cloud vendor you choose. In addition, MSP360 provides a range of cloud backup and disaster recovery options, including not just file and folder backup but also image-based backup, which makes it easy to back up entire systems, then restore them quickly to cloud-based VM instances for disaster recovery.

What is Cloud Disaster Recovery? Building a Plan for Success

As more industries depend on IT infrastructure, a service outage can be more than an annoyance when critical business activities are disrupted.

A recent disaster recovery survey found that 79% of midsized businesses and 87% of large businesses had experienced one or more service outages in the past year. Even more shocking is that 27% of companies lost money due to service outages, and 31% of those reported losses in excess of $100,000.

What is Cloud Disaster Recovery?

Unlike traditional DR, cloud disaster recovery uses cloud-based technology to automate failover to the cloud, rapidly restore IT infrastructure, and prevent costly service outages.

This approach reduces or eliminates the capital expense of secondary-site data center real estate and hardware, along with the associated operating expenses. With cloud-based DR, companies pay only for the resources they are using, meaning you don’t pay for the compute resources needed for failover until you need them.

In the past, organizations with the goal of increasing their resiliency against disasters would bear the significant up-front cost of building a remote disaster recovery (DR) site. Today, cloud-based disaster recovery provides a high-performance alternative that is flexible and cost-effective – and businesses everywhere are taking notice.

In this blog, we explain all the basics of cloud disaster recovery. You will learn how cloud disaster recovery differs from the traditional model, why disaster recovery is so important to your business and how you can implement your own cloud disaster recovery plan with help from Faction and our partners.

How Does Cloud Disaster Recovery Differ from Traditional Disaster Recovery?

With cloud disaster recovery, organizations of all sizes can achieve rapid and cost-effective failover by combining tools from public cloud providers like AWS and third-party vendors like Faction.

A good example is the AWS Elastic Load Balancing service, which automatically distributes incoming application traffic across multiple availability zones according to pre-set rules. IT organizations can use this functionality to replace an on-premises load balancer and fail over on-premises applications to VMware virtual machines in the cloud. They can also deploy attached cloud storage, such as Faction Cloud Control Volumes, to quickly scale up storage availability for business-critical workloads.

Why Cloud Disaster Recovery is Crucial for Your Business

While 95% of business respondents in a recent survey currently have a disaster recovery plan in place, many are lacking coverage in key areas and 23% admit their disaster plan has never been tested.

Without a solid disaster recovery plan and solution in place, your business faces tremendous risk and uncertainty in the face of a cyberattack, data loss event, power outage or natural disaster.

Safeguard your business with a solid DR solution to:

I. Avoid Revenue Losses

According to Gartner, the average cost of network downtime can be pegged at $5,600 per minute. In practice, every business is impacted differently by service outages, so results may vary. Still, if your business depends on the availability of IT infrastructures to process customer transactions, you could be losing revenue for every minute your systems are down.
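As a rough worked example of that arithmetic, the Python sketch below multiplies outage length by a per-minute cost. The default rate is the Gartner average quoted above; the function name and the 90-minute outage are illustrative assumptions, and your own per-minute figure will differ.

```python
def downtime_cost(outage_minutes: float, cost_per_minute_usd: float = 5600.0) -> float:
    """Estimate the cost of an outage as minutes of downtime times cost per minute.

    The default rate is the average cited in the text; substitute a figure
    derived from your own revenue and transaction volume.
    """
    if outage_minutes < 0:
        raise ValueError("outage_minutes must not be negative")
    return outage_minutes * cost_per_minute_usd


print(downtime_cost(60))   # one hour at the cited average: 336000.0
print(downtime_cost(90))   # hypothetical 90-minute outage: 504000.0
```

At the cited average, a single hour of downtime costs USD 336,000, a useful point of comparison when weighing the recurring cost of a DR solution.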

II. Retain Customer Trust

Your customers are depending on you to provide a service. When your servers and applications are unavailable, that service is not being provided and your customers may take their business elsewhere. If you experience frequent downtime, you could earn a reputation for being unreliable that may adversely impact your market viability.

III. Ensure Business Survival

According to the Federal Emergency Management Agency (FEMA), 40% of small businesses never reopen after a natural disaster and an additional 25% close within a year. Without an adequate disaster recovery plan, a major data loss event could simply annihilate your business and leave you with no option but to close for good.

To avoid these negative outcomes and protect your business, you need a cloud disaster recovery plan that helps you restore your most critical services as soon as possible when disaster strikes.

Five Benefits of Having a DR Strategy in Place

You read about it all the time: outages happen and take companies down for minutes, hours, and sometimes days, or even shut them down completely. Although this kind of threat cannot be avoided 100% of the time, it is definitely something you can plan for.

Think of Disaster Recovery like you would an insurance policy: a smart (and necessary) move, although it may be an expense you aren’t thrilled about. It serves a great purpose, and if disaster ever does strike – you’ll be glad you decided to move forward with implementing DR.

Having a plan in place to handle disruptive incidents could be the difference between being offline for a brief amount of time, versus a detrimental length of time.

Disruptive incidents can be anything that puts an organization’s operations at risk, such as:

  • Loss of personnel
  • Loss of facilities
  • Loss of access to data or applications
  • Equipment failure
  • Natural disasters
  • Cyberattacks (ransomware)

So why should you consider implementing a DR strategy? Let’s review five benefits of having a DR strategy in place.

  • Mitigating risks such as snapshot or mirror-copy failure and corruption, human error, hardware failure, or poorly documented communication.
  • Minimizing downtime: keeping your business online!
  • Insuring against natural disasters through geographic diversity: storing copies of your data in several locations.
  • Withstanding human error or physical impact on a production environment.
  • Limiting the effect on your business and downstream customers.

What is an IT Disaster Recovery Plan?

Disaster recovery is a plan, together with a set of processes, policies, and tools, that enables organizations to restore business and IT operations following a disaster that disrupts business continuity.

Disasters that interrupt business continuity and cost organizations time and revenue include:

I. Natural Disasters

Natural disasters include hurricanes, tornadoes, floods, earthquakes, and any other natural phenomenon or weather pattern that could damage your data center and disrupt service.

II. Technological Disasters

Technological disasters include power outages and internet outages that impact service availability. Hardware failure, third-party vendor outages, and hardware destruction are also among the leading causes of disruptive service outages for businesses.

III. Man-Made Disasters

A system administrator mistakenly deletes a critical file or database, a disgruntled former employee sabotages your IT systems, or a cyber attacker disrupts your operations with a DDoS attack. Man-made disasters can be either intentional or accidental.

Disaster recovery emerged as a business process during the late 1970s, with IT managers recognizing the increasingly vital role of IT services within their organizations. Today, most business-critical activities and data depend on the availability of IT systems. When these systems experience outages, employees are unable to do their jobs, customers are unable to place orders, and trust is lost.

Disaster recovery traditionally requires additional expenses for real estate, staffing, hardware and software, networking equipment, testing, and maintenance. For businesses that cannot tolerate any downtime, the historically best-practice approach would be to construct a second data center in a separate geographical location. This would create geographical redundancy, such that a technological or natural disaster impacting one data center might not impact the other. These dual data centers would be run in parallel with real-time data mirroring and synchronization to ensure minimal or even zero data loss in the event of a service disruption.

How to Design a Cloud Disaster Recovery Plan

1. Conduct Business Impact Analysis (BIA)

The purpose of a business impact analysis is to assess the overall risk that your organization could face from a service outage or disaster event. Conduct a BIA by identifying all your applications and components, then grouping them into tiers based on their importance to your business. The higher the tier, the more quickly the applications must be restored. Organizations must strategically allocate their disaster recovery resources to ensure that the most critical resources are restored first and restored fast. At the same time, designating less important applications into higher tiers will incur unnecessary costs.
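The tiering step can be sketched in a few lines of Python. This is a minimal illustration only: the `App` fields, the dollar and user thresholds, and the three-tier scale are hypothetical assumptions, not values from any standard, and a real BIA would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    revenue_loss_per_hour: float  # hypothetical estimate, in dollars
    users_affected: int           # hypothetical estimate


def assign_tier(app: App) -> int:
    """Return a tier from 1 (least critical) to 3 (most critical).

    The thresholds below are illustrative assumptions.
    """
    if app.revenue_loss_per_hour >= 10_000 or app.users_affected >= 1_000:
        return 3
    if app.revenue_loss_per_hour >= 1_000 or app.users_affected >= 100:
        return 2
    return 1


def recovery_order(apps):
    """Sort apps so the highest tier (restored first) comes first."""
    return sorted(apps, key=lambda a: (-assign_tier(a), -a.revenue_loss_per_hour))


inventory = [
    App("internal-wiki", revenue_loss_per_hour=50, users_affected=20),
    App("checkout", revenue_loss_per_hour=25_000, users_affected=5_000),
    App("crm", revenue_loss_per_hour=2_000, users_affected=150),
]
for app in recovery_order(inventory):
    print(assign_tier(app), app.name)
```

Keeping the thresholds in one function makes it easy to review which applications land in the expensive top tier and to move less important ones down.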

2. Establish Service Level Agreement (SLA) Definitions

An SLA is a performance target for your cloud disaster recovery plan. Different components of your IT infrastructure can have their own SLAs depending on how critical they are to business activities. The two most common SLAs for disaster recovery are:

a. Recovery Time Objective (RTO)

The maximum amount of application downtime permitted in a disaster event.

b. Recovery Point Objective (RPO)

The maximum time period of data loss from an IT service due to a service disruption.

Cloud-based DR can restore a customer’s agreed-upon mission-critical applications with an RPO and RTO of less than 15 minutes. Other workloads can be restored within 1 to 8 hours, depending on the customer’s needs and the workload classification.
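To make the two objectives concrete, the sketch below computes the RPO and RTO actually achieved in an incident and compares them with the targets. The function names and the timestamps are hypothetical; the 15-minute objective mirrors the figure quoted above.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, disruption_start: datetime) -> timedelta:
    """Data written after the last recovery point is lost."""
    return disruption_start - last_recovery_point


def achieved_rto(disruption_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime runs from the disruption until service is restored."""
    return service_restored - disruption_start


def meets_objectives(last_recovery_point, disruption_start, service_restored,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True when both the achieved RTO and RPO are within their objectives."""
    return (achieved_rto(disruption_start, service_restored) <= rto
            and achieved_rpo(last_recovery_point, disruption_start) <= rpo)


# Hypothetical incident: last replica at 11:50, outage at 12:00, restored at 12:12.
last_point = datetime(2024, 1, 1, 11, 50)
outage = datetime(2024, 1, 1, 12, 0)
restored = datetime(2024, 1, 1, 12, 12)
print(meets_objectives(last_point, outage, restored,
                       rto=timedelta(minutes=15), rpo=timedelta(minutes=15)))
```

In this example ten minutes of data are lost and the service is down for twelve minutes, so both 15-minute objectives are met.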

3. Define Control Measures

There are five types of controls that can be used to cope with disasters:

a. Prevention

Preventive measures are used to avoid or prevent disasters before they occur.

b. Protection

Protective measures are used to safeguard systems and prevent damage or data loss in the event of a service outage.

c. Mitigation

Mitigation measures reduce the severity of a service outage that is already affecting system availability.

d. Response

Response procedures for disaster recovery detail the steps that will be taken to restore service once a disaster has been identified, including shifting workloads to the cloud.

e. Recovery

Recovery procedures are used to restore operations from your failover environment back to the primary environment, a process known as failback in disaster recovery lingo.

4. Assign Roles and Responsibilities

At this point, you should have identified all applications and components of your IT infrastructure and defined SLAs and control measures for each application. The next step is to assign roles and responsibilities – whose job is it to implement those control measures?

Organizations do not necessarily have to rely on internal knowledge and expertise to manage the complexity of cloud disaster recovery. With vendors like Faction offering Hybrid Disaster Recovery-as-a-Service (DRaaS), organizations can reap the benefits of fully managed disaster recovery service without the added technical overhead.

5. Testing, Training, and Maintenance

Regular testing is required to ensure that cloud disaster recovery protocols function as planned and consistently meet SLAs for critical applications. IT organizations should perform quarterly or biannual tests to verify that the performance of disaster recovery systems is aligned with business needs.
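A small helper can flag when a DR test is overdue. This is a sketch under stated assumptions: "quarterly" is approximated as 90 days and "biannual" as 182 days, and the function name is hypothetical.

```python
from datetime import date, timedelta

QUARTERLY_DAYS = 90   # assumption: approximate length of a quarter
BIANNUAL_DAYS = 182   # assumption: approximate length of half a year


def is_test_overdue(last_test: date, today: date,
                    max_interval_days: int = QUARTERLY_DAYS) -> bool:
    """True when more than max_interval_days have passed since the last DR test."""
    return (today - last_test) > timedelta(days=max_interval_days)


print(is_test_overdue(date(2024, 1, 1), date(2024, 5, 1)))
print(is_test_overdue(date(2024, 1, 1), date(2024, 5, 1), BIANNUAL_DAYS))
```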

DRaaS or DIY, which DR solution is right for you?

Are you trying to determine which route is best for your organization when it comes to disaster recovery? Chances are you have evaluated a variety of solutions available in the market, and it really comes down to deciding which direction is better for your organization:

  • Disaster Recovery-as-a-Service (DRaaS) OR
  • DIY Disaster Recovery solution

Each option has its own benefits and challenges. Coming to a decision depends on budget, skill set, and business objectives. Let’s take a closer look at both DRaaS and DIY DR.

DRaaS is the replication and hosting of physical or virtual servers by a third party to provide failover in the event of a natural or man-made disaster. A true DRaaS solution should include the following:

  • Assessment of your environment to determine workload protection groups
  • Replication to a second site (cloud)
  • A complete runbook for your recovery operations, including documented detailed recovery procedures
  • Testing to ensure the plan works according to the RTO and RPO your organization has set

So, what are the benefits of Disaster Recovery-as-a-Service?

  • No huge upfront costs, unlike DIY DR (hardware, licensing, colo)
  • A more manageable monthly payment
  • Professional and managed services provided by a team of experts to ensure the best solution for your organization
  • Recovery is fast and automated
  • Depending on your choice of vendor, pay for only what you use in a DR event (eliminate unused capacity)

Are there challenges that come with selecting DRaaS as your solution?

  • Data transfer costs (egress fees) – but that’s only if a disaster takes place
  • Clarifying SLAs – make sure that you clearly communicate your RPO and RTO to your provider and understand whether they match the provider’s SLA

DIY Disaster Recovery means taking on all of the responsibilities that are part of building, implementing, and maintaining a DR solution. It involves building and maintaining a separate data center so you have a copy of your data stored in the event of a disaster. It is often argued that the benefits of going the DIY route include:

  • Owning all the expertise, equipment and tools necessary
  • Not having to rely on an outside organization for DR

If you take this approach then you must embrace these challenges:

  • Document the entire process and communicate it to key stakeholders so that the organization knows what to do if an incident occurs
  • Monitor 24/7 and account for any changes in your environment that may happen in the future (e.g., capacity)
  • Ensure compliance is met per the guidelines of your industry or organization, and keep it updated
  • Test regularly and ensure you can execute a recovery in order to meet RTO and RPO standards of your organization
  • Have the appropriate employees who possess the skill sets needed to carry out the DIY solution in the event that you need to recover your data

A common misconception about DIY DR is that you’ll save money because you aren’t paying someone else to take care of it for you. However, when you consider all of the unused capacity in a DIY solution and the cost to set up, maintain, and test everything yourself, it may end up costing you more rather than less. Keep in mind that with a DIY approach you also sacrifice scalability and flexibility.
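The cost argument can be checked with simple arithmetic. The sketch below compares total cost over a planning horizon; every figure in it is a hypothetical placeholder, and the two cost models are deliberate simplifications (upfront plus running costs for DIY, monthly fee plus per-event egress for DRaaS).

```python
def diy_cost(upfront: float, monthly_ops: float, months: int) -> float:
    """Hardware, licensing, and colocation upfront, plus monthly operations."""
    return upfront + monthly_ops * months


def draas_cost(monthly_fee: float, months: int,
               egress_per_event: float, dr_events: int) -> float:
    """Monthly subscription, plus egress fees only when a disaster occurs."""
    return monthly_fee * months + egress_per_event * dr_events


# Hypothetical three-year comparison with one DR event.
months = 36
print(diy_cost(upfront=200_000, monthly_ops=5_000, months=months))
print(draas_cost(monthly_fee=8_000, months=months,
                 egress_per_event=2_000, dr_events=1))
```

Plugging in your own quotes shows where the break-even point sits for your environment.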

How Faction’s Cloud-Based Disaster Recovery Can Protect Your Business

Faction offers cloud disaster recovery solutions to help businesses safeguard their IT and application infrastructure against disruptive service outages.

Disaster Recovery as a Service Solution VMware Cloud Disaster Recovery

Protect your data, minimize downtime and reduce costs with optimized disaster recovery (DR) that’s easily accessible on demand and delivered as a SaaS solution.

Drive Business Growth with Smarter DR

Flexible Deployment Options

Choose the deployment option that works best for your needs. Set up failover capacity 100% on demand or with minimal footprint.

Optimized Costs

Leverage the elasticity and reliability of cloud to balance effective DR operations and optimized IT resource allocation to achieve up to 60% lower TCO.

Accelerated Ransomware Recovery

GigaOm named VMware an outperformer and a leader in ransomware recovery thanks to its ground-breaking, integrated capabilities.

Product Demos

30 Minute RPO

Improve delivery of business SLAs for IT teams with lower RPO for VM Protection Groups.

Set Up a New SDDC

Deploy a new SDDC for disaster recovery needs directly from the SaaS Orchestrator user interface.

Protect Production Site Workloads

Protect VMware Cloud on AWS SDDCs in addition to on-premises vCenter/vSphere based environments.

Protect VCF Workloads

Protect VCF workloads for disaster recovery along with standard vSphere and VMware Cloud on AWS SDDCs.

VMware Cloud Disaster Recovery Features

Pilot Light

In Pilot Light mode, DRaaS deploys a subset of Software-Defined Data Center (SDDC) hosts ahead of time to recover critical apps with lower RTO requirements. 

Detailed DR Reports

When performing workflows, history reports are automatically generated. These reports provide proof that your DR plans are tested and executed correctly.

Instant Power-On

In a disaster, stored replicas are instantly powered on by mounting the Scale-out Cloud File System (SCFS) directly to the SDDC.

Delta-Based Failback

Get automated orchestration of site failback with a single click. Optimize failback cloud egress charges and DR operational costs by only transferring back the changed blocks of data to the production site.
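The idea behind transferring only changed blocks can be illustrated with a generic hash comparison. This is not VMware's implementation: the block size, function names, and in-memory byte strings are assumptions chosen to keep the sketch short and runnable.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(production: bytes, recovery: bytes, block_size: int = BLOCK_SIZE):
    """Return {block_index: block_bytes} for recovery-site blocks that differ."""
    prod_hashes = block_hashes(production, block_size)
    delta = {}
    for index, start in enumerate(range(0, len(recovery), block_size)):
        block = recovery[start:start + block_size]
        is_new = index >= len(prod_hashes)
        if is_new or hashlib.sha256(block).hexdigest() != prod_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(production: bytes, delta: dict, new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the recovery-site state on the production side from the delta."""
    buffer = bytearray(production[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        buffer[start:start + len(block)] = block
    return bytes(buffer)


production = b"AAAABBBBCCCC"
recovery = b"AAAAXXXXCCCCDD"   # one block changed, one partial block appended
delta = changed_blocks(production, recovery)
print(sorted(delta))           # indexes of the blocks that must be sent back
print(apply_delta(production, delta, len(recovery)) == recovery)
```

Only two of the four blocks cross the network here, which is the effect that reduces egress charges during failback.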

Immutable Snapshots

Live Mount allows hosts to boot virtual machines (VMs) from snapshots stored securely in the SCFS for short- and long-term retention of immutable, operationally air-gapped snapshots.

Frequent DR Health Checks

DR health checks consistently run every 30 minutes for increased reliability. Any issues automatically generate email alerts to the proper resources.

VMware Cross-Cloud Services

VMware Cloud Disaster Recovery is a Cloud Infrastructure service of the Cross-Cloud services portfolio that enables Disaster Recovery in a public cloud.

Solve Your Toughest Challenges

Ransomware Recovery

Ensure confident ransomware recovery with a plan that rapidly restores critical apps and supports resiliency. Efficiently identify points of recovery to run rapid recovery point validations.

DR Modernization

Accelerate cloud adoption by modernizing your DR infrastructure. Get even faster DR with on-demand failover capacity that eliminates the need to maintain a secondary DR site.

Business Growth

Maximize ROI with a tiered approach to DR. By tuning SLAs and TCO to match app requirements, you can cut DR costs. Leverage cloud economics with no hardware investment or lifecycle management.

Compare Editions and Pricing

Purchase VMware Cloud Disaster Recovery through our standard year term subscriptions. Available payment plans:

  • 1-year subscription
  • 3-year subscription

VMware Cloud on AWS: Single Host 60-Day Option

Recommended for just-in-time configurations and evaluation purposes. The SDDC will need to be re-created when the 60-day period expires.

  • For non-production use only
  • Available with on-demand payment plan

Starting at $7/host/hour

VMware Cloud on AWS: Production Host Option

Recommended for Pilot Light configurations to achieve faster recovery times.

  • Minimum 2 hosts for i3.metal
  • Available with on-demand, 1-year or 3-year payment plan

Case Studies

Reily Foods

“We leverage the cloud OpEx model and optimize TCO by avoiding an ongoing investment in expensive on-premises equipment.”

Merrick

“VMware Cloud Disaster Recovery just works. I don’t lose sleep over our ransomware recovery capability … It’s nice to not be too concerned about workload protection anymore.”

- Greg Morrissey, IT Manager

Fozzy

“Previous [to the war] we had a plan for recovering our VMware infrastructure. But it was not so easy to check how well it worked. Or to see how well it would restore our systems if we did have a disaster. After implementing VMware Cloud Disaster Recovery, I certainly sleep much better”

- Ian Slavioglo, VP of IT

Wochepass

“We had a real disaster and could only access devices to a limited extent. We needed to make sure a scenario like that never occurred again.”

- Adrian Hess, CEO

ESG

“VMware users finally have a true DR option that works automatically and on demand, regardless of the complexity of their environment.”

- Steve Duplessie, Founder and Senior Analyst

Disaster Recovery Plan for Cloud Services

What Is Disaster Recovery for Cloud Services?

Contents:

  • Types of disasters
  • Why disaster recovery is significant
  • High-profile examples of cloud disasters
  • Benefits of cloud-based disaster recovery
  • How disaster recovery works
  • How we implement disaster recovery plans

Cloud computing is an efficient way to manage digital assets, but it is not immune to disasters. Data is one of the most valuable of those assets, and it is easy to store in the cloud. What can you do if a disaster affects your cloud data?

It’s almost impossible to predict when you’ll need a disaster recovery plan for applications on cloud services. If you can’t control when disaster hits, the next best thing is to be able to control the recovery process.

Disaster recovery in cloud services can be achieved through measures such as a robust backup system or multiple servers in different regions and continents, which reduce the harm a disaster can cause.

Disaster recovery (DR) is preparing for and recovering from a disaster. Disasters can take several forms, but all have the same effects: preventing the system from functioning normally and preventing the business from meeting its daily goals.

There are four main categories:

Natural Disasters: Natural disasters such as floods, hurricanes, or earthquakes are less common but not rare. If a disaster strikes the area where the server hosting your application’s cloud service is located, it may disrupt the services and need recovery operations.

Technical Disasters: Perhaps the most obvious of the four categories, technical disasters cover everything that can go wrong with cloud technology. They can include power outages or loss of network connectivity.

Human Errors: Human errors are common and are usually accidents that occur while using cloud services. They could include unintentional misconfiguration or inadvertently granting a malicious third party access to the cloud service.

Security Breaches: Attacks by hackers that lead to loss of control over the cloud or to data loss. A data breach occurs when a cybercriminal successfully infiltrates a data source and extracts sensitive information, either by accessing a computer or network to steal local files or by bypassing network security remotely. Cybercriminals who set out to steal sensitive information in bulk rarely target end users, unless an individual is connected to the targeted industry. However, end users can be affected when their records are part of the information stolen from big companies.

Cloud providers are responsible for everything they directly control, like the resilience of the overall infrastructure: hardware, software, network, and facilities. You are typically accountable for cloud configuration, secure data backup, workload architecture, and availability.

Establishing DR protocols and contingencies is vital to business continuity. In a disaster, a company with DR protocols and options can minimize disruption to its services and reduce the overall impact on business performance. Minimal service interruption means less lost revenue and, in turn, less user dissatisfaction.

A disaster recovery plan (DRP) is a documented, structured approach that defines how a business can resume work after an unplanned incident. It applies to the parts of an organization that depend on a functioning IT infrastructure. A DRP aims to help an organization resolve data loss and recover system functionality so that it can operate in the aftermath of an incident.

The plan consists of steps to minimize the effects of a disaster so the organization can continue to work or quickly resume mission-critical functions. Typically, a DRP involves an analysis of business processes and continuity needs. Before generating a detailed plan, an organization often performs a business impact analysis (BIA) and risk analysis (RA) to establish recovery objectives.

As cybercrime and security breaches become more sophisticated, an organization must define its data recovery and protection strategies. Quickly handling incidents can reduce downtime and minimize financial and reputational damages. DRPs also help organizations meet compliance requirements while providing a clear roadmap to recovery.

Having a disaster recovery plan (DRP) also means that your company can define a Recovery Time Objective (RTO) and a Recovery Point Objective (RPO). RTO is the maximum acceptable delay between service interruption and service resumption. RPO is the maximum acceptable amount of time since the last data recovery point, which determines how much data loss is tolerable.

Quantifying these areas can help your company determine the optimal level of protection for DR and choose the proper protocols to implement, such as backup and multiple servers.

Although rare, cloud disasters have happened in the past, and even at some of the biggest cloud providers.

A data center operated by OVHcloud in Strasbourg was destroyed by fire in March 2021. The site’s four data centers stood close together, and it took firefighters more than six hours to put out the fire. The fire seriously affected the cloud services operated by OVHcloud and was disastrous for companies whose entire assets were hosted on these servers.

In June 2016, storms in Sydney damaged electrical infrastructure and caused significant power outages. As a result, many Amazon Elastic Compute Cloud (EC2) instances and Elastic Block Store (EBS) volumes hosting critical workloads for several large companies failed.

Some high-traffic websites and the online presence of some of the biggest brands were down for more than ten hours over the weekend, severely affecting business.

Amazon Store

In February 2017, an Amazon employee who was trying to fix a problem with a billing system accidentally shut down more servers than necessary. This set off a domino effect that took down two server subsystems and spilled over to others. Thousands of people were unable to access Amazon’s servers for several hours.

Using the cloud for DR means the customer does not have to store backup copies of data on disks or physical hard drives.

The distributed nature of the cloud means that services can be spread across servers in different geographical locations, which provides strong protection against local natural disasters.

Another advantage of using the cloud for DR is that some of the responsibility shifts to the cloud provider. As mentioned earlier, the cloud provider is responsible for the essential fault tolerance of the cloud infrastructure, removing this concern from the customer.

Cloud-based disaster recovery also proves to be cost-effective. Because cloud providers only charge for the services you use, your business can choose which services it needs from the provider. Tailoring the package your company pays for in this way can reduce costs significantly.

DR for cloud services is a delicate process. The methodologies underlying it must be thoroughly understood for a successful recovery.

Backup & Restore

Data backup and recovery is one of the easiest and cheapest ways to recover from a cloud computing failure, and one of the fastest to set up. It is used to mitigate regional disasters, such as natural ones, by copying data and storing it in a geographically different location.

Pilot Light

A “Pilot Light” DR approach is a method in which your company restores only the minimum, essential services it needs to function. It means that only a small part of your IT infrastructure needs to be copied, and it provides a minimal functional replacement in the event of a crash.

Warm Standby Mode

A warm standby approach is when a scaled-down version of your full-featured environment is available, running in a separate location from your central server. In the event of a crash, your company can still run a version located in a different region.

Multi-Site Deployment

Although multi-site deployment is the most expensive of the four solutions, it provides the most comprehensive answer to regional disasters. A multi-site deployment involves running a full workload simultaneously in multiple regions. These regions can be actively used or kept on standby in case of a disaster in another region.
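Choosing among the four approaches usually means picking the cheapest one that still meets your recovery objectives. The sketch below shows that selection logic. The RTO and RPO figures attached to each strategy are illustrative placeholders, not measured or vendor-published values, so replace them with numbers from your own tests.

```python
from datetime import timedelta

# Ordered from cheapest to most expensive.
# The (RTO, RPO) pairs are hypothetical placeholders for illustration.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    ("Pilot Light", timedelta(hours=1), timedelta(minutes=30)),
    ("Warm Standby", timedelta(minutes=15), timedelta(minutes=5)),
    ("Multi-Site Deployment", timedelta(minutes=1), timedelta(seconds=0)),
]


def choose_strategy(required_rto: timedelta, required_rpo: timedelta):
    """Return the cheapest strategy that meets both objectives, or None."""
    for name, rto, rpo in STRATEGIES:
        if rto <= required_rto and rpo <= required_rpo:
            return name
    return None


print(choose_strategy(timedelta(hours=48), timedelta(hours=24)))
print(choose_strategy(timedelta(minutes=30), timedelta(minutes=10)))
```

Because the list is ordered by cost, the first match is also the least expensive option that satisfies the requirement.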

Here are five steps we use to help you prepare a recovery plan:

1. Make your DR plan part of your business continuity plan. It should include RTO and RPO determinations to help you select which cloud services you need and improve cost-effectiveness.

2. If you haven’t already done so, determine the RTO and RPO for disaster recovery. This step will form the basis of your DR plan and, in turn, of the types of disaster recovery services you will need.

3. Design your plan based on your recovery goals. This involves looking at your RTO and RPO to decide which disaster recovery approach you need to meet these criteria. Your goals should outline the maximum and minimum impact on your services.

4. Design for end-to-end recovery. Your plan should include restoring every aspect of your business that needs to work.

5. Create specific tasks to ensure a smooth process. The more detailed your tasks are, the easier the recovery process will be and the less likely you are to deviate from the plan.

A disaster recovery plan must be reviewed, tested, and updated at least once a year. Business continuity and disaster recovery tests must also be conducted every time significant changes are made to recovery tactics, human resources, operating software, or IT infrastructure.

The frequency of the tests depends on the type of plan being assessed. A disaster recovery plan entails coordinating activities across multilayered technology configurations and vendor partnerships. DRP testing is recommended at least every year, but more frequent testing is advisable because of how much a business continuity plan covers.

BCP and DRP training courses can help people become more familiar with the details of disaster recovery testing. Some vendors also offer business continuity management certifications to help teams conduct sufficient DR testing.

Developing and implementing best practices for cloud-based disaster recovery is key to success. These include following the five steps above and the mandatory use of shortcuts. Creating a proper business continuity plan is vital, as is thoroughly testing your backups and regularly testing your overall recovery plans, whatever methods they use.

Orchestration is needed to deploy a DRP quickly and control it easily. It can be done with either Docker Compose or Kubernetes, depending on the size of the system. We use Kubernetes as an orchestration tool for Docker containers. It can place containers on the same machine to reduce network load and use resources more efficiently. In this approach, each container performs a specific function. Then we write scripts using Terraform: a database restore script, a Kubernetes restore script, and a general script for minor settings. These scripts are executed one after the other, so the entire system is deployed in the cloud quickly, reliably, and fully automatically. The resulting automated disaster recovery is convenient, fast, and easier to control.
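Running the restore scripts one after the other can be sketched as a small sequential runner that stops at the first failure. This is an illustration under assumptions, not the authors' actual tooling: the step names and placeholder functions are hypothetical, and in practice each step might wrap a call to a real restore script.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in order; stop and report if any step fails."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"Recovery step '{name}' failed; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed


# Hypothetical placeholders standing in for the three restore scripts.
def restore_database() -> None:
    print("restoring database...")


def restore_kubernetes() -> None:
    print("restoring Kubernetes workloads...")


def apply_minor_settings() -> None:
    print("applying minor settings...")


plan: List[Step] = [
    ("database restore", restore_database),
    ("Kubernetes restore", restore_kubernetes),
    ("minor settings", apply_minor_settings),
]
print(run_recovery(plan))
```

Stopping at the first failure keeps later steps from running against a half-restored system, and the error message records how far the recovery got.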

In general, cloud disaster recovery should be planned at scale and maintained on an ongoing basis. Using the cloud for DR makes your process flexible and, most importantly, efficient in terms of cost and function. By designing a recovery plan that precisely meets your specifications and takes your RTOs and RPOs into account, you can create a robust disaster recovery plan for your solutions and products in the cloud.

Evgeniy Berkovich

Public Preview: Regional Disaster Recovery by Azure Backup for AKS

Published date: February 20, 2024

In today's dynamic landscape, safeguarding containerized workloads and application data is paramount. That's why Azure Backup for AKS provides comprehensive protection for your AKS clusters, enabling scheduled backups and seamless restoration in scenarios like Operational Recovery, Accidental Deletion, and Application Migration.

Now, we're excited to highlight a key addition: the Regional Disaster Recovery Capability, available in public preview. With this feature, you can proactively prepare for and mitigate the impact of regional disasters by:

  • Recovering AKS clusters from backups stored in a secondary region, leveraging Azure Paired Regions, ensuring business continuity even in the face of regional disruptions.
  • Storing Backup Copies offsite, adhering to the 3-2-1 backup strategy, and having the resilience to restore them in case of tenant compromise.
  • Retaining data for extended periods to meet compliance requirements in regulated industries, ensuring data integrity and security.
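The 3-2-1 backup strategy mentioned above calls for at least three copies of the data, on at least two different types of media, with at least one copy offsite. A minimal sketch of that check follows. The `BackupCopy` type and the sample locations are hypothetical and are not part of Azure Backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # hypothetical label, e.g. a region or site name
    media: str      # e.g. "disk", "object-storage", "tape"
    offsite: bool


def satisfies_3_2_1(copies) -> bool:
    """Check the 3-2-1 rule: 3 copies, 2 media types, 1 offsite copy."""
    media_types = {copy.media for copy in copies}
    return (len(copies) >= 3
            and len(media_types) >= 2
            and any(copy.offsite for copy in copies))


copies = [
    BackupCopy("primary-region", "disk", offsite=False),
    BackupCopy("primary-region", "object-storage", offsite=False),
    BackupCopy("secondary-region", "object-storage", offsite=True),
]
print(satisfies_3_2_1(copies))
```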

By embracing Azure Backup for AKS, you empower your organization with advanced disaster recovery capabilities, enhancing resilience and ensuring uninterrupted operations. To fortify your AKS backup strategy against regional disruptions, see the Azure Backup for AKS documentation.

  • Azure Backup
  • Azure Kubernetes Service (AKS)

Related Products

IMAGES

  1. How to Plan an Effective Cloud Disaster Recovery Strategy?

    disaster recovery plan for cloud services

  2. 3 Steps for Building a Cloud-Based Disaster Recovery Plan

    disaster recovery plan for cloud services

  3. Infographic: How to Plan A Cloud Disaster Recovery Strategy

    disaster recovery plan for cloud services

  4. Cloud Disaster Recovery: Methods and Approaches Overview

    disaster recovery plan for cloud services

  5. Building a Disaster Recovery Plan for the Cloud Era

    disaster recovery plan for cloud services

  6. Backup and Disaster Recovery Plan

    disaster recovery plan for cloud services

VIDEO

  1. Cloud disaster recovery strategies

  2. Disaster Recovery

COMMENTS

  1. What is a Disaster Recovery Plan?

    Disaster recovery for cloud-based systems is critical to an overall business continuity strategy. A system breakdown or unplanned downtime can have serious consequences for enterprises that...

  2. What is Disaster Recovery?

    A disaster recovery plan prompts the quick restart of backup systems and data so that operations can continue as scheduled. Enhances system security Integrating data protection, backup, and restoring processes into a disaster recovery plan limits the impact of ransomware, malware, or other security risks for business.

  3. Disaster recovery options in the cloud

    Disaster recovery options in the cloud PDF RSS Disaster recovery strategies available to you within AWS can be broadly categorized into four approaches, ranging from the low cost and low complexity of making backups to more complex strategies using multiple active Regions.

  4. Disaster Recovery (DR) Architecture on AWS, Part I: Strategies for

    For most examples in this blog post, we use a multi-Region approach to demonstrate DR strategies. But, you can also use these for Multi-AZ strategies or hybrid (on-premises workload/cloud recovery) strategies. DR strategies. AWS offers resources and services to build a DR strategy that meets your business needs.

  5. PDF Best practices for implementing disaster recovery in the cloud

    Given the high stakes, IT teams need to understand the core concepts of disaster recovery, the differences between disaster recovery and backup, and best practices for planning and implementing disaster recovery. They should also explore the benefits of using public clouds such as Amazon Web Services (AWS) as recovery sites for on-premises and ...

  6. Backup and Disaster Recovery

    The Azure backup and disaster recovery solution is simple to architect, cloud-native, highly available, and resilient. Learn more about Azure reliability Simplified management across environments Azure built-in security controls Reduced complexity and cost Extend solutions to Azure with our partners

  7. How to Get the Most Out of Your Cloud Disaster Recovery Plan

    How cloud disaster recovery compares to on-premises. Amitabh Sinha: With on-prem, it can take days or weeks to be back up and running again; it is a costly endeavor and very time-consuming for teams.

  8. What Is a Disaster Recovery Plan? 4 Examples

    A disaster recovery plan defines instructions that standardize how a particular organization responds to disruptive events, such as cyber attacks, natural disasters, and power outages. A disruptive event may result in loss of brand authority, loss of customer trust, or financial loss.

  9. Understanding Disaster Recovery in the Cloud

    The term cloud disaster recovery (cloud DR) refers to the strategies and services enterprises use to back up applications, resources, and data to a cloud environment. Cloud DR helps protect corporate resources and ensure business continuity.

  10. Disaster Recovery In Cloud Computing: What, How, And Why

    Cloud disaster recovery is a cloud computing service that allows system data to be stored and recovered on a remote cloud-based platform. To better understand what disaster recovery in cloud computing entails, let's compare it to traditional disaster recovery.

  11. Cloud-era disaster recovery planning: Setting strategy and developing

    In the second in a series on cloud-era disaster recovery, we look at how to formulate a DR strategy and...

  12. Disaster Recovery: An Introduction

    Disaster recovery planning involves strategizing, planning, deploying appropriate technology, and continuous testing. Maintaining backups of your data is a critical component of disaster recovery planning, but a backup and recovery process alone does not constitute a full disaster recovery plan.

  13. Disaster recovery in the cloud explained

    Therefore, a cloud disaster recovery plan (aka cloud DR blueprint) is very specific and distinctive for each organization. Triage is the overarching principle used to derive traditional as well as cloud-based DR plans. The process of devising a DR plan starts with identifying and prioritizing applications, services and data, and determining for ...

  14. How to Plan an Effective Cloud Disaster Recovery Strategy?

    Cloud disaster recovery, often called cloud DR, is a comprehensive approach encompassing strategies and services for safeguarding data, applications, and assets by replicating them to public cloud environments or dedicated service providers.

  15. What Is Cloud Disaster Recovery? Planning & Services

    Cloud disaster recovery is a backup and restore strategy that applies not only to data, but also entire virtual machines, servers and corporate networks. The operative word is "strategy" because businesses need to decide for themselves how best to use such a service.

  16. Building a Cloud Disaster Recovery Plan: Tips and Approaches

    Get the most out of the cloud as part of your disaster recovery strategy with tips on how to create a cloud-based disaster recovery plan and tailor it to your needs.

  17. Tips for Building a Cloud Disaster Recovery Plan

    A cloud disaster recovery solution performs data replication and frequent backups, minimizing the risk of losing stored data when a disaster occurs. Finally, a cloud disaster recovery plan helps you comply with industry regulations that require businesses to have a disaster recovery (DR) plan.

  18. Why You Need A Disaster Recovery Plan For The Cloud

    Apr 23, 2021, 08:00am EDT. Senior Vice President for Global Solutions Engineering at Sungard Availability Services. Just after midnight on...

  19. The Ultimate Guide to Cloud Disaster Recovery

    A recent disaster recovery survey found that 79% of midsized businesses and 87% of large businesses had experienced one or more service outages in the past year. Even more shocking is that 27% of companies lost money due to service outages, and 31% of those reported losses in excess of $100,000.

  20. Cloud Disaster Recovery Solutions

    VMware Cloud Disaster Recovery is a Cloud Infrastructure service of the Cross-Cloud services portfolio that enables Disaster Recovery in a public cloud. ... Ensure confident ransomware recovery with a plan that rapidly restores critical apps and supports resiliency. Efficiently identify points of recovery to run rapid recovery point validations ...

  21. Disaster Recovery Plan for Cloud Services

    What Is Disaster Recovery for Cloud Services? Disaster recovery (DR) means preparing for and recovering from a disaster. Disasters can take several forms, but all have the same effect: they prevent the system from functioning normally and keep the business from meeting its daily goals. Types of Disasters: There are four main categories:

  22. Disaster Preparedness: A Journey to the Cloud

    Safety in the Cloud: Cloud-based systems help manufacturers stay connected and keep their data secure when disasters strike. Cloud solutions offer relief from the damage that can result from an on-premises setup. A cloud-based enterprise resource planning (ERP) system keeps a business's IT setup and data servers out of reach of physical destruction at its own site.

  23. TR-4931: Disaster Recovery with VMware Cloud on Amazon Web Services and

    A proven disaster recovery (DR) environment and plan are critical for organizations to ensure that business-critical applications can be rapidly restored in the event of a major outage. This solution demonstrates DR use cases with a focus on VMware and NetApp technologies, both on-premises and with VMware Cloud on AWS.

  24. Public Preview: Regional Disaster Recovery by Azure Backup for AKS

    Keep your business running with built-in disaster recovery service.