Active/Active, Active/Standby - Data Center
A Data Center (DC) is where an organization's critical information is stored and hosted, and it provides access to key applications. When that environment is down or impacted, companies lose significant amounts of revenue. In addition to restoring services, they spend additional staff hours on operational support, root cause analysis, and updating the change board committee.
Since DCs are so critical, many large enterprises are expanding their footprint to multiple locations for resiliency. While this added geographic redundancy is great for backup, it adds complexity to the environment and to the teams who support it, who must still deliver seamless access to the users who rely on it.
Over the years of implementing backup data centers for various enterprises, I have been involved in many discussions across business units to define and support the role the new DC site will play. While designing and installing network infrastructure to support application requirements and dynamic failover capabilities, I have realized how isolated each department can be when looking at the whole design and failover picture. This is completely normal, as each team is focused on its own area of expertise. The application owners expect the network team to take care of the network, the security team to take care of security, and the compute and storage team to handle their environment; yet all of this has to come together to deliver what the end-user needs. Often, not understanding the overall architecture means no one can predict what will happen when there is a failure. How will services be impacted and restored so that the ecosystem behaves uniformly across both sites, with no hiccup or loss of data? Can every team that works with this environment answer that question, and will all their answers be the same?
What happens when these teams are not aligned? Application DOWNTIME and poor execution to restore services. Oh no!
When introducing a Secondary Data Center, it is important for these teams to come together and understand expectations of failover requirements, simulate specific outage scenarios and discuss expected results of dynamic capabilities amongst all stakeholders: IT Network, Security, Storage/Compute, Application Owners, Operational Support and Change Management alike.
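One lightweight way to run that exercise is to write down each team's expected result per outage scenario and flag the scenarios where the answers differ. The sketch below is only an illustration: the scenario names, team answers, and the `find_misalignments` helper are hypothetical, not taken from any real engagement.

```python
from collections import defaultdict

# Hypothetical answers collected from each team for each outage scenario.
expectations = {
    "Primary DC Internet circuit fails": {
        "Network": "traffic reroutes to secondary DC automatically",
        "Security": "traffic reroutes to secondary DC automatically",
        "Storage/Compute": "manual storage failover required",
        "Application Owners": "traffic reroutes to secondary DC automatically",
    },
    "Primary DC Internet circuit restored": {
        "Network": "traffic returns to primary DC automatically",
        "Security": "traffic returns to primary DC automatically",
        "Storage/Compute": "traffic returns to primary DC automatically",
        "Application Owners": "traffic returns to primary DC automatically",
    },
}


def find_misalignments(expectations):
    """Return the scenarios where teams disagree, with teams grouped by answer."""
    misaligned = {}
    for scenario, answers in expectations.items():
        by_answer = defaultdict(list)
        for team, answer in answers.items():
            by_answer[answer].append(team)
        # More than one distinct answer means the teams are not aligned.
        if len(by_answer) > 1:
            misaligned[scenario] = dict(by_answer)
    return misaligned


for scenario, groups in find_misalignments(expectations).items():
    print(f"Not aligned: {scenario}")
    for answer, teams in groups.items():
        print(f"  {', '.join(teams)}: {answer}")
```

Any scenario this prints is a conversation to have before the outage, not during it.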
Active/Active or Active/Standby DC? - When there are two working Data Centers, what is your definition of Active/Active? Does traffic from an endpoint enter through a primary location, and route over to the Secondary Data Center only in the event of an Internet failure? Or can any end-user access either Data Center simultaneously, based on location or a defined load balancing mechanism? If so, how does the backend database support this active sync? When the Internet is restored, will the SAN environment automatically restore and update as well, or is some sort of manual intervention required? Knowing the answers helps you plan the steps in advance to deliver maximum uptime and efficiency, without the headache of putting out a fire in the middle of the night.
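The difference between the two definitions can be made concrete with a small model of which sites should receive user traffic. This is a minimal sketch under simplifying assumptions: each site is reduced to a single health flag, and the `Mode`, `Site`, and `eligible_sites` names are hypothetical. Real designs rely on routing protocols, DNS, or global traffic managers rather than application code.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE_STANDBY = "active/standby"
    ACTIVE_ACTIVE = "active/active"


@dataclass
class Site:
    name: str
    healthy: bool  # simplified: one flag stands in for all health checks


def eligible_sites(mode, primary, secondary):
    """Return the names of the sites that should receive user traffic."""
    if mode is Mode.ACTIVE_ACTIVE:
        # Both sites serve users whenever they are healthy.
        return [s.name for s in (primary, secondary) if s.healthy]
    # Active/Standby: the secondary serves users only if the primary is down.
    if primary.healthy:
        return [primary.name]
    if secondary.healthy:
        return [secondary.name]
    return []


dc1, dc2 = Site("DC1", healthy=True), Site("DC2", healthy=True)
print(eligible_sites(Mode.ACTIVE_STANDBY, dc1, dc2))
print(eligible_sites(Mode.ACTIVE_ACTIVE, dc1, dc2))
```

With both sites healthy, the first call yields only the primary and the second yields both sites, which is exactly the distinction every team needs to agree on.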
A quick overview of each team's perspective:
- Security: Only approved flows are permitted; the team must be able to inspect traffic that flows through the infrastructure and apply the necessary posturing or restrictions.
- Network: Dynamic routing failover can be configured to advertise the public ARIN-assigned subnet primarily out of one DC, or DNS load balancing or global traffic managers can be used to send hosts to either DC simultaneously.
- SAN/Compute: Can explore options such as ACI Multi-Pod, HyperFlex HX stretch cluster, NetApp MetroCluster, or Pure ActiveCluster to keep storage in sync at both DCs while both are in production. Additional intelligent features could include a mediator in the Cloud that determines when the disk at either site should be in write-disabled mode.
- Application: Needs hosted front-end access and a backend database, both following the security guideline requirements.
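To illustrate the mediator idea from the SAN/Compute item, here is a generic sketch of how a third-party witness can prevent both sites from accepting writes while they cannot replicate to each other. It does not describe how any of the products named above actually implement arbitration; the `writable_sites` function, its parameters, and the preferred-site tie-break are assumptions made for illustration.

```python
def writable_sites(link_up, dc1_reaches_mediator, dc2_reaches_mediator,
                   preferred="DC1"):
    """Decide which sites may accept writes (hypothetical arbitration rule).

    link_up: the replication link between the two sites is healthy.
    dcN_reaches_mediator: that site can still contact the cloud mediator.
    preferred: site that wins when both can reach the mediator.
    """
    if link_up:
        # Storage is in sync, so both sites can stay writable.
        return {"DC1", "DC2"}
    # Sites are partitioned: at most one may keep writing, otherwise the
    # two copies diverge (split brain).
    reachable = [
        name
        for name, ok in (("DC1", dc1_reaches_mediator),
                         ("DC2", dc2_reaches_mediator))
        if ok
    ]
    if not reachable:
        # No site can prove it is the survivor; write-disable both.
        return set()
    if preferred in reachable:
        return {preferred}
    return {reachable[0]}


# Replication link down, but only DC2 can still reach the mediator.
print(writable_sites(False, dc1_reaches_mediator=False, dc2_reaches_mediator=True))
```

Whatever rule a given platform uses, the storage team's answer to "which site goes write-disabled, and when?" is the one the application and network teams need to hear.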
Our goal is to ensure all business units are aligned to work together and the definition of Active/Active is understood across all parties.
Consider the options for your backup location. Perhaps you'd like the Cloud as the backup or primary solution. Either option is viable; the key is that all systems work together as expected, and that the environment is tested, predictable, documented, and proactively configured to work as each team assumes.
Which architectural stance do you take: an Active/Active approach, an Active/Standby approach, or possibly a hybrid solution that varies per application? Let us here at ANM help you map out and define your environment so everyone can properly support each team's expectations.
John Bouaphaseuth, Solutions Architect
John is a Solutions Architect with 15+ years of experience in the IT industry. He holds a degree in Computer Science and Network Management, and his greatest professional achievement is earning his CCIE #56740. When he isn't networking, he enjoys traveling and the great outdoors Colorado has to offer.