Performing an Uncontrolled Failover (2024)

In the event of an unplanned failure of an active data center or network isolation, there will not be an opportunity to gracefully release activity from the Message VPNs at that replication site.

There are three types of uncontrolled failovers:

  • Short-Term Outage

    The Active site is out-of-service or isolated for a short duration (for example, minutes or hours). The replication queue has enough capacity to store all replicated messages and transactions during the outage.

  • Long-Term Outage

    The Active site is out-of-service or isolated for a long duration (for example, days or weeks). The replication queue does not have enough capacity to store all replicated messages and transactions during the outage.

  • Complete Failure

    The Active site goes out of service and cannot be recovered. A critical component (the event broker, region connectivity, etc.) has been lost, or data on the external disk has been lost.
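The difference between a short-term and a long-term outage comes down to whether the replication queue can absorb the backlog that builds up while the Active site is unavailable. The following Python sketch illustrates that reasoning under the assumption of a steady replicated data rate. The function names, parameters, and figures are hypothetical and are not part of any Solace API.

```python
def classify_outage(queue_quota_mb, replicated_rate_mb_per_s,
                    expected_outage_s, recoverable=True):
    """Classify an uncontrolled failover into the three types listed above.

    All inputs are operator estimates (hypothetical names, not broker settings).
    """
    if not recoverable:
        return "complete failure"
    # Backlog that would accumulate on the replication queue during the outage.
    backlog_mb = replicated_rate_mb_per_s * expected_outage_s
    if backlog_mb <= queue_quota_mb:
        return "short-term outage"
    return "long-term outage"


def seconds_until_full(queue_quota_mb, current_usage_mb, replicated_rate_mb_per_s):
    """Estimate how long the replication queue can keep absorbing traffic."""
    if replicated_rate_mb_per_s <= 0:
        return float("inf")
    remaining_mb = max(queue_quota_mb - current_usage_mb, 0)
    return remaining_mb / replicated_rate_mb_per_s
```

For example, with an assumed 60,000 MB quota and 0.5 MB/s of replicated traffic, a one-hour outage accumulates about 1,800 MB (short-term), while a one-week outage would accumulate about 302,400 MB (long-term).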

In all these types of failovers, the following general steps must be taken:

  • Step 1: Make Message VPNs at Standby Site Replication Active to Restore Service
  • Step 2: Ensure Clients Cannot Connect to the Failed Site
  • Step 3: If Necessary, Suspend Replication
  • Step 4: Bring Message VPNs at the Failed Site Back Online as Replication Standby

In the provided example, the New York replication site has experienced the failure, and its mate Boston site takes over activity until the New York site has been restored. For simplicity, only a single Message VPN (Trading_VPN) is presented in the example. When the failure occurred, Trading_VPN had a replication active state at the New York site and a replication standby state at the Boston site.

While these simple examples only show replication sites with a single Message VPN, in real-world scenarios, these steps must be performed for each Message VPN involved in a replication site failover.

Consequences of an Uncontrolled Failover

An uncontrolled failover can have consequences that include:

  • The build-up of messages on the replication queue for the duration of the outage at the Active site.
  • The replication queue becoming full.
  • The loss of one or more event brokers at the failed site prior to restoring operation at the failed site.
  • The possible loss of messages and transactions that were being replicated asynchronously.
  • An increased probability and volume of duplicate message delivery.

We recommend that you contact Solace for help resolving any issues that may be present in the circumstances of a specific uncontrolled failover.

Step 1: Make Message VPNs at Standby Site Replication Active to Restore Service

This procedure should be followed after it has been determined that an uncontrolled failure has occurred for a data center site.

Make a Replication Standby Message VPN Replication Active

To restore service, change the replication state of the Message VPN to active.

Boston Data Center

BOS_EventBroker(configure)# message-vpn Trading_VPN
BOS_EventBroker(configure/message-vpn)# replication state active

Clients will now be able to connect to the Message VPN.

Since a standby site is not available, asynchronous messages and transactions will be stored in the replication queue. By default, synchronous replication will switch to asynchronous, causing those messages and transactions to also be stored in the replication queue. If reject-msg-when-sync-ineligible is set on the Message VPN, synchronous replication will be blocked until the standby Message VPN is restored.
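The behavior described above can be summarized as a small decision function. This is a conceptual Python sketch only: the function and its return strings are hypothetical, and it models nothing beyond the three cases stated in the preceding paragraph.

```python
def outcome_while_standby_down(replication_mode, reject_msg_when_sync_ineligible=False):
    """Describe what happens to a replicated message while the standby is unreachable.

    replication_mode is "async" or "sync"; the second argument mirrors the
    reject setting on the Message VPN (default behavior is not to reject).
    """
    if replication_mode not in ("async", "sync"):
        raise ValueError("replication_mode must be 'async' or 'sync'")
    if replication_mode == "async":
        return "stored in replication queue"
    if reject_msg_when_sync_ineligible:
        # Synchronous replication stays blocked until the standby returns.
        return "blocked until standby is restored"
    # Default: synchronous replication switches to asynchronous.
    return "switched to async and stored in replication queue"
```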

Step 2: Ensure Clients Cannot Connect to the Failed Site

It is important that the failed site does not come up with its replication Message VPNs in active state. If both sites have an active replication state at the same time, proper operation cannot be guaranteed. Since the failed event broker was configured with replication state active when it failed or became unreachable, when it recovers that will be its default state. Note that if the failed site cannot be recovered, and its configuration has to be restored from a backup, that backup configuration may have been saved with an active replication state, so this step applies in that case as well.

In this step, the goal is to allow the failed event broker to be brought back up, but also prevent clients from connecting. To do that, you should block ports that allow client connectivity, while still allowing the event broker to be managed through the management ports.

There may be a number of ways to accomplish this step; the specific actions to perform should be tested before an uncontrolled failure occurs so it is clear what to do in an actual failure scenario.
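One way to rehearse this step is to confirm from a test host that the client ports are unreachable while the management ports still answer. The Python sketch below shows the idea using plain TCP connection attempts. The helper names are hypothetical, and the host and port numbers must come from your own deployment; nothing here is a Solace tool.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def isolation_problems(host, client_ports, mgmt_ports, probe=port_open):
    """List every port whose reachability contradicts the goal of Step 2.

    client_ports should all be blocked; mgmt_ports should all be reachable.
    The probe argument can be replaced with a stub for dry runs.
    """
    problems = []
    for port in client_ports:
        if probe(host, port):
            problems.append(f"client port {port} is still reachable")
    for port in mgmt_ports:
        if not probe(host, port):
            problems.append(f"management port {port} is not reachable")
    return problems
```

An empty result means the failed site can be brought up for management without clients being able to connect. A TCP probe only shows reachability from the host where it runs, so it should be run from the networks your clients actually use.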

Step 3: If Necessary, Suspend Replication

If the failed site takes a long time to recover, there is a risk that the replication queue will fill up. If this happens, messages published to replicated topics (in or out of transactions) will be rejected, since no replication service can be provided. If you know that there will be a prolonged outage, or the replication queue is getting close to filling up (the high event log has been triggered on the replication queue), it may be necessary to suspend the replication service to continue to provide non-replicated service to the replicated topics.

To suspend replication, disable the reject-msg-to-sender-on-discard behavior on the replication queue using the following CONFIG command:

solace(configure/message-vpn/replication/queue)# no reject-msg-to-sender-on-discard

Note that with this setting, replicated service will continue until the replication queue gets full. Once it is full, only local, non-replicated service is provided.
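The effect of this setting can be pictured with a toy model of the replication queue. The Python class below is a simplified illustration with hypothetical names; it tracks only a size quota and is not how the event broker is implemented.

```python
class ReplicationQueueModel:
    """Toy model of a replication queue with a fixed quota (arbitrary units)."""

    def __init__(self, quota, reject_msg_to_sender_on_discard=True):
        self.quota = quota
        self.used = 0
        self.reject_msg_to_sender_on_discard = reject_msg_to_sender_on_discard

    def publish(self, size):
        """Return the outcome for one message published to a replicated topic."""
        if self.used + size <= self.quota:
            self.used += size
            return "queued for replication"
        if self.reject_msg_to_sender_on_discard:
            # Queue is full and the reject behavior is on: the publisher is refused.
            return "rejected"
        # Queue is full and the reject behavior is off: local service continues.
        return "local only"
```

With the reject behavior enabled, a full queue stops publishers; with it disabled, publishing continues but messages that do not fit are no longer replicated.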

Step 4: Bring Message VPNs at the Failed Site Back Online as Replication Standby

Once the failed site has been recovered with management access but no client access (see Step 2: Ensure Clients Cannot Connect to the Failed Site), it can be prepared to be the standby site. Here are the steps for preparing the recovered event broker to be the standby site:

  • Step 4-1: Configure All Message VPNs as Standby
  • Step 4-2: Verify the Message Spool
  • Step 4-3: Heuristically Complete Transactions
  • Step 4-4: Allow Clients to Connect
  • Step 4-5: Wait For Synchronous Replication to be Eligible
  • Step 4-6: If Necessary, Re-enable Replication
  • Step 4-7: Retrieving Replication Queue Spooled Messages from the Failed Site

Step 4-1: Configure All Message VPNs as Standby

Configure all Message VPNs on the restored Replication site with a standby Replication state. In this example, the Message VPN Trading_VPN at the New York site is configured with a standby Replication state:

NY Data Center

NY_EventBroker1(configure)# message-vpn Trading_VPN
NY_EventBroker1(configure/message-vpn)# replication state standby

The Config-Sync facility propagates this setting to the Trading_VPN Message VPN on Ny-Appliance2.

Step 4-2: Verify the Message Spool

You should verify that the message spool for the event brokers at the failed Replication site is now capable of providing service.

Before continuing, ensure that the message spool on the recovered site is active for the primary virtual router. In the sample output below (which may vary by event broker type and version), the Activity Status of Local Inactive and the Message Spool Status of AD-NotReady indicate that the event broker and the message spool it uses are not active.

NY_EventBroker1# show redundancy

Configuration Status     : Enabled
Auto Revert              : No
Redundancy Mode          : Active/Active
Mate Router Name         : solaceBackup
ADB Link To Mate         : Up
ADB Hello To Mate        : Down

                               Primary Virtual Router Backup Virtual Router
                               ---------------------- ----------------------
Activity Status                Local Inactive         Local Active
Routing Interface              1/1/lag1:1             1/1/lag1:3
VRRP VRID                      33                     34
Routing Interface Status       Up                     Up
VRRP Status                    Master                 Master
VRRP Priority                  75                     250
Message Spool Status           AD-NotReady            AD-Disabled
Priority Reported By Mate      Backup-Reconcile       Primary-Reconcile

In this situation, you must resolve the issue preventing the failed event broker from becoming active. If you cannot resolve the issue, contact Solace.

Step 4-3: Heuristically Complete Transactions

If applicable, heuristically commit or heuristically rollback any prepared transactions on the failed site. Once heuristically completed, delete them to free up the resources.

To commit, roll back, or delete a transaction, enter the appropriate ADMIN commands on the failed site:

NY Data Center

solace(admin/message-spool)# commit-transaction xid <xid>

and/or

solace(admin/message-spool)# rollback-transaction xid <xid>

and then

solace(admin/message-spool)# delete-transaction xid <xid>

Where:

xid specifies the XID of the transaction to be committed, rolled back, or deleted.
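When several prepared transactions need to be completed, it can help to draft the command sequence before entering it. The Python sketch below only builds the command strings shown above, in the order described (complete first, then delete). It does not connect to or run anything on an event broker, and the helper name and the XID values passed to it are placeholders.

```python
def heuristic_completion_commands(decisions):
    """Build ADMIN command strings for heuristically completing transactions.

    decisions maps each XID (as a string) to "commit" or "rollback".
    Each transaction is completed first and then deleted to free resources.
    """
    commands = []
    for xid, action in decisions.items():
        if action not in ("commit", "rollback"):
            raise ValueError(f"unsupported action for {xid}: {action}")
        commands.append(f"{action}-transaction xid {xid}")
        commands.append(f"delete-transaction xid {xid}")
    return commands
```

Whether a given transaction should be committed or rolled back is an application-level decision; the sketch takes that decision as input.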

Step 4-4: Allow Clients to Connect

You previously blocked traffic to prevent client connectivity (Step 2: Ensure Clients Cannot Connect to the Failed Site). You must now unblock the ports to allow client connectivity. This step allows clients to connect and also allows the replication bridge to connect, which allows data to be synchronized from the active site (from the Boston site to the New York site in the example).

Step 4-5: Wait For Synchronous Replication to be Eligible

Once connectivity is restored between the recovered site and the active site, the replication bridge will connect from the standby site to the active site and drain the replication queue in order to synchronize the two sites. Depending on how much message and transaction data is in the replication queue and the available bandwidth between the sites, this process may take a long time. When this process is complete, the Replication service will no longer be degraded and the Message VPN will become eligible for synchronous replication.
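A rough estimate of the drain time follows from the backlog and the bandwidth between the sites. The Python sketch below assumes steady rates and that newly published replicated traffic competes with the backlog for the same link; the function and parameter names are hypothetical, and real throughput will depend on factors not modeled here.

```python
def estimated_drain_seconds(backlog_mb, link_mb_per_s, new_traffic_mb_per_s=0.0):
    """Estimate how long the replication queue takes to drain.

    backlog_mb: data currently held in the replication queue.
    link_mb_per_s: usable bandwidth between the two sites.
    new_traffic_mb_per_s: replicated traffic still arriving at the active site.
    """
    net_rate = link_mb_per_s - new_traffic_mb_per_s
    if net_rate <= 0:
        # The backlog never shrinks if new traffic uses all the bandwidth.
        return float("inf")
    return backlog_mb / net_rate
```

For instance, a 36,000 MB backlog over a 12.5 MB/s link with 2.5 MB/s of new traffic drains at a net 10 MB/s, or about one hour.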

In the following example, the information is shown for the Boston site, which is acting as active for the recently failed New York site.

BOS_EventBroker# show message-vpn Trading_VPN replication

Flags Legend:
A - Admin State (U=Up, D=Down)
C - Config State (A=Active, S=Standby)
B - Local Bridge State (U=Up, Q=Queue Unbound, D=Down, -=N/A)
R - Remote Bridge State (U=Up, D=Down, -=N/A)
Q - Queue State (U=Up, D=Down, -=N/A)
S - Sync Replication Eligible (Y=Yes, N=No, -=N/A)
M - Reject Msg When Sync Ineligible (Y=Yes, N=No)
T - Transaction Replication Mode (A=Async, S=Sync, -=N/A)

Message VPN                      A C W B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U A N - U U - N A
BOS_EventBroker#

When the process is complete, a ‘Y’ under the ‘S’ column indicates that synchronous Replication is eligible for the Message VPN Trading_VPN.
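If you capture this output while waiting, the flag columns can be read programmatically. The Python sketch below parses a hand-written sample in the column layout shown above, with the S flag set to Y for illustration. The helper names are hypothetical, the layout may vary by event broker type and version, and the parser assumes the Message VPN name contains no spaces.

```python
SAMPLE = """\
Message VPN                      A C W B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U A N - U U Y N A
"""


def replication_flags(show_output, vpn_name):
    """Map each flag column letter to its value for one Message VPN row."""
    lines = [ln for ln in show_output.splitlines() if ln.strip()]
    header = next((ln for ln in lines if ln.startswith("Message VPN")), None)
    if header is None:
        raise LookupError("flag header row not found")
    columns = header.split()[2:]  # drop the words "Message" and "VPN"
    for ln in lines:
        parts = ln.split()
        if parts[0] == vpn_name and len(parts) == len(columns) + 1:
            return dict(zip(columns, parts[1:]))
    raise LookupError(f"no row found for {vpn_name}")


def sync_eligible(show_output, vpn_name):
    """Return True when the S (Sync Replication Eligible) flag is Y."""
    return replication_flags(show_output, vpn_name)["S"] == "Y"
```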

Step 4-6: If Necessary, Re-enable Replication

If you previously had to suspend replication because the replication queue overflowed, re-enable it. Enter the following CONFIG command:

solace(configure/message-vpn/replication/queue)# reject-msg-to-sender-on-discard

Step 4-7: Retrieving Replication Queue Spooled Messages from the Failed Site

Asynchronously spooled messages on the formerly active site (NY) can only be consumed when activity is failed back to that site.

To retrieve these messages, fail back to the formerly active site (NY) in the next maintenance window.
