Oracle Database quarantines, or isolates, the recovery of transactions that could potentially cause a system crash. These transactions must be manually resolved by the DBA so that row locks are released.
About Redo Application
Database buffers in the buffer cache in the SGA are written to disk only when necessary, using a least-recently-used (LRU) algorithm. Because of the way that the database writer process uses this algorithm to write database buffers to datafiles, datafiles may contain some data blocks modified by uncommitted transactions and some data blocks missing changes from committed transactions.
Crash Recovery and Instance Recovery
Crash recovery is used to recover from a failure either when a single-instance database crashes or when all instances of an Oracle Real Application Clusters (Oracle RAC) database crash. Instance recovery refers to the case where a surviving instance recovers a failed instance in an Oracle RAC database.
The goal of crash and instance recovery is to restore the data block changes located in the cache of the dead instance and to close the redo thread that was left open. Instance and crash recovery use only online redo log files and current online datafiles.
Two potential problems can result if an instance failure occurs:
- Data blocks modified by a transaction might not be written to the datafiles at commit time and may only appear in the redo log. Therefore, the redo log contains changes that must be reapplied to the database during recovery.
- After the roll forward phase, the datafiles may contain changes that had not been committed at the time of the failure. These uncommitted changes must be rolled back to ensure transactional consistency. These changes were either saved to the datafiles before the failure or introduced during the roll forward phase.
To solve this dilemma, Oracle generally uses two separate steps to recover from a system failure: rolling forward with the redo log (cache recovery) and rolling back with the rollback or undo segments (transaction recovery).
Cache Recovery
The online redo log is a set of operating system files that record all changes made to any database buffer, including data, index, and rollback segments, whether the changes are committed or uncommitted. All changes to Oracle blocks are recorded in the online redo log.
The first step of recovery from an instance or disk failure is called cache recovery or rolling forward and involves reapplying all of the changes recorded in the redo log to the datafiles. Because rollback data is also recorded in the redo log, rolling forward also regenerates the corresponding undo segments.
Rolling forward proceeds through as many redo log files as necessary to bring the database forward in time. Rolling forward usually includes online redo log files (instance recovery or media recovery) and may include archived redo log files (media recovery only).
After rolling forward, the data blocks contain all committed changes. They may also contain uncommitted changes that were either saved to the datafiles before the failure or were recorded in the redo log and introduced during cache recovery.
Transaction Recovery
Undo tablespaces (in automatic undo management mode) contain undo segments that record the before-image of changes to the database. In database recovery, the undo blocks inside the undo segments roll back the effects of uncommitted transactions previously applied by the rolling forward phase.
After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This process of rolling back uncommitted transactions in the database is called transaction recovery.
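To watch transaction recovery while it runs, you can query the GV$FAST_START_TRANSACTIONS fixed view, which lists the transactions being rolled back and how many of their undo blocks have been applied. The query below is a minimal sketch; it assumes the view exposes the USN, SLT, SEQ, STATE, UNDOBLOCKSDONE, and UNDOBLOCKSTOTAL columns as described in the Oracle Database Reference, and the percentage column is only an illustrative calculation.

```sql
-- Progress of transaction recovery (rollback of dead transactions) on all instances.
-- STATE is expected to move through TO BE RECOVERED, RECOVERING, and RECOVERED.
SELECT inst_id,
       usn,
       slt,
       seq,
       state,
       undoblocksdone,
       undoblockstotal,
       -- NULLIF avoids a division by zero when no undo blocks are recorded
       ROUND(100 * undoblocksdone / NULLIF(undoblockstotal, 0), 1) AS pct_done
FROM   gv$fast_start_transactions
ORDER  BY inst_id, usn, slt, seq;
```

Running the query repeatedly during recovery should show UNDOBLOCKSDONE approaching UNDOBLOCKSTOTAL for each transaction as rollback proceeds.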
The following figure illustrates rolling forward and rolling back, the two steps necessary to recover from any type of system failure.
Figure 30-1 Rolling Forward and Rolling Back
Failure During Transaction Recovery
Transaction recovery can fail due to the following reasons:
- Physical data corruption of database blocks (ORA-01578, ORA-28304)
- Logical data corruption (ORA-00600)
- Memory corruption (ORA-00602, ORA-07445)
- State corruption (ORA-00600)
A failure during transaction recovery can be fatal to the entire database instance and can bring down the entire container database (CDB), including its pluggable databases (PDBs). If the system cannot recover all of its transactions, the unrecovered transactions continue to hold their row locks, which severely impacts critical business operations.
Starting with Oracle Database 23ai, transactions that fail to recover are quarantined and left unrecovered until the DBA can resolve the issue, which increases the availability of the database. The database developer is notified about the quarantined transaction and must take immediate action so that the row locks held by quarantined transactions can be released.
Transaction quarantines are maintained in a persistent data dictionary table inside the database. Therefore, you can manage quarantines from any RAC instance in the database.
When a DML operation tries to access rows locked by a quarantined transaction, error ORA-60451 is raised, because the DML operation cannot be executed while the rows are still locked.
Quarantined Transaction and Replication
Because Oracle Data Guard uses logical replication, quarantine metadata is not replicated to the standby server when using Oracle Data Guard. Therefore, the contents of transaction quarantine views such as DBA_QUARANTINED_TRANSACTIONS on the standby server may differ from the entries on the primary server.
When running with Active Data Guard (ADG), the replication is physical, which means that for the transaction quarantine feature, both the dead transaction and the catalog representation of the quarantine are replicated to the standby database.
30.2.1 Monitoring Quarantined Transactions
Alerts and data dictionary views warn the database developer of quarantined transactions.
Oracle Database warns DBAs of quarantined transactions in several ways, which include:
- ALERT_QUE: the transaction quarantine alert is sent to the persistent alert queue SYS.ALERT_QUE. This alert is automatically displayed in the data dictionary views DBA_OUTSTANDING_ALERTS and DBA_ALERT_HISTORY, as well as in Enterprise Manager Cloud Control and the AWR report.
- Attention log: introduced in Oracle Database 21c, the attention log contains information about critical and highly visible database events. Starting with Oracle Database 23ai, it also includes transaction quarantine information.
- Alert log: an incident is generated for the internal error and traced in the alert log. The DBA can monitor the quarantine incident in V$DIAG_ALERT_EXT.
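As a rough sketch of checking two of these channels from SQL, the queries below look for quarantine-related entries in DBA_OUTSTANDING_ALERTS and V$DIAG_ALERT_EXT. The text filter on 'quarantin' is an assumption: the exact wording of the quarantine alert and of the alert log message is not given here, so adjust the predicate to the messages your database actually produces.

```sql
-- Outstanding server-generated alerts (fed from SYS.ALERT_QUE).
-- ASSUMPTION: the alert REASON text mentions "quarantine".
SELECT creation_time,
       reason,
       suggested_action
FROM   dba_outstanding_alerts
WHERE  LOWER(reason) LIKE '%quarantin%'
ORDER  BY creation_time DESC;

-- Alert log entries from the last day.
-- ASSUMPTION: the incident message text mentions "quarantine".
SELECT originating_timestamp,
       message_text
FROM   v$diag_alert_ext
WHERE  LOWER(message_text) LIKE '%quarantin%'
AND    originating_timestamp > SYSTIMESTAMP - INTERVAL '1' DAY
ORDER  BY originating_timestamp DESC;
```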
The views DBA_QUARANTINED_TRANSACTIONS and CDB_QUARANTINED_TRANSACTIONS show all active quarantined transactions and provide the information needed to resolve each quarantine.
Table 30-3 DBA_QUARANTINED_TRANSACTIONS View Columns
Column | Datatype | Null? | Description |
---|---|---|---|
USN | NUMBER | Not Null | Undo segment number of the quarantined transaction. |
SLT | NUMBER | Not Null | Slot number of the quarantined transaction. |
SQN | NUMBER | Not Null | Sequence number of the quarantined transaction. |
UNDO_TSN | NUMBER | | Undo tablespace number for the quarantined transaction. |
TXN_START_SCN | NUMBER | | Start SCN of the quarantined transaction. |
INCIDENT_TIME | VARCHAR2(64) | | Timestamp when the incident happened. |
REASON | VARCHAR2(256) | | Reason why this transaction failed to recover. |
TRACE_FILE_NAME | VARCHAR2(4096) | | Name of the trace file that contains the reason and diagnosability information for this transaction's recovery failure. |
UBA_RDBA | NUMBER | | Block number of the current undo block being applied for rollback. |
UBA_SQN | NUMBER | | Undo block sequence number. |
UBA_RECORD_NUMBER | NUMBER | | Undo record number. |
UNDO_RECORD_OBJN | NUMBER | | Dictionary object number of the object (OBJN). |
UNDO_RECORD_OBJD | NUMBER | | Dictionary object number of the segment that contains the object (OBJD). |
PREV_UNDO_BLOCK_DBA | NUMBER | | Previous undo block address that was used for rollback. |
DATA_BLOCK_TSN | NUMBER | | Tablespace ID for the object. |
The DBA_QUARANTINED_TRANSACTIONS view can be joined with GV$TRANSACTION and GV$FAST_START_TRANSACTIONS to get the details of the transaction and its recovery progress. Note that GV$TRANSACTION loses its contents when a database instance restarts, because fixed views are not persistent. However, because transaction recovery begins after the instance restarts, GV$TRANSACTION still shows the progress of any active transaction recovery after the restart.
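The join described above might look like the following sketch. The join keys are an assumption inferred from column names: USN, SLT, and SQN in DBA_QUARANTINED_TRANSACTIONS are matched to XIDUSN, XIDSLOT, and XIDSQN in GV$TRANSACTION and to USN, SLT, and SEQ in GV$FAST_START_TRANSACTIONS. Outer joins are used because a quarantined transaction is left unrecovered and may not appear in either fixed view.

```sql
-- Quarantined transactions with any matching transaction and recovery details.
-- ASSUMPTION: transaction IDs line up across the three views as shown below.
SELECT q.usn,
       q.slt,
       q.sqn,
       q.reason,
       t.inst_id          AS txn_inst_id,
       t.status           AS txn_status,
       f.inst_id          AS rcv_inst_id,
       f.state            AS recovery_state,
       f.undoblocksdone,
       f.undoblockstotal
FROM   dba_quarantined_transactions q
LEFT JOIN gv$transaction t
       ON  t.xidusn  = q.usn
       AND t.xidslot = q.slt
       AND t.xidsqn  = q.sqn
LEFT JOIN gv$fast_start_transactions f
       ON  f.usn = q.usn
       AND f.slt = q.slt
       AND f.seq = q.sqn
ORDER BY q.usn, q.slt, q.sqn;
```

Because both fixed views are global (GV$), a transaction reported by more than one instance can produce more than one row; the INST_ID columns show where each row came from.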
30.2.2 Resolving Quarantined Transactions
The database developer will be alerted when a transaction quarantine is generated. Quarantines should be monitored and resolved quickly to prevent row locks from being held for a long time.
Quarantines can be monitored using DBA_QUARANTINED_TRANSACTIONS. The REASON column of the view shows why the transaction was quarantined. For example:
```sql
SQL> select usn, slt, sqn, reason, undo_record_objn from dba_quarantined_transactions;

   USN    SLT    SQN REASON                    UNDO_RECORD_OBJN
------ ------ ------ ---------------------- -------------------
     6     18     10 ORA-00600[ktubko_1]                  73646
     7     20     13 ORA-28304                            73650
```
After you identify the reason for each transaction quarantine (ORA-00600[ktubko_1] and ORA-28304 in the example above), refer to the primary MOS note for Automatic Transaction Quarantine (Doc ID 3005962.1), which provides detailed instructions for resolving the different causes of transaction quarantines.
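Before following the MOS note, it usually helps to know which object each quarantined transaction was modifying and where its trace file is. The sketch below assumes, based on the column description, that UNDO_RECORD_OBJN is the dictionary object number and therefore matches OBJECT_ID in DBA_OBJECTS.

```sql
-- Map each quarantined transaction to the object it was changing and its trace file.
-- ASSUMPTION: UNDO_RECORD_OBJN corresponds to DBA_OBJECTS.OBJECT_ID.
SELECT q.usn,
       q.slt,
       q.sqn,
       q.reason,
       o.owner,
       o.object_name,
       o.object_type,
       q.trace_file_name
FROM   dba_quarantined_transactions q
LEFT JOIN dba_objects o
       ON o.object_id = q.undo_record_objn
ORDER BY q.usn, q.slt, q.sqn;
```

The outer join keeps a quarantine visible even if its object number cannot be resolved, for example if the object has since been dropped.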
30.2.4 Transaction Quarantine Escalation
When the transaction quarantine limit for a PDB (3 by default) is reached, the PDB is automatically shut down on all RAC instances so that the database developer can resolve the issue. The other PDBs in the CDB are not affected.
Transaction quarantine is designed to help in cases when the failure, such as memory, data, or state corruption, is confined to a single transaction. That is, the inactive transaction that fails to recover is quarantined, other inactive transactions can be recovered, and there's no need to shut down the PDB or the CDB.
When failures happen across multiple transactions or span the entire PDB, such as physical corruption of multiple blocks, PDB SGA corruption, or logical data corruption due to an internal error, quarantining the failed inactive transaction recovery may or may not help. It depends on whether those failures share the same root cause, because recovering other inactive transactions might run into the same issue, and the system keeps running in an inconsistent state even after a few transactions are quarantined. This is especially dangerous when the failure is due to logical data corruption, because such corruption spreads over time. To prevent this, there is a transaction quarantine limit of three (3), after which the quarantine is escalated to the database level and the PDB is terminated using SHUTDOWN ABORT if archive logging is enabled for the PDB and it is feasible to shut down the PDB. Transaction recovery for the PDB is automatically disabled so that the database developer can correct the problems on the next PDB startup.
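Because the limit is counted per PDB, it can be useful to see how close each PDB is to it. The query below is a sketch that assumes it runs in CDB$ROOT, where CDB_QUARANTINED_TRANSACTIONS carries the CON_ID column common to CDB_ views. CDB_ views generally return rows only for PDBs that are currently open, so a PDB that an escalation has already shut down will not appear. The default limit of 3 is hard-coded for illustration.

```sql
-- Run in CDB$ROOT: quarantined transactions per open PDB.
-- ASSUMPTION: the default quarantine limit of 3 is in effect.
SELECT c.name        AS pdb_name,
       COUNT(*)      AS quarantined_txns,
       3 - COUNT(*)  AS remaining_before_limit
FROM   cdb_quarantined_transactions q
JOIN   v$containers c
       ON c.con_id = q.con_id
GROUP  BY c.name
ORDER  BY quarantined_txns DESC;
```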
When an escalation occurs, perform the following steps:
- Open the PDB.
- Query the view DBA_QUARANTINED_TRANSACTIONS to get information about the quarantined transactions.
- For each quarantined transaction in the database, resolve the cause of the transaction quarantine (Resolving Quarantined Transactions) and then drop the transaction quarantine (see Dropping Quarantined Transactions).
- Enable transaction recovery for the PDB.
To enable transaction recovery, use the command:
ALTER SYSTEM SET TRANSACTION_RECOVERY=ENABLED sid='*';
The SCOPE clause is optional. The default values for SCOPE are:
- For PDBs, the default value is SCOPE=BOTH.
- For CDB$ROOT, the default is SCOPE=BOTH if a server parameter file was used to start the database, and SCOPE=MEMORY if a text parameter file was used to start the database.
With these default SCOPE values, the command re-enables transaction recovery for automatic transaction quarantine.
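The escalation steps can be strung together roughly as follows. This is a sketch, not a complete procedure: pdb1 is a hypothetical PDB name, and the resolve-and-drop step is left as comments because its commands are covered by the MOS note and by Dropping Quarantined Transactions rather than here. Keep in mind that when SCOPE=MEMORY applies, the setting lasts only until the next instance restart.

```sql
-- Escalation recovery sketch. pdb1 is a HYPOTHETICAL PDB name.

-- Step 1: open the PDB (from CDB$ROOT).
-- In Oracle RAC, adding INSTANCES = ALL opens it on every instance.
ALTER PLUGGABLE DATABASE pdb1 OPEN;

-- Step 2: switch into the PDB and list its quarantined transactions.
ALTER SESSION SET CONTAINER = pdb1;

SELECT usn, slt, sqn, reason, undo_record_objn, trace_file_name
FROM   dba_quarantined_transactions
ORDER  BY usn, slt, sqn;

-- Step 3: resolve each cause (MOS Doc ID 3005962.1), then drop each
-- quarantine as described in Dropping Quarantined Transactions.
-- Those commands are intentionally not reproduced here.

-- Step 4: re-enable transaction recovery on all instances.
ALTER SYSTEM SET TRANSACTION_RECOVERY=ENABLED SID='*';

-- Optional: confirm the current value of the parameter.
SELECT name, value
FROM   v$parameter
WHERE  name = 'transaction_recovery';
```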
When transaction quarantines are escalated to the PDB, alerts are published to all the alert channels described in Monitoring Quarantined Transactions (SYS.ALERT_QUE, the attention log, and the alert log), so you can check any of these channels to determine whether an escalation occurred.