Tuesday, March 22, 2022

Backup Anywhere offers Expanded Replication for High Availability and More Flexibility








The previous release of the Zero Data Loss Recovery Appliance software (19.2.1.1.2) includes three exciting new features for replication.

  • Backup Anywhere - Providing the ability to change roles (upstream vs downstream).
  • Read Only Replication - Providing seamless migration to a different Recovery Appliance.
  • Request Only Replication - Providing a High Availability option for backups.

Backup Anywhere

Backup Anywhere provides even more options for HADR (High Availability/Disaster Recovery) with the ability to redirect backups and redo to another Recovery Appliance. In addition, Backup Anywhere provides the ability to perform a role reversal, removing the fixed concept of upstream/downstream. As the name implies, when replicating between two or more Zero Data Loss Recovery Appliances, you can switch which Recovery Appliance receives backups from your protected databases.

With Backup Anywhere you configure two Recovery Appliances as a pair and create replication servers that point to each other. The metadata synchronization ensures that backups are replicated to the paired appliance and that the Recovery Appliance pair stays in sync.

NOTE: In order to use Backup Anywhere you must use the new REPUSER naming convention of REPUSER_FROM_<source>_TO_<destination>.
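
To make the naming convention concrete, here is a minimal Java sketch that builds the two user names for a pair. The class name, the appliance names, and the choice to uppercase the names are my assumptions for illustration; only the REPUSER_FROM_<source>_TO_<destination> pattern comes from the note above.

```java
import java.util.Locale;

// Hypothetical helper: builds a replication user name following the
// REPUSER_FROM_<source>_TO_<destination> convention.
class RepUserName {

    // Uppercasing is an assumption (unquoted Oracle user names are stored in uppercase).
    static String build(String source, String destination) {
        return "REPUSER_FROM_" + source.toUpperCase(Locale.ROOT)
             + "_TO_" + destination.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Made-up appliance names; a pair needs one user for each direction.
        System.out.println(build("zdlrany", "zdlralon"));
        System.out.println(build("zdlralon", "zdlrany"));
    }
}
```

Because the two appliances point at each other, each direction gets its own user name.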

For my example, the diagram below depicts an architecture with three Zero Data Loss Recovery Appliances. The primary databases in New York send backups to the Recovery Appliance in the New York Data Center. The Recovery Appliance in the New York Data Center replicates backups to the Recovery Appliance in the London Data Center. And finally, the Recovery Appliance in the London Data Center replicates backups to the Recovery Appliance in Singapore.

New York --> London --> Singapore



But what happens if I want to change which Recovery Appliance I am sending my backups to? With Backup Anywhere I can change the Recovery Appliance receiving backups, and the flow of replicated backups will be taken care of automatically. The Recovery Appliances seamlessly change the direction of the replication stream based on which Recovery Appliance is currently receiving the backups, and still ensure that backups on the three Zero Data Loss Recovery Appliances are synchronized and available.

Singapore --> London --> New York.


 


Read Only Replication

This is my favorite new feature included in the latest Recovery Appliance release. Read Only allows you to easily migrate your backups to a new Recovery Appliance while leaving the older backups still available.

Replication normally synchronizes the upstream catalog with the downstream catalog AND ensures that backups are replicated to the downstream. With Read Only Replication, only the synchronization occurs. The upstream Recovery Appliance (typically the new RA) knows about the backups on the downstream Recovery Appliance (the old RA). If a restore needs a backup that is not on the upstream Recovery Appliance, the upstream will pull the backup from the downstream.

The most common use case is retiring older pieces of equipment, but Read Only Replication can be used for additional use cases.

  • Migrating backups to a new datacenter
  • Migrating backups for a subset of databases from an overloaded Recovery Appliance to a new Recovery Appliance to balance the workload

Replacing an older Recovery Appliance

In this example I want to replace the current Recovery Appliance (ZDLRAOLD) with a new Recovery Appliance (ZDLRANEW). During this transition period I want to ensure that backups are always available to the protected databases. This example will show the migration of backups from ZDLRAOLD to ZDLRANEW. I am keeping 30 days of backups for my databases and I am starting the migration on September 1.

Step #1 - September 1, configure replication from ZDLRAOLD to ZDLRANEW

Create a replication server from ZDLRAOLD to ZDLRANEW and add the policies for the databases to the replication server. This will replicate the most current level 0 (FULL) backup onto ZDLRANEW for all databases without changing the backup location from the protected databases.



Once you have ensured that all databases have replicated a level 0 backup to ZDLRANEW you can remove the replication server from ZDLRAOLD which will stop the replication.

Step #2 - September 2, configure Read Only replication from ZDLRANEW to ZDLRAOLD

Create a replication server from ZDLRANEW to ZDLRAOLD. Add the policies for all databases to the replication server and ensure that the read only flag is set when adding each policy.

 

PROCEDURE add_replication_server (
   replication_server_name IN VARCHAR2,
   protection_policy_name IN VARCHAR2,
   skip_initial_replication IN BOOLEAN DEFAULT FALSE,
   read_only IN BOOLEAN DEFAULT FALSE,
   request_only IN BOOLEAN DEFAULT FALSE);
 

Note: The Read Only flag must be set when adding the policy to the replication server to ensure backups are NOT replicated from ZDLRANEW to ZDLRAOLD.
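
As an illustration of setting the flag, here is a minimal Java sketch that builds the PL/SQL call with read_only => TRUE. The parameter names come from the procedure signature above; the class name, the replication server name, and the policy name are hypothetical. The BOOLEAN is written as a literal inside an anonymous block, and the two names are left as bind placeholders.

```java
// Hypothetical helper: builds an anonymous PL/SQL block that adds a policy
// to a replication server in Read Only mode.
class ReadOnlyReplication {

    // The two "?" placeholders are for the replication server name and the
    // protection policy name, in that order.
    static String addPolicyReadOnlyBlock() {
        return "BEGIN "
             + "dbms_ra.add_replication_server("
             + "replication_server_name => ?, "
             + "protection_policy_name => ?, "
             + "read_only => TRUE); "
             + "END;";
    }

    public static void main(String[] args) {
        // Made-up names, shown only to illustrate what would be bound.
        String replicationServer = "ZDLRANEW_TO_ZDLRAOLD";
        String policy = "GOLD_POLICY";
        System.out.println(addPolicyReadOnlyBlock());
        System.out.println("bind 1 = " + replicationServer + ", bind 2 = " + policy);
    }
}
```

With a JDBC connection to ZDLRANEW, the block could be run through con.prepareCall(...), binding the two names with setString(1, ...) and setString(2, ...) and then calling execute().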

 


 

Step #3 - September 3, configure the protected databases to back up to ZDLRANEW.

At this point ZDLRANEW should contain at least 1 full backup for all databases, and the incremental backups to ZDLRANEW will begin on September 3rd. ZDLRANEW will now contain backups from September 1 (when replication began) up to the most current Level 0 virtualized backup taken. ZDLRAOLD will contain backups from August 4 until September 2nd, the last day the protected databases sent backups to ZDLRAOLD before being switched to ZDLRANEW.



Step #4 - September 4+, ZDLRANEW contains all new backups and old backups age off ZDLRAOLD

Below is a snapshot of what the backups would look like 15 days later on September 15th.  Backups are aging off of ZDLRAOLD and ZDLRANEW now contains 15 days of backups.
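
The dates in this walkthrough follow from simple date arithmetic on the 30-day window. Here is a minimal Java sketch of that arithmetic; the class name and the year are hypothetical, and it assumes the oldest retained backup is simply "today minus 30 days", which matches the August 4 date quoted for September 3.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper that reproduces the retention dates used in this example.
class RetentionWindow {
    static final int RETENTION_DAYS = 30;

    // Oldest backup date still retained on the given day (assumed: today - 30 days).
    static LocalDate oldestRetained(LocalDate today) {
        return today.minusDays(RETENTION_DAYS);
    }

    // Number of days of backups on the new appliance, counting both end dates.
    static long daysOnNew(LocalDate firstBackupOnNew, LocalDate today) {
        return ChronoUnit.DAYS.between(firstBackupOnNew, today) + 1;
    }

    public static void main(String[] args) {
        int year = 2021;                                  // arbitrary year for illustration
        LocalDate sept1 = LocalDate.of(year, 9, 1);       // replication to ZDLRANEW began
        LocalDate sept3 = LocalDate.of(year, 9, 3);
        LocalDate sept15 = LocalDate.of(year, 9, 15);

        System.out.println(oldestRetained(sept3));        // 2021-08-04
        System.out.println(oldestRetained(sept15));       // 2021-08-16
        System.out.println(daysOnNew(sept1, sept15));     // 15
    }
}
```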



 

Step #5 - September 15, Restore backups

To restore the protected database using a point in time you would connect the protected database to ZDLRANEW and ZDLRANEW would provide the correct virtual full backup regardless of its location.

1. If the Full backup prior to the point-in-time is on ZDLRANEW, it is restored directly from there.

2. If the Full backup is NOT on ZDLRANEW, it will get pulled from ZDLRAOLD through ZDLRANEW back to the protected database.

The location of the backups is transparent to the protected database, and ZDLRANEW manages where to restore the backup from.
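
The two cases above can be pictured with a small Java sketch. This is only a model of the decision, not how the Recovery Appliance implements it; the class name, the method, and the assumption that ZDLRANEW holds every backup taken on or after a single "first date" are mine.

```java
import java.time.LocalDate;

// Hypothetical model of where a full backup is served from during the migration.
class RestoreSource {

    // Assumes ZDLRANEW holds backups taken on or after firstDateOnNew,
    // and ZDLRAOLD holds the earlier ones.
    static String locate(LocalDate fullBackupDate, LocalDate firstDateOnNew) {
        if (fullBackupDate.isBefore(firstDateOnNew)) {
            return "ZDLRAOLD (pulled through ZDLRANEW)";
        }
        return "ZDLRANEW";
    }

    public static void main(String[] args) {
        LocalDate firstDateOnNew = LocalDate.of(2021, 9, 1);   // arbitrary year
        System.out.println(locate(LocalDate.of(2021, 9, 10), firstDateOnNew));
        System.out.println(locate(LocalDate.of(2021, 8, 20), firstDateOnNew));
    }
}
```

Either way, the protected database only ever connects to ZDLRANEW.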



Step #6 - September 30, Retire ZDLRAOLD

At this point the new Recovery Appliance ZDLRANEW contains 30 days of backups and the old Recovery Appliance ZDLRAOLD can be retired.



  

Request Only Mode

 

Request Only Mode is used when Data Guard is present and both the Primary database and the Data Guard database are backing up to a local Recovery Appliance. The two Recovery Appliances synchronize only the metadata; no backup pieces are actively replicated. But, in the event of a prolonged outage of either Recovery Appliance, this feature provides the ability to fill gaps by replicating backups from its paired Recovery Appliance.

To implement this feature, replication servers are configured on both Recovery Appliances, and the policies are added to the replication server specifying REQUEST_ONLY=TRUE.

 

PROCEDURE add_replication_server (
   replication_server_name IN VARCHAR2,
   protection_policy_name IN VARCHAR2,
   skip_initial_replication IN BOOLEAN DEFAULT FALSE,
   read_only IN BOOLEAN DEFAULT FALSE,
   request_only IN BOOLEAN DEFAULT FALSE);
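
Since the configuration has to be done on both Recovery Appliances, here is a minimal Java sketch that prints the call for each side of the pair. The parameter names come from the signature above; the class name, the replication server names, and the policy name are hypothetical.

```java
// Hypothetical helper: shows the Request Only call that each appliance in a pair would run.
class RequestOnlyPair {

    // Anonymous PL/SQL block with request_only => TRUE; the two "?" placeholders are
    // the replication server name and the protection policy name.
    static String addPolicyRequestOnlyBlock() {
        return "BEGIN "
             + "dbms_ra.add_replication_server("
             + "replication_server_name => ?, "
             + "protection_policy_name => ?, "
             + "request_only => TRUE); "
             + "END;";
    }

    public static void main(String[] args) {
        // Made-up names: each appliance has a replication server pointing at its pair.
        String[][] sides = {
            {"SFO appliance", "REP_SFO_TO_NYC"},
            {"NYC appliance", "REP_NYC_TO_SFO"}
        };
        String policy = "GOLD_POLICY";
        for (String[] side : sides) {
            System.out.println("On the " + side[0] + ", bind (" + side[1] + ", " + policy + ") into:");
            System.out.println("  " + addPolicyRequestOnlyBlock());
        }
    }
}
```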
 

Below is my environment, configured and running in normal mode. I have my primary database in San Francisco, and my standby database in New York. Both databases, Primary and Standby, are backing up to the local Recovery Appliance in their respective data centers. Request Only Mode is configured between the two Recovery Appliances.



 

To demonstrate what happens when a failure occurs, I will assume that the Recovery Appliance in the SFO datacenter is down for a period of time.  In this scenario, backups can no longer be sent to the SFO Recovery Appliance, but Data Guard Redo Traffic still occurs to the standby database in New York, and the standby database in New York is still backing up locally to the Recovery Appliance in New York.



When the SFO appliance comes back online, it will synchronize its backup information with that on the NYC Recovery Appliance. The SFO appliance will then request, from the NYC appliance, datafile backups and any controlfile backups that are older than 48 hours.

NOTE: The assumption is that a new backup will occur locally over a faster LAN network and fill any gaps within the last 48 hours. The backups requested from its pair will be transferred over a slower WAN and fill any gaps older than 48 hours.
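
The 48-hour cutoff can be expressed as a small Java sketch. This only models the rule as described in the note; the class and method names are hypothetical and the timestamps are made up.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the 48-hour rule for filling backup gaps.
class GapFillRule {
    static final Duration LOCAL_WINDOW = Duration.ofHours(48);

    // True when the backup is older than 48 hours, so it would be requested
    // from the paired appliance instead of being covered by the next local backup.
    static boolean requestedFromPair(Instant backupTime, Instant now) {
        return Duration.between(backupTime, now).compareTo(LOCAL_WINDOW) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-03-22T12:00:00Z");               // made-up "now"
        System.out.println(requestedFromPair(now.minus(Duration.ofHours(72)), now)); // true
        System.out.println(requestedFromPair(now.minus(Duration.ofHours(12)), now)); // false
    }
}
```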

If Real-Time redo is configured, the protected databases will immediately begin the archived log gap fetch process and fill any gaps in archived logs on the SFO appliance that are available on the protected databases. The SFO appliance will also check for new logs to be requested from the NYC appliance once per hour over the next 6 hours. This gives time for the local archived log gap fetch to run via LAN, which is faster than replicating logs via WAN from NYC.

HADR Bonus Feature: Since the SFO appliance recovery catalog is immediately synchronized with the NYC recovery catalog, backup pieces on the NYC Recovery Appliance are available for recovery.  With this capability you have full recovery protection as soon as the catalog synchronization completes.

 



 

 



This ensures that the SFO Recovery Appliance will be able to provide a short Recovery Point Objective without waiting for the next backup job to occur.

All of this happens transparently and quickly returns the Recovery Appliance to the expected level of protection for the database backups.

 

For more details on implementing different replication modes, refer to the Administrator’s Guide.

 

 

 


Tuesday, February 8, 2022

Managing your ZDLRA replication queue remotely

With the rise of cybercrime, more and more companies are looking at an architecture with a second backup copy that is protected with an airgap. Below is the common architecture that I am seeing.


In this post I will walk through an example of how to implement a simple Java program that performs the tasks necessary to manage the airgap for a ZDLRA that is implemented in a cyber vault (DC1 Vault in the picture).  Feel free to use this as a starting point to automate the process.

Commands

There are 3 commands that I need to be able to execute remotely:

  • PAUSE  - This will pause the replication server that I configured
  • RESUME - This will resume the replication server that I configured
  • QUERY  - This will query the queue on the upstream to determine how much is left in the queue.

First, however, I need to configure the parameters to execute the calls.

Config file (airgap.config).

I created a config file to customize the script for my environment. Below are the parameters that I needed to connect to the ZDLRA and execute the commands.
  • HOST               - This is the name of the SCAN listener on the upstream ZDLRA.
  • PORT               - This is the SQL*Net port being used to connect to the upstream ZDLRA
  • SERVICE_NAME       - Service name of the database on the upstream ZDLRA
  • USERNAME           - The username to connect to the upstream database
  • PASSWORD           - Password for the user. Feel free to encrypt this in Java.
  • REPLICATION_SERVER - Replication server to manage

Below is what my config file looks like.

airgap.host=oracle-19c-test-tde
airgap.port=1521
airgap.service_name=ocipdb
airgap.username=bgrenn
airgap.password=oracle
airgap.replication_server=replairgap


Java code (airgap.java).

Java snippet start

The start of the Java code imports the classes I need and sets up my class.


import java.sql.*;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.FileInputStream;
import java.util.Date;
import java.util.Properties;

// Create an airgap class
public class airgap {

   private Properties prop = new Properties();


Java snippet get properties

The first method will get the airgap properties from the property files so that I can use them in the rest of the methods.

// Create a get_airgap_properties method
  public void get_airgap_properties()
        {
                String fileName = "airgap.config";
                try (FileInputStream fis = new FileInputStream(fileName)) {
                    prop.load(fis);
                } catch (FileNotFoundException ex) {
                    System.out.println("cannot find config file airgap.config");
                } catch (IOException ex) {
                    System.out.println("unknown issue finding config file airgap.config");
                }
        }



Java snippet pause replication server

The code below will connect to the database and execute DBMS_RA.PAUSE_REPLICATION_SERVER


// Create a pause_replication  method
  public void pause_replication()
        {
                try     {
                        //Loading driver
                        Class.forName("oracle.jdbc.driver.OracleDriver");

                        //creating connection
                        Connection con = DriverManager.getConnection
                                        ("jdbc:oracle:thin:@//"+
                                         prop.getProperty("airgap.host")+":"+
                                         prop.getProperty("airgap.port")+"/"+
                                         prop.getProperty("airgap.service_name"),
                                         prop.getProperty("airgap.username"),
                                         prop.getProperty("airgap.password"));

                        CallableStatement cs=con.prepareCall("{call dbms_ra.pause_replication_server(?)}");

                        //Set IN Parameters
                        String in1 = prop.getProperty("airgap.replication_server");
                        cs.setString(1,in1);

                        cs.execute();   //executing the procedure call (it returns no result set)


                        con.close();    //closing connection
                        System.out.println("replication server '"+ prop.getProperty("airgap.replication_server")+"' paused");
                        }
                catch(Exception e)      {
                        e.printStackTrace();
                                        }
        }



Java snippet resume replication server

The code below will connect to the database and execute DBMS_RA.RESUME_REPLICATION_SERVER


// Create a resume_replication  method
  public void resume_replication()
        {
                try     {
                        //Loading driver
                        Class.forName("oracle.jdbc.driver.OracleDriver");

                        //creating connection
                        Connection con = DriverManager.getConnection
                                        ("jdbc:oracle:thin:@//"+
                                         prop.getProperty("airgap.host")+":"+
                                         prop.getProperty("airgap.port")+"/"+
                                         prop.getProperty("airgap.service_name"),
                                         prop.getProperty("airgap.username"),
                                         prop.getProperty("airgap.password"));

                        CallableStatement cs=con.prepareCall("{call dbms_ra.resume_replication_server(?)}");

                        //Set IN Parameters
                        String in1 = prop.getProperty("airgap.replication_server");
                        cs.setString(1,in1);

                        cs.execute();   //executing the procedure call (it returns no result set)


                        con.close();    //closing connection
                        System.out.println("replication server '"+ prop.getProperty("airgap.replication_server")+"' resumed");
                        }
                catch(Exception e)      {
                        e.printStackTrace();
                                        }
        }


Java snippet query replication server

The Java code below will query the replication queue in the upstream ZDLRA and return 4 columns
  • REPLICATION SERVER - name of the replication server
  • TASKS QUEUED - Number of tasks in the queue to be replicated
  • TOTAL GB QUEUED - Amount of data in the queue
  • MINUTES IN QUEUE - The number of minutes the oldest replication piece has been in the queue.
The last piece of information can be very useful to tell you how current the replication is. With real-time redo, the queue may never be empty.

// Create a queue_select method
  public void queue_select()
        {
                try     {
                        //Loading driver
                        Class.forName("oracle.jdbc.driver.OracleDriver");

                        //creating connection
                        Connection con = DriverManager.getConnection
                                        ("jdbc:oracle:thin:@//"+
                                         prop.getProperty("airgap.host")+":"+
                                         prop.getProperty("airgap.port")+"/"+
                                         prop.getProperty("airgap.service_name"),
                                         prop.getProperty("airgap.username"),
                                         prop.getProperty("airgap.password"));

                        Statement s=con.createStatement();      //creating statement

                        ResultSet rs=s.executeQuery("select replication_server_name,"+
                                                    "       count(*)  tasks_queued,"+
                                                    "       trunc(sum(total)/1024/1024/1024,0) AS TOTAL_GB_QUEUED,"+
                                                    "       round("+
                                                    "         (cast(current_timestamp as date) - cast(min(start_time) as date))"+
                                                    "             * 24 * 60"+
                                                    "         ) as queue_minutes "+
                                                    "from RA_SBT_TASK "+
                                                    "    join ra_replication_config on (lib_name = SBT_library_name) "+
                                                    "          where archived = 'N' "+
                                                    "group by replication_server_name");   //executing statement

                        System.out.println("Replication Server,Tasks Queued,Total GB Queued,Minutes in Queue");

                        while(rs.next()){
                                System.out.println(rs.getString(1)+","+
                                                   rs.getInt(2)+","+
                                                   rs.getInt(3)+","+
                                                   rs.getString(4));
                                        }

                        con.close();    //closing connection
                        }
                catch(Exception e)      {
                        e.printStackTrace();
                                        }
        }



Java snippet main section

Below is the main section, and as you can see you can pass one of the 3 parameters mentioned earlier.





  public static void main(String[] args)
        {

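         airgap airgap = new airgap();   // Create an airgap object

         if (args.length != 1) {         // Exactly one command is expected
                 System.out.println("parameter must be one of 'resume','pause' or 'query'");
                 return;
         }

         airgap.get_airgap_properties();      // Load the settings from airgap.config
         switch(args[0]) {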

                case "resume":
                        airgap.resume_replication();      // Call the resume_replication() method
                        break;
                case "pause":
                        airgap.pause_replication();      // Call the pause_replication() method
                        break;
                case "query":
                        airgap.queue_select();      // Call the queue_select() method
                        break;
                default:
                         System.out.println("parameter must be one of 'resume','pause' or 'query'");
                        }
        }
}


Executing the Java code (airgap.class).

Now if you take the snippets above and put them in a file named airgap.java, you can compile them into a class file.

javac airgap.java
This creates a class file airgap.class

In order to connect to my oracle database, I downloaded the jdbc driver.

"ojdbc8.jar"

Now I can execute it with the 3 parameters 

$ java -Djava.security.egd=file:/dev/../dev/urandom -cp ojdbc8.jar:. airgap pause
replication server 'replairgap' paused

$ java -Djava.security.egd=file:/dev/../dev/urandom -cp ojdbc8.jar:. airgap resume
replication server 'replairgap' resumed

$ java -Djava.security.egd=file:/dev/../dev/urandom -cp ojdbc8.jar:. airgap query
Replication Server,Tasks Queued,Total GB Queued,Minutes in Queue
ra_replication_config,4,95,58


It's that easy to create a simple java program that can manage your replication server from within an Airgap.
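
If you want to automate the decision of when to pause, one option is to interpret the "query" output first. Here is a minimal Java sketch that parses one CSV line in the format printed above and checks how old the oldest queued piece is. The class, the method names, and the 60-minute threshold are hypothetical; choosing a threshold instead of waiting for an empty queue is my assumption, based on the point that with real-time redo the queue may never be empty.

```java
// Hypothetical helper that interprets one CSV line printed by "airgap query".
class QueueStatus {
    final String replicationServer;
    final int tasksQueued;
    final int totalGbQueued;
    final int minutesInQueue;

    QueueStatus(String replicationServer, int tasksQueued, int totalGbQueued, int minutesInQueue) {
        this.replicationServer = replicationServer;
        this.tasksQueued = tasksQueued;
        this.totalGbQueued = totalGbQueued;
        this.minutesInQueue = minutesInQueue;
    }

    // Expects: Replication Server,Tasks Queued,Total GB Queued,Minutes in Queue
    static QueueStatus parse(String csvLine) {
        String[] f = csvLine.trim().split(",");
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 comma-separated fields: " + csvLine);
        }
        return new QueueStatus(f[0], Integer.parseInt(f[1]),
                               Integer.parseInt(f[2]), Integer.parseInt(f[3]));
    }

    // Checks the age of the oldest queued piece rather than queue emptiness.
    boolean isCurrentWithin(int maxMinutes) {
        return minutesInQueue <= maxMinutes;
    }

    public static void main(String[] args) {
        QueueStatus status = QueueStatus.parse("ra_replication_config,4,95,58");
        System.out.println(status.tasksQueued + " tasks, " + status.totalGbQueued + " GB queued");
        System.out.println("current within 60 minutes: " + status.isCurrentWithin(60));
    }
}
```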


Thursday, December 23, 2021

Cataloging backups and recovering an Oracle Database from the OCI object store

This is the fourth and final post of a multi-part blog series walking through how to copy your TDE encrypted on-premises Oracle Database to an OCI instance in the Oracle Cloud. This blog post will focus on how to restore your database from the object store when the backup pieces are not available from your controlfile.





There are a few reasons why this might be the case.

  • The backups were written to the ZDLRA directly.
  • You are using an RMAN catalog, and they have aged off the controlfile.
  • They are "keep" backups which will be stored in the RMAN catalog.
  • You had to rebuild the controlfile, and lost history of backups.
Whatever the reason, there is a way to find out what backups are in the object store for your database, and you will be able to catalog them.

NOTE: You can use this same script to delete old backups directly if you've lost your catalog entries.

When you download the Oracle Cloud Backup installation zip file and execute the "oci_install.jar" command to download the library, you will find 5 extra files in the /lib directory alongside the "libopc.so" file that is used by the RMAN channel. The 2 we are going to use are
  • odbsrmt.py             --> python script to manage the contents of the object store bucket
  • python_readme.txt --> Documentation for how to use the above python script.

Step #1 Execute odbsrmt.py to get a listing of your backup pieces.

NOTE: The python script uses python 2.x and will not work with python 3.x.  Python 3.x is typically the default version in your path, and you might have to find the 2.x version on your system. For my system this means executing "python2" rather than "python"

If I execute the script without any parameters, I can see what parameters are expected.



[oracle@oracle-19c-test-tde lib]$ python2 odbsrmt.py
usage: odbsrmt.py [-h] --mode
                  {report,rman-listfile,garbage-collection,delete,recall}
                  [--ocitype {classic,swift,bmc,archive}]
                  [--credential CREDENTIAL] [--token TOKEN] --host HOST
                  [--base BASE] [--forcename FORCENAME]
                  [--format {text,xml,json}] [--dbid DBID]
                  [--container CONTAINER] [--dir DIR] [--prefix PREFIX]
                  [--untildate UNTILDATE] [--exclude_deferred]
                  [--thread THREAD] [--proxyhost PROXYHOST]
                  [--proxyport PROXYPORT] [--tocid TOCID] [--uocid UOCID]
                  [--pubfingerprint PUBFINGERPRINT] [--pvtkeyfile PVTKEYFILE]
                  [--skip_check_status] [--debug]
odbsrmt.py: error: argument --mode is required

Now let's go through the most common parameters I am going to use to report on my backups




And now to execute the command to see some of the report.


python2  odbsrmt.py --mode report --ocitype bmc  --host https://objectstorage.us-ashburn-1.oraclecloud.com --dir /home/oracle/ocicloud/report --base mydbreport --pvtkeyfile  /home/oracle/ocicloud/myprivatekey.ppk --pubfingerprint 6d:f9:57:d5:ff:b1:c0:98:81:90:1e:6e:08:0f:d0:69 --tocid ocid1.tenancy.oc1..aaaaaaaanz4trskw6jm57cz2fztoasatto3i6z4h33gzfb3pmei5vvnoq --uocid ocid1.user.oc1..aaaaaaaae2mlwyke4gvd7kzxv5zxgg3k2dlcwvubv7vjy6jvbgsaouxq --container migest_backups  --dbid 301925655


And this will give me the following output in my report file.

FileName
Container                Dbname         Dbid        FileSize          LastModified                BackupType                  Incremental  Compressed   Encrypted
220h9q5f_66_1_1
migest_backups           OCITEST        301925655   72876032          2021-12-21 19:37:33         ArchivedLog                 false        true         true
230h9q5g_67_1_1
migest_backups           OCITEST        301925655   75759616          2021-12-21 19:37:32         ArchivedLog                 false        true         true
240h9q5g_68_1_1
migest_backups           OCITEST        301925655   54263808          2021-12-21 19:37:12         ArchivedLog                 false        true         true
250h9q5g_69_1_1
migest_backups           OCITEST        301925655   48496640          2021-12-21 19:36:58         ArchivedLog                 false        true         true
260h9q9n_70_1_1
migest_backups           OCITEST        301925655   159645696         2021-12-21 19:42:46         Datafile                    true         true         true
270h9q9n_71_1_1
migest_backups           OCITEST        301925655   408682496         2021-12-21 19:47:04         Datafile                    true         true         true
280h9q9n_72_1_1
migest_backups           OCITEST        301925655   524288            2021-12-21 19:37:46         Datafile                    true         true         true
290h9q9n_73_1_1
migest_backups           OCITEST        301925655   56885248          2021-12-21 19:39:37         Datafile                    true         true         true
2a0h9q9v_74_1_1
migest_backups           OCITEST        301925655   235667456         2021-12-21 19:45:05         Datafile                    true         true         true
2b0h9qdi_75_1_1
migest_backups           OCITEST        301925655   233832448         2021-12-21 19:46:18         Datafile                    true         true         true
2c0h9qjb_76_1_1
migest_backups           OCITEST        301925655   52166656          2021-12-21 19:44:31         Datafile                    true         true         true
2d0h9qmk_77_1_1
migest_backups           OCITEST        301925655   1572864           2021-12-21 19:44:43         Datafile                    true         true         true
2e0h9qn3_78_1_1
migest_backups           OCITEST        301925655   34865152          2021-12-21 19:45:41         Datafile                    true         true         true
2f0h9qns_79_1_1
migest_backups           OCITEST        301925655   524288            2021-12-21 19:45:20         Datafile                    true         true         true
2g0h9qrg_80_1_1
migest_backups           OCITEST        301925655   262144            2021-12-21 19:47:14         ArchivedLog                 false        true         true
c-301925655-20211221-00
migest_backups           OCITEST        301925655   524288            2021-12-21 19:47:22         ControlFile SPFILE          false        true         true
Total Storage: 1.34 GB


You can see that this report contains  the backup pieces I need. 

I am going to use the script (below) and pass it the report name to create the commands to catalog the backup pieces.
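
As a rough illustration of what such a script needs to do, here is a minimal Java sketch (not the original script). It assumes the report layout shown above, where each backup piece name sits alone on its own line followed by a line of details; the class name and method are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator: turns a report listing into RMAN catalog commands.
class CatalogCommands {

    static List<String> fromReport(List<String> reportLines) {
        List<String> commands = new ArrayList<>();
        for (String raw : reportLines) {
            String line = raw.trim();
            // Skip blank lines and the "FileName" header line.
            if (line.isEmpty() || line.equals("FileName")) {
                continue;
            }
            // Detail lines, the column header, and "Total Storage" contain spaces;
            // a backup piece name is a single token.
            if (line.split("\\s+").length != 1) {
                continue;
            }
            commands.add("catalog device type 'sbt_tape' backuppiece '" + line + "';");
        }
        return commands;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CatalogCommands.java <report file>");
            return;
        }
        System.out.println("report file used for catalog scripts   : " + args[0]);
        for (String command : fromReport(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(command);
        }
    }
}
```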



And when I execute the above script passing my report file, it produces my commands to catalog the backup pieces.

report file used for catalog scripts   : mydbreport4701.lst


catalog device type 'sbt_tape' backuppiece '220h9q5f_66_1_1';
catalog device type 'sbt_tape' backuppiece '230h9q5g_67_1_1';
catalog device type 'sbt_tape' backuppiece '240h9q5g_68_1_1';
catalog device type 'sbt_tape' backuppiece '250h9q5g_69_1_1';
catalog device type 'sbt_tape' backuppiece '260h9q9n_70_1_1';
catalog device type 'sbt_tape' backuppiece '270h9q9n_71_1_1';
catalog device type 'sbt_tape' backuppiece '280h9q9n_72_1_1';
catalog device type 'sbt_tape' backuppiece '290h9q9n_73_1_1';
catalog device type 'sbt_tape' backuppiece '2a0h9q9v_74_1_1';
catalog device type 'sbt_tape' backuppiece '2b0h9qdi_75_1_1';
catalog device type 'sbt_tape' backuppiece '2c0h9qjb_76_1_1';
catalog device type 'sbt_tape' backuppiece '2d0h9qmk_77_1_1';
catalog device type 'sbt_tape' backuppiece '2e0h9qn3_78_1_1';
catalog device type 'sbt_tape' backuppiece '2f0h9qns_79_1_1';
catalog device type 'sbt_tape' backuppiece '2g0h9qrg_80_1_1';
catalog device type 'sbt_tape' backuppiece 'c-301925655-20211221-00';


Now in RMAN I can execute these commands to catalog the backup pieces from the OCI bucket.

Note: By using "untildate" you can control the dates that will be reported on.