Friday, March 8, 2024

Autonomous Recovery Service Checklist

 Utilizing the Autonomous Recovery Service (ARS) for your Oracle Databases in OCI is the best method for backing up your Oracle databases.  In this post I will go through the steps required to successfully implement this service.  To learn more about this service you can find the documentation here.


1. Ensure your tenancy's resource limits are adequate

Before implementing the ARS, you first must see what the resource settings are in your tenancy. You want to make sure that the "Space Used for Recovery Window (GB)" and "Protected Database Count" allow for the number of databases, and backup size of the databases you want to utilize the service for.

Below is what you would see for the ARS. This is a screen shot from my free tenancy.  In your tenancy you should see what the current limits are.  When looking at the root compartment, this will show you the Limits and usage for the whole tenancy.


If you need to increase the limits for your tenancy click on the 3 dots to the right of the limit you want to increase. It will bring up a choice to "Open Support Request".  After choosing "Open Support Request" you will see a window that allows you to request a limit increase for your tenancy.

NOTE: There is a second choice when clicking on the 3 dots to "Create Quota Policy Stub". Using the stub displayed you can limit the quota of specific compartments.  This can be used to limit the usage for your "dev" compartment for example, ensuring there is space within your limits for production


2. Verify the policies for the tenancy

A) Set root compartment policies for service



Tenancy Polices for ARS
Policy Statement Purpose

Allow service database to manage recovery-service-family in tenancy

Enables the OCI Database Service to access protected databases, protection policies, and Recovery Service subnets within your tenancy.

Allow service database to manage tagnamespace in tenancy

Enables the OCI Database Service to access the tag namespace in a tenancy.

Allow service rcs to manage recovery-service-family in tenancy

Enables Recovery Service to access and manage protected databases, Recovery Service subnets, and protection policies within your tenancy.

Allow service rcs to manage virtual-network-family in tenancy

Enables Recovery Service to access and manage the private subnet in each database VCN within your tenancy. The private subnet defines the network path for backups between a database and Recovery Service.

Allow group admin to manage recovery-service-family in tenancy

Enables users in a specified group to access all Recovery Service resources. Users belonging to the specified group can manage protected databases, protection policies, and Recovery Service subnets.


B) Allow users (in group) to manage the Recovery Service


Group Policy Statement by Compartment
Policy Statement Create In Purpose

Allow group {group name} to manage recovery-service-policy in compartment {location}

Compartment that owns the protection policies. Enables all users in a specified group to create, update, and delete protection policies in Recovery Service.


C) Allow users (in group) to manage the required subnet for the Recovery Service


Group Policy Statement by Compartment
Policy StatementCreate InPurpose

Allow Group {group name} to manage recovery-service-subnet in compartment {location}

Compartment that owns the Recovery Service subnets.Enables all users in a specified group to create, update, and delete Recovery Service subnets.


3. Configure Network Resources for Recovery Service

The Recovery Service uses Private endpoints to control backup traffic between your database and the recovery service.  Below is the architecture.



Each Recovery service subnet needs to be created within the VNC where your database resides.  

The minimum size of the subnet is /24 (256 IP addresses).  You can create a new subnet, or use an preexisting subnet in you database VCN.  This subnet must be IPv4.

Security rules (implemented through Security Lists, or Network Security Groups) for the private subnet must include stateful ingress rules to allow destination ports 8005 and 2484.

NOTE: You can use a public subnet, but it is not recommended for security reasons.

This private subnet must be registered as a Recovery Service Subnet.

Checklist for Security rules (Security List or NSG)

1. Rule 1 - Ingress. Allow HTTPS traffic from anywhere

  • Stateless: No (All rules must be stateful)
  • Source Type: CIDR
  • Source CIDR : CIDR of the VCN where the database resides
  • IP Protocol: TCP
  • Source Port Range: All
  • Destination Port Range: 8005

2. Rule 2 - Ingress. Allow SQLNet traffic from anywhere

  • Stateless: No (All rules must be stateful)
  • Source Type: CIDR
  • Source CIDR : CIDR of the VCN where the database resides
  • IP Protocol: TCP
  • Source Port Range: All
  • Destination Port Range: 2484
NOTE: If your VCN restricts network traffic between subnets, ensure to add an egress rule for ports 2484, and 8005 from the database subnet to the Recovery Service subnet that you create.

3. Register the subnet with recovery service


Under Oracle Database --> Database Backups you need to click on "Recovery Service Subnets" and register the Recovery Service Subnet.



4. Ensure the Recovery Service Subnet can communicate with Oracle services.

The Recovery Service Subnet that you registered needs to communicate with the Recovery Service. In order to access the service, the routing table for this subnets needs to include "All IAD Services In Oracle Services Network".


If all these pieces are in place you should be ready to successfully configure your database backups to go the Recover Service for backup.


Short checklist

  1. Check your limits and quotas for the recovery service
  2. Create the policies for the Recovery Service, and groups (users) to manage the recovery service
  3. Create the subnet for the Recovery Service making sure you have the correct security settings, and the subnet has access to Oracle services
  4. Register the subnet as the Recovery Service Subnet.


Wednesday, February 14, 2024

DB Script management through pre-authenticated URLs

 Pre-authenticated URLs in OCI are fast becoming one of my favorite features of using object storage.  In this blog post I will explain how I am using them for both:

  • Storing the current copy of my backup scripts and dynamically pulling it from my central repository
  • uploading all my logs files to a central location
Pre-authenticated URL creation

PROBLEM:


The problem I was trying to solve, is that I wanted to create a script to run on all my database nodes to create a weekly archival backup.
Since I have databases that are both Base DB databases, and ExaCS I found that I was continuously making changes to my backup script.  Sometimes it was checking something different in my environment, and sometimes it was improving the error checking.
Each time I made a change, I was going out to every DB host and copying the new copy of my script.
Eventually I realized that Pre-authenticated URLs could not only help me ensure all my DB hosts are running the current copy of my backup script, they could be the central log location.

Solution:


Solution #1 - Script repository


The first problem I wanted to solve, was that I wanted to configure a script repository that I could dynamically pull the most current copy of my scripts from. Since I am running in OCI, I was looking for a "Cloud Native" solution rather than using NFS mounts that are shared across all my DB hosts.
To complicate things, I have databases that are running in different tenancies.

Step #1 - Store scripts in a bucket

The first step was to create a bucket in OCI to store both the scripts and logs.  Within that bucket, under "More Actions" I chose "Create New Folder" and I created 2 new folders, "logs" and "scripts".
Then within the scripts folder I uploaded by current scripts
  • rman.sh - My executable script that will set the environment and call RMAN
  • backup.rman - My RMAN script that contains the RMAN command to backup my database.

Step #2 - Create a Pre-Authenticated Request

The next step was to create a Pre-Authenticated request on the "scripts" folder.  Next to the scripts folder I clicked on the  3 dots and chose "Create Pre-Authenticated Request".
On the window that came up, I changed the expiration to be 1 year in the future (the default is 7 days).  I chose the "Objects with prefix" box so that I could download any scripts that I put in this folder to the DB hosts.  I also made sure the "Access Type" is "Permit object reads on those with specified prefix".
I did not chose "Enable Object Listing".
These settings will allow me to download the scripts from this bucket using the Pre-Authenticated URL only.  From this URL you will not be able to list the objects, or upload any changes.


Step #3 - Create wrapper script to download current scripts

Then using the Pre-Authenticated URL in a wrapper script, I download the current copies of the scripts to the host and then executed my execution script (rman.sh) with a parameter.

Below you can see that I am using curl to download my script (rman.sh) and storing it my local script directory (/home/oracle/archive_backups/scripts).  I am doing the same thing for the RMAN command file.
Once I download the current scripts, I am executing the shell script (rman.sh) .


curl -X GET https://{my tenancy}.objectstorage.us-ashburn-1.oci.customer-oci.com/p/{actual URL is removed }/n/id20skavsofo/b/bgrenn/o/scripts/rman.sh --output /home/oracle/archive_backups/scripts/rman.sh
curl -X GET https://{my tenancy}.objectstorage.us-ashburn-1.oci.customer-oci.com/p/{actual URL is removed }/n/id20skavsofo/b/bgrenn/o/scripts/backup.rman --output /home/oracle/archive_backups/scripts/backup.rman


/home/oracle/archive_backups/scripts/rman.sh $1


Solution #2 - Log repository

The second problem I wanted to solve was to make it easy review the execution of my scripts.  I don't want to go to each DB host and look at the log file.  I want to have the logs stored in a central location that I can check.  Again Pre-Authenticated URLs to the rescue !

Step #1 - Create the Pre-Authenticated URL

In the previous steps I already create a "logs" folder within the bucket. In this step I want to create a Pre-Authenticated URL like I did for the scripts, but in this case I want to use it to store the logs.
Like before I chose "Create Pre-Authenticated Request" for the "logs" folder.
This time, I am choosing "Permit object writes to those with the specified prefix". This will allow me to write my log files to this folder in the bucket, but not list the logs, or download any logs.


Step #2 - Upload the log output from my script

The nice thing was once I implemented Solution #1, and had all of my DB nodes already downloading the current script.  I updated the script to add an upload to object storage of the log file and they will all use my new script.
In my script I already had 2 variables set
  • NOW - The current date in "yyyymmdd" format
  • LOGFILE - The name of the output log file from my RMAN backup.
Now all I had to do was to add a curl command to upload my log file to the bucket.

Note I ma using the NOW variable to create a new folder under "logs" with the data so that my script executions are organized by date.

curl --request PUT --upload-file /home/oracle/archive_backups/logs/${LOGFILE} https://{My tenancy}.objectstorage.us-ashburn-1.oci.customer-oci.com/p/{URL removed}/n/id20skavsofo/b/bgrenn/o/logs/${NOW}/${LOGFILE}

BONUS


If I wanted to get fancy I could have put my LOGS in a new bucket, and configured  a lifecycle management rule to automatically delete logs after a period of time from the bucket.

Tuesday, January 9, 2024

RMAN create standby database - Restore or Duplicate ?

RMAN create standby database - Are you like me and use "restore database" for large databases, or like most people (based on my Linkedin poll) and use "duplicate for standby"? 

The table below shows you the 3 main differences between the 2 methods.


This post started with a discussion within my team around which method you use. I, being of the "restore database" camp, didn't realize how commonly used "duplicate for standby" is. 
I have also dug through the documentation, and there is no common method that is mentioned. Even the 21c documentation for creating a standby database doesn't mention using the duplicate command.
I also was pointed to a MOS note that goes through creating a standby directly from the primary across multiple nodes, and with encryption.  Creating a Physical Standby database using RMAN restore database from service (Doc ID 2283978.1)

Well in this post, I will explain why "restore database" has been my preference. 

NOTE : If you are creating a standby database that is encrypted and the source database is not (creating a standby database in OCI for example) then I have instructions at the end of this post for how to use "Restore Database" to create a hybrid standby database.

Duplicate database for standby


From the poll I ran, this is the most common way to create a standby database.  It is probably the simplest way also because a lot of the configuration of the standby database is done automatically as part of the automated process.
Below is the simplified steps to perform this process.

PRE work

  1. Create simple initfile on the standby host.  The real SPFILE will be brought over as part of the duplication process.  This may contain location parameters for datafiles and redo logs if different from the primary.
  2. Create directories on the standby host.  This includes the audit directory, and possibly the database file directories if they are different from the host.
  3. Startup nomount.

Duplicate 

The duplicate process automatically performs these major steps using the standby as an auxiliary instance.

  1.  Create an SPFILE. The process creates an SPFILE for the standby and sets parameters for the standby.
  2. Shutdown/Startup standby database. This will use the newly created SPFILE during the rest of the processing
  3. Restore backup controlfile for standby database. The controlfile for the standby database is put in place, and the spfile is updated to it's location
  4. Mount controlfile . Mount the controlfile that was restored
  5. Restore database . Restore the datafiles files for the CDB and PDBs to their new location on the standby
  6. Switch datafile . Uses the new location of the datafiles that were restored.
  7. Create standby redo logs.
  8. Set parameters for standby database. The parameters necessary to communicate with the primary database are set.
  9. Put standby in recover mode . By this time, you should have set the primary database to communicate with the standby database.

NOTES

If you noticed above, I highlighted the second step which forces a shutdown/startup of the standby database. Because of this step, it is not possible to use this method and restore across nodes in a RAC database.  This can cause the duplicate operation to take much longer for larger databases.
Then in step #5 you can see that the "Restore Database" is automatic in the processing and it is not possible to perform a "restore as encrypted" if you are migrating to OCI from a non-TDE database.  The duplicate process does support "restore as encrypted", but only for creating a new Database, not a standby database.

Restore Database


This is the method that I've always used.  There is no automation, but it gives you much more control over the steps.  

PRE work

  1. Restore copy of prod SPFILE to standby host.  For this process, it doesn't matter if it is an intifile or spfile.  In this file you set all the parameters that are needed for the standby database to communicate with the primary and store datafiles/logfiles in the correct location.
  2. Create directories on the standby host.  This includes the audit directory, and possibly the database file directories if they are different from the host.
  3. Startup nomount.
  4. Create copy of primary controlfile for standby. This will be used for the standby database, and should contain the backup catalog  of the primary database, and the RMAN settings including the  channel definitions.
  5. Copy standby controlfile to standby host. The controlfile is copied to the standby host, and may be put in ASM at this point. Ensure the spfile points to the controlfile (and/or srvctl).
  6. Alter database mount.  Mount the controlfile. 
  7. Start up ALL nodes in the RAC cluster in mount mode.  This will allow you to restore the database across ALL nodes in the RAC cluster, and include all the networking from these nodes.  For a large database hosted on multiple DB nodes this can make a HUGE difference when restoring the database.
  8. Create (or copy) TDE wallet.  If the standby database is going to be TDE, then include the wallet if the primary is TDE, or create a new wallet and key if the standby database is going to be TDE.

Restore Database 

The restore process is a manual process

  1.  RMAN Connect to database (and possibly RMAN catalog). Connect to the database and make sure you have access to the backups. For ZDLRA this may mean connecting to the RMAN catalog.
  2. Restore Database (as encrypted). This will restore the database to the new location.  With Restore Database, the database can be encrypted during the restore operation.  With 19c it is supported to have the standby database be encrypted without the primary database being encrypted (Hybrid dataguard).
  3. Switch datafile . Uses the new location of the datafiles that were restored.
  4. Recover database. This will use the archive logs that are cataloged to bring the standby database forward
  5. Create standby redo logs.
  6. Set parameters for standby database. The parameters necessary to communicate with the primary database are set.
  7. Put standby in recover mode . By this time, you should have set the primary database to communicate with the standby database.


NOTES

With the restore database, there are 2 sections I highlighted and these are the advantages that I love about using this method.
  • RMAN is restoring across multiple nodes in a RAC cluster which can make the restore operation much faster.
  • Restore as encrypted allows you take a database that may have TDE partially implemented, or not implemented and create a new standby database that is encrypted. With the duplicate method, TDE would have to be implemented separately.
If you are restoring a VERY large database (200 TB for example) that was not TDE from object storage to the Exadata Cloud Service, both of these advantages can make a HUGE difference when creating a standby database.

Comparison

The chart below compares the the differences between "Duplicate Database" and "Restore Database".

WARNING: When using a ZDLRA for backups, it is NOT recommended to use the "Restore Database" to clone a database as a new copy. Registering the restored copy can cause issues with the RMAN catalog because the "restore database" leaves entries in the RC_SITE table.



Data Guard Hybrid Cloud Configuration

The ability to create a hybrid cloud configuration was introduced in Version 19.16 and there is a great blog post from Glen Hawkins explaining this feature.
This feature allows you have your Primary database remain unencrypted (no ASO license), but still have the standby database in OCI be TDE encrypted.

In this section I want to talk about how you can use "Restore Database as Encrypted" to implement this configuration quickly.

If you want to implement this feature using "Duplicate for standby" you have to separately encrypt the datafiles once they are restored in OCI.  This can be done online, or offline, but it is still a time consuming task.

Prepare the primary and future standby databases

The first step is prepare the primary database and future standby database by creating a wallet file and setting encryption keys.  There is a great video put together by Peter Wahl (PM for TDE and OKV) that goes through a lot of the steps.

Below is a summary of the steps you need to perform.  You can follow along the steps in Peter's video and I will point out where in the video you will find each step.

  • Create the directories on the primary (3:40) -  Directories are specified in the video and need to be created on all nodes in a RAC cluster.
  • Create the directories on the standby database (4:18) -Directories are specified in the video and need to be created on all nodes in a RAC cluster.
  • Set the wallet_root in the primary (4:25) - This is set in the SPFILE only
  • Set tablespace_encryption to decrypt_only on primary (4:40) -  This is set in the SPFILE only
  • Set the default algorithm to AES256 on primary (4:50) - This is set in the SPFILE only
  • Set wallet_root on standby, tablespace_encryption to auto_enable, and default algorithm on standby --  This is set in the initfile that you create prior to performing the restore.  This step is different from the video because there is no standby at this point.
  • Bounce the primary database (5:50) - This can be in a rolling manner.
  • Create a password protected wallet on the primary (7:25) - This gets created in the default location specified from WALLET_ROOT
  • Create an auto open wallet on the primary (7:30) - This will make it automatically open for the database.
  • Set the encryption keys in the primary (7:40) - The keys will be created so that they can be used when restoring the datafiles on the standby database.
  • Copy the wallets from the primary to the standby (7:55) - This provides the standby database with the keys to encrypt.