Thursday, September 29, 2022

ZFSSA File Retention and Snapshot Retention provide protection for RMAN incremental merge backups.

File Retention Lock and Snapshot Retention Lock are great new features on ZFSSA that can help protect your backups from deletion and help you meet regulatory requirements. Whether it be an accidental deletion or a bad actor attempting to corrupt your backups they are protected.

In this post I am going to walk through how to implement File Retention and Snapshot Retention together to protect an RMAN incremental merge backup from being deleted . 

 Why do I need both? 

The first question you might have is why do I need both File Retention and Snapshot Retention to protect my backups ? RMAN incremental merge backups consists of 3 types of backup pieces.

 FILE IMAGE COPIES - Each day when the backup job is executed the same image copy of each datafile file is updated by recovering the datafile with an incremental backup. This moves the image copy of each datafile forward one day using the changed blocks from the incremental backup. The backup files containing the image copy of the datafiles needs to be updatable by RMAN.

INCREMENTAL BACKUP - Each day a new incremental backup (differential) is taken. This incremental backup contains the blocks that changed in the database files since the previous incremental backup. Once created this file does not change. 

 ARCHIVE LOG BACKUPS - Multiple times a day, archive log backups (also known as log sweeps) are taken. These backup files contain the change records for the database and do not change once written. 


 How to leverage both retention types 


 SNAPSHOT RETENTION can be used to create a periodic restorable copy of a share/project by saving the unique blocks as of the "snapshot" time a new snapshot is taken. Each of these periodic snapshots can be scheduled on a regular basis. With snapshot retention, the snapshots are locked from being deleted, and the schedule itself is locked to prevent tampering with the snapshots. This is perfect for ensuring we have a restorable copy of the datafile images each time they are updated by RMAN.

FILE RETENTION can be used to lock both the incremental backups and the archive log backups. Both types of backup files do not change once created and should be locked to prevent removal or tampering with for the retention period. 


 How do I implement this ? 

First I am going create a new project for my backups named "DBBACKUPS". Of course you could create 2 different projects. Within this project I am going to create 2 shares with different retention settings. 

 FULLBACKUP - Snapshot retention share 

 My image copy backups are going to a share that is protected with snapshot retention. The documentation on where to start with snapshot retention can be found here. In the example below I am keeping 5 days of snapshots, and I am locking the most recent 3 days of snapshots. This configuration will ensure that I have locked image copies of my database files for the last 3 days. 

 NOTE: Snapshots only contain the unique blocks since the last snapshot, but still provide a FULL copy of each datafile. The storage used to keep each snapshots is similar to the storage needed for each incremental backup. 

ZFSSA snapshot retention settings for /fullbackup




 DAILYBACKUPS - File Retention share 

My incremental backups and archivelog backups are going to a share with File Retention. The files (backup pieces) stored on this share will be locked from being modified or deleted. The documentation on where to start with File Retention can be found here

 NOTE: I chose the "Privileged override" file retention policy. I could have chosen "Mandatory" file retention policy if I wanted to lock down the backup pieces even further. 

 In the example below I am retaining all files for 6 days. 

ZFSSA file retention settings for /dailybackups



DAILY BACKUP SCRIPT 


Below is the daily backup script I am using to perform the incremental backup, and the recovery of the image copy datafiles with the changed blocks. You can see that I am allocating channels to "/fullbackup" which is the share configured with Snapshot Retention, and the image copy backups are going to this share. The incremental backups are going to "/dailybackups" which is protected with File Retention. 

run {
  ALLOCATE CHANNEL Z1 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z2 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z3 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z4 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z5 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  ALLOCATE CHANNEL Z6 TYPE DISK  format '/fullbackup/radb/DATA_%N_%f.dbf';
  
  backup
    section size 32G
    incremental level 1
    for recover of copy with tag 'DEMODBTEST' database FORMAT='/dailybackups/radb/FRA_%d_%T_%U.bkp';
  recover copy of database with tag 'DEMODBTEST' ;
  RELEASE CHANNEL Z1;
  RELEASE CHANNEL Z2;
  RELEASE CHANNEL Z3;
  RELEASE CHANNEL Z4;
  RELEASE CHANNEL Z5;
  RELEASE CHANNEL Z6;
}


 ARCHIVELOG BACKUP SCRIPT 

Below is the log sweep script that will perform the periodic backup of archive logs and send them to the "/dailybackups" share which has File Retention configured. 

run {
  ALLOCATE CHANNEL Z1 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z2 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z3 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z4 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z5 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';
  ALLOCATE CHANNEL Z6 TYPE DISK  format '/dailybackups/radb/ARCH_%U.bkup';

  
  backup
    section size 32G
    filesperset 32
    archivelog all;
  RELEASE CHANNEL Z1;
  RELEASE CHANNEL Z2;
  RELEASE CHANNEL Z3;
  RELEASE CHANNEL Z4;
  RELEASE CHANNEL Z5;
  RELEASE CHANNEL Z6;
}




 RESULT: 

This strategy will ensure that I have 5 days of untouched full backups available for recovery. It also ensures that I have 6 days of untouched archive logs, and incremental backups that can be applied if necessary. This will protect my RMAN incremental merge backups using a combination of Snapshot Retention for backup pieces that need to be updated, and File Retention for backup pieces that will not change.

Friday, July 29, 2022

OCI Database backups with retention lock

 OCI Object Storage provides both lifecycle rules and retention lock.  How to take advantage of both these features isn't always as easy as it looks.

 In this post I will go through an example customer request and how to implement a backup strategy to accomplish the requirements.

OCI Buckets

This image above gives you an idea of what they are looking to accomplish.

Requirements

  • RMAN retention is to keep a 14 day point in time recovery window
  • All long term backups beyond 14 days are cataloged as KEEP backups
  • All buckets are protected with a retention rule to prevent backups from being deleted before they become obsolete
  • Backups are moved to lower tier storage when appropriate to save costs.

Backup strategy

  • A full backup is taken every Sunday at 5:30 PM and this backup is kept for 6 weeks.
  • Incremental backups are taken Monday through Saturday at 5:30 PM and are kept for 14 days
  • Archive log sweeps are taken 4 times a day and are kept for 14 days
  • A backup is taken the 1st day of the month at 5:30 PM and this backup is kept for 13 months.
  • A full backup is taken following the Tuesday morning bi-weekly payroll run and is kept for 7 years
This sounds easy enough.  If you look at the image above you can what this strategy looks like in general. I took this strategy and mapped it to the 4 buckets, how they would be configured, and what they would contain. This is the image below.

OCI Object rules


Challenges


As I walked through this strategy I found that it involved some challenges. My goal was limit the number of full backups to take advantage of current backups.  Below are the challenges I realized exist with this schedule
  • The weekly full backup taken every Sunday is kept for longer than the incremental backups and archive logs. This caused 2 problems
    1. I wanted to make this backup a KEEP backup that is kept for 6 weeks before becoming obsolete.  Unfortunately KEEP backups are ignored as part of an incremental backup strategy. I could not create a weekly full backup that was both a KEEP backup and also be used as part of  incremental backup strategy.
    2. Since the weekly full backup is kept longer than the archive logs, I need to ensure that this backup contains the archive logs needed to defuzzy the backup without containing too many unneeded archive logs
  • The weekly full backup could fall on the 1st of the month. If this is the case it needs to be kept for 13 months otherwise it needs to be kept for 6 weeks.
  • I want the payrun backups to be immediately placed in archival storage to save costs.  When doing a restore I want to ignore these backups as they will take longer to restore.
  • When restoring and recovering the database within the 14 day window I need to include channels allocated to all the buckets that could contain those buckets. 14_DAY, 6_WEEK,  and 13_MONTH.

Solutions

I then worked through how I would solve each issue.

  1. Weekly full backup must be both a normal incremental backup and KEEP backup - After doing some digging I found the best way to handle this issue was to CHANGE the backup to be a KEEP backup with either a 6 week retention, or a 13 month retention from the normal NOKEEP type. By using tags I can identify the backup I want change after it is no longer needed as part of the 14 day strategy.
  2. Weekly full backup contains only archive logs needed to defuzzy - The best way to accomplish this task is to perform an archive log backup to the 14_DAY bucket immediately before taking the weekly full backup
  3. Weekly full backup requires a longer retention - This can be accomplished by checking if the the full backup is being executed on the 1st of the month. If it is the 1st, the full backup will be placed in the 13_MONTH bucket.  If it is not the 1st, this backup will be placed in the 6_WEEK bucket.  This backup will be created with a TAG with a format that can be used to identify it later.
  4. Ignore bi-weekly payrun backups that are in archival storage - I found that if I execute a recovery and do not have any channels allocated to the 7_YEAR bucket, it will may try to restore this backup, but it will not find it and move to the next previous backup. Using tags will help identify that a restore from the payrun backup was attempted and ultimately bypassed.
  5. Include all possible buckets during restore - By using a run block within RMAN I can allocate channels to different buckets and ultimately include channels from all 3 appropriate buckets.
Then as a check I drew out a calendar to walk through what this strategy would look like.

OCI backup schedule


Backup examples

Finally I am including examples of what this would look like.

Mon-Sat 5:30 backup job



dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup incremental level 1 database tag="incr_backup_${dg}" plus archivelog tag="arch_backup_${dg}";
   }
exit
EOD

Sat 5:30 backup job schedule

1) Clean up archive logs first



dg=$(date +%Y%m%d:%H)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup archivelog tag="arch_backup_${dg}";
   }
exit
EOD

2a) If this 1st of the month then execute this script to send the full backup to the 13_MONTH bucket


dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL monthly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
ALLOCATE CHANNEL monthly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
backup incremental level 1 database tag="full_backup_${dg}" plus archivelog tag="full_backup_${dg}";
   }
exit
EOD


2b) If this is NOT the 1st of the month execute this script and send the full backup to the 6_WEEK bucket

dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL weekly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL weekly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
backup incremental level 1 database tag="full_backup_${dg}" plus archivelog tag="full_backup_${dg}";
   }
exit
EOD


3a) If today is the 15th then change the  full backup to a 13 month retention


dg=$(date --date "-14 days" +%Y%m%d)
rman <<EOD
CHANGE BACKUPSET TAG="full_backup_${dg}" keep until time 'sysdate + 390';
EOD

3b) If today is NOT the 14th then change the  full backup to a 6 week retention


dg=$(date --date "-14 days" +%Y%m%d)
rman <<EOD
CHANGE BACKUPSET TAG="full_backup_${dg}" keep until time 'sysdate + 28';
EOD

Tuesday after payrun backup job 

1) Clean up archive logs first


dg=$(date +%Y%m%d:%H)
rman <<EOD
run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
backup archivelog tag="arch_backup_${dg}";
   }
exit
EOD

2) Execute the keep backup


dg=$(date +%Y%m%d)
rman <<EOD
run {
ALLOCATE CHANNEL yearly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/7_YEAR.ora)';
ALLOCATE CHANNEL yearly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/7_YEAR.ora)';
backup database tag="payrun_backup_${dg}" plus archivelog tag="full_backup_${dg}" keep until time 'sysdate + 2555';
   }
exit
EOD


Restore example

Now in order to restore, I need to allocate channels to all the possible buckets. Below is the script I used  to validate this with a "restore database validate" command.


run {
ALLOCATE CHANNEL daily1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL daily2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/14_DAY.ora)';
ALLOCATE CHANNEL weekly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL weekly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/6_WEEK.ora)';
ALLOCATE CHANNEL monthly1 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
ALLOCATE CHANNEL monthly2 DEVICE TYPE     'SBT_TAPE' PARMS  'SBT_LIBRARY=/home/oracle/cloudbackup/lib/libopc.so ENV=(OPC_PFILE=/home/oracle/ociconfig/config/13_MONTH.ora)';
restore database validate;
    }


Below is what I am seeing in the RMAN log because I picked a point in time where I want it to ignore the 7_YEAR backups.

In this case you can see that it tried to retrieve the Payrun backup but failed back to the previous backup with tag "FULL_073122". This is the backup I want.


channel daily1: starting validation of datafile backup set
channel daily1: reading from backup piece h613o4a4_550_1_1
channel daily1: ORA-19870: error while restoring backup piece h613o4a4_550_1_1
ORA-19507: failed to retrieve sequential file, handle="h613o4a4_550_1_1", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
   KBHS-07502: File not found
KBHS-01404: See trace file /u01/app/oracle/diag/rdbms/acmedbp/acmedbp/trace/sbtio_4819_140461854265664.log for det
failover to previous backup

channel daily1: starting validation of datafile backup set
channel daily1: reading from backup piece gq13o3rm_538_1_1
channel daily1: piece handle=gq13o3rm_538_1_1 tag=FULL_073122
channel daily1: restored backup piece 1
channel daily1: validation complete, elapsed time: 00:00:08


That's all there is to it. Tags are very help helpful to identify the correct backups.



Thursday, July 28, 2022

ZFSSA replicating locked snaphots to OCI for offsite backup

ZFSSA replication can be used to create locked offsite backups. In this post I will show you how to take advantage of the new "Locked Snapshot" feature of ZFSSA and the ZFS Image in OCI to create an offsite backup strategy to OCI.

ZFSSA Snapshot Replication
If you haven't heard of the locked snapshot feature of ZFSSA I blogged about here.  In this post I am going to take advantage of this feature and show you how you can leverage it to provide a locked backup in the Oracle Cloud using the ZFS image available in OCI.

In order to demonstrate this I will start by following the documentation to create a ZFS image in OCI as my destination.  Here is a great place to start with creating the virtual ZFS appliance in OCI.

Step 1 - Configure remote replication from source ZFSSA to ZFS appliance in OCI. 


By enabling the "Remote Replication" service with a named destination, "downstream_zfs" in my example, I can now replicate to my ZFS appliance in OCI.

zfssa remote replication


Step 2 -  Ensure the source project/share has "Enable retention policy for Scheduled Snapshots" turned on


For my example I created a new project "Blogtest".  On the "snapshots" tab I put a checkmark next to 
"Enable retention policy for Scheduled Snapshots".  By checking this, the project will adhere to preventing the deletion of any locked snapshots.  This property is replicated to the downstream and will cause the replicated project shares to also adhere to locking snapshots.  This can also be set at the individual share level if you wish to control the configuration of locked snapshots for individual shares.

Below you can see where this is enabled for snapshots created within the project.

ZFSSA Enable Snapshot Retention


Step 3 -  Create a snapshot schedule with "locked" snapshots


The next step is to create locked snapshots. This can be done at the project level (affecting all shares) or at the share level. In my example below I gave the scheduled snapshots a label "daily_snaps".  Notice for my example I am only keeping only 1 snapshot and I am locking the snapshot at the source. In order for the snapshot to be locked at the destination
  • Retention Policy MUST be enabled for the share (or inherited from the project).
  • The source snapshot MUST be locked when it is created
zfssa create snapshots

Step 4 -  Add replication to downstream ZFS in OCI

The next step is to add replication to the project  configuration to replicate the shares to my ZFS in OCI. Below you can see the target is my "downstream_zfs" that I configured in the "Remote Replication" service.
You can also see that I am telling the replication to "include snapshots", which are my locked snapshots, and also to "Retain user snapshots on target".  Under "Disaster Recovery" you can see that I am telling the downstream to keep a 30 day recovery point.  Even though I am only keeping 1 locked snapshot on the source, I want to keep 30 days of recovery on the downstream in OCI.

ZFSSA add replication

Step 5 -  Configure snapshots to replicate

In this step I am updating the replication action to replicate the locked scheduled snapshot to the downstream.  Notice that I changed the number of snapshots from 1 (on the source) to 30 on the destination, and I am keeping the snapshot retention locked. This will ensure that the daily locked snapshot taken on the source will replicate to the destination as a locked snapshot, and 30 snapshots on the destination will remain locked.  The 31st snapshot is no longer needed.

ZFSSA Autosnap replication


Step 6 -  Configure the replication schedule

The last step is to configure the replication schedule. This ensures that on a daily basis the snapshots that are configured to be replicated will be replicated regularly to the downstream. You can make this more aggressive than daily if you wish the downstream to be more in sync in the primary.  In my example below I configured the replication to occur every 10 minutes. This means that the downstream should have all updates as of 10 minutes ago or less. If I need to go back in time, I will have daily snapshots for the last 30 days that are locked and cannot be removed.

ZFSSA Replication Schedule

Step 7 -  Validate the replication


Now that I have everything configured I am going to take a look at the replicated snapshots on my destination.  I navigate to "shares" and I look under "replicat" and find my share. By clicking on the pencil and looking at the "snapshots" tab I can see my snapshot replicated over.

zfssa downstream copy

And when I click on the pencil next to the snapshot I can see that the snapshot is locked and I can't unlock it.

zfssa downstream locked



From there I can clone the snap and create a local snapshot, back it up to object storage, or reverse the replication if needed.