Monday, December 23, 2019

Oracle - Alternate Destinations, FRA, and balancing 2 destinations


My previous post was on using Groups/Priorities when specifying an alternate destination for sending redo to a remote location. You can find it here.

This post covers 2 topics that come up whenever I talk about alternate destinations.

  • Does it balance 2 destinations with the same group/priority? And once it chooses a destination, does it remain "sticky" to that destination and ignore other destinations with the same group/priority?
  • Does my FRA know that both destinations can be considered the same for "shipped" status?


First I will show you how I configured my tests.

In my example, I actually have both a standby database and a Far Sync instance for my primary database.

bsg18   --> Primary database
bsg18d --> Dataguard database
bsg18f  --> Far Sync database

This is what my 2 remote destinations look like in my configuration.


log_archive_dest_2 ='service="bsg18d" ASYNC NOAFFIRM max_failure=0 db_unique_name="bsg18d" group=1 priority=1 valid_for=(online_logfile,all_roles)';

log_archive_dest_3 ='service="bsg18f" SYNC AFFIRM max_failure=1 db_unique_name="bsg18f" group=1 priority=1 valid_for=(online_logfile,all_roles)';


You can see that both Dataguard (bsg18d) and Far Sync (bsg18f) are in "group=1 priority=1".
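For reference, these strings are just parameter values, so applying them is an ALTER SYSTEM call. A minimal sketch, assuming an spfile (scope=both):

-- set both remote destinations into the same group/priority
alter system set log_archive_dest_2 =
  'service="bsg18d" ASYNC NOAFFIRM max_failure=0 db_unique_name="bsg18d" group=1 priority=1 valid_for=(online_logfile,all_roles)'
  scope=both;

alter system set log_archive_dest_3 =
  'service="bsg18f" SYNC AFFIRM max_failure=1 db_unique_name="bsg18f" group=1 priority=1 valid_for=(online_logfile,all_roles)'
  scope=both;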

Multiple Destinations with the same Group and Priority - Does it balance them?


First I'm going to look at where the logs are going.
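The listings below look like V$ARCHIVED_LOG output. This is a sketch of the kind of query that would produce them; the SHI and DEL headings are my guess at truncated ARCHIVED and DELETED columns:

select dest_id,
       sequence#,
       status,    -- A = available
       archived,  -- shows up as SHI below
       applied,
       deleted,   -- shows up as DEL below
       fal
  from v$archived_log
 order by sequence#;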

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES NO        NO  NO
         2       1752 A YES NO        NO  NO
         2       1753 A YES NO        NO  NO


I can see that they are going to DEST_2, which is my Dataguard instance. I am going to shut it down, do a few log switches, and see what happens.
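The steps themselves are simple. A sketch, assuming SYSDBA sessions on each side:

-- on the standby (bsg18d)
shutdown immediate;

-- on the primary (bsg18), force a few log switches
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;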

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         1       1751 A YES NO        NO  NO
         1       1752 A YES NO        NO  NO
         2       1752 A YES YES       NO  NO
         1       1753 A YES NO        NO  NO
         2       1753 A YES YES       NO  NO
         1       1754 A YES NO        NO  NO
         2       1754 A YES NO        NO  NO
         1       1755 A YES NO        NO  NO
         1       1756 A YES NO        NO  NO
         1       1757 A YES NO        NO  NO
         1       1758 A YES NO        NO  NO
         1       1759 A YES NO        NO  NO
         1       1760 A YES NO        NO  NO


I've included DEST_1 this time to show that my sequence# had moved forward, but all sending of redo to my standby had stopped.

Now I started it back up, and did a few log switches.

  DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES NO        NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 .....
         3       1754 A YES NO        NO  YES
         3       1752 A YES NO        NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO


After restarting the Dataguard database, I can see that the primary is now using DEST_3 (my Far Sync instance) instead of DEST_2 (Dataguard).

Now for a final test on this, I am going to STOP my Far Sync instance.

   DEST_ID  SEQUENCE# S SHI APPLIED   DEL FAL
---------- ---------- - --- --------- --- ---
         2       1751 A YES YES       NO  NO
         2       1752 A YES YES       NO  NO
         2       1753 A YES YES       NO  NO
         2       1754 A YES YES       NO  NO
         2       1761 A YES YES       NO  NO
         3       1755 A YES YES       NO  YES
         3       1758 A YES NO        NO  YES
 ....
         3       1754 A YES YES       NO  YES
         3       1752 A YES YES       NO  YES
         3       1762 A YES NO        NO  NO
         3       1763 A YES NO        NO  NO
         2       1764 A YES NO        NO  YES
         2       1765 A YES NO        NO  NO
         2       1766 A YES NO        NO  NO


I stopped bsg18f (Far Sync) when it was on sequence 1764. I can see that the database automatically switched to sending logs to bsg18d (Dataguard) when DEST_3 became unavailable.

This is exactly what I would expect to happen when I have 2 destinations with the same Group/Priority.
  • It chooses one of the 2 destinations and becomes "sticky" once chosen.
  • If the "sticky" destination becomes unavailable, it then automatically switches to the second destination.

FRA - If you use an FRA (Fast Recovery Area), how does it understand that my redo can go to an alternate location?

After all this, you can see from my previous tests that some logs were sent to DEST_2 and some were sent to DEST_3. Both of these were members of the same group.
Let's see if the FRA sees them as shipped (and therefore reclaimable under my retention policy).
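The output below matches the columns of V$RECOVERY_AREA_USAGE, so a simple query against that view should reproduce it:

-- FRA space usage by file type
select * from v$recovery_area_usage;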

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         21.78                     21.78             555          0
BACKUP PIECE                          7.54                       .45              38          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0



Perfect!  All the archive log space is reclaimable. When I have different remote destinations that are members of the same group, as long as the redo log was successfully shipped to one destination in the group, the log is eligible for deletion.
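For context, what makes archived logs in the FRA reclaimable is the RMAN archivelog deletion policy. A policy tied to "shipped" status would look something like this; this is a sketch of one common choice, not necessarily the exact policy I had configured:

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO SHIPPED TO ALL STANDBY;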

Thursday, December 19, 2019

ZDLRA alternate redo destination replaced with Groups/Priority in 12.2

Alternate destination is deprecated in 12.2 for remote locations.

This diagram shows the use of an alternate destination for sending logs.
An alternate destination can be local to the database (to be used if the disk location fills up), but it can also be used to change the remote destination that the logs get sent to.

Configuring an "ALTERNATE" for a local destination is still supported

Below is what this looks like from the 18C manual.

LOG_ARCHIVE_DEST_1='LOCATION=/disk1 MANDATORY MAX_FAILURE=1 
ALTERNATE=LOG_ARCHIVE_DEST_2'
 
LOG_ARCHIVE_DEST_STATE_1=ENABLE
 
LOG_ARCHIVE_DEST_STATE_2=ALTERNATE

LOG_ARCHIVE_DEST_2='LOCATION=/disk2 MANDATORY'

In this example, if /disk1 is not available (fills up, etc.), the database will switch to sending logs to /disk2 so that the database does not hang.
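You can watch the switch happen in V$ARCHIVE_DEST. A quick sketch of a status check:

-- which destination is active, and which is waiting as the alternate
select dest_id, dest_name, status, destination
  from v$archive_dest
 where dest_id in (1,2);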

HOWEVER, for remote destinations (AKA destinations using "SERVICE"), this changes with 12.2.

12.2 introduces the idea of "GROUP" and "PRIORITY" in place of a simple "ALTERNATE".
I'm sure this was added for Far Sync, but it also comes into play with the ZDLRA.

Using only "ALTERNATE" gives you the ability to switch to a single remote destination. Using GROUP and PRIORITY lets you set multiple destinations and decide the order in which they are used.

Now let's start with a simple example with FAR SYNC.
I have 2 locations. San Francisco (SF) is Primary with 2 local farsync servers, and my remote destination is New York (NYC).


I can group them together and set the priority that they use. I can even switch from sync to async.  Below is what the new syntax would look like for the destinations.

LOG_ARCHIVE_DEST_2='SERVICE=FARSYNC1 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_3='SERVICE=FARSYNC2 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_4='SERVICE=NYC ASYNC GROUP=1 PRIORITY=2'

This behaves in the following manner.


  • My database in San Francisco sends archive logs to EITHER of the two services specified as PRIORITY=1.  The log could go to FARSYNC1 or FARSYNC2 and get passed on to NYC.
  • If either of these 2 PRIORITY=1 destinations becomes unavailable, the other will continue to work.
  • If both of these 2 PRIORITY=1 destinations become unavailable, the redo will be sent to NYC as ASYNC.
  • If NYC is the destination (because of a double failure), it will switch back to either of the 2 FARSYNC services when they become available.
You can add up to 8 layers of PRIORITY and include a remote FARSYNC server if you would like.
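For example, putting a Far Sync in front of NYC would add a third priority layer. This is a hypothetical sketch; the NYC_FARSYNC service name is made up:

LOG_ARCHIVE_DEST_2='SERVICE=FARSYNC1 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_3='SERVICE=FARSYNC2 SYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_4='SERVICE=NYC_FARSYNC ASYNC GROUP=1 PRIORITY=2'
LOG_ARCHIVE_DEST_5='SERVICE=NYC ASYNC GROUP=1 PRIORITY=3'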

Now, what does this have to do with ZDLRA?

You can also use an alternate destination for redo traffic on the ZDLRA.  The alternate can be the downstream ZDLRA if replication is used, or it can be the upstream ZDLRA if store-and-forward is being used.  These are outlined in the ZDLRA admin guide under "Implementing Additional High Availability Strategies".

In the case of the ZDLRA you would use 2 destinations.
The first destination is the primary ZDLRA, and the second destination is the alternate ZDLRA.
The configuration would look like this.


LOG_ARCHIVE_DEST_2='SERVICE="ZDLRA_SF_SCAN:1521/ZDLRASF:dedicated" ASYNC GROUP=1 PRIORITY=1'
LOG_ARCHIVE_DEST_3='SERVICE="ZDLRA_NYC_SCAN:1521/ZDLRANYC:dedicated" ASYNC GROUP=1 PRIORITY=2'



This is an interesting change to the ALTERNATE setting on remote destinations.



Tuesday, November 26, 2019

Dataguard and LOG_ARCHIVE_MAX_PROCESSES


Any DBA who has set up Dataguard has seen this diagram of how the sending of logs to a standby database works.

What I wanted to go through is the use of LOG_ARCHIVE_MAX_PROCESSES and how it affects the sending of redo logs.

If you have set this parameter in your environment, it is typically because you are seeing issues with the database clearing out redo logs.
The primary reason this gets raised from the default (4) is to add more processes capable of clearing out redo logs and turning them into archive logs. The more processes running in parallel to clear out redo logs, the faster the database can move on to a new redo log.
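Raising it is a one-line change, and since the parameter is dynamic it takes effect immediately:

-- scope=both assumes an spfile
alter system set log_archive_max_processes=10 scope=both;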

But did you know the LOG_ARCHIVE_MAX_PROCESSES parameter also affects how many processes are used on your standby database to fill any gaps?

In my example I have a primary database (BSG18) which is on 18.3.0.  I also have a standby database (BSG18d).

In my primary database I set the parameter to 10.

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     10


I then shut down my Dataguard database and performed many log switches. I needed at least 10 log switches to demonstrate what happens.
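A quick way to generate those switches on the primary is a small PL/SQL loop; "archive log current" forces the switch and waits for the archive to complete:

-- force a dozen archived log switches to build a gap on the standby
begin
  for i in 1 .. 12 loop
    execute immediate 'alter system archive log current';
  end loop;
end;
/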

Now once I start my standby database back up, LOG_ARCHIVE_MAX_PROCESSES determines how many processes are used to fill the gap on my standby database.

Here is my query against V$DATAGUARD_PROCESS:

select name,
       pid "standby PID",
       role,
       client_pid "Primary DB PID",
       client_role
  from V$DATAGUARD_PROCESS
 where type='NET'
 order by 2;


NAME  standby PID              ROLE                     Primary DB PID CLIENT_ROLE     
----- ------------------------ ------------------------ -------------- ----------------
rfs   18116                    RFS async                          4165 async ORL multi 
rfs   26337                    RFS ping                           4153 gap manager     
rfs   26401                    RFS archive                        4159 archive gap     
rfs   26404                    RFS archive                        8752 archive gap     
rfs   26408                    RFS archive                        8738 archive gap     
rfs   26411                    RFS archive                        8750 archive gap     
rfs   26413                    RFS archive                        8740 archive gap     
rfs   26415                    RFS archive                        8748 archive gap     
rfs   26417                    RFS archive                        8742 archive gap     
rfs   26419                    RFS archive                        8744 archive gap     
rfs   26421                    RFS archive                        8746 archive gap     


If you look at the output, you can see all the pieces fit together.


Real Time redo process


The first process sends the redo information to the standby database in async mode, where it is written to the standby redo logs.


oracle    4165     1  0 Nov05 ?        00:05:57 ora_tt02_bsg18


We can see that on the primary database it is a "tt" process that starts up with the database.

oracle   18116     1  0 Nov12 ?        00:07:58 oraclebsg18d (LOCAL=NO)


and for the standby database, it is a sqlnet process.

Gap Manager process


Now let's look at the second process.

oracle    4153     1  0 Nov05 ?        00:00:26 ora_tt00_bsg18


We can see that on the primary database it is also a "tt" process that starts up with the database.

oracle   26337     1  0 Nov12 ?        00:00:18 oraclebsg18d (LOCAL=NO)

and for the standby database, it is a sqlnet process.


Gap fill processes


Now let's look at the remaining processes.


oracle    4159     1  0 Nov05 ?        00:00:14 ora_arc1_bsg18
oracle    8738     1  0 Nov05 ?        00:00:13 ora_arc2_bsg18
oracle    8740     1  0 Nov05 ?        00:00:14 ora_arc3_bsg18
oracle    8742     1  0 Nov05 ?        00:00:13 ora_arc4_bsg18
oracle    8744     1  0 Nov05 ?        00:00:13 ora_arc5_bsg18
oracle    8746     1  0 Nov05 ?        00:00:13 ora_arc6_bsg18
oracle    8748     1  0 Nov05 ?        00:00:13 ora_arc7_bsg18
oracle    8750     1  0 Nov05 ?        00:00:13 ora_arc8_bsg18
oracle    8752     1  0 Nov05 ?        00:00:13 ora_arc9_bsg18


When I look at the primary database, I can see 9 "arc" processes.
Why 9 processes? Because LOG_ARCHIVE_MAX_PROCESSES of 10 tells the primary database that it can use 10 processes to fill the gap on the standby.
The 1st process is the gap manager "tt", and the remaining 9 "arc" processes are used to send over the archive logs.

Increasing the LOG_ARCHIVE_MAX_PROCESSES can be used to increase the parallelization of sending archive logs to fill any gaps.
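If you want to confirm how many archiver processes are actually running on the primary, V$PROCESS shows them; a quick sketch:

-- list the ARCn background processes
select pname from v$process where pname like 'ARC%' order by pname;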