Thursday, March 5, 2026

Recovery Service failure checks

When using the Autonomous Recovery Service, there are some prerequisites that need to be met. I have a checklist that goes through these requirements, and you can find that checklist here.


This blog post will help you perform some basic debugging and demonstrate what errors you will see if you miss some of the steps.

This post is broken into the two places where you are likely to run into issues.

  1. Unable to Submit request. This can be caused by
    • Policy issues
    • Limits issues
  2. You submitted the backup, but it failed to configure the Recovery Service. This can be caused by
    • DNS issues with resolving FQDN used by Recovery Service
    • Routing/port issues accessing the Recovery Service or Object Storage

Unable to submit Autonomous Recovery Service as a backup location


Policies for the tenancy

The first step is to ensure that you have configured policies for the recovery service.  The easiest way to do this is by utilizing Policy Builder.

NOTE: There is a policy that grants access to the "ADMIN" group. If your administrator group is a different group, you would need to change that policy to reference your group instead.

Visible Issue

If policies are not configured properly, you will find that "Recovery Service" is greyed out as an option.


Limits for the Recovery Service

By default, if you are not in a multicloud environment, your paid tenancy will have a limit of
  • 10 databases
  • 10 TB of backup storage
If you are using multicloud, and your database is in a partner cloud, no limits are set by default.

This is the most common issue I see with multicloud.  You need to set the limit specifically for the multi-cloud subscription.

Visible Issue

If limits are not configured properly, you will find that "Recovery Service" is greyed out as an option.

Below the choice for "Recovery service", you will see that there is a warning, telling you that you have exceeded your limits.


Backup request fails when configuring Autonomous Recovery Service 

There are a few reasons why the backup request can fail. Below is a step-by-step check that may help you determine what caused the failure.

NOTE: Since the recovery service creates endpoints within your VCN, and those endpoints are removed after the failure, you need to make these checks immediately after a failure to best determine the cause.

Pre-test -  Check the connection to the Recovery Service endpoints.

The best way to test the connection to the Recovery Service endpoints is to log in as the oracle user and execute:

[command line prompt]$ tnsping dbrs

NOTE: I tested this on BaseDB where there is a single database. On an ExaDB VM you might see the DB name included in the TNS entry. If "tnsping dbrs" cannot find the alias, check the tnsnames.ora to find an entry that matches your database.
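If you are not sure which alias to use, listing the alias names in tnsnames.ora is a quick way to find it. Below is a minimal sketch. The function name list_tns_aliases is my own, and it assumes the file lives under $TNS_ADMIN or $ORACLE_HOME/network/admin and that each alias starts in the first column of a line.

```shell
#!/bin/bash
# Sketch: list the alias names defined in a tnsnames.ora file.
# Assumption: every alias starts in column 1 and is followed by "=".

list_tns_aliases() {
  # Default location is an assumption; pass a path as $1 to override it.
  local file="${1:-${TNS_ADMIN:-$ORACLE_HOME/network/admin}/tnsnames.ora}"
  # Keep only the lines that begin with an alias, then strip "=" and what follows.
  grep -E '^[A-Za-z0-9_.-]+[[:space:]]*=' "$file" | sed -E 's/[[:space:]]*=.*$//'
}

# Example (run as the oracle user):
#   list_tns_aliases | grep -i dbrs
```

Any alias containing "dbrs" is a good candidate to pass to tnsping.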

You should see a long output as tnsping attempts to contact the Recovery service using the endpoints.


[oracle@enbr ~]$ tnsping dbrs

TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2026 15:19:51

Copyright (c) 1997, 2025, Oracle.  All rights reserved.

Used parameter files:
/u01/app/oracle/product/19.0.0/dbhome_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION_LIST=(LOAD_BALANCE=off)(FAILOVER=on)(DESCRIPTION=(FAILOVER=on)(CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-2.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ZRCV_K8ATNS4SJQRYCB8TH8WYBKU0WB90))(SECURITY=(MY_WALLET_DIRECTORY=/opt/oracle/dcs/commonstore/wallets/enbr_nmk_iad/server_seps)))(DESCRIPTION=(FAILOVER=on)(CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-3.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-2.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-1.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ZRCV_ZIXYV66DA34OFQGRNHJODSUOR))(SECURITY=(MY_WALLET_DIRECTORY=/opt/oracle/dcs/commonstore/wallets/enbr_nmk_iad/server_seps))))

If you look at the "HOST" values in the connect string, you will see that they are Fully Qualified Domain Name (FQDN) entries. You can just pick one of the HOST values.
Those entries are associated with endpoints created in the subnet registered with the Autonomous Recovery Service.

An example name from my entries is:

raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com

I am in Ashburn, and this is associated with an endpoint in Ashburn.
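Picking the HOST values out of that long connect string by eye is error-prone. As a small sketch, the helper below (extract_hosts is a name I made up) reads any text on standard input and prints each distinct HOST value on its own line.

```shell
#!/bin/bash
# Sketch: pull every (HOST=...) value out of a connect string.

extract_hosts() {
  # Match "HOST=" up to the closing parenthesis, keep the value, de-duplicate.
  grep -o 'HOST=[^)]*' | cut -d= -f2 | sort -u
}

# Example:
#   tnsping dbrs | extract_hosts
```

The resulting list can be fed into the DNS and port checks later in this post.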

Below is the nslookup for this FQDN.

Notice the following two items:
  1. The nslookup is going to the private DNS resolver in my VCN using 169.254.169.254 for the DNS server.
  2. The FQDN is resolving to an IP address that is an endpoint in my Recovery Service Subnet.  My subnet's CIDR is 10.0.17.0/24 .

[oracle@br ~]$ nslookup raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com
Server:         169.254.169.254
Address:        169.254.169.254#53

Non-authoritative answer:
Name:   raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com
Address: 10.0.17.193
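The second check (does the resolved address belong to the Recovery Service subnet?) can also be scripted. This is only a sketch: answer_ip and ip_in_cidr are hypothetical helper names, the parsing assumes nslookup output shaped like the example above, and it handles IPv4 only.

```shell
#!/bin/bash
# Sketch: check that a resolved IPv4 address falls inside a subnet CIDR.

answer_ip() {
  # Read nslookup output on stdin and print the last answer address.
  # The DNS server line is skipped because its address carries "#53".
  awk '/^Address:/ && $2 !~ /#/ {print $2}' | tail -n 1
}

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a single integer.
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  # usage: ip_in_cidr <ip> <network>/<prefix-length>
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Example, using my subnet's CIDR:
#   ip=$(nslookup raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com | answer_ip)
#   ip_in_cidr "$ip" 10.0.17.0/24 && echo "in subnet" || echo "NOT in subnet"
```

Replace the CIDR with the one for your own Recovery Service subnet.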



Now let's execute "tnsping dbrs" again and look at what a failure looks like.

Used parameter files:
/u01/app/oracle/product/19.0.0/dbhome_1/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION_LIST=(LOAD_BALANCE=off)(FAILOVER=on)(DESCRIPTION=(FAILOVER=on)(CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp019-2.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ZRCV_K8ATNS4SJQRYCB8TH8WYBKU0WB90))(SECURITY=(MY_WALLET_DIRECTORY=/opt/oracle/dcs/commonstore/wallets/enbr_nmk_iad/server_seps)))(DESCRIPTION=(FAILOVER=on)(CONNECT_TIMEOUT=3)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-3.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-2.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484))(ADDRESS=(PROTOCOL=TCPS)(HOST=raiadp017-1.rs.br.us-ashburn-1.oraclecloud.com)(PORT=2484)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=ZRCV_ZIXYV66DA34OFQGRNHJODSUOR))(SECURITY=(MY_WALLET_DIRECTORY=/opt/oracle/dcs/commonstore/wallets/enbr_nmk_iad/server_seps))))

TNS-12545: Connect failed because target host or object does not exist


TNS-12545: Connect failed because target host or object does not exist

This is typically caused by a DNS lookup failure. Start by performing an "nslookup" of one of the hosts in the connect string.
  • Did the lookup use the 169.254.169.254 DNS server?
  • Did the name get resolved?
If it did not use this DNS server because you have overridden the default DNS resolver (in /etc/resolv.conf) with another DNS server, that DNS server is most likely not configured to forward these lookups back to the private DNS resolver on the VCN.
You can find more information on Private DNS resolvers here.
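To quickly see whether the host is still pointing at the VCN resolver, you can look at the nameserver lines in /etc/resolv.conf. A minimal sketch follows; uses_vcn_resolver is a hypothetical helper name, and it only inspects the file, so it will not detect other DNS overrides.

```shell
#!/bin/bash
# Sketch: check whether resolv.conf lists the VCN resolver (169.254.169.254).

uses_vcn_resolver() {
  local file="${1:-/etc/resolv.conf}"
  # Succeeds when some nameserver line points at 169.254.169.254.
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+169\.254\.169\.254([[:space:]]|$)' "$file"
}

# Example:
#   if uses_vcn_resolver; then
#     echo "Using the VCN resolver"
#   else
#     echo "Custom DNS in use: check that it forwards to the VCN resolver"
#     grep nameserver /etc/resolv.conf
#   fi
```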

If it did use 169.254.169.254 and you still see this error message, the service might have already removed the endpoints and their DNS entries. You need to try to configure the Recovery Service again, and immediately check with tnsping.

Being able to resolve the FQDN names returned by this command is essential for the service.

Hang 

If "tnsping dbrs" hangs, that means it was able to resolve the FQDN, but it is unable to reach the Recovery Service.
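Because a hang can leave you waiting at the prompt, it can help to wrap tnsping in a time limit and sort the result into the two cases described in this post. This is a sketch under assumptions: classify_tnsping is a name I made up, it relies on the GNU timeout command, and the 120 second default is an arbitrary value you may need to raise.

```shell
#!/bin/bash
# Sketch: run tnsping with a time limit and classify the outcome.

classify_tnsping() {
  # $1 = TNS alias (default dbrs), $2 = time limit in seconds (arbitrary default)
  local alias="${1:-dbrs}" limit="${2:-120}" out rc
  out=$(timeout "$limit" tnsping "$alias" 2>&1)
  rc=$?
  if [ "$rc" -eq 124 ]; then
    # GNU timeout exits with 124 when it had to stop the command.
    echo "HANG: FQDN resolved but the endpoint is not reachable, check ports and routing"
  elif grep -q 'TNS-12545' <<< "$out"; then
    echo "DNS: host not found, check DNS resolution of the HOST entries"
  else
    echo "OTHER: tnsping exited with code $rc, review the output below"
    echo "$out"
  fi
}

# Example:
#   classify_tnsping dbrs
```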

Let's find out why.

The script below will check remote connectivity when given the host and the port.

Let's check my host for ports 2484 and 8005


Network Port Checker

Bash script using curl -v to verify remote connectivity.

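#!/bin/bash
# check_host.sh: report whether a TCP port on a remote host is reachable.

HOST=$1
PORT=$2

if [ -z "$HOST" ] || [ -z "$PORT" ]; then
  echo "Usage: $0 <hostname> <port>"
  exit 1
fi

echo "Testing connection to $HOST on port $PORT..."
echo "-----------------------------------------------"

# curl -v prints "Connected to" once the TCP connection is established.
# --connect-timeout bounds the connection attempt; --max-time keeps curl from
# waiting indefinitely on a port that accepts the connection but never answers HTTP.
if curl -v --connect-timeout 3 --max-time 10 "http://$HOST:$PORT" 2>&1 | grep -q "Connected to"; then
  echo "✅ SUCCESS: Port $PORT is OPEN on $HOST"
else
  echo "❌ FAILURE: Port $PORT is CLOSED or unreachable on $HOST"
fi
echo "-----------------------------------------------"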

Below you can see that I can successfully reach both ports.

If a hang occurs then you need to make sure the ports are properly open.

[oracle@enbr ~]$ ./check_host.sh raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com 2484
Testing connection to raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com on port 2484...
-----------------------------------------------
✅ SUCCESS: Port 2484 is OPEN on raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com
-----------------------------------------------
[oracle@enbr ~]$ ./check_host.sh raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com 8005
Testing connection to raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com on port 8005...
-----------------------------------------------
✅ SUCCESS: Port 8005 is OPEN on raiadp019-3.rs.br.us-ashburn-1.oraclecloud.com
-----------------------------------------------
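The connect string lists several hosts, and checking each one by hand on both ports gets tedious. The sketch below loops over a list of hosts and ports. Note that it is an alternative to the curl-based script above, not the same method: it uses bash's built-in /dev/tcp redirection together with the GNU timeout command, and port_open and check_ports are names I made up.

```shell
#!/bin/bash
# Sketch: check several hosts on several ports using bash's /dev/tcp.

port_open() {
  # Succeeds if a TCP connection to host ($1) and port ($2) opens within 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

check_ports() {
  # usage: check_ports "<space-separated ports>" host1 [host2 ...]
  local ports="$1" host port
  shift
  for host in "$@"; do
    for port in $ports; do
      if port_open "$host" "$port"; then
        echo "$host:$port OPEN"
      else
        echo "$host:$port CLOSED"
      fi
    done
  done
}

# Example:
#   check_ports "2484 8005" raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com \
#                           raiadp019-2.rs.br.us-ashburn-1.oraclecloud.com
```

A host that shows CLOSED on either port is the one to focus on when reviewing the routing and port rules.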

Both are successful but onboarding still fails

The last item to check is access to object storage, using the same port checker script. Below I am checking access to object storage in Ashburn.

If this isn't successful, check to make sure you have a service gateway and the subnet has access to the service gateway.


[oracle@enbr ~]$ ./check_host.sh swiftobjectstorage.us-ashburn-1.oraclecloud.com 443
Testing connection to swiftobjectstorage.us-ashburn-1.oraclecloud.com on port 443...
-----------------------------------------------
✅ SUCCESS: Port 443 is OPEN on swiftobjectstorage.us-ashburn-1.oraclecloud.com
-----------------------------------------------
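If you are in a different region, the object storage hostname changes with it. As a small sketch, the helper below derives the hostname from one of the Recovery Service HOST values. It assumes the region is the last label before ".oraclecloud.com", which matches my Ashburn entries but is not something I have verified for every region, and objectstorage_host is a name I made up.

```shell
#!/bin/bash
# Sketch: derive the object storage hostname from a Recovery Service FQDN.
# Assumption: the region is the label right before ".oraclecloud.com".

objectstorage_host() {
  local fqdn="$1" region
  region="${fqdn%.oraclecloud.com}"   # drop the domain suffix
  region="${region##*.}"              # keep the last remaining label
  echo "swiftobjectstorage.${region}.oraclecloud.com"
}

# Example:
#   ./check_host.sh "$(objectstorage_host raiadp019-1.rs.br.us-ashburn-1.oraclecloud.com)" 443
```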

These steps should give you a starting point for determining why onboarding to the Autonomous Recovery Service failed.

