Bryan's Oracle Blog

Friday, January 25, 2019

Implementing TDE

This blog post is about what happens when you implement TDE (Transparent Data Encryption) in your database.

I started with 2 other blog posts, but I soon realized that this is a HUGE topic with ramifications throughout the database system.

Well let's start at the beginning.
I wanted to know what happens when you implement TDE.
The 3 items that I keep hearing are

Encrypted data will not compress
Encrypted data will not dedupe
Implementing Advanced compression in the database will mitigate the loss of compression to encryption by compressing the data before encrypting

Now in order to investigate this further, I had some restrictions that affected my testing.

I had no access to a deduplicating appliance to test point 2
I didn't have a real data set to use. I ended up using the SOE schema in swingbench
The only way I could test the effect of compression was to compress the backupset. Most appliances (including the ZDLRA) break the file up into pieces and compress each piece. I couldn't emulate the affect of doing that.

Now for my testing dataset. I created a dataset in the SOE tablespace that in uncompressed. I then took that dataset and created a second copy of the schema with advanced compression in place in the SOE_COMPRESSED tablespace.

When I finished this I had 2 tablespaces comprised of 10 x 5 GB datafiles.

SOE --> uncompressed copy of data

SOE_COMPRESSED --> Advanced compression (ACO) copy of data

In the graphs above you can see that out of the 50 GB of allocated space
uncompressed --> 35.8 GB of used space
compressed --> 24.9 GB of used space

These are the datasets I am going to use throughout my testing.

I started by taking these 2 datasets and seeing what happens with the 2 full backup types
L0 --> Full backups saved as backupset
image copy --> Full copy of datafile.

Below is what I found when I compressed both these backup types.

This was very helpful to see what happens.

For my uncompressed dataset (35 GB), I was able to compress the file down to 20 GB.
For my uncompressed image copy (35 GB used) I was able to compress the file down to 18 GB
For my compressed dataset (24 GB), I was able to compress the file down to 18 GB
For my compressed image copy (24 GB), I was able to compress the file down to 18 GB.

So what I learned is no matter how I compressed the data, in the database (ACO), or just compress the backup, the final size is still ~18 GB.

Now we have some benchmarks on what happens when I just compress the backups.

I started with encrypting the image copy. This was one area that I wanted to learn more about for 2 reasons.

Many of the newer backup strategies use the "incremental merge" process that started with 10G of oracle. I was curious what happens to image copies when you implement TDE.
Many of the Flash arrays use compression to get more usable storage from the smaller SSD drives. Looking at what happens with compressing the datafile copies gives a lot of insight into how those flash arrays will be affected by TDE.

You can see from the above testing there was a very significant impact on the compressed size of the image copy. Both compressed and uncompressed data compressed nicely when it was unencrypted. After encrypting the dataset, the compressed size is about 3x larger than the compressed size.

This gives me the answer on how it will affect these 2 areas.

If using an "incremental merge" based backup that is stored on compressed storage, you will need 3x the amount of storage for your backups.
If using Flash storage for your database files that utilizes compression, you will need 3x the amount of database storage.

OK.. There's the first answer I was looking for.

Now let's see what happens with my Level 0 backups.

With level 0 backups, I can see the benefit of compressing the data first.

I can also see the effect of encryption.

Compressing the data first saved me a about a 1/3 of the size of the data, and about 1/3 of the size of the backupset compressed (once encrypted).

Now let's take a look of the effect of encryption on the change rate. I updated one of the tables in the database to create a change rate, and I looked at both the archive logs and the incremental backup.

Wow.. What stood out to me on this testing is the size of the archive logs. Adding Advanced Compression to the database decreased the size of the incremental backups (expected), but increased the size of the archive logs. Then when I looked at the compressed size (both with and without encryption) compressing the data in the database increased the size of the archive logs.

This showed me a lot of what happens with TDE. What I learned through this was.

Encrypted data indeed does not compress, in fact it can get much bigger.
Implementing Advanced Compression does not always mitigate the effects of encryption. Depending on the dataset, it might have very limited effect.
The compressed size of datafiles and image copy backups may increase by 3x or more depending on the amount of free space you typically have in your datafiles..

Thursday, January 3, 2019

Verifying you have a recoverable backup

As you probably know the best way to validate that you can recover your database to a specific point-in-time is to perform a point-in-time recovery of the database and successfully open it. That isn't always possible due to the size of the database, infrastructure necessary, and the time it takes to go through the process. To verify recoverability there are several commands that give you all the pieces necessary, but it is important to understand what each command does.

Restore Validate - The restore validate command can be performed for any database objects including the whole database to verify the files are valid. This command can be used to restore as of the current time (default), but you can also specify a previous point-in-time. The validate verifies that the files are available and also checks for corruption. By default it checks for physical corruption, but it can also check for logical corruption. This command reads through the backup pieces

Examples of restore validate for different cases.

verify current backup -

RMAN> RESTORE DATABASE VALIDATE;

RMAN> RESTORE ARCHIVELOG ALL VALIDATE;

This will verify that the last full backup (level 0) does not contain corruption, and it verifies that archivelogs exist throughout the recovery window, and that they do not contain corruption.

verify previous backup -

RMAN> RESTORE DATABASE VALIDATE UNTIL TIME "to_date('xx/xx/xx xx:xx:xx','mm/dd/yy hh24:mi:ss')";

This will verify that the full backup performed prior to the "until time" is available and does not contain corruption.

Restore Preview - The restore preview can be used to identify the backup pieces necessary for a restore AND recovery of the database. The preview will show all the datafile backups (both full and incremental), along with the archive logs needed. The restore preview does not check for corruption, it simply lists the backup pieces necessary.

Validate - The validate command can be performed on any database objects along with backupsets. Unlike restore, you cannot give it an "until time", you must identify the object you want to validate. By default it checks for physical corruption, but it can also check for logical corruption.

It is critical to understand exactly what all three commands do to ensure you can efficiently recover to a previous point-in-time. It takes a combination of 2 commands (plus an extra to be sure).

So let's see how I can test if I can recover to midnight on the last day of the previous month. My backup strategy is "Weekly full/Daily incremental" backups. I perform my full backups on Saturday night and keep my backups for 90 days. My archive logs are kept on disk for 20 days.

It is now 12/15/18 and I want to verify that I can recover my database to 23:59:59 on 11/30.

First I want to perform a restore validate of my database using until time.

RMAN > restore validate database UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see the this command performed a validation of the Level 0 (full) backup I performed on 11/24/18. It did not validate any of the incremental backups or archive logs that I will need to recover the database.

Now I want to perform a restore validate of my archive logs using until time.
RMAN > restore validate archivelog UNTIL TIME "to_date('11/30/18 23:59:59','mm/dd/yy hh24:mi:ss')";
By looking through the logs, I can see that this command validated the backup of all my older archive logs (older than 20 days), and it validated the FRA copy of my archive logs that were 20 days or newer.
There are couple of issues with this that jump out at me.

1) My incremental backups are NOT validated. Yes I know I can recover to the point-in-time I specified, but all I verified is that it is possible by applying 6 days of archive logs.

2) I don't know if any of my newer backups of archive logs are corrupt. The validation of archive logs only validated backups of archivelogs that are NOT on disk. I still need to validate the archive log backups for archive logs that will age off from the FRA in a few days to be sure..

3) The process doesn't check or validate a backup of the controlfile. If I am restoring to another server (or if I lose my database completely) I want to make sure I have a backup of the controlfile to restore.

The only way to be sure that I have an efficient, successful recovery to this point in time is to combine the 2 commands and use that information to validate all the pieces necessary for recovery.
Here are the steps to ensure I have valid backups of all the pieces necessary to restore efficiently.

1) Perform a restore preview and write the output to a log file.

2) Go through the output from step 1 and identify all the "BS KEY" (backupset key) values.

3) Perform a "validate backupset xx;" using the keys from step 2 to ensure all datafile backups (both full and incremental backups) are valid.

4) Go through the output from step 1 and identify the archive log sequences needed.

5) Identify the backupsets that contain all log sequences needed (my restore preview pointed to the archive logs on disk).

6) Perform a "validate backupset xx;" using the keys from step 5.

7) Perform a restore controlfile to a dummy location using the UNTIL time.

As you can see the "restore validate" is not enough to ensure efficient recovery. It only ensures that the you have a valid RESTORE not a valid recovery.
The ZDLRA does constant recovery checks using the virtual full backups, and archive logs.
The ZDLRA ensures you can quickly AND successfully recover to any point-in-time within your retention window.

Friday, November 30, 2018

TDE–How to implement TDE in your database and what to think about (part 2)

This is the second part in a series on implementing TDE and what happens to the sizing.

At first my plan was to encrypt the dataset I created in my first post, but instead I compressed it.

At this point (and throughout this post), I am working with an un-encrypted dataset.

One of the first things to understand about Encryption is that encrypted data DOES NOT compress.
This is critical when understanding what happens when you implement TDE.

One way to save storage when implementing TDE is to implement encryption AND compression together.

In order to break down the affects of encryption on compressed data VS uncompressed data, I took my dataset (the SOE dataset from swing bench) and I compressed it. I implemented Advanced compression on the tables, and I compressed the indexes and rebuilt them.

I now have 2 copies of the same dataset. 1 is compressed, and 1 is not.

Now let's take a look at the sizing of the Data sets and I will go through the same backup procedures and see what happens.

SEGMENT_NAME              SEGMENT_TYPE Space Used uncompressed   Space Used Compressed SPACE_SAVINGS
------------------------- ------------ ------------------------- -------------------- -------------
ADDRESSES                 TABLE           3,392 MB                  3,264 MB                      3
ADDRESS_CUST_IX           INDEX             703 MB                    728 MB                     -3
ADDRESS_PK                INDEX             662 MB                    888 MB                    -34
CARDDETAILS_CUST_IX       INDEX             703 MB                    562 MB                     20
CARD_DETAILS              TABLE           2,048 MB                  1,600 MB                     21
CARD_DETAILS_PK           INDEX             662 MB                      0 MB                    100
CUSTOMERS                 TABLE           3,328 MB                  2,880 MB                     13
CUSTOMERS_PK              INDEX             443 MB                      0 MB                    100
CUST_ACCOUNT_MANAGER_IX   INDEX             417 MB                    272 MB                     34
CUST_DOB_IX               INDEX             528 MB                    280 MB                     47
CUST_EMAIL_IX             INDEX             975 MB                    280 MB                     71
CUST_FUNC_LOWER_NAME_IX   INDEX             683 MB                    280 MB                     58
INVENTORIES               TABLE             176 MB                    176 MB                      0
INVENTORY_PK              INDEX              18 MB                      0 MB                    100
INV_PRODUCT_IX            INDEX              16 MB                     12 MB                     24
INV_WAREHOUSE_IX          INDEX              16 MB                     12 MB                     24
ITEM_ORDER_IX             INDEX           2,000 MB                  1,770 MB                     11
ITEM_PRODUCT_IX           INDEX           1,768 MB                  1,301 MB                     26
LOGON                     TABLE           1,728 MB                  1,728 MB                      0
ORDERENTRY_METADATA       TABLE               0 MB                      0 MB                      0
ORDERS                    TABLE           3,968 MB                  2,816 MB                     29
ORDER_ITEMS               TABLE           6,976 MB                  4,992 MB                     28
ORDER_ITEMS_PK            INDEX           2,234 MB                      0 MB                    100
ORDER_PK                  INDEX             632 MB                      0 MB                    100
ORD_CUSTOMER_IX           INDEX             671 MB                    480 MB                     28
ORD_ORDER_DATE_IX         INDEX             752 MB                    439 MB                     41
ORD_SALES_REP_IX          INDEX             594 MB                    438 MB                     26
ORD_WAREHOUSE_IX          INDEX             709 MB                    438 MB                     38
PRD_DESC_PK               INDEX               0 MB                      0 MB                    100
PRODUCT_DESCRIPTIONS      TABLE               0 MB                      0 MB                      0
PRODUCT_INFORMATION       TABLE               0 MB                      0 MB                      0
PRODUCT_INFORMATION_PK    INDEX               0 MB                      0 MB                    100
PROD_CATEGORY_IX          INDEX               0 MB                      0 MB                      0
PROD_NAME_IX              INDEX               0 MB                      0 MB                      0
PROD_SUPPLIER_IX          INDEX               0 MB                      0 MB                   -100
WAREHOUSES                TABLE               0 MB                      0 MB                      0
WAREHOUSES_PK             INDEX               0 MB                      0 MB                    100
WHS_LOCATION_IX           INDEX               0 MB                      0 MB                   -100

Here is the total savings by compressing both tables and indexes with advanced compression.

Space Used uncompressed   Space Used Compressed  SPACE_SAVINGS
------------------------- ---------------------- -------------
  36,804 MB                 25,636 MB                       30

Now to compare this with the previous data uncompressed I am going to backup by tablespace. Below is the sizing of the backups. I used a tag to identify the backups.

-rw-rw----. 1 oracle oracle 26773323776 Nov 29 17:02 COMPRESSED_SOE.bkp
-rw-rw----. 1 oracle oracle 38441140224 Nov 29 17:04 UNCOMPRESSED_SOE.bkp
-rw-rw----. 1 oracle oracle 10987765760 Nov 29 18:35 BASIC_COMPRESSED_SOE.bkp
-rw-rw----. 1 oracle oracle 11135655936 Nov 29 18:36 BASIC_COMPRESSED_SOE_COMPRESSED.bkp
-rw-rw----. 1 oracle oracle 13360308224 Nov 29 20:12 MEDIUM_COMP_SOE_COMPRESSED.bkp
-rw-rw----. 1 oracle oracle 14383603712 Nov 29 20:12 MEDIUM_COMPRESSED_SOE_.bkp
-rw-rw----. 1 oracle oracle  9420791808 Nov 30 00:12 HIGH_COMP_SOE_COMPRESSED.bkp
-rw-rw----. 1 oracle oracle  9112944640 Nov 30 00:23 HIGH_COMPRESSED_SOE.bkp

Now I'm going to put that in a table and a chart to compare..

First the table of sizes

Now the chart

Now by looking at the chart it is apparent what happens with compression and the data.

Compression in the database reduced the size of the data by 30%
An uncompressed backupset matched the size of the data
Once I compressed the backupset, the difference is size was minimal.

** Bottom line - Compressing the data in the database saved on the uncompressed backupsize. Once the backupset is compressed the final size is about the same.

** Final conclusion -- Most modern backup appliances (ZDLRA, ZFS, DD) compress the backups. When using those appliances with unencrypted data, the final size is the same regardless of whether the data is compressed in the Database.

Now that I've looked at both compressed and uncompressed data at the DB and backupset I am going to compress the data. Next post.