Tuesday, November 12, 2013

Oracle 12c PL/SQL improvements.


Last week I was giving a presentation for the UNYOUG (Upstate NY Oracle users Group) and I talked about the new features in 12c (along with In-Memory database).

I thought I would share some thoughts after the meeting.

I went through Tom Kyte's top 12 new features, and surprisingly the top feature that excited people was the PL/SQL improvements. 

The PL/SQL improvements had to do with the ability to write a PL/SQL as part of the query.

Lets say currently with 11g you have a function that includes the days between 2 dates.

  CREATE OR REPLACE FUNCTION Days_Between 
             (first_dt DATE, second_dt DATE)
                RETURN NUMBER IS 
       dt_one NUMBER
       dt_two NUMBER
BEGIN 
      dt_one := TO_NUMBER(TO_CHAR(first_dt, 'DDD')); 
      dt_two := TO_NUMBER(TO_CHAR(second_dt, 'DDD'));  
           RETURN (dt_two - dt_one); 
 END Days_Between;   

select Days_between(start_date,end_date) from mytable;



The problem is that in order to test this function you need to create the function.  There are multiple issues that developers face with having to do this.
  1. Developers often don't have the authority to create/change functions, especially if they need to be owned by a different schema
  2. Replacing the current function affects other users and this may not be desirable while debugging changes.
  3. Testing against production data is often not possible because of authorization, and collision issues.

The answer in 12c is the ability to include a function in the "WITH" clause.. The above would become


WITH  FUNCTION Days_Between
             (first_dt DATE, second_dt DATE)
                RETURN NUMBER IS
       dt_one NUMBER;
       dt_two NUMBER;
BEGIN
      dt_one := TO_NUMBER(TO_CHAR(first_dt, 'DDD'));
      dt_two := TO_NUMBER(TO_CHAR(second_dt, 'DDD'));
           RETURN (dt_two - dt_one);
 END Days_Between;  
select Days_between(start_date,end_date) from mytable;

So. what about Procedures you ask ? You can also include procedures in the mix.  The main purpose of doing this is to include any procedures that are invoked from the function.  This way you can include all the "dependencies" in the with clause. 

Finally, I read an article talking about how much this improves performance too.

http://www.oracle-base.com/articles/12c/with-clause-enhancements-12cr1.php#plsql-support

but to the developers I talked to the big advantage was with the ability to test..

As far as performance gains, I don't know how much I would put reusable code (like functions) directly into a sql statement. It would be a bear to support any changes to a "common function" defined multiple places.
 

 

Wednesday, January 9, 2013

Performance and Indexes with oracle

It's been a while since I've written a post, but twice this week, the same issue came up..

The story goes like this.. "I have a query that is using indexes not FTS, but it is much slower then expected".  It seems most folks have it drummed into their heads that Indexes are fastest.

This is the statement I've made twice....

"The only thing worse than a FTS is an index lookup of the whole table."  I figured I would show you what I mean.

First I create 2 tables..

TEST_TABLE with 76,989 rows of data.  There is a primary key.
DRIVER  with the same rows of test_table.

This is the 2 test I did.

1) I created DRIVER with 1 row, and analyzed it , then I deleted the row, and inserted all the rows from TEST_TABLE.
2) I ran the following query which should return every row.

select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20;

So what happens ?  since the statistics on DRIVER say there is only row you can see the plan and actual vs estimated below.







----------------------------------------------------------------------------------------------        
| Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   |    
----------------------------------------------------------------------------------------------   

PLAN_TABLE_OUTPUT                                                                                                                           
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |            |      1 |        |      1 |00:00:00.86 |
|   1 |  SORT AGGREGATE                 |            |      1 |      1 |      1 |00:00:00.86 |
|   2 |   VIEW                          | VW_DAG_0   |      1 |      1 |  77961 |00:00:01.28 |
|   3 |    HASH GROUP BY                |            |      1 |      1 |  77961 |00:00:01.01 |
|   4 |     NESTED LOOPS                |            |      1 |        |  77989 |00:00:01.00 |
|   5 |      NESTED LOOPS               |            |      1 |      1 |  77989 |00:00:00.58 |
|*  6 |       TABLE ACCESS FULL         | DRIVER     |      1 |      1 |  77989 |00:00:00.07 |
|*  7 |       INDEX UNIQUE SCAN         | PVAL_KEY   |  77989 |      1 |  77989 |00:00:00.21 |
|   8 |      TABLE ACCESS BY INDEX ROWID| TEST_TABLE |  77989 |      1 |  77989 |00:00:00.22 |
----------------------------------------------------------------------------------------------
                                                                                                

Since the Optimizer is only expecting 1 row to come back from the table DRIVER, it does an index lookup on TEST_TABLE, for every row. Notice the actual number of rows is the full table.

Now lets look at the cost of this index lookup.


select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.84       0.85          0      83932          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.84       0.85          0      83932          0           1


OK.  so the cost of the index lookup row-by-row is .85 seconds elapsed time.

Now after analyzing the DRIVER table, you can see the plan changed to a FTS.


------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   |
------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT                                                                                                                           
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      1 |00:00:00.16 |
|   1 |  SORT AGGREGATE       |            |      1 |      1 |      1 |00:00:00.16 |
|   2 |   VIEW                | VW_DAG_0   |      1 |  76723 |  77961 |00:00:00.40 |
|   3 |    HASH GROUP BY      |            |      1 |  76723 |  77961 |00:00:00.23 |
|*  4 |     HASH JOIN         |            |      1 |  76723 |  77989 |00:00:01.23 | 
|*  5 |      TABLE ACCESS FULL| DRIVER     |      1 |  76723 |  77989 |00:00:00.22 |
|   6 |      TABLE ACCESS FULL| TEST_TABLE |      1 |  79110 |  77989 |00:00:00.22 |
------------------------------------------------------------------------------------
                                                                                              


Notice the actual rows, and estimates match.  You can also see it is a FTS.

Now for the run time stats with the FTS.

select  /*+ GATHER_PLAN_STATISTICS */  count(distinct a.capture_time)
from test_table a,
     driver b
where a.pval=b.pval
and b.instance_number < 20

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.15       0.15          0       1390          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.15       0.15          0       1390          0           1

     

WOW.. look at that the FTS took .15 seconds compared to .85 seconds.

The bottom line is.. The next time you talk to developers, and they think their query should be fast because they are using an index, look deeper.  The biggest clue is to look at the Estimate vs Actual for the plan.  The index might not be what you want. FTS's can be good.




Monday, November 12, 2012

Big Data and Privacy

I am writing an editorial (which is unusual for me). This was caused by a conversation I had with my Dad about how the world is changing with big data, and how much retiree's (like him) should know.  So here goes.

Dear Dad,

  I know you are involved in college education for Older folks that have a passion to learn.  An interesting topic would be "Big Data".  For someone outside the IT field, I would tell you that Big Data describes the plethora of new data that is generated in today's society.
Where does it come from ??


  • Logs from webservers 
  • Cell phones (including location data).
  • Search data
  • Medical data
  • Machine generated data (like from your computer in your car)
  • Sales data
All this can be tied together in new ways, that most people didn't think were possible years ago.  My favorite example is... You are walking in the mall past past a store, and you get a text on your phone.  The store's computer system has texted you a 30% offer for a new sweater, good for 2 hours.  The store knows ...
  • Your location from phone
  • That you have a ski trip planned for the next week, from your search history, and purchase history
  • That you are looking for a new sweater from your facebook, and twitter posts.
Amazing huh ?

All these things can open up miraculous possibilities. The day may come when your doctor calls you to schedule a preventive test.  Medical history gathered from a large group of people has shown that others with a similar medical history as yours have had a issue that can be tested for and prevented. Wow. really amazing things.

They can predict something that may go wrong on your car given all the information the computer has gathered.

Target (the department store), has even predicted when a woman is pregnant based on purchase history.. 

All these things are amazing possiblities, but they are also scary.

I find this topic very exciting, but I'm sure for a lot of people these ideas are very scary. Where do you cross the line and enter privacy issues.

This is going to be a interesting battle. Who ones all this data ? what data is public, and what data is private ?

But the most interesting thing is what can be done with the data....

Me, I'm an optimist so I can all the good that can come out of all this new data.

At the very least, the students would at least have a better idea what is going on in the world (behind the scenes), and they would understand why Target things "grandma" is pregnant.