Friday, December 23, 2011

Monitoring GoldenGate through SQL

I have been working on implementing GoldenGate. GoldenGate works with Oracle, but I wanted to be able to monitor its status from within an Oracle session, and then use a database link to query that status remotely. Once I have all this in place, I want to create a web page through APEX that unions all my GG environments onto a single screen.. real time! Nifty, eh?

This is how I went about doing it..

1)  The basis for it all is the "info all" command executed within GGSCI.  The output looks like this.

GGSCI (orclbox) 10> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     S_EXTR      00:00:00      00:10:38



In this example you can see that the manager is running, but the extract is stopped. I took the output of this command and created my own command script..

ggsci_status.sh


#!/bin/bash
# Set up the Oracle environment (ORA_SID and PATH_ORIG are set by the calling environment)
export ORACLE_SID=${ORA_SID}
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export ADR_HOME=/u01/app/oracle/diag/rdbms/${DB_UNIQUE_NAME}/${ORACLE_SID}
export PATH=${ORACLE_HOME}/bin:${PATH_ORIG}
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export PATH=$PATH:/dbfs/gg/ggs11

# Run "info all" inside GGSCI and capture all of its output
cd /dbfs/gg/ggs11
ggsci <<EOF >/tmp/ggsci.log 2>&1
info all
EOF

# Echo back just the process status lines
grep EXTRACT  /tmp/ggsci.log
grep MANAGER  /tmp/ggsci.log
grep REPLICAT /tmp/ggsci.log
 


The output of this script is the 2 lines from above that show the statuses of those processes.

MANAGER     RUNNING
EXTRACT     STOPPED     S_EXTR      00:00:00      00:10:38




So now for step 2... We have the statuses we are looking for as the output of a script. What to do with them? External tables with a preprocessor (I believe the preprocessor clause is an 11.2 feature)..

First, create a directory object within Oracle and put your script in the corresponding filesystem directory. I called mine SP_COPY.
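Something along these lines (the filesystem path is just a placeholder for wherever you put ggsci_status.sh, and bgrenn is the schema that owns the external table below):

-- Placeholder path: point this at the directory holding ggsci_status.sh
create directory SP_COPY as '/dbfs/gg/scripts';

-- The schema owning the external table needs both read and execute
grant read, execute on directory SP_COPY to bgrenn;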

Here is how I defined the table that reads the output of the above script (one column for each field of the "info all" output).



create table bgrenn.ggsci_status
( gg_type    varchar2(20),
  status     varchar2(20),
  group_name varchar2(20),   -- the Group column from "info all"
  lag_time   varchar2(20),
  checkpoint varchar2(20)
)
organization external
( type oracle_loader
  default directory SP_COPY
  access parameters
  ( records delimited by newline
    preprocessor SP_COPY:'ggsci_status.sh'
    badfile SP_COPY:'ggsci_status.bad'
    logfile SP_COPY:'ggsci_status.log'
    fields terminated by whitespace
    missing field values are null
    ( gg_type,
      status,
      group_name,
      lag_time,
      checkpoint
    )
  )
  location ('ggsci_status.sh')
)
reject limit unlimited;

 



Now select against this table and the fields from your script's output appear as columns in the table (two rows for this example).
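For example (using the column names from the DDL above):

select gg_type, status, group_name, lag_time, checkpoint
from bgrenn.ggsci_status;

GG_TYPE   STATUS    GROUP_NAME   LAG_TIME    CHECKPOINT
--------  --------  -----------  ----------  ----------
MANAGER   RUNNING
EXTRACT   STOPPED   S_EXTR       00:00:00    00:10:38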


Finally.... Step 3: create a database link to this database and do a select * from ggsci_status@mylink.
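A sketch of what that looks like (the link names, TNS alias, and password are placeholders; one link per GG environment is what makes the APEX union page from the intro possible):

create database link mylink
  connect to bgrenn identified by mypassword
  using 'GGDB1';

-- Union the status from each environment onto one screen
select 'PROD' env, s.* from bgrenn.ggsci_status@mylink  s
union all
select 'TEST' env, s.* from bgrenn.ggsci_status@mylink2 s;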

There you go. How to dynamically show the status of GoldenGate processes through a database link on a remote database.


NOTE: If the database is a RAC cluster with GG running on only one node, you need to connect to the specific instance (SID) to ensure you are looking at the node where GG is running.
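One way to do that is a link whose connect string pins the instance; this is a sketch with made-up host, service, and instance names:

create database link gg_node1
  connect to bgrenn identified by mypassword
  using '(DESCRIPTION=
           (ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip)(PORT=1521))
           (CONNECT_DATA=(SERVICE_NAME=mydb)(INSTANCE_NAME=mydb1)))';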

Enjoy...

Sunday, December 18, 2011

Hadoop (again)

I know I've blogged in the past that I am working on implementing Hadoop. Here are 3 articles that should explain why.

 First
http://www.nytimes.com/2011/12/18/sunday-review/the-internet-gets-physical.html?_r=1&pagewanted=all

This is a great article from the NY Times, explaining that sensors are turning up everywhere. Of course, more sensors mean more data! Lots more data. So how do we collect and process all this data? http://www.calxeda.com/ Calxeda!! These are 5-watt processors that can be scaled up to thousands of nodes (yes, I said THOUSANDS). And I know what you are saying. So what do we do with this data? So what?

Here is a great article on how to tie it all together.

http://blogs.oracle.com/datawarehousing/entry/understanding_a_big_data_implementation

So there you are.. I think this is the vision of the future, and if you are not looking at these technologies and how they tie together, you are missing the next big leap happening in the IT field. Our jobs as IT specialists will become even more important as we become an integral part of our companies' business processes, whatever those may be.

Saturday, December 17, 2011

FTS vs Index scan on Exadata

I loaded up some data on my Exadata and created a primary key on the table. The table has about 1.8 billion rows and takes about 186 GB of space. I decided to do a select count on the table to see how many rows were really there.

select count(1) from sp.wxyz_detl;

The query took about 5 minutes and 36 seconds (336 seconds).

I was astonished! 186 GB took almost 6 minutes? On an Exadata that seemed really slow, so I took a closer look at the plan.

-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |       |   653K(100)|          |       |       |
|   1 |  SORT AGGREGATE        |                    |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |                    |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
|   3 |    INDEX FAST FULL SCAN| PIDX_WXYZ_WRKR_ID  |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
-----------------------------------------------------------------------------------------------------




Well, that looked good. An INDEX FAST FULL SCAN should offload to the cells, right? It says "full scan"; what could be simpler..

I looked deeper at V$SQLSTATS, but it didn't show what I expected.
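The query was along these lines ('&sql_id' is a placeholder for the statement's sql_id; the column headings in the output below are abbreviated to fit):

select disk_reads, buffer_gets, executions, cpu_time, elapsed_time,
       io_cell_offload_eligible_bytes, physical_read_bytes,
       io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
  from v$sqlstats
 where sql_id = '&sql_id';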

DISK_READS  BUFFER_GETS  EXECUTIONS  CPU_TIME     ELAPSED_TIME  OFFLOAD_ELIG_BYTES  PHYSICAL_READ_BYTES  IO_CELL_UNC_BYTES  OFFLOAD_RETURNED
----------  -----------  ----------  -----------  ------------  ------------------  -------------------  -----------------  ----------------
 3,165,962    3,172,055           1  112,050,965   366,230,300                   0       25,935,560,704                  0                 0


The offload-eligible bytes are 0, and IO_CELL_UNCOMPRESSED_BYTES is 0.

Huh ?

Now I figured I would force a full table scan and see what happens:


select /*+ full(t) */ count(1) from sp.wxyz_detl t;


-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |  1958K(100)|          |       |       |
|   1 |  SORT AGGREGATE             |           |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL       |           |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
|   3 |    TABLE ACCESS STORAGE FULL| WXYZ_DETL |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
-------------------------------------------------------------------------------------------------



The cost went up and the expected elapsed time went up, so this looks like a worse plan to the optimizer. But here are the stats from V$SQLSTATS:

DISK_READS  BUFFER_GETS  EXECUTIONS  CPU_TIME    ELAPSED_TIME  OFFLOAD_ELIG_BYTES  PHYSICAL_READ_BYTES  IO_CELL_UNC_BYTES  OFFLOAD_RETURNED
----------  -----------  ----------  ----------  ------------  ------------------  -------------------  -----------------  ----------------
 8,684,661    8,685,010           1  36,904,390    52,623,383      71,144,235,008       71,144,742,912     71,324,041,216    25,825,191,648




Wow, look at that.. Using the FULL hint changed the cell offload eligibility, and the elapsed time is now 52 seconds. The cells returned about 25.8 GB of the 71 GB that was eligible for offload, so roughly 64% less data was shipped back to the database. Big change, and exactly what I wanted.

I did some searching and came up with this blog post about a similar issue, where the key was a reverse key index.

http://martincarstenbach.wordpress.com/2011/08/22/why-is-my-exadata-smart-scan-not-offloading/


This is a very simple schema (one table, one PK).. The PK was necessary because we are using GoldenGate to insert the data, and we don't want duplicates.
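For reference, the PK looks something like this (the column name here is inferred from the index name in the plan above, so treat it as an assumption):

-- Hypothetical: wxyz_wrkr_id inferred from the index PIDX_WXYZ_WRKR_ID
alter table sp.wxyz_detl
  add constraint pidx_wxyz_wrkr_id primary key (wxyz_wrkr_id);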