Sunday, December 18, 2011

Hadoop (again)

I know I've blogged in the past that I am working on implementing Hadoop. Here are 3 articles that should explain why.

 First
http://www.nytimes.com/2011/12/18/sunday-review/the-internet-gets-physical.html?_r=1&pagewanted=all

This is a great article from the NY times, explaining that sensors are turning up everywhere. Of course more sensors mean more data! Lots more data. So how do we collect all this data and process it ?? http://www.calxeda.com/ Calxeda !! These are 5 watt processors that can be scaled up to thousands of nodes (yes I said THOUSANDS). And I know what you are saying. So what do we do this data ? So what ?

Here is a great article on how to tie it all together.

http://blogs.oracle.com/datawarehousing/entry/understanding_a_big_data_implementation

So there you are.. I think this is the vision of the future, and if you are not looking at these technologies, and how they tie together, you are missing the next big leap that is happening in the IT field. Our jobs as IT specialists will become even more important as we become an integral part of all our companies (whatever it is) business process.

Saturday, December 17, 2011

FTS vs Index scan on Exadata

I loaded up some data on my exadata, and created a pimary key on the table. The table has about 1.8 Billion rows, and is about 186g of space.  I decided to do a select count on the table to see how many rows were really there.

select count(1) from sp.wxyz_detl;

The query took about 5 minutes and 36 seconds (336 seconds) .

I was astonished !  186g took almost 6 minutes ? Well on an exadata that seemed really slow.  I took a closer look at the plan.

-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |       |   653K(100)|          |       |       |
|   1 |  SORT AGGREGATE        |                    |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |                    |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
|   3 |    INDEX FAST FULL SCAN| PIDX_WXYZ_WRKR_ID  |  1850M|   653K  (6)| 02:10:42 |     1 |    29 |
-----------------------------------------------------------------------------------------------------




Well that looked good.  INDEX FAST FULL SCAN should offload to the cells right ? it says "full scan" what could be simpler..

I looked deeper at the V$SQLSTATS table, but it didn't show what I expected.

DISK_READS BUFFER_GETS EXECUTIONS CPU_TIME            ELAPSED_TIME   CELL_OFFLOAD_ELIG_BYTES PHYSICAL_READ_BY  IO_CELL_UNC_bytes OFFLOAD_RETURNED
---------- ----------- ---------- ---------------- ---------------- ------------------------ ----------------   ----------------   ----------------
 3,165,962   3,172,055          1      112,050,965      366,230,300                0            25,935,560,704                0                0


The Offload_eligable bytes is 0, and the IO_CELL_UNCOMPRESSED_BYTES is 0.

Huh ?

Now I figured I would force a full table scan and see what happens


select /*+ full(t) */ count(1) from spr.wxyz_detl t;


-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |  1958K(100)|          |       |       |
|   1 |  SORT AGGREGATE             |           |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL       |           |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
|   3 |    TABLE ACCESS STORAGE FULL| WXYZ_DETL |  1850M|  1958K  (3)| 06:31:40 |     1 |    29 |
-------------------------------------------------------------------------------------------------



Looks like the cost went up, the expected elapsed time went up.. this looks like a worse plan to the optimizer, but here are the stats from v$sqlstats.

DISK_READS BUFFER_GETS EXECUTIONS CPU_TIME         ELAPSED_TIME     CELL_OFFLOAD_ELIG_BYTES PHYSICAL_READ_BY  IO_CELL_UNC_bytes  OFFLOAD_RETURNED
---------- ----------- ---------- ---------------- ---------------- ------------------------ ---------------- ---------------- ----------------
 8,684,661   8,685,010          1       36,904,390       52,623,383           71,144,235,008   71,144,742,912   71,324,041,216   25,825,191,648




Wow, look at that.. Using the FULL hint caused the cell offload eligability to change, and the elapsed time is now 52 seconds.  Big change and exactly what I wanted.

I did some searching and came up with this blog with a similar issue, but the key was a reverse lookup.

http://martincarstenbach.wordpress.com/2011/08/22/why-is-my-exadata-smart-scan-not-offloading/


This is a very simple schema (one table, one PK).. The PK was necessary because we are using Golden Gate to insert the data, and we don't want duplicates.

Thursday, December 15, 2011

Are my stats old for my query?

If you work someplace like I do you hear this all the time..
"my query is running long, can you gather stats?"

Of course the person saying this is bringing this up because somewhere, somehow it worked when reanalyzed long ago... so it's going to work again right ?  It's not like any of the users are Graduate students majoring in statistical analysis at a prestegious college like my wonderful wife (hi Jo).

Well, as we all know, that isn't always the answer, and I was looking for a faster way to tell.. I have query X and I know the SQL_ID, but are any of the statistics stale ??

Here is a great query I came up with..

set linesize 170
set pagesize 150
select table_name object_name,object_type,last_analyzed,
to_char(row_count,'999,999,999') row_count,
to_char(inserts,'999,999,999') inserts,to_char(updates,'999,999,999') updates,to_char(deletes,'999,999,999') deletes,
case when (nvl(row_count,0)/10 < nvl(inserts,0)+nvl(updates,0)+nvl(deletes,0))
then 'Y'
else 'N'
end "stale?",
case row_count
   when null then null
   when 0 then null
   else to_char(((nvl(inserts,0)+nvl(updates,0)+nvl(deletes,0))/nvl(row_count,0) *100),'99.99') || '%'
end "%change"
from
(
select  distinct object_name table_name, 'TABLE                                  ' object_type,
        (select last_analyzed from dba_tables where table_name=object_name and owner=object_owner) last_analyzed,
        (select num_rows from dba_tables where table_name=object_name and owner=object_owner) row_count,
        (select inserts from dba_tab_modifications where table_name=object_name and table_owner=object_owner) inserts,
         (select updates from dba_tab_modifications where table_name=object_name and table_owner=object_owner) updates,
        (select deletes from dba_tab_modifications where table_name=object_name and table_owner=object_owner) deletes,
        object_name sort_column
 from sys.dba_hist_sql_plan 
where sql_id='31v8553cuza05'
and object_name in (select table_name from dba_tables where table_name=object_name)
union
select  distinct object_name ,  'INDEX on ' || (select table_name from dba_indexes where index_name=object_name and owner=object_owner)  index_name,
        (select last_analyzed from dba_indexes where index_name=object_name and owner=object_owner) last_analyzed,
        (select num_rows from dba_indexes where index_name=object_name and owner=object_owner) row_count,
        null inserts,
        null updates,
        null deletes,
        (select table_name from dba_indexes where index_name=object_name and owner=object_owner)  sort_column
 from sys.dba_hist_sql_plan 
where sql_id='31v8553cuza05'
and object_name in (select index_name from dba_indexes where index_name=object_name)
)
order by sort_column,object_type desc;



and here is the output..  You can see I organized it by object in alphabetical order.   Indexes are sorted with the tables that they are on so they get grouped together.


Here is what the output looks like.

OBJECT_NAME                     OBJECT_TYPE                             LAST_ANALYZED       ROW_COUNT    INSERTS      UPDATES      DELETES      s %change
------------------------------- --------------------------------------- ------------------- ------------ ------------ ------------ ------------ - -------
CAR_TAB                         TABLE                                   2011-11-16 03:00:12           77                                        N    .00%
PK_CAR_TAB                      INDEX on CAR_TAB                        2011-11-16 03:00:13           77                                        N    .00%
CAR_DEFD_WORK_TAB               TABLE                                   2011-11-16 03:00:13           61                                        N    .00%
PK_CAR_DEFD_WORK_TAB            INDEX on CAR_DEFD_WORK_TAB              2011-11-16 03:00:13           61                                        N    .00%
CO_CAR_TAB                      TABLE                                   2011-12-01 11:19:11       27,998           94          739            0 N   2.98%
CO_EXEC_TAB                     TABLE                                   2011-11-16 03:00:57       32,679          187            2           21 N    .64%
D$VANR_TAB                      TABLE                                   2011-12-15 15:40:53            0                                        N
DIM_CLIENT                      TABLE                                   2011-12-13 22:11:51       27,203            3           22            0 N    .09%
ELEC_CMMN_ADDR                  TABLE                                   2011-11-16 03:01:06      375,095        1,949            0          171 N    .57%
PK_ELEC_CMMN_ADDR               INDEX on ELEC_CMMN_ADDR                 2011-11-16 03:01:06      375,095                                        N    .00%
ENTY_CMMN_ADDR                  TABLE                                   2011-11-16 03:01:11    2,234,749        9,221        7,755          722 N    .79%
ENTY_CMMN_ADDR_VT               TABLE                                   2011-11-16 03:01:14    2,249,575        9,395          648          722 N    .48%
IDX_ECAV_ENCA_ID                INDEX on ENTY_CMMN_ADDR_VT              2011-11-21 16:20:10    2,252,376                                        N    .00%
MAP_AREA                        TABLE                                   2011-11-16 03:01:24        4,835           11          342            0 N   7.30%
PK_MAP_AREA                     INDEX on MAP_AREA                       2011-11-16 03:01:24        4,835                                        N    .00%
INDEP_CNTRC                     TABLE                                   2011-11-16 03:01:31       17,879          241            0           32 N   1.53%
INSR_ROST_ELIG_CLSF             TABLE                                   2011-11-16 03:01:31            0                                        N
PK_INSR_ROST_ELIG_CLSF          INDEX on INSR_ROST_ELIG_CLSF            2011-11-16 03:01:31            0                                        N
J$VANR                          TABLE                                   2011-12-15 22:03:51          212                                        N    .00%
SLVR_LKUP_VAL                   TABLE                                   2011-11-16 03:01:41        2,536           24           19            2 N   1.77%
PK_SLVR_LKUP_VAL                INDEX on SLVR_LKUP_VAL                  2011-11-16 03:01:41        2,536                                        N    .00%
OPT_VAL                         TABLE                                   2011-11-16 03:01:45          628           43           16            0 N   9.39%
PK_OPT_VAL                      INDEX on OPT_VAL                        2011-11-16 03:01:45          628                                        N    .00%
REG_NM                          TABLE                                   2011-11-16 03:02:00      257,597        2,436        2,501           44 N   1.93%
REG_NM_VT                       TABLE                                   2011-11-16 03:02:02      260,111        2,513          630           44 N   1.23%
REG_ROLE                        TABLE                                   2011-11-16 03:02:05       87,808          526          239           18 N    .89%
PK_REG_ROLE                     INDEX on REG_ROLE                       2011-11-16 03:02:05       87,808                                        N    .00%
WOMN                            TABLE                                   2011-11-16 03:02:40      642,408        1,854           52           66 N    .31%
PK_WOMN                         INDEX on WOMN                           2011-11-16 03:02:41      642,408                                        N    .00%
WOMN_ETHN_CLSS                  TABLE                                   2011-11-16 03:02:42       90,622          900            4           32 N   1.03%
WOMN_NM                         TABLE                                   2011-11-16 03:02:43      678,775        1,901           84           66 N    .30%
PROD_CPNT                       TABLE                                   2011-12-02 22:05:00        2,104                                        N    .00%
PK_PROD_CPNT                    INDEX on PROD_CPNT                      2011-12-02 22:05:00        2,104                                        N    .00%
PSTL_CMMN_ADDR                  TABLE                                   2011-11-16 03:03:03      489,200        1,868          283           62 N    .45%
PK_PSTL_CMMN_ADDR               INDEX on PSTL_CMMN_ADDR                 2011-11-16 03:03:04      489,200                                        N    .00%
REF_CTRY                        TABLE                                   2011-10-25 22:02:19          260            0           21            0 N   8.08%
REF_CONV_FREQ_TYPE              TABLE                                   2011-10-26 22:01:53            8                                        N    .00%
REF_ST                          TABLE                                   2011-12-13 22:14:10           72                                        N    .00%
SNP_CDC_SUBS                    TABLE                                   2011-12-15 22:01:23            2                                        N    .00%
PK_SNP_CDC_SBS                  INDEX on SNP_CDC_SUBS                   2011-12-15 22:01:23            2                                        N    .00%
TCMN_ADDR                       TABLE                                   2011-11-16 03:03:30      628,266        4,826          219          284 N    .85%
PK_TCMN_ADDR                    INDEX on TCMN_ADDR                      2011-11-16 03:03:30      628,266                                        N    .00%
TRUC_IDFN                       TABLE                                   2011-11-16 03:03:38      471,413        3,392        4,050           84 N   1.60%
TRUC_IDFN_VT                    TABLE                                   2011-11-16 03:03:40      548,277        4,471        1,458           96 N   1.10%
VANR                            TABLE                                   2011-12-15 10:43:22      309,110           47          101            0 N    .05%
PK_VANR                         INDEX on VANR                           2011-12-15 10:43:23      309,110                                        N    .00%
VANR_EMPT_STUS                  TABLE                                   2011-11-16 03:04:43      689,725        3,098           23           54 N    .46%
VANR_EMPT_STUS_RESN_DT          TABLE                                   2011-11-16 03:04:44      477,062        2,414           21           40 N    .52%
VANR_PROD_CFG                   TABLE                                   2011-11-16 03:05:38      292,458        1,564          279           24 N    .64%
VANR_VT                         TABLE                                   2011-11-16 03:05:52      335,413        2,476          303           42 N    .84%
WV_VANR_ID_IDX                  INDEX on VANR_VT                        2011-12-13 13:05:54      337,452                                        N    .00%
VANR_WORK_CATG                  TABLE                                   2011-11-16 03:05:52      159,673        1,356            2           19 N    .86%




I'm sure I'm going to find this very useful next time I get that question.. It also nicely pinpoints any objects that you should immediately consider analyzing.

Of course you need to understand your application to really read this completely.  Especially with updates. Are they updating an index column ? Did a massive update just change the number of distinct values, and the range of values for an indexed column? Were the updates just updates to an "update date" column that isn't used (except for audits).

Lastely, it doesn't describe anything about how the statistics were gathered (histograms or not, which columns etc, etc).

Do not use this as the absolutely truth, but at least it will help point you in the right direction.