mongodb sharding keynote

February 12th, 2012 No comments

在公司内部做的一个简单的关于mongodb sharding的介绍

Categories: Oracle Tags:

再次认识for 循环

September 28th, 2011 No comments

SYS@XFAN: SQL> select count(*) from t;

  COUNT(*)
———-
  16320128
 

如下代码:
for i in (select * from t) loop
       dbms_lock.sleep(1);    — some operations which will take a long time
end loop;
假设1:  select * from t 被执行一次, 然后将结果集存放到一个位置,然后可以有i 来逐步遍历。 这里sql 的执行效率是很高的,但需要消耗掉很多的内存。
如果果真如此,我们必须避免对大结果集的类似操作。

假设2:  每次读取1行,这样就很容易产生2个问题。 1.该sql 会执行很久,增大了ora-1555 错误的可能性。 2.  context switch 的次数太多。

大多数人为了 context switch 的次数太多,
所以,人们会使用bulk collect 的方法来避免context switch 太多的弊端。而为了避免消耗太多内存, 有可以使用limit n 的情况, 可以limit 10, limit 30 等等。

Oracle 会如何操作呢?

Before the test:
Session1:  – run the plsql
SYS@XFAN: SQL> select sid from v$mystat where rownum=1;

       SID
———-
       362

Elapsed: 00:00:00.02
Session 2:   — mointor the sql ELAPSED_TIME

SYS@XFAN: SQL> select to_char(sysdate,’hh24:mi:ss’), EXECUTIONS,ELAPSED_TIME,ROWS_PROCESSED from v$sqlstats where sql_id=’6k1fc899x4ud2′;

no rows selected

 

Session 3:   –Monitor pga, uga  statistics
SYS@XFAN: SQL> select name, value from v$sesstat se, v$statname st where st.name in (‘session uga memory’,'session pga memory’,'session uga memory max’,'session pga memory max’,'temp space allocated (bytes)’) and se.sid=362 and se.statistic#=st.statistic#;

NAME                                                                  VALUE
—————————————————————- ———-
session uga memory                                                   311728
session uga memory max                                               377216
session pga memory                                                  1017048
session pga memory max                                              1017048
temp space allocated (bytes)                                              0

begin:
Session 1:
SYS@XFAN: SQL> select count(*) from t;

  COUNT(*)
———-
  16320128

Elapsed: 00:00:07.18

SYS@XFAN: SQL> begin
  2  for i in (select * from t) loop
  3  dbms_lock.sleep(1);
  4  end loop;
  5  end;
  6  /
 
Session 2:
SYS@XFAN: SQL> select to_char(sysdate,’hh24:mi:ss’), EXECUTIONS,ELAPSED_TIME,ROWS_PROCESSED from v$sqlstats where sql_id=’6k1fc899x4ud2′;

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:21:04          1         8132            100

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:21:15          1         8132            100
SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:21:53          1         8132            100

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:22:25          1         8132            100

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:23:59          1         8671            200

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:26:37          1         9846            400

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:26:45          1         9846            400

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:27:11          1         9846            400

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:28:07          1        10501            500

SYS@XFAN: SQL> /

TO_CHAR( EXECUTIONS ELAPSED_TIME ROWS_PROCESSED
——– ———- ———— ————–
07:31:07          1        11579            700
Session 3:
SYS@XFAN: SQL> select name, value from v$sesstat se, v$statname st where st.name in (‘session uga memory’,'session pga memory’,'session uga memory max’,'session pga memory max’,'temp space allocated (bytes)’) and se.sid=482 and se.statistic#=st.statistic#;

NAME                                                                  VALUE
—————————————————————- ———-
session uga memory                                                   713200
session uga memory max                                               848448
session pga memory                                                  1803480
session pga memory max                                              1803480
temp space allocated (bytes)                                              0

SYS@XFAN: SQL> /

NAME                                                                  VALUE
—————————————————————- ———-
session uga memory                                                   713200
session uga memory max                                               848448
session pga memory                                                  1803480
session pga memory max                                              1803480
temp space allocated (bytes)                                              0

SYS@XFAN: SQL> /

NAME                                                                  VALUE
—————————————————————- ———-
session uga memory                                                   713200
session uga memory max                                               848448
session pga memory                                                  1803480
session pga memory max                                              1803480
temp space allocated (bytes)                                              0
从测试结果上看,for 循环并没有使用太多的pga, uga 等去存储整个结果集。从session 2 的监控情况,oracle 每次读取100行,然后缓存。 然后再次读取。

测试有2个意义:
1  不要在loop 中间有太多消耗时间的操作,  无论是使用了bulk collect分批操作 还是for , 否则就增加了ora-1555 发生的可能性。
2  假设select * from t 返回1000行, context switch 的次数是1000/100=10 次,而不是1000次。 如果使用bulk collect   limit n.   n<=100. 
   反而增加了context switch 的次数。

用不同的表测试, 100 的实验结果是不变的。 如有其它意见,欢迎拍砖。

Categories: Oracle Tags:

Library Cache Mutex: X with cursor build lock

September 28th, 2011 No comments

Before starting my introduction, I’d like to recommend a good Blog “latch, Mutex and beyond“. Andrey did a great research about Mutex in 11g. He gave a good introduction about mutex as below,

In 2005 mutexes were introduced with Oracle 10.2.0.2 patchset. Documentation states that “cursor: pin S” wait event should rarely be seen.”
In June 2008, Oracle released Patch 6904068: High CPU usage when there are “cursor: pin S” waits. Patch used “_first_spare_parameter” to tune the waits duration.
In December 2010, Oracle created another Patch 10411618: Add different wait schemes for mutex waits for 11.2. This patch completely changed the mutex waits and introduced exponential backoff. With this enhancement, mutex waits work like Oracle 8i latch waits. The patch tunables are “_mutex_spin_count”, “_mutex_wait_time” and “_mutex_wait_scheme”. This patch was later was included in 11.2.0.2.2 PSU
In May 2011, new MOS note Waitevent: “cursor: pin S” Reference Note [ID 1310764.1] shed some light on “cursor: pin S” waits. I will refer to this note during the post.
On 14 May 2011 Oracle released new version of Patch 10411618 and recommended to reapply the patch. Note 10411618.8 now states that Patch 6904068 is superseded by Patch 10411618
On 17 May 2011 Oracle urgently recommended to apply Patch: 12431716 Unexpected Change in Mutex Wait Behavior on top of the 11.2.0.2.2 PSU.

In our case, it’s different. We use Oracle 11.2.0.2.2 for Solaris SPARC (64-bit) T3, 128 CPU threads
It’s related to the Bug 12633340. After exchange inactive parition, when a very large number of sessions try to parse the same SQL statement and at the same time and excessive contention for cursor build locks with “Library Cache mutex: X and library cache lock” events.
Patch 12791691 contains a fix for bug 12633340 which aims to limit the number of reload failures that will be tolerated. That patch will set that threshold to a default value but if the default value fails to resolve the problem, then this threshold can be set explicitly by setting event 12633340.
The level of the event will be used as the number of allowable reload failures. This level should be set to the highest value that allows the problem to be resolved. Values lower than 3 are not recommended.

In 11.2, Oracle introduced the concept of cursor build locks as a mechanism for serializing the building of a child cursor under a given parent cursor. This causes severe contention for the build lock in extreme scenarios when thousands of sessions concurrently try to parse the same SQL statement. This manifests as waits for the build lock itself and also as library cache mutex contention (since the build lock is just a KGL lock).

There are two kinds of possible failures: reload/sharing failures. Our issue was to do with bind mismatch. What happens was that a LOT of sessions attempt to reload a particular child cursor and get the samebind mismatch failure during reload. These failed reloads resulted in sessions spending a lot of time in scanning the child list over and over again. Since they repeatedly took and released the build lock here, there is a lot of lock/mutex contention.

Now ordinarily, one of the sessions will manage to get the build lock in exclusive mode (while other sessions wait) and will build a child cursor that the other sessions can then share. In this best case scenario, the contention will be very brief. But in a scenario where reload failures are constantly occurring (the reason for the reload failure itself is not relevant here), the build lock and mutex contention will extend much longer. Due to the way build locks are currently architected, sessions could get stuck in the scan for long periods of time,fighting over the build lock and searching the child list over and over. This would continue until some session finally manages to build a child cursor that everyone can share.

If reload failures are known to be occurring, then eliminating the cause of this failures (if possible) should also prevent this bug.

The patch provided here is technically a temporary fix that limits the number of reload failures that we will tolerate. If the number of reload failures in the last 5 minutes, let’s say, exceeds a certain threshold, then we will mark the child cursor as unusable and proceed to create a new one. The rationale is that if a certain number of sessions have tried and failed to reload a particular child cursor, then other sessions are also likely to fail and hence it’s best to abandon it and move on. The default value for the threshold is the same as number of CPUs on the system.

Categories: Oracle Tags:

11g Bug 10082277 – Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo”

September 13th, 2011 No comments

在11.2.0.2的数据库上,我们遇到了oracle 11g的Bug 10082277 Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo” (ORA-4031)

Bug 10082277 Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo” (ORA-4031)
This note gives a brief overview of bug 10082277.
The content was last updated on: 26-AUG-2011
Click here for details of each of the sections below.
Affects:

Range of versions believed to be affected Versions BELOW 12.1
Versions confirmed as being affected

11.2.0.2
11.2.0.1

Platforms affected Generic (all / most platforms affected)

Fixed:

This issue is fixed in

12.1 (Future Release)
11.2.0.3 (Future Patch Set)
11.2.0.2.3 Patch Set Update
11.2.0.2 Bundle Patch 4 for Exadata Database
11.2.0.2 Patch 2 on Windows Platforms
11.2.0.1 Patch 11 on Windows Platforms

Description

Under certain circumstances the “perm” space in PCUR subheaps (11.2.0.1)
or KGLH0 subheaps (11.2.0.2) for cursors in the shared pool may continue
to grow over time with continual additions of memory of the type
“kkscsAddChildNo”.

Over time this can use excessive shared pool memory , even leading
to ORA-4031 errors.

Rediscovery Notes:
A heapdump of the SGA will show one or more very large
subheaps with a name of the form “PCUR^xxxx” or “KGLH0^xxxx”.
Within this subheap there is lots of “perm” space allocated.
If CPRM tracing is enable (event 10235 level 65536) then
the subheap dump shows the perm space as due to “kkscsAddChildNo”
allocations of memory.

Workaround
This issue requires that there is a lack of sharing of the
parent cursor. Hence a workaround, where possible, is to
address the reason for not sharing the parent cursor.
(eg: Check V$SQL_SHARED_CURSOR for why the parent is not
being shared).

该问题在我们的系统上表现为

1. 单个SQL占用了大量的shared pool memory, 这个例子中单个SQL就占用了1.7GB的shared pool memory

select VERSION_COUNT,SHARABLE_MEM from v$sqlarea where
hash_value=2038009379;
 
VERSION_COUNT SHARABLE_MEM
-----------
-- ------------
          
96   1888704961

而在10.2.0.4的数据库上,同样的SQL只占用了4M左右大小的内存

select VERSION_COUNT,SHARABLE_MEM from v$sqlarea where
hash_value=2038009379;
 
VERSION_COUNT SHARABLE_MEM
-----------
-- ------------
          
214      4216097

2. 大部分的内存都被parent cursor (cursor id为65535)占用了

通过Tanel的脚本curheaps.sql 可以查看各个child cursor的大小

SQL> @curheaps 2038009379 65535
old  20:        KGLNAHSH in (&1)
new  20:        KGLNAHSH in (2038009379)
old  21: and    KGLOBT09 like ('&2')
new  21: and    KGLOBT09 like ('65535')
 
 
KGLNAHSH KGLHDPAR             CHILD# KGLHDADR
KGLOBHD0                          SIZE0    SIZE1    SIZE2    SIZE3
--------
-- ---------------- ---------- ---------------- ----------------
--------------------
-- -------- -------- --------
KGLOBHD4            SIZE4    SIZE5 KGLOBHD6            SIZE6    SIZE7
STATUS
--------------
-- -------- -------- ---------------- -------- --------
--------
--
2038009379 0000000F3BC53E78      65535 0000000F3BC53E78

0000000F5BF1E648             *1883443712        *0        0        0
00                      0        0 00                      0
0          1
 
 
old  10:     KSMCHDS = hextoraw('&v_curheaps_kglobhd0')
new  10:     KSMCHDS = hextoraw('0000000F5BF1E648')
 
HEAP  CLASS    ALLOC_COMMENT         BYTES     CHUNKS
---
-- -------- ---------------- ---------- ----------
HEAP0 perm     permanent memor  *1898642464     *474659
HEAP0 free     free memory        26531224     473772
HEAP0 freeabl  kksfbc:hash1           4872         96
HEAP0 freeabl  kgltbtab                912          6
 
old  10:     KSMCHDS = hextoraw('&v_curheaps_kglobhd4')
new  10:     KSMCHDS = hextoraw('00')
 
no rows selected
 
old  10:     KSMCHDS = hextoraw('&v_curheaps_kglobhd6')
new  10:     KSMCHDS = hextoraw('00')
 
no rows selected

另外该问题只发生在client的jdbc driver升级到11g以后,jdbc driver为10g的时候没有这个问题,估计和shared cursor sharing有关系。

SQL的 parent cursor不断增长一方面会使得shared pool的内存耗尽,另外如果发生hard parse耗时非常严重,可能会导致大量的和parse相关的等待时间,例如“cursor: mutex S”。

Oracle有相关的patch可以下载,打上patch后问题解决。

Categories: Oracle Tags:

Wrong plan on MySQL with key_len

August 11th, 2011 No comments

MySQL Optimizer is also cost based, however, it doesn’t smart enough as Oracle. Using EXPLAIN, we can get the execution plan of select query. For the output of EXPLAIN, the column key_len shows the number of bytes to use in the index. For multiple columns index, MySQL can’t use full key length in some case. After my converting table from Memory Engine to InnoDB, I had the performance degradtion and more times consumed of one SQL. The table named sdu_test with index sdu_test_UX (USER_ID,APP_ID,ID), the data type is APP_ID int(10), USER_ID bigint(20), ID bigint(20).

MySQL version: 5.5.12-eb-5.5.12-log
Platform: x86_64 GNU/Linux
Default Engine: InnoDB

The SQL and related explain plan is listed as below

root@localhost >explain SELECT main.MESSAGE_BODY,main.CREATION_DATE FROM sdu_test main WHERE main.USER_ID = 30004163 AND main.APP_ID = 4932 AND main.ID > 3321261 LIMIT 10;
+—-+————-+——-+——+———————+————-+———+————-+——–+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——+———————+————-+———+————-+——–+————-+
| 1 | SIMPLE | main | ref | PRIMARY,sdu_test_UX | sdu_test_UX | 12 | const,const | 495444 | Using where |
+—-+————-+——-+——+———————+————-+———+————-+——–+————-+
1 row in set (0.00 sec)

We can see the key_len is 12, which is only two columns of indexes are used (USER_ID, APP_ID). 12 is calucated by User_ID bigint with 8 bytes + APP_ID int 4 bytes
How can we make all these three columns are used? Analyze table can’t help this. We can fix that using index hint as below,

root@localhost >explain SELECT main.MESSAGE_BODY,main.CREATION_DATE FROM sdu_test main USE KEY (sdu_test_UX) WHERE main.USER_ID = 30004163 AND main.APP_ID = 4932 AND main.ID > 3321261 LIMIT 10;
+—-+————-+——-+——-+—————+————-+———+——+——–+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——-+—————+————-+———+——+——–+————-+
| 1 | SIMPLE | main | range | sdu_test_UX | sdu_test_UX | 20 | NULL | 495444 | Using where |
+—-+————-+——-+——-+—————+————-+———+——+——–+————-+
1 row in set (0.00 sec)

We can see the column type changed from “ref” to “range” and key_len is changed to 20.
After that, SQL has better performance.
It is the bug reported in MySQL 4.0 by Percona and it is not fixed in our version 5.5.

Hope MySQL can detect this and make the opertimizer more mature and stable.

Categories: MySQL Tags:

Pool of Threads in MySQL

July 11th, 2011 No comments

    Pool-of-Threads is introduced to supports a large number of clients that will be maintaining their connections to the database, but will not be sending a constant connection requests to the database. In theory, it is the similar concept as Oracle MTS(Multi-threaded server), which introduced the dispatch and shared server to manage connections in order to support more connections and save memory usage per connections.

     The parameter thread_handling e set to pool-of-threads instead of one-thread-per-connection. eBay patched that in 5.0 version with setting max_connnections=30000. The parameter thread_pool_size is used to control the working thread. In the following, the thread_pool_size is set to 12,

[mysql@~]$ ps -Lf `pgrep mysqld`|awk ‘{print $1,$2,$4,$11}’
UID PID LWP CMD
mysql 2502 2502 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2505 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2506 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2507 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2508 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2509 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2510 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2511 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2512 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2513 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2514 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2515 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2516 /export/home/mysql/mysql/bin/mysqld
mysql 2502 2517 /export/home/mysql/mysql/bin/mysqld

    If thread per connnection mode, we can set max_connections>max_user_connections to save a free connnection for super user to avoid running out of connections or change max_connections by gdb(refer to [1]). However with pool-of-threads is that if all worker threads are doing work (like running long queries) or are locked by a row/table lock no new connections can be established and you can’t login and find out what’s wrong or login and kill queries. In my case, we try to pstack it and found all the threads were waiting for mutex_lock(). The free thread was not available. We could take any acitons and had to kill and restart MySQL.

—————– lwp# 12 / thread# 12 ——————–
fffffd7fff12ff57 lwp_park (0, 0, 0)
fffffd7fff129b64 cond_wait_queue () + 44
fffffd7fff12a0f9 _cond_wait () + 59
fffffd7fff12a126 cond_wait () + 26
fffffd7fff12a169 pthread_cond_wait () + 9
000000000084c3a6 wait_for_lock () + d6
000000000084cf80 thr_multi_lock () + a0
00000000005fe8dc _Z17mysql_lock_tablesP3THDPP8st_tablejjPb () + 14c
0000000000639685 _Z11lock_tablesP3THDP13st_table_listjPb () + 155
0000000000640037 _Z20open_and_lock_tablesP3THDP13st_table_list () + 27
0000000000662c2c _Z12mysql_insertP3THDP13st_table_listR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb () + 1ec
00000000006115f5 _Z21mysql_execute_commandP3THD () + 2e95
0000000000613fa3 _Z11mysql_parseP3THDPcj () + 163
0000000000614e22 _Z16dispatch_command19enum_server_commandP3THDPcj () + e42
000000000061556b _Z10do_commandP3THD () + bb
0000000000609924 ep_thread_proc () + 184
fffffd7fff12fcab _thr_setup () + 5b
fffffd7fff12fee0 _lwp_start ()
—————– lwp# 13 / thread# 13 ——————–
fffffd7fff12ff57 lwp_park (0, 0, 0)
fffffd7fff129b64 cond_wait_queue () + 44
fffffd7fff12a0f9 _cond_wait () + 59
fffffd7fff12a126 cond_wait () + 26
fffffd7fff12a169 pthread_cond_wait () + 9
000000000084c3a6 wait_for_lock () + d6
000000000084cf80 thr_multi_lock () + a0
00000000005fe8dc _Z17mysql_lock_tablesP3THDPP8st_tablejjPb () + 14c
0000000000639685 _Z11lock_tablesP3THDP13st_table_listjPb () + 155
0000000000640037 _Z20open_and_lock_tablesP3THDP13st_table_list () + 27
0000000000662c2c _Z12mysql_insertP3THDP13st_table_listR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb () + 1ec
00000000006115f5 _Z21mysql_execute_commandP3THD () + 2e95
0000000000613fa3 _Z11mysql_parseP3THDPcj () + 163
0000000000614e22 _Z16dispatch_command19enum_server_commandP3THDPcj () + e42
000000000061556b _Z10do_commandP3THD () + bb
0000000000609924 ep_thread_proc () + 184
fffffd7fff12fcab _thr_setup () + 5b
fffffd7fff12fee0 _lwp_start ()
—————– lwp# 14 / thread# 14 ——————–
fffffd7fff13208a sigtimedwait (fffffd7ffec99de0, fffffd7ffec99df0, 0)
fffffd7fff1243ad sigwait () + d
fffffd7fff11c180 __posix_sigwait () + 40
0000000000606404 signal_hand () + 194
fffffd7fff12fcab _thr_setup () + 5b
fffffd7fff12fee0 _lwp_start ()

In MariaDB, the parameter extra-port or extra-max-connections to avoid this case[2] and you can connection to use one more threads to check what happened.

The limitations of this implementation – besides the known ones from MariaDB and MySQL 6.0 [3] – are
(from Percona support):

– The Performance Schema is not supported, information from it should not be taken into account at all.
– The server needs to be built with -DWITH_LIBEVENT=bundled in order to have the thread pool option. No other value for this option is supported (e.g.
any libevent library installed on the system – it must be the bundled library).
– The connection timeout support is not implemented, as in the original MySQL 6.0 thread pool implementation. For example, changes to global or session variable wait_timeout will have no effect.

References:

[1]  http://www.mysqlperformanceblog.com/2010/03/23/too-many-connections-no-problem/
[2] http://kb.askmonty.org/v/pool-of-threads
[3] http://mikaelronstrom.blogspot.com/2009/12/new-threadpool-design.html

Categories: MySQL Tags:

Forecasting Database Performance

July 9th, 2011 No comments

After half a year involved into database capacity project, I have learned some about basic database performance forcast knowledge, still far away to be a capacity expert. The following Presentation is my summary about my understanding of forecasting database performance.


Categories: Oracle Tags:

Oracle 11gR2 smart flash cache

June 28th, 2011 No comments

Oracle smart flash cache是11gR2的new feature. 和Exadata Smart Flash Cache 是两个不同的概念,虽然都和flash card有关.

工作原理是用flash card做为数据库的二级cache,没有被修改过的数据块从db cache中被age out的时候,写入到flash cache中. 如果需要再次读取,可以从flash cache中读取,避免了从磁盘读取. 适用于存在大量db file sequential read的数据库.

在公司内做的一个分享,slides已经upload到www.slideshare.net

Categories: Oracle Tags:

ORA-600 [qksdsEvalBlock0]

April 16th, 2011 No comments

Oracle版本:11.1.0.7.0 for Linux,

现象:alert log中出现,

ORA-00600: internal error code, arguments: [qksdsEvalBlock0], [], [], [], [], [], [], [], [], [], [], []
Line: 12987 – Incident details in: /oracle/XXXXXX/home/diag/rdbms/XXXXXX/XXXXXX/incident/incdir_36801/XXXXXX_j002_30952_i36801.trc

同时出现

/oracle/XXXXXX/home/diag/rdbms/XXXXXX/XXXXXX/trace/XXXXXX_j004_30956.trc:
Line: 12993 – ORA-20000: Unable to analyze TABLE “USER”.”XXXX”, insufficient privileges or does not exist,

经查证,这些数据库enable了AUTOmatical maintenance task 和应用了interval partition ,

从metalink中可以找到unpublished Bug: 7703692,所以,要么不用这个feature,要么patch或者使用11.2.0.1版本.
Symptoms

– Problem Statement:
ORA-00600: internal error code, arguments: [qksdsEvalBlock0], [], [], [], [], [], [], [], [], [],[], []
when running SQL tuning advisor.

—– Call Stack Trace —–
skdstdst ksedst1 ksedst dbkedDefDump ksedmp PGOSF52_ksfdmp dbgexPhaseII dbgexProcessError dbgeExecuteForError dbgePostErrorKGE dbkePostKGE_kgsf kgeadse kgerinv_internal kgerinv kgeasnmierr qksdsEvaBlock qksdsExecute kkoatTabStatsCollec tAll kkoatTabStatsVerify kkoatStatsCollect kkoipt kkoqbc apakkoqb apaqbdDescendents apaqbdDescendents apadrv opitca kksFullTypeCheck rpiswu2 kksSetBindType kksfbc opiexe kpoal8 opiodr PGOSF514_kpoodrc rpiswu2 kpoodr upirtrc kpurcsc kpuexec OCIStmtExecute qksanAnalyzeSql qksanAnalyzeSegSql kestsaInitialRound kestsaAutoTuneSql kestsaAutoTuneDrv kestsTuneSqlDrv kesaiExecAction kesaiTuneSqlDrv 264 spefcifa spefmccallstd pextproc PGOSF489_peftrust PGOSF515_psdexsp rpiswu2 psdextp pefccal pefcal pevm_FCAL pfrinstr_FCAL pfrrun_no_tool pfrrun plsql_run peicnt kkxexe opiexe kpoal8 opiodr kpoodr upirtrc kpurcsc kpuexec OCIStmtExecute PGOSF456_jslvec_e xeccb jslvswu xeccb jslve_execute0 jslve_execute rpiswu2 kkjex1e kkjsexe kkjrdp opirip opidrv sou2o opimai_real ssthrdmain main libc_start_main start

Cause

Query involves an interval-partitioned table.

This is unpublished Bug: 7703692
RELEASE NOTES:
Error ORA-600 [qksdsEvalBlock0] was raised when sql tuning a query involving an interval-partitioned table.
REDISCOVERY INFORMATION:
If you get error ORA-600 [qksdsEvalBlock0] while sql tuning a sql statement involving an interval-partitioned table then you are likely to have encountered this problem.

Solution

– To implement the solution, please execute the following steps::
Check if Patch 7703692 is available for your current platform and Oracle version though Metalink.

If not, log a service request with Oracle Support and request one for unpublished Bug: 7703692.

Categories: Oracle Tags:

乾坤大挪移–利用exchange partition将range partition表转为range hash partition表

March 14th, 2011 No comments

Exchange Partition是一个很好用的功能,在我们的系统中被经常使用。

如果有同学不熟悉,可以参考这篇文章exchange partition

灵活运用这一功能,可以解决很多问题。上周我们利用exchange partition成功的将range partition表转为range hash partition表。

事件背景如下:

Range分区的日志表,大小为1.6T, 每天写入一个分区,每天truncate下一个分区给第二天的数据使用,33个数据分区循环使用再加一个Partmax分区。

问题是Insert操作太多,在高峰时段达到每秒钟近700次,出现大量的HW enqueue,导致数据库不稳定。

解决问题的方法:

将Range分区表转换为Range-Hash分区表,partition level range分区的方法不变,在subpartition level按照primary key ID值做hash分区。这样每个分区上的insert操作大大降低。

传统的方法是用Online Redefinition,但是用online redefinition的问题是需要大量的临时空间来存储中间表,并且准备时间太长。

而利用Exchange Partition可以在一分钟内实现range hash分区。

主要的步骤是这样的:

为了描述问题的方便,我将其他不相干的列都去除掉了,只保留了ID列作为hash partition key和PARTITION_KEY列为Range partition Key.

SQL> desc LOG_TABLE
 
Name                 Null?        Type
 
ID                     NOT NULL NUMBER(38)
 
PARTITION_KEY   NOT NULL NUMBER(4)

1. 首先建立一个range-hash分区的空表,每一个分区含有一个Hash子分区

CREATE TABLE "EXCH_LOG_TABLE"  (
 
"ID"        NUMBER(38, 0) NOT NULL ENABLE
 
"PARTITION_KEY"                   NUMBER(4, 0) NOT NULL)                 
 
PARTITION BY RANGE ("PARTITION_KEY" )  SUBPARTITION BY HASH (ID)
 
(
 
PARTITION "PART0" VALUES LESS THAN (1) ,
 
PARTITION "PART1" VALUES LESS THAN (2) ,
 
PARTITION "PART2" VALUES LESS THAN (3) ,
 
PARTITION "PART3" VALUES LESS THAN (4)  ,
 
PARTITION "PART4" VALUES LESS THAN (5)  ,
 
PARTITION "PART100" VALUES LESS THAN (101)  ,
 
PARTITION "PART101" VALUES LESS THAN (102)  ,
 
PARTITION "PART102" VALUES LESS THAN (103)  ,
 
PARTITION "PART103" VALUES LESS THAN (104)  ,
 
PARTITION "PART104" VALUES LESS THAN (105)  ,
 
PARTITION "PART105" VALUES LESS THAN (106)  ,
 
PARTITION "PART106" VALUES LESS THAN (107)  ,
 
PARTITION "PART107" VALUES LESS THAN (108)  ,
 
PARTITION "PART108" VALUES LESS THAN (109)  ,
 
PARTITION "PART109" VALUES LESS THAN (110)  ,
 
PARTITION "PART110" VALUES LESS THAN (111)  ,
 
PARTITION "PART111" VALUES LESS THAN (112)  ,
 
PARTITION "PART112" VALUES LESS THAN (113)  ,
 
PARTITION "PART113" VALUES LESS THAN (114)  ,
 
PARTITION "PART114" VALUES LESS THAN (115)  ,
 
PARTITION "PART115" VALUES LESS THAN (116)  ,
 
PARTITION "PART116" VALUES LESS THAN (117)  ,
 
PARTITION "PART117" VALUES LESS THAN (118)  ,
 
PARTITION "PART118" VALUES LESS THAN (119)  ,
 
PARTITION "PART119" VALUES LESS THAN (120)  ,
 
PARTITION "PART120" VALUES LESS THAN (121)  ,
 
PARTITION "PART121" VALUES LESS THAN (122)  ,
 
PARTITION "PART122" VALUES LESS THAN (123)  ,
 
PARTITION "PART123" VALUES LESS THAN (124)  ,
 
PARTITION "PART124" VALUES LESS THAN (125)  ,
 
PARTITION "PART125" VALUES LESS THAN (126)  ,
 
PARTITION "PART126" VALUES LESS THAN (127)  ,
 
PARTITION "PART127" VALUES LESS THAN (128)  ,
 
PARTITION "PART128" VALUES LESS THAN (129)  ,
 
PARTITION "PART129" VALUES LESS THAN (130)  ,
 
PARTITION "PART130" VALUES LESS THAN (131)  ,
 
PARTITION "PART131" VALUES LESS THAN (132)  ,
 
PARTITION "PART132" VALUES LESS THAN (133)  ,
 
PARTITION "PARTMAX" VALUES LESS THAN (MAXVALUE) 
 
)
 
SQL> l
 
1* select partition_name,subpartition_name from user_tab_subpartitions where table_name='EXCH_LOG_TABLE'
SQL> /
 
PARTITION_NAME                 SUBPARTITION_NAME
----------------------------
-- ------------------------------
PART0                          SYS_SUBP143
PART1                          SYS_SUBP144
PART2                          SYS_SUBP145
PART3                          SYS_SUBP146
PART4                          SYS_SUBP147
PART100                        SYS_SUBP148
PART101                        SYS_SUBP149
PART102                        SYS_SUBP150
PART103                        SYS_SUBP151
PART104                        SYS_SUBP152
PART105                        SYS_SUBP153
PART106                        SYS_SUBP154
PART107                        SYS_SUBP155
PART108                        SYS_SUBP156
PART109                        SYS_SUBP157
PART110                        SYS_SUBP158
PART111                        SYS_SUBP159
PART112                        SYS_SUBP160
PART113                        SYS_SUBP161
PART114                        SYS_SUBP162
PART115                        SYS_SUBP163
PART116                        SYS_SUBP164
PART117                        SYS_SUBP165
PART118                        SYS_SUBP166
PART119                        SYS_SUBP167
PART120                        SYS_SUBP168
PART121                        SYS_SUBP169
PART122                        SYS_SUBP170
PART123                        SYS_SUBP171
PART124                        SYS_SUBP172
PART125                        SYS_SUBP173
PART126                        SYS_SUBP174
PART127                        SYS_SUBP175
PART128                        SYS_SUBP176
PART129                        SYS_SUBP177
PART130                        SYS_SUBP178
PART131                        SYS_SUBP179
PART132                        SYS_SUBP180
PARTMAX                        SYS_SUBP181
 
39 rows selected.

3. 在维护时间段内,将partition和subpartition做exchange,这两个不能直接交换,需要一个普通表做中转

SQL> CREATE TABLE "TEMP_LOG_TABLE"  (                   
"ID"        NUMBER(38, 0) NOT NULL ENABLE
"PARTITION_KEY"                   NUMBER(4, 0) NOT NULL)  ;
 
Table created.

对每一个partition和subpartition做如下的操做

LOG_TABLE.Partition <=> TEMP_LOG_TABLE <=> EXCH_LOG_TABLE.Subpartition

注意在exchange过程中要如果在表上有constraint,要临时disable(注意要keep index),否则在exchange时会进行数据检查

SQLalter table LOG_TABLE exchange partition PART0 with table TEMP_LOG_TABLE including indexes without validation;
 
Table altered.
 
SQL> alter table EXCH_LOG_TABLE exchange subpartition SYS_SUBP143 with table TEMP_LOG_TABLE including indexes without validation;
 
Table altered.

所有的partition都转换过后,数据就从LOG_TABLE表进入了EXCH_LOG_TABLE表

4. 下一步就是rename表了,还有相关的索引,并且enable第三步中被disable的constraint

SQL> rename LOG_TABLE to OLD_LOG_TABLE;
 
Table renamed.
 
SQL> rename EXCH_LOG_TABLE to LOG_TABLE;
 
Table renamed.

5. 到这里,Range Partition表就变成了Range Hash partition表了,但是问题还没有解决。因为每个Range分区下面只有一个Hash分区,还是有问题。

这时要在下一个分区被Truncate后为空的时候做subpartiiton split。比如说下一个分区是PART130,我们要将其拆分为16个分区:

SQL> alter table LOG_TABLE modify partition PART130 add subpartition;
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> /
 
Table altered.
 
SQL> select partition_name,subpartition_name from user_tab_subpartitions where table_name='LOG_TABLE';
 
PARTITION_NAME                 SUBPARTITION_NAME
----------------------------
-- ------------------------------
PART0                          SYS_SUBP143
PART1                          SYS_SUBP144
PART2                          SYS_SUBP145
PART3                          SYS_SUBP146
PART4                          SYS_SUBP147
PART100                        SYS_SUBP148
PART101                        SYS_SUBP149
PART102                        SYS_SUBP150
PART103                        SYS_SUBP151
PART104                        SYS_SUBP152
PART105                        SYS_SUBP153
PART106                        SYS_SUBP154
PART107                        SYS_SUBP155
PART108                        SYS_SUBP156
PART109                        SYS_SUBP157
PART110                        SYS_SUBP158
PART111                        SYS_SUBP159
PART112                        SYS_SUBP160
PART113                        SYS_SUBP161
PART114                        SYS_SUBP162
PART115                        SYS_SUBP163
PART116                        SYS_SUBP164
PART117                        SYS_SUBP165
PART118                        SYS_SUBP166
PART119                        SYS_SUBP167
PART120                        SYS_SUBP168
PART121                        SYS_SUBP169
PART122                        SYS_SUBP170
PART123                        SYS_SUBP171
PART124                        SYS_SUBP172
PART125                        SYS_SUBP173
PART126                        SYS_SUBP174
PART127                        SYS_SUBP175
PART128                        SYS_SUBP176
PART129                        SYS_SUBP177
PART130                        SYS_SUBP196
PART130                        SYS_SUBP195
PART130                        SYS_SUBP194
PART130                        SYS_SUBP193
PART130                        SYS_SUBP192
PART130                        SYS_SUBP191
PART130                        SYS_SUBP190
PART130                        SYS_SUBP189
PART130                        SYS_SUBP188
PART130                        SYS_SUBP187
PART130                        SYS_SUBP186
PART130                        SYS_SUBP185
PART130                        SYS_SUBP184
PART130                        SYS_SUBP183
PART130                        SYS_SUBP182
PART130                        SYS_SUBP178
PART131                        SYS_SUBP179
PART132                        SYS_SUBP180
PARTMAX                        SYS_SUBP181
 
54 rows selected.

这样当所有的分区循环一遍后,所有的Range Partition就都拥有了32个子分区。相应的问题也就解决了。

Categories: Oracle Tags: