排序操作
創(chuàng)新互聯公司專注于鶴慶企業(yè)網站建設,響應式網站,電子商務商城網站建設。鶴慶網站建設公司,為鶴慶等地區(qū)提供建站服務。全流程按需定制制作,專業(yè)設計,全程項目跟蹤,創(chuàng)新互聯公司專業(yè)和態(tài)度為您提供的服務
排序是計算機承擔的最基本操作之一,尤其是在數據庫處理領域,oracle也不例外。可能需要oracle排序數據的操作包括以下幾種;
(1)創(chuàng)建一個索引
(2)通過group by,unique或distinct關鍵字對數據進行分組或聚合
(3)因為使用order by子句使得數據按照排好的順序返回
(4)使用排序合并方法聯結表或結果集
(5)使用集合操作union,intersect或minus
(6)執(zhí)行特定的子查詢
查詢可能需要可觀的資源,具體說明如下:
CPU總是要消耗的。需要CPU的數量和需要排序的結果集大小成正比。
oracle分配一塊內存用來排序。這塊內存來自于程序全局區(qū)(PGA).可用的PGA內存總量通常取決于MEMORY_TARGET或PGA_AGGREGATE_TARGET的值
如果內存區(qū)域不足以完成排序,oracle會分配一個或多個臨時段到臨時表空間。這個叫作磁盤排序。磁盤排序會增加額外的開銷,包括在臨時段中分配空間和把數據寫入臨時表空間以及磁盤讀出的IO消耗。
1.1 最優(yōu),一次或多次排序
可供執(zhí)行排序操作的內存量是決定排序性能的最重要的因數。如果供排序使用的內存量足以使排序在內存中完成,則性能將是最佳的,實際上oracle將這種類型的排序操作稱為最優(yōu)排序。
如果沒有足夠的內存,當執(zhí)行排序操作時,oracle必須讀寫臨時段。在一次排序中,orale僅需要寫入(然后讀回)一個短。在多次排序中,oracle需要寫并讀回多個排序段。需要的排序次數越多。排序中所包含的IO就越多,排序性能也就越差。
排序所需要的IO隨著排序次數的增加而急劇增長,排序次數最終成為影響sql性能的主要因素。
1.2 衡量排序活動
現在讓我們來看看如何能夠衡量排序活動
優(yōu)化器計算它認為將需要的內存量,并且你可以通過檢查dbms_xplan輸出的tempSpc字段看到這些(即使在你執(zhí)行sql之前)
SQL> explain plan for select * from customers order by cust_last_name,cust_first_name,cust_year_of_birth;
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 2792773903
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 55500 | 9810K| | 2609 (1)| 00:00:02 |
| 1 | SORT ORDER BY | | 55500 | 9810K| 12M| 2609 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 9810K| | 405 (1)| 00:00:01 |
---------------------------------------------------------------------------------------
9 rows selected.
DBMS_XPLAN產生的值是一個估計值,但是它可以給你一個對于所需內存的大體感受。
V$SQL_WORKAREA視圖包含于數據庫中已發(fā)生的內存使用及排序行為的相關的詳細統(tǒng)計信息。
WITH sql_workarea AS
(SELECT sql_id || '-' || child_number sql_id_child,
operation_type operation,
last_execution last_exec,
round(active_time / 1000000, 2) seconds,
optimal_executions || '/' || onepass_executions || '/' ||
multipasses_executions olm,
substr(sql_text, 1, 155) sql_text,
rank() over(ORDER BY active_time DESC) ranking
FROM v$sql_workarea
JOIN v$sql
USING (sql_id, child_number))
SELECT sql_id_child "SQL_ID -CHILD",
seconds,
operation,
last_exec,
olm "O/1/M",
sql_text
FROM sql_workarea
WHERE ranking <= 3
ORDER BY ranking;
1.3 跟蹤排序活動
SQL> oradebug event 10032 trace name context forever,level 12;
---- Sort Parameters ------------------------------
sort_area_size 12691456
sort_area_retained_size 12691456
sort_multiblock_read_count 1
max intermediate merge width 773
*** 2014-12-15 13:34:36.765
sorcls: sorp 0x7fd8df3ea1c8
---- Sort Statistics ------------------------------
Input records 55500
Output records 55500
Total number of comparisons performed 656484
Comparisons performed by in-memory sort 656484
Total amount of memory used 12691456
Uses version 2 sort
---- End of Sort Statistics -----------------------
1.4 使用索引規(guī)避排序
如果再order by子句中的部分或全部列上存在索引,oracle有可能使用索引來按照要求的順序獲取記錄,因此也避免了排序操作。
假如索引是出現于order by子句里的列相同的列上,oracle可以直接從索引中按照索引排序的順序讀取記錄。然而,按鍵的順序讀取記錄需要一塊接一塊地全掃描索引葉子塊。雖然快速全掃描比常規(guī)的全索引掃描高效得多,但是快速全掃描無法按索引順序返回記錄。因此也不能用來避免排序操作。
SQL> select * from customers order by cust_last_name,cust_first_name,cust_year_of_birth;
55500 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2792773903
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 55500 | 9810K| | 2609 (1)| 00:00:02 |
| 1 | SORT ORDER BY | | 55500 | 9810K| 12M| 2609 (1)| 00:00:02 |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 9810K| | 405 (1)| 00:00:01 |
----------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1456 consistent gets
1454 physical reads
0 redo size
6366832 bytes sent via SQL*Net to client
41213 bytes received via SQL*Net from client
3701 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
55500 rows processed
然而,如果我們在order by的列上創(chuàng)建一個索引,并使用first_rows提示(或index提示),oracle將使用索引替代排序:
SQL> create index cust_member_i on customers(cust_last_name,cust_first_name,cust_year_of_birth);
Index created.
SQL> select /*+ index(customers cust_member_i) */* from customers order by cust_last_name,cust_first_name,cust_year_of_birth;
55500 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3023887059
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 55500 | 9810K| 20550 (1)| 00:00:15 |
| 1 | TABLE ACCESS BY INDEX ROWID| CUSTOMERS | 55500 | 9810K| 20550 (1)| 00:00:15 |
| 2 | INDEX FULL SCAN | CUST_MEMBER_I | 55500 | |225 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
26557 consistent gets
1701 physical reads
0 redo size
11063057 bytes sent via SQL*Net to client
41213 bytes received via SQL*Net from client
3701 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
55500 rows processed
雖然使用索引就不再需要排序了,但是同時讀取索引和表塊,已經按塊順次讀取這種并不高效的掃描方式所帶來的開銷,比僅使用全表掃描讀取表塊的方式要欠佳很多。通常,這意味著了為了避免排序而使用索引,實際上會到導致更差的性能。然而,使用索引的再檢索第一行記錄時速度更快。因為一旦需要的記錄被檢索到,它就立即返回。相比之下排序的方法要求再任一記錄返回之前,全部記錄都必須被檢索并完成排序。因此,在優(yōu)化器目標為first_rows_n時,優(yōu)化器傾向于使用索引,而在目標是all_rows時,則會使用全表掃描。
2 分組與聚合
聚合(aggregate)操作基于多行記錄返回數據,平均,求和,最大和最小值等。分組(grouping)操作允許我們對共享相同的group by值的記錄集返回它的集合。
2.1 聚合操作
聚合操作(sum和average)必須處理輸入數據的每一行記錄。因此,它們通常和全表掃描聯系在一起:
SQL> select sum(quantity_sold) from sales;
Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 525 (2)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 3 | | | | |
| 2 | PARTITION RANGE ALL| | 918K|2691K| 525 (2)| 00:00:01 | 1 | 28 |
| 3 | TABLE ACCESS FULL | SALES | 918K|2691K| 525 (2)| 00:00:01 | 1 | 28 |
-------------------------------------------------------------------------------------------
如果被聚合上的列存在索引,對于這個索引的快速全掃描通常會更高效:
2.2 最大值和最小值
與大多數其他聚合操作不同,如果在相關列上存在索引,max和min操作并不需要讀取每一行記錄。如果存在B樹索引,我們可以通過檢查第一個或最后一個索引項來確定最大值或最小值,這僅需要3-5個邏輯讀的開銷
SQL> select max(amount_sold) from sales;(沒有索引)
Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 525 (2)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 5 | | | | |
| 2 | PARTITION RANGE ALL| | 918K|4486K| 525 (2)| 00:00:01 | 1 | 28 |
| 3 | TABLE ACCESS FULL | SALES | 918K|4486K| 525 (2)| 00:00:01 | 1 | 28 |
-------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1635 consistent gets
1619 physical reads
0 redo size
536 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
有索引的情況:
SQL> select max(amount_sold) from sales;
Execution Plan
----------------------------------------------------------
Plan hash value: 781264156
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| AMOUNT_SOLD_IDX | 1 | 5 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
2 physical reads
0 redo size
536 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
然而,如果我們要同時查找最大值和最小值,oracle需要花費很多的邏輯讀,此時全掃描索引代價會更高,oracle選擇全表掃描。
SQL> select max(amount_sold),min(amount_sold) from sales;
Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 525 (2)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 5 | | | | |
| 2 | PARTITION RANGE ALL| | 918K|4486K| 525 (2)| 00:00:01 | 1 | 28 |
| 3 | TABLE ACCESS FULL | SALES | 918K|4486K| 525 (2)| 00:00:01 | 1 | 28 |
-------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1635 consistent gets
1619 physical reads
0 redo size
618 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
實際上分別提交MAX和MIN查詢然后將結果合并到一起是一種更好的辦法:
SQL> select max_sold,min_sold from(select max(amount_sold)max_sold from sales)maxt,(select min(amount_sold) min_sold from sales)mint;
Execution Plan
----------------------------------------------------------
Plan hash value: 3650580342
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 26 | 6 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 5 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| AMOUNT_SOLD_IDX | 1 | 5 | 3 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 5 | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| AMOUNT_SOLD_IDX | 1 | 5 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
602 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
2.3 "前N"查詢
在sales表中按美元計銷量最好的前10種商品,以下查詢肯定是不對的:
SQL> select * from sales where rownum<=10 order by amount_sold desc;
PROD_ID CUST_ID TIME_IDCHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD
---------- ---------- --------- ---------- ---------- ------------- -----------
13 987 10-JAN-98 3 999 11232.16
13 1660 10-JAN-98 3 999 11232.16
13 1762 10-JAN-98 3 999 11232.16
13 1843 10-JAN-98 3 999 11232.16
13 4663 10-JAN-98 3 999 11232.16
13 2273 10-JAN-98 3 999 11232.16
13 2380 10-JAN-98 3 999 11232.16
13 2683 10-JAN-98 3 999 11232.16
13 2865 10-JAN-98 3 999 11232.16
13 1948 10-JAN-98 3 999 11232.16
10 rows selected.
這是因為對where的處理會先于order by,因此這個查詢將獲取它最先發(fā)現的的10條記錄,然后對它們進行排序。這樣的結果不是真正的前10.
下面的查詢更合理:
SQL> select /* top10 subquery */ * from(select cust_id,prod_id,time_id,amount_sold from sales order by amount_sold desc) where rownum<=10;
CUST_ID PROD_ID TIME_IDAMOUNT_SOLD
---------- ---------- --------- -----------
3948 18 26-APR-99 1782.72
4150 18 26-JUN-99 1782.72
40 18 26-JUN-99 1782.72
33724 18 21-JUN-99 1782.72
32863 18 21-JUN-99 1782.72
31364 18 21-JUN-99 1782.72
10864 18 21-JUN-99 1782.72
10620 18 21-JUN-99 1782.72
6490 18 21-JUN-99 1782.72
4788 18 21-JUN-99 1782.72
Execution Plan
----------------------------------------------------------
Plan hash value: 443584055
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 480 | 4 (0)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | VIEW | | 10 | 480 | 4 (0)| 00:00:01 | | |
| 3 | TABLE ACCESS BY GLOBAL INDEX ROWID| SALES | 918K| 19M| 4 (0)| 00:00:01 | ROWID | ROWID |
| 4 | INDEX FULL SCAN DESCENDING | AMOUNT_SOLD_IDX | 10 | | 3 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
908 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
STOPKEY表示不用將排序的結果全部返回。
使用分析函數:
SQL> set lines 200
SELECT /* top10 dense_rank */
*
FROM (SELECT cust_id,
prod_id,
time_id,
amount_sold,
dense_rank() over(ORDER BY amount_sold DESC) ranking
FROM sales)
9 WHERE ranking <= 10;
1355 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2275521554
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 918K| 53M| | 6623 (1)| 00:00:05 | | |
|* 1 | VIEW | | 918K| 53M| | 6623 (1)| 00:00:05 | | |
|* 2 | WINDOW SORT PUSHED RANK| | 918K| 19M| 31M| 6623 (1)| 00:00:05 | | |
| 3 | PARTITION RANGE ALL | | 918K| 19M| | 525 (2)| 00:00:01 | 1 | 28 |
| 4 | TABLE ACCESS FULL | SALES | 918K| 19M| | 525 (2)| 00:00:01 | 1 | 28 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RANKING"<=10)
2 - filter(DENSE_RANK() OVER ( ORDER BY INTERNAL_FUNCTION("AMOUNT_SOLD") DESC )<=10)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1635 consistent gets
1619 physical reads
0 redo size
31018 bytes sent via SQL*Net to client
1513 bytes received via SQL*Net from client
92 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1355 rows processed
盡管我對于子查詢得到前10的方法有所保留,但是它更高效。dense_rank方法比排序的子查詢方法消耗更多的內存和CPU。
2.4 分組操作
分組(group by)操作允許sql語句對一批列中的每個唯一值返回一行記錄,并且對每個一個這樣的組計算聚合。例如,下面的語句對每一個國家代碼返回平均信用限額:
SQL> select country_id,avg(cust_credit_limit) from customers group by country_id;
19 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1577413243
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 171 | 407 (1)| 00:00:01 |
| 1 | HASH GROUP BY | | 19 | 171 | 407 (1)| 00:00:01 |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 487K| 405 (1)| 00:00:01 |
--------------------------------------------------------------------------------
標題名稱:排序,分組和集合操作
本文來源:http://m.rwnh.cn/article16/jipsdg.html
成都網站建設公司_創(chuàng)新互聯,為您提供網站建設、網頁設計公司、商城網站、微信小程序、Google、
聲明:本網站發(fā)布的內容(圖片、視頻和文字)以用戶投稿、用戶轉載內容為主,如果涉及侵權請盡快告知,我們將會在第一時間刪除。文章觀點不代表本網站立場,如需處理請聯系客服。電話:028-86922220;郵箱:631063699@qq.com。內容未經允許不得轉載,或轉載時需注明來源: 創(chuàng)新互聯