Hi,
Today, let me discuss how the PostgreSQL optimizer picks the better execution plan.
Seq Scan (vs) Index Scan
----------------------------------
Both scans have their own pros and cons. Sometimes a sequential scan gives the better execution time, and sometimes an index scan does.
Let's find out.
Step 1:-
postgres=# CREATE TABLE TEST(T INT PRIMARY KEY);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
postgres=# INSERT INTO TEST VALUES(GENERATE_SERIES(1,100));
INSERT 0 100
Step 2:-
Check the number of pages occupied by the table "test" in pg_class.
postgres=# ANALYZE TEST;
ANALYZE
postgres=# SELECT RELNAME,RELPAGES FROM PG_CLASS WHERE relname like 'test';
relname | relpages
---------+----------
test | 1
Step 3:-
Check the execution plan below.
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..2.25 rows=1 width=4) (actual time=0.032..0.063 rows=1 loops=1)
Filter: (t = 10)
Total runtime: 0.110 ms
postgres=# show seq_page_cost ;
seq_page_cost
---------------
1
(1 row)
postgres=# show random_page_cost ;
random_page_cost
------------------
4
(1 row)
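The 2.25 in the Seq Scan plan above can be reproduced by hand. Here is a minimal sketch of the planner's sequential-scan arithmetic; seq_page_cost comes from the session above, while cpu_tuple_cost (0.01) and cpu_operator_cost (0.0025) are not shown above and are assumed to be at their documented defaults:

```python
# Sketch of the planner's sequential-scan cost estimate for "test".
seq_page_cost = 1.0         # from SHOW seq_page_cost above
cpu_tuple_cost = 0.01       # documented default: cost to process one row
cpu_operator_cost = 0.0025  # documented default: cost to evaluate (t = 10) once

relpages = 1     # pg_class.relpages for "test"
reltuples = 100  # pg_class.reltuples for "test"

total_cost = (seq_page_cost * relpages
              + cpu_tuple_cost * reltuples
              + cpu_operator_cost * reltuples)
print(total_cost)  # 2.25, matching cost=0.00..2.25 in the plan
```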
Let's make it take an index scan by swapping the page costs.
postgres=# set seq_page_cost to 4;
SET
postgres=# set random_page_cost to 1;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..2.27 rows=1 width=4) (actual time=0.049..0.053 rows=1 loops=1)
Index Cond: (t = 10)
Total runtime: 0.113 ms
In this case, we need an index scan as well as a heap scan (fetching the data from the table pages), which adds a bit of overhead. A sequential scan avoids that extra indirection.
So, depending on the number of pages occupied by the table and the page costs, the optimizer picks the better execution plan.
Step 4:
Let's increase the number of pages of the table by inserting some data.
postgres=# INSERT INTO TEST VALUES(generate_series(1,1000));
INSERT 0 1000
postgres=# ANALYZE TEST;
ANALYZE
postgres=# SELECT RELNAME,RELPAGES FROM PG_CLASS WHERE relname like 'test';
relname | relpages
---------+----------
test | 4
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..8.27 rows=1 width=4) (actual time=0.019..0.023 rows=1 loops=1)
Index Cond: (t = 10)
Total runtime: 0.078 ms
postgres=# show random_page_cost ;
random_page_cost
------------------
4
(1 row)
postgres=# show seq_page_cost ;
seq_page_cost
---------------
1
(1 row)
Let's make it use a sequential scan again.
postgres=# set random_page_cost to 10;
SET
postgres=# set seq_page_cost to 1;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..16.50 rows=1 width=4) (actual time=0.022..0.301 rows=1 loops=1)
Filter: (t = 10)
Total runtime: 0.337 ms
(3 rows)
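The pattern in these two runs can be captured with a small model. This is an illustration only, not PostgreSQL's real index-cost formula: a seq scan's cost grows with both pages and rows, while a single-row index lookup touches only a handful of pages (the number of pages touched is an assumption here):

```python
# Toy comparison of a full seq scan vs. a single-row index lookup.
# CPU costs are the documented defaults; index_lookup_cost's page
# count is an assumed figure, not the planner's real formula.
seq_page_cost, random_page_cost = 1.0, 4.0
cpu_tuple_cost, cpu_operator_cost = 0.01, 0.0025

def seq_scan_cost(relpages, reltuples):
    # read every page, then apply the filter to every row
    return (seq_page_cost * relpages
            + (cpu_tuple_cost + cpu_operator_cost) * reltuples)

def index_lookup_cost(pages_touched=3):
    # assumption: a few index pages plus one heap page, random access
    return random_page_cost * pages_touched

for relpages, reltuples in ((1, 100), (4, 1000)):
    seq, idx = seq_scan_cost(relpages, reltuples), index_lookup_cost()
    winner = "seq scan" if seq < idx else "index scan"
    print(f"pages={relpages}: seq={seq:.2f} index={idx:.2f} -> {winner}")
```

With one page the seq scan is cheaper; by four pages the index lookup wins, which is the same flip the planner made between Step 3 and Step 4.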
Let's look at another factor that influences the choice between sequential and index scans: value frequency statistics.
Step 1
> drop table test;
> CREATE TABLE TEST(T INT);
> INSERT INTO TEST SELECT 2 FROM GENERATE_SERIES(1,1);
> INSERT INTO TEST SELECT 1 FROM GENERATE_SERIES(1,10000);
> CREATE INDEX IDX ON TEST(T);
Step 2
Let's check most_common_freqs in pg_stats.
postgres=# SELECT * FROM PG_STATS WHERE TABLENAME='test' and schemaname='public';
-[ RECORD 1 ]-----+-----------
schemaname | public
tablename | test
attname | t
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 2
most_common_vals | {1}
most_common_freqs | {0.999833}
histogram_bounds |
correlation | 0.999001
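The planner turns these statistics into a row estimate: the selectivity of `t = 1` is read straight out of most_common_freqs, and multiplying it by the table's row count gives the expected rows. A sketch, using the 10,001 rows we inserted (the rows figure in the plans that follow differs slightly because reltuples is itself a sampled estimate):

```python
# Sketch: how most_common_freqs feeds the planner's row estimate.
most_common_vals = [1]
most_common_freqs = [0.999833]  # from the pg_stats output above
reltuples = 10001               # rows we inserted into "test"

selectivity = most_common_freqs[most_common_vals.index(1)]
estimated_rows = round(selectivity * reltuples)
print(estimated_rows)  # nearly every row in the table
```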
Step 3
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..181.64 rows=11009 width=4) (actual time=0.215..94.610 rows=10000 loops=1)
Filter: (t = 1)
Total runtime: 180.054 ms
Here, Seq Scan is the best plan, because 99.9% of the table entries have the value "1".
Let's force it to use an index scan, which adds more overhead.
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx on test (cost=0.00..17393.65 rows=11009 width=4) (actual time=0.210..264.606 rows=10000 loops=1)
Index Cond: (t = 1)
Total runtime: 388.713 ms
See how much difference there is compared to the Seq Scan: we spent more than twice the execution time.
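The slowdown has a simple page-count explanation: the seq scan reads each table page once, while an index scan can trigger a heap fetch for every matching row. A back-of-the-envelope sketch (the rows-per-page figure is an assumption for a narrow single-int table):

```python
# Back-of-the-envelope page reads for "t = 1" (10,000 matching rows).
matching_rows = 10000
rows_per_page = 226  # assumption: roughly this many int rows per 8 KB page
table_pages = -(-10001 // rows_per_page)  # ceiling division over all rows

seq_scan_page_reads = table_pages      # each page read exactly once
index_scan_heap_reads = matching_rows  # worst case: one heap fetch per row

print(seq_scan_page_reads, index_scan_heap_reads)  # 45 vs 10000
```

Reading ~45 pages sequentially beats up to 10,000 random heap fetches, which is why the optimizer refuses the index here unless we distort the costs.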
Step 4
postgres=# set seq_page_cost to 1;
SET
postgres=# set random_page_cost to 4;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using idx on test (cost=0.00..8.29 rows=2 width=4) (actual time=1.567..1.572 rows=1 loops=1)
Index Cond: (t = 2)
Total runtime: 1.853 ms
(3 rows)
In this case, the index scan is the good plan, because we have only one row in the table with the value 2.
Let's force a Seq Scan, which takes more time.
postgres=# set random_page_cost to 400; -- a big value, so the planner picks the seq scan instead of the index scan
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=2;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..181.64 rows=2 width=4) (actual time=0.447..32.853 rows=1 loops=1)
Filter: (t = 2)
Total runtime: 32.995 ms
See the difference: we spent nearly 18 times the previous execution time.
So, as per our test cases, the optimizer always takes seq_page_cost, random_page_cost, the number of pages, and the most_common_vals/most_common_freqs statistics into account when choosing the better execution plan for a given query.
I will keep posting about the optimizer.
-Dinesh