Query Tuning Tips

Query Tuning
-------------------

Correlated Subqueries (vs) Joins (vs) Sets
++++++++++++++++++++++++++++++++++++++++++

Correlated subqueries, joins, and set operators are all used to combine two collections of rows, whether or not those collections overlap.

1. Correlated subqueries are expensive compared to joins.
2. However, correlated subqueries use less memory than joins.
3. Joins use built-in algorithms such as Hash Join (small sets of rows) and Merge Join (huge sets of rows, plus a sort), which usually makes them much faster.
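
To see which join algorithm the planner picks, we can compare the EXPLAIN output before and after discouraging hash joins for the session. This is only a minimal sketch: the demo_l and demo_r tables are hypothetical, and the plans you get depend on your PostgreSQL version, settings, and data.

```sql
-- Hypothetical demo tables, not part of the original test case
CREATE TEMP TABLE demo_l(t int);
CREATE TEMP TABLE demo_r(t1 int);
INSERT INTO demo_l SELECT generate_series(1, 1000);
INSERT INTO demo_r SELECT generate_series(500, 1499);

-- Refresh statistics so the row estimates are realistic
ANALYZE demo_l;
ANALYZE demo_r;

-- Plan with default planner settings
EXPLAIN SELECT * FROM demo_l AS l INNER JOIN demo_r AS r ON r.t1 = l.t;

-- Discourage (not forbid) hash joins for this session and look again
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM demo_l AS l INNER JOIN demo_r AS r ON r.t1 = l.t;

-- Restore the default
RESET enable_hashjoin;
```

Comparing the top node of the two plans shows which algorithm the planner falls back to and how its estimated cost differs.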

See the examples below.

-> CREATE TABLE TEST(T INT);
-> CREATE TABLE TEST1(T1 INT);
-> INSERT INTO TEST VALUES(GENERATE_SERIES(1,1000));
-> INSERT INTO TEST1 VALUES(GENERATE_SERIES(500,1499));
Correlated Subquery
-------------------
postgres=# EXPLAIN SELECT * FROM TEST AS OUT WHERE OUT.T IN (SELECT INN.T1 FROM TEST1 AS INN WHERE INN.T1=OUT.T);
QUERY PLAN
------------------------------------------------------------------
 Seq Scan on test "out" (cost=0.00..48043.00 rows=1200 width=4)
   Filter: (SubPlan 1)
   SubPlan 1
     -> Seq Scan on test1 inn (cost=0.00..40.00 rows=1 width=4)
           Filter: (t1 = "out".t)
(5 rows)
To tune this correlated subquery, use "EXISTS" instead of "IN". (Note: whether IN/NOT IN can be replaced with EXISTS/NOT EXISTS depends on the subquery logic.)

postgres=# EXPLAIN SELECT * FROM TEST AS OUT WHERE EXISTS (SELECT INN.T1 FROM TEST1 AS INN WHERE INN.T1=OUT.T);
QUERY PLAN
-------------------------------------------------------------------------
 Hash Semi Join (cost=26.50..54.25 rows=1000 width=4)
   Hash Cond: ("out".t = inn.t1)
   -> Seq Scan on test "out" (cost=0.00..14.00 rows=1000 width=4)
   -> Hash (cost=14.00..14.00 rows=1000 width=4)
         -> Seq Scan on test1 inn (cost=0.00..14.00 rows=1000 width=4)
(5 rows)
Observe the cost of both queries: the estimated total cost drops from 48043.00 with IN to 54.25 with EXISTS.
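
The note about subquery logic matters most for the negated forms. NOT IN and NOT EXISTS return different rows as soon as the subquery produces a NULL, so they cannot be swapped blindly. A minimal sketch, using hypothetical demo_outer and demo_inner tables:

```sql
-- Hypothetical demo tables, not part of the original test case
CREATE TEMP TABLE demo_outer(t int);
CREATE TEMP TABLE demo_inner(t1 int);
INSERT INTO demo_outer VALUES (1), (2), (3);
INSERT INTO demo_inner VALUES (1), (NULL);

-- NOT IN: returns no rows, because comparing with the NULL yields "unknown"
SELECT o.t
FROM demo_outer AS o
WHERE o.t NOT IN (SELECT i.t1 FROM demo_inner AS i);

-- NOT EXISTS: returns 2 and 3, the NULL simply never matches
SELECT o.t
FROM demo_outer AS o
WHERE NOT EXISTS (SELECT 1 FROM demo_inner AS i WHERE i.t1 = o.t);
```

If the inner column is declared NOT NULL, the two forms return the same rows and the rewrite is safe.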
Joins
-----
postgres=# EXPLAIN SELECT * FROM TEST AS OUT INNER JOIN TEST1 AS INN ON INN.T1=OUT.T;
QUERY PLAN
-------------------------------------------------------------------------
 Hash Join (cost=26.50..54.25 rows=1000 width=8)
   Hash Cond: ("out".t = inn.t1)
   -> Seq Scan on test "out" (cost=0.00..14.00 rows=1000 width=4)
   -> Hash (cost=14.00..14.00 rows=1000 width=4)
         -> Seq Scan on test1 inn (cost=0.00..14.00 rows=1000 width=4)
(5 rows)
Set Operators (Intersect)
-------------------------
postgres=# EXPLAIN SELECT * FROM TEST INTERSECT SELECT * FROM TEST1;
QUERY PLAN
---------------------------------------------------------------------------------
 HashSetOp Intersect (cost=0.00..53.00 rows=1000 width=4)
   -> Append (cost=0.00..48.00 rows=2000 width=4)
         -> Subquery Scan on "*SELECT* 1" (cost=0.00..24.00 rows=1000 width=4)
               -> Seq Scan on test (cost=0.00..14.00 rows=1000 width=4)
         -> Subquery Scan on "*SELECT* 2" (cost=0.00..24.00 rows=1000 width=4)
               -> Seq Scan on test1 (cost=0.00..14.00 rows=1000 width=4)
Of the three plans above, the set operator's plan has the lowest estimated cost (53.00 against 54.25). This will not be the same in all cases. The execution plan depends on the conditions in the query, the number of rows, and the row width.

In some cases a join costs much less than an INTERSECT, because set operators have to remove duplicates with an internal sort or hash step.
After inserting more records into both tables:

postgres=# SELECT COUNT(*) FROM TEST;
count
-------
32000
(1 row)

postgres=# SELECT COUNT(*) FROM TEST1;
count
-------
32000
(1 row)
Correlated Subqueries
---------------------
postgres=# EXPLAIN SELECT * FROM TEST AS OUT WHERE OUT.T IN (SELECT INN.T1 FROM TEST1 AS INN WHERE INN.T1=OUT.T);
QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on test "out" (cost=0.00..8417806.00 rows=16000 width=4)
   Filter: (SubPlan 1)
   SubPlan 1
     -> Seq Scan on test1 inn (cost=0.00..526.00 rows=32 width=4)
           Filter: (t1 = "out".t)
(5 rows)

postgres=# EXPLAIN SELECT * FROM TEST AS OUT WHERE EXISTS (SELECT INN.T1 FROM TEST1 AS INN WHERE INN.T1=OUT.T);
QUERY PLAN
---------------------------------------------------------------------------
 Hash Semi Join (cost=846.00..1772.00 rows=32000 width=4)
   Hash Cond: ("out".t = inn.t1)
   -> Seq Scan on test "out" (cost=0.00..446.00 rows=32000 width=4)
   -> Hash (cost=446.00..446.00 rows=32000 width=4)
         -> Seq Scan on test1 inn (cost=0.00..446.00 rows=32000 width=4)
(5 rows)
Joins
------
postgres=# EXPLAIN SELECT * FROM TEST AS OUT INNER JOIN TEST1 AS INN ON INN.T1=OUT.T;
QUERY PLAN
---------------------------------------------------------------------------
 Hash Join (cost=846.00..12892.00 rows=1024000 width=8)
   Hash Cond: ("out".t = inn.t1)
   -> Seq Scan on test "out" (cost=0.00..446.00 rows=32000 width=4)
   -> Hash (cost=446.00..446.00 rows=32000 width=4)
         -> Seq Scan on test1 inn (cost=0.00..446.00 rows=32000 width=4)
(5 rows)

Set Operators
-------------
postgres=# EXPLAIN SELECT * FROM TEST INTERSECT SELECT * FROM TEST1;                                            
QUERY PLAN
-----------------------------------------------------------------------------------
 HashSetOp Intersect (cost=0.00..1692.00 rows=1000 width=4)
   -> Append (cost=0.00..1532.00 rows=64000 width=4)
         -> Subquery Scan on "*SELECT* 1" (cost=0.00..766.00 rows=32000 width=4)
               -> Seq Scan on test (cost=0.00..446.00 rows=32000 width=4)
         -> Subquery Scan on "*SELECT* 2" (cost=0.00..766.00 rows=32000 width=4)
               -> Seq Scan on test1 (cost=0.00..446.00 rows=32000 width=4)
(6 rows)
We need to check all of these alternatives when fine-tuning a query. Here the join's estimated cost (12892.00) is much higher than INTERSECT's (1692.00), but with other data or conditions a JOIN can give a better plan than INTERSECT.
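
While comparing these alternatives, keep in mind that they do not always return the same rows once the tables contain duplicates: a join multiplies matching rows, EXISTS keeps each outer row once, and INTERSECT removes duplicates altogether. A minimal sketch, using hypothetical demo_a and demo_b tables:

```sql
-- Hypothetical demo tables with duplicate values
CREATE TEMP TABLE demo_a(t int);
CREATE TEMP TABLE demo_b(t1 int);
INSERT INTO demo_a VALUES (1), (1), (2);
INSERT INTO demo_b VALUES (1), (1), (3);

-- Inner join: every matching pair is returned, so the count is 4
SELECT count(*) FROM demo_a AS a INNER JOIN demo_b AS b ON b.t1 = a.t;

-- EXISTS (semi join): each matching row of demo_a is returned once, so the count is 2
SELECT count(*) FROM demo_a AS a
WHERE EXISTS (SELECT 1 FROM demo_b AS b WHERE b.t1 = a.t);

-- INTERSECT: duplicates are removed, so the count is 1
SELECT count(*) FROM (SELECT t FROM demo_a INTERSECT SELECT t1 FROM demo_b) AS s;
```

So a cheaper plan is only a valid replacement when the result set is also the one the application expects.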

Correlated subqueries also show up in DELETE/UPDATE statements. Where possible, rewrite them with USING/FROM. In some cases INTERSECT is an option as well.

UPDATE
------

Actual
------
UPDATE Test AS OUT
SET T = (SELECT INN.T1 FROM Test1 AS INN WHERE INN.T1 = OUT.T);
Recommended
-----------
UPDATE Test AS OUT
SET T = INN.T1
FROM Test1 AS INN WHERE INN.T1 = OUT.T;
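
One thing to check when rewriting an UPDATE this way: the subquery form touches every row and sets the column to NULL where no match exists, while the FROM form only touches matching rows. A minimal sketch, using hypothetical demo_t and demo_s tables (the id, val and new_val columns are assumptions for illustration):

```sql
-- Hypothetical demo tables, not part of the original test case
CREATE TEMP TABLE demo_t(id int, val int);
CREATE TEMP TABLE demo_s(id int, new_val int);
INSERT INTO demo_t VALUES (1, 10), (2, 20);
INSERT INTO demo_s VALUES (1, 100);

-- Correlated subquery form: row 2 has no match and ends up with val = NULL
BEGIN;
UPDATE demo_t AS o
SET val = (SELECT s.new_val FROM demo_s AS s WHERE s.id = o.id);
SELECT * FROM demo_t ORDER BY id;   -- (1, 100), (2, NULL)
ROLLBACK;

-- FROM form: only the matching row is updated, row 2 keeps val = 20
BEGIN;
UPDATE demo_t AS o
SET val = s.new_val
FROM demo_s AS s WHERE s.id = o.id;
SELECT * FROM demo_t ORDER BY id;   -- (1, 100), (2, 20)
ROLLBACK;
```

If the unmatched rows must also be reset to NULL, the FROM form alone is not a drop-in replacement.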
DELETE
------
Actual
------
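DELETE FROM Test AS OUT WHERE NOT EXISTS (SELECT INN.T1 FROM Test1 AS INN WHERE OUT.T != INN.T1);

Recommended
-----------
DELETE FROM Test AS OUT USING Test1 AS INN WHERE OUT.T != INN.T1;

Note: these two statements do not delete the same rows. The NOT EXISTS form deletes a row only when no Test1 row differs from it, while the USING form deletes it as soon as any Test1 row differs. Confirm the result inside a transaction before replacing one with the other.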
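
When a DELETE is rewritten with USING, it is worth confirming that both forms remove the same rows. The following is a minimal sketch of a pair that does match (EXISTS with an equality condition against USING with the same condition), using hypothetical demo_d and demo_k tables:

```sql
-- Hypothetical demo tables, not part of the original test case
CREATE TEMP TABLE demo_d(t int);
CREATE TEMP TABLE demo_k(t1 int);
INSERT INTO demo_d VALUES (1), (2), (3);
INSERT INTO demo_k VALUES (2), (2), (3);

-- Correlated form: deletes rows 2 and 3
BEGIN;
DELETE FROM demo_d AS o
WHERE EXISTS (SELECT 1 FROM demo_k AS k WHERE k.t1 = o.t);
SELECT t FROM demo_d ORDER BY t;   -- only 1 remains
ROLLBACK;

-- USING form: deletes the same rows, each one only once despite the duplicate 2
BEGIN;
DELETE FROM demo_d AS o
USING demo_k AS k WHERE k.t1 = o.t;
SELECT t FROM demo_d ORDER BY t;   -- only 1 remains
ROLLBACK;
```

Running both inside BEGIN/ROLLBACK keeps the test repeatable without reloading the data.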
Equating Single Values (vs) In Single Value
-------------------------------------------

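There can be a slight difference in cost between "=" and "IN" with a single value. See the test case below with "=" and with "IN". A single timing like this is also affected by caching, so repeat it a few times before drawing conclusions.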
postgres=# BEGIN WORK;
postgres=# DELETE FROM TEST WHERE T =1;
DELETE 64
Time: 65.426 ms
postgres=# ROLLBACK;

postgres=# BEGIN WORK;
postgres=# DELETE FROM TEST WHERE T IN(1);
DELETE 64
Time: 34.683 ms
postgres=# ROLLBACK;
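
Since a single timing can be misleading, it may help to compare the plans of the two statements as well. This is a minimal sketch on a hypothetical demo_eq table; what the plans look like depends on your PostgreSQL version.

```sql
-- Hypothetical demo table, not part of the original test case
CREATE TEMP TABLE demo_eq(t int);
INSERT INTO demo_eq SELECT generate_series(1, 1000);
ANALYZE demo_eq;

-- Plain EXPLAIN does not execute the DELETE, it only shows the plan
EXPLAIN DELETE FROM demo_eq WHERE t = 1;
EXPLAIN DELETE FROM demo_eq WHERE t IN (1);

-- EXPLAIN ANALYZE really executes the DELETE, so keep it inside a transaction
BEGIN;
EXPLAIN ANALYZE DELETE FROM demo_eq WHERE t = 1;
ROLLBACK;

BEGIN;
EXPLAIN ANALYZE DELETE FROM demo_eq WHERE t IN (1);
ROLLBACK;
```

If both statements show the same plan and filter, any remaining timing difference most likely comes from caching rather than from the operator.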
UNION (vs) FULL OUTER JOINS
---------------------------

Actual Query with Unions.
-------------------------
SELECT * FROM test WHERE true AND t IN (2,3,4,5) AND mp > 0
UNION
SELECT * FROM test WHERE true AND t IN (2,3,4,5) AND sp > 0
UNION
SELECT * FROM TEST WHERE TRUE AND T IN (2,3,4,5) AND TP>0;

Most set operators rely on a sort (or hash) step to remove duplicates. If that step takes too much time, the following form with FULL OUTER JOINs, backed by suitable indexes, is recommended.
With Full Outer Joins.
----------------------
SELECT coalesce(a.*,b.*,c.*) FROM
(SELECT * FROM test WHERE true AND t IN (2,3,4,5) AND mp > 0) AS A
FULL OUTER JOIN
(SELECT * FROM test WHERE true AND t IN (2,3,4,5) AND sp > 0) AS B
ON A.t=B.t and a.mp=b.mp and a.sp=b.sp and a.tp=b.tp
FULL OUTER JOIN
(SELECT * FROM test WHERE true AND t IN (2,3,4,5) AND tp > 0) AS C
ON B.t=C.t and B.mp=C.mp and C.sp=b.sp and C.tp=b.tp;
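
Before rewriting a UNION, it helps to confirm that the duplicate-removal step is really the expensive part of the plan and to see whether an index on the filter column is used. A minimal sketch: the demo_u table, the integer column types and the demo_u_t_idx index name are all assumptions for illustration.

```sql
-- Hypothetical table with the column names used in the queries above
CREATE TEMP TABLE demo_u(t int, mp int, sp int, tp int);
INSERT INTO demo_u
SELECT g % 100, g % 3, g % 5, g % 7
FROM generate_series(1, 10000) AS g;

-- Index on the column used in the IN (...) filter
CREATE INDEX demo_u_t_idx ON demo_u (t);
ANALYZE demo_u;

-- Look for Sort / Unique / HashAggregate nodes above the three scans
EXPLAIN
SELECT * FROM demo_u WHERE t IN (2,3,4,5) AND mp > 0
UNION
SELECT * FROM demo_u WHERE t IN (2,3,4,5) AND sp > 0
UNION
SELECT * FROM demo_u WHERE t IN (2,3,4,5) AND tp > 0;
```

Running the same EXPLAIN on the rewritten query and comparing the total costs shows whether the rewrite pays off for your data.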
DISTINCT (vs) Group By
----------------------
A DISTINCT operation can also use a "SORT" when it runs. So, consider GROUP BY instead of DISTINCT, and compare both plans with EXPLAIN, since the planner may pick the same strategy for both.

Actual
------
SELECT DISTINCT T FROM TEST;
Recommended
-----------
SELECT T FROM TEST GROUP BY T;
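
To check whether the rewrite changes anything on your server, compare the two plans side by side. This is a minimal sketch on a hypothetical demo_g table; the plan nodes you see depend on the PostgreSQL version and settings.

```sql
-- Hypothetical demo table holding 1000 rows with 10 distinct values
CREATE TEMP TABLE demo_g(t int);
INSERT INTO demo_g SELECT g % 10 FROM generate_series(1, 1000) AS g;
ANALYZE demo_g;

-- Compare the top plan node (for example Sort + Unique or HashAggregate)
EXPLAIN SELECT DISTINCT t FROM demo_g;
EXPLAIN SELECT t FROM demo_g GROUP BY t;

-- Both queries return the same 10 values
SELECT DISTINCT t FROM demo_g ORDER BY t;
SELECT t FROM demo_g GROUP BY t ORDER BY t;
```

If the two plans are identical, the rewrite brings no benefit for that query.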
