
Postgres-XC Setup


Hi,

I have finally configured PG-XC on my local box. Below are the steps to configure PG-XC.

Steps To Configure PG-XC
========================
Step 1
======
Download PG-XC version 1.0 from the link below:

projects/postgres-xc/files/Version_1.0/pgxc-v1.0.0.tar.gz/download

Step 2
======
mkdir -p /opt/Postgres-xc
chown -R postgres:postgres /opt/Postgres-xc/
tar -zxvf pgxc-v1.0.0.tar.gz

Step 3
======
Prerequisites
----------------
readline, bison, flex
yum -y install readline*
yum -y install bison*
yum -y install flex*

./configure --prefix=/opt/Postgres-xc/
make
make install
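
After the install, it can be convenient to add the Postgres-XC binaries to the postgres user's PATH so the later commands can be run without the full /opt/Postgres-xc/bin prefix. This is an optional step, assuming the --prefix used above and a bash login shell:

-bash-4.1$ export PATH=/opt/Postgres-xc/bin:$PATH
-bash-4.1$ echo 'export PATH=/opt/Postgres-xc/bin:$PATH' >> ~/.bash_profile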

Step 4
======
Setup of GTM  {Global Transaction Manager}
------------------------------------------
-bash-4.1$ mkdir data_gtm
-bash-4.1$ chmod 700 data_gtm/
-bash-4.1$ /opt/Postgres-xc/bin/initgtm -Z gtm -D /opt/Postgres-xc/data_gtm

# This creates a gtm.conf file under the data_gtm directory. Edit it to change the GTM port and listen_addresses if required.

Below are my settings:

nodename = 'GTM_Node'                      # Specifies the node name.
                                        # (changes requires restart)
listen_addresses = '*'                  # Listen addresses of this GTM.
                                        # (changes requires restart)
port = 7777  

-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm start -D /opt/Postgres-xc/data_gtm 
Server Started
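
To confirm the GTM is really up, gtm_ctl also provides a status mode. A quick check, assuming the same data directory as above:

-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm status -D /opt/Postgres-xc/data_gtm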

Step 5
======
Setup of Co-Ordinator 
--------------------------
-bash-4.1$ mkdir data_coord1
-bash-4.1$ chmod 700 data_coord1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_coord1/ -o '--nodename coord1' initdb  # Initializes the database cluster for the coordinator

# Configure this coordinator (in data_coord1/postgresql.conf) to connect as a node to the GTM:
listen_addresses = '*'
port = 2345
gtm_host = 'localhost'                                                          
gtm_port = 7777                                                         
pgxc_node_name = 'coord1'                                                               
pooler_port = 2344                      
min_pool_size = 1                       
max_pool_size = 100                     
persistent_datanode_connections = on
max_coordinators = 16                   
max_datanodes = 16                      
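
Since listen_addresses is set to '*', remote clients will also need matching entries in data_coord1/pg_hba.conf. This only matters for access from other hosts; the network range below is just an assumption for illustration:

# data_coord1/pg_hba.conf (example entry; adjust the address range and auth method to your environment)
host    all    all    192.168.1.0/24    md5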

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord  # Starting this initialized cluster as a PG-XC coordinator

Step 6
======
Setup of Data Node {Datanode1}
------------------------------
-bash-4.1$ mkdir data_node1
-bash-4.1$ chmod 700 data_node1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_node1/ -o '--nodename datanode1' initdb  # Initializes the database cluster for datanode1

# Configure this data node (in data_node1/postgresql.conf) as below:

listen_addresses = '*'
port = 1234
gtm_host = 'localhost'                  
gtm_port = 7777                 
pgxc_node_name = 'datanode1'                    

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 #Starting this initialized cluster as a Data Node 

Step 7
======
Setup of Data Node {Datanode2}
------------------------------
-bash-4.1$ mkdir data_node2
-bash-4.1$ chmod 700 data_node2/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_node2/ -o '--nodename datanode2' initdb  # Initializes the database cluster for datanode2

# Configure this data node (in data_node2/postgresql.conf) as below:

listen_addresses = '*'
port = 1233
gtm_host = 'localhost'                  
gtm_port = 7777                 
pgxc_node_name = 'datanode2'                    

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node2 -Z datanode -l /tmp/logfile_datanode2 #Starting this initialized cluster as a Data Node 

Step 8
======
Creating Nodes @Co-Ordinator
----------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# CREATE NODE datanode1 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1234 );
CREATE NODE

postgres=# CREATE NODE datanode2 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1233 );
CREATE NODE
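
After creating the nodes, you can verify the node catalog and reload the coordinator's connection pool so it picks up the new definitions. A quick check from the same psql session (pgxc_node and pgxc_pool_reload() ship with Postgres-XC 1.0):

postgres=# SELECT node_name, node_type, node_host, node_port FROM pgxc_node;
postgres=# SELECT pgxc_pool_reload();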


Step 9
======
Distributing By Replication
---------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# CREATE TABLE DIST_REP(T INT) DISTRIBUTE BY REPLICATION TO NODE datanode1,datanode2;
CREATE TABLE

postgres=# INSERT INTO DIST_REP VALUES(GENERATE_SERIES(1,100));
INSERT 0 100

Explain Plan
--------------
postgres=# EXPLAIN ANALYZE SELECT * FROM DIST_REP;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.797..0.864 rows=100 loops=1)
   Node/s: datanode1
 Total runtime: 0.899 ms
(3 rows)

Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 count 
-------
   100
(1 row)

Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 count 
-------
   100
(1 row)

Step 10
=======
Distributing By HASH {similar behaviour to the PL/Proxy hashing mechanism}
---------------------------------------------------------------------------
-bash-4.1$ ./psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# CREATE TABLE DIST_HASH(T INT) DISTRIBUTE BY HASH(T) TO NODE datanode1,datanode2;
CREATE TABLE

postgres=# INSERT INTO DIST_HASH VALUES(1);
INSERT 0 1

-- Inserted rows with values 1 through 4 in the same way.

Explain Plan
---------------
postgres=# EXPLAIN ANALYZE select * from dist_hash where t=1;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.815..0.816 rows=1 loops=1)
   Node/s: datanode1
 Total runtime: 0.840 ms
(3 rows)


postgres=# EXPLAIN ANALYZE select * from dist_hash where t=4;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.906..0.907 rows=1 loops=1)
   Node/s: datanode2
 Total runtime: 0.928 ms
(3 rows)


Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT * FROM DIST_HASH;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 t 
---
 1
 2
(2 rows)

Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT * FROM DIST_HASH;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 t 
---
 3
 4
(2 rows)

Step 11
=======
How to stop the PG-XC Nodes
---------------------------

Stop Co-Ordinator
--------------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord -mf

Stop Datanode
-----------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 -mf

Stop GTM
------------
-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm stop -D /opt/Postgres-xc/data_gtm

Will keep posting more about this wonderful tool.

--Dinesh

Comments

  1. Hi! Good post! I have a few clarifications:
    1. How does Postgres-XC achieve master-master & read-write scalability (especially write scalability)?
    I set things up as per the above, and when I created datasets in node1 they were not getting reflected on node2.
    2. Similarly, when I removed some datasets from node1, they were neither getting removed from the coordinator nor from node2.

    1. Thanks for the comments.

      Which DISTRIBUTION method have you used? If you want to implement multi-master, you have to use the DISTRIBUTE BY REPLICATION method.

    2. Yes, I have used DISTRIBUTE BY REPLICATION only. Still I don't see changes made on Node 1 getting reflected on Node 2.
      It goes like this:
      1. Any changes I make in the coordinator get reflected on all three - Coordinator, Node 1, Node 2.
      2. Changes made in Node 1 get reflected on Node 1 & Coordinator (not in Node 2).
      3. Changes made in Node 2 get reflected on Node 2 alone (not in Coordinator & Node 1).

      It's like Node 2 is partially disconnected. I checked the .conf files on all four - GTM, Coordinator, Node 1 & Node 2. Everything seems to be perfect there.

      Thanks in advance!

    3. Hi

      If I remember correctly,

      Distributed by replication is for read scale => Use coordinator to do DML, and fetch records from individual nodes
      Distributed by hash is for write scale => Use nodes to do DML, and fetch records from Coordinator.

      -Dinesh

    4. Thanks Dinesh! Tried HASH Distribution, that seems to have done the trick and it's working perfectly.

  2. Hi Dinesh,
    I have configured two GTMs (gtm1, gtm2) on two different machines using Postgres-XC, with coordinator1 and datanode1 on the gtm1 machine and coordinator2 and datanode2 on the gtm2 machine. I logged in to both coordinator servers and created datanode1 and datanode2 on both. When I try to create tables using both nodes I get the error "Failed to get pooled connection", and when I check the datanode1 log file I find the error "unexpected EOF on client connection". Please help me with this.

    1. Oh. Are these two GTMs independent?

    2. Yes Dinesh.. gtm1 is on a separate server and gtm2 is on a separate server.

