
Postgres-XC Setup


Hi,

I have finally configured PG-XC on my local box. Please find below the steps to configure PG-XC.

Steps To Configure PG-XC
========================
Step 1
======
Download PG-XC version 1.0 from the link below:

projects/postgres-xc/files/Version_1.0/pgxc-v1.0.0.tar.gz/download

Step 2
======
mkdir -p /opt/Postgres-xc
chown -R postgres:postgres /opt/Postgres-xc/
tar -zxvf pgxc-v1.0.0.tar.gz
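
The directory preparation above (and for the GTM/coordinator/datanode directories in the later steps) can be scripted. A minimal sketch — it uses a scratch directory for illustration; in this post the base is /opt/Postgres-xc:

```shell
# Sketch: prepare data directories with the 700 permissions initdb/initgtm expect.
# BASE is a scratch directory here; substitute /opt/Postgres-xc for a real setup.
BASE=$(mktemp -d)
for d in data_gtm data_coord1 data_node1 data_node2; do
    mkdir -p "$BASE/$d"
    chmod 700 "$BASE/$d"
done
stat -c '%a %n' "$BASE"/data_*   # each directory should report mode 700
```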

Step 3
======
Prerequisites
-------------
readline, bison, flex
yum -y install readline*
yum -y install bison*
yum -y install flex*

./configure --prefix=/opt/Postgres-xc/   # run from the extracted pgxc source directory
make
make install

Step 4
======
Setup of GTM {Global Transaction Manager}
-----------------------------------------
-bash-4.1$ mkdir data_gtm
-bash-4.1$ chmod 700 data_gtm/
-bash-4.1$ /opt/Postgres-xc/bin/initgtm -Z gtm -D /opt/Postgres-xc/data_gtm

# This creates a gtm.conf file under the data_gtm directory. Change the GTM server's port and listen_addresses there if required.

Below are my settings:

nodename = 'GTM_Node'                   # Specifies the node name.
                                        # (change requires restart)
listen_addresses = '*'                  # Listen addresses of this GTM.
                                        # (change requires restart)
port = 7777  

-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm start -D /opt/Postgres-xc/data_gtm 
Server Started
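
The three gtm.conf changes above can be appended with a heredoc instead of hand-editing. A sketch — GTM_DATA points at a scratch directory here; in this post it would be /opt/Postgres-xc/data_gtm:

```shell
# Sketch: append the GTM settings shown above to gtm.conf.
# GTM_DATA is a scratch directory for illustration only.
GTM_DATA=$(mktemp -d)
cat >> "$GTM_DATA/gtm.conf" <<'EOF'
nodename = 'GTM_Node'        # node name (change requires restart)
listen_addresses = '*'       # listen addresses (change requires restart)
port = 7777
EOF
grep -E '^(nodename|listen_addresses|port)' "$GTM_DATA/gtm.conf"
```

Appending works because the last occurrence of a setting in the file wins; re-check with grep afterwards to be sure no earlier line overrides it.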

Step 5
======
Setup of Co-Ordinator 
--------------------------
-bash-4.1$ mkdir data_coord1
-bash-4.1$ chmod 700 data_coord1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D  /opt/Postgres-xc/data_coord1/ -o '--nodename coord1' initdb #It will initialize the PostgreSQL Cluster

# Configure this coordinator to connect as a node to the GTM (the settings below go in its postgresql.conf):
listen_addresses = '*'
port = 2345
gtm_host = 'localhost'                                                          
gtm_port = 7777                                                         
pgxc_node_name = 'coord1'                                                               
pooler_port = 2344                      
min_pool_size = 1                       
max_pool_size = 100                     
persistent_datanode_connections = on
max_coordinators = 16                   
max_datanodes = 16                      

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord # Starting this initialized cluster as a PG-XC coordinator
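
Starting each component uses the same pg_ctl binary with a different -Z role. A dry-run helper that only prints the commands (nothing here touches a live cluster; paths are the ones used in this post):

```shell
# Sketch: print the start command for each PG-XC role without executing it.
PGXC=/opt/Postgres-xc                # install prefix used in this post
start_cmd() {                        # $1 = role, $2 = data dir, $3 = logfile
    echo "$PGXC/bin/pg_ctl start -D $PGXC/$2 -Z $1 -l $3"
}
start_cmd coordinator data_coord1 /tmp/logfile_cord
start_cmd datanode    data_node1  /tmp/logfile_datanode1
start_cmd datanode    data_node2  /tmp/logfile_datanode2
```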

Step 6
======
Setup of Data Node {Datanode1}
------------------------------
-bash-4.1$ mkdir data_node1
-bash-4.1$ chmod 700 data_node1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D  /opt/Postgres-xc/data_node1/ -o '--nodename datanode1' initdb #It will initialize the PostgreSQL Cluster

# Configure this datanode as below:

listen_addresses = '*'
port = 1234
gtm_host = 'localhost'                  
gtm_port = 7777                 
pgxc_node_name = 'datanode1'                    

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 #Starting this initialized cluster as a Data Node 

Step 7
======
Setup of Data Node {Datanode2}
------------------------------
-bash-4.1$ mkdir data_node2
-bash-4.1$ chmod 700 data_node2/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D  /opt/Postgres-xc/data_node2/ -o '--nodename datanode2' initdb #It will initialize the PostgreSQL Cluster

# Configure this datanode as below:

listen_addresses = '*'
port = 1233
gtm_host = 'localhost'                  
gtm_port = 7777                 
pgxc_node_name = 'datanode2'                    

-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node2 -Z datanode -l /tmp/logfile_datanode2 #Starting this initialized cluster as a Data Node 

Step 8
======
Creating Nodes @Co-Ordinator
----------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# CREATE NODE datanode1 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1234 );
CREATE NODE

postgres=# CREATE NODE datanode2 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1233 );
CREATE NODE
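
Registering several datanodes can be scripted. A dry-run sketch that only prints the psql invocations (the node names and ports match the setup above; nothing is executed against a live cluster here):

```shell
# Sketch: print the node-registration statements instead of executing them.
PSQL="/opt/Postgres-xc/bin/psql -p 2345"   # coordinator port used in this post
for spec in "datanode1 1234" "datanode2 1233"; do
    set -- $spec                           # split "name port" into $1 and $2
    echo "$PSQL -c \"CREATE NODE $1 WITH (TYPE = DATANODE, HOST = LOCALHOST, PORT = $2)\""
done
```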


Step 9
======
Distributing By Replication
---------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# CREATE TABLE DIST_REP(T INT) DISTRIBUTE BY REPLICATION TO NODE datanode1,datanode2;
CREATE TABLE

postgres=# INSERT INTO DIST_REP VALUES(GENERATE_SERIES(1,100));
INSERT 0 100

Explain Plan
--------------
postgres=# EXPLAIN ANALYZE SELECT * FROM DIST_REP;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.797..0.864 rows=100 loops=1)
   Node/s: datanode1
 Total runtime: 0.899 ms
(3 rows)

Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 count 
-------
   100
(1 row)

Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 count 
-------
   100
(1 row)

Step 10
=======
Distributing by HASH {similar to the hashing mechanism in PL/Proxy}
-------------------------------------------------------------------
-bash-4.1$ ./psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# CREATE TABLE DIST_HASH(T INT) DISTRIBUTE BY HASH(T) TO NODE datanode1,datanode2;
CREATE TABLE

postgres=# INSERT INTO DIST_HASH VALUES(1);
INSERT 0 1

-- Inserted values 1 through 4 the same way ..

Explain Plan
---------------
postgres=# EXPLAIN ANALYZE select * from dist_hash where t=1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.815..0.816 rows=1 loops=1)
   Node/s: datanode1
 Total runtime: 0.840 ms
(3 rows)


postgres=# EXPLAIN ANALYZE select * from dist_hash where t=4;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=0.906..0.907 rows=1 loops=1)
   Node/s: datanode2
 Total runtime: 0.928 ms
(3 rows)


Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT * FROM DIST_HASH;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 t 
---
 1
 2
(2 rows)

Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.

postgres=# SELECT * FROM DIST_HASH;
WARNING:  Do not have a GTM snapshot available
WARNING:  Do not have a GTM snapshot available
 t 
---
 3
 4
(2 rows)

Step 11
=======
How to stop the PG-XC Nodes
---------------------------

Stop Co-Ordinator
--------------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord -mf

Stop Datanode
-----------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 -mf

Stop GTM
------------
-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm stop -D /opt/Postgres-xc/data_gtm
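
A common stop order is coordinator first, then the datanodes, and the GTM last. A dry-run sketch that prints the commands in that order (paths as used in this post; `-m fast` is pg_ctl's fast-shutdown mode, equivalent to the `-mf` above):

```shell
# Sketch: print an orderly shutdown sequence (coordinator -> datanodes -> GTM).
PGXC=/opt/Postgres-xc                # install prefix used in this post
stop_all() {
    echo "$PGXC/bin/pg_ctl stop -D $PGXC/data_coord1 -Z coordinator -m fast"
    echo "$PGXC/bin/pg_ctl stop -D $PGXC/data_node1 -Z datanode -m fast"
    echo "$PGXC/bin/pg_ctl stop -D $PGXC/data_node2 -Z datanode -m fast"
    echo "$PGXC/bin/gtm_ctl -Z gtm stop -D $PGXC/data_gtm"
}
stop_all     # dry run: prints the commands instead of executing them
```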

I will keep posting more about this wonderful tool.

--Dinesh

Comments

  1. Hi! Good post! I have a few clarifications:
    1. How does Postgres-XC achieve master-master and read-write scalability (especially write scalability)?
    I created the setup as per the above, and when I created datasets on node1 they were not getting reflected on node2.
    2. Similarly, when I removed some datasets from node1, they were getting removed from neither the coordinator nor node2.

    1. Thanks for the comments.

      Which DISTRIBUTION method have you used? If you want to implement multi-master, we have to use the DISTRIBUTE BY REPLICATION method.

    2. Yes, I have used DISTRIBUTE BY REPLICATION only. Still I don't see changes made on Node 1 getting reflected on Node 2.
      It goes like this:
      1. Any changes I make on the coordinator are reflected on all three: coordinator, Node 1, and Node 2.
      2. Changes made on Node 1 are reflected on Node 1 and the coordinator (not on Node 2).
      3. Changes made on Node 2 are reflected on Node 2 alone (not on the coordinator or Node 1).

      It's like Node 2 is partially disconnected. I checked the .conf files on all four: GTM, coordinator, Node 1, and Node 2. Everything seems to be perfect there.

      Thanks in advance!

    3. Hi

      If I remember correctly:

      Distribute by replication is for read scaling => use the coordinator for DML, and fetch records from the individual nodes.
      Distribute by hash is for write scaling => use the nodes for DML, and fetch records from the coordinator.

      -Dinesh

    4. Thanks Dinesh! Tried HASH Distribution, that seems to have done the trick and it's working perfectly.

  2. Hi Dinesh,
    I have configured two GTMs (gtm1, gtm2) on two different machines using Postgres-XC, with coordinator1/datanode1 on the gtm1 machine and coordinator2/datanode2 on the gtm2 machine. I logged in to both coordinator servers and created datanode1 and datanode2 on both. When I try to create tables using both nodes I get the error "Failed to get Pooled connection", and the datanode1 log file shows the error "unexpected EOF on client connection". Please help me with this.

    1. Oh. Are these two GTMs independent?

    2. Yes Dinesh.. gtm1 is on a separate server and gtm2 is on a separate server.

