Hi,
I have finally configured PG-XC (Postgres-XC) on my local box. The steps I followed are below.
Steps To Configure PG-XC
========================
Step 1
======
Download PG-XC version 1.0 from the link below:
projects/postgres-xc/files/Version_1.0/pgxc-v1.0.0.tar.gz/download
Step 2
======
mkdir -p /opt/Postgres-xc
chown -R postgres:postgres /opt/Postgres-xc/
tar -zxvf pgxc-v1.0.0.tar.gz
Step 3
======
Prerequisites
-------------
Readline, Bison, Flex
yum -y install readline*
yum -y install bison*
yum -y install flex*
./configure --prefix=/opt/Postgres-xc/
make
make install
Step 4
======
Setup of GTM {Global Transaction Manager}
-----------------------------------------
-bash-4.1$ mkdir data_gtm
-bash-4.1$ chmod 700 data_gtm/
-bash-4.1$ /opt/Postgres-xc/bin/initgtm -Z gtm -D /opt/Postgres-xc/data_gtm
# This creates a gtm.conf file under data_gtm. Change the GTM port and listen_addresses there if required.
My settings are below:
nodename = 'GTM_Node' # Specifies the node name.
# (changes requires restart)
listen_addresses = '*' # Listen addresses of this GTM.
# (changes requires restart)
port = 7777
-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm start -D /opt/Postgres-xc/data_gtm
Server Started
Step 5
======
Setup of Co-Ordinator
--------------------------
-bash-4.1$ mkdir data_coord1
-bash-4.1$ chmod 700 data_coord1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_coord1/ -o '--nodename coord1' initdb #It will initialize the PostgreSQL Cluster
# Configure this coordinator in data_coord1/postgresql.conf so that it connects to the GTM:
listen_addresses = '*'
port = 2345
gtm_host = 'localhost'
gtm_port = 7777
pgxc_node_name = 'coord1'
pooler_port = 2344
min_pool_size = 1
max_pool_size = 100
persistent_datanode_connections = on
max_coordinators = 16
max_datanodes = 16
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord #Starting this initialized cluster as a PG-XC Co-Ordinator
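The coordinator settings listed in this step live in the postgresql.conf file that initdb creates inside the coordinator's data directory. As a minimal sketch, assuming the same ports and names used in this post, they could be appended from the shell instead of edited by hand. The function name append_coord_settings is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the coordinator settings from this step
# to postgresql.conf inside the data directory passed as $1.
append_coord_settings() {
    conf="$1/postgresql.conf"
    cat >> "$conf" <<'EOF'
listen_addresses = '*'
port = 2345
gtm_host = 'localhost'
gtm_port = 7777
pgxc_node_name = 'coord1'
pooler_port = 2344
min_pool_size = 1
max_pool_size = 100
persistent_datanode_connections = on
max_coordinators = 16
max_datanodes = 16
EOF
}
```

It would be called as `append_coord_settings /opt/Postgres-xc/data_coord1` after initdb and before starting the coordinator. When a parameter appears more than once in postgresql.conf, the last occurrence takes effect, so appended values override the defaults written by initdb.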
Step 6
======
Setup of Data Node {Datanode1}
------------------------------
-bash-4.1$ mkdir data_node1
-bash-4.1$ chmod 700 data_node1/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_node1/ -o '--nodename datanode1' initdb #It will initialize the PostgreSQL Cluster
# Configure this data node in data_node1/postgresql.conf as below:
listen_addresses = '*'
port = 1234
gtm_host = 'localhost'
gtm_port = 7777
pgxc_node_name = 'datanode1'
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 #Starting this initialized cluster as a Data Node
Step 7
======
Setup of Data Node {Datanode2}
------------------------------
-bash-4.1$ mkdir data_node2
-bash-4.1$ chmod 700 data_node2/
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl -D /opt/Postgres-xc/data_node2/ -o '--nodename datanode2' initdb #It will initialize the PostgreSQL Cluster
# Configure this data node in data_node2/postgresql.conf as below:
listen_addresses = '*'
port = 1233
gtm_host = 'localhost'
gtm_port = 7777
pgxc_node_name = 'datanode2'
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl start -D /opt/Postgres-xc/data_node2 -Z datanode -l /tmp/logfile_datanode2 #Starting this initialized cluster as a Data Node
Step 8
======
Creating Nodes on the Co-Ordinator
----------------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# CREATE NODE datanode1 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1234 );
CREATE NODE
postgres=# CREATE NODE datanode2 WITH ( TYPE = DATANODE , HOST = LOCALHOST , PORT = 1233 );
CREATE NODE
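The two CREATE NODE statements differ only in node name and port, so they can be generated. The sketch below assumes the data node ports from Steps 6 and 7, and its function names are hypothetical. It also prints a call to pgxc_pool_reload(), which asks the coordinator's connection pooler to reload its node information, and a query on the pgxc_node catalog to list what is registered.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that registers one data node on a
# coordinator. The host is fixed to localhost, as in this single-box setup.
create_node_sql() {
    name="$1"
    port="$2"
    printf 'CREATE NODE %s WITH (TYPE = DATANODE, HOST = LOCALHOST, PORT = %s);\n' "$name" "$port"
}

# Print the full registration script for the two data nodes in this post.
register_nodes_sql() {
    create_node_sql datanode1 1234
    create_node_sql datanode2 1233
    # Refresh the pooler so it sees the new node definitions.
    echo "SELECT pgxc_pool_reload();"
    # List the nodes the coordinator now knows about.
    echo "SELECT node_name, node_type, node_port FROM pgxc_node;"
}
```

The output can be reviewed first and then piped into the coordinator, for example `register_nodes_sql | /opt/Postgres-xc/bin/psql -p 2345`.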
Step 9
======
Distributing By Replication
---------------------------
-bash-4.1$ ../bin/psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# CREATE TABLE DIST_REP(T INT) DISTRIBUTE BY REPLICATION TO NODE datanode1,datanode2;
CREATE TABLE
postgres=# INSERT INTO DIST_REP VALUES(GENERATE_SERIES(1,100));
INSERT 0 100
Explain Plan
--------------
postgres=# EXPLAIN ANALYZE SELECT * FROM DIST_REP;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) (actual time=0.797..0.864 rows=100 loops=1)
Node/s: datanode1
Total runtime: 0.899 ms
(3 rows)
Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
100
(1 row)
Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# SELECT COUNT(*) FROM DIST_REP;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
count
-------
100
(1 row)
Step 10
=======
Distributing by HASH {similar to the hashing mechanism in PL/Proxy}
-------------------------------------------------------------------
-bash-4.1$ ./psql -p 2345
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# CREATE TABLE DIST_HASH(T INT) DISTRIBUTE BY HASH(T) TO NODE datanode1,datanode2;
CREATE TABLE
postgres=# INSERT INTO DIST_HASH VALUES(1);
INSERT 0 1
-- Repeated the INSERT for the values 1 through 4.
Explain Plan
---------------
postgres=# EXPLAIN ANALYZE select * from dist_hash where t=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) (actual time=0.815..0.816 rows=1 loops=1)
Node/s: datanode1
Total runtime: 0.840 ms
(3 rows)
postgres=# EXPLAIN ANALYZE select * from dist_hash where t=4;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) (actual time=0.906..0.907 rows=1 loops=1)
Node/s: datanode2
Total runtime: 0.928 ms
(3 rows)
Datanode 1
-------------
-bash-4.1$ ./psql -p 1234
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# SELECT * FROM DIST_HASH;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
t
---
1
2
(2 rows)
Datanode 2
-------------
-bash-4.1$ ./psql -p 1233
psql (PGXC 1.0.0, based on PG 9.1.4)
Type "help" for help.
postgres=# SELECT * FROM DIST_HASH;
WARNING: Do not have a GTM snapshot available
WARNING: Do not have a GTM snapshot available
t
---
3
4
(2 rows)
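The row placement shown in this step comes from hashing the distribution column and mapping the result onto the list of nodes. The sketch below illustrates only that idea. It uses cksum as a stand-in hash, which is not the function Postgres-XC uses internally, so the node it picks for a given value will generally not match the output shown here. The function name pick_node is made up.

```shell
#!/bin/sh
# Illustration only: map a key to one of N nodes by hashing it.
# cksum is a stand-in; Postgres-XC uses its own internal hash function.
pick_node() {
    key="$1"
    shift                                   # remaining arguments are node names
    crc=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
    shift $(( crc % $# ))                   # skip 0..N-1 nodes
    echo "$1"                               # the node this key maps to
}
```

For example, `pick_node 1 datanode1 datanode2` always prints the same node for the same key. That determinism is what lets the coordinator send a query such as `WHERE t = 1` to a single data node, as the "Node/s" line in the plans shows.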
Step 11
=======
How to stop the PG-XC Nodes
---------------------------
Stop Co-Ordinator
--------------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_coord1/ -Z coordinator -l /tmp/logfile_cord -mf
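Stop Datanodes
--------------
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_node1 -Z datanode -l /tmp/logfile_datanode1 -mf
-bash-4.1$ /opt/Postgres-xc/bin/pg_ctl stop -D /opt/Postgres-xc/data_node2 -Z datanode -l /tmp/logfile_datanode2 -mf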
Stop GTM
------------
-bash-4.1$ /opt/Postgres-xc/bin/gtm_ctl -Z gtm stop -D /opt/Postgres-xc/data_gtm
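The shutdown order used in this step (coordinator, then data nodes, then GTM) can be captured in a small script. This is a minimal sketch assuming the directories from this post. It only prints the commands so they can be reviewed before running, and the function name stop_commands is hypothetical.

```shell
#!/bin/sh
# Base directory used throughout this post.
XC_HOME=/opt/Postgres-xc

# Hypothetical helper: print the stop commands in shutdown order.
stop_commands() {
    # Coordinator first, so no new work reaches the data nodes.
    echo "$XC_HOME/bin/pg_ctl stop -D $XC_HOME/data_coord1 -Z coordinator -m fast"
    # Then every data node.
    for dir in data_node1 data_node2; do
        echo "$XC_HOME/bin/pg_ctl stop -D $XC_HOME/$dir -Z datanode -m fast"
    done
    # GTM last, note the "stop" action.
    echo "$XC_HOME/bin/gtm_ctl stop -Z gtm -D $XC_HOME/data_gtm"
}
```

Running `stop_commands` lists the four commands, and `stop_commands | sh` executes them in that order.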
I will keep posting more about this wonderful tool.
--Dinesh
Hi! Good post! I have a few questions:
1. How does Postgres-XC achieve master-master and read-write scalability (especially write scalability)?
I set everything up as described above, but when I created datasets on node1, they were not reflected on node2.
2. Similarly, when I removed some datasets from node1, they were removed from neither the coordinator nor node2.
Thanks for the comments.
Which distribution method have you used? If you want to implement multi-master, you have to use the DISTRIBUTE BY REPLICATION method.
Yes, I have used DISTRIBUTE BY REPLICATION only. Still, I don't see changes made on Node 1 reflected on Node 2.
It goes like this:
1. Any change I make on the coordinator is reflected on all three: Coordinator, Node 1, Node 2.
2. Changes made on Node 1 are reflected on Node 1 and the Coordinator (not on Node 2).
3. Changes made on Node 2 are reflected on Node 2 alone (not on the Coordinator or Node 1).
It's like Node 2 is partially disconnected. I checked the .conf files on all four - GTM, Coordinator, Node 1 & Node 2. Everything seems to be perfect there.
Thanks in advance!
Hi,
If I remember correctly:
Distribute by replication is for read scaling => use the coordinator for DML, and fetch records from the individual nodes.
Distribute by hash is for write scaling => use the nodes for DML, and fetch records from the coordinator.
-Dinesh
Thanks Dinesh! I tried HASH distribution; that seems to have done the trick, and it's working perfectly.
Hi Dinesh,
I have configured two GTMs (gtm1, gtm2) on two different machines using Postgres-XC, with coordinator1 and datanode1 on the gtm1 machine and coordinator2 and datanode2 on the gtm2 machine. I logged in to both coordinators and created datanode1 and datanode2 on each. When I try to create tables using both nodes, I get the error "Failed to get Pooled connection", and the datanode1 log file shows "unexpected EOF on client connection". Please help me with this.
Oh. Are these two GTMs independent?
Yes Dinesh, gtm1 is on one server and gtm2 is on a separate server.
Hi, nice reading your blog.