
PostgreSQL Clustering

I recently evaluated different tools for clustering a PostgreSQL database.

Key Evaluation Criteria
  • Open source
  • Impact on the application
  • Performance gain, connection pooling support, high availability, and support for advanced features such as sharding
  • Documentation and community support

Available Platforms and Feature Matrix

| Program | License | Platform Supported | Maturity | Replication Method | Sync | Scalability | Connection Pooling | Load Balancing | Query Partitioning | Architecture | Support |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Postgres-XC | BSD | All Linux | Stable | Master-Master | Synchronous | Read and write | No | Yes | Yes | Shared-nothing | NTT Open Source Software Center (Japan), EnterpriseDB |
| PgPool-II | BSD | CentOS | Stable | Statement-based middleware | Synchronous | Read | Yes | Yes | Yes | Proxy between application and database | SRA OSS, Inc. Japan |
| PgCluster | BSD | — | Dead (no longer an active project) | Master-Master | Synchronous | — | No | Yes | No | — | — |
| Slony | BSD | — | Stable | Master-Slave | Asynchronous | — | No | No | No | Replication-based tool | Slony community |
| Bucardo | BSD | — | Stable | Master-Master, Master-Slave | Asynchronous | — | No | No | No | — | — |
| Londiste | BSD | — | Stable | Master-Slave | Asynchronous | — | No | No | No | — | — |

I found Postgres-XC and PgPool-II to be the closest matches to our requirements.

Criteria-wise Comparison: Postgres-XC vs. PgPool-II

Replication Method
  • Postgres-XC: All nodes act as masters and accept both read and write operations.
  • PgPool-II: Writes go to the master and reads are served from the slave nodes; streaming replication is required to propagate writes to the slaves.

Architecture
  • Postgres-XC: Shared-nothing.
  • PgPool-II: Acts as middleware between the application and the PostgreSQL database.

PostgreSQL Version
  • Postgres-XC: Postgres-XC 1.1 comes with PostgreSQL 9.2.
  • PgPool-II: PgPool-II 3.3.2 works with both PostgreSQL 9.2 and 9.3.

Optimizer
  • Postgres-XC: Since it is a shared-nothing distributed database, Postgres-XC has to analyze and optimize SQL itself, using an optimizer distinct from the built-in PostgreSQL one. There is still considerable room for improvement on this front.
  • PgPool-II: Classifies each statement as a read (SELECT) or a write (INSERT/UPDATE/DELETE/TRUNCATE/COMMIT/ROLLBACK) and forwards it to the master (writes) or a slave (reads); see the configuration sketch after this comparison. The actual SQL optimization is done by the PostgreSQL database itself, so all features of the PostgreSQL optimizer still apply.

Advantages
  • Postgres-XC:
    1. More active community than PgPool-II.
    2. Provides both read and write scalability.
    3. Replication can be configured at the table level; no additional framework is needed.
  • PgPool-II:
    1. Acts as middleware; no changes to the database design.
    2. Acts as a connection pool, a load balancer for read operations, and a failover mechanism for high availability.
    3. Easier to implement than Postgres-XC.
    4. Even if the PgPool node goes down, the application can bypass PgPool and connect to the database directly.
    5. Provides query-result and table caching.

Disadvantages
  • Postgres-XC:
    1. Impacts database design: a distribution key must be designed for every table.
    2. Does not support joins across data nodes.
    3. An additional node is required for global transaction management.
    4. Table-level constraints are not supported across the cluster.
    5. Foreign keys are not supported.
    6. Failure of one node leads to data unavailability.
  • PgPool-II:
    1. Streaming replication must be synchronous (see the replication settings sketch further below).

Documentation
  • Postgres-XC: Well explained.
  • PgPool-II: Adequate.

Application Compatibility
  • Postgres-XC: If we want sharding, Postgres-XC could be considered; however, given the limitations above and the impact on database design, it would not be a good choice for us.
  • PgPool-II: At present we use PgBouncer for connection pooling. PgPool-II can simply replace PgBouncer and additionally separate read and write operations. This is completely transparent to the application and has no impact on the database design. In the future, once the restrictions on parallel SQL execution are resolved, even read operations could be executed in parallel on all slave nodes. http://girders.org/blog/2012/09/29/scaling-postgresql-with-pgpool-and-pgbouncer/ explains the benefits of PgPool over PgBouncer.
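
To make the routing behaviour concrete, here is a minimal pgpool.conf sketch for streaming-replication mode, based on PgPool-II 3.3-era parameter names. The host names (pg-master, pg-slave1) and the sr_check_user are placeholders for illustration, not values from our setup.

```
# pgpool.conf -- minimal sketch for streaming-replication mode (PgPool-II 3.3-era)
listen_addresses = '*'
port = 9999                        # applications connect here instead of 5432

# Backends: node 0 is the primary, node 1 a streaming-replication standby
backend_hostname0 = 'pg-master'    # placeholder host name
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'pg-slave1'    # placeholder host name
backend_port1 = 5432
backend_weight1 = 1

# Send writes to the master, load-balance SELECTs across nodes
master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on

# Connection pooling (the PgBouncer replacement discussed above)
num_init_children = 32
max_pool = 4

# Streaming-replication health check
sr_check_period = 10
sr_check_user = 'pgpool'           # placeholder monitoring user
```

With this in place the application simply points at port 9999; PgPool-II handles the read/write split and pooling transparently.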


1) PgPool-II with Streaming Replication [architecture diagram]
2) Postgres-XC [architecture diagram]
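
The PgPool-II architecture above relies on PostgreSQL streaming replication, which (per the disadvantages noted earlier) should be synchronous. A minimal sketch of the PostgreSQL 9.2/9.3-era settings follows; the host name, replication user, and application_name are placeholders:

```
# postgresql.conf on the primary (PostgreSQL 9.2/9.3-era settings)
wal_level = hot_standby
max_wal_senders = 3
# Make replication synchronous, as the PgPool-II read/write split assumes;
# 'standby1' is a placeholder application_name of the standby
synchronous_standby_names = 'standby1'

# postgresql.conf on the standby
hot_standby = on

# recovery.conf on the standby (placeholder host and user)
standby_mode = 'on'
primary_conninfo = 'host=pg-master port=5432 user=replicator application_name=standby1'
```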



References:
1. Postgres-XC project:
2. PgPool-II:
   The PgPool-II installation process is explained here: http://www.pgpool.net/docs/latest/pgpool-en.html#install
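
Once replication is up, a quick sanity check using PostgreSQL's built-in views (run the first query on the primary and the second on the standby):

```
-- On the primary: each connected standby should appear with sync_state = 'sync'
SELECT client_addr, state, sync_state FROM pg_stat_replication;

-- On the standby: returns true while the node is replaying WAL
SELECT pg_is_in_recovery();
```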