Skip to main content

PostgreSQL Clustering

Recently I have evaluated different tools to Cluster PostgreSQL Database.

Key Criteria of Evaluation
  • Open source
  • Impact on Application
  • Performance gain, connection pooling support, High Availability, Support to advance feature such as Sharding.  
  • Documentation and Community support 

Available Platform and Features Matrix
ProgramLicensePlatform supportedMaturityReplication MethodSyncscalabilityConnection PoolingLoad BalancingQuery PartitioningArchitectureSupport
Postgres-xcBSDAll LinuxStableMaster-MasterSynchronousread and writeNoYesYesshared-nothingNTT Open Source Software Center (Japanese)
EnterpriseDB
Pgpool-IIBSDCentOSStableStatement-Based MiddlewareSynchronousreadYesYesYesProxy between application and databaseSRA OSS, Inc. Japan 
PgClusterBSD DiedMaster-MasterSynchronous NoYesNoIts not more active project 
slonyBSD StableMaster-SlaveAsynchronous NoNoNoIts replication based toolSlony
BucardoBSD StableMaster-Master, Master-SlaveAsynchronous NoNoNo  
LondisteBSD StableMaster-SlaveAsynchronous NoNoNo  

I found PostgreSQL-XC & PgPool-II much closer to our requirement.

CriteriaPostgreSQL-XCPgPool-II
Replication MethodAll Node act as master and accept both Read-Write operation1) Write on Master and read from Slave nodes. 
2) Need to use Streaming Replication to replicate writes on Slave nodes
Architectureshared-nothingAct as middleware between Application and PostgreSQL database.
PostgreSQL VersionPostgre-XC 1.1 comes with PostgreSQL 9.2PgPool-II 3.3.2 comes with PostgreSQL 9.2 & 9.3 both
OptimizerSince its shared nothing distributed database, PostgreSQL-XC has to analyze and optimize the SQL. This optimizer is different than built-in PostgreSQL optimizer. So Much more scope of improvement on SQL Optimizer sidePgPool analyze SQL for Read (SELECT) and Write (INSERT/UPDATE/DELETE/TRUNCATE/COMMIT/ROLLBACK) type of operation and forwards SQL either to Master (write) or slave(read). Actual SQL Optimization is done by PostgreSQL Database. So all optimization features of PostgreSQL optimizer holds true in this case.
Advantages1. Most active community than PgPool-II 
2. Provides both Read-Write Scalability 
3. Replication can be done at table level, No need of additional framework 
1. Act as middleware, No Changes to Database design 
2. Act as Connection pool, Load balancer for read operations, Failover for HA 
3. Easy to implement than PostgreSQL-XC 
4. Even if the PgPool Node goes down, application can still bypass the PgPool and connect to database. 
5. Provides result and table cache features.
Disadvantages1. Impact on Database design, for every table will have to design distribution key 
2. Does not support Join operation across the Data node 
3. Additional node will be required for Global Transaction Management 
4. Table level constraints are not supported across cluster 
5. Foreign Keys are not supported
6. Failure of 1 node will lead to data unavailability.  
Streaming Replication must be Synchronous 
DocumentationWell explainedAdequate
Application Compatibilityif we want to use Sharding then PostgreSQL-XC could be considered. However due to above limitation and impact on database design, this will not be good choice.At present we are using PGbouncer for Connection pooling. PgPool-II can be used to simply replace PGbouncer and use to separate out read-write operations. This is completely transparent to Application and has no impact on Database design. In future once all restriction on Parallel SQL execution are resolved, even the read operation can be executed in parallel on all the slave node. http://girders.org/blog/2012/09/29/scaling-postgresql-with-pgpool-and-pgbouncer/ explains benefit of PgPool over PgBouncer


1) PgPool-II with Streaming Replication
2) PostgreSQL-XC project :



Reference: 
1. PostgreSQL-XC project : 
2. PgPool-II:
PgPool-II installation process is explained here : http://www.pgpool.net/docs/latest/pgpool-en.html#install














Comments

Popular posts from this blog

Drop all Objects from Schema In Postgres

To Drop all objects from Postgres Schema there could be following two approaches: Drop Schema with cascade all and re-create it again.  In some cases where you dont want to/not allowed to drop and recreate schema, its easy to look for objects on current schema and drop them. following script would help to do so, Create function which would do the task and then drop that function too. --- CREATE OR REPLACE FUNCTION drop_DB_objects() RETURNS VOID AS $$ DECLARE  rd_object RECORD; v_idx_statement VARCHAR(500);   BEGIN ---1. Dropping all stored functions RAISE NOTICE '%', 'Dropping all stored functions...'; FOR rd_object IN ( SELECT format('%I.%I(%s)', ns.nspname, p.proname, oidvectortypes(p.proargtypes)) as functionDef     FROM pg_proc p     INNER JOIN pg_namespace ns ON (p.pronamespace = ns.oid)    WHERE ns.nspname = current_schema      AND p.proname <...

Distributed transaction in Oracle ( Over Oracle DBLink)

To fetch the data from one server to and other Oracle server over DBLink, I experienced the following facts of Oracle Distributed transactions: Security issue: -           We cannot create Public synonym for the remote object accessed over Private DBLink of other Database.   -           It’s allowed to create private synonym for remote object, but you cannot grant the access over this synonym to any other schema. If you try to provide the grants to other schema Oracle raises an error:              [ORA-02021: DDL operations are not allowed on a remote database] “In an all you can access remote objects over private DBLink in the same schema where DBLink is created”. Fetching the Ref Cursor at Remote site:                   Let’s say we have two site...

Vacuum Analyze Full Schema in PostgreSQL Database

To analyze full database schema, you can use following shell script : -------------------------------------------------------------------------------------------------------------------------- #!/bin/sh rm -rf /tmp/vacuum_analyze_MyData.sql dbname="mydata" username="mydatadatamart" namespace="mydatadatamart" # Vacuum only those tables which are fragmented over 30% # Analyze tables only if they are not already # Process CMF and Groups Master table every time as they are master Tables=`psql $dbname $username << EOF SELECT 'VACUUM ANALYZE VERBOSE ' || tablename || ' ;' AS TableList   FROM ( SELECT *,       n_dead_tup > av_threshold AS "av_needed",       CASE WHEN reltuples > 0      THEN round(100.0 * n_dead_tup / (reltuples))       ELSE 0       END AS pct_dead FROM (SELECT C.relname AS TableName, pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins, pg_s...