Skip to main content

PostgreSQL 9.1 Vs 9.2 vs 9.3 - Part1

Recently one of my customer asked me what is advantage of upgrading PostgreSQL from 9.1 to 9.2 or 9.3?
In this series of post I will try to explain the impact & benefit of moving to 9.2 or 9.3

Key Point of Evaluation:
  1. Stability of Product
  2. Performance Improvement
  3. Other Key New Features
  4. Compatibility with Existing Applications
  5. Cost of Upgreadation.
PostgreSQL 9.1 Vs 9.2

I)     Stability: PostgreSQL 9.2 released on 10th Sept 2012.
II)    Performance Improvement:
  1. Index Only scan: (Very important feature) 
    1. 9.1 PgSQL has to scan both index and Table segment to show the final result set as it needs to check the tuple visibility.
    2. In 9.2 PgSQL has been improved to allow queries to retrieve data only from indexes, avoiding heap access. Show the result just by scanning the index segment if all the columns in query are present in Index segment itself. There by reduces IO.
    3. PgSQL manages and update visibility map during VACUUM operation, in-order use this feature VACUUM should be done at appropriate time interval to keep map up to date.
Example:
CREATE TABLE demo_ios (col1 float, col2 float, col3 text);INSERT INTO demo_ios SELECT generate_series(1,100000000),random(), 'mynotsolongstring';
SELECT pg_size_pretty(pg_total_relation_size('demo_ios'));
pg_size_pretty
----------------
  6512 MB
CREATE index idx_demo_ios on demo_ios(col2,col1);
VACUUM demo_ios;
Consider Following SQL:
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02;
SQL Plan in 9.1:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;
    "Bitmap Heap Scan on demo_ios  (cost=27953.81..922204.27 rows=1088991 width=16) (actual time=447.273..92438.443 rows=1000279 loops=1)"
    "  Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "  Buffers: shared read=587827"
    "  ->  Bitmap Index Scan on idx_demo_ios  (cost=0.00..27681.56 rows=1088991 width=0) (actual time=444.820..444.820 rows=1000279 loops=1)"
    "        Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "        Buffers: shared read=3837"
    "Total runtime: 92593.792 ms"  
SQL Plan in 9.2 & 9.3:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;
    "Index Only Scan using idx_demo_ios on demo_ios  (cost=0.57..35545.09 rows=1004026 width=16) (actual time=0.147..326.648 rows=1001061 loops=1)"
    "  Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "  Heap Fetches: 0"
    "  Buffers: shared hit=926518 read=1165"
    "Total runtime: 353.166 ms"
2. Optimizer Changes:
  • Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.
  • A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.

CREATE TABLE a (
         a_id serial PRIMARY KEY NOT NULL,
         b_id integer
);
CREATE INDEX a__b_id ON a USING btree (b_id);
CREATE TABLE b (
     b_id serial NOT NULL,
     c_id integer
);
CREATE INDEX b__c_id ON b USING btree (c_id);
CREATE TABLE c (
     c_id serial PRIMARY KEY NOT NULL,
     value integer UNIQUE
);
INSERT INTO b (b_id, c_id)
     SELECT g.i, g.i FROM generate_series(1, 50000) g\(i);
INSERT INTO a(b_id)
     SELECT g.i FROM generate_series(1, 50000) g\(i);
INSERT INTO c(c_id,value)
     VALUES (1,1);
EXPLAIN ANALYZE SELECT 1
FROM c
WHERE
 EXISTS (
     SELECT *
     FROM a
     JOIN b USING (b_id)
     WHERE b.c_id = c.c_id)
     AND c.value = 1;

SQL Plan in 9.1:
"Nested Loop Semi Join  (cost=1543.00..4486.27 rows=1 width=0) (actual time=106.266..106.273 rows=1 loops=1)"
"  Join Filter: (c.c_id = b.c_id)"
"  ->  Index Scan using c_value_key on c  (cost=0.00..8.27 rows=1 width=4) (actual time=0.015..0.021 rows=1 loops=1)"
"        Index Cond: (value = 1)"
"  ->  Hash Join  (cost=1543.00..3853.00 rows=50000 width=4) (actual time=44.647..103.054 rows=24904 loops=1)"
"        Hash Cond: (a.b_id = b.b_id)"
"        ->  Seq Scan on a  (cost=0.00..722.00 rows=50000 width=4) (actual time=0.020..13.075 rows=50000 loops=1)"
"        ->  Hash  (cost=722.00..722.00 rows=50000 width=8) (actual time=43.766..43.766 rows=50000 loops=1)"
"              Buckets: 4096  Batches: 2  Memory Usage: 981kB"
"              ->  Seq Scan on b  (cost=0.00..722.00 rows=50000 width=8) (actual time=0.019..15.136 rows=50000 loops=1)
"Total runtime: 112.887 ms"
SQL Plan in 9.2 & 9.3:
" Nested Loop Semi Join  (cost=0.73..16.84 rows=1 width=0) (actual time=0.188..0.190 rows=1 loops=1)"
"  ->  Index Scan using c_value_key on c  (cost=0.16..8.17 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)"
"        Index Cond: (value = 1)"
"  ->  Nested Loop  (cost=0.58..8.66 rows=1 width=4) (actual time=0.173..0.173 rows=1 loops=1)"
"        ->  Index Scan using b__c_id on b  (cost=0.29..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)"
"              Index Cond: (c_id = c.c_id)"
"        ->  Index Only Scan using a__b_id on a  (cost=0.29..0.34 rows=1 width=4) (actual time=0.159..0.159 rows=1 loops=1)"
"              Index Cond: (b_id = b.b_id)"
"              Heap Fetches: 1"
"Total runtime: 0.574 ms"


    3)  Other Key Optimizer Changes:
  • Fix planner to handle indexes with duplicated columns more reliably
  • Improve cost estimates for use of partial indexes
  • Improve the planner's ability to use statistics for columns referenced in subqueries
  • Improve statistical estimates for subqueries using DISTINCT
  • Support MIN/MAX index optimizations on boolean columns 
III) Other Key New Features: 
  • Replication: Allow streaming replication slaves to forward data to other slaves. Previously, only the master server could supply streaming replication log files to standby servers.
  • When a row fails a CHECK or NOT NULL constraint, show the row's contents as error detail
  • DROP INDEX CONCURRENTLY: Allow to drop index on table without taking exclusive lock on table. This allows to run parallel read/write transaction on same table.
  • Reduce need to rebuild tables and indexes for certain ALTER TABLE... ALTER COLUMN TYPE operations.
    Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the intervaltimestamp, and timestamptz types.
IV)   Compatibility:
Major changes has been done on Database catalog views, which might affect the monitoring tools, like pgbouncer and xymon. So these tools will have a impact 
  • pg_tablespace: spclocation field is removed from the view as it was no getting updated if tablespace location is changed
  • pg_stat_activity and pg_stat_replication's definitions have changed
    • current_query field disappears and is replaced by two columns:*** state: is the session running a query, waiting 
      • query: what is the last run (or still running if stat is "active") query
    • The column procpid is renamed to pid, to be consistent with other system views
  • Change all SQL-level statistics timing values to float8-stored milliseconds: pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.
V)   Cost of Upgreadation: 























Comments

Popular posts from this blog

Drop all Objects from Schema In Postgres

To Drop all objects from Postgres Schema there could be following two approaches: Drop Schema with cascade all and re-create it again.  In some cases where you dont want to/not allowed to drop and recreate schema, its easy to look for objects on current schema and drop them. following script would help to do so, Create function which would do the task and then drop that function too. --- CREATE OR REPLACE FUNCTION drop_DB_objects() RETURNS VOID AS $$ DECLARE  rd_object RECORD; v_idx_statement VARCHAR(500);   BEGIN ---1. Dropping all stored functions RAISE NOTICE '%', 'Dropping all stored functions...'; FOR rd_object IN ( SELECT format('%I.%I(%s)', ns.nspname, p.proname, oidvectortypes(p.proargtypes)) as functionDef     FROM pg_proc p     INNER JOIN pg_namespace ns ON (p.pronamespace = ns.oid)    WHERE ns.nspname = current_schema      AND p.proname <...

Distributed transaction in Oracle ( Over Oracle DBLink)

To fetch the data from one server to and other Oracle server over DBLink, I experienced the following facts of Oracle Distributed transactions: Security issue: -           We cannot create Public synonym for the remote object accessed over Private DBLink of other Database.   -           It’s allowed to create private synonym for remote object, but you cannot grant the access over this synonym to any other schema. If you try to provide the grants to other schema Oracle raises an error:              [ORA-02021: DDL operations are not allowed on a remote database] “In an all you can access remote objects over private DBLink in the same schema where DBLink is created”. Fetching the Ref Cursor at Remote site:                   Let’s say we have two site...

Vacuum Analyze Full Schema in PostgreSQL Database

To analyze full database schema, you can use following shell script : -------------------------------------------------------------------------------------------------------------------------- #!/bin/sh rm -rf /tmp/vacuum_analyze_MyData.sql dbname="mydata" username="mydatadatamart" namespace="mydatadatamart" # Vacuum only those tables which are fragmented over 30% # Analyze tables only if they are not already # Process CMF and Groups Master table every time as they are master Tables=`psql $dbname $username << EOF SELECT 'VACUUM ANALYZE VERBOSE ' || tablename || ' ;' AS TableList   FROM ( SELECT *,       n_dead_tup > av_threshold AS "av_needed",       CASE WHEN reltuples > 0      THEN round(100.0 * n_dead_tup / (reltuples))       ELSE 0       END AS pct_dead FROM (SELECT C.relname AS TableName, pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins, pg_s...