
PostgreSQL 9.1 Vs 9.2 vs 9.3 - Part1

Recently one of my customers asked me: what is the advantage of upgrading PostgreSQL from 9.1 to 9.2 or 9.3?
In this series of posts I will try to explain the impact and benefits of moving to 9.2 or 9.3.

Key Points of Evaluation:
  1. Stability of Product
  2. Performance Improvement
  3. Other Key New Features
  4. Compatibility with Existing Applications
  5. Cost of Upgrade
PostgreSQL 9.1 Vs 9.2

I)     Stability: PostgreSQL 9.2 was released on 10th September 2012.
II)    Performance Improvement:
  1. Index Only Scan: (very important feature)
    1. In 9.1, PostgreSQL has to scan both the index and the table segment to produce the final result set, because it needs to check tuple visibility in the heap.
    2. In 9.2, PostgreSQL can retrieve data from the index alone, avoiding heap access: if all the columns referenced in the query are present in the index, the result is produced just by scanning the index segment, thereby reducing I/O.
    3. PostgreSQL maintains and updates the visibility map during VACUUM; to use this feature effectively, VACUUM should be run at appropriate intervals to keep the map up to date.
Example:
CREATE TABLE demo_ios (col1 float, col2 float, col3 text);
INSERT INTO demo_ios SELECT generate_series(1,100000000), random(), 'mynotsolongstring';
SELECT pg_size_pretty(pg_total_relation_size('demo_ios'));
pg_size_pretty
----------------
  6512 MB
CREATE index idx_demo_ios on demo_ios(col2,col1);
VACUUM demo_ios;
Consider the following SQL:
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02;
SQL Plan in 9.1:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;
    "Bitmap Heap Scan on demo_ios  (cost=27953.81..922204.27 rows=1088991 width=16) (actual time=447.273..92438.443 rows=1000279 loops=1)"
    "  Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "  Buffers: shared read=587827"
    "  ->  Bitmap Index Scan on idx_demo_ios  (cost=0.00..27681.56 rows=1088991 width=0) (actual time=444.820..444.820 rows=1000279 loops=1)"
    "        Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "        Buffers: shared read=3837"
    "Total runtime: 92593.792 ms"  
SQL Plan in 9.2 & 9.3:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02;
    "Index Only Scan using idx_demo_ios on demo_ios  (cost=0.57..35545.09 rows=1004026 width=16) (actual time=0.147..326.648 rows=1001061 loops=1)"
    "  Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))"
    "  Heap Fetches: 0"
    "  Buffers: shared hit=926518 read=1165"
    "Total runtime: 353.166 ms"
2. Optimizer Changes:
  • Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner uses parameter-specific plans (the query is planned at execution time), unless the statement has been executed several times and the planner decides that a generic plan is not significantly more expensive than the specific plans (see the PREPARE sketch after this list).
  • A new feature has been added: parameterized paths. Simply put, a sub-part of a query plan can use parameters it receives from a parent node. This fixes several bad plans that could occur, especially when the optimizer could not reorder joins to place nested loops where they would have been efficient.
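As a quick illustration of the first point, here is a minimal sketch that reuses the demo_ios table and idx_demo_ios index from the index-only-scan example above; the statement name ios_range is just for illustration:

PREPARE ios_range (float, float) AS
    SELECT col1, col2 FROM demo_ios WHERE col2 BETWEEN $1 AND $2;

-- On 9.2 and later, each EXECUTE is planned with the actual parameter
-- values, so a selective range can still use idx_demo_ios:
EXPLAIN EXECUTE ios_range (0.01, 0.02);

-- After several executions the planner may switch to a cached generic
-- plan, but only if it is not noticeably more expensive than the
-- parameter-specific plans.

The parameterized-path improvement from the second point is shown by the following example: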

CREATE TABLE a (
         a_id serial PRIMARY KEY NOT NULL,
         b_id integer
);
CREATE INDEX a__b_id ON a USING btree (b_id);
CREATE TABLE b (
     b_id serial NOT NULL,
     c_id integer
);
CREATE INDEX b__c_id ON b USING btree (c_id);
CREATE TABLE c (
     c_id serial PRIMARY KEY NOT NULL,
     value integer UNIQUE
);
INSERT INTO b (b_id, c_id)
     SELECT g.i, g.i FROM generate_series(1, 50000) g(i);
INSERT INTO a(b_id)
     SELECT g.i FROM generate_series(1, 50000) g(i);
INSERT INTO c(c_id,value)
     VALUES (1,1);
EXPLAIN ANALYZE SELECT 1
FROM c
WHERE
 EXISTS (
     SELECT *
     FROM a
     JOIN b USING (b_id)
     WHERE b.c_id = c.c_id)
     AND c.value = 1;

SQL Plan in 9.1:
"Nested Loop Semi Join  (cost=1543.00..4486.27 rows=1 width=0) (actual time=106.266..106.273 rows=1 loops=1)"
"  Join Filter: (c.c_id = b.c_id)"
"  ->  Index Scan using c_value_key on c  (cost=0.00..8.27 rows=1 width=4) (actual time=0.015..0.021 rows=1 loops=1)"
"        Index Cond: (value = 1)"
"  ->  Hash Join  (cost=1543.00..3853.00 rows=50000 width=4) (actual time=44.647..103.054 rows=24904 loops=1)"
"        Hash Cond: (a.b_id = b.b_id)"
"        ->  Seq Scan on a  (cost=0.00..722.00 rows=50000 width=4) (actual time=0.020..13.075 rows=50000 loops=1)"
"        ->  Hash  (cost=722.00..722.00 rows=50000 width=8) (actual time=43.766..43.766 rows=50000 loops=1)"
"              Buckets: 4096  Batches: 2  Memory Usage: 981kB"
"              ->  Seq Scan on b  (cost=0.00..722.00 rows=50000 width=8) (actual time=0.019..15.136 rows=50000 loops=1)
"Total runtime: 112.887 ms"
SQL Plan in 9.2 & 9.3:
" Nested Loop Semi Join  (cost=0.73..16.84 rows=1 width=0) (actual time=0.188..0.190 rows=1 loops=1)"
"  ->  Index Scan using c_value_key on c  (cost=0.16..8.17 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)"
"        Index Cond: (value = 1)"
"  ->  Nested Loop  (cost=0.58..8.66 rows=1 width=4) (actual time=0.173..0.173 rows=1 loops=1)"
"        ->  Index Scan using b__c_id on b  (cost=0.29..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)"
"              Index Cond: (c_id = c.c_id)"
"        ->  Index Only Scan using a__b_id on a  (cost=0.29..0.34 rows=1 width=4) (actual time=0.159..0.159 rows=1 loops=1)"
"              Index Cond: (b_id = b.b_id)"
"              Heap Fetches: 1"
"Total runtime: 0.574 ms"


  3. Other Key Optimizer Changes:
  • Fix planner to handle indexes with duplicated columns more reliably
  • Improve cost estimates for use of partial indexes
  • Improve the planner's ability to use statistics for columns referenced in subqueries
  • Improve statistical estimates for subqueries using DISTINCT
  • Support MIN/MAX index optimizations on boolean columns 
III) Other Key New Features: 
  • Replication: Allow streaming replication slaves to forward data to other slaves. Previously, only the master server could supply streaming replication log files to standby servers.
  • When a row fails a CHECK or NOT NULL constraint, show the row's contents as error detail
  • DROP INDEX CONCURRENTLY: Allows an index to be dropped without taking an exclusive lock on the table, so concurrent read/write transactions can keep running on the same table (see the sketch after this list).
  • Reduce the need to rebuild tables and indexes for certain ALTER TABLE ... ALTER COLUMN TYPE operations.
    Increasing the length limit of a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowed precision of a numeric column, or changing a column from constrained to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.
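To illustrate the last two points, here is a minimal sketch; the table, column, and index names (orders, customer_name, idx_orders_customer) are purely hypothetical:

-- 9.2: drop an index without taking an exclusive lock, so concurrent
-- reads and writes on the table keep running.
DROP INDEX CONCURRENTLY idx_orders_customer;

-- 9.2: widening a varchar limit (e.g. varchar(50) to varchar(100)) no
-- longer forces a full table rewrite.
ALTER TABLE orders ALTER COLUMN customer_name TYPE varchar(100);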
IV)   Compatibility:
Major changes have been made to the database catalog views, which might affect monitoring tools such as pgbouncer and xymon, so these tools may be impacted:
  • pg_tablespace: the spclocation field has been removed from the view, as it was not updated when a tablespace location changed
  • The definitions of pg_stat_activity and pg_stat_replication have changed (see the example after this list):
    • The current_query field disappears and is replaced by two columns:
      • state: whether the session is running a query, is waiting, etc.
      • query: the text of the last query run (still running if state is 'active')
    • The column procpid is renamed to pid, to be consistent with other system views
  • Change all SQL-level statistics timing values to float8-stored milliseconds: pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.
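As a minimal before/after sketch of the pg_stat_activity change:

-- 9.1:
SELECT procpid, current_query FROM pg_stat_activity;

-- 9.2 and later: procpid becomes pid, and current_query is split into
-- state and query.
SELECT pid, state, query FROM pg_stat_activity;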
V)   Cost of Upgrade: