Recently one of my customer asked me what is advantage of upgrading PostgreSQL from 9.1 to 9.2 or 9.3?
In this series of post I will try to explain the impact & benefit of moving to 9.2 or 9.3
In this series of post I will try to explain the impact & benefit of moving to 9.2 or 9.3
Key Point of Evaluation:
- Stability of Product
- Performance Improvement
- Other Key New Features
- Compatibility with Existing Applications
- Cost of Upgreadation.
PostgreSQL 9.1 Vs 9.2
I) Stability: PostgreSQL 9.2 released on 10th Sept 2012.
II) Performance Improvement:
- Index Only scan: (Very important feature)
- 9.1 PgSQL has to scan both index and Table segment to show the final result set as it needs to check the tuple visibility.
- In 9.2 PgSQL has been improved to allow queries to retrieve data only from indexes, avoiding heap access. Show the result just by scanning the index segment if all the columns in query are present in Index segment itself. There by reduces IO.
- PgSQL manages and update visibility map during VACUUM operation, in-order use this feature VACUUM should be done at appropriate time interval to keep map up to date.
Example:
CREATE TABLE demo_ios (col1 float , col2 float , col3 text);INSERT INTO demo_ios SELECT generate_series( 1 , 100000000 ),random(), 'mynotsolongstring' ; SELECT pg_size_pretty(pg_total_relation_size( 'demo_ios' )); pg_size_pretty ---------------- 6512 MB CREATE index idx_demo_ios on demo_ios(col2,col1); VACUUM demo_ios; |
Consider Following SQL:
SELECT col1,col2 FROM demo_ios where col2 BETWEEN 0.01 AND 0.02 ; |
SQL Plan in 9.1:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02 ; "Bitmap Heap Scan on demo_ios (cost=27953.81..922204.27 rows=1088991 width=16) (actual time=447.273..92438.443 rows=1000279 loops=1)" " Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))" " Buffers: shared read=587827" " -> Bitmap Index Scan on idx_demo_ios (cost=0.00..27681.56 rows=1088991 width=0) (actual time=444.820..444.820 rows=1000279 loops=1)" " Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))" " Buffers: shared read=3837" "Total runtime: 92593.792 ms" |
SQL Plan in 9.2 & 9.3:
explain (analyze,buffers) select col1,col2 from demo_ios where col2 between 0.01 and 0.02 ; "Index Only Scan using idx_demo_ios on demo_ios (cost=0.57..35545.09 rows=1004026 width=16) (actual time=0.147..326.648 rows=1001061 loops=1)" " Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))" " Heap Fetches: 0" " Buffers: shared hit=926518 read=1165" "Total runtime: 353.166 ms" |
2. Optimizer Changes:
- Prepared statements used to be optimized once, without any knowledge of the parameters' values. With 9.2, the planner will use specific plans regarding to the parameters sent (the query will be planned at execution), except if the query is executed several times and the planner decides that the generic plan is not too much more expensive than the specific plans.
- A new feature has been added: parameterized paths. Simply put, it means that a sub-part of a query plan can use parameters it has got from a parent node. It fixes several bad plans that could occur, especially when the optimizer couldn't reorder joins to put nested loops where it would have been efficient.
CREATE TABLE a ( a_id serial PRIMARY KEY NOT NULL, b_id integer ); CREATE INDEX a__b_id ON a USING btree (b_id); CREATE TABLE b ( b_id serial NOT NULL, c_id integer ); CREATE INDEX b__c_id ON b USING btree (c_id); CREATE TABLE c ( c_id serial PRIMARY KEY NOT NULL, value integer UNIQUE ); INSERT INTO b (b_id, c_id) SELECT g.i, g.i FROM generate_series( 1 , 50000 ) g\(i); INSERT INTO a(b_id) SELECT g.i FROM generate_series( 1 , 50000 ) g\(i); INSERT INTO c(c_id,value) VALUES ( 1 , 1 ); EXPLAIN ANALYZE SELECT 1 FROM c WHERE EXISTS ( SELECT * FROM a JOIN b USING (b_id) WHERE b.c_id = c.c_id) AND c.value = 1 ; |
SQL Plan in 9.1:
"Nested Loop Semi Join (cost=1543.00..4486.27 rows=1 width=0) (actual time=106.266..106.273 rows=1 loops=1)" " Join Filter: (c.c_id = b.c_id)" " -> Index Scan using c_value_key on c (cost=0.00..8.27 rows=1 width=4) (actual time=0.015..0.021 rows=1 loops=1)" " Index Cond: (value = 1)" " -> Hash Join (cost=1543.00..3853.00 rows=50000 width=4) (actual time=44.647..103.054 rows=24904 loops=1)" " Hash Cond: (a.b_id = b.b_id)" " -> Seq Scan on a (cost=0.00..722.00 rows=50000 width=4) (actual time=0.020..13.075 rows=50000 loops=1)" " -> Hash (cost=722.00..722.00 rows=50000 width=8) (actual time=43.766..43.766 rows=50000 loops=1)" " Buckets: 4096 Batches: 2 Memory Usage: 981kB" " -> Seq Scan on b (cost= 0.00 .. 722.00 rows= 50000 width= 8 ) (actual time= 0.019 .. 15.136 rows= 50000 loops= 1 ) "Total runtime: 112.887 ms" |
SQL Plan in 9.2 & 9.3:
" Nested Loop Semi Join (cost=0.73..16.84 rows=1 width=0) (actual time=0.188..0.190 rows=1 loops=1)" " -> Index Scan using c_value_key on c (cost=0.16..8.17 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)" " Index Cond: (value = 1)" " -> Nested Loop (cost=0.58..8.66 rows=1 width=4) (actual time=0.173..0.173 rows=1 loops=1)" " -> Index Scan using b__c_id on b (cost=0.29..8.31 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)" " Index Cond: (c_id = c.c_id)" " -> Index Only Scan using a__b_id on a (cost=0.29..0.34 rows=1 width=4) (actual time=0.159..0.159 rows=1 loops=1)" " Index Cond: (b_id = b.b_id)" " Heap Fetches: 1" "Total runtime: 0.574 ms"
|
3) Other Key Optimizer Changes:
- Fix planner to handle indexes with duplicated columns more reliably
- Improve cost estimates for use of partial indexes
- Improve the planner's ability to use statistics for columns referenced in subqueries
- Improve statistical estimates for subqueries using DISTINCT
- Support MIN/MAX index optimizations on boolean columns
III) Other Key New Features:
- Replication: Allow streaming replication slaves to forward data to other slaves. Previously, only the master server could supply streaming replication log files to standby servers.
- When a row fails a CHECK or NOT NULL constraint, show the row's contents as error detail
- DROP INDEX CONCURRENTLY: Allow to drop index on table without taking exclusive lock on table. This allows to run parallel read/write transaction on same table.
- Reduce need to rebuild tables and indexes for certain ALTER TABLE... ALTER COLUMN TYPE operations.
Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.
IV) Compatibility:
Major changes has been done on Database catalog views, which might affect the monitoring tools, like pgbouncer and xymon. So these tools will have a impact
- pg_tablespace: spclocation field is removed from the view as it was no getting updated if tablespace location is changed
- pg_stat_activity and pg_stat_replication's definitions have changed
- current_query field disappears and is replaced by two columns:*** state: is the session running a query, waiting
- query: what is the last run (or still running if stat is "active") query
- The column procpid is renamed to pid, to be consistent with other system views
- current_query field disappears and is replaced by two columns:*** state: is the session running a query, waiting
- Change all SQL-level statistics timing values to float8-stored milliseconds: pg_stat_user_functions.total_time, pg_stat_user_functions.self_time, pg_stat_xact_user_functions.total_time, pg_stat_xact_user_functions.self_time, and pg_stat_statements.total_time (contrib) are now in milliseconds, to be consistent with the rest of the timing values.
V) Cost of Upgreadation:
Explained at : Upgrade PostgreSQL 9.1 to 9.3 - Step by Step
References:
Comments
Post a Comment