
Greenplum MPP Query Execution

When learning a new database, I always prefer to start with how it executes a given SQL statement or task. For the Greenplum MPP database, here are my findings on how it works:

1.   The client connects to the postmaster process.
2.   The postmaster spawns a backend worker process, the Query Dispatcher (QD).
3.   The client then submits its SQL statements to the QD for execution.
4.   Query Dispatcher (QD): the process that,
      a.    Runs only on the master, as the driving and coordinating process
      b.    Takes care of optimizing the SQL using catalog data
      c.    Creates the execution plan
      d.    Writes the changes and the DTM context to the WAL
      e.    Coordinates the distributed transaction through the Distributed Transaction Manager (DTM)
5.   The QD calls the segment-side worker processes, the Query Executors (QE), and submits the execution plan to them.
      a.    A Query Executor (QE) is a segment-side worker process responsible for query execution on its segment node
      b.    It takes part in gang communication across the segments
      c.    It sends its result set back to the QD on the master
6.   SQL execution: each QE takes the execution plan tree and starts working on it using local catalog data, the buffer cache, disk I/O, etc.
7.   Gang communication: since each segment works only on its own slice of the data, the segments need to communicate with each other about who is doing what, and they share data for joins through motions (see the sketches after this list).
8.   Once all the segments have finished execution, the results are sent back to the master. The master does the final aggregation and returns the result to the client.
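
To make the "each segment owns a slice of the data" idea concrete, here is a minimal sketch. It only assumes a running Greenplum cluster reachable through psql; the table name, columns, and row counts are hypothetical. gp_segment_id is a system column that reports which segment a row is stored on.

---
-- Hash-distribute rows across segments on sale_id
CREATE TABLE sales (
    sale_id  bigint,
    region   text,
    amount   numeric
) DISTRIBUTED BY (sale_id);

INSERT INTO sales
SELECT g, 'region_' || (g % 4), random() * 100
FROM generate_series(1, 100000) AS g;

-- How many rows landed on each segment? This is the local data
-- each Query Executor (QE) scans during execution.
SELECT gp_segment_id, count(*)
FROM sales
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
---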
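And here is a hedged sketch of how motions show up in a plan, reusing the hypothetical sales table from above. The exact plan shape varies by Greenplum version and optimizer, but a distributed plan typically ends in a Gather Motion that sends each segment's partial result back to the QD on the master, with Redistribute or Broadcast Motions in between whenever data has to move between segments for joins or grouping.

---
-- sales is distributed by sale_id, so grouping by region forces the
-- segments to exchange rows before the final aggregation.
EXPLAIN
SELECT region, sum(amount)
FROM sales
GROUP BY region;

-- Typical (abridged) plan shape; details vary by version and optimizer:
--   Gather Motion N:1                      <- results sent to the QD on the master
--     -> HashAggregate
--          -> Redistribute Motion N:N      <- rows moved between segments
--               -> HashAggregate (partial)
--                    -> Seq Scan on sales  <- local work on each QE
---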

