Skip to main content

Greenplum MPP Query Execution

To learn new Database, I always prefer to start with  How it executes the given SQL/Task? For Greenplum MPP database, here is my finding on How it works:

1.   Client connect to Postmaster process
2.   Postmaster process spawns a background worker process, Query Dispatcher (QD)
3.   Then Client submits the SQL’s for execution to QD  
4.   Query Dispatcher (QD) : one who,
      a.      Works only on master as driving and coordination process
      b.     Takes care of Optimizing the SQL using catalog data
      c.     Create execution Plan
      d.    Write the changes, DTM context to WAL
      e.    Co-ordinate Distributed transaction (DTM)
5.    QD Calls segment process for execution, Query Executer( QE) and submits the execution plan to QE
       a.    Query Executer( QE), is segment side worker process who is responsible for Query execution on each of the segment node
       b.   Gang communication across the segments
       c.    Send final result set to Master QD  
6.    SQL Execution : QE takes the execution plan tree and start working on it by using local catalog data, buffer cache, disk IO ..etc
7.    Gang communication : since each of the segment works on given set of data, they needs to communicate each other on who is doing what. Also share the data for Joins through motions
 
8. Once all the segments are done with execution, results are submitted to master. Master does aggregation and returns it to client.


Comments

  1. Good place. I like it a lot… but why is it so brief?

    Also visit my blog - freiwillige krankenversicherung kündigen

    ReplyDelete
  2. As this is first post of GPDB, thought of keeping it brief and crispy !!!

    ReplyDelete

Post a Comment

Popular posts from this blog

Drop all Objects from Schema In Postgres

To Drop all objects from Postgres Schema there could be following two approaches: Drop Schema with cascade all and re-create it again.  In some cases where you dont want to/not allowed to drop and recreate schema, its easy to look for objects on current schema and drop them. following script would help to do so, Create function which would do the task and then drop that function too. --- CREATE OR REPLACE FUNCTION drop_DB_objects() RETURNS VOID AS $$ DECLARE  rd_object RECORD; v_idx_statement VARCHAR(500);   BEGIN ---1. Dropping all stored functions RAISE NOTICE '%', 'Dropping all stored functions...'; FOR rd_object IN ( SELECT format('%I.%I(%s)', ns.nspname, p.proname, oidvectortypes(p.proargtypes)) as functionDef     FROM pg_proc p     INNER JOIN pg_namespace ns ON (p.pronamespace = ns.oid)    WHERE ns.nspname = current_schema      AND p.proname <> 'drop_db_objects' )

Distributed transaction in Oracle ( Over Oracle DBLink)

To fetch the data from one server to and other Oracle server over DBLink, I experienced the following facts of Oracle Distributed transactions: Security issue: -           We cannot create Public synonym for the remote object accessed over Private DBLink of other Database.   -           It’s allowed to create private synonym for remote object, but you cannot grant the access over this synonym to any other schema. If you try to provide the grants to other schema Oracle raises an error:              [ORA-02021: DDL operations are not allowed on a remote database] “In an all you can access remote objects over private DBLink in the same schema where DBLink is created”. Fetching the Ref Cursor at Remote site:                   Let’s say we have two sites involved in Distributed transaction, Server1 and Server2. The Ref Cursor opened on Server1 procedure, cannot be fetched at Server2 site. If we try to fetch this cursor oracle raises an exception:      [ORA-02055: distributed updat

Vacuum Analyze Full Schema in PostgreSQL Database

To analyze full database schema, you can use following shell script : -------------------------------------------------------------------------------------------------------------------------- #!/bin/sh rm -rf /tmp/vacuum_analyze_MyData.sql dbname="mydata" username="mydatadatamart" namespace="mydatadatamart" # Vacuum only those tables which are fragmented over 30% # Analyze tables only if they are not already # Process CMF and Groups Master table every time as they are master Tables=`psql $dbname $username << EOF SELECT 'VACUUM ANALYZE VERBOSE ' || tablename || ' ;' AS TableList   FROM ( SELECT *,       n_dead_tup > av_threshold AS "av_needed",       CASE WHEN reltuples > 0      THEN round(100.0 * n_dead_tup / (reltuples))       ELSE 0       END AS pct_dead FROM (SELECT C.relname AS TableName, pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins, pg_stat_get_tuples_updated(