When learning a new database, I always prefer to start with how it executes a given SQL statement or task. For the Greenplum MPP database, here is what I found about how it works:
1. The client connects to the postmaster process on the master.
2. The postmaster spawns a backend process for the session, which acts as the Query Dispatcher (QD).
3. The client then submits its SQL statements to the QD for execution.
4. The Query Dispatcher (QD):
a. Runs only on the master, as the driving and coordinating process
b. Optimizes the SQL using catalog data
c. Creates the execution plan
d. Writes changes and the distributed transaction context to the write-ahead log (WAL)
e. Coordinates distributed transactions via the Distributed Transaction Manager (DTM)
5. The QD calls the segment-side processes, the Query Executors (QEs), and dispatches the execution plan to them:
a. A Query Executor (QE) is a segment-side worker process responsible for executing the query on its segment
b. QEs communicate with each other across the segments as a gang
c. QEs send their final result sets back to the QD on the master
6. SQL execution: each QE takes the execution plan tree and works through it using its local catalog data, buffer cache, disk I/O, etc.
7. Gang communication: since each segment works only on its own subset of the data, the segments need to coordinate with each other on who is doing what. They also exchange the data needed for joins through motions, such as redistribute or broadcast motions.
8. Once all segments finish execution, their results are sent to the master, which performs any final aggregation and returns the result to the client.
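To make the dispatch and gather steps (3 to 6 and 8) a little more concrete, here is a minimal in-process Python sketch. It is a toy model under loudly simplified assumptions: `QueryDispatcher`, `QueryExecutor`, `count_by_region` and the crc32-based hash distribution are hypothetical names chosen for illustration, not Greenplum internals. In a real cluster the QD and QEs are separate processes on different hosts, and the "plan" is a full plan tree rather than a Python function. Each QE computes a partial `GROUP BY` count over its local rows, and the QD merges the partial counts into the final answer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Toy model only: these class names are hypothetical, not Greenplum's API.


class QueryExecutor:
    """Segment-side worker (QE): runs a plan against its local rows only."""

    def __init__(self, segment_id, rows):
        self.segment_id = segment_id
        self.rows = rows  # the rows stored on this segment

    def execute(self, plan):
        # Each QE produces a partial result from local data.
        return plan(self.rows)


class QueryDispatcher:
    """Master-side coordinator (QD): dispatches the plan and merges results."""

    def __init__(self, num_segments):
        self.num_segments = num_segments

    def segment_for(self, key):
        # Assumed hash distribution on the distribution key.
        return crc32(str(key).encode()) % self.num_segments

    def load(self, rows, key_index):
        # Place each row on one segment and create one QE per segment.
        buckets = [[] for _ in range(self.num_segments)]
        for row in rows:
            buckets[self.segment_for(row[key_index])].append(row)
        return [QueryExecutor(i, bucket) for i, bucket in enumerate(buckets)]

    def run(self, executors, plan):
        # Dispatch the same plan to every QE in parallel.
        with ThreadPoolExecutor(max_workers=len(executors)) as pool:
            partials = list(pool.map(lambda qe: qe.execute(plan), executors))
        # Final aggregation on the master: add up the partial counts.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return dict(total)


def count_by_region(rows):
    """Hypothetical partial plan: count rows per region on one segment."""
    return Counter(region for _, region in rows)


if __name__ == "__main__":
    qd = QueryDispatcher(num_segments=4)
    rows = [(1, "east"), (2, "west"), (3, "east"),
            (4, "north"), (5, "west"), (6, "east")]
    executors = qd.load(rows, key_index=0)
    print(qd.run(executors, count_by_region))
```

Splitting the work into a per-segment partial count plus a merge on the master is what keeps the amount of data sent back to the master small: each segment returns one count per group instead of all of its rows.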
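Step 7 mentions sharing join data through motions. The following toy Python sketch shows the idea behind a redistribute motion, under assumed names and data: `NUM_SEGMENTS`, the crc32 hash, and the `customers`/`orders` layout are all hypothetical. Customers are distributed by `customer_id`, but orders are distributed by `order_id`. Matching rows are therefore not guaranteed to sit on the same segment, so the orders are re-hashed on `customer_id` and sent to the segment that owns that key. Each segment then joins locally, and a gather motion collects the pieces on the master.

```python
from zlib import crc32

NUM_SEGMENTS = 3  # assumed cluster size for this toy example


def seg(key):
    """Hypothetical hash function mapping a key to a segment number."""
    return crc32(str(key).encode()) % NUM_SEGMENTS


def distribute(rows, key_index):
    """Place each row on a segment by hashing its distribution key."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for row in rows:
        segments[seg(row[key_index])].append(row)
    return segments


def redistribute_motion(segments, key_index):
    """Re-hash every row on the join key and send it to its target segment."""
    out = [[] for _ in range(NUM_SEGMENTS)]
    for local_rows in segments:
        for row in local_rows:
            out[seg(row[key_index])].append(row)
    return out


def local_hash_join(left, right, left_key, right_key):
    """Hash join of rows that are already co-located on one segment."""
    table = {}
    for row in left:
        table.setdefault(row[left_key], []).append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


def gather_motion(per_segment):
    """Send each segment's result to the master and concatenate them."""
    return [row for rows in per_segment for row in rows]


def distributed_join(customers, orders):
    cust_segs = distribute(customers, key_index=0)  # by customer_id
    order_segs = distribute(orders, key_index=0)    # by order_id
    # Orders are not co-located with customers, so move them by customer_id.
    order_segs = redistribute_motion(order_segs, key_index=1)
    per_segment = [
        local_hash_join(cust_segs[i], order_segs[i], left_key=0, right_key=1)
        for i in range(NUM_SEGMENTS)
    ]
    return gather_motion(per_segment)


if __name__ == "__main__":
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(100, 1, 9.5), (101, 2, 20.0), (102, 1, 5.0), (103, 3, 7.25)]
    for row in sorted(distributed_join(customers, orders)):
        print(row)
```

If both tables were already distributed on the join key, the rows would be co-located and no redistribution would be needed. Broadcasting a small table to every segment is another way to achieve co-location, and which motion is used is a choice the optimizer makes.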
As this is my first post on GPDB, I thought I'd keep it brief and crisp!