Analyzing the MD Accessor's metadata caching capability
set client_min_messages='log';
set optimizer to on;
set optimizer_print_optimization_stats to on;
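-- run the SQL statement under test
The optimizer_print_optimization_stats GUC prints statistics for each step of the ORCA optimization process. The log below is printed by the CMDAccessor::~CMDAccessor destructor and reports two member variables, m_dFetchTime and m_dLookupTime.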
2023-07-11 16:46:03:555868 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13176.370000ms
[OPT]: Total metadata lookup time (including fetch time): 14591.598000ms
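In CMDAccessor::GetImdObj(IMDId *) (src/backend/gporca/libgpopt/src/mdcache/CMDAccessor.cpp), the comment "object not found in MD cache: retrieve it from MD provider" marks the relevant path: when the object is not found in the MD cache, it is retrieved from the MD provider. The code uses a CTimerUser (timeFetch) to measure the elapsed time between the start and end of that retrieval.
ORCA keeps metadata in three cache levels: the local hashtable, the MD cache, and the MD provider. m_dLookupTime is the time spent retrieving metadata objects across all three levels. It includes the cost of promotion: an object fetched from the MD provider must also be inserted into the MD cache and the local hashtable, and an object found in the MD cache must be inserted into the local hashtable.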
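The lookup order and the two timers described above can be sketched as follows. This is a minimal illustration, not ORCA code: ToyAccessor, MdId, MdObj, and the provider callback are hypothetical stand-ins, and std::unordered_map replaces both CSyncHashtable and CCache.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins: a metadata id and a metadata object are plain strings.
using MdId = std::string;
using MdObj = std::string;

class ToyAccessor
{
public:
    // shared_cache outlives the accessor (plays the role of the MD cache);
    // provider is the slow source of truth (plays the role of an MD provider).
    ToyAccessor(std::unordered_map<MdId, MdObj> &shared_cache,
                std::function<MdObj(const MdId &)> provider)
        : m_cache(shared_cache), m_provider(std::move(provider))
    {
    }

    const MdObj &Get(const MdId &id)
    {
        const auto lookup_start = Clock::now();
        // level 1: hashtable local to this accessor
        auto local_it = m_local.find(id);
        if (local_it == m_local.end())
        {
            // level 2: cache shared across accessors
            auto cache_it = m_cache.find(id);
            if (cache_it == m_cache.end())
            {
                // level 3: fetch from the provider; only this part is fetch time
                const auto fetch_start = Clock::now();
                MdObj obj = m_provider(id);
                m_fetch_ms += ElapsedMs(fetch_start);
                ++m_fetches;
                cache_it = m_cache.emplace(id, std::move(obj)).first;
            }
            // promote the object into the local hashtable
            local_it = m_local.emplace(id, cache_it->second).first;
        }
        // lookup time covers all three levels, including any fetch
        m_lookup_ms += ElapsedMs(lookup_start);
        return local_it->second;
    }

    double FetchMs() const { return m_fetch_ms; }
    double LookupMs() const { return m_lookup_ms; }
    int Fetches() const { return m_fetches; }

private:
    using Clock = std::chrono::steady_clock;

    static double ElapsedMs(Clock::time_point start)
    {
        return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
    }

    std::unordered_map<MdId, MdObj> m_local;
    std::unordered_map<MdId, MdObj> &m_cache;
    std::function<MdObj(const MdId &)> m_provider;
    double m_fetch_ms = 0.0;
    double m_lookup_ms = 0.0;
    int m_fetches = 0;
};
```

In this sketch, a second accessor created over the same shared cache reports a fetch time of zero, because every object is already present at level 2. That is the same pattern as a log line showing "Total metadata fetch time: 0.000000ms" next to a non-zero lookup time.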
Metadata timings when each run uses a different session:
2023-07-12 14:45:01:855297 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 885ms",
2023-07-12 14:45:30:500581 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 14473.282000ms
[OPT]: Total metadata lookup time (including fetch time): 16382.339000ms
Run 1: planning took about 29 s, metadata fetch 14.5 s, metadata lookup 16.4 s.
2023-07-12 14:49:25:139170 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 779ms",
2023-07-12 14:49:52:102328 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13731.516000ms
[OPT]: Total metadata lookup time (including fetch time): 15566.536000ms
Run 2: planning took about 27 s, metadata fetch 13.7 s, metadata lookup 15.6 s.
2023-07-12 14:57:24:431293 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 817ms",
2023-07-12 14:57:49:018854 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 12863.061000ms
[OPT]: Total metadata lookup time (including fetch time): 14672.054000ms
Run 3: planning took about 25 s, metadata fetch 12.9 s, metadata lookup 14.7 s.
2023-07-12 15:14:22:723433 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 863ms",
2023-07-12 15:14:47:571352 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13167.755000ms
[OPT]: Total metadata lookup time (including fetch time): 14981.800000ms
Run 4: planning took about 25 s, metadata fetch 13.2 s, metadata lookup 15.0 s.
Metadata timings when all runs share one session:
2023-07-12 15:21:08:237880 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 870ms",
2023-07-12 15:21:34:591085 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13729.502000ms
[OPT]: Total metadata lookup time (including fetch time): 15601.946000ms
Run 1: planning took about 26 s, metadata fetch 13.7 s, metadata lookup 15.6 s.
2023-07-12 15:21:39:953092 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 781ms",
2023-07-12 15:22:04:924564 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13512.558000ms
[OPT]: Total metadata lookup time (including fetch time): 15360.607000ms
Run 2: planning took about 25 s, metadata fetch 13.5 s, metadata lookup 15.4 s.
2023-07-12 15:22:07:900124 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 205ms",
2023-07-12 15:22:15:184400 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 0.000000ms
[OPT]: Total metadata lookup time (including fetch time): 1027.527000ms
Run 3: planning took about 8 s, metadata fetch 0 s, metadata lookup 1.0 s.
2023-07-12 15:22:15:542861 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 202ms",
2023-07-12 15:22:22:561944 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 0.000000ms
[OPT]: Total metadata lookup time (including fetch time): 960.423000ms
Run 4: planning took about 7 s, metadata fetch 0 s, metadata lookup 0.96 s.
2023-07-12 15:22:28:083738 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 192ms",
2023-07-12 15:22:34:790351 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 0.000000ms
[OPT]: Total metadata lookup time (including fetch time): 946.501000ms
Run 5: planning took about 6 s, metadata fetch 0 s, metadata lookup 0.95 s.
Metadata timings within one session, with a long-running statement inserted between two runs:
2023-07-12 15:40:02:665883 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 227ms",
2023-07-12 15:40:11:426017 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 0.000000ms
[OPT]: Total metadata lookup time (including fetch time): 1030.394000ms
psql:dwszzr2.sql:19: LOG: duration: 600431.348 ms statement: select * from pg_sleep(600);
2023-07-12 15:50:19:376402 CST,THD000,TRACE,"timer: [OPT]: Query To DXL Translation Time: 895ms",
2023-07-12 15:50:46:477806 CST,THD000,TRACE,"[OPT]: Total metadata fetch time: 13853.042000ms
[OPT]: Total metadata lookup time (including fetch time): 15858.812000ms
After the 600 s sleep, the fetch time is back at about 13.9 s, so the metadata was fetched from the MD provider again.
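Three-level cache: the MD provider hashtable
First, the CMDAccessor class caches the MD providers in a hashtable keyed by CSystemId, with IMDProvider as the value. In the code the hashtable is a CSyncHashtable whose elements are SMDProviderElem, which is essentially a wrapper around the IMDProvider *m_pmdp pointer.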
struct SMDProviderElem
{
    // element in the MD provider hashtable
private:
    CSystemId m_sysid;    // source system id
    IMDProvider *m_pmdp;  // value of the hashed element
public:
    SLink m_link;  // generic link
    static const SMDProviderElem m_mdpelemInvalid;  // invalid key
    SMDProviderElem(CSystemId sysid, IMDProvider *pmdp);  // ctor
    ~SMDProviderElem();  // dtor
    IMDProvider *Pmdp();  // return the MD provider
    CSystemId Sysid() const;  // return the system id
    // equality function for hash tables
    static BOOL Equals(const SMDProviderElem &mdpelemLeft, const SMDProviderElem &mdpelemRight);
    // hash function for MD providers hash table
    static ULONG HashValue(const SMDProviderElem &mdpelem);
};
The MD Accessor class provides the following functions for inserting new MD providers into the MD provider hashtable, and the Pmdp function for looking up an MD provider by sysid.
// register a new MD provider
void RegisterProvider(CSystemId sysid, IMDProvider *pmdp);
// register given MD providers
void RegisterProviders(const CSystemIdArray *pdrgpsysid, const CMDProviderArray *pdrgpmdp);
IMDProvider *Pmdp(CSystemId sysid);  // lookup an MD provider by system id

//---------------------------------------------------------------------------
// @function:
// CMDAccessor::Pmdp
// @doc:
// Retrieve the MD provider for the given source system id
//---------------------------------------------------------------------------
IMDProvider *
CMDAccessor::Pmdp(CSystemId sysid)
{
    SMDProviderElem *pmdpelem = NULL;
    {
        // scope for HT accessor
        SMDProviderElem mdpelem(sysid, NULL /*pmdp*/);
        MDPHTAccessor mdhtacc(m_shtProviders, mdpelem);
        pmdpelem = mdhtacc.Find();
    }
    GPOS_ASSERT(NULL != pmdpelem && "Could not find MD provider");
    return pmdpelem->Pmdp();
}
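The register-then-look-up pattern above can be illustrated with standard containers. This is only a sketch under stated assumptions: SysId, SysIdHash, Provider, and ProviderRegistry are hypothetical names, the SysId fields are invented for illustration, and std::unordered_map stands in for CSyncHashtable (so there is no synchronization).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for CSystemId; the two fields are an assumption.
struct SysId
{
    int type;
    std::string name;

    bool operator==(const SysId &other) const
    {
        return type == other.type && name == other.name;
    }
};

// Hash functor, playing the role of the HashValue function in the element struct.
struct SysIdHash
{
    std::size_t operator()(const SysId &id) const
    {
        return std::hash<int>()(id.type) ^ (std::hash<std::string>()(id.name) << 1);
    }
};

// Hypothetical stand-in for IMDProvider.
struct Provider
{
    std::string label;
};

class ProviderRegistry
{
public:
    // Analogous to RegisterProvider: one provider per system id.
    // Returns false if the system id is already registered.
    bool Register(const SysId &sysid, Provider *provider)
    {
        return m_providers.emplace(sysid, provider).second;
    }

    // Analogous to Pmdp: look up a provider by system id.
    // Unlike the asserting original, this sketch returns nullptr on a miss.
    Provider *Find(const SysId &sysid) const
    {
        auto it = m_providers.find(sysid);
        return it == m_providers.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<SysId, Provider *, SysIdHash> m_providers;
};
```

The registry does not own the providers; callers keep them alive for as long as the registry is used.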
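Three-level cache: MDCache
MDCache uses the CCache class (an LRU-style cache whose entries stay valid within a session for as long as their CCACHE_GCLOCK_INIT_COUNTER count lasts). Its key is CMDKey, that is, the IMDId of the metadata object; its value is an IMDCacheObject, the cached metadata object. CCache is defined in src/backend/gporca/libgpos/include/gpos/memory/CCache.h. Each CCacheEntry carries two metrics used to decide replacement, refcount and gclockcounter; the replacement policy can be read from the EvictEntries function in the same header. From the code, the clock counter starts at CCACHE_GCLOCK_INIT_COUNTER and cache_quota is the initial size.
The COptTasks::OptimizeTask function sets the CacheQuota of CMDCache.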
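To make the role of the two replacement metrics concrete, here is a simplified clock-style eviction sketch. It is loosely modeled on the idea of a reference count plus a gclock counter and is not a copy of CCache::EvictEntries: ClockCache, Entry, kGclockInitCounter and its value are hypothetical, and the quota counts entries here, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const unsigned kGclockInitCounter = 3;  // hypothetical initial counter value

struct Entry
{
    std::string key;
    std::string value;
    unsigned ref_count;  // number of callers currently pinning the entry
    unsigned gclock;     // remaining "second chances" before eviction
};

class ClockCache
{
public:
    explicit ClockCache(std::size_t quota) : m_quota(quota), m_hand(0) {}

    // Insert a new entry, evicting one victim first when the quota is reached.
    // Returns false if every entry is pinned and nothing can be evicted.
    bool Insert(const std::string &key, const std::string &value)
    {
        if (m_entries.size() >= m_quota && !EvictOne()) return false;
        m_entries.push_back(Entry{key, value, 0, kGclockInitCounter});
        return true;
    }

    // Copy the value out on a hit; a hit also refreshes the gclock counter.
    bool Lookup(const std::string &key, std::string *value_out)
    {
        Entry *e = Find(key);
        if (e == nullptr) return false;
        e->gclock = kGclockInitCounter;
        *value_out = e->value;
        return true;
    }

    bool Pin(const std::string &key)
    {
        Entry *e = Find(key);
        if (e == nullptr) return false;
        ++e->ref_count;
        return true;
    }

    bool Unpin(const std::string &key)
    {
        Entry *e = Find(key);
        if (e == nullptr || e->ref_count == 0) return false;
        --e->ref_count;
        return true;
    }

    std::size_t Size() const { return m_entries.size(); }

private:
    Entry *Find(const std::string &key)
    {
        for (Entry &e : m_entries)
            if (e.key == key) return &e;
        return nullptr;
    }

    // Sweep the clock hand: pinned entries are skipped, unpinned entries lose
    // one count per visit, and an unpinned entry at zero is evicted.
    bool EvictOne()
    {
        // Enough steps to visit each entry kGclockInitCounter + 1 times.
        const std::size_t max_steps = m_entries.size() * (kGclockInitCounter + 1);
        for (std::size_t step = 0; step < max_steps; ++step)
        {
            if (m_hand >= m_entries.size()) m_hand = 0;
            Entry &e = m_entries[m_hand];
            if (e.ref_count == 0)
            {
                if (e.gclock == 0)
                {
                    m_entries.erase(m_entries.begin() +
                                    static_cast<std::ptrdiff_t>(m_hand));
                    return true;
                }
                --e.gclock;
            }
            ++m_hand;
        }
        return false;
    }

    std::size_t m_quota;
    std::size_t m_hand;
    std::vector<Entry> m_entries;
};
```

In this sketch a pinned entry is never chosen as a victim, while an unpinned entry survives only as many sweeps as its counter allows.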
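Three-level cache: the hashtable of cache accessors
m_shtCacheAccessors is a hashtable that lives for a single optimization run, with the same lifetime as the CMDAccessor. Its key is an Mdid pointer and its value is an SMDAccessorElem, which is essentially the IMDCacheObject (the cached metadata object).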
struct SMDAccessorElem
{
private:
    IMDCacheObject *m_imd_obj;  // hashed object
public:
    IMDId *m_mdid;  // hash key
    SLink m_link;   // generic link
    // invalid key
    static const MdidPtr m_pmdidInvalid;
    SMDAccessorElem(IMDCacheObject *pimdobj, IMDId *mdid);  // ctor
    ~SMDAccessorElem();  // dtor
    IMDCacheObject *GetImdObj() { return m_imd_obj; }  // hashed object
    IMDId *MDId();  // return the key for this hashtable element
    // equality function for hash tables
    static BOOL Equals(const MdidPtr &left_mdid, const MdidPtr &right_mdid);
    // hash function for cost contexts hash table
    static ULONG HashValue(const MdidPtr &mdid);
};