Alibaba Tech - "The Charm of Modern C++ Design": A First Look at Virtual-Function Inheritance and Thunks - 鑽石舞台

I. Background

1 Hands-on verification

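While debugging the multiple-inheritance C++ program below with LLDB, I found that the pointer addresses returned by lldb's print command (an alias of the expression command) did not match the addresses I expected from the C++ memory model. What causes this? The program is as follows: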

#include <stdio.h>

class Base {
public:
    Base() {}
protected:
    float x;
};

class VBase {
public:
    VBase() {}
    virtual void test() {}
    virtual void foo() {}
protected:
    float x;
};

class VBaseA : public VBase {
public:
    VBaseA() {}
    virtual void test() {}
    virtual void foo() {}
protected:
    float x;
};

class VBaseB : public VBase {
public:
    VBaseB() {}
    virtual void test() { printf("test \n"); }
    virtual void foo() {}
protected:
    float x;
};

class VDerived : public VBaseA, public Base, public VBaseB {
public:
    VDerived() {}
    virtual void test() {}
    virtual void foo() {}
protected:
    float x;
};

int main(int argc, char *argv[]) {
    VDerived *pDerived = new VDerived();               // 0x0000000103407f30
    Base *pBase = (Base*)pDerived;                     // 0x0000000103407f40
    VBaseA *pvBaseA = static_cast<VBaseA*>(pDerived);  // 0x0000000103407f30
    VBaseB *pvBaseB = static_cast<VBaseB*>(pDerived);  // 0x0000000103407f30: should be 0x0000000103407f48, but lldb shows 0x0000000103407f30
    unsigned long pBaseAddressbase = (unsigned long)pBase;
    unsigned long pvBaseAAddressbase = (unsigned long)pvBaseA;
    unsigned long pvBaseBAddressbase = (unsigned long)pvBaseB;
    pvBaseB->test();
}

[Figure: the addresses returned by the lldb print command]

The C++ memory model as normally understood

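I am on an x86_64 Mac, so pointers are 8 bytes wide and 8-byte aligned (align = 8).

Under the usual C++ memory model, converting pDerived to VBaseA* leaves the address at 0x0000000103407f30, because VBaseA shares the start of the object. That is easy to understand.

Converting pDerived to Base* (pBase) shifts the address by 16 bytes (sizeof(VBaseA)) to 0x0000000103407f40, which is also as expected.

But converting pDerived to VBaseB* (pvBaseB) should shift the address by 24 bytes (the 16 bytes of VBaseA plus the 4-byte Base, rounded up to 8-byte alignment), giving 0x0000000103407f48 rather than 0x0000000103407f30, the start of the object. What causes this?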
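The expected offsets can be measured from inside the program instead of being read off the debugger. The following is a minimal sketch, assuming trimmed copies of the article's classes; the Offsets struct and the byte_offset and measure_offsets helpers are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Trimmed copies of the article's classes (constructors and printf omitted).
class Base { protected: float x = 0; };
class VBase {
public:
    virtual void test() {}
    virtual void foo() {}
protected:
    float x = 0;
};
class VBaseA : public VBase {
public:
    void test() override {}
    void foo() override {}
protected:
    float x = 0;
};
class VBaseB : public VBase {
public:
    void test() override {}
    void foo() override {}
protected:
    float x = 0;
};
class VDerived : public VBaseA, public Base, public VBaseB {
public:
    void test() override {}
    void foo() override {}
protected:
    float x = 0;
};

// Hypothetical result type: byte distance from the start of the complete
// object to each base subobject.
struct Offsets {
    std::ptrdiff_t to_vbase_a;
    std::ptrdiff_t to_base;
    std::ptrdiff_t to_vbase_b;
};

// Distance in bytes between a converted pointer and the complete object.
template <class Part>
std::ptrdiff_t byte_offset(const VDerived* whole, const Part* part) {
    assert(whole != nullptr && part != nullptr);
    return reinterpret_cast<const char*>(part) -
           reinterpret_cast<const char*>(whole);
}

Offsets measure_offsets() {
    VDerived d;
    const VDerived* p = &d;
    return Offsets{
        byte_offset(p, static_cast<const VBaseA*>(p)),
        byte_offset(p, static_cast<const Base*>(p)),
        byte_offset(p, static_cast<const VBaseB*>(p)),
    };
}
```

On the x86_64 layout the article describes, the three values correspond to the offsets 0, 16 and 24 discussed above; other ABIs may lay the object out differently.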

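2 Guesses prompted by the experiment

In the code above, Base has no virtual functions, while VBaseB has the virtual functions test and foo. My guesses were:

1. For a base class without virtual functions (no vtable), the compiler offsets the pointer to the real subobject address during the cast.

2. For a base class with virtual functions (with a vtable), the compiler does not actually offset the pointer during the cast; it still points at the derived object rather than at the VBaseB subobject.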

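II. Questions raised by the observation

1. When a derived-class pointer is cast to a base class that has virtual functions (and a vtable), does the compiler really offset the address behind the scenes?

2. If it does:

How does the compiler make sure that calling an overridden virtual function through a base pointer and calling it through a derived pointer both see the same this value, so that the call is correct?

And why does LLDB's expression command report the start address of the derived object?

3. If it does not, how are the base class's member variables and functions reached through the derived-class pointer?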

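III. Root cause

Behind the scenes the compiler offsets the pointer, exactly as it does for a base class without virtual functions.

Because the pointer is offset, when a virtual function overridden by the derived class is called through a base-class pointer, the compiler uses a thunk to adjust this on entry to the call (and the returned pointer, where a return adjustment is needed).

LLDB's expression command shows the start address of the derived object (0x0000000103407f30) rather than the offset address of the base subobject (0x0000000103407f48) because, when it presents a pointer to a base class with virtual functions, LLDB formats the result through its summary format. While formatting, it looks up the C++ runtime dynamic type and address behind the current memory address and shows those to the user.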
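The effect of the this adjustment can be observed without a debugger. The following is a minimal sketch, assuming a reduced two-base hierarchy that reuses the article's class names; the self function and the Observation struct are hypothetical and exist only to record which this the override was entered with.

```cpp
#include <cstdint>

struct VBaseA {
    virtual std::uintptr_t self() { return reinterpret_cast<std::uintptr_t>(this); }
    float a = 0;
};

struct VBaseB {
    virtual std::uintptr_t self() { return reinterpret_cast<std::uintptr_t>(this); }
    float b = 0;
};

struct VDerived : VBaseA, VBaseB {
    // Overrides both bases and reports the `this` it was entered with.
    std::uintptr_t self() override { return reinterpret_cast<std::uintptr_t>(this); }
};

// Hypothetical record of the four addresses of interest.
struct Observation {
    std::uintptr_t derived_addr;      // address of the complete object
    std::uintptr_t base_b_addr;       // address after converting to VBaseB*
    std::uintptr_t this_via_derived;  // `this` seen when called via VDerived*
    std::uintptr_t this_via_base_b;   // `this` seen when called via VBaseB*
};

Observation observe() {
    VDerived d;
    VDerived* pd = &d;
    VBaseB* pb = pd;  // the conversion offsets the pointer
    return Observation{
        reinterpret_cast<std::uintptr_t>(pd),
        reinterpret_cast<std::uintptr_t>(pb),
        pd->self(),
        pb->self(),  // virtual dispatch restores the VDerived address
    };
}
```

The base pointer itself holds the offset address, yet the override always runs with this equal to the start of the VDerived object, which is exactly the adjustment attributed to the thunk.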

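IV. Verifying the conclusions

1 Does the compiler offset the pointer during a cast?

Assembly analysis

When a derived-class pointer is cast to a base class that has virtual functions (and a vtable), does the compiler really offset the address behind the scenes?

To check the guesses above, I disassembled the running program.

Before looking at the disassembly, here is a short primer on the assembly used below. Skip it if you are already familiar with it.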

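Note: I am working on macOS, where the toolchain prints x86 assembly in AT&T syntax, which differs from Intel syntax.

AT&T syntax reads left to right: the first operand is the source and the second is the destination. For example:

movl %esp, %ebp   // movl is the instruction name; % marks esp and ebp as registers. The first operand is the source, the second the destination.

Intel syntax reads right to left: the second operand is the source and the first is the destination.

mov ebp, esp   // Intel syntax, as in the Intel manuals: no % prefix, and the operand order is reversed.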

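The x86_64 register calling convention specifies that:

1. The first argument is normally passed in RDI/EDI, the second in RSI/ESI, the third in RDX, the fourth in RCX, the fifth in R8 and the sixth in R9;

2. Arguments beyond the sixth are passed on the stack and read from there inside the function;

3. The return value is normally placed in EAX or RAX.

Everything below was run on macOS, a Unix system. All assembly in this article is in AT&T syntax, and the first function argument is always passed in RDI.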
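For a member function, that first argument is the implicit this pointer, which is why the disassembly later in the article inspects RDI to find this. The following is a minimal sketch of the equivalence, assuming a hypothetical Counter type; counter_add is a hand-written free function that takes the object pointer as an explicit first parameter.

```cpp
struct Counter {
    int value;
    // The compiler passes `this` as a hidden first argument.
    int add(int n) const { return value + n; }
};

// Hand-written equivalent: the object pointer is an ordinary first parameter,
// so under the convention above it travels in RDI and n in ESI.
int counter_add(const Counter* self, int n) {
    return self->value + n;
}
```

Calling counter_add(&c, 2) and c.add(2) on the same Counter c gives the same result; only the spelling of the first argument differs.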

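Below is the full disassembly of the main function above, from entry to exit.

The disassembly shows that when the compiler performs a cast, it offsets the pointer to the real address of the base subobject whether or not the base class has virtual functions. This disproves the guess above: the compiler does not distinguish between the two cases during a cast and really does apply the offset.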
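The same offset is applied in reverse when casting back down, and a null pointer is left alone. The following is a minimal sketch, assuming a hypothetical two-base hierarchy (First, Second, Both) rather than the article's classes.

```cpp
struct First  { virtual void f() {} long a = 0; };
struct Second { virtual void g() {} long b = 0; };
struct Both : First, Second {
    void f() override {}
    void g() override {}
};

// Up-cast to the second base: the compiler adds the subobject offset.
Second* upcast(Both* p) { return static_cast<Second*>(p); }

// True when the converted pointer holds a different address.
bool upcast_moves_pointer(Both* p) {
    return static_cast<void*>(upcast(p)) != static_cast<void*>(p);
}

// Up-cast followed by a down-cast: the offset is subtracted again.
Both* round_trip(Both* p) { return static_cast<Both*>(upcast(p)); }
```

For a Both object obj, upcast_moves_pointer(&obj) is true, round_trip(&obj) returns &obj, and upcast(nullptr) stays null, so the offset is only applied to non-null pointers.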

Memory analysis

I later confirmed this conclusion with LLDB's memory read command (memory read ptr; the short form of memory read is x):

(lldb) memory read pDerived
0x103407f30: 40 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  @@..............
0x103407f40: 10 00 00 00 00 00 00 00 60 40 00 00 01 00 00 00  ........`@......
(lldb) memory read pvBaseB
0x103407f48: 60 40 00 00 01 00 00 00 00 00 00 00 00 00 00 00  `@..............
0x103407f58: de 2d 05 10 00 00 00 00 00 00 00 00 00 00 00 00  .-..............

The two pointers really do read from different addresses, pDerived from 0x103407f30 and pvBaseB from 0x103407f48. Both are the actual, offset addresses.
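The dump shows two different pointer-sized words at the start of the object and at the start of the VBaseB subobject; these are the two vtable pointers. The following is a minimal sketch that reads them from inside the program, assuming an ABI (such as Itanium on x86_64) that stores the vtable pointer in the first word of each polymorphic subobject. The first_word and read_vptrs helpers and the VptrPair struct are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct VBaseA { virtual void test() {} float x = 0; };
struct VBaseB { virtual void test() {} float x = 0; };
struct VDerived : VBaseA, VBaseB { void test() override {} };

// Copy the first pointer-sized word of an object. ASSUMPTION: under the
// Itanium ABI this word is the vtable pointer of a polymorphic subobject.
std::uintptr_t first_word(const void* p) {
    assert(p != nullptr);
    std::uintptr_t word = 0;
    std::memcpy(&word, p, sizeof word);
    return word;
}

struct VptrPair {
    std::uintptr_t primary;    // read at the start of the complete object
    std::uintptr_t secondary;  // read at the start of the VBaseB subobject
};

VptrPair read_vptrs() {
    VDerived d;
    return VptrPair{
        first_word(static_cast<const VBaseA*>(&d)),
        first_word(static_cast<const VBaseB*>(&d)),
    };
}
```

The two words differ because each base subobject gets its own vtable pointer, matching the separate values at 0x103407f30 and 0x103407f48 in the dump.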

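2 How do virtual calls keep the value of this consistent?

The real address in memory is the offset one, and the derived class overrides the base class's virtual function. How, then, does the compiler make sure that a call to the override through a base pointer and a call through the derived class itself both see the same this, so that the call is correct?

Searching online, I learned that when such a function is called, the compiler uses a thunk to adjust the this pointer so that it points to the correct address. So what is a thunk, and how does the compiler implement it?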

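Assembly analysis of the virtual call

The disassembly of pvBaseB->test() is easy to find in the main function above:

pvBaseB->test();
    0x100003c84 <+244>: movq -0x40(%rbp), %rax   // -0x40(%rbp) holds pvBaseB; load the address it points to into rax
    0x100003c88 <+248>: movq (%rax), %rcx         // load the first 8 bytes at that address (the vtable pointer) into rcx
    0x100003c8b <+251>: movq %rax, %rdi           // copy rax into rdi, the first-argument register; this is still the base-subobject address here
->  0x100003c8e <+254>: callq *(%rcx)             // indirect call through the address stored at (%rcx), the first vtable entry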

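Now step into the assembly of VDerived::test. Inspecting the first argument, that is, this, with the lldb command register read rdi shows that it already holds the derived-object address, not the base-subobject address from before the call.

testCPPVirtualMemeory`VDerived::test:
    0x100003e00 <+0>:  pushq %rbp                // save the caller's frame (base) pointer
    0x100003e01 <+1>:  movq %rsp, %rbp           // set rbp to rsp: the caller's stack top becomes this function's frame base
    0x100003e04 <+4>:  subq $0x10, %rsp          // reserve stack space for this frame
    0x100003e08 <+8>:  movq %rdi, -0x8(%rbp)     // store the first argument, i.e. this, on the stack
->  0x100003e0c <+12>: leaq 0x15c(%rip), %rdi    ; "test\n"
    0x100003e13 <+19>: movb $0x0, %al
    0x100003e15 <+21>: callq 0x100003efc         ; symbol stub for: printf
    0x100003e1a <+26>: addq $0x10, %rsp          // release the stack space
    0x100003e1e <+30>: popq %rbp                 // restore the caller's rbp
    0x100003e1f <+31>: retq                      // return to the caller

The assembly shows that the call into the vtable goes indirectly through *(%rcx), and that something happens in between before control reaches the implementation of test. What does the thunk do during this step?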

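LLVM thunk source analysis

My IDEs all use the LLVM compiler, so I looked through the LLVM source and found the answer in the AddMethods function in VTableBuilder.cpp, which is described as follows. (The function quoted here is VFTableBuilder::AddMethods, the builder for the Microsoft ABI; the Itanium-ABI builder used on macOS sits in the same file and follows the same idea.)

// Now go through all virtual member functions and add them to the current
// vftable. This is done by
//  - replacing overridden methods in their existing slots, as long as they
//    don't require return adjustment; calculating This adjustment if needed.
//  - adding new slots for methods of the current base not present in any
//    sub-bases;
//  - adding new slots for methods that require Return adjustment.
// We keep track of the methods visited in the sub-bases in MethodInfoMap.

At compile time the compiler checks whether the derived class overrides the base class's virtual function. If it does, the entry in the existing vtable slot is replaced with the derived class's function, and in addition:

1. it calculates whether the this pointer needs adjusting at the call and, if so, records that adjustment for the method;

2. it adds a new slot for any method that needs a return adjustment.

The code is as follows: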

void VFTableBuilder::AddMethods(BaseSubobject Base, unsigned BaseDepth,
                                const CXXRecordDecl *LastVBase,
                                BasesSetVectorTy &VisitedBases) {
  const CXXRecordDecl *RD = Base.getBase();
  if (!RD->isPolymorphic())
    return;

  const ASTRecordLayout &Layout = Context.getASTRecordLayout(RD);

  // See if this class expands a vftable of the base we look at, which is either
  // the one defined by the vfptr base path or the primary base of the current
  // class.
  const CXXRecordDecl *NextBase = nullptr, *NextLastVBase = LastVBase;
  CharUnits NextBaseOffset;
  if (BaseDepth < WhichVFPtr.PathToIntroducingObject.size()) {
    NextBase = WhichVFPtr.PathToIntroducingObject[BaseDepth];
    if (isDirectVBase(NextBase, RD)) {
      NextLastVBase = NextBase;
      NextBaseOffset = MostDerivedClassLayout.getVBaseClassOffset(NextBase);
    } else {
      NextBaseOffset =
          Base.getBaseOffset() + Layout.getBaseClassOffset(NextBase);
    }
  } else if (const CXXRecordDecl *PrimaryBase = Layout.getPrimaryBase()) {
    assert(!Layout.isPrimaryBaseVirtual() &&
           "No primary virtual bases in this ABI");
    NextBase = PrimaryBase;
    NextBaseOffset = Base.getBaseOffset();
  }

  if (NextBase) {
    AddMethods(BaseSubobject(NextBase, NextBaseOffset), BaseDepth + 1,
               NextLastVBase, VisitedBases);
    if (!VisitedBases.insert(NextBase))
      llvm_unreachable("Found a duplicate primary base!");
  }

  SmallVector<const CXXMethodDecl*, 10> VirtualMethods;
  // Put virtual methods in the proper order.
  GroupNewVirtualOverloads(RD, VirtualMethods);

  // Now go through all virtual member functions and add them to the current
  // vftable. This is done by
  //  - replacing overridden methods in their existing slots, as long as they
  //    don't require return adjustment; calculating This adjustment if needed.
  //  - adding new slots for methods of the current base not present in any
  //    sub-bases;
  //  - adding new slots for methods that require Return adjustment.
  // We keep track of the methods visited in the sub-bases in MethodInfoMap.
  for (const CXXMethodDecl *MD : VirtualMethods) {
    FinalOverriders::OverriderInfo FinalOverrider =
        Overriders.getOverrider(MD, Base.getBaseOffset());
    const CXXMethodDecl *FinalOverriderMD = FinalOverrider.Method;
    const CXXMethodDecl *OverriddenMD =
        FindNearestOverriddenMethod(MD, VisitedBases);

    ThisAdjustment ThisAdjustmentOffset;
    bool ReturnAdjustingThunk = false, ForceReturnAdjustmentMangling = false;
    CharUnits ThisOffset = ComputeThisOffset(FinalOverrider);
    ThisAdjustmentOffset.NonVirtual =
        (ThisOffset - WhichVFPtr.FullOffsetInMDC).getQuantity();
    if ((OverriddenMD || FinalOverriderMD != MD) &&
        WhichVFPtr.getVBaseWithVPtr())
      CalculateVtordispAdjustment(FinalOverrider, ThisOffset,
                                  ThisAdjustmentOffset);

    unsigned VBIndex =
        LastVBase ? VTables.getVBTableIndex(MostDerivedClass, LastVBase) : 0;

    if (OverriddenMD) {
      // If MD overrides anything in this vftable, we need to update the
      // entries.
      MethodInfoMapTy::iterator OverriddenMDIterator =
          MethodInfoMap.find(OverriddenMD);

      // If the overridden method went to a different vftable, skip it.
      if (OverriddenMDIterator == MethodInfoMap.end())
        continue;

      MethodInfo &OverriddenMethodInfo = OverriddenMDIterator->second;

      VBIndex = OverriddenMethodInfo.VBTableIndex;

      // Let's check if the overrider requires any return adjustments.
      // We must create a new slot if the MD's return type is not trivially
      // convertible to the OverriddenMD's one.
      // Once a chain of method overrides adds a return adjusting vftable slot,
      // all subsequent overrides will also use an extra method slot.
      ReturnAdjustingThunk = !ComputeReturnAdjustmentBaseOffset(
                                  Context, MD, OverriddenMD).isEmpty() ||
                             OverriddenMethodInfo.UsesExtraSlot;

      if (!ReturnAdjustingThunk) {
        // No return adjustment needed - just replace the overridden method info
        // with the current info.
        MethodInfo MI(VBIndex, OverriddenMethodInfo.VFTableIndex);
        MethodInfoMap.erase(OverriddenMDIterator);

        assert(!MethodInfoMap.count(MD) &&
               "Should not have method info for this method yet!");
        MethodInfoMap.insert(std::make_pair(MD, MI));
        continue;
      }

      // In case we need a return adjustment, we'll add a new slot for
      // the overrider. Mark the overridden method as shadowed by the new slot.
      OverriddenMethodInfo.Shadowed = true;

      // Force a special name mangling for a return-adjusting thunk
      // unless the method is the final overrider without this adjustment.
      ForceReturnAdjustmentMangling =
          !(MD == FinalOverriderMD && ThisAdjustmentOffset.isEmpty());
    } else if (Base.getBaseOffset() != WhichVFPtr.FullOffsetInMDC ||
               MD->size_overridden_methods()) {
      // Skip methods that don't belong to the vftable of the current class,
      // e.g. each method that wasn't seen in any of the visited sub-bases
      // but overrides multiple methods of other sub-bases.
      continue;
    }

    // If we got here, MD is a method not seen in any of the sub-bases or
    // it requires return adjustment. Insert the method info for this method.
    MethodInfo MI(VBIndex,
                  HasRTTIComponent ? Components.size() - 1 : Components.size(),
                  ReturnAdjustingThunk);

    assert(!MethodInfoMap.count(MD) &&
           "Should not have method info for this method yet!");
    MethodInfoMap.insert(std::make_pair(MD, MI));

    // Check if this overrider needs a return adjustment.
    // We don't want to do this for pure virtual member functions.
    BaseOffset ReturnAdjustmentOffset;
    ReturnAdjustment ReturnAdjustment;
    if (!FinalOverriderMD->isPure()) {
      ReturnAdjustmentOffset =
          ComputeReturnAdjustmentBaseOffset(Context, FinalOverriderMD, MD);
    }
    if (!ReturnAdjustmentOffset.isEmpty()) {
      ForceReturnAdjustmentMangling = true;
      ReturnAdjustment.NonVirtual =
          ReturnAdjustmentOffset.NonVirtualOffset.getQuantity();
      if (ReturnAdjustmentOffset.VirtualBase) {
        const ASTRecordLayout &DerivedLayout =
            Context.getASTRecordLayout(ReturnAdjustmentOffset.DerivedClass);
        ReturnAdjustment.Virtual.Microsoft.VBPtrOffset =
            DerivedLayout.getVBPtrOffset().getQuantity();
        ReturnAdjustment.Virtual.Microsoft.VBIndex =
            VTables.getVBTableIndex(ReturnAdjustmentOffset.DerivedClass,
                                    ReturnAdjustmentOffset.VirtualBase);
      }
    }

    AddMethod(FinalOverriderMD,
              ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment,
                        ForceReturnAdjustmentMangling ? MD : nullptr));
  }
}

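As the code above shows, whenever this needs adjusting, the information is recorded through the call AddMethod(FinalOverriderMD, ThunkInfo(ThisAdjustmentOffset, ReturnAdjustment, ForceReturnAdjustmentMangling ? MD : nullptr)), which attaches a ThunkInfo structure. ThunkInfo (declared in ABI.h) looks like this: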

struct ThunkInfo {
  /// The \c this pointer adjustment.
  ThisAdjustment This;

  /// The return adjustment.
  ReturnAdjustment Return;

  /// Holds a pointer to the overridden method this thunk is for,
  /// if needed by the ABI to distinguish different thunks with equal
  /// adjustments. Otherwise, null.
  /// CAUTION: In the unlikely event you need to sort ThunkInfos, consider using
  /// an ABI-specific comparator.
  const CXXMethodDecl *Method;

  ThunkInfo() : Method(nullptr) { }

  ThunkInfo(const ThisAdjustment &This, const ReturnAdjustment &Return,
            const CXXMethodDecl *Method = nullptr)
      : This(This), Return(Return), Method(Method) {}

  friend bool operator==(const ThunkInfo &LHS, const ThunkInfo &RHS) {
    return LHS.This == RHS.This && LHS.Return == RHS.Return &&
           LHS.Method == RHS.Method;
  }

  bool isEmpty() const {
    return This.isEmpty() && Return.isEmpty() && Method == nullptr;
  }
};

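ThunkInfo has a Method field which, as its comment says, points to the overridden method the thunk is for when the ABI needs it to tell apart thunks with equal adjustments. This and Return record how this and the return value must be adjusted. When the method is emitted, the compiler uses this information to generate the thunk automatically. The function ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD, const ThunkInfo &Thunk, raw_ostream &Out), which writes the thunk's symbol name, confirms this:

(mangle and demangle: converting an original C++ source identifier into a C++ ABI identifier is called mangling; the reverse process is called demangling. Source: wiki)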

void ItaniumMangleContextImpl::mangleThunk(const CXXMethodDecl *MD,
                                           const ThunkInfo &Thunk,
                                           raw_ostream &Out) {
  //  <special-name> ::= T <call-offset> <base encoding>
  //                      # base is the nominal target function of thunk
  //  <special-name> ::= Tc <call-offset> <call-offset> <base encoding>
  //                      # base is the nominal target function of thunk
  //                      # first call-offset is 'this' adjustment
  //                      # second call-offset is result adjustment

  assert(!isa<CXXDestructorDecl>(MD) &&
         "Use mangleCXXDtor for destructor decls!");
  CXXNameMangler Mangler(*this, Out);
  Mangler.getStream() << "_ZT";
  if (!Thunk.Return.isEmpty())
    Mangler.getStream() << 'c';

  // Mangle the 'this' pointer adjustment.
  Mangler.mangleCallOffset(Thunk.This.NonVirtual,
                           Thunk.This.Virtual.Itanium.VCallOffsetOffset);

  // Mangle the return pointer adjustment if there is one.
  if (!Thunk.Return.isEmpty())
    Mangler.mangleCallOffset(Thunk.Return.NonVirtual,
                             Thunk.Return.Virtual.Itanium.VBaseOffsetOffset);

  Mangler.mangleFunctionEncoding(MD);
}
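The name this function emits for a this-adjusting thunk can be reproduced with plain string handling. The following is a minimal sketch of the Itanium grammar quoted in the comment above, not Clang's implementation: mangle_number, mangle_call_offset and mangle_this_thunk are hypothetical helpers, return-adjusting (Tc) thunks are left out, and the base encoding N8VDerived4testEv (for VDerived::test()) is passed in ready-made.

```cpp
#include <cstdint>
#include <string>

// Itanium <number>: decimal digits, with a leading 'n' for negative values.
std::string mangle_number(std::int64_t n) {
    if (n < 0) {
        // Negate in unsigned arithmetic so the most negative value is safe.
        return "n" + std::to_string(std::uint64_t{0} - static_cast<std::uint64_t>(n));
    }
    return std::to_string(n);
}

// <call-offset> ::= h <nv-offset> _            (non-virtual adjustment only)
//               ::= v <nv-offset> _ <v-offset> _   (adjustment via a vcall offset)
std::string mangle_call_offset(std::int64_t non_virtual, std::int64_t virt) {
    if (virt == 0) {
        return "h" + mangle_number(non_virtual) + "_";
    }
    return "v" + mangle_number(non_virtual) + "_" + mangle_number(virt) + "_";
}

// <special-name> ::= T <call-offset> <base encoding>, prefixed with _Z.
std::string mangle_this_thunk(std::int64_t non_virtual, std::int64_t virt,
                              const std::string& base_encoding) {
    return "_ZT" + mangle_call_offset(non_virtual, virt) + base_encoding;
}
```

With the article's layout, where the VBaseB subobject sits 24 bytes into VDerived, mangle_this_thunk(-24, 0, "N8VDerived4testEv") yields _ZThn24_N8VDerived4testEv. The h marks a non-virtual adjustment and n24 encodes minus 24.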

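Assembly analysis of the thunk

The LLVM source has shown us what a thunk really is. Let us now confirm it by disassembling the program. Either objdump or the reverse-engineering tool Hopper will do; I used Hopper. The disassembly is as follows:

1. First, look at the compiler-generated thunk version of test

[Figure: the test function implemented by the derived class]

[Figure: the compiler-generated thunk version of test]

2. The two screenshots above show that

the compiler-generated thunk for test is at address 0x100003e30

the test function implemented by the derived class is at address 0x100003e00

Next, let us see which of the two addresses is actually stored in the derived class's vtable

[Figure: vtable contents] The address stored in the derived class's vtable is 0x100003e30, the thunk added by the compiler.

So the indirect call through *(%rcx) analysed above calls the thunk, and the thunk in turn calls the real overriding function in the derived class.

At this point we can pin down what a thunk is:

When the compiler meets a call where this (or a returned pointer) needs adjusting, it generates a matching thunk version of the function at compile time. The thunk adjusts this by the required offset and then calls the virtual function implemented by the derived class. The compiler stores the address of the thunk in the vtable, in place of the address of the derived class's function.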
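The mechanism can be imitated by hand with plain structs and function pointers. The following is a sketch under stated assumptions, not what Clang emits: the Slot type, the FirstBase, SecondBase and Derived structs, and all function names are hypothetical, and the "vtables" are ordinary arrays. The pointer arithmetic uses the familiar container-of idiom.

```cpp
#include <cstddef>

// One hand-rolled "virtual function" slot: it receives the object pointer and
// returns the `this` value that the final implementation ended up with.
using Slot = const void* (*)(void* self);

struct FirstBase  { const Slot* vptr; float x; };
struct SecondBase { const Slot* vptr; float x; };
struct Derived    { FirstBase first; SecondBase second; float x; };

// The "real" override: written against Derived, so it expects a Derived*.
const void* derived_test(void* self) {
    return static_cast<Derived*>(self);
}

// The thunk: entered with a SecondBase*, it steps back to the start of the
// enclosing Derived and then forwards to the real override.
const void* thunk_to_derived_test(void* self) {
    char* adjusted = static_cast<char*>(self) - offsetof(Derived, second);
    return derived_test(adjusted);
}

// The first base shares the object's start, so its table holds the real
// function; the second base's table holds the thunk instead.
const Slot kFirstVtable[]  = { derived_test };
const Slot kSecondVtable[] = { thunk_to_derived_test };

Derived make_derived() {
    Derived d{};
    d.first.vptr = kFirstVtable;
    d.second.vptr = kSecondVtable;
    return d;
}

// "Virtual calls": load the table, take slot 0, pass the pointer we hold.
const void* call_test(FirstBase* b)  { return b->vptr[0](b); }
const void* call_test(SecondBase* b) { return b->vptr[0](b); }
```

Calling call_test(&d.second) passes the offset address, just as the callq *(%rcx) above does, and the implementation still receives the address of d because the table entry is the thunk rather than derived_test.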

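Memory layout with the thunk

The corresponding memory layout can also be determined:

[Figure: memory layout]

So for a pointer to a base class with virtual functions that is not first in the inheritance list, the call sequence is:

[Figure: call sequence]

virtual thunk and non-virtual thunk

Note: the layout shows two copies of VBase in memory. Multiple inheritance comes in three flavours: plain inheritance, inheritance of virtual functions, and virtual inheritance. Virtual inheritance exists mainly to solve the problem seen above, two copies of VBase in memory at once. Changing the code above as follows ensures that only one instance exists in memory:

class VBaseA: public VBase becomes class VBaseA: public virtual VBase

class VBaseB: public VBase becomes class VBaseB: public virtual VBase

Now there is only one copy of VBase in memory.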
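The difference between one and two VBase subobjects can be checked by comparing the VBase pointers reached along each inheritance path. The following is a minimal sketch with hypothetical class names (PlainA, PlainB, PlainD for ordinary inheritance; SharedA, SharedB, SharedD for virtual inheritance).

```cpp
struct VBase { virtual void test() {} float x = 0; };

// Ordinary inheritance: each path carries its own VBase subobject.
struct PlainA : VBase {};
struct PlainB : VBase {};
struct PlainD : PlainA, PlainB {};

// Virtual inheritance: both paths share a single VBase subobject.
struct SharedA : virtual VBase {};
struct SharedB : virtual VBase {};
struct SharedD : SharedA, SharedB {};

// Reach VBase through each intermediate base and compare the addresses.
bool plain_paths_share_vbase() {
    PlainD d;
    VBase* via_a = static_cast<PlainA*>(&d);
    VBase* via_b = static_cast<PlainB*>(&d);
    return via_a == via_b;
}

bool virtual_paths_share_vbase() {
    SharedD d;
    VBase* via_a = static_cast<SharedA*>(&d);
    VBase* via_b = static_cast<SharedB*>(&d);
    return via_a == via_b;
}
```

plain_paths_share_vbase() returns false because the two paths lead to distinct subobjects, while virtual_paths_share_vbase() returns true.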

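One question is still open. The type of the thunk in the screenshot above is:

[Figure: thunk type shown in the disassembler]

That thunk is a non-virtual thunk. What, then, is a virtual thunk?

Before answering, look at the following example:

class A {
public:
    virtual void test() {}
};

class B {
public:
    virtual void test1() {}
};

class C {
public:
    virtual void test2() {}
};

class D : public virtual A, public virtual B, public C {
public:
    virtual void test1() {
        // The test1 implemented here appears in B's vtable as a virtual thunk
    }
    virtual void test2() {
        // The test2 implemented here appears in C's vtable as a non-virtual thunk
        // (provided C is not laid out as D's primary base)
    }
};

When inheritance of virtual functions is combined with virtual inheritance, and the base class is not the first base in the derived class's inheritance list, the vtable entry the compiler stores for a virtual function implemented by the derived class is a virtual thunk.

With inheritance of virtual functions alone, and the base class not the first base in the derived class's inheritance list, the vtable entry the compiler stores for a virtual function implemented by the derived class is a non-virtual thunk.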
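The two kinds of thunk can be told apart by demangling their symbol names. The following is a minimal sketch, assuming a GCC or Clang toolchain whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>; the demangle wrapper is a hypothetical helper, and the two mangled names are hand-written examples that follow the Itanium thunk grammar (h for a non-virtual adjustment, v for one that goes through a vcall offset).

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name; returns the input unchanged on failure.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

demangle("_ZThn24_N8VDerived4testEv") describes a non-virtual thunk to VDerived::test(), and demangle("_ZTv0_n24_N1D5test1Ev") describes a virtual thunk to D::test1(). These are the same descriptions that objdump -C or a disassembler shows next to such symbols.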

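3 Why does LLDB show the same address?

If the pointer was offset, why does LLDB's expression command show the start address of the derived object?

We now know what a thunk is, but one question remains. In LLDB, the address shown for the base pointer is the derived object's address, even though the disassembly showed that the compiler really offsets the pointer during the cast, and reading memory confirmed the real, offset address. Why does lldb expression still report the derived object's address? My guess was that LLDB performs a type conversion when it runs the expression command.

Reading the LLDB source and documentation, I learned that whenever LLDB obtains an address and needs to present it to the user in a friendly way, it first formats it through summary format, and the formatting relies on obtaining the dynamic type (lldb-getdynamictypeandaddress). I found the answer in the function bool ItaniumABILanguageRuntime::GetDynamicTypeAndAddress (lldb-summary-format) in the LLDB source. The code is as follows: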

// For Itanium, if the type has a vtable pointer in the object, it will be at
// offset 0 in the object. That will point to the "address point" within the
// vtable (not the beginning of the vtable.) We can then look up the symbol
// containing this "address point" and that symbol's name demangled will
// contain the full class name. The second pointer above the "address point"
// is the "offset_to_top". We'll use that to get the start of the value object
// which holds the dynamic type.

bool ItaniumABILanguageRuntime::GetDynamicTypeAndAddress(
    ValueObject &in_value, lldb::DynamicValueType use_dynamic,
    TypeAndOrName &class_type_or_name, Address &dynamic_address,
    Value::ValueType &value_type) {
  // For Itanium, if the type has a vtable pointer in the object, it will be at
  // offset 0 in the object. That will point to the "address point" within the
  // vtable (not the beginning of the vtable.) We can then look up the symbol
  // containing this "address point" and that symbol's name demangled will
  // contain the full class name. The second pointer above the "address point"
  // is the "offset_to_top". We'll use that to get the start of the value
  // object which holds the dynamic type.
  //

  class_type_or_name.Clear();
  value_type = Value::ValueType::eValueTypeScalar;

  // Only a pointer or reference type can have a different dynamic and static
  // type:
  if (CouldHaveDynamicValue(in_value)) {
    // First job, pull out the address at 0 offset from the object.
    AddressType address_type;
    lldb::addr_t original_ptr = in_value.GetPointerValue(&address_type);
    if (original_ptr == LLDB_INVALID_ADDRESS)
      return false;

    ExecutionContext exe_ctx(in_value.GetExecutionContextRef());

    Process *process = exe_ctx.GetProcessPtr();

    if (process == nullptr)
      return false;

    Status error;
    const lldb::addr_t vtable_address_point =
        process->ReadPointerFromMemory(original_ptr, error);

    if (!error.Success() || vtable_address_point == LLDB_INVALID_ADDRESS) {
      return false;
    }

    class_type_or_name = GetTypeInfoFromVTableAddress(in_value, original_ptr,
                                                      vtable_address_point);

    if (class_type_or_name) {
      TypeSP type_sp = class_type_or_name.GetTypeSP();
      // There can only be one type with a given name,
      // so we've just found duplicate definitions, and this
      // one will do as well as any other.
      // We don't consider something to have a dynamic type if
      // it is the same as the static type. So compare against
      // the value we were handed.
      if (type_sp) {
        if (ClangASTContext::AreTypesSame(in_value.GetCompilerType(),
                                          type_sp->GetForwardCompilerType())) {
          // The dynamic type we found was the same type,
          // so we don't have a dynamic type here...
          return false;
        }

        // The offset_to_top is two pointers above the vtable pointer.
        const uint32_t addr_byte_size = process->GetAddressByteSize();
        const lldb::addr_t offset_to_top_location =
            vtable_address_point - 2 * addr_byte_size;
        // Watch for underflow, offset_to_top_location should be less than
        // vtable_address_point
        if (offset_to_top_location >= vtable_address_point)
          return false;
        const int64_t offset_to_top = process->ReadSignedIntegerFromMemory(
            offset_to_top_location, addr_byte_size, INT64_MIN, error);

        if (offset_to_top == INT64_MIN)
          return false;
        // So the dynamic type is a value that starts at offset_to_top
        // above the original address.
        lldb::addr_t dynamic_addr = original_ptr + offset_to_top;
        if (!process->GetTarget().GetSectionLoadList().ResolveLoadAddress(
                dynamic_addr, dynamic_address)) {
          dynamic_address.SetRawAddress(dynamic_addr);
        }
        return true;
      }
    }
  }

  return class_type_or_name.IsEmpty() == false;
}

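The code above shows that each time the LLDB expression command evaluates a pointer, LLDB formats the result with the debugger's default formatters, and that this formatting first obtains the dynamic type and the adjusted address. When the C++ object has a vtable and the pointer is not to the first base subobject, LLDB uses the offset_to_top stored above the vtable's address point to determine the dynamic type and returns the start address of the object of that type.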
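The address computation LLDB performs can be imitated inside the program itself. The following is a sketch under stated assumptions: it relies on the Itanium C++ ABI vtable layout (vtable pointer at offset 0 of the subobject, offset_to_top two pointers below the address point) and on a 64-bit target, so it is not portable; dynamic_address_itanium, dynamic_address_portable, locate and the Located struct are hypothetical names. The portable dynamic_cast<void*> is included for comparison because the language defines it to return the most-derived object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct VBaseA { virtual void test() {} float x = 0; };
struct VBaseB { virtual void test() {} float x = 0; };
struct VDerived : VBaseA, VBaseB { void test() override {} };

// Portable: dynamic_cast<void*> yields a pointer to the most-derived object.
std::uintptr_t dynamic_address_portable(VBaseB* p) {
    return reinterpret_cast<std::uintptr_t>(dynamic_cast<void*>(p));
}

// ASSUMPTION: Itanium C++ ABI. Mirrors the steps in the LLDB function above.
std::uintptr_t dynamic_address_itanium(const VBaseB* p) {
    assert(p != nullptr);
    // 1. The vtable pointer ("address point") is the first word of the object.
    std::uintptr_t address_point = 0;
    std::memcpy(&address_point, p, sizeof address_point);
    // 2. offset_to_top sits two pointers below the address point.
    std::ptrdiff_t offset_to_top = 0;
    std::memcpy(&offset_to_top,
                reinterpret_cast<const char*>(address_point) - 2 * sizeof(void*),
                sizeof offset_to_top);
    // 3. Adding it (it is negative for a non-first base) gives the object start.
    return reinterpret_cast<std::uintptr_t>(p) +
           static_cast<std::uintptr_t>(offset_to_top);
}

struct Located {
    std::uintptr_t object;             // start of the VDerived object
    std::uintptr_t base_b;             // offset VBaseB* value
    std::uintptr_t via_dynamic_cast;   // portable result
    std::uintptr_t via_offset_to_top;  // Itanium-specific result
};

Located locate() {
    VDerived d;
    VBaseB* pb = &d;
    return Located{reinterpret_cast<std::uintptr_t>(&d),
                   reinterpret_cast<std::uintptr_t>(pb),
                   dynamic_address_portable(pb),
                   dynamic_address_itanium(pb)};
}
```

Both routes lead from the offset VBaseB pointer back to the start of the VDerived object, which is the address LLDB reports even though the pointer variable itself holds the offset value.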

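V. Summary

The analysis above verified that the compiler performs a real address offset when a pointer type is converted.

It also showed that, at the call, the compiler uses thunks to adjust the incoming this pointer (and the returned pointer where needed), which keeps this correct in C++ calls.

When LLDB's expression command is used to read a pointer to a base class with virtual functions that is not the first base, LLDB formats the result through summary format, and the formatting looks up the dynamic type.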

VI. Tools

1 Getting the assembly

Preprocess -> assembly

clang++ -E main.cpp -o main.i

clang++ -S main.i

objdump

objdump -S -C <executable>

Disassembler: Hopper

Download Hopper and drag the executable into it.

Xcode

Xcode->Debug->Debug WorkFlow->Show disassembly

2 Dumping the C++ memory layout

Clang++ compiler

clang++ -cc1 -emit-llvm -fdump-record-layouts -fdump-vtable-layouts main.cpp

VII. References

https://matklad.github.io/2017/10/21/lldb-dynamic-type.html

https://lldb.llvm.org/use/variable.html

https://github.com/llvm-mirror/lldb/blob/bc19e289f759c26e4840aab450443d4a85071139/source/Plugins/LanguageRuntime/CPlusPlus/ItaniumABI/ItaniumABILanguageRuntime.cpp#L185

https://clang.llvm.org/doxygen/VTableBuilder_8cpp_source.html#l03109

https://clang.llvm.org/doxygen/ABI_8h_source.html

Related technologies:

llvm-virtual-thunk

llvm-no-virtual-thunk

lldb-summary-format

lldb-getdynamictypeandaddress


揚阜
