Delphi 10.4 Managed Records: An Analysis of the Generated Machine Code

Only the code Delphi generates for the Windows x86 and x64 platforms is analyzed here. The ARM and Linux targets use LLVM-based compilers and were not examined.

Delphi 10.4

type
  TMyRecord = record
    Value: Integer;
    class operator Initialize(out Dest: TMyRecord);
    class operator Finalize(var Dest: TMyRecord);
    class operator Assign(var Dest: TMyRecord; const [ref] Src: TMyRecord);
  end;

procedure Test;
begin
  var a, b: TMyRecord;  // inline variable declaration (Delphi 10.3+)
  a := b;
end;

VC2015

#include <windows.h>  // GetTickCount, Sleep

struct TMyRecord {
	TMyRecord() {
		GetTickCount();
		Sleep(1);
	}
	~TMyRecord() {
		GetTickCount();
		Sleep(3);
	}

	TMyRecord& operator = (const TMyRecord& v)
	{
		this->value = v.value;
		GetTickCount();
		Sleep(9);
		return *this;
	}
private:
	int   value;
};

void Test() {
	TMyRecord a, b;
	a = b;
}

The VC version calls GetTickCount and Sleep because the VC compiler optimizes very aggressively. In a Release build it removes functions whose bodies are empty.
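A portable variant of the same test may also help. It is my own assumption and not part of the original experiment. The struct records its calls in a global log instead of calling the Win32 GetTickCount and Sleep. Appending to the log is an observable side effect, so an optimizer cannot discard the special member functions. The log also makes the C++ call order explicit: constructors in declaration order, then the assignment, then destructors in reverse order. The names g_log and run_scenario are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global trace log: a portable stand-in for GetTickCount()/Sleep().
// Writing to it is observable, so the calls cannot be optimized away as empty.
std::vector<std::string> g_log;

struct TMyRecord {
    TMyRecord() { g_log.push_back("ctor"); }
    ~TMyRecord() { g_log.push_back("dtor"); }
    TMyRecord& operator=(const TMyRecord& v) {
        value = v.value;
        g_log.push_back("assign");
        return *this;
    }
private:
    int value = 0;  // initialized so that copying b into a reads a defined value
};

// Runs the article's scenario: declare a and b, then a = b.
std::vector<std::string> run_scenario() {
    g_log.clear();
    {
        TMyRecord a, b;
        a = b;
    }  // b is destroyed first, then a (reverse order of construction)
    return g_log;
}
```

Calling run_scenario() yields the sequence ctor, ctor, assign, dtor, dtor. This is the same order of calls that appears in the VC listings below.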

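First, the x86 Delphi code. Whether code optimization is enabled makes no difference to the generated code, so Debug and Release builds are identical for this part.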

Unit2.pas.54: var a, b : TMyRecord;
0060DE3A 8D45FC lea eax,[ebp-$04]
0060DE3D E8E6FFFFFF call TMyRecord.&op_Initialize
0060DE42 33C0 xor eax,eax
0060DE44 55 push ebp
0060DE45 68A4DE6000 push $0060dea4
0060DE4A 64FF30 push dword ptr fs:[eax]
0060DE4D 648920 mov fs:[eax],esp
0060DE50 8D45F8 lea eax,[ebp-$08]
0060DE53 E8D0FFFFFF call TMyRecord.&op_Initialize
Unit2.pas.55: a := b;
0060DE58 33C0 xor eax,eax
0060DE5A 55 push ebp
0060DE5B 6887DE6000 push $0060de87
0060DE60 64FF30 push dword ptr fs:[eax]
0060DE63 648920 mov fs:[eax],esp
0060DE66 8D45FC lea eax,[ebp-$04]
0060DE69 8D55F8 lea edx,[ebp-$08]
0060DE6C E8BFFFFFFF call TMyRecord.&op_Assign
0060DE71 33C0 xor eax,eax
0060DE73 5A pop edx
0060DE74 59 pop ecx
0060DE75 59 pop ecx
0060DE76 648910 mov fs:[eax],edx
0060DE79 688EDE6000 push $0060de8e
Unit2.pas.56: end;
0060DE7E 8D45F8 lea eax,[ebp-$08]
0060DE81 E8A6FFFFFF call TMyRecord.&op_Finalize
0060DE86 C3 ret
0060DE87 E9B8B5DFFF jmp @HandleFinally
0060DE8C EBF0 jmp $0060de7e
0060DE8E 33C0 xor eax,eax
0060DE90 5A pop edx
0060DE91 59 pop ecx
0060DE92 59 pop ecx
0060DE93 648910 mov fs:[eax],edx
0060DE96 68ABDE6000 push $0060deab
0060DE9B 8D45FC lea eax,[ebp-$04]
0060DE9E E889FFFFFF call TMyRecord.&op_Finalize
0060DEA3 C3 ret
0060DEA4 E99BB5DFFF jmp @HandleFinally
0060DEA9 EBF0 jmp $0060de9b

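Extremely verbose. It is essentially a stack of nested try..finally blocks.

Each push ebp / push handler / push dword ptr fs:[eax] / mov fs:[eax],esp sequence installs a Win32 SEH frame at run time. One frame wraps the initialization of b, and another wraps a := b. On the normal path, each finally body (the op_Finalize call) is entered through a push-address/ret pair.

A thoroughly lazy way to generate code.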
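The control flow of the x86 listing above can be mirrored in C++ to make its structure easier to read. C++ has no finally, so a small hypothetical scope-guard class, Finally, stands in for Delphi's try..finally. The functions op_Initialize, op_Assign and op_Finalize are hypothetical trace functions standing in for TMyRecord.&op_Initialize and its siblings.

Reading the SEH frame setup, the frame for a is installed right after a is initialized and covers the initialization of b. A second frame covers only the assignment. This is a sketch of the shape of the code, not of what the Delphi compiler literally does internally.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Trace of the calls, in the order they happen.
std::vector<std::string> g_trace;

// Hypothetical stand-ins for TMyRecord.&op_Initialize / &op_Assign / &op_Finalize.
void op_Initialize(const std::string& v) { g_trace.push_back("init " + v); }
void op_Assign(const std::string& dst, const std::string& src) {
    g_trace.push_back("assign " + dst + " := " + src);
}
void op_Finalize(const std::string& v) { g_trace.push_back("final " + v); }

// Hypothetical scope guard: runs its callback when it leaves scope, normally
// or during exception unwinding, like the finally part of try..finally.
class Finally {
public:
    explicit Finally(std::function<void()> f) : f_(std::move(f)) {}
    ~Finally() { f_(); }
    Finally(const Finally&) = delete;
    Finally& operator=(const Finally&) = delete;
private:
    std::function<void()> f_;
};

// Mirrors the x86 listing for: var a, b: TMyRecord; a := b;
std::vector<std::string> delphi_x86_shape() {
    g_trace.clear();
    {
        op_Initialize("a");
        Finally finallyA([] { op_Finalize("a"); });      // outer frame (handler $0060dea4)
        op_Initialize("b");
        {
            Finally finallyB([] { op_Finalize("b"); });  // inner frame (handler $0060de87)
            op_Assign("a", "b");
        }
    }
    return g_trace;
}
```

In C++ compiled for Win64 (or with table-based exception handling elsewhere), each Finally costs nothing on the normal path. In the Delphi x86 code, by contrast, each of these blocks corresponds to an SEH frame pushed onto and popped from fs:[0] at run time, even when no exception occurs. That is where the extra instructions come from.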

Next, the VC x86 Debug code:

TMyRecord a, b;
00171860 lea ecx,[ebp-18h]
00171863 call TMyRecord::TMyRecord (0171005h)
00171868 mov dword ptr [ebp-4],0
0017186F lea ecx,[ebp-24h]
00171872 call TMyRecord::TMyRecord (0171005h)
00171877 mov byte ptr [ebp-4],1
a = b;
0017187B lea eax,[ebp-24h]
0017187E push eax
0017187F lea ecx,[ebp-18h]
00171882 call TMyRecord::operator= (0171055h)
}
00171887 mov byte ptr [ebp-4],0
0017188B lea ecx,[ebp-24h]
0017188E call TMyRecord::~TMyRecord (0171154h)
00171893 mov dword ptr [ebp-4],0FFFFFFFFh
0017189A lea ecx,[ebp-18h]
0017189D call TMyRecord::~TMyRecord (0171154h)

Reasonably clean and very direct.

Now the code generated by the VC x86 Release build:

TMyRecord a, b;
00A11002 mov edi,dword ptr [__imp__GetTickCount@0 (0A12000h)]
00A11008 call edi
00A1100A mov esi,dword ptr [__imp__Sleep@4 (0A12004h)]
00A11010 push 1
00A11012 call esi
00A11014 call edi
00A11016 push 1
00A11018 call esi
a = b;
00A1101A call edi
a = b;
00A1101C push 9
00A1101E call esi
}
00A11020 call edi
00A11022 push 3
00A11024 call esi
00A11026 call edi
00A11028 push 3
00A1102A call esi
00A1102C pop edi

Notice how compact this is. Every call is expanded inline at the point of use, so it also runs very efficiently.

Now the x64 versions. Again, Delphi generates the same code here whether optimization is enabled or not.

First, Delphi's x64 code:

Unit2.pas.54: var a, b : TMyRecord;
000000000070A490 488D4D38 lea rcx,[rbp+$38]
000000000070A494 E8A7FFFFFF call TMyRecord.&op_Initialize
000000000070A499 90 nop
000000000070A49A 488D4D3C lea rcx,[rbp+$3c]
000000000070A49E E89DFFFFFF call TMyRecord.&op_Initialize
Unit2.pas.55: a := b;
000000000070A4A3 90 nop
000000000070A4A4 488D4D38 lea rcx,[rbp+$38]
000000000070A4A8 488D553C lea rdx,[rbp+$3c]
000000000070A4AC E8BFFFFFFF call TMyRecord.&op_Assign
Unit2.pas.56: end;
000000000070A4B1 90 nop
000000000070A4B2 488D4D3C lea rcx,[rbp+$3c]
000000000070A4B6 E895FFFFFF call TMyRecord.&op_Finalize
000000000070A4BB 90 nop
000000000070A4BC 488D4D38 lea rcx,[rbp+$38]
000000000070A4C0 E88BFFFFFF call TMyRecord.&op_Finalize

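A pleasant surprise: the code quality is much better, both compact and clean.

Win64 uses table-based exception handling, so try..finally regions are described in unwind tables instead of being set up by instructions at run time. The nop instructions appear to mark the boundaries of those regions.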

Next, the code generated by the VC x64 Debug build:

TMyRecord a, b;
00007FF665A5185A lea rcx,[rbp+4]
00007FF665A5185E call TMyRecord::TMyRecord (07FF665A511B8h)
00007FF665A51863 nop
00007FF665A51864 lea rcx,[rbp+24h]
00007FF665A51868 call TMyRecord::TMyRecord (07FF665A511B8h)
00007FF665A5186D nop
a = b;
00007FF665A5186E lea rdx,[rbp+24h]
00007FF665A51872 lea rcx,[rbp+4]
00007FF665A51876 call TMyRecord::operator= (07FF665A511B3h)
00007FF665A5187B nop
}
00007FF665A5187C lea rcx,[rbp+24h]
00007FF665A51880 call TMyRecord::~TMyRecord (07FF665A5128Ah)
00007FF665A51885 nop
00007FF665A51886 lea rcx,[rbp+4]
00007FF665A5188A call TMyRecord::~TMyRecord (07FF665A5128Ah)

This is practically identical to Delphi's code.

Now the VC x64 Release code:

TMyRecord a, b;
00007FF7BFF11004 call qword ptr [__imp_GetTickCount (07FF7BFF12000h)]
00007FF7BFF1100A mov ecx,1
00007FF7BFF1100F call qword ptr [__imp_Sleep (07FF7BFF12008h)]
00007FF7BFF11015 call qword ptr [__imp_GetTickCount (07FF7BFF12000h)]
00007FF7BFF1101B mov ecx,1
00007FF7BFF11020 call qword ptr [__imp_Sleep (07FF7BFF12008h)]
a = b;
00007FF7BFF11026 call qword ptr [__imp_GetTickCount (07FF7BFF12000h)]
00007FF7BFF1102C mov ecx,9
00007FF7BFF11031 call qword ptr [__imp_Sleep (07FF7BFF12008h)]
}
00007FF7BFF11037 call qword ptr [__imp_GetTickCount (07FF7BFF12000h)]
00007FF7BFF1103D mov ecx,3
00007FF7BFF11042 call qword ptr [__imp_Sleep (07FF7BFF12008h)]
00007FF7BFF11048 call qword ptr [__imp_GetTickCount (07FF7BFF12000h)]
00007FF7BFF1104E mov ecx,3
00007FF7BFF11053 call qword ptr [__imp_Sleep (07FF7BFF12008h)]

Again, everything is expanded inline, and the resulting code is both small and fast.

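Conclusion:

For Delphi managed records, the generated code is the same with or without optimization. On x86 it is extremely verbose. It is best viewed as try..finally syntactic sugar rather than code the compiler has actually optimized.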

The one consolation is that on x64 the generated code deserves praise for both size and efficiency.

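Presumably Embarcadero no longer puts much effort into x86. After all, even Microsoft has signaled that it will drop 32-bit-only x86 Windows.

So on x86, managed-record code is basically try..finally sugar: it works, but efficiency is not guaranteed. The x64 code is reasonably well optimized, but it only matches VC's unoptimized code, or VC code compiled with function inlining disabled.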

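The VC compiler deserves credit here: its optimization is extremely thorough. It removes empty functions and inlines function calls very well, producing code that is both small and fast.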

This entry was posted in the Delphi category.

4 responses to "Delphi 10.4 Managed Records: An Analysis of the Generated Machine Code"

  1. abcd123 says:

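    I think this may well be caused by inline var declarations not being optimized properly. In the current 10.4 release, any time a routine returns a managed type, the compiler adds a try. It is just like the function test: string example I gave in my reply to your previous post.

    It feels as if the compiler writers took the easy way out and added one try for every single case. Clearly a single try per block could handle all the cleanup at that level. This has never been solved properly, and it has always puzzled me.

    Is it really that hard to generate code the way it was done for variables declared in a var section before begin..end? It seems to me the old code-generation logic could simply have been reused.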

  2. abcd123 says:

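    No, the x64 assembly is just as bad. Every managed-type return value is likewise treated as an inline variable and wrapped in try..finally. Here is x64 code using string as the managed type: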

    The original code is:
    function test: string;
    begin
      Result := '123';
    end;

    Test 1:
    procedure TForm2.FormCreate(Sender: TObject);
    begin
      test;
      test;
      test;
    end;

    The generated x64 code looks like this (note the ———-> annotations I added):

    Unit2.pas.32: begin
    000000000070A410 55 push rbp
    000000000070A411 4883EC50 sub rsp,$50
    000000000070A415 488BEC mov rbp,rsp
    000000000070A418 48896D28 mov [rbp+$28],rbp
    000000000070A41C 48894D60 mov [rbp+$60],rcx
    Unit2.pas.33: test;
    000000000070A420 48C7453800000000 mov qword ptr [rbp+$38],$0000000000000000
    Unit2.pas.34: test;
    000000000070A428 90 nop
    000000000070A429 48C7454000000000 mov qword ptr [rbp+$40],$0000000000000000
    Unit2.pas.35: test;
    000000000070A431 90 nop
    000000000070A432 48C7454800000000 mov qword ptr [rbp+$48],$0000000000000000
    Unit2.pas.33: test;
    000000000070A43A 90 nop
    000000000070A43B 488D4D38 lea rcx,[rbp+$38]
    000000000070A43F E88CFFFFFF call test
    Unit2.pas.34: test;
    000000000070A444 488D4D40 lea rcx,[rbp+$40]
    000000000070A448 E883FFFFFF call test
    Unit2.pas.35: test;
    000000000070A44D 488D4D48 lea rcx,[rbp+$48]
    000000000070A451 E87AFFFFFF call test
    Unit2.pas.36: end;
    000000000070A456 90 nop ———-> finally
    000000000070A457 488D4D48 lea rcx,[rbp+$48]
    000000000070A45B E8B054D0FF call @UStrClr
    000000000070A460 90 nop ———-> finally
    000000000070A461 488D4D40 lea rcx,[rbp+$40]
    000000000070A465 E8A654D0FF call @UStrClr
    000000000070A46A 90 nop ———-> finally
    000000000070A46B 488D4D38 lea rcx,[rbp+$38]
    000000000070A46F E89C54D0FF call @UStrClr
    000000000070A474 488D6550 lea rsp,[rbp+$50]
    000000000070A478 5D pop rbp
    000000000070A479 C3 ret

    If the code is changed to this instead:

    Test 2:
    procedure TForm1.FormCreate(Sender: TObject);
    var
      s1, s2, s3: string;
    begin
      s1 := test;
      s2 := test;
      s3 := test;
    end;

    the generated code is:

    Unit2.pas.34: begin
    000000000070A410 55 push rbp
    000000000070A411 4883EC40 sub rsp,$40
    000000000070A415 488BEC mov rbp,rsp
    000000000070A418 48C7453800000000 mov qword ptr [rbp+$38],$0000000000000000
    000000000070A420 48C7453000000000 mov qword ptr [rbp+$30],$0000000000000000
    000000000070A428 48C7452800000000 mov qword ptr [rbp+$28],$0000000000000000
    000000000070A430 48894D50 mov [rbp+$50],rcx
    000000000070A434 90 nop
    Unit2.pas.35: s1 := test;
    000000000070A435 488D4D38 lea rcx,[rbp+$38]
    000000000070A439 E892FFFFFF call test
    Unit2.pas.36: s2 := test;
    000000000070A43E 488D4D30 lea rcx,[rbp+$30]
    000000000070A442 E889FFFFFF call test
    Unit2.pas.37: s3 := test;
    000000000070A447 488D4D28 lea rcx,[rbp+$28]
    000000000070A44B E880FFFFFF call test
    000000000070A450 90 nop
    Unit2.pas.38: end;
    000000000070A451 488D4D28 lea rcx,[rbp+$28] ———-> finally
    000000000070A455 BA03000000 mov edx,$00000003
    000000000070A45A E89155D0FF call @UStrArrayClr
    000000000070A45F 488D6540 lea rsp,[rbp+$40]
    000000000070A463 5D pop rbp
    000000000070A464 C3 ret

    With 10.3, Test 1 should produce the same assembly as Test 2.

  3. abcd123 says:

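    This problem affects every managed type. For example, one of my projects has many interface implementations. Code like the sample below can waste a lot of performance, on both x86 and x64:

    procedure test;
    begin
      if self.intf1.ok then
      begin
        if self.intf2.name = 'admin' then
        begin
          // same interface code…
        end;
      end;
    end;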
