Using IACA with non-assembly routines

I've been toying around with IACA (Intel's static code analyzer).
It works fine when testing assembly snippets, where I can enter the magic marker bytes by hand, like so:

procedure TSlice.BitSwap(a, b: integer); 
asm 
    //RCX = self 
    //edx = a 
    //r8d = b 

    mov ebx, 111  // Start IACA marker bytes 
    db $64, $67, $90 // Start IACA marker bytes 

    xor eax, eax 
    xor r10d, r10d 

    mov r9d, [rcx] // read the value 
    mov ecx,edx  // need a in cl for the shift 
    btr r9d, edx // read and clear the a bit 

    setc al   // convert cf to bit 
    shl eax, cl  // shift bit to ecx position 

    btr r9d, r8d // read and clear the b bit 

    mov ecx, r8d // need b in ecx for shift 
    setc r10b  // convert cf to bit 
    shl r10d, cl // shift bit to edx position 

    or r9d, eax  // copy in old edx bit 
    or r9d, r10d // copy in old ecx bit 

    mov [r8], r9d // store result 
    ret 

    mov ebx, 222  // End IACA marker bytes 
    db $64, $67, $90 // End IACA marker bytes 
end;

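Is there a way to prefix/suffix non-assembly code with the required magic markers, so that I can analyse compiler-generated code?

I know I can copy and paste the generated assembly from the CPU view and build a routine from it, but I was hoping for an easier workflow.

EDIT
I'm looking for a solution that works with the 64-bit compiler. I know that I can mix assembly and normal code with the 32-bit compiler.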

UPDATE
@Dsm's suggestion works. @Rudy's trick does not.

The following dummy code works:

Throughput Analysis Report 
-------------------------- 
Block Throughput: 13.33 Cycles  Throughput Bottleneck: Dependency chains (possibly between iterations) 

Port Binding In Cycles Per Iteration: 
--------------------------------------------------------------------------------------- 
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | 
--------------------------------------------------------------------------------------- 
| Cycles | 1.3 0.0 | 1.4 | 1.0 1.0 | 1.0 1.0 | 0.0 | 1.4 | 2.0 | 0.0 | 
--------------------------------------------------------------------------------------- 

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0) 
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path 
F - Macro Fusion with the previous instruction occurred 
* - instruction micro-ops not bound to a port 
^ - Micro Fusion happened 
# - ESP Tracking sync uop was issued 
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected 
X - instruction not supported, was not accounted in Analysis 

| Num Of |     Ports pressure in cycles      | | 
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | | 
--------------------------------------------------------------------------------- 
| 3^ | 0.3  | 0.3 | 1.0 1.0 |   |  | 0.3 | 1.0 |  | CP | ret 
| X |   |  |   |   |  |  |  |  | | int3 
[... more int3's] 
| X |   |  |   |   |  |  |  |  | | int3 
| 1 | 1.0  |  |   |   |  |  |  |  | | shl eax, 0x10 
| 1 |   | 0.6 |   |   |  | 0.3 |  |  | | cmp eax, 0x64 
| 3^ |   | 0.3 |   | 1.0 1.0 |  | 0.6 | 1.0 |  | CP | ret 
| X |   |  |   |   |  |  |  |  | | int3 
| X |   |  |   |   |  |  |  |  | | int3 
[...] 
Total Num Of Uops: 8

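UPDATE 2
IACA seems to bomb if the code you want to analyse contains a call instruction; it complains about illegal instructions. The basic idea works, though. Obviously you need to subtract the first ret and its associated cost.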


2017-09-19 Johan

Both sequences are exactly 8 bytes. Could you write 'X := $9067640000006FBB' at the start of your routine and 'X := $906764000000DEBB' at the end, where 'X' is a 'UInt64'?
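To check the arithmetic behind that suggestion, here is a minimal Delphi console sketch that packs the marker bytes from the question (`mov ebx, imm32`, which encodes as opcode `$BB` plus a little-endian 32-bit immediate, followed by `db $64, $67, $90`) into one 64-bit value. The program name and the helper `MarkerAsUInt64` are made up for illustration. Note that the question's update reports that this trick did not work as an IACA marker, so the sketch only verifies the byte layout.

```pascal
program IacaMarkerLiteral;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: packs the 8-byte IACA marker sequence for a tag
// (111 = start, 222 = end) into a single little-endian 64-bit value.
function MarkerAsUInt64(Tag: Cardinal): UInt64;
var
  Bytes: array[0..7] of Byte;
  I: Integer;
begin
  Bytes[0] := $BB;                       // opcode of "mov ebx, imm32"
  for I := 0 to 3 do                     // imm32, least significant byte first
    Bytes[1 + I] := Byte((Tag shr (8 * I)) and $FF);
  Bytes[5] := $64;                       // the "db $64, $67, $90" bytes
  Bytes[6] := $67;
  Bytes[7] := $90;

  Result := 0;
  for I := 7 downto 0 do                 // Bytes[0] ends up in the lowest byte
    Result := (Result shl 8) or Bytes[I];
end;

begin
  Writeln('start: $', IntToHex(MarkerAsUInt64(111), 16)); // start: $9067640000006FBB
  Writeln('end:   $', IntToHex(MarkerAsUInt64(222), 16)); // end:   $906764000000DEBB
end.
```

Since 111 is `$6F` and 222 is `$DE`, the two values differ only in the second-lowest byte.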

What good is static analysis if you can't change the machine code?

@RudyVelthuis, to use as a baseline for comparison. Non-assembly code can be inlined; assembly code cannot. – Johan

I don't use IACA, so I can't test this idea, and I will delete the answer if it doesn't work, but can't you just do something like this:

procedure TForm10.Button1Click(Sender: TObject); 
begin 
    asm 
    //RCX = self 
    //edx = a 
    //r8d = b 

    mov ebx, 111  // Start IACA marker bytes 
    db $64, $67, $90 // Start IACA marker bytes 
    end; 

    fRotate(fLine - Point(0,1), 23); 

    asm 
    mov ebx, 222  // End IACA marker bytes 
    db $64, $67, $90 // End IACA marker bytes 

    end; 
end;

This was just a sample routine to see whether it would compile.

Sadly, this only works in 32-bit. As Johan points out, it is not permitted in 64-bit.

For 64-bit, something like the following may work, but again I can't test it.

procedure TForm10.Button1Click(Sender: TObject); 
    procedure Test1; 
    asm 
    //RCX = self 
    //edx = a 
    //r8d = b 

    mov ebx, 111  // Start IACA marker bytes 
    db $64, $67, $90 // Start IACA marker bytes 
    end; 
    procedure Test2; 
    begin 
    fRotate(fLine - Point(0,1), 23); 
    end; 
    procedure Test3; 
    asm 
    mov ebx, 222  // End IACA marker bytes 
    db $64, $67, $90 // End IACA marker bytes 

    end; 
begin 
    Test1; 
    Test2; 
    Test3; 
end;


2017-09-19 12:55:15 Dsm

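The latest version of IACA only works with x64. The 64-bit compiler does not allow mixing assembly and normal code. Your approach does work with the 32-bit compiler and an older version of IACA. – Johan

@Johan I have modified the code with a sneaky version, though I'm not sure whether it is too convoluted for your needs. – Dsm

There will be an extraneous 'ret' and probably some alignment code at the start of the routine, but it might work. – Johan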
