Walkthrough: Simple Binary Rewriting with Phoenix

In the last walkthrough we looked at symbols in the symbol table, but we actually never dove into the IR of Phoenix. In this walkthrough we’re going to learn a bit about the Phoenix IR in one of the simplest programs you can imagine.

This program, AddNop-tool, comes from the Phoenix RDK (as will many of the programs we walk through). It takes as input a PE file (managed or unmanaged) and outputs a modified PE file. The resulting PE file will be identical to the input except that a NOP instruction will be inserted between every pair of instructions. A NOP instruction is an instruction that does not do anything; the name stands for “No Operation”. It’s often used to insert padding or to create alignment for instruction sequences, without changing the semantics of the program.

As I promised, I will switch between C# and C++/CLI throughout the course of my blog. Last time I used C# and this program is written in C++/CLI. If you aren’t familiar with the syntax, it is actually quite similar to C#, but if you want more details on the language then the C++/CLI language specification is located at this link.

Things Covered in this Article

· Reading/Writing a PE file.

· Adding an instruction to the instruction stream

This program takes two arguments on the command-line, the input-file and the output-file. The output-file will be semantically equivalent to the input-file, except with a whole bunch of nops interspersed throughout the code.

The Main Function

Like StaticGlobalDump, we start with main(), which is given below. The bolded code marks function calls that have more user-defined functionality behind them, whereas the non-bold code calls directly into supplied framework code (either the CRT, STL, CLR, or Phoenix).

 

Code Point 1: Looking at the code, we see that the first thing we do is to initialize the Phoenix targets. In the StaticGlobalDump walkthrough I explained this code, so I’ll skip discussion of it here. The code is close to identical.

 

Code point 2: This is where we begin initialization of the infrastructure. This is the second time we have seen the BeginInit method, as we also saw it in the StaticGlobalDump program.

What happens when you call BeginInit is that a LOT of things get initialized under the covers: everything from the threading and memory management infrastructure of Phoenix, to the symbol and type tables, to the controls infrastructure. BeginInit is just something you need to do to get Phoenix started.

Code Point 3: This is a standard usage check for the command-line arguments. You may be wondering why it sits in-between the BeginInit and EndInit calls. Actually, there is no good reason for that. This check could have been done before the code in code point 1. There are some things in Phoenix that need to be done in-between BeginInit and EndInit, but we will talk about those in a future article.

Another question you may have is: why aren’t we using System::Console::WriteLine? We could, and probably should. We have our own class for output to have a level of abstraction above the BCL.

Code point 4: This is identical to what we did in code point 3 of the StaticGlobalDump code, except this is C++/CLI.

Code point 5: Here we set the path for the output image to the second command line argument (the output-file). This is a property of the PEModuleUnit. The PEModuleUnit will use the fact that this property is set when it closes, and write out a PE file to that path on closing.

Code point 6: DoAddNop is where the action of this code is. This is examined in more depth starting at code point 8.

Code point 7: This closes the PEModuleUnit that we opened. It does more than simply close the PEModuleUnit. It also checks whether the OutputImagePath is non-null. If it is non-null then it writes out the PEModuleUnit to disk (doing everything that is necessary to create a legal PE image), using the OutputImagePath. Note that we did set the OutputImagePath in code point 5, thus when this program ends it generates a new binary.

Note that it does not create a new PDB file for the binary, though. This is because there is another property, OutputPdbPath, that is checked when the PEModuleUnit is closed. Since this property was not set, no PDB file was generated. If we added the following line before the call to Close(), it would generate a PDB file with the name “sample.pdb”: module->OutputPdbPath = "sample.pdb";

int main(array<String ^> ^ args) {

   // 1

   ::InitializeTargets();

   // Initialize the infrastructure.

   // 2

   Phx::Init::BeginInit();

   // Simple usage check

   // 3

   if (args->Length != 2) {

      Phx::Output::WriteLine(

         "Usage: AddNop-tool <input-image-name> <output-image-name>\n");

      return 1;

   }

   Phx::Init::EndInit("PHX|_PHX_|", args);

   // Open the module.

   // 4

   Phx::PEModuleUnit ^ module = Phx::PEModuleUnit::Open(args[0]);

   // Set up the writer

   // 5

   module->OutputImagePath = args[1];

   // Do some useful work on the tool front here :

   // 6

   ::DoAddNop(module);

   // Write out the new pe

   // 7

   module->Close();

   return 0;

}
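
To make the close-time behaviour from code points 5 and 7 concrete, here is a small stand-alone sketch in standard C++ (not C++/CLI, and not Phoenix code). ToyModule and its members are hypothetical names I made up for illustration. The only thing it models is the rule described above: a file is written on close only if the corresponding path was set. An empty string stands in for a null path.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a module unit. This is NOT the Phoenix
// PEModuleUnit; it only models "write on close if a path was set".
struct ToyModule {
    std::string outputImagePath;  // empty means "not set": no image is written
    std::string outputPdbPath;    // empty means "not set": no PDB is written

    // Returns the names of the files a close would write.
    // The order of the two entries is arbitrary in this sketch.
    std::vector<std::string> Close() const {
        std::vector<std::string> written;
        if (!outputImagePath.empty()) written.push_back(outputImagePath);
        if (!outputPdbPath.empty()) written.push_back(outputPdbPath);
        return written;
    }
};
```

With only outputImagePath set, as in main() above, Close() reports a single file. Setting outputPdbPath as well adds the second one.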

The DoAddNop Function

This function is where the fun lies. This function takes as input a PEModuleUnit and changes the IR such that a NOP instruction is inserted in-between every pair of instructions.

The basic idea of this function is to traverse each function in the PEModuleUnit, and within each function to traverse each instruction, adding a NOP instruction before each real instruction that follows another real instruction.

Code point 8: This for-loop iterates over the FuncUnits in the PEModuleUnit. It uses GetEnumerableContribUnit to get an enumerator for each FuncUnit in the PEModuleUnit. I’m not particularly fond of the name ContribUnit for this purpose, but it’s what we have today.

The WriteableFuncUnit is a flag that we can pass to GetEnumerableContribUnit to tell the method that we plan to write to the FuncUnit. Getting a writeable FuncUnit is more expensive than getting a FuncUnit we only plan to read, since extra processing needs to be done.

Code point 9: DisassembleToBeforeLayout() raises the binary to low-level IR (LIR). Since we’re reading a PE file directly from a PEModuleUnit, all of the code in a FuncUnit is still in binary format until we do this call.

The name DisassembleToBeforeLayout suggests that we are going to raise the code to before “code layout” has been performed. Code layout is the process where the code generator moves the code into different parts of the image, often to increase locality. Since we are disassembling to before the code layout is performed (and thus reversing all of the code layout decisions as we raise), when we write the image back out to the file the code will be laid out again. We’ll see what this means in more depth later in this article.

Code point 10: This for-loop iterates over each LIR instruction in the FuncUnit that we just disassembled. Phx::IR::Instr::Iter gives us an instruction iterator for the FuncUnit passed to it.

Code point 11: There are two types of instructions in Phoenix, Real and Pseudo. Pseudo instructions are things like pragmas, labels, and data. They do not directly result in code being generated, although they certainly affect the resulting code (and as you can imagine data instructions can leave data in the image). Thus we focus on all of the other instructions, and we will only insert a NOP instruction in-between two real instructions.

Code point 12: We need to create the NOP instruction, and we do this as a ValueInstr. Each opcode, such as NOP, has an associated instruction class, such as ValueInstr. Usually it is fairly obvious which instruction class to use for a given opcode, but sometimes it is not, and NOP is one of those cases where I don’t think it is obvious. I am putting together a table that lists the instruction class associated with each opcode, which I plan to post in the future.

It’s also worth noting that the opcodes that are in Phx::Common::Opcode are HIR opcodes. One of the benefits of using HIR is that the opcodes work across all platforms. When the opcode gets lowered, in this example at code point 14, it gets lowered to the appropriate opcode for the target platform (for example MSIL vs x86). So as the developer you don’t need to worry about picking the right low-level opcode.
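
The idea of one platform-neutral opcode that is mapped to a target-specific one can be illustrated with a tiny stand-alone sketch in standard C++. This is not how Phoenix implements lowering. GenericOpcode, Target, and LowerOpcode are hypothetical names of my own. The two encodings are the standard ones: a one-byte NOP is 0x90 on x86 and 0x00 in MSIL.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical platform-neutral opcodes (illustrative only, not Phoenix's).
enum class GenericOpcode { Nop };

// Hypothetical set of targets.
enum class Target { X86, Msil };

// Map a platform-neutral opcode to the byte encoding used by the target.
std::vector<std::uint8_t> LowerOpcode(GenericOpcode op, Target target) {
    switch (op) {
    case GenericOpcode::Nop:
        switch (target) {
        case Target::X86:  return {0x90};  // x86 one-byte NOP
        case Target::Msil: return {0x00};  // MSIL nop
        }
        break;
    }
    throw std::invalid_argument("unsupported opcode/target pair");
}
```

The caller only ever names GenericOpcode::Nop; the choice of encoding is made in one place, based on the target.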

Code point 13: We now have created the instruction, nop, but it is just sitting in the air. It is not part of an IR stream. instr->InsertBefore(nop) will insert the nop instruction into the IR stream, just before the instr instruction.

Code point 14: The nop instruction that was inserted in code point 13 is an HIR instruction, but the rest of the code that we have raised is at LIR. In order for this instruction to work with the rest of the code we need to lower it.

We use funcUnit->Lower, which returns a Phx::Targets::Runtimes::Lower object. This class provides the framework for lowering opcodes from HIR to LIR for a specific target, e.g., x86. The Lower->Instr() method takes a given HIR instruction and lowers it to LIR. Since the rest of the FuncUnit is at LIR, we need to do this to ensure that all of the instructions are at LIR before we emit to the PE file.

void DoAddNop(Phx::PEModuleUnit ^ module){

   // 8

   for each (Phx::FuncUnit ^ funcUnit in

      module->GetEnumerableContribUnit(

         Phx::ContribUnitEnumerationFlags::WriteableFuncUnit))

   {

      // 9

      funcUnit->DisassembleToBeforeLayout();

      // 10

      for each (Phx::IR::Instr ^ instr in

         Phx::IR::Instr::Iter(funcUnit))

      {

         // 11

         if (instr->IsReal && instr->Prev->IsReal)

         {

            // 12

            Phx::IR::ValueInstr ^ nop =

               Phx::IR::ValueInstr::New(funcUnit,

                  Phx::Common::Opcode::Nop);

            // 13

            instr->InsertBefore(nop);

            // 14

            funcUnit->Lower->Instr(nop);

         }

      }

   }

}
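
If you want to experiment with the traversal logic without Phoenix installed, the same "insert before a real instruction whose predecessor is real" rule can be sketched in standard C++ over a plain linked list. ToyInstr and AddNops are hypothetical names of my own, and the toy stream has none of the structure of the real Phoenix IR. It only mirrors the test at code point 11 and the insertion at code point 13.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy instruction. This is NOT Phx::IR::Instr.
struct ToyInstr {
    std::string text;
    bool isReal;  // false for pseudo entries such as labels, pragmas, data
};

// Insert a "nop" before every real instruction whose predecessor is also
// real. Returns the number of nops that were inserted.
int AddNops(std::list<ToyInstr>& stream) {
    int inserted = 0;
    for (auto it = stream.begin(); it != stream.end(); ++it) {
        // The first entry has no predecessor; pseudo entries are skipped.
        if (it == stream.begin() || !it->isReal) continue;
        if (!std::prev(it)->isReal) continue;
        // std::list::insert places the new node before 'it' and leaves
        // 'it' valid, so the loop simply carries on with the next entry.
        stream.insert(it, ToyInstr{"nop", true});
        ++inserted;
    }
    return inserted;
}
```

For a stream of entry label, push, mov, label, ret, only the mov has a real predecessor, so exactly one nop is inserted, directly in front of it. A linked list is used here because inserting in front of the current element does not disturb the iteration.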

Another successful Phoenix tool built. Admittedly, this tool is of little use by itself, but it is a great learning exercise. Let’s try it out on an application. You can run it on itself:

~> addnop-tool.exe addnop-tool.exe out.exe

The resulting output file out.exe can be examined under ILDASM and you’ll see code with NOPs in-between every instruction, for example:

   …

  IL_0000: ldc.i4.0

  IL_0001: nop

  IL_0002: stloc.1

  IL_0003: nop

  IL_0004: ldarg.0

  IL_0005: nop

  IL_0006: ldlen

  IL_0007: nop

  IL_0008: ldc.i4.2

  …

You can also build your own unmanaged C++ application and run addnop-tool.exe on it. Make sure to use /Zi and /link /PROFILE when you build your unmanaged application, since we need the debug information in the PDB file. Unmanaged code, unlike managed code, does not have rich metadata.

You can use DUMPBIN /DISASM application.exe to see the resulting code from running the tool on an unmanaged application. Here’s an example after I ran it on a toy C++ program:

  00401000: 55 push ebp

  00401001: 90 nop

  00401002: 8B EC mov ebp,esp

  00401004: 90 nop

  00401005: 51 push ecx

  00401006: 90 nop

  00401007: C7 45 FC 00 00 00 mov dword ptr [ebp-4],0

    00

  0040100E: 90 nop

  0040100F: E9 00 00 00 00 jmp 00401014

Now that you understand this sample, it is fun to play with it to learn a bit more about Phoenix. For example, what if we added a branch instruction that jumped over the NOP that we inserted? The resulting code might be expected to look something like this (note the jumps over the NOPs):

  00401000: 55 push ebp

  00401001: E9 01 00 00 00 jmp 00401007

  00401006: 90 nop

  00401007: 8B EC mov ebp,esp

  00401009: E9 01 00 00 00 jmp 0040100F

  0040100E: 90 nop

  0040100F: 51 push ecx

  00401010: E9 01 00 00 00 jmp 00401016

  00401015: 90 nop

  00401016: C7 45 FC 00 00 00 mov dword ptr [ebp-4],0

So now let’s change the AddNop-tool code to generate this code instead. Here’s what the new block looks like. Again, note that we are using HIR opcodes from Phx::Common::Opcode. This allows this code to be completely platform independent:

         if (instr->IsReal && instr->Prev->IsReal)

         {

            Phx::IR::ValueInstr ^ nop =

               Phx::IR::ValueInstr::New(funcUnit,

                  Phx::Common::Opcode::Nop);

            Phx::IR::LabelInstr ^ label =

               Phx::IR::LabelInstr::New(funcUnit,

                  Phx::Common::Opcode::Label);

            Phx::IR::BranchInstr ^ branch =

               Phx::IR::BranchInstr::New(funcUnit,

                  Phx::Common::Opcode::Goto, label);

            // Insert them before the instruction

            instr->InsertBefore(branch);

            instr->InsertBefore(nop);

            instr->InsertBefore(label);

            // Let lower find the right instruction opcode.

            funcUnit->Lower->Instr(branch);

            funcUnit->Lower->Instr(nop);

            funcUnit->Lower->Instr(label);

         }

After recompiling I ran this on my toy C++ application and looked at the resulting assembly with DUMPBIN. To my surprise I did not see assembly in the form I expected. Rather, this is what I saw:

  00401000: 55 push ebp

  00401001: E9 00 00 00 00 jmp 00401006

  00401006: 8B EC mov ebp,esp

  00401008: E9 00 00 00 00 jmp 0040100D

  0040100D: 51 push ecx

  0040100E: E9 00 00 00 00 jmp 00401013

  00401013: C7 45 FC 00 00 00 mov dword ptr [ebp-4],0

            00

  0040101A: E9 00 00 00 00 jmp 0040101F

  0040101F: E9 00 00 00 00 jmp 00401024

  00401024: 83 7D FC 02 cmp dword ptr [ebp-4],2

  00401028: E9 00 00 00 00 jmp 0040102D

  0040102D: 0F 8D 43 00 00 00 jge 00401076

  00401033: E9 00 00 00 00 jmp 00401038

  00401038: 68 00 C4 41 00 push 41C400h

  0040103D: E9 00 00 00 00 jmp 00401042

  00401042: E8 55 00 00 00 call 0040109C

  00401047: E9 00 00 00 00 jmp 0040104C

  0040104C: 83 C4 04 add esp,4

  0040104F: E9 00 00 00 00 jmp 00401054

  00401054: E9 00 00 00 00 jmp 00401059

  00401059: 8B 45 FC mov eax,dword ptr [ebp-4]

  0040105C: E9 00 00 00 00 jmp 00401061

  00401061: 83 C0 01 add eax,1

  00401064: E9 00 00 00 00 jmp 00401069

  00401069: 89 45 FC mov dword ptr [ebp-4],eax

  0040106C: E9 00 00 00 00 jmp 00401071

  00401071: E9 AE FF FF FF jmp 00401024

  00401076: B8 01 00 00 00 mov eax,1

  0040107B: E9 00 00 00 00 jmp 00401080

  00401080: 8B E5 mov esp,ebp

  00401082: E9 00 00 00 00 jmp 00401087

  00401087: 5D pop ebp

  00401088: E9 00 00 00 00 jmp 0040108D

  0040108D: C3 ret

  0040108E: 90 nop

  0040108F: 90 nop

  00401090: 90 nop

  00401091: 90 nop

  00401092: 90 nop

The NOPs are not between the branch and the branch target as I had specified. What happened? Since the NOPs are effectively dead code, Phoenix decided during code layout to move them out of the main part of the code, which helps locality. Code layout occurs because we raised the PE file to the point before code layout is performed (we did that in code point 9), so when we lowered the IR back out to a binary, code layout was performed again.
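
You can check the jump targets in the DUMPBIN output by hand. An x86 near jump is encoded as E9 followed by a 32-bit little-endian displacement, and the displacement is relative to the address of the next instruction (the jump’s own address plus its 5-byte length). Here is a small stand-alone sketch in standard C++ that does the arithmetic. NearJmpTarget is a hypothetical helper name of my own and has nothing to do with Phoenix.

```cpp
#include <cassert>
#include <cstdint>

// Compute the destination of an x86 near jump encoded as "E9 rel32".
// 'address' is where the jump itself starts; 'bytes' holds its 5 bytes.
std::uint32_t NearJmpTarget(std::uint32_t address, const std::uint8_t bytes[5]) {
    assert(bytes[0] == 0xE9);  // only the E9 rel32 form is handled here
    // Assemble the little-endian 32-bit displacement from bytes 1..4.
    std::uint32_t rel = static_cast<std::uint32_t>(bytes[1])
                      | (static_cast<std::uint32_t>(bytes[2]) << 8)
                      | (static_cast<std::uint32_t>(bytes[3]) << 16)
                      | (static_cast<std::uint32_t>(bytes[4]) << 24);
    // The displacement is signed, but unsigned wrap-around arithmetic
    // produces the same 32-bit result as a signed addition would.
    return address + 5u + rel;
}
```

For the jump at 00401001 (E9 00 00 00 00) this gives 00401006, and for the loop back-edge at 00401071 (E9 AE FF FF FF) it gives 00401024, matching the listing. A displacement of zero means the jump lands on the very next instruction, which is what you would expect once the NOP no longer sits between the branch and its label.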

So that’s some fun with AddNop-tool. Continue to play around with it a bit. It’s a good program to learn a few things from.