The interface to the disassembler level.
The following definitions are used in documentation of modules and functions in this interface.
An instruction is a sequence of consecutive bytes that has a known decoding in the given instruction set architecture (ISA). Instructions have semantic properties, as provided by the ISA specification, and several of these properties play an important role in the definitions below (see Insn.property for more details about the properties):
An instruction address is the address of the first byte of the instruction.
A jump instruction destination is an address, defined by the ISA specification, to which control flow transfers if the jump is taken. The destination of a jump instruction may coincide with the instruction that follows it; otherwise, the following instruction is not a destination, since only the destinations of the taken jump belong to the set of destinations of an instruction.
An instruction is a conditional jump if it is a jump instruction that is not always taken, as defined by the ISA specification.
An instruction is a barrier if it is a jump that is neither a call nor conditional.
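To make the jump-related definitions concrete, here is a minimal OCaml sketch. The insn record and its fields are hypothetical stand-ins for the properties exposed through Insn.property; they are not the actual API.

```ocaml
(* Hypothetical model of a few ISA-level properties of an instruction;
   these field names are illustrative, not the Insn.property API. *)
type insn = {
  jump : bool;         (* may transfer control to a destination *)
  conditional : bool;  (* the jump is not always taken *)
  call : bool;         (* the jump is a call *)
}

(* A conditional jump is a jump that is not always taken. *)
let is_conditional_jump i = i.jump && i.conditional

(* A barrier is a jump that is neither a call nor conditional. *)
let is_barrier i = i.jump && (not i.call) && not i.conditional
```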
An execution order is the order in which the CPU executes instructions.
The linear order of a sequence of instructions is the ascending order of their addresses.
An instruction is delayed by m > 0 instructions if it takes effect not immediately but only after m other instructions are executed.
An instruction i(k) follows the instruction i(j) if i(j) is not a barrier and either the address of i(k) is the successor of the address of the last byte of i(j), or either i(k+m) or i(k) is an instruction that is delayed by m > 0 instructions.
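The non-delayed case of the follows relation can be sketched as below, assuming a hypothetical record with the instruction's address, its size in bytes, and a barrier flag. Delay slots are intentionally left out of this sketch.

```ocaml
(* Hypothetical instruction record; delay slots (m > 0) are NOT modelled. *)
type insn = {
  addr : int;      (* address of the first byte *)
  size : int;      (* length in bytes *)
  barrier : bool;  (* unconditional jump that is not a call *)
}

(* [follows k j] holds when [j] is not a barrier and [k] starts at the
   successor of the address of the last byte of [j]. The last byte of [j]
   is at [j.addr + j.size - 1], so its successor is [j.addr + j.size]. *)
let follows k j = (not j.barrier) && k.addr = j.addr + j.size
```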
A chain of instructions is a sequence of instructions {i(0); ...; i(k); i(k+1); ...; i(n)} such that each i(k+1) is either a resolved destination of i(k) or follows it. An instruction can belong to more than one chain.
A valid chain of instructions is a chain whose last instruction is a jump that is either indirect or whose destinations belong to some previous jump in the same chain.
An instruction is valid if it belongs to a valid chain of instructions.
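A sketch of the valid-chain test follows. The record type is hypothetical, and the sketch reads "its destinations belong to some previous jump" as: every destination of the last jump is also a destination of some earlier jump in the chain. That reading is an assumption, not a statement of how the library decides validity.

```ocaml
(* Hypothetical instruction record used only for this sketch. *)
type insn = {
  jump : bool;
  indirect : bool;    (* destination computed at run time *)
  dests : int list;   (* known destinations *)
}

(* A chain is valid if its last instruction is a jump that is either
   indirect or whose destinations are all destinations of some earlier
   jump in the same chain (assumed interpretation). *)
let is_valid_chain (chain : insn list) =
  match List.rev chain with
  | [] -> false
  | last :: earlier ->
    last.jump
    && (last.indirect
        || List.for_all
             (fun d ->
                List.exists (fun i -> i.jump && List.mem d i.dests) earlier)
             last.dests)
```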
A byte is data if one of the following is true: 1) its address is the address of an instruction that is not valid; 2) it was classified as data in the knowledge base; 3) it is not part of an instruction.
A basic block is a non-empty instruction chain {i(1); ...; i(n)} such that, for each 1 < i <= n: i(i) follows i(i-1); there is no valid instruction in the knowledge base that has i(i) as a known destination; and i(i) is not a jump when i < n.
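One way to picture the basic-block definition is to split a list of instructions, sorted by address, at the points where its conditions stop holding. This is a sketch under stated assumptions: a hypothetical record type, no delay slots, and a targets list standing in for the known destinations of valid instructions in the knowledge base.

```ocaml
(* Hypothetical instruction record; delay slots are not modelled. *)
type insn = {
  addr : int;
  size : int;
  jump : bool;
  barrier : bool;
}

(* Non-delayed "follows": [j] is not a barrier and [k] starts right after it. *)
let follows k j = (not j.barrier) && k.addr = j.addr + j.size

(* Splits instructions sorted by address into basic blocks. A new block
   starts when the previous instruction is a jump (a jump may only end a
   block), when the current instruction does not follow the previous one,
   or when it is a known destination. Blocks come out in address order. *)
let basic_blocks ~(targets : int list) (insns : insn list) : insn list list =
  (* [cur] holds the current block in reverse order. *)
  let close cur acc = if cur = [] then acc else List.rev cur :: acc in
  let rec go cur acc = function
    | [] -> List.rev (close cur acc)
    | i :: rest ->
      let starts_new =
        match cur with
        | [] -> true
        | prev :: _ ->
          prev.jump || (not (follows i prev)) || List.mem i.addr targets
      in
      let cur, acc =
        if starts_new then ([i], close cur acc) else (i :: cur, acc)
      in
      go cur acc rest
  in
  go [] [] insns
```

For example, with a jump at address 2 and a known destination at address 6, the instructions at 0, 2, 4, 6 and 8 split into the blocks [0; 2], [4] and [6; 8].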
A subroutine is a non-empty finite set of basic blocks {b(1); ...; b(n)} such that b(1) dominates each block in {b(2); ...; b(n)} (which also implies that they are reachable). The block b(1) is called the entry block (or entry point).
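The dominance condition can be checked directly from its textbook characterization: d dominates b when b is reachable from the CFG root but becomes unreachable once d is removed. The sketch below assumes a hypothetical CFG representation in which blocks are numbered 0..n-1 and succs.(b) lists the successors of b. It is quadratic and meant for illustration, not as the algorithm the library uses.

```ocaml
(* Marks the blocks reachable from [root] without ever entering [avoid]. *)
let reachable ?avoid ~root (succs : int list array) : bool array =
  let seen = Array.make (Array.length succs) false in
  let rec visit b =
    if (not seen.(b)) && Some b <> avoid then begin
      seen.(b) <- true;
      List.iter visit succs.(b)
    end
  in
  visit root;
  seen

(* [d] dominates [b] if [b] is reachable from [root] and every path
   from [root] to [b] passes through [d]. *)
let dominates ~root succs d b =
  (reachable ~root succs).(b)
  && (d = b || not (reachable ~avoid:d ~root succs).(b))

(* Subroutine condition: the entry block, listed first, dominates every
   other block of the set (hence they are all reachable). *)
let is_subroutine ~root succs blocks =
  match blocks with
  | [] -> false
  | entry :: rest -> List.for_all (dominates ~root succs entry) rest
```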
disassemble ?roots arch mem
Disassembles the provided memory region mem using the best available algorithm and backend for the specified arch. Roots, if provided, should point to memory regions that are believed to contain code; ideally, this is a list of function starts. If no roots are provided, the starting address of mem is used as the root.
The returned value contains all memory reachable from the given set of roots, to the best of our knowledge.
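Returning everything reachable from the roots corresponds to a recursive-descent traversal. The sketch below is an illustration only: decode is a hypothetical function returning the instruction decoded at an address (if any), start stands for the starting address of mem, and delay slots are ignored.

```ocaml
(* Hypothetical decoded instruction; not the library's instruction type. *)
type insn = {
  size : int;
  barrier : bool;    (* control never falls through *)
  dests : int list;  (* resolved destinations of the taken jump *)
}

module IS = Set.Make (Int)

(* Worklist traversal: visits each root, then its fall-through successor
   (unless it is a barrier) and its resolved destinations. Addresses that
   do not decode are skipped. Returns the set of decoded addresses. *)
let disassemble ?roots ~start (decode : int -> insn option) : IS.t =
  let roots = match roots with Some r -> r | None -> [start] in
  let rec loop visited = function
    | [] -> visited
    | addr :: work when IS.mem addr visited -> loop visited work
    | addr :: work ->
      (match decode addr with
       | None -> loop visited work
       | Some i ->
         let next =
           if i.barrier then i.dests else (addr + i.size) :: i.dests
         in
         loop (IS.add addr visited) (next @ work))
  in
  loop IS.empty roots
```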
disassemble_image image
Disassembles a given image. It takes the executable segments of the image and disassembles them by applying the disassemble function. If no roots are specified, the symbol table is used as the source of roots; if the file doesn't contain one, the entry point is used.
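The root-selection policy described above can be sketched as follows, using a hypothetical image record rather than the library's image interface.

```ocaml
(* Hypothetical image description: entry point and symbol start addresses. *)
type image = {
  entry : int;
  symbols : (string * int) list;  (* symbol name, start address *)
}

(* Explicit roots win; otherwise use symbol starts; if there are no
   symbols, fall back to the entry point. *)
let image_roots ?roots img =
  match roots with
  | Some rs -> rs
  | None ->
    (match img.symbols with
     | [] -> [img.entry]
     | syms -> List.map snd syms)
```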
disassemble_file ?roots path
Takes a path to a binary and disassembles it.
With_exn.f is the same as f, except that it raises an exception instead of returning Error.
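The With_exn convention amounts to wrapping a result-returning function so that it raises instead. In this sketch the exception Disasm_error and the string error type are hypothetical; the actual interface may use a different error representation.

```ocaml
(* Hypothetical exception carrying the error message. *)
exception Disasm_error of string

(* Turns a function returning a result into one that returns the value
   directly and raises [Disasm_error] on [Error]. *)
let with_exn (f : 'a -> ('b, string) result) : 'a -> 'b =
 fun x ->
  match f x with
  | Ok v -> v
  | Error msg -> raise (Disasm_error msg)
```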
merge d1 d2
is the union of the control flow graphs and errors of the two disassemblers.
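A sketch of merge as a plain set union, assuming a hypothetical representation in which a disassembly is a set of CFG edges (pairs of addresses) plus a list of error messages.

```ocaml
(* A CFG edge as (source address, destination address). *)
module Edge = struct
  type t = int * int
  let compare (a : t) (b : t) = Stdlib.compare a b
end

module ES = Set.Make (Edge)

(* Hypothetical disassembly result. *)
type disasm = {
  cfg : ES.t;            (* control-flow edges *)
  errors : string list;  (* decoding errors *)
}

(* Union of both graphs and of both error lists (duplicates removed). *)
let merge d1 d2 = {
  cfg = ES.union d1.cfg d2.cfg;
  errors = List.sort_uniq String.compare (d1.errors @ d2.errors);
}
```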
Returns all instructions that were successfully decoded, in ascending order of their addresses. Each instruction is accompanied by its block of memory.
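Ordering decoded instructions by address can be sketched with List.sort. The mem record and the string representation of instructions are hypothetical placeholders.

```ocaml
(* Hypothetical memory block: its lowest address and raw bytes. *)
type mem = { min_addr : int; bytes : string }

(* Sorts (memory block, instruction) pairs by the block's start address. *)
let sorted_insns (decoded : (mem * string) list) : (mem * string) list =
  List.sort (fun (m1, _) (m2, _) -> Int.compare m1.min_addr m2.min_addr) decoded
```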