Disassembly algorithms - Code vs Data


Соломенные сандалии
A disassembler today needs flexible code-analysis capabilities. This thread is devoted to methods of disassembling code and to innovative frameworks or add-ons.

Probabilistic Disassembly

Abstract - Disassembling stripped binaries is a prominent challenge for binary analysis, due to the interleaving of code segments and data, and the difficulties of resolving control transfer targets of indirect calls and jumps. As a result, most existing disassemblers have both false positives (FP) and false negatives (FN). We observe that uncertainty is inevitable in disassembly due to the information loss during compilation and code generation. Therefore, we propose to model such uncertainty using probabilities and propose a novel disassembly technique, which computes a probability for each address in the code space, indicating its likelihood of being a true positive instruction. The probability is computed from a set of features that are reachable to an address, including control flow and data flow features. Our experiments with more than two thousand binaries show that our technique does not have any FN and has only 3.7% FP. In comparison, a state-of-the-art superset disassembly technique has 85% FP. A rewriter built on our disassembly can generate binaries that are only half of the size of those by superset disassembly and run 3% faster. While many widely-used disassemblers such as IDA and BAP suffer from missing function entries, our experiment also shows that even without any function entry information, our disassembler can still achieve 0 FN and 6.8% FP.
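
To make the "probability for each address" idea concrete, here is a minimal sketch. It is not the paper's model: the hint probabilities (1/16, 0.5), the independence assumption and the names `code_evidence` and `rank_candidates` are made up for illustration. Each candidate offset carries a list of "this hint could have appeared by chance in data" probabilities, and the more independent hints point at an offset, the less plausible it is that the offset is data.

```python
from math import prod

def code_evidence(hint_chances):
    """Return 1 - P(all hints arose by chance), assuming independent hints.

    An empty list means "no evidence either way" and yields 0.0.
    """
    return 1.0 - prod(hint_chances)

def rank_candidates(candidates):
    """candidates: {offset: [chance probability of each hint seen there]}.

    Returns (offset, score) pairs, strongest evidence first.
    """
    scored = [(off, code_evidence(ps)) for off, ps in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical input: three overlapping decode candidates.
candidates = {
    0x10: [1 / 16, 1 / 16],  # two hints, e.g. a shared jump target and a def-use pair
    0x11: [],                # decodes fine, but nothing supports it
    0x12: [0.5],             # one weak hint
}
ranking = rank_candidates(candidates)
```

In this toy input the offset 0x10 wins because two unlikely coincidences would be needed for it to be data.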



Spedi - Speculative disassembly, CFG recovery, and call-graph recovery from stripped binaries.

Spedi is a speculative disassembler for the variable-size Thumb ISA. Given an ELF file as input, Spedi can:

Recover correct assembly instructions.
Recover targets of switch jumps tables.
Identify functions in the binary and their call graph.
Spedi works directly on the binary without using symbol information. We found Spedi to outperform IDA Pro in our experiments.
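
Why "variable-size" matters can be shown with a small sketch (not Spedi's code; the helper names are hypothetical). In Thumb, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction; anything else is a 16-bit instruction. Decoding speculatively at every halfword offset therefore yields overlapping candidates that the disassembler has to choose between.

```python
import struct

def thumb_insn_size(first_halfword):
    """Size in bytes of a Thumb instruction, judged by its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2

def candidate_starts(code):
    """Return (offset, size) for every halfword-aligned offset that fits."""
    out = []
    for off in range(0, len(code) - 1, 2):
        (hw,) = struct.unpack_from("<H", code, off)  # little-endian halfword
        size = thumb_insn_size(hw)
        if off + size <= len(code):
            out.append((off, size))
    return out

# mov r0, r1 ; bl <target> (two halfwords) ; bx lr
code = struct.pack("<4H", 0x4608, 0xF000, 0xF801, 0x4770)
cands = candidate_starts(code)
```

Here the candidate at offset 4 is the second half of the `bl`, yet on its own it also looks like the start of a 32-bit instruction. Resolving such conflicts is the part a speculative disassembler has to solve.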

This project depends on the Capstone disassembly library (v3.0.4).

Result summary
Spedi (almost) perfectly recovers assembly instructions from our benchmark binaries, with a 99.96% average. In comparison, IDA Pro has an average of 95.83%, skewed by its relatively poor performance on the sha benchmark.

Spedi precisely recovers 97.46% of functions on average. That is, it identifies the correct start address and end address. Compare that to the 40.53% average achieved by IDA Pro.

Disassembly time
A nice property of our technique is that it's also fast and scales well with increased benchmark size. For example, spedi disassembles du (50K instructions) in about 150 ms. Note that there is good room for further optimizations.

Spedi - --> Link <--
Speculative disassembly of binary code - --> Link <--


Nucleus is a tool for function identification in x64 binaries. The authors' paper "Compiler-Agnostic Function Detection in Binaries" was accepted at IEEE Euro S&P 2017. They use more or less the same function identification techniques as those implemented in Spedi.
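
The thread does not spell out those techniques, so the following is only a sketch of one CFG-based approach, under the assumption that basic blocks and typed edges are already recovered: ignore call edges, then treat every group of blocks still connected by jumps and fallthroughs as one function. The edge labels and the name `group_functions` are hypothetical.

```python
def group_functions(blocks, edges):
    """Group basic blocks into functions via intraprocedural edges.

    blocks: list of block start addresses.
    edges:  list of (src, dst, kind), kind in {"jump", "fallthrough", "call"}.
    Returns a sorted list of sorted address lists, one per function.
    """
    parent = {b: b for b in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst, kind in edges:
        if kind == "call":
            continue  # calls cross function boundaries, so they never merge groups
        ra, rb = find(src), find(dst)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # lowest address stays the root

    groups = {}
    for b in blocks:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(g) for g in groups.values())

blocks = [0x100, 0x110, 0x120, 0x200, 0x210]
edges = [
    (0x100, 0x110, "jump"),
    (0x110, 0x120, "fallthrough"),
    (0x110, 0x200, "call"),
    (0x200, 0x210, "jump"),
]
functions = group_functions(blocks, edges)
```

Taking the lowest address of each group as the function start is an extra simplification; tail calls and shared code would need more care.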

Nucleus - --> Link <--
Compiler-Agnostic Function Detection in Binaries - --> Link <--


State-Enhanced Control Flow Graph
Abstract— In the omnipresent model of the stored-program computer, both the instructions and data are held in a single storage structure. Therefore, instructions can be read and written as if they were data. In practice however, instructions rarely change during the execution of the program.
As a result, it is often assumed that the instructions are constant. Therefore, many tools and analyses fail in the presence of self-modifying code. In this paper, we present an extension to the control flow graph representation, which enables the analysis, optimization and generation of self-modifying code: the state-enhanced control flow graph.
Keywords—Self-Modifying Code, Viruses, Obfuscation, State-Enhanced Control Flow Graph

A Model for Self-Modifying Code
Self-modifying code is notoriously hard to understand and therefore very well suited to hide program internals. In this paper we introduce a program representation for this type of code: the state-enhanced control flow graph. It is shown how this program representation can be constructed, how it can be linearized into a binary program, and how it can be used to generate, analyze and transform self-modifying code.
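
The problem these abstracts describe can be reproduced on a toy stored-program machine. The sketch below is not the SECFG construction itself; the instruction set (HALT, ADD, SUB, POKE with fixed three-cell encoding) is invented for illustration. A static listing of the initial memory image shows one instruction, while execution runs another, because the program overwrites its own opcode first.

```python
HALT, ADD, SUB, POKE = 0, 1, 2, 3
NAMES = {HALT: "HALT", ADD: "ADD", SUB: "SUB", POKE: "POKE"}

def static_listing(mem):
    """Disassemble the initial image, assuming instructions never change."""
    return [NAMES[mem[pc]] for pc in range(0, len(mem), 3)]

def run(mem):
    """Execute the image; code and data share the same memory."""
    mem = list(mem)  # work on a copy so the caller's image stays intact
    acc, pc = 0, 0
    while mem[pc] != HALT:
        op, a, b = mem[pc:pc + 3]
        if op == ADD:
            acc += a
        elif op == SUB:
            acc -= a
        elif op == POKE:
            mem[a] = b  # may overwrite an opcode: self-modification
        else:
            raise ValueError(f"unknown opcode {op} at {pc}")
        pc += 3
    return acc

program = [
    POKE, 6, SUB,  # 0: rewrite the opcode stored at address 6
    ADD, 5, 0,     # 3: acc = 5
    ADD, 3, 0,     # 6: listed as ADD, executed as SUB
    HALT, 0, 0,    # 9
]
listing = static_listing(program)
result = run(program)
```

The listing suggests 5 + 3, while the run computes 5 - 3. A representation that tracks which state the bytes at address 6 are in when control reaches them is what is needed to describe this program faithfully.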

Diablo is a retargetable link-time binary rewriting framework - https://github.com/csl-ugent/diablo
(It appears to use the SECFG technique.)

Diablo (Diablo Is A Better Link-time Optimizer) is a retargetable link-time binary rewriting framework. While our focus has been mostly on program compaction and software protection, binary rewriting has a much broader range of applications: speed optimizations, power consumption optimizations, size optimizations, watermarking, instrumentation, etc.

A good binary rewriting framework (one like Diablo :)) is also very useful for program analysis and understanding. For instance, Diablo can print out the control flow graph for all functions in a program, annotated with, for example, liveness information.
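
For readers unfamiliar with liveness annotations, here is a sketch of the textbook backward dataflow computation on a CFG. It does not use or imitate Diablo's API; the block names, register names and the `liveness` function are assumptions made for the example.

```python
def liveness(blocks, succs):
    """Classic iterative liveness analysis.

    blocks: {name: (use_set, def_set)} where use_set holds registers read
            before being written in the block.
    succs:  {name: [successor names]}.
    Returns (live_in, live_out) dictionaries of sets.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, defs) in blocks.items():
            # A register is live out of b if it is live into any successor.
            out = set().union(*(live_in[s] for s in succs.get(b, [])))
            # Live in: used here, or live out and not overwritten here.
            inn = use | (out - defs)
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

blocks = {
    "entry": ({"r0"}, {"r1"}),
    "loop": ({"r1", "r2"}, {"r2"}),
    "exit": ({"r2"}, set()),
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
live_in, live_out = liveness(blocks, succs)
```

In this example r2 is live on entry even though the entry block never reads it, because the loop reads it before writing it.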

Lately Diablo has been used in the ASPIRE project to develop a number of software protection techniques, the code of which can be found in the aspire subdirectory.



Judging by the results, it looks like people are just riding the hype around a popular topic.