LLVM, Clang для реверсинга

mak

Straw Sandals
Administrator
Messages: 410
Reactions: 167
The LLDB Debugger is an interesting project. There doesn't seem to be a decent GUI for Windows yet, but you can already debug from Visual Studio: Debugging programs with LLDB under Visual Studio

HQEMU is a retargetable and multi-threaded dynamic binary translator on multicores. It integrates QEMU and LLVM as its building blocks. The translator in the enhanced QEMU acts as a fast translator with low translation overhead. The optimization-intensive LLVM optimizer running on separate threads dynamically improves code for higher performance. With the hybrid QEMU+LLVM approach, HQEMU can achieve low translation overhead and good translated code quality.

HQEMU supports process-level emulation and full-system virtualization. It provides translation modes of running the QEMU translator and LLVM optimizer in one process, or running the LLVM optimizer as a stand-alone optimization server (version 0.13.0).

The sources are about 30 MB.

The documentation for these sources runs to 111 pages:

Efficient and Retargetable Dynamic Binary Translation
  • Ding-Yong Hong
    April 2013
    Computer Science
    National Tsing Hua University

[x86asm intel syntax] `mov` with a symbol from a .set directive not handled c...
Why does this simple assembly program work in AT&T syntax but not Intel syntax?
[llvm-bugs] [Bug 32530] New: inline assembly incompatibility between gcc and clang - mov with offset in intel dialect
[X86][AsmParser] re-introduce 'offset' operator

[llvm-dev] LLVM IR to C++
llvm ir back to human-readable source language?
[llvm-dev] llvm IR to C/C++ conversion

the GNU Assembler, for GAS version 2.30

SATURN -- Software Deobfuscation Framework Based on LLVM

The strength of obfuscated software has increased over the recent years. Compiler based obfuscation has become the de facto standard in the industry and recent papers also show that injection of obfuscation techniques is done at the compiler level. In this paper we discuss a generic approach for deobfuscation and recompilation of obfuscated code based on the compiler framework LLVM. We show how binary code can be lifted back into the compiler intermediate language LLVM-IR and explain how we recover the control flow graph of an obfuscated binary function with an iterative control flow graph construction algorithm based on compiler optimizations and SMT solving. Our approach does not make any assumptions about the obfuscated code, but instead uses strong compiler optimizations available in LLVM and Souper Optimizer to simplify away the obfuscation. Our experimental results show that this approach can be effective to weaken or even remove the applied obfuscation techniques like constant unfolding, certain arithmetic-based opaque expressions, dead code insertions, bogus control flow or integer encoding found in public and commercial obfuscators. The recovered LLVM-IR can be further processed by custom deobfuscation passes that are now applied at the same level as the injected obfuscation techniques or recompiled with one of the available LLVM backends. The presented work is implemented in a deobfuscation tool called SATURN.

Comments: reverse engineering, llvm, code lifting, obfuscation, deobfuscation, static software analysis, binary recompilation, binary rewriting
Subjects: Cryptography and Security (cs.CR); Symbolic Computation (cs.SC)
Journal reference: 3rd International Workshop on Software PROtection, Nov 2019, London, United Kingdom
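The abstract names constant unfolding, arithmetic-based opaque expressions and bogus control flow. Below is a small hand-written C illustration of all three (hypothetical; not taken from SATURN or from any particular obfuscator). The point of lifting such code to LLVM-IR is that compiler optimizations and SMT-based tools such as Souper can prove the predicate constant and fold the function back to something like plain_add42; whether a given LLVM version folds this exact pattern on its own is not claimed here. All function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical obfuscated form of "return a + 42;" (not SATURN output). */
uint32_t obfuscated_add42(uint32_t a, uint32_t x)
{
    /* constant unfolding: 42 is rebuilt from two unrelated-looking constants */
    uint32_t k = 0x2a5u ^ 0x28fu;
    /* opaque predicate: x * (x + 1) is a product of consecutive integers, hence
     * even, and wraparound modulo 2^32 keeps the low bit, so this is always true */
    if (((x * (x + 1u)) & 1u) == 0u)
        return a + k;
    /* bogus control flow: never executed */
    return a * 7u - 3u;
}

/* What a successful deobfuscation should reduce the function above to. */
uint32_t plain_add42(uint32_t a)
{
    return a + 42u;
}

/* Exhaustive check of the opaque predicate over all 16-bit x
 * (a sanity check; the parity argument covers all 32-bit x). */
int opaque_predicate_holds_16bit(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        if (((x * (x + 1u)) & 1u) != 0u)
            return 0;
    }
    return 1;
}
```

For any x, obfuscated_add42(a, x) returns the same value as plain_add42(a); the second parameter only exists to feed the opaque predicate.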


Creating an LLVM Backend for the Cpu0 Architecture, Release 3.9.1
Book example code:
The example code lbdex.tar.gz is available at http://jonathan2251.github.io/lbd/lbdex.tar.gz

LLVM, by Chris Lattner (The Architecture of Open Source Applications): https://www.aosabook.org/en/llvm.html

Build your first LLVM Obfuscator

Writing LLVM Pass in 2018 — Part I
Writing LLVM Pass in 2018 — Part II
Writing LLVM Pass in 2018 — Part III
Writing LLVM Pass in 2018 — Part IV
LLVM — Writing Pass Instrumentations for the New PassManager
 

mak

I looked into SMT combined with LLVM: there is little material, a couple of PDFs, and those are too theoretical. The other projects, as discussed earlier, do a triple conversion into different bytecodes :D

Verifying Optimizations using SMT Solvers - LLVM
Static analysis tools for LLVM IR
It is possible to use LLVM Bytecode as Z3 input?


Computing change of invariants to support software evolution
Table 5.1: LLVM IR to Z3 translation examples

Instruction type  | LLVM IR                              | Z3 translation
------------------+--------------------------------------+--------------------------
Binary operations | x = add nsw i32 y, z                 | x.int = y.int + z.int
                  | x = sub nsw i32 y, 1                 | x.int = y.int - 1
                  | x = fsub double y, 3.200000e+01      | x.real = y.real - "32.0"
                  | x = fmul float y, 0x3FF3AE147AE147AE | x.real = y.real * "1.23"
                  | x = fdiv float y, z                  | x.real = y.real / z.real
Memory access     | store i32 3, i32* x                  | x.int = 3
                  | x = load i32, i32* y                 | x.int = y.int
                  | store float 1.000000e+00, float* x   | x.real = "1.0"
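The translation in Table 5.1 is essentially a lookup from opcode to arithmetic operator plus a choice of sort (integer vs. real). Below is a hypothetical C sketch of that idea that prints SMT-LIB 2 text (the textual input format Z3 accepts) for a single binary instruction. The x_int/x_real naming, the function names and the opcode list are assumptions for illustration. Like the table, it ignores fixed-width wraparound and floating-point rounding, and integer division is left out because SMT-LIB's Int div does not round like LLVM's sdiv. A complete translator would also need a declare-const line for every variable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming: SSA values become <name>_int (sort Int) or
 * <name>_real (sort Real); numeric literals are emitted as-is. */
static void format_operand(const char *op, const char *suffix,
                           char *buf, size_t len)
{
    if (isdigit((unsigned char)op[0]))
        snprintf(buf, len, "%s", op);            /* literal, e.g. 1 or 32.0 */
    else
        snprintf(buf, len, "%s_%s", op, suffix); /* SSA value, e.g. y_int */
}

/* Emits "(assert (= dst (op lhs rhs)))" for "dst = opcode ty lhs, rhs".
 * Returns 0 on success, -1 for unsupported opcodes or a too-small buffer. */
int translate_binop(const char *opcode, const char *dst,
                    const char *lhs, const char *rhs,
                    char *out, size_t out_len)
{
    static const struct {
        const char *opcode, *smt_op, *suffix;
    } map[] = {
        { "add",  "+", "int"  }, { "sub",  "-", "int"  }, { "mul",  "*", "int"  },
        { "fadd", "+", "real" }, { "fsub", "-", "real" },
        { "fmul", "*", "real" }, { "fdiv", "/", "real" },
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(opcode, map[i].opcode) != 0)
            continue;
        char d[64], l[64], r[64];
        format_operand(dst, map[i].suffix, d, sizeof d);
        format_operand(lhs, map[i].suffix, l, sizeof l);
        format_operand(rhs, map[i].suffix, r, sizeof r);
        int n = snprintf(out, out_len, "(assert (= %s (%s %s %s)))",
                         d, map[i].smt_op, l, r);
        return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
    }
    return -1;
}
```

For the first row of the table, translate_binop("add", "x", "y", "z", buf, sizeof buf) fills buf with (assert (= x_int (+ y_int z_int))).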

Playing with SMT solvers — ALIVe
https://kristerw.blogspot.com/2015/02/playing-with-smt-solvers-alive.html

I have lately seen many references to projects using SMT solvers to do cool things, and I'm planning to look at some of these to see what they can do/how they work/what restrictions they have.

ALIVe (Automatic LLVM InstCombine Verifier) is such a project. John Regehr has a longish description of what it does and the benefits of doing that in his blog, but the idea is that you create an LLVM peephole optimization pass by writing rules transforming LLVM IR, such as
%1 = shl i32 %x, C ; shift left
%2 = lshr i32 %1, C ; logical shift right
=>
%2 = and i32 %x, (1<<(32-C))-1

ALIVe will prove that the transformation is valid using the Z3 SMT solver, and generate C++ code for a LLVM pass that implements it. This should eliminate bugs in this kind of optimizations.
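Alive proves the rule with Z3 for every 32-bit input. For a small bit width the same claim can be checked by brute-force enumeration, which is a cheap sanity check rather than a proof for i32. The hypothetical C sketch below (function name made up) enumerates every 8-bit x and every shift amount C from 0 to 7 with the mask written as (1<<(8-C))-1. Passing buggy_parens = 1 instead uses 1<<((8-C)-1), which is how C's operator precedence would read 1<<(8-C)-1 without the extra parentheses; that variant produces counterexamples.

```c
#include <assert.h>
#include <stdint.h>

/* Counts counterexamples to the 8-bit analogue of the rule
 *   %1 = shl i8 %x, C
 *   %2 = lshr i8 %1, C
 *   =>
 *   %2 = and i8 %x, mask
 * over every x and every shift amount C in 0..7 (C >= 8 makes shl poison,
 * so those cases are skipped). With buggy_parens == 0 the mask is
 * (1<<(8-C))-1; otherwise it is 1<<((8-C)-1). */
unsigned count_counterexamples_i8(int buggy_parens)
{
    unsigned bad = 0;
    for (unsigned c = 0; c < 8; c++) {
        /* computed in unsigned int, so 1u << 8 (C == 0) is well defined */
        uint8_t mask = buggy_parens ? (uint8_t)(1u << (8 - c - 1))
                                    : (uint8_t)((1u << (8 - c)) - 1u);
        for (unsigned x = 0; x < 256; x++) {
            uint8_t shl  = (uint8_t)(x << c);   /* shl i8: bits above bit 7 are dropped */
            uint8_t lshr = (uint8_t)(shl >> c); /* lshr i8: zeros shifted in from the left */
            if (lshr != (uint8_t)(x & mask))
                bad++;
        }
    }
    return bad;
}
```

Enumeration stops being practical at i32 (2^32 values of x times 32 shift amounts), which is why Alive hands the question to an SMT solver instead.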

An interesting project: the PAGAI static analyser
Usage

PAGAI takes as input LLVM bitcode. Bitcode can be obtained from C programs using Clang. To compile a C file into LLVM bitcode, you can run the script located in the scripts directory of the Git repository - https://gricad-gitlab.univ-grenoble-alpes.fr/pagai/pagai


Example
How to analyse the following C program :
~$ more gopan.c
#include "../pagai_assert.h"
int main() {
    int x = 0;
    int y = 0;
    while (1) {
        if (x <= 50)  {
            y++;
        } else y--;
        if (y < 0) break;
        x++;
    }
    assert(x+y<=101);
    assert(x <= 102);
}
First, we have to compile this program into LLVM bitcode. We can either use our script or call clang directly:

~$ clang -emit-llvm -g -c gopan.c -o gopan.bc
or
~$ compile_llvm.sh -g -i gopan.c -o gopan.bc
We can now run PAGAI on the resulting bitcode file. It outputs the C source code with annotations: "safe" means that the numerical operation cannot overflow, and "assert OK" means that the assert statement is proved correct.

~$ pagai -i gopan.bc
#include "../pagai_assert.h"
int main() {
    int x = 0;
    int y = 0;

    /* reachable */
    while (1) {
        /* invariant:
        102-x-y >= 0
        y >= 0
        x-y >= 0
        */
        if (x <= 50)  {
            // safe
            y++;
        } else // safe
               y--;
   

        if (y < 0) break;
        // safe
        x++;
    }
   
    // safe
    /* assert OK */
    assert(x+y<=101);
    /* assert OK */
    assert(x <= 102);
/* reachable */
}
PAGAI can also output an annotated LLVM bitcode file with the invariants stored as LLVM metadata, so that they can be used by external tools.

See pagai --help for a full list of options.
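PAGAI establishes these invariants statically. For contrast, here is a hypothetical C sketch (the function name check_gopan_invariants is made up) that runs the same loop and tests, at every loop head, the invariant PAGAI printed, then evaluates the two asserts. Because gopan.c has no inputs there is exactly one execution, so this dynamic check does cover it; in general, testing only samples executions, whereas PAGAI's analysis covers all of them.

```c
#include <assert.h>

/* Runs the gopan.c loop and checks, at every loop head, the invariant
 * 102-x-y >= 0, y >= 0, x-y >= 0 reported by PAGAI.
 * Returns 1 if the invariant and both asserts held, 0 otherwise;
 * on success the final values of x and y are stored in *out_x, *out_y. */
int check_gopan_invariants(int *out_x, int *out_y)
{
    int x = 0, y = 0;
    while (1) {
        /* invariant reported at the loop head */
        if (!(102 - x - y >= 0 && y >= 0 && x - y >= 0))
            return 0;
        if (x <= 50)
            y++;
        else
            y--;
        if (y < 0)
            break;
        x++;
    }
    *out_x = x;
    *out_y = y;
    /* the two asserts PAGAI reported as "assert OK" */
    return (x + y <= 101) && (x <= 102);
}
```

The loop exits with x = 102 and y = -1, so x+y<=101 holds with equality, which is why PAGAI needs the relational invariant 102-x-y >= 0 rather than separate bounds on x and y.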

Souper is a superoptimizer for LLVM IR. It uses an SMT solver to help identify missing peephole optimizations in LLVM's midend optimizers.
https://github.com/google/souper (uses Z3; the sources build on Linux and macOS)

After following the above instructions, you will have a Souper executable in /path/to/souper-build/souper and a Clang executable in /path/to/souper/third_party/llvm/$buildtype/bin/clang. You can use the Clang executable to create an LLVM bitcode file like this:

$ /path/to/clang -emit-llvm -c -o /path/to/file.bc /path/to/file.c

For example:

$ /path/to/souper -z3-path=/usr/bin/z3 /path/to/file.bc

Souper will extract SMT queries from the bitcode file and pass them to a solver. Unsatisfiable queries (which represent missed optimization opportunities) will cause Souper to print its internal representation of the optimizable expression along with the shorter expression that refines the original one.

Alternatively, you can let Souper modify the bitcode directly and apply the missed optimization opportunities by using the Souper LLVM opt pass. When loaded, the pass will automatically register itself to run after LLVM's regular peephole optimizations.

For example:

$ /path/to/clang -Xclang -load -Xclang /path/to/libsouperPass.so \
-mllvm -z3-path=/usr/bin/z3 /path/to/file.c


Or to run the pass on its own:

$ /path/to/opt -load /path/to/libsouperPass.so -souper \
-z3-path=/usr/bin/z3 -o /path/to/file.opt.bc \
/path/to/file.bc



For C# fans: working with LLVM from .NET
LLVMSharp - https://github.com/microsoft/LLVMSharp
LLVMSharp is a multi-platform .NET Standard library for accessing the LLVM infrastructure. The bindings are auto-generated using ClangSharp parsing LLVM-C header files.

Three new articles added to the opening post:

Writing LLVM Pass in 2018 — Part III
Writing LLVM Pass in 2018 — Part IV
LLVM — Writing Pass Instrumentations for the New PassManager
 

mak

Comparison of the LLVM IR generated by three binary-to-llvm translators
https://adalogics.com/blog/binary-to-llvm-comparison

In recent years there has been a significant increase in interest in the LLVM project from security practitioners in both industry and academia. The reason for this is largely that the LLVM project offers a convenient way to develop novel program analysis tools and techniques, allowing researchers to rapidly prototype new systems. This has resulted in a significant amount of work being contributed to the project and has attracted more and more engineers, resulting in compound-like growth. There are many security-oriented projects that rely heavily on LLVM, some of the most popular being KLEE, AddressSanitizer, LibFuzzer, S2E and PANDA. In addition, LLVM has a large body of analysis capabilities integrated into the compiler framework itself, which can perform a broad series of optimisations, security analyses and so on. The analysis capabilities that come with the LLVM framework have also attracted practitioners from the binary analysis domain. Although LLVM is a compiler framework and thus largely focused on forward-engineering, the project and its surrounding tools offer such a rich set of analysis capabilities that it is attractive to use them in a reverse engineering context. To this end, several tools are being developed that translate binary code into the LLVM intermediate representation (IR), sometimes called binary lifters or binary raisers. However, despite these tools sharing the same overall goal, namely translation of binary executables to LLVM modules, they each have very different properties and maturity levels. As such, it can be difficult to assess which tool is most likely the best solution to a given problem. In this blogpost we share some brief insights into the code produced by some of these lifters through an empirical comparison of the LLVM code created by three different translators when given the same binary code samples.
We have picked three popular binary-to-llvm translators, namely mcsema by Trail of Bits, mctoll (acronym for machine code to LLVM I believe) by Microsoft and retdec by Avast. The goal of this blogpost is to focus purely on the generated LLVM IR, rather than creating an overall assessment of each project on their strengths and weaknesses. Perhaps most importantly, we are not going to investigate their overall maturity and how each of them work against a large and diverse sample database as this deserves a study on its own.
The procedure we will use to assess the projects is simply to translate a given binary with each of the three projects and then inspect the LLVM code produced by each of them. Specifically, the steps we follow when assessing the code are:

  • Create source code in C
  • Compile C code to: (1) binary; (2) non-optimised LLVM code and (3) optimised LLVM code
  • Translate binary to LLVM IR with each of McSema, mctoll, retdec
  • Perform a qualitative comparison between the LLVM IR obtained
In total we have created six small C code samples for our study. We will go into detail on two of these samples and leave the results from the other samples as a reference point for those who are interested.
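To reproduce the first two steps of this procedure on your own input, a sample only needs a little control flow so that the lifted IR has something to compare. The function below is a hypothetical sample (not one of the six from the blogpost): a loop with a branch inside it.

```c
#include <assert.h>

/* Hypothetical comparison sample: sums the odd elements of an array.
 * The loop plus the inner branch give the lifters a small CFG to recover. */
int sum_of_odd(const int *values, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        if (values[i] % 2 != 0)   /* odd, including negative odd values */
            sum += values[i];
    }
    return sum;
}
```

Once a main that calls sum_of_odd is added (file names below are placeholders), the binary and the two reference IR files can be produced with standard clang flags:

$ clang -O0 sample.c -o sample
$ clang -O0 -S -emit-llvm sample.c -o sample-O0.ll
$ clang -O2 -S -emit-llvm sample.c -o sample-O2.ll

The binary is then fed to each lifter, and the resulting IR is compared against sample-O0.ll and sample-O2.ll.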
 