Intel(R) Processor Trace - Intel PT

mak

Straw Sandals
Administrator
Intel(R) Processor Trace - Intel PT (collected thread)
Intel® Processor Trace (Intel® PT) is a new feature coming in future processors that can be enormously helpful in debugging: it exposes an accurate and detailed trace of activity, with triggering and filtering capabilities that help isolate the tracing that matters. We released specifications recently, a library is now available to enable tool development, and a talk this week covered the work to make these capabilities available in Linux.

Intel has released a library, along with sample tools, to enable use of Intel® Processor Trace (Intel® PT): the "Processor Trace Decoder Library", available as a free download. I can tell you a little about this project, and I will also explain Intel PT to motivate the decoding capabilities of the Processor Trace Decoder Library.

The project can support any operating system that is itself enabled for Intel PT. Intel PT is presented as a performance event, so support in an operating system is easy to detect by checking whether that event is available to configure and use. Changes for Linux have been worked on; the status of some Linux work was presented this week. In time, I expect other operating systems, including Windows and OS X, to include support for Intel PT too, and the Processor Trace Decoder Library is ready for that: it has been verified to build on Linux, Windows and OS X.
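As a minimal sketch of that detection step on Linux: perf event sources are listed as directories under /sys/bus/event_source/devices, and the Intel PT event source registers there as intel_pt. The function name and the pmu_root parameter are illustrative choices, not part of any Intel tool.

```python
from pathlib import Path
from typing import Optional

# Assumption: Linux lists perf event sources (PMUs) as directories under
# this sysfs path, and Intel PT appears there as "intel_pt".
DEFAULT_PMU_ROOT = Path("/sys/bus/event_source/devices")


def intel_pt_pmu_type(pmu_root: Path = DEFAULT_PMU_ROOT) -> Optional[int]:
    """Return the perf event type id of the intel_pt PMU, or None if absent."""
    type_file = pmu_root / "intel_pt" / "type"
    try:
        return int(type_file.read_text().strip())
    except (OSError, ValueError):
        # Missing directory, unreadable file, or unexpected content.
        return None


if __name__ == "__main__":
    pmu_type = intel_pt_pmu_type()
    if pmu_type is None:
        print("Intel PT event source not found")
    else:
        print(f"Intel PT available, perf event type {pmu_type}")
```

The returned number is the dynamic event type a tool would pass to the perf interface; a result of None simply means this kernel or CPU does not expose the event.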

The project for Processor Trace Decoder Library contains a library for decoding Intel PT together with sample implementations of simple tools built on top of the library that show how to use the library in your own tool. The following are included in the download:
  • libipt: A packet encoder/decoder library plus a document describing the usage of the decoder library.
  • Optional Contents and Samples:
    • ptdump: Example implementation of a packet dumper.
    • ptxed: Example implementation of a trace disassembler.
    • pttc: A trace test generator.
    • script: A collection of scripts.
Processor Trace

Intel recently released details about Intel Processor Trace in the latest Intel® Architecture Instruction Set Extensions Programming Reference as Chapter 11. Intel Processor Trace is a low-overhead execution tracing feature that will be supported by some processors in the future. It works by capturing information about software execution on each hardware thread using dedicated hardware facilities, so that after execution completes, software can process the captured trace data and reconstruct the exact program flow. Intel PT is not free with respect to execution overhead, but the overhead is low enough that it should work well in production builds for most applications.

The captured information is collected in data packets. The first implementation of Intel PT offers control flow tracing, which includes in these packets timing and program flow information (e.g. branch targets, branch taken/not taken indications) and program-induced mode related information (e.g., Intel® TSX state transitions, CR3 changes). These packets may be buffered internally before being sent to the memory subsystem.

Why is this useful?

Intel PT provides the context around all kinds of events. Performance profilers can use PT to discover the root causes of 'response-time' issues - performance issues which affect the quality of execution, if not the overall runtime. For example, using PT, video application developers can explore, in very fine detail, the execution of problematic individual frames, something not generally possible with more traditional sampling-based collection.

Furthermore, the complete tracing provided by Intel PT enables a much deeper view into execution than has previously been commonly available; for example, loop behavior, from entry and exit down to specific backedges and loop tripcounts, is easy to extract and report.

Debuggers can use it to reconstruct the code flow that led to the current location, whether this is a crash site, a breakpoint, a watchpoint, or simply the instruction following a function call we just stepped over. They may even allow navigating in the recorded execution history via reverse stepping commands.

Another important use case is debugging stack corruptions. When the call stack has been corrupted, normal frame unwinding usually fails or may not produce reliable results. Intel PT can be used to reconstruct the stack back trace based on actual CALL and RET instructions.
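A rough sketch of that idea, assuming a decoder has already turned the trace into a stream of call and return events. The (kind, address) tuples here are a hypothetical representation chosen for illustration, not the output format of any real decoder.

```python
def reconstruct_call_stack(events):
    """Replay (kind, address) events and return the live call stack.

    kind is "call" (address is the call target) or "ret" (address ignored).
    The result lists the targets of calls that have not returned yet,
    outermost first.
    """
    stack = []
    for kind, address in events:
        if kind == "call":
            stack.append(address)
        elif kind == "ret":
            # A RET without a recorded CALL belongs to a frame that was
            # entered before tracing started, so there is nothing to pop.
            if stack:
                stack.pop()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return stack


# Hypothetical trace: call 0x1000, call 0x2000, return, call 0x3000.
trace = [("call", 0x1000), ("call", 0x2000), ("ret", None), ("call", 0x3000)]
print([hex(a) for a in reconstruct_call_stack(trace)])
```

Because the stack is rebuilt from executed instructions rather than from memory, it does not depend on the saved return addresses that a stack corruption may have overwritten.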

Operating systems could include Intel PT traces in core files. This would allow debuggers not only to inspect the program state at the time of the crash, but also to reconstruct the control flow that led to the crash. It is also possible to extend this to the whole system to debug kernel panics and other system hangs. Intel PT can trace globally, so that when an operating system crash occurs the trace can be saved as part of an operating system crash dump mechanism and then used later to reconstruct the failure.

Intel PT can also help to narrow down data races in multi-threaded operating system and user program code. It can log the execution of all threads with a rough time indication. While it is not precise enough to detect data races automatically, it can give enough information to aid in the analysis.

Trace Buffer Management

The trace data can be collected into circular buffers provided by the operating system. To simplify memory management and to make it easier for the operating system to find a suitably large piece of memory, the buffer need not be contiguous.

The logical buffer consists of a collection of memory pages and a control structure that describes the page layout. The operating system may configure Intel PT to generate an interrupt when any of the sections is near full.

This enables a variety of different use cases:
  • a single circular buffer
  • a single buffer with copy-out
  • a single buffer with copy-out section by section
Intel PT generates too much data to store a long-running execution trace on disk, but shorter snippets can be saved.
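The buffer layout described above can be modeled in a few lines. This is only a toy simulation: the class name, the callback, and the byte-at-a-time write are assumptions made for illustration, and the callback stands in for the interrupt the hardware can raise when a section fills up.

```python
class SectionedTraceBuffer:
    """Toy model of a logical trace buffer built from separate sections."""

    def __init__(self, section_sizes, on_section_full=None):
        # Each bytearray stands in for one physically separate memory
        # region; sizes are assumed to be positive.
        self.sections = [bytearray(size) for size in section_sizes]
        self.on_section_full = on_section_full
        self.section = 0  # index of the section currently being written
        self.offset = 0   # write offset inside that section

    def write(self, data: bytes) -> None:
        for byte in data:
            current = self.sections[self.section]
            current[self.offset] = byte
            self.offset += 1
            if self.offset == len(current):
                # Stand-in for the "section full" interrupt: give the
                # owner a chance to copy the section out.
                if self.on_section_full is not None:
                    self.on_section_full(self.section, bytes(current))
                # Move to the next section, wrapping around so the
                # sections together behave like one circular buffer.
                self.section = (self.section + 1) % len(self.sections)
                self.offset = 0


copied = []
buf = SectionedTraceBuffer([4, 2], lambda index, data: copied.append((index, data)))
buf.write(b"abcdefgh")
print(copied)
```

Without a callback the model is a plain circular buffer that overwrites old data; with a callback that saves each section it becomes the copy-out, section-by-section variant.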

Intel is enabling Linux to provide support for Intel PT through the perf_event interface.

Execution flow reconstruction

Intel PT uses a compact format to store the execution trace. It omits everything that can be deduced directly from the code or from earlier trace data.

You can compare this with a brief list of instructions for navigating a maze. As long as the way is obvious, you simply follow the twists and turns of the maze. When you come to a junction, you need to know whether to turn left or right. In order to navigate the maze, all you really need is a short list of left or right directions. Similarly, Intel PT uses a single bit to indicate whether a conditional branch has been taken or not taken. Unconditional jumps and linear code are not represented in the trace at all.
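The maze analogy can be made concrete with a toy control-flow graph. Everything here (the block names, the CFG dictionary, the replay function) is a hypothetical illustration: a real decoder walks actual machine instructions, but the principle of consuming one taken/not-taken bit per conditional branch is the same.

```python
# Hypothetical toy program. Each block either continues unconditionally
# ("jmp"), branches conditionally ("cond"), or ends the program ("end").
CFG = {
    "entry": ("jmp", "loop"),
    "loop": ("cond", "body", "exit"),  # (kind, taken target, not-taken target)
    "body": ("jmp", "loop"),
    "exit": ("end",),
}


def replay(cfg, start, tnt_bits):
    """Rebuild the executed block sequence from taken/not-taken bits."""
    bits = iter(tnt_bits)
    path = [start]
    block = start
    while True:
        node = cfg[block]
        if node[0] == "end":
            return path
        if node[0] == "jmp":
            # Obvious way forward: nothing is needed from the trace.
            block = node[1]
        else:
            # Junction: consume exactly one bit from the trace.
            try:
                taken = next(bits)
            except StopIteration:
                return path  # the trace ends here
            block = node[1] if taken else node[2]
        path.append(block)


print(replay(CFG, "entry", [1, 1, 0]))
```

Three bits are enough to describe two loop iterations followed by the loop exit; the unconditional jumps cost nothing in the trace.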

The PT trace consists of a sequence of packets (which come in different types). To represent a selection of conditional branches, for example, Intel PT uses the TNT packet that comes in two different sizes: 8 bits and 64 bits. For reconstructing execution flow, there are a few more things to consider such as indirect branches, function returns, or interrupts. To model these, Intel PT adds more packets like TIP for indirect branches and function returns, and FUP for asynchronous event locations. An interrupt will then be represented as a FUP followed by a TIP, giving the source and destination of the asynchronous branch, respectively. Intel PT also gives information about transactional synchronization. Whenever a transaction is started, committed, or aborted, Intel PT will generate two packets: a MODE.TSX packet giving the new transactional state, and a FUP packet giving the code location at which the new state is effective. For a transaction abort, an additional TIP packet will be generated giving the location of the corresponding abort handler.
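The packet combinations described above can be sketched as a small interpreter over an already-parsed packet list. The tuple representation and event names are assumptions made for this example; real packets are binary, there are more packet types than shown, and a production decoder such as libipt handles many more cases.

```python
def summarize(packets):
    """Turn a list of (kind, payload) packets into higher-level events."""
    events = []
    i = 0
    while i < len(packets):
        kind, payload = packets[i]
        following = packets[i + 1] if i + 1 < len(packets) else None
        if kind == "tnt":
            # One event per conditional branch outcome.
            events.extend(("cond", bool(bit)) for bit in payload)
        elif kind == "tip":
            # Target of an indirect branch or function return.
            events.append(("indirect", payload))
        elif kind == "fup" and following is not None and following[0] == "tip":
            # FUP + TIP: source and destination of an asynchronous branch.
            events.append(("async", payload, following[1]))
            i += 1
        elif kind == "mode.tsx" and following is not None and following[0] == "fup":
            # MODE.TSX + FUP: new transactional state and where it applies.
            events.append(("tsx", payload, following[1]))
            i += 1
        else:
            events.append(("unpaired", kind, payload))
        i += 1
    return events


stream = [
    ("tnt", [1, 0]),
    ("fup", 0x400100), ("tip", 0xFFFF0000),   # interrupt
    ("mode.tsx", "abort"), ("fup", 0x400200),  # transaction abort ...
    ("tip", 0x400300),                         # ... and its abort handler
]
for event in summarize(stream):
    print(event)
```

In this sketch the TIP that follows an abort is reported as an ordinary indirect target, which matches the description of it pointing at the abort handler.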

Please refer to the specification (Chapter 11 of the Intel® Architecture Instruction Set Extensions Programming Reference) for a full list of supported packets.

In order to reconstruct the execution flow, a decoder therefore needs to decode the instructions in the traced executable or library as well as the PT trace packets. To handle dynamic libraries, the decoder also needs to consider sideband information provided by the operating system.

Intel provides an open source reference implementation for decoding PT packets and for reconstructing the execution flow. The Processor Trace Decoder Library (a collection of tools and libraries to enable use of Intel® Processor Trace) is available as a free download. Intel is currently working to help enable support in GDB, the GNU* debugger. Additional integration with other tools is being considered as well.

Summary

Intel provides a low-overhead tracing feature that allows recording the execution flow and reconstructing it at a later time. This feature has applications for functional as well as for performance debugging.

Intel Docs:
Architecture instruction set extension programming reference (old)
Intel® 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4
(Chapter 31 - Intel Processor Trace)


Articles:
Enhance performance analysis with Intel Processor Trace - Part 1
Intel Processor Trace Part2. Better debugging experience - Part 2

Decoding Intel(R) Processor Trace Using libipt
INSPECTOR: Data Provenance Using Intel Processor Trace (PT)

Videos:
Intel Processor Trace on Linux, TracingSummit2015 - YouTube

Examples and libraries:
intel/libipt
libipt - an Intel(R) Processor Trace decoder library

intelpt/WindowsIntelPT
This driver implements the Intel Processor Trace functionality in Intel Skylake architecture for Microsoft Windows

Overview
Intel Processor Trace is a high-performance, hardware-supported branch tracing mechanism in the Intel Skylake architecture.

Primary benefits include:
  • Avoids cache and TLB pollution by writing directly to physical memory
  • Uses a compressed logging format that is suitable for long-running traces
  • Able to trace all branches on a CPU core, including userspace and kernel
Driver Features
  • Trace user processes using CR3 filtering
  • Trace kernel mode drivers using linear range filtering
  • Trace up to four arbitrary ranges of physical memory
  • Log to single physical address range
  • Log to table of physical pages and map to virtual address range
  • Multi-core tracing support
  • Full support for HyperV Root Partitions
Build Instructions
  • Open the included Visual Studio Project file in Visual Studio 2013 or 2015.
  • Ensure build options are set to x64 Release and build
Driver Loading Instructions
  • Ensure your CPU is Skylake architecture and you are running on native hardware (not a hypervisor)
  • Boot your Windows 8.1 or Windows 10 OS using boot options that allow loading test signed drivers
  • Install the WindowsPtDriver using sc create intelpt BinPath=%cd%\WindowsPtDriver\x64\Release\WindowsPtDriver.sys
Current Limitations

All threads in a usermode process log to a single buffer, making it difficult to determine accurate per-thread execution. This is something we are working to fix.

The IOCTLs for this driver must not be called from within the traced process. The driver maps the physical memory ranges holding the trace data into the process that initialized the trace, and this is unstable if they are mapped into the trace target. Use the included command line tool for executing traces against target processes.


ionescu007/winipt
The Windows Library for Intel Process Trace (WinIPT) is a project that leverages the new Intel Processor Trace functi…

DProvinciani/pt-detector
Code-Reuse Exploits detection using Intel Processor Trace
Computer Engineering - Thesis Project
Title: “Code-Reuse Exploits detection using the Intel Processor Trace technology”

Description: This project aims to use Intel PT to develop a mechanism that detects different kinds of code-reuse exploits (e.g., ret2libc, ROP, JOP) at runtime, based on the Control Flow Integrity (CFI) mitigation.

Credits: This project makes use of the following projects:

BlackLuny/WinIPTCollector
Intel Processor Trace package collector for Windows

intel/vmtaint
Full-VM taint analysis with Xen, Intel(R) Processor Trace and Triton.

mqf20/ghidra-PT
Tool for integrating Intel Processor Trace (PT) with Ghidra.

yogitapgarud/Intel-PT-work
Tracing Linux kernel with Intel Processor Trace

Mic92/tracedump
System service to dump Intel processor trace + memory after a crash.
 

Indy

Veteran
Mine took years of writing and tinkering; I wonder what this one is. Probably some junk from our Chinese friends.
 

savinnetsec

Newbie
I can't find the article, but I think AMD had some processors, for AM3 I believe, that gave low-level access to the hypervisor and were therefore convenient for debugging software, even before Ryzen and its vulnerabilities https://4cio.ru/news/view/6525. Can anyone point me to it?
 

mak

Straw Sandals
Administrator
https://github.com/BlackLuny/WinIPTCollector/tree/master/WinIPTCollector

Is this Windows patch protection, KPP, PatchGuard?

Some kind of mechanism for bypassing it, wtf?

There is nothing like it in the system sources, so what is this anyway?
This is one of the implementations of a packet decoder for Intel's hardware tracer. The packets are described here: Intel® 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4
31.1.1.1 Packet Summary
After a tracing tool has enabled and configured the appropriate MSRs, the processor will collect and generate trace
information in the following categories of packets (for more details on the packets, see Section 31.4)
 