Type system

04 Dec 2016, by mrexodia

This week there wasn’t much going on in the codebase, so I decided to skip the weekly digest and write a more substantial post, this time about the type system.

The goal of the type system is to provide a more powerful view of memory than just a linear stream of bytes. It can be used to visualize structures and it also supports function definitions that will be used later. Hopefully it’s an interesting read!

Internal representation

The internal representation of the types is inspired by the radare2 type profiles document by oddcoder.

Primitives

enum Primitive
{
    Void,
    Int8,
    Uint8,
    Int16,
    Uint16,
    Int32,
    Uint32,
    Int64,
    Uint64,
    Dsint,
    Duint,
    Float,
    Double,
    Pointer,
    PtrString, //char* (null-terminated)
    PtrWString //wchar_t* (null-terminated)
};

Complex types are built from primitive types (see the full list above). The Void primitive is not a real type (it cannot have a value) and it’s used as a special case. An alternative name would be Unknown but that was already taken.

All primitive types (except Void) have a fixed size, but that size is not defined as part of the primitive (abstractions love to be abstract). Notice that there is no Bit primitive, which means that bit fields and bit arrays are not supported in the current type system. There are two primitives to represent the common null-terminated string pointer types, mostly for the convenience of the user.

The generic Pointer type is equivalent to void* and can get a more specific meaning in the Type representation below.

Types

struct Type
{
    std::string owner; //Type owner
    std::string name; //Type identifier.
    std::string pointto; //Type identifier of *Type
    Primitive primitive; //Primitive type.
    int size = 0; //Size in bytes.
};

The actual representation of a type (for example a primitive type used inside a struct) is shown above. The comments should be pretty self-explanatory, but it is worth mentioning that the size member cannot be set by user-defined types directly. You can create your own (named) types, and for that you use one of the predefined internal types:

p("int8_t,int8,char,byte,bool,signed char", Int8, sizeof(char));
p("uint8_t,uint8,uchar,unsigned char,ubyte", Uint8, sizeof(unsigned char));
p("int16_t,int16,wchar_t,char16_t,short", Int16, sizeof(short));
p("uint16_t,uint16,ushort,unsigned short", Uint16, sizeof(unsigned short));
p("int32_t,int32,int,long", Int32, sizeof(int));
p("uint32_t,uint32,unsigned int,unsigned long", Uint32, sizeof(unsigned int));
p("int64_t,int64,long long", Int64, sizeof(long long));
p("uint64_t,uint64,unsigned long long", Uint64, sizeof(unsigned long long));
p("dsint", Dsint, sizeof(void*));
p("duint,size_t", Duint, sizeof(void*));
p("float", Float, sizeof(float));
p("double", Double, sizeof(double));
p("ptr,void*", Pointer, sizeof(void*));
p("char*,const char*", PtrString, sizeof(char*));
p("wchar_t*,const wchar_t*", PtrWString, sizeof(wchar_t*));

The p function simply binds all (comma-separated) type names to a Primitive and a size. The sizes are defined by your compiler implementation.
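As a rough illustration of what such a binding helper could look like, here is a minimal sketch. The TypeMap registry, the explicit types parameter and the "primitive" owner string are assumptions made for this example; the real p in x64dbg may be written differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

enum Primitive { Void, Int8, Uint8, Int16, Uint16, Pointer }; // shortened list

struct Type
{
    std::string owner;
    std::string name;
    std::string pointto;
    Primitive primitive = Void;
    int size = 0;
};

// Hypothetical registry for this sketch: type name -> Type.
using TypeMap = std::unordered_map<std::string, Type>;

// Bind every comma-separated name in `names` to the same primitive and size.
void p(TypeMap & types, const std::string & names, Primitive primitive, int size)
{
    assert(size > 0); // every primitive registered here has a fixed size
    std::istringstream stream(names);
    std::string name;
    while(std::getline(stream, name, ','))
    {
        Type t;
        t.owner = "primitive"; // assumption: marker for built-in types
        t.name = name;
        t.primitive = primitive;
        t.size = size;
        types[name] = t;
    }
}
```

Calling p(types, "uint16_t,uint16,ushort,unsigned short", Uint16, sizeof(unsigned short)) would register four names that all resolve to the same primitive and size.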

The owner member is used to represent what created the type. This will generally be the filename of the file it was loaded from, or cmd if the type was created with the commands.

The pointto member is used when primitive is Pointer and it’s the name of the type that the pointer points to. As an example, the type MyStruct* will have the following values:

t.owner = owner; //owner of MyStruct
t.name = "MyStruct*";
t.pointto = "MyStruct";
t.primitive = Pointer;
t.size = sizeof(void*); //predefined

The validPtr function will (recursively) create pointer type aliases if you use a construct like MyStruct* as part of checking if a type is defined.
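The recursive alias creation can be sketched roughly as follows. This is only an approximation under simplifying assumptions: it looks in a single hypothetical TypeMap dictionary (the real check presumably also considers structs), and the alias inherits the owner of the type it points to.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

enum Primitive { Void, Int32, Pointer }; // shortened list

struct Type
{
    std::string owner;
    std::string name;
    std::string pointto;
    Primitive primitive = Void;
    int size = 0;
};

// Hypothetical registry for this sketch: type name -> Type.
using TypeMap = std::unordered_map<std::string, Type>;

// Returns true if `id` names a known type, creating pointer aliases on the way.
bool validPtr(TypeMap & types, const std::string & id)
{
    if(types.count(id))
        return true; // already defined
    if(id.empty() || id.back() != '*')
        return false; // unknown and not a pointer construct
    const std::string pointee = id.substr(0, id.size() - 1);
    if(!validPtr(types, pointee)) // recurse: "int**" needs "int*", which needs "int"
        return false;
    assert(types.count(pointee) == 1); // guaranteed by the recursive call
    Type t;
    t.owner = types.at(pointee).owner; // assumption: alias inherits the owner
    t.name = id;
    t.pointto = pointee;
    t.primitive = Pointer;
    t.size = int(sizeof(void*));
    types[id] = t;
    return true;
}
```

With only int defined, asking for int** would first create int* and then int**, each pointing to the type one level below it.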

Members

struct Member
{
    std::string name; //Member identifier
    std::string type; //Type.name
    int arrsize = 0; //Number of elements if Member is an array
    int offset = -1; //Member offset (only stored for reference)
};

If you use a definition inside a complex type (think struct) it will use the Member representation from above. A member like int arrsize; will have the following values:

m.name = "arrsize";
m.type = "int";
m.arrsize = 0; //not an array
m.offset = -1; //unused, only for reference

If the arrsize member is bigger than zero it means that the member was an array of fixed size. For instance bool threadsDone[10];.

StructUnions

struct StructUnion
{
    std::string owner; //StructUnion owner
    std::string name; //StructUnion identifier
    std::vector<Member> members; //StructUnion members
    bool isunion = false; //Is this a union?
    int size = 0;
};

The definition of a struct (or union) shouldn’t be very surprising. A struct is simply a list of Member instances. The size member is used in the Sizeof function and is the combined size of all members. This means that there is no implicit alignment. When adding a member with a defined offset it will simply put an array of padding bytes to make up for the missing space. This also means that you cannot define members out of memory order. This is to prevent overlapping members and also to prevent lots of complexity that isn’t needed for most use cases.
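To make the padding rule concrete, here is a small sketch of how appending members could work. The addMember function, its parameters and the "padding" member name are made up for this example, and unions are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Member
{
    std::string name;
    std::string type;
    int arrsize = 0;
    int offset = -1;
};

struct StructUnion
{
    std::string owner;
    std::string name;
    std::vector<Member> members;
    bool isunion = false;
    int size = 0;
};

// Append a member whose element type is `typeSize` bytes (structs only).
// Returns false if `offset` would place the member out of memory order.
bool addMember(StructUnion & s, const std::string & name, const std::string & type,
               int typeSize, int arrsize = 0, int offset = -1)
{
    assert(typeSize > 0 && arrsize >= 0);
    if(offset >= 0)
    {
        if(offset < s.size)
            return false; // would overlap an earlier member
        if(offset > s.size)
        {
            Member pad;
            pad.name = "padding"; // hypothetical name for the filler bytes
            pad.type = "char";
            pad.arrsize = offset - s.size;
            s.members.push_back(pad);
            s.size = offset;
        }
    }
    Member m;
    m.name = name;
    m.type = type;
    m.arrsize = arrsize;
    m.offset = offset;
    s.members.push_back(m);
    s.size += typeSize * (arrsize > 0 ? arrsize : 1); // no implicit alignment
    return true;
}
```

Adding a one-byte member followed by a four-byte member at offset 4 would insert three padding bytes in between and give a total size of 8.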

Functions

struct Function
{
    std::string owner; //Function owner
    std::string name; //Function identifier
    std::string rettype; //Function return type
    CallingConvention callconv; //Function calling convention
    bool noreturn; //Function does not return (ExitProcess, _exit)
    std::vector<Member> args; //Function arguments
};

Functions are similar to structs, but they also have a return type and a calling convention. You can define functions (and their arguments), but they are (currently) not used by the GUI. In the future they can be used to provide argument information.

Where is the tree?

You might have noticed that the data structures don’t have a direct tree structure. The main reason for this is that trees are annoying to both represent and manipulate in C++. They are also annoying to serialize and considering that x64dbg uses JSON as a general format I decided to store everything in dictionaries and leave the trees implicit.

There are dictionaries for the Type, StructUnion and Function structures as described above. The type field inside Member for example is a key in either of these dictionaries and that is how the tree’s edges are represented. The tree nodes are the values in the dictionary.

Visitor

struct Visitor
{
    virtual ~Visitor() { }
    virtual bool visitType(const Member & member, const Type & type) = 0;
    virtual bool visitStructUnion(const Member & member, const StructUnion & type) = 0;
    virtual bool visitArray(const Member & member) = 0;
    virtual bool visitPtr(const Member & member, const Type & type) = 0;
    virtual bool visitBack(const Member & member) = 0;
};

The tree structure reappears in the Visitor. The visitMember function recursively walks a Member and its subtypes using depth-first search, and it calls one of the visitX functions to signal that a certain kind of node was visited. The visitBack function is called when a complex type subtree is left.

As an example, take the Ray structure:

struct Vec3
{
    int x;
    int y;
    int z;
};

struct Ray
{
    float speed;
    Vec3 direction;
    int lifetime;
};

The tree and the order the nodes are visited in can be visualized like this:

ray tree

The actual structure view in x64dbg will look like this:

ray struct
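The depth-first walk over the dictionaries can be sketched like this. The TypeManager struct, makeMember and OrderVisitor are names invented for this example, the structures are trimmed down, and arrays and pointers are left out to keep it short.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Type { std::string name; int size = 0; };
struct Member { std::string name; std::string type; int arrsize = 0; };
struct StructUnion { std::string name; std::vector<Member> members; };

struct Visitor
{
    virtual ~Visitor() { }
    virtual bool visitType(const Member & member, const Type & type) = 0;
    virtual bool visitStructUnion(const Member & member, const StructUnion & type) = 0;
    virtual bool visitBack(const Member & member) = 0;
};

// Hypothetical container for the dictionaries; the tree edges are the type names.
struct TypeManager
{
    std::unordered_map<std::string, Type> types;
    std::unordered_map<std::string, StructUnion> structs;

    bool visitMember(const Member & root, Visitor & visitor) const
    {
        assert(!root.type.empty()); // every member must name a type
        auto foundType = types.find(root.type);
        if(foundType != types.end())
            return visitor.visitType(root, foundType->second); // leaf node
        auto foundStruct = structs.find(root.type);
        if(foundStruct == structs.end())
            return false; // dangling edge: unknown type name
        if(!visitor.visitStructUnion(root, foundStruct->second))
            return false;
        for(const Member & child : foundStruct->second.members)
            if(!visitMember(child, visitor)) // depth first
                return false;
        return visitor.visitBack(root); // leaving the subtree
    }
};

Member makeMember(const std::string & name, const std::string & type)
{
    Member m;
    m.name = name;
    m.type = type;
    return m;
}

// Records the order in which nodes are visited.
struct OrderVisitor : Visitor
{
    std::vector<std::string> order;
    bool visitType(const Member & member, const Type &) override
    {
        order.push_back(member.name);
        return true;
    }
    bool visitStructUnion(const Member & member, const StructUnion &) override
    {
        order.push_back(member.name + " {");
        return true;
    }
    bool visitBack(const Member &) override
    {
        order.push_back("}");
        return true;
    }
};
```

For a member ray of type Ray, this visitor would record: ray {, speed, direction {, x, y, z, }, lifetime, }.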

Conclusion

This post has mostly highlighted the internal representation of the type system. For more information on how to actually use it in x64dbg you can check out Weekly Digest 14. If you have any questions, please leave a comment and I will try to address it.


Weekly digest 14

27 Nov 2016, by mrexodia

This is already number fourteen of the weekly digests! It will highlight the things that happened to and around x64dbg this week.

Types

There has been quite a lot of progress on the type system in the last few months, but it has now (sort of) come together and you can really start using it. Currently you can get types in the following ways:

Add them with commands;
Load them from JSON;
Load simple C++ headers.

If you want to show a structure (as seen below) you first have to load/parse the types and then you can ‘visit’ the type with an (optional) address to lay it over linear memory. Pointers are supported but the VisitType command has to be used with an explicit pointer depth to expand pointers.

This took all my time for the week, which is why this post is very short. The technical details are interesting though. The built-in type system has no/limited support for dynamic types (variable array sizes are not supported). This was needed to keep the structures simple and get started quickly. The GUI however is designed to be more generic and the API is much simpler.

typedef struct _TYPEDESCRIPTOR
{
    bool expanded; //is the type node expanded?
    bool reverse; //big endian?
    const char* name; //type name (int b)
    duint addr; //virtual address
    duint offset; //offset to addr for the actual location
    int id; //type id
    int size; //sizeof(type)
    TYPETOSTRING callback; //convert to string
    void* userdata; //user data
} TYPEDESCRIPTOR;

BRIDGE_IMPEXP void* GuiTypeAddNode(void* parent, const TYPEDESCRIPTOR* type);
BRIDGE_IMPEXP bool GuiTypeClear();

You can directly build the tree and a callback is provided to convert a TYPEDESCRIPTOR to a string value to display, which allows for great flexibility. Some possible use cases would be:

Parse types with clang and show them in the GUI;
Support Binary Templates;
Support Kaitai Struct.

In the future I want to add often-used types to a database and ship that with x64dbg. There will (eventually) be a blogpost describing everything in detail, but if you are interested you should come and talk to me on Telegram.

Fix log links and show suspected call stack frame

In pull request #1282, torusrxxx added an alternative view for the callstack (without using the dbghelp StackWalk function) that might help in certain situations with displaying possible return values. The hyperlinks in the logs of x32dbg are now also working again!

Finished layered loop implementation

You can now add (layered) loop markers with the loopadd command (undocumented). The API for plugins is DbgLoopAdd.

layered loops

Fixed ‘cannot get module filename’

Various people had issues with x64dbg showing ‘Cannot get module filename’ or ‘GetModuleFileNameExW failed’. These should now be fixed. In addition you can now properly debug executables from a (VirtualBox) network share on Windows XP (and older versions of Windows 7).

Allow for more customization

You can now customize more details of the graph, which allows for some nice themes. See Solarized Dark by Storm Shadow. There have also been various fixes with some color options not behaving correctly.

solarized dark graph

Usual things

That has been about it for this week again. If you have any questions, contact us on Telegram, Gitter or IRC. If you want to see the changes in more detail, check the commit log.

You can always get the latest release of x64dbg here. If you are interested in contributing, check out this page.

Finally, if someone is interested in hiring me to work on x64dbg more, please contact me!


xAnalyzer Reviewed

24 Nov 2016, by ThunderCls

Introduction

First of all I want to thank mrexodia for giving me the opportunity to be part of the x64dbg community, to collaborate on the project and even to write an entry for this blog. I’m known as ThunderCls and I come from a group of enthusiasts and reverse engineers called CrackSLatinoS, a big family led by a great cracker, exploit writer and person, Ricardo Narvaja.

In this post I intend to give a first look, from my perspective, at the task of working with the x64dbg plugin API to extend this awesome and modern debugger with some extra functionality.

Going back

About a year ago I had my first contact with x64dbg and, due to its simplicity and similarities with my first debugger (OllyDbg), I began using it for some debugging sessions. As an Olly user, though, I couldn’t help missing some of the features that Oleh’s debugger had; I’m referring in this case to the extra analysis OllyDbg does over API function calls and their arguments and values. I opened an issue on the project page asking for such a feature.

At the time, the development team and collaborators were not able to get into it. Instead I was given a couple of choices like APIInfo by mrfearless, and I even found another one, StaticAnalysis, written by tr4ceflow. Both of them were very close to what I wanted, but they still didn’t fulfil all of my cravings. About a month ago I came back to the x64dbg community to see how much the debugger had improved since my last contact with it. This made me interested in developing for and collaborating with the project, and so xAnalyzer came to be.

What is xAnalyzer?

xAnalyzer is an x64dbg plugin written by me to extend and/or complement the core analysis functionality of this debugger. The plugin was written to provide a feature that, at the time of writing this article, has not yet been implemented as built-in functionality in x64dbg, and I quote:

This plugin is based on mrfearless APIInfo Plugin code, although some improvements and additions have been included. xAnalyzer is capable of calling internal x64dbg commands to make various types of analysis… This plugin is going to make extensive API functions call detections to add functions definitions, arguments and data types as well as any other complementary information, something close at what you get with OllyDbg.

Basic functionality

As I said before, this plugin uses APIInfo as its base code, so most of its core functionality comes from mrfearless’ code. Apart from that, I wanted to go a little further than just translating his code into C++, so I came up with something closer to the features I originally wanted. The process of creating your own plugins for x64dbg is explained here, and the documentation and plugin templates for Visual Studio and several other compilers already exist, so I don’t intend to cover all of that in this post.

Anyway, the plugin works in a pretty straightforward way. The image below shows a flowchart of its main backbone functions.

The plugin starts by launching some of the internal analysis algorithms of x64dbg, such as: cfanal, exanal, analx, analadv or anal. Soon after that it goes into API call analysis. The plugin gets the start and end address of the section that the current CONTEXT is in, in order to loop over all of these bytes and analyze them. To process each instruction the plugin uses the DbgDisasmFastAt function, which has the following definition:

void DbgDisasmFastAt(duint addr, BASIC_INSTRUCTION_INFO* basicinfo);

addr: Address being disassembled.
basicinfo: Pointer to a struct of type BASIC_INSTRUCTION_INFO.

typedef struct
{
    DWORD type; //value|memory|addr
    VALUE_INFO value; //immediate
    MEMORY_INFO memory;
    duint addr; //addrvalue (jumps + calls)
    bool branch; //jumps/calls
    bool call; //instruction is a call
    int size;
    char instruction[MAX_MNEMONIC_SIZE * 4];
} BASIC_INSTRUCTION_INFO;

All the values we need are present in the returned structure. When a call is found, the plugin makes some checks to try to cover as many scenarios as it can. Some of these different schemes are:

CALL -> JMP -> API (Indirect Call)

indirect call

indirect jmp

CALL -> POINTER -> API (Indirect Call)

call pointer

infobox

CALL -> API (Direct Call)

call MessageBox

The plugin creates and emulates a stack for saving all of the possible function arguments. The candidate instructions are filtered in the function IsArgumentInstruction(). The code depends on the platform: for x86 an argument is any push instruction, except for push ebp, push esp, push ds and push es. Once a valid argument is found, it is saved to the global stack container.
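A text-based approximation of the x86 filter could look like the sketch below. It is not the plugin’s actual IsArgumentInstruction(): the function name, the string-based matching and the assumption that the mnemonic and operand are separated by a single space are all chosen just for this example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the x86 filter: true for "push <operand>",
// except when the operand is ebp, esp, ds or es.
bool isArgumentInstructionX86(std::string instruction)
{
    for(char & c : instruction)
        c = char(std::tolower(static_cast<unsigned char>(c)));
    const std::string prefix = "push ";
    if(instruction.compare(0, prefix.size(), prefix) != 0)
        return false; // not a push with an operand
    const std::string operand = instruction.substr(prefix.size());
    assert(instruction.size() >= prefix.size()); // follows from the prefix match
    static const char* const excluded[] = { "ebp", "esp", "ds", "es" };
    for(const char* reg : excluded)
        if(operand == reg)
            return false; // frame setup or segment push, not an argument
    return true;
}
```

With this filter, push eax and push 0x40 count as arguments, while push ebp and mov eax, 1 do not.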

On the other hand, x64 differs on this point: to be a valid candidate, an instruction has to be one of mov, lea, xor, or, and. An additional check has to be made, since x64 code doesn’t pass arguments with push instructions anymore. The x64 platform uses the registers RCX, RDX, R8 and R9 for the first four arguments, including the floating-point registers XMM0, XMM1, XMM2 and XMM3, and for the rest of the arguments it uses the stack [RSP + DISPLACEMENT]. So the check consists of verifying that these instructions use any of those, including the 32-, 16- and 8-bit variants. The stack is cleared when a function prolog or epilog is found, as well as on jumps (no jumps among arguments) and internal subs.

Finally, the key is that when a call is found, the plugin traverses the stack to find the valid arguments for it. Here once again x64 brings some differences to the table: for x64 function calls, the arguments might have been saved to the registers or the stack in no specific order, regardless of the argument order in the function definition. Another hack had to be made: on x64 the register order determines the argument order, so the scheme is:

RCX First argument of the function;
RDX Second argument of the function;
R8 Third argument of the function;
R9 Fourth argument of the function;
STACK ([RSP + DISPLACEMENT]) The rest of the arguments of the function, including the floating-point registers XMM0, XMM1, XMM2, XMM3.
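The register-to-position scheme can be sketched as a lookup table. The function names are invented for this example, register names are assumed to be lowercase, and the XMM registers are assumed to share positions with the integer registers as in the Microsoft x64 calling convention. Stack arguments are not handled here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the 1-based argument position for a register, or 0 if the
// register is not used for passing one of the first four arguments.
int argumentIndexX64(const std::string & reg)
{
    static const std::unordered_map<std::string, int> table = {
        { "rcx", 1 }, { "ecx", 1 }, { "cx", 1 }, { "cl", 1 }, { "xmm0", 1 },
        { "rdx", 2 }, { "edx", 2 }, { "dx", 2 }, { "dl", 2 }, { "xmm1", 2 },
        { "r8", 3 }, { "r8d", 3 }, { "r8w", 3 }, { "r8b", 3 }, { "xmm2", 3 },
        { "r9", 4 }, { "r9d", 4 }, { "r9w", 4 }, { "r9b", 4 }, { "xmm3", 4 },
    };
    auto found = table.find(reg);
    return found == table.end() ? 0 : found->second;
}

// Given the destination registers in the order they were written before a
// call, return them sorted by argument position (empty string = not set).
std::vector<std::string> orderArguments(const std::vector<std::string> & written)
{
    std::vector<std::string> slots(4);
    for(const std::string & reg : written)
    {
        const int index = argumentIndexX64(reg);
        assert(index >= 0 && index <= 4);
        if(index != 0)
            slots[index - 1] = reg; // a later write to the same slot wins
    }
    return slots;
}
```

If the code before a call writes r9d, rcx, r8 and edx in that order, the sketch returns rcx, edx, r8, r9d, matching the argument order of the function definition.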

With that taken into consideration, the rest is easy. Following the same approach as the APIInfo plugin, xAnalyzer has a folder that should contain all the API definition files, in .ini format, with the following structure:

Filename: This is the name of the module in which the API function is located, with the extension .api (kernel32.api, shell32.api, etc.)

A single entry in any of these files would be like:

[MessageBoxA]
1=HWND hWnd
2=LPCSTR lpText
3=LPCSTR lpCaption
4=UINT uType
ParamCount=4
@=MessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType);

All of these definition files may be customized and populated by each user following the pattern shown above. If you find that a certain API call is not being detected by xAnalyzer, it might mean that it’s not present in the definition files, in which case the missing function can simply be added.
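Reading entries in this format could be sketched as follows. This is not xAnalyzer’s actual loader: ApiDefinition and parseApiDefinitions are names made up for this example, and the ParamCount key is ignored because the count follows from the numbered keys.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one [Section] in an .api file.
struct ApiDefinition
{
    std::vector<std::string> params; // "HWND hWnd", "LPCSTR lpText", ...
    std::string prototype;           // value of the "@" key
};

// Parse the text of a definition file into a map keyed by function name.
std::map<std::string, ApiDefinition> parseApiDefinitions(const std::string & text)
{
    std::map<std::string, ApiDefinition> apis;
    ApiDefinition* current = nullptr;
    std::istringstream stream(text);
    std::string line;
    while(std::getline(stream, line))
    {
        if(!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate Windows line endings
        if(line.size() >= 2 && line.front() == '[' && line.back() == ']')
        {
            current = &apis[line.substr(1, line.size() - 2)]; // new section
            continue;
        }
        const std::size_t equals = line.find('=');
        if(current == nullptr || equals == std::string::npos)
            continue; // outside a section, or not a key=value line
        const std::string key = line.substr(0, equals);
        const std::string value = line.substr(equals + 1);
        if(key == "@")
            current->prototype = value;
        else if(!key.empty() && key.find_first_not_of("0123456789") == std::string::npos)
        {
            const std::size_t index = std::stoul(key); // 1-based parameter number
            if(index == 0)
                continue;
            if(current->params.size() < index)
                current->params.resize(index);
            assert(current->params.size() >= index);
            current->params[index - 1] = value;
        }
        // Other keys such as ParamCount are ignored in this sketch.
    }
    return apis;
}
```

For the MessageBoxA entry above, this yields four parameters and the full prototype string, which is the data needed to comment the call and its arguments.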

To set the API function name comment, as well as its arguments, the plugin reads the definition files to get the correct data. Finally, it also uses some of the functions in the x64dbg SDK, such as DbgGetCommentAt, DbgSetCommentAt, DbgClearAutoCommentRange and Script::Argument::Add*, to set up the visual aid for the current executable function.

For now, x64dbg doesn’t allow nested function arguments, even though xAnalyzer does: in that case the definition will be present, while the argument brackets won’t. xAnalyzer has been made compatible with 64-bit binaries in the latest release, and a couple more features are coming soon.

And this is all for this post: the xAnalyzer x64dbg plugin exposed. For the latest releases, info, issues, etc., go to the project page.

ThunderCls signing out

