Hooking WinAPI to improve Qt performance

Hello,

First of all, apologies for the long absence. I have been dealing with personal issues and university, so writing this blog every week was an easy thing to cross off my list of things to do (especially considering I made it rather stressful for myself to produce these). I don’t exactly know yet how I will approach this blog from now on, but it will definitely not be every week. Note: If you have time, please write an entry for this blog! You can find more information here. If you want to write something but don’t know exactly how, come in contact discuss a topic with us.

Today I would like to discuss performance and how caching can drastically improve it. If you don’t read this but use x64dbg, at least install the GetCharABCWidthsI_cache plugin to take advantage of this performance improvement…

To render those beautifully highlighted instructions, x64dbg uses a self-cooked rich-text format called CustomRichText_t:

enum CustomRichTextFlags
{
    FlagNone,
    FlagColor,
    FlagBackground,
    FlagAll
};

struct CustomRichText_t
{
    QString text;
    QColor textColor;
    QColor textBackground;
    CustomRichTextFlags flags;
    bool highlight;
    QColor highlightColor;
};

This structure describes a single unit of text, with various options for highlighting it. This is extremely flexible, simple, easy to extend and doesn’t require any parsing of a text-based markup language like HTML or RTF. Since the most-used/refreshed views (disassembly, dump and stack) use this, rendering these units should be very fast and when failing to do this the user will suffer (noticeable) lag.

Now when profiling and holding down F7 (step into) I noticed that the majority of the time is spent in functions related to Qt, the first having to do with QPainter::fillRect and the second being related to QPainter::drawText. Both these functions are called very often from RichTextPainter::paintRichText.

profile before

It looks like QPainter::fillRect is part of drawing the main window and I cannot find a way to optimize it away, but the GetCharABCWidthsI function is definitely a candidate for optimization! The root cause appears to be in a function called QWindowsFontEngine::getGlyphBearings that is used during the layout phase of text. However GetCharABCWidthsI returns information of the font and it only has to be retrieved once! Take a look at the code:

void QWindowsFontEngine::getGlyphBearings(glyph_t glyph, qreal *leftBearing, qreal *rightBearing)
{
    HDC hdc = m_fontEngineData->hdc;
    SelectObject(hdc, hfont);

    if (ttf)
    {
        ABC abcWidths;
        GetCharABCWidthsI(hdc, glyph, 1, 0, &abcWidths);
        if (leftBearing)
            *leftBearing = abcWidths.abcA;
        if (rightBearing)
            *rightBearing = abcWidths.abcC;
    }
    else {
        QFontEngine::getGlyphBearings(glyph, leftBearing, rightBearing);
    }
}

Important information here is that SelectObject is called to set the current font handle and immediately after GetCharABCWidthsI is called to query information on a single glyph. To add a cache (and some diagnostics) I will write a plugin that hooks these functions and provides a cache of the glyph data. I’ll be using MinHook to accomplish this since it’s really easy to use.

The code for SelectObject is pretty straightforward. The goal here is to prepare a global variable with the HFONT handle that will be used in GetCharABCWidthsI to get the appropriate information. Reason for this is that the function GetCurrentObject is very slow and will generate a little spike of its own in the performance profile.

static HGDIOBJ WINAPI hook_SelectObject(
    HDC hdc,
    HGDIOBJ h)
{
    auto result = original_SelectObject(hdc, h);
    auto found = fontData.find(h);
    if(checkThread() && found != fontData.end())
    {
        curHdc = hdc;
        curFont = &found->second;
    }
    else
    {
        curHdc = nullptr;
        curFont = nullptr;
    }
    return result;
}

This function will also call checkThread() to avoid having to deal with thread-safety and it will only select font handles that were already used by GetCharABCWidthsI to retrieve data. The hook for GetCharABCWidthsI is a little more involved, but shouldn’t be difficult to understand.

static BOOL WINAPI hook_GetCharABCWidthsI(
    __in HDC hdc,
    __in UINT giFirst,
    __in UINT cgi,
    __in_ecount_opt(cgi) LPWORD pgi,
    __out_ecount(cgi) LPABC pabc)
{
    //Don't cache if called from a different thread
    if(!checkThread())
        return original_GetCharABCWidthsI(hdc, giFirst, cgi, pgi, pabc);

    //Get the current font object and get a (new) pointer to the cache
    if(!curFont || curHdc != hdc)
    {
        auto hFont = GetCurrentObject(hdc, OBJ_FONT);
        auto found = fontData.find(hFont);
        if(found == fontData.end())
            found = fontData.insert({ hFont, FontData() }).first;
        curFont = &found->second;
    }
    curFont->count++;

    //Functions to lookup/store glyph index data with the cache
    bool allCached = true;
    auto lookupGlyphIndex = [&](UINT index, ABC & result)
    {
        auto found = curFont->cache.find(index);
        if(found == curFont->cache.end())
            return allCached = false;
        result = found->second;
        return true;
    };
    auto storeGlyphIndex = [&](UINT index, ABC & result)
    {
        curFont->cache[index] = result;
    };

    //A pointer to an array that contains glyph indices.
    //If this parameter is NULL, the giFirst parameter is used instead.
    //The cgi parameter specifies the number of glyph indices in this array.
    if(pgi == NULL)
    {
        for(UINT i = 0; i < cgi; i++)
            if(!lookupGlyphIndex(giFirst + i, pabc[i]))
                break;
    }
    else
    {
        for(UINT i = 0; i < cgi; i++)
            if(!lookupGlyphIndex(pgi[i], pabc[i]))
                break;
    }

    //If everything was cached we don't have to call the original
    if(allCached)
    {
        curFont->hits++;
        return TRUE;
    }

    curFont->misses++;

    //Call original function
    auto result = original_GetCharABCWidthsI(hdc, giFirst, cgi, pgi, pabc);
    if(!result)
        return FALSE;

    //A pointer to an array that contains glyph indices.
    //If this parameter is NULL, the giFirst parameter is used instead.
    //The cgi parameter specifies the number of glyph indices in this array.
    if(pgi == NULL)
    {
        for(UINT i = 0; i < cgi; i++)
            storeGlyphIndex(giFirst + i, pabc[i]);
    }
    else
    {
        for(UINT i = 0; i < cgi; i++)
            storeGlyphIndex(pgi[i], pabc[i]);
    }

    return TRUE;
}

A command abcdata is also added to the plugin to gives some more insight in the number of cache misses and such and it appears to have been worth it (these numbers are from running x64dbg for about 20 seconds)!

HGDIOBJ: 3B0A22E9
count: 4, hits: 2, misses: 2

HGDIOBJ: A70A1E93
count: 1374, hits: 1348, misses: 26

HGDIOBJ: 000A1F1B
count: 140039, hits: 139925, misses: 114

HGDIOBJ: 7C0A2302
count: 581, hits: 550, misses: 31

The profile also confirms that this helped and I noticed a small improvement in speed!

profile after

A ticket has been opened in the Qt issue tracker and I hope this can help in further improving Qt. There have also been various suggestions on how to handle drawing lots of text which I will try another time. You can get the GetCharABCWidthsI_cache plugin if you want to try this yourself.

That’s it for today, have a good day!

Duncan

Comments