The big handle gamble

Yesterday I was debugging some programs and after restarting I saw that the status label stayed stuck on Initializing. At first it didn’t seem to impact anything, but pretty soon after that other things started breaking as well.

Reproduction steps:

Observed behaviours:

  1. The label stays stuck on Initializing
  2. The label stays stuck on Paused (appears to be more rare)

A shot in the dark

After getting more or less stable reproductions I started to look into why this could be happening. On the surface the TaskThread appeared to be correct, but since the WakeUp function was probably failing I put an assert on ReleaseSemaphore, which should trigger the TaskThread:

template <typename F, typename... Args>
void TaskThread_<F, Args...>::WakeUp(Args... _args)
{
    ++this->wakeups;
    EnterCriticalSection(&this->access);
    this->args = CompressArguments(std::forward<Args>(_args)...);
    LeaveCriticalSection(&this->access);
    // This will fail silently if it's redundant, which is what we want.
    if(!ReleaseSemaphore(this->wakeupSemaphore, 1, nullptr))
        __debugbreak();
}

I tried to reproduce the bug and unsurprisingly the assert triggered! At this point I suspected memory corruption, so I inserted a bunch of debug tricks in the TaskThread to store the original handle in a safe memory location:

struct DebugStruct
{
    HANDLE wakeupSemaphore = nullptr;
};

template <int N, typename F, typename... Args>
TaskThread_<N, F, Args...>::TaskThread_(F fn,
                                     size_t minSleepTimeMs, DebugStruct* debug) : fn(fn), minSleepTimeMs(minSleepTimeMs)
{
    //make the semaphore named to find it more easily in a handles viewer
    wchar_t name[256];
    swprintf_s(name, L"_TaskThread%d_%p", N, debug);
    this->wakeupSemaphore = CreateSemaphoreW(nullptr, 0, 1, name);
    if(debug)
    {
        if(!this->wakeupSemaphore)
            __debugbreak();
        debug->wakeupSemaphore = this->wakeupSemaphore;
    }
    InitializeCriticalSection(&this->access);

    this->thread = std::thread([this, debug]
    {
        this->Loop(debug);
    });
}

The TaskThread instance is now initialized and called like so:

void GuiSetDebugStateAsync(DBGSTATE state)
{
    GuiSetDebugStateFast(state);
    static TaskThread_<
               6,
               decltype(&GuiSetDebugState),
               DBGSTATE>
           GuiSetDebugStateTask(
               &GuiSetDebugState,
               300,
               new (VirtualAlloc(0,
                                 sizeof(DebugStruct),
                                 MEM_RESERVE | MEM_COMMIT,
                                 PAGE_EXECUTE_READWRITE)
                   ) DebugStruct()
           );
    GuiSetDebugStateTask.WakeUp(state, true);
}

Semaphore handle

Now I started x64dbg and used Process Hacker to find the _TaskThread6_XXXXXXXX semaphore to take note of the handle. I then reproduced and found to my surprise that the value of wakeupSemaphore was 0x640, the same value as on startup!

However when I checked the handle view again, 0x640 was no longer the handle to a semaphore, but rather to a mapped file!

Mapped file handle

Pushing our luck

This started to smell more and more like bad WinAPI usage. Tools like Application Verifier exist to find these kind of issues, but I could not get it to work so I had to roll my own.

The idea is rather simple:

  1. Use minhook to hook the CloseHandle API.
  2. Save the correct semaphore handle to a global variable
  3. Crash if this handle is ever closed
static DebugStruct* g_Debug = nullptr;
typedef BOOL(WINAPI* CLOSEHANDLE)(HANDLE hObject);
static CLOSEHANDLE fpCloseHandle = nullptr;

static BOOL WINAPI CloseHandleHook(HANDLE hObject)
{
    if(g_Debug && g_Debug->wakeupSemaphore == hObject)
        __debugbreak();
    return fpCloseHandle(hObject);
}

static void DoHook()
{
    if(MH_Initialize() != MH_OK)
        __debugbreak();
    if(MH_CreateHook(GetProcAddress(GetModuleHandleW(L"kernelbase.dll"), "CloseHandle"), &CloseHandleHook, (LPVOID*)&fpCloseHandle) != MH_OK)
        __debugbreak();
    if(MH_EnableHook(MH_ALL_HOOKS) != MH_OK)
        __debugbreak();
}

This time reproducing the issue gave some very useful results:

WinDbg callstack

Winner winner chicken dinner!

The actual bug turned out to be in TitanEngine. The ForceClose function is supposed to close all the DLL handles from the current debug session, but all of these handles were already closed at the end of the same LOAD_DLL_DEBUG_EVENT handler.

__declspec(dllexport) void TITCALL ForceClose()
{
    //manage process list
    int processcount = (int)hListProcess.size();
    for(int i = 0; i < processcount; i++)
    {
        EngineCloseHandle(hListProcess.at(i).hFile);
        EngineCloseHandle(hListProcess.at(i).hProcess);
    }
    ClearProcessList();
    //manage thread list
    int threadcount = (int)hListThread.size();
    for(int i = 0; i < threadcount; i++)
        EngineCloseHandle(hListThread.at(i).hThread);
    ClearThreadList();
    //manage library list
    int libcount = (int)hListLibrary.size();
    for(int i = 0; i < libcount; i++)
    {
        if(hListLibrary.at(i).hFile != (HANDLE) - 1)
        {
            if(hListLibrary.at(i).hFileMappingView != NULL)
            {
                UnmapViewOfFile(hListLibrary.at(i).hFileMappingView);
                EngineCloseHandle(hListLibrary.at(i).hFileMapping);
            }
            EngineCloseHandle(hListLibrary.at(i).hFile); // <-- this is there the bug happens
        }
    }
    ClearLibraryList();

    if(!engineProcessIsNowDetached)
    {
        StopDebug();
    }
    RtlZeroMemory(&dbgProcessInformation, sizeof PROCESS_INFORMATION);
    if(DebugDebuggingDLL)
        DeleteFileW(szDebuggerName);
    DebugDebuggingDLL = false;
    DebugExeFileEntryPointCallBack = NULL;
}

But how does the semaphore handle value come to be the same as a previous file handle? The answer to that puzzling question is given when you look at the flow of events:

Now for why this doesn’t happen every single time (sometimes I had to restart the debuggee 20 or more times), the handle value is ‘randomly’ reused from the closed handle pool and it’s kind of a coin toss as to when this happens. I found that you can greatly increase the likelyhood of this happening when your PC has been on for a few days and you have 70k handles open. Probably the kernel will use a more aggressive recycling strategy when low on handles, but that’s just my guess.

If you are interested in trying to reproduce this at home, you can use the handle_gamble branch. You can also take a look at the relevant issue.

Duncan

Comments