QueryPerformanceCounter calibration with GetTickCount

In one of my older posts I described how the Mozilla Platform decides whether this high-precision timer function is behaving properly.  That algorithm is now obsolete and we have a better one.

The current logic, which seems proven stable, uses a faults-per-tolerance-interval algorithm, introduced in bug 836869 - Make QueryPerformanceCounter bad leap detection heuristic smarter.  I chose this kind of evaluation because the only really critical use of the hi-res timer is for animations and video rendering, where large leaps in time may cause missed frames or jitter during playback.  Faults per interval is a good reflection of the stability we want to ensure in practice.  QueryPerformanceCounter does not agree perfectly with GetTickCount all the time, but a small deviation doesn't always need to be considered faulty behavior of QueryPerformanceCounter.

The improved algorithm

There is no need for a calibration thread, calibration code, or any global skew monitoring.  Everything is self-contained.

As a first measure, we consider QueryPerformanceCounter stable when the TSC is stable, meaning it runs at a constant rate in all ACPI power saving states [see HasStableTSC function].
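As an illustration only (this is not the HasStableTSC code): on x86, processors advertise an "invariant TSC", one that keeps a constant rate across ACPI P-, C- and T-states, through bit 8 of EDX in CPUID leaf 0x80000007. The sketch below assumes the caller has already fetched the raw register values (for example with the compiler's CPUID intrinsic); the helper name and its parameters are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not the HasStableTSC function referenced above.
// maxExtendedLeaf: EAX returned by CPUID leaf 0x80000000.
// edxLeaf80000007: EDX returned by CPUID leaf 0x80000007.
bool ReportsInvariantTsc(uint32_t maxExtendedLeaf, uint32_t edxLeaf80000007) {
  // The power management leaf must exist before its bits mean anything.
  if (maxExtendedLeaf < 0x80000007u) {
    return false;
  }
  // Bit 8 of EDX is the invariant TSC flag.
  return (edxLeaf80000007 & (1u << 8)) != 0;
}
```

Taking the register values as arguments keeps the check independent of any particular compiler intrinsic. A real check may also want to look at the CPU vendor, which this sketch leaves out.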

When TSC is not stable or its status is unknown, we must use the controlling mechanism.

Definable properties

  • the number of failures we are willing to tolerate during an interval, set at 4
  • the fault-free interval, for which we use 5 seconds
  • the threshold considered a large enough skew to indicate a failure, currently 50ms

Fault-counter logic outline

  • keep an absolute time checkpoint, based on GetTickCount, that shifts into the future by one fault-free interval with every failure
  • each call to Now() produces a timestamp that records values of both QueryPerformanceCounter (QPC) and GetTickCount (GTC)
  • when two timestamps (T1 and T2) are subtracted to get the duration, the following math happens:
    • deltaQPC = T1.QPC - T2.QPC
    • deltaGTC = T1.GTC - T2.GTC
    • diff = deltaQPC - deltaGTC
    • if diff < 4 * 15.6ms: return deltaQPC ; this cuts off what GetTickCount's low resolution unfortunately cannot cover
    • overflow = diff - 4 * 15.6ms
    • if overflow < 50ms (the failure threshold): return deltaQPC
    • from now on, result of the subtraction is only deltaGTC
    • fault counting part:
      • if deltaGTC > 2000ms: return ; we don't count failures when the timestamps are more than 2 seconds apart *)
      • failure-count = max( checkpoint - now, 0 ) / fault-free interval
      • if failure-count > failure tolerance count: disable usage of QueryPerformanceCounter
      • otherwise: checkpoint = now + (failure-count + 1) * fault-free interval
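The outline above can be sketched in a few lines of C++. This is a simplified illustration, not the code from TimeStamp_windows.cpp: all names are hypothetical, every value is kept in milliseconds, and locking is left out. Two details are assumptions of the sketch: the outline doesn't say how the division rounds, so it rounds up (a partly elapsed interval still counts as one earlier failure), and once QPC is disabled every subtraction simply returns deltaGTC.

```cpp
#include <cstdint>

// Hypothetical names throughout; all values are in milliseconds.
constexpr double kGtcSlack = 4 * 15.6;         // allowance for GetTickCount's resolution
constexpr double kFailureThreshold = 50.0;     // skew large enough to indicate a failure
constexpr int64_t kFaultFreeInterval = 5000;   // fault-free interval
constexpr int64_t kFailureTolerance = 4;       // failures tolerated per interval
constexpr int64_t kMaxCountedSpan = 2000;      // longer spans are not counted as failures

struct Stamp {
  double qpc;   // QueryPerformanceCounter reading, converted to ms
  int64_t gtc;  // GetTickCount reading
};

class QpcFaultCounter {
 public:
  bool UsesQpc() const { return useQpc_; }

  // Returns the duration t1 - t2; `now` is the current GetTickCount value.
  double Subtract(const Stamp& t1, const Stamp& t2, int64_t now) {
    const double deltaGtc = static_cast<double>(t1.gtc - t2.gtc);
    if (!useQpc_) {
      return deltaGtc;  // assumption: a disabled QPC is never consulted again
    }
    const double deltaQpc = t1.qpc - t2.qpc;
    const double diff = deltaQpc - deltaGtc;  // signed, exactly as in the outline
    if (diff < kGtcSlack) {
      return deltaQpc;  // within what GetTickCount can resolve
    }
    const double overflow = diff - kGtcSlack;
    if (overflow < kFailureThreshold) {
      return deltaQpc;  // skewed, but still within the failure threshold
    }
    // QPC deviates too much: the result is deltaGTC from here on.
    if (deltaGtc <= kMaxCountedSpan) {
      RecordFailure(now);
    }
    return deltaGtc;
  }

 private:
  void RecordFailure(int64_t now) {
    const int64_t remaining = checkpoint_ > now ? checkpoint_ - now : 0;
    // Rounds up; this rounding is an assumption of the sketch.
    const int64_t failures =
        (remaining + kFaultFreeInterval - 1) / kFaultFreeInterval;
    if (failures > kFailureTolerance) {
      useQpc_ = false;
    } else {
      checkpoint_ = now + (failures + 1) * kFaultFreeInterval;
    }
  }

  int64_t checkpoint_ = 0;  // absolute GetTickCount-based time
  bool useQpc_ = true;
};
```

The checkpoint doubles as the failure counter: each failure pushes it one more interval into the future, so its distance from the current time says how many failures happened recently, and it decays on its own when no failures arrive.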

 

You can check the code by looking at TimeStamp_windows.cpp directly.

 

I'm personally quite happy with this algorithm.  So far, there have been no issues with redraw after wake-up, even on exotic or older configurations.  Video plays smoothly, and we still have hi-res timing for telemetry and logging where possible.

*) The reason is to omit unexpected QueryPerformanceCounter leaps from failure counting when a machine is suspended, even for a short period of time.

4 thoughts on “QueryPerformanceCounter calibration with GetTickCount”

  1. What's the failure threshold good for?

    You could change to
    · if diff < (4 * 15.6ms + 50ms): return deltaQPC ; this cuts off what GetTickCount’s low resolution unfortunately cannot cover
    and then remove the next 2 steps and have the exact same result, no?

    1. No, because when the +50ms threshold is exceeded I want to return deltaGTC (see the "from now on, result of the subtraction is only deltaGTC" line), just to be safe, since I don't know how far QPC is skewed. That 50ms is my threshold for considering QPC safe to use for determining the duration.

      1. Sorry - the blog software ate some parts of my comment. 2nd try:

        if diff < (4 * 15.6ms + 50ms): return deltaQPC
        (this is the same before and after calculating overflow)

        if diff ≥ (4 * 15.6ms + 50ms): run the fault counting part, then return deltaGTC

        There's only one real threshold: (4 * 15.6ms + 50ms). It's split in two parts: (4 * 15.6ms) and (4 * 15.6ms + 50ms), but the first two possibilities do the same (return deltaQPC, do nothing more).

        By looking at the actual code, the only difference I see is that
        LOG(("TimeStamp: QPC check after %llums with overflow %1.4fms",
        ... gets called if (4 * 15.6ms) is exceeded but (4 * 15.6ms + 50ms) is not.

        Maybe the intention was to return deltaGTC in that case? With the comment
        // XXX Should we return GTC here?

        1. Sorry for the late response (the incomplete comment has been trashed).

          That matches the code. Since it's unclear how to decide which value may be correct in that interval, I left it split, also to make it clearer which part of the algorithm does what. I leaned toward QPC because it's getting more stable with every new hardware and OS version. And since GTC, as a software-based timer, may well suffer from jitter, it's IMO better to take the QPC value. If QPC is not stable in the long term, we will exceed the failure tolerance sooner or later anyway and disable it completely.
