QueryPerformanceCounter calibration with GetTickCount

In one of my older posts I describe how the Mozilla Platform decides whether this high-precision timer function is behaving properly or not.  That algorithm is now obsolete and we have a better one.

The current logic, which has proven stable, uses a faults-per-tolerance-interval algorithm, introduced in bug 836869 - Make QueryPerformanceCounter bad leap detection heuristic smarter.  I decided on this kind of evaluation since the only really critical use of the hi-res timer is for animations and video rendering, where large leaps in time may cause missing frames or jitter during playback.  Faults per interval is a good reflection of the stability we actually want to ensure.  QueryPerformanceCounter is not perfectly precise all the time when calibrated against GetTickCount, and that doesn't always need to be considered faulty behavior of the QueryPerformanceCounter result.

The improved algorithm

There is no need for a calibration thread or calibration code, nor for any global skew monitoring.  Everything is self-contained.

As the first measure, we consider QueryPerformanceCounter stable when the TSC is stable, meaning it runs at a constant rate during all ACPI power-saving states [see the HasStableTSC function].
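For illustration, an invariant TSC can be detected on x86 via CPUID leaf 0x80000007 (EDX bit 8); a minimal sketch of such a check, which is not necessarily what HasStableTSC actually does, could look like this:

#include <intrin.h>

// Rough sketch: CPUID leaf 0x80000007, EDX bit 8 reports an invariant TSC,
// i.e. a TSC running at a constant rate across power-saving states.
static bool HasInvariantTSC() {
  int regs[4] = {0};
  __cpuid(regs, int(0x80000000u));      // highest extended leaf supported
  if (static_cast<unsigned>(regs[0]) < 0x80000007u) {
    return false;
  }
  __cpuid(regs, 0x80000007);
  return (regs[3] & (1 << 8)) != 0;     // EDX bit 8: invariant TSC
}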

When TSC is not stable or its status is unknown, we must use the controlling mechanism.

Definable properties

  • the number of failures we are willing to tolerate during an interval, set at 4
  • the fault-free interval, set to 5 seconds
  • a threshold considered a large enough skew to indicate a failure, currently 50ms

Fault-counter logic outline

  • keep an absolute time checkpoint that shifts into the future by one fault-free interval with every failure; base it on GetTickCount
  • each call to Now() produces a timestamp that records the values of both QueryPerformanceCounter (QPC) and GetTickCount (GTC)
  • when two timestamps (T1 and T2) are subtracted to get a duration, the following math happens (see the code sketch after this list):
    • deltaQPC = T1.QPC - T2.QPC
    • deltaGTC = T1.GTC - T2.GTC
    • diff = deltaQPC - deltaGTC
    • if diff < 4 * 15.6ms: return deltaQPC ; this cuts off what GetTickCount's low resolution unfortunately cannot cover
    • overflow = diff - 4 * 15.6ms
    • if overflow < 50ms (the failure threshold): return deltaQPC
    • from now on, the result of the subtraction is only deltaGTC
    • fault-counting part:
      • if deltaGTC > 2000ms: return ; we don't count failures when the timestamps are more than 2 seconds apart *)
      • failure-count = max( checkpoint - now, 0 ) / fault-free interval
      • if failure-count > failure tolerance count: disable usage of QueryPerformanceCounter
      • otherwise: checkpoint = now + (failure-count + 1) * fault-free interval
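To make the outline above concrete, here is a minimal, self-contained C++ sketch of the same logic.  The names, the use of GetTickCount64 and the lack of thread safety are mine; the real code in TimeStamp_windows.cpp differs in detail:

#include <windows.h>
#include <cstdint>

// Tunable properties from the list above.
static const uint64_t kFailureTolerance    = 4;       // failures tolerated per interval
static const uint64_t kFaultFreeIntervalMs = 5000;    // the 5 s fault-free interval
static const uint64_t kFailureThresholdMs  = 50;      // skew large enough to count as a failure
static const uint64_t kGTCSlackMs          = 4 * 16;  // ~4 GetTickCount ticks of ~15.6 ms
static const uint64_t kHardFailureLimitMs  = 2000;    // don't count faults across long gaps

static bool     sUseQPC = true;       // flipped to false when QPC proves unreliable
static uint64_t sCheckpointGTC = 0;   // absolute GTC time, pushed to the future on faults

struct HybridTimeStamp {
  uint64_t qpcMs;  // QueryPerformanceCounter converted to milliseconds
  uint64_t gtcMs;  // GetTickCount64 value in milliseconds
};

static HybridTimeStamp Now() {
  LARGE_INTEGER counter, freq;
  ::QueryPerformanceCounter(&counter);
  ::QueryPerformanceFrequency(&freq);
  HybridTimeStamp ts;
  ts.qpcMs = uint64_t(counter.QuadPart) * 1000 / uint64_t(freq.QuadPart);
  ts.gtcMs = ::GetTickCount64();
  return ts;
}

// Duration t1 - t2 in milliseconds; falls back to GetTickCount and counts
// a fault when QPC and GTC disagree by more than the allowed skew.
static uint64_t DurationMs(const HybridTimeStamp& t1, const HybridTimeStamp& t2) {
  uint64_t deltaQPC = t1.qpcMs - t2.qpcMs;
  uint64_t deltaGTC = t1.gtcMs - t2.gtcMs;

  if (!sUseQPC) {
    return deltaGTC;
  }

  // Absolute difference between the two clocks over the same period.
  uint64_t diff = deltaQPC > deltaGTC ? deltaQPC - deltaGTC : deltaGTC - deltaQPC;

  // What GetTickCount's low resolution cannot cover is not a fault.
  if (diff < kGTCSlackMs) {
    return deltaQPC;
  }

  uint64_t overflow = diff - kGTCSlackMs;
  if (overflow < kFailureThresholdMs) {
    return deltaQPC;
  }

  // From here on the result is deltaGTC; decide whether to count a fault.
  if (deltaGTC > kHardFailureLimitMs) {
    // Timestamps more than 2 s apart (e.g. around a suspend) are not counted.
    return deltaGTC;
  }

  uint64_t nowGTC = ::GetTickCount64();
  uint64_t failureCount =
      (sCheckpointGTC > nowGTC ? sCheckpointGTC - nowGTC : 0) / kFaultFreeIntervalMs;

  if (failureCount > kFailureTolerance) {
    sUseQPC = false;  // too many faults within the tolerance window: stop using QPC
  } else {
    sCheckpointGTC = nowGTC + (failureCount + 1) * kFaultFreeIntervalMs;
  }
  return deltaGTC;
}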

 

You can check the code by looking at TimeStamp_windows.cpp directly.

 

I'm personally quite happy with this algorithm.  So far there have been no issues with redraw after wake-up, even on exotic or older configurations.  Video plays smoothly, and we still have hi-res timing for telemetry and logging where possible.

*) The reason is to omit unexpected QueryPerformanceCounter leaps from failure counting when a machine is suspended, even for a short period of time.

Mozilla Firefox new HTTP cache is live!


 

The new Firefox HTTP cache back-end, which keeps the cache content after a crash or a kill and doesn't cause any UI hangs, has landed!

 

It's currently disabled by default, but you can test it by installing Firefox Nightly (version 27) and enabling it. This applies to Firefox Mobile builds as well.  There is a preference that enables or disables the new cache; find it in about:config. You can switch it on and off any time you want, even during active browsing; there is no need to restart the browser for the changes to take effect:

browser.cache.use_new_backend

  • 0 - disable, use the old crappy cache (files are stored under Cache directory in your profile) - now the default
  • 1 - enable, use the brand new HTTP cache (files are stored under cache2 directory in your profile)

Other new preferences that control the cache behavior:

We no longer use any new specific preferences.  We are now backward compatible with the old cache preferences.

browser.cache.memory_limit

  • the maximum number of kB kept in RAM to hold the most used content in memory, so page loads speed up
  • on desktop this is now set to 50MB (i.e. 51,200 kB)

 

There are still open bugs to fix before we can turn this fully on.  The most significant one is that we don't automatically delete cache files when browser.cache.disk.capacity is exceeded, so your disk can get flooded by heavy browsing.  But you can still delete the cache manually using Clear Recent History.

Enabling the new HTTP cache by default is planned for Q4/2013.  For Firefox Mobile it may happen even sooner, since we are using Android's context cache directory, which is automatically deleted when the storage runs out of space.  Hence, we don't need to monitor the cache capacity ourselves on mobile.

Please report any bug you find during tests under Core :: Networking: Cache.

 

Appcache prompt removed from Firefox


The bothersome prompt shown when a web app uses the offline application cache (a.k.a. appcache) has been removed from Firefox!

Beginning with Firefox 26, there will no longer be a prompt that users have to accept.  Firefox will cache the content automatically, as if the user had clicked the Allow button.

This actually applies to every software based on Gecko, like Firefox Mobile or Firefox OS. Tracked in bug 892488.

Application cache, not exactly a favorite feature, is not that widely used on today's web, and one of the reasons has been this prompt.  It may be a little late in the game, but it has finally happened.  I'm curious what the feedback from web developers is going to be.

New Firefox HTTP cache backend - story continues

In my previous post I was writing about the new cache backend and some of the very first testing.

Now we've stepped further and there are significant improvements.  I was also able to test with a wider variety of hardware this time.

The most significant difference is a single I/O thread with relatively simple event prioritization.  Opening and reading urgent (render-blocking) files is done first, opening and reading lower-priority files comes after that, and writing is performed as the last operation. This greatly improves performance when loading from a non-warmed cache, and also first paint time in many scenarios.
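To illustrate the idea, and not the actual Mozilla implementation, a single I/O thread with such prioritization can be modeled as a priority queue of events, where urgent opens and reads sort before normal ones and writes always come last:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical priority levels: render-blocking I/O first, writes last.
enum class IoPriority { OpenUrgent = 0, ReadUrgent, OpenNormal, ReadNormal, Write };

struct IoEvent {
  IoPriority priority;
  std::function<void()> run;
};

struct ByPriority {
  bool operator()(const IoEvent& a, const IoEvent& b) const {
    return a.priority > b.priority;  // lower enum value = served first
  }
};

class CacheIoThread {
 public:
  CacheIoThread() : mThread([this] { Loop(); }) {}
  ~CacheIoThread() {
    { std::lock_guard<std::mutex> lock(mMutex); mShutdown = true; }
    mCondVar.notify_one();
    mThread.join();
  }

  void Dispatch(IoPriority priority, std::function<void()> work) {
    { std::lock_guard<std::mutex> lock(mMutex);
      mQueue.push(IoEvent{priority, std::move(work)}); }
    mCondVar.notify_one();
  }

 private:
  void Loop() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (!mShutdown || !mQueue.empty()) {
      if (mQueue.empty()) { mCondVar.wait(lock); continue; }
      IoEvent event = mQueue.top();
      mQueue.pop();
      lock.unlock();
      event.run();     // perform the open/read/write outside the lock
      lock.lock();
    }
  }

  std::mutex mMutex;
  std::condition_variable mCondVar;
  std::priority_queue<IoEvent, std::vector<IoEvent>, ByPriority> mQueue;
  bool mShutdown = false;
  std::thread mThread;
};

Dispatching an urgent read while a batch of writes is already queued makes the read run first, which is the whole point of the prioritization.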

The numbers are much more precise than in the first post.  My measuring is more systematic and careful by now.  Also, I've merged gum with the latest mozilla-central code a few times, and there are surely some improvements from that too.

Here are the results.  I'm using a 50MB limit for keeping cached content in RAM.

 

[ complete page load time / first paint time ]

Old iMac with mechanical HDD
Backend | First visit | Warm go to 1) | Cold go to 2) | Reload
mozilla-central | 7.6s / 1.1s | 560ms / 570ms | 1.8s / 1.7s | 5.9s / 900ms
new back-end | 7.6s / 1.1s | 530ms / 540ms | 2.1s / 1.9s** | 6s / 720ms

 

Old Linux box with mechanical 'green' HDD
Backend | First visit | Warm go to 1) | Cold go to 2) | Reload
mozilla-central | 7.3s / 1.2s | 1.4s / 1.4s | 2.4s / 2.4s | 5.1s / 1.2s
new back-end | 7.3s / 1.2s or** 9+s / 3.5s | 1.35s / 1.35s | 2.3s / 2.1s | 4.8s / 1.2s

 

Fast Windows 7 box with SSD
Backend | First visit | Warm go to 1) | Cold go to 2) | Reload
mozilla-central | 6.7s / 600ms | 235ms / 240ms | 530ms / 530ms | 4.7s / 540ms
new back-end | 6.7s / 600ms | 195ms / 200ms | 620ms / 620ms*** | 4.7s / 540ms

 

Fast Windows 7 box and a slow microSD
Backend | First visit | Warm go to 1) | Cold go to 2) | Reload
mozilla-central | 13.5s / 6s | 600ms / 600ms | 1s / 1s | 7.3s / 1.2s
new back-end | 7.3s / 780ms or** 13.7s / 1.1s | 195ms / 200ms | 1.6s or 3.2s* / 460ms*** | 4.8s / 530ms

 

To sum up: the most significant changes appear when using really slow media.  First paint times improve greatly, not to mention the 10000% better UI responsiveness!  Still, there is space left for more optimizations.  We know what to do:

  • deliver data in larger chunks; right now we fetch only in 16kB blocks, hence larger files (e.g. images) load very slowly
  • rethink the interaction with the upper levels by adding some kind of intelligent flood control

 

1) Open a new tab and navigate to a page when the cache is already pre-warmed, i.e. data are already fully in RAM.

2) Open a new tab and navigate to a page right after the Firefox start.

* I was testing with my blog home page.  There are a few large images, ~750kB and ~600kB.  Delivering data to upper-level consumers only in 16kB chunks causes this suffering.

** This is an interesting regression.  Sometimes with the new backend we delay first paint and overall load time.  It seems the cache engine is 'too good' and opens the floodgates too wide, overwhelming the main thread event queue.  This needs more investigation.

*** Here it's a combination of flooding the main thread with image loads, the slow image data load itself, and the fact that in this case we first paint only after all resources on the page have loaded - that needs to change.  This is also supported by the fact that cold-load first paint time is significantly faster on the microSD than on the SSD.  The slow card apparently simulates the flood control for us here.

New Firefox HTTP cache backend, first impressions

After some two months of coding, Michal Novotný and I are close to having the first "private testing" build of the new and simplified HTTP cache back-end that is stable enough.

The two main goals we've met are:

  • Be resilient to crashes and process kills
  • Get rid of any UI hangs or freezes (a.k.a janks)

We've abandoned the current disk format and use a separate file for each URL, however small it is.  Each file carries self-check hashes to verify its correctness, so no fsyncs are needed.  Everything is asynchronous or fully buffered.  There is a single background thread doing all the I/O: opening, reading and writing.  On Android we write to the context cache directory, so the cached data are actually treated as cache data by the system.
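As a rough illustration of the self-check-hashes-instead-of-fsync idea (the real on-disk format and hash function are different), each chunk can be checksummed on write and verified on read, so a torn write after a crash simply invalidates that entry instead of requiring an fsync:

#include <cstdint>
#include <vector>

// Illustrative only: a simple FNV-1a checksum per chunk; the real cache
// uses its own hashing and metadata layout.
static uint32_t ChunkHash(const uint8_t* data, size_t length) {
  uint32_t hash = 2166136261u;
  for (size_t i = 0; i < length; ++i) {
    hash = (hash ^ data[i]) * 16777619u;
  }
  return hash;
}

struct CacheChunk {
  std::vector<uint8_t> data;
  uint32_t storedHash;  // written alongside the chunk, no fsync required
};

static CacheChunk WriteChunk(const uint8_t* data, size_t length) {
  CacheChunk chunk;
  chunk.data.assign(data, data + length);
  chunk.storedHash = ChunkHash(data, length);
  return chunk;
}

// On read, a hash mismatch (e.g. after a crash mid-write) means the chunk is
// simply treated as absent and the entry is refetched from the network.
static bool VerifyChunk(const CacheChunk& chunk) {
  return ChunkHash(chunk.data.data(), chunk.data.size()) == chunk.storedHash;
}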

I've performed some first tests using http://janbambas.cz/ as a test page.  As I write this post, it contains some 460 images.  Testing was done on a relatively fast machine, but the important thing is to differentiate by storage speed.  I had two extremes available: an SSD and an old, slow-like-hell microSD card via a USB reader.

Testing with a microSD card:

First-visit load
Backend | Full load | First paint
mozilla-central | 16s | 7s
new back-end | 12s | 4.5s
new back-end and separate threads for open/read/write | 10.5s | 3.5s

Reload, already cached and warmed
Backend | Full load | First paint
mozilla-central | 7s | 700ms
new back-end | 5.5s | 500ms
new back-end and separate thread for open/read/write | 5.5s | 500ms

Type URL and go, cached and warmed
Backend | Full load | First paint
mozilla-central | 900ms | 900ms
new back-end | 400ms | 400ms

Type URL and go, cached but not warmed
Backend | Full load | First paint
mozilla-central | 5s | 4.5s
new back-end | ~28s | 5-28s
new back-end and separate threads for open/read/write *) | ~26s | 5-26s

*) Here I'm getting unstable results.  I'm doing more testing with more concurrent open and read threads.  It seems there is not much effect and the jitter in the time measurements is just noise.

I will report more on concurrent thread I/O in a different post later, since I find it quite an interesting space to explore.

Clearly, the cold "type and go" test case shows that block files are beating us here.  But the big difference is that the UI is completely jank-free with the new back-end!

Testing on an SSD disk:

The results are not that different for the current and the new back-end, only a small regression in warmed and cold "go to" test cases:

Type URL and go, cached and warmed
Backend | Full load | First paint
mozilla-central | 220ms | 230ms
new back-end | 310ms | 320ms

Type URL and go, cached but not warmed
Backend | Full load | First paint
mozilla-central | 600ms | 600ms
new back-end | 1100ms | 1100ms

Having multiple threads seems not to have any effect, as far as the precision of my measurements goes.

At this moment I am not sure what causes the regression in both "go to" cases on an SSD, but I believe it's just a question of some simple optimizations, like delivering more than just 4096 bytes per thread loop as we do now, or the fact that we don't cache redirects - a known bug right now.

Still here and want to test it yourself? Test builds can be downloaded from the 'gum' project tree.  Disclaimer: the code is very, very experimental at this stage, so use at your own risk!