Mozilla Firefox’s new HTTP cache is live!


 

The new Firefox HTTP cache back-end, which keeps the cache content after a crash or a kill and doesn’t cause any UI hangs, has landed!

 

It’s currently disabled by default, but you can test it by installing Firefox Nightly (version 27) and enabling it; this applies to Firefox Mobile builds as well.  The preference that enables or disables the new cache can be found in about:config.  You can switch it on and off at any time, even during active browsing; there is no need to restart the browser for the change to take effect:

browser.cache.use_new_backend

  • 0 – disable, use the old crappy cache (files are stored under the Cache directory in your profile); currently the default
  • 1 – enable, use the brand new HTTP cache (files are stored under the cache2 directory in your profile)
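For illustration only, this is roughly how a Gecko-side consumer could read that switch through the preferences service.  The helper below is hypothetical; only the pref name and values come from the list above.

#include "mozilla/Preferences.h"

// Hypothetical helper (not the actual cache code): decide which backend
// to use based on the browser.cache.use_new_backend pref described above.
static bool UseNewCacheBackend()
{
  // 0 = old cache under "Cache" (the default), 1 = new backend under "cache2"
  return mozilla::Preferences::GetInt("browser.cache.use_new_backend", 0) == 1;
}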

Other new preferences that control the cache behavior:

There are no longer any new specific preferences; we are now backward compatible with the old cache preferences.

browser.cache.memory_limit

  • the maximum number of kB kept in RAM to hold the most used content in memory, so page loads speed up
  • on desktop this is currently set to 50 MB (i.e. 51,200 kB)

 

There are still open bugs to fix before we can turn this on by default.  The most significant one is that we don’t automatically delete cache files when browser.cache.disk.capacity is exceeded, so heavy browsing can flood your disk.  You can, however, still delete the cache manually using Clear Recent History.

Enabling the new HTTP cache by default is planned for Q4/2013.  For Firefox Mobile it may even be sooner, since we are using Android’s context cache directory, which is automatically purged when the device runs out of storage space.  Hence, we don’t need to monitor the cache capacity ourselves on mobile.

Please report any bugs you find during testing under Core :: Networking: Cache.

 

New Firefox HTTP cache back-end – the story continues

In my previous post I wrote about the new cache back-end and some of the very first testing.

Since then we’ve stepped further and there are significant improvements.  I was also able to test on a wider variety of hardware this time.

The most significant change is a single I/O thread with relatively simple event prioritization: opening and reading urgent (render-blocking) files is done first, opening and reading lower-priority files comes next, and writing is performed as the last operation.  This greatly improves performance when loading from a non-warmed cache, and also first paint time in many scenarios.
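As a rough illustration of that scheduling idea (a minimal sketch, not the actual cache code), one background thread drains a priority queue in which urgent opens and reads sort ahead of ordinary reads, and writes sort last:

#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Sketch only, not the real cache I/O thread: three priority classes,
// drained in priority order by a single background thread.
enum class IoPriority {
  OpenReadUrgent = 0,  // render-blocking resources go first
  OpenRead       = 1,  // everything else that needs opening/reading
  Write          = 2   // writes are performed last
};

struct IoTask {
  IoPriority priority;
  uint64_t   sequence;        // preserves FIFO order within one class
  std::function<void()> run;  // the actual file operation
};

struct IoTaskCompare {
  bool operator()(const IoTask& a, const IoTask& b) const {
    if (a.priority != b.priority)
      return a.priority > b.priority;  // lower enum value is served first
    return a.sequence > b.sequence;    // earlier task first within a class
  }
};

// The single I/O thread repeatedly pops the top task and runs it.
using IoQueue = std::priority_queue<IoTask, std::vector<IoTask>, IoTaskCompare>;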

The numbers are much more precise than in the first post; my measuring is more systematic and careful now.  Also, I’ve merged the gum branch with the latest mozilla-central code a few times, which certainly brought some improvements too.

Here are the results.  I’m using a 50 MB limit for keeping cached content in RAM.

 

[ complete page load time / first paint time ]

Old iMac with mechanical HDD

Backend         | First visit  | Warm go to 1)  | Cold go to 2)  | Reload
mozilla-central | 7.6s / 1.1s  | 560ms / 570ms  | 1.8s / 1.7s    | 5.9s / 900ms
new back-end    | 7.6s / 1.1s  | 530ms / 540ms  | 2.1s / 1.9s ** | 6s / 720ms

 

Old Linux box with mechanical 'green' HDD

Backend         | First visit                    | Warm go to 1)  | Cold go to 2)  | Reload
mozilla-central | 7.3s / 1.2s                    | 1.4s / 1.4s    | 2.4s / 2.4s    | 5.1s / 1.2s
new back-end    | 7.3s / 1.2s, or ** 9+s / 3.5s  | 1.35s / 1.35s  | 2.3s / 2.1s    | 4.8s / 1.2s

 

Fast Windows 7 box with SSD

Backend         | First visit   | Warm go to 1)  | Cold go to 2)      | Reload
mozilla-central | 6.7s / 600ms  | 235ms / 240ms  | 530ms / 530ms      | 4.7s / 540ms
new back-end    | 6.7s / 600ms  | 195ms / 200ms  | 620ms / 620ms ***  | 4.7s / 540ms

 

Fast Windows 7 box and a slow microSD

Backend         | First visit                       | Warm go to 1)  | Cold go to 2)               | Reload
mozilla-central | 13.5s / 6s                        | 600ms / 600ms  | 1s / 1s                     | 7.3s / 1.2s
new back-end    | 7.3s / 780ms, or ** 13.7s / 1.1s  | 195ms / 200ms  | 1.6s or 3.2s * / 460ms ***  | 4.8s / 530ms

 

To sum up: the most significant changes appear when using really slow media.  First paint times greatly improve, not to mention the 10000% better UI responsiveness!  Still, there is room for more optimizations, and we know what to do:

  • deliver data in larger chunks; right now we fetch only in 16 kB blocks, so larger files (e.g. images) load very slowly (see the sketch below)
  • rethink the interaction with upper levels by adding some kind of intelligent flood control
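To put numbers on the chunk-size bullet above, here is a back-of-the-envelope illustration; only the 16 kB figure and the ~750 kB image size (from footnote * below) come from this post, the larger chunk size is a made-up example:

#include <cstdio>

// How many per-chunk deliveries a consumer sees for a ~750 kB image at
// different chunk sizes.  The 256 kB alternative is hypothetical.
int main() {
  const unsigned long fileSize = 750 * 1024;                  // ~750 kB image
  const unsigned long chunkSizes[] = {16 * 1024, 256 * 1024}; // current vs. larger
  for (unsigned long chunk : chunkSizes) {
    unsigned long deliveries = (fileSize + chunk - 1) / chunk;
    std::printf("%lu kB chunks -> %lu deliveries to the consumer\n",
                chunk / 1024, deliveries);
  }
  return 0;
}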

 

1) Open a new tab and navigate to a page when the cache is already pre-warmed, i.e. data are already fully in RAM.

2) Open a new tab and navigate to a page right after the Firefox start.

* I was testing with my blog home page.  There are a few large images, ~750 kB and ~600 kB.  Delivering data to upper-level consumers only in 16 kB chunks causes this suffering.

** This is an interesting regression.  Sometimes with the new back-end we delay first paint and overall load time.  It seems the cache engine is ‘too good’ and opens the floodgates too wide, overwhelming the main thread event queue.  This needs more investigation.

*** Here it’s a combination of flooding the main thread with image loads, the slow image data load itself, and the fact that in this case we first paint only after all resources on the page have loaded, which needs to change.  This is also supported by the fact that the cold-load first paint time is significantly faster on microSD than on SSD; the slow card apparently simulates the flood control for us.

Firefox detailed event tracer – about:timeline

I’ve always wanted to see all the little things that happen when loading a web page in Firefox.  As a Gecko developer I know well how it works, but the actual interaction and chaining is not always obvious at all, unless you study crazy NSPR logs.  Hence, I started developing a tool, called the event tracer, to get a timeline showing the hidden guts.

An example screenshot tells the story best; here is a trial run of a www.mozilla.org load:

[Screenshot: event timeline of a www.mozilla.org load in about:timeline]

Planning improvements

At this time the work is not complete.  There is a lot more to do to make this an actually useful development tool, and my next step is to hunt those requirements down.

I am using about:timeline to verify patches I review do what they intend to do.  It can also help find hidden bugs.

However, using about:timeline to discover performance-suboptimal code paths turned out not to be that simple.  Events are spread all over the place and the connections between them are not easily, if at all, discoverable.

Hence, this needs more thinking and work.

My first thought for improvement is to focus more on “the resource” and the single object dealing with it.  It might be better to show all the events happening on, e.g., a single instance of an http channel or http transaction than to tortuously hunt for them somewhere in the graph.  There is a simple way to highlight and filter, but that is not enough for an analytical view.

Then, I’m missing a general way to easily recognize how things are chained together.  I’d like to link events that spawn one another (an http channel creates an http transaction, then a connection, etc.), present the timeline more like a Gantt chart, and show a critical path or flow for any selected pass through it.

From inspecting the timeline it should be visible where the bottlenecks and long wait times worth fixing are.  I don’t have a completely clear plan for this yet, though.

 

Still here?  Cool :)  If you have any thoughts or ideas on how to use the data we collect and visualize as a valuable source for performance optimization surgery, please feel free to share them here.  For more on how the timeline data is produced, check below.

 

How it works

[Screenshot: instrumented events shown in the about:timeline view]

The event track in the image is produced with special code instrumentation.  To get, for instance, the “net::http::transaction” traces, the following three places in the code have been instrumented (in blue):

 

1. WAIT – record the time when an http transaction is scheduled:

nsresult
nsHttpTransaction::Init(uint32_t caps,
                        nsHttpConnectionInfo *cinfo,
                        nsHttpRequestHead *requestHead,
                        nsIInputStream *requestBody,
                        bool requestBodyHasHeaders,
                        nsIEventTarget *target,
                        nsIInterfaceRequestor *callbacks,
                        nsITransportEventSink *eventsink,
                        nsIAsyncInputStream **responseBody)
{
    MOZ_EVENT_TRACER_COMPOUND_NAME(static_cast<nsAHttpTransaction*>(this),
                                   requestHead->PeekHeader(nsHttp::Host),
                                   requestHead->RequestURI().BeginReading());

    MOZ_EVENT_TRACER_WAIT(static_cast<nsAHttpTransaction*>(this),
                          "net::http::transaction");

 

2. EXEC – record the time when the transaction first comes into action, i.e. the time it gets a connection assigned and starts its communication with the server:

void
nsHttpTransaction::SetConnection(nsAHttpConnection *conn)
{
    NS_IF_RELEASE(mConnection);
    NS_IF_ADDREF(mConnection = conn);

    if (conn) {
        MOZ_EVENT_TRACER_EXEC(static_cast<nsAHttpTransaction*>(this),
                              "net::http::transaction");
    }

 

3. DONE – record the time when the transaction has finished its job by completing the response fetch:

nsHttpTransaction::Close(nsresult reason)
{
    LOG(("nsHttpTransaction::Close [this=%x reason=%x]\n", this, reason));

    ...

    MOZ_EVENT_TRACER_DONE(static_cast<nsAHttpTransaction*>(this),
                          "net::http::transaction");
}

The thread timeline where an event is finally displayed is the one for the thread that the EXEC code has been called on.

The exact definition of the WAIT and EXEC phases is currently up to the developer.  For me, the WAIT phase is any time an operation is significantly blocked before it can be carried out; it’s the time with the main performance impact that we may be particularly interested in shortening.  A few examples (see the sketch after this list):

  • time spent in a thread’s event queue – duration from the dispatch to the run
  • time spent waiting for an asynchronous callback such as reading from disk or network
  • time waiting for necessary resources, such as an established TCP connection before an object can proceed with its job
  • time spent waiting to acquire a lock or a monitor
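For example, the first bullet (dispatch-to-run wait) could be instrumented like this.  The class below is hypothetical; only the MOZ_EVENT_TRACER_* macros and the WAIT/EXEC/DONE placement mirror the real instrumentation shown above:

#include "nsThreadUtils.h"              // nsRunnable, NS_OK
#include "mozilla/VisualEventTracer.h"  // MOZ_EVENT_TRACER_* macros (assumed header path)

// Hypothetical runnable, not real Gecko code: WAIT when the work is
// scheduled, EXEC when the event loop finally runs it, DONE when it ends.
class MyCacheTask : public nsRunnable
{
public:
  MyCacheTask()
  {
    // Created and about to be dispatched; from here until Run() is the WAIT phase.
    MOZ_EVENT_TRACER_WAIT(this, "my::cache::task");
  }

  NS_IMETHOD Run()
  {
    // The target thread's event queue got to us; the EXEC phase starts here.
    MOZ_EVENT_TRACER_EXEC(this, "my::cache::task");

    DoTheActualWork();  // hypothetical payload

    // Job finished; close the record for this object.
    MOZ_EVENT_TRACER_DONE(this, "my::cache::task");
    return NS_OK;
  }

private:
  void DoTheActualWork() {}
};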

How to bind a URL or any identifying info to an event

The following instrumentation is used (in red):

nsresult
nsHttpTransaction::Init(uint32_t caps,
                        nsHttpConnectionInfo *cinfo,
                        nsHttpRequestHead *requestHead,
                        nsIInputStream *requestBody,
                        bool requestBodyHasHeaders,
                        nsIEventTarget *target,
                        nsIInterfaceRequestor *callbacks,
                        nsITransportEventSink *eventsink,
                        nsIAsyncInputStream **responseBody)
{
    MOZ_EVENT_TRACER_COMPOUND_NAME(static_cast<nsAHttpTransaction*>(this),
                                   requestHead->PeekHeader(nsHttp::Host),
                                   requestHead->RequestURI().BeginReading());

    MOZ_EVENT_TRACER_WAIT(static_cast<nsAHttpTransaction*>(this),
                          "net::http::transaction");

 

Here the http transaction event is given the host + path of the resource it loads as its name.

The object’s this pointer, which needs to be properly cast by the developer, is what sticks it all together.  This is the main difference from how usual profiling tools work: the event timeline provides a view of event chaining that crosses thread and method boundaries, not just a pure stack trace.
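To make that concrete, here is a rough sketch (purely illustrative, not the tracer’s real data structures) of how keeping records keyed by the object pointer lets WAIT, EXEC and DONE recorded on different threads land in a single row:

#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative only: phase records keyed by the instrumented object's
// address, which is what ties WAIT/EXEC/DONE together across threads.
struct EventRecord {
  std::string name;        // e.g. "net::http::transaction"
  std::string resource;    // host + path bound via the *_COMPOUND_NAME call
  uint64_t    waitStart = 0;
  uint64_t    execStart = 0;
  uint64_t    done      = 0;
};

class TracerStore {
public:
  EventRecord& RecordFor(const void* aObject) {
    std::lock_guard<std::mutex> lock(mLock);
    return mRecords[aObject];   // one row per instrumented object
  }

private:
  std::mutex mLock;
  std::unordered_map<const void*, EventRecord> mRecords;
};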

View details of a tracked event

Each event track is bound to, e.g., the URL it deals with, where applicable.  You can inspect the URL (the associated resource) and more details by clicking on an event:

[Screenshot: event detail panel in about:timeline]

Wait between shows the time the event spent waiting, i.e. the time the event was scheduled and the time its execution started, both measured from the moment time tracking was turned on.  The number in parentheses is simply the wait phase duration.

Execution shows the time spent executing, i.e. when the intended job itself started and when it actually finished.  The parenthesized number is how long the job execution took.

Posted from is the name of the thread the event was scheduled from.

The Filter button is used to quickly filter for this particular event plus its sister events.  How it works is described below.

The Zero time button is used to shift the “time 0.000” of the timeline to the start time of the inspected event, so that you can inspect the recorded timing of other events relative to this particular one.

The mxr link opens search results for the event name in the code.  This way you can quickly inspect how this event’s timing is actually instrumented and collected, right in the code.

Filtering timeline events

You can filter events using two filtering functions:

  • By the type of an event (e.g. “net::http::transaction”, “docshell::pageload”, etc.)
  • By the name of the resource an event has been associated with (e.g. “www.mozilla.org”, “/favicon.ico”, etc.)

Filter by type – the following event types have been implemented (instrumented) so far.  You get this checkbox list after a tracer run when you click filter events in the top bar:

[Screenshot: event type filter list in about:timeline]

Each event has a namespace; e.g. for “net::http::transaction” the namespaces are “net::” and “net::http::”.  You can easily turn the whole set of events in a namespace on or off.  Currently only the “net::” and “docshell::” top-level namespaces are worth mentioning.

Filtering by resource, i.e. usually the URL a particular event or set of events has been bound to, is also possible when you click filter resource:

[Screenshot: resource filter in about:timeline]

You can enter the filter string manually or use the provided autocomplete.  The by-resource filtering works as follows:

1. we inspect whether the string set as the filter is a substring of the event resource string
2. we inspect whether the event resource string is a substring of the string set as the filter

If either condition holds, we display the event; otherwise we hide it.

This way when you enter “http://www.mozilla.org/favicon.ico” as a filter, you see the http channel and transaction for this resource load, as well as any DNS and TCP connection setups related to “www.mozilla.org”.
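The matching rule itself is simple; a minimal sketch (not the extension’s actual code) could look like this:

#include <string>

// Sketch of the rule described above: show an event when either string
// is a substring of the other, hide it otherwise.
static bool MatchesResourceFilter(const std::string& filter,
                                  const std::string& eventResource)
{
  return eventResource.find(filter) != std::string::npos ||  // filter inside resource
         filter.find(eventResource) != std::string::npos;    // resource inside filter
}

With the favicon example above, the “www.mozilla.org” DNS and TCP events match because the resource string is a substring of the entered filter.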

 

How to use about:timeline yourself

  • Create an optimized build of Firefox with the --enable-visual-event-tracer configure option
  • Install the about:timeline extension
  • Run Firefox
  • Go to about:timeline
  • Press the orange [ Start ] button; you get a message that the trace is running
  • Proceed with your intended actions, e.g. a page load, and let it finish
  • Return to the about:timeline tab and press the [ Stop ] button
  • Wait a little to get your event timeline details

 

So, here it is.  It’s a work in progress, though.  I’ll keep posting updates.

Firefox 23 now has a faster localStorage implementation

Finally, the more efficient implementation of the localStorage DOM API has landed and will stick in Firefox 23, currently on the Nightly channel.  The first build with the patches has id 20130416030901.

More about the motivation and details can be found in my previous post on the DOM storage code overhaul.

I want to thank all involved people, namely Vladan Djeric, Marco Bonardo and Olli Pettay for their help, ideas and mainly reviews of this large change.  Thanks!

 

 

New faster localStorage in Firefox 22

LocalStorage, a simple web API for storing persistent key/value pairs, is a favorite among web developers for its ease of use, thanks to its synchronous design.  But in current versions of Firefox it is one of the most serious culprits of UI jank.  The browser UI may simply stop reacting for a short time when localStorage data is written or read, mainly on mobile devices with poor flash memory performance.

A month ago I started rewriting the DOM storage code in the Mozilla platform from scratch.  Besides the performance and memory consumption problems, the other motivation is simply a strong need for code cleanup.  The new implementation is currently under a pending review to get it into the tree in mozilla bug 600307.  It’s planned to land in Firefox 22, since 21 will soon transition to the Aurora channel.

The main difference in the new implementation is that writes, and also reads, are completely moved to a background thread, so they will not freeze your browser.  When a web content script touches localStorage, it only works with a memory cache, so the UI cannot be blocked waiting for disk writes to complete.  All data changes are posted to a background I/O thread and flushed there at regular short intervals.
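As a heavily simplified sketch of that shape (my own illustration, not the Gecko code): the caller’s thread only touches an in-memory map, while dirty keys are flushed by a background thread at short intervals.  The class, names and the five-second interval below are all made up:

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Heavily simplified sketch of the idea, not the Gecko implementation:
// callers only touch the in-memory cache; dirty keys are written out by
// a background thread in regular short intervals.
class StorageCacheSketch {
public:
  StorageCacheSketch() : mShutdown(false),
                         mFlusher(&StorageCacheSketch::FlushLoop, this) {}

  ~StorageCacheSketch() {
    { std::lock_guard<std::mutex> lock(mLock); mShutdown = true; }
    mCondVar.notify_one();
    mFlusher.join();
  }

  // getItem(): served purely from memory, never blocks on disk writes.
  std::string GetItem(const std::string& aKey) {
    std::lock_guard<std::mutex> lock(mLock);
    auto it = mCache.find(aKey);
    return it != mCache.end() ? it->second : std::string();
  }

  // setItem(): update the cache and remember the key as dirty.
  void SetItem(const std::string& aKey, const std::string& aValue) {
    std::lock_guard<std::mutex> lock(mLock);
    mCache[aKey] = aValue;
    mDirtyKeys.push_back(aKey);
  }

private:
  void FlushLoop() {
    std::unique_lock<std::mutex> lock(mLock);
    while (!mShutdown) {
      // Flush in regular short intervals (the interval value is made up).
      mCondVar.wait_for(lock, std::chrono::seconds(5));
      std::vector<std::string> keys;
      keys.swap(mDirtyKeys);
      lock.unlock();
      WriteKeysToDisk(keys);   // stand-in for the real database write
      lock.lock();
    }
  }

  void WriteKeysToDisk(const std::vector<std::string>&) {}

  std::mutex mLock;
  std::condition_variable mCondVar;
  bool mShutdown;
  std::unordered_map<std::string, std::string> mCache;
  std::vector<std::string> mDirtyKeys;
  std::thread mFlusher;
};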

However, before localStorage can be used, the data for the page origin has to be loaded into the memory cache.  I implemented an early pre-load of the data when we start navigating to a web page.  This is so far trivial: when the prefetch doesn’t manage to load all the data quickly enough, we block the UI on access to localStorage until it is done.  There are obvious and less obvious reasons to load all the data: if you want to read a key, it simply needs to be loaded from disk first, but there is also quota usage checking when data is added or modified, and the StorageEvent that provides the previous key value before a modification.

So, there is still room for more optimizations here, which I plan as follow-up work to the core patch.  To explain briefly: the prefetch on the background thread pushes localStorage keys and their values into the cache one by one, and access to localStorage is blocked until all the data is loaded.  To make that time as short as possible, I plan the following further optimizations (a sketch of the first one follows this list):

  • Obviously, when reading a single key and that key is already in the cache, just provide it without waiting for the rest of the data to load.
  • However, when a key is not in the cache, we have two options: wait just for that one key to get loaded or, when WAL mode is enabled on the SQLite database connection, read just that one key synchronously from the database.  Both approaches, however, have to block the UI.  Fortunately, our telemetry data (which I must thank you all for submitting!) shows that reads from the database are generally very fast, even for a whole domain’s data.
  • When a key is about to be written, we don’t need to block, yet quota checking is the enemy here; we must lower our demands on its precision slightly.  Another complication is the StorageEvent that has to fire after a data modification and report the previous value of the modified key.  Fortunately, the StorageEvent can be asynchronous, so we can simply fire it later, after the key value has been loaded from the database.
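A minimal sketch of the first bullet above (illustrative only, not the patch): answer from the cache immediately when the key is already there, and block only while the key is absent and the preload is still running.

#include <condition_variable>
#include <mutex>
#include <string>
#include <unordered_map>

// Sketch of the planned optimization, not the actual code: a reader waits
// only while its key is missing AND the background preload may still add it.
class PreloadingCacheSketch {
public:
  std::string GetItem(const std::string& aKey) {
    std::unique_lock<std::mutex> lock(mLock);
    mLoadDone.wait(lock, [&] {
      return mLoaded || mCache.count(aKey) != 0;
    });
    auto it = mCache.find(aKey);
    return it != mCache.end() ? it->second : std::string();
  }

  // Called from the background preload thread for each key/value read.
  void PreloadedKey(const std::string& aKey, const std::string& aValue) {
    { std::lock_guard<std::mutex> lock(mLock); mCache[aKey] = aValue; }
    mLoadDone.notify_all();
  }

  // Called once the whole origin's data has been read from the database.
  void PreloadFinished() {
    { std::lock_guard<std::mutex> lock(mLock); mLoaded = true; }
    mLoadDone.notify_all();
  }

private:
  std::mutex mLock;
  std::condition_variable mLoadDone;
  bool mLoaded = false;
  std::unordered_map<std::string, std::string> mCache;
};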

Before I jump into writing these further changes, I first want to collect telemetry data on which operations we may still block on, and for how long.

I wrote a fairly technical overview of how the whole new implementation works at this MDC document.

I want to thank Vladan Djeric for his help with this work.  He is the one reviewing that huge patch, and he also inspired some aspects of the new implementation, like WAL usage and task batching, through his work on an intermediate localStorage optimization in bug 807021.

I’ve made an experimental Firefox Nightly build with my patch.  I intend to use it to get some initial telemetry data.  If anyone is brave enough and wants to help, feel free to install it as well, enable telemetry in about:telemetry, and report any crashes that may happen.  I will sync these builds with the latest Firefox Nightlies from time to time and post links to them here.

Disclaimer: the patch has not gone through a code or security review, only through Mozilla’s automated testing.  Any data loss is at your own risk!

The latest (updated for the fifth time) experimental builds, based on Mar 15 2013 mozilla-central code, can be found here: for Windows, Linux (64-bit) and Mac.