Firefox 57 delays requests to tracking domains

Firefox Quantum (version 57) introduced a number of changes to the network request scheduler.  One of them uses data from the Tracking Protection database to delay loading of scripts from tracking domains, when possible, while a page is actively loading and rendering.  I call it tailing.

This has a positive effect on page load performance as we save some of the network bandwidth, I/O and CPU for loading and processing of images and scripts running on the site so the web page is complete and ready sooner.

Tracking scripts are not disabled; we only delay their load for a few seconds when we can.  Requests are kept on hold only while there are site sub-resources still loading, and only for up to about 6 seconds.  For scripts, the delay is engaged only when they are added dynamically or marked async.  Tracking images and XHRs are always delayed, as is any request made by a tracking script.  This is legal according to all HTML specifications, and it’s assumed that the functionality of well-built sites will not be affected.
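The rules above can be read as a small decision function.  What follows is only a minimal sketch of that reading, not the actual Firefox scheduler code: the Request struct, its fields, and the names IsTailable and ShouldHold are all hypothetical, and the 6 second cap is just the "about 6 seconds" mentioned in the text.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of a request; none of these names come from Firefox.
enum class Kind { Script, Image, Xhr };

struct Request {
  Kind kind;
  bool fromTrackingDomain;      // URL matches the Tracking Protection database
  bool asyncOrDynamic;          // script is async or was added dynamically
  bool issuedByTrackingScript;  // request was made by a tracking script
};

// "Up to about 6 seconds" per the text; the exact value is an assumption.
constexpr std::chrono::seconds kMaxTailDelay{6};

// Is this request a candidate for tailing at all?
inline bool IsTailable(const Request& r) {
  if (r.issuedByTrackingScript) {
    return true;  // anything a tracking script requests is delayed
  }
  if (!r.fromTrackingDomain) {
    return false;  // ordinary site resources are never delayed
  }
  switch (r.kind) {
    case Kind::Script:
      return r.asyncOrDynamic;  // sync scripts in the markup are not delayed
    case Kind::Image:
    case Kind::Xhr:
      return true;  // tracking images and XHRs are always delayed
  }
  return false;
}

// Keep the request on hold only while site sub-resources are still
// loading, and never longer than the cap.
inline bool ShouldHold(const Request& r, bool siteResourcesLoading,
                       std::chrono::milliseconds heldFor) {
  return IsTailable(r) && siteResourcesLoading && heldFor < kMaxTailDelay;
}

// Usage example.
inline void TailingExample() {
  const Request asyncTracker{Kind::Script, true, true, false};
  const Request syncTracker{Kind::Script, true, false, false};
  assert(ShouldHold(asyncTracker, true, std::chrono::milliseconds(500)));
  assert(!ShouldHold(syncTracker, true, std::chrono::milliseconds(500)));
  // Released once the page has finished loading its own resources.
  assert(!ShouldHold(asyncTracker, false, std::chrono::milliseconds(500)));
}
```

The sketch treats "held" as a question asked repeatedly by a scheduler; how and when the real scheduler re-evaluates held requests is not covered here.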

To make it clearer what exactly we do for site and tracking requests, this is roughly how scheduling looks when tailing is engaged:

[Figure: Firefox Quantum Tracker Tailing OK]

And here with the tailing turned off:

[Figure: Firefox Quantum Tracker Tailing OFF]

This is of course not without problems.  For sites that are either not well built, or whose rendering is influenced by scripts from tracking domains, there can be a visible or even functional regression.  Simply said, some sites need to be fixed to cope with this change in scheduling.

One example is Google’s Page-Hiding Snippet, which may cause a web page to stay blank for a whole 4 seconds after navigation start.  What happens?  Google’s A/B testing initially hides the whole web page with opacity: 0.  The test script first has to do its job of preparing the page for the test, and only then does it unhide the page content.  The test script is loaded dynamically by the analytics.js script.  Both analytics.js and the test script are loaded from www.google-analytics.com, a tracking domain, for which we engage the tailing delay.  As a result the page stays blank until whichever of the following happens first: the 4-second timeout elapses, or both scripts are loaded and executed.  To an ordinary user this appears as a performance regression, not a win.

Another example is a web page calling an API of an async tracking script from a sync script, which is obviously a race condition, since there is no guarantee that an async script loads before a sync script.  There is a real-life example of such a not-well-built site using a Twitter API, window.twttr.  The twttr object is simply not there when the site’s script calls it.  An exception is thrown and the rest of the site script is not executed, breaking some of the page’s functionality.  The affected web page worked before tailing only because Twitter’s servers were fast to respond and their script executed sooner than the site script using the window.twttr object.  Hence, it worked only by a lucky accident.  Note that sites with such race conditions are also broken 100% of the time when opened in Private Browsing windows or when Tracking Protection with just the default list is turned on.

To conclude on how useful the tailing feature is: unfortunately, at the moment I don’t have enough data to provide (it’s on its way, though).  So far testing was done mostly locally and on our internal Web Page Test infrastructure.  The effect was unfortunately hidden in the overall noise, hence more rigorous and wider testing needs to be done.

 

EDIT: Interesting reactions on www.bleepingcomputer.com and Hacker News.

 

(Note: a few somewhat off-topic comments have been trashed, in case you wonder why they don’t appear here; I will only accept comments that benefit the discussion of this feature and its issues.  Thanks for understanding.)

Mozilla Log Analyzer added basic network diagnostics

[Figure: Mozilla Log Analyzer objects search results]

A few weeks ago I published Mozilla Log Analyzer (logan).  It is a very helpful tool in itself for diagnosing our logs, but looking at log lines doesn’t answer what’s wrong or right with network request scheduling.  The lack of other tools, like Backtrack, makes informed decisions on many projects dealing with performance and prioritization hard or even impossible.  The same applies to verifying the changes.

Hence, I’ve added simple network diagnostics to logan to get at least some notion of how well we do with network request and response parallelization during a single page load.  It doesn’t track dependencies, meaning where exactly a request originates, such as which script added the DOM node leading to a new request (hmm… maybe bug 1394369 will help?), or everything that has to load to satisfy DOMContentLoaded or an early first paint.  That’s not within logan’s powers right now, sorry, and I don’t plan to invest much time in it.  My time will go to Backtrack.

But what logan can give us now is a breakdown of all requests that were opened and active before and during a request you pick as your ‘hero request.’  It may tell you what the concurrent bandwidth utilization was during the request in question, which lower-priority requests were scheduled, active or even done before the hero request, which requests were blocking the socket your request was finally dispatched on, and so on…

To obtain this diagnostic breakdown, use the current Nightly (at this time that’s Firefox 57) and capture logs from the parent AND the child processes with the following modules set:

MOZ_LOG=timestamp,sync,nsHttp:5,cache2:5,DocumentLeak:5,PresShell:5,DocLoader:5,nsDocShellLeak:5,RequestContext:5,LoadGroup:5,nsSocketTransport:5

(sync is optional, but you never know.)

Make sure you let the page you are analyzing load; it’s OK to cancel the load too.  It’s best to close the browser afterwards and only then load all the produced logs (parent + children) into logan.  Find your ‘hero’ nsHttpChannel.  Expand it and then click its breadcrumb at the top of the search results.  There is a small [ diagnose ] button at the top.  Clicking it brings you to the breakdown page, with a number of sections listing the selected channel and also all concurrent channels according to a few conditions I found interesting.

All of this is tracked on GitHub and open to enhancements.

Automatically attaching child and test-spawned Firefox processes in Visual Studio IDE

Did you ever dream of debugging Firefox in Visual Studio with all its child processes attached automatically?  Even when it is started externally from a test suite like mochitest or browsertest?  Tired of finding the right pid and the right time to attach manually?  Here is the solution for you!

A combination of the following two extensions to Visual Studio Community 2015 will do the trick:

  1. Spawned Process Catcher X – attaches automatically to all child processes the debuggee (and its children) spawns
  2. Entrian Attach – attaches the IDE automatically to an instance of a process spawned FROM ANYWHERE, e.g. when running tests via mach where Firefox is started by a python script – yes, magic happens ;)

Spawned Process Catcher X works automatically after installation without a need for any configuration.

Entrian Attach is easy to configure: in the IDE main menu go to TOOLS/Entrian Attach: Configuration… and you’ll get the following window:

UPDATE: It’s important to enter the full path for the executable.  The Windows API for capturing process spawning is stupid: it only takes the name of an executable, not a full path or wildcards.  Hence you can only specify names of the executable files you want Entrian Attach to automatically attach to.  Obviously, when Visual Studio is running with Entrian Attach enabled and you start your regular browser, it will attach too.  I’ve added the EntrianAttachEnableDisable toolbar button to the standard toolbar for a quick switch and status visibility.

Another important option is to set “Attach at process start when” to “I’m not already debugging its exe”.  Otherwise, when firefox.exe is started externally, a shim process is inserted between the parent and a child process, which breaks our security and other checks for expected pid == actual pid.  You would just end up with a MOZ_CRASH.

Note that the extension configuration and the on/off switch are per-solution.

The Entrian Attach developer is very responsive.  We’ve already cooked up the “When I’m not already debugging its exe” option to allow attaching to child processes without the inserted shim process; it took just a few days to release a fixed version.

Entrian Attach is shareware with a 10-day trial.  After that, a single developer license costs $29.  There are volume discounts available.  Hence, since this is so super-useful, Mozilla could consider buying a multi-license.  Anyway, I believe it’s money very well spent!

Moz logging (former NSPR logging) file now has a size limit option

There are a lot of cases, mainly networking issues, that are so rare that users have to keep their Firefox running for long hours to hit the problem.  Logging is great at bringing the information to us when the problem finally reproduces, but after a few hours the log file can be, well, huge.  Easily even gigabytes.

But now we have a size limit.  All you need to do is add the rotate module to the list of modules, which engages the log file size limit:

MOZ_LOG=rotate:200,log modules...

The argument is the limit in megabytes.

This will produce up to 4 files whose names get a numeric extension appended: .0, .1, .2, .3.  The logging back end cycles through the files it writes to, so the sum of their sizes never goes over the specified limit.
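To illustrate the cycling, here is a minimal model of it.  This is a sketch under assumptions, not the actual logging back end: I assume the limit is split evenly between the 4 files and that a file is started over from empty when the cycle comes back to it.  The class and method names (RotatingLogModel, FileForWrite) are made up for the example, and it only does the bookkeeping, no real file I/O.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// The text mentions the extensions .0 to .3, hence four files.
constexpr std::uint32_t kRotatedFiles = 4;

// Hypothetical bookkeeping model of a rotated log; no real I/O is done.
class RotatingLogModel {
 public:
  RotatingLogModel(std::string baseName, std::uint64_t limitMegabytes)
      : mBaseName(std::move(baseName)),
        // Assumption: each file gets an equal share of the total limit.
        mPerFileLimit(limitMegabytes * 1024 * 1024 / kRotatedFiles) {
    assert(limitMegabytes > 0);
  }

  // Accounts for `bytes` about to be written and returns the name of
  // the file that receives them.  Assumes a single write is never
  // larger than one file's share of the limit.
  std::string FileForWrite(std::uint64_t bytes) {
    if (mBytesInCurrent + bytes > mPerFileLimit) {
      // Move on to the next file in the cycle; a real back end would
      // truncate that file here so that it starts empty again.
      mCurrent = (mCurrent + 1) % kRotatedFiles;
      mBytesInCurrent = 0;
    }
    mBytesInCurrent += bytes;
    return mBaseName + "." + std::to_string(mCurrent);
  }

 private:
  std::string mBaseName;
  std::uint64_t mPerFileLimit;
  std::uint32_t mCurrent = 0;
  std::uint64_t mBytesInCurrent = 0;
};
```

With rotate:200 this model gives each file a 50 MB share, and the file written after .3 is .0 again.  Because files are reused in place rather than renamed, the highest number does not have to hold the newest data.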

The patch just landed on mozilla-central (version 51), bug 1244306.

Note 1: the file with the largest number is not guaranteed to be the last file written.  We don’t move the files, we only cycle.  Using the rotate module automatically adds timestamps to the log, so it’s always easy to recognize which file holds the most recent data.

Note 2: rotate doesn’t support append.  When you specify rotate, all the files (including any previous non-rotated log file) are deleted on every start to avoid mixing up information.  If the append module is also specified, it is ignored.

Illusion of atomic reference counting

Most people believe that having an atomic reference counter makes it safe to use RefPtr on multiple threads without any further synchronization.  The opposite may be true, though!

Imagine some simple code using our commonly used helper classes: RefPtr<> and an object Type with a ThreadSafeAutoRefCnt reference counter and standard AddRef and Release implementations.

Sounds safe, but there is a glitch most people may not realize.  See an example where one piece of code is doing this, no additional locks involved:

RefPtr<Type> local = mMember; // mMember is RefPtr<Type>, holding an object

And another piece of code then, presumably on a different thread:

mMember = new Type(); // mMember's value is rewritten with a new object

Usually, people believe this is perfectly safe.  But it’s far from it.

Just break this down into the actual atomic operations and put the two threads side by side:

Thread 1                        | Thread 2
--------------------------------+--------------------------------
local.value = mMember.value;    | .
/* context switch */            | .
.                               | Type* temporary = new Type();
.                               | temporary->AddRef();
.                               | Type* old = mMember.value;
.                               | mMember.value = temporary;
.                               | old->Release();
.                               | /* context switch */
local.value->AddRef();          | .

The same applies to clearing a member (or a global, while we are at it) while some other thread may be trying to grab a reference to it:

RefPtr<Type> service = sService; // sService is a RefPtr
if (!service) {
  return; // service being null is our 'after shutdown' flag
}

And another thread doing, usually during shutdown:

sService = nullptr; // while sService was holding an object

And here is what actually happens:

Thread 1                        | Thread 2
--------------------------------+--------------------------------
local.value = sService.value;   | .
/* context switch */            | .
.                               | Type* old = sService.value;
.                               | sService.value = nullptr;
.                               | old->Release();
.                               | /* context switch */
local.value->AddRef();          | .

And where is the problem?  Clearly, if the Release() call on the second thread is the last one on the object, the AddRef() on the first thread will do its job on a dying or already dead object.  The only correct way is either to have both the in and out assignments protected by a mutex, or to ensure that nobody can be trying to grab a reference from a globally accessible RefPtr while it is being finally released or just re-assigned.  The latter may not always be easy or even possible.
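The mutex variant could look roughly like the sketch below.  To keep it compilable without Mozilla headers it uses std::shared_ptr and std::mutex as stand-ins for RefPtr and our own mutex class, and the GuardedRef wrapper is a name made up for this example.  The point is only that reading the pointer and bumping the reference count happen as one step under the lock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Stand-in for a member or global RefPtr<Type> shared between threads.
// std::shared_ptr replaces RefPtr here only to keep the sketch
// self-contained.
template <typename T>
class GuardedRef {
 public:
  // Copies out a strong reference.  The pointer read and the reference
  // count increment both happen under the lock, so no release on
  // another thread can slip in between them.
  std::shared_ptr<T> Get() const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mValue;
  }

  // Replaces the held object, or clears it when given nullptr.
  void Set(std::shared_ptr<T> next) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mValue.swap(next);
    }
    // `next` now holds the previous object.  It is released when this
    // function returns, after the lock is gone, so a destructor that
    // calls back into Get() or Set() cannot deadlock.
  }

 private:
  mutable std::mutex mMutex;
  std::shared_ptr<T> mValue;
};

// Usage example mirroring the shutdown case from the text.
inline void GuardedRefExample() {
  GuardedRef<int> service;
  service.Set(std::make_shared<int>(42));

  std::shared_ptr<int> local = service.Get();  // safe against a concurrent Set()
  service.Set(nullptr);                        // e.g. during shutdown

  assert(local && *local == 42);  // our own reference keeps the object alive
  assert(!service.Get());         // later callers see the 'after shutdown' state
}
```

The price is a lock acquisition on every read, which is exactly the additional synchronization the atomic counter seemed to make unnecessary.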

Anyway, if somebody has a suggestion how to solve this universally without using an additional lock, I would be really interested!