Moz logging (former NSPR logging) file now has a size limit option

There are a lot of cases, mainly networking issues, that are so hard to reproduce that users have to keep Firefox running for long hours to hit the problem.  Logging is great for bringing the information to us once the problem is finally reproduced, but after a few hours the log file can be, well, huge.  Easily even gigabytes.

But now we have a size limit option.  Adding the rotate module to the list of modules engages the log file size limit:

MOZ_LOG=rotate:200,log modules...

The argument is the limit in megabytes.

This will produce up to 4 files, with a numbered extension appended to their names: .0, .1, .2, .3.  The logging back end cycles through the files it writes to, and the sum of these files’ sizes will never go over the specified limit.

The patch just landed on mozilla-central (version 51), bug 1244306.

Note 1: the file with the largest number is not guaranteed to be the last file written.  We don’t move the files, we only cycle through them.  Using the rotate module automatically adds timestamps to the log, so it’s always easy to recognize which file holds the most recent data.

Note 2: rotate doesn’t support append.  When you specify rotate, all the files (including any previous non-rotated log file) are deleted on every start to avoid mixing information from different runs.  If the append module is also specified, it is ignored.
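To make the cycling behavior more concrete, here is a minimal C++ sketch of the idea. It is not the actual Gecko implementation: the class name RotatingLog, the even split of the limit across the four files, and the in-memory strings standing in for the files on disk are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical model of a rotating log: four slots stand in for the files
// with the .0 to .3 extensions. Not the real Gecko logging back end.
class RotatingLog {
public:
  static constexpr std::size_t kFiles = 4;

  // Assumption: the total limit is split evenly between the files.
  explicit RotatingLog(std::size_t aLimitBytes)
    : mPerFileLimit(aLimitBytes / kFiles) {}

  void Write(const std::string& aLine) {
    // When the current slot would overflow, move on to the next slot and
    // truncate it. Files are never renamed or moved, only cycled.
    if (mFiles[mCurrent].size() + aLine.size() > mPerFileLimit) {
      mCurrent = (mCurrent + 1) % kFiles;
      mFiles[mCurrent].clear();
    }
    // Assumption: a single line is never longer than the per-file limit.
    mFiles[mCurrent] += aLine;
  }

  // Index of the slot written most recently.
  std::size_t CurrentIndex() const { return mCurrent; }

  // Sum of the sizes of all slots.
  std::size_t TotalSize() const {
    std::size_t total = 0;
    for (const std::string& file : mFiles) {
      total += file.size();
    }
    return total;
  }

private:
  std::size_t mPerFileLimit;
  std::size_t mCurrent = 0;
  std::array<std::string, kFiles> mFiles;
};
```

After enough writes the current index wraps around to 0 again, which is why the file with the largest number is not necessarily the newest one.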

Illusion of atomic reference counting

Regardless of whether an object’s reference counter is atomic, there is one major problem when a single RefPtr holding it is re-assigned and read concurrently.

Here I’ll explain it on a simple example. Note that we are in a method of a single object the whole time, and Type has a ThreadSafeAutoRefCnt reference counter, in Mozilla code-base terms:

RefPtr<Type> local = mMember; // mMember is RefPtr<Type>, holding an object

And then another piece of code, on a different thread:

mMember = new Type(); // mMember's value is rewritten with a new object

Usually, people believe this is perfectly safe. But it’s far from it. Just break this down into the actual atomic operations and put the two threads side by side:

Thread 1

local.value = mMember.value;
/* context switch */
.
.
.
.
.
.
local.value->AddRef();

Thread 2

.
.
Type* temporary = new Type();
temporary->AddRef();
Type* old = mMember.value;
mMember.value = temporary;
old->Release();
/* context switch */
.

The same applies to clearing a member (or a global, while we are at it) while some other thread may try to grab a reference to it:

RefPtr<Type> service = sService; // sService is a RefPtr
if (!service) return; // service being null is our 'after shutdown' flag

And another thread doing, usually during shutdown:

sService = nullptr; // while sService was holding an object

And here is what actually happens:

Thread 1

local.value = sService.value;
/* context switch */
.
.
.
.
local.value->AddRef();

Thread 2

.
.
Type* old = sService.value;
sService.value = nullptr;
old->Release();
/* context switch */
.

And where is the problem? Clearly, if the Release() call on the second thread is the last one on the object, the AddRef() on the first thread will do its job on a dying or already dead object, not to mention any further access through the bad pointer.

The only correct way is to have both in and out assignments protected by a mutex, or to ensure that nobody can be trying to grab a reference from a globally accessed RefPtr while it’s being finally released or just re-assigned. The latter may not always be easy or even possible.
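A minimal sketch of the mutex-protected variant follows. It uses std::shared_ptr and std::mutex from the standard library as stand-ins for RefPtr and the Mozilla mutex classes, since the same rule applies there: the reference count is atomic, but reading and writing one shared_ptr instance concurrently is still a data race. The names Holder, Get and Set are hypothetical.

```cpp
#include <memory>
#include <mutex>

struct Type {
  int value = 0;
};

// Hypothetical owner of the shared member; std::shared_ptr stands in for
// RefPtr<Type> here.
class Holder {
public:
  // Reader: the pointer copy and the reference count increment happen as
  // one step under the lock, so no writer can release the object between.
  std::shared_ptr<Type> Get() {
    std::lock_guard<std::mutex> lock(mLock);
    return mMember;
  }

  // Writer: swap the pointers under the lock. The previous object ends up
  // in aNew and is released only after the lock has been dropped.
  void Set(std::shared_ptr<Type> aNew) {
    {
      std::lock_guard<std::mutex> lock(mLock);
      mMember.swap(aNew);
    }
    // aNew now holds the old object; its reference goes away on return.
  }

private:
  std::mutex mLock;
  std::shared_ptr<Type> mMember;
};
```

Releasing the old object outside the lock keeps a potentially expensive destructor from running while other threads wait for the mutex.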

Anyway, if somebody knows a way to solve this universally without using an additional lock, I would be really interested!

Backtrack meets Gecko Profiler

Backtrack is an upcoming performance tool focused on revealing and solving scheduling and delay problems.  Those are big performance offenders, very hard to track, and hidden from conventional profilers.

To find out how long it takes and what all has to happen to reach a certain point, an objective, just add a simple instrumentation marker.  When hit during a run, it’s added to a list you can then pick from and start tracing back to its origin.  Backtrack follows from the selected objective back to the originating user input event that started the whole processing chain.

The walk-back crosses runnables and their wait time in thread event queues, but also network requests and responses, any code-specific queues such as DOM mutations, scheduled reflows or background JS parsing 1), monitor and condvar notifications, mutex acquisitions 2), and disk I/O operations.

Visually, the result is a single timeline (we can call it a critical path) revealing wait, network and CPU times as distinct intervals involved in reaching solely the selected objective.  Spotting dispatch wait delays in particular is then very easy.  The most important and new thing is that Backtrack tells you what other operations or events block (make the critical path wait) and where they have been scheduled from.  And more importantly, it recognizes which of them are (or are not) related to reaching the selected objective.  Those not related are then clear candidates for rescheduling.

To distinguish related and unrelated operations, Backtrack captures all sub-tasks that are involved in reaching the selected objective.  A good example is the page first paint time, which is actually the unsuppression of painting.  First paint is blocked by loading more than one resource: the HTML and the CSS and JS referenced from the head.  These loads and their processing, the sub-tasks, happen in parallel and only completion of all of them unsuppresses the painting (said in a very simplified way, of course).  Each such sub-task’s completion is marked with added instrumentation.  That creates a list of sub-objectives that are then added to the whole picture.
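The walk-back itself can be pictured with a small C++ sketch. This is purely illustrative and not Backtrack's real data model: the Step record, its fields, and the WalkBack function are hypothetical, and network time is left out to keep it short.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one captured step, such as a runnable.
struct Step {
  int origin;            // index of the step that scheduled this one,
                         // -1 for the originating input event
  int64_t dispatchedMs;  // when the step was put into a queue
  int64_t startedMs;     // when it actually began to run
  int64_t finishedMs;    // when it finished
};

struct PathTimes {
  int64_t waitMs = 0;  // time spent waiting in queues
  int64_t cpuMs = 0;   // time spent running
  int steps = 0;       // number of steps on the critical path
};

// Follows the origin links from the objective back to the input event and
// sums up the wait and run intervals along the way. Steps that are not on
// this chain are never visited.
PathTimes WalkBack(const std::vector<Step>& aSteps, int aObjective) {
  PathTimes total;
  for (int i = aObjective; i >= 0; i = aSteps[i].origin) {
    const Step& step = aSteps[i];
    total.waitMs += step.startedMs - step.dispatchedMs;
    total.cpuMs += step.finishedMs - step.startedMs;
    ++total.steps;
  }
  return total;
}
```

A long unrelated step that ran while a step on the chain sat in the queue shows up only indirectly here, as a large wait time on that step.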

Screen shot of how Backtrack is integrated to the Gecko Profiler Cleopatra web UI

Future improvements:

  • Backtrack could be used in our performance automation.  Besides calculating the time between an objective and its input source event, it can also calculate CPU vs dispatch delays vs network response time.  It could also be able to filter out code paths clean of any outer jitter.
  • Indeed, networking has a strong influence on load times.  Adding a more detailed breakdown and analysis of how well we schedule and allocate network resources is one of the next steps.
  • Adding PCAPs, or even letting Backtrack capture network activity like Wireshark directly from inside Firefox and joining it with the Gecko Profiler UI, might help too.

Backtrack development is a work in active progress and the tool is not yet available to users of Gecko Profiler.  There are patches for Gecko, but also for the Cleopatra UI and the Gecko Profiler Add-on.  The UI changes, where the analysis also happens, are mostly prototype-like and need a clean up.  There are also problems with larger memory consumption and bigger chances to hit OOMs when processing the captured data with Backtrack captured markers.


1) code specific queues need to be manually instrumented
2) with ability to follow to the thread that was keeping the mutex for the time you were waiting to acquire it

 

Filter errors out of a Mozilla build on the command line

Mozilla build errors filtered once again under the build log

I wrote a small filter script that lists all build errors once again at the bottom of the whole build log, so that you don’t have to look for them like for a needle in a haystack. Something I wanted mach to do natively.  But I was always told something like “filter yourself”.  So here it is :)

  • Download this small script *)
  • Copy it to your source directory or somewhere your $PATH points to
  • Run mach as: ./mach build | err

When there are errors during build, those will be listed under the build log and conveniently highlighted.

*) It’s tuned for and mainly targeting mingw, but might well work on Linux/OS X too.

INACCESSIBLE BOOT DEVICE on Windows 10 boot after update of the Intel Rapid Storage Technology driver

Have a BSOD after you’ve installed the latest version of Intel RST from Intel’s download center? During the boot, staring at the Windows logo and the spinning wheel, after a minute or two getting just INACCESSIBLE_BOOT_DEVICE error and “we must reboot” message?  Restarts, nothing helps?  Yeah, I’ve been there.

How to fix the BSOD

Note: this can be applied only when you have updated the driver from a previously working version, since it relies on the previous driver file still being stored on your disk.

  • Check that your BIOS and RAID settings are as expected, since I once encountered an iRST update that screwed them up and actually turned off RAID!
  • During boot hold F8; if this doesn’t work for you, you need to “Create a recovery drive” on a USB stick and boot from that
  • Choose Troubleshoot
  • Choose Advanced options
  • Choose Command Prompt, a command prompt window, as you know it, should open
  • My system drive was mounted as E:; if yours is mounted elsewhere, replace E: in the commands below with that letter
  • At the prompt type:
    cd /d E:\Windows\system32\drivers
    ren iaStorA.sys iaStorA.sys-bad-version
    cd ..
    dir /s iaStorA.sys
    
  • That will list something like this:
     Volume in drive C has no label.
    Volume Serial Number is XXXX-XXXX
    
     Directory of E:\Windows\System32\drivers
    
    07/29/2015  19:44         1,462,720 iaStorA.sys
                   1 File(s)      1,462,720 bytes
    
     Directory of E:\Windows\System32\DriverStore\FileRepository\iastorac.inf_amd64_26544f4e51074f52
    
    05/28/2014  10:10           672,104 iaStorA.sys
                   1 File(s)        672,104 bytes
    
     Directory of E:\Windows\System32\DriverStore\FileRepository\iastorac.inf_amd64_61378e65f4f142a0
    
    07/29/2015  19:44         1,462,720 iaStorA.sys
                   1 File(s)      1,462,720 bytes
    
         Total Files Listed:
                   3 File(s)      2,806,928 bytes
     
     
    
  • For me the previous working driver file is apparently in the directory below; yours can be elsewhere, so update the source directory in the copy command accordingly:
    DriverStore\FileRepository\iastorac.inf_amd64_26544f4e51074f52

  • Continue by typing the following command; you should still be in the E:\Windows\System32 directory:
    copy DriverStore\FileRepository\iastorac.inf_amd64_26544f4e51074f52\iaStorA.sys drivers\iaStorA.sys
  • You should see a “1 file(s) copied” message
  • Type exit
  • Now reboot normally
  • Good luck!

After that the Intel RST service still starts up, shows the status etc, disks are fine, everything seems to work, even after another (deliberate) reboot. The only weird thing was that System Restore for the C: drive was turned off. Not sure if it was caused by the RST update, by the boot problems, by some of my other manual changes (not listed here) or what. Fortunately, re-enabling it works well, so not an issue.

It is still a good idea to roll back the driver from the Device Manager (Storage Controllers/Intel(R) *** SATA RAID Controller/Properties/Driver/Roll Back Driver) to put the registry records back into the correct state.

Known iRST BIOS and iRST driver combinations

driver ↓       BIOS 11.1.0.1413   BIOS 14.8.0.2377
13.1.0.1058    works              unknown
14.0.0.1143    works              unknown
14.8.0.1042    BSOD               unknown
15.2.0.1020    BSOD               works

Dear readers, if you know about other version combinations, please let me know so that we can fill this table with more data. Thanks in advance!