After some two months of coding, Michal Novotný and I are close to having a first "private testing" build of the new, simplified HTTP cache back-end that is stable enough to try.
The two main goals we've met are:
- Be resilient to crashes and process kills
- Get rid of any UI hangs or freezes (a.k.a. jank)
We've abandoned the current disk format and now use a separate file for each URL, however small it is. Each file carries self-check hashes to verify its own integrity, so no fsyncs are needed. Everything is asynchronous or fully buffered. A single background thread performs all I/O: opening, reading, and writing. On Android we write to the context cache directory, so the cached data are actually treated as cache.
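The real back-end is C++ inside Gecko, but the self-check-hash idea can be sketched roughly in Python. Note that the chunk size, hash algorithm, and on-disk layout below are my own assumptions for illustration, not the actual cache file format:

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, for illustration only

def write_with_hashes(path, data):
    """Write data in chunks, each prefixed by its length and hash, so the
    file verifies itself on read -- no fsync needed; a torn or partial
    write is simply detected later and the entry discarded."""
    with open(path, "wb") as f:
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            f.write(len(chunk).to_bytes(4, "big"))
            f.write(hashlib.sha1(chunk).digest())
            f.write(chunk)

def read_with_hashes(path):
    """Read the chunks back, verifying each hash; raise on corruption."""
    out = bytearray()
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            size = int.from_bytes(header, "big")
            digest = f.read(20)
            chunk = f.read(size)
            if len(chunk) != size or hashlib.sha1(chunk).digest() != digest:
                raise IOError("corrupt cache entry: " + path)
            out.extend(chunk)
    return bytes(out)
```

The point is that corruption from a crash or a killed process is detected lazily at read time, instead of being prevented eagerly with expensive fsyncs.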
I've performed some first tests using http://janbambas.cz/ as a test page; as I write this post, it contains some 460 images. Testing was done on a relatively fast machine, but the important thing is to differentiate by storage speed. I had two extremes available: an SSD and an old, slow-as-hell microSD card in a USB reader.
Testing with a microSD card:
| | Full load | First paint |
|---|---|---|
| mozilla-central | 16s | 7s |
| new back-end | 12s | 4.5s |
| new back-end and separate threads for open/read/write | 10.5s | 3.5s |
| | Full load | First paint |
|---|---|---|
| mozilla-central | 7s | 700ms |
| new back-end | 5.5s | 500ms |
| new back-end and separate thread for open/read/write | 5.5s | 500ms |
| | Full load | First paint |
|---|---|---|
| mozilla-central | 900ms | 900ms |
| new back-end | 400ms | 400ms |
| | Full load | First paint |
|---|---|---|
| mozilla-central | 5s | 4.5s |
| new back-end | ~28s | 5-28s |
| new back-end and separate threads for open/read/write *) | ~26s | 5-26s |
*) Here I'm getting unstable results. I'm doing more testing with more concurrent open and read threads. It seems there is not much effect and the jitter in the time measurements is just noise.
I will report more on concurrent thread I/O in a different post later, since I find it quite an interesting space to explore.
Clearly, the cold "type and go" test case shows that block-files beat us here. But the big difference is that the UI is completely jank-free with the new back-end!
Testing on an SSD disk:
The results are not that different between the current and the new back-end; there is only a small regression in the warmed and cold "go to" test cases:
| | Full load | First paint |
|---|---|---|
| mozilla-central | 220ms | 230ms |
| new back-end | 310ms | 320ms |
| | Full load | First paint |
|---|---|---|
| mozilla-central | 600ms | 600ms |
| new back-end | 1100ms | 1100ms |
Having multiple threads seems not to have any effect, as far as the precision of my measurements goes.
At this moment I am not sure what causes the regression in both "go to" cases on an SSD, but I believe it's just a matter of some simple optimizations, like delivering more than just 4096 bytes per thread loop as we do now, or the fact that we don't cache redirects; that's a known bug right now.
Still here and want to test it yourself? Test builds can be downloaded from the 'gum' project tree. Disclaimer: the code is very, very experimental at this stage, so use at your own risk!
These are very interesting results, thank you for posting the update :)
- How are you measuring the "full load" and "first paint" times? Are you using about:timeline?
- Is full page load with the new cache really 5 times slower in the microSD / "type URL and go, cached but not warmed" case?
- Are you planning to measure the performance benefits of the other new cache features listed in https://wiki.mozilla.org/Necko/Cache/Plans#Primary_Design_Goals ?
Yes, I'm using about:timeline, intensively :) I've just locally added a probe for first-paint.
The cold load really is that much slower. More testing I've just done shows that having multiple threads for I/O doesn't really help, at least on the Windows system, the microSD card, and the reader I possess.
I was thinking about it more, and I believe we just need better scheduling of operations on the I/O thread. We will probably need our own, smarter event queue implementation. Prioritizing by resource type (.html and .css first) is IMO a must. What actually slows things down when loading my homepage is the huge number of images, plus one .ogg audio file that I often see blocked in the file-open queue. That alone prolongs the time the progress indicator keeps spinning, which is the overall load time, but not the actual content load time. I often find the rendered page fully loaded (all images are there) while the progress indicator still spins; about:timeline then gives me the answer: the .ogg file has been blocked for some 20 seconds. Sometimes a CSS file is blocked this way. I've seen one scheduled to open 1.2 seconds after load start but actually open 24 seconds later (!), while the read itself was then instant. That pushes out first paint time really a lot.
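The kind of resource-type prioritization described above can be sketched minimally as follows. The priority values and queue shape are my own illustration, not the actual Necko event queue:

```python
import heapq
import itertools

# Hypothetical priorities: lower number = file opened sooner.
PRIORITY = {".html": 0, ".css": 0, ".js": 1, ".png": 2, ".jpg": 2, ".ogg": 3}
DEFAULT_PRIORITY = 2

class CacheIOQueue:
    """Priority queue for cache file-open operations, so a slow .ogg
    open cannot delay a .css open that was scheduled behind it."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-breaker within a priority

    def schedule(self, filename):
        ext = "." + filename.rsplit(".", 1)[-1] if "." in filename else ""
        prio = PRIORITY.get(ext, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (prio, next(self._order), filename))

    def next_operation(self):
        return heapq.heappop(self._heap)[2]
```

With a plain FIFO queue, an .ogg file scheduled early blocks everything behind it; with the priority queue, the .css and .html opens jump ahead regardless of arrival order.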
When getting from the network is faster than loading from a (cold) cache, you've lost.
I know. I believe the better scheduling I talk about in one of the comments will improve it. It's our next task to work on. I'll report back with the results we get.
I am going to nit-pick; it might be just me, but I find the tables hard to read. I would have expected a % difference between each branch, so the rows and columns should swap, with a % against the main branch as well.
While it was nice to see a huge improvement in the slow I/O scenario, it doesn't look like the SSD benefits, so maybe the bottleneck is somewhere else.
I think I know what the bottlenecks are. I also refer to them in the post and in the comments here. To sum up: better I/O scheduling, caching redirects too, and delivering data that is already in memory to consumers faster.
Do you think it would be a good idea to do additional, SSD-only optimizations if an SSD is detected on the system? (Like reading and writing multiple files at once, to exploit the parallel architecture of SSDs?)
Also, rather than an SSD or a microSD card, could you do some tests on 5400/7200 RPM HDDs?
Re SSD: sure, but there is much more to optimize. I will do additional and separate research in this area.
Re HDD: yes, I plan to do it. If results are interesting, I'll update the post with additional numbers.
> like reading and writing multiple files at once, to exploit the parallel architecture of SSD’s
Don't you have this advantage on HDDs too, thanks to NCQ?
Something to explore, for sure. Though it probably will not happen for the first version.
Important work, because when I remove everything from the recent apps menu on my tablet (Android 4.1.1), the cache gets corrupted and starts over. With a data package of only 8 GB, that becomes annoying.
You are on the right track, at least on mobile.
I got the gum nightly build, but it's the same: when I close Firefox from recent apps on my Samsung, the cache service restarts.
I think I know what could be the cause. The latest work on the Gum tree has the new cache back-end disabled. I'm now shaping the source code to a state where we can check the whole work in on mozilla-central (a.k.a. Firefox Nightly) with the old cache still in use. There is then a preference you can flip to enable the new cache code.
Please install the very latest gum builds (https://tbpl.mozilla.org/?tree=Gum) and manually set the browser.cache.use_new_backend preference to 1 in about:config. Then try your test case again and let us know!
Also, I'd be interested to know how you recognize that the cache is completely gone. Is it just by page loads being slow after restart?
Well, the problem is:
When I tap the recent apps menu on my tablet and tap "remove all apps", the Firefox cache starts over.
I hit that many times on Firefox 23, and then I started to search for the source of the problem.
I did a test; it's not scientific, but it confirms it.
If you add a quit button via add-ons, close Firefox from the quit button, and check the about:cache page, the cache is always there.
But if you remove Firefox from recent apps (while the cache service is working, writing data, etc.), the cache sometimes gets corrupted and you see 0 kB on about:cache.
I did what you told me and grabbed the latest gum build, and I started trying it; I will report back after heavy use of force-kill while Firefox is working. Opera and Chrome have no issues like this.
Let's see what happens.
Thanks for your reply.
Killing the Firefox process (the running app) is the same as if the process had crashed. That is a known symptom of the old (current) cache back-end, and exactly what this work (the new cache) is about to solve.
Note that about:cache DOESN'T work when the new cache back-end is enabled, so don't be confused by seeing 0 kB under about:cache when using the new back-end.
The way to check that the cache remains intact after killing (a.k.a. removing) all apps is to look under Settings -> Applications -> Downloaded -> Nightly -> Cache. If you browse a bit with the new cache enabled, you will see the Cache occupation become non-zero. After "remove all apps", the size of the Cache has to remain unchanged.
Well, I know your cache approach is promising.
I saw this and understood that the new back-end is working:
wyciwyg://0/about:blank
I got the latest gum build (26.0a1) and now I am happy, but the file selector is not working and Nightly crashes if I select a file to upload. Is that something about the new back-end, or a general problem?
Gum solved a big issue for me, because I love the Firefox interface on my 7-inch tablet and have a limited data package.
Do you have a chance to build a new-back-end-enabled stable release of 23? Anyway, gum rocks.
Good work.
Thanks for a quick reply. I like the "gum rocks" line :D Thanks!
The file selector crash is not a problem with the new cache, but a known bug. It's reproducible with an unmodified Nightly built from the gum base revision (https://tbpl.mozilla.org/?rev=218d4334d29e), which is almost a week old. It may already have been fixed by now; I plan a regular rebase next week.
There is a good chance the new cache will land in Firefox 27, disabled by default. Don't expect it any sooner, sorry. It might happen that we turn it on early for mobile, where users like you have the worst experience with the current cache.
I'll post updates.
Cheers.
Following you and your nightly gum builds here... I installed today's build and the file selector is still not working; they didn't fix it, I think.
No matter, thanks for gum and your daily builds; I hope it will be fixed in time, lol.
Respects.