In my previous post I wrote about the new cache backend and some of the very first testing.
We've now stepped further and there are significant improvements. I was also able to test on a wider variety of hardware this time.
The most significant difference is a single I/O thread with relatively simple event prioritization. Opening and reading urgent (render-blocking) files is done first, opening and reading lower-priority files comes after that, and writing is performed last. This greatly improves performance when loading from a non-warmed cache, and also first paint time in many scenarios.
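As a rough illustration of that prioritization (the class and priority names below are mine, not the actual Gecko code), the single I/O thread can be modeled as a priority queue that drains urgent reads first, then lower-priority reads, and writes last:

```python
import heapq
import itertools

# Priority classes; the lowest value is served first.
URGENT_READ, NORMAL_READ, WRITE = 0, 1, 2

class CacheIOThread:
    """Toy model of a single I/O thread with event prioritization."""
    def __init__(self):
        self._queue = []                  # heap of (priority, seq, fn)
        self._seq = itertools.count()     # preserves FIFO order within a class

    def dispatch(self, priority, fn):
        heapq.heappush(self._queue, (priority, next(self._seq), fn))

    def run(self):
        while self._queue:
            _, _, fn = heapq.heappop(self._queue)
            fn()

log = []
io = CacheIOThread()
io.dispatch(WRITE, lambda: log.append("write entry"))
io.dispatch(NORMAL_READ, lambda: log.append("read image"))
io.dispatch(URGENT_READ, lambda: log.append("read render-blocking css"))
io.run()
# The render-blocking read runs first, the write runs last,
# regardless of the order the events were dispatched in.
```

The point is purely the ordering: render-blocking resources never wait behind writes, which is what improves first paint on a cold cache.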
The numbers are much more precise than in the first post; my measuring is more systematic and careful now. Also, I've merged Gum with the latest mozilla-central code a few times, and that has certainly brought some improvements too.
Here are the results. I'm using a 50MB limit for keeping cached stuff in RAM.
[ complete page load time / first paint time ]
Backend | First visit | Warm go to 1) | Cold go to 2) | Reload |
---|---|---|---|---|
mozilla-central | 7.6s / 1.1s | 560ms / 570ms | 1.8s / 1.7s | 5.9s / 900ms |
new back-end | 7.6s / 1.1s | 530ms / 540ms | 2.1s / 1.9s** | 6s / 720ms |

Backend | First visit | Warm go to 1) | Cold go to 2) | Reload |
---|---|---|---|---|
mozilla-central | 7.3s / 1.2s | 1.4s / 1.4s | 2.4s / 2.4s | 5.1s / 1.2s |
new back-end | 7.3s / 1.2s or** 9+s / 3.5s | 1.35s / 1.35s | 2.3s / 2.1s | 4.8s / 1.2s |

Backend | First visit | Warm go to 1) | Cold go to 2) | Reload |
---|---|---|---|---|
mozilla-central | 6.7s / 600ms | 235ms / 240ms | 530ms / 530ms | 4.7s / 540ms |
new back-end | 6.7s / 600ms | 195ms / 200ms | 620ms / 620ms*** | 4.7s / 540ms |

Backend | First visit | Warm go to 1) | Cold go to 2) | Reload |
---|---|---|---|---|
mozilla-central | 13.5s / 6s | 600ms / 600ms | 1s / 1s | 7.3s / 1.2s |
new back-end | 7.3s / 780ms or** 13.7s / 1.1s | 195ms / 200ms | 1.6s or 3.2s* / 460ms*** | 4.8s / 530ms |
To sum up: the most significant changes appear when using really slow media. First paint times clearly improve a lot, not to mention the 10000% better UI responsiveness! Still, there is room for more optimization. We know what to do:
- deliver data in larger chunks; right now we fetch only in 16kB blocks, hence larger files (e.g. images) load very slowly
- rethink interaction with the upper levels, by means of some kind of intelligent flood control
1) Open a new tab and navigate to a page when the cache is already pre-warmed, i.e. the data is already fully in RAM.
2) Open a new tab and navigate to a page right after Firefox starts.
* I was testing with my blog home page. There are a few large images, ~750kB and ~600kB. Delivering data to upper-level consumers only in 16kB chunks causes this suffering.
** This is an interesting regression. Sometimes with the new backend we delay first paint and overall load time. It seems the cache engine is 'too good' and opens the floodgate too wide, overwhelming the main thread event queue. Needs more investigation.
*** Here it's a combination of flooding the main thread with image loads, the slow loading of the image data itself, and the fact that in this case we first paint only after all resources on the page have loaded - that needs to change. This is also supported by the fact that cold-load first paint time is significantly faster on microSD than on SSD: the slow card apparently simulates flood control for us.
Make sure you structure your metadata so you can read most entries without using seek(). Then open files with
FILE_FLAG_SEQUENTIAL_SCAN so the OS knows you are going to read things sequentially and keeps warming the cache for you in the background.
16kB is too small indeed; 32 or 64kB should be a minimum. Modern SSDs operate with 1MB erase block sizes, so writes are most efficient when done with huge buffers. FILE_FLAG_SEQUENTIAL_SCAN seems to kick in faster if you use a large read buffer... but I might be wrong about that. You can verify this with xperf, e.g. https://developer.mozilla.org/en-US/docs/Performance/Cold_Startup_Profiling_with_Xperf
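For readers outside Windows: FILE_FLAG_SEQUENTIAL_SCAN is a CreateFile flag, and the closest POSIX analog is posix_fadvise with POSIX_FADV_SEQUENTIAL. A minimal sketch of the same hint on POSIX systems (the helper name and 64kB buffer are my choices, not anything from the post):

```python
import os

def read_sequentially(path, chunk=64 * 1024):
    """Read a file front to back, hinting the OS to read ahead aggressively."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Tell the kernel we will read sequentially so it can prefetch.
        # (POSIX analog of Windows' FILE_FLAG_SEQUENTIAL_SCAN; the call
        # is a no-op hint and not available on every platform.)
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        data = bytearray()
        while True:
            block = os.read(fd, chunk)
            if not block:
                break
            data += block
        return bytes(data)
    finally:
        os.close(fd)
```

Either way it is only advisory: the OS may prefetch more aggressively, but correctness does not depend on it.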
Hi, I thought that RAM for the cache was disabled some time ago. How do you set the max RAM usage?
In about:cache a limit of 32 MB is set, but the real usage is almost nonexistent, just some kB.
Thanks for the good work,
Maurizio
Forget about the current cache implementation; it all goes away. I never knew exactly how about:cache worked. AFAIK, the current limit is for the cache area used to store stuff we don't want to store on disk (e.g. responses with the no-store header, or content you visit in a private browsing window).
My "memory cache" approach is different. I decided that it would be good to have a pool of stuff that is used often kept in memory to prevent frequent reload from disk. This pool can be understood as a "write-back cache" for the disk cache as well as a pool to keep data we don't want to persist. Disk, memory and private cached content all share this one pool's quota. When this quota is overreached less used and oldest content is purged from this memory pool to free up resources.
The memory limit in the new cache backend is controlled by the browser.cache.memory_limit preference. It can be found in builds from the Gum project tree where we develop. It has different default values for desktop (50MB) and for mobile and Boot2Gecko (~5MB).
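The shared pool with quota-based eviction described above could be modeled roughly like this (a sketch only; the class and parameter names are mine and do not mirror actual Gecko code):

```python
from collections import OrderedDict

class MemoryPool:
    """Shared in-memory pool that evicts least-recently-used entries
    when the byte quota is exceeded (a toy model of the described idea)."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.entries = OrderedDict()          # key -> bytes, oldest first

    def get(self, key):
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)     # mark as recently used
        return data

    def put(self, key, data):
        self.entries[key] = data
        self.entries.move_to_end(key)
        # Purge least-used/oldest content until we fit under the quota.
        while sum(len(v) for v in self.entries.values()) > self.limit:
            self.entries.popitem(last=False)

pool = MemoryPool(limit_bytes=100)
pool.put("a.css", b"x" * 60)
pool.put("b.js", b"y" * 30)
pool.get("a.css")                 # touch a.css, so b.js is now the coldest
pool.put("c.png", b"z" * 40)      # over quota: b.js gets evicted
```

Disk-backed, memory-only and private entries would all count against the single `limit_bytes` quota, which is the point of the shared pool.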
Thank you for the update, it's great to see the new cache numbers improve so much :)
I was wondering, when you compared the old network cache vs the new network cache, did they both have the same memory cache size? I think it's important for the "warm cache" test case.
I don't know how the old cache works in this regard. I think we load block files, or parts of them, into memory and keep them there, but that's just a guess. I also have no idea what the limit might be.