Posts tagged Mozilla
Just wanted to give another quick Perfherder update. Since the last time, I’ve added summary series (which is what GraphServer shows you), so we now have (in theory) the best of both worlds when it comes to Talos data: aggregate summaries of the various suites we run (tp5, tart, etc), with the ability to dig into individual results as needed. This kind of analysis wasn’t possible with Graphserver and I’m hopeful this will be helpful in tracking down the root causes of Talos regressions more effectively.
Let’s give an example of where this might be useful by showing how it can highlight problems. Recently we tracked a regression in the Customization Animation Tests (CART) suite from the commit in bug 1128354. Using Mishra Vikas‘s new “highlight revision mode” in Perfherder (combined with the revision hash when the regression was pushed to inbound), we can quickly zero in on the location of it:
It does indeed look like things ticked up after this commit for the CART suite, but why? By clicking on the datapoint, you can open up a subtest summary view beneath the graph:
We see here that it looks like the 3-customize-enter-css.all.TART entry ticked up a bunch. The related test 3-customize-enter-css.half.TART ticked up a bit too. The changes elsewhere look minimal. But is that a trend that holds across the data over time? We can add some of the relevant subtests to the overall graph view to get a closer look:
As is hopefully obvious, this confirms that the affected subtest continues to hold its higher value while another test just bounces around more or less in the range it was before.
Hope people find this useful! If you want to play with this yourself, you can access the perfherder UI at http://treeherder.mozilla.org/perf.html.
For the past few months I’ve been working on a sub-project of Treeherder called Perfherder, which aims to provide a workflow that will let us more easily detect and manage performance regressions in our products (initially just those detected in Talos, but there’s room to expand on that later). This is a long term project and we’re still sorting out the details of exactly how it will work, but I thought I’d quickly announce a milestone.
As a first step, I’ve been hacking on a graphical user interface to visualize the performance data we’re now storing inside Treeherder. It’s pretty bare bones so far, but already it has two features which graphserver doesn’t: the ability to view sub-test results (i.e. the page load time for a specific page in the tp5 suite, as opposed to the geometric mean of all of them) and the ability to see results for e10s builds.
Here’s an example, comparing the tp5o 163.com page load times on windows 7 with e10s enabled (and not):
Green is e10s, red is non-e10s (the legend picture doesn’t reflect this because we have yet to deploy a fix to bug 1130554, but I promise I’m not lying). As you can see, the gap has been closing (in particular, something landed in mid-January that improved the e10s numbers quite a bit), but page load times are still measurably slower with this feature enabled.
Lots of movement in mozregression (a tool for automatically determining when a regression was introduced in Firefox by bisecting builds on ftp.mozilla.org) in the last few months. Here’s some highlights:
- Support for win64 nightly and inbound builds (Kapil Singh, Vaibhav Agarwal)
- Support for using an http cache to reduce time spent downloading builds (Sam Garrett)
- Way better logging and printing of remaining time to finish bisection (Julien Pagès)
- Much improved performance when bisecting inbound (Julien)
- Support for automatic determination on whether a build is good/bad via a custom script (Julien)
- Tons of bug fixes and other robustness improvements (me, Sam, Julien, others…)
Also thanks to Julien, we have a spiffy new website which documents many of these features. If it’s been a while, be sure to update your copy of mozregression to the latest version and check out the site for documentation on how to use the new features described above!
Thanks to everyone involved (especially Julien) for all the hard work. Hopefully the payoff will be a tool that’s just that much more useful to Firefox contributors everywhere.
Over last few months, I discovered the joy that is CSS Flexbox, which solves the “how do I lay out this set of div’s in horizontally or vertically”. I’ve used it in three projects so far:
- Centering the timer interface in my meditation app, so that it scales nicely from a 320×480 FirefoxOS device all the way up to a high definition monitor
- Laying out the chart / sidebar elements in the Eideticker dashboard so that maximum horizontal space is used
- Fixing various problems in the Treeherder UI on smaller screens (see bug 1043474 and its dependent bugs)
Flexbox has pretty much put an end to these problems for me. There’s no longer any need to “give up and use tables” because using flexbox is pretty much just *like* using tables for layout, just with more uniform and predictable behaviour. They’re so great. I think we’re pretty close to Flexbox being supported across all the major browsers, so it’s fair to start using them for custom web applications where compatibility with (e.g.) IE8 is not an issue.
To try and spread the word, I wrote up a howto article on using flexbox for web applications on MDN, covering some of the common use cases I mention above. If you’ve been curious about flexbox but unsure how to use it, please have a look.
I just released mozregression 0.24. This would be a good time to note some of the user-visible fixes / additions that have gone in recently:
Thanks to Sam Garrett, you can now specify a different branch other than inbound to get finer grained regression ranges from. E.g. if you’re pretty sure a regression occurred on fx-team, you can do something like:
mozregression --inbound-branch fx-team -g 2014-09-13 -b 2014-09-14
- Fixed a bug where we could get an incorrect regression range (bug 1059856). Unfortunately the root cause of the bug is still open (it’s a bit tricky to match mozilla-central commits to that of other branches) but I think this most recent fix should make things work in 99.9% of cases. Let me know if I’m wrong.
- Thanks to Julien Pagès, we now download the inbound build metadata in parallel, which speeds up inbound bisection quite significantly
If you know a bit of python, contributing to mozregression is a great way to have a high impact on Mozilla. Many platform developers use this project in their day-to-day work, but there’s still lots of room for improvement.
Over the past two weeks, I’ve been working a bit on the Treeherder front end (our interface to managing build and test jobs from mercurial changesets), trying to help get things in shape so that the sheriffs can feel comfortable transitioning to it from tbpl by the end of the quarter.
One thing that has pleasantly surprised me is just how easy it’s been to get going and be productive. The process looks like this on Linux or Mac:
git clone https://github.com/mozilla/treeherder-ui.git
We have a fair backlog of issues to get through, many of them related to the front end. If you’re interested in helping out, please have a look:
If nothing jumps out at you, please drop by irc.mozilla.org #treeherder and we can probably find something for you to work on. We’re most active during Pacific Time working hours.
I had some time on my hands two weekends ago and was feeling a bit of an itch to build something, so I decided to do a project I’ve had in the back of my head for a while: a meditation timer.
If you’ve been following this log, you’d know that meditation has been a pretty major interest of mine for the past year. The foundation of my practice is a daily round of seated meditation at home, where I have been attempting to follow the breath and generally try to connect with the world for a set period every day (usually varying between 10 and 30 minutes, depending on how much of a rush I’m in).
Clock watching is rather distracting while sitting so having a tool to notify you when a certain amount of time has elapsed is quite useful. Writing a smartphone app to do this is an obvious idea, and indeed approximately a zillion of these things have been written for Android and iOS. Unfortunately, most are not very good. Really, I just want something that does this:
- Select a meditation length (somewhere between 10 and 40 minutes).
- Sound a bell after a short preparation to demarcate the beginning of meditation.
- While the meditation period is ongoing, do a countdown of the time remaining (not strictly required, but useful for peace of mind in case you’re wondering whether you’ve really only sat for 25 minutes).
- Sound a bell when the meditation ends.
Yes, meditation can get more complex than that. In Zen practice, for example, sometimes you have several periods of varying length, broken up with kinhin (walking meditation). However, that mostly happens in the context of a formal setting (e.g. a Zendo) where you leave your smartphone at the door. Trying to shoehorn all that into an app needlessly complicates what should be simple.
Even worse are the apps which “chart” your progress or have other gimmicks to connect you to a virtual “community” of meditators. I have to say I find that kind of stuff really turns me off. Meditation should be about connecting with reality in a more fundamental way, not charting gamified statistics or interacting online. We already have way too much of that going on elsewhere in our lives without adding even more to it.
So, you might ask why the alarm feature of most clock apps isn’t sufficient? Really, it is most of the time. A specialized app can make selecting the interval slightly more convenient and we can preselect an appropriate bell sound up front. It’s also nice to hear something to demarcate the start of a meditation session. But honestly I didn’t have much of a reason to write this other than the fact than I could. Outside of work, I’ve been in a bit of a creative rut lately and felt like I needed to build something, anything and put it out into the world (even if it’s tiny and only a very incremental improvement over what’s out there already). So here it is:
The app was written entirely in HTML5 so it should work fine on pretty much any reasonably modern device, desktop or mobile. I tested it on my Nexus 5 (Chrome, Firefox for Android)1, FirefoxOS Flame, and on my laptop (Chrome, Firefox, Safari). It lives on a subdomain of this site or you can grab it from the Firefox Marketplace if you’re using some variant of Firefox (OS). The source, such as it is, can be found on github.
I should acknowledge taking some design inspiration from the Mind application for iOS, which has a similarly minimalistic take on things. Check that out too if you have an iPhone or iPad!
1 Note that there isn’t a way to inhibit the screen/device from going to sleep with these browsers, which means that you might miss the ending bell. On FirefoxOS, I used the requestWakeLock API to make sure that doesn’t happen. I filed a bug to get this implemented on Firefox for Android.
[ For more information on the Eideticker software I’m referring to, see this entry ]
Just wanted to write up a few notes on using Eideticker to measure animation smoothness, since this is a topic that comes up pretty often and I wind up explaining these things repeatedly. 😉
When rendering web content, we want the screen to update something like 60 times per second (typical refresh rate of an LCD screen) when an animation or other change is occurring. When this isn’t happening, there is often a user perception of jank (a.k.a. things not working as they should). Generally we express how well we measure up to this ideal by counting the number of “frames per second” that we’re producing. If you’re reading this, you’re probably already familiar with the concept in outline. If you want to know more, you can check out the wikipedia article which goes into more detail.
At an internal level, this concept matches up conceptually with what Gecko is doing. The graphics pipeline produces frames inside graphics memory, which is then sent to the LCD display (whether it be connected to a laptop or a mobile phone) to be viewed. By instrumenting the code, we can see how often this is happening, and whether it is occurring at the right frequency to reach 60 fps. My understanding is that we have at least some code which does exactly this, though I’m not 100% up to date on how accurate it is.
But even assuming the best internal system monitoring, Eideticker might still be useful because:
- It is more “objective”. This is valuable not only for our internal purposes to validate other automation (sometimes internal instrumentation can be off due to a bug or whatever), but also to “prove” to partners that our software has the performance characteristics that we claim.
- The visual artifacts it leaves behind can be valuable for inspection and debugging. i.e. you can correlate videos with profiling information.
Unfortunately, deriving this sort of information from a video capture is more complicated than you’d expect.
What does frames per second even mean?
Given a set of N frames captured from the device, the immediate solution when it comes to “frames per second” is to just compare frames against each other (e.g. by comparing the value of individual pixels) and then counting the ones that are different as “unique frames”. Divide the total number of unique frames by the length of the
capture and… voila? Frames per second? Not quite.
First off, there’s the inherent problem that sometimes the expected behaviour of a test is for the screen to be unchanging for a period of time. For example, at the very beginning of a capture (when we are waiting for the input event to be acknowledged) and at the end (when we are waiting for things to settle). Second, it’s also easy to imagine the display remaining static for a period of time in the middle of a capture (say in between gestures in a multi-part capture). In these cases, there will likely be no observable change on the screen and thus the number of frames counted will be artificially low, skewing the frames per second number down.
Ok, so you might not consider that class of problem that big a deal. Maybe we could just not consider the frames at the beginning or end of the capture. And for pauses in the middle… as long as we get an absolute number at the end, we’re fine right? That’s at least enough to let us know that we’re getting better or worse, assuming that whatever we’re testing is behaving the same way between runs and we’re just trying to measure how many frames hit the screen.
I might agree with you there, but there’s a further problems that are specific to measuring on-screen performance using a high-speed camera as we are currently with FirefoxOS.
An LCD updates gradually, and not all at once. Remnants of previous frames will remain on screen long past their interval. Take for example these five frames (sampled at 120fps) from a capture of a pan down in the FirefoxOS Contacts application (movie):
Note how if you look closely these 5 frames are actually the intersection of *three* seperate frames. One with “Adam Card” at the top, another with “Barbara Bloomquist” at the top, then another with “Barbara Bloomquist” even further up. Between each frame, artifacts of the previous one are clearly visible.
Plausible sounding solutions:
- Try to resolve the original images by distinguishing “new” content from ghosting artifacts. Sounds possible, but probably hard? I’ve tried a number of simplistic techniques (i.e. trying to find times when change is “peaking”), but nothing has really worked out very well.
- Somehow reverse engineering the interface between the graphics chipset and the LCD panel, and writing some kind of custom hardware to “capture” the framebuffer as it is being sent from one to the other. Also sounds difficult.
- Just forget about this problem altogether and only try to capture periods of time in the capture where the image has stayed static for a sustained period of time (i.e. for say 4–5 frames and up) and we’re pretty sure it’s jank.
Personally the last solution appeals to me the most, although it has the obvious disadvantage of being a “homebrew” metric that no one has ever heard of before, which might make it difficult to use to prove that performance is adequate — the numbers come with a long-winded explanation instead of being something that people immediately understand.
[ For more information on the Eideticker software I’m referring to, see this entry ]
Just wanted to give an update on where Eideticker is at the end of Q2 2014. The big news is that we’ve started to run startup tests against the Flame, the results of which are starting to appear on the dashboard:
It is expected that these tests will provide a useful complement to the existing startup tests we’re running with b2gperf, in particular answering the “is this regression real?” question.
Pending work for Q3:
- Enable scrolling tests on the Flame. I got these working against the Hamachi a few months ago but because of some weird input issue we’re seeing we can’t yet enable them on the Flame. This is being tracked in bug 1028824. If anyone has background on the behaviour of the touch screen driver for this device I would appreciate some help.
- Enable tests for multiple branches on the Flame (currently we’re only doing master). This is pretty much ready to go (bug 1017834), just need to land it.
- Annotate eideticker graphs with internal benchmark information. Eli Perelman of the FirefoxOS performance team has come up with a standard set of on-load events for the Gaia apps (app chrome loaded, app content loaded, …) that each app will generate, feeding into tools like b2gperf and test-perf. We want to show this information in Eideticker’s frame-by-frame analysis (example) so we can verify that the app’s behaviour is consistent with what it is claimed. This is being tracked in bug 1018334
- Re-enable Eideticker for Android and run tests more frequently. Sadly we haven’t been consistently generating new Eideticker results for Android for the last quarter because of networking issues in the new Mountain View office, where the test rig for those live. One way or another, we want to fix this next quarter and hopefully run tests more frequently against mozilla-inbound (instead of just nightly builds)
The above isn’t an exhaustive list: there’s much more that we have in mind for the future that’s not yet scheduled or defined well (e.g. get Eideticker reporting to Treeherder’s new performance module). If you have any questions or feedback on anything outlined above I’d love to hear it!
Just wanted to make a quick announcement that ManifestDestiny, the python package we use internally here at Mozilla for declaratively managing lists of tests in Mochitest and other places, has been renamed to manifestparser. We kept the versioning the same (0.6), so the only thing you should need to change in your python package dependencies is a quick substitution of “ManifestDestiny” with “manifestparser”. We will keep ManifestDestiny around indefinitely on pypi, but only to make sure old stuff doesn’t break. New versions of the software will only be released under the name “manifestparser”.
Quick history lesson: “Manifest destiny” refers to a philosophy of exceptionalism and expansionism that was widely held by American settlers in the 19th century. The concept is considered offensive by some, as it was used to justify displacing and dispossessing Native Americans. Wikipedia’s article on the subject has a good summary if you want to learn more.
Here at Mozilla Tools & Automation, we’re most interested in creating software that everyone can feel good about depending on, so we agreed to rename it. When I raised this with my peers, there were no objections. I know these things are often the source of much drama in the free software world, but there’s really none to see here.
Happy manifest parsing!