Category Archives: Data Visualization

It’s all about the entropy

[ For more information on the Eideticker software I'm referring to, see this entry ]

So recently I’ve been exploring new and different methods of measuring things that we care about on FirefoxOS — like startup time or amount of checkerboarding. With Android, where we have a mostly clean signal, these measurements were pretty straightforward. Want to measure startup times? Just capture a video of Firefox starting, then compare the frames pixel by pixel to see how much they differ. When the pixels aren’t that different anymore, we’re “done”. Likewise, to measure checkerboarding we just calculated the areas of the screen where things were not completely drawn yet, frame-by-frame.

On FirefoxOS, where we’re using a camera to measure these things, it has not been so simple. I’ve already discussed this with respect to startup time in a previous post. One of the ideas I talk about there is “entropy” (or the amount of unique information in the frame). It turns out that this is a pretty deep concept, and is useful for even more things than I thought of at the time. Since this is probably a concept that people are going to be thinking/talking about for a while, it’s worth going into a little more detail about the math behind it.

The wikipedia article on information theoretic entropy is a pretty good introduction. You should read it. It all boils down to this formula:

wikipedia-entropy-formula

You can see this section of the wikipedia article (and the various articles that it links to) if you want to break down where that comes from, but the short answer is that given a set of random samples, the more different values there are, the higher the entropy will be. Look at it from a probabilistic point of view: if you take a random set of data and want to make predictions on what future data will look like. If it is highly random, it will be harder to predict what comes next. Conversely, if it is more uniform it is easier to predict what form it will take.

Another, possibly more accessible way of thinking about the entropy of a given set of data would be “how well would it compress?”. For example, a bitmap image with nothing but black in it could compress very well as there’s essentially only 1 piece of unique information in it repeated many times — the black pixel. On the other hand, a bitmap image of completely randomly generated pixels would probably compress very badly, as almost every pixel represents several dimensions of unique information. For all the statistics terminology, etc. that’s all the above formula is trying to say.

So we have a model of entropy, now what? For Eideticker, the question is — how can we break the frame data we’re gathering down into a form that’s amenable to this kind of analysis? The approach I took (on the recommendation of this article) was to create a histogram with 256 bins (representing the number of distinct possibilities in a black & white capture) out of all the pixels in the frame, then run the formula over that. The exact function I wound up using looks like this:


def _get_frame_entropy((i, capture, sobelized)):
    frame = capture.get_frame(i, True).astype('float')
    if sobelized:
        frame = ndimage.median_filter(frame, 3)

        dx = ndimage.sobel(frame, 0)  # horizontal derivative
        dy = ndimage.sobel(frame, 1)  # vertical derivative
        frame = numpy.hypot(dx, dy)  # magnitude
        frame *= 255.0 / numpy.max(frame)  # normalize (Q&D)

    histogram = numpy.histogram(frame, bins=256)[0]
    histogram_length = sum(histogram)
    samples_probability = [float(h) / histogram_length for h in histogram]
    entropy = -sum([p * math.log(p, 2) for p in samples_probability if p != 0])

    return entropy

[Context]

The “sobelized” bit allows us to optionally convolve the frame with a sobel filter before running the entropy calculation, which removes most of the data in the capture except for the edges. This is especially useful for FirefoxOS, where the signal has quite a bit of random noise from ambient lighting that artificially inflate the entropy values even in places where there is little actual “information”.

This type of transformation often reveals very interesting information about what’s going on in an eideticker test. For example, take this video of the user panning down in the contacts app:

If you graph the entropies of the frame of the capture using the formula above you, you get a graph like this:

contacts scrolling entropy graph
[Link to original]

The Y axis represents entropy, as calculated by the code above. There is no inherently “right” value for this — it all depends on the application you’re testing and what you expect to see displayed on the screen. In general though, higher values are better as it indicates more frames of the capture are “complete”.

The region at the beginning where it is at about 5.0 represents the contacts app with a set of contacts fully displayed (at startup). The “flat” regions where the entropy is at roughly 4.25? Those are the areas where the app is “checkerboarding” (blanking out waiting for graphics or layout engine to draw contact information). Click through to the original and swipe over the graph to see what I mean.

It’s easy to see what a hypothetical ideal end state would be for this capture: a graph with a smooth entropy of about 5.0 (similar to the start state, where all contacts are fully drawn in). We can track our progress towards this goal (or our deviation from it), by watching the eideticker b2g dashboard and seeing if the summation of the entropy values for frames over the entire test increases or decreases over time. If we see it generally increase, that probably means we’re seeing less checkerboarding in the capture. If we see it decrease, that might mean we’re now seeing checkerboarding where we weren’t before.

It’s too early to say for sure, but over the past few days the trend has been positive:

entropy-levels-climbing
[Link to original]

(note that there were some problems in the way the tests were being run before, so results before the 12th should not be considered valid)

So one concept, at least two relevant metrics we can measure with it (startup time and checkerboarding). Are there any more? Almost certainly, let’s find them!

Automatically measuring startup / load time with Eideticker

So we’ve been using Eideticker to automatically measure startup/pageload times for about a year now on Android, and more recently on FirefoxOS as well (albeit not automatically). This gives us nice and pretty graphs like this:

flot-startup-times-gn

Ok, so we’re generating numbers and graphing them. That’s great. But what’s really going on behind the scenes? I’m glad you asked. The story is a bit different depending on which platform you’re talking about.

Android

On Android we connect Eideticker to the device’s HDMI out, so we count on a nearly pixel-perfect signal. In practice, it isn’t quite, but it is within a few RGB values that we can easily filter for. This lets us come up with a pretty good mechanism for determining when a page load or app startup is finished: just compare frames, and say we’ve “stopped” when the pixel differences between frames are negligible (previously defined at 2048 pixels, now 4096 — see below). Eideticker’s new frame difference view lets us see how this works. Look at this graph of application startup:

frame-difference-android-startup
[Link to original]

What’s going on here? Well, we see some huge jumps in the beginning. This represents the animated transitions that Android makes as we transition from the SUTAgent application (don’t ask) to the beginnings of the FirefoxOS browser chrome. You’ll notice though that there’s some more changes that come in around the 3 second mark. This is when the site bookmarks are fully loaded. If you load the original page (link above) and swipe your mouse over the graph, you can see what’s going on for yourself. :)

This approach is not completely without problems. It turns out that there is sometimes some minor churn in the display even when the app is for all intents and purposes started. For example, sometimes the scrollbar fading out of view can result in a significantish pixel value change, so I recently upped the threshold of pixels that are different from 2048 to 4096. We also recently encountered a silly problem with a random automation app displaying “toasts” which caused results to artificially spike. More tweaking may still be required. However, on the whole I’m pretty happy with this solution. It gives useful, undeniably objective results whose meaning is easy to understand.

FirefoxOS

So as mentioned previously, we use a camera on FirefoxOS to record output instead of HDMI output. Pretty unsurprisingly, this is much noisier. See this movie of the contacts app starting and note all the random lighting changes, for example:

My experience has been that pixel differences can be so great between visually identical frames on an eideticker capture on these devices that it’s pretty much impossible to settle on when startup is done using the frame difference method. It’s of course possible to detect very large scale changes, but the small scale ones (like the contacts actually appearing in the example above) are very hard to distinguish from random differences in the amount of light absorbed by the camera sensor. Tricks like using median filtering (a.k.a. “blurring”) help a bit, but not much. Take a look at this graph, for example:

plotly-contacts-load-pixeldiff
[Link to original]

You’ll note that the pixel differences during “static” parts of the capture are highly variable. This is because the pixel difference depends heavily on how “bright” each frame is: parts of the capture which are black (e.g. a contacts icon with a black background) have a much lower difference between them than parts that are bright (e.g. the contacts screen fully loaded).

After a day or so of experimenting and research, I settled on an approach which seems to work pretty reliably. Instead of comparing the frames directly, I measure the entropy of the histogram of colours used in each frame (essentially just an indication of brightness in this case, see this article for more on calculating it), then compare that of each frame with the average of the same measure over 5 previous frames (to account for the fact that two frames may be arbitrarily different, but that is unlikely that a sequence of frames will be). This seems to work much better than frame difference in this environment: although there are plenty of minute differences in light absorption in a capture from this camera, the overall color composition stays mostly the same. See this graph:

plotly-contacts-load-entropy
[Link to original]

If you look closely, you can see some minor variance in the entropy differences depending on the state of the screen, but it’s not nearly as pronounced as before. In practice, I’ve been able to get extremely consistent numbers with a reasonable “threshold” of “0.05″.

In Eideticker I’ve tried to steer away from using really complicated math or algorithms to measure things, unless all the alternatives fail. In that sense, I really liked the simplicity of “pixel differences” and am not thrilled about having to resort to this: hopefully the concepts in this case (histograms and entropy) are simple enough that most people will be able to understand my methodology, if they care to. Likely I will need to come up with something else for measuring responsiveness and animation smoothness (frames per second), as likely we can’t count on light composition changing the same way for those cases. My initial thought was to use edge detection (which, while somewhat complex to calculate, is at least easy to understand conceptually) but am open to other ideas.

NIXI Update

I’ve been working on a new, mobile friendly version of Nixi on-and-off for the past year and a bit. I’m not sure when it’s ever going to be finished, so I thought I might as well post the work-in-progress, which has these noteworthy improvements:

  • Even faster than before (using the Bootstrap library behind the scenes, no longer using slow canvas library to update map)
  • Sexier graphics (thanks to the aforementioned Bootstrap library)
  • Now uses client side URLs to keep track of state as you navigate through the site. This allows you to bookmark a favorite spot (e.g. your home) and then go back to it later. For example, this link will give you a list of BIXI docks near Station C, the coworking space I belong to.

If you use BIXI at all, check it out and let me know what you think!

nixi screenshot

Measuring reduced checkerboarding in mobile Fennec

After my post on measuring checkerboarding in mobile Firefox, Clint Talbert (my fearless manager) suggested I run a before and after test to measure the improvement that just landed as part of bug 709512. After a bit of cleanup, I did so, measuring the delta between my build on December 20th and the latest version of Aurora. The difference is pretty remarkable: at least on the LG G2X that I’ve been using for testing, we’ve gone from checkerboarding between 10-20% of the time and not checkerboarding almost at all (in between two runs of the test with the Aurora build, there is exactly one frame that checkerboards). All credit to Chris Lord for that!

See the video evidence for yourself. Before:

After:

Year end Eideticker update

Just before I leave for some Christmas vacation, it’s time for another update on the state of Eideticker. Since I last blogged about the software, I’ve been working on the following three areas:

  1. Coming up with better algorithm (green screen / red screen) for both determining the area of the capture as well as the start/end of the capture. The harness was already flood filling the area with these colours at the beginning/end of the capture, but now we’re actually using this information. The code’s a little hacky, but it seems to work well enough for the test cases I’ve been using so far.
  2. As a demonstration, I wrote up a quick test that demonstrates checkerboarding on mobile Fennec, and wrote up a quick bit of analysis code to detect this pattern and give an overall measure of how much this test “checkerboards” (i.e. has regions that are not fully painted when the user scrolls). As I understand this is an area that our mobile team is currently working on this problem quite a bit, it will be interesting to watch the numbers given by this test and see if things improve.
  3. It’s a minor thing, but you can now view a complete webm movie of the captured movie right from the web interface.

Here’s a quick demonstration video that shows all the above in action. As before, you might want to watch this full screen:

Happy holidays!

Eideticker update

Since I last blogged about Eideticker, I’ve made some good progress. Here’s some highlights:

  1. Eideticker has a new, much simpler harness and tests are much easier to write. Initially, I was using Talos for this task with the idea that it’s better not to have duplicate code where it’s not really required. Seemed like a fine idea in principle, but in practice Talos’s architecture (which is really oriented around running a large sequence of tests and uploading the results to a central server) was difficult to extend to do what we need to do. At heart, eideticker really only needs to do a few things right now (start up Firefox, start videocapture, load a webpage, stop videocapture) so it’s best to keep things simple.
  2. I’ve reworked the capture analysis API to use numpy behind the scenes. It’s still not quite as fast as I would like (doing a framediff analysis on a 30 second animation still takes a minute or so on my fast machine), but we’re doing an order of magnitude better than before. numpy also seem to have quite the library of routines for doing the types of matrix algebra useful in image analysis, which should be helpful as the project progresses.
  3. I added the beginnings of a fancy pants web interface for browsing captures and doing visualizations on them! I’m pretty happy with how this is turning out so far, it’s already been an incredibly useful tool for debugging Eideticker’s analysis system and I think it will be equally useful for understanding Firefox’s behaviour in general.

Here’s an example analysis session, where I examine a ~60 second capture of the fishtank demo from Microsoft, borrowed from Mark Cote’s speedtest library. You might want to view this fullscreen:

A few interesting things to note about this capture:

1. Our frame comparison algorithm is still comparatively dumb, it just computes the norm of the difference in RGB values between two frames. Since there’s a (very tiny) amount of noise in the capture, we have to use a threshold to determine whether two frames are the same or not. For all that, the FPS estimate it comes with for the fishtank demo seems about right (and unfortunately at 2 fps, it’s not particularly good).
2. I added a green screen / red screen at the start / end of every capture to eliminate race conditions with starting the capture, but haven’t yet actually taken those frames out of the analysis.
3. If you look carefully at the animation, not all of the fish that should be displaying in the demo are. I think this has to do with the new native version of Fennec that I’m using to test (old versions don’t exhibit this property). I filed a bug for this.

What’s next? Well, as I mentioned last time, the real goal is to create a tool that developers will find useful. To that end, we have plans to set up an Eideticker machine in Mozilla Mountain View office that more people can use (either locally or remotely over the VPN). For this to be workable, I need to figure out how to get the full setup working on “demand”. Most of the setup already allows this, with one big exception: the actual Android device that we want to capture video from. The LG G2X that I’m currently using works fine when I have physical access to it, but as far as I can tell it’s not possible to get it outputting proper video of an application unless it’s in an unlocked state (which it obviously isn’t most of the time).

My current thinking is that a Panda Board running a Vanilla version of Android might be a good candidate for a permanently-connected device. It is capable of HDMI output, doesn’t have unwanted the bells and whistles of a physical phone (e.g. a lock screen), and should be much reliable due to its physical networking. So far I haven’t had much luck getting it the video output working with the Decklink capture card, but I’ve only just started trying. Work will continue.

If I can somehow figure that out, and smooth out some of the rough edges with the web interface and capture API, I think the stage will be set for us all to do some pretty interesting stuff! Looking forward to it.

Measuring what the user sees

I’ve been spending the last month or so at Mozilla prototyping a new project called Eideticker which aims to use video capture data and image/frame analysis for performance measurement of Firefox Mobile. It’s still in quite a rough state, but it’s now complete enough that I thought it would be worth spending a bit of time describing both its motivation and how it works.

First, a bit of an introduction. Up to now, our automated performance tools have used entirely synthetic benchmarks (how long til we get the onload event? how many ms since we last hit the main loop?) to gather performance information. As we’ve found out, there’s a lot you can measure with synthetic benchmarks. Tools like Talos have proven themselves by catching performance regressions on a very regular basis.

Still, there’s many things that synthetic benchmarks can’t easily or reliably measure. For example, it’s nice to know that a page has triggered an “onload” event (and the sooner it does that, the better), but what does the browser look like before then? If it’s a complicated or image intensive page, it might take 10 or 15 seconds to load. In this interval, user studies have clearly shown that an application displaying something sooner rather than later is always desirable if it’s not possible to display everything immediately (due to network traffic, CPU constraints, whatever). It’s this area of user-perceived performance that Eideticker aims to help with. Eideticker creates a system to capture live data of what the browser is displaying, then performs image/frame analysis on the result to see how we’re actually doing on these inherently subjective metrics. The above was just one example, others might include:

  • Measuring amount of time it takes to actually see the start page from time of launch.
  • Measuring amount of time you see the checkboard pattern after panning the browser.
  • Measuring the visual artifacts while loading a complicated page (how long does it take to display something? how long until we get something close to the final expected result? how long until we get the actual final result?)

It turns out that it’s possible to put together a system that does this type of analysis using off-the-shelf components. We’re still very much in the early phase, but initial signs are promising. The initial test system has the following pieces:

  1. A Linux workstation equipped with a Decklink extreme 3D video capture card
  2. An Android phone with HDMI output (currently using the LG G2X)
  3. A version of talos modified to video capture the results of a test.
  4. A bit of python code to actually analyze the video capture data.

So far, I’ve got the system working end-to-end for two simple cases. The first is the “pageload” case. This lets you capture the results of loading any page within a talos pageset. Here’s a quick example of the movie we generate from a tsvg test:

Here’s another example, a color cycle test (actually the first test case I created, as a throwaway):

After the video is captured, the next step is to analyze it! As described above (and in further detail on the Eideticker wiki page), there’s lots of things we could measure but the easiest thing is probably just to count the number of unique frames and derive a frame rate for the capture based on that (the higher the better, obviously). Based on an initial prototype from Chris Jones, I’ve started work on a python library to do exactly this. Assuming you have an eideticker capture handy, you can run a tool called “analyze.py” on the command line, and it’ll give you its best guess of the # of unique frames:


(eideticker)wlach@eideticker:~/src/eideticker$ bin/analyze.py ./src/talos/talos/captures/capture-2011-11-11T11:23:51.627183.zip
Unique frames: 121/272

(There are currently some rough edges with this: we’re doing frame comparisons based on per-pixel changes, but the video capture data is slightly noisy so sometimes a pixel changes its value even when nothing has actually happened in the browser)

So that’s what I’ve got working so far. What’s next? Short term, we have some specific high-level goals about where we want to be with the system by the end of the quarter. The big unfinished pieces are getting an end-to-end test involving real user interaction (typing into the URL bar, etc.) going and turning this prototype system into something that’s easy for others to duplicate and is robust enough to be easily extended. Hopefully this will come together fairly quickly now that the basics are in place.

The longer term picture really depends on feedback from the community. Unlike many of the projects we work on in automation & tools, Eideticker is not meant to be something that’s run on every checkin. Rather, it’s intended to be a useful tool that can be run on an as needed basis by developers and QA. We obviously have our own ideas on how something like this might be useful (and what a reasonable user interface might be), but I’ve found in cases like this it’s much better to go to the people who will actually be using this thing. So with that in mind, here’s a call for feedback. I have two very specific questions:

  • Is there a specific problem you’ve been working on that a framework like this might be helpful for?
  • What do you think of the current workflow model described in the README?

My goal is to make something that people will love, so please do let me know what you think. :) Nothing about this project is cast in stone and the last thing I want is to deliver a product that people don’t actually want to use.

Equally, while Eideticker is being written primarily with the goal of making Mobile Firefox better (and in the slightly-less short term, desktop Firefox and Boot to Gecko), much of it is broadly applicable to any user-facing mobile or desktop application. If you think some component of Eideticker might be interesting to your project and want to collaborate, feel free to get in touch.

Faster, but not quite there yet…

So as others have been posting about, we’ve been making some headway on our progress on the GoFaster project. Unfortunately it seems like we’re still some distance away from reaching our magic number of a 2 hour turnaround for each revision pushed.

It’s a bit hard to see the exact number on the graph (someone should fix that), but we seem to teetering around an average of 3 hours at this point. Looking at our build charts, it seems like the critical path has shifted in many cases from Windows to MacOS X. Is there something we can do to close the gap there? Or is there a more general fix which would lead to substantial savings? If you have any thoughts, or would like to help out, we’re scheduled to have a short meeting tomorrow.

Anyone is welcome to join, but note that we’re practical, results-oriented people. Crazy ideas are fun, but we’re most interested in proposals that have measurable data behind them and can be implemented in reasonable amounts of time. :)

A better BIXI web site

There’s much to like about the BIXI bike-sharing system in Montréal: it’s affordable ($78 for a year of biking), accessible and fun to use. There’s absolutely no doubt in my mind that it’s made cycling more of a main stream activity here in Montreal, which benefits everyone (even drivers indirectly gain from less congested streets).

With the arrival of the first BIXI stations in NDG, I decided to subscribe to it this year even though I have a bike of my own. So far, it looks like I’m going to easily use it enough to justify the cost. I still use my regular bike for my commute from NDG to the Plateau, but on the edges there’s a ton of cases where it just makes sense to use something that I don’t have to worry about locking up and returning home. Sometimes I only want to go one way (for weather or whatever other reason). Other times I want to take public transit for one leg of my trip (or day), but need/want to take a quick jaunt elsewhere once I’m downtown.

I do have to say though, their new web site drives me crazy. I’ve thought prety deeply about the domain of creating user-friendly transit-focused web sites, so I think I can speak with some authority here.

Leaving aside it’s value as a promotional tool for the service itself (not my area of expertise), the experience of trying to find a nearby station is complicated by a slow, multi-layered UI that requires repeated clicking and searching to find the nearest station that has bikes available. Why bother with this step when we can just display this information outright on the map? iPhone applications like Bixou have been doing this for years. It’s time we brought the same experience to the desktop.

Thus, I present nixi.ca: a clean, useable interface to BIXI’s bike share system that presents the information you care about as effectively as possible, without all the clutter. I’ve already found it useful, and I hope you do too. Think you can do better? Fork the source on github and submit your changes back to me! Minus some glue code to fetch station info server side, it’s entirely a client-side application written in HTML/JavaScript.

Note that the site uses a bunch of modern HTML5 features, so currently requires a modern browser like Firefox, Chrome, or Safari to display properly. I may or may not fix this. Other notable omissions include support for other cities with the BIXI system (Toronto, Ottawa, …) and French localization. Patches welcome!