William Lachance's Log: William Lachance's Log

Neural networks mystify me

2023-04-15T14:33:52Z

Spent a good chunk of a week off (re)learning the basics of neural networks: forward propagation, gradient descent, loss functions. It’s taking some time, but gradually I feel some level of understanding is taking hold. What kind of continues to amaze me is how simple it all is: you really only need a high-school level understanding of linear algebra and calculus to understand most of what’s going on behind the scenes. Best as I can tell, the recent innovations in the last few years (in particular transformer models behind things like ChatGPT) are just refinements on top of these basic concepts.

Neural networks (there really there is nothing “neural” about them) are really not new: I remember hearing about that as an undergraduate in the early 2000s (and I think they were rather old hat even then). At the time, they were pretty much dismissed as a warmed-over model of behaviorism, unlikely to be useful anywhere except perhaps in a few simplistic applications. Based on what I saw at the time, I agreed and basically bought into the idea that computers are mainly useful as an adjunct to human processes, systems and intuition.

Thus, I find the fact that these systems can produce something even mildly resembling novel or creative outputs (as is the case with things like ChatGPT and Midjourney) surprising - as in it wasn’t something I saw coming. Yes, much of what has been built using these technologies is overhyped and arguably dangerous. Still, I also don’t want to lose the sense of wonder that this is possible at all. If I was mistaken about this what else might I be missing?

I feel like the best response at this point is to take a step back, learn as much as I can, and then develop an opinion. I expect this process to take at least a year, probably longer.

In case it’s helpful to others, here’s some literature I’ve been working through on these topics.

Some theoretical but approachable material for understanding the basics:

Hacker’s Guide to Neural Networks: Now dislcaimed by the author as out of date, I still found its explanation of neural networks as a circuit quite helpful for building my intuition.
Neural Networks and Deep Learning: Another introduction to the basics, more refined and comprehensive than the above.
How to trick a neural network into thinking a panda is a vulture: Often the best way of understanding something is to break it.

On the ethical side:

On the Dangers of Stochastic Parrots: Good antidote to some of the hype around LLMs, incorporating an understanding of how these systems may further perpetuate social harms. Much more nuanced and interesting than most of the critiques I’ve seen fly by in the last few months.
Resisting Determinstic Thinking: How can we think critically about AI without falling into the trap of black/white thinking (“this is all good” vs. “this is all bad”)

And some articles on how to think about LLMs from a pragmatic perspective as a programmer:

Think of language models like ChatGPT as a “calculator for words”: This feels right to me based on my usage of OpenAI as a “tool for thought so far” (though I don’t think I’ve yet been able to use ChatGPT as effectively as this person).
Cheating is all you need: Some speculative thinking about how this stuff might play out for those of us doing programming from the internet-famous Steve Yegge.

Working with depression

2022-11-26T15:24:01Z

Been struggling with depression over the past couple of weeks. Some of this is seasonal (with the shortening of the days), though I wouldn’t say it always happens. Last year at this time I recall feeling the opposite of depressed: that probably had to do with the fact that I knew I was leaving my previous job at Mozilla and wanted to get as much done as possible. Sometimes a highly motivating life situation can keep it in abeyance. Nonetheless, it’s here now, again, and demands to be dealt with.

What does depression even mean? It’s a bit of a hard thing to pin down, exactly. But from the perspective of looking after my own well being, I don’t think I really need a definition. What matters are the symptoms, which I’d roughly express as:

Knowing what I should be doing but not able to do it
An increase in time spent scanning social media, news sites
Feelings of low self worth
Feelings that nothing really matters
Lack of creativity, improvisation
An increase in self-referential thinking (this post would be an example of that)

A few things I tend to try to diminish its effects:

Go outside, expose myself to sunlight. Explore nature.
Exercise
Meditate
Eat less carbohydrates, more vegetables and protein
Talk to friends
Do nice things for other people

I don’t feel like doing these things but I try with all my might, against all my will, to do them anyway. Any individual action might not do much: but the cumulative effect of doing all of the above seems to have an impact — or at least that’s what I tell myself.

And yet despite my best efforts, it’s not always enough. I do all of the things in the second list, and yet still find myself suffering in all the ways described by the first. What do I do then?

I try to understand that there really isn’t an escape from unpleasant feeling, and that it’s just part of life: glorious and beautiful in its complexity. I try to be curious about what’s going on, even if I think it’s all happened before. If that’s not possible, I at least try to be present with it. That’s all I can do.

Using Sphinx in a Monorepo

2022-09-25T19:55:40Z

Just wanted to type up a couple of notes about working with Sphinx (the python documentation generator) inside a monorepo, an issue I’ve been struggling with (off and on) at Voltus since I started. I haven’t seen much written about this topic despite (I suspect) it being a reasonably frequent problem.

In general, there’s a lot to like about Sphinx: it’s great at handling deeply nested trees of detailed documentation with cross-references inside a version control system. It has local search that works pretty well and some themes (like readthedocs) scale pretty nicely to hundreds of documents. The directives and roles system is pretty flexible and covers most of the common things one might want to express in technical documentation. And if the built-in set of functionality isn’t enough, there’s a wealth of third party extension modules. My only major complaint is that it uses the somewhat obscure restructuredText file format by default, but you can get around that by using the excellent MyST extension.

Unfortunately, it has a pretty deeply baked in assumption that all documentation for your project lives inside a single subfolder. This is fine for a small repository representing a single python module, like this:

<root>
README.md
setup.cfg
pyproject.toml
mymodule/
docs/

However, this doesn’t work for a large monorepo, where you would typically see something like:

<root>/module-1/submodule-a
<root>/module-1/submodule-b
<root>/module-2/submodule-c
...

In a monorepo, you usually want to include a module’s documentation inside its own directory. This allows you to use your code ownership constraints for documentation, among other things.

The naive solution would be to create a sphinx site for every single one of these submodules. This is what happened at Voltus and I don’t recommend it. For a large monorepo you’ll end up with dozens, maybe hundreds of documentation “sites”. Under this scenario, discoverability becomes a huge problem: no longer can you rely on tables of contents and the built-in search to discover content: you just have to “know” where things live. I’m more than nine months in here and I’m still discovering new documentation.

It would be much better if we could somehow collect documentation from other parts of the repository into a single site. Is this possible? tl;dr: Yes. There’s a few solutions, each with their pros and cons.

The obvious solution that doesn’t work

The most obvious solution here is to create a symbolic link inside your documentation directory, say the following:

<root>/docs/
<root>/docs/module-1/submodule-a -> <root>/module-1/submodule-a/docs

Unfortunately, this doesn’t work. ☹️ Sphinx doesn’t follow symbolic links.

Solution 1: Just copy the files in

The most obvious solution is to just copy the files from various parts of the monorepo into place, as part of the build system. Mozilla did this for Firefox, with the moztreedocs system.

The results look pretty good, but this is a bespoke solution. Aside from general ideas, there’s no way I’m going to be able to apply anything in moztreedocs to Voltus’s monorepo (which is based on a completely different build system). And being honest, I’m not sure if the 40+ hour (estimated) effort to reimplement it would be a good use of time compared to other things I could be doing.

Solution 2: Use the include directive with MyST

Later versions of MyST include support for directly importing a markdown file from another part of the repository.

This is a limited form of embedding: it won’t let you import an entire directory of markdown files. But if your submodules mostly just include content in the form of a README.md (or similar), it might just be enough. Just create a directory for these files to live (say services) and slot them in:

<root>/docs/services/module-1/submodule-a/index.md:

```{include} ../../../module-1/submodule-a/README.md
```

I’m currently in the process of implementing this solution inside Voltus. I have optimism that this will be a big (if incremental) step up over what we have right now. There are obviously limits, but you can cram a lot of useful information in a README. As a bonus, it’s a pretty nice marker for those spelunking through the source code (much more so than a forest of tiny documentation files).

Solution 3: Sphinx Collections

This one I just found about today: Sphinx Collections is a small python module that lets you automatically import entire directories of files into your sphinx tree, under a _collections module. You configure it in your top-level conf.py like this:

extensions = [
    ...
    "sphinxcontrib.collections"
]

collections = {
    "submodule-a": {
        "driver": "symlink",
        "source": "/monorepo/module-1/submodule-a/docs",
        "target": "submodule-a"
    },
    ...
}

After setting this up, submodule-a is now available under _collections and you can include it in your table of contents like this:

...

```{toctree}
:caption: submodule-a

_collections/submodule-a/index.md
```

...

At this point, submodule-a’s documentation should be available under http://<my doc domain>/_collections/submodule-a/index.html

Pretty nifty. The main downside I’ve found so far is that this doesn’t play nicely with the Edit on GitHub links that the readthedocs theme automatically inserts (it thinks the files exist under _collections), but there’s probably a way to work around that.

I plan on investigating this approach further in the coming months.

90 days out and in

2022-04-16T16:44:14Z

The 90 day mark just passed at my new gig at Voltus, feels like a good time for a bit of self-reflection.

In general, I think it’s been a good change and that it was the right time to leave Mozilla. Since I left, a few people have asked me why I chose to do so: while the full answer is pretty complicated (these things are never simple!), I think it does ultimately come down to wanting to try something new after 10+ years. I’ve accumulated a fair amount of expertise in web development and data engineering and I wanted to see if I could apply them to a new area that I cared about— in this case, climate change and the energy transition.

Voltus is a much younger and different company than Mozilla was, and there’s no shortage of things to learn and do. Energy markets are a rather interesting technical domain to work in— a big intersection between politics, technology, and business. Lots of very old and very new things all at once. As a still-relatively young company, there is definitely more of a feeling that it’s possible to shape Voltus’s culture and practices, which has been interesting. There’s a bit of a balancing act between sharing what you’ve learned in previous roles while having the humility to recognize that there’s much you still don’t understand in a new workplace.

On the downside, I have to admit that I do miss being able to work in the open. Voltus is currently in the process of going public, which has made me extra shy about saying much of anything about what I’ve been working on in a public forum.

To some extent I’ve been scratching this itch by continuing to work on Irydium when I have the chance. I’ve done up a few new releases in the last couple of months, which I think have been fairly well received inside my very small community of people doing like-minded things. I’m planning on attending (at least part of) a pyodide sprint in early May, which I think should be a lot of fun as well as an opportunity to push browser-based data science forward.

I’ve also kept more of a connection with Mozilla than I thought I would have: some video meetings with former colleagues, answering questions on Element (chat.mozilla.org), even some pull requests where I felt like I could make a quick contribution. I’m still using Firefox, which has actually given me more perspective on some problems that people at Mozilla might not experience (e.g. this screensharing bug which you’d only see if you’re using a WebRTC-based video conferencing solution like Google Meet).

That said, I’m not sure to what extent this will continue: even if the source code to Firefox and the tooling that supports it is technically “open source”, outsiders like myself really have very limited visibility into what Mozilla is doing these days. This makes it difficult to really connect with much of what’s going on or know how I might be able to contribute. While it might be theoretically possible to join Mozilla’s Slack (at least last I checked), that feels like a rabbit hole I’d prefer not to go down. While I’m still interested in supporting Mozilla’s mission, I really don’t want more than one workplace chat tool in my life: there’s a lot of content there that is no longer relevant to me as a non-employee and (being honest) I’d rather leave behind. There’s lots more I could say about this, but probably best to leave it there: I understand that there’s reasons why things are the way they are, even if they make me a little sad.

Leaving Mozilla

2021-12-21T18:43:09Z

I’ve decided to leave Mozilla as an employee: my last day will be December 31st, 2021.

It’s hard to overstate the impact Mozilla has had on my life. In particular, I’m grateful for all the interactions I’ve had with the community: the opportunity to build technology for the public good with people around the world was unique and I’m really going to miss it.

Looking back over the past 10 years, I’m feeling pretty good about the impact I had through building better developer and data tooling: mozregression, Perfherder, Iodide and the Glean Dictionary stand out as particular highlights. Thanks to everyone who worked on those things with me! I am because we are.

Last Lecture

It’s become traditional in Data @ Mozilla for the person leaving to give a last lecture on their way out. I decided to give a talk on a specific area of focus for me over the last couple of years: documentation.

I’m not sure how comprehensible it will be to people outside of my particular context at Mozilla, but it seemed fitting to post it publically regardless.

I’m more convinced than ever that documentation is one of the keys to empowering people to make better decisions with data (no matter what their job title). I hope my efforts here have been helpful.

Joining the Community

After having spent a good chunk of energy on making it possible for people outside the Mozilla Corporation to contribute to our projects, I’m looking forward to seeing what it’s like on the other side of the fence.

I’m not sure right now exactly how active I’ll be, but I plan on sticking around on Matrix and Bugzilla, at least a little bit. If there’s anything I can help you with, feel free to reach out!

Learning about Psychological Safety at the Recurse Center

2021-10-19T12:34:40Z

Last summer, I took a 6-week sabbatical from my job to attend a virtual “programmers retreat” at the Recurse Center. I thought I’d write up some notes on the experience, with a particular lens towards what makes an environment suited towards learning, innovation, and personal growth.

Some context: I’m currently working as a software engineer at Mozilla, building out our data pipeline and analysis tooling. I’ve been at my current position for more than 10 years (my “anniversary” actually passed while I was out). I started out as a senior engineer in 2011, and was promoted to staff engineer in 2016. In tech-land, this is a really long tenure at a company. I felt like it was time to take a break from my day-to-day, explore some new ideas and concepts, and hopefully expose myself to a broader group of people in my field.

My original thinking was that I would mostly be spending this time building out an interactive computation environment I’ve been working on called Irydium. And I did quite a bit of that. However, I think the main thing I took away from this experience was some insight on what makes a remote environment for knowledge work really “click”. In particular, what makes somewhere feel psychologically safe, and how this feeling allows us to innovate and do our best work.

While the Recurse Center obviously has different goals than an organization that builds and delivers consumer software, I do think there are some things that it does that could be applied to Mozilla (and, likely, many other tech workplaces).

What is the Recurse Center?

Most succinctly, the Recurse Center is a “writer’s retreat for programmers”. It tries to provide an environment conducive to learning and creativity, an opportunity to refine your craft and learn new things, both from the act of programming itself and from interactions with the other like-minded people attending. The Recurse Center admits a wide variety of people, from those who have only been through a coding bootcamp to those who have been in the industry many years, like myself. The main admission criteria, from what I gather, are curiosity and friendliness.

Once admitted, you do a “batch”— either a mini (1 week), half-batch (6 weeks), or a full batch (12 weeks). I did a half-batch.

How does it work (during a global pandemic)?

The Recurse experience used to be entirely in-person, in a space in New York City - if you wanted to go, you needed to move there at least temporarily. Obviously that’s out the window during a Global Pandemic, and all activities are currently happening online. This was actually pretty ideal for me at this point in my life, as it allowed me to participate entirely remotely from my home in Hamilton, Ontario, Canada (near Toronto).

There’s a few elements that make “Virtual RC” tick:

A virtual space (pictured below) where you can see other people in your cohort. This is particularly useful when you want to jump into a conference room.
A shared “calendar” where people can schedule events, either adhoc (e.g. a one off social event, discussing a paper) or on a regular basis (e.g. a reading group)
A zulip chat server (which is a bit like Slack) for adhoc conversation with people in your cohort and alumni. There are multiple channels, covering a broad spectrum of interests.

Why does it work?

So far, what I’ve described probably sounds a lot like any remote tech workplace during the pandemic… and it sort of is! In some ways, my schedule and life while at Recurse didn’t feel all that different from my normal day-to-day. Wake up in the morning, drink coffee, meditate, work for roughly 8 hours, done. Qualitatively, however, my experience at Recurse felt unusually productive, and I learned a lot more than I expected to: not just the core stuff related to Irydium, but also unexpected new concepts like CRDTs, product design, and even how visual studio code syntax highlighting works.

What made the difference? Certainly, not having the normal pressures of a workplace helps - but I think there’s more to it than that. The way RC is constructed reinforces a sense of psychological safety which I think is key to learning and growth.

What is psychological safety and why should I care?

Psychological safety is a bit of a hot topic these days and there’s a lot of discussion about in management circles. I think it comes down to a feeling that you can take risks and “put yourself out there” without fear that you’ll be ignored, attacked, or ridiculed.

Why is this important? I would argue, because knowledge work is about building understanding — going from a place of not understanding to understanding. If you’re working on anything at all innovative, there is always an element of the unknown. In my experience, there is virtually always a sense of discomfort and uncertainty that goes along with that. This goes double when you’re working around and with people that you don’t know terribly well (and who might have far more experience than you). Are they going to make fun of you for not knowing a basic concept or for expressing an idea that’s “so wrong I don’t even know where to begin”? Or, just as bad, will you not get any feedback on your work at all?

In reality, except in truly toxic environments, you’ll rarely encounter outright abusive behaviour. But the isolation of remote work can breed similar feelings of disquiet and discomfort over time. My sense, after a year of working “hardcore” remote in COVID times, is that our normal workplace rituals of meetings, “stand ups”, and discussions over Slack don’t provide enough space for a meaningful sense of psychological safety to develop. They’re good enough for measuring progress towards agreed-upon goals but a true sense of belonging depends on less tightly scripted interactions among peers.

How the Recurse environment creates psychological safety

But the environment I described above isn’t that different from a workplace, is it? Speaking from my own experience, my coworkers at Mozilla are all pretty nice people. There’s also many channels for informal discussion at Mozilla, and of course direct messaging is always available (via Slack or Matrix). And yet, I still feel there is a pretty large gap between the two experiences. So what makes the difference? I’d say there were three important aspects of Recurse that really helped here: social rules, gentle prompts, and a closed space.

There’s been a lot of discussion about community participation guidelines and standards of behaviour in workplaces. In general, these types of policies target really egregious behaviour like harassment: this is a pretty low bar. They aren’t, in my experience, sufficient to actually create an environment that actually feels safe.

The Recurse Center goes over and above a basic code of conduct, with four simple social rules:

No well-actually’s: corrections that aren’t relevant to the point someone was trying to make (this is probably the rule we’re most heavily conditioned to break).
No feigned surprise: acting surprised when someone doesn’t know something.
No backseat driving: lobbing advice from across the room (or across the online chat) without really joining or engaging in a conversation.
No subtle -isms: subtle expressions of racism, sexism, ageism, homophobia, transphobia and other kinds of bias and prejudice.

These rules aren’t “commandments” and you’re not meant to feel shame for violating them. The important thing is that by being there, the rules create an environment conducive to learning and growth. You can be reasonably confident that you can bring up a question or discussion point (or respond to one) and it won’t lead to a bad outcome. For example, you can expect not to be made fun of for asking what a UNIX socket is (and if you are, you can tell the person doing so to stop). Rather than there being an unspoken rule that everyone should already know everything about what they are trying to do, there is a spoken rule that states it’s expected that they don’t.

Working on Irydium, there’s an infinite number of ways I can feel incompetent: this is a requirement when engaging with concepts that I still don’t feel completely comfortable with: parsers, compilers, WebAssembly… the list goes on. Knowing that I could talk about what I’m working on (or something I’m interested in) and that the responses I got would be constructive and directed to the project, not the person made all the difference.¹

Gentle prompts

The thing I loved the most about Recurse were the gentle prompts to engage with other people, talk about your work, and get help. A few that I really enjoyed during my time there:

The “checkins” channel. People would post what’s going on with their time at RC, their challenges, their struggles. Often there would be little snippits about people’s lives in there, which built to a feeling of community.
Hack & Tell: A weekly event where a group of us would get together in a Zoom room, talk about working on or building something, then rejoin the chat an hour later to show off what we accomplished.
Coffee Chats: A “coffee chat” bot at RC would pair you with other people in your batch (or alumni) on a cadence of your choosing. I met so many great people this way!
Weekly Presentations: At the end of each week, people would sign up to share something that they were working on our learned.

… and I could go on. What’s important are not the specific activities, but their end effect of building connectedness, creating opportunities for serendipitous collaboration and interaction (more than one discussion group came out of someone’s checkin post on Zulip) and generally creating an environment well-suited to learning.

A (semi) closed space

One of the things that makes the gentle prompts above “work” is that you have some idea of who you’re going to be interacting with. Having some predictability about who’s going to see what you post and engage with you (that they were vetted by RC’s interview process and are committed to the above-mentioned social rules) gives you some confidence to be vulnerable and share things that you might be reluctant to otherwise.

Those who have known me for a while will probably see the above as being a bit of departure from what I normally preach: throughout my tenure at Mozilla, I’ve constantly pushed the people I’ve worked with to do more work in public. In the case of a product like Firefox, which touches so many people, I think open and transparent practices are absolutely essential to building trust, creating opportunity, and ensuring that our software reflects a diversity of views. I applied the same philosophy to Irydium’s development while I was at the Recurse Center: I set up a public Matrix channel to discuss the project, published all my work on GitHub, and was quite chatty about what I was working on, both in this blog and on Twitter.

The key, I think, is being deliberate about what approach you take when: there is a place for both public and private conversations about what we work on. I’m strongly in favour of open design documents, community calls, public bug trackers and open source in general. But I think it’s also pretty ok to have smaller spaces for learning, personal development, and question asking. I know I strongly appreciated having a smaller group of people that I could talk to about ideas that were not yet fully formed: you can always bring them out into the open later. The psychological risk of working in public can be mitigated by the psychological safety that can be developed within an intentional community.

Bringing it back

Returning to my job, I wondered if it might be possible to bring some of what I described above back to Mozilla? Obviously not everything would be directly transferable: Mozilla has its own mission and goals, and there are pressures that exist in a workplace that do not exist in an environment purely directed at learning. Still, I suspected that there was something we could do here. And that it would be worth doing, not just to improve the felt experience of the people here (though that would be reason enough) but also to get more feedback on our work and create more opportunities for collaboration and innovation.

I felt like trying to do something inside our particular organization (Data Engineering and Data Science) would be the most tractable initial step. I talked a bit about my experience with Will Kahn-Green (who has been at Mozilla around the same length of time as I have) and we came up with what we called the “Data Neighbourhood” project: a set of grassroots micro-initiatives to increase our connectedness as a group. As an organization directed primarily at serving other parts of Mozilla, most of our team’s communication is directed outward. It’s often hard to know what everyone else is up to, where they’re struggling, and how we could help each other out. Attacking that problem directly seemed like the best place to start.

The first experiment we tried was a “data checkins” channel on Slack, a place for people to talk informally about their work (or life!). I explicitly set it up with a similar set of social rules as outlined above and tried to emphasize that it was a place to talk about how things are going, rather than a place to report status to your manager. After a somewhat slow start (the initial posts were from Will, myself, and a few other people from Data Engineering who had been around for a long time) we’re beginning to see engagement from others, including some newer people I hadn’t interacted with much before. There’s also been a few useful threads of conversations across different sub-teams (for example, a discussion on how we identify distinct versions of Firefox for iOS) that likely would not have happened without the channel.

Since then, others have tried a few other things in the same vein (an adhoc coffee chat pairing bot, a “writing help” channel) and there are some signs of success. There’s clearly an appetite for new and better ways for us to relate to each other about the work we’re doing, and I’m excited to see how these ideas evolve over time.

I suspect there are limits to how psychologically safe a workplace can ever feel (and some of that is probably outside of any individual’s control). There are dynamics in a workplace which make applying some of Recurse’s practices difficult. In particular, a posture of “not knowing things is o.k.” may not apply perfectly to a workplace where people are hired (and promoted) based on perceived competence and expertise. Still, I think it’s worth investigating what might be possible within the constraints of the system we’re in. There are big potential benefits, for our creative output and our well-being.

Many thanks to Jenny Zhang, Kathleen Beckett, Joe Trellick, Taylor Phebillo and Vaibhav Sagar, and Will Kahn-Greene for reviewing earlier drafts of this post

This is generally considered best practice inside workplaces as well. For example, see Google’s guide on how to write code review comments. ↩

Python dependency gotchas: always go to the source

2021-08-16T20:06:09Z

Getting back into the swing of things at Mozilla after my extended break. I’m currently working on enhancing and extending Looker support for Glean-based applications, which eventually led me back to working on bigquery-etl, our framework for creating derived datasets in our data lake.

I spent some time working on improving the initial developer experience of bigquery-etl early this year, so I figured it would be no problem to get going again despite an extended hiatus from it (I think it’s probably been ~2–3 months since I last touched it). Unfortunately the first thing I got after creating a fresh virtual environment (to pick up the new dependency updates) was this exciting looking error:

wlach@antwerp bigquery-etl % ./bqetl --help
Traceback (most recent call last):
  ...
  File "/Users/wlach/src/bigquery-etl/venv/lib/python3.9/site-packages/google/cloud/bigquery_v2/types/__init__.py", line 16, in <module>
    from .encryption_config import EncryptionConfiguration
  File "/Users/wlach/src/bigquery-etl/venv/lib/python3.9/site-packages/google/cloud/bigquery_v2/types/encryption_config.py", line 26, in <module>
    class EncryptionConfiguration(proto.Message):
  File "/Users/wlach/src/bigquery-etl/venv/lib/python3.9/site-packages/proto/message.py", line 200, in __new__
    file_info = _file_info._FileInfo.maybe_add_descriptor(filename, package)
  File "/Users/wlach/src/bigquery-etl/venv/lib/python3.9/site-packages/proto/_file_info.py", line 42, in maybe_add_descriptor
    descriptor=descriptor_pb2.FileDescriptorProto(
TypeError: descriptor to field 'google.protobuf.FileDescriptorProto.name' doesn't apply to 'FileDescriptorProto' object

What I did

Since we have pretty decent continuous integration at Mozilla, when I see an error like this I am usually pretty sure it’s some kind of strange interaction between my local development environment and whatever dependencies we’ve specified for the repository in question. Usually these problems are pretty easy to solve.

First thing I tried was to type the error into Google, to see if this had come up for anyone else before. I tried several variations of TypeError: descriptor to field and FileDescriptorProto and nothing really turned up. This strategy almost always turns up something. When it doesn’t it usually indicates that something pretty strange is happening.

To see if this was a strange problem particular to us, I asked on our internal channel but no one had offhand seen or heard of this error either. One of my colleagues (who had a working setup on a Mac, the same environment I was using) suggested I set up pyenv to isolate my development environment, which was a good idea but did not seem to solve the problem: both Python 3.8 and 3.9 installed via pyenv ran into the exact same issue.

After flailing around trying a number of other failed approaches (maybe I need to upgrade the version of virtualenv that we’re using?), I broke down and looked harder at the error itself. It seemed to be some kind of typing error in Google’s protobuf library, which google-cloud-bigquery is calling. If this sort of thing was happening to everyone, we probably would have seen it happening more broadly. So my guess, again, was that it was happening due to an obscure interaction between some variable on my machine and this particular combination of dependencies.

At this point, I systematically went through our set of python dependencies to see what might be the matter. For the most part, I found nothing surprising or suspicious. google-api-core was at the latest version, as was google-cloud-bigquery. However, I did notice that the version of protobuf we were using was a little older (3.15.8 when the latest “official” version on pypi was 3.17.3).

It seemed like a longshot that the problem was there, but it seemed like upgrading the dependency was worth a try just in case. So I bumped the version of protobuf to the latest version in my local checkout (pip install protobuf==3.17.3)…

… and sure enough, after doing so, the problem was fixed and ./bqetl --help started working again:

wlach@antwerp bigquery-etl % ./bqetl --help
Usage: bqetl [OPTIONS] COMMAND [ARGS]...

  CLI tools for working with bigquery-etl.

...

After doing so, I did up a quick pull request and the problem is now fixed, at least for me.

It’s a bit unfortunate that dependabot (which we have configured for this repository) didn’t send an update for protobuf, which would have fixed this problem earlier.¹ It seems like it’s not completely reliable for python packages, for whatever reason: I have also noticed this problem with mozregression.

I suspect (though can’t confirm) that the problem here is a backwards-incompatible change made to either protobuf or one of the packages that uses it. However, the nature of the incompatibility seems subtle: bigquery-etl works fine with the old set of dependencies we run in continuous integration and it appears to only come up in specific circumstances (i.e. mine). Unfortunately, I need to get back to what I was actually planning to work on and don’t have time to unwind the rather set of complex interactions going on here. Maybe later!

What I would have done differently

This kind of illustrates (again) to me that while some shortcuts and heuristics can save a bunch of time and mental effort (Googling things all the time is basically standard practice in the industry at this point), sometimes you really just need to start a little closer at the problem to find a solution. I was hesitant to do this in this case because I’m never sure where those kinds of rabbit holes are going to take me (e.g. I spent several days debugging a bad interaction between Kubernetes and our airflow cluster in late 2019 with not much to show for the effort), but often all it takes is understanding the general shape of the problem to move you to a quick solution.

Other lessons

Here’s a couple of other things this experience reinforced for me (these are more subjective, take them or leave them):

Local development environments are kind of a waste of time. The above work took me several hours and it’s going to result in ~zero user-visible improvements for anyone outside of Mozilla Data Engineering. I’m excited about the potential productivity improvements that might come from using tools like GitHub Codespaces.
While I can’t confirm this was the source of the problem in this particular case, in general backwards compatibility on every level is super important when your software has broad reach and doubly so if it’s a widely-used dependency of other software (and is thus hard to reason about in isolation). In these cases, what seems like a trivial change (e.g. improving the type signatures inside a Python library) can squander many hours of people’s time if you’re not careful. Backwards-incompatible changes, however innocuous they may seem, should always invoke a major version bump.
Likewise, bugs in software that have broad usage (like dependabot) can have big downstream impacts. If dependabot’s version bumping for python was more reliable, we likely wouldn’t have had this problem. The glass-half-full interpretation of this is that fixing these types of issues would have an outsized benefit for the commons.

As an aside, the main reason we use dependabot and aggressively update packages like google-api-core is due to a bug in pip. ↩

Lightweight dashboards and reports with Irydium and surge.sh

2021-08-03T15:37:35Z

One of my main goals with Irydium is to allow it to be a part of as many data science and engineering workflows as possible (including ones I haven’t thought of). Yes, like Iodide and other products, I am (slowly) building a web-based interface for building and sharing dashboards, reports, and similar things. However, I also want to fully support local and command-line based workflows. Beyond the obvious utility of being able to use your favorite text-editor to create documents, this also opens up the possibility of combining Irydium with other tools and workflows. For a slightly longer exposition on why this is desirable, I would highly recommend reading Ryan Harter’s post on the subject: Don’t make me code in your text box.

Using the irydium template

To make getting started easier, I just created an irydium-template: a simple GitHub repository which contains a minimal markdown document (a big mac index visualization) which you can use as a base, as well as a bit of npm scaffolding to get you up and running quickly. To check it out via the console, I recommend using degit (the tool of choice for such things in the Svelte community):

npx degit git@github.com:irydium/irydium-template.git my-notebook
npm install
npm run dev

This will create a webserver which renders the document (index.md) at port 3000, along with some debugging options. As you edit and save the document, the site should update automatically.

Publishing your work

When you’re happy with the results, you can create a static version of the site (an index.html file) by running npm run build. You can publish this via whatever you like: GitHub pages, Netlify / Vercel or… my new favorite service, surge.sh. Surge provides a really simple hosting service for hosting static sites and works great with Irydium. Installing and running it locally is two commands:

npm install -g surge
surge

Surge will prompt you for an email and a password, then will automatically publish your site at a unique URL. As an example, I published a site for the above template: few-blade.surge.sh

Interested in chatting more about this? Feel free to reach out on the Irydium Gitter chat.

Irydium @ Recurse Updates

2021-07-28T19:16:42Z

Some quick updates on where Irydium is at, roughly a week-and-a-half before my mini-sabbatical at the Recurse Centre ends.

JupyterBook and MyST

I’d been admiring JupyterBook from afar for some time: their project philosophy appealed to me greatly. In particular, the MyST extensions to markdown seemed like a natural fit for this project and a natural point of collaboration and cross-pollination. A couple of weeks ago, I finally got in touch with some people working on that project, which prompted a few small efforts:

Adding Svelte support to VSCode’s MyST Plugin, which in turn prompted me to figure out why the VSCode plugin for Svelte doesn’t render inline content correctly (tl;dr: probably a bug in VSCode?)
Adding support for a couple of MyST’s built-in directives (“warning” and “note”).

I’ve become convinced that building on top of MyST is right for both Irydium and the larger community. Increasing Irydium’s support for MyST is tracked in irydium/irydium#123.

Using Irydium to build Irydium

I’ve been spending a fair bit of time thinking of how to ma ke it easier for people to build Irydium documents through composition of existing documents. Landed the first pieces of this. The first is the ability to “import” a code chunk from another irydium document. There’s a few examples of this in the new components section of irydium.dev:

In a sense this allows you to define a reusable piece of code along with both documentation and usage examples. I think this concept will be particularly useful for supporting language plugins (which I will write about in an upcoming post).

It’s a real project now

I spent a bit of time last week doing some community gardening. I still consider Irydium an “experiment” but I’d like to at least open up the possibility of it being something larger. To help make that happen, I started working on some basic project governance pieces, namely:

We have a code of conduct and contributing guidelines. I opted to go for the Contributor Covenant, which seems to be a good minimal viable social contract. I considered something proposing something more comprehensive (like the Rust Code of Conduct), but I felt that’s something for a group of people to discuss and debate, should the time come where Irydium is more than a one-person show. For now, I’ll do my best to make sure that everyone in Irydium’s orbit has a good experience.
There’s a proper issues list, including some “good first bugs” for people to look at (shout out to @m-clare for submitting the first PR to Irydium!)
We have a channel on gitter, also accessible via Matrix. Come say hi!

Next steps

There’s not a ton of time left at RC, so some of these things may have to be done in my spare time after the batch ends. That said, here’s my near-term roadmap:

Add support for code chunks to output content directly to the DOM (currently the only way to output to an Irydium document is through a Svelte component). This will be particularly important for Python support, where people expect the output of a cell running altair or matplotlib to display directly in the document (as they do in Jupyter). Tracked in irydium/irydium#122.
Integrate ellx.io’s next-generation JavaScript bundler, tokamak. This should make building irydium documents much more robust and error proof and paves the way to further improvements. Special shout-out to the ellx developers for being so friendly and open to collaboration: ellx is a novel approach to application development and definitely worth checking out if you haven’t already. Tracked in irydium/irydium#125.
Finish and document support for language plugins (and make another blog post especially about them, they’re cool!). Tracked in irydium/irydium#144.

10 years at Mozilla

2021-07-12T16:30:42Z

Yesterday (July 11, 2021) was the 10 year anniversary of starting at the Mozilla Corporation. My life has changed a ton in those years: in that time I ended a marriage, changed the city in which I live two times, and took up religion¹. Mozilla has also changed pretty drastically in my time here, especially in the last year.

Yet somehow I’m still at it, for more or less for the same reasons that led me to accept my initial offer to join the A-team.² The Internet has the immense potential to be a force for individual empowerment and yet more than ever, we see this technology used to consolidate unchecked power, spread misinformation, and generally exploit people. Mozilla is not perfect (no organization is: 10 years anywhere will teach you that), but it’s one of the few remaining counter-forces to these accelerating trends. While I’m currently taking a bit of a break to explore some stuff on my own, I am looking forward to getting back to work on the mission when I return in mid-August.

To the extent that Zen Buddhism is a religion. ↩
I’ve since moved to Data @ Mozilla ↩

Adding persistence to Irydium with Supabase

2021-07-05T15:09:56Z

Entering the second week of Recurse. Besides orientation and a few adventures in pair programming (special shout out to Jane Adams for trying out Irydium with me!), I spent most of my time attempting to get document saving & loading working with Irydium.

I learned from Iodide that not having a good document sharing story really inhibits collaboration and sharing, which is something I explicitly want to do here at the Recurse centre (and in general for this project). That said, this isn’t actually an area I want to spend a lot of time on right now: it’s the shape of problem I’ve solved many times before (and that has been solved by many others). I’d rather spend my time over the next few weeks on things I haven’t had much of a chance to look at or pursue in my day-to-day.

So, to try to keep the complexity down, I decided to take the same approach as the svelte repl, which aims only to allow the reproduction of simple examples. It allows you to save anything you type in it and also browse anything that you had previously saved. That’s not going to replace GitHub, but it’s more than enough to get started.

Supabase

So with that goal in mind, how to do go about it? If I wanted to completely fall back on my previous knowledge, I could have gone for the tried + true approach of Django / Heroku to add a persistence layer (what I did for Iodide). That would have had the benefit of being familiar but would also have increased the overall implementation complexity of Irydium considerably. In the past year, I’ve become convinced that serverless approaches to building web applications are the wave of the future, at least for applications like this one. They’re easier to set up, easier to develop, and (generally speaking) cheaper to deploy. Just before I launched, I set up irydium.dev as a static site on Netlify and it’s been a great experience: deploys are super fast and it’s easy to reason about what’s going on “under the hood” (since there’s not a much of a hood to look under).

With that in mind, I decided to take a (small) gamble and give Supabase a try for this one after determining it would be compatible with the approach I wanted to take. Supabase bills itself as a “Firebase Alternative” (Firebase is another popular solution for bootstrapping simple web applications with persistence). In contrast to Firebase, Supabase uses a standard database technologies (Postgres!) and has a nice JavaScript SDK and a bunch of well-written tutorials (including one especially for Svelte).

The naive model for integrating with Supabase is pretty simple:

Set up a Supabase application, which provides you with a unique API endpoint to make web requests (this endpoint can be exposed publicly).
Have your client authenticate with an OAuth provider (e.g. GitHub, GitLab), then store an authentication token in localStorage.
You can then make requests to the above endpoint with the authentication token, which lets Supabase use row-level security to restrict modifications to the database: in this case, we can restrict users to updating their own documents.

I’d say it probably took me 20–30 hours to get the feature working end-to-end (including documentation), which wasn’t too bad. My impressions were pretty positive: the aforementioned tutorial is pretty decent, the supabase-js library provides a nice ORM-like abstraction over SQL and integrates nicely with Svelte. In general working with Supabase felt pretty familiar to me from previous experiences writing database-backed applications, which I take as a very good sign.

The part that felt the weirdest was writing raw SQL to set up the “documents” table that Irydium uses: SQL is something I’m fairly used to writing because of my experiences at Mozilla, but I imagine this might be off-putting to someone newer to writing these types of things. Also, I have some concerns of how maintainable a Supabase database is over the long term: while it was easy enough to document the currently-simple setup instructions in the README, I do somewhat fear the prospect of managing my database via their SQL console. Something like Django’s schema migrations and management commands would be a welcome addition to Supabase’s SDK.

Netlify functions

The above approach isn’t what most people would consider to be “best practice”¹. In particular, storing credentials in localStorage is probably not the best idea for an application presenting interactive content like Irydium: it wouldn’t be particularly difficult for a malicious document to steal someone’s secret and send it somewhere it shouldn’t be.

I’m not so worried about it at this stage of the project, but one intriguing possibility here (that’s compatible with our current deploy set up) would be to write some simple Netlify Functions to do the actual interaction with Supabase, while delegating to Netlify for the authentication itself (using Netlify Identity).

I experimented writing a simple function to prove out this approach and it seems to work quite well (source, example). This particular function is making an anonymous query to the database, but I see no obstacle to handling authenticated ones as well. Having an API under a .netlify namespace seems kinda weird on first blush, but I can probably get used to it.

I want to move on to other things now (parsers! document state visualizations!) but might poke at this more later. In the mean time, if you write/build something cool at irydium.dev/repl, let me know!

See, for example, Please Stop Using Local Storage ↩

Irydium: Points of departure

2021-06-28T16:28:40Z

So it’s my first day at the Recurse centre, which I blogged briefly about last week. I thought I’d start out by going into a bit more detail about what I’m trying to do with Irydium. This post might be a bit discursive and some of my thoughts are only half-formed: my intent here is towards trying to express some of these ideas at all rather than to come up with the perfect formulation for them, which is going to take time. It is based partly on a presentation I gave at Mozilla last Friday (just before going on my 6-week leave, which starts today).

First principles

The premise of Irydium is that despite obvious advances in terms of the ability of computers to crunch numbers and analyze data, our ability to share whatever we learn from these understandings is still far too difficult, especially for people new to the field. Even for domain experts (those with the job title “Data Engineer” or “Data Scientist” or similar) this is still more difficult than one would like.

I’ve made a few observations over the past couple years of trying to explain and document Mozilla’s data platform that I think form a good starting point for trying to close the gap:

Text is pretty great. Writing, just plain text, is (in my opinion) the single best medium for giving context to data. In terms of raw information density and ability to communicate complex ideas, nothing beats it. If you haven’t read it before, the essay always bet on text (by Graydon Hoare, creator of Rust) is well worth reading.
Markdown is pretty great too. Essentially an easy-to-write superset of HTML, it’s become the medium of choice for many desktop publishing workflows and has become the basis for many efforts in the “interactive presentation” space that I’m most interested in.
Reactive Systems make Data Exposition Exposition Easier. A reactive abstraction in front of your computational model reduces development times, makes your work more reproducible and is often easier for less-experienced people to understand. I’d cite the long-standing success of Excel and the recent interest in projects like Observable as evidence for this.

Ok, so what is Irydium?

Irydium is, at heart, a way to translate markdown documents into an interactive, compelling visual presentation.

My view is that publishing markdown text on the web is very close to a solved problem, and that we should build on that success rather than invent something new. This is not necessarily a new point of view (e.g. Rmarkdown and JupyterBook have similar premises) but I think some aspects of Irydium’s approach are mildly novel (or at least within the space of “not generally accepted ideas”).

If you want to get a bit of a flavor for how it works, visit the demonstration site (irydium.dev) and play with some of the examples.

What makes Irydium different from <X>?

While there are a bunch of related projects in this space, there’s a few design principles about Irydium that make it a little different from most of what’s already out there¹:

Reactive: Irydium is reactive in the same way that a spreadsheet is — that is, any individual change you make will immediately flow to the rest of the system. This provides a more intuitive model for the creator of the document and also makes it easier to create truly interactive visualizations.
Idempotent: in Irydium, a source document will yield the same presentation every time it’s run. There’s no need to reason about what the state of the “kernel” is. This is a highly valuable property when thinking about how to make your analyses reproducible.
Familiar: Irydium uses as few novel concepts and technologies as possible: it builds on some of the best ideas and technologies produced by the open source community: Python, pyodide, Svelte, mdsvex, MyST and a few others — chosen for having a reasonably shallow learning curve.
Hackable: While I’m working on an online environment to build and share irydium documents, it’s also fully possible to do so using the tools you know and love like Visual Studio Code.

With the above caveats, there are still a number of projects that overlap with Irydium’s ideas and/or design goals. A few that seem worth mentioning here:

Iodide: This is the obvious one, at least for those who have been following my work for a while. Iodide was an experiment in making a “web native” version of a scientific notebook: it uses the cell-based computational model that will be familiar to anyone who’s used Jupyter, but all the computation happens on the client. It is probably most famous for launching pyodide, a port of Python to WebAssembly (that Irydium now uses to support Python). I feel like it has a number of design issues (some of which I’ve blogged about previously) and is not currently in active development.
Observable: Client-side reactive notebooks, commercial backing, broadly used in the D3 community. Shares Irydium’s reactive approach, departs from it in terms of using a custom file format and emphasizing their interactive editing and collaboration environment (which is indeed quite impressive). I’ve used Observable for a few small work things (example) and while there’s a lot I like about it, I am a bit non-plussed by how many wheels it reinvents and the implicit lock-in to a single vendor.²
Starboard: Similar in some ways to Iodide, but in active development. I’ve started chatting a bit with the core developers on whether there might be areas we could collaborate.
Ellx: I found out a bit about this relatively recently, via the Svelte discord. Actually very close in some ways to Irydium in terms of choices of technology (e.g. Svelte). Again, in initial chats with the core developers on possible collaborations.

Success criteria

My intent with Irydium, at this point in its development, is to prove out some concepts and see where they lead. While I’d welcome it if Irydium became a successful, widely adopted environment for building interactive data visualizations, I’d also be totally happy with other outcomes, such as:

Providing a source of ideas and/or code for other people.
Working on (or with) Irydium being a good learning experience both for myself and others

Please don’t conflate “unique” with “superior”: I’m well aware that all designs come with trade offs. In particular, Irydium’s approach will almost certainly make it difficult / impossible to directly interact with “big data” systems in an efficient way. ↩
There is at least one effort (Dataflow) to allow editing Observable documents without using Observable itself, which is interesting. ↩

Mini-sabbatical and introducing Irydium

2021-06-23T21:15:15Z

Approaching my 10-year moz-iversary in July, I’ve decided it’s time to take a bit of a mini-sabbatical: I’ll be out (and trying as hard as possible not to check bugmail) from Friday, June 25th until August 9th. During this time, I’ll be doing a batch at the Recurse Centre (something like a writer’s retreat for programmers), exploring some of my interests around data visualization and analysis that don’t quite fit into my role as a Data Engineer here at Mozilla.

In particular, I’m planning to work a bunch on a project tentatively called “Irydium”, which pursues some of the ideas I sketched out last year in my Iodide retrospective and a few more besides. I’ve been steadily working on it in my off hours, but it’s become clear that some of the things I want to pursue would benefit from more dedicated attention and the broader perspective that I’m hoping the Recurse community will be able to provide.

I had meant to write up a proper blog post to announce the project before I left, but it looks like I’m pretty much out of time. Instead, I’ll just offer up the examples on the newly-minted irydium.dev and invite people to contact me if any of the ideas on the site sounds interesting. I’m hoping to blog a whole bunch while I’m there, but probably not under the Mozilla tag. Feel free to add wrla.ch to your RSS feed if you want to follow what I’m up to!

Glean Dictionary updates

2021-06-02T16:01:52Z

(this is a cross-post from the data blog)

Lots of progress on the Glean Dictionary since I made the initial release announcement a couple of months ago. For those coming in late, the Glean Dictionary is intended to be a data dictionary for applications built using the Glean SDK and Glean.js. This currently includes Firefox for Android and Firefox iOS, as well as newer initiatives like Rally. Desktop Firefox will use Glean in the future, see Firefox on Glean (FoG).

Production URL

We’re in production! You can now access the Glean Dictionary at dictionary.telemetry.mozilla.org. The old protosaur-based URL will redirect.

Glean Dictionary + Looker = ❤️

At the end of last year, Mozilla chose Looker as our internal business intelligence tool. Frank Bertsch, Daniel Thorn, Anthony Miyaguchi and others have been building out first class support for Glean applications inside this platform, and we’re starting to see these efforts bear fruit. Looker’s explores are far easier to use for basic data questions, opening up data based inquiry to a much larger cross section of Mozilla.

I recorded a quick example of this integration here:

Note that Looker access is restricted to Mozilla employees and NDA’d volunteers. Stay tuned for more public data to be indexed inside the Glean Dictionary in the future.

Glean annotations!

I did up the first cut of a GitHub-based system for adding annotations to metrics — acting as a knowledge base for things data scientists and others have discovered about Glean Telemetry in the field. This can be invaluable when doing new analysis. A good example of this is the annotation added for the opened as default browser metric for Firefox for iOS, which has several gotchas:

Many thanks to Krupa Raj and Leif Oines for producing the requirements which led up to this implementation, as well as their evangelism of this work more generally inside Mozilla. Last month, Leif and I did a presentation about this at Data Club, which has been syndicated onto YouTube:

Since then, we’ve had a very successful working session with some people Data Science and have started to fill out an initial set of annotations. You can see the progress in the glean-annotations repository.

Other Improvements

Lots more miscellaneous improvements and fixes have gone into the Glean Dictionary in the last several months: see our releases for a full list. One thing that irrationally pleases me are the new labels Linh Nguyen added last week: colorful and lively, they make it easy to see when a Glean Metric is coming from a library:

Future work

The Glean Dictionary is just getting started! In the next couple of weeks, we’re hoping to:

Expand the Looker integration outlined above, as our deploy takes more shape.
Work on adding “feature” classification to the Glean Dictionary, to make it easier for product managers and other non-engineering types to quickly find the metrics and other information they need without needing to fully understand what’s in the source tree.
Continue to refine the user interface of the Glean Dictionary as we get more feedback from people using it across Mozilla.

If you’re interested in getting involved, join us! The Glean Dictionary is developed in the open using cutting edge front-end technologies like Svelte. Our conviction is that being transparent about the data Mozilla collects helps us build trust with our users and the community. We’re a friendly group and hang out on the #glean-dictionary channel on Matrix.

mozregression update May 2021

2021-05-10T13:56:43Z

Just wanted to give some quick updates on the state of mozregression.

Anti-virus false positives

One of the persistent issues with mozregression is that it seems to be persistently detected as a virus by many popular anti-virus scanners. The causes for this are somewhat complex, but at root the problem is that mozregression requires fairly broad permissions to do the things it needs to do (install and run copies of Firefox) and thus its behavior is hard to distinguish from a piece of software doing something malicious.

Recently there have been a number of mitigations which seem to be improving this situation:

:bryce has been submitting copies of mozregression to Microsoft so that Windows Defender (probably the most popular anti-virus software on this platform) doesn’t flag it.
I recently released mozregression 4.0.17, which upgrades the GUI dependency for pyinstaller to a later version which sets PE checksums correctly on the generated executable (pyinstaller/pyinstaller#5579).

It’s tempting to lament the fact that this is happening, but in a way I can understand it’s hard to reliably detect what kind of software is legitimate and what isn’t. I take the responsibility for distributing this kind of software seriously, and have pretty strict limits on who has access to the mozregression GitHub repository and what pull requests I’ll merge.

CI ported to GitHub Actions

Due to changes in Travis’s policies, we needed to migrate continuous integration for mozregression to GitHub actions. You can see the gory details in bug 1686039. One possibly interesting wrinkle to others: due to Mozilla’s security policy, we can’t use (most) external actions inside our GitHub repository. I thus rewrote the logic for uploading a mozregression release to GitHub for MacOS and Linux GUI builds (Windows builds are still happening via AppVeyor for now) from scratch. Feel free to check the above out if you have a similar need.

MacOS Big Sur

As of version 4.0.17, the mozregression GUI now works on MacOS Big Sur. It is safe to ask community members to install and use it on this platform (though note the caveats due to the bundle being unsigned).

Usage Dashboard

Fulfilling a promise I implied last year, I created a public dataset for mozregression and created an dashboard tracking mozregression use using Observable. There are a few interesting insights and trends there that can be gleaned from our telemetry. I’d be curious if the community can find any more!

Blog moving back to wrla.ch

2021-03-21T07:12:19Z

House keeping news: I’m moving this blog back to the wrla.ch domain from wlach.github.io. This domain sorta kinda worked before (I set up a netlify deploy a couple years ago), but the software used to generate this blog referenced github all over the place in its output, so it didn’t really work as you’d expect. Anyway, this will be the last entry published on wlach.github.io: my plan is to turn that domain into a set of redirects in the future.

I don’t know how many of you are out there who still use RSS, but if you do, please update your feeds. I have filed a bug to update my Planet Mozilla entry, so hopefully the change there will be seamless.

Why? Recent events have made me not want to tie my public web presence to a particular company (especially a larger one, like Microsoft). I don’t have any immediate plans to move this blog off of github, but this gives me that option in the future. For those wondering, the original rationale for moving to github is in this post. Looking back, the idea of moving away from a VPS and WordPress made sense, the move away from my own domain less so. I think it may have been harder to set up static hosting (esp. with HTTPS) at that time… or I might have just been ignorant.

In related news, I decided to reactivate my twitter account: you can once again find me there as @wrlach (my old username got taken in my absence). I’m not totally thrilled about this (I basically stand by what I wrote a few years ago, except maybe the concession I made to Facebook being “ok”), but Twitter seems to be where my industry peers are. As someone who doesn’t have a large organic following, I’ve come to really value forums where I can share my work. That said, I’m going to be very selective about what I engage with on that site: I appreciate your understanding.

Community @ Mozilla: People First, Open Source Second

2021-02-24T14:08:16Z

It’s coming up on ten years at Mozilla for me, by far the longest I’ve held any job personally and exceedingly long by the standards of the technology industry. When I joined up in Summer 2011 to work on Engineering Productivity¹, I really did see it as a dream job: I’d be paid to work full time on free software, which (along with open data and governance) I genuinely saw as one of the best hopes for the future. I was somewhat less sure about Mozilla’s “mission”: the notion of protecting the “open web” felt nebulous and ill-defined and the Mozilla Manifesto seemed vague when it departed from the practical aspects of shipping a useful open source product to users.

It seems ridiculously naive in retrospect, but I can remember thinking at the time that the right amount of “open source” would solve all the problems. What can I say? It was the era of the Arab Spring, WikiLeaks had not yet become a scandal, Google still felt like something of a benevolent upstart, even Facebook’s mission of “making the world more connected” sounded great to me at the time. If we could just push more things out in the open, then the right solutions would become apparent and fixing the structural problems society was facing would become easy!

What a difference a decade makes. The events of the last few years have demonstrated (conclusively, in my view) that open systems aren’t necessarily a protector against abuse by governments, technology monopolies and ill-intentioned groups of individuals alike. Amazon, Google and Facebook are (still) some of the top contributors to key pieces of open source infrastructure but it’s now beyond any doubt that they’re also responsible for amplifying a very large share of the problems global society is experiencing.

At the same time, some of the darker sides of open source software development have become harder and harder to ignore. In particular:

Harassment and micro aggressions inside open source communities is rampant: aggressive behaviour in issue trackers, personal attacks on discussion forums, the list goes on. Women and non-binary people are disproportionately affected, although this behaviour exacts a psychological toll on everyone.
Open source software as exploitation: I’ve worked with lots of contributors while at Mozilla. It’s hard to estimate this accurately, but based on some back-of-the-envelope calculations, I’d estimate that the efforts of community volunteers on projects I’ve been involved in have added up to (conservatively) to hundreds of thousands of U.S. dollars in labour which has never been directly compensated monetarily. Based on this experience (as well as what I’ve observed elsewhere), I’d argue that Mozilla as a whole could not actually survive on a sustained basis without unpaid work, which (at least on its face) seems highly problematic and creates a lingering feeling of guilt given how much I’ve benefited financially from my time here.
It’s a road to burnout. Properly managing and nurturing an open source community is deeply complex work, involving a sustained amount of both attention and emotional labour — this is difficult glue work that is not always recognized or supported by peers or management. Many of the people I’ve met over the years (community volunteers and Mozilla employees alike) have ended up feeling like it just isn’t worth the effort and have either stopped doing it or have outright left Mozilla. If it weren’t for an intensive meditation practice which I established around the time I started working here, I suspect I would have been in this category by now.

All this has led to a personal crisis of faith. Do openness and transparency inherently lead to bad outcomes? Should I continue to advocate for it in my position? As I mentioned above, the opportunity to work in the open with the community is the main thing that brought me to Mozilla— if I can’t find a way of incorporating this viewpoint into my work, what am I even doing here?

Trying to answer these questions, I went back to the manifesto that I just skimmed over in my early days. Besides openness — what are Mozilla’s values, really, and do I identify with them? Immediately I was struck by how much it felt like it was written explicitly for the present moment (even aside from the addendums which were added in 2018). Many points seem to confront problems we’re grappling with now which I was only beginning to perceive ten years ago.

Beyond that, there was also something that resonated with me on a deeper level. There were a few points, highlighted in bold, that really stood out:

The internet is an integral part of modern life—a key component in education, communication, collaboration, business, entertainment and society as a whole.
The internet is a global public resource that must remain open and accessible.
The internet must enrich the lives of individual human beings.
Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.
Individuals must have the ability to shape the internet and their own experiences on the internet.
The effectiveness of the internet as a public resource depends upon interoperability (protocols, data formats, content), innovation and decentralized participation worldwide.
Free and open source software promotes the development of the internet as a public resource.
Transparent community-based processes promote participation, accountability and trust.
Commercial involvement in the development of the internet brings many benefits; a balance between commercial profit and public benefit is critical.
Magnifying the public benefit aspects of the internet is an important goal, worthy of time, attention and commitment.

I think it’s worth digging beneath the surface of these points: what is the underlying value system behind them? I’d argue it’s this, simply put: human beings really do matter. They’re not just line items in a spreadsheet or some other resource to be optimized. They are an end in of themselves. People (more so than a software development methodology) are the reason why I show up every day to do the work that I do. This is really an absolute which has enduring weight: it’s a foundational truth of every major world religion to say nothing of modern social democracy.

What does working and building in then open mean then? As we’ve seen above, it certainly isn’t something I’d consider “good” all by itself. Instead, I’d suggest it’s a strategy which (if we’re going to follow it) should come out of that underlying recognition of the worth of Mozilla’s employees, community members, and users. Every single one of these people matter, deeply. I’d argue then, that Mozilla should consider the following factors in terms of how we work in the open:

Are our spaces² generally safe for people of all backgrounds to be their authentic selves? This not only means free from sexual harassment and racial discrimination, but also that they’re able to work to their full potential. This means creating opportunities for everyone to climb the contribution curve, among other things.
We need to be more honest and direct about the economic benefits that community members bring to Mozilla. I’m not sure exactly what this means right now (and of course Mozilla’s options are constrained both legally and economically), but we need to do better about acknowledging their contributions to Mozilla’s bottom line and making sure there is a fair exchange of value on both sides. At the very minimum, we need to make sure that people’s contributions help them grow professionally or otherwise if we can’t guarantee monetary compensation for their efforts.
We need to acknowledge the efforts that our employees make in creating functional communities. This work does not come for free and we need to start acknowledging it in both our career development paths and when looking at individual performance. Similarly, we need to provide better guidance and mentorship on how to do this work in a way that does not extract too hard a personal toll on the people involved — this is a complex topic, but a lot of it in my opinion comes down to better onboarding practices (which is something we should be doing anyway) as well as setting better boundaries (both in terms of work/life balance, as well as what you’ll accept in your interactions).
Finally, what is the end result of our work? Do the software and systems we build genuinely enrich people’s lives? Do they become better informed after using our software? Do they make them better decisions? Free software might be good in itself, but one must also factor in how it is used when measuring its social utility (see: Facebook).

None of the above is easy to address. But the alternatives are either close everything down to public participation (which I’d argue will lead to the death of Mozilla as an organization: it just doesn’t have the resources to compete in the marketplace without the backing of the community) or continue down the present path (which I don’t think is sustainable either). The last ten years have shown that the “open source on auto-pilot” approach just doesn’t work.

I suspect these problems aren’t specific to Mozilla and affect other communities that work in the open. I’d be interested in hearing other perspectives on this family of problems: if you have anything to add, my contact information is below.

I’ve since moved to the Data team. ↩
This includes our internal communications channels like our Matrix instance as well as issue trackers like Bugzilla. There’s also a question of what to do about non-Mozilla channels, like Twitter or the Orange Site. Although not Mozilla spaces, these places are often vectors for harassment of community members. I don’t have any good answers for what to do about this, aside from offering my solidarity and support to those suffering abuse on these channels. Disagreement with Mozilla’s strategy or policy is one thing, but personal attacks, harassment, and character assasination is never ok— no matter where it’s happening. ↩

The Glean Dictionary

2021-01-27T13:53:44Z

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

On behalf of Mozilla’s Data group, I’m happy to announce the availability of the first milestone of the Glean Dictionary, a project to provide a comprehensive “data dictionary” of the data Mozilla collects inside its products and how it makes use of it. You can access it via this development URL:

https://dictionary.protosaur.dev/

The goal of this first milestone was to provide an equivalent to the popular “probe” dictionary for newer applications which use the Glean SDK, such as Firefox for Android. As Firefox on Glean (FoG) comes together, this will also serve as an index of what data is available for Firefox and how to access it.

Part of the vision of this project is to act as a showcase for Mozilla’s practices around lean data and data governance: you’ll note that every metric and ping in the Glean Dictionary has a data review associated with it — giving the general public a window into what we’re collecting and why.

In addition to displaying a browsable inventory of the low-level metrics which these applications collect, the Glean Dictionary also provides:

Code search functionality (via Searchfox) to see where any given data collection is defined and used.
Information on how this information is represented inside Mozilla’s BigQuery data store.
Where available, links to browse / view this information using the Glean Aggregated Metrics Dashboard (GLAM).

Over the next few months, we’ll be expanding the Glean Dictionary to include derived datasets and dashboards / reports built using this data, as well as allow users to add their own annotations on metric behaviour via a GitHub-based documentation system. For more information, see the project proposal.

The Glean Dictionary is the result of the efforts of many contributors, both inside and outside Mozilla Data. Special shout-out to Linh Nguyen, who has been moving mountains inside the codebase as part of an Outreachy internship with us. We welcome your feedback and involvement! For more information, see our project repository and Matrix channel (#glean-dictionary on chat.mozilla.org).

iodide retrospective

2020-11-10T14:09:20Z

A bit belatedly, I thought I’d write a short retrospective on Iodide: an effort to create a compelling, client-centred scientific notebook environment.

Despite not writing a ton about it (the sole exception being this brief essay about my conservative choice for the server backend) Iodide took up a very large chunk of my mental energy from late 2018 through 2019. It was also essentially my only attempt at working on something that was on its way to being an actual product while at Mozilla: while I’ve led other projects that have been interesting and/or impactful in my 9 odd-years here (mozregression and perfherder being the biggest successes), they fall firmly into the “internal tools supporting another product” category.

At this point it’s probably safe to say that the project has wound down: no one is being paid to work on Iodide and it’s essentially in extreme maintenance mode. Before it’s put to bed altogether, I’d like to write a few notes about its approach, where I think it had big advantanges, and where it seems to fall short. I’ll conclude with some areas I’d like to explore (or would like to see others explore!). I’d like to emphasize that this is my opinion only: the rest of the Iodide core team no doubt have their own thoughts. That said, let’s jump in.

What is Iodide, anyway?

One thing that’s interesting about a product like Iodide is that people tend to project their hopes and dreams onto it: if you asked any given member of the Iodide team to give a 200 word description of the project, you’d probably get a slightly different answer emphasizing different things. That said, a fair initial approximation of the project vision would be “a scientific notebook like Jupyter, but running in the browser”.

What does this mean? First, let’s talk about what Jupyter does: at heart, it’s basically a Python “kernel” (read: interpreter), fronted by a webserver. You send snipits of the code to the interpreter via a web interface and they would faithfully be run on the backend (either on your local machine in a separate process or a server in the cloud). Results are then be returned back to the web interface and then rendered by the browser in one form or another.

Iodide’s model is quite similar, with one difference: instead of running the kernel in another process or somewhere in the cloud, all the heavy lifting happens in the browser itself, using the local JavaScript and/or WebAssembly runtime. The very first version of the product that Hamilton Ulmer and Brendan Colloran came up with was definitely in this category: it had no server-side component whatsoever.

Truth be told, even applications like Jupyter do a fair amount of computation on the client side to render a user interface and the results of computations: the distinction is not as clear cut as you might think. But in general I think the premise holds: if you load an iodide notebook today (still possible on alpha.iodide.io! no account required), the only thing that comes from the server is a little bit of static JavaScript and a flat file delivered as a JSON payload. All the “magic” of whatever computation you might come up with happens on the client.

Let’s take a quick look at the default tryit iodide notebook to give an idea of what I mean:

%% fetch
// load a js library and a csv file with a fetch cell
js: https://cdnjs.cloudflare.com/ajax/libs/d3/4.10.2/d3.js
text: csvDataString = https://data.sfgov.org/api/views/5cei-gny5/rows.csv?accessType=DOWNLOAD

%% js
// parse the data using the d3 library (and show the value in the console)
parsedData = d3.csvParse(csvDataString)

%% js
// replace the text in the report element "htmlInMd"
document.getElementById('htmlInMd').innerText = parsedData[0].Address

%% py
# use python to select some of the data
from js import parsedData
[x['Address'] for x in parsedData if x['Lead Remediation'] == 'true']

The %% delimiters indicate individual cells. The fetch cell is an iodide-native cell with its own logic to load the specified resources into a JavaScript variables when evaluated. js and py cells direct the browser to interpret the code inside of them, which causes the DOM to mutate. From these building blocks (and a few others), you can build up an interactive report which can also be freely forked and edited by anyone who cares to.

In some ways, I think Iodide has more in common with services like Glitch or Codepen than with Jupyter. In effect, it’s mostly offering a way to build up a static web page (doing whatever) using web technologies— even if the user interface affordances and cell-based computation model might remind you more of a scientific notebook.

What works well about this approach

There’s a few nifty things that come from doing things this way:

The environment is easily extensible for those familiar with JavaScript or other client side technologies. There is no need to mess with strange plugin architectures or APIs or conform to the opinions of someone else on what options there are for data visualization and presentation. If you want to use jQuery inside your notebook for a user interface widget, just import it and go!
The architecture scales to many, many users: since all the computation happens on the client, the server’s only role is to store and retrieve notebook content. alpha.iodide.io has happily been running on Heroku’s least expensive dyno type for its entire existence.
Reproducibility: so long as the iodide notebook has no references to data on third party servers with authentiction, there is no stopping someone from being able to reproduce whatever computations led to your results.
Related to reproducibility, it’s easy to build off of someone else’s notebook or exploration, since everything is so self-contained.

I continue to be surprised and impressed with what people come up with in the Jupyter ecosystem so I don’t want to claim these are things that “only Iodide” (or other tools like it) can do— what I will say is that I haven’t seen many things that combine both the conciseness and expressiveness of iomd. The beauty of the web is that there is such an abundance of tutorials and resources to support creating interactive content: when building iodide notebooks, I would freely borrow from resources such as MDN and Stackoverflow, instead of being locked into what the authoring software thinks one should be able express.

What’s awkward

Every silver lining has a cloud and (unfortunately) Iodide has a pretty big one. Depending on how you use Iodide, you will almost certainly run into the following problems:

You are limited by the computational affordances provided by the browser: there is no obvious way to offload long-running or expensive computations to a cluster of machines using a technology like Spark.
Long-running computations will block the main thread, causing your notebook to become extremely unresponsive for the duration.

The first, I (and most people?) can usually live with. Computers are quite powerful these days and most data tasks I’m interested in are easily within the range of capabilities of my laptop. In the cases where I need to do something larger scale, I’m quite happy to fire up a Jupyter Notebook, BigQuery Console, or <other tool of choice> to do the heavy-lifting, and then move back to a client-side approach to visualize and explore my results.

The second is much harder to deal with, since it means that the process of exploration that is so key to scientific computing. I’m quite happy to wait 5 seconds, 15 seconds, even a minute for a longer-running computation to complete but if I see the slow script dialog and everything about my environment grinds to a halt, it not only blocks my work but causes me to lose faith in the system. How do I know if it’s even going to finish?

This occurs way more often than you might think: even a simple notebook loading pandas (via pyodide) can cause Firefox to pop up the slow script dialog:

Why is this so? Aren’t browsers complex beasts which can do a million things in parallel these days? Only sort of: while the graphical and network portions of loading and using a web site are highly parallel, JavaScript and any changes to the DOM can only occur synchronously by default. Let’s break down a simple iodide example which trivially hangs the browser:

%% js

while (1) { true }

link (but you really don’t want to)

What’s going on here? When you execute that cell, the web site’s sole thread is now devoted to running that infinite loop. No longer can any other JavaScript-based event handler be run, so for example the text editor (which uses Monaco under the hood) and menus are now completely unresponsive.

The iodide team was aware of this issue since I joined the project. There were no end of discussions about how to work around it, but they never really come to a satisfying conclusion. The most obvious solution is to move the cell’s computation to a web worker, but workers don’t have synchronous access to the DOM which is required for web content to work as you’d expect. While there are projects like ComLink that attempt to bridge this divide, they require both a fair amount of skill and code to use effectively. As mentioned above, one of the key differentiators between iodide and other notebook environments is that tools like jQuery, d3, etc. “just work”. That is, you can take a code snipit off the web and run it inside an iodide notebook: there’s no workaround I’ve been able to find which maintains that behaviour and ensures that the Iodide notebook environment is always responsive.

It took a while for this to really hit home, but having some separation from the project, I’ve come to realize that the problem is that the underlying technology isn’t designed for the task we were asking of it, nor is it likely to ever be in the near future. While the web (by which I mean not only the browser, but the ecosystem of popular tooling and libraries that has been built on top of it) has certainly grown in scope to things it was never envisioned to handle like games and office applications, it’s just not optimized for what I’d call “editable content”. That is, the situation where a web page offers affordances for manipulating its own representation.

While modern web browsers have evolved to (sort of) protect one errant site from bringing the whole thing down, they certainly haven’t evolved to protect a particular site against itself. And why would they? Web developers usually work in the terminal and text editor: the assumption is that if their test code misbehaves, they’ll just kill their tab and start all over again. Application state is persisted either on-disk or inside an external database, so nothing will really be lost.

Could this ever change? Possibly, but I think it would be a radical rethinking of what the web is, and I’m not really sure what would motivate it.

A way forward

While working on iodide, we were fond of looking at this diagram, which was taken from this study of the data science workflow:

It describes how people typically perform computational inquiry: typically you would poke around with some raw data, run some transformations on it. Only after that process was complete would you start trying to build up a “presentation” of your results to your audience.

Looking back, it’s clear that Iodide’s strong suit was the explanation part of this workflow, rather than collaboration and exploration. My strong suspicion is that we actually want to use different tools for each one of these tasks. Coincidentally, this also maps to the bulk of my experience working with data at Mozilla, using iodide or not: my most successful front-end data visualization projects were typically the distilled result of a very large number of adhoc explorations (python scripts, SQL queries, Jupyter notebooks, …). The actual visualization itself contains very little computational meat: basically “just enough” to let the user engage with the data fruitfully.

Unfortunately much of my work in this area uses semi-sensitive internal Mozilla data so can’t be publicly shared, but here’s one example:

link

I built this dashboard in a single day to track our progress (resolved/open bugs) when we were migrating our data platform from AWS to GCP. It was useful: it let us quickly visualize which areas needed more attention and gave upper management more confidence that we were moving closer to our goal. However, there was little computation going on in the client: most of the backend work was happening in our Bugzilla instance: the iodide notebook just does a little bit of post-processing to visualize the results in a useful way.

Along these lines, I still think there is a place in the world for an interactive visualization environment built on principles similar to iodide: basically a markdown editor augmented with primitives oriented around data exploration (for example, charts and data tables), with allowances to bring in any JavaScript library you might want. However, any in-depth data processing would be assumed to have mostly been run elsewhere, in advance. The editing environment could either be the browser or a code editor running on your local machine, per user preference: in either case, there would not be any really important computational state running in the browser, so things like dynamic reloading (or just closing an errant tab) should not cause the user to lose any work.

This would give you the best of both worlds: you could easily create compelling visualizations that are easy to share and remix at minimal cost (because all the work happens in the browser), but you could also use whatever existing tools work best for your initial data exploration (whether that be JavaScript or the more traditional Python data science stack). And because the new tool has a reduced scope, I think building such an environment would be a much more tractable project for an individual or small group to pursue.

mozregression and older builds

2020-08-17T19:01:57Z

Periodically the discussion comes up about pruning away old stored Firefox build artifacts in S3. Each build is tens of megabytes, multiply that by the number of platforms we support and the set of revisions we churn through on a daily basis, and pretty soon you’re talking about real money.

This came up recently in a discussion about removing the legacy taskcluster deployment — what do we actually lose by cutting back our archive of integration builds? The main reason to keep them around is to facilitate bisection testing with mozregression, to find out when a bug was introduced. Up to now, discussions about this have been a bit hand-wavey: we do keep logs about who’s accessing old builds, but it’s never been clear whether it was mozregression accessing them or something else.

Happily, now that mozregression has some telemetry, it’s a little easier to get some answers on what people are actually doing. This query gets the distribution of build ages (launched or bisected) over the past 6 months, at a month long granularity.¹ Ages are relative to the date mozregression was launched: for example, if someone asked for a build from May 2019 in June 2020, the number would be “13”.

SELECT metrics.string.usage_app AS app,
       metrics.string.usage_build_type AS build_type,
       DATE_DIFF(DATE(submission_timestamp), IF(LENGTH(metrics.datetime.usage_bad_date) > 0, PARSE_DATE('%Y-%m-%d', substr(metrics.datetime.usage_bad_date, 1, 10)), PARSE_DATE('%Y-%m-%d', substr(metrics.datetime.usage_launch_date, 1, 10))), MONTH) + 1 AS build_age
FROM `moz-fx-data-shared-prod`.org_mozilla_mozregression.usage
WHERE DATE(submission_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
  AND client_info.app_display_version NOT LIKE '%dev%'
  AND LENGTH(metrics.string.usage_build_type) > 0
  AND (LENGTH(metrics.datetime.usage_bad_date) > 0
       OR LENGTH(metrics.datetime.usage_launch_date) > 0)

I ran this query on sql.telemetry.mozilla.org and generated a box plot, broken down by product and build type:

link (requires Mozilla LDAP)

Unsurprisingly, Firefox shippable builds are the number one thing people try to bisect. Let’s take a little bit of a closer look at what’s going on there:

The median value is 1, which indicates that most people are bisecting builds within one month of the day in which mozregression was run. And the upper fence result is 6, suggesting that most of the time people are looking at a regression range that is within a 6 month range. However, looking more closely at the data points themselves (the little points in the chart above), there are a considerable number of outliers where a range greater than 20 months was asked for.

… which brings up to the question that we want to answer. Given that getting old builds isn’t that common (which we sort of knew already, based on the access patterns in the S3 logs), what is the impact of the times that we do? And it’s here where I have to throw up my hands and say “I don’t know” and suggest that we go back to empirical observation and user research.

You can go back to the thread I linked above, and see that core Firefox/Gecko developers find the ability to get a precise regression range for older revisions valuable. One thing that’s worth mentioning is that mozregression isn’t run that often, compared to a product that we ship: on the order of 50 to 100 times per a day. But when it comes to internal tooling, a small amount of use might have a big impact: if a mozregression invocation a developer a few hours (or more), that’s a real benefit to Firefox and Mozilla. The same argument might apply here, where a small number of bisections on older builds might have a disproportionate impact on the quality of the product.

I only added the telemetry to capture this information relatively recently, so we’re actually only looking at about a month of data in this post. We’ll have more complete results later this year. ↩