William Lachance's Log: Posts tagged 'Telemetry'
<h1>mozregression update May 2021</h1>
<p><em>William Lachance, 2021-05-10</em></p>
<p>Just wanted to give some quick updates on the state of <a href="https://mozilla.github.io/mozregression">mozregression</a>.</p>
<h2 id="anti-virus-false-positives">Anti-virus false positives</h2>
<p>One of the persistent issues with mozregression is that it is <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1647533">frequently flagged as a virus by many popular anti-virus scanners</a>. The causes are somewhat complex, but at root the problem is that mozregression requires fairly broad permissions to do the things it needs to do (install and run copies of Firefox), and thus its behavior is hard to distinguish from a piece of software doing something malicious.</p>
<p>Recently there have been a number of mitigations which seem to be improving this situation:</p>
<ul>
<li>:bryce has been submitting copies of mozregression to Microsoft so that Windows Defender (probably the most popular anti-virus software on this platform) doesn’t flag it.</li>
<li>I recently <a href="https://github.com/mozilla/mozregression/releases">released mozregression 4.0.17</a>, which upgrades the GUI dependency for pyinstaller to a later version which sets PE checksums correctly on the generated executable (<a href="https://github.com/pyinstaller/pyinstaller/issues/5579">pyinstaller/pyinstaller#5579</a>).</li></ul>
<p>It’s tempting to lament the fact that this is happening, but in a way I can understand it: it is genuinely hard to reliably detect which software is legitimate and which isn’t. I take the responsibility for distributing this kind of software seriously, and have pretty strict limits on who has access to the mozregression GitHub repository and what pull requests I’ll merge.</p>
<h2 id="ci-ported-to-github-actions">CI ported to GitHub Actions</h2>
<p>Due to changes in Travis’s policies, we needed to migrate continuous integration for mozregression to GitHub Actions. You can see the gory details in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1686039">bug 1686039</a>. One possibly interesting wrinkle to others: due to Mozilla’s security policy, we can’t use (most) external actions inside our GitHub repository. I thus rewrote the logic for uploading a mozregression release to GitHub for MacOS and Linux GUI builds (Windows builds are still happening via <a href="https://www.appveyor.com/">AppVeyor</a> for now) <a href="https://github.com/mozilla/mozregression/blob/495ef37e701709dce3a4b76ea67ec5b1f26043be/.github/workflows/build.yml#L86">from scratch</a>. Feel free to check the above out if you have a similar need.</p>
<h2 id="macos-big-sur">MacOS Big Sur</h2>
<p>As of version 4.0.17, the mozregression GUI now works on MacOS Big Sur. It is safe to ask community members to install and use it on this platform (though <a href="https://mozilla.github.io/mozregression/install.html#mozregression-gui">note the caveats</a> due to the bundle being unsigned).</p>
<h2 id="usage-dashboard">Usage Dashboard</h2>
<p>Fulfilling a promise I implied last year, I created a <a href="https://docs.telemetry.mozilla.org/cookbooks/public_data.html">public dataset</a> for mozregression and a <a href="https://observablehq.com/@wlach/mozregression-public-usage-dashboard">dashboard tracking mozregression use</a> using <a href="https://observablehq.com/">Observable</a>. There are a few interesting insights and trends that can be gleaned from our telemetry there. I’d be curious whether the community can find any more!</p>
<h1>Mozilla Telemetry in 2020: From "Just Firefox" to a "Galaxy of Data"</h1>
<p><em>William Lachance, 2020-07-16</em></p>
<p><em>(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find <a href="https://mozilla.github.io/glean/book/appendix/twig.html">an index of all TWiG posts online.</a>)</em></p>
<p><em>This is a special guest post by non-Glean-team member William Lachance!</em></p>
<p>In the last year or so, there’s been a significant shift in the way we (Data Engineering) think about application-submitted data @ Mozilla, but although we have a new application-based SDK based on these principles (<a href="https://mozilla.github.io/glean/book/index.html">the Glean SDK</a>), most of our <a href="https://telemetry.mozilla.org">data tools</a> and <a href="https://docs.telemetry.mozilla.org">documentation</a> have not yet been updated to reflect this new state of affairs.</p>
<p>Much of this story is known <em>inside</em> Mozilla Data Engineering, but I thought it might be worth jotting it down in a blog post as a point of reference for people outside the immediate team. Knowing it may provide some context for some of our activities and efforts over the next year or two, at least until our tools, documentation, and tribal knowledge evolve.</p>
<p>In sum, the key differences are:</p>
<ul>
<li>Instead of just one application we care about, there are many.</li>
<li>Instead of just caring about (mostly<sup><a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-1-definition" name="2020-07-16-mozilla-telemetry-in-2020-footnote-1-return">1</a></sup>) one type of ping (the Firefox <em>main</em> ping), an individual application may submit <em>many different</em> types of pings in the course of their use.</li>
<li>Instead of having both probes (histogram, scalar, or other data type) <em>and</em> bespoke parametric values in a JSON schema like the <a href="https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/environment.html">telemetry environment</a>, there are now only <em>metric types</em> which are explicitly defined as part of each ping.</li></ul>
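<p>To make the last point concrete, here is a toy illustration (emphatically <em>not</em> the real Glean SDK API) of what a well-defined metric type buys you over a bespoke JSON value: validation happens at collection time, so bad data never makes it into a ping.</p>

```python
# Toy illustration, not the real Glean SDK: a typed metric validates its
# input at collection time, where an arbitrary JSON field would accept
# anything (strings, negative numbers, nested objects...).

class CounterMetric:
    """An add-only counter, in the spirit of Glean's "counter" metric type."""

    def __init__(self):
        self._value = 0

    def add(self, amount=1):
        if not isinstance(amount, int) or amount <= 0:
            raise ValueError("counters only accept positive integers")
        self._value += amount

    @property
    def value(self):
        return self._value


pages_loaded = CounterMetric()
pages_loaded.add()
pages_loaded.add(3)
print(pages_loaded.value)  # 4
```

Because every metric is explicitly declared as part of a ping, the pipeline downstream can also generate a strongly-typed table column for it, instead of having to cope with free-form JSON.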
<p>The new world is pretty exciting and freeing, but there is some new domain complexity that we need to figure out how to navigate. I’ll discuss that in my last section.</p>
<h2 id="the-old-world-firefox-is-king">The Old World: Firefox is king</h2>
<p>Up until roughly mid-2019, Firefox was the centre of Mozilla’s data world (with the occasional nod to Firefox for Android, which uses the same source repository). The Data Platform (often called “Telemetry”) was explicitly designed to cater to the needs of Firefox Developers (and to a lesser extent, product/program managers) and a set of bespoke tooling was built on top of our data pipeline architecture - <a href="https://ravitillo.wordpress.com/2017/01/23/an-overview-of-mozillas-data-pipeline/">this blog post from 2017 describes much of it</a>.</p>
<p>In outline, the model is simple: on the client side, assuming a given user had not turned off Telemetry, during the course of a day’s operation Firefox would keep track of various measures, called “probes”. At the end of that duration, it would submit a JSON-encoded “main ping” to our servers with the probe information and <a href="https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/97bac7acaaa5cb328d7f0f7348f3ddaaae657eda/schemas/telemetry/main/main.4.schema.json">a bunch of other mostly hand-specified junk</a>, which would then find its way to a “data lake” (read: an Amazon S3 bucket). On top of this, we provided a <a href="https://github.com/mozilla/python_moztelemetry/">python API</a> (built on top of <a href="https://spark.apache.org/docs/latest/api/python/index.html">PySpark</a>) which enabled people inside Mozilla to query all submitted pings across our usage population.</p>
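<p>A deliberately simplified, hypothetical stand-in for the kind of query that API enabled (the real thing ran over PySpark against the data lake, not an in-memory list) might look like this:</p>

```python
# Hypothetical sketch, not the real python_moztelemetry API: a "main ping"
# is just a JSON document, and a query is a filter over all of them.
main_pings = [
    {"clientId": "aa", "environment": {"os": "Windows"}, "payload": {"sessionLength": 3600}},
    {"clientId": "bb", "environment": {"os": "Linux"}, "payload": {"sessionLength": 1200}},
]

def where(pings, **criteria):
    """Keep only pings whose environment matches every criterion."""
    return [
        p for p in pings
        if all(p["environment"].get(k) == v for k, v in criteria.items())
    ]

print([p["clientId"] for p in where(main_pings, os="Linux")])  # ['bb']
```

The convenience of this model (filter anything, however you like) is also its cost: every query potentially touches every ping ever submitted.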
<p>The only type of low-level object that was hard to keep track of was the list of probes: Firefox is a complex piece of software and there are <em>many</em> aspects of it we wanted to instrument to validate performance and quality of the product - especially on the more-experimental Nightly and Beta channels. To solve this problem, a <a href="https://probes.telemetry.mozilla.org/">probe dictionary</a> was created to help developers find measures that corresponded to the product area that they were interested in.</p>
<p>At a higher level, accessing this type of data using the python API quickly became slow and frustrating: the aggregation of years of Firefox ping data was hundreds of terabytes in size, and even taking advantage of PySpark’s impressive capabilities, querying the data across any reasonably large timescale was slow and expensive. Here, the solution was to create derived datasets which enabled fast(er) access to pings and other derived measures, document them on docs.telemetry.mozilla.org, and then allow access to them through tools like <a href="https://docs.telemetry.mozilla.org/tools/stmo.html">sql.telemetry.mozilla.org</a> or the <a href="https://telemetry.mozilla.org/new-pipeline/dist.html">Measurement Dashboard</a>.</p>
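<p>The idea behind a derived dataset can be sketched in miniature: pay the cost of scanning the raw pings once, so that common questions become cheap lookups afterwards. (The real derived datasets did this at terabyte scale in Spark, and later BigQuery; the data below is made up.)</p>

```python
from collections import Counter

# Toy "derived dataset": pre-aggregate raw pings into daily counts so that
# a common query no longer needs to scan every ping.
raw_pings = [
    {"submission_date": "2020-07-01", "os": "Windows"},
    {"submission_date": "2020-07-01", "os": "Linux"},
    {"submission_date": "2020-07-02", "os": "Windows"},
]

# The expensive scan happens once, at ETL time...
daily_ping_counts = Counter(p["submission_date"] for p in raw_pings)

# ...and every subsequent "query" is a cheap lookup.
print(daily_ping_counts["2020-07-01"])  # 2
```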
<h2 id="the-new-world-more-of-everything">The New World: More of everything</h2>
<p>Even in the old world, other products that submitted telemetry <em>existed</em> (e.g. Firefox for Android, Firefox for iOS, the venerable FirefoxOS) but I would not call them first-class citizens. Most of our documentation treated them as (at best) weird edge cases. At the time of this writing, you can see this distinction clearly on docs.telemetry.mozilla.org where there is one (fairly detailed) tutorial called “Choosing a Desktop Dataset” while essentially all other products are lumped into “Choosing a Mobile Dataset”.</p>
<div class="figure"><img src="/files/2020/07/docs-tmo-pic.png" alt="" />
<p class="caption"></p></div>
<p>While the new universe of mobile products is probably the most notable addition to our list of things we want to keep track of, it’s only one piece of the puzzle. Really we’re interested in measuring <em>all the things</em> (in accordance with our <a href="https://www.mozilla.org/en-US/about/policy/lean-data/">lean data practices</a>, of course) including tools we use to <em>build our products</em> like <a href="https://wiki.mozilla.org/MozPhab">mozphab</a> and <a href="https://mozilla.github.io/mozregression">mozregression</a>.</p>
<p>In expanding our scope, we’ve found that mobile (and other products) have different requirements that influence what data we would want to send and when. For example, sending one blob of JSON multiple times per day might make sense for performance metrics on a desktop product (which is usually on a fast, unmetered network) but is much less acceptable on mobile (where every byte counts). For this reason, it makes sense to have <em>different ping types</em> for the same product, not just one. For example, Fenix (the new Firefox for Android) sends a tiny baseline ping<sup><a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-2-definition" name="2020-07-16-mozilla-telemetry-in-2020-footnote-2-return">2</a></sup> on every run to (roughly) measure daily active users and a larger metrics ping sent on a (roughly) daily interval to measure (for example) a distribution of page load times.</p>
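<p>The scheduling trade-off above can be sketched as a few lines of Python. This is not Glean’s actual scheduler, just the idea: a tiny baseline ping on every run, and a heavier metrics ping at most once per day.</p>

```python
import datetime

# Toy sketch of ping scheduling (not Glean's real implementation).
def pings_to_send(now, last_metrics_date):
    pings = ["baseline"]  # cheap; roughly measures daily active use
    if last_metrics_date is None or now.date() > last_metrics_date:
        pings.append("metrics")  # larger payload, e.g. page load times
    return pings

print(pings_to_send(datetime.datetime(2020, 7, 16, 9, 0),
                    datetime.date(2020, 7, 15)))
# ['baseline', 'metrics']
```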
<p>Finally, we found that naively collecting certain types of data as raw histograms or inside the schema didn’t always work well. For example, encoding session lengths as plain integers <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1514392">would often produce weird results in the case of clock skew</a>. For this reason, we decided to <a href="https://mozilla.github.io/glean/book/user/metrics/index.html">standardize on a set of well-defined metrics</a> using Glean, which tries to minimize footguns. We explicitly no longer allow clients to submit arbitrary JSON or values as part of a telemetry ping: if you have a use case not covered by the existing metrics, <a href="https://wiki.mozilla.org/Glean/Adding_or_changing_Glean_metric_types">make a case for it and add it to the list</a>!</p>
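<p>To see why raw wall-clock arithmetic is a footgun for session lengths: if the system clock is adjusted mid-session, the difference between start and end can come out nonsensical, even negative. A well-defined duration metric measures against a monotonic clock instead, which is guaranteed never to go backwards. (The timestamps below are made up for illustration.)</p>

```python
import time

# Clock skew in action: the wall clock is stepped back mid-session.
start_wall = 1_594_900_000.0
end_wall = start_wall - 120.0      # clock adjusted back two minutes
print(end_wall - start_wall)       # -120.0: a "negative" session length

# A monotonic clock cannot go backwards, so durations measured with it
# are always non-negative.
t0 = time.monotonic()
t1 = time.monotonic()
assert t1 - t0 >= 0
```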
<p>To illustrate this, let’s take a (subset) of what we might be looking at in terms of what the Fenix application sends:</p>
<div class="figure"><img src="/files/2020/07/fenix-pings-diagram.png" alt="" />
<p class="caption"></p></div>
<p><a href="/files/2020/07/fenix-pings-diagram.mmd">mermaid source</a></p>
<p>At the top level we segment based on the “application” (just Fenix in this example). Just below that, there are the pings that this application might submit (I listed three: the baseline and metrics pings described above, along with a “migration” ping, which tracks metrics when a user migrates from Fennec to Fenix). And below <em>that</em> there are different types of metrics included in the pings: I listed a few that came out of a quick scan of the Fenix BigQuery tables using my <a href="https://mozilla-schema-dictionary.netlify.app/#!/tables/org_mozilla_fenix.metrics">prototype schema dictionary</a>.</p>
<p>This is actually only the surface-level: at the time of this writing, Fenix has no fewer than 12 different ping types and <em>many</em> different metrics inside each of them.<sup><a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-3-definition" name="2020-07-16-mozilla-telemetry-in-2020-footnote-3-return">3</a></sup> On a client level, the new Glean SDK provides easy-to-use primitives to help developers collect this type of information in a principled, privacy-preserving way: for example, <a href="https://github.com/mozilla/data-review">data review</a> is built into every metric type. But what about after it hits our ingestion endpoints?</p>
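<p>The application → ping → metric hierarchy described above can be pictured as nested data. The names below are a hypothetical miniature, not the real Fenix schema (which has far more of everything):</p>

```python
# Hypothetical miniature of the application -> ping -> metric hierarchy.
schema = {
    "fenix": {
        "baseline": ["duration", "locale"],
        "metrics": ["page_load_time"],
        "migration": ["migration_status"],
    },
}

for app, pings in schema.items():
    for ping, metrics in sorted(pings.items()):
        print(f"{app}/{ping}: {', '.join(metrics)}")
```

Tooling like a schema dictionary is essentially a navigable, searchable view over a (much larger) structure of this shape.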
<p>Hand-crafting schemas, data ingestion pipelines, and individualized ETL scripts for such a large matrix of applications, ping types, and measurements would quickly become intractable. Instead, we (Mozilla Data Engineering) refactored our data pipeline to parse out the information from the Glean schemas and then create tables in our BigQuery datastore corresponding to what’s in them - this has proceeded as an extension to our (now somewhat misnamed) <a href="https://github.com/mozilla/probe-scraper">probe-scraper</a> tool.</p>
<p>You can then query this data directly (see <a href="https://docs.telemetry.mozilla.org/concepts/glean/accessing_glean_data.html">accessing glean data</a>) or build up a derived dataset using our SQL-based ETL system, <a href="https://github.com/mozilla/bigquery-etl/">BigQuery-ETL</a>. This part of the equation has been working fairly well, I’d say: we now have a diverse set of products producing Glean telemetry and submitting it to our servers, and the amount of manual effort required to add each application was minimal (aside from adding new capabilities to the platform as we went along).</p>
<p>What hasn’t quite kept pace is our tooling to make navigating and using this new collection of data tractable.</p>
<h2 id="what-could-bring-this-all-together">What could bring this all together?</h2>
<p>As mentioned before, this new world is quite powerful and gives Mozilla a bunch of new capabilities but it isn’t yet well documented and we lack the tools to easily connect the dots from “I have a product question” to “I know how to write an SQL query / Spark Job to answer it” or (better yet) “this product dashboard will answer it”.</p>
<p>Up until now, our de facto answer has been some combination of “Use the probe dictionary / telemetry.mozilla.org” and/or “refer to docs.telemetry.mozilla.org”. I submit that we’re at the point where these approaches break down: as mentioned above, there are many more types of data we now need to care about than just “probes” (or “metrics”, in Glean-parlance). When we just cared about the main ping, we could write dataset documentation for its recommended access point (<a href="https://docs.telemetry.mozilla.org/datasets/batch_view/main_summary/reference.html">main_summary</a>) and the raw number of derived datasets was manageable. But in this new world, where we have <em>N</em> applications times <em>M</em> ping types, there are now so many canonical ping tables that documenting them all on docs.telemetry.mozilla.org no longer makes sense.</p>
<p>A few months ago, I thought that <a href="https://cloud.google.com/data-catalog">Google’s Data Catalog</a> (billed as offering “a unified view of all your datasets”) might provide a solution, but on further examination it only solves part of the problem: it provides only a view on your BigQuery tables and it isn’t designed to provide detailed information on the domain objects we care about (products, pings, measures, and tools). You can map some of the properties from these objects onto the tables (e.g. adding a probe’s description field to the column representing it in the BigQuery table), but Data Catalog’s interface for surfacing and filtering through this information is rather slow and clumsy and requires detailed knowledge of how these higher level concepts relate to BigQuery primitives.</p>
<p>Instead, what I think we need is a <em>new system</em> which allows a data practitioner (Data Scientist, Firefox Engineer, Data Engineer, Product Manager, whoever) to visualize the set of domain objects relevant to their product/feature of interest <em>quickly</em>, then map them to specific BigQuery tables and other resources (e.g. visualizations using tools like <a href="https://github.com/mozilla/glam">GLAM</a>) which allow people to quickly answer questions so we can make better products. Basically, I am thinking of some combination of:</p>
<ul>
<li>The existing probe dictionary (derived from existing product metadata)</li>
<li>A new “application” dictionary (derived from some simple to-be-defined application metadata description)</li>
<li>A new “ping” dictionary (derived from existing product metadata)</li>
<li>A BigQuery schema dictionary (I wrote up a <a href="https://mozilla-schema-dictionary.netlify.app/">prototype of this a couple weeks ago</a>) to map between these higher-level objects and what’s in our low-level data store</li>
<li>Documentation for derived datasets generated by BigQuery-ETL (ideally stored alongside the ETL code itself, so it’s easy to keep up to date)</li>
<li>A data tool dictionary describing how to easily <em>access</em> the above data in various ways (e.g. SQL query, dashboard plot, etc.)</li></ul>
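<p>Stitching those dictionaries together might look something like the sketch below: one catalog entry per application, linking pings to the BigQuery tables that back them. All of the names and shapes here are hypothetical.</p>

```python
# Hypothetical inputs: per-application metadata, ping lists, and a mapping
# from (application, ping) to the backing BigQuery table.
applications = {"mozregression": {"url": "https://github.com/mozilla/mozregression"}}
pings = {"mozregression": ["usage"]}
tables = {("mozregression", "usage"): "org_mozilla_mozregression_stable.usage_v1"}

# Join them into a single navigable catalog.
catalog = {
    app: {
        "metadata": meta,
        "pings": {p: tables.get((app, p)) for p in pings.get(app, [])},
    }
    for app, meta in applications.items()
}
print(catalog["mozregression"]["pings"]["usage"])
# org_mozilla_mozregression_stable.usage_v1
```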
<p>This might sound ambitious, but it’s basically just a system for collecting and visualizing various types of documentation, something we have proven we know how to do. And I think a product like this could be incredibly empowering, not only for the internal audience at Mozilla but also the <em>external</em> audience who wants to support us but has valid concerns about what we’re collecting and why: since this system is based entirely on systems which are already open (inside GitHub or Mercurial repositories), there is no reason we can’t make it available to the public.</p>
<div class="footnotes">
<ol>
<li id="2020-07-16-mozilla-telemetry-in-2020-footnote-1-definition" class="footnote-definition">
<p>Technically, <a href="https://docs.telemetry.mozilla.org/datasets/pings.html">there are various other types of pings</a> submitted by Firefox, but the main ping is the one 99% of people care about. <a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-1-return">↩</a></p></li>
<li id="2020-07-16-mozilla-telemetry-in-2020-footnote-2-definition" class="footnote-definition">
<p>This is actually a capability that the Glean SDK provides, so other products (e.g. Lockwise, Firefox for iOS) also benefit from this capability. <a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-2-return">↩</a></p></li>
<li id="2020-07-16-mozilla-telemetry-in-2020-footnote-3-definition" class="footnote-definition">
<p>The scope of this data collection comes from the fact that Fenix is a <em>very</em> large and complex application, rather than from a desire to collect everything just because we can: smaller efforts like mozregression collect a <a href="https://mozilla.github.io/mozregression/documentation/telemetry.html">much more limited set of data</a>. <a href="#2020-07-16-mozilla-telemetry-in-2020-footnote-3-return">↩</a></p></li></ol></div>
<h1>This Week in Glean: mozregression telemetry (part 2)</h1>
<p><em>William Lachance, 2020-05-08</em></p>
<p><em>(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find <a href="https://mozilla.github.io/glean/book/appendix/twig.html">an index of all TWiG posts online.</a>)</em></p>
<p><em>This is a special guest post by non-Glean-team member William Lachance!</em></p>
<p>This is a continuation of an exploration of adding Glean-based telemetry to a python application, in this case <a href="https://mozilla.github.io/mozregression">mozregression</a>, a tool for automatically finding the source of Firefox regressions (breakage).</p>
<p>When we left off <a href="/blog/2020/02/this-week-in-glean-special-guest-post-mozregression-telemetry-part-1/">last time</a>, we had written some test scripts and verified that the data was visible in the debug viewer.</p>
<h2 id="adding-telemetry-to-mozregression-itself">Adding Telemetry to mozregression itself</h2>
<p>In many ways, this is pretty similar to what I did inside the sample application: the only significant difference is that these are shipped inside a Python application that is meant to be installable via <a href="https://pypi.org/project/pip/">pip</a>. This means we need to specify the <code>pings.yaml</code> and <code>metrics.yaml</code> (located inside the <code>mozregression</code> subdirectory) as package data inside <code>setup.py</code>:</p>
<div class="brush: py">
<div class="colorful">
<pre><span></span><span class="n">setup</span><span class="p">(</span>
<span class="n">name</span><span class="o">=</span><span class="s2">"mozregression"</span><span class="p">,</span>
<span class="o">...</span>
<span class="n">package_data</span><span class="o">=</span><span class="p">{</span><span class="s2">"mozregression"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"*.yaml"</span><span class="p">]},</span>
<span class="o">...</span>
<span class="p">)</span>
</pre></div>
</div>
<p>There were also a number of Glean SDK enhancements which we determined were necessary. Most notably, Michael Droettboom added 32-bit Windows wheels to the Glean SDK, which we need to make building the <a href="https://mozilla.github.io/mozregression/quickstart.html#gui">mozregression GUI</a> on Windows possible. In addition, some minor changes needed to be made to Glean’s behaviour for it to work correctly with a command-line tool like mozregression — for example, Glean used to assume that Telemetry would always be disabled via a GUI action so that it would send a deletion ping, but this would obviously not work in an application like mozregression where there is only a configuration file — so for this case, Glean needed to be modified to check if it had been disabled <em>between</em> runs.</p>
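<p>The "disabled between runs" behaviour is worth spelling out. A command-line tool has no GUI moment at which to observe an opt-out, so it has to compare the configuration file’s current setting against the state recorded on the previous run, and trigger a deletion request if telemetry was just turned off. A toy version of that check (not Glean’s actual code) might look like:</p>

```python
import configparser

# Toy sketch of detecting an opt-out between runs (not Glean's real code).
CONFIG_TEXT = """
[telemetry]
enabled = no
"""

def telemetry_enabled(text):
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg.getboolean("telemetry", "enabled", fallback=True)

was_enabled = True  # state persisted from the previous run
now_enabled = telemetry_enabled(CONFIG_TEXT)
if was_enabled and not now_enabled:
    print("telemetry disabled since last run: send a deletion ping")
```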
<p>Many thanks to Mike (and others on the Glean team) for so patiently listening to my concerns and modifying Glean accordingly.</p>
<h2 id="getting-data-review">Getting Data Review</h2>
<p>At Mozilla, we don’t just allow random engineers like myself to start collecting data in a product that we ship (even a semi-internal like mozregression). We have <a href="https://wiki.mozilla.org/Firefox/Data_Collection">a process</a>, overseen by Data Stewards to make sure the information we gather is actually answering important questions and doesn’t unnecessarily collect personally identifiable information (e.g. email addresses).</p>
<p>You can see the specifics of how this worked out in the case of mozregression in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1581647#c9">bug 1581647</a>.</p>
<h2 id="documentation">Documentation</h2>
<p>Glean has some fantastic utilities for generating markdown-based documentation on what information is being collected, which I have made available on GitHub:</p>
<p><a href="https://github.com/mozilla/mozregression/blob/master/docs/glean/metrics.md">https://github.com/mozilla/mozregression/blob/master/docs/glean/metrics.md</a></p>
<p>The generation of this documentation is <a href="https://github.com/mozilla/mozregression/blob/3454e1eafe83f53a84cb6b10f46649320d5ed097/.travis.yml#L57">hooked up to mozregression’s continuous integration</a>, so we can be sure it’s up to date.</p>
<p>I also added <a href="https://mozilla.github.io/mozregression/documentation/telemetry.html">a quick note</a> to mozregression’s web site describing the feature, along with (very importantly) instructions on how to turn it off.</p>
<h2 id="enabling-data-ingestion">Enabling Data Ingestion</h2>
<p>Once a Glean-based project has passed data review, getting our infrastructure to ingest it is pretty straightforward. Normally <a href="https://mozilla.github.io/glean/book/user/adding-glean-to-your-project.html#adding-metadata-about-your-project-to-the-pipeline">we would suggest just filing a bug</a> and letting us (the data team) handle the details, but since I’m <em>on</em> that team, I’m going to go into a (little) bit of detail about how the sausage is made.</p>
<p>Behind the scenes, we have a collection of ETL (extract-transform-load) scripts in the <a href="https://github.com/mozilla/probe-scraper/">probe-scraper repository</a> which are responsible for parsing the ping and probe metadata files that I added to mozregression in the step above and then automatically creating BigQuery tables and updating our ingestion machinery to insert data passed to us there.</p>
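<p>The table naming convention, as I understand it (treat the details as illustrative), is mechanical: hyphens in the <code>app_id</code> become underscores in the BigQuery dataset name, and each ping type gets a versioned table in a "stable" dataset.</p>

```python
# Sketch of the app_id / ping -> BigQuery table naming convention
# (details illustrative; the real pipeline handles more cases).
def stable_table(app_id, ping_name, version=1):
    dataset = app_id.replace("-", "_") + "_stable"
    table = ping_name.replace("-", "_") + f"_v{version}"
    return f"{dataset}.{table}"

print(stable_table("org-mozilla-mozregression", "usage"))
# org_mozilla_mozregression_stable.usage_v1
```

This is why the query further down can refer to <code>org_mozilla_mozregression_stable.usage_v1</code> without anyone having created that table by hand.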
<p>There’s quite a bit of complicated machinery behind the scenes to make this all work, but since it’s already in place, adding a new thing like this is relatively simple. The changeset I submitted as part of a <a href="https://github.com/mozilla/probe-scraper/pull/184">pull request</a> to probe-scraper was all of 9 lines long:</p>
<div class="brush: diff">
<div class="colorful">
<pre><span></span><span class="gh">diff --git a/repositories.yaml b/repositories.yaml</span><span class="w"></span>
<span class="gh">index dffcccf..6212e55 100644</span><span class="w"></span>
<span class="gd">--- a/repositories.yaml</span><span class="w"></span>
<span class="gi">+++ b/repositories.yaml</span><span class="w"></span>
<span class="gu">@@ -239,3 +239,12 @@ firefox-android-release:</span><span class="w"></span>
<span class="w"> </span> - org.mozilla.components:browser-engine-gecko-beta<span class="w"></span>
<span class="w"> </span> - org.mozilla.appservices:logins<span class="w"></span>
<span class="w"> </span> - org.mozilla.components:support-migration<span class="w"></span>
<span class="gi">+mozregression:</span><span class="w"></span>
<span class="gi">+ app_id: org-mozilla-mozregression</span><span class="w"></span>
<span class="gi">+ notification_emails:</span><span class="w"></span>
<span class="gi">+ - wlachance@mozilla.com</span><span class="w"></span>
<span class="gi">+ url: 'https://github.com/mozilla/mozregression'</span><span class="w"></span>
<span class="gi">+ metrics_files:</span><span class="w"></span>
<span class="gi">+ - 'mozregression/metrics.yaml'</span><span class="w"></span>
<span class="gi">+ ping_files:</span><span class="w"></span>
<span class="gi">+ - 'mozregression/pings.yaml'</span><span class="w"></span>
</pre></div>
</div>
<h2 id="a-pretty-graph">A Pretty Graph</h2>
<p>With the probe scraper change merged and deployed, we can now start querying! A number of tables are automatically created according to the schema outlined above: notably “live” and “stable” tables corresponding to the usage ping. Using <a href="https://docs.telemetry.mozilla.org/tools/stmo.html">sql.telemetry.mozilla.org</a> we can start exploring what’s out there. Here’s a quick query I wrote up:</p>
<div class="brush: sql">
<div class="colorful">
<pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="nb">DATE</span><span class="p">(</span><span class="n">submission_timestamp</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nb">date</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">metrics</span><span class="p">.</span><span class="n">string</span><span class="p">.</span><span class="n">usage_variant</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">variant</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="o">`</span><span class="n">moz</span><span class="o">-</span><span class="n">fx</span><span class="o">-</span><span class="k">data</span><span class="o">-</span><span class="n">shared</span><span class="o">-</span><span class="n">prod</span><span class="o">`</span><span class="p">.</span><span class="n">org_mozilla_mozregression_stable</span><span class="p">.</span><span class="n">usage_v1</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"> </span><span class="nb">DATE</span><span class="p">(</span><span class="n">submission_timestamp</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'2020-04-14'</span><span class="w"></span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">client_info</span><span class="p">.</span><span class="n">app_display_version</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'%.dev%'</span><span class="w"></span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="nb">date</span><span class="p">,</span><span class="w"> </span><span class="n">variant</span><span class="p">;</span><span class="w"></span>
</pre></div>
</div>
<p>… which generates a chart like this:</p>
<div class="figure"><img src="/files/2020/05/mozregression-variant-usage.png" alt="" />
<p class="caption"></p></div>
<p>This chart represents the absolute volume of mozregression usage since April 14th 2020 (around the time when we first released a version of mozregression with Glean telemetry), grouped by mozregression “variant” (GUI, console, and mach) and date - you can see that (unsurprisingly?) the GUI has the highest usage. I’ll talk about this more in an upcoming installment, speaking of…</p>
<h2 id="next-steps">Next Steps</h2>
<p>We’re not done yet! Next time, we’ll look into making a public-facing dashboard demonstrating these results and making an aggregated version of the mozregression telemetry data publicly accessible to researchers and the general public. If we’re lucky, there might even be a bit of <em>data science</em>. Stay tuned!</p>
<h1>This week in Glean (special guest post): mozregression telemetry (part 1)</h1>
<p><em>William Lachance, 2020-02-28</em></p>
<p><em>(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find <a href="https://mozilla.github.io/glean/book/appendix/twig.html">an index of all TWiG posts online.</a>)</em></p>
<p><em>This is a special guest post by non-Glean-team member William Lachance!</em></p>
<p>As I <a href="/blog/2019/09/mozregression-update-python-3-edition/">mentioned last time</a> I talked about <a href="https://mozilla.github.io/mozregression/">mozregression</a>, I have been thinking about adding some telemetry to the system to better understand the usage of this tool, to justify some part of Mozilla spending some cycles maintaining and improving it (assuming my intuition that this tool is heavily used is confirmed).</p>
<p>Coincidentally, the Telemetry client team has been working on a new library for measuring these types of things in a principled way, called <a href="https://mozilla.github.io/glean/book/index.html">Glean</a>, which even has Python bindings! Using it has the potential to save a lot of work: not only does Glean provide a framework for submitting data, our backend systems are automatically set up to process data submitted via Glean into <a href="https://cloud.google.com/bigquery">BigQuery</a> tables, which can then easily be queried using tools like <a href="https://docs.telemetry.mozilla.org/tools/stmo.html">sql.telemetry.mozilla.org</a>.</p>
<p>I thought it might be useful to go through some of what I’ve been exploring, in case others at Mozilla are interested in instrumenting their pet internal tools or projects. If this effort is successful, I’ll distill these notes into a tutorial in the Glean documentation.</p>
<h2 id="initial-steps-defining-pings-and-metrics">Initial steps: defining pings and metrics</h2>
<p>The initial step in setting up a Glean project of any type is to define explicitly the types of pings and metrics. You can look at a “ping” as being a small bucket of data submitted by a piece of software in the field. A “metric” is something we’re measuring and including in a ping.</p>
<p>Most of the Glean documentation focuses on browser-based use-cases where we might want to sample lots of different things on an ongoing basis, but for mozregression our needs are considerably simpler: we just want to know when someone <em>has</em> used it along with a small number of non-personally identifiable characteristics of their usage, e.g. the mozregression version number and the name of the application they are bisecting.</p>
<p>Glean has <a href="https://mozilla.github.io/glean/book/user/pings/events.html">the concept of event pings</a>, but it seems like those are there more for a fine-grained view of what’s going on during an application’s use. So let’s define a new ping just for ourselves, giving it the unimaginative name “usage”. This goes in a file called <code>pings.yaml</code>:</p>
<div class="brush: yaml">
<div class="colorful">
<pre><span></span><span class="nn">---</span><span class="w"></span>
<span class="nt">$schema</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">moz://mozilla.org/schemas/glean/pings/1-0-0</span><span class="w"></span>
<span class="nt">usage</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">></span><span class="w"></span>
<span class="w"> </span><span class="no">A ping to record usage of mozregression</span><span class="w"></span>
<span class="w"> </span><span class="nt">include_client_id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"></span>
<span class="w"> </span><span class="nt">notification_emails</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">wlachance@mozilla.com</span><span class="w"></span>
<span class="w"> </span><span class="nt">bugs</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://bugzilla.mozilla.org/123456789/</span><span class="w"></span>
<span class="w"> </span><span class="nt">data_reviews</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://example.com/path/to/data-review</span><span class="w"></span>
</pre></div>
</div>
<p>We also need to define a list of things we want to measure. To start with, let’s just test with one piece of sample information: the app we’re bisecting (e.g. “Firefox” or “Gecko View Example”). This goes in a file called <code>metrics.yaml</code>:</p>
<div class="brush: yaml">
<div class="colorful">
<pre><span></span><span class="nn">---</span><span class="w"></span>
<span class="nt">$schema</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">moz://mozilla.org/schemas/glean/metrics/1-0-0</span><span class="w"></span>
<span class="nt">usage</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">app</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">string</span><span class="w"></span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">></span><span class="w"></span>
<span class="w"> </span><span class="no">The name of the app being bisected</span><span class="w"></span>
<span class="w"> </span><span class="nt">notification_emails</span><span class="p">:</span><span class="w"> </span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">wlachance@mozilla.com</span><span class="w"></span>
<span class="w"> </span><span class="nt">bugs</span><span class="p">:</span><span class="w"> </span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">https://bugzilla.mozilla.org/show_bug.cgi?id=1581647</span><span class="w"></span>
<span class="w"> </span><span class="nt">data_reviews</span><span class="p">:</span><span class="w"> </span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://example.com/path/to/data-review</span><span class="w"></span>
<span class="w"> </span><span class="nt">expires</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">never</span><span class="w"></span>
<span class="w"> </span><span class="nt">send_in_pings</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">usage</span><span class="w"></span>
</pre></div>
</div>
<p>The <code>data_reviews</code> sections in both of the above are obviously bogus; we will need to actually get a data review before landing and using this code, to make sure that we’re in conformance with Mozilla’s <a href="https://wiki.mozilla.org/Firefox/Data_Collection">data collection policies</a>.</p>
<h2 id="testing-it-out">Testing it out</h2>
<p>But in the meantime, we can test our setup with the <a href="https://docs.telemetry.mozilla.org/concepts/glean/debug_ping_view.html">Glean debug ping viewer</a> by setting a special tag (<code>mozregression-test-tag</code>) on our output. Here’s a small Python script which does just that:</p>
<div class="brush: py">
<div class="colorful">
<pre><span></span><span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">from</span> <span class="nn">glean</span> <span class="kn">import</span> <span class="n">Glean</span><span class="p">,</span> <span class="n">Configuration</span>
<span class="kn">from</span> <span class="nn">glean</span> <span class="kn">import</span> <span class="n">load_metrics</span><span class="p">,</span> <span class="n">load_pings</span>
<span class="n">mozregression_path</span> <span class="o">=</span> <span class="n">Path</span><span class="o">.</span><span class="n">home</span><span class="p">()</span> <span class="o">/</span> <span class="s1">'.mozilla2'</span> <span class="o">/</span> <span class="s1">'mozregression'</span>
<span class="n">Glean</span><span class="o">.</span><span class="n">initialize</span><span class="p">(</span>
    <span class="n">application_id</span><span class="o">=</span><span class="s2">"mozregression"</span><span class="p">,</span>
    <span class="n">application_version</span><span class="o">=</span><span class="s2">"0.1.1"</span><span class="p">,</span>
    <span class="n">upload_enabled</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
    <span class="n">configuration</span><span class="o">=</span><span class="n">Configuration</span><span class="p">(</span>
        <span class="n">ping_tag</span><span class="o">=</span><span class="s2">"mozregression-test-tag"</span>
    <span class="p">),</span>
    <span class="n">data_dir</span><span class="o">=</span><span class="n">mozregression_path</span> <span class="o">/</span> <span class="s2">"data"</span>
<span class="p">)</span>
<span class="n">Glean</span><span class="o">.</span><span class="n">set_upload_enabled</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span>
<span class="n">pings</span> <span class="o">=</span> <span class="n">load_pings</span><span class="p">(</span><span class="s2">"pings.yaml"</span><span class="p">)</span>
<span class="n">metrics</span> <span class="o">=</span> <span class="n">load_metrics</span><span class="p">(</span><span class="s2">"metrics.yaml"</span><span class="p">)</span>
<span class="n">metrics</span><span class="o">.</span><span class="n">usage</span><span class="o">.</span><span class="n">app</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s2">"reality"</span><span class="p">)</span>
<span class="n">pings</span><span class="o">.</span><span class="n">usage</span><span class="o">.</span><span class="n">submit</span><span class="p">()</span>
</pre></div>
</div>
<p>Running this script on my laptop, I see that a respectable JSON payload was delivered to and processed by our servers:</p>
<p><img style="width:600px" src="/files/2020/02/glean-debug-ping-viewer.png" /></p>
<p>As you can see, we’re successfully processing the “version” number of mozregression, some characteristics of the machine sending the information (my MacBook in this case), and our single measure. We also have a client id, which should tell us roughly how many distinct installations of mozregression are sending pings. This should be more than sufficient for an initial “mozregression usage dashboard”.</p>
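<p>As an aside, here is a small illustrative sketch (hypothetical code, not part of mozregression or the real telemetry pipeline) of how distinct installations could be counted from a batch of decoded ping payloads, assuming the client id sits in each ping’s <code>client_info</code> section:</p>

```python
# Hypothetical example: count distinct installations from decoded
# "usage" ping payloads, where each payload is a dict whose
# client_info section carries the client id.
def count_distinct_installs(pings):
    return len({p["client_info"]["client_id"] for p in pings})

pings = [
    {"client_info": {"client_id": "aaa"}},  # same install, two pings
    {"client_info": {"client_id": "aaa"}},
    {"client_info": {"client_id": "bbb"}},
]
print(count_distinct_installs(pings))  # 2
```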
<h2 id="next-steps">Next steps</h2>
<p>There are a bunch of things I still need to work through before landing this inside mozregression itself. Notably, the Glean Python bindings are Python 3-only, so we’ll need to <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1426766">port the mozregression GUI to python 3</a> before we can start measuring usage there. But I’m excited at how quickly this work is coming together: stay tuned for part 2 in a few weeks.</p>
<h1>Conda is pretty great</h1>
<p><em>2020-01-13, William Lachance</em></p>
<p>Lately the data engineering team has been looking into productionizing (i.e. running in Airflow) a bunch of models that the data science team has been producing. This often involves languages and environments that are a bit outside of our comfort zone — for example, <a href="https://github.com/mozilla/missioncontrol-v2">the next version of Mission Control</a> relies on the <a href="https://mc-stan.org/users/interfaces/rstan">R-stan library</a> to produce a model of expected crash behaviour as Firefox is released.</p>
<p>To make things as simple and deterministic as possible, we’ve been building up Docker containers to run this code along with its dependencies, which makes things nice and reproducible. My initial thought was to use just the language-native toolchains to build up my container for the above project, but I quickly found a number of problems:</p>
<ol>
<li>For local testing, Docker on Mac is <em>slow</em>: when doing a large number of statistical calculations (as above), you can count on your testing iterations taking 3 to 4 (or more) times longer.</li>
<li>On initial setup, the default R packaging strategy is to have the user of a package like R-stan recompile from source. This can take <em>forever</em> if you have a long list of dependencies with C-compiled extensions (pretty much a given if you’re working in the data space): rebuilding my initial Docker environment for missioncontrol-v2 took almost an hour. This isn’t just a problem for local development: it also makes continuous integration using a service like Circle or Travis expensive and painful.</li></ol>
<p>I had been vaguely aware of <a href="https://docs.conda.io/en/latest/">Conda</a> for a few years, but didn’t really understand its value proposition until I started working on the above project: why bother with a heavyweight package manager when you already have Docker to virtualize things? The answer is that it solves both of the above problems: for local development, you can get something more-or-less identical to what you’re running inside Docker with no performance penalty whatsoever. And for building the Docker container itself, Conda’s package repository contains pre-compiled versions of all the dependencies you’d want to use for something like this (even somewhat esoteric libraries like R-stan are available on <a href="https://conda-forge.org/">conda-forge</a>), which brought my build cycle times down to less than 5 minutes.</p>
<p>tl;dr: If you have a bunch of R / Python code you want to run in a reproducible manner, consider Conda.</p>
<h1>Using BigQuery JavaScript UDFs to analyze Firefox telemetry for fun &amp; profit</h1>
<p><em>2019-10-30, William Lachance</em></p>
<p>For the last year, we’ve been gradually migrating our backend Telemetry systems from AWS to GCP. I’ve been helping out here and there with this effort, most recently porting a job we used to detect slow tab spinners in Firefox nightly, which produced a small dataset that feeds a <a href="https://mikeconley.github.io/bug1310250/">small adhoc dashboard</a> which Mike Conley maintains. This was a relatively small task as things go, but it highlighted some features and improvements which I think might be broadly interesting, so I decided to write up a small blog post about it.</p>
<p>Essentially all this dashboard tells you is what percentage of the Firefox nightly population saw a tab spinner over the past 6 months. And of those that did see a tab spinner, what was the severity? In short, we’re just trying to make sure that there are no major regressions in user experience (and also that efforts to improve things bore fruit):</p>
<center><img style="width:600px" src="/files/2019/10/tab-spinner-dash.png" /></center>
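<p>Conceptually, each point the dashboard plots is just a per-build ratio of clients in a severity bucket to total clients. A small illustrative sketch (names and numbers are made up; this is not the actual dashboard code):</p>

```python
# Hypothetical example: turn per-build counts of clients, bucketed by
# the worst tab spinner they saw, into the percentages plotted above.
def severity_percentages(bucket_counts, total_clients):
    return {label: 100.0 * n / total_clients
            for label, n in bucket_counts.items()}

pcts = severity_percentages(
    {"under 1s": 40, "1s to 5s": 8, "over 5s": 2}, total_clients=1000
)
print(pcts["under 1s"])  # 4.0
```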
<p>Pretty simple stuff, but getting the data necessary to produce this kind of dashboard used to be anything but trivial: while some common business/product questions could be answered by a quick query to <a href="https://docs.telemetry.mozilla.org/datasets/batch_view/clients_daily/reference.html">clients_daily</a>, getting engineering-specific metrics like this usually involved trawling through gigabytes of raw heka encoded blobs using an Apache Spark cluster and then extracting the relevant information out of the telemetry probe histograms (in this case, <code>FX_TAB_SWITCH_SPINNER_VISIBLE_MS</code> and <code>FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS</code>) contained therein.</p>
<p>The code itself was rather complicated (<a href="https://github.com/mozilla/python_mozetl/blob/58dce245ce8012b338e8b102a8c2c0f00601be60/mozetl/tab_spinner/tab_spinner.py">take a look, if you dare</a>) but even worse, running it could get <em>very expensive</em>. We had a 14 node cluster churning through this script daily, and it took on average about an hour and a half to run! I don’t have the exact cost figures on hand (and am not sure if I’d be authorized to share them if I did), but based on a back of the envelope sketch, this one single script was probably costing us somewhere on the order of $10-$40 a day (that works out to between $3650-$14600 a year).</p>
<p>With our move to <a href="https://cloud.google.com/bigquery/">BigQuery</a>, things get a lot simpler! Thanks to the combined effort of my team and data operations[1], we now produce “stable” ping tables on a daily basis with <em>all</em> the relevant histogram data (stored as JSON blobs), queryable using relatively vanilla SQL. In this case, the data we care about is in <code>telemetry.main</code> (named after the main ping, appropriately enough). With the help of a small <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions">JavaScript UDF</a>, all of this data can easily be extracted into a table inside a single SQL query scheduled by <a href="https://docs.telemetry.mozilla.org/tools/stmo.html">sql.telemetry.mozilla.org</a>.</p>
<div class="brush: sql">
<div class="colorful">
<pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="n">TEMP</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"></span>
<span class="w"> </span><span class="n">udf_js_json_extract_highest_long_spinner</span><span class="w"> </span><span class="p">(</span><span class="k">input</span><span class="w"> </span><span class="n">STRING</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">INT64</span><span class="w"></span>
<span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">js</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="ss">"""</span>
<span class="ss"> if (input == null) {</span>
<span class="ss"> return 0;</span>
<span class="ss"> }</span>
<span class="ss"> var result = JSON.parse(input);</span>
<span class="ss"> var valuesMap = result.values;</span>
<span class="ss"> var highest = 0;</span>
<span class="ss"> for (var key in valuesMap) {</span>
<span class="ss"> var range = parseInt(key);</span>
<span class="ss"> if (valuesMap[key]) {</span>
<span class="ss"> highest = range > 0 ? range : 1;</span>
<span class="ss"> }</span>
<span class="ss"> }</span>
<span class="ss"> return highest;</span>
<span class="ss">"""</span><span class="p">;</span><span class="w"></span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">build_id</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">64000</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_64000ms_or_higher</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">27856</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">64000</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_27856ms_to_63999ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">12124</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">27856</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_12124ms_to_27855ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">5277</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">12124</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_5277ms_to_12123ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">2297</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">5277</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_2297ms_to_5276ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">1000</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">2297</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_1000ms_to_2296ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_0ms_to_49ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">50</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_50ms_to_99ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_100ms_to_199ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">400</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_200ms_to_399ms</span><span class="p">,</span><span class="w"></span>
<span class="k">sum</span><span class="w"> </span><span class="p">(</span><span class="k">case</span><span class="w"> </span><span class="k">when</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">400</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">highest</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">800</span><span class="w"> </span><span class="k">then</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">end</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">v_400ms_to_799ms</span><span class="p">,</span><span class="w"></span>
<span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="k">count</span><span class="w"></span>
<span class="k">from</span><span class="w"></span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="n">build_id</span><span class="p">,</span><span class="w"> </span><span class="n">client_id</span><span class="p">,</span><span class="w"> </span><span class="k">max</span><span class="p">(</span><span class="n">greatest</span><span class="p">(</span><span class="n">highest_long</span><span class="p">,</span><span class="w"> </span><span class="n">highest_short</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">highest</span><span class="w"></span>
<span class="k">from</span><span class="w"></span>
<span class="p">(</span><span class="k">SELECT</span><span class="w"></span>
<span class="w"> </span><span class="n">SUBSTR</span><span class="p">(</span><span class="n">application</span><span class="p">.</span><span class="n">build_id</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">build_id</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">client_id</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">udf_js_json_extract_highest_long_spinner</span><span class="p">(</span><span class="n">payload</span><span class="p">.</span><span class="n">histograms</span><span class="p">.</span><span class="n">FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">highest_long</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">udf_js_json_extract_highest_long_spinner</span><span class="p">(</span><span class="n">payload</span><span class="p">.</span><span class="n">histograms</span><span class="p">.</span><span class="n">FX_TAB_SWITCH_SPINNER_VISIBLE_MS</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">highest_short</span><span class="w"></span>
<span class="k">FROM</span><span class="w"> </span><span class="n">telemetry</span><span class="p">.</span><span class="n">main</span><span class="w"></span>
<span class="k">WHERE</span><span class="w"></span>
<span class="w"> </span><span class="n">application</span><span class="p">.</span><span class="n">channel</span><span class="o">=</span><span class="s1">'nightly'</span><span class="w"></span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">normalized_os</span><span class="o">=</span><span class="s1">'Windows'</span><span class="w"></span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="n">application</span><span class="p">.</span><span class="n">build_id</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">FORMAT_DATE</span><span class="p">(</span><span class="ss">"%Y%m%d"</span><span class="p">,</span><span class="w"> </span><span class="n">DATE_SUB</span><span class="p">(</span><span class="k">CURRENT_DATE</span><span class="p">(),</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">QUARTER</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="nb">DATE</span><span class="p">(</span><span class="n">submission_timestamp</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">DATE_SUB</span><span class="p">(</span><span class="k">CURRENT_DATE</span><span class="p">(),</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="n">QUARTER</span><span class="p">))</span><span class="w"></span>
<span class="k">group</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">build_id</span><span class="p">,</span><span class="w"> </span><span class="n">client_id</span><span class="p">)</span><span class="w"> </span><span class="k">group</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">build_id</span><span class="p">;</span><span class="w"></span>
</pre></div>
</div>
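<p>For readers who prefer Python, the logic of the JavaScript UDF above can be sketched as follows (an illustrative translation, not code used in production):</p>

```python
import json

def highest_bucket(histogram_json):
    # Return the highest histogram bucket (in ms) with at least one
    # sample, or 0 if the probe is absent. A non-empty 0ms bucket is
    # reported as 1, so "saw a spinner" stays distinguishable from
    # "no spinner at all", mirroring the UDF above.
    if histogram_json is None:
        return 0
    highest = 0
    for key, count in json.loads(histogram_json).get("values", {}).items():
        bucket = int(key)
        if count:
            highest = max(highest, bucket if bucket > 0 else 1)
    return highest

print(highest_bucket('{"values": {"0": 2, "2297": 1, "64000": 0}}'))  # 2297
print(highest_bucket(None))  # 0
```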
<p>In addition to being much simpler, this new job is also <em>way</em> cheaper. The last run of it scanned just over 1 TB of data, meaning it cost us just over $5. Not as cheap as I might like, but considerably less expensive than before. I’ve also scheduled it to run only once every other day, since Mike tells me he doesn’t need this data any more often than that.</p>
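<p>As a back-of-the-envelope sanity check on the savings, using only the rough per-run figures quoted in this post:</p>

```python
# Rough yearly cost comparison, using figures quoted in the post.
# Old Spark job: an estimated $10-$40 per run, running daily.
old_low, old_high = 10 * 365, 40 * 365
# New BigQuery job: about $5 per run, running every other day.
new_yearly = 5 * 365 / 2
print(old_low, old_high)  # 3650 14600
print(new_yearly)         # 912.5
```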
<p>[1] I understand that Jeff Klukas, Frank Bertsch, Daniel Thorn, Anthony Miyaguchi, and Wesley Dawson are the principals involved - apologies if I’m forgetting someone.</p>