What could be wrong?

State of web compatibility test automation at Mozilla

2015-04-16 / Tools, Mozilla

..as of April 16th, 2015

When testing the compatibility of web sites and browsers, there’s lots of potential for automation. We can automate some of the work related to:

  • Discovery of problems
  • Triage of bug reports dealing with problems
  • Regression testing to check if former issues have re-surfaced when a website is updated

We can, for example, automatically identify certain problems that will cause a site to be classified as failed/incompatible. These include at least:

  • -webkit- prefixed flexbox and background gradient CSS without unprefixed equivalents, in any selector that applies on the pages we visit
  • Redirects to different sites based on User-Agent (see the sketch after this list)
  • JavaScript errors thrown in a browser we care about that do not occur with a different engine
  • For mobile content: dependency on Flash or other plugins without equivalents (e.g. an OBJECT tag with no corresponding VIDEO tag)
  • If a page has a video, it must play without errors
  • WAP content served with a WAP-specific MIME type
  • Invalid XHTML causing parse errors
  • Custom-written tests with a plug-in architecture to check for e.g. buggy versions of specific scripts
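
As a taste of what this automation can look like, here is a minimal sketch of the User-Agent redirect check, written for SlimerJS/PhantomJS. The UA strings and the target URL are placeholders, not the exact values our tools use:

  // Load the same URL with two different UA strings and compare where
  // we end up. Runs under both SlimerJS and PhantomJS.
  var webpage = require('webpage');

  function finalUrl(url, userAgent, callback) {
    var page = webpage.create();
    page.settings.userAgent = userAgent;
    page.open(url, function (status) {
      var result = status === 'success' ? page.url : null;
      page.close();
      callback(result);
    });
  }

  var target = 'http://example.com/';
  var geckoUA = 'Mozilla/5.0 (Android; Mobile; rv:40.0) Gecko/40.0 Firefox/40.0';
  var webkitUA = 'Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Mobile Safari/537.36';

  finalUrl(target, geckoUA, function (geckoUrl) {
    finalUrl(target, webkitUA, function (webkitUrl) {
      if (geckoUrl !== webkitUrl) {
        console.log('UA-dependent redirect: ' + geckoUrl + ' vs ' + webkitUrl);
      }
      phantom.exit();
    });
  });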

The same approach can be used to compare the behaviour of new versions and current releases, to have greater assurance that the update will not break the web and pinpoint risky regressions. Basically, we can use the Web (or at least a given subset of it) as our browser engine test suite.

Current implementations (AKA prototypes)

Site compat tester extension

Repository: https://github.com/hallvors/sitecomptester-extension/blob/master/package.json

  • Ran as a Firefox extension
  • No longer developed
  • JSON-based description of tests (but with function expressions)
  • Regression testing only
  • Crude WAP detection

slimertester.js

Repository: https://github.com/hallvors/sitecomptester-extension/blob/master/slimertester.js

  • Runs tests with SlimerJS
  • Supports several test types for regression testing
    • WAP
    • Mixed content
    • Custom tests, bug-specific
  • Reads the webcomptest JSON format (a hypothetical example follows this list)
  • Can track console errors
  • Batch mode for regression tests (not as good for exploratory tests)
  • Can run under both SlimerJS and PhantomJS (but not fully tested across both yet)
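
The webcomptest JSON format itself isn’t documented in this post. To give an idea, a regression test entry might look roughly like this - every property name here is a guess, not the actual schema:

  {
    "url": "http://m.example.com/",
    "ua": "Mozilla/5.0 (Android; Mobile; rv:40.0) Gecko/40.0 Firefox/40.0",
    "bug": "https://bugzilla.mozilla.org/show_bug.cgi?id=000000",
    "steps": [
      { "action": "open" },
      { "action": "checkSelector", "selector": "video", "expect": true },
      { "action": "checkNoConsoleErrors" }
    ]
  }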

testsites.py

Repository: https://github.com/hallvors/sitecomptester-extension/blob/master/datagenerator/testsites.py

  • URL player based on Mozilla Marionette
  • Can spoof, load URLs, click, submit forms…
  • Generates, compares, splices screenshots that can be reviewed on arewecompatibleyet.com (example review).
  • Records differences between code sent to different UAs, generates webcomptest JSON format.
  • Generated tests are reviewed and used with slimertester.js for regression testing.
  • Limited support for interactivity with sites: has code for automated login to services.

marionette_remote_control.py

Repository: https://github.com/hallvors/sitecomptester-extension/blob/master/datagenerator/marionette-remote-control.py

  • Based on Mozilla Marionette, can control browser on device (e.g. Flame phone).
  • Sets up web server that accepts commands.
  • Used to sync browsing actions between laptop and device: the device loads the same URL, and clicks and scrolling are reproduced automatically. This depends on some JS injected into the page to monitor your actions and send commands to the Python script (sketched after this list), plus a proxy to forward the commands if the script doesn’t have cross-origin privileges.
  • Can check for problems, e.g. be told to check if a given element exists
  • Struggles with frames/iframes.
  • No recording of the results (yet)
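
The injected monitoring script might look something like this minimal sketch - the command server address and the message format are assumptions, not the actual protocol the Python script speaks:

  // Report clicks to a local command server so they can be replayed on
  // the device. Attach in the capture phase to see all clicks.
  // (Cross-origin restrictions may require the proxy mentioned above.)
  document.addEventListener('click', function (event) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', 'http://localhost:8000/command');
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify({
      action: 'click',
      x: event.pageX,
      y: event.pageY,
      url: location.href
    }));
  }, true);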

dualdriver.py

Repository: https://github.com/hallvors/marionette_utils/blob/master/dualdriver.py

  • Based on Mozilla Marionette, helps with bug triage
  • Accepts a bug search URL as input. Goes through each bug and launches its URLs automatically on the device.
  • Interacts with bug trackers - can generate comments and add screenshots.
  • Finds contact points.
  • Does HTTP header checking.

Compatipede 1

Repository: http://github.com/seiflotfy/compatipede/

  • Headless browser based on GTK WebKit. Runs only on *nix.
  • Batch operation over many URLs.
  • Plugin architecture makes it easy to add new “tests”.
  • Logic for finding CSS issues.
  • Resource scan feature tests the CSS and JS a page includes against a given regexp.
  • Somewhat “trigger-happy” in classifying sites as compat failures.
  • Logs results to MongoDB

Compatipede 2

Repository: none yet

Compatipede 2 is under development. Based on SlimerJS and PhantomJS, it simplifies comparisons across browser engines - not just spoofing as another browser but actually rendering pages with that engine.
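
A minimal sketch of what an engine-agnostic check can look like - the same script, unchanged, launched as slimerjs check.js <url> or phantomjs check.js <url>, with the output diffed afterwards (the logged properties are just examples):

  // check.js - collect a few facts about a page; run under both engines
  // and compare the results.
  var system = require('system');
  var page = require('webpage').create();

  page.onError = function (msg) {
    console.log('JS error: ' + msg);
  };

  page.open(system.args[1], function (status) {
    var info = page.evaluate(function () {
      return {
        title: document.title,
        scripts: document.getElementsByTagName('script').length,
        hasVideo: document.getElementsByTagName('video').length > 0
      };
    });
    console.log(JSON.stringify(info));
    phantom.exit();
  });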

css-fixme.htm

Repository: https://github.com/hallvors/css-fixme

A script for identifying CSS issues and suggesting fixes. The CSS logic here is probably more refined than in Compatipede 1 (and written in JS, whereas Compatipede 1’s logic is in Python). The two should be compared and reviewed - this script has received some feedback and bug fixes from Daniel Holbert.
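
The general idea - though not the actual css-fixme logic - is roughly this: for a -webkit- prefixed declaration, suggest the unprefixed equivalent where one exists:

  // Not css-fixme's real code, just an illustration of the idea.
  function suggestFix(property, value) {
    if (property.indexOf('-webkit-') === 0) {
      // Most prefixed properties have a direct unprefixed equivalent
      return property.replace('-webkit-', '') + ': ' + value + ';';
    }
    if (value.indexOf('-webkit-gradient(') !== -1) {
      // Old-style gradients need rewriting, not just unprefixing
      return '/* rewrite as linear-gradient() / radial-gradient() */';
    }
    return null;
  }

  suggestFix('-webkit-transform', 'rotate(45deg)');
  // -> "transform: rotate(45deg);"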

Future

The goal is to develop a service that includes many of the best features from those prototypes.

Primary features

  • Run minimum two distinct browser engines, default to Gecko and WebKit (prototyped in Compatipede 2).
  • Define what binaries to use for both engines, enabling comparisons of Gecko.current and Gecko.next (unknown).
  • Set UA string (affecting both navigator.userAgent and HTTP) (testsites.py, Compatipede 1, Compatipede 2, slimertester.js)
    • Set UA string separately per engine
  • Run explorative tests from a list of URLs including
    • comparisons of HTTP headers / redirects (from Compatipede 1)
    • analysis of applied CSS (from Compatipede 1, css-fixme.htm)
    • logging and analysis of JS errors (rudimentary support in slimertester.js, no comparisons)
      • Ideally both those logged to the console and those caught by the page in try..catch.
  • Run regression tests described in the JSON(-like) format used by testsites.py and slimertester.js.
  • Take screenshots (testsites.py, Compatipede 2)
  • Enable easily adding new tests or statistics through a “plugin” architecture (Compatipede 1; a sketch follows this list)
  • Resource scan (Compatipede 1)
  • Logging results to database (Compatipede 1)
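
One way such a plugin architecture could look - a registry of small test functions that each receive the loaded page and report a finding. All names here are assumptions, not Compatipede’s actual API:

  // Hypothetical plugin registry for per-page checks.
  var plugins = [];

  function registerPlugin(name, test) {
    plugins.push({ name: name, test: test });
  }

  // Example plugin: flag pages with OBJECT but no VIDEO fallback.
  registerPlugin('object-without-video', function (page) {
    return page.evaluate(function () {
      return document.getElementsByTagName('object').length > 0 &&
             document.getElementsByTagName('video').length === 0;
    });
  });

  function runPlugins(page) {
    plugins.forEach(function (plugin) {
      console.log(plugin.name + ': ' + plugin.test(page));
    });
  }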

Secondary features

  • Log existence of OBJECT, EMBED, AUDIO and VIDEO tags (none)
  • Discover WAP and XHTML MIME types and flag sites that send these to one UA but not another
  • Log in to sites automatically (testsites.py)
  • Screenshot comparison, flagging those with greater differences (testsites.py)
  • Write JSON files that can be used for regression testing. (testsites.py)
  • Look for contact points on web sites, e.g. direct links to “contact us” forms (dualdriver.py; a sketch follows this list)
  • Bug search mode - give it a link to a bug tracker search, and it will scan all URLs in those bugs (dualdriver.py)
  • Tagging bugs automatically - for example to set “serversniff” and “contactready” in whiteboard when HTTP redirects differ (None)
  • Suggesting bug comments - for human review/cut’n’paste? (dualdriver.py - to some extent)
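
Contact point discovery can be as simple as this in-page sketch - the keyword list is made up, and real heuristics would need to be smarter:

  // Scan the page's links for likely "contact us" destinations.
  function findContactLinks() {
    var keywords = ['contact', 'feedback', 'support', 'help'];
    return Array.prototype.filter.call(document.links, function (link) {
      var text = (link.textContent + ' ' + link.href).toLowerCase();
      return keywords.some(function (word) {
        return text.indexOf(word) !== -1;
      });
    }).map(function (link) { return link.href; });
  }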

Given that most of these features already exist in various scripts that are useful prototypes for the final “Mozilla Compatipede” (or whatever we end up calling the project), it doesn’t seem overly ambitious to pull them together, refine them and create a really useful tool. However, there’s one more piece of the puzzle to consider - and it’s one we haven’t gotten right so far. Let’s call it..

Data usability

Some of our past efforts (with Compatipede 1 as perhaps the best example) failed because it’s easy to generate a lot of data, but hard to present the relevant parts of it in a useful way and a relevant context. Compatipede 1 can scan thousands of sites and generate megabytes of statistics. Our next-gen service will be even better at generating data. To make this useful, we need to pay considerable attention to the data presentation and data usability problems.

We should develop a service that, based on a couple of inputs like a host name and an optional User-Agent / browser name, returns known information (test results/statistics, links to screenshots). More importantly, we should develop an extension that modifies bug tracker issue pages, vets the information carefully, and presents the most relevant parts of it (differences between engines, screenshots, contact URLs) right there in the bug. We use the bug trackers all the time - having carefully selected, relevant information presented right there, ready for cut-and-paste into comments and analysis, will make the data on sites with known bugs most useful. (I have written an experimental extension earlier; something more powerful and polished than that would work.)

Secondly, we need a tool similar to (and likely based on) the screenshot review UI, but including all the information that indicates there is a problem on a web site we don’t yet have bugs for. Reviewers will mark each difference as “not a problem”, mark it as related to an existing bug, or report a new issue.

(I used the surprisingly nice StackEdit markdown editor to draft this article. It deserves a link.)

Spidering for compatibility problems: evaluating our next-gen tools

2015-04-02 / Tools, Mozilla

OK, this might seem like a simple assignment: write a script or tool to find and analyze web site problems by comparing the code sent to different User-Agents.

Now, if your first thought is using wget/curl and diff, you’re not wrong, but you’d be wasting your time on a tool that’s just too simple to find many of the issues we’re dealing with. The analysis simply needs to run on top of a full browser engine doing a normal page load - running JavaScript and applying CSS. Nothing less is good enough.

I’ve made several attempts during the last couple of years - initially writing a desktop Firefox add-on using the add-on SDK, then moving on to different experiments based on Mozilla Marionette or SlimerJS. At the moment, the regression tests that generate results for AreWeCompatibleYet.com run with SlimerJS using this script, while I use Marionette and this script for exploratory testing and for generating screenshots.

One and a half years ago, at the Mozilla summit in Brussels, I met a Mozilla volunteer, Seif Lotfy, who was interested in helping us write tools for web compatibility. He wrote the Compatipede tool, which does a good job at exploratory testing. It has a plugin-based architecture where you can easily add new things to test for. However, it also has a couple of drawbacks: under the hood it uses the GTK WebKit API, meaning the browser engine it runs is a WebKit one. It’s also limited to running on Linux and will (obviously) slow down the machine you run it on somewhat while working.

Developers at async.london (some people appreciate ICANN’s new TLDs ;)) have now rewritten this tool into one that runs in the cloud. Compatipede 2 lets you run tests in either SlimerJS (Gecko) or PhantomJS (WebKit) - which eventually enables really interesting things like live comparison of a site’s behaviour across two engines.

I’ve been playing around with testing the new stuff. My current client script is, truth be told, not a shining example of the power we really have at our disposal here - it limits itself to opening one page at a time, while the system actually scales to opening a large number of tabs for parallel testing.

Last week, I had an interesting use case for the new tool: Firefox on Android and Firefox OS have known problems with the Brightcove video player script - or to be more specific, the older version of this script. The new version runs pretty well. So the interesting question is: how many sites are still using the old version?

So I set up the Compatipede client to start spidering from Brightcove’s public customers list and run this JavaScript on each site:
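
(What follows is a rough sketch of such a check rather than the exact snippet I ran; the script URL patterns used to tell the old player from the new one are my assumptions:)

  // Inside the client script: classify the Brightcove player on a page
  // as 'old', 'new' or 'none' based on which script files it loads.
  var playerStatus = page.evaluate(function () {
    var result = 'none';
    var scripts = document.getElementsByTagName('script');
    Array.prototype.forEach.call(scripts, function (script) {
      if (script.src.indexOf('BrightcoveExperiences') !== -1) {
        result = 'old';
      } else if (script.src.indexOf('players.brightcove.net') !== -1) {
        result = 'new';
      }
    });
    return result;
  });
  console.log(page.url + ': ' + playerStatus);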

And the tool ran the Brightcove version check along with some other statistics - here’s a screenshot of some of the data:

Brightcove results - a list of sites, some labelled as running the old scripts

Several of them got the “none” label (interestingly, none got “new” - perhaps my detection of the new version was buggy?). In a second iteration I added some code to detect and click links named “video” - this caught another set of sites that use the old version but don’t run video on their main page.

Compatipede 2 ran pretty well for this experiment. We’d like to see it open-sourced with a suitable license, so that we can invite others to contribute improvements and features, and so that we can run it on various platforms and use binaries of upcoming Firefox versions. We might base more of our testing (such as the regression testing for the AWCY site) on it. If we can use such a tool to its full potential, it will really help us push the web forward.

ClojureScript and RegExp.source

2015-04-02 / Site compatibility, Specs

The basic social contract on the web is that browsers should strive to implement standards, sites should write their code according to standards, and the authors of the standards should take great care to write implementable specs.

Sometimes, more often than you might expect, a small detail in a new spec causes a compatibility problem. When this happens, either the spec or the affected websites must change. (If neither does, the problem is simply dumped on browser vendors who must decide whether they want to follow the spec or make websites work).

Yes, sometimes we do ask the web to change itself. It happens. Perhaps it even happens more and more often. (A related example: browsers are hardening their security and start disallowing more weak encryption methods - which means a significant number of sites using these old methods need to change.)

This raises several interesting questions: how do we even find the problem in the first place? How do we find as many affected sites as possible? How do we figure out the right people to talk to when we want a site fixed?

Mozilla’s bug 1138325 is a good example of this cycle. A minor change deep in the ECMAScript engine, made while updating it to the new Edition 6 version of the spec, caused a problem described as “Turning RegExp#source from an instance property into an accessor breaks ClojureScript apps”. This was fixed in the ClojureScript library rather quickly. However, the library is used on several sites - and if they don’t update, the issue will remain a problem. This might make it needlessly painful for all browsers trying to update to ECMAScript edition 6.

The ClojureScript project had a handy reference list of users. Using a SlimerJS-based URL player I’ve been developing, injecting a little bit of code looking for the problematic JavaScript, I scanned through these URLs - 63 sites in all. Here’s an excerpt of the log:

Presumably ClojureScript http://jamesmacaulay.github.io/zelkova-todomvc/js/app.js
Broken ClojureScript seen! http://jamesmacaulay.github.io/zelkova-todomvc/js/app.js

Presumably ClojureScript http://www.8thlight.com/
Broken ClojureScript seen! http://www.8thlight.com/assets/eighthlight-200eb63b72445bdaca9c38e9e19f7b86.js

Now opening https://www.cognician.com/
Broken ClojureScript seen! https://s3.amazonaws.com/cognician-static/js/elf/elf.js

This approach found 11 sites with old ClojureScript. There are 7 sites with presumably updated versions (the word ‘clojure’ appears in the source, but the script does not contain hasOwnProperty("source") - the check is sketched below). The remaining 45 sites had no ‘clojure’ in their source. They might use ClojureScript on a non-front-page part of the site, or internally; scanning the public parts of sites won’t find everything (ebay.com being listed as a user is particularly intriguing).
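
The injected check worked along these lines - a sketch, since the exact heuristics in my script may have differed slightly:

  // Given a script's source text and URL, log what we can tell about it.
  function checkScript(source, url) {
    if (source.toLowerCase().indexOf('clojure') !== -1) {
      console.log('Presumably ClojureScript ' + url);
    }
    // The broken versions contain this tell-tale RegExp#source check:
    if (source.indexOf('hasOwnProperty("source")') !== -1) {
      console.log('Broken ClojureScript seen! ' + url);
    }
  }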

The next step is to contact the sites and ask them to upgrade. I’ve opened bugs on webcompat.com to track this work. With some luck, the next browser engine that tries to implement ES6 might not run into this particular problem.

Aptana April fools' 64bit detection

Bug of the day: https://bugzilla.mozilla.org/show_bug.cgi?id=1149421

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0

Steps to reproduce: On Windows 8.1 (64-bit), go to http://www.aptana.com and hit the link to download.

Actual results: Recognised Win 8.1 as Mac OSX.

I can reproduce - this screenshot is from Firefox on Windows 8, 64 bit:

Aptana download page

Hm.. Maybe something in this JavaScript code can’t handle a Firefox 40 UA string..?


  setCurrentScenario: function () {
    currentScenario = this.generateScenario();
    var foundScenario = null;
    for (var key in studioScenarios) {
      var targetScenario = studioScenarios[key];
      if (this.matchScenarioKey(targetScenario["studio_version"], currentScenario["studio_version"]) &&
          this.matchScenarioKey(targetScenario["studio_versiontype"], currentScenario["studio_versiontype"]) &&
          this.matchScenarioKey(targetScenario["studio_system"], currentScenario["studio_system"]) &&
          this.matchScenarioKey(targetScenario["studio_arch"], currentScenario["studio_arch"])) {
        foundScenario = targetScenario;
        //Set the download labels
        $(".aptana-download-requirements").html( foundScenario.systemRequirements );
        $(".aptana-download-system").html( foundScenario.studio_system_human );
        $(".aptana-download-version").html( foundScenario.studio_version );
        $(".aptana-download-architecture").html( foundScenario.studio_architecture_human );
        $("input[name=download_url]").val( foundScenario.downloadUrl );
        $(".submit-aptana-download-form").attr('href', foundScenario.downloadUrl);

Actually, it’s even simpler: it can’t handle any Windows browser UA string if it contains ‘x64’ - period.

Deep inside the JavaScript, all downloads for Windows are described as “studio_arch”:”x86”, with an additional human-readable description string saying “(x64 compatible)” - but the description string is ignored by the code. It’s not the code above that is odd - although it fooled me into stepping through it several times.

It’s the input data.

Their “is this a 64-bit system?” code has a very convenient bug which means they do not actually detect common tokens like “WOW64”: they first lower-case the UA string, then do indexOf() with needles that still contain upper-case letters (so “WOW64” and “Win64” can never match, while the already-lower-case “x64” and “x86_64” still do):


  var userAgent = navigator.userAgent.toLowerCase();
  var is_x64 = userAgent.indexOf("x86_64") >= 0 || userAgent.indexOf("x64") >= 0 || 
               userAgent.indexOf("WOW64") >= 0 || userAgent.indexOf("Win64") >= 0;
  $('input[name=studio_arch]').each(function(i){
    if (this.value == (is_x64 ? "x64" : "x86")) {
      this.checked = true;
    }
  });

So most browsers on 64-bit Windows won’t be detected as running on 64-bit systems, and thus they will be offered the expected Win32 (64-bit compatible) download.
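
The fix would be trivial: since the UA string has already been lower-cased, the tokens to search for must be lower-case too:

  // Corrected version: all needles lower-cased to match the lower-cased UA.
  var userAgent = navigator.userAgent.toLowerCase();
  var is_x64 = userAgent.indexOf("x86_64") >= 0 || userAgent.indexOf("x64") >= 0 ||
               userAgent.indexOf("wow64") >= 0 || userAgent.indexOf("win64") >= 0;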

The code even concludes with a


    // FIXME: what do we do if no matching scenario is found?

Well - the answer to the coder’s question is that in this case the JS doesn’t set the form’s values at all, and the page happens to end up with the default Mac 64-bit option.

Trackcountry - sorry, backcountry.com

2015-01-08 / Found code

Statistics are important, sure. And analysis. And advertising. And customer insight, using predictive algorithms and big data to predict internet trends and behaviour. The webmasters for backcountry.com must be true believers in all of this. At least judged by the sheer number of external scripts and trackers I found in their page while looking into bug 1106810..

Here’s a potentially incomplete list, based on listing the external scripts in the page I was served this morning plus a quick skim of the markup itself:

  • They naturally believe that getting the customer experience right starts with ForeSee, loading three script libraries
  • And they are also convinced that the Criteo Engine drives results and gives their customers personalized recommendations with these scripts.
  • It was slightly harder to find out what tracker this script is a part of, but it seems to be a library from CrazyEgg helping them see exactly what people are doing on the website
  • ..but I’m not sure if they’re satisfied with CrazyEgg after all, because look - there’s a good old Google Analytics script in the same page.
  • And because you can never have enough big data, here’s also a ScorecardResearch tracking script to ensure they get studies and reports on Internet trends and behavior
  • But why stop there when they can make every experience count by tracking and targeting customers with Optimizely? That’s two more JavaScript files to run.
  • And there are no limits to personalization, so why not be a customer of richrelevance too? They will certainly transform customer data into extraordinary customer experience with just two more scripts!
  • Speaking of tracking, there’s also the s_code.js library from the-tracker-formerly-known-as-Omniture, now Adobe analytics. This tracker is so invasive that I’ve analysed several problems where broken versions made entire sites unusable - either rendering them as empty white pages, or disabling all links and buttons. Let’s hope the version they have a copy of is reasonably bug-free..
  • And who doesn’t love saving time when managing reports and campaigns, as promised by Marin Software’s little tracker here?
  • And some inline code seems to come from Boomerang which I’m sure solves their digital marketing needs in just a few lines of JavaScript.
  • There’s a feature that pops up a “chat to a real person” - it seems to be LivePerson delivering lasting customer relationships - no wonder it takes quite a few scripts to do so.
  • They are a Google Trusted Store, which requires adding a couple of scripts.
  • Some more Google code running ads, I presume. There’s also a HTML comment with some inline scripts explaining that it’s going to “start Google Ad Services Other Conversion Tag”. And what about the JS for “Google Code for Smart Pixel List Remarketing List”? Well, let there be no doubt that Google believes their pixels are smarter than others. I got it right from the source.
  • Another inline comment is referencing “Nanigans Facebook Ads” - so let’s assume they are also taking control of their digital advertising with Nanigans.
  • Finally, why not let Mercent transform their retail by tracking their customers too?

The mobile site also has code from New Relic - a company doing performance analysis for sites and apps. (Full disclosure: two of my ex-colleagues work there). I think I see some optimization potential here…

Also, the performance analysts might want to run this script:

document.documentElement.classList.length

and come up with some appropriate recommendations based on that number.

So we’re counting 27 scripts from 15 different companies crammed into that home page. This is the sort of sight that makes me hope I haven’t just looked into the future when reading their source code. But hey, I sure hope all that big data being generated works out for them. If you know any other good trackers they should add while they’re at it, ping me on Twitter and I’ll be sure to let them know!