What could be wrong?
A few weeks ago I posted a “state of webcompat testing” report on this blog.
Here’s an update - some interesting progress is happening.
For test result storage, I’ve written a script for storing compat results in a database. We’ve named the project “compat entomology” because entomology is the systematic study of bugs and insects. The code is on GitHub and runs live on a Mozilla test PaaS service. (That link isn’t very interesting - to get real data you need to append data/domain - for example http://compatentomology.com.paas.allizom.org/data/yahoo.com). It stores several data points our different test scripts generate, plus screenshots.
I’ve updated the three test scripts I use (Compatipede 1, a SlimerJS script and the Marionette-based testsites.py) to submit data to the compat-entomology DB. Effectively, every single test I run now results in publicly accessible data that can help us analyze bugs, or find regressions and fixes.
Even more exciting news: the volunteers who had been working on the system we like to call “Compatipede 2” have now released their code as an open source project on GitHub. Their project is Node.js based, runs both SlimerJS and PhantomJS for cross-engine comparison capabilities, and is written with scalability in mind - it can control browsers on servers across the world and test in hundreds of tabs simultaneously.
Compatipede 2 is quite possibly the strongest contender for the One True Site Compat Testing Framework title. However, we need to extend it with some features we’ve prototyped in various other scripts, either in Compatipede 2 itself or in “client” scripts that will drive the testing using the Compatipede 2 infrastructure.
That brings us to the “roadmap” part of this post.
PLANS - aka roadmap
REPORTING. We will grow a significant collection of test data and screenshots, so we need a good reporting approach that can boil all this data down to something minimal and informative. I’ve started experimenting with one “screenshot review”-like page and one simple reporting script based on existing site lists on arewecompatibleyet.com (any list ID from that site works in the URL, but not all sites have test data). This work is just starting and needs plenty of experimentation and refining.
DB QUERY REGRESSION TESTS We can now define “tests” that just ask the DB whether a failure was seen during the last test run. This, however, requires us to figure out some format to say “such a failure is related to bug nnn”. We want to be able to say
“if the most recent test result for site domain.tld with ua family ‘gecko’ and engine ‘gecko’ contains a ‘TypeError: b is undefined’ JS error message, the bug still exists. Otherwise, the issue needs re-testing.”
“if the most recent test result for site domain.tld with ua family ‘gecko’ and engine ‘webkit’ contains a CSS error saying selector ‘.navigation_menu’, property ‘display’ had value ‘-webkit-flex’, the bug still exists. Otherwise, the issue needs re-testing.”
We need to translate that to JSON (or even SQL?) and write a script that can output reports per bug number.
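As a starting point for that discussion, here is a minimal sketch of what such a rule and its evaluator might look like. All the field names (bug, domain, uaFamily, engine, jsError) are hypothetical - we still need to agree on the real format:

```javascript
// Hypothetical JSON rule: "this JS error on this site/UA/engine means bug nnn still exists"
const rule = {
  bug: "nnn",
  domain: "domain.tld",
  uaFamily: "gecko",
  engine: "gecko",
  jsError: "TypeError: b is undefined"
};

// latestResult would come from the compat-entomology DB; here it is a stub object.
// Returns true (bug still exists), false (needs re-testing), or null (no matching run).
function bugStillExists(rule, latestResult) {
  if (latestResult.domain !== rule.domain ||
      latestResult.uaFamily !== rule.uaFamily ||
      latestResult.engine !== rule.engine) {
    return null; // no matching test run for this rule
  }
  return latestResult.jsErrors.some(function (msg) {
    return msg.indexOf(rule.jsError) !== -1;
  });
}
```

The per-bug report script would then simply run every rule for a given bug number against the most recent DB entries and print the verdicts.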
C2 STORE DATA Compatipede 2 (or its client script) should learn to submit results to compat-entomology - screenshots and data - like our other test scripts already do.
C2 PLUGINS Compatipede 2 (or its client script) should run the plugins we used to collect data in Compatipede 1. (Potentially, my SlimerJS test runner and testsites.py can also grow support for those plugins, although we should not spend much time on developing those if it turns out Compatipede 2 is the one we want to use).
C2 MARIONETTE Compatipede 2 can also be used to test devices and emulators that support Marionette if we write a small Python bridge script that sets up a web server and supports the same commands Compatipede 2 sends.
C2 LOG ERRORS Some of the other test scripts can now log CSS errors, JS errors and network/SSL errors that appear in the console. Compatipede 2 should also be able to do so.
Some “nice to have” goals:
- C2 (or its client) should also be able to run our existing regression tests
- Review and potentially refine the “plugin” feature from C1
- Add more datapoints, and more clever ones. More plug-ins!
- Develop a script that can run in Phantom and find all the elements in the page that have -webkit- CSS problems
- ..then develop a script that can check for corresponding elements in Gecko and see what styles apply
- Maybe use the above information to help analyse screenshots - e.g. if you click a specific part of the screenshot, it lists the elements that have -webkit- CSS in that part of the page..
I have now opened 10 issues that cover significant parts of this roadmap (plus some small and easy stuff) on these two GitHub projects:
- Compat-entomology - the database server
- Compatipede2-client - the scripts to make use of the “Jannah” AKA Compatipede 2 infrastructure
I have also added details about how to run a local instance to the README file for https://github.com/Asynchq/jannah.
Want to have a real impact on an interesting site compatibility project? Now is a good time - dive right in.
Both Chrome and Firefox are now working on their support for the Clipboard API spec, which I’ve been editing for the last couple of years.
Clipboard support has been a neglected area for web sites. It was and still is hard-to-impossible to paste an image into an online editor, for example. Part of the reason is that while copying and pasting seems like simple functionality, under the hood it’s really complex: lots of data types you may want to or need to handle, safety precautions and considerations, extremely important and sensitive privacy concerns..
While spec’ing this, we’ve tried to strike a balance between various concerns. In a nutshell, this is how it is intended to work:
- Scripts can use copy/cut events (triggered by your copy/cut commands in the browser’s trusted UI) to modify the data and data types that will end up on the clipboard
- Scripts can use paste events (again, when triggered by trusted UI) to read data from the clipboard. They will be given access to any “supported” data types and can process, for example, HTML code placed on the clipboard by Word and other editors.
- Scripts can trigger copy/cut actions with document.execCommand() if the JS thread is considered user-initiated (here we re-use the browser’s “allowed to show a popup” logic; typically this means running from, for example, a click or mouseup event - check that link for full details).
- Scripts cannot trigger paste actions, not even from user-initiated threads (though the spec leaves the door open to implementing specific permission UIs that will allow this on a per-site basis).
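The model above can be sketched in a few lines. This is just an illustration of the intended usage, not code from the spec; the event wiring at the bottom is guarded so the handler logic can also be exercised outside a browser:

```javascript
// Copy handler: replace the default clipboard payload with our own data types.
function handleCopy(event) {
  event.clipboardData.setData('text/plain', 'Plain-text version');
  event.clipboardData.setData('text/html', '<b>Rich</b> version');
  event.preventDefault(); // otherwise the browser overwrites our data with the selection
}

// Paste handler: read any "supported" type - e.g. HTML put there by Word.
function handlePaste(event) {
  return event.clipboardData.getData('text/html');
}

if (typeof document !== 'undefined') {
  document.addEventListener('copy', handleCopy);
  document.addEventListener('paste', handlePaste);
  // Copy/cut may be triggered from user-initiated code, e.g. a click handler:
  document.addEventListener('click', function () {
    document.execCommand('copy');
  });
}
```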
The only capability that is new to the web platform (although partially supported in IE for years) is getting data from the clipboard on paste events. Click-to-copy is widely used, powered by Flash. Modifying what ends up on the clipboard in copy and cut events is trivial if you change the selection or DOM from a mouseup or keypress event. In a sense, we’re taking baby steps.
Even so, there are some fully legitimate concerns being raised about the potential for abuse of this API. I think we’ve found a pretty good balance, and that any further annoyance can be handled in social ways - users complaining to badly behaved sites, blogging about them, etc. This is already happening. In some ways, the new and simpler API will also make blocking this functionality simpler - it’s easier to write an extension that modifies document.execCommand() behaviour than one second-guessing what a Flash applet is up to.
..as of April 16th, 2015
When testing the compatibility of web sites and browsers, there’s lots of potential for automation. We can automate some of the work related to:
- Discovery of problems
- Triage of bug reports dealing with problems
- Regression testing to check if former issues have re-surfaced when a website is updated
We can for example automatically identify certain problems that will cause a site to get classified as failed/incompatible. These include at least:
- -webkit-prefixed flexbox and background gradient properties without unprefixed equivalents, in a selector that is applied on any of the pages we visit
- Redirects to different sites based on User-Agent
- For mobile content: dependency on Flash or other plugins without equivalents (e.g. an OBJECT tag with no corresponding VIDEO tag)
- If a page has a video, it must play without errors
- WAP served with a WAP-specific MIME type
- Invalid XHTML causing parse errors
- Custom tests written with a plug-in architecture, to check e.g. for buggy versions of specific scripts
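The first check in that list can be sketched quickly. This is a deliberately crude illustration (css-fixme and Compatipede 1 have far more refined logic); it naively splits a stylesheet into rules and flags those that use -webkit- flexbox/gradient features with no unprefixed fallback:

```javascript
// Return the selectors of rules that use -webkit-prefixed flexbox or
// gradient features without an unprefixed equivalent in the same rule.
function findWebkitOnlyRules(cssText) {
  var problems = [];
  var ruleRe = /([^{}]+)\{([^}]*)\}/g; // naive "selector { declarations }" split
  var m;
  while ((m = ruleRe.exec(cssText)) !== null) {
    var selector = m[1].trim();
    var body = m[2];
    var hasPrefixed = /-webkit-(flex|box|linear-gradient|gradient)/.test(body);
    // Unprefixed equivalent present? (negated "-" guard skips -webkit- matches)
    var hasUnprefixed = /(^|[^-])(display\s*:\s*flex|linear-gradient)/.test(body);
    if (hasPrefixed && !hasUnprefixed) {
      problems.push(selector);
    }
  }
  return problems;
}
```

A real implementation would parse the CSS properly and cross-check which rules actually apply to elements on the visited pages, as described above.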
The same approach can be used to compare the behaviour of new versions and current releases, to have greater assurance that the update will not break the web and pinpoint risky regressions. Basically, we can use the Web (or at least a given subset of it) as our browser engine test suite.
Current implementations (AKA prototypes)
Site compat tester extension
- Ran as Firefox extension
- No longer developed
- JSON-based description of tests (but with function expressions)
- Regression testing only
- Crude WAP detection
slimertester.js
- Runs tests with SlimerJS
- Supports several test types for regression testing
- Mixed content
- Custom tests, bug-specific
- Reads webcomptest JSON format.
- Can track console errors.
- Batch mode for regression tests (not as good for exploratory tests)
- Can run both SlimerJS and PhantomJS (but not fully tested across both yet)
testsites.py
- URL player based on Mozilla Marionette
- Can spoof, load URLs, click, submit forms…
- Generates, compares, splices screenshots that can be reviewed on arewecompatibleyet.com (example review).
- Records differences between code sent to different UAs, generates webcomptest JSON format.
- Generated tests are reviewed and used with slimertester.js for regression testing.
- Limited support for interactivity with sites: has code for automated login to services.
Marionette remote control script
- Based on Mozilla Marionette, can control browser on device (e.g. Flame phone).
- Sets up web server that accepts commands.
- Used to sync browsing actions on laptop and device - the device loads the same URL, and clicks and scrolling are reproduced automatically (depends on some JS injected into the page to monitor your actions and send commands to the Python script, plus a proxy to forward the commands if the script doesn’t have cross-origin privileges).
- Can check for problems, e.g. be told to check if a given element exists
- Struggles with frames/iframes.
- No recording of the results (yet)
dualdriver.py
- Based on Mozilla Marionette, helps with bug triage
- Accepts bug search URL as input. Goes through each bug, launches URLs automatically on device.
- Interacts with bug trackers - can generate comments and add screenshots.
- Finds contact points.
- Does HTTP header checking.
Compatipede 1
- Headless browser, based on GTK-Webkit. Runs only on *nix.
- Batch operation over many URLs.
- Plugin architecture makes it easy to add new “tests”.
- Logic for finding CSS issues.
- Resource scan feature tests included CSS and JS against a given regexp.
- Somewhat “trigger-happy” in classifying sites as compat failures.
- Logs results to MongoDB
Compatipede 2
Repository: none yet
Compatipede 2 is under development. Based on SlimerJS and PhantomJS, it simplifies comparisons across browser engines - not just spoofing as another browser, but actually rendering pages with that engine.
css-fixme
Repository: https://github.com/hallvors/css-fixme
Script for identifying CSS issues and suggesting fixes. The CSS logic here is probably more refined than in Compatipede 1 (and written in JS, whereas the logic in Compatipede 1 is in Python). It should be compared and reviewed - it has received some feedback and bug fixes from Daniel Holbert.
The goal is to develop a service that includes many of the best features from those prototypes.
- Run minimum two distinct browser engines, default to Gecko and WebKit (prototyped in Compatipede 2).
- Define what binaries to use for both engines, enabling comparisons of Gecko.current and Gecko.next (unknown).
- Set UA string (affecting both navigator.userAgent and HTTP) (testsites.py, Compatipede 1, Compatipede 2, slimertester.js)
- Set UA string separately per engine
- Run explorative tests from a list of URLs including
- comparisons of HTTP headers / redirects (from Compatipede 1)
- analysis of applied CSS (from Compatipede 1, css-fixme.htm)
- logging and analysis of JS errors (rudimentary support in slimertester.js, no comparisons)
- Ideally both those logged to the console and those caught by the page in try..catch.
- Run regression tests described in the JSON(-like) format used by testsites.py and slimertester.js.
- Take screenshots (testsites.py, Compatipede 2)
- Enable easily adding new tests or statistics through a “plugin” architecture (Compatipede 1)
- Resource scan (Compatipede 1)
- Logging results to database (Compatipede 1)
- Log existence of OBJECT, EMBED, AUDIO and VIDEO tags (none, but trivial via plugin APIs like Compatipede 1)
- Discover WAP and XHTML MIME types and flag sites that send these to one UA but not another
- Log in to sites automatically (testsites.py)
- Screenshot comparison, flagging those with greater differences (testsites.py)
- Write JSON files that can be used for regression testing. (testsites.py)
- Look for contact points on web sites, e.g. direct links to “contact us” forms (dualdriver.py)
- Bug search mode - give a link to bug tracker, it will scan all URLs in those bugs (dualdriver.py)
- Tagging bugs automatically - for example to set “serversniff” and “contactready” in whiteboard when HTTP redirects differ (None)
- Suggesting bug comments - for human review/cut’n’paste? (dualdriver.py - to some extent)
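The auto-tagging idea from the list above could look something like this minimal sketch. The tag names follow the “serversniff”/“contactready” convention mentioned earlier; the result field names (finalUrlGecko, finalUrlWebkit, contactUrl) are hypothetical:

```javascript
// Suggest whiteboard tags for a bug based on a site's test results.
function suggestWhiteboardTags(result) {
  var tags = [];
  if (result.finalUrlGecko !== result.finalUrlWebkit) {
    tags.push('serversniff'); // server redirects differently per UA/engine
  }
  if (result.contactUrl) {
    tags.push('contactready'); // we know where to reach the site owner
  }
  return tags;
}
```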
Given that most of these features already exist in various scripts that are useful prototypes for the final “Mozilla Compatipede” (or whatever we end up calling the project), it doesn’t seem overly ambitious to pull them together, refine them and create a really useful tool. However, there’s one more piece of the puzzle to consider - and it’s one we haven’t gotten right so far. Let’s call it..
Some of our past efforts (with Compatipede 1 as perhaps the best example) failed because it’s easy to generate a lot of data, but hard to present parts of it in a way that’s useful and a context that’s relevant. Compatipede 1 can scan thousands of sites and generate megabytes of statistics. Our next-gen service will be even better at generating data. To make this useful, we need to spend considerable attention on the data presentation and data usability problems.
We should develop a service that, based on a couple of inputs like a host name and an optional User-Agent / browser name, returns known information (test results/statistics, links to screenshots). More importantly, we should develop an extension that will modify bug tracker issue pages, vet the information carefully, and present the most relevant parts of it (differences between engines, screenshots, contact URLs) right there in the bug. We use the bug trackers all the time - having carefully selected, relevant information presented right there for cut-and-paste into comments and analysis is going to make the data on sites with known bugs most useful. (I have written an experimental extension earlier, something more powerful and polished than that would work.)
Secondly, we need a tool similar to (and likely based on) the screenshot review UI, but including all the information that indicates there is a problem on a web site we don’t yet have bugs for. Reviewers will mark each difference as “not a problem”, mark it as related to an existing bug, or report a new issue.
(I used the surprisingly nice StackEdit markdown editor to draft this article. It deserves a link.)
OK, this might seem like a simple assignment: write a script or tool to find and analyze web site problems by comparing the code sent to different User-Agents.
I’ve made several attempts during the last couple of years - initially writing a desktop Firefox add-on using the add-on SDK, then moving on to different experiments based on Mozilla Marionette or SlimerJS. At the moment, the regression tests that generate results for AreWeCompatibleYet.com run with SlimerJS using this script, while I use Marionette and this script for exploratory testing and for generating screenshots.
One and a half years ago, at the Mozilla summit in Brussels, I met a Mozilla volunteer, Seif Lotfy, who was interested in helping us write tools for web compatibility. He wrote the Compatipede tool, which does a good job at exploratory testing. It has a plugin-based architecture where you can easily add new things to test for. However, it also has a couple of drawbacks: under the hood it uses the GTK WebKit API, meaning the browser engine it runs is a WebKit one. It’s also limited to running on Linux and will (obviously) somewhat slow down the machine you run it on.
Developers at async.london (some people appreciate ICANN’s new TLDs ;)), have now rewritten this tool into one that runs in the cloud. Compatipede 2 lets you run tests in either SlimerJS (Gecko) or PhantomJS (WebKit) - which eventually enables really interesting things like live comparison of a site’s behaviour across two engines.
I’ve been playing around with testing the new stuff. My current client script is, truth be told, not a shining example of the power we really have at our disposal here - it limits itself to opening one page at a time, while the system actually scales to opening a large number of tabs for parallel testing.
Last week, I had an interesting use case for the new tool: Firefox on Android and Firefox OS have known problems with the Brightcove video player script - or to be more specific, the older version of this script. The new version runs pretty well. So the interesting question is: how many sites are still using the old version?
And the tool ran the Brightcove version check along with some other statistics - here’s a screenshot of some of the data:
Several of them got the “none” label (interestingly none got “new” - perhaps my detection of the new version was buggy?), and in a second iteration I added some code to detect and click links named “video” - this caught another set of sites using the old version, but not running video on their main page.
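The link-clicking heuristic from that second iteration boils down to something like the helper below. This is a hypothetical sketch - in the real run, the harvesting would happen inside the SlimerJS/PhantomJS page context via document.querySelectorAll('a'):

```javascript
// Given harvested links ({ text, href } objects), return the hrefs of
// those whose visible text suggests they lead to video content, so the
// test can click one and re-run the Brightcove check on that page.
function findVideoLinks(links) {
  return links.filter(function (link) {
    return /\bvideos?\b/i.test(link.text);
  }).map(function (link) {
    return link.href;
  });
}
```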
Compatipede 2 ran pretty well for this experiment. We’d like to see it open-sourced with a suitable license, so that we can invite others to contribute improvements and features, and so that we can run it on various platforms and use binaries of upcoming Firefox versions. We might base more of our testing (such as the regression testing for the AWCY site) on it. If we can use such a tool to its full potential, it will really help us push the web forward.
The basic social contract on the web is that browsers should strive to implement standards, sites should write their code according to standards, and the authors of the standards should take great care to write implementable specs.
Sometimes, more often than you might expect, a small detail in a new spec causes a compatibility problem. When this happens, either the spec or the affected websites must change. (If neither does, the problem is simply dumped on browser vendors who must decide whether they want to follow the spec or make websites work).
Yes, sometimes we do ask the web to change itself. It happens. Perhaps it even happens more and more often. (A related example: browsers are hardening their security and start disallowing more weak encryption methods - which means a significant number of sites using these old methods need to change.)
This raises several interesting questions: how do we even find the problem in the first place? How do we find as many affected sites as possible? How do we figure out the right people to talk to when we want a site fixed?
Mozilla’s bug 1138325 is a good example of this cycle. A minor change deep in the ECMAScript engine, made while updating it to the new Edition 6 version of the spec, caused a problem described as “Turning RegExp#source from an instance property into an accessor breaks ClojureScript apps”. This was fixed in the ClojureScript library rather quickly. However, the library is used on several sites - and if they don’t update, the issue will remain a problem. This might make it needlessly painful for all browsers trying to update to ECMAScript Edition 6.
```
Presumably ClojureScript http://jamesmacaulay.github.io/zelkova-todomvc/js/app.js
Broken ClojureScript seen! http://jamesmacaulay.github.io/zelkova-todomvc/js/app.js
Presumably ClojureScript http://www.8thlight.com/
Broken ClojureScript seen! http://www.8thlight.com/assets/eighthlight-200eb63b72445bdaca9c38e9e19f7b86.js
Now opening https://www.cognician.com/
Broken ClojureScript seen! https://s3.amazonaws.com/cognician-static/js/elf/elf.js
```
This approach found 11 sites with old ClojureScript. There are 7 sites with presumably updated versions (the word ‘clojure’ appears in the source, but the script does not contain hasOwnProperty(“source”)). The remaining 45 sites had no ‘clojure’ in their source. They might use ClojureScript on a non-front-page part of the site, or internally - scanning the public parts of sites won’t find everything (ebay.com being listed as a user is particularly intriguing).
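The classification behind those numbers can be sketched as a tiny helper. This is my reconstruction of the heuristic described above (the broken ClojureScript runtime contains a hasOwnProperty(“source”) check on RegExp objects, updated versions don’t), not the exact scan code:

```javascript
// Classify a fetched script's source text:
//   'none' - no trace of ClojureScript
//   'old'  - presumably the broken version (still checks hasOwnProperty("source"))
//   'new'  - presumably an updated version
function classifyScript(src) {
  if (src.indexOf('clojure') === -1) return 'none';
  if (src.indexOf('hasOwnProperty("source")') !== -1) return 'old';
  return 'new';
}
```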
The next step is to contact the sites and ask them to upgrade. I’ve opened bugs on webcompat.com to track this work. With some luck, the next browser engine that tries to implement ES6 might not run into this particular problem.