Monday, April 13, 2009

Content Scripts in Chromium

Here's an interesting factoid about browser extensions: lots of them are not about extending the browser at all. By my count, about 75% of this week's top 20 Firefox extensions are more about extending the web content rendered by the browser than extending the browser itself. Similar trends exist in other browser extension systems.

Chromium extensions will be able to interact with web content too, using a feature we're calling content scripts (we've gone around and around on the name, this may not be final). The code for this is at a pretty good stopping point now, so I wanted to pause and write down what we did, why we did it, and some ideas I have for future improvements.

If you want to try it out, you can check out the beginnings of our Extension Tutorial, which covers most of what I'll talk about here.

First, some background on the feature...

Content scripts are basically the same thing as Greasemonkey scripts, with some important improvements.

You register your content scripts declaratively in your extension's manifest, like this:
  {
    "name": "My first extension",
    "description": "The first extension that I made",
    "version": "1.0",
    "content_scripts": [
      {
        "matches": ["*", ""],
        "css": ["foo.css", "bar.css"],
        "js": ["hot.js", "dog.js"],
        "run_at": "document_start"
      }
    ]
  }
The syntax for matching URLs is slightly different than in Greasemonkey. The reason for this is that we wanted to eliminate a common bug in Greasemonkey scripts, where people accidentally match URLs more loosely than they intend. A classic example is the common Greasemonkey pattern of an @include with a loose trailing wildcard, which ends up matching every domain, not just the intended site and its subdomains.
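As a rough illustration of why separating the host from the path helps, here is a toy matcher for patterns of the form `scheme://host/path` (this is only a sketch of the idea, not Chromium's actual pattern parser; the pattern and URLs below are invented examples):

```javascript
// Toy matcher for host/path-separated patterns like
//   http://*.example.com/foo*
// A "*." host prefix matches the domain itself and its subdomains only,
// so the host part can never silently match unrelated domains.
function matchesPattern(pattern, url) {
  const parts = /^(\w+):\/\/([^/]+)(\/.*)$/.exec(pattern);
  if (!parts) return false;
  const [, scheme, host, path] = parts;
  const u = new URL(url);
  if (u.protocol !== scheme + ':') return false;
  // Host: "*." matches the bare domain or any subdomain of it.
  const hostOk = host.startsWith('*.')
    ? u.hostname === host.slice(2) || u.hostname.endsWith('.' + host.slice(2))
    : u.hostname === host;
  if (!hostOk) return false;
  // Path: "*" matches any run of characters; other characters are literal.
  const pathRe = new RegExp(
    '^' +
      path
        .split('*')
        .map(s => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
        .join('.*') +
      '$'
  );
  return pathRe.test(u.pathname);
}
```

Note that `http://evilexample.com/` does not match `http://*.example.com/*`, which is exactly the class of accident the separated syntax is meant to prevent.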

The matching syntax used in content scripts separates the domain portion of the pattern from the path portion, making it more explicit which sites a script will run on. One way we could use this is to someday do UI like this:

Install 'My extension'?
This extension will be able to interact with
web pages on:

[ok] [cancel]

Other minor feature differences:
  • A content script can consist of multiple physical JavaScript files or CSS files, and it can also reference images or other resources included in the extension by URL.
  • Content scripts support "early injection", which allows them to request being injected before any nodes have been added to the document by using the optional "run_at" key.

Execution Environment

To understand the execution environment for content scripts, it helps to first understand the execution environment of normal web page JavaScript.

All JavaScript is defined in a context. Each DOM window gets its own context, one purpose of which is to hold the prototypes of all the global objects (Object, Array, String, and so on). This is why when you extend Array.prototype in one frame, it doesn't affect Arrays created in other frames.

Importantly, you can call functions and access objects across contexts. This happens normally when you do something like window.frames['otherframe'].someFunction().

Here's a diagram that explains the relationship between the various objects in pretty picture form (thanks, Gliffy!):

Each context also has a single global object. When you access global variables in a JavaScript program, you are really interacting with the properties of this global object. In HTML, the global object is of course the Window object.

To make property hiding work, in Chromium's implementation, the global object is not actually the same JavaScript object that represents ("wraps") the C++ DOMWindow. There is actually a separate JavaScript object whose __proto__ points to that object. When you define global variables, it is this object where the properties are actually defined.

Ok, so how do content scripts fit into this?

Content scripts run in a very similar-looking environment. They run in a separate context, and have a separate global object. But that global object's __proto__ points at the same JS object that represents the Window.

So content scripts get their own global scope and their own set of prototypes. Variables defined in the web page won't be "visible" by default in content scripts, and the same is true in reverse. Other than that, the environment for content scripts is exactly the same as for normal JavaScript running in web pages. Writing content scripts should be exactly the same as writing JavaScript for web pages.
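A toy model of this arrangement, with plain objects standing in for the real wrapper machinery:

```javascript
// Toy model of the content script execution environment. One object
// stands in for the JS wrapper of the C++ DOMWindow; the page and the
// content script each get their own global whose __proto__ points at it.
const windowWrapper = { document: { title: 'example' } };

const pageGlobal = Object.create(windowWrapper);
const contentScriptGlobal = Object.create(windowWrapper);

// A "global variable" defined by the page lands on the page's own global
// object, not on the shared window wrapper...
pageGlobal.secret = 42;

// ...so the content script's global doesn't see it, while both globals
// still reach the shared DOM through their prototype chain.
const leaked = contentScriptGlobal.secret;              // undefined
const sharedTitle = contentScriptGlobal.document.title; // "example"
```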

Sometimes it is useful to access the page's global variables. For example, in Gmail there is an API that allows Greasemonkey scripts to drive some parts of the UI. To allow this kind of functionality, the content script environment has a special contentWindow global variable defined that can be used to access the global scope of the page's JavaScript.
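In miniature, contentWindow can be pictured as an explicit reference from the content script's global to the page's global (the page API name below is invented for illustration):

```javascript
// Toy illustration of contentWindow. The content script's global holds a
// reference to the page's global object, so page-defined variables (such
// as a page-provided API object, invented here) are reachable explicitly
// even though they are not in the content script's own scope.
const pageGlobal = {
  pageApi: { version: () => '1.0' }, // hypothetical page-defined API
};
const contentScriptGlobal = { contentWindow: pageGlobal };

// From "inside the content script":
const v = contentScriptGlobal.contentWindow.pageApi.version(); // "1.0"
```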


Another difference from Greasemonkey is the model for accessing privileged APIs. Greasemonkey scripts have direct access to some privileged APIs. The most popular of these is GM_xmlhttpRequest, which provides access to origins other than the one for the current document. These APIs are very useful, but there have been bugs where they leaked into web content, which was bad.

In order to prevent this from being possible, Chromium extensions are split into two main pieces: a privileged part (I'll call it just 'the extension' from now on) that has access to special powerful APIs, and an unprivileged part (the content script) that runs in the renderer and has no special APIs.

The two parts cannot interact directly. In fact, they run in separate OS processes, so direct interaction is impossible. The only way they can communicate is via message passing APIs, similar to postMessage().

(NOTE: The implementation of content script messaging is still in progress and is incomplete in current trunk and dev builds)

It is the extension developer's responsibility to send only specific messages to the extension process from the renderer, and to validate those messages carefully. Extension developers need to be aware that malicious web pages could send them messages exactly the same way their content scripts can.
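Because a malicious page can send the same messages a content script can, the extension side should treat every incoming message as untrusted input. A sketch of the kind of validation meant here (the message shape and field names are invented for illustration, not a real API):

```javascript
// Hypothetical handler on the privileged (extension) side. The message
// format { type, url } is invented; the point is that every field is
// checked and normalized before anything acts on it.
function handleMessage(message) {
  if (typeof message !== 'object' || message === null) return null;
  if (message.type !== 'fetch') return null;        // only one known type
  if (typeof message.url !== 'string') return null; // reject non-strings
  let url;
  try {
    url = new URL(message.url);                     // must parse as a URL
  } catch (e) {
    return null;
  }
  if (url.protocol !== 'https:') return null;       // whitelist the scheme
  return { action: 'fetch', url: url.href };        // normalized result
}
```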

This design is modeled after the way Chromium itself works, where the renderers are untrusted and have to send messages to the browser process to get interesting work done.

Future Directions

I have a couple ideas for where I'd like to take this next...

Idea 1: Completely separate content scripts and page JavaScript

Right now, the way that JavaScript access to the DOM is implemented, there is essentially a global table of JavaScript wrappers for each C++ DOM object. Whenever code needs to find the JS object for a given C++ object, it consults this table:

This single table creates a bridge between any two JavaScript contexts that have access to the same DOM nodes. For example, if page JavaScript does something like document.body.onclick = function() { ... }, any other code that has access to document.body will also have access to the onclick function handler that the page JavaScript defined.

This makes sense for web pages, where you want frames in the same origin to see the same sets of JavaScript variables. But for content scripts, it would be nice to wall these two worlds off from each other. It is relatively infrequent for content scripts to need to see the JavaScript environment of pages. It is more typical to only need access to the DOM.

In order to isolate content scripts from page JavaScript, we'd have to have separate mapping tables: one for the page JavaScript, and one for each content script. A C++ DOM node could have multiple wrappers, one for each of these "worlds". Then, when we needed to get a JavaScript object for a particular C++ object, we'd decide which table to look in based on which context the calling code was running in. Every context could only be in one "world".
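The per-world wrapper tables can be sketched with one WeakMap per world, with plain objects standing in for C++ DOM nodes and their JS wrappers:

```javascript
// Sketch of per-"world" wrapper tables. Each world keeps its own map from
// (stand-in) C++ DOM nodes to JS wrappers, so two worlds touching the same
// underlying node get distinct wrapper objects that never see each other's
// expando properties.
class World {
  constructor() {
    this.wrappers = new WeakMap();
  }
  wrap(cppNode) {
    if (!this.wrappers.has(cppNode)) {
      // First time this world sees the node: create its own wrapper.
      this.wrappers.set(cppNode, { tagName: cppNode.tagName });
    }
    return this.wrappers.get(cppNode);
  }
}

const cppBody = { tagName: 'BODY' }; // one underlying C++ node
const pageWorld = new World();
const contentScriptWorld = new World();

// Expando property set by the page's world...
pageWorld.wrap(cppBody).onclick = function () {};

const cached = pageWorld.wrap(cppBody) === pageWorld.wrap(cppBody); // true
const leaked = 'onclick' in contentScriptWorld.wrap(cppBody);       // false
```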

We could even add assertions to the JavaScript engine that worlds are never bridged. That way if we ever had a bug, in the worst case we'd crash the renderer, not have a security problem.

If we can wall these worlds off from each other, then we can offer some increased privileges to content scripts directly, because we'd be confident that they couldn't leak to web content. You'd no longer have to go to the extension process to get cross-origin XHR, for example. This would also have the advantage of not requiring extension developers to carefully validate their messages, since we would know that page JavaScript could not send messages to extensions.

We'd still probably need content scripts as they exist today if you want to interact with the JS defined by the page (for example for the Gmail API). But lots of use cases don't need that, and this idea would decrease complexity for those cases.

Idea 2: DOM Access from Extension Processes

Another idea is to offer some form of DOM access directly to extension processes. There is a team in Chromium working on an out-of-process version of the web inspector. This will clearly need some form of DOM access to work, so we can probably reuse what they do to give extension developers the ability to interact with page DOM directly from their extension process.

I can imagine something simple based on querySelectorAll(). You ask for some nodes based on a CSS expression, get back a snapshot, and then send some updates. Of course, there are problems with races: the nodes might be gone by the time you send the update. But I think in most cases this would work pretty nicely. Again, I think we'd want to keep content scripts as they are today for more complex needs.
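A minimal sketch of what such a snapshot-and-update protocol might look like, with all names invented and plain objects standing in for the renderer's DOM:

```javascript
// Invented sketch of a snapshot/update protocol for cross-process DOM
// access. The "renderer" side holds live nodes; the "extension" side only
// ever sees serialized snapshots, and its updates can fail if the node
// has disappeared in the meantime (the race described above).
const rendererNodes = new Map([[1, { text: 'old' }]]);

function querySnapshot() {
  // Return plain data, as if serialized across the process boundary.
  return [...rendererNodes].map(([id, n]) => ({ id, text: n.text }));
}

function applyUpdate(id, text) {
  const node = rendererNodes.get(id);
  if (!node) return false; // lost the race: node is gone
  node.text = text;
  return true;
}

const snapshot = querySnapshot();            // [{ id: 1, text: 'old' }]
const ok = applyUpdate(snapshot[0].id, 'new'); // true
rendererNodes.delete(1);                       // node removed from the page
const stale = applyUpdate(1, 'newer');         // false: update rejected
```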

Yawn... Greasemonkey is great, but when do we get real extensions?

I know, I know. These aren't "real" extensions. You want to know when you'll be able to put things in the Chrome UI. Good news: that is well underway. Hopefully my next blog post will be about how to add "toolstrips" to Chromium.

Until then, have a look at content scripts and let us know what you think.


Anonymous said...

Great post. I really like your idea #1. I think that idea #1 + some cross domain HTTP request support "firewalled" by a permission declaration in the extension manifest would allow developers to build interesting mashups and shield them from the complexity of extension/content script communication.

Note: this blog is great. It would be great to see it become a central point where all extension-related information/discussions are broadcasted (because otherwise the forums+wiki is a little too hard to follow)

Thank you!

Johan Sundström said...

All of the above shows lots of promise. I'd hope #2, if done, might offer XPath support too. (CSS selectors might be some order of magnitude faster, but where needed, orders of magnitude easier to read than the matching code explosion of CSS selector plus extra javascript filtering code needed to do what an XPath expression could right away.)

Am I right to hope that these sandboxes will end up a lot less toxic than Mozilla's (foreign unknown identifiers breaking your scripts, should they override them)?

More normalized to how the normal BOM looks and works (for sandboxes that have DOM access)? So there is a document object that looks and behaves as it does in normal on-web javascript land, with all the DOM 0 APIs like object.onsomething event handlers that our grandparents used before there was a whisper of the W3C, and the like?

(I suppose both questions might be a bit early to know much about, but it would be great to do away with some of the larger baggage we face in Greasemonkey-under-Mozilla.)

hagabaka said...

I would like to see a merge between the features and API of Chrome extension content scripts and userscripts/Greasemonkey. Userscripts have the advantage that they can be used on a wide range of browsers, and I think more commonly, the user is concerned with improving the interaction with the web page, not with the web browser. Most of the Chrome extensions I've used could have been implemented as just userscripts, if userscripts had the ability to access cross-origin content.

As the creator of Greasemonkey, do you think there is a way for browsers (including Chrome) to fully support its current features, with at least the same degree of security of Chrome extensions?