For better context, first check out WebTiles.
For a long time, I've had a dream of making a web page that anyone can collaborate on. I've spent a fair amount of time thinking about it over the years. The main obstacles were always:
- How to structure everything without making it possible for a single bad actor to destroy the entire page
- How to safely allow people to insert their own HTML and CSS
- Most ambitious: whether it would be possible to let people run JS code
To solve the first problem, I decided to divide the page into a grid of 250x250 tiles. This seemed like the simplest solution overall, without turning the page into an unmanageable mess. It would be extremely cool to let people have a sort of free canvas, but unfortunately I never managed to think of any system that would allow doing this safely and with reasonable performance.
I've made it so you can grab the canvas with your mouse and move it around. The GIF above is one of the first tests of the camera.
Eventually I ended up with nice-looking tiles. You could click on any tile to make it active - which would then allow you to actually click on the HTML elements inside it, run CSS animations, JS code, etc.
To control the size of files in each tile, I implemented a simple dashboard and file editor. Each file type had a different maximum size, and images, videos and audio files had sanity checks for length and dimensions.
HTML
Let's start with HTML. The first and most obvious choice for isolation would be to just put everything into iframes. Well, that doesn't work. Not only do iframes take a few seconds to load, they completely freeze the page for a while once you have more than around 10 of them. And since the canvas is big, elements have to be constantly unrendered and re-rendered, which would simply make the page unusable.
I was thinking of simply rendering sanitized HTML directly on the page, but that would cause a huge number of issues - overlapping ids, conflicting styles, etc. Then I realized we have a perfect tool for this - Shadow DOM. It completely eliminates these issues: all elements are isolated into their own context and don't interfere with the rest of the page. It would still be possible to overflow the page, but that's an issue for the CSS part.
You still have to sanitize the HTML though. I used node-html-parser for this, walking through every element and removing or transforming things as needed. This is what gets done:
- Forbidden tags are removed: meta, title, iframe, object, embed, applet, frame, frameset, base, svg, geolocation, permission. The svg tag has quite a lot of attack vectors, which is why it was banned; people could still use SVG files in an img tag though. Iframes were banned simply for performance reasons.
- Amount of elements is limited to 500. All elements after that simply get removed.
- All inline event handlers (onload, etc.) are removed. This isn't exactly a security issue (see the JS section for why), but simply to have fewer ugly errors in the console.
- All `src`, `href` (except on `a` tags) and `poster` attributes are sanitized to only allow files that the user uploaded to their account.
- All `a` tags get `target="_blank"` added.
- `width` and `height` attributes are limited to a maximum of 4096px to prevent gigantic media content from causing performance issues.
- All `video` and `audio` tags have `autoplay` removed and `preload="metadata"` added.
- `input` and `textarea` get `autocomplete="off"` added to prevent passwords from being autocompleted.
- All `link` tags are converted into a `style` tag based on the linked file. The CSS inside is sanitized.
- If a `script` tag has `src`, it's read from the file and converted into an inline script.
- All `script` tags get `type="text/tilescript"` added to prevent normal browser JS execution.
- All forms are sanitized to disallow actions sent to the same origin.
- All `style` tags are CSS-sanitized.
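The 500-element cap can be sketched as a simple depth-first walk. This is an illustrative sketch, not the actual implementation: the nodes here are plain `{ tag, children }` objects standing in for node-html-parser's tree.

```javascript
// Hypothetical sketch of the 500-element cap as a depth-first walk.
// Real code would walk node-html-parser's tree; plain objects stand in here.
const MAX_ELEMENTS = 500;

function capElements(root) {
  let count = 0;
  const walk = (node) => {
    node.children = node.children.filter((child) => {
      if (++count > MAX_ELEMENTS) return false; // everything past the cap is dropped
      walk(child); // depth-first, so nested elements count too
      return true;
    });
  };
  walk(root);
  return root;
}
```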
Overall, this was effective and worked well, until someone actually managed to get a couple of forbidden elements in. After some investigation, I found they had corrupted HTML, which after sanitization would contain different elements than the ones it started with. The solution was to run the parser multiple times and stop when the HTML stopped changing. If the HTML kept changing after 3 loops, it would get nuked.
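The "sanitize until stable" fix can be sketched as a loop around a single-pass sanitizer. `sanitizeOnce` here is an assumed stand-in for the node-html-parser pass described above.

```javascript
// Sketch of the "sanitize until stable" loop. sanitizeOnce(html) is an
// assumed stand-in for one full sanitization pass.
function sanitizeStable(html, sanitizeOnce, maxLoops = 3) {
  for (let i = 0; i < maxLoops; i++) {
    const next = sanitizeOnce(html);
    if (next === html) return next; // output stopped changing: safe to keep
    html = next;
  }
  return ""; // still mutating after maxLoops passes: nuke the tile's HTML
}
```

The key insight is that corrupted HTML can "mutate" under sanitization - removing one element can cause surrounding text to reparse into a new, forbidden element - so a single pass is never enough on its own.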
Also, `a` tags have special behavior implemented that allowed people to have multiple pages on their tile. On click, if the href was a relative path, the tile would simply be re-rendered with the new path.
CSS
CSS was fairly simple, as Shadow DOM already does the heavy lifting here. Every shadow root sits in a container element with overflow: hidden and a fairly new property called contain, set to strict, which prevents any escapes and hopefully helps with performance.
For such an element-heavy website, I felt like will-change would be perfect, but in the end it made performance much worse and caused random rendering issues I'd never seen before. And yes, I'm certain I used it on proper elements that *would* change.
One of the limitations of having CSS inside Shadow DOM is that, for some reason, @font-face doesn't work. For this, I had to parse the CSS and add the font declarations outside the Shadow DOM. They would be removed once the tile got unrendered.
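The font hoisting can be sketched roughly like this. This is a hedged toy version: the real implementation parses the CSS properly, while this regex only handles simple, non-nested rules.

```javascript
// Toy sketch of hoisting @font-face rules out of tile CSS so they can be
// applied outside the Shadow DOM. The regex only handles non-nested rules;
// real code would use a proper CSS parser.
function extractFontFaces(css) {
  const fonts = [];
  const rest = css.replace(/@font-face\s*\{[^}]*\}/g, (rule) => {
    fonts.push(rule); // to be injected into a document-level style tag
    return "";
  });
  return { fonts, rest };
}
```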
For CSS sanitization, I resisted the urge to parse CSS with regex and used the lightningcss library instead. It allows going through every token very efficiently. The documentation was horrible, but I managed to do it after a lot of tinkering. The list is smaller:
- All `import` declarations are removed.
- All `animation-play-state` properties are removed. This is because there's a `*` rule by default with `animation-play-state: paused`, to prevent all CSS animations from playing at once. Animations get enabled when the tile is clicked on.
- All `url()` functions are sanitized in the same way as `src` attributes: only user-uploaded files are allowed.
- All length values (`px`, etc.) are capped at a maximum of 4096px to prevent performance issues.
The CSS sandbox only got escaped once. I didn't know this, but apparently :host can escape the Shadow DOM, so all declarations with it are removed.
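The sanitization rules above can be illustrated with a simplified declaration filter. This operates on pre-tokenized `{ property, value }` pairs purely for illustration - the real implementation walks lightningcss tokens instead.

```javascript
// Simplified sketch of the CSS sanitization rules, operating on
// { property, value } pairs instead of real lightningcss tokens.
const MAX_LENGTH_PX = 4096;

function sanitizeDeclaration({ property, value }) {
  if (property === "animation-play-state") return null; // always stripped
  if (property.includes(":host") || value.includes(":host")) return null; // can escape the Shadow DOM
  // Cap absolute pixel lengths to avoid gigantic layouts.
  value = value.replace(/(\d+(?:\.\d+)?)px/g,
    (_, n) => Math.min(parseFloat(n), MAX_LENGTH_PX) + "px");
  return { property, value };
}
```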
JavaScript
Here comes the most interesting part... I really didn't know if this would be possible until the very last moment. I really wanted to implement this, as tiles would be much, much more interesting than if only HTML+CSS was allowed.
There is a Content Security Policy set on the page that would disallow inline scripts from running. I spent quite a lot of time trying out different approaches to running unsafe code.
- WebWorkers - one of the first approaches I thought of. It would be quite difficult to ensure safety: while it's impossible to freeze the page, some attack vectors would remain (for example, unsafe network requests). Also, it would be impossible to pause execution. It'd be possible to kill the worker and restart it (on clicking off and onto the tile), but that's not the nicest solution.
- realms-shim - well, this one directly says it's no good for isolation.
- QuickJS and similar things - all of them are really big WASM binaries. Most don't provide a way to pause execution and have really weird and complex APIs.
- JS-Interpreter - really slow, ES5 only. Allows pausing and resuming execution, while running it all on the same thread. Has an easy way to create custom APIs. This is what I ended up using.
JS-Interpreter overall ended up being a perfect choice. Its slowness actually fit perfectly for a project like WebTiles - everyone had to optimize their code to the maximum and work with limited memory. And I only had to initialize the interpreter on the first click on the tile. After clicking off, execution would simply be paused until the user clicked on the tile again. Only 1 tile could be active at once.
I could also tweak the speed of the interpreter by simply changing the number of times I call the step function per second. Memory management was done by checking overall memory usage every 500 steps.
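The stepping loop can be sketched as a batch runner. `interp` stands for a JS-Interpreter instance (the library's real `step()` returns true while there is more code to run); `STEPS_PER_TICK`, `checkMemory` and the batch structure are illustrative names, not the actual code.

```javascript
// Hedged sketch of the interpreter stepping loop. interp stands in for a
// JS-Interpreter instance; step() returns true while more code remains.
const STEPS_PER_TICK = 1000;     // tune this to throttle tile speed
const MEMORY_CHECK_EVERY = 500;  // memory-check cadence from above

function stepBatch(interp, checkMemory, state) {
  for (let i = 0; i < STEPS_PER_TICK; i++) {
    if (!interp.step()) return false; // program finished
    if (++state.steps % MEMORY_CHECK_EVERY === 0) checkMemory();
  }
  return true; // more work left; caller reschedules (e.g. via setTimeout)
}
```

The caller would keep invoking `stepBatch` from a timer while the tile is active; pausing is then just a matter of not scheduling the next batch.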
The hardest part about it all is that I had to implement common APIs in JS - DOM, Canvas, XMLHttpRequest, Events, localStorage, DOMParser, console, atob/btoa, alert/prompt/confirm, etc. This was tedious to say the least. Each of the APIs had to be reimplemented, although it was mostly just writing wrappers for existing APIs and calling interpreter.createNativeFunction on them.
Some interesting details about implemented APIs:
- Logs to console were limited to maximum of 1000 to prevent spamming it.
- Alerts, prompts and confirms were limited to 10 to prevent a tile from locking the page forever.
- Setting `innerHTML` was not allowed, since it's almost impossible to sanitize. People had to use direct functions like `document.createElement`.
- Setters for attributes like `src` and `href` had the same checks as mentioned in the HTML section.
- `document.createElement` had the same blacklist of tags as in the HTML section. It would also check whether the 500 element limit was reached.
- Calling the `play()` function required inserting the element into the DOM first, so that all sounds were actually pausable when the user clicks away from the tile. If you don't insert the element into the DOM when playing, it's impossible to stop it without a reference to the element.
- Getting elements was capped to the fake "body" of the tile.
- Any DOM modifications didn't allow elements from DOMParser. See why below.
- `localStorage` used IndexedDB under the hood. JS-Interpreter has a pretty neat function called `createAsyncFunction` that allows running async native functions while making them seem synchronous to the interpreter.
- `XMLHttpRequest` only allowed network requests to external resources (to prevent calling WebTiles APIs). I was a bit hesitant about whether to add network requests at all, but I felt the added functionality was worth the leaked IPs (people actually created a social media from this later). And requests would only go off when you clicked on the tile, so it'd be pretty much like visiting a website.
I was really happy that I managed to add JS support. It really made WebTiles much, much more interesting. People created games, apps, and other cool lil things.
Brick breaker game.
Minesweeper.
Whole 3D engine!
Entire social media client that actually works!
Sandbox escapes
The interpreter got escaped a couple of times, and basically all escapes were caused by the `document.createElement` function (followed by appending a script tag):
- Instead of passing a normal string into the function, `new String("script")` was passed in. This caused the sanity check to not see any issue and pass it into the native function.
- I didn't know that you can append elements created by DOMParser, so people could just create forbidden elements in a DOMParser document and then append them using the normal DOM functions.
- Someone managed to somehow replace the `createElement` function in the DOM with the variant from DOMParser, which didn't have any sanity checks.
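The `new String("script")` bypass boils down to a `typeof` guard. The blacklist and check names below are illustrative, not the actual code - the point is that a `typeof` check lets String *objects* slip past, while coercing to a primitive first does not.

```javascript
// Illustrative sketch of the new String("script") bypass.
const FORBIDDEN = new Set(["script", "iframe", "object"]);

function naiveCheck(tag) {
  // typeof new String("script") is "object", so this guard never fires
  // for String objects and the tag reaches the native createElement.
  if (typeof tag === "string" && FORBIDDEN.has(tag)) return null;
  return tag;
}

function fixedCheck(tag) {
  tag = String(tag).toLowerCase(); // normalize to a primitive first
  return FORBIDDEN.has(tag) ? null : tag;
}
```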
In the end, I had to add "tainting" for all DOMParser elements. Any tainted element couldn't be inserted into the DOM. All of those escapes were quite bad, but not catastrophic, given that JS would only run when clicked on the malicious user tile.
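The tainting idea can be sketched with a WeakSet. Whether the real implementation used a WeakSet is an assumption, and plain objects stand in for DOM elements here so the idea can be shown without a browser.

```javascript
// Sketch of DOMParser element tainting via a WeakSet (assumed approach).
// Plain objects stand in for DOM elements.
const tainted = new WeakSet();

function taintAll(elements) {
  for (const el of elements) tainted.add(el); // everything DOMParser produced
  return elements;
}

function assertNotTainted(el) {
  if (tainted.has(el)) throw new Error("tainted element"); // block insertion
  return el;
}
```

Every DOM-insertion wrapper (appendChild, insertBefore, and so on) would call `assertNotTainted` before delegating to the real function, so tainted elements can never enter the tile's DOM regardless of which path they take.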
The worst, and at the same time most interesting, case of sandbox escape was a worm that someone created. It spread when anyone clicked on an infected tile, added itself to that user's main page, and kept replicating while running code that had escaped the sandbox.
Unintentionally, I stopped the worm from spreading by fixing one of the exploits, so it thankfully only spread to around 70 tiles. The worm also unintentionally inlined all JS and CSS, removed indentation, and made JS stop working due to an error in its code after the patch.
When examining its code, I at first thought it was the usual obfuscation, but it turned out to be an actual VM made of a couple of functions/opcodes:
0x00 read
0x01 call
0x02 sum
0x03 new
0x04 call with 2 args
0x05 call with 3 args
0x06 call with 4 args
0x07 read and call with 2 args (?)
I believe basically all of these escapes could have been avoided by having a stricter CSP from the start. I didn't know you could specify a folder for allowed scripts - I thought CSP only supported domains. Some mistakes were made, but it was quite fun to see all the creative escapes people came up with.
End
So, despite the half-joking title, I think it is actually fine to accept user-supplied code. Be careful though.
