Introduction
A-Painter was the first demo we made using A-Frame. It was released more than a year ago, and it’s still one of the most popular WebVR experiences shown at meetups and exhibitions.
We wanted to show that the browser can deliver native-like VR experiences and also to push the limits of what A-Frame could do at the time. We’ve seen A-Painter used more and more by professional artists and by programmers who test its limits and extend its capabilities. Performance is the first bottleneck people hit with moderately complex drawings, due to the growing number of strokes and geometry. In collaborative drawing experiences performance degrades even faster, since multiple users are adding geometry simultaneously.
Recently we had some bandwidth and rolled up our sleeves to implement some of the optimization ideas we had collected in the past (Issue #222, PR #241).
Draw calls simplified
There can be several causes of bad performance in a graphics application, but the number of draw calls is a good starting point for an investigation. A draw call is, as its name indicates, a call to a drawing function of the graphics API with the geometry and material properties we want to render. In WebGL that could be a call to gl.drawElements or gl.drawArrays. These calls are expensive, so we want to keep their number as low as possible.
We will use the Balrog scene by @feiss to measure the impact of the optimizations proposed here. The following image shows the statistics when rendering this scene on my computer (Windows, i7, GTX 1080):
We should identify which numbers affect our performance:
- 14 textures: Every brush that needs a material (lines and stamps) creates its own texture. Instead, they could reuse an atlas with all the textures.
- 542 entities: We created one entity per stroke.
- 454 geometries: One entity per stroke also means one Object3D, Mesh, BufferGeometry and Material per stroke.
- 450 calls: The number of draw calls we would like to optimize.
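If you want to inspect these counters in your own scenes, A-Frame’s stats component (<a-scene stats>) displays similar numbers, and three.js exposes the raw values through renderer.info. A quick sketch (assuming the scene is already rendering):
// Log the three.js renderer counters from an A-Frame scene.
// (Sketch for illustration; the exact fields depend on the three.js version.)
var sceneEl = document.querySelector('a-scene');
setInterval(function () {
  var info = sceneEl.renderer.info;
  console.log('draw calls:', info.render.calls,
              'geometries:', info.memory.geometries,
              'textures:', info.memory.textures);
}, 2000);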
To reduce the number of draw calls we should:
- Reduce the number of textures (switching textures forces a new draw call).
- Reduce the number of materials (same reason).
- Reduce the number of geometries by merging all the meshes we can into a bigger mesh, so we can use just one draw call to paint multiple geometries.
In the following sections we will go through these steps explaining how we could apply them to our application.
Materials
Atlas
The first step is to reduce the number of materials we create, as switching from one material to another will cause a new draw call.
We will start by creating a new atlas containing the textures of every brush, using the spritesheet-js tool. We add a new npm script called atlas to pack them:
"atlas": "spritesheet-js --name brush_atlas --path assets/images brushes/*.png"
If we now execute npm run atlas, it will create a PNG image with all our brush textures and a JSON file with the information needed to locate them inside the atlas.
The generated JSON is very easy to parse and includes the size of the generated atlas (meta.size) and the list of images included, with their position inside the atlas (frames).
{
  "meta": {
    "image": "brush_atlas.png",
    "size": {"w":3584,"h":2944},
    "scale": "1"
  },
  "frames": {
    "stamp_grass.png":
    {
      "frame": {"x":0,"y":128,"w":1536,"h":512},
      "rotated": false,
      "trimmed": false,
      "spriteSourceSize": {"x":0,"y":0,"w":1536,"h":512},
      "sourceSize": {"w":1536,"h":512}
    },
    "lines4.png":
    {
      "frame": {"x":0,"y":0,"w":2048,"h":128},
      "rotated": false,
      "trimmed": false,
      "spriteSourceSize": {"x":0,"y":0,"w":2048,"h":128},
      "sourceSize": {"w":2048,"h":128}
    },
    "stamp_fur2.png":
    {
      "frame": {"x":0,"y":640,"w":1536,"h":512},
      "rotated": false,
      "trimmed": false,
      "spriteSourceSize": {"x":0,"y":0,"w":1536,"h":512},
      "sourceSize": {"w":1536,"h":512}
    },
    ...
  }
}
Now we need a simple way to convert the local UV coordinates to the new coordinates inside the atlas space. To ease this task we will create a helper class called Atlas that will parse the generated JSON and will provide two functions to convert our UV coordinates.
function Atlas () {
  this.map = new THREE.TextureLoader().load('assets/images/' + AtlasJSON.meta.image);
}

Atlas.prototype = {
  getUVConverters (filename) {
    if (filename) {
      filename = filename.replace('brushes/', '');
      return {
        convertU (u) {
          var totalSize = AtlasJSON.meta.size;
          var data = AtlasJSON.frames[filename];
          if (u > 1 || u < 0) { u = 0; }
          return data.frame.x / totalSize.w + u * data.frame.w / totalSize.w;
        },
        convertV (v) {
          var totalSize = AtlasJSON.meta.size;
          var data = AtlasJSON.frames[filename];
          if (v > 1 || v < 0) { v = 0; }
          return 1 - (data.frame.y / totalSize.h + v * data.frame.h / totalSize.h);
        }
      };
    } else {
      return {
        convertU (u) { return u; },
        convertV (v) { return v; }
      };
    }
  }
};
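Note that the helper above assumes the generated JSON has already been loaded into an AtlasJSON variable; how it gets there is an implementation detail. For illustration only, it could be fetched like this:
// Illustrative only: load the atlas metadata, then build the helper.
var AtlasJSON;
var atlas;
fetch('assets/images/brush_atlas.json')
  .then(function (response) { return response.json(); })
  .then(function (json) {
    AtlasJSON = json;
    atlas = new Atlas();
  });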
Using this helper we could easily convert the following code:
material.map = "lines1.png";
uvs[0].set(0, 0);
uvs[1].set(1, 1);
Into this:
material.map = "atlas.png";
converter = atlas.getUVConverters("lines1.png");
uvs[0].set(converter.convertU(0), converter.convertV(0));
uvs[1].set(converter.convertU(1), converter.convertV(1));
Thanks to the atlas technique, the number of textures in our app is reduced from 30 to 1.
Vertex colors
But we still have a problem: each stroke has its own material with a custom value for material.color, making it impossible to share the same material across all the strokes.
Fortunately, reusing the same material while only changing the color has a very simple solution: vertex colors.
With vertex colors we can define a specific color for each vertex of the geometry, which will be multiplied by the material color applied to the whole geometry. Since that material color will be pure white (1, 1, 1), the vertex colors will remain unaltered (we could use any color other than white to tint the vertex colors).
In order to do that, we should set up the material to use vertex colors...
mainMaterial.vertexColors = THREE.VertexColors;
...and create a new color buffer attribute for our mesh:
var colors = new Float32Array(this.maxBufferSize * 3);
geometry.addAttribute('color', new THREE.BufferAttribute(colors, 3).setDynamic(true));

// Set every vertex to red
var color = [1, 0, 0];
for (var i = 0; i < numVertices; i++) {
  colors[3 * i] = color[0];
  colors[3 * i + 1] = color[1];
  colors[3 * i + 2] = color[2];
}
geometry.attributes.color.needsUpdate = true;
Thus, the number of materials is reduced from the number of individual strokes (hundreds in one painting) to just 4: two physically based materials (THREE.MeshStandardMaterial) and two unlit materials (THREE.MeshBasicMaterial), in each pair one textured and one for solid colors.
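For reference, the four shared materials could be set up roughly like this (a sketch; the names and exact parameters are assumptions, what matters is that they all enable vertex colors and the textured ones share the atlas):
// Sketch of the four shared materials (parameter values are assumptions).
var atlas = new Atlas();

var unlitMaterial = new THREE.MeshBasicMaterial({vertexColors: THREE.VertexColors});
var unlitTexturedMaterial = new THREE.MeshBasicMaterial({
  vertexColors: THREE.VertexColors,
  map: atlas.map,
  transparent: true
});
var pbrMaterial = new THREE.MeshStandardMaterial({vertexColors: THREE.VertexColors});
var pbrTexturedMaterial = new THREE.MeshStandardMaterial({
  vertexColors: THREE.VertexColors,
  map: atlas.map,
  transparent: true
});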
Reducing the number of A-Frame entities
Initially every stroke generated a new entity, and each entity contains a mesh that is rendered separately with its own draw call. Our goal is to reduce these entities and meshes by merging all the strokes into one big mesh that can be rendered with just one draw call.
We were creating one entity for each stroke and appending it to a <a-entity class="a-drawing"> root entity:
// Start a new stroke
var stroke = brushSystem.startNewStroke();

// Get the root entity
var drawing = document.querySelector('.a-drawing');

// Create a new entity for the current stroke
var entity = document.createElement('a-entity');
entity.className = "a-stroke";
drawing.appendChild(entity);
entity.setObject3D('mesh', stroke.object3D);
stroke.entity = entity;
We could remove the per-stroke entity overhead by adding the new stroke mesh directly to the root entity’s Object3D:
var stroke = brushSystem.startNewStroke();

// Add the stroke's Object3D directly to the root drawing entity
drawing.object3D.add(stroke.object3D);
We also need to modify some pieces of code, like the "undo" functionality, to remove the Object3D instead of the entity. This is just a temporary step before we get rid of the per-stroke Object3D altogether, merging all of them and saving many draw calls by sharing BufferGeometries.
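For example, undoing the last stroke could now look roughly like this (a sketch, not the exact A-Painter code):
// Undo: remove the stroke's Object3D from the root drawing entity.
var drawing = document.querySelector('.a-drawing');
drawing.object3D.remove(stroke.object3D);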
Shared BufferGeometry
We have already reduced the number of textures, materials and entities, but we still have one Object3D -> Mesh -> BufferGeometry chain per stroke, so we haven’t reduced the number of draw calls yet.
Ideally we would have one very big BufferGeometry, keep adding vertices to it on each stroke, and send just that buffer to the GPU, saving plenty of draw calls.
For this purpose we will create a class called SharedBufferGeometry that will be instantiated by just passing the type of material we want to use.
function SharedBufferGeometry (material) {
  this.material = material;
  this.maxBufferSize = 1000000; // an arbitrary, high enough number of vertices
  this.geometries = [];
  this.currentGeometry = null;
  this.addBuffer();
}
addBuffer will create a Mesh with a BufferGeometry containing the needed attributes and add it to the list of meshes hanging from the root Object3D.
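A minimal sketch of what addBuffer could look like (simplified; attribute names and sizes are assumptions that follow the helpers shown below):
// Sketch of addBuffer (simplified).
addBuffer: function (copyLast) {
  var geometry = new THREE.BufferGeometry();

  var positions = new Float32Array(this.maxBufferSize * 3);
  var normals = new Float32Array(this.maxBufferSize * 3);
  var colors = new Float32Array(this.maxBufferSize * 3);
  var uvs = new Float32Array(this.maxBufferSize * 2);

  geometry.addAttribute('position', new THREE.BufferAttribute(positions, 3).setDynamic(true));
  geometry.addAttribute('normal', new THREE.BufferAttribute(normals, 3).setDynamic(true));
  geometry.addAttribute('color', new THREE.BufferAttribute(colors, 3).setDynamic(true));
  geometry.addAttribute('uv', new THREE.BufferAttribute(uvs, 2).setDynamic(true));

  var mesh = new THREE.Mesh(geometry, this.material);
  this.object3D = this.object3D || new THREE.Object3D();
  this.object3D.add(mesh);

  this.geometries.push(geometry);
  this.currentGeometry = geometry;
  this.idx = {position: 0, normal: 0, color: 0, uv: 0};

  // When copyLast is true (the buffer filled up in the middle of a stroke), the
  // real implementation would also carry the last vertices over to the new buffer.
},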
SharedBufferGeometry also provides some helper functions so we don’t need to care about indices or buffer overflows; those are handled automatically:
addVertex: function (x, y, z) {
  var buffer = this.currentGeometry.attributes.position;
  if (this.idx.position === buffer.count) {
    this.addBuffer(true);
    buffer = this.currentGeometry.attributes.position;
  }
  buffer.setXYZ(this.idx.position++, x, y, z);
},

addColor: function (r, g, b) {
  this.currentGeometry.attributes.color.setXYZ(this.idx.color++, r, g, b);
},

addNormal: function (x, y, z) {
  this.currentGeometry.attributes.normal.setXYZ(this.idx.normal++, x, y, z);
},

addUV: function (u, v) {
  this.currentGeometry.attributes.uv.setXY(this.idx.uv++, u, v);
},
Actually, we’ll need more than one BufferGeometry: some brushes don’t have textures, so they don’t need UV attributes, and some are not affected by lighting, so they don’t need a normal attribute either.
We’ll create a SharedBufferGeometryManager to handle all these buffers:
function SharedBufferGeometryManager () {
  this.sharedBuffers = {};
}

SharedBufferGeometryManager.prototype = {
  addSharedBuffer: function (name, material) {
    var bufferGeometry = new SharedBufferGeometry(material);
    this.sharedBuffers[name] = bufferGeometry;
  },

  getSharedBuffer: function (name) {
    return this.sharedBuffers[name];
  }
};
We will initialize the buffers by calling addSharedBuffer with the different types of materials we’ll be using:
sharedBufferGeometryManager.addSharedBuffer('unlit', unlitMaterial);
sharedBufferGeometryManager.addSharedBuffer('unlitTextured', unlitTexturedMaterial);
sharedBufferGeometryManager.addSharedBuffer('PBR', pbrMaterial);
sharedBufferGeometryManager.addSharedBuffer('PBRTextured', pbrTexturedMaterial);
And we will be ready to use them:
var sharedBuffer = sharedBufferGeometryManager.getSharedBuffer('PBR');
sharedBuffer.addVertex(0, 1, 0);
sharedBuffer.addVertex(1, 1, 0);
sharedBuffer.addUV(0, 0);
As we are storing several meshes without any connection between them, we’ll be using a simple triangle soup: we store 3 vertices for each triangle without sharing any of them. This increases the bandwidth and memory requirements, as we’re storing (NUM_POINTS_PER_STROKE * 2 * 3) vertices for each stroke.
In the case of the Balrog scene, using this method our final buffer has 406,782 floats.
Triangle strips
In order to reduce the size of our BufferGeometry attributes, we could switch from triangle soup to triangle strips. Although using triangle strips in some situations can be challenging, using them when painting lines/ribbons is pretty straightforward: we just need to keep adding two new vertices at a time as the line grows.
Using triangle strips we’ll reduce the size of the position attribute array drastically.
- Triangle soup: NUMBER_OF_STROKE_POINTS * 2 (triangles) * 3 (vertices) * 3 (xyz)
- Triangle strip: NUMBER_OF_STROKE_POINTS * 2 (vertices) * 3 (xyz)
As an example, here are the position array sizes from the Balrog example:
- Triangle soup: 135,594.
- Triangle strip: 47,248.
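Besides changing how we emit vertices, we need to tell three.js to interpret the shared buffer as a strip instead of independent triangles. In the three.js versions available at the time this could be done through the mesh draw mode (the exact mechanism used in A-Painter may differ):
// Render the shared geometry as a triangle strip instead of a triangle soup.
var mesh = new THREE.Mesh(geometry, material);
mesh.drawMode = THREE.TriangleStripDrawMode;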
At first it looks like an easy win, but if we look carefully we realize we need to fix one issue: we are sharing the same buffer to draw several unconnected strokes, but every time we add a new vertex it creates a new triangle with the two previous vertices, so all the strokes end up hideously connected:
We need a way to tell WebGL to skip those triangles. Although WebGL doesn’t support primitive restart, we can do our own "primitive restart" by creating a degenerate triangle: a triangle with no area that is discarded by the GPU.
To create these degenerate triangles and separate two strokes, we just duplicate the last vertex of the first stroke and duplicate the first vertex of the next stroke.
So we create a restartPrimitive() function on SharedBufferGeometry that duplicates the last vertex of the latest stroke; once the first point of the next stroke is added, it gets duplicated too.
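A simplified sketch of the idea (the actual A-Painter code handles more bookkeeping):
// Duplicate the last emitted vertex so the next stroke starts with
// zero-area (degenerate) triangles that the GPU will discard.
restartPrimitive: function () {
  if (this.idx.position === 0) { return; }
  var positions = this.currentGeometry.attributes.position;
  var last = this.idx.position - 1;
  this.addVertex(positions.getX(last), positions.getY(last), positions.getZ(last));
},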
The hardest part is dealing with the indices: we need to keep track of the degenerate vertices, since these position offsets also have to be applied to the texture coordinates, colors and normals.
You can take a look at the Line Brush to see the whole implementation.
Results
The following table shows the statistics of the Balrog scene before and after the optimizations described above:
Please note that these numbers also include common scene objects: floor, sky, controllers…
If we launch devtools we can see that we drastically reduced the animation frame time:
Before:
After:
Looking at the performance graphs, you can see how the GC was kicking in because of the allocations we were doing in the render loop. We have also considerably reduced the CPU usage.
Before:
After:
The performance differences are visible both in Chrome and Firefox. For example, the following graphs are from Firefox, where the frame rate previously dropped to 2 fps almost every two seconds.
Before:
After:
As demonstrated, the optimizations implemented have a major impact on A-Painter’s performance without adding much complexity to the code. VR mode benefits the most: you can now paint for a long time without getting motion sickness, which is a pretty good usability enhancement.
Further improvements
There are still many other optimizations that can further improve performance:
- Find a workaround for a bug (or feature) in the bounding box/sphere computation of BufferGeometry: it always includes the unused vertices of the pre-allocated array (sitting at the origin (0,0,0)), so frustum culling is not as effective as it should be (a possible workaround is sketched after this list).
- Use geometry instancing on brushes that paint spheres and cubes, so we don’t create new geometry on each stroke but reuse the previously created geometry.
- Use a LOD system to reduce the complexity of drawings based on distance. This could be useful if used on a big open environment like a social AR app.
- Also on big open spaces it could be useful to implement some kind of space partitioning (Octrees or BVH) to improve culling and interaction on each stroke.
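For the first point, one possible workaround (a hypothetical sketch, not what A-Painter currently does) is to compute the bounding sphere only over the vertices actually in use, instead of over the whole pre-allocated array:
// Hypothetical workaround: build the bounding sphere from the used vertices only.
function computeUsedBoundingSphere (geometry, usedVertexCount) {
  var positions = geometry.attributes.position;
  var points = [];
  for (var i = 0; i < usedVertexCount; i++) {
    points.push(new THREE.Vector3(positions.getX(i), positions.getY(i), positions.getZ(i)));
  }
  geometry.boundingSphere = new THREE.Sphere().setFromPoints(points);
}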