## Saturday, June 4, 2011

### List of Free for Commercial Assets & Tools

For the programmers who are just too busy coding to find art folks and too busy eating baked beans to pay them, I present this list. Through much personal research and relentless trolling of Reddit I've found many free assets, tools, and everything in-betwixt. Make sure you carefully check the licensing terms yourself; I may have misread a license, or it may have changed since I last checked.

MakeHuman - Generate human models with skeletal animation data. Like "The Sims" character editor, but can output Wavefront OBJ. They aim for movie grade, but also have low-poly models too.
BlendSwap - Blender Scene sharing site, most restrictive license permitted is CC-BY-SA, my quick survey indicated most are CC-BY only.
Blender Materials - Materials for use in Blender, good inspiration for writing your own real-time stuff, or use them in cutscenes.
Carnegie Mellon University Mocap Data - Poke around; some people have cleaned the data up
Ohio State University Mocap - They just restructured their website, but the data is still there
FatBOX Software 3D models [Facebook login required] - Free models, mostly mobile phone resolution
Arbaro - Generate movie grade trees (700k polys)
ngPlant - Generate real-time grade plants/trees with real-time preview
tree[d] [win32 only] - Generate real-time trees
cgTextures - High res textures, many are tileable
Mayang Textures - Huge amount of textures, none are tileable but easy enough to make tileable with Resynth.
Backyard Ninja Free Stuff - 2D tileset assets.. really nice pixel painted tiles for a platformer.
The FreeSound Project - Huge amount of sound effects, licensed by CC-Sampling Plus
HasGraphics - CC licensed artwork listing, mostly 2D tiled.
Lost Garden - Webpage of the 2D guy behind Tyrian, lots of free top quality 2D artwork.

### 2D Image

MyPaint - this one's aimed at Artists, as opposed to The Gimp which is squarely aimed at programmers.
The Gimp - The Swiss-Army-Chainsaw of free 2d image editing
Gimp-Texturizer - One click make an image tileable (didn't work for me on windows)
Resynthesizer - The original "context aware fill", although it does much more than that. Load an image, in gimp click "Filters->Map->Resynthesize" and it will make most images brilliantly tileable.
AutoStitch [win32] - Joins several photos together, fixing scale, colour and exposure to make one homogeneous image.
Paint.NET [win32] - A lightweight and fast paint app that's great for quick work.
Inkscape - Vector graphics editor
GrafX2 - PixelArt oriented program, pining for the days of the Amiga 500 and Deluxe Paint
Tiled - Map editor for making Orthogonal and Isometric tile-based maps.
The Compressionator [DirectX] - Tool from AMD for generating compressed textures and mipmaps.
SSBump Generator [win32] - Generates bump maps from height maps. Can also generate height maps from regular images.
NeoTextureEdit - a LGPL3 procedural texture editor

### 3D Image

Blender - 3d modelling tool, great for cleaning up mocap data & making models
MeshMixer - From their site.. "meshmixer is a free tool for making crazy-ass 3D stuff without too much hassle. Or boring stuff too. You decide.". Basically drag and drop bits of things onto other things, and it makes the meshes fit. I think it will be great for baking clothes and stuff onto existing models.
Aqsis - RenderMan-compatible renderer, good for your cutscenes and as a replacement for Blender's built-in renderer.
Normal Mapper [win32] - Can generate normal maps from a high poly model and a low poly model.
RenderMonkey [win32] - Tool for prototyping shaders, with real time preview.

### Audio

LMMS - MIDI Tracking tool (making music), the best free one I've seen by miles. Looks like it would happily compete with the commercial options.
Open ModPlug Tracker - A Mod tracker, very primitive looking interface, but don't let that fool you, it's very capable.
Ardour [Unix, OSX] - Non-Linear-Editing for Audio. Also capable of real-time mixing. It's an industrial grade tool.
Audacity - Audio tool, great for cleaning up samples and the like.

### Video

Lightworks Public Beta [win32] - Industrial grade Non-Linear Video editing suite.

### All-in-One

Aviary [Web-based] - Looks like it does everything in this list except 3D modelling.

## Friday, March 11, 2011

### OpenGL Vertex Buffer Objects

It took me some time, but I've finally got them working the way I want, and now you can too. But first, an example...

Now you can see why that broken foot silhouette was so exciting for me. Now onto the making it happen for you.

### VBOs in a Nutshell

Traditionally Vertex Buffers were a place you could upload carefully formatted geometry to the video card, and that's about it. Then came the ARB_matrix_palette extension, and things got a bit exciting. Nowadays we have GLSL, and can put whatever we feel like in them. Let's take a look at how that's done...
// VBO init code
GLuint vbo = -1;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, num_vert * sizeof(vertex), vert_ptr, GL_STATIC_DRAW);

// Element Array Init Code
GLuint ebo = -1;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, quad_ptr, GL_STATIC_DRAW);

And to draw you call...
// Draw Code
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glVertexPointer(3, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
// count is the number of indices (4 per quad), not the size in bytes
glDrawElements(GL_QUADS, num_quad * 4, GL_UNSIGNED_SHORT, 0);

At least, that's what you would expect from reading the spec. And indeed this code may just render your pile of flat untextured polygons. On some cards.

This is where it gets fun. Buckle up.

Let's start with the VBO init code. Actually that's almost perfect, however a lone vertex position doesn't cut it. Now a float is 4 bytes, so a lone position is 12 bytes. We also want Normals, another 12 bytes, and a Texture Coordinate, 8 more bytes, bringing us to 32 bytes exactly. That is really convenient, as we will find out, but first allow me to segue onto skeletal animation techniques for a moment.

### Skeletal Animation

So you have 2 bones connected by a joint; let's imagine a knee. Now when you bend your knee, say 90°, everything from a bit below the knee moves in a straightforward 100% of 90° motion. It's the points along your knee that are tricky, otherwise you get nasty artifacts like this...

So we need a method of partially rotating the points along your knee, depending on how close they are to the joint, or whatever; we don't care, that's the modeller's problem. What we need to care about is how it's done: with Weights. So the bone will have a list of which points it affects, and to what extent it affects each of them. In the bad old scary days when this was all done in software, people used Quaternions. Quaternions can be used similarly to matrices, in that they can store rotation, and you can bash some vertices against a quaternion and it will spit out rotated vertices. Quaternions have the very cool property that it is possible to interpolate between 2 quaternions, and depending on the method you use, the result will correctly follow the plane you want it to rotate in. Quaternions also use fewer operations than Matrices do. Anyhow, these days we have hardware that does Matrix * vertex multiplication really fast, and the modern day trick is to use these weights to do weighted averaging of the points that are spit out by the separate matrices.
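To make the weighted averaging concrete, here is a minimal CPU-side sketch of it in C. The `vec3` and `mat4` types and the function names are made up for illustration, and the matrices are assumed to be column-major as OpenGL stores them; treat it as a sketch of the idea rather than the code the GPU version uses.

```c
#include <assert.h>

/* Hypothetical types for illustration; not part of the original code. */
typedef struct { float x, y, z; } vec3;
typedef struct { float m[16]; } mat4;   /* column-major, like OpenGL */

/* Transform a point by a 4x4 column-major matrix (w is assumed to be 1). */
static vec3 mat4_mul_point(const mat4 *a, vec3 p)
{
    vec3 r;
    r.x = a->m[0] * p.x + a->m[4] * p.y + a->m[8]  * p.z + a->m[12];
    r.y = a->m[1] * p.x + a->m[5] * p.y + a->m[9]  * p.z + a->m[13];
    r.z = a->m[2] * p.x + a->m[6] * p.y + a->m[10] * p.z + a->m[14];
    return r;
}

/* Weighted average of the positions produced by each influencing bone. */
static vec3 skin_point(const mat4 *bones, const unsigned char *bone_id,
                       const float *weight, int nbone, vec3 p)
{
    vec3 out = {0.0f, 0.0f, 0.0f};
    int i;

    assert(nbone >= 0);
    if (nbone == 0)
        return p;                       /* unskinned points pass through */

    for (i = 0; i < nbone; i++) {
        vec3 t = mat4_mul_point(&bones[bone_id[i]], p);
        out.x += weight[i] * t.x;
        out.y += weight[i] * t.y;
        out.z += weight[i] * t.z;
    }
    return out;
}
```

A point influenced half by a bone that stays put and half by a bone that moves 2 units along X ends up 1 unit along X, which is exactly the "partial" motion we want around the joint.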

### Back to our Vertex Buffer Object

So our nice friendly lovable artists have given us a model with a bunch of bones, which each reference a bunch of vertices and weights. We need to find out how many bones reference each vertex, and specifically find the largest number of bones referencing any single vertex. For the particular build of MakeHuman I'm using models from, this is 6.
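Counting that maximum is a one-pass job. The sketch below assumes a hypothetical `bone` record that simply lists the vertex indices it influences; the real exporter data will look different, so the struct and function names here are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bone record: just the vertex indices it influences. */
typedef struct {
    int        num_vert;   /* how many vertices this bone references */
    const int *vert;       /* indices into the mesh's vertex array   */
} bone;

/* Returns the largest number of bones referencing any single vertex,
 * or -1 if the scratch allocation fails. */
static int max_bones_per_vertex(const bone *bones, int num_bone, int num_vert)
{
    int *count;
    int b, i, most = 0;

    assert(num_bone >= 0 && num_vert > 0);
    count = (int *)calloc((size_t)num_vert, sizeof *count);
    if (count == NULL)
        return -1;

    for (b = 0; b < num_bone; b++) {
        for (i = 0; i < bones[b].num_vert; i++) {
            int v = bones[b].vert[i];
            assert(v >= 0 && v < num_vert);
            if (++count[v] > most)
                most = count[v];
        }
    }
    free(count);
    return most;
}
```

The result decides how many weight and index slots each vertex entry needs.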

ATI recommends that your vertex entries are multiples of 32 bytes, to speed up fetch operations. This is sound advice no matter what hardware you're using. So the vertex entry I'm uploading looks like this...
12 bytes of Vertex (3 floats)
12 bytes of Normal (3 floats)
8 bytes of TexCoord (2 floats)
24 bytes of Bone Weights (6 floats)
6 bytes of bone indices (1 byte each)
1 byte of number of bones actually referencing this vert
1 byte of zero padding
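As a sanity check, that layout can be written as a C struct and the offsets verified at run time. The struct and field names below are hypothetical; only the sizes and ordering come from the list above, and the check assumes 4-byte floats and no compiler-inserted padding (which holds here because every float field sits on a 4-byte boundary).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; sizes and order follow the 64-byte layout above. */
typedef struct {
    float         position[3];    /* offset  0, 12 bytes */
    float         normal[3];      /* offset 12, 12 bytes */
    float         texcoord[2];    /* offset 24,  8 bytes */
    float         bone_weight[6]; /* offset 32, 24 bytes */
    unsigned char bone_index[6];  /* offset 56,  6 bytes */
    unsigned char num_bones;      /* offset 62,  1 byte  */
    unsigned char pad;            /* offset 63,  1 byte  */
} skinned_vertex;

/* Aborts if the compiler did not lay the struct out the way the VBO expects. */
static void assert_skinned_vertex_layout(void)
{
    assert(sizeof(float) == 4);
    assert(sizeof(skinned_vertex) == 64);
    assert(offsetof(skinned_vertex, normal) == 12);
    assert(offsetof(skinned_vertex, texcoord) == 24);
    assert(offsetof(skinned_vertex, bone_weight) == 32);
    assert(offsetof(skinned_vertex, bone_index) == 56);
    assert(offsetof(skinned_vertex, num_bones) == 62);
}
```

Calling `assert_skinned_vertex_layout()` once at startup is cheap insurance before uploading an array of these with a stride of 64.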

Now that we have that settled, let's look at the final version of the VBO init code...
// VBO init code - Final
GLuint vbo = -1;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, num_vert * 64, vert_ptr, GL_STATIC_DRAW);

It's nice when things work out like that. Onto the Element Buffer. This is where things get a bit annoying. Here it is again to refresh your memory.
// Element Array Init Code
GLuint ebo = -1;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, quad_ptr, GL_STATIC_DRAW);

On my ATI 4870x2, that call to glBufferData(), while it does allocate the appropriate memory on the video card, doesn't actually copy any data. No problem, a quick read of the spec says we can do this...
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, num_quad * 8, quad_ptr);

No you can't. That call actually crashes the video driver on my card. Yeah not happy at all. All is not lost... here is the Element Buffer init code that actually works...
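// Element Array Init Code - Final (memcpy() is declared in <string.h>)
GLuint ebo;
void *tmp;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
// Allocate the storage only; the indices are copied in through the mapping
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, NULL, GL_STATIC_DRAW);
tmp = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
if (tmp != NULL) {
memcpy(tmp, quad_ptr, num_quad * 8);
glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
}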

### Last Wave of Dragons, I Promise

Here's the code I'm using to draw my mesh...
glBindBuffer(GL_ARRAY_BUFFER, vbo);
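// With a VBO bound, the last argument is a byte offset into the buffer, passed as a pointer
glVertexPointer(3, GL_FLOAT, 64, (const GLvoid *)0);
glNormalPointer(GL_FLOAT, 64, (const GLvoid *)12);
glTexCoordPointer(2, GL_FLOAT, 64, (const GLvoid *)24);
glVertexAttribPointer(bw1, 4, GL_FLOAT, GL_FALSE, 64, (const GLvoid *)32);
glVertexAttribPointer(bw2, 2, GL_FLOAT, GL_FALSE, 64, (const GLvoid *)48);
glVertexAttribPointer(bi1, 4, GL_BYTE, GL_FALSE, 64, (const GLvoid *)56);
glVertexAttribPointer(bi2, 4, GL_BYTE, GL_FALSE, 64, (const GLvoid *)60);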

glEnableVertexAttribArray(bw1);
glEnableVertexAttribArray(bw2);
glEnableVertexAttribArray(bi1);
glEnableVertexAttribArray(bi2);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, obj->ebo);
// 4 indices per quad
glDrawElements(GL_QUADS, num_quad * 4, GL_UNSIGNED_SHORT, 0);

glDisableVertexAttribArray(bw1);
glDisableVertexAttribArray(bw2);
glDisableVertexAttribArray(bi1);
glDisableVertexAttribArray(bi2);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisable(GL_TEXTURE_2D);

glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);

And here is the vertex shader I used...
uniform mat4 SkinMat[80];

attribute vec4 weight1;
attribute vec2 weight2;
attribute vec4 boneid1;
attribute vec4 boneid2;

varying vec4 pos;
varying vec3 normal;

void main()
{
normal = gl_NormalMatrix * gl_Normal;

pos = vec4(0,0,0,0);

vec4 weight = weight1;
vec2 w2 = weight2;
vec4 bone = boneid1;
vec4 b2 = boneid2;
int nbone = int(boneid2.w);

if(nbone==0)
{
pos = gl_Vertex;
}
else
{

for(int i=0; i<nbone; i++)
{
pos += weight.x * (SkinMat[int(bone.x)] * gl_Vertex);
// Rotate variables
weight = vec4(weight.yzw, w2.x);
w2.x = w2.y;
bone = vec4(bone.yzw, b2.x);
b2.x = b2.y;
}

}
gl_FrontColor = weight1;
gl_Position = gl_ModelViewProjectionMatrix * pos;
pos = gl_ModelViewMatrix * gl_Vertex;
}

So it's important to mention that all of those..
glEnableVertexAttribArray(bw1);
glEnableVertexAttribArray(bw2);
glEnableVertexAttribArray(bi1);
glEnableVertexAttribArray(bi2);

calls were preceded with...
bw1 = glGetAttribLocation(prog, "weight1");
bw2 = glGetAttribLocation(prog, "weight2");
bi1 = glGetAttribLocation(prog, "boneid1");
bi2 = glGetAttribLocation(prog, "boneid2");
SkinMatUnif = glGetUniformLocation(prog, "SkinMat");

Lastly I should mention this... looking at the spec may make you think that you can just throw whatever you want at the shader, and it's all good. Not so! While I'm sure one day there may be a device that supports every conceivable combination out there, something like...
glVertexAttribPointer(foo, 3, GL_BYTE, GL_FALSE, 64, 60);

Is going to end in tears. The short version is this: only expect it to work with the type and component-count combinations that can be sent individually, old-school style, as documented here. Want to send 3 GL_BYTEs to the video card? Is there a glVertexAttrib3b() function? No. Hope that helps.

## Monday, March 7, 2011

### Most Underwhelming Screenshot Ever

See that silhouette of a horribly broken foot? That's not just any broken foot, that's a fully hardware accelerated foot :-)

### Ummm....

Yes I get what you're thinking... who cares? Well this particular foot is attached to a model exported from MakeHuman into an OpenGL Vertex Buffer Object, with the full armature and all skinning calculated on the GPU. I've spent the past few days banging my head against the ATI drivers wondering what crack they were smoking, and now I finally understand it all after writing a VBO-free version (that still uses the skinning vertex shader).

### So....

This 13k poly model (all quads), with most of the bone structure implemented (I've got 70 bones, MakeHuman exports about 120 last I checked), currently uses about a single percentage point of my CPU (I have a 4870x2, and as such, don't care how much GPU it's using), and the video card doesn't need to spin up its fans to deal with this. The model data totals about 1 MB of video memory. I still have to write some code to "unzip" the texture seams (VBOs can't have multiple Texture Coordinates per Vertex). So, lots of work left, but all of the technical challenges have been conquered. Huzzah!

## Wednesday, January 5, 2011

### Octtrees for Two

Next time you're out having dinner with that special someone, why not strike up a romantic conversation about Octtrees? For just such an occasion, here is what I've figured out about the subject.

### Hors d'œuvres

An octtree can be thought of as a 3 dimensional binary tree. Typically in a tree the OO mentality would suggest that we create objects for each node, with pointers to the child nodes. But we're using C, and want to avoid all of that hassle. If only there was a clever way to avoid all of that setup. With an array.

First of all, we want to malloc() some memory for this. Each node in an octtree has 8 child nodes, so for an Octtree with 1 layer, there are 9 nodes (1 for the root, and 8 children). For 2 layers there are 73 nodes, and for 3 layers 585. But we're playing with Octtrees, surely there is a trick for figuring out these numbers. If we look at those numbers in Octal, things become clearer. $9_{10}$ in octal is $11_8$. $73_{10}$ in octal is $111_8$, $585_{10}$ in octal is $1111_8$. So we can safely say a $6_{10}$ layer octtree has $1111111_8$ nodes in it. One octal number takes 3 bits to represent, so we can cheat.
int ot_size(int depth)
{
   return 011111111111 & (0xFFFFFFFF >> (32 - (depth * 3 + 1)));
}

#### Code Explanation

Prefixing a zero is how you write in Octal in C. We're trimming the 0xFFF's down to the length we want, so we can get enough octal 1's for the size of octtree we want. (Depth * 3) is because each octal digit is 3 bits long.

#### Wait a minute...

Why the plus one? Let's take a tree with a root layer of 1 node, a first layer of 8 nodes and a second layer of 64 nodes. For calculating the size of the tree, it works best to describe this as a 3 layer octtree because we need 3 octal digits to count the number of nodes. Like the good C programmers we are, our layers are indexed starting from 0. So while the root node technically is a layer, it's a very special case. Later when I'm saying a "3 layer tree", you can pretend I'm saying "3 layers plus a root node".

Unfortunately we have imposed a limit on ourselves with this function, of having octtrees with no more than 10 layers in them. But if you really need an Octtree with more than 1,227,133,513 nodes in it, you'll be able to adjust the function yourself.
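If the bit trick feels a little too clever, it is easy to check against the boring version that adds up the layers one at a time. In this sketch `ot_size()` is the function from above with the 10 layer limit made explicit, and `ot_size_slow()` is a hypothetical reference implementation added purely for comparison.

```c
#include <assert.h>

/* The bit-trick version, with the supported range made explicit. */
static int ot_size(int depth)
{
    assert(depth >= 0 && depth <= 10);
    return (int)(011111111111u & (0xFFFFFFFFu >> (32 - (depth * 3 + 1))));
}

/* Reference version: 8^0 + 8^1 + ... + 8^depth, one layer at a time. */
static int ot_size_slow(int depth)
{
    int total = 0, layer_nodes = 1, i;

    assert(depth >= 0 && depth <= 10);
    for (i = 0; i <= depth; i++) {
        total += layer_nodes;
        if (i < depth)
            layer_nodes *= 8;   /* skip the final multiply to avoid overflow */
    }
    return total;
}
```

Both give 9, 73 and 585 for depths 1, 2 and 3, and agree all the way up to the 10 layer limit.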

If we create a 4 layer octtree, and want the offset in the array of the first element of the 4th layer,
ot_size(3);
will give it to us. Incidentally, the leaf layer contains 8 to the 4th power nodes, and along each dimension there are 2 to the 4th power nodes. We can prove this as

$8^4 = 2^4 \times 2^4 \times 2^4$
or
$4096 = 16 \times 16 \times 16$

### Main Course

Now that we have allocated our octtree, we want to start populating it with data. So we need a uniform method of mapping space to a branch in the tree. This borders on space-filling curves, but we can do something easier than that. As mentioned earlier, an octal digit takes 3 bits to store, one for each spatial dimension. We can use Morton Numbers for this. Here is a pair of functions that will do the job
int spread_bits(int x)
{
   x = (x | (x << 16)) & 0x030000FF;
   x = (x | (x <<  8)) & 0x0300F00F;
   x = (x | (x <<  4)) & 0x030C30C3;
   x = (x | (x <<  2)) & 0x09249249;
   return x;
}
int cell_offset(int x, int y, int z)
{
   return spread_bits(x)
      | (spread_bits(y) << 1)
      | (spread_bits(z) << 2);
}

So if our octtree is covering 1000.0f units of space (and is an even cube for simplicity), and is 4 layers deep, then each cell on the leaf node layer is $\dfrac{1000}{2^4} = 62.5$ units along each edge. So to find the node into which to insert our point of data (p), we do this
int x, y, z, offset;
x = (int)(p.x / 62.5f);
y = (int)(p.y / 62.5f);
z = (int)(p.z / 62.5f);
offset = cell_offset(x, y, z);

Now before you go and index into your array, don't forget you need to add the offset of the current layer to that.
int layer = ot_size(3);
int target = layer + offset;
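Stitched together, the steps so far look something like the sketch below. `ot_leaf_index()` is a hypothetical helper name, and it assumes the tree covers a cube from 0 to `world` on every axis, which is my simplification for the example.

```c
#include <assert.h>

/* Number of nodes in a tree with `depth` layers below the root. */
static int ot_size(int depth)
{
    assert(depth >= 0 && depth <= 10);
    return (int)(011111111111u & (0xFFFFFFFFu >> (32 - (depth * 3 + 1))));
}

/* Spread the low 10 bits of x so two zero bits sit between each of them. */
static int spread_bits(int x)
{
    x = (x | (x << 16)) & 0x030000FF;
    x = (x | (x <<  8)) & 0x0300F00F;
    x = (x | (x <<  4)) & 0x030C30C3;
    x = (x | (x <<  2)) & 0x09249249;
    return x;
}

/* Interleave three cell coordinates into one Morton number. */
static int cell_offset(int x, int y, int z)
{
    return spread_bits(x) | (spread_bits(y) << 1) | (spread_bits(z) << 2);
}

/* Hypothetical helper: array index of the leaf cell containing (px, py, pz)
 * in a tree `depth` layers deep covering [0, world) on each axis. */
static int ot_leaf_index(float px, float py, float pz, float world, int depth)
{
    assert(depth >= 1 && depth <= 10);

    int   cells = 1 << depth;              /* cells along each edge */
    float cell  = world / (float)cells;    /* edge length of a leaf */
    int   x = (int)(px / cell);
    int   y = (int)(py / cell);
    int   z = (int)(pz / cell);

    assert(x >= 0 && x < cells);
    assert(y >= 0 && y < cells);
    assert(z >= 0 && z < cells);

    /* layer offset + cell offset, exactly as in the snippets above */
    return ot_size(depth - 1) + cell_offset(x, y, z);
}
```

For the 1000 unit, 4 layer tree, the point (100, 200, 300) lands in cell (1, 3, 4), whose Morton number is 275, so it goes in array slot 585 + 275 = 860.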

Now you're probably wondering why I'm keeping the layer offset and the cell offset in separate variables. Here's why...

### Dessert

As I've been harping on about, octal numbers take 3 bits. And I've rather conspicuously avoided talking about tree traversal. That is because we are going to traverse the tree from bottom to top. So that node we've just populated with data, we want to find its parent node. Couldn't be easier.
layer = layer >> 3;
offset = offset >> 3;
target = layer + offset;
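Repeat that shift until the layer offset hits zero and you have walked all the way up to the root. Here is a small sketch of that loop; `ot_path_to_root()` is a hypothetical name, and it takes the layer offset and cell offset as separate arguments for the reason just described.

```c
#include <assert.h>

/* Hypothetical helper: records the array index of a node and each of its
 * ancestors, ending with the root (index 0). Returns the number written. */
static int ot_path_to_root(int layer, int offset, int *path, int max_len)
{
    int n = 0;

    assert(layer >= 0 && offset >= 0 && max_len > 0);
    for (;;) {
        assert(n < max_len);
        path[n++] = layer + offset;
        if (layer == 0)
            break;              /* layer offset 0 means we are at the root */
        layer  >>= 3;           /* start of the parent's layer   */
        offset >>= 3;           /* parent's Morton number        */
    }
    return n;
}
```

Starting from the leaf at layer offset 585 and cell offset 275, the walk visits slots 860, 107, 13, 1 and finally 0.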

Want to find the neighboring node of the leaf we were just looking at? With layer still holding the leaf layer's offset:
target = layer + cell_offset(x+1, y, z);

### After Dinner Mint

These techniques apply to n-dimensional binary trees. Want a Quad-Tree? Morton numbers with 2 variables is the canonical example. A Binary tree? Morton numbers with 1 variable are just the same number. Enjoy!
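For the curious, here is roughly what the Quad-Tree flavour looks like. Each level now needs 2 bits instead of 3, so the size mask becomes alternating bits and the spread leaves one zero between bits instead of two. The names `qt_size`, `spread_bits2` and `quad_offset` are mine, and the 15 layer limit is an assumption that keeps everything inside a 32-bit int.

```c
#include <assert.h>

/* Nodes in a quad-tree with `depth` layers below the root: 1 + 4 + 16 + ... */
static int qt_size(int depth)
{
    assert(depth >= 0 && depth <= 15);
    return (int)(0x55555555u & (0xFFFFFFFFu >> (32 - (depth * 2 + 1))));
}

/* Spread the low 16 bits of x so one zero bit sits between each of them. */
static unsigned spread_bits2(unsigned x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Interleave two cell coordinates into one Morton number. */
static int quad_offset(unsigned x, unsigned y)
{
    assert(x < 32768u && y < 32768u);   /* 15 bits each, so the result fits */
    return (int)(spread_bits2(x) | (spread_bits2(y) << 1));
}
```

Finding a parent works the same way as before, except that you shift the layer offset and cell offset right by 2 instead of 3.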

### Fluid Simulation Goodness

Just got the fluid simulation code I've been working on to the point where it works correctly. Until I get something more impressive, enjoy this video of a dot vortex!

If you're interested in playing with this yourself, check out the excellent articles by Dr Michael Gourlay over at the Intel Software site.

I should note, this simulation has 27,000 particles in it, and runs at 60fps while using about 60% of one of my cores (on a 3.33 GHz Core 2 Duo). That's without any optimisation, and with full debug symbols compiled in.