## Thursday, November 22, 2012

### One Shiny Dragon + Source

This is a view of a hollow shell of the XYZ RGB Asian Dragon at 1024^3, rendered with what I've written so far of my attempt to implement Cyril Crassin's GigaVoxels method. At the moment it would be optimistic to say I have partly implemented a quarter of it, but it's capable of drawing shiny dragons, and I'd like to share them :-) The original dragon has a tad more detail than 1024^3 can store, which I intend to address in the next phase of the method, mip-mapping. In the meantime, I for one always get sad when I read a blog post about something really cool and can't play with it myself.

GitHub - Win32 Binaries - zlib licensed.
This should build with minimal fiddling on your favourite Linux distro. I use Debian, but it only depends on X and zlib, so it should be easy to build elsewhere. It also loves clang. It's written in pure C99, or at least that's what I tell the compiler.

### Going Deeper...

To all ye who explore the source code, fair warning: it's a hatchet job. With that in mind, I hope it makes a kind of sense. The first thing you'll want to look at is the Makefile. convert and convertoct convert simple Wavefront OBJ files, as output by Blender, into a format suitable for throwing at a video card. It's worth grabbing the Win32 binaries for the included data file, as it's not in the git repo.

### convert and polyview

convert turns the OBJ file into the format that polyview expects for a VBO. polyview loads the file into memory, tells OpenGL where it is in memory and how it's formatted, and runs a simple display loop. It's not terribly clever, but it does push all of the load onto the video card, leaving the CPU idle in my tests.
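As a rough illustration of the OBJ-to-vertex-buffer step, here is a minimal sketch of pulling positions and normals out of OBJ text and interleaving them. It is not the code from convert: the `Vertex` layout, the function names, and the assumption that the i-th position pairs with the i-th normal are all mine, chosen to keep the example short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical interleaved layout: position followed by normal. */
typedef struct {
    float pos[3];
    float nrm[3];
} Vertex;

/* Parse one OBJ line. Returns 'v' for a position, 'n' for a normal,
 * or 0 for anything else (faces, comments, texture coordinates). */
static int parse_obj_line(const char *line, float out[3])
{
    if (sscanf(line, "v %f %f %f", &out[0], &out[1], &out[2]) == 3)
        return 'v';
    if (sscanf(line, "vn %f %f %f", &out[0], &out[1], &out[2]) == 3)
        return 'n';
    return 0;
}

/* Fill verts[] from an array of OBJ lines.
 * Assumption: the i-th "v" line pairs with the i-th "vn" line.
 * Returns the number of vertices that have both a position and a normal. */
static size_t build_vertices(const char *const *lines, size_t n_lines,
                             Vertex *verts, size_t max_verts)
{
    size_t n_pos = 0, n_nrm = 0;
    for (size_t i = 0; i < n_lines; i++) {
        float v[3];
        int kind = parse_obj_line(lines[i], v);
        if (kind == 'v' && n_pos < max_verts)
            memcpy(verts[n_pos++].pos, v, sizeof v);
        else if (kind == 'n' && n_nrm < max_verts)
            memcpy(verts[n_nrm++].nrm, v, sizeof v);
    }
    return n_pos < n_nrm ? n_pos : n_nrm;
}
```

An array of `Vertex` like this is the sort of thing that can be handed to OpenGL as one buffer, with the stride set to `sizeof(Vertex)`. Real OBJ files index positions and normals separately in their face lines, which this sketch ignores.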

### convertoct and octview

convertoct produces a Sparse Voxel Octree by discarding the polygons of a mesh file produced by convert, and throwing the points (and their normals) into a buffer. Hatchet job. So this will only work with models that have a uniform vertex density, which mostly means 3D scanned objects (like the dragon, and the bunny).
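To give a feel for the points-into-octree idea, here is a minimal sketch that inserts voxel coordinates into a flat node pool. The `Node` and `Octree` types and the function names are hypothetical, and the real convertoct output format is different (it also keeps the normals, which this sketch drops).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: eight child indices into a flat pool.
 * Index 0 is the root, so 0 doubles as "no child". */
typedef struct {
    uint32_t child[8];
} Node;

typedef struct {
    Node  *nodes;
    size_t count;
    size_t capacity;
} Octree;

static int octree_init(Octree *t)
{
    t->capacity = 64;
    t->count = 1;                       /* slot 0 is the root */
    t->nodes = calloc(t->capacity, sizeof(Node));
    return t->nodes != NULL;
}

/* Append an empty node, growing the pool if needed. Returns 0 on failure. */
static uint32_t octree_alloc(Octree *t)
{
    if (t->count == t->capacity) {
        size_t cap = t->capacity * 2;
        Node *grown = realloc(t->nodes, cap * sizeof(Node));
        if (!grown)
            return 0;
        t->nodes = grown;
        t->capacity = cap;
    }
    memset(&t->nodes[t->count], 0, sizeof(Node));
    return (uint32_t)t->count++;
}

/* Mark the voxel at integer coordinates (x, y, z) as occupied.
 * Each coordinate must be in [0, 2^depth); depth 10 gives 1024^3. */
static int octree_insert(Octree *t, uint32_t x, uint32_t y, uint32_t z,
                         int depth)
{
    uint32_t node = 0;
    for (int level = depth - 1; level >= 0; level--) {
        /* One bit per axis picks which of the eight children to enter. */
        uint32_t octant = ((x >> level) & 1u)
                        | (((y >> level) & 1u) << 1)
                        | (((z >> level) & 1u) << 2);
        uint32_t next = t->nodes[node].child[octant];
        if (next == 0) {
            next = octree_alloc(t);
            if (next == 0)
                return 0;
            t->nodes[node].child[octant] = next;
        }
        node = next;
    }
    return 1;
}
```

Only the branches that actually contain a point get nodes, which is where the "sparse" comes from. It also shows why uniform vertex density matters: any voxel that no vertex lands in simply stays empty, so a mesh with big flat polygons would come out full of holes.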

### The Data

A 1024^3 cube is big. Out of the box, we're talking about a gigabyte of data if we're only using a single byte per voxel, but I'm using 4 floats (16 bytes) per voxel, which would come to 16GB. That's far more video memory than we have. Well, it's sparse, which brings it down to 900MB. The texture data is zipped up with zlib, which brings the data file down to a much more reasonable 17MB. It still extracts to 900MB of RAM though. Due to ?!?!?, the current Nvidia drivers don't like me allocating 900MB of texture memory on Windows 7, so the drivers convert it to 16-bit floats, bringing it down to 450MB, which they're quite happy with. This doesn't seem to affect performance in this instance, as I'm already abusing the video card in plenty of other ways. Update: 64-bit executables solve this problem. I've updated the binary, and it now allocates the full 900MB.
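The arithmetic behind those numbers is easy to check. A small helper, assuming 4-byte floats (the function name is just for illustration):

```c
#include <stdint.h>

/* Bytes needed for a dense side^3 grid at the given bytes per voxel. */
static uint64_t dense_grid_bytes(uint64_t side, uint64_t bytes_per_voxel)
{
    return side * side * side * bytes_per_voxel;
}
```

`dense_grid_bytes(1024, 1)` is 1073741824, the gigabyte mentioned above, and `dense_grid_bytes(1024, 16)` is 17179869184, or 16 times that. Against the dense 16-bytes-per-voxel figure, the 900 that actually gets stored works out to roughly a twentieth.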

### The Build System & Porting

The idea is that win32.c and x11.c are "shivs": all platform-dependent things live in there, they obey their respective OS's rules, and they toggle fullscreen when you press F11. They call main_* functions in polyview.c or octview.c, depending on which one is linked. polyview.c has almost no intelligence at all; it's just a test mesh displayer. octview.c is marginally better, providing simple FPS controls and handballing off to voxel.c. Making an osx.c shiv should be pretty easy, and I'm fairly sure I already have one lying around from an older project.
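The shape of that split looks roughly like the sketch below. The names `main_init`, `main_loop`, `main_end` and `platform_run` are made up for the example (all that's promised above is the main_* prefix), and both halves are squeezed into one file here, where the real thing keeps them in separate .c files and lets the linker pick the viewer.

```c
/* ---- viewer side: stand-in for polyview.c or octview.c ---- */
static int frames_drawn = 0;

int main_init(void)
{
    frames_drawn = 0;
    return 0;                 /* 0 = success */
}

void main_loop(void)
{
    frames_drawn++;           /* a real viewer would draw one frame here */
}

void main_end(void)
{
}

/* ---- platform side: stand-in for win32.c or x11.c ---- */
int platform_run(int max_frames)
{
    if (main_init() != 0)
        return 1;
    for (int i = 0; i < max_frames; i++) {
        /* a real shiv would pump OS events here (F11, keys, resize) */
        main_loop();
    }
    main_end();
    return 0;
}
```

The point of the arrangement is that the viewer never includes a platform header, so a new port only has to supply its own version of the bottom half.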

### voxel.c

This is where we find the initialisation: the texture block is decompressed with zlib and loaded into a plain old OpenGL 3D floating point texture. The Node Pool is loaded into the format that Nvidia requires when you're planning on breaking the legs of the GLSL standard, like I am doing here. The main brains of the program are in the fragment shader, which this file dutifully loads and compiles, ready for the GPU to use. The last thing in this file is a few lines that update variables for the shader, such as the camera position, and then render a single flat polygon covering the whole screen. As all of the work is done in the fragment shader, it will be called once per pixel on the screen (unless you're using antialiasing, in which case it will run horribly slowly for no visual benefit at all, since this rendering method is aliasing-free to begin with). OpenGL experts will marvel at how primitively I'm drawing the quad, and possibly be driven to self-harm. Pressing the R key will reload the shader from disk, which is useful for debugging.
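The reload-from-disk part boils down to slurping a text file into a string each time the key is pressed. A minimal sketch of that step, with a hypothetical helper name and no OpenGL in it:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a freshly malloc'd, NUL-terminated buffer.
 * Returns NULL on failure; the caller frees the result. */
static char *read_text_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    char *buf = malloc((size_t)size + 1);
    if (!buf) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }
    buf[got] = '\0';
    if (len_out)
        *len_out = got;
    return buf;
}
```

The returned string is the kind of thing you hand to `glShaderSource` before calling `glCompileShader`. Returning NULL on any failure means a half-saved or missing shader file can be ignored and the previous program kept running.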

### render.frag

To understand this code you'll really have to read Cyril Crassin's PhD thesis, which can be found here. The quick version is that for each pixel it traces a path through the Sparse Voxel Octree that was built by convertoct. There's nothing more complicated than that going on, so there's currently no point comparing it to GigaVoxels. Also, it does it in GLSL, as opposed to a real GPU compute language, so mine is an amusing abuse of GPU power, at best. For now...
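For the curious, here is the general idea of walking a ray through an octree, written as plain CPU-side C rather than GLSL. It is a simplified sketch and not a port of render.frag: the `Node` layout, the unit-cube coordinates, the small `eps` nudge and the step limit are all assumptions made for the example.

```c
#include <float.h>
#include <stdint.h>

typedef struct {
    uint32_t child[8];   /* pool index of each child, 0 = empty */
} Node;

/* Walk down from the root towards point p, which lies in [0,1)^3.
 * Returns 1 if a node exists at full depth (a solid voxel).
 * Otherwise returns 0 and reports the empty cell that contains p. */
static int descend(const Node *pool, int depth, const float p[3],
                   float cell_min[3], float *cell_size)
{
    uint32_t node = 0;
    float size = 1.0f;
    cell_min[0] = cell_min[1] = cell_min[2] = 0.0f;
    *cell_size = size;
    for (int level = 0; level < depth; level++) {
        size *= 0.5f;
        int octant = 0;
        for (int a = 0; a < 3; a++) {
            if (p[a] >= cell_min[a] + size) {
                octant |= 1 << a;
                cell_min[a] += size;
            }
        }
        *cell_size = size;
        node = pool[node].child[octant];
        if (node == 0)
            return 0;
    }
    return 1;
}

/* March a ray that starts inside the unit cube. Each step looks up the
 * cell under the current point and, if it is empty, jumps to where the
 * ray leaves that cell. Returns 1 and the distance on a hit. */
static int trace(const Node *pool, int depth, const float origin[3],
                 const float dir[3], float *t_hit)
{
    const float eps = 1e-4f;   /* nudge past the cell boundary */
    float t = 0.0f;
    for (int step = 0; step < 1024; step++) {
        float p[3], cell_min[3], size;
        for (int a = 0; a < 3; a++) {
            p[a] = origin[a] + t * dir[a];
            if (p[a] < 0.0f || p[a] >= 1.0f)
                return 0;      /* left the volume */
        }
        if (descend(pool, depth, p, cell_min, &size)) {
            *t_hit = t;
            return 1;
        }
        float t_exit = FLT_MAX;
        for (int a = 0; a < 3; a++) {
            float d;
            if (dir[a] > 0.0f)
                d = (cell_min[a] + size - p[a]) / dir[a];
            else if (dir[a] < 0.0f)
                d = (cell_min[a] - p[a]) / dir[a];
            else
                continue;      /* parallel to this axis, never exits it */
            if (d < t_exit)
                t_exit = d;
        }
        if (t_exit == FLT_MAX)
            return 0;          /* zero-length direction */
        t += t_exit + eps;
    }
    return 0;
}
```

Big empty regions get skipped in a single jump, which is what makes the sparse structure pay off at render time. The per-axis division and the boundary nudge are exactly the sort of place where edge cases like to hide.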

## Tuesday, November 20, 2012

### Voxel Rendering Update

Here's the current progress with simple diffuse lighting. Viewing at the higher resolutions will reveal that I've still got some edge cases to go, quite literally. I think I've got some divide-by-zeros hiding in the block stepping code, which is why you can see the edges of the voxel blocks. The video is recorded at 30fps, but while I was capturing it the renderer never dropped below 40fps, and most of the time it held the constant 60fps my old LCD permits, at 1920x1200 with 8 bits per colour channel. This is on a GTX680, and I'm feeling rather chuffed not only that this runs smoothly, but that it runs smoothly while recording HD video too :-)