Thursday, November 22, 2012

One Shiny Dragon + Source

This is a view of a hollow shell of the XYZ RGB Asian Dragon at 1024^3, using what I've written so far of my attempt to implement Cyril Crassin's GigaVoxel method. At the moment it would be optimistic to say I have partly implemented a quarter of it, but it's capable of drawing shiny dragons, and I'd like to share them :-) The original dragon has a tad more detail than 2048^3 can store, which I intend to address in the next phase of the method, mip-mapping. In the meantime, I for one always get sad when I read a blog post about something really cool and can't play with it myself.

GitHub - Win32 Binaries - zlib licensed.
This should build with minimal fiddling on your favourite Linux distro. I use Debian, but it only depends on X and zlib, so it should be easy to build anywhere. It also loves clang. It's written in pure C99, at least that's what I tell the compiler.



Going Deeper...


To all ye who explore the source code, fair warning: it's a hatchet job. With that in mind, I hope it makes a kind of sense. The first thing you'll want to look at is the Makefile. convert and convertoct convert simple Wavefront OBJ files, as output by Blender, into a format suitable for throwing at a video card. It's worth grabbing the win32 binaries for the included data file, as it's not in the git repo.

convert and polyview


convert turns the OBJ file into the format that polyview expects for a VBO. polyview loads the file into memory, tells OpenGL where it is in memory and how it's formatted, and runs a simple display loop. It's not terribly clever, but it does push all of the load onto the video card, leaving the CPU idle in my tests.

convertoct and octview


convertoct produces a Sparse Voxel Octree by discarding the polygons of a mesh file produced by convert, and throwing the points (and their normals) into a buffer. Hatchet Job. So this will only work with models that have a uniform vertex density, which mostly means 3D scanned objects (like the dragon, and the bunny).

The Data


A 1024^3 cube is big. Out of the box we're talking about a gigabyte of data if we only use a single byte per voxel, but I'm using 4 floats per voxel, which makes a dense cube 16 times bigger again. That means we don't have enough video memory. Well, it's sparse, which brings it down to 900MB. The texture data is zipped up with zlib, which brings the data file down to a much more reasonable 17MB. It still extracts to 900MB of RAM though. Due to ?!?!?, the current Nvidia drivers don't like me allocating 900MB of texture memory on Windows 7, so the drivers convert it to 16-bit floats, bringing it down to 450MB, which it's quite happy with. This doesn't seem to affect performance in this instance, as I'm already abusing the video card in plenty of other ways. Update: 64-bit executables solve this problem. I've updated the binary and now it does allocate the full 900MB.
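For anyone who wants to check the sums, here is a small sketch of the dense-grid arithmetic. The function name is made up for this post, and it assumes the floats are the usual 32-bit kind (which fits with the 16-bit conversion halving the footprint). It only covers the dense case; the sparse 900MB figure depends on the model and can't be derived this way.

```c
#include <stdint.h>

/* Bytes needed for a dense cubic voxel grid: side^3 voxels, each holding
 * `channels` values of `bytes_per_channel` bytes.
 * Hypothetical helper, not part of the project source. */
uint64_t dense_grid_bytes(uint64_t side, uint64_t channels,
                          uint64_t bytes_per_channel)
{
    return side * side * side * channels * bytes_per_channel;
}
```

Calling dense_grid_bytes(1024, 1, 1) gives exactly 1 GiB, and dense_grid_bytes(1024, 4, 4) gives 16 GiB, which is why the sparse representation matters so much.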

The Build System & Porting


The idea is that win32.c and x11.c are "shivs": all platform-dependent things are in there, they obey their respective OS's rules, and they toggle fullscreen when you press F11. They call main_* functions in polyview.c or octview.c, depending on which one is linked. polyview.c has almost no intelligence at all, it's just a test mesh displayer. octview.c is marginally better, providing simple FPS controls and handballing off to voxel.c. Making an osx.c shiv should be pretty easy, and I'm fairly sure I already have one lying around from an older project.

voxel.c


This is where we find the initialisation. The texture block is decompressed with zlib and loaded into a plain vanilla OpenGL 3D floating point texture. The Node Pool is loaded into the format that Nvidia requires when you're planning on breaking the legs of the GLSL standard, like I am doing here. The main brains of the program are in the fragment shader, which this file dutifully loads and compiles, ready for the GPU to use. The last few lines in this file update variables for the shader, such as the camera position, and then render a single flat polygon covering the whole screen. As all of the work is done in the fragment shader, it will be called once per pixel on the screen (unless you're using antialiasing, in which case it will run horribly slowly for no visual benefit at all, as this rendering method is aliasing free to begin with). OpenGL experts will marvel at how primitively I'm drawing the quad, and possibly be driven to self harm. Pressing the R key will reload the shader from disk, which is useful for debugging.
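The reload-on-R trick boils down to re-reading the shader source from disk and compiling it again. Below is a minimal sketch of the disk half only, using nothing but the C standard library. The function name load_text_file is made up for illustration and is not what voxel.c calls it. The returned string is what you would then hand to glShaderSource and glCompileShader, which need a live OpenGL context and so are left out here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a freshly allocated, NUL-terminated buffer.
 * Returns NULL on any failure; the caller frees the result.
 * Hypothetical helper, not the function used in voxel.c. */
char *load_text_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(f); return NULL; }

    /* Terminate after however many bytes were actually read. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    buf[got] = '\0';
    return buf;
}
```

Because the source is re-read every time, you can edit the shader in a text editor while the viewer is running and see the result on the next key press.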

render.frag


To understand this code you'll really have to read Cyril Crassin's PhD thesis, which can be found here. The quick version is that for each pixel it traces a path through the Sparse Voxel Octree that was built by convertoct. There's nothing more complicated than that going on, so there's currently no point comparing it to GigaVoxels. Also it does it in GLSL, as opposed to a real GPU compute language, so mine is an amusing abuse of GPU power, at best. For now...
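To give a flavour of what "tracing a path through the octree" involves, here is a rough sketch of a single point lookup, written in C rather than GLSL. This is not the code in render.frag, and the node layout (eight child indices per node, with 0 meaning "empty") is an assumption made purely for illustration; the real Node Pool format is different. A renderer repeats something like this at every step along the ray.

```c
#include <stdint.h>

/* Hypothetical node layout: eight child indices into the pool.
 * Index 0 is the root, so a child index of 0 can safely mean "empty". */
typedef struct { uint32_t child[8]; } Node;

/* Descend from the root towards the point (x, y, z), each coordinate in
 * [0, 1). Returns the index of the deepest existing node on the way and
 * stores how many levels were descended in *depth_out. */
uint32_t octree_descend(const Node *pool, float x, float y, float z,
                        int max_depth, int *depth_out)
{
    uint32_t node = 0;
    int depth = 0;
    while (depth < max_depth) {
        /* One bit per axis, set when the point is in the upper half. */
        int ix = x >= 0.5f, iy = y >= 0.5f, iz = z >= 0.5f;
        uint32_t next = pool[node].child[ix | (iy << 1) | (iz << 2)];
        if (next == 0) break;            /* empty space: stop here */
        /* Rescale the point into the chosen child's own [0, 1) cube. */
        x = x * 2.0f - (float)ix;
        y = y * 2.0f - (float)iy;
        z = z * 2.0f - (float)iz;
        node = next;
        depth++;
    }
    *depth_out = depth;
    return node;
}
```

The depth at which the descent stops tells you how big the empty region around the point is, which is what lets a ray skip large chunks of nothing in one step.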

Tuesday, November 20, 2012

Voxel Rendering Update


Here's the current progress with simple diffuse lighting. Viewing at the higher resolutions will reveal that I've still got some edge cases to go, quite literally. I think I've got some divide-by-zeros hiding in the block stepping code, which is why you can see the edges of the voxel blocks. The video is recorded at 30fps, but during capture the renderer never dropped below 40fps, and most of the time it held the constant 60fps my old LCD permits. That's at 1920x1200, 8 bits per colour channel. This is on a GTX680, and I'm feeling rather chuffed that not only does this run smoothly, but that it runs smoothly while recording HD video too :-)

Sunday, May 6, 2012

Block-Edge Issues Sorted

I've worked out all of the block edge related issues. Most of them were solved by the neighbour border copy method, and the remaining ones were fixed by carrying over the step remainder between block rays. The artefacts visible at the moment are because of the crude way I'm converting the dragon, and also his very low resolution. I'm currently sampling him in a virtual block of 256^3 voxels, while he's natively closer to 2000^3 voxels. I've also implemented proper camera transformation, so now you can move around the dragon and view it from any angle. Next, on with the lighting...

Wednesday, May 2, 2012

First recognisable images out of the Voxel engine

I've been merrily hacking away after reading Cyril Crassin's thesis, and finally have my first screenshot that looks like something. This is coded in C and GLSL only so far. It's a sparse octree voxel based volume raycasting routine. It *really* hammers the GPU, but amazingly at 1024x768 this pulls somewhere over 20fps on my Nvidia GT520. At 1920x1200 it runs at around 5fps. Considering this is the first version I'm rather pleased with myself. I've got it building on Windows and Linux.

Monday, April 2, 2012

Render test of the XYZ-RGB Dragon

This fella is the XYZ RGB "Asian Dragon", courtesy of The Stanford 3D Scanning Repository. The original file is a Stanford PLY file, which I converted to Wavefront OBJ with Blender after rotating him 90 degrees around his X axis and scaling him. The OBJ file was about 230MB, then I sent it through a tool I wrote to convert OBJ files into the format I put them into the video memory in. He has 3,609,455 vertices, 7,218,906 triangles, and 3.6M normals, totalling 193MB of VRAM. The first 3 hard drives I owned combined weren't that big. I generated per face normals, compiled a per vertex list of which faces reference which vertex, then went through that list averaging the normals to produce per vertex normals. Converting him takes about 12 seconds on my E8600 w/ DDR2; amusingly, about 8 of those seconds are spent inside atof(). He renders at about 4FPS on my Nvidia GT520. Note the CPU usage up the top. Pretty good for a $30 video card if you ask me. Let's see if my current project can do something about that framerate...

Sunday, February 19, 2012

OpenCL Compilation Tool

I wrote a simple OpenCL compiler interface tool, partially to sink my teeth into OpenCL, partially because I wanted it. Also it seems I'm one of the few who's tackling OpenCL from a pure C angle, so this code may be useful to other folks going down that path.

It should be noted that there is already a tool that does this, clcc, which supports... well... options. All of them, it seems. It's written in C++, uses Boost, etc. I encourage folks to use that one instead of mine, it's much better. But if you're a fan of KISS, then you might want to check out mine. But clcc is better. On with the show...

Invoking it displays its usage...

D:\code\opencl\ocl>ocl
OpenCL Compiler Test Tool
(c) 2012 Daniel Burke - dan.p.burke@gmail.com
USAGE:
        ocl
                display arguments and devices
        ocl filename
                compile filename with first device
        ocl N filename
                compile filename with Nth device

The Device list is...
-----------------------
AMD Accelerated Parallel Processing
   0    ATI RV770
   1    ATI RV770
   2    Intel(R) Core(TM)2 Duo CPU     E8600  @ 3.33GHz
Intel(R) OpenCL
   3    Intel(R) Core(TM)2 Duo CPU     E8600  @ 3.33GHz

If compilation works, it tells you how many bytes the binary is. If it fails, it outputs the compiler messages. That's it. It's a single C file, and the linked archive includes a Makefile for MinGW and a binary for Windows. It's so simple you should have *no* problems building it on anything. It compiles with gcc and clang with -Wall with no warnings. What more could you want? Features? Use clcc.