David Piuva's Software Renderer

This renderer is harder to use in the beginning, because you have to learn CPU optimization to make your own real-time filters, but much easier to maintain in the long run, because it has no heavy dependencies scattered everywhere in the system. It has most of the basic features for 2D, 3D and isometric rendering, plus some GUI and window management to make it easier to use. If a new operating system comes out, you just need to write a new window backend that uploads the canvas image and handles basic mouse and keyboard interaction.

Image of gloomy dungeon rendered using

I've just begun optimizing, but the Sandbox SDK example's performance is currently:
720x480 (native resolution) @ 470 FPS on Intel Core i5-9600K
800x600 (native resolution) @ 453 FPS on Intel Core i5-9600K
960x540 (upscaled to 1920x1080 on the CPU) @ 325 FPS on Intel Core i5-9600K
1920x1080 (native resolution) @ 185 FPS on Intel Core i5-9600K

To get it this fast, use an equivalent CPU on Linux, set the desktop's resolution manually (to a resolution you know that your screen can handle), press F11 to enter full-screen and press R to disable ambient light.

So why don't we just use the same technique on a GPU to make it even faster?
Using the same technique on the GPU will actually slow rendering down, because the fixed-function hardware cannot be turned off. Ordering memory for linear cache reads does nothing on a GPU because its cache is multi-dimensional. GPUs dislike sampling transparent pixels, so pre-rendering actually hurts GPU performance by doing more work in total. Reading depth per pixel from an image disables the GPU's quad-tree optimization. In other words, the GPU cannot beat the CPU for isometric rendering when it comes to speed, and speed is all that GPUs are good at.

Disadvantages of software rendering

Image of black cube with yellow corners

* CPUs cannot beat GPUs for what they are actually made for, rendering complex 3D graphics quickly in high resolutions. But if you're mostly writing retro 2D games, this point makes little sense when your framerate is limited by the monitor either way.

* GPUs are easier to optimize shaders for, because the graphics card will just throw more calculation power at the problem at a lower frequency. Learning to optimize image filters on a CPU will however be a useful skill if you plan to work with safety-critical computer vision.

* GPUs can work while the CPU does other things. This point only applies if both rendering and game logic are heavy. Most games that are heavy on game logic are just poorly optimized from relying on scripts in performance bottlenecks, thrashing heap memory and abusing generic physics engines.

Benefits to software rendering

* No graphics drivers required. I once wrote a text editor that required the latest version of Direct3D just to start, only to realize that the best looking effects were pixel-exact 2D operations. I felt really stupid, but then began thinking about writing a software renderer for the majority of software where GPU graphics is total overkill and a huge burden for long-term maintenance.

* Future proof without any missing libraries. The first software-rendered applications I wrote for Windows 3.1 still work, both in newer versions of Windows and through compatibility layers on Linux. My first 3D accelerated games developed for Windows 2000 stopped working when Windows XP came out just a few years later, due to a bug in a third-party dependency that I could not access. This modern software renderer uses all that experience to give you both reliable programs and good looking graphics.

* Pre-rasterized isometric rendering techniques are actually 2D operations under the hood, which puts the CPU on par with the GPU even if the GPU uses the same optimization. When both are capable of displaying more triangles than pixels at a higher frequency than the monitor can display, the only remaining advantage for the GPU is multi-tasking between CPU and GPU.

* CPU rendering can be more deterministic by linking statically to all rendering algorithms. If only using integer types, it can be 100% bit exact between different computers. Higher determinism also unlocks optimizations that would be too risky on old OpenGL versions, such as dirty rectangles and passive rendering.

* CPUs have a higher frequency, which means that they are actually faster than a GPU for low-resolution 2D rendering where the amount of work per draw call is not significant enough to benefit from the GPU. You can reduce the number of draw calls on a GPU using hardware instancing, but it's much easier to just use CPU rendering with a lower call overhead and keep your code well structured.

* No graphics context required, just independent resources. This allows separating your program into completely independent modules without strange side-effects, which improves testability and quality.

* No feature flags. Every computer has all features, so that you don't have to waste time writing a fallback solution for every feature.

* No device lost exceptions. The CPU will not randomly tell you that it had amnesia and lost all your data.

* No need to mirror changes between CPU and GPU memory with complex synchronization methods to hide the delays of memory transfer. The cache system handles all that for you.

* Can modify the whole graphics pipeline without having to build your own graphics card. This allows learning more about how computer graphics works under the hood.
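
The points about bit-exact integer math and dirty rectangles can be illustrated with a minimal sketch. The names blendChannel, DirtyRect and mergeDirty are made up for this example and are not taken from the library. It only assumes 8-bit color channels and rectangles with exclusive right and bottom edges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Blend one 8-bit channel of a source over a destination using only integers.
// alpha is 0..255, where 255 means a fully opaque source.
// No floating-point rounding is involved, so every computer gets the same bits.
inline uint8_t blendChannel(uint8_t dst, uint8_t src, uint8_t alpha) {
    // Weighted sum in the range 0..65025, followed by a rounded division by 255.
    uint32_t sum = uint32_t(src) * alpha + uint32_t(dst) * (255u - alpha);
    return uint8_t((sum + 127u) / 255u);
}

// Hypothetical rectangle type, where right and bottom are exclusive.
struct DirtyRect {
    int32_t left, top, right, bottom;
};

inline bool isEmpty(const DirtyRect &r) {
    return r.right <= r.left || r.bottom <= r.top;
}

// Grow the dirty region to also cover a newly changed area,
// so that only this bounding region has to be redrawn and uploaded.
inline DirtyRect mergeDirty(const DirtyRect &a, const DirtyRect &b) {
    assert(a.left <= a.right || isEmpty(a));
    if (isEmpty(a)) return b;
    if (isEmpty(b)) return a;
    return DirtyRect{
        std::min(a.left, b.left), std::min(a.top, b.top),
        std::max(a.right, b.right), std::max(a.bottom, b.bottom)
    };
}
```

A single bounding rectangle is the simplest form of dirty rectangles. Keeping a list of separate regions redraws less when changes are far apart, at the cost of more bookkeeping.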

Why use this software renderer instead of other software renderers

Unlike most modern software renderers, this one is neither a by-product of someone's curiosity nor a GPU emulator. This renderer was created because both graphics APIs and media layers available to Linux were too unstable, non-deterministic and complex to actually be used. A lot of developers abandoned using graphics APIs directly when Direct3D 12 and Vulkan came out due to the complexity, and OpenGL is a broken mess where literally no feature works the same on every graphics driver. I needed something that was well defined without feature flags, random crashes and heavy dependencies. It was important that the end user didn't have to install anything on the operating system while still having a graphical user experience.

Doing what the CPU is good at at 450 FPS instead of trying to be a GPU at 30 FPS

Image of a gloomy dungeon in isometric retro style but with modern dynamic light

Most software renderers only try to replicate what the GPU is good at and therefore get around 30 frames per second without any interesting light effects. This library has that too, in case you need perspective, but it also has depth buffered 2D draw calls and an example of how to use them for an isometric rendering technique using deferred light.

Image of a wooden barrel with diffuse, height and normal maps color encoded

Isometric rendering on the CPU can get hundreds of frames per second with unlimited detail level and heavy effects by avoiding the things that CPUs are bad at. By pre-rasterizing models with fixed camera angles into diffuse, height and normal images, deep sprites can be drawn very quickly by reading memory in a linear cache pattern.
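
As a rough sketch of the idea, a depth-tested draw call for a deep sprite could look something like the code below. The types and the function are hypothetical and not the library's own classes. The sketch assumes that a height of 0 marks an empty pixel and that higher depth values are closer to the camera.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only, not the library's own classes.
struct DeepSprite {
    int width, height;
    std::vector<uint32_t> diffuse;   // Packed color per pixel
    std::vector<uint32_t> normal;    // Packed surface normal per pixel
    std::vector<uint16_t> heightMap; // Height per pixel, 0 marks an empty pixel (assumed)
};

struct DeepTarget {
    int width, height;
    std::vector<uint32_t> color;
    std::vector<uint32_t> normal;
    std::vector<int32_t> depth; // Higher values are closer to the camera (assumed)
};

void drawDeepSprite(DeepTarget &target, const DeepSprite &sprite, int left, int top, int32_t baseDepth) {
    assert(sprite.diffuse.size() == size_t(sprite.width) * size_t(sprite.height));
    assert(sprite.normal.size() == sprite.diffuse.size());
    assert(sprite.heightMap.size() == sprite.diffuse.size());
    // Clip once against the target, so that the inner loop needs no bound checks.
    int x0 = std::max(0, -left), x1 = std::min(sprite.width, target.width - left);
    int y0 = std::max(0, -top), y1 = std::min(sprite.height, target.height - top);
    if (x0 >= x1 || y0 >= y1) return; // Entirely outside of the target
    size_t count = size_t(x1 - x0);
    for (int y = y0; y < y1; y++) {
        // Each row is a contiguous run in memory for both sprite and target.
        size_t s = size_t(y) * size_t(sprite.width) + size_t(x0);
        size_t t = size_t(y + top) * size_t(target.width) + size_t(x0 + left);
        for (size_t x = 0; x < count; x++) {
            uint16_t h = sprite.heightMap[s + x];
            int32_t depth = baseDepth + int32_t(h);
            // Skip empty pixels and pixels hidden behind what was already drawn.
            if (h != 0 && depth > target.depth[t + x]) {
                target.color[t + x] = sprite.diffuse[s + x];
                target.normal[t + x] = sprite.normal[s + x];
                target.depth[t + x] = depth;
            }
        }
    }
}
```

The normals are only stored here. A deferred light pass would read them back afterwards together with the depth, which is why the target keeps all three buffers.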

Image of normals in deferred rendering

Minimal dependency

Image of island surrounded by darkness

You do not need to install any graphics drivers for this renderer to work. Other software renderers often use OpenGL or Direct3D to upload the resulting image, which kind of defeats the whole purpose by still relying on having the drivers installed. Uploading the canvas using a CPU can be done on a background thread while the program does single-threaded logic anyway, which doesn't affect performance at all.

Most media layers are delivered as a dynamic dependency (which may fail to install due to a shitload of external dependencies). Once a media layer stops being ported to new desktop systems, you're left on your own trying to maintain their mess.

In this platform abstraction, however, all system-specific window management is in a separate module outside of the library, so that it's kept minimal in one place. Only the most essential features (mouse, keyboard, title, windowed, full-screen) are integrated natively on each platform, so that it can easily be moved to other systems.
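
To give a feeling for how small such a backend can be, here is a hypothetical interface covering only the essential features listed above. The class and method names are invented for this sketch and are not the library's actual backend API. The headless implementation only stores what it receives, which is one way to test rendering code without any display.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event type for illustration.
struct InputEvent {
    enum class Type { MouseMove, MouseButton, Key };
    Type type;
    int x, y;     // Mouse position, unused for key events
    int code;     // Button or key code
    bool pressed; // True on press, false on release
};

// Hypothetical interface, not the library's actual backend API.
// Everything that is specific to one operating system hides behind it.
class WindowBackend {
public:
    virtual ~WindowBackend() = default;
    virtual void setTitle(const std::string &title) = 0;
    virtual void setFullScreen(bool enabled) = 0;
    virtual std::vector<InputEvent> pollEvents() = 0;
    // Upload a finished canvas of packed 32-bit pixels, stored row by row.
    virtual void showCanvas(const uint32_t *pixels, int width, int height) = 0;
};

// A backend without any display, which only remembers the last frame.
class HeadlessBackend : public WindowBackend {
public:
    std::string title;
    bool fullScreen = false;
    std::vector<uint32_t> lastFrame;
    void setTitle(const std::string &newTitle) override { title = newTitle; }
    void setFullScreen(bool enabled) override { fullScreen = enabled; }
    std::vector<InputEvent> pollEvents() override { return {}; }
    void showCanvas(const uint32_t *pixels, int width, int height) override {
        assert(pixels != nullptr && width > 0 && height > 0);
        lastFrame.assign(pixels, pixels + size_t(width) * size_t(height));
    }
};
```

Porting to a new system would then mean writing one more class like HeadlessBackend on top of that system's native window calls, while the rendering code stays untouched.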

Static linking makes sure that you don't need a complex installer, just copy the folder and run. There are no broken dependencies, missing files or random bugs that come and go in different versions.

Dependencies:

* Operating system (If your program could run without an operating system, it would be an operating system)

* Standard C++ library (Can be linked statically using a compiler flag if you don't like making installers, but it's your choice)

* Display server (This is entirely optional, because you can also convert your images to ASCII art in real-time for command line applications, which is useful for remote SSH debugging of embedded ARM systems)
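
Converting an image to ASCII art takes very little code. The function below is a minimal sketch and not the library's own converter. It assumes a grayscale image with one byte per pixel stored row by row, and the character ramp is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert a grayscale image (one byte per pixel, row by row) into lines of text.
std::string toAscii(const std::vector<uint8_t> &luma, int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(luma.size() == size_t(width) * size_t(height));
    // Ten characters ordered from dark to bright.
    static const char ramp[] = " .:-=+*#%@";
    const int levels = int(sizeof(ramp)) - 1;
    std::string result;
    result.reserve(size_t(width + 1) * size_t(height));
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            // Map 0..255 to an index in 0..levels-1 using integer math.
            int index = int(luma[size_t(y) * size_t(width) + size_t(x)]) * levels / 256;
            result += ramp[index];
        }
        result += '\n';
    }
    return result;
}
```

Terminal characters are usually taller than they are wide, so a real-time viewer would probably sample the image with a different step horizontally and vertically before calling something like this.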

Supported operating systems

Linux is the recommended platform where tools are developed first, because writing a shell script can be done in seconds. Tested on many different Linux distributions, both Debian and Arch derivatives, from Intel-based desktops to embedded ARM devices. Most Linux distributions come with the GNU compiler toolchain pre-installed, so you just give access permissions and run your compilation script on a new computer. Some say that Clang is faster, but you can just change compiler before the final release and not have to think about it during development.

Microsoft Windows is supported to allow reaching a wider audience of end users. The library can be included as a folder in a Visual Studio or Code::Blocks project, which is a bit more work to install and set up but gives you a debugger. Visual Studio can be installed with Clang to get standard C++ with C extensions. Tested on Windows 7 but should in theory work all the way back to Windows 2000.

Target hardware

Optimized for Intel/AMD CPUs using SSE2 intrinsics.

Optimized for ARM CPUs using NEON intrinsics.

Also works without SIMD extensions by having a slower fallback implementation.
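
The three code paths can be sketched with a simple image filter that adds two byte buffers with saturation. This is an illustration of the general pattern rather than code from the library, and the macro names starting with SKETCH_ are made up. The intrinsics themselves (_mm_adds_epu8 for SSE2 and vqaddq_u8 for NEON) are the standard ones from emmintrin.h and arm_neon.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Select a SIMD path at compile time, or leave both macros undefined for the fallback.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    #include <emmintrin.h>
    #define SKETCH_USE_SSE2
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
    #define SKETCH_USE_NEON
#endif

// target[i] = min(a[i] + b[i], 255) for each of the count bytes.
void addSaturated(uint8_t *target, const uint8_t *a, const uint8_t *b, size_t count) {
    assert(target != nullptr && a != nullptr && b != nullptr);
    size_t i = 0;
    #if defined(SKETCH_USE_SSE2)
        // 16 bytes per iteration using unaligned loads and stores.
        for (; i + 16 <= count; i += 16) {
            __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
            __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(target + i), _mm_adds_epu8(va, vb));
        }
    #elif defined(SKETCH_USE_NEON)
        // 16 bytes per iteration on ARM.
        for (; i + 16 <= count; i += 16) {
            vst1q_u8(target + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
        }
    #endif
    // Scalar loop for the remaining bytes, or for everything when no SIMD is available.
    for (; i < count; i++) {
        unsigned sum = unsigned(a[i]) + unsigned(b[i]);
        target[i] = uint8_t(sum > 255u ? 255u : sum);
    }
}
```

Because all three paths compute exactly the same integers, the result does not depend on which CPU the program was compiled for.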

Project links

The static library is supposed to be re-built automatically from source code like a part of your own project, so there are no binaries to download. You can inspect all the source code or just cut out the modules you want if you don't trust me.

Source code: https://github.com/Dawoodoz/DFPSR

Discussion: https://handmade.network/p/dfpsr/

© David Piuva