Pixel-perfect GI Without Ray Tracing
Hi! I’m building Anchorfall with a visual style inspired by Core Keeper: chunky pixel art with dynamic lighting, sharp shadows, and real-time global illumination.
To achieve that, I built a custom lighting pipeline that handles direct light, indirect light (GI), and pixel-perfect shadows, while giving strong artistic control. It isn’t physically accurate, but it fits pixel art games well.

It’s inspired by Core Keeper’s own lighting system (which uses heat diffusion in their pipeline) and David Maletz’s work on I Can’t Escape: Darkness. His blog post was the foundation for the indirect lighting approach.
This system is the result of ~8 months of iteration, and easily the steepest rendering learning curve I’ve put myself through. It started as an indirect-only system, but merging URP’s own light/shadow passes with it proved brittle and wasteful (we need direct light to seed indirect anyway). I wanted something that looked great with pixel art, performed well, and had none of the artifacts common in modern non-baked GI, like temporal sampling noise or ghost trails, while still allowing artistic choices like dithering, posterization, and palette swaps.
I’m really happy with where I landed. Despite the length of this article, the system will likely keep growing to support more artistic control. But even as it stands now, it’s a solid system that feels production-worthy.
This is implemented in Unity (URP + RenderGraph). Code snippets are in C# and HLSL.
Table of Contents
- Overview
- Pass Graph + Texture Formats
- Requirements and compromises
- 1. Occlusion representation: The Conduction Mask
- 2. GI View Window
- 3. Direct Lighting (Point Lights)
- 4. Indirect Light: Thermodynamics of Pixels
- Optimizations
- Performance data
- Closing thoughts
Overview

From a high level, here is how the system works:
- It renders a texture that has “conduction” information, or in other words, where in the world light is allowed to exist. This is what we call the conduction mask.

- Point lights are rendered in a separate texture using DDA against the wall buffer. This yields our direct light data and shadow visibility.

- We then use the data from direct to seed a heat-spread algorithm. This spreads the light, allowing it to turn corners and fill spaces.

- Direct + indirect are sampled together (AO + conduction + tone map) to produce the final lighting.

- Then our Lit shader samples the resulting textures, with some special handling for vertical geometry (walls, etc.), to produce the final image.


Pass Graph + Texture Formats
I’m not going to walk through RenderGraph wiring, but here’s the minimal pass order + texture contract this system assumes. If you follow this, you’ll match the behavior in the shaders.
```
Update                  _CameraView (phase-locked)
ConductionMask          -> _ConductionMask (global) (R8)
AmbientOcclusion        -> _AmbientOcclusion (global) (R8) [optional]
Direct Point + Emissive -> _DirectLightColor (global) (RGBA16F)
                        -> _GI_ShadowMap (global) (R8G8: R=point, G=directional)
                        -> _WallDirect_[PosX/NegX/PosZ/NegZ] (global) (RGBA16F)
Directional Lights      -> add into _DirectLightColor
                        -> write _GI_ShadowMap.G (max blend)
                        -> update _WallDirect_* textures
ExtractEmission         (RGBA16F)
HeatTransfer            ping-pong _IndirectLightColor (global) (RGBA16F) + Buffer (RGBA16F), optional downsample
Optional Blur           writes to _IndirectLightColor (RGBA16F)
WallIndirect            -> WallIndirect_[PosX/NegX/PosZ/NegZ] (RGBA16F)
Sample in Lit shader    (direct + indirect + AO + conduction + tone map)
```
All textures are linear (sRGB off), no mips, no MSAA, point filtering + clamp.
Implementation detail: all of these are RTHandles. They persist across frames and are reallocated when the GI window size changes (RenderGraph ReAllocateHandleIfNeeded). Heat transfer uses ping-pong buffers (and an optional downsample + upsample path).
Directional, AO and blur passes are optional.
Requirements and compromises
This pipeline makes some strong assumptions to stay fast and stable:
- Target hardware: desktop/console-class GPUs. I assume compute is available and memory bandwidth isn’t ultra-tight.
- The world must be grid-aligned.
- Lighting is simulated in 2D and projected onto 3D geometry, so it has no true height-aware GI. Complex vertical structures won’t look correct.
- Light and shadows are pixelated by design (tied to the GI texture resolution).
- It assumes a 3D world: we use normals and voxel height to shade ground, wall faces, and wall tops. A 2D-only version is possible, but it would need changes.
1. Occlusion representation: The Conduction Mask
In a typical 3D engine, light is blocked by geometry (meshes). In our world, the “truth” of the level geometry lives in a StructuredBuffer<int> called the _WallBuffer. This is a flat array of integers representing our 1x1 world tiles: 0 for empty, 1 for wall.
To make this data accessible to our pixel shaders, we first render a Conduction Mask. This is a camera-aligned texture (R8) that tells the lighting engine where light is allowed to exist.
The ConductionMask.shader is deceptively simple. It maps the fullscreen UVs to world grid coordinates.
```hlsl
float frag(Varyings i) : SV_Target
{
    float2 uv = i.texcoord;

    float blockAllLightColor = 0;
    float allowAllLightColor = 1;

    // uvToGrid returns grid coordinates; it's defined in the next section
    float2 worldCoord = uvToGrid(uv);
    int2 worldTile = int2(floor(worldCoord));

    // Apply offset to convert world coords to buffer indices
    int2 bufferTile = worldTile + int2(_GridOffsetX, _GridOffsetY);

    if (bufferTile.x < 0 || bufferTile.x >= _GridWidth || bufferTile.y < 0 || bufferTile.y >= _GridHeight)
    {
        return allowAllLightColor;
    }

    int bufferIndex = Convert2DIndexTo1D(bufferTile, _GridWidth);
    int isWall = _WallBuffer[bufferIndex];

    if (isWall == 0)
    {
        return allowAllLightColor;
    }

    return blockAllLightColor; // Wall cell: no conduction (light can't exist/propagate here)
}
```
We index the wall buffer row-major (x + y * width), where y corresponds to world +Z. The shader helpers used across passes look like this:
```hlsl
int Convert2DIndexTo1D(int2 idx, int width)
{
    return idx.x + idx.y * width;
}
```
Optionally, you could sample _WallBuffer directly instead of saving a conduction texture, but having the conduction mask texture helped immensely when things went wrong, so I’m including both IsObstacle versions below. Beware that the snippets in this article assume a _ConductionMask texture is present.
```hlsl
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    float2 cellPos = float2(x, y) + 0.5;
    float2 conductionUV = gridToViewportUV(cellPos);
    float conductivity = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, conductionUV).r;
    return conductivity < 0.1;
}
```
```hlsl
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    int bufferIndex = Convert2DIndexTo1D(int2(bufferX, bufferY), _GridWidth);
    return _WallBuffer[bufferIndex] != 0;
}
```
The result is a binary map of the world. White allows conduction, black blocks it (stored in the R channel because the texture is R8). Out-of-bounds is treated as empty/air to keep streaming-friendly behavior at the edges of the simulated window.
For DDA shadow rays, we query _WallBuffer directly so shadows don’t depend on the GI window. This prevents popping.
Grid size + WallBuffer population
The mask only works if the shader knows the grid dimensions and has the wall buffer bound. _GridWidth and _GridHeight define the buffer size, and _GridOffsetX/Y shifts world coordinates into buffer space (so a centered or streaming world can still map into a single flat array).
In Anchorfall, a GIWallBufferManager owns the buffer. It builds a flat ComputeBuffer<int> (0 = empty, 1 = wall) from the MapGen wall layer, binds it as _WallBuffer, and sets the grid globals. It also mirrors the data on the CPU so we can do incremental updates (chunk load/unload or single-tile edits) without GPU readback.
_GridOffset is independent from _CameraView: the camera view defines what area we render, while the grid offset defines where the global wall buffer lives in world space.
```csharp
_bufferSize = new int2(diameter, diameter);
_coordinateOffset = diameter / 2;

Shader.SetGlobalInt("_GridWidth", _bufferSize.x);
Shader.SetGlobalInt("_GridHeight", _bufferSize.y);
Shader.SetGlobalFloat("_WallHeight", wallHeight);

_wallBuffer = new ComputeBuffer(_bufferSize.x * _bufferSize.y, sizeof(int));
_wallBuffer.SetData(_cpuMirror);
Shader.SetGlobalBuffer("_WallBuffer", _wallBuffer);

GIBridge.GridSizeOverride = _bufferSize;
GIBridge.GridOffsetOverride = new int2(_coordinateOffset, _coordinateOffset);
```
Any time the wall data changes we push the updated range (or a whole chunk) and bump GIBridge.WallBufferVersion, which lets the GI render feature know the mask needs to refresh.
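To make that update path concrete, here is a minimal Python sketch (hypothetical names, not the actual GIWallBufferManager) of the CPU-mirror idea: a flat row-major array with a world-to-buffer offset, incremental single-tile edits, and a version counter the render feature can poll instead of doing a GPU readback.

```python
class WallBufferMirror:
    def __init__(self, diameter):
        self.width = self.height = diameter
        self.offset = diameter // 2          # world (0,0) maps near the buffer center
        self.cells = [0] * (diameter * diameter)
        self.version = 0                     # bumped on every edit; the GPU upload checks this

    def _index(self, world_x, world_y):
        bx, by = world_x + self.offset, world_y + self.offset
        if not (0 <= bx < self.width and 0 <= by < self.height):
            return None                      # out of bounds = air (streaming-friendly)
        return bx + by * self.width          # row-major: x + y * width

    def set_wall(self, world_x, world_y, is_wall):
        i = self._index(world_x, world_y)
        if i is not None and self.cells[i] != int(is_wall):
            self.cells[i] = int(is_wall)
            self.version += 1                # tells the GI feature the mask is stale

    def is_wall(self, world_x, world_y):
        i = self._index(world_x, world_y)
        return i is not None and self.cells[i] == 1
```

Skipping the version bump when an edit is a no-op keeps redundant writes from invalidating the cached conduction mask.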
Ambient occlusion
Right after the conduction mask we compute a lightweight ambient occlusion (AO) texture. It only depends on the wall buffer and the grid mapping. We apply AO at sample-time to darken creases and wall-adjacent areas without affecting light transport.
This AO is purely a contact-darkening term; it doesn’t affect shadowing or bounce, only the final shading.
The AO pass samples the wall buffer in a radius around each pixel, weights by distance (and a softness power), and outputs a single-channel occlusion value. _AORadius is in world units (tile units in my case). Optional Bayer dithering helps hide banding at low resolution.
```hlsl
float2 worldPos = uvToGrid(uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPos = (floor(worldPos * giRes) + 0.5) / giRes;

int2 baseTile = int2(floor(gridPos));
float radius = max(_AORadius, 0.01);
int range = (int)ceil(radius);

float occSum = 0.0;
float weightSum = 0.0;

for (int y = -range; y <= range; y++)
{
    for (int x = -range; x <= range; x++)
    {
        float2 tileCenter = float2(baseTile.x + x + 0.5, baseTile.y + y + 0.5);
        float dist = distance(gridPos, tileCenter);
        if (dist > radius) continue;

        float w = 1.0 - saturate(dist / radius);
        w = pow(w, max(_AOPower, 0.01));
        weightSum += w;

        if (IsObstacle(baseTile.x + x, baseTile.y + y))
            occSum += w;
    }
}

float occ = (weightSum > 0.0) ? (occSum / weightSum) : 0.0;
float ao = 1.0 - saturate(_AOStrength) * occ;
```
2. GI View Window
The light textures have to move with the camera, so we need to tell the shaders where the camera is looking.
The GIUtils.CalculateCameraVisibleArea() function projects the camera view onto the ground plane, then multiplies the result by a padding factor. A value of 2 means we render twice the visible width and height.
Padding is not primarily for lights whose transform is off-frustum. As long as a light’s radius overlaps the GI window, it will be included (even if the light itself is outside the camera frustum).
Padding is mostly about diffusion stability, tall objects and hiding the GI window edge. DDA shadow rays query _WallBuffer directly, so occluders outside the GI window still block light without relying on the conduction mask.
Objects are lit based on the tile they occupy. If an object is too tall, you’ll need more padding (or a different sampling strategy), otherwise the object will first appear dark then suddenly light up as the GI window slides into the tile it occupies.
```csharp
public static float4 CalculateCameraVisibleArea(Camera camera, float cameraViewPadding)
{
    const float groundPlaneY = 0.0f;

    Vector2[] corners = { new(0f, 0f), new(1f, 0f), new(0f, 1f), new(1f, 1f) };

    float2 minGrid = new float2(float.MaxValue, float.MaxValue);
    float2 maxGrid = new float2(float.MinValue, float.MinValue);

    for (int i = 0; i < corners.Length; i++)
    {
        Vector2 corner = corners[i];
        Ray ray = camera.ViewportPointToRay(new Vector3(corner.x, corner.y, 0f));
        float denom = ray.direction.y;

        // If the camera ray is parallel to the ground plane (or something goes NaN),
        // we bail out and skip GI for the frame.
        if (math.abs(denom) < 1e-4f)
            return new float4(0f, 0f, 0f, 0f);

        float t = (groundPlaneY - ray.origin.y) / denom;
        if (t <= 0.0f || float.IsNaN(t) || float.IsInfinity(t))
            return new float4(0f, 0f, 0f, 0f);

        Vector3 hit = ray.origin + ray.direction * t;
        minGrid = math.min(minGrid, new float2(hit.x, hit.z));
        maxGrid = math.max(maxGrid, new float2(hit.x, hit.z));
    }

    float2 center = (minGrid + maxGrid) * 0.5f;
    float2 paddedSize = (maxGrid - minGrid) * cameraViewPadding;

    float2 minCorner = center - paddedSize * 0.5f;
    return new float4(minCorner.x, minCorner.y, paddedSize);
}
```
GI Resolution
The giResolution parameter controls how many texels we allocate per world unit.
The final texture dimensions are:
```
textureWidth  = viewWidth x giResolution
textureHeight = viewHeight x giResolution
```
So if your camera sees a 30x20 world-unit area (after padding) and giResolution = 4, you get a 120x80 GI texture.
More precisely, the sizing math we use is:
```
resolution = max(1, giResolution)

// Convert padded world size -> texels
textureWidth  = ceil(viewWidth * resolution)
textureHeight = ceil(viewHeight * resolution)

// Snap to multiples of resolution (keeps whole-tile alignment)
textureWidth  = ceil(textureWidth / resolution) * resolution
textureHeight = ceil(textureHeight / resolution) * resolution

// Minimum size guard
textureWidth  = max(textureWidth, 64)
textureHeight = max(textureHeight, 64)

// World size implied by the snapped texture
worldWidth  = textureWidth / resolution
worldHeight = textureHeight / resolution
```
That snapping step is important: it guarantees the texture grid lines up with whole world units, which keeps phase locking stable and prevents half-texel drift when the camera moves.
A resolution of 1 would produce a Minecraft-like light, with each tile having a single color. However, since our shadows use the same texture, you would get very blocky shadows.
This value also affects heat diffusion: at higher resolutions, world-space hops span more GI texels. That’s why the adaptive hierarchical sampling exists (covered later).
Phase Locking
We deliberately quantize the GI viewport origin to whole-tile (1 world unit) boundaries.
This does two things:
- It eliminates sub-tile jitter, where floor()-based mapping would cause texels to “switch” which tile they correspond to.
- It enables aggressive caching: the GI textures only need to update when the camera crosses a tile boundary (or lights/walls change), which can skip lots of frames at high GI resolutions.
Trade-off: the GI window moves in discrete steps. With sufficient padding, the camera never sees the window edge, so this stays visually stable.
If you want smoother tracking, snap to GI-texel boundaries (1 / giResolution world units) instead, but you’ll update more often and caching becomes less effective.
```csharp
private float4 CalculatePhaseLockedCameraView(float2 center, float2 size, int resolution)
{
    float minX = center.x - size.x * 0.5f;
    float minY = center.y - size.y * 0.5f;

    // Convert world -> texels
    int minXTexel = Mathf.FloorToInt(minX * resolution);
    int minYTexel = Mathf.FloorToInt(minY * resolution);

    // Snap to 1 world unit boundaries (resolution texels)
    minXTexel = Mathf.FloorToInt(minXTexel / (float)resolution) * resolution;
    minYTexel = Mathf.FloorToInt(minYTexel / (float)resolution) * resolution;

    float invRes = 1.0f / resolution;
    return new float4(minXTexel * invRes, minYTexel * invRes, size.x, size.y);
}
```
Coordinate Conventions
These are the rules I stick to everywhere. If you keep this mapping consistent, the rest of the system stays stable:
- _CameraView = (minX, minZ, width, height) in world units.
- GI texture size = viewSize * giResolution (integer texel dimensions).
- The conduction mask is rendered at GI resolution (same texel grid as the light buffers).
- Full-screen uv is normalized over the GI textures, not the screen.
- World tile index: floor(worldPos.xz) (1 cell per world unit).
- GI texel center (world units): (floor(worldPos.xz * giRes) + 0.5) / giRes.
- When sampling GI or conduction mask, clamp + snap to texel center to avoid boundary flicker.
With _CameraView set, the shader can convert any screen UV to world coordinates:
```hlsl
float2 uvToGrid(float2 uv)
{
    // _CameraView = (minX, minZ, width, height)
    float2 gridPos;
    gridPos.x = _CameraView.x + uv.x * _CameraView.z;
    gridPos.y = _CameraView.y + uv.y * _CameraView.w;
    return gridPos;
}
```
The inverse mapping (world → UV) and texel snapping look like this:
```hlsl
float2 gridToViewportUV(float2 gridPos)
{
    CamBounds bounds = GetGlobalCameraViewData();
    float2 uv;
    uv.x = (gridPos.x - bounds.min.x) / bounds.size.x;
    uv.y = (gridPos.y - bounds.min.y) / bounds.size.y;
    return uv;
}

float2 snapUVToTexelCenter(float2 uv)
{
    float2 texelSize = getViewportTexelSize();
    float2 halfTexel = texelSize * 0.5;
    uv = clamp(uv, halfTexel, 1.0 - halfTexel);
    return (floor(uv / texelSize) + 0.5) * texelSize;
}
```
Please note that the GI “grid space” is (worldX, worldZ). I’ll call the second component y in shaders because it’s a 2D texture axis.
This simple linear interpolation maps (0,0) to the bottom-left of our world view and (1,1) to the top-right. Every GI pass uses this function to bridge screen space and world space.
Consistency here matters. If any pass uses a slightly different mapping (different padding, phase lock, or floor vs round), you get flicker and off-by-one lighting bugs.
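That round trip is easy to unit-test on the CPU. A quick Python sketch, assuming _CameraView = (minX, minZ, width, height) as above (function names are illustrative):

```python
def uv_to_grid(uv, view):
    # Forward mapping: normalized UV over the GI texture -> world grid position
    min_x, min_z, w, h = view
    return (min_x + uv[0] * w, min_z + uv[1] * h)

def grid_to_uv(grid, view):
    # Inverse mapping: world grid position -> normalized UV
    min_x, min_z, w, h = view
    return ((grid[0] - min_x) / w, (grid[1] - min_z) / h)

# Phase-locked window: origin snapped to whole world units
view = (-16.0, -9.0, 32.0, 18.0)
g = uv_to_grid((0.25, 0.75), view)
assert grid_to_uv(g, view) == (0.25, 0.75)
```

As long as every pass shares the exact same view tuple, the round trip is lossless; feed one pass a differently padded or unsnapped view and the mappings disagree, which is exactly the flicker failure mode described above.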
3. Direct Lighting (Point Lights)
We now have a 2D texture with our light occluders. We are ready to fake some photons.
Gathering Lights
In the RenderFeature, we iterate through Unity’s visible lights array and pack light data into a GPU buffer:
```hlsl
struct VM_LightData
{
    float4 color;
    float2 position;
    float radius;
    float _padding; // 16 byte alignment
};
```
We don’t need a separate intensity because visibleLight.finalColor already bakes in the Unity light’s intensity + color (and temperature), so the shader only needs radius and position:
```csharp
// In RecordRenderGraph

var lightData = frameData.Get<UniversalLightData>();

// By using lightData.visibleLights, we don't need to handle culling + range
// checks, but max lights are capped at 256 by URP. If you need more, gather
// them in a job or similar.
for (int i = 0; i < lightData.visibleLights.Length; i++)
{
    var visibleLight = lightData.visibleLights[i];
    var unityLight = visibleLight.light;
    var position = unityLight.transform.position;

    // WritePointLight writes to the light buffer we'll send to the shaders
    WritePointLight(lightCount,
        new float3(position.x, position.y, position.z),
        visibleLight.finalColor,
        unityLight.range);
}
```
DDA
Instead of the classical approach, we use DDA (Digital Differential Analyzer) to cast rays from each light to each pixel.
More precisely it’s the Amanatides & Woo grid traversal, but it’s often called DDA in gamedev. The key property is stepping from grid boundary to grid boundary, visiting every cell the ray crosses in order.
For shadow casting, we want to know: “does a straight line from the light to this pixel pass through any walls?”
The algorithm works by tracking how far we need to travel in X and Y to cross the next cell boundary. Whichever is closer, we step in that direction:
```hlsl
float CalculateShadowDDA(float2 lightPos, float2 pixelPos, float radius)
{
    float2 rayDir = pixelPos - lightPos;
    float rayLength = length(rayDir);

    if (rayLength < 1e-4)
        return 1.0;

    rayDir /= rayLength;

    // Bias the ray off grid boundaries to avoid corner/edge ambiguity.
    // The start/end bias keeps floor() stable when positions land exactly on boundaries.
    const float RAY_BIAS = 1e-4;
    float2 startPos = lightPos + rayDir * RAY_BIAS;
    float2 endPos = pixelPos - rayDir * RAY_BIAS;
    rayLength = max(rayLength - 2.0 * RAY_BIAS, 0.0);

    // Grid cell coordinates of starting and ending points
    int2 startCell = int2(floor(startPos));
    int2 endCell = int2(floor(endPos));

    int2 cell = startCell;

    // Calculate step direction
    int2 step = int2(sign(rayDir));

    // Calculate distance to first X and Y cell boundaries.
    // Guard near-zero directions to avoid 1/dir blowups.
    const float DIR_EPS = 1e-6;
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    float2 tMax;
    if (xZero)
        tMax.x = 1e30;
    else
        tMax.x = (step.x > 0)
            ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
            : (startPos.x - floor(startPos.x)) / -rayDir.x;

    if (yZero)
        tMax.y = 1e30;
    else
        tMax.y = (step.y > 0)
            ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
            : (startPos.y - floor(startPos.y)) / -rayDir.y;

    // Calculate how far to step in each direction to move one cell
    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float distanceTraveled;
    const float CORNER_EPS = 1e-5;

    // Maximum number of steps to prevent infinite loops
    int maxSteps = max(1, (int)ceil(radius * 2.0) + 2);

    for (int i = 0; i < maxSteps; i++)
    {
        // IsObstacle MUST use _WallBuffer in this path
        // (it treats out-of-bounds as empty, which is streaming-friendly)
        if (IsObstacle(cell.x, cell.y))
        {
            return 0;
        }

        // If we've reached the destination cell, we're not in shadow
        if (cell.x == endCell.x && cell.y == endCell.y)
            return 1.0;

        // Step to next cell
        if (tMax.x < tMax.y - CORNER_EPS)
        {
            distanceTraveled = tMax.x;
            tMax.x += tDelta.x;
            cell.x += step.x;
        }
        else if (tMax.y < tMax.x - CORNER_EPS)
        {
            distanceTraveled = tMax.y;
            tMax.y += tDelta.y;
            cell.y += step.y;
        }
        else
        {
            // Corner hit: only block if both adjacent cells are obstacles
            distanceTraveled = tMax.x;
            int2 nextX = cell + int2(step.x, 0);
            int2 nextY = cell + int2(0, step.y);
            if (IsObstacle(nextX.x, nextX.y) && IsObstacle(nextY.x, nextY.y))
                return 0;

            tMax.x += tDelta.x;
            tMax.y += tDelta.y;
            cell.x += step.x;
            cell.y += step.y;
        }

        // If we've traveled farther than the ray length, we're done
        if (distanceTraveled > rayLength)
            return 1.0;
    }

    // If we've exceeded max steps, assume no shadow
    return 1.0;
}
```
Shadow Proxies (Analytical Occluders)
DDA gives grid-accurate shadows from the wall buffer, but it doesn’t cover thin props, dynamic obstacles, or anything that isn’t baked into the grid. For those I use shadow proxies: small analytic occluders that cast soft shadows without touching the wall buffer.
Each proxy is a simple shape (box or circle) with width, length, and penumbra parameters. The proxy buffer is updated on the CPU, culled against the GI view, and capped to maxShadowProxies. Both the point-light and directional-light passes read the same buffer.
Culling is straightforward: cullRadius = max(size) + maxLength + max(penumbraWidth, penumbraLength), then keep the closest N proxies to the GI view center.
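A minimal Python sketch of that culling step, under the simplifying assumption that the GI view is treated as a circle of radius view_half_extent around its center (field and function names are illustrative):

```python
import math

def cull_proxies(proxies, view_center, view_half_extent, max_proxies):
    kept = []
    for p in proxies:
        # Conservative reach of this proxy's shadow
        cull_radius = (max(p["size"]) + p["max_length"]
                       + max(p["penumbra_width"], p["penumbra_length"]))
        d = math.dist(p["position"], view_center)
        # Reject proxies whose shadow can't reach the GI window
        if d - cull_radius > view_half_extent:
            continue
        kept.append((d, p))
    # Keep the N proxies closest to the GI view center
    kept.sort(key=lambda e: e[0])
    return [p for _, p in kept[:max_proxies]]
```

The distance sort means that when the cap is hit, the proxies you lose are the ones farthest from the center, which are the least likely to cast a visible shadow.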
```hlsl
struct VM_ShadowProxyData
{
    float2 position;      // World XZ
    float2 size;          // Half extents (box) or radius (circle)
    float maxLength;      // Shadow length
    float endCapBlend;    // Blend between light dir and proxy axis
    float penumbraWidth;  // Soft edge across width
    float penumbraLength; // Soft edge along length
    float penumbraPower;  // Curve sharpness
    float shape;          // 0 = box, 1 = circle
    float widthMode;      // 0 = projected width, 1 = max axis
    float penumbraMode;   // inward/outward/both
};
```
Core idea: treat the proxy like a capsule/box aligned with the light direction, compute perp and along distances, and attenuate by two penumbra curves.
```hlsl
float ComputePointLightProxyShadow(float2 lightPos, float2 pixelPos)
{
    if (_ShadowProxyCount <= 0) return 1.0;

    float2 ray = pixelPos - lightPos;
    float rayLen = length(ray);
    if (rayLen < 1e-4) return 1.0;

    float2 dir = ray / rayLen;
    float shadow = 1.0;

    for (int i = 0; i < _ShadowProxyCount; i++)
    {
        VM_ShadowProxyData proxy = _ShadowProxyBuffer[i];
        float2 toProxy = proxy.position - lightPos;
        float proxyDist = length(toProxy);
        if (proxyDist < 1e-4) continue;

        float2 dirFlat = toProxy / proxyDist;
        float2 dirShadow = normalize(lerp(dir, dirFlat, saturate(proxy.endCapBlend)));
        float2 perpDir = float2(-dirShadow.y, dirShadow.x);

        float occluderDist = dot(toProxy, dirShadow);
        if (occluderDist <= 0.0) continue;

        float2 toPixelFromProxy = pixelPos - proxy.position;
        float alongDist = dot(toPixelFromProxy, dirShadow);
        if (alongDist <= 0.0) continue;

        float perpDist = abs(dot(toPixelFromProxy, perpDir));
        float width = GetShadowProxyWidth(proxy, perpDir);

        if (alongDist > proxy.maxLength + proxy.penumbraLength) continue;

        float edge = ComputePenumbraFactor(perpDist, width, proxy.penumbraWidth, proxy.penumbraPower, proxy.penumbraMode);
        float len = ComputePenumbraFactor(alongDist, proxy.maxLength, proxy.penumbraLength, proxy.penumbraPower, proxy.penumbraMode);

        float occlusion = edge * len;
        shadow = min(shadow, 1.0 - occlusion);
        if (shadow <= 0.001) break;
    }

    return shadow;
}
```
In the lighting loop I combine it like this:
```hlsl
shadow = min(ShadowDDA, ShadowProxy);
```
Directional lights use the same proxy logic (just with a fixed light direction).
Shadow proxy handling is basic and has room for improvement. It should be possible to project shadows onto walls non-uniformly so they retain their shape where they hit a wall. Right now, if a shadow hits a wall, the entire wall face is shadowed from top to bottom. Penumbra helps massively, but hard shadows can be distracting when they just barely touch a wall.
Multiple Render Targets: One Pass, Many Textures
If you are targeting Shader Model 3 (D3D9) or OpenGL ES, MRTs have a cap of 4 render targets. Pack 2 wall faces per texture to get around this.
We don’t just output a single color texture; we use Multiple Render Targets (MRT) to write several textures simultaneously:
```hlsl
struct FragmentOutput
{
    float4 color : SV_Target0;     // Direct light color
    float4 shadow : SV_Target1;    // Shadow map (R=point, G=directional)
    float4 wallFace0 : SV_Target2; // +X face (RGB)
    float4 wallFace1 : SV_Target3; // -X face (RGB)
    float4 wallFace2 : SV_Target4; // +Z face (RGB)
    float4 wallFace3 : SV_Target5; // -Z face (RGB)
};
```
Why so many outputs? Because walls need per-face lighting.
The Wall Problem
Ground tiles are simple: they face up, sample the GI texture at their world position, done. But walls have sides. A wall lit from the east should have a bright east face and a dark west face. Walls are also perfectly aligned with the conduction mask occluders, so sampling at their own position always returns no light.
For a while, we used the wall normal to sample the closest adjacent pixel to get around this limitation, but that causes “bands” that look distracting when shadows wrap around corners, and Ambient Occlusion interfered with the wall shading.
Our solution: for each wall texel, we store four lighting values (one per cardinal direction: +X, -X, +Z, -Z) as four separate RGBA16F textures. Each texture holds full RGB for one face. This keeps full color precision, avoids face packing math, and still lets us pick a dominant face with a single texture fetch. Non-wall texels stay zero.
This is a lot of textures (4 for direct, 4 for indirect), yes. But giResolution scales the tile resolution, not screen resolution. A 480x270 internal resolution game like Anchorfall, with giResolution 16 and padding 2 would produce textures at 960x540, which comes at ~4MB per texture at RGBA16F.
For the whole light system (assuming heat spread is done at full res), we have:
2x R8, 1x R8G8, 13x RGBA16F. At the resolution above, that amounts to just under 56MB of VRAM, which is next to nothing on modern hardware and remarkably small for an entire lighting system.
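The arithmetic is easy to sanity-check. A quick Python sketch, using decimal megabytes (1 MB = 10^6 bytes):

```python
def texture_mb(w, h, bytes_per_texel):
    # VRAM for one mip-less texture, in decimal megabytes
    return w * h * bytes_per_texel / 1e6

w, h = 960, 540
total = (2 * texture_mb(w, h, 1)     # 2x R8     (1 byte/texel)
         + 1 * texture_mb(w, h, 2)   # 1x R8G8   (2 bytes/texel)
         + 13 * texture_mb(w, h, 8)) # 13x RGBA16F (8 bytes/texel)
# total comes to roughly 56 MB, matching the figure above
```

Each RGBA16F texture is about 4.15 MB at 960x540, so the thirteen half-float textures dominate the budget; the R8 and R8G8 masks are rounding error by comparison.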
```hlsl
// Which faces of this wall tile are exposed (not buried in other walls)?
float4 GetWallFaceMask(int2 cell)
{
    if (!IsObstacle(cell.x, cell.y))
        return float4(0, 0, 0, 0); // Not a wall

    // Check each neighbor - if the neighbor is NOT a wall, this face is exposed
    float facePosX = IsObstacle(cell.x + 1, cell.y) ? 0.0 : 1.0;
    float faceNegX = IsObstacle(cell.x - 1, cell.y) ? 0.0 : 1.0;
    float facePosZ = IsObstacle(cell.x, cell.y + 1) ? 0.0 : 1.0;
    float faceNegZ = IsObstacle(cell.x, cell.y - 1) ? 0.0 : 1.0;

    return float4(facePosX, faceNegX, facePosZ, faceNegZ);
}

// How much does this light contribute to each face?
float4 GetWallFacing(float2 dirToLight)
{
    return float4(
        saturate(dirToLight.x),  // +X face lit when light is to the right
        saturate(-dirToLight.x), // -X face lit when light is to the left
        saturate(dirToLight.y),  // +Z face lit when light is above
        saturate(-dirToLight.y)  // -Z face lit when light is below
    );
}
```
For each light, we calculate the direction to the light source, multiply the face mask by the facing weights, and accumulate into the wall textures. Later, when rendering a wall sprite, we sample these textures using the sprite’s world normal to pick the right face. This is intentionally simple and snappy: it biases light toward the most directly oriented face without needing per-pixel normals in the GI texture. If you want softer transitions, raise the facing term to a power (e.g. pow(saturate(dirToLight.x), k)).
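The facing math is small enough to check by hand. Here it is as a tiny Python sketch, with the optional softening exponent k mentioned above (k = 1 matches the shader; names are mine):

```python
def clamp01(v):
    # Equivalent of HLSL saturate()
    return min(max(v, 0.0), 1.0)

def wall_facing(dir_x, dir_y, k=1.0):
    # Per-face weights from a normalized 2D direction-to-light.
    # k > 1 gives softer transitions between faces.
    return (clamp01(dir_x) ** k,   # +X face
            clamp01(-dir_x) ** k,  # -X face
            clamp01(dir_y) ** k,   # +Z face
            clamp01(-dir_y) ** k)  # -Z face
```

With a light due east the weights collapse to a single face; a diagonal light splits its contribution between the two faces it can see, and the two back faces stay at zero.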
On bandwidth-constrained GPUs you may want fewer RTs or pack wall faces.
The Rendering Loop
Now for the actual rendering. Every pixel in our GI texture runs a fragment shader that loops through all point lights:
```hlsl
FragmentOutput frag(Varyings input)
{
    FragmentOutput output;
    output.color = float4(0);
    output.shadow = float4(0);
    output.wallFace0 = float4(0);
    output.wallFace1 = float4(0);
    output.wallFace2 = float4(0);
    output.wallFace3 = float4(0);

    float2 gridPos = uvToGrid(input.texcoord);
    float giRes = max(_GI_Resolution_Lib, 1.0);
    // Snap to GI texel centers so wall/ground sampling is stable across camera movement.
    float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

    int2 wallCell = int2(floor(gridPos));
    // IsObstacle MUST use _WallBuffer in this path, otherwise popping might occur.
    // Walls are encoded in the wall buffer (1 = wall).
    bool isWall = IsObstacle(wallCell.x, wallCell.y);

    float totalShadow = 0.0;
    // Wall lighting accumulates into per-face outputs (valid only on wall texels).
    float3 wallFace0 = float3(0, 0, 0);
    float3 wallFace1 = float3(0, 0, 0);
    float3 wallFace2 = float3(0, 0, 0);
    float3 wallFace3 = float3(0, 0, 0);

    if (isWall)
    {
        // Base wall position; per-face offsets push just outside the wall cell.
        float2 wallSamplePos = gridPosCell;
        float4 faceMask = GetWallFaceMask(wallCell);
        float2 cellMin = float2(wallCell);
        float2 cellMax = cellMin + 1.0;
        // Small epsilon pushes the ray target just outside the wall cell (aligns wall/ground shadows).
        const float faceEps = 1e-3;
        const float4 zero4 = float4(0, 0, 0, 0);

        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            // Face weights bias toward the wall face oriented to the light.
            float4 faceWeight = faceMask * facing;

            if (all(faceWeight == zero4))
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;

            // Each face gets its own DDA to the adjacent cell + proxy shadow.
            float4 faceShadow = float4(1, 1, 1, 1);
            if (faceWeight.x > 0.0)
            {
                float2 facePos = float2(cellMax.x + faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.x = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.y > 0.0)
            {
                float2 facePos = float2(cellMin.x - faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.y = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.z > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMax.y + faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.z = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.w > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMin.y - faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.w = min(shadowFactor, proxyShadowFactor);
            }

            float4 faceContribution = faceWeight * faceShadow;

            wallFace0 += directContribution * faceContribution.x;
            wallFace1 += directContribution * faceContribution.y;
            wallFace2 += directContribution * faceContribution.z;
            wallFace3 += directContribution * faceContribution.w;
        }

        // Emissives skip DDA but still light wall faces.
        for (int i = 0; i < _EmissiveLightCount; i++)
        {
            float2 lightGridPos = _EmissiveLightBuffer[i].position;
            float4 lightColor = _EmissiveLightBuffer[i].color;
            float lightRadius = _EmissiveLightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;
            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            float4 faceWeight = faceMask * facing;

            wallFace0 += directContribution * faceWeight.x;
            wallFace1 += directContribution * faceWeight.y;
            wallFace2 += directContribution * faceWeight.z;
            wallFace3 += directContribution * faceWeight.w;
        }
    }
    else
    {
        // Process each light for ground
        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(gridPos, lightGridPos);
            if (dist > lightRadius)
                continue;

            // Ground samples DDA to the GI texel center (not the wall boundary).
            float shadowFactor = CalculateShadowDDA(lightGridPos, gridPosCell, lightRadius);
            float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, gridPosCell);
            shadowFactor = min(shadowFactor, proxyShadowFactor);

            totalShadow = max(totalShadow, shadowFactor);

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff * shadowFactor;
            output.color.rgb += directContribution;
        }

        // Process emissive-only lights (no shadows)
        for (int i = 0; i < _EmissiveLightCount; i++)
        {
            float2 lightGridPos = _EmissiveLightBuffer[i].position;
            float4 lightColor = _EmissiveLightBuffer[i].color;
            float lightRadius = _EmissiveLightBuffer[i].radius;

            float dist
```
= distance(gridPos, lightGridPos);156 if (dist > lightRadius)157 continue;158 159 float falloff = 1.0 - saturate(dist / lightRadius);160 falloff = pow(falloff, _LightSoftness);161 162 output.color.rgb += lightColor.rgb * falloff;163 }164 }165 166 float shadowMask = isWall ? 0.0 : 1.0;↳Shadow map only stores visibility for non-wall texels.167 output.shadow = float4(totalShadow * shadowMask, 0.0, 0.0, 0.0);168 output.wallFace0 = float4(wallFace0, 1.0);169 output.wallFace1 = float4(wallFace1, 1.0);170 output.wallFace2 = float4(wallFace2, 1.0);171 output.wallFace3 = float4(wallFace3, 1.0);172 return output;173} A few things to note:
IsObstacle checks: In this path, IsObstacle needs to be wired to use _WallBuffer directly. The conduction mask is view-dependent, which might cause popping if padding is not high enough.
Light falloff: The pow(falloff, _LightSoftness) controls how “hard” or “soft” the light edge is. A value of 1.0 gives linear falloff; higher values create a sharper cutoff at the edge.
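As a quick CPU-side illustration of the falloff knob (a Python sketch mirroring the HLSL math, not the shader itself):

```python
def saturate(x):
    """HLSL-style saturate: clamp to [0, 1]."""
    return max(0.0, min(1.0, x))

def light_falloff(dist, light_radius, softness):
    """1 - saturate(d / r), raised to _LightSoftness."""
    falloff = 1.0 - saturate(dist / light_radius)
    return falloff ** softness

# softness = 1.0 is linear; higher values darken toward the edge faster.
print(light_falloff(5.0, 10.0, 1.0))   # 0.5
print(light_falloff(5.0, 10.0, 2.0))   # 0.25
print(light_falloff(12.0, 10.0, 1.0))  # 0.0 (outside the radius)
```

Halfway to the radius, linear falloff gives 0.5 while softness 2.0 already drops to 0.25, which is why higher values read as a harder edge.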
Additive accumulation: Multiple lights simply add together. Two overlapping red lights make a brighter red. A red and blue light make purple (well, magenta). This isn’t physically based, but it behaves intuitively and looks natural for games.
Directional Lights: The Sun Problem
Point lights are straightforward: position, radius, done. But directional lights (sun, moon) have no position. They have a direction and cast shadows based on angle.
We handle these in a separate pass with their own data structure:
```hlsl
struct VM_DirectionalLightData {
    float3 direction;               // Normalized light direction
    float4 color;
    float shadowDistanceMultiplier; // Pre-computed: 1.0 / max(abs(direction.y), 0.02)
    float baseShadowDistance;
    float bounceIntensity;
    float2 _padding;                // 16 byte alignment
};
```

Shadow length depends on sun angle. When the sun is directly overhead (direction.y ≈ -1), shadows are short. At sunset (direction.y ≈ 0), shadows stretch to infinity.
We pre-compute shadowDistanceMultiplier on the CPU:
```
maxShadowDistance = baseShadowDistance * shadowDistanceMultiplier
```

The DDA then traces from each pixel in the light direction, checking against a fixed _WallHeight (uniform wall height) to approximate long shadows. The traversal is still 2D over the wall grid; _WallHeight and the sun angle just scale the effective shadow reach.
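The CPU-side pre-computation is tiny; here it is as a Python sketch (the 0.02 clamp is the one from the struct comment):

```python
def shadow_distance_multiplier(direction_y):
    """Shadows lengthen as the sun drops toward the horizon.
    The 0.02 clamp caps the multiplier at 50x near direction.y == 0."""
    return 1.0 / max(abs(direction_y), 0.02)

def max_shadow_distance(base, direction_y):
    return base * shadow_distance_multiplier(direction_y)

print(max_shadow_distance(8.0, -1.0))   # 8.0: overhead sun, short shadows
print(max_shadow_distance(8.0, -0.01))  # ~400: near-horizon sun, clamp kicks in
```

Without the clamp the multiplier would blow up to infinity at sunset, so the constant doubles as the system's "longest possible shadow" control.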
Here’s the condensed pass logic:
```hlsl
float DirectionalShadowDDA(float2 pixelPos, float3 lightDir,
    float baseShadowDistance, float shadowDistanceMultiplier)
{
    float3 rayDir3D = -lightDir;
    float2 rayDir = normalize(rayDir3D.xz);
    float rayDirXZLen = length(rayDir3D.xz);
    if (rayDirXZLen < 1e-4) return 1.0;

    const float DIR_EPS = 1e-6;
    const float RAY_BIAS = 1e-4;
    float2 startPos = pixelPos + rayDir * RAY_BIAS;
    int2 cell = int2(floor(startPos));
    int2 step = int2(sign(rayDir));
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    float2 tMax;
    tMax.x = xZero ? 1e30 :
        (step.x > 0 ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
                    : (startPos.x - floor(startPos.x)) / -rayDir.x);
    tMax.y = yZero ? 1e30 :
        (step.y > 0 ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
                    : (startPos.y - floor(startPos.y)) / -rayDir.y);

    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float maxDist = baseShadowDistance * shadowDistanceMultiplier;
    int maxSteps = 2 * (_GridWidth + _GridHeight);

    for (int i = 0; i < maxSteps; i++)
    {
        if (BlocksDirectionalCell(cell, startPos, rayDir3D, rayDirXZLen))
            return 0.0;

        if (tMax.x < tMax.y) { tMax.x += tDelta.x; cell.x += step.x; }
        else                 { tMax.y += tDelta.y; cell.y += step.y; }

        float traveled = min(tMax.x, tMax.y);
        if (traveled > maxDist) return 1.0;

        int bufferX = cell.x + _GridOffsetX;
        int bufferY = cell.y + _GridOffsetY;
        if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
            return 1.0;
    }
    return 1.0;
}

float3 direct = 0;
float totalShadow = 0;
for (int i = 0; i < _DirectionalLightCount; i++)
{
    VM_DirectionalLightData lightData = _DirectionalLightBuffer[i];
    float3 lightDir = lightData.direction;
    float4 lightColor = lightData.color;
    float shadowDistanceMultiplier = lightData.shadowDistanceMultiplier;
    float baseShadowDistance = lightData.baseShadowDistance;

    float visibility = (lightDir.y >= 0.0) ? 0.0
        : DirectionalShadowDDA(gridPosCell, lightDir, baseShadowDistance, shadowDistanceMultiplier);
    visibility = min(visibility, ComputeDirectionalProxyShadow(gridPosCell, lightDir));

    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);

    totalShadow = max(totalShadow, visibility);
    direct += lightColor.rgb * angle * visibility;
    // Wall faces use the same face-weight accumulation as point lights,
    // but their shadow rays start just outside each face to avoid self-occlusion.
}
output.color.rgb += direct;
output.shadow.g = totalShadow * shadowMask;
```

The height test is a simple wall-height check against the ray:
```hlsl
bool BlocksDirectionalCell(int2 cell, float2 startPos, float3 rayDir3D, float rayDirXZLen)
{
    if (!IsObstacle(cell.x, cell.y)) return false;

    float distanceToWall = length(float2(cell) + 0.5 - startPos);
    if (abs(rayDir3D.y) > 0.001 && rayDirXZLen > 1e-4)
    {
        float verticalRate = rayDir3D.y / rayDirXZLen;
        float rayHeightAtWall = distanceToWall * verticalRate;
        return rayHeightAtWall < _WallHeight; // uniform wall height
    }
    return true;
}
```

This pass uses additive blending for direct color and max blending for shadow visibility (G channel), so "any directional light wins." It also reads the existing wall-direct textures and adds its contribution before writing out the updated wall-direct RTs.
The angle-based brightness is blended by _GI_DirectionalAngleScale: 0 = full angle modulation, 1 = no angle modulation (flat intensity).
Emissive Lights (No Shadows)
Regular lights cast shadows. But what about glowing crystals, lava pools, or spell projectiles? These are emissive lights: they illuminate their surroundings but don't cast shadows.
We handle these separately:
```hlsl
struct VM_EmissiveLightData {
    float2 position;
    float4 color;
    float radius;
    float _padding; // 16 byte alignment
};
```

Emissives are uploaded into their own buffer and processed in the same pass as point lights. They add to direct lighting (ground) and wall face textures, but skip DDA shadows entirely. In my implementation, emissives bypass the maxLights cap.
Because emission is extracted from the direct texture right after this pass, emissive lights automatically seed indirect lighting too. Emissive color is pre-multiplied by intensity on the CPU, so the shader only needs position, radius, and color.
Shadow map encoding
The shadow texture uses this encoding:
- R channel: Point light visibility (max over point lights; 0 = none reach, 1 = at least one reaches)
- G channel: Directional light visibility (0 = blocked, 1 = lit)
- Combined: max(R, G) gives the quick "directly lit" hint
In the point-light pass we only write R (and set G to 0). The directional pass writes only G. We keep them separate because the sun and local lights behave differently, but the combined visibility is useful for diffusion heuristics. In HeatTransfer, we use max(R, G) to decide how aggressively to fill in shadows.
4. Indirect Light: Thermodynamics of Pixels
This is the part that produces the GI look: indirect bounce via diffusion.
For that, we’ll treat light as if it was heat.
Imagine a lit floor tile as a hot plate. Darkness is cold air. We run a simulation where heat (light energy) naturally flows from hot areas to cold areas, provided there’s a conductive medium (air, not walls) connecting them. Run this simulation enough times, and light “spreads” into shadowed corners just like real indirect illumination.

Step 1: Extract Emission
Before we can diffuse anything, we need to seed the heat buffer with “new energy.” We take the direct light texture and extract a percentage of it:
```hlsl
float4 directLight = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, uv);

// Non-linear extraction to prevent white hotspots
float maxChannel = max(directLight.r, max(directLight.g, directLight.b));
float curve = 1.0 - exp(-maxChannel * 2.0); // Exponential rolloff
float scaleFactor = _EmissionStrength * curve / (maxChannel + 0.001);

float4 emission = directLight * scaleFactor;

// Boost saturation to compensate for averaging
float luminance = dot(emission.rgb, float3(0.299, 0.587, 0.114));
emission.rgb = luminance + (emission.rgb - luminance) * 1.2;
```

We use a non-linear curve (1 - e^(-x)). Without it, bright light sources would create white hotspots that dominate the scene. The exponential rolloff compresses bright values while preserving color in mid-tones.
The saturation boost at the end counteracts the desaturation that happens when you average colors repeatedly. Without it, orange torchlight turns into muddy beige after a few diffusion iterations.
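To see both effects at once, here is a Python port of the extraction math (the emission_strength value is a placeholder, not my tuned setting):

```python
import math

def extract_emission(rgb, emission_strength=0.5):
    """Exponential rolloff (1 - e^-2x) plus a 1.2x saturation boost,
    mirroring the extraction shader above."""
    max_channel = max(rgb)
    curve = 1.0 - math.exp(-max_channel * 2.0)          # compresses bright values
    scale = emission_strength * curve / (max_channel + 0.001)
    emission = [c * scale for c in rgb]

    # Boost saturation to counter the desaturation of repeated averaging.
    lum = 0.299 * emission[0] + 0.587 * emission[1] + 0.114 * emission[2]
    return [lum + (c - lum) * 1.2 for c in emission]

bright = extract_emission([4.0, 4.0, 4.0])   # a hot white source
warm = extract_emission([0.2, 0.1, 0.05])    # dim torch-orange
```

A 4.0-bright source extracts to well under emission_strength x 4.0 (the rolloff tames the hotspot), while the warm input keeps its red > green > blue ordering after the boost. Note the boost can push very dark channels slightly negative, so clamping afterward is a reasonable addition.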
Extraction runs once per frame; the heat transfer loop then iterates on that seed (we don’t inject new emission every iteration).
Step 2: The Ping-Pong Diffusion
HeatTransfer.shader runs in a ping-pong loop: read from texture A, write to texture B, swap, repeat.
```hlsl
// Standard 8-neighbor offsets
static const float2 offsets[8] = {
    float2(-1, -1), float2(0, -1), float2(1, -1),
    float2(-1,  0),                float2(1,  0),
    float2(-1,  1), float2(0,  1), float2(1,  1)
};

// Diagonal neighbors get less weight (1 / sqrt(2))
static const float weights[8] = {
    0.7071, 1.0, 0.7071,
    1.0,         1.0,
    0.7071, 1.0, 0.7071
};
```
- The neighbor’s conductivity (is it air or wall?)
- The current pixel’s conductivity
- The distance weight
- The global diffusion rate
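Stripped of texture plumbing, that flow rule reduces to a few lines. This is a minimal CPU reference in Python (scalar heat, a fixed 0.5 blend for brevity; the real pass works on RGB in UV space with a shadow-dependent blend):

```python
# 8-neighbor offsets and 1/sqrt(2)-weighted diagonals, as in the shader.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
WEIGHTS = [0.7071, 1.0, 0.7071, 1.0, 1.0, 0.7071, 1.0, 0.7071]

def diffuse(heat, cond, rate=0.8):
    """One diffusion round over a small grid with a conduction mask."""
    h, w = len(heat), len(heat[0])
    out = [row[:] for row in heat]
    for y in range(h):
        for x in range(w):
            if cond[y][x] < 1e-4:
                continue  # walls neither give nor receive heat
            accum = wsum = 0.0
            for (dx, dy), wgt in zip(OFFSETS, WEIGHTS):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h):
                    continue
                # Flow requires both endpoints to be conductive.
                path = cond[ny][nx] * cond[y][x] * wgt * rate
                accum += heat[ny][nx] * path
                wsum += path
            if wsum > 1e-4:
                out[y][x] = 0.5 * heat[y][x] + 0.5 * (accum / wsum)
    return out

# Lit left column, wall column in the middle: heat never crosses the wall.
after = diffuse([[1.0, 0.0, 0.0]] * 3, [[1.0, 0.0, 1.0]] * 3)
```

After one round the right column is still 0 (the wall column zeroes every path into it), while the lit column keeps its energy.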
Diffusion update equation
Here’s the iteration skeleton for each heat spread round:
```hlsl
float4 centerHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, uv);
float centerCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, uv).r;
if (centerCond < 1e-4) return centerHeat;

float4 shadowInfo = SAMPLE_TEXTURE2D(_ShadowMap, sampler_ShadowMap, uv);
float directVisibility = max(shadowInfo.r, shadowInfo.g);
float inShadow = 1.0 - directVisibility;

CamBounds bounds = GetGlobalCameraViewData();
float2 worldToUV = float2(1.0 / bounds.size.x, 1.0 / bounds.size.y);
float2 sampleDistance = _DiffusionDistance * worldToUV;

float3 accum = 0.0;
float3 wsum = 0.0;

// Loop neighbors (tiered if adaptive). Offsets + distanceWeight are precomputed.
for (int i = 0; i < 8; i++)
{
    float2 nUV = uv + offsets[i] * sampleDistance;
    float4 nHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, nUV);
    float nCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, nUV).r;

    float weight = nCond * centerCond * weights[i] * _DiffusionRate;

    // Stylized per-channel (see next section)
    float3 channelPresence = saturate(nHeat.rgb + 0.1);
    float3 channelWeights = weight * channelPresence;
    accum += nHeat.rgb * channelWeights;
    wsum += channelWeights;
}

accum /= max(wsum, 1e-4);

float blend = lerp(0.5, 0.8, inShadow) * _DiffusionRate;
float3 newHeat = lerp(centerHeat.rgb, accum, blend);

// Optional dark-area bias (keeps deep shadows from staying dead)
float centerLum = dot(centerHeat.rgb, float3(0.299, 0.587, 0.114));
float darkness = 1.0 - saturate(centerLum);
newHeat *= 1.0 + _RangeBoost * darkness * inShadow;

newHeat *= _IntensityMultiplier;

// Soft cap
float maxChannel = max(newHeat.r, max(newHeat.g, newHeat.b));
if (maxChannel > 2.0)
    newHeat *= 2.0 / maxChannel;

return float4(newHeat, centerHeat.a);
```

Per-Channel Diffusion (Stylized)
Here’s a subtle but important detail. Early versions of the system diffused all channels together. Red, green, and blue moved as one. That’s the physically-based approach, and it already mixes colors correctly. But it tends to wash out saturation in pixel art.
What I ended up using is a stylized per-channel diffusion: each channel “pushes through” independently and is normalized separately. This keeps saturated colors punchier and makes color bleeding read better at low resolution. It is not physically based, it’s an artistic control knob.
```hlsl
// Each channel can spread independently
float3 channelPresence = saturate(neighborHeat.rgb + 0.1);
float3 channelWeights = pathConductivity * channelPresence;

// Accumulate each channel independently
accumulatedHeat.r += neighborHeat.r * channelWeights.r;
accumulatedHeat.g += neighborHeat.g * channelWeights.g;
accumulatedHeat.b += neighborHeat.b * channelWeights.b;

totalWeightPerChannel += channelWeights;

// Later: normalize each channel independently
if (totalWeightPerChannel.r > 0.001)
    accumulatedHeat.r /= totalWeightPerChannel.r;
```

The channelPresence bias (+ 0.1) ensures that even dark areas can receive color. Otherwise, black pixels would never pick up any light.
If you want the physically-based version, keep the weights scalar and accumulate RGB as a vector:
```hlsl
float3 accum = 0.0;
float wsum = 0.0;
accum += neighborHeat.rgb * pathConductivity;
wsum += pathConductivity;
float3 newHeat = (wsum > 1e-4) ? (accum / wsum) : centerHeat.rgb;
```

Solving the Resolution Problem: Hierarchical Sampling
At high GI resolutions (e.g., giResolution = 16), a single world-space hop spans many GI texels (skipping over them), while texel-scale hops alone would shrink the world-space reach. The result is either blocky diffusion or very slow spread.
But if we increased the sampling distance in GI texels, we got “grid artifacts” (black dots) where sparse taps miss nearby lit texels, leaving isolated unlit holes.
To prevent this, we use Adaptive Hierarchical Sampling. The shader detects when the GI-texel distance is large and switches to a three-tier approach:
```hlsl
float samplingPixelDistance = _DiffusionDistance * pixelsPerWorldUnit;
bool useAdaptiveSampling = samplingPixelDistance > 2.5;

if (useAdaptiveSampling)
{
    // Tier 1: Near-field (30%) - local smoothness
    for (int i = 0; i < 8; i++) {
        float2 nearSampleUV = viewportUV + offsets[i] * texelSize * 1.5;
        // ... accumulate with 30% weight
    }

    // Tier 2: Mid-field (30%) - bridge the gap
    // sampleDistance is the UV offset corresponding to _DiffusionDistance in
    // world units (i.e., a world-space hop expressed in GI texture UVs).
    for (int j = 0; j < 8; j++) {
        float2 midSampleUV = viewportUV + offsets[j] * sampleDistance * 0.5;
        // ... accumulate with 30% weight
    }

    // Tier 3: Far-field (40%) - long-range transport
    for (int k = 0; k < 8; k++) {
        float2 farSampleUV = viewportUV + offsets[k] * sampleDistance;
        // ... accumulate with 40% weight
    }
}
```

- Near-Field (30%): Samples at 1.5 GI texels, fills immediate gaps
- Mid-Field (30%): Samples at 50% world distance, bridges near and far
- Far-Field (40%): Samples at full world distance, long-range light transport
This allows light to travel across larger distances without needing hundreds of iterations (far-field) while maintaining smooth gradients (near-field).
The 30 / 30 / 40 split wasn’t analytically derived. It came from iterative tuning with two constraints:
- Near-field samples must be strong enough to kill holes
- Far-field samples must dominate transport distance (that’s the point of hierarchical sampling)
The mid-field tier exists purely to smooth the transition between those two regimes. Its weight is the least sensitive; it mainly prevents visible “bands” where near and far contributions meet.
One caveat: the conduction check is endpoint-only (center + neighbor). With far-tier samples, that can jump across thin walls if both endpoints are air. If that shows up, the fix is straightforward: only allow far-tier samples when the straight segment is unobstructed. A cheap version is to take 3-6 steps along the segment and multiply conductivities; if any step hits a wall, zero out that far sample.
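A Python sketch of that segment check (cond_at is a hypothetical sampler returning 0 inside walls and 1 in air; the names are illustrative):

```python
def segment_conductivity(cond_at, p0, p1, steps=5):
    """March a handful of interior points along the segment and multiply
    conductivities; any wall along the way zeroes the far-field tap."""
    total = 1.0
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        x = p0[0] + (p1[0] - p0[0]) * t
        y = p0[1] + (p1[1] - p0[1]) * t
        total *= cond_at(x, y)
        if total == 0.0:
            return 0.0  # early out: this far sample is blocked
    return total

# One-texel-wide wall centered at x = 5 blocks a hop from x = 2 to x = 8:
wall = lambda x, y: 0.0 if abs(x - 5.0) < 0.5 else 1.0
print(segment_conductivity(wall, (2.0, 0.0), (8.0, 0.0)))  # 0.0
print(segment_conductivity(wall, (2.0, 0.0), (4.0, 0.0)))  # 1.0
```

With too few steps a thin or diagonal wall can still slip between samples, so tune the step count against your minimum wall thickness and maximum hop length.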
Conduction in Action
Crucially, every diffusion sample multiplies by the Conduction Mask:
```hlsl
float neighborConductivity = SAMPLE_TEXTURE2D(_ConductionMask, ...).r;
float pathConductivity = neighborConductivity * centerConductivity * weight;
```

If either the source or destination is a wall (conductivity 0), no heat flows. For near-field taps this is enough to respect walls and doorways within the simulated window. For far-tier taps, either accept some leakage as a tradeoff or add a cheap segment check (described above) to prevent hopping across thin walls.
The Blur Pass (Optional)
Depending on your settings, you might see subtle grid patterns, especially at lower resolutions. An optional two-pass Gaussian blur can smooth these artifacts:
```hlsl
// Edge-aware: only blur between similar surfaces
bool centerIsObstacle = centerConductivity < 0.1;
bool neighborIsObstacle = neighborConductivity < 0.1;

if (centerIsObstacle == neighborIsObstacle)
{
    // Same surface type - include in blur
    color += sampleColor * weights[i];
    totalWeight += weights[i];
}
```

The blur is conductivity-aware: it won't smear light across wall boundaries. We also apply brightness preservation to prevent the blur from darkening the image.
WallIndirect pass: projecting bounce onto walls
Indirect light only exists in conductive cells. Wall cells are black in the indirect buffer because the conduction mask blocks diffusion there. That is correct for ground, but it means wall faces would read zero if we sampled _IndirectLightColor directly.
The WallIndirect pass is a tiny projection pass that turns the indirect buffer into four wall-face textures, mirroring how direct lighting already works. It runs after heat transfer (and after blur if enabled), reads the final indirect texture, and for each wall texel copies the indirect value from the adjacent air cell into the matching face output.
Implementation details:
- Fullscreen pass with 4 MRT outputs: _WallIndirect_PosX/_NegX/_PosZ/_NegZ.
- Uses the wall buffer to detect wall cells and exposed faces (stable, not view-dependent).
- Samples at GI texel centers to avoid jitter.
```hlsl
float2 gridPos = uvToGrid(input.uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

int2 cell = int2(floor(gridPos));
if (!IsObstacle(cell.x, cell.y))
    return output; // Not a wall

float4 faceMask = GetWallFaceMask(cell);

float3 posX = faceMask.x > 0.0 ? SampleIndirectAt(gridPosCell + float2( 1, 0)) : 0;
float3 negX = faceMask.y > 0.0 ? SampleIndirectAt(gridPosCell + float2(-1, 0)) : 0;
float3 posZ = faceMask.z > 0.0 ? SampleIndirectAt(gridPosCell + float2(0,  1)) : 0;
float3 negZ = faceMask.w > 0.0 ? SampleIndirectAt(gridPosCell + float2(0, -1)) : 0;

output.wallFace0 = float4(posX, 1.0);
output.wallFace1 = float4(negX, 1.0);
output.wallFace2 = float4(posZ, 1.0);
output.wallFace3 = float4(negZ, 1.0);
```

SampleIndirectAt maps world to UVs, clamps, and snaps to the texel center before sampling, so results are stable under phase-locked camera motion. The output textures are later read by SampleWallIndirect in GI_Lib.hlsl when a wall face is shaded. Because this is just a neighbor copy, it is cheap and it inherits whatever diffusion, blur, and shadow handling already exists in the indirect buffer.
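The texel-center snap is worth spelling out, since it is what keeps samples from shimmering. An illustrative Python port (function name is mine, not from the codebase):

```python
import math

def snap_uv_to_texel_center(u, v, tex_w, tex_h):
    """Quantize a UV to the center of its texel so that any position inside
    the same GI texel lands on the same sample point."""
    cu = (math.floor(u * tex_w) + 0.5) / tex_w
    cv = (math.floor(v * tex_h) + 0.5) / tex_h
    return (cu, cv)

# Two different UVs inside the same 8x8 texel snap to one stable point:
print(snap_uv_to_texel_center(0.501, 0.25, 8, 8))  # (0.5625, 0.3125)
print(snap_uv_to_texel_center(0.56, 0.27, 8, 8))   # (0.5625, 0.3125)
```

Combined with phase-locked camera movement, this guarantees the same world cell always reads the same texel, frame to frame.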
Combining Everything
Direct + indirect are merged in GI_Lib.hlsl at sample time, where we also apply AO, conductivity, and tone mapping:
Note: ground lighting is masked by conductivity (walls → 0). Wall faces bypass conduction and are combined per face. AO stays on the ground path; if you want AO on walls, inject it into the wall sampling path.
Tone mapping happens in sampling, and we support multiple modes:
- None: Linear output (for HDR displays or post-processing)
- Reinhard: Simple color / (1 + color)
- Reinhard with White Point: Prevents highlights from clamping to white
- ACES: Film-like response curve
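For reference, here are the first two modes in Python. The white-point variant shown is the common extended-Reinhard formula; my shader's exact curve may differ slightly:

```python
def reinhard(c):
    """Simple Reinhard: asymptotically approaches 1.0, never reaches it."""
    return c / (1.0 + c)

def reinhard_white(c, white_point=2.0):
    """Extended Reinhard: inputs at white_point map exactly to 1.0,
    so controlled highlights can still hit full white."""
    return c * (1.0 + c / (white_point * white_point)) / (1.0 + c)

print(reinhard(1.0))        # 0.5
print(reinhard_white(2.0))  # 1.0
```

The practical difference: with plain Reinhard a value of 2.0 only reaches ~0.67, which is why bright highlights look dull; the white-point version lets you pick where "full white" lands.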
The bounceIntensity parameter on directional lights lets you control how bright shadows appear when the sun is out. A low value means harsh shadows; a high value fills them with more bounce light.
Sampling in the Lit shader
Once the GI textures exist, the Lit material has to decide which texture to sample and where. We treat surfaces as one of three types:
- Ground: horizontal surfaces that are not part of a wall cell.
- Wall faces: vertical surfaces of wall cells (+X, -X, +Z, -Z faces).
- Wall tops: the horizontal top surface of a wall cell.
All of this routing lives in GI_Lib.hlsl and is exposed to Shader Graph through subgraphs that sample the global textures. The core decision tree looks like this:
```hlsl
float3 CalculateAverageLightBrightness(float3 worldPos, float3 worldNormal)
{
    if (ShouldUseWallLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallDirect(worldPos, worldNormal)
                        + SampleWallIndirect(worldPos, worldNormal);
        return ApplyToneMap(combined);
    }

    if (ShouldUseWallTopLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallTopDirect(worldPos)
                        + SampleWallTopIndirect(worldPos);
        return ApplyToneMap(combined);
    }

    float3 direct = SampleGITextureAtWorldPos(_DirectLightColor, sampler_DirectLightColor, worldPos, worldNormal);
    float3 indirect = SampleGITextureAtWorldPos(_IndirectLightColor, sampler_IndirectLightColor, worldPos, worldNormal);
    indirect = ApplyDirectionalBounce(indirect, worldPos.xz);

    float3 combined = (direct + indirect);
    combined *= SampleGIAmbientOcclusion(worldPos.xz);
    combined *= SampleConduction(worldPos.xz);
    return ApplyToneMap(combined);
}
```

Ground sampling
If the surface is not a wall and not a wall top, we sample _DirectLightColor and _IndirectLightColor separately and combine them (with AO + conduction + tone map) in CalculateAverageLightBrightness. The low-level helper still just maps world XZ → UV and samples a GI texture, so ground lighting stays smooth and continuous.
```hlsl
float3 SampleGITextureAtWorldPos(Texture2D giTexture, SamplerState giSampler, float3 worldPos, float3 worldNormal)
{
    // Walls get routed elsewhere; ground samples directly.
    float2 uv = gridToViewportUV(worldPos.xz);
    uv = snapUVToTexelCenter(uv);

    return SAMPLE_TEXTURE2D(giTexture, giSampler, uv).xyz;
}
```

Wall faces (vertical surfaces)
ShouldUseWallLighting checks two things:
- The normal has a strong XZ component (so it’s a vertical surface).
- The sample position corresponds to a wall cell (via the conduction mask).
```hlsl
bool ShouldUseWallLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Treat surfaces with significant XZ normal component as walls (even if slightly tilted).
    float2 nXZ = worldNormal.xz;
    if (length(nXZ) < 0.5)
        return false;

    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);

    // IsWallAtWorld samples the conduction mask at the corresponding UV and thresholds it.
    return IsWallAtWorld(samplePos) || IsWallAtWorld(worldPos.xz);
}
```

When it's a wall, we sample from four wall face textures (_WallDirect_PosX/_NegX/_PosZ/_NegZ and _WallIndirect_*). Because the world is voxelized and walls are axis-aligned, I pick a dominant face instead of blending across edges. We still offset the sample by half a GI texel along the normal so we land in the adjacent conductive texel instead of the wall boundary.
```hlsl
int GetDominantWallFace(float3 worldNormal)
{
    float2 n = worldNormal.xz;
    float2 absN = abs(n);

    if (absN.x >= absN.y)
        return n.x >= 0 ? 0 : 1; // +X / -X
    return n.y >= 0 ? 2 : 3;     // +Z / -Z
}
```

```hlsl
float3 SampleWallDirect(float3 worldPos, float3 worldNormal)
{
    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);
    float2 uv = gridToViewportUV(samplePos);
    uv = snapUVToTexelCenter(uv);

    int face = GetDominantWallFace(worldNormal);
    if (face == 0) return SAMPLE_TEXTURE2D(_WallDirect_PosX, sampler_WallDirect_PosX, uv).rgb;
    if (face == 1) return SAMPLE_TEXTURE2D(_WallDirect_NegX, sampler_WallDirect_NegX, uv).rgb;
    if (face == 2) return SAMPLE_TEXTURE2D(_WallDirect_PosZ, sampler_WallDirect_PosZ, uv).rgb;
    return SAMPLE_TEXTURE2D(_WallDirect_NegZ, sampler_WallDirect_NegZ, uv).rgb;
}
```

Wall tops (horizontal surfaces on a wall cell)
ShouldUseWallTopLighting checks for a mostly-upward normal and verifies the cell beneath is a wall. For tops, I keep it intentionally simple:
- Direct = the URP main directional light color, scaled by sun angle (blendable via _GI_DirectionalAngleScale).
- Indirect = a 3x3 average of nearby non-wall GI samples (so point lights still influence tops, but shadows don't cut into them).
This look for wall tops might not fit your game and is very much an artistic choice. Modify as needed.
```hlsl
bool ShouldUseWallTopLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Top faces should be mostly upward with minimal XZ component.
    float2 nXZ = worldNormal.xz;
    if (worldNormal.y < 0.5 || length(nXZ) > 0.35)
        return false;

    float2 cellCenter = floor(worldPos.xz) + 0.5;
    return IsWallAtWorld(cellCenter);
}
```

```hlsl
float3 SampleWallTopDirect(float3 worldPos)
{
    // _GI_MainDirectionalColor and _GI_MainDirectionalDir are set based on
    // URP's lightData.mainLightIndex.
    float3 lightDir = normalize(_GI_MainDirectionalDir.xyz);
    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);
    return _GI_MainDirectionalColor.rgb * angle;
}

float3 SampleWallTopIndirect(float3 worldPos)
{
    return SampleGroundAverage(_IndirectLightColor, sampler_IndirectLightColor, worldPos.xz);
}
```

Direct and indirect sampling mirror the same routing (SampleDirectLightColor / SampleIndirectLightColor), just against _DirectLightColor / _IndirectLightColor and the wall-specific textures.
Optimizations
These are the major optimizations that can be done to this system, but were out of scope for the article:
- Rendering the heat spread pass at half or quarter resolution then upscaling can speed up the pass massively, with slight visual differences. Blur can alleviate any patterns that emerge.
- Compute version of Heat Spread. In my implementation this resulted in sizeable performance gains. The logic is the same as the fragment shader.
- Caching: in my implementation the whole pipeline is skipped unless a cache key changes: configHash + visibleLightHash + wallBufferVersion + emissiveVersion + shadowProxyVersion. This is coarse but effective with phase-locked camera movement. If you want finer granularity, split the invalidation by pass (e.g., skip conduction/AO when only lights change).
- Wall Face Culling for Fixed Cameras: if your game uses a fixed camera angle (e.g., isometric), the player may never see certain wall faces (for example, the -X and +Z faces might always be occluded). You can hard-code the shader to skip calculating these directions entirely.
- Directional lights pass: I’ll admit, this was an afterthought. A DDA-style “ray per pixel” directional pass is very expensive (as the benchmarks show) and a conventional orthographic shadow map (depth render + sample) would likely be much faster while looking visually similar. If I revisit this, I’ll update the article or publish a Part 2.
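The caching idea from the list above fits in a few lines. A Python sketch (the actual implementation is C# in the render feature; the hashed field names mirror the cache-key list, and the hashing scheme here is illustrative):

```python
def cache_key(config_hash, visible_light_hash, wall_version,
              emissive_version, proxy_version):
    """Combine everything that can change the GI result into one key."""
    return hash((config_hash, visible_light_hash, wall_version,
                 emissive_version, proxy_version))

class GIPipeline:
    def __init__(self):
        self._last_key = None
        self.renders = 0

    def render(self, key):
        if key == self._last_key:
            return  # nothing changed: reuse last frame's GI textures
        self._last_key = key
        self.renders += 1  # stand-in for running the full pass chain

p = GIPipeline()
k = cache_key(1, 42, 7, 0, 3)
p.render(k)
p.render(k)                          # skipped: identical key
p.render(cache_key(1, 43, 7, 0, 3))  # a light changed -> re-render
print(p.renders)  # 2
```

The coarseness is the point: with a static scene and phase-locked camera motion, entire frames skip the GI pipeline outright.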
Performance data
These measurements are from a standalone release build, captured via NVIDIA Nsight Perf SDK (NVPerf) using the gpu__time_duration.sum hardware counter. Internal game resolution was 480x270 (30x16.875 tiles) on a 3060 Mobile GPU with Linux + Vulkan.
Methodology: All frame captures were done in the same process (per GPU) using console commands to change GI options. They were all taken in the same world position, with walls and shadow proxies (38 of them) present in the scene. All lights were visible and frame caching was disabled. No wall faces were culled. Lights had a radius of 10 and were spread out.
A compute version of heat spread was used.
A camera padding of 1.4 was used, along with 20 heat spread rounds for all captures. All other settings were the same for all captures.
Full, Half and Quarter refer to the resolution at which Heat Spread was performed, then upsampled.
GI Res 16: Full = 768x448, Half = 384x224, Quarter = 192x112
GI Res 8: Full = 384x224, Half = 192x112, Quarter = 96x56

Small passes (omitted from the tables): ConductionMask, AO, ExtractEmission, WallIndirect. In these runs they sum to ~0.04 to 0.11 ms on the 3060 and ~0.08 to 0.66 ms on Intel. Totals below include them.
Point lights are shadow-casting point lights. Emissive and directional counts are listed separately.
giResolution = 16 (RTX 3060 Mobile)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| full | 1 | 0 | 0 | 0.09 | 0.06 | 1.14 | 0.08 | 1.46 |
| full | 1 | 0 | 1 | 0.10 | 1.97 | 1.29 | 0.08 | 3.54 |
| full | 1 | 256 | 0 | 0.45 | 0.06 | 1.31 | 0.09 | 2.01 |
| full | 32 | 256 | 0 | 1.74 | 0.06 | 1.18 | 0.08 | 3.16 |
| full | 64 | 256 | 0 | 3.09 | 0.05 | 1.21 | 0.08 | 4.53 |
| half | 64 | 256 | 0 | 3.03 | 0.05 | 0.44 | 0.05 | 3.68 |
| quarter | 64 | 256 | 0 | 3.06 | 0.05 | 0.24 | 0.05 | 3.51 |
giResolution = 8 (RTX 3060 Mobile)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| full | 64 | 256 | 0 | 1.01 | 0.02 | 0.43 | 0.03 | 1.53 |
| half | 64 | 256 | 0 | 0.98 | 0.02 | 0.23 | 0.02 | 1.30 |
| quarter | 64 | 256 | 0 | 0.98 | 0.02 | 0.17 | 0.03 | 1.24 |
* DDA_Directional shows the directional-light pass. With 0 directional lights this is mostly overhead.
Integrated GPU (Intel UHD Graphics (TGL GT1))
I also ran the game on my integrated GPU (Intel UHD Graphics (TGL GT1), i7-11800H CPU). It struggles with giResolution = 16 even at quarter heat res, but at giResolution = 8 we get decent frame times with low light counts.
giResolution = 16 (Quarter)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| quarter | 1 | 64 | 0 | 3.51 | 0.34 | 2.48 | 0.37 | 7.37 |
| quarter | 32 | 64 | 0 | 35.12 | 0.35 | 4.90 | 0.31 | 41.34 |
giResolution = 8 (Quarter)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| quarter | 1 | 64 | 0 | 0.94 | 0.07 | 0.20 | 0.11 | 1.52 |
| quarter | 32 | 64 | 0 | 9.57 | 0.07 | 1.74 | 0.10 | 11.56 |
What we can learn from this
- DDA cost scales with light count and GI resolution. Emissive lights are significantly cheaper, but not free (giRes 16 full: 1 point / 0 emissive = 0.09 ms vs 1 point / 256 emissive = 0.45 ms).
- Heat Spread cost scales with texture size. Downsampling gives a big win: 1.21 -> 0.44 -> 0.24 ms at giRes 16 (64 point / 256 emissive), with only minor visual differences when paired with blur.
- The directional-light pass is expensive here: a single directional light adds ~1.9 ms of DDA time at giRes 16 full.
- On Intel, light count becomes the bottleneck fast. giRes 8 quarter is fine at low counts, but giRes 16 quarter collapses with 32 point lights. This performance profile hints at bandwidth limitations, but I didn't dive into the counters to confirm. Packing MRTs and re-using textures more aggressively may improve it.
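To make the "DDA scales with light count" point concrete, here's a small back-of-the-envelope sketch (plain Python, nothing Unity-specific) that derives marginal per-light costs from the giRes 16 full-resolution rows above. The numbers are copied straight from the tables; the per-light figures it prints are estimates, not measurements.

```python
# DDA timings (ms) from the giResolution = 16 full-res rows, 256 emissive lights,
# keyed by shadow-casting point-light count. Copied from the table above.
dda_ms = {1: 0.45, 32: 1.74, 64: 3.09}

# Marginal cost per shadow-casting point light: slope between the extreme rows.
per_point = (dda_ms[64] - dda_ms[1]) / (64 - 1)

# Emissive lights: 1 point / 0 emissive measured 0.09 ms,
# 1 point / 256 emissive measured 0.45 ms.
per_emissive = (0.45 - 0.09) / 256

print(f"~{per_point * 1000:.0f} us per point light")
print(f"~{per_emissive * 1000:.1f} us per emissive light")
print(f"point/emissive cost ratio: ~{per_point / per_emissive:.0f}x")
```

The mid-row check agrees: (1.74 - 0.45) / 31 and (3.09 - 1.74) / 32 both land around 0.042 ms, which is why the scaling reads as linear in light count, with emissive lights roughly 30x cheaper per light than shadow-casting ones.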
Closing thoughts
Phew, that was a long one! This was a very challenging topic to tackle with limited resources available, so I decided to pack as much information here as possible. Allow me to break the "engineer" persona for a moment: this was frustrating and painful at times, and I became a little obsessed with it, but it was also extremely fun and taught me a lot. I truly hope this helps someone keep their hair attached to their head. If you do something cool with this, let me know! I'd love to hear about it!
There might be techniques I've missed, and there are surely things I can improve, but I'm happy with where I landed with this system. Feedback and suggestions are always appreciated, and questions are welcome. You can reach me on Twitter @gincodes or through e-mail at [email protected].
If this write-up made you think “we should hire this person”, I’m currently looking to switch from a 10-year detour in back-end engineering into full-time game dev. Reach me at: [email protected]