Pixel-perfect GI Without Ray Tracing

Gin ·
global-illumination custom-light game-dev

Hi! I’m building Anchorfall with a visual style inspired by Core Keeper: chunky pixel art with dynamic lighting, sharp shadows, and real-time global illumination.

To achieve that, I built a custom lighting pipeline that handles direct light, indirect light (GI), and pixel-perfect shadows while giving strong artistic control. It isn’t physically accurate, but it fits pixel art games well.

A final render from Anchorfall (art not final). White-texture version here

It’s inspired by Core Keeper’s own lighting system (which uses heat diffusion in its pipeline) and David Maletz’s work on I Can’t Escape: Darkness. His blog post was the foundation for the indirect lighting approach.

This system is the result of ~8 months of iteration, and easily the steepest rendering learning curve I’ve put myself through. It was initially an indirect-only system, but merging URP’s own light/shadow with it proved brittle and wasteful (we need direct light to seed indirect anyway). I wanted something that looked great with pixel art, performed well, and had none of the usual artifacts of modern non-baked GI, like temporal sampling noise or ghost trails, while still allowing artistic choices like dithering, posterization, palette swaps, etc.

I’m really happy with where this landed. Despite the length of this article, the system will likely keep growing to support more artistic control, but even as it stands now it’s a solid system that feels production-worthy.

This is implemented in Unity (URP + RenderGraph). Code snippets are in C# and HLSL.

Overview

The light pipeline

From a high level, here is how the system works:

Conduction mask: black pixels represent geometry that blocks light.
Direct light: blue lights are emissives; the smaller shadows are shadow proxies.
Indirect light: the result after several heat-spread rounds.
Ambient occlusion + direct + indirect combined.
Final rendered frame (with white textures, and with full art).

Pass Graph + Texture Formats

I’m not going to walk through RenderGraph wiring, but here’s the minimal pass order + texture contract this system assumes. If you follow this, you’ll match the behavior in the shaders.

Update _CameraView (phase-locked)
ConductionMask         _ConductionMask (global) (R8)
AmbientOcclusion       _AmbientOcclusion (global) (R8) [optional]
Direct Point + Emissive  -> _DirectLightColor (global) (RGBA16F)
                         -> _GI_ShadowMap (global) (R8G8: R=point, G=directional)
                         -> _WallDirect_[PosX/NegX/PosZ/NegZ] (global) (RGBA16F)
Directional Lights      -> add into _DirectLightColor
                         -> write _GI_ShadowMap.G (max blend)
                         -> update _WallDirect_* textures
ExtractEmission         (RGBA16F)
HeatTransfer ping-pong  _IndirectLightColor (global) (RGBA16F) + Buffer (RGBA16F), optional downsample
Optional Blur           Writes to _IndirectLightColor (RGBA16F)
WallIndirect            -> WallIndirect_PosX/NegX/PosZ/NegZ (RGBA16F)
Sample in Lit shader    (direct + indirect + AO + conduction + tone map)

All textures are linear (sRGB off), no mips, no MSAA, point + clamp.

Implementation detail: all of these are RTHandles. They persist across frames and are reallocated when the GI window size changes (RenderGraph ReAllocateHandleIfNeeded). Heat transfer uses ping-pong buffers (and an optional downsample + upsample path).

Directional, AO and blur passes are optional.

Requirements and compromises

This pipeline makes some strong assumptions to stay fast and stable:

1. Occlusion representation: The Conduction Mask

In a typical 3D engine, light is blocked by geometry (meshes). In our world, the “truth” of the level geometry lives in a StructuredBuffer<int> called the _WallBuffer. This is a flat array of integers representing our 1x1 world tiles: 0 for empty, 1 for wall.

To make this data accessible to our pixel shaders, we first render a Conduction Mask. This is a camera-aligned texture (R8) that tells the lighting engine where light is allowed to exist.

The ConductionMask.shader is deceptively simple. It maps the fullscreen UVs to world grid coordinates.

ConductionMask.shader
float frag(Varyings i) : SV_Target
{
    float2 uv = i.texcoord;

    float blockAllLightColor = 0;
    float allowAllLightColor = 1;

    // uvToGrid returns grid coordinates; it's defined in the next section
    float2 worldCoord = uvToGrid(uv);
    int2 worldTile = int2(floor(worldCoord));

    // Apply offset to convert world coords to buffer indices
    int2 bufferTile = worldTile + int2(_GridOffsetX, _GridOffsetY);

    if (bufferTile.x < 0 || bufferTile.x >= _GridWidth || bufferTile.y < 0 || bufferTile.y >= _GridHeight)
    {
        return allowAllLightColor;
    }

    int bufferIndex = Convert2DIndexTo1D(bufferTile, _GridWidth);
    int isWall = _WallBuffer[bufferIndex];

    if (isWall == 0)
    {
        return allowAllLightColor;
    }

    return blockAllLightColor; // Wall cell: no conduction (light can't exist/propagate here)
}

We index the wall buffer row-major (x + y * width) where y corresponds to world +Z. The shader helpers used across passes look like this:

GI_Lib.hlsl (helpers)
int Convert2DIndexTo1D(int2 idx, int width)
{
    return idx.x + idx.y * width;
}

Optionally, you could sample _WallBuffer directly instead of writing out a conduction texture. Having the mask as a texture helped immensely when debugging, though, so both IsObstacle versions are included below. Beware that the snippets in this article assume a _ConductionMask texture is present.

Texture sample version
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    float2 cellPos = float2(x, y) + 0.5;
    float2 conductionUV = gridToViewportUV(cellPos);
    float conductivity = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, conductionUV).r;
    return conductivity < 0.1;
}
_WallBuffer version
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    int bufferIndex = Convert2DIndexTo1D(int2(bufferX, bufferY), _GridWidth);
    return _WallBuffer[bufferIndex] != 0;
}

The result is a binary map of the world. White allows conduction, black blocks it (stored in the R channel because the texture is R8). Out-of-bounds is treated as empty/air to keep streaming-friendly behavior at the edges of the simulated window.
For DDA shadow rays, we query _WallBuffer directly so shadows don’t depend on the GI window. This prevents popping.

Grid size + WallBuffer population

The mask only works if the shader knows the grid dimensions and has the wall buffer bound. _GridWidth and _GridHeight define the buffer size, and _GridOffsetX/Y shifts world coordinates into buffer space (so a centered or streaming world can still map into a single flat array).

In Anchorfall, a GIWallBufferManager owns the buffer. It builds a flat ComputeBuffer<int> (0 = empty, 1 = wall) from the MapGen wall layer, binds it as _WallBuffer, and sets the grid globals. It also mirrors the data on the CPU so we can do incremental updates (chunk load/unload or single-tile edits) without GPU readback.

_GridOffset is independent from _CameraView: the camera view defines what area we render, while the grid offset defines where the global wall buffer lives in world space.

GIWallBufferManager.cs (init)
_bufferSize = new int2(diameter, diameter);
_coordinateOffset = diameter / 2;

Shader.SetGlobalInt("_GridWidth", _bufferSize.x);
Shader.SetGlobalInt("_GridHeight", _bufferSize.y);
Shader.SetGlobalFloat("_WallHeight", wallHeight);

_wallBuffer = new ComputeBuffer(_bufferSize.x * _bufferSize.y, sizeof(int));
_wallBuffer.SetData(_cpuMirror);
Shader.SetGlobalBuffer("_WallBuffer", _wallBuffer);

GIBridge.GridSizeOverride = _bufferSize;
GIBridge.GridOffsetOverride = new int2(_coordinateOffset, _coordinateOffset);

Any time the wall data changes we push the updated range (or a whole chunk) and bump GIBridge.WallBufferVersion, which lets the GI render feature know the mask needs to refresh.
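To make those incremental updates concrete, here is a small CPU-side C sketch (function names are hypothetical, not from the Anchorfall codebase) of how a dirty rectangle in the wall grid maps to contiguous ranges of the flat row-major buffer, one upload range per row:

```c
#include <assert.h>

// Row-major flat index, matching Convert2DIndexTo1D in the shaders.
static int FlatIndex(int x, int y, int width) { return x + y * width; }

// A dirty rect [x0, x0+w) x [y0, y0+h) maps to h contiguous runs in the
// flat buffer, one per row. Writes (start, count) pairs; returns run count.
static int DirtyRectToRuns(int x0, int y0, int w, int h, int gridWidth,
                           int *starts, int *counts)
{
    for (int row = 0; row < h; row++)
    {
        starts[row] = FlatIndex(x0, y0 + row, gridWidth);
        counts[row] = w;
    }
    return h;
}
```

Each (start, count) pair can then be pushed with a partial buffer upload instead of re-uploading the whole grid.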

Ambient occlusion

Right after the conduction mask we compute a lightweight ambient occlusion (AO) texture. It only depends on the wall buffer and the grid mapping. We apply AO at sample-time to darken creases and wall-adjacent areas without affecting light transport.

This AO is purely a contact-darkening term; it doesn’t affect shadowing or bounce, only the final shading.

The AO pass samples the wall buffer in a radius around each pixel, weights by distance (and a softness power), and outputs a single-channel occlusion value. _AORadius is in world units (tile units in my case). Optional Bayer dithering helps hide banding at low resolution.

AmbientOcclusionGI.shader (core)
float2 worldPos = uvToGrid(uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPos = (floor(worldPos * giRes) + 0.5) / giRes;

int2 baseTile = int2(floor(gridPos));
float radius = max(_AORadius, 0.01);
int range = (int)ceil(radius);

float occSum = 0.0;
float weightSum = 0.0;

for (int y = -range; y <= range; y++)
{
    for (int x = -range; x <= range; x++)
    {
        float2 tileCenter = float2(baseTile.x + x + 0.5, baseTile.y + y + 0.5);
        float dist = distance(gridPos, tileCenter);
        if (dist > radius) continue;

        float w = 1.0 - saturate(dist / radius);
        w = pow(w, max(_AOPower, 0.01));
        weightSum += w;

        if (IsObstacle(baseTile.x + x, baseTile.y + y))
            occSum += w;
    }
}

float occ = (weightSum > 0.0) ? (occSum / weightSum) : 0.0;
float ao = 1.0 - saturate(_AOStrength) * occ;

2. GI View Window

The light textures have to move with the camera, so we need to tell the shaders where the camera is looking.

The GIUtils.CalculateCameraVisibleArea() function projects the camera view onto the ground plane, then multiplies the result by a padding factor. A value of 2 means we render twice the visible width and height.

Note

Padding is not primarily for lights whose transform is off-frustum. As long as a light’s radius overlaps the GI window, it will be included (even if the light itself is outside the camera frustum). Padding is mostly about diffusion stability, tall objects and hiding the GI window edge. DDA shadow rays query _WallBuffer directly, so occluders outside the GI window still block light without relying on the conduction mask.

Objects are lit based on the tile they occupy. If an object is too tall, you’ll need more padding (or a different sampling strategy), otherwise the object will first appear dark then suddenly light up as the GI window slides into the tile it occupies.

GIUtils.cs
public static float4 CalculateCameraVisibleArea(Camera camera, float cameraViewPadding)
{
    const float groundPlaneY = 0.0f;

    Vector2[] corners = { new(0f, 0f), new(1f, 0f), new(0f, 1f), new(1f, 1f) };

    float2 minGrid = new float2(float.MaxValue, float.MaxValue);
    float2 maxGrid = new float2(float.MinValue, float.MinValue);

    for (int i = 0; i < corners.Length; i++)
    {
        Vector2 corner = corners[i];
        Ray ray = camera.ViewportPointToRay(new Vector3(corner.x, corner.y, 0f));
        float denom = ray.direction.y;

        // If the camera ray is parallel to the ground plane (or something goes NaN), we bail out and skip GI for the frame.
        if (math.abs(denom) < 1e-4f)
            return new float4(0f, 0f, 0f, 0f);

        float t = (groundPlaneY - ray.origin.y) / denom;
        if (t <= 0.0f || float.IsNaN(t) || float.IsInfinity(t))
            return new float4(0f, 0f, 0f, 0f);

        Vector3 hit = ray.origin + ray.direction * t;
        minGrid = math.min(minGrid, new float2(hit.x, hit.z));
        maxGrid = math.max(maxGrid, new float2(hit.x, hit.z));
    }

    float2 center = (minGrid + maxGrid) * 0.5f;
    float2 paddedSize = (maxGrid - minGrid) * cameraViewPadding;

    float2 minCorner = center - paddedSize * 0.5f;
    return new float4(minCorner.x, minCorner.y, paddedSize);
}

GI Resolution

The giResolution parameter controls how many texels we allocate per world unit.

The final texture dimensions are:

textureWidth  = viewWidth  x giResolution
textureHeight = viewHeight x giResolution

So if your camera sees a 30x20 world-unit area (after padding) and giResolution = 4, you get a 120x80 GI texture.

More precisely, the sizing math we use is:

resolution = max(1, giResolution)

// Convert padded world size -> texels
textureWidth  = ceil(viewWidth  * resolution)
textureHeight = ceil(viewHeight * resolution)

// Snap to multiples of resolution (keeps whole-tile alignment)
textureWidth  = ceil(textureWidth  / resolution) * resolution
textureHeight = ceil(textureHeight / resolution) * resolution

// Minimum size guard
textureWidth  = max(textureWidth,  64)
textureHeight = max(textureHeight, 64)

// World size implied by the snapped texture
worldWidth  = textureWidth  / resolution
worldHeight = textureHeight / resolution

That snapping step is important: it guarantees the texture grid lines up with whole world units, which keeps phase locking stable and prevents half-texel drift when the camera moves.

A resolution of 1 would produce Minecraft-like lighting, with a single value per tile. However, since our shadows use the same texture, you would also get very blocky shadows.

This value also affects heat diffusion: at higher resolutions, world-space hops span more GI texels. That’s why the adaptive hierarchical sampling exists (covered later).

Phase Locking

We deliberately quantize the GI viewport origin to whole-tile (1 world unit) boundaries.

This does two things: texels keep mapping to the same world positions while the camera moves (no sub-texel swimming), and the window only ever shifts in whole-tile steps, which keeps cached per-tile data valid.

Trade-off: the GI window moves in discrete steps. With sufficient padding, the camera never sees the window edge, so this stays visually stable.

If you want smoother tracking, snap to GI-texel boundaries (1 / giResolution world units) instead, but you’ll update more often and caching becomes less effective.

GIUtils.cs
private float4 CalculatePhaseLockedCameraView(float2 center, float2 size, int resolution)
{
    float minX = center.x - size.x * 0.5f;
    float minY = center.y - size.y * 0.5f;

    // Convert world -> texels
    int minXTexel = Mathf.FloorToInt(minX * resolution);
    int minYTexel = Mathf.FloorToInt(minY * resolution);

    // Snap to 1 world unit boundaries (resolution texels)
    minXTexel = Mathf.FloorToInt(minXTexel / (float)resolution) * resolution;
    minYTexel = Mathf.FloorToInt(minYTexel / (float)resolution) * resolution;

    float invRes = 1.0f / resolution;
    return new float4(minXTexel * invRes, minYTexel * invRes, size.x, size.y);
}

Coordinate Conventions

These are the rules I stick to everywhere. If you keep this mapping consistent, the rest of the system stays stable:

With _CameraView set, the shader can convert any screen UV to world coordinates:

GI_Lib.hlsl
float2 uvToGrid(float2 uv)
{
    // _CameraView = (minX, minZ, width, height)
    float2 gridPos;
    gridPos.x = _CameraView.x + uv.x * _CameraView.z;
    gridPos.y = _CameraView.y + uv.y * _CameraView.w;
    return gridPos;
}

The inverse mapping (world → UV) and texel snapping look like this:

GI_Lib.hlsl
float2 gridToViewportUV(float2 gridPos)
{
    CamBounds bounds = GetGlobalCameraViewData();
    float2 uv;
    uv.x = (gridPos.x - bounds.min.x) / bounds.size.x;
    uv.y = (gridPos.y - bounds.min.y) / bounds.size.y;
    return uv;
}

float2 snapUVToTexelCenter(float2 uv)
{
    float2 texelSize = getViewportTexelSize();
    float2 halfTexel = texelSize * 0.5;
    uv = clamp(uv, halfTexel, 1.0 - halfTexel);
    return (floor(uv / texelSize) + 0.5) * texelSize;
}

Please note that the GI “grid space” is (worldX, worldZ). I’ll call the second component y in shaders because it’s a 2D texture axis.

This simple linear interpolation maps (0,0) to the bottom-left of our world view and (1,1) to the top-right. Every GI pass uses this function to bridge screen space and world space.

Consistency here matters. If any pass uses a slightly different mapping (different padding, phase lock, or floor vs round), you get flicker and off-by-one lighting bugs.
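Since consistency of this mapping is the whole game, here is a tiny C round-trip check of the two mappings (a plain struct stands in for the shader globals):

```c
#include <assert.h>
#include <math.h>

// Stand-in for _CameraView = (minX, minZ, width, height).
typedef struct { float minX, minZ, width, height; } CameraView;

// UV -> world grid (uvToGrid equivalent).
static void UvToGrid(CameraView v, float u, float w, float *gx, float *gz)
{
    *gx = v.minX + u * v.width;
    *gz = v.minZ + w * v.height;
}

// World grid -> UV (gridToViewportUV equivalent).
static void GridToUv(CameraView v, float gx, float gz, float *u, float *w)
{
    *u = (gx - v.minX) / v.width;
    *w = (gz - v.minZ) / v.height;
}
```

Any pass that uses a mapping pair which does not round-trip like this will sample light from the wrong tile.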


3. Direct Lighting (Point Lights)

We now have a 2D texture with our light occluders. We are ready to fake some photons.

Gathering Lights

In the RenderFeature, we iterate through Unity’s visible lights array and pack light data into a GPU buffer:

struct VM_LightData
{
    float4 color;
    float2 position;
    float radius;
    float _padding; // 16 byte alignment
};

We don’t need a separate intensity because visibleLight.finalColor already bakes in the Unity light’s intensity + color (and temperature), so the shader only needs radius and position:

LightRenderFeature.cs
// In RecordRenderGraph

var lightData = frameData.Get<UniversalLightData>();

// Using lightData.visibleLights means we don't need to handle culling +
// range checks ourselves, but URP caps visible lights at 256.
// If you need more, gather them in a job or similar.
for (int i = 0; i < lightData.visibleLights.Length; i++)
{
    var visibleLight = lightData.visibleLights[i];
    var unityLight = visibleLight.light;
    var position = unityLight.transform.position;

    // WritePointLight writes to the light buffer we'll send to the shaders
    WritePointLight(lightCount,
        new float3(position.x, position.y, position.z),
        visibleLight.finalColor,
        unityLight.range);
}

DDA

Instead of fixed-step raymarching, we use DDA (Digital Differential Analyzer) grid traversal to cast rays from each light to each pixel.

More precisely it’s the Amanatides & Woo grid traversal, but it’s often called DDA in gamedev. The key property is stepping from grid boundary to grid boundary, visiting every cell the ray crosses in order.

For shadow casting, we want to know: “does a straight line from the light to this pixel pass through any walls?”

The algorithm works by tracking how far we need to travel in X and Y to cross the next cell boundary. Whichever is closer, we step in that direction:

PointLights_DDA.hlsl
float CalculateShadowDDA(float2 lightPos, float2 pixelPos, float radius)
{
    float2 rayDir = pixelPos - lightPos;
    float rayLength = length(rayDir);

    if (rayLength < 1e-4)
        return 1.0;

    rayDir /= rayLength;

    // Bias the ray off grid boundaries to avoid corner/edge ambiguity.
    // Start/end bias keeps floor() stable when positions land on boundaries.
    const float RAY_BIAS = 1e-4;
    float2 startPos = lightPos + rayDir * RAY_BIAS;
    float2 endPos = pixelPos - rayDir * RAY_BIAS;
    rayLength = max(rayLength - 2.0 * RAY_BIAS, 0.0);

    // Grid cell coordinates of starting and ending points
    int2 startCell = int2(floor(startPos));
    int2 endCell = int2(floor(endPos));

    int2 cell = startCell;

    // Calculate step direction
    int2 step = int2(sign(rayDir));

    // Guard near-zero directions to avoid 1/dir blowups.
    const float DIR_EPS = 1e-6;
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    // Calculate distance to first X and Y cell boundaries
    float2 tMax;
    if (xZero)
        tMax.x = 1e30;
    else
        tMax.x = (step.x > 0)
            ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
            : (startPos.x - floor(startPos.x)) / -rayDir.x;

    if (yZero)
        tMax.y = 1e30;
    else
        tMax.y = (step.y > 0)
            ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
            : (startPos.y - floor(startPos.y)) / -rayDir.y;

    // Calculate how far to step in each direction to move one cell
    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float distanceTraveled;
    const float CORNER_EPS = 1e-5;

    // Maximum number of steps to prevent infinite loops
    int maxSteps = max(1, (int)ceil(radius * 2.0) + 2);

    for (int i = 0; i < maxSteps; i++)
    {
        // IsObstacle MUST use _WallBuffer in this path.
        // It treats out-of-bounds as empty (streaming-friendly).
        if (IsObstacle(cell.x, cell.y))
        {
            return 0;
        }

        // If we've reached the destination cell, we're not in shadow
        if (cell.x == endCell.x && cell.y == endCell.y)
            return 1.0;

        // Step to next cell
        if (tMax.x < tMax.y - CORNER_EPS)
        {
            distanceTraveled = tMax.x;
            tMax.x += tDelta.x;
            cell.x += step.x;
        }
        else if (tMax.y < tMax.x - CORNER_EPS)
        {
            distanceTraveled = tMax.y;
            tMax.y += tDelta.y;
            cell.y += step.y;
        }
        else
        {
            // Corner hit: only block if both adjacent cells are obstacles
            distanceTraveled = tMax.x;
            int2 nextX = cell + int2(step.x, 0);
            int2 nextY = cell + int2(0, step.y);
            if (IsObstacle(nextX.x, nextX.y) && IsObstacle(nextY.x, nextY.y))
                return 0;

            tMax.x += tDelta.x;
            tMax.y += tDelta.y;
            cell.x += step.x;
            cell.y += step.y;
        }

        // If we've traveled farther than the ray length, we're done
        if (distanceTraveled > rayLength)
            return 1.0;
    }

    // If we've exceeded max steps, assume no shadow
    return 1.0;
}

Shadow Proxies (Analytical Occluders)

DDA gives grid-accurate shadows from the wall buffer, but it doesn’t cover thin props, dynamic obstacles, or anything that isn’t baked into the grid. For those I use shadow proxies: small analytic occluders that cast soft shadows without touching the wall buffer.

Each proxy is a simple shape (box or circle) with width, length, and penumbra parameters. The proxy buffer is updated on the CPU, culled against the GI view, and capped to maxShadowProxies. Both the point-light and directional-light passes read the same buffer.

Culling is straightforward: cullRadius = max(size) + maxLength + max(penumbraWidth, penumbraLength), then keep the closest N proxies to the GI view center.

struct VM_ShadowProxyData
{
    float2 position;      // World XZ
    float2 size;          // Half extents (box) or radius (circle)
    float maxLength;      // Shadow length
    float endCapBlend;    // Blend between light dir and proxy axis
    float penumbraWidth;  // Soft edge across width
    float penumbraLength; // Soft edge along length
    float penumbraPower;  // Curve sharpness
    float shape;          // 0 = box, 1 = circle
    float widthMode;      // 0 = projected width, 1 = max axis
    float penumbraMode;   // inward/outward/both
};

Core idea: treat the proxy like a capsule/box aligned with the light direction, compute perp and along distances, and attenuate by two penumbra curves.

ComputePointLightProxyShadow (condensed)
float ComputePointLightProxyShadow(float2 lightPos, float2 pixelPos)
{
    if (_ShadowProxyCount <= 0) return 1.0;

    float2 ray = pixelPos - lightPos;
    float rayLen = length(ray);
    if (rayLen < 1e-4) return 1.0;

    float2 dir = ray / rayLen;
    float shadow = 1.0;

    for (int i = 0; i < _ShadowProxyCount; i++)
    {
        VM_ShadowProxyData proxy = _ShadowProxyBuffer[i];
        float2 toProxy = proxy.position - lightPos;
        float proxyDist = length(toProxy);
        if (proxyDist < 1e-4) continue;

        float2 dirFlat = toProxy / proxyDist;
        float2 dirShadow = normalize(lerp(dir, dirFlat, saturate(proxy.endCapBlend)));
        float2 perpDir = float2(-dirShadow.y, dirShadow.x);

        float occluderDist = dot(toProxy, dirShadow);
        if (occluderDist <= 0.0) continue;

        float2 toPixelFromProxy = pixelPos - proxy.position;
        float alongDist = dot(toPixelFromProxy, dirShadow);
        if (alongDist <= 0.0) continue;

        float perpDist = abs(dot(toPixelFromProxy, perpDir));
        float width = GetShadowProxyWidth(proxy, perpDir);

        if (alongDist > proxy.maxLength + proxy.penumbraLength) continue;

        float edge = ComputePenumbraFactor(perpDist, width, proxy.penumbraWidth, proxy.penumbraPower, proxy.penumbraMode);
        float len = ComputePenumbraFactor(alongDist, proxy.maxLength, proxy.penumbraLength, proxy.penumbraPower, proxy.penumbraMode);

        float occlusion = edge * len;
        shadow = min(shadow, 1.0 - occlusion);
        if (shadow <= 0.001) break;
    }

    return shadow;
}

In the lighting loop I combine it like this:

shadow = min(ShadowDDA, ShadowProxy);

Directional lights use the same proxy logic (just with a fixed light direction).

Shadow proxy handling is basic and has room for improvement. It should be possible to project shadows onto walls non-uniformly so they retain their shape where they hit a wall. Right now, if a shadow reaches a wall, the whole wall face is shadowed from top to bottom. Penumbra helps massively, but hard shadows can be distracting when they just barely touch a wall.

Multiple Render Targets: One Pass, Many Textures

Warning

If you are targeting Shader Model 3 (D3D9) or OpenGL ES, MRTs have a cap of 4 render targets. Pack 2 wall faces per texture to get around this.

We don’t just output a single color texture, we use Multiple Render Targets (MRT) to write several textures simultaneously:

struct FragmentOutput
{
    float4 color : SV_Target0;     // Direct light color
    float4 shadow : SV_Target1;    // Shadow map (R=point, G=directional)
    float4 wallFace0 : SV_Target2; // +X face (RGB)
    float4 wallFace1 : SV_Target3; // -X face (RGB)
    float4 wallFace2 : SV_Target4; // +Z face (RGB)
    float4 wallFace3 : SV_Target5; // -Z face (RGB)
};

Why so many outputs? Because walls need per-face lighting.

The Wall Problem

Ground tiles are simple: they face up, sample the GI texture at their world position, done. But walls have sides. A wall lit from the east should have a bright east face and a dark west face. Walls are also perfectly aligned with the conduction-mask occluders, so sampling the GI texture at a wall's own position always returns no light.

For a while, we used the wall normal to sample the closest adjacent pixel to get around this limitation, but that causes “bands” that look distracting when shadows wrap around corners, and Ambient Occlusion interfered with the wall shading.

Our solution: for each wall texel, we store four lighting values (one per cardinal direction: +X, -X, +Z, -Z) as four separate RGBA16F textures. Each texture holds full RGB for one face. This keeps full color precision, avoids face packing math, and still lets us pick a dominant face with a single texture fetch. Non-wall texels stay zero.

This is a lot of textures (4 for direct, 4 for indirect), yes. But giResolution scales the tile resolution, not screen resolution. A 480x270 internal resolution game like Anchorfall, with giResolution 16 and padding 2 would produce textures at 960x540, which comes at ~4MB per texture at RGBA16F.

For the whole light system (assuming heat spread is done at full res), we have:

2x R8, 1x R8G8, 13x RGBA16F. At the resolution above, that amounts to just under 56MB of VRAM. That is next to nothing for modern hardware and even impressive for a whole lighting system.
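The VRAM figure is easy to verify. Using the standard per-texel sizes (R8 = 1 byte, R8G8 = 2 bytes, RGBA16F = 8 bytes), a quick C check:

```c
#include <assert.h>

// Bytes per texel for the formats used in this pipeline.
enum { R8 = 1, R8G8 = 2, RGBA16F = 8 };

// 2x R8 + 1x R8G8 + 13x RGBA16F, assuming heat spread at full resolution.
static long TotalVramBytes(long w, long h)
{
    long texels = w * h;
    return texels * (2 * R8 + 1 * R8G8 + 13 * RGBA16F);
}
```

At 960x540 this comes to 55,987,200 bytes, i.e. just under 56 MB, with each RGBA16F texture weighing in at about 4.1 MB.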

Note
A Texture2DArray would be a slightly more elegant solution, but it isn't used here due to Unity limitations when rendering to multiple slices of the same array.
// Which faces of this wall tile are exposed (not buried in other walls)?
float4 GetWallFaceMask(int2 cell)
{
    if (!IsObstacle(cell.x, cell.y))
        return float4(0, 0, 0, 0); // Not a wall

    // Check each neighbor - if neighbor is NOT a wall, this face is exposed
    float facePosX = IsObstacle(cell.x + 1, cell.y) ? 0.0 : 1.0;
    float faceNegX = IsObstacle(cell.x - 1, cell.y) ? 0.0 : 1.0;
    float facePosZ = IsObstacle(cell.x, cell.y + 1) ? 0.0 : 1.0;
    float faceNegZ = IsObstacle(cell.x, cell.y - 1) ? 0.0 : 1.0;

    return float4(facePosX, faceNegX, facePosZ, faceNegZ);
}

// How much does this light contribute to each face?
float4 GetWallFacing(float2 dirToLight)
{
    return float4(
        saturate(dirToLight.x),  // +X face lit when light is to the right
        saturate(-dirToLight.x), // -X face lit when light is to the left
        saturate(dirToLight.y),  // +Z face lit when light is above
        saturate(-dirToLight.y)  // -Z face lit when light is below
    );
}

For each light, we calculate the direction to the light source, multiply the face mask by the facing weights, and accumulate into the wall textures. Later, when rendering a wall sprite, we sample these textures using the sprite’s world normal to pick the right face. This is intentionally simple and snappy: it biases light toward the most directly oriented face without needing per-pixel normals in the GI texture. If you want softer transitions, raise the facing term to a power (e.g. pow(saturate(dirToLight.x), k)).

On bandwidth-constrained GPUs you may want fewer RTs or pack wall faces.

The Rendering Loop

Now for the actual rendering. Every pixel in our GI texture runs a fragment shader that loops through all point lights:

PointLights_DDA.hlsl
FragmentOutput frag(Varyings input)
{
    FragmentOutput output;
    output.color = float4(0, 0, 0, 0);
    output.shadow = float4(0, 0, 0, 0);
    output.wallFace0 = float4(0, 0, 0, 0);
    output.wallFace1 = float4(0, 0, 0, 0);
    output.wallFace2 = float4(0, 0, 0, 0);
    output.wallFace3 = float4(0, 0, 0, 0);

    float2 gridPos = uvToGrid(input.texcoord);
    float giRes = max(_GI_Resolution_Lib, 1.0);
    // Snap to GI texel centers so wall/ground sampling is stable across camera movement.
    float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

    int2 wallCell = int2(floor(gridPos));
    // IsObstacle MUST use _WallBuffer in this path, otherwise popping might occur.
    // Walls are encoded in the wall buffer (1 = wall).
    bool isWall = IsObstacle(wallCell.x, wallCell.y);

    float totalShadow = 0.0;
    // Wall lighting accumulates into per-face outputs (valid only on wall texels).
    float3 wallFace0 = float3(0, 0, 0);
    float3 wallFace1 = float3(0, 0, 0);
    float3 wallFace2 = float3(0, 0, 0);
    float3 wallFace3 = float3(0, 0, 0);

    if (isWall)
    {
        // Base wall position; per-face offsets push just outside the wall cell.
        float2 wallSamplePos = gridPosCell;
        float4 faceMask = GetWallFaceMask(wallCell);
        float2 cellMin = float2(wallCell);
        float2 cellMax = cellMin + 1.0;
        // Small epsilon pushes the ray target just outside the wall cell (aligns wall/ground shadows).
        const float faceEps = 1e-3;
        const float4 zero4 = float4(0, 0, 0, 0);

        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            // Face weights bias toward the wall face oriented to the light.
            float4 faceWeight = faceMask * facing;

            if (all(faceWeight == zero4))
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;

            // Each face gets its own DDA to the adjacent cell + proxy shadow.
            float4 faceShadow = float4(1, 1, 1, 1);
            if (faceWeight.x > 0.0)
            {
                float2 facePos = float2(cellMax.x + faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.x = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.y > 0.0)
            {
                float2 facePos = float2(cellMin.x - faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.y = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.z > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMax.y + faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.z = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.w > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMin.y - faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.w = min(shadowFactor, proxyShadowFactor);
            }

            float4 faceContribution = faceWeight * faceShadow;

            wallFace0 += directContribution * faceContribution.x;
            wallFace1 += directContribution * faceContribution.y;
            wallFace2 += directContribution * faceContribution.z;
            wallFace3 += directContribution * faceContribution.w;
        }

        // Emissives skip DDA but still light wall faces.
        for (int i = 0; i < _EmissiveLightCount; i++)
        {
            float2 lightGridPos = _EmissiveLightBuffer[i].position;
            float4 lightColor = _EmissiveLightBuffer[i].color;
            float lightRadius = _EmissiveLightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;
            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            float4 faceWeight = faceMask * facing;

            wallFace0 += directContribution * faceWeight.x;
            wallFace1 += directContribution * faceWeight.y;
            wallFace2 += directContribution * faceWeight.z;
            wallFace3 += directContribution * faceWeight.w;
        }
    }
    else
    {
        // Process each light for ground
        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(gridPos, lightGridPos);
            if (dist > lightRadius)
133 continue;
134
135 float shadowFactor = CalculateShadowDDA(lightGridPos, gridPosCell, lightRadius);
Ground samples DDA to the GI texel center (not the wall boundary).
136 float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, gridPosCell);
137 shadowFactor = min(shadowFactor, proxyShadowFactor);
138
139 totalShadow = max(totalShadow, shadowFactor);
140
141 float falloff = 1.0 - saturate(dist / lightRadius);
142 falloff = pow(falloff, _LightSoftness);
143
144 float3 directContribution = lightColor.rgb * falloff * shadowFactor;
145 output.color.rgb += directContribution;
146 }
147
148 // Process emissive-only lights (no shadows)
149 for (int i = 0; i < _EmissiveLightCount; i++)
150 {
151 float2 lightGridPos = _EmissiveLightBuffer[i].position;
152 float4 lightColor = _EmissiveLightBuffer[i].color;
153 float lightRadius = _EmissiveLightBuffer[i].radius;
154
155 float dist = distance(gridPos, lightGridPos);
156 if (dist > lightRadius)
157 continue;
158
159 float falloff = 1.0 - saturate(dist / lightRadius);
160 falloff = pow(falloff, _LightSoftness);
161
162 output.color.rgb += lightColor.rgb * falloff;
163 }
164 }
165
166 float shadowMask = isWall ? 0.0 : 1.0;
Shadow map only stores visibility for non-wall texels.
167 output.shadow = float4(totalShadow * shadowMask, 0.0, 0.0, 0.0);
168 output.wallFace0 = float4(wallFace0, 1.0);
169 output.wallFace1 = float4(wallFace1, 1.0);
170 output.wallFace2 = float4(wallFace2, 1.0);
171 output.wallFace3 = float4(wallFace3, 1.0);
172 return output;
173}

A few things to note:

IsObstacle checks: In this path, IsObstacle needs to be wired to use _WallBuffer directly. The conduction mask is view-dependent, which might cause popping if padding is not high enough.

Light falloff: The pow(falloff, _LightSoftness) controls how “hard” or “soft” the light edge is. A value of 1.0 gives linear falloff; higher values create a sharper cutoff at the edge.

Additive accumulation: Multiple lights simply add together. Two overlapping red lights make a brighter red. A red and blue light make purple (well, magenta). This isn’t physically based, but it behaves intuitively and looks natural for games.

Directional Lights: The Sun Problem

Point lights are straightforward: position, radius, done. But directional lights (sun, moon) have no position. They have a direction and cast shadows based on angle.

We handle these in a separate pass with their own data structure:

struct VM_DirectionalLightData {
    float3 direction;               // Normalized light direction
    float4 color;
    float shadowDistanceMultiplier; // Pre-computed: 1.0 / max(abs(direction.y), 0.02)
    float baseShadowDistance;
    float bounceIntensity;
    float2 _padding;                // 16 byte alignment
};

Shadow length depends on sun angle. When the sun is directly overhead (direction.y ≈ -1), shadows are short. At sunset (direction.y ≈ 0), shadows stretch to infinity.

We pre-compute shadowDistanceMultiplier on the CPU:

maxShadowDistance = baseShadowDistance × shadowDistanceMultiplier

The DDA then traces from each pixel in the light direction, checking against a fixed _WallHeight (uniform wall height) to approximate long shadows. The traversal is still 2D over the wall grid, _WallHeight and sun angle just scale the effective shadow reach.
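As a sanity check on that precompute, here is the CPU-side math (a sketch based on the comment in the struct above; function names are mine):

```c
#include <math.h>

/* Pre-computed on the CPU per the struct comment:
 * shadowDistanceMultiplier = 1.0 / max(abs(direction.y), 0.02).
 * The 0.02 clamp caps shadow length at 50x the base distance near the horizon. */
float shadow_distance_multiplier(float dir_y) {
    float a = fabsf(dir_y);
    return 1.0f / (a > 0.02f ? a : 0.02f);
}

float max_shadow_distance(float base_distance, float dir_y) {
    return base_distance * shadow_distance_multiplier(dir_y);
}
```

An overhead sun (direction.y = -1) keeps shadows at the base distance; at sunset (direction.y ≈ 0) the clamp kicks in and shadows max out at 50× the base.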

Here’s the condensed pass logic:

DirectionalLights_DDA.hlsl
float DirectionalShadowDDA(float2 pixelPos, float3 lightDir,
                           float baseShadowDistance, float shadowDistanceMultiplier)
{
    float3 rayDir3D = -lightDir;
    float rayDirXZLen = length(rayDir3D.xz);
    if (rayDirXZLen < 1e-4) return 1.0;
    float2 rayDir = rayDir3D.xz / rayDirXZLen;

    const float DIR_EPS = 1e-6;
    const float RAY_BIAS = 1e-4;
    float2 startPos = pixelPos + rayDir * RAY_BIAS;
    int2 cell = int2(floor(startPos));
    int2 step = int2(sign(rayDir));
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    float2 tMax;
    tMax.x = xZero ? 1e30 :
        (step.x > 0 ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
                    : (startPos.x - floor(startPos.x)) / -rayDir.x);
    tMax.y = yZero ? 1e30 :
        (step.y > 0 ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
                    : (startPos.y - floor(startPos.y)) / -rayDir.y);

    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float maxDist = baseShadowDistance * shadowDistanceMultiplier;
    int maxSteps = 2 * (_GridWidth + _GridHeight);

    for (int i = 0; i < maxSteps; i++)
    {
        if (BlocksDirectionalCell(cell, startPos, rayDir3D, rayDirXZLen))
            return 0.0;

        if (tMax.x < tMax.y) { tMax.x += tDelta.x; cell.x += step.x; }
        else                 { tMax.y += tDelta.y; cell.y += step.y; }

        float traveled = min(tMax.x, tMax.y);
        if (traveled > maxDist) return 1.0;

        int bufferX = cell.x + _GridOffsetX;
        int bufferY = cell.y + _GridOffsetY;
        if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
            return 1.0;
    }
    return 1.0;
}

float3 direct = 0;
float totalShadow = 0;
for (int i = 0; i < _DirectionalLightCount; i++)
{
    VM_DirectionalLightData lightData = _DirectionalLightBuffer[i];
    float3 lightDir = lightData.direction;
    float4 lightColor = lightData.color;
    float shadowDistanceMultiplier = lightData.shadowDistanceMultiplier;
    float baseShadowDistance = lightData.baseShadowDistance;

    float visibility = (lightDir.y >= 0.0) ? 0.0
        : DirectionalShadowDDA(gridPosCell, lightDir, baseShadowDistance, shadowDistanceMultiplier);
    visibility = min(visibility, ComputeDirectionalProxyShadow(gridPosCell, lightDir));

    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);

    totalShadow = max(totalShadow, visibility);
    direct += lightColor.rgb * angle * visibility;
    // Wall faces use the same face-weight accumulation as point lights,
    // but their shadow rays start just outside each face to avoid self-occlusion.
}
output.color.rgb += direct;
output.shadow.g = totalShadow * shadowMask;

The height test is a simple wall-height check against the ray:

Directional shadow height check (condensed)
bool BlocksDirectionalCell(int2 cell, float2 startPos, float3 rayDir3D, float rayDirXZLen)
{
    if (!IsObstacle(cell.x, cell.y)) return false;

    float distanceToWall = length(float2(cell) + 0.5 - startPos);
    if (abs(rayDir3D.y) > 0.001 && rayDirXZLen > 1e-4)
    {
        float verticalRate = rayDir3D.y / rayDirXZLen;
        float rayHeightAtWall = distanceToWall * verticalRate;
        return rayHeightAtWall < _WallHeight; // uniform wall height
    }
    return true;
}

This pass uses additive blending for direct color and max blending for shadow visibility (G channel), so “any directional light wins.” It also reads the existing wall-direct textures and adds its contribution before writing out the updated wall-direct RTs.

The angle-based brightness is blended by _GI_DirectionalAngleScale: 0 = full angle modulation, 1 = no angle modulation (flat intensity).

Emissive Lights (No Shadows)

Regular lights cast shadows. But what about glowing crystals, lava pools, or spell projectiles? These are emissive lights: they illuminate their surroundings but don’t cast shadows.

We handle these separately:

struct VM_EmissiveLightData {
    float2 position;
    float4 color;
    float radius;
    float _padding; // 16 byte alignment
};

Emissives are uploaded into their own buffer and processed in the same pass as point lights. They add to direct lighting (ground) and wall face textures, but skip DDA shadows entirely. In my implementation, emissives bypass the maxLights cap.

Because emission is extracted from the direct texture right after this pass, emissive lights automatically seed indirect lighting too. Emissive color is pre-multiplied by intensity on the CPU, so the shader only needs position, radius, and color.

Shadow map encoding

Note
I’ve used “shadow map” loosely here: this is not a shadow map in the traditional sense; it’s closer to a visibility texture.

The shadow texture uses a two-channel encoding: the point-light pass writes only R (and sets G to 0), while the directional pass writes only G. We keep them separate because the sun and local lights behave differently, but the combined visibility is useful for diffusion heuristics. In HeatTransfer, we use max(R, G) to decide how aggressively to fill in shadows.

4. Indirect Light: Thermodynamics of Pixels

This is the part that produces the GI look: indirect bounce via diffusion.

For that, we’ll treat light as if it were heat.

Imagine a lit floor tile as a hot plate. Darkness is cold air. We run a simulation where heat (light energy) naturally flows from hot areas to cold areas, provided there’s a conductive medium (air, not walls) connecting them. Run this simulation enough times, and light “spreads” into shadowed corners just like real indirect illumination.

Heat spreading through each heat spread step
23 heat spread iterations visualized

Step 1: Extract Emission

Before we can diffuse anything, we need to seed the heat buffer with “new energy.” We take the direct light texture and extract a percentage of it:

ExtractEmission.hlsl
float4 directLight = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, uv);

// Non-linear extraction to prevent white hotspots
float maxChannel = max(directLight.r, max(directLight.g, directLight.b));
float curve = 1.0 - exp(-maxChannel * 2.0); // Exponential rolloff
float scaleFactor = _EmissionStrength * curve / (maxChannel + 0.001);

float4 emission = directLight * scaleFactor;

// Boost saturation to compensate for averaging
float luminance = dot(emission.rgb, float3(0.299, 0.587, 0.114));
emission.rgb = luminance + (emission.rgb - luminance) * 1.2;

We use a non-linear curve (1 - e^(-x)). Without this, bright light sources would create white hotspots that dominate the scene. The exponential rolloff compresses bright values while preserving color in mid-tones.

The saturation boost at the end counteracts the desaturation that happens when you average colors repeatedly. Without it, orange torchlight turns into muddy beige after a few diffusion iterations.

Extraction runs once per frame; the heat transfer loop then iterates on that seed (we don’t inject new emission every iteration).

Step 2: The Ping-Pong Diffusion

HeatTransfer.shader runs in a ping-pong loop: read from texture A, write to texture B, swap, repeat.

HeatTransfer.hlsl
1// Standard 8-neighbor offsets
2static const float2 offsets[8] = {
3 float2(-1, -1), float2(0, -1), float2(1, -1),
4 float2(-1, 0), float2(1, 0),
5 float2(-1, 1), float2(0, 1), float2(1, 1)
6};
7
8// Diagonal neighbors get less weight (1 / sqrt(2))
9static const float weights[8] = {
10 0.7071, 1.0, 0.7071,
11 1.0, 1.0,
12 0.7071, 1.0, 0.7071
13};

Each pixel samples its 8 neighbors using a mix of world-space and texel-space offsets. The far-field taps use _DiffusionDistance (world units), while the near-field taps are a small number of texels for continuity. The neighbor’s heat flows into the current pixel proportionally to:

  1. The neighbor’s conductivity (is it air or wall?)
  2. The current pixel’s conductivity
  3. The distance weight
  4. The global diffusion rate

Diffusion update equation

Here’s the iteration skeleton for each heat spread round:

HeatTransfer.hlsl
float4 centerHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, uv);
float centerCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, uv).r;
if (centerCond < 1e-4) return centerHeat;

float4 shadowInfo = SAMPLE_TEXTURE2D(_ShadowMap, sampler_ShadowMap, uv);
float directVisibility = max(shadowInfo.r, shadowInfo.g);
float inShadow = 1.0 - directVisibility;

CamBounds bounds = GetGlobalCameraViewData();
float2 worldToUV = float2(1.0 / bounds.size.x, 1.0 / bounds.size.y);
float2 sampleDistance = _DiffusionDistance * worldToUV;

float3 accum = 0.0;
float3 wsum = 0.0;

// Loop neighbors (tiered if adaptive). Offsets + distanceWeight are precomputed.
for (int i = 0; i < 8; i++)
{
    float2 nUV = uv + offsets[i] * sampleDistance;
    float4 nHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, nUV);
    float nCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, nUV).r;

    float weight = nCond * centerCond * weights[i] * _DiffusionRate;

    // Stylized per-channel (see next section)
    float3 channelPresence = saturate(nHeat.rgb + 0.1);
    float3 channelWeights = weight * channelPresence;
    accum += nHeat.rgb * channelWeights;
    wsum += channelWeights;
}

accum /= max(wsum, 1e-4);

float blend = lerp(0.5, 0.8, inShadow) * _DiffusionRate;
float3 newHeat = lerp(centerHeat.rgb, accum, blend);

// Optional dark-area bias (keeps deep shadows from staying dead)
float centerLum = dot(centerHeat.rgb, float3(0.299, 0.587, 0.114));
float darkness = 1.0 - saturate(centerLum);
newHeat *= 1.0 + _RangeBoost * darkness * inShadow;

newHeat *= _IntensityMultiplier;

// Soft cap
float maxChannel = max(newHeat.r, max(newHeat.g, newHeat.b));
if (maxChannel > 2.0)
    newHeat *= 2.0 / maxChannel;

return float4(newHeat, centerHeat.a);

Per-Channel Diffusion (Stylized)

Here’s a subtle but important detail. Early versions of the system diffused all channels together. Red, green, and blue moved as one. That’s the physically-based approach, and it already mixes colors correctly. But it tends to wash out saturation in pixel art.

What I ended up using is a stylized per-channel diffusion: each channel “pushes through” independently and is normalized separately. This keeps saturated colors punchier and makes color bleeding read better at low resolution. It is not physically based, it’s an artistic control knob.

// Each channel can spread independently
float3 channelPresence = saturate(neighborHeat.rgb + 0.1);
float3 channelWeights = pathConductivity * channelPresence;

// Accumulate each channel independently
accumulatedHeat.r += neighborHeat.r * channelWeights.r;
accumulatedHeat.g += neighborHeat.g * channelWeights.g;
accumulatedHeat.b += neighborHeat.b * channelWeights.b;

totalWeightPerChannel += channelWeights;

// Later: normalize each channel independently
if (totalWeightPerChannel.r > 0.001)
    accumulatedHeat.r /= totalWeightPerChannel.r;

The channelPresence bias (+ 0.1) ensures that even dark areas can receive color. Otherwise, black pixels would never pick up any light.

If you want the physically-based version, keep the weights scalar and accumulate RGB as a vector:

float3 accum = 0.0;
float wsum = 0.0;
accum += neighborHeat.rgb * pathConductivity;
wsum  += pathConductivity;
float3 newHeat = (wsum > 1e-4) ? (accum / wsum) : centerHeat.rgb;
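The scalar version is easy to sanity-check on the CPU. Here is a minimal single-channel sketch (names and demo data are mine, for illustration):

```c
#include <math.h>

/* One scalar diffusion step for a single texel: conductivity-weighted
 * average of neighbor heat; falls back to the center value when every
 * path is blocked (e.g., a texel fully surrounded by walls). */
float diffuse_scalar(const float *neighbor_heat, const float *path_cond,
                     int n, float center_heat) {
    float accum = 0.0f, wsum = 0.0f;
    for (int i = 0; i < n; i++) {
        accum += neighbor_heat[i] * path_cond[i];
        wsum  += path_cond[i];
    }
    return (wsum > 1e-4f) ? accum / wsum : center_heat;
}

/* Demo data: a lit neighbor, a bright wall neighbor (conductivity 0), a dim neighbor. */
static const float demo_heat[3]   = { 1.0f, 0.9f, 0.5f };
static const float demo_cond[3]   = { 1.0f, 0.0f, 1.0f };
static const float all_blocked[3] = { 0.0f, 0.0f, 0.0f };
```

Note how the wall neighbor contributes nothing even though it is bright; conductivity, not brightness, gates the flow.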

Solving the Resolution Problem: Hierarchical Sampling

At high GI resolutions (e.g., giResolution = 16), the world-space hop spans many GI texels (skipping over them), while texel-scale hops alone would shrink world-space reach. The result was either blocky diffusion or very slow spread.

But if we increased the sampling distance in GI texels, we got “grid artifacts” (black dots) where sparse taps miss nearby lit texels, leaving isolated unlit holes.

To prevent this, we use Adaptive Hierarchical Sampling. The shader detects when the GI-texel distance is large and switches to a three-tier approach:

float samplingPixelDistance = _DiffusionDistance * pixelsPerWorldUnit;
bool useAdaptiveSampling = samplingPixelDistance > 2.5;

if (useAdaptiveSampling)
{
    // Tier 1: Near-field (30%) - local smoothness
    for (int i = 0; i < 8; i++) {
        float2 nearSampleUV = viewportUV + offsets[i] * texelSize * 1.5;
        // ... accumulate with 30% weight
    }

    // Tier 2: Mid-field (30%) - bridge the gap
    // sampleDistance is the UV offset corresponding to _DiffusionDistance in world
    // units (i.e., a world-space hop expressed in GI texture UVs).
    for (int j = 0; j < 8; j++) {
        float2 midSampleUV = viewportUV + offsets[j] * sampleDistance * 0.5;
        // ... accumulate with 30% weight
    }

    // Tier 3: Far-field (40%) - long-range transport
    for (int k = 0; k < 8; k++) {
        float2 farSampleUV = viewportUV + offsets[k] * sampleDistance;
        // ... accumulate with 40% weight
    }
}

  1. Near-Field (30%): Samples at 1.5 GI texels, fills immediate gaps
  2. Mid-Field (30%): Samples at 50% world distance, bridges near and far
  3. Far-Field (40%): Samples at full world distance, long-range light transport

This allows light to travel across larger distances without needing hundreds of iterations (far-field) while maintaining smooth gradients (near-field).

The 30 / 30 / 40 split wasn’t analytically derived; it came from iterative tuning.

The mid-field tier exists purely to smooth the transition between those two regimes. Its weight is the least sensitive; it mainly prevents visible “bands” where near and far contributions meet.

One caveat: the conduction check is endpoint-only (center + neighbor). With far-tier samples, that can jump across thin walls if both endpoints are air. If that shows up, the fix is straightforward: only allow far-tier samples when the straight segment is unobstructed. A cheap version is to take 3-6 steps along the segment and multiply conductivities; if any step hits a wall, zero out that far sample.

Conduction in Action

Crucially, every diffusion sample multiplies by the Conduction Mask:

float neighborConductivity = SAMPLE_TEXTURE2D(_ConductionMask, ...).r;
float pathConductivity = neighborConductivity * centerConductivity * weight;

If either the source or destination is a wall (conductivity 0), no heat flows. For near-field taps this is enough to respect walls and doorways within the simulated window. For far-tier taps, either accept some leakage as a tradeoff or add a cheap segment check (described above) to prevent hopping across thin walls.

The Blur Pass (Optional)

Depending on your settings, you might see subtle grid patterns, especially at lower resolutions. An optional two-pass Gaussian blur can smooth these artifacts:

EdgeAwareBlur.hlsl
1// Edge-aware: only blur between similar surfaces
2bool centerIsObstacle = centerConductivity < 0.1;
3bool neighborIsObstacle = neighborConductivity < 0.1;
4
5if (centerIsObstacle == neighborIsObstacle)
6{
7 // Same surface type - include in blur
8 color += sampleColor * weights[i];
9 totalWeight += weights[i];
10}

The blur is conductivity-aware: it won’t smear light across wall boundaries. We also apply brightness preservation to prevent the blur from darkening the image.

WallIndirect pass: projecting bounce onto walls

Indirect light only exists in conductive cells. Wall cells are black in the indirect buffer because the conduction mask blocks diffusion there. That is correct for ground, but it means wall faces would read zero if we sampled _IndirectLightColor directly.

The WallIndirect pass is a tiny projection pass that turns the indirect buffer into four wall-face textures, mirroring how direct lighting already works. It runs after heat transfer (and after blur if enabled), reads the final indirect texture, and for each wall texel copies the indirect value from the adjacent air cell into the matching face output.

Implementation details:

WallIndirectGI.shader (condensed)
float2 gridPos = uvToGrid(input.uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

int2 cell = int2(floor(gridPos));
if (!IsObstacle(cell.x, cell.y))
    return output; // Not a wall

float4 faceMask = GetWallFaceMask(cell);

float3 posX = faceMask.x > 0.0 ? SampleIndirectAt(gridPosCell + float2(1, 0)) : 0;
float3 negX = faceMask.y > 0.0 ? SampleIndirectAt(gridPosCell + float2(-1, 0)) : 0;
float3 posZ = faceMask.z > 0.0 ? SampleIndirectAt(gridPosCell + float2(0, 1)) : 0;
float3 negZ = faceMask.w > 0.0 ? SampleIndirectAt(gridPosCell + float2(0, -1)) : 0;

output.wallFace0 = float4(posX, 1.0);
output.wallFace1 = float4(negX, 1.0);
output.wallFace2 = float4(posZ, 1.0);
output.wallFace3 = float4(negZ, 1.0);

SampleIndirectAt maps world to UVs, clamps, and snaps to the texel center before sampling, so results are stable under phase-locked camera motion. The output textures are later read by SampleWallIndirect in GI_Lib.hlsl when a wall face is shaded. Because this is just a neighbor copy, it is cheap and it inherits whatever diffusion, blur, and shadow handling already exists in the indirect buffer.

Combining Everything

Direct + indirect are merged in GI_Lib.hlsl at sample time, where we also apply AO, conductivity, and tone mapping:

Note: ground lighting is masked by conductivity (walls → 0). Wall faces bypass conduction and are combined per face. AO stays on the ground path; if you want AO on walls, inject it into the wall sampling path.

Tone mapping happens at sample time, and we support multiple modes.

The bounceIntensity parameter on directional lights lets you control how bright shadows appear when the sun is out. A low value means harsh shadows; a high value fills them with more bounce light.


Sampling in the Lit shader

Once the GI textures exist, the Lit material has to decide which texture to sample and where. We treat surfaces as one of three types: ground, wall faces (vertical surfaces), and wall tops.

All of this routing lives in GI_Lib.hlsl and is exposed to Shader Graph through subgraphs that sample the global textures. The core decision tree looks like this:

GI_Lib.hlsl
float3 CalculateAverageLightBrightness(float3 worldPos, float3 worldNormal)
{
    if (ShouldUseWallLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallDirect(worldPos, worldNormal)
                        + SampleWallIndirect(worldPos, worldNormal);
        return ApplyToneMap(combined);
    }

    if (ShouldUseWallTopLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallTopDirect(worldPos)
                        + SampleWallTopIndirect(worldPos);
        return ApplyToneMap(combined);
    }

    float3 direct = SampleGITextureAtWorldPos(_DirectLightColor, sampler_DirectLightColor, worldPos, worldNormal);
    float3 indirect = SampleGITextureAtWorldPos(_IndirectLightColor, sampler_IndirectLightColor, worldPos, worldNormal);
    indirect = ApplyDirectionalBounce(indirect, worldPos.xz);

    float3 combined = (direct + indirect);
    combined *= SampleGIAmbientOcclusion(worldPos.xz);
    combined *= SampleConduction(worldPos.xz);
    return ApplyToneMap(combined);
}

Ground sampling

If the surface is not a wall and not a wall top, we sample _DirectLightColor and _IndirectLightColor separately and combine them (with AO + conduction + tone map) in CalculateAverageLightBrightness. The low-level helper still just maps world XZ → UV and samples a GI texture, so ground lighting stays smooth and continuous.

GI_Lib.hlsl (ground sampling)
float3 SampleGITextureAtWorldPos(Texture2D giTexture, SamplerState giSampler, float3 worldPos, float3 worldNormal)
{
    // Walls get routed elsewhere; ground samples directly.
    float2 uv = gridToViewportUV(worldPos.xz);
    uv = snapUVToTexelCenter(uv);

    return SAMPLE_TEXTURE2D(giTexture, giSampler, uv).xyz;
}

Wall faces (vertical surfaces)

ShouldUseWallLighting checks two things: that the normal points mostly sideways, and that there is actually a wall at (or next to) the sample position:

GI_Lib.hlsl (ShouldUseWallLighting)
bool ShouldUseWallLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Treat surfaces with significant XZ normal component as walls (even if slightly tilted).
    float2 nXZ = worldNormal.xz;
    if (length(nXZ) < 0.5)
        return false;

    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);

    // IsWallAtWorld samples the conduction mask at the corresponding UV and thresholds it.
    return IsWallAtWorld(samplePos) || IsWallAtWorld(worldPos.xz);
}

When it’s a wall, we sample from four wall face textures (_WallDirect_PosX/_NegX/_PosZ/_NegZ and _WallIndirect_*). Because the world is voxelized and walls are axis-aligned, I pick a dominant face instead of blending across edges. We still offset the sample by half a GI texel along the normal so we land in the adjacent conductive texel instead of the wall boundary.

GI_Lib.hlsl (wall faces)
int GetDominantWallFace(float3 worldNormal)
{
    float2 n = worldNormal.xz;
    float2 absN = abs(n);

    if (absN.x >= absN.y)
        return n.x >= 0 ? 0 : 1; // +X / -X
    return n.y >= 0 ? 2 : 3;     // +Z / -Z
}
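The face selection is trivial to verify on the CPU (a direct transliteration for illustration; in the HLSL above, `n.y` is the world Z component):

```c
#include <math.h>

/* 0 = +X, 1 = -X, 2 = +Z, 3 = -Z; ties on the diagonal go to the X faces. */
int dominant_wall_face(float nx, float nz) {
    float ax = fabsf(nx), az = fabsf(nz);
    if (ax >= az)
        return nx >= 0.0f ? 0 : 1;
    return nz >= 0.0f ? 2 : 3;
}
```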
GI_Lib.hlsl (wall sampling)
float3 SampleWallDirect(float3 worldPos, float3 worldNormal)
{
    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);
    float2 uv = gridToViewportUV(samplePos);
    uv = snapUVToTexelCenter(uv);

    int face = GetDominantWallFace(worldNormal);
    if (face == 0) return SAMPLE_TEXTURE2D(_WallDirect_PosX, sampler_WallDirect_PosX, uv).rgb;
    if (face == 1) return SAMPLE_TEXTURE2D(_WallDirect_NegX, sampler_WallDirect_NegX, uv).rgb;
    if (face == 2) return SAMPLE_TEXTURE2D(_WallDirect_PosZ, sampler_WallDirect_PosZ, uv).rgb;
    return SAMPLE_TEXTURE2D(_WallDirect_NegZ, sampler_WallDirect_NegZ, uv).rgb;
}

Wall tops (horizontal surfaces on a wall cell)

ShouldUseWallTopLighting checks for a mostly-upward normal and verifies the cell beneath is a wall. For tops, I keep it intentionally simple: direct comes straight from the main directional light, and indirect is the ground average under the cell.

This look for wall tops might not fit your game and is very much an artistic choice. Modify as needed.

GI_Lib.hlsl (ShouldUseWallTopLighting)
bool ShouldUseWallTopLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Top faces should be mostly upward with minimal XZ component.
    float2 nXZ = worldNormal.xz;
    if (worldNormal.y < 0.5 || length(nXZ) > 0.35)
        return false;

    float2 cellCenter = floor(worldPos.xz) + 0.5;
    return IsWallAtWorld(cellCenter);
}
GI_Lib.hlsl (wall top sampling)
float3 SampleWallTopDirect(float3 worldPos)
{
    // _GI_MainDirectionalColor and _GI_MainDirectionalDir are set based on URP's lightData.mainLightIndex.
    float3 lightDir = normalize(_GI_MainDirectionalDir.xyz);
    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);
    return _GI_MainDirectionalColor.rgb * angle;
}

float3 SampleWallTopIndirect(float3 worldPos)
{
    return SampleGroundAverage(_IndirectLightColor, sampler_IndirectLightColor, worldPos.xz);
}

Direct and indirect sampling mirror the same routing (SampleDirectLightColor / SampleIndirectLightColor), just against _DirectLightColor / _IndirectLightColor and the wall-specific textures.

Optimizations

There are several major optimizations that can still be applied to this system, but they were out of scope for this article.

Performance data

These measurements are from a standalone release build, captured via NVIDIA Nsight Perf SDK (NVPerf) using the gpu__time_duration.sum hardware counter. Internal game resolution was 480x270 (30x16.875 tiles) on a 3060 Mobile GPU with Linux + Vulkan.

Methodology: All frame captures were done in the same process (per GPU) using console commands to change GI options. They were all taken in the same world position, with walls and shadow proxies (38 of them) present in the scene. All lights were visible and frame caching was disabled. No wall faces were culled. Lights had a radius of 10 and were spread out.

A compute version of heat spread was used.

A camera padding of 1.4 was used, along with 20 heat spread rounds for all captures. All other settings were the same for all captures.

Full, Half and Quarter refer to the resolution at which Heat Spread was performed, then upsampled.

GI Res 16: Full = 768x448, Half = 384x224, Quarter = 192x112
GI Res 8: Full = 384x224, Half = 192x112, Quarter = 96x56. 

Small passes (omitted from the tables): ConductionMask, AO, ExtractEmission, WallIndirect. In these runs they sum to ~0.04 to 0.11 ms on the 3060 and ~0.08 to 0.66 ms on Intel. Totals below include them.

Point lights are shadow-casting point lights. Emissive and directional counts are listed separately.

giResolution = 16 (RTX 3060 Mobile)

Heat Spread Res | Point | Emissive | Directional | DDA  | DDA_Directional* | HeatSpread | Blur | Total (ms)
full            | 1     | 0        | 0           | 0.09 | 0.06             | 1.14       | 0.08 | 1.46
full            | 1     | 0        | 1           | 0.10 | 1.97             | 1.29       | 0.08 | 3.54
full            | 1     | 256      | 0           | 0.45 | 0.06             | 1.31       | 0.09 | 2.01
full            | 32    | 256      | 0           | 1.74 | 0.06             | 1.18       | 0.08 | 3.16
full            | 64    | 256      | 0           | 3.09 | 0.05             | 1.21       | 0.08 | 4.53
half            | 64    | 256      | 0           | 3.03 | 0.05             | 0.44       | 0.05 | 3.68
quarter         | 64    | 256      | 0           | 3.06 | 0.05             | 0.24       | 0.05 | 3.51

giResolution = 8 (RTX 3060 Mobile)

Heat Spread Res | Point | Emissive | Directional | DDA  | DDA_Directional* | HeatSpread | Blur | Total (ms)
full            | 64    | 256      | 0           | 1.01 | 0.02             | 0.43       | 0.03 | 1.53
half            | 64    | 256      | 0           | 0.98 | 0.02             | 0.23       | 0.02 | 1.30
quarter         | 64    | 256      | 0           | 0.98 | 0.02             | 0.17       | 0.03 | 1.24

* DDA_Directional shows the directional-light pass. With 0 directional lights this is mostly overhead.

Integrated GPU (Intel UHD Graphics (TGL GT1))

I also ran the game on my integrated GPU (Intel UHD Graphics (TGL GT1), i7-11800H CPU). It struggles with giResolution = 16 even at quarter heat res, but at giResolution = 8 we get decent frame times with low light counts.

giResolution = 16 (Quarter)

Heat Spread Res | Point | Emissive | Directional | DDA   | DDA_Directional* | HeatSpread | Blur | Total (ms)
quarter         | 16    | 4        | 0           | 3.51  | 0.34             | 2.48       | 0.37 | 7.37
quarter         | 32    | 64       | 0           | 35.12 | 0.35             | 4.90       | 0.31 | 41.34

giResolution = 8 (Quarter)

Heat Spread Res | Point | Emissive | Directional | DDA  | DDA_Directional* | HeatSpread | Blur | Total (ms)
quarter         | 16    | 4        | 0           | 0.94 | 0.07             | 0.20       | 0.11 | 1.52
quarter         | 32    | 64       | 0           | 9.57 | 0.07             | 1.74       | 0.10 | 11.56

What we can learn from this

Closing thoughts

Phew, that was a long one! This was a very challenging topic to tackle with limited resources available, so I decided to pack as much information here as possible. Allow me to break the “engineer” persona and say this was very frustrating and painful at times, and I became a little obsessed with it, but it was extremely fun and taught me a lot. I truly hope this helps someone keep their hair strands attached to their head. If you do something cool with this, let me know! I’d love to hear about it!

There might be techniques I’ve missed, there are for sure things I can improve, but I’m happy with where I landed with this system. Feedback and suggestions are always appreciated, and questions are welcome. You can reach me on Twitter @gincodes or through e-mail at [email protected].

If this write-up made you think “we should hire this person”, I’m currently looking to switch from a 10-year detour in back-end engineering into full-time game dev. Reach me at: [email protected]