Pixel-perfect GI Without Ray Tracing
Hi! I’m building Anchorfall with a visual style inspired by Core Keeper: chunky pixel art with dynamic lighting, sharp shadows, and real-time global illumination.
To achieve that, I built a custom lighting pipeline that handles direct light, indirect light (GI), and pixel-perfect shadows, while giving strong artistic control. It isn’t physically accurate, but it fits pixel art games well.

It’s inspired by Core Keeper’s own lighting system (which uses heat diffusion in their pipeline) and David Maletz’s work on I Can’t Escape: Darkness. His blog post was the foundation for the indirect lighting approach.
This system is the result of ~8 months of iteration, and easily the steepest rendering learning curve I’ve put myself through. It started as an indirect-only system, but merging URP’s own light/shadow passes with it proved brittle and wasteful (we need direct light to seed indirect anyway). I wanted something that looked great with pixel art, performed well, and had none of the artifacts common in modern non-baked GI, like temporal sampling noise or ghost trails, while still allowing artistic choices like dithering, posterization, and palette swaps.
I’m really happy with where I landed. Despite the length of this article, the system will likely keep growing to support more artistic control. But even as it stands now, it’s a solid system that feels production-worthy.
This is implemented in Unity (URP + RenderGraph). Code snippets are in C# and HLSL.
Table of Contents
- Overview
- Pass Graph + Texture Formats
- Requirements and compromises
- 1. Occlusion representation: The Conduction Mask
- 2. GI View Window
- 3. Direct Lighting (Point Lights)
- 4. Indirect Light: Thermodynamics of Pixels
- Optimizations
- Performance data
- Closing thoughts
Overview

From a high level, here is how the system works:
- It renders a texture that has “conduction” information, or in other words, where in the world light is allowed to exist. This is what we call the conduction mask.

- Point lights are rendered in a separate texture using DDA against the wall buffer. This yields our direct light data and shadow visibility.

- We then use the data from direct to seed a heat-spread algorithm. This spreads the light, allowing it to turn corners and fill spaces.

- Direct + indirect are sampled together (AO + conduction + tone map) to produce the final lighting.

- Then our Lit shader samples the resulting textures, with some special handling for vertical geometry (walls, etc.), to produce the final image.


Pass Graph + Texture Formats
I’m not going to walk through RenderGraph wiring, but here’s the minimal pass order + texture contract this system assumes. If you follow this, you’ll match the behavior in the shaders.
```
Update                  _CameraView (phase-locked)
ConductionMask          -> _ConductionMask (global) (R8)
AmbientOcclusion        -> _AmbientOcclusion (global) (R8) [optional]
Direct Point + Emissive -> _DirectLightColor (global) (RGBA16F)
                        -> _GI_ShadowMap (global) (R8G8: R=point, G=directional)
                        -> _WallDirect_[PosX/NegX/PosZ/NegZ] (global) (RGBA16F)
Directional Lights      -> add into _DirectLightColor
                        -> write _GI_ShadowMap.G (max blend)
                        -> update _WallDirect_* textures
ExtractEmission         (RGBA16F)
HeatTransfer            ping-pong _IndirectLightColor (global) (RGBA16F) + Buffer (RGBA16F), optional downsample
Optional Blur           writes to _IndirectLightColor (RGBA16F)
WallIndirect            -> WallIndirect_[PosX/NegX/PosZ/NegZ] (RGBA16F)
Sample in Lit shader    (direct + indirect + AO + conduction + tone map)
```
All textures are linear (sRGB off), no mips, no MSAA, point filtering + clamp.
Implementation detail: all of these are RTHandles. They persist across frames and are reallocated when the GI window size changes (RenderGraph ReAllocateHandleIfNeeded). Heat transfer uses ping-pong buffers (and an optional downsample + upsample path).
Directional, AO and blur passes are optional.
Requirements and compromises
This pipeline makes some strong assumptions to stay fast and stable:
- Target hardware: desktop/console-class GPUs. I assume compute is available and memory bandwidth isn’t ultra-tight.
- The world must be grid-aligned.
- Lighting is simulated in 2D and projected onto 3D geometry, so it has no true height-aware GI. Complex vertical structures won’t look correct.
- Light and shadows are pixelated by design (tied to the GI texture resolution).
- It assumes a 3D world: we use normals and voxel height to shade ground, wall faces, and wall tops. A 2D-only version is possible, but it would need changes.
1. Occlusion representation: The Conduction Mask
In a typical 3D engine, light is blocked by geometry (meshes). In our world, the “truth” of the level geometry lives in a StructuredBuffer<int> called the _WallBuffer. This is a flat array of integers representing our 1x1 world tiles: 0 for empty, 1 for wall.
To make this data accessible to our pixel shaders, we first render a Conduction Mask. This is a camera-aligned texture (R8) that tells the lighting engine where light is allowed to exist.
The ConductionMask.shader is deceptively simple. It maps the fullscreen UVs to world grid coordinates.
```hlsl
float frag(Varyings i) : SV_Target
{
    float2 uv = i.texcoord;

    float blockAllLightColor = 0;
    float allowAllLightColor = 1;

    // uvToGrid returns grid coordinates; it's defined in the next section
    float2 worldCoord = uvToGrid(uv);
    int2 worldTile = int2(floor(worldCoord));

    // Apply offset to convert world coords to buffer indices
    int2 bufferTile = worldTile + int2(_GridOffsetX, _GridOffsetY);

    if (bufferTile.x < 0 || bufferTile.x >= _GridWidth || bufferTile.y < 0 || bufferTile.y >= _GridHeight)
    {
        return allowAllLightColor;
    }

    int bufferIndex = Convert2DIndexTo1D(bufferTile, _GridWidth);
    int isWall = _WallBuffer[bufferIndex];

    if (isWall == 0)
    {
        return allowAllLightColor;
    }

    return blockAllLightColor; // Wall cell: no conduction (light can't exist/propagate here)
}
```
We index the wall buffer row-major (x + y * width), where y corresponds to world +Z. The shader helpers used across passes look like this:
```hlsl
int Convert2DIndexTo1D(int2 idx, int width)
{
    return idx.x + idx.y * width;
}
```
Optionally, you could sample _WallBuffer directly instead of saving a conduction texture, but having the conduction mask texture helped immensely when things went wrong, so I’m including both IsObstacle versions below. Beware that the snippets in this article assume a _ConductionMask texture is present.
```hlsl
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    float2 cellPos = float2(x, y) + 0.5;
    float2 conductionUV = gridToViewportUV(cellPos);
    float conductivity = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, conductionUV).r;
    return conductivity < 0.1;
}
```
```hlsl
bool IsObstacle(int x, int y)
{
    int bufferX = x + _GridOffsetX;
    int bufferY = y + _GridOffsetY;

    // Out-of-bounds = air (streaming-friendly edges)
    if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
        return false;

    int bufferIndex = Convert2DIndexTo1D(int2(bufferX, bufferY), _GridWidth);
    return _WallBuffer[bufferIndex] != 0;
}
```
The result is a binary map of the world. White allows conduction, black blocks it (stored in the R channel because the texture is R8). Out-of-bounds is treated as empty/air to keep streaming-friendly behavior at the edges of the simulated window.
For DDA shadow rays, we query _WallBuffer directly so shadows don’t depend on the GI window. This prevents popping.
Grid size + WallBuffer population
The mask only works if the shader knows the grid dimensions and has the wall buffer bound. _GridWidth and _GridHeight define the buffer size, and _GridOffsetX/Y shifts world coordinates into buffer space (so a centered or streaming world can still map into a single flat array).
In Anchorfall, a GIWallBufferManager owns the buffer. It builds a flat ComputeBuffer<int> (0 = empty, 1 = wall) from the MapGen wall layer, binds it as _WallBuffer, and sets the grid globals. It also mirrors the data on the CPU so we can do incremental updates (chunk load/unload or single-tile edits) without GPU readback.
_GridOffset is independent from _CameraView: the camera view defines what area we render, while the grid offset defines where the global wall buffer lives in world space.
```csharp
_bufferSize = new int2(diameter, diameter);
_coordinateOffset = diameter / 2;

Shader.SetGlobalInt("_GridWidth", _bufferSize.x);
Shader.SetGlobalInt("_GridHeight", _bufferSize.y);
Shader.SetGlobalFloat("_WallHeight", wallHeight);

_wallBuffer = new ComputeBuffer(_bufferSize.x * _bufferSize.y, sizeof(int));
_wallBuffer.SetData(_cpuMirror);
Shader.SetGlobalBuffer("_WallBuffer", _wallBuffer);

GIBridge.GridSizeOverride = _bufferSize;
GIBridge.GridOffsetOverride = new int2(_coordinateOffset, _coordinateOffset);
```
Any time the wall data changes we push the updated range (or a whole chunk) and bump GIBridge.WallBufferVersion, which lets the GI render feature know the mask needs to refresh.
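To make that update path concrete, here is a minimal Python sketch (hypothetical names, not the actual GIWallBufferManager) of the CPU-mirror idea: a flat row-major array with a world-to-buffer offset, incremental single-tile edits, and a version counter the render feature can poll instead of doing a GPU readback.

```python
class WallBufferMirror:
    def __init__(self, diameter):
        self.width = self.height = diameter
        self.offset = diameter // 2          # world (0,0) maps near the buffer center
        self.cells = [0] * (diameter * diameter)
        self.version = 0                     # bumped on every edit; the GPU upload checks this

    def _index(self, world_x, world_y):
        bx, by = world_x + self.offset, world_y + self.offset
        if not (0 <= bx < self.width and 0 <= by < self.height):
            return None                      # out of bounds = air (streaming-friendly)
        return bx + by * self.width          # row-major: x + y * width

    def set_wall(self, world_x, world_y, is_wall):
        i = self._index(world_x, world_y)
        if i is not None and self.cells[i] != int(is_wall):
            self.cells[i] = int(is_wall)
            self.version += 1                # tells the GI feature the mask is stale

    def is_wall(self, world_x, world_y):
        i = self._index(world_x, world_y)
        return i is not None and self.cells[i] == 1
```

Skipping the version bump when an edit is a no-op keeps redundant writes from invalidating the cached conduction mask.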
Ambient occlusion
Right after the conduction mask we compute a lightweight ambient occlusion (AO) texture. It only depends on the wall buffer and the grid mapping. We apply AO at sample-time to darken creases and wall-adjacent areas without affecting light transport.
This AO is purely a contact-darkening term; it doesn’t affect shadowing or bounce, only the final shading.
The AO pass samples the wall buffer in a radius around each pixel, weights by distance (and a softness power), and outputs a single-channel occlusion value. _AORadius is in world units (tile units in my case). Optional Bayer dithering helps hide banding at low resolution.
```hlsl
float2 worldPos = uvToGrid(uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPos = (floor(worldPos * giRes) + 0.5) / giRes;

int2 baseTile = int2(floor(gridPos));
float radius = max(_AORadius, 0.01);
int range = (int)ceil(radius);

float occSum = 0.0;
float weightSum = 0.0;

for (int y = -range; y <= range; y++)
{
    for (int x = -range; x <= range; x++)
    {
        float2 tileCenter = float2(baseTile.x + x + 0.5, baseTile.y + y + 0.5);
        float dist = distance(gridPos, tileCenter);
        if (dist > radius) continue;

        float w = 1.0 - saturate(dist / radius);
        w = pow(w, max(_AOPower, 0.01));
        weightSum += w;

        if (IsObstacle(baseTile.x + x, baseTile.y + y))
            occSum += w;
    }
}

float occ = (weightSum > 0.0) ? (occSum / weightSum) : 0.0;
float ao = 1.0 - saturate(_AOStrength) * occ;
```
2. GI View Window
The light textures have to move with the camera, so we need to tell the shaders where the camera is looking.
The GIUtils.CalculateCameraVisibleArea() function projects the camera view onto the ground plane, then multiplies the result by a padding factor. A value of 2 means we render twice the visible width and height.
Padding is not primarily for lights whose transform is off-frustum. As long as a light’s radius overlaps the GI window, it will be included (even if the light itself is outside the camera frustum).
Padding is mostly about diffusion stability, tall objects and hiding the GI window edge. DDA shadow rays query _WallBuffer directly, so occluders outside the GI window still block light without relying on the conduction mask.
Objects are lit based on the tile they occupy. If an object is too tall, you’ll need more padding (or a different sampling strategy), otherwise the object will first appear dark then suddenly light up as the GI window slides into the tile it occupies.
```csharp
public static float4 CalculateCameraVisibleArea(Camera camera, float cameraViewPadding)
{
    const float groundPlaneY = 0.0f;

    Vector2[] corners = { new(0f, 0f), new(1f, 0f), new(0f, 1f), new(1f, 1f) };

    float2 minGrid = new float2(float.MaxValue, float.MaxValue);
    float2 maxGrid = new float2(float.MinValue, float.MinValue);

    for (int i = 0; i < corners.Length; i++)
    {
        Vector2 corner = corners[i];
        Ray ray = camera.ViewportPointToRay(new Vector3(corner.x, corner.y, 0f));
        float denom = ray.direction.y;

        // If the camera ray is parallel to the ground plane (or something goes NaN),
        // we bail out and skip GI for the frame.
        if (math.abs(denom) < 1e-4f)
            return new float4(0f, 0f, 0f, 0f);

        float t = (groundPlaneY - ray.origin.y) / denom;
        if (t <= 0.0f || float.IsNaN(t) || float.IsInfinity(t))
            return new float4(0f, 0f, 0f, 0f);

        Vector3 hit = ray.origin + ray.direction * t;
        minGrid = math.min(minGrid, new float2(hit.x, hit.z));
        maxGrid = math.max(maxGrid, new float2(hit.x, hit.z));
    }

    float2 center = (minGrid + maxGrid) * 0.5f;
    float2 paddedSize = (maxGrid - minGrid) * cameraViewPadding;

    float2 minCorner = center - paddedSize * 0.5f;
    return new float4(minCorner.x, minCorner.y, paddedSize);
}
```
GI Resolution
The giResolution parameter controls how many texels we allocate per world unit.
The final texture dimensions are:
```
textureWidth  = viewWidth x giResolution
textureHeight = viewHeight x giResolution
```
So if your camera sees a 30x20 world-unit area (after padding) and giResolution = 4, you get a 120x80 GI texture.
More precisely, the sizing math we use is:
```
resolution = max(1, giResolution)

// Convert padded world size -> texels
textureWidth  = ceil(viewWidth * resolution)
textureHeight = ceil(viewHeight * resolution)

// Snap to multiples of resolution (keeps whole-tile alignment)
textureWidth  = ceil(textureWidth / resolution) * resolution
textureHeight = ceil(textureHeight / resolution) * resolution

// Minimum size guard
textureWidth  = max(textureWidth, 64)
textureHeight = max(textureHeight, 64)

// World size implied by the snapped texture
worldWidth  = textureWidth / resolution
worldHeight = textureHeight / resolution
```
That snapping step is important: it guarantees the texture grid lines up with whole world units, which keeps phase locking stable and prevents half-texel drift when the camera moves.
A resolution of 1 would produce a Minecraft-like light, with each tile having a single color. However, since our shadows use the same texture, you would get very blocky shadows.
This value also affects heat diffusion: at higher resolutions, world-space hops span more GI texels. That’s why the adaptive hierarchical sampling exists (covered later).
Phase Locking
We deliberately quantize the GI viewport origin to whole-tile (1 world unit) boundaries.
This does two things:
- It eliminates sub-tile jitter, where floor()-based mapping would cause texels to “switch” which tile they correspond to.
- It enables aggressive caching: the GI textures only need to update when the camera crosses a tile boundary (or lights/walls change), which can skip lots of frames at high GI resolutions.
Trade-off: the GI window moves in discrete steps. With sufficient padding, the camera never sees the window edge, so this stays visually stable.
If you want smoother tracking, snap to GI-texel boundaries (1 / giResolution world units) instead, but you’ll update more often and caching becomes less effective.
```csharp
private float4 CalculatePhaseLockedCameraView(float2 center, float2 size, int resolution)
{
    float minX = center.x - size.x * 0.5f;
    float minY = center.y - size.y * 0.5f;

    // Convert world -> texels
    int minXTexel = Mathf.FloorToInt(minX * resolution);
    int minYTexel = Mathf.FloorToInt(minY * resolution);

    // Snap to 1 world unit boundaries (resolution texels)
    minXTexel = Mathf.FloorToInt(minXTexel / (float)resolution) * resolution;
    minYTexel = Mathf.FloorToInt(minYTexel / (float)resolution) * resolution;

    float invRes = 1.0f / resolution;
    return new float4(minXTexel * invRes, minYTexel * invRes, size.x, size.y);
}
```
Coordinate Conventions
These are the rules I stick to everywhere. If you keep this mapping consistent, the rest of the system stays stable:
- _CameraView = (minX, minZ, width, height) in world units.
- GI texture size = viewSize * giResolution (integer texel dimensions).
- The conduction mask is rendered at GI resolution (same texel grid as the light buffers).
- Full-screen uv is normalized over the GI textures, not the screen.
- World tile index: floor(worldPos.xz) (1 cell per world unit).
- GI texel center (world units): (floor(worldPos.xz * giRes) + 0.5) / giRes.
- When sampling GI or conduction mask, clamp + snap to texel center to avoid boundary flicker.
With _CameraView set, the shader can convert any screen UV to world coordinates:
```hlsl
float2 uvToGrid(float2 uv)
{
    // _CameraView = (minX, minZ, width, height)
    float2 gridPos;
    gridPos.x = _CameraView.x + uv.x * _CameraView.z;
    gridPos.y = _CameraView.y + uv.y * _CameraView.w;
    return gridPos;
}
```
The inverse mapping (world → UV) and texel snapping look like this:
```hlsl
float2 gridToViewportUV(float2 gridPos)
{
    CamBounds bounds = GetGlobalCameraViewData();
    float2 uv;
    uv.x = (gridPos.x - bounds.min.x) / bounds.size.x;
    uv.y = (gridPos.y - bounds.min.y) / bounds.size.y;
    return uv;
}

float2 snapUVToTexelCenter(float2 uv)
{
    float2 texelSize = getViewportTexelSize();
    float2 halfTexel = texelSize * 0.5;
    uv = clamp(uv, halfTexel, 1.0 - halfTexel);
    return (floor(uv / texelSize) + 0.5) * texelSize;
}
```
Please note that the GI “grid space” is (worldX, worldZ). I’ll call the second component y in shaders because it’s a 2D texture axis.
This simple linear interpolation maps (0,0) to the bottom-left of our world view and (1,1) to the top-right. Every GI pass uses this function to bridge screen space and world space.
Consistency here matters. If any pass uses a slightly different mapping (different padding, phase lock, or floor vs round), you get flicker and off-by-one lighting bugs.
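That round trip is easy to unit-test on the CPU. A quick Python sketch, assuming _CameraView = (minX, minZ, width, height) as above (function names are illustrative):

```python
def uv_to_grid(uv, view):
    # Forward mapping: normalized UV over the GI texture -> world grid position
    min_x, min_z, w, h = view
    return (min_x + uv[0] * w, min_z + uv[1] * h)

def grid_to_uv(grid, view):
    # Inverse mapping: world grid position -> normalized UV
    min_x, min_z, w, h = view
    return ((grid[0] - min_x) / w, (grid[1] - min_z) / h)

# Phase-locked window: origin snapped to whole world units
view = (-16.0, -9.0, 32.0, 18.0)
g = uv_to_grid((0.25, 0.75), view)
assert grid_to_uv(g, view) == (0.25, 0.75)
```

As long as every pass shares the exact same view tuple, the round trip is lossless; feed one pass a differently padded or unsnapped view and the mappings disagree, which is exactly the flicker failure mode described above.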
3. Direct Lighting (Point Lights)
We now have a 2D texture with our light occluders. We are ready to fake some photons.
Gathering Lights
In the RenderFeature, we iterate through Unity’s visible lights array and pack light data into a GPU buffer:
```hlsl
struct VM_LightData
{
    float4 color;
    float2 position;
    float radius;
    float _padding; // 16 byte alignment
};
```
We don’t need a separate intensity because visibleLight.finalColor already bakes in the Unity light’s intensity + color (and temperature), so the shader only needs radius and position:
```csharp
// In RecordRenderGraph

var lightData = frameData.Get<UniversalLightData>();

// By using lightData.visibleLights, we don't need to handle culling + range
// checks, but max lights are capped at 256 by URP. If you need more, gather
// them in a job or similar.
for (int i = 0; i < lightData.visibleLights.Length; i++)
{
    var visibleLight = lightData.visibleLights[i];
    var unityLight = visibleLight.light;
    var position = unityLight.transform.position;

    // WritePointLight writes to the light buffer we'll send to the shaders
    WritePointLight(lightCount,
        new float3(position.x, position.y, position.z),
        visibleLight.finalColor,
        unityLight.range);
}
```
DDA
Instead of the classical approach, we use DDA (Digital Differential Analyzer) to cast rays from each light to each pixel.
More precisely it’s the Amanatides & Woo grid traversal, but it’s often called DDA in gamedev. The key property is stepping from grid boundary to grid boundary, visiting every cell the ray crosses in order.
For shadow casting, we want to know: “does a straight line from the light to this pixel pass through any walls?”
The algorithm works by tracking how far we need to travel in X and Y to cross the next cell boundary. Whichever is closer, we step in that direction:
```hlsl
float CalculateShadowDDA(float2 lightPos, float2 pixelPos, float radius)
{
    float2 rayDir = pixelPos - lightPos;
    float rayLength = length(rayDir);

    if (rayLength < 1e-4)
        return 1.0;

    rayDir /= rayLength;

    // Bias the ray off grid boundaries to avoid corner/edge ambiguity.
    // The start/end bias keeps floor() stable when positions land exactly on boundaries.
    const float RAY_BIAS = 1e-4;
    float2 startPos = lightPos + rayDir * RAY_BIAS;
    float2 endPos = pixelPos - rayDir * RAY_BIAS;
    rayLength = max(rayLength - 2.0 * RAY_BIAS, 0.0);

    // Grid cell coordinates of starting and ending points
    int2 startCell = int2(floor(startPos));
    int2 endCell = int2(floor(endPos));

    int2 cell = startCell;

    // Calculate step direction
    int2 step = int2(sign(rayDir));

    // Calculate distance to first X and Y cell boundaries.
    // Guard near-zero directions to avoid 1/dir blowups.
    const float DIR_EPS = 1e-6;
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    float2 tMax;
    if (xZero)
        tMax.x = 1e30;
    else
        tMax.x = (step.x > 0)
            ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
            : (startPos.x - floor(startPos.x)) / -rayDir.x;

    if (yZero)
        tMax.y = 1e30;
    else
        tMax.y = (step.y > 0)
            ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
            : (startPos.y - floor(startPos.y)) / -rayDir.y;

    // Calculate how far to step in each direction to move one cell
    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float distanceTraveled;
    const float CORNER_EPS = 1e-5;

    // Maximum number of steps to prevent infinite loops
    int maxSteps = max(1, (int)ceil(radius * 2.0) + 2);

    for (int i = 0; i < maxSteps; i++)
    {
        // IsObstacle MUST use _WallBuffer in this path
        // (it treats out-of-bounds as empty, which is streaming-friendly)
        if (IsObstacle(cell.x, cell.y))
        {
            return 0;
        }

        // If we've reached the destination cell, we're not in shadow
        if (cell.x == endCell.x && cell.y == endCell.y)
            return 1.0;

        // Step to next cell
        if (tMax.x < tMax.y - CORNER_EPS)
        {
            distanceTraveled = tMax.x;
            tMax.x += tDelta.x;
            cell.x += step.x;
        }
        else if (tMax.y < tMax.x - CORNER_EPS)
        {
            distanceTraveled = tMax.y;
            tMax.y += tDelta.y;
            cell.y += step.y;
        }
        else
        {
            // Corner hit: only block if both adjacent cells are obstacles
            distanceTraveled = tMax.x;
            int2 nextX = cell + int2(step.x, 0);
            int2 nextY = cell + int2(0, step.y);
            if (IsObstacle(nextX.x, nextX.y) && IsObstacle(nextY.x, nextY.y))
                return 0;

            tMax.x += tDelta.x;
            tMax.y += tDelta.y;
            cell.x += step.x;
            cell.y += step.y;
        }

        // If we've traveled farther than the ray length, we're done
        if (distanceTraveled > rayLength)
            return 1.0;
    }

    // If we've exceeded max steps, assume no shadow
    return 1.0;
}
```
Shadow Proxies (Analytical Occluders)
DDA gives grid-accurate shadows from the wall buffer, but it doesn’t cover thin props, dynamic obstacles, or anything that isn’t baked into the grid. For those I use shadow proxies: small analytic occluders that cast soft shadows without touching the wall buffer.
Each proxy is a simple shape (box or circle) with width, length, and penumbra parameters. The proxy buffer is updated on the CPU, culled against the GI view, and capped to maxShadowProxies. Both the point-light and directional-light passes read the same buffer.
Culling is straightforward: cullRadius = max(size) + maxLength + max(penumbraWidth, penumbraLength), then keep the closest N proxies to the GI view center.
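A minimal Python sketch of that culling step, under the simplifying assumption that the GI view is treated as a circle of radius view_half_extent around its center (field and function names are illustrative):

```python
import math

def cull_proxies(proxies, view_center, view_half_extent, max_proxies):
    kept = []
    for p in proxies:
        # Conservative reach of this proxy's shadow
        cull_radius = (max(p["size"]) + p["max_length"]
                       + max(p["penumbra_width"], p["penumbra_length"]))
        d = math.dist(p["position"], view_center)
        # Reject proxies whose shadow can't reach the GI window
        if d - cull_radius > view_half_extent:
            continue
        kept.append((d, p))
    # Keep the N proxies closest to the GI view center
    kept.sort(key=lambda e: e[0])
    return [p for _, p in kept[:max_proxies]]
```

The distance sort means that when the cap is hit, the proxies you lose are the ones farthest from the center, which are the least likely to cast a visible shadow.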
```hlsl
struct VM_ShadowProxyData
{
    float2 position;      // World XZ
    float2 size;          // Half extents (box) or radius (circle)
    float maxLength;      // Shadow length
    float endCapBlend;    // Blend between light dir and proxy axis
    float penumbraWidth;  // Soft edge across width
    float penumbraLength; // Soft edge along length
    float penumbraPower;  // Curve sharpness
    float shape;          // 0 = box, 1 = circle
    float widthMode;      // 0 = projected width, 1 = max axis
    float penumbraMode;   // inward/outward/both
};
```
Core idea: treat the proxy like a capsule/box aligned with the light direction, compute perp and along distances, and attenuate by two penumbra curves.
```hlsl
float ComputePointLightProxyShadow(float2 lightPos, float2 pixelPos)
{
    if (_ShadowProxyCount <= 0) return 1.0;

    float2 ray = pixelPos - lightPos;
    float rayLen = length(ray);
    if (rayLen < 1e-4) return 1.0;

    float2 dir = ray / rayLen;
    float shadow = 1.0;

    for (int i = 0; i < _ShadowProxyCount; i++)
    {
        VM_ShadowProxyData proxy = _ShadowProxyBuffer[i];
        float2 toProxy = proxy.position - lightPos;
        float proxyDist = length(toProxy);
        if (proxyDist < 1e-4) continue;

        float2 dirFlat = toProxy / proxyDist;
        float2 dirShadow = normalize(lerp(dir, dirFlat, saturate(proxy.endCapBlend)));
        float2 perpDir = float2(-dirShadow.y, dirShadow.x);

        float occluderDist = dot(toProxy, dirShadow);
        if (occluderDist <= 0.0) continue;

        float2 toPixelFromProxy = pixelPos - proxy.position;
        float alongDist = dot(toPixelFromProxy, dirShadow);
        if (alongDist <= 0.0) continue;

        float perpDist = abs(dot(toPixelFromProxy, perpDir));
        float width = GetShadowProxyWidth(proxy, perpDir);

        if (alongDist > proxy.maxLength + proxy.penumbraLength) continue;

        float edge = ComputePenumbraFactor(perpDist, width, proxy.penumbraWidth, proxy.penumbraPower, proxy.penumbraMode);
        float len = ComputePenumbraFactor(alongDist, proxy.maxLength, proxy.penumbraLength, proxy.penumbraPower, proxy.penumbraMode);

        float occlusion = edge * len;
        shadow = min(shadow, 1.0 - occlusion);
        if (shadow <= 0.001) break;
    }

    return shadow;
}
```
In the lighting loop I combine it like this:
```hlsl
shadow = min(ShadowDDA, ShadowProxy);
```
Directional lights use the same proxy logic (just with a fixed light direction).
Shadow proxy handling is basic and has room for improvement. It should be possible to project shadows onto walls non-uniformly so they retain their shape where they hit a wall. Right now, if a shadow hits a wall, the entire wall face is shadowed from top to bottom. Penumbra helps massively, but hard shadows can be distracting when they just barely touch a wall.
Multiple Render Targets: One Pass, Many Textures
If you are targeting Shader Model 3 (D3D9) or OpenGL ES, MRTs have a cap of 4 render targets. Pack 2 wall faces per texture to get around this.
We don’t just output a single color texture; we use Multiple Render Targets (MRT) to write several textures simultaneously:
```hlsl
struct FragmentOutput
{
    float4 color : SV_Target0;     // Direct light color
    float4 shadow : SV_Target1;    // Shadow map (R=point, G=directional)
    float4 wallFace0 : SV_Target2; // +X face (RGB)
    float4 wallFace1 : SV_Target3; // -X face (RGB)
    float4 wallFace2 : SV_Target4; // +Z face (RGB)
    float4 wallFace3 : SV_Target5; // -Z face (RGB)
};
```
Why so many outputs? Because walls need per-face lighting.
The Wall Problem
Ground tiles are simple: they face up, sample the GI texture at their world position, done. But walls have sides. A wall lit from the east should have a bright east face and a dark west face. Walls are also perfectly aligned with the conduction mask occluders, so sampling at their own position always returns no light.
For a while, we used the wall normal to sample the closest adjacent pixel to get around this limitation, but that causes “bands” that look distracting when shadows wrap around corners, and Ambient Occlusion interfered with the wall shading.
Our solution: for each wall texel, we store four lighting values (one per cardinal direction: +X, -X, +Z, -Z) as four separate RGBA16F textures. Each texture holds full RGB for one face. This keeps full color precision, avoids face packing math, and still lets us pick a dominant face with a single texture fetch. Non-wall texels stay zero.
This is a lot of textures (4 for direct, 4 for indirect), yes. But giResolution scales the tile resolution, not screen resolution. A 480x270 internal resolution game like Anchorfall, with giResolution 16 and padding 2 would produce textures at 960x540, which comes at ~4MB per texture at RGBA16F.
For the whole light system (assuming heat spread is done at full res), we have:
2x R8, 1x R8G8, 13x RGBA16F. At the resolution above, that amounts to just under 56MB of VRAM, which is next to nothing on modern hardware and remarkably small for an entire lighting system.
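The arithmetic is easy to sanity-check. A quick Python sketch, using decimal megabytes (1 MB = 10^6 bytes):

```python
def texture_mb(w, h, bytes_per_texel):
    # VRAM for one mip-less texture, in decimal megabytes
    return w * h * bytes_per_texel / 1e6

w, h = 960, 540
total = (2 * texture_mb(w, h, 1)     # 2x R8     (1 byte/texel)
         + 1 * texture_mb(w, h, 2)   # 1x R8G8   (2 bytes/texel)
         + 13 * texture_mb(w, h, 8)) # 13x RGBA16F (8 bytes/texel)
# total comes to roughly 56 MB, matching the figure above
```

Each RGBA16F texture is about 4.15 MB at 960x540, so the thirteen half-float textures dominate the budget; the R8 and R8G8 masks are rounding error by comparison.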
```hlsl
// Which faces of this wall tile are exposed (not buried in other walls)?
float4 GetWallFaceMask(int2 cell)
{
    if (!IsObstacle(cell.x, cell.y))
        return float4(0, 0, 0, 0); // Not a wall

    // Check each neighbor - if the neighbor is NOT a wall, this face is exposed
    float facePosX = IsObstacle(cell.x + 1, cell.y) ? 0.0 : 1.0;
    float faceNegX = IsObstacle(cell.x - 1, cell.y) ? 0.0 : 1.0;
    float facePosZ = IsObstacle(cell.x, cell.y + 1) ? 0.0 : 1.0;
    float faceNegZ = IsObstacle(cell.x, cell.y - 1) ? 0.0 : 1.0;

    return float4(facePosX, faceNegX, facePosZ, faceNegZ);
}

// How much does this light contribute to each face?
float4 GetWallFacing(float2 dirToLight)
{
    return float4(
        saturate(dirToLight.x),  // +X face lit when light is to the right
        saturate(-dirToLight.x), // -X face lit when light is to the left
        saturate(dirToLight.y),  // +Z face lit when light is above
        saturate(-dirToLight.y)  // -Z face lit when light is below
    );
}
```
For each light, we calculate the direction to the light source, multiply the face mask by the facing weights, and accumulate into the wall textures. Later, when rendering a wall sprite, we sample these textures using the sprite’s world normal to pick the right face. This is intentionally simple and snappy: it biases light toward the most directly oriented face without needing per-pixel normals in the GI texture. If you want softer transitions, raise the facing term to a power (e.g. pow(saturate(dirToLight.x), k)).
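The facing math is small enough to check by hand. Here it is as a tiny Python sketch, with the optional softening exponent k mentioned above (k = 1 matches the shader; names are mine):

```python
def clamp01(v):
    # Equivalent of HLSL saturate()
    return min(max(v, 0.0), 1.0)

def wall_facing(dir_x, dir_y, k=1.0):
    # Per-face weights from a normalized 2D direction-to-light.
    # k > 1 gives softer transitions between faces.
    return (clamp01(dir_x) ** k,   # +X face
            clamp01(-dir_x) ** k,  # -X face
            clamp01(dir_y) ** k,   # +Z face
            clamp01(-dir_y) ** k)  # -Z face
```

With a light due east the weights collapse to a single face; a diagonal light splits its contribution between the two faces it can see, and the two back faces stay at zero.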
On bandwidth-constrained GPUs you may want fewer RTs or pack wall faces.
The Rendering Loop
Now for the actual rendering. Every pixel in our GI texture runs a fragment shader that loops through all point lights:
```hlsl
FragmentOutput frag(Varyings input)
{
    FragmentOutput output;
    output.color = float4(0);
    output.shadow = float4(0);
    output.wallFace0 = float4(0);
    output.wallFace1 = float4(0);
    output.wallFace2 = float4(0);
    output.wallFace3 = float4(0);

    float2 gridPos = uvToGrid(input.texcoord);
    float giRes = max(_GI_Resolution_Lib, 1.0);
    // Snap to GI texel centers so wall/ground sampling is stable across camera movement.
    float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

    int2 wallCell = int2(floor(gridPos));
    // IsObstacle MUST use _WallBuffer in this path, otherwise popping might occur.
    // Walls are encoded in the wall buffer (1 = wall).
    bool isWall = IsObstacle(wallCell.x, wallCell.y);

    float totalShadow = 0.0;
    // Wall lighting accumulates into per-face outputs (valid only on wall texels).
    float3 wallFace0 = float3(0, 0, 0);
    float3 wallFace1 = float3(0, 0, 0);
    float3 wallFace2 = float3(0, 0, 0);
    float3 wallFace3 = float3(0, 0, 0);

    if (isWall)
    {
        // Base wall position; per-face offsets push just outside the wall cell.
        float2 wallSamplePos = gridPosCell;
        float4 faceMask = GetWallFaceMask(wallCell);
        float2 cellMin = float2(wallCell);
        float2 cellMax = cellMin + 1.0;
        // Small epsilon pushes the ray target just outside the wall cell (aligns wall/ground shadows).
        const float faceEps = 1e-3;
        const float4 zero4 = float4(0, 0, 0, 0);

        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            // Face weights bias toward the wall face oriented to the light.
            float4 faceWeight = faceMask * facing;

            if (all(faceWeight == zero4))
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;

            // Each face gets its own DDA to the adjacent cell + proxy shadow.
            float4 faceShadow = float4(1, 1, 1, 1);
            if (faceWeight.x > 0.0)
            {
                float2 facePos = float2(cellMax.x + faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.x = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.y > 0.0)
            {
                float2 facePos = float2(cellMin.x - faceEps, wallSamplePos.y);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.y = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.z > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMax.y + faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.z = min(shadowFactor, proxyShadowFactor);
            }
            if (faceWeight.w > 0.0)
            {
                float2 facePos = float2(wallSamplePos.x, cellMin.y - faceEps);
                float shadowFactor = CalculateShadowDDA(lightGridPos, facePos, lightRadius);
                float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, facePos);
                faceShadow.w = min(shadowFactor, proxyShadowFactor);
            }

            float4 faceContribution = faceWeight * faceShadow;

            wallFace0 += directContribution * faceContribution.x;
            wallFace1 += directContribution * faceContribution.y;
            wallFace2 += directContribution * faceContribution.z;
            wallFace3 += directContribution * faceContribution.w;
        }

        // Emissives skip DDA but still light wall faces.
        for (int i = 0; i < _EmissiveLightCount; i++)
        {
            float2 lightGridPos = _EmissiveLightBuffer[i].position;
            float4 lightColor = _EmissiveLightBuffer[i].color;
            float lightRadius = _EmissiveLightBuffer[i].radius;

            float dist = distance(wallSamplePos, lightGridPos);
            if (dist > lightRadius)
                continue;

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff;
            float2 dirToLight = lightGridPos - wallSamplePos;
            float invLen = rsqrt(max(dot(dirToLight, dirToLight), 1e-6));
            dirToLight *= invLen;
            float4 facing = GetWallFacing(dirToLight);
            float4 faceWeight = faceMask * facing;

            wallFace0 += directContribution * faceWeight.x;
            wallFace1 += directContribution * faceWeight.y;
            wallFace2 += directContribution * faceWeight.z;
            wallFace3 += directContribution * faceWeight.w;
        }
    }
    else
    {
        // Process each light for ground
        for (int i = 0; i < _LightCount; i++)
        {
            float2 lightGridPos = _LightBuffer[i].position;
            float4 lightColor = _LightBuffer[i].color;
            float lightRadius = _LightBuffer[i].radius;

            float dist = distance(gridPos, lightGridPos);
            if (dist > lightRadius)
                continue;

            // Ground samples DDA to the GI texel center (not the wall boundary).
            float shadowFactor = CalculateShadowDDA(lightGridPos, gridPosCell, lightRadius);
            float proxyShadowFactor = ComputePointLightProxyShadow(lightGridPos, gridPosCell);
            shadowFactor = min(shadowFactor, proxyShadowFactor);

            totalShadow = max(totalShadow, shadowFactor);

            float falloff = 1.0 - saturate(dist / lightRadius);
            falloff = pow(falloff, _LightSoftness);

            float3 directContribution = lightColor.rgb * falloff * shadowFactor;
            output.color.rgb += directContribution;
        }

        // Process emissive-only lights (no shadows)
        for (int i = 0; i < _EmissiveLightCount; i++)
        {
            float2 lightGridPos = _EmissiveLightBuffer[i].position;
            float4 lightColor = _EmissiveLightBuffer[i].color;
            float lightRadius = _EmissiveLightBuffer[i].radius;

            float dist
```
= distance(gridPos, lightGridPos);156 if (dist > lightRadius)157 continue;158 159 float falloff = 1.0 - saturate(dist / lightRadius);160 falloff = pow(falloff, _LightSoftness);161 162 output.color.rgb += lightColor.rgb * falloff;163 }164 }165 166 float shadowMask = isWall ? 0.0 : 1.0;↳Shadow map only stores visibility for non-wall texels.167 output.shadow = float4(totalShadow * shadowMask, 0.0, 0.0, 0.0);168 output.wallFace0 = float4(wallFace0, 1.0);169 output.wallFace1 = float4(wallFace1, 1.0);170 output.wallFace2 = float4(wallFace2, 1.0);171 output.wallFace3 = float4(wallFace3, 1.0);172 return output;173} A few things to note:
IsObstacle checks: In this path, IsObstacle needs to be wired to use _WallBuffer directly. The conduction mask is view-dependent, which might cause popping if padding is not high enough.
Light falloff: The pow(falloff, _LightSoftness) controls how “hard” or “soft” the light edge is. A value of 1.0 gives linear falloff; higher values create a sharper cutoff at the edge.
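As a quick CPU-side illustration of the falloff knob (a Python sketch mirroring the HLSL math, not the shader itself):

```python
def saturate(x):
    """HLSL-style saturate: clamp to [0, 1]."""
    return max(0.0, min(1.0, x))

def light_falloff(dist, light_radius, softness):
    """1 - saturate(d / r), raised to _LightSoftness."""
    falloff = 1.0 - saturate(dist / light_radius)
    return falloff ** softness

# softness = 1.0 is linear; higher values darken toward the edge faster.
print(light_falloff(5.0, 10.0, 1.0))   # 0.5
print(light_falloff(5.0, 10.0, 2.0))   # 0.25
print(light_falloff(12.0, 10.0, 1.0))  # 0.0 (outside the radius)
```

Halfway to the radius, linear falloff gives 0.5 while softness 2.0 already drops to 0.25, which is why higher values read as a harder edge.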
Additive accumulation: Multiple lights simply add together. Two overlapping red lights make a brighter red. A red and blue light make purple (well, magenta). This isn’t physically based, but it behaves intuitively and looks natural for games.
Directional Lights: The Sun Problem
Point lights are straightforward: position, radius, done. But directional lights (sun, moon) have no position. They have a direction and cast shadows based on angle.
We handle these in a separate pass with their own data structure:
```hlsl
struct VM_DirectionalLightData {
    float3 direction;               // Normalized light direction
    float4 color;
    float shadowDistanceMultiplier; // Pre-computed: 1.0 / max(abs(direction.y), 0.02)
    float baseShadowDistance;
    float bounceIntensity;
    float2 _padding;                // 16 byte alignment
};
```

Shadow length depends on sun angle. When the sun is directly overhead (direction.y ≈ -1), shadows are short. At sunset (direction.y ≈ 0), shadows stretch to infinity.
We pre-compute shadowDistanceMultiplier on the CPU:
```
maxShadowDistance = baseShadowDistance * shadowDistanceMultiplier
```

The DDA then traces from each pixel in the light direction, checking against a fixed _WallHeight (uniform wall height) to approximate long shadows. The traversal is still 2D over the wall grid; _WallHeight and the sun angle just scale the effective shadow reach.
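The CPU-side pre-computation is tiny; here it is as a Python sketch (the 0.02 clamp is the one from the struct comment):

```python
def shadow_distance_multiplier(direction_y):
    """Shadows lengthen as the sun drops toward the horizon.
    The 0.02 clamp caps the multiplier at 50x near direction.y == 0."""
    return 1.0 / max(abs(direction_y), 0.02)

def max_shadow_distance(base, direction_y):
    return base * shadow_distance_multiplier(direction_y)

print(max_shadow_distance(8.0, -1.0))   # 8.0: overhead sun, short shadows
print(max_shadow_distance(8.0, -0.01))  # ~400: near-horizon sun, clamp kicks in
```

Without the clamp the multiplier would blow up to infinity at sunset, so the constant doubles as the system's "longest possible shadow" control.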
Here’s the condensed pass logic:
```hlsl
float DirectionalShadowDDA(float2 pixelPos, float3 lightDir,
    float baseShadowDistance, float shadowDistanceMultiplier)
{
    float3 rayDir3D = -lightDir;
    float2 rayDir = normalize(rayDir3D.xz);
    float rayDirXZLen = length(rayDir3D.xz);
    if (rayDirXZLen < 1e-4) return 1.0;

    const float DIR_EPS = 1e-6;
    const float RAY_BIAS = 1e-4;
    float2 startPos = pixelPos + rayDir * RAY_BIAS;
    int2 cell = int2(floor(startPos));
    int2 step = int2(sign(rayDir));
    bool xZero = abs(rayDir.x) < DIR_EPS;
    bool yZero = abs(rayDir.y) < DIR_EPS;

    float2 tMax;
    tMax.x = xZero ? 1e30 :
        (step.x > 0 ? (floor(startPos.x) + 1 - startPos.x) / rayDir.x
                    : (startPos.x - floor(startPos.x)) / -rayDir.x);
    tMax.y = yZero ? 1e30 :
        (step.y > 0 ? (floor(startPos.y) + 1 - startPos.y) / rayDir.y
                    : (startPos.y - floor(startPos.y)) / -rayDir.y);

    float2 tDelta = float2(
        xZero ? 1e30 : abs(1.0 / rayDir.x),
        yZero ? 1e30 : abs(1.0 / rayDir.y));

    float maxDist = baseShadowDistance * shadowDistanceMultiplier;
    int maxSteps = 2 * (_GridWidth + _GridHeight);

    for (int i = 0; i < maxSteps; i++)
    {
        if (BlocksDirectionalCell(cell, startPos, rayDir3D, rayDirXZLen))
            return 0.0;

        if (tMax.x < tMax.y) { tMax.x += tDelta.x; cell.x += step.x; }
        else                 { tMax.y += tDelta.y; cell.y += step.y; }

        float traveled = min(tMax.x, tMax.y);
        if (traveled > maxDist) return 1.0;

        int bufferX = cell.x + _GridOffsetX;
        int bufferY = cell.y + _GridOffsetY;
        if (bufferX < 0 || bufferY < 0 || bufferX >= _GridWidth || bufferY >= _GridHeight)
            return 1.0;
    }
    return 1.0;
}

float3 direct = 0;
float totalShadow = 0;
for (int i = 0; i < _DirectionalLightCount; i++)
{
    VM_DirectionalLightData lightData = _DirectionalLightBuffer[i];
    float3 lightDir = lightData.direction;
    float4 lightColor = lightData.color;
    float shadowDistanceMultiplier = lightData.shadowDistanceMultiplier;
    float baseShadowDistance = lightData.baseShadowDistance;

    float visibility = (lightDir.y >= 0.0) ? 0.0
        : DirectionalShadowDDA(gridPosCell, lightDir, baseShadowDistance, shadowDistanceMultiplier);
    visibility = min(visibility, ComputeDirectionalProxyShadow(gridPosCell, lightDir));

    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);

    totalShadow = max(totalShadow, visibility);
    direct += lightColor.rgb * angle * visibility;
    // Wall faces use the same face-weight accumulation as point lights,
    // but their shadow rays start just outside each face to avoid self-occlusion.
}
output.color.rgb += direct;
output.shadow.g = totalShadow * shadowMask;
```

The height test is a simple wall-height check against the ray:
```hlsl
bool BlocksDirectionalCell(int2 cell, float2 startPos, float3 rayDir3D, float rayDirXZLen)
{
    if (!IsObstacle(cell.x, cell.y)) return false;

    float distanceToWall = length(float2(cell) + 0.5 - startPos);
    if (abs(rayDir3D.y) > 0.001 && rayDirXZLen > 1e-4)
    {
        float verticalRate = rayDir3D.y / rayDirXZLen;
        float rayHeightAtWall = distanceToWall * verticalRate;
        return rayHeightAtWall < _WallHeight; // uniform wall height
    }
    return true;
}
```

This pass uses additive blending for direct color and max blending for shadow visibility (G channel), so "any directional light wins." It also reads the existing wall-direct textures and adds its contribution before writing out the updated wall-direct RTs.
The angle-based brightness is blended by _GI_DirectionalAngleScale: 0 = full angle modulation, 1 = no angle modulation (flat intensity).
Emissive Lights (No Shadows)
Regular lights cast shadows. But what about glowing crystals, lava pools, or spell projectiles? These are emissive lights: they illuminate their surroundings but don't cast shadows.
We handle these separately:
```hlsl
struct VM_EmissiveLightData {
    float2 position;
    float4 color;
    float radius;
    float _padding; // 16 byte alignment
};
```

Emissives are uploaded into their own buffer and processed in the same pass as point lights. They add to direct lighting (ground) and wall face textures, but skip DDA shadows entirely. In my implementation, emissives bypass the maxLights cap.
Because emission is extracted from the direct texture right after this pass, emissive lights automatically seed indirect lighting too. Emissive color is pre-multiplied by intensity on the CPU, so the shader only needs position, radius, and color.
Shadow map encoding
The shadow texture uses this encoding:
- R channel: Point light visibility (max over point lights; 0 = none reach, 1 = at least one reaches)
- G channel: Directional light visibility (0 = blocked, 1 = lit)
- Combined: max(R, G) gives the quick "directly lit" hint
In the point-light pass we only write R (and set G to 0). The directional pass writes only G. We keep them separate because the sun and local lights behave differently, but the combined visibility is useful for diffusion heuristics. In HeatTransfer, we use max(R, G) to decide how aggressively to fill in shadows.
4. Indirect Light: Thermodynamics of Pixels
This is the part that produces the GI look: indirect bounce via diffusion.
For that, we’ll treat light as if it was heat.
Imagine a lit floor tile as a hot plate. Darkness is cold air. We run a simulation where heat (light energy) naturally flows from hot areas to cold areas, provided there’s a conductive medium (air, not walls) connecting them. Run this simulation enough times, and light “spreads” into shadowed corners just like real indirect illumination.

Step 1: Extract Emission
Before we can diffuse anything, we need to seed the heat buffer with “new energy.” We take the direct light texture and extract a percentage of it:
```hlsl
float4 directLight = SAMPLE_TEXTURE2D(_MainTex, sampler_MainTex, uv);

// Non-linear extraction to prevent white hotspots
float maxChannel = max(directLight.r, max(directLight.g, directLight.b));
float curve = 1.0 - exp(-maxChannel * 2.0); // Exponential rolloff
float scaleFactor = _EmissionStrength * curve / (maxChannel + 0.001);

float4 emission = directLight * scaleFactor;

// Boost saturation to compensate for averaging
float luminance = dot(emission.rgb, float3(0.299, 0.587, 0.114));
emission.rgb = luminance + (emission.rgb - luminance) * 1.2;
```

We use a non-linear curve (1 - e^(-x)). Without it, bright light sources would create white hotspots that dominate the scene. The exponential rolloff compresses bright values while preserving color in mid-tones.
The saturation boost at the end counteracts the desaturation that happens when you average colors repeatedly. Without it, orange torchlight turns into muddy beige after a few diffusion iterations.
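To see both effects at once, here is a Python port of the extraction math (the emission_strength value is a placeholder, not my tuned setting):

```python
import math

def extract_emission(rgb, emission_strength=0.5):
    """Exponential rolloff (1 - e^-2x) plus a 1.2x saturation boost,
    mirroring the extraction shader above."""
    max_channel = max(rgb)
    curve = 1.0 - math.exp(-max_channel * 2.0)          # compresses bright values
    scale = emission_strength * curve / (max_channel + 0.001)
    emission = [c * scale for c in rgb]

    # Boost saturation to counter the desaturation of repeated averaging.
    lum = 0.299 * emission[0] + 0.587 * emission[1] + 0.114 * emission[2]
    return [lum + (c - lum) * 1.2 for c in emission]

bright = extract_emission([4.0, 4.0, 4.0])   # a hot white source
warm = extract_emission([0.2, 0.1, 0.05])    # dim torch-orange
```

A 4.0-bright source extracts to well under emission_strength x 4.0 (the rolloff tames the hotspot), while the warm input keeps its red > green > blue ordering after the boost. Note the boost can push very dark channels slightly negative, so clamping afterward is a reasonable addition.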
Extraction runs once per frame; the heat transfer loop then iterates on that seed (we don’t inject new emission every iteration).
Step 2: The Ping-Pong Diffusion
HeatTransfer.shader runs in a ping-pong loop: read from texture A, write to texture B, swap, repeat.
```hlsl
// Standard 8-neighbor offsets
static const float2 offsets[8] = {
    float2(-1, -1), float2(0, -1), float2(1, -1),
    float2(-1,  0),                float2(1,  0),
    float2(-1,  1), float2(0,  1), float2(1,  1)
};

// Diagonal neighbors get less weight (1 / sqrt(2))
static const float weights[8] = {
    0.7071, 1.0, 0.7071,
    1.0,         1.0,
    0.7071, 1.0, 0.7071
};
```
- The neighbor’s conductivity (is it air or wall?)
- The current pixel’s conductivity
- The distance weight
- The global diffusion rate
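Stripped of texture plumbing, that flow rule reduces to a few lines. This is a minimal CPU reference in Python (scalar heat, a fixed 0.5 blend for brevity; the real pass works on RGB in UV space with a shadow-dependent blend):

```python
# 8-neighbor offsets and 1/sqrt(2)-weighted diagonals, as in the shader.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
WEIGHTS = [0.7071, 1.0, 0.7071, 1.0, 1.0, 0.7071, 1.0, 0.7071]

def diffuse(heat, cond, rate=0.8):
    """One diffusion round over a small grid with a conduction mask."""
    h, w = len(heat), len(heat[0])
    out = [row[:] for row in heat]
    for y in range(h):
        for x in range(w):
            if cond[y][x] < 1e-4:
                continue  # walls neither give nor receive heat
            accum = wsum = 0.0
            for (dx, dy), wgt in zip(OFFSETS, WEIGHTS):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h):
                    continue
                # Flow requires both endpoints to be conductive.
                path = cond[ny][nx] * cond[y][x] * wgt * rate
                accum += heat[ny][nx] * path
                wsum += path
            if wsum > 1e-4:
                out[y][x] = 0.5 * heat[y][x] + 0.5 * (accum / wsum)
    return out

# Lit left column, wall column in the middle: heat never crosses the wall.
after = diffuse([[1.0, 0.0, 0.0]] * 3, [[1.0, 0.0, 1.0]] * 3)
```

After one round the right column is still 0 (the wall column zeroes every path into it), while the lit column keeps its energy.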
Diffusion update equation
Here’s the iteration skeleton for each heat spread round:
```hlsl
float4 centerHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, uv);
float centerCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, uv).r;
if (centerCond < 1e-4) return centerHeat;

float4 shadowInfo = SAMPLE_TEXTURE2D(_ShadowMap, sampler_ShadowMap, uv);
float directVisibility = max(shadowInfo.r, shadowInfo.g);
float inShadow = 1.0 - directVisibility;

CamBounds bounds = GetGlobalCameraViewData();
float2 worldToUV = float2(1.0 / bounds.size.x, 1.0 / bounds.size.y);
float2 sampleDistance = _DiffusionDistance * worldToUV;

float3 accum = 0.0;
float3 wsum = 0.0;

// Loop neighbors (tiered if adaptive). Offsets + distanceWeight are precomputed.
for (int i = 0; i < 8; i++)
{
    float2 nUV = uv + offsets[i] * sampleDistance;
    float4 nHeat = SAMPLE_TEXTURE2D(_ColorBuffer, sampler_ColorBuffer, nUV);
    float nCond = SAMPLE_TEXTURE2D(_ConductionMask, sampler_ConductionMask, nUV).r;

    float weight = nCond * centerCond * weights[i] * _DiffusionRate;

    // Stylized per-channel (see next section)
    float3 channelPresence = saturate(nHeat.rgb + 0.1);
    float3 channelWeights = weight * channelPresence;
    accum += nHeat.rgb * channelWeights;
    wsum += channelWeights;
}

accum /= max(wsum, 1e-4);

float blend = lerp(0.5, 0.8, inShadow) * _DiffusionRate;
float3 newHeat = lerp(centerHeat.rgb, accum, blend);

// Optional dark-area bias (keeps deep shadows from staying dead)
float centerLum = dot(centerHeat.rgb, float3(0.299, 0.587, 0.114));
float darkness = 1.0 - saturate(centerLum);
newHeat *= 1.0 + _RangeBoost * darkness * inShadow;

newHeat *= _IntensityMultiplier;

// Soft cap
float maxChannel = max(newHeat.r, max(newHeat.g, newHeat.b));
if (maxChannel > 2.0)
    newHeat *= 2.0 / maxChannel;

return float4(newHeat, centerHeat.a);
```

Per-Channel Diffusion (Stylized)
Here’s a subtle but important detail. Early versions of the system diffused all channels together. Red, green, and blue moved as one. That’s the physically-based approach, and it already mixes colors correctly. But it tends to wash out saturation in pixel art.
What I ended up using is a stylized per-channel diffusion: each channel “pushes through” independently and is normalized separately. This keeps saturated colors punchier and makes color bleeding read better at low resolution. It is not physically based, it’s an artistic control knob.
```hlsl
// Each channel can spread independently
float3 channelPresence = saturate(neighborHeat.rgb + 0.1);
float3 channelWeights = pathConductivity * channelPresence;

// Accumulate each channel independently
accumulatedHeat.r += neighborHeat.r * channelWeights.r;
accumulatedHeat.g += neighborHeat.g * channelWeights.g;
accumulatedHeat.b += neighborHeat.b * channelWeights.b;

totalWeightPerChannel += channelWeights;

// Later: normalize each channel independently
if (totalWeightPerChannel.r > 0.001)
    accumulatedHeat.r /= totalWeightPerChannel.r;
```

The channelPresence bias (+ 0.1) ensures that even dark areas can receive color. Otherwise, black pixels would never pick up any light.
If you want the physically-based version, keep the weights scalar and accumulate RGB as a vector:
```hlsl
float3 accum = 0.0;
float wsum = 0.0;
accum += neighborHeat.rgb * pathConductivity;
wsum += pathConductivity;
float3 newHeat = (wsum > 1e-4) ? (accum / wsum) : centerHeat.rgb;
```

Solving the Resolution Problem: Hierarchical Sampling
At high GI resolutions (e.g., giResolution = 16), a single world-space hop spans many GI texels (skipping over them), while texel-scale hops alone would shrink the world-space reach. The result is either blocky diffusion or very slow spread.
But if we increased the sampling distance in GI texels, we got “grid artifacts” (black dots) where sparse taps miss nearby lit texels, leaving isolated unlit holes.
To prevent this, we use Adaptive Hierarchical Sampling. The shader detects when the GI-texel distance is large and switches to a three-tier approach:
```hlsl
float samplingPixelDistance = _DiffusionDistance * pixelsPerWorldUnit;
bool useAdaptiveSampling = samplingPixelDistance > 2.5;

if (useAdaptiveSampling)
{
    // Tier 1: Near-field (30%) - local smoothness
    for (int i = 0; i < 8; i++) {
        float2 nearSampleUV = viewportUV + offsets[i] * texelSize * 1.5;
        // ... accumulate with 30% weight
    }

    // Tier 2: Mid-field (30%) - bridge the gap
    // sampleDistance is the UV offset corresponding to _DiffusionDistance in
    // world units (i.e., a world-space hop expressed in GI texture UVs).
    for (int j = 0; j < 8; j++) {
        float2 midSampleUV = viewportUV + offsets[j] * sampleDistance * 0.5;
        // ... accumulate with 30% weight
    }

    // Tier 3: Far-field (40%) - long-range transport
    for (int k = 0; k < 8; k++) {
        float2 farSampleUV = viewportUV + offsets[k] * sampleDistance;
        // ... accumulate with 40% weight
    }
}
```

- Near-Field (30%): Samples at 1.5 GI texels, fills immediate gaps
- Mid-Field (30%): Samples at 50% world distance, bridges near and far
- Far-Field (40%): Samples at full world distance, long-range light transport
This allows light to travel across larger distances without needing hundreds of iterations (far-field) while maintaining smooth gradients (near-field).
The 30 / 30 / 40 split wasn’t analytically derived. It came from iterative tuning with two constraints:
- Near-field samples must be strong enough to kill holes
- Far-field samples must dominate transport distance (that’s the point of hierarchical sampling)
The mid-field tier exists purely to smooth the transition between those two regimes. Its weight is the least sensitive; it mainly prevents visible “bands” where near and far contributions meet.
One caveat: the conduction check is endpoint-only (center + neighbor). With far-tier samples, that can jump across thin walls if both endpoints are air. If that shows up, the fix is straightforward: only allow far-tier samples when the straight segment is unobstructed. A cheap version is to take 3-6 steps along the segment and multiply conductivities; if any step hits a wall, zero out that far sample.
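A Python sketch of that segment check (cond_at is a hypothetical sampler returning 0 inside walls and 1 in air; the names are illustrative):

```python
def segment_conductivity(cond_at, p0, p1, steps=5):
    """March a handful of interior points along the segment and multiply
    conductivities; any wall along the way zeroes the far-field tap."""
    total = 1.0
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        x = p0[0] + (p1[0] - p0[0]) * t
        y = p0[1] + (p1[1] - p0[1]) * t
        total *= cond_at(x, y)
        if total == 0.0:
            return 0.0  # early out: this far sample is blocked
    return total

# One-texel-wide wall centered at x = 5 blocks a hop from x = 2 to x = 8:
wall = lambda x, y: 0.0 if abs(x - 5.0) < 0.5 else 1.0
print(segment_conductivity(wall, (2.0, 0.0), (8.0, 0.0)))  # 0.0
print(segment_conductivity(wall, (2.0, 0.0), (4.0, 0.0)))  # 1.0
```

With too few steps a thin or diagonal wall can still slip between samples, so tune the step count against your minimum wall thickness and maximum hop length.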
Conduction in Action
Crucially, every diffusion sample multiplies by the Conduction Mask:
```hlsl
float neighborConductivity = SAMPLE_TEXTURE2D(_ConductionMask, ...).r;
float pathConductivity = neighborConductivity * centerConductivity * weight;
```

If either the source or destination is a wall (conductivity 0), no heat flows. For near-field taps this is enough to respect walls and doorways within the simulated window. For far-tier taps, either accept some leakage as a tradeoff or add a cheap segment check (described above) to prevent hopping across thin walls.
The Blur Pass (Optional)
Depending on your settings, you might see subtle grid patterns, especially at lower resolutions. An optional two-pass Gaussian blur can smooth these artifacts:
```hlsl
// Edge-aware: only blur between similar surfaces
bool centerIsObstacle = centerConductivity < 0.1;
bool neighborIsObstacle = neighborConductivity < 0.1;

if (centerIsObstacle == neighborIsObstacle)
{
    // Same surface type - include in blur
    color += sampleColor * weights[i];
    totalWeight += weights[i];
}
```

The blur is conductivity-aware: it won't smear light across wall boundaries. We also apply brightness preservation to prevent the blur from darkening the image.
WallIndirect pass: projecting bounce onto walls
Indirect light only exists in conductive cells. Wall cells are black in the indirect buffer because the conduction mask blocks diffusion there. That is correct for ground, but it means wall faces would read zero if we sampled _IndirectLightColor directly.
The WallIndirect pass is a tiny projection pass that turns the indirect buffer into four wall-face textures, mirroring how direct lighting already works. It runs after heat transfer (and after blur if enabled), reads the final indirect texture, and for each wall texel copies the indirect value from the adjacent air cell into the matching face output.
Implementation details:
- Fullscreen pass with 4 MRT outputs: _WallIndirect_PosX/_NegX/_PosZ/_NegZ.
- Uses the wall buffer to detect wall cells and exposed faces (stable, not view-dependent).
- Samples at GI texel centers to avoid jitter.
```hlsl
float2 gridPos = uvToGrid(input.uv);
float giRes = max(_GI_Resolution_Lib, 1.0);
float2 gridPosCell = (floor(gridPos * giRes) + 0.5) / giRes;

int2 cell = int2(floor(gridPos));
if (!IsObstacle(cell.x, cell.y))
    return output; // Not a wall

float4 faceMask = GetWallFaceMask(cell);

float3 posX = faceMask.x > 0.0 ? SampleIndirectAt(gridPosCell + float2( 1, 0)) : 0;
float3 negX = faceMask.y > 0.0 ? SampleIndirectAt(gridPosCell + float2(-1, 0)) : 0;
float3 posZ = faceMask.z > 0.0 ? SampleIndirectAt(gridPosCell + float2(0,  1)) : 0;
float3 negZ = faceMask.w > 0.0 ? SampleIndirectAt(gridPosCell + float2(0, -1)) : 0;

output.wallFace0 = float4(posX, 1.0);
output.wallFace1 = float4(negX, 1.0);
output.wallFace2 = float4(posZ, 1.0);
output.wallFace3 = float4(negZ, 1.0);
```

SampleIndirectAt maps world to UVs, clamps, and snaps to the texel center before sampling, so results are stable under phase-locked camera motion. The output textures are later read by SampleWallIndirect in GI_Lib.hlsl when a wall face is shaded. Because this is just a neighbor copy, it is cheap and it inherits whatever diffusion, blur, and shadow handling already exists in the indirect buffer.
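The texel-center snap is worth spelling out, since it is what keeps samples from shimmering. An illustrative Python port (function name is mine, not from the codebase):

```python
import math

def snap_uv_to_texel_center(u, v, tex_w, tex_h):
    """Quantize a UV to the center of its texel so that any position inside
    the same GI texel lands on the same sample point."""
    cu = (math.floor(u * tex_w) + 0.5) / tex_w
    cv = (math.floor(v * tex_h) + 0.5) / tex_h
    return (cu, cv)

# Two different UVs inside the same 8x8 texel snap to one stable point:
print(snap_uv_to_texel_center(0.501, 0.25, 8, 8))  # (0.5625, 0.3125)
print(snap_uv_to_texel_center(0.56, 0.27, 8, 8))   # (0.5625, 0.3125)
```

Combined with phase-locked camera movement, this guarantees the same world cell always reads the same texel, frame to frame.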
Combining Everything
Direct + indirect are merged in GI_Lib.hlsl at sample time, where we also apply AO, conductivity, and tone mapping:
Note: ground lighting is masked by conductivity (walls → 0). Wall faces bypass conduction and are combined per face. AO stays on the ground path; if you want AO on walls, inject it into the wall sampling path.
Tone mapping happens in sampling, and we support multiple modes:
- None: Linear output (for HDR displays or post-processing)
- Reinhard: Simple color / (1 + color)
- Reinhard with White Point: Prevents highlights from clamping to white
- ACES: Film-like response curve
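For reference, here are the first two modes in Python. The white-point variant shown is the common extended-Reinhard formula; my shader's exact curve may differ slightly:

```python
def reinhard(c):
    """Simple Reinhard: asymptotically approaches 1.0, never reaches it."""
    return c / (1.0 + c)

def reinhard_white(c, white_point=2.0):
    """Extended Reinhard: inputs at white_point map exactly to 1.0,
    so controlled highlights can still hit full white."""
    return c * (1.0 + c / (white_point * white_point)) / (1.0 + c)

print(reinhard(1.0))        # 0.5
print(reinhard_white(2.0))  # 1.0
```

The practical difference: with plain Reinhard a value of 2.0 only reaches ~0.67, which is why bright highlights look dull; the white-point version lets you pick where "full white" lands.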
The bounceIntensity parameter on directional lights lets you control how bright shadows appear when the sun is out. A low value means harsh shadows; a high value fills them with more bounce light.
Sampling in the Lit shader
Once the GI textures exist, the Lit material has to decide which texture to sample and where. We treat surfaces as one of three types:
- Ground: horizontal surfaces that are not part of a wall cell.
- Wall faces: vertical surfaces of wall cells (+X, -X, +Z, -Z faces).
- Wall tops: the horizontal top surface of a wall cell.
All of this routing lives in GI_Lib.hlsl and is exposed to Shader Graph through subgraphs that sample the global textures. The core decision tree looks like this:
```hlsl
float3 CalculateAverageLightBrightness(float3 worldPos, float3 worldNormal)
{
    if (ShouldUseWallLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallDirect(worldPos, worldNormal)
                        + SampleWallIndirect(worldPos, worldNormal);
        return ApplyToneMap(combined);
    }

    if (ShouldUseWallTopLighting(worldPos, worldNormal))
    {
        float3 combined = SampleWallTopDirect(worldPos)
                        + SampleWallTopIndirect(worldPos);
        return ApplyToneMap(combined);
    }

    float3 direct = SampleGITextureAtWorldPos(_DirectLightColor, sampler_DirectLightColor, worldPos, worldNormal);
    float3 indirect = SampleGITextureAtWorldPos(_IndirectLightColor, sampler_IndirectLightColor, worldPos, worldNormal);
    indirect = ApplyDirectionalBounce(indirect, worldPos.xz);

    float3 combined = (direct + indirect);
    combined *= SampleGIAmbientOcclusion(worldPos.xz);
    combined *= SampleConduction(worldPos.xz);
    return ApplyToneMap(combined);
}
```

Ground sampling
If the surface is not a wall and not a wall top, we sample _DirectLightColor and _IndirectLightColor separately and combine them (with AO + conduction + tone map) in CalculateAverageLightBrightness. The low-level helper still just maps world XZ → UV and samples a GI texture, so ground lighting stays smooth and continuous.
```hlsl
float3 SampleGITextureAtWorldPos(Texture2D giTexture, SamplerState giSampler, float3 worldPos, float3 worldNormal)
{
    // Walls get routed elsewhere; ground samples directly.
    float2 uv = gridToViewportUV(worldPos.xz);
    uv = snapUVToTexelCenter(uv);

    return SAMPLE_TEXTURE2D(giTexture, giSampler, uv).xyz;
}
```

Wall faces (vertical surfaces)
ShouldUseWallLighting checks two things:
- The normal has a strong XZ component (so it’s a vertical surface).
- The sample position corresponds to a wall cell (via the conduction mask).
```hlsl
bool ShouldUseWallLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Treat surfaces with significant XZ normal component as walls (even if slightly tilted).
    float2 nXZ = worldNormal.xz;
    if (length(nXZ) < 0.5)
        return false;

    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);

    // IsWallAtWorld samples the conduction mask at the corresponding UV and thresholds it.
    return IsWallAtWorld(samplePos) || IsWallAtWorld(worldPos.xz);
}
```

When it's a wall, we sample from four wall face textures (_WallDirect_PosX/_NegX/_PosZ/_NegZ and _WallIndirect_*). Because the world is voxelized and walls are axis-aligned, I pick a dominant face instead of blending across edges. We still offset the sample by half a GI texel along the normal so we land in the adjacent conductive texel instead of the wall boundary.
```hlsl
int GetDominantWallFace(float3 worldNormal)
{
    float2 n = worldNormal.xz;
    float2 absN = abs(n);

    if (absN.x >= absN.y)
        return n.x >= 0 ? 0 : 1; // +X / -X
    return n.y >= 0 ? 2 : 3;     // +Z / -Z
}
```

```hlsl
float3 SampleWallDirect(float3 worldPos, float3 worldNormal)
{
    float2 samplePos = GetWallSamplePosition(worldPos, worldNormal);
    float2 uv = gridToViewportUV(samplePos);
    uv = snapUVToTexelCenter(uv);

    int face = GetDominantWallFace(worldNormal);
    if (face == 0) return SAMPLE_TEXTURE2D(_WallDirect_PosX, sampler_WallDirect_PosX, uv).rgb;
    if (face == 1) return SAMPLE_TEXTURE2D(_WallDirect_NegX, sampler_WallDirect_NegX, uv).rgb;
    if (face == 2) return SAMPLE_TEXTURE2D(_WallDirect_PosZ, sampler_WallDirect_PosZ, uv).rgb;
    return SAMPLE_TEXTURE2D(_WallDirect_NegZ, sampler_WallDirect_NegZ, uv).rgb;
}
```

Wall tops (horizontal surfaces on a wall cell)
ShouldUseWallTopLighting checks for a mostly-upward normal and verifies the cell beneath is a wall. For tops, I keep it intentionally simple:
- Direct = the URP main directional light color, scaled by sun angle (blendable via _GI_DirectionalAngleScale).
- Indirect = a 3x3 average of nearby non-wall GI samples (so point lights still influence tops, but shadows don't cut into them).
This look for wall tops might not fit your game and is very much an artistic choice. Modify as needed.
```hlsl
bool ShouldUseWallTopLighting(float3 worldPos, float3 worldNormal)
{
    if (_WallLightingEnabled < 0.5)
        return false;

    // Top faces should be mostly upward with minimal XZ component.
    float2 nXZ = worldNormal.xz;
    if (worldNormal.y < 0.5 || length(nXZ) > 0.35)
        return false;

    float2 cellCenter = floor(worldPos.xz) + 0.5;
    return IsWallAtWorld(cellCenter);
}
```

```hlsl
float3 SampleWallTopDirect(float3 worldPos)
{
    // _GI_MainDirectionalColor and _GI_MainDirectionalDir are set based on
    // URP's lightData.mainLightIndex.
    float3 lightDir = normalize(_GI_MainDirectionalDir.xyz);
    float angle = saturate(-lightDir.y * 2.0);
    angle = lerp(angle, 1.0, _GI_DirectionalAngleScale);
    return _GI_MainDirectionalColor.rgb * angle;
}

float3 SampleWallTopIndirect(float3 worldPos)
{
    return SampleGroundAverage(_IndirectLightColor, sampler_IndirectLightColor, worldPos.xz);
}
```

Direct and indirect sampling mirror the same routing (SampleDirectLightColor / SampleIndirectLightColor), just against _DirectLightColor / _IndirectLightColor and the wall-specific textures.
Optimizations
These are the major optimizations that can be done to this system, but were out of scope for the article:
- Rendering the heat spread pass at half or quarter resolution then upscaling can speed up the pass massively, with slight visual differences. Blur can alleviate any patterns that emerge.
- Compute version of Heat Spread. In my implementation this resulted in sizeable performance gains. The logic is the same as the fragment shader.
- Caching: in my implementation the whole pipeline is skipped unless a cache key changes: configHash + visibleLightHash + wallBufferVersion + emissiveVersion + shadowProxyVersion. This is coarse but effective with phase-locked camera movement. If you want finer granularity, split the invalidation by pass (e.g., skip conduction/AO when only lights change).
- Wall Face Culling for Fixed Cameras: if your game uses a fixed camera angle (e.g., isometric), the player may never see certain wall faces (for example, the -X and +Z faces might always be occluded). You can hard-code the shader to skip calculating these directions entirely.
- Directional lights pass: I’ll admit, this was an afterthought. A DDA-style “ray per pixel” directional pass is very expensive (as the benchmarks show) and a conventional orthographic shadow map (depth render + sample) would likely be much faster while looking visually similar. If I revisit this, I’ll update the article or publish a Part 2.
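The caching idea from the list above fits in a few lines. A Python sketch (the actual implementation is C# in the render feature; the hashed field names mirror the cache-key list, and the hashing scheme here is illustrative):

```python
def cache_key(config_hash, visible_light_hash, wall_version,
              emissive_version, proxy_version):
    """Combine everything that can change the GI result into one key."""
    return hash((config_hash, visible_light_hash, wall_version,
                 emissive_version, proxy_version))

class GIPipeline:
    def __init__(self):
        self._last_key = None
        self.renders = 0

    def render(self, key):
        if key == self._last_key:
            return  # nothing changed: reuse last frame's GI textures
        self._last_key = key
        self.renders += 1  # stand-in for running the full pass chain

p = GIPipeline()
k = cache_key(1, 42, 7, 0, 3)
p.render(k)
p.render(k)                          # skipped: identical key
p.render(cache_key(1, 43, 7, 0, 3))  # a light changed -> re-render
print(p.renders)  # 2
```

The coarseness is the point: with a static scene and phase-locked camera motion, entire frames skip the GI pipeline outright.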
Performance data
These measurements are from a standalone release build, captured via NVIDIA Nsight Perf SDK (NVPerf) using the gpu__time_duration.sum hardware counter. Internal game resolution was 480x270 (30x16.875 tiles) on a 3060 Mobile GPU with Linux + Vulkan.
Methodology: All frame captures were done in the same process (per GPU) using console commands to change GI options. They were all taken in the same world position, with walls and shadow proxies (38 of them) present in the scene. All lights were visible and frame caching was disabled. No wall faces were culled. Lights had a radius of 10 and were spread out.
A compute version of heat spread was used.
A camera padding of 1.4 was used, along with 20 heat spread rounds for all captures. All other settings were the same for all captures.
Full, Half and Quarter refer to the resolution at which Heat Spread was performed, then upsampled.
GI Res 16: Full = 768x448, Half = 384x224, Quarter = 192x112
GI Res 8: Full = 384x224, Half = 192x112, Quarter = 96x56

Small passes (omitted from the tables): ConductionMask, AO, ExtractEmission, WallIndirect. In these runs they sum to ~0.04 to 0.11 ms on the 3060 and ~0.08 to 0.66 ms on Intel. Totals below include them.
Point lights are shadow-casting point lights. Emissive and directional counts are listed separately.
giResolution = 16 (RTX 3060 Mobile)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| full | 1 | 0 | 0 | 0.09 | 0.06 | 1.14 | 0.08 | 1.46 |
| full | 1 | 0 | 1 | 0.10 | 1.97 | 1.29 | 0.08 | 3.54 |
| full | 1 | 256 | 0 | 0.45 | 0.06 | 1.31 | 0.09 | 2.01 |
| full | 32 | 256 | 0 | 1.74 | 0.06 | 1.18 | 0.08 | 3.16 |
| full | 64 | 256 | 0 | 3.09 | 0.05 | 1.21 | 0.08 | 4.53 |
| half | 64 | 256 | 0 | 3.03 | 0.05 | 0.44 | 0.05 | 3.68 |
| quarter | 64 | 256 | 0 | 3.06 | 0.05 | 0.24 | 0.05 | 3.51 |
giResolution = 8 (RTX 3060 Mobile)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| full | 64 | 256 | 0 | 1.01 | 0.02 | 0.43 | 0.03 | 1.53 |
| half | 64 | 256 | 0 | 0.98 | 0.02 | 0.23 | 0.02 | 1.30 |
| quarter | 64 | 256 | 0 | 0.98 | 0.02 | 0.17 | 0.03 | 1.24 |
* DDA_Directional shows the directional-light pass. With 0 directional lights this is mostly overhead.
Integrated GPU (Intel UHD Graphics (TGL GT1))
I also ran the game on my integrated GPU (Intel UHD Graphics (TGL GT1), i7-11800H CPU). It struggles with giResolution = 16 even at quarter heat res, but at giResolution = 8 we get decent frame times with low light counts.
giResolution = 16 (Quarter)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| quarter | 1 | 64 | 0 | 3.51 | 0.34 | 2.48 | 0.37 | 7.37 |
| quarter | 32 | 64 | 0 | 35.12 | 0.35 | 4.90 | 0.31 | 41.34 |
giResolution = 8 (Quarter)
| Heat Spread Res | Point | Emissive | Directional | DDA | DDA_Directional* | HeatSpread | Blur | Total (ms) |
|---|---|---|---|---|---|---|---|---|
| quarter | 1 | 64 | 0 | 0.94 | 0.07 | 0.20 | 0.11 | 1.52 |
| quarter | 32 | 64 | 0 | 9.57 | 0.07 | 1.74 | 0.10 | 11.56 |
What we can learn from this
- DDA cost scales with light count and GI resolution. Emissive lights are significantly cheaper, but not free (giRes 16 full: 1 point / 0 emissive = 0.09 ms vs 1 point / 256 emissive = 0.45 ms).
- Heat Spread cost scales with texture size. Downsampling gives a big win: 1.21 -> 0.44 -> 0.24 ms at giRes 16 (64 point / 256 emissive), with only minor visual differences when paired with blur.
- The directional-light pass is expensive here: a single directional light adds ~1.9 ms of DDA time at giRes 16 full.
- On Intel, light count becomes the bottleneck fast. giRes 8 quarter is fine at low counts, but giRes 16 quarter collapses with 32 point lights. This performance profile hints at bandwidth limitations, but I didn't dive into the counters to confirm. Packing MRTs and re-using textures more aggressively may improve it.
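To make the "DDA scales with light count" point concrete, here's a small back-of-the-envelope sketch (plain Python, nothing Unity-specific) that derives marginal per-light costs from the giRes 16 full-resolution rows above. The numbers are copied straight from the tables; the per-light figures it prints are estimates, not measurements.

```python
# DDA timings (ms) from the giResolution = 16 full-res rows, 256 emissive lights,
# keyed by shadow-casting point-light count. Copied from the table above.
dda_ms = {1: 0.45, 32: 1.74, 64: 3.09}

# Marginal cost per shadow-casting point light: slope between the extreme rows.
per_point = (dda_ms[64] - dda_ms[1]) / (64 - 1)

# Emissive lights: 1 point / 0 emissive measured 0.09 ms,
# 1 point / 256 emissive measured 0.45 ms.
per_emissive = (0.45 - 0.09) / 256

print(f"~{per_point * 1000:.0f} us per point light")
print(f"~{per_emissive * 1000:.1f} us per emissive light")
print(f"point/emissive cost ratio: ~{per_point / per_emissive:.0f}x")
```

The mid-row check agrees: (1.74 - 0.45) / 31 and (3.09 - 1.74) / 32 both land around 0.042 ms, which is why the scaling reads as linear in light count, with emissive lights roughly 30x cheaper per light than shadow-casting ones.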
Closing thoughts
Phew, that was a long one! This was a very challenging topic to tackle with limited resources available, so I decided to pack as much information here as possible. Allow me to break the "engineer" persona for a moment: this was frustrating and painful at times, and I became a little obsessed with it, but it was also extremely fun and taught me a lot. I truly hope this helps someone keep their hair attached to their head. If you do something cool with this, let me know! I'd love to hear about it!
There might be techniques I've missed, and there are surely things I can improve, but I'm happy with where I landed with this system. Feedback and suggestions are always appreciated, and questions are welcome. You can reach me on Twitter @gincodes or through e-mail at [email protected].
If this write-up made you think “we should hire this person”, I’m currently looking to switch from a 10-year detour in back-end engineering into full-time game dev. Reach me at: [email protected]