Metal GPU Programming 02 - Shading a Triangle




Github Repository


Welcome back to an exploration of Metal GPU programming for iOS and macOS. Having built OS-level UI in Part 1, we now delve deeper into the Metal Shading Language. In this entry, we'll rasterize a triangle using Metal C++14 shaders. Our starting shaders aren't too complex, so let's dive right into the code.

Triangle Shader


Here's the shader code to render a triangle on the screen without requiring any input from the host (CPU). That is, any input other than a call to drawPrimitives. The following shader code is included in the C++ source and assigned to g_shaderCode:

#include <metal_stdlib>

using namespace metal;

struct VertexOutput {
  float4 position [[position]];
  float4 color;
};

vertex VertexOutput render_vertex(uint vid [[vertex_id]]) {
  VertexOutput vertexOut;
  // Clockwise winding order
  if (vid == 0) {
    // Middle top of screen.
    vertexOut.position = float4(0.0, 1.0, 0.0, 1.0);
    vertexOut.color = float4(1.0, 0.3, 0.3, 1.0);
  } else if (vid == 1) {
    // Bottom right
    vertexOut.position = float4(1.0, -1.0, 0.0, 1.0);
    vertexOut.color = float4(0.3, 1.0, 0.3, 1.0);
  } else if (vid == 2) {
    // Bottom left
    vertexOut.position = float4(-1.0, -1.0, 0.0, 1.0);
    vertexOut.color = float4(0.3, 0.3, 1.0, 1.0);
  }
  return vertexOut;
}

fragment float4 render_fragment(VertexOutput vertexIn [[stage_in]]) {
  return vertexIn.color;
}

This snippet is handy: all that is required from the host is a single drawPrimitives call and a render target. There is no need to upload vertex buffers or index buffers to get a triangle on the screen. We pull this off by using the vertex_id attribute (Section 4.3.4 - Attributes for built-in Variables, in the Metal Shading Language Specification).

Starting from the top, we include the Metal standard library metal_stdlib and import the metal namespace. Both are typical in Metal shaders. Next we specify the VertexOutput structure, which defines the output from the vertex shader. This structure binds together the vertex shader (render_vertex) and fragment shader (render_fragment). Through VertexOutput, the vertex shader transmits the position attribute to the fixed-function rasterizer. The rasterizer uses the position data to work out which fragments the triangle covers, then interpolates color across those fragments before passing the result to the fragment shader.

In order to generate geometry without requiring MTLBuffers from the host, we position vertices in the vertex shader by branching on the input vid parameter. vid is declared with the vertex_id attribute, meaning that the index of the vertex being processed will be passed in vid. We use these indices to programmatically determine the position of each vertex: vid = 0 targets middle top, vid = 1 bottom right, and vid = 2 bottom left. We set VertexOutput's position member and apply red, green, blue clockwise over the triangle.
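The same idea can be sketched on the CPU as a lookup keyed by vertex index. The snippet below is a hypothetical C++ mirror of render_vertex, written as a table instead of an if/else chain. Float4 is a stand-in for MSL's float4 and renderVertexOnCpu is a name made up for this example. One deliberate difference, and an assumption of this sketch: indices above 2 wrap around with a modulo so the lookup always stays in range, whereas the shader above leaves its output unset for them.

```cpp
#include <cstdint>

// Hypothetical stand-in for MSL's float4; not a Metal type.
struct Float4 { float x, y, z, w; };

// Mirrors the shader's VertexOutput, minus the [[position]] attribute.
struct VertexOutput {
  Float4 position;
  Float4 color;
};

// Table-driven equivalent of the if/else chain in render_vertex.
// Assumption: out-of-range IDs wrap around instead of being left unset.
VertexOutput renderVertexOnCpu(std::uint32_t vid) {
  static const VertexOutput kVertices[3] = {
      {{0.0f, 1.0f, 0.0f, 1.0f}, {1.0f, 0.3f, 0.3f, 1.0f}},    // middle top
      {{1.0f, -1.0f, 0.0f, 1.0f}, {0.3f, 1.0f, 0.3f, 1.0f}},   // bottom right
      {{-1.0f, -1.0f, 0.0f, 1.0f}, {0.3f, 0.3f, 1.0f, 1.0f}},  // bottom left
  };
  return kVertices[vid % 3];
}
```

Calling renderVertexOnCpu with 0, 1 and 2 yields the same three vertices that the GPU produces when three vertices are drawn starting at index 0.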

Keep in mind, vertex_id is useful only for simple geometry. In later tutorials we'll use MTLBuffers as Vertex and Index buffers to transmit geometry.

For every relevant fragment (a pixel in many cases), the fragment shader is run with the interpolated output of the vertex shader as its input, and it produces the final value for that fragment. Per-fragment lighting calculations and other effects are generally applied to interpolated vertex data at this stage. Fragment processing is usually the most expensive stage in the pipeline and often the most versatile. For example, sites such as Shadertoy only need to execute the fragment stage.

In our case, the fragment shader (render_fragment) is simple. The shader's first parameter is declared with the stage_in attribute. This attribute indicates that the parameter carries the vertex shader's output, interpolated per fragment by the rasterizer. The fragment shader takes this parameter and returns its interpolated color as the result of the shader. This return value is written into the first colorAttachment specified in MTLRenderPassDescriptor.

Compiling Shaders and Initialization

Now let's compile our shader code. We do so by modifying the renderInit function from our previous tutorial. Let's load our shader source into an NSString and create a MTLLibrary:

NSString* source = [[NSString alloc] initWithUTF8String:g_shaderCode];
MTLCompileOptions* compileOpts = [[MTLCompileOptions alloc] init];
compileOpts.languageVersion = MTLLanguageVersion2_0;

NSError* err = nil;
id<MTLLibrary> library =
    [g_mtlDevice newLibraryWithSource:source
                              options:compileOpts
                                error:&err];

On the first line, we create an NSString from our shader source. On the next two lines, we initialize MTLCompileOptions and set our desired shader version (we can also set preprocessor macros and toggle fast math through the compile options). The last thing to do is construct a new MTLLibrary.

Before we forget, let's clean up since we don't need the source or compile options anymore:

[compileOpts release];
[source release];

Next in renderInit, we construct the rasterization pipeline state. The pipeline state, represented by MTLRenderPipelineState, contains the fully compiled state. This state is passed to the MTLRenderCommandEncoder created in our rendering function. MTLRenderPipelineDescriptor completely defines MTLRenderPipelineState:

MTLRenderPipelineDescriptor* pipelineDescriptor =
    [MTLRenderPipelineDescriptor new];
pipelineDescriptor.vertexFunction =
    [library newFunctionWithName:@"render_vertex"];
pipelineDescriptor.fragmentFunction =
    [library newFunctionWithName:@"render_fragment"];

[library release];

pipelineDescriptor.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
pipelineDescriptor.depthAttachmentPixelFormat = MTLPixelFormatInvalid;

NSError* error = nil;
g_mtlPipelineState =
    [g_mtlDevice newRenderPipelineStateWithDescriptor:pipelineDescriptor
                                                error:&error];
if (!g_mtlPipelineState) {
  NSLog(@"Failed to create render pipeline state: %@", error);
}

After creating the MTLRenderPipelineDescriptor object, we set its vertexFunction and fragmentFunction properties. To bind functions in the shader to the descriptor, we call newFunctionWithName with the same names used in our shader code above: "render_vertex" for the vertex shader and "render_fragment" for the fragment shader. After creating our shader functions, we release our shader library as it is no longer needed.

Afterward, it's a matter of setting the expected pixel format for the pipeline descriptor and creating MTLRenderPipelineState from the descriptor. With that finished, our pipeline is set up and we are ready to render our triangle.


The changes to renderInit are pretty small. Likewise, the changes to our render function are minimal. We keep the MTLRenderPassDescriptor from the last tutorial and expand only on MTLRenderCommandEncoder. In the previous tutorial we created a MTLRenderCommandEncoder, then immediately ended encoding. Here, we must set appropriate state and call drawPrimitives while the encoder is active. This call will render our triangle in screen space by passing 3 vertices to our shader. Don't sweat the term screen space; we'll dedicate an upcoming tutorial to coordinate systems.

Here are the relevant changes to the doRender function from the last tutorial:

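void doRender() {
  // ... passDescriptor (MTLRenderPassDescriptor) setup from the previous tutorial ...

  id<MTLCommandBuffer> commandBuffer = [g_mtlCommandQueue commandBuffer];

  id<MTLRenderCommandEncoder> commandEncoder =
      [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];

  [commandEncoder setFrontFacingWinding:MTLWindingClockwise];
  [commandEncoder setCullMode:MTLCullModeNone];
  [commandEncoder setRenderPipelineState:g_mtlPipelineState];

  // Three vertices starting at index 0, so vid takes the values 0, 1 and 2.
  [commandEncoder drawPrimitives:MTLPrimitiveTypeTriangle
                     vertexStart:0
                     vertexCount:3];

  [commandEncoder endEncoding];

  // ... commit the command buffer, as in the previous tutorial ...
}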

As in tutorial 1, we create a command buffer and start a render command encoder. Afterwards, we set a couple of rasterization properties.

setFrontFacingWinding tells the rasterizer which vertex winding order marks a triangle as front-facing. This informs the next property, setCullMode, which determines visibility based on triangle winding order. Since we are winding the vertices clockwise in the vertex shader, we choose clockwise.
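One common way to check a triangle's winding is the sign of its signed area. The sketch below is a CPU-side illustration, not Metal API: Point2, Winding and windingOf are hypothetical names. It assumes the same orientation as the positions in our shader, with +x pointing right and +y pointing up on screen.

```cpp
// Hypothetical CPU-side helpers for illustration; not part of the Metal API.
struct Point2 { float x, y; };

enum class Winding { Clockwise, CounterClockwise, Degenerate };

// Classifies the winding of triangle (a, b, c) as seen with +y pointing up.
Winding windingOf(Point2 a, Point2 b, Point2 c) {
  // Twice the signed area; negative means clockwise when +y is up.
  const float twiceArea =
      (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
  if (twiceArea < 0.0f) return Winding::Clockwise;
  if (twiceArea > 0.0f) return Winding::CounterClockwise;
  return Winding::Degenerate;  // zero area: the points are collinear
}
```

For the three positions used in render_vertex, (0, 1), (1, -1) and (-1, -1), twice the signed area is -4, so the triangle is classified as clockwise, matching the comment in the shader.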

setCullMode performs back-face or front-face culling of geometry. In our case there is nothing to cull, so we disable it. A cull mode that disagrees with the winding order is a frequent cause of the infamous 'black screen' that leaves the developer asking "why isn't my geometry rendering?" To avoid that, we do away with culling until we have more complex geometry.
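How the two settings combine can be sketched as a small decision function. This is an illustrative model only: WindingOrder and CullMode are hypothetical enums that loosely mirror MTLWinding and MTLCullMode, and isCulled is a name invented for this example.

```cpp
// Hypothetical enums loosely mirroring MTLWinding and MTLCullMode.
enum class WindingOrder { Clockwise, CounterClockwise };
enum class CullMode { None, Front, Back };

// Returns true if a triangle with the given winding would be discarded.
bool isCulled(WindingOrder triangleWinding,
              WindingOrder frontFacingWinding,
              CullMode cullMode) {
  if (cullMode == CullMode::None) {
    return false;  // culling disabled: every triangle is kept
  }
  // A triangle is front-facing when its winding matches the configured one.
  const bool isFrontFacing = (triangleWinding == frontFacingWinding);
  return (cullMode == CullMode::Front) ? isFrontFacing : !isFrontFacing;
}
```

With culling disabled, as in our doRender, the triangle is always kept. If back-face culling were on and the front-facing winding were mistakenly set to counter-clockwise, our clockwise triangle would be discarded and nothing would appear on screen.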

setRenderPipelineState assigns the MTLRenderPipelineState we compiled in renderInit to this encoder. Most importantly, this sets the vertex and fragment shader that will be used to rasterize our triangle.

drawPrimitives encodes the draw call that kicks off triangle rasterization. Since all positioning is performed in the vertex shader, we simply send 3 vertices and nothing else. The vertex shader will properly position our triangle.

Finally we end encoding and commit the command buffer to the GPU.


Whew! Now that we've learned a basic shader, we're done with most of our Metal setup. We laid the groundwork for a simple rasterizer, and in upcoming tutorials we'll dive deeper into graphics programming topics such as 3D transforms, rendering complex scenes, and raytracing.

Metal API Usage

Below is a list of the Metal functions used in this tutorial organized by usage in a C++ function. The links in this section take you to my blog; the blog allows you to explore the Metal API using an interactive dependency tree.

Blog Entry

This article is exclusive to Steem during the monetization period. Afterwards, it's posted on my blog.

Questions, comments and feedback remain on Steem. The blog post will permanently link to this Steem post to drive interaction back to Steem.


Here's a video tutorial from Apple covering triangle rasterization. Apple's example uses a vertex buffer, whereas our example does away with that by using vertex IDs. We also compile our own shaders instead of relying on Xcode, which we don't use on macOS.

A couple good tidbits from the video:

  • Shaders are in 'fast mode' by default. This implies NaN handling is undefined and trigonometric functions have a limited range. Either set -fno-fast-math compiler option or use the metal::precise namespace. Covered at 49:00.
  • packed_float types: it turns out host alignment is the primary consideration, not GPU alignment. The GPU likely still expects alignment between array elements. Covered at 31:47.

Github Account

