The Rendering Pipeline - After Tessellation
Geometry Shader
Geometry Shader
- Unique abilities that are different from other stages
- Dynamically add or remove geometry to the pipeline.
- Provide geometry information to the vertex buffer through the stream output stage.
- Output a different type of primitive from the input primitive.
Input of Geometry Shader
- A completed primitive consisting of the vertices
- When the tessellation stages are disabled,
- Get vertices directly from the vertex shader.
- Primitive topology is anything that input assembler can designate.
- Receive information on primitive connectivity from the input assembler.
- When the tessellation stages are activated,
- Get vertices from the domain shader.
- Primitive topology is determined by the configuration of the tessellator stage.
- Primitives with adjacent information cannot be supported.
- System value semantic
- SV_PrimitiveID : a value that uniquely idnetifies an individual primitive.
- SV_GSInstanceID : used to perform different processing for each instance of primitive.
Status Description of Geometry Shader
- Geometry shader object
- Without stream output stage : ID3D11Device::CreateGeometryShader()
- With stream output stage : ID3D11Device::CreateGeometryShaderWithStreamOutput()
- Shader program : ID3D11DeviceContext::GSSetShader()
- Constant buffer : ID3D11DeviceContext::GSSetConstantBuffers()
- Shader resource view : ID3D11DeviceContext::GSSetShaderResources()
- Sampler status object : ID3D11DeviceContext::GSSetSamplers()
- Function attribute
- maxvertexcount
- The maximum number of vertices that can be emitted into the output stream in one execution of the geometry shader.
- Used to prevent too many vertices from being output stream due to logical errors.
- instance
- Activate an instanced mechanism of a geometry shader.
- A copy (instance) of the primitive is generated as many times as specified in the function attribute.
- After that, a geometry shader program is executed for each instance.
- The maximum number of instances : 32
- maxvertexcount
Process of Geometry Shader
- The method of outputting the processing result using stream objects
- Append() : a means of sending primitive data to the output stream.
- RestartStrip() : a means for creating a geometry that is not connected to each other.
- Multi-stream object
- Up to four stream output objects can be used simultaneously in the geometry shader program.
- Used by declaring multiple stream objects in a list of function parameters.
- Each output stream operates separately from each other.
- When using a multi-stream object, all streams must be point-list streams.
- The limit of the scalar amount that can be calculated from the execution of the geometry shader is 1024.
- Manipulation of the primitive
- Possible to reduce the amount of primitives to be processed by the rasterizer.
- Geometry shader can optionally discard unnecessary primitives.
- Shadow volume : the technique of creating a shadow using the outline of the object from a light source point of view.
- Point sprite : used to make the particles of the particle system appear more plausible.
- Instancing of the geometry
- Statically specify the number of instances through instance function attribute.
- Can amplify the number appearing in the final rendering without increasing the number of the primitive.
- Can render one geometry for multiple render target objects, effective when different transform matrices need to be applied to geometry for each render target object.
Output of Geometry Shader
- The vertices that must include SV_Position system value semantics, which must include the location of the culling space after projecting the vertex.
- SV_RenderTargetIndex : an unsigned integer that identifies a slice of a render target to be rendered in the current primitive.
- SV_ViewportArrayIndex : an unsigned integer that identifies a viewport to be rendered in the current primitive.
Stream Output
Stream Output
- Connect the geometry shader to the output buffer resources to which the vertex data of the geometry shader is to be recorded.
- Possible to use a total 4 output stream objects at the same time.
- The output of the stream output stage does not lead to input of any other stage of the pipeline.
Input of Stream Output
- Receive only the information delivered by the geometry shader stage through the output stream object declared in the geometry shader program.
- Different types of vertex structures can be sent for each stream.
- The limits on vertex data output as streams
- The scalar values for each vertex structure cannot exceed 128.
- The scalar values output in one shader program execution cannot exceed 1024.
Status Description of Stream Output
- The application program must link an appropriate number of buffer resources.
- Buffer resources to be connected must be generated by designating D3D11_BIND_STREAM_OUTPUT.
- The buffer resources connected to the stage are maintained in subsequent pipeline executions, so unnecessary buffers should be removed.
void ID3D11DeviceContext::SOSetTargets(
// The number of buffer to bind to the device
UINT NumBuffers,
// The array of output buffers to bind to the device
ID3D11Buffer * const *ppSOTargets,
// Array of offsets to the output buffers from ppSOTargets
const UINT *pOffsets
);
- Designate which data to send to which output slot.
struct D3D11_SO_DECLARATION_ENTRY{
// Index for identifying stream output objects used by geometry shader to send data
UINT Stream;
// Semantic name defined in the corresponding output attribute
LPCSTR SemanticName;
// Index for uniquely identifying one of several attributes with the same semantic name
UINT SemanticIndex;
// The starting component index (x = 0, y = 1, z = 2, w = 3)
BYTE StartComponent;
// How many component will be sent to the buffer from starting component
BYTE ComponentCount;
// Slot number of output buffer to receive stream data (0 ~ 3)
BYTE OutputSlot;
};
- A method for generating geometry shader objects using stream output stage.
HRESULT ID3D11Device::CreateGeometryShaderWithStreamOutput(
const void *pShaderBytecode,
SIZE_T BytecodeLength,
// A pointer that refers to the array of the structures
const D3D11_SO_DECLARATION_ENTRY *pSODeclaration,
// Size of the array (number of the elements)
UINT NumEntries,
// The vertex stride and the number of vertices to be transferred to each buffer
const UINT *pBufferStrides,
UINT NumStrides,
// Designate the output stream that will lead to the rasterizer stage
UINT RasterizedStream,
ID3D11ClassLinkage *pClassLinkage,
ID3D11GeometrySahder **ppGeometryShader
);
Process of Stream Output
- Automated rendering
- Use the recorded buffer resource as input of the input assembler stage.
- The application program must connect the buffer resource to slot 0 of the input assembler stage.
- Need to call ID3D11DeviceContext::DrawAuto()
- A vertex stream is output by the geometry shader object.
- Primitive connectivity information is determined byt the primitive type setting of the input assembler.
- Stream output as a debugging tool
- Extract information for debugging from the geometry shader stage of the pipeline.
- Used to investigate any intermediate computational value that is not included in the final rendered output.
- Two buffer resources needed
- Stream output buffer created by flagging the default : the application cannot directly access the buffer.
- Buffer designated as staging usage flag that can be read by CPU : copy the contents of the stream output buffer and read the buffer.
Output of Stream Output
- Do not lead to other pipeline stages.
- The output point of the stream output stage is a buffer resource connected to the stream output stage.
Rasterizer
Rasterizer
- Rasterization : sample geometric data at regular intervals to produce samples that can be applied to render objects.
- Fragment : samples approximating the original geometry taken from rasterization.
- Culling : exclude primitives that will not contribute to the final output rendering based on the location of the culling space.
- Clipping : cut the portion outside the culling space, which will not contribute to rendering, in the primitive.
- Scissor test : to allow the application to designate a rectangular area to which rasterization is actually applied.
Input of Rasterizer
- Clip space and normalized device coordinates
- Clip space : a coordinate system after projecting the scene geometry.
- Normalized device coordinates : the coordinate divided by $W$ after projection.
- Unit cube : a cube defining the vertex of the normalized device coordinates.
- System value semantic of clipping distance and culling distance : SV_ClipDistance, SV_CullDistance
- Viewport array index; SV_ViewportArrayIndex : an unsigned integer that identifies the viewport defining structure to be used when rastering the primitive.
- Render target array index; SV_RenderTargetArrayIndex : an unsigned integer identifying a slice to record rendering results among texture slices of a render target array.
Status Description of Rasterizer
- Rasterizer state object
struct D3D11_RASTERIZER_DESC {
// Determine the fill mode to use when rendering (solid fill or wireframe)
D3D11_FILL_MODE FillMode;
// Indicate triangles facing the specified direction are not drawn
D3D11_CULL_MODE CullMode;
// Determine if a triangle is front- or back-facing
BOOL FrontCounterClockwise;
// Depth value added to a given pixel
INT DepthBias;
// Maximum depth bias of a pixel
FLOAT DepthBiasClamp;
// Scalar on a given pixel’s slope
FLOAT SlopeScaledDepthBias;
// Enable clipping based on distance
BOOL DepthClipEnable;
// Enable scissor-rectangle culling
BOOL ScissorEnable;
// Specify whether to use the quadrilateral or alpha line anti-aliasing on MSAA render targets
BOOL MultisampleEnable;
// Specify whether to enable line antialiasing
BOOL AntialiasedLineEnable;
};
HRESULT CreateRasterizerState(
// Pointer to a rasterizer state desciption
const D3D11_RASTERIZER_DESC *pRasterizerDesc,
// Address of a pointer to the rasterizer state object created
ID3D11RasterizerState **ppRasterizerState
);
- Formula corresponding to the method of applying depth bias
- When the depth buffer connect to the pipeline is unorm, or not connected at all
$$Bias \ = \ (float)DepthBias \ * \ r \ + \ SlopeScaledDepthBias \ * \ MaxDepthSlope$$ - When the floating point depth buffer is connected to the pipeline
$$Bias \ = \ (float)DepthBias \ * \ 2(exponent(maximum\ z\ of\ the \ primitive) \ - \ r)$$$$+\ SlopeScaledDepthBias \ *\ MaxDepthSlope$$
- When the depth buffer connect to the pipeline is unorm, or not connected at all
- Viewport status
- Define a rectangular area used to map normalized device coordinates based on a unit cube to pixel positions based on a render target coordinate system.
- ID3D11DeviceContext::RSSetViewports : the method of setting viewports on the pipeline.
- Viewports not set in the pipeline are automatically released.
struct D3D11_VIEWPORT{
FLOAT TopLeftX;
FLOAT TopLeftY;
FLOAT Width;
FLOAT Height;
FLOAT MinDepth;
FLOAT MaxDepth;
};
- Scissor rectangle
- Designate an area where fragments are allowed to be generated among the render targets.
- ID3D11DeviceContext::RSSetScissorRects() : the method of binding an array of scissor rectangles on the pipeline.
- Scissor rectangles not set in the pipeline are automatically released.
Process of Rasterizer
- Culling : exclude the entire primitive that will not contribute to the final rendering result.
- Back face culling
- Exclude the side facing backward. (The side away from the viewpoint)
- Apply only to triangles in primitive.
- Whether the triangle is front or rear is determined by the 'winding' of the triangular vertices that the rasterizer receives.
- Primitive culling
- Exclude primitives that are completely out of the unit cube within normalized device coordinates.
- If all of the vertices of a primitive are outside of one clipping plane, it is certain that the primitive is outside the unit cube of clipping space.
- Primitive clipping
- Determine whether a given primitive is completely or partially contained in a unit cube.
- Completely contained : pass to the next task without performing a clipping operation.
- Partially contained : cut out the primitive and discard the outer part.
- Homogenous divde : divide the projected point into $W$ components. $[X\ / \ W, \ Y \ / \ W, \ Z \ / \ W, \ 1]$
- Viewport transformation
- Mapping a primitive of normalized device coordinates to a screen space pixel coordinates.
$$ X\ : \ [TopLeftX, \ TopLeftX \ + \ Width], \ Y \ : \ [TopLeftY, \ TopLeftY \ + \ Height]$$ - How the application selects a viewport to be applied to the current rendering pass
- Include SV_ViewportArrayIndex : select an element in the viewport array using the system value as an index.
- Not include SV_ViewportArrayIndex : connect only one viewport and the application no longer manipulates the viewport.
- Mapping a primitive of normalized device coordinates to a screen space pixel coordinates.
- Actual rasterization processing
- Convert a given primitive into discrete sample data approximated within the render object.
- Fragment generation : determine the pixels that the primitive will cover from the currently given render target.
- Regard a render object as a grid of pixels.
- Determine pixels covered by a given primitive among pixels of the grid.
- Scissor test
- Set scissor test by connecting a scissors rectangular arry to the rasterizer stage.
- Compare the $X$ and $Y$ components of the fragment with the scissors rectangle.
- Exclude the fragment outside the rectangle, that the fragment does not
- Attribute interpolation : the rasterizer interpolates each of the input attributes of the input geometry vertex
- Multi-sampling considerations : optional MSAA processing capability; apply MSAA by selecting the part that requires anti-aliasing
Output of Rasterizer
- Generation of the fragment
- The output should be reduced as much as possible because of the overall data "amplification".
- More efficient to perform calculations before numerous fragments are generated.
- Fragment data
- Depth value interpolated for each fragment, which used for depth determination of the output merger stage.
- SV_Position : the position of the fragments
Pixel Shader
Pixel Shader
- Determine the appearance of a fragment based on the information provided through the attributes of the fragment and various resources.
- Execute a designated pixel shader program for each fragment to process the fragments.
- Any direct communication between individual fragment processes is impossible, since each pixel shader execution operates independently.
Input of Pixel Shader
- Attribute interpolation
- Essential information involved in the interpolation process of attributes
- Attribute values of the data to be interpolated.
- The location where interpolation occurs.
- Interpolation mode : specific types of interpolation techniques.
- Linear mode : apply linear interpolation considering perspective effects.
- After dividing each vertex attribute by the vertex depth, interpolation is performed.
- Extract the original attributes using the reciprocal of the interpolated $W$.
- Noperspective mode : interpolate attributes based only on the 2D position on the render object.
- Nointerpolation mode : the attribute values of the first vertex of the primitive apply to all fragments of the primitive.
- Centroid mode : the centroid of the partial sample positions covered by the primitive, not the center position of the pixel, is used as the input of the interpolation function.
- Sample mode : interpolation occurs at each sample point.
- Essential information involved in the interpolation process of attributes
- MSAA and pixel shader
- Execution by sample
- Make the pixel shader perform once per subpixel, not once per pixel.
- Including SV_SampleIndex among input signatures activates execution by sample.
- Partial sample coverage
- An unsigned interger value corresponding to each partial sample of the current pixel.
- Including SV_Coverage among input signatures activates partial sample coverage.
- Execution by sample
- Fragment position
- The original location
- The 4-component location of the fragment is transferred to the pixel shader through SV_Position
- In general, $X$ and $Y$ components indicate the location of a fragment in 2D render object.
- If centroid is specified, the value of centroid used in the interpolation is transmitted.
- SV_Position includes the depth value (0 ~ 1) generated by the rasterizer.
- Destination of the fragment : select the render target slice on which the fragments of the given primitive will be recorded based on SV_RenderTargetArrayIndex
- The original location
- Direction of the primitive
- SV_IsFrontFace : indicate the direction of the primitive
- true : generated from the front primitive / false : generated from the back primitive
- Always true in primitives of dots and lines because there is no concept of direction.
Status Description of Pixel Shader
- Shader program : ID3D11DeviceContext::PSSetShader()
- Constant buffer : ID3D11DeviceContext::PSSetConstantBuffers()
- Shader resource view : ID3D11DeviceContext::PSSetShaderResources()
- Sampler status object : ID3D11DeviceContext::PSSetSamplers()
- Unordered access view
- Capable of performing 'scatter writing'
- Scatter writing : programmatically determine where to store data to be written to a resource
- The place where the unordered access view is actually connected is not the pixel shader stage, but the output merger stage.
- Early depth-stencil determination
- earlydepthstencil function attribute
- Depth-stencil determination is performed before the pixel shader is executed.
- Pre-remove fragments that are certain to fail the depth or stencil determination.
Process of Pixel Shader
- The execution of the pixel shader
- Call a pixel shader once for every given fragment.
- Calculate the output color of the fragment.
- Deliver the output color to the output merger stage
- Simultaneously perform multiple different pixel shader calls using parallel processing units of the GPU, because of the one-to-one model and data sharing restrictions between threads.
- Multiple render targets; MRT : used when the calculation for each path are different only for the pixel shader.
- Modify the depth value.
- Print a new depth value to SV_Depth
- If not recorded, the depth generated in the rasterizer is transferred to the output merger stage.
- Conservative depth output
- Hierarhical-Z culling; Hi-Z : determining overlapping objects in advance.
- When the pixel shader modifies the depth value, the GPU does not perform Hi-Z, because the result of Hi-Z may become inaccurate if the depth value changes after Hi-Z.
- Conservative depth output : a mean to cause Hi-Z to occur even when depth value is modified in the pixel shader.
- The pixel shader must specify not only the new depth value but also the inequality function.
- Hi-Z identifies and removes fragments that are safe to exclude based on the upper or lower limits specified by the inequality function.
- A simple example
// An example of calculating a color based on the material and the light
float4 PSMAIN(in VS_OUTPUT input) : SV_Target
{
// Normalize the normal of the world space and the light vector
float3 n = normalize(input.normal);
float3 l = normalize(input.light);
// Calculate the amount of light that has reached this fragment
float4 Illumination = max(dot(n, l), 0) + 0.2f;
// Get material surface color properties from the texture
float4 SurfaceColor = ColorTexture.Sample(LinearSampler, input.tex);
// The result of modulating the surface color to the illumination value
return(SurfaceColor * input.color * Illumination);
}
- Utilization of unordered access view
- Ability to record values anywhere in a resource.
- Possible to generate a histogram of the rendering results while rendering the scene.
- Possible to implement a general data structure.
- Consideration of MSAA
- Extracting MSAA textures from shaders
- Load : extract partial samples from MSAA textures.
- GetDimension : provide $X$ and $Y$ sizes of textures and the number of partial samples.
- GetSamplePosition : provide the pixel coverage determination sample point corresponding to the specified partial sample.
- Alpha-to-coverage : a technique for modifying the way pixel shader outputs are recorded on render objects.
- Apply AND bitwise operation to masks generated by coverage determination and masks generated from alpha components of pixel shader output colors.
- Apply AND operation to the result and screen space dithering mask.
- As a result, a simplified fixed dithering pattern is applied to partial samples of the surface.
- Unnecessary to mix the pixel output with the content of the render target.
- Unnecessary to align the transparent surfaces according to the distance from the camera.
- Modifying MSAA behavior with pixel shader
- Execution by sample : activate execution by sample including SV_SampleIndex.
- Depth write : manipulate the depth value in the pixel shader by using SV_Depth.
- Coverage mask : pixel shader can implement its own coverage mask by changing the SV_Coverage.
- Extracting MSAA textures from shaders
Output of Pixel Shader
- SV_Target[n] : color values of the fragment.
- SV_Depth : depth values of the fragment.
- SV_DepthGreaterThan, SV_DepthLessThan
- To enable Hi-Z to operate even when recording SV_Depth in a pixel shader.
- No reason to output semantics if the pixel shader does not change the depth value directly.
- SV_Coverage
- Used to implement a customized partial sample coverage mask.
- Should be used only when using MSAA render targets are used.
Output Merger
Output Merger
- Merge the color and depth output by the pixel shader into the render object binded to the pipeline output object.
- Perform depth test that determines the visibility of pixels by depth value.
- Perform stencil test to precisely control an area in which pixels are to be recorded among render objects.
- Perform alpha blending by mixing the color value provided by the pixel shader and color value of the render object using a specific mixing function.
Input of Output Merger
- Color values calculated by the pixel shader program.
- The depth value of the fragment, which used to update contents of the depth buffer.
Status Description of Output Merger
- Depth-stencil state object
// Once created, objects can no longer be modified.
struct D3D11_DEPTH_STENCIL_DESC{
// For depth test
// Enable depth testing
BOOL DepthEnable;
// Identify a portion of the depth-stencil buffer that can be modified by depth data
D3D11_DEPTH_WRITE_MASK DepthWriteMask;
// A function that compares depth data against existing depth data
D3D11_COMPARISON_FUNC DepthFunc;
// For stencil test
// Enable stencil testing
BOOL StencilEnable;
// Identify a portion of the depth-stencil buffer for reading stencil data
UINT8 StencilReadMask;
// Identify a portion of the depth-stencil buffer for writing stencil data
UINT8 StencilWriteMask;
// Identify how to use the results of the depth test and the stencil test for pixels whose surface normal is front / back
D3D11_DEPTH_STENCILOP_DESC FrontFace;
D3D11_DEPTH_STENCILOP_DESC BackFace;
};
- Blend state object
struct D3D11_BLEND_DESC {
// Specifies whether to use alpha-to-coverage as a multisampling technique when setting a pixel to a render target
BOOL AlphaToCoverageEnable;
// Specifies whether to enable independent blending in simultaneous render targets
// true : enable independent blending
// false : only the RenderTarget[0] members are used and RenderTarget[1..7] are ignored
BOOL IndependentBlendEnable;
D3D11_RENDER_TARGET_BLEND_DESC RenderTarget[8];
};
- Render target state object
- Multiple render target; MRT
- Use multiple render target slots. (one for each render target)
- All render target are recorded simultaneously.
- The number of pixel shader calls required to record to all render targets is one per pixel.
- Render target array
- Use one render target slot.
- Only one slice is recorded at a time.
- The pixel shader program is executed as many times as the number of texture slices per pixel.
- Multiple render target; MRT
// The total number of render objects and unordered access views should not exceed 8
// Bind render target to the pipeline
void OMSetRenderTargets(
UINT NumViews,
ID3D11RenderTargetView **ppRenderTargetViews,
ID3D11DepthStencilView *pDepthStencilView
);
// Bind render target with UAV
void OMSetRenderTargetAndUnorderedAccessViews(
UINT NumViews,
ID3D11RenderTargetView **ppRenderTargetViews,
ID3D11DepthStencilView *pDepthStencilView,
UINT UAVStartSlot,
UINT NumUAVs,
ID3D11UnorderedAccessView **ppUnorderedAccessView,
const UINT *pUAVInitialCounts
);
- Read-only depth-stencil view
- A write approach to the depth-stencil resource occurs in the prcess of the depth test of the output merger stage.
- Should generate a depth stencil view by specifying a 'read-only' flag that prevents the output merger stage from changing the depth stencil resource
Process of Ouptut Merger
- Stencil test
- Allow the application to specify in detail the area where fragments will actually be recorded in the render object.
$$(stencil \_ ref \_ value \ \& \ stencil \_ mask) \ comp \_ op \ (stencil \_ buf \_ value \ \& \ stencil \_ mask)$$
- Allow the application to specify in detail the area where fragments will actually be recorded in the render object.
// Operations to determine how to process the stencil buffer value after passing the test
enum D3D11_STENCIL_OP {
D3D11_STENCIL_OP_KEEP = 1,
D3D11_STENCIL_OP_ZERO = 2,
D3D11_STENCIL_OP_REPLACE = 3,
D3D11_STENCIL_OP_INCR_SAT = 4,
D3D11_STENCIL_OP_DECR_SAT = 5,
D3D11_STENCIL_OP_INVERT = 6,
D3D11_STENCIL_OP_INCR = 7,
D3D11_STENCIL_OP_DECR = 8
}
- Depth test
- Implementation of a classical z-buffer algorithm for visibility test.
$$(the \ depth \ value \ of \ the \ fragment) \ comp\_ op \ (the \ depth \ value \ of \ the \ depth \ buffer)$$ - success : perform a blending operation / fail : discard the fragment
- Implementation of a classical z-buffer algorithm for visibility test.
- Blending operation
- Mix the two selected colors, source and destination, to calculate the final color to be recorded in the output render target.
- A complete 4-component RGBA value is created that can be recorded on the render object.
Output of Ouptut Merger
- Resources in which values can be recoded in a pipeline execution
- Render target
- Receive blending operation results, the result of applying the blending operation to the color and alpha values of.
- Unordered access view
- Directly controlled by the pixel shader stage.
- There is a slight delay in the writing process of the pixel shader.
- Depth-stencil view
- Render target
- Once a pipeline has been executed, the modified content of the resources
- Can be used in the next rendering.
- Can be used in the computation path.
- Can be directly manipulated by the CPU.
- The MSAA render object must be resolved to a non-MSAA texture after rendering is completed.