ARB Vertex Program Extension - Design and Applications of a Real-Time Shading System

The architecture of the ARB_vertex_program extension [31] is shown in figure 3. At first sight it seems quite similar to a regular CPU architecture, but there are several important differences. The most important one is that this model is inherently more limited than general-purpose CPU models: vertex programs have no access to main or local (graphics) memory, and only a limited register file is available. General forms of branching are also unavailable (although conditional execution is possible). Unlike CPU architectures, there is no native support for integer operations.

Other differences are discussed in the following subsections. Note that we do not provide a formal overview of the extension; only the most important concepts, those essential to understanding high-level shading language implementations, are discussed. We also do not discuss the extension at the OpenGL API level at all. These details can be found in the extension specification [31].

2.5.1 Registers

There are several distinct register files: the input file (containing vertex attributes), the output file, and the temporary and constant register files (the latter holding program parameters). All registers (except the address register) contain four-component vectors, each component being a floating-point number. The address register holds a single scalar value, stored as a signed integer.

The temporary register file is used only to store temporary results and is both readable and writable. The extension allows user-specified names for temporary registers, but for clarity this thesis uses only the names r0, r1, ...

Input registers are aliased to conventional per-vertex parameters such as vertex coordinates, the normal vector and texture coordinates (shown in table 1). The input register file is read-only; registers are specified as vertex.attrib[N] (for current implementations, N is between 0 and 15).

The output register file contains the final, transformed vertex parameters and is only writable. Transformed vertex parameters are sent to the rasterization stage. Output registers are specified in table 2. A vertex program must write to the result.position register (otherwise rasterization would not be possible), while writing to the other output registers is optional.

Output register name            Components      Description
result.position                 (x, y, z, w)    Clip-space coord. of vertex
result.color.front.primary      (r, g, b, a)    Front-facing primary color
result.color.front.secondary    (r, g, b, a)    Front-facing secondary color
result.color.back.primary       (r, g, b, a)    Back-facing primary color
result.color.back.secondary     (r, g, b, a)    Back-facing secondary color
result.fogcoord                 (f, ∗, ∗, ∗)    Fog coordinate
result.pointsize                (s, ∗, ∗, ∗)    Point size
result.texcoord[n]              (s, t, r, q)    Texture coordinate for unit n

Table 2: Output registers of vertex programs. Note that fog coordinate and point size registers are vector registers, although only a single component of them is used.

The constant register file is read-only inside a vertex program, but can be changed by the application outside the Begin and End scope. The constant register file is intended to store parameters that do not depend on vertices. The extension divides the constant register file into two classes, called program environment parameters and program local parameters. Program environment parameters (specified as program.env[N], where N is between zero and an implementation-specific constant) are associated with the OpenGL context and are common to all programs of the same context. Program local parameters (specified as program.local[N], where N is between zero and an implementation-specific value) are associated with individual vertex programs.

2.5.2 Texture coordinates

In the case of triangle rasterization, fragment texture coordinates and colors are interpolated using the following formula:

\[ V_f = \frac{(a_f/w_a)V_a + (b_f/w_b)V_b + (c_f/w_c)V_c}{(a_f/w_a) + (b_f/w_b) + (c_f/w_c)} \tag{9} \]

Here w_a, w_b and w_c denote the w-coordinates of the three vertices, V_a, V_b and V_c denote the interpolated quantities (usually four-component vectors) at the three vertices, and a_f, b_f and c_f are the barycentric coordinates of the fragment (by definition, a_f + b_f + c_f = 1). OpenGL allows implementations to simplify this formula by using an approximation:

\[ V_f = a_f V_a + b_f V_b + c_f V_c \tag{10} \]

Instructions    Output  Inputs  Description
MAD             v       v,v,v   Multiply and add
ADD SUB MUL     v       v,v     Componentwise add, subtract, multiply
MIN MAX         v       v,v     Componentwise minimum, maximum
SGE SLT         v       v,v     Componentwise >= and < relations
DST XPD         v       v,v     Distance vector, cross product
LIT             v       v       Lighting terms
MOV ABS SWZ     v       v       Componentwise move, absolute value, swizzle
FLR FRC         v       v       Componentwise floor, fraction
DP3 DP4 DPH     s       v,v     Dot products for 3D, 4D and homogeneous vectors
EX2 LG2         s       s       Exponential and logarithm base 2
EXP LOG         v       s       Exponential and logarithm base 2 (approx.)
RCP RSQ         s       s       Reciprocal and reciprocal square root
POW             s       s,s     Exponentiate
ARL             a       s       Address register load

Table 3: Vertex program instruction set. In the inputs column, the letter 's' denotes a scalar operand and 'v' a vector operand. In the output column, 's' denotes a scalar output (the result is replicated across all components), 'v' denotes a vector output and 'a' denotes the address register.

For texture coordinates, the approximation in equation (10) produces very noticeable artifacts [9], so in practice it is applicable only to color interpolation. Fortunately, current implementations perform interpolation with true perspective correction even for colors.
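The difference between equations (9) and (10) can be illustrated with a small numerical sketch. The function names and the sample triangle below are hypothetical and chosen only for illustration; they are not part of OpenGL or of the extension.

```python
# Sketch of equations (9) and (10). All names and sample values are
# illustrative assumptions, not taken from the extension or from OpenGL.


def interpolate_perspective(bary, w, values):
    """Perspective-correct interpolation, equation (9)."""
    # Divide each barycentric weight by the w-coordinate of its vertex.
    weights = [b / wi for b, wi in zip(bary, w)]
    denominator = sum(weights)
    size = len(values[0])
    return tuple(
        sum(wt * v[i] for wt, v in zip(weights, values)) / denominator
        for i in range(size)
    )


def interpolate_linear(bary, values):
    """Screen-space linear approximation, equation (10)."""
    size = len(values[0])
    return tuple(sum(b * v[i] for b, v in zip(bary, values)) for i in range(size))


# Hypothetical fragment halfway (in screen space) between vertices a and b.
BARY = (0.5, 0.5, 0.0)
# Vertex b has a larger w-coordinate than the other two vertices.
W = (1.0, 3.0, 1.0)
# Two-component (s, t) texture coordinates at vertices a, b and c.
TEXCOORDS = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))

perspective = interpolate_perspective(BARY, W, TEXCOORDS)
linear = interpolate_linear(BARY, TEXCOORDS)
```

With these sample values the two formulas disagree in the s-coordinate, which is the kind of deviation that shows up as texture distortion. When all three w-coordinates are equal, the two formulas give the same result.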

2.5.3 Instructions

The vertex program instruction set is presented in table 3. The number of operations is small and seems quite limited. Branch instructions are missing, although simple conditional execution can still be achieved using a combination of SGE (or SLT), MUL and MAD (this has some limitations that are discussed in the fourth chapter). Some instructions (MUL, SUB, MOV) resemble ordinary CPU instructions, while others, such as DP3 and DP4, are not included in CPU instruction sets. In computer graphics, 3x3 and 4x4 matrix multiplications are quite common, and dot product instructions allow these to be performed more efficiently. The DST and LIT instructions are very specific to computer graphics: both encode common instruction sequences for lighting models. The DST instruction is used in an instruction sequence for calculating the vector (1, d, d², 1/d), where d denotes the vertex distance from a light source. The LIT instruction is used to accelerate the computation of ambient, diffuse and specular lighting terms.
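The conditional execution mentioned above can be sketched by emulating the relevant instructions on four-component tuples. This is one possible instruction sequence, not necessarily the one used later in this thesis; the helper names and the function select_ge are hypothetical.

```python
# Emulation sketch of branch-free selection with SGE, SLT, MUL and MAD.
# Registers are modelled as 4-tuples of floats; all names are illustrative.


def sge(a, b):
    """SGE: componentwise 1.0 where a >= b, else 0.0."""
    return tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b))


def slt(a, b):
    """SLT: componentwise 1.0 where a < b, else 0.0."""
    return tuple(1.0 if x < y else 0.0 for x, y in zip(a, b))


def mul(a, b):
    """MUL: componentwise product."""
    return tuple(x * y for x, y in zip(a, b))


def mad(a, b, c):
    """MAD: componentwise a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


def select_ge(a, b, if_true, if_false):
    """Per component: if_true where a >= b, otherwise if_false."""
    mask = sge(a, b)                        # SGE mask, a, b;
    inverse = slt(a, b)                     # SLT inverse, a, b;
    result = mul(mask, if_true)             # MUL result, mask, if_true;
    return mad(inverse, if_false, result)   # MAD result, inverse, if_false, result;


selected = select_ge(
    (1.0, -2.0, 3.0, 0.0),
    (0.0, 0.0, 0.0, 0.0),
    (10.0, 10.0, 10.0, 10.0),
    (20.0, 20.0, 20.0, 20.0),
)
```

Note that both alternatives are always evaluated in full; the comparison result only decides which value contributes to each component.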

A few instructions, such as EX2, LG2, RCP and RSQ, belong to the class of scalar instructions. These are unary instructions that take a single scalar as an argument. As all registers contain vectors, scalar operations must specify which source component of a vector is to be used. The results of scalar operations are replicated across all components. For example, the instruction

RCP r0,r1.x;

calculates (1/r1.x,1/r1.x,1/r1.x,1/r1.x) and stores the result in r0.

2.5.4 Operand modifiers

To better support rearranging data in vectors, optional swizzle operations are provided as argument modifiers for all instructions. For example, the following instruction:

ADD r0, r1.xxzw, r2.yzxw;

calculates (r1.x+r2.y, r1.x+r2.z, r1.z+r2.x, r1.w+r2.w) and stores the result in register r0. Additionally, all source operands can be negated:

ADD r0, r1, -r2;

Note that although the instruction set is small, several instructions are redundant in the sense that they can be expressed through other instructions. For example, the SUB instruction is not strictly required, as it can be expressed using an ADD with the second operand negated. Likewise, the XPD instruction can be emulated using a MUL and a MAD instruction with swizzled operands.
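As an illustration of this redundancy, the following sketch emulates the cross product with one swizzled multiply followed by one swizzled multiply-add whose last operand is negated. The helper names are hypothetical, and the value produced in the w-component is a by-product of this particular sequence rather than something the text above prescribes.

```python
# Emulation sketch: cross product from a swizzled MUL followed by a MAD.
# Registers are modelled as 4-tuples of floats; all names are illustrative.

COMPONENTS = "xyzw"


def swizzle(v, pattern):
    """Rearrange components, e.g. swizzle(v, 'yzxw')."""
    return tuple(v[COMPONENTS.index(c)] for c in pattern)


def negate(v):
    """Source operand negation."""
    return tuple(-x for x in v)


def mul(a, b):
    """MUL: componentwise product."""
    return tuple(x * y for x, y in zip(a, b))


def mad(a, b, c):
    """MAD: componentwise a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


def cross_via_mul_mad(a, b):
    # MUL r0, a.zxyw, b.yzxw;
    r0 = mul(swizzle(a, "zxyw"), swizzle(b, "yzxw"))
    # MAD r0, a.yzxw, b.zxyw, -r0;
    return mad(swizzle(a, "yzxw"), swizzle(b, "zxyw"), negate(r0))


unit_z = cross_via_mul_mad((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0))
general = cross_via_mul_mad((2.0, 3.0, 4.0, 1.0), (5.0, 6.0, 7.0, 1.0))
```

In this sequence the w-component always cancels to zero, because both products multiply the same two w values.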

The extension also provides support for partial register updates; this functionality is called masking. For example,

MOV r0.xy, r1;

copies only the x- and y-components from register r1 to r0; the z- and w-components are left unchanged. When the mask is omitted, all components are updated.
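The operand modifiers described in this subsection, together with the scalar replication of section 2.5.3, can be summarized in a short emulation sketch. Registers are modelled as four-component tuples; the helper names and register contents are hypothetical.

```python
# Emulation sketch of swizzling, write masking and scalar replication.
# Registers are modelled as 4-tuples of floats; all names are illustrative.

COMPONENTS = "xyzw"


def swizzle(v, pattern="xyzw"):
    """Source swizzle, e.g. r1.xxzw."""
    return tuple(v[COMPONENTS.index(c)] for c in pattern)


def add(a, b):
    """ADD: componentwise sum."""
    return tuple(x + y for x, y in zip(a, b))


def rcp(v, component):
    """RCP: reciprocal of one source component, replicated to all four."""
    value = 1.0 / v[COMPONENTS.index(component)]
    return (value, value, value, value)


def write_masked(dst, value, mask="xyzw"):
    """Write only the components named in the mask; keep the others."""
    return tuple(
        new if c in mask else old
        for c, old, new in zip(COMPONENTS, dst, value)
    )


r1 = (1.0, 2.0, 3.0, 4.0)
r2 = (10.0, 20.0, 30.0, 40.0)

# ADD r0, r1.xxzw, r2.yzxw;
swizzled_sum = add(swizzle(r1, "xxzw"), swizzle(r2, "yzxw"))

# MOV r0.xy, r1;   (r0 previously held 9.0 in every component)
masked_move = write_masked((9.0, 9.0, 9.0, 9.0), r1, "xy")

# RCP r0, r1.w;
reciprocal = rcp(r1, "w")
```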

Although vertex programs do not support random memory access, limited support is still provided for small "lookup" tables. This is provided through a single address register: instructions can offset constant register file accesses using this register. The ARL instruction is provided for setting this register.
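The lookup mechanism can be sketched as follows, assuming that ARL rounds its scalar operand toward negative infinity, as the extension specification [31] describes. The table contents, the helper names and the explicit range check are hypothetical additions for illustration.

```python
import math

# Emulation sketch of address register loads and relative addressing.
# The table contents and function names are illustrative assumptions.


def arl(scalar):
    """ARL: load the address register with the floor of a scalar operand."""
    return math.floor(scalar)


def read_relative(constants, address, offset=0):
    """Read constants[address + offset], rejecting out-of-range indices."""
    index = address + offset
    # The check is an assumption of this sketch; it also prevents Python's
    # negative indexing from silently wrapping around.
    if not 0 <= index < len(constants):
        raise IndexError("relative address outside the constant table")
    return constants[index]


# Hypothetical four-entry lookup table stored in the constant register file.
LOOKUP = (
    (1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 1.0),
    (1.0, 1.0, 1.0, 1.0),
)

r0 = (2.7, 0.0, 0.0, 0.0)
a0 = arl(r0[0])                             # address register now holds 2
r1 = read_relative(LOOKUP, a0, offset=1)    # reads table entry 2 + 1
```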

2.5.5 Resource limits

The extension specifies queryable resource limits on register counts, parameter counts and program length. There are strict minimum requirements for each limit: every implementation must accept programs that use 12 or fewer temporary registers, 96 program parameters and 128 instructions. When a program exceeds hardware resources but can be emulated within software limits, the extension provides means to detect this. Mixing software and hardware implementations may produce different results. This is important because rasterization may generate different fragments depending on whether the vertex program was executed on the CPU or the GPU, which can produce artifacts in multipass renderings. Even very subtle differences in calculations are magnified by point sampling at polygon edges.

Figure 4: The OpenGL multitexturing pipeline. The number of texture units (here four) may vary with the implementation.

2.5.6 Other vertex programming extensions

Currently, there are two other OpenGL extensions that define functionality similar to ARB_vertex_program: NV_vertex_program and EXT_vertex_shader. NV_vertex_program was the first vertex programming extension and closely follows the vertex unit design of the GeForce 3 GPU [16]. For example, the limited number of constant register file ports is exposed to the user: a single instruction may not refer to more than one constant. EXT_vertex_shader is more suitable for constructing shaders at runtime and provides a larger set of instructions, some of which are quite complex. In general, each of the three extensions (these two plus ARB_vertex_program) can be used to implement any of the others, and all are designed after the DirectX 8 Vertex Shader 1.1 model.

DirectX 9 defines several new vertex shader models. The Vertex Shader 2.0 model adds support for static looping: it provides branching instructions that may depend only on program parameters. The Vertex Shader 3.0 model is more advanced: it provides support for dynamic branching and instruction masking, and supports texture operations for vertices. There is currently one vendor-specific OpenGL extension that provides support for dynamic branching and instruction masking, NV_vertex_program2. It does not, however, support vertex texturing.
