Intrinsics Overview

Microsoft Specific

3DNow! technology provides up to 26 additional instructions to support high-performance 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD; each instruction operates on pairs of 32-bit values. See 3DNow! Intrinsics for the reference documentation for the AMD intrinsics.

Vector instructions operate in parallel on two sets of 32-bit single-precision, floating-point words. Scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
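The difference between the vector and scalar forms can be illustrated in portable C. This is only a sketch of the data flow, not the intrinsics themselves: the `packed2f` type and both function names are hypothetical stand-ins for a 64-bit register holding two 32-bit floats.

```c
/* Hypothetical stand-in for one 64-bit 3DNow! register: two 32-bit floats. */
typedef struct {
    float lo;  /* bits 31..0  */
    float hi;  /* bits 63..32 */
} packed2f;

/* Vector form: both 32-bit lanes are processed by one operation. */
static packed2f packed_add(packed2f a, packed2f b)
{
    packed2f r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi;
    return r;
}

/* Scalar form: only the low halves of the two 64-bit operands take part. */
static float scalar_add_low(packed2f a, packed2f b)
{
    return a.lo + b.lo;
}
```

With `a = {1, 2}` and `b = {10, 20}`, the vector form yields both sums at once, while the scalar form yields only the sum of the low halves.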

The 3DNow! single-precision, floating-point format is compatible with the IEEE 754 single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).
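The field layout described above can be inspected with a short C helper. This sketch assumes the host's `float` is an IEEE 754 single-precision value (true on the platforms this article targets); the `sp_fields` type and function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;        /* 1 bit */
    uint32_t biased_exp;  /* 8 bits, bias 127 */
    uint32_t fraction;    /* 23 stored significand bits */
} sp_fields;

/* Split a single-precision value into its three fields. */
static sp_fields sp_decode(float f)
{
    uint32_t bits;
    sp_fields out;
    memcpy(&bits, &f, sizeof bits);      /* reinterpret without aliasing issues */
    out.sign       = bits >> 31;
    out.biased_exp = (bits >> 23) & 0xFFu;
    out.fraction   = bits & 0x7FFFFFu;
    return out;
}

/* Restore the hidden integer bit of a normal number: 24 significand bits. */
static uint32_t sp_significand24(sp_fields s)
{
    return s.fraction | 0x800000u;
}
```

For example, 1.0 decodes to sign 0, biased exponent 127 (true exponent 0), and a zero fraction, so its 24-bit significand is 800000h, which represents 1.0 in the range [1,2).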

In contrast to the IEEE standard, which defines four rounding modes, 3DNow! technology supports a single rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines which one is used. AMD processors implement round-to-nearest. Regardless of that mode, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always round toward zero (truncate).
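Truncating conversion, as used by the floating-point-to-integer instruction, can be sketched in portable C, where a float-to-integer cast also discards the fractional part. The function name is hypothetical, and the clamping of out-of-range inputs and the handling of NaN are assumptions made to keep the C code well defined, not behavior stated in this article.

```c
#include <stdint.h>

/* Round-toward-zero conversion of one 32-bit lane. */
static int32_t f2i_truncate(float x)
{
    if (x != x)              return 0;          /* NaN: unsupported operand, arbitrary choice */
    if (x >= 2147483648.0f)  return INT32_MAX;  /* assumption: clamp large positive values */
    if (x <= -2147483648.0f) return INT32_MIN;  /* assumption: clamp large negative values */
    return (int32_t)x;                          /* C discards the fractional part */
}
```

Under truncation, 2.9 converts to 2 and -2.9 converts to -2, whereas round-to-nearest would give 3 and -3.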

The largest representable normal number in magnitude for this precision has, in hexadecimal, an exponent of FEh and a significand of 7FFFFFh, with a numerical value of 2^127 × (2 - 2^-23). All results that overflow above the maximum representable positive value are saturated to either this maximum representable normal number or to positive infinity. Similarly, all results that overflow below the minimum representable negative value are saturated to either this minimum representable normal number or to negative infinity.

The implementation of 3DNow! technology determines how arithmetic overflow is handled, as either properly signed maximum or minimum representable normal numbers or properly signed infinities. The processor generates properly signed maximum or minimum representable normal numbers.
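The saturating overflow behavior can be approximated in portable C by computing in a wider type and clamping. This is a sketch under the assumption that the host `float` is IEEE 754 single precision, so that `FLT_MAX` equals 2^127 × (2 - 2^-23); the function names are hypothetical.

```c
#include <float.h>

/* Clamp an exact result to the properly signed maximum normal number. */
static float saturate_to_max_normal(double exact)
{
    if (exact >  (double)FLT_MAX) return  FLT_MAX;
    if (exact < -(double)FLT_MAX) return -FLT_MAX;
    return (float)exact;
}

/* Multiply two single-precision values; the product is exact in double. */
static float mul_saturating(float a, float b)
{
    return saturate_to_max_normal((double)a * (double)b);
}
```

Here `mul_saturating(FLT_MAX, 2.0f)` returns `FLT_MAX` rather than infinity, which models the choice of saturating to the maximum representable normal number.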

Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision has, in hexadecimal, an exponent of 01h and a significand of 000000h, with a numerical value of 2^-126. Accordingly, all results below this minimum representable value in magnitude are held to zero. The following table shows the exponent ranges supported by the 3DNow! technology.

3DNow! Technology Exponent Ranges

| Biased exponent | Description |
|---|---|
| FFh | Unsupported. Unsupported numbers can be used as operands. The results of operations with unsupported numbers are undefined. |
| 00h | Zero. |
| 00h<x<FFh | Normal. |
| 01h | 2^(1-127), lowest possible exponent. |
| FEh | 2^(254-127), largest possible exponent. |
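The exponent ranges above can be turned into a small classifier, which is one way to check data before handing it to 3DNow! code. This sketch assumes an IEEE 754 single-precision `float` on the host; the enum, the function names, and the choice to return positive zero for tiny values are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

typedef enum { EXP_ZERO, EXP_NORMAL, EXP_UNSUPPORTED } exp_class;

/* Classify a value by its 8-bit biased exponent. */
static exp_class classify_exponent(float f)
{
    uint32_t bits;
    uint32_t e;
    memcpy(&bits, &f, sizeof bits);
    e = (bits >> 23) & 0xFFu;
    if (e == 0x00u) return EXP_ZERO;         /* below 2^-126 in magnitude */
    if (e == 0xFFu) return EXP_UNSUPPORTED;  /* infinity or NaN encodings */
    return EXP_NORMAL;                       /* 01h through FEh */
}

/* Hold any value below the smallest normal magnitude to zero
   (the sign of the resulting zero is not modeled here). */
static float hold_tiny_to_zero(float f)
{
    return classify_exponent(f) == EXP_ZERO ? 0.0f : f;
}
```

A value such as 2^-127 has a biased exponent of 00h and is therefore held to zero, while 2^-126 (biased exponent 01h) passes through unchanged.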

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions or set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit at each clock cycle for a maximum issue and execution rate of two 3DNow! operations per cycle.

In high-performance 3DNow! code, 3DNow! instructions are normally scheduled apart from one another to avoid delays caused by contention for execution resources, while also taking dependencies and execution latencies into account.

For further information regarding code optimization on the AMD-K6 processor, see the AMD-K6 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007.

The 3DNow! performance enhancement instructions for AMD processors are summarized in the following tables.

AMD 3DNow! Floating-Point Instructions

| Operation | Function | Opcode |
|---|---|---|
| PAVGUSB | Packed 8-bit unsigned integer averaging | BFh |
| PFADD | Packed floating-point addition | 9Eh |
| PFSUB | Packed floating-point subtraction | 9Ah |
| PFSUBR | Packed floating-point reverse subtraction | AAh |
| PFACC | Packed floating-point accumulate | AEh |
| PFCMPGE | Packed floating-point comparison, greater or equal | 90h |
| PFCMPGT | Packed floating-point comparison, greater | A0h |
| PFCMPEQ | Packed floating-point comparison, equal | B0h |
| PFMIN | Packed floating-point minimum | 94h |
| PFMAX | Packed floating-point maximum | A4h |
| PI2FD | Packed 32-bit integer to floating-point conversion | 0Dh |
| PF2ID | Packed floating-point to 32-bit integer | 1Dh |
| PFRCP | Packed floating-point reciprocal approximation | 96h |
| PFRSQRT | Packed floating-point reciprocal square root approximation | 97h |
| PFMUL | Packed floating-point multiplication | B4h |
| PFRCPIT1 | Packed floating-point reciprocal first iteration step | A6h |
| PFRSQIT1 | Packed floating-point reciprocal square root first iteration step | A7h |
| PFRCPIT2 | Packed floating-point reciprocal/reciprocal square root second iteration step | B6h |
| PMULHRW | Packed 16-bit integer multiply with rounding | B7h |
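The table lists a reciprocal approximation followed by iteration steps that refine it. The general idea of refining a coarse reciprocal estimate can be sketched with Newton-Raphson iteration in scalar C. This is not the bit-exact behavior of PFRCP, PFRCPIT1, or PFRCPIT2: the coarse estimate below is a hypothetical stand-in produced by discarding low significand bits, the function names are illustrative, and an IEEE 754 single-precision `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical coarse estimate of 1/b: keep the sign, the exponent, and
   only the top 7 fraction bits of the true reciprocal. */
static float rcp_estimate(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF0000u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/b: x' = x * (2 - b * x).
   Each step roughly doubles the number of correct bits. */
static float rcp_refine(float b, float x)
{
    return x * (2.0f - b * x);
}
```

Starting from `x0 = rcp_estimate(b)`, applying `rcp_refine` twice brings the result to roughly single-precision accuracy. The refinement uses only multiplication and subtraction, which is why an approximation plus iteration can replace a full division.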

AMD 3DNow! Performance Enhancement Instructions

| Operation | Function | Opcode second byte |
|---|---|---|
| FEMMS | Faster entry/exit of the MMX or floating-point state. | 0Eh |
| PREFETCH/PREFETCHW | Prefetches at least a 32-byte line into the L1 data cache (Dcache). The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line. | 0Dh |

AMD Athlon Processor 3DNow! Technology DSP Extensions

| Operation | Function | Opcode / imm8 |
|---|---|---|
| PF2IW | Packed floating-point to integer word conversion with sign extend | 0Fh 0Fh / 1Ch |
| PFNACC | Packed floating-point negative accumulate | 0Fh 0Fh / 8Ah |
| PFPNACC | Packed floating-point mixed positive-negative accumulate | 0Fh 0Fh / 8Eh |
| PI2FW | Packed integer word to floating-point conversion | 0Fh 0Fh / 0Ch |
| PSWAPD | Packed swap doubleword | 0Fh 0Fh / BBh |
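The accumulate and swap operations in this table are easiest to follow as lane arithmetic. The sketch below models them in portable C based on a reading of AMD's descriptions of these instructions; the `pair2f` type and function names are hypothetical, and `d` and `s` stand for the destination and source operands.

```c
/* Hypothetical model of a 64-bit register holding two floats. */
typedef struct {
    float lo;  /* bits 31..0  */
    float hi;  /* bits 63..32 */
} pair2f;

/* Negative accumulate: each result lane is low minus high of one operand. */
static pair2f negative_accumulate(pair2f d, pair2f s)
{
    pair2f r;
    r.lo = d.lo - d.hi;
    r.hi = s.lo - s.hi;
    return r;
}

/* Mixed accumulate: difference of the destination lanes in the low result,
   sum of the source lanes in the high result. */
static pair2f mixed_accumulate(pair2f d, pair2f s)
{
    pair2f r;
    r.lo = d.lo - d.hi;
    r.hi = s.lo + s.hi;
    return r;
}

/* Swap doubleword: exchange the two 32-bit halves. */
static pair2f swap_halves(pair2f s)
{
    pair2f r;
    r.lo = s.hi;
    r.hi = s.lo;
    return r;
}
```

With `d = {5, 3}` and `s = {10, 4}`, negative accumulate gives `{2, 6}`, mixed accumulate gives `{2, 14}`, and swapping `s` gives `{4, 10}`.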

MMX Instruction set extensions starting with AMD Athlon Processor

| Operation | Function | Opcode / imm8 |
|---|---|---|
| MASKMOVQ | Streaming (cache bypass) store using byte mask | 0Fh F7h |
| MOVNTQ | Streaming (cache bypass) store | 0Fh E7h |
| PAVGB | Packed average of unsigned byte | 0Fh E0h |
| PAVGW | Packed average of unsigned word | 0Fh E3h |
| PEXTRW | Extract word into integer register | 0Fh C5h |
| PINSRW | Insert word from integer register | 0Fh C4h |
| PMAXSW | Packed maximum signed word | 0Fh EEh |
| PMAXUB | Packed maximum unsigned byte | 0Fh DEh |
| PMINSW | Packed minimum signed word | 0Fh EAh |
| PMINUB | Packed minimum unsigned byte | 0Fh DAh |
| PMOVMSKB | Move byte mask to integer register | 0Fh D7h |
| PMULHUW | Packed multiply high unsigned word | 0Fh E4h |
| PREFETCHNTA | Move data closer to the processor using the NTA reference | 0Fh 18h 0* |
| PREFETCHT0 | Move data closer to the processor using the T0 reference | 0Fh 18h 1* |
| PREFETCHT1 | Move data closer to the processor using the T1 reference | 0Fh 18h 2* |
| PREFETCHT2 | Move data closer to the processor using the T2 reference | 0Fh 18h 3* |
| PSADBW | Packed sum of absolute byte differences | 0Fh F6h |
| PSHUFW | Packed shuffle word | 0Fh 70h |
| SFENCE | Store fence | 0Fh AEh / 7h |

*The number after the opcode indicates the different prefetch modes in the modR/M byte.
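Several of the integer extensions in the table operate on eight unsigned bytes packed into 64 bits. As an illustration, the following portable C models the byte average, the sum of absolute byte differences, and the byte mask extraction on a `uint64_t`. The function names are hypothetical and the code is a sketch of the arithmetic, not the intrinsics.

```c
#include <stdint.h>

/* Extract byte i (0 = least significant) from a packed 64-bit value. */
static unsigned byte_at(uint64_t v, int i)
{
    return (unsigned)((v >> (8 * i)) & 0xFFu);
}

/* Packed average of unsigned bytes, rounding up: (x + y + 1) >> 1. */
static uint64_t avg_u8x8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned avg = (byte_at(a, i) + byte_at(b, i) + 1u) >> 1;
        r |= (uint64_t)avg << (8 * i);
    }
    return r;
}

/* Sum of absolute differences of the eight byte pairs (at most 8 * 255). */
static unsigned sad_u8x8(uint64_t a, uint64_t b)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned x = byte_at(a, i), y = byte_at(b, i);
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum;
}

/* Collect the most significant bit of each byte into an 8-bit mask. */
static unsigned movemask_u8x8(uint64_t a)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i)
        mask |= (byte_at(a, i) >> 7) << i;
    return mask;
}
```

For instance, averaging the bytes FFh and 00h gives 80h because of the rounding term, and the mask of 8000000000000080h is 81h (bits 7 and 0 set).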

For further information regarding code optimization on the AMD-K6-2 processor, see the AMD-K6-2 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the AMD-K6 family processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007. This document provides in-depth discussions of code optimization techniques for the AMD Athlon processor.

See https://go.microsoft.com/fwlink/?LinkID=95131 for the online versions of these documents.

See Also

Reference

AMD 3DNow! Technology Overview and Intrinsics