C++ Tokens

A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.

Tokens are usually separated by "white space." White space can be one or more of the following:

  • Blanks

  • Horizontal or vertical tabs

  • New lines

  • Formfeeds

  • Comments
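As a minimal sketch of the list above, the declarations below use different kinds of white space between the same tokens. The variable names and the helper `separator_sum` are hypothetical and exist only for this illustration; `constexpr` is used so the values can be checked at compile time.

```cpp
// Each declaration below produces the same sequence of tokens:
// constexpr, int, an identifier, =, 1, ;

// Several blanks between tokens.
constexpr   int   blanks   =   1;

// New lines between tokens.
constexpr
int
newline_split
=
1
;

// A comment separating "int" from the identifier. Without any separator,
// "intcommented" would be scanned as a single identifier.
constexpr int/* comment acts as white space */commented = 1;

// Hypothetical helper: adds the three values to show that all three
// declarations were parsed as ordinary int definitions.
constexpr int separator_sum() {
    return blanks + newline_split + commented;
}
```

All three variables end up as ordinary `int` constants with the value 1, so `separator_sum()` evaluates to 3.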

The following are considered tokens:

  • keyword

  • identifier

  • constant

  • operator

  • punctuator

The following are considered preprocessing tokens:

  • header-name

  • identifier

  • pp-number

  • character-constant

  • string-literal

  • operator

  • punctuator

  • each nonwhite-space character that cannot be one of the above

The parser separates tokens from the input stream by creating the longest token possible using the input characters in a left-to-right scan. Consider this code fragment:

a = i+++j;

The programmer who wrote the code might have intended either of these two statements:

a = i + (++j);

a = (i++) + j;

Because the parser creates the longest token possible from the input stream, it scans "+++" as the token ++ followed by the token +. The expression is therefore split into the tokens i, ++, +, and j, which gives the second interpretation.
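The following sketch checks the result of the longest-token rule by comparing the fragment with an explicitly spaced version. The `Values` type and the function names `longest_token` and `spaced_tokens` are hypothetical helpers introduced for this example, and the code assumes C++14 or later so that the functions can be `constexpr`.

```cpp
// Hypothetical helper type: holds the variables after the statement has run.
struct Values {
    int a;
    int i;
    int j;
};

// Runs the fragment from the text unchanged. The scanner reads "+++" as
// "++" followed by "+", so the statement behaves like a = (i++) + j.
constexpr Values longest_token(int i, int j) {
    int a = 0;
    a = i+++j;
    return Values{a, i, j};
}

// The other reading has to be spelled out with white space (or parentheses)
// so that the scanner produces "+" followed by "++".
constexpr Values spaced_tokens(int i, int j) {
    int a = 0;
    a = i + ++j;
    return Values{a, i, j};
}
```

Starting from i = 1 and j = 10, `longest_token` yields a = 11 with i incremented to 2 and j unchanged, while `spaced_tokens` yields a = 12 with i unchanged and j incremented to 11.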

See Also

Reference

Lexical Conventions