Quantifiers (Scripting)

 

Sometimes you do not know how many characters there are to match. To accommodate that uncertainty, regular expressions support quantifiers. A quantifier specifies how many times a given component of your regular expression must occur for the match to succeed.

Quantifiers and Associated Meanings

| Character | Description |
| --- | --- |
| * | Matches the preceding character or subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding character or subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding character or subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}. |
| {n} | n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
| {n,} | n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'. |
| {n,m} | m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers. |
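The table entries can be checked with a short sketch in modern JavaScript, which uses the same regular expression literal syntax as the JScript examples. The sample strings come from the table. The helper name `firstMatch` is an assumption introduced here for illustration only.

```javascript
// Hypothetical helper (not part of the original text): returns the first
// substring matched by the pattern, or null when nothing matches.
function firstMatch(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

console.log(firstMatch(/zo*/, "z"));           // z    (zero o's are allowed)
console.log(firstMatch(/zo+/, "z"));           // null (at least one o is required)
console.log(firstMatch(/do(es)?/, "does"));    // does (the optional group is present)
console.log(firstMatch(/o{2}/, "Bob"));        // null (only one o)
console.log(firstMatch(/o{2}/, "food"));       // oo
console.log(firstMatch(/o{2,}/, "foooood"));   // every o in the run
console.log(firstMatch(/o{1,3}/, "fooooood")); // ooo  (stops after three)
```

Each call reports only the first match, which is enough to see where a quantifier starts and stops consuming characters.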

With a large input document, chapter numbers could easily exceed nine, so you need a way to handle two- or three-digit chapter numbers. Quantifiers give you that capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the identical match:

"Chapter [1-9][0-9]*"

Notice that the quantifier appears after the range expression. Therefore, it applies to the entire range expression which, in this case, specifies only digits from 0 through 9, inclusive.

The '+' quantifier is not used here because a digit is not required in the second or subsequent position. The '?' quantifier is not used either, because it would limit chapter numbers to two digits. The leading [1-9] range ensures that at least one digit follows 'Chapter' and the space character.
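As a rough illustration of the chapter-heading expression, the following modern JavaScript sketch runs it against a few headings. The sample headings and the function name `matchChapter` are assumptions made up for this example.

```javascript
// The pattern from the text: one digit 1-9, then zero or more digits 0-9.
const chapterPattern = /Chapter [1-9][0-9]*/;

// Hypothetical helper: returns the matched heading text, or null.
function matchChapter(heading) {
  const result = heading.match(chapterPattern);
  return result === null ? null : result[0];
}

// Invented sample headings, chosen to cover one, two, and three digits,
// plus two inputs that should not match.
const headings = ["Chapter 7", "Chapter 42", "Chapter 128", "Chapter 0", "Chapter"];
for (const heading of headings) {
  console.log(heading, "->", matchChapter(heading));
}
```

The last two headings yield null: "Chapter 0" fails because the first digit must be 1 through 9, and "Chapter" fails because no space and digit follow the word.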

If you know that your document has no more than 99 chapters, you can use the following JScript expression to specify at least one, but no more than two, digits:

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

One disadvantage of the expression shown above is that if a chapter number is greater than 99, it still matches only the first two digits. Another disadvantage is that somebody could create a Chapter 0 and it would match. Better JScript expressions for matching at most two digits are the following:

/Chapter [1-9][0-9]?/

-or-

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent:

"Chapter [1-9][0-9]?"

-or-

"Chapter [1-9][0-9]{0,1}"
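The difference between the two-digit expressions can be seen in a small modern JavaScript sketch. The names `loose`, `strict`, and `firstMatch`, and the sample inputs, are assumptions chosen for this illustration.

```javascript
const loose = /Chapter [0-9]{1,2}/;    // accepts a leading 0
const strict = /Chapter [1-9][0-9]?/;  // first digit must be 1-9

// Hypothetical helper: first matched substring, or null.
function firstMatch(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

console.log(firstMatch(loose, "Chapter 0"));    // Chapter 0
console.log(firstMatch(strict, "Chapter 0"));   // null
console.log(firstMatch(strict, "Chapter 5"));   // Chapter 5
console.log(firstMatch(loose, "Chapter 123"));  // Chapter 12
console.log(firstMatch(strict, "Chapter 123")); // Chapter 12
```

Both expressions still stop after two digits on "Chapter 123", so the stricter form fixes only the Chapter 0 problem.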

The '*', '+', and '?' quantifiers are all greedy; that is, they match as much text as possible. Sometimes that is not what you want, and you need a minimal match instead.

Say, for example, you are searching an HTML document for an occurrence of a chapter title enclosed in an H1 tag. That text appears in your document as:

<H1>Chapter 1: Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol (>) at the end of the closing H1 tag.

/<.*>/

The VBScript regular expression is:

"<.*>"

If all you really wanted to match was the opening H1 tag, the following non-greedy expression matches only <H1>.

/<.*?>/

-or-

"<.*?>"

Placing a '?' after a '*', '+', or '?' quantifier changes the expression from a greedy match to a non-greedy, or minimal, match.
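The greedy and non-greedy forms can be compared side by side in a brief modern JavaScript sketch. The input string is the H1 line from the text; the variable names are assumptions for illustration.

```javascript
const html = "<H1>Chapter 1: Introduction to Regular Expressions</H1>";

// Greedy: .* consumes as much as possible, so the match ends at the
// last > in the string.
const greedy = html.match(/<.*>/)[0];

// Non-greedy: .*? consumes as little as possible, so the match ends at
// the first > after the opening <.
const lazy = html.match(/<.*?>/)[0];

console.log(greedy === html); // true: the whole line is matched
console.log(lazy);            // <H1>
```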