The goal of constituency parsing (also known as "phrase structure parsing") is to identify the phrases in the text. This can be useful when extracting information from text. Customers might want to find feature names or key phrases from a big body of text, and to see the modifiers and actions surrounding each such phrase.
To a linguist, a phrase is more than just a sequence of words. To be a phrase, a group of words has to come together to play a specific role in the sentence. That group of words can be moved together or replaced as a whole, and the sentence should remain fluent and grammatical.
Consider the sentence:
I want to find a new hybrid automobile with Bluetooth.
This sentence contains the noun phrase: "a new hybrid automobile with Bluetooth". How do we know that this is a phrase? We can rewrite the sentence (somewhat poetically) by moving that whole phrase to the front:
A new hybrid automobile with Bluetooth I want to find.
Or we could replace that phrase with a phrase of similar function and meaning, like "a fancy new car":
I want to find a fancy new car.
If instead we picked a different subset of words, these movement and replacement tests would lead to strange or unreadable sentences. Consider what happens when we move "find a new" to the front:
Find a new I want to hybrid automobile with Bluetooth.
The result is very difficult to read and understand.
The goal of a parser is to find all such phrases. Interestingly, in natural language, the phrases tend to be nested inside one another. A natural representation of these phrases is a tree, such as the following:
In this tree, the branches marked "NP" are noun phrases. There are several such phrases: "I", "a new hybrid automobile", "Bluetooth", and "a new hybrid automobile with Bluetooth".
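To make the nesting concrete, here is a minimal Python sketch that stores one plausible tree for the example sentence and collects its noun phrases. The tuple layout, the helper names `words` and `phrases`, and the exact bracketing are assumptions made for illustration: part-of-speech tags are left out, and a real parser may bracket the sentence differently.

```python
# Hypothetical, simplified tree for the example sentence. Each node is
# (label, child, child, ...) and each leaf is a plain word string.
# Part-of-speech tags are omitted and the bracketing is an assumption.
TREE = (
    "S",
    ("NP", "I"),
    ("VP", "want",
        ("S",
            ("VP", "to",
                ("VP", "find",
                    ("NP",
                        ("NP", "a", "new", "hybrid", "automobile"),
                        ("PP", "with", ("NP", "Bluetooth")),
                    ),
                ),
            ),
        ),
    ),
)


def words(node):
    """Return the words covered by a node, in sentence order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result


def phrases(node, label):
    """Collect the text of every subtree with the given label, top-down."""
    if isinstance(node, str):
        return []
    found = [" ".join(words(node))] if node[0] == label else []
    for child in node[1:]:
        found.extend(phrases(child, label))
    return found


noun_phrases = phrases(TREE, "NP")
print(noun_phrases)
```

Because the walk is top-down, the larger noun phrase "a new hybrid automobile with Bluetooth" is reported before the two smaller noun phrases nested inside it.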
| Label | Description | Example |
|---|---|---|
| ADJP | Adjective Phrase | "so rude" |
| ADVP | Adverb Phrase | "clear through" |
| CONJP | Conjunction Phrase | "as well as" |
| FRAG | Fragment, used for incomplete or fragmentary inputs | "Highly recommended..." |
| LST | List marker, including punctuation | "#4)" |
| NAC | Not A Constituent, used to indicate scoping of a non-constituent phrase | "and for a good deal" in "you get things and for a good deal" |
| NP | Noun Phrase | "a tasty potato pancake" |
| NX | Used within certain complex NPs to mark the head | |
| PP | Prepositional Phrase | "in the pool" |
| PRT | Particle | "out" in "ripped out" |
| QP | Quantity Phrase (i.e., complex measure/amount) within a Noun Phrase | "around $75" |
| RRC | Reduced Relative Clause | "still unresolved" in "many issues still unresolved" |
| S | Sentence or clause | "This is a sentence." |
| SBAR | Subordinate clause, often introduced by a subordinating conjunction | "as I left" in "I looked around as I left." |
| SBARQ | Direct question introduced by a wh-word or wh-phrase | "What was the point?" |
| SINV | Inverted declarative sentence | "At no time were they aware." (note how the normal subject "they" was moved to after the verb "were") |
| SQ | Inverted yes/no question, or main clause of a wh-question | "Did they get the car?" |
| UCP | Unlike Coordinated Phrase | "small and with bugs" (note how an adjective and a prepositional phrase are conjoined with "and") |
| VP | Verb Phrase | "ran into the woods" |
| WHADJP | Wh-adjective Phrase | "how uncomfortable" |
| WHNP | Wh-noun Phrase | "which potato", "how much soup" |
| WHPP | Wh-prepositional Phrase | "in which country" |
| X | Unknown, uncertain, or unbracketable | first "the" in "the... the soup" |
Trees here are written as S-expressions, following the conventions of the Penn Treebank.
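As an illustration of that bracketed notation, the following Python sketch reads a tree string into nested lists. The function names `parse_sexpr` and `leaves` are hypothetical, the sample string is an assumed bracketing of the NP example from the table (using the Penn Treebank tags DT, JJ, and NN), and the reader assumes well-formed input with balanced parentheses.

```python
import re


def parse_sexpr(text):
    """Parse a bracketed tree string into nested lists: [label, child, ...]."""
    # Tokens are "(", ")", or any run of characters without spaces or parens.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return tree


def leaves(node):
    """Return the words at the bottom of the tree, in order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:  # node[0] is the label, so skip it
        result.extend(leaves(child))
    return result


# Assumed bracketing of the table's NP example, for illustration only.
sample = "(NP (DT a) (JJ tasty) (NN potato) (NN pancake))"
tree = parse_sexpr(sample)
print(tree)
print(" ".join(leaves(tree)))
```

Nested lists keep the sketch small; a production system would more likely use a dedicated tree class with labels and children as separate fields.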