Windows PowerShell Tip of the Week

Here’s a quick tip on working with Windows PowerShell. These are published every week for as long as we can come up with new tips. If you have a tip you’d like us to share or a question about how to do something, let us know.

Calculating Text File Statistics

If you take a look at the archives for the Hey, Scripting Guy! column (which many people consider to be one of the better daily scripting columns published on TechNet), one thing jumps out at you: people calculate statistics for text files over and over again. That is, how many lines are in my file, how many words are in my file, how many characters are in my file, and so on. For various reasons, it’s never enough just to have a text file; people need to know everything there is to know about that text file.

Calculating text file statistics is relatively easy in VBScript (a little cumbersome, mind you, but relatively easy). Which leads to an obvious question: how easy is it to calculate text file statistics using Windows PowerShell? Let’s find out for ourselves.

To begin with, let’s assume we have a text file named C:\Scripts\Alice.txt, a file that contains the following text (stored in the file as one long line; it’s wrapped here only to fit the page):

Curiouser and curiouser!' cried Alice (she was so much surprised, that for the moment 
she quite forgot how to speak good English); 'now I'm opening out like the largest 
telescope that ever was! Good-bye, feet!' (for when she looked down at her feet, they 
seemed to be almost out of sight, they were getting so far off). 'Oh, my poor little 
feet, I wonder who will put on your shoes and stockings for you now, dears? I'm sure 
_I_ shan't be able! I shall be a great deal too far off to trouble myself about you: 
you must manage the best way you can; --but I must be kind to them,' thought Alice, 
'or perhaps they won't walk the way I want to go! Let me see: I'll give them a new 
pair of boots every Christmas.'

We’d like to know how many words are in this file, how many lines are in this file, and how many characters are in this file. How hard is that going to be? As it turns out, not very hard at all:

Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character

No, we didn’t leave anything out: one little line of code really can return all sorts of useful information about a text file. In order to get that information we simply use the Get-Content cmdlet to read the contents of the file C:\Scripts\Alice.txt. However, rather than display those contents to the screen (which is the default behavior of Get-Content), we instead “pipe” that information to the Measure-Object cmdlet. As the name implies, Measure-Object is designed to “measure” property values; for example, given a set of numbers, Measure-Object can calculate the sum and the average of those numbers, as well as report back the highest and lowest values in that set. (You say you’d like to see an example of that? No problem; that’s what this article is for.)
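To make the numeric case concrete, here’s a minimal sketch. The sample numbers and the variable names $numbers and $result are made up for illustration; the -Sum, -Average, -Maximum, and -Minimum parameters tell Measure-Object which statistics to calculate:

```powershell
# Hypothetical sample data; any set of numbers will do.
$numbers = 4, 8, 15, 16, 23, 42

# Ask Measure-Object for the sum, average, highest, and lowest values.
$result = $numbers | Measure-Object -Sum -Average -Maximum -Minimum

# Each statistic comes back as a property on the result object.
$result.Sum       # 108
$result.Average   # 18
$result.Maximum   # 42
$result.Minimum   # 4
```

Leave off a parameter and the corresponding property simply comes back empty.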

Of course, we didn’t pass Measure-Object a set of numbers. Instead, we passed it the contents of a text file; that’s why we tacked on the parameters -word (show me the number of words in the file), -line (show me the number of lines), and -character (show me the number of characters). In return, here’s what Measure-Object reports back:

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    1                           137                           708

Pretty cool, right?
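If you’d rather work with the individual numbers than look at a table, you can save the result in a variable and read its properties. Here’s a rough sketch of that idea; the sample file name, its two lines of text, and the variable names $path and $stats are all assumptions made up so the example can run on its own:

```powershell
# Hypothetical sample file in the temp folder, so the sketch runs anywhere.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stats-sample.txt'
Set-Content -Path $path -Value 'The quick brown fox', 'jumps over the lazy dog'

# Save the measurement instead of letting it print as a table.
$stats = Get-Content $path | Measure-Object -Word -Line -Character

# Pull out individual values by property name.
$stats.Lines        # 2
$stats.Words        # 9
$stats.Characters   # 42

# Clean up the sample file.
Remove-Item $path
```

The property names match the column headings in the table, which makes them easy to remember.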

Here’s another parameter you might find useful: -ignorewhitespace. By default, Measure-Object counts each blank space in a file as a character. In some cases that’s fine; at other times, however, you might want to ignore blank spaces. Do you want to ignore blank spaces? That’s fine; just tack the -ignorewhitespace parameter onto the end of your command, like so:

Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character -ignorewhitespace

Now take a look at the number of characters found in the file:

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    1                           137                           572

Obviously a big difference: 136 characters, one for each blank space between the file’s 137 words.
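If the whitespace count itself is what you’re after, you can run both measurements and subtract one from the other. The following is only a sketch under assumed names: the sample file, its one line of text, and the variables $path, $all, and $noSpaces are invented for the example:

```powershell
# Hypothetical one-line sample file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'whitespace-sample.txt'
Set-Content -Path $path -Value 'they were getting so far off'

# Count every character, then count again while skipping whitespace.
$all      = (Get-Content $path | Measure-Object -Character).Characters
$noSpaces = (Get-Content $path | Measure-Object -Character -IgnoreWhiteSpace).Characters

$all               # 28
$noSpaces          # 23
$all - $noSpaces   # 5 (the blank spaces between the six words)

# Clean up the sample file.
Remove-Item $path
```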

Incidentally, you aren’t limited to calculating statistics on text files; Measure-Object works equally well with variables. For example, suppose we assign a text value to a variable named $a:

$a = "This is a two-line value `n stored in a variable."

How many words, lines, and characters are in $a? Well, let’s try the following command and see for ourselves:

$a | Measure-Object -word -line -character

According to Measure-Object, it’s the following:

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    2                             9                            48
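The same approach appears to work when the variable holds several strings rather than one. In this sketch (the variable names $lines and $stats and the sample strings are ours, chosen just for illustration) each string in the array is treated as a line:

```powershell
# Hypothetical sample: an array of three strings stored in a variable.
$lines = 'Curiouser and curiouser!', 'cried Alice', 'Good-bye, feet!'

# Pipe the array to Measure-Object, just as we did with the file contents.
$stats = $lines | Measure-Object -Word -Line -Character

$stats.Lines        # 3
$stats.Words        # 7
$stats.Characters   # 50
```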

And people think that statistics are hard. They’re not, at least not when you have Windows PowerShell.