How to: Query the Contents of Files in a Folder (LINQ)

This example shows how to query the files in a specified directory tree, open each one, and inspect its contents. You could use this technique to build indexes or reverse indexes of the contents of a directory tree. The example performs a simple string search on files that have the .htm extension. For more complex pattern matching, use a regular expression. For more information, see How to: Combine LINQ Queries with Regular Expressions.

Example

Module Module1
    'QueryContents 
    Public Sub Main()

        ' Modify this path as necessary. 
        Dim startFolder = "c:\program files\Microsoft Visual Studio 9.0\VB\" 

        'Take a snapshot of the folder contents 
        Dim dir As New System.IO.DirectoryInfo(startFolder)
        Dim fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories)

        Dim searchTerm = "Visual Studio" 

        ' Search the contents of each file. 
        ' A regular expression created with the Regex class 
        ' could be used instead of the Contains method. 
        Dim queryMatchingFiles = From file In fileList _
                                 Where file.Extension = ".htm" _
                                 Let fileText = GetFileText(file.FullName) _
                                 Where fileText.Contains(searchTerm) _
                                 Select file.FullName

        Console.WriteLine("The term " & searchTerm & " was found in:")

        ' Execute the query. 
        For Each filename In queryMatchingFiles
            Console.WriteLine(filename)
        Next 

        ' Keep the console window open in debug mode.
        Console.WriteLine("Press any key to exit")
        Console.ReadKey()

    End Sub 

    ' Read the contents of the file. This is done in a separate 
    ' function in order to handle potential file system errors. 
    Function GetFileText(ByVal name As String) As String 

        ' Start with an empty string so that the query can 
        ' still call Contains on the result. 
        Dim fileContents = String.Empty

        ' If the file has been deleted since we took  
        ' the snapshot, ignore it and return the empty string. 
        If System.IO.File.Exists(name) Then
            fileContents = System.IO.File.ReadAllText(name)
        End If 

        Return fileContents

    End Function 
End Module

using System;
using System.Collections.Generic;
using System.Linq;

class QueryContents
{
    public static void Main()
    {
        // Modify this path as necessary. 
        string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";

        // Take a snapshot of the file system.
        System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);

        // This method assumes that the application has discovery permissions 
        // for all folders under the specified path.
        IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);

        string searchTerm = @"Visual Studio";

        // Search the contents of each file. 
        // A regular expression created with the Regex class 
        // could be used instead of the Contains method. 
        // queryMatchingFiles is an IEnumerable<string>. 
        var queryMatchingFiles =
            from file in fileList
            where file.Extension == ".htm" 
            let fileText = GetFileText(file.FullName)
            where fileText.Contains(searchTerm)
            select file.FullName;

        // Execute the query.
        Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
        foreach (string filename in queryMatchingFiles)
        {
            Console.WriteLine(filename);
        }

        // Keep the console window open in debug mode.
        Console.WriteLine("Press any key to exit");
        Console.ReadKey();
    }

    // Read the contents of the file. 
    static string GetFileText(string name)
    {
        string fileContents = String.Empty;

        // If the file has been deleted since we took  
        // the snapshot, ignore it and return the empty string. 
        if (System.IO.File.Exists(name))
        {
            fileContents = System.IO.File.ReadAllText(name);
        }
        return fileContents;
    }
}
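
The comments in the example note that a regular expression could replace the Contains call. The following is a minimal sketch of that substitution, not part of the original sample. To stay self-contained it queries an in-memory dictionary of file names and text instead of the file system; the class name RegexContentSketch, the method FindMatches, and the sample documents are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class RegexContentSketch
{
    // Returns the names of the documents whose text matches the pattern.
    // "documents" maps a file name to its already-loaded text (an
    // assumption made here so the sketch does not touch the disk).
    public static List<string> FindMatches(
        IDictionary<string, string> documents, string pattern)
    {
        // Build the Regex once, outside the query, so it is not
        // re-created for every document.
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);

        var query =
            from doc in documents
            where regex.IsMatch(doc.Value)
            orderby doc.Key
            select doc.Key;

        return query.ToList();
    }

    public static void Main()
    {
        var documents = new Dictionary<string, string>
        {
            { "a.htm", "Installed with Visual Studio 2008." },
            { "b.htm", "Nothing relevant here." },
            { "c.htm", "visual   studio tips" }
        };

        // Matches "visual" and "studio" separated by any whitespace,
        // ignoring case.
        foreach (string name in FindMatches(documents, @"visual\s+studio"))
        {
            Console.WriteLine(name);
        }
    }
}
```

In the original query, the same idea would amount to replacing fileText.Contains(searchTerm) with a call to IsMatch on a Regex created before the query.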

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5 or later. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imported namespace (Visual Basic) for the System.Linq namespace. The example fully qualifies the System.IO types, so a using directive for the System.IO namespace is optional in C# projects.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

Robust Programming

For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.
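
The GetFileText method in the example only checks whether the file still exists, so a file that is locked, deleted between the check and the read, or not readable by the current user would still throw and end the query. One possible hardening, shown here as a sketch under assumptions and not as the original sample's behavior, is to catch the exceptions that File.ReadAllText can raise and treat the file as empty. The names SafeReadSketch and GetFileTextSafe are hypothetical.

```csharp
using System;
using System.IO;

class SafeReadSketch
{
    // Hypothetical variant of GetFileText: returns an empty string when
    // the file cannot be read, so one bad file does not stop the query.
    public static string GetFileTextSafe(string name)
    {
        try
        {
            return File.ReadAllText(name);
        }
        catch (IOException)
        {
            // Includes FileNotFoundException, DirectoryNotFoundException,
            // and sharing violations on files locked by another process.
            return String.Empty;
        }
        catch (UnauthorizedAccessException)
        {
            // The caller lacks permission, or the path is a directory.
            return String.Empty;
        }
    }

    public static void Main()
    {
        // A path that does not exist: the read fails and yields "".
        string missing = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString() + ".htm");
        Console.WriteLine(GetFileTextSafe(missing).Length);

        // A file that does exist is returned unchanged.
        string present = Path.GetTempFileName();
        File.WriteAllText(present, "Visual Studio");
        Console.WriteLine(GetFileTextSafe(present));
        File.Delete(present);
    }
}
```

Silently returning an empty string keeps the query running but hides the failure. Whether to log or rethrow instead depends on the application.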

See Also

Concepts

LINQ to Objects

LINQ and File Directories