How to: Populate Object Collections from Multiple Sources (LINQ)

This example shows how to merge data from different sources into a sequence of new types. The examples in the following code merge two arrays of strings and convert the exam scores to integers along the way. However, the same principle applies to any two data sources, including any combination of in-memory objects, such as results from LINQ to SQL queries, ADO.NET datasets, and XML documents.

Note

Do not try to join in-memory data or data in the file system with data that is still in a database. Such cross-domain joins can yield undefined results because of different ways in which join operations might be defined for database queries and other types of sources. Additionally, there is a risk that such an operation could cause an out-of-memory exception if the amount of data in the database is large enough. To join data from a database to in-memory data, first call ToList or ToArray on the database query, and then perform the join on the returned collection.
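
The pattern recommended in the note can be sketched as follows. This is a minimal illustration, not part of the original sample. No database is available here, so the hypothetical GetEnrollmentQuery method applies AsQueryable to an array as a stand-in for a database query. The Enrollment type, the course names, and the method names are all assumptions made for the sketch. Only the student row comes from the code comments in this topic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type. In a real project this would be a database entity.
class Enrollment
{
    public int StudentID { get; set; }
    public string Course { get; set; }
}

static class MaterializeBeforeJoin
{
    // Stand-in for a database query (assumption): AsQueryable over an array
    // is used only so that the sketch runs without a database.
    public static IQueryable<Enrollment> GetEnrollmentQuery()
    {
        return new[]
        {
            new Enrollment { StudentID = 111, Course = "Math" },
            new Enrollment { StudentID = 112, Course = "Art" }
        }.AsQueryable();
    }

    public static List<string> Describe(string[] nameLines)
    {
        // Step 1: execute the query and copy its rows into memory.
        List<Enrollment> enrollments = GetEnrollmentQuery().ToList();

        // Step 2: join the in-memory copy to the in-memory lines.
        var query =
            from line in nameLines
            let fields = line.Split(',')
            join e in enrollments
                on Convert.ToInt32(fields[2]) equals e.StudentID
            select fields[1] + " " + fields[0] + ": " + e.Course;

        return query.ToList();
    }

    static void Main()
    {
        // Columns: Last, First, ID, Grade level
        string[] nameLines = { "Omelchenko,Svetlana,111,2" };

        foreach (string description in Describe(nameLines))
        {
            Console.WriteLine(description);
        }
    }
}
```

Calling ToList first means the join itself runs entirely in LINQ to Objects, so only one definition of the join operation is involved.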

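To create the data file

Create the names.csv and scores.csv files as described in How to: Join Content from Dissimilar Files (LINQ). The examples read both files from the relative path ../../../.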
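
The examples in this topic read names.csv and scores.csv. As a minimal sketch for trying them out, the following program writes one-row versions of both files. It uses only the sample row quoted in the code comments of the first example, and the full data set is not reproduced here. The class name SampleDataFiles and the optional folder argument are assumptions made for the sketch.

```csharp
using System;
using System.IO;

static class SampleDataFiles
{
    // Writes one-row versions of names.csv and scores.csv into the given folder.
    public static void Create(string folder)
    {
        // Columns: Last, First, ID, Grade level
        File.WriteAllLines(Path.Combine(folder, "names.csv"),
            new[] { "Omelchenko,Svetlana,111,2" });

        // Columns: StudentID, Exam1, Exam2, Exam3, Exam4
        File.WriteAllLines(Path.Combine(folder, "scores.csv"),
            new[] { "111,97,92,81,60" });
    }

    static void Main(string[] args)
    {
        // Pass the target folder as the first argument. If no argument is
        // given, the files are written to the current directory.
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        Create(folder);
        Console.WriteLine("Wrote names.csv and scores.csv to " + folder);
    }
}
```

Because the examples open the files with the relative path ../../../, the folder passed to this program should be the one that path resolves to from the working directory of the example program.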

Example

The following example shows how to use a named type Student to store merged data from two in-memory collections of strings that simulate spreadsheet data in .csv format. The first collection of strings represents the student names and IDs, and the second collection represents the student ID (in the first column) and four exam scores.

Class Student
    Public FirstName As String 
    Public LastName As String 
    Public ID As Integer 
    Public ExamScores As List(Of Integer)
End Class 

Class PopulateCollections

    Shared Sub Main()

        ' Join content from spreadsheets into a list of Student objects. 
        ' names.csv contains the student name 
        ' plus an ID number. scores.csv contains the ID and a  
        ' set of four test scores. The following query joins 
        ' the scores to the student names by using ID as a 
        ' matching key, and then projects the results into a new type. 

        Dim names As String() = System.IO.File.ReadAllLines("../../../names.csv")
        Dim scores As String() = System.IO.File.ReadAllLines("../../../scores.csv")

        ' Name:    Last[0],       First[1],  ID[2],     Grade Level[3] 
        '          Omelchenko,    Svetlana,  111,       2 
        ' Score:   StudentID[0],  Exam1[1],  Exam2[2],  Exam3[3],  Exam4[4] 
        '          111,           97,        92,        81,        60 

        ' This query joins two dissimilar spreadsheets based on common ID value. 
        ' Multiple from clauses are used instead of a join clause 
        ' in order to store results of id.Split. 
        ' Note the dynamic creation of a list of ints for the 
        ' ExamScores member. We skip 1 because the first string 
        ' in the array is the student ID, not an exam score. 
        Dim scoreQuery1 = From name In names _
                         Let n = name.Split(New Char() {","c}) _
                         From id In scores _
                         Let s = id.Split(New Char() {","c}) _
                         Where n(2) = s(0) _
                         Select New Student() _
                         With {.FirstName = n(1), .LastName = n(0), .ID = Convert.ToInt32(n(2)), _
                               .ExamScores = (From scoreAsText In s Skip 1 _
                                             Select Convert.ToInt32(scoreAsText)).ToList()}

        ' Optional. Store the query results for faster access 
        ' in future queries. May be useful with very large data files. 
        Dim students As List(Of Student) = scoreQuery1.ToList()

        ' Display the list contents (last name first, as in names.csv) 
        ' and perform a further calculation 
        For Each s In students
            Console.WriteLine("The average score of " & s.LastName & " " & _
                              s.FirstName & " is " & s.ExamScores.Average())
        Next 

        ' Keep console window open in debug mode.
        Console.WriteLine("Press any key to exit.")
        Console.ReadKey()
    End Sub 
End Class 
' Output:  
'The average score of Adams Terry is 85.25 
'The average score of Fakhouri Fadi is 92.25 
'The average score of Feng Hanying is 88 
'The average score of Garcia Cesar is 88.25 
'The average score of Garcia Debra is 67 
'The average score of Garcia Hugo is 85.75 
'The average score of Mortensen Sven is 84.5 
'The average score of O'Donnell Claire is 72.25 
'The average score of Omelchenko Svetlana is 82.5 
'The average score of Tucker Lance is 81.75 
'The average score of Tucker Michael is 92 
'The average score of Zabokritski Eugene is 83

using System;
using System.Collections.Generic;
using System.Linq;

class Student
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int ID { get; set; }
    public List<int> ExamScores { get; set; }
}

class PopulateCollections
{
    static void Main()
    {
        // These data files are defined in How to: Join Content from Dissimilar Files (LINQ)  
        string[] names = System.IO.File.ReadAllLines(@"../../../names.csv");
        string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");

        // Merge the data sources using a named type. 
        // var could be used instead of an explicit type. 
        // Note the dynamic creation of a list of ints for the 
        // ExamScores member. We skip 1 because the first string 
        // in the array is the student ID, not an exam score.
        IEnumerable<Student> queryNamesScores =
            from name in names
            let x = name.Split(',')
            from score in scores
            let s = score.Split(',')
            where x[2] == s[0]
            select new Student()
            {
                FirstName = x[1],   // names.csv columns: Last[0], First[1], ID[2]
                LastName = x[0],
                ID = Convert.ToInt32(x[2]),
                ExamScores = (from scoreAsText in s.Skip(1)
                              select Convert.ToInt32(scoreAsText)).
                              ToList()
            };

        // Optional. Store the newly created student objects in memory 
        // for faster access in future queries. Could be useful with 
        // very large data files.
        List<Student> students = queryNamesScores.ToList();

        // Display the results (last name first, as in names.csv) 
        // and perform one further calculation. 
        foreach (var student in students)
        {
            Console.WriteLine("The average score of {0} {1} is {2}.",
                student.LastName, student.FirstName, student.ExamScores.Average());
        }

        //Keep console window open in debug mode
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}
/* Output: 
    The average score of Adams Terry is 85.25.
    The average score of Fakhouri Fadi is 92.25.
    The average score of Feng Hanying is 88.
    The average score of Garcia Cesar is 88.25.
    The average score of Garcia Debra is 67.
    The average score of Garcia Hugo is 85.75.
    The average score of Mortensen Sven is 84.5.
    The average score of O'Donnell Claire is 72.25.
    The average score of Omelchenko Svetlana is 82.5.
    The average score of Tucker Lance is 81.75.
    The average score of Tucker Michael is 92.
    The average score of Zabokritski Eugene is 83.
 */

The data sources in these examples are arrays of strings read from the two .csv files. The query does not use a join clause. Instead, it uses two from clauses and a where clause, so that the result of each Split call can be stored in a let clause and reused. The ID serves as the matching key, and it is compared as a string: the third field of each line in names.csv against the first field of each line in scores.csv. The first let clause (identifier n in Visual Basic, x in C#) stores the array of strings created by splitting a line from names.csv at each comma. The second let clause (identifier s) stores the array created by splitting a line from scores.csv in the same way. In the select clause, an object initializer is used to instantiate each new Student object from the two arrays: the ID field is converted to an integer, and every field after the first in the score line is converted to an integer and added to the ExamScores list.
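
The code comments note that the queries use multiple from clauses instead of a join clause. As a point of comparison, the same match can be written with a join clause if each line is split before the join. The following is a minimal sketch rather than part of the original sample. It uses one inline row per source, taken from the code comments, instead of the .csv files. The names JoinSketch and AverageById are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinSketch
{
    // Returns the average exam score for each student ID found in both sources.
    public static Dictionary<string, double> AverageById(string[] names, string[] scores)
    {
        var query =
            // Split each line up front so that the join keys are simple indexers.
            from n in names.Select(line => line.Split(','))
            join s in scores.Select(line => line.Split(','))
                on n[2] equals s[0]   // ID column in both sources, compared as strings
            select new
            {
                ID = n[2],
                // Skip the ID column, then convert the remaining fields to integers.
                Average = s.Skip(1).Select(text => Convert.ToInt32(text)).Average()
            };

        return query.ToDictionary(r => r.ID, r => r.Average);
    }

    static void Main()
    {
        string[] names = { "Omelchenko,Svetlana,111,2" };
        string[] scores = { "111,97,92,81,60" };

        foreach (KeyValuePair<string, double> pair in AverageById(names, scores))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

The join form builds a lookup on the second source, while the two from clauses compare every pair of lines. Which one reads better depends on whether the split results are needed elsewhere in the query.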

If you do not have to store the results of a query, anonymous types can be more convenient than named types. Named types are required if you pass the query results outside the method in which the query is executed. The following example performs the same task as the previous example, but uses anonymous types instead of named types:

' This query uses an anonymous type 
' Note the dynamic creation of a list of ints for the 
' TestScores member. We skip 1 because the first string 
' in the array is the student ID, not an exam score. 
Dim scoreQuery2 = From name In names _
                 Let n = name.Split(New Char() {","c}) _
                 From id In scores _
                 Let s = id.Split(New Char() {","c}) _
                 Where n(2) = s(0) _
                 Select New With {.Last = n(0), _
                                  .First = n(1), _
                                  .TestScores = (From scoreAsText In s Skip 1 _
                                     Select Convert.ToInt32(scoreAsText)).ToList()}

' Display the list contents 
' and perform a further calculation 
For Each s In scoreQuery2
    Console.WriteLine("The average score of " & s.First & " " & s.Last & " is " & s.TestScores.Average())
Next

// Merge the data sources by using an anonymous type. 
// Note the dynamic creation of a list of ints for the 
// TestScores member. We skip 1 because the first string 
// in the array is the student ID, not an exam score. 
var queryNamesScores2 =
    from name in names
    let x = name.Split(',')
    from score in scores
    let s = score.Split(',')
    where x[2] == s[0]
    select new 
    {
        First = x[1],
        Last = x[0],
        TestScores = (from scoreAsText in s.Skip(1)
                      select Convert.ToInt32(scoreAsText))
                      .ToList()
    };

// Display the results and perform one further calculation. 
foreach (var student in queryNamesScores2)
{
    Console.WriteLine("The average score of {0} {1} is {2}.",
        student.First, student.Last, student.TestScores.Average());
}
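
The scope restriction on anonymous types can be illustrated with a short sketch. The hypothetical LoadStudents method below returns List<Student>, which any caller can consume. The same query projected into an anonymous type could only be used with its member names inside the method that declares it. The inline rows come from the code comments in the first example. The StudentLoader class and the method name are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Student
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int ID { get; set; }
    public List<int> ExamScores { get; set; }
}

static class StudentLoader
{
    // A named return type lets the query results leave this method.
    public static List<Student> LoadStudents(string[] names, string[] scores)
    {
        var query =
            from name in names
            let x = name.Split(',')
            from score in scores
            let s = score.Split(',')
            where x[2] == s[0]
            select new Student
            {
                LastName = x[0],    // column 0 holds the last name
                FirstName = x[1],   // column 1 holds the first name
                ID = Convert.ToInt32(x[2]),
                ExamScores = s.Skip(1).Select(text => Convert.ToInt32(text)).ToList()
            };

        return query.ToList();
    }

    static void Main()
    {
        List<Student> students = LoadStudents(
            new[] { "Omelchenko,Svetlana,111,2" },
            new[] { "111,97,92,81,60" });

        foreach (Student student in students)
        {
            Console.WriteLine("{0} {1} (ID {2}) has {3} exam scores.",
                student.FirstName, student.LastName, student.ID, student.ExamScores.Count);
        }
    }
}
```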

Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.

See Also

Concepts

LINQ and Strings

Reference

Object and Collection Initializers (C# Programming Guide)

Anonymous Types (C# Programming Guide)

Change History

Date         History                               Reason

July 2008    Added second set of code examples.    Content bug fix.