Compare 2 Array of objects each 50k lines each

Trey Burkhalter 1 Reputation point
2021-02-02T17:24:43.513+00:00

I have a script that I have been working on that takes two files (both files are each 50k+ lines). The object of this script is create a new combined file. There is a key account number that match in both files. The issue is that when I run the script, it will run for 5 hours+ and I need to cut the time down. I have an idea that would make it faster, but it using a technique that I am unsure that is possible or even how to do it.

The first array is setup:

#**********************************************
#***          BORROW.TXT File Setup         ***
#**********************************************
#This sets up the array of objects for the Nelnet BORROW.TXT file

echo "Processing file: $FileSource\BORROW.TXT"
ForEach ($Line in $BorrowFile)
{
    $BrrwArray += New-Object psobject -Property @{
        BRBSS = $($Line.substring(0,9));
        BRBNMF = $($Line.Substring(9,14));
        BRBNML = $($Line.Substring(23,25));
        BRBNMM = $($Line.Substring(48,1));
        BRBADS = $($Line.Substring(49,1));
        BRBPA1 = $($Line.Substring(56,30));
        BRBPA2 = $($Line.Substring(86,30));
        BRBRPC = $($Line.Substring(116,18));
        BRBPZC = $($Line.Substring(134,15));
        BRQSTA = $($Line.Substring(153,2));
        BRBPAC = $($Line.Substring(162,5));
        BRBPPN = $($Line.Substring(167,11));
        BRBDOB = $($Line.Substring(355,7));
    }

}

The second array is setup in a similar manner:

#********************************************
#***          LOAN.TXT File Setup         ***
#********************************************
#This sets up the array of objects for the Nelnet LOAN.TXT file
echo "Processing file: $FileSource\LOAN.TXT"
ForEach ($Line in $LoanFile)
{
    $LNArray += New-Object psobject -Property @{
        LNBSS = $($Line.substring(0,9));
        LNLLN = $($Line.substring(9,2));
        LNLLTY = $($Line.substring(11,4));
        LNLSTS = $($Line.substring(15,3));
        LNGGR = $($Line.substring(93,1));
        LNQSSC = $($Line.substring(103,6));
        LNLIRT = $($($Line.substring(120,5)).trimstart('0'));
        LNLGSD = $($Line.substring(126,7));
        LNL1DD = $($Line.substring(133,7));
        LNLAMD = $($Line.substring(140,9));
        LNLCUP = $($Line.substring(183,9));
        LNBOND = $($Line.substring(295,3));
        LNLN02 = $($Line.substring(548,3));
        LNOQST = $($Line.substring(656,1))
    }


}

The key field that matches in each array is BRBSS and LNBSS. In the first array each line is unique with no repeats, however, in the second array there can be repeats of the field LNBSS based on accounts each member has opened (IE: car loan, revolving loans, student loans, etc...).

The merging loop looks like this:

echo "Merging Files to: $OutFile"
ForEach ($BrwItem in $BrrwArray)
{   
    ForEach ($LNItem in $LNArray)
    {
        If ($($LNItem.LNBSS) -eq $($BrwItem.BRBSS))
        {
               #merge data
        }
    }
}

The idea is to create a smaller array that only contains the data from the array LNItem where LNBSS matches $BrwItem.BRBSS.

What is the best way to accomplish this in the fast most effective way so I am not basically looping 50k+ * 50k+times?

Windows Server PowerShell
Windows Server PowerShell
Windows Server: A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications.PowerShell: A family of Microsoft task automation and configuration management frameworks consisting of a command-line shell and associated scripting language.
5,383 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Rich Matheisen 45,096 Reputation points
    2021-02-02T21:09:30.813+00:00

    Since one file has only unique values, try using a hash instead of an array of (expensive) PSObject types. This uses a "hash of hashes" for the Borrow.Txt file. After the hash has been built you only need to process the Loan.txt file line-by-line.

    #**********************************************
     #***          BORROW.TXT File Setup         ***
     #**********************************************
     #This sets up the array of objects for the Nelnet BORROW.TXT file
    
     echo "Processing file: $FileSource\BORROW.TXT"
    
    $BrrwHash = @{}
    Get-Content $BorrowFile |
        ForEach-Object{
            $BrrwHash[$_.substring(0,9)] = @{
                BRBNMF = $_.Substring(9,14)
                BRBNML = $_.Substring(23,25)
                BRBNMM = $_.Substring(48,1)
                BRBADS = $_.Substring(49,1)
                BRBPA1 = $_.Substring(56,30)
                BRBPA2 = $_.Substring(86,30)
                BRBRPC = $_.Substring(116,18)
                BRBPZC = $_.Substring(134,15)
                BRQSTA = $_.Substring(153,2)
                BRBPAC = $_.Substring(162,5)
                BRBPPN = $_.Substring(167,11)
                BRBDOB = $_.Substring(355,7)
            }
    
    echo "Merging Files to: $OutFile"
    Get-Content "Loan.txt" |
        ForEach-Item{
            $lnbss = $_.substring(0,9)
            If ($BrrwHash.ContainsKey($lnbss){
                $BrrwValues = $BrrwHash[$lnbss]
                #merge data
                # using, for example $brrwValues['BRBNMF'] and $_.Substring(x,y)
            }
        }
    

    Because the hash is fairly large you might want to have a look at using the System.Collections.Hashtable class to create a new object and specify an initial size for the hash. If you think the file contains 50K items use 50000 (or 55000). That should reduce the time to create the hash.


  2. Titan 206 Reputation points
    2021-02-02T21:39:01.877+00:00

    Hello @Trey Burkhalter !

    Here is an suggestion you need to adjust.

    Does it works for you?

    $borrow = 'aaghkhkggg','bbhjkhjkhk','ccsdfdffdd','ddbnmbnnnm'  
      
    $loan = 'aa12321323','aa55555555','bb75367567','bb12321323','bb34455454','dd--------','ee????????'  
    
    # create a hashtable  
    $result = @{}  
    
    # create a key for each accountnumber and assign to it an array with the corresponding line.  
    $borrow | ForEach-Object { $result.add($_.substring(0,2), @($_)) }  
    
    # add each loan entry to the corresponding account number key  
    $loan | ForEach-Object {  
      $key = $_.substring(0,2)  
      if( $result.ContainsKey($key) )  
      {  
        $result[$key] += @($_)  
      }  
      else  
      {  
        Write-Host "loan not in borrow: $_"  
      }  
    }  
    
    # iterate through the result and do the work  
    # your borrow line is in $($result[$_][0])  
    $result.Keys | ForEach-Object { "Borrow: $_`nFirst loan: $($result[$_][1])`n"}  
    

    Keep attention, 'cc' does not have any loan

    Best whishes

    ---
    If this answer is helpfull, accept and upvote it please

    0 comments No comments