SSIS by example #1 - Processing files in folder

(Português) - (English)

Scenario: a customer wants to create a SSIS package to import all the files in a folder to a SQL Server database, but only the files that haven't been processed yet. The files that were imported are registered in a table with date and time of the processing and the number of rows inserted in the destination table.

I'm using SQL Server 2008 RC0 and VS2008 SP1 Beta1 (in my last post I explain how I managed to install BIDS). The .sql file and the solution are attached to this post.  The steps to accomplish this are:

1 - Create table SimpleData and FileControl table in any database (I used SSIS DB) - Figure 1.
2 - Create a new Integration Services solution using BIDS.
3 - Add a data flow task to the control flow, name it "Import file" and edit it.
    A - Add a flat file source, name it "Text file" and edit it's properties.
    B - Create a new connection manager, name it "TextFile_Connection" and define the filename as "SimpleData.txt" (browse until the directory that contains the files - c:\temp\SSIS in my case). See Figure 2.
    C - In the advanced tab, edit the columns properties: Column1 = (name: IDData, type: four byte signed integer DT_I4), Column2 =  (name: Name, type: string DT_STR, outputColumnWidth: 200), Column3 =  (name: Age, type: single-byte signed integer DT_I1).
    D -  The preview should display correctly all three columns with one record. Finish editing this source.
5 - Add an OleDb Destination and link the source to the destination using the green arrow. Name it "SSIS_SimpleData" and edit it.
    A - Create a new data connection pointing to your server and database, using the credentials you desire. Test the connection.
    B - Select [dbo].[SimpleData] as the destination table.
    C - In the mappings, delete the arrow that connects IDData (I want the destination IDENTITY to work). Since the column names generated by the file source are the same as the destination columns, the mappings are automatically done for you.
6 - Run your package, check the result and truncate the table to clean our test. Now let's get to more complicated stuff.
7 - Add three variables at the package level: (FilePath - type: String), (Records - type: Int32), (Processed - type: Int32). See figure 3.
8 - Add a for each loop container, name it "Loop thru files" and edit it.
    A - Select the "Foreach File Enumerator", point to the directory where the files reside and define the files as "*.txt". See figure 4.
    B - In the variable mappings tabs, add "User::FilePath" mapped to index 0 (yes, zero). Save and close. 
9 - Add the "Import file" task to the loop container (if you drag it around, both should move together).
10 - Add two "Execute SQL Task" to the loop container, name it "Check processed" and "Register Processed". Link the green arrows in the following order: Check -> Import -> Register, as in figure 5.
11 - Edit "Check processed" task.
    A - Use the same connection string defined in step 5A ( in my case).
    B - Define the result set as "Single row".
    C - Define the SQL Statement as "SELECT COUNT(*) AS Processed FROM FileControl WHERE ImportedFilename = ?".
    D - Add one parameter mapping: (User::FilePath - Input - VARCHAR - 0 (yes, Parameter name is zero) - 200). 
    E - Add one Result Set: (Result Name = Processed, Variable Name = User::Processed). Save and close.  
12 - Edit the green arrow connecting Check and Import tasks.
    A - Select evaluation operation as "Expression and Constraint".
    B - Define expression as "@Processed == 0", as in figure 6. Close and save.
13 - In the connection manager window, click in the connection "TextFile_Connection" and edit it's properties (F4).
    A - Edit the connection expressions, adding: (ConnectionString - @[User::FilePath]).
    B - Clear the connection string property (that was pointing to SimpleData.txt file).
14 - Disable the Register task and run the package. At this time, all four files should be imported. Truncate the SimpleData table to clean up our test.
15 - Edit the Import File task. See figure 7 for guidance.
    A - Add a Multicast transformation between the source and the destination (redo the mappings).
    B - Add a Row Count transformation to another connection of the multicast, edit properties and map VariableName to User::Records. Stop editing the data flow task.
16 - Edit the "Register processing" task.
    A - Use the same connection string defined in step 5A ( in my case).
    B - Define the SQL Statement as "INSERT INTO FileControl (ImportedFileName, RecordsImported) VALUES (?, ?)".
    C - Add two parameter mapping: (User::FilePath - Input - VARCHAR - 0 (yes, parameter name is zero) - 200) and (User::Records - Input - LONG - 1 - -1). Close and save.

At this point your package is done!
To make sure our tests get more interesting, move files SimpleData_3.txt and SimpleData_4.txt to another directory and execute the following steps:

1 - Run the package (it should import both files correctly)
2 - Check if the SimpleData table has three records and that the imported files are registered in the FileControl table.
3 - Move the files 3 and 4 back to the original directory and rerun the package.
4 - Check if the SimpleData table has eight records and all four files are registered in the FileControl table.
5 - Rerun the package. At this time neither import file and register processing should be executed.

This example resolves the scenario mentioned at the beginning of the post. You can play around with a similar solution, but instead of controlling the files using a table, you can move processed files to another directory.

Em breve eu vou traduzir o artigo, quem quiser colaborar e me ajudar na tradução... :-)


(1) image        (2) image

(3) image      (4) image

(5) image      (6)  image

(7) image      (8) image

Luciano Caixeta Moreira
This post is provided "AS IS" and confers no right