Extractors.Text()

Summary

The Text() extractor supports a variety of text file formats that all follow a row/column layout. It uses a configurable set of delimiters to identify row and column boundaries, along with several other parameters that control how the file is parsed. It produces a rowset whose columns follow the schema of the EXTRACT expression.

See Extractor Parameters (U-SQL) for the supported parameters and their default values.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The examples below use the sample data provided with your Data Lake Analytics account. See Prepare source data for additional information.
    
    @searchlog =
         EXTRACT UserId          int,
                 Start           DateTime,
                 Region          string,
                 Query           string,
                 Duration        int?,
                 Urls            string,
                 ClickedUrls     string
         FROM "/Samples/Data/SearchLog.tsv"
         USING Extractors.Text(delimiter: '\t', skipFirstNRows: 1);
    
    OUTPUT @searchlog 
    TO "/Output/ReferenceGuide/BuiltIn/UDOs/extractorText_SearchLog.csv" 
    USING Outputters.Csv();
    
    @Drivers =
        EXTRACT driver_id       int,
                name            string,
                street          string,
                city            string,
                region          string,
                zipcode         string,
                country         string,
                phone_numbers   string // Map
        FROM "/Samples/Data/AmbulanceData/Drivers.txt"
        USING Extractors.Text(delimiter: '\t', encoding: Encoding.Unicode);
    
    OUTPUT @Drivers 
    TO "/Output/ReferenceGuide/BuiltIn/UDOs/extractorText_Drivers.csv" 
    USING Outputters.Csv();
    
    // Enclose ASCII in brackets ([ASCII]) so that it is not parsed as a reserved U-SQL keyword.
    @Trips =
        EXTRACT date        DateTime,
                driver_id   int,
                vehicle_id  int,
                trips       string // Array
        FROM "/Samples/Data/AmbulanceData/DriverShiftTrips.csv"
        USING Extractors.Text(encoding: Encoding.[ASCII]);
    
    OUTPUT @Trips 
    TO "/Output/ReferenceGuide/BuiltIn/UDOs/extractorText_DriverShiftTrips.csv" 
    USING Outputters.Csv();
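    
    // The sketch below combines several other Extractors.Text() parameters in one call.
    // It is hypothetical: the input path, the @Events name, and the schema are assumptions
    // for illustration and are not part of the sample data. Confirm each parameter's exact
    // behavior and default in Extractor Parameters (U-SQL).
    //   delimiter: '|'      columns are separated by a pipe character
    //   rowDelimiter: "\n"  rows end with a line feed only
    //   quoting: false      double quotes are treated as ordinary characters
    //   nullEscape: "NULL"  the literal string NULL is read as a null value
    //   silent: true        rows with the wrong number of columns are skipped instead of failing the job
    @Events =
        EXTRACT event_id    int,
                event_name  string,
                duration    int?
        FROM "/Samples/Data/Hypothetical/Events.txt"
        USING Extractors.Text(delimiter: '|',
                              rowDelimiter: "\n",
                              quoting: false,
                              nullEscape: "NULL",
                              silent: true);
    
    OUTPUT @Events
    TO "/Output/ReferenceGuide/BuiltIn/UDOs/extractorText_Events.csv"
    USING Outputters.Csv();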
    

See Also