Develop U-SQL with Python, R, and C# for Azure Data Lake Analytics in Visual Studio Code

Learn how to use Visual Studio Code (VSCode) to write Python, R, and C# code-behind files for U-SQL and submit jobs to the Azure Data Lake service. For more information about Azure Data Lake Tools for VSCode, see Use the Azure Data Lake Tools for Visual Studio Code.

Before writing custom code-behind code, you need to open a folder or a workspace in VSCode.

Prerequisites for Python and R

Register the Python and R extension assemblies for your Azure Data Lake Analytics (ADL) account.

  1. Open your account in the Azure portal.

    • Select Overview.
    • Click Sample Script.
  2. Click More.

  3. Select Install U-SQL Extensions.

  4. A confirmation message is displayed after the U-SQL extensions are installed.

    Note

    For the best experience with the Python and R language services, install the VSCode Python and R extensions.

Develop a Python file

  1. Click New File in your workspace.

  2. Write your U-SQL script. The following is a code sample.

    REFERENCE ASSEMBLY [ExtPython];
    @t  = 
        SELECT * FROM 
        (VALUES
            ("D1","T1","A1","@foo Hello World @bar"),
            ("D2","T2","A2","@baz Hello World @beer")
        ) AS 
            D( date, time, author, tweet );
    
    @m  =
        REDUCE @t ON date
        PRODUCE date string, mentions string
        USING new Extension.Python.Reducer("pythonSample.usql.py", pyVersion : "3.5.1");
    
    OUTPUT @m
        TO "/tweetmentions.csv"
        USING Outputters.Csv();
    
  3. Right-click the script file, and then select ADL: Generate Python Code Behind File.

  4. An xxx.usql.py file is generated in your working folder. Write your code in the Python file. The following is a code sample.

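    def get_mentions(tweet):
        # Keep the words that start with '@', drop the '@', and join them with ';'.
        return ';'.join(w[1:] for w in tweet.split() if w[0] == '@')
    
    def usqlml_main(df):
        # df is a pandas DataFrame holding the rows of one REDUCE group (one date).
        del df['time']
        del df['author']
        df['mentions'] = df.tweet.apply(get_mentions)
        del df['tweet']
        # The remaining columns (date, mentions) must match the PRODUCE clause.
        return df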
    
  5. Right-click in the U-SQL file, and then click Compile Script or Submit Job to run the job.
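To see what the reducer does before submitting a job, the REDUCE step can be emulated locally. The following is a minimal pure-Python sketch that assumes the Python extension calls the code-behind once per `date` group. It is not the extension's runtime. Rows are plain dicts instead of a pandas DataFrame, and `reduce_on` and `mentions_reducer` are hypothetical names that mirror `REDUCE @t ON date` and `usqlml_main`.

```python
from itertools import groupby
from operator import itemgetter

def get_mentions(tweet):
    # Same logic as the code-behind: strip the leading '@' from each mention.
    return ';'.join(w[1:] for w in tweet.split() if w[0] == '@')

def mentions_reducer(rows):
    # Hypothetical list-of-dicts counterpart of usqlml_main: keep 'date',
    # replace 'tweet' with 'mentions', and drop 'time' and 'author'.
    return [{'date': r['date'], 'mentions': get_mentions(r['tweet'])} for r in rows]

def reduce_on(rows, key, reducer):
    # Emulates REDUCE ... ON key: sort, group by key, call reducer once per group.
    out = []
    for _, group in groupby(sorted(rows, key=itemgetter(key)), key=itemgetter(key)):
        out.extend(reducer(list(group)))
    return out

# The same two rows as the VALUES rowset in the U-SQL script.
t = [
    {'date': 'D1', 'time': 'T1', 'author': 'A1', 'tweet': '@foo Hello World @bar'},
    {'date': 'D2', 'time': 'T2', 'author': 'A2', 'tweet': '@baz Hello World @beer'},
]
m = reduce_on(t, 'date', mentions_reducer)
print(m)  # [{'date': 'D1', 'mentions': 'foo;bar'}, {'date': 'D2', 'mentions': 'baz;beer'}]
```

The sketch sorts the output by `date` for readability. U-SQL does not guarantee row order in `/tweetmentions.csv`.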

Develop an R file

  1. Click New File in your workspace.

  2. Write your code in the U-SQL file. The following is a code sample.

    REFERENCE ASSEMBLY [ExtR];
    DEPLOY RESOURCE @"/usqlext/samples/R/my_model_LM_Iris.rda";
    DECLARE @IrisData string = @"/usqlext/samples/R/iris.csv";
    DECLARE @OutputFilePredictions string = @"/my/R/Output/LMPredictionsIris.txt";
    DECLARE @PartitionCount int = 10;
    
    @InputData =
        EXTRACT SepalLength double,
                SepalWidth double,
                PetalLength double,
                PetalWidth double,
                Species string
        FROM @IrisData
        USING Extractors.Csv();
    
    @ExtendedData =
        SELECT Extension.R.RandomNumberGenerator.GetRandomNumber(@PartitionCount) AS Par,
            SepalLength,
            SepalWidth,
            PetalLength,
            PetalWidth
        FROM @InputData;
    
    // Predict Species
    
    @RScriptOutput =
        REDUCE @ExtendedData
        ON Par
        PRODUCE Par,
                fit double,
                lwr double,
                upr double
        READONLY Par
        USING new Extension.R.Reducer(scriptFile : "RClusterRun.usql.R", rReturnType : "dataframe", stringsAsFactors : false);
    
    OUTPUT @RScriptOutput
        TO @OutputFilePredictions
        USING Outputters.Tsv();
    
  3. Right-click in the U-SQL file, and then select ADL: Generate R Code Behind File.

  4. An xxx.usql.r file is generated in your working folder. Write your code in the R file. The following is a code sample.

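    # Restore the model object lm.fit from the .rda file deployed by DEPLOY RESOURCE.
    load("my_model_LM_Iris.rda")
    # inputFromUSQL holds one partition's rows; the prediction columns (fit, lwr, upr) are returned to U-SQL.
    outputToUSQL <- data.frame(predict(lm.fit, inputFromUSQL, interval = "confidence"))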
    
  5. Right-click in the U-SQL file, and then click Compile Script or Submit Job to run the job.
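The script spreads the input over @PartitionCount groups so that the R reducer runs once per group. The following pure-Python sketch emulates that pattern. It assumes that `Extension.R.RandomNumberGenerator.GetRandomNumber(n)` returns a random integer in `[0, n)` and that the runtime re-attaches the READONLY column `Par`. Neither behavior is confirmed here. `assign_partitions`, `reduce_on_par`, and `count_rows` are hypothetical names, and `count_rows` stands in for the R script instead of running `predict()`.

```python
import random
from collections import defaultdict

def assign_partitions(rows, partition_count, rng):
    # Stand-in for GetRandomNumber(@PartitionCount), assumed here to give
    # each row a random integer in [0, partition_count).
    return [dict(row, Par=rng.randrange(partition_count)) for row in rows]

def reduce_on_par(rows, reducer):
    # Emulates REDUCE ... ON Par: group rows by Par, call reducer once per group.
    groups = defaultdict(list)
    for row in rows:
        groups[row["Par"]].append(row)
    output = []
    for par in sorted(groups):
        # The reducer does not return the READONLY column Par, so this
        # sketch re-attaches it to every output row.
        for result in reducer(groups[par]):
            output.append(dict(result, Par=par))
    return output

def count_rows(partition):
    # Hypothetical reducer: reports how many rows its partition received.
    return [{"rows": len(partition)}]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
data = [{"id": i} for i in range(150)]  # synthetic input rows
partitioned = assign_partitions(data, 10, rng)
result = reduce_on_par(partitioned, count_rows)
for row in result:
    print(row)
```

Every input row lands in exactly one partition, so the per-partition counts add up to the input size. More partitions mean more, smaller reducer invocations.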

Develop a C# file

A code-behind file is a C# file associated with a single U-SQL script. In the code-behind file, you can define user-defined operators (UDOs), aggregators (UDAs), types (UDTs), and functions (UDFs) dedicated to that script. These UDOs, UDAs, UDTs, and UDFs can be used directly in the script without registering an assembly first. The code-behind file is put in the same folder as its paired U-SQL script file. If the script is named xxx.usql, the code-behind is named xxx.usql.cs. If you manually delete the code-behind file, the code-behind feature is disabled for its associated U-SQL script. For more information about writing custom code for U-SQL scripts, see Writing and Using Custom Code in U-SQL: User-Defined Functions.
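The naming rule is mechanical: same folder, full script name, plus a language suffix. The following Python sketch encodes it. `code_behind_path` and `CODE_BEHIND_SUFFIX` are hypothetical helpers, not part of Azure Data Lake Tools. The suffixes are the ones this article uses (.cs, .py, .r).

```python
from pathlib import PurePosixPath

# File-name suffixes as named in this article's steps.
CODE_BEHIND_SUFFIX = {"csharp": ".cs", "python": ".py", "r": ".r"}

def code_behind_path(script, language="csharp"):
    """Return the code-behind path paired with a U-SQL script (hypothetical helper)."""
    script = PurePosixPath(script)
    if script.suffix != ".usql":
        raise ValueError(f"not a U-SQL script: {script}")
    # Same folder as the script; the name is the full script name plus the
    # language suffix, e.g. xxx.usql -> xxx.usql.cs.
    return script.with_name(script.name + CODE_BEHIND_SUFFIX[language])

print(code_behind_path("myproject/xxx.usql"))            # myproject/xxx.usql.cs
print(code_behind_path("myproject/xxx.usql", "python"))  # myproject/xxx.usql.py
```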

  1. Click New File in your workspace.

  2. Write your code in the U-SQL file. The following is a code sample.

    @a =
        EXTRACT Iid int,
                Starts DateTime,
                Region string,
                Query string,
                DwellTime int,
                Results string,
                ClickedUrls string
        FROM @"/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();
    
    @d =
        SELECT DISTINCT Region
        FROM @a;
    
    @d1 =
        PROCESS @d
        PRODUCE Region string,
                Mkt string
        USING new USQLApplication_codebehind.MyProcessor();
    
    OUTPUT @d1 
        TO @"/output/SearchLogtest.txt" 
        USING Outputters.Tsv();
    
  3. Right-click in the U-SQL file, and then select ADL: Generate CS Code Behind File.

  4. An xxx.usql.cs file is generated in your working folder. Write your code in the CS file. The following is a code sample.

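    using Microsoft.Analytics.Interfaces;
    
    namespace USQLApplication_codebehind
    {
        [SqlUserDefinedProcessor]
        public class MyProcessor : IProcessor
        {
            // Called once per input row; writes exactly one output row.
            public override IRow Process(IRow input, IUpdatableRow output)
            {
                // Output column 0 (Region) and column 1 (Mkt) are both
                // set from input column 0, the Region value.
                output.Set(0, input.Get<string>(0));
                output.Set(1, input.Get<string>(0));
                return output.AsReadOnly();
            }
        }
    }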
    
  5. Right-click in the U-SQL file, and then click Compile Script or Submit Job to run the job.
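The data flow of @d and @d1 can be traced with a small local emulation. In the following Python sketch, rows are tuples so that column indexes match `input.Get<string>(0)` and `output.Set(0, ...)`. `select_distinct`, `my_processor`, and `process` are hypothetical names, and the region values are made-up sample data, not the contents of SearchLog.tsv.

```python
def select_distinct(rows, column):
    # SELECT DISTINCT: keep each value once (first-seen order here; U-SQL
    # itself does not guarantee any row order).
    return [(value,) for value in dict.fromkeys(row[column] for row in rows)]

def my_processor(row):
    # Mirrors MyProcessor.Process: output column 0 (Region) and column 1 (Mkt)
    # are both set from input column 0, so Mkt is a copy of Region.
    region = row[0]
    return (region, region)

def process(rowset, processor):
    # PROCESS: the processor is called once per input row and emits one row.
    return [processor(row) for row in rowset]

# Synthetic rows: (Iid, Region); the other SearchLog columns are omitted.
search_log = [(1, "en-us"), (2, "en-gb"), (3, "en-us")]
d = select_distinct(search_log, 1)
d1 = process(d, my_processor)
print(d1)  # [('en-us', 'en-us'), ('en-gb', 'en-gb')]
```

To derive a real market value, change the second `output.Set` call in the C# processor. The U-SQL script and its PRODUCE clause stay the same.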

Next steps