Could not install R package to R server in Azure HDInsight using script actions

Suryanto 16 Reputation points
2020-09-30T15:01:13.097+00:00

I have deployed Azure HDInsight Spark clusters with ML Server many times over the last 2 years, but recently I could not install R packages on the R Server using the script action from https://raw.githubusercontent.com/Microsoft/r-server-loan-chargeoff/master/HDI/ActionScripts/InstallRPackages.sh
The script action runs to completion with a green tick, but the R packages (hashFunction, sparklyr, smbinning) are not installed.

PS: I have done the same thing over the last 2 years without problems; the last successful attempt I remember was 2 months ago.

I am trying to follow this experiment manually:
https://microsoft.github.io/r-server-loan-credit-risk/data-scientist.html

Can anyone help? Thank you very much.

Azure R Server for HDInsight
An Azure service that provides predictive analytics, machine learning, and statistical modeling for big data.

5 answers

  1. Suryanto 16 Reputation points
    2020-09-30T18:21:52.277+00:00

    I checked Ambari, and the error message is:

    Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2017-03-15/src/contrib:
    Line starting '<head><title>Documen ...' is malformed!
    Warning message:
    package ‘hashFunction’ is not available (for R version 3.3.3)
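The "Line starting '<head><title>..." warning suggests that the snapshot URL returned an HTML page where R expected the plain-text PACKAGES index, so no package could be found there. A minimal sketch of that check follows. The function name and both sample bodies are hypothetical illustrations, not output captured from the cluster.

```python
import re

# A line in R's PACKAGES index (DCF format) looks like "Package: hashFunction".
_DCF_FIELD = re.compile(r"^[A-Za-z][A-Za-z0-9._-]*:")


def looks_like_packages_index(body):
    """Return True if the first non-blank line looks like a DCF field line."""
    for line in body.splitlines():
        if line.strip():
            return bool(_DCF_FIELD.match(line))
    return False  # an empty body is not a usable index


# Illustrative bodies only (assumed, not taken from the real repository).
valid_index = "Package: hashFunction\nVersion: 1.0\n"
html_page = "<html>\n<head><title>Some HTML page</title></head>\n</html>\n"

print(looks_like_packages_index(valid_index))
print(looks_like_packages_index(html_page))
```

If the body fetched from the repository URL fails a check like this, the problem is on the repository side rather than in the package name or the R version.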


  2. Suryanto 16 Reputation points
    2020-10-01T17:46:53.09+00:00

    Hi @MartinJaffer-MSFT
    Thank you for your help.
    Yes. Since I first encountered this error I have tried at least 5 times, and my students tried too and got the same error. This is what we did:

    1. Deployed Azure HDInsight with R Server.
    2. Once the cluster was fully deployed and running, we ran the script action
      https://raw.githubusercontent.com/Microsoft/r-server-loan-chargeoff/master/HDI/ActionScripts/InstallRPackages.sh
      with the parameters hashFunction sparklyr smbinning to install those 3 packages on the edge, head, and worker nodes.
    3. The script action completed with a green check, but the R packages were not installed.
    4. I checked Ambari and found the error shown in my previous message.
    5. I tried to install hashFunction on the edge node by running install.packages("hashFunction") directly, but got the same error:

    Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2017-03-15/src/contrib:
    Line starting '<head><title>Documen ...' is malformed!
    Warning message:
    package ‘hashFunction’ is not available (for R version 3.3.3)
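Because the script action shows a green check even though nothing was installed, it can help to compare the requested packages against what R actually reports on each node. The sketch below assumes the installed package names have been collected one per line, for example with Rscript -e 'cat(rownames(installed.packages()), sep="\n")'. The helper name and the sample output are hypothetical.

```python
def missing_packages(required, installed_output):
    """Return the required package names that are absent from the listing.

    installed_output is assumed to hold one installed package name per line.
    """
    installed = {
        line.strip() for line in installed_output.splitlines() if line.strip()
    }
    return [name for name in required if name not in installed]


# Hypothetical listing from one node; only sparklyr made it onto the node.
sample_output = "base\nstats\nsparklyr\n"
required = ["hashFunction", "sparklyr", "smbinning"]

print(missing_packages(required, sample_output))
```

Running a check like this per node (edge, head, worker) shows where the install silently failed.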

    Please let me know if you need more information to help me fix this problem. Thank you.


  3. MartinJaffer-MSFT 26,021 Reputation points
    2020-10-01T22:55:31.95+00:00

    I have begun to reproduce the issue. Below are my steps and findings so far.

    I clicked the quickstart template, chose worker nodes = 2, filled out the rest, and started the deployment.

    Deployment error: Cluster deployment failed due to an error in the custom script action. Failed Actions: 'lcrsetup',Please go to Ambari UI to further debug the failure.
    Attachment: 29638-errors-105.txt

    I then attempted to open the Ambari UI, but it failed with an "IP not found" message.

    Next I decided to inspect the storage account for logs. While navigating (via the portal), I noticed that directories had been created but were showing up as blobs. Perhaps Hierarchical namespace should be enabled?

    I found relevant details in /custom-scriptaction-logs/rserverrepro/2020-10-01/XXXXXAmbariDb-XXXXXX.xx.internal.cloudapp.net/errors-105.txt. See attached.

    Note that in the attached log file, the first error is "The extension azure-ml-admin-cli already exists."
    The second is "az ml admin node setup: error: argument --admin-password/-p: expected one argument".

    This leads me to believe the pre-installed cli version is different from the one your script expects. Assuming the node setup creates directories and Rprofile.site, it would explain why they were not found when the node setup failed.
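The wording "expected one argument" is what Python's argparse module prints when an option that takes a value is followed by nothing, or by a token that itself looks like an option. As far as I know the Azure CLI builds on argparse, but the sketch below is only a stand-in parser with a hypothetical program name, not the real az ml admin code, and it does not establish why the value was missing in the script.

```python
import argparse
import contextlib
import io


def try_parse(argv):
    """Parse argv with a stand-in parser; return (password, error_text)."""
    parser = argparse.ArgumentParser(prog="node-setup-demo")  # hypothetical name
    parser.add_argument("--admin-password", "-p")
    captured = io.StringIO()
    try:
        # argparse reports usage errors on stderr and then exits.
        with contextlib.redirect_stderr(captured):
            args = parser.parse_args(argv)
    except SystemExit:
        return None, captured.getvalue()
    return args.admin_password, ""


# The flag is present but no value follows it, e.g. an empty shell variable.
no_value = try_parse(["--admin-password"])
# A value beginning with "-" is read as another option, not as the password.
dash_value = try_parse(["-p", "-Secret1!"])
# A normal value parses fine.
ok_value = try_parse(["-p", "Secret1!"])

print(no_value[1].strip().splitlines()[-1])
```

Either of the first two cases would produce the same message as in the log, so the password argument the script passes is worth inspecting alongside the CLI version.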

    I am going to consult with my colleagues.


  4. Suryanto 16 Reputation points
    2020-10-02T18:27:53.45+00:00

    Hi @MartinJaffer-MSFT
    Thanks for your reply. I am not sure why the portal assigned me a different ID when I replied the first time.

    Thank you very much for your time helping me.

    FYI, the script in this link no longer works because of some version changes. Even deploying a Spark HDInsight cluster with R Server through the Azure portal GUI now fails.

    I am now doing the loan credit risk experiment through a manual deployment. The steps I followed are:

    1. I used a JSON template to deploy the Azure HDInsight cluster with R Server. (I can send you the template if you want to reproduce the problem. Please let me know your email if possible, as I would also like to send some other files, screenshots, and a video to help you better understand my problem.)
    2. Then I ran a script action to install the additional packages I need for the loan credit experiment. The details are below:
      Name: installRPackages
      URI: https://raw.githubusercontent.com/Microsoft/r-server-loan-chargeoff/master/HDI/ActionScripts/InstallRPackages.sh
      Parameters: sparklyr smbinning hashFunction
      Roles: Head, Worker, Edge node

    This is the stage where I got the error: the R packages cannot be installed. I sent the error message in my previous message.

    As for lcrsetup.sh, please ignore it. That script prepares the edge node for web services, which I do not need for my experiment.

    Below is the content of the JSON template used to deploy the HDInsight Spark cluster with R Server:

    {
      "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "clusterName": {
          "type": "string",
          "metadata": {
            "description": "The name of the HDInsight cluster to create."
          }
        },
        "clusterLoginUserName": {
          "type": "string",
          "metadata": {
            "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
          }
        },
        "clusterLoginPassword": {
          "type": "securestring",
          "metadata": {
            "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
          }
        },
        "sshUserName": {
          "type": "string",
          "metadata": {
            "description": "These credentials can be used to remotely access the cluster."
          }
        },
        "sshPassword": {
          "type": "securestring",
          "metadata": {
            "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
          }
        },
        "location": {
          "type": "string",
          "defaultValue": "[resourceGroup().location]",
          "metadata": {
            "description": "Location for all resources."
          }
        }
      },
      "variables": {
        "defaultStorageAccount": {
          "name": "[uniqueString(resourceGroup().id)]",
          "type": "Standard_LRS"
        }
      },
      "resources": [
        {
          "type": "Microsoft.Storage/storageAccounts",
          "name": "[variables('defaultStorageAccount').name]",
          "location": "[parameters('location')]",
          "apiVersion": "2016-01-01",
          "sku": {
            "name": "[variables('defaultStorageAccount').type]"
          },
          "kind": "Storage",
          "properties": {}
        },
        {
          "type": "Microsoft.HDInsight/clusters",
          "name": "[parameters('clusterName')]",
          "location": "[parameters('location')]",
          "apiVersion": "2015-03-01-preview",
          "dependsOn": [
            "[concat('Microsoft.Storage/storageAccounts/',variables('defaultStorageAccount').name)]"
          ],
          "properties": {
            "clusterVersion": "3.6",
            "osType": "Linux",
            "tier": "standard",
            "clusterDefinition": {
              "kind": "rserver",
              "configurations": {
                "gateway": {
                  "restAuthCredential.isEnabled": true,
                  "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
                  "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
                },
                "rserver": {
                  "rstudio": true
                }
              }
            },
            "storageProfile": {
              "storageaccounts": [
                {
                  "name": "[replace(replace(concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('defaultStorageAccount').name), '2016-01-01').primaryEndpoints.blob),'https:',''),'/','')]",
                  "isDefault": true,
                  "container": "[parameters('clusterName')]",
                  "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name), '2016-01-01').keys[0].value]"
                }
              ]
            },
            "computeProfile": {
              "roles": [
                {
                  "name": "headnode",
                  "targetInstanceCount": 2,
                  "hardwareProfile": {
                    "vmSize": "Standard_D12_v2"
                  },
                  "osProfile": {
                    "linuxOperatingSystemProfile": {
                      "username": "[parameters('sshUserName')]",
                      "password": "[parameters('sshPassword')]"
                    }
                  }
                },
                {
                  "name": "workernode",
                  "targetInstanceCount": 1,
                  "hardwareProfile": {
                    "vmSize": "Standard_D4_v2"
                  },
                  "osProfile": {
                    "linuxOperatingSystemProfile": {
                      "username": "[parameters('sshUserName')]",
                      "password": "[parameters('sshPassword')]"
                    }
                  }
                },
                {
                  "name": "zookeepernode",
                  "minInstanceCount": 1,
                  "targetInstanceCount": 3,
                  "hardwareProfile": {
                    "vmSize": "Medium"
                  },
                  "osProfile": {
                    "linuxOperatingSystemProfile": {
                      "username": "[parameters('sshUserName')]",
                      "password": "[parameters('sshPassword')]"
                    }
                  },
                  "virtualNetworkProfile": null,
                  "scriptActions": []
                },
                {
                  "name": "edgenode",
                  "minInstanceCount": 1,
                  "targetInstanceCount": 1,
                  "hardwareProfile": {
                    "vmSize": "Standard_D4_V2"
                  },
                  "osProfile": {
                    "linuxOperatingSystemProfile": {
                      "username": "[parameters('sshUserName')]",
                      "password": "[parameters('sshPassword')]"
                    }
                  },
                  "virtualNetworkProfile": null,
                  "scriptActions": []
                }
              ]
            }
          }
        }
      ],
      "outputs": {
        "storage": {
          "type": "object",
          "value": "[reference(resourceId('Microsoft.Storage/storageAccounts', variables('defaultStorageAccount').name))]"
        },
        "cluster": {
          "type": "object",
          "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
        }
      }
    }
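For anyone trying to reproduce with this template, a quick way to see what it asks for is to summarize the node roles. The sketch below works on a trimmed copy of the computeProfile section (only name, instance count, and VM size are kept). The helper names are hypothetical, and whether the differing letter case in the VM size strings matters to Azure is not something this thread establishes.

```python
import json

# Trimmed copy of the computeProfile roles from the template above.
COMPUTE_PROFILE = json.loads("""
{
  "roles": [
    {"name": "headnode", "targetInstanceCount": 2,
     "hardwareProfile": {"vmSize": "Standard_D12_v2"}},
    {"name": "workernode", "targetInstanceCount": 1,
     "hardwareProfile": {"vmSize": "Standard_D4_v2"}},
    {"name": "zookeepernode", "targetInstanceCount": 3,
     "hardwareProfile": {"vmSize": "Medium"}},
    {"name": "edgenode", "targetInstanceCount": 1,
     "hardwareProfile": {"vmSize": "Standard_D4_V2"}}
  ]
}
""")


def summarize_roles(compute_profile):
    """Map each role name to (instance count, VM size)."""
    return {
        role["name"]: (role["targetInstanceCount"],
                       role["hardwareProfile"]["vmSize"])
        for role in compute_profile["roles"]
    }


def casing_variants(summary):
    """Return groups of VM size strings that differ only by letter case."""
    groups = {}
    for _count, size in summary.values():
        groups.setdefault(size.lower(), set()).add(size)
    return [sorted(sizes) for sizes in groups.values() if len(sizes) > 1]


summary = summarize_roles(COMPUTE_PROFILE)
for name, (count, size) in summary.items():
    print(f"{name}: {count} x {size}")
print(casing_variants(summary))
```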


  5. Suryanto 16 Reputation points
    2020-10-05T12:20:33.46+00:00

    @MartinJaffer-MSFT, @Suryanto

    Today I tried to deploy again and did not encounter any problem. The additional R packages were installed without error.

    So I guess the back-end team has resolved the issue.

    Thank you very much for your help