DavidMorris-4813 asked

Clean Data Error

I'm getting an error when running the Clean Missing Data module in Azure Machine Learning Studio (classic).

133292-snag-23559db.png

Below is the full log. I've tried logging out and back into my Azure account to no avail. Is there something wrong with my credentials?

Record Starts at UTC 09/18/2021 02:19:08:

Run the job:"/dll "Microsoft.Analytics.Modules.CleanMissingData.Dll, Version=6.0.0.0, Culture=neutral, PublicKeyToken=69c3241e6f0468ca;Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData;Run" /Output0 "....\Cleaned dataset\Cleaned dataset.dataset" /Output1 "....\Cleaning transformation\Cleaning transformation.itransform" /inputData "....\Dataset\Dataset.csv" /columnsToClean "%7B%22isFilter%22%3Atrue%2C%22rules%22%3A%5B%7B%22ruleType%22%3A%22ColumnNames%22%2C%22columns%22%3A%5B%22symboling%22%5D%2C%22exclude%22%3Afalse%7D%5D%7D" /minRatio "0" /maxRatio "1" /cleaningMode "Replace with mean" /colsWithAllMissing "Remove" /indicatorColumns "False" /ContextFile "...._context\ContextFile.txt""
[Start] Program::Main
[Start] DataLabModuleDescriptionParser::ParseModuleDescriptionString
[Stop] DataLabModuleDescriptionParser::ParseModuleDescriptionString. Duration = 00:00:00.0024833
[Start] DllModuleMethod::DllModuleMethod
[Stop] DllModuleMethod::DllModuleMethod. Duration = 00:00:00.0000264
[Start] DllModuleMethod::Execute
[Start] DataLabModuleBinder::BindModuleMethod
[Verbose] moduleMethodDescription Microsoft.Analytics.Modules.CleanMissingData.Dll, Version=6.0.0.0, Culture=neutral, PublicKeyToken=69c3241e6f0468ca;Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData;Run
[Verbose] assemblyFullName Microsoft.Analytics.Modules.CleanMissingData.Dll, Version=6.0.0.0, Culture=neutral, PublicKeyToken=69c3241e6f0468ca
[Start] DataLabModuleBinder::LoadModuleAssembly
[Verbose] Loaded moduleAssembly Microsoft.Analytics.Modules.CleanMissingData.Dll, Version=6.0.0.0, Culture=neutral, PublicKeyToken=69c3241e6f0468ca
[Stop] DataLabModuleBinder::LoadModuleAssembly. Duration = 00:00:00.0090461
[Verbose] moduleTypeName Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData
[Verbose] moduleMethodName Run
[Information] Module FriendlyName : Clean Missing Data
[Information] Module Release Status : Release
[Stop] DataLabModuleBinder::BindModuleMethod. Duration = 00:00:00.0102317
[Start] ParameterArgumentBinder::InitializeParameterValues
[Verbose] parameterInfos count = 10
[Verbose] parameterInfos[0] name = inputData , type = Microsoft.Numerics.Data.Local.DataTable
[Start] DataTableCsvHandler::HandleArgumentString
[Stop] DataTableCsvHandler::HandleArgumentString. Duration = 00:00:00.1604789
[Verbose] parameterInfos[1] name = columnsToClean , type = Microsoft.Analytics.Modules.Common.Dll.ColumnSelection
[Verbose] parameterInfos[2] name = minRatio , type = System.Double
[Verbose] Converted string '0' to value of type System.Double
[Verbose] parameterInfos[3] name = maxRatio , type = System.Double
[Verbose] Converted string '1' to value of type System.Double
[Verbose] parameterInfos[4] name = cleaningMode , type = Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData+CleanMissingDataHandlingPolicy
[Verbose] Converted string 'Replace with mean' to enum of type Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData+CleanMissingDataHandlingPolicy
[Verbose] parameterInfos[5] name = replacementValue , type = System.String
[Verbose] Set optional parameter replacementValue value to NULL
[Verbose] parameterInfos[6] name = colsWithAllMissing , type = Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData+ColumnsWithAllValuesMissing
[Verbose] Converted string 'Remove' to enum of type Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData+ColumnsWithAllValuesMissing
[Verbose] parameterInfos[7] name = indicatorColumns , type = System.Boolean
[Verbose] Converted string 'False' to value of type System.Boolean
[Verbose] parameterInfos[8] name = iterations , type = System.Int32
[Verbose] Set optional parameter iterations value to NULL
[Verbose] parameterInfos[9] name = iterationsPCA , type = System.Int32
[Verbose] Set optional parameter iterationsPCA value to NULL
[Stop] ParameterArgumentBinder::InitializeParameterValues. Duration = 00:00:00.4304001
[Verbose] Begin invoking method Run ...
[Verbose] End invoking method Run
[Start] DataLabOutputManager::ManageModuleReturnValue
[Verbose] moduleReturnType = System.Tuple`2[T1,T2]
[Start] DataLabOutputManager::ConvertTupleOutputToFiles
[Verbose] tupleType = System.Tuple`2[Microsoft.Numerics.Data.Local.DataTable,Microsoft.Analytics.MachineLearning.ITransform`2[Microsoft.Numerics.Data.Local.DataTable,Microsoft.Numerics.Data.Local.DataTable]]
[Verbose] outputName Output0
[Start] DataTableDatasetHandler::HandleOutput
[Start] SidecarFiles::CreateVisualizationFiles
[Information] Creating Cleaned dataset.visualization with key visualization...
[Stop] SidecarFiles::CreateVisualizationFiles. Duration = 00:00:00.0704111
[Start] SidecarFiles::CreateDatatableSchemaFile
[Information] SidecarFiles::CreateDatatableSchemaFile creating "....\Cleaned dataset\Cleaned dataset.schema"
[Stop] SidecarFiles::CreateDatatableSchemaFile. Duration = 00:00:00.0071185
[Start] SidecarFiles::CreateMetadataFile
[Information] SidecarFiles::CreateMetadataFile creating "....\Cleaned dataset\Cleaned dataset.metadata"
[Stop] SidecarFiles::CreateMetadataFile. Duration = 00:00:00.0019639
[Stop] DataTableDatasetHandler::HandleOutput. Duration = 00:00:00.1898522
[Verbose] outputName Output1
[Start] CustomSerializationHandler::HandleOutput
[Start] DotNetSerializationHandler::HandleOutput
[Start] SidecarFiles::CreateRuntimeInfoFile
[Information] SidecarFiles::CreateRuntimeInfoFile creating "....\Cleaning transformation\Cleaning transformation.runtimeinfo"
[Information] SidecarFileWritter::WriteRuntimeInfoToFile setting Language info for "Microsoft.Analytics.Modules.CleanMissingData.Dll.CleaningMVTransform"
[ModuleOutput] SidecarFileWritter::WriteRuntimeInfoToFile setting Language info for "Microsoft.Analytics.Modules.CleanMissingData.Dll.CleaningMVTransform"
[ModuleOutput] Setting Languge to DotNet.
[Stop] SidecarFiles::CreateRuntimeInfoFile. Duration = 00:00:00.0016640
[Start] SidecarFiles::CreateMetadataFile
[Information] SidecarFiles::CreateMetadataFile creating "....\Cleaning transformation\Cleaning transformation.metadata"
[Stop] SidecarFiles::CreateMetadataFile. Duration = 00:00:00.0003313
[Stop] DotNetSerializationHandler::HandleOutput. Duration = 00:00:00.0045472
[Stop] CustomSerializationHandler::HandleOutput. Duration = 00:00:00.0050090
[Stop] DataLabOutputManager::ConvertTupleOutputToFiles. Duration = 00:00:00.2003385
[Stop] DataLabOutputManager::ManageModuleReturnValue. Duration = 00:00:00.2017888
[Verbose] {"InputParameters":{"DataTable":[{"Rows":205,"Columns":26,"estimatedSize":12574720,"ColumnTypes":{"System.Int32":5,"System.Nullable`1[System.Int32]":4,"System.String":10,"System.Double":5,"System.Nullable`1[System.Double]":2},"IsComplete":true,"Statistics":{"0":[0.8341463414634146,1.0,-2.0,3.0,1.2453068281055315,6.0,0.0],"1":[122.0,115.0,65.0,256.0,35.442167530553256,51.0,41.0],"2":[22,0],"3":[2,0],"4":[2,0],"5":[2,2],"6":[5,0],"7":[3,0],"8":[2,0],"9":[98.756585365853581,97.0,86.6,120.9,6.0217756850255721,53.0,0.0],"10":[174.04926829268285,173.2,141.1,208.1,12.337288526555183,75.0,0.0],"11":[65.907804878048722,65.5,60.3,72.3,2.1452038526871831,44.0,0.0],"12":[53.724878048780596,54.1,47.8,59.8,2.4435219699049036,49.0,0.0],"13":[2555.5658536585365,2414.0,1488.0,4066.0,520.68020350163874,171.0,0.0],"14":[7,0],"15":[7,0],"16":[126.90731707317073,120.0,61.0,326.0,41.642693438179847,44.0,0.0],"17":[8,0],"18":[3.3297512437810943,3.31,2.54,3.94,0.27353873182959904,38.0,4.0],"19":[3.2554228855721377,3.29,2.07,4.17,0.31671745337703111,36.0,4.0],"20":[10.142536585365862,9.0,7.0,23.0,3.9720403218632976,32.0,0.0],"21":[104.25615763546799,95.0,48.0,288.0,39.714368786793578,59.0,2.0],"22":[5125.3694581280788,5200.0,4150.0,6600.0,479.33455983341668,23.0,2.0],"23":[25.219512195121951,24.0,13.0,49.0,6.5421416530016216,29.0,0.0],"24":[30.751219512195121,30.0,16.0,54.0,6.886443130941827,30.0,0.0],"25":[13207.129353233831,10295.0,5118.0,45400.0,7947.066341939274,186.0,4.0]}}],"Generic":{"columnsToClean":"{\"isFilter\":true,\"rules\":[{\"ruleType\":\"ColumnNames\",\"columns\":[\"symboling\"],\"exclude\":false}]}","minRatio":0.0,"maxRatio":1.0,"cleaningMode":"ReplaceWithMean","colsWithAllMissing":"Remove","indicatorColumns":false}},"OutputParameters":["Parameter with no known logging method, 
Microsoft.Analytics.Modules.CleanMissingData.Dll.CleaningMVTransform",{"Rows":205,"Columns":26,"estimatedSize":0,"ColumnTypes":{"System.Int32":5,"System.Nullable`1[System.Int32]":4,"System.String":10,"System.Double":5,"System.Nullable`1[System.Double]":2},"IsComplete":true,"Statistics":{"0":[0.8341463414634146,1.0,-2.0,3.0,1.2453068281055315,6.0,0.0],"1":[122.0,115.0,65.0,256.0,35.442167530553256,51.0,41.0],"2":[22,0],"3":[2,0],"4":[2,0],"5":[2,2],"6":[5,0],"7":[3,0],"8":[2,0],"9":[98.756585365853581,97.0,86.6,120.9,6.0217756850255721,53.0,0.0],"10":[174.04926829268285,173.2,141.1,208.1,12.337288526555183,75.0,0.0],"11":[65.907804878048722,65.5,60.3,72.3,2.1452038526871831,44.0,0.0],"12":[53.724878048780596,54.1,47.8,59.8,2.4435219699049036,49.0,0.0],"13":[2555.5658536585365,2414.0,1488.0,4066.0,520.68020350163874,171.0,0.0],"14":[7,0],"15":[7,0],"16":[126.90731707317073,120.0,61.0,326.0,41.642693438179847,44.0,0.0],"17":[8,0],"18":[3.3297512437810943,3.31,2.54,3.94,0.27353873182959904,38.0,4.0],"19":[3.2554228855721377,3.29,2.07,4.17,0.31671745337703111,36.0,4.0],"20":[10.142536585365862,9.0,7.0,23.0,3.9720403218632976,32.0,0.0],"21":[104.25615763546799,95.0,48.0,288.0,39.714368786793578,59.0,2.0],"22":[5125.3694581280788,5200.0,4150.0,6600.0,479.33455983341668,23.0,2.0],"23":[25.219512195121951,24.0,13.0,49.0,6.5421416530016216,29.0,0.0],"24":[30.751219512195121,30.0,16.0,54.0,6.886443130941827,30.0,0.0],"25":[13207.129353233831,10295.0,5118.0,45400.0,7947.066341939274,186.0,4.0]}}],"ModuleType":"Microsoft.Analytics.Modules.CleanMissingData.Dll","ModuleVersion":" Version=6.0.0.0","AdditionalModuleInfo":"Microsoft.Analytics.Modules.CleanMissingData.Dll, Version=6.0.0.0, Culture=neutral, PublicKeyToken=69c3241e6f0468ca;Microsoft.Analytics.Modules.CleanMissingData.Dll.CleanMissingData;Run","Errors":"","Warnings":[],"Duration":"00:00:00.7323002"}
[Stop] DllModuleMethod::Execute. Duration = 00:00:00.7557727
[Stop] Program::Main. Duration = 00:00:00.8973350
Module finished after a runtime of 00:00:00.9843864 with exit code 0
Execution failed due to exception:taskStatusCode=400. Failed to upload W:\jw\e\Cleaned dataset\Cleaned dataset.dataset to Uri experimentoutput/4a90c8cd-cc1d-4de0-97b2-aebb1285e140/4a90c8cd-cc1d-4de0-97b2-aebb1285e140.dataset. This is an Azure storage request failure with status code 404. The request id is 5052585f-501e-000b-2933-acd43e000000 and the error message is The specified resource does not exist.. Possible reasons for such failure: (1) Invalid storage account or credential (2) Invalid SAS token (3) Concurrent jobs trying to upload files to the same blob at the same time. If those are not your case, please consider it as a transient error and retry.

Record Ends at UTC 09/18/2021 02:19:09.
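
The /columnsToClean argument in the job line of the log is URL-encoded JSON. As a minimal sketch (Python standard library only; the function and variable names are illustrative, not part of Studio), it can be decoded to confirm which column the module was asked to clean:

```python
import json
from urllib.parse import unquote

# Value of /columnsToClean copied verbatim from the job line in the log.
ENCODED = ("%7B%22isFilter%22%3Atrue%2C%22rules%22%3A%5B%7B%22ruleType%22%3A"
           "%22ColumnNames%22%2C%22columns%22%3A%5B%22symboling%22%5D%2C"
           "%22exclude%22%3Afalse%7D%5D%7D")


def decode_column_selection(encoded: str) -> dict:
    """Percent-decode the argument and parse the resulting JSON."""
    return json.loads(unquote(encoded))


selection = decode_column_selection(ENCODED)
# Flatten the column names from every selection rule.
selected_columns = [c for rule in selection["rules"] for c in rule["columns"]]
print(selected_columns)
```

The decoded value matches the "columnsToClean" entry that the module itself logs in its verbose JSON summary.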


azure-machine-learning

YutongTie-MSFT answered

Hello,

Thanks for reaching out to us about this issue. I ran into the same issue because of an incorrect cleaning mode setting.

For replace with mean:
Calculates the column mean and uses the mean as the replacement value for each missing value in the column.

Applies only to columns that have Integer, Double, or Boolean data types. See the Technical Notes section for more information.
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data#bkmk_TechNotes

Your selected columns may contain other data types.

My solution was to use "MICE" instead. Could you please check your data types or change the cleaning mode?
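
To illustrate the "Replace with mean" behavior described above, here is a minimal sketch in plain Python. It is not the module's implementation: the function name is hypothetical, missing values are modeled as None, and raising an error for non-numeric columns is an assumption made only to mirror the documented data type restriction.

```python
from statistics import mean


def replace_with_mean(column):
    """Return a copy of `column` with None entries replaced by the column mean.

    Assumption: non-numeric columns are rejected with TypeError. The real
    module may handle them differently.
    """
    present = [v for v in column if v is not None]
    # bool is a subclass of int, so Boolean columns pass this check too.
    if not all(isinstance(v, (int, float)) for v in present):
        raise TypeError("mean replacement needs a numeric column")
    if not present:
        return list(column)  # every value is missing, so there is no mean
    fill = mean(present)
    return [fill if v is None else v for v in column]


filled = replace_with_mean([1, None, 3])
print(filled)
```

A column of strings would raise TypeError in this sketch, which is the kind of data type mismatch worth ruling out before looking elsewhere.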

Hope this helps.

Regards,
Yutong


DavidMorris-4813 answered

Thank you, YutongTie-MSFT, but unfortunately that does not appear to be the case. Several new experiment attempts have failed for me lately, and given the 400 status codes I'm noticing in the logs, I suspect the issue is on Microsoft's side.

To verify further, I ran the experiment again with only one numerical column selected:

133701-image.png

133550-snag-b68567.png

133721-image.png

133670-image.png

Mind you, I'm using the basic Automobile price data (Raw) that appears under Sample Datasets. In review of my log file, I suspect this to be potentially the most relevant piece (emphasis mine):


Execution failed due to exception:taskStatusCode=400. Failed to upload [...].dataset to Uri [...].dataset. This is an Azure storage request failure with status code 404. The request id is [...] and the error message is The specified resource does not exist.. Possible reasons for such failure: (1) Invalid storage account or credential (2) Invalid SAS token (3) Concurrent jobs trying to upload files to the same blob at the same time. If those are not your case, please consider it as a transient error and retry.
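
The message above carries two different status codes: a task-level 400 and a storage-level 404. A small sketch like the following (Python standard library; the function name is hypothetical and the patterns assume the exact wording shown in this log) pulls them apart, which may help when describing the failure to support:

```python
import re

# Error line from the log, with the elisions used in the post kept as-is.
ERROR = ("Execution failed due to exception:taskStatusCode=400. "
         "Failed to upload [...].dataset to Uri [...].dataset. "
         "This is an Azure storage request failure with status code 404. "
         "The request id is [...] and the error message is "
         "The specified resource does not exist..")


def summarize_failure(message: str) -> dict:
    """Extract the task status and the storage status from the error text."""
    task = re.search(r"taskStatusCode=(\d+)", message)
    storage = re.search(
        r"storage request failure with status code (\d+)", message)
    return {
        "task_status": int(task.group(1)) if task else None,
        "storage_status": int(storage.group(1)) if storage else None,
    }


summary = summarize_failure(ERROR)
print(summary)
```

Note that the log also shows the module itself finishing with exit code 0, so the failure occurs while uploading the output rather than while cleaning the data.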


Questions for Follow Up:

  • Can you run my experiment and verify whether it works for you?

  • If it works, how do I check "possible reasons" (1), (2), and (3)?

  • If not, how do we escalate the issue?



YutongTie-MSFT answered

Hello,

I followed your settings, but everything seems to work well for me, as shown below:
Replace with mean:
133715-image.png

MICE:
133695-image.png

Could you please rebuild the experiment and try again? I am in South Central US.
133716-image.png

Please let me know if you still have this issue.

Regards,
Yutong




Thank you. However, as displayed in my fourth screenshot, MICE did nothing for me. I am also using South Central. Based on the follow-up questions I listed, can you offer additional guidance?


@DavidMorris-4813 Hello, both MICE and Replace with mean work for me. In this situation, let's bring this to support. If you have no support plan, please let me know your Azure subscription ID, and I can enable a one-time free support ticket for this issue.

Regards,
Yutong
