Azure Data Lake query acceleration error: XML specified is not syntactically valid.
RequestId:e3204a59-901e-005f-7612-d2ee5f000000
Time:2021-11-05T06:58:32.8652708Z
Status: 400 (XML specified is not syntactically valid.)

Question

I am using query acceleration to filter data in Azure Data Lake Storage, following this guide: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=dotnet%2Cazure-cli

I followed the steps in the document exactly, but I get an exception when I run the query. I am using a .NET Core application.

Error:

XML specified is not syntactically valid.
RequestId:e3204a59-901e-005f-7612-d2ee5f000000
Time:2021-11-05T06:58:32.8652708Z
Status: 400 (XML specified is not syntactically valid.)
ErrorCode: InvalidXmlDocument

Content:
InvalidXmlDocument: XML specified is not syntactically valid. RequestId:e3204a59-901e-005f-7612-d2ee5f000000 Time:2021-11-05T06:58:32.8652708Z

Headers:
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-error-code: InvalidXmlDocument
x-ms-request-id: e3204a59-901e-005f-7612-d2ee5f000000
x-ms-version: 2020-10-02
x-ms-client-request-id: 159c9d5a-2624-4be4-b0ae-f6fbfb3cda5b
Date: Fri, 05 Nov 2021 06:58:32 GMT
Content-Length: 229
Content-Type: application/xml

Code:
using System;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using CsvHelper;
using CsvHelper.Configuration;

static async Task QueryHemingway(BlockBlobClient blob)
{
    // Column _3 is the third column of a CSV file without a header row.
    string query = @"SELECT * FROM BlobStorage WHERE _3 = 'Hemingway, Ernest, 1899-1961'";
    await DumpQueryCsv(blob, query, false);
}

private static async Task DumpQueryCsv(BlockBlobClient blob, string query, bool headers)
{
    try
    {
        var options = new BlobQueryOptions()
        {
            InputTextConfiguration = new BlobQueryCsvTextOptions() { HasHeaders = headers },
            OutputTextConfiguration = new BlobQueryCsvTextOptions() { HasHeaders = true },
            // ProgressHandler is an IProgress<long> that reports the number of bytes scanned so far.
            ProgressHandler = new Progress<long>((finishedBytes) => Console.Error.WriteLine($"Data read: {finishedBytes}"))
        };
        // Errors raised while the service evaluates the query are reported through this event.
        options.ErrorHandler += (BlobQueryError err) =>
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.Error.WriteLine($"Error: {err.Position}:{err.Name}:{err.Description}");
            Console.ResetColor();
        };
        // BlobDownloadInfo exposes a Stream that makes results available as they are received rather than blocking for the entire response.
        using (var reader = new StreamReader((await blob.QueryAsync(query, options)).Value.Content))
        {
            using (var parser = new CsvReader(reader, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true }))
            {
                while (await parser.ReadAsync())
                {
                    Console.Out.WriteLine(String.Join(" ", parser.Parser.Record));
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.Error.WriteLine("Exception: " + ex.ToString());
    }
}

Accepted Answer

It turned out that a newer version of the Azure Blob Storage NuGet package introduced a code path that requires RecordSeparator to be set. I hope Microsoft does a better job of documenting breaking changes like this, either in the article or somewhere else:

InputTextConfiguration = new BlobQueryJsonTextOptions() {RecordSeparator = " "},
OutputTextConfiguration = new BlobQueryJsonTextOptions() { RecordSeparator = " " }

Answer

I am using Azure.Storage.Blobs for .NET version 12.10.0 with BlobQueryCsvTextOptions.
To avoid this error, I have to set all of the options, even the optional nullable ones:

InputTextConfiguration = new BlobQueryCsvTextOptions()
{
    HasHeaders = true,
    RecordSeparator = recordSeparator,
    ColumnSeparator = delimiter,
    EscapeCharacter = '\\',   // a backslash char literal must itself be escaped
    QuotationCharacter = '"',
},
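The same approach can be applied to both the input and the output configuration of the query in the question. The following is a minimal sketch, not a verified fix: it assumes Azure.Storage.Blobs 12.10.0 and a comma-delimited, newline-terminated CSV file. The class name QueryOptionsFactory, the method BuildCsvQueryOptions, and the separator values "\n" and "," are hypothetical choices for illustration, not part of the SDK or of the answers above. Every CSV property is set explicitly, as this answer recommends.

```csharp
// Minimal sketch, assuming Azure.Storage.Blobs 12.10.0 is referenced.
// QueryOptionsFactory / BuildCsvQueryOptions are hypothetical names, not SDK APIs.
using System;
using Azure.Storage.Blobs.Models;

public static class QueryOptionsFactory
{
    // Assumed file layout: records end with "\n", columns are separated by ",".
    private const string RecordSeparator = "\n";
    private const string ColumnSeparator = ",";

    // Creates a CSV text configuration with every property set explicitly,
    // including the optional nullable EscapeCharacter and QuotationCharacter.
    private static BlobQueryCsvTextOptions FullCsvOptions(bool hasHeaders) =>
        new BlobQueryCsvTextOptions
        {
            HasHeaders = hasHeaders,
            RecordSeparator = RecordSeparator,
            ColumnSeparator = ColumnSeparator,
            EscapeCharacter = '\\',
            QuotationCharacter = '"'
        };

    // Builds query options equivalent to the ones in DumpQueryCsv,
    // but with fully specified input and output configurations.
    public static BlobQueryOptions BuildCsvQueryOptions(bool inputHasHeaders)
    {
        var options = new BlobQueryOptions
        {
            InputTextConfiguration = FullCsvOptions(inputHasHeaders),
            OutputTextConfiguration = FullCsvOptions(true),
            ProgressHandler = new Progress<long>(bytes => Console.Error.WriteLine($"Data read: {bytes}"))
        };
        // Report errors raised by the service while it evaluates the query.
        options.ErrorHandler += err =>
            Console.Error.WriteLine($"Error: {err.Position}:{err.Name}:{err.Description}");
        return options;
    }
}
```

The returned options would then replace the ones built inside DumpQueryCsv, for example: await blob.QueryAsync(query, QueryOptionsFactory.BuildCsvQueryOptions(false)). Keeping the separators in one place makes it easier to match them to the actual file format if it differs from the assumed one.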
