Data Format

The data file describes the list of objects to index. Each line in the file specifies the attribute values of one object as a JSON object, encoded in UTF-8. In addition to the attributes defined in the schema, each object has an optional "logprob" attribute that specifies its log probability relative to the other objects. Because the service returns objects in order of decreasing probability, "logprob" can be used to control the order in which matching objects are returned. Given a probability p between 0 and 1, the corresponding log probability is log(p), where log() is the natural logarithm. When no value is specified for logprob, the default value 0 is used.

{"logprob":-5.3, "Title":"latent dirichlet allocation", "Year":2003, "Author":{"Name":"david m blei", "Affiliation":"uc berkeley"}, "Author":{"Name":"andrew y ng", "Affiliation":"stanford"}, "Author":{"Name":"michael i jordan", "Affiliation":"uc berkeley"}}
{"logprob":-6.1, "Title":"probabilistic latent semantic indexing", "Year":1999, "Author":{"Name":"thomas hofmann", "Affiliation":"uc berkeley"}}
...
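
As an illustration of the rules above, the following Python sketch turns a probability into a "logprob" value and serializes one object per line. The helper name `to_data_line` and the probability 0.005 are hypothetical, chosen only for this example, and rejecting p = 0 is an assumption made here because log(0) is undefined.

```python
import json
import math


def to_data_line(attributes, probability=None):
    """Serialize one object as a single JSON line, optionally with "logprob".

    `to_data_line` is a hypothetical helper, not part of any service SDK.
    """
    record = {}
    if probability is not None:
        # Assumption: p = 0 is rejected because log(0) is undefined.
        if not 0.0 < probability <= 1.0:
            raise ValueError("probability must be in (0, 1]")
        record["logprob"] = math.log(probability)  # natural logarithm
    record.update(attributes)
    # ensure_ascii=False keeps non-ASCII text readable; encode as UTF-8 on write.
    return json.dumps(record, ensure_ascii=False)


line = to_data_line(
    {"Title": "latent dirichlet allocation", "Year": 2003},
    probability=0.005,
)
```

When writing such lines to disk, opening the file with `open(path, "w", encoding="utf-8")` should keep the output in the UTF-8 encoding the format calls for.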

For String, GUID, and Blob attributes, the value is represented as a quoted JSON string. For numeric attributes (Int32, Int64, Double), the value is represented as a JSON number. For composite attributes, the value is a JSON object that specifies the sub-attribute values. For faster index builds, presort the objects by decreasing log probability.
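
One possible way to presort a data file is sketched below in Python. The function name `presort_lines` is hypothetical. The sketch reads only the "logprob" value from each line and returns the original line text unchanged, so repeated attributes within a line are preserved. A line without "logprob" is treated as having the default value 0.

```python
import json


def presort_lines(lines):
    """Return non-empty data lines ordered by decreasing logprob.

    Hypothetical helper: a missing "logprob" counts as the default 0.
    """
    def logprob(line):
        return json.loads(line).get("logprob", 0)

    # sorted() is stable, so lines with equal logprob keep their input order.
    return sorted((l for l in lines if l.strip()), key=logprob, reverse=True)


sample = [
    '{"logprob":-6.1, "Title":"probabilistic latent semantic indexing"}',
    '{"Title":"no logprob given"}',
    '{"logprob":-5.3, "Title":"latent dirichlet allocation"}',
]
ordered = presort_lines(sample)
```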

In general, an attribute may have zero or more values. If an attribute has no value, omit it from the JSON. If an attribute has two or more values, repeat the attribute-value pair in the JSON object. Alternatively, assign a JSON array containing the multiple values to the attribute.

{"logprob":0, "Title":"0 keyword"}
{"logprob":0, "Title":"1 keyword", "Keyword":"foo"}
{"logprob":0, "Title":"2 keywords", "Keyword":"foo", "Keyword":"bar"}
{"logprob":0, "Title":"2 keywords (alt)", "Keyword":["foo", "bar"]}
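
Many general-purpose JSON parsers keep only the last value when a key is repeated, so tools that read this file need some care. The Python sketch below is a reader-side illustration only, not the service's own parser. It uses the standard `object_pairs_hook` parameter of `json.loads` to collect every value of a repeated attribute into a list. The names `collect_values` and `parse_data_line` are hypothetical.

```python
import json


def collect_values(pairs):
    """Merge repeated keys so every attribute maps to a list of its values."""
    merged = {}
    for key, value in pairs:
        # A JSON array already holds multiple values; a scalar holds one.
        values = value if isinstance(value, list) else [value]
        merged.setdefault(key, []).extend(values)
    return merged


def parse_data_line(line):
    """Parse one data line without dropping repeated attributes."""
    return json.loads(line, object_pairs_hook=collect_values)


repeated = parse_data_line(
    '{"logprob":0, "Title":"2 keywords", "Keyword":"foo", "Keyword":"bar"}'
)
as_array = parse_data_line(
    '{"logprob":0, "Title":"2 keywords (alt)", "Keyword":["foo", "bar"]}'
)
```

Under this sketch both forms yield the same list for "Keyword", and an object with no keywords has no "Keyword" entry at all.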