EdwardCoventry-4213 asked KarlOMeara-4628 commented

Translator gives unexpected word alignment for Japanese to English translations

I'm using azure-translator to translate sentences from Japanese to English with word alignment. However, the word alignment seems to be incorrect. I don't get any errors, and I get the expected result when I translate from English to Japanese instead.

I have followed this example:
https://docs.microsoft.com/en-us/azure/cognitive-services/translator/word-alignment

When translating "Can I drive your car tomorrow?" from English to Japanese the alignment I get is "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"

('Can', 'できますか') ('drive', '運転') ('your', 'あなたの') ('your', 'を') ('car', '車') ('tomorrow?', '明日') ('tomorrow?', '。')

If I exclude the second item of each duplicate, these are all correct.
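For readers who want to reproduce the pairs above, here is a minimal sketch of how the alignment string can be decoded. It assumes each mapping has the form srcStart:srcEnd-tgtStart:tgtEnd with inclusive character indices, and that Python string indexing matches the service's character counting for this text. The helper name parse_alignment is hypothetical, and the Japanese target string is reconstructed from the pairs listed above rather than copied from an actual response.

```python
def parse_alignment(proj, source, target):
    """Turn an alignment projection string into (source_word, target_word) pairs."""
    pairs = []
    for mapping in proj.split():
        src_span, tgt_span = mapping.split("-")
        src_start, src_end = (int(i) for i in src_span.split(":"))
        tgt_start, tgt_end = (int(i) for i in tgt_span.split(":"))
        # End indices are inclusive, hence the + 1 when slicing.
        pairs.append((source[src_start:src_end + 1], target[tgt_start:tgt_end + 1]))
    return pairs


source = "Can I drive your car tomorrow?"
# Assumption: target text reconstructed from the pairs listed in the question.
target = "明日あなたの車を運転できますか。"
proj = "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"

for pair in parse_alignment(proj, source, target):
    print(pair)
```

Run against the English to Japanese example, this should print the same seven pairs shown above, including the two duplicate source words.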

However, when translating "明日あなたの車を運転できますか?" from Japanese to English, I get "0:1-0:2 2:5-4:4 6:6-6:10 7:7-12:15 8:9-17:19 15:15-21:29"

('明日', 'Can') ('あなたの', 'I') ('車', 'drive') ('を', 'your') ('運転', 'car') ('?', 'tomorrow?')

None of these are correct.

Is Japanese to English word alignment expected to be correct, and does it work in the same way as for other languages? Thanks!






azure-translator

@EdwardCoventry-4213 Thanks for reporting this issue. Obtaining alignment information is an experimental feature that has been enabled for prototyping research and experiences with potential phrase mappings. We will report this issue to our team and update this thread with their response. The current limitations with respect to alignment are listed in our documentation.



Hey, I have a clue about what is happening. When translating from Japanese to English, it appears that the alignment indices are sorted separately for each language before being paired. This shouldn't happen: they should be paired first and only then sorted by source-language position.

Can I drive your car tomorrow?
明日あなたの車を運転できますか?
('明日', 'Can') ('あなたの', 'I') ('車', 'drive') ('を', 'your') ('運転', 'car') ('?', 'tomorrow?')

Notice how the order of the words is unchanged for both English and Japanese.

Is this something I can expect to be fixed, or would I have to wait until api-version 3.1? Thanks!
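The symptom described in this comment can be checked mechanically. The sketch below is only a heuristic, and the helper names parse_spans and target_is_monotonic are hypothetical: it reports whether the target-side start offsets never decrease when the mappings are read in order. A monotonic result is not wrong in itself, but for a language pair with very different word order, such as Japanese and English, it is consistent with the target spans having been sorted independently of the source spans.

```python
def parse_spans(proj):
    """Return a list of ((src_start, src_end), (tgt_start, tgt_end)) tuples."""
    spans = []
    for mapping in proj.split():
        src, tgt = mapping.split("-")
        src_span = tuple(int(i) for i in src.split(":"))
        tgt_span = tuple(int(i) for i in tgt.split(":"))
        spans.append((src_span, tgt_span))
    return spans


def target_is_monotonic(proj):
    """True if target start offsets never decrease across the mappings."""
    starts = [tgt[0] for _, tgt in parse_spans(proj)]
    return all(a <= b for a, b in zip(starts, starts[1:]))


# Alignment strings quoted in the question.
en_to_ja = "0:2-10:14 6:10-8:9 12:15-2:5 12:15-7:7 17:19-6:6 21:29-0:1 21:29-15:15"
ja_to_en = "0:1-0:2 2:5-4:4 6:6-6:10 7:7-12:15 8:9-17:19 15:15-21:29"

print(target_is_monotonic(en_to_ja))  # False: target spans jump around, as expected
print(target_is_monotonic(ja_to_en))  # True: target spans are in plain ascending order
```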




@EdwardCoventry-4213 We can confirm that this is a bug, and our team is currently working on it. However, we do not have an ETA for the rollout. We will update this thread with more details as soon as the fix is available.


1 Answer

KarlOMeara-4628 answered

Writing answer due to 1000 char limit on comments...

@romungi-MSFT I can confirm this is also a problem now in Spanish.

This is the alignment information for "Sí, sabes que ya llevo un rato mirándote"


"Translations": [
  {
    "Text": "Yes, you know I've been looking at you for a while.",
    "To": "en",
    "Alignment": {
      "Proj": "0:2-0:3 0:2-5:7 4:8-9:12 10:12-14:17 14:15-19:22 17:21-24:30 17:21-32:33 23:24-35:37 23:24-39:41 31:39-43:43 31:39-45:50"
    }
  }
]


Sí, Yes,
Sí, you
sabes know
que I've
ya been
llevo looking
llevo at
un you
un for
mirándote a
mirándote while.

This is now completely wrong.

Someone, somewhere in the innards of Microsoft, has broken it. Last year (on 2020-04-23 09:06:23.700Z) I made the same call and it was working...



"Translations": [
  {
    "Text": "Yes, you know I've been looking at you for a while.",
    "To": "en",
    "Alignment": {
      "Proj": "0:2-0:3 4:8-9:12 4:8-5:7 17:21-14:17 17:21-19:22 23:24-43:43 26:29-45:50 26:29-39:41 31:39-35:37 31:39-32:33"
    }
  }
]


Sí, Yes,
sabes know
sabes you
llevo I've
llevo been
un a
rato while.
rato for
mirándote you
mirándote at


Thanks!
