question

parvathypriya-1048 asked SeeyaXi-msft commented

How to get the results in single byte for codepage utf8 using WideCharToMultiByte

In our application, we read XML data stored as WCHAR in an MSSQL database and convert it back to char* using WideCharToMultiByte with code page CP_UTF8. For some characters, however, the result is two bytes instead of one, for example:

0xFA - xC3BA
0xF9 - xC3B9
0xF8 - xC3B8

When we store 'ú' in the database, the value read back from the MSSQL database is likewise encoded as two bytes, as shown below:

C3BA 'ú'
C3B9 'ù'
C3B8 'ø'

Is there any way to get the result from the DB in single bytes, like 0xFF instead of 0xC3BF?

sql-server-general


C3 BA is the valid UTF-8 representation of U+00FA. Why do you need FA, which on its own is not valid UTF-8? For debugging purposes?



Hi @parvathypriya-1048 ,

We have not received a response from you. Did the reply help? If it did, please select "Accept Answer". If it doesn't work, please let us know how far you got. This will benefit all community members who run into a similar issue. Your contribution is highly appreciated.

OlafHelper-2800 answered

Is there any way to get the result from the DB in single bytes, like 0xFF instead of 0xC3BF?

UTF-8 uses 1 to 4 bytes per character to represent the characters of the world's languages; how could that be converted to a single byte without losing data? It is not possible.

https://en.wikipedia.org/wiki/UTF-8
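
To illustrate why U+00FA cannot come out as a single byte in UTF-8, here is a minimal, portable sketch of the encoding rule. `EncodeUtf8` is a hypothetical helper written for this illustration; it is not part of the Windows API and not the code WideCharToMultiByte uses internally.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
std::string EncodeUtf8(std::uint32_t cp) {
    // Reject values outside Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // ASCII: the only range that fits in one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx (covers U+0080..U+07FF, so also U+00FA).
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Only code points below 0x80 take the one-byte branch, so 'ú' (U+00FA), 'ù' (U+00F9) and 'ø' (U+00F8) always land in the two-byte branch with C3 as the lead byte.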




SeeyaXi-msft answered SeeyaXi-msft edited

Hi @parvathypriya-1048

Please refer to this: https://www.programmersought.com/article/46484991586/

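 #include <windows.h>
 #include <string>

 std::string WstringToString() {
     const wchar_t* unicode = L"Hello, just world.";
     // Use the same code page for the sizing call and the conversion call.
     // CP_OEMCP yields one byte per character on single-byte OEM code pages;
     // switch to CP_UTF8 for UTF-8 output (non-ASCII characters then take 2-4 bytes).
     const UINT codePage = CP_OEMCP;
     // With cchWideChar = -1 the returned size includes the terminating null.
     int len = WideCharToMultiByte(codePage, 0, unicode, -1, NULL, 0, NULL, NULL);
     if (len <= 0) return std::string();  // conversion failed
     std::string result(len, '\0');
     WideCharToMultiByte(codePage, 0, unicode, -1, &result[0], len, NULL, NULL);
     result.resize(len - 1);  // drop the terminating null written by the API
     return result;
 }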

Best regards,
Seeya




ErlandSommarskog answered

Is there any way to get the result from the DB in single bytes, like 0xFF instead of 0xC3BF?

Yes, convert to a code page other than UTF-8. 1252 (i.e. Latin-1) should work for those characters. But this also means that if your data includes characters that are not in Latin-1, you will get fallback characters for them. For instance, Chinese characters will all come back as question marks.

But if you go for UTF-8, you will have to accept that it is a variable-length encoding.
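
As a rough, portable sketch of that trade-off: the hypothetical helper below (`ToLatin1Lossy`, a name made up for this example) narrows UTF-16 text to one byte per character and substitutes '?' for anything above U+00FF. It assumes plain ISO-8859-1 mapping, which differs from code page 1252 in the 0x80-0x9F range; on Windows you would instead pass 1252 as the code page to WideCharToMultiByte.

```cpp
#include <string>

// Hypothetical helper: narrows UTF-16 to one byte per code unit (ISO-8859-1 style).
// Anything above U+00FF has no single-byte form and is replaced with '?'.
std::string ToLatin1Lossy(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (char16_t ch : in) {
        // Code units 0x00..0xFF map directly to the byte of the same value.
        out += (ch <= 0xFF) ? static_cast<char>(ch) : '?';
    }
    return out;
}
```

With this mapping 'ú' becomes the single byte 0xFA, while a Chinese character degrades to a question mark. Because the sketch works per UTF-16 code unit, a character stored as a surrogate pair produces two question marks.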
