COMMENT This algorithm maps a Unicode string encoded in the specified codepage to UTF-16.
It requires the following externally specified values:
1) CodePage: An integer value to represent an ANSI codepage value.
If CodePage value is CP_ACP (0), use the system default ANSI codepage from
the OS.
If CodePage value is CP_OEMCP (1), use the system default OEM codepage from
the OS.2) MultiByteString: A string encoded in ANSI codepage. Every
character can be an 8-bit (byte) unsigned value or two 8-bit
unsigned values.
3) MultiByteStringLength: The length in bytes, including
the byte for terminating null character. When MultiByteStringLength is 0,
the length is decided by counting from the beginning of the string to a
null character (0x00), including the null character.
4) UnicodeString: A string encoded in UTF-16. Every Unicode code point
is an unsigned 16-bit ("WORD") value. Surrogate pair is not
supported in this algorithm.
5) UnicodeStringLength: The string length in 16-bit ("WORD") unit for
UnicodeString. When UnicodeStringLength is 0,
the UnicodeString value will not be used in this algorithm.
Instead, the length of the result string in UTF-16 will be
returned.
PROCEDURE MultiByteToWideCharFromCodepageDataFile
IF CodePage is CP_ACP THEN
COMMENT Windows keeps a systemwide value of
default ANSI system codepage. It is used to provide a default
COMMENT system codepage to be used by legacy ANSI application.
SET CodePage to the default ANSI system codepage from Windows.
ELSE IF CodePage is CP_OEMCP THEN
COMMENT Windows keeps a systemwide value of
default OEM system codepage. It is used to provide a default
COMMENT system codepage to be used by legacy console application.
SET CodePage to the default OEM system codepage from Windows.
ENDIF
IF CodePage is CP_UTF8 THEN
CALL Utf8ConversionAlgorithm
COMMENT For UTF-8 use the algorithm in 3.1.5.1.6
RETURN
ENDIF
IF MultiByteStringLength is 0 THEN
COMPUTE UnicodeStringLength as the string length in 8-bit units
of MultiByteString as a null-terminated string, including
terminating null character.
ENDIF
IF UnicodeStringLength is 0 THEN
SET IsCountingOnly to True
ELSE
SET IsCountingOnly to False
ENDIF
SET CodePageFileName to the concatenation of
CodePage as a string, and ".txt"
OPEN SECTION CodePageInfo where section name is CPINFO from file
with the name of CodePageFileName
COMMENT Read the codepage type.
COMMENT The value for Single Byte Code Page (SBCS) is 1
COMMENT The value for Double Byte Code Page (DBCS) is 2
SET CodePageType to CodePageInfo.Field1
SET DefaultUnicodeChar to CodePageInfo.Field3
OPEN SECTION SingleByteMapping where section name is MBTABLE from file
with the name of CodePageFileName
SET MultiByteIndex = 0
WHILE MultiByteIndex <= to MultiByteStringLength - 1
SET MultiByteChar = MultiByteString[MultiByteIndex]
IF CodePageType is 1 THEN
COMMENT SBCS codepage
COMMENT Select a record which contains the mapping data
SELECT MappingData from SingleByteMapping
where field 1 matches MultiByteChar
IF MappingData is null THEN
COMMENT There is no mapping for this single-byte character, use
COMMENT the default character
IF IsCountingOnly is False THEN
SET MultiByteString[ResultUnicodeLength]
to DefaultUnicodeChar
ENDIF
INCREMENT ResultMultiByteLength
INCREMENT MultiByteIndex
CONTINUE WHILE loop
ENDIF
IF IsCountOnly is False THEN
SET UnicodeString[ResultUnicodeLength]
to MappingData.Field2
ENDIF
INCREMENT ResultUnicodeLength
ELSE
COMMENT DBCS codepage
COMMENT First, try if this is a single-byte mapping
SELECT MappingData from SingleByteMapping
where field 1 matches MultiByteChar
IF MappingData is not null THEN
COMMENT This byte is a single-byte character
IF IsCountOnly is False THEN
SET UnicodeString[ResultUnicodeLength]
to MappingData.Field2
ENDIF
INCREMENT ResultUnicodeLength
ELSE
COMMENT Not a single-byte character
COMMENT Check if this is a valid lead byte for double byte mapping
OPEN SECTION DBCSRanges
where section name is DBCSRANGE from file
with the name of CodePageFileName
COMMENT Read the count of DBCS Range count
SET DBCSRangeCount to DBCSRanges.Field1
SET ValidDBCS to False
COMMENT Enumerate through every DBCSRange record to see if
COMMENT the MultiByteChar is a leading byte
FOR Counter i = 1 to DBCSRangeCount
COMMENT Select the current record
SELECT DBCSRangeRecord from DBCSRanges
SET LeadByteStart to DBCSRangeRecord.Field1
SET LeadByteEnd to DBCSRangeRecord.Field2
IF MultiByteChar is larger or equal to LeadByteStart AND
MultiByteChar is less or equal to LeadByteEnd THEN
COMMENT This is a valid lead byte
COMMENT Now check if there is a following valid trailing byte
SET LeadByteTableCount = MultiByteChar – LeadByteStart
COMMENT Select the current DBCSTABLE section
OPEN SECTION DBCSTableSection from DBCSRanges
where section name is DBCSTABLE
COMMMENT Advance to the right DBCSTABLE section
FOR LeadByteIndex = 0 to LeadByteTableCount
ADVANCE SECTION DBCSTableSection
NEXTFOR
COMMENT Check if the trailing byte is valid
IF MultiByteIndex + 1 is less than MultiByteStringLength THEN
SET TrailByteChar to MultiByteString[MultiByteIndex + 1]
SELECT MappingData FROM DBCSTABLE
Where field 1 matches TrailgByteChar
IF MappingData is not null THEN
COMMENT Valid trailing byte
SET ValidDBCS to True
IF IsCountingOnly is FALSE THEN
SET UnicodeString[ResultUnicodeLength] to MappingData.Field2
ENDIF
INCREMENT ResultUnicodeLength
COMMENT Increment the MultiByteIndex.
COMMENT Note that the MultiByteIndex will
COMMENT be incremented again for the WHILE loop
INCREMENT MultiByteIndex
EXIT FOR loop
ENDIF
ENDIF
ENDIF
COMMENT No valid lead byte is found. Advance to next record
ADVANCE DBCSRangeRecord
NEXTFOR
IF ValidDBCS is FALSE THEN
COMMENT There is no valid leading byte/trailing byte sequence
If IsCountingOnly is FALSE THEN
SET UnicodeString[ResultUnicodeLength] to DefaultUnicodeChar
ENDIF
INCREMENT MultiByteIndex
INCREMENT ResultUnicodeLength
ENDIF
ENDIF
ENDIF
INCREMENT MultiByteIndex
ENDWHILE
RETURN ResultMultiByteLength as a 32-bit unsigned integer