3.1.5.1.1.3 Pseudocode for Mapping a Codepage String to a UTF-16 String

  
 COMMENT  This algorithm maps a Unicode string encoded in the specified codepage to UTF-16.
   
 It requires the following externally specified values:
  
 1) CodePage: An integer value to represent an ANSI codepage value.
  
    If CodePage value is CP_ACP (0), use the system default ANSI codepage from 
    the OS.
    If CodePage value is CP_OEMCP (1), use the system default OEM codepage from 
    the OS.2) MultiByteString: A string encoded in ANSI codepage. Every 
    character can be an 8-bit (byte) unsigned value or two 8-bit
    unsigned values.
  
 3) MultiByteStringLength: The length in bytes, including
    the byte for terminating null character. When MultiByteStringLength is 0,
    the length is decided by counting from the beginning of the string to a
    null character (0x00), including the null character.
  
 4) UnicodeString: A string encoded in UTF-16. Every Unicode code point 
    is an unsigned 16-bit ("WORD") value. Surrogate pair is not 
    supported in this algorithm.
  
 5) UnicodeStringLength: The string length in 16-bit ("WORD") unit for 
    UnicodeString. When UnicodeStringLength is 0, 
    the UnicodeString value will not be used in this algorithm.
    Instead, the length of the result string in UTF-16 will be
    returned.
  
  
 PROCEDURE MultiByteToWideCharFromCodepageDataFile
  
 IF CodePage is CP_ACP THEN
     COMMENT Windows keeps a systemwide value of 
             default ANSI system codepage. It is used to provide a default
     COMMENT system codepage to be used by legacy ANSI application.
             
     SET CodePage to the default ANSI system codepage from Windows. 
             
 ELSE IF CodePage is CP_OEMCP THEN
     COMMENT Windows keeps a systemwide value of 
             default OEM system codepage. It is used to provide a default
     COMMENT system codepage to be used by legacy console application.
             
     SET CodePage to the default OEM system codepage from Windows. 
             
 ENDIF
 
 IF CodePage is CP_UTF8 THEN   
     CALL Utf8ConversionAlgorithm             
     COMMENT For UTF-8 use the algorithm in 3.1.5.1.6
     RETURN
 
 ENDIF
  
 IF MultiByteStringLength is 0 THEN
     COMPUTE UnicodeStringLength as the string length in 8-bit units 
             of MultiByteString as a null-terminated string, including
             terminating null character.
 ENDIF
  
 IF UnicodeStringLength is 0 THEN
     SET IsCountingOnly to True
 ELSE
     SET IsCountingOnly to False
 ENDIF
  
 SET CodePageFileName to the concatenation of 
 CodePage as a string, and ".txt"
  
 OPEN SECTION CodePageInfo where section name is CPINFO from file 
     with the name of CodePageFileName
 COMMENT Read the codepage type.
 COMMENT The value for Single Byte Code Page (SBCS) is 1
 COMMENT The value for Double Byte Code Page (DBCS) is 2
  
 SET CodePageType to CodePageInfo.Field1
 SET DefaultUnicodeChar to CodePageInfo.Field3
  
 OPEN SECTION SingleByteMapping where section name is MBTABLE from file 
     with the name of CodePageFileName
  
 SET MultiByteIndex = 0
 WHILE MultiByteIndex <= to MultiByteStringLength - 1
      SET MultiByteChar = MultiByteString[MultiByteIndex]
      IF CodePageType is 1 THEN
          COMMENT SBCS codepage
          COMMENT Select a record which contains the mapping data
          SELECT MappingData from SingleByteMapping
             where field 1 matches MultiByteChar
          IF MappingData is null THEN
              COMMENT There is no mapping for this single-byte character, use
              COMMENT the default character
              IF IsCountingOnly is False THEN
                  SET MultiByteString[ResultUnicodeLength]
                      to DefaultUnicodeChar
              ENDIF
              INCREMENT ResultMultiByteLength
              INCREMENT MultiByteIndex
              CONTINUE WHILE loop
          ENDIF
          IF IsCountOnly is False THEN
              SET UnicodeString[ResultUnicodeLength]
                    to MappingData.Field2
          ENDIF
          INCREMENT ResultUnicodeLength
     ELSE
          COMMENT DBCS codepage
          COMMENT First, try if this is a single-byte mapping
          SELECT MappingData from SingleByteMapping
             where field 1 matches MultiByteChar
          IF MappingData is not null THEN
              COMMENT This byte is a single-byte character
              IF IsCountOnly is False THEN
                  SET UnicodeString[ResultUnicodeLength]
                      to MappingData.Field2
              ENDIF
              INCREMENT ResultUnicodeLength
          ELSE
              COMMENT Not a single-byte character
              COMMENT Check if this is a valid lead byte for double byte mapping
              OPEN SECTION DBCSRanges
                  where section name is DBCSRANGE from file 
                  with the name of CodePageFileName
  
              COMMENT Read the count of DBCS Range count
              SET DBCSRangeCount to DBCSRanges.Field1
  
              SET ValidDBCS to False
              COMMENT Enumerate through every DBCSRange record to see if
              COMMENT the MultiByteChar is a leading byte
  
              FOR Counter i = 1 to DBCSRangeCount
                  COMMENT Select the current record
                  SELECT DBCSRangeRecord from DBCSRanges
                  SET LeadByteStart to DBCSRangeRecord.Field1
                  SET LeadByteEnd to DBCSRangeRecord.Field2
                  IF MultiByteChar is larger or equal to LeadByteStart AND
                     MultiByteChar is less or equal to LeadByteEnd THEN
                      COMMENT This is a valid lead byte
                      COMMENT Now check if there is a following valid trailing byte
                      SET LeadByteTableCount = MultiByteChar – LeadByteStart
  
                      COMMENT Select the current DBCSTABLE section
                      OPEN SECTION DBCSTableSection from DBCSRanges 
                         where section name is DBCSTABLE
                      COMMMENT Advance to the right DBCSTABLE section
                      FOR LeadByteIndex = 0 to LeadByteTableCount
                          ADVANCE SECTION DBCSTableSection
                      NEXTFOR
                      COMMENT Check if the trailing byte is valid
                      IF MultiByteIndex + 1 is less than MultiByteStringLength THEN
                          SET TrailByteChar to MultiByteString[MultiByteIndex + 1]
                          SELECT MappingData FROM DBCSTABLE 
                              Where field 1 matches TrailgByteChar
                          IF MappingData is not null THEN
                              COMMENT Valid trailing byte
                              SET ValidDBCS to True
                              IF IsCountingOnly is FALSE THEN
                                  SET UnicodeString[ResultUnicodeLength] to MappingData.Field2
                              ENDIF
                              INCREMENT ResultUnicodeLength
                              COMMENT Increment the MultiByteIndex. 
                              COMMENT Note that the MultiByteIndex will
                              COMMENT be incremented again for the WHILE loop 
                              INCREMENT MultiByteIndex
                              EXIT FOR loop
                          ENDIF
                      ENDIF
                  ENDIF
              COMMENT No valid lead byte is found. Advance to next record
              ADVANCE DBCSRangeRecord
              NEXTFOR
              IF ValidDBCS is FALSE THEN
                  COMMENT There is no valid leading byte/trailing byte sequence
                  If IsCountingOnly is FALSE THEN
                      SET UnicodeString[ResultUnicodeLength] to DefaultUnicodeChar
                  ENDIF
                  INCREMENT MultiByteIndex
                  INCREMENT ResultUnicodeLength
              ENDIF
          ENDIF    
     ENDIF
     INCREMENT MultiByteIndex
 ENDWHILE
  
 RETURN ResultMultiByteLength as a 32-bit unsigned integer