Working with UNICODE

From NetXMS Wiki

UNICODE build

On Windows, the build is always UNICODE. To enable a UNICODE build on Linux/UNIX, pass the --enable-unicode option to the configure script.


Data types and Macros

Name           Available in          Description
WCHAR          All builds            Wide character type, usually wchar_t.
TCHAR          All builds            Build-dependent character type: char in non-UNICODE builds, WCHAR in UNICODE builds.
UCS2CHAR       All builds            UCS-2 character type: wchar_t on platforms where sizeof(wchar_t) is 2, unsigned short otherwise.
_T(text)       All builds            Defines a text constant as a narrow ("text") string in non-UNICODE builds or a wide (L"text") string in UNICODE builds.
UNICODE        UNICODE builds only   Defined if the build is UNICODE-enabled.
UNICODE_UCS2   All builds            Defined if the system natively uses UCS-2 as its wide character set.
UNICODE_UCS4   All builds            Defined if the system natively uses UCS-4 as its wide character set.


Conversion Functions

The basic conversion functions are WideCharToMultiByte and MultiByteToWideChar. They are standard Windows API functions (see http://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%29.aspx and http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%29.aspx). On other platforms they are implemented in libnetxms. Typical usage examples:

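   // Example 1: CalculateMD5Hash works on bytes, so in a UNICODE build
   // the wide-character secret is converted to a multibyte string first
   pRequest->GetVariableBinary(VID_SHARED_SECRET, (BYTE *)szSecret, MD5_DIGEST_SIZE);
#ifdef UNICODE
   {
      char sharedSecret[256];
      WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK | WC_DEFAULTCHAR, g_szSharedSecret, -1, sharedSecret, 256, NULL, NULL);
      sharedSecret[255] = 0;   // keep the buffer terminated even if the secret did not fit
      CalculateMD5Hash((BYTE *)sharedSecret, strlen(sharedSecret), hash);
   }
#else
   CalculateMD5Hash((BYTE *)g_szSharedSecret, strlen(g_szSharedSecret), hash);
#endif

   // Example 2: optarg (from getopt) is always a char string, so in a UNICODE
   // build it is converted before being stored in a TCHAR buffer
#ifdef UNICODE
   MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, optarg, -1, g_windowsEventSourceName, MAX_PATH);
   g_windowsEventSourceName[MAX_PATH - 1] = 0;   // same termination safeguard
#else
   nx_strncpy(g_windowsEventSourceName, optarg, MAX_PATH);
#endif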

There are also various helper functions to simplify conversion:

/* Create wide character string from multibyte string using current code page */
WCHAR *WideStringFromMBString(const char *pszString);
 
/* Create wide character string from UTF-8 string */
WCHAR *WideStringFromUTF8String(const char *pszString);
 
/* Create multibyte string from wide character string using current code page */
char *MBStringFromWideString(const WCHAR *pwszString);
 
/* Create UTF-8 string from wide character string */
char *UTF8StringFromWideString(const WCHAR *pwszString);

All these functions return dynamically allocated strings, which the caller must release with free().


Important Notes

The value of sizeof(WCHAR) varies between platforms. On Windows it is always 2 bytes (UCS-2), while on Linux it is usually 4 bytes (UCS-4). Text data in NXCP is always encoded in UCS-2. The CSCPMessage class usually performs all necessary conversions, but if you need to convert between UCS-2 and UCS-4 yourself, there are special functions for that:

size_t ucs2_to_ucs4(const UCS2CHAR *src, int srcLen, WCHAR *dst, int dstLen);
size_t ucs4_to_ucs2(const WCHAR *src, int srcLen, UCS2CHAR *dst, int dstLen);
UCS2CHAR *UCS2StringFromUCS4String(const WCHAR *pwszString);
WCHAR *UCS4StringFromUCS2String(const UCS2CHAR *pszString);

UCS2StringFromUCS4String and UCS4StringFromUCS2String return dynamically allocated strings, which the caller must release with free().

If you need to work with UCS-2 text on a UCS-4 platform, the following helper functions are available:

/* Length of UCS-2 string in characters */
int ucs2_strlen(const UCS2CHAR *pStr);
 
/* strncpy for UCS-2 strings */
UCS2CHAR *ucs2_strncpy(UCS2CHAR *pDst, const UCS2CHAR *pSrc, int nDstLen);
 
/* strdup for UCS-2 strings */
UCS2CHAR *ucs2_strdup(const UCS2CHAR *pStr);
 
/* Convert UCS-2 string to UTF-8 string */
size_t ucs2_to_utf8(const UCS2CHAR *src, int srcLen, char *dst, int dstLen);
 
/* Convert UCS-2 string to multibyte string (using current code page) */
size_t ucs2_to_mb(const UCS2CHAR *src, int srcLen, char *dst, int dstLen);
 
/* Convert multibyte string to UCS-2 string (using current code page) */
size_t mb_to_ucs2(const char *src, int srcLen, UCS2CHAR *dst, int dstLen);
 
/* Create UCS-2 string from multibyte string (using current code page) */
UCS2CHAR *UCS2StringFromMBString(const char *pszString);
 
/* Create multibyte string from UCS-2 string (using current code page) */
char *MBStringFromUCS2String(const UCS2CHAR *pszString);

UCS2StringFromMBString and MBStringFromUCS2String return dynamically allocated strings, which the caller must release with free().