Working with UNICODE

From NetXMS Wiki
This wiki is deprecated; we are currently migrating the remaining pages into the product documentation (Admin Guide, NXSL Guide).

UNICODE Build

The Windows build is always UNICODE. To enable a UNICODE build on Linux/UNIX, pass the --enable-unicode option to the configure script.


Data types and Macros

Name         | Defined in          | Description
-------------+---------------------+------------------------------------------------------------------------------------------
WCHAR        | All builds          | Wide character type, usually wchar_t.
TCHAR        | All builds          | Build-dependent character type: expands to char in a non-UNICODE build and to WCHAR in a UNICODE build.
UCS2CHAR     | All builds          | UCS-2 character type: expands to wchar_t on platforms where sizeof(wchar_t) is 2, and to unsigned short otherwise.
_T(text)     | All builds          | Defines a text constant as either a narrow or a wide character string, depending on the build.
UNICODE      | UNICODE builds only | Defined if the build is UNICODE-enabled.
UNICODE_UCS2 | All builds          | Defined if the system natively uses UCS-2 as its wide character set.
UNICODE_UCS4 | All builds          | Defined if the system natively uses UCS-4 as its wide character set.


Conversion Functions

The basic conversion functions are WideCharToMultiByte and MultiByteToWideChar. These are standard Windows API functions (http://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%29.aspx, http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%29.aspx). On other platforms, libnetxms provides its own implementations. Typical usage examples:

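   /* Example 1: convert a wide-character secret to a narrow string before hashing it */
   pRequest->GetVariableBinary(VID_SHARED_SECRET, (BYTE *)szSecret, MD5_DIGEST_SIZE);
#ifdef UNICODE
   {
      char sharedSecret[256];
      /* Convert using the current ANSI code page; -1 means the source is null-terminated */
      WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK | WC_DEFAULTCHAR, g_szSharedSecret, -1, sharedSecret, 256, NULL, NULL);
      sharedSecret[255] = 0;   /* ensure termination even if the converted text did not fit */
      CalculateMD5Hash((BYTE *)sharedSecret, strlen(sharedSecret), hash);
   }
#else
   /* Non-UNICODE build: the string is already narrow, hash it directly */
   CalculateMD5Hash((BYTE *)g_szSharedSecret, strlen(g_szSharedSecret), hash);
#endif

   /* Example 2: copy a narrow command-line argument into a TCHAR buffer */
#ifdef UNICODE
   /* The last argument is the destination size in WCHARs, not bytes */
   MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, optarg, -1, g_windowsEventSourceName, MAX_PATH);
   g_windowsEventSourceName[MAX_PATH - 1] = 0;
#else
   /* Non-UNICODE build: no conversion needed, just copy */
   nx_strncpy(g_windowsEventSourceName, optarg, MAX_PATH);
#endif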

There are also various helper functions to simplify conversion:

/* Create wide character string from multibyte string using current code page */
WCHAR *WideStringFromMBString(const char *pszString);

/* Create wide character string from UTF-8 string */
WCHAR *WideStringFromUTF8String(const char *pszString);

/* Create multibyte string from wide character string using current code page */
char *MBStringFromWideString(const WCHAR *pwszString);

/* Create UTF-8 string from wide character string */
char *UTF8StringFromWideString(const WCHAR *pwszString);

All of these functions return dynamically allocated strings, which the caller must release with free().


Important Notes

The value of sizeof(WCHAR) varies between platforms: on Windows it is always 2 bytes (UCS-2), while on Linux it is usually 4 bytes (UCS-4). Text data in NXCP is always encoded in UCS-2. The CSCPMessage class usually performs all necessary conversions, but if you need to convert between UCS-2 and UCS-4 yourself, the following functions are available:

size_t ucs2_to_ucs4(const UCS2CHAR *src, int srcLen, WCHAR *dst, int dstLen);
size_t ucs4_to_ucs2(const WCHAR *src, int srcLen, UCS2CHAR *dst, int dstLen);
UCS2CHAR *UCS2StringFromUCS4String(const WCHAR *pwszString);
WCHAR *UCS4StringFromUCS2String(const UCS2CHAR *pszString);

The functions UCS2StringFromUCS4String and UCS4StringFromUCS2String return dynamically allocated strings, which the caller must release with free().

If you need to work with UCS-2 text on a UCS-4 platform, the following helper functions are available:

/* Length of UCS-2 string in characters */
int ucs2_strlen(const UCS2CHAR *pStr);

/* strncpy for UCS-2 strings */
UCS2CHAR *ucs2_strncpy(UCS2CHAR *pDst, const UCS2CHAR *pSrc, int nDstLen);

/* strdup for UCS-2 strings */
UCS2CHAR *ucs2__tcsdup(const UCS2CHAR *pStr);

/* Convert UCS-2 string to UTF-8 string */
size_t ucs2_to_utf8(const UCS2CHAR *src, int srcLen, char *dst, int dstLen);

/* Convert UCS-2 string to multibyte string (using current code page) */
size_t ucs2_to_mb(const UCS2CHAR *src, int srcLen, char *dst, int dstLen);

/* Convert multibyte string to UCS-2 string (using current code page) */
size_t mb_to_ucs2(const char *src, int srcLen, UCS2CHAR *dst, int dstLen);

/* Create UCS-2 string from multibyte string (using current code page) */
UCS2CHAR *UCS2StringFromMBString(const char *pszString);

/* Create multibyte string from UCS-2 string (using current code page) */
char *MBStringFromUCS2String(const UCS2CHAR *pszString);

The functions UCS2StringFromMBString and MBStringFromUCS2String return dynamically allocated strings, which the caller must release with free().