C++ - Convert byte array from UTF-16 to UTF-8
I have a byte array:
uint8_t array[] = {0x00, 0x72, 0x00, 0x6f, 0x00, 0x6f, 0x00, 0x74};
I know that, as text, it is "root". I have a function that should convert UTF-16 to UTF-8. Here is the code:
static inline bool ucnvconvert(const char *enc_from, const char *enc_to,
                               const std::vector<char> &from, std::string* const to)
{
    if (from.empty()) {
        to->clear();
        return true;
    }

    // Each 2-byte UTF-16 code unit needs at most 3 bytes in UTF-8.
    unsigned int maxoutsize = from.size() * 3 + 1;
    std::vector<char> outbuf(maxoutsize);

    // iconv_open takes (tocode, fromcode) and returns (iconv_t)-1 on failure.
    iconv_t c = iconv_open(enc_to, enc_from);
    assert_msg(c != (iconv_t)-1, "convert: illegal encodings");

    char *from_ptr = const_cast<char*>(from.data());
    char *to_ptr = &outbuf[0];
    size_t inleft = from.size(), outleft = maxoutsize;
    size_t n = iconv(c, &from_ptr, &inleft, &to_ptr, &outleft);

    bool success = true;
    if (n == (size_t)-1) {
        success = false;
        if (errno == E2BIG) {
            elog("convert: insufficient space from");
        } else if (errno == EILSEQ) {
            elog("convert: invalid input sequence");
        } else if (errno == EINVAL) {
            elog("convert: incomplete input sequence");
        }
    }

    if (success) {
        // Copy only the bytes that iconv actually wrote.
        to->assign(&outbuf[0], maxoutsize - outleft);
    }

    iconv_close(c);
    return success;
}

inline bool convertucs2toutf8(const std::vector<char> &from, std::string* const to)
{
    return ucnvconvert("utf-16", "utf-8", from, to);
}
It works great with Cyrillic (where the high byte is 0x04), but when I try to pass my array to it, I get:
爀漀漀琀开㌀㜀
and so on... What's wrong here?
The byte order must be specified for UTF-16 input. Since you are passing a UTF-16BE
(big-endian) encoded buffer, you should prefix it with the appropriate byte order mark:
uint8_t array[] = { 0xfe, 0xff, 0x00, 0x72, 0x00, 0x6f, 0x00, 0x6f, 0x00, 0x74 };
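Prefixing a byte order mark does work, but it may also leave a byte order mark in the UTF-8 output, which you might not want. A more effective way is to specify the endianness in the encoding name: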
ucnvconvert("utf-16be", "utf-8", from, to);
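For reference, here is a self-contained sketch of the same idea that does not rely on the logging and assertion macros from the question. The function name `utf16be_to_utf8` is hypothetical, and the sketch assumes the POSIX/glibc `iconv` signature that takes `char **` for the input pointer; on some platforms you may also need to link with `-liconv`.

```cpp
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert big-endian UTF-16 bytes (no BOM) to UTF-8.
inline std::string utf16be_to_utf8(const std::vector<char>& from) {
    if (from.empty()) return std::string();

    // iconv_open takes (tocode, fromcode) and returns (iconv_t)-1 on failure.
    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    // A 2-byte code unit needs at most 3 UTF-8 bytes, so 3x the input size
    // is a generous upper bound for the output buffer.
    std::vector<char> out(from.size() * 3);

    char* in_ptr = const_cast<char*>(from.data());
    char* out_ptr = out.data();
    std::size_t in_left = from.size();
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) throw std::runtime_error("iconv conversion failed");

    // Keep only the bytes that iconv actually wrote.
    return std::string(out.data(), out.size() - out_left);
}
```

Passing the eight bytes from the question (0x00, 0x72, 0x00, 0x6f, 0x00, 0x6f, 0x00, 0x74) to this function is expected to return "root", with no byte order mark in the result.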