C++ char32_t Keyword
The char32_t keyword in C++ names a character type introduced in C++11 for representing UTF-32 code units. It is primarily used for handling UTF-32 encoded text, in which every Unicode code point is stored as a single 32-bit value. char32_t is a distinct type with the same size and representation as uint_least32_t, so it is at least 32 bits wide and can hold any Unicode code point.
Character and string literals of type char32_t are prefixed with U. A string literal such as U"text" is an array of const char32_t, which converts to const char32_t* when assigned to a pointer.
Syntax
char32_t variable_name = U'character';
const char32_t* string_name = U"string";
- char32_t
- The keyword used to declare a variable to store a 32-bit Unicode character.
- variable_name
- The name of the variable that stores the Unicode character.
- U
- A prefix used for UTF-32 encoded strings or characters.
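The types behind this syntax can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; literal_length is a hypothetical helper introduced only for illustration and is not part of the standard library.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

// A U'...' character literal has type char32_t.
static_assert(std::is_same<decltype(U'A'), char32_t>::value,
              "U'A' should be a char32_t");

// A U"..." string literal is an array of const char32_t (terminating null
// included); it converts to const char32_t* when assigned to a pointer.
static_assert(std::is_same<decltype(U"Hi"), const char32_t (&)[3]>::value,
              "U\"Hi\" should be an array of 3 const char32_t");

// char32_t is at least 32 bits wide.
static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
              "char32_t should be at least 32 bits");

// Hypothetical helper: number of code units in a U"..." literal,
// not counting the terminating null.
template <std::size_t N>
constexpr std::size_t literal_length(const char32_t (&)[N]) {
    return N - 1;
}
```

If the file compiles, all three static_assert checks hold. Calling literal_length(U"Hello") would yield 5, because each ASCII letter occupies one char32_t element.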
Examples
Example 1: Declaring a UTF-32 Character
This example demonstrates how to declare a char32_t variable and print its value as a Unicode character and integer.
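#include <iostream>
using namespace std;

int main() {
    char32_t ch = U'A'; // Declare a UTF-32 character
    // Cast to char to print the character (safe here because 'A' is ASCII)
    cout << "Character: " << static_cast<char>(ch) << endl;
    // Cast to int to print the numeric code point
    cout << "Unicode Value: " << static_cast<int>(ch) << endl;
    return 0;
}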
Output:
Character: A
Unicode Value: 65
Explanation:
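- The char32_t variable ch is initialized with U'A', representing a UTF-32 character.
- The character is cast to char for display in the first output line. This works only because 'A' lies in the ASCII range; a code point above 127 would not survive the cast.
- The character is cast to int to display its Unicode value, which is 65.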
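Code points are conventionally written in U+XXXX hexadecimal notation rather than decimal. As a rough sketch of how the integer value above could be formatted that way, the hypothetical helper code_point_label below (a name invented for this illustration) uses std::snprintf:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a code point in the conventional U+XXXX
// notation, using at least four uppercase hexadecimal digits.
std::string code_point_label(char32_t cp) {
    char buffer[16]; // "U+" plus up to 8 hex digits plus the terminator
    std::snprintf(buffer, sizeof buffer, "U+%04lX",
                  static_cast<unsigned long>(cp));
    return std::string(buffer);
}
```

With this helper, code_point_label(U'A') returns "U+0041", which corresponds to the decimal value 65 printed in the example.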
Example 2: Declaring and Printing a UTF-32 String
This example demonstrates how to declare and print a UTF-32 encoded string using char32_t.
#include <iostream>
#include <string>
#include <codecvt> // For std::codecvt_utf8 (deprecated in C++17)
#include <locale>  // For std::wstring_convert (deprecated in C++17)

int main() {
    // UTF-32 encoded string
    const char32_t* greeting = U"Hello, UTF-32!";

    // Convert to std::u32string
    std::u32string utf32_string(greeting);

    // Convert UTF-32 to UTF-8 using std::wstring_convert
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    std::string utf8_string = converter.to_bytes(utf32_string);

    // Output the UTF-8 string
    std::cout << "Message: " << utf8_string << std::endl;
    return 0;
}
Output:
Message: Hello, UTF-32!
Explanation:
- UTF-32 String Declaration: The string greeting is declared as const char32_t* to represent a UTF-32 encoded literal.
- Convert to std::u32string: The greeting pointer is converted to a std::u32string for compatibility with conversion utilities.
- UTF-32 to UTF-8 Conversion: The std::wstring_convert class is used with std::codecvt_utf8<char32_t> to convert the UTF-32 encoded string to a UTF-8 encoded std::string.
- Output with std::cout: The UTF-8 encoded string is printed using std::cout.
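Because std::wstring_convert and <codecvt> are deprecated since C++17, the same conversion can also be written by hand. The following is a minimal sketch rather than a replacement for a full Unicode library: append_utf8 and to_utf8 are hypothetical names, and substituting U+FFFD for invalid input is an assumption of this sketch, not behavior taken from the example above.

```cpp
#include <string>

// Hypothetical helper: append one code point to `out` as UTF-8.
// Assumption: surrogates and values above U+10FFFF are replaced by U+FFFD.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD; // replacement character
    }
    if (cp < 0x80) {                // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                        // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-8.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        append_utf8(out, cp);
    }
    return out;
}
```

A call such as std::cout << to_utf8(U"Hello, UTF-32!") would then print the same message as the example, without relying on the deprecated facilities.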
Example 3: Working with Non-ASCII Characters
This example shows how to use char32_t with UTF-32 encoded non-ASCII characters.
#include <iostream>
#include <string>
#include <codecvt> // For std::codecvt_utf8 (deprecated in C++17)
#include <locale>  // For std::wstring_convert (deprecated in C++17)

int main() {
    // UTF-32 encoded string (Japanese for "Hello")
    const char32_t* japanese = U"こんにちは";

    // Convert to std::u32string
    std::u32string utf32_string(japanese);

    // Convert UTF-32 to UTF-8
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    std::string utf8_string = converter.to_bytes(utf32_string);

    // Print the UTF-8 encoded string
    std::cout << "UTF-8 String: " << utf8_string << std::endl;
    return 0;
}
Output:
UTF-8 String: こんにちは
Explanation:
- UTF-32 String Declaration: The japanese string is a UTF-32 encoded string literal referenced through a const char32_t*.
- Conversion to std::u32string: The raw UTF-32 pointer is wrapped in a std::u32string to simplify handling and compatibility with conversion utilities.
- UTF-32 to UTF-8 Conversion: The std::wstring_convert class is used along with std::codecvt_utf8<char32_t> to convert the UTF-32 string to a UTF-8 encoded std::string.
- Output with std::cout: The UTF-8 encoded string is printed using std::cout, which writes the bytes unchanged. The text displays correctly only if the terminal interprets its input as UTF-8.
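One consequence of the conversion is that the two strings have different lengths: the UTF-32 string holds one element per code point, while the UTF-8 string needs several bytes for each Japanese character. The sketch below illustrates this under the assumption that the input contains only valid code points; utf8_length and konnichiwa are hypothetical names, and the greeting is spelled with universal character names so the sketch does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes the text would occupy in UTF-8.
// Assumption: every element is a valid Unicode code point.
std::size_t utf8_length(const std::u32string& text) {
    std::size_t bytes = 0;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            bytes += 1;
        } else if (cp < 0x800) {
            bytes += 2;
        } else if (cp < 0x10000) {
            bytes += 3;
        } else {
            bytes += 4;
        }
    }
    return bytes;
}

// The greeting from the example: U+3053 U+3093 U+306B U+3061 U+306F.
const std::u32string konnichiwa = U"\u3053\u3093\u306B\u3061\u306F";
```

Here konnichiwa.size() is 5, one element per character, while utf8_length(konnichiwa) is 15, since each of these characters takes three bytes in UTF-8.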
Key Points about char32_t Keyword
- char32_t is a character type of at least 32 bits, introduced in C++11 for handling UTF-32 encoded characters.
- Character and string literals of type char32_t must be prefixed with U.
- A single char32_t can hold any Unicode code point, so each code point occupies exactly one element of a UTF-32 string.
- Outputting char32_t data requires a cast or a conversion to another encoding such as UTF-8, as std::cout does not print char32_t values as characters.
- Use char32_t when working with UTF-32 strings, for example when each code point must be addressable as a single fixed-width element.
