String Internals
How C# string semantics are mapped to std::string in sharp-runtime.
The Core Mapping
A C# string (a reference type, immutable, UTF-16 encoded) maps to a C++ std::string
(a value type, mutable, with no enforced encoding; UTF-8 by convention). The move from immutable
reference semantics to mutable value semantics, together with the change of encoding, is the most
significant semantic difference in the string story.
| Property | C# string | sharp-runtime std::string |
|---|---|---|
| Type category | Reference type | Value type (copyable) |
| Mutability | Immutable | Mutable |
| Encoding | UTF-16 | UTF-8 (by convention) |
| Null | Can be null | Cannot be null (empty string is the convention) |
| Char type | char = UTF-16 code unit | char = 8-bit byte |
| Length | s.Length = UTF-16 code units | s.size() = bytes |
| Equality | Value equality by default | Value equality via == |
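The "Type category" and "Mutability" rows can be illustrated with plain std::string and no sharp-runtime headers. This is a minimal sketch; `with_suffix` and `value_semantics_example` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Taking the parameter by value copies the caller's string, so the
// mutation below is invisible to the caller.
std::string with_suffix(std::string s) {
    s += "!";   // in-place mutation, which a C# string does not allow
    return s;
}

void value_semantics_example() {
    std::string original = "hello";
    std::string copy = original;   // independent copy, not a second reference
    copy[0] = 'J';                 // std::string is mutable
    assert(original == "hello");   // the original is unaffected
    assert(copy == "Jello");
    assert(with_suffix(original) == "hello!");
    assert(original == "hello");   // the argument was copied, not shared
}
```

Ported code that relied on passing strings around cheaply by reference may prefer `const std::string&` parameters to avoid the copies shown here.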
Why std::string?
Using std::string rather than a custom class avoids the overhead and complexity
of a managed string type. It interoperates naturally with all C++ standard library APIs.
The trade-off is that callers must be aware of the encoding difference when dealing with
non-ASCII characters.
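To make the encoding trade-off concrete, the sketch below counts how many UTF-16 code units a UTF-8 std::string would occupy, which is the value C# reports as s.Length. `utf16_length` is a hypothetical helper, not a sharp-runtime function, and it assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-16 code units needed for a UTF-8 string.
// Assumes valid UTF-8; malformed input is not detected.
std::size_t utf16_length(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80) continue;   // continuation byte: belongs to a counted lead byte
        units += (byte >= 0xF0) ? 2 : 1;       // 4-byte sequences become a surrogate pair
    }
    return units;
}

void length_example() {
    std::string cafe = "caf\xC3\xA9";           // "café": the é takes two bytes in UTF-8
    assert(cafe.size() == 5);                   // bytes
    assert(utf16_length(cafe) == 4);            // what C# s.Length would report
    std::string emoji = "\xF0\x9F\x98\x80";     // U+1F600, outside the BMP
    assert(emoji.size() == 4);
    assert(utf16_length(emoji) == 2);
}
```

For pure ASCII text the two counts agree, which is why the difference often goes unnoticed until non-ASCII input appears.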
System::String Role
System::String is a static utility class with a deleted constructor.
It provides the .NET-style static API (String.IsNullOrEmpty, String.Format, etc.)
but does not hold any string data. All string data lives in std::string instances.
include/System/String.hpp
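The "static utility class with a deleted constructor" pattern can be sketched as follows. The `sketch` namespace and the one-line body of IsNullOrEmpty are assumptions made for illustration; the actual declarations are in the header named above and may differ.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {   // hypothetical namespace; the real class is System::String

class String {
public:
    String() = delete;   // the class only groups static functions and holds no data

    // std::string has no null state, so the check reduces to emptiness.
    static bool IsNullOrEmpty(const std::string& s) { return s.empty(); }
};

}  // namespace sketch

static_assert(!std::is_default_constructible<sketch::String>::value,
              "sketch::String must not be default-constructible");

void utility_class_example() {
    std::string data = "abc";   // the data lives in a std::string, never in String
    assert(!sketch::String::IsNullOrEmpty(data));
    assert(sketch::String::IsNullOrEmpty(std::string()));
}
```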
Null vs Empty
In C#, null and "" are different string values.
In sharp-runtime, because std::string cannot be null, the convention is:
- A "null" C# string maps to an empty std::string
- String::IsNullOrEmpty(s) returns true for ""
- Functions that return a "null string" in .NET typically return ""
std::string name = ""; // stands in for both null and "" from the C# side
if (System::String::IsNullOrEmpty(name)) {
// handles both "null" and "" cases
}
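If ported code genuinely depends on telling "no value" apart from "empty value", one option is std::optional<std::string> from C++17. This is not described as a sharp-runtime convention; it is a standard-library alternative shown here under that assumption, and `find_nickname` is a hypothetical function.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical lookup that reports "no value" separately from "empty value".
std::optional<std::string> find_nickname(const std::string& user) {
    if (user == "alice") return std::string("Al");
    if (user == "bob")   return std::string();   // known user with an empty nickname
    return std::nullopt;                          // closest analogue of a C# null
}

void null_vs_empty_example() {
    assert(find_nickname("alice").value() == "Al");
    assert(find_nickname("bob").has_value());      // present but empty
    assert(find_nickname("bob")->empty());
    assert(!find_nickname("carol").has_value());   // absent
}
```

Under the empty-string convention described above, the "bob" and "carol" cases would both come back as "" and could not be told apart.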
The charcs / char16_t Alias
C# char is a 16-bit Unicode value (UTF-16 code unit).
The sharp-runtime alias for this is charcs = char16_t,
defined in SharpRuntimeHelper.hpp:
using charcs = char16_t;
When porting C# code that uses individual char values, use charcs.
When porting code that works with strings as a whole, use std::string.
Note that std::string contains char (8-bit), not charcs.
If you need a UTF-16 string, use std::u16string — though most sharp-runtime APIs
use std::string.
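The sketch below shows charcs values and a std::u16string side by side. The alias is repeated locally so the example compiles on its own, and `is_high_surrogate` is a hypothetical helper modeled on C#'s char.IsHighSurrogate, not a sharp-runtime function.

```cpp
#include <cassert>
#include <string>

using charcs = char16_t;   // same alias the text attributes to SharpRuntimeHelper.hpp

// Hypothetical helper: true for the first code unit of a surrogate pair.
bool is_high_surrogate(charcs c) {
    return c >= 0xD800 && c <= 0xDBFF;
}

void charcs_example() {
    charcs letter = u'\u00E9';              // é fits in one UTF-16 code unit
    std::u16string text = u"\U0001F600";    // one code point outside the BMP
    assert(!is_high_surrogate(letter));
    assert(text.size() == 2);               // stored as a surrogate pair
    assert(is_high_surrogate(text[0]));
    assert(!is_high_surrogate(text[1]));    // second half is a low surrogate
}
```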
Encoding
sharp-runtime does not enforce a specific encoding in std::string.
By convention, strings are expected to contain UTF-8. The System::Text::Encoding
hierarchy provides explicit encode/decode operations:
- UTF8Encoding::GetBytes(string): UTF-8 string → bytes
- UTF8Encoding::GetString(bytes): bytes → UTF-8 string
- ASCIIEncoding, Latin1Encoding, UnicodeEncoding (UTF-16)
include/System/Text/Encoding.hpp
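To illustrate what explicit encode and decode steps involve, the sketch below uses free functions as stand-ins. The names `utf8_get_bytes`, `utf8_get_string`, and `latin1_get_string`, and their signatures, are hypothetical; the real declarations are in the header named above and may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// For a string that already holds UTF-8, encoding to bytes is a plain copy.
std::vector<std::uint8_t> utf8_get_bytes(const std::string& s) {
    return std::vector<std::uint8_t>(s.begin(), s.end());
}

std::string utf8_get_string(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Latin-1 bytes need real work: values >= 0x80 become two-byte UTF-8 sequences.
std::string latin1_get_string(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    for (std::uint8_t b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

void encoding_example() {
    std::string utf8 = "h\xC3\xA9";                        // "hé" in UTF-8
    assert(utf8_get_string(utf8_get_bytes(utf8)) == utf8); // round trip is lossless
    std::vector<std::uint8_t> latin1 = {0x68, 0xE9};       // "hé" in Latin-1
    assert(latin1_get_string(latin1) == utf8);
}
```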
StringBuilder
System::Text::StringBuilder wraps a std::string and provides
append/insert/remove/replace operations. It mirrors the .NET StringBuilder API
for mutable string accumulation:
System::Text::StringBuilder sb;
sb.Append("Hello");
sb.Append(", ");
sb.Append("world");
std::string result = sb.ToString(); // "Hello, world"
include/System/Text/StringBuilder.hpp
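The "wraps a std::string" design can be sketched in a few lines. Everything in the `sketch` namespace is hypothetical, including the parameter types and the choice to return a reference for chaining (which follows .NET and is assumed here, not confirmed for sharp-runtime).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {   // hypothetical; the real class is System::Text::StringBuilder

class StringBuilder {
public:
    // Each mutator edits the wrapped buffer in place and returns *this for chaining.
    StringBuilder& Append(const std::string& s) { buffer_ += s; return *this; }
    StringBuilder& Insert(std::size_t index, const std::string& s) {
        buffer_.insert(index, s);
        return *this;
    }
    StringBuilder& Remove(std::size_t start, std::size_t length) {
        buffer_.erase(start, length);
        return *this;
    }
    std::string ToString() const { return buffer_; }   // hands back a copy of the buffer

private:
    std::string buffer_;
};

}  // namespace sketch

void string_builder_example() {
    sketch::StringBuilder sb;
    sb.Append("Hello").Append("world");
    sb.Insert(5, ", ");
    assert(sb.ToString() == "Hello, world");
    sb.Remove(0, 7);
    assert(sb.ToString() == "world");
}
```

Because std::string is already mutable, such a wrapper mainly preserves the shape of ported C# code; the same result is available through operator+= directly.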
String Interning and Identity
.NET supports string interning: strings with the same content, such as identical literals, can share
a single object and therefore compare equal by reference. sharp-runtime has no such mechanism. Two
std::string objects with the same content are equal via == but are distinct objects in memory.
This is the expected C++ behavior.
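The distinction between content equality and object identity can be checked with standard C++ alone. `same_content` and `same_object` are hypothetical helper names used only for this sketch.

```cpp
#include <cassert>
#include <string>

// Content comparison: the std::string counterpart of C# string equality.
bool same_content(const std::string& a, const std::string& b) { return a == b; }

// Address comparison: the closest analogue of Object.ReferenceEquals.
bool same_object(const std::string& a, const std::string& b) { return &a == &b; }

void identity_example() {
    std::string a = "shared text";
    std::string b = "shared text";
    assert(same_content(a, b));       // equal by value
    assert(!same_object(a, b));       // but two separate objects
    assert(a.data() != b.data());     // each owns its own character buffer
    assert(same_object(a, a));
}
```

Ported code that used reference comparison as a fast path for equality should therefore fall back to ==.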