Rust has two main string types, `String` and `&str`. A `String` is heap-allocated, growable, and mutable; internally it can be thought of as a `Vec<u8>`. A `&str` is a borrowed string slice, which can be thought of as a `&[u8]`. Both types can only hold valid UTF-8.
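As a minimal sketch of the difference, the example below grows an owned `String` and then borrows part of it as a `&str`. The helper name `with_suffix` and the sample text are assumptions made for illustration.

```rust
/// Hypothetical helper: appends `suffix` to an owned String and returns it.
fn with_suffix(mut owned: String, suffix: &str) -> String {
    owned.push_str(suffix); // a String owns its buffer, so it can grow
    owned
}

fn main() {
    let owned = with_suffix(String::from("Straße"), " 1");

    // A &str borrows UTF-8 bytes; slice ranges are byte offsets.
    // "Straße" occupies 7 bytes because 'ß' is encoded as 2 bytes.
    let view: &str = &owned[0..7];
    println!("{}", view);

    // Both types expose the underlying bytes.
    let bytes: &[u8] = view.as_bytes();
    println!("{} bytes, {} chars", bytes.len(), view.chars().count());
}
```

Slicing at a byte offset that falls inside a multi-byte character panics at runtime, which is one consequence of the UTF-8 guarantee.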
For data that is not guaranteed to be valid UTF-8, consider the following types.
- File paths have the dedicated `Path` and `PathBuf` types.
- Use `Vec<u8>` and `&[u8]` for arbitrary byte data.
- Use `OsString` and `&OsStr` to interact with the operating system.
- Use `CString` and `&CStr` to interact with C libraries.
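The types in the list above can be sketched briefly as follows. The helper `log_path` and the sample values are hypothetical and only illustrate how each type is constructed.

```rust
use std::ffi::{CString, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical helper: joins a directory and a file name into an owned path.
fn log_path(dir: &Path, name: &str) -> PathBuf {
    dir.join(name)
}

fn main() {
    // Path is the borrowed form, PathBuf the owned form.
    let p = log_path(Path::new("logs"), "app.txt");
    println!("{}", p.display());

    // Arbitrary bytes: 0xff can never appear in valid UTF-8.
    let raw: Vec<u8> = vec![0x66, 0x6f, 0x6f, 0xff];
    let view: &[u8] = &raw;
    println!("{:?}", view);

    // OsString holds text in the platform's native representation.
    let os = OsString::from("value");
    println!("{:?}", os.as_os_str());

    // CString is NUL-terminated and rejects interior NUL bytes.
    let c = CString::new("hello").expect("no interior NUL bytes");
    println!("{:?}", c.as_bytes_with_nul());
}
```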
The second option above, `Vec<u8>` and `&[u8]`, is the common way to handle non-UTF-8 byte streams. Byte data can also be written as a literal, called a byte string literal. Its exact type is `&[u8; N]`, which coerces to `&[u8]`.
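A small sketch of working with non-UTF-8 bytes, assuming a Latin-1 encoded input as the example data. The helper `decode_lossy` is hypothetical; `std::str::from_utf8` and `String::from_utf8_lossy` are standard library functions.

```rust
/// Hypothetical helper: decodes bytes as UTF-8, replacing invalid
/// sequences with U+FFFD (the replacement character).
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // 0xe9 is 'é' in Latin-1, but on its own it is not valid UTF-8.
    let literal = b"caf\xe9"; // exact type: &[u8; 4]
    let bytes: &[u8] = literal; // coerces to a slice

    let mut buf: Vec<u8> = bytes.to_vec();
    buf.push(b'!');

    // Strict decoding fails, lossy decoding substitutes U+FFFD.
    println!("{}", std::str::from_utf8(&buf).is_err());
    println!("{}", decode_lossy(&buf));
}
```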
String literals
Let's look at string literals.
As in most languages, a string literal is enclosed in double quotes. One feature of Rust is that a string literal can span multiple lines: a line break in the middle does not cause a compile or runtime error, and the resulting string contains the newline character.
String literals also support escapes, for example `\"` to include a double quote. A line break can be escaped too: if a `\` is placed directly before a line break, then the `\`, the line break, and all whitespace at the beginning of the next line are ignored.
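The three behaviours described above can be sketched as follows. The function names are hypothetical and exist only so each literal can be inspected separately.

```rust
/// A literal that spans two source lines; the line break becomes '\n'.
fn multi_line() -> &'static str {
    "first line
second line"
}

/// A trailing backslash drops the line break and the
/// leading whitespace of the next line.
fn continued() -> &'static str {
    "one \
     two"
}

/// Double quotes inside the literal are escaped with a backslash.
fn quoted() -> &'static str {
    "say \"hi\""
}

fn main() {
    println!("{}", multi_line());
    println!("{}", continued());
    println!("{}", quoted());
}
```

Note that the space after `one` is kept, because only the whitespace after the line break is skipped.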
In addition to the common `\` escapes for bytes (characters), string literals support Unicode escapes.
- `\xHH`: `\x` followed by two hexadecimal digits giving a 7-bit code (at most `\x7F`), which denotes the equivalent ASCII character.
- `\u{xxxx}`: up to six hexadecimal digits (a 24-bit value), which denotes the equivalent Unicode character.
- `\n`, `\r`, `\t`: denote U+000A (LF), U+000D (CR), and U+0009 (HT).
- `\\`: escapes the `\` itself.
- `\0`: denotes U+0000 (NUL).
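To make the escape forms above concrete, here is a small sketch that combines them in one literal and prints each resulting code point. The function name `escaped_sample` is an assumption for illustration.

```rust
/// Hypothetical sample combining the escape forms listed above:
/// 'A' (\x41), U+00DF (\u{df}), tab, backslash, and NUL.
fn escaped_sample() -> &'static str {
    "\x41\u{df}\t\\\0"
}

fn main() {
    // Print the code point of every character in the literal.
    for c in escaped_sample().chars() {
        println!("U+{:04X}", c as u32);
    }
}
```

Writing `"\xff"` in a string literal is rejected by the compiler, since values above `\x7F` are not single-byte UTF-8 characters.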
Raw string literals are not escaped, meaning that the value of the string is exactly what the literal says. A raw string literal starts with `r`, followed by zero or more `#`s and a double quote, and ends with a double quote followed by the same number of `#`s.
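As a minimal sketch, the example below writes the same text once as an ordinary literal and twice as a raw literal. The Windows-style path is just sample data, and the function names are hypothetical.

```rust
/// Ordinary literal: every backslash has to be escaped.
fn cooked_path() -> &'static str {
    "C:\\new\\table"
}

/// Raw literal: backslashes are taken literally, so \n and \t stay as two characters.
fn raw_path() -> &'static str {
    r"C:\new\table"
}

/// The same raw literal with one # on each side.
fn raw_path_hashed() -> &'static str {
    r#"C:\new\table"#
}

fn main() {
    println!("{}", cooked_path());
    println!("{}", raw_path());
    println!("{}", raw_path_hashed());
}
```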
What if the string contains double quotes? Since escapes cannot be used in raw strings, Rust lets you write `r#"..."#` to mark the string's bounds. The `#`s act as an adjustable delimiter: the literal ends only at a double quote followed by as many `#`s as opened it. For example, a string containing four `#`s can be enclosed as `r#####"abc####def"#####`. In general, the delimiter needs more `#`s than any run of `#`s that directly follows a double quote inside the string.
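The delimiter rule can be sketched with three literals; the function names and sample strings (other than `abc####def`) are assumptions for illustration.

```rust
/// One # is enough when no quote inside is directly followed by a #.
fn quoted_raw() -> &'static str {
    r#"She said "hi""#
}

/// The text contains `"#`, so the delimiter needs two #s.
fn hash_after_quote() -> &'static str {
    r##"a "# b"##
}

/// The example from the text: five #s around a string containing four.
fn four_hashes() -> &'static str {
    r#####"abc####def"#####
}

fn main() {
    println!("{}", quoted_raw());
    println!("{}", hash_after_quote());
    println!("{}", four_hashes());
}
```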
Byte string literals
Byte string literals are written as `b"..."` and its derived forms. They can be used as `&[u8]`, which is a completely different type from `&str`, so some methods that work on `&str` won't work on `&[u8]`.
For example, printing a byte string with the `{}` format specifier is rejected by the compiler, because `&[u8]` does not implement `std::fmt::Display`.
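A hedged sketch of the situation: the failing line is kept as a comment so the example still compiles, followed by two common workarounds. The helper `show` is hypothetical.

```rust
/// Hypothetical helper: formats bytes with the Debug formatter,
/// which slices of u8 do implement.
fn show(bytes: &[u8]) -> String {
    format!("{:?}", bytes)
}

fn main() {
    let bytes: &[u8] = b"hi";

    // println!("{}", bytes);
    // error[E0277]: `[u8]` doesn't implement `std::fmt::Display`

    // Debug formatting prints the numeric byte values.
    println!("{}", show(bytes)); // prints "[104, 105]"

    // Decoding first gives text that can be displayed.
    println!("{}", String::from_utf8_lossy(bytes)); // prints "hi"
}
```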
Byte string literals also support escapes, but only byte escapes such as `\xHH` (here up to `\xFF`), not Unicode escapes such as `\u{...}`.
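As a small sketch of byte escapes, the literal below mixes a non-ASCII byte with the usual backslash escapes. The function name `escaped_bytes` is an assumption for illustration.

```rust
/// Hypothetical sample: 0xff, line feed, backslash, NUL, double quote.
fn escaped_bytes() -> &'static [u8] {
    b"\xff\n\\\0\""
}

fn main() {
    // \xff is allowed here, unlike in an ordinary string literal.
    println!("{:?}", escaped_bytes()); // prints "[255, 10, 92, 0, 34]"

    // b"\u{df}" would not compile: Unicode escapes are rejected in byte strings.
}
```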
Byte strings also support raw definitions, similar to ordinary strings, using the `br` prefix to define a raw byte string literal. In a normal byte string a backslash must be escaped as `\\`, while in a raw byte string it is written as is.
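The comparison can be sketched as follows, using a regex-like pattern as sample data. The function names are hypothetical.

```rust
/// Normal byte string: each backslash is escaped.
fn escaped_pattern() -> &'static [u8] {
    b"\\d+\\.txt"
}

/// Raw byte string: the same bytes, written without escapes.
fn raw_pattern() -> &'static [u8] {
    br"\d+\.txt"
}

/// Raw byte strings accept the # delimiters too, which allows quotes inside.
fn raw_quoted() -> &'static [u8] {
    br#"say "hi""#
}

fn main() {
    println!("{}", escaped_pattern() == raw_pattern()); // prints "true"
    println!("{}", String::from_utf8_lossy(raw_quoted()));
}
```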
Summary
The following table summarizes the string literal forms introduced above and their meanings.
symbol | meaning |
---|---|
`"..."` | String literal |
`r"..."`, `r#"..."#`, `r##"..."##`, etc. | Raw string literal, no escaping |
`b"..."` | Byte string literal, type `&[u8]` |
`br"..."`, `br#"..."#`, `br##"..."##`, etc. | Raw byte string literal |
`'...'` | Character literal |
`b'...'` | ASCII byte literal |
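The last two rows of the table, character and byte literals, can be sketched as below. The helper `ascii_byte` is hypothetical and only illustrates how the two types relate.

```rust
/// Hypothetical helper: returns the byte for an ASCII char, or None otherwise.
fn ascii_byte(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}

fn main() {
    let ch: char = 'A'; // character literal: a Unicode scalar value
    let byte: u8 = b'A'; // byte literal: a single ASCII byte
    println!("{} {}", ch, byte); // prints "A 65"

    // U+00DF is outside ASCII, so it has no single-byte form.
    println!("{:?}", ascii_byte('\u{df}')); // prints "None"
}
```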