On February 28, 2019, Rust 1.33 was released with new pinning APIs, mainly the following:
- std::pin::Pin
- std::marker::Unpin
- std::marker::PhantomPinned
- impl !Unpin for T
When I first encountered these concepts, I found them convoluted and hard to understand thoroughly. There are many articles about Pin and Unpin online, but I did not find them particularly clear, and the std::pin module documentation is hard to digest directly. So in this article I will try to work through Pin and Unpin from the basics to the details, and I hope it helps you understand these concepts.
Concept
Let’s look at the definition of Pin in the official documentation.
I’ll start by unpacking it at a high level. Pin is a smart pointer that wraps another pointer P and guarantees that the value P points to (which we call T) will never be moved, as long as T does not implement Unpin. The name is a metaphor: Pin works like a nail that holds T in place. So a Pin is usually written as Pin<P<T>> (P stands for Pointer and T for Type). This definition is a bit confusing at first glance, so let’s highlight a few points.
- Pin itself is a smart pointer. Why? Because it implements Deref, and it also implements DerefMut when T is Unpin.
- What Pin wraps must be a pointer, not an ordinary value type. For example, Pin<u32> would not make sense.
- Pin can "pin" T so that it cannot be moved, but whether this takes effect depends on whether T implements Unpin. Simply put, if T implements Unpin, the pinning effect of Pin is completely disabled, and Pin<P<T>> is then equivalent to P<T>.
- Unpin is an auto trait: the compiler implements it for every type by default (more precisely, for every type whose fields are all Unpin). There are only a few exceptions, which are !Unpin: PhantomPinned, the anonymous impl Future structs that the compiler generates when desugaring async/await, and any type that contains one of them as a field.
- So Pin<P<T>> has no effect by default; it only takes effect in the impl !Unpin cases mentioned above.
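A minimal sketch of the last three points, assuming a hypothetical helper assert_unpin (a call to it only compiles when T: Unpin) and a hypothetical struct Plain: because every field of Plain is Unpin, the auto trait makes Plain Unpin too, so Pin::new(), DerefMut and Pin::into_inner() all work and the Pin behaves exactly like the Box it wraps.

```rust
use std::pin::Pin;

// Hypothetical helper: each call compiles only when T: Unpin.
fn assert_unpin<T: Unpin>() {}

// Hypothetical struct: all fields are Unpin, so Plain is Unpin automatically.
struct Plain {
    n: i32,
    label: String,
}

fn pin_is_transparent_for_unpin() -> (i32, String) {
    assert_unpin::<i32>();
    assert_unpin::<String>();
    assert_unpin::<Plain>();
    // assert_unpin::<std::marker::PhantomPinned>(); // would not compile

    // Pin::new is safe here because Plain: Unpin.
    let mut pinned: Pin<Box<Plain>> = Pin::new(Box::new(Plain { n: 1, label: String::from("p") }));
    pinned.n += 1; // DerefMut is available because Plain: Unpin

    // Pin::into_inner gives the Box back, again only because Plain: Unpin.
    let unpinned: Box<Plain> = Pin::into_inner(pinned);
    let Plain { n, label } = *unpinned;
    (n, label)
}
```

Calling pin_is_transparent_for_unpin() returns (2, "p"): nothing about the value was restricted by the Pin.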
If these points still leave you a little confused, don’t worry; we will analyze them one by one. The first thing to figure out is what exactly a move is, and why we need to prevent it in some cases.
What exactly is a move?
According to the official definition, transferring ownership is a move. Let’s look at a familiar example.
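A small sketch of such a move, using a hypothetical function name move_keeps_heap_buffer. It also records the address of the heap buffer before and after let s2 = s1; to show that only the stack part of the String (pointer, length, capacity) is copied, while the bytes of "Hello" stay where they are.

```rust
// Hypothetical helper: moves a String and checks that its heap buffer stays put.
fn move_keeps_heap_buffer() -> (String, bool) {
    let s1 = String::from("Hello");
    let heap_before = s1.as_ptr(); // address of the heap buffer holding "Hello"
    let s2 = s1; // move: s1's stack contents are copied into s2
    // println!("{}", s1); // would not compile: s1 was moved into s2
    let same_buffer = s2.as_ptr() == heap_before;
    (s2, same_buffer)
}
```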
I won’t explain the basic concepts here; what we need to figure out is what happens in the line let s2 = s1;.
Borrowing the diagram from the TRPL book: both variables s1 and s2 live on the stack, while the string “Hello” is allocated on the heap, and the ptr field points to it. The move happens when the compiler creates a new slot s2 on the stack and copies the stack contents of s1 into s2 as-is, after which the original s1 is immediately treated as invalid.
Here is another example of a move. We swap the contents behind two mutable references &mut using std::mem::swap(), and a move happens here too.
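A minimal sketch of this second kind of move (swap_strings is a hypothetical name): std::mem::swap() only needs two &mut String, and each String value ends up moved into the other variable's memory.

```rust
use std::mem;

// Hypothetical demo: swap() moves each value into the other's memory.
fn swap_strings() -> (String, String) {
    let mut x = String::from("first");
    let mut y = String::from("second");
    mem::swap(&mut x, &mut y); // only needs two &mut String
    (x, y)
}
```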
Both kinds of moves are very common in Rust and cause no problems at all. So in what situation do we need to prevent a move?
There really is one: self-referential structs!
Move of Self-Referential Structs
A self-referential struct is a struct in which one member is a reference to another member of the same struct, for example a field b that borrows field a.
In practice, however, there is no way to construct such a self-referential struct like Test with references in Safe Rust, and Rust’s support for self-referential structs is still very limited. The only workaround is to use raw pointers.
Let’s try moving this self-referential struct.
Here is the printout.
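A sketch of this experiment, adapted from the well-known example in the Async Book rather than copied from the original listing (swap_self_referential is a hypothetical name): Test stores a raw pointer b to its own field a, two instances are initialized, and then std::mem::swap() exchanges them. The output in the comments follows from the pointer logic: after the swap, test2.b still points at the memory of the variable test1, which now holds "test2".

```rust
struct Test {
    a: String,
    b: *const String, // meant to point at `a` of the same struct
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { a: String::from(txt), b: std::ptr::null() }
    }

    fn init(&mut self) {
        let self_ref: *const String = &self.a;
        self.b = self_ref;
    }

    fn a(&self) -> &str {
        &self.a
    }

    fn b(&self) -> &String {
        assert!(!self.b.is_null(), "Test::b called before Test::init");
        // Demo only: b still points at a live String, but after a move it is
        // the field of a *different* struct.
        unsafe { &*self.b }
    }
}

fn swap_self_referential() -> (String, String) {
    let mut test1 = Test::new("test1");
    test1.init();
    let mut test2 = Test::new("test2");
    test2.init();

    println!("a: {}, b: {}", test1.a(), test1.b()); // a: test1, b: test1
    std::mem::swap(&mut test1, &mut test2);
    println!("a: {}, b: {}", test2.a(), test2.b()); // a: test1, b: test2

    (test2.a().to_string(), test2.b().clone())
}
```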
Have you noticed that something is wrong? What is the problem? Field b in the Test struct is a pointer to field a: it stores the address of field a on the stack. After swap() exchanges the two Test structs, the values of fields a and b are moved into each other’s memory, but the contents of a and b themselves are unchanged. That is, pointer b still points to the original address, but that address now belongs to the other struct! Not only is this no longer a self-referential struct, but worse, such a pointer can lead to even more dangerous problems, which Rust must never allow! 👇 The following diagram can help you understand this.
More critically, Rust’s generators and async/await are built on self-referential structs. If this problem were not solved at the root, the foundation of Rust’s claim to memory safety would be shaken.
For more on the principles of async/await, it is highly recommended to read these two books.
So let’s find the root cause of this problem and figure out how to fix it at the source!
What is the root cause?
The most critical line in the example above is std::mem::swap(&mut test1, &mut test2): it is what broke our self-referential struct and caused the memory safety problem. So don’t we just need to avoid applying swap() to self-referential structs? But how? Let’s look at the signature of swap(): it takes two mutable references, swap<T>(x: &mut T, y: &mut T).
Its parameters are mutable references &mut, so all we need is a way to keep a mutable reference from being exposed in Safe Rust!
Take Test as an example again: it has no way to stop anyone from taking a mutable reference to it, because we can simply write &mut Test{...}. What about the standard library’s Box<T>? Ignoring its performance cost for now, let’s wrap T in a Box and see whether Box can guarantee not to expose &mut T. Looking at the API documentation, unfortunately it can’t. The return value of Box::leak() is &mut T, and what’s more, Box implements DerefMut, so even without leak() we can easily get &mut T by dereferencing a &mut Box<T>!
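A small sketch of both escape hatches, with a hypothetical function name box_exposes_mut: &mut *a goes through Box’s DerefMut implementation, and Box::leak() returns a &'static mut T (the allocation is leaked on purpose here). Either reference is enough for std::mem::swap().

```rust
use std::mem;

// Hypothetical demo: Box<T> offers at least two safe ways to obtain &mut T.
fn box_exposes_mut() -> (String, String) {
    let mut a = Box::new(String::from("a"));
    let mut b = Box::new(String::from("b"));

    // 1. DerefMut: `&mut *a` is a plain &mut String.
    mem::swap(&mut *a, &mut *b); // a = "b", b = "a"

    // 2. Box::leak returns &'static mut T (the allocation is intentionally leaked).
    let leaked: &'static mut String = Box::leak(Box::new(String::from("c")));
    mem::swap(leaked, &mut *a); // a = "c", leaked = "b"

    (*a, *b)
}
```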
No need to keep looking: before Pin, the standard library really had no API that prevented &mut T from being exposed in Safe Rust.
So, it’s time for Pin to make an appearance!
Pin
We have found the root of the problem, and Pin is the solution that addresses it at the root. Now that this is clear, can’t we sum it up in one sentence: Pin is a smart pointer that never lets you get a mutable reference &mut in Safe Rust?
The answer is: not entirely. This is where the concept of Pin initially left everyone flabbergasted. Let Pin answer the confusion itself. Pin says: “Don’t you want me to make sure that the T behind the pointer P<T> I wrap is always pinned and never allowed to move? I can promise that, but I have one principle: I can never pin a friend who holds a pass, and that pass is Unpin. If you don’t have this pass, rest assured that I will nail you down for good!”
For example, say I am Pin and you are P<T>. If you implement Unpin, I provide two ways for you to get &mut T in Safe Rust:
- The first one: Pin::get_mut()
- The second one: I implement DerefMut, so you can dereference me to get &mut T
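A sketch of both ways in action, with a hypothetical function name swap_pinned_unpin: since String is Unpin, one pinned reference is unwrapped with Pin::get_mut() and the other is dereferenced through DerefMut, and the resulting pair of &mut String is enough for std::mem::swap() to move the values despite the Pin.

```rust
use std::mem;
use std::pin::Pin;

// Hypothetical demo: String is Unpin, so Pin hands out &mut String in both ways.
fn swap_pinned_unpin() -> (String, String) {
    let mut a = String::from("left");
    let mut b = String::from("right");

    let pa: Pin<&mut String> = Pin::new(&mut a);
    let mut pb: Pin<&mut String> = Pin::new(&mut b);

    // Way 1: Pin::get_mut() returns &mut String (requires String: Unpin).
    let ra: &mut String = pa.get_mut();
    // Way 2: DerefMut, which Pin also implements only when the target is Unpin.
    let rb: &mut String = &mut *pb;

    // With two &mut in hand, swap() moves the values despite the Pin.
    mem::swap(ra, rb);
    (a, b)
}
```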
But rustc, our dad, is too lenient: by default he issued a pass to all your types (they all implement Unpin)! I’m almost out of a job!
The only thing I’m glad about is that he left me a little buddy named PhantomPinned. Don’t mind his strange name; he’s my favorite right-hand man, because he implements !Unpin!
Papa rustc also said that if you want to “go straight” and get rid of Unpin, there are two ways:
- Use PhantomPinned. With it as a field, rustc won’t implement Unpin for your type.
- Write impl !Unpin for your type manually. This requires the nightly toolchain and the #![feature(negative_impls)] attribute.
If you meet either of the above conditions, I’ll make sure you can’t get a mutable reference &mut T in Safe Rust (check my API if you don’t believe me), and if you can’t get &mut T, you can’t pass it to std::mem::swap(), which means I’ve nailed you down! Do you think rustc has put some spell on me? You’re wrong, that’s simply how I work! It is all thanks to the rich and powerful type system of Rust; my brothers Sync and Send work the same way, and none of us has any so-called magic!
Of course, I still provide an unsafe get_unchecked_mut(): whether or not you implement Unpin, you can get &mut T by calling it, but you must uphold Pin’s contract (see below), otherwise you are responsible for whatever goes wrong!
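A sketch of a sound use of get_unchecked_mut(), assuming a hypothetical !Unpin struct Counter and helper functions bump and count_hits: the &mut Counter obtained through the unsafe call is only used to update a field in place, and the value is never moved out of it (no swap, replace or take), which is what the contract requires.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical !Unpin type with a counter field.
struct Counter {
    hits: u32,
    _pin: PhantomPinned,
}

fn bump(counter: Pin<&mut Counter>) {
    // SAFETY: we only modify a field in place; the Counter is never moved out
    // of the returned &mut (no swap, replace or take).
    let this: &mut Counter = unsafe { counter.get_unchecked_mut() };
    this.hits += 1;
}

fn count_hits(n: u32) -> u32 {
    let mut counter = Box::pin(Counter { hits: 0, _pin: PhantomPinned });
    for _ in 0..n {
        bump(counter.as_mut()); // reborrow the pin for each call
    }
    counter.hits
}
```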
Pin’s contract
For Pin<P<T>>:
- If T is Unpin, then P<T> stays effectively unpinned from the time it is wrapped by Pin until it is destroyed: pinning has no effect, and T may be moved freely.
- If T is !Unpin, then P<T> must keep T pinned from the time it is wrapped by Pin until it is destroyed.
With Pin’s self-introduction above, let’s summarize in another sentence: if you implement Unpin, Pin lets you get &mut T in Safe Rust; otherwise it pins you down in Safe Rust (i.e., you can’t get &mut T).
Next, we use Pin to fix the problem with the self-referential struct above.
How to construct a Pin
First we need to sort out how to wrap P<T> in Pin, that is, how to construct a Pin. The documentation shows several main ways to do this.
Pin::new()
You can safely call Pin::new() to construct a Pin if the T that your P points to is Unpin. Under the hood it simply calls the unsafe Pin::new_unchecked(). Pin::new() is safe because the pinning effect of Pin does not apply when T is Unpin, so such a Pin behaves just like a normal pointer.
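A sketch of that layering, using a hypothetical free function pin_new that mirrors the idea rather than reproducing the standard library’s source: the T: Unpin bound is exactly what makes wrapping the unsafe Pin::new_unchecked() in a safe function sound.

```rust
use std::ops::Deref;
use std::pin::Pin;

// Hypothetical re-implementation of the idea behind Pin::new.
fn pin_new<P>(pointer: P) -> Pin<P>
where
    P: Deref,
    P::Target: Unpin,
{
    // SAFETY: the target is Unpin, so Pin makes no guarantee that moving it could break.
    unsafe { Pin::new_unchecked(pointer) }
}

fn pin_new_demo() -> i32 {
    let mut x = 10;
    let mut p: Pin<&mut i32> = pin_new(&mut x);
    *p += 5; // DerefMut is available because i32: Unpin
    x
}
```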
Pin::new_unchecked()
This method is simple, but it is unsafe. It is marked unsafe because the compiler cannot guarantee that the user’s subsequent operations will comply with Pin’s contract. Whenever there is a possibility of violating the contract, the function must be marked unsafe, since upholding it becomes the user’s responsibility rather than something the compiler can check. For example, if the user constructs a Pin<P<T>> with Pin::new_unchecked() and the Pin’s lifetime then ends while P<T> still exists, subsequent code can still move T, causing memory unsafety.
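A sketch of that failure mode, modeled on the move_pinned_ref example in the std::pin documentation: the Pin is created through a temporary &mut and dropped at the end of the unsafe block, after which safe code is free to move a again. With plain values such as integers nothing relies on the pin, so this runs without visible harm, but for a real !Unpin type it is exactly the contract violation described above.

```rust
use std::mem;
use std::pin::Pin;

// Compiles fine, yet breaks the Pin contract for any T that relies on pinning.
fn move_pinned_ref<T>(mut a: T, mut b: T) -> (T, T) {
    unsafe {
        // The caller promises that `a` will never move again...
        let _p: Pin<&mut T> = Pin::new_unchecked(&mut a);
    } // ...but the Pin is gone here,
    // and nothing stops safe code from moving `a` afterwards.
    mem::swap(&mut a, &mut b); // contract violated
    (a, b)
}
```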
Other
These include Box::pin(), Rc::pin(), Arc::pin() and others. Under the hood they all call the Pin::new_unchecked() described above, so I won’t elaborate further.
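A minimal sketch of these constructors (pinned_smart_pointers is a hypothetical name). Unlike Pin::new(), they are safe even for !Unpin values such as PhantomPinned, because the value is moved into a fresh allocation that the caller can never reach unpinned.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;

fn pinned_smart_pointers() -> (u8, u8, u8) {
    let b: Pin<Box<u8>> = Box::pin(1);
    let r: Pin<Rc<u8>> = Rc::pin(2);
    let a: Pin<Arc<u8>> = Arc::pin(3);
    // These constructors are safe even for !Unpin values.
    let _nailed: Pin<Box<PhantomPinned>> = Box::pin(PhantomPinned);
    // Shared access goes through Pin's Deref impl.
    (*b, *r, *a)
}
```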
Application of Pin
Pinning can be classified as pinning to the stack or pinning to the heap, depending on whether the data you pin lives on the stack or on the heap. For example, a Pin<&mut T> pointing at a local variable pins it to the stack, while Pin<Box<T>> pins T to the heap.
Pin to the stack
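A sketch of this experiment, adapted from the pinning chapter of the Async Book (stack_pinned is a hypothetical name): Test gains a PhantomPinned marker, each instance is pinned on the stack with the unsafe Pin::new_unchecked() and its original binding is shadowed, and the swap() line is left commented out because it does not compile. Rust 1.68 and later also offer the std::pin::pin! macro, which pins a value on the stack without unsafe.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    a: String,
    b: *const String,       // will point at `a` once init() has run
    _marker: PhantomPinned, // makes Test !Unpin
}

impl Test {
    fn new(txt: &str) -> Self {
        Test { a: String::from(txt), b: std::ptr::null(), _marker: PhantomPinned }
    }

    // Wire up the self-reference; only possible once the value is pinned.
    fn init(self: Pin<&mut Self>) {
        let self_ptr: *const String = &self.a;
        // SAFETY: we only write the `b` field and never move the value out.
        let this = unsafe { self.get_unchecked_mut() };
        this.b = self_ptr;
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        assert!(!self.b.is_null(), "Test::b called before Test::init");
        // SAFETY: b points at self.a, and the pin guarantees self has not moved.
        unsafe { &*self.b }
    }
}

fn stack_pinned() -> (String, String) {
    let mut test1 = Test::new("test1");
    // SAFETY: the unpinned binding is shadowed right away, so it can never be moved again.
    let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
    Test::init(test1.as_mut());

    let mut test2 = Test::new("test2");
    // SAFETY: same shadowing argument as for test1.
    let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
    Test::init(test2.as_mut());

    // Does not compile: Test is !Unpin, so Pin::get_mut is not available.
    // std::mem::swap(test1.get_mut(), test2.get_mut());

    (Test::a(test1.as_ref()).to_string(), Test::b(test1.as_ref()).clone())
}
```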
We pinned &mut Test on the stack, then tried to pass the result of get_mut() to std::mem::swap(), and found that it didn’t compile: the Rust compiler stopped the mistake at compile time.
Pin to the heap
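A sketch of the heap version, again adapted from the Async Book’s pinning chapter (heap_pinned is a hypothetical name): Test::new() builds the value inside Box::pin() and only then wires up the self-reference. The two commented-out swaps do not compile because Test is !Unpin. Swapping the Pin<Box<Test>> handles themselves is allowed, because only the box pointers move while each Test stays at its heap address, so the self-reference remains valid.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Test {
    a: String,
    b: *const String,       // points at `a` of the same heap allocation
    _marker: PhantomPinned, // makes Test !Unpin
}

impl Test {
    // Build the value on the heap first, then wire up the self-reference.
    fn new(txt: &str) -> Pin<Box<Self>> {
        let t = Test { a: String::from(txt), b: std::ptr::null(), _marker: PhantomPinned };
        let mut boxed = Box::pin(t);
        let self_ptr: *const String = &boxed.a;
        // SAFETY: we only write the `b` field; the boxed value itself never moves.
        unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };
        boxed
    }

    fn a<'a>(self: Pin<&'a Self>) -> &'a str {
        &self.get_ref().a
    }

    fn b<'a>(self: Pin<&'a Self>) -> &'a String {
        // SAFETY: b points at self.a inside the same pinned allocation.
        unsafe { &*self.b }
    }
}

fn heap_pinned() -> (String, String) {
    let mut test1 = Test::new("test1");
    let mut test2 = Test::new("test2");

    // Neither line compiles, because Test is !Unpin:
    // std::mem::swap(test1.as_mut().get_mut(), test2.as_mut().get_mut());
    // std::mem::swap(&mut *test1, &mut *test2);

    // Swapping the Pin<Box<Test>> handles only moves the box pointers.
    std::mem::swap(&mut test1, &mut test2);
    (test2.as_ref().a().to_string(), test2.as_ref().b().clone())
}
```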
Here Box::pin() is used to pin Test to the heap. Uncommenting either of the commented-out swap lines fails to compile, because Test is !Unpin.
Future
Next, let’s talk about one of the most important applications of Pin today: Future. When the async working group introduced the Pin API in 2018, the original goal was to solve the problem of self-references inside Futures. async/await is implemented with generators, and generators are implemented with anonymous structs. If an async function holds a reference across an .await, the underlying generator holds a reference across a yield, and the anonymous struct generated for it becomes a self-referential struct! This self-referential struct then implements Future, and the async runtime needs a mutable reference (i.e. &mut Self) when it calls Future::poll() to query the state. If this &mut Self were not wrapped in Pin, a developer’s own impl Future could use a function like std::mem::swap() to move &mut Self! That is why poll() in Future has to take Pin<&mut Self>.
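To see Pin<&mut Self> in poll() at work without an async runtime, here is a sketch that polls two futures by hand with a do-nothing waker. The helpers noop_waker, Countdown, poll_countdown and poll_async_block are hypothetical names, and the waker follows the common no-op RawWakerVTable pattern. The hand-written Countdown is Unpin, so Pin::new() and get_mut() are enough; the compiler-generated future of an async block is !Unpin, so it is pinned with Box::pin() before polling.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A do-nothing waker so futures can be polled by hand, without a runtime.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}

static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}

fn noop_waker() -> Waker {
    // SAFETY: the vtable functions ignore the null data pointer and do nothing.
    unsafe { Waker::from_raw(noop_raw_waker()) }
}

// Hand-written future: Pending `remaining` times, then Ready.
// Its only field is a u32, so the struct is Unpin.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    // poll() receives Pin<&mut Self>, not &mut Self.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Countdown: Unpin, so taking the &mut out of the Pin is safe.
        let this = self.get_mut();
        if this.remaining == 0 {
            Poll::Ready("done")
        } else {
            this.remaining -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again (no-op here)
            Poll::Pending
        }
    }
}

// Returns how many polls it took and the final output.
fn poll_countdown(n: u32) -> (u32, &'static str) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Countdown { remaining: n };
    let mut polls = 0;
    loop {
        polls += 1;
        // Pin::new is enough because Countdown is Unpin.
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, out);
        }
    }
}

fn poll_async_block() -> u32 {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // The compiler-generated future of an async block is !Unpin, so pin it on the heap.
    let mut fut = Box::pin(async { 40u32 + 2 });
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("there is no .await, so the first poll completes"),
    }
}
```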
And of course there is a very important point not to forget: Pin only has a pinning effect on !Unpin types. Is this anonymous impl Future struct !Unpin? Of course it is: as mentioned earlier, there are only a few exceptions to the default Unpin, and this anonymous struct is one of them.
The key line is impl<T: Generator<ResumeTy, Yield = ()>> !Unpin for GenFuture<T> {} in the standard library source of that time; seeing it with your own eyes is the most convincing.
Other
Besides the above, Pin involves several other concepts, such as pin projection, structural pinning and non-structural pinning, which I don’t use much myself.
futures-rs also has many Pin-related APIs, so if you use futures-rs in depth, you will inevitably deal with Pin frequently.
Summary
As a summary, here are the 8 rules about Pin from the official Async Book; they cover pretty much all there is to the Pin API.
- If T: Unpin (which is the default), then Pin<'a, T> is entirely equivalent to &'a mut T. In other words: Unpin means it’s OK for this type to be moved even when pinned, so Pin will have no effect on such a type.
- Getting a &mut T to a pinned T requires unsafe if T: !Unpin.
- Most standard library types implement Unpin. The same goes for most “normal” types you encounter in Rust. A Future generated by async/await is an exception to this rule.
- You can add a !Unpin bound on a type on nightly with a feature flag, or by adding std::marker::PhantomPinned to your type on stable.
- You can either pin data to the stack or to the heap.
- Pinning a !Unpin object to the stack requires unsafe
- Pinning a !Unpin object to the heap does not require unsafe. There is a shortcut for doing this using Box::pin.
- For pinned data where T: !Unpin you have to maintain the invariant that its memory will not get invalidated or repurposed from the moment it gets pinned until when drop is called. This is an important part of the pin contract.