The previous article on Pin was a shallow introduction to what Pin is and why it is needed, but that is not enough to master the topic. This article tries to systematically sort out the knowledge points related to Pin, hence the title “Rust Pin Advanced”.
Pin API Anatomy
To understand Pin in depth, it is essential to be familiar with all of its methods. Excluding the nightly API, Pin has 13 methods in total.
These methods can be divided into two broad categories.
- Pin<P> where P: Deref
- Pin<P> where P: DerefMut
As mentioned in the previous article, Pin is generally written as Pin<P<T>> (P is the abbreviation for Pointer and T is the abbreviation for Type), so the content wrapped in Pin can only be a smart pointer (any type that implements the Deref trait can be called a smart pointer); wrapping other ordinary types has no meaning. Since &T and &mut T implement Deref and DerefMut respectively, Pin<&'a T> and Pin<&'a mut T> are special cases of these two categories.
At first glance, these 13 methods look a bit haphazard, but they are actually very well designed, to the point of symmetry. By function, they can be divided into 5 major categories, each of which is subdivided into 2 to 3 variants according to mutability or whether the T: Unpin restriction applies. Mutable versions end in mut, and the unsafe versions, which do not require T: Unpin, contain unchecked.
Functions | Methods | Remarks |
---|---|---|
Construct Pin | new() / new_unchecked() | Distinguish between safe and unsafe versions by whether they satisfy the T: Unpin restriction. |
Convert Pin type | as_ref() / as_mut() | Converts &/&mut Pin<P<T>> to Pin<&/&mut T>. |
Get the borrow of T inside Pin<P<T>> | get_ref() / get_mut() / get_unchecked_mut() | Consumes ownership of the Pin and gets the borrow of T inside. There are two versions by mutability. Since &mut T is the “root of all evil”, get_mut() is further split into safe and unsafe versions according to whether T: Unpin is satisfied. |
Consume Pin ownership and get the pointer P inside | into_inner() / into_inner_unchecked() | Distinguish between safe and unsafe versions by whether they satisfy the T: Unpin restriction. Also, to avoid conflicts with P’s own into-style methods, these APIs are designed as associated functions that must be called as Pin::into_inner(pin), not pin.into_inner(). |
Pin projection | map_unchecked() / map_unchecked_mut() | Usually used for Pin projection. |
There are only two methods left that are not categorized in the table above, and they are also relatively simple, namely:
- Pin::set(): sets a new T value in Pin<P<T>>.
- Pin<&mut T>::into_ref(): converts Pin<&mut T> to Pin<&T>.
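To make the categories concrete, here is a minimal sketch that calls one method from each group on an Unpin type (i32), so only the safe variants are needed. The function name api_tour is made up for this illustration.

```rust
use std::pin::Pin;

/// Walks through the safe variants of each method category on `i32`.
/// Returns (value after increment, value seen via `into_ref`, unwrapped box).
fn api_tour() -> (i32, i32, i32) {
    let mut a = 1;

    // Construct: `i32: Unpin`, so the safe `Pin::new()` is available.
    let mut pinned: Pin<&mut i32> = Pin::new(&mut a);

    // Convert + borrow: `as_mut()` reborrows as `Pin<&mut i32>`, and
    // `get_mut()` hands out `&mut i32` (safe only because `i32: Unpin`).
    *pinned.as_mut().get_mut() += 1;
    let after_increment = *pinned; // `Pin` derefs to the pointee

    // `set()` overwrites the pointee in place.
    pinned.set(10);

    // `into_ref()` turns `Pin<&mut T>` into `Pin<&T>`; `get_ref()` exposes `&T`.
    let shared: Pin<&i32> = pinned.into_ref();
    let via_ref = *shared.get_ref();

    // `into_inner` is an associated function, hence the `Pin::` prefix.
    let boxed: Pin<Box<i32>> = Box::pin(20);
    let unboxed: Box<i32> = Pin::into_inner(boxed);

    (after_increment, via_ref, *unboxed)
}
```

For a !Unpin pointee, the same walk would need new_unchecked(), get_unchecked_mut() and into_inner_unchecked() inside unsafe blocks.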
It is worth noting that the implementations of new() and new_unchecked(), get_mut() and get_unchecked_mut(), into_inner() and into_inner_unchecked() are actually identical; the only difference is that the safe versions carry the Unpin restriction.
Why should there be a distinction between safe and unsafe versions of the same code? To answer this question, we have to go back to the nature of Pin. The essence of Pin is to ensure that the memory address of T in Pin<P<T>> does not change (i.e., T is not moved) under safe Rust unless T satisfies T: Unpin. The way to ensure this is to avoid exposing T or &mut T (“the root of all evil”). If T is exposed, you can just move it; if &mut T is exposed, the developer can call functions like std::mem::swap() or std::mem::replace() to move T. In addition, the boundary between safe and unsafe in Rust must be very clear and unambiguous. So as long as T: Unpin is not satisfied, any method that constructs Pin<P<T>> or exposes T or &mut T should be unsafe.
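A small sketch of why &mut T is the dangerous case: once get_mut() hands out &mut T, std::mem::swap() can move the values, which is acceptable for an Unpin type such as u8. NotUnpin and both function names are hypothetical helpers for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Swaps two pinned `u8` values through the `&mut u8` exposed by `get_mut()`.
fn swap_through_pin() -> (u8, u8) {
    let (mut x, mut y) = (1u8, 2u8);
    let px: Pin<&mut u8> = Pin::new(&mut x);
    let py: Pin<&mut u8> = Pin::new(&mut y);
    // Allowed because `u8: Unpin`: moving it can never break anything.
    std::mem::swap(px.get_mut(), py.get_mut());
    (x, y)
}

/// Hypothetical `!Unpin` type, used only to show what safe code may reach.
struct NotUnpin {
    _pin: PhantomPinned,
}

/// For a `!Unpin` pointee, safe code can only get `&T` out of the `Pin`.
fn shared_only<'a>(p: Pin<&'a mut NotUnpin>) -> &'a NotUnpin {
    // `p.get_mut()` would not compile here: it requires `NotUnpin: Unpin`.
    p.into_ref().get_ref()
}
```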
Operation | Satisfies T: Unpin | Does not satisfy T: Unpin |
---|---|---|
Construct Pin | safe | unsafe |
Expose T | safe | unsafe |
Expose &T | safe | safe |
Expose &mut T | safe | unsafe |
For example, into_inner_unchecked() returns P, but that indirectly exposes T and &mut T, because you can easily get T or &mut T with *P or &mut *P. Constructing Pin<P<T>> amounts to promising to abide by the Pin contract, and moving T this way would clearly violate that contract.
Why is Pin::get_ref() safe? Because it returns &T, and there is no way to move T through it: functions like std::mem::swap() only accept &mut T, and the compiler rejects any attempt to move out of a dereferenced &T. (Thanks again, rustc.) Another thing to emphasize is types with interior mutability. For example, for RefCell<T>, Pin<&mut RefCell<T>>.into_ref().get_ref() returns &RefCell<T>, and methods like RefCell<T>::replace() or RefCell<T>::borrow_mut() can then get at T and move it. But that is okay, because the contract of Pin<P<T>> is to ensure that the T inside P is not moved, and here P is & and T is RefCell, not the T inside RefCell<T>. This is fine as long as there is no additional Pin<&T> pointing to the T inside RefCell<T>, and you have in fact eliminated that possibility automatically when you construct the RefCell<T>, because the argument to RefCell::new() is value: T, which already moves T in.
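The RefCell case can be sketched as follows. Payload is a hypothetical !Unpin type; the whole function is safe code, yet it moves the Payload out of the pinned RefCell, which does not break the contract because only the RefCell itself is pinned.

```rust
use std::cell::RefCell;
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` payload stored inside the `RefCell`.
struct Payload {
    id: u32,
    _pin: PhantomPinned,
}

impl Payload {
    fn new(id: u32) -> Self {
        Payload { id, _pin: PhantomPinned }
    }
}

/// Returns (id of the payload moved out, id of the payload now inside).
fn move_out_of_pinned_refcell() -> (u32, u32) {
    // `pin!` pins the `RefCell<Payload>` itself to the stack.
    let pinned: Pin<&mut RefCell<Payload>> =
        std::pin::pin!(RefCell::new(Payload::new(1)));

    // Safe: only `&RefCell<Payload>` is exposed.
    let shared: &RefCell<Payload> = pinned.into_ref().get_ref();

    // `replace()` moves the old `Payload` out through a shared reference.
    let old = shared.replace(Payload::new(2));
    let current_id = shared.borrow().id;
    (old.id, current_id)
}
```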
Similarly, Pin<&mut Box<T>> guarantees that Box<T> itself is not moved, not the T inside the Box. To ensure that the T inside Box<T> is not moved, just use Pin<Box<T>>.
Pin additional attributes
#[fundamental]
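Types marked with the #[fundamental] attribute get an exemption from the orphan rule: Pin<P> counts as a local type whenever the wrapped P is local. So you can implement a foreign trait for Pin<P<T>> as long as the wrapped type is defined in your own crate.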
#[repr(transparent)]
This attribute gives Pin the same ABI layout as the pointer field inside it, which can be useful in FFI scenarios.
The #[repr(transparent)] attribute is now stable. This attribute allows a Rust newtype wrapper (struct NewType<T>(T);) to be represented as the inner type across Foreign Function Interface (FFI) boundaries.
Traits implemented by Pin
Let’s take a look at the traits Pin implements that are of interest.
Unpin
Since Unpin is an auto trait, Pin<P<T>> is also Unpin if P: Unpin. Almost every P is Unpin, so Pin<P<T>> is almost always Unpin. This implementation is important, especially when T is a Future. It doesn’t matter whether your Future satisfies Unpin or not: once you wrap it in Pin<&mut ...>, the wrapper is a Future that satisfies Unpin (because Pin<P> implements Future, as we’ll see later). Many asynchronous methods require your Future to satisfy Unpin before they can be called, and the Future returned by an async fn does not satisfy Unpin, so you often need to pin that Future first, for example with the macro tokio::pin!().
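A minimal std-only sketch of this effect. NotUnpinFuture is a hypothetical hand-written future standing in for the result of an async fn, std::pin::pin!() stands in for tokio::pin!(), and NoopWake is a do-nothing waker that exists only so poll() can be called.

```rust
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Compiles only for `Unpin` arguments.
fn assert_unpin<T: Unpin>(_: &T) {}

/// Hypothetical `!Unpin` future (stand-in for an `async fn` future).
struct NotUnpinFuture {
    _pin: PhantomPinned,
}

impl Future for NotUnpinFuture {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(1)
    }
}

/// Waker that does nothing; enough for a single manual poll.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_pinned_future() -> Option<u32> {
    let fut = NotUnpinFuture { _pin: PhantomPinned };
    // assert_unpin(&fut); // would not compile: `NotUnpinFuture` is `!Unpin`

    // After pinning we hold a `Pin<&mut NotUnpinFuture>`, which is `Unpin`.
    let mut pinned = std::pin::pin!(fut);
    assert_unpin(&pinned);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // `Pin::new` requires `Unpin`; the wrapper satisfies it and is a `Future`.
    let polled = Pin::new(&mut pinned).poll(&mut cx);
    match polled {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```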
Also, it needs to be emphasized again that:
- Whether Pin itself is Unpin has nothing to do with whether T is Unpin; it depends only on P.
- Whether Pin can pin T has nothing to do with whether P is Unpin; it depends only on T.
The above two sentences are a bit confusing, but after you figure it out, you won’t be confused about many pin scenarios.
Deref and DerefMut
These two traits are critical to Pin. Only because Deref is implemented is Pin<P> a smart pointer, so that the developer can seamlessly call the methods of the pointee (the Deref target of Pin<P> is P::Target). It is important to note that DerefMut is implemented for Pin<P<T>> only if T: Unpin is satisfied. This is because one of the responsibilities of Pin<P<T>> under safe Rust is to not expose &mut T unless T: Unpin holds.
In addition, once these two traits are implemented, you can dereference the Pin to get &T and &mut T respectively, but there is a difference between this dereference and get_ref() / get_mut(). Take &T as an example. Suppose there is let p = Pin::new(&T);. Dereferencing p with let t = &*p; yields a &T whose lifetime is that of the borrow of p, i.e. of &Pin::new(&T). By contrast, Pin::new(&T).get_ref() yields a &T with the same lifetime as the reference stored inside the Pin.
Why is this the case? Let’s expand the syntactic sugar of dereferencing a smart pointer. The body of Pin’s Deref implementation is Pin::get_ref(Pin::as_ref(self)), and Pin::as_ref() takes &self and returns Pin<&P::Target>, so its result borrows from the Pin value itself. By comparison, you can see that the lifetime of the &T obtained by dereferencing is indeed different from that obtained by get_ref().
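The lifetime difference can be made visible with two small functions. This is only a sketch; via_get_ref and via_deref are made-up names.

```rust
use std::pin::Pin;

/// `get_ref()` consumes the `Pin` and returns the inner reference with its
/// full lifetime `'a`, so the result may outlive the `Pin` value itself.
fn via_get_ref<'a>(p: Pin<&'a u32>) -> &'a u32 {
    p.get_ref()
}

/// Dereferencing goes through `Deref::deref(&self)`, so the result is tied
/// to the borrow of the `Pin` (`'p`), not to the inner lifetime `'a`.
fn via_deref<'p, 'a>(p: &'p Pin<&'a u32>) -> &'p u32 {
    &**p
}

// The following variant is rejected by the compiler, because `&*p` borrows
// from the local variable `p`, which dies at the end of the function:
//
// fn bad<'a>(p: Pin<&'a u32>) -> &'a u32 { &*p }
```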
Another thing worth noting is that Pin::as_ref() and Pin::as_mut() dereference self.pointer, which actually calls its deref() or deref_mut() method. These two methods are implemented by P itself, so a “malicious implementation” could move T here. But such a “malicious implementation” is ruled out by Pin’s contract: the move is caused by your own “malicious implementation”, not by using Pin.
The documentation for Pin::new_unchecked() makes a point of emphasizing this: “By using this method, you are making a promise about the P::Deref and P::DerefMut implementations, if they exist. Most importantly, they must not move out of their self arguments: Pin::as_mut and Pin::as_ref will call DerefMut::deref_mut and Deref::deref on the pinned pointer and expect these methods to uphold the pinning invariants.”
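A hedged sketch of such a “malicious” pointer. The names Boz and Unmovable follow the prose of this article, but the bodies below are my own assumption of what such an example could look like, not the original listing. The unsafe block makes a promise that this code deliberately breaks.

```rust
use std::marker::PhantomPinned;
use std::ops::{Deref, DerefMut};
use std::pin::Pin;

/// A `!Unpin` value that is supposed to stay put once pinned.
#[derive(Default)]
struct Unmovable {
    data: u64,
    _pin: PhantomPinned,
}

/// Hypothetical "evil Box": `deref_mut` moves the pointee somewhere else.
struct Boz<T> {
    current: Box<T>,
    moved_out: Option<T>,
}

impl<T> Deref for Boz<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.current
    }
}

impl<T: Default> DerefMut for Boz<T> {
    fn deref_mut(&mut self) -> &mut T {
        // Malicious: move the old value out of its heap slot and leave a
        // fresh default value behind.
        let old = std::mem::take(&mut *self.current);
        self.moved_out = Some(old);
        &mut *self.current
    }
}

/// Returns (data left in the pinned slot, data of the moved value,
/// whether the moved value now lives at a different address).
fn malicious_deref_mut_moves_pinned_value() -> (u64, u64, bool) {
    let boz = Boz {
        current: Box::new(Unmovable { data: 42, _pin: PhantomPinned }),
        moved_out: None,
    };
    // Contract violation on purpose: `Boz::deref_mut` moves the pointee.
    let mut pinned: Pin<Boz<Unmovable>> = unsafe { Pin::new_unchecked(boz) };
    let pinned_addr: *const Unmovable = &*pinned;

    // `as_mut()` calls `Boz::deref_mut`, which moves the "pinned" value.
    let _ = pinned.as_mut();

    let boz = unsafe { Pin::into_inner_unchecked(pinned) };
    let moved = boz.moved_out.as_ref().expect("deref_mut stashed the old value");
    let moved_addr: *const Unmovable = moved;
    (boz.current.data, moved.data, moved_addr != pinned_addr)
}
```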
In the above example, we construct a Pin<Boz<Unmovable>> and then call the as_mut() method, which dereferences this Boz. Boz has a “malicious” DerefMut implementation that moves the Unmovable away, even though it has clearly been pinned in place.
Future
Pin also implements Future, which is closely related to Unpin, so we’ll cover that in the next section.
Unpin and Future
One of the things that confuses beginners most about Rust’s pinning API is the introduction of Unpin, so it’s important to get a thorough understanding of Unpin, and in particular of its relationship to Future.
As mentioned before, Unpin is an auto trait, and almost all types implement Unpin, including some you might not expect. For example:
- &T: impl<'a, T: ?Sized + 'a> Unpin for &'a T {}
- &mut T: impl<'a, T: ?Sized + 'a> Unpin for &'a mut T {}
- *const T: impl<T: ?Sized> Unpin for *const T {}
- *mut T: impl<T: ?Sized> Unpin for *mut T {}
- Others, including Box, Arc, Rc, etc.
Note that these are all Unpin regardless of whether T satisfies T: Unpin or not. The reason has already been stated: the ability of Pin to pin T has nothing to do with whether P is Unpin, but only with T.
As mentioned in the previous article, only std::marker::PhantomPinned, types that contain PhantomPinned, and the structures generated by desugaring .await are !Unpin, which is not repeated here.
Unpin is a safe trait
Another important feature: Unpin is a safe trait, which means you can implement Unpin for any type under safe Rust, including your Future type.
We prepare two assert functions in advance, which will be used later.
If you want to poll this Dummy future inside another Future, that’s no problem at all. The futures crate even provides a series of unpin versions of methods to help you do this, such as FutureExt::poll_unpin(). You can see that its receiver is &mut self, not self: Pin<&mut Self>.
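A minimal sketch of a future that opts into Unpin in safe code. The helpers assert_future and assert_unpin are hypothetical stand-ins for the two assert functions mentioned above, and the body of Dummy is an assumption. Here the PhantomPinned marker only switches off the automatic Unpin impl and nothing is ever treated as pinned behind it, so the manual impl is sound.

```rust
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical compile-time assertion helpers.
fn assert_future<T: Future>() {}
fn assert_unpin<T: Unpin>() {}

/// The marker alone would make `Dummy` `!Unpin`.
struct Dummy {
    _pin: PhantomPinned,
}

// `Unpin` is a safe trait: no `unsafe` is needed to implement it.
impl Unpin for Dummy {}

impl Future for Dummy {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(7)
    }
}

/// Waker that does nothing; enough for a single manual poll.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_dummy() -> Option<u32> {
    assert_future::<Dummy>();
    assert_unpin::<Dummy>();

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut dummy = Dummy { _pin: PhantomPinned };
    // Because `Dummy: Unpin`, the safe `Pin::new()` is enough to poll it.
    let polled = Pin::new(&mut dummy).poll(&mut cx);
    match polled {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```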
However, the pin projection scenario requires special attention: if a field of type !Unpin is structurally pinned, you can’t implement Unpin for the containing type. See the “Pinning is structural for field” section of the official documentation for details.
Why Future can be Unpin
Some people may ask: “Wasn’t Pin originally designed to solve the problem of self-referential structures being moved when implementing Future? Why is it possible to implement Unpin for a Future type?” The reason is this: if your Future is implemented as a self-referential structure, then of course it can’t be Unpin, but otherwise it’s perfectly fine to implement Unpin. The example above, and the Future types of many third-party libraries, contain no self-referential structs, so they can be moved with confidence and can therefore be Unpin. Another advantage is that you can use the safe Pin::new() method to construct the Pin for polling the future, without having to deal with unsafe.
Pin’s Future implementation
The reason we waited until here to talk about the Future implementation of Pin is that Rust 1.56 includes PR #81363, which removes the P: Unpin restriction. Let’s first look at why Future needs to be implemented for Pin, and then analyze why the Unpin restriction can be dropped here.
The reason for implementing Future for Pin is simply to make a pinned pointer usable wherever a Future is expected, especially in the pin projection scenario. Since self of poll() is of type Pin<&mut Self>, you can’t call poll() directly on a future value. You have to construct a Pin<&mut Dummy> first, and then you can write Pin::new(&mut dummy).poll(ctx) or, in fully qualified form, Future::poll(Pin::new(&mut dummy), ctx). Because Pin<P> is itself a Future, that pinned pointer can also be awaited or handed to any API bounded on Future.
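A small sketch of the call styles involved. Dummy here is a hypothetical always-ready future, assert_future is a made-up helper, and NoopWake is a do-nothing waker used only to build a Context.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that is immediately ready.
struct Dummy;

impl Future for Dummy {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(1)
    }
}

/// Compiles only if the argument's type implements `Future`.
fn assert_future<F: Future>(_: &F) {}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn ready(p: Poll<u32>) -> Option<u32> {
    match p {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}

fn call_styles() -> Vec<Option<u32>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut dummy = Dummy;

    // Method syntax on a freshly constructed `Pin<&mut Dummy>`.
    let a = Pin::new(&mut dummy).poll(&mut cx);
    // Fully qualified form of the same call.
    let b = Future::poll(Pin::new(&mut dummy), &mut cx);

    // A pinned pointer is itself a `Future`, so it can be polled through
    // another layer of `Pin` (here: `Pin<&mut Pin<Box<Dummy>>>`).
    let mut boxed: Pin<Box<Dummy>> = Box::pin(Dummy);
    assert_future(&boxed);
    let c = Pin::new(&mut boxed).poll(&mut cx);

    vec![ready(a), ready(b), ready(c)]
}
```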
Again, let’s see why P: Unpin is not needed here. First, the purpose of this method is to poll P::Target, a Future. The Self of the poll() method is Pin<P<T>>, so self is Pin<&mut Pin<P<T>>> (note that there are two layers of Pin here). We need to safely convert Pin<&mut Pin<P<T>>> to Pin<&mut T> in order to call poll() on P::Target. Getting Pin<&mut T> is easy: that is what Pin::as_mut() does, and both the old and the new implementation end up calling as_mut(), so there’s no problem there. But the receiver of as_mut() is &mut self, which means we have to get &mut Pin<P<T>> first. If we reduce Pin<&mut Pin<P<T>>> to the basic form Pin<P<T>>, then &mut is the P and Pin<P<T>> is the T. Getting &mut Pin<P<T>> from Pin<&mut Pin<P<T>>> is therefore just getting &mut T from Pin<P<T>>. Both get_mut() and get_unchecked_mut() can do that; the only difference is the Unpin restriction, which is where that PR comes in. Without the Unpin restriction, we have to use the unsafe get_unchecked_mut(). But that is completely safe here, because we call as_mut() as soon as we get &mut Pin<P<T>> and never move out of it. So the previous P: Unpin bound was redundant. For more details, see the documentation and source code comments for Pin::as_deref_mut().
Why Unpin constraints are needed
As mentioned above, some asynchronous APIs require your type to satisfy Unpin in order to be called. As far as I can tell, these APIs fall into three general categories.
- Scenarios that require &mut future. For example, tokio::select!(), a macro that requires your Future to satisfy Unpin.
- The AsyncRead / AsyncWrite scenario. For example, the methods of tokio::io::AsyncWriteExt require your Self to satisfy Unpin.
- Future itself is Unpin compliant and does not want to deal directly with Pin. The FutureExt::poll_unpin() method mentioned above falls into this category.
Class (2) is mainly related to the self of the AsyncRead / AsyncWrite methods being Pin<&mut Self>. There are quite a few discussions about this in the community; it is not the focus of this article, so check the following links if you are interested.
- futures-rs: Should AsyncRead and AsyncWrite take self by Pin?
- tokio: Should AsyncRead/AsyncWrite required pinned self?
- Tokio’s AsyncReadExt and AsyncWriteExt require Self: Unpin. Why and what to do about it?
In addition, tower is also considering whether to add Pin<&mut Self>: Pinning and Service.
Regarding class (1), the main reason is that the implementation of Future for &mut F requires F: Unpin.
So it comes down to figuring out why we need Unpin here. Let’s start with a scenario: we have a future that we need to keep polling in a loop, but awaiting it, or handing it by value to a macro such as tokio::select!(), consumes ownership of the future each time. So we need to mutably borrow the future instead. But &mut future carries the risk of moving the future (“the root of all evil”), so either your future is Unpin, or you have to pin it first and then borrow it mutably (i.e. &mut Pin<&mut future>). And it just so happens that Pin<P> where P: DerefMut implements Future (as mentioned in the previous section), and Pin<P> also satisfies Unpin! It fits so perfectly that Future can simply be implemented for &mut F, as long as F satisfies Future + Unpin. The advantage is that if your future satisfies Unpin, you can just poll it multiple times in the loop and not worry about moves; if it doesn’t, that’s fine too, just pin it. For example, because tokio::time::Sleep doesn’t satisfy Unpin, you need to pin it with tokio::pin!() before code that polls it through &mut in a loop will compile.
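The same pattern can be sketched with only the standard library. CountDown is a hypothetical !Unpin future standing in for tokio::time::Sleep, std::pin::pin!() stands in for tokio::pin!(), and poll_once plays the role of an API bounded on Future + Unpin. The busy loop with a no-op waker is for illustration only; a real executor would wait to be woken.

```rust
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical `!Unpin` future: pending for `remaining` polls, then ready.
struct CountDown {
    remaining: u32,
    _pin: PhantomPinned,
}

impl Future for CountDown {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // SAFETY: only a field is mutated; the value is never moved out.
        let this = unsafe { self.get_unchecked_mut() };
        if this.remaining == 0 {
            Poll::Ready(())
        } else {
            this.remaining -= 1;
            Poll::Pending
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Takes the future by value, like an API bounded on `Future + Unpin`.
fn poll_once<F: Future + Unpin>(mut fut: F, cx: &mut Context<'_>) -> bool {
    Pin::new(&mut fut).poll(cx).is_ready()
}

/// Polls the same future repeatedly through `&mut` and counts the polls.
fn polls_until_ready() -> u32 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let countdown = CountDown { remaining: 2, _pin: PhantomPinned };
    // Passing `&mut countdown` to `poll_once` would not compile:
    // `&mut CountDown` is a `Future` only if `CountDown: Unpin`.
    let mut pinned = std::pin::pin!(countdown);

    let mut polls = 0;
    loop {
        polls += 1;
        // `&mut Pin<&mut CountDown>` is `Future + Unpin`, so the future is
        // borrowed on every iteration instead of being consumed.
        if poll_once(&mut pinned, &mut cx) {
            return polls;
        }
    }
}
```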
By the same token, the implementation of Future for Box<F> also requires Unpin.
Other scenarios that require Pin
I often see people ask questions like “Do I need to use Pin to solve this scenario?” When I look at the question, it turns out to have nothing to do with Pin, so I reply with this classic quote.
Rust Community Quote: Whenever you wonder if Pin could be the solution, it isn’t.
The Pinning API is designed for generality: besides solving the problem of self-referential structs being moved in asynchronous code, there are other scenarios where Pin is needed.
Intrusive collections
Intrusive collections are another application scenario for Pin. The documentation for Pin mentions the example of an intrusive doubly-linked list, and other intrusive data structures (e.g. intrusive singly-linked lists) are similar. However, the documentation devotes only a few sentences to it and is not easy to follow, so I will briefly summarize it here.
First of all, you need to understand what intrusive collections are. Almost all the collections we normally use are non-intrusive, such as the standard library’s Vec, LinkedList and so on. The characteristic of a non-intrusive collection is that the elements are completely decoupled from the collection itself: the collection does not need to care what type each element is, and it can hold elements of any type. An intrusive collection is the opposite: the prev or next pointers are defined inside the element itself.
Using C++ as an example, a non-intrusive doubly linked list can be defined like this
And the intrusive version needs to be written like this
The pseudo-code for the Rust version of intrusive would probably also look like this.
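As a rough illustration of the intrusive layout in Rust, the sketch below stores the prev and next pointers inside the element and keeps every element pinned on the heap. All names (Node, link, next_value) are hypothetical, and the use of Cell for the links is my own simplification so that linking needs no &mut access.

```rust
use std::cell::Cell;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Intrusive element: the links live inside the element itself.
struct Node {
    value: i32,
    prev: Cell<*const Node>,
    next: Cell<*const Node>,
    // Other nodes point at this one, so it must never move: make it !Unpin.
    _pin: PhantomPinned,
}

impl Node {
    /// Allocates a node on the heap and pins it there.
    fn new(value: i32) -> Pin<Box<Node>> {
        Box::pin(Node {
            value,
            prev: Cell::new(ptr::null()),
            next: Cell::new(ptr::null()),
            _pin: PhantomPinned,
        })
    }
}

/// Links `a -> b` by storing each node's address in its neighbour.
fn link(a: &Pin<Box<Node>>, b: &Pin<Box<Node>>) {
    a.next.set(&**b);
    b.prev.set(&**a);
}

/// Reads the value of the next node, if any.
fn next_value(node: &Pin<Box<Node>>) -> Option<i32> {
    let next: *const Node = node.next.get();
    // SAFETY (assumed by this sketch): a linked neighbour is still alive
    // and, being pinned, still lives at the recorded address.
    unsafe { next.as_ref().map(|n| n.value) }
}
```

Moving the Pin<Box<Node>> handles around is fine, because the nodes themselves stay at their heap addresses; a real implementation would also have to unlink a node before it is dropped.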
You can see that the biggest difference between the two is whether the pointers live in the collection or in the element. Both kinds of collection have their own advantages and disadvantages: the intrusive kind has better performance, but it is not generic and requires the collection to be defined again for each element type. This topic is not the focus of this article; for more details you can take a look at the following material.
- Invasive containers provided by Google Fuchsia
- Intrusive linked lists
- Safe Intrusive Collections with Pinning
So why do intrusive collections need Pin? The reason is that elements hold prev or next pointers to each other, so if an element in the middle moves, the pointers that other elements hold to it become invalid, resulting in unsafe behavior. Rust has a library called intrusive-collections that provides many intrusive collection types, and Tokio also defines intrusive collections, and no doubt they all use Pin.
Other
In fact, whenever we need to prevent a value from being moved, Pin is theoretically the tool to use. I can’t think of any other cases for now, so I’ll add them later if I find new ones; if you know of others, please let me know.
Summary
This article is a little long, so let’s summarize.
- The API for Pin is very well designed, even full of symmetry, and its methods can be roughly divided into 5 categories. Those involving Unpin and &mut T are further subdivided into safe and unsafe versions.
- #[fundamental] and #[repr(transparent)] of Pin are important, but you generally don’t need to care about them.
- Among the traits implemented by Pin, focus on Unpin, Deref / DerefMut and Future; understanding them will allow you to fully master Pin.
- Unpin and Future are very closely related. Unpin is a safe trait that can theoretically be implemented arbitrarily, and a Future can also be Unpin. Some asynchronous APIs may require Unpin, and the reason for it needs to be understood, not just used.
- Pin is a generic API, and there will be other scenarios that require Pin in addition to async / await, such as intrusive collections.
The Pin projection, mentioned several times in the article, is not expanded, so we will discuss it in detail in the next article. See you soon!