Anyone who has used Objective-C knows that declaring a property as `atomic` does not solve the multi-threading problem for mutable objects. If that is the case, then what is the point of `atomic` at all? In this article, we'll compare several programming languages that support reference counting and dig into the underlying logic of this age-old question.
As we know, `atomic` and `nonatomic` mainly apply to properties of object types and have no effect on primitive types. For an object-type property declared `nonatomic`, you must make sure that no two threads read and write the property at the same time; otherwise the program will crash.
What is the difference between object types and primitive types when reading and writing properties? The answer is reference counting. Consider the following code.
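A minimal class with one strong object-type property is enough for the discussion that follows (the `Demo` and `someProperty` names are illustrative):

```objc
@interface Demo : NSObject
// nonatomic, strong: the synthesized setter boils down to objc_storeStrong
@property (nonatomic, strong) NSObject *someProperty;
@end
```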
Let's look at the setter method the compiler generates for it (since it is a synthesized method, only assembly code is available here).
(assembly listing of the synthesized `nonatomic` setter)
We notice that for the `nonatomic` property, the compiler generates the same code as for a plain stack-variable assignment: a call to the `objc_storeStrong` runtime function. We can find the implementation of this function in the objc source code as follows.
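Slightly simplified from objc4's `NSObject.mm` (the upstream version adds assertions and fast paths):

```c
void
objc_storeStrong(id *location, id obj)
{
    id prev = *location;    // 1. read the old value
    if (obj == prev) {
        return;             // storing the same object: nothing to do
    }
    objc_retain(obj);       // 2. retain the new value
    *location = obj;        // 3. write the new value
    objc_release(prev);     // 4. release the old value
}
```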
This code performs multiple operations, including memory reads/writes and reference-count operations, which leaves many interleaving points under multi-threaded execution. The most typical failure: two threads both read `location` into `prev` and then proceed independently, so the same object is released twice, producing a dangling pointer.
Next, let's see what the generated code looks like in the same scenario when the property is declared `atomic` instead.
(assembly listing of the synthesized `atomic` setter)
As you can see, the key function becomes `objc_setProperty_atomic`, whose implementation can also be found in the source code.
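In objc4, `objc_setProperty_atomic` forwards to an internal helper; abridged from `objc-accessors.mm`, the relevant part looks like this:

```c
void objc_setProperty_atomic(id self, SEL _cmd, id newValue, ptrdiff_t offset)
{
    reallySetProperty(self, _cmd, newValue, offset, true, false, false);
}

static inline void reallySetProperty(id self, SEL _cmd, id newValue,
                                     ptrdiff_t offset, bool atomic,
                                     bool copy, bool mutableCopy)
{
    id oldValue;
    id *slot = (id*) ((char*)self + offset);

    // (copy / mutableCopy handling elided)
    newValue = objc_retain(newValue);

    if (!atomic) {
        oldValue = *slot;
        *slot = newValue;
    } else {
        // one spinlock per slot, taken from a global StripedMap of locks
        spinlock_t& slotlock = PropertyLocks[slot];
        slotlock.lock();
        oldValue = *slot;
        *slot = newValue;
        slotlock.unlock();
    }

    objc_release(oldValue);  // release happens outside the critical section
}
```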
The runtime's solution is simple: it only needs to make reading the old property value and writing the new one a single atomic operation. The atomicity here comes from a spinlock, and to avoid heavy lock contention under high concurrency, a global `StripedMap` of locks is used, a very common optimization. A CAS operation could be used here instead of locking, but whether that actually improves performance would need to be measured.
Why doesn't the final `objc_release` need to be inside the lock's critical section? The problem with `nonatomic` is that multiple threads can read the same old value of the property and release it concurrently. With `atomic`, the new value is stored in the same atomic step that reads the old one, so no two threads can ever obtain the same old value. Reference counting itself is an atomic operation, so once ownership is clear, no additional locking is needed.
Reference counting support in other languages
The case in C++ (clang STL)
Since Objective-C solves this problem neatly with the `atomic` property, does C++ have a similar problem? Let's verify with the following code.
Reading and writing the `someProperty` field from multiple threads simultaneously also crashes, which shows that `nonatomic` in Objective-C is not a performance optimization. Like `@synchronized`, `atomic` is really an additional capability Objective-C provides for handling this multi-threaded scenario.
The cause of the crash in C++ is very similar to `nonatomic` in Objective-C, so let's also look at what happens when we assign a value to `someProperty`. Here I have written an assignment function.
The assembly code is as follows.
(assembly listing of the assignment function)
Since C++ supports overloading `operator=` for object assignment, a simple assignment expression is actually a function call; the result shown here is after inlining. `std::move` is just a cast that has no effect on the value itself, so we can analyze the key method directly. Its symbol is expanded from a template at compile time and actually corresponds to the following method of `std::shared_ptr`.
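Abridged from libc++ (member names as in the library source), the move-assignment operator reads:

```cpp
template <class _Tp>
shared_ptr<_Tp>&
shared_ptr<_Tp>::operator=(shared_ptr&& __r) noexcept
{
    // Move the source into a temporary, then swap the temporary with *this.
    // When the temporary dies at the end of the statement, it releases the
    // old value that *this used to hold.
    shared_ptr(std::move(__r)).swap(*this);
    return *this;
}
```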
This code appears to do a lot, but there is only one thing we need to focus on: the `this` pointer. As mentioned at the beginning of the article, when two threads perform this operation at the same time, the only thing they can share is the `this` pointer referring to the old value of the variable. Let's continue down the call chain.
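The `swap` it calls, again abridged from libc++, exchanges the two underlying pointers:

```cpp
template <class _Tp>
void shared_ptr<_Tp>::swap(shared_ptr& __r) noexcept
{
    std::swap(__ptr_,   __r.__ptr_);    // raw object pointer
    std::swap(__cntrl_, __r.__cntrl_);  // control block (reference counts)
}
```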
There are two swap operations here; both are ordinary, non-atomic pointer swaps.
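Those member swaps bottom out in the generic `std::swap`, which is three moves with no atomicity whatsoever (for raw pointers the moves are plain copies):

```cpp
template <class _Tp>
void swap(_Tp& __x, _Tp& __y)
{
    _Tp __t(std::move(__x));
    __x = std::move(__y);
    __y = std::move(__t);
}
```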
Consider two threads calling the above method at the same time, with `__x` being the new value and `__y` the old one: in the step `__x = __y`, both threads can end up with the same old value. Then, as the call stack unwinds, that value is released twice via RAII.
From this we can see that, owing to language differences, C++ does not perform exactly the same operations as Objective-C during this variable swap. The root problem, however, is the same: the same object is released multiple times, because reading the old value and writing the new one are not one atomic operation.
How to fix
Attempt 1
The most obvious approach is to protect the property assignment with a `std::mutex`.
This introduces a minor performance problem, though: if the old value of `someProperty` is uniquely referenced, it will be destroyed inside the lock scope right after the assignment.
Attempt 2
We can avoid this potential performance cost by first constructing a temporary variable to take over the old value and then destroying the temporary outside the lock. This can be implemented with a swap.
In this way we get an effect similar to Objective-C's `atomic`: first atomically swap the new value with the old one, then release the old value outside the lock. Note that because of C++ move semantics, the temporary in the first line actually takes over the contents of `val`: after the swap, `temp` holds what `val` held before, and `val` becomes an empty, moved-from object. When the function scope exits, both `temp` and `val` are destructed, but destructing the emptied `val` is a no-op. With compiler optimizations enabled, many `shared_ptr` operations are inlined and performance improves further.
The situation in Rust
To better answer the question in the title, let's bring Rust into the comparison and see how it handles the same scenario.
First we construct the code for the same logic.
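A Rust version of the same logic might look like this (the variable name `obj` is illustrative); as discussed next, it does not even compile:

```rust
use std::rc::Rc;
use std::thread;

fn main() {
    let mut obj = Rc::new(String::from("value"));
    thread::scope(|s| {
        // Each closure needs a mutable borrow of `obj`:
        s.spawn(|| obj = Rc::new(String::from("a")));
        s.spawn(|| obj = Rc::new(String::from("b")));
        // error[E0499]: cannot borrow `obj` as mutable more than once at a time
    });
}
```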
Compilation fails with an error: `obj` is mutably borrowed more than once, which is not allowed in Rust.
How can the compiler tell that a closure may still be holding its captured variables after the spawn call returns? Let's look at the implementation of `Scope::spawn` in the standard library.
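The signature in `std::thread` (body elided):

```rust
impl<'scope, 'env> Scope<'scope, 'env> {
    pub fn spawn<F, T>(&'scope self, f: F) -> ScopedJoinHandle<'scope, T>
    where
        F: FnOnce() -> T + Send + 'scope,
        T: Send + 'scope,
    {
        // ...
    }
}
```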
As you can see, the closure `F` must live as long as the `Scope` itself, meaning the variables it captures stay borrowed until the `Scope` is destroyed. The single-mutable-reference rule is another important principle of Rust; this restriction is what prevents racing accesses, among other problems.
Since we cannot have multiple mutable references, can we construct multiple immutable references instead and use "interior mutability" to achieve what we need?
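An attempted version with `Cell` (names again illustrative); this is also rejected by the compiler, for the reason explained next:

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::thread;

fn main() {
    // Cell allows mutation through a shared reference...
    let obj = Cell::new(Rc::new(String::from("value")));
    thread::scope(|s| {
        s.spawn(|| obj.set(Rc::new(String::from("a"))));
        s.spawn(|| obj.set(Rc::new(String::from("b"))));
        // error[E0277]: `Cell<Rc<String>>` cannot be shared between threads safely
    });
}
```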
The answer is no: `Cell` does not implement `Sync`, so a type containing a `Cell` reference does not implement `Send`, and such variables naturally cannot cross thread boundaries. Interestingly, when we look at the implementation of `Cell::set`, we see the following.
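From the standard library's `core::cell` (lightly abridged; the exact body has varied across versions):

```rust
pub fn set(&self, val: T) {
    let old = self.replace(val);
    drop(old);
}

pub fn replace(&self, val: T) -> T {
    // Read the old value out and write the new one in its place.
    mem::replace(unsafe { &mut *self.value.get() }, val)
}
```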
This implementation has the same shape as `shared_ptr`'s swap in C++: read the old value, store the new one, and destroy the old one. Without lock protection, the old value can be released twice.
How to fix
The fix is straightforward as well: for the multi-threaded scenario we simply use a `Mutex` and change the field's type accordingly. The operation that updates the field should likewise be a swap inside the lock plus a drop outside the lock.
Rust's `Mutex` has a very good design: each `Mutex` is explicitly bound to a value, and a value that is read or written from multiple threads must be protected by one. Every type that implements `Send` becomes `Sync` once wrapped in a `Mutex`. Objects with internal mutability (e.g. `Arc`) may be used across threads without such protection, but then thread safety is the responsibility of the object itself.
Why doesn't `Mutex` make all objects `Sync`?
`!Send` types (e.g. `Rc`) generally represent some shared resource whose implementation does not account for multi-threaded scenarios. For example, if an `Rc` were moved to a different thread, there is a high probability that two threads would drop an `Rc` at the same time, leaving the reference count inconsistent.
In addition, `mem::swap` together with the single-mutable-borrow rule ensures that thread safety is guaranteed wherever a swap can be performed; we simply cannot write an unsafe swap operation.
With Rust, then, we can understand the question raised in the title more precisely. Breaking it down, `Mutex<Arc<T>>` involves two separate thread-safety guarantees:
- the guarantee by `Arc` itself that reference-count modifications are atomic, implemented with atomic operations;
- the protection by `Mutex` of modifications to the `Arc` pointer, preventing multiple releases of the same `Arc` caused by stale values under concurrent writes.
In other words, whether the reference-counting mechanism itself is thread-safe has nothing to do with whether the same property of the same object can be safely manipulated from multiple threads.
Summary
On the surface, this article analyzes how the reference-counting mechanisms of several systems programming languages (Objective-C only loosely qualifies) behave under multiple threads, but what it really illustrates is the essence of thread safety: in the object model, an object being thread-safe does not mean that every scenario in which it is used is thread-safe. Code that is not itself thread-safe can still produce logic errors even when it only operates on thread-safe objects. Reference counting is just one example, and it happens to involve memory operations that easily lead to obvious segfaults.
There are other multi-threaded scenarios in our daily development where the missing thread-safety logic is far less conspicuous, and those deserve even more of our attention.