During a debugging session a few days ago, I noticed that the stack trace printed by the program was not quite what I expected. After some digging, I found a problematic piece of code. Can you see what the problem is?
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <thread>

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
Problem
The code above returns -2 directly when result2 is less than 0. However, if a std::thread object is destroyed while it is still joinable (that is, without join or detach having been called first), the std::thread destructor calls std::terminate, which terminates the entire program.
Although it is disconcerting that the destructor calls std::terminate, it is not entirely unreasonable. Neither of the two alternatives would be a safe default.
If the std::thread destructor automatically called detach, the other thread could outlive the objects it refers to, which may result in undefined behaviour. In the example above, the other thread refers to result1 and a. If the destructor called t.detach() on return -2, result1 and a would become dangling references, and accessing them would be undefined behaviour.
If the std::thread destructor automatically called join, the two threads could deadlock. Suppose subtask1 in the example above stops in the middle of its execution to wait for subtask2, but subtask2 returns an error before allowing subtask1 to continue. The early return triggers the destructor, which calls t.join(), so the main thread waits for a thread that is itself waiting for the main thread.
In addition, an automatic call to join may lengthen the execution time of the program. In the example above, if an error occurs in subtask2, we don’t care what happens to subtask1. But in order to execute t.join(), the main thread must wait for the other thread to finish. In some cases, this is unnecessarily wasteful.
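Whether the destructor terminates the program depends only on whether the std::thread object is still joinable at that moment. The following is a minimal sketch for checking this; the struct and function names are made up for illustration. It records what joinable() reports in each state. Only the first state (started, but neither joined nor detached) would make the destructor call std::terminate.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper type: what joinable() reported in each state.
struct JoinableStates {
    bool after_start;
    bool after_join;
    bool after_detach;
    bool default_constructed;
};

JoinableStates probe_joinable() {
    JoinableStates s{};

    std::thread joined([] {});
    s.after_start = joined.joinable();  // started, not yet joined or detached
    joined.join();
    s.after_join = joined.joinable();

    std::thread detached([] {});
    detached.detach();
    s.after_detach = detached.joinable();

    std::thread empty;  // never associated with a thread of execution
    s.default_constructed = empty.joinable();

    // None of the three objects is joinable here, so their destructors are harmless.
    assert(!joined.joinable() && !detached.joinable() && !empty.joinable());
    return s;
}
```

Calling probe_joinable() from main and printing the four fields is enough to see the pattern: a thread is joinable from the moment it starts until join or detach is called, regardless of whether its function has already returned.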
Solution
First, we must check the synchronization relationship between the two threads. If the two threads synchronize in any way other than “creating a new thread” and “merging the threads with the join function” (e.g., by communicating through a mutex or a condition variable), we must re-examine the synchronization protocol between them. We must make sure that the “waiting thread” is guaranteed to get a response from the “other thread”. For example, consider the following program:
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>

    // Joins the referenced thread on scope exit if it is still joinable.
    class scoped_thread_join {
    private:
        std::thread* thread_;

    public:
        explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
        ~scoped_thread_join() {
            if (thread_->joinable()) {
                thread_->join();
            }
        }
    };

    bool is_valid(int x) {
        return x % 2 == 0;
    }

    bool is_ready = false;  // guarded by m
    std::mutex m;
    std::condition_variable cv;

    int subtask1(int x) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []() { return is_ready; });
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        if (!is_valid(x)) {
            return -1;  // Problematic
        }
        {
            std::lock_guard<std::mutex> lock(m);
            is_ready = true;
            cv.notify_all();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        scoped_thread_join thread_guard(t);
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " -1 -3" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
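The program above forgets to notify the other side when it handles an error: subtask2 returns -1 without setting is_ready, so subtask1 waits on the condition variable forever and the join in the scoped_thread_join destructor never returns. If your program has this problem, simply calling join or detach will not solve it. We must define an “error state” in the synchronization protocol so that the other thread can handle the failure. For example: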
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>

    // Joins the referenced thread on scope exit if it is still joinable.
    class scoped_thread_join {
    private:
        std::thread* thread_;

    public:
        explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
        ~scoped_thread_join() {
            if (thread_->joinable()) {
                thread_->join();
            }
        }
    };

    bool is_valid(int x) {
        return x % 2 == 0;
    }

    bool is_error = false;  // Added; guarded by m
    bool is_ready = false;  // guarded by m
    std::mutex m;
    std::condition_variable cv;

    int subtask1(int x) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []() { return is_ready || is_error; });
        if (is_error) {  // Added
            // Return error early
            return -1;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        if (!is_valid(x)) {
            std::lock_guard<std::mutex> lock(m);  // Added
            is_error = true;                      // Added
            cv.notify_all();                      // Added
            return -1;
        }
        {
            std::lock_guard<std::mutex> lock(m);
            is_ready = true;
            cv.notify_all();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        scoped_thread_join thread_guard(t);
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " -1 -3" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
We can also consider rewriting the whole synchronization process. For example, we could move the is_valid(x) check above out of subtask2 and rule out the error before creating the thread t. However, this is beyond the scope of this article and will be covered in a future one.
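As a rough illustration of that idea only (not the fuller treatment deferred to a later article), the validation can run before any thread exists, so the error path has nothing to notify or join. In this sketch the name run_validated is made up, the subtasks are reduced to stubs, and both subtasks are assumed not to throw.

```cpp
#include <cassert>
#include <thread>

bool is_valid(int x) {
    return x % 2 == 0;
}

// Stubs standing in for the real subtasks; assumed not to throw.
int subtask1(int x) { return x; }
int subtask2(int x) { return x; }

// Hypothetical variant of run(): validate first, then start the thread.
int run_validated(int a, int b) {
    if (!is_valid(b)) {
        return -2;  // no thread has been created yet, so nothing to clean up
    }
    int result1 = 0;
    std::thread t([&]() { result1 = subtask1(a); });
    int result2 = subtask2(b);
    t.join();  // single join point, reached on every remaining path
    assert(!t.joinable());
    if (result2 < 0) {
        return -2;
    }
    if (result1 < 0) {
        return -1;
    }
    return 0;
}
```

With the check hoisted, the condition variable and the error flag are no longer needed for this particular failure, although a subtask2 that can fail for other reasons would still need one of the solutions below.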
After checking the synchronization relationship, we must decide whether to avoid the std::terminate call in std::thread::~thread with join() or with detach(). Using join() is simpler, but as mentioned before, join() makes the main thread wait for the other thread (whether you care about its result or not). With detach(), on the other hand, we have to make sure that the objects used by the other thread are not destroyed while it is running. A simple sufficient condition is to let the other thread own the objects it needs. If the situation is too complex to reason about easily, join() is the safer choice.
The following four solutions are described in turn.
- call the join function
- use std::jthread instead (a variant that calls join)
- call the detach function
- use std::async instead (the result is handed back through a std::future)
Solution 1: Calling the join function
The most straightforward way is to call the join member function on every path before std::thread::~thread runs. The code at the beginning of this article could be rewritten as follows:
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <thread>

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        int result2;
        try {
            result2 = subtask2(b);
        } catch (...) {
            t.join();  // join before the exception propagates
            throw;
        }
        if (result2 < 0) {
            t.join();  // join before the early return
            return -2;
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
Because exceptions must be handled as well, the whole program can become very cumbersome. Instead, we can write a scoped_thread_join class whose destructor joins the thread:
    class scoped_thread_join {
    private:
        std::thread* thread_;

    public:
        explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
        ~scoped_thread_join() {
            if (thread_->joinable()) {
                thread_->join();
            }
        }
    };
Then rewrite the program as follows:
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <thread>

    class scoped_thread_join {
    private:
        std::thread* thread_;

    public:
        explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
        ~scoped_thread_join() {
            if (thread_->joinable()) {
                thread_->join();
            }
        }
    };

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        scoped_thread_join thread_guard(t);  // joins t on every exit path
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
Or, going one step further, fold the explicit t.join() into the guard by limiting the guard's scope:
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <thread>

    class scoped_thread_join {
    private:
        std::thread* thread_;

    public:
        explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
        ~scoped_thread_join() {
            if (thread_->joinable()) {
                thread_->join();
            }
        }
    };

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::thread t([&]() { result1 = subtask1(a); });
        {
            scoped_thread_join thread_guard(t);
            int result2 = subtask2(b);
            if (result2 < 0) {
                return -2;
            }
        }  // thread_guard joins t here, so result1 is safe to read below
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
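The main reason for the guard is the exception path, which the examples above never actually exercise. The sketch below is one way to check it, under the assumption that a thrown std::runtime_error stands in for a failing subtask2; the function name guard_joins_on_exception is made up for this illustration. Because the guard is declared after t, stack unwinding destroys the guard first (joining the thread) and only then destroys t, which is no longer joinable.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

class scoped_thread_join {
private:
    std::thread* thread_;

public:
    explicit scoped_thread_join(std::thread& thread) : thread_(&thread) {}
    ~scoped_thread_join() {
        if (thread_->joinable()) {
            thread_->join();
        }
    }
};

// Hypothetical check: returns true if the worker had finished by the time the
// exception reached the catch block, i.e. the guard joined during unwinding.
bool guard_joins_on_exception() {
    std::atomic<bool> worker_done{false};
    try {
        std::thread t([&worker_done]() {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            worker_done = true;
        });
        scoped_thread_join thread_guard(t);  // declared after t, destroyed before t
        assert(t.joinable());  // without the guard, unwinding would hit std::terminate
        throw std::runtime_error("simulated failure in subtask2");
    } catch (const std::runtime_error&) {
        return worker_done.load();
    }
    return false;  // not reached
}
```

The declaration order matters: if the guard were declared before the thread it refers to, the thread object would be destroyed first and the program would still terminate.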
Solution 2: Use std::jthread instead
C++20 adds a new class, std::jthread (with an extra j in front of the name). Unlike std::thread, the std::jthread destructor calls join if the thread is still joinable (after first requesting a stop). So we can also rewrite the original program as follows:
    #include <chrono>
    #include <iostream>
    #include <string>
    #include <thread>

    #ifndef __cpp_lib_jthread
    // jthread library: https://github.com/josuttis/jthread
    #include "jthread.hpp"
    #endif

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        int result1;
        std::jthread t([&]() { result1 = subtask1(a); });
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;  // t's destructor joins here
        }
        t.join();
        if (result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
C++20 Alternative
C++20 is relatively new, however. At the time of writing, some C++ implementations do not provide the std::jthread class. As an alternative, we can use the jthread library written by Nicolai Josuttis.
    git clone https://github.com/josuttis/jthread
Then add the following to our program:
    #ifndef __cpp_lib_jthread
    // jthread library: https://github.com/josuttis/jthread
    #include "jthread.hpp"
    #endif
Finally, compile the program with the following command:
    g++ -pthread -std=c++17 -Ijthread/source solution_jthread.cpp
Solution 3: Call the detach function
We can also call detach right after creating the thread. However, to guarantee the lifetime of the objects the thread uses, I have changed the by-reference lambda capture ([&]) to a by-value lambda capture ([a, sync]). In addition, I have defined the data required for synchronization as a struct and shared it between both threads with std::shared_ptr.
The t.join() in the normal flow must also be rewritten with a mutex and a condition variable. The main thread locks the std::mutex object sync->m with std::unique_lock and then waits for the return value with sync->cv.wait(lock, ...). The other thread runs subtask1 first. After getting the return value, it locks sync->m with std::lock_guard, sets the return value, and then notifies the main thread with sync->cv.notify_all().
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <memory>
    #include <mutex>
    #include <string>
    #include <thread>

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    // State shared by both threads; kept alive by std::shared_ptr.
    struct Sync {
        std::mutex m;
        std::condition_variable cv;
        bool result1_ready = false;
        int result1 = 0;
    };

    int run(int a, int b) {
        auto sync = std::make_shared<Sync>();
        // Capture by value: the thread holds its own copy of a and of the shared_ptr.
        std::thread t([a, sync]() {
            int tmp = subtask1(a);
            std::lock_guard<std::mutex> lock(sync->m);
            sync->result1 = tmp;
            sync->result1_ready = true;
            sync->cv.notify_all();
        });
        t.detach();
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;  // returns without waiting for subtask1
        }
        std::unique_lock<std::mutex> lock(sync->m);
        sync->cv.wait(lock, [&]() { return sync->result1_ready; });
        if (sync->result1 < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
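The hand-written mutex, condition variable, and ready flag above can also be expressed with std::promise and std::future while keeping the detached thread. The following is only a sketch of that alternative, not the approach used in this article: the name run_detached is made up, the sleeps are shortened, and subtask1 is assumed not to throw (a throwing subtask1 would need pr.set_exception). The thread owns the promise, so nothing it touches belongs to the stack frame of run_detached.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

int subtask1(int x) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return x;
}

int subtask2(int x) {
    return x;
}

// Hypothetical variant of run(): detached thread plus promise/future.
int run_detached(int a, int b) {
    std::promise<int> promise;
    std::future<int> result1 = promise.get_future();
    assert(result1.valid());

    // The promise is moved into the thread; a is copied into the lambda.
    std::thread(
        [a](std::promise<int> pr) { pr.set_value(subtask1(a)); },
        std::move(promise))
        .detach();

    int result2 = subtask2(b);
    if (result2 < 0) {
        return -2;  // a future obtained from a promise does not block when destroyed
    }
    if (result1.get() < 0) {  // blocks until the detached thread sets the value
        return -1;
    }
    return 0;
}
```

The shared state behind the promise and the future plays the role of the Sync struct: it stays alive as long as either side still refers to it.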
Solution 4: Use std::async instead
If you find it too cumbersome to write std::mutex and std::condition_variable yourself, you can also rewrite the program with the std::async function defined in the <future> header file.
    #include <chrono>
    #include <future>
    #include <iostream>
    #include <string>
    #include <thread>

    int subtask1(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        return x;
    }

    int subtask2(int x) {
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        return x;
    }

    int run(int a, int b) {
        std::future<int> result1 = std::async(std::launch::async, subtask1, a);
        int result2 = subtask2(b);
        if (result2 < 0) {
            return -2;
        }
        if (result1.get() < 0) {
            return -1;
        }
        return 0;
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " [int] [int]" << std::endl;
            return 1;
        }
        std::cout << run(std::stoi(argv[1]), std::stoi(argv[2])) << std::endl;
        return 0;
    }
In the code above, std::async(std::launch::async, subtask1, a) runs subtask1(a) asynchronously, as if on a new thread. When it finishes, the return value of subtask1 is stored in the shared state of the std::future<int>. We can get the return value with result1.get(). If subtask1 is still running at that point, result1.get() blocks until the result is available.
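Although std::async looks like a variant of detach because we never touch a thread object, it does not behave like one in every respect. The destructor of a std::future returned by std::async blocks until the asynchronous task has finished, so the early return -2 above still waits for subtask1, much as join would. As with solution 3, if we pass references or pointers to the task, we must still ensure that the objects they refer to outlive its execution.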
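One behaviour of std::async that is easy to verify for yourself: a future obtained from std::async with std::launch::async blocks in its destructor until the task has finished. The sketch below (the function name is made up for this illustration) lets such a future go out of scope without calling get() or wait(), then reports whether the task had completed by then.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical check: returns true if the task had finished by the time the
// future's scope was left.
bool async_future_waits_on_scope_exit() {
    std::atomic<bool> task_done{false};
    {
        std::future<void> pending = std::async(std::launch::async, [&task_done]() {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            task_done = true;
        });
        assert(pending.valid());
        // No get() or wait(): the destructor of pending blocks at the closing brace.
    }
    return task_done.load();
}
```

This is why an early return in a function that holds such a future takes as long as the task itself. Futures obtained from std::promise or std::packaged_task do not block in their destructors.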