Preface
I wrote this article because a colleague recently started a goroutine directly with the go keyword, and a nil pointer dereference inside it brought down the whole program because the goroutine had no recover. The code looks like this.
Returned results.
Note that there is uniform exception handling in the outer layer, outside the goroutine, but the outer defer clearly does not cover a panic raised inside the goroutine.
The reason is that panic and recover have their own scope, which is easy to miss without knowing the Go runtime source.
- recover only works if called from within a defer.
- panic allows multiple calls to be nested within a defer.
- panic only triggers the defers of the current goroutine.
The reason panic only triggers the current goroutine's defers is that when newdefer allocates a _defer structure, it links the allocated object to the head of the current goroutine's _defer list.
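The scope rule can be seen in a small program. This is a minimal sketch, not the colleague's original code: runGuarded, user, and nameLen are hypothetical names chosen for illustration. The recover is deferred inside the new goroutine, because a defer in the parent goroutine never sees the child's panic.

```go
package main

import "fmt"

// user is a hypothetical type used only to provoke a nil pointer dereference.
type user struct{ name string }

// nameLen panics with a nil pointer dereference when u is nil.
func nameLen(u *user) int { return len(u.name) }

// runGuarded (hypothetical helper) runs fn in a new goroutine and returns the
// value recovered from a panic in fn, or nil if fn returned normally.
func runGuarded(fn func()) (recovered interface{}) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// The recover must be deferred inside this goroutine: a panic only
		// runs the defers of the goroutine that panicked.
		defer func() { recovered = recover() }()
		fn()
	}()
	<-done
	return recovered
}

func main() {
	// This defer belongs to the main goroutine. On its own it would not
	// catch a panic raised in a goroutine started from main.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("outer recover:", r)
		}
	}()

	r := runGuarded(func() { fmt.Println(nameLen(nil)) })
	fmt.Println("recovered inside the goroutine:", r)
}
```

Wrapping every go statement in a helper like this is one possible convention; the key point is only that the deferred recover lives in the same goroutine as the code that may panic.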
Source code analysis
_panic struct
- argp is a pointer to the arguments of the deferred call.
- arg is the argument passed in when we call panic.
- link points to the earlier runtime._panic structure; panic can be called consecutively, and the calls form a linked list.
- recovered indicates whether the current runtime._panic has been recovered.
- aborted indicates whether the current panic has been forcibly terminated.
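The chain formed through link shows up whenever a second panic starts while the first is still in progress. The following is a small sketch of that situation; nestedPanic is a hypothetical name. recover returns the value of the most recent panic.

```go
package main

import "fmt"

// nestedPanic panics in its body and then panics again inside a deferred call
// that runs while the first panic is still in progress. It returns the value
// seen by recover.
func nestedPanic() (recovered interface{}) {
	defer func() { recovered = recover() }() // registered first, runs last
	defer func() { panic("second") }()       // registered last, runs first
	panic("first")
}

func main() {
	fmt.Println("recovered:", nestedPanic())
}
```

Running it prints recovered: second, because the second panic supersedes the first one.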
The pc, sp, and goexit fields exist mainly to handle one corner case: a panic can occur inside a deferred call while Goexit is running and then be recovered by an outer defer. After such a recovery, execution would resume normally above the Goexit frame and thereby cancel the Goexit; these fields let the runtime detect this and carry on with the exit.
A discussion of the pc, sp, and goexit fields can be found in this commit: https://github.com/golang/go/commit/7dcd343ed641d3b70c09153d3b041ca3fe83b25e and in the issue "runtime: panic + recover can cancel a call to Goexit".
panic process
- The compiler converts the panic keyword into a call to runtime.gopanic, which then keeps fetching defers from the current goroutine's defer list in a loop and executing them.
- If a deferred function calls recover, runtime.gorecover is called, which sets the recovered field of runtime._panic to true.
- After the deferred function returns to the main logic of runtime.gopanic, a true recovered field causes the program counter pc and stack pointer sp to be taken from the runtime._defer structure and runtime.recovery to be called to resume the program. runtime.recovery sets the function's return value to 1 during scheduling.
- When runtime.deferproc returns 1, the compiler-generated code jumps straight to the point just before the caller returns and executes runtime.deferreturn; the program has then recovered from the panic and continues with its normal logic.
- If runtime.gopanic has executed every _defer without encountering a recover, runtime.fatalpanic is executed to terminate the program with exit code 2.
So the whole process is divided into two parts: 1. logic with recover, where the panic can recover, and 2. logic without recover, where the panic simply crashes.
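The recoverable path can be observed from user code. The sketch below uses a hypothetical function named process; it shows only the visible behavior, not the runtime's internals. Deferred calls run in last-in, first-out order, the remaining defers still run after recover, and the caller then continues normally.

```go
package main

import "fmt"

// process panics, recovers in one of its defers, and records the order in
// which the deferred calls ran.
func process() (trace []string, err error) {
	defer func() { trace = append(trace, "defer 1") }()
	defer func() {
		if r := recover(); r != nil {
			trace = append(trace, "defer 2: recovered")
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	defer func() { trace = append(trace, "defer 3") }()

	panic("boom")
}

func main() {
	trace, err := process()
	fmt.Println(trace, err)
	fmt.Println("main continues normally")
}
```

The trace comes back as defer 3, defer 2: recovered, defer 1, which matches the description above: the defers run from the head of the list, and after recovery the function returns to its caller through its remaining deferred calls.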
Trigger panic to crash directly
Let’s look at the logic first.
- It first fetches the current goroutine, creates a new runtime._panic, and adds it to the head of that goroutine's _panic list.
- It then enters a loop that walks the current goroutine's defer list and calls reflectcall to run each deferred function.
- After each deferred function runs, it is removed from the current goroutine. Since we assume here that there is no recover logic, fatalpanic is finally called to stop the whole program.
Before aborting the program, fatalpanic uses printpanics to print the full panic message and the arguments passed to panic; it then calls exit with exit code 2.
Triggering a panic recovery
The compiler converts the recover keyword into a call to runtime.gorecover.
If the current goroutine is not panicking, the function simply returns nil. The p.goexit check determines whether the current _panic was triggered by Goexit; recover is not able to stop a Goexit.
If the conditions are met, the recovered field is set to true, and the actual recovery is then performed in runtime.gopanic.
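The two cases in which recover returns nil can be checked with a short program. This is a sketch with hypothetical function names: one function calls recover while nothing is panicking, and the other calls it in a deferred function while runtime.Goexit is unwinding the goroutine.

```go
package main

import (
	"fmt"
	"runtime"
)

// recoverWithoutPanic calls recover while no panic is in progress.
func recoverWithoutPanic() interface{} {
	return recover()
}

// recoverDuringGoexit runs runtime.Goexit in a new goroutine. It reports what
// recover returned in a deferred call and whether the code after Goexit ran.
func recoverDuringGoexit() (recovered interface{}, continued bool) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() { recovered = recover() }()
		runtime.Goexit()
		continued = true // not reached: Goexit terminates the goroutine
	}()
	<-done
	return recovered, continued
}

func main() {
	fmt.Println("no panic:", recoverWithoutPanic())
	r, continued := recoverDuringGoexit()
	fmt.Println("during Goexit:", r, "continued:", continued)
}
```

Both calls to recover yield nil, and the goroutine still exits after running its deferred calls.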
The code here contains two mcall(recovery) calls.
The first, guarded by if gp._panic != nil && gp._panic.goexit && gp._panic.aborted, deals with Goexit: it ensures that a Goexit that was in progress still runs to completion after the recovery.
The second performs the panic recovery: it takes the program counter pc and the stack pointer sp from runtime._defer and calls recovery to resume the program.
recovery sets the function's return value to 1, and the call to gogo jumps back to where the defer keyword was called, after which the goroutine continues to execute.
We know from the comments that when deferproc returns 1, the compiler-generated code jumps straight to the point just before the caller returns and executes runtime.deferreturn.
What are the pitfalls in runtime?
The fact that we don’t recommend using panic in business code doesn’t mean the runtime doesn’t use it, which is a big trap for newcomers who don’t know the underlying Go implementation. It is hard to write robust Go code without being familiar with these pitfalls.
Here I’ll categorise the errors raised by the runtime: some cannot be caught by recover, and some are ordinary panics that can be.
Uncatchable exceptions
Out of memory
When alloc is called to allocate memory, grow is called to request new memory from the system. If the mmap call made to request that memory returns _ENOMEM, the runtime throws a runtime: out of memory error, and throw calls exit, causing the whole program to exit.
Concurrent map read and write
Since map is not safe for concurrent use, the runtime throws a concurrent map read and map write error when it detects concurrent reads and writes, which causes the program to exit straight away.
The throw here, like the one above, ends in a call to exit.
I used to work in Java, where concurrent misuse of a HashMap just threw an exception and didn’t cause the program to crash.
The official explanation for this is as follows.
The runtime has added lightweight, best-effort detection of concurrent misuse of maps. As always, if one goroutine is writing to a map, no other goroutine should be reading or writing the map concurrently. If the runtime detects this condition, it prints a diagnosis and crashes the program. The best way to find out more about the problem is to run the program under the race detector, which will more reliably identify the race and give more detail.
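Because this error is a throw rather than a panic, recover cannot help; the access itself has to be made safe. One common approach, shown here as a sketch with a hypothetical counter type, is to guard the map with a sync.Mutex. Other options, such as sync.RWMutex or sync.Map, may fit better depending on the workload.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical wrapper that guards a map with a mutex.
type counter struct {
	mu sync.Mutex
	m  map[string]int
}

func newCounter() *counter { return &counter{m: make(map[string]int)} }

// inc increments the value for key while holding the lock.
func (c *counter) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// get reads the value for key while holding the lock.
func (c *counter) get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// countConcurrently increments one key from n goroutines and returns the total.
func countConcurrently(n int) int {
	c := newCounter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("hits")
		}()
	}
	wg.Wait()
	return c.get("hits")
}

func main() {
	fmt.Println(countConcurrently(100))
}
```

Running the program under the race detector (go run -race) is a useful check that no unguarded access remains.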
Running out of stack memory
This example would return.
Let me briefly explain the basic mechanics of the stack.
In Go, goroutines do not have a fixed stack size. Instead, they start small (say 4KB) and grow or shrink as needed, giving the impression of an “infinite” stack. Growth is still finite, but the limit comes from the stack memory limit rather than from a call depth limit; that limit is 1GB on 64-bit Linux machines.
During stack growth, the runtime checks whether the new stack size exceeds maxstacksize. That variable is initialized to 1 << 20 and raised to the limit above once runtime.main starts. If the new size exceeds it, throw("stack overflow") is called and the program exits, so the whole program crashes.
Starting a goroutine with a nil function
Here too, it will simply crash.
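Since a deferred recover cannot catch this crash, the only defense is to check the function value before the go statement. The helper below is a minimal sketch; startIfSet is a hypothetical name, not a standard library function.

```go
package main

import "fmt"

// startIfSet (hypothetical helper) starts fn in a new goroutine only when fn
// is non-nil. It returns a channel that is closed when fn finishes, and
// whether the goroutine was started at all.
func startIfSet(fn func()) (done <-chan struct{}, started bool) {
	if fn == nil {
		// Writing `go fn()` here would crash the program.
		return nil, false
	}
	ch := make(chan struct{})
	go func() {
		defer close(ch)
		fn()
	}()
	return ch, true
}

func main() {
	var hook func() // declared but never assigned, so it is nil
	_, started := startIfSet(hook)
	fmt.Println("nil hook started:", started)

	done, started := startIfSet(func() { fmt.Println("running") })
	fmt.Println("real hook started:", started)
	<-done
}
```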
All goroutines are asleep
Normally, not all goroutines in a program are asleep; there is always some goroutine running to handle our tasks, e.g.
However, some developers have done some “interesting” things, such as handling the logic poorly and adding code that blocks forever.
For example, if you add an empty select to a goroutine, it blocks permanently, and Go will crash the program if it detects that there is no goroutine left to run.
Exceptions that can be caught
Array (slice) index out of bounds
Return.
Because the code uses recover, the program resumes and outputs exit.
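A minimal sketch of this case follows; elementAt is a hypothetical helper. It turns the index out of range panic into an error value so that the caller can carry on.

```go
package main

import "fmt"

// elementAt (hypothetical helper) returns s[i], converting an out-of-range
// panic into an error instead of letting it propagate.
func elementAt(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s[i], nil
}

func main() {
	_, err := elementAt([]int{1, 2, 3}, 5)
	fmt.Println(err)
	fmt.Println("exit")
}
```

In real code an explicit bounds check (i >= 0 && i < len(s)) is usually preferable to relying on recover; the sketch only demonstrates that this panic is catchable.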
Nil pointer dereference
Return.
In addition to the above, another common scenario is a pointer variable that is declared but never assigned, so it is still nil, on which we then call a method with a pointer receiver.
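The pointer receiver scenario might look like the sketch below, where profile, Name, and safeName are hypothetical names. Calling a method on a nil pointer is legal in itself; the panic happens when the method dereferences the receiver.

```go
package main

import "fmt"

// profile is a hypothetical type with a pointer-receiver method.
type profile struct{ name string }

// Name dereferences p, so it panics when p is nil.
func (p *profile) Name() string { return p.name }

// safeName (hypothetical helper) calls p.Name and converts a nil pointer
// panic into an error.
func safeName(p *profile) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return p.Name(), nil
}

func main() {
	var p *profile // declared but never assigned, so it is nil
	_, err := safeName(p)
	fmt.Println(err)
}
```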
Sending data to a closed chan
Results
When sending, the runtime checks whether the chan has been closed, and panics if it has.
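The following sketch shows that this panic is an ordinary one that recover can catch. trySend is a hypothetical helper; a buffered channel is used so that the successful send does not block.

```go
package main

import "fmt"

// trySend (hypothetical helper) sends v on ch and converts the panic raised
// by sending on a closed channel into an error.
func trySend(ch chan int, v int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	ch <- v
	return nil
}

func main() {
	ch := make(chan int, 1)
	fmt.Println("open channel:", trySend(ch, 1))

	close(ch)
	fmt.Println("closed channel:", trySend(ch, 2))
}
```

A cleaner design is to make the sender the only party that closes the channel, so that a send after close cannot happen in the first place.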
Type Assertion
Results
So when asserting, we should use the form with two return values.
There are quite a few errors like the ones above; if you want to look deeper, you can find more on Stack Overflow.
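The difference between the two forms can be sketched as follows; toString and mustString are hypothetical names. The two-value form reports failure through ok, while the single-value form panics on a mismatch.

```go
package main

import "fmt"

// toString uses the two-value assertion, which never panics.
func toString(v interface{}) (string, bool) {
	s, ok := v.(string)
	return s, ok
}

// mustString uses the single-value assertion, which panics on a type
// mismatch; the deferred recover converts that panic into an error.
func mustString(v interface{}) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return v.(string), nil
}

func main() {
	s, ok := toString(42)
	fmt.Printf("two-value form: %q %v\n", s, ok)

	_, err := mustString(42)
	fmt.Println("single-value form:", err)
}
```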
Summary
This article started with an example and then walked through the source code of panic and recover. Some runtime errors cannot be caught by recover, while others are ordinary panics that can, so we need to keep both kinds in mind to prevent the application from crashing.