Implementation of retry mechanism

When a service requests a resource and the request fails because of a network exception or a similar transient problem, it needs a retry mechanism to try the request again. A common practice is to retry up to 3 times, sleeping for a random few seconds between attempts. In most business development scaffolding, the HTTP client already wraps a retry method and retries failed requests automatically according to its configuration. This article uses a common HTTP client, go-resty, as an example to see how request retry is implemented, and then collects a few other implementations of retry mechanisms.

Implementation of go-resty retry mechanism

Let’s look at how go-resty retries when it sends an HTTP request. The entry point is the Execute method of Request.

// Execute method performs the HTTP request with given HTTP method and URL
// for current `Request`.
// 		resp, err := client.R().Execute(resty.GET, "http://httpbin.org/get")
func (r *Request) Execute(method, url string) (*Response, error) {
	var addrs []*net.SRV
	var resp *Response
	var err error

	if r.isMultiPart && !(method == MethodPost || method == MethodPut || method == MethodPatch) {
		return nil, fmt.Errorf("multipart content is not allowed in HTTP verb [%v]", method)
	}

	if r.SRV != nil {
		_, addrs, err = net.LookupSRV(r.SRV.Service, "tcp", r.SRV.Domain)
		if err != nil {
			return nil, err
		}
	}

	r.Method = method
	r.URL = r.selectAddr(addrs, url, 0)

	if r.client.RetryCount == 0 {
		resp, err = r.client.execute(r)
		return resp, unwrapNoRetryErr(err)
	}

	attempt := 0
	err = Backoff(
		func() (*Response, error) {
			attempt++

			r.URL = r.selectAddr(addrs, url, attempt)

			resp, err = r.client.execute(r)
			if err != nil {
				r.client.log.Errorf("%v, Attempt %v", err, attempt)
			}

			return resp, err
		},
		Retries(r.client.RetryCount),
		WaitTime(r.client.RetryWaitTime),
		MaxWaitTime(r.client.RetryMaxWaitTime),
		RetryConditions(r.client.RetryConditions),
	)

	return resp, unwrapNoRetryErr(err)
}

Retry flow

The retry flow of Execute(method, url) at request time is as follows.

If no retry count is set (r.client.RetryCount == 0), r.client.execute(r) sends the request once, and Execute returns the Response and error directly.
If r.client.RetryCount is not 0, Execute calls the Backoff() function.
Backoff() takes a handler function, which sends the request and counts the attempt, plus option functions such as Retries() and WaitTime() that describe the retry policy.
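The way Backoff() accepts Retries(), WaitTime() and similar arguments is the functional options pattern. The following is a minimal sketch of that pattern using only the standard library. The names options, option, retries, waitTime, backoff and errTemporary are hypothetical and chosen for illustration; this is not go-resty's code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a hypothetical transient error used by the demo.
var errTemporary = errors.New("temporary failure")

// options holds the retry settings; the field names are illustrative.
type options struct {
	maxRetries int
	waitTime   time.Duration
}

// option mutates options, mirroring the shape of Retries() and WaitTime().
type option func(*options)

func retries(n int) option            { return func(o *options) { o.maxRetries = n } }
func waitTime(d time.Duration) option { return func(o *options) { o.waitTime = d } }

// backoff calls op until it succeeds or maxRetries retries have been used.
// It returns the number of calls made and the last error.
func backoff(op func() error, opts ...option) (calls int, err error) {
	o := options{maxRetries: 3, waitTime: 10 * time.Millisecond} // assumed defaults
	for _, apply := range opts {
		apply(&o) // each option overrides one default
	}
	for attempt := 0; attempt <= o.maxRetries; attempt++ {
		calls++
		if err = op(); err == nil {
			return calls, nil
		}
		if attempt < o.maxRetries {
			time.Sleep(o.waitTime)
		}
	}
	return calls, err
}

func main() {
	failures := 2
	calls, err := backoff(func() error {
		if failures > 0 {
			failures--
			return errTemporary
		}
		return nil
	}, retries(5), waitTime(time.Millisecond))
	fmt.Println(calls, err)
}
```

Because every setting is an optional function, callers only pass what they want to change, and new settings can be added later without changing the signature of backoff.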

The Backoff function

Focus on what the Backoff() function does.

The Backoff() code is as follows.

// Backoff retries with increasing timeout duration up until X amount of retries
// (Default is 3 attempts, Override with option Retries(n))
func Backoff(operation func() (*Response, error), options ...Option) error {
	// Defaults
	opts := Options{
		maxRetries:      defaultMaxRetries,
		waitTime:        defaultWaitTime,
		maxWaitTime:     defaultMaxWaitTime,
		retryConditions: []RetryConditionFunc{},
	}

	for _, o := range options {
		o(&opts)
	}

	var (
		resp *Response
		err  error
	)

	for attempt := 0; attempt <= opts.maxRetries; attempt++ {
		resp, err = operation()
		ctx := context.Background()
		if resp != nil && resp.Request.ctx != nil {
			ctx = resp.Request.ctx
		}
		if ctx.Err() != nil {
			return err
		}

		err1 := unwrapNoRetryErr(err)           // raw error, it used for return users callback.
		needsRetry := err != nil && err == err1 // retry on a few operation errors by default

		for _, condition := range opts.retryConditions {
			needsRetry = condition(resp, err1)
			if needsRetry {
				break
			}
		}

		if !needsRetry {
			return err
		}

		waitTime, err2 := sleepDuration(resp, opts.waitTime, opts.maxWaitTime, attempt)
		if err2 != nil {
			if err == nil {
				err = err2
			}
			return err
		}

		select {
		case <-time.After(waitTime):
		case <-ctx.Done():
			return ctx.Err()
		}
	}

	return err
}

The flow of the Backoff() function is as follows.

Backoff() receives a handler function and optional Option functions (retry options) as arguments.
The default policy is 3 retries; the Option functions passed in override the default Options to customize the policy.
It declares the response and error variables for the request.
It then runs the request loop, which makes at most opts.maxRetries + 1 attempts:
1. Execute the handler function (send the HTTP request).
2. If the response is not nil and carries a request context, use that context. If the context has already been cancelled or has timed out (ctx.Err() != nil), exit Backoff().
3. Evaluate the retryConditions to decide whether a retry is needed; the check stops at the first condition that returns true.
4. If needsRetry is false, exit and return the error.
5. Compute the wait time with sleepDuration(), based on the response, the configured wait time, the maximum wait time and the attempt number. The algorithm is relatively complex; see: Exponential Backoff And Jitter.
6. Wait for waitTime before the next attempt. If the context is done during the wait, exit with the context error.
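To make step 5 more concrete, here is a minimal sketch of capped exponential backoff with "full jitter", one of the strategies usually meant by Exponential Backoff And Jitter. It is not a copy of go-resty's sleepDuration(); the function names backoffCeiling and jitteredSleep and the values in main are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns min(maxWait, base * 2^attempt) without overflowing.
func backoffCeiling(base, maxWait time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d > maxWait/2 {
			return maxWait // doubling again would exceed the cap
		}
		d *= 2
	}
	if d > maxWait {
		return maxWait
	}
	return d
}

// jitteredSleep picks a random duration in [0, ceiling] ("full jitter"),
// so that many clients retrying at once do not all wake at the same moment.
func jitteredSleep(base, maxWait time.Duration, attempt int) time.Duration {
	ceiling := backoffCeiling(base, maxWait, attempt)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	base, maxWait := 100*time.Millisecond, 2*time.Second
	for attempt := 0; attempt <= 5; attempt++ {
		fmt.Printf("attempt %d: ceiling %v, sleep %v\n",
			attempt, backoffCeiling(base, maxWait, attempt), jitteredSleep(base, maxWait, attempt))
	}
}
```

The ceiling doubles with every attempt until it reaches the cap, while the random draw spreads the actual sleep over the whole interval.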

A simple demo

The following request uses a specific HTTP client (a simple wrapper around go-resty).

// client, logger, ctx, url and args come from the surrounding project and are not shown here.
func getInfo() {
	request := client.DefaultClient().
		NewRestyRequest(ctx, "", client.RequestOptions{
			MaxTries:      3,
			RetryWaitTime: 500 * time.Millisecond,
			// Retry whenever the response status is not 2xx.
			RetryConditionFunc: func(response *resty.Response) (b bool, err error) {
				if !response.IsSuccess() {
					return true, nil
				}
				return
			},
		}).SetAuthToken(args.Token)
	resp, err := request.Get(url)
	if err != nil {
		logger.Error(ctx, err)
		return
	}

	body := resp.Body()
	if resp.StatusCode() != 200 {
		logger.Error(ctx, fmt.Sprintf("Request keycloak access token failed, messages:%s, body:%s", resp.Status(), string(body)))
		return
	}
	...
}

Following the go-resty flow described above, RetryCount is greater than 0, so the retry mechanism is used, with a retry count of 3. request.Get(url) then enters the Backoff() process, where the retry condition is !response.IsSuccess(): the request is retried until it succeeds or the retries are used up.
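The same "retry while the response is not successful" behaviour can be sketched with only net/http. In this sketch, flakyTransport, newFlakyClient, isSuccess and getWithRetry are hypothetical names, and the fake transport stands in for a real server so that the example runs without network access. It is a simplified illustration, not the wrapper or go-resty itself.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// flakyTransport answers 503 for the first `failures` requests and 200
// afterwards, without touching the network. It is not safe for concurrent use.
type flakyTransport struct{ failures int }

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	status := http.StatusOK
	if t.failures > 0 {
		t.failures--
		status = http.StatusServiceUnavailable
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}, nil
}

func newFlakyClient(failures int) *http.Client {
	return &http.Client{Transport: &flakyTransport{failures: failures}}
}

// isSuccess reports whether code is in the 2xx range.
func isSuccess(code int) bool { return code >= 200 && code < 300 }

// getWithRetry sends GET requests until one succeeds or maxRetries retries
// are used. It returns the last status code and the number of requests sent.
func getWithRetry(c *http.Client, url string, maxRetries int, wait time.Duration) (status, sent int, err error) {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		var resp *http.Response
		resp, err = c.Get(url)
		sent++
		if err == nil {
			status = resp.StatusCode
			resp.Body.Close()
			if isSuccess(status) {
				return status, sent, nil
			}
		}
		if attempt < maxRetries {
			time.Sleep(wait)
		}
	}
	return status, sent, err
}

func main() {
	status, sent, err := getWithRetry(newFlakyClient(2), "http://example.invalid/info", 3, 10*time.Millisecond)
	fmt.Println(status, sent, err)
}
```

Note that a non-2xx response is not an error at the transport level, which is why the caller still has to check the status code after the retries are exhausted, as the demo above does.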

Some other implementations of retry mechanisms

As you can see, go-resty’s retry policy is not simple. It is a well-developed, customizable mechanism that takes HTTP request scenarios into account, so it is fairly tightly coupled to that use case.

Let’s take a look at two common implementations of Retry.

Implementation 1

// retry retries ephemeral errors from f up to an arbitrary timeout
func retry(f func() (err error, mayRetry bool)) error {
	var (
		bestErr     error
		lowestErrno syscall.Errno
		start       time.Time
		nextSleep   time.Duration = 1 * time.Millisecond
	)
	for {
		err, mayRetry := f()
		if err == nil || !mayRetry {
			return err
		}

		if errno, ok := err.(syscall.Errno); ok && (lowestErrno == 0 || errno < lowestErrno) {
			bestErr = err
			lowestErrno = errno
		} else if bestErr == nil {
			bestErr = err
		}

		if start.IsZero() {
			start = time.Now()
		} else if d := time.Since(start) + nextSleep; d >= arbitraryTimeout {
			break
		}
		time.Sleep(nextSleep)
		nextSleep += time.Duration(rand.Int63n(int64(nextSleep)))
	}

	return bestErr
}

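Each retry waits a randomly growing amount of time, starting at 1 ms. The loop ends when f() succeeds, when f() reports that the error may not be retried, or when the elapsed time plus the next sleep would reach arbitraryTimeout (a constant that is not shown in this excerpt). On timeout, the function returns the error with the lowest errno seen, or the first error if none was a syscall.Errno.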
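As a self-contained illustration of the same idea, the sketch below takes the time budget as a parameter instead of relying on arbitraryTimeout. It is a simplification: it starts the clock before the first call and returns the last error rather than tracking the lowest errno. The names retryFor and errBusy are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errBusy is a hypothetical transient error used by the demo.
var errBusy = errors.New("resource busy")

// retryFor calls f until it succeeds, reports a non-retryable error, or the
// next sleep would push the elapsed time past budget. It returns the last error.
func retryFor(budget time.Duration, f func() (err error, mayRetry bool)) error {
	start := time.Now()
	nextSleep := time.Millisecond
	for {
		err, mayRetry := f()
		if err == nil || !mayRetry {
			return err
		}
		if time.Since(start)+nextSleep >= budget {
			return err
		}
		time.Sleep(nextSleep)
		// Grow the sleep by a random amount in [0, nextSleep).
		nextSleep += time.Duration(rand.Int63n(int64(nextSleep)))
	}
}

func main() {
	calls := 0
	err := retryFor(500*time.Millisecond, func() (error, bool) {
		calls++
		if calls < 4 {
			return errBusy, true
		}
		return nil, false
	})
	fmt.Println(calls, err)
}
```

Bounding the loop by elapsed time rather than by an attempt count suits operations whose individual duration is unpredictable, such as file system calls.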

Implementation 2

func Retry(attempts int, sleep time.Duration, f func() error) (err error) {
	for i := 0; ; i++ {
		err = f()
		if err == nil {
			return
		}

		if i >= (attempts - 1) {
			break
		}

		time.Sleep(sleep)

	}
	return fmt.Errorf("after %d attempts, last error: %v", attempts, err)
}

f() is called at most attempts times, with a fixed sleep between calls. Retry returns nil as soon as f() succeeds; otherwise it returns an error that reports the attempt count and the last failure.
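One limitation of time.Sleep is that the wait cannot be interrupted. The sketch below is a possible context-aware variant that waits with select, in the same way Backoff() does, so that a cancelled or expired context ends the retry early. RetryContext, demo and errNotReady are hypothetical names, and the durations in main are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical transient error used by the demo.
var errNotReady = errors.New("not ready")

// RetryContext calls f up to attempts times (at least once), sleeping between
// calls, and returns ctx.Err() if ctx is done while waiting.
func RetryContext(ctx context.Context, attempts int, sleep time.Duration, f func() error) error {
	var err error
	for i := 0; ; i++ {
		if err = f(); err == nil {
			return nil
		}
		if i >= attempts-1 {
			break
		}
		timer := time.NewTimer(sleep)
		select {
		case <-timer.C:
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		}
	}
	return fmt.Errorf("after %d attempts, last error: %w", attempts, err)
}

// demo retries an operation that fails `failures` times before succeeding,
// with up to 5 attempts and an overall timeout.
func demo(failures int, sleep, timeout time.Duration) (calls int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err = RetryContext(ctx, 5, sleep, func() error {
		calls++
		if calls <= failures {
			return errNotReady
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(demo(2, 10*time.Millisecond, time.Second))
	fmt.Println(demo(100, time.Second, 5*time.Millisecond))
}
```

The first call in main succeeds on the third attempt; the second gives up during the first wait because the timeout is shorter than the sleep.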
