I wrote a background job that backs up a git repo after every commit. It polls on an interval, so that many commits submitted in rapid succession trigger only one backup.

The tests have been intermittently failing on CI, for seemingly odd reasons:

  1. Removing the test git directory during setup:

         Error: Received unexpected error:
                unlinkat data/testGitOrg: directory not empty

  2. Trying to back up after the original repo has already been removed (maybe):

         target OID for the reference doesn't exist on the repository

I added some more logging, and it looks like a race condition is in play. This is the loop that triggers a backup run:

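	for {
		select {
		case <-ctx.Done():
			// Cancellation is only observed here, at the top of an iteration.
			log.Println("cancelled")
			return
		default:
			// Neither the sleep nor Run below is interrupted by ctx.
			time.Sleep(pollInterval)

			if ready(batchInterval) {
				Run(src) // this is the actual backup routine
			}
		}
	}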

The context I pass into this loop is created like this:

	gbctx, gbcancel := context.WithCancel(context.Background())

gbcancel gets called at the bottom of each test.

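My current theory: the loop only checks ctx.Done() at the top of each iteration, so if gbcancel is called in the middle of a Run, the cancel won’t be noticed until that Run completes. Meanwhile, the next test’s setup goes ahead and removes the git directories while the backup may still be using them. Now that I’m typing this out, this logic is obvious…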

The problem is, I don’t know how long the commits plus backup will take during a test run, and it seems lame to just bump up the wait time until a race becomes extremely unlikely. There is a better solution, though. The Run func takes a lock so that only one goroutine at a time interacts with the backup repository. So if I add one more Run call after gbcancel, the test will block until any in-progress Run completes!

Even better, if I create a new method, WaitForRunToComplete, that locks and immediately unlocks the same mutex, I can wait for an in-progress Run to finish without executing another one.

This seems to be working, so far.