Summary
In production systems, data corruption is often more catastrophic than data loss. A common vulnerability in many applications is the standard file write operation. If power fails or the application crashes halfway through writing a file, you are left with a partially written, corrupted file that can break your entire system upon restart. To achieve reliability, we must ensure that file updates are atomic: they either happen completely or not at all.
The Vulnerability of Standard Writes
Consider the standard way most developers write files in Go:
// DANGEROUS: Do not use for critical data
f, _ := os.Create("config.json")
f.Write(data)
f.Close()
If the server loses power after os.Create truncates the file but before f.Write completes, your config.json is now empty (0 bytes). If it crashes during the write, the file contains half a JSON object, which is invalid syntax. When your service restarts, it will crash trying to parse this corrupted file.
The Solution: Write-Sync-Rename
To prevent this, we use a three-step pattern that leverages the atomicity guarantees of the operating system's file system.
- Write to Temp: Write the new data to a temporary file (e.g., config.json.tmp). If this fails, the original file is untouched.
- Flush to Disk: Force the operating system to flush the data from memory buffers to the physical disk using fsync.
- Atomic Rename: Rename the temporary file to the target filename (os.Rename).
Insight: On POSIX systems (Linux, macOS), the rename performed by os.Rename is atomic with respect to other processes. A reader that opens the path sees either the complete old version of the file or the complete new version. There is no intermediate state where the file is missing or partially written. This is the "All or Nothing" guarantee that databases rely on.
Implementation in Go
Here is a robust, production-ready function for atomic file writes.
package utils
import (
	"fmt"
	"os"
	"path/filepath"
)
// WriteFileAtomic writes data to a file ensuring that the file is either
// fully written or not modified at all.
func WriteFileAtomic(filename string, data []byte, perm os.FileMode) (err error) {
	dir := filepath.Dir(filename)
	// 1. Create a temporary file in the same directory.
	// It's critical to be in the same directory to ensure we are on the same
	// filesystem, otherwise os.Rename might fail or not be atomic.
	// os.CreateTemp (Go 1.16+) replaces the deprecated ioutil.TempFile.
	tmpFile, err := os.CreateTemp(dir, "tmp-*")
	if err != nil {
		return fmt.Errorf("failed to create temp file: %w", err)
	}
	tmpName := tmpFile.Name()
	// If any later step fails, close and remove the temp file. After a
	// successful rename err is nil, so the new file is left alone.
	defer func() {
		if err != nil {
			tmpFile.Close() // returns an error if already closed; safe to ignore
			os.Remove(tmpName)
		}
	}()
	// 2. Write data to the temp file
	if _, err := tmpFile.Write(data); err != nil {
		return fmt.Errorf("failed to write data: %w", err)
	}
	// Apply the requested permissions; temp files are created with mode 0600.
	if err := tmpFile.Chmod(perm); err != nil {
		return fmt.Errorf("failed to set permissions: %w", err)
	}
	// 3. Sync to disk (Critical Step!)
	// Just writing to the file descriptor isn't enough; the OS might hold
	// it in memory. Sync() forces the write to physical storage.
	if err := tmpFile.Sync(); err != nil {
		return fmt.Errorf("failed to sync data to disk: %w", err)
	}
	// Close the file before renaming
	if err := tmpFile.Close(); err != nil {
		return fmt.Errorf("failed to close temp file: %w", err)
	}
	// 4. Atomic Rename
	// This replaces the destination file with the temp file atomically.
	if err := os.Rename(tmpName, filename); err != nil {
		return fmt.Errorf("failed to rename file: %w", err)
	}
	// Success! err is nil, so the deferred cleanup does nothing.
	return nil
}
Critical Considerations
Directory Boundaries
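The temporary file must be on the same filesystem as the target file. os.Rename cannot move files atomically across filesystems (e.g., from /tmp to /home when they are separate mounts); on Linux such a rename fails with an "invalid cross-device link" error (EXDEV) instead of falling back to a copy. That is why our code creates the temp file in filepath.Dir(filename).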
Performance Cost
Calling Sync() (fsync) is expensive because it blocks until the storage device reports that the data has been written durably, bypassing the operating system's usual write-back caching. While necessary for correctness, you should avoid doing this in a tight loop. For high-frequency writes, consider using a Write Ahead Log (WAL) or appending to a file instead of rewriting it entirely.
Windows Nuances
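POSIX guarantees that rename is atomic. Windows does not offer the same guarantee and has historically had stricter file locking, so a rename can fail while another process holds the destination file open. Go's os.Rename on Windows does replace the destination file if it exists, but the os package documentation notes that on non-Unix platforms Rename is not an atomic operation, even within the same directory. For cross-platform Go applications, this pattern remains standard practice, but the atomicity guarantee on Windows should be treated as best effort.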