One Billion Row Challenge

How to Process a Billion Rows Fast

Nov 27, 2024

Hey there! Today, let’s talk about a code optimization challenge.

The One Billion Row Challenge (1brc.dev) is a programming challenge designed to test how fast we can process a file containing 1 billion lines of data. Think of it as a fun way to sharpen your skills in performance optimization.

The input is a UTF-8 text file where each line contains a station name and a temperature, separated by a semicolon. A station can appear multiple times in the file, like this:

Tokyo;12.8
Marseille;28.1
Philadelphia;10.1
Tokyo;14.8
...
Philadelphia;-4.1
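Every temperature in the sample above has exactly one digit after the decimal point. Assuming that holds for the whole file, a common trick is to store temperatures as integer tenths of a degree instead of floats, so that parsing and summing never involve floating-point arithmetic. The minimal sketch below shows the idea; the helper names `parse_tenths` and `parse_line` are hypothetical. In Python this hand-written loop is mainly illustrative, since the payoff comes in compiled languages, where it replaces a general-purpose float parser.

```python
def parse_tenths(raw: bytes) -> int:
    """Parse a temperature like b"-4.1" into tenths of a degree (-41).

    Assumes exactly one digit after the decimal point, as in the sample;
    this sketch does not validate that assumption.
    """
    negative = raw[0] == ord("-")  # indexing bytes yields an int
    value = 0
    for byte in (raw[1:] if negative else raw):
        if byte != ord("."):  # skip the decimal point, accumulate digits
            value = value * 10 + (byte - ord("0"))
    return -value if negative else value


def parse_line(line: bytes) -> tuple[bytes, int]:
    """Split b"Tokyo;12.8\\n" into (b"Tokyo", 128)."""
    # rpartition: the temperature part never contains a ';'
    station, _, temp = line.rstrip(b"\n").rpartition(b";")
    return station, parse_tenths(temp)


print(parse_line(b"Philadelphia;-4.1\n"))  # (b'Philadelphia', -41)
```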

The goal is to write an application that:

Reads the file.
Calculates the min, mean, and max temperatures per station.
Writes the results to stdout, sorted by station name, in a specific format.
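Before optimizing anything, it helps to have a deliberately naive, single-threaded baseline to measure against. The sketch below reads the file line by line, keeps min, max, sum, and count per station in integer tenths, and prints the stations sorted by name. The `{station=min/mean/max, ...}` layout with half-up rounding to one decimal is an assumption modeled on the original Java challenge (check 1brc.dev for the exact rules), and the function names are hypothetical.

```python
import sys


def format_tenths(tenths: int) -> str:
    """Render an integer count of tenths as a one-decimal string, e.g. -41 -> "-4.1"."""
    sign = "-" if tenths < 0 else ""
    tenths = abs(tenths)
    return f"{sign}{tenths // 10}.{tenths % 10}"


def aggregate(lines) -> dict[str, list[int]]:
    """Fold lines like "Tokyo;12.8" into station -> [min, max, total, count], in tenths."""
    stats: dict[str, list[int]] = {}
    for line in lines:
        station, _, temp = line.rstrip("\n").rpartition(";")
        value = round(float(temp) * 10)  # plain float parsing, converted to tenths
        s = stats.get(station)
        if s is None:
            stats[station] = [value, value, value, 1]
        else:
            s[0] = min(s[0], value)
            s[1] = max(s[1], value)
            s[2] += value
            s[3] += 1
    return stats


def render(stats: dict[str, list[int]]) -> str:
    """Format stations sorted by name as {name=min/mean/max, ...}."""
    parts = []
    for station in sorted(stats):
        lo, hi, total, count = stats[station]
        # Mean rounded half-up to the nearest tenth with exact integer arithmetic.
        mean = (2 * total + count) // (2 * count)
        parts.append(f"{station}={format_tenths(lo)}/{format_tenths(mean)}/{format_tenths(hi)}")
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python baseline.py measurements.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        print(render(aggregate(f)))
```

Fed the five sample lines above, `render(aggregate(lines))` returns `{Marseille=28.1/28.1/28.1, Philadelphia=-4.1/3.0/10.1, Tokyo=12.8/13.8/14.8}`. Keeping the running sum and count, rather than a list of readings, keeps memory proportional to the number of stations instead of the number of rows.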

The challenge was initially posted on GitHub under gunnarmorling/1brc and was specific to Java. Due to its growing popularity, it has been extended to several other languages:

C/C++
C#
Go
JavaScript
PHP
Python
Rust
Zig

If you have already tackled this challenge, please share your solutions in the comments. If not, this is a great opportunity to delve into code optimization. While the problem statement may sound straightforward, it is a chance to learn optimization techniques we may not yet be familiar with, such as concurrency, branch prediction[1], memory mapping[2], and SIMD[3], among others.
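Concurrency and memory mapping combine naturally. One common approach, sketched here in Python under the assumption that every line ends with `\n` (all function names are hypothetical), is to memory-map the file, cut it into one byte range per CPU core with each range ending on a line boundary, aggregate each range in a separate process, and merge the partial results. It uses processes rather than threads because, in standard CPython builds, the global interpreter lock keeps threads from running Python code in parallel.

```python
import mmap
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def chunk_boundaries(path: str, workers: int) -> list[tuple[int, int]]:
    """Split the file into about `workers` byte ranges, each ending just after a newline."""
    size = os.path.getsize(path)
    if size == 0:
        return []  # mmap cannot map an empty file
    bounds = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = 0
        for i in range(1, workers + 1):
            if start >= size:
                break
            target = max(start, size * i // workers)
            if target >= size:
                end = size
            else:
                # Extend the range to the end of the line containing `target`.
                nl = mm.find(b"\n", target)
                end = size if nl == -1 else nl + 1
            bounds.append((start, end))
            start = end
    return bounds


def process_chunk(path: str, start: int, end: int) -> dict[bytes, list[int]]:
    """Aggregate station -> [min, max, total, count], in tenths, over bytes [start, end)."""
    stats: dict[bytes, list[int]] = {}
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = start
        while pos < end:
            nl = mm.find(b"\n", pos, end)
            line_end = end if nl == -1 else nl
            sep = mm.rfind(b";", pos, line_end)  # temperatures never contain ';'
            station = mm[pos:sep]
            value = round(float(mm[sep + 1:line_end].decode()) * 10)
            s = stats.get(station)
            if s is None:
                stats[station] = [value, value, value, 1]
            else:
                s[0] = min(s[0], value)
                s[1] = max(s[1], value)
                s[2] += value
                s[3] += 1
            pos = line_end + 1
    return stats


def merge(into: dict[bytes, list[int]], part: dict[bytes, list[int]]) -> None:
    """Combine one chunk's partial results into the running totals."""
    for station, (lo, hi, total, count) in part.items():
        s = into.get(station)
        if s is None:
            into[station] = [lo, hi, total, count]
        else:
            s[0] = min(s[0], lo)
            s[1] = max(s[1], hi)
            s[2] += total
            s[3] += count


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    workers = os.cpu_count() or 1
    bounds = chunk_boundaries(path, workers)
    totals: dict[bytes, list[int]] = {}
    # Each worker receives only (path, start, end) and maps the file itself,
    # so no chunk data has to be pickled between processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(process_chunk, [path] * len(bounds),
                         [s for s, _ in bounds], [e for _, e in bounds])
        for part in parts:
            merge(totals, part)
    # Output formatting is simplified here; UTF-8 byte order matches code-point order.
    for station in sorted(totals):
        lo, hi, total, count = totals[station]
        print(f"{station.decode()}={lo / 10:.1f}/{total / count / 10:.1f}/{hi / 10:.1f}")
```

Merging is cheap because min, max, sum, and count can all be combined in any order, so the partial results from each process can be folded together as they arrive.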

Tomorrow, we will discuss a distributed systems coding challenge.

[1] The CPU guesses the outcome of a conditional operation before it is known so it can keep executing; writing code whose branches are easy to predict (or avoiding branches altogether) minimizes the delays caused by code branching.

[2] A technique to map files or devices into a process’s virtual address space so their contents can be accessed like memory, without reading everything into memory up front.

[3] Single Instruction, Multiple Data: a technique where a single instruction operates on multiple data elements simultaneously.
