
1-bit models and the future of on-device AI

I’ll share why this is important and provide a light explainer so you leave knowing a little more about ML.

TLDR

The team saw comparable inference performance on simple benchmarks from a model trained with 1-bit weights versus a 16-bit model, achieved by ‘drop-in’ replacing key layers inside the transformer. This makes the model almost 3x faster, use nearly 4x less GPU memory, and almost 100x less energy.

Let’s step back to: “Why this matters”

You’ve probably heard Machine Learning can be a computationally expensive energy hog. According to OpenAI, a single GPT query consumes 1567% (roughly 15x) more energy than a Google search query.

If you were to ask “where can we get the most impact on improving efficiency?”, you’d want to look at which tasks are performed most often.

In language models, that task happens during training: “updating the (hundreds of) billions of weights,” which results in A LOT of matrix calculations, often on specialized processing hardware called Graphics Processing Units (GPUs).
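To make “A LOT of matrix calculations” concrete, here is a toy sketch (not the paper’s code) of one linear layer’s forward pass: a matrix-vector product where every weight contributes one multiply-accumulate. A real model does this with billions of weights, many times per query.

```python
# Toy illustration: one linear layer's forward pass is a matrix-vector
# product -- rows x cols multiply-accumulate operations in total.
def linear_forward(weights, x):
    # weights: list of rows (each a list of floats); x: input vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

W = [[0.5, -1.2, 0.3],
     [2.0, 0.1, -0.7]]   # 2 x 3 weights -> 6 multiply-adds
x = [1.0, 2.0, 3.0]
print(linear_forward(W, x))
```

Scale those 6 multiply-adds up to hundreds of billions of weights and it’s clear why the cost of each individual weight operation matters so much.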

If you wanted to make a tremendous impact, reducing the cost of updating and storing the weights in memory would be a big win. So would fitting more weights in memory, because each one takes up less space.

This paper explores something biologically inspired: it aligns models with the idea that human neurons have a simple binary (on or off) nature.

Most modern models use 16 bits (think large decimal numbers, aka ‘floating point’ precision) to represent weights. Open-source models and tools that let us run models locally on smaller infrastructure “quantize” the model, converting it from 16 bits down to 8 or 4, so it runs with a much smaller memory requirement. Quantizing can be thought of as a form of “rounding,” so there is accuracy loss — the question is — do we need that accuracy?
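To show what “quantize as rounding” looks like, here is a minimal sketch of one simple scheme (symmetric round-to-nearest, chosen for illustration; real quantizers are more sophisticated). Each weight is snapped to one of a small number of levels, and dequantizing recovers only an approximation of the original value.

```python
# Minimal sketch of symmetric round-to-nearest quantization (an assumed
# simple scheme for illustration, not the exact method any tool uses).
def quantize(weights, bits):
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 levels each side for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.82, -0.41, 0.07, -0.99]
q, s = quantize(w, 4)                     # small ints, much less memory
print(q)                                  # integer codes
print(dequantize(q, s))                   # close to w, but not exact
```

Comparing the dequantized values against the originals makes the accuracy loss visible: fewer bits means coarser rounding, which is exactly the trade-off the question above is about.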

This team simplified it further and asked: what if we use 1 bit, like the brain?
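A hedged sketch of the core idea: if each weight is constrained to just its sign, +1 or -1 (one common way to binarize; in practice a scaling factor is usually kept alongside), then every multiply in the matrix product collapses into an add or a subtract — which is a big part of where the speed and energy savings come from.

```python
# Sketch, under the assumption that weights are binarized to their sign.
def binarize(row):
    return [1 if w >= 0 else -1 for w in row]

def binary_forward(bin_weights, x):
    # each former "multiply" is now just choosing +xi or -xi
    return [sum(xi if b > 0 else -xi for b, xi in zip(row, x))
            for row in bin_weights]

W = [[0.5, -1.2, 0.3],
     [2.0, 0.1, -0.7]]
x = [1.0, 2.0, 3.0]
Wb = [binarize(row) for row in W]
print(binary_forward(Wb, x))   # additions and subtractions only
```

Each 1-bit weight also needs 1/16th the storage of a 16-bit weight, which is where the memory savings described above come in.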

And it seems to have worked: they saw nearly two orders of magnitude less energy usage, faster responses from the trained model, and lower memory usage during and after training.

In summary: it’s on par (not better), much faster, and much cheaper.

Shrinking models and getting them to perform similarly for less is, obviously, a win for AI companies and for consumers.

This research also aligns with the general trend of models becoming smaller and living on our personal devices.
