Python 3.14 finally introduces a free-threaded build. Here's how I tested it with real multi-core workloads — and what it means for Python's future.
The Ghost of the GIL
For as long as I've written Python, there's been one phrase that inevitably shows up in every discussion about performance — the GIL. The Global Interpreter Lock, or GIL, has been both a salvation and a curse for Python developers. It made memory management simple and safe, but it also tied Python's hands. Because of it, only one thread could run Python bytecode at a time. No matter how powerful your machine was, Python never truly used more than one core.
That limitation quietly shaped how we wrote Python for decades. Whenever we needed parallel performance, we reached for workarounds: the multiprocessing module, offloading to C extensions, or even rewriting critical paths in Rust or C++. It worked, but it was never elegant. It always felt like Python was jogging while the hardware beneath it was sprinting.
This year, that changed.
With Python 3.14, released in October 2025, CPython's free-threaded build became officially supported: a version of the interpreter in which the GIL can be disabled entirely. (It first shipped as an experimental option in Python 3.13.) For the first time, threads can truly run in parallel across multiple cores. It's one of the biggest changes to Python's runtime in the language's three-decade history.
Putting it to the Test
I wanted to see what this really meant in practice, so I set up two environments on my MacBook using pyenv. One was Python 3.12, the standard build with the GIL. The other was Python 3.14t, the free-threaded variant (pyenv exposes it under the 3.14t name; building CPython from source requires the --disable-gil configure option).
The default 3.14 installers still ship with the GIL enabled, so developers need to opt into the free-threaded build explicitly.
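If you want to confirm which build you're actually running, the standard library can tell you. A quick sketch (sys._is_gil_enabled() only exists on Python 3.13 and later, so it's guarded for older versions):

```python
import sys
import sysconfig

def gil_status():
    """Return (built_free_threaded, gil_currently_enabled)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0/None otherwise.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists only on 3.13+; assume the GIL is on elsewhere.
    gil_enabled = sys._is_gil_enabled() if hasattr(sys, "_is_gil_enabled") else True
    return built_free_threaded, gil_enabled

print(gil_status())  # e.g. (False, True) on a standard build
```

On a 3.14t build with the GIL disabled, this prints (True, False).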
To keep things simple, I wrote a small script that creates eight threads — one for each core — and uses each of them to calculate the sum of squares up to forty million. Pure Python math. Heavy on CPU, no I/O, no tricks.
Here's a rough look at what it runs:
import threading
import time
import multiprocessing


def num_of_squares(n):
    """Helper function to calculate the sum of squares from 0 to n-1"""
    result = sum(i**2 for i in range(n))
    return result


def worker_thread(n):
    """Worker function for the multi-threaded sum of squares benchmark"""
    name = threading.current_thread().name
    print(f"Worker {name}: starting")
    start_time = time.time()
    result = num_of_squares(n)
    end_time = time.time()
    thread_time = end_time - start_time
    print(f"Worker {name}: calculated sum of squares for {n:,} numbers in {thread_time:.2f} seconds")
    return result


def main():
    run_cores = multiprocessing.cpu_count()
    start_time = time.time()
    # One CPU task per thread — raise the range for a more demanding benchmark
    threads = []
    for i in range(run_cores):
        thread = threading.Thread(
            target=worker_thread,
            args=(40_000_000,),
            name=str(i),
        )
        threads.append(thread)
    for thread in threads:
        thread.start()
    # Wait for all threads to complete
    for thread in threads:
        thread.join()
    end_time = time.time()
    total_time = end_time - start_time
    print(f"All workers completed in {total_time:.2f} seconds")


if __name__ == "__main__":
    main()

Each thread does the same job, crunching numbers as fast as it can. The only variable was the Python version.
The Results: Eight Cores, One Revelation
Let's see how long it takes to run using regular Python 3.12:
$ python run_of_squares.py
Worker 0: starting
Worker 1: starting
Worker 2: starting
Worker 3: starting
Worker 4: starting
Worker 5: starting
Worker 6: starting
Worker 7: starting
Worker 0: calculated sum of squares for 40,000,000 numbers in 4.48 seconds
Worker 1: calculated sum of squares for 40,000,000 numbers in 4.51 seconds
Worker 2: calculated sum of squares for 40,000,000 numbers in 4.54 seconds
Worker 3: calculated sum of squares for 40,000,000 numbers in 4.57 seconds
Worker 4: calculated sum of squares for 40,000,000 numbers in 4.59 seconds
Worker 5: calculated sum of squares for 40,000,000 numbers in 4.62 seconds
Worker 6: calculated sum of squares for 40,000,000 numbers in 4.65 seconds
Worker 7: calculated sum of squares for 40,000,000 numbers in 4.68 seconds
All workers completed in 36.05 seconds

Now, with the GIL-free version:
$ python run_of_squares.py
Worker 0: starting
Worker 1: starting
Worker 2: starting
Worker 3: starting
Worker 4: starting
Worker 5: starting
Worker 6: starting
Worker 7: starting
Worker 0: calculated sum of squares for 40,000,000 numbers in 1.38 seconds
Worker 1: calculated sum of squares for 40,000,000 numbers in 1.39 seconds
Worker 2: calculated sum of squares for 40,000,000 numbers in 1.42 seconds
Worker 3: calculated sum of squares for 40,000,000 numbers in 1.41 seconds
Worker 4: calculated sum of squares for 40,000,000 numbers in 1.43 seconds
Worker 5: calculated sum of squares for 40,000,000 numbers in 1.45 seconds
Worker 6: calculated sum of squares for 40,000,000 numbers in 1.46 seconds
Worker 7: calculated sum of squares for 40,000,000 numbers in 1.47 seconds
All workers completed in 11.17 seconds

On Python 3.12, the result was exactly what we've all come to expect. Every worker ran independently, but never truly together, and the program took 36.05 seconds. Then I switched to the free-threaded Python 3.14t build and ran the same code, same logic, and the computation finished in just 11.17 seconds. That's more than three times faster, purely because the interpreter no longer forced everything through one thread at a time.
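The speedup follows directly from the two wall-clock numbers above (just the arithmetic on the measured timings):

```python
gil_time = 36.05            # total seconds on Python 3.12 (with GIL)
free_threaded_time = 11.17  # total seconds on Python 3.14t (GIL disabled)
speedup = gil_time / free_threaded_time
print(f"{speedup:.2f}x")    # → 3.23x
```

On an eight-core machine a perfectly parallel workload could approach 8x; the remaining gap reflects per-thread overhead and shared resources such as memory bandwidth.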
Why This Changes Everything
It's hard to exaggerate how transformative this is. For years, Python's threading library has existed mostly for I/O-bound tasks, such as reading files, handling web requests, and waiting on network responses; it was never useful for real CPU-bound work. With the GIL gone, you no longer have to juggle process pools or shared-memory queues just to take advantage of all your cores. You can write ordinary threaded Python and have data pipelines, embedding computation, deep-learning transformations, or agent reasoning scale with the number of cores.
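To illustrate, here's a minimal sketch of CPU-bound work split across a thread pool using the stdlib's concurrent.futures (the chunking scheme here is my own, not taken from the benchmark above). On a free-threaded build the chunks genuinely run in parallel; on a GIL build the same code still works, just serialized:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_of_squares(n, workers=8):
    """Sum i*i for i in range(n), split into one contiguous chunk per thread."""
    # Chunk boundaries partition [0, n) exactly, even when workers doesn't divide n.
    bounds = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: sum(i * i for i in range(*b)), bounds)
    return sum(parts)

print(parallel_sum_of_squares(1_000_000) == sum(i * i for i in range(1_000_000)))  # True
```

The nice part is that nothing here is new API: the exact threading code we've written for years simply starts scaling.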
It also changes how you think about Python servers. Workloads that rely on async I/O and threading can now be far more performant. And since threads share memory, the overhead stays much lower than in multi-process setups.
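Because threads share one address space, workers can write results straight into a plain Python dict guarded by a lock, with no pickling and no IPC queues as in multiprocessing. A small sketch (the names are illustrative):

```python
import threading

results = {}
lock = threading.Lock()

def worker(key, n):
    value = sum(i * i for i in range(n))  # CPU work, parallel on 3.14t
    with lock:                            # an ordinary lock guards the shared dict
        results[key] = value

threads = [threading.Thread(target=worker, args=(k, 100_000)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```

Note that explicit locking becomes more important, not less, once threads truly run concurrently.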
A New Chapter for Python
Of course, there are caveats. Single-threaded programs can run somewhat slower on the free-threaded build, often cited in the range of 5 to 10 percent, because object operations need thread-safe reference counting. Some C extensions will need updates to adapt to the new model. But those are transition costs. The real story is that Python can finally grow into the multi-core world we've been living in for years.
For me, watching those eight threads blaze through their tasks in parallel wasn't just a performance win. It felt symbolic: it broke one of Python's oldest limitations. For decades, the GIL has been the punchline in every performance debate about the language. Now, with Python 3.14, that punchline is beginning to fade.
This isn't just an optimization. It's liberation. Python 3.14 doesn't merely make your code faster — it lets the language finally use all the power your machine has been offering all along.
Written by CosX AI Engineering. Published October 15, 2025.