Python Internals (How Python Works, GIL Explained)

Python is widely loved for its simplicity and readability, but behind this clean syntax lies a sophisticated execution model. Understanding Python internals helps you write more efficient code, debug complex issues, and make better architectural decisions.

In this article, we will explore how Python executes code, how memory and objects are handled, and the role of the Global Interpreter Lock (GIL).

From Source Code to Execution

When you write Python code, it does not get directly executed by the CPU. Instead, Python follows a multi-step execution process.

First, your source code (.py file) is compiled into bytecode, a lower-level, platform-independent representation of your code. For imported modules, this bytecode is cached in .pyc files inside the __pycache__ directory so it does not need to be recompiled on every run.

Then, the Python Virtual Machine (PVM) executes this bytecode instruction by instruction. The PVM is the interpreter's runtime loop, which translates each bytecode instruction into operations performed by the underlying system.

This two-step approach makes Python portable across platforms.
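You can see the bytecode for yourself with the standard library's `dis` module, which disassembles a function into the instructions the PVM executes (the exact opcodes vary between CPython versions):

```python
import dis

def add(a, b):
    return a + b

# Disassemble the function to show the bytecode the PVM will run.
# Typical output includes instructions such as LOAD_FAST and RETURN_VALUE.
dis.dis(add)
```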

The Python Interpreter

The most commonly used Python implementation is CPython, written in C. It is responsible for:

- Parsing Python code
- Compiling it into bytecode
- Executing bytecode using the PVM

Other implementations such as PyPy and Jython exist, but CPython is the reference implementation and the one where concepts like the GIL are most relevant.

Python Object Model

Everything in Python is an object, including integers, strings, functions, and even classes. Each object contains:

- Type information (what kind of object it is)
- Value (the actual data)
- Reference count (used for memory management)

For example:

a = 10
b = a

Both a and b refer to the same object in memory. Python manages these references internally.
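This sharing can be observed with built-ins. Note that `sys.getrefcount` is CPython-specific and the exact counts vary between versions, so treat the numbers as illustrative:

```python
import sys

a = 10
b = a

# Both names reference the same object, so their identities match.
print(id(a) == id(b))   # True

# Every object carries its own type information.
print(type(a))          # <class 'int'>

# getrefcount reports one extra reference for its own argument.
data = [1, 2, 3]
print(sys.getrefcount(data))  # typically 2: 'data' plus the argument
```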

This object-based design makes Python highly flexible but introduces some performance overhead compared to lower-level languages.

Stack and Heap Memory

Python uses two primary memory areas:

- Stack memory: Stores function calls, local variables, and execution context
- Heap memory: Stores objects and data structures

When a function is called, a new stack frame is created. Objects created during execution are stored in the heap and managed by Python's memory manager.
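The call stack can be observed with the standard `inspect` module; in this small sketch, each nested call adds one frame on top of the stack:

```python
import inspect

def inner():
    # Each entry in inspect.stack() corresponds to one stack frame,
    # with the currently executing function first.
    return [frame.function for frame in inspect.stack()]

def outer():
    return inner()

frames = outer()
print(frames[:2])  # ['inner', 'outer']
```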

Dynamic Typing and Late Binding

Python is dynamically typed, meaning variable types are determined at runtime. This flexibility comes from the fact that variables are simply references to objects.

x = 10      # integer
x = "Hello" # now a string

This dynamic nature is powerful but requires additional overhead during execution.
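Because the type lives on the object rather than the variable, it can be inspected at runtime after each rebinding:

```python
x = 10
print(type(x))   # <class 'int'>

x = "Hello"
print(type(x))   # <class 'str'>

# The interpreter must look up the object's type on every operation,
# which is part of the per-operation overhead of dynamic typing.
```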

The Global Interpreter Lock (GIL)

One of the most discussed aspects of Python internals is the Global Interpreter Lock (GIL). The GIL is a mutex (mutual exclusion lock) that ensures that only one thread executes Python bytecode at a time in a single process.

This means that even on multi-core systems, Python threads cannot achieve true parallel execution for CPU-bound tasks.

Why Does the GIL Exist?

The GIL simplifies memory management, particularly reference counting. Since multiple threads modifying reference counts simultaneously could lead to race conditions, the GIL ensures thread safety by allowing only one thread to execute at a time.

Without the GIL, Python would need more complex locking mechanisms, which could introduce additional overhead.

Impact of the GIL

The GIL has different implications depending on the type of task:

For CPU-bound tasks, the GIL becomes a bottleneck because threads cannot run in parallel. This limits performance gains from multithreading.

For I/O-bound tasks, the GIL is less of an issue because it is released during I/O operations. This allows other threads to run while one thread is waiting.
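A rough sketch of the I/O-bound case, using `time.sleep` as a stand-in for a blocking I/O call (sleeping releases the GIL just as real I/O does):

```python
import threading
import time

def fake_io():
    # time.sleep releases the GIL, just like a real blocking I/O call.
    time.sleep(0.5)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The four 0.5 s waits overlap, so the total is close to 0.5 s, not 2 s.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```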

Example: GIL Limitation

import threading

def compute():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

t1 = threading.Thread(target=compute)
t2 = threading.Thread(target=compute)

t1.start()
t2.start()

t1.join()
t2.join()

Even with two threads, this CPU-intensive task will not run significantly faster due to the GIL.

Overcoming the GIL

To bypass the limitations of the GIL, Python provides alternatives:

- Multiprocessing: uses separate processes, each with its own memory and interpreter
- C extensions: libraries like NumPy release the GIL during heavy computations
- Async programming: efficient for I/O-bound concurrency

These approaches help achieve better performance depending on the use case.
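As a minimal sketch, the earlier CPU-bound example can be rewritten with `multiprocessing.Pool`; each worker process has its own interpreter and its own GIL, so the computations can run on separate cores:

```python
from multiprocessing import Pool

def compute(n):
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    # Each worker process runs its own interpreter with its own GIL,
    # so the two computations can execute truly in parallel.
    with Pool(processes=2) as pool:
        results = pool.map(compute, [10_000_000, 10_000_000])
    print(results)
```

The `if __name__ == "__main__":` guard is required on platforms that start workers by importing the main module (e.g. Windows and macOS).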

Python uses a mechanism called context switching to share the interpreter between threads. Since Python 3.2, the interpreter asks the running thread to release the GIL at a fixed time interval (5 ms by default); earlier versions switched after a fixed number of bytecode instructions.

This creates the illusion of concurrency, even though only one thread executes at a time.
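CPython exposes this interval through the `sys` module, and it can be tuned:

```python
import sys

# The default switch interval in modern CPython is 0.005 s (5 ms).
print(sys.getswitchinterval())

# A longer interval means fewer GIL handoffs (less switching overhead,
# but slower response for waiting threads).
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())
```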

Conclusion

Understanding Python internals helps explain why Python is typically slower than compiled languages such as C or Java. The overhead comes from:

- Dynamic typing
- Object abstraction
- Bytecode interpretation
- GIL constraints

However, Python compensates with developer productivity and a rich ecosystem of optimized libraries.
Nagesh Chauhan
Principal Engineer | Java · Spring Boot · Python · Microservices · AI/ML

Principal Engineer with 14+ years of experience in designing scalable systems using Java, Spring Boot, and Python. Specialized in microservices architecture, system design, and machine learning.
