Python is widely used for data processing, automation, and scripting, but it’s often criticized for being slow in CPU-bound tasks due to the Global Interpreter Lock (GIL). Fortunately, Python’s built-in multiprocessing
module allows you to fully utilize multiple CPU cores, enabling true parallelism and greatly improving performance.
In this article, you’ll learn how multiprocessing works, how it differs from multithreading, and how to implement it to make your Python programs faster and more efficient.
Why Use Multiprocessing?
When to Use:
- Heavy computations (e.g., numerical simulations, data crunching)
- CPU-bound tasks (as opposed to I/O-bound tasks, which are better served by threading)
- Parallel processing of independent tasks (e.g., image conversion, batch file processing)
Multiprocessing vs. Multithreading
Feature | Multiprocessing | Multithreading |
---|---|---|
Uses multiple… | Processes (separate memory) | Threads (shared memory) |
Bypasses GIL? | Yes | No (in CPython) |
Ideal for… | CPU-bound tasks | I/O-bound tasks |
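The GIL difference in the table above can be seen with a small benchmark sketch: the same CPU-bound busy loop is timed under threads and under processes. The function and workload sizes here are illustrative, not from the original article; on a multi-core machine the process version should finish noticeably faster.

```python
import time
from threading import Thread
from multiprocessing import Process

def count_down(n):
    # Pure-Python busy loop: CPU-bound, so it holds the GIL while it runs
    while n > 0:
        n -= 1

def timed(worker_cls, n_workers=2, n=5_000_000):
    # Run n_workers copies of count_down concurrently and time the whole batch
    workers = [worker_cls(target=count_down, args=(n,)) for _ in range(n_workers)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

if __name__ == '__main__':
    print(f"Threads:   {timed(Thread):.2f}s")    # serialized by the GIL
    print(f"Processes: {timed(Process):.2f}s")   # true parallelism
```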
Getting Started with multiprocessing
Step 1: Import the module

```python
import multiprocessing
```
Step 2: Define a function to run in parallel

```python
def square(n):
    return n * n
```
Step 3: Use a Pool to distribute tasks

```python
if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        numbers = [1, 2, 3, 4, 5]
        results = pool.map(square, numbers)
        print(results)
```
Output:

```
[1, 4, 9, 16, 25]
```
Example: Parallel Processing with Pool
```python
from multiprocessing import Pool
import time

def slow_task(x):
    time.sleep(1)
    return x * 2

if __name__ == '__main__':
    start = time.time()
    with Pool(processes=4) as pool:  # Use 4 worker processes
        results = pool.map(slow_task, range(8))
    print(f"Results: {results}")
    print(f"Time taken: {time.time() - start:.2f} seconds")
```
With 4 workers each handling 2 of the 8 one-second tasks, the run takes about 2 seconds instead of 8: a big performance gain!
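`Pool.map` blocks until every result is ready. When you want to submit tasks individually and collect results as they complete, `Pool` also provides `apply_async`. A minimal sketch, reusing the same `slow_task` as above:

```python
from multiprocessing import Pool
import time

def slow_task(x):
    time.sleep(1)
    return x * 2

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # Submit each task without blocking; apply_async returns an AsyncResult
        futures = [pool.apply_async(slow_task, (x,)) for x in range(8)]
        # .get() waits for that task's result (raising its exception, if any)
        results = [f.get() for f in futures]
    print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```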
Using Process for More Control

For more custom behavior:
```python
from multiprocessing import Process

def greet(name):
    print(f"Hello, {name}!")

if __name__ == '__main__':
    p1 = Process(target=greet, args=('Alice',))
    p2 = Process(target=greet, args=('Bob',))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
Sharing Data Between Processes
Use `multiprocessing.Queue` or `multiprocessing.Value` for inter-process communication.

Example with `Queue`:
```python
from multiprocessing import Process, Queue

def worker(q):
    q.put('Data from worker')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output from worker
    p.join()
```
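`multiprocessing.Value` covers the other case mentioned above: a single shared number rather than a stream of messages. A minimal sketch (the `increment` helper is illustrative):

```python
from multiprocessing import Process, Value

def increment(counter):
    # get_lock() guards the read-modify-write against races between processes
    with counter.get_lock():
        counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)  # 'i' = C signed int, initial value 0
    procs = [Process(target=increment, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4
```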
Tips and Best Practices
- Always guard your entry point with `if __name__ == "__main__"` to prevent child processes from re-running the spawning code on Windows (which would spawn processes endlessly).
- Use `Pool` for mapping functions over lists.
- Don't share regular Python objects across processes; use `Queue`, `Value`, or `Manager` objects instead.
- Use `concurrent.futures.ProcessPoolExecutor` (Python 3.2+) for a higher-level API.
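The last tip can look like this; `executor.map` mirrors `Pool.map` but returns a lazy iterator and integrates with the rest of `concurrent.futures`:

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == '__main__':
    # The executor manages the worker processes for you
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square, range(6)))
    print(results)  # [0, 1, 4, 9, 16, 25]
```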
Performance Comparison
Task Type | Sequential | Multiprocessing |
---|---|---|
4 slow tasks | 4 sec | ~1 sec |
Image processing | 20 sec | ~5 sec |
Large math calc | 10 sec | ~2-3 sec |
Summary
Feature | Description |
---|---|
`multiprocessing.Pool` | Best for parallelizing a function over data |
`multiprocessing.Process` | Full control over separate processes |
`Queue` / `Value` | Share data between processes |
Avoid shared state | Each process has its own memory |
Multiprocessing is a powerful feature in Python that allows your scripts to run faster and more efficiently on multi-core systems. Whether you're processing files, analyzing data, or building a high-performance application, using the `multiprocessing` module can drastically reduce execution time and improve throughput.