Multithreading in Python
This document covers the basics of multithreading in the Python programming language. Like multiprocessing, multithreading is a way to achieve multitasking. Multithreading is built on the concept of a thread, so let’s first look at what processes and threads are.
What is a Process in Python?
In computing, a process is an instance of a running computer program. Every process has three basic components.
- An executable program.
- Relevant data required by the program (variables, workspace, buffers, etc.)
- The execution context of the program (process state)
Introduction to Python Threading
A thread is an entity within a process that can be scheduled for execution. It is also the smallest unit of processing that the operating system (OS) can schedule. Simply put, a thread is a sequence of instructions within a program that can be executed independently of other code. For simplicity, a thread can be thought of as a subset of a process. The information about a thread is stored in a Thread Control Block (TCB), which contains:
- Thread Identifier: A unique ID (TID) is assigned to each new thread.
- Stack Pointer: Points to the thread’s stack within the process. The stack contains the local variables defined within the thread’s scope.
- Program Counter: A register that holds the address of the instruction currently being executed by the thread.
- Thread State: Can be running, ready, waiting, starting, or completed.
- Set of Registers for the Thread: Registers assigned to the thread for computation.
- Pointer to Parent Process: A pointer to the process control block (PCB) of the process that the thread belongs to.
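Python does not expose the TCB itself, but a Thread object carries a few attributes that loosely correspond to the fields above. The following is a minimal sketch; the names describe_current_thread, worker, and demo-thread are made up for illustration.

```python
import threading


def describe_current_thread():
    """Return a few Python-visible attributes of the calling thread."""
    t = threading.current_thread()
    return {
        "name": t.name,         # human-readable label of the thread
        "ident": t.ident,       # thread identifier, set once the thread starts
        "alive": t.is_alive(),  # True while the thread is running
    }


info = {}


def worker():
    # Runs inside the new thread and records what that thread looks like
    info.update(describe_current_thread())


t = threading.Thread(target=worker, name="demo-thread")
ident_before_start = t.ident   # None: no identifier until start() is called
t.start()
t.join()
alive_after_join = t.is_alive()  # False: the thread has finished

print(info["name"], ident_before_start, alive_after_join)
# demo-thread None False
```

The identifier is only assigned once the thread has been started, which is why it is read both before start() and from inside the thread.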
Understanding how to implement multithreading can significantly enhance the performance of I/O-bound and certain types of network applications. However, mastering this concept requires a clear understanding of some advanced features of Python.
To understand the relationship between processes and threads, consider the diagram below.
Relationship Between Processes and Threads
A single process can contain multiple threads, where:
- Each thread has its own set of registers and local variables (stored on its stack).
- All threads in a process share global variables (stored on the heap) and the program code.
To understand how multiple threads exist in memory, consider the diagram below.
Multiple Threads in Memory
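The split between per-thread local variables and shared global variables can be sketched in a few lines. This is an illustrative example only; shared_results, results_lock, and record_square are hypothetical names, and the lock is there simply to guard the shared list.

```python
import threading

shared_results = []              # module-level object, visible to every thread
results_lock = threading.Lock()  # guards access to the shared list


def record_square(n):
    local_value = n * n          # local variable: private to the calling thread
    with results_lock:
        shared_results.append(local_value)


threads = [threading.Thread(target=record_square, args=(n,)) for n in (2, 3, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The order of appends depends on scheduling, so sort before printing
print(sorted(shared_results))  # [4, 9, 16]
```

Each thread computes its own local_value without interfering with the others, while all three write into the same shared list.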
Introduction to Python Threading
Multithreading is the ability of a processor to execute multiple threads concurrently. On a simple single-core CPU, this is achieved by switching frequently between threads, which is called context switching. In a context switch, the state of the running thread is saved each time an interrupt (due to I/O or manual setup) occurs, and the state of another thread is loaded. Context switching happens so frequently that all the threads appear to run in parallel (this is referred to as multitasking).
Look at the diagram below that contains two active threads in a process.
Multithreading
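The effect of switching between threads can be observed from Python with a small experiment. In this sketch, two threads log their steps to a shared list and pause briefly after each step; the names events and count are made up for illustration, and the exact interleaving depends on the scheduler, so no particular output is shown.

```python
import threading
import time

events = []   # shared log of (thread label, step number)


def count(label, steps):
    for i in range(steps):
        events.append((label, i))
        time.sleep(0.01)   # blocking call: the thread gives up the CPU here


a = threading.Thread(target=count, args=("A", 3))
b = threading.Thread(target=count, args=("B", 3))
a.start()
b.start()
a.join()
b.join()

# Entries from A and B are typically interleaved rather than grouped
print(events)
```

Within each thread the steps stay in order; only the relative order between the two threads varies from run to run.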
Multithreading in Python
The threading module in Python provides a very simple and intuitive API for creating multiple threads in a program. Let’s understand multithreading code step by step.
Step 1: Import the module
First, import the threading module.
import threading
Step 2: Create Threads
To create a new thread, create an object of the Thread class and pass target and args as keyword arguments. target is the function that the thread will execute, and args is a tuple of the arguments that will be passed to the target function.
t1 = threading.Thread(target=target, args=args)
t2 = threading.Thread(target=target, args=args)
Step 3: Start the Thread
To start a thread, use the start() method of the Thread class.
t1.start()
t2.start()
Step 4: Wait for the Threads to Finish
Once the threads are started, the current program (think of it as the main thread) continues to execute as well. To pause the current program until a thread completes, call that thread’s join() method.
t1.join()
t2.join()
As a result, the current program first waits for t1 to finish and then waits for t2 to complete. Once both are done, the remaining statements of the current program are executed.
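The four steps above can be put together in one short sketch. The function slow_task, its label parameter, and the finished list are hypothetical names chosen for this illustration; time.sleep stands in for real work.

```python
import threading
import time

finished = []   # labels of the tasks that have completed


def slow_task(seconds, label="task"):
    time.sleep(seconds)      # stand-in for real work
    finished.append(label)


# Step 2: create the threads (positional arguments via args, named ones via kwargs)
t1 = threading.Thread(target=slow_task, args=(0.1,), kwargs={"label": "first"})
t2 = threading.Thread(target=slow_task, args=(0.2,), kwargs={"label": "second"})

# Step 3: start them; the main thread keeps running in the meantime
t1.start()
t2.start()
alive_before_join = t1.is_alive()   # normally True: t1 is still sleeping

# Step 4: block until each thread has finished
t1.join()
t2.join()
alive_after_join = t1.is_alive() or t2.is_alive()   # False

print(sorted(finished))   # ['first', 'second']
```

Without the two join() calls, the final print could run before either task had added its label.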
Example:
Let’s look at a simple example using the threading module.
This code calculates the square and the cube of a number in two separate threads, t1 and t2. Each thread prints its result, and the program prints “Done!” only after both threads have completed. The example is meant to show the mechanics of threading; for computation-intensive tasks, threads in Python generally do not improve performance because of the GIL (see the FAQ below).
Python
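import threading

def print_cube(num):
    # Print the cube of the given number
    print("Cube: {}".format(num * num * num))

def print_square(num):
    # Print the square of the given number
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    # Create one thread per function
    t1 = threading.Thread(target=print_square, args=(10,))
    t2 = threading.Thread(target=print_cube, args=(10,))
    # Start both threads
    t1.start()
    t2.start()
    # Wait until both threads have finished
    t1.join()
    t2.join()
    print("Done!")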
Output
Square: 100
Cube: 1000
Done!
To better understand how the above program works, consider the diagram below.
Multithreading
Example:
This example uses the os.getpid() function to get the ID of the current process. The threading.current_thread() function returns the current thread object, and the name attribute of a thread object gives the thread’s name. The threading.main_thread() function returns the main thread object; under normal conditions, the main thread is the thread in which the Python interpreter was started.
Consider the following Python program that prints the thread name and its associated process for each task.
The main program starts two threads, t1 and t2, each executing its own task. The threads run concurrently, and each task prints the name of the thread it runs on and the ID of the process it belongs to. The os module is used to access the process ID, and the threading module is used to create and manage the threads.
Python
import threading
import os

def task1():
    # Report which thread and which process run this task
    print("Task 1 assigned to thread: {}".format(threading.current_thread().name))
    print("ID of process running task 1: {}".format(os.getpid()))

def task2():
    print("Task 2 assigned to thread: {}".format(threading.current_thread().name))
    print("ID of process running task 2: {}".format(os.getpid()))

if __name__ == "__main__":
    print("ID of process running main program: {}".format(os.getpid()))
    print("Main thread name: {}".format(threading.current_thread().name))
    # Give each thread an explicit name
    t1 = threading.Thread(target=task1, name='t1')
    t2 = threading.Thread(target=task2, name='t2')
    t1.start()
    t2.start()
    # Wait until both threads have finished
    t1.join()
    t2.join()
Output
ID of process running main program: 19
Main thread name: MainThread
Task 1 assigned to thread: t1
ID of process running task 1: 19
Task 2 assigned to thread: t2
ID of process running task 2: 19
The diagram below clarifies the above concepts.
Multithreading
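The difference between threading.main_thread() and threading.current_thread() is easiest to see from inside a worker thread. The sketch below is illustrative only; report, worker, and worker-1 are hypothetical names.

```python
import threading

report = {}


def worker():
    current = threading.current_thread()
    # Inside a worker thread, the current thread is not the main thread
    report["worker_is_main"] = current is threading.main_thread()
    report["worker_name"] = current.name


t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()

print(report["worker_name"], report["worker_is_main"])  # worker-1 False
```

When the same comparison is made from the main program, as in the example above, current_thread() and main_thread() refer to the same object.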
This concludes the brief introduction to the basics of multithreading in Python. The next article in this series, Multithreading in Python | Set 2 (Synchronization), covers synchronization between multiple threads.
Python Thread Pool
A thread pool is a collection of threads that are created in advance and can be reused to execute multiple tasks. The concurrent.futures module in Python provides the ThreadPoolExecutor class, which makes it easy to create and manage thread pools.
The example below defines a function, worker, that is executed in the pool’s threads. It creates a ThreadPoolExecutor with a maximum of 2 worker threads and submits two tasks to the pool using the submit method. The pool manages the execution of the tasks on its worker threads. Calling the shutdown method with wait=True makes the main thread wait for all tasks to complete before it continues.
Multithreading helps improve a program’s efficiency and responsiveness. However, it is crucial to be cautious when working with threads to avoid issues like race conditions and deadlocks.
This code runs two worker tasks concurrently using a thread pool created with concurrent.futures.ThreadPoolExecutor, and then calls pool.shutdown(wait=True) to wait for them to finish.
Python
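import concurrent.futures

def worker():
    print("Worker thread running")

# Create a pool with at most 2 worker threads
pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
# Submit two tasks to the pool
pool.submit(worker)
pool.submit(worker)
# Wait for all submitted tasks to finish, then release the threads
pool.shutdown(wait=True)
print("Main thread continuing to run")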
Output
Worker thread running
Worker thread running
Main thread continuing to run
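The example above discards whatever the tasks return. As a possible variation, the sketch below uses the pool as a context manager, which shuts it down automatically when the block ends, and collects return values through a Future and through map. The square function is a made-up stand-in for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


# Leaving the "with" block waits for pending tasks and shuts the pool down
with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(square, 7)   # submit() returns a Future
    single = future.result()          # result() blocks until the task is done
    squares = list(pool.map(square, [1, 2, 3, 4]))  # results in input order

print(single)   # 49
print(squares)  # [1, 4, 9, 16]
```

map returns the results in the order of the inputs, regardless of which worker thread finishes first.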
Multithreading in Python – FAQ
What is Multithreading in Python?
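Multithreading in Python means running multiple threads concurrently within a single process. In the standard CPython interpreter, the GIL lets only one thread execute Python code at a time, so multithreading mainly helps programs that spend their time waiting on I/O rather than spreading computation across multiple CPU cores.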
Is Python suitable for Multithreading?
Yes, for I/O-bound tasks. The GIL (Global Interpreter Lock) allows only one thread to execute Python code at a time, so multithreading does not speed up CPU-bound tasks; for those, multiprocessing is often more effective.
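The benefit for I/O-bound work can be illustrated with a rough timing sketch. Here time.sleep stands in for waiting on a network or disk, and fake_download, DELAY, and COUNT are hypothetical names; the measured times will vary from machine to machine.

```python
import threading
import time

DELAY = 0.2   # seconds each simulated I/O operation waits
COUNT = 4     # number of simulated operations


def fake_download(delay):
    time.sleep(delay)   # stand-in for waiting on a network or disk


# Run the operations one after another
start = time.perf_counter()
for _ in range(COUNT):
    fake_download(DELAY)
sequential = time.perf_counter() - start

# Run the same operations in separate threads
start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(DELAY,)) for _ in range(COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
```

The threaded version finishes in roughly the time of a single wait because the waits overlap. The same experiment with a pure-Python computation in place of the sleep would not be expected to show this speedup.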
Which module is used for Multithreading in Python?
The threading module is used for multithreading in Python.
What are the various types of threads in Python?
Threads in Python can be described as follows:
- Main Thread: The initial thread that runs when the program starts.
- Daemon Threads: Background threads that are terminated automatically when the program exits, which happens once the main thread and all other non-daemon threads have finished.
- Non-Daemon Threads: Threads that keep the program running until their tasks are complete, even after the main thread has reached the end of its code.
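Whether a thread is a daemon is controlled by the daemon argument of the Thread class. The following sketch is illustrative; background and foreground are made-up function names, and the endless loop stands in for a long-running background job.

```python
import threading
import time


def background():
    # Runs until the program exits
    while True:
        time.sleep(0.05)


def foreground():
    time.sleep(0.05)


d = threading.Thread(target=background, daemon=True)    # daemon thread
n = threading.Thread(target=foreground, daemon=False)   # non-daemon thread
d.start()
n.start()
n.join()   # wait only for the non-daemon thread

print(d.daemon, n.daemon)          # True False
print(d.is_alive(), n.is_alive())  # True False
```

Even though the daemon thread is still alive at the last line, the program can exit; a non-daemon thread running the same loop would keep it running indefinitely.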
How many threads can Python have for Multithreading?
Python itself does not set a fixed limit on the number of threads; the practical limit is determined by system resources such as memory and operating-system limits. Because of the GIL, adding more threads does not speed up CPU-bound work, and having too many threads may degrade performance. Generally, Python applications use dozens to hundreds of threads, but for computation-intensive tasks, it’s recommended to use multiprocessing.