08-2 Python Tutorial – Getting Started with Regular Expressions

What is a Regular Expression?

A regular expression is a powerful tool for matching strings to specific patterns. It is mainly used for data validation, searching, and text processing tasks. Utilizing regular expressions in programming languages, especially in Python, allows you to easily handle complex pattern matching.

Using Regular Expressions in Python

The Python re module offers various functions related to regular expressions. Commonly used functions include match, search, findall, and finditer.


# Import the re module
import re

# Pattern matching example
pattern = re.compile(r'\d+')

# Search for numbers in a string
match = pattern.search("The cost is 1200 won.")
if match:
    print("Number found:", match.group())
    

Basic Patterns in Regular Expressions

You can perform more complex pattern matching through commonly used metacharacters in regular expressions. For example:

  • . : Any single character
  • ^ : Start of the string
  • $ : End of the string
  • * : Zero or more repetitions
  • + : One or more repetitions
  • ? : Zero or one repetition
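
The metacharacters listed above can be tried out with a few small patterns. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# '.' matches any single character (except a newline by default)
dot_match = re.findall(r'c.t', 'cat cot cut ct')

# '^' and '$' anchor the pattern to the start and end of the string
starts_with_hello = bool(re.search(r'^Hello', 'Hello, world'))
ends_with_world = bool(re.search(r'world$', 'Hello, world'))

# '*' allows zero or more, '+' one or more, '?' zero or one repetition
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour')

print(dot_match)   # ['cat', 'cot', 'cut']
print(star)        # ['a', 'ab', 'abbb']
print(plus)        # ['ab', 'abbb']
print(optional)    # ['color', 'colour']
```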

Advanced Pattern Matching

To use regular expressions more deeply, you need to understand advanced features such as grouping and capturing, lookaheads, and lookbehinds.


# Grouping example
pattern = re.compile(r'(\d{3})-(\d{3,4})-(\d{4})')
match = pattern.search("The phone number is 010-1234-5678.")
if match:
    print("Area code:", match.group(1))
    print("Middle number:", match.group(2))
    print("Last number:", match.group(3))
    

Useful Examples of Regular Expressions

Regular expressions can be used to identify and process various string patterns. For example, you can check the validity of an email address or extract URLs from text.

Practical Examples

We will explore applications of regular expressions through various real-world cases. This section will demonstrate how regular expressions can contribute to problem-solving with specific code examples.
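
As one possible illustration of the two cases mentioned above, the sketch below checks an email address and extracts URLs. The patterns are deliberately simplified and the helper names is_valid_email and extract_urls are hypothetical; real email and URL syntax is more permissive than what these expressions accept.

```python
import re

# Simplified patterns for illustration only (assumption: ASCII addresses,
# http/https URLs without surrounding punctuation).
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
URL_RE = re.compile(r'https?://[^\s<>"]+')

def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(address) is not None

def extract_urls(text):
    """Return every http/https URL found in the text."""
    return URL_RE.findall(text)

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("user@example"))      # False
urls = extract_urls("Docs: https://docs.python.org/3/ and http://example.com/a?b=1")
print(urls)  # ['https://docs.python.org/3/', 'http://example.com/a?b=1']
```

Using fullmatch() for validation ensures the entire input matches, whereas findall() is the natural choice when pulling several matches out of a longer text.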

Cautions When Using Regular Expressions

While regular expressions are a powerful tool, performance issues may arise at times. You should be cautious when applying them to very complex patterns or large datasets. Additionally, you should consider readability and maintainability when using them.
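
One way to keep a longer pattern readable is the re.VERBOSE flag, which lets you spread a pattern over several lines and comment each part. The sketch below rewrites the phone number pattern from earlier in this style; the name phone_re is chosen here for illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment
phone_re = re.compile(r"""
    (\d{3})      # first block: three digits
    -
    (\d{3,4})    # middle block: three or four digits
    -
    (\d{4})      # last block: four digits
""", re.VERBOSE)

match = phone_re.search("The phone number is 010-1234-5678.")
print(match.groups())  # ('010', '1234', '5678')
```

Compiling the pattern once and reusing the compiled object also keeps the pattern definition in a single place.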

Conclusion

Regular expressions are a very useful feature in programming languages like Python. With sufficient practice and understanding, you can write code more efficiently and concisely.

08-1 Python Course – Exploring Regular Expressions

What is a Regular Expression?

Regular Expressions (regex or regexp for short) are strings used for searching, replacing, and extracting strings that match specific rules. They are mainly used in text processing to search multiple patterns or for data validation.

Basic Concepts of Regular Expressions

To understand the basic concepts of regular expressions, it is essential to know a few special characters that are commonly used.

Basic Patterns

  • Dot (.): Represents any single character.
  • Brackets ([]): Represents one of the characters inside the brackets. Example: [abc]
  • Caret (^): Represents the start of the string. Example: ^Hello
  • Dollar sign ($): Represents the end of the string. Example: world$
  • Asterisk (*): Indicates that the preceding character may occur zero or more times. Example: a*
  • Plus (+): Indicates that the preceding character may occur one or more times. Example: a+
  • Question mark (?): Indicates that the preceding character may occur zero or one time. Example: a?

Meta Characters

Meta characters are used with special meanings in regular expressions and often need to be escaped to be used literally as characters.

  • Backslash (\): The escape character, used to treat a special character as a literal character. Example: \.
  • Pipe (|): The OR operator, which matches if any one of the alternatives matches. Example: a|b
  • Parentheses (()): Group part of a pattern to create a subpattern. Example: (ab)
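
The effect of escaping, alternation, and grouping can be seen in a short sketch like the following. The sample strings are made up, and the (?:...) form is a non-capturing group, used here so that findall() returns the whole match rather than only the group.

```python
import re

# An unescaped dot matches any character, so '1x5' matches as well
loose = re.findall(r'1.5', '1.5 1x5')

# Escaping the dot restricts the match to a literal '.'
strict = re.findall(r'1\.5', '1.5 1x5')

# re.escape() escapes the special characters in a string for you
escaped = re.escape('a+b')

# '|' chooses between alternatives; parentheses limit its scope
pets = re.findall(r'(?:cat|dog)s', 'cats dogs birds')

print(loose)    # ['1.5', '1x5']
print(strict)   # ['1.5']
print(pets)     # ['cats', 'dogs']
```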

Using Regular Expressions in Python

In Python, the `re` module is used to handle regular expressions. This module provides various functions to easily work with regular expressions.

Functions in the re Module

  • re.match(): Checks if the beginning of a string matches the specified pattern.
  • re.search(): Scans the string and returns a match object for the first place where the pattern matches.
  • re.findall(): Returns all non-overlapping matches of the pattern as a list.
  • re.finditer(): Returns an iterator that yields a match object for each non-overlapping match.
  • re.sub(): Replaces substrings that match the pattern with another string.
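
Of the functions listed above, re.finditer() is handy when you need the position of each match as well as its text. A minimal sketch, using a made-up sample string:

```python
import re

text = "Order 12 shipped, order 345 pending"

# Each item produced by finditer() is a match object, not a plain string
for m in re.finditer(r'\d+', text):
    print(m.group(), m.span())

# Collect the matched text together with its start position
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]
print(positions)  # [('12', 6), ('345', 24)]
```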

Examples of Using Regular Expressions

Basic Usage Examples


import re

# Check if the start of the string is 'Hello'
result = re.match(r'^Hello', 'Hello, world!')
print(result)  # Returns a match object if successful, or None if failed.
    

Finding Patterns in a String


import re

search_result = re.search(r'world', 'Hello, world!')
print(search_result)  # Returns a match object for the matched portion.
    

Extracting All Matching Patterns


# Finding all 'a' characters in the string
all_matches = re.findall(r'a', 'banana')
print(all_matches)  # ['a', 'a', 'a']
    

Transforming Strings Based on Patterns

You can use the re.sub() function to transform patterns in a string into other strings.


# Replace each whitespace character with an underscore
transformed_string = re.sub(r'\s', '_', 'Hello world!')
print(transformed_string)  # Output: Hello_world!
    

Advanced Features of Regular Expressions

Grouping and Capturing

Parentheses group part of a pattern and capture the text it matched, so each part can be retrieved separately with group() or reused later in the pattern.


pattern = r'(\d+)-(\d+)-(\d+)'
string = 'Phone number: 123-456-7890'
match = re.search(pattern, string)

if match:
    print(match.group(0))  # Full matched string
    print(match.group(1))  # First group: 123
    print(match.group(2))  # Second group: 456
    print(match.group(3))  # Third group: 7890
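
Captured groups can also be named, referred to again inside the pattern, or used in a replacement string. The following sketch shows these three uses; the date and sample strings are made up for illustration.

```python
import re

# Named groups make the captured parts self-describing
date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = date_re.search('Released on 2024-03-15.')
print(m.group('year'), m.group('month'), m.group('day'))  # 2024 03 15

# A backreference (\1) reuses what group 1 captured: here, a repeated word
repeated = re.search(r'\b(\w+) \1\b', 'this is is a test')
print(repeated.group(1))  # is

# Groups can be referenced in the replacement string of re.sub()
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'user@host')
print(swapped)  # host at user
```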
    

Lookahead and Lookbehind

Lookahead and lookbehind are zero-width assertions: they check what comes immediately after or before the current position without including that text in the match. They are widely used but can be somewhat hard to read.

Using Lookahead


# Match 'abc' only when it is immediately followed by 'def'
lookahead_pattern = r'abc(?=def)'
lookahead_string = 'abcdefghi'
lookahead_match = re.search(lookahead_pattern, lookahead_string)
print(lookahead_match)
    

Using Lookbehind


# Match 'abc' only when it is immediately preceded by '123'
lookbehind_pattern = r'(?<=123)abc'
lookbehind_string = '123abc'
lookbehind_match = re.search(lookbehind_pattern, lookbehind_string)
print(lookbehind_match)
    

Comprehensive Example: Extracting Email Addresses

Regular expressions are especially useful for extracting email addresses from entered text.


email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
text = "Contact email: example@example.com or support@domain.com"
emails = re.findall(email_pattern, text)

print(emails)  # ['example@example.com', 'support@domain.com']
    

Summary

Regular expressions are a powerful tool in string processing, and Python's `re` module provides sufficient functionality to work with them. By understanding the basic syntax of regular expressions and practicing, one can easily handle complex text patterns. Regular practice and application of these techniques will help solve more complex string processing issues effectively.

07: Flying with Python

In this course, we will explore how to utilize the advanced features of Python to solve complex problems and write efficient code. The main topics we will cover include various programming paradigms, advanced data structures, and the powerful built-in module functionalities that Python offers.

1. Advanced Programming Paradigms

Python is a multi-paradigm programming language. It supports procedural, object-oriented, and functional programming, allowing you to take advantage of each as needed. In this section, we will focus primarily on advanced techniques in object-oriented programming (OOP) and functional programming.

1.1 In-depth Object-Oriented Programming

The basic concept of OOP starts with the understanding of classes and objects. However, to design more complex programs, you need to know other concepts as well.

1.1.1 Inheritance and Polymorphism

Inheritance is a feature where a new class inherits the properties and methods of an existing class. By using inheritance, the reusability of the code can be enhanced. Polymorphism allows for the same interface to be used for objects of different classes.

class Animal:
    def speak(self):
        pass

class Dog(Animal):
    def speak(self):
        return "Woof!"

class Cat(Animal):
    def speak(self):
        return "Meow!"

def animal_sound(animal):
    print(animal.speak())

dog = Dog()
cat = Cat()

animal_sound(dog)  # Woof!
animal_sound(cat)  # Meow!

The above example illustrates polymorphism. Because each class provides its own speak() method, the animal_sound function can call it in the same way without knowing the concrete type of the object.

1.1.2 Abstraction and Interfaces

An abstract class is a class that defines a basic behavior, housing one or more abstract methods. An interface can be thought of as a collection of these abstract methods. In Python, abstraction is implemented through the ABC class of the abc module.

from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        pass

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.1415 * self.radius * self.radius

circle = Circle(5)
print(circle.area())  # 78.5375

In the above example, the Shape class is an abstract class that defines the abstract method area. The Circle class inherits from Shape and implements the area method.
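
A practical consequence of declaring an abstract method is that the abstract class itself cannot be instantiated. The sketch below demonstrates this; the Square subclass is a hypothetical addition for illustration.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

class Square(Shape):  # hypothetical concrete subclass
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

try:
    Shape()  # abstract classes cannot be instantiated
except TypeError as exc:
    error_message = str(exc)
    print("Cannot instantiate:", error_message)

square_area = Square(3).area()
print(square_area)  # 9
```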

1.2 Functional Programming

Functional programming uses pure functions to reduce side effects and implements complex behaviors through function composition. Python provides strong functional tools to encourage this style.

1.2.1 Lambda Functions

Lambda functions are anonymous functions whose body is a single expression. They are useful for writing short and concise functions.

add = lambda x, y: x + y
print(add(5, 3))  # 8

In the above example, lambda defines an anonymous function that adds two parameters.

1.2.2 Higher-Order Functions

A higher-order function is a function that takes another function as an argument or returns a function. Python’s built-in map and filter, along with functools.reduce, are examples that utilize these functional programming techniques.

numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)
print(list(squared))  # [1, 4, 9, 16, 25]

In the above example, the map function applies the lambda function to each element in the list to create a new iterator.
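
The other two higher-order functions work along the same lines. A minimal sketch: filter keeps the elements for which the function returns a true value, and reduce (imported from functools) folds the list into a single value.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# filter() keeps only the elements that satisfy the predicate
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce() combines the elements pairwise, starting from the initial value 0
total = reduce(lambda acc, x: acc + x, numbers, 0)

print(evens)  # [2, 4]
print(total)  # 15
```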

2. Advanced Data Structures

Utilizing advanced data structures allows for more efficient handling of complex data operations. Here we will address more complex data structures beyond basic types like lists and dictionaries.

2.1 Collections Module

The Python collections module provides several data structures with specialized purposes. Let’s take a look at a few of them.

2.1.1 defaultdict

defaultdict is a dictionary that automatically creates a default value when a non-existent key is referenced.

from collections import defaultdict

fruit_counter = defaultdict(int)
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

for fruit in fruits:
    fruit_counter[fruit] += 1

print(fruit_counter)  # defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'orange': 1})

This example demonstrates how to easily count each fruit using defaultdict.

2.1.2 namedtuple

namedtuple creates an immutable tuple subclass whose fields can be accessed by name as well as by index, which enhances the readability of the code.

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

print(p.x, p.y)  # 10 20

By using namedtuple, fields can be accessed by name, allowing for clearer code.
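
A namedtuple still behaves like a regular tuple. The sketch below shows unpacking, the fact that fields cannot be reassigned, and the _replace() method for creating a modified copy.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

# Unpacking works exactly as with an ordinary tuple
x, y = p

# Fields are read-only; assignment raises AttributeError
try:
    p.x = 99
    assign_failed = False
except AttributeError:
    assign_failed = True

# _replace() returns a new instance with the given fields changed
moved = p._replace(x=15)
print(moved)  # Point(x=15, y=20)
print(p)      # Point(x=10, y=20)
```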

2.2 Heap Queue Module

The heapq module implements a heap queue algorithm, enabling a list to be used as a priority queue.

import heapq

numbers = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapq.heapify(numbers)  # Convert list to a priority queue

smallest = heapq.heappop(numbers)
print(smallest)  # 0

This allows for quick extraction of the minimum value in the data using a priority queue.
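
To attach a priority to each item, a common approach is to push (priority, item) tuples, since tuples compare element by element. The task names below are made up for illustration.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))

# heappop() always returns the entry with the smallest priority number
order = []
while tasks:
    priority, name = heapq.heappop(tasks)
    order.append(name)
print(order)  # ['fix bug', 'write report', 'refactor']

# nsmallest() returns the n smallest items without a manual loop
smallest_three = heapq.nsmallest(3, [9, 4, 7, 1, 8])
print(smallest_three)  # [1, 4, 7]
```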

3. Utilizing Advanced Built-in Modules

The rich built-in modules of Python provide various functionalities. Here, we will introduce some modules for advanced tasks.

3.1 itertools Module

The itertools module offers useful functions for dealing with iterators. It is a powerful tool for repetitive data processing.

3.1.1 Combinations and Permutations

Combinations and permutations provide various methods for selecting elements from data sets.

from itertools import combinations, permutations

data = ['A', 'B', 'C']

# Combinations
print(list(combinations(data, 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# Permutations
print(list(permutations(data, 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

These functions allow for the quick generation of various list combinations.

3.1.2 Handling Iterator Collections

This module also provides infinite iterators, such as count for open-ended counting and cycle for endlessly repeating a sequence.

from itertools import count, cycle

# Infinite count
for i in count(10):
    if i > 15:
        break
    print(i, end=' ')  # 10 11 12 13 14 15

print()  # New line

# Periodic repetition
for i, char in zip(range(10), cycle('ABC')):
    print(char, end=' ')  # A B C A B C A B C A

The above example shows how to use infinite counting and periodic repetition, stopping the first with a break condition and the second with a bounded zip.

3.2 functools Module

The functools module provides functional programming tools, offering various utilities particularly useful for handling functions.

3.2.1 lru_cache Decorator

The @lru_cache decorator is used for memoization, storing computed results to avoid recalculating for the same input.

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print([fibonacci(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In the code above, the computed results for the Fibonacci sequence are stored in the cache, saving execution time for the same input.
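
The effect of the cache can be observed with the cache_info() method that lru_cache adds to the decorated function. In this sketch, slow_square and the calls list are hypothetical names used only to count how often the function body actually runs.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs

@lru_cache(maxsize=None)
def slow_square(n):
    calls.append(n)
    return n * n

slow_square(4)  # computed
slow_square(4)  # served from the cache
slow_square(5)  # computed

print(calls)  # [4, 5]
info = slow_square.cache_info()
print(info.hits, info.misses)  # 1 2
```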

Conclusion

In this article, we have discussed advanced topics in Python. By effectively utilizing these features, complex problems can be solved efficiently, and high-level code can be written. Let's delve into more topics in the next course and advance towards becoming Python experts.

Python Type Annotation

Python is well known as a dynamically typed language. This means that types are attached to values and checked at runtime, so you do not need to declare the type of a variable explicitly. However, as projects grow larger and more developers collaborate, understanding and maintaining the code can become difficult. To address this, type hints were standardized in Python 3.5 with the typing module (PEP 484), building on the function annotation syntax available since Python 3.0; variable annotations followed in Python 3.6. Using type annotations helps improve code readability, prevent bugs, and enhance autocompletion features.

1. Basics of Type Annotation

Type annotation is the syntax that allows explicitly specifying the types of variables or functions. Here are the basic ways to annotate the types of variables and functions:

Variable Annotation:
x: int = 10
y: float = 10.5
name: str = "Alice"

Function Annotation:
def greeting(name: str) -> str:
    return "Hello " + name

1.1 Type Annotation of Variables

By specifying the type of a variable, the code writer can clearly indicate which type is expected. This allows for early error detection through static analysis in tooling and IDEs.

1.2 Annotations for Function Parameters and Return Values

Function annotations can explicitly specify the input and output types of a function, helping to anticipate what type of data the function will receive. This is very helpful during code reviews.
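
It is worth noting that annotations are not enforced when the program runs; they are stored as metadata for tools to inspect. A minimal sketch illustrating this:

```python
def greeting(name: str) -> str:
    return f"Hello {name}"

# Annotations are stored on the function object
print(greeting.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}

# Passing an int does not raise an error at runtime;
# a static type checker would be expected to flag this call instead
result = greeting(42)
print(result)  # Hello 42
```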

2. Built-in Data Types

Python supports various built-in data types, and these types can be used in annotations.

  • Basic types like int, float, str, bool, None, etc.
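  • Container types such as List, Dict, Set, and Tuple from the typing module, which let you specify element types as well. Since Python 3.9, the built-in list, dict, set, and tuple can also be subscripted directly (for example, list[str]).
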
from typing import List, Dict, Tuple

names: List[str] = ["Alice", "Bob", "Charlie"]
scores: Dict[str, int] = {"Alice": 95, "Bob": 85}
position: Tuple[int, int] = (10, 20)

3. Union and Optional

When multiple types are allowed, it is common to use Union, and when None is allowed, Optional is used.

from typing import Dict, Optional, Union

value: Union[int, float] = 5.5

def get_user(user_id: int) -> Optional[Dict[str, str]]:
    # Return the user record, or None if no such user exists
    if user_id == 1:
        return {"name": "Alice", "role": "admin"}
    return None
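
A function that returns an Optional value obliges the caller to handle the None case. The sketch below shows one way to do that; the describe helper is a hypothetical addition for illustration.

```python
from typing import Dict, Optional

def get_user(user_id: int) -> Optional[Dict[str, str]]:
    if user_id == 1:
        return {"name": "Alice", "role": "admin"}
    return None

def describe(user_id: int) -> str:  # hypothetical helper
    user = get_user(user_id)
    # Checking for None first lets a type checker treat user as a dict below
    if user is None:
        return "unknown user"
    return f"{user['name']} ({user['role']})"

print(describe(1))  # Alice (admin)
print(describe(2))  # unknown user
```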

4. User-defined Types

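When a built-in type is too general, using a type alias or NewType allows you to write clearer code. NewType defines a distinct named type that static type checkers treat as separate from its base type.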

from typing import NewType

UserID = NewType('UserID', int)
admin_user_id: UserID = UserID(524313)
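
At runtime, NewType adds almost nothing: calling UserID simply returns the value it was given, and the distinction only matters to a static type checker. A minimal sketch, in which load_profile is a hypothetical function:

```python
from typing import NewType

UserID = NewType('UserID', int)

def load_profile(user_id: UserID) -> str:  # hypothetical function
    return f"profile-{user_id}"

uid = UserID(42)

# At runtime the value is still a plain int
print(type(uid) is int)   # True
print(uid + 1)            # 43
print(load_profile(uid))  # profile-42
```

A type checker would be expected to reject load_profile(42), because a plain int is not a UserID, even though the call works at runtime.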

4.1 Type Alias

Using type aliases allows you to express complex type structures with concise names.

from typing import List

Vector = List[float]

def normalize(vec: Vector) -> Vector:
    magnitude = sum(x**2 for x in vec) ** 0.5
    return [x / magnitude for x in vec]

5. Generic Types

Using generic types allows a single function or class to work with multiple types. You can define generic types using the typing.Generic class.

from typing import TypeVar, Generic

T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, content: T) -> None:
        self.content = content

int_box = Box(123)
str_box = Box("hello")
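
The type variable becomes more useful once methods refer to it as well. The sketch below defines a hypothetical Stack class, not part of the original example, whose push and pop methods are tied to the same element type T.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')

class Stack(Generic[T]):  # hypothetical example class
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

int_stack: Stack[int] = Stack()
int_stack.push(1)
int_stack.push(2)
top = int_stack.pop()
print(top)  # 2
```

With the annotation Stack[int], a type checker can infer that pop() returns an int and would be expected to flag a call such as int_stack.push("text").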

6. Advanced Example

Here is a slightly more complex example utilizing type annotations.

from typing import List, Dict, Union

def process_data(data: List[Union[int, float, str]]) -> Dict[str, Union[int, float]]:
    result: Dict[str, Union[int, float]] = {'total': 0, 'numeric_count': 0}

    for item in data:
        # isinstance() narrows the Union, so a type checker knows item is numeric here
        if isinstance(item, (int, float)):
            result['total'] += item
            result['numeric_count'] += 1

    return result

mixed_data: List[Union[int, float, str]] = [10, '20', 30.5, 'forty', '60', 70.2]
output = process_data(mixed_data)
print(output)
# {'total': 110.7, 'numeric_count': 3}

7. Static Type Checking Tools

Type annotations are most useful when used with static type checking tools. In Python, tools like mypy, Pyright, and Pylance are widely used.

For example, mypy is used as follows:

mypy script.py

These tools are very effective in checking the type consistency of the code and preventing unexpected type errors.

8. Conclusion

Type annotation is a powerful feature of Python that greatly helps improve code readability, ease maintenance, and prevent errors early. Additionally, when combined with static analysis tools, it provides greater stability for large projects. Through this tutorial, I hope you will be able to effectively utilize type annotations and write more robust Python code.

Python Iterators and Generators

In programming, iterable objects and their usage are essential for large-scale data processing. Python provides two powerful tools for performing these tasks: iterators and generators. In this article, we will delve deeply into the concepts of iterators and generators, their differences, and how to use them.

Iterator

An iterator is an object that yields the elements of a collection or stream one at a time. In Python, an object is an iterator if it implements the iterator protocol: an __iter__() method and a __next__() method. A for loop calls these methods automatically. Because an iterator only needs to produce one item at a time, it is well suited to handling large amounts of data.

How iterators work

To understand how an iterator works, we need to look more deeply at the two methods.

  • __iter__(): Returns the iterator object. For an iterator, this is the object itself. Python calls this method once when iteration starts, for example at the beginning of a for loop.
  • __next__(): Returns the next value in the sequence. When no more items are available, it must raise a StopIteration exception, which tells the loop to stop.
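
What a for loop does with these two methods can be reproduced by hand with the built-in iter() and next() functions. A minimal sketch using a small list:

```python
letters = ['a', 'b']

it = iter(letters)   # calls letters.__iter__() and returns an iterator
first = next(it)     # calls it.__next__()
second = next(it)

# A third call finds no more items and raises StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # a b True
print(iter(it) is it)            # True: an iterator returns itself
```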

Simple iterator example

Below is a simple example code of a counter iterator:


class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

counter = Counter(1, 5)
for number in counter:
    print(number)
    

In the above example, the Counter class follows the iterator protocol by implementing the __iter__() and __next__() methods, so its objects can be used in a for loop. The loop prints 1 through 4, because iteration stops once current reaches high.
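
Because Counter returns itself from __iter__(), it can only be traversed once. If repeated traversal is needed, one option is a separate iterable class that hands out a fresh iterator each time. The CountRange class below is a hypothetical sketch of that idea.

```python
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.high:
            raise StopIteration
        self.current += 1
        return self.current - 1

counter = Counter(1, 4)
first_pass = list(counter)    # [1, 2, 3]
second_pass = list(counter)   # []: the iterator is already exhausted

class CountRange:  # hypothetical reusable iterable
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def __iter__(self):
        # Return a new Counter for every traversal
        return Counter(self.low, self.high)

numbers = CountRange(1, 4)
print(list(numbers))  # [1, 2, 3]
print(list(numbers))  # [1, 2, 3]
```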

Generator

A generator is a special kind of function that makes it simpler to create an iterator: it uses the yield keyword to produce values one at a time. Calling a generator function does not run its body; it returns a generator object. Each time the next value is requested, the body runs until it reaches a yield, hands back that value, and pauses. The next request resumes execution from where it left off.

How generators work

Generator objects implement the __iter__() and __next__() methods automatically, so you do not have to write them yourself. The object returned by calling a generator function can therefore be used like any other iterator.
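
The pausing and resuming behavior can be made visible by recording when each part of the generator body runs. In this sketch, the steps function and the log list are hypothetical names used only for illustration.

```python
log = []

def steps():
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")

gen = steps()
print(log)              # []: the body has not started running yet

first = next(gen)
print(first, log)       # 1 ['started']

second = next(gen)
print(second, log)      # 2 ['started', 'resumed']

remaining = list(gen)   # runs the rest of the body until it ends
print(remaining, log)   # [] ['started', 'resumed', 'finished']
```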

Generator example

Below is a simple example code of a generator function:


def simple_generator():
    yield 1
    yield 2
    yield 3

for value in simple_generator():
    print(value)
    

In the above example, calling simple_generator() returns a generator object, and each yield hands one value to the for loop, so the loop prints 1, 2, and 3. A generator can be used in a for loop like any other iterator.

Differences between iterators and generators

Iterators and generators have many similarities, but there are a few important differences:

  • Simplicity of implementation: Generators can be implemented more intuitively and simply using the yield keyword. This eliminates the complexity of writing iterators manually.
  • State preservation: Generators preserve their state automatically. When a generator pauses at a yield, its local variables and current position are saved, and execution resumes from that point on the next request.
  • Memory usage: Generators produce values one at a time as needed instead of building the whole result up front, which makes them memory efficient. Compared to building a full list, they are better suited to processing large-scale data.
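
The memory difference can be observed with sys.getsizeof(). The sketch below compares a list with an equivalent generator expression; the exact byte counts depend on the Python implementation and version, so only the comparison is shown.

```python
import sys

# The list stores all 100,000 results at once
squares_list = [n * n for n in range(100_000)]

# The generator expression stores only its current state
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)  # True on CPython

# Both produce the same values when consumed
totals_match = sum(squares_gen) == sum(squares_list)
print(totals_match)  # True
```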

Advanced usage example

Generators can be combined with complex logic to write highly efficient code. Below is an example of generating the Fibonacci sequence using a generator:


def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib_gen = fibonacci_generator()
for _ in range(10):
    print(next(fib_gen))
    

In this example, the fibonacci_generator generates an infinite Fibonacci sequence, and you can output as many values as needed using a for loop or the next() function.
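
An infinite generator can also be bounded with tools from itertools instead of a manual loop. A minimal sketch using islice and takewhile:

```python
from itertools import islice, takewhile

def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice() takes a fixed number of items from the infinite sequence
first_ten = list(islice(fibonacci_generator(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# takewhile() stops as soon as the condition becomes false
below_fifty = list(takewhile(lambda n: n < 50, fibonacci_generator()))
print(below_fifty)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```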

Practical applications

Iterators and generators are often used in situations where it is necessary to process large streams of data or to generate values one at a time without the need to store the entire list of results in memory, optimizing memory usage.

File reading: Each line of a file can be read as a generator to handle larger files in a memory-efficient manner.


def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

for line in read_large_file("large_file.txt"):
    print(line)
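
Generators can also be chained so that each stage processes one line at a time. The sketch below uses io.StringIO as a stand-in for a real file so that it runs without any file on disk; read_lines and non_empty are hypothetical helper names.

```python
import io

def read_lines(stream):
    # Yield each line with surrounding whitespace removed
    for line in stream:
        yield line.strip()

def non_empty(lines):
    # Pass through only the lines that still contain text
    for line in lines:
        if line:
            yield line

# io.StringIO stands in for an open text file in this sketch
sample = io.StringIO("alpha\n\nbeta\n  \ngamma\n")
result = list(non_empty(read_lines(sample)))
print(result)  # ['alpha', 'beta', 'gamma']
```

No stage holds more than one line in memory at a time until list() collects the final results.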
    

Conclusion

Iterators and generators are very powerful features of Python, and using them can help perform complex and large-scale data processing efficiently and with better readability. By understanding and appropriately utilizing these two concepts, you will be able to write more efficient and scalable code.

I hope this tutorial has helped deepen your understanding of Python iterators and generators. Consider applying this content in your future Python programming journey.