Pandas DataFrame
Pandas is a widely used Python library for data analysis. Its central data structure, the DataFrame, is a two-dimensional table made up of rows and columns. A DataFrame stores and manipulates data in a layout similar to an Excel spreadsheet, which makes it well suited to data analysis tasks.
import pandas as pd
# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)
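To see the "rows and columns" structure directly, you can inspect a few DataFrame attributes. This is a small sketch that rebuilds the same example data and looks at `shape`, `columns`, `index`, and `dtypes`. The exact dtype names printed for text columns can differ between pandas versions, so only the shape and labels carry expected-output comments.

```python
import pandas as pd

# Same data as the example above
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
print(df.shape)           # (3, 3)
# Column labels, and the default integer row labels pandas assigns
print(list(df.columns))   # ['Name', 'Age', 'City']
print(list(df.index))     # [0, 1, 2]
# Each column has its own data type
print(df.dtypes)
```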
Key Features of DataFrame
1. Creating a DataFrame
A DataFrame can be created from various data structures, such as dictionaries, lists, and NumPy arrays. When you pass a dictionary, each key becomes a column name and each list becomes that column's values.
data = {
    "Product": ["Apple", "Banana", "Cherry"],
    "Price": [100, 200, 300]
}
df = pd.DataFrame(data)
print(df)
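The other input types mentioned above work similarly. The sketch below builds DataFrames from a list of lists, a 2-D NumPy array, and a list of dictionaries; the variable names (`df_from_list`, `df_from_array`, `df_from_records`) and the array values are illustrative only. With lists and arrays, rows carry no names, so column labels are passed through the `columns` parameter.

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list is one row
rows = [["Apple", 100], ["Banana", 200], ["Cherry", 300]]
df_from_list = pd.DataFrame(rows, columns=["Product", "Price"])
print(df_from_list)

# From a 2-D NumPy array: every column takes the array's dtype
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=["a", "b", "c"])
print(df_from_array)

# From a list of dictionaries: each dictionary is one row, keys become column names
records = [{"Product": "Apple", "Price": 100}, {"Product": "Banana", "Price": 200}]
df_from_records = pd.DataFrame(records)
print(df_from_records)
```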
2. Accessing Columns and Rows in a DataFrame
To access a column, index the DataFrame with the column name in square brackets. To access rows, use the loc or iloc indexers: loc selects by label, while iloc selects by integer position.
# Access a column
print(df["Product"])
# Access a row by label with loc (the default labels are 0, 1, 2)
print(df.loc[0])
# Access a row by integer position with iloc
print(df.iloc[1])
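With the default index, a row's label and its position happen to be the same number, which hides the difference between loc and iloc. This sketch uses product names as row labels (via `set_index`) so the two indexers are clearly distinct, and it also shows selecting a single cell and slicing. Note that label slices with loc include the end label, while position slices with iloc exclude the end, as Python lists do.

```python
import pandas as pd

data = {"Product": ["Apple", "Banana", "Cherry"], "Price": [100, 200, 300]}
# Use product names as row labels instead of the default 0, 1, 2
df = pd.DataFrame(data).set_index("Product")

print(df.loc["Banana"])            # row whose label is "Banana"
print(df.iloc[1])                  # second row by position (also Banana)
print(df.loc["Cherry", "Price"])   # one cell by row label and column name: 300
print(df.iloc[0, 0])               # one cell by row and column position: 100
print(df.loc["Apple":"Banana"])    # label slice: includes "Banana"
print(df.iloc[0:1])                # position slice: excludes position 1
```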
3. Adding and Removing Data
You can add new columns or rows to a DataFrame and delete existing data. To add a column, assign values to a new column name:
# Adding a new column
df["Discounted Price"] = df["Price"] * 0.9
print(df)
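Adding a row works differently from adding a column. One common approach, sketched here, is to build the new row as a one-row DataFrame and combine it with `pd.concat`; another is to assign to an unused label with loc. The "Durian" and "Elderberry" rows are made-up sample data. (The older `DataFrame.append()` method was removed in pandas 2.0, so concat is the usual replacement.)

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Apple", "Banana", "Cherry"], "Price": [100, 200, 300]})

# Concatenate a one-row DataFrame; ignore_index=True renumbers rows 0..n-1
new_row = pd.DataFrame({"Product": ["Durian"], "Price": [400]})
df = pd.concat([df, new_row], ignore_index=True)
print(df)

# Alternatively, assign a list of values to the next unused label (4) with loc
df.loc[4] = ["Elderberry", 500]
print(df)
```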
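To delete a row, use the drop() method with the row's index label. drop() returns a new DataFrame instead of modifying the original, so the result is assigned back to df.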
# Deleting a row
df = df.drop(1)
print(df)
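Because drop() returns a new object, the original DataFrame stays intact unless you reassign it. This sketch confirms that, then drops several rows at once with a list of labels and drops a column using the `columns=` keyword. The name `without_banana` is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Apple", "Banana", "Cherry"], "Price": [100, 200, 300]})

# drop() returns a new DataFrame; df itself is unchanged
without_banana = df.drop(1)           # drop the row labeled 1
print(len(df), len(without_banana))   # 3 2

# Drop several rows at once by passing a list of labels
print(df.drop([0, 2]))

# Drop a column by name with the columns= keyword
print(df.drop(columns=["Price"]))
```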
4. Data Analysis Functions
Pandas provides many methods that are useful for data analysis. For example, describe() reports summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame.
print(df.describe())
You can also use methods such as mean() and sum() to calculate the average or total of a specific column.
average_price = df["Price"].mean()
print("Average Price:", average_price)
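The example above only shows mean(). This sketch, on a fresh copy of the three-product data, computes the total with sum() alongside a few other common column statistics, and shows that `describe(include="all")` extends the summary to non-numeric columns such as Product.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Apple", "Banana", "Cherry"], "Price": [100, 200, 300]})

total_price = df["Price"].sum()      # 600
average_price = df["Price"].mean()   # 200.0
print("Total Price:", total_price)
print("Average Price:", average_price)

# Other common column statistics
print(df["Price"].min(), df["Price"].max(), df["Price"].median())

# By default describe() covers numeric columns only; include="all" adds text columns
print(df.describe(include="all"))
```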
5. Filtering a DataFrame
You can filter the rows of a DataFrame with a condition: the comparison produces a Boolean value for each row, and only the rows where it is True are kept. For example, to keep only the products priced at 150 or more:
filtered_df = df[df["Price"] >= 150]
print(filtered_df)
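Conditions can also be combined. In pandas you use `&` (and) and `|` (or) rather than Python's `and`/`or`, and each condition needs its own parentheses because `&` binds more tightly than comparisons. This sketch also shows `isin()` for matching a list of values and `query()` for writing a filter as a string; it uses a fresh copy of the three-product data.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Apple", "Banana", "Cherry"], "Price": [100, 200, 300]})

# Both conditions must hold; parentheses around each are required
mid_range = df[(df["Price"] >= 150) & (df["Price"] < 300)]
print(mid_range)   # only Banana

# isin() keeps rows whose value appears in the given list
print(df[df["Product"].isin(["Apple", "Cherry"])])

# query() expresses a filter as a string
print(df.query("Price >= 150"))
```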
6. Sorting a DataFrame
To sort a DataFrame by a specific column, use the sort_values() method. Passing ascending=False sorts from largest to smallest.
# Sort in descending order by price
sorted_df = df.sort_values(by="Price", ascending=False)
print(sorted_df)
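sort_values() also accepts a list of columns, with a matching list for `ascending`, to sort by one column and break ties with another. The `Store` column and the "Date" row here are hypothetical data added only to give the second sort key something to do. `sort_index()` then restores the original row order, since sorting keeps each row's index label.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Apple", "Banana", "Cherry", "Date"],
    "Store": ["North", "South", "North", "South"],   # hypothetical grouping column
    "Price": [100, 200, 300, 150],
})

# Sort by Store (A to Z), then by Price (high to low) within each store
sorted_df = df.sort_values(by=["Store", "Price"], ascending=[True, False])
print(sorted_df)   # Cherry, Apple, Banana, Date

# Rows keep their original index labels, so sort_index() restores the input order
print(sorted_df.sort_index())
```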
Summary
- Variables store data, and a variable's data type is determined automatically when a value is assigned to it.
- You can use the type() function to check the data type of a variable.
- Python variables are dynamically typed, so values of different types can be assigned to the same variable.
- You can assign values to multiple variables at once or assign the same value to several variables.
- You can use functions such as int(), float(), and str() to convert between data types.
- A pandas DataFrame is a two-dimensional data structure of rows and columns and is very useful for data analysis.
- A DataFrame can be created from dictionaries, lists, and other structures, and you can access its columns and rows as well as add and delete data.
- Filtering, sorting, and statistical methods let you analyze the data in a DataFrame efficiently.
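The variable-related points in the summary can be illustrated with plain Python. This is a short sketch; the variable names and values are arbitrary.

```python
# Dynamic typing: the same name can refer to values of different types
x = 10
print(type(x))   # <class 'int'>
x = "ten"
print(type(x))   # <class 'str'>

# Assign several variables at once, or one value to several variables
a, b, c = 1, 2.5, "three"
p = q = 0

# Convert between types with int(), float(), and str()
n = int("42")       # 42
f = float("2.5")    # 2.5
s = str(100)        # '100'
print(n + 1, f * 2, s + "!")   # 43 5.0 100!
```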
Variables and Pandas DataFrames are essential tools for handling data in Python. Understand and apply them well to achieve effective data processing!