Data Cleaning & Missing Values in Pandas for Pharmacy Students (Clinical Dataset Challenges & Solutions)
Real-world healthcare datasets are often incomplete or inconsistent. In pharmaceutical sciences, data cleaning is essential to ensure accurate analysis of patient records, adverse drug reaction (ADR) datasets, and pharmacokinetic (PK) studies.
What is Data Cleaning?
Data cleaning is the process of identifying and correcting errors, missing values, and inconsistencies in datasets.
What are Missing Values?
Missing values occur when data is not available for certain entries.
import pandas as pd
df = pd.read_csv("clinical_data.csv")
print(df)
In pandas, missing values are usually shown as NaN ("Not a Number").
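To see what a missing value looks like without needing a CSV file, here is a minimal sketch that builds a small table in memory. The name df_demo, the column names, and all values are made up for illustration. It shows that pandas treats both None and NaN as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical patient records; every value here is invented for illustration.
df_demo = pd.DataFrame({
    "Patient": ["P01", "P02", "P03"],
    "Drug": ["Warfarin", None, "Digoxin"],  # None marks a missing drug name
    "Dose": [5.0, 2.5, np.nan],             # np.nan marks a missing dose
})

# isna() returns True wherever a value is missing (None or NaN)
missing_mask = df_demo.isna()
print(missing_mask)
```

When a file is loaded with pd.read_csv(), empty cells and common placeholders such as "NA" are converted to NaN by default, so they are flagged in the same way.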
Detecting Missing Values
print(df.isnull())        # True wherever a value is missing
print(df.isnull().sum())  # number of missing values in each column
This helps identify which columns have missing data.
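Counts are easier to judge as percentages, and it is often useful to look at the incomplete rows themselves. The following is a small sketch on made-up data (df_demo and its values are assumptions, not part of the clinical file used above):

```python
import numpy as np
import pandas as pd

# Hypothetical ADR records with gaps in both columns
df_demo = pd.DataFrame({
    "Drug": ["Aspirin", "Ibuprofen", None, "Aspirin"],
    "Dose": [100.0, np.nan, 200.0, np.nan],
})

# Number of missing values per column
missing_count = df_demo.isnull().sum()

# Share of missing values per column, in percent
# (the mean of True/False values is the fraction of True)
missing_pct = df_demo.isnull().mean() * 100

# Rows that contain at least one missing value
incomplete_rows = df_demo[df_demo.isnull().any(axis=1)]

print(missing_count)
print(missing_pct)
print(incomplete_rows)
```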
Removing Missing Values
df_clean = df.dropna()
This removes every row that contains at least one missing value and returns the result as a new DataFrame; df itself is unchanged.
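Dropping every incomplete row can discard a lot of usable data. dropna() accepts a few parameters that make the removal more selective. Here is a sketch on invented data (df_demo and its values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical records; row 2 is completely empty
df_demo = pd.DataFrame({
    "Drug": ["Aspirin", None, None, "Digoxin"],
    "Dose": [100.0, 50.0, np.nan, np.nan],
    "Reaction": ["Rash", "Nausea", None, "Mild"],
})

# Default: drop a row if ANY value is missing
any_missing = df_demo.dropna()

# Drop a row only if ALL of its values are missing
all_missing = df_demo.dropna(how="all")

# Drop a row only if the listed column(s) are missing
need_drug = df_demo.dropna(subset=["Drug"])

# Keep rows that have at least 2 non-missing values
at_least_two = df_demo.dropna(thresh=2)

print(len(any_missing), len(all_missing), len(need_drug), len(at_least_two))
```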
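Filling Missing Values
Fill with Mean
# Assign the result back; calling fillna(..., inplace=True) on a selected column
# triggers a warning in recent pandas versions and may not update df
df["Dose"] = df["Dose"].fillna(df["Dose"].mean())
Fill with Constant
df.fillna(0, inplace=True)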
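A single mean taken across all drugs can produce an unrealistic fill value when drugs are dosed on very different scales. One possible alternative, sketched here on invented data, is to fill each gap with the mean dose of the same drug. The names df_demo, overall_filled, and group_filled are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical doses for two drugs on very different scales
df_demo = pd.DataFrame({
    "Drug": ["Aspirin", "Aspirin", "Aspirin", "Digoxin", "Digoxin"],
    "Dose": [100.0, 300.0, np.nan, 0.25, np.nan],
})

# Option 1: one mean for the whole column
overall_filled = df_demo["Dose"].fillna(df_demo["Dose"].mean())

# Option 2: mean per drug, aligned row by row with the original table
group_mean = df_demo.groupby("Drug")["Dose"].transform("mean")
group_filled = df_demo["Dose"].fillna(group_mean)

print(overall_filled)
print(group_filled)
```

With the overall mean, the missing Digoxin dose is filled with a value far above the other Digoxin entry; the per-drug mean keeps each fill on the scale of its own drug.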
Clinical Dataset Example
df = pd.read_csv("adr_data.csv")
# Fill missing dose with the column mean
df["Dose"] = df["Dose"].fillna(df["Dose"].mean())
# Remove rows with missing drug name
df.dropna(subset=["Drug"], inplace=True)
This ensures that every remaining record has a drug name and a dose, so the dataset is usable for analysis.
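The same steps can be tried without the file. The sketch below reads a few invented rows from a text string instead of adr_data.csv; the column layout and all values are assumptions.

```python
import io

import pandas as pd

# Stand-in for a small ADR file; the rows are invented for illustration
csv_text = """Patient,Drug,Dose,Reaction
P01,Aspirin,100,Rash
P02,Aspirin,,Nausea
P03,,200,Rash
P04,Ibuprofen,400,
"""

adr = pd.read_csv(io.StringIO(csv_text))

# Fill the missing dose with the column mean
adr["Dose"] = adr["Dose"].fillna(adr["Dose"].mean())

# Remove the row that has no drug name
adr = adr.dropna(subset=["Drug"])

print(adr)
```

Note that the mean is computed before the row without a drug name is removed, so that row's dose still contributes to the fill value. Swapping the two steps would give a different mean.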
Handling Inconsistent Data
# Replace incorrect values and assign the result back to the column
df["Reaction"] = df["Reaction"].replace("sev", "Severe")
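Inconsistent labels usually come in several variants at once (different capitalisation, stray spaces, abbreviations). One way to handle them, sketched here with invented labels and a hypothetical label_map, is to normalise the text first and then map each variant to a standard label:

```python
import pandas as pd

# Hypothetical severity labels as they might appear in a raw ADR report
reactions = pd.Series(["sev", "Severe ", "SEVERE", "mild", "Mod", None])

# Step 1: remove surrounding spaces and lowercase everything
cleaned = reactions.str.strip().str.lower()

# Step 2: map every known variant to one standard label
label_map = {
    "sev": "Severe",
    "severe": "Severe",
    "mod": "Moderate",
    "moderate": "Moderate",
    "mild": "Mild",
}
standardized = cleaned.replace(label_map)

print(standardized)
```

Values that are not in label_map are left unchanged, and missing entries stay missing, so unexpected labels remain visible for review.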
Memory Tricks
- isnull() → Find missing values
- dropna() → Remove rows with missing values
- fillna() → Fill missing values
Practice Exercise
Load a dataset and:
- Find missing values
- Fill missing dose with average
- Remove incomplete rows
Mini Project
Clean a patient dataset:
import pandas as pd
df = pd.read_csv("clinical_data.csv")
# Fill missing doses with the column mean, then drop any rows that are still incomplete
df["Dose"] = df["Dose"].fillna(df["Dose"].mean())
df.dropna(inplace=True)
print("Cleaned Data:")
print(df)
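After cleaning, it is worth confirming what actually changed. The sketch below repeats the mini project steps on an invented table (raw and its values are assumptions) and reports how many rows were removed and whether any missing values remain.

```python
import numpy as np
import pandas as pd

# Hypothetical patient table standing in for clinical_data.csv
raw = pd.DataFrame({
    "Patient": ["P01", "P02", "P03", "P04"],
    "Drug": ["Aspirin", None, "Digoxin", "Aspirin"],
    "Dose": [100.0, 50.0, np.nan, 300.0],
})

# Same cleaning steps as the mini project, applied to a copy
cleaned = raw.copy()
cleaned["Dose"] = cleaned["Dose"].fillna(cleaned["Dose"].mean())
cleaned = cleaned.dropna()

# Simple checks on the result
rows_removed = len(raw) - len(cleaned)
remaining_missing = int(cleaned.isnull().sum().sum())

print(f"Rows before: {len(raw)}, after: {len(cleaned)}, removed: {rows_removed}")
print(f"Missing values remaining: {remaining_missing}")
```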
MCQs
- Missing values are represented as:
a) 0
b) NaN
c) None
d) Blank
Answer: b
- Which function removes missing values?
a) fillna()
b) dropna()
c) replace()
d) remove()
Answer: b
- fillna() is used for:
a) Delete data
b) Fill missing data
c) Print data
d) Sort data
Answer: b
FAQs
Why is data cleaning important in pharmacy?
It ensures accurate clinical analysis and decision-making.
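What is the best method to handle missing data?
It depends on context: fill with the mean or median when the column is numeric and only a few values are missing, or remove the affected rows when the missing field is essential to the analysis.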
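A quick way to see why context matters is to compare the mean and the median on a column that contains one extreme value. This is a sketch on invented doses; the numbers are assumptions chosen only to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical doses: three similar values, one extreme entry, one missing
doses = pd.Series([10.0, 12.0, 11.0, 500.0, np.nan])

mean_fill = doses.mean()      # pulled upward by the extreme value
median_fill = doses.median()  # stays close to the typical doses

print("Mean:", mean_fill)
print("Median:", median_fill)

# Fill the gap with the median
filled = doses.fillna(median_fill)
print(filled)
```

Here the mean is far above most of the recorded doses, while the median stays near them, so the median is often the safer fill value when a column contains outliers.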