A collection of useful pandas snippets for time series analysis, including date filtering, period conversion, and merging by nearest match.
Handling time-series data efficiently is a core skill for data analysis. Whether you are filtering by year, converting to specific periods, or performing as-of (nearest-match) joins, pandas provides a robust set of tools. Below are some common patterns and techniques, often encountered in database-style coding challenges.
Extracting parts of a date, or filtering by specific time components, is a frequent need in reporting. Use the .dt accessor to reach datetime properties:

df['pay_month'] = df['pay_date'].dt.strftime('%Y-%m')
df.query('timestamp.dt.year == 2020')
df[df['order_date'].dt.to_period('M') == '2020-02']
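As a quick, self-contained sketch of these filters (the frame and column names here are purely illustrative), including the .between() range check covered below:

```python
import pandas as pd

# Hypothetical orders data for illustration
df = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'order_date': pd.to_datetime(
        ['2020-02-10', '2020-02-28', '2020-03-05', '2019-12-31']
    ),
})

# Year-based filter via the .dt accessor
in_2020 = df[df['order_date'].dt.year == 2020]

# Month-based filter via monthly periods
feb_2020 = df[df['order_date'].dt.to_period('M') == '2020-02']

# Range check with .between() (inclusive on both ends by default)
q1 = df[df['order_date'].between('2020-01-01', '2020-03-31')]

print(len(in_2020), len(feb_2020), len(q1))  # 3 2 3
```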
Several helpers come up repeatedly:

- pd.to_datetime(): converts strings or other formats to datetime objects.
- .between(): checks whether values fall within a range.
- pd.to_timedelta(): represents durations/differences in time.
- merge_asof(): powerful for "nearest match" joins. Unlike a standard join, which requires exact matches, merge_asof aligns data on the closest key (e.g., matching a transaction date to the price active at that time).

NaN values may appear if, say, some products had no sales. Always handle these with .fillna(0) or similar logic, and guard against dividing by zero.

To compare a value with "yesterday's" value, you can merge the table with itself using a calculated date offset.
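A minimal sketch of that self-merge idea, with illustrative column names (a hypothetical daily temperature table):

```python
import pandas as pd

# Hypothetical daily metrics for illustration
daily = pd.DataFrame({
    'record_date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
    'temperature': [10, 15, 12],
})

# Shift each date forward one day so "today" lines up with "yesterday"
shifted = daily.copy()
shifted['record_date'] = shifted['record_date'] + pd.to_timedelta(1, unit='D')

# Self-merge: each row pairs with the previous day's row (if any)
compared = pd.merge(
    daily, shifted,
    on='record_date',
    suffixes=('_today', '_yesterday'),
)

# Rows where today is warmer than yesterday
warmer = compared[compared['temperature_today'] > compared['temperature_yesterday']]
```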
In that self-merge, pd.to_timedelta(1, unit='D') adds or subtracts one day, and suffixes=('_today', '_yesterday') in pd.merge() keeps the columns distinguishable.

Sometimes you need to filter groups based on whether all their records fall within a certain range.
start_time = pd.to_datetime('2019-01-01')
end_time = pd.to_datetime('2019-03-31')
# Filter groups where the entire date range is within the window
df.groupby('product_id').filter(
    lambda x: x['sale_date'].min() >= start_time and x['sale_date'].max() <= end_time
)
This approach selects only products whose sales fall entirely within the first quarter, excluding any product with sales outside that window. Note that pd.to_datetime('2019-03-31') is midnight on March 31, so if sale_date carries a time component, sales later that day would be excluded.
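Finally, returning to the nearest-match join mentioned earlier: a minimal merge_asof sketch, assuming hypothetical trades and prices frames. Both inputs must be sorted by the join key:

```python
import pandas as pd

# Hypothetical price quotes and trades; merge_asof requires sorted keys
prices = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01 09:00', '2021-01-01 12:00',
                            '2021-01-01 15:00']),
    'price': [100.0, 101.5, 99.0],
})
trades = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01 10:30', '2021-01-01 14:00']),
    'qty': [5, 3],
})

# For each trade, pick the most recent price at or before its timestamp
# (the default direction='backward')
matched = pd.merge_asof(trades, prices, on='time')
print(matched['price'].tolist())  # [100.0, 101.5]
```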