Write a Pandas Snippet for a Specific Data Transformation
Generates a vectorized, commented pandas snippet for a precise transformation on your actual DataFrame.
Pandas has five ways to do everything and four of them are slow, deprecated, or trigger a SettingWithCopyWarning you will ignore until it corrupts a result. This prompt gets you a vectorized, commented snippet for one specific transformation instead of a vague tutorial. The key is pasting your actual column names and dtypes: the model writes very different code for a string column that needs parsing versus an already-typed datetime, and it stops guessing whether a column is numeric. Asking it to preserve the original DataFrame and handle NaN explicitly prevents the two bugs that waste the most debugging time. The two-sentence rationale at the end is not filler; it tells you whether the approach assumes sorted data or unique keys, which is exactly where transformations break on the full dataset after working on your sample. Use it for groupby aggregations, reshaping, window calculations, date handling, and messy joins where you know the output you want but not the cleanest call.
You are an expert Python data engineer. Write a clean pandas snippet that performs one transformation on a DataFrame named [DATAFRAME NAME].
Current columns and dtypes: [COLUMNS AND DTYPES]
Transformation I need: [TRANSFORMATION DESCRIPTION]
Desired output: [EXPECTED RESULT]
Requirements: use vectorized pandas operations (no row-by-row apply unless unavoidable), preserve the original DataFrame unless I say otherwise, handle NULL/NaN explicitly, and avoid chained-assignment warnings. Add inline comments on each step.
After the code, explain in two sentences why you chose this approach over a slower alternative, and note any assumption about input data (sorting, uniqueness, types).
What you can expect back
```python
import pandas as pd
# Work on a copy so the original stays intact
result = sales_df.copy()
# Sort so the per-user diff is chronological
result = result.sort_values(["user_id", "month"])
# Month-over-month change within each user; first month has no prior -> NaN -> 0
result["mrr_delta"] = (
    result.groupby("user_id")["mrr"].diff().fillna(0.0)
)
# Keep only the columns needed in the output
result = result[["user_id", "month", "mrr", "mrr_delta"]]
```
Why: groupby().diff() is vectorized in C and far faster than apply or a Python loop over users. Assumptions: there is at most one row per user per month, and month is a proper datetime so sorting orders correctly; if months can be missing, decide whether gaps should be reindexed before diffing.
Illustrative example; your results will vary by tool and inputs.
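The caveat about missing months can be made concrete. The following is a minimal sketch, assuming a hypothetical `sales_df` in which `month` holds month-start timestamps and user 1 has no February row. Treating a missing month as zero MRR is an assumption made for illustration; forward-filling or leaving the gap as NaN may suit your data better.

```python
import pandas as pd

# Hypothetical data: user 1 has no row for February
sales_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "month": pd.to_datetime(
        ["2024-01-01", "2024-03-01", "2024-01-01", "2024-02-01", "2024-03-01"]
    ),
    "mrr": [100.0, 150.0, 50.0, 50.0, 80.0],
})

# Naive diff: user 1's March row is compared directly with January
naive = sales_df.sort_values(["user_id", "month"]).copy()
naive["mrr_delta"] = naive.groupby("user_id")["mrr"].diff().fillna(0.0)

# Build the full user x month grid so gaps become explicit rows
all_months = pd.date_range(
    sales_df["month"].min(), sales_df["month"].max(), freq="MS"
)
full_index = pd.MultiIndex.from_product(
    [sales_df["user_id"].unique(), all_months], names=["user_id", "month"]
)
filled = (
    sales_df.set_index(["user_id", "month"])
    .reindex(full_index, fill_value=0.0)  # assumption: missing month means zero MRR
    .reset_index()
)
filled["mrr_delta"] = filled.groupby("user_id")["mrr"].diff().fillna(0.0)

print(naive)
print(filled)
```

In the naive version, user 1 appears to gain 50 in March; after reindexing, the same user shows a drop of 100 in February followed by a gain of 150. Which reading is right depends on what a missing row means in your data.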
Get sharper results
- Always paste dtypes, not just names; the model chooses .dt accessors versus string parsing based on them.
- If you get a SettingWithCopyWarning, paste it back and ask for a .loc or .assign rewrite.
- For big frames, ask whether the operation can run in a single groupby pass instead of multiple merges.
- Request a tiny synthetic DataFrame in the answer so you can run and verify the snippet immediately.
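To see why dtypes matter and what a `.loc` or `.assign` rewrite looks like, here is a small sketch on a synthetic frame. The `orders` frame and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame: order_date arrives as strings, not datetimes
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": ["2024-01-15", "2024-02-03", None],
    "amount": [20.0, 35.5, 12.0],
})
print(orders.dtypes)  # this is the output worth pasting into the prompt

# String column: parse first; errors="coerce" turns unparseable values into NaT
parsed = orders.assign(
    order_date=lambda d: pd.to_datetime(d["order_date"], errors="coerce")
)

# Typed datetime column: the .dt accessor works directly, no parsing step
with_month = parsed.assign(
    order_month=lambda d: d["order_date"].dt.to_period("M")
)

# .loc with a boolean mask is the explicit alternative to chained indexing
flagged = parsed.copy()
flagged["large_order"] = False
flagged.loc[flagged["amount"] > 30, "large_order"] = True
```

Both `.assign` and the `.copy()` plus `.loc` pattern leave `orders` untouched, which matches the prompt's requirement to preserve the original DataFrame.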
Adapt it for your case
Ask for the same transformation in Polars to compare speed and syntax on large data.
Paste existing pandas code and ask it to explain line by line and flag performance traps.
Request a few assert statements covering empty input, a single user, and a NaN row.
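As a rough idea of what such assert statements might look like, the sketch below wraps the example snippet in a hypothetical function `add_mrr_delta` and checks the three cases. The helper `make_frame` is also hypothetical and exists only to build typed test data.

```python
import numpy as np
import pandas as pd

def add_mrr_delta(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the example snippet, so it can be tested."""
    result = df.sort_values(["user_id", "month"]).copy()
    result["mrr_delta"] = result.groupby("user_id")["mrr"].diff().fillna(0.0)
    return result[["user_id", "month", "mrr", "mrr_delta"]]

def make_frame(rows):
    """Build a typed test frame from (user_id, month, mrr) tuples."""
    frame = pd.DataFrame(rows, columns=["user_id", "month", "mrr"])
    frame["month"] = pd.to_datetime(frame["month"])
    return frame.astype({"user_id": "int64", "mrr": "float64"})

# Empty input: columns survive, no rows, no exception
empty = add_mrr_delta(make_frame([]))
assert empty.empty and "mrr_delta" in empty.columns

# Single user, rows out of order: sorting fixes the order, first month gets 0
single = add_mrr_delta(
    make_frame([(1, "2024-02-01", 120.0), (1, "2024-01-01", 100.0)])
)
assert single["mrr_delta"].tolist() == [0.0, 20.0]

# NaN row: fillna(0.0) also zeroes every delta that touches the missing value
with_nan = add_mrr_delta(
    make_frame([
        (1, "2024-01-01", 100.0),
        (1, "2024-02-01", np.nan),
        (1, "2024-03-01", 130.0),
    ])
)
assert with_nan["mrr_delta"].tolist() == [0.0, 0.0, 0.0]
```

The NaN case is the informative one: a blanket `fillna(0.0)` hides the missing value, so a delta of zero no longer distinguishes "no change" from "unknown". If that distinction matters, ask for the first-month fill to be applied separately.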
Common questions
Why does it keep telling me to .copy()?
Transforming a slice of a DataFrame in place is the classic source of SettingWithCopyWarning and silent partial updates; copying first makes behavior predictable.
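A short sketch of the pattern, using a hypothetical `sales_df` with a `plan` column invented for illustration:

```python
import pandas as pd

# Hypothetical frame used only for illustration
sales_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "plan": ["pro", "free", "pro"],
    "mrr": [100.0, 0.0, 80.0],
})

# pandas may warn when a filtered result is modified later;
# an explicit .copy() makes it an independent DataFrame
pro_users = sales_df[sales_df["plan"] == "pro"].copy()

# Safe to modify: only the copy changes
pro_users["mrr"] = pro_users["mrr"] * 1.1

# To update rows on purpose, address them in one .loc call
updated = sales_df.copy()
updated.loc[updated["plan"] == "pro", "mrr"] *= 1.1
```

After this runs, `sales_df` still holds the original values, `pro_users` holds only the adjusted pro rows, and `updated` holds all rows with the pro rows adjusted.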
It used apply even though I said to avoid it. Why?
Some transformations have no clean vectorized form; the prompt allows apply "unless unavoidable" and asks the model to justify its approach, so check the rationale to confirm apply was truly necessary.
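When the rationale is unconvincing, it helps to know what a vectorized replacement can look like. The sketch below compares a row-wise `apply` with `np.select` for a conditional label; the `tier` function and the threshold of 100 are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mrr": [0.0, 45.0, 250.0, np.nan]})

# Row-wise apply: runs a Python function once per row
def tier(row):
    if pd.isna(row["mrr"]):
        return "unknown"
    if row["mrr"] >= 100:
        return "high"
    return "low"

slow = df.apply(tier, axis=1)

# Vectorized equivalent: np.select evaluates whole-column conditions in order
conditions = [df["mrr"].isna(), df["mrr"] >= 100]
fast = pd.Series(
    np.select(conditions, ["unknown", "high"], default="low"),
    index=df.index,
)

# Both approaches should agree row for row
assert slow.tolist() == fast.tolist()
```

If the model's `apply` can be expressed as a handful of column-level conditions like this, it was probably avoidable; logic that depends on parsing free text or calling an external function per row is a more plausible exception.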
Will this work on millions of rows?
Vectorized groupby operations scale well, but if memory is tight ask for the Polars or chunked variant and test on a sample first.