Write a Pandas Snippet for a Specific Data Transformation

Generates a vectorized, commented pandas snippet for a precise transformation on your actual DataFrame.

Optimized for
ChatGPT, Claude
§ When to use this

Pandas has five ways to do everything and four of them are slow, deprecated, or trigger a SettingWithCopyWarning you will ignore until it corrupts a result. This prompt gets you a vectorized, commented snippet for one specific transformation instead of a vague tutorial. The key is pasting your actual column names and dtypes: the model writes very different code for a string column that needs parsing versus an already-typed datetime, and it stops guessing whether a column is numeric. Asking it to preserve the original DataFrame and handle NaN explicitly prevents the two bugs that waste the most debugging time. The two-sentence rationale at the end is not filler; it tells you whether the approach assumes sorted data or unique keys, which is exactly where transformations break on the full dataset after working on your sample. Use it for groupby aggregations, reshaping, window calculations, date handling, and messy joins where you know the output you want but not the cleanest call.
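
One way to collect the column names and dtypes for the prompt is to print the frame's `dtypes`. This is a minimal sketch; `sales_df` and its columns are hypothetical stand-ins for your own data.

```python
import pandas as pd

# Hypothetical frame standing in for your real data
sales_df = pd.DataFrame(
    {
        "user_id": [1, 1, 2],
        "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01"]),
        "mrr": [10.0, 15.0, 20.0],
    }
)

# One "column  dtype" line per column, ready to paste into [COLUMNS AND DTYPES]
columns_and_dtypes = sales_df.dtypes.to_string()
print(columns_and_dtypes)
```

If `month` shows up as `object` or a string dtype rather than a datetime, say so in the prompt; that is the case where the model needs to add a parsing step.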

§ The Prompt

Fill in the bracketed fields, then copy the prompt into your tool.
You are an expert Python data engineer. Write a clean pandas snippet that performs one transformation on a DataFrame named [DATAFRAME NAME]. Current columns and dtypes:
[COLUMNS AND DTYPES]

Transformation I need: [TRANSFORMATION DESCRIPTION]
Desired output: [EXPECTED RESULT]

Requirements: use vectorized pandas operations (no row-by-row apply unless unavoidable), preserve the original DataFrame unless I say otherwise, handle NULL/NaN explicitly, and avoid chained-assignment warnings. Add inline comments on each step. After the code, explain in two sentences why you chose this approach over a slower alternative, and note any assumption about input data (sorting, uniqueness, types).
§ Example Output

What you can expect back

```python
import pandas as pd

# Work on a copy so the original stays intact
result = sales_df.copy()

# Sort so the per-user diff is chronological
result = result.sort_values(["user_id", "month"])

# Month-over-month change within each user; first month has no prior -> NaN -> 0
result["mrr_delta"] = (
    result.groupby("user_id")["mrr"].diff().fillna(0.0)
)

# Keep only the columns needed in the output, in a fixed order
result = result[["user_id", "month", "mrr", "mrr_delta"]]
```

Why: groupby().diff() is vectorized in C and far faster than apply or a Python loop over users. Assumptions: there is at most one row per user per month, and month is a proper datetime so sorting orders correctly; if months can be missing, decide whether gaps should be reindexed before diffing.

Illustrative example — your results will vary by tool and inputs.

§ Pro Tips

Get sharper results

  • 01 Always paste dtypes, not just names; the model chooses .dt accessors versus string parsing based on them.
  • 02 If you get a SettingWithCopyWarning, paste it back and ask for a .loc or .assign rewrite.
  • 03 For big frames, ask whether the operation can run in a single groupby pass instead of multiple merges.
  • 04 Request a tiny synthetic DataFrame in the answer so you can run and verify the snippet immediately.
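
Tips 02 and 04 can be combined in one check. The sketch below is only an illustration: the synthetic values are invented, and the `.assign` chain is one possible rewrite of the example output, not the only correct one.

```python
import pandas as pd

# Tiny synthetic frame, deliberately out of order (all values are made up)
sales_df = pd.DataFrame(
    {
        "user_id": [2, 1, 1, 2, 1],
        "month": pd.to_datetime(
            ["2024-02-01", "2024-01-01", "2024-03-01", "2024-01-01", "2024-02-01"]
        ),
        "mrr": [50.0, 10.0, 12.0, 40.0, 15.0],
    }
)

# .assign returns a new frame, so sales_df is never modified
result = (
    sales_df.sort_values(["user_id", "month"])
    .assign(mrr_delta=lambda d: d.groupby("user_id")["mrr"].diff().fillna(0.0))
    .reset_index(drop=True)
)

print(result)
```

With five rows you can work out the expected deltas by hand (user 1: 0, 5, -3; user 2: 0, 10) and compare them against the printed frame before trusting the snippet on real data.
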
§ Variations

Adapt it for your case

Polars version

Ask for the same transformation in Polars to compare speed and syntax on large data.

Explain a snippet I have

Paste existing pandas code and ask it to explain line by line and flag performance traps.

Add tests

Request a few assert statements covering empty input, a single user, and a NaN row.
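
As a rough idea of what such tests might look like, the sketch below wraps the example snippet in a hypothetical helper, `add_mrr_delta`, and asserts on the three cases. The helper name and the test data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_mrr_delta(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the example snippet; returns a new frame."""
    out = df.sort_values(["user_id", "month"]).copy()
    out["mrr_delta"] = out.groupby("user_id")["mrr"].diff().fillna(0.0)
    return out[["user_id", "month", "mrr", "mrr_delta"]].reset_index(drop=True)


# Empty input: the output has the new column and no rows
empty = pd.DataFrame(
    {
        "user_id": pd.Series(dtype="int64"),
        "month": pd.Series(dtype="datetime64[ns]"),
        "mrr": pd.Series(dtype="float64"),
    }
)
empty_out = add_mrr_delta(empty)
assert len(empty_out) == 0
assert "mrr_delta" in empty_out.columns

# Single user: the first month has no prior value, so its delta is 0.0
single = pd.DataFrame(
    {
        "user_id": [1, 1],
        "month": pd.to_datetime(["2024-01-01", "2024-02-01"]),
        "mrr": [10.0, 15.0],
    }
)
assert add_mrr_delta(single)["mrr_delta"].tolist() == [0.0, 5.0]

# NaN row: a missing mrr makes the neighbouring diffs NaN, and fillna turns them into 0.0
with_nan = pd.DataFrame(
    {
        "user_id": [1, 1, 1],
        "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
        "mrr": [10.0, np.nan, 30.0],
    }
)
nan_out = add_mrr_delta(with_nan)
assert nan_out["mrr_delta"].tolist() == [0.0, 0.0, 0.0]
```

The NaN case only pins down what the snippet currently does. If a missing `mrr` should not be reported as a zero change, that is a requirement to state in the prompt.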

Tags: #pandas #python #etl
§ FAQ

Common questions

Why does it keep telling me to .copy()?

Transforming a slice of a DataFrame in place is the classic source of SettingWithCopyWarning and silent partial updates; copying first makes behavior predictable.
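
A minimal sketch of the pattern, using a hypothetical `sales_df`: take the filtered slice, copy it explicitly, then update it with a single `.loc` call. Whether the uncopied version warns at all depends on your pandas version and its copy-on-write settings, so treat this as an illustration rather than a reproduction of the warning.

```python
import pandas as pd

# Hypothetical data for illustration
sales_df = pd.DataFrame({"user_id": [1, 2, 3], "mrr": [10.0, 0.0, 30.0]})

# Explicit copy: `paying` is an independent frame, not a view of sales_df
paying = sales_df[sales_df["mrr"] > 0].copy()

# One .loc assignment instead of chained indexing like paying["mrr"][mask] = ...
paying.loc[paying["user_id"] == 3, "mrr"] = 35.0

print(paying)
print(sales_df)  # unchanged
```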

It used apply even though I said to avoid it. Why?

Some transformations have no clean vectorized form; the prompt allows apply 'unless unavoidable' and asks the model to explain its choice, so check the rationale to confirm apply was truly necessary.

Will this work on millions of rows?

Vectorized groupby operations scale well, but if memory is tight, ask for a Polars version or a chunked variant, and test on a sample first.
