Entry № 222 · data

Write a Pandas Snippet for a Specific Data Transformation

Generates a vectorized, commented pandas snippet for a precise transformation on your actual DataFrame.

§ When to use this

Pandas has five ways to do everything and four of them are slow, deprecated, or trigger a SettingWithCopyWarning you will ignore until it corrupts a result. This prompt gets you a vectorized, commented snippet for one specific transformation instead of a vague tutorial. The key is pasting your actual column names and dtypes: the model writes very different code for a string column that needs parsing versus an already-typed datetime, and it stops guessing whether a column is numeric. Asking it to preserve the original DataFrame and handle NaN explicitly prevents the two bugs that waste the most debugging time. The two-sentence rationale at the end is not filler; it tells you whether the approach assumes sorted data or unique keys, which is exactly where transformations break on the full dataset after working on your sample. Use it for groupby aggregations, reshaping, window calculations, date handling, and messy joins where you know the output you want but not the cleanest call.

§ The Prompt

§ Customize

Dataframe name [DATAFRAME NAME]: The variable name so the code drops straight in, e.g. 'sales_df' or 'events'.

Columns and dtypes [COLUMNS AND DTYPES]: Each column with its dtype, e.g. 'user_id: int64, plan: object, signup_ts: datetime64[ns], mrr: float64'.

Transformation description [TRANSFORMATION DESCRIPTION]: The operation in words, e.g. 'compute each user's month-over-month MRR change'.

Expected result [EXPECTED RESULT]: What the output should look like, e.g. 'one row per user per month with columns user_id, month, mrr, mrr_delta'.
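If typing the dtypes by hand is tedious, a short snippet can build the [COLUMNS AND DTYPES] line for you. This is a minimal sketch: the `sales_df` frame and its values are made up for illustration, and the exact dtype strings (for example the datetime resolution or the string dtype) can differ between pandas versions.

```python
import pandas as pd

# Hypothetical frame standing in for your real data
sales_df = pd.DataFrame(
    {
        "user_id": [1, 2],
        "plan": ["pro", "free"],
        "signup_ts": pd.to_datetime(["2024-01-05", "2024-02-10"]),
        "mrr": [49.0, 0.0],
    }
)

# Join "name: dtype" pairs into one line for the [COLUMNS AND DTYPES] field
columns_and_dtypes = ", ".join(
    f"{name}: {dtype}" for name, dtype in sales_df.dtypes.items()
)
print(columns_and_dtypes)
```

Paste the printed line into the field as-is; it reflects what pandas actually inferred, which is the point of supplying dtypes at all.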

You are an expert Python data engineer. Write a clean pandas snippet that performs one transformation on a DataFrame named [DATAFRAME NAME]. Current columns and dtypes:
[COLUMNS AND DTYPES]

Transformation I need: [TRANSFORMATION DESCRIPTION]
Desired output: [EXPECTED RESULT]

Requirements: use vectorized pandas operations (no row-by-row apply unless unavoidable), preserve the original DataFrame unless I say otherwise, handle NULL/NaN explicitly, and avoid chained-assignment warnings. Add inline comments on each step. After the code, explain in two sentences why you chose this approach over a slower alternative, and note any assumption about input data (sorting, uniqueness, types).

§ Example Output

What you can expect back

```python
import pandas as pd

# Work on a copy so the original stays intact
result = sales_df.copy()

# Sort so the per-user diff is chronological
result = result.sort_values(["user_id", "month"])

# Month-over-month change within each user; first month has no prior -> NaN -> 0
result["mrr_delta"] = (
    result.groupby("user_id")["mrr"].diff().fillna(0.0)
)

# Keep only the columns requested in the expected result
result = result[["user_id", "month", "mrr", "mrr_delta"]]
```

Why: groupby().diff() is vectorized in C and far faster than apply or a Python loop over users. Assumptions: there is at most one row per user per month, and month is a proper datetime so sorting orders correctly; if months can be missing, decide whether gaps should be reindexed before diffing.

Illustrative example — your results will vary by tool and inputs.
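To check a snippet like this before trusting it, you can run it against a few rows you can verify by hand. The sketch below assumes a made-up `sales_df` with two users and rows deliberately out of order; the `reset_index(drop=True)` at the end is an addition for readable output and is not part of the example above.

```python
import pandas as pd

# Hypothetical sample data: two users, rows deliberately out of order
sales_df = pd.DataFrame(
    {
        "user_id": [2, 1, 1, 2, 1],
        "month": pd.to_datetime(
            ["2024-02-01", "2024-03-01", "2024-01-01", "2024-01-01", "2024-02-01"]
        ),
        "mrr": [80.0, 70.0, 50.0, 100.0, 60.0],
    }
)

# Same steps as the example: copy, sort, per-user diff, select columns
result = sales_df.copy()
result = result.sort_values(["user_id", "month"])
result["mrr_delta"] = result.groupby("user_id")["mrr"].diff().fillna(0.0)
result = result[["user_id", "month", "mrr", "mrr_delta"]].reset_index(drop=True)

print(result)

# The original frame keeps its row order and gains no new column
assert "mrr_delta" not in sales_df.columns
```

User 1 goes 50, 60, 70 and user 2 goes 100, 80, so the deltas are easy to confirm by eye.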

§ Pro Tips

Get sharper results

01 Always paste dtypes, not just names; the model chooses .dt accessors versus string parsing based on them.
02 If you get a SettingWithCopyWarning, paste it back and ask for a .loc or .assign rewrite.
03 For big frames, ask whether the operation can run in a single groupby pass instead of multiple merges.
04 Request a tiny synthetic DataFrame in the answer so you can run and verify the snippet immediately.
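For tip 02, this is roughly what a .loc or .assign rewrite looks like. It is a sketch on a made-up `events` frame; the chained-indexing line that tends to trigger the warning is left commented out on purpose.

```python
import pandas as pd

# Hypothetical data for illustration
events = pd.DataFrame(
    {"user_id": [1, 2, 3], "plan": ["pro", "free", "pro"], "mrr": [49.0, 0.0, 99.0]}
)

# Chained indexing that tends to trigger the warning (do not use):
# events[events["plan"] == "pro"]["mrr"] = 0.0

# .loc rewrite: one indexing call selects rows and column together,
# applied to a copy so the original frame is untouched
discounted = events.copy()
discounted.loc[discounted["plan"] == "pro", "mrr"] *= 0.9

# .assign rewrite: returns a new frame and never mutates the input
flagged = events.assign(is_paid=events["mrr"] > 0)
```

Both rewrites leave `events` unchanged, which matches the prompt's requirement to preserve the original DataFrame.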

§ Variations

Adapt it for your case

Polars version

Ask for the same transformation in Polars to compare speed and syntax on large data.

Explain a snippet I have

Paste existing pandas code and ask it to explain line by line and flag performance traps.

Add tests

Request a few assert statements covering empty input, a single user, and a NaN row.
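The asserts you get back might look something like the sketch below. The `add_mrr_delta` and `make_frame` helpers are hypothetical names that wrap the example transformation so it can be called on different inputs. The NaN case records what `fillna(0.0)` currently does, not necessarily what you want.

```python
import pandas as pd


def add_mrr_delta(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the example transformation."""
    # sort_values returns a new frame, so the input is not mutated
    out = df.sort_values(["user_id", "month"])
    out["mrr_delta"] = out.groupby("user_id")["mrr"].diff().fillna(0.0)
    return out[["user_id", "month", "mrr", "mrr_delta"]].reset_index(drop=True)


def make_frame(rows):
    """Build a typed frame from (user_id, month, mrr) tuples."""
    df = pd.DataFrame(rows, columns=["user_id", "month", "mrr"])
    df["month"] = pd.to_datetime(df["month"])
    return df.astype({"user_id": "int64", "mrr": "float64"})


# Empty input: no rows, but the output columns still exist
empty_out = add_mrr_delta(make_frame([]))
assert len(empty_out) == 0
assert list(empty_out.columns) == ["user_id", "month", "mrr", "mrr_delta"]

# Single user: the first month has no prior value, so its delta is 0
single_out = add_mrr_delta(make_frame([(1, "2024-01-01", 50.0), (1, "2024-02-01", 65.0)]))
assert single_out["mrr_delta"].tolist() == [0.0, 15.0]

# NaN row: a missing mrr makes its own delta and the next one NaN,
# and fillna(0.0) then turns both into 0
nan_out = add_mrr_delta(
    make_frame([(1, "2024-01-01", 50.0), (1, "2024-02-01", float("nan")), (1, "2024-03-01", 70.0)])
)
assert nan_out["mrr_delta"].tolist() == [0.0, 0.0, 0.0]
```

The last assert is worth reading twice: a zero delta there hides a missing value, so decide whether you want the NaN to survive instead.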

Use For — Tasks

Analyzing Data

Tags: #pandas #python #etl

§ FAQ

Common questions

Why does it keep telling me to .copy()?

Transforming a slice of a DataFrame in place is the classic source of SettingWithCopyWarning and silent partial updates; copying first makes behavior predictable.

It used apply even though I said avoid it, why?

Some transformations have no clean vectorized form; the prompt allows apply 'unless unavoidable' and asks for a rationale after the code, so check that rationale to confirm apply was truly necessary.
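When the rationale does not convince you, compare the apply version against a vectorized candidate on a few rows. The sketch below uses a made-up `orders` frame and a made-up coupon rule; it only shows the shape of the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
orders = pd.DataFrame(
    {
        "quantity": [2, 5, 1],
        "unit_price": [9.99, 4.50, 120.0],
        "coupon": [None, "SAVE10", None],
    }
)

# Row-by-row version: calls a Python function once per row
slow = orders.apply(
    lambda row: row["quantity"]
    * row["unit_price"]
    * (0.9 if row["coupon"] == "SAVE10" else 1.0),
    axis=1,
)

# Vectorized version: whole-column arithmetic plus np.where for the condition
discount = np.where(orders["coupon"] == "SAVE10", 0.9, 1.0)
fast = orders["quantity"] * orders["unit_price"] * discount

# Both versions should agree before you swap one for the other
assert np.allclose(slow, fast)
```

A conditional like this usually maps onto `np.where` or boolean masks; apply is more defensible when each row needs logic that has no column-wise equivalent.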

Will this work on millions of rows?

Vectorized groupby operations scale well, but if memory is tight ask for the Polars or chunked variant and test on a sample first.
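A chunked variant might look like the following sketch, which assumes the data arrives as CSV and that the aggregation is a per-user sum. The in-memory `csv_data` stands in for a real file path, and the chunk size of 2 is only there to force several chunks.

```python
import io

import pandas as pd

# Hypothetical CSV held in memory; in practice this would be a file path
csv_data = io.StringIO("user_id,mrr\n1,50.0\n2,100.0\n1,60.0\n2,80.0\n3,20.0\n")

# Aggregate each chunk separately so only one chunk is in memory at a time
partials = [
    chunk.groupby("user_id")["mrr"].sum()
    for chunk in pd.read_csv(csv_data, chunksize=2)
]

# Combine the per-chunk partial sums into the final per-user totals
total_mrr = pd.concat(partials).groupby(level=0).sum()
print(total_mrr)
```

This works because sums can be combined across chunks. Order-dependent operations such as a per-user diff need more care, since a user's consecutive rows can land in different chunks.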

§ Related Entries

You may also need

№ 225 · data

Write a Precise, Unambiguous Metric Definition

Produces a precise metric definition with formula, grain, inclusion rules, and resolved edge cases.

For: chatgpt · claude

№ 226 · data

Build a Data-Cleaning Checklist for a New Dataset

Generates a prioritized, column-specific data-cleaning checklist tailored to your dataset and intended analysis.

For: chatgpt · claude

№ 223 · data

Choose the Right Chart Type for Your Data

Recommends the best and runner-up chart type for your message, variables, and audience with encoding guidance.

For: chatgpt · claude

№ 224 · data

Interpret an A/B Test Result Without Overclaiming

Interprets A/B test numbers honestly, flags validity risks, and gives a ship or keep-running recommendation.

For: chatgpt · claude