Pandas

Columns

Drop columns

df = df.drop(columns=['a', 'b'])
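A minimal runnable sketch with a toy DataFrame (the column names are placeholders). drop returns a new DataFrame unless inplace=True is passed, which is why the result is reassigned. It also raises a KeyError for a label that does not exist, unless errors='ignore' is given.

```python
import pandas as pd

# Toy frame; 'a', 'b', 'c' are placeholder column names.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# drop() returns a new DataFrame, so reassign the result.
df = df.drop(columns=['a', 'b'])
print(list(df.columns))  # ['c']

# Dropping a missing label raises KeyError by default; errors='ignore' skips it.
df = df.drop(columns=['a', 'missing'], errors='ignore')
print(list(df.columns))  # ['c']
```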

Rename columns

df = df.rename(columns={'a': 'b'})
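A short sketch of the two common forms of rename, using placeholder column names. In the dict form, keys are old names and values are new names; keys that do not match any column are silently ignored unless errors='raise' is passed. A function can also be given, which is applied to every column label.

```python
import pandas as pd

# Placeholder columns for illustration.
df = pd.DataFrame({'a': [1], 'B': [2]})

# Dict form: old name -> new name; columns not in the dict are left unchanged.
df = df.rename(columns={'a': 'x'})
print(list(df.columns))  # ['x', 'B']

# Function form: applied to every label, here to lowercase all names.
df = df.rename(columns=str.lower)
print(list(df.columns))  # ['x', 'b']
```

Note that renaming a column to a name that already exists (for example 'a' to 'b' when 'b' is present) leaves the frame with two columns called 'b'.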

Drop duplicates

df = df.drop_duplicates(subset=['col_name'])
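A sketch of how the keep parameter changes which row survives when only 'col_name' is compared (the data is made up for illustration). By default the first occurrence is kept; the surviving rows keep their original index labels unless ignore_index=True is passed.

```python
import pandas as pd

# Illustrative data: 'x' appears twice in col_name.
df = pd.DataFrame({'col_name': ['x', 'x', 'y'], 'val': [1, 2, 3]})

# Default keep='first': the first row for each col_name value survives.
first = df.drop_duplicates(subset=['col_name'])
print(first['val'].tolist())  # [1, 3]

# keep='last': the last row for each value survives.
last = df.drop_duplicates(subset=['col_name'], keep='last')
print(last['val'].tolist())  # [2, 3]

# keep=False: every row whose col_name is duplicated is dropped.
unique_only = df.drop_duplicates(subset=['col_name'], keep=False)
print(unique_only['val'].tolist())  # [3]
```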

Check types

df.dtypes
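df.dtypes is a Series that maps each column name to its dtype, so a single column's type can be looked up by name. A minimal sketch with made-up columns, also showing astype to convert a column whose dtype is not what you expected:

```python
import pandas as pd

# Illustrative columns: integers, floats and strings.
df = pd.DataFrame({'n': [1, 2], 'f': [0.5, 1.5], 's': ['a', 'b']})

# One line per column: name and dtype.
print(df.dtypes)

# Look up a single column's dtype, then convert it.
print(df.dtypes['n'])  # int64
df['n'] = df['n'].astype('float64')
print(df.dtypes['n'])  # float64
```

The dtype reported for the string column depends on the pandas version (object in older releases), so it is not shown here.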

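Disk I/O

Parquet: pandas reads and writes Parquet through the pyarrow engine by default (engine='auto' falls back to fastparquet if pyarrow is not installed). pyarrow wraps the Arrow C++ library, which releases the Python GIL during much of the encoding and decoding work, so it can use multiple threads.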

Shuffle

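Shuffle all rows (sample returns a new, reordered DataFrame, so the result is reassigned rather than modified in place; reset_index(drop=True) renumbers the index from 0):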

df = df.sample(frac=1).reset_index(drop=True)
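A small sketch of the same idiom with a fixed seed. frac=1 samples every row without replacement, which amounts to a random permutation; random_state (an optional addition, not in the snippet above) makes the shuffle reproducible. drop=True discards the old index instead of adding it back as a column.

```python
import pandas as pd

df = pd.DataFrame({'v': range(5)})

# Random permutation of all rows, reproducible because of the fixed seed.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Same values in a new order, with a fresh 0..n-1 index and no extra 'index' column.
print(sorted(shuffled['v']) == list(range(5)))  # True
print(shuffled.index.tolist())  # [0, 1, 2, 3, 4]
```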