Pandas
Columns
Drop columns
df = df.drop(columns=['a', 'b'])
Rename columns
df = df.rename(columns={'a': 'b'})
Drop duplicates
df = df.drop_duplicates(subset=['col_name'])
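Which of the duplicate rows survives is controlled by the keep parameter. A minimal sketch on a made-up frame (the column v and the sample values are illustrative assumptions, not from the notes above):

```python
import pandas as pd

# Toy data: 'x' appears twice in col_name.
df = pd.DataFrame({"col_name": ["x", "x", "y"], "v": [1, 2, 3]})

# Default keep='first': the first row of each duplicate group is kept.
first = df.drop_duplicates(subset=["col_name"])

# keep='last': the last row of each duplicate group is kept.
last = df.drop_duplicates(subset=["col_name"], keep="last")

# keep=False: every row that has a duplicate is dropped.
none = df.drop_duplicates(subset=["col_name"], keep=False)
```

The original index labels are preserved; chain .reset_index(drop=True) if a clean 0..n-1 index is wanted.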
Check types
df.dtypes
Disk I/O
Parquet: pandas reads and writes it through pyarrow by default.
pyarrow is built on the Arrow C++ library, so it can release the Python GIL while doing the heavy work.
Shuffle
Shuffle rows (sample returns a new DataFrame, not an in-place shuffle, so reassign):
df = df.sample(frac=1).reset_index(drop=True)
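For a reproducible shuffle, sample accepts a random_state seed. A small sketch on a toy frame (the seed value 0 and the data are arbitrary choices for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)})

# frac=1 samples every row without replacement, i.e. a permutation.
# random_state fixes the seed so the order is repeatable.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Same seed, same order.
again = df.sample(frac=1, random_state=0).reset_index(drop=True)
```

reset_index(drop=True) discards the old row labels; without it the shuffled frame keeps the original index in permuted order.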