Pandas apply
I often use Pandas to process NLP data. In many cases I want to create a new column from the information in an existing column, for example the number of characters or tokens in a text.
This is easy to do with the apply function of Pandas.
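As a minimal sketch of that single-column case: the DataFrame `df_text`, its column `text`, and the whitespace-based token count are assumptions made up for illustration, not part of the example that follows.

```python
import pandas as pd

# Hypothetical example data; the column name "text" is an assumption.
df_text = pd.DataFrame({"text": ["hello world", "pandas apply is handy"]})

# Series.apply calls the given function once per value in the column.
df_text["n_chars"] = df_text["text"].apply(len)

# Naive whitespace tokenization, used here only for illustration.
df_text["n_tokens"] = df_text["text"].apply(lambda s: len(s.split()))
```

A real NLP pipeline would probably swap the `split()` call for a proper tokenizer, but the apply pattern stays the same.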
However, a trickier case is when you want to apply a single function to two existing columns and create two new columns from its results.
Here I show you how it’s done.
import pandas as pd

df = pd.DataFrame(
    {
        "int1": [1, 2, 3],
        "int2": [11, 12, 13],
        "strings": ["string1", "string2", "string3"],
    }
)
df
| | int1 | int2 | strings |
|---|---|---|---|
| 0 | 1 | 11 | string1 |
| 1 | 2 | 12 | string2 |
| 2 | 3 | 13 | string3 |
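def add_and_multiply(x, y):
    # Return two results at once: the sum and the product.
    add_result = x + y
    multiply_result = x * y
    return add_result, multiply_result

# axis=1 passes each row to the lambda; *x unpacks it into (int1, int2).
# result_type="expand" turns the returned tuple into two separate columns.
df[["int1_plus_int2", "int1_times_int2"]] = df[["int1", "int2"]].apply(
    lambda x: add_and_multiply(*x),
    axis=1,
    result_type="expand",
)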
df
| | int1 | int2 | strings | int1_plus_int2 | int1_times_int2 |
|---|---|---|---|---|---|
| 0 | 1 | 11 | string1 | 12 | 11 |
| 1 | 2 | 12 | string2 | 14 | 24 |
| 2 | 3 | 13 | string3 | 16 | 39 |
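To see what result_type="expand" contributes, it can help to compare the output with and without it. The following is a rough sketch under my own assumptions: the names `demo`, `as_tuples`, and `expanded` are made up for illustration, and the function is a shortened copy of the one above.

```python
import pandas as pd

# Small stand-in for the DataFrame above (name "demo" is hypothetical).
demo = pd.DataFrame({"int1": [1, 2, 3], "int2": [11, 12, 13]})


def add_and_multiply(x, y):
    return x + y, x * y


# Without result_type, each returned tuple stays a single object,
# so the result is one Series of tuples.
as_tuples = demo.apply(
    lambda row: add_and_multiply(row["int1"], row["int2"]),
    axis=1,
)

# With result_type="expand", each tuple element becomes its own column.
# The new columns are labelled 0 and 1 until they are assigned to names.
expanded = demo.apply(
    lambda row: add_and_multiply(row["int1"], row["int2"]),
    axis=1,
    result_type="expand",
)
```

Accessing the values by column name, as in this sketch, avoids relying on column order the way `*x` does, at the cost of a slightly longer lambda.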