Pandas: Split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe
Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-27 with Solution
Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.
Test Data:
salesman_id sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ 0 5002 150.50 250.50 150.50 150.50 130.50 150.50 1 5005 270.65 170.65 270.65 270.65 270.65 270.65 2 5001 65.26 15.26 65.26 95.26 65.26 45.26 3 5003 110.50 110.50 110.50 210.50 310.50 110.50 4 5002 948.50 598.50 948.50 948.50 948.50 948.50 5 5001 2400.60 1400.60 2400.60 2400.60 2400.60 3400.60 6 5001 1760.00 2760.00 5760.00 760.00 760.00 5760.00 7 5006 2983.43 1983.43 1983.43 1983.43 1983.43 983.43 8 5003 480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 5002 1250.45 250.45 250.45 250.45 250.45 250.45 10 5007 75.29 75.29 75.29 75.29 75.29 75.29 11 5001 1045.60 3045.60 3045.60 3045.60 3045.60 3045.60
sale_jul sale_aug sale_sep sale_oct sale_nov sale_dec 0 950.50 150.50 150.50 150.50 150.50 150.50 1 270.65 70.65 270.65 270.65 270.65 70.65 2 65.26 65.26 65.26 65.26 95.26 65.26 3 210.50 110.50 110.50 110.50 110.50 110.50 4 948.50 948.50 948.50 948.50 948.50 948.50 5 2400.60 400.60 200.60 2400.60 2400.60 2400.60 6 5760.00 5760.00 5760.00 5760.00 5760.00 5760.00 7 983.43 1983.43 1983.43 1983.43 1983.43 1983.43 8 2480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 250.45 250.45 250.45 250.45 250.45 250.45 10 75.29 75.29 75.29 75.29 75.29 75.29 11 3045.60 3045.60 3045.60 3045.60 3045.60 3045.60
Sample Solution:
Python Code :
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'salesman_id': [5002,5005,5001,5003,5002,5001,5001,5006,5003,5002,5007,5001],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4, 1250.45, 75.29,1045.6],
'sale_feb':[250.5, 170.65, 15.26, 110.5, 598.5, 1400.6, 2760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_mar':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_apr':[150.5, 270.65, 95.26, 210.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_may':[130.5, 270.65, 65.26, 310.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jun':[150.5, 270.65, 45.26, 110.5, 948.5, 3400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jul':[950.5, 270.65, 65.26, 210.5, 948.5, 2400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_aug':[150.5, 70.65, 65.26, 110.5, 948.5, 400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_sep':[150.5, 270.65, 65.26, 110.5, 948.5, 200.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_oct':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_nov':[150.5, 270.65, 95.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_dec':[150.5, 70.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6]
})
print("Original Orders DataFrame:")
print(df)
print("\Result after group on salesman_id and apply different aggregate functions:")
df = df.groupby('salesman_id').agg(lambda x : x.sum() if x.name in ['sale_jan','sale_feb','sale_mar'] else x.mean())
print(df)
Sample Output:
Original Orders DataFrame: salesman_id sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ 0 5002 150.50 250.50 150.50 150.50 130.50 150.50 1 5005 270.65 170.65 270.65 270.65 270.65 270.65 2 5001 65.26 15.26 65.26 95.26 65.26 45.26 3 5003 110.50 110.50 110.50 210.50 310.50 110.50 4 5002 948.50 598.50 948.50 948.50 948.50 948.50 5 5001 2400.60 1400.60 2400.60 2400.60 2400.60 3400.60 6 5001 1760.00 2760.00 5760.00 760.00 760.00 5760.00 7 5006 2983.43 1983.43 1983.43 1983.43 1983.43 983.43 8 5003 480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 5002 1250.45 250.45 250.45 250.45 250.45 250.45 10 5007 75.29 75.29 75.29 75.29 75.29 75.29 11 5001 1045.60 3045.60 3045.60 3045.60 3045.60 3045.60 sale_jul sale_aug sale_sep sale_oct sale_nov sale_dec 0 950.50 150.50 150.50 150.50 150.50 150.50 1 270.65 70.65 270.65 270.65 270.65 70.65 2 65.26 65.26 65.26 65.26 95.26 65.26 3 210.50 110.50 110.50 110.50 110.50 110.50 4 948.50 948.50 948.50 948.50 948.50 948.50 5 2400.60 400.60 200.60 2400.60 2400.60 2400.60 6 5760.00 5760.00 5760.00 5760.00 5760.00 5760.00 7 983.43 1983.43 1983.43 1983.43 1983.43 1983.43 8 2480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 250.45 250.45 250.45 250.45 250.45 250.45 10 75.29 75.29 75.29 75.29 75.29 75.29 11 3045.60 3045.60 3045.60 3045.60 3045.60 3045.60 \Result after group on salesman_id and apply different aggregate functions: sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ salesman_id 5001 5271.46 7221.46 11271.46 1575.365000 1567.865 3062.865000 5002 2349.45 1099.45 1349.45 449.816667 443.150 449.816667 5003 590.90 2590.90 2590.90 1345.450000 1395.450 1295.450000 5005 270.65 170.65 270.65 270.650000 270.650 270.650000 5006 2983.43 1983.43 1983.43 1983.430000 1983.430 983.430000 5007 75.29 75.29 75.29 75.290000 75.290 75.290000 sale_jul sale_aug sale_sep sale_oct sale_nov \ salesman_id 5001 2817.865000 2317.865000 2267.865000 2817.865000 2825.365000 5002 716.483333 449.816667 449.816667 449.816667 449.816667 5003 1345.450000 1295.450000 1295.450000 1295.450000 1295.450000 5005 270.650000 70.650000 270.650000 270.650000 270.650000 5006 983.430000 1983.430000 1983.430000 1983.430000 1983.430000 5007 75.290000 75.290000 75.290000 75.290000 75.290000 sale_dec salesman_id 5001 2817.865000 5002 449.816667 5003 1295.450000 5005 70.650000 5006 1983.430000 5007 75.290000
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to split a given dataset, group by two columns and convert other columns of the dataframe into a dictionary with column header as key.
Next: Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
Python: Tips of the Day
Find current directory and file's directory:
To get the full path to the directory a Python file is contained in, write this in that file:
import os dir_path = os.path.dirname(os.path.realpath(__file__))
(Note that the incantation above won't work if you've already used os.chdir() to change your current working directory, since the value of the __file__ constant is relative to the current working directory and is not changed by an os.chdir() call.)
To get the current working directory use
import os cwd = os.getcwd()
Documentation references for the modules, constants and functions used above:
- The os and os.path modules.
- The __file__ constant
- os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")
- os.path.dirname(path) (returns "the directory name of pathname path")
- os.getcwd() (returns "a string representing the current working directory")
- os.chdir(path) ("change the current working directory to path")
Ref: https://bit.ly/3fy0R6m
- New Content published on w3resource:
- HTML-CSS Practical: Exercises, Practice, Solution
- Java Regular Expression: Exercises, Practice, Solution
- Scala Programming Exercises, Practice, Solution
- Python Itertools exercises
- Python Numpy exercises
- Python GeoPy Package exercises
- Python Pandas exercises
- Python nltk exercises
- Python BeautifulSoup exercises
- Form Template
- Composer - PHP Package Manager
- PHPUnit - PHP Testing
- Laravel - PHP Framework
- Angular - JavaScript Framework
- Vue - JavaScript Framework
- Jest - JavaScript Testing Framework