This post summarizes how to use the Python pandas library. Before starting, install the required libraries with the command below.

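pip install pandas numpy xlrd openpyxl

Each library is used as follows.

pandas: loads and edits the raw data
numpy: provides aggregate functions, such as the mean and variance, that are passed to pandas through aggfunc
xlrd: reads Excel files (xlrd 2.0 and later reads only the legacy .xls format)
openpyxl: lets pandas read .xlsx files such as the one used below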

In [1]:

# import modules
import pandas as pd
import numpy as np

The data is an Excel file that is publicly available on GitHub.

1. Loading the Excel data

In [2]:

# read data from web
df = pd.read_excel(
    "https://github.com/chris1610/pbpython/raw/master/data/salesfunnel.xlsx"
)
df.head()

Out[2]:

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

The Excel file we loaded contains sales data for each manager and wholesaler.
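If the download is not available, the five rows shown in Out[2] can be rebuilt by hand to follow along. The sketch below is only a stand-in for the full file, and the variable name `sample` is introduced here for illustration.

```python
import pandas as pd

# Hand-built copy of the five rows shown in Out[2]; this is NOT the full dataset.
sample = pd.DataFrame(
    {
        "Account": [714466, 714466, 714466, 737550, 146832],
        "Name": [
            "Trantow-Barrows",
            "Trantow-Barrows",
            "Trantow-Barrows",
            "Fritsch, Russel and Anderson",
            "Kiehn-Spinka",
        ],
        "Rep": ["Craig Booker"] * 4 + ["Daniel Hilton"],
        "Manager": ["Debra Henley"] * 5,
        "Product": ["CPU", "Software", "Maintenance", "CPU", "CPU"],
        "Quantity": [1, 1, 2, 1, 2],
        "Price": [30000, 10000, 5000, 35000, 65000],
        "Status": ["presented", "presented", "pending", "declined", "won"],
    }
)

# Inspect the column types before doing any aggregation.
print(sample.dtypes)
```

Because it holds only five rows, aggregates computed on `sample` will differ from the outputs in this post wherever the full file has more rows for the same group.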

In [3]:

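# Convert the Status column to the categorical dtype.
df["Status"] = df["Status"].astype("category")
# Set the category order with .cat.set_categories. The inplace argument
# was removed in pandas 2.0, so the result is assigned back to the column.
df["Status"] = df["Status"].cat.set_categories(
    ["won", "pending", "presented", "declined"]
)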
df.head()

Out[3]:

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won
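The category order does not change how df.head() looks, so its effect is easy to miss. As a minimal sketch, assuming a small hand-made Series named `status` (not part of the original notebook), sorting shows the difference between the default alphabetical order and the custom one.

```python
import pandas as pd

# Hypothetical stand-in for the Status column (values taken from Out[2]).
status = pd.Series(["presented", "presented", "pending", "declined", "won"])
status = status.astype("category")

# By default the categories of string data are sorted alphabetically.
default_order = list(status.cat.categories)

# Replace the default order with the one used in this post.
status = status.cat.set_categories(["won", "pending", "presented", "declined"])
custom_order = list(status.cat.categories)

# Sorting a categorical follows the category order, not the alphabet.
sorted_values = status.sort_values().tolist()
print(default_order)
print(custom_order)
print(sorted_values)
```

The same ordering is applied when a categorical column is used as a grouping key, which is why it is worth setting before building pivot tables.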

In [4]:

# Apply pd.pivot_table with Name, Rep, and Manager as the index. Rows that share the same
# index values are collapsed into a single row, and their numeric columns are averaged by default.
pd.pivot_table(df, index=["Name", "Rep", "Manager"])

Out[4]:

			Account	Price	Quantity
Name	Rep	Manager
Barton LLC	John Smith	Debra Henley	740150	35000	1.000000
Fritsch, Russel and Anderson	Craig Booker	Debra Henley	737550	35000	1.000000
Herman LLC	Cedric Moss	Fred Anderson	141962	65000	2.000000
Jerde-Hilpert	John Smith	Debra Henley	412290	5000	2.000000
Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	307599	7000	3.000000
Keeling LLC	Wendy Yule	Fred Anderson	688981	100000	5.000000
Kiehn-Spinka	Daniel Hilton	Debra Henley	146832	65000	2.000000
Koepp Ltd	Wendy Yule	Fred Anderson	729833	35000	2.000000
Kulas Inc	Daniel Hilton	Debra Henley	218895	25000	1.500000
Purdy-Kunde	Cedric Moss	Fred Anderson	163416	30000	1.000000
Stokes LLC	Cedric Moss	Fred Anderson	239344	7500	1.000000
Trantow-Barrows	Craig Booker	Debra Henley	714466	15000	1.333333
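With its default settings, pivot_table behaves much like a groupby followed by mean. The sketch below illustrates this on a hand-built subset (the frame `sample` is an assumption made for this example and covers only the rows shown in Out[2]); values is given explicitly so that only numeric columns are averaged.

```python
import pandas as pd

# Hand-built subset of the rows shown in Out[2]; not the full dataset.
sample = pd.DataFrame(
    {
        "Name": [
            "Trantow-Barrows",
            "Trantow-Barrows",
            "Trantow-Barrows",
            "Fritsch, Russel and Anderson",
            "Kiehn-Spinka",
        ],
        "Quantity": [1, 1, 2, 1, 2],
        "Price": [30000, 10000, 5000, 35000, 65000],
    }
)

# Default aggfunc is the mean, so duplicate names collapse to their average.
pivoted = pd.pivot_table(sample, index=["Name"], values=["Price", "Quantity"])

# The same averages computed with groupby.
grouped = sample.groupby("Name")[["Price", "Quantity"]].mean()

print(pivoted)
print(grouped)
```

For Trantow-Barrows both tables give a mean Price of 15000 and a mean Quantity of about 1.33, which agrees with the row in Out[4].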

In [5]:

# This time, drop Name and use only Manager and Rep as the index.
pd.pivot_table(df, index=["Manager", "Rep"])

Out[5]:

		Account	Price	Quantity
Manager	Rep
Debra Henley	Craig Booker	720237.0	20000.000000	1.250000
	Daniel Hilton	194874.0	38333.333333	1.666667
	John Smith	576220.0	20000.000000	1.500000
Fred Anderson	Cedric Moss	196016.5	27500.000000	1.250000
Fred Anderson	Wendy Yule	614061.5	44250.000000	3.000000

In [6]:

# The previous result groups rows by Manager first and lists each Rep within that manager.
# The values option restricts the output to the Price column.

pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"])

Out[6]:

		Price
Manager	Rep
Debra Henley	Craig Booker	20000.000000
	Daniel Hilton	38333.333333
	John Smith	20000.000000
Fred Anderson	Cedric Moss	27500.000000
Fred Anderson	Wendy Yule	44250.000000

In [7]:

# The aggfunc option replaces the default mean; here np.sum shows the total instead.
pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"], aggfunc=np.sum)

Out[7]:

		Price
Manager	Rep
Debra Henley	Craig Booker	80000
	Daniel Hilton	115000
	John Smith	40000
Fred Anderson	Cedric Moss	110000
Fred Anderson	Wendy Yule	177000
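aggfunc also accepts the name of an aggregation as a string, which avoids the warning that some recent pandas releases emit for NumPy callables such as np.sum. The following is a minimal sketch on a hand-built subset (`sample` is an assumption made for this example, covering only the rows shown in Out[2]).

```python
import pandas as pd

# Hand-built subset of the rows shown in Out[2]; not the full dataset.
sample = pd.DataFrame(
    {
        "Manager": ["Debra Henley"] * 5,
        "Rep": ["Craig Booker"] * 4 + ["Daniel Hilton"],
        "Price": [30000, 10000, 5000, 35000, 65000],
    }
)

# Pass the aggregation by name instead of as a NumPy function.
totals = pd.pivot_table(
    sample, index=["Manager", "Rep"], values=["Price"], aggfunc="sum"
)
print(totals)
```

On this subset the total for Craig Booker is 80000, the same as in Out[7]. Other reps have rows in the full file that are not part of the subset, so their totals differ.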

In [8]:

# aggfunc also accepts a list of functions; here np.sum and np.mean show both the total and the mean.
pd.pivot_table(
    df, index=["Manager", "Rep"], values=["Price"], aggfunc=[np.sum, np.mean]
)

Out[8]:

		sum	mean
		Price	Price
Manager	Rep
Debra Henley	Craig Booker	80000	20000.000000
	Daniel Hilton	115000	38333.333333
	John Smith	40000	20000.000000
Fred Anderson	Cedric Moss	110000	27500.000000
Fred Anderson	Wendy Yule	177000	44250.000000
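When a list is passed to aggfunc, the result has two header levels (function name, then column name), as Out[8] shows. If single-level column names are easier to work with, they can be flattened. This is a sketch on a hand-built subset; `sample` and the `Price_sum` style naming are choices made for this example, not part of the original notebook.

```python
import pandas as pd

# Hand-built subset of the rows shown in Out[2]; not the full dataset.
sample = pd.DataFrame(
    {
        "Manager": ["Debra Henley"] * 5,
        "Rep": ["Craig Booker"] * 4 + ["Daniel Hilton"],
        "Price": [30000, 10000, 5000, 35000, 65000],
    }
)

result = pd.pivot_table(
    sample, index=["Manager", "Rep"], values=["Price"], aggfunc=["sum", "mean"]
)

# Each column label is a (function, value) tuple, e.g. ("sum", "Price").
# Join the two levels into one flat name such as "Price_sum".
result.columns = [f"{value}_{func}" for func, value in result.columns]
print(result)
```

After flattening, a column can be selected with a plain string, for example result["Price_mean"].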