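Data Visualization
Basic requirements for data visualization
Choose a chart type that fits the purpose: line chart, bar chart, scatter plot, box plot, etc.
Choose a tool that fits the environment: code-based (R, Python) or program-based (a visualization tool).
Use colors and shapes that fit the context (domain).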
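Advantages of the code-based approach
Reproducibility (plots can be wrapped in functions)
Several graphs can be produced at once
Existing code can be reused by copy and paste (Ctrl + C/V)
No limit on data size, as long as the data fits in RAM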
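As a rough illustration of the first two points, the sketch below wraps a plot in a function and reuses it to fill several panels at once. The function name `plot_series` and the sample numbers are assumptions made up for this example.

```python
import matplotlib.pyplot as plt


def plot_series(ax, x, y, label, title):
    """Draw one labelled line on the given Axes and return the Axes."""
    ax.plot(x, y, label=label)
    ax.set_title(title)
    ax.legend()
    return ax


# Small sample data, used only for illustration.
days = [1, 2, 3, 4, 5]
series = {
    "Min Temp.": [20.7, 17.9, 18.8, 14.6, 15.8],
    "Max Temp.": [34.7, 28.9, 31.8, 25.6, 28.8],
}

# One call to subplots creates several Axes; the same function fills each one.
fig, axes = plt.subplots(nrows=1, ncols=len(series), figsize=(10, 4))
for ax, (name, values) in zip(axes, series.items()):
    plot_series(ax, days, values, label=name, title=name)
```

Call `plt.show()` afterwards to display the figure. Because the drawing logic lives in one function, a style change only has to be made in one place.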
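Notes on using Matplotlib
Use the object-oriented API.
Avoid the pyplot API where you can.
Even once you master it, its syntax differs from the other interfaces, so the skill has little use elsewhere.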
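One way to see the difference is a figure with two panels. This is a minimal sketch, and the panel names `left` and `right` are chosen only for this example: a pyplot call goes to whichever Axes is current, while an object-oriented call names its target.

```python
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# pyplot style: the call goes to whichever Axes is "current".
# After plt.subplots, that is the most recently created one (the right panel).
plt.title("set through pyplot")

# Object-oriented style: the target Axes is named explicitly.
left.set_title("set through the Axes object")
left.set_xlabel("x")
left.set_ylabel("y")
```

With a single panel the two styles look interchangeable, but as soon as a figure has several Axes the explicit form makes it clear which panel each call modifies.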
Loading the libraries

```python
import matplotlib
import seaborn as sns

# Print the installed versions.
print(matplotlib.__version__)
print(sns.__version__)
```
Drawing a first plot

```python
import matplotlib.pyplot as plt

dates = [
    '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
    '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'
]
min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]

# Create one figure with a single Axes, then draw both lines on it.
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 6))
ax.plot(dates, min_temperature, label="Min Temp.")
ax.plot(dates, max_temperature, label="Max Temp.")
ax.legend()
plt.show()
```
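Downloading stock data

```python
import yfinance as yf

# Download one year of daily AAPL prices (requires a network connection).
data = yf.download("AAPL", start="2019-08-01", end="2020-08-01")
# Opening prices. Depending on the yfinance version, this may be a
# one-column DataFrame rather than a Series.
ts = data['Open']
print(ts.head())
print(type(ts))
```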
[*********************100%***********************] 1 of 1 completed
2019-08-01 53.474998
2019-08-02 51.382500
2019-08-05 49.497501
2019-08-06 49.077499
2019-08-07 48.852501
Name: Open, dtype: float64
<class 'pandas.core.series.Series'>
pyplot style

```python
import matplotlib.pyplot as plt

# Every call acts on the implicit "current" figure and Axes.
plt.plot(ts)
plt.title("Stock Market of AAPL")
plt.xlabel("Date")
plt.ylabel("Open Price")
plt.show()
```
Drawing with the object-oriented API

```python
import matplotlib.pyplot as plt

# Create the Figure and Axes explicitly, then call methods on the Axes.
fig, ax = plt.subplots()
ax.plot(ts)
plt.show()
```
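To carry the title and axis labels over to the object-oriented form, each `plt.*` call has a `set_*` counterpart on the Axes. The sketch below assumes the five opening prices printed earlier, typed in by hand so that it runs without a network connection.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five opening prices printed earlier, entered by hand for illustration.
ts = pd.Series(
    [53.474998, 51.382500, 49.497501, 49.077499, 48.852501],
    index=pd.to_datetime(
        ["2019-08-01", "2019-08-02", "2019-08-05", "2019-08-06", "2019-08-07"]
    ),
    name="Open",
)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(ts.index, ts.values)
ax.set_title("Stock Market of AAPL")   # plt.title  -> ax.set_title
ax.set_xlabel("Date")                  # plt.xlabel -> ax.set_xlabel
ax.set_ylabel("Open Price")            # plt.ylabel -> ax.set_ylabel
fig.autofmt_xdate()                    # tilt the date labels so they do not overlap
```

Call `plt.show()` afterwards to display the figure.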
Bar graph

```python
import calendar

import matplotlib.pyplot as plt

month_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize=(10, 6))
barplots = ax.bar(month_list, sold_list)
print("barplots : ", barplots)

# Write each bar's height just above the bar.
for plot in barplots:
    print(plot)
    height = plot.get_height()
    ax.text(plot.get_x() + plot.get_width() / 2., height, str(height),
            ha='center', va='bottom')

# Replace the month numbers with month names on the x-axis.
ax.set_xticks(month_list)
ax.set_xticklabels(calendar.month_name[1:13], rotation=90)
plt.show()
```
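If Matplotlib 3.4 or later is available, `Axes.bar_label` can take the place of the manual `ax.text` loop. This is an alternative sketch on the same sales numbers, not the approach used above; the `padding` value is an arbitrary choice.

```python
import calendar

import matplotlib.pyplot as plt

month_list = list(range(1, 13))
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(month_list, sold_list)

# bar_label (Matplotlib 3.4 or later) writes each bar's height above it.
labels = ax.bar_label(bars, padding=2)

ax.set_xticks(month_list)
ax.set_xticklabels(calendar.month_name[1:13], rotation=90)
```

The manual loop is still worth knowing, since it gives full control over the position and style of each label.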
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Load the example "tips" dataset bundled with seaborn.
tips = sns.load_dataset("tips")
x = tips['total_bill']
y = tips['tip']

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
```
```python
# One colour per group.
colors = {'Male': '#2521F6', 'Female': '#EB4036'}

fig, ax = plt.subplots(figsize=(10, 6))
# Draw each group separately so that each gets its own legend entry.
for label, data in tips.groupby('sex', observed=True):
    ax.scatter(data['total_bill'], data['tip'], label=label,
               color=colors[label], alpha=0.5)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
ax.legend()
plt.show()
```
The following code produces the same kind of plot as above, but is simpler.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

fig, ax = plt.subplots(figsize=(10, 6))
# hue splits the points by 'sex' and builds the legend automatically.
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips, ax=ax)
plt.show()
```
```python
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))

# Left panel: scatter only.
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax[0], fit_reg=False)
ax[0].set_title("without linear regression line")

# Right panel: scatter plus a fitted regression line.
sns.regplot(x="total_bill", y="tip", data=tips, ax=ax[1], fit_reg=True)
ax[1].set_title("with linear regression line")

plt.show()
```
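Drawing a bar graph the seaborn way

```python
# countplot counts the rows in each category and draws one bar per category.
sns.countplot(x="day", data=tips)
plt.show()
```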
```python
# value_counts sorts by count (largest first) unless ascending=True is given.
print(tips['day'].value_counts().index)
print(tips['day'].value_counts().values)
print(tips['day'].value_counts(ascending=True))
```
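The ordering behaviour of `value_counts` is easier to see on a tiny example. The sketch below uses a made-up six-row Series standing in for `tips['day']`.

```python
import pandas as pd

# Hypothetical mini-sample standing in for tips['day'].
days = pd.Series(["Sat", "Sun", "Sat", "Thur", "Sat", "Sun"])

counts = days.value_counts()      # sorted by count, largest first
order = counts.index.tolist()     # category names in that order

print(order)                      # ['Sat', 'Sun', 'Thur']
print(counts.values.tolist())     # [3, 2, 1]
print(days.value_counts(ascending=True).index.tolist())  # ['Thur', 'Sun', 'Sat']
```

A list like `order` is what the `order` parameter of `sns.countplot` expects when the bars should be sorted by frequency.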
```python
fig, ax = plt.subplots()
# Sort the bars by frequency, most frequent day first.
sns.countplot(x="day", data=tips, order=tips['day'].value_counts().index, ax=ax)

# Write each bar's count just above the bar.
for plot in ax.patches:
    print(plot)
    height = plot.get_height()
    ax.text(plot.get_x() + plot.get_width() / 2., height, f"{height:.0f}",
            ha='center', va='bottom')

ax.set_ylim(-5, 100)
plt.show()
```
A more difficult visualization

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.ticker import FuncFormatter, MultipleLocator


def major_formatter(x, pos):
    """Format y-axis tick values with two decimals and a dollar sign."""
    return "%.2f$" % x


formatter = FuncFormatter(major_formatter)

tips = sns.load_dataset("tips")
# Two panels are created, but only ax[0] is drawn on here.
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

# errorbar=None needs seaborn 0.12 or later; older versions use ci=None.
ax0 = sns.barplot(x="day", y="total_bill", data=tips, errorbar=None,
                  color='lightgray', alpha=0.85, zorder=2, ax=ax[0])

# Find the day with the highest average bill.
group_mean = tips.groupby('day', observed=True)['total_bill'].mean()
h_day = group_mean.idxmax()
h_mean = np.round(group_mean.max(), 2)  # rounded so it matches the rounded bar heights
print(h_mean)

for plot in ax0.patches:
    height = np.round(plot.get_height(), 2)
    fontweight = "normal"
    color = "k"
    if h_mean == height:
        # Highlight the tallest bar.
        fontweight = "bold"
        color = "darkred"
        plot.set_facecolor(color)
        plot.set_edgecolor("black")
    ax0.text(plot.get_x() + plot.get_width() / 2., height + 1, f"{height:.2f}",
             ha='center', size=12, fontweight=fontweight, color=color)

ax0.set_ylim(-3, 30)
ax0.set_title("Bar Graph", size=16)

# Hide the frame and move the left spine outward.
ax0.spines['top'].set_visible(False)
ax0.spines['left'].set_position(("outward", 20))
ax0.spines['left'].set_visible(False)
ax0.spines['right'].set_visible(False)

# Major ticks every 10, minor ticks every 5, formatted as dollar amounts.
ax0.yaxis.set_major_locator(MultipleLocator(10))
ax0.yaxis.set_major_formatter(formatter)
ax0.yaxis.set_minor_locator(MultipleLocator(5))
ax0.set_ylabel("Avg. Total Bill($)", fontsize=14)

ax0.grid(axis="y", which="major", color="lightgray")
ax0.grid(axis="y", which="minor", ls=":")

# Highlight the x tick label of the day with the highest average.
for xtick in ax0.get_xticklabels():
    print(xtick)
    if xtick.get_text() == h_day:
        xtick.set_color("darkred")
        xtick.set_fontweight("demibold")

ax0.set_xticklabels(['Thursday', 'Friday', 'Saturday', 'Sunday'], size=12)
plt.show()
```
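The tick handling is the least obvious part of the example, so here is a minimal sketch of it in isolation, on an empty Axes with the same y-limits. `MultipleLocator` places ticks at multiples of a base value, and `FuncFormatter` turns each tick value into a label by calling an ordinary function.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def major_formatter(x, pos):
    """Format a tick value with two decimals and a trailing dollar sign."""
    return "%.2f$" % x


fig, ax = plt.subplots()
ax.set_ylim(-3, 30)
ax.yaxis.set_major_locator(MultipleLocator(10))   # major ticks at multiples of 10
ax.yaxis.set_minor_locator(MultipleLocator(5))    # minor ticks at multiples of 5
ax.yaxis.set_major_formatter(FuncFormatter(major_formatter))

print(major_formatter(20, None))   # 20.00$
```

The formatter function receives the tick value and its position; the position is unused here, which is why `None` can be passed when calling it by hand.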
