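# Removing unnecessary columns
Use the drop() method. It returns a new DataFrame rather than modifying data, so store the result in a variable.
data_col12 = data.drop(columns=['CHAS','RAD'])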
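The reason for keeping the return value can be sketched with a tiny example. The frame below is a made-up stand-in for the housing data (only the column names come from the post), and the name trimmed is hypothetical.

```python
import pandas as pd

# Toy stand-in for the housing data; the values are invented for illustration.
data = pd.DataFrame({
    "CRIM": [0.1, 0.2, 0.3],
    "CHAS": [0, 1, 0],
    "RAD": [1, 2, 3],
    "MEDV": [24.0, 21.6, 34.7],
})

# drop() returns a new DataFrame; the original is left untouched.
trimmed = data.drop(columns=["CHAS", "RAD"])

print(list(data.columns))     # the original still contains CHAS and RAD
print(list(trimmed.columns))  # the result no longer contains them
```

Calling drop() without assigning the result (and without inplace=True) therefore has no lasting effect on data.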
# Computing the IQR (Interquartile Range) for each column
Compute the summary statistics with describe() and store them in a variable.
data_col12_desc = data_col12.describe()
print(data_col12_desc)
             CRIM          ZN       INDUS         NOX          RM         AGE  \
count  506.000000  506.000000  506.000000  506.000000  491.000000  506.000000
mean     3.613524   11.363636   11.136779    0.554695    6.285102   68.574901
std      8.601545   23.322453    6.860353    0.115878    0.708096   28.148861
min      0.006320    0.000000    0.460000    0.385000    3.561000    2.900000
25%      0.082045    0.000000    5.190000    0.449000    5.886000   45.025000
50%      0.256510    0.000000    9.690000    0.538000    6.209000   77.500000
75%      3.677083   12.500000   18.100000    0.624000    6.622000   94.075000
max     88.976200  100.000000   27.740000    0.871000    8.780000  100.000000

              DIS         TAX     PTRATIO           B       LSTAT        MEDV
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.795043  408.237154   18.455534  356.674032   12.653063   22.532806
std      2.105710  168.537116    2.164946   91.294864    7.141062    9.197104
min      1.129600  187.000000   12.600000    0.320000    1.730000    5.000000
25%      2.100175  279.000000   17.400000  375.377500    6.950000   17.025000
50%      3.207450  330.000000   19.050000  391.440000   11.360000   21.200000
75%      5.188425  666.000000   20.200000  396.225000   16.955000   25.000000
max     12.126500  711.000000   22.000000  396.900000   37.970000   50.000000
Select the rows at positions 4 (25%) and 6 (75%), counting from 0.
print(data_col12_desc.iloc[[4,6]])
         CRIM    ZN  INDUS    NOX     RM     AGE       DIS    TAX  PTRATIO  \
25%  0.082045   0.0   5.19  0.449  5.886  45.025  2.100175  279.0     17.4
75%  3.677083  12.5  18.10  0.624  6.622  94.075  5.188425  666.0     20.2

            B   LSTAT    MEDV
25%  375.3775   6.950  17.025
75%  396.2250  16.955  25.000
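Positional selection depends on the row order that describe() produces. As an alternative sketch (not the approach used in the post), the same rows can be picked by label with loc. The small frame df is invented for illustration.

```python
import pandas as pd

# Made-up numeric data, just to have something to describe.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
})
desc = df.describe()

# Positional selection, as in the post: rows 4 and 6 of the describe() output.
by_position = desc.iloc[[4, 6]]

# Label-based selection: ask for the quartile rows by name.
by_label = desc.loc[["25%", "75%"]]

print(by_label)
print(by_position.equals(by_label))
```

Selecting by label keeps working if describe() is called with a custom percentiles argument that still includes 0.25 and 0.75, whereas the positions 4 and 6 would shift.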
Restructure the result so that the column names run down the vertical axis (as the row index).
Use the transpose() method or the T attribute.
print(data_col12_desc.iloc[[4,6]].T)
print(data_col12_desc.iloc[[4,6]].transpose())
                25%         75%
CRIM       0.082045    3.677083
ZN         0.000000   12.500000
INDUS      5.190000   18.100000
NOX        0.449000    0.624000
RM         5.886000    6.622000
AGE       45.025000   94.075000
DIS        2.100175    5.188425
TAX      279.000000  666.000000
PTRATIO   17.400000   20.200000
B        375.377500  396.225000
LSTAT      6.950000   16.955000
MEDV      17.025000   25.000000
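Both spellings give the same result, which is why the output appears in the same form either way. A minimal check, using a made-up two-row frame named quartiles (hypothetical, not the post's data):

```python
import pandas as pd

# Invented quartile rows for two columns.
quartiles = pd.DataFrame(
    {"A": [2.0, 4.0], "B": [20.0, 40.0]},
    index=["25%", "75%"],
)

via_attr = quartiles.T              # attribute access
via_method = quartiles.transpose()  # method call

print(via_attr)
print(via_attr.equals(via_method))
```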
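Store the transposed table in a variable, then subtract the 25% column from the 75% column to get the IQR.
data_col12_T = data_col12_desc.iloc[[4,6]].T
print(data_col12_T['75%']-data_col12_T['25%'])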
CRIM         3.595038
ZN          12.500000
INDUS       12.910000
NOX          0.175000
RM           0.736000
AGE         49.050000
DIS          3.088250
TAX        387.000000
PTRATIO      2.800000
B           20.847500
LSTAT       10.005000
MEDV         7.975000
dtype: float64
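For comparison, the same per-column IQR can also be obtained without going through describe(), by calling quantile() directly. This is an alternative sketch, not the method used in the post; the frame df and its values are made up.

```python
import pandas as pd

# Made-up numeric data standing in for the housing columns.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Route used in the post: describe() -> pick quartile rows -> transpose -> subtract.
desc_T = df.describe().iloc[[4, 6]].T
iqr_from_describe = desc_T["75%"] - desc_T["25%"]

# Alternative: take the 0.75 and 0.25 quantiles of each column and subtract.
iqr_from_quantile = df.quantile(0.75) - df.quantile(0.25)

print(iqr_from_describe)
print(iqr_from_quantile)
```

Both routes use pandas' default linear interpolation for quantiles, so they should agree column by column.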