Posts in the 'AI/Pandas Basics' category

13. Combining two DataFrames

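Combining by rows, given df1 and df2:
result = pd.concat([df1, df2], ignore_index=True)
ignore_index came up earlier as well: it discards the existing row index so that the combined frame has no duplicate row labels.
result = df1.append(df2, ignore_index=True)
This does the same thing. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the form that still works.
Combining by columns:
result = pd.concat([df1, df2], axis=1, ignore_index=True)
With axis=1, ignore_index=True renumbers the column labels (0, 1, 2, ...) rather than the row index.
Building a DataFrame from two lists:
label = [1, 2, 3, 4, 5]
prediction = [1, 2, 2, 4, 4]
comparison = pd.DataFrame({'label': label, 'prediction': prediction})
This produces a DataFrame with a label column and a prediction column.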
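The row-wise and column-wise cases above can be sketched as follows. The contents of df1 and df2 are hypothetical sample data, since the post does not show them.

```python
import pandas as pd

# Hypothetical sample frames; the original post does not show their contents.
df1 = pd.DataFrame({"name": ["A", "B"], "age": [20, 30]})
df2 = pd.DataFrame({"name": ["C", "D"], "age": [40, 50]})

# Row-wise: stack df2 under df1 and renumber the row index from 0.
rows = pd.concat([df1, df2], ignore_index=True)

# Column-wise: place the frames side by side, keeping the column names.
cols = pd.concat([df1, df2], axis=1)

# With ignore_index=True on axis=1, the column labels are renumbered instead.
cols_renumbered = pd.concat([df1, df2], axis=1, ignore_index=True)

print(rows)
print(list(cols.columns))
print(list(cols_renumbered.columns))
```

If the column names matter after a column-wise concat, leaving out ignore_index is probably the safer choice.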

AI/Pandas Basics 2020.04.24

12. Extracting the unique values in a column and counting them

df.job.unique()
array(['teacher', 'student', 'developer', 'dentist', 'lawyer', 'banker', 'basketball player'], dtype=object)
This returns each distinct value once.
df.job.value_counts()
teacher              8
student              5
lawyer               2
banker               2
developer            1
dentist              1
basketball player    1
Name: job, dtype: int64
This also counts how many times each value occurs.
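A minimal sketch of the same two calls on a small, made-up job column. The nunique call is an addition not mentioned in the post; it returns only the number of distinct values.

```python
import pandas as pd

# Hypothetical sample data, smaller than the frame used in the post.
df = pd.DataFrame(
    {"job": ["teacher", "student", "teacher", "lawyer", "student", "teacher"]}
)

# Distinct values, in order of first appearance.
distinct_jobs = df["job"].unique()

# Occurrences per value, sorted from most to least frequent.
job_counts = df["job"].value_counts()

# Number of distinct values (not in the original post).
n_distinct = df["job"].nunique()

print(distinct_jobs)
print(job_counts)
print(n_distinct)
```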

AI/Pandas Basics 2020.04.24

11. Using the map and applymap functions

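def extract_year(row):
    return row.split('-')[0]
df['year'] = df['date'].map(extract_year)
Used with a function like this, map behaves the same as apply.
df.job = df.job.map({"student": 1, "developer": 2, "teacher": 3})
This is something apply cannot do: map accepts a dict and converts the job values to numbers (student -> 1, and so on).
df = df.applymap(np.around)
This rounds every value. To transform every value in a DataFrame, use applymap; to transform the values of a single column one by one, use map. Note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map.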
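A small sketch of the three uses above, on made-up data. Because applymap is deprecated in newer pandas, the sketch picks DataFrame.map when it exists and falls back to applymap otherwise. It also uses Python's built-in round in place of np.around to avoid a NumPy import.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"date": ["2000-06-27", "2002-09-24"], "job": ["student", "developer"]}
)

def extract_year(value):
    # 'yyyy-mm-dd' -> 'yyyy'
    return value.split("-")[0]

# Series.map with a function: applied to each element of one column.
df["year"] = df["date"].map(extract_year)

# Series.map with a dict: each value is looked up and replaced.
job_codes = {"student": 1, "developer": 2, "teacher": 3}
df["job"] = df["job"].map(job_codes)

# Element-wise over a whole DataFrame: DataFrame.map (pandas >= 2.1),
# otherwise the older applymap.
scores = pd.DataFrame({"x": [1.2, 2.7], "y": [3.4, 4.6]})
elementwise = scores.map if hasattr(scores, "map") else scores.applymap
rounded = elementwise(round)

print(df)
print(rounded)
```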

AI/Pandas Basics 2020.04.24

10. Using the apply function

def extract_year(row):
    return row.split('-')[0]
df['year'] = df['yyyy-mm-dd'].apply(extract_year)
This is what we did in post 06, and it is apparently the most basic use of apply.
def get_age(year, current_year):
    return current_year - int(year)
df['age'] = df['year'].apply(get_age, current_year=2020)
This computes an age from the birth year and stores it; extra keyword arguments given to apply are passed on to the function.
def get_introduce(age, prefix, suffix):
    return prefix + str(age) + suffix
df['introduce'] = df.apply(get_introduce, prefix = "I am",..
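The keyword-argument form of apply can be sketched as below. The excerpt is cut off before the row-wise example finishes, so describe_row is a hypothetical helper written for illustration, not the post's get_introduce call. It shows DataFrame.apply with axis=1, where the function receives one row at a time.

```python
import pandas as pd

# Hypothetical sample data: birth years stored as strings.
df = pd.DataFrame({"year": ["1996", "2000"]})

def get_age(year, current_year):
    return current_year - int(year)

# Series.apply forwards extra keyword arguments to the function.
df["age"] = df["year"].apply(get_age, current_year=2020)

def describe_row(row, prefix, suffix):
    # Hypothetical helper: with axis=1, `row` is one row of the frame.
    return prefix + str(row["age"]) + suffix

# DataFrame.apply with axis=1 calls the function once per row.
df["summary"] = df.apply(describe_row, axis=1, prefix="age ", suffix=" in 2020")

print(df)
```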

AI/Pandas Basics 2020.04.24

09. Finding NaN values and replacing them

df.shape
(8, 3)
df.info()
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    8 non-null      object
 1   job     8 non-null      object
 2   age     6 non-null      float64
dtypes: float64(1), object(2)
memory usage: 320.0+ bytes
The shape and the non-null counts can be checked this way; here age has only 6 non-null values out of 8 rows.
df.isna()
df.isnull()
The two are identical.
df.age = df.age.fillna(0)
This fills the missing values with 0.
df["age"].fillna(df.groupby("job")["a..
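A minimal sketch of detecting and filling missing values, using made-up data. It covers only the isna and fillna(0) steps; the group-wise fill is truncated in the excerpt and is not reconstructed here.

```python
import pandas as pd

# Hypothetical sample data; None in a numeric column becomes NaN.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "job": ["teacher", "teacher", "student", "student"],
        "age": [40.0, None, 20.0, None],
    }
)

# Boolean mask of missing cells, then a count per column.
missing_per_column = df.isna().sum()

# fillna returns a new Series; df itself changes only on assignment.
filled_zero = df["age"].fillna(0)

print(missing_per_column)
print(filled_zero)
```

Assigning the result back, as in df.age = df.age.fillna(0), is what makes the change stick.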

AI/Pandas Basics 2020.04.24

08. Removing duplicate data

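df.duplicated()
This reports True or False for each row. By default a row counts as a duplicate only when the values in all columns match an earlier row.
df.drop_duplicates()
This removes the duplicates in one step.
df.duplicated(['name'])
This finds rows that share the same name.
df.drop_duplicates(['name'], keep='last')
keep='first', keep='last', or keep=False decides which occurrence is retained. A middle occurrence cannot be selected. With keep=False, every duplicated row is dropped.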
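The effect of each keep option can be sketched on a small, made-up frame:

```python
import pandas as pd

# Hypothetical sample data: 'John' appears three times, twice as a teacher.
df = pd.DataFrame(
    {
        "name": ["John", "Nate", "John", "John"],
        "job": ["teacher", "student", "teacher", "lawyer"],
    }
)

# All columns must match for a row to be flagged.
full_dupes = df.duplicated()

# Only the 'name' column is compared.
name_dupes = df.duplicated(["name"])

keep_first = df.drop_duplicates(["name"], keep="first")
keep_last = df.drop_duplicates(["name"], keep="last")
keep_none = df.drop_duplicates(["name"], keep=False)

print(full_dupes.tolist())
print(name_dupes.tolist())
print(list(keep_first.index), list(keep_last.index), list(keep_none.index))
```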

AI/Pandas Basics 2020.04.24

07. Grouping data

Given df:
groupby_major = df.groupby('major')
type(groupby_major) is pandas.core.groupby.generic.DataFrameGroupBy.
groupby_major.groups
{'Computer Science': Int64Index([0, 1, 6, 7], dtype='int64'), 'Economics': Int64Index([4, 5, 9], dtype='int64'), 'Physics': Int64Index([2], dtype='int64'), 'Psychology': Int64Index([3, 8, 10], dtype='int64')}
The groups can be inspected as a dict that maps each group name to the row labels in that group.
for name, group in groupby_major: print(nam..
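A minimal sketch of grouping and iterating, with made-up data. The loop body is an illustration only, since the post's own loop is cut off in the excerpt.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E"],
        "major": ["Physics", "Economics", "Physics", "Economics", "Psychology"],
    }
)

groupby_major = df.groupby("major")

# Number of rows in each group.
group_sizes = groupby_major.size()

# Iterating yields (group name, sub-DataFrame) pairs.
members = {}
for name, group in groupby_major:
    members[name] = list(group["name"])
    print(name, members[name])

# Row labels that belong to one group.
physics_rows = list(groupby_major.groups["Physics"])
```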

AI/Pandas Basics 2020.04.24

06. Creating and modifying rows and columns

Creating a column:
df['salary'] = 0
This adds the column, and the value is applied to every row automatically.
import numpy as np
df['salary'] = np.where(df['job'] != 'student', 'yes', 'no')
With np.where, each row gets 'yes' where the condition is true and 'no' where it is false.
df['total'] = df['mid'] + df['fin']
This adds the values of the mid and fin columns and stores the result in a new total column.
grades = []
for row in df['average']:
    if row >= 90:
        grades.append('A')
    elif row >= 80:
        grades.append('B')
    else:
        grades.append('F')
df['grades'] = grades
..
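The column-creation steps above can be sketched on made-up data. The excerpt does not show how the average column is built, so computing it as total / 2 is an assumption here. The to_grade helper with Series.map is an alternative to the explicit loop, not the post's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "job": ["student", "teacher", "student"],
        "mid": [95, 70, 85],
        "fin": [91, 60, 83],
    }
)

# Conditional column: 'yes' where the condition holds, otherwise 'no'.
df["salary"] = np.where(df["job"] != "student", "yes", "no")

# Column arithmetic is applied row by row.
df["total"] = df["mid"] + df["fin"]

# Assumption: the average of the two exams.
df["average"] = df["total"] / 2

def to_grade(score):
    # Same thresholds as the loop in the post.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    return "F"

df["grades"] = df["average"].map(to_grade)

print(df)
```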

AI/Pandas Basics 2020.04.24

05. Deleting DataFrame rows and columns

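- drop
df.drop(['John', 'Nate'])
df.drop(['John', 'Nate'], inplace=True)
'John' and 'Nate' are row index labels. The first call returns a new DataFrame without those rows and leaves df unchanged; with inplace=True the deletion is applied to df directly.
df.drop(df.index[[0, 2]])
This deletes rows by position, via the index.
df[df.age > 20]
This removes rows by condition: only the rows with age greater than 20 are kept, so the rest are dropped.
df = df.drop('age', axis=1)
This deletes a column by name.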
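A short sketch of the drop variants above on a made-up frame. It avoids inplace=True so that every result can be compared against the untouched original.

```python
import pandas as pd

# Hypothetical sample data, indexed by name.
df = pd.DataFrame(
    {"age": [18, 25, 30], "job": ["student", "teacher", "lawyer"]},
    index=["John", "Nate", "Jenny"],
)

# Drop rows by label; returns a new frame and leaves df unchanged.
dropped = df.drop(["John", "Nate"])

# Drop rows by position, by looking up their labels first.
by_position = df.drop(df.index[[0, 2]])

# Boolean filter: keeps the rows where the condition is True.
older = df[df["age"] > 20]

# Drop a column by name.
no_age = df.drop("age", axis=1)

print(dropped, by_position, older, no_age, sep="\n\n")
```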

AI/Pandas Basics 2020.04.24

04. Selecting and filtering DataFrame rows and columns

- Selecting a slice
df[1:3]
Only the rows at positions 1 and 2.
- Selecting non-contiguous rows
df.loc[['2009-01', '2010-01']]
By row label.
- By column condition
df[df.age > 25]
df.query('age>25')
df[(df.age > 25) & (df.name == 'Nate')]
Filter Column
- By index
df.iloc[:, 0:2]
df.iloc[0:2, 0:2]
Access by integer position.
- By column name
df[['name', 'age']]
df.filter(items=['age', 'job'])
Both select columns by name.
df.filter(like='a', axis=1)
Columns whose name contains 'a' ..
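The selection styles listed above can be tried on a small, made-up frame. The index labels and values are hypothetical, chosen so that each call returns something different.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {
        "name": ["John", "Nate", "Jenny"],
        "age": [20, 30, 28],
        "job": ["student", "teacher", "developer"],
    },
    index=["2009-01", "2010-01", "2011-01"],
)

sliced = df[1:3]                             # rows at positions 1 and 2
by_label = df.loc[["2009-01", "2011-01"]]    # non-contiguous row labels
by_query = df.query("age > 25")              # same as df[df["age"] > 25]
by_condition = df[(df["age"] > 25) & (df["name"] == "Nate")]
by_position = df.iloc[0:2, 0:2]              # first two rows and columns
by_names = df[["name", "age"]]               # columns by name
by_filter = df.filter(items=["age", "job"])  # columns by name, via filter
like_a = df.filter(like="a", axis=1)         # column names containing 'a'

print(list(like_a.columns))
```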

AI/Pandas Basics 2020.04.24
