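This article walks through worked examples of pandas DataFrame operations (the frames are named df and dff in the code). The examples are detailed and should serve as a useful reference.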
A library for data analysis and processing
import pandas as pd
df=pd.read_csv("./pandas/data/titanic.csv")
df.head(N) returns the first N rows of the data
df.head(6)
df.info() prints a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
df.index shows the row index
df.index
RangeIndex(start=0, stop=891, step=1)
df.columns shows all column names
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
df.dtypes shows the data type of each column
df.dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
df.values returns all of the data as a NumPy array
df.values
array([[1, 0, 3, ..., 7.25, nan, 'S'],
[2, 1, 1, ..., 71.2833, 'C85', 'C'],
[3, 1, 3, ..., 7.925, nan, 'S'],
...,
[889, 0, 3, ..., 23.45, nan, 'S'],
[890, 1, 1, ..., 30.0, 'C148', 'C'],
[891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
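The inspection calls above can be tried without the CSV file. The following is a minimal sketch on a hypothetical three-row frame (the name `sample` and its contents are made up for illustration); it also uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic table, built in memory so no CSV is needed.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley", "Heikkinen, Miss. Laina"],
    "Age": [22.0, 38.0, np.nan],
})

first_two = sample.head(2)            # first 2 rows
column_names = list(sample.columns)   # column labels as a plain list
age_dtype = sample["Age"].dtype       # dtype of a single column
as_array = sample.to_numpy()          # same data as sample.values, as a NumPy array

print(first_two)
print(column_names, age_dtype, as_array.shape)
```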
df['Name']
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
886 Montvila, Rev. Juozas
887 Graham, Miss. Margaret Edith
888 Johnston, Miss. Catherine Helen "Carrie"
889 Behr, Mr. Karl Howell
890 Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object
df=df.set_index('Name')
df
Query the first 8 rows of the Age column
df['Age'][:8]
Name
Braund, Mr. Owen Harris 22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 38.0
Heikkinen, Miss. Laina 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0
Allen, Mr. William Henry 35.0
Moran, Mr. James NaN
McCarthy, Mr. Timothy J 54.0
Palsson, Master. Gosta Leonard 2.0
Name: Age, dtype: float64
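One detail worth spelling out is that set_index returns a new DataFrame and moves the chosen column out of the columns and into the index. A small sketch, assuming a made-up frame called `people`:

```python
import pandas as pd

# Hypothetical data; the names are illustrative only.
people = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [22.0, 38.0, 26.0]})

indexed = people.set_index("Name")        # new DataFrame; people itself is unchanged
first_two_ages = indexed["Age"].iloc[:2]  # first two rows by position
bob_age = indexed.loc["Bob", "Age"]       # lookup by index label
restored = indexed.reset_index()          # moves Name back into a regular column

print(indexed)
print(bob_age)
```

After set_index, Name is no longer a column, so `indexed["Name"]` would raise a KeyError; reset_index undoes the change.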
Operations on a single column
age=df['Age']
age
Name
Braund, Mr. Owen Harris 22.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 38.0
Heikkinen, Miss. Laina 26.0
Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0
Allen, Mr. William Henry 35.0
...
Montvila, Rev. Juozas 27.0
Graham, Miss. Margaret Edith 19.0
Johnston, Miss. Catherine Helen "Carrie" NaN
Behr, Mr. Karl Howell 26.0
Dooley, Mr. Patrick 32.0
Name: Age, Length: 891, dtype: float64
# Add 10 to every Age
age=age+10
age
Name
Braund, Mr. Owen Harris 32.0
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 48.0
Heikkinen, Miss. Laina 36.0
Futrelle, Mrs. Jacques Heath (Lily May Peel) 45.0
Allen, Mr. William Henry 45.0
...
Montvila, Rev. Juozas 37.0
Graham, Miss. Margaret Edith 29.0
Johnston, Miss. Catherine Helen "Carrie" NaN
Behr, Mr. Karl Howell 36.0
Dooley, Mr. Patrick 42.0
Name: Age, Length: 891, dtype: float64
# Maximum of age (these statistics are computed after 10 was added to every value)
age.max()
90.0
# Minimum of age
age.min()
10.42
# Mean of age
age.mean()
39.69911764705882
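The Age column contains missing values, so it helps to know how they behave here: arithmetic keeps NaN as NaN, while max, min, and mean skip NaN by default. A minimal sketch on a hypothetical four-value Series:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
ages = pd.Series([22.0, 38.0, np.nan, 35.0], name="Age")

shifted = ages + 10                        # element-wise; NaN + 10 stays NaN
highest = shifted.max()                    # NaN is skipped
average = shifted.mean()                   # mean of the 3 non-missing values
strict_average = shifted.mean(skipna=False)  # NaN, because one value is missing
non_missing = shifted.count()              # number of non-NaN values

print(highest, average, strict_average, non_missing)
```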
describe returns basic summary statistics for the data
df.describe()
Select only certain columns
df[['Age','Fare']][:5]
Querying data by position or by label
# View one row by integer position
df.iloc[0]
# Query the first 5 rows (the slice end is excluded)
df.iloc[0:5]
# Query the first 5 rows, columns at positions 1 and 2
df.iloc[0:5,1:3]
# Read one row by its index label
df.loc['Futrelle, Mrs. Jacques Heath (Lily May Peel)']
# Query a single value by row label and column label
df.loc['Futrelle, Mrs. Jacques Heath (Lily May Peel)','Age']
# Query a range of rows and a range of columns by label (both ends are included)
df.loc['Braund, Mr. Owen Harris':'Graham, Miss. Margaret Edith','Sex':'Age']
# Modify a value
df.loc['Heikkinen, Miss. Laina','Age']=2000
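The difference between the two slicing rules is easy to miss: iloc slices exclude the end position, while loc slices include the end label. A short sketch on a hypothetical frame named `table`:

```python
import pandas as pd

# Hypothetical 4x3 frame with string row labels.
table = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [10, 20, 30, 40], "C": [100, 200, 300, 400]},
    index=["w", "x", "y", "z"],
)

by_position = table.iloc[0:2, 1:3]        # rows 0 and 1, columns B and C (end excluded)
by_label = table.loc["w":"y", "A":"B"]    # rows w, x, y and columns A, B (end included)

table.loc["x", "B"] = -1                  # set one cell by row label and column label

print(by_position.shape, by_label.shape)
```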
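Boolean operations
# Query the first 5 rows where Age is greater than 50
df[df['Age']>50][:5]
# Query the rows where Sex is female
df[df['Sex']=='female']
# Compute the mean Age of the rows where Sex is male
df.loc[df['Sex']=='male','Age'].mean()
# Count the rows where Age is greater than 50 (summing a boolean mask counts its True values)
(df['Age']>50).sum()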
65
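Summing a boolean mask gives a count, not a total of the underlying values. The sketch below, on a hypothetical four-row frame, contrasts the two and shows how conditions can be combined with `&` (each condition needs its own parentheses).

```python
import pandas as pd

# Hypothetical passengers; values are illustrative only.
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 58.0, 26.0, 54.0],
})

over_50 = passengers["Age"] > 50                        # boolean Series
count_over_50 = int(over_50.sum())                      # number of True values
total_age_over_50 = passengers.loc[over_50, "Age"].sum()  # sum of the matching ages
older_women = passengers[over_50 & (passengers["Sex"] == "female")]

print(count_over_50, total_age_over_50)
print(older_women)
```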
DataFrame groupby: grouping data
dff=pd.DataFrame({'key':['A','B','C','A','B','C','A','B','C'],'value':[0,5,10,5,10,15,10,15,20]})
dff
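Group by key and sum
dff.groupby('key').sum()
# Group by key and take the mean; the string 'mean' avoids the FutureWarning that recent pandas versions emit when np.mean is passed
dff.groupby('key').aggregate('mean')
# Group by Sex and compute the mean Age (the female mean includes the Age of 2000 set above)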
df.groupby('Sex')['Age'].mean()
Sex
female 35.478927
male 30.726645
Name: Age, dtype: float64
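Several aggregations can also be computed in one pass by giving agg a list of function names. A minimal sketch that reuses the key/value data from the dff example in this section (the name `summary` is illustrative):

```python
import pandas as pd

# Same key/value data as the dff example in this section.
dff = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "value": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# One row per key, one column per aggregation.
summary = dff.groupby("key")["value"].agg(["sum", "mean", "max"])

print(summary)
# A: sum 15, mean 5.0, max 10
# B: sum 30, mean 10.0, max 15
# C: sum 45, mean 15.0, max 20
```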
Numeric operations
df1=pd.DataFrame([[1,2,3,4],[3,4,5,6]],index=['a','b'],columns=['A','B','C','D'])
df1
# Sum of each column
df1.sum()
df1.sum(axis=0)
A 4
B 6
C 8
D 10
dtype: int64
# Sum of each row
df1.sum(axis=1)
a 10
b 18
dtype: int64
# Mean of each column
df1.mean(axis=0)
A 2.0
B 3.0
C 4.0
D 5.0
dtype: float64
# Mean of each row
df1.mean(axis=1)
a 2.5
b 4.5
dtype: float64
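The axis argument names the direction that gets collapsed, which is why axis=0 yields one result per column. pandas also accepts the string aliases "index" and "columns", which some readers find easier to remember. A short sketch using the same df1 values:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [3, 4, 5, 6]], index=["a", "b"], columns=["A", "B", "C", "D"])

col_totals = df1.sum(axis="index")     # same as axis=0: collapse the rows, one value per column
row_totals = df1.sum(axis="columns")   # same as axis=1: collapse the columns, one value per row

print(col_totals.to_dict())
print(row_totals.to_dict())
```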
df
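# Covariance (numeric_only=True skips the text columns, which pandas 2.0 and later no longer drop automatically)
df.cov(numeric_only=True)
# Correlation
df.corr(numeric_only=True)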
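Covariance and correlation are only defined for numeric columns. One version-independent way to handle a frame that also holds text is to select the numeric columns first; the sketch below assumes a hypothetical frame named `mixed`.

```python
import pandas as pd

# Hypothetical frame with two numeric columns and one text column.
mixed = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Fare": [10.0, 20.0, 30.0],
    "Sex": ["male", "female", "male"],
})

numeric = mixed.select_dtypes(include="number")  # keeps Age and Fare, drops Sex
cov = numeric.cov()     # sample covariance matrix
corr = numeric.corr()   # Pearson correlation matrix

print(cov)
print(corr)
```

Here Fare rises in lockstep with Age, so their correlation is 1 (up to floating-point rounding).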
# Count how many times each value occurs in a column
df['Age'].value_counts()
24.00 30
22.00 27
18.00 26
28.00 25
19.00 25
..
53.00 1
55.50 1
70.50 1
23.50 1
0.42 1
Name: Age, Length: 89, dtype: int64
# Count how many times each value occurs, sorted from least to most frequent
df['Age'].value_counts(ascending=True)
0.42 1
23.50 1
70.50 1
55.50 1
53.00 1
..
19.00 25
28.00 25
18.00 26
22.00 27
24.00 30
Name: Age, Length: 89, dtype: int64
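By default value_counts leaves out missing values, which matters for a column like Age. The following sketch, on a hypothetical Series, shows the `dropna` and `normalize` parameters alongside the default behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical ages: three 24s, two 22s, one missing value.
ages = pd.Series([24.0, 22.0, 24.0, np.nan, 22.0, 24.0])

counts = ages.value_counts()                  # NaN excluded, most frequent first
with_missing = ages.value_counts(dropna=False)  # NaN gets its own entry
shares = ages.value_counts(normalize=True)    # fractions of the non-missing values

print(counts)
print(shares)
```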
Series operations (a Series is a single row or a single column)
data=[1,2,3,4]
index=['a','b','c','d']
s=pd.Series(index=index,data=data)
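# Query the first row by position
s.iloc[0]
# Query the rows at positions 1 and 2 (the slice end is excluded)
s[1:3]
# Mask operation: show only rows a and c
mask=[True,False,True,False]
s[mask]
# Modify a value
s['a']=200
# Value replacement: replace 3 with 300
s.replace(to_replace=3,value=300,inplace=True)
# Rename an index label (a Series has no column names)
s.rename(index={'a':'A'},inplace=True)
# Append data (Series.append was removed in pandas 2.0, so use pd.concat)
s1=pd.Series(index=['e','f'],data=[5,6])
s3=pd.concat([s,s1])
# Delete row A
del s3['A']
# Delete several rows at once
s3.drop(['c','d'],inplace=True)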
s3
b 2
e 5
f 6
dtype: int64
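The same Series steps can be written without in-place mutation, which keeps each intermediate result available. A minimal sketch, assuming pandas with `pd.concat` for joining Series (the names `combined` and `trimmed` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s1 = pd.Series([5, 6], index=["e", "f"])

combined = pd.concat([s, s1])          # joins the two Series end to end
trimmed = combined.drop(["c", "d"])    # returns a new Series; combined is unchanged
first_value = combined.iloc[0]         # access by position
e_value = combined["e"]                # access by label

print(list(combined.index))
print(list(trimmed.index))
```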
DataFrame create, read, update, and delete operations
# Build a DataFrame
data=[[1,2,3,4],[5,6,7,8]]
index=['a','b']
columns=['A','B','C','D']
dff=pd.DataFrame(data=data,index=index,columns=columns)
# Query with loc (index label) and iloc (integer position)
dff1=dff.iloc[1]
dff1=dff.loc['a']
dff1
A 1
B 2
C 3
D 4
Name: a, dtype: int64
# Modify a value (a single .loc call avoids chained assignment, which may write to a temporary copy)
dff.loc['a','A']=1000
dff
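When setting a single cell, passing the row and column to one indexer call is the reliable form, because two consecutive brackets may end up modifying a temporary object instead of the frame. A short sketch on the same 2x4 data; `.at` is shown as the scalar-only counterpart of `.loc`.

```python
import pandas as pd

dff = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["a", "b"], columns=["A", "B", "C", "D"])

dff.loc["a", "A"] = 1000   # row label and column label in one call
dff.at["b", "D"] = 80      # .at reads or writes exactly one cell

print(dff)
```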
# Change the index
dff.index=['m','n']
dff
# Add a row
dff.loc['o']=[10,11,12,13]
dff
  | A | B | C | D |
m | 1000 | 2 | 3 | 4 |
n | 5 | 6 | 7 | 8 |
o | 10 | 11 | 12 | 13 |
# Add a column
dff['E']=[5,9,14]
dff
  | A | B | C | D | E |
m | 1000 | 2 | 3 | 4 | 5 |
n | 5 | 6 | 7 | 8 | 9 |
o | 10 | 11 | 12 | 13 | 14 |
# Add several columns at once
df4=pd.DataFrame([[6,10,15],[7,11,16],[8,12,17]],index=['m','n','o'],columns=['F','M','N'])
df5=pd.concat([dff,df4],axis=1)
df5
  | A | B | C | D | E | F | M | N |
m | 1000 | 2 | 3 | 4 | 5 | 6 | 10 | 15 |
n | 5 | 6 | 7 | 8 | 9 | 7 | 11 | 16 |
o | 10 | 11 | 12 | 13 | 14 | 8 | 12 | 17 |
# Delete a row
df5.drop(['o'],axis=0,inplace=True)
df5
  | A | B | C | D | E | F | M | N |
m | 1000 | 2 | 3 | 4 | 5 | 6 | 10 | 15 |
n | 5 | 6 | 7 | 8 | 9 | 7 | 11 | 16 |
# Delete columns
df5.drop(['E','F'],axis=1,inplace=True)
df5
  | A | B | C | D | M | N |
m | 1000 | 2 | 3 | 4 | 10 | 15 |
n | 5 | 6 | 7 | 8 | 11 | 16 |
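Two points about the column-wise concat and the drops are worth illustrating: concat with axis=1 lines rows up by index label and fills gaps with NaN, and drop can return a new frame instead of modifying in place. A minimal sketch with hypothetical frames `left` and `right`:

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
left = pd.DataFrame({"A": [1, 5]}, index=["m", "n"])
right = pd.DataFrame({"F": [6, 7, 8]}, index=["m", "n", "o"])

wide = pd.concat([left, right], axis=1)                # rows matched by label; row o has no A value
smaller = wide.drop(columns=["F"]).drop(index=["o"])   # each drop returns a new DataFrame

print(wide)
print(smaller)
```

Because row o exists only in `right`, its A value is NaN and the A column becomes float64 in `wide`.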