导入模块:
>>> import numpy as npy
>
>算他的bmi,来分析是否需要减肥
>>> stu['BIM']=stu['体重']/(npy.square(stu['身高']/100))
>>> stu
性别 年龄 身高 体重 省份 成绩 月生活费 课程兴趣 案例教学 BIM
序号
1 male 20.0 170 70.0 LiaoNing NaN 800.0 5 4 24.221453
2 male 22.0 180 71.0 GuangXi 77.0 1300.0 3 4 21.913580
3 male NaN 180 62.0 FuJian 57.0 1000.0 2 4 19.135802
4 male 20.0 177 72.0 LiaoNing 79.0 900.0 4 4 22.981902
5 male 20.0 172 NaN ShanDong 91.0 NaN 5 5 NaN
6 male 20.0 179 75.0 YunNan 92.0 950.0 5 5 23.407509
7 female 21.0 166 53.0 LiaoNing 80.0 1200.0 4 5 19.233561
8 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
9 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
10 male 19.0 169 76.0 HeiLongJiang 88.0 1100.0 5 5 26.609713
>>>
相关的函数:
对列表进行计算统计分析:
stu['BIM']=stu['体重']/(npy.square(stu['身高']/100))
>>> stu
性别 年龄 身高 体重 省份 成绩 月生活费 课程兴趣 案例教学 BIM
序号
1 male 20.0 170 70.0 LiaoNing NaN 800.0 5 4 24.221453
2 male 22.0 180 71.0 GuangXi 77.0 1300.0 3 4 21.913580
3 male NaN 180 62.0 FuJian 57.0 1000.0 2 4 19.135802
4 male 20.0 177 72.0 LiaoNing 79.0 900.0 4 4 22.981902
5 male 20.0 172 NaN ShanDong 91.0 NaN 5 5 NaN
6 male 20.0 179 75.0 YunNan 92.0 950.0 5 5 23.407509
7 female 21.0 166 53.0 LiaoNing 80.0 1200.0 4 5 19.233561
8 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
9 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
10 male 19.0 169 76.0 HeiLongJiang 88.0 1100.0 5 5 26.609713
>>> #对成绩和月生活费进行分析统计
>>> stu['成绩'].mean()
80.0
>>> stu['月生活费'].quantile()
1000.0
>>> stu
性别 年龄 身高 体重 省份 成绩 月生活费 课程兴趣 案例教学 BIM
序号
1 male 20.0 170 70.0 LiaoNing NaN 800.0 5 4 24.221453
2 male 22.0 180 71.0 GuangXi 77.0 1300.0 3 4 21.913580
3 male NaN 180 62.0 FuJian 57.0 1000.0 2 4 19.135802
4 male 20.0 177 72.0 LiaoNing 79.0 900.0 4 4 22.981902
5 male 20.0 172 NaN ShanDong 91.0 NaN 5 5 NaN
6 male 20.0 179 75.0 YunNan 92.0 950.0 5 5 23.407509
7 female 21.0 166 53.0 LiaoNing 80.0 1200.0 4 5 19.233561
8 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
9 female 20.0 162 47.0 AnHui 78.0 1000.0 4 4 17.908855
10 male 19.0 169 76.0 HeiLongJiang 88.0 1100.0 5 5 26.609713
>>> stu['月生活费'].sum()
9250.0
>>> s=stu['月生活费'].sum()
>>> s
9250.0
>>> s/10
925.0
>>> s/9
1027.7777777777778
用describe 进行统计分析:
>>> stu[['身高','体重','成绩']].describe()
身高 体重 成绩
count 10.000000 9.000000 9.000000
mean 171.700000 63.666667 80.000000
std 7.071853 11.811012 10.464225
min 162.000000 47.000000 57.000000
25% 166.750000 53.000000 78.000000
50% 171.000000 70.000000 79.000000
75% 178.500000 72.000000 88.000000
max 180.000000 76.000000 92.000000
这是一个分组分析:
grouped = stu.groupby(['性别','年龄'])
>>> grouped.aggregate({'身高':npy.mean,'月生活费':npy.max})
身高 月生活费
性别 年龄
female 20.0 162.0 1000.0
21.0 166.0 1200.0
male 19.0 169.0 1100.0
20.0 174.5 950.0
22.0 180.0 1300.0
相关性的分析,越接近1表示越正相关,越接近0表示不相关;
执行一个案例:
#这是一个调查反馈分析表:
#导入方法:
import pandas as pad
import numpy as npy
#读取表格:
data1 = pad.read_excel("d:\\resu\\python_test\\练习文件\\例题源程序-学生\\data\\studentsInfo.xlsx",'Group1',index_col = 0)
data2 = pad.read_excel("d:\\resu\\python_test\\练习文件\\例题源程序-学生\\data\\studentsInfo.xlsx",'Group2',index_col = 0)
data3 = pad.read_excel("d:\\resu\\python_test\\练习文件\\例题源程序-学生\\data\\studentsInfo.xlsx",'Group3',index_col = 0)
data4 = pad.read_excel("d:\\resu\\python_test\\练习文件\\例题源程序-学生\\data\\studentsInfo.xlsx",'Group4',index_col = 0)
data5 = pad.read_excel("d:\\resu\\python_test\\练习文件\\例题源程序-学生\\data\\studentsInfo.xlsx",'Group5',index_col = 0)
#按行追加:
stu = pad.concat([data1,data2,data3,data4,data5],axis=0)
print('data size:',stu.shape()
去重:
stu.drop_duplicates(inplace = True)
>>> print('data size after drop:',stu.shape)
data size after drop: (49, 9)
去空值:
print("nan columns:\n",stu.isnull().any())
nan columns:
性别 False
年龄 True
身高 False
体重 True
省份 False
成绩 True
月生活费 True
课程兴趣 False
案例教学 False
填充空值:
>>> stu.fillna({'年龄':20,'成绩':stu['成绩'].mean()},inplace = True)
>>> print("nan columns:\n",stu.isnull().any())
nan columns:
性别 False
年龄 False
身高 False
体重 True
省份 False
成绩 False
月生活费 True
课程兴趣 False
案例教学 False
dtype: bool
查看统计信息
stu_grade = stu.sort_values(by = '成绩',ascending = False )
>>> ex = (stu_grade['成绩']>=90).sum()
>>> fail = (stu_grade['成绩']<60).sum()
>>> print("excellent:{},fail:{}".format(ex,fail))
excellent:10,fail:4
常见统计函数:
转载自原文链接, 如需删除请联系管理员。
原文链接:统计相关的分析(表格),转载请注明来源!