Classify_breast_cancer_with_sklearn
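sklearn ships a dataset loaded with load_breast_cancer: 569 samples, each with 30 features.
I plan to try svm.SVC on it. The main parameters I vary are kernel and gamma: kernel is set to rbf and sigmoid, and gamma to auto and scale.
My impressions after the experiment:
sigmoid scores slightly lower, while rbf does a little better.
Setting 'gamma' to auto or scale gives the same final score.
Oddly, the sigmoid train score is 0.03 (3%) lower than its test score. Why?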
(569, 30)
(569,)
['malignant' 'benign']
X data shape:(569, 30); no. posi...
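The kernel and gamma comparison described above can be sketched roughly as follows. This is a minimal sketch, not the post's own code: the train/test split, the random_state, and the StandardScaler step are all assumptions, since the excerpt does not show how the data was prepared. For reference, sklearn defines gamma='auto' as 1 / n_features and gamma='scale' as 1 / (n_features * X.var()), so on standardized features the two values nearly coincide, which is one possible reason for identical scores.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Assumed split; the original post may have used different settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

scores = {}
for kernel in ("rbf", "sigmoid"):
    for gamma in ("auto", "scale"):
        # Scaling is an assumption, not something the excerpt confirms.
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma=gamma))
        model.fit(X_train, y_train)
        scores[(kernel, gamma)] = (
            model.score(X_train, y_train),
            model.score(X_test, y_test),
        )

for (kernel, gamma), (train_score, test_score) in scores.items():
    print(f"kernel={kernel:8s} gamma={gamma:6s} "
          f"train={train_score:.3f} test={test_score:.3f}")
```

Printing train and test scores side by side makes it easy to spot cases like the sigmoid one, where the train score falls below the test score.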
Install_tensorflow2.1.0_in_venv
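I wanted a TensorFlow build that uses my CPU's instruction sets, to get rid of this message: Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
So I reinstalled TensorFlow inside a python3.5 venv, using a binary that someone else had already built.
The steps: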
#create venv
python3.5 -m venv TensorEnv
#install tensorflow in venv
pip3.5 install https://github.com/lakshayg/tenso...
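Before picking a prebuilt binary, it can help to confirm which of the instruction sets named in the message the CPU actually reports. The sketch below assumes an x86 Linux machine where /proc/cpuinfo has a "flags" line; the helper names are hypothetical and are not part of TensorFlow.

```python
from pathlib import Path

# Names from the TensorFlow message mapped to /proc/cpuinfo flag names.
# The Linux kernel reports SSE3 as "pni".
TF_TO_CPUINFO = {
    "SSE3": "pni",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_instruction_sets(cpuinfo_text):
    """Map each instruction-set name to True/False for this cpuinfo text."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in TF_TO_CPUINFO.items()}


def report(path="/proc/cpuinfo"):
    """Print one line per instruction set for the local machine."""
    text = Path(path).read_text()
    for name, ok in supported_instruction_sets(text).items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Calling report() on the target machine prints a yes/no line for each instruction set.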
Kali_install_tensorflow
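I'm starting to play with TensorFlow. The Kali Linux repositories do not carry tensorflow for now.
apt-cache search tensorflow only turns up python3-keras-preprocessing, so pip is the only way to install it.
I had installed tensorflow 1.x before, so this counts as an upgrade.
Install command: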
pip3 install tensorflow==2.1.0
Problems I ran into:
The python3-wrapt package is too old. Remove the copy installed with apt-get and reinstall it with pip3.
apt-get remove python3-wrapt
The grpcio version is too old and needs to be upgraded with pip3.
ERROR: tensorboard 2.1.1 has requirement grpcio...
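When chasing version conflicts like the ones above, it may help to list what is actually installed. A minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper names are hypothetical, and the package list simply repeats the names mentioned in this post.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_versions(packages=("tensorflow", "tensorboard", "wrapt", "grpcio")):
    """Collect the installed version (or None) for each package name."""
    return {name: installed_version(name) for name in packages}
```

Printing report_versions() before and after the pip3 upgrades shows whether the apt-installed copies were really replaced.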
Logistic_regression
Logistic regression is, put simply, a binary classifier whose label follows a Bernoulli distribution.
The code:
# coding: utf-8
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
raw_data=pd.read_csv('data-analysis/python-jupyter/2.01. Admittance.csv')
type(raw_data)
print(raw_data.columns)
data=raw_data.copy()
data['Admitted']=raw_data['Admitted'].m...
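To make the Bernoulli remark concrete: the model turns a linear score into a probability with the sigmoid function, and the coefficients are chosen to maximise the Bernoulli log-likelihood of the observed labels. The sketch below illustrates this with made-up inputs and hypothetical coefficients b0 and b1; it does not use the Admittance data.

```python
import numpy as np


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def bernoulli_log_likelihood(y, p):
    """Sum of y*log(p) + (1-y)*log(1-p) over all samples."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Made-up data and hypothetical coefficients, for illustration only.
x = np.array([-2.0, 0.0, 2.0])
y = np.array([0, 1, 1])
b0, b1 = 0.0, 1.0

p = sigmoid(b0 + b1 * x)          # predicted P(y = 1 | x)
ll = bernoulli_log_likelihood(y, p)
print("probabilities:", p)
print("log-likelihood:", ll)
```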
Feature_selection
Noting two blogs I have been reading lately:
刘建平Pinard
Machine Learning Mastery
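This post grew out of Feature Selection For Machine Learning in Python.
The data file is again pima-indians-diabetes.csv from the previous post.
On top of the four methods in that article, I added one more, RandomForestClassifier,
but the RandomForestClassifier and ExtraTreesClassifier results are very close.
The code: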
# coding: utf-8
import pandas as pd, numpy as np
import statsmodels.api as sm
import statsm...
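The comparison between RandomForestClassifier and ExtraTreesClassifier can be sketched through their feature_importances_. Because pima-indians-diabetes.csv is not included here, this sketch uses synthetic data from make_classification. That substitution, the number of trees, and the random_state values are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic stand-in for the real data set: 8 features, of which the
# first 3 are informative (shuffle=False keeps them in the first columns).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
extra = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances from each model are normalised to sum to 1.
for i, (rf_imp, et_imp) in enumerate(
    zip(forest.feature_importances_, extra.feature_importances_)
):
    print(f"feature {i}: random forest={rf_imp:.3f} extra trees={et_imp:.3f}")
```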
Standardization_example
I wanted to practice standardization, and this post is the result.
Normalization vs. Standardization in theory
The terms normalization and standardization are sometimes used interchangeably, but they usually refer to different things.
Normalization usually means to scale a variable to have values between 0 and 1,
while standardization transforms data to have a mean of zero and a stan...
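The difference between the two is easy to see on a small made-up array. This is a minimal numpy sketch with hypothetical values; the standard deviation here is the population one (ddof=0), which is also what sklearn's StandardScaler uses.

```python
import numpy as np

# Made-up values, for illustration only.
values = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Normalization (min-max scaling): result lies between 0 and 1.
normalized = (values - values.min()) / (values.max() - values.min())

# Standardization (z-score): result has mean 0 and standard deviation 1.
standardized = (values - values.mean()) / values.std()

print("normalized:  ", normalized)
print("standardized:", standardized)
```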
Multiple_lenear_reg_sklearn
Multiple linear regression practice with sklearn. sklearn has no built-in method for p-values or adjusted R-squared, and no summary like the one in statsmodels, so these have to be put together by hand.
α: level of significance, commonly 0.05 or 0.01
(1-α): confidence level
With α = 0.05, a feature is judged significant at the 95% confidence level.
The aim is for each feature's p-value to be less than α.
The code:
# coding: utf-8
import stats...
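One way to get these numbers by hand is sketched below, on synthetic data. Adjusted R-squared uses the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1). For p-values the sketch uses sklearn's f_regression, which tests each feature against the target on its own, so its p-values are univariate and will generally differ from the per-coefficient p-values in a statsmodels summary. The data, the coefficients and the helper name adjusted_r2 are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic data: y depends on the first two features, not the third.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)
adj = adjusted_r2(r2, *X.shape)

# Univariate F-test of each feature against y.
f_stats, p_values = f_regression(X, y)

print(f"R-squared: {r2:.4f}  adjusted R-squared: {adj:.4f}")
for i, p in enumerate(p_values):
    print(f"feature {i}: p-value={p:.4g} significant at 0.05: {p < 0.05}")
```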
Linear_reg_sklearn
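This is linear regression with sklearn, which differs from the statsmodels version from two days ago. With
statsmodels.api, both the fit method and the predict method
need a const column added with sm.add_constant.
With statsmodels.formula.api, add_constant is not needed;
you only pass in an R-style formula string.
With sklearn's LinearRegression you can fit and predict directly. Just watch the shape of the arguments you pass in.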
# coding: utf-8
import numpy as np
import statsmodels.api as sm
import seab...
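The shape caveat is the part that most often trips people up: LinearRegression expects a 2-D feature array of shape (n_samples, n_features), so a single 1-D feature has to be reshaped first. A minimal sketch on made-up data (y = 2x + 1), not the data used in the post:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# A 1-D array of shape (5,) must become a column of shape (5, 1).
X = x.reshape(-1, 1)

# No constant column is needed; the intercept is fitted by default.
model = LinearRegression().fit(X, y)

# predict also expects a 2-D array, even for a single new sample.
pred = model.predict(np.array([[6.0]]))

print("coef:", model.coef_, "intercept:", model.intercept_)
print("prediction for x=6:", pred)
```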