Blog

Classify_breast_cancer_with_sklearn

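sklearn ships a breast cancer dataset, loaded with load_breast_cancer: 569 samples, each with 30 features. I decided to try svm.SVC on it. The main parameters I experimented with were kernel and gamma: kernel set to rbf and sigmoid, gamma set to auto and scale. My impressions after the experiments: sigmoid scores slightly lower and rbf is a little better. Setting gamma to auto or scale gives the same final score. What puzzles me is that with sigmoid the train score is 0.03 (3%) lower than the test score. Why? (569, 30) (569,) ['malignant' 'benign'] X data shape:(569, 30); no. posi...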
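The excerpt is cut off before the post's code, so here is only a minimal sketch of the experiment it describes, not the original script. The 75/25 split, the random_state and the StandardScaler step are my assumptions; the excerpt does not say whether the features were scaled.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the 569 x 30 feature matrix and the binary target.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: a stratified 75/25 split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumption: standardize the features, fitting the scaler on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Try every kernel/gamma combination mentioned in the post.
results = {}
for kernel in ("rbf", "sigmoid"):
    for gamma in ("auto", "scale"):
        clf = SVC(kernel=kernel, gamma=gamma).fit(X_train_s, y_train)
        results[(kernel, gamma)] = (
            clf.score(X_train_s, y_train),
            clf.score(X_test_s, y_test),
        )

for (kernel, gamma), (train_score, test_score) in results.items():
    print(f"{kernel:8s} {gamma:6s} train={train_score:.3f} test={test_score:.3f}")
```

In current sklearn, gamma='auto' means 1 / n_features and gamma='scale' means 1 / (n_features * X.var()). On standardized data X.var() is close to 1, so the two settings nearly coincide, which may be one reason the scores match.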

Read more

Install_tensorflow2.1.0_in_venv

I wanted a TensorFlow build that uses my CPU's instruction sets, to get rid of this warning: Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA. So I reinstalled TensorFlow in a python3.5 venv, using a binary that someone else had already built. The steps are as follows:  #create venv python3.5 -m venv TensorEnv #install tensorflow in venv pip3.5 install https://github.com/lakshayg/tenso...

Read more

Kali_install_tensorflow

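I've started playing with tensorflow. The Kali linux repositories don't carry tensorflow for now; apt-cache search tensorflow only turns up python3-keras-preprocessing, so pip is the only way to install it. I had tensorflow 1.x installed before, so this counts as an upgrade. Install command: pip3 install tensorflow==2.1.0 Problems I ran into: the python3-wrapt package is too old, so the copy installed with apt-get has to be removed and reinstalled with pip3: apt-get remove python3-wrapt. The grpcio version is also too old and needs to be upgraded with pip3. ERROR: tensorboard 2.1.1 has requirement grpcio...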

Read more

Logistic_regression

logistic_regression is, put simply, a binary classifier whose target follows a Bernoulli distribution. The code is as follows: # coding: utf-8 import seaborn as sns import matplotlib.pyplot as plt import pandas as pd import numpy as np raw_data=pd.read_csv('data-analysis/python-jupyter/2.01. Admittance.csv') type(raw_data) print(raw_data.columns) data=raw_data.copy() data['Admitted']=raw_data['Admitted'].m...
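To make the Bernoulli remark concrete, here is a small sketch that is not taken from the post's code: the model maps a linear score to a probability with the sigmoid, and fitting maximizes the Bernoulli log-likelihood of the 0/1 labels. The function names are mine.

```python
import numpy as np


def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 labels y under predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


# Hypothetical scores and labels, purely for illustration.
scores = np.array([-2.0, 0.0, 3.0])
labels = np.array([0, 1, 1])
probs = sigmoid(scores)
print(probs, bernoulli_log_likelihood(labels, probs))
```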

Read more

Feature_selection

A note on two blogs I have been reading lately: 刘建平Pinard and Machine Learning Mastery. This post is based on Feature Selection For Machine Learning in Python, and the data file is again pima-indians-diabetes.csv from the previous post. On top of his four methods I added a RandomForestClassifier, but the RandomForestClassifier and ExtraTreesClassifier results are very close. The code is as follows: # coding: utf-8 import pandas as pd, numpy as np import statsmodels.api as sm import statsm...
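A rough sketch of the RandomForestClassifier versus ExtraTreesClassifier comparison follows. The post uses pima-indians-diabetes.csv; since that file is not part of this excerpt, the sketch assumes synthetic data from make_classification, so the numbers will not match the post's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Assumption: synthetic stand-in data with 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Fit both tree ensembles with the same settings.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
extra = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Compare the impurity-based importances feature by feature.
for i, (rf_imp, et_imp) in enumerate(
    zip(forest.feature_importances_, extra.feature_importances_)
):
    print(f"feature {i}: random forest={rf_imp:.3f} extra trees={et_imp:.3f}")
```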

Read more

Standardization_example

I wanted to practice standardization, and this post is the result. Normalization vs. Standardization in theory: The terms normalization and standardization are sometimes used interchangeably, but they usually refer to different things. Normalization usually means to scale a variable to have values between 0 and 1, while standardization transforms data to have a mean of zero and a stan...
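The difference is easy to see on a tiny array. This is a minimal numpy sketch with made-up values, not the post's own code:

```python
import numpy as np

# Hypothetical sample values, purely for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])

# Normalization (min-max scaling): rescale into the range [0, 1].
normalized = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): shift to mean 0 and scale to standard deviation 1.
standardized = (x - x.mean()) / x.std()

print(normalized)
print(standardized.mean(), standardized.std())
```

sklearn offers the same two transforms as MinMaxScaler and StandardScaler.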

Read more

Multiple_linear_reg_sklearn

A multiple_linear_regression exercise with sklearn. sklearn has no ready-made method for computing p-values or adjusted R-squared, and no summary like the one in statsmodels, so they have to be built by hand. α: level of significance, commonly 0.05 or 0.01; (1-α): confidence level. With α = 0.05 we work at a 95% confidence level, and a feature is treated as significant when its p-value is less than α. The code is as follows: # coding: utf-8 import stats...
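One way to build these by hand is sketched below; it is not necessarily what the post does. It assumes synthetic data, computes adjusted R-squared from the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), and uses sklearn's f_regression for p-values. Note that f_regression tests each feature on its own, so its p-values are not the same as the multivariate ones in a statsmodels summary.

```python
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression

# Assumption: synthetic data where only the first feature drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

reg = LinearRegression().fit(X, y)

# R-squared from sklearn, then the adjustment for the number of features.
r2 = reg.score(X, y)
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Univariate F-test p-values, one per feature.
f_stats, p_values = f_regression(X, y)

alpha = 0.05
print(f"R2={r2:.4f} adjusted R2={adj_r2:.4f}")
print("significant at alpha=0.05:", p_values < alpha)
```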

Read more

Linear_reg_sklearn

This is linear regression with sklearn, and it differs from the statsmodels version of a couple of days ago. With statsmodels.api, both fit and predict need a const column added with sm.add_constant. With statsmodels.formula.api, add_constant is not needed and you only pass in an R-style formula string. With sklearn's LinearRegression you can fit and predict directly; just pay attention to the shape of the arguments you pass in. # coding: utf-8 import numpy as np import statsmodels.api as sm import seab...

Read more