Blog

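sklearn ships a breast cancer dataset, loaded with load_breast_cancer: 569 samples, each with 30 features. I decided to try svm.SVC on it. The main parameters I varied were kernel and gamma: kernel set to rbf and sigmoid, gamma set to auto and scale. My impressions from the experiment: sigmoid scores slightly lower and rbf does a little better. Setting gamma to auto or scale gave the same final score. What puzzles me is that with sigmoid the train score is 0.03 (3 percentage points) lower than the test score. Why? (569, 30) (569,) ['malignant' 'benign'] X data shape:(569, 30); no. posi...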
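
As a rough illustration of what the two gamma settings mean: per the scikit-learn documentation, gamma='auto' uses 1 / n_features and gamma='scale' uses 1 / (n_features * X.var()). The sketch below reproduces that arithmetic in plain Python. The helper name svc_gamma and the toy matrix are hypothetical, not taken from the post.

```python
from statistics import pvariance


def svc_gamma(X, mode):
    """Mimic how SVC derives gamma from the training matrix (illustrative helper)."""
    n_features = len(X[0])
    if mode == "auto":
        return 1.0 / n_features
    if mode == "scale":
        # X.var() in NumPy is the population variance over all entries of X
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    raise ValueError("mode must be 'auto' or 'scale'")


# Toy data (made up): two features on very different scales
X_toy = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
gamma_auto = svc_gamma(X_toy, "auto")
gamma_scale = svc_gamma(X_toy, "scale")
print(gamma_auto, gamma_scale)
```

The two values coincide only when the overall variance of X is 1, so on unscaled features the two settings usually correspond to quite different gamma values.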

I wanted a TensorFlow build that uses my CPU's instruction sets, to get rid of this warning: Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA So I reinstalled TensorFlow in a python3.5 venv, using a binary that someone else had already built. The steps: #create venv python3.5 -m venv TensorEnv #install tensorflow in venv pip3.5 install https://github.com/lakshayg/tenso...

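Starting to play with TensorFlow. The Kali Linux repositories do not have tensorflow yet; apt-cache search tensorflow only shows python3-keras-preprocessing, so pip is the only option. I had installed tensorflow 1.x before, so this counts as an upgrade. Install command: pip3 install tensorflow==2.1.0 Problems encountered: the python3-wrapt package is too old, so the copy installed with apt-get has to be removed and reinstalled with pip3: apt-get remove python3-wrapt The grpcio version is also too old and has to be upgraded with pip3. ERROR: tensorboard 2.1.1 has requirement grpcio...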

logistic_regression is, put plainly, a binary classifier whose outcome follows a Bernoulli distribution. The code: # coding: utf-8 import seaborn as sns import matplotlib.pyplot as plt import pandas as pd import numpy as np raw_data=pd.read_csv('data-analysis/python-jupyter/2.01. Admittance.csv') type(raw_data) print(raw_data.columns) data=raw_data.copy() data['Admitted']=raw_data['Admitted'].m...
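
To make the Bernoulli connection concrete, here is a minimal sketch in plain Python: the sigmoid turns a linear score into a probability, and the model is judged by the Bernoulli log-likelihood of the observed 0/1 labels. The data and coefficients are made up for illustration and are not from the Admittance file.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_log_likelihood(xs, ys, b0, b1):
    """Log-likelihood of 0/1 labels ys under p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Hypothetical toy data: larger x goes with label 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

ll_good = bernoulli_log_likelihood(xs, ys, b0=0.0, b1=2.0)   # slope matches the data
ll_bad = bernoulli_log_likelihood(xs, ys, b0=0.0, b1=-2.0)   # slope points the wrong way
print(ll_good, ll_bad)
```

Fitting a logistic regression amounts to searching for the coefficients that maximize this log-likelihood.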

A note on the two blogs I have been reading lately: 刘建平Pinard and Machine Learning Mastery. This post came out of Feature Selection For Machine Learning in Python, and it reuses the pima-indians-diabetes.csv data file from the previous post. On top of the four methods there I added RandomForestClassifier, although RandomForestClassifier and ExtraTreesClassifier give very similar results. The code: # coding: utf-8 import pandas as pd, numpy as np import statsmodels.api as sm import statsm...

I wanted to practice standardization, and this post is the result. Normalization vs. Standardization in theory: The terms normalization and standardization are sometimes used interchangeably, but they usually refer to different things. Normalization usually means to scale a variable to have values between 0 and 1, while standardization transforms data to have a mean of zero and a stan...
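
The difference is easy to see on a handful of numbers. This is a minimal plain-Python sketch, assuming min-max scaling for normalization and the population standard deviation for standardization (the convention sklearn's StandardScaler also uses). The sample values are made up.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature column
normalized = normalize(data)
standardized = standardize(data)
print(normalized)
print(standardized)
```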

Multiple linear regression practice with sklearn. sklearn has no ready-made way to compute p-values or adjusted R-squared, and no summary like the one statsmodels provides, so these have to be built by hand. α: level of significance, commonly 0.05 or 0.01; (1-α): confidence level. With α = 0.05 we work at a 95% confidence level, and a feature counts as significant when its p-value is less than α. The aim is for every p-value to be less than α. The code: # coding: utf-8 import stats...
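
For the adjusted R-squared part, the usual formula is 1 - (1 - R²)(n - 1)/(n - p - 1), with n samples and p features. A minimal sketch in plain Python follows; the function names and numbers are illustrative and not the post's own code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R-squared for the number of features used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical targets and predictions from a model with 2 features
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 4.9]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_features=2)
print(r2, adj_r2)
```

With a fitted sklearn model, the value returned by its score method on the training data could be passed in as r2.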

This is linear regression with sklearn, which works differently from the statsmodels version of a couple of days ago. With statsmodels.api, both fit and predict need a const column added via sm.add_constant. With statsmodels.formula.api, add_constant is not needed; you just pass an R-style formula string. With sklearn's LinearRegression you can fit and predict directly, as long as you mind the shape of the arguments you pass in. # coding: utf-8 import numpy as np import statsmodels.api as sm import seab...
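
Why the const column matters: statsmodels' OLS does not add an intercept on its own, while sklearn's LinearRegression fits one by default (fit_intercept=True). The plain-Python sketch below compares a one-feature least-squares fit with and without an intercept. The data are made up so that y = 2x + 1 exactly.

```python
def fit_with_intercept(xs, ys):
    """Least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(xs, ys):
    """Least squares for y = b1 * x (no constant column); returns b1."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

intercept, slope = fit_with_intercept(xs, ys)
slope_no_const = fit_through_origin(xs, ys)
print(intercept, slope, slope_no_const)
```

Without the constant the fitted line is forced through the origin, so the slope is distorted to compensate for the missing intercept.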

Classify_breast_cancer_with_sklearn

Install_tensorflow2.1.0_in_venv

Kali_install_tensorflow

Logistic_regression

Feature_selection

Standardization_example

Multiple_lenear_reg_sklearn

Linear_reg_sklearn