Np和pd中std函数默认参数不同引发的有趣现象

 

在看numpy和pandas的时候,发现了一个有趣的现象。由于std函数的默认参数不同所以求出来的standard deviation 是不同的。
看代码

    In [1]: import numpy as np

    In [2]: import pandas as pd

    In [3]: df5=pd.DataFrame(np.arange(9).reshape(3,3), columns=['a','b','c'])

    In [4]: df5
    Out[4]: 
       a  b  c
    0  0  1  2
    1  3  4  5
    2  6  7  8

    In [5]: df5.apply(np.std, axis=1)
    Out[5]: 
    0    0.816497
    1    0.816497
    2    0.816497
    dtype: float64

    In [6]: df5.std(axis=1)
    Out[6]: 
    0    1.0
    1    1.0
    2    1.0
    dtype: float64

为什么使用np.std 和pandas的std函数,得出的结果却不一样呢?
help一下。在numpy中ddof默认值是0.

std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
    Compute the standard deviation along the specified axis.
    
    Returns the standard deviation, a measure of the spread of a distribution,
    of the array elements. The standard deviation is computed for the
    flattened array by default, otherwise over the specified axis.
    
    Parameters
    ----------
    a : array_like
        Calculate the standard deviation of these values.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the standard deviation is computed. The
        default is to compute the standard deviation of the flattened array.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, a standard deviation is performed over
        multiple axes, instead of a single axis or all the axes as before.
    dtype : dtype, optional
        Type to use in computing the standard deviation. For arrays of
        integer type the default is float64, for arrays of float types it is
        the same as the array type.
    out : ndarray, optional
        Alternative output array in which to place the result. It must have
        the same shape as the expected output but the type (of the calculated
        values) will be cast if necessary.
    ddof : int, optional
        Means Delta Degrees of Freedom. The divisor used in calculations
        is ``N - ddof``, where ``N`` represents the number of elements.
        By default `ddof` is zero.

再看pandas中的help信息。ddof的默认值是1

    Help on method std in module pandas.core.frame:

    std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs) method of pandas.core.frame.DataFrame instance
    Return sample standard deviation over requested axis.
    
    Normalized by N-1 by default. This can be changed using the ddof argument
    
    Parameters
    ----------
    axis : {index (0), columns (1)}
    skipna : boolean, default True
        Exclude NA/null values. If an entire row/column is NA, the result
        will be NA
    level : int or level name, default None
        If the axis is a MultiIndex (hierarchical), count along a
        particular level, collapsing into a Series
    ddof : int, default 1
        Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
        where N represents the number of elements.

ddof: Delta Degrees of Freedom.
Degrees of Freedom 是自由度,Delta Degrees of Freedom是什么我是搞不清楚.
但是找到了原因,在调用np.std传入ddof的值就可以了。

    In [12]: df5.apply(np.std, axis=1, ddof=1)
    Out[12]:
    0 1.0
    1 1.0
    2 1.0
    dtype: float64

    In [13]: df5.std(axis=1, ddof=1)
    Out[13]:
    0 1.0
    1 1.0
    2 1.0
    dtype: float64