[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45. descriptive


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.1 Introduction to descriptive

Package descriptive contains a set of functions for making descriptive statistical computations and graphing. Together with the source code there are three data sets in your Maxima tree: pidigits.data, wind.data and biomed.data. They can be also downloaded from the web site http://www.biomates.net.

Any statistics manual can be used as a reference to the functions in package descriptive.

For comments, bugs or suggestions, please contact me at 'mario AT edu DOT xunta DOT es'.

Here is a simple example on how the descriptive functions in descriptive do they work, depending on the nature of their arguments, lists or matrices,

(%i1) load (descriptive)$
(%i2) /* univariate sample */   mean ([a, b, c]);
                            c + b + a
(%o2)                       ---------
                                3
(%i3) matrix ([a, b], [c, d], [e, f]);
                            [ a  b ]
                            [      ]
(%o3)                       [ c  d ]
                            [      ]
                            [ e  f ]
(%i4) /* multivariate sample */ mean (%);
                      e + c + a  f + d + b
(%o4)                [---------, ---------]
                          3          3

Note that in multivariate samples the mean is calculated for each column.

In case of several samples with possible different sizes, the Maxima function map can be used to get the desired results for each sample,

(%i1) load (descriptive)$
(%i2) map (mean, [[a, b, c], [d, e]]);
                        c + b + a  e + d
(%o2)                  [---------, -----]
                            3        2

In this case, two samples of sizes 3 and 2 were stored into a list.

Univariate samples must be stored in lists like

(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
(%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

and multivariate samples in matrices as in

(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
                        [ 13.17  9.29  ]
                        [              ]
                        [ 14.71  16.88 ]
                        [              ]
                        [ 18.5   16.88 ]
(%o1)                   [              ]
                        [ 10.58  6.63  ]
                        [              ]
                        [ 13.33  13.25 ]
                        [              ]
                        [ 13.21  8.12  ]

In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.

Data can be introduced by hand, but big samples are usually stored in plain text files. For example, file pidigits.data contains the first 100 digits of number %pi:

      3
      1
      4
      1
      5
      9
      2
      6
      5
      3 ...

In order to load these digits in Maxima,

(%i1) load (numericalio)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) length (s1);
(%o3)                          100

On the other hand, file wind.data contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (This is part of a data set taken at 12 meteorological stations. The original file is freely downloadable from the StatLib Data Repository and its analysis is discused in Haslett, J., Raftery, A. E. (1989) Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion. Applied Statistics 38, 1-50). This loads the data:

(%i1) load (numericalio)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) length (s2);
(%o3)                          100
(%i4) s2 [%]; /* last record */
(%o4)            [3.58, 6.0, 4.58, 7.62, 11.25]

Some samples contain non numeric data. As an example, file biomed.data (which is part of another bigger one downloaded from the StatLib Data Repository) contains four blood measures taken from two groups of patients, A and B, of different ages,

(%i1) load (numericalio)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) length (s3);
(%o3)                          100
(%i4) s3 [1]; /* first record */
(%o4)            [A, 30, 167.0, 89.0, 25.6, 364]

The first individual belongs to group A, is 30 years old and his/her blood measures were 167.0, 89.0, 25.6 and 364.

One must take care when working with categorical data. In the next example, symbol a is asigned a value in some previous moment and then a sample with categorical value a is taken,

(%i1) a : 1$
(%i2) matrix ([a, 3], [b, 5]);
                            [ 1  3 ]
(%o2)                       [      ]
                            [ b  5 ]

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.2 Functions and Variables for data manipulation

Function: continuous_freq (list)
Function: continuous_freq (list, m)

The argument of continuous_freq must be a list of numbers, which will be then grouped in intervals and counted how many of them belong to each group. Optionally, function continuous_freq admits a second argument indicating the number of classes, 10 is default,

(%i1) load (numericalio)$
(%i2) load (descriptive)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) continuous_freq (s1, 5);
(%o4) [[0, 1.8, 3.6, 5.4, 7.2, 9.0], [16, 24, 18, 17, 25]]

The first list contains the interval limits and the second the corresponding counts: there are 16 digits inside the interval [0, 1.8], that is 0's and 1's, 24 digits in (1.8, 3.6], that is 2's and 3's, and so on.

Function: discrete_freq (list)

Counts absolute frequencies in discrete samples, both numeric and categorical. Its unique argument is a list,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"));
(%o3) [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 
4, 6, 2, 6, 4, 3, 3, 8, 3, 2, 7, 9, 5, 0, 2, 8, 8, 4, 1, 9, 7, 
1, 6, 9, 3, 9, 9, 3, 7, 5, 1, 0, 5, 8, 2, 0, 9, 7, 4, 9, 4, 4, 
5, 9, 2, 3, 0, 7, 8, 1, 6, 4, 0, 6, 2, 8, 6, 2, 0, 8, 9, 9, 8, 
6, 2, 8, 0, 3, 4, 8, 2, 5, 3, 4, 2, 1, 1, 7, 0, 6, 7]
(%i4) discrete_freq (s1);
(%o4) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 
                             [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]

The first list gives the sample values and the second their absolute frequencies. Commands ? col and ? transpose should help you to understand the last input.

Function: subsample (data_matrix, logical_expression)
Function: subsample (data_matrix, logical_expression, col_num1, col_num2, ...)

This is a sort of variation of the Maxima submatrix function. The first argument is the name of the data matrix, the second is a quoted logical expression and optional additional arguments are the numbers of the columns to be taken. Its behaviour is better understood with examples,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) subsample (s2, '(%c[1] > 18));
              [ 19.38  15.37  15.12  23.09  25.25 ]
              [                                   ]
              [ 18.29  18.66  19.08  26.08  27.63 ]
(%o4)         [                                   ]
              [ 20.25  21.46  19.95  27.71  23.38 ]
              [                                   ]
              [ 18.79  18.96  14.46  26.38  21.84 ]

These are multivariate records in which the wind speeds in the first meteorological station were greater than 18. See that in the quoted logical expression the i-th component is refered to as %c[i]. Symbol %c[i] is used inside function subsample, therefore when used as a categorical variable, Maxima gets confused. In the following example, we request only the first, second and fifth components of those records with wind speeds greater or equal than 16 in station number 1 and lesser than 25 knots in station number 4,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) subsample (s2, '(%c[1] >= 16 and %c[4] < 25), 1, 2, 5);
                     [ 19.38  15.37  25.25 ]
                     [                     ]
                     [ 17.33  14.67  19.58 ]
(%o4)                [                     ]
                     [ 16.92  13.21  21.21 ]
                     [                     ]
                     [ 17.25  18.46  23.87 ]

Here is an example with the categorical variables of biomed.data. We want the records corresponding to those patients in group B who are older than 38 years,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) subsample (s3, '(%c[1] = B and %c[2] > 38));
                [ B  39  28.0  102.3  17.1  146 ]
                [                               ]
                [ B  39  21.0  92.4   10.3  197 ]
                [                               ]
                [ B  39  23.0  111.5  10.0  133 ]
                [                               ]
                [ B  39  26.0  92.6   12.3  196 ]
(%o4)           [                               ]
                [ B  39  25.0  98.7   10.0  174 ]
                [                               ]
                [ B  39  21.0  93.2   5.9   181 ]
                [                               ]
                [ B  39  18.0  95.0   11.3  66  ]
                [                               ]
                [ B  39  39.0  88.5   7.6   168 ]

Probably, the statistical analysis will involve only the blood measures,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) subsample (s3, '(%c[1] = B and %c[2] > 38), 3, 4, 5, 6);
                   [ 28.0  102.3  17.1  146 ]
                   [                        ]
                   [ 21.0  92.4   10.3  197 ]
                   [                        ]
                   [ 23.0  111.5  10.0  133 ]
                   [                        ]
                   [ 26.0  92.6   12.3  196 ]
(%o4)              [                        ]
                   [ 25.0  98.7   10.0  174 ]
                   [                        ]
                   [ 21.0  93.2   5.9   181 ]
                   [                        ]
                   [ 18.0  95.0   11.3  66  ]
                   [                        ]
                   [ 39.0  88.5   7.6   168 ]

This is the multivariate mean of s3,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) mean (s3);
       65 B + 35 A  317          6 NA + 8145.0
(%o4) [-----------, ---, 87.178, -------------, 18.123, 
           100      10                100
                                                    3 NA + 19587
                                                    ------------]
                                                        100

Here, the first component is meaningless, since A and B are categorical, the second component is the mean age of individuals in rational form, and the fourth and last values exhibit some strange behaviour. This is because symbol NA is used here to indicate non available data, and the two means are of course nonsense. A possible solution would be to take out from the matrix those rows with NA symbols, although this deserves some loss of information,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) mean(subsample(s3, '(%c[4] # NA and %c[6] # NA), 3,4,5,6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813, 
                                                            2514
                                                            ----]
                                                             13

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.3 Functions and Variables for descriptive statistics

Function: mean (list)
Function: mean (matrix)

This is the sample mean, defined as

                       n
                     ====
             _   1   \
             x = -    >    x
                 n   /      i
                     ====
                     i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mean (s1);
                               471
(%o4)                          ---
                               100
(%i5) %, numer;
(%o5)                         4.71
(%i6) s2 : read_matrix (file_search ("wind.data"))$
(%i7) mean (s2);
(%o7)     [9.9485, 10.1607, 10.8685, 15.7166, 14.8441]
Function: var (list)
Function: var (matrix)

This is the sample variance, defined as

                     n
                   ====
           2   1   \          _ 2
          s  = -    >    (x - x)
               n   /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) var (s1), numer;
(%o4)                   8.425899999999999

See also function var1.

Function: var1 (list)
Function: var1 (matrix)

This is the sample variance, defined as

                     n
                   ====
               1   \          _ 2
              ---   >    (x - x)
              n-1  /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) var1 (s1), numer;
(%o4)                    8.5110101010101
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) var1 (s2);
(%o6) [17.39586540404041, 15.13912778787879, 15.63204924242424, 
                            32.50152569696971, 24.66977392929294]

See also function var.

Function: std (list)
Function: std (matrix)

This is the the square root of function var, the variance with denominator n.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) std (s1), numer;
(%o4)                   2.902740084816414
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) std (s2);
(%o6) [4.149928523480858, 3.871399812729241, 3.933920277534866, 
                            5.672434260526957, 4.941970881136392]

See also functions var and std1.

Function: std1 (list)
Function: std1 (matrix)

This is the the square root of function var1, the variance with denominator n-1.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) std1 (s1), numer;
(%o4)                   2.917363553109228
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) std1 (s2);
(%o6) [4.17083509672109, 3.89090320978032, 3.953738641137555, 
                            5.701010936401517, 4.966867617451963]

See also functions var1 and std.

Function: noncentral_moment (list, k)
Function: noncentral_moment (matrix, k)

The non central moment of order k, defined as

                       n
                     ====
                 1   \      k
                 -    >    x
                 n   /      i
                     ====
                     i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) noncentral_moment (s1, 1), numer; /* the mean */
(%o4)                         4.71
(%i6) s2 : read_matrix (file_search ("wind.data"))$
(%i7) noncentral_moment (s2, 5);
(%o7) [319793.8724761506, 320532.1923892463, 391249.5621381556, 
                            2502278.205988911, 1691881.797742255]

See also function central_moment.

Function: central_moment (list, k)
Function: central_moment (matrix, k)

The central moment of order k, defined as

                    n
                  ====
              1   \          _ k
              -    >    (x - x)
              n   /       i
                  ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) central_moment (s1, 2), numer; /* the variance */
(%o4)                   8.425899999999999
(%i6) s2 : read_matrix (file_search ("wind.data"))$
(%i7) central_moment (s2, 3);
(%o7) [11.29584771375004, 16.97988248298583, 5.626661952750102, 
                             37.5986572057918, 25.85981904394192]

See also functions central_moment and mean.

Function: cv (list)
Function: cv (matrix)

The variation coefficient is the quotient between the sample standard deviation (std) and the mean,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) cv (s1), numer;
(%o4)                   .6193977819764815
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) cv (s2);
(%o6) [.4192426091090204, .3829365309260502, 0.363779605385983, 
                            .3627381836021478, .3346021393989506]

See also functions std and mean.

Function: mini (list)
Function: mini (matrix)

This is the minimum value of the sample list,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mini (s1);
(%o4)                           0
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mini (s2);
(%o6)             [0.58, 0.5, 2.67, 5.25, 5.17]

See also function maxi.

Function: maxi (list)
Function: maxi (matrix)

This is the maximum value of the sample list,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) maxi (s1);
(%o4)                           9
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) maxi (s2);
(%o6)          [20.25, 21.46, 20.04, 29.63, 27.63]

See also function mini.

Function: range (list)
Function: range (matrix)

The range is the difference between the extreme values.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) range (s1);
(%o4)                           9
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) range (s2);
(%o6)          [19.67, 20.96, 17.37, 24.38, 22.46]
Function: quantile (list, p)
Function: quantile (matrix, p)

This is the p-quantile, with p a number in [0, 1], of the sample list. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package descriptive.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) /* 1st and 3rd quartiles */
         [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
(%o4)                      [2.0, 7.25]
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) quantile (s2, 1/4);
(%o6)    [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
Function: median (list)
Function: median (matrix)

Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) median (s1);
                                9
(%o4)                           -
                                2
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) median (s2);
(%o6)         [10.06, 9.855, 10.73, 15.48, 14.105]

The median is the 1/2-quantile.

See also function quantile.

Function: qrange (list)
Function: qrange (matrix)

The interquartilic range is the difference between the third and first quartiles, quantile(list,3/4) - quantile(list,1/4),

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) qrange (s1);
                               21
(%o4)                          --
                               4
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) qrange (s2);
(%o6) [5.385, 5.572499999999998, 6.0225, 8.729999999999999, 
                                               6.650000000000002]

See also function quantile.

Function: mean_deviation (list)
Function: mean_deviation (matrix)

The mean deviation, defined as

                     n
                   ====
               1   \          _
               -    >    |x - x|
               n   /       i
                   ====
                   i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mean_deviation (s1);
                               51
(%o4)                          --
                               20
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mean_deviation (s2);
(%o6) [3.287959999999999, 3.075342, 3.23907, 4.715664000000001, 
                                               4.028546000000002]

See also function mean.

Function: median_deviation (list)
Function: median_deviation (matrix)

The median deviation, defined as

                 n
               ====
           1   \
           -    >    |x - med|
           n   /       i
               ====
               i = 1

where med is the median of list.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) median_deviation (s1);
                                5
(%o4)                           -
                                2
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) median_deviation (s2);
(%o6)           [2.75, 2.755, 3.08, 4.315, 3.31]

See also function mean.

Function: harmonic_mean (list)
Function: harmonic_mean (matrix)

The harmonic mean, defined as

                  n
               --------
                n
               ====
               \     1
                >    --
               /     x
               ====   i
               i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i4) harmonic_mean (y), numer;
(%o4)                   3.901858027632205
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) harmonic_mean (s2);
(%o6) [6.948015590052786, 7.391967752360356, 9.055658197151745, 
                            13.44199028193692, 13.01439145898509]

See also functions mean and geometric_mean.

Function: geometric_mean (list)
Function: geometric_mean (matrix)

The geometric mean, defined as

                 /  n      \ 1/n
                 | /===\   |
                 |  ! !    |
                 |  ! !  x |
                 |  ! !   i|
                 | i = 1   |
                 \         /

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i4) geometric_mean (y), numer;
(%o4)                   4.454845412337012
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) geometric_mean (s2);
(%o6) [8.82476274347979, 9.22652604739361, 10.0442675714889, 
                            14.61274126349021, 13.96184163444275]

See also functions mean and harmonic_mean.

Function: kurtosis (list)
Function: kurtosis (matrix)

The kurtosis coefficient, defined as

                    n
                  ====
            1     \          _ 4
           ----    >    (x - x)  - 3
              4   /       i
           n s    ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) kurtosis (s1), numer;
(%o4)                  - 1.273247946514421
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) kurtosis (s2);
(%o6) [- .2715445622195385, 0.119998784429451, 
   - .4275233490482866, - .6405361979019522, - .4952382132352935]

See also functions mean, var and skewness.

Function: skewness (list)
Function: skewness (matrix)

The skewness coefficient, defined as

                    n
                  ====
            1     \          _ 3
           ----    >    (x - x)
              3   /       i
           n s    ====
                  i = 1

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) skewness (s1), numer;
(%o4)                  .009196180476450306
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) skewness (s2);
(%o6) [.1580509020000979, .2926379232061854, .09242174416107717, 
                            .2059984348148687, .2142520248890832]

See also functions mean, var and kurtosis.

Function: pearson_skewness (list)
Function: pearson_skewness (matrix)

Pearson's skewness coefficient, defined as

                _
             3 (x - med)
             -----------
                  s

where med is the median of list.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) pearson_skewness (s1), numer;
(%o4)                   .2159484029093895
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) pearson_skewness (s2);
(%o6) [- .08019976629211892, .2357036272952649, 
         .1050904062491204, .1245042340592368, .4464181795804519]

See also functions mean, var and median.

Function: quartile_skewness (list)
Function: quartile_skewness (matrix)

The quartile skewness coefficient, defined as

               c    - 2 c    + c
                3/4      1/2    1/4
               --------------------
                   c    - c
                    3/4    1/4

where c_p is the p-quantile of sample list.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) quartile_skewness (s1), numer;
(%o4)                  .04761904761904762
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) quartile_skewness (s2);
(%o6) [- 0.0408542246982353, .1467025572005382, 
       0.0336239103362392, .03780068728522298, 0.210526315789474]

See also function quantile.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.4 Functions and Variables for specific multivariate descriptive statistics

Function: cov (matrix)

The covariance matrix of the multivariate sample, defined as

              n
             ====
          1  \           _        _
      S = -   >    (X  - X) (X  - X)'
          n  /       j        j
             ====
             j = 1

where X_j is the j-th row of the sample matrix.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) fpprintprec : 7$  /* change precision for pretty output */
(%i5) cov (s2);
      [ 17.22191  13.61811  14.37217  19.39624  15.42162 ]
      [                                                  ]
      [ 13.61811  14.98774  13.30448  15.15834  14.9711  ]
      [                                                  ]
(%o5) [ 14.37217  13.30448  15.47573  17.32544  16.18171 ]
      [                                                  ]
      [ 19.39624  15.15834  17.32544  32.17651  20.44685 ]
      [                                                  ]
      [ 15.42162  14.9711   16.18171  20.44685  24.42308 ]

See also function cov1.

Function: cov1 (matrix)

The covariance matrix of the multivariate sample, defined as

              n
             ====
         1   \           _        _
   S  = ---   >    (X  - X) (X  - X)'
    1   n-1  /       j        j
             ====
             j = 1

where X_j is the j-th row of the sample matrix.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) fpprintprec : 7$ /* change precision for pretty output */
(%i5) cov1 (s2);
      [ 17.39587  13.75567  14.51734  19.59216  15.5774  ]
      [                                                  ]
      [ 13.75567  15.13913  13.43887  15.31145  15.12232 ]
      [                                                  ]
(%o5) [ 14.51734  13.43887  15.63205  17.50044  16.34516 ]
      [                                                  ]
      [ 19.59216  15.31145  17.50044  32.50153  20.65338 ]
      [                                                  ]
      [ 15.5774   15.12232  16.34516  20.65338  24.66977 ]

See also function cov.

Function: global_variances (matrix)
Function: global_variances (matrix, logical_value)

Function global_variances returns a list of global variance measures:

where p is the dimension of the multivariate random variable and S_1 the covariance matrix returned by cov1.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) global_variances (s2);
(%o4) [105.338342060606, 21.06766841212119, 12874.34690469686, 
         113.4651792608502, 6.636590811800794, 2.576158149609762]

Function global_variances has an optional logical argument: global_variances(x,true) tells Maxima that x is the data matrix, making the same as global_variances(x). On the other hand, global_variances(x,false) means that x is not the data matrix, but the covariance matrix, avoiding its recalculation,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) s : cov1 (s2)$
(%i5) global_variances (s, false);
(%o5) [105.338342060606, 21.06766841212119, 12874.34690469686, 
         113.4651792608502, 6.636590811800794, 2.576158149609762]

See also cov and cov1.

Function: cor (matrix)
Function: cor (matrix, logical_value)

The correlation matrix of the multivariate sample.

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) fpprintprec:7$
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cor (s2);
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o5) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]

Function cor has an optional logical argument: cor(x,true) tells Maxima that x is the data matrix, making the same as cor(x). On the other hand, cor(x,false) means that x is not the data matrix, but the covariance matrix, avoiding its recalculation,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) fpprintprec:7$
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) s : cov1 (s2)$
(%i6) cor (s, false); /* this is faster */
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o6) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]

See also cov and cov1.

Function: list_correlations (matrix)
Function: list_correlations (matrix, logical_value)

Function list_correlations returns a list of correlation measures:

Example:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) z : list_correlations (s2)$
(%i5) fpprintprec : 5$ /* for pretty output */
(%i6) z[1];  /* precision matrix */
      [  .38486   - .13856   - .15626   - .10239    .031179  ]
      [                                                      ]
      [ - .13856   .34107    - .15233    .038447   - .052842 ]
      [                                                      ]
(%o6) [ - .15626  - .15233    .47296    - .024816  - .10054  ]
      [                                                      ]
      [ - .10239   .038447   - .024816   .10937    - .034033 ]
      [                                                      ]
      [ .031179   - .052842  - .10054   - .034033   .14834   ]
(%i7) z[2];  /* multiple correlation vector */
(%o7)       [.85063, .80634, .86474, .71867, .72675]
(%i8) z[3];  /* partial correlation matrix */
       [  - 1.0     .38244   .36627   .49908   - .13049 ]
       [                                                ]
       [  .38244    - 1.0    .37927  - .19907   .23492  ]
       [                                                ]
(%o8)  [  .36627    .37927   - 1.0    .10911    .37956  ]
       [                                                ]
       [  .49908   - .19907  .10911   - 1.0     .26719  ]
       [                                                ]
       [ - .13049   .23492   .37956   .26719    - 1.0   ]

Function list_correlations also has an optional logical argument: list_correlations(x,true) tells Maxima that x is the data matrix, making the same as list_correlations(x). On the other hand, list_correlations(x,false) means that x is not the data matrix, but the covariance matrix, avoiding its recalculation.

See also cov and cov1.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

45.5 Functions and Variables for statistical graphs

Function: dataplot (list)
Function: dataplot (list, option_1, option_2, ...)
Function: dataplot (matrix)
Function: dataplot (matrix, option_1, option_2, ...)

Funtion dataplot permits direct visualization of sample data, both univariate (list) and multivariate (matrix). Giving values to the following options some aspects of the plot can be controlled:

For example, with the following input a simple plot of the first twenty digits of %pi is requested.

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) dataplot (makelist (s1[k], k, 1, 20), 'pointstyle = 3)$

Note that one dimensional data are plotted as a time series. In the next case, same more data with different settings,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) dataplot (makelist (s1[k], k, 1, 50),
 'axisnames = ["digit order", "digit value"], 'pointstyle = 2,
 'maintitle = "First pi digits", 'joined = true)$

Function dataplot can be used to plot points in the plane. The next example is a scatterplot of the pairs of wind speeds corresponding to the first and fifth meteorological stations,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) dataplot (submatrix (s2, 2, 3, 4), 'pointstyle = 2,
 'maintitle = "Pairs of wind speeds measured in knots",
 'axisnames = ["Wind speed in A", "Wind speed in E"])$

If points are stored in a two column matrix, dataplot can plot them directly, but if they are formatted as a list of pairs, their must be transformed to a matrix as in the following example.

(%i1) load (descriptive)$
(%i2) x : [[-1, 2], [5, 7], [5, -3], [-6, -9], [-4, 6]]$
(%i3) dataplot (apply ('matrix, x), 'maintitle = "Points",
 'joined=true, 'axisnames=["", ""], 'picturescales=[0.5, 1.0])$

Points in three dimensional space can be seen as a projection on the plane. In this example, plots of wind speeds corresponding to three meteorological stations are requested, first in a 3D plot and then in a multivariate scatterplot.

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) /* 3D plot */
 dataplot (submatrix (s2, 4, 5), 'pointstyle = 2,
 'maintitle = "Pairs of wind speeds measured in knots",
 'axisnames = ["Station A", "Station B", "Station C"])$
(%i5) /* Multivariate scatterplot */
 dataplot (submatrix (s2, 4, 5), 'nclasses = 6, 'threedim = false)$

Note that in the last example, the number of classes in the histograms of the diagonal is set to 6, and that option 'threedim is set to false.

For more than three dimensions only multivariate scatterplots are possible, as in

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) dataplot (s2)$
Function: histogram (list)
Function: histogram (list, option_1, option_2, ...)
Function: histogram (one_column_matrix)
Function: histogram (one_column_matrix, option_1, option_2, ...)

This function plots an histogram. Sample data must be stored in a list of numbers or a one column matrix. Giving values to the following options some aspects of the plot can be controlled:

In the next two examples, histograms are requested for the first 100 digits of number %pi and for the wind speeds in the third meteorological station.

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) histogram (s1, 'maintitle = "pi digits",
 'axisnames = ["", "Absolute frequency"], 'relbarwidth = 0.2,
 'barcolor = 3, 'colorintensity = 0.6)$
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) histogram (col (s2, 3), 'colorintensity = 0.3)$

Note that in the first case, s1 is a list and in the second example, col(s2,3) is a matrix.

See also function barsplot.

Function: barsplot (list)
Function: barsplot (list, option_1, option_2, ...)
Function: barsplot (one_column_matrix)
Function: barsplot (one_column_matrix, option_1, option_2, ...)

Similar to histogram but for discrete, numeric or categorical, statistical variables. These are the options,

This example plots the barchart for groups A and B of patients in sample s3,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) barsplot (col (s3, 1), 'maintitle = "Groups of patients",
 'axisnames=["Group", "# of individuals"], 'colorintensity=0.2)$

The first column in sample s3 stores the categorical values A and B, also known sometimes as factors. On the other hand, the positive integer numbers in the second column are ages, in years, which is a discrete variable, so we can plot the absolute frequencies for these values,

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) barsplot (col (s3, 2), 'maintitle = "Ages",
 'axisnames = ["Years", "# of individuals"], 'colorintensity = 0.2,
 'relbarwidth = 0.6)$

See also function histogram.

Function: boxplot (data)
Function: boxplot (data, option_1, option_2, ...)

This function plots box diagrams. Argument data can be a list, which is not of great interest, since these diagrams are mainly used for comparing different samples, or a matrix, so it is possible to compare two or more components of a multivariate statistical variable. But it is also allowed data to be a list of samples with possible different sample sizes, in fact this is the only function in package descriptive that admits this type of data structure. See example bellow. These are the options,

Examples:

(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) boxplot (s2, 'maintitle = "Windspeed in knots",
 'axisnames = ["Seasons", ""])$
(%i5) A :
 [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
  [8, 10, 7, 9, 12, 8, 10],
  [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
(%i6) boxplot (A)$

[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Lucas Nussbaum on January, 19 2008 using texi2html 1.76.