!pip install statsmodels
Requirement already satisfied: statsmodels in /root/venv/lib/python3.7/site-packages (0.12.2)
Requirement already satisfied: scipy>=1.1 in /sharedlibs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.6.0)
Requirement already satisfied: numpy>=1.15 in /sharedlibs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.19.5)
Requirement already satisfied: patsy>=0.5 in /root/venv/lib/python3.7/site-packages (from statsmodels) (0.5.1)
Requirement already satisfied: pandas>=0.21 in /sharedlibs/python3.7/py/lib/python3.7/site-packages (from statsmodels) (1.2.1)
Requirement already satisfied: six in /sharedlibs/python3.7/pycore/lib/python3.7/site-packages (from patsy>=0.5->statsmodels) (1.15.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /sharedlibs/python3.7/pycore/lib/python3.7/site-packages (from pandas>=0.21->statsmodels) (2.8.1)
Requirement already satisfied: pytz>=2017.3 in /sharedlibs/python3.7/py/lib/python3.7/site-packages (from pandas>=0.21->statsmodels) (2021.1)
WARNING: You are using pip version 20.1.1; however, version 21.0.1 is available.
You should consider upgrading via the '/root/venv/bin/python -m pip install --upgrade pip' command.
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
from statsmodels.stats.anova import AnovaRM
from statsmodels.multivariate.manova import MANOVA
help(AnovaRM)
Help on class AnovaRM in module statsmodels.stats.anova:
class AnovaRM(builtins.object)
 AnovaRM(data, depvar, subject, within=None, between=None, aggregate_func=None)

 Repeated measures Anova using least squares regression

 The full model regression residual sum of squares is
 used to compare with the reduced model for calculating the
 within-subject effect sum of squares [1].

 Currently, only fully balanced within-subject designs are supported.
 Calculation of between-subject effects and corrections for violation of
 sphericity are not yet implemented.

 Parameters
 
 data : DataFrame
 depvar : str
 The dependent variable in `data`
 subject : str
 Specify the subject id
 within : list[str]
 The within-subject factors
 between : list[str]
 The between-subject factors, this is not yet implemented
 aggregate_func : {None, 'mean', callable}
 If the data set contains more than a single observation per subject
 and cell of the specified model, this function will be used to
 aggregate the data before running the Anova. `None` (the default) will
 not perform any aggregation; 'mean' is a shortcut to `numpy.mean`.
 An exception will be raised if aggregation is required, but no
 aggregation function was specified.

 Returns
 
 results : AnovaResults instance

 Raises
 
 ValueError
 If the data need to be aggregated, but `aggregate_func` was not
 specified.

 Notes
 
 This implementation currently only supports fully balanced designs. If the
 data contain more than one observation per subject and cell of the design,
 these observations need to be aggregated into a single observation
 before the Anova is calculated, either manually or by passing an aggregation
 function via the `aggregate_func` keyword argument.
 Note that if the input data set was not balanced before performing the
 aggregation, the implied heteroscedasticity of the data is ignored.

 References
 
 .. [*] Rutherford, Andrew. Anova and ANCOVA: a GLM approach. John Wiley & Sons, 2011.

 Methods defined here:

 __init__(self, data, depvar, subject, within=None, between=None, aggregate_func=None)
 Initialize self. See help(type(self)) for accurate signature.

 fit(self)
 estimate the model and compute the Anova table

 Returns
 
 AnovaResults instance

 
 Data descriptors defined here:

 __dict__
 dictionary for instance variables (if defined)

 __weakref__
 list of weak references to the object (if defined)
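The docstring above says that replicated observations per subject and cell must be aggregated before the ANOVA is computed. The following is a minimal sketch of that behaviour on synthetic data; the frame `toy` and its column names (`subj`, `cond`, `rt`) are hypothetical and are not taken from `run_results.csv`.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

# Hypothetical balanced design: 10 subjects x 3 conditions x 2 replicate trials.
subjects = [f"s{i}" for i in range(10)]
conditions = ["a", "b", "c"]
rows = [
    {"subj": s, "cond": c, "rt": rng.normal(loc=1.0 + 0.1 * k)}
    for s in subjects
    for k, c in enumerate(conditions)
    for _ in range(2)  # two observations per subject and cell
]
toy = pd.DataFrame(rows)

# Replicates must be collapsed to one value per subject and cell;
# aggregate_func='mean' asks AnovaRM to do that before fitting.
toy_result = AnovaRM(
    data=toy, depvar="rt", subject="subj", within=["cond"], aggregate_func="mean"
).fit()
print(toy_result.anova_table)
```

Leaving out `aggregate_func` on this frame should raise the `ValueError` described in the Raises section, because each subject has two rows per condition.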
df = pd.read_csv('run_results.csv')
df.head()
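# Reshape the wide table (one row per run, columns '0'..'6') into long format.
cleaned_df = pd.DataFrame()
cleaned_df['name'] = df['name'].loc[df['name'].index.repeat(7)]  # 7 rows per run
cleaned_df = cleaned_df.reset_index()
cleaned_df['within'] = np.tile([0, 1, 2, 3, 4, 5, 6], 72)  # response column id, for the 72 rows of df
cleaned_df['response'] = pd.Series(df[['0','1','2','3','4','5','6']].values.reshape(1, -1)[0])
#cleaned_df = cleaned_df.drop(columns=['index'])
# Treat each (name, response column) pair as one subject.
cleaned_df['name'] = cleaned_df['name'] + '-' + cleaned_df['within'].astype(str)
# Run number (0, 1, 2) within each name; assumes 24 names with 3 consecutive rows each.
idx = np.concatenate((np.tile([0], 7), np.tile([1], 7), np.tile([2], 7)))
cleaned_df['index'] = np.tile([idx], 24).reshape(1, -1)[0]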
#perform the repeated measures ANOVA
result = AnovaRM(data=cleaned_df, depvar='response', subject='name', within=['index']).fit()
print(result)
               Anova
====================================
      F Value Num DF  Den DF  Pr > F
------------------------------------
index  2.5914 2.0000 334.0000 0.0764
====================================
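The wide-to-long reshaping used before this ANOVA can also be written with `DataFrame.melt`, which avoids hard-coding the row counts. This is only a sketch on a tiny hypothetical frame (`wide`, with two names, three runs each, and three response columns); the `'-'` separator and the `run` and `subject` column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical wide table: two names, three runs each, three response columns.
wide = pd.DataFrame({
    "name": ["a", "a", "a", "b", "b", "b"],
    "0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "1": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    "2": [2.1, 2.2, 2.3, 2.4, 2.5, 2.6],
})

# Number the repeated runs within each name (0, 1, 2).
wide["run"] = wide.groupby("name").cumcount()

# One row per (name, run, response column).
long = wide.melt(id_vars=["name", "run"], var_name="within", value_name="response")

# Each (name, response column) pair becomes one subject measured once per run.
long["subject"] = long["name"] + "-" + long["within"]
print(long.sort_values(["subject", "run"]).head())
```

With this layout, `subject='subject'` and `within=['run']` would play the roles that `name` and `index` play in the call above.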
help(MANOVA)
Help on class MANOVA in module statsmodels.multivariate.manova:
class MANOVA(statsmodels.base.model.Model)
 MANOVA(endog, exog, missing='none', hasconst=None, **kwargs)

 Multivariate Analysis of Variance

 The implementation of MANOVA is based on multivariate regression and does
 not assume that the explanatory variables are categorical. Any type of
 variables as in regression is allowed.

 Parameters
 
 endog : array_like
 Dependent variables. A nobs x k_endog array where nobs is
 the number of observations and k_endog is the number of dependent
 variables.
 exog : array_like
 Independent variables. A nobs x k_exog array where nobs is the
 number of observations and k_exog is the number of independent
 variables. An intercept is not included by default and should be added
 by the user. Models specified using a formula include an intercept by
 default.

 Attributes
 
 endog : ndarray
 See Parameters.
 exog : ndarray
 See Parameters.

 Notes
 
 MANOVA is used through the `mv_test` function, and `fit` is not used.

 The ``from_formula`` interface is the recommended method to specify
 a model and simplifies testing without needing to manually configure
 the contrast matrices.

 References
 
 .. [*] ftp://public.dhe.ibm.com/software/analytics/spss/documentation/
 statistics/20.0/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf

 Method resolution order:
 MANOVA
 statsmodels.base.model.Model
 builtins.object

 Methods defined here:

 __init__(self, endog, exog, missing='none', hasconst=None, **kwargs)
 Initialize self. See help(type(self)) for accurate signature.

 fit(self)
 Fit a model to data.

 mv_test(self, hypotheses=None)
 Linear hypotheses testing

 Parameters
 
 hypotheses : list[tuple]
 Hypothesis `L*B*M = C` to be tested where B is the parameters in
 regression Y = X*B. Each element is a tuple of length 2, 3, or 4:

 * (name, contrast_L)
 * (name, contrast_L, transform_M)
 * (name, contrast_L, transform_M, constant_C)

 containing a string `name`, the contrast matrix L, the transform
 matrix M (for transforming dependent variables), and right-hand side
 constant matrix constant_C, respectively.

 contrast_L : 2D array or an array of strings
 Left-hand side contrast matrix for hypotheses testing.
 If 2D array, each row is an hypotheses and each column is an
 independent variable. At least 1 row
 (1 by k_exog, the number of independent variables) is required.
 If an array of strings, it will be passed to
 patsy.DesignInfo().linear_constraint.

 transform_M : 2D array or an array of strings or None, optional
 Left hand side transform matrix.
 If `None` or left out, it is set to a k_endog by k_endog
 identity matrix (i.e. do not transform y matrix).
 If an array of strings, it will be passed to
 patsy.DesignInfo().linear_constraint.

 constant_C : 2D array or None, optional
 Right-hand side constant matrix.
 If `None` or left out, it is set to a matrix of zeros.
 Must have the same number of rows as contrast_L and the same
 number of columns as transform_M

 If `hypotheses` is None: 1) the effect of each independent variable
 on the dependent variables will be tested. Or 2) if model is created
 using a formula, `hypotheses` will be created according to
 `design_info`. 1) and 2) is equivalent if no additional variables
 are created by the formula (e.g. dummy variables for categorical
 variables and interaction terms)


 Returns
 
 results: MultivariateTestResults

 Notes
 
 Testing the linear hypotheses

 L * params * M = 0

 where `params` is the regression coefficient matrix for the
 linear model y = x * params

 If the model is not specified using the formula interface, then the
 hypotheses test each included exogenous variable, one at a time. In
 most applications with categorical variables, the ``from_formula``
 interface should be preferred when specifying a model since it
 provides knowledge about the model when specifying the hypotheses.

 
 Methods inherited from statsmodels.base.model.Model:

 predict(self, params, exog=None, *args, **kwargs)
 After a model has been fit predict returns the fitted values.

 This is a placeholder intended to be overwritten by individual models.

 
 Class methods inherited from statsmodels.base.model.Model:

 from_formula(formula, data, subset=None, drop_cols=None, *args, **kwargs) from builtins.type
 Create a Model from a formula and dataframe.

 Parameters
 
 formula : str or generic Formula object
 The formula specifying the model.
 data : array_like
 The data for the model. See Notes.
 subset : array_like
 An array-like object of booleans, integers, or index values that
 indicate the subset of df to use in the model. Assumes df is a
 `pandas.DataFrame`.
 drop_cols : array_like
 Columns to drop from the design matrix. Cannot be used to
 drop terms involving categoricals.
 *args
 Additional positional argument that are passed to the model.
 **kwargs
 These are passed to the model with one exception. The
 ``eval_env`` keyword is passed to patsy. It can be either a
 :class:`patsy:patsy.EvalEnvironment` object or an integer
 indicating the depth of the namespace to use. For example, the
 default ``eval_env=0`` uses the calling namespace. If you wish
 to use a "clean" environment set ``eval_env=1``.

 Returns
 
 model
 The model instance.

 Notes
 
 data must define __getitem__ with the keys in the formula terms
 args and kwargs are passed on to the model instantiation. E.g.,
 a numpy structured or rec array, a dictionary, or a pandas DataFrame.

 
 Data descriptors inherited from statsmodels.base.model.Model:

 __dict__
 dictionary for instance variables (if defined)

 __weakref__
 list of weak references to the object (if defined)

 endog_names
 Names of endogenous variables.

 exog_names
 Names of exogenous variables.
n_samples = 20
n_dim = 5
n_classes = 3
X = np.random.randn(n_samples, n_dim)
y = np.random.randint(n_classes, size=n_samples)
print(X.shape)
print(y.shape)
manova = MANOVA(endog=X, exog=y)
print(manova.mv_test())
(20, 5)
(20,)
                 Multivariate linear model
============================================================

------------------------------------------------------------
           x0           Value  Num DF  Den DF F Value Pr > F
------------------------------------------------------------
          Wilks' lambda 0.8411 5.0000 15.0000  0.5666 0.7244
         Pillai's trace 0.1589 5.0000 15.0000  0.5666 0.7244
 Hotelling-Lawley trace 0.1889 5.0000 15.0000  0.5666 0.7244
    Roy's greatest root 0.1889 5.0000 15.0000  0.5666 0.7244
============================================================
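The call above passes the integer class labels as a single numeric regressor with no intercept, so the `x0` test is a test of one slope rather than a comparison of three groups. Below is a hedged sketch of one way to express a group comparison through the array interface, following the `hypotheses` description in the help text: an intercept column and indicator columns are built by hand, and a contrast matrix `L` tests the indicators jointly. The data and the names `X_demo`, `exog_demo`, and `class_test` are hypothetical.

```python
import numpy as np
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n_per_class, n_dim, n_classes = 10, 3, 3

# Hypothetical data: 3 balanced classes, 3 response dimensions.
labels = np.repeat(np.arange(n_classes), n_per_class)
X_demo = rng.normal(size=(labels.size, n_dim))

# Design matrix: intercept column plus 0/1 indicators for classes 1 and 2
# (class 0 is the reference level).
indicators = (labels[:, None] == np.arange(1, n_classes)).astype(float)
exog_demo = np.column_stack([np.ones(labels.size), indicators])

# One hypothesis named 'class': both indicator coefficients are zero.
# Rows of L are constraints; columns follow the columns of exog_demo.
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
class_test = MANOVA(endog=X_demo, exog=exog_demo).mv_test(hypotheses=[("class", L)])
print(class_test.results["class"]["stat"])
```

Because the data here are pure noise, the four statistics should usually not indicate a group effect; the point is only the shape of `exog_demo` and `L`.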
avg_df = df.groupby('name',axis=0, as_index=False).mean()
# Split `name` into its parts; assumes the form '<dataset>-<method>-<sensitive_attr>'.
avg_df['dataset'] = avg_df['name'].apply(lambda x: x.split('-')[0])
avg_df['method'] = avg_df['name'].apply(lambda x: x.split('-')[1])
avg_df['sensitive_attr'] = avg_df['name'].apply(lambda x: x.split('-')[2])
X = avg_df[['0','1','2','3','4','5','6']].to_numpy()
y = avg_df['method'].to_numpy()[0]  # bug: [0] selects a single string, not the array of labels
manova = MANOVA(endog=X, exog=y)
print(manova.mv_test())
ValueError: unrecognized data structures: <class 'numpy.ndarray'> / <class 'str'>
print(X.shape, y.shape)
AttributeError: 'str' object has no attribute 'shape'
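Both errors come from `y` being a single string: the trailing `[0]` picks the first method label instead of keeping the whole column. Dropping `[0]` alone would probably not be enough, since the array interface expects numeric `exog`. A sketch of the formula route that the MANOVA help text recommends follows, using a hypothetical stand-in for `avg_df` (the frame `demo`, the `r0`..`r2` column names, and the method labels are all made up for illustration).

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)

# Hypothetical stand-in for avg_df: digit-named responses and a string `method` column.
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["0", "1", "2"])
demo["method"] = np.repeat(["m1", "m2", "m3"], 10)

# Digit-only column names would be read as numbers in a formula, so rename them first.
demo = demo.rename(columns={c: f"r{c}" for c in ["0", "1", "2"]})

# C(method) makes patsy dummy-code the labels; an intercept is added by default.
formula_test = MANOVA.from_formula("r0 + r1 + r2 ~ C(method)", data=demo).mv_test()
print(formula_test.results["C(method)"]["stat"])
```

The same pattern should carry over to `avg_df` after renaming its `'0'`..`'6'` columns, though that has not been run against the actual data here.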