2.2 Ergodicity and the Measurement Problem

It does not take an expert in population statistics to see that there is probably a mismatch between the behavioural phenomena of interest and the analytical toolbox most frequently used to study human behaviour in the social and life sciences. Anyone who has taken an introductory class in inferential statistics will remember that statistical models assume observations to be independent of one another, variances to be homogeneous (as checked with, e.g., Levene’s test), and measurement error to be essentially random and normally distributed, uncorrelated with any other factor that might cause the phenomenon under scrutiny.

Given the nature of the phenomena of interest and the properties of the system under scrutiny, there are two main concerns about the scientific study of human behaviour:

The assumption that the ergodic theorems apply to the theoretical objects of measurement and data-generating processes (Molenaar, 2004, 2008): Ensemble averages of variables observed in samples of sufficiently many individuals are expected to be arbitrarily close to the time averages of the same variables evolving over a sufficiently long interval of time, from any single initial condition.
The assumption that the interpretation of outcomes of psychological measurement is, or should be, equivalent to classical physical measurement (RN5?): It is considered unproblematic to interpret a measurement outcome as a property of the theoretical object of measurement confounded by some random additive measurement noise or sampling error.
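The first assumption can be made concrete with a small simulation. The following is a minimal sketch, not a model proposed in the literature cited here: it assumes a hypothetical data-generating process in which every person fluctuates around a person-specific level, and all names and parameter values (`n_persons`, `n_times`, `person_level`) are illustrative. Under this assumption the ensemble average at one time point sits near zero, while the individual time averages stay spread around it however long each person is observed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_persons, n_times = 1000, 1000

# Assumed (hypothetical) process: each person fluctuates around a
# person-specific level, so the population is heterogeneous.
person_level = rng.normal(loc=0.0, scale=1.0, size=n_persons)
noise = rng.normal(loc=0.0, scale=1.0, size=(n_persons, n_times))
x = person_level[:, None] + noise

# Ensemble average: average over persons at a single time point.
ensemble_mean = x[:, -1].mean()

# Time averages: average over time, one value per person.
time_means = x.mean(axis=1)

# If the process were ergodic, the time averages would all approach the
# ensemble average and this spread would shrink towards zero.
spread = np.std(time_means - ensemble_mean)

print(f"ensemble mean at final time point: {ensemble_mean:.3f}")
print(f"SD of individual time means around it: {spread:.3f}")
```

In this sketch the spread stays close to the standard deviation of `person_level` rather than shrinking with longer observation, which is the sense in which a group-level estimate fails to describe any particular individual.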

The validity of the assumptions related to ergodicity (i.e. stationarity and homogeneity of central moments) is obviously important for making valid statistical inferences and generalizations. However, even if some of the core assumptions for an ergodic data-generating process are formally valid, one cannot always rely on parameter estimates to converge on a characteristic expected value within the time scale of observation (or scale of fluctuation). This is the case, for example, when the process samples from a stable distribution with one or more undefined moments, such as the Cauchy distribution, whose mean and variance are both undefined. This has led some scholars to suggest that “the very notion of probability may not make sense” (RN6?) when studying complex systems with internal state dynamics.
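The convergence problem can be illustrated with a running mean. The sketch below is a minimal example that compares independent draws from a standard normal and a standard Cauchy distribution; the sample size, the seed, and the helper names `running_mean` and `late_range` are illustrative choices. It measures how much the running mean still moves during the second half of the observation period.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n = 100_000


def running_mean(samples):
    """Mean of the first k samples, for k = 1 .. len(samples)."""
    return np.cumsum(samples) / np.arange(1, samples.size + 1)


def late_range(running):
    """Range covered by the running mean over the second half of the series."""
    second_half = running[running.size // 2:]
    return second_half.max() - second_half.min()


running_normal = running_mean(rng.standard_normal(n))
running_cauchy = running_mean(rng.standard_cauchy(n))

range_normal = late_range(running_normal)
range_cauchy = late_range(running_cauchy)

print(f"normal: final running mean {running_normal[-1]:.4f}, "
      f"late range {range_normal:.4f}")
print(f"cauchy: final running mean {running_cauchy[-1]:.4f}, "
      f"late range {range_cauchy:.4f}")
```

The running mean of the normal draws settles near zero, whereas the running mean of the Cauchy draws keeps being displaced by single extreme values, so a longer observation period does not yield a more stable estimate.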

Recent observations of discrepancies between inferred properties at the ensemble level (inter-individual) and the individual level (intra-individual) have been suggested as a cause of the so-called reproducibility crisis in the social and life sciences (RN7?; RN8?; RN9?). A study which observed a lack of ‘group-to-individual generalizability’ in the context of psychopathology described the phenomenon as a threat to human subjects research: “In clinical research, diagnostic tests may be systematically biased and our classification systems may be at least partially invalid. In terms of theory development, we may have a misleading impression about the nature of psychological variables and their interactions.” (RN8?). A study of the neuroanatomical phenotypes of schizophrenia and bipolar disorder (RN9?) concluded: “This study found that group-level differences disguised biological heterogeneity and interindividual differences among patients with the same diagnosis. This finding suggests that the idea of the average patient is a noninformative construct in psychiatry that falls apart when mapping abnormalities at the level of the individual patient.”

The second concern is the lack of a clear notion in psychology and the life sciences of how to incorporate the measurement context and the act of measurement into the description of a phenomenon (RN10?). Psychological measurement is an interaction between a (prepared) theoretical object of measurement and the elements of the measurement procedure (experimental design, instruments, etc.). The very act of asking someone to project their current internal state of happiness onto an arbitrary ordinal scale will interfere with their “true” state of happiness (if such a thing even exists without the measurement context). For “happiness” there is no equivalent of the unobtrusive measurement of body temperature with an infrared camera.

Resolutions to these and other problems with psychological measurement have been proposed, for example the various types of conjoint measurement (RN5?; RN11?), or suggestions to adopt concepts from quantum measurement (RN10?; RN12?). However, where the measurement and analysis of the temporal evolution of internal states are concerned, problems arise because living systems are subject to ageing (loss of identity over time) and appear to be able to coordinate their current behaviour relative to some record of previously experienced events. In more general terms, the behaviour of a complex adaptive system will display after-effects of interactions with its internal or external environment that extend far beyond any timescale that could be captured by a simple stochastic process with autoregressive components. Time series of observables of living systems will often lack the memorylessness property (RN13?; RN14?), suggesting that anomalous rather than normal diffusion processes should be considered as a model for the data-generating process (RN15?).
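The difference between normal and anomalous diffusion can be sketched numerically. The example below is one possible illustration and not a method taken from the cited work: it assumes fractional Brownian motion as a stand-in for a process with long-range memory, generates it from the exact covariance of fractional Gaussian noise, and estimates the exponent alpha in MSD(t) ~ t^alpha from the ensemble mean squared displacement. The Hurst value 0.8, the path counts, and the function names are illustrative assumptions.

```python
import numpy as np


def fgn_cholesky_factor(n, hurst):
    """Cholesky factor of the covariance of unit-variance fractional Gaussian noise."""
    idx = np.arange(n)
    k = idx.astype(float)
    # Autocovariance at lag k; for hurst = 0.5 this reduces to white noise.
    gamma = 0.5 * ((k + 1.0) ** (2 * hurst)
                   - 2.0 * k ** (2 * hurst)
                   + np.abs(k - 1.0) ** (2 * hurst))
    lags = np.abs(idx[:, None] - idx[None, :])
    return np.linalg.cholesky(gamma[lags])


def msd_exponent(n_steps, hurst, n_paths, rng):
    """Estimate alpha in MSD(t) ~ t**alpha from an ensemble of simulated paths."""
    chol = fgn_cholesky_factor(n_steps, hurst)
    increments = chol @ rng.standard_normal((n_steps, n_paths))
    paths = np.cumsum(increments, axis=0)   # position at t = 1 .. n_steps
    msd = np.mean(paths ** 2, axis=1)       # ensemble mean squared displacement
    t = np.arange(1, n_steps + 1)
    slope, _ = np.polyfit(np.log(t), np.log(msd), 1)
    return slope


rng = np.random.default_rng(seed=3)
alpha_normal = msd_exponent(n_steps=256, hurst=0.5, n_paths=500, rng=rng)
alpha_anomalous = msd_exponent(n_steps=256, hurst=0.8, n_paths=500, rng=rng)

print(f"memoryless increments: alpha = {alpha_normal:.2f}")
print(f"long-memory increments: alpha = {alpha_anomalous:.2f}")
```

For memoryless increments the theoretical exponent is 1 (normal diffusion). For fractional Brownian motion it is twice the Hurst exponent, so the assumed value of 0.8 corresponds to superdiffusion with an exponent of 1.6.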