On causality of extreme events

Multiple metrics have been developed to detect causality relations between data describing the elements constituting complex systems, all of them considering their evolution through time. Here we propose a metric able to detect causality within static data sets, by analysing how extreme events in one element correspond to the appearance of extreme events in a second one. The metric is able to detect non-linear causalities; to analyse both cross-sectional and longitudinal data sets; and to discriminate between real causalities and correlations caused by confounding factors. We validate the metric through synthetic data, dynamical and chaotic systems, and data representing the human brain activity in a cognitive task. We further show how the proposed metric is able to outperform classical causality metrics, provided non-linear relationships are present and large enough data sets are available.


I. INTRODUCTION
Detecting causality relationships between the elements composing a complex system is an old, though unsolved, problem [1,2]. The origin of the concept of causality goes back to ancient Greek philosophy, according to which causal investigation was the search for an answer to the question "why?" [3,4]; and the debate was still hot in the late 18th century, in the work of David Hume [5] and his argument that causality cannot be rationally demonstrated.
In the last few decades there has been an increasing interest in the creation of metrics able to detect causality in real data, in order to improve our understanding of systems that cannot directly be described. For instance, while one may suspect that the gross domestic product of a country and its unemployment rate are related, it is difficult to prove the presence of this relationship, as economic models are neither perfect nor complete. The same happens when one tries to infer whether a gene is regulating a second one, in the absence of a complete model of their dynamics, or of a pathway. The solution is thus to analyse whether the dynamics of these indicators are connected. The best known causality metrics include Granger causality, cointegration and transfer entropy [6][7][8][9][10], to name a few.
All proposed causality metrics share a common characteristic: causality is defined as a relation existing in the temporal domain, and they thus require the study of pairs of time series.
For instance, for two processes X and Y, the transfer entropy is defined as the reduction in the uncertainty about the future of Y when one includes information about the past of X [8]. Similarly, the Granger causality involves estimating the reduction in the error of an autoregressive linear model of Y given the history of X [7]. Associating causality to the temporal domain is intuitive, due to the way the human brain incorporates time into our perception of causality [11,12]. To exemplify, if we see a ball approaching a window, and just afterwards see the window broken, we can safely conclude that the first event was the cause of the second, and thus that causality is a relation between the past and the future. The need for a time evolution is nevertheless an important limiting factor when studying systems whose dynamics through time cannot easily be observed. Consider genetic analysis: a single measurement is usually available per subject and gene, precluding the estimation of gene-gene interactions through a causal analysis solely based on expression levels, as the corresponding time evolution would not be accessible.
When only vectors of observations are available, i.e. vectors representing static observations of different realisations of the same system, it is customary to resort to statistics. This can be classical statistics, which defines the relationship in terms of linear or non-linear correlations; or Bayesian statistics and the vast field of statistical learning and data mining [13,14]. Although correlation, and statistical learning in general, appear prima facie as an interesting solution, they present the important drawback of not being able to discriminate between real and spurious causalities. Suppose one is studying a system composed of three interconnected elements, such as the one depicted in Fig. 1 (i), with the aim of detecting whether the dynamics of element C is caused by B. Additionally, no time series are available, and elements are described through vectors of cross-sectional observations; in other words, multiple realisations of the same system are available, but each one of them can only be observed at a single moment in time. A statistically significant correlation between B and C may be found both when a true causality is present (Fig. 1 (iii)), and when both elements are driven by an unobserved confounding element A (Fig. 1 (ii)).
In order to tackle the scenario of Fig. 1, in this contribution we propose a novel metric for detecting causality from observational data. It entails three innovative points. First, it is defined on vectors of observations, which do not necessarily have to represent a time evolution. In other words, input vectors may correspond to gene expression levels measured in a population, i.e. to a cross-sectional study; or, but not necessarily, to multiple observations of the same subject, i.e. to a longitudinal study. Second, the method is based on the detection of extreme events, and on their appearance statistics. This is not dissimilar to Granger causality, as the latter measures how shocks in one time series are explained by a second one; but without the need for a time evolution. Third, it is optimised for the detection of non-linear causal relations, which are common in many real-world complex systems [15], but which may create problems in standard causality metrics [16].

II. METHODS
Suppose two vectors of elements $B = \{b_i\}$ and $C = \{c_i\}$ of equal size. The two elements of each pair $(b_i, c_i)$ must be related, e.g. they may correspond to the measurement of two biomarkers in the same subject. In the case of B and C being time series, clearly $(b_i, c_i)$ would correspond to measurements at time $i$; yet, as already introduced, such dynamical approach [...]
An example of these three rules is depicted in Fig. 2; note how extreme events (red bars) in B always propagate to C, while the second extreme event of C is caused by its internal dynamics and is not propagated.
Let us denote by $p_1$ the probability that an extreme event in C also corresponds to an extreme event in B, i.e. $p_1 = P(c_i > \tau_c \mid b_i > \tau_b)$. Conversely, $p_2$ will denote the probability that an extreme event in B corresponds to an extreme event in C, i.e. $p_2 = P(b_i > \tau_b \mid c_i > \tau_c)$.
In the case of a real causality, the second condition implies that $p_1 \approx 1$, the third one that $p_2 \ll 1$. On the other hand, in the case of an external confounding effect, and if the two thresholds are chosen such that the probability of finding extreme events is the same for both elements, it is easy to see that $p_1 \approx p_2$. Notice that the same is true if B and C are bidirectionally interacting.
The previous analysis suggests that the necessary condition for having a B → C causality is $p_1 > p_2$. The statistical significance can be quantified through a binomial two-proportion z-test:

$$z = \frac{p_1 - p_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

with $n_1$ and $n_2$ the number of events associated to $p_1$ and $p_2$, and $\hat{p} = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$. The corresponding p-value can be obtained through a Gaussian cumulative distribution function.
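As an illustration of the test just described, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that an observation is extreme when it exceeds its threshold, that $p_1$ is the fraction of extreme events in B that coincide with an extreme event in C (and $p_2$ the converse), and that the p-value is one-sided for $p_1 > p_2$. The function name `extreme_event_causality` and the toy vectors are hypothetical.

```python
import math
from statistics import NormalDist


def extreme_event_causality(b, c, tau_b, tau_c):
    """Test the hypothesis B -> C from two paired vectors.

    Assumption: an observation is 'extreme' when it exceeds its threshold.
    Returns (p1, p2, z, p_value), with a one-sided p-value for p1 > p2.
    """
    if len(b) != len(c):
        raise ValueError("B and C must have the same length")

    ext_b = [x > tau_b for x in b]
    ext_c = [y > tau_c for y in c]
    n1 = sum(ext_b)  # number of extreme events in B
    n2 = sum(ext_c)  # number of extreme events in C
    both = sum(eb and ec for eb, ec in zip(ext_b, ext_c))
    if n1 == 0 or n2 == 0:
        raise ValueError("no extreme events for at least one threshold")

    p1 = both / n1  # P(c_i extreme | b_i extreme)
    p2 = both / n2  # P(b_i extreme | c_i extreme)

    # Pooled proportion and two-proportion z statistic.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if variance <= 0.0:
        # Degenerate case (no overlap or total overlap): no evidence.
        return p1, p2, 0.0, 0.5
    z = (p1 - p2) / math.sqrt(variance)
    p_value = 1.0 - NormalDist().cdf(z)  # Gaussian upper tail
    return p1, p2, z, p_value


# Toy vectors: every extreme event in B coincides with one in C,
# while C also has extreme events of its own.
b = [0, 0, 5, 0, 5, 0, 0, 0]
c = [0, 5, 5, 0, 5, 5, 0, 5]
p1, p2, z, p_value = extreme_event_causality(b, c, tau_b=1, tau_c=1)
print(f"p1={p1:.2f} p2={p2:.2f} z={z:.3f} p-value={p_value:.4f}")
```

With only eight observations the toy example is far too small for a meaningful test; it is meant solely to make the bookkeeping of $n_1$, $n_2$, $p_1$ and $p_2$ explicit.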
Before demonstrating the effectiveness of the proposed causality metric, it is worth discussing several of its aspects.
First of all, the attentive reader will notice the similarity of this method with some metrics for assessing synchronisation in time series. For instance, local maxima and their statistics were considered in Ref. [17], and event coincidences in Ref. [18]. In both cases, an essential ingredient is the time evolution: extreme events in one time series are identified and related to those appearing in a second time series, and the delay required for their transmission is assessed through a time shift optimisation. While this yields an estimation of the direction of the information flow between two time series, it cannot be applied to systems whose time evolutions are not accessible. The metric here proposed has the advantage that it can be applied to static data sets, in principle paving the way to the construction of data mining algorithms based on causality. This applicability to static data sets is also the main difference with respect to Ref. [19], which proposes a method for the detection of relations between large ensembles of short time series. While Ref. [19] allows analysing fast systems, described by time series comprising only a handful of values, it is still not applicable to static measurements, as for instance those found in genetics.
Second, the metric definition requires setting two thresholds, i.e. $\tau_b$ and $\tau_c$. This can be done using a priori information, e.g. when a level is accepted as abnormal for a given biomarker; or by simply exploring the whole parameter space, in order to find the values of $(\tau_b, \tau_c)$ corresponding to the lowest p-value. This may prove especially useful in those situations in which the input elements are not well characterised: beyond the identification of causality relations, this method may also be used to define what an abnormal value is.
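The exploration of the threshold space mentioned above can be sketched as a simple grid search. This is only an illustration under stated assumptions: candidate thresholds are taken at a few empirical quantiles of each vector (the quantile grid is hypothetical), extreme events are values above the threshold, and the p-value is the one-sided two-proportion z-test. The helper names `causality_p_value`, `quantile` and `scan_thresholds`, as well as the synthetic Gaussian data with a cubic coupling, are illustrative choices and not part of the original method description.

```python
import math
import random
from statistics import NormalDist


def causality_p_value(b, c, tau_b, tau_c):
    """One-sided p-value for B -> C; returns 1.0 when the test is undefined."""
    n1 = sum(x > tau_b for x in b)
    n2 = sum(y > tau_c for y in c)
    both = sum(x > tau_b and y > tau_c for x, y in zip(b, c))
    if n1 == 0 or n2 == 0:
        return 1.0
    p1, p2 = both / n1, both / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance <= 0:
        return 1.0
    return 1 - NormalDist().cdf((p1 - p2) / math.sqrt(variance))


def quantile(values, q):
    """Empirical quantile taken directly from the sorted sample."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]


def scan_thresholds(b, c, quantiles=(0.80, 0.85, 0.90, 0.95, 0.98)):
    """Return (p_value, tau_b, tau_c) for the pair with the lowest p-value."""
    best = None
    for qb in quantiles:
        for qc in quantiles:
            tau_b, tau_c = quantile(b, qb), quantile(c, qc)
            p = causality_p_value(b, c, tau_b, tau_c)
            if best is None or p < best[0]:
                best = (p, tau_b, tau_c)
    return best


# Hypothetical synthetic data: Gaussian vectors with a cubic B -> C coupling.
rng = random.Random(0)
b = [rng.gauss(0, 1) for _ in range(5000)]
c = [rng.gauss(0, 1) + 0.5 * x ** 3 for x in b]
best_p, best_tau_b, best_tau_c = scan_thresholds(b, c)
print(best_p, best_tau_b, best_tau_c)
```

Since the reported value is a minimum over many tests, it is optimistically biased; some form of multiple-comparison correction may be advisable when the thresholds are selected in this way.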
Additionally, the way of detecting extreme events through a threshold is different from similar approaches in the literature. For instance, Ref. [17] defines the events of interest as local maxima, independently of their amplitude; some of these events may not pass the threshold filtering here proposed, which only considers extreme (in the sense of not normal or not expected, but not necessarily of maximal) values.
Third, we have previously stated that the presence of a confounding effect can be correctly detected, and that in such situations the metric would not detect a statistically significant causality. According to the Common Cause Principle [1], two variables are unconfounded iff [...] always allow to discriminate true causalities from spurious relationships, although it provides important clues about which one of these two effects is having the strongest impact.

III. RESULTS
We first test the proposed metric with synthetic data. Fig. 3 presents the evolution of the p-value for two vectors B and C, whose values are drawn from different distributions.
Two situations are compared. First, a real B → C causality, such that $c_i = c_i + \gamma b_i^n$ ($n$ being the order of the coupling), shown as solid lines in Fig. 3. Second, a confounding effect in which $b_i = b_i + \gamma a_i^n$ and $c_i = c_i + \gamma a_i^n$, shown as dashed lines in Fig. 3. It can be appreciated that the p-values of real causalities drop to zero for small values of the coupling constant; and that non-linear couplings perform better than linear ones.
When the same analysis is performed using other standard causality metrics, such clear behaviour is not observed. Specifically, Fig. 4 presents the evolution of the p-value, as obtained for Gaussian distributions by the Granger Causality and the Transfer Entropy. The former metric rejects, for all coupling constants, the presence of a causality. As for the Transfer Entropy, it correctly detects the presence of a relationship, but only for very high coupling constants; additionally, it is not able to detect the presence of confounding effects: note that the three dashed lines in Fig. 4 Right are almost always below the corresponding solid ones.
In some cases, a confounding effect, especially when highly non-linear, can fool the proposed metric and yield a low p-value; see, for instance, the cubic confounding coupling for a gamma distribution in Fig. 3. Such situations can easily be identified by comparing the p-values for B → C and C → B: in the case of a true causality, which is by definition directed, the p-value should be small only for one of them. An example of this is depicted in Fig. 5, which shows the evolution of the p-values for a confounding effect (top panel) and a causality (bottom panel), for vectors of Gamma distributed values.
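The two synthetic scenarios compared in Fig. 3 (a true B → C coupling and a common confounder A) can be generated along the following lines. This is a minimal sketch: the Gamma shape and scale, the vector length, the coupling constant and the function names `couple_causal`, `couple_confounded` and `draw_gamma` are assumptions chosen for illustration, not values taken from the study.

```python
import random


def couple_causal(b, c, gamma, n):
    """True causality B -> C: each c_i becomes c_i + gamma * b_i**n."""
    return [ci + gamma * bi ** n for bi, ci in zip(b, c)]


def couple_confounded(a, b, c, gamma, n):
    """Confounder A drives both B and C with the same coupling."""
    b_new = [bi + gamma * ai ** n for ai, bi in zip(a, b)]
    c_new = [ci + gamma * ai ** n for ai, ci in zip(a, c)]
    return b_new, c_new


def draw_gamma(rng, size, shape=2.0, scale=1.0):
    """Independent Gamma-distributed values (shape and scale are assumed)."""
    return [rng.gammavariate(shape, scale) for _ in range(size)]


rng = random.Random(42)
size = 10_000  # hypothetical vector length
a = draw_gamma(rng, size)
b = draw_gamma(rng, size)
c = draw_gamma(rng, size)

# Cubic coupling (n = 3) with a hypothetical coupling constant.
c_causal = couple_causal(b, c, gamma=0.1, n=3)
b_conf, c_conf = couple_confounded(a, b, c, gamma=0.1, n=3)
```

The coupled vectors can then be passed, together with a pair of thresholds, to whichever implementation of the extreme-event test is being used.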
Once the limitations and requirements about confounding effects, as defined in the previous section, are taken into account, discriminating between true and spurious causalities only requires calculating the two opposite p-values, and checking whether they are both small.
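The comparison of the two opposite p-values can be written as a small decision rule. The sketch below assumes a one-sided two-proportion z-test, extreme events defined as values above a threshold, and a hypothetical significance level `alpha`; the names `directed_p_value` and `classify` and the toy vectors are illustrative only.

```python
import math
from statistics import NormalDist


def directed_p_value(source, target, tau_source, tau_target):
    """One-sided p-value for 'source -> target'; 1.0 if the test is undefined."""
    n1 = sum(s > tau_source for s in source)
    n2 = sum(t > tau_target for t in target)
    both = sum(s > tau_source and t > tau_target
               for s, t in zip(source, target))
    if n1 == 0 or n2 == 0:
        return 1.0
    p1, p2 = both / n1, both / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance <= 0:
        return 1.0
    return 1 - NormalDist().cdf((p1 - p2) / math.sqrt(variance))


def classify(p_bc, p_cb, alpha=0.01):
    """Interpret the two opposite p-values (alpha is a hypothetical level)."""
    small_bc, small_cb = p_bc < alpha, p_cb < alpha
    if small_bc and small_cb:
        return "spurious"  # both directions significant
    if small_bc:
        return "B -> C"
    if small_cb:
        return "C -> B"
    return "no causality detected"


# Toy vectors: extremes in B always appear in C, but not the other way round.
b = [0, 0, 5, 0, 5, 0, 0, 0]
c = [0, 5, 5, 0, 5, 5, 0, 5]
p_bc = directed_p_value(b, c, 1, 1)
p_cb = directed_p_value(c, b, 1, 1)
# In this sketch, identical thresholds in both calls give opposite z-scores,
# so the two p-values sum to one; each direction may therefore need its own
# threshold pair, e.g. from a separate parameter scan.
print(classify(p_bc, p_cb, alpha=0.1))
```

The loose `alpha=0.1` in the last line is used only because the toy vectors are tiny; real data sets would call for a much stricter level.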
The necessity of detecting extreme events introduces a drawback in the method, i.e. the need of having a large set of input values to reach stable statistics. This problem is explored in Fig. 6, which depicts the p-value obtained as a function of the number of input values. Depending on the kind of relation to be detected, between 2 and 4 thousand values are required.
One of the advantages of the proposed metric is that it can be applied both to cross-sectional and longitudinal data. In other words, the metric can be used to study both those systems that do not present a temporal evolution, but for which information corresponding to different instances is available; and those systems whose evolution through time can be observed. Here we show such flexibility in the detection of the causality between two noisy Kuramoto oscillators [20,21]. Suppose two oscillators whose phases are defined as: [...] p-values than the cross-sectional one, as the latter is probably confounded by the presence of a strong correlation. Fig. 7 Left further depicts the behaviour of the p-value when calculated using the Granger Causality metric; it can be appreciated that the proposed causality metric is more sensitive, especially for small coupling constants.
An important characteristic of complex systems is that their constituting elements usually have chaotic dynamics [15], which complicates the task of detecting causality between them. We here test the proposed metric by considering two unidirectionally coupled Rössler oscillators (B → C) in their chaotic regime; see [22] for details. We consider both linear and cubic couplings; following the notation in [22], this means: [...] Time series are created by sampling the second dimension of each oscillator (i.e. $x_2$ and $y_2$) with a resolution lower than the intrinsic frequency. Fig. 7 Right depicts the evolution of the p-value for low coupling strengths γ, thus ensuring that the system is generalised synchronised. For $\gamma \approx 0.01$ ($\gamma \approx 2 \cdot 10^{-4}$ for cubic coupling), a true causality is detected, while for $\gamma > 0.015$ ($\gamma > 4 \cdot 10^{-4}$) the two oscillators start to synchronise.
The possibility of combining a cross-sectional analysis of extreme values with a longitudinal analysis opens new doors towards the understanding of systems for which both aspects [...] it is further able to discriminate real from spurious causalities, thus enabling the detection of confounding effects. The effectiveness of the metric has been tested through synthetic data; data obtained from simple and chaotic dynamical systems, i.e. Kuramoto and Rössler oscillators; and through EEG data representing the activity of the human brain during an object recognition task.
In spite of the advantages that the proposed metric presents, and that have been described throughout the text, two limitations have to be highlighted.First, the reduced sensitivity of the metric to linear causality relationships, and in the analysis of data without long tail distributions, i.e. without clear extreme events -see Fig. 3 for further details.Second, the need of large quantities of data, in the order of several thousands of observations, to reach statistically significant results (Fig. 6).
The possibility of detecting causality in static data sets is expected to be of increasing importance in those research fields in which time dynamics are not available, and that require ensuring that a causality is not just the result of the presence of a confounding factor. For instance, one may consider the rising field of biomedical data analysis [27][28][29]. The customary solution is to resort to data mining algorithms, which make it possible to detect and make explicit patterns in the input data, with the final objective of using such patterns in diagnostic and prognostic models [13]. Nevertheless, data mining (and machine learning in general) is based on the Bayes theorem, a form of statistics of co-occurrences, and thus on a generalised concept of correlation. These methods are thus sensitive to the confounding effects that are frequently in place, as genes and metabolites create an intricate network of interactions. Resorting to classical causality metrics, like Granger's one, is not possible, as time series are seldom available: measuring gene expression or metabolite levels is an expensive and slow process. In spite of this, causality is an essential element to be detected: if one only focuses on correlations, there is a risk of detecting elements whose manipulation does not guarantee the expected results on the system [30][31][32]. We foresee that the proposed causality metric can be an initial solution to this problem, by providing a causality test that can be applied to static data, and that could be used as the foundation of a new class of data mining algorithms.
A Python implementation of the proposed causality metric is freely available at www.

FIG. 1. Distinguishing causality from correlation.(i) General situation, in which three elements A, B and C interact in a simple triangular configuration.If one is interested in the relation between B and C, two different scenarios may arise.(ii) When A is dominating the dynamics, any common dynamics between B and C will be a correlation, generated by the external confounding factor.(iii) The situation corresponding to a real causality between B and C.
FIG. 3. p-value obtained by the proposed causality metric, for vectors of synthetic data drawn from six different distributions, as a function of the coupling constant γ (see main text for details). Black, red and green lines respectively correspond to linear, quadratic and cubic couplings; solid lines depict true causalities (as in Fig. 1 (iii)), dashed lines spurious ones (Fig. 1 (ii)). Each point corresponds to 10,000 realisations.

FIG. 4. p-value obtained by two standard causality metrics, for vectors of synthetic data drawn from Gaussian distributions, as a function of the coupling constant γ. The left panel corresponds to the Granger Causality, the right one to the Transfer Entropy. Black, red and green lines respectively correspond to linear, quadratic and cubic couplings; solid lines depict true causalities (as in Fig. 1 (iii)), dashed lines spurious ones (Fig. 1 (ii)).

FIG. 5. Evolution of the p-value of the causality, when considering both B → C and C → B tests for a cubic coupling and for data drawn from a Gamma distribution (as in the green lines of the first panel of Fig. 3). The top panel reports the results for a confounding effect, the bottom one for a true causality between B and C.

FIG. 7. (Left) Evolution of the p-value of the causality test between two Kuramoto oscillators, for different values of the coupling constant γ. Solid and dashed lines respectively correspond to a cross-sectional and longitudinal study (see main text for details). Black lines correspond to the proposed metric, red ones to Granger Causality. (Right) p-value for two coupled Rössler oscillators as a function of the coupling constant γ, for a linear (top graph) and cubic (bottom graph) coupling.