## Highlights

• The degree of overlap can be visualized by means of a cardinality map (a special case of a unique values map), where each location indicates how many neighbors the two weights matrices have in common. In addition, different p-value cut-offs can be employed to select the significant locations, i.e., where the probability of a given number of common neighbors falls below the chosen p.

## New highlights added October 10, 2023 at 10:04 PM

• concept of local spatial autocorrelation to the multivariate domain. This turns out to be particularly challenging due to the difficulty in separating pure attribute correlation among multiple variables from the spatial effects.
• Designing a spatial autocorrelation statistic in a multivariate setting is fraught with difficulty. The most common statistic, Moran’s I, is based on a cross-product association, which is the same as a bivariate correlation statistic. As a result, it is difficult to disentangle whether the correlation between multiple variables at adjoining locations is due to the correlation among the variables, or a similarity due to being neighbors in space.
• Early attempts at extending Moran’s I to multiple variables focused on principal components, as in the suggestion by Wartenberg (1985), and later work by Dray, Saïd, and Débias (2008). However, these proposals only dealt with a global statistic. A more local perspective along the same lines is presented in Lin (2020), although it is primarily a special case of a geographically weighted regression, or GWR (Fotheringham, Brunsdon, and Charlton 2002).
• In Anselin (2019), the idea is proposed to focus on the distance between observations in both attribute and geographical space and to construct statistics that assess the match between those distances. In general, the squared multi-attribute distance between a pair of observations $i, j$ on $k$ variables is given as: $d_{ij}^2 = \| x_i - x_j \|^2 = \sum_{h=1}^k (x_{ih} - x_{jh})^2$, with $x_i$ and $x_j$ as vectors of observations. In some expressions, the squared distance will be preferred; in others, the actual distance ($d_{ij}$, its square root) will be used. The overall idea is to identify observations that are both close in multiattribute space and close in geographical space.
• The treatment of the bivariate Local Moran’s I closely follows that of its global counterpart (see also Anselin, Syabri, and Smirnov 2002). In essence, it captures the relationship between the value for one variable at location $i$, $x_i$, and the average of the neighboring values for another variable, i.e., its spatial lag $\sum_j w_{ij} y_j$. Apart from a constant scaling factor (that can be ignored), the statistic is the product of $x_i$ with the spatial lag of $y_i$ (i.e., $\sum_j w_{ij} y_j$), with both variables standardized, such that their means are zero and variances equal one: $I_i^B = c \, x_i \sum_j w_{ij} y_j$, where $w_{ij}$ are the elements of the spatial weights matrix.
• This statistic needs to be interpreted with caution, since it ignores in-situ correlation between the two variables (see the discussion of global bivariate spatial autocorrelation for details).
• A special case of the bivariate Local Moran statistic is comparing the same variable at two points in time. The most meaningful application is where one variable is for time period $t$, say $z_t$, and the other variable is for the neighbors in the previous time period, say $z_{t-1}$. This formulation measures the extent to which the value at a location in a given time period is correlated with the values at neighboring locations in a previous time period, or an inward influence.
• As mentioned, the interpretation of the bivariate Local Moran cluster map warrants some caution, since it does not control for the correlation between the two variables at each location (i.e., the correlation between $x_i$ and $y_i$).
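
The squared multi-attribute distance quoted above (Anselin 2019) can be illustrated with a minimal NumPy sketch. The function name `squared_attribute_distances` and the toy data are assumptions made for illustration only; they do not come from the source.

```python
import numpy as np

def squared_attribute_distances(X):
    """Squared Euclidean distance between every pair of rows of X.

    X has one row per observation and one column per variable, so
    entry (i, j) of the result is sum_h (x_ih - x_jh)^2.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]   # shape (n, n, k)
    return (diff ** 2).sum(axis=2)         # shape (n, n)

# Hypothetical toy data: three observations on two variables.
X = np.array([[1.0, 2.0],
              [4.0, 6.0],
              [1.0, 2.0]])

d2 = squared_attribute_distances(X)   # squared distances
d = np.sqrt(d2)                       # actual distances (square roots)
```

Observations 0 and 2 are identical, so their distance is zero, while observations 0 and 1 differ by 3 and 4 on the two variables. In practice the variables would usually be standardized first so that no single variable dominates the distance.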
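
The bivariate Local Moran described above is the product of the standardized value of one variable at a location with the spatial lag of a second standardized variable. The sketch below assumes a row-standardized weights matrix, takes the scaling constant `c` as 1, and uses a hypothetical four-location chain; none of these choices is prescribed by the source.

```python
import numpy as np

def bivariate_local_moran(x, y, W):
    """Return x_i * sum_j w_ij y_j for standardized x and y.

    Assumptions (not from the source): W is a dense n x n array that is
    row-standardized here, every location has at least one neighbor,
    and the constant scaling factor c is set to 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    zx = (x - x.mean()) / x.std()               # mean 0, variance 1
    zy = (y - y.mean()) / y.std()
    W = W / W.sum(axis=1, keepdims=True)        # row-standardize
    lag_y = W @ zy                              # average of neighboring y
    return zx * lag_y

# Hypothetical chain of four locations: 0-1, 1-2, 2-3 are neighbors.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

local_i = bivariate_local_moran(x, y, W)
```

For the space-time case, `x` would hold the values at time t and `y` the values at time t-1, so that each location is compared with its neighbors in the previous period. Because `x` and `y` are perfectly correlated in this toy example, the positive values also illustrate the caution noted above: the statistic does not separate in-situ correlation from spatial association.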

## New highlights added October 11, 2023 at 9:16 AM

• consider the space-time case, where one can interpret the results of the High-High and Low-Low clusters in Figure 3 as locations where high/low values at time $t$ (in our example, homicide rates in 1990) were surrounded by high/low values at time $t-1$ (homicide rates in 1980) more so than would be the case randomly.
• if the homicide rates in all locations are highly correlated over time, then if the surrounding values in $t-1$ were correlated with the value at $i$ in $t-1$, then they would also tend to be correlated with the value at $i$ in $t$ (through the correlation between $y_{j,t-1}$ and $y_{j,t}$). Hence, while these findings could be compatible with diffusion, this is not necessarily the case. The same complication affects the interpretation of the bivariate Local Moran coefficient between two variables at the same point in time.
• In Anselin (2019), a multivariate extension of the Local Geary statistic is proposed. This statistic measures the extent to which neighbors in multiattribute space (i.e., data points that are close together in the multidimensional variable space) are also neighbors in geographical space. While the mathematical formalism is easily extended to many variables, in practice one quickly runs into the curse of dimensionality.
• The Multivariate Local Geary statistic measures the extent to which the average distance in attribute space between the values at a location and the values at its neighboring locations is smaller or larger than what it would be under spatial randomness. The former case corresponds to positive spatial autocorrelation, the latter to negative spatial autocorrelation.
• An important aspect of the multivariate statistic is that it is not simply the superposition of univariate statistics. In other words, even though a location may be identified as a cluster using the univariate Local Geary for each of the variables separately, this does not mean that it is also a multivariate cluster. The univariate statistics deal with distances in attribute space projected onto a single dimension, whereas the multivariate statistics are based on distances in a higher dimensional space. The multivariate statistic thus provides an additional perspective to measuring the tension between attribute similarity and locational similarity.
• For example, for each univariate test, the target p-value of $\alpha$ would typically be adjusted to $\alpha / k$ (with $k$ variables, each with a univariate test), as a Bonferroni bound. Since the multivariate statistic is in essence a sum of the statistics for the univariate cases, this would suggest a similar approach by dividing the target p-value by the number of variables ($k$). Alternatively, and preferably, an FDR strategy can be pursued. The extent to which this actually compensates for the two dimensions of multiple comparison (multiple variables and multiple observations) remains to be further investigated.
• The results of the Multivariate Local Geary significance or cluster map are not always easy to interpret. Mainly, this is because we tend to simply superimpose the results of the univariate tests, whereas the multivariate statistic involves different tradeoffs.
• An alternative approach to visualize and quantify the tradeoff between geographical and attribute similarity was suggested by Anselin and Li (2020) in the form of what is called a local neighbor match test. The basic idea is to assess the extent of overlap between k-nearest neighbors in geographical space and k-nearest neighbors in multi-attribute space.
• this is a simple intersection operation between two k-nearest neighbor weights matrices, one computed for the variables (standardized) and one using geographical distance. We can then quantify the probability that an overlap occurs between the two neighbor sets. This corresponds to the probability of drawing $v$ common neighbors from the $k$ out of $n - 1 - k$ possible choices as neighbors, a straightforward combinatorial calculation.
• More formally, the probability of $v$ shared neighbors out of $k$ is: $p = C(k,v) \cdot C(N-k, k-v) / C(N,k)$, where $N = n - 1$ (one less than the number of observations), $k$ is the number of nearest neighbors considered in the connectivity graphs, $v$ is the number of neighbors in common, and $C$ is the combinatorial operator.
• the value of $k$ may need to be adjusted (increased) in order to find meaningful results. In addition, the k-nearest neighbor calculation becomes increasingly difficult to implement in very high attribute dimensions, due to the empty space problem. The idea of matching neighbors can be extended to distances among variables obtained from dimension reduction techniques, such as multidimensional scaling,
• In contrast to the approach taken for the Multivariate Local Geary, the local neighbor match test focuses on the distances directly, rather than converting them into a weighted average. Both measures have in common that they focus on squared distances in attribute space, rather than a cross-product as in the Moran statistic
• We start by creating two k-nearest neighbor weights matrices for k=6, one based on Euclidean geographical distance, the other on multi-variable distance using the six specified variables in z-standardized form.
• We compare the connectivity graphs for the geographical (guerry_85_kd6) and the multi-variable (guerry_85_k6v) weights. They are shown side-by-side in Figure 19, with the geographical connectivity on the left. Whereas the latter has the familiar form, the multi-variable connectivity is quite complex and shows connections between locations that are far apart in geographical space.
• The connectivity graph for the common neighbors is shown in Figure 20. This illustrates the logic underlying the local neighbor match test. What remains to be done is to assess the extent to which the number of coincident neighbors is due to chance, or unlikely under random matching.
• We investigate the connectivity pattern for those two locations more closely in Figure 29. In the right-hand panel, we see how the multi-attribute neighbors for both departments tend to be far removed geographically. In the averaging operation that is behind the Multivariate Local Geary, the differences with the contiguous locations are averaged out. In the local neighbor match test, the focus is singularly on the distances themselves.
  [Figure 29: Significant Multivariate Local Geary locations with only two matching neighbors]
  In spite of these slight discrepancies between the two methods, they tend to identify the same broad clusters. Given that it has fewer issues with determining the proper p-value, the local neighbor match test thus provides a viable alternative to detect multivariate spatial clusters.
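
The Multivariate Local Geary highlighted above can be sketched as a weighted sum of squared attribute distances between each location and its neighbors. The version below assumes the statistic is the sum over variables of the univariate Local Geary, in line with the remark that it is "in essence a sum of the statistics for the univariate cases". The one-sided conditional permutation for positive spatial autocorrelation is a simplification, and all function names and the toy data are hypothetical.

```python
import numpy as np

def multivariate_local_geary(Z, W):
    """c_i = sum_j w_ij * sum_h (z_ih - z_jh)^2 for standardized Z."""
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return (W * d2).sum(axis=1)

def pseudo_p_values(Z, W, permutations=199, seed=0):
    """One-sided pseudo p-values for small c_i (positive autocorrelation).

    Conditional permutation: the value at i is held fixed and its
    neighbors are redrawn at random from the other locations.
    """
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    observed = multivariate_local_geary(Z, W)
    p = np.empty(n)
    for i in range(n):
        nbrs = np.flatnonzero(W[i])
        weights = W[i, nbrs]
        others = np.delete(np.arange(n), i)
        ref = np.empty(permutations)
        for r in range(permutations):
            draw = rng.choice(others, size=nbrs.size, replace=False)
            ref[r] = (weights * ((Z[draw] - Z[i]) ** 2).sum(axis=1)).sum()
        p[i] = (np.sum(ref <= observed[i]) + 1) / (permutations + 1)
    return p

# Hypothetical data: six locations on a chain, three random variables.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)          # row-standardized weights
X = np.random.default_rng(42).normal(size=(n, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-standardize each variable

c_multi = multivariate_local_geary(Z, W)
c_sum_univariate = sum(multivariate_local_geary(Z[:, [h]], W)
                       for h in range(Z.shape[1]))
p = pseudo_p_values(Z, W)
```

The statistic itself adds up across variables (`c_multi` equals `c_sum_univariate`), but its reference distribution is built from joint permutations of all variables, which is why significance in each univariate test does not imply a multivariate cluster.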
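
The two p-value adjustments mentioned above, a Bonferroni bound of alpha / k and an FDR strategy, can be compared in a short sketch. The source does not specify which FDR procedure is meant; the step-up rule below (in the style of Benjamini and Hochberg) is an assumption, and the p-values are made up for illustration.

```python
import numpy as np

def bonferroni_cutoff(alpha, m):
    """Bonferroni bound: divide the target p-value by the number of tests m."""
    return alpha / m

def fdr_significant(pvalues, alpha):
    """Boolean mask of p-values retained by a step-up FDR rule.

    Sorted p-values p_(1) <= ... <= p_(n) are compared with i * alpha / n;
    every p-value up to the largest i that passes is kept.
    """
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    passed = p[order] <= thresholds
    significant = np.zeros(n, dtype=bool)
    if passed.any():
        last = np.max(np.flatnonzero(passed))
        significant[order[: last + 1]] = True
    return significant

# Hypothetical pseudo p-values for eight locations.
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])

cutoff_k6 = bonferroni_cutoff(0.05, 6)        # six variables, as in the example
keep_fdr = fdr_significant(pvals, alpha=0.05)
keep_bonferroni = pvals <= cutoff_k6
```

Here the Bonferroni bound adjusts for the number of variables, while the FDR rule adjusts for the number of locations; as the highlighted passage notes, how well either one handles both dimensions of multiple comparison at once is still an open question.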
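
The local neighbor match test described above can be sketched end to end: build k-nearest neighbor sets in geographical space and in standardized attribute space, count the common neighbors at each location, and evaluate the probability p = C(k,v) · C(N-k,k-v) / C(N,k) with N = n - 1. The function names and the random data are hypothetical stand-ins (85 observations and six variables are chosen only to mirror the k=6 example); this is not the actual data set or the software's implementation.

```python
from math import comb
import numpy as np

def knn_sets(points, k):
    """Set of the k nearest neighbors (Euclidean) for each row of points."""
    points = np.asarray(points, dtype=float)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a location is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    return [set(int(j) for j in row) for row in idx]

def match_probability(v, k, n):
    """Probability of exactly v shared neighbors out of k, with N = n - 1."""
    N = n - 1
    return comb(k, v) * comb(N - k, k - v) / comb(N, k)

def local_neighbor_match(coords, X, k):
    """Number of common neighbors and its probability for each location."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-standardize the variables
    geo = knn_sets(coords, k)
    att = knn_sets(Z, k)
    n = len(geo)
    v = np.array([len(g & a) for g, a in zip(geo, att)])
    p = np.array([match_probability(int(c), k, n) for c in v])
    return v, p

# Hypothetical data: 85 locations, six variables, k = 6.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(85, 2))
X = rng.normal(size=(85, 6))

v, p = local_neighbor_match(coords, X, k=6)
significant = p < 0.05                         # chosen p-value cut-off
```

The array `v` is what a cardinality map would display, and `significant` marks the locations whose number of common neighbors has a probability below the chosen cut-off. Because the probability depends only on `v`, `k`, and `n`, no permutation is needed.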