Open Access: CC BY 4.0


Seifert, B., Ritz, M. & Csősz, S.

Year: 2014


Application of Exploratory Data Analyses opens a new perspective in morphology-based alpha-taxonomy of eusocial organisms

Journal: Myrmecological News

Volume: 19

Pages: 1-15

Type of contribution: Original Article

Supplementary material: No


This article introduces a new application of the Exploratory Data Analysis (EDA) algorithms Ward's method, Unweighted Pair Group Method with Arithmetic Mean (UPGMA), K-Means clustering, and a combination of Non-Metric Multidimensional Scaling and K-Means clustering (NMDS-K-Means) for hypothesis formation in morphology-based alpha-taxonomy of ants. The script is written in R and freely available at: The characteristic feature of the new approach is an unconventional application of linear discriminant analysis (LDA): no species hypothesis is imposed. Instead, each nest sample, composed of individual ant workers, is treated as a separate class. This yields a multidimensional distance matrix between the group centroids of nest samples, which serves as input data for the clustering methods. We mark the new method with the prefix "NC" (Nest Centroid). The performance of NC-Ward, NC-UPGMA, NC-K-Means clustering, and NC-NMDS-K-Means was comparatively tested in 48 examples with multiple morphological character sets of 74 cryptic species of 13 ant genera. Data sets were selected specifically on the criterion that the EDA methods were likely to produce errors, i.e., that any character under consideration overlapped interspecifically in bivariate plots against body size. Morphospecies Hypotheses were formed through interaction between EDA and a confirmatory LDA in which samples for which the primary species hypothesis and the EDA classification disagreed were set as wild-cards. Subsequent Advanced Species Hypotheses were formed by aligning Morphospecies Hypotheses with biological and genetic data. Across all 48 cases, using nest-centroid data generated by a hypothesis-free LDA, the mean deviation of clustering from the Advanced Species Hypotheses was 5.25% for NC-UPGMA, 2.58% for NC-NMDS-K-Means, 2.40% for NC-Ward and 2.09% for NC-K-Means.
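The core NC idea (an LDA in which every nest sample is its own class, followed by clustering of the nest centroids) can be illustrated with a short sketch. This is an independent Python illustration using NumPy and SciPy, not the authors' R script; the function names `nest_centroid_lda` and `nc_cluster` are hypothetical, and the scaling of the discriminant axes may differ from the original implementation. The number of clusters `k` is assumed to be supplied by the user.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def nest_centroid_lda(X, nest_ids):
    """Hypothesis-free LDA: each nest sample is treated as its own class.

    X: (workers x characters) matrix; nest_ids: nest label of each worker.
    Returns the sorted nest labels and each nest's centroid in discriminant
    space (axes scaled up to a constant factor).
    """
    X = np.asarray(X, dtype=float)
    nest_ids = np.asarray(nest_ids)
    nests = np.unique(nest_ids)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    s_within = np.zeros((p, p))   # pooled within-nest scatter
    s_between = np.zeros((p, p))  # between-nest scatter
    means = []
    for nest in nests:
        Xn = X[nest_ids == nest]
        mu = Xn.mean(axis=0)
        means.append(mu)
        centred = Xn - mu
        s_within += centred.T @ centred
        d = (mu - grand_mean)[:, None]
        s_between += len(Xn) * (d @ d.T)
    # Solve S_b v = lambda S_w v by whitening with the Cholesky factor of S_w
    # (requires enough workers per nest for S_w to be non-singular).
    L_inv = np.linalg.inv(np.linalg.cholesky(s_within))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ s_between @ L_inv.T)
    order = np.argsort(eigvals)[::-1]
    n_axes = min(p, len(nests) - 1)
    W = L_inv.T @ eigvecs[:, order[:n_axes]]
    centroids = np.array(means) @ W
    return nests, centroids


def nc_cluster(X, nest_ids, k, method="ward"):
    """Cluster nest centroids: method="ward" (NC-Ward) or "average" (NC-UPGMA)."""
    nests, centroids = nest_centroid_lda(X, nest_ids)
    # Euclidean distances between centroids in discriminant space.
    Z = linkage(centroids, method=method)
    labels = fcluster(Z, t=k, criterion="maxclust")
    return dict(zip(nests.tolist(), labels.tolist()))
```

Because the within-nest scatter is whitened before the discriminant axes are extracted, Euclidean distances between the centroids are proportional to Mahalanobis distances between nest means under the pooled within-nest covariance, so no species grouping enters the distance matrix. Calling `nc_cluster(X, nest_ids, k=2, method="average")` returns a dictionary mapping each nest sample to a cluster number; an NC-K-Means variant would run k-means on the same `centroids` array instead.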
A much larger mean error of 21.50% was observed when K-Means used nest-sample means of the morphological characters instead of centroid data. This indicates that first running a hypothesis-free LDA was the decisive factor in the unexpectedly high performance of the new clustering algorithms. Advantages and disadvantages of the EDA methods are discussed. A combination of NC-Ward, NC-UPGMA and NC-K-Means clustering is recommended as the most conclusive and fastest routine for the exploration of cryptic species. The method is applicable to any group of eusocial organisms, such as ants, bees, wasps, termites, gall-making aphids, thrips, weevils, pistol shrimps, and mole rats. More generally, NC-clustering can be applied to any cohesive system that provides repeated, definitely conspecific elements, e.g., leaves and flowers of the same plant, a coral "head" of genetically identical polyps, or an aphid colony produced by a single fundatrix. It can also be used to monitor intraspecific zoogeographical structures. However, the clustering methods presented did not appear to be good tools for investigating hybrid scenarios, for which we recommend alternative methods.
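The error percentages quoted in this abstract compare a clustering of nest samples with a reference classification. The abstract does not give the exact formula, so the following is only a minimal sketch under the assumption that the deviation is the percentage of nest samples whose cluster disagrees with the reference species after clusters are matched to species in the best one-to-one way. The function name `percent_deviation` is hypothetical.

```python
from itertools import permutations


def percent_deviation(clusters, reference):
    """Percentage of samples whose cluster disagrees with the reference species.

    clusters, reference: dicts mapping sample id -> label (assumed metric).
    Cluster numbers are arbitrary, so every one-to-one mapping of clusters to
    species is tried and the mapping with the fewest disagreements is kept.
    Brute force, which is adequate for the few species in a cryptic complex.
    """
    samples = sorted(reference)
    cluster_labels = sorted({clusters[s] for s in samples})
    species = sorted({reference[s] for s in samples})
    # Surplus clusters map to None, so their samples always count as errors.
    targets = species + [None] * max(0, len(cluster_labels) - len(species))
    best = len(samples)
    for perm in permutations(targets, len(cluster_labels)):
        mapping = dict(zip(cluster_labels, perm))
        errors = sum(mapping[clusters[s]] != reference[s] for s in samples)
        best = min(best, errors)
    return 100.0 * best / len(samples)
```

For example, with five nest samples of which three belong to species A and two to species B, a clustering that places one A sample with the two B samples gives `percent_deviation(...) == 20.0`. Averaging this value over data sets would give a mean deviation comparable in spirit to the figures reported above.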

Open access, licensed under CC BY 4.0. © 2014 The Author(s).

Key words:

Taxonomy, cryptic species, eusociality, hierarchical cluster analysis, agglomerative nesting, non-hierarchical cluster analysis, multi-dimensional scaling, automated determination.

Publisher: The Austrian Society of Entomofaunistics

ISSN: 1994-4136 (print), 1997-3500 (online)