NTSYSpc ver. 2.2 details

Technical details about NTSYSpc ver. 2.2

Computational modules

AUTOREGR - Fits data using the pure autoregressive model. Used in spatial and phylogenetic autocorrelation analyses.

CANPLS - Performs canonical correlation and partial least-squares analyses. Used to study correlations between two sets of variables. A permutation test is used to test for associations.

COMBINE - Combines two or more matrices into one.

CONSEN - Computes a consensus tree for two or more trees (such as multiple tied trees from SAHN clustering or trees from two different methods). Several consensus indices are also computed to measure the degree of agreement between trees. Can read Nexus tree files.

COPH - Produces a cophenetic value matrix (matrix of ultrametric values) from a tree matrix (produced, e.g., by the SAHN clustering program). This matrix can be used by the MXCOMP program to test the goodness of fit of a cluster analysis to the similarity or dissimilarity matrix on which it was based. Can also produce path length distance matrices and phylogenetic covariance matrices.
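
As a rough illustration of what a cophenetic value matrix contains, the Python sketch below fills one in from a list of merges. The `merges` layout (two tuples of member indices plus a merge level) and the function name are assumptions made for this example; they are not the NTSYSpc tree matrix format.

```python
def cophenetic_matrix(n, merges):
    """Return an n x n matrix of the levels at which pairs of objects first join."""
    coph = [[0.0] * n for _ in range(n)]
    for members_a, members_b, level in merges:
        # Every pair split across the two merged clusters first meets at this level.
        for i in members_a:
            for j in members_b:
                coph[i][j] = coph[j][i] = level
    return coph


# Hypothetical tree for 4 objects: 0 and 1 join at 2, 2 and 3 at 4, both groups at 6.
merges = [((0,), (1,), 2.0), ((2,), (3,), 4.0), ((0, 1), (2, 3), 6.0)]
coph = cophenetic_matrix(4, merges)
```

Because each entry is the level of the smallest cluster containing both objects, the resulting matrix is ultrametric.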

CORRESP - Correspondence analysis. This is a useful way to investigate the structure of a 2-way contingency table.

CPCA - Common principal components analysis of Flury (1984, 1988). Fits a single set of eigenvectors to a series of variance-covariance matrices.

CVA - Performs a canonical vectors analysis (a generalization of discriminant function analysis). Can also test closeness of each specimen to each group mean.

DCENTER - Performs a "double-centering" of a matrix of similarities or dissimilarities among the objects. The resulting matrix can then be factored to perform a principal coordinates analysis (a method for displaying relationships among objects in terms of their positions along a set of axes based on a dissimilarity matrix, see Gower 1966).
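
The double-centering step can be sketched in a few lines of Python. This is a minimal illustration of Gower's formulation, assuming the input holds distances that are squared and multiplied by -1/2 before centering; the function name is hypothetical and the options offered by DCENTER itself may differ.

```python
def double_center(dist):
    """Double-center a symmetric dissimilarity matrix (Gower's formulation)."""
    n = len(dist)
    # Start from -1/2 times the squared dissimilarities.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    # The input is symmetric, so column means equal row means.
    return [[a[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


# Distances among three points on a line at positions 0, 1 and 3.
d = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
b = double_center(d)
```

For Euclidean distances the result is the matrix of inner products of the mean-centered coordinates, so every row and column sums to zero and its eigenvectors give the principal coordinates.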

EIGEN - Computes eigenvector and eigenvalue matrices of a real symmetric similarity matrix. This program can be used to perform a principal components or a principal coordinates analysis by extracting eigenvectors (factors) from a correlation or variance-covariance matrix.

FACTOR - Performs the initial step (factor extraction) for a factor analysis of a correlation or a covariance matrix. The principal factor and maximum likelihood methods are included. The max diagonal, squared multiple correlation, and Jöreskog (1963) methods are available for initial communality estimation.

FOURIER - Computes Fourier and elliptic Fourier transformations and their inverses. Can be used on both 2D and 3D outline curves.

FOURPLOT - Plots outlines and their estimates based on Fourier coefficients.

FREQ - Computes matrices of gene frequencies for input to the SIMINT or SIMGEND modules.

FROTATE - Performs the orthogonal or oblique factor rotation step in a factor analysis. The function plane, primary product function plane, Harris-Kaiser independent cluster, Varimax, and Promax methods are included.

FSCORES - Computes factor scores. The Anderson-Rubin, Bartlett, least-squares, regression, multigroup, and Thompson methods are included.

MDSCALE - Nonmetric and linear multidimensional scaling analysis. This can be used as an alternative to PCA.

MOD3D - Plots a 3-way scatter diagram as an interactive 3D perspective view of a model with n "objects" at tops of wires attached from a base plane. The view can be rotated interactively. This program is often used to view the results of a principal components or principal coordinates analysis.

MST - Computes a minimum-length spanning tree from a similarity or dissimilarity matrix. This is useful for showing the nearest neighbors of objects based on their positions in a multidimensional space.
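
A minimum-length spanning tree can be illustrated with Prim's algorithm. The Python sketch below is only an example of the technique; the function name is hypothetical and nothing here states which algorithm the MST module uses internally.

```python
def minimum_spanning_tree(dist):
    """Return MST edges as (i, j, length) tuples for a dissimilarity matrix."""
    n = len(dist)
    in_tree = {0}  # grow the tree from object 0
    edges = []
    while len(in_tree) < n:
        # Shortest link from any object in the tree to any object outside it.
        length, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, length))
        in_tree.add(j)
    return edges


d = [[0, 2, 6, 7],
     [2, 0, 3, 8],
     [6, 3, 0, 4],
     [7, 8, 4, 0]]
edges = minimum_spanning_tree(d)
# edges == [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```

Each edge links an object to one of its nearest neighbors in the tree, which is why the result is often overlaid on an ordination plot.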

MXCOMP - Compares two symmetric matrices by computing their matrix correlation and then plotting a scatter diagram. Can also compare two matrices with the effects of a third held constant (the Smouse, Long, Sokal test). The statistics for a 2-way Mantel test are also computed. It can be used to compute the goodness of fit of a cluster analysis to a dataset (by comparing a cophenetic value matrix with a dissimilarity matrix).
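
The matrix correlation and a simple 2-way Mantel permutation test can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the number of permutations, and the one-tailed p-value convention are choices made for the example and are not taken from MXCOMP.

```python
import random
from math import sqrt


def lower_triangle(m):
    """Return the below-diagonal elements of a square matrix as a flat list."""
    return [m[i][j] for i in range(1, len(m)) for j in range(i)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matrix_correlation(a, b):
    return pearson(lower_triangle(a), lower_triangle(b))


def mantel_test(a, b, permutations=999, seed=0):
    """Return (observed r, one-tailed p) by permuting rows and columns of b together."""
    rng = random.Random(seed)
    n = len(a)
    observed = matrix_correlation(a, b)
    count = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        # Small tolerance guards against floating-point ties.
        if matrix_correlation(a, permuted) >= observed - 1e-12:
            count += 1
    return observed, count / (permutations + 1)


a = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
b = [[0, 2, 3, 9], [2, 0, 1, 7], [3, 1, 0, 6], [9, 7, 6, 0]]
r, p = mantel_test(a, b)
```

Rows and columns are permuted together because the elements of a distance matrix are not independent, so an ordinary significance test of r would not be valid.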

MXPLOT - Plots 2-way scatter diagrams of rows or columns of a matrix.

NJOIN - Computes trees by Saitou and Nei's (1987) neighbor-joining method as estimates of phylogenetic trees. Unweighted neighbor-joining clustering trees can also be computed. As in the UPGMA module, checks can be made for the effects of ties.

OUTPUT - Formats matrices into pages for printing. Results can be pasted into most word processors. This formatted output is also useful for checking to make sure that an input file has been prepared in the correct format for NTSYSpc.

PLOT - Plots one or more variables against another.

POOLVC - Computes a pooled within-groups variance-covariance matrix from two or more data matrices. Can also perform a test for homogeneity of covariance matrices.

PROCRUSTES - Performs a Procrustes superimposition or a generalized Procrustes analysis to compute an average configuration of points and to align configurations to the average. Useful for comparing ordinations and in geometric morphometrics. Analyses can be performed for two or higher dimensional data.

PROCPLOT - Plots the results of a Procrustes analysis.

PROJ - Projects a set of objects onto one or more vectors—or onto a space orthogonal to a set of vectors. In principal components analysis one will project standardized data onto the eigenvectors of the correlation matrix in order to see the best (in a least-squares sense) low-dimensional view of a data set. The orthogonal projection option can be used to implement Burnaby's (1966) method for size adjustment. Can also compute predictions using the results of a regression analysis.

MULREGR - Performs regression, multivariate regression, multiple regression, and generalized least-squares regression.

RESAMPLE - Creates samples using bootstrap, jackknife, random permutation, or random normal deviates.
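
The first three resampling schemes can be sketched in Python as below. The function names and the fixed seed are assumptions for this illustration and do not reflect the options or output format of RESAMPLE.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def jackknife_samples(rows):
    """Return every leave-one-out subset of the rows."""
    return [rows[:i] + rows[i + 1:] for i in range(len(rows))]


def permuted(rows, rng):
    """Return the rows in a random order, without replacement."""
    out = list(rows)
    rng.shuffle(out)
    return out


rng = random.Random(1)  # fixed seed so that runs are repeatable
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
boot = bootstrap_sample(rows, rng)
jack = jackknife_samples(rows)
perm = permuted(rows, rng)
```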

SAHN clustering - Performs the sequential, agglomerative, hierarchical, and nested clustering methods as defined by Sneath and Sokal (1973). These include the commonly used hierarchical clustering methods listed below. The program can find alternative trees when there are ties in the input matrix.

	complete-link (maximum method)
	single-link (minimum method)
	flexible clustering
	UPGMA (unweighted pair-group method)
	WPGMA (weighted pair-group method)
	WPGM using centroid clustering (either similarities or dissimilarities)
	WPGM using Spearman's average
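
As an illustration of one of the methods listed above, the Python sketch below performs UPGMA on a small dissimilarity matrix. It is deliberately simple (it recomputes average distances from the original matrix at every step) and breaks ties by taking the first minimum, whereas the SAHN module can report alternative trees. The function names and the form of the returned merge list are assumptions for this example.

```python
from itertools import combinations


def average_distance(dist, a, b):
    """Mean dissimilarity between all members of cluster a and cluster b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def upgma(dist):
    """Return the merges as (cluster_a, cluster_b, level) tuples."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average dissimilarity.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(dist, *pair))
        merges.append((a, b, average_distance(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


d = [[0, 2, 6, 7],
     [2, 0, 3, 8],
     [6, 3, 0, 4],
     [7, 8, 4, 0]]
merges = upgma(d)
# merges == [((0,), (1,), 2.0), ((2,), (3,), 4.0), ((0, 1), (2, 3), 6.0)]
```

Replacing the mean in `average_distance` with `min` or `max` would give single-link or complete-link clustering instead.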

SIMGEND - Computes matrices of genetic distance coefficients from gene-frequency and DNA sequence data. The following coefficients can be selected.

	Cavalli-Sforza and Edwards (1967) arc distance
	Balakrishnan and Sanghvi (1968) distance
	Cavalli-Sforza and Edwards (1967) chord distance
	Hillis (1984) distance
	Swofford and Olsen's (1990) suggestion to unbias the distance by using the same correction as in Nei's distance
	Nei's (1972) distance (default)
	Nei's (1978) unbiased distance
	Prevosti (Wright, 1978) distance
	Rogers (1972) distance
	Rogers distance as modified by Wright (1978)
	Jukes and Cantor (1969) distance modified for DNA sequence data.
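
As an example of one coefficient from the list, Nei's (1972) standard genetic distance is D = -ln(I), where I is the normalized identity Jxy / sqrt(Jx Jy) computed from allele frequencies. The Python sketch below assumes each population is given as a list of loci, each locus being a list of allele frequencies; this data layout and the function name are assumptions for the illustration, not the SIMGEND input format.

```python
from math import log, sqrt


def nei_1972_distance(pop_x, pop_y):
    """Nei's (1972) distance between two populations scored at the same loci."""
    # Sums over loci stand in for averages: the number of loci cancels in the ratio.
    jxy = sum(x * y for lx, ly in zip(pop_x, pop_y) for x, y in zip(lx, ly))
    jx = sum(x * x for lx in pop_x for x in lx)
    jy = sum(y * y for ly in pop_y for y in ly)
    return -log(jxy / sqrt(jx * jy))


# Two loci with two alleles each; the populations differ only at the second locus.
pop_a = [[0.5, 0.5], [1.0, 0.0]]
pop_b = [[0.5, 0.5], [0.0, 1.0]]
dist_ab = nei_1972_distance(pop_a, pop_b)  # equals ln(3) for these frequencies
dist_aa = nei_1972_distance(pop_a, pop_a)  # zero for identical populations
```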

SIMINT - Computes various similarity or dissimilarity indices for interval measure (continuous) data (e.g., correlation and distance coefficients).

	Bray-Curtis distance
	Canberra metric
	Chi-squared distance
	Average taxonomic distance
	Squared average distance
	Euclidean distance
	Euclidean distance squared
	Manhattan distance
	Penrose's shape coefficient
	Penrose's size coefficient
	Product-moment correlation
	Cosine of angle
	Sample size
	Morisita (1959) index
	Horn's (1966) modification of Morisita index
	Renkonen (1938) similarity
	Variances and covariances
	Inner product
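
A few of the coefficients listed above are sketched below in Python for a single pair of objects. These are textbook forms written for illustration; the function names are hypothetical and the exact scaling used by SIMINT (for example, whether a distance is averaged over variables) is not stated here and may differ.

```python
from math import sqrt


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_taxonomic(x, y):
    """Euclidean distance with the sum of squares averaged over the variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 3.0]
d_euclid = euclidean(x, y)           # sqrt(5)
d_average = average_taxonomic(x, y)  # sqrt(5 / 3)
d_manhattan = manhattan(x, y)        # 3.0
d_bray = bray_curtis(x, y)           # 3 / 15 = 0.2
```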

SIMQUAL - Computes various association coefficients for qualitative data, i.e., data with unordered states (e.g., simple matching, Jaccard, and phi coefficients).

	Hamann (1961) coefficient
	Rogers and Tanimoto (1960) distance
	Simple matching coefficient
	Dice (1945) coefficient
	Jaccard (1908) coefficient
	Kulczynski (1927) coefficients 1 and 2
	Phi coefficient
	Russell and Rao (1940) coefficient
	Ochiai coefficient
	Yule (1911) coefficient
	also several unnamed coefficients from Sokal and Sneath (1961)
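
Most of these coefficients are functions of the 2 x 2 table of matches and mismatches between two objects. The Python sketch below shows three of them for binary (0/1) data, using the usual a, b, c, d cell notation; the function names are assumptions for this illustration and missing data are not handled.

```python
def two_by_two(x, y):
    """Count (1,1), (1,0), (0,1) and (0,0) character pairs for two objects."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def simple_matching(x, y):
    a, b, c, d = two_by_two(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    a, b, c, _ = two_by_two(x, y)  # joint absences (d) are ignored
    return a / (a + b + c)


def dice(x, y):
    a, b, c, _ = two_by_two(x, y)
    return 2 * a / (2 * a + b + c)


x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
sm = simple_matching(x, y)  # (2 + 1) / 5 = 0.6
jc = jaccard(x, y)          # 2 / 4 = 0.5
dc = dice(x, y)             # 4 / 6
```

The main practical difference among them is whether joint absences count as evidence of similarity, as in the simple matching coefficient, or are ignored, as in the Jaccard and Dice coefficients.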

SPLIT - Divides a matrix into two or more matrices.

STAND - Performs a linear transformation of a data matrix so as to eliminate the effects of different scales of measurement. Several options are available for what is subtracted and what is used as a divisor.
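
One common choice is to subtract the mean and divide by the standard deviation. The Python sketch below shows that case for a single variable; the function name is hypothetical, and the use of the sample standard deviation (n - 1 divisor) is an assumption, not a statement about the defaults in STAND.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # n - 1 divisor
    return [(v - m) / s for v in values]


z = standardize([2.0, 4.0, 6.0])
# mean is 4 and standard deviation is 2, so z is [-1.0, 0.0, 1.0]
```

Other combinations, such as subtracting the minimum and dividing by the range, follow the same pattern with different values for `m` and `s`.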

SUMMARY - Summarizes results of a resampling experiment (bootstrap, jackknife, etc.).

SVD - Computes a singular-value decomposition of a rectangular matrix. It allows you to compute principal axes and projections in a single step.

TPSWTS - Computes projections of the 2D or 3D coordinates of objects onto the principal warps of a thin-plate spline bending energy matrix (see Bookstein, 1991). This is done to enable a statistical analysis of the components of shape variation. Includes both 2D and 3D estimates of the uniform shape component.

TRANSF - Performs various linear and non-linear transformations of the rows or columns of a matrix. Computes Bookstein shape coordinates (both scaled and unscaled). Can also be used to delete rows or columns and alter the form of storage of some matrices.

TREE - Displays a tree (e.g., from a cluster analysis) as a phenogram or the results of the neighbor-joining method as a phylogenetic tree with branch lengths. Options are provided for scaling and scrolling through a tree interactively.

Data size limitations

Most modules in NTSYSpc do not have explicit dimension limits for objects or variables. The practical limits are disk space and time. Larger amounts of RAM will speed up computations for very large datasets. With the capacity and power of modern PCs, a dataset with a few hundred samples or variables is considered small for most computations. However, the MDSCALE module must manipulate many matrices simultaneously and hence is more limited in the size of the matrix it can handle (512 variables is the maximum).

This file was last modified on 9 June 2023.