In a surprising turn of events, I now have one twitter follower.

# twitter pt. II

Still going strong with 0 followers.

I caved.

https://twitter.com/adilapapaya

# Java arrays

I didn’t know this until very recently, but in Java, arrays of arrays can have *rows of unequal sizes*. For example, say you want to create a lower triangular matrix of integers called lowerTri consisting of 10 rows with each row having 1, 2, 3,…, 10 elements. This is perfectly legal in Java:

    int[][] lowerTri;
    lowerTri = new int[10][]; // allocate 10 row references; each row's length is set below
    for (int r = 0; r < lowerTri.length; r++) {
        // set the number of elements in row r to
        // r+1 since it's lower triangular
        lowerTri[r] = new int[r + 1];
    }
    // print the resulting array
    for (int r = 0; r < lowerTri.length; r++) {
        System.out.print("\nRow " + r + " has " + lowerTri[r].length + " elements:");
        for (int c = 0; c < lowerTri[r].length; c++) {
            System.out.print(" " + lowerTri[r][c]);
        }
        System.out.println("");
    }

Think of all the memory saved when you know your matrix is symmetric, or if you want to create something akin to a stem-and-leaf plot (with each row consisting of an array of observations).

Nice!
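Here's a rough sketch of the symmetric-matrix idea. The class name `SymmetricMatrix` and its methods are made up for illustration, not a standard library API. It stores only the lower triangle in a jagged array and mirrors the indices on access, so a 10-by-10 symmetric matrix needs 55 stored values instead of 100.

```java
// Hypothetical helper: a symmetric matrix backed by a jagged lower-triangular array.
class SymmetricMatrix {
    // Row r holds r+1 entries: columns 0..r of the lower triangle.
    private final double[][] lower;

    public SymmetricMatrix(int n) {
        lower = new double[n][];
        for (int r = 0; r < n; r++) {
            lower[r] = new double[r + 1];
        }
    }

    // Entries above the diagonal are read from their mirror image below it.
    public double get(int r, int c) {
        return r >= c ? lower[r][c] : lower[c][r];
    }

    public void set(int r, int c, double value) {
        if (r >= c) {
            lower[r][c] = value;
        } else {
            lower[c][r] = value;
        }
    }

    // Number of values actually allocated: n(n+1)/2.
    public int storedElements() {
        int count = 0;
        for (double[] row : lower) {
            count += row.length;
        }
        return count;
    }

    public static void main(String[] args) {
        SymmetricMatrix m = new SymmetricMatrix(10);
        m.set(2, 7, 3.5);
        System.out.println(m.get(7, 2));        // prints 3.5
        System.out.println(m.storedElements()); // prints 55
    }
}
```

The saving covers the element storage only; each row is still its own array object with its own small overhead.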

# Error in Matlab’s Mahalanobis distance?

First, some notation:

- $X$ is an $n$-by-$p$ input matrix with each row corresponding to an observation, each column a variable.
- $X_{r,:}$ denotes row $r$ of $X$.
- $X_{:,c}$ denotes column $c$ of $X$.

Now that that’s out of the way… the Matlab Statistics Toolbox has a function called pdist.m which is used in multidimensional scaling of multivariable datasets. One of the available distance metrics is the “mahalanobis” distance metric (nicely elaborated upon here). In pdist.m, it’s defined as a dissimilarity measure between each pair of observations $X_{r,:}$ and $X_{s,:}$ in the input matrix $X$.

That is, element $(r,s)$ of the Mahalanobis distance matrix $D$ measures the Mahalanobis distance between rows $r$ and $s$ of $X$,

$$D_{rs} = \sqrt{(X_{r,:} - X_{s,:})\, C^{-1}\, (X_{r,:} - X_{s,:})^T},$$

where $C$ is the ($p$-by-$p$) covariance matrix of $X$.

I checked it using two methods:

- Is the Mahalanobis distance computed using an identity covariance matrix equal to the Euclidean distance?
- Is the Mahalanobis distance equal to what we’d get from matrix-matrix multiplication as specified in the formula above?

In both cases, the answer was no.
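To spell out why the first check ought to hold, write $X_{r,:}$ and $X_{s,:}$ for two rows of the input matrix and $C$ for the covariance matrix (the notation here is assumed for illustration). Setting $C = I$ removes the weighting, and the quadratic form collapses to a plain sum of squared differences:

$$
\sqrt{(X_{r,:} - X_{s,:})\, I^{-1}\, (X_{r,:} - X_{s,:})^T}
= \sqrt{\sum_{k=1}^{p} (X_{rk} - X_{sk})^2}
= \lVert X_{r,:} - X_{s,:} \rVert_2 .
$$

So with an identity covariance, every entry of the Mahalanobis distance matrix should match the corresponding Euclidean distance.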

Code I used:

    % Initialize x.
    clear all;
    ln = 100;
    x(:,1) = 1:ln;
    x(:,2) = sin(1:ln);
    x(:,3) = (1:ln).^2;
    x(:,4) = cos(1:ln);

    % This is what Matlab's pdist.m computes
    'Mahalanobis distance from Matlab pdist'
    Y = squareform(pdist(x,'mahalanobis'));
    Y(1:10,1:10)

    % By definition, the Mahalanobis distance matrix should reduce to the
    % Euclidean distance matrix if the covariance matrix is equal to
    % the identity.
    % Let's check this:
    'Mahalanobis with covariance = identity matrix'
    mahaIdentity = squareform(pdist(x,'mahalanobis',eye(4)));
    mahaIdentity(1:10,1:10)

    'Euclidean distance matrix'
    euclid = squareform(pdist(x,'euclidean'));
    euclid(1:10,1:10)

    % It isn't(!!) In fact, it's ignoring the input covariance
    % matrix and returning the same mahalanobis distance computed using
    % pdist(x,'mahalanobis').

    % What *should* the mahalanobis distance be then?
    % First, compute the inverse covariance of x
    icovx = cov(x) \ eye(4);
    % Then compute the mahalanobis distance.
    maha = zeros(ln); % preallocate the ln-by-ln distance matrix
    for r1 = 1:ln % for each row
        for r2 = r1+1:ln % for all combinations of rows
            d = x(r1,:) - x(r2,:); % difference between the two rows
            % Full quadratic form d*inv(C)*d'. Weighting d.^2 by diag(icovx)
            % alone would drop the off-diagonal terms of the inverse covariance.
            maha(r1,r2) = sqrt(d * icovx * d'); % take the square root
            maha(r2,r1) = maha(r1,r2);
        end
    end
    % This is what it should be.
    'Mahalanobis from matrix-matrix multiplication'
    maha(1:10,1:10)

Feel free to let me know if I’ve got things wrong though.

In the meantime, feel free to use this code snippet:

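    icovx = cov(x) \ eye(size(x,2)); % inverse of cov(x)
    ln = size(x,1); % number of observations (rows)
    maha = zeros(ln); % preallocate the ln-by-ln distance matrix
    for r1 = 1:ln % for each row
        for r2 = r1+1:ln % for all combinations of rows
            d = x(r1,:) - x(r2,:); % difference between the two rows
            maha(r1,r2) = sqrt(d * icovx * d'); % full quadratic form, then square root
            maha(r2,r1) = maha(r1,r2);
        end
    end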
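For anyone who wants to poke at the two checks without Matlab, here's a minimal Java sketch. The class and method names are invented for illustration, and the 2-by-2 "inverse covariance" is a made-up positive definite matrix, not one computed from the data above. It compares the full quadratic form against the Euclidean distance under an identity matrix, and against a version that weights squared differences by the diagonal of the inverse covariance alone. The two weightings agree only when the inverse covariance is diagonal.

```java
// Hypothetical sketch: Mahalanobis distance between two observations.
class MahalanobisSketch {
    // Full quadratic form: sqrt((a - b) * invCov * (a - b)').
    static double mahalanobis(double[] a, double[] b, double[][] invCov) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a.length; j++) {
                sum += (a[i] - b[i]) * invCov[i][j] * (a[j] - b[j]);
            }
        }
        return Math.sqrt(sum);
    }

    // Diagonal-only weighting: ignores every off-diagonal entry of invCov.
    static double diagonalOnly(double[] a, double[] b, double[][] invCov) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d * invCov[i][i];
        }
        return Math.sqrt(sum);
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {4.0, 6.0};
        double[][] identity = {{1, 0}, {0, 1}};
        double[][] correlated = {{2, -1}, {-1, 2}}; // made-up inverse covariance

        System.out.println(mahalanobis(a, b, identity));    // prints 5.0
        System.out.println(euclidean(a, b));                // prints 5.0
        System.out.println(mahalanobis(a, b, correlated));  // sqrt(26), about 5.099
        System.out.println(diagonalOnly(a, b, correlated)); // sqrt(50), about 7.071
    }
}
```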