Friday, June 5, 2015

Calculating Angular Distance

Hi there followers, it's been a while from the last post but here I am again with fresh news.
Some weeks ago I was writing a program for an exam and I thought that sharing with you this code could be a good idea.
This because it's simple and can make easily understand how to fairly share the work between process with Open MPI.

Project Requirements

The goal is to develop a program written in C that calculate the angular distances between a group of stars, and than write as output the distribution of those distances so that could be plotted with Gnuplot.

The angular distance

Quoting Wikipedia:
In mathematics (in particular geometry and trigonometry) and all natural sciences (including astronomy, geophysics, etc.), the angular distance (angular separation, apparent distance, or apparent separation) between two point objects, as observed from a location different from either of these objects, is the size of the angle between the two directions originating from the observer and pointing towards these two objects.
So to calculate the position of two point i and j in the space we can use the following equation to get \( \theta \):

\[\theta_{ij} = arcos \left ( \frac {x_i x_j + y_i y_j + z_i z_j} {\sqrt{ (x_i^2+y_i^2+z_i^2) (x_j^2+y_j^2+z_j^2)} } \right ) \]

The result of this is a number between \( [0, \pi )  \).

Calculating the distribution

We know now how to calculate the angular distance between two points, but our goal is to get a distribution over a set of stars, and we get the parameter K to sample our set of values.
But our huge first problem is the number of operations required because defined \(n\) as the stars' number, we have \( \frac{n(n-1)} {2} \) distances.
So with a large amount of stars, we can easily reach billion of operations.
If we imagine the distances that we need to compute displaced as an upper triangular matrix, we need to split fairly the matrix's elements between our pool of processes.
The idea now is to transform the matrix in an array that concatenate every row of the matrix, in this way we can split the work but we lose a big information. How could we get back the row and the column simply from the index of the elements in the array?
Googling a lot I finally found the solution, even avoiding the two loops, using the following two formulas:
\[I_r =-{ \frac{(-2(n-1) - 1 + \sqrt{ 4 n (n-1) - 8 i - 7} )} {2}}\]
\[I_c = i - (n - 1) I_r + I_r \frac{I_r + 1} {2} + 1\] 
where \( I_r \) is our row index and  \( I_c \) is our column index.
We can now calculate our distribution of distance and this is the output for a set of 119613 stars and 1000 K intervals:
The output was generated in 151.870678 seconds with 8 processors on Intel i7 4700MQ quad core, a really good result.
The resulting speedup is plotted in the following image:
and this is the efficiency of the program:
We can see that with the Hyper Thread technology we obtain a little amount of speedup even if we go over the physical core number, but over 8 processes the efficiency goes down fast on a quad core.

So that's all folks, you can find this program and the example data-set on Github at the following link:
Please let me know if anything isn't clear or wrong, and I will try to correct it until it is. 
So if you linked this post, share, share, share!