My current assignment at work is optimizing some code for dual-core processors using OpenMP.

OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.

Sounds pretty impressive, huh? Well, it's actually not complicated at all. All you have to do (in VS 2005) is enable OpenMP support for your project (the /openmp compiler option), and then you can use #pragma omp plus various OpenMP directives to split your code across multiple threads.

Check out the code for this multi-threaded Hello World program using the PARALLEL directive:

C++:
  1. code is still coming :D

Pretty cool! Here's how the output looks:
[Image: OpenMP parallel output]

Another useful OpenMP directive is FOR. Here's the code for a parallelized for loop, followed by its output:

C++:
    #include "stdafx.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <omp.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
    #ifdef _OPENMP
        printf("OpenMP implemented\n");
    #endif

        double *x, *y;
        int array_size = 18, i;

        // allocate memory for arrays
        x = (double*)malloc(array_size * sizeof(double));
        y = (double*)malloc(array_size * sizeof(double));

        // initialize 'x' with junk
        for (i = 0; i < array_size; i++) {
            x[i] = ((double)i) / (i + 1000);
        }

        omp_set_num_threads(5);

        // split the iterations of the following loop across the threads
        #pragma omp parallel for
        for (i = 0; i < array_size; i++) {
            y[i] = sin(exp(cos(-exp(sin(x[i])))));
            printf("%d (%d): %.20f\n", i, omp_get_thread_num(), y[i]);
        }

        // release the arrays again
        free(x);
        free(y);

        getchar();
        return 0;
    }

[Image: OpenMP for loop output]

You can see how OpenMP breaks the loop apart and spreads the goodies among the different threads. The first number on each line is the current iteration, followed by the number of the thread (in parentheses) that is executing that iteration. After the colon comes the result. Of course, this only works when the iterations are independent of one another (no iteration needs the result of an earlier one) and when the loop has a fixed number of iterations.

So that's OpenMP. You can accomplish some serious optimization with very little code.