To successfully complete this assignment, you need to participate both individually and in groups during class. If you attend class in person, have one of the instructors check your notebook and sign you out before leaving class on Friday, February 10.

In-Class Assignment: Loops¶

Image showing simple C/C++ loop code and a graphical representation of the loop

Image from: https://en.wikipedia.org/wiki/For_loop

Agenda for today's class (70 minutes)¶

(20 minutes) Pre-class Review
(20 minutes) OpenMP Loop Pi Code
(30 minutes) Matrix Multiply

1. Pre-class Review¶

In class we will talk about the basic loop worksharing construct, which looks like the following:

#pragma omp parallel
{
    #pragma omp for
    for (int i=0; i < max_itr; i++)
    {

    }
}
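As a minimal sketch of how the construct above might be filled in, the following function squares the loop index into a vector. The function name `fill_squares` and the choice of loop body are assumptions made for illustration; only the `parallel` / `for` pattern comes from the construct itself.

```cpp
#include <vector>

// Fill v[i] = i*i using the parallel / for worksharing pattern.
// fill_squares is a hypothetical helper name used only for this sketch.
std::vector<double> fill_squares(int max_itr)
{
    std::vector<double> v(max_itr);

    #pragma omp parallel              // start a team of threads
    {
        #pragma omp for               // split the iterations among the team
        for (int i = 0; i < max_itr; i++)
        {
            // Each iteration writes only its own element, so there is no race.
            v[i] = static_cast<double>(i) * i;
        }
    }                                 // all threads have finished the loop here
    return v;
}
```

With GCC, OpenMP is enabled by the `-fopenmp` flag. Without it the pragmas are ignored and the loop simply runs serially, which is a handy way to check that the parallel and serial versions give the same answer.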

0216--OMP_Loops_pre-class-assignment

2. OpenMP Loop Example: Pi Code¶

Next, we will help each other so that everyone gets a parallel OpenMP version of the Pi code working. Here are some instructions:

If you are stuck or confused, raise your hand and ask for help.
If you get stuck, review the code in the Google Document.
Help your neighbors.
Share all of your solutions with each other using the following Google Document.
Review the Google Document and learn from each other's solutions.

Group Google Document

3. OpenMP Example: Matrix Multiply¶

A simple matrix multiply between matrix $A$ (with $m$ rows and $k$ columns) and matrix $B$ (with $k$ rows and $n$ columns) into matrix $C$ (with $m$ rows and $n$ columns) is defined as follows:

$$A_{(m\times k)} \times B_{(k\times n)} = C_{(m\times n)} $$

Each element $c_{i,j}$ of $C$ (row $i$, column $j$) is calculated as the dot product of the $i^{th}$ row of $A$ with the $j^{th}$ column of $B$:

$$c_{i,j} = a_{i,1}b_{1,j} + \dots + a_{i,k}b_{k,j}$$

More information about matrix multiply can be found here.
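To make the dot-product definition concrete, here is a small serial sketch using nested `std::vector`s with 0-based indices. The `Matrix` alias and the function name `matmul` are assumptions for illustration and are unrelated to the class used in the course repository.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows of columns, 0-indexed

// Return the product of A (m x k) and B (k x n).
// Assumes B is non-empty and the inner dimensions agree.
Matrix matmul(const Matrix& A, const Matrix& B)
{
    const std::size_t m = A.size(), k = B.size(), n = B[0].size();
    Matrix C(m, std::vector<double>(n, 0.0));

    for (std::size_t i = 0; i < m; i++)          // row of A and C
        for (std::size_t j = 0; j < n; j++)      // column of B and C
            for (std::size_t l = 0; l < k; l++)  // walk along the dot product
                C[i][j] += A[i][l] * B[l][j];
    return C;
}
```

For example, multiplying the $2\times 3$ matrix with rows $(1,2,3)$ and $(4,5,6)$ by the $3\times 2$ matrix with rows $(7,8)$, $(9,10)$, and $(11,12)$ gives the $2\times 2$ matrix with rows $(58,64)$ and $(139,154)$; the first entry is $1\cdot 7 + 2\cdot 9 + 3\cdot 11 = 58$.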

The following C++ program is the first part of the file matmul.cpp in the class repository. It can be used to calculate a matrix multiplication between $A$ and $B$:

int main(int argc, char *argv[]){

    std::default_random_engine generator;
    std::normal_distribution<double> distribution(0.0,1.0);

    // matrix size
    int N = 10, M = 10, K = 10;

    // Accept input numbers for array sizes (m,k,n)
    if (argc > 1)
        M = atoi(argv[1]);

    if (argc > 2)
        K = atoi(argv[2]);

    if (argc > 3)
        N = atoi(argv[3]);

    Darray2 A,B,C; // Define Matrices
    A.define(1,M,1,K);
    B.define(1,K,1,N);
    C.define(1,M,1,N);
    for (int j = 1; j <= K ; j++)
      for (int i = 1; i <= M ; i++)
        A(i,j) = distribution(generator);

    for (int j = 1; j <= N ; j++)
      for (int i = 1; i <= K ; i++)
        B(i,j) = distribution(generator);

    // Only time the actual matmul
    auto begin = std::chrono::high_resolution_clock::now();

    for (int j = 1; j <= N ; j++)
      for (int i = 1; i <= M ; i++)
      {
        double cij = 0.0;
        for (int k = 1; k <= K ; k++)
          cij += A(i,k)*B(k,j);
        C(i,j) = cij;
      }

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    cout << "Matmul 1 [s]: " << elapsed.count()*1e-9 << endl;

✅ Do This: Inspect the above code. Can you figure out what it is doing and why?

✅ Do This: Get the above code to compile and run on the HPCC. Check the answers to make sure they seem correct. Experiment with different values of $m$, $n$, and $k$ to see how much time the multiplication takes for different sizes. Make a few nice plots. Note that very large values may cause a segmentation fault (why?).

✅ Do This: Modify the above code and compile it using OpenMP parallel for loops. Use the plain vanilla OpenMP for loop in the second matmul, and experiment with the collapse and schedule clauses in the third. Use a bash script to loop over many different sizes and make a few nice plots of the results. Are you able to increase the speed of the code?
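If you want a starting point for the `collapse` and `schedule` experiment, the sketch below shows one way the clauses might be attached to a matmul loop nest. It is not the code from matmul.cpp: it assumes flat row-major `std::vector` storage with 0-based indices instead of the `Darray2` class, and the function name `matmul_omp` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all stored row-major in flat vectors.
// matmul_omp is a hypothetical stand-in for the loop nest in matmul.cpp.
std::vector<double> matmul_omp(const std::vector<double>& A,
                               const std::vector<double>& B,
                               int M, int K, int N)
{
    std::vector<double> C(static_cast<std::size_t>(M) * N, 0.0);

    // collapse(2) merges the i and j loops into one M*N iteration space;
    // schedule(static) divides that space into roughly equal contiguous chunks.
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
        {
            double cij = 0.0;          // declared inside the loop, so private
            for (int k = 0; k < K; k++)
                cij += A[i * K + k] * B[k * N + j];
            C[i * N + j] = cij;        // each (i,j) is written by one thread only
        }
    return C;
}
```

Swapping `schedule(static)` for `schedule(dynamic)` or `schedule(guided)`, or removing `collapse(2)`, changes only how iterations are handed out, not the result, so the timings from each variant can be compared directly in your plots.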

Congratulations, we're done!¶

If you attend class in person, have one of the instructors check your notebook and sign you out before leaving class. If you are attending asynchronously, turn in your assignment using D2L.