ICA 14: Loops

ICA 14: Loops#

Image showing a simple c/c++ loop code and a graphical representaiton of the loops

Image from: https://en.wikipedia.org/wiki/For_loop

Agenda for today’s class (70 minutes)#

(20 minutes) Pre class Review
(20 minutes) OpenMP Loop Pi Code
(30 minutes) Matrix Multiply

1. Pre class Review#

In class we will talk about the basic loop sharing construct which looks like the following:

#pragma omp parallel
{
    #pragma omp for
    for (int i=0; i < max_itr; i++)
    {

    }
}

2. OpenMP Loop Example: Pi Code#

Next we will try to help each other out so that everyone gets a parallel OpenMP version of the Pi-code working. We did this in a previous class already. If you don’t yet have a working solution, go back to the linked Google document and find a solution that you can compile and run.

Group Google Document

OpenMP example: Matrix Multiply#

A simple matrix multiply between matrix \(A\) (with \(m\) rows and \(k\) columns) and matrix \(B\) (with \(k\) rows and \(n\) columns) into matrix \(C\) (with \(m\) rows and \(n\) columns) is defined as follows:

\[A_{(m\times k)} \times B_{(k\times n)} = C_{(n\times m)} \]

Where each \(m,n\) element in \(C\) (aka \(c_{m,n}\)) is calculated using the dot product of the \(m^{th}\) row of \(A\) by the \(n^{th}\) column of \(B\):

\[c_{m,n} = a_{m,0}b_{0,n} + a_{m,1}b_{1,n} + \dots + a_{m,k}b_{k,n}\]

The following figure is an example depiction of this algorithm:

Image showing how matrices are multiplied (see text)

More information about matrix multiply can be found here.

The following C program can be used to calculate a matrix multiplication between \(A\) and \(B\):

#include "stdio.h"
#include "stdlib.h"
#include "time.h"

int main(int argc, char *argv[])
{
    int sz_m=2,sz_k=10,sz_n=3;

    srand(time(0));

    // Accept input numbers for array sizes (m,k,n)
    if (argc > 1)
        sz_m = atoi(argv[1]);

    if (argc > 2)
        sz_k = atoi(argv[2]);

    if (argc > 3)
        sz_n = atoi(argv[3]);

    //Allocate space for matrix A
    double * A_vector = (double *) malloc(sz_m*sz_k*sizeof(double));
    for (int i=0; i<sz_m*sz_k; i++)
        A_vector[i] = rand()%10;
    double **A = malloc(sz_m * sizeof(double*));
    for (int r=0; r<sz_m; r++)
        A[r] = &A_vector[r*sz_k];
 
    //Print out Matrix A
    printf("A = \n");
    for (int i=0; i<sz_m;i++) {
        for (int j=0; j<sz_k;j++)
            printf("%f ",A[i][j]);
        printf("\n");
    }
    printf("\n");

    //Allocate space for matrix B    
    double * B_vector = (double *) malloc(sz_k*sz_n*sizeof(double));
    printf("\n");
    for (int i=0; i<sz_k*sz_n; i++)
        B_vector[i] = rand()%10;
    double **B = malloc(sz_k * sizeof(double*));
    for (int r=0; r<sz_k; r++)
        B[r] = &B_vector[r*sz_n];
    
    //Print out matrix B
    printf("B = \n");
    for (int i=0; i<sz_k;i++) {
        for (int j=0; j<sz_n;j++)
            printf("%f ",B[i][j]);
        printf("\n");
    }
    printf("\n");
    
    //Allocate space for matrix C
    double * C_vector = (double *) malloc(sz_m*sz_n*sizeof(double));
    for (int i=0; i<sz_m*sz_n; i++)
        C_vector[i] = 0;
    double **C = malloc(sz_m * sizeof(double*));
    for (int r=0; r<sz_m; r++)
        C[r] = &C_vector[r*sz_n];
    
    printf("multiplying matrices (%dx%d) (%dx%d)\n",sz_m,sz_k,sz_k,sz_n);
    for (int i=0;i<sz_m;i++){
        for(int j=0;j<sz_n;j++){
            C[i][j] = 0;
            for(int k=0;k<sz_k;k++){
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    
    //Print out matrix C
    printf("C = \n");
    for (int i=0; i<sz_m;i++) {
        for (int j=0; j<sz_n;j++)
            printf("%f ",C[i][j]);
        printf("\n");
    }
    printf("\n");
    
    free(A_vector);
    free(A);
    free(B_vector);
    free(B);
    free(C_vector);
    free(C);    
}

✅ Do This: Inspect the above code? Can you figure out what it is doing and why?

✅ Do This: Get the above code to compile and run on the HPCC. Check the answers to make sure they seem correct. Experiment with different values of \(m,n\) and \(k\) to see how time works. Note very large values may cause a segmentation fault (why?)?

✅ DO THIS: Modify the above code and compile it using OpenMP parallel for loops. Are you able to increase the speed of the code?

Congratulations, we’re done!#

Have one of the instructors check your notebook and sign you out before leaving class. Turn in your assignment using D2L.

Written by Dr. Dirk Colbry, Michigan State University (Updated by Dr. Nathan Haut in Spring 2025)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.