To complete this assignment successfully, you must do the required reading, watch the provided videos, and complete all instructions. The embedded survey form must be filled out completely and submitted on or before 11:59pm on Sunday, March 14. Students must come to class the next day prepared to discuss the material covered in this assignment.
This pre-class assignment will review a few of the more common alternatives to CUDA.
OpenCL is an alternative to CUDA which is designed to be more open and available on many different platforms. Watch the following video to get some history behind OpenCL.
from IPython.display import YouTubeVideo
YouTubeVideo("V4RfPfHQPC8",width=640,height=360)
✅ DO THIS: Copy the following code to the HPC and compile/run using the commands provided below.
Example From: https://gist.github.com/ddemidov/2925717
%%writefile NCode/vecAdd_opencl.c
#include <iostream>
#include <vector>
#include <string>
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <cstring> // strlen
//Example From: https://gist.github.com/ddemidov/2925717
// Compute c = a + b.
static const char source[] =
"#if defined(cl_khr_fp64)\n"
"# pragma OPENCL EXTENSION cl_khr_fp64: enable\n"
"#elif defined(cl_amd_fp64)\n"
"# pragma OPENCL EXTENSION cl_amd_fp64: enable\n"
"#else\n"
"# error double precision is not supported\n"
"#endif\n"
"kernel void add(\n"
" ulong n,\n"
" global const double *a,\n"
" global const double *b,\n"
" global double *c\n"
" )\n"
"{\n"
" size_t i = get_global_id(0);\n"
" if (i < n) {\n"
" c[i] = a[i] + b[i];\n"
" }\n"
"}\n";
int main() {
const size_t N = 1 << 20;
try {
// Get list of OpenCL platforms.
std::vector<cl::Platform> platform;
cl::Platform::get(&platform);
if (platform.empty()) {
std::cerr << "OpenCL platforms not found." << std::endl;
return 1;
}
// Get first available GPU device which supports double precision.
cl::Context context;
std::vector<cl::Device> device;
for(auto p = platform.begin(); device.empty() && p != platform.end(); p++) {
std::vector<cl::Device> pldev;
try {
p->getDevices(CL_DEVICE_TYPE_GPU, &pldev);
for(auto d = pldev.begin(); device.empty() && d != pldev.end(); d++) {
if (!d->getInfo<CL_DEVICE_AVAILABLE>()) continue;
std::string ext = d->getInfo<CL_DEVICE_EXTENSIONS>();
if (
ext.find("cl_khr_fp64") == std::string::npos &&
ext.find("cl_amd_fp64") == std::string::npos
) continue;
device.push_back(*d);
context = cl::Context(device);
}
} catch(...) {
device.clear();
}
}
if (device.empty()) {
std::cerr << "GPUs with double precision not found." << std::endl;
return 1;
}
std::cout << device[0].getInfo<CL_DEVICE_NAME>() << std::endl;
// Create command queue.
cl::CommandQueue queue(context, device[0]);
// Compile OpenCL program for found device.
cl::Program program(context, cl::Program::Sources(
1, std::make_pair(source, strlen(source))
));
try {
program.build(device);
} catch (const cl::Error&) {
std::cerr
<< "OpenCL compilation error" << std::endl
<< program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(device[0])
<< std::endl;
return 1;
}
cl::Kernel add(program, "add");
// Prepare input data.
std::vector<double> a(N, 1);
std::vector<double> b(N, 2);
std::vector<double> c(N);
// Allocate device buffers and transfer input data to device.
cl::Buffer A(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
a.size() * sizeof(double), a.data());
cl::Buffer B(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
b.size() * sizeof(double), b.data());
cl::Buffer C(context, CL_MEM_READ_WRITE,
c.size() * sizeof(double));
// Set kernel parameters.
add.setArg(0, static_cast<cl_ulong>(N));
add.setArg(1, A);
add.setArg(2, B);
add.setArg(3, C);
// Launch kernel on the compute device.
queue.enqueueNDRangeKernel(add, cl::NullRange, N, cl::NullRange);
// Get result back to host.
queue.enqueueReadBuffer(C, CL_TRUE, 0, c.size() * sizeof(double), c.data());
// Should get '3' here.
std::cout << c[42] << std::endl;
} catch (const cl::Error &err) {
std::cerr
<< "OpenCL error: "
<< err.what() << "(" << err.err() << ")"
<< std::endl;
return 1;
}
}
!g++ -std=c++0x -o opencl NCode/vecAdd_opencl.c -lOpenCL
NCode/vecAdd_opencl.c:7:10: fatal error: CL/cl.hpp: No such file or directory
 #include <CL/cl.hpp>
          ^~~~~~~~~~~
compilation terminated.
!time ./opencl
./opencl: Command not found.
0.000u 0.000s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
✅ QUESTION: Were you able to get the OpenCL code to compile and run?
Put your answer to the above question here.
✅ QUESTION: If not, what problems did you encounter?
Put your answer to the above question here.
The next programming extension is called OpenACC and tries to combine the ease of programming in OpenMP with the power of the GPU. It uses pragmas similar to OpenMP to compile and run code on the GPU.
✅ DO THIS: Copy the following code to the HPC and compile/run using the commands provided below.
Example From: https://www.olcf.ornl.gov/tutorials/openacc-vector-addition/
%%writefile vecAdd_OpenACC.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
//Example From: https://www.olcf.ornl.gov/tutorials/openacc-vector-addition/
int main( int argc, char* argv[] )
{
// Size of vectors
int n = 10000;
// Input vectors
double *restrict a;
double *restrict b;
// Output vector
double *restrict c;
// Size, in bytes, of each vector
size_t bytes = n*sizeof(double);
// Allocate memory for each vector
a = (double*)malloc(bytes);
b = (double*)malloc(bytes);
c = (double*)malloc(bytes);
// Initialize content of input vectors, vector a[i] = sin(i)^2 vector b[i] = cos(i)^2
int i;
for(i=0; i<n; i++) {
a[i] = sin(i)*sin(i);
b[i] = cos(i)*cos(i);
}
// sum component wise and save result into vector c
#pragma acc kernels copyin(a[0:n],b[0:n]), copyout(c[0:n])
for(i=0; i<n; i++) {
c[i] = a[i] + b[i];
}
// Sum up vector c and print result divided by n, this should equal 1 within error
double sum = 0.0;
for(i=0; i<n; i++) {
sum += c[i];
}
sum = sum/n;
printf("final result: %f\n", sum);
// Release memory
free(a);
free(b);
free(c);
return 0;
}
!module swap GNU PGI
ERROR: Unable to locate a modulefile for 'GNU'
!pgcc -acc -o openacc vecAdd_OpenACC.c
pgcc: Command not found.
!time ./openacc
./openacc: Command not found.
0.000u 0.000s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
✅ QUESTION: Were you able to get the OpenACC code to compile and run?
Put your answer to the above question here.
✅ QUESTION: If not, what problems did you encounter?
Put your answer to the above question here.
Two newcomers to the playing field are Kokkos and RAJA. Neither is a new language; each is a C++ library intended to make it easier to write code that runs with both OpenMP and GPU back ends. The goal of both projects is to:
These two goals are very difficult to achieve. However, both projects seem to be getting close. It will be interesting to see which one "wins".
For more information on how to run Kokkos on the HPC, follow the instructions found here: https://www.egr.msu.edu/nextgen/wiki/index.php/Kokkos
I have not tried getting RAJA to work yet, but here is some information on the project: https://media.readthedocs.org/pdf/raja/master/raja.pdf
✅ QUESTION: Why do you think there are so many alternatives to CUDA (provide at least two reasons)?
Put your answer to the above question here
Another way to avoid having to write your own CUDA code is to use CUDA-accelerated libraries. As a programmer, you don't need to do anything except include the libraries and call the CUDA-enabled functions.
✅ DO THIS: Using your favorite search engine, find some common CUDA-enabled libraries for Fast Fourier Transforms, Dense Linear Algebra, and Sparse Linear Algebra.
Put your answer to the above question here.
✅ DO THIS: See if you can find other CUDA-enabled libraries that you think could be useful. Come to class prepared to share what you found.
Put your answer to the above question here.
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!
If you have trouble with the embedded form, please make sure you log on with your MSU Google account at googleapps.msu.edu and then click on the direct link above.
✅ Assignment-Specific QUESTION: Why do you think there are so many alternatives to CUDA (provide at least two reasons)?
Put your answer to the above question here
✅ QUESTION: Summarize what you did in this assignment.
Put your answer to the above question here
✅ QUESTION: What questions do you have, if any, about any of the topics discussed in this assignment after working through the Jupyter notebook?
Put your answer to the above question here
✅ QUESTION: How well do you feel this assignment helped you to achieve a better understanding of the above mentioned topic(s)?
Put your answer to the above question here
✅ QUESTION: What was the most challenging part of this assignment for you?
Put your answer to the above question here
✅ QUESTION: What was the least challenging part of this assignment for you?
Put your answer to the above question here
✅ QUESTION: What kind of additional questions or support, if any, do you feel you need to have a better understanding of the content in this assignment?
Put your answer to the above question here
✅ QUESTION: Do you have any further questions or comments about this material, or anything else that's going on in class?
Put your answer to the above question here
✅ QUESTION: Approximately how long did this pre-class assignment take?
Put your answer to the above question here
from IPython.display import HTML
HTML(
"""
<iframe
src="https://cmse.msu.edu/cmse401-pc-survey"
width="100%"
height="500px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
To get credit for this assignment you must fill out and submit the above survey form on or before the assignment due date.
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.