Procedure
- Set Up CUDA and CUDA-X Libraries
- Ensure you have a compatible NVIDIA GPU installed.
- Install the CUDA Toolkit and the necessary CUDA-X libraries: cuBLAS, cuDNN, and cuTensor.
- Follow the installation instructions provided by NVIDIA.
- Verify the installation:
```shell
nvcc --version
cat /usr/local/cuda/version.txt   # present on older toolkits; newer releases omit this file
```
- Test cuBLAS for Matrix Multiplication
- Use cuBLAS to perform matrix multiplication:
```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <iostream>

int main() {
    cublasHandle_t handle;
    cublasCreate(&handle);

    const int N = 1024;
    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, N * N * sizeof(float));
    cudaMalloc(&d_B, N * N * sizeof(float));
    cudaMalloc(&d_C, N * N * sizeof(float));
    // For a meaningful result, copy real data into d_A and d_B (e.g. with
    // cudaMemcpy) first; this skeleton only exercises the cuBLAS call itself.

    const float alpha = 1.0f;
    const float beta = 0.0f;
    // cuBLAS assumes column-major storage; this computes C = alpha*A*B + beta*C.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                &alpha, d_A, N, d_B, N, &beta, d_C, N);

    cublasDestroy(handle);
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return 0;
}
```
- Compile and run the program:
```shell
nvcc -o matrix_multiplication matrix_multiplication.cu -lcublas
./matrix_multiplication
```
- Test cuDNN for Deep Learning Operations
- Use cuDNN for a convolution operation:
```cpp
#include <cudnn.h>
#include <iostream>

// Compile with: nvcc conv_test.cu -lcudnn
int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnConvolutionDescriptor_t conv_desc;
    cudnnCreateConvolutionDescriptor(&conv_desc);
    // Set up tensor and filter descriptors, then call
    // cudnnConvolutionForward here (details omitted).

    cudnnDestroyConvolutionDescriptor(conv_desc);
    cudnnDestroy(handle);
    return 0;
}
```
- Test cuTensor for Tensor Operations
- Use cuTensor to perform tensor operations:
```cpp
#include <cutensor.h>

int main() {
    cutensorHandle_t handle;
    cutensorCreate(&handle);  // cuTENSOR 2.x API
    // Describe tensors and perform a contraction here (details omitted).
    cutensorDestroy(handle);
    return 0;
}
```
- Benchmarking Performance
- Use Nsight Systems (`nsys`) or the legacy `nvprof` to profile cuBLAS, cuDNN, and cuTensor operations.
- Measure performance and compare GPU speed against the CPU.
- Record results and analyze performance across libraries.
- Analyze Results
- Evaluate GPU utilization, execution time, and memory consumption.
- Identify bottlenecks and opportunities for optimization.
- Choose the most efficient library for your workload.