Objective

  1. To learn how to deploy monitoring services such as Prometheus and Grafana for real-time performance monitoring.
  2. To set up Jupyter notebook environments for interactive computing and data analysis on multi-node and GPU systems.
  3. To integrate additional services, such as logging and alerting systems, for better management of the systems.
  4. To ensure that the deployed services are highly available and scalable.
  5. To assess the performance of the deployed services with the help of monitoring and analytics tools.