Azure HPC MAX: 12-Week PoC
Oakwood’s 12-week Azure HPC MAX PoC provides a scalable, high-performance environment for complex workloads such as AI/ML, with advanced networking, storage, and job scheduling.
For organizations with complex, large-scale computational requirements, Oakwood’s Azure HPC MAX PoC delivers a highly scalable and sophisticated environment on Azure. Transitioning from on-premises infrastructure to Azure allows clients to overcome hardware limitations and gain access to advanced networking options like InfiniBand and ExpressRoute, as well as high-throughput storage solutions like Weka.io or Hammerspace.
This setup, combined with Slurm’s advanced job scheduling, provides the computational power necessary for high-priority tasks, including simulations and AI/ML applications. Azure’s on-demand resources and built-in security enable organizations to handle peak workloads efficiently, improve cost-effectiveness, and focus on strategic HPC outcomes rather than infrastructure maintenance.
Delivery Approach:
• Assessment: Conduct a thorough review of current HPC requirements to align Azure resources with target performance.
• Design: Architect a large-scale environment with specialized storage, compute, and networking configurations.
• Deployment: Establish CycleCloud and Slurm with advanced configurations to accommodate large workloads, while integrating technologies such as InfiniBand, Weka.io, and Hammerspace for high throughput.
• Testing: Execute extensive performance testing across complex workloads.
• Review and Report: Provide an in-depth report on performance metrics, cost analysis, and production recommendations.
Deliverables:
• Comprehensive CycleCloud and Slurm setup tailored for high-performance workloads.
• Deployment of Weka.io or Hammerspace for advanced storage.
• High-speed connectivity through InfiniBand or ExpressRoute.
• Performance testing of large-scale HPC tasks, such as simulations and AI/ML applications.
Notes:
• Beyond the 12 weeks, the Oakwood team will devote 2 additional weeks to working with the organization/user(s) to further tune and manage the environment for optimal performance.
• The organization/user(s) will be limited in the number of jobs that can be run during this PoC.
• At the end of the PoC, Oakwood and the organization will discuss whether to move forward with a full production environment or decommission the infrastructure that was stood up during the PoC.
Is Oakwood’s Azure HPC MAX PoC right for you?
• Targeted Use: High-complexity, large-scale workloads, including AI/ML applications and simulations.
• Complexity & Level of Effort: High complexity and intensive setup, featuring advanced Slurm job scheduling, high-speed networking with InfiniBand or ExpressRoute, and high-throughput storage solutions like Weka.io or Hammerspace.
• Scalability: Designed for maximum scalability, this PoC can handle peak computational demands and complex workloads efficiently.
Recommendation Guide:
• If you need: A robust, highly scalable HPC environment capable of handling high-priority tasks and maximizing compute resources for large-scale applications.
• Consider HPC PRO or CORE if: Your workload doesn’t require ultra-low latency networking or high-throughput storage and can be managed within a moderate or lightweight setup.