Azure HPC MAX: 12-Week PoC

OAKWOOD SYSTEMS GROUP INC

Oakwood’s 12-week Azure HPC MAX PoC provides a scalable, high-performance solution for complex tasks like AI/ML with advanced networking, storage, and job scheduling.

For organizations with complex, large-scale computational requirements, Oakwood’s Azure HPC MAX PoC delivers a highly scalable and sophisticated environment on Azure. Transitioning from on-premises infrastructure to Azure allows clients to overcome hardware limitations and gain access to advanced networking options like Infiniband and ExpressRoute, as well as high-throughput storage solutions like Weka.io or Hammerspace.

This setup, combined with Slurm’s advanced job scheduling, provides the computational power necessary for high-priority tasks, including simulations and AI/ML applications. Azure’s on-demand resources and built-in security enable organizations to handle peak workloads efficiently, improve cost-effectiveness, and focus on strategic HPC outcomes rather than infrastructure maintenance.

Delivery Approach: • Assessment: Conduct a thorough review of current HPC requirements to align Azure resources with target performance. • Design: Architect a large-scale environment with specialized storage, compute, and networking configurations. • Deployment: Establish CycleCloud and Slurm with advanced configurations to accommodate large workloads, while integrating tools such as Infiniband, Weka.io, and Hammerspace for high throughput. • Testing: Execute extensive performance testing across complex workloads. • Review and Report: Provide an in-depth report on performance metrics, cost analysis, and production recommendations.

Deliverables: • Comprehensive CycleCloud and Slurm setup tailored for high-performance workloads. • Deployment of Weka.io or Hammerspace for advanced storage. • High-speed connectivity through Infiniband or ExpressRoute. • Performance testing of large-scale HPC tasks, such as simulations and AI/ML applications.

Notes: • In addition to the 12 weeks, the Oakwood Team will devote an additional 2 weeks to work with the organization/user(s) to further optimize and manage the environment to ensure optimal performance. • The organization/user(s) will be limited by the number of jobs that can be ran during this PoC. • At this point a discussion will be had around moving forward with a full production environment or decommissioning the infrastructure that was stood up during the PoC.

Is Oakwood’s Azure HPC MAX PoC right for you? • Targeted Use: High-complexity, large-scale workloads, including AI/ML applications and simulations. • Complexity & Level of Effort: High complexity and intensive setup, featuring advanced Slurm job scheduling, high-speed networking with Infiniband or ExpressRoute, and high-throughput storage solutions like Weka.io or Hammerspace. • Scalability: Designed for maximum scalability, this PoC can handle peak computational demands and complex workloads efficiently.

Recommendation Guide: • If you need: A robust, highly scalable HPC environment capable of handling high-priority tasks and maximizing compute resources for large-scale applications. • Consider HPC PRO or CORE if: Your workload doesn’t require ultra-low latency networking or high-throughput storage and can be managed within a moderate or lightweight setup.

https://store-images.s-microsoft.com/image/apps.45871.243c39e0-1137-45fd-afed-b786288ba52e.beef34d2-88c8-4de3-9e12-ac53a26e8b82.159157f7-f1ac-4c35-aad3-d321d8d97a07
https://store-images.s-microsoft.com/image/apps.45871.243c39e0-1137-45fd-afed-b786288ba52e.beef34d2-88c8-4de3-9e12-ac53a26e8b82.159157f7-f1ac-4c35-aad3-d321d8d97a07
https://store-images.s-microsoft.com/image/apps.52346.243c39e0-1137-45fd-afed-b786288ba52e.beef34d2-88c8-4de3-9e12-ac53a26e8b82.988bb675-d10e-4698-ac76-3d08d5b8ecb5
https://store-images.s-microsoft.com/image/apps.50979.243c39e0-1137-45fd-afed-b786288ba52e.beef34d2-88c8-4de3-9e12-ac53a26e8b82.aad5679e-3b10-45e4-8479-b0d9b61d92e0