Job opening at Bengaluru
Bengaluru
Full Time
Any Graduate
2500000 to 3500000
26 Feb 2026
Meezan
meezanwhiteforce@gmail.com
9303447007
Job Title: Lead Solutions Architect – AI Infrastructure & Private Cloud

Job Description:
We are seeking an experienced Lead Solutions Architect with deep expertise in AI/ML infrastructure, High Performance Computing (HPC), and container platforms to join our team delivering HPE Private Cloud AI and Enterprise AI Factory solutions. This role is instrumental in architecting, deploying, and optimizing private cloud environments that use HPE's co-developed solutions with NVIDIA, as well as validated HPE reference architectures, to support enterprise-grade AI workloads at scale. The ideal candidate will bring strong technical expertise in AI infrastructure, container orchestration platforms, and hybrid cloud environments, and will play a key role in delivering scalable, secure, and high-performance AI platform solutions powered by HPE GreenLake and NVIDIA AI Enterprise technologies.

Key Responsibilities:

1. Leadership and Strategy: Provide delivery assurance and serve as the lead design authority to ensure seamless execution of enterprise-grade container platforms (including Red Hat OpenShift and SUSE Rancher), HPE Private Cloud AI, and HPC/AI solutions, fully aligned with customer AI/ML strategies and business objectives. Align solution architecture with NVIDIA Enterprise AI Factory design principles, including modular scalability, GPU optimization, and hybrid cloud orchestration. Oversee planning, risk management, and stakeholder alignment throughout the project lifecycle to ensure successful outcomes.

2. Solution Planning and Design: Architect and optimize end-to-end solutions across container orchestration and HPC workload management domains, using platforms such as Red Hat OpenShift and SUSE Rancher and/or workload schedulers such as Slurm and Altair PBS Pro. Ensure seamless integration of container and AI platforms with the broader software ecosystem, including NVIDIA AI Enterprise as well as open-source DevOps and AI/ML tools and frameworks.

3. Opportunity Assessment: Lead technical responses to RFPs, RFIs, and customer inquiries, ensuring alignment with business and technical requirements. Conduct proof-of-concept (PoC) engagements to validate solution feasibility, performance, and integration within customer environments. Assess customer infrastructure and workloads to recommend optimal configurations using validated reference architectures from HPE and strategic partners such as Red Hat, NVIDIA, and SUSE, along with components from the open-source ecosystem.

4. Innovation and Research: Stay current with emerging technologies, industry trends, and best practices across HPC, Kubernetes, container platforms, hybrid cloud, and security to inform solution design and innovation.

5. Customer-Centric Mindset: Act as a trusted advisor to enterprise customers, ensuring alignment of AI solutions with business goals. Translate complex technical concepts into value propositions for stakeholders.

6. Team Collaboration: Collaborate with cross-functional teams, including subject matter experts in infrastructure components (such as HPE servers, storage, and networking) and data science teams, to ensure cohesive and integrated solution delivery. Mentor technical consultants and contribute to internal knowledge sharing through tech talks and innovation forums.

Required Skills:

1. HPC & AI Infrastructure: Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro. Proficiency in HPC cluster management tools, including HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager. Solid grasp of high-speed networking technologies such as InfiniBand (Mellanox) and Ethernet, and of performance tuning of HPC components.

2. Containerization & Orchestration: Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity. Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), Canonical Charmed Kubernetes. Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.

3. Operating Systems & Virtualization: Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration. Proficiency with multiple Linux distributions, with hands-on expertise in at least two of the following: RHEL, SLES, and Ubuntu. Experience with virtualization technologies, including KVM and OpenShift Virtualization, for deploying and managing virtualized workloads in hybrid cloud environments.

4. Cloud, DevOps & MLOps: Solid understanding of hybrid cloud architectures and experience working with major cloud platforms in conjunction with on-premises infrastructure. Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code (IaC), and microservices-based application delivery. Experience integrating and operationalizing open-source AI/ML tools and frameworks, supporting the full model lifecycle from development to deployment. Good understanding of cloud-native security, observability, and compliance frameworks, ensuring secure and reliable AI/ML operations at scale.

5. Networking & Protocols: Strong understanding of core networking principles, including DNS, TCP/IP, routing, and load balancing, essential for designing resilient and scalable infrastructure. Working knowledge of key network protocols, such as S3, NFS, and SMB/CIFS, for data access, transfer, and integration across hybrid environments.

6. Programming & Automation: Proficiency in scripting or programming languages such as Python and Bash. Experience automating infrastructure and AI workflows.

7. Soft Skills & Leadership: Excellent problem-solving, analytical thinking, and communication skills for engaging both technical and non-technical stakeholders. Proven ability to lead complex technical projects from requirements gathering through architecture, design, and delivery. Strong business acumen with the ability to align technical solutions with client challenges and objectives.

Qualifications:
Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
Professional certifications in AI infrastructure, containers, and Kubernetes are highly desirable, such as RHCSA, RHCE, CNCF certifications (CKA, CKAD, CKS), and NVIDIA-Certified Associate: AI Infrastructure and Operations.
Typically 8–10 years of hands-on experience architecting and implementing HPC, AI/ML, and container platform solutions within hybrid or private cloud environments, with a strong focus on scalability, performance, and enterprise integration.
If you are interested, please call 9303447007.