Job description
Position - Sr. TechOps Engineer
Job location - Bangalore and Hyderabad
Job Summary:
We are seeking a skilled professional to join our team as a TechOps Lead / Sr. TechOps Engineer. The ideal candidate will have a strong background in Linux system administration, expertise in managing AWS/Oracle cloud infrastructure, and knowledge of DevOps principles and practices.
Responsibilities:
Linux Administration:
- Install, configure, and maintain Linux operating systems and applications.
- Perform system monitoring, troubleshooting, and performance tuning.
- Manage user accounts, permissions, and file systems.
- Implement and maintain security measures, including access controls, firewalls, and intrusion detection systems.
- Collaborate with development teams to ensure proper integration of software applications with the Linux infrastructure.
- Automate routine tasks using scripting languages such as Bash or Python.
- Stay up-to-date with Linux technologies, security vulnerabilities, and best practices.
AWS Administration:
- Design, deploy, and manage AWS cloud infrastructure, including EC2 instances, VPCs, S3 buckets, and IAM policies.
- Monitor and optimize AWS resources for performance, cost, and security.
- Implement backup and disaster recovery strategies for AWS environments.
- Configure and manage AWS services, such as RDS databases, Elastic Load Balancers, and Auto Scaling.
- Collaborate with development teams to ensure seamless deployment and scalability of applications on AWS.
- Implement security controls and best practices for AWS environments.
- Monitor and respond to AWS service incidents and outages.
- Stay up-to-date with AWS services, features, and best practices.
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field. Relevant certifications (e.g., Linux Professional Institute Certification, AWS Certified SysOps Administrator, AWS Certified DevOps Engineer, DevOps Institute certifications) will be a plus.
- 5+ years of experience in a technical or network operations support environment.
- Proven experience as a Linux Administrator, AWS Administrator, or DevOps engineer.
- Strong knowledge of Linux operating systems (Red Hat, CentOS, Ubuntu) and command-line tools.
- Proficiency in scripting languages such as Bash or Python.
- Experience in managing AWS cloud infrastructure, including EC2, VPC, S3, IAM, and RDS.
- Familiarity with DevOps principles and practices, including CI/CD, infrastructure as code, and configuration management.
- Hands-on experience with DevOps tools, such as Jenkins, GitLab CI/CD, Ansible, Chef, Puppet, Terraform, or CloudFormation.
- Solid understanding of networking, security, and system administration best practices.
- Strong analytical and problem-solving skills.
- Experience in the Change Management and Problem Management domains is a plus.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Ability to work independently and manage multiple priorities in a fast-paced environment.
Principal Responsibilities:
- Confirms and troubleshoots all alerts from remote monitoring tools and works on all issues related to cloud environments, network infrastructure, hardware, and/or applications.
- Confirms and acknowledges all alerts received from the internal monitoring tools and follows the run book to perform the required troubleshooting steps.
- Based on the run book, works on the alert and resolves it, or updates the findings in the ticket and routes it to the respective team for further analysis and resolution.
- Executes operational objectives and meets or exceeds service level expectations by following defined resolution and escalation procedures to resolve alerts.
- Verifies the run book for accuracy and ensures that the documentation is easily understood and can be followed for execution and/or escalation.
- Prioritizes and escalates any unresolved issues to the appropriate on-call staff so that tickets can be closed in a timely manner, and reports any violations.
- Maintains the SLA for alerts and tickets.
- Suggests improvements to thresholds and other parameters based on alert trends.
- Accountable for the accuracy and timeliness of alert handling based on prioritisation.
- Aggressively follows up on ticket resolution and information updates so that tickets can be closed in a timely fashion.
Danish Ahmed
Talent Acquisition