As a leading provider of storage solutions, Western Digital relies more and more on software and software development workflows. As Secure Development Factory (SDF) Senior Site Reliability Engineer, you will be at the heart of Western Digital’s engineering process, delivering the software development tools and infrastructure that empower engineering teams to develop and deliver high-quality products quickly. You will play a pivotal role in ensuring the reliability, scalability, and performance of our IT infrastructure and DevOps tools. You will lead by example and collaborate closely with Engineering teams to align our efforts with customer requirements. Your technical expertise, adaptability, and commitment to excellence will drive our success and empower our stakeholders to develop and deliver high-quality products faster, reducing time to market without sacrificing security, development velocity, stability, code quality, or code health.
The ideal candidate will have a passion for technology, a relentless focus on the customer experience and an ability to multitask, assimilate data, make decisions and prioritize complex work while paying attention to the details. Communication with internal customers, vendors and co-workers in a clear and professional manner is an absolute must. This position is open to candidates located in Bangalore, India.
Key Responsibilities
- Technical Leadership: Provide technical leadership to the DevOps infrastructure team, fostering a collaborative, positive and growth-oriented team environment.
- Architecting and Designing: Play a key role in shaping the architecture and design of systems and applications, aligning them with reliability and scalability objectives.
- Accountability: Demonstrate ownership of system reliability, meet Service Level Objectives (SLOs), and ensure high levels of customer satisfaction.
- Collaboration: Engage closely with Engineering teams to comprehend customer requirements and jointly devise innovative solutions.
- Observability and Monitoring: Architect, implement, and iteratively enhance monitoring and observability solutions to achieve effective and real-time visibility into system performance.
- Best Practices Champion: Advocate for and implement best practices in Site Reliability Engineering (SRE), DevOps, and Automation, prioritizing the enhancement of platform stability and performance.
- Automation Leadership: Drive automation initiatives to streamline processes, minimize manual tasks, and enhance overall operational efficiency.
- Adaptability: Remain current with emerging technologies and demonstrate the ability to quickly adapt to evolving requirements and challenges.
- Upskilling: Proactively engage in continuous learning of emerging technologies and contribute to knowledge sharing within the team.
- Team Player: Foster effective collaboration with team members and actively contribute to cultivating a positive team culture.
- Professional Behaviour: Demonstrate professionalism, integrity, and a commitment to the highest ethical standards.
- Documentation: Ensure meticulous and well-organized documentation for systems and processes is consistently maintained.
Required Skills and Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- Candidates MUST HAVE 9 to 12 years of hands-on experience in DevOps tools and SRE practices.
- Exceptional analytical, problem-solving, and troubleshooting skills, with the ability to think strategically about complex process and technology issues.
- Positive attitude, strong work ethic, and a team player mentality.
- Extensive experience in Ansible automation (researching, writing, maintaining, and optimizing roles, playbooks, and modules).
- Proficiency in containerization technologies such as Docker and Kubernetes.
- Expertise in shell scripting and Python, and in infrastructure automation tools such as Terraform.
- MUST HAVE administration experience with DevOps tools such as Artifactory, Jenkins, Zuul, Git/Gerrit, WANdisco SVN, Black Duck, CodeScene, Spinnaker, and SAST/DAST tools.
- Development and customisation of CI/CD pipelines, and onboarding of applications with varying requirements.
- Extensive experience in monitoring enhancements and metrics dashboarding using tools such as Icinga, Splunk, Prometheus, and Grafana.
- Excellent communication and collaboration skills.
- Automation-first mindset.
- Focus on embedding a strong security posture into systems.
- Working experience with HAProxy, load balancers, LDAP/SSO integration, and security endpoint configurations.
- Extensive subject-matter-expert experience in building highly available, scalable, secure, and stable solutions for DevOps tools and services.
- Strong documentation skills and the ability to work at a fast pace amid rapid change.
- Very good understanding of infrastructure at the server, VMware, storage, and networking layers.
- Knowledge of cloud computing platforms (e.g., AWS, Azure, GCP) is a plus.
- Solid experience in supporting a large number of users and developers by providing platform as a service.
- Proven ability to influence and/or lead high-performing, geographically dispersed teams.