
Lead Site Reliability Engineer (Technical Duty Officer)
- Australia
- Permanent
- Full-time
- Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero.
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Promote a customer-focused approach by addressing and mitigating global customer environment issues, and foster a culture of continuous learning and technical excellence within the SRE team.
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
- Previous experience as a Site Reliability Engineer in an Operations or Engineering environment
- Strong hands-on coding experience (preferably Python) and knowledge of software engineering best practice
- Hands-on experience troubleshooting AWS-hosted services
- Networking knowledge, including the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
- Strong oral and written communication skills, including the ability to translate technical issues and concepts into agreed actions