
Senior Site Reliability Engineer - CPTO (P680)
- Melbourne, VIC
- Permanent
- Full-time
- Designing and implementing for scale & resilience: Architect, implement and continuously improve our existing Data Center and Cloud environments on AWS, Azure, and VMware, ensuring they meet our SLAs and adapt dynamically to demand, while working alongside the Platform teams that provide PaaS/IaaS.
- Driving automation: Build and evolve infrastructure as code (Terraform, etc.) and CI/CD pipelines (GitHub Actions, etc.) to ship new features safely and at speed.
- Defining and measuring reliability: Partner with teams to set up meaningful SLIs/SLOs, implement real-time observability (Datadog, Prometheus, Grafana, ...) and proactively identify risks before they impact our users.
- Leading incident response: Own the on-call rota, coach teams through blameless post-mortems, and embed a culture of continuous improvement so outages become learning opportunities.
- Mentoring & evangelism: Share your deep expertise by pairing with engineers, running brown-bag sessions on reliability best practices, and helping raise the bar across our global engineering organisation.
- Securing our stack: Collaborate with our Security team to embed security controls into CI/CD, runtime environments and disaster-recovery plans, so that our customers and citizens are always protected.
- Demonstrable experience in a production SRE, DevOps or infrastructure role, ideally within a SaaS or large-scale web environment
- Expert in at least one public cloud (AWS, Azure, or GCP) and comfortable designing hybrid migrations from on-prem to cloud
- Proven track record with IaC tools (Terraform, CloudFormation, or similar) and container orchestration (Kubernetes, ECS, AKS, OpenShift)
- Proven track record with virtual machine orchestration / provisioning and resiliency strategies (KubeVirt, Packer, Ansible, ...)
- Strong coding/scripting skills (Python, Go, Bash, etc.) and a passion for building reusable, tested libraries and tooling
- Deep understanding of monitoring, logging, and tracing frameworks (Prometheus/Grafana, ELK/OpenSearch, Jaeger, etc.)
- Excellent communicator who thrives in cross-functional teams, with a passion for translating complex technical issues into clear, actionable plans