
Service Reliability Engineer
- Australia
- Permanent
- Full-time
- Be on the front line of reliability: Investigate and resolve production issues impacting Mambu customers, ensuring minimal disruption and fast recovery.
- Lead incident management: Collaborate with engineering and support teams to drive major incident resolution, post-mortems, and systemic improvements.
- Improve observability & detection: Design, maintain, and evolve monitoring, alerting, and logging to catch issues before customers do.
- Drive automation: Eliminate repetitive tasks through scripting and tooling, enabling faster, safer, and more consistent operations.
- Shape resilience across Mambu: Partner with product and infrastructure teams to embed reliability, capacity management, and operational excellence into how we build software.
- Cross-train and share knowledge: Strengthen team expertise across domains, continuously growing technical depth and operational maturity.
- Strong communication, problem-solving, and analytical skills, with the ability to balance customer impact and technical priorities.
- Solid understanding of cloud-native applications, distributed systems, and modern web technologies.
- Hands-on experience with public cloud services (AWS, GCP, or Azure).
- SQL knowledge (querying, troubleshooting, and performance tuning).
- Familiarity with the software delivery lifecycle, CI/CD practices, and DevOps culture.
- Proficiency in scripting/programming (Bash, Python, Go, or Java) and experience with observability stacks (Prometheus, Grafana, ELK, OpsGenie, Datadog, etc.) are a plus.
- Company equity for all
- Learning and development opportunities
- Hybrid/Remote working (location dependent)
- 30 days working abroad
- 4-week paid sabbatical after 5 years of service
- Additional benefits based on location