Lead Production Support Engineer
Spacetalk
- Sydney, NSW
- Permanent
- Full-time
- Lead investigation and resolution of complex production issues across our technology stack (mobile apps, web services, APIs, AWS infrastructure).
- Serve as the primary technical escalation point for Customer Support, translating customer reports into technical diagnoses and solutions.
- Monitor and respond to alerts from our monitoring systems, ensuring rapid identification and resolution of platform issues.
- Develop and maintain runbooks, troubleshooting guides, and escalation procedures for common and complex technical scenarios.
- Collaborate with the product engineering teams to implement permanent fixes for recurring issues whilst shielding them from day-to-day operational distractions.
- Maintain deep technical knowledge of our AWS serverless architecture (Lambda, DynamoDB, API Gateway, CloudWatch, S3, Redis/Valkey, etc.).
- Perform root cause analysis on production incidents using AWS monitoring and logging services (CloudWatch, X-Ray).
- Implement and optimise monitoring, alerting, and logging systems across our AWS infrastructure to proactively identify potential issues.
- Support deployment and release activities through GitLab CI/CD pipelines and AWS services, ensuring smooth transitions to production.
- Establish best practices and processes for technical support operations across multiple time zones.
- Design the framework for expanding technical support coverage to EU and US markets.
- Mentor and develop technical support capabilities within the Customer Support team.
- Create documentation and training materials to enable future team scaling.
- Work closely with Customer Support, Product, Engineering, and QA teams to ensure seamless issue resolution and customer communication.
- Partner with Data Science teams to identify patterns in platform issues and user behaviour.
- Collaborate with Regulatory and Security teams to ensure compliance during incident response (SOC 2, HIPAA, GDPR).
- 5+ years of experience in technical support, platform operations, or site reliability engineering, with hands-on production issue resolution.
- Strong experience with AWS services (Lambda, DynamoDB, API Gateway, CloudWatch, S3, OpenSearch, Redis/Valkey) and cloud-native troubleshooting.
- Proficiency in Node.js, TypeScript, and JavaScript for investigating and resolving application-level issues.
- Experience with mobile app ecosystems (iOS/Android) and understanding of mobile-specific technical challenges.
- Strong analytical and problem-solving skills with experience in root cause analysis and incident management.
- Excellent communication skills, with the ability to explain technical issues to both technical and non-technical stakeholders.
- Experience with monitoring and observability tools (CloudWatch, Datadog, or similar).
- Knowledge of React and React Native applications and common deployment patterns.
- Experience with CI/CD pipelines (GitLab preferred) and deployment troubleshooting.
- Knowledge of .NET code and Windows systems to investigate issues with our legacy offerings.
- Background in SaaS or consumer-facing applications with high availability requirements.
- Experience in regulated industries (healthcare, safety, security, or finance).
- Previous experience building or scaling technical support teams in global organisations.
- Impact: Be the technical guardian who ensures our AI-driven safety solutions remain reliable for families who depend on us for their loved ones' wellbeing.
- Global Reach: Build technical support capabilities for products used across Australia, Europe, and the US.
- Platform Ownership: Take ownership of our cutting-edge AWS serverless architecture and monitoring systems.
- Leadership Opportunity: Establish and grow a world-class technical support function from the ground up.
- Career Growth: Develop leadership skills in a high-growth, ASX-listed technology company with clear progression opportunities.