Roles & Responsibilities
Job Description:
- SLOs & error budgets - Define, track, and evangelize latency and availability targets for our payment APIs.
- Observability - Deploy Cloud Monitoring, Cloud Trace, Error Reporting, and dashboards; integrate alerts via Incident.io and Slack for on-call.
- Incident lifecycle - Establish blameless postmortems, guardrails, and runbooks to drive learning and prevent recurrence.
- CI/CD golden path - Codify Cloud Build pipelines and automated canary rollouts for Cloud Functions / Cloud Run.
- Infrastructure as Code - Manage GCP resources; embed security, IAM least-privilege, and cost controls by default.
- Performance & cost tuning - Profile hot paths (BigQuery, Firestore, Pub/Sub) and implement caching or concurrency improvements to keep user latency
- Developer tooling - Eliminate toil by improving local-to-prod parity, secrets management, and single-command environment setup.
- Culture carrier - Instill reliability thinking across engineering and product as the first platform-focused hire.
Requirements:
- 5+ years of experience building / operating production systems at scale, ideally on Google Cloud or a similar serverless stack, and preferably in fast-paced or startup settings.
- Hands-on fluency with Firebase, Cloud Build, Cloud Run / Functions, Pub/Sub, Cloud SQL / Spanner, and VPC Service Controls.
- Strong coding skills in Python or Go for automation, with an eye on maintainability.
- Demonstrated record of driving observability, on-call, and cost optimisation in a fast-moving environment.
- Excellent collaboration and communication skills to work effectively with cross-functional teams.
- Experience in payments, PCI-DSS, or crypto settlement flows is a bonus.
Tech note: we are 99% serverless. There are no pet VMs to patch, but the stakes are higher: every cold-start, DB connection pool, and retry policy can impact real money transfers. You’ll architect for resiliency and velocity.
Skills:
Scalability
Pipelines
Software Engineering
Reliability
Reliability Engineering
Python
Firebase
Platforms
Software Development
BigQuery