Bonsai’s mission is to enable checkout at every point of discovery. Our technology allows users to transact instantly while browsing content across websites and social media platforms. Our vision is one where anyone can sell anything anywhere on the internet using our technology. Our team has been scaling quickly this year to help realize that vision. As a result, Bonsai was recently listed as one of LinkedIn’s Top 5 Startups in Canada for 2021.

For publishers, we provide a powerful revenue alternative to ads that also improves the user experience. For merchants, Bonsai allows them to capture customers while they browse content, right when purchasing intent is being formed. We’ve been featured in the Wall Street Journal and Business Insider for leading the charge in a new wave of native commerce, but the opportunity is still largely untapped.

We are looking for a Senior Site Reliability Engineer to become an integral part of our Site Reliability team. The Senior SRE will be part of a small but impactful team within our wider engineering structure, focusing on the performance, reliability, and scalability of the Bonsai ecosystem.

What will I be doing?

- Owning Terraform-provisioned infrastructure on GCP and GKE services to ensure consistent deployments and stable services for all users.
- Designing, developing, and maintaining robust CI/CD tools and pipelines.
- Implementing monitoring, observability, and alerting tools, such as dashboards, logging, and tracing systems, so we can understand and quickly respond to anomalies in the health and availability of our infrastructure and applications.
- Handling incident response, diagnosis, and blameless postmortems for system outages and alerts so we can all learn from our mistakes and improve our services.
- Collaborating with and mentoring engineering teams to improve their development experience by creating tools, systems, and processes that help them scale and provide better on-call support for the services they own.
- Establishing best practices (on-call, tracing, security, reliability culture), documenting them, and running training sessions with developers to improve full service ownership.
- Participating in an on-call rotation to support our business-critical infrastructure.

What do I need?

- Experience with Kubernetes, Docker, Terraform, and Helm in large-scale production environments.
- A strong understanding of networking concepts.
- Multiple years of experience designing and implementing reliable infrastructure for a scaling technology company.
- Strong knowledge of monitoring and alerting tools.
- Experience with a cloud platform (GCP, AWS, or Azure) in production.
- Expertise with modern databases such as PostgreSQL and MongoDB.
- Experience building automation scripts using Bash, JavaScript, Python, or Go.
- Experience with secrets management.

Nice to have:

- Experience with JavaScript and/or TypeScript.
- Experience designing and implementing fast CI/CD pipelines.
- Experience defining and implementing SLIs and SLOs.