
Platform Engineer

Ncounter Technology Recruitment

Toronto, Canada


Job Description

Overview

Principal Platform Engineer, Reliability and Observability

Ncounter is hiring a senior Platform Engineer to own reliability and observability across a mission-critical trading platform. This is a deeply technical role focused on keeping complex, distributed systems stable, measurable, and predictable under real-time load. You will work directly on shared platform services that underpin trading and research workloads, where latency, partial failure, and blind spots in monitoring are not tolerated.

Observability is a core engineering concern here, not a bolt-on toolset. You will design and operate metrics, logging, tracing, and alerting pipelines that ingest high-volume telemetry, expose system behaviour under stress, and materially reduce operational risk. The role blends production engineering, platform tooling, automation, and reliability-led architecture, with direct ownership of systems running at scale.

Responsibilities

  • Owning reliability and observability for shared platform services in Linux and Kubernetes environments
  • Designing and operating high-throughput metrics, logging, and tracing pipelines for real-time systems
  • Hardening services against latency degradation, cascading failure, and outages using reliability engineering principles
  • Reducing operational toil through automation, GitOps workflows, and platform tooling
  • Improving on-call signal quality through alert design, runbooks, and post-incident learning
  • Partnering with engineers to bake observability and resilience into services by default

Core Technical Background

  • Strong experience in SRE, production engineering, or platform reliability with ownership of live systems
  • Deep Linux systems knowledge, including debugging and performance tuning
  • Software engineering skills in Python or Go, plus solid Git and CI/CD experience
  • Hands-on expertise with observability stacks covering metrics, logs, traces, and alerting
  • Experience running systems at scale, including high availability (HA), disaster recovery (DR), and incident response

Nice to Have

  • Infrastructure automation with Terraform or Ansible

This is a role for engineers who enjoy understanding how systems really behave under pressure and who want to own reliability as a first-class engineering problem. If you like solving hard platform problems where observability directly drives system correctness, this is worth a conversation.



