Senior Software Engineer, ML Platform & Ops

World
To reach our next milestone—continuous, trustworthy ML innovation across millions of edge devices—we’re hiring a Senior ML Platform & Ops Engineer to own the ML lifecycle from data to device. You’ll design and operate production-grade pipelines that transform state-of-the-art ML research into deployed models with clear telemetry, rollback, and reproducibility. If you thrive on building self‑service platforms that turn research ideas into reliable, observable production systems, we’d love to meet you.

Overview

Department

Engineering

Job type

Full time

Compensation

€138,000 - €160,000 per year

Location

Munich, Germany, Western Europe

About Tools for Humanity

World is a network of real humans, built on privacy-preserving proof-of-human technology, and powered by a globally inclusive financial network that enables the free flow of digital assets for all. It is built to connect, empower, and be owned by everyone. This opportunity would be with Tools for Humanity.

About the AI & Biometrics Team

The AI & Biometrics team is building a biometric recognition system that can work reliably with more than a billion users and enables them to claim their free share of WLD. We use cutting-edge machine learning models deployed on custom hardware to enable high-quality image acquisition, identification, and fraud prevention, all while requiring minimal user interaction.

We are building a biometric recognition and fraud detection engine that works at the scale of one billion people, so its performance needs to exceed that of all current recognition technologies. We combine the Orb, our powerful custom-made iris recognition and presentation attack detection device, with the latest research in AI and deep learning.

What You'll Own & Drive

  • End‑to‑end model lifecycle pipelines that take a model from pull‑request to device in minutes
  • Edge‑aware rollout services with staged deployment, A/B experimentation and instant rollback across Orbs, Orb Mini and Mobile Apps
  • A model and dataset registry with lineage, reproducibility, and diffing across versions
  • Data provenance and governance tooling that lets researchers trace every model back to the exact data slice, feature set and config used
  • Developer experience tooling (CLI, SDKs, notebooks, dashboards) that lets ML engineers self-serve training jobs and schedule retraining, evaluations, and labelling campaigns
  • Future platform roadmap and culture: define ML platform best practices and standards, mentor engineers, and help grow the ML Platform discipline at TFH

Day-to-Day Responsibilities

  • Design, build, and operate reliable, observable infrastructure for training, evaluation, telemetry ingestion, and deployment
  • Maintain CI/CD workflows and OTA pipelines for secure rollout of models to cloud and edge devices
  • Develop secure APIs and backend services that expose governed datasets and model artefacts at scale
  • Implement automated checks, drift detection, and alerting for real-time model monitoring
  • Champion best practices in data lineage, reproducibility, privacy‑by‑design, security and secure edge delivery
  • Collaborate across ML research, product, and firmware teams to streamline delivery and feedback loops

Requirements

  • 5+ years building ML infrastructure, data platforms, or production ML systems at scale
  • Track record of delivering platforms and CI/CD pipelines used daily by ML or data teams
  • Hands-on experience running large-scale training on multi-tenant GPU clusters to maximize throughput and reliability
  • You’ve built versioned dataset & lineage systems with slice-level provenance and governed access, making every model reproducible to the exact data, features, code, and config used
  • Deep understanding of containerisation (Docker) and orchestration (Kubernetes/EKS) plus Infrastructure‑as‑Code (Terraform/CDK/CloudFormation)
  • Strong backend engineering skills in Python and/or Go; you value clean, maintainable code
  • Deep understanding of modern CI/CD, model packaging, and observability practices
  • Comfortable operating production systems, defining SLAs, and handling rollout or incident workflows

Nice to Have

  • Knowledge of Rust for high‑performance services
  • Understanding of OTA (over-the-air) workflows, hardware/software integration, or embedded systems