Principal Platform Engineer

Helius
North America
Full time
Remote

Overview

Department

Engineering

Job type

Full time

Compensation

Salary not specified

Location

North America

Company size

Scale-up (11-50 employees)

Lead the Platform Engineering team, building internal developer platforms and scaling solutions across thousands of bare-metal servers. Build service frameworks to manage globally distributed services, ensuring high reliability and uptime.

Requirements

  • A minimum of 8 years of experience in a DevOps or Site Reliability Engineering role, preferably in a high-performance, low-latency environment.
  • Experience managing and optimizing bare-metal server environments.
  • Expert scripting and programming skills (e.g., Bash, Python, Go).
  • Experience with Rust, Go, Java, or a similar language.
  • Proficiency with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Strong knowledge of automation tools and frameworks (e.g., Ansible, Terraform, Puppet, Chef).
  • Expertise in CI/CD tools and practices (e.g., Jenkins, GitLab CI, CircleCI).
  • Excellent problem-solving skills and the ability to troubleshoot complex issues.
  • Strong communication skills and the ability to collaborate effectively with cross-functional teams.
  • Ability to work independently and take ownership of projects from start to finish.

Responsibilities

  • Design, implement, and manage automated systems for deploying, monitoring, and maintaining our bare-metal servers and services.
  • Develop and maintain CI/CD pipelines to streamline the deployment process.
  • Enhance the security of our infrastructure and networks by implementing best practices and proactive measures.
  • Monitor system performance, and identify and resolve issues to ensure high availability and reliability.
  • Lead incident response and root cause analysis for system outages and issues.
  • Implement robust security measures to safeguard sensitive data and protect against cyber threats and attacks.
  • Collaborate with the engineering team to optimize performance and scalability of our services.
  • Establish and enforce policies and procedures to ensure compliance with industry standards and regulations.

Benefits

  • Competitive salary and equity package
  • Flexible work hours and remote-friendly environment
  • Generous vacation and time-off policy
  • Opportunities for personal and professional growth in a fast-paced, dynamic industry