Logo

Director of Site Reliability Engineering (SRE) - New York

Galaxy
New York, United States
Full time
$200,000 - $250,000 per year
On site

Overview

Department

Engineering

Job type

Full time

Compensation

$200,000 - $250,000 per year

Location

New York, United States

Company size

Mature [ 50+ employess ]

Ready to apply?

You're one step away - it takes less than a minute to upload your resume

Resume Assistance

See how well your resume matches this job role with our AI-powered score. By uploading your resume, you agree to our Terms of Service

Lead reliability, automation, and performance initiatives; drive development of scalable, secure, and resilient infrastructure; work closely with engineering and security teams.

Requirements

  • 10+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • Deep expertise in Kubernetes, containers, and service mesh architectures
  • Strong proficiency in Terraform and other Infrastructure as Code (IaC) tools
  • Extensive experience with AWS and on-prem environments
  • Hands-on experience with observability stacks (e.g., Datadog, Prometheus, Grafana, OpenTelemetry)
  • Experience securing and optimizing crypto/blockchain infrastructure (e.g.,Ethereum, Solana, Bitcoin, L2s)
  • Proven experience leading high-performing SRE teams
  • Ability to work cross-functionally with engineering, security, and product teams
  • Strong analytical, problem-solving, and incident management skills
  • Excellent communication and stakeholder management abilities
  • Responsibilities

  • Define and execute the SRE strategy to enhance system reliability, scalability, and security
  • Lead a team of SREs, fostering a culture of automation, observability, and continuous improvement
  • Implement and refine SLOs, SLAs, and error budgets to balance innovation with system stability
  • Collaborate with security teams to enforce best practices for crypto-related workloads
  • Architect and manage containerized environments using Kubernetes and related cloud-native technologies
  • Oversee Infrastructure as Code (IaC) using Terraform, ensuring repeatability and compliance
  • Develop CI/CD pipelines to accelerate software delivery with security and compliance in mind
  • Improve auto-scaling, failover, and disaster recovery strategies for highly available systems
  • Build and enhance observability capabilities using logging, tracing, and metrics tools (e.g., Datadog, Prometheus, OpenTelemetry)
  • Define monitoring and alerting strategies to proactively detect and mitigate failures
  • Lead post-mortem reviews, driving improvements in system design and operational playbooks
  • Develop automated solutions to minimize MTTR (Mean Time to Recovery) and improve incident response
  • Optimize infrastructure for crypto/blockchain nodes, validators, and smart contracts
  • Implement security best practices for key management, RPC endpoints, and decentralized protocols
  • Ensure regulatory compliance and high availability for blockchain-based applications
  • Benefits

  • Competitive base salary and discretionary bonus
  • Flexible Time Off (i.e. unlimited paid vacation days)
  • Company paid Holidays (11)
  • Company paid sick leave
  • Company-paid health and protective benefits for employees, partners, and other dependents
  • 3% 401(k) company contribution
  • Generous paid Parental Leave
  • Free virtual coaching and counseling sessions through Headspace
  • Opportunities to learn about the Crypto industry
  • Free daily snacks in-office
  • Smart, entrepreneurial, and fun colleagues
  • Employee Resource Groups
  • © All rights reserved.