Senior Site Reliability Engineer

  • Full-time

Company Description

Twitter is what’s happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We feel real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.

Job Description

Twitter Site Reliability Engineers (SREs) are Software Engineers who focus on Availability, Reliability, Disaster Recovery, and other challenges of Scale. They possess a breadth and depth of knowledge about Twitter’s production environment that allows them to craft tools, processes, and frameworks to guide colleagues through safely releasing production code, provide mentorship and support for monitoring distributed systems, reduce operational overhead, and enable teams to achieve their desired reliability outcomes.

Our team ingests and serves petabytes of data from all the services and systems across Twitter’s entire infrastructure. This data is critical for our production services. It includes system and service-level metrics, logging, and tracing.

What you’ll be doing:

  • Build tooling to automate operations and reduce toil. This includes automatic failure remediation, application and systems deployment, capacity planning, and fleet management.
  • Solve complex problems in distributed systems that handle millions of queries per second and petabytes of data.
  • Embed with the Software Engineering team to bring your expertise around Availability, Reliability, Scalability, Disaster Recovery, Problem/Incident Management, and Performance of production services.
  • Help bring our service to more data centers and cloud environments faster with reliable automation, Docker + Kubernetes, and other ideas you’ve got!
  • Identify and contribute to solutions for reducing service outages, reducing alert noise, improving monitoring, and helping our services reach Service Level Objectives (SLOs).
  • Work with highly distributed and diverse hardware, software, and networking teams throughout the company.

Qualifications

  • 5+ years of experience developing or handling services in a distributed, internet-scale, production environment.
  • Practical knowledge of at least one programming language (Python, Go, Java, Ruby, C++, Scala).
  • Deep knowledge of Linux operating system internals, TCP/IP, filesystems, disk/storage technologies.
  • Experience with state configuration tools (Puppet, Chef, etc.).
  • Experience setting up capacity plans for physical and/or virtual infrastructure.
  • Bonus: Hands-on experience with Observability systems including metrics generation, monitoring, alerting, and dashboards for viewing/handling this data.

Additional Information

A few other things we value:

  • Challenge - We solve some of the industry’s hardest problems. Come to be challenged, learn, and thrive as an engineer.
  • Diversity - Diversity makes us a better organization and team. We value diverse backgrounds, ideas, and experiences.
  • Work-Life Balance - We work hard, but we believe hard work should come with balance.

We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status, or any other legally protected status.
