Software Engineer - Resilience Engineering

Ozcode

Software Engineering
Paris, France · Grenoble, France · Nantes, France · Lyon, France
Posted on Friday, September 22, 2023

Production Practices is a new team in our SRE organization whose mission is to steward production readiness and support engineers as they implement best practices around reliability and operational excellence. We advocate on behalf of engineers to improve the overall developer experience of building resilient services at Datadog, and support the company by identifying, tracking, and mitigating systemic reliability risks.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What you’ll do:

  • Steward production readiness for the company by setting the technical direction of reliable and sustainable production practices.
  • Perform readiness reviews for new services when they launch to production, and write software to automate the process.
  • Work with teams across Datadog to help them implement best practices to build resilience and operational excellence. This involves running training workshops for SRE concepts and writing reliability bulletins that solve specific engineering problems.
  • Identify sources of friction for engineers running a service in production and advocate on their behalf by building golden paths and tools to support healthy production practices. Collaborate with other infrastructure teams to improve the developer experience and lower the burden of launching services.
  • Help the organization identify, track, and mitigate emergent risks. Where necessary, participate in cross-functional squads to solve complex reliability problems.
  • Collaborate with the Chaos Engineering team on large-scale game days which involve injecting faults into our stack. Follow up with engineering teams after these events to ensure risks are identified and fixes are adopted.

Who you are:

  • Around 4 years of experience working with distributed systems. A lot of our work involves reviewing unfamiliar services before they launch, so we rely on our people’s strong systems thinking and familiarity with common production patterns.
  • Strong interest in training and helping upskill others. We aim to help build resilience through distributing best practices, so prior experience teaching others or writing developer documentation is a plus.
  • Good coding skills in Go and/or Python. We automate as many manual processes as possible and use code to scale our impact.
  • Empathy, collaboration, and communication skills in English to work remotely with people across teams. Our goal is to improve the developer experience of launching and operating services in production, and to do this we need empathy for others that comes from real-world experience.
  • Willingness to jump into new codebases and unknown systems and ramp up quickly. We’re looking for people who are excited by challenges and who use their determination to cut through ambiguity and help scope the work.


Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.

Benefits and Growth:

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.


About Datadog:

Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn and Datadog Learning Center.


Equal Opportunity at Datadog:

Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.

Your Privacy:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.