Engineering / Hybrid / Full-Time

Sr. AWS DevOps SRE Lead

Beaverton, OR / Lakewood, CO / Norwood, MA. United States

Are you endlessly curious, great with people and love working with a talented team of engineers? Do you have a knack for troubleshooting and love solving problems? We are seeking a dynamic and self-driven individual to join our Site Reliability Engineering (SRE) / Managed Services team as a Senior SRE Lead. This is an excellent opportunity for a seasoned professional with strong AWS DevOps skills to not only contribute to cutting-edge technologies but also lead and mentor a team of SRE engineers. At Cloud303, we foster a culture of collaboration, transparency, and a “get it done” attitude. We value individuals who take ownership and guide initiatives to successful completion. Given the variability of challenges that come along with support work, a wide range of experience with different types of AWS workloads is a must.

If you are a highly self-motivated individual and are not afraid of wearing multiple hats while learning aspects of running a start-up, you’ll fit right in!

Day to Day Responsibilities

  • Team Leadership
    • Lead and mentor a team of Site Reliability Engineers
    • Foster a collaborative and transparent culture within the team.
    • Manage a large number of simultaneous engagements
    • Supervise the helpdesk
    • Approve team timesheets
    • Supervise execution of technical tasks
  • Issue Resolution
    • Ensure outages and issues are resolved promptly
  • Subject Matter Expert
    • Answer questions and provide expert support to engineers working on support tickets
    • Fill in knowledge gaps to ensure the prompt resolution of support tickets
  • Client Engagement
    • Interface with clients as needed, providing expert support and guidance throughout ticket lifecycles.
  • Cloud Solutions Development
    • Identify which requests require larger professional services projects and which can be handled by the SRE team
    • Architect cost-effective, AWS-based cloud solutions for clients
    • Make heavy use of IaC (Terraform and CloudFormation) to automate our use of AWS

Requirements

  • Experience:
    • 6+ years of production experience, with a focus on cloud-based environments.
    • Proven expertise in networking, orchestration/automation, and scripting.
  • Leadership Skills:
    • Demonstrated leadership experience in managing teams and administering production systems and environments.
    • Experience participating in outage recovery and incident response.
  • Technical Proficiency:
    • Deep understanding of containers and orchestration systems, preferably EKS/K8S and/or ECS.
    • Experience with deploying and debugging microservices.
    • Experience with instrumenting metrics, alerting, and logging of production systems, especially with Datadog.
    • Experience with HPC-related software and services preferred (AWS ParallelCluster, Slurm, AWS Batch, AWS Step Functions, Elastic Fabric Adapter).
  • AWS Mastery:
    • Extensive experience with core AWS services.
    • AWS Solutions Architect or other AWS certifications are highly desirable.
  • Communication:
    • Strong verbal and written communication skills.
    • Previous client-facing experience is strongly preferred.
  • US citizenship or authorization to work in the United States
  • Ability to commute to one of the Cloud303 offices (minimum 2 days per week)
    • Locations:
      • Beaverton, OR
      • Lakewood, CO
      • Boston, MA

Our Benefits

We offer a comprehensive benefits package that supports your well-being, growth, and work-life balance. Our benefits reflect our commitment to a positive and collaborative work environment where you can thrive both professionally and personally.

Medical Care

Stay in good health with our robust medical plan.

Dental Care

Maintain a healthy smile with our dental care plan.

Vision Care

Protect your eyesight with our comprehensive vision plan.

Retirement Plan

401(k) with contribution matching and an IRA are available.

Family Leave

Family always comes first at our company. Maternity and paternity leave are covered.

Paid Time Off

Competitive vacation time, sick leave, and public holidays.

Stock Option Plan

We're one of the fastest-growing AWS partners ever. Gain long-term benefits for your efforts.

Free AWS Training & Education

We'll pay for your AWS training and certification costs. We invest in your career.

Unlimited Snacks & Coffee

We know it's a cheesy benefit, but people do really care about these!