Sr Manager, Site Reliability Engineering
Company: United Airlines
Location: Chicago
Posted on: May 24, 2025
Job Description:
Achieving our goals starts with supporting yours. Grow your
career, access top-tier health and wellness benefits, build lasting
connections with your team and our customers, and travel the world
using our extensive route network.
Come join us to create what's next. Let's define tomorrow,
together.
Description
Job overview and responsibilities
As the Senior Manager
of Site Reliability Engineering, you are responsible for guiding a
team dedicated to the instrumentation and analysis of vital
business applications, ensuring their availability, and
contributing to major incident resolution and root cause analysis.
You are accountable for the strategy, assessment, deployment, and
management of IT operations tools and methodologies. Your leadership
role involves steering technical experts who specialize in evaluating
enterprise reliability and enhancing system efficiency. You are also
tasked with forging and upholding strong relationships with Digital
Technology and business executives at all levels. Drawing on your
deep technical knowledge and outstanding leadership and analytical
abilities, you lead your team in building highly available
applications, adhering to best practices, and promoting
evidence-based system optimization in partnership with development
teams through modern DevOps practices.
- Design, Develop & Drive Outcomes:
- Understand the potential impact of system requirements and
design choices across multiple cloud and on-premise
technologies
- Embrace the role of developing and mentoring the Site
Reliability Engineering team, fostering expertise in this critical
area
- Guide the team to devise solutions that not only meet long-term
objectives but also effectively address urgent technical debt
- Position yourself as a prominent thought leader in Site
Reliability Engineering principles, influencing others through your
knowledge and experience
- Regularly disseminate best practices and champion process
improvements, both within your team and in collaboration with other
teams, to drive collective success
- Program Management & Delivery:
- Track the team's progress on projects and key performance
indicators, while also offering concrete, actionable suggestions
for further enhancing or influencing product or project
delivery
- Encourage cross-functional collaboration and gather input from
technology teams to promote ongoing program enhancement
- Regularly provide insights on critical Site Reliability
Engineering metrics to showcase the program's achievements and
identify potential areas for improvement
- Maintain up-to-date materials that communicate the program's
current status, including progress, obstacles, opportunities, and
strategic direction, to Digital Technology
leaders
- Effectively manage both internal and external relationships to
foster and sustain beneficial strategic partnerships, thereby
advancing the success of the Site Reliability Engineering Program
- Develop and roll out training initiatives to ensure that partners
are well-equipped to fully utilize Observability programs
- Oversee the 24/7 command center teams, ensuring they are adept
at early detection, triage, and recovery for all applications and
services, which contributes to a reduced mean time to recovery
- Talent Management and People Development:
- Initiate and facilitate the performance assessment process for
your team, fostering an environment that encourages individuals at
all performance tiers to excel
- Establish and nurture relationships with team members to create
a foundation of trust, recognizing areas where technical or
analytical skills are lacking and devising strategies for improvement
- Regularly encourage team members to exchange expertise about Site
Reliability Engineering practices and embrace new technologies
- Lead and inspire teams to tackle intricate challenges and
champion the use of open-source technologies and solutions
- Organizational Effectiveness / People:
- Bring robust technical expertise and leadership qualities,
leading by example with a proven track record in Site
Reliability Engineering
- Your proficiency in driving the creation of multi-cloud
infrastructure serves as a benchmark and motivates the team of
developers and infrastructure engineers
- Collaborate with your engineers to manage project dependencies,
adeptly negotiate and plan for incremental delivery milestones with
stakeholders, and achieve on-time project completion
- Work closely with product teams to understand and address their
performance and resilience concerns, and formulate sustainable
strategies to resolve persistent challenges
- Engineering Excellence and Practices:
- Continuously work on enhancing the reliability, stability, and
performance of our digital platforms, being at the forefront of
promoting engineering excellence, implementing best practices, and
overseeing the integration of fully automated telemetry within
modern DevOps frameworks
- Your work in advancing problem detection and service
restoration processes is pivotal
- Utilizing cutting-edge Site Reliability Engineering methods,
coupled with automated alerting and self-healing mechanisms, you
are instrumental in improving both cloud-based and on-premises
systems, thereby fortifying our digital infrastructure's robustness
and efficiency
Qualifications
What's needed to succeed (Minimum Qualifications):
- Bachelor's degree in Information Technology, Business
Administration, Computer Science, or a relevant field
- 7+ years of IT and business/industry work experience
- 5+ years of Site Reliability Engineering experience working
with telemetry, observability, self-healing solutions, and platform
automation
- 5+ years of experience leading projects and managing
people
- 2-3 years of leadership experience managing
cross-functional teams or projects and influencing senior-level
management and key stakeholders
- 2+ years of experience with leading DevOps practices and tools
(CI/CD pipelines, Jenkins, GitHub)
- Recognized expertise in the field, in industry and/or within
United
- Proven expertise in leading and influencing technical staff or
coordinating work across multiple technology teams
- Proven experience with monitoring, logging, and telemetry tools
such as Dynatrace, Splunk, Prometheus, and AWS CloudWatch
- Proficiency with DevOps practices and tools (CI/CD pipelines,
Jenkins, GitHub)
- Ability to diagnose and troubleshoot issues effectively
- Strong communication and status-reporting
skills
- Experience with AWS networking services like VPC, Route 53, and
CloudFront, with understanding of cloud concepts like IaaS, PaaS,
and SaaS
- Experience with AWS compute, storage, and infrastructure services
such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS
(Relational Database Service), VPC (Virtual Private Cloud), Lambda,
and CloudFormation
- Experience in developing monitoring tools and log analysis
tools to manage operations
- Experience with one or more programming and scripting
languages, such as Python, JavaScript, or Unix/Linux shell scripting
- Dynatrace Associate Certification or AWS Certified DevOps
Engineer is a plus
- Must be legally authorized to work in the United States for any
employer without sponsorship
- Successful completion of interview required to meet job
qualification
- Reliable, punctual attendance is an essential function of the
position
The base pay range for this role is $137,275.00 to $187,000.00.
The base salary range/hourly rate listed is dependent on
job-related, non-discriminatory factors such as experience,
education, and skills. This position is also eligible for bonus
and/or long-term incentive compensation awards.
You may be eligible for the following competitive benefits:
medical, dental, vision, life, accident & disability, parental
leave, employee assistance program, commuter, paid holidays, paid
time off, 401(k) and flight privileges.
United Airlines is an equal opportunity employer. United Airlines
recruits, employs, trains, compensates and promotes regardless of
race, religion, color, national origin, gender identity, sexual
orientation, physical ability, age, veteran status and other
protected status as required by applicable law. Equal Opportunity
Employer - Minorities/Women/Veterans/Disabled/LGBT.
We will ensure that individuals with disabilities are provided
reasonable accommodation to participate in the job application or
interview process and to perform crucial job functions. Please contact
to request accommodation.