We are looking for Senior Site Reliability Engineers (SREs) who are knowledgeable software engineers to provide team leadership and help take ownership of our infrastructure and products. Our aim is to create new components and refactor existing ones into self-service, automated solutions that we can expose to our 120+ person software engineering organization. We share knowledge and responsibilities, and we value a culture in which everyone can speak their mind and offer constructive feedback. We work cross-functionally with every engineering team at iHeartRadio, ensuring that we design and instrument tools and integrations that are of value from day one, driving towards simplicity and ultimately reliability.
- Owning the development of our Shared Compute platform built on Kubernetes.
- Automating AWS and GCP infrastructure provisioning and configuration management, using tools like Ansible, GitLab, and Terraform.
- Providing engineers with the tools they need to meaningfully monitor and alert on the services and features they develop, using tools like Prometheus, Alertmanager, Grafana, EFK (ELK) Stack, and PagerDuty.
- Automating DNS and CDN providers (Fastly, Dyn, Route 53) so that engineers may safely test, deploy, and revert changes to these providers without direct SRE involvement.
- Supporting the deployment and automated orchestration of various storage components, such as MongoDB Atlas, Elasticsearch, PostgreSQL, and Kafka.
- Collaborating with and mentoring teammates.
- Working closely with engineers on other teams and presenting to multiple engineering teams.
- Exploring and evaluating different open source tools.
- A passion for your work.
- 3+ years of cloud engineering experience or equivalent.
- Experience coding in Go.
- Experience participating in and contributing to team meetings and discussions.
- Demonstrated experience resolving incidents and outages and taking part in an on-call rotation.
- Ability to troubleshoot and diagnose difficult engineering and infrastructure-related problems and to recommend new solutions.
- Strong communication and documentation skills.
- Experience taking ownership of infrastructure components and leading team projects.
- Experience with Docker, Kubernetes, and Prometheus is a big plus.