Site Reliability Engineer
We are seeking an experienced Site Reliability Engineer (SRE) to support our customer in Romania. In this role, you will be primarily responsible for ensuring the availability and health of our client's production environment, building and managing software systems for platform infrastructure, and improving the reliability, quality, and time-to-market of our software solutions.
Position: Site Reliability Engineer (SRE)
Type: Contract (6 months)
Location: Romania (Remote)
1. Performance Monitoring and Optimization:
• Monitor the availability and health of the production environment.
• Collect and analyse metrics from operating systems and applications to fine-tune performance and identify issues.
2. Collaboration with Development Teams:
• Partner with development teams to enhance services through rigorous testing and reliable release procedures.
• Participate in system design consulting, platform management, and capacity planning.
3. Infrastructure Automation:
• Build and maintain software and systems for platform infrastructure and applications.
• Implement automation solutions to create sustainable and efficient systems and services.
4. Reliability and Service Level Objectives:
• Balance feature development speed with the reliability of services, ensuring adherence to service level objectives (SLOs).
Requirements:
– Bachelor’s degree in computer science or a related technical or scientific discipline.
– Proven experience in a technical engineering role, preferably with public cloud platforms (IBM Cloud, Microsoft Azure, Amazon Web Services).
– Proven knowledge of containerized applications and their lifecycle.
– Experience with dynamic resource management frameworks (e.g., Red Hat OpenShift, Kubernetes) and distributed storage technologies (NFS, HDFS, Ceph, S3).
– Proactive problem-solving skills and the ability to identify areas for improvement and performance bottlenecks.