On behalf of Ciklum Digital, Ciklum is looking for a Site Reliability Engineer to join our Kyiv team on a full-time basis.
One of the fastest-growing B2B software companies in the world, OutSystems is on a mission to change the way software is built. The OutSystems modern application platform empowers customers to build, deliver, manage and evolve the software that makes a difference to their business. With high-productivity, AI-assisted tools, customers can quickly tackle any strategic challenge, such as application modernization, workplace innovation, business process automation, and customer experience transformation. The OutSystems platform also ensures solutions are secure, resilient, cloud-native, built to scale, and, most importantly, can be continuously evolved.
Our Client is looking for Site Reliability Engineers (SREs) who combine advanced Software Engineering practices with mature Operations skills in order to deliver and operate highly resilient systems at scale. SREs ensure that the client's Cloud services meet the reliability and uptime requirements of its demanding enterprise customers. This is achieved proactively, through sound engineering practices and resilient design from day 0, as well as reactively, through a well-defined and effective on-call rotation that runs 24×7.
About Quality Engineering Team:
Quality Engineering Center of Excellence is an international award-winning Quality Engineering department that has rapidly evolved over the past 7 years to become a mature Quality Engineering service provider with 250+ professionals working in 7 main directions: QA Consulting and Management, DevOps, Manual, Automation, Support, Performance, Cyber Security and Robotic Process Automation.
Our main principles are:
- People over processes and hierarchy
- Flat and open collaboration/communication increases creativity and brings more value to a business
- Investing in people and innovations ensures your future
- Reuse and share your experience – Develop best practices, publicize and follow them
Quality Engineering Center of Excellence is an optimal environment for your professional involvement and growth.
Responsibilities:
- Automate highly scalable and resilient cloud operations that can be executed with no customer downtime
- Perform blameless root cause analysis on outages and ensure action items are completed
- Fix resiliency problems wherever they are in the product, or collaborate with product teams to do it
- Monitor customer infrastructure, measuring availability and system health
- Collaborate with customer support in recovering from escalated outages
- Troubleshoot complex incidents in highly distributed systems
- Shorten time to detection by improving the accuracy of alarms
- Be a key stakeholder in the design of cloud services so that they are resilient from day 0
Requirements:
- Bachelor's or Master's degree in Computer Science or similar
- 1+ years of experience in software development or operations
- Programming skills in at least one high-level programming language (C++, Python, Java, C#, etc.)
- Experience in troubleshooting and debugging
- Availability to work in shifts and be part of the 24×7 on-call rotation
- Fluency in English and good communication skills
- Experience with automation and IaC is a plus (CloudFormation, Terraform, Chef, etc.)
- Experience with AWS services is a plus (EC2, RDS, Lambda, Step Functions, etc.)
- Experience with Docker and Kubernetes is a plus
- Experience with monitoring and troubleshooting complex distributed systems
- Experience in designing resilient and fault-tolerant systems
- Experience in debugging complex, distributed systems
What's in it for you
- Close cooperation with the client
- A constant flow of new projects
- Dynamic and challenging tasks
- Ability to influence project technologies
- Projects from scratch
- Team of professionals: learn from colleagues and gain recognition of your skills
- European management style
- Continuous self-improvement