Site Reliability Engineer for Ciklum Digital

Kyiv, Amosova, Ukraine

Ciklum is a leading global digital services and software engineering company, serving Fortune 500 and fast-growing organisations. Headquartered in the UK, we unite 3,500+ software developers, designers, product managers and data scientists around the world building tailored digital solutions that leverage emerging technologies.

We are enabling digital transformation for some of the largest household names and platforms in the digital economy. Ciklum is the place to make your tech ideas tangible and join the global projects redefining industries.

We are looking forward to seeing you as a part of our team!

Description

On behalf of Ciklum Digital, Ciklum is looking for a Site Reliability Engineer to join our Kyiv team on a full-time basis.

Project description:

One of the fastest growing B2B software companies in the world, OutSystems is on a mission to change the way software is built. The OutSystems modern application platform empowers customers to build, deliver, manage and evolve the software that makes a difference to their business. With high-productivity, AI-assisted tools, customers are able to quickly tackle any strategic challenge such as application modernization, workplace innovation, business process automation, and customer experience transformation. The OutSystems platform also ensures solutions are secure, resilient, cloud-native, built to scale, and most importantly, are able to be continuously evolved.

Our Client is looking for Site Reliability Engineers (SREs) who combine advanced Software Engineering practices with mature Operations skills in order to deliver and operate highly resilient systems at scale. SREs ensure that our Cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved proactively, through sound engineering practices and resilient design from day 0, as well as reactively, through a well-defined and effective on-call rotation that runs 24×7.

About Quality Engineering Team:

Quality Engineering Center of Excellence is an international award-winning Quality Engineering department that has rapidly evolved over the past 7 years to become a mature Quality Engineering service provider with 250+ professionals working in 7 main directions: QA Consulting and Management, DevOps, Manual, Automation, Support, Performance, Cyber Security and Robotic Process Automation.

Our main principles are:

  • People come before processes and hierarchy
  • Flat and open collaboration/communication increases creativity and brings more value to a business
  • Investing in people and innovations ensures your future
  • Reuse and share your experience – Develop best practices, publicize and follow them

Quality Engineering Center of Excellence is an optimal environment for your professional involvement and growth.

Responsibilities

  • Automate highly scalable and resilient cloud operations that can be executed with no customer downtime
  • Perform blameless root cause analysis on outages and ensure action items are done
  • Fix resiliency problems wherever they are in the product, or collaborate with product teams to do it
  • Monitor customer infrastructure, measuring availability and system health
  • Collaborate with customer support in recovering from escalated outages
  • Troubleshoot complex incidents in highly distributed systems
  • Shorten time to detection by improving the accuracy of alarms
  • Be a key stakeholder in the design of cloud services so that they are resilient from day 0

Requirements

  • Bachelor's or Master's degree in Computer Science or a similar field
  • 1+ years of experience in software development or operations. Programming skills in at least one high-level programming language (C++, Python, Java, C#, etc.)
  • Experience in troubleshooting and debugging
  • Availability to work in shifts and be part of the 24×7 on-call rotation
  • Fluency in English and good communication skills

Desirable

  • Experience with automation and IaC is a plus (CloudFormation, Terraform, Chef, etc.)
  • Experience with AWS services is a plus (EC2, RDS, Lambda, Step Functions, etc.)
  • Experience with Docker and Kubernetes is a plus
  • Experience with monitoring and troubleshooting complex distributed systems
  • Experience in designing resilient and fault-tolerant systems
  • Experience in debugging complex, distributed systems

What's in it for you

  • Close cooperation with the client
  • A constant flow of new projects
  • Dynamic and challenging tasks
  • Ability to influence project technologies
  • Projects from scratch
  • Team of professionals: learn from colleagues and gain recognition of your skills
  • European management style
  • Continuous self-improvement