System Reliability Engineer for Ciklum Digital

Kyiv, Amosova, Ukraine

Ciklum is a leading global digital services and software engineering company, serving Fortune 500 and fast-growing organisations. Headquartered in the UK, we unite 3,500+ software developers, designers, product managers and data scientists around the world building tailored digital solutions that leverage emerging technologies.

We are enabling digital transformation for some of the largest household names and platforms in the digital economy. Ciklum is the place to make your tech ideas tangible and to join global projects that are redefining industries.

We look forward to seeing you as part of our team!

Description

On behalf of Ciklum Digital, we are looking for a System Reliability Engineer to join our team on a full-time basis.

Project description:

One of the fastest-growing B2B software companies in the world, OutSystems is on a mission to change the way software is built. The OutSystems modern application platform empowers customers to build, deliver, manage and evolve the software that makes a difference to their business. With high-productivity, AI-assisted tools, customers can quickly tackle any strategic challenge, such as application modernization, workplace innovation, business process automation, and customer experience transformation. The OutSystems platform also ensures that solutions are secure, resilient, cloud-native, built to scale and, most importantly, able to evolve continuously.

Our client is looking for a System Reliability Engineer (SRE) to focus on the architecture, instrumentation and management of the observability platform for its distributed systems. The team's goal is to make troubleshooting production incidents a thing of the past by enabling other teams to quickly diagnose and fix resiliency, quality, and performance issues in their cloud services and components.

Responsibilities

  • Architect, deploy and manage our Monitoring & Observability platform
  • Define best practices for making our systems and services measurable, and work with our teams to get those practices applied
  • Collaborate with our global Engineering & Platform teams to ensure our services, platforms and infrastructure are emitting the right metrics
  • Collect, aggregate and visualize metrics to provide actionable insights
  • Contribute to our evolving “data driven” and “cloud first” culture through continuous learning

Requirements

  • 5+ years of experience in Software Engineering, SRE, or DevOps
  • Prior experience instrumenting mission-critical, globally distributed services on public cloud providers such as AWS and GCP
  • Knowledge of or experience with container orchestration technologies
  • Prior experience with one or more of the following would be great: Elasticsearch, CloudWatch, Prometheus, Datadog, Splunk
  • Prior experience with AIOps is a plus
  • Prior experience integrating with event management systems such as PagerDuty
  • Prior exposure to cloud automation

Desirable

  • Experience with databases
  • Experience with large-scale, real-time applications
  • Experience building and deploying complex systems to cloud and on-premises

What's in it for you

  • Close cooperation with the client
  • A constant flow of new projects
  • Dynamic and challenging tasks
  • Ability to influence project technologies
  • Projects from scratch
  • Team of professionals: learn from colleagues and gain recognition for your skills
  • European management style
  • Continuous self-improvement