Site Reliability Engineers
The Digital and Developer Platform group in the company is one team that works towards next-gen payments and believes in its slogan, "It's Everywhere You Want to Be," making payments accessible everywhere and for everyone. The group builds technology for the payment ecosystem that improves the lives of millions of people around the world. The desired candidate will join the team on this journey and contribute to that goal. This role is in the Site Reliability Engineering (SRE) team, which focusses on the reliability, availability, performance and efficiency of the digital products.
The Position:
- Engage with product, architecture, development, certification, project management, operations and infrastructure teams from the start of the SDLC.
- Become a subject matter expert for the assigned product verticals. Analyse complex systems from a reliability and resilience perspective.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Understand the end-to-end product topology from both an infrastructure and an application perspective.
- Identify sources of instability in large-scale distributed systems and drive operational excellence. Dive deep to understand every issue that occurs and own it through to end-to-end closure.
- Perform functional analysis of products by gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding, including integration and operational challenges.
- Fix code bugs in production and recommend architectural improvements during issue/incident analysis.
- Work closely with development and product teams on suggesting new features and enhancements based on live issues.
- Drive down the burden of toil with tooling and automation to achieve operational efficiency and a smoother customer experience.
- Provide technical consultancy for monitoring, incident and problem management. Lead technical bridges and interact with both technical staff and management during the incident and change management process.
- Provide Level 3 on-call support (within working hours only, and over weekends once in a quarter).
- Engage with tech and non-tech partners on a regular basis to analyze functional and technical solutions in depth.
- Understand new changes in production systems and assess their risk from an application perspective to drive reliability and availability.
- Have some level of network engineering understanding to assist in incident/issue triaging.
- Provide guidance and technical expertise to junior team members.
- 2 or more years of work experience with a Bachelor's Degree or an Advanced Degree (e.g. Masters, MBA, JD, MD, or PhD)
- Must have at least 2 years of professional working experience in a Java environment
- Coding experience beyond simple scripting
- 3 or more years of work experience with a Bachelor's Degree or more than 2 years of work experience with an Advanced Degree (e.g. Masters, MBA, JD, MD)
- 3+ years of technical support and development experience with Java, automation, bug fixing, and production and application support
- Work Hours: This position requires the incumbent to provide 6 hours of on-call support on weekdays (not more than 2 days in a week) between 9am and 9pm on a rotational basis. In either case, the incumbent will not exceed a work time of 9 hours in a day. This also includes weekend on-call support, which generally comes up once in 2-3 months. Weekend on-call support is eligible for compensatory leave.
For more information you can email Ganjapan Chamsaeng in our Singapore office on ganjapan.chamsaeng@tekystems.com, quoting Job Reference 527377, or alternatively, apply here to register your interest.
http://jobs.en-sg.teksystems.com/o4qPdr/site-reliability-engineers-itcommunications-singapore-14866492
Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544