Site Reliability Engineer - Marcus

  • Competitive
  • London, England, United Kingdom
  • Permanent, Full-time
  • Goldman Sachs International
  • 25 Apr 19

See job description for details

MORE ABOUT THIS JOB Consumer Digital Finance Technology, Investment Management Division (CIMD).

The Investment Management Division, comprised originally of Goldman Sachs Asset Management and Private Wealth Management, provides asset management and wealth management solutions to world-class institutions and individual investors globally. The division is now the home of the new Consumer Digital Finance portfolio, helping customers meet their personal financial goals through the provision of innovative retail banking products and solutions.

Consumer Digital Finance is comprised of the firm's digitally-led consumer businesses, which include the Marcus retail deposits and lending businesses, as well as the personal financial management app, Clarity Money. Digital Finance combines the strength and heritage of a 149-year-old financial institution with the agility and entrepreneurial spirit of a tech start-up. Through the use of machine learning and intuitive design, we provide customers with powerful tools that are grounded in value, transparency and simplicity to help them make smarter decisions about their money.

In 2018 the Marcus business was brought to the UK in the form of a simple and accessible savings account; this is just the beginning of the consumer business expanding into the UK and Europe.


  • You will be responsible for incident management (response, analysis, resolution and communication) and for the availability, performance and monitoring of the platforms within Consumer Banking.
  • Incident response responsibilities include defining and managing the processes for coordinating and resolving technology incidents, minimizing the impact on business operations. You will provide oversight of incidents, root cause analysis and follow-ups, and will provide visibility to senior leadership with a UK focus.
  • You will ensure that, before going to production, systems meet supportability requirements, including documentation, runbooks, monitoring, logging and high availability configuration, and that they can pass required disaster recovery plans.
  • Effectiveness is measured through site availability, mean time to recover and mean time to failure, as well as by tracking resolution actions through to closure in order to mitigate production risk.
  • In addition, your team is responsible for working closely with central engineering teams to develop and enhance automation capabilities that address the need to detect and recover as quickly as possible.
  • Vendor relationships and SLA management are also essential as they relate to infrastructure capabilities.

  • BSc or Master's degree.
  • 5+ years of recent hands-on experience in the software industry.
  • 3+ years of experience with monitoring and service management tools such as PagerDuty, AppDynamics, Splunk and ServiceNow.
  • Hands-on expertise in Tableau or other data visualization tools.
  • Maintain services once they are in production by measuring and monitoring availability, latency and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Experience in designing, analyzing and troubleshooting large-scale distributed systems.
  • Experience and curiosity in debugging and optimizing code for routine task automation.
  • Excellent communication skills, both verbal and written, are essential.
  • You must have experience managing multiple tasks and using sound judgment when prioritizing and escalating.
  • You must have extensive experience of issue investigation and root cause analysis of a technical nature.
  • You must be able to work with deeply technical engineers, identify gaps that need addressing, and hold them to account.
  • A successful candidate will be able to balance project management trade-offs, own decisions and communicate effectively with senior stakeholders across business, partners, vendors, internal technology stakeholders and technology peers, with an eye towards influencing and driving positive business outcomes.
  • Energetic, self-directed and self-motivated, able to build and sustain long-term relationships with clients and colleagues.
  • Approach problems intuitively and with an open mind, within the context of a team.
  • Exceptional analytical skills, able to apply knowledge and experience in decision-making to arrive at creative and commercial solutions.
  • Strong desire to learn and contribute solutions and ideas to a broad team.
  • Exposure to Lean, Agile, Six Sigma, BPM and ITIL is preferable.
  • Experience of development or live operations (reading code, debugging, etc.) in one or more of the following technologies is a plus: Python, JavaScript, Java, SQL, NoSQL, Linux shell scripting, Terraform, Kubernetes, Docker, HTTP and networking stacks.

ABOUT GOLDMAN SACHS The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Founded in 1869, the firm is headquartered in New York and maintains offices in all major financial centers around the world.

© The Goldman Sachs Group, Inc., 2019. All rights reserved. Goldman Sachs is an equal employment/affirmative action employer Female/Minority/Disability/Vet.