Senior Data Ops Engineer

Location: London
Discipline: Machine Learning
Contact name: Tom Goldberg

Contact email: tom@engima-rec.ai
Published: about 1 month ago

We are hiring for Synthesia, the world’s #1 AI video generation platform. It is a video production studio in a browser, with no cameras or film crews at all. Synthesia can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities.

What will you be doing?
Synthesia are expanding their ML Platform team to supercharge large model training on PB-scale datasets and are looking for an outstanding Senior DataOps Engineer to join the team and manage their audio and video data platform.
The ideal candidate will be responsible for ensuring the efficient architecture, operation and maintenance of this platform, including data ingestion, processing, storage, and retrieval.
They will work closely with cross-functional teams to implement best practices in data management, monitor system performance, troubleshoot issues, and optimize workflows to support data analysis and application development.

  • Data Ops for data management, versioning, usage tracking, logging.

  • Setup of a data lake and data transformation pipelines for large-scale audio-visual datasets.

  • Integration of third-party annotation services for continuous data annotation and active learning.

  • Setup of metadata stores and APIs to access datasets on demand for ML training.

  • Support for data streaming to train large models.

  • Data pipelines - deploy custom ML data transformations, working with our ML team.

  • Data access - create transient datasets on demand to support ML model training.

  • Data tracking - usage tracking and monitoring across all data sources.

  • Establish the workflow for continual data delivery and annotation.

Who are you?

  • 5+ years of experience in Data Engineering / Data Ops / Data Science.

  • Been involved in managing large-scale datasets, not just one-off data collection tasks; you have seen continuous data collection.

  • Been responsible for setting up data ops (ingest / storage / transform / access) end-to-end for multiple teams.

  • Worked with audio/video data and understand how to manage it at PB scale.

  • Experience with Streaming / Batch Data Pipelines (Airflow, Apache Beam, Spark etc.).

  • Experience with event-driven systems.

  • Experience in handling heterogeneous types of data (e.g. audio / text / video / tabular data).

  • Experience with any type of RDBMS (Relational Database Management System).

  • Outstanding communication skills.

Benefits

  • You will be compensated well (salary + stock options + bonus)

  • You will get Private Health Insurance

  • You will work in a hybrid setting with an office in London

  • You get a cycle-to-work salary sacrifice scheme for commuting to the office

  • You get 25 days of annual leave + public holidays

  • You will join an established company culture with regular socials and company retreats

  • You can participate in a generous referral scheme

  • You will have huge opportunities for your career growth