Job offer

Data Engineer

Samsung R&D Institute Poland

plac Europejski 1

Wola

Warszawa

Technologies we use

Expected

  • SQL

  • Python

  • Linux

About the project

About our Team

We invite you to join one of the largest speech and language processing teams in Europe. We work closely with other R&D teams to develop and test our next-generation personal Intelligent Assistant. In our lab, engineers, researchers, and linguists work together on innovative products for the multilingual European market. We define the way users access, explore, and interact with devices, knowledge, information, and services.

Technologies in use

• Python (Pandas, pytest, multiprocessing)

• PostgreSQL

• Jenkins

• Bash

• Docker

Your responsibilities

  • Design and implementation of data workflows for continuous, scalable data ingestion, integration, validation and delivery of data products for ML algorithm development.

  • Design and evaluation of data workflow architectures

  • Development of automation solutions for “human-in-the-loop” NLP data production and quality assurance, e.g. annotation, speech transcription, translation, etc.

  • Development of ML algorithms to increase the efficiency of the NLP data production process.

  • Collaboration with external companies, language experts and other R&D centers.

Our requirements

  • MSc or BA in Computer Science, Signal Processing, Electronic Engineering, Sound Engineering or equivalent.

  • Several years of hands-on experience in software engineering and database management

  • Several years of experience with SQL, Python and Linux.

  • Experience in building data ingestion pipelines from multiple sources.

  • Experience in building data processing pipelines comprising multiple formats of data.

  • Experience with continuous delivery tools like Jenkins.

  • Experience with code versioning tools, such as Git.

  • Understanding of data modelling, architecture, and workflow management (orchestration) solutions.

  • Ability to write test-driven reusable code that is easy to maintain and well documented.

  • Ability to work effectively in a multi-disciplinary and multi-cultural team.

  • Knowledge of English at a level that enables reading and writing technical documentation.

Optional

  • Experience in managing and hosting services on Amazon Web Services

  • Experience in writing scripts in Bash

  • Database Reliability Engineering knowledge/experience.

  • Service Reliability Engineering knowledge/experience.

  • Experience/knowledge of ASR frameworks (Kaldi, wav2letter, etc.)

  • Experience/knowledge of audio signal processing techniques

  • Experience with scalable, distributed data infrastructures using SQL and NoSQL databases.

  • Experience in development of solutions relying on Machine Learning algorithms.

  • Experience with data workflow management tools, e.g. Airflow, NiFi, Luigi, Prefect, Flyte, etc.

What we offer

  • Friendly working atmosphere

  • Wide range of training courses and strong support in developing algorithmic skills

  • Opportunity to work on multiple projects

  • Multidisciplinary team (UX, UI, PO, Devs, Architects)

  • Working with the latest technologies on the market

  • Weekly developers’ meetups, named BUG, on the newest trends (frameworks, skills, etc.)

  • Monthly integration budget

  • Possibility to attend local and foreign conferences

  • Start of work between 7 a.m. and 10 a.m.

Benefits:

  • Private medical care (possibility to add family members for free)

  • Multisport card

  • Life insurance

  • Lunch card

  • Partial reimbursement of the cost of an English language course

  • Possibility to learn Korean for free

  • Variety of discounts (Samsung products, theaters, restaurants)

  • Unlimited free access to Copernicus Science Center for you and your friends

  • Possibility to test new Samsung products

Equipment:

  • Laptop and PC workstation, 2 external monitors

  • OS: Linux, Windows

Location:

  • Office in Warsaw Spire, near a metro station

  • We currently work remotely until further notice