Engineer position: Deep Learning models for precision oncology
Job description
Centralized and Distributed Deep Learning models for precision oncology
Over the past 10 years, technological advances in digital pathology and DNA sequencing have enabled the generation of large amounts of histological image and genomic data. In oncology, these data are now of great value for diagnosing cancers, identifying their molecular subtypes and guiding therapeutic decisions. Artificial intelligence methods, in particular those based on Deep Learning, have recently enabled major advances in the analysis of these data. These methods are usually trained on large datasets held in a centralized storage space connected to high-performance computing nodes. However, transferring these large volumes of sensitive patient data, produced in hospitals, to centralized spaces outside the hospitals raises several difficulties. The sheer volume of the data, together with the General Data Protection Regulation established at the European level to protect and control the use of sensitive patient data, poses challenges related to data security and energy consumption. To overcome these difficulties, distributed (or federated) deep learning strategies have been developed in recent years. They make it possible to keep sensitive patient data where they are produced and to transfer to a centralized space only the parameters used to train the predictive models. The performance of this distributed strategy has so far been demonstrated on simulated data and on standard benchmark use cases (e.g. MNIST), but not on histological imaging or genomic patient data.
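The federated scheme described above can be illustrated with a minimal federated averaging (FedAvg) sketch: each simulated "hospital" trains on its private data locally and only the model parameters are sent back for averaging. This is a toy example on a synthetic linear model; all names and the setup are illustrative assumptions, not part of the KATY project codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient descent on squared error.
    The raw data (X, y) never leave the client; only w is returned."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three simulated "hospitals", each holding private data drawn from
# the same underlying linear relation y = X @ [2, -1].
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

# Server loop: broadcast the global weights, collect the locally
# trained weights, and average them weighted by client sample count.
w_global = np.zeros(2)
for _round in range(20):
    local_ws = [local_update(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)

print(np.round(w_global, 2))  # converges close to the true weights [2, -1]
```

In a realistic deployment the "clients" would be training deep networks behind hospital firewalls, with a framework handling communication, but the parameter-averaging principle is the same.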
This project aims to compare the performance of Deep Learning models trained in distributed (federated) mode with that of the same models trained in centralized mode. This work will be carried out in collaboration with researchers from the European KATY project on personalized medicine (https://katy-project.eu/).
The engineer will work on two use cases for which the data have already been collected and are available. In the first use case, he/she will start from an existing Deep Learning model, trained in centralized mode, that segments histological images of cancers in order to locate and quantify different cell populations of interest. He/she will implement a distributed (federated) version of this model and evaluate its performance against the centralized one. In the second use case, the engineer will work with patient transcriptomic data associated with clinical metadata. He/she will contribute to the development of a Deep Learning model that predicts patients' responses to drug treatments based on the genetic characteristics of their cancers. He/she will then compare the performance of this model when trained in centralized or distributed mode.
The engineer will be hosted in the “Genetics and Chemogenomics” team of the Interdisciplinary Research Institute of Grenoble (IRIG) at CEA Grenoble. He/she will be co-supervised by Christophe Battail (IRIG, CEA Grenoble), an expert in computational analysis and modeling of genomic data, and by Stéphane Gazut (DM2I/LIST, CEA Saclay), an expert in Deep Learning methods. He/she will work in a multidisciplinary research environment composed of AI experts, bioinformaticians and biologists. The engineer will also interact with developers and researchers from the European KATY project.
Technical skills: Big Data architectures (Hadoop), AI/ML/DL algorithms, the Unix command line and Python programming.
Professional aptitude: curiosity and a desire to develop one's scientific and technological skills, rigor and organization, and the ability to work in a team and interact with other students, engineers and researchers.
12-month work contract starting in January 2023.
How to apply
Send a CV and cover letter to