
DS4S - Data Science for Science

Wednesday, 25.5.2022 - 4 pm

Hannah Voß will present her data science research on mass spectrometry

We, the graduate school Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (DASHH), the Leibniz ScienceCampus InterACt (LSC InterACt) and the Center for Data and Computing in Natural Sciences (CDCS), are organizing a new series of networking events entitled Data Science for Science (DS4S) to enable exchange and networking across the disciplines represented in the Hamburg metropolitan region. We invite all interested researchers to join the biweekly events, which are scheduled on Wednesdays at 4 pm.

Researchers from all partner institutions are invited to present different aspects of (envisioned) data science research and the corresponding application fields. The new series is an extension of our former Hamburg COVID-19 Series, which received a lot of attention and positive feedback, but it now has a much broader scope. The event is currently held online, but we hope to extend the series to an on-site event with talks and a subsequent networking event by June.

The next lecture will be presented by Hannah Voß (Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf) on May 25th, 2022 at 4 pm. In her talk, she will introduce recent data science challenges in the field of mass spectrometry. The title of the presentation is "Data integration of in-house and publicly available proteome data across tissue types, quantification techniques and experimental setups overcomes cohort size limitations and enables valid statistical analysis for rare samples".

Investigating the proteome is crucial for understanding molecular changes in diseases, as the proteome represents the pharmacologically addressable phenotype. Small cohorts limit the usability and validity of statistical methods, especially for rare diseases, while variable technical setups and high numbers of missing values make data integration from public sources challenging. Here, we show for the first time the successful integration of proteomic data across different tissue types (fresh frozen, formalin-fixed paraffin-embedded (FFPE)), quantification platforms (DDA, DIA, SILAC, TMT) and technical setups, while handling missing data without the need for error-prone imputation. The developed framework can remove technical batch effects through a Bayesian framework or a linear regression model and is adaptable to different data probability distributions, according to the user’s needs.
Based on different datasets we show that data integration across independent proteomic cohorts can help to identify subpopulations and to disclose molecular signatures and altered pathways in biomarker discovery studies.

If you want to contribute to this series and are looking for collaborations with researchers from Hamburg concerning a specific method or application field, feel free to contact us via the contact button.

If you are interested in our talks and want to receive regular updates on this series, subscribe to our mailing list.

We are especially grateful for the support of this series by the Joachim Herz Foundation.