Projects

CancerMine

This project uses natural language processing to identify cancer genes and their roles in different cancers (e.g. as drivers, oncogenes or tumor suppressors). This information can be used to help identify important cancer mutations and understand the underlying genetics of different cancer types. This work was published in Nature Methods. The dataset can be explored through this web viewer, the code is available at GitHub and the dataset is available at Zenodo.

CIViCmine

Precision oncology enables scientists and clinicians to probe the genetics of individual patients’ tumours. However, the clinical relevance of each mutation can be hard to ascertain, and interpreting it often requires reference to the latest research. The CIViC database aims to curate this expert knowledge. As part of this, the CIViCmine project uses natural language processing to identify mentions of cancer mutations and their clinical impacts. This work was published in Genome Medicine. The dataset can be explored through this web viewer, the code is available at GitHub and the dataset is available at Zenodo.

CoronaCentral

During the coronavirus pandemic, a vast number of research papers related to COVID-19 were published. The CoronaCentral resource categorized these using a BERT-based classifier along with a unique dataset of categorized documents. It provided a portal to explore these research papers. This work was published in PNAS. The code can be accessed at GitHub and the corpus of documents at Zenodo.