Projects

Selected research projects from the lab.

Libra project visual

Libra

Libra is a temporally aware multimodal large language model designed for chest X-ray report generation. By leveraging temporal sequences of medical images, Libra enhances the understanding of disease progression and improves the accuracy of automated radiology report generation. This approach aids clinicians by providing more context-aware and consistent reporting over time.

CancerMine project visual

CancerMine

This project uses natural language processing to identify cancer genes and their roles in different cancers (e.g. as drivers, oncogenes or tumor suppressors). This information can be used to help identify important cancer mutations and understand the underlying genetics of different cancer types. This work was published in Nature Methods.

CIViCmine project visual

CIViCmine

Precision oncology enables scientists and clinicians to probe the genetics of individual patients' tumours. But the clinical relevance of each mutation can be hard to ascertain and reference to the latest research is often required. The CIViC database aims to curate this expert knowledge. As part of this, the CIViCmine project uses natural language processing to identify mentions of cancer mutations and their clinical impacts. This work was published in Genome Medicine.

CoronaCentral project visual

CoronaCentral

During the coronavirus pandemic, a vast number of research papers related to COVID-19 were published. The CoronaCentral resource categorized these using a BERT-based classifier along with a unique dataset of categorized documents. It provided a portal to explore these research papers. This work was published in PNAS.

FusionDTI project visual

FusionDTI

Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and proteins. To address this issue, we introduce FusionDTI, which uses a token-level Fusion module to effectively learn fine-grained information for drug-target interaction. This work was presented at the ICML 2024 AI for Science Workshop.

FusionGDA project visual

FusionGDA

Identifying associations between genes and diseases is critical for diagnosis, prevention, prognosis and drug development. We propose FusionGDA, which utilises a pre-training phase with a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models. This work was published in Briefings in Bioinformatics.

BPP project visual

BPP

Biological pathways are series of interconnected biochemical reactions that support life activities. Current research relies heavily on experiments and manual analysis, overlooking the rich topological information within pathway networks. We develop a Biochemical Pathway Prediction (BPP) platform to automatically identify potential connections within these pathways. BPP can predict participants or products in biochemical reactions and includes tools to interpret these predictions. This work was published in Briefings in Bioinformatics.