David Zhang

By bridging bioinformatics and engineering, I translate genetic and transcriptomic data into software that delivers real-world impact. I have led cross-functional projects across the full software development lifecycle, from prototyping innovative solutions to implementing and maintaining robust, production-ready pipelines.

Work Experience

Senior bioinformatics engineer

CoSyne Therapeutics

London, UK (hybrid)

Present - 2024

Lead the optimisation and scaling of machine learning tools for single-cell perturb-seq data comprising millions of cells. Collaborate closely with AI, engineering, and computational biology teams, ensuring key internal stakeholders are consistently informed of progress. Apply these tools to generate actionable insights and inform strategic decisions around company direction.
Design and deploy a data pipeline to ingest, tidy and version-control data for the CoSyne knowledge graph. Automate the release of the graph to AWS using Terraform and CI/CD, improving the efficiency and traceability of data updates.
Build and maintain infrastructure tooling, including Docker images, Terraform modules, CI/CD workflows and cruft templates, to streamline bioinformatics analyses.

Senior bioinformatics software engineer

Congenica

Hinxton, UK (hybrid)

2024 - 2022

Developed scalable Nextflow pipelines to process solid tumour DNA-sequencing data, covering alignment, variant calling, driver mutation annotation, and therapy matching.
Collaborated with clinical and bioinformatics teams to investigate driver variant misclassifications. Led the design, refinement, and implementation of solutions within an agile scrum team, effectively translating complex scientific concepts for engineers without a bioinformatics background to ensure accurate and aligned development.
Built Python and R packages to improve the efficiency of clinical verification, reducing the time taken by 2 weeks per quarterly release.

Bioinformatician internship (2 months)

Verge Genomics

London, UK (remote)

2021

Created a reproducible aberrant splicing detection pipeline using Docker for drug target discovery in C9orf72 ALS patients.

Education

PhD, Bioinformatics

University College London

London, UK

2022 - 2017

Analysed bulk RNA-sequencing data with the aim of improving the diagnosis rate of rare disease patients. Focussed on detection of aberrant splicing events as a strategy to prioritise pathogenic variants.
Released R/Bioconductor packages that enable bioinformatics analyses and interpretation. Championed best practices for software development through teaching workshops and courses.

MSc, Neuroscience

University College London

London, UK

2016 - 2015

Grade: Merit (68%)

BSc, Biomedical science

University College London

London, UK

2015 - 2012

Grade: 2:1 (69%)

Open-source software

Web development

Present - 2022

Portfolio website: Showcases my favourite open-source contributions. Built with Django and deployed using PythonAnywhere.
automeals: Automate your meal planning.

Rust packages

2024

tuni: Unify transcript identifiers across different samples.

Python packages

2023 - 2021

autogroceries: Use Playwright to automate your grocery shop.
rightprice: Retrieve the prices of sold properties in the UK.
gust: Generate offshore wind news reports using Claude.
stravaboard: An extendable Streamlit dashboard for tracking Strava runs.

R packages

2022 - 2020

ggtranscript: Visualising transcript structure and annotation using ggplot2.
dasper: Detection of aberrant splicing events in RNA-sequencing data.

Selected Publications

A complete list of my publications is available via Google Scholar.

ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2

Bioinformatics

2022

Role: Co-first author, R package developer.

Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans

The New England Journal of Medicine

2021

Role: Analyst.

Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders

Science Advances

2020

Role: First author, lead analyst.
