David Zhang
Bioinformatics software engineer with experience operating across the entire software development lifecycle. Skilled in prototyping and benchmarking innovative solutions, as well as implementing, testing, and integrating software into production-ready pipelines.
Work Experience
Senior bioinformatics engineer
London, UK (hybrid)
Present - 2024
- Optimised and scaled machine learning tools to extract actionable insights from single-cell Perturb-seq datasets comprising millions of cells. Directed the project end-to-end, integrating findings to inform strategic decisions and guide company direction.
- Designed and deployed a robust data pipeline that ingested, tidied and version-controlled data for the Neo4j knowledge graph. Deployed the graph via CI/CD using Terraform, enabling automated releases to AWS and improving reproducibility and operational efficiency.
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
2024 - 2022
- Designed, benchmarked, and productionised scalable bioinformatics pipelines in Nextflow to process solid tumour sequencing data. Pipelines included alignment, variant calling, driver mutation annotation, and therapy matching, supporting clinical and translational applications.
- Built a suite of Python and R packages to automate the clinical verification process, enabling earlier detection and resolution of issues. This automation cut verification time, previously 1 month per quarterly release, significantly accelerating the development cycle.
Bioinformatician internship (2 months)
London, UK (remote)
2021
- Created a reproducible aberrant splicing detection pipeline using Docker for drug target discovery in C9orf72 ALS patients.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Thesis: Using transcriptomics to improve the genetic diagnosis rate of rare disease patients.
- Developed ggtranscript, an open-source R package for visualizing transcript structures, which has received 150+ stars on GitHub and 250+ citations.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Thesis: The role of mitochondrial dysfunction in Xeroderma pigmentosum.
- Grade: Merit (68%)
- Awarded post-graduate support scheme bursary (£10,000)
BSc, Biomedical science
University College London
London, UK
2015 - 2012
- Thesis: Investigating the function of CYFIP1 in the development of rat hippocampal neurons.
- Grade: 2:1 (69%)
H.S.
Queen Elizabeth’s School
Barnet, UK
2012 - 2007
- Grades: Maths (A*), Biology (A*), Chemistry (A*), Sociology (A).
Software & programming
Portfolio website
Present - 2022
- My website is built using Django/Bootstrap 5, deployed with Heroku and showcases the five projects I’m most fond of.
Python packages
2023 - 2021
- codino: Convert a codon design to the expected amino acid frequencies, and vice versa. Author.
- autogroceries: Use Selenium to automate your grocery shop. Author.
- stravaboard: A dashboard for flexibly displaying and tracking Strava runs built using Streamlit. Author.
R packages
2022 - 2020
- ggtranscript: Visualising transcript structure and annotation using ggplot2. Author.
- megadepth: BigWig and BAM related utilities. An R wrapper for the megadepth software developed by Chris Wilks. Co-author.
- dasper: Detection of aberrant splicing events in RNA-sequencing. Author.
Selected Publications
A complete list of publications is available via Google Scholar
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
2022
- Gustavsson EK, Zhang D, Reynolds RH, Garcia-Ruiz S, Ryten M
- Role: Co-first author
- DOI: https://doi.org/10.1093/bioinformatics/btac409
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
2021
- Collier J, Guissart C, Oláhová M, Sasorith S, Piron-Prunier F, Suomi F, Zhang D, Martinez-Lopez N, Leboucq N, Bahr A, Azzarello-Burri S, Reich S, Schöls L, Polvikoski TM, Meyer P, Larrieu L, Schaefer AM, Alsaif HS, Alyamani S, Zuchner S, Barbosa IA, Deshpande C, Pyle A, Rauch A, Synofzik M, Alkuraya FS, Rivier F, Ryten M, McFarland R, Delahodde A, McWilliams TG, Koenig M, and Taylor RW.
- Role: Co-first author
- DOI: https://doi.org/10.1056/NEJMoa1915722
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
2021
- Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B.
- Role: R package developer.
- DOI: https://doi.org/10.1093/bioinformatics/btab152
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders
Science Advances
2020
- Zhang D, Guelfi S, Ruiz SG, Costa B, Reynolds RH, D’Sa K, Liu W, Courtin T, Peterson A, Jaffe AE, Hardy J, Botia JA, Collado-Torres L and Ryten M.
- Role: First Author.
- DOI: https://doi.org/10.1126/sciadv.aay8299