David Zhang
By bridging bioinformatics and engineering, I translate genetic and transcriptomic data into software that delivers real-world impact. I have led cross-functional projects across the full software development lifecycle, from prototyping innovative solutions to implementing and maintaining robust, production-ready pipelines.
Work Experience
Senior bioinformatics engineer
London, UK (hybrid)
Present - 2024
- Lead the optimisation and scaling of machine learning tools for single-cell Perturb-seq data comprising millions of cells. Collaborate closely with AI, engineering, and computational biology teams, keeping key internal stakeholders consistently informed of progress. Apply these tools to generate actionable insights and inform strategic decisions on company direction.
- Design and deploy a data pipeline to ingest, tidy and version-control data for the CoSyne knowledge graph. Automate the release of the graph to AWS using Terraform and CI/CD, improving the efficiency and traceability of data updates.
- Build and maintain infrastructure tooling, including Docker images, Terraform modules, CI/CD workflows and cruft templates, to streamline bioinformatics analyses.
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
2024 - 2022
- Developed scalable Nextflow pipelines to process solid tumour DNA-sequencing data, covering alignment, variant calling, driver mutation annotation, and therapy matching.
- Collaborated with clinical and bioinformatics teams to investigate driver variant misclassifications. Led the design, refinement, and implementation of solutions within an agile scrum team, translating complex scientific concepts for engineers without a bioinformatics background to keep development accurate and aligned.
- Built Python and R packages to improve the efficiency of clinical verification, cutting verification time by 2 weeks per quarterly release.
Bioinformatics internship (2 months)
London, UK (remote)
2021
- Created a reproducible, Docker-based aberrant splicing detection pipeline to support drug target discovery in C9orf72 ALS patients.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Analysed bulk RNA-sequencing data with the aim of improving the diagnostic rate for rare disease patients. Focused on detecting aberrant splicing events as a strategy to prioritise pathogenic variants.
- Released R/Bioconductor packages that enable bioinformatics analyses and interpretation. Championed software development best practices by teaching workshops and courses.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Grade: Merit (68%)
BSc, Biomedical Science
University College London
London, UK
2015 - 2012
- Grade: 2:1 (69%)
Open-source software
Web development
Present - 2022
- Portfolio website: Showcases my favourite open-source contributions. Built with Django and deployed using PythonAnywhere.
Python packages
2023 - 2021
- autogroceries: Use Selenium to automate your grocery shopping.
- stravaboard: An extendable Streamlit dashboard for tracking Strava runs.
R packages
2022 - 2020
- ggtranscript: Visualising transcript structure and annotation using ggplot2.
- dasper: Detection of aberrant splicing events in RNA-sequencing data.
Selected Publications
A complete list of my publications is available via Google Scholar.
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
2022
- Role: Co-first author, R package developer.
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
2021
- Role: Analyst.
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
2021
- Role: R package developer.
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders
Science Advances
2020
- Role: First author, lead analyst.