Main
David Zhang
Bioinformatics software engineer with experience developing production-ready pipelines and software in Python and R.
Work Experience
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
Present - 2022
- Goal: Developing, benchmarking and productionising bioinformatic pipelines for the precision oncology product.
- Engineering Nextflow and Snakemake pipelines that perform alignment, variant calling, driver annotation and therapy matching using solid tumour sequencing data.
Machine learning engineer
London, UK (remote)
2022
- Goal: Implement Python packages that leverage RNA biology and chemistry to accelerate drug discovery.
Bioinformatician internship
London, UK (remote)
2021
- Goal: Set up an aberrant splicing detection pipeline for drug target discovery in C9orf72 ALS patients.
- Used Docker to set up a reproducible workflow for running aberrant splicing analyses on an AWS instance.
Research Technician
University College London
London, UK
2017 - 2016
- Goal: Investigate the impact of genetic variation on cognition and the age of onset of dementia in Down syndrome patients.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Thesis: Using transcriptomics to improve the genetic diagnosis rate of rare disease patients.
- Developed and released software that facilitates transcriptomic analyses with a focus on diagnostics.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Thesis: The role of mitochondrial dysfunction in Xeroderma pigmentosum.
- Grade: Merit (68%)
- Awarded a postgraduate support scheme bursary (£10,000).
BSc, Biomedical science
University College London
London, UK
2015 - 2012
- Thesis: Investigating the function of CYFIP1 in the development of rat hippocampal neurons.
- Grade: 2:1 (69%)
A-levels
Queen Elizabeth’s School
Barnet, UK
2012 - 2007
- Grades: Maths (A*), Biology (A*), Chemistry (A*), Sociology (A).
Software & programming
Portfolio website
2022
- My website is built using Django/Bootstrap 5, deployed with Heroku and showcases the five projects I’m most fond of.
R packages
2022 - 2021
- ggtranscript: Visualising transcript structure and annotation using ggplot2. Author and maintainer.
- autorecipes: Automate your recipe planning. Author and maintainer.
- ODER: Optimising the definition of Expressed Regions. Submitted to Bioconductor. Co-author and maintainer.
Bioconductor packages
2022 - 2020
- megadepth: BigWig and BAM related utilities. An R wrapper for the megadepth software developed by Chris Wilks. Co-author.
Python packages
2022 - 2021
- autogroceries: Automate your grocery shop. Author and maintainer.
- stravaboard: A dashboard for flexibly displaying and tracking Strava runs built using Streamlit. Author and maintainer.
- codino: Convert a codon design to the expected amino acid frequencies, and vice versa. Author and maintainer.
Web scraping
2021
- Used the Python packages Beautiful Soup and Selenium to scrape information on all UK biotechnology companies from the web.
Teaching Experience
Developing Bioconductor packages
University College London
Virtual Event
2020
R package development
Rstats club
Virtual Event
2020
- Presentation about unit testing fundamentals, the importance of testing and the new features released in the 3rd edition of the R package testthat.
- Presentation about pre-commit hooks in R.
- Presentation about the best practices of developing R packages.
R fundamentals
Clinician Coders
London, UK
2020 - 2018
- Developed materials and led workshops that aimed to teach R fundamentals to clinicians.
RNA-sequencing for diagnostics
King's College London
London, UK
2020 - 2017
- Lectured graduate-level students on how transcriptomics can be applied in the diagnostic pipeline.
Selected Publications
A complete list of publications is available via Google Scholar
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
2022
- Gustavsson EK, Zhang D, Reynolds RH, Garcia-Ruiz S, Ryten M
- Role: Co-first author.
- DOI: https://doi.org/10.1093/bioinformatics/btac409
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
2021
- Collier J, Guissart C, Oláhová M, Sasorith S, Piron-Prunier F, Suomi F, Zhang D, Martinez-Lopez N, Leboucq N, Bahr A, Azzarello-Burri S, Reich S, Schöls L, Polvikoski TM, Meyer P, Larrieu L, Schaefer AM, Alsaif HS, Alyamani S, Zuchner S, Barbosa IA, Deshpande C, Pyle A, Rauch A, Synofzik M, Alkuraya FS, Rivier F, Ryten M, McFarland R, Delahodde A, McWilliams TG, Koenig M, and Taylor RW.
- Role: Analyst.
- DOI: https://doi.org/10.1056/NEJMoa1915722
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
2021
- Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B.
- Role: R package developer.
- DOI: https://doi.org/10.1093/bioinformatics/btab152
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders
Science Advances
2020
- Zhang D, Guelfi S, Ruiz SG, Costa B, Reynolds RH, D’Sa K, Liu W, Courtin T, Peterson A, Jaffe AE, Hardy J, Botia JA, Collado-Torres L and Ryten M.
- Role: First author.
- DOI: https://doi.org/10.1126/sciadv.aay8299