Software

I maintain a growing collection of open-source tools, analysis pipelines, and teaching materials on GitHub. The work spans genomic analysis, clinical modelling, data visualisation, and reproducible research workflows.


Areas of work

Clinical decision support & machine learning

Models and pipelines developed for the Neotree project and related clinical AI work, covering data preprocessing, model training, validation frameworks, and performance evaluation in low-resource settings.

Genomics & bioinformatics

Tools for working with genetic and sequencing data: population structure analysis, variant annotation, GWAS workflows, and integration with standard bioinformatics formats (VCF and PLINK BED/BIM/FAM filesets).

Statistics & teaching materials

Notebooks, tutorials, and worked examples covering statistical methods in R and Python, from regression and survival analysis to dimensionality reduction and clustering. They are designed for researchers learning to apply quantitative methods to their own data.

Data visualisation

R (ggplot2) and Python (matplotlib, seaborn, plotly) scripts and templates for producing publication-quality figures and exploratory dashboards from clinical and genomic data.


Languages & tools

Python

pandas · NumPy · scikit-learn · matplotlib · seaborn · plotly · statsmodels · lifelines

R

tidyverse · ggplot2 · Bioconductor · survival · caret · randomForest · lme4

SQL

PostgreSQL · SQLite · data modelling for clinical and research databases

Reproducible research

Git · GitHub Actions · R Markdown · Jupyter · Docker · Quarto


Contribute or collaborate

If you find any of my tools useful, have suggestions, or would like to collaborate on a project, please open an issue on the relevant repository or get in touch directly.