Next-Generation Sequencing and Bioinformatics workshops and training conducted:
Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome sequencing training using Oxford Nanopore. Training provided to staff, University of Jember, Indonesia.
Next-generation sequencing - Oxford Nanopore Sequencing, Bioinformatics and Molecular Diagnostic Techniques. Conducted at CERID-LAUTECH - Center for Emerging and Re-emerging Infectious Diseases (https://cerid.lautech.edu.ng/), Humboldt-Bayer Foundation Research Hub, Nigeria and Sudan.
Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome Sequencing training using Oxford Nanopore. Training provided to PANDORA and CANTAM (https://www.pandora-id.net) partners, with assistance in capacity building; Republic of Congo, Zambia, Gabon, Ghana, Nigeria and Sierra Leone. https://www.pandora-id.net/news
Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome Sequencing training using Oxford Nanopore. Training provided to VG-CARE and Vietnam Military Medical University (VMMU) staff, Hanoi, Vietnam. https://www.vgcare.org/news-and-events/
TRAINING MODULES
A. THEORY
Basic concepts of NGS
Application of genomics in infectious disease research
Metagenomics, targeted sequencing and WGS
Different sequencing platforms (Illumina, ONT and PacBio) and their applications
Special emphasis on Nanopore sequencing platform and its applications
Experimental designs
Standard operating procedures (SOPs)
Internal quality assessment (IQA)
Scaling-up lab infrastructure and computational setup
Cloud computing
Global public genomic databases
Best practices for data sharing
B. WET-LAB MODULE
Experimental design
Barcoding, library preparation, quantification and pooling
Sequencing of bacterial and viral genomes
16S rRNA gene sequencing for bacterial pathogen identification
SARS-CoV-2 sequencing
Bacterial DNA isolation in the context of AMR
C. BIOINFORMATICS MODULE
Introduction to Bioinformatics
Basics of the Linux operating system (Ubuntu) and its use in data analysis
An overview of the pipelines and programming languages commonly used in developing bioinformatic tools [Python (www.python.org/), R (www.r-project.org/about.html), Bash (www.gnu.org/software/bash/), and Visual Studio (visualstudio.microsoft.com/)]
Links for getting familiar with the technology and experimental design
https://nanoporetech.com/
https://nanoporetech.com/nanopore-sequencing-data-analysis
https://www.youtube.com/c/OxfordNanoporeTechnologies/playlists
E. BIOINFORMATICS
3. Introduction to bioinformatics: The languages currently popular for bioinformatics work are Python, Java, R, Perl, and Bash; of these, Python, R and Bash are the most commonly used. The majority of bioinformatics and computational biology software is developed only for Linux and is command-line based (i.e., launched by entering a command in a terminal window rather than through a graphical user interface). If you are a beginner in bioinformatics without any prior knowledge of Linux or any programming language, you can still do the data analysis, as the bioinformatics workflows for Nanopore are meant to cater to the needs of biologists who want to do sequence analysis by themselves.
The following lectures on YouTube give a good insight into the basics of bioinformatics:
This lecture focuses on the analysis of raw data obtained from the MinKNOW software
https://nanoporetech.com/resource-centre/bioinformatics-workflows-sars-cov-2-raw-nanopore-reads-consensus-genomes-using
Besides basic knowledge of Linux, familiarity with Miniconda, Docker and GitHub is essential. For editing code, Visual Studio Code is also nice to have and can be easily installed on any OS. These tools are typically run on Linux and Mac operating systems. However, if your PC has Windows 10 as its OS, you can still install Linux and share all your data between the two operating systems without having to reboot the PC.
4. File systems and terminology:
Pipeline: a series of software tools and algorithms that process raw sequencing data and generate a consensus sequence and a list of annotated sequence variants
Reference sequences: The Reference Sequence (RefSeq) database is an open-access, annotated and curated collection of publicly available nucleotide sequences
MultiQC: aggregates results from bioinformatics analyses across many samples into a single report
Sequence alignment: arranges the reads/sequences against each other based on their similarities
Contig: a set of overlapping DNA segments that together represent a consensus region of DNA
Consensus sequence: a DNA sequence that represents a set of aligned, related sequences
De novo assembly: De novo sequence assemblers are a type of program that assembles short nucleotide sequences into longer ones without the use of a reference genome.
Environments: a software environment is the collection of software (interpreter, libraries and tools) that supports an application.
FAST5: files containing raw signal data from Nanopore sequencing that can be used for basecalling.
FASTQ: a format that contains nucleotide sequence data together with per-base quality scores (obtained by basecalling the FAST5 raw data)
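To make the FASTQ entry above more concrete, here is a minimal Python sketch of how a FASTQ file is laid out: each read occupies four lines (a header starting with "@", the sequence, a "+" separator, and a quality string of the same length as the sequence). The sketch assumes the common Phred+33 quality encoding and uses a made-up two-read example (the read names and sequences are hypothetical). It is meant for learning the format only; real analyses would normally rely on the established workflow tools.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is skipped
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # The read ID is the first word after the leading '@'
        yield header[1:].split()[0], sequence, quality


def mean_phred(quality):
    """Mean quality score of a read, assuming Phred+33 ASCII encoding."""
    if not quality:
        return 0.0
    return sum(ord(char) - 33 for char in quality) / len(quality)


# Hypothetical example data; real files are produced by the basecaller.
EXAMPLE = "@read1 sample=demo\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
for read_id, seq, qual in records:
    print(read_id, len(seq), mean_phred(qual))
```

To read a real file instead of the example string, the same function can be given an opened file handle, for instance `open("reads.fastq")`, where the file name is again only a placeholder.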
How to install Ubuntu (18 or 20) on VirtualBox: https://www.youtube.com/watch?v=8yLd7PtIMmA
How to install Ubuntu directly on Windows 10 (WSL): https://www.youtube.com/watch?v=X-DHaQLrBi8
6. Conda and Miniconda: Conda is an open-source package management and environment management system that runs on Windows, macOS and Linux. It was created for Python programs, although it can package and distribute software for any language. As a package manager, Conda helps you find and install packages. You will learn more about this package manager as you use it regularly. Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others. To know more about Conda / Miniconda, please refer to the following website: https://docs.conda.io/projects/conda/en/latest/
7. Docker: Docker is an open-source software platform to build, deploy and execute applications using containers. As the name indicates, containerization isolates applications, with each container having a separate environment while sharing the underlying OS kernel. Docker therefore provides a good platform for running multiple applications at the same time, each needing its own environment. Containers are similar to virtual machines (VMs), but they start faster and use fewer system resources. Each Docker container is created from an image file that defines the run-time environment for the intended application.
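As an illustration of the image/container idea described for Docker, the following Python sketch assembles a `docker run` command that would start a container from an image and share a data folder with it. The image name `example/nanopore-tools:latest` and the folder `/home/user/run01` are hypothetical placeholders, not part of the course material. The sketch only builds and prints the command; it does not run Docker.

```python
import shlex


def docker_run_command(image, data_dir, tool_args, mount_point="/data"):
    """Build (but do not execute) a `docker run` command as an argument list."""
    return [
        "docker", "run",
        "--rm",                             # remove the container when it exits
        "-v", f"{data_dir}:{mount_point}",  # share a host folder with the container
        "-w", mount_point,                  # working directory inside the container
        image,                              # image that defines the environment
        *tool_args,                         # command to run inside the container
    ]


# Hypothetical image name and data folder, used for illustration only.
cmd = docker_run_command(
    "example/nanopore-tools:latest",
    "/home/user/run01",
    ["ls", "-l"],
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(cmd, check=True)` once Docker is installed, without worrying about quoting spaces in folder names.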
Installing Docker
Docker can be easily installed on WSL2-based Windows by following the instructions at the URL below:
https://docs.docker.com/docker-for-windows/wsl/
8. GitHub: GitHub hosts an open-source community where people around the globe work together on open-source projects and make contributions. GitHub is also a code hosting platform for collaboration and version control. Importantly, GitHub hosts the repositories of many bioinformatics pipelines, and in our work, we use GitHub to download Nanopore workflows.
For more information and basic commands on GitHub, please refer to
https://guides.github.com/introduction/git-handbook/
9. Visual Studio Code: This editor can be downloaded freely from https://code.visualstudio.com for your OS. It is very useful software for editing code, and it also provides an excellent platform that integrates the Linux terminal, Git and Docker.
10. Nanopore MinION data analysis: In our program, we use NGS pipelines developed by the ARTIC Network project. These workflows can be modified to suit the analysis of Ebola, SARS-CoV-2, influenza viruses etc. For further details on ARTIC Network, please refer to