Bioinformatics

ngs.training.workshop_25.08.2022.pdf
ngs_sops.pdf
human_genetics_applications.pdf

Next-Generation Sequencing and Bioinformatics workshops and training conducted:

Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome sequencing training using Oxford Nanopore. Training provided to staff of the University of Jember, Indonesia.
Next-generation sequencing - Oxford Nanopore Sequencing, Bioinformatics and Molecular Diagnostic Techniques. Conducted at CERID-LAUTECH (Center for Emerging and Re-emerging Infectious Diseases, https://cerid.lautech.edu.ng/), Humboldt-Bayer Foundation Research Hub, Nigeria and Sudan.
Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome sequencing training using Oxford Nanopore. Training provided to PANDORA and CANTAM partners (https://www.pandora-id.net), with assistance in capacity building in the Republic of Congo, Zambia, Gabon, Ghana, Nigeria and Sierra Leone. https://www.pandora-id.net/news
Next-generation sequencing and bioinformatics: SARS-CoV-2 – Genome sequencing training using Oxford Nanopore. Training provided to VG-CARE and Vietnam Military Medical University (VMMU) staff, Hanoi, Vietnam. https://www.vgcare.org/news-and-events/

TRAINING MODULES
A. THEORY

Basic concepts of NGS
Application of genomics in infectious disease research
Metagenomics, targeted sequencing and WGS
Different sequencing platforms (Illumina, ONT and PacBio) and their applications
Special emphasis on Nanopore sequencing platform and its applications
Experimental design
Standard operating procedures (SOPs)
Internal quality assessment (IQA)
Scaling up lab infrastructure and computational setup
Cloud computing
Global public genomic databases
Best practices for data sharing

B. WET-LAB MODULE

Experimental design
Barcoding, library preparation, quantification and pooling
Sequencing of bacterial and viral genomes
16S rRNA gene sequencing for bacterial pathogen identification
SARS-CoV-2 sequencing
Bacterial DNA isolation in the context of AMR
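Quantification and pooling usually involve converting each library's measured mass concentration (ng/uL) into a molar concentration, so that barcoded libraries can be pooled in equal molar amounts. The Python sketch below shows that arithmetic, assuming double-stranded DNA with an average mass of about 660 g/mol per base pair. The sample names, concentrations, fragment lengths and the 50 fmol target are hypothetical values for illustration only. They are not recommendations from any sequencing kit protocol.

```python
"""Equimolar pooling sketch for barcoded dsDNA libraries (hypothetical values)."""

# Assumed average molar mass of one double-stranded base pair (g/mol).
BP_MASS_G_PER_MOL = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/uL to nM (1 nM == 1 fmol/uL)."""
    # ng/uL equals mg/L; dividing by the molar mass (g/mol) and scaling gives nmol/L.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def pooling_volumes(samples: dict[str, tuple[float, int]], target_fmol: float) -> dict[str, float]:
    """Return the volume (uL) of each library needed to contribute target_fmol to the pool."""
    volumes = {}
    for name, (conc_ng_per_ul, length_bp) in samples.items():
        molarity = ng_per_ul_to_nM(conc_ng_per_ul, length_bp)  # fmol per uL
        volumes[name] = target_fmol / molarity
    return volumes


if __name__ == "__main__":
    # Hypothetical libraries: name -> (concentration in ng/uL, mean fragment length in bp)
    libraries = {
        "barcode01": (10.0, 1000),
        "barcode02": (5.0, 1000),
        "barcode03": (12.0, 1500),
    }
    for name, vol in pooling_volumes(libraries, target_fmol=50.0).items():
        print(f"{name}: {vol:.2f} uL")
```

Libraries with lower concentration or longer fragments need larger volumes to contribute the same number of molecules. This is why the calculation uses molarity rather than mass.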

C. BIOINFORMATICS MODULE

Introduction to Bioinformatics
Basics of the Linux operating system (Ubuntu) and its use in data analysis
Overview of the programming languages and tools commonly used to develop bioinformatics pipelines and programmes [Python (www.python.org/), R (www.r-project.org/about.html), Bash (www.gnu.org/software/bash/), and Visual Studio (visualstudio.microsoft.com/)]
Platforms for sharing code, building workflows and running containers [GitHub (https://github.com), Nextflow (www.nextflow.io/), Docker (www.docker.com), Singularity (sylabs.io/singularity/)]
Installing open-source bioinformatics packages with Conda (docs.conda.io/projects/conda)
Special emphasis on the bioinformatics pipelines for bacterial whole-genome analysis for AMR detection and surveillance.
Submission to public databases and the WHO GLASS database (www.who.int/initiatives/glass)
Application of machine learning (www.youtube.com/watch?v=NyEWTDeBc0k) to bacterial genomes to predict AMR.
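Machine-learning approaches to AMR prediction often represent each genome as a vector of k-mer counts before a classifier is trained. The sketch below covers only that feature-extraction step and uses no external libraries. It assumes DNA strings and an illustrative k of 3. Published studies use longer k-mers, full assemblies and a dedicated ML library. The toy sequences and isolate names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any k-mer that contains a non-ACGT base (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_vector(seq: str, k: int) -> list[int]:
    """Return counts for all 4**k possible k-mers in a fixed lexicographic order."""
    counts = kmer_counts(seq, k)
    # A Counter returns 0 for absent keys, so every genome gets a vector of equal length.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    # Hypothetical toy "genomes"; each vector would be one row of the feature matrix.
    genomes = {"isolate_A": "ATGCGTACGTTAGC", "isolate_B": "ATGAAAACGTTTGC"}
    for name, seq in genomes.items():
        row = feature_vector(seq, k=3)
        print(name, len(row), sum(row))
```

Using a fixed k-mer order means every genome gets a vector of the same length. The rows can therefore be stacked into a matrix and paired with resistant/susceptible labels from phenotypic testing.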

D. NEXT-GENERATION SEQUENCING (Nanopore and Illumina platforms)
1. Illumina: (will be added soon)
2. Oxford Nanopore:

Links to explore Nanopore technology further

https://nanoporetech.com/how-nanopore-sequencing-works
https://nanoporetech.com/nanopore-sequencing-data-analysis

Videos on flow cells and on priming and loading a flow cell

https://www.youtube.com/watch?v=zC6lAtzqi_k
https://www.youtube.com/watch?v=Pt-iaemrM88
https://www.youtube.com/watch?v=IknVaEnuDz0&t=604s

Links to become familiar with the technology and experimental design

https://nanoporetech.com/
https://nanoporetech.com/nanopore-sequencing-data-analysis
https://www.youtube.com/c/OxfordNanoporeTechnologies/playlists
E. BIOINFORMATICS
3. Introduction to bioinformatics:
The languages currently popular for bioinformatics work are Python, Java, R, Perl and Bash, of which Python, R and Bash are the most commonly used. Most bioinformatics/computational biology software is developed for Linux and is command-line based (i.e., launched by entering a command in a terminal window rather than through a graphical user interface).
If you are a beginner in bioinformatics with no prior knowledge of Linux or any programming language, you can still do the data analysis, as the Nanopore bioinformatics workflows are designed to meet the needs of biologists who want to carry out sequence analysis themselves.

The following YouTube lectures give a good insight into the basics of bioinformatics:

https://www.youtube.com/c/SimonCockell/videos (Prof. Simon Cockell)
https://www.youtube.com/watch?v=lhU3CzslFqw
https://www.youtube.com/watch?v=CX0ThuOCrG0

This lecture focuses on the analysis of raw data obtained from the MinKNOW software:

https://nanoporetech.com/resource-centre/bioinformatics-workflows-sars-cov-2-raw-nanopore-reads-consensus-genomes-using
Besides a basic knowledge of Linux, familiarity with Miniconda, Docker and GitHub is essential. For editing code, Visual Studio Code is a useful tool that can be installed easily on Windows, Linux and macOS. If your PC runs Windows 10, you can still install Linux (for example via the Windows Subsystem for Linux, WSL) and share data between the two operating systems without having to reboot the PC.
4. File formats and terminology:

Pipeline: a series of software tools and algorithms that process raw sequencing data to generate a consensus sequence and a list of annotated sequence variants
Reference sequences: The Reference Sequence (RefSeq) database is an open-access, annotated and curated collection of publicly available nucleotide and protein sequences
MultiQC: aggregates results from bioinformatics analyses across many samples into a single report
Sequence alignment: arranges reads/sequences against each other or against a reference sequence to identify regions of similarity
Contig: a contiguous sequence built from a set of overlapping DNA segments (reads) that together represent a consensus region of DNA
Consensus sequence: a single sequence representing the most frequent nucleotide at each position of a set of aligned, related sequences
De novo assembly: De novo sequence assemblers are a type of program that assembles short nucleotide sequences into longer ones without the use of a reference genome.
Environment: a software environment is the set of programs, libraries and settings needed to support an application
FAST5: file format containing the raw signal data from Nanopore sequencing, which is used for basecalling
FASTQ: text format containing nucleotide sequences together with per-base quality scores (obtained by basecalling FAST5 raw data)
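Each FASTQ record has four lines: a header beginning with "@", the sequence, a "+" separator line, and a quality string. In the quality string, each character encodes a Phred score as its ASCII code minus 33 (the Sanger encoding used by current basecallers). The Python sketch below reads such records and reports read length and mean quality. It assumes uncompressed input with exactly four lines per record (no wrapped sequences). The mean is computed through error probabilities rather than by averaging Phred values directly, a common convention in read-QC tools. The example reads are made up.

```python
import io
import math
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        fields = header[1:].split()
        yield FastqRecord(fields[0] if fields else "", seq, qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a quality string into Phred scores (Sanger offset 33)."""
    return [ord(ch) - offset for ch in quality]


def mean_read_quality(quality: str) -> float:
    """Average the per-base error probabilities, then convert back to a Phred score."""
    if not quality:
        raise ValueError("Empty quality string")
    probs = [10 ** (-q / 10) for q in phred_scores(quality)]
    return -10 * math.log10(sum(probs) / len(probs))


if __name__ == "__main__":
    # Two made-up reads for illustration.
    example = io.StringIO(
        "@read1 runid=example\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nGGCCA\n+\n+++++\n"
    )
    for rec in read_fastq(example):
        print(rec.name, len(rec.sequence), round(mean_read_quality(rec.quality), 1))
```

Averaging error probabilities gives a lower (more conservative) mean than averaging Phred values. A few very poor bases therefore pull the read's score down noticeably.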

5. Linux and Bash:
Linux is an open-source operating system that is secure, fast and reliable. It is the preferred platform for bioinformatics workflows because most bioinformatics software is developed for it and it handles large datasets efficiently.
https://www.youtube.com/watch?v=v_1zB2WNN14
https://www.youtube.com/watch?v=V1y-mbWM3B8 (Ubuntu)
https://www.youtube.com/watch?v=x5MhydijWmc
https://www.youtube.com/watch?v=G36I1iqDZig

Beginner's Guide to the Bash Terminal

https://www.youtube.com/watch?v=oxuRxtrO2Ag

Introduction to bash for data analysis

https://www.youtube.com/watch?v=EMaFdfIlK58

Installing Linux Operating System

How to install Ubuntu (18.04 or 20.04) on VirtualBox:
https://www.youtube.com/watch?v=8yLd7PtIMmA
How to install Ubuntu directly on Windows 10 (WSL):
https://www.youtube.com/watch?v=X-DHaQLrBi8
6. Conda and Miniconda:
Conda is an open-source package management and environment management system that runs on Windows, macOS and Linux. It was created for Python programs, although it can package and distribute software for any language. As a package manager, Conda helps you find and install packages; as an environment manager, it lets you keep separate environments with different software versions. You will learn more about it as you use it regularly.
Miniconda is a free minimal installer for conda. It is a small bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, such as pip and zlib. To learn more about Conda/Miniconda, please refer to the following website:
https://docs.conda.io/projects/conda/en/latest/
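When an environment is activated with `conda activate`, Conda sets the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables and places that environment's executables first on `PATH`. The Python sketch below can be run before starting a workflow. It reports which environment is active and whether the command-line tools the workflow needs can be found. The tool names in the example are placeholders. Replace them with the programs your pipeline actually requires.

```python
import os
import shutil


def conda_env_report(required_tools, environ=None):
    """Report the active Conda environment and which required executables are on PATH."""
    env = os.environ if environ is None else environ
    # These variables are set by `conda activate` and are absent when no environment is active.
    active = env.get("CONDA_DEFAULT_ENV")
    prefix = env.get("CONDA_PREFIX")
    search_path = env.get("PATH", "")
    locations = {tool: shutil.which(tool, path=search_path) for tool in required_tools}
    return {
        "active_env": active,
        "prefix": prefix,
        "found": {t: loc for t, loc in locations.items() if loc is not None},
        "missing": sorted(t for t, loc in locations.items() if loc is None),
    }


if __name__ == "__main__":
    # Placeholder tool names; substitute the programs your workflow needs.
    report = conda_env_report(["python3", "minimap2", "samtools"])
    print("Active environment:", report["active_env"] or "none (no Conda environment activated)")
    print("Prefix:", report["prefix"])
    for tool, location in report["found"].items():
        print(f"found   {tool}: {location}")
    for tool in report["missing"]:
        print(f"missing {tool}")
```

Passing `environ` explicitly makes the function easy to test without changing the real shell environment.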
7. Docker:
Docker is an open-source software platform to build, deploy and run applications using containers. Containerization isolates applications: each container has its own separate environment while sharing the underlying OS kernel with the host. Docker therefore provides a good platform for running multiple applications at the same time, each needing its own environment. Containers are similar to virtual machines (VMs), but because they do not run a full guest operating system, they start faster and use fewer system resources. Each Docker container is created from an image, which defines the run-time environment for the intended application.

Installing Docker

Docker can be installed easily on Windows with the WSL 2 backend by following the instructions at the URL below:
https://docs.docker.com/docker-for-windows/wsl/
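To run a containerised tool, you usually mount a host directory into the container so the tool can read your data and write its results. The Python sketch below builds and optionally runs a `docker run` command using three standard options. `--rm` removes the container when it exits, `-v host:container` bind-mounts a directory, and `-w` sets the working directory inside the container. The image name and the command run inside it are hypothetical placeholders, not the documented usage of any particular workflow.

```python
import shlex
import subprocess
from pathlib import Path


def docker_run_cmd(image, command, data_dir, mount_point="/data"):
    """Build a `docker run` argument list that mounts data_dir at mount_point."""
    host_dir = Path(data_dir).resolve()
    return [
        "docker", "run",
        "--rm",                             # delete the container after it exits
        "-v", f"{host_dir}:{mount_point}",  # bind-mount the host data directory
        "-w", mount_point,                  # run the command from the mounted directory
        image,
        *command,
    ]


def run_in_docker(image, command, data_dir, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = docker_run_cmd(image, command, data_dir)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmd


if __name__ == "__main__":
    # Hypothetical image and command; with dry_run=True nothing is executed.
    run_in_docker("example/bioinformatics-tool:latest", ["tool", "--help"], ".")
```

Keeping the default as a dry run lets you check the mount paths before anything is executed, which helps when paths differ between Windows and WSL 2.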
8. GitHub:
GitHub is an online community where people around the globe work together on open-source projects and contribute to them. It is also a code-hosting platform for collaboration and version control based on Git. Importantly, GitHub hosts many bioinformatics pipelines, and in our work we use it to download Nanopore workflows.

For more information and basic Git commands, please refer to:

https://guides.github.com/introduction/git-handbook/
9. Visual Studio Code:
This editor can be downloaded free of charge for your operating system from https://code.visualstudio.com. It is very useful for editing code and also provides an excellent platform that integrates a Linux terminal, Git and Docker.
10. Nanopore MinION data analysis:
In our programme, we use NGS pipelines developed by the ARTIC Network project. These workflows can be adapted to analyse Ebola virus, SARS-CoV-2, influenza viruses and others. For further details on the ARTIC Network, please refer to:

https://artic.network/1-about.html
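Amplicon workflows such as ARTIC produce a consensus genome from reads aligned to a reference. Positions with too little coverage are masked as "N" so that unsupported bases are not reported. The Python sketch below is a deliberately simplified illustration of that idea: a majority-rule consensus with low-depth masking. It assumes the reads are already aligned and each reference position is given as a string of observed bases (a pileup), and it ignores insertions and deletions. The 20x depth threshold and the majority cut-off are illustrative assumptions. The actual ARTIC pipeline works from BAM alignments and variant calls, not from this model.

```python
from collections import Counter


def consensus_from_pileup(pileup, min_depth=20, min_freq=0.5):
    """Majority-rule consensus; low-depth or ambiguous positions become 'N'.

    pileup: list where item i is a string of the bases observed at reference position i.
    Insertions and deletions are ignored in this simplified sketch.
    """
    consensus = []
    for bases in pileup:
        calls = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(calls.values())
        if depth < min_depth:
            consensus.append("N")  # insufficient coverage: mask the position
            continue
        base, count = calls.most_common(1)[0]
        # Call the base only if it is a strict majority; otherwise mask it.
        consensus.append(base if count / depth > min_freq else "N")
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical pileup for four reference positions.
    pileup = ["A" * 25, "C" * 30, "G" * 5, "T" * 12 + "C" * 12]
    print(consensus_from_pileup(pileup))  # ACNN
```

In the example, the third position is masked because only 5 reads cover it. The fourth is masked because neither base reaches a strict majority.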

F. ARTIFICIAL INTELLIGENCE

Introduction to Machine Learning for Biological Data

https://www.youtube.com/watch?v=pQg04M4gfAk

CBW’s Machine Learning workshops

https://www.youtube.com/watch?v=NyEWTDeBc0k&list=PL3izGL6oi0S_zXasgxccctQLNhIyVT_5o

James Zou: "Deep learning for genomics: Introduction and examples"

https://www.youtube.com/watch?v=JYt1IqdDAPc

Introduction to Machine Learning

https://www.youtube.com/watch?v=h0e2HAPTGF4

Artificial Intelligence and Machine Learning

https://www.youtube.com/watch?v=t4K6lney7Zw

Introduction To Machine Learning | What is Machine Learning?

https://www.youtube.com/watch?v=BuezNNeOGCI
