Evaluation and Comparison of Hadoop Technologies for Genetic Data Analyses
| Thesis Type | Master |
| Thesis Status | Finished |
| Student | Clemens Banas |
| Final | |
| Start | |
| Thesis Supervisor | |
| Contact | |
| Research Field | |
As data volumes in genetics continue to grow, scalable big data technologies are key to processing large genomic studies. Choosing the right technology is crucial, and Apache Hadoop and Apache Spark are two promising candidates for meeting these demands. The aim of this thesis is to compare the advantages and disadvantages of these state-of-the-art technologies and to evaluate them on the three most important genetic data formats: FASTQ, BAM, and VCF.