DATA8005 - Distributed Data Management

Title: Distributed Data Management
Long Title: Distributed Data Management
Module Code: DATA8005
 
Credits: 5
NFQ Level: Advanced
Field of Study: Data Format
Valid From: Semester 2 - 2012/13 ( February 2013 )
Module Delivered in 1 programme(s)
Module Coordinator: TIM HORGAN
Module Author: AISLING O DRISCOLL
Module Description: This module introduces students to scalable data management using private and public infrastructure, distributed storage and parallel processing/algorithms, distributed large-scale data mining and machine learning, and the complex ecosystem of "big data" tools and platforms available.
Learning Outcomes
On successful completion of this module the learner will be able to:
LO1 Appraise the trend towards big data, evolving data characteristics and their impact on traditional systems, and evaluate current and emerging technologies.
LO2 Build and deploy a distributed data cluster to store and process large data sets.
LO3 Apply advanced big data programming technologies to transform, process and manage a given data set.
LO4 Design and implement an analytical solution using big data analytics tools to analyse and interpret a massive data set.
Pre-requisite learning
Module Recommendations
This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named CIT module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s).
No recommendations listed
Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list.
No incompatible modules listed
Co-requisite Modules
No Co-requisite modules listed
Requirements
This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.
No requirements listed
Co-requisites
No co-requisites listed
 

Module Content & Assessment

Indicative Content
Background and Motivation
Definitions, changing data characteristics (4 V’s, structured vs unstructured) and the limitations of traditional data management and technologies.
"Big Data" Paradigm and Emerging Technologies
Overview of Big Data technology platforms, analytics, distributed systems and parallelised processing. Scale up vs scale out. Emerging NoSQL systems: key-value stores, document databases, column-based databases and graph databases. Trade-offs between SQL and NoSQL. Evaluation of existing big data technologies: Hadoop (and MapReduce in general), HBase, MongoDB, Cassandra, Voldemort.
Distributed Storage
Distributed File Systems (DFS) concepts and design: file systems, node types, operations, data flow, parallel processing, data integrity in DFS, file compression, serialization, file-based data structures.
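As an illustration of the data integrity topic above, the following is a minimal sketch of how a file might be split into fixed-size blocks that each carry a checksum, so that corruption is detected when the file is read back. The function names `split_into_blocks` and `read_blocks` are hypothetical and chosen for this example only; a real DFS such as HDFS uses its own block and checksum formats, which are not reproduced here.

```python
import zlib


def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks, each paired with a CRC32 checksum."""
    blocks = []
    for start in range(0, len(data), block_size):
        payload = data[start:start + block_size]
        blocks.append((payload, zlib.crc32(payload)))
    return blocks


def read_blocks(blocks):
    """Reassemble the file, raising an error if any block fails its checksum."""
    out = bytearray()
    for index, (payload, checksum) in enumerate(blocks):
        # Recompute the checksum on read and compare with the stored value.
        if zlib.crc32(payload) != checksum:
            raise IOError(f"checksum mismatch in block {index}")
        out.extend(payload)
    return bytes(out)
```

Calling `split_into_blocks(b"abcdefghij", 4)` yields three blocks, the last one shorter than the others. Passing those blocks to `read_blocks` returns the original bytes, and altering any payload makes the read fail.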
Parallelised Processing
MapReduce: map/reduce functions, job scheduling, I/O. Parallelised algorithm design. Input and output splits, record readers and writers, mappers, partitioners and reducers.
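The roles of mapper, partitioner and reducer listed above can be sketched with a single-process word count. This is an in-memory simulation written for illustration, not Hadoop code: the names `mapper`, `partitioner`, `reducer` and `run_job` are assumptions of this example, and a real framework would run each partition on a separate node and handle splits, record readers and scheduling itself.

```python
import zlib
from collections import defaultdict


def mapper(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def partitioner(key, num_reducers):
    """Assign a key to a reducer; CRC32 keeps the assignment stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(records, num_reducers=2):
    """Run map, shuffle and reduce over a list of text records."""
    # Shuffle: group mapper output by partition, then by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in mapper(record):
            partitions[partitioner(key, num_reducers)][key].append(value)
    # Each partition is reduced independently of the others.
    results = {}
    for partition in partitions:
        for key in sorted(partition):
            out_key, out_value = reducer(key, partition[key])
            results[out_key] = out_value
    return results
```

For example, `run_job(["big data big cluster", "data node data"])` counts `big` twice and `data` three times. Because every occurrence of a key goes to the same partition, the result does not depend on the number of reducers.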
Large Scale Analytics
Distributed machine learning and data mining over big data. Recommendation, clustering, classification and frequent itemset mining.
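To make frequent itemset mining concrete, here is a small single-machine sketch that counts frequent items and frequent pairs using Apriori-style pruning (a pair is only counted if both of its items are frequent). The function name `frequent_itemsets` and the sample baskets are invented for this example. Distributed libraries such as Mahout implement their own algorithms, which this sketch does not claim to reproduce.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return frequent single items and pairs mapped to their support counts."""
    # Pass 1: count each item once per transaction.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent.
    pair_counts = Counter()
    for t in transactions:
        candidates = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(candidates, 2))

    result = {(item,): item_counts[item] for item in frequent_items}
    result.update({p: c for p, c in pair_counts.items() if c >= min_support})
    return result


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk"],
]
itemsets = frequent_itemsets(baskets, min_support=2)
```

With a minimum support of 2, the pair `("bread", "milk")` is kept because it appears in three baskets, while `("bread", "butter")` appears only once and is dropped.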
Assessment Breakdown | %
Course Work | 100.00%
Course Work
Assessment Type | Assessment Description | Outcome addressed | % of total | Assessment Date
Short Answer Questions | In Class Assessment | 1 | 20.0 | Week 4
Practical/Skills Evaluation | Implement and test a distributed parallelised program to process a distributed data set | 2,3 | 40.0 | Week 8
Practical/Skills Evaluation | Design, implement and document a large scale distributed analytics solution | 3,4 | 40.0 | Sem End
No End of Module Formal Examination
Reassessment Requirement
Coursework Only
This module is reassessed solely on the basis of re-submitted coursework. There is no repeat written examination.

The institute reserves the right to alter the nature and timings of assessment.

 

Module Workload

Workload: Full Time
Workload Type | Workload Description | Hours | Frequency | Average Weekly Learner Workload
Lecture | Lecture based on Indicative Content | 1.0 | Every Week | 1.00
Lab | Lab based on Indicative Content | 3.0 | Every Week | 3.00
Independent Learning | Independent student learning | 3.0 | Every Week | 3.00
Total Hours | 7.00
Total Weekly Learner Workload | 7.00
Total Weekly Contact Hours | 4.00
Workload: Part Time
Workload Type | Workload Description | Hours | Frequency | Average Weekly Learner Workload
Lecture | Lecture based on Indicative Content | 1.0 | Every Week | 1.00
Lab | Lab based on Indicative Content | 3.0 | Every Week | 3.00
Independent Learning | Independent student learning | 3.0 | Every Week | 3.00
Total Hours | 7.00
Total Weekly Learner Workload | 7.00
Total Weekly Contact Hours | 4.00
 

Module Resources

Recommended Book Resources
  • Tom White, Hadoop: The Definitive Guide [ISBN: 9781449311520]
  • Chuck Lam, Hadoop in Action [ISBN: 9781935182191]
  • Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman, Mahout in Action [ISBN: 9781935182689]
This module does not have any article/paper resources
This module does not have any other resources
 

Module Delivered in

Programme Code | Programme | Semester | Delivery
CR_SDAAN_8 | Higher Diploma in Science in Data Science & Analytics | 2 | Mandatory

Cork Institute of Technology
Rossa Avenue, Bishopstown, Cork

Tel: 021-4326100     Fax: 021-4545343
Email: help@cit.edu.ie