Title: | Data Analytics |
Long Title: | Data Analytics |
Field of Study: | Computer Science |
Valid From: | Semester 1 - 2016/17 (September 2016) |
Module Coordinator: | Donna OShea |
Module Author: | Ted Scully |
Module Description: |
Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organizations to make better business decisions, and in the sciences to verify or disprove existing models or theories. Data analytics is distinguished from data mining by the scope, purpose and focus of the analysis. Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher. This module explores the most recent trends in this fast-growing field. |
Learning Outcomes |
On successful completion of this module the learner will be able to: |
LO1 | Apply fundamental techniques relevant to the field of data analytics such as exploratory data analysis and pre-processing techniques. |
LO2 | Research a specific topic in machine learning such as decision trees, neural networks, support vector machines. |
LO3 | Investigate and apply a machine learning algorithm to a problem from a specific application domain and interpret the results. |
LO4 | Implement a big data analytics solution in order to analyse and interpret a very large data set. |
Pre-requisite learning |
Module Recommendations
This is prior learning (or a practical skill) that is strongly recommended before enrolment in this module. You may enrol in this module if you have not acquired the recommended learning but you will have considerable difficulty in passing (i.e. achieving the learning outcomes of) the module. While the prior learning is expressed as named CIT module(s) it also allows for learning (in another module or modules) which is equivalent to the learning specified in the named module(s). |
No recommendations listed |
Incompatible Modules
These are modules which have learning outcomes that are too similar to the learning outcomes of this module. You may not earn additional credit for the same learning and therefore you may not enrol in this module if you have successfully completed any modules in the incompatible list. |
No incompatible modules listed |
Co-requisite Modules
|
No Co-requisite modules listed |
Requirements
This is prior learning (or a practical skill) that is mandatory before enrolment in this module is allowed. You may not enrol on this module if you have not acquired the learning specified in this section.
|
No requirements listed |
Co-requisites
|
No Co Requisites listed |
Module Content & Assessment
Indicative Content |
Data Analytics
Overview of data analytics including example applications in areas such as Finance, Web, Marketing etc. Introduction to exploratory data analysis, including data summarization, exploratory statistics and visualizations such as boxplots, histograms, scatter plots, etc. Common pre-processing techniques such as data cleaning and feature selection.
|
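As an illustration of the exploratory statistics mentioned above, the following is a minimal sketch of a five-number summary and the interquartile-range rule that boxplots commonly use to flag outliers. The function names and the sample values are hypothetical and chosen only for this example; the module itself does not prescribe a language or library.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population of interest.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], a common boxplot whisker rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: eight measurements, one of them suspiciously large.
amounts = [12, 15, 14, 13, 16, 15, 14, 95]
summary = five_number_summary(amounts)
outliers = iqr_outliers(amounts)
print(summary)
print(outliers)
```

A summary like this is typically computed before any modelling, so that suspicious values can be inspected during data cleaning rather than silently influencing later results.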
Machine Learning
Introduction to different categories of machine learning such as supervised and unsupervised learning algorithms. Application of classification, clustering and regression models such as K-means, decision trees, Bayesian classification, linear regression, support vector machines and ensemble methods.
|
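To make the clustering topic concrete, here is a small sketch of K-means using Lloyd's algorithm on 2-D points. It is an illustrative implementation only: the deterministic seeding with the first k points, the function name and the sample data are assumptions made for this example, and practical work would more likely rely on an established library.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

K-means is unsupervised: it receives no class labels and groups points purely by distance, which is what separates it from classifiers such as decision trees.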
Big Data Frameworks
Principles and challenges of Big Data. Overview of Big Data frameworks such as Spark for parallelized processing of data.
|
Large Scale Analytics
Introduction to algorithms for the analysis of high-velocity data. Machine learning for parallel analysis of large data sets using a distributed framework such as Spark MLlib. Distributed stream processing for real-time data analysis using a distributed framework such as Spark Streaming.
|
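One example of an algorithm suited to high-velocity data is Welford's online method, which maintains a mean and variance in a single pass with constant memory. The sketch below is plain Python and does not use Spark or any Spark API; the class name and the sample readings are hypothetical.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of readings, processed one value at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
print(stats.mean, stats.variance)
```

Because each update touches only three numbers, the same idea applies when values arrive faster than they could be stored and re-read.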
Case Study
Application of distributed data analytics to a real-world use case such as sentiment analysis, fraud detection, customer analysis, recommender systems.
|
Assessment Breakdown | % |
Course Work | 100.00% |
Course Work |
Assessment Type | Assessment Description | Outcome addressed | % of total | Assessment Date |
Project | For a specific use case, the student is expected to apply a data mining algorithm to extract data, analyse it further and summarise it into useful information. For example, the student may be expected to mine data with the aim of monitoring suspicious discussions in an online forum. | 1,2,3 | 70.0 | Week 10 |
Project | By employing appropriate research methods, the student is expected to evaluate technical aspects of a large-scale data analytics system, such as Hadoop or Spark. | 4 | 30.0 | Sem End |
No End of Module Formal Examination |
Reassessment Requirement |
Coursework Only
This module is reassessed solely on the basis of re-submitted coursework. There is no repeat written examination.
|
The institute reserves the right to alter the nature and timings of assessment
Module Workload
Workload: Full Time |
Workload Type | Workload Description | Hours | Frequency | Average Weekly Learner Workload |
Lecture | Lecture delivering theory underpinning learning outcomes. | 2.0 | Every Week | 2.00 |
Lab | Lab to support learning outcomes. | 1.0 | Every Week | 1.00 |
Independent & Directed Learning (Non-contact) | Independent study. | 4.0 | Every Week | 4.00 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 3.00 |
Workload: Part Time |
Workload Type | Workload Description | Hours | Frequency | Average Weekly Learner Workload |
Lecture | Lecture delivering theory underpinning learning outcomes. | 2.0 | Every Week | 2.00 |
Lab | Lab to support learning outcomes. | 1.0 | Every Week | 1.00 |
Independent & Directed Learning (Non-contact) | Independent study. | 4.0 | Every Week | 4.00 |
Total Hours | 7.00 |
Total Weekly Learner Workload | 7.00 |
Total Weekly Contact Hours | 3.00 |
Module Resources
Recommended Book Resources |
---|
- Don Jones 2010, The Shortcut Guide to Achieving Business Intelligence in Midsize Companies, Realtime Publishers [ISBN: 1935581112]
- Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman 2014, Mining of Massive Datasets, Cambridge University Press [ISBN: 9781107077232]
- Trevor Hastie, Robert Tibshirani, Jerome Friedman 2009, The Elements of Statistical Learning, 2nd Ed., Springer [ISBN: 9780387848587]
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar 2006, Introduction to Data Mining, 4, 6, 8, Pearson [ISBN: 9780321321367]
| This module does not have any article/paper resources |
---|
Other Resources |
---|
- ebook: Mark Scott 2010, The Shortcut Guide to Large Scale Data Warehousing and Advanced Analytics, Realtime Publishers
- ebook: Don Jones 2010, Achieving Business Intelligence in Midsize Companies, Realtime Publishers
- ebook: Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Data Sets
- ebook: Pang-Ning Tan, Michael Steinbach, Vipin Kumar 2006, Introduction to Data Mining
|
Module Delivered in
|