Data Science

The Internet of Things (IoT) has the potential to generate massive amounts of data that exhibit most of the characteristics of Big Data. A single IoT system will incorporate a multitude of diverse devices, each generating a constant stream of data relevant to a variety of IoT applications. While most of this data will hold only short-term practical value (e.g. temperature readings, air pressure values), deep analysis of historical data can provide meaningful insights of high potential value. Such insights may relate to a particular IoT application (e.g. business analytics) as well as to the operation of the IoT system itself (operational analytics). The analysis of IoT data will encompass descriptive, predictive and prescriptive business analytics, with operational analytics closely linked to the growth of the IoT system. Operational analytics will be driven by data sourced from all the things included in the IoT system. Consequently, the IoT requires a new skill set for data governance and analysis.

The Data Science strand will begin with the fundamentals of relational databases, which are used to store structured transactional business data. This data forms the basis for reporting and descriptive analysis, and for identifying relationships in the data that can support the prediction of future events. In the third year, students will extend their knowledge to NoSQL databases (especially for managing unstructured data) and to data warehouses (which support consistent views of a domain and serve as a springboard for statistical and machine learning analyses). In the IoT context, the ability to store and analyse large volumes of data is essential. The skills students acquire will allow them to design and implement an appropriate data solution with a full understanding of the available options, including the trade-offs between consistency, availability and partition tolerance. In the fourth year, students will learn and apply data mining techniques covering classification, prediction and clustering, using data managed with the methods and technologies they studied in previous years.