Covers the primary problem-solving strategies, methods, and tools needed for data-intensive programs that run on large collections of computers, typically called "warehouse-scale" or "datacenter-scale" computers. The course examines methods and algorithms for processing data-intensive applications, methods for deploying and managing large collections of computers in an on-demand infrastructure, and issues in large-scale computer system design.

1) Big Data Processing (18 hours total)

Map/Reduce model (3 hours)

Hadoop as a practical Map/Reduce implementation (6 hours)

Pig and Pig Latin to simplify Hadoop (3 hours)

Spark as an alternative to Hadoop (3 hours)

Graph-based computation (3 hours)

2) Big Data Computer Systems (9 hours total)

Computer virtualization: processors, storage, and networking (3 hours)

Configuring and deploying virtual machines (6 hours)

3) Software as a Service (12 hours total)

Architecture of cloud-based computer systems (3 hours)

Configuring and using web servers and REST protocols (6 hours)

Review of sample system architectures and designs (3 hours)

4) Future of Datacenter Scale Computers (6 hours total)

Metrics for evaluating Datacenter Scale Computer design (3 hours)

Energy efficiency, cost of ownership, role of cooling (2 hours)

Future technology trends for Datacenter Scale Computers (1 hour)

Textbooks

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool Publishers, 2009, Luiz André Barroso and Urs Hölzle

Reference material and manuals provided by the instructor

Grading

80% of student evaluation is based on 8 "programming labs" that reify the topics described above. The specific number of labs has varied across the two times this course has been taught, in response to student feedback that specific problems were "too large". In both instances the projects encompassed the same learning goals, but were broken into slightly more steps so that students could pace their work.

20% of student evaluation is based on a final project that is briefly presented during the finals period.

Undergraduates and graduate students are graded in different "pools". Undergraduate projects are smaller in scale and require less extensive evaluation and analysis.

Graduate students are required to complete and write up a more substantial semester project. Graduate projects must be approved approximately mid-semester to be accepted.