• Specialization: Data Science Foundations: Data Structures and Algorithms
  • Instructor: Sriram Sankaranarayanan, Assistant Professor
  • Prior knowledge needed: 
  • Mathematical Background: We expect that the student is comfortable with basic mathematics at the level of a U.S. first-year college STEM student. This includes basic notions such as sets and functions: Properties of sets, definition and properties of functions; logarithms and exponentials: and their properties; basic series summations: arithmetic and geometric series summations; and probability theory: basic definition of probability, independence of events, probability distributions and expectations.  CLRS has a helpful appendix but the student unfamiliar with these concepts can find numerous high quality explanations online.
  • Programming Background: The course involves solving programming assignments in Python. You must be comfortable with Python programming. This includes basic control structures in Python: conditional branches, for loops and recursion; functions: defining and calling functions, and recursion; in-built data structures: lists and dictionaries; and classes. Our use of Python will get more sophisticated as the course progresses to accommodate some learning of Python. 

View on Coursera

Learning Outcomes

  • Explain fundamental concepts for algorithmic searching and sorting
  • Describe heap data structures and analyze heap components, such as arrays and priority queues
  • Design basic algorithms to implement sorting, selection, and hash functions in heap data structures

Course Content

Duration: 9h 16m

In this module the student will learn the very basics of algorithms through three examples: insertion sort (sort an array in ascending/descending order); binary search: search whether an element is present in a sorted array and if yes, find its index; and merge sort (a faster method for sorting an array). Through these algorithms the student will be introduced to the analysis of algorithms -- i.e, proving that the algorithm is correct for the task it has been designed for and establishing a bound on how the time taken to execute the algorithm grows as a function of input. The student is also exposed to the notion of a faster algorithm and asymptotic complexity through the O, big-Omega and big-Theta notations.

Duration: 8h 5m

In this module, the student will learn about the basics of data structures that organize data to make certain types of operations faster. The module starts with a broad introduction to data structures and talks about some simple data structures such as first-in first out queues and last-in first out stack. Next, we introduce the heap data structure and the basic properties of heaps. This is followed by algorithms for insertion, deletion and finding the minimum element of a heap along with their time complexities. Finally, we will study the priority queue data structure and showcase some applications.

Duration: 8h 30m

We will go through the quicksort and quickselect algorithms for sorting and selecting the kth smallest element in an array efficiently. This will also be an introduction to the role of randomization in algorithm design. Next, we will study hashtables: a highly useful data structure that allows for efficient search and retrieval from large amounts of data. We will learn about the basic principles of hash-table and operations on hashtables.

Duration: 3h 43m

In this module, we will learn randomized pivot selection for quicksort and quickselect. We will learn how to analyze the complexity of the randomized quicksort/quickselect algorithms. We will learn open address hashing: a technique that simplifies hashtable design. Next we will study the design of hash functions and their analysis. Finally, we present and analyze Bloom filters that are used in various applications such as querying streaming data and counting.

Duration: 1h 32m

This module contains materials for the final exam for MS-DS degree students. If you've upgraded to the for-credit version of this course, please make sure you review the additional for-credit materials in the Introductory module and anywhere else they may be found.

 

Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click View on Coursera button above for the most up-to-date information.