# Data Science Graduate Certificate

## Data Science Graduate Certificate on Coursera

Develop interdisciplinary skills in data science and gain knowledge of statistical analysis, data mining, and machine learning from one of the nation’s top-ranked Tier 1 research institutions.

**Earn credit towards a Master's Degree:** This Graduate Certificate can qualify as credit towards your degree.

**Learn from top-tier faculty:** Take classes taught by the same faculty who teach on campus.

**Study on your schedule:** Flexible 8-week terms with self-paced courses. Students have 8 years to complete the Data Science Graduate Certificate and/or MS-Data Science degree.

Data science is a multidisciplinary field that focuses on extracting knowledge and insight from large datasets. There is a growing need for data scientists, data analysts, and statisticians equipped with knowledge and essential skills to work across diverse organizations.

CU Boulder will prepare you for a successful career in this high-paying, in-demand field. Learners will become proficient in new skills, build a portfolio through hands-on projects, and earn an industry-recognized credential to land the job they want.

To earn the Data Science Graduate Certificate (12 credits), students **must** complete the following required specializations:

- **Data Mining Foundations and Practice Specialization** (3 credits)
- **Data Science Foundations: Statistical Inference Specialization** (3 credits)

Students must also complete **two** specializations from the following:

- **Introduction to Statistical Learning for Data Science Specialization** (3 credits)
- **Machine Learning Specialization** (3 credits)
- **Statistical Modeling for Data Science Specialization** (3 credits)
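The rule above (both required specializations plus any two electives, at 3 credits each) can be sketched as a small check. This is only an illustration: the function names are hypothetical, and the official audit of your credits is done by the university, not by code like this.

```python
# Specialization names and credit values are taken from the text above.
REQUIRED = {
    "Data Mining Foundations and Practice",
    "Data Science Foundations: Statistical Inference",
}
ELECTIVES = {
    "Introduction to Statistical Learning for Data Science",
    "Machine Learning",
    "Statistical Modeling for Data Science",
}
CREDITS_PER_SPECIALIZATION = 3
ELECTIVES_NEEDED = 2


def meets_certificate_requirements(completed):
    """Hypothetical helper: True if both required and two electives are done."""
    done = set(completed)
    return REQUIRED <= done and len(ELECTIVES & done) >= ELECTIVES_NEEDED


def certificate_credits(completed):
    """Hypothetical helper: credits that count toward the 12-credit certificate."""
    done = set(completed)
    counted = len(REQUIRED & done) + min(len(ELECTIVES & done), ELECTIVES_NEEDED)
    return counted * CREDITS_PER_SPECIALIZATION


plan = REQUIRED | {"Machine Learning", "Statistical Modeling for Data Science"}
print(meets_certificate_requirements(plan), certificate_credits(plan))
```

A plan with both required specializations but only one elective would fail the check and count 9 credits.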

The certificate will be stackable and the credits can be applied to the Master of Science in Data Science on Coursera degree for students interested in continuing their education.

To earn the **Data Science Graduate Certificate** (12 credits), students **must** complete the following required specializations on Coursera:

**Required Specializations**

This course introduces the key steps involved in the data mining pipeline, including:

- Data Understanding
- Data Preprocessing
- Data Warehousing
- Data Modeling
- Interpretation & Evaluation
- Real-World Applications

This course covers the core techniques used in data mining, including:

- Frequent Pattern Analysis
- Classification
- Clustering
- Outlier Analysis
- Mining Complex Data
- Research Frontiers in the Data Mining Field

This course offers step-by-step guidance and hands-on experience in designing and implementing a real-world data mining project.

You will learn:

- Problem Formulation
- Literature Survey
- Proposed Work
- Evaluation
- Discussion
- Future Work

Understand the foundations of probability and its relationship to statistics and data science.

- Learn what it means to calculate a probability, and how to work with independent and dependent outcomes and conditional events.
- Study how discrete and continuous random variables fit with data collection.
- Understand the fundamental importance of Gaussian (normal) random variables and the Central Limit Theorem to all statistics and data science.

This course introduces statistical inference, sampling distributions, and confidence intervals.

Students will learn:

- How to define and construct good estimators
- Method of moments estimation
- Maximum likelihood estimation
- Methods of constructing confidence intervals that extend to more general settings

This course will focus on the theory and implementation of hypothesis testing, especially as it relates to applications in data science.

You will learn:

- To use hypothesis tests to make informed decisions from data.
- The general logic of hypothesis testing, error and error rates, power, simulation, and the correct computation and interpretation of p-values.
- The misuse of testing concepts, especially p-values, and the ethical implications of such misuse.

**Elective Specializations**

Choose **two** specializations from the following:

Introduction to Statistical Learning will explore concepts in statistical modeling:

- When to use certain models
- How to tune those models
- What trade-offs other options provide

We will cover Regression, Classification, Trees, Resampling, Unsupervised techniques, and more!

Learn the foundational framework & application of cross-validation, bootstrapping, dimensionality reduction, ridge regression, lasso, GAMs, and splines.

This course covers the foundational framework & application of tree-based methods, support vector machines, and unsupervised learning.

In this course, you will learn various supervised machine learning algorithms and apply them to prediction tasks on different data. You will also learn when to use which model and why, and how to improve model performance.

We will cover models such as linear and logistic regression, KNN, decision trees and ensembling methods such as Random Forest and Boosting, and kernel methods such as SVM.

One of the most useful areas in machine learning is discovering hidden patterns from unlabeled data.

In this course, you will learn:

- Unsupervised learning methods for dimensionality reduction, clustering, and learning latent features.
- Real-world applications such as recommender systems, through hands-on examples of product recommendation algorithms.

This course will cover the basics of deep learning, including how to build and train:

- Multilayer perceptrons
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs)
- Autoencoders (AEs)
- Generative adversarial networks (GANs)

The course includes several hands-on projects, including:

- Cancer detection with CNNs
- RNNs on disaster tweets
- Generating dog images with GANs

This course will provide a set of foundational statistical modeling tools for data science.

You will be introduced to:

- Methods, theory, and applications of linear statistical models
- Parameter estimation, residual diagnostics, and goodness of fit
- Various strategies for variable selection and model comparison

Attention will be given to the misuse of statistical models and the ethical implications of such misuse.

Statistical modeling will introduce you to the study of the analysis of variance (ANOVA), analysis of covariance (ANCOVA), and experimental design.

ANOVA and ANCOVA, presented as a type of linear regression model, will provide the mathematical basis for designing experiments for data science applications.

Emphasis will be placed on important design-related concepts, such as randomization, blocking, factorial design, and causality.

Some attention will also be given to ethical issues raised in experimentation.

You will study a broad set of more advanced statistical modeling tools, including:

- Generalized linear models (GLMs), which will introduce classification (through logistic regression)
- Nonparametric modeling, including kernel estimators and smoothing splines
- Semi-parametric generalized additive models (GAMs)

Emphasis will be placed on a firm conceptual understanding of these tools. Attention will also be given to ethical issues raised by using complicated statistical models.

- Probability Theory
- Unstructured Data
- Machine Learning
- Artificial Intelligence
- Deep Learning
- Data Visualization
- Big Data Analytics

There are no formal prerequisites, but we recommend that you have prior knowledge of basic mathematical concepts and computer programming.

- **Math**: Calculus and Linear Algebra
- **Programming**: Python and R Programming

If you do not have this knowledge already, we encourage you to try out non-credit coursework before attempting for-credit courses.

If you would like to brush up on the above skills before starting the program, consider the following classes on Coursera:

- **Calculus:** Algebra and Differential Calculus for Data Science
- **Linear Algebra:** Essential Linear Algebra for Data Science
- **R Programming:** Mastering Software Development in R Specialization by Johns Hopkins University
- **Python:** Introduction to Scripting in Python Specialization by Rice University

The Data Science Graduate Certificate on Coursera is **$525 per credit hour**.

The program requires 12 credit hours of coursework, so the total program cost is $6,300. Because this program is 100% online, the tuition is the same for all students regardless of residency status.
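The arithmetic behind that total is simply credits multiplied by the per-credit rate. A minimal sketch, assuming the $525 rate quoted on this page stays fixed and ignoring any fees not mentioned here (the function name is hypothetical):

```python
COST_PER_CREDIT_USD = 525   # per-credit rate quoted on this page
CERTIFICATE_CREDITS = 12    # credits required for the certificate


def tuition(credits, cost_per_credit=COST_PER_CREDIT_USD):
    """Hypothetical helper: tuition in USD for a given number of credits."""
    return credits * cost_per_credit


print(tuition(CERTIFICATE_CREDITS))  # 6300, the full certificate
print(tuition(3))                    # 1575, one 3-credit specialization
```

Because tuition is charged per credit, the same function gives the cost of enrolling in a single specialization or any other number of credits.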

Tuition is charged per credit, and full payment is required at the time of enrollment each session. See the Bursar's Degrees on Coursera page for details about payment methods, refund policies, etc.

Take the next step in your education to boost your career. This **Graduate Certificate** can be stacked toward the Master of Science in Data Science on Coursera degree if you are interested in continuing your education.

The Data Science Foundations: Statistical Inference Specialization, a requirement to earn this Data Science Graduate Certificate, is a pathway for admission to the Master of Science in Data Science (MS-DS) degree program.

The Master of Science in Data Science degree is designed to prepare you to successfully work and collaborate with others across a variety of scientific, business, and other fields.

**To pursue admission to the MS-DS degree program, you’ll need to complete the following four required protocols:**

- Pass one pathway with a pathway GPA of 3.0 or higher
- Earn a C or better in all pathway courses within your chosen pathway
- Earn an overall cumulative GPA of 3.0 or higher
- Indicate interest in degree admission (via the enrollment form)
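The four conditions above can be read as a single all-or-nothing check. The sketch below is illustrative only and rests on assumptions not stated on this page: a standard 4.0 scale on which a C is worth 2.0 grade points, and a pathway GPA computed as a simple mean over equally weighted pathway courses. The class and function names are hypothetical, and the university's own calculation is what actually counts.

```python
from dataclasses import dataclass
from typing import List

# Assumption: standard 4.0 scale, where a C equals 2.0 grade points.
C_GRADE_POINTS = 2.0
MIN_GPA = 3.0


@dataclass
class PathwayRecord:
    pathway_grade_points: List[float]  # one entry per course in the chosen pathway
    cumulative_gpa: float              # overall cumulative GPA
    indicated_interest: bool           # interest indicated via the enrollment form


def pathway_gpa(record: PathwayRecord) -> float:
    # Assumption: every pathway course carries equal weight.
    return sum(record.pathway_grade_points) / len(record.pathway_grade_points)


def meets_admission_protocols(record: PathwayRecord) -> bool:
    if not record.pathway_grade_points:
        return False  # no pathway courses completed yet
    return (
        pathway_gpa(record) >= MIN_GPA
        and min(record.pathway_grade_points) >= C_GRADE_POINTS
        and record.cumulative_gpa >= MIN_GPA
        and record.indicated_interest
    )


print(meets_admission_protocols(PathwayRecord([4.0, 3.0, 2.0], 3.2, True)))
```

Note that a high average does not rescue a single low grade: a record with pathway grades of 4.0, 4.0, and 1.7 averages above 3.0 but still fails the "C or better in all pathway courses" condition.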

By completing these steps, you will go from working on a certificate to earning a degree.

Upon completion of the Data Science Graduate Certificate, you can apply these 12 credits to the Master of Science in Data Science degree.

CU Boulder is committed to teaching the next generation of interdisciplinary data scientists.