Course Overview

Infocomm Technology
Modular Certification Course
Duration: 6 days
Fee Subsidy: Up to 90% SF Funding

This module aims to equip you with skills ranging from data wrangling and big data processing to machine learning.

Upon completion of the course, you will be equipped with the necessary skills to excel in an entry-level position in data engineering.

You will be able to confidently carry out exploratory data analysis using Python, design and create both SQL and NoSQL databases, build ETL (Extract, Transform, Load) data pipelines in Apache Spark, create supervised machine learning models in scikit-learn (sklearn), and apply unsupervised techniques such as clustering.

With this comprehensive set of skills, you will be prepared to take on the challenges of the data engineering industry.

Who Should Attend

  • Engineers
  • Software Developers
  • Professionals who have programming experience and are interested in finding out more about data engineering
Haja Mydin
"The course has equipped me with the necessary techniques for data structures and how to handle them. The instructors were friendly and patient, and they reached out to help all students that were struggling. I wish nothing but success for the course."
Lim Guo Cong

What You Will Learn

Data Wrangling

  • Participants will be able to apply data wrangling techniques, using libraries such as NumPy and pandas, to transform data from one form to another
  • Participants will be taught basic SQL and will be able to write and debug simple SQL queries on a database for CRUD (Create, Retrieve, Update, Delete) operations. Participants will also be taught principles of database design (normal forms)
  • Participants will be taught the difference between SQL and NoSQL databases and will be able to contrast the situations where each should be used
  • Participants will similarly be able to carry out CRUD operations on a NoSQL database such as MongoDB
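A minimal sketch of the four CRUD operations, assuming Python's built-in `sqlite3` module and an in-memory database rather than whichever database server the course actually uses. The `students` table, its columns and its rows are hypothetical examples.

```python
import sqlite3


def run_crud_demo():
    """Create, retrieve, update and delete rows in a hypothetical students table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Create: define the table, then insert rows with parameterised queries
    cur.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER)"
    )
    cur.executemany(
        "INSERT INTO students (name, score) VALUES (?, ?)",
        [("Alice", 72), ("Bob", 65)],
    )

    # Retrieve: read the rows back in insertion (id) order
    before = cur.execute("SELECT name, score FROM students ORDER BY id").fetchall()

    # Update: change one student's score
    cur.execute("UPDATE students SET score = ? WHERE name = ?", (80, "Bob"))

    # Delete: remove one row
    cur.execute("DELETE FROM students WHERE name = ?", ("Alice",))

    after = cur.execute("SELECT name, score FROM students ORDER BY id").fetchall()
    conn.commit()
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = run_crud_demo()
    print(before)  # [('Alice', 72), ('Bob', 65)]
    print(after)   # [('Bob', 80)]
```

The `?` placeholders let the driver handle quoting, which is the usual way to avoid SQL injection when values come from user input.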

Apache Spark

  • Participants will be taught how to create a simple data pipeline consisting of data ingestion, data preparation and generating views / queries
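The three pipeline stages can be sketched in plain Python using the functional `map`/`filter` style that Spark's RDD API is built around. This is not Spark code: the input lines, the `ingest`/`prepare`/`average_by_city` helpers and the data are hypothetical. In PySpark, the same steps would roughly correspond to reading a file, applying RDD `map`/`filter` transformations, and aggregating with `reduceByKey` or a DataFrame `groupBy`.

```python
from collections import defaultdict

# Hypothetical raw input: CSV-like "city,temp" lines, some of them malformed
RAW_LINES = [
    "city,temp",
    "Singapore,31.5",
    "Singapore,30.5",
    "Tokyo,",      # missing value, dropped during preparation
    "Tokyo,18.0",
    "bad line",    # wrong number of fields, dropped during preparation
]


def ingest(lines):
    """Ingestion: skip the header row and split each line into fields."""
    return map(lambda line: line.split(","), lines[1:])


def prepare(records):
    """Preparation: keep rows with exactly two fields and a non-empty
    temperature, then parse the temperature as a float."""
    valid = filter(lambda r: len(r) == 2 and r[1] != "", records)
    return map(lambda r: (r[0], float(r[1])), valid)


def average_by_city(pairs):
    """View/query: average temperature per city (like a group-by + average)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for city, temp in pairs:
        sums[city] += temp
        counts[city] += 1
    return {city: sums[city] / counts[city] for city in sums}


if __name__ == "__main__":
    print(average_by_city(prepare(ingest(RAW_LINES))))  # {'Singapore': 31.0, 'Tokyo': 18.0}
```

Keeping each stage as a separate function mirrors how a pipeline is usually structured: each step can be tested on its own and swapped out without touching the others.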

Supervised Machine Learning

  • Participants will be able to use tools such as scikit-learn (sklearn) to create machine learning models using a range of techniques such as decision trees or neural networks
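Libraries such as sklearn hide the training details, but the core step of a decision tree can be sketched by hand. For one numeric feature, try candidate thresholds and keep the split with the lowest weighted Gini impurity (sklearn's `DecisionTreeClassifier` uses Gini impurity by default). The helper names and toy data below are hypothetical, and a full tree would repeat this search recursively over every feature.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split x <= threshold."""
    data = list(zip(values, labels))
    n = len(data)
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        # Weight each side's impurity by the number of samples it receives
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (6.5, 0.0)
```

A weighted impurity of 0.0 means both sides of the split are pure, so a tree built on this toy data would stop after a single split.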

Unsupervised Machine Learning

  • Participants will be exposed to unsupervised learning techniques such as k-means and hierarchical clustering
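A minimal sketch of k-means (Lloyd's algorithm) in plain Python, assuming 2-D points given as tuples. In practice a library implementation such as sklearn's `KMeans` would normally be used; the function name, parameters and sample points here are hypothetical, and hierarchical clustering is not shown.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on points given as tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, 2)
    print(labels)
```

Because the result depends on the initial centroids, the cluster numbering (0 or 1) can differ between seeds; only the grouping itself is meaningful.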

Teaching Team

Soh Cheng Lock, Donny

Associate Professor / Prog Leader, Infocomm Technology, Singapore Institute of Technology

Vivek Balachandran

Associate Professor / Prog Leader, Infocomm Technology, Singapore Institute of Technology

Zhang Wei

Assistant Professor / Prog Leader, Infocomm Technology, Singapore Institute of Technology



Lessons are held every Friday.

Topics by Day
Day 1

Introduction to data programming and Python
pandas and NumPy basics
pandas data structures and plotting
Hands-on practice
Assembling data and handling missing data
Applying functions

Day 2

Overview of database systems
SQL basics – data definition language, data manipulation language
Hands-on practice
Relational database and SQL for relational data
Advanced topics on database and discussion

Day 3

Introduction to NoSQL, REST, and MongoDB CRUD
Building MongoDB for a Python Application
NoSQL for big data

Day 4

Introduction to Big Data, Hadoop, Apache Spark, RDD, Functional Programming, and Data Pipelines

Day 5

Introduction to supervised machine learning algorithms and classification through the k-nearest neighbours (KNN) algorithm
Theory behind decision trees in supervised learning
Random forest algorithm
Introduction to functional programming and machine learning programming
Implementing classification with k-nearest neighbours, decision trees, and random forests
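k-nearest neighbours is a supervised method: it predicts the label of a new point from the labels of its closest training points. A minimal plain-Python sketch, assuming Euclidean distance and a simple majority vote; the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query and keep the k closest
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    # most_common(1) returns the top label; equal counts keep first-seen order
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    labels = ["red"] * 3 + ["blue"] * 3
    print(knn_predict(train, labels, (2, 2)))  # red
    print(knn_predict(train, labels, (7, 8)))  # blue
```

Choosing an odd `k` avoids tied votes when there are only two classes.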

Day 6

Exam (SIT@NYP)

Certificate and Assessment

A Certificate of Participation will be issued to participants who:

  • Attend at least 75% of the module
  • Undertake non-credit-bearing assessment during the module

A Certificate of Attainment will be issued to participants who:

  • Attend at least 75% of the module
  • Undertake and pass credit-bearing assessment during the module

Fee Structure

The full fee for this course is S$5,832.00.

Category                          Fee after SF Funding
Singapore Citizen (Below 40)      S$1,733.40
Singapore Citizen (40 & Above)    S$653.40
Singapore PR / LTVP+ Holder       S$1,749.60
Non-Singapore Citizen             S$5,832.00 (no funding)

Note: All fees above include GST. GST applies to individuals and Singapore-registered companies.

Course Runs

There are no upcoming course runs at the moment.

Subscribe to our mailing list to learn about the latest dates as soon as they become available.


Learning Pathway


Earn a Postgraduate Certificate

The Postgraduate Certificate in Data Engineering and Smart Factory is designed to equip engineers for greater business competitiveness amid the exponential growth of robotic automation and information and communication technologies.
