Data Science Bootcamp 2017 program


General overview

Sessions take place on Mondays and Tuesdays, from 9:00 to 17:00, at DigitYser (Boulevard D’Anvers 40, 1000 Brussels). To see the session details and book your seat, please scroll down. Questions? Drop us an e-mail: training@di-academy.com

Session overview

You can also check our EventBrite page for a list of these courses. We are continuously adding more descriptions.

Each entry below gives the session number and date, a description, the outline and the prerequisites.

SESSION 01 – Monday, October 2nd, 2017.

In his frank and charismatic style, Kris Peeters, from Data Minded, will share his view on the four pillars of working on Data Science projects (the WHY, PROCESSES, TEAM and LEADERSHIP) and, in a hands-on way, his secret recipe for making a Data Science project succeed.

Book your place!

The four pillars of a Data Science project:

1. The Why
2. Processes
3. Team
4. Leadership

Full program details are available on the EventBrite event page.

Basic understanding of Data Science tools (Python and R). No software needs to be installed beforehand.

Mandatory reading: blog post (by Kris Peeters)

Be prepared for a lot of interaction, discussions and whiteboarding!

SESSION 02 – Tuesday, October 3rd, 2017.

A practical, hands-on guide to the actions required in a Data Science project is very handy. That is exactly what Jellert Schaepherders and Maarten Van den Broeck, from XploData, will provide: a step-by-step overview of a Data Science project.

Book your place!

A description of the precise actions to take in a Data Science project:

1. General Overview
2. Data
3. Data selection
4. Data cleaning
5. Data transformation
6. Data discovery
7. Evaluation and Interpretation
8. Insights and Knowledge

  • Knowledge of R and Python.
  • Install RStudio and/or a Python IDE (e.g. PyCharm) and Oracle Data Visualisation (demo version).

SESSION 03 & 04 – Monday and Tuesday, October 9th and 10th, 2017.

In this 2-day module, Maxime Coniglio, from Keyrus, will introduce the Hadoop ecosystem as the main framework for storing and processing Big Data.

Book your place!

1. Motivation
2. What is Hadoop
3. HDFS
4. YARN
5. Sqoop
6. HBase
7. Hive
8. Optimisation
9. Administration and security

  • Basic knowledge of Linux.
  • A working Linux installation.

SESSION 05 – Monday, October 16th, 2017.

In this hands-on & interactive session, Maarten Callaert, from Captic, will teach participants not only how to build interactive, dynamic dashboards but also how best to sell them through storytelling.

Book your place!

1. Introduction to dashboarding & Tableau Software
2. Tableau Desktop Basics
3. Introduction to storytelling
4. Tableau Desktop Advanced
5. Business case

  • Knowledge of R and Python.
  • Prior to the session, there is no need to install any software.

SESSION 06 – Tuesday, October 17th, 2017.

In this session, Véronique Van Vlasselaer, from SAS, will zoom in on the main techniques of (social) network analysis from both a theoretical and a practical perspective. Network-theoretical concepts such as homophily, multipartite graphs and centrality are discussed and illustrated with hands-on examples.

Book your place!

1. Introduction to key concepts of Network Analysis
2. Creating networks with PROC OPTGRAPH
3. Calculating centrality measures
4. Visualizing networks in SAS Visual Analytics
5. Community Detection
6. Other useful PROC OPTGRAPH options
7. Best Practices
8. A glimpse of advanced network analysis

  • Knowledge of SAS.
  • There is no need to install any software before the session.

SESSION 07 – Monday, October 23rd, 2017.

In this hands-on session, Nele Verbiest, from Python Predictions, will show how to approach a typical predictive analytics project, an approach that applies to numerous domains such as marketing, risk, operations and HR across diverse industries. Participants will learn to build a stable predictive model and present their results to the business in an elegant fashion.

Book your place!

1. Introduction to predictive analytics: definitions and use cases
2. Predictive analytics algorithms and evaluation techniques
3. Data preparation for predictive analytics
4. How to align a predictive analytics project with business
5. Presenting your predictive analytics project to business

  • Knowledge of R.
  • R or RStudio (preferred) should be installed.

SESSION 08 – Tuesday, October 24th, 2017.

In this session, Nele Verbiest, from Python Predictions, will introduce participants to the fundamentals of customer segmentation, which many organisations use as a strategic tool to understand their customers and monitor changes across the customer base.

Book your place!

1. Introduction to segmentation: definitions and use cases
2. Clustering: data-driven segmentation
3. Data preparation for clustering
4. How to align a segmentation project with business
5. Presenting your segmentation results to business

  • Knowledge of R.
  • R or RStudio (preferred) should be installed.

SESSION 09 – Monday, October 30th, 2017.

In this session, Fabien Janssens, from AXA, will let us experiment with Elastic and learn how to use a search engine. We will discover how to retrieve our data and create a real-time dashboard.

Book your place!

1. Introduction to Elastic
2. Deploy a cluster with Elastic Cloud
3. API discovery
4. Real-time Dashboard

  • Knowledge of Python.
  • Installation of Python 3.0 and Jupyter Notebook (http://jupyter.org/install.html).

SESSION 10 – Tuesday, October 31st, 2017.

In this session, Maarten Callaert, from Captic, will show the value of enriching your models with external data, in a pragmatic and creative manner.

Book your place!

1. General introduction & overview
2. Getting started with our Business Case
3. Web Scraping: Adding external data to our initial Business Case

  • Knowledge of Python.
  • Python installed on your PC, preferably Python 2.7.

SESSION 11 & 12 – Monday, November 6th, and Tuesday, November 7th, 2017.

In this 2-day module, Frank Vanden Berghen, from TIMi, will offer insight into how to create good predictive models for telco, banking, insurance and retail. He will cover the steps required to carry out a successful advanced analytics project.

Book your place!

1. Installation of TIMi.
2. Hands-on: create your first predictive model (with TIMi).
3. Assess whether your predictive model is sound (lift curves, value, does it make sense?, overfitting, etc.)
4. General principles for creating a good analytical dataset (time aspects, RFMA)
5. Hands-on: create your first analytical dataset (with Anatella).
6. How to fail an advanced analytic project with glory?
7. Collaborative filtering & segmentation studies.

  • A laptop with MS-Windows.
  • Knowledge of SQL.

SESSION 13 – Monday, November 13th, 2017.

In this session, Eric Lecoutre, from WeLoveDataScience, will lead all participants in hackathon mode through an analysis of painting-related data, to which an art expert will contribute a subjective evaluation. Teamwork, collaboration and sharing will be encouraged, with participants acting as consultants.

Book your place!

1. Consulting skills
2. Pitch
3. Data analysis
4. Presentation

  • Knowledge of one of these technologies: Python, R or SAS.
  • Participants must bring their own working environment.

SESSION 14 – Tuesday, November 14th, 2017.

In this session, Rik Van Bruggen and Tom Geudens, from Neo4j, will guide you through a hands-on experience with the graph database model and its most popular implementation, Neo4j.

Book your place!

1. Introduction
2. What is a graph database
3. Why a graph database
4. Graph database use cases
5. Graph Query Languages – (open)Cypher
6. Graph query assignments
7. Graph query assignments
8. Further reading and sources of information
9. Q&A

  • General database / SQL experience is useful but not mandatory.
  • Participants must have Neo4j Community Edition installed on their laptop before coming to the course. You can download the latest version from https://neo4j.com/download/community-edition/. Once Neo4j is running, browsing to http://localhost:7474 should show the start screen of the Neo4j Browser.

SESSION 15 – Monday, November 20th, 2017.

During this session, Kris Peeters, from Data Minded, will provide an overview of the existing technologies and architectures and discuss the pros and cons of each. Based on a concrete example, the objective is to make the right trade-offs and build your own data architecture using gamestorming techniques.

Book your place!

1. Brainstorm data analytics idea
2. Map your data sources
3. Select big data ideas
4. Choose big data technologies
5. Build a big data architecture
6. Identify potential risks

  • General notions of data architecture

There will be a lot of interaction, discussions and whiteboarding!

SESSION 16 – Tuesday, November 21st, 2017.

This session, delivered by Thomas Ghys (from BYOD) and Michelangelo van Dam (from In2IT), will bring you up to speed with the essence of GDPR and offer pragmatic advice on how to analyse personal data in a privacy-proof and secure way. You will walk away with a few things to remember and many actions to take immediately.

Book your place!

1. Why personal data protection matters
2. GDPR: the good, the bad and the ugly
3. A practical roadmap towards GDPR compliance
4. The implications of GDPR for Data Science workflows, profiling and IoT
5. No data privacy without data security
6. Actions in case of a data breach

For the detailed program, please refer to this page.

There are no prerequisites for this session.

SESSION 17 & 18 – Monday, November 27th, and Tuesday, November 28th, 2017.

In this 2-day module, Jan Wijffels (from BNOSAC) will explain how to use text mining tools for data analysis. The module covers basic text handling, natural language engineering and statistical modelling on top of textual data.

Book your place!

1. Text encodings
2. Cleaning of text data
3. Regular expressions
4. String distances
5. Graphical displays of text
6. Natural language processing: stemming, part-of-speech tagging, tokenization, lemmatisation
7. Sentiment analysis
8. Statistical topic detection modelling and visualization (latent Dirichlet allocation)
9. Visualisation of correlations & topics
10. Word embeddings
11. Document similarities
12. Text alignment

  • Knowledge of R; follow one of these courses:
    1. http://lstat.kuleuven.be/training/coursedescriptions/statistical-machine-learning-with-r
    2. https://lstat.kuleuven.be/training/coursedescriptions/AdvancedprogramminginR.html
  • Knowledge of basic statistics (lm/glm)
  • Knowledge of data manipulation
  • Basic knowledge of predictive modelling

SESSION 19 & 20 – Monday, December 4th, and Tuesday, December 5th, 2017.

In this 2-day module, Mathieu Carette and Kasper Van Lombeek (from RockEstate) will guide you through creating a recommendation engine from scratch using collaborative filtering in Python.

Book your place!

1. Introduction: what are recommendation algorithms?
2. Theoretical overview of collaborative filtering
3. Implementation in Python

  • Knowledge of Python
  • Please review the following tutorial: https://www.youtube.com/watch?v=1JRrCEgiyHM

SESSION 21 – Monday, December 4th, 2017.

Topic: Bringing a Model into Production

In this session, Filip Deryckere (from Noisy Channels) will share the actions to perform before, during and after a project has been launched. More details to come soon.

Book your place!

To be updated asap…

To be updated asap.