Data Science Bootcamp 2017 program
SESSION & DATE
SESSION 01 – Monday, October 2nd, 2017.
The four pillars of a Data Science project:
1. The Why
Full program details in the EventBrite event.
Basic understanding of Data Science tools (Python and R). No software needs to be installed upfront.
Mandatory reading: blog post (by Kris Peeters)
Be prepared for a lot of interaction, discussions and whiteboarding!
SESSION 02 – Tuesday, October 3rd, 2017.
A hands-on, practical guide to the actions involved in a Data Science project is very handy. That is what Jellert Schaepherders and Maarten Van den Broeck, from XploData, will explain to us: the step-by-step overview of a Data Science project.
A description of the precise actions to take in a Data Science project:
1. General Overview
3. Data selection
4. Data cleaning
5. Data transformation
6. Data discovery
7. Evaluation and Interpretation
8. Insights and Knowledge
- Knowledge of R and Python.
- Install RStudio and/or a Python IDE (e.g. PyCharm) and Oracle Data Visualisation (demo version).
SESSION 03 and 04 – Monday and Tuesday, October 9th and 10th, 2017.
In this 2-day module, Maxime Coniglio, from Keyrus, will introduce the Hadoop ecosystem as the main framework for storing and processing Big Data.
2. What is Hadoop
9. Administration and security
- Knowledge of basic Linux.
- Installation of Linux.
In this hands-on & interactive session, Maarten Callaert, from Captic, will not only teach the participants how to build interactive & dynamic dashboards, but also the best way to sell these dashboards (storytelling).
1. Introduction to dashboarding & Tableau Software
2. Tableau Desktop Basics
3. Introduction to storytelling
4. Tableau Desktop Advanced
5. Business case
- Knowledge of R and Python.
- Prior to the session, there is no need to install any software.
In this session, Véronique Van Vlasselaer, from SAS, will zoom into the main techniques of (social) network analysis from a theoretical and practical perspective. Network theoretical concepts such as homophily, multipartite graphs, centrality, etc. are discussed and supported by hands-on examples.
1. Introduction to key concepts of Network Analysis
2. Creating networks with PROC OPTGRAPH
3. Calculating centrality measures
4. Visualizing networks in SAS Visual Analytics
5. Community Detection
6. Other useful PROC OPTGRAPH options
7. Best Practices
8. A glimpse of advanced network analysis
- Knowledge of SAS.
- There is no need to install any software before the session.
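As a rough illustration of what a centrality measure computes, here is a minimal sketch of degree centrality in plain Python. It is not the PROC OPTGRAPH workflow used in the session; the function name, the edge list and the people in it are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalised degree centrality for an undirected graph.

    `edges` is an iterable of (node, node) pairs. Each score is the
    number of distinct neighbours divided by (number of nodes - 1).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical toy network: Ann is connected to everyone else.
edges = [("Ann", "Bob"), ("Ann", "Cem"), ("Ann", "Dee"), ("Bob", "Cem")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

In this toy network the best-connected person gets the highest score; other centrality measures (betweenness, closeness, and so on) rank nodes by different criteria.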
In this hands-on session, Nele Verbiest, from Python Predictions, will show how to approach a typical predictive analytics project, which can be used in numerous domains like marketing, risk, operations and HR in diverse industries. The participants will be able to build a stable predictive model and present their results to business in an elegant fashion.
1. Introduction to predictive analytics: definitions and use cases
2. Predictive analytics algorithms and evaluation techniques
3. Data preparation for predictive analytics
4. How to align a predictive analytics project with business
5. Presenting your predictive analytics project to business
- Knowledge of R.
- R or RStudio (preferred) should be installed.
In this session, Nele Verbiest, from Python Predictions, will introduce participants to the fundamentals of customer segmentation, the process used by many organisations as a strategic tool to understand customers and monitor evolutions throughout the customer base.
1. Introduction to segmentation: definitions and use cases
2. Clustering: data driven segmentation
3. Data preparation for clustering
4. How to align a segmentation project with business
5. Presenting your segmentation results to business
- Knowledge of R.
- R or RStudio (preferred) should be installed.
In this session, Fabien Janssens, from AXA, will let us experiment with Elastic and learn how to use a search engine. We will discover how to retrieve our data and create a real-time dashboard.
1. Introduction to Elastic
2. Deploy a cluster with Elastic Cloud
3. API discovery
4. Real-time Dashboard
Knowledge of Python.
Installation of Python 3.0 and Jupyter notebook (http://jupyter.org/install.html)
In this session, Maarten Callaert, from Captic, will show participants the value of adding external data to their models in a pragmatic and creative manner.
1. General introduction & overview
2. Getting started with our Business Case
3. Web Scraping: Adding external data to our initial Business Case
- Knowledge of Python.
- Installation of Python on your PC, preferably Python 2.7.
In this 2-day module, Frank Vanden Berghen, from TIMi, will offer us insight into how to create good predictive models for telco, banking, insurance, and retail. He will cover the different steps required to make an advanced analytics project successful.
1. Installation of TIMi.
2. Hands-on: create your first predictive model (with TIMi).
3. Assess if your predictive model is ok (lift curves, value, does it make sense?, over-fitting, etc.)
4. General principles for creating a good analytical dataset (time aspects, RFMA)
5. Hands-on: create your first analytical dataset (with Anatella).
6. How to fail an advanced analytic project with glory?
7. Collaborative filtering & segmentation studies.
- A laptop with MS-Windows.
- Knowledge of SQL.
In this session, Eric Lecoutre, from WeLoveDataScience, will take all the participants into hackathon mode to analyse painting-related data, with an art expert contributing a subjective evaluation. Teamwork, collaboration and sharing will be encouraged, and the participants will act as consultants.
1. Consulting skills
3. Data analysis
- Knowledge of one of these technologies: Python, R or SAS.
- The participant must come with his own working environment.
In this session, Rik Van Bruggen and Tom Geudens, from Neo4j, will guide you through a hands-on experience with the graph database model and its most popular implementation, Neo4j.
2. What is a graph database
3. Why a graph database
4. Graph database use cases
5. Graph Query Languages – (open)Cypher
6. Graph query assignments
7. Graph query assignments
8. Further reading and sources of info.
- General database / SQL experience is useful but not mandatory.
- The participant must have Neo4j Community Edition installed on his laptop before coming to the course. You can download the correct latest version from https://neo4j.com/download/community-edition/. When you surf to https://localhost:7474 you should see the start screen of the Neo4j “browser”.
During this session, Kris Peeters, from Dataminded, will provide an overview of the technologies and architectures that exist, and discuss the pros and cons of each. Based on a concrete example, the objective is to make the right trade-offs and build your own data architecture using gamestorming techniques.
1. Brainstorm data analytics idea
2. Map your data sources
3. Select big data ideas
4. Choose big data technologies
5. Build a big data architecture
6. Identify potential risks
- General notions of data architecture
There will be a lot of interaction, discussions and whiteboarding!
This session, delivered by Thomas Ghys (from BYOD) and Michelangelo van Dam (from In2IT) will bring you up to speed with the essence of GDPR and offer pragmatic advice on how to perform analyses using personal data in a privacy-proof and secure way. You will walk away with a few things to remember and many actions to take immediately.
1. Why personal data protection matters
2. GDPR: the good, the bad and the ugly
3. A practical roadmap towards GDPR compliance
4. The implications of GDPR for Data Science workflows, profiling and IoT
5. No data privacy without data security
6. Actions in case of data breach.
For the detailed program, please refer to this page.
There are no pre-requisites for this session.
In this 2-day module, Jan Wijffels (from BNOSAC) will explain the use of text mining tools for the purpose of data analysis. It covers basic text handling, natural language engineering and statistical modelling on top of textual data.
1. Text encodings
2. Cleaning of text data
3. Regular expressions
4. String distances
5. Graphical displays of text
6. Natural language processing: stemming, parts-of-speech tagging, tokenization, lemmatisation
7. Sentiment analysis
8. Statistical topic detection modelling and visualization (latent Dirichlet allocation)
9. Visualisation of correlations & topics
10. Word embeddings
11. Document similarities
12. Text alignment
- Knowledge of R; follow one of these courses:
- 1. http://lstat.kuleuven.be/training/coursedescriptions/statistical-machine-learning-with-r
- Knowledge of basic Statistics lm/glm
- Knowledge of Data Manipulation
- Basic knowledge of Predictive Modeling
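To give a flavour of the "string distances" topic, here is a minimal sketch of the Levenshtein (edit) distance. The session itself uses R; Python is used here only for illustration, and the function name and example words are assumptions, not course material.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Distances like this one are commonly used for fuzzy matching of misspelled names or near-duplicate records during text cleaning.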
In this 2-day module, Mathieu Carette and Kasper Van Lombeek (from RockEstate) will guide you to create a recommendation engine from scratch using collaborative filtering in Python.
1. Introduction: what are recommendation algorithms?
2. Theoretical overview of collaborative filtering
3. Implementation in Python
- Knowledge of Python
- Please review the following tutorial: https://www.youtube.com/watch?v=1JRrCEgiyHM
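As a taste of what "from scratch" can look like, here is a minimal sketch of user-based collaborative filtering in plain Python. It is an illustration under stated assumptions, not the implementation built in the module: the ratings, user names, item names and helper functions are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"film_a": 5, "film_b": 3, "film_c": 4},
    "bob": {"film_a": 4, "film_b": 2, "film_c": 5, "film_d": 4},
    "carol": {"film_a": 1, "film_b": 5, "film_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity of two users over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Predict a rating as the similarity-weighted average of the
    ratings other users gave to the item. Returns None if nobody
    comparable has rated it."""
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += abs(similarity)
    return numerator / denominator if denominator else None


predicted = predict("alice", "film_d", ratings)
```

In this toy data the prediction for alice leans towards bob's rating, because bob's ratings resemble hers more closely than carol's do. Practical systems typically add mean-centring, neighbourhood limits and sparse matrix representations.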
SESSION 21 – Monday, December 4th, 2017.
Topic: Bringing a Model into Production
In this session, Filip Deryckere (from Noisy Channels) will share the actions to perform before, during and after a project has been launched. More details to come soon.
Book your place!
To be updated asap…