
Advanced Seminar: Data Science in Practice


The goal of “Advanced data science in practice” is to answer real business questions using real data from an industry partner. You will work in teams and create:

1) a notebook that contains your analysis, and

2) a presentation. 

Industry partners (data scientists, to be more precise) will provide datasets and questions. They will come to the classroom, and you will present your results in front of them. You will be graded on both the notebook and the presentation.

As of this writing, our industry partners in summer 2018 are:

Deutsche Fussball Liga 

Deloitte Analytics Institute

Justix GmbH

These companies have agreed to join us during the seminar, to provide data, and to attend the presentations.

Students can choose to work with Python, R, or both. This course is not for students without prior experience in R, Python, or data science. You should feel comfortable using one of these languages to study data and communicate results. 

 

If you choose R, then use the following setup:

R with tidyverse packages

This is a good book to get you started: http://r4ds.had.co.nz/

RStudio with Git and GitHub extensions (https://www.rstudio.com/)

For plots: ggplot2

For writing the report: RMarkdown with knitr in RStudio

 

If you choose Python, then use the following setup:

Simple: Anaconda distribution with Python 3 (https://www.anaconda.com/)

Use the numpy, pandas, and scikit-learn stack

This is a good book to get you started: https://jakevdp.github.io/PythonDataScienceHandbook/

Choose your own editor. Not all Python editors integrate seamlessly with Git/GitHub.

For plots: matplotlib

For writing the report: Jupyter notebook

 

Notes:

Jupyter with an R kernel is also allowed

GitHub renders Jupyter notebooks out of the box

R Markdown provides a GitHub output format (github_document)

No other programming languages allowed, sorry!

 

Confused? Take a look at the books recommended above. Both R and Python provide highly functional and elegant environments for doing data science.

 

There will be an “installation session” (April 12) where we meet to discuss the seminar in depth. We will talk about the organization of the course and about our expectations, and we will install all the required software packages so that everything works for all of you.
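
If you chose the Python setup and want to check your installation before the session, a minimal sketch like the following may help. It is not an official course script: the function names are hypothetical, and the package list simply uses the import names of the stack listed above (scikit-learn is imported as sklearn). It only checks whether the packages can be found. It does not import them or check versions.

```python
import importlib.util
import sys

# Import names of the packages listed in the Python setup above.
# Assumption: scikit-learn is checked under its import name "sklearn".
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED):
    """Print a short report; return True if Python 3 and all packages are found."""
    is_python3 = sys.version_info.major == 3
    version = sys.version.split()[0]
    print(f"Python {version}: {'ok' if is_python3 else 'Python 3 required'}")

    missing = missing_packages(names)
    for name in names:
        print(f"{name}: {'MISSING' if name in missing else 'found'}")

    return is_python3 and not missing


if __name__ == "__main__":
    check_environment()
```

Run it with the Python interpreter from your Anaconda installation. Any package reported as MISSING is one to bring up during the installation session.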

 

Registration and kick-off

Register via KLIPS. The kick-off starts at 10:00 on April 12, 2018.