Two-day hands-on workshop "Data Science at the Command Line"

Jeroen Janssens

The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like parallel, jq, and csvkit), you can quickly scrub and explore your data and hack together prototypes.
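To give a flavor of what "combining small tools" means, here is a minimal sketch of a pipeline. It is not taken from the workshop material: the sample data and the function names sample_csv and count_languages are made up for illustration, and it sticks to standard Unix tools (tail, cut, sort, uniq, awk) rather than the tools named above, so that it runs without installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: a tiny CSV with a header row (made up for illustration).
sample_csv() {
  printf '%s\n' \
    'name,language' \
    'ann,R' \
    'bob,Python' \
    'cat,R' \
    'dan,Python' \
    'eve,R'
}

# Count how often each language occurs, most frequent first.
# Reads CSV on standard input and writes "language,count" lines.
count_languages() {
  tail -n +2 |                # drop the header row
    cut -d, -f2 |             # keep only the second column
    sort |                    # group identical values together
    uniq -c |                 # count each group
    sort -rn |                # order by count, descending
    awk '{print $2 "," $1}'   # reshape to "language,count"
}

sample_csv | count_languages
```

With the sample data above this prints R,3 followed by Python,2. Each step does one small job, and the pipe passes the output of one tool straight into the next.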

This hands-on workshop is based on the O’Reilly book Data Science at the Command Line, written by instructor Jeroen Janssens. You’ll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualize data. No prior knowledge of the Unix command line is required.

By the end of this workshop you will have a solid understanding of how to integrate the command line into your data science workflow. Even if you’re already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.

Note: This workshop doesn't have any ratings yet because I have only been using Gumroad since December 2021. Please visit my company website for reviews from past students about this workshop and other workshops. Thanks.

What you’ll learn

  • Automate tedious tasks

  • Parallelize and distribute your tasks to multiple cores and machines

  • Convert your existing code to reusable command-line tools

  • Easily inspect, transform, and visualize data

  • Apply a variety of supervised and unsupervised machine learning algorithms


This workshop consists of 4 online sessions:

  1. Thursday March 10, 2022 from 10am to noon EST

  2. Thursday March 10, 2022 from 1pm to 3pm EST

  3. Friday March 11, 2022 from 10am to noon EST

  4. Friday March 11, 2022 from 1pm to 3pm EST

There's a one-hour break between sessions 1 and 2 and between sessions 3 and 4. Find out what time the first session starts in your local time zone.


Day 1:

  • Introduction

    • What is the command line?

    • Why learn the command line for doing data science?

    • A real-world data science use case

    • Getting up and running with the Docker image

  • Essential concepts of the Unix command line

    • Running command-line tools

    • Combining command-line tools

    • Redirecting input and output

    • Working with files

    • Getting help

  • Obtaining data from logs, spreadsheets, and databases

  • Downloading data from the Internet and accessing APIs using curl

  • Transforming data with filters such as cut, paste, grep, and sed

  • Processing other data formats efficiently

    • JSON with jq

    • CSV with csvkit

    • HTML with pup

    • XML with xmlstarlet

Day 2:

  • Running R from the command line

  • Visualizing data from the command line

    • Scatter plot

    • Histogram

    • Bar chart

    • Geographic visualization

  • Parallelizing and distributing data-intensive pipelines

  • Creating reusable command-line tools

    • Automating tasks with a Bash script

    • Converting your existing code to a command-line tool

    • Processing arguments

    • Working with streaming data

  • Applying machine learning

    • Dimensionality reduction

    • Classification

    • Regression

  • Conclusion

Recommended preparation

Participants are kindly requested to have the following items installed prior to the start of the workshop:

Once you've signed up, you'll receive detailed installation instructions and an invitation to the online Zoom sessions with Jeroen. Looking forward to seeing you there.

If you have any questions, don't hesitate to email me at jeroen@datascienceworkshops.com.

This product is not currently for sale.
