The Unix command line, although invented decades ago, is an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools (like csvkit), you can quickly scrub and explore your data and hack together prototypes.
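As a small illustration of combining tools, here is a minimal sketch of such a pipeline. It uses only standard Unix tools rather than csvkit, and the sample data is made up for this example. It counts how often each value appears in the second column of a CSV file.

```shell
# Hypothetical sample data (name,species), invented for illustration.
data='name,species
rex,dog
tom,cat
fido,dog
felix,cat
lassie,dog'

# Skip the header, keep column 2, count occurrences, most frequent first,
# then print the result as "value,count".
counts=$(printf '%s\n' "$data" |
  tail -n +2 |
  cut -d, -f2 |
  sort |
  uniq -c |
  sort -rn |
  awk '{print $2 "," $1}')

printf '%s\n' "$counts"
```

Each tool does one small job, and the pipe (`|`) passes the output of one step to the next. With this sample data the pipeline prints `dog,3` followed by `cat,2`.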
This hands-on workshop is based on the O’Reilly book Data Science at the Command Line, written by instructor Jeroen Janssens. You’ll learn how to build fast data pipelines, how to leverage R and Python at the command line, and how to quickly visualize data. No prior knowledge about the Unix command line is required.
By the end of this workshop you will have a solid understanding of how to integrate the command line in your data science workflow. Even if you’re already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more effective and efficient data scientist.
Note: This workshop doesn't have any ratings yet because I have only been using Gumroad since December 2021. Please visit my company website for reviews from past students about this workshop and other workshops. Thanks.
What you’ll learn
Automate tedious tasks
Parallelize and distribute your tasks to multiple cores and machines
Convert your existing code to reusable command-line tools
Easily inspect, transform, and visualize data
Apply a variety of supervised and unsupervised machine learning algorithms
This workshop consists of 4 online sessions:
Thursday March 10, 2022 from 10am to noon EST
Thursday March 10, 2022 from 1pm to 3pm EST
Friday March 11, 2022 from 10am to noon EST
Friday March 11, 2022 from 1pm to 3pm EST
There's a one-hour break between sessions 1 and 2 and between sessions 3 and 4. Find out what time the first session starts in your local time zone.
What is the command line?
Why learn the command line for doing data science?
A real-world data science use case
Getting up and running with the Docker image
Essential concepts of the Unix command line
Running command-line tools
Combining command-line tools
Redirecting input and output
Working with files
Obtaining data from logs, spreadsheets, and databases
Downloading data from the Internet and accessing APIs
Transforming data with filters
Processing other data formats efficiently
R from the command line
Visualising data from the command line
Parallelising and distributing data-intensive pipelines
Creating reusable command-line tools
Automate things in a Bash script
Convert your existing code to a command-line tool
Working with streaming data
Applying machine learning
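To give a flavour of the "Creating reusable command-line tools" topic, the sketch below wraps a word-counting pipeline in a shell function that takes a parameter. The function name `top_words` and the sample text are assumptions made for illustration, not material taken from the workshop.

```shell
# top_words N: read text on standard input and print the N most frequent
# words, one per line, as "count word". Hypothetical helper for illustration.
top_words() {
  n="${1:-10}"                     # default to 10 words when no argument is given
  tr '[:upper:]' '[:lower:]' |     # normalise case
    tr -cs '[:alpha:]' '\n' |      # put each word on its own line
    grep -v '^$' |                 # drop empty lines
    sort |
    uniq -c |
    sort -k1,1nr -k2 |             # by count descending, then alphabetically
    head -n "$n" |
    awk '{print $1, $2}'           # strip the padding added by uniq -c
}

result=$(printf 'the cat and the dog and the bird\n' | top_words 2)
printf '%s\n' "$result"
```

Because the function reads standard input and writes standard output, it can be dropped into a larger pipeline like any other tool. For the sample sentence it prints `3 the` followed by `2 and`.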
Participants are kindly requested to have the following items installed prior to the start of the workshop:
The Docker image, by running:
docker pull datasciencetoolbox/dsatcl2e
Once you've signed up, you'll receive detailed installation instructions and an invitation to the online Zoom sessions with Jeroen. Looking forward to seeing you there.
If you have any questions, don't hesitate to email me at firstname.lastname@example.org.