Learn Shell for Data Science (Command-line/Terminal)

Learning the Unix shell, also known as command-line interface (CLI), command prompt or terminal, is beneficial for data scientists because it allows efficient data manipulation and automation, enabling tasks like cleaning, preprocessing, and automating workflows.

The shell makes it easier to access data, manage and organize files, and handle and merge datasets. The more advanced you get in data science, the more you will end up using the shell.

This tutorial shows you how to use the Shell for Data Science.



What is the Shell

The Shell, or command-line shell, is the program that takes the instructions typed by the user and passes them to the operating system, such as Windows, macOS, or Linux.

The shell predates the graphical interfaces that made home computers accessible to most users. Nowadays, you open your computer and use your mouse to navigate through applications and directories. Back then, the command-line shell was the only way to run programs.

Even today, programmers, engineers and data scientists use the shell daily as a primary tool for their work.

On Windows, the shell is usually reached through the Command Prompt, and on macOS, through the Terminal application. I will use the command-line, terminal and shell interchangeably throughout this tutorial.

Why use the Shell in Data Science?

The shell is an efficient way to manage and manipulate large datasets, and it offers many tools that help data scientists handle the volume of data they work with.

How the Shell Works

The operating system organizes files and directories in a filesystem, and the shell lets you navigate it. Every file or directory has an absolute path that starts from the root directory of the filesystem.

Thus, the path of a file located in the root directory would look like /filename.txt. Each folder (or directory) in a path is separated by a slash (/). For example, if the file is located inside a folder, the path would look something like /Users/jchouinard/filename.txt.

Relative Paths Vs Absolute Paths

An absolute path in the shell is the path that starts from the root of the filesystem. The relative path is a path that starts from where you currently are in the shell.

For example, if you are looking for the project directory and you are currently inside the /users/Documents directory, the absolute and relative paths will be different.

/users/Documents/project        # Absolute path
project                         # Relative path (from /users/Documents)
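As a rough illustration of the difference, the sketch below builds a throwaway folder with mktemp -d and lists the same directory twice, once through its absolute path and once through its relative path. The Documents/project layout and the data.csv file are made up for this example.

```shell
# Hypothetical layout created only for this demo
start=$(pwd)
base=$(mktemp -d)
mkdir -p "$base/Documents/project"
touch "$base/Documents/project/data.csv"

# Move inside the Documents directory
cd "$base/Documents"

# Absolute path: starts from the root (/) and works from any location
abs_listing=$(ls "$base/Documents/project")

# Relative path: resolved from the current working directory
rel_listing=$(ls project)

echo "$abs_listing"
echo "$rel_listing"

# Return to the starting directory and remove the demo folder
cd "$start"
rm -rf "$base"
```

Both listings show the same file, because both paths point to the same directory.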

This concept will be useful to understand when working with directories.

What is the Dollar ($) or Percent (%) Sign in the Shell

In many tutorials, you will find a dollar sign ($) or a percent sign (%) at the start of lines that are meant to be typed in the command line.

$ pwd
# or 
% pwd

This is intended to mimic what you actually see in the Terminal.

The problem is that if you copy and paste this sign into your shell, the command will not work.

Just make sure that you exclude this sign when copying and pasting code from this tutorial to the shell.

What are the Command-Line Commands

The Terminal (or command-line) uses commands to know what to do.

Most Useful Shell Commands

The most commonly-used shell commands are:

  • cd: navigate directories
  • pwd: show the current working directory
  • ls: list files and directories
  • mkdir: create directories
  • cp: copy files
  • mv: move or rename files and directories
  • cat: read files
  • touch: create files
  • head: view the first lines of a file
  • grep: search text datasets for lines that match a regular expression
  • clear: clear the shell
  • python: use the Python interpreter
  • wget: download files from the web

Getting Help on a Command

To get help documentation for a specific command in the shell, use the man command, for manual, along with the name of the command.

$ man <command>

The manual command will return textual documentation on whatever command name it is given.

Type q to quit the manual.

Alternatively, many commands accept the --help flag after the command name.

$ grep --help

How to Navigate in the Shell

In this section, we will learn to navigate in the shell using mostly these three shell commands:

  • pwd
  • ls
  • cd

How to Find What is the Current Working Directory in the Command-line Shell

The first command that you can use to navigate the shell is the pwd command, short for print working directory.

The pwd command shows the absolute path of the current working directory.

$ pwd

How to List Files in the Command-line Shell

To list files and directories that are inside your current working directory (cwd) in the Terminal shell, use the ls shell command, short for list.

How to List Files in the Current Working Directory

The ls command will list the contents of the current working directory.

$ ls

Note that the single-dot notation (.) always means the current working directory. Thus, doing ls . is the same as simply doing ls.

$ ls .

How to List Files Inside a Specific Directory

To list files inside a specific directory using the Terminal (or command-line), use the ls command and add the directory location.

$ ls path/to/directory

How to List Files Inside a Directory that Has Spaces in it

To list files inside a directory that has spaces in the name, use the backslash (\) to escape the whitespace.

$ ls Documents/my\ projects
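Wrapping the path in quotes is another common way to handle spaces. The sketch below, using a made-up "my projects" folder inside a temporary directory, shows that the escaped and the quoted forms list the same content.

```shell
# Hypothetical folder with a space in its name, created for this demo
start=$(pwd)
demo=$(mktemp -d)
mkdir "$demo/my projects"
touch "$demo/my projects/notes.txt"
cd "$demo"

# Option 1: escape the space with a backslash
escaped=$(ls my\ projects)

# Option 2: wrap the whole path in quotes
quoted=$(ls "my projects")

echo "$escaped"
echo "$quoted"

# Return to the starting directory and clean up
cd "$start"
rm -rf "$demo"
```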

How to List Every Single File in a Directory (Including Nested)

To list everything inside a directory in the Terminal or Command-Line, use the ls command in the shell with the -R flag and, optionally, the name of the directory to be inspected. The -R flag lists all subdirectories recursively.

$ ls -R
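To see what the recursive listing looks like, here is a small sketch on a made-up folder structure (the raw/2023 folders and the CSV file names are assumptions for the demo). The exact layout of the output can differ slightly between systems.

```shell
# Hypothetical nested structure created for this demo
demo=$(mktemp -d)
mkdir -p "$demo/raw/2023"
touch "$demo/summary.csv" "$demo/raw/2023/sales.csv"

# -R walks through every subdirectory and lists its content
listing=$(ls -R "$demo")
echo "$listing"

# Clean up
rm -rf "$demo"
```

The file nested two levels deep (sales.csv) appears in the output, which would not be the case with a plain ls.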

How to Move Directories in the Command-line Shell

To move around directories in the command-line shell, use the cd command, short for change directory, along with the location where you want to move to.

How to Move to a Specific Directory in the Command-line

To move to a specific directory in the Terminal shell, use the cd command along with the relative or absolute path where you want to move to.

$ cd path/to/directory

How to Move Up a Directory in the Command-line

There are two ways to move up to a parent directory in the shell: using the cd command along with the absolute path of the parent directory, or using it with the double-dot (..) notation. The .. notation in the shell means the directory above the current working directory.

$ cd ..

How to Move Up Multiple Directories in the Command-line

To move up multiple directories in the shell, use the cd command along with the ../.. notation. Each additional /.. moves up one more directory.

Move up two directories in Shell

$ cd ../..

Move up three directories in Shell

$ cd ../../..
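A quick way to convince yourself of how the double-dot notation behaves is to combine cd with pwd. The sketch below assumes a made-up a/b/c folder chain inside a temporary directory.

```shell
# Hypothetical nested folders created for this demo
start=$(pwd)
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c"

# From c, move up one directory: we land in b
cd "$demo/a/b/c"
cd ..
up_one=$(basename "$(pwd)")

# From c, move up two directories: we land in a
cd "$demo/a/b/c"
cd ../..
up_two=$(basename "$(pwd)")

echo "$up_one"
echo "$up_two"

# Return to the starting directory and clean up
cd "$start"
rm -rf "$demo"
```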

How to Move to the Home Directory in the Shell

To move to the home directory in the command-line shell, use the cd command with the tilde character (~).

$ cd ~

How to Clear the Shell

Use the clear command to clear the shell of the commands and output that are currently displayed and make the screen easier to read.

$ clear

Note that the clear command does not delete previous commands, but simply starts a new line at the top. You can always scroll up to previous lines.

How to Manipulate Files and Directories in the Shell

The shell also provides commands to create, copy, move and delete files and directories. You can efficiently manage and manipulate files, extract data and merge datasets with simple commands in the Shell.

How to Copy Files in the Shell

To copy files in the command-line shell, use the cp command, short for copy, and provide the path of the original file followed by the path and filename of the copy.

$ cp original.txt destination.txt

How to Copy Files to a Different Location in Command-Line

To copy files from one location to another in the command-line shell, use the cp command, short for copy, and provide the path of the original file followed by the path where the file should be copied.

$ cp path/from/file.txt path/to/copied_file.txt

How to Copy Multiple Files to a New Location in Command-Line

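To copy multiple files from one location to another directory in the command-line shell, use the cp command, add the source path, and use curly brackets ({}) to list the filenames to be copied, separated by commas and without spaces. This brace notation is a feature of shells such as Bash and zsh.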

$ cp /home/{file1,file2,file3} /home/destination/

How to Move Files in the Shell

To move files from one directory to another in the command-line shell, use the mv command, short for move, and provide the path of the original file followed by the path where the file should be moved.

$ mv path/from/file.txt path/to/directory

How to Move Multiple Files to a New Location in Command-Line

To move multiple files from one location to another directory in the command-line shell, use the mv command, list each file to move separated by spaces, and finally add the directory where the files should be moved.

$ mv file1.txt file2.txt path/to/directory

How to Rename Files in the Shell

To rename files in the Terminal or Command-Line, use the mv command in the shell by providing the current name and the new name of the file in the same directory.

$ mv current_name.txt new_name.txt
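The copy, move and rename operations described in this section can be chained together. Here is a minimal sketch on made-up files (data.csv, backup.csv and the archive folder are assumptions for the demo).

```shell
# Hypothetical files created for this demo
demo=$(mktemp -d)
mkdir "$demo/archive"
echo "id,value" > "$demo/data.csv"

# Copy the file under a new name
cp "$demo/data.csv" "$demo/backup.csv"

# Move the copy into the archive directory
mv "$demo/backup.csv" "$demo/archive"

# Rename the original file (same directory, new name)
mv "$demo/data.csv" "$demo/clean_data.csv"

# Inspect the result
top_level=$(ls "$demo")
archived=$(ls "$demo/archive")
echo "$top_level"
echo "$archived"

# Clean up
rm -rf "$demo"
```

After these commands, the top level contains the archive folder and clean_data.csv, and the archive folder contains backup.csv.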

How to Delete Files in the Shell

To delete files in the Terminal or Command-Line, use the rm command, short for remove, in the shell by providing the path of the file to delete.

$ rm file_to_delete.txt

How to Delete Multiple Files in the Shell with rm

It is possible to delete multiple files at once in the Terminal or Command-Line by using the rm command in the shell and listing all the filenames separated by spaces.

$ rm file1.txt file2.txt

How to Delete Directories in the Shell with rmdir

To delete directories in the Terminal or Command-Line, use the rmdir command in the shell by providing the name of the directory to remove. This command only works on empty directories, so you need to delete the contents of a directory before deleting the directory itself.

$ rmdir path/to/delete
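The empty-directory requirement is easy to verify. In the sketch below (the old_data folder and file.txt are made up for the demo), the first rmdir attempt is refused, and the second one succeeds once the file is gone.

```shell
# Hypothetical directory with one file, created for this demo
demo=$(mktemp -d)
mkdir "$demo/old_data"
touch "$demo/old_data/file.txt"

# First attempt: rmdir refuses because the directory is not empty
if rmdir "$demo/old_data" 2>/dev/null; then
  first_try="removed"
else
  first_try="refused"
fi

# Delete the file first, then remove the now-empty directory
rm "$demo/old_data/file.txt"
rmdir "$demo/old_data"

# Check whether the directory still exists
if [ -d "$demo/old_data" ]; then
  still_exists="yes"
else
  still_exists="no"
fi

echo "$first_try"
echo "$still_exists"

# Clean up
rm -rf "$demo"
```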

How to Create Directories in the Shell with Mkdir

To create directories in the Terminal or Command-Line, use the mkdir command in the shell by providing the name of the directory to create.

$ mkdir path/to/create
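One thing to watch for with nested paths: plain mkdir only creates the last directory of the path and fails when the parent directories are missing. The -p flag creates the missing parents as well. A small sketch, run inside a temporary folder so that nothing is created in your own directories:

```shell
# Temporary location for this demo
demo=$(mktemp -d)

# Plain mkdir fails here: the parents "path/to" do not exist yet
if mkdir "$demo/path/to/create" 2>/dev/null; then
  plain="created"
else
  plain="failed"
fi

# With -p, the missing parent directories are created too
mkdir -p "$demo/path/to/create"
if [ -d "$demo/path/to/create" ]; then
  with_p="created"
else
  with_p="failed"
fi

echo "$plain"
echo "$with_p"

# Clean up
rm -rf "$demo"
```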

How to Create Files in the Shell with Touch

To create files in the Terminal or Command-Line, use the touch command in the shell by providing the name of the file to create.

$ touch filename.txt

What are Command-Line Flags

Command-line flags are used to change the default behaviour of shell commands. A short flag is a dash symbol (-) followed by one or two characters. Each command has its own series of flags that it can be used with.

Double dashes (--) introduce a longer, more descriptive form of a flag (e.g. -h vs --help).

For example, the ls command can be used with dozens of flags such as -A, -a, -b, -c, -C, etc.

For example, the -s flag shows the size of each item listed by the ls command, expressed as the number of blocks allocated on disk.

$ ls -s

How to Manipulate Data in the Shell

The command-line or Terminal provides commands to manipulate data inside files so that you can automate file modification with efficient commands in the Shell.

How to Look at the Content of a File in Shell

To open files and look at their contents in the command-line shell, use the cat command, short for concatenate, and provide the path of the file to be inspected.

$ cat filename.txt
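Since cat stands for concatenate, it can also print several files one after the other, which is a simple way to merge datasets that share the same columns. The sketch below assumes two made-up CSV parts and uses the > operator to send the combined output to a new file.

```shell
# Hypothetical dataset split in two parts, created for this demo
demo=$(mktemp -d)
printf 'id,city\n1,Paris\n' > "$demo/part1.csv"
printf '2,Tokyo\n' > "$demo/part2.csv"

# Print both files in order and redirect the result to merged.csv
cat "$demo/part1.csv" "$demo/part2.csv" > "$demo/merged.csv"

# Look at the merged file
merged=$(cat "$demo/merged.csv")
echo "$merged"

# Clean up
rm -rf "$demo"
```

Note that this only works cleanly here because the second part has no header row of its own.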

How to View the Head of a File in the Shell

To view the first few lines of a document (10 by default) in the Terminal or Command-Line, use the head command in the shell by providing the name of the file to be inspected.

$ head filename.txt

How to View the First n Rows of a File in the Shell

To view the first N lines of a document in the Terminal or Command-Line, use the head command in the shell, the -n command flag and the name of the file to be inspected.

For example, the command flag -n 5 will only show the first 5 lines of a file.

$ head -n 5 filename.txt
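Here is a small sketch of the -n flag in action on a made-up seven-line file (sample.txt is an assumption for the demo).

```shell
# Hypothetical seven-line file created for this demo
demo=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\nline6\nline7\n' > "$demo/sample.txt"

# Keep only the first 3 lines
first_three=$(head -n 3 "$demo/sample.txt")
echo "$first_three"

# Clean up
rm -rf "$demo"
```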

How to View Subsets of Columns of a File in the Shell

To view specific columns of a document in the Terminal or Command-Line, use the cut command in the shell with the -f and -d flags, as well as the name of the file to be inspected.

The -f flag lets you specify the indexes of the columns to select (here 2 to 4, counting from 1) and the -d flag lets you specify the delimiter to be used (here the comma ,).

$ cut -f 2-4 -d , filename.csv
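To make the column selection concrete, the sketch below runs the same cut command on a made-up five-column CSV file (people.csv and its content are assumptions for the demo).

```shell
# Hypothetical five-column CSV created for this demo
demo=$(mktemp -d)
printf 'id,name,city,country,age\n1,Ana,Lima,Peru,31\n2,Ben,Oslo,Norway,45\n' > "$demo/people.csv"

# Keep columns 2 to 4, using the comma as the delimiter
cols=$(cut -f 2-4 -d , "$demo/people.csv")
echo "$cols"

# Clean up
rm -rf "$demo"
```

The id and age columns are dropped, and only name, city and country remain. Keep in mind that cut splits on every comma, so it does not handle quoted fields that contain commas.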

How to Show All Previous Commands in the Shell

To view the previous commands used in the Terminal or Command-Line, use the history command in the shell.

$ history

What is the Grep Command in the Shell

The grep command is used in the shell to search for rows in text files that match specific strings. It can be used to match exact strings or patterns such as regular expressions.

How to Use Grep in Shell

To use the grep command in the Terminal or Command-Line, type the grep command in the shell, followed by any grep-specific flags (optional), the string to be matched and the filename(s) to search.

$ grep <-flags> pattern <filename>

How to Search for Rows with Specific Text in Shell

To view the rows that contain specific text inside files using the Terminal or Command-Line, use the grep command in the shell, followed by the text to match and the filename.

$ grep text filename.csv

How to Find Rows that Match a RegEx in Shell

To view the rows of a file that match a regular expression using the Terminal or Command-Line, use the grep command in the shell, followed by the regular expression to match between single quotes and the filename.

$ grep 'regular-expression' filename.csv
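As an illustration, the sketch below uses a simple regular expression to keep only the rows whose date falls in the first three months of 2024. The sales.csv file and its content are made up for the demo.

```shell
# Hypothetical CSV with a date column, created for this demo
demo=$(mktemp -d)
printf 'date,amount\n2024-01-15,100\n2023-12-31,250\n2024-03-02,75\n' > "$demo/sales.csv"

# ^ anchors the match at the start of the row,
# [1-3] matches a single character among 1, 2 and 3
matches=$(grep '^2024-0[1-3]' "$demo/sales.csv")
echo "$matches"

# Clean up
rm -rf "$demo"
```

The single quotes prevent the shell from interpreting the special characters of the pattern before grep receives it.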

Common Grep Flags

The most commonly used flags of the grep command in the shell are the -i, -v, -c and -e flags:

  • -i: ignore case (e.g. match both HELLO and hello)
  • -v: show rows that don’t match
  • -c: count the number of rows that match
  • -e: specify a search pattern; repeat the flag to search for more than one pattern

Here is a full list of grep related flags.
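The four flags can be tried side by side on a small file. The sketch below assumes a made-up cities.csv file, created only for this demo.

```shell
# Hypothetical CSV created for this demo
demo=$(mktemp -d)
printf 'name,city\nAna,Lima\nBob,PARIS\nCy,paris\nDee,Oslo\n' > "$demo/cities.csv"

# -i: match "paris" regardless of case
any_case=$(grep -i paris "$demo/cities.csv")

# -c: count the matching rows instead of printing them
count=$(grep -c -i paris "$demo/cities.csv")

# -v: keep the rows that do NOT match
others=$(grep -v -i paris "$demo/cities.csv")

# -e: repeat the flag to search for several patterns at once
either=$(grep -e Lima -e Oslo "$demo/cities.csv")

echo "$any_case"
echo "$count"
echo "$others"
echo "$either"

# Clean up
rm -rf "$demo"
```

Note that -v also keeps the header row here, since the header does not contain the pattern.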

Shell Shortcuts

Here are a few keyboard shortcuts that can be used in the shell to improve efficiency.

How to Autocomplete Shell Commands using Tab

To perform tab completion in the Shell, type the command name followed by the first few characters of the path, and then press the Tab key on your keyboard to complete the path.

$ ls aut (tab)

Output

$ ls autocompleted/path

How to Show Last Command in the Shell using Arrows

The up and down arrows of your keyboard can be used to navigate through the commands that were previously run in the Terminal or Command-line Shell. With the up arrow, you move back to the previous command. With the down arrow, you move forward to the command that was run after the one currently displayed.

What is the ! Before a Shell Command?

An exclamation point used before a command name in the Terminal or Command-line means that you re-run the most recent command in your history that starts with that name.

So if you have run the following commands in your history:

$ ls directory_name
$ head filename.csv
$ head file2.csv

Then, the !head command will re-run the following command.

$ head file2.csv

How to Run Python Code in the Shell

To run Python code in the shell, use the python command, followed by the name of the .py file to execute in the command-line or Terminal. You need to install Python first. On some systems, the interpreter is installed under the name python3 instead.

$ python filename.py

How to Run Wget in the Shell

To download a file from the web in the Shell, use the wget command, followed by any relevant flags and the URL of the file to download. You need to install wget first.

$ wget https://www.jcchouinard.com/robots.txt

Follow this tutorial to learn how to use the wget command.

Conclusion

This is the end of the introduction to the Unix shell for data science.
