Learning the Unix shell, also known as command-line interface (CLI), command prompt or terminal, is beneficial for data scientists because it allows efficient data manipulation and automation, enabling tasks like cleaning, preprocessing, and automating workflows.
The shell facilitates access to data and the management and organization of files, and makes it easier to handle and merge datasets. The more advanced you get in data science, the more you will end up using the shell.
This tutorial shows you how to use the Shell for Data Science.
What is the Shell
The Shell, or command-line shell, is the program used for executing commands (e.g. Terminal, Command-line). It takes the instructions typed by the user and passes them to the operating system, such as Windows, macOS or Linux. The Shell is used to preview, modify and inspect files and directories.
The shell predates the graphical file explorers that operating systems later added to make home computers accessible to more users. Nowadays, you open your computer and use your mouse to navigate through applications and directories. Back then, the command-line shell was the only way to run programs.
Even today, programmers, engineers and data scientists use the shell daily as a prime tool for their work.
On Windows, the shell is usually called the command line (or Command Prompt) and, on macOS, the Terminal. I will use the command-line, terminal and shell interchangeably throughout this tutorial.
Why use the Shell in Data Science?
The shell is an efficient way to manage and manipulate large datasets. Its commands can be combined and automated, which helps data scientists deal with the volume of data they work with.
How the Shell Works
The command-line (shell) works on top of the filesystem, which manages the files and directories. Any file or directory has an absolute path from the root directory of the filesystem.
Thus, a file that is at the root directory would look like /filename.txt. Each folder (or directory) is separated by a slash (/). For example, if the file is located inside a folder, the path would look something like /Users/jchouinard/filename.txt.
A directory is a location used to store files on your computer (also called a folder).
Relative Paths Vs Absolute Paths
An absolute path in the shell is the path that starts from the root of the filesystem. The relative path is a path that starts from where you currently are in the shell.
For example, if you are looking for the project directory and you are inside the /users/Documents directory, the absolute and relative paths will be different.
$ ls /users/Documents/project # Absolute path
$ ls project # Relative path, from inside /users/Documents
This concept will be useful to understand when working with directories.
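To make the difference concrete, here is a minimal sketch that reaches the same directory through a relative path and through an absolute path. It assumes a temporary directory created with mktemp as a stand-in for /users/Documents, and the variable names are only illustrative. The lines are written without the leading prompt sign so they can be pasted as-is.

```shell
# Assumption: a temporary directory stands in for /users/Documents
docs=$(mktemp -d)
trap 'rm -rf "$docs"' EXIT   # remove the temporary directory when the shell exits
mkdir "$docs/project"

# Relative path: resolved from the current working directory
cd "$docs"
cd project
relative_result=$(pwd -P)

# Absolute path: starts at the root, so it works from anywhere
cd /
cd "$docs/project"
absolute_result=$(pwd -P)

echo "relative: $relative_result"
echo "absolute: $absolute_result"
```

Both lines print the same location. The relative path only worked because the shell was already inside the parent directory.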
What is the Dollar ($) or Percent (%) Sign in the Shell
In many tutorials, you will find a dollar sign ($) or a percent sign (%) at the start of lines that are meant to be typed in the command line.
$ pwd
# or
% pwd
This mimics the prompt that you actually see in the Terminal.
The sign is not part of the command: if you copy and paste it into your shell, the command will not work.
Just make sure that you exclude this sign when copying and pasting code from this tutorial to the shell.
What are the Command-Line Commands
The Terminal (or command-line) uses commands to know what to do.
Most Useful Shell Commands
The most commonly used shell commands are:
- cd: navigate directories
- pwd: show the current working directory
- ls: list files and directories
- mkdir: create directories
- cp: copy files
- mv: move files or directories
- cat: read files
- touch: create files
- head: view the first lines of a file
- grep: search text files for lines that match a regular expression
- clear: clear the shell
- python: run the Python interpreter
- wget: download files from the web
Getting Help on a Command
To get help documentation for a specific command in the shell, use the man command, short for manual, along with the name of the command.
$ man <command>
The manual command will return textual documentation on whatever command name it is given.
Type q to quit the manual.
Alternatively, you can use the --help flag with the command name.
$ grep --help
In this section, we will learn to navigate in the shell using mostly these three shell commands:
- pwd
- ls
- cd
How to Find What is the Current Working Directory in the Command-line Shell
The first command that you can use to navigate the shell is the pwd command, short for print working directory.
The pwd command shows the absolute path of the current working directory.
$ pwd
How to List Files in the Command-line Shell
To list files and directories that are inside your current working directory (cwd) in the Terminal shell, use the ls shell command, short for list.
How to List Files in the Current Working Directory
The ls command will list the contents of the current working directory.
$ ls
Note that the single-dot notation (.) always means the current working directory. Thus, doing ls . is the same as simply doing ls.
$ ls .
How to List Files Inside a Specific Directory
To list files inside a specific directory using the Terminal (or command-line), use the ls command and add the directory location.
$ ls path/to/directory
How to List Files Inside a Directory that Has Spaces in it
To list files inside a directory that has spaces in the name, use the backslash (\) to escape the whitespace.
$ ls Documents/my\ projects
How to List Every Single File in a Directory (Including Nested)
To list everything inside a directory in the Terminal or Command-Line, use the ls command in the shell with the -R command flag and, optionally, the name of the directory to be inspected. The -R flag lists all subdirectories recursively.
$ ls -R
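The listing commands above can be tried together on a small throwaway tree. This is only a sketch: the temporary directory and the names readme.txt, my projects and data.csv are made up for the illustration, and mkdir -p is used to create the nested directories in one step. Quoting the directory name is shown as an alternative to the backslash.

```shell
# Assumption: a hypothetical directory tree built in a temporary location
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/my projects/raw"   # -p also creates the parent directories
touch "$workdir/readme.txt" "$workdir/my projects/raw/data.csv"
cd "$workdir"

ls                   # contents of the current working directory
ls .                 # same result, using the single-dot notation
ls my\ projects      # backslash escapes the space in the directory name
ls "my projects"     # quoting the name works as well

# -R walks through every subdirectory
recursive_listing=$(ls -R)
echo "$recursive_listing"
```

The recursive listing shows each subdirectory followed by its contents, so data.csv appears even though it is two levels down.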
How to Move Directories in the Command-line Shell
To move around directories in the command-line shell, use the cd command, short for change directory, along with the location where you want to move to.
How to Move to a Specific Directory in the Command-line
To move to a specific directory in the Terminal shell, use the cd command along with the relative or absolute path where you want to move to.
$ cd path/to/directory
How to Move Up a Directory in the Command-line
There are two ways to move up to a parent directory in the shell: using the cd command along with the absolute path of the parent, or with the double-dot (..) notation. The .. notation in the shell means the directory above the current working directory.
$ cd ..
How to Move Up Multiple Directories in the Command-line
To move up multiple directories in the shell, use the cd command along with the ../.. notation, where each additional /.. moves up one more directory.
Move up two directories in Shell
$ cd ../..
Move up three directories in Shell
$ cd ../../..
How to Move to the Home Directory in the Shell
To move to the home directory in the command-line shell, use the cd command with the tilde character (~).
$ cd ~
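Here is a short sketch that chains the navigation commands from this section. The nested directories a/b/c are hypothetical and live in a temporary directory, and pwd -P is used so that the printed paths have any symbolic links resolved.

```shell
# Assumption: a hypothetical nested tree a/b/c in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/a/b/c"

cd "$workdir/a/b/c"
start=$(pwd -P)          # ends with /a/b/c

cd ..                    # up one directory
one_up=$(pwd -P)         # ends with /a/b

cd ../..                 # up two more directories
back_at_top=$(pwd -P)    # the temporary directory itself

echo "$start"
echo "$one_up"
echo "$back_at_top"

cd ~ && pwd              # jump to the home directory and print it
```

Printing the working directory after each move is a simple way to check where you are before running other commands.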
How to Clear the Shell
Use the clear command to clear the previous commands and their output from the screen and make the shell easier to read.
$ clear
Note that the clear command does not delete previous commands, but simply starts a new line at the top. You can always scroll up to previous lines.
How to Manipulate Files and Directories in the Shell
The shell provides commands to navigate through files and directories. You can efficiently manage and manipulate files, extract data and merge datasets with simple commands in the Shell.
How to Copy Files in the Shell
To copy files in the command-line shell, use the cp command, short for copy, followed by the path of the original file and the path (including the filename) of the copy.
$ cp original.txt destination.txt
How to Copy Files to a Different Location in Command-Line
To copy files from one location to another in the command-line shell, use the cp command, short for copy, and provide the original file and the path where to copy the file to.
$ cp path/from/file.txt path/to/copied_file.txt
How to Copy Multiple Files to a New Location in Command-Line
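To copy multiple files from one location to another directory in the command-line shell, use the cp command, add the source path, and use the curly brackets ({}) to list the filenames to be copied, separated by commas and without spaces. This brace expansion is a feature of shells such as Bash and Zsh.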
$ cp /home/{file1,file2,file3} /home/destination/
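The three ways of copying can be sketched on a few throwaway files. The file and directory names below are hypothetical, and the files are listed one by one instead of using curly brackets so that the example also runs in shells without brace expansion.

```shell
# Assumption: hypothetical files created in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir destination
echo "a" > file1.txt
echo "b" > file2.txt

cp file1.txt copy_of_file1.txt          # copy next to the original
cp file1.txt destination/renamed.txt    # copy to another location under a new name
cp file1.txt file2.txt destination/     # copy several files into a directory

ls destination
```

The originals stay in place after each cp, which is the main difference from mv.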
How to Move Files in the Shell
To move files from one directory to another in the command-line shell, use the mv command, short for move, followed by the path of the original file and the path where to move the file to.
$ mv path/from/file.txt path/to/directory
How to Move Multiple Files to a New Location in Command-Line
To move multiple files from one location to another directory in the command-line shell, use the mv command, list each file to move separated by a space, and finally add the location where to move the files to.
$ mv file1.txt file2.txt path/to/directory
How to Rename Files in the Shell
To rename files in the Terminal or Command-Line, use the mv command in the shell by providing the current name and the new name of the file in the same directory.
$ mv current_name.txt new_name.txt
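Moving and renaming use the same command, which the following sketch illustrates. The archive directory and the file names are assumptions made for the example and are created in a temporary directory.

```shell
# Assumption: hypothetical files created in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir archive
touch file1.txt file2.txt current_name.txt

mv file1.txt file2.txt archive/       # move several files into a directory
mv current_name.txt new_name.txt      # same directory, new name: a rename

ls . archive
```

Unlike cp, mv leaves nothing behind at the original location.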
How to Delete Files in the Shell
To delete files in the Terminal or Command-Line, use the rm command, short for remove, in the shell by providing the path of the file to delete.
$ rm file_to_delete.txt
How to Delete Multiple Files in the Shell with rm
It is possible to delete multiple files at once in the Terminal or Command-Line by using the rm command in the shell and listing all the filenames separated by spaces.
$ rm file1.txt file2.txt
How to Delete Directories in the Shell with rmdir
To delete directories in the Terminal or Command-Line, use the rmdir command in the shell by providing the name of the directory to remove. This command only works for empty directories, thus you need to delete files before deleting directories.
$ rmdir path/to/delete
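The rule that rmdir only removes empty directories can be checked with a small sketch. The reports directory and its files are hypothetical, and the variable first_attempt only records whether the first rmdir call succeeded.

```shell
# Assumption: a hypothetical directory with two files, in a temporary location
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir reports
touch reports/file1.txt reports/file2.txt

# rmdir refuses to remove a directory that still contains files
if rmdir reports 2>/dev/null; then
    first_attempt="removed"
else
    first_attempt="refused"
fi
echo "first attempt: $first_attempt"

# Delete the files first, then the now-empty directory
rm reports/file1.txt reports/file2.txt
rmdir reports
```

Deleting with rm and rmdir is permanent. The files do not go to a trash folder, so it is worth listing a directory before removing anything from it.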
How to Create Directories in the Shell with Mkdir
To create directories in the Terminal or Command-Line, use the mkdir command in the shell by providing the name of the directory to create.
$ mkdir path/to/create
How to Create Files in the Shell with Touch
To create files in the Terminal or Command-Line, use the touch command in the shell by providing the name of the file to create.
$ touch filename.txt
Echo
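The echo shell command is used to display text on the command-line interface (CLI). Combined with the redirection operators > and >>, it can also write text to files.
Create a text file with echo. Note that > overwrites the file if it already exists.
$ echo "some text" > somefile.txt
Append to a text file with echo.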
$ echo "append some text" >> somefile.txt
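The difference between the two redirection operators is easy to miss, so here is a minimal sketch of it. It reuses the somefile.txt name from above inside a temporary directory. The line_count variable is only there to show how many lines the file holds before it is overwritten.

```shell
# Assumption: somefile.txt is created in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "some text" > somefile.txt           # creates the file
echo "append some text" >> somefile.txt   # adds a second line
cat somefile.txt

# Count the lines (tr strips the padding some systems add to the wc output)
line_count=$(wc -l < somefile.txt | tr -d ' ')
echo "lines: $line_count"

echo "replaced" > somefile.txt            # a single > replaces the whole content
cat somefile.txt
```

Using > on an existing file discards what was in it, so >> is the safer choice when adding to a file you want to keep.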
What are Command-Line Flags
Command-line flags modify the default behavior of shell commands. Short flags are usually a single character that follows the dash symbol (-). Each command has its own series of flags that it can be used with.
Double dashes (--) introduce the longer, more descriptive form of a flag (e.g. -h vs --help).
For example, the ls command can be used with many flags such as -A, -a, -b, -c, -C, etc. The exact list depends on your system.
For instance, the -s flag shows the size of each item listed by the ls command, counted in filesystem blocks rather than bytes.
$ ls -s
How to Manipulate Data in the Shell
The command-line or Terminal provides commands to manipulate data inside files so that you can automate file modification with efficient commands in the Shell.
How to Look at the Content of a File in Shell
To open files and look at their contents in the command-line shell, use the cat command, short for concatenate, and provide the path of the file to be inspected.
$ cat filename.txt
How to View the Head of a File in the Shell
To view the first few lines of a document (10 by default) in the Terminal or Command-Line, use the head command in the shell by providing the name of the file to be inspected.
$ head filename.txt
How to Show Hidden Files in the Shell
Use the ls command with the -a flag to show hidden files or directories in the Shell. Hidden files and directories are those whose names start with a dot (.), and ls does not display them by default.
$ ls -a
How to View the First n Rows of a File in the Shell
To view the first N lines of a document in the Terminal or Command-Line, use the head command in the shell with the -n command flag and the name of the file to be inspected.
For example, the command flag -n 5 will only show the first 5 lines of a file.
$ head -n 5 filename.txt
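To see the effect of the -n flag, the sketch below builds a small file and compares head with and without it. The 12-line filename.txt (a header and 11 numbered rows) is made up for the example and written to a temporary directory.

```shell
# Assumption: a hypothetical 12-line file (1 header + 11 rows)
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "id" > filename.txt
i=1
while [ "$i" -le 11 ]; do
    echo "$i" >> filename.txt
    i=$((i + 1))
done

# Without a flag, head stops after 10 lines
default_rows=$(head filename.txt | wc -l | tr -d ' ')
echo "default number of lines: $default_rows"

# With -n 5, only the header and the first 4 rows are shown
first_five=$(head -n 5 filename.txt)
echo "$first_five"
```

On large datasets, this is a quick way to check the header and a few rows without opening the whole file.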
How to View Subsets of Columns of a File in the Shell
To view specific columns of a document in the Terminal or Command-Line, use the cut command in the shell with the -f and -d command flags as well as the name of the file to be inspected.
The -f flag lets you specify the column indexes to select (here 2 to 4) and the -d flag lets you specify the delimiter to be used (here the comma ,).
$ cut -f 2-4 -d , filename.csv
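Here is what that command returns on a small file. The CSV content below is invented for the illustration and written to a temporary directory. Keep in mind that cut splits on every delimiter it finds, so this sketch assumes that no field contains a comma inside quotes.

```shell
# Assumption: a hypothetical 5-column CSV file with no quoted commas
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'id,name,city,country,age\n1,Ana,Lima,Peru,31\n2,Bo,Oslo,Norway,45\n' > filename.csv

# Keep columns 2 to 4, using the comma as the delimiter
columns_2_to_4=$(cut -f 2-4 -d , filename.csv)
echo "$columns_2_to_4"
```

The id and age columns are dropped and the three middle columns are kept, still separated by commas.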
How to Show All Previous Commands in the Shell
To view the previous commands used in the Terminal or Command-Line, use the history command in the shell.
$ history
What is the Grep Command in the Shell
The grep command is used in the shell to search for rows in text files that match specific strings. It can be used to match exact strings or patterns such as regular expressions.
How to Use Grep in Shell
To use the grep command in the Terminal or Command-Line, type grep in the shell, followed by the string to be matched and the filename(s) to search in. Optionally, add grep-specific flags before the pattern.
$ grep <-flags> pattern <filename>
How to Search for Rows with Specific Text in Shell
To view the rows that contain specific text inside files using the Terminal or Command-Line, use the grep command in the shell, followed by the text to match and the filename.
$ grep text filename.csv
How to Find Rows that Match a RegEx in Shell
To view the rows of a file that match a regular expression using the Terminal or Command-Line, use the grep command in the shell, followed by the regular expression to match between single quotes and the filename.
$ grep 'regular-expression' filename.csv
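A plain-text search and a regular-expression search can be compared on the same file. The CSV rows and the pattern '^[23],' below are assumptions chosen for the example: the pattern keeps the rows that start with 2 or 3 followed by a comma.

```shell
# Assumption: a hypothetical CSV file in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'id,name,city\n1,Ana,Lima\n2,Bo,Oslo\n3,Cy,Lyon\n' > filename.csv

# Plain text: rows that contain the string Lima
text_match=$(grep Lima filename.csv)
echo "$text_match"

# Regular expression: rows whose first field is 2 or 3
regex_match=$(grep '^[23],' filename.csv)
echo "$regex_match"
```

The single quotes keep the shell from interpreting the special characters of the pattern before grep receives it.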
Common Grep Flags
The most commonly used flags of the grep command in the shell are the -i, -v, -c and -e flags:
- -i: ignore casing (e.g. HELLO and hello)
- -v: show rows that don't match
- -c: count the number of rows that match
- -e: specify a search pattern; repeat the flag to search for more than 1 pattern
The full list of grep-related flags is available in the manual (man grep).
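The four flags can be tried side by side on a small text file. The greetings.txt file and its four lines are hypothetical and only serve to make the effect of each flag visible.

```shell
# Assumption: a hypothetical 4-line text file in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'HELLO world\nhello there\ngoodbye\nsee you\n' > greetings.txt

ignore_case=$(grep -i hello greetings.txt)        # matches HELLO and hello
inverted=$(grep -v -i hello greetings.txt)        # rows without any form of hello
match_count=$(grep -c -i hello greetings.txt)     # number of matching rows
either=$(grep -e goodbye -e see greetings.txt)    # rows matching one pattern or the other

echo "$ignore_case"
echo "$inverted"
echo "$match_count"
echo "$either"
```

In this file, the inverted search and the two-pattern search return the same two rows, which shows that flags can often be combined in more than one way to reach the same result.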
Shell Shortcuts
Here are a few keyboard shortcuts that can be used in the shell to improve efficiency.
How to Autocomplete Shell Commands using Tab
To perform tab completion in the Shell, type the command name and the first few characters of the path, then press the Tab key on your keyboard to complete the path.
$ ls aut (tab)
Output
$ ls autocompleted/path
How to Show Last Command in the Shell using Arrows
The up and down arrows of your keyboard can be used to navigate through the commands that were previously run in the Terminal or Command-line Shell. With the up arrow, you move back to the previous command; with the down arrow, you move to the command that was run after the currently displayed one.
What is the ! Before a Shell Command?
An exclamation point placed before a command name in the Terminal or Command-line re-runs the most recent command in your history that starts with that name.
So if you have run the following commands in your history:
$ ls directory_name
$ head filename.csv
$ head file2.csv
Then, the !head command will re-run the following command.
$ head file2.csv
How to Run Python Code in the Shell
To run Python code in the shell, use the python command, followed by the name of the .py file to execute in the command-line or Terminal. You need to install Python first.
$ python filename.py
How to Run Wget in the Shell
To download a file with wget in the Shell, use the wget command, followed by any relevant flags and the URL to fetch. You need to install wget first.
$ wget https://www.jcchouinard.com/robots.txt
Follow this tutorial to learn how to use the wget command.
Conclusion
This is the end of the introduction to the Unix shell for data science.
SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.