Learning to copy files using the command line is one of the most difficult tasks some students will encounter during Workshop practicals.
The faculty are not forcing the students to copy files using the command line based on a “that’s the way we did it” mentality, but rather on our current experience. Much of the data analysis that happens today is done on computing clusters in which the users’ only interaction with the computer is on the command line. Learning to use the command line effectively is an extremely important skill for the toolbox of anybody with data analysis ambitions.
This document is long because it attempts to explain things from as near to first principles as possible. You are likely familiar with many of the concepts discussed in the next section; if so, skip to the topics that will be useful.
If you are already familiar with navigating directories and using the command line to copy files, then you should find getting started with the practicals to be straightforward.
You will create some directories to organize things, and then in most cases copy scripts and data out of the appropriate directory in /faculty to the directories you’ve created.
Because this may be a completely new topic for many of the students, we’ll start with some definitions.
- command line: The command line refers to a text-based interface with a computer. Examples of these are the MS Windows Command Shell and PowerShell, and the MacOS Terminal. In this course, connections to the cloud computing environment’s command line will be made using SSH.
- cli: An abbreviation for “command line interface”. This is often used when describing programs that are run at the command line. For example, R has a cli, but there are also other methods to interact with R, such as RStudio. PRSice also has a cli.
- SSH: A versatile tool for securely transferring data between computers. In this course students may use SSH to access a command line in the cloud computing environment, or to copy files to or from the cloud computing environment.
- directory: A directory, also called a “folder”, is an organizational unit for computer files. Files exist within a directory. Directories may contain files, other directories (subdirectories), or even nothing at all.
- folder: Synonym for “directory”. The two terms can be used interchangeably.
- directory you’re in: This, or other phrases involving “in”, refers to the current working directory for the command line. Commands that do not specify a different directory will operate on the directory you are in. For example, less foo will show the contents of a file named “foo” in the directory you are in, while less /faculty/foo will show the contents of a file named “foo” in the /faculty directory.
- cd “change directory”: The command used to change what directory you are in. It is similar to
- home directory: Each user has a “home directory” where all of their files are stored. This can be abbreviated as ~ (the tilde symbol).
- faculty directory: For convenience, all of the faculty members’ home directories are assembled in a folder called /faculty.
- subdirectory: A directory that is within another directory. All directories (except the “root” directory) are subdirectories of other directories. Usually “subdirectory” will be used when this relationship is important. For example, instructions may say “copy the ‘HW2’ directory into a subdirectory of your ‘Day1’ directory.”
- / “forward slash”: The forward slash is the Unix/Linux/MacOS directory separator. When writing out directory names, the “/” is used to separate directories and subdirectories. For example, ~/Day1/HW2 refers to a subdirectory named “HW2”, which is inside a directory named “Day1”, which is inside your home directory.
- path: “Path” is used to refer to a series of directories and subdirectories. ~/Day1/HW2 is a path.
- file: an entity on a computer file system. Files may contain text, data, program instructions, or application specific data, such as a PowerPoint slide deck.
- . “dot”: (A single period, or “dot”) This represents the directory the command line is operating in.
- .. “dot dot”: (Two periods, or “dots”) This represents the directory one level higher in the hierarchy.
- copy: The act of duplicating a file or directory from its origin to a different location or name. This action is usually done with the cp command.
- move: The act of removing a file or directory from its origin and putting it in a different location, or changing its name. This action is usually done with the mv command.
- cp “copy”: The Unix/Linux/MacOS command used to copy files or directories.
- mv “move”: The Unix/Linux/MacOS command used to move or rename files or directories.
- ls “list”: The Unix/Linux/MacOS list command. It shows the names of files and directories, and can also show other information about them.
- less “sometimes less is more”: A general purpose tool for looking at the contents of a text file.
- mkdir “make directory”: The command to create a directory. For example, mkdir foo will create an empty directory named “foo”.
- * / wild cards / globbing: These are characters which can be used to match multiple other characters. They are a powerful way to avoid typing multiple file names when an action is to be performed on several files. For example, foo.* could be used to match foo.boz. The collective name of the characters used is “wild cards,” and the action of matching wild cards to files is called globbing.
- command line switches or options: Extra text given to a command to affect its behavior. Switches are often preceded by - or --. For example, in cp -v the “-v” is a switch to the “cp” command.
- command line arguments: The text after a command which tells the command what to operate on. For example, in cp foo bar, “foo” and “bar” are arguments to the “cp” command. Some commands may require switches before some arguments.
- ENTER or RETURN: After typing a command at the command line, the ENTER or RETURN key must be pressed to submit the command.
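Several of the definitions above (path, subdirectory, “.”, and “..”) can be tried out safely in a scratch directory. The following is a minimal sketch, assuming a Bash-like shell. The names Day1 and HW2 come from the examples above; the sandbox variable, the mktemp command used to create a throwaway directory, and the file notes.txt are assumptions added only for illustration.

```shell
# Create a throwaway directory so nothing in your real home directory is touched.
# ("sandbox" is a hypothetical variable name used only in this sketch.)
sandbox=$(mktemp -d)
cd "$sandbox"

# Build the path Day1/HW2: a directory with a subdirectory inside it.
mkdir Day1
mkdir Day1/HW2

# Put a small text file (hypothetical name) inside HW2.
echo "hello" > Day1/HW2/notes.txt

# Move into the subdirectory by giving its path.
cd Day1/HW2

# "." is the directory you are in, so this lists the contents of HW2.
ls .

# ".." is one level higher, so this lists the contents of Day1.
ls ..

# Go up one level; the directory you are in is now Day1.
cd ..
pwd
```

The pwd command at the end prints the full path of the directory you are in, which is a quick way to check where a sequence of cd commands has left you.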
Display conventions in this document
Text will be shown in several different fonts and formats to express meaning.
A fixed, or typewriter, font represents text on a command line: either something that the user types, or something that the computer outputs.
A screen shot of a terminal will show a sequence of command line entries and responses.
An example screen shot
Required arguments to a command will be represented by text surrounded by pointy brackets < >. For example, in cp <source> <destination> it is shown that some argument must be provided in the “source” and “destination” locations. When substituting in real values for the arguments, the pointy brackets are not included. So the typed command would look like cp source destination, to copy the file “source” to a file named “destination”.
Optional arguments are shown with square brackets [ ]. These are arguments which are not necessary for the command to function, but may be provided by the user to achieve desired results.
Anatomy of a command line
A command line ready for input
There are several items on the default command line used at the Workshop.
- The first part is your username. In this case the example username is
- @ is a separator.
- Then comes the computer name. In this example it is ip-10-0-201-191, but the exact name will be different depending on which cloud node you are connected to.
- : is a separator.
- ~ shows the current directory path. ~ is used as a shorthand for the current user’s home directory.
- $ is the end of the command prompt. Anything typed will appear after the $. Instructions later may show, for example, $ ls which will mean the user has typed ls at the command line.
- The green rectangle is the cursor. Depending on your SSH client and exact terminal settings, the exact color and shape of the cursor will vary.
Putting that all together, if you see a command line showing
That means the user
smith12 is logged into the compute node
ip-10-0-200-233 and is currently in their home directory, and then in the subdirectories
Looking at files and directories
The list command
ls is the command used to list the names of files and directories.
ls with output
ls is run at the command line, and it shows that there are two things in the current directory, named “R” and “foo”.
Switches can be given to ls to have it provide more information.
ls -l with output
The first “d” in drwxr-xr-x shows that the thing named “R” is a directory, and the first “-” in -rw-r--r-- shows that “foo” is a regular file. The letters following the first one have to do with permissions, and aren’t important at the moment.
Next is shown the owner of the files, “student”, and the group of the files, “students”. These also aren’t important for what we’re doing.
Next is shown the size of the file, then the date and time the file was last modified, and finally the name of the file or directory.
ls and ls -l are extremely useful for seeing what files and directories exist.
ls can be given a directory as an argument, and it will show the contents of that directory.
In all of these examples, ls is showing directories in blue. That will probably be how your screen looks, but depending on exactly which terminal and SSH client you use, directories may be shown in the same color as regular files.
ls /faculty/ with output
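The three forms of ls described in this section can be reproduced without access to the Workshop machines. This is a minimal sketch, assuming a Bash-like shell; it recreates a directory named R and a file named foo as in the examples above. The sandbox and listing variables and the mktemp command are assumptions added for illustration.

```shell
# Work in a throwaway directory ("sandbox" is a hypothetical variable name).
sandbox=$(mktemp -d)
cd "$sandbox"

# Recreate the two items from the examples: a directory R and a text file foo.
mkdir R
echo "some random text" > foo

# Plain ls prints just the names.
ls

# The -l switch adds the type, permissions, owner, group, size, and modification time.
ls -l

# Given a directory as an argument, ls lists what is inside that directory.
# R is empty here, so nothing is printed.
ls R

# Save the long listing in a variable so it can be inspected again.
# The line for R starts with "d" (directory); the line for foo starts with "-" (regular file).
listing=$(ls -l)
echo "$listing"
```

The owner, group, and dates in your output will differ from the screenshots, since they depend on who runs the commands and when.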
Looking inside a file
The less command can be used to view the contents of a text file. Many files, such as R scripts and some data files, are just text and can be easily viewed with less.
To view the contents of a file, run less followed by the name of the file. For example:
student@ip-10-0-200-228:~$ less foo
The file foo is literally filled with some random text. The final line, foo (END), is a status message from less. It gives the name of the file being viewed and shows the position in the file.
If the file is long enough, it can be scrolled by pressing the arrow keys.
To exit less, press the q (quit) key.
Moving between directories is done using the cd “change directory” command. The syntax of the command is
cd [destination]
where the destination is the name of the directory you want to move into. The destination is optional, because running cd with no destination will return you to your home directory.
An example of cd
cd R has moved the user into the “R” directory, and the command prompt has been updated to reflect this change.
A full path can be given as the argument to cd, and you will be moved to the final directory in the path. The effect is the same as using multiple cd commands, one for each directory in the path.
An example of cd with a complete path
An example of cd in separate steps
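The equivalence between one cd with a complete path and several cd commands in a row can be checked directly. This is a minimal sketch, assuming a Bash-like shell. The directory names elizabeth and 2022 are borrowed from the examples in this document but are created inside a throwaway directory rather than under /faculty; the variable names sandbox, one_step, and separate_steps are hypothetical.

```shell
# Build a small directory tree inside a throwaway directory.
sandbox=$(mktemp -d)
mkdir "$sandbox/elizabeth"
mkdir "$sandbox/elizabeth/2022"

# One step: give cd the complete path.
cd "$sandbox/elizabeth/2022"
one_step=$(pwd)

# Separate steps: go back to the top, then move down one directory at a time.
cd "$sandbox"
cd elizabeth
cd 2022
separate_steps=$(pwd)

# Both approaches end in the same directory, so these two lines match.
echo "$one_step"
echo "$separate_steps"
```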
As can be seen in the previous few examples, the “/” (forward slash) character is extremely important, and it has different meanings depending on where it is in the path.
When at the start of a name, it tells the computer to look in the “root” directory for that item. For example, “/faculty” is in the “root” folder.
When in between names, it tells the computer that those are different directories or files. For example, “elizabeth/2022/corrs.csv” references something named “corrs.csv”, which is in the “2022” directory, which is in turn inside the “elizabeth” directory.
Leaving out the leading “/” means that you are referencing something in the directory you are currently in. For example, “R” refers to a directory named “R” inside the directory you are currently in, while “/R” refers to a directory named “R” in the root directory.
Importance of the /
There is no directory called “/R”, so it is not possible to change there. An error is shown, “No such file or directory”. This error is not serious, and does not cause any problems. It just means that the change directory command could not complete, and you should check for typos, a misplaced /, or other problems.
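The difference a leading “/” makes, and the harmlessness of the resulting error, can be seen in a short experiment. This is a minimal sketch, assuming a Bash-like shell. The directory name /no-such-directory-for-this-demo is a hypothetical name chosen because it should not exist on any machine, and the sandbox variable is likewise an assumption.

```shell
# Work in a throwaway directory that contains a subdirectory named R.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir R

# No leading slash: R is looked up in the directory you are in, so this works.
cd R
pwd

# Go back up one level.
cd ..

# Leading slash: the name is looked up in the root directory instead.
# This directory does not exist, so cd prints an error and changes nothing.
# The part after || runs only when the cd command fails.
cd /no-such-directory-for-this-demo || echo "cd failed; still in $(pwd)"
```

After the failed cd, the command line is still in the same directory as before, which is why the error can be fixed by simply retyping the command correctly.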
Actually copying files
“cp” is the primary command used to copy files at the command line.
The basic syntax
The basic syntax for cp is
cp <source> <destination>
which creates a duplicate of the source file (or directory in some circumstances) at the destination.
Copying the file “foo” to another file called “bar” is done with the command
cp foo bar
This will result in two identical files, foo and bar, in the current directory.
Copying a file
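To confirm that a copy really is identical to its source, the two files can be compared after copying. This is a minimal sketch, assuming a Bash-like shell. The cmp command is not covered elsewhere in this document; it is a standard Unix tool that prints nothing when two files have identical contents. The sandbox variable is a hypothetical name.

```shell
# Work in a throwaway directory so no real files are affected.
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a small text file named foo.
echo "some random text" > foo

# Copy foo to a new file named bar.
cp foo bar

# Both names now appear in the listing.
ls

# cmp compares the two files byte by byte; no output means they are identical.
cmp foo bar
```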
cp can be combined with wild cards to copy multiple files at the same time. For example
cp /faculty/elizabeth/2022/*.R .
will copy all of the files in /faculty/elizabeth/2022 that end in “.R” to the current directory, which is referenced by “.”, usually spoken as “dot”.
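The wild card copy can be practiced without the /faculty directory by building a small stand-in. This is a minimal sketch, assuming a Bash-like shell. The 2022 directory here lives in a throwaway location rather than under /faculty/elizabeth, and the file names analysis.R and plots.R, the Day1 destination, and the sandbox variable are hypothetical; corrs.csv is borrowed from the path example earlier in this document.

```shell
# Build a stand-in source directory and a destination directory.
sandbox=$(mktemp -d)
mkdir "$sandbox/2022"
mkdir "$sandbox/Day1"

# Two R scripts and one data file in the source directory.
echo 'print("a")' > "$sandbox/2022/analysis.R"
echo 'print("b")' > "$sandbox/2022/plots.R"
echo "x,y" > "$sandbox/2022/corrs.csv"

# Move into the destination, then copy every file ending in .R to "." (here).
cd "$sandbox/Day1"
cp "$sandbox"/2022/*.R .

# Only the two .R files were matched by the wild card; corrs.csv was not copied.
ls
```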
cp can be given the -r “recursive” switch to cause it to copy a directory and everything in that directory. For example,
cp -r /faculty/elizabeth/2022 .
copies everything in the directory /faculty/elizabeth/2022 to the current directory. There is then a new 2022 directory which contains a copy of everything that is in the /faculty/elizabeth/2022 directory.
Copying a directory
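A recursive copy can be tried on a small stand-in directory tree. This is a minimal sketch, assuming a Bash-like shell. The 2022 directory is created in a throwaway location rather than under /faculty/elizabeth, and the data subdirectory, the file analysis.R, the Day1 destination, and the sandbox variable are hypothetical names used only for illustration.

```shell
# Build a source directory that contains a file and a subdirectory with its own file.
sandbox=$(mktemp -d)
mkdir "$sandbox/2022"
mkdir "$sandbox/2022/data"
mkdir "$sandbox/Day1"
echo 'print("a")' > "$sandbox/2022/analysis.R"
echo "x,y" > "$sandbox/2022/data/corrs.csv"

# Move into the destination and copy the whole 2022 directory here.
cd "$sandbox/Day1"
cp -r "$sandbox/2022" .

# A new 2022 directory now exists here, with the same contents as the original.
ls 2022
ls 2022/data
```

The original directory is left untouched; cp -r only adds the new copy.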
At the start of most practicals, you will use either cp -r or cp with wild cards to copy files out of the appropriate directory under /faculty to one of your directories.
“mv” is the primary command used to move files at the command line.
mv is used in a similar way to cp, but there are some very important differences. The most important is that mv removes the source file. After running mv you still have the same number of files or directories you started with; they are just located someplace else, or have a different name.
mv foo bar
renames the file “foo” to “bar”. In this case “foo” could have been a directory, and then it would be renamed to a directory called “bar”.
When moving multiple files (or files and directories), the destination must be a directory.
mv *.R My-R
will move all of the files in the current directory that end in “.R” into the directory “My-R”. The destination directory, “My-R” in the example, must exist before running the mv command. It will not automatically be created.
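Both uses of mv, renaming a single file and moving several files into a directory, can be tried in a scratch directory. This is a minimal sketch, assuming a Bash-like shell. The names foo, bar, and My-R come from the examples above; the files first.R and second.R and the sandbox variable are hypothetical.

```shell
# Work in a throwaway directory with one text file and two R scripts.
sandbox=$(mktemp -d)
cd "$sandbox"
echo "some random text" > foo
echo 'print("a")' > first.R
echo 'print("b")' > second.R

# Rename foo to bar. Afterwards foo no longer exists.
mv foo bar
ls

# The destination directory must exist before moving several files into it.
mkdir My-R

# Move every file ending in .R into My-R.
mv *.R My-R

# The scripts are now inside My-R and no longer in the directory you are in.
ls My-R
ls
```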
Creating directories with mkdir
The command to “make directories” is mkdir. It is very simple to use: just give it the name of the directory to create.
To create a directory called “foo”, just run
mkdir foo
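Creating a directory and then a subdirectory inside it looks like this in practice. This is a minimal sketch, assuming a Bash-like shell. The name foo comes from the example above and Day1 from the definitions earlier in this document; the sandbox variable is a hypothetical name, and the whole thing runs in a throwaway directory.

```shell
# Work in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox"

# Create an empty directory named foo.
mkdir foo

# The long listing shows foo with a leading "d", marking it as a directory.
ls -l

# A subdirectory is created by giving a path that ends in the new name.
mkdir foo/Day1

# Move into the new subdirectory and show where we are.
cd foo/Day1
pwd
```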
Your friends TAB and Up Arrow
Two huge time savers are the use of the TAB key and the Up Arrow key.
TAB is used to complete text on the command line. For example, if I want to copy files from /faculty/elizabeth/2022, I don’t need to type out all of those characters. This is what my typing will actually look like, with #TAB# shown each time I press the TAB key.
and then continue typing
which completes to
If a completion isn’t unique, then pressing TAB a second time will list the possible completions. If nothing is listed after repeated pressings of TAB, then there aren’t any possible completions.
This is what that might look like at a terminal, with a red mark inserted each time I pressed the TAB key, and the rest of the line being what was automatically added by the computer.
Using TAB to complete text
Use of the TAB key is highly recommended to avoid typos in long file and directory names.
The Up Arrow is used to recall previously typed commands. Those commands can then be edited or used again as they are. The red arrow shows where I pressed the Up Arrow.
ls with output
I pressed the Up Arrow once to recover ls, pressed ENTER, and then I pressed the Up Arrow again to recover ls, but edited the line to add a -l before pressing ENTER.
That is a very trivial example, and it is hardly worth pressing the Up Arrow to recover ls, but on long and complicated commands, the Up Arrow is a large time saver.
After pressing the Up Arrow multiple times and getting into your command “history”, it is possible to use the Down Arrow to move to more recent commands. You can return to an empty command line by either pressing the Down Arrow until you are back at a bare prompt, or pressing
Long commands can be edited by using the left and right arrows, and when modified to your satisfaction, pressing the ENTER (or RETURN) key will submit the command line.