9. Scripts
9.1. Exercises: scripts
Exercise 9.1: Copying the files required for the exercises
Open a terminal from the Jupyter “Home” page in your web browser (the view of your home directory), by clicking on “Terminal” in the “New” pull down menu.
If you haven’t already done so, first create a folder/directory named cfb_2021 in your network folder for this course:
cd # go to root of your home directory
mkdir cfb_2021 # mkdir creates a directory
cd cfb_2021 # change directory
pwd # print current working directory.
# In my case, the result is: /home/kalbers/cfb_2021
Now we are going to copy the files and scripts necessary to perform these exercises to the “cfb_2021” directory. The Linux command “cp” copies files and directories.
cp -rp /vol/cursus/CFB/scripts_and_debugging ~/cfb_2021/
Note
The ~ refers to the root of your home directory (whatever your username is), and the directory cfb_2021 has to exist for this particular copy command to work.
Now change directory (using cd) to the directory ~/cfb_2021/scripts_and_debugging.
Validate that you are in the correct directory using the command pwd as above.
Exercise 9.2: Comparing the Jupyter notebook with running scripts from the command line
Open a new Python3 Jupyter-notebook for these exercises. Copy the following code into a cell and execute the cell:
print("Hello")
my_number = 1
my_number
Questions:
Describe the exact output.
Which of the three lines in the above code produce output?
Is this output formatted in the same way?
Now create a new text file by clicking “Text File” in the “New” drop-down menu in the file-browser tab-page of your Jupyter notebook.
Make sure to first go to the correct subdirectory (~/cfb_2021/scripts_and_debugging) by clicking on the respective subdirectories in the Jupyter file browser window.
Put the same three lines of code in this text file. Then, rename the text file to “exercise2.py” by clicking on the file name at the top of the screen.
(The default file name is “untitled.txt”).
If everything went well, the word “print” should now have the color green, and the text “Hello” should be colored red. This is called syntax highlighting.
Now, run the script from the command line in the terminal window as follows:
python exercise2.py
Questions:
Is the output of the script printed to the terminal window exactly the same as the output printed to screen when executing the cell in a Jupyter notebook?
How does a Jupyter notebook differ from a script in terms of output, i.e. the information that is written to the screen?
Exercise 9.3: Permanence of Python variables in memory
Copy the following code to a cell in a Jupyter notebook and execute the cell.
my_number = 42
Copy the following code to a new cell below the previous one in the same Jupyter notebook, and execute the cell:
print("my_number:", my_number)
Question:
What is the output of the second cell in the Jupyter notebook?
Next, create a Python script called exercise3a.py with exactly the same code as the first cell (i.e. my_number = 42).
Create a second Python script called exercise3b.py with exactly the same code as the second cell (print("my_number:", my_number)).
Now go to your Terminal and execute the first script and the second script in that order:
python exercise3a.py
python exercise3b.py
Questions:
Does executing the first script produce an error message? If not, what is the output printed to the screen? If yes, why?
Does executing the second script produce an error message? If not, what is the output printed to the screen? If yes, why?
What is the state of the Python memory when it starts to execute a Python script?
9.2. Using scripts with variable input
Create a file called module_using_sys.py with the following contents:
import sys

print('The command line arguments are:')
# sys.argv is a list: the script name followed by the arguments
for i in sys.argv:
    print(i)
Let’s run this program from the shell.
$ python module_using_sys.py we are arguments
The command line arguments are:
module_using_sys.py
we
are
arguments
The argv variable in the sys module contains what you typed on the command line after the python command. The sys.argv variable is a list of strings: the name of the script followed by the command line arguments, i.e. the arguments passed to your program on the command line.
Here, when we execute python module_using_sys.py we are arguments, we run the module module_using_sys.py with the python command, and the other things that follow are arguments passed to the program. Python stores the command line arguments in the sys.argv variable for us to use.
Remember, the name of the running script is always the first element in the sys.argv list. So, in this case we will have 'module_using_sys.py' as sys.argv[0], 'we' as sys.argv[1], 'are' as sys.argv[2] and 'arguments' as sys.argv[3].
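Because sys.argv[0] is always the script name, the actual arguments start at index 1. The following is a minimal sketch of how a script might separate the two and check whether any argument was given. The helper name split_argv is hypothetical and only used for illustration; it is not part of the sys module.

```python
import sys


def split_argv(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]
    arguments = argv[1:]  # everything after the script name
    return script_name, arguments


# Use the real command line of the running script.
script_name, arguments = split_argv(sys.argv)

if len(arguments) == 0:
    print("No arguments given to", script_name)
else:
    print("Number of arguments:", len(arguments))
    print("First argument:", arguments[0])
```

Checking the number of arguments before indexing avoids an IndexError when the script is started without any arguments.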
9.3. Exercises using sys.argv
Exercise 9.4: Reading the command line arguments
Make a script called nuc_arg.py that prints the nucleotide content of a sequence that you give as argument on the command line.
For example, this command:
$ python nuc_arg.py TGACTCA
should print the following output:
2 2 1 2
Hint: if you want, you can use the nucleotide count function from the lecture.
Now you can edit the script nuc_arg.py.
Exercise 9.5: FASTA statistics
Write a script called nuc_fasta.py.
This script should accept the name of a FASTA file as argument, read the FASTA sequences in that file and print the nucleotide content of all the sequences.
The script should print a header line, followed by one line per sequence with the FASTA id (the sequence name) and the nucleotide content of that sequence. The output should be tab-separated. The nucleotide content should be specified as a fraction of the sequence length with two decimal places, in the order A, C, G and T. When you run the script nuc_fasta.py on the input file /vol/cursus/CFB/scripts_debugging/sequences.fa the output should exactly match the following:
name A C G T
chr14:89352059-89352259 0.31 0.28 0.20 0.22
chr5:74264624-74264824 0.34 0.20 0.20 0.26
chr2:132500203-132500403 0.23 0.12 0.21 0.43
chr6:30630663-30630863 0.28 0.23 0.27 0.22
chr15_KI270905v1_alt:1999423-1999623 0.35 0.22 0.15 0.28
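One way to produce a tab-separated line with two decimal places is Python's string formatting. The sketch below only covers the formatting step, not the reading of the FASTA file or the counting; the function name format_row and the example values are made up for illustration.

```python
def format_row(name, fractions):
    """Join a name and a list of fractions into one tab-separated line."""
    # "{:.2f}" shows two digits after the decimal point and keeps trailing zeros.
    formatted = ["{:.2f}".format(fraction) for fraction in fractions]
    return "\t".join([name] + formatted)


# Header line followed by one example row with made-up values.
print("\t".join(["name", "A", "C", "G", "T"]))
print(format_row("example_id", [0.3, 0.2, 0.1, 0.4]))
```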
9.4. Using available modules: argparse
The Python standard library, which is included with every Python version, contains a lot of useful modules (see here). In addition, there are many third-party modules available. Some of these we will use in this course, such as pandas for data analysis and matplotlib for making figures. However, it is not possible to exhaustively cover all these modules in this course. Therefore, it is very useful to be able to search for modules and for examples and tutorials on how to use these modules. With the subjects covered so far, you should have enough understanding of basic Python principles to do so.
Let’s take an example, the argparse module.
This is a very useful module for scripts that take (a lot of) command-line arguments. Some examples on how to use this module:
https://docs.python.org/3/howto/argparse.html
Have a look, Google it, try it out! We will use the argparse module in our final assignment for today.
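As a starting point, here is a minimal sketch of an argparse parser with one required (positional) argument and one optional argument. The argument names filename and --number are arbitrary choices for this illustration, not something prescribed by the exercises.

```python
import argparse


def build_parser():
    """Create a parser with one required and one optional argument."""
    parser = argparse.ArgumentParser(description="Example of command-line parsing.")
    # Positional argument: must always be given.
    parser.add_argument("filename", help="name of the input file")
    # Optional argument: converted to int, falls back to the default if absent.
    parser.add_argument("-n", "--number", type=int, default=0,
                        help="an optional whole number (default: 0)")
    return parser


# parse_args() reads sys.argv[1:] when called without arguments;
# here an explicit list is passed so the example runs anywhere.
args = build_parser().parse_args(["sequences.fa", "--number", "2"])
print(args.filename, args.number)
```

In a real script you would call parse_args() without a list, so that the values come from the command line. argparse then also provides a -h/--help option automatically.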
9.5. Exercises: putting it all together
Exercise 9.6: motif scanning
Write a command-line Python script that scans a FASTA file with an IUPAC consensus sequence.
As an optional argument to the script, a user should be able to specify a number of allowed mismatches.
It should be possible to specify these arguments on the command line; use the argparse module for this.
As output, the script should print, for every match, the ID of the sequence and the position of the match within the sequence.
Take care of the following:
Comment your code
Use functions
Follow the style guide
Test your code
Start small. It is better to have a working script that has limited functionality but works well, than a script that tries to do everything and fails.
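In the spirit of starting small, one possible building block is a function that compares a consensus with a piece of sequence of the same length. The sketch below is one way to do this, not the required solution. The dictionary lists the standard IUPAC nucleotide codes; the function name count_mismatches and the example strings are made up, and the function assumes both strings have equal length.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(consensus, site):
    """Count positions where the base in site is not allowed by the consensus symbol.

    Assumes consensus and site have the same length.
    """
    mismatches = 0
    for symbol, base in zip(consensus.upper(), site.upper()):
        if base not in IUPAC[symbol]:
            mismatches += 1
    return mismatches


print(count_mismatches("ARN", "AGT"))  # R allows G and N allows T: 0 mismatches
print(count_mismatches("ARN", "ACT"))  # R does not allow C: 1 mismatch
```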
Test data
Two examples of consensus sequences and FASTA files (for c-Myc and STAT3 from Chen et al. 2008) are located in /vol/cursus/CFB/consensus_motif.
consensus.txt
c-Myc.fa
STAT3.fa
Extension (optional!)
Implement the ‘Match’ algorithm from Kel et al. 2003 to scan for matches to a positional weight matrix.
The log function can be imported from the math module. See the definition of this function here.
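A short sketch of importing and calling log. With one argument it returns the natural logarithm; an optional second argument gives the base. isclose, also from the math module, is used here because floating-point results are not always exact.

```python
from math import log, isclose

# With one argument, log() returns the natural logarithm (base e).
print(log(1))  # 0.0

# An optional second argument selects another base, here base 2.
print(isclose(log(8, 2), 3.0))
```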