Installing Software and Configuration
Updated on 03/05/2019
You need access to a Unix-based operating system to use Relion. Mac and Linux systems are mostly ready to interact with the FarmShare2 computer cluster. However, if you would like to stick with Windows, please configure your computer according to the instructions below to install the Ubuntu subsystem for Windows and an X server. Mac users (newer macOS only), please follow this link to install X11 if you do not have it already.
Ubuntu on Windows
We will use the FarmShare2 computer cluster to process your data. You can also use your own laptop or desktop computer to process this dataset if you have a modern GPU; computers without GPUs can work too, but that is not covered in this tutorial. If you would like to use your own computer, please check the Relion, MotionCor2 and Gctf instructions to install this software.
Commands are written after the "$" sign; do not include the "$" sign when typing your commands.
Let's first log in to a "login node" on FarmShare2 using your Linux/Mac/Ubuntu-on-Windows terminal:
$ ssh -Y sunetid@rice.stanford.edu
Create a Project Directory
$ mkdir Project_T20S
Copy the necessary programs compiled for FarmShare2 and the data set.
$ cp /home/alpays/public/software.tar.gz ~/.
$ cp /home/alpays/public/dataset.tar.gz ~/Project_T20S
Untar these files
$ cd
$ tar -xvzf software.tar.gz
Remove the tar file:
$ rm software.tar.gz
You will have relion, Gctf and MotionCor2 along with a slurm submission script.
$ cd Project_T20S
$ tar -xvzf dataset.tar.gz
$ rm dataset.tar.gz
Configure .bashrc for our programs
$ cd
$ cp ~/.bashrc backup_bashrc
$ echo "ml use $HOME/software/alpayslua" >> ~/.bashrc
Check that "ml use $HOME/software/alpayslua" has been added at the bottom of the file:
$ vi ~/.bashrc
Go to the end of the file with your cursor, move to the end of the last line, press "i", press Return, and add the following lines starting with export:
export RELION_PDFVIEWER_EXECUTABLE="$HOME/software/alpaysprograms/xpdf/4.0/xpdf"
export RELION_GCTF_EXECUTABLE="Gctf-v1.06_sm_20_cu8.0_x86_64"
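After these steps, the tail of your ~/.bashrc should look roughly like this ("yoursunetid" is a placeholder for your own username; the ml use line was appended, with $HOME already expanded, by the echo command above):

```shell
# Expected tail of ~/.bashrc after the steps above ("yoursunetid" is a placeholder)
ml use /home/yoursunetid/software/alpayslua
export RELION_PDFVIEWER_EXECUTABLE="$HOME/software/alpaysprograms/xpdf/4.0/xpdf"
export RELION_GCTF_EXECUTABLE="Gctf-v1.06_sm_20_cu8.0_x86_64"
```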
FarmShare2 and SLURM Information
Please familiarize yourself with FarmShare2 using the links below:
- https://srcc.stanford.edu/farmshare2
- https://web.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Main_Page
- https://web.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/User_Guide
Some of the most commonly used SLURM commands:
srun, sbatch, squeue, scancel, sinfo, and scontrol
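Typical usages of these commands look like the following (the job ID and username are placeholders; these need a running SLURM cluster, so run them on FarmShare2):

```shell
squeue -u yoursunetid        # list your pending and running jobs
sbatch myjob.sbatch          # submit a batch script to the scheduler
scancel 123456               # cancel the job with ID 123456
sinfo                        # show partitions and node availability
scontrol show job 123456     # detailed information about one job
srun --pty bash              # start an interactive shell on a compute node
```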
Below are the available compute nodes in FarmShare2.
$ sinfo -e -o "%.10R %.8D %.10m %.5c %7z %8G %110f"
You will mostly use the "gpu" and "normal" partitions to run your jobs. Your home directory has a 48GB quota, which can grow to 64GB for up to one week.
You have more scratch space at the directory below.
/farmshare/user_data/yoursunetid
You will mostly submit batch jobs to the cluster using the provided slurm script through the relion GUI.
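For reference, a SLURM batch script of this kind generally looks like the sketch below. This is a simplified, hypothetical example, not the actual farmshare-relion.sbatch, which may differ; in a RELION queue template, placeholders such as XXXcommandXXX are substituted from the GUI fields when the job is submitted:

```shell
#!/bin/bash
# Hypothetical sketch of a SLURM submission script; the provided
# farmshare-relion.sbatch may differ in its exact options.
#SBATCH --job-name=relion          # job name shown in squeue
#SBATCH --partition=gpu            # "Queue name" field (gpu or normal)
#SBATCH --mem=32G                  # "Memory" field
#SBATCH --gres=gpu:2               # "Number of GPU" field
#SBATCH --time=01:00:00            # "Wall Time" field
#SBATCH --ntasks=2                 # "Number of MPI procs" field

# Load the same modules you use interactively
ml cuda/8.0 openmpi/3.0.0 MotionCor2 Gctf relion

# RELION substitutes the actual command to run in place of this placeholder
srun XXXcommandXXX
```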
Before starting to run relion, you should load all the necessary software and libraries every time you start a new terminal.
$ ml cuda/8.0 openmpi/3.0.0 MotionCor2 Gctf relion #this command will load all the necessary software.
You should always run relion in the project directory and your micrographs should be in a separate directory under the project directory.
You can run the relion GUI and light processes on the login node. However, you need to run computationally intensive jobs, and jobs requiring special resources like GPUs or high memory, on the compute nodes. You will submit computationally intensive jobs to the compute nodes through the relion GUI and a SLURM submission script. SLURM is the scheduler program for the FarmShare2 cluster.
We will also be using other programs (MotionCor2 and Gctf) wrapped in relion for some processes.
Please refer to the relion tutorial for further information for the relion related parameters.
After finishing these steps, you should start from "Getting Ready" step each time you want to run relion.
Getting Ready
Open your FarmShare2 terminal
$ ssh -Y sunetid@rice.stanford.edu
Purge all the modules by typing
$ ml purge
Load relion by typing
$ ml relion
The other programs can be loaded now, but this is not necessary; they will always be loaded by the submission script for your batch jobs.
$ ml cuda/8.0 openmpi/3.0.0 MotionCor2 Gctf relion
Go to the project directory
$ cd Project_T20S
Start Relion by typing
$ relion
Below are instructions for the values you should use. If a value is not mentioned, please use the default value or consult the Relion tutorial. You can also email me at alpays@stanford.edu.
You will also need visualization software for three-dimensional EM density maps. I would recommend UCSF Chimera; PyMOL should work too.
To copy files from FarmShare to your local computer, you can use the scp or rsync commands. I would suggest using the sshfs command to mount your FarmShare home directory onto one of your local directories.
Your local Linux system should already have sshfs installed. For a local Mac, install FUSE and SSHFS from the osxfuse site; for local Windows, please follow the instructions at this link.
Create a folder named farmshare on your local computer and then use the following command to connect this folder to your home folder in farmshare cluster.
$ sshfs yoursunetid@rice.stanford.edu:/home/yoursunetid/ pathtofarmsharefolderonyourlocalcomputer/
This connection will be kept as long as you keep your internet connection (or until the key expires). You can re-run this same command to reestablish the link.
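For example, assuming your local folder is ~/farmshare (a placeholder path; adjust for your own machine), mounting and unmounting would look like this:

```shell
# Mount your FarmShare home directory onto a local folder (example paths)
mkdir -p ~/farmshare
sshfs yoursunetid@rice.stanford.edu:/home/yoursunetid/ ~/farmshare/

# When finished, unmount the folder:
fusermount -u ~/farmshare    # Linux
# umount ~/farmshare         # macOS
```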
Import
Input files: empiar_10025_subset/14sep*.tif
Node type: 2D micrograph movies (*.mrcs, *.tiff)
RUN
MotionCor2
The purpose of this step is to correct beam-induced sample motion recorded on dose-fractionated movie stacks.
Input movies STAR file: Import/job001/movies.star #job number in this input may change, you basically need to input the imported movies star file.
First frame: 1
Last frame: -1
Pixel size of the dataset (A): 0.6575
Voltage: 300
Dose per frame (e/A2) : 1.4
Pre-exposure (e/A2): 0
Do dose-weighting?: Yes
Save non-dose weighted?: No
Bfactor: 150
Number of patches: 5 x 5
Group frames: 1
Binning factor: 1
Gain-reference image: empiar_10025_subset/norm-amibox05-0.mrc #Select the .mrc file in the dataset
Gain rotation: No rotation (0)
Gain flip: Flip upside down (1)
Defect file:
Use RELION's own implementation: No
MotionCor2 executable: MotionCor2
Which GPUs to use: 0:1
Other MotionCor2 arguments: -InTiff
Number of MPI procs: 2
Number of threads: 1
Submit to queue: Yes
Queue name: gpu
Submit queue command: sbatch
Memory: 32G
Number of GPUs: 2
Wall time: 01:00:00
Standard submission script: /home/yoursunetid/software/scripts/relion/farmshare-relion.sbatch
RUN
You can close the relion GUI and check whether your job is running with the following command:
$ squeue -u yoursunetid
You can rerun relion by typing (if you close your terminal, you should start back from the "Getting Ready" step)
$ relion
You can also open an additional FarmShare2 terminal and run some additional commands like monitoring your slurm jobs while relion is on.
You can inspect your results by clicking logfile.pdf in the "Display" dropdown menu after selecting the MotionCor2 job.
You can also inspect individual integrated image files from Relion-GUI/File/Display/MotionCorr/job_name/folder_name/blabla.mrc (scale: 0.1, sigma: 3, lowpass: 10).
If you think you made a mistake and want to cancel your SLURM batch job, you can use the following command:
$ scancel jobid #jobid can be found with the command --> squeue -u yoursunetid
CTF Estimation
Input micrographs: Your call (select the star file of the motion correction job)
Use micrograph without dose-weighting?: No
Spherical aberration (mm): 2.7
Voltage (kV): 300
Amplitude contrast: 0.1
Magnified pixel size (Angstrom): 0.6575
Amount of astigmatism (A): 100
FFT box size (pix): 512
Minimum resolution (A): 30
Maximum resolution: 2
Minimum defocus value (A): 5000
Maximum defocus value (A): 50000
Defocus step size (A): 500
Estimate phase shifts?: No
Use CTFFIND-4.1?: No
Use Gctf instead?: Yes
Gctf executable: Gctf-v1.06_sm_20_cu8.0_x86_64
Ignore `Searches` parameters?: No
Perform equi-phase averaging?: Yes
Other Gctf options:
Which GPUs to use: 0:1
Number of MPI procs: 2
Submit to queue?: Yes
Queue name: gpu
Submit queue command: sbatch
Memory: 32G
Number of GPUs: 2
Wall time: 00:30:00
Standard submission script: /home/yoursunetid/software/scripts/relion/farmshare-relion.sbatch
RUN
Manual picking
You can manually pick about 300 to 1000 particles
Input micrographs: Your call (It should be the star file of the Ctf estimation job)
Particle diameter (A): Your call. (This is very easy to test: just enter a value and examine it during manual picking. You can then close the pop-up manual picking windows, change any parameter, click the manual picking job under "Finished jobs", and click "Continue" to start picking again with the new parameters.)
Sigma contrast: 3
White value: 0
Black value: 0
Lowpass filter (A): 20
Highpass filter (A): -1
Pixel size (A): -1
RUN
After running this job, you can click the finished/unfinished job, change parameters like the particle diameter, and click "Continue". The particle size parameter in this step is only for visualization, but it is good practice to understand the size of the molecule in both Ångströms and pixels. (You can use this value when you extract your particles; your particle should cover about 2/3 of the box size during extraction.)
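The Ångström-to-pixel conversion can be sketched with a small shell helper. This is a hypothetical calculation for illustration only (not part of Relion), assuming an ~80 A particle and this dataset's 0.6575 A/pix pixel size, with a box roughly 1.5x the particle so the particle covers about 2/3 of the box:

```shell
# Suggested extraction box size in pixels (hypothetical helper, not a RELION tool).
particle_A=80     # assumed particle diameter in Angstroms
pixel_A=0.6575    # pixel size of this dataset in Angstroms/pixel
box_pix=$(awk -v p="$particle_A" -v px="$pixel_A" 'BEGIN {
  x = (p * 1.5) / px          # box edge in pixels (may be fractional)
  b = int(x); if (b < x) b++  # round up to a whole pixel
  if (b % 2) b++              # RELION prefers even box sizes
  print b
}')
echo "box size: $box_pix pixels"   # prints: box size: 184 pixels
```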
Particle Extraction
This is a CPU-based job and does not require many resources, since we have only a couple of micrographs and very few selected particles.
micrograph STAR file: Your call (this should be the star file of the Ctf Estimation job.)
Input coordinates: Your call (it should be the coordinate files of your manual pick job)
OR re-extract refined particles: No
Manually set pixel size?: No
Particle box size (pix): Your call (if your particle size is 80A, I would use a 100A mask size and a 120A box during extraction - the extraction value is in pixels, so you have to convert)
Invert contrast?: Yes
Normalize particles?: Yes
Diameter/white/black: -1/-1/-1
Rescale particles: equal to the particle box size, or half that size or less; your call
Extract helical segments?: No
You can probably run this job on the login node
Number of MPI procs: 1 (if you run it on the login node) or more (6-8) if you want to submit it to the cluster
Submit to queue: No (if you run it on the login node) or Yes if you want to submit it to the cluster
If the previous answer is No
RUN
If the previous answer is YES
Submit to queue: Yes
Queue name: normal
Submit queue command: sbatch
Memory: 32G
Number of GPUs: 2
Wall time: 01:00:00
Standard submission script: /home/yoursunetid/software/scripts/relion/farmshare-relion.sbatch
RUN
2D classification
Input images STAR file: Your call
CTF Tab --> Yes, No, No
Number of classes: Your call (you should decide this number according to your number of particles; about 5 classes for 500 to 1000 particles)
Regularisation parameter T: 2
Number of iterations: 25
Use fast subsets: No
Mask diameter (A): Your call (this parameter can be determined from manual picking; your circular mask should be slightly bigger than your particle. If your particle size is 80A, I would use a 100A mask size and a 120A box during extraction - the extraction value is in pixels, so you have to convert)
Mask individual particles with zeros?: Yes
Limit resolution: -1
Perform image alignment?: Yes
In-plane angular sampling: 6 (you may change this)
Offset search range (pix): 5 (you may change this)
Offset search step (pix): 1 (you may change this)
Classify 2D helical segments?: No
Use parallel disc I/O?: Yes
Number of pooled particles: 3
Pre-read all particles into RAM?: No (You can change this to Yes if you have enough ram)
Combine iterations through disc?: No
Use GPU acceleration? Yes
Which GPUs to use:
Number of MPI procs: 3
Number of threads: 1
Submit to queue: Yes
Queue name: gpu
Submit queue command: sbatch
Memory: 32G
Number of GPUs: 2
Wall time: 01:00:00
Standard submission script: /home/yoursunetid/software/scripts/relion/farmshare-relion.sbatch
RUN
Rest of the Processing
Please follow the rest of the processing steps from the Relion tutorial, but keep in mind the parameters specific to this dataset. I will update this page if I get a lot of questions about a particular step. For CPU-based jobs, you can run light processing directly from the GUI without submitting a batch job, or you can submit to the cluster using the "normal" queue name. Please submit GPU-based jobs to the cluster using the "gpu" queue name. Your job will be scheduled faster if you keep the time parameter short, but it should be long enough for the job to finish. The 2D classification, initial model, 3D classification and 3D refinement jobs with all the particles from all 20 micrographs are computationally heavy, but should not take more than 8 hours (keep the time parameter at 24 hours for these jobs).
For 2D classification, 3D classification and 3D refinement, keep MPI at 3 and GPUs at 2. For autopicking, MotionCor2 and Gctf, keep both MPI and GPUs at 2. Do not use threads.
You can reach up to 3A resolution with this dataset. If you can do better, you should consider doing more of this. Have fun.
The general steps of a Relion pipeline are as follows (these steps can change depending on the project):
Import --> MotionCorr --> CtfFind --> Manual Picking --> Particle Extraction --> 2D Classification --> Subset Selection of 2D Classes for Autopick --> Auto-picking --> Particle Extraction --> 2D Classification --> Subset Selection of 2D Classification to select for "good" particles --> (2D Class + Subset)xN --> Initial Model --> 3D Classification --> Subset Selection of 3D Classes --> (3D Class + Subset)xN to select "good" classes --> 3D Refinement --> Other steps (Post processing, Ctf refine, Polishing, etc ...)
Please email alpays@stanford.edu if you encounter any problem, so I can update this document.