Using your friendly server

From ZSM Entomology Portal

You have a workstation and want to use it from your laptop, no matter where you are.

First, thanks to Joan Pons, who made this possible with endless advice.

  • You need an FTP program (e.g. Fetch on Mac) to move files.
  • You need a terminal (e.g. Terminal on Mac) to enter UNIX commands that tell the workstation what to do.

♦ FTP program:

Establish a connection with your server and move over the files you want to analyze.

♦ Terminal window (Terminal on Mac or Linux, PuTTY on Windows):

Establish a connection with your server.

You will see something like this:

[user01@cluster ~]$

Type ls to see the contents of the current directory; files are shown in black, programs in green, and folders in blue.

[user01@cluster ~]$ ls
Agabinae-BIC.job        byalljerome.nex.run1.p
Agabinae-BIC.nex        byalljerome.nex.run1.t
AgabinaeTrees           byalljerome.nex.run2.p
ausgabe.out             byalljerome.nex.run2.t
backboneall2.nex        byalljerome.out
backboneMel.nex.run1.t  mb
backboneMel.nex.run2.p  mb321_icc
backboneMel.nex.run2.t  mb321_icc_beagleicc
BackboneTrees

Typing cd followed by a folder name moves you into that folder, e.g.:

[user01@cluster ~]$ cd AgabinaeTrees

These commands come in handy:

command | what it does for you | example
ls | shows the contents of the current directory | ls
cd | changes directory | cd Desktop
top | shows CPU activity | top (watch what is going on, then press q to get back to the prompt)
nohup ... & | keeps your job running after you close the terminal; the job's output is stored in the file nohup.out | nohup beast -beagle_CPU YourInfile.xml &
tail -10 nohup.out | shows the last 10 lines of nohup.out, so you can follow the progress of your run | tail -10 nohup.out
kill -9 | terminates a job | kill -9 PID, where PID is the process ID shown by top, e.g. kill -9 12456; for a multicore job list all PIDs, e.g. kill -9 12456 123457 123458 ...
sudo reboot | shuts down and reboots your server | sudo reboot
sudo shutdown -h now | shuts down your server | sudo shutdown -h now
iostat | shows input/output statistics | iostat
sudo apt-get upgrade | upgrades the installed packages | sudo apt-get upgrade
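
The nohup, tail and kill commands are typically used together. The following is a minimal sketch of that workflow, not a prescribed procedure: `sleep 30` is only a harmless stand-in for a real analysis command, and the temporary directory and the variable names `workdir` and `job_pid` are assumptions made for illustration. The output is redirected to nohup.out explicitly, which mirrors what nohup does by default when it is started from a terminal.

```shell
# Sketch: start a job with nohup, check its progress, then terminate it.
# "sleep 30" is a stand-in for a real analysis command (assumption).
workdir=$(mktemp -d)

# Start the job in the background; its output goes to nohup.out.
nohup sh -c 'echo "run started"; exec sleep 30' > "$workdir/nohup.out" 2>&1 &
job_pid=$!          # process ID of the background job (the PID that top would show)

sleep 1             # give the job a moment to write its first output
tail -10 "$workdir/nohup.out"   # look at the most recent output lines

kill -9 "$job_pid"                    # terminate the job
wait "$job_pid" 2>/dev/null || true   # let the shell clean up the finished job
```

Capturing `$!` right after starting the job saves you from looking up the PID in top later.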

♥ Start MrBayes in parallel on several CPUs:

Start on Lillifee:

XXX@XXXX:~$ cd Desktop

XXX@XXXX:~/Desktop$ mpirun -np 8 mb Yourdatafile.nex

mpirun specifies multiprocessor use, and -np VALUE sets the number of CPUs that will be used. If you have set 4 chains in the MrBayes file, you need 8 CPUs (4 chains x 2 runs = 8 CPUs).
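
The arithmetic above can be written down as a small sketch. The variable names `nchains`, `nruns` and `np` are chosen here for illustration only; the command is printed rather than executed, so you can check it before running it.

```shell
# Sketch: derive the -np value from the chain and run settings.
nchains=4   # chains per run, as set in the MrBayes file
nruns=2     # number of independent runs
np=$((nchains * nruns))

# Print the resulting command (here with -np 8).
echo "mpirun -np $np mb Yourdatafile.nex"
```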

On Lillifee this works with the older MrBayes versions, which you call as mb312 and mb311.

To get the full power, however, include the line below in each MrBayes infile.nex (at the end of the file, but before the mcmc line):

set usebeagle=yes beagledevice=CPU beagleprecision=single beaglescaling=dynamic beaglesse=yes beagleopenmp=yes;

This applies to MrBayes 3.2.2, which you call as mb on Lillifee. To avoid zero support values and crashed runs, paste this into the MrBayes block of your NEXUS file:

propset ExtTBR$prob=0;
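
To show where these lines sit relative to the mcmc line, here is a hedged sketch that writes an example MrBayes block to a file. The file name `mrbayes_block.txt` and the mcmc settings (`ngen`, `nchains`, `nruns` values) are placeholders assumed for illustration, not recommendations, and placing propset before mcmc is an assumption that follows the same reasoning as for the set line. In practice the block goes at the end of your own NEXUS file.

```shell
# Sketch: write an example MrBayes block with the BEAGLE and propset lines
# placed before the mcmc line. File name and mcmc values are placeholders.
# The quoted 'EOF' keeps the shell from expanding $prob in the propset line.
cat > mrbayes_block.txt <<'EOF'
begin mrbayes;
  set usebeagle=yes beagledevice=CPU beagleprecision=single beaglescaling=dynamic beaglesse=yes beagleopenmp=yes;
  propset ExtTBR$prob=0;
  mcmc ngen=1000000 nchains=4 nruns=2;
end;
EOF

cat mrbayes_block.txt
```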

♥ Start MrBayes on the GPU:

Include the line below in each MrBayes infile.nex (at the end of the file, but before the mcmc line):

set usebeagle=yes beagledevice=GPU beagleprecision=single beaglescaling=dynamic beaglesse=no beagleopenmp=no;

Start e.g. on Lillifee:

XXX@XXXX:~/Desktop$ mb Yourdatafile.nex

Lillifee also features GPU MrBayes (version 2.1.1): Jie Bao, Hongju Xia, Jianfu Zhou, Xiaoguang Liu, and Gang Wang. Efficient Implementation of MrBayes on multi-GPU. Mol Biol Evol, first published online March 14, 2013. doi:10.1093/molbev/mst043

You call it as:

mpirun -np 1 mb_gpu Yourdatafile.nex

or as mb_gpu_hp, but this is still under evaluation.

♥ RAxML

Start e.g. on Lillifee:

XXX@XXXX:~/Desktop$ raxmlHPC-PTHREADS-AVX -T 4 -m GTRCATI -n test -q partitions -s ALLGENES.phy -k -f a -x 12345 -p 12345 -N 1000

Here -T 4 means run on 4 cores (top will then show a CPU load of 399%), and ALLGENES.phy is your infile. It is a good idea to run a quick check to see which -T value is best: the improvement from -T 2 to -T 4 or 6 might be worth using more cores, while I found that -T 8 causes a decrease in speed.

Reason (from The RAxML v8.0.X Manual by Alexandros Stamatakis): "...because the more processors you use, the more accumulated time they spend waiting for the input to be parsed and communicating with each other. In computer science this phenomenon is known as Amdahl's law [...]."
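
Amdahl's law can be illustrated with a small calculation. In its usual form the speedup on n cores is 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The sketch below assumes p = 0.90 purely for illustration; this is not a measured value for RAxML, and the helper name `amdahl_speedup` is made up here. Note that the formula only shows diminishing returns; an actual slowdown, as observed with -T 8, comes on top of this from the waiting and communication overhead the manual describes.

```shell
# Sketch: theoretical speedup according to Amdahl's law.
# $1 = parallel fraction p (0..1), $2 = number of cores n
amdahl_speedup() {
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Assumed parallel fraction of 0.90 (illustrative only).
for n in 2 4 6 8; do
    echo "-T $n: theoretical speedup $(amdahl_speedup 0.90 "$n")x"
done
```

With these assumed numbers, going from 6 to 8 cores gains far less than going from 2 to 4, which is why a quick timing check with your own data is worthwhile.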

An example:

raxmlHPC-PTHREADS-AVX -T 6 -m GTRCATI -n test -q partition.txt -s Aussie_Exo.phy -o Laccophilus_sp_Sulawesi_MB4863 -k -f a -x 12345 -p 12345 -N 1000

-T PTHREADS VERSION ONLY! Specify the number of threads you want to run.

-m Model of Binary (Morphological), Nucleotide, MultiState, or Amino Acid Substitution: This is probably the most complex and confusing command line option because...(see manual).

-q Specify the file name which contains the assignment of models to alignment partitions for multiple models of substitution. For the syntax of this file please consult the manual. For example:

DNA, part_1 = 1-2013
DNA, part_2 = 2014-2499
DNA, part_3 = 2500-2724

You can also assign distinct models to the codon positions, i.e. if you want a distinct model to be estimated for each codon position in gene1 you can specify:

DNA, gene1codon1 = 1-500\3
DNA, gene1codon2 = 2-500\3
DNA, gene1codon3 = 3-500\3
DNA, gene2 = 501-1000

If you only need a distinct model for the 3rd codon position you can write:

DNA, gene1codon1andcodon2 = 1-500\3, 2-500\3
DNA, gene1codon3 = 3-500\3
DNA, gene2 = 501-1000
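
Typing the codon lines by hand is error-prone, so a partition file of this kind can also be generated. The following is a minimal sketch under the assumption that a gene occupies one contiguous block of alignment columns; the helper name `codon_partitions` and the output file name `partition.txt` are illustrative choices.

```shell
# Sketch: print one partition line per codon position for a gene.
# $1 = gene name, $2 = first alignment column, $3 = last alignment column
codon_partitions() {
    for pos in 1 2 3; do
        # "\3" tells RAxML to take every third column, starting at the given one
        printf 'DNA, %scodon%d = %d-%d\\3\n' "$1" "$pos" "$(($2 + pos - 1))" "$3"
    done
}

# Build the partition file for the gene1/gene2 example.
{
    codon_partitions gene1 1 500
    printf 'DNA, gene2 = %d-%d\n' 501 1000
} > partition.txt

cat partition.txt
```

The resulting file can then be passed to RAxML with -q partition.txt.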

-s Specify the name of the alignment data file in PHYLIP or FASTA format.

-o Specify the name of a single outgroup or a comma-separated list of outgroups.

-n Specifies the name of the output file. This option has to be always specified. The arbitrary name passed via -n will be appended to all RAxML output files such that you know which files have been generated by which invocation. If you intend to do two runs that write files into the same directory with the same name specified by -n the program will exit with an error to prevent you over-writing output files from a previous run.

-k Specifies that bootstrapped trees should be printed with branch lengths. The bootstraps will run a bit longer, because model parameters will be optimized at the end of each replicate under GAMMA or GAMMA+PInvar respectively. DEFAULT: OFF

-f a Tell RAxML to conduct a rapid bootstrap analysis and search for the best-scoring ML tree in one single program run.

-x Specify an integer number (random seed) and turn on rapid bootstrapping. CAUTION: unlike in previous versions, RAxML will conduct rapid BS replicates under the model of rate heterogeneity you specified via -m and not by default under CAT.

This will invoke the rapid bootstrapping algorithm described in

-p Specify a random number seed for the parsimony inferences. This allows you to reproduce your results and helps the RAxML author debug the program.

For all options/algorithms in RAxML that require some sort of randomization, this option must be specified. Make sure to pass different random number seeds to RAxML and not only 12345, as in the examples here. If -p is not specified when RAxML requires it, the program will exit with a corresponding error message. In a simple tree search for the ML tree, for example, the random number seed is required for the randomized stepwise addition order parsimony starting tree that is computed prior to the actual ML optimization.

-#|-N Specify the number of alternative runs on distinct starting trees. In combination with the -b option, this will invoke a multiple bootstrap analysis. Note that -N has been added as an alternative, since -# sometimes caused problems with certain MPI job submission systems, because # is often used to start comments.
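
Since the seeds for -x and -p should differ from run to run but still be recorded for reproducibility, one possible approach is to draw them in the shell and print the full command before starting it. This is only a sketch: the helper name `random_seed` is hypothetical, reading two bytes from /dev/urandom is one of several ways to get a seed, and the file names are taken from the example above. The command is printed, not executed.

```shell
# Sketch: draw a random integer seed between 1 and 65536 from /dev/urandom.
random_seed() {
    echo $(( $(od -An -N2 -tu2 /dev/urandom | tr -d ' ') + 1 ))
}

seed_x=$(random_seed)   # seed for rapid bootstrapping (-x)
seed_p=$(random_seed)   # seed for the parsimony inferences (-p)

# Assemble the command with the fresh seeds and print it for your records.
raxml_cmd="raxmlHPC-PTHREADS-AVX -T 4 -m GTRCATI -n test -q partition.txt -s ALLGENES.phy -k -f a -x $seed_x -p $seed_p -N 1000"
echo "$raxml_cmd"
```

Keeping the printed command (for example by redirecting it into a log file) lets you rerun the analysis later with exactly the same seeds.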

♥ BEAST

Lillifee CPU

XXX@XXXX:~/Desktop$ beast -beagle_CPU -beagle_double -overwrite Yourdatafile.xml

-beagle_instances helps to speed up some analyses. Try values between 2 and 5 and compare the output:

XXX@XXXX:~/Desktop$ beast -beagle_CPU -beagle_double -beagle_instances 3 -overwrite Yourdatafile.xml

Lillifee GPU

XXX@XXXX:~/Desktop$ beast -beagle_GPU -beagle_double -overwrite Yourdatafile.xml

As on the CPU, -beagle_instances helps to speed up some analyses; try values between 2 and 5 and compare the output.

Lillifee specs

Single socket R (LGA 2011) supports Intel® Xeon® processor E5-2600/1600

Supports Intel® Core i7 Extreme / Performance LGA 2011 processors with Non-ECC UDIMM only

Intel® C602 chipset

Up to 256GB RDIMM or 64GB UDIMM; DDR3 up to 1600MHz

Expansion slots: 2 x16 PCI-E 3.0, 1 x4 PCI-E 3.0 (in x8), 1 x4 PCI-E 2.0 (in x8), 1 PCI-32

Dual Gigabit Ethernet LAN ports; Intel® 82579LM and Intel® 82574L

2 SATA3 (6Gbps), 4 SATA (3Gbps), & 4 SATA2 (3Gbps) ports via SCU

4 USB 3.0 (5Gbps) ports (2 rear, 2 via header) & 14 USB 2.0 ports (8 rear, 6 via header)

• 1x Intel® Core i7 3970X, 6-Core 3.5GHz 15MB Cache 5.0 GT/s LGA2011 130W (Sandy Bridge-E)

• 4x 8GB (total 32 GB) DDR3 RAM, PC1600 nonECC unbuffered

• 2x 1.0TB S-ATA300 NCQ 7,200 rpm 3.5" 64MB RAID-Edition

• 1x Half DVD-RW drive, SATA

• PNY PCIe 2GB Quadro K2000 DVI/2xDP