Tuesday, 24 March 2015

BEAUti and the BEAST

After putting all my sequences into a nice order, I filled in missing sequences or parts of sequences with question marks (less dynamic and thrilling than it sounds).

What is SD2? We just don't know.
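For anyone wanting to script the question-mark step rather than do it by hand, here is a minimal sketch in Python. It assumes the sequences are already aligned and held in a dictionary, and it only pads missing sequences and short ends; gaps in the middle of a sequence would still need a proper alignment. The function name `pad_alignment` and the sample names are made up for illustration and are not from my actual files.

```python
def pad_alignment(seqs, all_names, fill="?"):
    """Return a dict in which every name in all_names has a sequence of equal length.

    seqs:      dict of sample name -> sequence string (may lack some samples)
    all_names: every sample that should appear in the output
    """
    length = max(len(s) for s in seqs.values())
    padded = {}
    for name in all_names:
        seq = seqs.get(name, "")  # a wholly missing sample starts out empty
        padded[name] = seq + fill * (length - len(seq))
    return padded


# Hypothetical sample names and made-up bases, purely for illustration.
gene = {"spider_A": "ACGTACGT", "spider_B": "ACGTA"}
for name, seq in pad_alignment(gene, ["spider_A", "spider_B", "spider_C"]).items():
    print(name, seq)
```

Padding only at the right-hand end is a simplification: a sequence that is missing its start would need the question marks in front instead.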

Next, I imported the sequences into the program BEAUti, which takes sequence files plus information about how you want the analysis to be run, and turns them into a manageable format for BEAST. I specified the right(ish) substitution models: JModeltest predicted the best ones, but BEAUti doesn't have a huge selection of preset models, so I just did my best with the options it gives. I also told BEAUti to make BEAST run as STARBEAST, a modification which enables several sequences to be grouped together (in my case, sequences from the same species can be grouped together as that species).
Coded sequence names (left panel) can be grouped together into nice, friendly species names (right panel) in STARBEAST.
STARBEAST also allows you to use multiple genes and specify whether or not the genes are evolving independently. I set it to run for 100,000,000 generations, because I have a supercomputer and my supercomputer can do anything. Then I ran the analysis using BEAST. After 20,000,000 generations and a fitful sleep, I stopped the analysis and decided 10,000,000 would be enough next time. It was plenty long enough anyway, especially considering this first tree was but a preliminary one.

This is what BEAST looks like when it starts running. It is sampling lots of phylogenies and seeing which are the best supported (that is, which have the highest posterior probability), given my prior assumptions and how well each tree explains the sequence data.

Anyway, the output tree looked quite cool:

It looks like the putative new species I found in Central Otago may well be genuinely new, whereas the putative new species found in Canterbury all look to be the same as C. dendyi (despite looking very different from it). Also, C. delli, C. stewarti, and C. isolata are all open-burrow spiders, but they appear to be polyphyletic - that is, they are not all grouped together, suggesting that lid-building traits have been lost several times in the evolutionary history of Cantuaria.

Spurred on by this in"tree"guing outcome, I sequenced some more spiders to make a bigger tree...but computer said no. Firstly, finding the correct substitution model using JModeltest didn't seem to work - the DNA alignment (a collection of sequences lined up so that each part of a sequence matches the corresponding part of the others) took too long to analyse. It seemed to be a bug with JModeltest, so I tried an older version, but that couldn't read the sequences at all. I've given up on that temporarily, and used the same models as I did for the first tree. But I will have to get it working for the tree that I put in my thesis.

After selecting all the right(ish) settings in BEAUti (which took a few goes, because some of the question marks were in the wrong place), I tried to run the analysis in BEAST, but it had trouble reading in the data or it just crashed. Most of the problems were down to the plugin BEAGLE, which I use because the power of my computer is mostly in its graphics card, and BEAGLE allows you to harness the almighty power of the graphics card (which does lots of little things very quickly, as opposed to the computer's processor which does few big things slowly). I finally managed to get it to run, after I had remedied the following things:

- When creating mapping files for BEAUti, make sure sequence names differ from the labels used to group them.
- Make sure every set of sequences is in the same order as all the other sets of sequences.
- When running BEAGLE, give up and don't use BEAGLE unless you want to spend half an hour playing with its settings for something that takes the same amount of time as not using BEAGLE (why the hell did I spend so much money getting a good graphics card when it doesn't seem to make any difference at all?).
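The first two checks in that list are easy to automate before anything goes near BEAUti. The following is a rough sketch, assuming the name-to-species mapping and the sequence names from each gene's alignment have already been read into Python. The function `check_inputs`, the gene names and the sample codes are all hypothetical, and this is not something BEAUti itself provides.

```python
def check_inputs(mapping, alignments):
    """Return a list of human-readable problems; an empty list means none were found.

    mapping:    dict of sequence name -> species label
    alignments: dict of gene name -> list of sequence names, in file order
    """
    problems = []

    # Check 1: a sequence name must not double as a species label.
    clashes = set(mapping) & set(mapping.values())
    for name in sorted(clashes):
        problems.append(f"'{name}' is both a sequence name and a species label")

    # Check 2: every gene must list its sequences in the same order as the first gene.
    genes = list(alignments.items())
    if genes:
        ref_gene, ref_order = genes[0]
        for gene, order in genes[1:]:
            if order != ref_order:
                problems.append(f"{gene} is not in the same order as {ref_gene}")

    return problems


# Hypothetical sample codes and gene names, for illustration only.
mapping = {"CD01": "C_dendyi", "CD02": "C_dendyi", "CJ01": "C_johnsi"}
alignments = {
    "gene1": ["CD01", "CD02", "CJ01"],
    "gene2": ["CD01", "CJ01", "CD02"],  # deliberately out of order
}
for problem in check_inputs(mapping, alignments):
    print(problem)
```

Returning a list of problems rather than stopping at the first one means a single pass shows everything that needs fixing.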

The first run did not give a large enough effective sample size (ESS, shown by the program Tracer), so I ran it for 40,000,000 generations (which thankfully only took six hours). This was the resulting tree:

This tree should be more representative of Cantuaria evolution than the first tree I made: there are more sequences (though some are a bit short), and I trimmed the ends of the sequences a lot less. Distances might be a bit wrong because the substitution models I used might not be right. Interestingly, all the lidless species are now monophyletic (grouped together), although C. huttoni is also lidless and seems to be quite distantly related to the rest. Again, the Canterbury new species seem to be the same as C. dendyi. What's more, C. johnsi, C. magna and C. prina all seem to be pretty much the same species genetically. Species designations will have to be looked into a bit later using the Spider R package. Next time I also have to do less of a rushed job, resequence some of the samples that came out too short, and get the right models, in the hope that one of these things will make the branch lengths of the tree a bit more sensible (the tips should all line up). By then I will hopefully have some Misgolas samples to use as outgroups instead of Segregara.
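Going back to the effective sample size that prompted the longer run: the idea is that successive samples in a chain are correlated, so a long run is worth fewer independent samples than it looks. Below is a simplified sketch of one common way to estimate it, dividing the number of samples by an autocorrelation time summed until the autocorrelation first drops to zero or below. This is an assumption on my part for illustration; Tracer's own estimator may differ in detail, and the traces here are simulated, not my BEAST output. A commonly quoted rule of thumb is to aim for an ESS of at least 200.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)  # a constant trace has no autocorrelation to estimate

    tau = 1.0  # autocorrelation time
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # stop at the first non-positive autocorrelation
            break
        tau += 2.0 * rho
    return n / tau


# Simulated traces: one with independent draws, one strongly autocorrelated.
rng = random.Random(1)
independent = [rng.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))

print("independent:", round(effective_sample_size(independent)))
print("sticky:     ", round(effective_sample_size(sticky)))
```

The sticky chain has the same number of samples but a much smaller ESS, which is why running for more generations (or sampling less often) helps.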

Another interesting thing that has come out of building these trees is that if I try to group the species into separate populations, the posterior probabilities invent a new kind of distribution:

This should be a bell-shaped curve...

I spoke to another BEAST user about this and he said it means the populations are not separate - they are all interbreeding with each other. So, while Cantuaria spp. can be divided into different species, they are not easily divided into populations - perhaps these spiders move around a bit more than I thought.

Playing with trees has been fun, but now I think I need to focus on writing my 18-month report which is due in April. I also need to think about:

- Identifying samples using morphology
- Analysing some ecological data I have collected (I started that but I should probably finish it at some point)
- Visiting Otago Museum to have a look at their holotypes
- Working out how to use Spider (R package)
- Describing species
- Working out how to use GenGIS (a landscape genetics program)
- Pitfall trapping for males (before their season finishes!)
- Evolution conference in Sao Paulo which I really want to go to
- Writing my thesis!