## [1] "50.95 % of sequences were merged"
## [1] "11.21 % of sequences were dropped"
## [1] "8.104 % of sequences were ribosomes and removed"
## [1] "Three samples had a high % of ribosomes removed"
##                                                                    V1    V9
## 1 HI.4752.003.NEBNext_Index_13.000576_ChampSt1-20160915-WatPhotz_RNAa 59.60
## 2 HI.4752.003.NEBNext_Index_14.000577_ChampSt1-20160915-WatPhotz_RNAb 63.94
## 3 HI.4752.003.NEBNext_Index_15.000578_ChampSt1-20160915-WatPhotz_RNAc 66.96
## [1] "431147: minimum nb sequence annotated per sample (refseq)"
## [1] "31847464: maximum nb sequence annotated per sample (refseq)"
## [1] "2049000: mean nb sequence annotated per sample (refseq)"
## [1] "108955: minimum nb sequence annotated per sample (subsys)"
## [1] "16733558: maximum nb sequence annotated per sample (subsys)"
## [1] "1001000: mean nb sequence annotated per sample (subsys)"
## [1] "0.7181: Total Number of paired-end reads (billions)"
## [1] "0.15: Total Number of sequences after cleaning+merging (billions)"
## [1] "93.37% : percentage of sequences annotated (refseq)"
## [1] "45.62% : percentage of sequences annotated (subsys)"
samtools faidx Inediibacterium_massiliense_genome.fasta
samtools faidx Inediibacterium_massiliense_genome.fasta NZ_LN876587.1:1637150-1637400 >I_massiliense_contig1.fasta
samtools faidx Blautia_massiliensis_GD8_NZ_LN913006_WGS.fasta
samtools faidx Blautia_massiliensis_GD8_NZ_LN913006_WGS.fasta NZ_LN913006.1:2359000-2360200 >B_massiliensis_contig1.fasta
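The `NAME:START-END` regions passed to `samtools faidx` above are 1-based and inclusive, so, assuming each contig extends past its end coordinate, the first extraction should be 251 bp long and the second 1201 bp. A minimal Python sketch of the same slicing, which can serve as a sanity check on the extracted contigs. The function names (`parse_fasta`, `extract_region`, `region_length`) and the toy record are hypothetical; they are not part of samtools or of this analysis.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def extract_region(records, name, start, end):
    """Return (header, sequence) for a 1-based, inclusive region."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    seq = records[name]
    # 1-based inclusive [start, end] maps to the 0-based slice [start-1:end].
    return f">{name}:{start}-{end}", seq[start - 1:end]


def region_length(start, end):
    """Length in bp of a 1-based, inclusive region."""
    return end - start + 1


# Toy record for illustration only (not a real genome).
toy = ">toy_contig example\nACGTACGT\nTTGGCCAA\n"
records = parse_fasta(toy)
header, seq = extract_region(records, "toy_contig", 3, 10)
print(header)
print(seq)

# Expected lengths of the two regions requested above.
print(region_length(1637150, 1637400))
print(region_length(2359000, 2360200))
```

Comparing `region_length(...)` with the sequence length in each output file is a quick way to confirm that the region was not truncated at the end of a contig.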
Check how well Microcystis is represented in the database and in the list of annotated genes?
There are 67425 Microcystis gene products (63.5k are for M. aeruginosa) in the dataset
This is a very good representation
Get a Microcystis protein set from Olga?
Done, but not sure how useful it is, given that Microcystis is actually well represented in the dataset
Check Illumina primer/adaptor sequences (find them on nanuq) to see whether they may cause contamination (esp. for the 09-15 samples)?
Done. Refiltered samples as there was indeed some contamination
Didn’t change results very much
Check sequence similarity between the proteins of the 2 fecal species
Done. They are almost the same…
Theoretically, you should get a good correlation between annotation to Rubisco + Photosystem and chlorophyll content. I should verify this (simple correlation, or I can also perform a more complex RDA)
Not done yet, but I suspect the correlation is not great… Why: I don’t know
Check annotation at the other three functional levels (right now, I only look at the first one…)
Done at all 4 levels
eggNOG annotation
Not done
I can do a PCA to look at sample clustering based on communities
Done using phyloseq (Sept 15th samples were removed because they skew the ordinations so much…)
I can do an RDA to see how certain samples may correlate with environmental data
Not done, as environmental variables are being updated now
Look at alpha + beta diversity
alpha done: not much going on
Metagenome correlated with metatranscriptome
In progress: this is probably the most promising analysis
Check oceans metag. vs. metat. papers for ideas/references
fold changes for functions… maybe but not super exciting
R package taxize for species name synonyms
Check toxin results and how they correlate with the metat. expression
Check expression of Dolicho and the metat. cyanotoxin expression (decrease in Dolicho around Sept 15th: do we see more toxin production then…)
Phages are one of the drivers of blooms (a phage kills Dolicho, which releases toxin into the water…)
metaphlan2
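
The correlation checks listed above (Rubisco + Photosystem annotation vs. chlorophyll content, toxin results vs. metat. expression) can start from a plain Pearson coefficient before moving to an RDA. A minimal sketch in Python, assuming one value per sample for each variable. The function name `pearson_r`, the variable names and all numbers are placeholders for illustration; they are not measurements from this project.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values only, NOT data from this study:
# per-sample reads annotated to Rubisco + Photosystem, and chlorophyll content.
photosynthesis_reads = [120.0, 340.0, 560.0, 610.0, 900.0]
chlorophyll = [1.1, 2.9, 5.2, 5.0, 8.3]

r = pearson_r(photosynthesis_reads, chlorophyll)
print(f"Pearson r = {r:.3f}")
```

Annotation counts would normally be converted to relative abundances (or otherwise normalised for sequencing depth) before this step, since the number of annotated sequences varies widely between samples.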