Jan. 1, 2017
Note: SNP and INDEL variants are annotated with amino acid, intron/exon, protein, and related information using VEP and SnpEff. Additional data are provided in BAM and/or FASTQ format.
As a whole-genome sequencing (WGS) product, this represents the most comprehensive genetic analysis available today and greatly expands the scope of our product line. We have recently completed a pilot study and the results are impressive, leading us to now officially launch this as a standard product. It will supplement our existing Y Elite and Y Prime tests, which target the Y chromosome and are only applicable to males. By using Illumina's groundbreaking HiSeq X next-generation sequencing platform, we are able to provide high-quality WGS results at a best-in-market sub-$2000 price for individual customers.
Details of results delivery are still being developed and refined, but we are currently set up to provide the following:
In terms of the technical details of the underlying raw data, the sequencing produces:
Although Full Genomes is not providing any interpretation of autosomal and X chromosome results, the raw data are provided in a format compatible with a number of tools that provide the opportunity for in-depth analysis. Examples of currently-available online analysis tools include:
As with other Full Genomes products, this is intended for ancestry/research-use only, and should not be relied upon for medical or diagnostic purposes.
Interested customers can order the WGS product through the Full Genomes website.
The FGC Team
Oct. 24, 2014
Full Genomes Corporation (FGC) has begun releasing new mitochondrial DNA (mtDNA) analyses to customers. The new analysis for each kit is distributed as a FASTA file. FASTA is a widely recognized sequence representation format that is compatible with many mtDNA databases and analysis tools.
The FGC team would like to thank Ian Logan and Dr. Ann Turner for very helpful feedback during the development of these new mtDNA files.
The team also thanks George Jones for the suggestion of distributing these files, as well as the FGC customers who volunteered their mtDNA data for testing and refinement of the FASTA generation process.
These mitochondrial sequence FASTA files will be available to all FGC customers, including those with results from FGC's Y Elite and Y Prime tests, whole-genome sequencing, and Big Y analysis. The FGC team is currently targeting distribution to all customers with available sequencing results over the course of the next week.
Firstly, the new results are in a format that is widely-used and compatible with many current mtDNA databases and analysis tools.
Secondly, the results are based on a newly-developed bioinformatics pipeline, designed specifically for performing this mitochondrial sequence analysis. The approach uses more advanced techniques, designed to improve mutation detection and reduce false positives. As a result, the mutations that are indicated in the results may differ slightly from the earlier analysis.
Finally, FGC customers who ordered analysis of Big Y .bam files will be able to more easily interpret their mtDNA results. The previous analysis of Big Y files reported mtDNA results using the Yoruba reference sequence, rather than the more widely-used rCRS reference sequence and the more recent RSRS reference sequence. The FASTA sequence representation is not tied to a particular reference sequence, and many analysis tools will readily analyze the FASTA sequence in the context of these more commonly-used rCRS and RSRS reference sequences.
There are several opportunities to use the mtDNA results in the FASTA file, which can allow you to get a better understanding of your mtDNA, determine your mtDNA haplogroup, and possibly even contribute to ongoing research of mtDNA and the human mtDNA tree.
Determining your mtDNA haplogroup: One particularly useful mtDNA analysis tool available on the web is James Lick's mthap tool, which provides haplogroup classification based on mtDNA data supplied in a number of formats, including FASTA. To use the tool with your FASTA file:
Click the Browse button at the top left and select the location of the FASTA file on your computer.
Click Upload and wait a minute or so for the report to appear, including identification of mutations and haplogroup classification.
This tool is regularly updated as the mtDNA tree is refined, so you may want to check back periodically for updates. You can also visit PhyloTree to see how your haplogroup fits within the human mtDNA phylogenetic tree.
GenBank submission: If you are a whole-genome sequencing customer, or one of the lucky few who happened to obtain a near-complete mtDNA sequence from a Y sequencing test (with at least 16545 bp covered, as indicated by the second number at the top of the file), then you may wish to consider submitting your mtDNA sequence to the GenBank database, which is used by researchers around the world. An mtDNA expert, Ian Logan (firstname.lastname@example.org), is graciously offering his time and expertise to take a look at your results, determine whether they are suitable for submission to GenBank, and help with the submission process. He has already helped a significant number of individuals submit their mtDNA sequences to GenBank.
FASTA is a format that is commonly used to represent DNA sequences. It consists of a comment line (starting with ">"), followed by one or more lines with the actual DNA sequence of interest. (Further details and examples are at Wikipedia's "FASTA format" article.)
In this case, the sequence starts with the origin (position 1) of the "+" strand of the circular mitochondrial genome and continues (in the standard 5'-3' direction) to higher position numbers.
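As a rough illustration of the format described above, the following Python sketch reads FASTA text into a dictionary that maps each comment line to its sequence. The function name `parse_fasta` and the example record (including the `hypothetical_kit` comment line and its bases) are invented for illustration and do not reflect the actual contents of an FGC file.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each comment line (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A comment line starts a new record.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated in order (5'-3').
            records[header] += line
    return records


# Hypothetical example data, not a real mtDNA sequence.
example = """>hypothetical_kit mtDNA
GATCACAGGT
ctatcNNNNN
""".splitlines()

records = parse_fasta(example)
sequence = records["hypothetical_kit mtDNA"]
print(len(sequence))  # number of positions in this record
print(sequence[0])    # base at position 1 (Python index 0)
```

Because the sequence starts at position 1 of the mitochondrial genome, the base at position `p` is `sequence[p - 1]` in Python's zero-based indexing. A file that also contains separate records (for example, additional sequences) would simply yield more entries in the dictionary.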
Although FGC's Y chromosome tests are designed to sequence the Y chromosome, the "targeting" is not perfect. The mitochondrial results that are obtained from these Y chromosome tests are considered "off-target" coverage. On the other hand, mtDNA is also relatively abundant, as there are typically a number of copies per cell. Ultimately, the mtDNA results are a fortuitous side effect for anyone interested in mtDNA for genetic genealogy, anthropology, or other applications. However, the quality and completeness of the mtDNA results can vary significantly from kit to kit, and much of the mtDNA sequence (20% or more) will be undetermined in many cases.
On the other hand, mitochondrial DNA results from whole-genome sequencing are completely "intentional" and should generally allow determination of complete or near-complete mtDNA sequence with relatively high reliability.
Three statistics are reported at the top of the file:
The letter "N" is used to indicate cases where the base cannot be determined. Most of these cases arise when there are no available "reads" of a particular mtDNA site, though in some cases "N" is used to indicate a base that cannot be reliably determined even though the position has been "read".
Letters shown in lower-case correspond to bases that are more uncertain, whereas bases in UPPER-CASE may be considered "high reliability".
Bases shown in UPPER-CASE are supported by at least four reads. UPPER-CASE bases are also required to satisfy additional requirements in cases where there is evidence of variation from the reference sequence.
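The letter conventions above can be tallied with a short script. This is only a sketch: the function name `summarize_bases`, the category labels, and the example string are assumptions made for illustration, and the counts it produces are not guaranteed to match the statistics reported at the top of an FGC file.

```python
from collections import Counter


def summarize_bases(sequence: str) -> Counter:
    """Count undetermined, high-reliability, and uncertain positions."""
    counts: Counter = Counter()
    for base in sequence:
        if base in "Nn":
            counts["undetermined"] += 1      # base could not be determined
        elif base.isupper():
            counts["high_reliability"] += 1  # UPPER-CASE letters
        elif base.islower():
            counts["uncertain"] += 1         # lower-case letters
        else:
            counts["other"] += 1             # anything unexpected, e.g. '-'
    return counts


# Hypothetical example data, not a real mtDNA sequence.
summary = summarize_bases("GATCACAGGTctatcNNNNN")
determined = summary["high_reliability"] + summary["uncertain"]
print(summary)
print(determined)
```

Here `determined` counts every position that is not "N", regardless of case. Whether that corresponds to the "covered" figure in a given file is an assumption that should be checked against the file itself.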
In the case of Y sequencing tests, the mtDNA results are essentially incidental. Therefore, these Y sequencing tests generally result in relatively few reads from mitochondrial DNA. Small changes in the abundance of mtDNA in the sequencing library can have a relatively large effect on the ability to determine the mtDNA sequence. (When there is only a small number of mtDNA reads, it is essentially "the luck of the draw" whether a particular mtDNA site will be covered.) A number of factors can impact the abundance of mtDNA that is sequenced, including the "copy number" of mtDNA in the submitted DNA sample and Y chromosome targeting efficiency at the lab.
mtDNA results from whole-genome sequencing can be expected to be much less susceptible to these issues, and should provide nearly complete mtDNA coverage in all cases.
These other letters are standardized "ambiguity codes" that may be used to represent multiple nucleotides (the standard A/T/C/G "letters"). For example, "K" represents either "G" or "T". (See Wikipedia's "Nucleic acid notation" article for further details.)
In these mitochondrial FASTA sequences, the ambiguity codes can indicate either ambiguous sequencing results or heteroplasmy (i.e. a mixture of different mtDNA sequences within the cells of your body). If a particular ambiguity code is shown as a lower-case letter in the FASTA file, it is more likely to represent an ambiguous result, possibly due to a sequencing artifact. On the other hand, if it is shown as an UPPER-CASE letter, it is more likely to be a genuine heteroplasmy.
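To make the ambiguity codes concrete, the sketch below lists each coded position in a sequence together with the nucleotides it can stand for, using the standard IUPAC nucleotide codes. The function name `find_ambiguity_codes`, the wording of the labels, and the example string are assumptions for illustration. The labels restate the rule of thumb above and are not a definitive call on any position.

```python
# Standard IUPAC ambiguity codes ("N" is handled separately as undetermined).
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def find_ambiguity_codes(sequence: str):
    """Return (position, letter, possible bases, label) for each ambiguity code."""
    findings = []
    for position, base in enumerate(sequence, start=1):
        options = IUPAC.get(base.upper())
        if options is None:
            continue  # plain A/C/G/T, or N
        if base.isupper():
            label = "more likely heteroplasmy"
        else:
            label = "more likely ambiguous result"
        findings.append((position, base, options, label))
    return findings


# Hypothetical example data, not a real mtDNA sequence.
findings = find_ambiguity_codes("GATKACAyGT")
for finding in findings:
    print(finding)
```

Positions are reported starting from 1, matching the position numbering of the FASTA sequence.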
Currently, there is no standard format for representing ambiguous results or a mixture of sequences with different lengths (corresponding to insertions or deletions) as a single FASTA sequence. Therefore, these cases are instead reported as separate FASTA sequences, labeled as “potential length variants”.
As with the "ambiguity codes" discussed above, these can indicate either ambiguous results or heteroplasmy. If the length variation occurs in a region shown in lower-case letters, it is more likely to be an ambiguous result, possibly due to a sequencing artifact. If it is in a region shown in UPPER-CASE letters, it is more likely to be a genuine heteroplasmy.
As noted above, Y chromosome sequencing tests are specifically designed to target the Y chromosome, not mitochondrial DNA or other portions of the genome. The mitochondrial results that are obtained from such tests are considered "off-target" coverage (though fortuitous to those interested in mtDNA). As a result, the sequencing provides relatively few mitochondrial DNA "reads", and mtDNA sequence coverage will vary significantly from test to test. Sequencing performed elsewhere that specifically targets mitochondrial DNA should therefore generally be expected to provide results that are more reliable. (Note that reliability should be much less of an issue with whole-genome sequencing results, as offered in FGC's recent pilot product.)
Discrepancies are more likely with heteroplasmies and in the lower-quality, lower-case portions of the sequence in the FASTA file. Also, as mentioned above, mtDNA results based on Y chromosome targeted sequencing will tend to be less reliable than those based on whole-genome sequencing (WGS).
The FGC team would appreciate feedback about any discrepancies, particularly any that are seen in the "high reliability" UPPER-CASE portions of the sequence or in analysis of whole-genome sequencing results. We are also particularly interested in examining any cases where the new FASTA report appears to contain errors that were not present in the older "mttype" mtDNA variant reports. Please send details of any such discrepancies to email@example.com.
FGC is providing only raw sequence and mutation information for mtDNA results, with a focus on ancestry, genealogical, and anthropological use. The results are provided without any analysis or interpretation with regard to potential health implications. Customers are urged to bear in mind potential health implications and the potential for inaccuracy of results when sharing or interpreting the provided mtDNA information.
If you have additional questions that are not addressed above, please feel free to contact us via e-mail at firstname.lastname@example.org; you may also wish to consult with one of the mtDNA experts who participate in the genetic genealogy community.
The FGC Team
© 2015 Full Genomes Corporation, Inc. All rights reserved.