Results of DNA sequencing are provided in three data files: an .ab1 file, a .seq file and a .phd.1 file.
The electropherogram (data after analysis) shows a sequence of peaks in four colors. Each color represents the base called for that peak, and a text version of the recorded sequence is shown as well.
Raw data (data before analysis by the basecaller algorithm) are the data exactly as they are recorded by the sequencer.
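Of the three files, the .phd.1 file is plain text, so it can be read without dedicated software. The following is a minimal sketch, assuming the file follows the usual Phred PHD layout in which every line between BEGIN_DNA and END_DNA holds a base, its quality value and its peak position. The function name parse_phd and the example content are hypothetical.

```python
from typing import List, Tuple


def parse_phd(text: str) -> Tuple[str, List[int]]:
    """Return (sequence, quality values) from the text of a PHD file.

    Assumes the common Phred PHD layout: lines between BEGIN_DNA and
    END_DNA look like "<base> <quality> <peak position>".
    """
    bases: List[str] = []
    quals: List[int] = []
    in_dna = False
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            fields = line.split()
            bases.append(fields[0].upper())  # base call
            quals.append(int(fields[1]))     # quality value
    return "".join(bases), quals


# Hypothetical, shortened file content used only for illustration.
example = """BEGIN_SEQUENCE sample
BEGIN_DNA
a 12 5
c 45 17
g 52 29
t 8 41
END_DNA
END_SEQUENCE
"""

sequence, qvs = parse_phd(example)
print(sequence, qvs)
```

To use it on a real file, read the file content first, for example with open("sample.phd.1").read(), and pass the text to parse_phd.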
How do we manipulate results before sending them out?
What we get first from our DNA sequencers is the raw data. These are analyzed using dedicated algorithms called basecallers. As a result we get the electropherogram, provided to you as part of the .ab1 file, and the read sequence, saved in the .ab1 file as well as in the .seq and .phd.1 files (see above). Every electropherogram is then checked.
We also choose how to visualize electropherograms. There are in principle only two options: the True profile and the Flat profile. The Flat profile displays the data as processed traces scaled semi-locally. The True profile displays the data as processed traces scaled uniformly, so it looks very similar to the raw traces, which makes it unsuitable for samples with declining peak intensities. In any case, these profiles are only two ways of showing the same thing; the data (sequence and quality values, see below) are not changed in any way.
Note: Since January 2015, when we also enabled download of results as jpg files, we use the Flat profile only.
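The difference between the two kinds of scaling can be illustrated with a toy example. The sketch below is not the algorithm used by the analysis software. It assumes that "scaled uniformly" means dividing the whole trace by a single global maximum and that "scaled semi-locally" means dividing each point by the maximum within a surrounding window. The function names, the window size and the signal values are hypothetical.

```python
from typing import List


def scale_uniform(trace: List[float]) -> List[float]:
    """Divide every point by the global maximum of the trace."""
    peak = max(trace)
    if peak <= 0:
        return [0.0 for _ in trace]
    return [v / peak for v in trace]


def scale_semi_local(trace: List[float], half_window: int = 2) -> List[float]:
    """Divide every point by the maximum found in a window around it."""
    scaled: List[float] = []
    for i, v in enumerate(trace):
        lo = max(0, i - half_window)
        hi = min(len(trace), i + half_window + 1)
        local_peak = max(trace[lo:hi])
        scaled.append(v / local_peak if local_peak > 0 else 0.0)
    return scaled


# Invented signal with peaks that decline along the read.
declining = [0, 1000, 0, 800, 0, 400, 0, 200, 0, 100, 0]

print(scale_uniform(declining))
print(scale_semi_local(declining))
```

With uniform scaling the last peak keeps only a tenth of the height of the first one, while with windowed scaling it stays clearly visible. The underlying values are the same in both cases; only the display differs.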
Data analysis software
To perform data analysis you need software that can open .ab1 files. Many different programs are available, some of them free, and it is not easy to recommend which one you should choose. In general, you should always use software that shows not only the electropherogram but also the raw data, since these are critically important when the quality is low and you need to know why.
Among the free software tools, FinchTV and Sequence Scanner are probably the most popular. They allow you to view and edit .ab1 files and to evaluate their quality, but typically only one file at a time. If you need more sophisticated analysis, for example assembly of multiple sequences, comparison to a reference sequence or automatic mutation detection, you need special software packages such as Sequencher (GeneCodes) or SeqScape (Applied Biosystems). We can provide training for them if you are interested.
Data analysis
When evaluating .ab1 files, you should first look at the electropherogram and decide whether your data can be considered of good quality or not.
Good quality sequencing data are characterized by:
An example of very good quality data:
A quick and very convenient way to check the data quality is to use Quality Values (QVs). By definition, the QV is a per-base estimate of the basecaller accuracy. In plain language, QVs are displayed as colored bars above the peaks/bases.
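For readers who want to interpret QVs numerically: assuming the QVs are on the standard Phred scale, QV = -10 * log10(P), where P is the estimated probability that the base call is wrong. The scale is an assumption here, and the function names in the sketch below are illustrative.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Estimated probability that a base call with this QV is wrong."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Phred-scaled quality value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for qv in (10, 20, 30, 40):
    print(qv, qv_to_error_probability(qv))
```

On this scale a QV of 20 corresponds to an expected 1 wrong call in 100 bases, and a QV of 30 to 1 in 1000.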
Quality values in data files you receive from us follow these rules:
If your .ab1 file looks like the picture above, you only need to read its sequence and possibly make some manual edits at the very beginning and end of it (where yellow and red bars are also shown).
If, however, you do not see such a pretty picture, you need to troubleshoot your data to understand what failed and why, and take corrective measures in the future. These can vary dramatically depending on the nature of the problem, and usually you need to examine the raw data in particular very carefully. Contact us for advice on specific issues you observe in your data.
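Edits at the beginning and end of a read can be approximated by removing bases from both ends until a base with a sufficiently high QV is reached. The following is a minimal sketch. The cutoff of 20 is only an illustrative assumption, and the function name trim_low_quality_ends and the example read are hypothetical.

```python
from typing import List, Tuple


def trim_low_quality_ends(
    sequence: str, qvs: List[int], min_qv: int = 20
) -> Tuple[str, List[int]]:
    """Remove bases from both ends while their QV is below min_qv.

    Low-quality bases inside the read are kept; only the ends are trimmed.
    """
    if len(sequence) != len(qvs):
        raise ValueError("sequence and qvs must have the same length")
    start = 0
    while start < len(qvs) and qvs[start] < min_qv:
        start += 1
    end = len(qvs)
    while end > start and qvs[end - 1] < min_qv:
        end -= 1
    return sequence[start:end], qvs[start:end]


# Invented read with low-quality bases at both ends.
read = "NNACGTACGN"
read_qvs = [5, 8, 30, 45, 50, 12, 48, 40, 33, 6]

trimmed, trimmed_qvs = trim_low_quality_ends(read, read_qvs)
print(trimmed, trimmed_qvs)
```

Automatic trimming of this kind does not replace a visual check of the electropherogram; it only marks where manual inspection is most likely needed.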