【佳学基因 Gene Testing】This tumour gene testing technique is impressive, and hard to follow unless you are a specialist
High-throughput sequencing technologies have revolutionized genetic and biomedical research by uncovering alterations responsible for the development of disease. Although considerable progress has been made toward germline and somatic variant detection, identification of variants at lower allele frequencies remains hindered by sequencing errors and technical artefacts. This has numerous implications in oncology, particularly in liquid biopsy applications, where tumour DNA fragments may be present at frequencies <0.01%. Sensitive detection is difficult in these scenarios because sequencer error rates average ∼0.1–1%, at least an order of magnitude above the signal being sought.
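To make the scale of the problem concrete, the arithmetic can be sketched in a few lines of Python. The depth of 10,000× and the function name are assumptions chosen for illustration; the allele fraction (0.01%) and the lower error rate (0.1%) come from the text. The sketch counts every erroneous base call at a position, although in practice errors are spread across the alternative bases and only some would mimic a given variant.

```python
def expected_read_counts(depth, allele_fraction, error_rate):
    """Expected reads at one position: (true variant reads, erroneous reads).

    Hypothetical helper for illustration only; it ignores sampling noise
    and treats the per-base error rate as uniform.
    """
    true_variant_reads = depth * allele_fraction
    erroneous_reads = depth * error_rate
    return true_variant_reads, erroneous_reads


# Assumed depth of 10,000x; 0.01% allele fraction and 0.1% error rate from the text.
variant_reads, error_reads = expected_read_counts(
    depth=10_000, allele_fraction=0.0001, error_rate=0.001
)
print(f"expected variant reads: {variant_reads:.1f}")
print(f"expected erroneous reads: {error_reads:.1f}")
```

Under these assumptions roughly one read carries the true variant while about ten carry an error at the same position, which is why uncorrected reads cannot separate signal from noise at this allele frequency.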
A promising strategy to suppress errors uses unique molecular identifiers (UMIs) to compare multiple reads derived from the same DNA fragment. Errors that are found in individual reads are removed, and only variants present across all redundant reads are retained to form a single-strand consensus sequence (SSCS). In addition, strand-aware duplex correction is needed to eliminate artefacts from oxidative damage; duplex consensus sequences (DCSs) retain only true variants found on both strands of a fragment by comparing complementary SSCSs. While duplex methods allow for greater error suppression, the efficiency of DCS recovery from SSCSs is poor (15–47%) and reliant on sequencing coverage.
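The two consensus steps can be illustrated with a minimal Python sketch. It is not the implementation of any particular tool. It assumes that each read is a hypothetical `(molecule_id, strand, sequence)` tuple, that reads from one molecule share a `molecule_id` derived from the UMI, and that all sequences are equal in length and already oriented to the same reference strand. Following the description above, a base is kept only when every read agrees; disagreements are masked as `N`.

```python
from collections import defaultdict


def unanimous_consensus(seqs):
    """Keep a base only where all sequences agree; otherwise emit 'N'."""
    return "".join(col[0] if len(set(col)) == 1 else "N" for col in zip(*seqs))


def build_sscs(reads, min_family_size=2):
    """Group reads by (molecule_id, strand) and collapse each redundant family."""
    families = defaultdict(list)
    for molecule_id, strand, seq in reads:
        families[(molecule_id, strand)].append(seq)
    return {
        key: unanimous_consensus(seqs)
        for key, seqs in families.items()
        if len(seqs) >= min_family_size  # singletons get no SSCS
    }


def build_dcs(sscs):
    """Compare complementary SSCSs; keep only bases seen on both strands."""
    dcs = {}
    for (molecule_id, strand), seq in sscs.items():
        if strand == "+" and (molecule_id, "-") in sscs:
            dcs[molecule_id] = unanimous_consensus([seq, sscs[(molecule_id, "-")]])
    return dcs


reads = [
    ("m1", "+", "ACTT"), ("m1", "+", "ACTT"),  # damage-like change on one strand only
    ("m1", "-", "ACGT"), ("m1", "-", "ACGT"),
    ("m2", "+", "TTGA"), ("m2", "+", "TTGC"),  # sequencing error in one read
    ("m3", "+", "GGCC"),                       # singleton: no redundancy
]
sscs = build_sscs(reads)
dcs = build_dcs(sscs)
print(sscs)
print(dcs)
```

In this toy input the strand-specific change in `m1` survives SSCS formation, because both reads from that strand carry it, and is only masked at the duplex step. Molecule `m2` yields an SSCS but no DCS, and `m3` yields neither, which mirrors the low recovery described above.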
A major limitation of current UMI-based error correction methods is the dependence on redundant sequencing. This results in poor efficiency, with a low yield of unique sequences despite high sequencing costs. These inefficiencies are further magnified in duplex UMI methods, where both strands of a molecule must be redundantly sequenced. This is problematic, as uneven sequencing often arises from amplification biases, stochastic sampling, and inadequate coverage. These factors limit the applicability of duplex correction to only 0.5–2.5% of sequenced reads (Figure ?). Furthermore, current UMI-based strategies do not apply error suppression to single reads (singletons) that have not been redundantly sequenced. This is detrimental, as singletons may account for over half of all reads in a moderately deep sequenced sample (defined as ∼1000×–10 000× coverage in this study).
To address these limitations, we developed a ‘Singleton Correction’ methodology that enables error suppression in singletons (Figure ?). By utilizing the large number of singletons present in hybrid capture deep sequencing data, Singleton Correction allows dramatically more sequences to be corrected. Unlike traditional UMI methods that are restricted to redundant reads, our method also eliminates errors in singletons using reads from the complementary strand. Here, we analyzed a combination of cell line and clinical samples and found that Singleton Correction consistently improved the efficiency of traditional duplex correction methods and increased sensitivity while maintaining high specificity for calling low-allele-frequency variants.
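The idea of rescuing singletons with the complementary strand can be sketched as follows. This is a simplified illustration and should not be read as the authors' implementation. It assumes hypothetical `(molecule_id, strand, sequence)` tuples, sequences of equal length oriented to the same reference strand, and a rule that a base is kept only when the singleton and every read on the opposite strand agree.

```python
from collections import defaultdict


def unanimous(seqs):
    """Keep a base only where all sequences agree; otherwise emit 'N'."""
    return "".join(col[0] if len(set(col)) == 1 else "N" for col in zip(*seqs))


def correct_singletons(reads):
    """Return (corrected, unrescued) for families that contain a single read.

    corrected maps (molecule_id, strand) to the error-suppressed sequence;
    unrescued lists singletons with no read on the complementary strand.
    """
    families = defaultdict(list)
    for molecule_id, strand, seq in reads:
        families[(molecule_id, strand)].append(seq)

    corrected, unrescued = {}, []
    for (molecule_id, strand), seqs in families.items():
        if len(seqs) != 1:
            continue  # redundant families are handled by ordinary SSCS/DCS
        opposite = "-" if strand == "+" else "+"
        partner_reads = families.get((molecule_id, opposite))
        if partner_reads:
            corrected[(molecule_id, strand)] = unanimous(seqs + partner_reads)
        else:
            unrescued.append((molecule_id, strand))
    return corrected, unrescued


reads = [
    ("m1", "+", "ACGT"),                       # singleton
    ("m1", "-", "ACGT"), ("m1", "-", "ACGT"),  # redundant family on the other strand
    ("m2", "+", "TTGA"),                       # singleton
    ("m2", "-", "TTCA"),                       # singleton on the other strand, one mismatch
    ("m3", "+", "GGCC"),                       # singleton with no partner
]
corrected, unrescued = correct_singletons(reads)
print(corrected)
print(unrescued)
```

In this toy input the `m1` singleton is confirmed by the opposite-strand family, the two `m2` singletons confirm each other except at the mismatched base, and `m3` remains uncorrected because nothing was sequenced from its complementary strand.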
(Editor in charge: 佳学基因)