This commit is contained in:
Zengtudor 2024-09-21 22:37:10 +08:00
parent c6706c04de
commit a00036504c

@ -1,3 +1,63 @@
# DNASequence
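[Link to the original question](https://www.zhihu.com/question/36143261/answer/3624848144)
## Code logic overview
DNASequence processing
DNA is double-stranded, and the two strands are complementary to each other. When a DNA sample is sequenced, there is no way to tell which strand was read, so the program computes the complementary strand of every DNA fragment and puts the results together with the original file for assembly.
> The input format is only a demonstration sample and is not guaranteed to be biologically accurate. By default the maximum supported DNA sequence length is 5e4; modify the code to raise this limit.
>
> The program opens filteredReads.txt in the project root directory and processes a number of DNA sequences like the following: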
```
@SRR13280199.1 1 length=32
ACGTACACATTGCTGTCTGCTGAACCACCTAG
@SRR13280199.1 2 length=32
ACGTACACATTGCTGTCTGCTGAACCACCTAG
```
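The strand computation described above can be sketched in a few lines of Python. This is only an illustration, not the project's C++ implementation: the names `complement`, `reverse_complement`, and `complement_records` are hypothetical, and whether the real program reverses each sequence in addition to complementing it is an assumption (the `reverse` flag covers both readings). Header lines starting with `@` are assumed to be copied through unchanged.

```python
# Illustrative sketch only; all names here are hypothetical,
# and the real project implements this logic in C++.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Swap A<->T and C<->G; any other character passes through unchanged."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement the sequence and read it in the opposite direction."""
    return complement(seq)[::-1]


def complement_records(text: str, reverse: bool = True) -> str:
    """Transform sequence lines; keep '@' header lines and blank lines as-is."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("@"):
            out.append(line)
        elif reverse:
            out.append(reverse_complement(line))
        else:
            out.append(complement(line))
    return "\n".join(out) + "\n"


sample = (
    "@SRR13280199.1 1 length=32\n"
    "ACGTACACATTGCTGTCTGCTGAACCACCTAG\n"
)
# Assumption: the output holds the original reads followed by the computed strands.
print(sample + complement_records(sample), end="")
```

Using `str.translate` keeps the per-base mapping in one table, so extending it (for example to handle `N`) only means adding characters to the two strings.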
## pybind support
> After the build finishes you get the dna.pyd file and the Python stub file dna.pyi. Usage:
```python
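import dna

# Print the module's docstring and function signatures.
help(dna)
# Read the reads from the input file and write the result to the output file.
dna.dna_reverse("filteredReads.txt", "reversedSequence.txt")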
```
```
Help on module dna:
NAME
dna - DNASequence processing functions
FUNCTIONS
dna_reverse(...) method of builtins.PyCapsule instance
dna_reverse(input_file_path: str, output_file_path: str) -> None
DNA is double-stranded and complementary to each other, and when sequencing a DNA sample you can't be sure which strand is being measured, so the complementary strands of all the DNA fragments are counted and assembled together with the original file.
FILE
e:\file\dev\cpp\dnasequence\build\windows\x64\release\dna.pyd
Open input file stream to value [input_file_stream] ok , from ["filteredReads.txt"]
Open output file stream to value [output_file_stream] ok , from ["reversedSequence.txt"]
Chunk size :4294967296 bytes
[Timer: All spent] Start timing
[Timer: chunk_id:[1]] Start timing
[Timer: read_chunk_id:[1]] Start timing
[Timer: read_chunk_id:[1]] Stop timing , used 1253ms
buf_len : 897963094
[Timer: calculate_chunk_id:[1]] Start timing
[Timer: calculate_chunk_id:[1]] Stop timing , used 204ms
[Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Start timing
[Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Stop timing , used 1727ms
[Timer: chunk_id:[1]] Stop timing , used 3185ms
[Timer: All spent] Stop timing , used 3186ms
```
# Note!
> Please do not delete the newline at the end of the last line of the input.
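One way to guard against a missing final newline is to check the input file before handing it to the program. The helper below is a hypothetical sketch (`ensure_trailing_newline` is not part of this project) that appends a newline only when the last byte is not already one.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file lacks one. Returns True if the file changed."""
    if os.path.getsize(path) == 0:
        return False  # empty file: there is no last line to terminate
    with open(path, "rb+") as f:
        f.seek(-1, os.SEEK_END)      # jump to the last byte
        if f.read(1) == b"\n":
            return False             # already ends with a newline
        f.seek(0, os.SEEK_END)       # reposition explicitly before writing
        f.write(b"\n")
        return True
```

Calling `ensure_trailing_newline("filteredReads.txt")` before running the program leaves a well-formed file untouched and patches one that was truncated by an editor.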
@ -29,27 +89,6 @@ dna::open_file_and_calculate<(size_t)4 * 1024 * 1024 *1024 , (size_t)5e4+5>("fil
>
> MinGW's I/O is poorly optimized.
## How to build this project
> Make sure the build tool xmake and a C++ toolchain are installed and that their paths have been added to PATH.