update

2024-09-21 22:37:10 +08:00 · 2024-09-21 22:37:10 +08:00 · a00036504c
commit a00036504c
parent c6706c04de
1 changed file with 60 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,63 @@
 # DNASequence
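 [Link to the original question](https://www.zhihu.com/question/36143261/answer/3624848144)
 ## Code logic overview
 DNASequence processing
 DNA is double-stranded, and the two strands are complementary to each other. When a DNA sample is sequenced, there is no way to tell which strand a given read came from, so the program computes the complementary strand of every DNA fragment and puts the results together with the original file for assembly.
 > Input format: this is a demonstration sample only, with no guarantee of biological accuracy. The default maximum supported DNA sequence length is 5e4; modify the code yourself to raise the limit.
 >
 > The program opens filteredReads.txt in the project root directory and processes a number of DNA sequences like the following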
 ```
@SRR13280199.1 1 length=32
 ACGTACACATTGCTGTCTGCTGAACCACCTAG
@SRR13280199.1 2 length=32
 ACGTACACATTGCTGTCTGCTGAACCACCTAG
 ```
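
As an illustration of the complement step described above, here is a minimal Python sketch. It is not the project's C++ implementation. The names `reverse_complement` and `iter_reads` are hypothetical, and it assumes that the output strand is reversed as well as complemented (the function name `dna_reverse` suggests this, but the README does not say so) and that header lines start with `@` as in the sample.

```python
# Hypothetical helper names; none of this is part of the dna module.
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand (assumed behavior)."""
    return seq.translate(_COMPLEMENT)[::-1]


def iter_reads(lines):
    """Yield (header, sequence, reverse_complement) for each read.

    Lines starting with '@' are treated as headers; every other
    non-empty line is treated as a sequence belonging to the last header.
    """
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            header = line
        else:
            yield header, line, reverse_complement(line)


if __name__ == "__main__":
    sample = [
        "@SRR13280199.1 1 length=32\n",
        "ACGTACACATTGCTGTCTGCTGAACCACCTAG\n",
    ]
    for header, seq, rc in iter_reads(sample):
        print(header)
        print(seq)
        print(rc)
```

If the project only complements without reversing, drop the `[::-1]` slice; the rest of the sketch is unchanged.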
 ## pybind support
 > After the build finishes you get the dna.pyd file and the Python stub file dna.pyi. Usage is as follows
 ```python
 import dna
 help(dna)
 dna.dna_reverse("filteredReads.txt","reversedSequence.txt")
 ```
 ```
 Help on module dna:
 NAME
    dna - DNASequence processing functions
 FUNCTIONS
    dna_reverse(...) method of builtins.PyCapsule instance
        dna_reverse(input_file_path: str, output_file_path: str) -> None
        DNA is double-stranded and complementary to each other, and when sequencing a DNA sample you can't be sure which strand is being measured, so the complementary strands of all the DNA fragments are counted and assembled together with the original file.
 FILE
    e:\file\dev\cpp\dnasequence\build\windows\x64\release\dna.pyd
 Open input file stream to value [input_file_stream] ok , from ["filteredReads.txt"]
 Open output file stream to value [output_file_stream] ok , from ["reversedSequence.txt"]
 Chunk size :4294967296 bytes
 [Timer: All spent] Start timing
 [Timer: chunk_id:[1]] Start timing
 [Timer: read_chunk_id:[1]] Start timing
 [Timer: read_chunk_id:[1]] Stop timing , used 1253ms
 buf_len : 897963094
 [Timer: calculate_chunk_id:[1]] Start timing
 [Timer: calculate_chunk_id:[1]] Stop timing , used 204ms
 [Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Start timing
 [Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Stop timing , used 1727ms
 [Timer: chunk_id:[1]] Stop timing , used 3185ms
 [Timer: All spent] Stop timing , used 3186ms
 ```
 # Note!
 > Please do not delete the newline at the end of the last line of the input file
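
A small guard like the following can be run on the input file before processing, so that a missing final newline does not go unnoticed. This is a sketch only: `ensure_trailing_newline` is a hypothetical helper, not part of the project, and it assumes the file uses `\n` as its line terminator.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file's last byte is not one.

    Returns True if the file was modified, False otherwise.
    An empty file is left untouched.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file, nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already ends with a newline
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True
```

For example, calling `ensure_trailing_newline("filteredReads.txt")` before `dna.dna_reverse(...)` leaves a well-formed file unchanged and repairs one whose final newline was stripped by an editor.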
@@ -29,27 +89,6 @@ dna::open_file_and_calculate<(size_t)4 * 1024 * 1024 *1024 , (size_t)5e4+5>("fil
 >
 > mingw's IO optimization is poor
 # DNASequence
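 [Link to the original question](https://www.zhihu.com/question/36143261/answer/3624848144)
 ## Code logic overview
 DNASequence processing
 DNA is double-stranded, and the two strands are complementary to each other. When a DNA sample is sequenced, there is no way to tell which strand a given read came from, so the program computes the complementary strand of every DNA fragment and puts the results together with the original file for assembly.
 > Input format: this is a demonstration sample only, with no guarantee of biological accuracy. The default maximum supported DNA sequence length is 5e4; modify the code yourself to raise the limit.
 >
 > The program opens filteredReads.txt in the project root directory and processes a number of DNA sequences like the following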
 ```
@SRR13280199.1 1 length=32
 ACGTACACATTGCTGTCTGCTGAACCACCTAG
@SRR13280199.1 2 length=32
 ACGTACACATTGCTGTCTGCTGAACCACCTAG
 ```
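 ## How to build this project
 > Make sure the build tool xmake and a C++ toolchain of your choice are installed, and that their paths have been added to PATH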