diff --git a/README.md b/README.md
index 0e4c6dc..bd53b31 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,63 @@
+# DNASequence
+
+[Link to the original question](https://www.zhihu.com/question/36143261/answer/3624848144)
+
+## How the code works
+
+DNASequence processing
+
+DNA is double-stranded, and the two strands are complementary. When a DNA sample is sequenced, there is no way to tell which strand was read, so the program computes the complementary strand of every DNA fragment and puts the results together with the original file for assembly.
+
+> Input format: the sample below is for demonstration only and is not guaranteed to be biologically accurate. By default the maximum supported DNA sequence length is 5e4; modify the code to raise this limit.
+>
+> The program opens filteredReads.txt in the project root directory and processes DNA sequences like the following:
+
+```
+@SRR13280199.1 1 length=32
+ACGTACACATTGCTGTCTGCTGAACCACCTAG
+@SRR13280199.1 2 length=32
+ACGTACACATTGCTGTCTGCTGAACCACCTAG
+```
+## pybind support
+>After building, you get the dna.pyd file and the Python type-hint stub dna.pyi. Usage:
+```python
+import dna
+
+help(dna)
+
+dna.dna_reverse("filteredReads.txt","reversedSequence.txt")
+```
+```
+Help on module dna:
+
+NAME
+    dna - DNASequence processing functions
+
+FUNCTIONS
+    dna_reverse(...) method of builtins.PyCapsule instance
+        dna_reverse(input_file_path: str, output_file_path: str) -> None
+
+        DNA is double-stranded and complementary to each other, and when sequencing a DNA sample you can't be sure which strand is being measured, so the complementary strands of all the DNA fragments are counted and assembled together with the original file.
+
+FILE
+    e:\file\dev\cpp\dnasequence\build\windows\x64\release\dna.pyd
+
+
+Open input file stream to value [input_file_stream] ok , from ["filteredReads.txt"]
+Open output file stream to value [output_file_stream] ok , from ["reversedSequence.txt"]
+Chunk size :4294967296 bytes
+[Timer: All spent] Start timing
+[Timer: chunk_id:[1]] Start timing
+[Timer: read_chunk_id:[1]] Start timing
+[Timer: read_chunk_id:[1]] Stop timing , used 1253ms
+buf_len : 897963094
+[Timer: calculate_chunk_id:[1]] Start timing
+[Timer: calculate_chunk_id:[1]] Stop timing , used 204ms
+[Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Start timing
+[Timer: write_chunk_id:[1] , [Wrote bytes] start_pos : 897963094] Stop timing , used 1727ms
+[Timer: chunk_id:[1]] Stop timing , used 3185ms
+[Timer: All spent] Stop timing , used 3186ms
+```
 # Note!
 
 > Please do not delete the newline at the end of the last line of the input.
@@ -29,27 +89,6 @@ dna::open_file_and_calculate<(size_t)4 * 1024 * 1024 *1024 , (size_t)5e4+5>("fil
 >
 > MinGW's I/O optimization is poor.
 
-# DNASequence
-
-[Link to the original question](https://www.zhihu.com/question/36143261/answer/3624848144)
-
-## How the code works
-
-DNASequence processing
-
-DNA is double-stranded, and the two strands are complementary. When a DNA sample is sequenced, there is no way to tell which strand was read, so the program computes the complementary strand of every DNA fragment and puts the results together with the original file for assembly.
-
-> Input format: the sample below is for demonstration only and is not guaranteed to be biologically accurate. By default the maximum supported DNA sequence length is 5e4; modify the code to raise this limit.
->
-> The program opens filteredReads.txt in the project root directory and processes DNA sequences like the following:
-
-```
-@SRR13280199.1 1 length=32
-ACGTACACATTGCTGTCTGCTGAACCACCTAG
-@SRR13280199.1 2 length=32
-ACGTACACATTGCTGTCTGCTGAACCACCTAG
-```
-
 ## How to build this project
 
 > Make sure the build tool xmake and any C++ toolchain are installed and that their paths have been added to PATH.
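
Review note: the README describes computing the complementary strand of each read, but does not show the per-read operation. The pure-Python sketch below is one plausible reading of it and is not taken from the project's C++ source. It assumes the complement is emitted in reverse order (as the name `dna_reverse` suggests), that `@` header lines are copied through unchanged, and that only the bases A, C, G, T occur. The helper names `reverse_complement` and `process_reads` are hypothetical.

```python
# Hypothetical reference sketch; NOT the dna module's actual implementation.
# Assumption: complement each base (A<->T, C<->G) and reverse the read.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def process_reads(text: str) -> str:
    """Copy '@' header lines and blank lines as-is (an assumption about
    the output layout) and reverse-complement every other line."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("@"):
            out.append(line)
        else:
            out.append(reverse_complement(line))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "@SRR13280199.1 1 length=32\n"
        "ACGTACACATTGCTGTCTGCTGAACCACCTAG\n"
    )
    print(process_reads(sample), end="")
```

A sketch like this is only practical for small inputs. Its main use would be to cross-check the output of `dna.dna_reverse` on a short test file, provided the assumptions above match what the C++ code actually does.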