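Building a kraken2 conda package on the Kunpeng ARM platform
Because of export-restriction concerns, some of the larger universities in China want to avoid being cut off from foreign technology and have started moving to the Huawei Kunpeng (ARM) platform with the openEuler operating system for their common teaching software. A friend recently asked about compiling kraken2 on this platform, and since I have such a machine on hand, this post summarizes the build and test process.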
I. Building from source
Installing from source is the simplest route, with very little to configure:
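# Download from GitHub
git clone https://github.com/DerrickWood/kraken2.git
cd kraken2

# Or download with wget and unpack (the archive unpacks to kraken2-master)
wget https://github.com/DerrickWood/kraken2/archive/master.zip
unzip master.zip
cd kraken2-master

# Choose the install location and run the installer; the final message
# "Kraken 2 installation complete." means the install succeeded
KRAKEN2_DIR=/opt/kraken2
./install_kraken2.sh $KRAKEN2_DIR

# Copy the main scripts into a directory that is on PATH
cp $KRAKEN2_DIR/kraken2{,-build,-inspect} $HOME/bin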
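Note: the install directory must not be deleted afterwards. The commands still look for kraken2lib.pm and related files at run time, and those files live in the /opt/kraken2 directory used above. It is also advisable to add that directory to the PATH environment variable with export.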
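As a minimal sketch of the PATH suggestion in the note, the lines below could go into a shell startup file such as ~/.bashrc. The /opt/kraken2 default is taken from the install step; the guard against duplicate entries is an addition of this sketch, not something kraken2 requires.

```shell
# Assumption: kraken2 was installed to /opt/kraken2 as in the steps above.
KRAKEN2_DIR="${KRAKEN2_DIR:-/opt/kraken2}"

# Prepend the install directory to PATH only if it is not already there,
# so sourcing this file repeatedly does not keep growing PATH.
case ":$PATH:" in
    *":$KRAKEN2_DIR:"*) ;;
    *) export PATH="$KRAKEN2_DIR:$PATH" ;;
esac

# Show where the shell now finds kraken2 (prints nothing if it is missing).
command -v kraken2 || true
```

With the directory on PATH, the copies in $HOME/bin become optional, since the scripts can be called straight from the install location.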
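II. Building with conda
First install an Anaconda environment on ARM; see: https://blog.361way.com/arm-linux-anaconda/6819.html
Building a conda package also requires the conda-build package, which can be installed or upgraded with the following commands: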
conda install conda-build
conda update conda
conda update conda-build
Building a conda package requires a meta.yaml file and a build.sh file, as follows:
# Write the meta.yaml file
(base) [root@ecs-7b87 kraken2-2.1.2]# cat meta.yaml
package:
  name: kraken2
  version: 1

build:
  skip: True  # [py…
…
==> /usr/bin/git log -n1 <==
…
Date: Mon May 9 16:18:50 2022 -0400

    revert the change to https

==> /usr/bin/git describe --tags --dirty <==
…
==> /usr/bin/git status <==
…
ld_impl_linux-aarch64 2.36.1 h0ab8de2_3
libarchive 3.5.2 h06cee29_0
libffi 3.3 h7c1a80f_2
libgcc-ng 10.2.0 h1234567_51
libgomp 10.2.0 h1234567_51
liblief 0.11.5 h22f4aa5_1
libstdcxx-ng 10.2.0 h1234567_51
libxml2 2.9.14 he30c317_0
lz4-c 1.9.3 h7c1a80f_0
markupsafe 2.0.1 py39h2f4d8fa_0
ncurses 6.3 h2f4d8fa_2
openssl 1.1.1o h2f4d8fa_0
patchelf 0.13 h22f4aa5_0
pip 21.2.4 py39hd43f75c_0
pkginfo 1.8.2 pyhd3eb1b0_0
psutil 5.8.0 py39hfd63f10_1
py-lief 0.11.5 py39h22f4aa5_1
pycosat 0.6.3 py39hfd63f10_2
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pysocks 1.7.1 py39hd43f75c_0
python 3.9.12 hc137634_0
python-libarchive-c 2.9 pyhd3eb1b0_1
pytz 2022.1 py39hd43f75c_0
pyyaml 6.0 py39h2f4d8fa_0
readline 8.1.2 h2f4d8fa_1
requests 2.27.1 pyhd3eb1b0_0
ruamel_yaml 0.15.100 py39h2f4d8fa_0
setuptools 61.2.0 py39hd43f75c_0
six 1.16.0 pyhd3eb1b0_1
soupsieve 2.3.1 pyhd3eb1b0_0
sqlite 3.38.2 h6632b73_0
tk 8.6.11 h241ca14_0
tqdm 4.63.0 pyhd3eb1b0_0
tzdata 2022a hda174b7_0
urllib3 1.26.8 pyhd3eb1b0_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 hfd63f10_1
yaml 0.2.5 hfd63f10_0
zlib 1.2.12 h2f4d8fa_1
zstd 1.5.2 hfcb3217_0

# Uninstall the kraken2 package
(base) [root@ecs-7b87 kraken2]# conda uninstall kraken2
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/minconda3

  removed specs:
    - kraken2


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cryptography-37.0.1        |   py39h3d58568_0         1.2 MB
    tqdm-4.64.0                |   py39hd43f75c_0         126 KB
    urllib3-1.26.9             |   py39hd43f75c_0         183 KB
    ------------------------------------------------------------
                                           Total:         1.5 MB

The following packages will be REMOVED:

  colorama-0.4.4-pyhd3eb1b0_0
  conda-content-trust-0.1.1-pyhd3eb1b0_0
  kraken2-1-py39_0

The following packages will be UPDATED:

  cryptography 36.0.0-py39h3d58568_0 --> 37.0.1-py39h3d58568_0
  ld_impl_linux-aar~ 2.36.1-h0ab8de2_3 --> 2.38-h8131f2d_1
  libgcc-ng 10.2.0-h1234567_51 --> 11.2.0-h1234567_1
  libgomp 10.2.0-h1234567_51 --> 11.2.0-h1234567_1
  libstdcxx-ng 10.2.0-h1234567_51 --> 11.2.0-h1234567_1
  sqlite 3.38.2-h6632b73_0 --> 3.38.3-h6632b73_0
  tk 8.6.11-h241ca14_0 --> 8.6.12-h241ca14_0
  tqdm pkgs/main/noarch::tqdm-4.63.0-pyhd3eb~ --> pkgs/main/linux-aarch64::tqdm-4.64.0-py39hd43f75c_0
  urllib3 pkgs/main/noarch::urllib3-1.26.8-pyhd~ --> pkgs/main/linux-aarch64::urllib3-1.26.9-py39hd43f75c_0
  zlib 1.2.12-h2f4d8fa_1 --> 1.2.12-h2f4d8fa_2

The following packages will be DOWNGRADED:

  xz 5.2.5-hfd63f10_1 --> 5.2.5-h2f4d8fa_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
tqdm-4.64.0 | 126 KB | ################################################################################################################# | 100%
urllib3-1.26.9 | 183 KB | ################################################################################################################# | 100%
cryptography-37.0.1 | 1.2 MB | ################################################################################################################# | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
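For reference, a build.sh for a recipe like this can be quite short. The sketch below is an assumption, not the script used for the build above: it relies on the PREFIX and SRC_DIR variables that conda-build exports to build scripts, while the libexec/kraken2 layout and the build_kraken2 function name are hypothetical choices made for illustration.

```shell
#!/bin/bash
# Hypothetical build.sh sketch for a kraken2 recipe.
# conda-build sets PREFIX (install root) and SRC_DIR (unpacked sources).
set -e

build_kraken2() {
    local src_dir="$1" prefix="$2"
    # Assumed layout: keep everything kraken2 needs (kraken2lib.pm, helper
    # scripts, binaries) together in one directory under the prefix.
    local libexec="$prefix/libexec/kraken2"
    mkdir -p "$libexec" "$prefix/bin"

    # install_kraken2.sh compiles the sources and copies the results
    # into the directory given as its argument.
    (cd "$src_dir" && ./install_kraken2.sh "$libexec")

    # Expose the three main scripts in $PREFIX/bin via symlinks.
    for tool in kraken2 kraken2-build kraken2-inspect; do
        ln -sf "$libexec/$tool" "$prefix/bin/$tool"
    done
}

# Only run when conda-build has provided both variables.
if [ -n "${PREFIX:-}" ] && [ -n "${SRC_DIR:-}" ]; then
    build_kraken2 "$SRC_DIR" "$PREFIX"
fi
```

With meta.yaml and a build.sh like this in the recipe directory, `conda build .` builds the package and `conda install --use-local kraken2` installs the locally built result.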
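III. Testing
1) Configure the database
kraken2 can work with a database you build yourself or with a pre-built one.
Download an officially pre-built database (ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/) or a third-party pre-built database (https://github.com/rrwick/Metagenomics-Index-Correction):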
- MiniKraken2_v1_8GB: (5.5GB) 8GB Kraken 2 Database built from the RefSeq bacteria, archaea, and viral libraries.
- MiniKraken2_v2_8GB: (5.5GB) 8GB Kraken 2 Database built from the RefSeq bacteria, archaea, and viral libraries and the GRCh38 human genome.
- Kraken 2 16S Greengenes 13_5 DB (72.1 MB)
- Kraken 2 16S RDP 11.5 DB (168 MB)
- Kraken 2 16S Silva 132 DB (117 MB)
- Kraken 2 16S Silva 138 DB (112 MB)
- **GTDB_r89_54k**: A collection of database files for use with Centrifuge, Kraken 1, or Kraken 2 that can be used to classify metagenomes using the GTDB_r89_54k index. More information and details at: https://github.com/rrwick/Metagenomics-Index-Correction
- Maxikraken2 and Kraken2-microbial databases. These databases are maintained by LomanLab. More information at the link provided.
To build the standard database:
kraken2-build --standard --threads 24 --db $DBNAME
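Here DBNAME is the location where the database will be stored. This step downloads more than 50 GB of data, and the database uses more than 100 GB of disk space while it is being created. By default five libraries are downloaded: archaea, bacteria, human, vectors (UniVec_Core), and viral.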
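Given those disk requirements, it can be worth checking free space before starting a build that runs for hours. The following is a minimal sketch; has_free_gb is a hypothetical helper name, and the 100 GB threshold is simply the figure quoted above.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB gigabytes available. Uses POSIX df output (-P) in 1 KiB blocks (-k).
has_free_gb() {
    local dir="$1" need_gb="$2" avail_kb
    # Line 2, column 4 of `df -Pk` is the available space in KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Example: check the current directory against the 100 GB figure.
if has_free_gb . 100; then
    echo "enough space for a standard build"
else
    echo "less than 100 GB free"
fi
```

In practice the check would be pointed at the directory that will hold $DBNAME rather than the current directory.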
kraken2 can download the following libraries: "archaea", "bacteria", "plasmid", "viral", "human", "fungi", "plant", "protozoa", "nr", "nt", "env_nr", "env_nt", "UniVec", "UniVec_Core". You can also download a single library by name:
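# nr is a protein library; per the Kraken 2 manual, protein databases
# also need the --protein option on every kraken2-build step
kraken2-build --download-library nr --threads 24 --db $DBNAME
# Individual sequences can also be added to the database, using this header format:
# >sequence16|kraken:taxid|32630 Adapter sequence
# CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
kraken2-build --add-to-library chr1.fa --db $DBNAME
# After adding sequences, build the database to create the index
kraken2-build --build --db $DBNAME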
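The header format in the comments above can be produced with a small filter. This is a sketch under the assumption that every sequence in the file belongs to the same taxon; tag_fasta_taxid is a hypothetical helper, not part of kraken2.

```shell
# Hypothetical helper: read FASTA on stdin and append "|kraken:taxid|<TAXID>"
# to the sequence ID (the first token) of every header line.
tag_fasta_taxid() {
    local taxid="$1"
    awk -v t="$taxid" '
        /^>/ {
            id = $1                            # e.g. ">sequence16"
            rest = substr($0, length(id) + 1)  # description, if any
            print id "|kraken:taxid|" t rest
            next
        }
        { print }                              # sequence lines pass through
    '
}

# Example with the adapter sequence from the listing above.
printf '>sequence16 Adapter sequence\nCAAGCAGAAGACGG\n' | tag_fasta_taxid 32630
```

The tagged file can then be passed to `kraken2-build --add-to-library`. The Kraken 2 manual also lists `kraken2-build --download-taxonomy --db $DBNAME` as a step for custom databases, so the taxonomy needs to be present before `--build` is run.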
2) Sequence classification
Classify sequences with the database built in the previous step:
kraken2 --db $DBNAME seqs.fa
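The standard kraken2 output is tab-separated, one line per sequence, with "C" or "U" in the first column for classified or unclassified. As a rough sketch of how that output can be tallied, the hypothetical helper below counts both cases. kraken2 already prints a similar summary on stderr, so this is mainly an illustration of the format.

```shell
# Hypothetical helper: summarize kraken2 standard output (files or stdin).
# Column 1 is "C" (classified) or "U" (unclassified).
summarize_kraken_output() {
    awk -F'\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END {
            n = c + u
            pct = (n > 0) ? 100 * c / n : 0
            printf "classified=%d unclassified=%d percent=%.2f\n", c, u, pct
        }
    ' "$@"
}

# Example on three made-up output lines (read ID, taxid, length, LCA mapping).
printf 'C\tread1\t562\t100\t562:66\nU\tread2\t0\t100\t0:66\nC\tread3\t9606\t100\t9606:66\n' \
    | summarize_kraken_output
```

On real data the helper would be fed from a file written with the `--output` option, or from a pipe such as `kraken2 --db $DBNAME seqs.fa | summarize_kraken_output`.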
3) Environment variables
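KRAKEN2_NUM_THREADS: sets the default number of threads
KRAKEN2_DB_PATH: a colon-separated list of directories that are searched for databases, for example:
export KRAKEN2_DB_PATH="/home/user/my_kraken2_dbs:/data/kraken2_dbs:"
A database in any of those directories can then be called by name: `kraken2 --db mainDB sequences.fa`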
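To make the lookup concrete, the sketch below imitates the search in plain shell. It is an illustration of the behaviour described in the Kraken 2 manual (names containing a slash are used as given, other names are searched along KRAKEN2_DB_PATH, which defaults to the current directory), not kraken2's own code; resolve_kraken2_db is a hypothetical name.

```shell
# Hypothetical emulation of the database lookup: print the first directory
# that matches NAME, or return non-zero if none is found.
resolve_kraken2_db() {
    local name="$1" dir
    # A name containing a slash is treated as a path and used as given.
    case "$name" in
        */*)
            if [ -d "$name" ]; then printf '%s\n' "$name"; return 0; fi
            return 1
            ;;
    esac
    # Otherwise walk the colon-separated search path ("." when unset).
    local IFS=':'
    for dir in ${KRAKEN2_DB_PATH:-.}; do
        [ -n "$dir" ] || continue          # skip empty entries
        if [ -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Example: show which directory "mainDB" would resolve to, if any.
resolve_kraken2_db mainDB || echo "mainDB not found on KRAKEN2_DB_PATH"
```

The manual also describes a KRAKEN2_DEFAULT_DB variable, which names the database to use when `--db` is omitted.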
References:
1. https://github.com/DerrickWood/kraken2/wiki/Manual
2. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1891-0
3. https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads
4. https://medium.com/analytics-vidhya/publish-a-python-package-to-conda-b352eb0bcb2e
5. https://hackmd.io/@astrobiomike/kraken2-bracken-standard-build
- Author: shisekong
- Link: https://blog.361way.com/arm-kraken2-conda/6818.html
- License: This work is under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.