initial commit
commit 5e601d0401
1
.gitattributes
vendored
Normal file
@@ -0,0 +1 @@
*.pbxproj binary merge=union
48
.gitignore
vendored
Normal file
@@ -0,0 +1,48 @@
*~
*.dSYM
.DS_Store
tags
*-debug
*-s
*-l
hisat2.xcodeproj/project.xcworkspace
hisat2.xcodeproj/xcuserdata
hisat2.xcodeproj/xcshareddata
*.patch

build_automaton
build_index
clean_alignment
determinize
gcsa_alignment
gcsa_test
hisat2-repeat

hisat2_test/*.bt2
hisat2_test/*.ht2
hisat2_test/*.sam
hisat2_test/paper_example.malignment.automaton
hisat2_test/paper_example.malignment.backbone
hisat2_test/paper_example.malignment.gcsa
hisat2_test/kim_example*.malignment.automaton
hisat2_test/kim_example*.malignment.backbone
hisat2_test/kim_example*.malignment.gcsa
hisat2_test/genome*
hisat2_test/2*
hisat2_test/snp142*
hisat2_test/testset*

.idea
.vscode

.ht2lib-obj*
*.a
*.so
docs/_site
docs/*.lock
docs/.*-cache
*.tar.gz
*.ipynb
*.pyc

cmake*
29
AUTHORS
Normal file
@@ -0,0 +1,29 @@
Ben Langmead <langmea@cs.jhu.edu> wrote Bowtie 2, which is based partially on
Bowtie. Bowtie was written by Ben Langmead and Cole Trapnell.

Bowtie & Bowtie 2: http://bowtie-bio.sf.net

A DLL from the pthreads for Win32 library is distributed with the Win32 version
of Bowtie 2. The pthreads for Win32 library and the GnuWin32 package have many
contributors (see their respective web sites).

pthreads for Win32: http://sourceware.org/pthreads-win32
GnuWin32: http://gnuwin32.sf.net

The ForkManager.pm perl module is used in Bowtie 2's random testing framework,
and is included as scripts/sim/contrib/ForkManager.pm. ForkManager.pm is
written by dLux (Szabo, Balazs), with contributions by others. See the perldoc
in ForkManager.pm for the complete list.

The file ls.h includes an implementation of the Larsson-Sadakane suffix sorting
algorithm. The implementation is by N. Jesper Larsson and was adapted somewhat
for use in Bowtie 2.

TinyThreads is a portable thread implementation with a fairly compatible subset
of C++11 thread management classes written by Marcus Geelnard. For more info
check http://tinythreadpp.bitsnbites.eu/

Various users have kindly supplied patches, bug reports and feature requests
over the years. Many, many thanks go to them.

September 2011
BIN
HISAT2-genotype.png
Normal file
Binary file not shown.
Size: 724 KiB
1
HISAT2_VERSION
Normal file
@@ -0,0 +1 @@
2.2.1-3n-0.0.3
674
LICENSE
Normal file
@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for
software and other kinds of works.

The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.

Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and
modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based
on the Program.

To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.

A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

The Corresponding Source for a work in source code form is that
same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.

b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".

c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.

d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.

A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.

b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.

c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.

d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.

e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.

A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or

b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or

c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or

d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or

e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or

f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.

All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".

A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

12. No Surrender of Others' Freedom.

If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

13. Use with the GNU Affero General Public License.

Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.

14. Revised Versions of this License.

The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.

If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.

Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.

15. Disclaimer of Warranty.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

17. Interpretation of Sections 15 and 16.

If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".

You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.

The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
2437
MANUAL.markdown
Normal file
File diff suppressed because it is too large
590
Makefile
Normal file
@ -0,0 +1,590 @@
#
# Copyright 2015, Daehwan Kim <infphilo@gmail.com>
#
# This file is part of HISAT2.
#
# HISAT 2 is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# HISAT 2 is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with HISAT. If not, see <http://www.gnu.org/licenses/>.
#
#
# Makefile for hisat2-align, hisat2-build, hisat2-inspect
#

INC =
GCC_PREFIX = $(shell dirname `which gcc`)
GCC_SUFFIX =
CC = $(GCC_PREFIX)/gcc$(GCC_SUFFIX)
CPP = $(GCC_PREFIX)/g++$(GCC_SUFFIX)
CXX = $(CPP)
HEADERS = $(wildcard *.h)
BOWTIE_MM = 1
BOWTIE_SHARED_MEM = 0

# Detect Cygwin or MinGW
WINDOWS = 0
CYGWIN = 0
MINGW = 0
ifneq (,$(findstring CYGWIN,$(shell uname)))
WINDOWS = 1
CYGWIN = 1
# POSIX memory-mapped files not currently supported on Windows
BOWTIE_MM = 0
BOWTIE_SHARED_MEM = 0
else
ifneq (,$(findstring MINGW,$(shell uname)))
WINDOWS = 1
MINGW = 1
# POSIX memory-mapped files not currently supported on Windows
BOWTIE_MM = 0
BOWTIE_SHARED_MEM = 0
endif
endif

MACOS = 0
ifneq (,$(findstring Darwin,$(shell uname)))
MACOS = 1
endif

EXTRA_FLAGS += -DPOPCNT_CAPABILITY -std=c++11
INC += -I. -I third_party

MM_DEF =

ifeq (1,$(BOWTIE_MM))
MM_DEF = -DBOWTIE_MM
endif

SHMEM_DEF =

ifeq (1,$(BOWTIE_SHARED_MEM))
SHMEM_DEF = -DBOWTIE_SHARED_MEM
endif

PTHREAD_PKG =
PTHREAD_LIB =

ifeq (1,$(MINGW))
PTHREAD_LIB =
else
PTHREAD_LIB = -lpthread
endif

SEARCH_LIBS =
BUILD_LIBS =
INSPECT_LIBS =

ifeq (1,$(MINGW))
BUILD_LIBS =
INSPECT_LIBS =
endif

USE_SRA = 0
SRA_DEF =
SRA_LIB =
SEARCH_INC =
ifeq (1,$(USE_SRA))
SRA_DEF = -DUSE_SRA
SRA_LIB = -lncbi-ngs-c++-static -lngs-c++-static -lncbi-vdb-static -ldl
SEARCH_INC += -I$(NCBI_NGS_DIR)/include -I$(NCBI_VDB_DIR)/include
SEARCH_LIBS += -L$(NCBI_NGS_DIR)/lib64 -L$(NCBI_VDB_DIR)/lib64
endif

LIBS = $(PTHREAD_LIB)

HT2LIB_DIR = hisat2lib

HT2LIB_CPPS = $(HT2LIB_DIR)/ht2_init.cpp \
	$(HT2LIB_DIR)/ht2_repeat.cpp \
	$(HT2LIB_DIR)/ht2_index.cpp

SHARED_CPPS = ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp \
	edit.cpp gfm.cpp \
	reference.cpp ds.cpp multikey_qsort.cpp limit.cpp \
	random_source.cpp tinythread.cpp utility_3n.cpp
SEARCH_CPPS = qual.cpp pat.cpp \
	read_qseq.cpp aligner_seed_policy.cpp \
	aligner_seed.cpp \
	aligner_seed2.cpp \
	aligner_sw.cpp \
	aligner_sw_driver.cpp aligner_cache.cpp \
	aligner_result.cpp ref_coord.cpp mask.cpp \
	pe.cpp aln_sink.cpp dp_framer.cpp \
	scoring.cpp presets.cpp unique.cpp \
	simple_func.cpp \
	random_util.cpp \
	aligner_bt.cpp sse_util.cpp \
	aligner_swsse.cpp outq.cpp \
	aligner_swsse_loc_i16.cpp \
	aligner_swsse_ee_i16.cpp \
	aligner_swsse_loc_u8.cpp \
	aligner_swsse_ee_u8.cpp \
	aligner_driver.cpp \
	splice_site.cpp \
	alignment_3n.cpp \
	position_3n.cpp \
	$(HT2LIB_CPPS)

BUILD_CPPS = diff_sample.cpp

REPEAT_CPPS = \
	mask.cpp \
	qual.cpp \
	aligner_bt.cpp \
	scoring.cpp \
	simple_func.cpp \
	dp_framer.cpp \
	aligner_result.cpp \
	aligner_sw_driver.cpp \
	aligner_sw.cpp \
	aligner_swsse_ee_i16.cpp \
	aligner_swsse_ee_u8.cpp \
	aligner_swsse_loc_i16.cpp \
	aligner_swsse_loc_u8.cpp \
	aligner_swsse.cpp \
	bit_packed_array.cpp \
	repeat_builder.cpp

THREE_N_HEADERS = \
	position_3n_table.h \
	alignment_3n_table.h \
	utility_3n_table.h

HISAT2_CPPS_MAIN = $(SEARCH_CPPS) hisat2_main.cpp
HISAT2_BUILD_CPPS_MAIN = $(BUILD_CPPS) hisat2_build_main.cpp
HISAT2_REPEAT_CPPS_MAIN = $(REPEAT_CPPS) $(BUILD_CPPS) hisat2_repeat_main.cpp

SEARCH_FRAGMENTS = $(wildcard search_*_phase*.c)
VERSION := $(shell cat HISAT2_VERSION)

# Convert BITS=?? to a -m flag
BITS=32
ifeq (x86_64,$(shell uname -m))
BITS=64
endif
# msys will always be 32 bit so look at the cpu arch instead.
ifneq (,$(findstring AMD64,$(PROCESSOR_ARCHITEW6432)))
ifeq (1,$(MINGW))
BITS=64
endif
endif
BITS_FLAG =

ifeq (32,$(BITS))
BITS_FLAG = -m32
endif

ifeq (64,$(BITS))
BITS_FLAG = -m64
endif
SSE_FLAG=-msse2

DEBUG_FLAGS = -O0 -g3 $(BITS_FLAG) $(SSE_FLAG)
DEBUG_DEFS = -DCOMPILER_OPTIONS="\"$(DEBUG_FLAGS) $(EXTRA_FLAGS)\""
RELEASE_FLAGS = -O3 $(BITS_FLAG) $(SSE_FLAG) -funroll-loops -g3
RELEASE_DEFS = -DCOMPILER_OPTIONS="\"$(RELEASE_FLAGS) $(EXTRA_FLAGS)\""
NOASSERT_FLAGS = -DNDEBUG
FILE_FLAGS = -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
HT2LIB_FLAGS = -DHISAT2_BUILD_LIB
ifeq (1,$(USE_SRA))
ifeq (1, $(MACOS))
SRA_LIB += -stdlib=libc++
DEBUG_FLAGS += -mmacosx-version-min=10.10
RELEASE_FLAGS += -mmacosx-version-min=10.10
endif
endif


HISAT2_BIN_LIST = hisat2-build-s \
	hisat2-build-l \
	hisat2-align-s \
	hisat2-align-l \
	hisat2-inspect-s \
	hisat2-inspect-l \
	hisat2-repeat \
	hisat-3n-table

HISAT2_BIN_LIST_AUX = hisat2-build-s-debug \
	hisat2-build-l-debug \
	hisat2-align-s-debug \
	hisat2-align-l-debug \
	hisat2-inspect-s-debug \
	hisat2-inspect-l-debug \
	hisat2-repeat-debug

HT2LIB_SRCS = $(SHARED_CPPS) \
	$(HT2LIB_CPPS)

HT2LIB_OBJS = $(HT2LIB_SRCS:.cpp=.o)

HT2LIB_DEBUG_OBJS = $(addprefix .ht2lib-obj-debug/,$(HT2LIB_OBJS))
HT2LIB_RELEASE_OBJS = $(addprefix .ht2lib-obj-release/,$(HT2LIB_OBJS))
HT2LIB_SHARED_DEBUG_OBJS = $(addprefix .ht2lib-obj-debug-shared/,$(HT2LIB_OBJS))
HT2LIB_SHARED_RELEASE_OBJS = $(addprefix .ht2lib-obj-release-shared/,$(HT2LIB_OBJS))

HT2LIB_PKG_SRC = \
	$(HT2LIB_DIR)/ht2_init.cpp \
	$(HT2LIB_DIR)/ht2_repeat.cpp \
	$(HT2LIB_DIR)/ht2_index.cpp \
	$(HT2LIB_DIR)/ht2.h \
	$(HT2LIB_DIR)/ht2_handle.h \
	$(HT2LIB_DIR)/java_jni/Makefile \
	$(HT2LIB_DIR)/java_jni/ht2module.c \
	$(HT2LIB_DIR)/java_jni/HT2Module.java \
	$(HT2LIB_DIR)/java_jni/HT2ModuleExample.java \
	$(HT2LIB_DIR)/pymodule/Makefile \
	$(HT2LIB_DIR)/pymodule/ht2module.c \
	$(HT2LIB_DIR)/pymodule/setup.py \
	$(HT2LIB_DIR)/pymodule/ht2example.py


GENERAL_LIST = $(wildcard scripts/*.sh) \
	$(wildcard scripts/*.pl) \
	$(wildcard *.py) \
	$(wildcard example/index/*.ht2) \
	$(wildcard example/reads/*.fa) \
	example/reference/22_20-21M.fa \
	example/reference/22_20-21M.snp \
	$(PTHREAD_PKG) \
	hisat2 \
	hisat2-build \
	hisat2-inspect \
	AUTHORS \
	LICENSE \
	NEWS \
	MANUAL \
	MANUAL.markdown \
	TUTORIAL \
	HISAT2_VERSION

ifeq (1,$(WINDOWS))
HISAT2_BIN_LIST := $(HISAT2_BIN_LIST) hisat2.bat hisat2-build.bat hisat2-inspect.bat
endif

# This is helpful on Windows under MinGW/MSYS, where Make might go for
# the Windows FIND tool instead.
FIND=$(shell which find)

SRC_PKG_LIST = $(wildcard *.h) \
	$(wildcard *.hh) \
	$(wildcard *.c) \
	$(wildcard *.cpp) \
	$(HT2LIB_PKG_SRC) \
	Makefile \
	CMakeLists.txt \
	$(GENERAL_LIST)

BIN_PKG_LIST = $(GENERAL_LIST)

.PHONY: all allall both both-debug

all: $(HISAT2_BIN_LIST)

allall: $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)

both: hisat2-align-s hisat2-align-l hisat2-build-s hisat2-build-l

both-debug: hisat2-align-s-debug hisat2-align-l-debug hisat2-build-s-debug hisat2-build-l-debug

repeat: hisat2-repeat

repeat-debug: hisat2-repeat-debug

DEFS :=-fno-strict-aliasing \
	-DHISAT2_VERSION="\"`cat HISAT2_VERSION`\"" \
	-DBUILD_HOST="\"`hostname`\"" \
	-DBUILD_TIME="\"`date`\"" \
	-DCOMPILER_VERSION="\"`$(CXX) -v 2>&1 | tail -1`\"" \
	$(FILE_FLAGS) \
	$(PREF_DEF) \
	$(MM_DEF) \
	$(SHMEM_DEF)

#
# hisat-bp targets
#

hisat-bp-bin: hisat_bp.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT_CPPS_MAIN) \
	$(LIBS) $(SEARCH_LIBS)

hisat-bp-bin-debug: hisat_bp.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT_CPPS_MAIN) \
	$(LIBS) $(SEARCH_LIBS)

#
# hisat2-repeat targets
#

hisat2-repeat: hisat2_repeat.cpp $(REPEAT_CPPS) $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_REPEAT_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

hisat2-repeat-debug: hisat2_repeat.cpp $(REPEAT_CPPS) $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_REPEAT_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)


#
# hisat2-build targets
#

hisat2-build-s: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall -DMASSIVE_DATA_RLCSA \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

hisat2-build-l: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

hisat2-build-s-debug: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -Wall -DMASSIVE_DATA_RLCSA \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

hisat2-build-l-debug: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

#
# hisat2 targets
#

hisat2-align-s: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

hisat2-align-l: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

hisat2-align-s-debug: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DBOWTIE2 -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

hisat2-align-l-debug: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

#
# hisat2-inspect targets
#

hisat2-inspect-s: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
	$(CXX) $(RELEASE_FLAGS) \
	$(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DHISAT2_INSPECT_MAIN -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)

hisat2-inspect-l: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
	$(CXX) $(RELEASE_FLAGS) \
	$(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DHISAT2_INSPECT_MAIN -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)

hisat2-inspect-s-debug: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DHISAT2_INSPECT_MAIN -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)

hisat2-inspect-l-debug: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DHISAT2_INSPECT_MAIN -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)

#
# hisat-3n-table targets
#

hisat-3n-table: hisat_3n_table.cpp $(THREE_N_HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(NOASSERT_FLAGS) $(DEFS) -pthread -o $@ $<

#
# HT2LIB targets
#

ht2lib: libhisat2lib-debug.a libhisat2lib.a libhisat2lib-debug.so libhisat2lib.so

libhisat2lib-debug.a: $(HT2LIB_DEBUG_OBJS)
	ar rc $@ $(HT2LIB_DEBUG_OBJS)

libhisat2lib.a: $(HT2LIB_RELEASE_OBJS)
	ar rc $@ $(HT2LIB_RELEASE_OBJS)

libhisat2lib-debug.so: $(HT2LIB_SHARED_DEBUG_OBJS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
	-shared -o $@ $(HT2LIB_SHARED_DEBUG_OBJS) $(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

libhisat2lib.so: $(HT2LIB_SHARED_RELEASE_OBJS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall $(INC) $(SEARCH_INC)\
	-shared -o $@ $(HT2LIB_SHARED_RELEASE_OBJS) $(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

.ht2lib-obj-debug/%.o: %.cpp
	@mkdir -p $(dir $@)/$(dir $<)
	$(CXX) -fPIC $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
	-c -o $@ $<

.ht2lib-obj-release/%.o: %.cpp
	@mkdir -p $(dir $@)/$(dir $<)
	$(CXX) -fPIC $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall $(INC) $(SEARCH_INC) \
	-c -o $@ $<

.ht2lib-obj-debug-shared/%.o: %.cpp
	@mkdir -p $(dir $@)/$(dir $<)
	$(CXX) -fPIC $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
	-c -o $@ $<

.ht2lib-obj-release-shared/%.o: %.cpp
	@mkdir -p $(dir $@)/$(dir $<)
	$(CXX) -fPIC $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall $(INC) $(SEARCH_INC) \
	-c -o $@ $<

#
|
||||||
|
# repeatexp
|
||||||
|
#
|
||||||
|
repeatexp:
|
||||||
|
g++ -o repeatexp repeatexp.cpp -I hisat2lib libhisat2lib.a
|
||||||
|
|
||||||
|
hisat2: ;
|
||||||
|
|
||||||
|
hisat2.bat:
|
||||||
|
echo "@echo off" > hisat2.bat
|
||||||
|
echo "perl %~dp0/hisat2 %*" >> hisat2.bat
|
||||||
|
|
||||||
|
hisat2-build.bat:
|
||||||
|
echo "@echo off" > hisat2-build.bat
|
||||||
|
echo "python %~dp0/hisat2-build %*" >> hisat2-build.bat
|
||||||
|
|
||||||
|
hisat2-inspect.bat:
|
||||||
|
echo "@echo off" > hisat2-inspect.bat
|
||||||
|
echo "python %~dp0/hisat2-inspect %*" >> hisat2-inspect.bat
|
||||||
|
|
||||||
|
|
||||||
|
.PHONY: hisat2-src
|
||||||
|
hisat2-src: $(SRC_PKG_LIST)
|
||||||
|
chmod a+x scripts/*.sh scripts/*.pl
|
||||||
|
mkdir .src.tmp
|
||||||
|
mkdir .src.tmp/hisat2-$(VERSION)
|
||||||
|
zip tmp.zip $(SRC_PKG_LIST)
|
||||||
|
mv tmp.zip .src.tmp/hisat2-$(VERSION)
|
||||||
|
cd .src.tmp/hisat2-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
|
||||||
|
cd .src.tmp ; zip -r hisat2-$(VERSION)-source.zip hisat2-$(VERSION)
|
||||||
|
cp .src.tmp/hisat2-$(VERSION)-source.zip .
|
||||||
|
rm -rf .src.tmp
|
||||||
|
|
||||||
|
.PHONY: hisat2-bin
|
||||||
|
hisat2-bin: $(BIN_PKG_LIST) $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)
|
||||||
|
chmod a+x scripts/*.sh scripts/*.pl
|
||||||
|
rm -rf .bin.tmp
|
||||||
|
mkdir .bin.tmp
|
||||||
|
mkdir .bin.tmp/hisat2-$(VERSION)
|
||||||
|
if [ -f hisat2.exe ] ; then \
|
||||||
|
zip tmp.zip $(BIN_PKG_LIST) $(addsuffix .exe,$(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)) ; \
|
||||||
|
else \
|
||||||
|
zip tmp.zip $(BIN_PKG_LIST) $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX) ; \
|
||||||
|
fi
|
||||||
|
mv tmp.zip .bin.tmp/hisat2-$(VERSION)
|
||||||
|
cd .bin.tmp/hisat2-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
|
||||||
|
cd .bin.tmp ; zip -r hisat2-$(VERSION)-$(BITS).zip hisat2-$(VERSION)
|
||||||
|
cp .bin.tmp/hisat2-$(VERSION)-$(BITS).zip .
|
||||||
|
rm -rf .bin.tmp
|
||||||
|
|
||||||
|
.PHONY: doc
|
||||||
|
doc: doc/manual.inc.html MANUAL
|
||||||
|
|
||||||
|
doc/manual.inc.html: MANUAL.markdown
|
||||||
|
pandoc -T "HISAT2 Manual" -o $@ \
|
||||||
|
--from markdown --to HTML --toc $^
|
||||||
|
perl -i -ne \
|
||||||
|
'$$w=0 if m|^</body>|;print if $$w;$$w=1 if m|^<body>|;' $@
|
||||||
|
|
||||||
|
MANUAL: MANUAL.markdown
|
||||||
|
perl doc/strip_markdown.pl < $^ > $@
|
||||||
|
|
||||||
|
.PHONY: clean
|
||||||
|
clean:
|
||||||
|
rm -f $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX) \
|
||||||
|
$(addsuffix .exe,$(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)) \
|
||||||
|
hisat2-src.zip hisat2-bin.zip
|
||||||
|
rm -f core.* .tmp.head
|
||||||
|
rm -rf *.dSYM
|
||||||
|
rm -rf .ht2lib-obj*
|
||||||
|
rm -f libhisat2lib*.a libhisat2lib*.so
|
||||||
|
|
||||||
|
|
||||||
|
.PHONY: push-doc
|
||||||
|
push-doc: doc/manual.inc.html
|
||||||
|
scp doc/*.*html doc/indexes.txt salz-dmz:/ccb/salz7-data/www/ccb.jhu.edu/html/software/hisat2/
|
16
NEWS
Normal file
@ -0,0 +1,16 @@
|
|||||||
|
HISAT 2 NEWS
|
||||||
|
=============
|
||||||
|
|
||||||
|
HISAT 2 is now available for download from the project website,
|
||||||
|
http://bowtie-bio.sf.net/bowtie2. 2.0.0-beta is the first version released to
|
||||||
|
the public and 2.0.7 is the latest version. HISAT 2 is licensed under
|
||||||
|
the GPLv3 license. See `LICENSE' file for details.
|
||||||
|
|
||||||
|
|
||||||
|
Version Release History
|
||||||
|
=======================
|
||||||
|
|
||||||
|
Version 2.0.0-beta - August XX, 2015
|
||||||
|
* Improved multithreading support so that Bowtie 2 now uses native Windows
|
||||||
|
threads when compiled on Windows and uses a faster mutex. Threading
|
||||||
|
performance should improve on all platforms.
|
247
README.md
Normal file
@ -0,0 +1,247 @@
|
|||||||
|
HISAT-3N
|
||||||
|
============
|
||||||
|
|
||||||
|
Overview
|
||||||
|
-----------------
|
||||||
|
HISAT-3N (hierarchical indexing for spliced alignment of transcripts - 3 nucleotides)
|
||||||
|
is an ultrafast and memory-efficient sequence aligner designed for nucleotide conversion
|
||||||
|
sequencing technologies. A HISAT-3N index contains two HISAT2 indexes and has a small memory footprint:
|
||||||
|
for the human genome, it requires 9 GB for the standard 3N-index and 10.5 GB for the repeat 3N-index.
|
||||||
|
The repeat 3N-index can be used to align one read to thousands of positions 3 times faster than the standard 3N-index.
|
||||||
|
HISAT-3N is developed based on [HISAT2],
|
||||||
|
which is particularly optimized for RNA sequencing technology. HISAT-3N supports both strand-specific and non-strand-specific reads.
|
||||||
|
HISAT-3N can be used for any base-converted sequencing reads, including [BS-seq], [SLAM-seq], [scBS-seq], [scSLAM-seq], and [TAPS].
|
||||||
|
See the [HISAT-3N] website for more information.
|
||||||
|
|
||||||
|
[HISAT2]:https://github.com/DaehwanKimLab/hisat2
|
||||||
|
[BS-seq]: https://en.wikipedia.org/wiki/Bisulfite_sequencing
|
||||||
|
[SLAM-seq]: https://www.nature.com/articles/nmeth.4435
|
||||||
|
[scBS-seq]: https://www.nature.com/articles/nmeth.3035
|
||||||
|
[scSLAM-seq]: https://www.nature.com/articles/s41586-019-1369-y
|
||||||
|
[TAPS]: https://www.nature.com/articles/s41587-019-0041-2
|
||||||
|
[HISAT-3N]:https://daehwankimlab.github.io/hisat2/hisat-3n
|
||||||
|
|
||||||
|
|
||||||
|
Getting started
|
||||||
|
============
|
||||||
|
HISAT-3N requires a 64-bit computer running either Linux or Mac OS X and at least 16 GB of RAM.
|
||||||
|
|
||||||
|
A few notes:
|
||||||
|
|
||||||
|
1. Building the standard 3N index requires 16GB of RAM or less.
|
||||||
|
2. Building the repeat 3N index requires 256GB of RAM.
|
||||||
|
3. The alignment process using either the standard or repeat index requires less than 16GB of RAM.
|
||||||
|
4. [SAMtools] is required to sort SAM files in order to generate a HISAT-3N table.
|
||||||
|
|
||||||
|
Install
|
||||||
|
------------
|
||||||
|
|
||||||
|
git clone https://github.com/DaehwanKimLab/hisat2.git hisat-3n
|
||||||
|
cd hisat-3n
|
||||||
|
git checkout -b hisat-3n origin/hisat-3n
|
||||||
|
make
|
||||||
|
|
||||||
|
Build a HISAT-3N index with `hisat-3n-build`
|
||||||
|
-----------
|
||||||
|
`hisat-3n-build` builds a 3N-index, which contains two HISAT2 indexes, from a set of DNA sequences. For the standard 3N-index,
|
||||||
|
each index contains 16 files with suffix `.3n.*.*.ht2`.
|
||||||
|
For the repeat 3N-index, there are 16 more files in addition to the standard 3N-index, and they have the suffix
|
||||||
|
`.3n.*.rep.*.ht2`.
|
||||||
|
These files constitute the hisat-3n index, and no other file is needed to align reads to the reference.
|
||||||
|
|
||||||
|
* `--base-change <chr1,chr2>` argument is required for `hisat-3n-build` and `hisat-3n`.
|
||||||
|
Specify which base is converted to another base during the sequencing process. Please enter
|
||||||
|
2 letters separated by ',' for this argument. The first letter (chr1) should be the original base that gets converted, and the second letter (chr2) should be
|
||||||
|
the base it is converted to. For example, during SLAM-seq, some 'T's are converted to 'C's, so
|
||||||
|
please enter `--base-change T,C`. During bisulfite-seq, some 'C's are converted to 'T's, so please enter `--base-change C,T`.
|
||||||
|
* Different conversion types may build the same hisat-3n index. Please check the table below for more detail.
|
||||||
|
Once you build the hisat-3n index with C to T conversion (for example, for BS-seq),
|
||||||
|
you can align T to C conversion reads (for example, SLAM-seq reads) with the same index.
|
||||||
|
|
||||||
|
|
||||||
|
| Conversion Types | HISAT-3N index suffix |
|
||||||
|
|:----------------------------------:|:-----------------------------:|
|
||||||
|
|C -> T<br>T -> C<br>A -> G<br>G -> A|.3n.CT.\*.ht2 <br>.3n.GA.\*.ht2|
|
||||||
|
|A -> C<br>C -> A<br>G -> T<br>T -> G|.3n.AC.\*.ht2 <br>.3n.TG.\*.ht2|
|
||||||
|
|A -> T<br>T -> A |.3n.AT.\*.ht2 <br>.3n.TA.\*.ht2|
|
||||||
|
|C -> G<br>G -> C |.3n.CG.\*.ht2 <br>.3n.GC.\*.ht2|
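The grouping in the table above can be expressed as a small lookup. The following is only an illustrative sketch in Python, not part of HISAT-3N: the names `GROUPS`, `index_tags` and `shares_index` are hypothetical, and the data is copied from the table rather than from the `hisat-3n-build` source.

```python
# Conversion types grouped as in the table above. Each group maps to the
# pair of tags that appear in the index suffix (.3n.<TAG>.*.ht2).
# Hypothetical helper for illustration; not part of the HISAT-3N tools.
GROUPS = [
    ({"CT", "TC", "AG", "GA"}, ("CT", "GA")),
    ({"AC", "CA", "GT", "TG"}, ("AC", "TG")),
    ({"AT", "TA"}, ("AT", "TA")),
    ({"CG", "GC"}, ("CG", "GC")),
]


def index_tags(base_change):
    """Return the two index tags for a --base-change value such as 'C,T'."""
    parts = base_change.upper().split(",")
    valid = (
        len(parts) == 2
        and all(len(p) == 1 and p in "ACGT" for p in parts)
        and parts[0] != parts[1]
    )
    if not valid:
        raise ValueError("expected two different bases, e.g. 'C,T'")
    key = parts[0] + parts[1]
    for conversions, tags in GROUPS:
        if key in conversions:
            return tags
    raise ValueError("unknown conversion: " + base_change)


def shares_index(first, second):
    """True if two conversion types fall in the same row of the table."""
    return index_tags(first) == index_tags(second)


if __name__ == "__main__":
    print(index_tags("C,T"))
    print(shares_index("C,T", "T,C"))
```

A check like `shares_index("C,T", "T,C")` mirrors the statement above that an index built for BS-seq can be reused for SLAM-seq reads.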
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
# Build the standard HISAT-3N index (with C to T conversion):
|
||||||
|
hisat-3n-build --base-change C,T genome.fa genome
|
||||||
|
|
||||||
|
# Build the repeat HISAT-3N index (with T to C conversion; requires 256 GB of memory for the human genome index):
|
||||||
|
hisat-3n-build --base-change T,C --repeat-index genome.fa genome
|
||||||
|
|
||||||
|
Optionally, you can build a graph index and add SNP or splice site information to the index to increase alignment accuracy.
|
||||||
|
The graph index building may require more memory than the linear index building.
|
||||||
|
For more detail, please check the [HISAT2 manual].
|
||||||
|
|
||||||
|
[HISAT2 manual]:https://daehwankimlab.github.io/hisat2/manual/
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
# Build the standard HISAT-3N integrated index with SNP information
|
||||||
|
hisat-3n-build --base-change C,T --snp genome.snp genome.fa genome
|
||||||
|
|
||||||
|
# Build the standard HISAT-3N integrated index with splice site information
|
||||||
|
hisat-3n-build --base-change C,T --ss genome.ss --exon genome.exon genome.fa genome
|
||||||
|
|
||||||
|
# Build the repeat HISAT-3N integrated index with SNP information
|
||||||
|
hisat-3n-build --base-change C,T --repeat-index --snp genome.snp genome.fa genome
|
||||||
|
|
||||||
|
# Build the repeat HISAT-3N integrated index with splice site information
|
||||||
|
hisat-3n-build --base-change C,T --repeat-index --ss genome.ss --exon genome.exon genome.fa genome
|
||||||
|
|
||||||
|
|
||||||
|
Alignment with `hisat-3n`
|
||||||
|
------------
|
||||||
|
After building the HISAT-3N index, you are ready to use `hisat-3n` for alignment.
|
||||||
|
HISAT-3N has the same set of parameters as in HISAT2 with some additional arguments. Please refer to the [HISAT2 manual] for more details.
|
||||||
|
|
||||||
|
For the human reference genome, HISAT-3N requires about 9GB for alignment with the standard 3N-index and 10.5GB for the repeat 3N-index.
|
||||||
|
|
||||||
|
* `--base-change <nt1,nt2>`
|
||||||
|
Specify the nucleotide conversion type (e.g., C to T in bisulfite-sequencing reads). The parameter option is two characters separated by ','. Type the original nucleotide for the first character (nt1) and type the converted nucleotide as the second character (nt2). For example, if performing [SLAM-seq] where some 'T's are converted to 'C's, input `--base-change T,C`.
|
||||||
|
As another example, if performing bisulfite-seq, where some 'C's are converted to 'T's, please input `--base-change C,T`.
|
||||||
|
If you want to align non-converted reads to the regular HISAT2 index, then omit this option.
|
||||||
|
|
||||||
|
* `--index/-x <hisat-3n-idx>`
|
||||||
|
Specify the index file basename for HISAT-3N. The basename is the name of the index files up to but not including the suffix `.3n.*.*.ht2` / etc.
|
||||||
|
For example, if you built your index with the basename 'genome' using `hisat-3n-build`, please input `--index genome`.
|
||||||
|
|
||||||
|
* `--directional-mapping`
|
||||||
|
Perform directional mapping. Please use this option only if your sequencing reads are generated from a strand-specific library.
|
||||||
|
The directional mapping mode is about 2x faster than the standard (non-directional) mapping mode.
|
||||||
|
|
||||||
|
* `--repeat-limit <int>`
|
||||||
|
Set the number of alignments to be checked for each repeat alignment. You may increase this number to direct hisat-3n
|
||||||
|
to output more alignments if a read has multiple mapping locations. We suggest that you limit the repeat number for paired-end read alignment to no more
|
||||||
|
than 1,000,000. Default: 1000.
|
||||||
|
|
||||||
|
* `--unique-only`
|
||||||
|
Only output uniquely aligned reads.
|
||||||
|
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
* Single-end [SLAM-seq] read (T to C conversion) alignment with standard 3N-index:
|
||||||
|
`hisat-3n --index genome -f -U read.fa -S output.sam --base-change T,C`
|
||||||
|
|
||||||
|
* Paired-end strand-specific bisulfite-seq read (C to T conversion) alignment with repeat 3N-index:
|
||||||
|
`hisat-3n --index genome -f -1 read_1.fa -2 read_2.fa -S output.sam --base-change C,T --directional-mapping`
|
||||||
|
|
||||||
|
* Single-end TAPS read (C to T conversion) alignment with repeat 3N-index, outputting only uniquely aligned results:
|
||||||
|
`hisat-3n --index genome -q -U read.fq -S output.sam --base-change C,T --unique-only`
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#### Extra SAM tags generated by HISAT-3N:
|
||||||
|
|
||||||
|
* `Yf:i:<N>`: Number of conversions detected in the read.
|
||||||
|
* `Zf:i:<N>`: Number of unconverted bases detected in the read. Yf + Zf = total number of bases that can be converted in the read sequence.
|
||||||
|
* `YZ:A:<A>`: The value `+` or `-` indicates the read is mapped to REF-3N (`+`) or REF-RC-3N (`-`), respectively.
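As a rough illustration of how the `Yf` and `Zf` tags relate, the sketch below reads them from one SAM record and derives a per-read conversion rate as Yf / (Yf + Zf). It is written in Python for brevity and is not part of HISAT-3N; the function names and the example record are made up for this illustration.

```python
# Hypothetical helpers: extract the Yf/Zf tags described above from a single
# tab-separated SAM alignment line and compute a per-read conversion rate.

def conversion_counts(sam_line):
    """Return (converted, unconverted) from the Yf and Zf tags, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        tags[name] = value
    if "Yf" not in tags or "Zf" not in tags:
        return None
    return int(tags["Yf"]), int(tags["Zf"])


def conversion_rate(sam_line):
    """Return Yf / (Yf + Zf), or None if the read has no convertible base."""
    counts = conversion_counts(sam_line)
    if counts is None:
        return None
    converted, unconverted = counts
    total = converted + unconverted
    return converted / total if total else None


# Made-up record for illustration only (not real HISAT-3N output).
EXAMPLE = "\t".join([
    "read1", "0", "1", "11874", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF", "Yf:i:3", "Zf:i:1", "YZ:A:+",
])

if __name__ == "__main__":
    print(conversion_counts(EXAMPLE), conversion_rate(EXAMPLE))
```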
|
||||||
|
|
||||||
|
Generate a 3N-conversion-table with `hisat-3n-table`
|
||||||
|
------------
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
To generate a 3N-conversion-table, users need to sort the `hisat-3n` generated SAM alignment file.
|
||||||
|
|
||||||
|
[SAMtools] is required for this sorting process.
|
||||||
|
|
||||||
|
Use `samtools sort` to convert the SAM file into a sorted SAM file.
|
||||||
|
|
||||||
|
samtools sort output.sam -o output_sorted.sam -O sam
|
||||||
|
|
||||||
|
Generate 3N-conversion-table with `hisat-3n-table`:
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
hisat-3n-table [options]* --alignments <alignmentFile> --ref <refFile> --base-change <char1,char2>
|
||||||
|
|
||||||
|
#### Main arguments
|
||||||
|
* `--alignments <alignmentFile>`
|
||||||
|
SORTED SAM file. Please enter `-` for standard input.
|
||||||
|
|
||||||
|
* `--ref <refFile>`
|
||||||
|
The reference genome file (FASTA format) that was used to generate the HISAT-3N index.
|
||||||
|
|
||||||
|
* `--output-name <outputFile>`
|
||||||
|
File to write the 3N-conversion-table (TSV format) to. By default, the table is written to standard output (stdout), i.e. the console.
|
||||||
|
|
||||||
|
* `--base-change <char1,char2>`
|
||||||
|
The base-change rule. Enter exactly the same `--base-change` argument that was used with `hisat-3n`.
|
||||||
|
For example, please enter `--base-change C,T` for bisulfite sequencing reads.
|
||||||
|
|
||||||
|
#### Input options
|
||||||
|
* `-u/--unique-only`
|
||||||
|
Only count uniquely aligned reads in the 3N-conversion-table.
|
||||||
|
|
||||||
|
* `-m/--multiple-only`
|
||||||
|
Only count multiply aligned reads in the 3N-conversion-table.
|
||||||
|
|
||||||
|
* `-c/--CG-only`
|
||||||
|
Only count the CpG sites in the reference genome. This option is designed for bisulfite sequencing reads.
|
||||||
|
|
||||||
|
* `--added-chrname`
|
||||||
|
Please add this option if you used `--add-chrname` during `hisat-3n` alignment.
|
||||||
|
When `--add-chrname` is chosen, `hisat-3n` adds the prefix "chr" in front of each chromosome name in the SAM output.
|
||||||
|
Without this option, `hisat-3n-table` cannot find the chromosome name in the reference because the name in the SAM file has an additional "chr" prefix. This option helps `hisat-3n-table`
|
||||||
|
find the matching chromosome name in the reference file. The 3N-conversion-table reports the same chromosome names as the SAM file.
|
||||||
|
|
||||||
|
* `--removed-chrname`
|
||||||
|
Please add this option if you used `--remove-chrname` during `hisat-3n` alignment.
|
||||||
|
When `--remove-chrname` is chosen, `hisat-3n` removes the prefix "chr" from the front of each chromosome name in the SAM output.
|
||||||
|
Without this option, `hisat-3n-table` cannot find the chromosome name in the reference because the name in the SAM file has no "chr" prefix. This option helps `hisat-3n-table`
|
||||||
|
find the matching chromosome name in the reference file. The 3N-conversion-table reports the same chromosome names as the SAM file.
|
||||||
|
|
||||||
|
#### Other options:
|
||||||
|
* `-p/--threads <int>`
|
||||||
|
Launch `int` parallel threads (default: 1) for table building.
|
||||||
|
|
||||||
|
* `-h/--help`
|
||||||
|
Print usage information and quit.
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
# Generate the 3N-conversion-table for bisulfite sequencing data:
|
||||||
|
hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T
|
||||||
|
|
||||||
|
# Generate the 3N-conversion-table for TAPS data, counting only bases in CpG sites from uniquely aligned reads:
|
||||||
|
hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T --CG-only --unique-only
|
||||||
|
|
||||||
|
# Generate the 3N-conversion-table for bisulfite sequencing data from sorted BAM file:
|
||||||
|
samtools view -h sorted_alignment_result.bam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T
|
||||||
|
|
||||||
|
# Generate the 3N-conversion-table for bisulfite sequencing data from unsorted BAM file:
|
||||||
|
samtools sort alignment_result.bam -O sam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T
|
||||||
|
|
||||||
|
|
||||||
|
#### Note:
|
||||||
|
There are 7 columns in the 3N-conversion-table:
|
||||||
|
|
||||||
|
1. `ref`: the chromosome name.
|
||||||
|
2. `pos`: 1-based position in `ref`.
|
||||||
|
3. `strand`: '+' for forward strand. '-' for reverse strand.
|
||||||
|
4. `convertedBaseQualities`: the qualities of the converted bases in read-level measurement. The length of this string is equal to the number of converted bases.
|
||||||
|
5. `convertedBaseCount`: the number of distinct read positions where converted bases in read-level measurements were found.
|
||||||
|
This number is equal to the length of `convertedBaseQualities`.
|
||||||
|
6. `unconvertedBaseQualities`: the qualities of the unconverted bases in read-level measurement. The length of this string is equal to the number of unconverted bases in read-level measurement.
|
||||||
|
7. `unconvertedBaseCount`: the number of distinct read positions where unconverted bases in read-level measurements were found.
|
||||||
|
This number is equal to the length of `unconvertedBaseQualities`.
|
||||||
|
|
||||||
|
##### Sample 3N-conversion-table:
|
||||||
|
ref pos strand convertedBaseQualities convertedBaseCount unconvertedBaseQualities unconvertedBaseCount
|
||||||
|
1 11874 + FFFFFB<BF<F 11 0
|
||||||
|
1 11877 - FFFFFF< 7 0
|
||||||
|
1 11878 + FFFBB//F/BB 11 0
|
||||||
|
1 11879 + 0 FFFBB//FB/ 10
|
||||||
|
1 11880 - F 1 FFFF/ 5
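To make the column layout concrete, here is a minimal Python sketch that reads a 3N-conversion-table and derives a conversion ratio per position. It assumes the file is tab-separated with the header row shown above and that empty quality strings appear as empty fields. The function name `read_conversion_table` and the `ratio` field are illustrative additions, not output of `hisat-3n-table`.

```python
import csv
import io


def read_conversion_table(handle):
    """Yield one dict per table row with counts and a conversion ratio."""
    # QUOTE_NONE: quality strings may contain quote characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        converted = int(row["convertedBaseCount"])
        unconverted = int(row["unconvertedBaseCount"])
        total = converted + unconverted
        yield {
            "ref": row["ref"],
            "pos": int(row["pos"]),  # 1-based position
            "strand": row["strand"],
            "converted": converted,
            "unconverted": unconverted,
            # Fraction of read-level bases at this position that are converted.
            "ratio": converted / total if total else None,
        }


# Two rows taken from the sample table above, with tabs restored (assumed).
SAMPLE = (
    "ref\tpos\tstrand\tconvertedBaseQualities\tconvertedBaseCount\t"
    "unconvertedBaseQualities\tunconvertedBaseCount\n"
    "1\t11874\t+\tFFFFFB<BF<F\t11\t\t0\n"
    "1\t11880\t-\tF\t1\tFFFF/\t5\n"
)

if __name__ == "__main__":
    for record in read_conversion_table(io.StringIO(SAMPLE)):
        print(record)
```

For a real file, the same function can be passed an open file handle, for example one opened on the path given to `--output-name`.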
|
||||||
|
[SAMtools]: http://samtools.sourceforge.net
|
||||||
|
|
||||||
|
Publication
|
||||||
|
============
|
||||||
|
|
||||||
|
* HISAT-3N
|
||||||
|
Zhang, Y., Park, C., Bennett, C., Thornton, M. and Kim, D. [Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N](https://doi.org/10.1101/gr.275193.120). _Genome Research_ **31(7)**: 1290-1295 (2021)
|
||||||
|
|
||||||
|
|
||||||
|
* HISAT2
|
||||||
|
Kim, D., Paggi, J.M., Park, C. _et al._ [Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype](https://doi.org/10.1038/s41587-019-0201-4). _Nat Biotechnol_ **37**, 907–915 (2019)
|
4
TUTORIAL
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
See the section toward the end of MANUAL entitled "Getting started with HISAT2". Or,
|
||||||
|
for the tutorial for the latest HISAT2 version, visit:
|
||||||
|
|
||||||
|
https://ccb.jhu.edu/software/hisat2/manual.shtml#getting-started-with-hisat2
|
1
_config.yml
Normal file
@ -0,0 +1 @@
|
|||||||
|
theme: jekyll-theme-time-machine
|
1772
aligner_bt.cpp
Normal file
File diff suppressed because it is too large
947
aligner_bt.h
Normal file
@ -0,0 +1,947 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_BT_H_
|
||||||
|
#define ALIGNER_BT_H_
|
||||||
|
|
||||||
|
#include <utility>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "aligner_sw_common.h"
|
||||||
|
#include "aligner_result.h"
|
||||||
|
#include "scoring.h"
|
||||||
|
#include "edit.h"
|
||||||
|
#include "limit.h"
|
||||||
|
#include "dp_framer.h"
|
||||||
|
#include "sse_util.h"
|
||||||
|
|
||||||
|
/* Say we've filled in a DP matrix in a cost-only manner, not saving the scores
|
||||||
|
* for each of the cells. At the end, we obtain a list of candidate cells and
|
||||||
|
* we'd like to backtrace from them. The per-cell scores are gone, but we have
|
||||||
|
* to re-create the correct path somehow. Hopefully we can do this without
|
||||||
|
* recreating most or all of the score matrix, since this takes too much memory.
|
||||||
|
*
|
||||||
|
* Approach 1: Naively refill the matrix.
|
||||||
|
*
|
||||||
|
* Just refill the matrix, perhaps backwards starting from the backtrace cell.
|
||||||
|
* Since this involves recreating all or most of the score matrix, this is not
|
||||||
|
* a good approach.
|
||||||
|
*
|
||||||
|
* Approach 2: Naive backtracking.
|
||||||
|
*
|
||||||
|
* Conduct a search through the space of possible backtraces, rooted at the
|
||||||
|
* candidate cell. To speed things along, we can prioritize paths that have a
|
||||||
|
* high score and that align more characters from the read.
|
||||||
|
*
|
||||||
|
* The approach is simple, but it's neither fast nor memory-efficient in
|
||||||
|
* general.
|
||||||
|
*
|
||||||
|
* Approach 3: Refilling with checkpoints.
|
||||||
|
*
|
||||||
|
* Refill the matrix "backwards" starting from the candidate cell, but use
|
||||||
|
* checkpoints to ensure that only a series of relatively small triangles or
|
||||||
|
* rectangles need to be refilled. The checkpoints must include elements from
|
||||||
|
* the H, E and F matrices; not just H. After each refill, we backtrace
|
||||||
|
* through the refilled area, then discard/reuse the fill memory. I call each
|
||||||
|
* such fill/backtrace a mini-fill/backtrace.
|
||||||
|
*
|
||||||
|
* If there's only one path to be found, then this is O(m+n). But what if
|
||||||
|
* there are many? And what if we would like to avoid paths that overlap in
|
||||||
|
* one or more cells? There are two ways we can make this more efficient:
|
||||||
|
*
|
||||||
|
* 1. Remember the re-calculated E/F/H values and try to retrieve them
|
||||||
|
* 2. Keep a record of cells that have already been traversed
|
||||||
|
*
|
||||||
|
* Legend:
|
||||||
|
*
|
||||||
|
* 1: Candidate cell
|
||||||
|
* 2: Final cell from first mini-fill/backtrace
|
||||||
|
* 3: Final cell from second mini-fill/backtrace (third not shown)
|
||||||
|
* +: Checkpointed cell
|
||||||
|
* *: Cell filled from first or second mini-fill/backtrace
|
||||||
|
* -: Unfilled cell
|
||||||
|
*
|
||||||
|
* ---++--------++--------++----
|
||||||
|
* --++--------++*-------++-----
|
||||||
|
* -++--(etc)-++**------++------
|
||||||
|
* ++--------+3***-----++-------
|
||||||
|
* +--------++****----++--------
|
||||||
|
* --------++*****---++--------+
|
||||||
|
* -------++******--++--------++
|
||||||
|
* ------++*******-++*-------++-
|
||||||
|
* -----++********++**------++--
|
||||||
|
* ----++********2+***-----++---
|
||||||
|
* ---++--------++****----++----
|
||||||
|
* --++--------++*****---++-----
|
||||||
|
* -++--------++*****1--++------
|
||||||
|
* ++--------++--------++-------
|
||||||
|
*
|
||||||
|
* Approach 4: Backtracking with checkpoints.
|
||||||
|
*
|
||||||
|
* Conduct a search through the space of possible backtraces, rooted at the
|
||||||
|
* candidate cell. Use "checkpoints" to prune. That is, when a backtrace
|
||||||
|
* moves through a cell with a checkpointed score, consider the score
|
||||||
|
* accumulated so far and the cell's saved score; abort if those two scores
|
||||||
|
* add to something less than a valid score. Note we're only checkpointing H
|
||||||
|
* in this case (possibly; see "subtle point"), not E or F.
|
||||||
|
*
|
||||||
|
* Subtle point: checkpoint scores are a result of moving forward through
|
||||||
|
* the matrix whereas backtracking scores result from moving backward. This
|
||||||
|
* matters because the two paths that meet up at a cell might have both
|
||||||
|
* factored in a gap open penalty for the same gap, in which case we will
|
||||||
|
* underestimate the overall score and prune a good path. Here are two ideas
|
||||||
|
* for how to resolve this:
|
||||||
|
*
|
||||||
|
* Idea 1: when we combine the forward and backward scores to find an overall
|
||||||
|
* score, and our backtrack procedure *just* made a horizontal or vertical
|
||||||
|
* move, add in a "bonus" equal to the gap open penalty of the appropriate
|
||||||
|
* type (read gap open for horizontal, ref gap open for vertical). This might
|
||||||
|
* overcompensate, since
|
||||||
|
*
|
||||||
|
* Idea 2: keep the E and F values for the checkpoints around, in addition to
|
||||||
|
* the H values. When it comes time to combine the score from the forward
|
||||||
|
* and backward paths, we consider the last move we made in the backward
|
||||||
|
* backtrace. If it's a read gap (horizontal move), then we calculate the
|
||||||
|
* overall score as:
|
||||||
|
*
|
||||||
|
* max(Score-backward + H-forward, Score-backward + E-forward + read-open)
|
||||||
|
*
|
||||||
|
* If it's a reference gap (vertical move), then we calculate the overall
|
||||||
|
* score as:
|
||||||
|
*
|
||||||
|
* max(Score-backward + H-forward, Score-backward + F-forward + ref-open)
|
||||||
|
*
|
||||||
|
* What does it mean to abort a backtrack? If we're starting a new branch
|
||||||
|
* and there is a checkpoint in the bottommost cell of the branch, and the
|
||||||
|
* overall score is less than the target, then we can simply ignore the
|
||||||
|
* branch. If the checkpoint occurs in the middle of a string of matches, we
|
||||||
|
* need to curtail the branch such that it doesn't include the checkpointed
|
||||||
|
* cell and we won't ever try to enter the checkpointed cell, e.g., on a
|
||||||
|
* mismatch.
|
||||||
|
*
|
||||||
|
* Approaches 3 and 4 seem reasonable, and could be combined. For simplicity,
|
||||||
|
* we implement only approach 4 for now.
|
||||||
|
*
|
||||||
|
* Checkpoint information is propagated from the fill process to the backtracer
|
||||||
|
* via a
|
||||||
|
*/
|
||||||
|
|
||||||
|
enum {
|
||||||
|
BT_NOT_FOUND = 1, // could not obtain the backtrace because it
|
||||||
|
// overlapped a previous solution
|
||||||
|
BT_FOUND, // obtained a valid backtrace
|
||||||
|
BT_REJECTED_N, // backtrace rejected because it had too many Ns
|
||||||
|
BT_REJECTED_CORE_DIAG // backtrace rejected because it failed to overlap a
|
||||||
|
// core diagonal
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parameters for a matrix of potential backtrace problems to solve.
|
||||||
|
* Encapsulates information about:
|
||||||
|
*
|
||||||
|
* The problem given a particular reference substring:
|
||||||
|
*
|
||||||
|
* - The query string (nucleotides and qualities)
|
||||||
|
* - The reference substring (incl. orientation, offset into overall sequence)
|
||||||
|
* - Checkpoints (i.e. values of matrix cells)
|
||||||
|
* - Scoring scheme and other thresholds
|
||||||
|
*
|
||||||
|
* The problem given a particular reference substring AND a particular row and
|
||||||
|
* column from which to backtrace:
|
||||||
|
*
|
||||||
|
* - The row and column
|
||||||
|
* - The target score
|
||||||
|
*/
|
||||||
|
class BtBranchProblem {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Create new uninitialized problem.
|
||||||
|
*/
|
||||||
|
BtBranchProblem() { reset(); }
|
||||||
|
|
||||||
|
/**
|
||||||
|
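* Initialize the parts of a new problem that depend only on the query and
* the reference substring (strings, checkpointer, scoring scheme, N ceiling).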
|
||||||
|
*/
|
||||||
|
void initRef(
|
||||||
|
const char *qry, // query string (along rows)
|
||||||
|
const char *qual, // query quality string (along rows)
|
||||||
|
size_t qrylen, // query string (along rows) length
|
||||||
|
const char *ref, // reference string (along columns)
|
||||||
|
TRefOff reflen, // in-rectangle reference string length
|
||||||
|
TRefOff treflen,// total reference string length
|
||||||
|
TRefId refid, // reference id
|
||||||
|
TRefOff refoff, // reference offset
|
||||||
|
bool fw, // orientation of problem
|
||||||
|
const DPRect* rect, // dynamic programming rectangle filled out
|
||||||
|
const Checkpointer* cper, // checkpointer
|
||||||
|
const Scoring *sc, // scoring scheme
|
||||||
|
size_t nceil) // max # Ns allowed in alignment
|
||||||
|
{
|
||||||
|
qry_ = qry;
|
||||||
|
qual_ = qual;
|
||||||
|
qrylen_ = qrylen;
|
||||||
|
ref_ = ref;
|
||||||
|
reflen_ = reflen;
|
||||||
|
treflen_ = treflen;
|
||||||
|
refid_ = refid;
|
||||||
|
refoff_ = refoff;
|
||||||
|
fw_ = fw;
|
||||||
|
rect_ = rect;
|
||||||
|
cper_ = cper;
|
||||||
|
sc_ = sc;
|
||||||
|
nceil_ = nceil;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
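* Initialize the parts of the problem specific to one backtrace: the
* starting row and column, the strategy, and the target score.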
|
||||||
|
*/
|
||||||
|
void initBt(
|
||||||
|
size_t row, // row
|
||||||
|
size_t col, // column
|
||||||
|
bool fill, // use a filling rather than a backtracking strategy
|
||||||
|
bool usecp, // use checkpoints to short-circuit while backtracking
|
||||||
|
TAlScore targ) // target score
|
||||||
|
{
|
||||||
|
row_ = row;
|
||||||
|
col_ = col;
|
||||||
|
targ_ = targ;
|
||||||
|
fill_ = fill;
|
||||||
|
usecp_ = usecp;
|
||||||
|
if(fill) {
|
||||||
|
assert(usecp_);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reset to uninitialized state.
|
||||||
|
*/
|
||||||
|
void reset() {
|
||||||
|
qry_ = qual_ = ref_ = NULL;
|
||||||
|
cper_ = NULL;
|
||||||
|
rect_ = NULL;
|
||||||
|
sc_ = NULL;
|
||||||
|
qrylen_ = reflen_ = treflen_ = refid_ = refoff_ = row_ = col_ = targ_ = nceil_ = 0;
|
||||||
|
fill_ = fw_ = usecp_ = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the BtBranchProblem has been initialized.
|
||||||
|
*/
|
||||||
|
bool inited() const {
|
||||||
|
return qry_ != NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Sanity-check the problem.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert_gt(qrylen_, 0);
|
||||||
|
assert_gt(reflen_, 0);
|
||||||
|
assert_gt(treflen_, 0);
|
||||||
|
assert_lt(row_, qrylen_);
|
||||||
|
assert_lt((TRefOff)col_, reflen_);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
size_t reflen() const { return reflen_; }
|
||||||
|
size_t treflen() const { return treflen_; }
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
const char *qry_; // query string (along rows)
|
||||||
|
const char *qual_; // query quality string (along rows)
|
||||||
|
size_t qrylen_; // query string (along rows) length
|
||||||
|
const char *ref_; // reference string (along columns)
|
||||||
|
TRefOff reflen_; // in-rectangle reference string length
|
||||||
|
TRefOff treflen_;// total reference string length
|
||||||
|
TRefId refid_; // reference id
|
||||||
|
TRefOff refoff_; // reference offset
|
||||||
|
bool fw_; // orientation of problem
|
||||||
|
const DPRect* rect_; // dynamic programming rectangle filled out
|
||||||
|
size_t row_; // starting row
|
||||||
|
size_t col_; // starting column
|
||||||
|
TAlScore targ_; // target score
|
||||||
|
const Checkpointer *cper_; // checkpointer
|
||||||
|
bool fill_; // use mini-fills
|
||||||
|
bool usecp_; // use checkpointing?
|
||||||
|
const Scoring *sc_; // scoring scheme
|
||||||
|
size_t nceil_; // max # Ns allowed in alignment
|
||||||
|
|
||||||
|
friend class BtBranch;
|
||||||
|
friend class BtBranchQ;
|
||||||
|
friend class BtBranchTracer;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates a "branch" which is a diagonal of cells (possibly of length 0)
|
||||||
|
* in the matrix where all the cells are matches. These stretches are linked
|
||||||
|
* together by edits to form a full backtrace path through the matrix. Lengths
|
||||||
|
* are measured w/r/t the number of rows traversed by the path, so a branch
|
||||||
|
* that represents a read gap extension could have length = 0.
|
||||||
|
*
|
||||||
|
* At the end of the day, the full backtrace path is represented as a list of
|
||||||
|
* BtBranches where each BtBranch represents a stretch of matching cells (and
|
||||||
|
* up to one mismatching cell at its bottom extreme) ending in an edit (or in
|
||||||
|
* the bottommost row, in which case the edit is uninitialized). Each
|
||||||
|
* BtBranch's row and col fields indicate the bottommost cell involved in the
|
||||||
|
* diagonal stretch of matches, and the len_ field indicates the length of the
|
||||||
|
* stretch of matches. Note that the edits themselves also correspond to
|
||||||
|
* movement through the matrix.
|
||||||
|
*
|
||||||
|
* A related issue is how we record which cells have been visited so that we
|
||||||
|
* never report a pair of paths both traversing the same (row, col) of the
|
||||||
|
* overall DP matrix. This gets a little tricky because we have to take into
|
||||||
|
* account the cells covered by *edits* in addition to the cells covered by the
|
||||||
|
* stretches of matches. For instance: imagine a mismatch. That takes up a
|
||||||
|
* cell of the DP matrix, but it may or may not be preceded by a string of
|
||||||
|
* matches. It's hard to imagine how to represent this unless we let the
|
||||||
|
* mismatch "count toward" the len_ of the branch and let (row, col) refer to
|
||||||
|
* the cell where the mismatch occurs.
|
||||||
|
*
|
||||||
|
* We need BtBranches to "live forever" so that we can make some BtBranches
|
||||||
|
* parents of others using parent pointers. For this reason, BtBranches are
|
||||||
|
* stored in an EFactory object in the BtBranchTracer class.
|
||||||
|
*/
|
||||||
|
class BtBranch {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
BtBranch() { reset(); }
|
||||||
|
|
||||||
|
BtBranch(
|
||||||
|
const BtBranchProblem& prob,
|
||||||
|
size_t parentId,
|
||||||
|
TAlScore penalty,
|
||||||
|
TAlScore score_en,
|
||||||
|
int64_t row,
|
||||||
|
int64_t col,
|
||||||
|
Edit e,
|
||||||
|
int hef,
|
||||||
|
bool root,
|
||||||
|
bool extend)
|
||||||
|
{
|
||||||
|
init(prob, parentId, penalty, score_en, row, col, e, hef, root, extend);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reset to uninitialized state.
|
||||||
|
*/
|
||||||
|
void reset() {
|
||||||
|
parentId_ = 0;
|
||||||
|
score_st_ = score_en_ = len_ = row_ = col_ = 0;
|
||||||
|
curtailed_ = false;
|
||||||
|
e_.reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Caller gives us score_en, row and col. We figure out score_st and len_
|
||||||
|
* by comparing characters from the strings.
|
||||||
|
*/
|
||||||
|
void init(
|
||||||
|
const BtBranchProblem& prob,
|
||||||
|
size_t parentId,
|
||||||
|
TAlScore penalty,
|
||||||
|
TAlScore score_en,
|
||||||
|
int64_t row,
|
||||||
|
int64_t col,
|
||||||
|
Edit e,
|
||||||
|
int hef,
|
||||||
|
bool root,
|
||||||
|
bool extend);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this branch ends in a solution to the backtrace problem.
|
||||||
|
*/
|
||||||
|
bool isSolution(const BtBranchProblem& prob) const {
|
||||||
|
const bool end2end = prob.sc_->monotone;
|
||||||
|
return score_st_ == prob.targ_ && (!end2end || endsInFirstRow());
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this branch could potentially lead to a valid alignment.
|
||||||
|
*/
|
||||||
|
bool isValid(const BtBranchProblem& prob) const {
|
||||||
|
int64_t scoreFloor = prob.sc_->monotone ? MIN_I64 : 0;
|
||||||
|
if(score_st_ < scoreFloor) {
|
||||||
|
// Dipped below the score floor
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
if(isSolution(prob)) {
|
||||||
|
// It's a solution, so it's also valid
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
if((int64_t)len_ > row_) {
|
||||||
|
// Went all the way to the top row
|
||||||
|
//assert_leq(score_st_, prob.targ_);
|
||||||
|
return score_st_ == prob.targ_;
|
||||||
|
} else {
|
||||||
|
int64_t match = prob.sc_->match();
|
||||||
|
int64_t bonusLeft = (row_ + 1 - len_) * match;
|
||||||
|
return score_st_ + bonusLeft >= prob.targ_;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this branch overlaps with the given branch.
|
||||||
|
*/
|
||||||
|
bool overlap(const BtBranchProblem& prob, const BtBranch& bt) const {
|
||||||
|
// Calculate this branch's diagonal
|
||||||
|
assert_lt(row_, (int64_t)prob.qrylen_);
|
||||||
|
size_t fromend = prob.qrylen_ - row_ - 1;
|
||||||
|
size_t diag = fromend + col_;
|
||||||
|
int64_t lo = 0, hi = row_ + 1;
|
||||||
|
if(len_ == 0) {
|
||||||
|
lo = row_;
|
||||||
|
} else {
|
||||||
|
lo = row_ - (len_ - 1);
|
||||||
|
}
|
||||||
|
// Calculate other branch's diagonal
|
||||||
|
assert_lt(bt.row_, (int64_t)prob.qrylen_);
|
||||||
|
size_t ofromend = prob.qrylen_ - bt.row_ - 1;
|
||||||
|
size_t odiag = ofromend + bt.col_;
|
||||||
|
if(diag != odiag) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
int64_t olo = 0, ohi = bt.row_ + 1;
|
||||||
|
if(bt.len_ == 0) {
|
||||||
|
olo = bt.row_;
|
||||||
|
} else {
|
||||||
|
olo = bt.row_ - (bt.len_ - 1);
|
||||||
|
}
|
||||||
|
int64_t losm = olo, hism = ohi;
|
||||||
|
if(hi - lo < ohi - olo) {
|
||||||
|
swap(lo, losm);
|
||||||
|
swap(hi, hism);
|
||||||
|
}
|
||||||
|
if((lo <= losm && hi > losm) || (lo < hism && hi >= hism)) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this branch is higher priority than the branch 'o'.
|
||||||
|
*/
|
||||||
|
bool operator<(const BtBranch& o) const {
|
||||||
|
// Prioritize uppermost above score
|
||||||
|
if(uppermostRow() != o.uppermostRow()) {
|
||||||
|
return uppermostRow() < o.uppermostRow();
|
||||||
|
}
|
||||||
|
if(score_st_ != o.score_st_) return score_st_ > o.score_st_;
|
||||||
|
if(row_ != o.row_) return row_ < o.row_;
|
||||||
|
if(col_ != o.col_) return col_ > o.col_;
|
||||||
|
if(parentId_ != o.parentId_) return parentId_ > o.parentId_;
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the topmost cell involved in this branch is in the top
|
||||||
|
* row.
|
||||||
|
*/
|
||||||
|
bool endsInFirstRow() const {
|
||||||
|
assert_leq((int64_t)len_, row_ + 1);
|
||||||
|
return (int64_t)len_ == row_+1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the uppermost row covered by this branch.
|
||||||
|
*/
|
||||||
|
size_t uppermostRow() const {
|
||||||
|
assert_geq(row_ + 1, (int64_t)len_);
|
||||||
|
return row_ + 1 - (int64_t)len_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the leftmost column covered by this branch.
|
||||||
|
*/
|
||||||
|
size_t leftmostCol() const {
|
||||||
|
assert_geq(col_ + 1, (int64_t)len_);
|
||||||
|
return col_ + 1 - (int64_t)len_;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Sanity-check this BtBranch.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert(root_ || e_.inited());
|
||||||
|
assert_gt(len_, 0);
|
||||||
|
assert_geq(col_ + 1, (int64_t)len_);
|
||||||
|
assert_geq(row_ + 1, (int64_t)len_);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
// ID of the parent branch.
|
||||||
|
size_t parentId_;
|
||||||
|
|
||||||
|
// Penalty associated with the edit at the bottom of this branch (0 if
|
||||||
|
// there is no edit)
|
||||||
|
TAlScore penalty_;
|
||||||
|
|
||||||
|
// Score at the beginning of the branch
|
||||||
|
TAlScore score_st_;
|
||||||
|
|
||||||
|
// Score at the end of the branch (taking the edit into account)
|
||||||
|
TAlScore score_en_;
|
||||||
|
|
||||||
|
// Length of the branch. That is, the total number of diagonal cells
|
||||||
|
// involved in all the matches and in the edit (if any). Should always be
|
||||||
|
// > 0.
|
||||||
|
size_t len_;
|
||||||
|
|
||||||
|
// The row of the final (bottommost) cell in the branch. This might be the
|
||||||
|
// bottommost match if the branch has no associated edit. Otherwise, it's
|
||||||
|
// the cell occupied by the edit.
|
||||||
|
int64_t row_;
|
||||||
|
|
||||||
|
// The column of the final (bottommost) cell in the branch.
|
||||||
|
int64_t col_;
|
||||||
|
|
||||||
|
// The edit at the bottom of the branch. If this is the bottommost branch
|
||||||
|
// in the alignment and it does not end in an edit, then this remains
|
||||||
|
// uninitialized.
|
||||||
|
Edit e_;
|
||||||
|
|
||||||
|
// True iff this is the bottommost branch in the alignment. We can't just
|
||||||
|
// use row_ to tell us this because local alignments don't necessarily end
|
||||||
|
// in the last row.
|
||||||
|
bool root_;
|
||||||
|
|
||||||
|
bool curtailed_; // true -> pruned at a checkpoint where we otherwise
|
||||||
|
// would have had a match
|
||||||
|
|
||||||
|
friend class BtBranchQ;
|
||||||
|
friend class BtBranchTracer;
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Instantiate and solve best-first branch-based backtraces.
|
||||||
|
*/
|
||||||
|
class BtBranchTracer {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
explicit BtBranchTracer() :
|
||||||
|
prob_(), bs_(), seenPaths_(DP_CAT), sawcell_(DP_CAT), doTri_() { }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Add a branch to the queue.
|
||||||
|
*/
|
||||||
|
void add(size_t id) {
|
||||||
|
assert(!bs_[id].isSolution(prob_));
|
||||||
|
unsorted_.push_back(make_pair(bs_[id].score_st_, id));
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Add a branch to the list of solutions.
|
||||||
|
*/
|
||||||
|
void addSolution(size_t id) {
|
||||||
|
assert(bs_[id].isSolution(prob_));
|
||||||
|
solutions_.push_back(id);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a potential branch to add to the queue, see if we can follow the
|
||||||
|
* branch a little further first. If it's still valid, or if we reach a
|
||||||
|
* choice between valid outgoing paths, go ahead and add it to the queue.
|
||||||
|
*/
|
||||||
|
void examineBranch(
|
||||||
|
int64_t row,
|
||||||
|
int64_t col,
|
||||||
|
const Edit& e,
|
||||||
|
TAlScore pen,
|
||||||
|
TAlScore sc,
|
||||||
|
size_t parentId);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Take all possible ways of leaving the given branch and add them to the
|
||||||
|
* branch queue.
|
||||||
|
*/
|
||||||
|
void addOffshoots(size_t bid);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get the best branch and remove it from the priority queue.
|
||||||
|
*/
|
||||||
|
size_t best(RandomSource& rnd) {
|
||||||
|
assert(!empty());
|
||||||
|
flushUnsorted();
|
||||||
|
assert_gt(sortedSel_ ? sorted1_.size() : sorted2_.size(), cur_);
|
||||||
|
// Perhaps shuffle everyone who's tied for first?
|
||||||
|
size_t id = sortedSel_ ? sorted1_[cur_] : sorted2_[cur_];
|
||||||
|
cur_++;
|
||||||
|
return id;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff there are no branches left to try.
|
||||||
|
*/
|
||||||
|
bool empty() const {
|
||||||
|
return size() == 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the size, i.e. the total number of branches contained.
|
||||||
|
*/
|
||||||
|
size_t size() const {
|
||||||
|
return unsorted_.size() +
|
||||||
|
(sortedSel_ ? sorted1_.size() : sorted2_.size()) - cur_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff there are no solutions left to try.
|
||||||
|
*/
|
||||||
|
bool emptySolution() const {
|
||||||
|
return sizeSolution() == 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the size of the solution set so far.
|
||||||
|
*/
|
||||||
|
size_t sizeSolution() const {
|
||||||
|
return solutions_.size();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sort unsorted branches, merge them with master sorted list.
|
||||||
|
*/
|
||||||
|
void flushUnsorted();
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Sanity-check the queue.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert_lt(cur_, (sortedSel_ ? sorted1_.size() : sorted2_.size()));
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize the tracer with respect to a new read. This involves
|
||||||
|
* resetting all the state relating to the set of cells already visited.
|
||||||
|
*/
|
||||||
|
void initRef(
|
||||||
|
const char* rd, // in: read sequence
|
||||||
|
const char* qu, // in: quality sequence
|
||||||
|
size_t rdlen, // in: read sequence length
|
||||||
|
const char* rf, // in: reference sequence
|
||||||
|
size_t rflen, // in: in-rectangle reference sequence length
|
||||||
|
TRefOff trflen, // in: total reference sequence length
|
||||||
|
TRefId refid, // in: reference id
|
||||||
|
TRefOff refoff, // in: reference offset
|
||||||
|
bool fw, // in: orientation
|
||||||
|
const DPRect *rect, // in: DP rectangle
|
||||||
|
const Checkpointer *cper, // in: checkpointer
|
||||||
|
const Scoring& sc, // in: scoring scheme
|
||||||
|
size_t nceil) // in: N ceiling
|
||||||
|
{
|
||||||
|
prob_.initRef(rd, qu, rdlen, rf, rflen, trflen, refid, refoff, fw, rect, cper, &sc, nceil);
|
||||||
|
const size_t ndiag = rflen + rdlen - 1;
|
||||||
|
seenPaths_.resize(ndiag);
|
||||||
|
for(size_t i = 0; i < ndiag; i++) {
|
||||||
|
seenPaths_[i].clear();
|
||||||
|
}
|
||||||
|
// clear each of the per-column sets
|
||||||
|
if(sawcell_.size() < rflen) {
|
||||||
|
size_t isz = sawcell_.size();
|
||||||
|
sawcell_.resize(rflen);
|
||||||
|
for(size_t i = isz; i < rflen; i++) {
|
||||||
|
sawcell_[i].setCat(DP_CAT);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for(size_t i = 0; i < rflen; i++) {
|
||||||
|
sawcell_[i].setCat(DP_CAT);
|
||||||
|
sawcell_[i].clear(); // clear the set
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize with a new backtrace.
|
||||||
|
*/
|
||||||
|
void initBt(
|
||||||
|
TAlScore escore, // in: alignment score
|
||||||
|
size_t row, // in: start in this row
|
||||||
|
size_t col, // in: start in this column
|
||||||
|
bool fill, // in: use mini-filling?
|
||||||
|
bool usecp, // in: use checkpointing?
|
||||||
|
bool doTri, // in: triangle-shaped mini-fills?
|
||||||
|
RandomSource& rnd) // in: random gen, to choose among equal paths
|
||||||
|
{
|
||||||
|
prob_.initBt(row, col, fill, usecp, escore);
|
||||||
|
Edit e; e.reset();
|
||||||
|
unsorted_.clear();
|
||||||
|
solutions_.clear();
|
||||||
|
sorted1_.clear();
|
||||||
|
sorted2_.clear();
|
||||||
|
cur_ = 0;
|
||||||
|
nmm_ = 0; // number of mismatches attempted
|
||||||
|
nnmm_ = 0; // number of mismatches involving N attempted
|
||||||
|
nrdop_ = 0; // number of read gap opens attempted
|
||||||
|
nrfop_ = 0; // number of ref gap opens attempted
|
||||||
|
nrdex_ = 0; // number of read gap extensions attempted
|
||||||
|
nrfex_ = 0; // number of ref gap extensions attempted
|
||||||
|
nmmPrune_ = 0; // number of mismatches pruned
|
||||||
|
nnmmPrune_ = 0; // number of mismatches involving N pruned
|
||||||
|
nrdopPrune_ = 0; // number of read gap opens pruned
|
||||||
|
nrfopPrune_ = 0; // number of ref gap opens pruned
|
||||||
|
nrdexPrune_ = 0; // number of read gap extensions pruned
|
||||||
|
nrfexPrune_ = 0; // number of ref gap extensions pruned
|
||||||
|
row_ = row;
|
||||||
|
col_ = col;
|
||||||
|
doTri_ = doTri;
|
||||||
|
bs_.clear();
|
||||||
|
if(!prob_.fill_) {
|
||||||
|
size_t id = bs_.alloc();
|
||||||
|
bs_[id].init(
|
||||||
|
prob_,
|
||||||
|
0, // parent id
|
||||||
|
0, // penalty
|
||||||
|
0, // starting score
|
||||||
|
row, // row
|
||||||
|
col, // column
|
||||||
|
e,
|
||||||
|
0,
|
||||||
|
true, // this is the root
|
||||||
|
true); // this should be extended with exact matches
|
||||||
|
if(bs_[id].isSolution(prob_)) {
|
||||||
|
addSolution(id);
|
||||||
|
} else {
|
||||||
|
add(id);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
int64_t row = row_, col = col_;
|
||||||
|
TAlScore targsc = prob_.targ_;
|
||||||
|
int hef = 0;
|
||||||
|
bool done = false, abort = false;
|
||||||
|
size_t depth = 0;
|
||||||
|
while(!done && !abort) {
|
||||||
|
// Accumulate edits as we go. We can do this by adding
|
||||||
|
// BtBranches to the bs_ structure. Each step of the backtrace
|
||||||
|
// either involves an edit (thereby starting a new branch) or
|
||||||
|
// extends the previous branch by one more position.
|
||||||
|
//
|
||||||
|
// Note: if the BtBranches are in line, then trySolution can be
|
||||||
|
// used to populate the SwResult and check for various
|
||||||
|
// situations where we might reject the alignment (i.e. due to
|
||||||
|
// a cell having been visited previously).
|
||||||
|
if(doTri_) {
|
||||||
|
triangleFill(
|
||||||
|
row, // row of cell to backtrace from
|
||||||
|
col, // column of cell to backtrace from
|
||||||
|
hef, // cell to bt from: H (0), E (1), or F (2)
|
||||||
|
targsc, // score of cell to backtrace from
|
||||||
|
prob_.targ_, // score of alignment we're looking for
|
||||||
|
rnd, // pseudo-random generator
|
||||||
|
row, // out: row we ended up in after bt
|
||||||
|
col, // out: column we ended up in after bt
|
||||||
|
hef, // out: H/E/F after backtrace
|
||||||
|
targsc, // out: score up to cell we ended up in
|
||||||
|
done, // out: finished tracing out an alignment?
|
||||||
|
abort); // out: aborted b/c cell was seen before?
|
||||||
|
} else {
|
||||||
|
squareFill(
|
||||||
|
row, // row of cell to backtrace from
|
||||||
|
col, // column of cell to backtrace from
|
||||||
|
hef, // cell to bt from: H (0), E (1), or F (2)
|
||||||
|
targsc, // score of cell to backtrace from
|
||||||
|
prob_.targ_, // score of alignment we're looking for
|
||||||
|
rnd, // pseudo-random generator
|
||||||
|
row, // out: row we ended up in after bt
|
||||||
|
col, // out: column we ended up in after bt
|
||||||
|
hef, // out: H/E/F after backtrace
|
||||||
|
targsc, // out: score up to cell we ended up in
|
||||||
|
done, // out: finished tracing out an alignment?
|
||||||
|
abort); // out: aborted b/c cell was seen before?
|
||||||
|
}
|
||||||
|
if(depth >= ndep_.size()) {
|
||||||
|
ndep_.resize(depth+1);
|
||||||
|
ndep_[depth] = 1;
|
||||||
|
} else {
|
||||||
|
ndep_[depth]++;
|
||||||
|
}
|
||||||
|
depth++;
|
||||||
|
assert((row >= 0 && col >= 0) || done);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
ASSERT_ONLY(seen_.clear());
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get the next valid alignment given the backtrace problem. Return false
|
||||||
|
* if there is no valid solution.
|
||||||
|
*/
|
||||||
|
bool nextAlignment(
|
||||||
|
size_t maxiter,
|
||||||
|
SwResult& res,
|
||||||
|
size_t& off,
|
||||||
|
size_t& nrej,
|
||||||
|
size_t& niter,
|
||||||
|
RandomSource& rnd);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this tracer has been initialized
|
||||||
|
*/
|
||||||
|
bool inited() const {
|
||||||
|
return prob_.inited();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the mini-fills are triangle-shaped.
|
||||||
|
*/
|
||||||
|
bool doTri() const { return doTri_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Fill in a triangle of the DP table and backtrace from the given cell to
|
||||||
|
* a cell in the previous checkpoint, or to the terminal cell.
|
||||||
|
*/
|
||||||
|
void triangleFill(
|
||||||
|
int64_t rw, // row of cell to backtrace from
|
||||||
|
int64_t cl, // column of cell to backtrace from
|
||||||
|
int hef, // cell to backtrace from is H (0), E (1), or F (2)
|
||||||
|
TAlScore targ, // score of cell to backtrace from
|
||||||
|
TAlScore targ_final, // score of alignment we're looking for
|
||||||
|
RandomSource& rnd, // pseudo-random generator
|
||||||
|
int64_t& row_new, // out: row we ended up in after backtrace
|
||||||
|
int64_t& col_new, // out: column we ended up in after backtrace
|
||||||
|
int& hef_new, // out: H/E/F after backtrace
|
||||||
|
TAlScore& targ_new, // out: score up to cell we ended up in
|
||||||
|
bool& done, // out: finished tracing out an alignment?
|
||||||
|
bool& abort); // out: aborted b/c cell was seen before?
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Fill in a square of the DP table and backtrace from the given cell to
|
||||||
|
* a cell in the previous checkpoint, or to the terminal cell.
|
||||||
|
*/
|
||||||
|
void squareFill(
|
||||||
|
int64_t rw, // row of cell to backtrace from
|
||||||
|
int64_t cl, // column of cell to backtrace from
|
||||||
|
int hef, // cell to backtrace from is H (0), E (1), or F (2)
|
||||||
|
TAlScore targ, // score of cell to backtrace from
|
||||||
|
TAlScore targ_final, // score of alignment we're looking for
|
||||||
|
RandomSource& rnd, // pseudo-random generator
|
||||||
|
int64_t& row_new, // out: row we ended up in after backtrace
|
||||||
|
int64_t& col_new, // out: column we ended up in after backtrace
|
||||||
|
int& hef_new, // out: H/E/F after backtrace
|
||||||
|
TAlScore& targ_new, // out: score up to cell we ended up in
|
||||||
|
bool& done, // out: finished tracing out an alignment?
|
||||||
|
bool& abort); // out: aborted b/c cell was seen before?
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get the next valid alignment given a backtrace problem. Return false
|
||||||
|
* if there is no valid solution. Use a backtracking search to find the
|
||||||
|
* solution. This can be very slow.
|
||||||
|
*/
|
||||||
|
bool nextAlignmentBacktrace(
|
||||||
|
size_t maxiter,
|
||||||
|
SwResult& res,
|
||||||
|
size_t& off,
|
||||||
|
size_t& nrej,
|
||||||
|
size_t& niter,
|
||||||
|
RandomSource& rnd);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Get the next valid alignment given a backtrace problem. Return false
|
||||||
|
* if there is no valid solution. Use a triangle-fill backtrace to find
|
||||||
|
* the solution. This is usually fast (it's O(m + n)).
|
||||||
|
*/
|
||||||
|
bool nextAlignmentFill(
|
||||||
|
size_t maxiter,
|
||||||
|
SwResult& res,
|
||||||
|
size_t& off,
|
||||||
|
size_t& nrej,
|
||||||
|
size_t& niter,
|
||||||
|
RandomSource& rnd);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Try all the solutions accumulated so far. Solutions might be rejected
|
||||||
|
* if they, for instance, overlap a previous solution, have too many Ns,
|
||||||
|
* fail to overlap a core diagonal, etc.
|
||||||
|
*/
|
||||||
|
bool trySolutions(
|
||||||
|
bool lookForOlap,
|
||||||
|
SwResult& res,
|
||||||
|
size_t& off,
|
||||||
|
size_t& nrej,
|
||||||
|
RandomSource& rnd,
|
||||||
|
bool& success);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* See if a given solution branch works as a solution (i.e. doesn't overlap
|
||||||
|
* another one, have too many Ns, fail to overlap a core diagonal, etc.)
|
||||||
|
*/
|
||||||
|
int trySolution(
|
||||||
|
size_t id,
|
||||||
|
bool lookForOlap,
|
||||||
|
SwResult& res,
|
||||||
|
size_t& off,
|
||||||
|
size_t& nrej,
|
||||||
|
RandomSource& rnd);
|
||||||
|
|
||||||
|
BtBranchProblem prob_; // problem configuration
|
||||||
|
EFactory<BtBranch> bs_; // global BtBranch factory
|
||||||
|
|
||||||
|
// already reported alignments going through these diagonal segments
|
||||||
|
ELList<std::pair<size_t, size_t> > seenPaths_;
|
||||||
|
ELSet<size_t> sawcell_; // cells already backtraced through
|
||||||
|
|
||||||
|
EList<std::pair<TAlScore, size_t> > unsorted_; // unsorted list of as-yet-unflushed BtBranches
|
||||||
|
EList<size_t> sorted1_; // list of BtBranch, sorted by score
|
||||||
|
EList<size_t> sorted2_; // list of BtBranch, sorted by score
|
||||||
|
EList<size_t> solutions_; // list of solution branches
|
||||||
|
bool sortedSel_; // true -> 1, false -> 2
|
||||||
|
size_t cur_; // cursor into sorted list to start from
|
||||||
|
|
||||||
|
size_t nmm_; // number of mismatches attempted
|
||||||
|
size_t nnmm_; // number of mismatches involving N attempted
|
||||||
|
size_t nrdop_; // number of read gap opens attempted
|
||||||
|
size_t nrfop_; // number of ref gap opens attempted
|
||||||
|
size_t nrdex_; // number of read gap extensions attempted
|
||||||
|
size_t nrfex_; // number of ref gap extensions attempted
|
||||||
|
|
||||||
|
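size_t nmmPrune_; // number of mismatches pruned
|
||||||
|
size_t nnmmPrune_; // number of mismatches involving N pruned
|
||||||
|
size_t nrdopPrune_; // number of read gap opens pruned
|
||||||
|
size_t nrfopPrune_; // number of ref gap opens pruned
|
||||||
|
size_t nrdexPrune_; // number of read gap extensions pruned
|
||||||
|
size_t nrfexPrune_; // number of ref gap extensions pruned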
|
||||||
|
|
||||||
|
size_t row_; // row
|
||||||
|
size_t col_; // column
|
||||||
|
|
||||||
|
bool doTri_; // true -> fill in triangles; false -> squares
|
||||||
|
EList<CpQuad> sq_; // square to fill when doing mini-fills
|
||||||
|
ELList<CpQuad> tri_; // triangle to fill when doing mini-fills
|
||||||
|
EList<size_t> ndep_; // # triangles mini-filled at various depths
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
ESet<size_t> seen_; // seen branch ids; should never see same twice
|
||||||
|
#endif
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /*ndef ALIGNER_BT_H_*/
|
181
aligner_cache.cpp
Normal file
@ -0,0 +1,181 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "aligner_cache.h"
|
||||||
|
#include "tinythread.h"
|
||||||
|
|
||||||
|
#ifdef ALIGNER_CACHE_MAIN
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <getopt.h>
|
||||||
|
#include <string>
|
||||||
|
#include "random_source.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
enum {
|
||||||
|
ARG_TESTS = 256
|
||||||
|
};
|
||||||
|
|
||||||
|
static const char *short_opts = "vCt";
|
||||||
|
static struct option long_opts[] = {
|
||||||
|
{(char*)"verbose", no_argument, 0, 'v'},
|
||||||
|
{(char*)"tests", no_argument, 0, ARG_TESTS},
|
||||||
|
{0, 0, 0, 0} // terminator: getopt_long requires a final all-zero entry
|
||||||
|
};
|
||||||
|
|
||||||
|
static void printUsage(ostream& os) {
|
||||||
|
os << "Usage: sawhi-cache [options]*" << endl;
|
||||||
|
os << "Options:" << endl;
|
||||||
|
os << " --tests run unit tests" << endl;
|
||||||
|
os << " -v/--verbose talkative mode" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
int gVerbose = 0;
|
||||||
|
|
||||||
|
static void add(
|
||||||
|
RedBlack<QKey, QVal>& t,
|
||||||
|
Pool& p,
|
||||||
|
const char *dna)
|
||||||
|
{
|
||||||
|
QKey qk;
|
||||||
|
qk.init(BTDnaString(dna, true));
|
||||||
|
t.add(p, qk, NULL);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Small tests for the AlignmentCache.
|
||||||
|
*/
|
||||||
|
static void aligner_cache_tests() {
|
||||||
|
RedBlack<QKey, QVal> rb(1024);
|
||||||
|
Pool p(64 * 1024, 1024);
|
||||||
|
// Small test
|
||||||
|
add(rb, p, "ACGTCGATCGT");
|
||||||
|
add(rb, p, "ACATCGATCGT");
|
||||||
|
add(rb, p, "ACGACGATCGT");
|
||||||
|
add(rb, p, "ACGTAGATCGT");
|
||||||
|
add(rb, p, "ACGTCAATCGT");
|
||||||
|
add(rb, p, "ACGTCGCTCGT");
|
||||||
|
add(rb, p, "ACGTCGAACGT");
|
||||||
|
assert_eq(7, rb.size());
|
||||||
|
rb.clear();
|
||||||
|
p.clear();
|
||||||
|
// Another small test
|
||||||
|
add(rb, p, "ACGTCGATCGT");
|
||||||
|
add(rb, p, "CCGTCGATCGT");
|
||||||
|
add(rb, p, "TCGTCGATCGT");
|
||||||
|
add(rb, p, "GCGTCGATCGT");
|
||||||
|
add(rb, p, "AAGTCGATCGT");
|
||||||
|
assert_eq(5, rb.size());
|
||||||
|
rb.clear();
|
||||||
|
p.clear();
|
||||||
|
// Regression test (attempt to make it smaller)
|
||||||
|
add(rb, p, "CCTA");
|
||||||
|
add(rb, p, "AGAA");
|
||||||
|
add(rb, p, "TCTA");
|
||||||
|
add(rb, p, "GATC");
|
||||||
|
add(rb, p, "CTGC");
|
||||||
|
add(rb, p, "TTGC");
|
||||||
|
add(rb, p, "GCCG");
|
||||||
|
add(rb, p, "GGAT");
|
||||||
|
rb.clear();
|
||||||
|
p.clear();
|
||||||
|
// Regression test
|
||||||
|
add(rb, p, "CCTA");
|
||||||
|
add(rb, p, "AGAA");
|
||||||
|
add(rb, p, "TCTA");
|
||||||
|
add(rb, p, "GATC");
|
||||||
|
add(rb, p, "CTGC");
|
||||||
|
add(rb, p, "CATC");
|
||||||
|
add(rb, p, "CAAA");
|
||||||
|
add(rb, p, "CTAT");
|
||||||
|
add(rb, p, "CTCA");
|
||||||
|
add(rb, p, "TTGC");
|
||||||
|
add(rb, p, "GCCG");
|
||||||
|
add(rb, p, "GGAT");
|
||||||
|
assert_eq(12, rb.size());
|
||||||
|
rb.clear();
|
||||||
|
p.clear();
|
||||||
|
// Larger random test
|
||||||
|
EList<BTDnaString> strs;
|
||||||
|
char buf[5];
|
||||||
|
for(int i = 0; i < 4; i++) {
|
||||||
|
for(int j = 0; j < 4; j++) {
|
||||||
|
for(int k = 0; k < 4; k++) {
|
||||||
|
for(int m = 0; m < 4; m++) {
|
||||||
|
buf[0] = "ACGT"[i];
|
||||||
|
buf[1] = "ACGT"[j];
|
||||||
|
buf[2] = "ACGT"[k];
|
||||||
|
buf[3] = "ACGT"[m];
|
||||||
|
buf[4] = '\0';
|
||||||
|
strs.push_back(BTDnaString(buf, true));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Add all of the 4-mers in several different random orders
|
||||||
|
RandomSource rand;
|
||||||
|
for(uint32_t runs = 0; runs < 100; runs++) {
|
||||||
|
rb.clear();
|
||||||
|
p.clear();
|
||||||
|
assert_eq(0, rb.size());
|
||||||
|
rand.init(runs);
|
||||||
|
EList<bool> used;
|
||||||
|
used.resize(256);
|
||||||
|
for(int i = 0; i < 256; i++) used[i] = false;
|
||||||
|
for(int i = 0; i < 256; i++) {
|
||||||
|
int r = rand.nextU32() % (256-i);
|
||||||
|
int unused = 0;
|
||||||
|
bool added = false;
|
||||||
|
for(int j = 0; j < 256; j++) {
|
||||||
|
if(!used[j] && unused == r) {
|
||||||
|
used[j] = true;
|
||||||
|
QKey qk;
|
||||||
|
qk.init(strs[j]);
|
||||||
|
rb.add(p, qk, NULL);
|
||||||
|
added = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
if(!used[j]) unused++;
|
||||||
|
}
|
||||||
|
assert(added);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A way of feeding simple tests to the seed alignment infrastructure.
|
||||||
|
*/
|
||||||
|
int main(int argc, char **argv) {
|
||||||
|
int option_index = 0;
|
||||||
|
int next_option;
|
||||||
|
do {
|
||||||
|
next_option = getopt_long(argc, argv, short_opts, long_opts, &option_index);
|
||||||
|
switch (next_option) {
|
||||||
|
case 'v': gVerbose = true; break;
|
||||||
|
case ARG_TESTS: aligner_cache_tests(); return 0;
|
||||||
|
case -1: break;
|
||||||
|
default: {
|
||||||
|
cerr << "Unknown option: " << (char)next_option << endl;
|
||||||
|
printUsage(cerr);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} while(next_option != -1);
|
||||||
|
}
|
||||||
|
#endif
|
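The randomized test above adds all 256 4-mers in a shuffled order by repeatedly drawing a number `r` and taking the `r`-th element that has not been used yet. The sketch below isolates that selection step. The helper name `pickKthUnused` is hypothetical and does not appear in the source; the caller is assumed to supply `k` smaller than the number of unused slots (the test does this with `rand.nextU32() % (256-i)`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the HISAT2/Bowtie 2 sources): mark and
// return the index of the k-th (0-based) element that is still unused.
// Returns used.size() if fewer than k+1 unused elements remain.
std::size_t pickKthUnused(std::vector<bool>& used, std::size_t k) {
    std::size_t unused = 0;
    for (std::size_t j = 0; j < used.size(); j++) {
        if (used[j]) continue;       // skip slots that were already drawn
        if (unused == k) {
            used[j] = true;          // claim this slot
            return j;
        }
        unused++;                    // count unused slots passed so far
    }
    return used.size();              // k was out of range
}
```

Calling this once per remaining element, with `k` drawn uniformly from the number of unused slots, visits every index exactly once, which is why the test can assert that an element was added on every iteration.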
1013
aligner_cache.h
Normal file
File diff suppressed because it is too large
80
aligner_driver.cpp
Normal file
@ -0,0 +1,80 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2012, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "aligner_driver.h"
|
||||||
|
|
||||||
|
void AlignerDriverRootSelector::select(
|
||||||
|
const Read& q,
|
||||||
|
const Read* qo,
|
||||||
|
bool nofw,
|
||||||
|
bool norc,
|
||||||
|
EList<DescentConfig>& confs,
|
||||||
|
EList<DescentRoot>& roots)
|
||||||
|
{
|
||||||
|
// Calculate interval length for both mates
|
||||||
|
int interval = rootIval_.f<int>((double)q.length());
|
||||||
|
if(qo != NULL) {
|
||||||
|
// Boost interval length by 20% for paired-end reads
|
||||||
|
interval = (int)(interval * 1.2 + 0.5);
|
||||||
|
}
|
||||||
|
float pri = 0.0f;
|
||||||
|
for(int fwi = 0; fwi < 2; fwi++) {
|
||||||
|
bool fw = (fwi == 0);
|
||||||
|
if((fw && nofw) || (!fw && norc)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// Put down left-to-right roots w/r/t forward and reverse-complement reads
|
||||||
|
{
|
||||||
|
bool first = true;
|
||||||
|
size_t i = 0;
|
||||||
|
while(first || (i + landing_ <= q.length())) {
|
||||||
|
confs.expand();
|
||||||
|
confs.back().cons.init(landing_, consExp_);
|
||||||
|
roots.expand();
|
||||||
|
roots.back().init(
|
||||||
|
i, // offset from 5' end
|
||||||
|
true, // left-to-right?
|
||||||
|
fw, // fw?
|
||||||
|
q.length(), // query length
|
||||||
|
pri); // root priority
|
||||||
|
i += interval;
|
||||||
|
first = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Put down right-to-left roots w/r/t forward and reverse-complement reads
|
||||||
|
{
|
||||||
|
bool first = true;
|
||||||
|
size_t i = 0;
|
||||||
|
while(first || (i + landing_ <= q.length())) {
|
||||||
|
confs.expand();
|
||||||
|
confs.back().cons.init(landing_, consExp_);
|
||||||
|
roots.expand();
|
||||||
|
roots.back().init(
|
||||||
|
q.length() - i - 1, // offset from 5' end
|
||||||
|
false, // left-to-right?
|
||||||
|
fw, // fw?
|
||||||
|
q.length(), // query length
|
||||||
|
pri); // root priority
|
||||||
|
i += interval;
|
||||||
|
first = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
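The root placement in `AlignerDriverRootSelector::select()` can be easier to follow with concrete numbers. The following is a minimal sketch, not the project's code: `RootSketch` and `placeRoots` are hypothetical names, the interval is passed in directly rather than computed from a `SimpleFunc`, and the guards on zero-length reads, zero intervals and `landing == 0` are additions made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DescentRoot, keeping only the fields used here.
struct RootSketch {
    std::size_t off5p; // offset from 5' end
    bool l2r;          // left-to-right?
    bool fw;           // forward read (true) or reverse complement (false)
};

// Mirrors the loop structure of select(): for each allowed orientation, one
// root is always placed, then more roots every 'interval' positions for as
// long as at least 'landing' positions remain in the search direction.
std::vector<RootSketch> placeRoots(std::size_t readLen, std::size_t interval,
                                   std::size_t landing, bool nofw, bool norc) {
    std::vector<RootSketch> roots;
    // Guards added for the sketch only; they avoid size_t underflow below.
    if (readLen == 0 || interval == 0 || landing == 0) return roots;
    for (int fwi = 0; fwi < 2; fwi++) {
        bool fw = (fwi == 0);
        if ((fw && nofw) || (!fw && norc)) continue;
        for (int dir = 0; dir < 2; dir++) {
            bool l2r = (dir == 0);
            bool first = true;
            std::size_t i = 0;
            while (first || (i + landing <= readLen)) {
                // Right-to-left roots are measured from the 3' end.
                std::size_t off = l2r ? i : (readLen - i - 1);
                roots.push_back(RootSketch{off, l2r, fw});
                i += interval;
                first = false;
            }
        }
    }
    return roots;
}
```

With a read of length 10, an interval of 4 and a landing of 3, the forward strand gets left-to-right roots at offsets 0 and 4 and right-to-left roots at offsets 9 and 5. A read shorter than `landing` still receives one root per direction because of the `first` flag.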
247
aligner_driver.h
Normal file
@ -0,0 +1,247 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2012, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* aligner_driver.h
|
||||||
|
*
|
||||||
|
* REDUNDANT SEED HITS
|
||||||
|
*
|
||||||
|
* We say that two seed hits are redundant if they trigger identical
|
||||||
|
* seed-extend dynamic programming problems. Put another way, they both lie on
|
||||||
|
* the same diagonal of the overall read/reference dynamic programming matrix.
|
||||||
|
* Detecting redundant seed hits is simple when the seed hits are ungapped. We
|
||||||
|
* do this after offset resolution but before the offset is converted to genome
|
||||||
|
* coordinates (see uses of the seenDiags1_/seenDiags2_ fields for examples).
|
||||||
|
*
|
||||||
|
* REDUNDANT ALIGNMENTS
|
||||||
|
*
|
||||||
|
* In an unpaired context, we say that two alignments are redundant if they
|
||||||
|
* share any cells in the global DP table. Roughly speaking, this is like
|
||||||
|
* saying that two alignments are redundant if any read character aligns to the
|
||||||
|
* same reference character (same reference sequence, same strand, same offset)
|
||||||
|
* in both alignments.
|
||||||
|
*
|
||||||
|
* In a paired-end context, we say that two paired-end alignments are redundant
|
||||||
|
* if the mate #1s are redundant and the mate #2s are redundant.
|
||||||
|
*
|
||||||
|
* How do we enforce this? In the unpaired context, this is relatively simple:
|
||||||
|
* the cells from each alignment are checked against a set containing all cells
|
||||||
|
* from all previous alignments. Given a new alignment, for each cell in the
|
||||||
|
* new alignment we check whether it is in the set. If there is any overlap,
|
||||||
|
* the new alignment is rejected as redundant. Otherwise, the new alignment is
|
||||||
|
* accepted and its cells are added to the set.
|
||||||
|
*
|
||||||
|
* Enforcement in a paired context is a little trickier. Consider the
|
||||||
|
* following approaches:
|
||||||
|
*
|
||||||
|
* 1. Skip anchors that are redundant with any previous anchor or opposite
|
||||||
|
* alignment. This is sufficient to ensure no two concordant alignments
|
||||||
|
* found are redundant.
|
||||||
|
*
|
||||||
|
* 2. Same as scheme 1, but with a "transitive closure" scheme for finding all
|
||||||
|
* concordant pairs in the vicinity of an anchor. Consider the AB/AC
|
||||||
|
* scenario from the previous paragraph. If B is the anchor alignment, we
|
||||||
|
* will find AB but not AC. But under this scheme, once we find AB we then
|
||||||
|
* let B be a new anchor and immediately look for its opposites. Likewise,
|
||||||
|
* if we find any opposite, we make them anchors and continue searching. We
|
||||||
|
* don't stop searching until every opposite is used as an anchor.
|
||||||
|
*
|
||||||
|
* 3. Skip anchors that are redundant with any previous anchor alignment (but
|
||||||
|
* allow anchors that are redundant with previous opposite alignments).
|
||||||
|
* This isn't sufficient to avoid redundant concordant alignments. To avoid
|
||||||
|
* redundant concordants, we need an additional procedure that checks each
|
||||||
|
* new concordant alignment one-by-one against a list of previous concordant
|
||||||
|
* alignments to see if it is redundant.
|
||||||
|
*
|
||||||
|
* We take approach 1.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_DRIVER_H_
|
||||||
|
#define ALIGNER_DRIVER_H_
|
||||||
|
|
||||||
|
#include "aligner_seed2.h"
|
||||||
|
#include "simple_func.h"
|
||||||
|
#include "aln_sink.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Concrete subclass of DescentRootSelector. Puts a root every 'ival' chars,
|
||||||
|
* where 'ival' is determined by user-specified parameters. A root is filtered
|
||||||
|
* out if the end of the read is less than 'landing' positions away, in the
|
||||||
|
* direction of the search.
|
||||||
|
*/
|
||||||
|
class AlignerDriverRootSelector : public DescentRootSelector {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
AlignerDriverRootSelector(
|
||||||
|
double consExp,
|
||||||
|
const SimpleFunc& rootIval,
|
||||||
|
size_t landing)
|
||||||
|
{
|
||||||
|
consExp_ = consExp;
|
||||||
|
rootIval_ = rootIval;
|
||||||
|
landing_ = landing;
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual ~AlignerDriverRootSelector() { }
|
||||||
|
|
||||||
|
virtual void select(
|
||||||
|
const Read& q, // read that we're selecting roots for
|
||||||
|
const Read* qo, // opposite mate, if applicable
|
||||||
|
bool nofw, // don't add roots for fw read
|
||||||
|
bool norc, // don't add roots for rc read
|
||||||
|
EList<DescentConfig>& confs, // put DescentConfigs here
|
||||||
|
EList<DescentRoot>& roots); // put DescentRoot here
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
double consExp_;
|
||||||
|
SimpleFunc rootIval_;
|
||||||
|
size_t landing_;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return values from extendSeeds and extendSeedsPaired.
|
||||||
|
*/
|
||||||
|
enum {
|
||||||
|
// Candidates were examined exhaustively
|
||||||
|
ALDRIVER_EXHAUSTED_CANDIDATES = 1,
|
||||||
|
// The policy does not need us to look any further
|
||||||
|
ALDRIVER_POLICY_FULFILLED,
|
||||||
|
// We stopped because we ran up against a limit on how much work we should
|
||||||
|
// do for one set of seed ranges, e.g. the limit on number of consecutive
|
||||||
|
// unproductive DP extensions
|
||||||
|
ALDRIVER_EXCEEDED_LIMIT
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This class is the glue between a DescentDriver and the dynamic programming
|
||||||
|
* implementations in Bowtie 2. The DescentDriver is used to find some very
|
||||||
|
* high-scoring alignments, but is additionally used to rank partial alignments
|
||||||
|
* so that they can be extended using dynamic programming.
|
||||||
|
*/
|
||||||
|
template <typename index_t>
|
||||||
|
class AlignerDriver {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
AlignerDriver(
|
||||||
|
double consExp,
|
||||||
|
const SimpleFunc& rootIval,
|
||||||
|
size_t landing,
|
||||||
|
bool veryVerbose,
|
||||||
|
const SimpleFunc& totsz,
|
||||||
|
const SimpleFunc& totfmops) :
|
||||||
|
sel_(consExp, rootIval, landing),
|
||||||
|
alsel_(),
|
||||||
|
dr1_(veryVerbose),
|
||||||
|
dr2_(veryVerbose)
|
||||||
|
{
|
||||||
|
totsz_ = totsz;
|
||||||
|
totfmops_ = totfmops;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize driver with respect to a new read or pair.
|
||||||
|
*/
|
||||||
|
void initRead(
|
||||||
|
const Read& q1,
|
||||||
|
bool nofw,
|
||||||
|
bool norc,
|
||||||
|
TAlScore minsc,
|
||||||
|
TAlScore maxpen,
|
||||||
|
const Read* q2)
|
||||||
|
{
|
||||||
|
dr1_.initRead(q1, nofw, norc, minsc, maxpen, q2, &sel_);
|
||||||
|
red1_.init(q1.length());
|
||||||
|
paired_ = false;
|
||||||
|
if(q2 != NULL) {
|
||||||
|
dr2_.initRead(*q2, nofw, norc, minsc, maxpen, &q1, &sel_);
|
||||||
|
red2_.init(q2->length());
|
||||||
|
paired_ = true;
|
||||||
|
} else {
|
||||||
|
dr2_.reset();
|
||||||
|
}
|
||||||
|
size_t totsz = totsz_.f<size_t>(q1.length());
|
||||||
|
size_t totfmops = totfmops_.f<size_t>(q1.length());
|
||||||
|
stop_.init(
|
||||||
|
totsz,
|
||||||
|
0,
|
||||||
|
true,
|
||||||
|
totfmops);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Start the driver. The driver will begin by conducting a best-first,
|
||||||
|
* index-assisted search through the space of possible full and partial
|
||||||
|
* alignments. This search may be followed up with a dynamic programming
|
||||||
|
* extension step, taking a prioritized set of partial SA ranges found
|
||||||
|
* during the search and extending each with DP. The process might also be
|
||||||
|
 * iterated, with the search being occasionally halted so that DPs can be
|
||||||
|
* tried, then restarted, etc.
|
||||||
|
*/
|
||||||
|
int go(
|
||||||
|
const Scoring& sc,
|
||||||
|
const GFM<index_t>& gfmFw,
|
||||||
|
const GFM<index_t>& gfmBw,
|
||||||
|
const BitPairReference& ref,
|
||||||
|
DescentMetrics& met,
|
||||||
|
WalkMetrics& wlm,
|
||||||
|
PerReadMetrics& prm,
|
||||||
|
RandomSource& rnd,
|
||||||
|
AlnSinkWrap<index_t>& sink);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reset state of all DescentDrivers.
|
||||||
|
*/
|
||||||
|
void reset() {
|
||||||
|
dr1_.reset();
|
||||||
|
dr2_.reset();
|
||||||
|
red1_.reset();
|
||||||
|
red2_.reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
AlignerDriverRootSelector sel_; // selects where roots should go
|
||||||
|
DescentAlignmentSelector<index_t> alsel_; // one selector can deal with >1 drivers
|
||||||
|
DescentDriver<index_t> dr1_; // driver for mate 1/unpaired reads
|
||||||
|
DescentDriver<index_t> dr2_; // driver for mate 2 (paired-end reads only)
|
||||||
|
DescentStoppingConditions stop_; // when to pause index-assisted BFS
|
||||||
|
bool paired_; // current read is paired?
|
||||||
|
|
||||||
|
SimpleFunc totsz_; // memory limit on best-first search data
|
||||||
|
SimpleFunc totfmops_; // max # FM ops for best-first search
|
||||||
|
|
||||||
|
// For detecting redundant alignments
|
||||||
|
RedundantAlns red1_; // database of cells used for mate 1 alignments
|
||||||
|
RedundantAlns red2_; // database of cells used for mate 2 alignments
|
||||||
|
|
||||||
|
// For AlnRes::matchesRef
|
||||||
|
ASSERT_ONLY(SStringExpandable<char> raw_refbuf_);
|
||||||
|
ASSERT_ONLY(SStringExpandable<uint32_t> raw_destU32_);
|
||||||
|
ASSERT_ONLY(EList<bool> raw_matches_);
|
||||||
|
ASSERT_ONLY(BTDnaString tmp_rf_);
|
||||||
|
ASSERT_ONLY(BTDnaString tmp_rdseq_);
|
||||||
|
ASSERT_ONLY(BTString tmp_qseq_);
|
||||||
|
ASSERT_ONLY(EList<index_t> tmp_reflens_);
|
||||||
|
ASSERT_ONLY(EList<index_t> tmp_refoffs_);
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /* defined(ALIGNER_DRIVER_H_) */
|
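The header comment in `aligner_driver.h` describes the unpaired redundancy check as a set lookup over DP cells. The sketch below illustrates that description only. It is not the `RedundantAlns` class used by the driver: the `Cell` alias, the `SeenCells` class and the choice of `std::set` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical cell identifier: (read row, reference column) in the global
// DP table. The real code also distinguishes reference sequence and strand.
using Cell = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical container for the cells of all accepted alignments.
class SeenCells {
public:
    // Returns false (and records nothing) if any cell of 'aln' was already
    // used by an accepted alignment; otherwise records all cells and
    // returns true.
    bool addIfNovel(const std::vector<Cell>& aln) {
        for (const Cell& c : aln) {
            if (seen_.count(c) != 0) return false; // overlap: redundant
        }
        seen_.insert(aln.begin(), aln.end());
        return true;
    }

private:
    std::set<Cell> seen_;
};
```

Note that a rejected alignment contributes no cells to the set, so a later alignment that overlaps only the rejected one is still accepted.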
352
aligner_metrics.h
Normal file
@ -0,0 +1,352 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_METRICS_H_
|
||||||
|
#define ALIGNER_METRICS_H_
|
||||||
|
|
||||||
|
#include <math.h>
|
||||||
|
#include <iostream>
|
||||||
|
#include "alphabet.h"
|
||||||
|
#include "timer.h"
|
||||||
|
#include "sstring.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Borrowed from http://www.johndcook.com/standard_deviation.html,
|
||||||
|
* which in turn is borrowed from Knuth.
|
||||||
|
*/
|
||||||
|
class RunningStat {
|
||||||
|
public:
|
||||||
|
RunningStat() : m_n(0), m_tot(0.0) { }
|
||||||
|
|
||||||
|
void clear() {
|
||||||
|
m_n = 0;
|
||||||
|
m_tot = 0.0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void push(float x) {
|
||||||
|
m_n++;
|
||||||
|
m_tot += x;
|
||||||
|
// See Knuth TAOCP vol 2, 3rd edition, page 232
|
||||||
|
if (m_n == 1) {
|
||||||
|
m_oldM = m_newM = x;
|
||||||
|
m_oldS = 0.0;
|
||||||
|
} else {
|
||||||
|
m_newM = m_oldM + (x - m_oldM)/m_n;
|
||||||
|
m_newS = m_oldS + (x - m_oldM)*(x - m_newM);
|
||||||
|
// set up for next iteration
|
||||||
|
m_oldM = m_newM;
|
||||||
|
m_oldS = m_newS;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
int num() const {
|
||||||
|
return m_n;
|
||||||
|
}
|
||||||
|
|
||||||
|
double tot() const {
|
||||||
|
return m_tot;
|
||||||
|
}
|
||||||
|
|
||||||
|
double mean() const {
|
||||||
|
return (m_n > 0) ? m_newM : 0.0;
|
||||||
|
}
|
||||||
|
|
||||||
|
double variance() const {
|
||||||
|
return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 );
|
||||||
|
}
|
||||||
|
|
||||||
|
double stddev() const {
|
||||||
|
return sqrt(variance());
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
int m_n;
|
||||||
|
double m_tot;
|
||||||
|
double m_oldM, m_newM, m_oldS, m_newS;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates a set of metrics that we would like an aligner to keep
|
||||||
|
* track of, so that we can possibly use it to diagnose performance
|
||||||
|
* issues.
|
||||||
|
*/
|
||||||
|
class AlignerMetrics {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
AlignerMetrics() :
|
||||||
|
curBacktracks_(0),
|
||||||
|
curBwtOps_(0),
|
||||||
|
first_(true),
|
||||||
|
curIsLowEntropy_(false),
|
||||||
|
curIsHomoPoly_(false),
|
||||||
|
curHadRanges_(false),
|
||||||
|
curNumNs_(0),
|
||||||
|
reads_(0),
|
||||||
|
homoReads_(0),
|
||||||
|
lowEntReads_(0),
|
||||||
|
hiEntReads_(0),
|
||||||
|
alignedReads_(0),
|
||||||
|
unalignedReads_(0),
|
||||||
|
threeOrMoreNReads_(0),
|
||||||
|
lessThanThreeNRreads_(0),
|
||||||
|
bwtOpsPerRead_(),
|
||||||
|
backtracksPerRead_(),
|
||||||
|
bwtOpsPerHomoRead_(),
|
||||||
|
backtracksPerHomoRead_(),
|
||||||
|
bwtOpsPerLoEntRead_(),
|
||||||
|
backtracksPerLoEntRead_(),
|
||||||
|
bwtOpsPerHiEntRead_(),
|
||||||
|
backtracksPerHiEntRead_(),
|
||||||
|
bwtOpsPerAlignedRead_(),
|
||||||
|
backtracksPerAlignedRead_(),
|
||||||
|
bwtOpsPerUnalignedRead_(),
|
||||||
|
backtracksPerUnalignedRead_(),
|
||||||
|
bwtOpsPer0nRead_(),
|
||||||
|
backtracksPer0nRead_(),
|
||||||
|
bwtOpsPer1nRead_(),
|
||||||
|
backtracksPer1nRead_(),
|
||||||
|
bwtOpsPer2nRead_(),
|
||||||
|
backtracksPer2nRead_(),
|
||||||
|
bwtOpsPer3orMoreNRead_(),
|
||||||
|
backtracksPer3orMoreNRead_(),
|
||||||
|
timer_(cout, "", false)
|
||||||
|
{ }
|
||||||
|
|
||||||
|
void printSummary() {
|
||||||
|
if(!first_) {
|
||||||
|
finishRead();
|
||||||
|
}
|
||||||
|
cout << "AlignerMetrics:" << endl;
|
||||||
|
cout << " # Reads: " << reads_ << endl;
|
||||||
|
float hopct = (reads_ > 0) ? (((float)homoReads_)/((float)reads_)) : (0.0f);
|
||||||
|
hopct *= 100.0f;
|
||||||
|
cout << " % homo-polymeric: " << (hopct) << endl;
|
||||||
|
float lopct = (reads_ > 0) ? ((float)lowEntReads_/(float)(reads_)) : (0.0f);
|
||||||
|
lopct *= 100.0f;
|
||||||
|
cout << " % low-entropy: " << (lopct) << endl;
|
||||||
|
float unpct = (reads_ > 0) ? ((float)unalignedReads_/(float)(reads_)) : (0.0f);
|
||||||
|
unpct *= 100.0f;
|
||||||
|
cout << " % unaligned: " << (unpct) << endl;
|
||||||
|
float npct = (reads_ > 0) ? ((float)threeOrMoreNReads_/(float)(reads_)) : (0.0f);
|
||||||
|
npct *= 100.0f;
|
||||||
|
cout << " % with 3 or more Ns: " << (npct) << endl;
|
||||||
|
cout << endl;
|
||||||
|
cout << " Total BWT ops: avg: " << bwtOpsPerRead_.mean() << ", stddev: " << bwtOpsPerRead_.stddev() << endl;
|
||||||
|
cout << " Total Backtracks: avg: " << backtracksPerRead_.mean() << ", stddev: " << backtracksPerRead_.stddev() << endl;
|
||||||
|
time_t elapsed = timer_.elapsed();
|
||||||
|
cout << " BWT ops per second: " << (bwtOpsPerRead_.tot()/elapsed) << endl;
|
||||||
|
cout << " Backtracks per second: " << (backtracksPerRead_.tot()/elapsed) << endl;
|
||||||
|
cout << endl;
|
||||||
|
cout << " Homo-poly:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPerHomoRead_.mean() << ", stddev: " << bwtOpsPerHomoRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPerHomoRead_.mean() << ", stddev: " << backtracksPerHomoRead_.stddev() << endl;
|
||||||
|
cout << " Low-entropy:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPerLoEntRead_.mean() << ", stddev: " << bwtOpsPerLoEntRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPerLoEntRead_.mean() << ", stddev: " << backtracksPerLoEntRead_.stddev() << endl;
|
||||||
|
cout << " High-entropy:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPerHiEntRead_.mean() << ", stddev: " << bwtOpsPerHiEntRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPerHiEntRead_.mean() << ", stddev: " << backtracksPerHiEntRead_.stddev() << endl;
|
||||||
|
cout << endl;
|
||||||
|
cout << " Unaligned:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPerUnalignedRead_.mean() << ", stddev: " << bwtOpsPerUnalignedRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPerUnalignedRead_.mean() << ", stddev: " << backtracksPerUnalignedRead_.stddev() << endl;
|
||||||
|
cout << " Aligned:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPerAlignedRead_.mean() << ", stddev: " << bwtOpsPerAlignedRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPerAlignedRead_.mean() << ", stddev: " << backtracksPerAlignedRead_.stddev() << endl;
|
||||||
|
cout << endl;
|
||||||
|
cout << " 0 Ns:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPer0nRead_.mean() << ", stddev: " << bwtOpsPer0nRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPer0nRead_.mean() << ", stddev: " << backtracksPer0nRead_.stddev() << endl;
|
||||||
|
cout << " 1 N:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPer1nRead_.mean() << ", stddev: " << bwtOpsPer1nRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPer1nRead_.mean() << ", stddev: " << backtracksPer1nRead_.stddev() << endl;
|
||||||
|
cout << " 2 Ns:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPer2nRead_.mean() << ", stddev: " << bwtOpsPer2nRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPer2nRead_.mean() << ", stddev: " << backtracksPer2nRead_.stddev() << endl;
|
||||||
|
cout << " >2 Ns:" << endl;
|
||||||
|
cout << " BWT ops: avg: " << bwtOpsPer3orMoreNRead_.mean() << ", stddev: " << bwtOpsPer3orMoreNRead_.stddev() << endl;
|
||||||
|
cout << " Backtracks: avg: " << backtracksPer3orMoreNRead_.mean() << ", stddev: " << backtracksPer3orMoreNRead_.stddev() << endl;
|
||||||
|
cout << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
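 * Begin tracking a new read: commit the previous read's statistics (if
 * any), reset the per-read counters and count the Ns in the read.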
|
||||||
|
*/
|
||||||
|
void nextRead(const BTDnaString& read) {
|
||||||
|
if(!first_) {
|
||||||
|
finishRead();
|
||||||
|
}
|
||||||
|
first_ = false;
|
||||||
|
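// Entropy calculation is disabled; ent is fixed at 0, so every read is
// currently classified as homo-polymeric.
//float ent = entropyDna5(read);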
|
||||||
|
float ent = 0.0f;
|
||||||
|
curIsLowEntropy_ = (ent < 0.75f);
|
||||||
|
curIsHomoPoly_ = (ent < 0.001f);
|
||||||
|
curHadRanges_ = false;
|
||||||
|
curBwtOps_ = 0;
|
||||||
|
curBacktracks_ = 0;
|
||||||
|
// Count Ns
|
||||||
|
curNumNs_ = 0;
|
||||||
|
const size_t len = read.length();
|
||||||
|
for(size_t i = 0; i < len; i++) {
|
||||||
|
if((int)read[i] == 4) curNumNs_++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Note that at least one range was reported for the current read.
|
||||||
|
*/
|
||||||
|
void setReadHasRange() {
|
||||||
|
curHadRanges_ = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Commit the running statistics for the current read to the aggregate
 * counters and per-category distributions.
|
||||||
|
*/
|
||||||
|
void finishRead() {
|
||||||
|
reads_++;
|
||||||
|
if(curIsHomoPoly_) homoReads_++;
|
||||||
|
else if(curIsLowEntropy_) lowEntReads_++;
|
||||||
|
else hiEntReads_++;
|
||||||
|
if(curHadRanges_) alignedReads_++;
|
||||||
|
else unalignedReads_++;
|
||||||
|
bwtOpsPerRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerRead_.push((float)curBacktracks_);
|
||||||
|
// Drill down by entropy
|
||||||
|
if(curIsHomoPoly_) {
|
||||||
|
bwtOpsPerHomoRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerHomoRead_.push((float)curBacktracks_);
|
||||||
|
} else if(curIsLowEntropy_) {
|
||||||
|
bwtOpsPerLoEntRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerLoEntRead_.push((float)curBacktracks_);
|
||||||
|
} else {
|
||||||
|
bwtOpsPerHiEntRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerHiEntRead_.push((float)curBacktracks_);
|
||||||
|
}
|
||||||
|
// Drill down by whether it aligned
|
||||||
|
if(curHadRanges_) {
|
||||||
|
bwtOpsPerAlignedRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerAlignedRead_.push((float)curBacktracks_);
|
||||||
|
} else {
|
||||||
|
bwtOpsPerUnalignedRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPerUnalignedRead_.push((float)curBacktracks_);
|
||||||
|
}
|
||||||
|
if(curNumNs_ == 0) {
|
||||||
|
lessThanThreeNRreads_++;
|
||||||
|
bwtOpsPer0nRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPer0nRead_.push((float)curBacktracks_);
|
||||||
|
} else if(curNumNs_ == 1) {
|
||||||
|
lessThanThreeNRreads_++;
|
||||||
|
bwtOpsPer1nRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPer1nRead_.push((float)curBacktracks_);
|
||||||
|
} else if(curNumNs_ == 2) {
|
||||||
|
lessThanThreeNRreads_++;
|
||||||
|
bwtOpsPer2nRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPer2nRead_.push((float)curBacktracks_);
|
||||||
|
} else {
|
||||||
|
threeOrMoreNReads_++;
|
||||||
|
bwtOpsPer3orMoreNRead_.push((float)curBwtOps_);
|
||||||
|
backtracksPer3orMoreNRead_.push((float)curBacktracks_);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Running-total of the number of backtracks and BWT ops for the
|
||||||
|
// current read
|
||||||
|
uint32_t curBacktracks_;
|
||||||
|
uint32_t curBwtOps_;
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
bool first_;
|
||||||
|
|
||||||
|
// true iff the current read is low entropy
|
||||||
|
bool curIsLowEntropy_;
|
||||||
|
// true if current read is all 1 char (or very close)
|
||||||
|
bool curIsHomoPoly_;
|
||||||
|
// true iff the current read has had one or more ranges reported
|
||||||
|
bool curHadRanges_;
|
||||||
|
// number of Ns in current read
|
||||||
|
int curNumNs_;
|
||||||
|
|
||||||
|
// # reads
|
||||||
|
uint32_t reads_;
|
||||||
|
// # homo-poly reads
|
||||||
|
uint32_t homoReads_;
|
||||||
|
// # low-entropy reads
|
||||||
|
uint32_t lowEntReads_;
|
||||||
|
// # high-entropy reads
|
||||||
|
uint32_t hiEntReads_;
|
||||||
|
// # reads with alignments
|
||||||
|
uint32_t alignedReads_;
|
||||||
|
// # reads without alignments
|
||||||
|
uint32_t unalignedReads_;
|
||||||
|
// # reads with 3 or more Ns
|
||||||
|
uint32_t threeOrMoreNReads_;
|
||||||
|
// # reads with < 3 Ns
|
||||||
|
uint32_t lessThanThreeNRreads_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per read
|
||||||
|
RunningStat bwtOpsPerRead_;
|
||||||
|
RunningStat backtracksPerRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per homo-poly read
|
||||||
|
RunningStat bwtOpsPerHomoRead_;
|
||||||
|
RunningStat backtracksPerHomoRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per low-entropy read
|
||||||
|
RunningStat bwtOpsPerLoEntRead_;
|
||||||
|
RunningStat backtracksPerLoEntRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per high-entropy read
|
||||||
|
RunningStat bwtOpsPerHiEntRead_;
|
||||||
|
RunningStat backtracksPerHiEntRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per read that "aligned" (for
|
||||||
|
// which a range was arrived at - range may not have necessarily
|
||||||
|
// led to an alignment)
|
||||||
|
RunningStat bwtOpsPerAlignedRead_;
|
||||||
|
RunningStat backtracksPerAlignedRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations per read that didn't align
|
||||||
|
RunningStat bwtOpsPerUnalignedRead_;
|
||||||
|
RunningStat backtracksPerUnalignedRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations/backtracks per read with no Ns
|
||||||
|
RunningStat bwtOpsPer0nRead_;
|
||||||
|
RunningStat backtracksPer0nRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations/backtracks per read with one N
|
||||||
|
RunningStat bwtOpsPer1nRead_;
|
||||||
|
RunningStat backtracksPer1nRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations/backtracks per read with two Ns
|
||||||
|
RunningStat bwtOpsPer2nRead_;
|
||||||
|
RunningStat backtracksPer2nRead_;
|
||||||
|
|
||||||
|
// Distribution of BWT operations/backtracks per read with three or
|
||||||
|
// more Ns
|
||||||
|
RunningStat bwtOpsPer3orMoreNRead_;
|
||||||
|
RunningStat backtracksPer3orMoreNRead_;
|
||||||
|
|
||||||
|
Timer timer_;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /* ALIGNER_METRICS_H_ */
|
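`RunningStat` in `aligner_metrics.h` uses the running mean and variance recurrence that its comment attributes to Knuth. For readers who want to check the recurrence in isolation, here is a compact sketch of the same update rule. The class name `RunningStatSketch` is hypothetical, and unlike the original it takes `double` inputs and zero-initializes every member.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical, simplified variant of RunningStat: one-pass mean and sample
// variance using the recurrence
//   M_k = M_{k-1} + (x_k - M_{k-1}) / k
//   S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
class RunningStatSketch {
public:
    void push(double x) {
        n_++;
        double delta = x - mean_;   // x_k - M_{k-1}
        mean_ += delta / n_;        // M_k
        s_ += delta * (x - mean_);  // S_k
    }
    std::size_t num() const { return n_; }
    double mean() const { return (n_ > 0) ? mean_ : 0.0; }
    // Sample variance (divides by n - 1), as in the original class.
    double variance() const { return (n_ > 1) ? s_ / (n_ - 1) : 0.0; }
    double stddev() const { return std::sqrt(variance()); }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double s_ = 0.0;
};
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sum of squared deviations is 32, so the sample variance is 32/7. Starting from `mean_ = 0` gives the same result as the original's special case for the first element, since the first update sets the mean to `x` and adds zero to `S`.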
35
aligner_report.h
Normal file
@ -0,0 +1,35 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_REPORT_H_
|
||||||
|
#define ALIGNER_REPORT_H_
|
||||||
|
|
||||||
|
#include "aligner_cache.h"
|
||||||
|
|
||||||
|
class Reporter {
|
||||||
|
public:
|
||||||
|
/**
|
||||||
|
 * Stub: ignores its arguments and always returns true (don't retry).
|
||||||
|
*/
|
||||||
|
bool report(const AlignmentCacheIface<uint32_t>& cache, const QVal<uint32_t>& qv) {
|
||||||
|
return true; // don't retry
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /*ALIGNER_REPORT_H_*/
|
2162
aligner_result.cpp
Normal file
File diff suppressed because it is too large
2325
aligner_result.h
Normal file
File diff suppressed because it is too large
530
aligner_seed.cpp
Normal file
@ -0,0 +1,530 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "aligner_cache.h"
|
||||||
|
#include "aligner_seed.h"
|
||||||
|
#include "search_globals.h"
|
||||||
|
#include "gfm.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Construct a constraint with no edits of any kind allowed.
|
||||||
|
*/
|
||||||
|
Constraint Constraint::exact() {
|
||||||
|
Constraint c;
|
||||||
|
c.edits = c.mms = c.ins = c.dels = c.penalty = 0;
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Construct a constraint where the only constraint is a total
|
||||||
|
* penalty constraint.
|
||||||
|
*/
|
||||||
|
Constraint Constraint::penaltyBased(int pen) {
|
||||||
|
Constraint c;
|
||||||
|
c.penalty = pen;
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Construct a constraint where the only constraint is a total
|
||||||
|
* penalty constraint related to the length of the read.
|
||||||
|
*/
|
||||||
|
Constraint Constraint::penaltyFuncBased(const SimpleFunc& f) {
|
||||||
|
Constraint c;
|
||||||
|
c.penFunc = f;
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Construct a constraint that allows up to 'mms' mismatches and nothing else.
|
||||||
|
*/
|
||||||
|
Constraint Constraint::mmBased(int mms) {
|
||||||
|
Constraint c;
|
||||||
|
c.mms = mms;
|
||||||
|
c.edits = c.dels = c.ins = 0;
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Construct a constraint that allows up to 'edits' edits of any kind.
|
||||||
|
*/
|
||||||
|
Constraint Constraint::editBased(int edits) {
|
||||||
|
Constraint c;
|
||||||
|
c.edits = edits;
|
||||||
|
c.dels = c.ins = c.mms = 0;
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
//
|
||||||
|
// Some static methods for constructing some standard SeedPolicies
|
||||||
|
//
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a read, depth and orientation, extract a seed data structure
|
||||||
|
* from the read and fill in the steps & zones arrays. The Seed
|
||||||
|
* contains the sequence and quality values.
|
||||||
|
*/
|
||||||
|
bool
|
||||||
|
Seed::instantiate(
|
||||||
|
const Read& read,
|
||||||
|
const BTDnaString& seq, // seed read sequence
|
||||||
|
const BTString& qual, // seed quality sequence
|
||||||
|
const Scoring& pens,
|
||||||
|
int depth,
|
||||||
|
int seedoffidx,
|
||||||
|
int seedtypeidx,
|
||||||
|
bool fw,
|
||||||
|
InstantiatedSeed& is) const
|
||||||
|
{
|
||||||
|
assert(overall != NULL);
|
||||||
|
int seedlen = len;
|
||||||
|
if((int)read.length() < seedlen) {
|
||||||
|
// Shrink seed length to fit read if necessary
|
||||||
|
seedlen = (int)read.length();
|
||||||
|
}
|
||||||
|
assert_gt(seedlen, 0);
|
||||||
|
is.steps.resize(seedlen);
|
||||||
|
is.zones.resize(seedlen);
|
||||||
|
// Fill in 'steps' and 'zones'
|
||||||
|
//
|
||||||
|
// The 'steps' list indicates which read character should be
|
||||||
|
// incorporated at each step of the search process. Often we will
|
||||||
|
// simply proceed from one end to the other, in which case the
|
||||||
|
// 'steps' list is ascending or descending. In some cases (e.g.
|
||||||
|
// the 2mm case), we might want to switch directions at least once
|
||||||
|
// during the search, in which case 'steps' will jump in the
|
||||||
|
// middle. A negative element of the 'steps' list indicates a
// right-to-left step and a positive element a left-to-right step;
// in both cases the read offset is abs(step)-1.
|
||||||
|
//
|
||||||
|
// The 'zones' list indicates which zone constraint is active at
|
||||||
|
// each step. Each element of the 'zones' list is a pair; the
|
||||||
|
// first pair element indicates the applicable zone when
|
||||||
|
// considering either mismatch or delete (ref gap) events, while
|
||||||
|
// the second pair element indicates the applicable zone when
|
||||||
|
// considering insertion (read gap) events. When either pair
|
||||||
|
// element is a negative number, that indicates that we are about
|
||||||
|
// to leave the zone for good, at which point we may need to
|
||||||
|
// evaluate whether we have reached the zone's budget.
|
||||||
|
//
|
||||||
|
switch(type) {
|
||||||
|
case SEED_TYPE_EXACT: {
|
||||||
|
for(int k = 0; k < seedlen; k++) {
|
||||||
|
is.steps[k] = -(seedlen - k);
|
||||||
|
// Zone 0 all the way
|
||||||
|
is.zones[k].first = is.zones[k].second = 0;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
case SEED_TYPE_LEFT_TO_RIGHT: {
|
||||||
|
for(int k = 0; k < seedlen; k++) {
|
||||||
|
is.steps[k] = k+1;
|
||||||
|
// Zone 0 from 0 up to ceil(len/2), then 1
|
||||||
|
is.zones[k].first = is.zones[k].second = ((k < (seedlen+1)/2) ? 0 : 1);
|
||||||
|
}
|
||||||
|
// Zone 1 ends at the RHS
|
||||||
|
is.zones[seedlen-1].first = is.zones[seedlen-1].second = -1;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
case SEED_TYPE_RIGHT_TO_LEFT: {
|
||||||
|
for(int k = 0; k < seedlen; k++) {
|
||||||
|
is.steps[k] = -(seedlen - k);
|
||||||
|
// Zone 0 from 0 up to floor(len/2), then 1
|
||||||
|
is.zones[k].first = ((k < seedlen/2) ? 0 : 1);
|
||||||
|
// Inserts: Zone 0 from 0 up to ceil(len/2)+1, then 1
|
||||||
|
is.zones[k].second = ((k < (seedlen+1)/2+1) ? 0 : 1);
|
||||||
|
}
|
||||||
|
is.zones[seedlen-1].first = is.zones[seedlen-1].second = -1;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
case SEED_TYPE_INSIDE_OUT: {
|
||||||
|
// Zone 0 from ceil(N/4) up to N-floor(N/4)
|
||||||
|
int step = 0;
|
||||||
|
for(int k = (seedlen+3)/4; k < seedlen - (seedlen/4); k++) {
|
||||||
|
is.zones[step].first = is.zones[step].second = 0;
|
||||||
|
is.steps[step++] = k+1;
|
||||||
|
}
|
||||||
|
// Zone 1 from N-floor(N/4) up
|
||||||
|
for(int k = seedlen - (seedlen/4); k < seedlen; k++) {
|
||||||
|
is.zones[step].first = is.zones[step].second = 1;
|
||||||
|
is.steps[step++] = k+1;
|
||||||
|
}
|
||||||
|
// No Zone 1 if seedlen is short (like 2)
|
||||||
|
//assert_eq(1, is.zones[step-1].first);
|
||||||
|
is.zones[step-1].first = is.zones[step-1].second = -1;
|
||||||
|
// Zone 2 from ((seedlen+3)/4)-1 down to 0
|
||||||
|
for(int k = ((seedlen+3)/4)-1; k >= 0; k--) {
|
||||||
|
is.zones[step].first = is.zones[step].second = 2;
|
||||||
|
is.steps[step++] = -(k+1);
|
||||||
|
}
|
||||||
|
assert_eq(2, is.zones[step-1].first);
|
||||||
|
is.zones[step-1].first = is.zones[step-1].second = -2;
|
||||||
|
assert_eq(seedlen, step);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
default:
|
||||||
|
throw 1;
|
||||||
|
}
|
||||||
|
// Instantiate constraints
|
||||||
|
for(int i = 0; i < 3; i++) {
|
||||||
|
is.cons[i] = zones[i];
|
||||||
|
is.cons[i].instantiate(read.length());
|
||||||
|
}
|
||||||
|
is.overall = *overall;
|
||||||
|
is.overall.instantiate(read.length());
|
||||||
|
// Take a sweep through the seed sequence. Consider where the Ns
|
||||||
|
// occur and how zones are laid out. Calculate the maximum number
|
||||||
|
// of positions we can jump over initially (e.g. with the ftab) and
|
||||||
|
// perhaps set this function's return value to false, indicating
|
||||||
|
// that the arrangements of Ns prevents the seed from aligning.
|
||||||
|
bool streak = true;
|
||||||
|
is.maxjump = 0;
|
||||||
|
bool ret = true;
|
||||||
|
bool ltr = (is.steps[0] > 0); // true -> left-to-right
|
||||||
|
for(size_t i = 0; i < is.steps.size(); i++) {
|
||||||
|
assert_neq(0, is.steps[i]);
|
||||||
|
int off = is.steps[i];
|
||||||
|
off = abs(off)-1;
|
||||||
|
Constraint& cons = is.cons[abs(is.zones[i].first)];
|
||||||
|
int c = seq[off]; assert_range(0, 4, c);
|
||||||
|
int q = qual[off];
|
||||||
|
if(ltr != (is.steps[i] > 0) || // changed direction
|
||||||
|
is.zones[i].first != 0 || // changed zone
|
||||||
|
is.zones[i].second != 0) // changed zone
|
||||||
|
{
|
||||||
|
streak = false;
|
||||||
|
}
|
||||||
|
if(c == 4) {
|
||||||
|
// Induced mismatch
|
||||||
|
if(cons.canN(q, pens)) {
|
||||||
|
cons.chargeN(q, pens);
|
||||||
|
} else {
|
||||||
|
// Seed disqualified due to arrangement of Ns
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(streak) is.maxjump++;
|
||||||
|
}
|
||||||
|
is.seedoff = depth;
|
||||||
|
is.seedoffidx = seedoffidx;
|
||||||
|
is.fw = fw;
|
||||||
|
is.s = *this;
|
||||||
|
return ret;
|
||||||
|
}
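The layout that the SEED_TYPE_INSIDE_OUT case produces can be sketched in isolation. This is a minimal sketch, not the project's code: `SeedLayout`, `insideOutLayout` and `readOffset` are hypothetical names introduced for illustration, and the seed length is assumed to be at least 2.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 'steps' and 'zones' members of
// InstantiatedSeed; not part of the original sources.
struct SeedLayout {
	std::vector<int> steps;
	std::vector<std::pair<int, int> > zones;
};

// Read offset addressed by a step: the sign encodes only the direction.
int readOffset(int step) { return std::abs(step) - 1; }

// Mirrors the inside-out case for a seed of length n (assumed n >= 2).
SeedLayout insideOutLayout(int n) {
	SeedLayout l;
	l.steps.resize(n);
	l.zones.resize(n);
	int step = 0;
	// Zone 0: the middle of the seed, extended left-to-right
	for(int k = (n+3)/4; k < n - (n/4); k++) {
		l.zones[step] = std::make_pair(0, 0);
		l.steps[step++] = k+1;
	}
	// Zone 1: the right-most quarter, still left-to-right
	for(int k = n - (n/4); k < n; k++) {
		l.zones[step] = std::make_pair(1, 1);
		l.steps[step++] = k+1;
	}
	// A negative zone marks the last step before leaving that zone
	if(step > 0) l.zones[step-1] = std::make_pair(-1, -1);
	// Zone 2: the left-most quarter, extended right-to-left
	for(int k = ((n+3)/4)-1; k >= 0; k--) {
		l.zones[step] = std::make_pair(2, 2);
		l.steps[step++] = -(k+1);
	}
	if(step > 0) l.zones[step-1] = std::make_pair(-2, -2);
	return l;
}
```

For a seed of length 8 the sketch yields steps 3, 4, 5, 6, 7, 8, -2, -1, so the search starts at read offset 2, runs to the right end, then switches direction and finishes at offset 0.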
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a set consisting of 1 seed encapsulating an exact matching
|
||||||
|
* strategy.
|
||||||
|
*/
|
||||||
|
void
|
||||||
|
Seed::zeroMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
|
||||||
|
oall.init();
|
||||||
|
// Seed policy 1: exact end-to-end search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_EXACT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::exact();
|
||||||
|
pols.back().zones[2] = Constraint::exact(); // not used
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a set of 2 seeds encapsulating a half-and-half 1mm strategy.
|
||||||
|
*/
|
||||||
|
void
|
||||||
|
Seed::oneMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
|
||||||
|
oall.init();
|
||||||
|
// Seed policy 1: left-to-right search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_LEFT_TO_RIGHT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::mmBased(1);
|
||||||
|
pols.back().zones[2] = Constraint::exact(); // not used
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
// Seed policy 2: right-to-left search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_RIGHT_TO_LEFT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::mmBased(1);
|
||||||
|
pols.back().zones[1].mmsCeil = 0;
|
||||||
|
pols.back().zones[2] = Constraint::exact(); // not used
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a set of 3 seeds encapsulating search roots for:
|
||||||
|
*
|
||||||
|
* 1. Starting from the left-hand side and searching toward the
|
||||||
|
* right-hand side allowing 2 mismatches in the right half.
|
||||||
|
* 2. Starting from the right-hand side and searching toward the
|
||||||
|
* left-hand side allowing 2 mismatches in the left half.
|
||||||
|
* 3. Starting (effectively) from the center and searching out toward
|
||||||
|
* both the left and right-hand sides, allowing one mismatch on
|
||||||
|
* either side.
|
||||||
|
*
|
||||||
|
* This is not exhaustive. Some 2-mismatch cases are missed; if you
|
||||||
|
* imagine the seed as divided into four successive quarters A, B, C
|
||||||
|
* and D, the cases we miss are when mismatches occur in A and C or B
|
||||||
|
* and D.
|
||||||
|
*/
|
||||||
|
void
|
||||||
|
Seed::twoMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
|
||||||
|
oall.init();
|
||||||
|
// Seed policy 1: left-to-right search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_LEFT_TO_RIGHT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::mmBased(2);
|
||||||
|
pols.back().zones[2] = Constraint::exact(); // not used
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
// Seed policy 2: right-to-left search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_RIGHT_TO_LEFT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::mmBased(2);
|
||||||
|
pols.back().zones[1].mmsCeil = 1; // Must have used at least 1 mismatch
|
||||||
|
pols.back().zones[2] = Constraint::exact(); // not used
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
// Seed policy 3: inside-out search
|
||||||
|
pols.expand();
|
||||||
|
pols.back().len = ln;
|
||||||
|
pols.back().type = SEED_TYPE_INSIDE_OUT;
|
||||||
|
pols.back().zones[0] = Constraint::exact();
|
||||||
|
pols.back().zones[1] = Constraint::mmBased(1);
|
||||||
|
pols.back().zones[1].mmsCeil = 0; // Must have used at least 1 mismatch
|
||||||
|
pols.back().zones[2] = Constraint::mmBased(1);
|
||||||
|
pols.back().zones[2].mmsCeil = 0; // Must have used at least 1 mismatch
|
||||||
|
pols.back().overall = &oall;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Types of actions that can be taken by the SeedAligner.
|
||||||
|
*/
|
||||||
|
enum {
|
||||||
|
SA_ACTION_TYPE_RESET = 1,
|
||||||
|
SA_ACTION_TYPE_SEARCH_SEED, // 2
|
||||||
|
SA_ACTION_TYPE_FTAB, // 3
|
||||||
|
SA_ACTION_TYPE_FCHR, // 4
|
||||||
|
SA_ACTION_TYPE_MATCH, // 5
|
||||||
|
SA_ACTION_TYPE_EDIT // 6
|
||||||
|
};
|
||||||
|
|
||||||
|
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
|
||||||
|
|
||||||
|
#ifdef ALIGNER_SEED_MAIN
|
||||||
|
|
||||||
|
#include <getopt.h>
|
||||||
|
#include <string>
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parse an int out of 'arg'; if parsing fails, output the given error
* message and throw an exception.
|
||||||
|
*/
|
||||||
|
static int parseInt(const char *errmsg, const char *arg) {
|
||||||
|
long l;
|
||||||
|
char *endPtr = NULL;
|
||||||
|
l = strtol(arg, &endPtr, 10);
|
||||||
|
if (endPtr != arg && *endPtr == '\0') { // the whole string was consumed
|
||||||
|
return (int32_t)l;
|
||||||
|
}
|
||||||
|
cerr << errmsg << endl;
|
||||||
|
throw 1;
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
enum {
|
||||||
|
ARG_NOFW = 256,
|
||||||
|
ARG_NORC,
|
||||||
|
ARG_MM,
|
||||||
|
ARG_SHMEM,
|
||||||
|
ARG_TESTS,
|
||||||
|
ARG_RANDOM_TESTS,
|
||||||
|
ARG_SEED
|
||||||
|
};
|
||||||
|
|
||||||
|
static const char *short_opts = "vCt";
|
||||||
|
static struct option long_opts[] = {
|
||||||
|
{(char*)"verbose", no_argument, 0, 'v'},
|
||||||
|
{(char*)"color", no_argument, 0, 'C'},
|
||||||
|
{(char*)"timing", no_argument, 0, 't'},
|
||||||
|
{(char*)"nofw", no_argument, 0, ARG_NOFW},
|
||||||
|
{(char*)"norc", no_argument, 0, ARG_NORC},
|
||||||
|
{(char*)"mm", no_argument, 0, ARG_MM},
|
||||||
|
{(char*)"shmem", no_argument, 0, ARG_SHMEM},
|
||||||
|
{(char*)"tests", no_argument, 0, ARG_TESTS},
|
||||||
|
{(char*)"random", required_argument, 0, ARG_RANDOM_TESTS},
|
||||||
|
{(char*)"seed", required_argument, 0, ARG_SEED},
{(char*)0, 0, 0, 0} // getopt_long requires an all-zero terminator
|
||||||
|
};
|
||||||
|
|
||||||
|
static void printUsage(ostream& os) {
|
||||||
|
os << "Usage: ac [options]* <index> <patterns>" << endl;
|
||||||
|
os << "Options:" << endl;
|
||||||
|
os << " --mm memory-mapped mode" << endl;
|
||||||
|
os << " --shmem shared memory mode" << endl;
|
||||||
|
os << " --nofw don't align forward-oriented read" << endl;
|
||||||
|
os << " --norc don't align reverse-complemented read" << endl;
|
||||||
|
os << " -t/--timing show timing information" << endl;
|
||||||
|
os << " -C/--color colorspace mode" << endl;
|
||||||
|
os << " -v/--verbose talkative mode" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool gNorc = false;
|
||||||
|
bool gNofw = false;
|
||||||
|
bool gColor = false;
|
||||||
|
int gVerbose = 0;
|
||||||
|
int gGapBarrier = 1;
|
||||||
|
bool gColorExEnds = true;
|
||||||
|
int gSnpPhred = 30;
|
||||||
|
bool gReportOverhangs = true;
|
||||||
|
|
||||||
|
extern void aligner_seed_tests();
|
||||||
|
extern void aligner_random_seed_tests(
|
||||||
|
int num_tests,
|
||||||
|
uint32_t qslo,
|
||||||
|
uint32_t qshi,
|
||||||
|
bool color,
|
||||||
|
uint32_t seed);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A way of feeding simple tests to the seed alignment infrastructure.
|
||||||
|
*/
|
||||||
|
int main(int argc, char **argv) {
|
||||||
|
bool useMm = false;
|
||||||
|
bool useShmem = false;
|
||||||
|
bool mmSweep = false;
|
||||||
|
bool noRefNames = false;
|
||||||
|
bool sanity = false;
|
||||||
|
bool timing = false;
|
||||||
|
int option_index = 0;
|
||||||
|
int seed = 777;
|
||||||
|
int next_option;
|
||||||
|
do {
|
||||||
|
next_option = getopt_long(
|
||||||
|
argc, argv, short_opts, long_opts, &option_index);
|
||||||
|
switch (next_option) {
|
||||||
|
case 'v': gVerbose = true; break;
|
||||||
|
case 'C': gColor = true; break;
|
||||||
|
case 't': timing = true; break;
|
||||||
|
case ARG_NOFW: gNofw = true; break;
|
||||||
|
case ARG_NORC: gNorc = true; break;
|
||||||
|
case ARG_MM: useMm = true; break;
|
||||||
|
case ARG_SHMEM: useShmem = true; break;
|
||||||
|
case ARG_SEED: seed = parseInt("", optarg); break;
|
||||||
|
case ARG_TESTS: {
|
||||||
|
aligner_seed_tests();
|
||||||
|
aligner_random_seed_tests(
|
||||||
|
100, // num references
|
||||||
|
100, // queries per reference lo
|
||||||
|
400, // queries per reference hi
|
||||||
|
false, // true -> generate colorspace reference/reads
|
||||||
|
18); // pseudo-random seed
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
case ARG_RANDOM_TESTS: {
|
||||||
|
seed = parseInt("", optarg);
|
||||||
|
aligner_random_seed_tests(
|
||||||
|
100, // num references
|
||||||
|
100, // queries per reference lo
|
||||||
|
400, // queries per reference hi
|
||||||
|
false, // true -> generate colorspace reference/reads
|
||||||
|
seed); // pseudo-random seed
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
case -1: break;
|
||||||
|
default: {
|
||||||
|
cerr << "Unknown option: " << (char)next_option << endl;
|
||||||
|
printUsage(cerr);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} while(next_option != -1);
|
||||||
|
char *reffn;
|
||||||
|
if(optind >= argc) {
|
||||||
|
cerr << "No reference; quitting..." << endl;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
reffn = argv[optind++];
|
||||||
|
if(optind >= argc) {
|
||||||
|
cerr << "No reads; quitting..." << endl;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
string gfmBase(reffn);
|
||||||
|
BitPairReference ref(
|
||||||
|
gfmBase, // base path
|
||||||
|
gColor, // whether we expect it to be colorspace
|
||||||
|
sanity, // whether to sanity-check reference as it's loaded
|
||||||
|
NULL, // fasta files to sanity check reference against
|
||||||
|
NULL, // another way of specifying original sequences
|
||||||
|
false, // true -> infiles (2 args ago) contains raw seqs
|
||||||
|
useMm, // use memory mapping to load index?
|
||||||
|
useShmem, // use shared memory (not memory mapping)
|
||||||
|
mmSweep, // touch all the pages after memory-mapping the index
|
||||||
|
gVerbose, // verbose
|
||||||
|
gVerbose); // verbose but just for startup messages
|
||||||
|
Timer *t = new Timer(cerr, "Time loading fw index: ", timing);
|
||||||
|
GFM gfmFw(
|
||||||
|
gfmBase,
|
||||||
|
0, // don't need entireReverse for fw index
|
||||||
|
true, // index is for the forward direction
|
||||||
|
-1, // offrate (irrelevant)
|
||||||
|
useMm, // whether to use memory-mapped files
|
||||||
|
useShmem, // whether to use shared memory
|
||||||
|
mmSweep, // sweep memory-mapped files
|
||||||
|
!noRefNames, // load names?
|
||||||
|
false, // load SA sample?
|
||||||
|
true, // load ftab?
|
||||||
|
true, // load rstarts?
|
||||||
|
NULL, // reference map, or NULL if none is needed
|
||||||
|
gVerbose, // whether to be talkative
|
||||||
|
gVerbose, // talkative during initialization
|
||||||
|
false, // handle memory exceptions, don't pass them up
|
||||||
|
sanity);
|
||||||
|
delete t;
|
||||||
|
t = new Timer(cerr, "Time loading bw index: ", timing);
|
||||||
|
GFM gfmBw(
|
||||||
|
gfmBase + ".rev",
|
||||||
|
1, // need entireReverse
|
||||||
|
false, // index is for the backward direction
|
||||||
|
-1, // offrate (irrelevant)
|
||||||
|
useMm, // whether to use memory-mapped files
|
||||||
|
useShmem, // whether to use shared memory
|
||||||
|
mmSweep, // sweep memory-mapped files
|
||||||
|
!noRefNames, // load names?
|
||||||
|
false, // load SA sample?
|
||||||
|
true, // load ftab?
|
||||||
|
false, // load rstarts?
|
||||||
|
NULL, // reference map, or NULL if none is needed
|
||||||
|
gVerbose, // whether to be talkative
|
||||||
|
gVerbose, // talkative during initialization
|
||||||
|
false, // handle memory exceptions, don't pass them up
|
||||||
|
sanity);
|
||||||
|
delete t;
|
||||||
|
for(int i = optind; i < argc; i++) {
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
|
2922
aligner_seed.h
Normal file
File diff suppressed because it is too large
1245
aligner_seed2.cpp
Normal file
File diff suppressed because it is too large
4291
aligner_seed2.h
Normal file
File diff suppressed because it is too large
916
aligner_seed_policy.cpp
Normal file
@ -0,0 +1,916 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <string>
|
||||||
|
#include <iostream>
|
||||||
|
#include <sstream>
|
||||||
|
#include <limits>
|
||||||
|
#include "ds.h"
|
||||||
|
#include "aligner_seed_policy.h"
|
||||||
|
#include "mem_ids.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
static int parseFuncType(const std::string& otype) {
|
||||||
|
string type = otype;
|
||||||
|
if(type == "C" || type == "Constant") {
|
||||||
|
return SIMPLE_FUNC_CONST;
|
||||||
|
} else if(type == "L" || type == "Linear") {
|
||||||
|
return SIMPLE_FUNC_LINEAR;
|
||||||
|
} else if(type == "S" || type == "Sqrt") {
|
||||||
|
return SIMPLE_FUNC_SQRT;
|
||||||
|
} else if(type == "G" || type == "Log") {
|
||||||
|
return SIMPLE_FUNC_LOG;
|
||||||
|
}
|
||||||
|
std::cerr << "Error: Bad function type '" << otype.c_str()
|
||||||
|
<< "'. Should be C (constant), L (linear), "
|
||||||
|
<< "S (square root) or G (natural log)." << std::endl;
|
||||||
|
throw 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define PARSE_FUNC(fv) { \
|
||||||
|
if(ctoks.size() >= 1) { \
|
||||||
|
fv.setType(parseFuncType(ctoks[0])); \
|
||||||
|
} \
|
||||||
|
if(ctoks.size() >= 2) { \
|
||||||
|
double co; \
|
||||||
|
istringstream tmpss(ctoks[1]); \
|
||||||
|
tmpss >> co; \
|
||||||
|
fv.setConst(co); \
|
||||||
|
} \
|
||||||
|
if(ctoks.size() >= 3) { \
|
||||||
|
double ce; \
|
||||||
|
istringstream tmpss(ctoks[2]); \
|
||||||
|
tmpss >> ce; \
|
||||||
|
fv.setCoeff(ce); \
|
||||||
|
} \
|
||||||
|
if(ctoks.size() >= 4) { \
|
||||||
|
double mn; \
|
||||||
|
istringstream tmpss(ctoks[3]); \
|
||||||
|
tmpss >> mn; \
|
||||||
|
fv.setMin(mn); \
|
||||||
|
} \
|
||||||
|
if(ctoks.size() >= 5) { \
|
||||||
|
double mx; \
|
||||||
|
istringstream tmpss(ctoks[4]); \
|
||||||
|
tmpss >> mx; \
|
||||||
|
fv.setMax(mx); \
|
||||||
|
} \
|
||||||
|
}
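The comma-separated form handled by PARSE_FUNC (type, constant, coefficient, then optional minimum and maximum) can be sketched without the macro. This is a minimal sketch under stated assumptions: `FuncSpec` and `parseFuncSpec` are hypothetical names, and the evaluation rule (constant plus coefficient times a function of the read length, clamped to the minimum and maximum) is an assumption about how such a function is applied, not a copy of SimpleFunc.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed function specification.
struct FuncSpec {
	char   type;      // 'C', 'L', 'S' or 'G'
	double constant;  // constant term
	double coeff;     // coefficient applied to g(x)
	double lo, hi;    // clamp bounds
	FuncSpec() :
		type('C'), constant(0.0), coeff(0.0),
		lo(-std::numeric_limits<double>::max()),
		hi(std::numeric_limits<double>::max()) { }
	// Assumed rule: clamp(constant + coeff * g(x)) with g chosen by type
	double eval(double x) const {
		double g = 0.0;
		if(type == 'L') g = x;
		else if(type == 'S') g = std::sqrt(x);
		else if(type == 'G') g = std::log(x);
		return std::max(lo, std::min(hi, constant + coeff * g));
	}
};

// Parse "<type>[,<const>[,<coeff>[,<min>[,<max>]]]]"; missing fields
// keep their defaults, as in the macro.
FuncSpec parseFuncSpec(const std::string& val) {
	std::vector<std::string> ctoks;
	std::string ctok;
	std::istringstream css(val);
	while(std::getline(css, ctok, ',')) ctoks.push_back(ctok);
	FuncSpec f;
	if(ctoks.size() >= 1 && !ctoks[0].empty()) f.type = ctoks[0][0];
	double* fields[] = { &f.constant, &f.coeff, &f.lo, &f.hi };
	for(size_t i = 1; i < ctoks.size() && i <= 4; i++) {
		std::istringstream tmpss(ctoks[i]);
		tmpss >> *fields[i-1];
	}
	return f;
}
```

Under these assumptions, `parseFuncSpec("L,0.0,0.15").eval(100)` gives a ceiling of about 15 for a 100-base read, and the fifth token sets the upper bound rather than overwriting the lower one.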
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parse alignment policy when provided in this format:
|
||||||
|
* <lab>=<val>;<lab>=<val>;<lab>=<val>...
|
||||||
|
*
|
||||||
|
* And label=value possibilities are:
|
||||||
|
*
|
||||||
|
* Bonus for a match
|
||||||
|
* -----------------
|
||||||
|
*
|
||||||
|
* MA=xx (default: MA=0, or MA=2 if --local is set)
|
||||||
|
*
|
||||||
|
* xx = Each position where equal read and reference characters match up
|
||||||
|
* in the alignment contributes this amount to the total score.
|
||||||
|
*
|
||||||
|
* Penalty for a mismatch
|
||||||
|
* ----------------------
|
||||||
|
*
|
||||||
|
* MMP={Cxx|Q|RQ} (default: MMP=C6)
|
||||||
|
*
|
||||||
|
* Cxx = Each mismatch costs xx. If MMP=Cxx is specified, quality
|
||||||
|
* values are ignored when assessing penalties for mismatches.
|
||||||
|
* Q = Each mismatch incurs a penalty equal to the mismatched base's
|
||||||
|
* quality value.
|
||||||
|
* RQ = Each mismatch incurs a penalty equal to the mismatched base's
|
||||||
|
* rounded quality value. Qualities are rounded off to the
|
||||||
|
* nearest 10, and qualities greater than 30 are rounded to 30.
|
||||||
|
*
|
||||||
|
* Penalty for position with N (in either read or reference)
|
||||||
|
* ---------------------------------------------------------
|
||||||
|
*
|
||||||
|
* NP={Cxx|Q|RQ} (default: NP=C1)
|
||||||
|
*
|
||||||
|
* Cxx = Each alignment position with an N in either the read or the
|
||||||
|
* reference costs xx. If NP=Cxx is specified, quality values are
|
||||||
|
* ignored when assessing penalties for Ns.
|
||||||
|
* Q = Each alignment position with an N in either the read or the
|
||||||
|
* reference incurs a penalty equal to the read base's quality
|
||||||
|
* value.
|
||||||
|
* RQ = Each alignment position with an N in either the read or the
|
||||||
|
* reference incurs a penalty equal to the read base's rounded
|
||||||
|
* quality value. Qualities are rounded off to the nearest 10,
|
||||||
|
* and qualities greater than 30 are rounded to 30.
|
||||||
|
*
|
||||||
|
* Penalty for a read gap
|
||||||
|
* ----------------------
|
||||||
|
*
|
||||||
|
* RDG=xx,yy (default: RDG=5,3)
|
||||||
|
*
|
||||||
|
* xx = Read gap open penalty.
|
||||||
|
* yy = Read gap extension penalty.
|
||||||
|
*
|
||||||
|
* Total cost incurred by a read gap = xx + (yy * gap length)
|
||||||
|
*
|
||||||
|
* Penalty for a reference gap
|
||||||
|
* ---------------------------
|
||||||
|
*
|
||||||
|
* RFG=xx,yy (default: RFG=5,3)
|
||||||
|
*
|
||||||
|
* xx = Reference gap open penalty.
|
||||||
|
* yy = Reference gap extension penalty.
|
||||||
|
*
|
||||||
|
* Total cost incurred by a reference gap = xx + (yy * gap length)
|
||||||
|
*
|
||||||
|
* Minimum score for valid alignment
|
||||||
|
* ---------------------------------
|
||||||
|
*
|
||||||
|
* MIN=xx,yy (defaults: MIN=-0.6,-0.6, or MIN=0.0,0.66 if --local is set)
|
||||||
|
*
|
||||||
|
* xx,yy = For a read of length N, the total score must be at least
|
||||||
|
* xx + (read length * yy) for the alignment to be valid. The
|
||||||
|
* total score is the sum of all negative penalties (from
|
||||||
|
* mismatches and gaps) and all positive bonuses. The minimum
|
||||||
|
* can be negative (and is by default in global alignment mode).
|
||||||
|
*
|
||||||
|
* Score floor for local alignment
|
||||||
|
* -------------------------------
|
||||||
|
*
|
||||||
|
* FL=xx,yy (defaults: FL=-Infinity,0.0, or FL=0.0,0.0 if --local is set)
|
||||||
|
*
|
||||||
|
* xx,yy = If a cell in the dynamic programming table has a score less
|
||||||
|
* than xx + (read length * yy), then no valid alignment can go
|
||||||
|
* through it. Defaults are highly recommended.
|
||||||
|
*
|
||||||
|
* N ceiling
|
||||||
|
* ---------
|
||||||
|
*
|
||||||
|
* NCEIL=xx,yy (default: NCEIL=0.0,0.15)
|
||||||
|
*
|
||||||
|
* xx,yy = For a read of length N, the number of alignment
|
||||||
|
* positions with an N in either the read or the
|
||||||
|
* reference cannot exceed
|
||||||
|
* ceiling = xx + (read length * yy). If the ceiling is
|
||||||
|
* exceeded, the alignment is considered invalid.
|
||||||
|
*
|
||||||
|
* Seeds
|
||||||
|
* -----
|
||||||
|
*
|
||||||
|
* SEED=mm,len,ival (default: SEED=0,22)
|
||||||
|
*
|
||||||
|
* mm = Maximum number of mismatches allowed within a seed.
|
||||||
|
* Must be >= 0 and <= 2. Note that 2-mismatch mode is
|
||||||
|
* not fully sensitive; i.e. some 2-mismatch seed
|
||||||
|
* alignments may be missed.
|
||||||
|
* len = Length of seed.
|
||||||
|
* ival = Interval between seeds. If not specified, seed
|
||||||
|
* interval is determined by IVAL.
|
||||||
|
*
|
||||||
|
* Seed interval
|
||||||
|
* -------------
|
||||||
|
*
|
||||||
|
* IVAL={L|S|C},xx,yy (default: IVAL=S,1.0,0.0)
|
||||||
|
*
|
||||||
|
* L = let interval between seeds be a linear function of the
|
||||||
|
* read length. xx and yy are the constant and linear
|
||||||
|
* coefficients respectively. In other words, the interval
|
||||||
|
* equals xx + (yy * len), where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
* S = let interval between seeds be a function of the square
|
||||||
|
* root of the read length. xx and yy are the
|
||||||
|
* coefficients. In other words, the interval equals
|
||||||
|
* xx + (yy * sqrt(len)), where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
* C = Like S but uses cube root of length instead of square
|
||||||
|
* root.
|
||||||
|
*
|
||||||
|
* Example 1:
|
||||||
|
*
|
||||||
|
* SEED=1,10,5 and read sequence is TGCTATCGTACGATCGTACA:
|
||||||
|
*
|
||||||
|
* The following seeds are extracted from the forward
|
||||||
|
* representation of the read and aligned to the reference
|
||||||
|
* allowing up to 1 mismatch:
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATCGTACA
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTA
|
||||||
|
* Seed 2+: TCGTACGATC
|
||||||
|
* Seed 3+: CGATCGTACA
|
||||||
|
*
|
||||||
|
* ...and the following are extracted from the reverse-complement
|
||||||
|
* representation of the read and aligned to the reference allowing
|
||||||
|
* up to 1 mismatch:
|
||||||
|
*
|
||||||
|
* Seed 1-: TACGATAGCA
|
||||||
|
* Seed 2-: GATCGTACGA
|
||||||
|
* Seed 3-: TGTACGATCG
|
||||||
|
*
|
||||||
|
* Example 2:
|
||||||
|
*
|
||||||
|
* SEED=1,20,20 and read sequence is TGCTATCGTACGATC. The seed
|
||||||
|
* length is 20 but the read is only 15 characters long. In this
|
||||||
|
* case, Bowtie2 automatically shrinks the seed length to be equal
|
||||||
|
* to the read length.
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATC
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTACGATC
|
||||||
|
* Seed 1-: GATCGTACGATAGCA
|
||||||
|
*
|
||||||
|
* Example 3:
|
||||||
|
*
|
||||||
|
* SEED=1,10,10 and read sequence is TGCTATCGTACGATC. Only one seed
|
||||||
|
* fits on the read; a second seed would overhang the end of the read
|
||||||
|
* by 5 positions. In this case, Bowtie2 extracts one seed.
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATC
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTA
|
||||||
|
* Seed 1-: TACGATAGCA
|
||||||
|
*/
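The three examples in the comment above can be reproduced with a short sketch. It assumes that seeds start at offsets 0, ival, 2*ival and so on, that a seed is kept only if it fits entirely on the read, and that the seed length is first shrunk to the read length; `extractSeeds` and `revComp` are hypothetical helpers introduced for illustration, not functions from this code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reverse-complement a DNA string; characters other than A, C, G, T
// are mapped to N.
std::string revComp(const std::string& s) {
	std::string rc(s.rbegin(), s.rend());
	for(size_t i = 0; i < rc.size(); i++) {
		switch(rc[i]) {
			case 'A': rc[i] = 'T'; break;
			case 'C': rc[i] = 'G'; break;
			case 'G': rc[i] = 'C'; break;
			case 'T': rc[i] = 'A'; break;
			default:  rc[i] = 'N'; break;
		}
	}
	return rc;
}

// Extract forward seeds of length 'len' every 'ival' positions.
std::vector<std::string> extractSeeds(
	const std::string& read,
	size_t len,
	size_t ival)
{
	std::vector<std::string> seeds;
	// Shrink the seed length to fit the read if necessary
	len = std::min(len, read.size());
	if(len == 0) return seeds;
	if(ival == 0) ival = 1; // intervals below 1 are treated as 1
	// Keep only seeds that do not overhang the end of the read
	for(size_t off = 0; off + len <= read.size(); off += ival) {
		seeds.push_back(read.substr(off, len));
	}
	return seeds;
}
```

Each reverse-complement seed in the examples is `revComp` applied to the forward seed with the same number, for instance `revComp("TGCTATCGTA")` is `TACGATAGCA`.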
|
||||||
|
void SeedAlignmentPolicy::parseString(
    const std::string& s,
    bool local,
    bool noisyHpolymer,
    bool ignoreQuals,
    int& bonusMatchType,
    int& bonusMatch,
    int& penMmcType,
    int& penMmcMax,
    int& penMmcMin,
    int& penScMax,
    int& penScMin,
    int& penNType,
    int& penN,
    int& penRdExConst,
    int& penRfExConst,
    int& penRdExLinear,
    int& penRfExLinear,
    SimpleFunc& costMin,
    SimpleFunc& nCeil,
    bool& nCatPair,
    int& multiseedMms,
    int& multiseedLen,
    SimpleFunc& multiseedIval,
    size_t& failStreak,
    size_t& seedRounds,
    SimpleFunc* penCanIntronLen,
    SimpleFunc* penNoncanIntronLen)
{

    bonusMatchType = local ? DEFAULT_MATCH_BONUS_TYPE_LOCAL : DEFAULT_MATCH_BONUS_TYPE;
    bonusMatch = local ? DEFAULT_MATCH_BONUS_LOCAL : DEFAULT_MATCH_BONUS;
    penMmcType = ignoreQuals ? DEFAULT_MM_PENALTY_TYPE_IGNORE_QUALS :
                               DEFAULT_MM_PENALTY_TYPE;
    penMmcMax = DEFAULT_MM_PENALTY_MAX;
    penMmcMin = DEFAULT_MM_PENALTY_MIN;
    penNType = DEFAULT_N_PENALTY_TYPE;
    penN = DEFAULT_N_PENALTY;

    penScMax = DEFAULT_SC_PENALTY_MAX;
    penScMin = DEFAULT_SC_PENALTY_MIN;

    const double DMAX = std::numeric_limits<double>::max();
    costMin.init(
        local ? SIMPLE_FUNC_LOG : SIMPLE_FUNC_LINEAR,
        local ? DEFAULT_MIN_CONST_LOCAL : 0.0f,
        local ? DEFAULT_MIN_LINEAR_LOCAL : -0.2f);
    nCeil.init(
        SIMPLE_FUNC_LINEAR, 0.0f, DMAX,
        DEFAULT_N_CEIL_CONST, DEFAULT_N_CEIL_LINEAR);
    multiseedIval.init(
        DEFAULT_IVAL, 1.0f, DMAX,
        DEFAULT_IVAL_B, DEFAULT_IVAL_A);
    nCatPair = DEFAULT_N_CAT_PAIR;

    if(!noisyHpolymer) {
        penRdExConst = DEFAULT_READ_GAP_CONST;
        penRdExLinear = DEFAULT_READ_GAP_LINEAR;
        penRfExConst = DEFAULT_REF_GAP_CONST;
        penRfExLinear = DEFAULT_REF_GAP_LINEAR;
    } else {
        penRdExConst = DEFAULT_READ_GAP_CONST_BADHPOLY;
        penRdExLinear = DEFAULT_READ_GAP_LINEAR_BADHPOLY;
        penRfExConst = DEFAULT_REF_GAP_CONST_BADHPOLY;
        penRfExLinear = DEFAULT_REF_GAP_LINEAR_BADHPOLY;
    }

    multiseedMms = DEFAULT_SEEDMMS;
    multiseedLen = DEFAULT_SEEDLEN;
|
||||||
|
|
||||||
|
EList<string> toks(MISC_CAT);
|
||||||
|
string tok;
|
||||||
|
istringstream ss(s);
|
||||||
|
int setting = 0;
|
||||||
|
// Get each ;-separated token
|
||||||
|
while(getline(ss, tok, ';')) {
|
||||||
|
setting++;
|
||||||
|
EList<string> etoks(MISC_CAT);
|
||||||
|
string etok;
|
||||||
|
// Divide into tokens on either side of =
|
||||||
|
istringstream ess(tok);
|
||||||
|
while(getline(ess, etok, '=')) {
|
||||||
|
etoks.push_back(etok);
|
||||||
|
}
|
||||||
|
// Must be exactly 1 =
|
||||||
|
if(etoks.size() != 2) {
|
||||||
|
cerr << "Error parsing alignment policy setting " << setting
|
||||||
|
<< "; must be bisected by = sign" << endl
|
||||||
|
<< "Policy: " << s.c_str() << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
// LHS is tag, RHS value
|
||||||
|
string tag = etoks[0], val = etoks[1];
|
||||||
|
// Separate value into comma-separated tokens
|
||||||
|
EList<string> ctoks(MISC_CAT);
|
||||||
|
string ctok;
|
||||||
|
istringstream css(val);
|
||||||
|
while(getline(css, ctok, ',')) {
|
||||||
|
ctoks.push_back(ctok);
|
||||||
|
}
|
||||||
|
if(ctoks.size() == 0) {
|
||||||
|
cerr << "Error parsing alignment policy setting " << setting
|
||||||
|
<< "; RHS must have at least 1 token" << endl
|
||||||
|
<< "Policy: " << s.c_str() << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
for(size_t i = 0; i < ctoks.size(); i++) {
|
||||||
|
if(ctoks[i].length() == 0) {
|
||||||
|
cerr << "Error parsing alignment policy setting " << setting
|
||||||
|
<< "; token " << i+1 << " on RHS had length=0" << endl
|
||||||
|
<< "Policy: " << s.c_str() << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Bonus for a match
|
||||||
|
// MA=xx (default: DEFAULT_MATCH_BONUS, or DEFAULT_MATCH_BONUS_LOCAL if --local is set)
|
||||||
|
if(tag == "MA") {
|
||||||
|
if(ctoks.size() != 1) {
|
||||||
|
cerr << "Error parsing alignment policy setting " << setting
|
||||||
|
<< "; RHS must have 1 token" << endl
|
||||||
|
<< "Policy: " << s.c_str() << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
string tmp = ctoks[0];
|
||||||
|
istringstream tmpss(tmp);
|
||||||
|
tmpss >> bonusMatch;
|
||||||
|
}
|
||||||
|
// Scoring for mismatches
|
||||||
|
// MMP={Cxx|Q[,max[,min]]|R}
// Cxx = constant, where constant is integer xx
// Q = scaled by quality; optional max and min penalties follow
// (defaults: DEFAULT_MM_PENALTY_MAX, DEFAULT_MM_PENALTY_MIN)
// R = equal to maq-rounded quality value (rounded to nearest
// 10, can't be greater than 30)
|
||||||
|
else if(tag == "MMP") {
|
||||||
|
if(ctoks.size() > 3) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'"
|
||||||
|
<< "; RHS must have at most 3 tokens" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks[0][0] == 'C') {
|
||||||
|
string tmp = ctoks[0].substr(1);
|
||||||
|
// Parse constant penalty
|
||||||
|
istringstream tmpss(tmp);
|
||||||
|
tmpss >> penMmcMax;
|
||||||
|
penMmcMin = penMmcMax;
|
||||||
|
// Set type to constant
|
||||||
|
penMmcType = COST_MODEL_CONSTANT;
|
||||||
|
} else if(ctoks[0][0] == 'Q') {
|
||||||
|
if(ctoks.size() >= 2) {
|
||||||
|
string tmp = ctoks[1];
|
||||||
|
istringstream tmpss(tmp);
|
||||||
|
tmpss >> penMmcMax;
|
||||||
|
} else {
|
||||||
|
penMmcMax = DEFAULT_MM_PENALTY_MAX;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 3) {
|
||||||
|
string tmp = ctoks[2];
|
||||||
|
istringstream tmpss(tmp);
|
||||||
|
tmpss >> penMmcMin;
|
||||||
|
} else {
|
||||||
|
penMmcMin = DEFAULT_MM_PENALTY_MIN;
|
||||||
|
}
|
||||||
|
if(penMmcMin > penMmcMax) {
|
||||||
|
cerr << "Error: Maximum mismatch penalty (" << penMmcMax
|
||||||
|
<< ") is less than minimum penalty (" << penMmcMin
|
||||||
|
<< ")" << endl;
|
||||||
|
throw 1;
|
||||||
|
}
|
||||||
|
// Set type to =quality
|
||||||
|
penMmcType = COST_MODEL_QUAL;
|
||||||
|
} else if(ctoks[0][0] == 'R') {
|
||||||
|
// Set type to=Maq-quality
|
||||||
|
penMmcType = COST_MODEL_ROUNDED_QUAL;
|
||||||
|
} else {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'"
|
||||||
|
<< "; RHS must start with C, Q or R" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if(tag == "SCP") {
|
||||||
|
if(ctoks.size() > 3) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'"
|
||||||
|
<< "; SCP must have at most 3 tokens" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
istringstream tmpMax(ctoks[1]);
|
||||||
|
tmpMax >> penScMax;
|
||||||
|
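// Minimum is the third token when present; otherwise reuse the maximum
istringstream tmpMin(ctoks.size() >= 3 ? ctoks[2] : ctoks[1]);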
|
||||||
|
tmpMin >> penScMin;
|
||||||
|
if(penScMin > penScMax) {
|
||||||
|
cerr << "max (" << penScMax << ") should be >= min (" << penScMin << ")" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(penScMin < 1) {
|
||||||
|
cerr << "min (" << penScMin << ") should be greater than 0" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Scoring for mismatches where read char=N
|
||||||
|
// NP={Cxx|Q|R}
|
||||||
|
// Cxx = constant, where constant is integer xx
|
||||||
|
// Q = equal to quality
|
||||||
|
// R = equal to maq-rounded quality value (rounded to nearest
|
||||||
|
// 10, can't be greater than 30)
|
||||||
|
else if(tag == "NP") {
|
||||||
|
if(ctoks.size() != 1) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'"
|
||||||
|
<< "; RHS must have 1 token" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks[0][0] == 'C') {
|
||||||
|
string tmp = ctoks[0].substr(1);
|
||||||
|
// Parse constant penalty
|
||||||
|
istringstream tmpss(tmp);
|
||||||
|
tmpss >> penN;
|
||||||
|
// Set type to constant
|
||||||
|
penNType = COST_MODEL_CONSTANT;
|
||||||
|
} else if(ctoks[0][0] == 'Q') {
|
||||||
|
// Set type to =quality
|
||||||
|
penNType = COST_MODEL_QUAL;
|
||||||
|
} else if(ctoks[0][0] == 'R') {
|
||||||
|
// Set type to=Maq-quality
|
||||||
|
penNType = COST_MODEL_ROUNDED_QUAL;
|
||||||
|
} else {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'"
|
||||||
|
<< "; RHS must start with C, Q or R" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Scoring for read gaps
|
||||||
|
// RDG=xx,yy
// xx = read gap open penalty (stored in penRdExConst)
// yy = read gap extension penalty (stored in penRdExLinear);
// falls back to the default for the current mode when omitted
|
||||||
|
else if(tag == "RDG") {
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> penRdExConst;
|
||||||
|
} else {
|
||||||
|
penRdExConst = noisyHpolymer ?
|
||||||
|
DEFAULT_READ_GAP_CONST_BADHPOLY :
|
||||||
|
DEFAULT_READ_GAP_CONST;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 2) {
|
||||||
|
istringstream tmpss(ctoks[1]);
|
||||||
|
tmpss >> penRdExLinear;
|
||||||
|
} else {
|
||||||
|
penRdExLinear = noisyHpolymer ?
|
||||||
|
DEFAULT_READ_GAP_LINEAR_BADHPOLY :
|
||||||
|
DEFAULT_READ_GAP_LINEAR;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Scoring for reference gaps
|
||||||
|
// RFG=xx,yy
// xx = ref gap open penalty (stored in penRfExConst)
// yy = ref gap extension penalty (stored in penRfExLinear);
// falls back to the default for the current mode when omitted
|
||||||
|
else if(tag == "RFG") {
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> penRfExConst;
|
||||||
|
} else {
|
||||||
|
penRfExConst = noisyHpolymer ?
|
||||||
|
DEFAULT_REF_GAP_CONST_BADHPOLY :
|
||||||
|
DEFAULT_REF_GAP_CONST;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 2) {
|
||||||
|
istringstream tmpss(ctoks[1]);
|
||||||
|
tmpss >> penRfExLinear;
|
||||||
|
} else {
|
||||||
|
penRfExLinear = noisyHpolymer ?
|
||||||
|
DEFAULT_REF_GAP_LINEAR_BADHPOLY :
|
||||||
|
DEFAULT_REF_GAP_LINEAR;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Minimum score as a function of read length
|
||||||
|
// MIN=xx,yy
|
||||||
|
// xx = constant coefficient
|
||||||
|
// yy = linear coefficient
|
||||||
|
else if(tag == "MIN") {
|
||||||
|
PARSE_FUNC(costMin);
|
||||||
|
}
|
||||||
|
// Per-read N ceiling as a function of read length
|
||||||
|
// NCEIL=xx,yy
|
||||||
|
// xx = N ceiling constant coefficient
|
||||||
|
// yy = N ceiling linear coefficient (set to 0 if unspecified)
|
||||||
|
else if(tag == "NCEIL") {
|
||||||
|
PARSE_FUNC(nCeil);
|
||||||
|
}
|
||||||
|
/*
|
||||||
|
* Seeds
|
||||||
|
* -----
|
||||||
|
*
|
||||||
|
* SEED=mm,len (default: SEED=0,22)
*
* mm = Maximum number of mismatches allowed within a seed.
* Must be 0 or 1; larger values are rejected below.
* len = Length of seed. Defaults to DEFAULT_SEEDLEN if omitted.
*
* The interval between seeds is not read from SEED; it is set
* separately with IVAL.
|
||||||
|
*/
|
||||||
|
else if(tag == "SEED") {
|
||||||
|
if(ctoks.size() > 2) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'; RHS must have 1 or 2 tokens, "
|
||||||
|
<< "had " << ctoks.size() << ". "
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> multiseedMms;
|
||||||
|
if(multiseedMms > 1) {
|
||||||
|
cerr << "Error: -N was set to " << multiseedMms << ", but cannot be set greater than 1" << endl;
|
||||||
|
throw 1;
|
||||||
|
}
|
||||||
|
if(multiseedMms < 0) {
|
||||||
|
cerr << "Error: -N was set to a number less than 0 (" << multiseedMms << ")" << endl;
|
||||||
|
throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 2) {
|
||||||
|
istringstream tmpss(ctoks[1]);
|
||||||
|
tmpss >> multiseedLen;
|
||||||
|
} else {
|
||||||
|
multiseedLen = DEFAULT_SEEDLEN;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if(tag == "SEEDLEN") {
|
||||||
|
if(ctoks.size() > 1) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'; RHS must have 1 token, "
|
||||||
|
<< "had " << ctoks.size() << ". "
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> multiseedLen;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if(tag == "DPS") {
|
||||||
|
if(ctoks.size() > 1) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'; RHS must have 1 token, "
|
||||||
|
<< "had " << ctoks.size() << ". "
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> failStreak;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if(tag == "ROUNDS") {
|
||||||
|
if(ctoks.size() > 1) {
|
||||||
|
cerr << "Error parsing alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'; RHS must have 1 token, "
|
||||||
|
<< "had " << ctoks.size() << ". "
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
if(ctoks.size() >= 1) {
|
||||||
|
istringstream tmpss(ctoks[0]);
|
||||||
|
tmpss >> seedRounds;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
/*
|
||||||
|
* Seed interval
|
||||||
|
* -------------
|
||||||
|
*
|
||||||
|
* IVAL={L|S|C},a,b (default: IVAL=S,1.0,0.0)
|
||||||
|
*
|
||||||
|
* L = let interval between seeds be a linear function of the
|
||||||
|
* read length. xx and yy are the constant and linear
|
||||||
|
* coefficients respectively. In other words, the interval
|
||||||
|
* equals a * len + b, where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
* S = let interval between seeds be a function of the square
|
||||||
|
* root of the read length. xx and yy are the
|
||||||
|
* coefficients. In other words, the interval equals
|
||||||
|
* a * sqrt(len) + b, where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
* C = Like S but uses cube root of length instead of square
|
||||||
|
* root.
|
||||||
|
*/
|
||||||
|
else if(tag == "IVAL") {
|
||||||
|
PARSE_FUNC(multiseedIval);
|
||||||
|
}
|
||||||
|
else if(tag == "CANINTRONLEN") {
|
||||||
|
assert(penCanIntronLen != NULL);
|
||||||
|
PARSE_FUNC((*penCanIntronLen));
|
||||||
|
}
|
||||||
|
else if(tag == "NONCANINTRONLEN") {
|
||||||
|
assert(penNoncanIntronLen != NULL);
|
||||||
|
PARSE_FUNC((*penNoncanIntronLen));
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
// Unknown tag
|
||||||
|
cerr << "Unexpected alignment policy setting "
|
||||||
|
<< "'" << tag.c_str() << "'" << endl
|
||||||
|
<< "Policy: '" << s.c_str() << "'" << endl;
|
||||||
|
assert(false); throw 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
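The three-level tokenization used by parseString above (split on `;`, then on `=`, then on `,`) can be sketched in isolation. This is a minimal illustration using standard containers rather than HISAT2's `EList`; the names `splitOn`, `Setting`, and `tokenizePolicy` are hypothetical and do not exist in the codebase, and errors are reported with `std::runtime_error` instead of `throw 1`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of HISAT2): split `s` on `delim` with the
// same getline-based loop that parseString uses.
static std::vector<std::string> splitOn(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::istringstream ss(s);
    std::string tok;
    while (std::getline(ss, tok, delim)) {
        out.push_back(tok);
    }
    return out;
}

// One parsed setting: the tag on the left of '=' and the comma-separated
// values on the right.
typedef std::pair<std::string, std::vector<std::string> > Setting;

// Hypothetical helper: break a policy string such as "MA=4;RFG=24,12"
// into (tag, values) pairs, applying the same structural checks as
// parseString (exactly one '=', at least one value, no empty values).
static std::vector<Setting> tokenizePolicy(const std::string& policy) {
    std::vector<Setting> settings;
    for (const std::string& field : splitOn(policy, ';')) {
        std::vector<std::string> sides = splitOn(field, '=');
        if (sides.size() != 2) {
            throw std::runtime_error("setting must be bisected by '=': " + field);
        }
        std::vector<std::string> values = splitOn(sides[1], ',');
        if (values.empty()) {
            throw std::runtime_error("RHS must have at least 1 token: " + field);
        }
        for (const std::string& v : values) {
            if (v.empty()) {
                throw std::runtime_error("empty token on RHS: " + field);
            }
        }
        settings.push_back(Setting(sides[0], values));
    }
    return settings;
}
```

Calling `tokenizePolicy("MMP=C44;MA=4;RFG=24,12")` yields three settings, the last with values `24` and `12`. An empty policy string yields no settings, which is why the default-only test cases can pass `""`.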
|
||||||
|
|
||||||
|
#ifdef ALIGNER_SEED_POLICY_MAIN
|
||||||
|
int main() {
|
||||||
|
|
||||||
|
int bonusMatchType;
|
||||||
|
int bonusMatch;
|
||||||
|
int penMmcType;
|
||||||
|
int penMmc;
|
||||||
|
int penScMax;
|
||||||
|
int penScMin;
|
||||||
|
int penNType;
|
||||||
|
int penN;
|
||||||
|
int penRdExConst;
|
||||||
|
int penRfExConst;
|
||||||
|
int penRdExLinear;
|
||||||
|
int penRfExLinear;
|
||||||
|
SimpleFunc costMin;
|
||||||
|
SimpleFunc costFloor;
|
||||||
|
SimpleFunc nCeil;
|
||||||
|
bool nCatPair;
|
||||||
|
int multiseedMms;
|
||||||
|
int multiseedLen;
|
||||||
|
SimpleFunc msIval;
|
||||||
|
SimpleFunc posfrac;
|
||||||
|
SimpleFunc rowmult;
|
||||||
|
uint32_t mhits;
|
||||||
|
|
||||||
|
{
|
||||||
|
cout << "Case 1: Defaults 1 ... ";
|
||||||
|
const char *pol = "";
|
||||||
|
SeedAlignmentPolicy::parseString(
|
||||||
|
string(pol),
|
||||||
|
false, // --local?
|
||||||
|
false, // noisy homopolymers a la 454?
|
||||||
|
false, // ignore qualities?
|
||||||
|
bonusMatchType,
|
||||||
|
bonusMatch,
|
||||||
|
penMmcType,
|
||||||
|
penMmc,
|
||||||
|
penScMax,
|
||||||
|
penScMin,
|
||||||
|
penNType,
|
||||||
|
penN,
|
||||||
|
penRdExConst,
|
||||||
|
penRfExConst,
|
||||||
|
penRdExLinear,
|
||||||
|
penRfExLinear,
|
||||||
|
costMin,
|
||||||
|
costFloor,
|
||||||
|
nCeil,
|
||||||
|
nCatPair,
|
||||||
|
multiseedMms,
|
||||||
|
multiseedLen,
|
||||||
|
msIval,
|
||||||
|
mhits);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS_TYPE, bonusMatchType);
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS, bonusMatch);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_TYPE, penMmcType);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MAX, penMmcMax);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MIN, penMmcMin);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY_TYPE, penNType);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY, penN);
|
||||||
|
assert_eq(DEFAULT_MIN_CONST, costMin.getConst());
|
||||||
|
assert_eq(DEFAULT_MIN_LINEAR, costMin.getCoeff());
|
||||||
|
assert_eq(DEFAULT_FLOOR_CONST, costFloor.getConst());
|
||||||
|
assert_eq(DEFAULT_FLOOR_LINEAR, costFloor.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_CONST, nCeil.getConst());
|
||||||
|
assert_eq(DEFAULT_N_CAT_PAIR, nCatPair);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_READ_GAP_CONST, penRdExConst);
|
||||||
|
assert_eq(DEFAULT_READ_GAP_LINEAR, penRdExLinear);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_CONST, penRfExConst);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_LINEAR, penRfExLinear);
|
||||||
|
assert_eq(DEFAULT_SEEDMMS, multiseedMms);
|
||||||
|
assert_eq(DEFAULT_SEEDLEN, multiseedLen);
|
||||||
|
assert_eq(DEFAULT_IVAL, msIval.getType());
|
||||||
|
assert_eq(DEFAULT_IVAL_A, msIval.getCoeff());
|
||||||
|
assert_eq(DEFAULT_IVAL_B, msIval.getConst());
|
||||||
|
|
||||||
|
cout << "PASSED" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
{
|
||||||
|
cout << "Case 2: Defaults 2 ... ";
|
||||||
|
const char *pol = "";
|
||||||
|
SeedAlignmentPolicy::parseString(
|
||||||
|
string(pol),
|
||||||
|
false, // --local?
|
||||||
|
true, // noisy homopolymers a la 454?
|
||||||
|
false, // ignore qualities?
|
||||||
|
bonusMatchType,
|
||||||
|
bonusMatch,
|
||||||
|
penMmcType,
|
||||||
|
penMmc,
|
||||||
|
|
||||||
|
penNType,
|
||||||
|
penN,
|
||||||
|
penRdExConst,
|
||||||
|
penRfExConst,
|
||||||
|
penRdExLinear,
|
||||||
|
penRfExLinear,
|
||||||
|
costMin,
|
||||||
|
costFloor,
|
||||||
|
nCeil,
|
||||||
|
nCatPair,
|
||||||
|
multiseedMms,
|
||||||
|
multiseedLen,
|
||||||
|
msIval,
|
||||||
|
mhits);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS_TYPE, bonusMatchType);
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS, bonusMatch);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_TYPE, penMmcType);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MAX, penMmc);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MIN, penMmc);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY_TYPE, penNType);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY, penN);
|
||||||
|
assert_eq(DEFAULT_MIN_CONST, costMin.getConst());
|
||||||
|
assert_eq(DEFAULT_MIN_LINEAR, costMin.getCoeff());
|
||||||
|
assert_eq(DEFAULT_FLOOR_CONST, costFloor.getConst());
|
||||||
|
assert_eq(DEFAULT_FLOOR_LINEAR, costFloor.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_CONST, nCeil.getConst());
|
||||||
|
assert_eq(DEFAULT_N_CAT_PAIR, nCatPair);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_READ_GAP_CONST_BADHPOLY, penRdExConst);
|
||||||
|
assert_eq(DEFAULT_READ_GAP_LINEAR_BADHPOLY, penRdExLinear);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_CONST_BADHPOLY, penRfExConst);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_LINEAR_BADHPOLY, penRfExLinear);
|
||||||
|
assert_eq(DEFAULT_SEEDMMS, multiseedMms);
|
||||||
|
assert_eq(DEFAULT_SEEDLEN, multiseedLen);
|
||||||
|
assert_eq(DEFAULT_IVAL, msIval.getType());
|
||||||
|
assert_eq(DEFAULT_IVAL_A, msIval.getCoeff());
|
||||||
|
assert_eq(DEFAULT_IVAL_B, msIval.getConst());
|
||||||
|
|
||||||
|
cout << "PASSED" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
{
|
||||||
|
cout << "Case 3: Defaults 3 ... ";
|
||||||
|
const char *pol = "";
|
||||||
|
SeedAlignmentPolicy::parseString(
|
||||||
|
string(pol),
|
||||||
|
true, // --local?
|
||||||
|
false, // noisy homopolymers a la 454?
|
||||||
|
false, // ignore qualities?
|
||||||
|
bonusMatchType,
|
||||||
|
bonusMatch,
|
||||||
|
penMmcType,
|
||||||
|
penMmc,
|
||||||
|
penNType,
|
||||||
|
penN,
|
||||||
|
penRdExConst,
|
||||||
|
penRfExConst,
|
||||||
|
penRdExLinear,
|
||||||
|
penRfExLinear,
|
||||||
|
costMin,
|
||||||
|
costFloor,
|
||||||
|
nCeil,
|
||||||
|
nCatPair,
|
||||||
|
multiseedMms,
|
||||||
|
multiseedLen,
|
||||||
|
msIval,
|
||||||
|
mhits);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS_TYPE_LOCAL, bonusMatchType);
|
||||||
|
assert_eq(DEFAULT_MATCH_BONUS_LOCAL, bonusMatch);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_TYPE, penMmcType);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MAX, penMmcMax);
|
||||||
|
assert_eq(DEFAULT_MM_PENALTY_MIN, penMmcMin);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY_TYPE, penNType);
|
||||||
|
assert_eq(DEFAULT_N_PENALTY, penN);
|
||||||
|
assert_eq(DEFAULT_MIN_CONST_LOCAL, costMin.getConst());
|
||||||
|
assert_eq(DEFAULT_MIN_LINEAR_LOCAL, costMin.getCoeff());
|
||||||
|
assert_eq(DEFAULT_FLOOR_CONST_LOCAL, costFloor.getConst());
|
||||||
|
assert_eq(DEFAULT_FLOOR_LINEAR_LOCAL, costFloor.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_CONST, nCeil.getConst());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_LINEAR, nCeil.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CAT_PAIR, nCatPair);
|
||||||
|
|
||||||
|
assert_eq(DEFAULT_READ_GAP_CONST, penRdExConst);
|
||||||
|
assert_eq(DEFAULT_READ_GAP_LINEAR, penRdExLinear);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_CONST, penRfExConst);
|
||||||
|
assert_eq(DEFAULT_REF_GAP_LINEAR, penRfExLinear);
|
||||||
|
assert_eq(DEFAULT_SEEDMMS, multiseedMms);
|
||||||
|
assert_eq(DEFAULT_SEEDLEN, multiseedLen);
|
||||||
|
assert_eq(DEFAULT_IVAL, msIval.getType());
|
||||||
|
assert_eq(DEFAULT_IVAL_A, msIval.getCoeff());
|
||||||
|
assert_eq(DEFAULT_IVAL_B, msIval.getConst());
|
||||||
|
|
||||||
|
cout << "PASSED" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
{
|
||||||
|
cout << "Case 4: Simple string 1 ... ";
|
||||||
|
const char *pol = "MMP=C44;MA=4;RFG=24,12;FL=C,8;RDG=2;NP=C4;MIN=C,7";
|
||||||
|
SeedAlignmentPolicy::parseString(
|
||||||
|
string(pol),
|
||||||
|
true, // --local?
|
||||||
|
false, // noisy homopolymers a la 454?
|
||||||
|
false, // ignore qualities?
|
||||||
|
bonusMatchType,
|
||||||
|
bonusMatch,
|
||||||
|
penMmcType,
|
||||||
|
penMmc,
|
||||||
|
penNType,
|
||||||
|
penN,
|
||||||
|
penRdExConst,
|
||||||
|
penRfExConst,
|
||||||
|
penRdExLinear,
|
||||||
|
penRfExLinear,
|
||||||
|
costMin,
|
||||||
|
costFloor,
|
||||||
|
nCeil,
|
||||||
|
nCatPair,
|
||||||
|
multiseedMms,
|
||||||
|
multiseedLen,
|
||||||
|
msIval,
|
||||||
|
mhits);
|
||||||
|
|
||||||
|
assert_eq(COST_MODEL_CONSTANT, bonusMatchType);
|
||||||
|
assert_eq(4, bonusMatch);
|
||||||
|
assert_eq(COST_MODEL_CONSTANT, penMmcType);
|
||||||
|
assert_eq(44, penMmc);
|
||||||
|
assert_eq(COST_MODEL_CONSTANT, penNType);
|
||||||
|
assert_eq(4.0f, penN);
|
||||||
|
assert_eq(7, costMin.getConst());
|
||||||
|
assert_eq(DEFAULT_MIN_LINEAR_LOCAL, costMin.getCoeff());
|
||||||
|
assert_eq(8, costFloor.getConst());
|
||||||
|
assert_eq(DEFAULT_FLOOR_LINEAR_LOCAL, costFloor.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_CONST, nCeil.getConst());
|
||||||
|
assert_eq(DEFAULT_N_CEIL_LINEAR, nCeil.getCoeff());
|
||||||
|
assert_eq(DEFAULT_N_CAT_PAIR, nCatPair);
|
||||||
|
|
||||||
|
assert_eq(2.0f, penRdExConst);
|
||||||
|
assert_eq(DEFAULT_READ_GAP_LINEAR, penRdExLinear);
|
||||||
|
assert_eq(24.0f, penRfExConst);
|
||||||
|
assert_eq(12.0f, penRfExLinear);
|
||||||
|
assert_eq(DEFAULT_SEEDMMS, multiseedMms);
|
||||||
|
assert_eq(DEFAULT_SEEDLEN, multiseedLen);
|
||||||
|
assert_eq(DEFAULT_IVAL, msIval.getType());
|
||||||
|
assert_eq(DEFAULT_IVAL_A, msIval.getCoeff());
|
||||||
|
assert_eq(DEFAULT_IVAL_B, msIval.getConst());
|
||||||
|
|
||||||
|
cout << "PASSED" << endl;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif /*def ALIGNER_SEED_POLICY_MAIN*/
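parseString reads numeric values with `tmpss >> value` and does not test the stream afterward, so a malformed token such as `MA=abc` is not reported as an error. The following is a minimal sketch of a stricter variant, assuming a hypothetical helper named `parseIntStrict` that is not part of HISAT2; it only illustrates the idea of checking the stream state and rejecting trailing characters.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of HISAT2): parse `tok` as a complete
// integer. Returns true and stores the value in `out` on success.
// Returns false, leaving `out` untouched, if `tok` holds no integer or
// has non-whitespace characters after the number.
static bool parseIntStrict(const std::string& tok, int& out) {
    std::istringstream ss(tok);
    int v = 0;
    if (!(ss >> v)) {
        return false;  // no leading integer, e.g. "" or "abc"
    }
    char extra;
    if (ss >> extra) {
        return false;  // trailing junk, e.g. the "x" in "4x"
    }
    out = v;
    return true;
}
```

A caller could use it in place of the bare extraction and raise the usual policy error when it returns false.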
|
234
aligner_seed_policy.h
Normal file
@ -0,0 +1,234 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_SEED_POLICY_H_
|
||||||
|
#define ALIGNER_SEED_POLICY_H_
|
||||||
|
|
||||||
|
#include "scoring.h"
|
||||||
|
#include "simple_func.h"
|
||||||
|
|
||||||
|
#define DEFAULT_SEEDMMS 0
|
||||||
|
#define DEFAULT_SEEDLEN 22
|
||||||
|
|
||||||
|
#define DEFAULT_IVAL SIMPLE_FUNC_SQRT
|
||||||
|
#define DEFAULT_IVAL_A 1.15f
|
||||||
|
#define DEFAULT_IVAL_B 0.0f
|
||||||
|
|
||||||
|
#define DEFAULT_UNGAPPED_HITS 6
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates the set of all parameters that affect what the
|
||||||
|
* SeedAligner does with reads.
|
||||||
|
*/
|
||||||
|
class SeedAlignmentPolicy {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Parse alignment policy when provided in this format:
|
||||||
|
* <lab>=<val>;<lab>=<val>;<lab>=<val>...
|
||||||
|
*
|
||||||
|
* And label=value possibilities are:
|
||||||
|
*
|
||||||
|
* Bonus for a match
|
||||||
|
* -----------------
|
||||||
|
*
|
||||||
|
* MA=xx (default: MA=0, or MA=2 if --local is set)
|
||||||
|
*
|
||||||
|
* xx = Each position where equal read and reference characters match up
|
||||||
|
 * in the alignment contributes this amount to the total score.
|
||||||
|
*
|
||||||
|
* Penalty for a mismatch
|
||||||
|
* ----------------------
|
||||||
|
*
|
||||||
|
 * MMP={Cxx|Q|R} (default: MMP=C6)
|
||||||
|
*
|
||||||
|
* Cxx = Each mismatch costs xx. If MMP=Cxx is specified, quality
|
||||||
|
 * values are ignored when assessing penalties for mismatches.
|
||||||
|
* Q = Each mismatch incurs a penalty equal to the mismatched base's
|
||||||
|
 * quality value.
|
||||||
|
* R = Each mismatch incurs a penalty equal to the mismatched base's
|
||||||
|
* rounded quality value. Qualities are rounded off to the
|
||||||
|
* nearest 10, and qualities greater than 30 are rounded to 30.
|
||||||
|
*
|
||||||
|
* Penalty for position with N (in either read or reference)
|
||||||
|
* ---------------------------------------------------------
|
||||||
|
*
|
||||||
|
 * NP={Cxx|Q|R} (default: NP=C1)
|
||||||
|
*
|
||||||
|
* Cxx = Each alignment position with an N in either the read or the
|
||||||
|
* reference costs xx. If NP=Cxx is specified, quality values are
|
||||||
|
 * ignored when assessing penalties for Ns.
|
||||||
|
* Q = Each alignment position with an N in either the read or the
|
||||||
|
* reference incurs a penalty equal to the read base's quality
|
||||||
|
* value.
|
||||||
|
* R = Each alignment position with an N in either the read or the
|
||||||
|
* reference incurs a penalty equal to the read base's rounded
|
||||||
|
* quality value. Qualities are rounded off to the nearest 10,
|
||||||
|
* and qualities greater than 30 are rounded to 30.
|
||||||
|
*
|
||||||
|
* Penalty for a read gap
|
||||||
|
* ----------------------
|
||||||
|
*
|
||||||
|
* RDG=xx,yy (default: RDG=5,3)
|
||||||
|
*
|
||||||
|
* xx = Read gap open penalty.
|
||||||
|
* yy = Read gap extension penalty.
|
||||||
|
*
|
||||||
|
* Total cost incurred by a read gap = xx + (yy * gap length)
|
||||||
|
*
|
||||||
|
* Penalty for a reference gap
|
||||||
|
* ---------------------------
|
||||||
|
*
|
||||||
|
* RFG=xx,yy (default: RFG=5,3)
|
||||||
|
*
|
||||||
|
* xx = Reference gap open penalty.
|
||||||
|
* yy = Reference gap extension penalty.
|
||||||
|
*
|
||||||
|
* Total cost incurred by a reference gap = xx + (yy * gap length)
|
||||||
|
*
|
||||||
|
* Minimum score for valid alignment
|
||||||
|
* ---------------------------------
|
||||||
|
*
|
||||||
|
* MIN=xx,yy (defaults: MIN=-0.6,-0.6, or MIN=0.0,0.66 if --local is set)
|
||||||
|
*
|
||||||
|
* xx,yy = For a read of length N, the total score must be at least
|
||||||
|
* xx + (read length * yy) for the alignment to be valid. The
|
||||||
|
* total score is the sum of all negative penalties (from
|
||||||
|
* mismatches and gaps) and all positive bonuses. The minimum
|
||||||
|
* can be negative (and is by default in global alignment mode).
|
||||||
|
*
|
||||||
|
* N ceiling
|
||||||
|
* ---------
|
||||||
|
*
|
||||||
|
* NCEIL=xx,yy (default: NCEIL=0.0,0.15)
|
||||||
|
*
|
||||||
|
* xx,yy = For a read of length N, the number of alignment
|
||||||
|
* positions with an N in either the read or the
|
||||||
|
* reference cannot exceed
|
||||||
|
* ceiling = xx + (read length * yy). If the ceiling is
|
||||||
|
* exceeded, the alignment is considered invalid.
|
||||||
|
*
|
||||||
|
* Seeds
|
||||||
|
* -----
|
||||||
|
*
|
||||||
|
* SEED=mm,len,ival (default: SEED=0,22)
|
||||||
|
*
|
||||||
|
* mm = Maximum number of mismatches allowed within a seed.
|
||||||
|
 * Must be 0 or 1; parseString rejects larger values.
|
||||||
|
* len = Length of seed.
|
||||||
|
* ival = Interval between seeds. If not specified, seed
|
||||||
|
* interval is determined by IVAL.
|
||||||
|
*
|
||||||
|
* Seed interval
|
||||||
|
* -------------
|
||||||
|
*
|
||||||
|
* IVAL={L|S|C},xx,yy (default: IVAL=S,1.0,0.0)
|
||||||
|
*
|
||||||
|
* L = let interval between seeds be a linear function of the
|
||||||
|
* read length. xx and yy are the constant and linear
|
||||||
|
* coefficients respectively. In other words, the interval
|
||||||
|
* equals a * len + b, where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
 * S = let interval between seeds be a function of the square
|
||||||
|
* root of the read length. xx and yy are the
|
||||||
|
* coefficients. In other words, the interval equals
|
||||||
|
* a * sqrt(len) + b, where len is the read length.
|
||||||
|
* Intervals less than 1 are rounded up to 1.
|
||||||
|
* C = Like S but uses cube root of length instead of square
|
||||||
|
* root.
|
||||||
|
*
|
||||||
|
* Example 1:
|
||||||
|
*
|
||||||
|
 * SEED=1,10,5 and read sequence is TGCTATCGTACGATCGTACA:
|
||||||
|
*
|
||||||
|
* The following seeds are extracted from the forward
|
||||||
|
* representation of the read and aligned to the reference
|
||||||
|
* allowing up to 1 mismatch:
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATCGTACA
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTA
|
||||||
|
* Seed 2+: TCGTACGATC
|
||||||
|
* Seed 3+: CGATCGTACA
|
||||||
|
*
|
||||||
|
* ...and the following are extracted from the reverse-complement
|
||||||
|
* representation of the read and align to the reference allowing
|
||||||
|
* up to 1 mismatch:
|
||||||
|
*
|
||||||
|
* Seed 1-: TACGATAGCA
|
||||||
|
* Seed 2-: GATCGTACGA
|
||||||
|
* Seed 3-: TGTACGATCG
|
||||||
|
*
|
||||||
|
* Example 2:
|
||||||
|
*
|
||||||
|
* SEED=1,20,20 and read sequence is TGCTATCGTACGATC. The seed
|
||||||
|
* length is 20 but the read is only 15 characters long. In this
|
||||||
|
* case, Bowtie2 automatically shrinks the seed length to be equal
|
||||||
|
* to the read length.
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATC
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTACGATC
|
||||||
|
* Seed 1-: GATCGTACGATAGCA
|
||||||
|
*
|
||||||
|
* Example 3:
|
||||||
|
*
|
||||||
|
* SEED=1,10,10 and read sequence is TGCTATCGTACGATC. Only one seed
|
||||||
|
* fits on the read; a second seed would overhang the end of the read
|
||||||
|
* by 5 positions. In this case, Bowtie2 extracts one seed.
|
||||||
|
*
|
||||||
|
* Read: TGCTATCGTACGATC
|
||||||
|
*
|
||||||
|
* Seed 1+: TGCTATCGTA
|
||||||
|
* Seed 1-: TACGATAGCA
|
||||||
|
*/
|
||||||
|
    static void parseString(
        const std::string& s,
        bool local,
        bool noisyHpolymer,
        bool ignoreQuals,
        int& bonusMatchType,
        int& bonusMatch,
        int& penMmcType,
        int& penMmcMax,
        int& penMmcMin,
        int& penScMax,
        int& penScMin,
        int& penNType,
        int& penN,
        int& penRdExConst,
        int& penRfExConst,
        int& penRdExLinear,
        int& penRfExLinear,
        SimpleFunc& costMin,
        SimpleFunc& nCeil,
        bool& nCatPair,
        int& multiseedMms,
        int& multiseedLen,
        SimpleFunc& multiseedIval,
        size_t& failStreak,
        size_t& seedRounds,
        SimpleFunc* penCanIntronLen = NULL,
        SimpleFunc* penNoncanIntronLen = NULL);
};
|
||||||
|
|
||||||
|
#endif /*ndef ALIGNER_SEED_POLICY_H_*/
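The three seed examples in the comment above can be reproduced with a short sketch. This is an illustration under stated assumptions, not the Bowtie 2 implementation: `revComp`, `SeedPair` and `extractSeeds` are hypothetical names, the interval is taken as a plain integer (the header passes it as a `SimpleFunc`), and the interval of 5 used for Example 1 is inferred from the seed offsets shown there.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: reverse-complement a nucleotide string.
// Anything other than A/C/G/T is mapped to 'N'.
std::string revComp(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Hypothetical container: a forward seed and its reverse complement.
struct SeedPair {
    std::string fw;
    std::string rc;
};

// Extract seeds of length seedLen every `interval` positions.
// The seed length shrinks to the read length when the read is shorter
// (Example 2), and a seed that would overhang the end of the read is
// not extracted (Example 3).
std::vector<SeedPair> extractSeeds(const std::string& read,
                                   size_t seedLen,
                                   size_t interval) {
    std::vector<SeedPair> seeds;
    seedLen = std::min(seedLen, read.size());
    if (seedLen == 0 || interval == 0) return seeds;
    for (size_t off = 0; off + seedLen <= read.size(); off += interval) {
        std::string fw = read.substr(off, seedLen);
        seeds.push_back({fw, revComp(fw)});
    }
    return seeds;
}
```

Calling `extractSeeds("TGCTATCGTACGATC", 20, 20)` corresponds to Example 2 and yields a single seed pair covering the whole read.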
|
3214
aligner_sw.cpp
Normal file
File diff suppressed because it is too large
648
aligner_sw.h
Normal file
@ -0,0 +1,648 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/*
|
||||||
|
* aligner_sw.h
|
||||||
|
*
|
||||||
|
* Classes and routines for solving dynamic programming problems in aid of read
|
||||||
|
* alignment. Goals include the ability to handle:
|
||||||
|
*
|
||||||
|
* - Both read alignment, where the query must align end-to-end, and local
|
||||||
|
* alignment, where we seek a high-scoring alignment that need not involve
|
||||||
|
* the entire query.
|
||||||
|
* - Situations where: (a) we've found a seed hit and are trying to extend it
|
||||||
|
* into a larger hit, (b) we've found an alignment for one mate of a pair and
|
||||||
|
* are trying to find a nearby alignment for the other mate, (c) we're
|
||||||
|
* aligning against an entire reference sequence.
|
||||||
|
* - Caller-specified indicators for what columns of the dynamic programming
|
||||||
|
* matrix we are allowed to start in or end in.
|
||||||
|
*
|
||||||
|
* TODO:
|
||||||
|
*
|
||||||
|
* - A slicker way to filter out alignments that violate a ceiling placed on
|
||||||
|
* the number of Ns permitted in the reference portion of the alignment.
|
||||||
|
* Right now we accomplish this by masking out ending columns that correspond
|
||||||
|
* to *ungapped* alignments with too many Ns. This results in false
|
||||||
|
* positives and false negatives for gapped alignments. The margin of error
|
||||||
|
* (# of Ns by which we might miscount) is bounded by the number of gaps.
|
||||||
|
*/
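The N-ceiling workaround described in the TODO above (masking ending columns whose ungapped alignment would cover too many Ns) can be sketched roughly as follows. This is a simplified illustration, not the code in aligner_sw.cpp: `maskEndColumns` is a hypothetical name, and the reference is assumed to be a plain character string with 'N' marking ambiguous positions, whereas the real aligner works on an encoded reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For each reference column c, decide whether an *ungapped* alignment of a
// read of length readLen ending at c stays within the N ceiling.
// ok[c] is true when the window [c - readLen + 1, c] holds at most nCeil Ns.
std::vector<bool> maskEndColumns(const std::string& ref,
                                 size_t readLen,
                                 size_t nCeil) {
    // Prefix sums: pre[i] = number of Ns in ref[0, i)
    std::vector<size_t> pre(ref.size() + 1, 0);
    for (size_t i = 0; i < ref.size(); i++) {
        pre[i + 1] = pre[i] + (ref[i] == 'N' ? 1 : 0);
    }
    std::vector<bool> ok(ref.size(), false);
    for (size_t c = 0; c < ref.size(); c++) {
        // The ungapped alignment would start before the reference window
        if (c + 1 < readLen) continue;
        size_t start = c + 1 - readLen;
        size_t ns = pre[c + 1] - pre[start];
        ok[c] = (ns <= nCeil);
    }
    return ok;
}
```

Because the window is fixed by the ungapped diagonal, a gapped alignment ending in the same column can cover a slightly different reference stretch, which is the source of the false positives and false negatives mentioned in the TODO.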
|
||||||
|
|
||||||
|
/**
|
||||||
|
* |-maxgaps-|
|
||||||
|
* ***********oooooooooooooooooooooo -
|
||||||
|
* ***********ooooooooooooooooooooo |
|
||||||
|
* ***********oooooooooooooooooooo |
|
||||||
|
* ***********ooooooooooooooooooo |
|
||||||
|
* ***********oooooooooooooooooo |
|
||||||
|
* ***********ooooooooooooooooo read len
|
||||||
|
* ***********oooooooooooooooo |
|
||||||
|
* ***********ooooooooooooooo |
|
||||||
|
* ***********oooooooooooooo |
|
||||||
|
* ***********ooooooooooooo |
|
||||||
|
* ***********oooooooooooo -
|
||||||
|
* |-maxgaps-|
|
||||||
|
* |-readlen-|
|
||||||
|
* |-------skip--------|
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_SW_H_
|
||||||
|
#define ALIGNER_SW_H_
|
||||||
|
|
||||||
|
#define INLINE_CUPS
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <iostream>
|
||||||
|
#include <limits>
|
||||||
|
#include "threading.h"
|
||||||
|
#include <emmintrin.h>
|
||||||
|
#include "aligner_sw_common.h"
|
||||||
|
#include "aligner_sw_nuc.h"
|
||||||
|
#include "ds.h"
|
||||||
|
#include "aligner_seed.h"
|
||||||
|
#include "reference.h"
|
||||||
|
#include "random_source.h"
|
||||||
|
#include "mem_ids.h"
|
||||||
|
#include "aligner_result.h"
|
||||||
|
#include "mask.h"
|
||||||
|
#include "dp_framer.h"
|
||||||
|
#include "aligner_swsse.h"
|
||||||
|
#include "aligner_bt.h"
|
||||||
|
|
||||||
|
#define QUAL2(d, f) sc_->mm((int)(*rd_)[rdi_ + d], \
|
||||||
|
(int) rf_ [rfi_ + f], \
|
||||||
|
(int)(*qu_)[rdi_ + d] - 33)
|
||||||
|
#define QUAL(d) sc_->mm((int)(*rd_)[rdi_ + d], \
|
||||||
|
(int)(*qu_)[rdi_ + d] - 33)
|
||||||
|
#define N_SNP_PEN(c) (((int)rf_[rfi_ + c] > 15) ? sc_->n(30) : sc_->penSnp)
|
||||||
|
|
||||||
|
/**
|
||||||
|
* SwAligner
|
||||||
|
* =========
|
||||||
|
*
|
||||||
|
* Encapsulates facilities for alignment using dynamic programming. Handles
|
||||||
|
* alignment of nucleotide reads against known reference nucleotides.
|
||||||
|
*
|
||||||
|
* The class is stateful. First the user must call init() to initialize the
|
||||||
|
* object with details regarding the dynamic programming problem to be solved.
|
||||||
|
* Next, the user calls align() to fill the dynamic programming matrix and
|
||||||
|
* calculate summaries describing the solutions. Finally the user calls
|
||||||
|
* nextAlignment(...), perhaps repeatedly, to populate the SwResult object with
|
||||||
|
* the next result. Results are dispensed in best-to-worst, left-to-right
|
||||||
|
* order.
|
||||||
|
*
|
||||||
|
* The class expects that the read string, quality string, and reference string
|
||||||
|
* provided by the caller live at least until the user is finished aligning and
|
||||||
|
* obtaining alignments from this object.
|
||||||
|
*
|
||||||
|
* There is a design tradeoff between hiding/exposing details of the genome and
|
||||||
|
* its strands to the SwAligner. In a sense, a better design is to hide
|
||||||
|
* details such as the id of the reference sequence aligned to, or whether
|
||||||
|
* we're aligning the read in its original forward orientation or its reverse
|
||||||
|
* complement. But this means that any alignment results returned by SwAligner
|
||||||
|
* have to be extended to include those details before they're useful to the
|
||||||
|
* caller. We opt for messy but expedient - the reference id and orientation
|
||||||
|
* of the read are given to SwAligner, remembered, and used to populate
|
||||||
|
* SwResults.
|
||||||
|
*
|
||||||
|
* LOCAL VS GLOBAL
|
||||||
|
*
|
||||||
|
* The dynamic programming aligner supports both local and global alignment,
|
||||||
|
* and one option in between. To implement these modes, the aligner variously (a)
|
||||||
|
* allows negative scores (i.e. doesn't necessarily clamp them up to 0), (b)
|
||||||
|
* checks in rows other than the last row for acceptable solutions, and (c)
|
||||||
|
* optionally adds a bonus to the score for matches.
|
||||||
|
*
|
||||||
|
* For global alignment, we:
|
||||||
|
*
|
||||||
|
* (a) Allow negative scores
|
||||||
|
* (b) Check only in the last row
|
||||||
|
* (c) Either add a bonus for matches or not (doesn't matter)
|
||||||
|
*
|
||||||
|
* For local alignment, we:
|
||||||
|
*
|
||||||
|
* (a) Clamp scores to 0
|
||||||
|
* (b) Check in any row for a sufficiently high score
|
||||||
|
* (c) Add a bonus for matches
|
||||||
|
*
|
||||||
|
* An in-between solution is to allow alignments to be curtailed on the
|
||||||
|
* right-hand side if a better score can be achieved thereby, but not on the
|
||||||
|
* left. For this, we:
|
||||||
|
*
|
||||||
|
* (a) Allow negative scores
|
||||||
|
* (b) Check in any row for a sufficiently high score
|
||||||
|
* (c) Either add a bonus for matches or not (doesn't matter)
|
||||||
|
*
|
||||||
|
* REDUNDANT ALIGNMENTS
|
||||||
|
*
|
||||||
|
* When are two alignments distinct and when are they redundant (not distinct)?
|
||||||
|
* At one extreme, we might say the best alignment from any given dynamic
|
||||||
|
* programming problem is redundant with all other alignments from that
|
||||||
|
* problem. At the other extreme, we might say that any two alignments with
|
||||||
|
* distinct starting points and edits are distinct. The former is probably too
|
||||||
|
* conservative for mate-finding DP problems. The latter is certainly too
|
||||||
|
* permissive, since two alignments that differ only in how gaps are arranged
|
||||||
|
* should not be considered distinct.
|
||||||
|
*
|
||||||
|
* Some in-between solutions are:
|
||||||
|
*
|
||||||
|
* (a) If two alignments share an end point on either end, they are redundant.
|
||||||
|
* Otherwise, they are distinct.
|
||||||
|
* (b) If two alignments share *both* end points, they are redundant.
|
||||||
|
* (c) If two alignments share any cells in the DP table, they are redundant.
|
||||||
|
* (d) Two alignments are redundant if either end is within N positions of the other's
|
||||||
|
* (e) Like (d) but both instead of either
|
||||||
|
* (f, g) Like d, e, but where N is tied to maxgaps somehow
|
||||||
|
*
|
||||||
|
* Why not (a)? One reason is that it's possible for two alignments to have
|
||||||
|
* different start & end positions but share many cells. Consider alignments 1
|
||||||
|
* and 2 below; their end-points are labeled.
|
||||||
|
*
|
||||||
|
* 1 2
|
||||||
|
* \ \
|
||||||
|
* -\
|
||||||
|
* \
|
||||||
|
* \
|
||||||
|
* \
|
||||||
|
* -\
|
||||||
|
* \ \
|
||||||
|
* 1 2
|
||||||
|
*
|
||||||
|
* 1 and 2 are distinct according to (a) but they share many cells in common.
|
||||||
|
*
|
||||||
|
* Why not (f, g)? It fixes the problem with (a) above by forcing the
|
||||||
|
* alignments to be spread so far that they can't possibly share diagonal cells
|
||||||
|
* in common.
|
||||||
|
*/
|
||||||
|
class SwAligner {
|
||||||
|
|
||||||
|
typedef std::pair<size_t, size_t> SizeTPair;
|
||||||
|
|
||||||
|
// States that the aligner can be in
|
||||||
|
enum {
|
||||||
|
STATE_UNINIT, // init() hasn't been called yet
|
||||||
|
STATE_INITED, // init() has been called, but not align()
|
||||||
|
STATE_ALIGNED, // align() has been called
|
||||||
|
};
|
||||||
|
|
||||||
|
const static size_t ALPHA_SIZE = 5;
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
explicit SwAligner() :
|
||||||
|
sseU8fw_(DP_CAT),
|
||||||
|
sseU8rc_(DP_CAT),
|
||||||
|
sseI16fw_(DP_CAT),
|
||||||
|
sseI16rc_(DP_CAT),
|
||||||
|
state_(STATE_UNINIT),
|
||||||
|
initedRead_(false),
|
||||||
|
readSse16_(false),
|
||||||
|
initedRef_(false),
|
||||||
|
rfwbuf_(DP_CAT),
|
||||||
|
btnstack_(DP_CAT),
|
||||||
|
btcells_(DP_CAT),
|
||||||
|
btdiag_(),
|
||||||
|
btncand_(DP_CAT),
|
||||||
|
btncanddone_(DP_CAT),
|
||||||
|
btncanddoneSucc_(0),
|
||||||
|
btncanddoneFail_(0),
|
||||||
|
cper_(),
|
||||||
|
cperMinlen_(),
|
||||||
|
cperPerPow2_(),
|
||||||
|
cperEf_(),
|
||||||
|
cperTri_(),
|
||||||
|
colstop_(0),
|
||||||
|
lastsolcol_(0),
|
||||||
|
cural_(0)
|
||||||
|
ASSERT_ONLY(, cand_tmp_(DP_CAT))
|
||||||
|
{ }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prepare the dynamic programming driver with a new read and a new scoring
|
||||||
|
* scheme.
|
||||||
|
*/
|
||||||
|
void initRead(
|
||||||
|
const BTDnaString& rdfw, // read sequence for fw read
|
||||||
|
const BTDnaString& rdrc, // read sequence for rc read
|
||||||
|
const BTString& qufw, // read qualities for fw read
|
||||||
|
const BTString& qurc, // read qualities for rc read
|
||||||
|
size_t rdi, // offset of first read char to align
|
||||||
|
size_t rdf, // offset of last read char to align
|
||||||
|
const Scoring& sc); // scoring scheme
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize with a new alignment problem.
|
||||||
|
*/
|
||||||
|
void initRef(
|
||||||
|
bool fw, // whether the forward or revcomp read is aligning
|
||||||
|
TRefId refidx, // id of reference aligned against
|
||||||
|
const DPRect& rect, // DP rectangle
|
||||||
|
char *rf, // reference sequence
|
||||||
|
size_t rfi, // offset of first reference char to align to
|
||||||
|
size_t rff, // offset of last reference char to align to
|
||||||
|
TRefOff reflen, // length of reference sequence
|
||||||
|
const Scoring& sc, // scoring scheme
|
||||||
|
TAlScore minsc, // minimum score
|
||||||
|
bool enable8, // use 8-bit SSE if possible?
|
||||||
|
size_t cminlen, // minimum length for using checkpointing scheme
|
||||||
|
size_t cpow2, // interval b/t checkpointed diags; 1 << this
|
||||||
|
bool doTri, // triangular mini-fills?
|
||||||
|
bool extend); // true iff this is a seed extension
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a read, an alignment orientation, a range of characters in a
|
||||||
|
* reference sequence, and a bit-encoded version of the reference,
|
||||||
|
* execute the corresponding dynamic programming problem.
|
||||||
|
*
|
||||||
|
* Here we expect that the caller has already narrowed down the relevant
|
||||||
|
* portion of the reference (e.g. using a seed hit) and all we do is
|
||||||
|
* banded dynamic programming in the vicinity of that portion. This is not
|
||||||
|
* the function to call if we are trying to solve the whole alignment
|
||||||
|
* problem with dynamic programming (that is TODO).
|
||||||
|
*
|
||||||
|
* Returns nothing; the number of Ns up to offset 'upto' is reported in 'nsUpto'.
|
||||||
|
*/
|
||||||
|
void initRef(
|
||||||
|
bool fw, // whether the forward or revcomp read is aligned
|
||||||
|
TRefId refidx, // reference aligned against
|
||||||
|
const DPRect& rect, // DP rectangle
|
||||||
|
const BitPairReference& refs, // Reference strings
|
||||||
|
TRefOff reflen, // length of reference sequence
|
||||||
|
const Scoring& sc, // scoring scheme
|
||||||
|
TAlScore minsc, // minimum alignment score
|
||||||
|
bool enable8, // use 8-bit SSE if possible?
|
||||||
|
size_t cminlen, // minimum length for using checkpointing scheme
|
||||||
|
size_t cpow2, // interval b/t checkpointed diags; 1 << this
|
||||||
|
bool doTri, // triangular mini-fills?
|
||||||
|
bool extend, // true iff this is a seed extension
|
||||||
|
size_t upto, // count the number of Ns up to this offset
|
||||||
|
size_t& nsUpto); // output: the number of Ns up to 'upto'
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a read, an alignment orientation, a range of characters in a
|
||||||
|
* reference sequence, and a bit-encoded version of the reference, set up
|
||||||
|
* and execute the corresponding ungapped alignment problem. There can
|
||||||
|
* only be one solution.
|
||||||
|
*
|
||||||
|
* The caller has already narrowed down the relevant portion of the
|
||||||
|
* reference using, e.g., the location of a seed hit, or the range of
|
||||||
|
* possible fragment lengths if we're searching for the opposite mate in a
|
||||||
|
* pair.
|
||||||
|
*/
|
||||||
|
int ungappedAlign(
|
||||||
|
const BTDnaString& rd, // read sequence (could be RC)
|
||||||
|
const BTString& qu, // qual sequence (could be rev)
|
||||||
|
const Coord& coord, // coordinate aligned to
|
||||||
|
const BitPairReference& refs, // Reference strings
|
||||||
|
size_t reflen, // length of reference sequence
|
||||||
|
const Scoring& sc, // scoring scheme
|
||||||
|
bool ohang, // allow overhang?
|
||||||
|
TAlScore minsc, // minimum score
|
||||||
|
SwResult& res); // put alignment result here
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Align read 'rd' to reference using read & reference information given
|
||||||
|
* last time init() was called. Uses dynamic programming.
|
||||||
|
*/
|
||||||
|
bool align(RandomSource& rnd, TAlScore& best);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Populate the given SwResult with information about the "next best"
|
||||||
|
* alignment if there is one. If there isn't one, false is returned. Note
|
||||||
|
* that false might be returned even though a call to done() would have
|
||||||
|
* returned false.
|
||||||
|
*/
|
||||||
|
bool nextAlignment(
|
||||||
|
SwResult& res,
|
||||||
|
TAlScore minsc,
|
||||||
|
RandomSource& rnd);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Print out an alignment result as an ASCII DP table.
|
||||||
|
*/
|
||||||
|
void printResultStacked(
|
||||||
|
const SwResult& res,
|
||||||
|
std::ostream& os)
|
||||||
|
{
|
||||||
|
res.alres.printStacked(*rd_, os);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff there are no more solution cells to backtrace from.
|
||||||
|
* Note that this may return false in situations where there are actually
|
||||||
|
* no more solutions, but that hasn't been discovered yet.
|
||||||
|
*/
|
||||||
|
bool done() const {
|
||||||
|
assert(initedRead() && initedRef());
|
||||||
|
return cural_ == btncand_.size();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this SwAligner has been initialized with a reference to align against.
|
||||||
|
*/
|
||||||
|
inline bool initedRef() const { return initedRef_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this SwAligner has been initialized with a read to align.
|
||||||
|
*/
|
||||||
|
inline bool initedRead() const { return initedRead_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reset, signaling that we're done with this dynamic programming problem
|
||||||
|
* and won't be asking for any more alignments.
|
||||||
|
*/
|
||||||
|
inline void reset() { initedRef_ = initedRead_ = false; }
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Check that aligner is internally consistent.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert_gt(dpRows(), 0);
|
||||||
|
// Check btncand_
|
||||||
|
for(size_t i = 0; i < btncand_.size(); i++) {
|
||||||
|
assert(btncand_[i].repOk());
|
||||||
|
assert_geq(btncand_[i].score, minsc_);
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of alignments given out so far by nextAlignment().
|
||||||
|
*/
|
||||||
|
size_t numAlignmentsReported() const { return cural_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Merge tallies in the counters related to filling the DP table.
|
||||||
|
*/
|
||||||
|
void merge(
|
||||||
|
SSEMetrics& sseU8ExtendMet,
|
||||||
|
SSEMetrics& sseU8MateMet,
|
||||||
|
SSEMetrics& sseI16ExtendMet,
|
||||||
|
SSEMetrics& sseI16MateMet,
|
||||||
|
uint64_t& nbtfiltst,
|
||||||
|
uint64_t& nbtfiltsc,
|
||||||
|
uint64_t& nbtfiltdo)
|
||||||
|
{
|
||||||
|
sseU8ExtendMet.merge(sseU8ExtendMet_);
|
||||||
|
sseU8MateMet.merge(sseU8MateMet_);
|
||||||
|
sseI16ExtendMet.merge(sseI16ExtendMet_);
|
||||||
|
sseI16MateMet.merge(sseI16MateMet_);
|
||||||
|
nbtfiltst += nbtfiltst_;
|
||||||
|
nbtfiltsc += nbtfiltsc_;
|
||||||
|
nbtfiltdo += nbtfiltdo_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reset all the counters related to filling in the DP table to 0.
|
||||||
|
*/
|
||||||
|
void resetCounters() {
|
||||||
|
sseU8ExtendMet_.reset();
|
||||||
|
sseU8MateMet_.reset();
|
||||||
|
sseI16ExtendMet_.reset();
|
||||||
|
sseI16MateMet_.reset();
|
||||||
|
nbtfiltst_ = nbtfiltsc_ = nbtfiltdo_ = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the size of the DP problem.
|
||||||
|
*/
|
||||||
|
size_t size() const {
|
||||||
|
return dpRows() * (rff_ - rfi_);
|
||||||
|
}
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of rows that will be in the dynamic programming table.
|
||||||
|
*/
|
||||||
|
inline size_t dpRows() const {
|
||||||
|
assert(initedRead_);
|
||||||
|
return rdf_ - rdi_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Align nucleotides from read 'rd' to the reference string 'rf' using
|
||||||
|
* vector instructions. Return the score of the best alignment found, or
|
||||||
|
* the minimum integer if an alignment could not be found. Flag is set to
|
||||||
|
* 0 if an alignment is found, -1 if no valid alignment is found, or -2 if
|
||||||
|
* the score saturated at any point during alignment.
|
||||||
|
*/
|
||||||
|
TAlScore alignNucleotidesEnd2EndSseU8( // unsigned 8-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignNucleotidesLocalSseU8( // unsigned 8-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignNucleotidesEnd2EndSseI16( // signed 16-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignNucleotidesLocalSseI16( // signed 16-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Aligns by filling a dynamic programming matrix with the SSE-accelerated,
|
||||||
|
* banded DP approach of Farrar. As it goes, it determines which cells we
|
||||||
|
* might backtrace from and tallies the best (highest-scoring) N backtrace
|
||||||
|
* candidate cells per diagonal. Also returns the alignment score of the best
|
||||||
|
* alignment in the matrix.
|
||||||
|
*
|
||||||
|
* This routine does *not* maintain a matrix holding the entire matrix worth of
|
||||||
|
* scores, nor does it maintain any other dense O(mn) data structure, as this
|
||||||
|
* would quickly exhaust memory for queries longer than about 10,000 kb.
|
||||||
|
* Instead, in the fill stage it maintains two columns worth of scores at a
|
||||||
|
* time (current/previous, or right/left) - these take O(m) space. When
|
||||||
|
* finished with the current column, it determines which cells from the
|
||||||
|
* previous column, if any, are candidates we might backtrace from to find a
|
||||||
|
* full alignment. A candidate cell has a score that rises above the threshold
|
||||||
|
* and isn't improved upon by a match in the next column. The best N
|
||||||
|
* candidates per diagonal are stored in an O(m + n) data structure.
|
||||||
|
*/
|
||||||
|
TAlScore alignGatherEE8( // unsigned 8-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignGatherLoc8( // unsigned 8-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignGatherEE16( // signed 16-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
TAlScore alignGatherLoc16( // signed 16-bit elements
|
||||||
|
int& flag, bool debug);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Build query profile look up tables for the read. The query profile look
|
||||||
|
* up table is organized as a 1D array indexed by [i][j] where i is the
|
||||||
|
* reference character in the current DP column (0=A, 1=C, etc), and j is
|
||||||
|
* the segment of the query we're currently working on.
|
||||||
|
*/
|
||||||
|
void buildQueryProfileEnd2EndSseU8(bool fw);
|
||||||
|
void buildQueryProfileLocalSseU8(bool fw);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Build query profile look up tables for the read. The query profile look
|
||||||
|
* up table is organized as a 1D array indexed by [i][j] where i is the
|
||||||
|
* reference character in the current DP column (0=A, 1=C, etc), and j is
|
||||||
|
* the segment of the query we're currently working on.
|
||||||
|
*/
|
||||||
|
void buildQueryProfileEnd2EndSseI16(bool fw);
|
||||||
|
void buildQueryProfileLocalSseI16(bool fw);
|
||||||
|
|
||||||
|
bool gatherCellsNucleotidesLocalSseU8(TAlScore best);
|
||||||
|
bool gatherCellsNucleotidesEnd2EndSseU8(TAlScore best);
|
||||||
|
|
||||||
|
bool gatherCellsNucleotidesLocalSseI16(TAlScore best);
|
||||||
|
bool gatherCellsNucleotidesEnd2EndSseI16(TAlScore best);
|
||||||
|
|
||||||
|
bool backtraceNucleotidesLocalSseU8(
|
||||||
|
TAlScore escore, // in: expected score
|
||||||
|
SwResult& res, // out: store results (edits and scores) here
|
||||||
|
size_t& off, // out: store diagonal projection of origin
|
||||||
|
size_t& nbts, // out: # backtracks
|
||||||
|
size_t row, // start in this rectangle row
|
||||||
|
size_t col, // start in this rectangle column
|
||||||
|
RandomSource& rand); // random gen, to choose among equal paths
|
||||||
|
|
||||||
|
bool backtraceNucleotidesLocalSseI16(
|
||||||
|
TAlScore escore, // in: expected score
|
||||||
|
SwResult& res, // out: store results (edits and scores) here
|
||||||
|
size_t& off, // out: store diagonal projection of origin
|
||||||
|
size_t& nbts, // out: # backtracks
|
||||||
|
size_t row, // start in this rectangle row
|
||||||
|
size_t col, // start in this rectangle column
|
||||||
|
RandomSource& rand); // random gen, to choose among equal paths
|
||||||
|
|
||||||
|
bool backtraceNucleotidesEnd2EndSseU8(
|
||||||
|
TAlScore escore, // in: expected score
|
||||||
|
SwResult& res, // out: store results (edits and scores) here
|
||||||
|
size_t& off, // out: store diagonal projection of origin
|
||||||
|
size_t& nbts, // out: # backtracks
|
||||||
|
size_t row, // start in this rectangle row
|
||||||
|
size_t col, // start in this rectangle column
|
||||||
|
RandomSource& rand); // random gen, to choose among equal paths
|
||||||
|
|
||||||
|
bool backtraceNucleotidesEnd2EndSseI16(
|
||||||
|
TAlScore escore, // in: expected score
|
||||||
|
SwResult& res, // out: store results (edits and scores) here
|
||||||
|
size_t& off, // out: store diagonal projection of origin
|
||||||
|
size_t& nbts, // out: # backtracks
|
||||||
|
size_t row, // start in this rectangle row
|
||||||
|
size_t col, // start in this rectangle column
|
||||||
|
RandomSource& rand); // random gen, to choose among equal paths
|
||||||
|
|
||||||
|
bool backtrace(
|
||||||
|
TAlScore escore, // in: expected score
|
||||||
|
bool fill, // in: use mini-fill?
|
||||||
|
bool usecp, // in: use checkpoints?
|
||||||
|
SwResult& res, // out: store results (edits and scores) here
|
||||||
|
size_t& off, // out: store diagonal projection of origin
|
||||||
|
size_t row, // start in this rectangle row
|
||||||
|
size_t col, // start in this rectangle column
|
||||||
|
size_t maxiter,// max # extensions to try
|
||||||
|
size_t& niter, // # extensions tried
|
||||||
|
RandomSource& rnd) // random gen, to choose among equal paths
|
||||||
|
{
|
||||||
|
bter_.initBt(
|
||||||
|
escore, // in: alignment score
|
||||||
|
row, // in: start in this row
|
||||||
|
col, // in: start in this column
|
||||||
|
fill, // in: use mini-fill?
|
||||||
|
usecp, // in: use checkpoints?
|
||||||
|
cperTri_, // in: triangle-shaped mini-fills?
|
||||||
|
rnd); // in: random gen, to choose among equal paths
|
||||||
|
assert(bter_.inited());
|
||||||
|
size_t nrej = 0;
|
||||||
|
if(bter_.emptySolution()) {
|
||||||
|
return false;
|
||||||
|
} else {
|
||||||
|
return bter_.nextAlignment(maxiter, res, off, nrej, niter, rnd);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const BTDnaString *rd_; // read sequence
|
||||||
|
const BTString *qu_; // read qualities
|
||||||
|
const BTDnaString *rdfw_; // read sequence for fw read
|
||||||
|
const BTDnaString *rdrc_; // read sequence for rc read
|
||||||
|
const BTString *qufw_; // read qualities for fw read
|
||||||
|
const BTString *qurc_; // read qualities for rc read
|
||||||
|
TReadOff rdi_; // offset of first read char to align
|
||||||
|
TReadOff rdf_; // offset of last read char to align (excl)
|
||||||
|
bool fw_; // true iff read sequence is original fw read
|
||||||
|
TRefId refidx_; // id of reference aligned against
|
||||||
|
TRefOff reflen_; // length of entire reference sequence
|
||||||
|
const DPRect* rect_; // DP rectangle
|
||||||
|
char *rf_; // reference sequence
|
||||||
|
TRefOff rfi_; // offset of first ref char to align to
|
||||||
|
TRefOff rff_; // offset of last ref char to align to (excl)
|
||||||
|
size_t rdgap_; // max # gaps in read
|
||||||
|
size_t rfgap_; // max # gaps in reference
|
||||||
|
bool enable8_;// enable 8-bit sse
|
||||||
|
bool extend_; // true iff this is a seed-extend problem
|
||||||
|
const Scoring *sc_; // penalties for edit types
|
||||||
|
TAlScore minsc_; // penalty ceiling for valid alignments
|
||||||
|
int nceil_; // max # Ns allowed in ref portion of aln
|
||||||
|
|
||||||
|
bool sse8succ_; // whether 8-bit worked
|
||||||
|
bool sse16succ_; // whether 16-bit worked
|
||||||
|
SSEData sseU8fw_; // buf for fw query, 8-bit score
|
||||||
|
SSEData sseU8rc_; // buf for rc query, 8-bit score
|
||||||
|
SSEData sseI16fw_; // buf for fw query, 16-bit score
|
||||||
|
SSEData sseI16rc_; // buf for rc query, 16-bit score
|
||||||
|
bool sseU8fwBuilt_; // built fw query profile, 8-bit score
|
||||||
|
bool sseU8rcBuilt_; // built rc query profile, 8-bit score
|
||||||
|
bool sseI16fwBuilt_; // built fw query profile, 16-bit score
|
||||||
|
bool sseI16rcBuilt_; // built rc query profile, 16-bit score
|
||||||
|
|
||||||
|
SSEMetrics sseU8ExtendMet_;
|
||||||
|
SSEMetrics sseU8MateMet_;
|
||||||
|
SSEMetrics sseI16ExtendMet_;
|
||||||
|
SSEMetrics sseI16MateMet_;
|
||||||
|
|
||||||
|
int state_; // state
|
||||||
|
bool initedRead_; // true iff initialized with initRead
|
||||||
|
bool readSse16_; // true -> sse16 from now on for read
|
||||||
|
bool initedRef_; // true iff initialized with initRef
|
||||||
|
EList<uint32_t> rfwbuf_; // buffer for wordized ref stretches
|
||||||
|
|
||||||
|
EList<DpNucFrame> btnstack_; // backtrace stack for nucleotides
|
||||||
|
EList<SizeTPair> btcells_; // cells involved in current backtrace
|
||||||
|
|
||||||
|
NBest<DpBtCandidate> btdiag_; // per-diagonal backtrace candidates
|
||||||
|
EList<DpBtCandidate> btncand_; // cells we might backtrace from
|
||||||
|
EList<DpBtCandidate> btncanddone_; // candidates that we investigated
|
||||||
|
size_t btncanddoneSucc_; // # investigated and succeeded
|
||||||
|
size_t btncanddoneFail_; // # investigated and failed
|
||||||
|
|
||||||
|
BtBranchTracer bter_; // backtracer
|
||||||
|
|
||||||
|
Checkpointer cper_; // structure for saving checkpoint cells
|
||||||
|
size_t cperMinlen_; // minimum length for using checkpointer
|
||||||
|
size_t cperPerPow2_; // checkpoint every 1 << perpow2 diags (& next)
|
||||||
|
bool cperEf_; // store E and F in addition to H?
|
||||||
|
bool cperTri_; // checkpoint for triangular mini-fills?
|
||||||
|
|
||||||
|
size_t colstop_; // bailed on DP loop after this many cols
|
||||||
|
size_t lastsolcol_; // last DP col with valid cell
|
||||||
|
size_t cural_; // index of next alignment to be given
|
||||||
|
|
||||||
|
uint64_t nbtfiltst_; // # candidates filtered b/c starting cell was seen
|
||||||
|
uint64_t nbtfiltsc_; // # candidates filtered b/c score uninteresting
|
||||||
|
uint64_t nbtfiltdo_; // # candidates filtered b/c dominated by other cell
|
||||||
|
|
||||||
|
ASSERT_ONLY(SStringExpandable<uint32_t> tmp_destU32_);
|
||||||
|
ASSERT_ONLY(BTDnaString tmp_editstr_, tmp_refstr_);
|
||||||
|
ASSERT_ONLY(EList<DpBtCandidate> cand_tmp_);
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /*ALIGNER_SW_H_*/
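The "LOCAL VS GLOBAL" rules in the SwAligner comment, together with the two-column fill described for the gather routines, can be illustrated with a small scalar sketch. Everything here is an assumption made for illustration: `DpMode` and `bestScore` are hypothetical names, gaps use a single linear penalty rather than the affine E/F/H scheme, no SSE is involved, and the default penalties (3 per mismatch, 5 per gap) are arbitrary. The read is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// The three switches named in the comment: (a) clamp scores to 0,
// (b) look for solutions in any row or only the last, (c) match bonus.
struct DpMode {
    bool clampToZero;
    bool checkAnyRow;
    int  matchBonus;
};

// Fill the DP table column by column, keeping only the previous and the
// current column (O(m) space for a read of length m). An alignment may
// start at any reference column, so row 0 is always 0.
int bestScore(const std::string& read, const std::string& ref,
              const DpMode& mode, int mmPen = 3, int gapPen = 5) {
    const size_t m = read.size();
    std::vector<int> prev(m + 1), cur(m + 1);
    auto clamp = [&](int v) { return mode.clampToZero ? std::max(v, 0) : v; };
    int best = INT_MIN;
    // Record candidate solution cells from one finished column
    auto consider = [&](const std::vector<int>& col) {
        if (mode.checkAnyRow) {
            for (size_t i = 1; i <= m; i++) best = std::max(best, col[i]);
        } else {
            best = std::max(best, col[m]);
        }
    };
    // Column 0: no reference characters consumed yet
    prev[0] = 0;
    for (size_t i = 1; i <= m; i++) prev[i] = clamp(prev[i - 1] - gapPen);
    consider(prev);
    for (size_t j = 1; j <= ref.size(); j++) {
        cur[0] = 0;
        for (size_t i = 1; i <= m; i++) {
            int sub  = (read[i - 1] == ref[j - 1]) ? mode.matchBonus : -mmPen;
            int diag = prev[i - 1] + sub;     // match or mismatch
            int up   = cur[i - 1] - gapPen;   // read char against a gap
            int left = prev[i] - gapPen;      // ref char against a gap
            cur[i] = clamp(std::max({diag, up, left}));
        }
        consider(cur);
        std::swap(prev, cur);
    }
    return best;
}
```

With these definitions, `DpMode{false, false, 0}` corresponds to the global (end-to-end) rules, `DpMode{true, true, 2}` to the local rules, and `DpMode{false, true, 0}` to the in-between option where the alignment may be curtailed on the right.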
|
305
aligner_sw_common.h
Normal file
@ -0,0 +1,305 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_SW_COMMON_H_
|
||||||
|
#define ALIGNER_SW_COMMON_H_
|
||||||
|
|
||||||
|
#include "aligner_result.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates the result of a dynamic programming alignment, including
|
||||||
|
* colorspace alignments. In our case, the result is a combination of:
|
||||||
|
*
|
||||||
|
* 1. All the nucleotide edits
|
||||||
|
* 2. All the "edits" where an ambiguous reference char is resolved to
|
||||||
|
* an unambiguous char.
|
||||||
|
* 3. All the color edits (if applicable)
|
||||||
|
* 4. All the color miscalls (if applicable). This is a subset of 3.
|
||||||
|
* 5. The score of the best alignment
|
||||||
|
* 6. The score of the second-best alignment
|
||||||
|
*
|
||||||
|
* Having scores for the best and second-best alignments gives us an
|
||||||
|
* idea of where gaps may make reassembly beneficial.
|
||||||
|
*/
|
||||||
|
struct SwResult {
|
||||||
|
|
||||||
|
SwResult() :
|
||||||
|
alres(),
|
||||||
|
sws(0),
|
||||||
|
swcups(0),
|
||||||
|
swrows(0),
|
||||||
|
swskiprows(0),
|
||||||
|
swskip(0),
|
||||||
|
swsucc(0),
|
||||||
|
swfail(0),
|
||||||
|
swbts(0)
|
||||||
|
{ }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Clear all contents.
|
||||||
|
*/
|
||||||
|
void reset() {
|
||||||
|
sws = swcups = swrows = swskiprows = swskip = swsucc =
|
||||||
|
swfail = swbts = 0;
|
||||||
|
alres.reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Reverse all edit lists.
|
||||||
|
*/
|
||||||
|
void reverse() {
|
||||||
|
alres.reverseEdits();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff no result has been installed.
|
||||||
|
*/
|
||||||
|
bool empty() const {
|
||||||
|
return alres.empty();
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Check that result is internally consistent.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert(alres.repOk());
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Check that result is internally consistent w/r/t read.
|
||||||
|
*/
|
||||||
|
bool repOk(const Read& rd) const {
|
||||||
|
assert(alres.repOk(rd));
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
AlnRes alres;
|
||||||
|
uint64_t sws; // # DP problems solved
|
||||||
|
uint64_t swcups; // # DP cell updates
|
||||||
|
uint64_t swrows; // # DP row updates
|
||||||
|
uint64_t swskiprows; // # skipped DP row updates (b/c no valid alignments can go thru row)
|
||||||
|
uint64_t swskip; // # DP problems skipped by sse filter
|
||||||
|
uint64_t swsucc; // # DP problems resulting in alignment
|
||||||
|
uint64_t swfail; // # DP problems not resulting in alignment
|
||||||
|
uint64_t swbts; // # DP backtrace steps
|
||||||
|
|
||||||
|
int nup; // upstream decoded nucleotide; for colorspace reads
|
||||||
|
int ndn; // downstream decoded nucleotide; for colorspace reads
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates counters that measure how much work has been done by
|
||||||
|
* the dynamic programming driver and aligner.
|
||||||
|
*/
|
||||||
|
struct SwMetrics {
|
||||||
|
|
||||||
|
SwMetrics() : mutex_m() {
|
||||||
|
reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
void reset() {
|
||||||
|
sws = swcups = swrows = swskiprows = swskip = swsucc = swfail = swbts =
|
||||||
|
sws10 = sws5 = sws3 =
|
||||||
|
rshit = ungapsucc = ungapfail = ungapnodec = 0;
|
||||||
|
exatts = exranges = exrows = exsucc = exooms = 0;
|
||||||
|
mm1atts = mm1ranges = mm1rows = mm1succ = mm1ooms = 0;
|
||||||
|
sdatts = sdranges = sdrows = sdsucc = sdooms = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void init(
|
||||||
|
uint64_t sws_,
|
||||||
|
uint64_t sws10_,
|
||||||
|
uint64_t sws5_,
|
||||||
|
uint64_t sws3_,
|
||||||
|
uint64_t swcups_,
|
||||||
|
uint64_t swrows_,
|
||||||
|
uint64_t swskiprows_,
|
||||||
|
uint64_t swskip_,
|
||||||
|
uint64_t swsucc_,
|
||||||
|
uint64_t swfail_,
|
||||||
|
uint64_t swbts_,
|
||||||
|
uint64_t rshit_,
|
||||||
|
uint64_t ungapsucc_,
|
||||||
|
uint64_t ungapfail_,
|
||||||
|
uint64_t ungapnodec_,
|
||||||
|
uint64_t exatts_,
|
||||||
|
uint64_t exranges_,
|
||||||
|
uint64_t exrows_,
|
||||||
|
uint64_t exsucc_,
|
||||||
|
uint64_t exooms_,
|
||||||
|
uint64_t mm1atts_,
|
||||||
|
uint64_t mm1ranges_,
|
||||||
|
uint64_t mm1rows_,
|
||||||
|
uint64_t mm1succ_,
|
||||||
|
uint64_t mm1ooms_,
|
||||||
|
uint64_t sdatts_,
|
||||||
|
uint64_t sdranges_,
|
||||||
|
uint64_t sdrows_,
|
||||||
|
uint64_t sdsucc_,
|
||||||
|
uint64_t sdooms_)
|
||||||
|
{
|
||||||
|
sws = sws_;
|
||||||
|
sws10 = sws10_;
|
||||||
|
sws5 = sws5_;
|
||||||
|
sws3 = sws3_;
|
||||||
|
swcups = swcups_;
|
||||||
|
swrows = swrows_;
|
||||||
|
swskiprows = swskiprows_;
|
||||||
|
swskip = swskip_;
|
||||||
|
swsucc = swsucc_;
|
||||||
|
swfail = swfail_;
|
||||||
|
swbts = swbts_;
rshit = rshit_; // previously accepted as a parameter but never stored
|
||||||
|
ungapsucc = ungapsucc_;
|
||||||
|
ungapfail = ungapfail_;
|
||||||
|
ungapnodec = ungapnodec_;
|
||||||
|
|
||||||
|
// Exact end-to-end attempts
|
||||||
|
exatts = exatts_;
|
||||||
|
exranges = exranges_;
|
||||||
|
exrows = exrows_;
|
||||||
|
exsucc = exsucc_;
|
||||||
|
exooms = exooms_;
|
||||||
|
|
||||||
|
// 1-mismatch end-to-end attempts
|
||||||
|
mm1atts = mm1atts_;
|
||||||
|
mm1ranges = mm1ranges_;
|
||||||
|
mm1rows = mm1rows_;
|
||||||
|
mm1succ = mm1succ_;
|
||||||
|
mm1ooms = mm1ooms_;
|
||||||
|
|
||||||
|
// Seed attempts
|
||||||
|
sdatts = sdatts_;
|
||||||
|
sdranges = sdranges_;
|
||||||
|
sdrows = sdrows_;
|
||||||
|
sdsucc = sdsucc_;
|
||||||
|
sdooms = sdooms_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Merge (add) the counters in the given SwResult object into this
|
||||||
|
* SwMetrics object.
|
||||||
|
*/
|
||||||
|
void update(const SwResult& r) {
|
||||||
|
sws += r.sws;
|
||||||
|
swcups += r.swcups;
|
||||||
|
swrows += r.swrows;
|
||||||
|
swskiprows += r.swskiprows;
|
||||||
|
swskip += r.swskip;
|
||||||
|
swsucc += r.swsucc;
|
||||||
|
swfail += r.swfail;
|
||||||
|
swbts += r.swbts;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Merge (add) the counters in the given SwMetrics object into this
|
||||||
|
* object. This is the only safe way to update a SwMetrics shared
|
||||||
|
* by multiple threads, and only when getLock is true (the default is false).
|
||||||
|
*/
|
||||||
|
void merge(const SwMetrics& r, bool getLock = false) {
|
||||||
|
ThreadSafe ts(&mutex_m, getLock);
|
||||||
|
sws += r.sws;
|
||||||
|
sws10 += r.sws10;
|
||||||
|
sws5 += r.sws5;
|
||||||
|
sws3 += r.sws3;
|
||||||
|
swcups += r.swcups;
|
||||||
|
swrows += r.swrows;
|
||||||
|
swskiprows += r.swskiprows;
|
||||||
|
swskip += r.swskip;
|
||||||
|
swsucc += r.swsucc;
|
||||||
|
swfail += r.swfail;
|
||||||
|
swbts += r.swbts;
|
||||||
|
rshit += r.rshit;
|
||||||
|
ungapsucc += r.ungapsucc;
|
||||||
|
ungapfail += r.ungapfail;
|
||||||
|
ungapnodec += r.ungapnodec;
|
||||||
|
exatts += r.exatts;
|
||||||
|
exranges += r.exranges;
|
||||||
|
exrows += r.exrows;
|
||||||
|
exsucc += r.exsucc;
|
||||||
|
exooms += r.exooms;
|
||||||
|
mm1atts += r.mm1atts;
|
||||||
|
mm1ranges += r.mm1ranges;
|
||||||
|
mm1rows += r.mm1rows;
|
||||||
|
mm1succ += r.mm1succ;
|
||||||
|
mm1ooms += r.mm1ooms;
|
||||||
|
sdatts += r.sdatts;
|
||||||
|
sdranges += r.sdranges;
|
||||||
|
sdrows += r.sdrows;
|
||||||
|
sdsucc += r.sdsucc;
|
||||||
|
sdooms += r.sdooms;
|
||||||
|
}
|
||||||
|
|
||||||
|
void tallyGappedDp(size_t readGaps, size_t refGaps) {
|
||||||
|
size_t mx = max(readGaps, refGaps);
|
||||||
|
if(mx < 10) sws10++;
|
||||||
|
if(mx < 5) sws5++;
|
||||||
|
if(mx < 3) sws3++;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint64_t sws; // # DP problems solved
|
||||||
|
uint64_t sws10; // # DP problems solved where max gaps < 10
|
||||||
|
uint64_t sws5; // # DP problems solved where max gaps < 5
|
||||||
|
uint64_t sws3; // # DP problems solved where max gaps < 3
|
||||||
|
uint64_t swcups; // # DP cell updates
|
||||||
|
uint64_t swrows; // # DP row updates
|
||||||
|
uint64_t swskiprows; // # skipped DP rows (b/c no valid alns go thru row)
|
||||||
|
uint64_t swskip; // # DP problems skipped by sse filter
|
||||||
|
uint64_t swsucc; // # DP problems resulting in alignment
|
||||||
|
uint64_t swfail; // # DP problems not resulting in alignment
|
||||||
|
uint64_t swbts; // # DP backtrace steps
|
||||||
|
uint64_t rshit; // # DP problems avoided b/c seed hit was redundant
|
||||||
|
uint64_t ungapsucc; // # ungapped alignment attempts that succeeded
|
||||||
|
uint64_t ungapfail; // # ungapped alignment attempts that failed
|
||||||
|
uint64_t ungapnodec; // # ungapped alignment attempts that reached no decision
|
||||||
|
|
||||||
|
uint64_t exatts; // total # attempts at exact-hit end-to-end aln
|
||||||
|
uint64_t exranges; // total # ranges returned by exact-hit queries
|
||||||
|
uint64_t exrows; // total # rows returned by exact-hit queries
|
||||||
|
uint64_t exsucc; // exact-hit yielded non-empty result
|
||||||
|
uint64_t exooms; // exact-hit offset memory exhausted
|
||||||
|
|
||||||
|
uint64_t mm1atts; // total # attempts at 1mm end-to-end aln
|
||||||
|
uint64_t mm1ranges; // total # ranges returned by 1mm-hit queries
|
||||||
|
uint64_t mm1rows; // total # rows returned by 1mm-hit queries
|
||||||
|
uint64_t mm1succ; // 1mm-hit yielded non-empty result
|
||||||
|
uint64_t mm1ooms; // 1mm-hit offset memory exhausted
|
||||||
|
|
||||||
|
uint64_t sdatts; // total # attempts to find seed alignments
|
||||||
|
uint64_t sdranges; // total # seed-alignment ranges found
|
||||||
|
uint64_t sdrows; // total # seed-alignment rows found
|
||||||
|
uint64_t sdsucc; // # times seed alignment yielded >= 1 hit
|
||||||
|
uint64_t sdooms; // # times an OOM occurred during seed alignment
|
||||||
|
|
||||||
|
MUTEX_T mutex_m;
|
||||||
|
};
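The counter pattern used by SwMetrics can be tried in isolation. The sketch below is a simplified stand-in, not the struct above: `GapCounters`, its field names, and the use of `std::mutex` in place of `MUTEX_T` / `ThreadSafe` are assumptions made for illustration. It shows two properties that are easy to miss when reading the original: the `< 10`, `< 5` and `< 3` buckets are cumulative rather than disjoint, and `merge` only takes the lock when the caller asks for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical, trimmed-down analogue of SwMetrics (illustration only).
struct GapCounters {
    uint64_t lt10 = 0; // # problems where max(readGaps, refGaps) < 10
    uint64_t lt5  = 0; // # problems where max(readGaps, refGaps) < 5
    uint64_t lt3  = 0; // # problems where max(readGaps, refGaps) < 3
    std::mutex mu;     // stands in for MUTEX_T mutex_m

    // Same shape as tallyGappedDp: a problem with 2 gaps bumps all three
    // counters, so lt3 <= lt5 <= lt10 always holds.
    void tally(size_t readGaps, size_t refGaps) {
        const size_t mx = std::max(readGaps, refGaps);
        if(mx < 10) lt10++;
        if(mx < 5)  lt5++;
        if(mx < 3)  lt3++;
    }

    // Add another object's counters into this one. The lock is acquired
    // only when getLock is true, mirroring the optional locking in merge().
    void merge(const GapCounters& o, bool getLock = false) {
        std::unique_lock<std::mutex> lk(mu, std::defer_lock);
        if(getLock) lk.lock();
        lt10 += o.lt10;
        lt5  += o.lt5;
        lt3  += o.lt3;
    }
};
```

A typical use, under these assumptions, is for each worker to tally into its own private `GapCounters` and to call `shared.merge(local, true)` once at the end, so the lock is contended once per worker rather than once per DP problem.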
|
||||||
|
|
||||||
|
// The various ways that one might backtrack from a later cell (either oall,
|
||||||
|
// rdgap or rfgap) to an earlier cell
|
||||||
|
enum {
|
||||||
|
SW_BT_OALL_DIAG, // from oall cell to oall cell
|
||||||
|
SW_BT_OALL_REF_OPEN, // from oall cell to oall cell, opening a reference gap
|
||||||
|
SW_BT_OALL_READ_OPEN, // from oall cell to oall cell, opening a read gap
|
||||||
|
SW_BT_RDGAP_EXTEND, // from rdgap cell to rdgap cell
|
||||||
|
SW_BT_RFGAP_EXTEND // from rfgap cell to rfgap cell
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif /*def ALIGNER_SW_COMMON_H_*/
|
20
aligner_sw_driver.cpp
Normal file
@ -0,0 +1,20 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
2938
aligner_sw_driver.h
Normal file
File diff suppressed because it is too large
262
aligner_sw_nuc.h
Normal file
@ -0,0 +1,262 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_SW_NUC_H_
|
||||||
|
#define ALIGNER_SW_NUC_H_
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "aligner_sw_common.h"
|
||||||
|
#include "aligner_result.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates a backtrace stack frame. Includes enough information that we
|
||||||
|
* can "pop" back up to this frame and choose to make a different backtracking
|
||||||
|
* decision. The information included is:
|
||||||
|
*
|
||||||
|
* 1. The mask at the decision point. When we first move through the mask and
|
||||||
|
* when we backtrack to it, we're careful to mask out the bit corresponding
|
||||||
|
* to the path we're taking. When we move through it after removing the
|
||||||
|
* last bit from the mask, we're careful to pop it from the stack.
|
||||||
|
* 2. The sizes of the edit lists. When we backtrack, we resize the lists back
|
||||||
|
* down to these sizes to get rid of any edits introduced since the branch
|
||||||
|
* point.
|
||||||
|
*/
|
||||||
|
struct DpNucFrame {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize a new DpNucFrame stack frame.
|
||||||
|
*/
|
||||||
|
void init(
|
||||||
|
size_t nedsz_,
|
||||||
|
size_t aedsz_,
|
||||||
|
size_t celsz_,
|
||||||
|
size_t row_,
|
||||||
|
size_t col_,
|
||||||
|
size_t gaps_,
|
||||||
|
size_t readGaps_,
|
||||||
|
size_t refGaps_,
|
||||||
|
AlnScore score_,
|
||||||
|
int ct_)
|
||||||
|
{
|
||||||
|
nedsz = nedsz_;
|
||||||
|
aedsz = aedsz_;
|
||||||
|
celsz = celsz_;
|
||||||
|
row = row_;
|
||||||
|
col = col_;
|
||||||
|
gaps = gaps_;
|
||||||
|
readGaps = readGaps_;
|
||||||
|
refGaps = refGaps_;
|
||||||
|
score = score_;
|
||||||
|
ct = ct_;
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t nedsz; // size of the nucleotide edit list at branch (before
|
||||||
|
// adding the branch edit)
|
||||||
|
size_t aedsz; // size of ambiguous nucleotide edit list at branch
|
||||||
|
size_t celsz; // size of cell-traversed list at branch
|
||||||
|
size_t row; // row of cell where branch occurred
|
||||||
|
size_t col; // column of cell where branch occurred
|
||||||
|
size_t gaps; // number of gaps before branch occurred
|
||||||
|
size_t readGaps; // number of read gaps before branch occurred
|
||||||
|
size_t refGaps; // number of ref gaps before branch occurred
|
||||||
|
AlnScore score; // score where branch occurred
|
||||||
|
int ct; // table type (oall, rdgap or rfgap)
|
||||||
|
};
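The role of the saved list sizes in a backtrace frame can be illustrated with a much smaller model. The sketch below is an assumption-laden toy, not the aligner's backtrace: `ToyEdit`, `ToyFrame` and `ToyBacktracer` are hypothetical names, only one edit list is tracked, and a frame is popped as soon as it is revisited instead of when its decision mask runs out of bits. What it does show is the core idea from the comment above: record the edit-list size at the branch, then truncate back to that size when the branch is abandoned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real edit and frame types carry more fields.
struct ToyEdit  { size_t pos; char chr; };
struct ToyFrame { size_t nedsz; size_t row; size_t col; };

struct ToyBacktracer {
    std::vector<ToyEdit>  edits;  // edits accumulated along the current path
    std::vector<ToyFrame> frames; // stack of branch points
    size_t row = 0;
    size_t col = 0;

    // Remember the current cell and edit-list size before choosing a move.
    void branch() {
        frames.push_back(ToyFrame{edits.size(), row, col});
    }

    void moveTo(size_t r, size_t c) { row = r; col = c; }

    void recordEdit(size_t pos, char chr) {
        edits.push_back(ToyEdit{pos, chr});
    }

    // Return to the most recent branch point, discarding every edit added
    // since then. Returns false when there is nowhere left to go back to.
    bool backtrack() {
        if(frames.empty()) return false;
        const ToyFrame f = frames.back();
        frames.pop_back();
        edits.resize(f.nedsz); // shrink back to the size saved at the branch
        row = f.row;
        col = f.col;
        return true;
    }
};
```

Truncating to a saved size is cheaper than copying the whole edit list into every frame, which is presumably why the frame stores sizes rather than lists.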
|
||||||
|
|
||||||
|
enum {
|
||||||
|
BT_CAND_FATE_SUCCEEDED = 1,
|
||||||
|
BT_CAND_FATE_FAILED,
|
||||||
|
BT_CAND_FATE_FILT_START, // skipped b/c starting cell already explored
|
||||||
|
BT_CAND_FATE_FILT_DOMINATED, // skipped b/c it was dominated
|
||||||
|
BT_CAND_FATE_FILT_SCORE // skipped b/c score not interesting anymore
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates a cell that we might want to backtrace from.
|
||||||
|
*/
|
||||||
|
struct DpBtCandidate {
|
||||||
|
|
||||||
|
DpBtCandidate() { reset(); }
|
||||||
|
|
||||||
|
DpBtCandidate(size_t row_, size_t col_, TAlScore score_) {
|
||||||
|
init(row_, col_, score_);
|
||||||
|
}
|
||||||
|
|
||||||
|
void reset() { init(0, 0, 0); }
|
||||||
|
|
||||||
|
void init(size_t row_, size_t col_, TAlScore score_) {
|
||||||
|
row = row_;
|
||||||
|
col = col_;
|
||||||
|
score = score_;
|
||||||
|
// 0 = invalid; this should be set later according to what happens
|
||||||
|
// before / during the backtrace
|
||||||
|
fate = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff this candidate is (heuristically) dominated by the given
* candidate. As implemented, this candidate is treated as dominated when
* it lies within SQ rows and SQ columns of the given candidate, in either
* direction, where SQ is an arbitrary constant (currently 40). Scores are
* not compared here, so any score ordering must be enforced by the caller.
|
||||||
|
*/
|
||||||
|
inline bool dominatedBy(const DpBtCandidate& o) const {
|
||||||
|
const size_t SQ = 40;
|
||||||
|
size_t rowhi = row;
|
||||||
|
size_t rowlo = o.row;
|
||||||
|
if(rowhi < rowlo) swap(rowhi, rowlo);
|
||||||
|
size_t colhi = col;
|
||||||
|
size_t collo = o.col;
|
||||||
|
if(colhi < collo) swap(colhi, collo);
|
||||||
|
return (colhi - collo) <= SQ &&
|
||||||
|
(rowhi - rowlo) <= SQ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true if this candidate is "greater than" (should be considered
|
||||||
|
* later than) the given candidate.
|
||||||
|
*/
|
||||||
|
bool operator>(const DpBtCandidate& o) const {
|
||||||
|
if(score < o.score) return true;
|
||||||
|
if(score > o.score) return false;
|
||||||
|
if(row < o.row ) return true;
|
||||||
|
if(row > o.row ) return false;
|
||||||
|
if(col < o.col ) return true;
|
||||||
|
if(col > o.col ) return false;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true if this candidate is "less than" (should be considered
|
||||||
|
* sooner than) the given candidate.
|
||||||
|
*/
|
||||||
|
bool operator<(const DpBtCandidate& o) const {
|
||||||
|
if(score > o.score) return true;
|
||||||
|
if(score < o.score) return false;
|
||||||
|
if(row > o.row ) return true;
|
||||||
|
if(row < o.row ) return false;
|
||||||
|
if(col > o.col ) return true;
|
||||||
|
if(col < o.col ) return false;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true if this candidate equals the given candidate.
|
||||||
|
*/
|
||||||
|
bool operator==(const DpBtCandidate& o) const {
|
||||||
|
return row == o.row &&
|
||||||
|
col == o.col &&
|
||||||
|
score == o.score;
|
||||||
|
}
|
||||||
|
bool operator>=(const DpBtCandidate& o) const { return !((*this) < o); }
|
||||||
|
bool operator<=(const DpBtCandidate& o) const { return !((*this) > o); }
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
/**
|
||||||
|
* Check internal consistency.
|
||||||
|
*/
|
||||||
|
bool repOk() const {
|
||||||
|
assert(VALID_SCORE(score));
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
size_t row; // cell row
|
||||||
|
size_t col; // cell column w/r/t LHS of rectangle
|
||||||
|
TAlScore score; // score of alignment
|
||||||
|
int fate; // flag indicating whether we succeeded, failed, skipped
|
||||||
|
};
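The ordering and the proximity test defined on DpBtCandidate can be exercised together in a standalone sketch. Everything named below (`ToyCand`, `withinWindow`, `filterDominated`) is hypothetical, and the filtering loop is only one plausible way a caller might combine the two; the real driver may apply the dominance test differently. The comparison itself follows `operator<` above: higher score first, then larger row, then larger column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified candidate (illustration only).
struct ToyCand {
    size_t  row;
    size_t  col;
    int64_t score;

    // "Less than" means "consider sooner": higher score, then larger row,
    // then larger column.
    bool operator<(const ToyCand& o) const {
        if(score != o.score) return score > o.score;
        if(row != o.row) return row > o.row;
        return col > o.col;
    }

    // True when the two cells are at most sq rows and sq columns apart.
    bool withinWindow(const ToyCand& o, size_t sq = 40) const {
        const size_t dr = row > o.row ? row - o.row : o.row - row;
        const size_t dc = col > o.col ? col - o.col : o.col - col;
        return dr <= sq && dc <= sq;
    }
};

// Assumed usage: visit candidates best-first and drop any candidate that
// falls inside the window of a candidate that was already kept.
inline std::vector<ToyCand> filterDominated(std::vector<ToyCand> cands) {
    std::sort(cands.begin(), cands.end());
    std::vector<ToyCand> kept;
    for(size_t i = 0; i < cands.size(); i++) {
        bool dominated = false;
        for(size_t k = 0; k < kept.size() && !dominated; k++) {
            dominated = cands[i].withinWindow(kept[k]);
        }
        if(!dominated) kept.push_back(cands[i]);
    }
    return kept;
}
```

Because the window test ignores scores, sorting first is what makes "dominated" mean "close to something at least as good" in this sketch.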
|
||||||
|
|
||||||
|
template <typename T>
|
||||||
|
class NBest {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
NBest() { nelt_ = nbest_ = n_ = 0; }
|
||||||
|
|
||||||
|
bool inited() const { return nelt_ > 0; }
|
||||||
|
|
||||||
|
void init(size_t nelt, size_t nbest) {
|
||||||
|
nelt_ = nelt;
|
||||||
|
nbest_ = nbest;
|
||||||
|
elts_.resize(nelt * nbest);
|
||||||
|
ncur_.resize(nelt);
|
||||||
|
ncur_.fill(0);
|
||||||
|
n_ = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Add a new result to bin 'elt'. Where it gets prioritized in the list of
|
||||||
|
* results in that bin depends on the result of operator>.
|
||||||
|
*/
|
||||||
|
bool add(size_t elt, const T& o) {
|
||||||
|
assert_lt(elt, nelt_);
|
||||||
|
const size_t ncur = ncur_[elt];
|
||||||
|
assert_leq(ncur, nbest_);
|
||||||
|
n_++;
|
||||||
|
for(size_t i = 0; i < nbest_ && i <= ncur; i++) {
|
||||||
|
if(o > elts_[nbest_ * elt + i] || i >= ncur) {
|
||||||
|
// Insert it here
|
||||||
|
// Move everyone from here on down by one slot
|
||||||
|
for(int j = (int)ncur; j > (int)i; j--) {
|
||||||
|
if(j < (int)nbest_) {
|
||||||
|
elts_[nbest_ * elt + j] = elts_[nbest_ * elt + j - 1];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
elts_[nbest_ * elt + i] = o;
|
||||||
|
if(ncur < nbest_) {
|
||||||
|
ncur_[elt]++;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff there are no solutions.
|
||||||
|
*/
|
||||||
|
bool empty() const {
|
||||||
|
return n_ == 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Dump all the items in our payload into the given EList.
|
||||||
|
*/
|
||||||
|
template<typename TList>
|
||||||
|
void dump(TList& l) const {
|
||||||
|
if(empty()) return;
|
||||||
|
for(size_t i = 0; i < nelt_; i++) {
|
||||||
|
assert_leq(ncur_[i], nbest_);
|
||||||
|
for(size_t j = 0; j < ncur_[i]; j++) {
|
||||||
|
l.push_back(elts_[i * nbest_ + j]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
size_t nelt_;
|
||||||
|
size_t nbest_;
|
||||||
|
EList<T> elts_;
|
||||||
|
EList<size_t> ncur_;
|
||||||
|
size_t n_; // total # results added
|
||||||
|
};
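The bounded insertion performed by `add` is easier to follow with plain integers. The following sketch re-creates the same loop on top of `std::vector`; `ToyNBest` is a hypothetical name, `EList` is swapped for `std::vector`, and the assertions of the original are dropped. As in the original, each bin holds at most `nbest` items, kept in the order induced by `operator>`, and an item that does not beat any stored item in a full bin is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical re-creation of the per-bin "keep the best few" container.
template <typename T>
class ToyNBest {
public:
    ToyNBest(size_t nelt, size_t nbest)
        : nbest_(nbest), elts_(nelt * nbest), ncur_(nelt, 0) {}

    // Insert o into the given bin; returns false if the bin is full and o
    // does not rank above any item already stored there.
    bool add(size_t bin, const T& o) {
        const size_t base = bin * nbest_;
        const size_t ncur = ncur_[bin];
        for(size_t i = 0; i < nbest_ && i <= ncur; i++) {
            if(i >= ncur || o > elts_[base + i]) {
                // Shift lower-ranked items down one slot; the last item
                // falls off the end when the bin is already full.
                for(size_t j = ncur; j > i; j--) {
                    if(j < nbest_) elts_[base + j] = elts_[base + j - 1];
                }
                elts_[base + i] = o;
                if(ncur < nbest_) ncur_[bin]++;
                return true;
            }
        }
        return false;
    }

    // Collect the stored items of every bin, bin by bin.
    std::vector<T> dump() const {
        std::vector<T> out;
        for(size_t b = 0; b < ncur_.size(); b++) {
            for(size_t j = 0; j < ncur_[b]; j++) {
                out.push_back(elts_[b * nbest_ + j]);
            }
        }
        return out;
    }

private:
    size_t nbest_;
    std::vector<T> elts_;
    std::vector<size_t> ncur_;
};
```

Note that with DpBtCandidate, `operator>` means "should be considered later", so what ranks first in a bin depends entirely on how the element type defines that operator.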
|
||||||
|
|
||||||
|
#endif /*def ALIGNER_SW_NUC_H_*/
|
88
aligner_swsse.cpp
Normal file
@ -0,0 +1,88 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <string.h>
|
||||||
|
#include "aligner_sw_common.h"
|
||||||
|
#include "aligner_swsse.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a number of rows (nrow), a number of columns (ncol), and the
|
||||||
|
* number of words to fit inside a single __m128i vector, initialize the
|
||||||
|
* matrix buffer to accommodate the needed configuration of vectors.
|
||||||
|
*/
|
||||||
|
void SSEMatrix::init(
|
||||||
|
size_t nrow,
|
||||||
|
size_t ncol,
|
||||||
|
size_t wperv)
|
||||||
|
{
|
||||||
|
nrow_ = nrow;
|
||||||
|
ncol_ = ncol;
|
||||||
|
wperv_ = wperv;
|
||||||
|
nvecPerCol_ = (nrow + (wperv-1)) / wperv;
|
||||||
|
// The +1 is so that we don't have to special-case the final column;
|
||||||
|
// instead, we just write off the end of the useful part of the table
|
||||||
|
// with pvEStore.
|
||||||
|
try {
|
||||||
|
matbuf_.resizeNoCopy((ncol+1) * nvecPerCell_ * nvecPerCol_);
|
||||||
|
} catch(exception& e) {
|
||||||
|
cerr << "Tried to allocate DP matrix with " << (ncol+1)
|
||||||
|
<< " columns, " << nvecPerCol_
|
||||||
|
<< " vectors per column, and " << nvecPerCell_
|
||||||
|
<< " vectors per cell" << endl;
|
||||||
|
throw; // rethrow the original exception without slicing it
|
||||||
|
}
|
||||||
|
assert(wperv_ == 8 || wperv_ == 16);
|
||||||
|
vecshift_ = (wperv_ == 8) ? 3 : 4;
|
||||||
|
nvecrow_ = (nrow + (wperv_-1)) >> vecshift_;
|
||||||
|
nveccol_ = ncol;
|
||||||
|
colstride_ = nvecPerCol_ * nvecPerCell_;
|
||||||
|
rowstride_ = nvecPerCell_;
|
||||||
|
inited_ = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize the matrix of masks and backtracking flags.
|
||||||
|
*/
|
||||||
|
void SSEMatrix::initMasks() {
|
||||||
|
assert_gt(nrow_, 0);
|
||||||
|
assert_gt(ncol_, 0);
|
||||||
|
masks_.resize(nrow_);
|
||||||
|
reset_.resizeNoCopy(nrow_);
|
||||||
|
reset_.fill(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a row, col and matrix (i.e. E, F or H), return the corresponding
|
||||||
|
* element.
|
||||||
|
*/
|
||||||
|
int SSEMatrix::eltSlow(size_t row, size_t col, size_t mat) const {
|
||||||
|
assert_lt(row, nrow_);
|
||||||
|
assert_lt(col, ncol_);
|
||||||
|
assert_leq(mat, 3);
|
||||||
|
// Move to beginning of column/row
|
||||||
|
size_t rowelt = row / nvecrow_;
|
||||||
|
size_t rowvec = row % nvecrow_;
|
||||||
|
size_t eltvec = (col * colstride_) + (rowvec * rowstride_) + mat;
|
||||||
|
if(wperv_ == 16) {
|
||||||
|
return (int)((uint8_t*)(matbuf_.ptr() + eltvec))[rowelt];
|
||||||
|
} else {
|
||||||
|
assert_eq(8, wperv_);
|
||||||
|
return (int)((int16_t*)(matbuf_.ptr() + eltvec))[rowelt];
|
||||||
|
}
|
||||||
|
}
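The index arithmetic in `init` and `eltSlow` describes a striped layout: consecutive element rows go to consecutive vectors, and only wrap to the next lane after a full column of vectors. The sketch below reproduces that arithmetic without any SSE types so it can be checked by hand. `StripedPos`, `locate` and `bufferVectors` are hypothetical names; the formulas are taken from the code above, with `nvecPerCell` fixed at 4 as in the SSEMatrix constructor.

```cpp
#include <cassert>
#include <cstddef>

// Where an element lives: which 128-bit vector, and which slot inside it.
struct StripedPos {
    size_t vec;  // index of the vector within the matrix buffer
    size_t lane; // element slot within that vector
};

// Map (row, col, mat) to a buffer position for a matrix with nrow element
// rows and wperv elements per vector. mat selects the member of the
// quartet (E = 0, F = 1, H = 2, TMP = 3).
inline StripedPos locate(size_t row, size_t col, size_t mat,
                         size_t nrow, size_t wperv)
{
    const size_t nvecPerCell = 4;
    const size_t nvecPerCol  = (nrow + (wperv - 1)) / wperv; // rounded up
    const size_t rowstride   = nvecPerCell;
    const size_t colstride   = nvecPerCol * nvecPerCell;
    StripedPos p;
    p.lane = row / nvecPerCol;
    p.vec  = col * colstride + (row % nvecPerCol) * rowstride + mat;
    return p;
}

// Number of vectors allocated by init(), including the one extra column.
inline size_t bufferVectors(size_t nrow, size_t ncol, size_t wperv) {
    const size_t nvecPerCol = (nrow + (wperv - 1)) / wperv;
    return (ncol + 1) * 4 * nvecPerCol;
}
```

For example, with 20 rows and 8 elements per vector there are 3 vectors per column, so rows 0, 1 and 2 occupy lane 0 of three different vectors and row 3 is the first to use lane 1.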
|
500
aligner_swsse.h
Normal file
@ -0,0 +1,500 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNER_SWSSE_H_
|
||||||
|
#define ALIGNER_SWSSE_H_
|
||||||
|
|
||||||
|
#include "ds.h"
|
||||||
|
#include "mem_ids.h"
|
||||||
|
#include "random_source.h"
|
||||||
|
#include "scoring.h"
|
||||||
|
#include "mask.h"
|
||||||
|
#include "sse_util.h"
|
||||||
|
#include <string>
|
||||||
|
|
||||||
|
|
||||||
|
struct SSEMetrics {
|
||||||
|
|
||||||
|
SSEMetrics():mutex_m() { reset(); }
|
||||||
|
|
||||||
|
void clear() { reset(); }
|
||||||
|
void reset() {
|
||||||
|
dp = dpsat = dpfail = dpsucc =
|
||||||
|
col = cell = inner = fixup =
|
||||||
|
gathsol = bt = btfail = btsucc = btcell =
|
||||||
|
corerej = nrej = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void merge(const SSEMetrics& o, bool getLock = false) {
|
||||||
|
ThreadSafe ts(&mutex_m, getLock);
|
||||||
|
dp += o.dp;
|
||||||
|
dpsat += o.dpsat;
|
||||||
|
dpfail += o.dpfail;
|
||||||
|
dpsucc += o.dpsucc;
|
||||||
|
col += o.col;
|
||||||
|
cell += o.cell;
|
||||||
|
inner += o.inner;
|
||||||
|
fixup += o.fixup;
|
||||||
|
gathsol += o.gathsol;
|
||||||
|
bt += o.bt;
|
||||||
|
btfail += o.btfail;
|
||||||
|
btsucc += o.btsucc;
|
||||||
|
btcell += o.btcell;
|
||||||
|
corerej += o.corerej;
|
||||||
|
nrej += o.nrej;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint64_t dp; // DPs tried
|
||||||
|
uint64_t dpsat; // DPs saturated
|
||||||
|
uint64_t dpfail; // DPs failed
|
||||||
|
uint64_t dpsucc; // DPs succeeded
|
||||||
|
uint64_t col; // DP columns
|
||||||
|
uint64_t cell; // DP cells
|
||||||
|
uint64_t inner; // DP inner loop iters
|
||||||
|
uint64_t fixup; // DP fixup loop iters
|
||||||
|
uint64_t gathsol; // DP gather solution cells found
|
||||||
|
uint64_t bt; // DP backtraces
|
||||||
|
uint64_t btfail; // DP backtraces failed
|
||||||
|
uint64_t btsucc; // DP backtraces succeeded
|
||||||
|
uint64_t btcell; // DP backtrace cells traversed
|
||||||
|
uint64_t corerej; // DP backtrace core rejections
|
||||||
|
uint64_t nrej; // DP backtrace N rejections
|
||||||
|
MUTEX_T mutex_m;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Encapsulates matrix information calculated by the SSE aligner.
|
||||||
|
*
|
||||||
|
* Matrix memory is laid out as follows:
|
||||||
|
*
|
||||||
|
* - Elements (individual cell scores) are packed into __m128i vectors
|
||||||
|
* - Vectors are packed into quartets, quartet elements correspond to: a vector
|
||||||
|
* from E, one from F, one from H, and one that's "reserved"
|
||||||
|
* - Quartets are packed into columns, where the number of quartets is
|
||||||
|
* determined by the number of query characters divided by the number of
|
||||||
|
* elements per vector, rounded up
|
||||||
|
*
|
||||||
|
* Regarding the "reserved" element of the vector quartet: we use it for two
|
||||||
|
* things. First, we use the first column of reserved vectors to stage the
|
||||||
|
* initial column of H vectors. Second, we use the "reserved" vectors during
|
||||||
|
* the backtrace procedure to store information about (a) which cells have been
|
||||||
|
* traversed, (b) whether the cell is "terminal" (in local mode), etc.
|
||||||
|
*/
|
||||||
|
struct SSEMatrix {
|
||||||
|
|
||||||
|
// Each matrix element is a quartet of vectors. These constants are used
|
||||||
|
// to identify members of the quartet.
|
||||||
|
const static size_t E = 0;
|
||||||
|
const static size_t F = 1;
|
||||||
|
const static size_t H = 2;
|
||||||
|
const static size_t TMP = 3;
|
||||||
|
|
||||||
|
SSEMatrix(int cat = 0) : nvecPerCell_(4), matbuf_(cat) { }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a pointer to the matrix buffer.
|
||||||
|
*/
|
||||||
|
inline __m128i *ptr() {
|
||||||
|
assert(inited_);
|
||||||
|
return matbuf_.ptr();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a pointer to the E vector at the given row and column. Note:
|
||||||
|
* here row refers to rows of vectors, not rows of elements.
|
||||||
|
*/
|
||||||
|
inline __m128i* evec(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_lt(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + E;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Like evec, but it's allowed to ask for a pointer to one column after the
|
||||||
|
* final one.
|
||||||
|
*/
|
||||||
|
inline __m128i* evecUnsafe(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_leq(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + E;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a pointer to the F vector at the given row and column. Note:
|
||||||
|
* here row refers to rows of vectors, not rows of elements.
|
||||||
|
*/
|
||||||
|
inline __m128i* fvec(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_lt(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + F;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a pointer to the H vector at the given row and column. Note:
|
||||||
|
* here row refers to rows of vectors, not rows of elements.
|
||||||
|
*/
|
||||||
|
inline __m128i* hvec(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_lt(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + H;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return a pointer to the TMP vector at the given row and column. Note:
|
||||||
|
* here row refers to rows of vectors, not rows of elements.
|
||||||
|
*/
|
||||||
|
inline __m128i* tmpvec(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_lt(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + TMP;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Like tmpvec, but it's allowed to ask for a pointer to one column after
|
||||||
|
* the final one.
|
||||||
|
*/
|
||||||
|
inline __m128i* tmpvecUnsafe(size_t row, size_t col) {
|
||||||
|
assert_lt(row, nvecrow_);
|
||||||
|
assert_leq(col, nveccol_);
|
||||||
|
size_t elt = row * rowstride() + col * colstride() + TMP;
|
||||||
|
assert_lt(elt, matbuf_.size());
|
||||||
|
return ptr() + elt;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a number of rows (nrow), a number of columns (ncol), and the
|
||||||
|
* number of words to fit inside a single __m128i vector, initialize the
|
||||||
|
* matrix buffer to accommodate the needed configuration of vectors.
|
||||||
|
*/
|
||||||
|
void init(
|
||||||
|
size_t nrow,
|
||||||
|
size_t ncol,
|
||||||
|
size_t wperv);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of __m128i's you need to skip over to get from one
|
||||||
|
* cell to the cell one column over from it.
|
||||||
|
*/
|
||||||
|
inline size_t colstride() const { return colstride_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of __m128i's you need to skip over to get from one
|
||||||
|
* cell to the cell one row down from it.
|
||||||
|
*/
|
||||||
|
inline size_t rowstride() const { return rowstride_; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a row, col and matrix (i.e. E, F or H), return the corresponding
|
||||||
|
* element.
|
||||||
|
*/
|
||||||
|
int eltSlow(size_t row, size_t col, size_t mat) const;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given a row, col and matrix (i.e. E, F or H), return the corresponding
|
||||||
|
* element.
|
||||||
|
*/
|
||||||
|
inline int elt(size_t row, size_t col, size_t mat) const {
|
||||||
|
assert(inited_);
|
||||||
|
assert_lt(row, nrow_);
|
||||||
|
assert_lt(col, ncol_);
|
||||||
|
assert_lt(mat, 3);
|
||||||
|
// Move to beginning of column/row
|
||||||
|
size_t rowelt = row / nvecrow_;
|
||||||
|
size_t rowvec = row % nvecrow_;
|
||||||
|
size_t eltvec = (col * colstride_) + (rowvec * rowstride_) + mat;
|
||||||
|
assert_lt(eltvec, matbuf_.size());
|
||||||
|
if(wperv_ == 16) {
|
||||||
|
return (int)((uint8_t*)(matbuf_.ptr() + eltvec))[rowelt];
|
||||||
|
} else {
|
||||||
|
assert_eq(8, wperv_);
|
||||||
|
return (int)((int16_t*)(matbuf_.ptr() + eltvec))[rowelt];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the element in the E matrix at element row, col.
|
||||||
|
*/
|
||||||
|
inline int eelt(size_t row, size_t col) const {
|
||||||
|
return elt(row, col, E);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the element in the F matrix at element row, col.
|
||||||
|
*/
|
||||||
|
inline int felt(size_t row, size_t col) const {
|
||||||
|
return elt(row, col, F);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the element in the H matrix at element row, col.
|
||||||
|
*/
|
||||||
|
inline int helt(size_t row, size_t col) const {
|
||||||
|
return elt(row, col, H);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the given cell has its reportedThru bit set.
|
||||||
|
*/
|
||||||
|
inline bool reportedThrough(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const // current column
|
||||||
|
{
|
||||||
|
return (masks_[row][col] & (1 << 0)) != 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's reportedThru bit.
|
||||||
|
*/
|
||||||
|
inline void setReportedThrough(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) // current column
|
||||||
|
{
|
||||||
|
masks_[row][col] |= (1 << 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the H mask has been set with a previous call to hMaskSet.
|
||||||
|
*/
|
||||||
|
bool isHMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const; // current column
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's H mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the H cell at this coordinate. It's 5 bits long and has
|
||||||
|
* offset=2 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
void hMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the E mask has been set with a previous call to eMaskSet.
|
||||||
|
*/
|
||||||
|
bool isEMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const; // current column
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's E mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the E cell at this coordinate. It's 2 bits long and has
|
||||||
|
* offset=8 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
void eMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the F mask has been set with a previous call to fMaskSet.
|
||||||
|
*/
|
||||||
|
bool isFMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const; // current column
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's F mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the F cell at this coordinate. It's 2 bits long and has
|
||||||
|
* offset=11 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
void fMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Analyze a cell in the SSE-filled dynamic programming matrix. Determine &
|
||||||
|
* memorize ways that we can backtrack from the cell. If there is at least one
|
||||||
|
* way to backtrack, select one at random and return the selection.
|
||||||
|
*
|
||||||
|
* There are a few subtleties to keep in mind regarding which cells can be at
|
||||||
|
* the end of a backtrace. First of all: cells from which we can backtrack
|
||||||
|
	 * should not be at the end of a backtrace.  But we have to distinguish between
|
||||||
|
* cells whose masks eventually become 0 (we shouldn't end at those), from
|
||||||
|
* those whose masks were 0 all along (we can end at those).
|
||||||
|
*/
|
||||||
|
void analyzeCell(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
size_t ct, // current cell type: E/F/H
|
||||||
|
int refc,
|
||||||
|
int readc,
|
||||||
|
int readq,
|
||||||
|
const Scoring& sc, // scoring scheme
|
||||||
|
int64_t offsetsc, // offset to add to each score
|
||||||
|
RandomSource& rand, // rand gen for choosing among equal options
|
||||||
|
bool& empty, // out: =true iff no way to backtrace
|
||||||
|
int& cur, // out: =type of transition
|
||||||
|
bool& branch, // out: =true iff we chose among >1 options
|
||||||
|
bool& canMoveThru, // out: =true iff ...
|
||||||
|
bool& reportedThru); // out: =true iff ...
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize the matrix of masks and backtracking flags.
|
||||||
|
*/
|
||||||
|
void initMasks();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of rows in the dynamic programming matrix.
|
||||||
|
*/
|
||||||
|
size_t nrow() const {
|
||||||
|
return nrow_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the number of columns in the dynamic programming matrix.
|
||||||
|
*/
|
||||||
|
size_t ncol() const {
|
||||||
|
return ncol_;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prepare a row so we can use it to store masks.
|
||||||
|
*/
|
||||||
|
void resetRow(size_t i) {
|
||||||
|
assert(!reset_[i]);
|
||||||
|
masks_[i].resizeNoCopy(ncol_);
|
||||||
|
masks_[i].fillZero();
|
||||||
|
reset_[i] = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool inited_; // initialized?
|
||||||
|
size_t nrow_; // # rows
|
||||||
|
size_t ncol_; // # columns
|
||||||
|
size_t nvecrow_; // # vector rows (<= nrow_)
|
||||||
|
size_t nveccol_; // # vector columns (<= ncol_)
|
||||||
|
size_t wperv_; // # words per vector
|
||||||
|
size_t vecshift_; // # bits to shift to divide by words per vec
|
||||||
|
size_t nvecPerCol_; // # vectors per column
|
||||||
|
size_t nvecPerCell_; // # vectors per matrix cell (4)
|
||||||
|
size_t colstride_; // # vectors b/t adjacent cells in same row
|
||||||
|
size_t rowstride_; // # vectors b/t adjacent cells in same col
|
||||||
|
EList_m128i matbuf_; // buffer for holding vectors
|
||||||
|
ELList<uint16_t> masks_; // buffer for masks/backtracking flags
|
||||||
|
EList<bool> reset_; // true iff row in masks_ has been reset
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* All the data associated with the query profile and other data needed for SSE
|
||||||
|
* alignment of a query.
|
||||||
|
*/
|
||||||
|
struct SSEData {
|
||||||
|
SSEData(int cat = 0) : profbuf_(cat), mat_(cat) { }
|
||||||
|
EList_m128i profbuf_; // buffer for query profile & temp vecs
|
||||||
|
EList_m128i vecbuf_; // buffer for 2 column vectors (not using mat_)
|
||||||
|
size_t qprofStride_; // stride for query profile
|
||||||
|
size_t gbarStride_; // gap barrier for query profile
|
||||||
|
SSEMatrix mat_; // SSE matrix for holding all E, F, H vectors
|
||||||
|
size_t maxPen_; // biggest penalty of all
|
||||||
|
size_t maxBonus_; // biggest bonus of all
|
||||||
|
size_t lastIter_; // which 128-bit striped word has final row?
|
||||||
|
	size_t     lastWord_;    // which word within that 128-bit word has final row?
|
||||||
|
int bias_; // all scores shifted up by this for unsigned
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the H mask has been set with a previous call to hMaskSet.
|
||||||
|
*/
|
||||||
|
inline bool SSEMatrix::isHMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const // current column
|
||||||
|
{
|
||||||
|
return (masks_[row][col] & (1 << 1)) != 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's H mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the H cell at this coordinate. It's 5 bits long and has
|
||||||
|
* offset=2 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
inline void SSEMatrix::hMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask)
|
||||||
|
{
|
||||||
|
assert_lt(mask, 32);
|
||||||
|
	masks_[row][col] &= ~(63 << 1); // clear the set flag (bit 1) and all 5 mask bits (bits 2-6)
|
||||||
|
masks_[row][col] |= (1 << 1 | mask << 2);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the E mask has been set with a previous call to eMaskSet.
|
||||||
|
*/
|
||||||
|
inline bool SSEMatrix::isEMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const // current column
|
||||||
|
{
|
||||||
|
return (masks_[row][col] & (1 << 7)) != 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's E mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the E cell at this coordinate. It's 2 bits long and has
|
||||||
|
* offset=8 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
inline void SSEMatrix::eMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask)
|
||||||
|
{
|
||||||
|
assert_lt(mask, 4);
|
||||||
|
masks_[row][col] &= ~(7 << 7);
|
||||||
|
masks_[row][col] |= (1 << 7 | mask << 8);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff the F mask has been set with a previous call to fMaskSet.
|
||||||
|
*/
|
||||||
|
inline bool SSEMatrix::isFMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col) const // current column
|
||||||
|
{
|
||||||
|
return (masks_[row][col] & (1 << 10)) != 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the given cell's F mask. This is the mask of remaining legal ways to
|
||||||
|
* backtrack from the F cell at this coordinate. It's 2 bits long and has
|
||||||
|
* offset=11 into the 16-bit field.
|
||||||
|
*/
|
||||||
|
inline void SSEMatrix::fMaskSet(
|
||||||
|
size_t row, // current row
|
||||||
|
size_t col, // current column
|
||||||
|
int mask)
|
||||||
|
{
|
||||||
|
assert_lt(mask, 4);
|
||||||
|
masks_[row][col] &= ~(7 << 10);
|
||||||
|
masks_[row][col] |= (1 << 10 | mask << 11);
|
||||||
|
}
|
||||||
|
|
||||||
|
#define ROWSTRIDE_2COL 4
|
||||||
|
#define ROWSTRIDE 4
|
||||||
|
|
||||||
|
#endif /*ndef ALIGNER_SWSSE_H_*/
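The mask accessors in this header pack several small fields into one 16-bit value per cell. The following is a minimal stand-alone sketch of that packing, not the header's own code: `CellMask`, `setField`, `flag`, and `field` are hypothetical names, and the bit positions are read off the shifts and comments above (bit 0 reportedThru, bit 1 plus bits 2-6 for H, bit 7 plus bits 8-9 for E, bit 10 plus bits 11-12 for F). The sketch clears the flag bit and every mask bit of a field before writing the new value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone model of one 16-bit entry of masks_.
// Assumed layout (taken from the shifts used by the accessors):
//   bit 0   reportedThru
//   bit 1   H-mask-set flag,  bits 2-6   H mask (5 bits)
//   bit 7   E-mask-set flag,  bits 8-9   E mask (2 bits)
//   bit 10  F-mask-set flag,  bits 11-12 F mask (2 bits)
struct CellMask {
    uint16_t bits = 0;

    // Clear one field (flag bit plus `width` mask bits), then write flag and mask.
    void setField(unsigned flagBit, unsigned width, unsigned mask) {
        assert(mask < (1u << width));
        const unsigned fieldBits = ((1u << (width + 1)) - 1u) << flagBit;
        bits = static_cast<uint16_t>(
            (bits & ~fieldBits) | (1u << flagBit) | (mask << (flagBit + 1)));
    }

    // True iff the "has been set" flag of a field is on.
    bool flag(unsigned flagBit) const { return ((bits >> flagBit) & 1u) != 0; }

    // Read back the mask bits that follow a flag bit.
    unsigned field(unsigned flagBit, unsigned width) const {
        return (bits >> (flagBit + 1)) & ((1u << width) - 1u);
    }

    void hMaskSet(unsigned m) { setField(1, 5, m); }
    void eMaskSet(unsigned m) { setField(7, 2, m); }
    void fMaskSet(unsigned m) { setField(10, 2, m); }

    bool isHMaskSet() const { return flag(1); }
    bool isEMaskSet() const { return flag(7); }
    bool isFMaskSet() const { return flag(10); }

    unsigned hMask() const { return field(1, 5); }
    unsigned eMask() const { return field(7, 2); }
    unsigned fMask() const { return field(10, 2); }
};
```

Keeping a separate flag bit next to each mask is what lets a caller tell a mask that was never computed apart from one that was computed and is (or has become) zero, which is the distinction the `analyzeCell` comment describes.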
|
1911
aligner_swsse_ee_i16.cpp
Normal file
File diff suppressed because it is too large
1902
aligner_swsse_ee_u8.cpp
Normal file
File diff suppressed because it is too large
2272
aligner_swsse_loc_i16.cpp
Normal file
File diff suppressed because it is too large
2266
aligner_swsse_loc_u8.cpp
Normal file
File diff suppressed because it is too large
193
alignment_3n.cpp
Normal file
@ -0,0 +1,193 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2020, Yun (Leo) Zhang <imzhangyun@gmail.com>
|
||||||
|
*
|
||||||
|
* This file is part of HISAT-3N.
|
||||||
|
*
|
||||||
|
* HISAT-3N is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* HISAT-3N is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with HISAT-3N. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "alignment_3n.h"
|
||||||
|
#include "aln_sink.h"
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Return true if the two locations are concordant.
 * Return false if they are not concordant or are too far apart (> maxPairDistance).
|
||||||
|
*/
|
||||||
|
bool Alignment::isConcordant(long long int location1, bool &forward1, long long int readLength1, long long int location2, bool &forward2, long long int readLength2) {
|
||||||
|
if (forward1 == forward2) // same direction
|
||||||
|
{
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
// adjust the location of the start of the read
|
||||||
|
if (!forward1)
|
||||||
|
{
|
||||||
|
location1 = location1 + readLength1 - 1;
|
||||||
|
}
|
||||||
|
if (!forward2)
|
||||||
|
{
|
||||||
|
location2 = location2 + readLength2 - 1;
|
||||||
|
}
|
||||||
|
// return false if two reads are too far from each other
|
||||||
|
if (abs(location1-location2) > maxPairDistance)
|
||||||
|
{
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (location1 == location2)
|
||||||
|
{
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
else if (location1 < location2)
|
||||||
|
{
|
||||||
|
if (forward1 && !forward2)
|
||||||
|
{
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
if (!forward1 && forward2)
|
||||||
|
{
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Basic function for calculating the DNA pair score.
 * If the distance between the two alignments is more than penaltyFreeDistance_DNA, the score is reduced by distance/distancePenaltyFraction_DNA.
 * If the two alignments are concordant, concordantScoreBounce is added so that the concordant pair is selected as the best pair.
|
||||||
|
*/
|
||||||
|
int Alignment::calculatePairScore_DNA (long long int &location0, int& AS0, bool& forward0, long long int readLength0, long long int &location1, int &AS1, bool &forward1, long long int readLength1, bool& concordant) {
|
||||||
|
|
||||||
|
int score = ASPenalty*AS0 + ASPenalty*AS1;
|
||||||
|
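    long long int distance = abs(location0 - location1); // keep 64 bits so a large coordinate difference is not truncated before the maxPairDistance check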
|
||||||
|
if (distance > maxPairDistance) { return numeric_limits<int>::min(); }
|
||||||
|
if (distance > penaltyFreeDistance_DNA) { score -= distance/distancePenaltyFraction_DNA; }
|
||||||
|
concordant = isConcordant(location0, forward0, readLength0, location1, forward1, readLength1);
|
||||||
|
if (concordant) { score += concordantScoreBounce; }
|
||||||
|
return score;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Basic function for calculating the RNA pair score.
 * If the distance between the two alignments is more than penaltyFreeDistance_RNA, the score is reduced by distance/distancePenaltyFraction_RNA.
 * If the two alignments are concordant, concordantScoreBounce is added so that the concordant pair is selected as the best pair.
|
||||||
|
*/
|
||||||
|
int Alignment::calculatePairScore_RNA (long long int &location0, int& XM0, bool& forward0, long long int readLength0, long long int &location1, int &XM1, bool &forward1, long long int readLength1, bool& concordant) {
|
||||||
|
    // The XM values are negated here, so a larger XM gives a lower pair score.
    // The distance penalty and the concordance bonus follow the rules given in the comment above this function.
|
||||||
|
int score = -ASPenalty*XM0 + -ASPenalty*XM1;
|
||||||
|
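    long long int distance = abs(location0 - location1); // keep 64 bits so a large coordinate difference is not truncated before the maxPairDistance check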
|
||||||
|
if (distance > maxPairDistance) { return numeric_limits<int>::min(); }
|
||||||
|
if (distance > penaltyFreeDistance_RNA) { score -= distance/distancePenaltyFraction_RNA; }
|
||||||
|
concordant = isConcordant(location0, forward0, readLength0, location1, forward1, readLength1);
|
||||||
|
if (concordant) { score += concordantScoreBounce; }
|
||||||
|
return score;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
 * Calculate the pair score for this alignment and inputAlignment. Outputs the pair score and the number of pairs (nPair).
 * Does not update the pairScore stored in either alignment.
|
||||||
|
*/
|
||||||
|
int Alignment::calculatePairScore(Alignment *inputAlignment, int &nPair) {
|
||||||
|
int pairScore = numeric_limits<int>::min();
|
||||||
|
nPair = 0;
|
||||||
|
if (pairSegment == inputAlignment->pairSegment){
|
||||||
|
        // when the 2 alignment results are from the same pair segment, output the lowest score and set the number of pairs to zero.
|
||||||
|
pairScore = numeric_limits<int>::min();
|
||||||
|
} else if (!mapped && !inputAlignment->mapped) {
|
||||||
|
// both unmapped.
|
||||||
|
pairScore = numeric_limits<int>::min()/2 - 1;
|
||||||
|
} else if (!mapped || !inputAlignment->mapped) {
|
||||||
|
        // one of the segments is unmapped.
|
||||||
|
pairScore = numeric_limits<int>::min()/2;
|
||||||
|
nPair = 1;
|
||||||
|
} else if ((!repeat && !inputAlignment->repeat)){
|
||||||
|
// both mapped and (both non-repeat or not expand repeat)
|
||||||
|
bool concordant;
|
||||||
|
if (DNA) {
|
||||||
|
pairScore = calculatePairScore_DNA(location,
|
||||||
|
AS,
|
||||||
|
forward,
|
||||||
|
readSequence.length(),
|
||||||
|
inputAlignment->location,
|
||||||
|
inputAlignment->AS,
|
||||||
|
inputAlignment->forward,
|
||||||
|
inputAlignment->readSequence.length(),
|
||||||
|
concordant);
|
||||||
|
} else {
|
||||||
|
pairScore = calculatePairScore_RNA(location,
|
||||||
|
XM,
|
||||||
|
forward,
|
||||||
|
readSequence.length(),
|
||||||
|
inputAlignment->location,
|
||||||
|
inputAlignment->XM,
|
||||||
|
inputAlignment->forward,
|
||||||
|
inputAlignment->readSequence.length(),
|
||||||
|
concordant);
|
||||||
|
}
|
||||||
|
setConcordant(concordant);
|
||||||
|
inputAlignment->setConcordant(concordant);
|
||||||
|
nPair = 1;
|
||||||
|
}
|
||||||
|
return pairScore;
|
||||||
|
}
|
||||||
|
|
||||||
|
void Alignments::reportStats_single(ReportingMetrics& met) {
|
||||||
|
|
||||||
|
int nAlignment = alignmentPositions.nBestSingle;
|
||||||
|
if (nAlignment == 0) {
|
||||||
|
met.nunp_0++;
|
||||||
|
} else {
|
||||||
|
met.nunp_uni++;
|
||||||
|
if (nAlignment == 1) { met.nunp_uni1++; }
|
||||||
|
else { met.nunp_uni2++; }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Alignments::reportStats_paired(ReportingMetrics& met) {
|
||||||
|
if (!alignmentPositions.concordantExist) {
|
||||||
|
met.nconcord_0++;
|
||||||
|
if (alignmentPositions.nBestPair == 0) {
|
||||||
|
met.nunp_0_0 += 2;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (alignmentPositions.bestPairScore == numeric_limits<int>::min()/2) {
|
||||||
|
// one mate is unmapped, one mate is mapped
|
||||||
|
met.nunp_0_0++;
|
||||||
|
met.nunp_0_uni++;
|
||||||
|
if (alignmentPositions.nBestPair == 1) { met.nunp_0_uni1++; }
|
||||||
|
else { met.nunp_0_uni2++; }
|
||||||
|
        } else { // both mates are mapped
|
||||||
|
if (alignmentPositions.nBestPair == 1) {
|
||||||
|
met.ndiscord++;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
met.nunp_0_uni += 2;
|
||||||
|
met.nunp_0_uni2 += 2;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
assert(alignmentPositions.nBestPair > 0);
|
||||||
|
met.nconcord_uni++;
|
||||||
|
if (alignmentPositions.nBestPair == 1) { met.nconcord_uni1++; }
|
||||||
|
else { met.nconcord_uni2++; }
|
||||||
|
}
|
||||||
|
}
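The concordance test and the DNA pair score above can be condensed into a small stand-alone sketch. This is an illustration under stated assumptions, not the file's implementation: `Mate`, `PairParams`, `fivePrimeEnd`, `isConcordantSketch`, and `pairScoreSketch` are hypothetical names, and the numeric values in `PairParams` are placeholders, since the real constants (`maxPairDistance`, `penaltyFreeDistance_DNA`, `distancePenaltyFraction_DNA`, `ASPenalty`, `concordantScoreBounce`) are defined elsewhere. Unlike the original, the sketch sets `concordant` to false when the mates are too far apart.

```cpp
#include <cstdlib>
#include <limits>

// Placeholder parameter values (assumptions, not the project's constants).
struct PairParams {
    long long maxPairDistance = 500000;
    long long penaltyFreeDistance = 1000;
    long long distancePenaltyFraction = 100;
    int asPenalty = 2;
    int concordantBonus = 500000;
};

// Hypothetical summary of one mate's alignment.
struct Mate {
    long long location;    // leftmost aligned position
    bool forward;          // true if aligned to the forward strand
    long long readLength;
    int alignmentScore;    // AS value
};

// Start of the read: leftmost position for forward mates, rightmost for reverse mates.
inline long long fivePrimeEnd(const Mate& m) {
    return m.forward ? m.location : m.location + m.readLength - 1;
}

// Mates are concordant if they point toward each other and are close enough.
inline bool isConcordantSketch(const Mate& a, const Mate& b, const PairParams& p) {
    if (a.forward == b.forward) return false;            // same direction
    const long long pa = fivePrimeEnd(a);
    const long long pb = fivePrimeEnd(b);
    if (std::llabs(pa - pb) > p.maxPairDistance) return false;
    if (pa == pb) return true;
    return (pa < pb) ? a.forward : b.forward;            // left mate must be forward
}

// Score = weighted alignment scores - distance penalty + concordance bonus.
inline int pairScoreSketch(const Mate& a, const Mate& b, const PairParams& p,
                           bool& concordant) {
    concordant = false;
    const long long distance = std::llabs(a.location - b.location);
    if (distance > p.maxPairDistance) return std::numeric_limits<int>::min();
    long long score = static_cast<long long>(p.asPenalty) *
                      (a.alignmentScore + b.alignmentScore);
    if (distance > p.penaltyFreeDistance) score -= distance / p.distancePenaltyFraction;
    concordant = isConcordantSketch(a, b, p);
    if (concordant) score += p.concordantBonus;
    return static_cast<int>(score);
}
```

The distance is held in a 64-bit integer throughout, so the comparison against the maximum pair distance is made on the untruncated value.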
|
||||||
|
|
1214
alignment_3n.h
Normal file
File diff suppressed because it is too large
287
alignment_3n_table.h
Normal file
@ -0,0 +1,287 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2020, Yun (Leo) Zhang <imzhangyun@gmail.com>
|
||||||
|
*
|
||||||
|
* This file is part of HISAT-3N.
|
||||||
|
*
|
||||||
|
* HISAT-3N is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* HISAT-3N is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with HISAT-3N. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALIGNMENT_3N_TABLE_H
|
||||||
|
#define ALIGNMENT_3N_TABLE_H
|
||||||
|
|
||||||
|
#include <string>
|
||||||
|
#include "utility_3n_table.h"
|
||||||
|
|
||||||
|
extern bool uniqueOnly;
|
||||||
|
extern bool multipleOnly;
|
||||||
|
extern char convertFrom;
|
||||||
|
extern char convertTo;
|
||||||
|
extern char convertFromComplement;
|
||||||
|
extern char convertToComplement;
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* the class to store information from one SAM line
|
||||||
|
*/
|
||||||
|
class Alignment {
|
||||||
|
public:
|
||||||
|
string chromosome;
|
||||||
|
long long int location;
|
||||||
|
long long int mateLocation;
|
||||||
|
int flag;
|
||||||
|
bool mapped;
|
||||||
|
char strand;
|
||||||
|
string sequence;
|
||||||
|
string quality;
|
||||||
|
bool unique;
|
||||||
|
string mapQ;
|
||||||
|
int NH;
|
||||||
|
vector<PosQuality> bases;
|
||||||
|
CIGAR cigarString;
|
||||||
|
MD_tag MD;
|
||||||
|
unsigned long long readNameID;
|
||||||
|
    int sequenceCoveredLength; // the sum of the segment lengths in cigarString
|
||||||
|
bool overlap; // if the segment could overlap with the mate segment.
|
||||||
|
bool paired;
|
||||||
|
|
||||||
|
void initialize() {
|
||||||
|
chromosome.clear();
|
||||||
|
location = -1;
|
||||||
|
mateLocation = -1;
|
||||||
|
flag = -1;
|
||||||
|
mapped = false;
|
||||||
|
MD.initialize();
|
||||||
|
cigarString.initialize();
|
||||||
|
sequence.clear();
|
||||||
|
quality.clear();
|
||||||
|
unique = false;
|
||||||
|
mapQ.clear();
|
||||||
|
NH = -1;
|
||||||
|
bases.clear();
|
||||||
|
readNameID = 0;
|
||||||
|
sequenceCoveredLength = 0;
|
||||||
|
overlap = false;
|
||||||
|
paired = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
     * Check whether the text at startPosition in inputLine begins with the given tag.
|
||||||
|
*/
|
||||||
|
    bool startWith(string* inputLine, int startPosition, string tag){
        // compare() checks the whole tag in one call and returns non-zero
        // when the rest of the line is too short to contain it.
        return inputLine->compare(startPosition, tag.size(), tag) == 0;
    }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* generate a hash value for readName
|
||||||
|
*/
|
||||||
|
void getNameHash(string& readName) {
|
||||||
|
readNameID = 0;
|
||||||
|
int a = 63689;
|
||||||
|
for (size_t i = 0; i < readName.size(); i++) {
|
||||||
|
readNameID = (readNameID * a) + (int)readName[i];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* extract the information from SAM line to Alignment.
|
||||||
|
*/
|
||||||
|
void parseInfo(string* line) {
|
||||||
|
int startPosition = 0;
|
||||||
|
int endPosition = 0;
|
||||||
|
int count = 0;
|
||||||
|
|
||||||
|
while ((endPosition = line->find("\t", startPosition)) != string::npos) {
|
||||||
|
if (count == 0) {
|
||||||
|
string readName = line->substr(startPosition, endPosition - startPosition);
|
||||||
|
getNameHash(readName);
|
||||||
|
} else if (count == 1) {
|
||||||
|
flag = stoi(line->substr(startPosition, endPosition - startPosition));
|
||||||
|
mapped = (flag & 4) == 0;
|
||||||
|
paired = (flag & 1) != 0;
|
||||||
|
} else if (count == 2) {
|
||||||
|
chromosome = line->substr(startPosition, endPosition - startPosition);
|
||||||
|
} else if (count == 3) {
|
||||||
|
location = stoll(line->substr(startPosition, endPosition - startPosition));
|
||||||
|
} else if (count == 4) {
|
||||||
|
mapQ = line->substr(startPosition, endPosition - startPosition);
|
||||||
|
if (mapQ == "1") {
|
||||||
|
unique = false;
|
||||||
|
} else {
|
||||||
|
unique = true;
|
||||||
|
}
|
||||||
|
} else if (count == 5) {
|
||||||
|
cigarString.loadString(line->substr(startPosition, endPosition - startPosition));
|
||||||
|
} else if (count == 7) {
|
||||||
|
mateLocation = stoll(line->substr(startPosition, endPosition - startPosition));
|
||||||
|
} else if (count == 9) {
|
||||||
|
sequence = line->substr(startPosition, endPosition - startPosition);
|
||||||
|
} else if (count == 10) {
|
||||||
|
quality = line->substr(startPosition, endPosition - startPosition);
|
||||||
|
} else if (count > 10) {
|
||||||
|
if (startWith(line, startPosition, "MD")) {
|
||||||
|
MD.loadString(line->substr(startPosition + 5, endPosition - startPosition - 5));
|
||||||
|
} else if (startWith(line, startPosition, "NM")) {
|
||||||
|
NH = stoi(line->substr(startPosition + 5, endPosition - startPosition - 5));
|
||||||
|
} else if (startWith(line, startPosition, "YZ")) {
|
||||||
|
strand = line->at(endPosition-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
startPosition = endPosition + 1;
|
||||||
|
count++;
|
||||||
|
}
|
||||||
|
        // Last field: no tab follows it, so endPosition is not a valid index here
        // and the field runs to the end of the line.
        if (startWith(line, startPosition, "MD")) {
            MD.loadString(line->substr(startPosition + 5));
        } else if (startWith(line, startPosition, "NM")) {
            NH = stoi(line->substr(startPosition + 5));
        } else if (startWith(line, startPosition, "YZ")) {
            strand = line->at(line->size() - 1);
        }
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
     * Set overlap = true if the read is not uniquely mapped or the read segment overlaps its mate.
|
||||||
|
*/
|
||||||
|
void checkOverlap() {
|
||||||
|
if (!unique) {
|
||||||
|
overlap = true;
|
||||||
|
} else {
|
||||||
|
if (paired && (location + sequenceCoveredLength >= mateLocation)) {
|
||||||
|
overlap = true;
|
||||||
|
} else {
|
||||||
|
overlap = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* parse the sam line to alignment information
|
||||||
|
*/
|
||||||
|
void parse(string* line) {
|
||||||
|
initialize();
|
||||||
|
parseInfo(line);
|
||||||
|
if ((uniqueOnly && !unique) || (multipleOnly && unique)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
appendBase();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
     * Scan all bases in the read sequence and label those that qualify.
|
||||||
|
*/
|
||||||
|
void appendBase() {
|
||||||
|
        if (!mapped) {
            return;
        }

        bases.reserve(sequence.size());
        for (int i = 0; i < (int)sequence.size(); i++) {
            bases.emplace_back(i);
        }
        int pos = adjustPos();
        // sequenceCoveredLength is only computed inside adjustPos(), so the length check has to come after it.
        if (sequenceCoveredLength > 500000) { // if the read's intron is longer than 500,000, ignore this read
            bases.clear();
            return;
        }
|
||||||
|
|
||||||
|
string match;
|
||||||
|
while (MD.getNextSegment(match)) {
|
||||||
|
            if (isdigit(match.front())) { // a leading digit means this segment is a run of matches
|
||||||
|
int len = stoi(match);
|
||||||
|
for (int i = 0; i < len; i++) {
|
||||||
|
while (bases[pos].remove) {
|
||||||
|
pos++;
|
||||||
|
}
|
||||||
|
if ((strand == '+' && sequence[pos] == convertFrom) ||
|
||||||
|
(strand == '-' && sequence[pos] == convertFromComplement)) {
|
||||||
|
bases[pos].setQual(quality[pos], false);
|
||||||
|
} else {
|
||||||
|
bases[pos].remove = true;
|
||||||
|
}
|
||||||
|
pos ++;
|
||||||
|
}
|
||||||
|
} else if (isalpha(match.front())) { // this is mismatch or conversion
|
||||||
|
char refBase = match.front();
|
||||||
|
// for + strand, it should have C->T change
|
||||||
|
// for - strand, it should have G->A change
|
||||||
|
while (bases[pos].remove) {
|
||||||
|
pos++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((strand == '+' && refBase == convertFrom && sequence[pos] == convertTo) ||
|
||||||
|
(strand == '-' && refBase == convertFromComplement && sequence[pos] == convertToComplement)){
|
||||||
|
bases[pos].setQual(quality[pos], true);
|
||||||
|
} else {
|
||||||
|
bases[pos].remove = true;
|
||||||
|
}
|
||||||
|
pos ++;
|
||||||
|
} else { // deletion. do nothing.
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* adjust the reference position in bases
|
||||||
|
*/
|
||||||
|
int adjustPos() {
|
||||||
|
|
||||||
|
int readPos = 0;
|
||||||
|
int returnPos = 0;
|
||||||
|
int seqLength = sequence.size();
|
||||||
|
|
||||||
|
char cigarSymbol;
|
||||||
|
int cigarLen;
|
||||||
|
sequenceCoveredLength = 0;
|
||||||
|
|
||||||
|
while (cigarString.getNextSegment(cigarLen, cigarSymbol)) {
|
||||||
|
sequenceCoveredLength += cigarLen;
|
||||||
|
if (cigarSymbol == 'S') {
|
||||||
|
                if (readPos == 0) { // soft clip is at the beginning of the read
|
||||||
|
returnPos = cigarLen;
|
||||||
|
for (int i = cigarLen; i < seqLength; i++) {
|
||||||
|
bases[i].refPos -= cigarLen;
|
||||||
|
}
|
||||||
|
} else { // soft clip is at the end of the read
|
||||||
|
// do nothing
|
||||||
|
}
|
||||||
|
readPos += cigarLen;
|
||||||
|
} else if (cigarSymbol == 'N') {
|
||||||
|
for (int i = readPos; i < seqLength; i++) {
|
||||||
|
bases[i].refPos += cigarLen;
|
||||||
|
}
|
||||||
|
} else if (cigarSymbol == 'M') {
|
||||||
|
for (int i = readPos; i < readPos+cigarLen; i++) {
|
||||||
|
bases[i].remove = false;
|
||||||
|
}
|
||||||
|
readPos += cigarLen;
|
||||||
|
} else if (cigarSymbol == 'I') {
|
||||||
|
for (int i = readPos + cigarLen; i < seqLength; i++) {
|
||||||
|
bases[i].refPos -= cigarLen;
|
||||||
|
}
|
||||||
|
readPos += cigarLen;
|
||||||
|
} else if (cigarSymbol == 'D') {
|
||||||
|
for (int i = readPos; i < seqLength; i++) {
|
||||||
|
bases[i].refPos += cigarLen;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return returnPos;
|
||||||
|
}
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif //ALIGNMENT_3N_TABLE_H
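The effect of `adjustPos()` on `refPos` can be illustrated with a single pass over the CIGAR string. Note that this is a different technique from the one in the header, which starts each base at its read index and shifts all later bases for every S, I, N, and D segment. The sketch below is a hypothetical helper (`referenceOffsets` is not part of the project) that produces the same offsets for aligned bases by advancing one running reference position, and marks bases that are not aligned to the reference (S and I) with -1.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each read base, the 0-based reference offset
// relative to the alignment start, or -1 for soft-clipped / inserted bases.
std::vector<long long> referenceOffsets(const std::string& cigar) {
    std::vector<long long> offsets;
    long long refPos = 0;  // running reference offset
    long long len = 0;     // length of the segment being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            len = len * 10 + (c - '0');
            continue;
        }
        switch (c) {
            case 'M': case '=': case 'X':  // consumes read and reference
                for (long long k = 0; k < len; ++k) offsets.push_back(refPos++);
                break;
            case 'I': case 'S':            // consumes read only
                offsets.insert(offsets.end(), static_cast<std::size_t>(len), -1LL);
                break;
            case 'D': case 'N':            // consumes reference only
                refPos += len;
                break;
            default:                       // H and P consume neither
                break;
        }
        len = 0;
    }
    return offsets;
}
```

For example, with the CIGAR string `2S3M1I2M100N2M1D1M` the first aligned base (read index 2) gets offset 0, the bases after the 100-base skipped region start at offset 105, and the final base lands at offset 108. The single pass costs time linear in the read length, whereas shifting every later base per segment grows with the number of segments as well.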
|
785
aln_sink.cpp
Normal file
@ -0,0 +1,785 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <iomanip>
|
||||||
|
#include <limits>
|
||||||
|
#include "aln_sink.h"
|
||||||
|
#include "aligner_seed.h"
|
||||||
|
#include "util.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initialize state machine with a new read. The state we start in depends
|
||||||
|
* on whether it's paired-end or unpaired.
|
||||||
|
*/
|
||||||
|
void ReportingState::nextRead(bool paired) {
|
||||||
|
paired_ = paired;
|
||||||
|
if(paired) {
|
||||||
|
state_ = CONCORDANT_PAIRS;
|
||||||
|
doneConcord_ = false;
|
||||||
|
doneDiscord_ = !p_.discord;
|
||||||
|
doneUnpair1_ = !p_.mixed;
|
||||||
|
doneUnpair2_ = !p_.mixed;
|
||||||
|
exitConcord_ = ReportingState::EXIT_DID_NOT_EXIT;
|
||||||
|
exitDiscord_ = p_.discord ?
|
||||||
|
ReportingState::EXIT_DID_NOT_EXIT :
|
||||||
|
ReportingState::EXIT_DID_NOT_ENTER;
|
||||||
|
exitUnpair1_ = p_.mixed ?
|
||||||
|
ReportingState::EXIT_DID_NOT_EXIT :
|
||||||
|
ReportingState::EXIT_DID_NOT_ENTER;
|
||||||
|
exitUnpair2_ = p_.mixed ?
|
||||||
|
ReportingState::EXIT_DID_NOT_EXIT :
|
||||||
|
ReportingState::EXIT_DID_NOT_ENTER;
|
||||||
|
} else {
|
||||||
|
// Unpaired
|
||||||
|
state_ = UNPAIRED;
|
||||||
|
doneConcord_ = true;
|
||||||
|
doneDiscord_ = true;
|
||||||
|
doneUnpair1_ = false;
|
||||||
|
doneUnpair2_ = true;
|
||||||
|
exitConcord_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
|
||||||
|
exitDiscord_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
|
||||||
|
exitUnpair1_ = ReportingState::EXIT_DID_NOT_EXIT;
|
||||||
|
exitUnpair2_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
|
||||||
|
}
|
||||||
|
doneUnpair_ = doneUnpair1_ && doneUnpair2_;
|
||||||
|
done_ = false;
|
||||||
|
nconcord_ = ndiscord_ = nunpair1_ = nunpair2_ = 0;
|
||||||
|
nunpairRepeat1_ = nunpairRepeat2_ = 0;
|
||||||
|
concordBest_ = getMinScore();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Caller uses this member function to indicate that one additional
|
||||||
|
* concordant alignment has been found.
|
||||||
|
*/
|
||||||
|
bool ReportingState::foundConcordant(TAlScore score) {
|
||||||
|
assert(paired_);
|
||||||
|
assert_geq(state_, ReportingState::CONCORDANT_PAIRS);
|
||||||
|
assert(!doneConcord_);
|
||||||
|
|
||||||
|
if(score > concordBest_) {
|
||||||
|
concordBest_ = score;
|
||||||
|
nconcord_ = 0;
|
||||||
|
}
|
||||||
|
nconcord_++;
|
||||||
|
|
||||||
|
// DK CONCORDANT - debugging purposes
|
||||||
|
// areDone(nconcord_, doneConcord_, exitConcord_);
|
||||||
|
|
||||||
|
// No need to search for discordant alignments if there are one or more
|
||||||
|
// concordant alignments.
|
||||||
|
doneDiscord_ = true;
|
||||||
|
exitDiscord_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
|
||||||
|
if(doneConcord_) {
|
||||||
|
// If we're finished looking for concordant alignments, do we have to
|
||||||
|
// continue on to search for unpaired alignments? Only if our exit
|
||||||
|
// from the concordant stage is EXIT_SHORT_CIRCUIT_M. If it's
|
||||||
|
// EXIT_SHORT_CIRCUIT_k or EXIT_WITH_ALIGNMENTS, we can skip unpaired.
|
||||||
|
assert_neq(ReportingState::EXIT_NO_ALIGNMENTS, exitConcord_);
|
||||||
|
if(exitConcord_ != ReportingState::EXIT_SHORT_CIRCUIT_M) {
|
||||||
|
if(!doneUnpair1_) {
|
||||||
|
doneUnpair1_ = true;
|
||||||
|
exitUnpair1_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
|
||||||
|
}
|
||||||
|
if(!doneUnpair2_) {
|
||||||
|
doneUnpair2_ = true;
|
||||||
|
exitUnpair2_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
updateDone();
|
||||||
|
return done();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Caller uses this member function to indicate that one additional unpaired
|
||||||
|
* mate alignment has been found for the specified mate.
|
||||||
|
*/
|
||||||
|
bool ReportingState::foundUnpaired(bool mate1, bool repeat) {
|
||||||
|
assert_gt(state_, ReportingState::NO_READ);
|
||||||
|
// Note: it's not right to assert !doneUnpair1_/!doneUnpair2_ here.
|
||||||
|
// Even if we're done with a mate, further alignments for it are still counted below.
|
||||||
|
if(mate1) {
|
||||||
|
nunpair1_++;
|
||||||
|
if(repeat) {
|
||||||
|
nunpairRepeat1_++;
|
||||||
|
}
|
||||||
|
// Did we just finish with this mate?
|
||||||
|
if(!doneUnpair1_) {
|
||||||
|
areDone(nunpair1_, doneUnpair1_, exitUnpair1_);
|
||||||
|
if(doneUnpair1_) {
|
||||||
|
doneUnpair_ = doneUnpair1_ && doneUnpair2_;
|
||||||
|
updateDone();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(nunpair1_ > 1) {
|
||||||
|
doneDiscord_ = true;
|
||||||
|
exitDiscord_ = ReportingState::EXIT_NO_ALIGNMENTS;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nunpair2_++;
|
||||||
|
if(repeat) {
|
||||||
|
nunpairRepeat2_++;
|
||||||
|
}
|
||||||
|
// Did we just finish with this mate?
|
||||||
|
if(!doneUnpair2_) {
|
||||||
|
areDone(nunpair2_, doneUnpair2_, exitUnpair2_);
|
||||||
|
if(doneUnpair2_) {
|
||||||
|
doneUnpair_ = doneUnpair1_ && doneUnpair2_;
|
||||||
|
updateDone();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(nunpair2_ > 1) {
|
||||||
|
doneDiscord_ = true;
|
||||||
|
exitDiscord_ = ReportingState::EXIT_NO_ALIGNMENTS;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return done();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Called to indicate that the aligner has finished searching for
|
||||||
|
* alignments. This gives us a chance to finalize our state.
|
||||||
|
*
|
||||||
|
* TODO: Keep track of short-circuiting information.
|
||||||
|
*/
|
||||||
|
void ReportingState::finish() {
|
||||||
|
if(!doneConcord_) {
|
||||||
|
doneConcord_ = true;
|
||||||
|
exitConcord_ =
|
||||||
|
((nconcord_ > 0) ?
|
||||||
|
ReportingState::EXIT_WITH_ALIGNMENTS :
|
||||||
|
ReportingState::EXIT_NO_ALIGNMENTS);
|
||||||
|
}
|
||||||
|
assert_gt(exitConcord_, EXIT_DID_NOT_EXIT);
|
||||||
|
if(!doneUnpair1_) {
|
||||||
|
doneUnpair1_ = true;
|
||||||
|
exitUnpair1_ =
|
||||||
|
((nunpair1_ > 0) ?
|
||||||
|
ReportingState::EXIT_WITH_ALIGNMENTS :
|
||||||
|
ReportingState::EXIT_NO_ALIGNMENTS);
|
||||||
|
}
|
||||||
|
assert_gt(exitUnpair1_, EXIT_DID_NOT_EXIT);
|
||||||
|
if(!doneUnpair2_) {
|
||||||
|
doneUnpair2_ = true;
|
||||||
|
exitUnpair2_ =
|
||||||
|
((nunpair2_ > 0) ?
|
||||||
|
ReportingState::EXIT_WITH_ALIGNMENTS :
|
||||||
|
ReportingState::EXIT_NO_ALIGNMENTS);
|
||||||
|
}
|
||||||
|
assert_gt(exitUnpair2_, EXIT_DID_NOT_EXIT);
|
||||||
|
if(!doneDiscord_) {
|
||||||
|
// Check if the unpaired alignments should be converted to a single
|
||||||
|
// discordant paired-end alignment.
|
||||||
|
assert_eq(0, ndiscord_);
|
||||||
|
if(nconcord_ == 0 && nunpair1_ == 1 && nunpair2_ == 1) {
|
||||||
|
convertUnpairedToDiscordant();
|
||||||
|
}
|
||||||
|
doneDiscord_ = true;
|
||||||
|
exitDiscord_ =
|
||||||
|
((ndiscord_ > 0) ?
|
||||||
|
ReportingState::EXIT_WITH_ALIGNMENTS :
|
||||||
|
ReportingState::EXIT_NO_ALIGNMENTS);
|
||||||
|
}
|
||||||
|
assert(!paired_ || exitDiscord_ > ReportingState::EXIT_DID_NOT_EXIT);
|
||||||
|
doneUnpair_ = done_ = true;
|
||||||
|
assert(done());
|
||||||
|
}
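The discordant rule applied in `finish()` can be read in isolation. The sketch below restates it as a free function; `shouldConvertToDiscordant` is a hypothetical name introduced only for illustration, and the real class also consults its `doneDiscord_` flag before reaching this test.

```cpp
#include <cstdint>

// Hypothetical helper: a pair is reported as discordant only when there is
// no concordant alignment and each mate aligned exactly once on its own.
inline bool shouldConvertToDiscordant(
    uint64_t nconcord,  // concordant alignments found
    uint64_t nunpair1,  // unpaired alignments found for mate #1
    uint64_t nunpair2)  // unpaired alignments found for mate #2
{
    return nconcord == 0 && nunpair1 == 1 && nunpair2 == 1;
}
```

This matches the behavior of `foundUnpaired()`, which closes the discordant category as soon as either mate has more than one unpaired alignment.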
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Populate given counters with the number of various kinds of alignments
|
||||||
|
* to report for this read. Concordant alignments are preferable to (and
|
||||||
|
* mutually exclusive with) discordant alignments, and paired-end
|
||||||
|
* alignments are preferable to unpaired alignments.
|
||||||
|
*
|
||||||
|
* The caller also needs some additional information for the case where a
|
||||||
|
* pair or unpaired read aligns repetitively. If the read is paired-end
|
||||||
|
* and the paired-end has repetitive concordant alignments, that should be
|
||||||
|
* reported, and 'pairMax' is set to true to indicate this. If the read is
|
||||||
|
* paired-end, does not have any concordant alignments, but does have
|
||||||
|
* repetitive alignments for one or both mates, then that should be
|
||||||
|
* reported, and 'unpair1Max' and 'unpair2Max' are set accordingly.
|
||||||
|
*
|
||||||
|
* Note that it's possible in the case of a paired-end read for the read to
|
||||||
|
* have repetitive concordant alignments, but for one mate to have a unique
|
||||||
|
* unpaired alignment.
|
||||||
|
*/
|
||||||
|
void ReportingState::getReport(
|
||||||
|
uint64_t& nconcordAln, // # concordant alignments to report
|
||||||
|
uint64_t& ndiscordAln, // # discordant alignments to report
|
||||||
|
uint64_t& nunpair1Aln, // # unpaired alignments for mate #1 to report
|
||||||
|
uint64_t& nunpair2Aln, // # unpaired alignments for mate #2 to report
|
||||||
|
uint64_t& nunpairRepeat1Aln, // # unpaired repeat alignments for mate #1 to report
|
||||||
|
uint64_t& nunpairRepeat2Aln, // # unpaired repeat alignments for mate #2 to report
|
||||||
|
bool& pairMax, // repetitive concordant alignments
|
||||||
|
bool& unpair1Max, // repetitive alignments for mate #1
|
||||||
|
bool& unpair2Max) // repetitive alignments for mate #2
|
||||||
|
const
|
||||||
|
{
|
||||||
|
nconcordAln = ndiscordAln = nunpair1Aln = nunpair2Aln = 0;
|
||||||
|
nunpairRepeat1Aln = nunpairRepeat2Aln = 0;
|
||||||
|
pairMax = unpair1Max = unpair2Max = false;
|
||||||
|
assert_gt(p_.khits, 0);
|
||||||
|
assert_gt(p_.mhits, 0);
|
||||||
|
if(paired_) {
|
||||||
|
// Do we have 1 or more concordant alignments to report?
|
||||||
|
if(exitConcord_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
|
||||||
|
// k at random
|
||||||
|
assert_geq(nconcord_, (uint64_t)p_.khits);
|
||||||
|
nconcordAln = p_.khits;
|
||||||
|
return;
|
||||||
|
} else if(exitConcord_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
|
||||||
|
assert(p_.msample);
|
||||||
|
assert_gt(nconcord_, 0);
|
||||||
|
pairMax = true; // repetitive concordant alignments
|
||||||
|
if(p_.mixed) {
|
||||||
|
unpair1Max = nunpair1_ > (uint64_t)p_.mhits;
|
||||||
|
unpair2Max = nunpair2_ > (uint64_t)p_.mhits;
|
||||||
|
}
|
||||||
|
// Not sure if this is OK
|
||||||
|
nconcordAln = 1; // 1 at random
|
||||||
|
return;
|
||||||
|
} else if(exitConcord_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
|
||||||
|
assert_gt(nconcord_, 0);
|
||||||
|
// <= k at random
|
||||||
|
nconcordAln = min<uint64_t>(p_.khits, nconcord_);
|
||||||
|
}
|
||||||
|
assert(!p_.mhitsSet() || nconcord_ <= (uint64_t)p_.mhits+1);
|
||||||
|
|
||||||
|
// Do we have a discordant alignment to report?
|
||||||
|
if(exitDiscord_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
|
||||||
|
// Report discordant
|
||||||
|
assert(p_.discord);
|
||||||
|
ndiscordAln = 1;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
assert_neq(ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED, exitUnpair1_);
|
||||||
|
assert_neq(ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED, exitUnpair2_);
|
||||||
|
|
||||||
|
if((paired_ && !p_.mixed) || nunpair1_ + nunpair2_ == 0) {
|
||||||
|
// Unpaired alignments either not reportable or nonexistent
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Do we have 1 or more alignments for mate #1 to report?
|
||||||
|
if(exitUnpair1_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
|
||||||
|
// k at random
|
||||||
|
assert_geq(nunpair1_, (uint64_t)p_.khits);
|
||||||
|
nunpair1Aln = p_.khits;
|
||||||
|
} else if(exitUnpair1_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
|
||||||
|
assert(p_.msample);
|
||||||
|
assert_gt(nunpair1_, 0);
|
||||||
|
unpair1Max = true; // repetitive alignments for mate #1
|
||||||
|
nunpair1Aln = 1; // 1 at random
|
||||||
|
} else if(exitUnpair1_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
|
||||||
|
assert_gt(nunpair1_, 0);
|
||||||
|
// <= k at random
|
||||||
|
nunpair1Aln = min<uint64_t>(nunpair1_, (uint64_t)p_.khits);
|
||||||
|
}
|
||||||
|
assert(!p_.mhitsSet() || paired_ || nunpair1_ <= (uint64_t)p_.mhits+1);
|
||||||
|
if(p_.repeat) nunpairRepeat1Aln = nunpairRepeat1_;
|
||||||
|
|
||||||
|
// Do we have 1 or more alignments for mate #2 to report?
|
||||||
|
if(exitUnpair2_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
|
||||||
|
// k at random
|
||||||
|
nunpair2Aln = p_.khits;
|
||||||
|
} else if(exitUnpair2_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
|
||||||
|
assert(p_.msample);
|
||||||
|
assert_gt(nunpair2_, 0);
|
||||||
|
unpair2Max = true; // repetitive alignments for mate #2
|
||||||
|
nunpair2Aln = 1; // 1 at random
|
||||||
|
} else if(exitUnpair2_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
|
||||||
|
assert_gt(nunpair2_, 0);
|
||||||
|
// <= k at random
|
||||||
|
nunpair2Aln = min<uint64_t>(nunpair2_, (uint64_t)p_.khits);
|
||||||
|
}
|
||||||
|
assert(!p_.mhitsSet() || paired_ || nunpair2_ <= (uint64_t)p_.mhits+1);
|
||||||
|
if(p_.repeat) nunpairRepeat2Aln = nunpairRepeat2_;
|
||||||
|
}
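Each category in `getReport()` (concordant, mate #1, mate #2) follows the same mapping from exit reason to the number of alignments reported. A compact sketch of that mapping is shown here; `CategoryExit`, `CategoryReport` and `reportFor` are hypothetical names, and the enum is a simplified stand-in for the `ReportingState::EXIT_*` constants rather than their real definitions.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified stand-in for the exit reasons consulted by getReport().
enum class CategoryExit { NoAlignments, WithAlignments, ShortCircuitK, ShortCircuitM };

struct CategoryReport {
    uint64_t nreport;  // how many alignments to report for the category
    bool repetitive;   // true when the -m/-M ceiling was exceeded
};

// Hypothetical helper: found = alignments seen, khits = the -k limit.
inline CategoryReport reportFor(CategoryExit exit, uint64_t found, uint64_t khits) {
    switch (exit) {
        case CategoryExit::ShortCircuitK:
            return {khits, false};                  // k at random
        case CategoryExit::ShortCircuitM:
            return {1, true};                       // 1 at random, flagged repetitive
        case CategoryExit::WithAlignments:
            return {std::min(found, khits), false}; // <= k at random
        default:
            return {0, false};                      // nothing to report
    }
}
```

The sketch leaves out the early returns that make concordant results take precedence over discordant and unpaired ones.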
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Given the number of alignments in a category, check whether we
|
||||||
|
* short-circuited out of the category. Set the done and exit arguments to
|
||||||
|
* indicate whether and how we short-circuited.
|
||||||
|
*/
|
||||||
|
inline void ReportingState::areDone(
|
||||||
|
uint64_t cnt, // # alignments in category
|
||||||
|
bool& done, // out: whether we short-circuited out of category
|
||||||
|
int& exit) const // out: if done, how we short-circuited (-k? -m? etc)
|
||||||
|
{
|
||||||
|
assert(!done);
|
||||||
|
// Have we reached the -k limit?
|
||||||
|
assert_gt(p_.khits, 0);
|
||||||
|
assert_gt(p_.mhits, 0);
|
||||||
|
if(cnt >= (uint64_t)p_.khits && !p_.mhitsSet()) {
|
||||||
|
done = true;
|
||||||
|
exit = ReportingState::EXIT_SHORT_CIRCUIT_k;
|
||||||
|
}
|
||||||
|
// Have we exceeded the -m or -M limit?
|
||||||
|
else if(p_.mhitsSet() && cnt > (uint64_t)p_.mhits) {
|
||||||
|
done = true;
|
||||||
|
assert(p_.msample);
|
||||||
|
exit = ReportingState::EXIT_SHORT_CIRCUIT_M;
|
||||||
|
}
|
||||||
|
}
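The two limits checked by `areDone()` differ in one detail: the `-k` test fires when the count reaches the limit, while the `-m`/`-M` test fires only when the count goes past it. The sketch below isolates that decision; `ExitKind` and `classifyExit` are hypothetical names, and `mhitsSet` is passed in as a plain flag because the definition of `p_.mhitsSet()` is not part of this excerpt.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the short-circuit exit codes.
enum class ExitKind { None, ShortCircuitK, ShortCircuitM };

// Mirrors the branch structure of areDone() under the assumptions above.
inline ExitKind classifyExit(uint64_t cnt, uint64_t khits, uint64_t mhits, bool mhitsSet) {
    if (!mhitsSet && cnt >= khits) {
        return ExitKind::ShortCircuitK;  // reached the -k limit
    }
    if (mhitsSet && cnt > mhits) {
        return ExitKind::ShortCircuitM;  // exceeded the -m/-M ceiling
    }
    return ExitKind::None;               // keep searching
}
```

With `khits = 2` and no ceiling the second alignment ends the search; with `mhits = 3` in effect it takes a fourth alignment, which is the pattern the first two test cases in this file walk through.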
|
||||||
|
|
||||||
|
#ifdef ALN_SINK_MAIN
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
|
||||||
|
bool testDones(
|
||||||
|
const ReportingState& st,
|
||||||
|
bool done1,
|
||||||
|
bool done2,
|
||||||
|
bool done3,
|
||||||
|
bool done4,
|
||||||
|
bool done5,
|
||||||
|
bool done6)
|
||||||
|
{
|
||||||
|
assert(st.doneConcordant() == done1);
|
||||||
|
assert(st.doneDiscordant() == done2);
|
||||||
|
assert(st.doneUnpaired(true) == done3);
|
||||||
|
assert(st.doneUnpaired(false) == done4);
|
||||||
|
assert(st.doneUnpaired() == done5);
|
||||||
|
assert(st.done() == done6);
|
||||||
|
assert(st.repOk());
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(void) {
|
||||||
|
cerr << "Case 1 (simple unpaired 1) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
0, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
false, // discord
|
||||||
|
false); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(false); // unpaired read
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(2, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(2, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(!unpair1Max);
|
||||||
|
assert(!unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 2 (simple unpaired 2) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
3, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
false, // discord
|
||||||
|
false); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(false); // unpaired read
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, false, true, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(unpair1Max);
|
||||||
|
assert(!unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 3 (simple paired 1) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
3, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
false, // discord
|
||||||
|
false); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true); // paired read
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(4, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(4, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(4, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(4, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(pairMax);
|
||||||
|
assert(!unpair1Max); // because !mixed
|
||||||
|
assert(!unpair2Max); // because !mixed
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 4 (simple paired 2) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
3, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
true, // discord
|
||||||
|
true); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true); // paired read
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, false, false, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, false, false, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, false, false, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, false, false, false));
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(4, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(4, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(4, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(4, st.numUnpaired1());
|
||||||
|
assert_eq(4, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(pairMax);
|
||||||
|
assert(unpair1Max);
|
||||||
|
assert(unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 5 (potential discordant after concordant) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
3, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
true, // discord
|
||||||
|
true); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true);
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(testDones(st, false, true, false, false, false, false));
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(1, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(1, st.numUnpaired1());
|
||||||
|
assert_eq(1, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(1, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(!unpair1Max);
|
||||||
|
assert(!unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 6 (true discordant) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
2, // khits
|
||||||
|
3, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
true, // discord
|
||||||
|
true); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true);
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.finish();
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(1, st.numDiscordant());
|
||||||
|
assert_eq(0, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
assert(st.repOk());
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(1, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(!unpair1Max);
|
||||||
|
assert(!unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 7 (unaligned pair & uniquely aligned mate, mixed-mode) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
1, // khits
|
||||||
|
1, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
true, // discord
|
||||||
|
true); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true); // paired read
|
||||||
|
// assert(st.doneConcordant() == done1);
|
||||||
|
// assert(st.doneDiscordant() == done2);
|
||||||
|
// assert(st.doneUnpaired(true) == done3);
|
||||||
|
// assert(st.doneUnpaired(false) == done4);
|
||||||
|
// assert(st.doneUnpaired() == done5);
|
||||||
|
// assert(st.done() == done6);
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, false, false, false, false, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, false, false, false));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(2, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(unpair1Max);
|
||||||
|
assert(!unpair2Max);
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 8 (unaligned pair & uniquely aligned mate, NOT mixed-mode) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
1, // khits
|
||||||
|
1, // mhits
|
||||||
|
0, // pengap
|
||||||
|
false, // msample
|
||||||
|
true, // discord
|
||||||
|
false); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true); // paired read
|
||||||
|
// assert(st.doneConcordant() == done1);
|
||||||
|
// assert(st.doneDiscordant() == done2);
|
||||||
|
// assert(st.doneUnpaired(true) == done3);
|
||||||
|
// assert(st.doneUnpaired(false) == done4);
|
||||||
|
// assert(st.doneUnpaired() == done5);
|
||||||
|
// assert(st.done() == done6);
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, false, true, true, true, false));
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(testDones(st, false, true, true, true, true, false));
|
||||||
|
assert_eq(0, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(2, st.numUnpaired1());
|
||||||
|
assert_eq(0, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(0, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(!pairMax);
|
||||||
|
assert(!unpair1Max); // not really relevant
|
||||||
|
assert(!unpair2Max); // not really relevant
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
|
||||||
|
cerr << "Case 9 (repetitive pair, only one mate repetitive) ... ";
|
||||||
|
{
|
||||||
|
uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
|
||||||
|
bool pairMax = false, unpair1Max = false, unpair2Max = false;
|
||||||
|
ReportingParams rp(
|
||||||
|
1, // khits
|
||||||
|
1, // mhits
|
||||||
|
0, // pengap
|
||||||
|
true, // msample
|
||||||
|
true, // discord
|
||||||
|
true); // mixed
|
||||||
|
ReportingState st(rp);
|
||||||
|
st.nextRead(true); // paired read
|
||||||
|
// assert(st.doneConcordant() == done1);
|
||||||
|
// assert(st.doneDiscordant() == done2);
|
||||||
|
// assert(st.doneUnpaired(true) == done3);
|
||||||
|
// assert(st.doneUnpaired(false) == done4);
|
||||||
|
// assert(st.doneUnpaired() == done5);
|
||||||
|
// assert(st.done() == done6);
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(st.repOk());
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(st.repOk());
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(st.repOk());
|
||||||
|
assert(testDones(st, false, true, false, false, false, false));
|
||||||
|
assert(st.repOk());
|
||||||
|
st.foundConcordant();
|
||||||
|
assert(st.repOk());
|
||||||
|
st.foundUnpaired(true);
|
||||||
|
assert(st.repOk());
|
||||||
|
assert(testDones(st, true, true, true, false, false, false));
|
||||||
|
assert_eq(2, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(2, st.numUnpaired1());
|
||||||
|
assert_eq(1, st.numUnpaired2());
|
||||||
|
st.foundUnpaired(false);
|
||||||
|
assert(st.repOk());
|
||||||
|
assert(testDones(st, true, true, true, true, true, true));
|
||||||
|
assert_eq(2, st.numConcordant());
|
||||||
|
assert_eq(0, st.numDiscordant());
|
||||||
|
assert_eq(2, st.numUnpaired1());
|
||||||
|
assert_eq(2, st.numUnpaired2());
|
||||||
|
st.finish();
|
||||||
|
st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
|
||||||
|
pairMax, unpair1Max, unpair2Max);
|
||||||
|
assert_eq(1, nconcord);
|
||||||
|
assert_eq(0, ndiscord);
|
||||||
|
assert_eq(0, nunpair1);
|
||||||
|
assert_eq(0, nunpair2);
|
||||||
|
assert(pairMax);
|
||||||
|
assert(unpair1Max); // not really relevant
|
||||||
|
assert(unpair2Max); // not really relevant
|
||||||
|
}
|
||||||
|
cerr << "PASSED" << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /*def ALN_SINK_MAIN*/
|
4384
aln_sink.h
Normal file
File diff suppressed because it is too large
536
alphabet.cpp
Normal file
@ -0,0 +1,536 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <cassert>
|
||||||
|
#include <string>
|
||||||
|
#include "alphabet.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII characters to DNA categories:
|
||||||
|
*
|
||||||
|
* 0 = invalid - error
|
||||||
|
* 1 = DNA
|
||||||
|
* 2 = IUPAC (ambiguous DNA)
|
||||||
|
* 3 = not an error, but unmatchable; alignments containing this
|
||||||
|
* character are invalid
|
||||||
|
*/
|
||||||
|
uint8_t asc2dnacat[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,
|
||||||
|
/* - */
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 1, 2, 1, 2, 0, 0, 1, 2, 0, 0, 2, 0, 2, 2, 0,
|
||||||
|
/* A B C D G H K M N */
|
||||||
|
/* 80 */ 0, 0, 2, 2, 1, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* R S T V W X Y */
|
||||||
|
/* 96 */ 0, 1, 2, 1, 2, 0, 0, 1, 2, 0, 0, 2, 0, 2, 2, 0,
|
||||||
|
/* a b c d g h k m n */
|
||||||
|
/* 112 */ 0, 0, 2, 2, 1, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* r s t v w x y */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
// 5-bit pop count
|
||||||
|
int mask2popcnt[] = {
|
||||||
|
0, 1, 1, 2, 1, 2, 2, 3,
|
||||||
|
1, 2, 2, 3, 2, 3, 3, 4,
|
||||||
|
1, 2, 2, 3, 2, 3, 3, 4,
|
||||||
|
2, 3, 3, 4, 3, 4, 4, 5
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from masks to ASCII characters for ambiguous nucleotides.
|
||||||
|
*/
|
||||||
|
char mask2dna[] = {
|
||||||
|
'?', // 0
|
||||||
|
'A', // 1
|
||||||
|
'C', // 2
|
||||||
|
'M', // 3
|
||||||
|
'G', // 4
|
||||||
|
'R', // 5
|
||||||
|
'S', // 6
|
||||||
|
'V', // 7
|
||||||
|
'T', // 8
|
||||||
|
'W', // 9
|
||||||
|
'Y', // 10
|
||||||
|
'H', // 11
|
||||||
|
'K', // 12
|
||||||
|
'D', // 13
|
||||||
|
'B', // 14
|
||||||
|
'N', // 15 (inclusive N)
|
||||||
|
'N' // 16 (exclusive N)
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII characters for ambiguous nucleotides into masks:
|
||||||
|
*/
|
||||||
|
uint8_t asc2dnamask[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 1,14, 2,13, 0, 0, 4,11, 0, 0,12, 0, 3,15, 0,
|
||||||
|
/* A B C D G H K M N */
|
||||||
|
/* 80 */ 0, 0, 5, 6, 8, 0, 7, 9, 0,10, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* R S T V W Y */
|
||||||
|
/* 96 */ 0, 1,14, 2,13, 0, 0, 4,11, 0, 0,12, 0, 3,15, 0,
|
||||||
|
/* a b c d g h k m n */
|
||||||
|
/* 112 */ 0, 0, 5, 6, 8, 0, 7, 9, 0,10, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* r s t v w y */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convert a pair of DNA masks to a color mask
|
||||||
|
*
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
uint8_t dnamasks2colormask[16][16] = {
|
||||||
|
/* 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 */
|
||||||
|
/* 0 */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
|
||||||
|
/* 1 */ { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
|
||||||
|
/* 2 */ { 0, 2, 1, 3, 8, 10, 9, 11, 4, 6, 5, 7, 12, 14, 13, 15 },
|
||||||
|
/* 3 */ { 0, 3, 3, 3, 12, 15, 15, 15, 12, 15, 15, 15, 12, 15, 15, 15 },
|
||||||
|
/* 4 */ { 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15 },
|
||||||
|
/* 5 */ { 0, 5, 10, 15, 5, 5, 15, 15, 10, 15, 10, 15, 15, 15, 15, 15 },
|
||||||
|
/* 6 */ { 0, 6, 9, 15, 9, 15, 9, 15, 6, 6, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 7 */ { 0, 7, 11, 15, 13, 15, 15, 15, 14, 15, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 8 */ { 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 },
|
||||||
|
/* 9 */ { 0, 9, 6, 15, 6, 15, 6, 15, 9, 9, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 10 */ { 0, 10, 5, 15, 10, 10, 15, 15, 5, 15, 5, 15, 15, 15, 15, 15 },
|
||||||
|
/* 11 */ { 0, 11, 7, 15, 14, 15, 15, 15, 13, 15, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 12 */ { 0, 12, 12, 12, 3, 15, 15, 15, 3, 15, 15, 15, 3, 15, 15, 15 },
|
||||||
|
/* 13 */ { 0, 13, 14, 15, 7, 15, 15, 15, 11, 15, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 14 */ { 0, 14, 13, 15, 11, 15, 15, 15, 7, 15, 15, 15, 15, 15, 15, 15 },
|
||||||
|
/* 15 */ { 0, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15 }
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII nucleotide characters (including IUPAC codes) to their complements:
|
||||||
|
*/
|
||||||
|
char asc2dnacomp[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,'-', 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0,'T','V','G','H', 0, 0,'C','D', 0, 0,'M', 0,'K','N', 0,
|
||||||
|
/* A B C D G H K M N */
|
||||||
|
/* 80 */ 0, 0,'Y','S','A', 0,'B','W', 0,'R', 0, 0, 0, 0, 0, 0,
|
||||||
|
/* R S T V W Y */
|
||||||
|
/* 96 */ 0,'T','V','G','H', 0, 0,'C','D', 0, 0,'M', 0,'K','N', 0,
|
||||||
|
/* a b c d g h k m n */
|
||||||
|
/* 112 */ 0, 0,'Y','S','A', 0,'B','W', 0,'R', 0, 0, 0, 0, 0, 0,
|
||||||
|
/* r s t v w y */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII color characters ('0'-'4', '-', '.') to ASCII DNA characters:
|
||||||
|
*/
|
||||||
|
char col2dna[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,'-','N', 0,
|
||||||
|
/* - . */
|
||||||
|
/* 48 */'A','C','G','T','N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 0 1 2 3 4 */
|
||||||
|
/* 64 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 80 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 96 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 112 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII DNA characters (A, C, G, T, N) to ASCII color characters:
|
||||||
|
*/
|
||||||
|
char dna2col[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,'-', 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0,'0', 0,'1', 0, 0, 0,'2', 0, 0, 0, 0, 0, 0,'4', 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0,'3', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0,'0', 0,'1', 0, 0, 0,'2', 0, 0, 0, 0, 0, 0,'4', 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0,'3', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII DNA characters (including IUPAC codes) to strings listing the possible colors:
|
||||||
|
*/
|
||||||
|
const char* dna2colstr[] = {
|
||||||
|
/* 0 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 16 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 32 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "-", "?", "?",
|
||||||
|
/* 48 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 64 */ "?", "0","1|2|3","1","0|2|3","?", "?", "2","0|1|3","?", "?", "2|3", "?", "0|1", ".", "?",
|
||||||
|
/* A B C D G H K M N */
|
||||||
|
/* 80 */ "?", "?", "0|2","1|2", "3", "?","0|1|2","0|3","?", "1|3", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* R S T V W Y */
|
||||||
|
/* 96 */ "?", "0","1|2|3","1","0|2|3","?", "?", "2","0|1|3","?", "?", "2|3", "?", "0|1", ".", "?",
|
||||||
|
/* a b c d g h k m n */
|
||||||
|
/* 112 */ "?", "?", "0|2","1|2", "3", "?","0|1|2","0|3","?", "1|3", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* r s t v w y */
|
||||||
|
/* 128 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 144 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 160 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 176 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 192 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 208 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 224 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?",
|
||||||
|
/* 240 */ "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?", "?"
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mapping from ASCII characters to color categories:
|
||||||
|
*
|
||||||
|
* 0 = invalid - error
|
||||||
|
* 1 = valid color
|
||||||
|
* 2 = IUPAC (ambiguous DNA) - there is no such thing for colors to my
|
||||||
|
* knowledge
|
||||||
|
* 3 = not an error, but unmatchable; alignments containing this
|
||||||
|
* character are invalid
|
||||||
|
*/
|
||||||
|
uint8_t asc2colcat[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0,
|
||||||
|
/* - . */
|
||||||
|
/* 48 */ 1, 1, 1, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 0 1 2 3 4 */
|
||||||
|
/* 64 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 80 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 96 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 112 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Set the category for all IUPAC codes. By default they're in
|
||||||
|
* category 2 (IUPAC), but sometimes we'd like to put them in category
|
||||||
|
* 3 (unmatchable), for example.
|
||||||
|
*/
|
||||||
|
void setIupacsCat(uint8_t cat) {
|
||||||
|
assert(cat < 4);
|
||||||
|
asc2dnacat[(int)'B'] = asc2dnacat[(int)'b'] =
|
||||||
|
asc2dnacat[(int)'D'] = asc2dnacat[(int)'d'] =
|
||||||
|
asc2dnacat[(int)'H'] = asc2dnacat[(int)'h'] =
|
||||||
|
asc2dnacat[(int)'K'] = asc2dnacat[(int)'k'] =
|
||||||
|
asc2dnacat[(int)'M'] = asc2dnacat[(int)'m'] =
|
||||||
|
asc2dnacat[(int)'N'] = asc2dnacat[(int)'n'] =
|
||||||
|
asc2dnacat[(int)'R'] = asc2dnacat[(int)'r'] =
|
||||||
|
asc2dnacat[(int)'S'] = asc2dnacat[(int)'s'] =
|
||||||
|
asc2dnacat[(int)'V'] = asc2dnacat[(int)'v'] =
|
||||||
|
asc2dnacat[(int)'W'] = asc2dnacat[(int)'w'] =
|
||||||
|
asc2dnacat[(int)'X'] = asc2dnacat[(int)'x'] =
|
||||||
|
asc2dnacat[(int)'Y'] = asc2dnacat[(int)'y'] = cat;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// For converting from ASCII to the Dna5 code where A=0, C=1, G=2,
|
||||||
|
/// T=3, N=4
|
||||||
|
|
||||||
|
|
||||||
|
uint8_t asc2dna[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
uint8_t asc2dna_3N[2][256] = {
|
||||||
|
{
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
},
|
||||||
|
{
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// this is only used in BASE_CHANGE case
|
||||||
|
uint8_t asc2dna_1[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
uint8_t asc2dna_2[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
/// Convert an ascii char representing a base or a color to a 2-bit
|
||||||
|
/// code: 0=A,0; 1=C,1; 2=G,2; 3=T,3; 4=N,.
|
||||||
|
uint8_t asc2dnaOrCol[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0,
|
||||||
|
/* - . */
|
||||||
|
/* 48 */ 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 0 1 2 3 */
|
||||||
|
/* 64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* A C G N */
|
||||||
|
/* 80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* T */
|
||||||
|
/* 96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
|
||||||
|
/* a c g n */
|
||||||
|
/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* t */
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
/// For converting from ASCII color characters to color codes, where
|
||||||
|
/// '0'=0, '1'=1, '2'=2, '3'=3 and '-' or '.'=4
|
||||||
|
uint8_t asc2col[] = {
|
||||||
|
/* 0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0,
|
||||||
|
/* - . */
|
||||||
|
/* 48 */ 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 0 1 2 3 */
|
||||||
|
/* 64 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 80 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 96 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 112 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convert a nucleotide and a color to the paired nucleotide. Indexed
|
||||||
|
* first by nucleotide then by color. Note that this is exactly the
|
||||||
|
* same as the dinuc2color array.
|
||||||
|
*/
|
||||||
|
uint8_t nuccol2nuc[5][5] = {
|
||||||
|
/* B G O R . */
|
||||||
|
/* A */ {0, 1, 2, 3, 4},
|
||||||
|
/* C */ {1, 0, 3, 2, 4},
|
||||||
|
/* G */ {2, 3, 0, 1, 4},
|
||||||
|
/* T */ {3, 2, 1, 0, 4},
|
||||||
|
/* N */ {4, 4, 4, 4, 4}
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Convert a pair of nucleotides to a color.
|
||||||
|
*/
|
||||||
|
uint8_t dinuc2color[5][5] = {
|
||||||
|
/* A */ {0, 1, 2, 3, 4},
|
||||||
|
/* C */ {1, 0, 3, 2, 4},
|
||||||
|
/* G */ {2, 3, 0, 1, 4},
|
||||||
|
/* T */ {3, 2, 1, 0, 4},
|
||||||
|
/* N */ {4, 4, 4, 4, 4}
|
||||||
|
};
|
||||||
|
|
||||||
|
/// Convert bit encoded DNA char to its complement
|
||||||
|
int dnacomp[5] = {
|
||||||
|
3, 2, 1, 0, 4
|
||||||
|
};
|
||||||
|
|
||||||
|
const char *iupacs = "!ACMGRSVTWYHKDBN!acmgrsvtwyhkdbn";
|
||||||
|
|
||||||
|
char mask2iupac[16] = {
|
||||||
|
-1,
|
||||||
|
'A', // 0001
|
||||||
|
'C', // 0010
|
||||||
|
'M', // 0011
|
||||||
|
'G', // 0100
|
||||||
|
'R', // 0101
|
||||||
|
'S', // 0110
|
||||||
|
'V', // 0111
|
||||||
|
'T', // 1000
|
||||||
|
'W', // 1001
|
||||||
|
'Y', // 1010
|
||||||
|
'H', // 1011
|
||||||
|
'K', // 1100
|
||||||
|
'D', // 1101
|
||||||
|
'B', // 1110
|
||||||
|
'N', // 1111
|
||||||
|
};
|
||||||
|
|
||||||
|
int maskcomp[16] = {
|
||||||
|
0, // 0000 (!) -> 0000 (!)
|
||||||
|
8, // 0001 (A) -> 1000 (T)
|
||||||
|
4, // 0010 (C) -> 0100 (G)
|
||||||
|
12, // 0011 (M) -> 1100 (K)
|
||||||
|
2, // 0100 (G) -> 0010 (C)
|
||||||
|
10, // 0101 (R) -> 1010 (Y)
|
||||||
|
6, // 0110 (S) -> 0110 (S)
|
||||||
|
14, // 0111 (V) -> 1110 (B)
|
||||||
|
1, // 1000 (T) -> 0001 (A)
|
||||||
|
9, // 1001 (W) -> 1001 (W)
|
||||||
|
5, // 1010 (Y) -> 0101 (R)
|
||||||
|
13, // 1011 (H) -> 1101 (D)
|
||||||
|
3, // 1100 (K) -> 0011 (M)
|
||||||
|
11, // 1101 (D) -> 1011 (H)
|
||||||
|
7, // 1110 (B) -> 0111 (V)
|
||||||
|
15, // 1111 (N) -> 1111 (N)
|
||||||
|
};
|
||||||
|
|
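The mask tables in alphabet.cpp (`asc2dnamask`, `mask2popcnt`, `maskcomp`) appear to follow one convention: each unambiguous base owns one bit (A=1, C=2, G=4, T=8) and an IUPAC code is the OR of the bases it may stand for. The sketch below recomputes a few table entries from that convention. The helper names `baseBit`, `maskOf`, `popcount4` and `complementMask` are hypothetical and are not part of the source files.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration only; not part of alphabet.cpp.

// Bit assigned to each unambiguous base: A=1, C=2, G=4, T=8.
// Any other character contributes no bit.
inline uint8_t baseBit(char c) {
    switch (c) {
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 4;
        case 'T': return 8;
        default:  return 0;
    }
}

// OR together the bits of every base that an ambiguous code may stand for,
// e.g. maskOf("AC") is the mask of IUPAC code 'M'.
inline uint8_t maskOf(const std::string& bases) {
    uint8_t m = 0;
    for (char c : bases) {
        m = static_cast<uint8_t>(m | baseBit(c));
    }
    return m;
}

// Count the set bits of a mask (the quantity tabulated in mask2popcnt).
inline int popcount4(uint8_t m) {
    int n = 0;
    while (m != 0) {
        m = static_cast<uint8_t>(m & (m - 1)); // clear the lowest set bit
        n++;
    }
    return n;
}

// Complement a mask by swapping A<->T and C<->G, which amounts to
// reversing the order of the four low bits (the mapping in maskcomp).
inline uint8_t complementMask(uint8_t m) {
    uint8_t r = 0;
    for (int i = 0; i < 4; i++) {
        if (m & (1 << i)) {
            r = static_cast<uint8_t>(r | (1 << (3 - i)));
        }
    }
    return r;
}
```

Under these assumptions, `maskOf("CGT")` gives 14, the value stored for `'B'` in `asc2dnamask`, and `complementMask(14)` gives 7, matching the `B -> V` row of `maskcomp`.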
199
alphabet.h
Normal file
@ -0,0 +1,199 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALPHABETS_H_
|
||||||
|
#define ALPHABETS_H_
|
||||||
|
|
||||||
|
#include <stdexcept>
|
||||||
|
#include <string>
|
||||||
|
#include <sstream>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "assert_helpers.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/// Convert an ascii char to a DNA category. Categories are:
|
||||||
|
/// 0 -> invalid
|
||||||
|
/// 1 -> unambiguous a, c, g or t
|
||||||
|
/// 2 -> ambiguous
|
||||||
|
/// 3 -> unmatchable
|
||||||
|
extern uint8_t asc2dnacat[];
|
||||||
|
/// Convert masks to ambiguous nucleotides
|
||||||
|
extern char mask2dna[];
|
||||||
|
/// Convert ambiguous ASCII nucleotide to mask
|
||||||
|
extern uint8_t asc2dnamask[];
|
||||||
|
/// Convert mask to the number of alternatives in the mask
|
||||||
|
extern int mask2popcnt[];
|
||||||
|
/// Convert an ascii char to a 2-bit base: 0=A, 1=C, 2=G, 3=T, 4=N
|
||||||
|
extern uint8_t asc2dna[];
|
||||||
|
/// Convert an ascii char representing a base or a color to a 2-bit
|
||||||
|
/// code: 0=A,0; 1=C,1; 2=G,2; 3=T,3; 4=N,.
|
||||||
|
extern uint8_t asc2dnaOrCol[];
|
||||||
|
/// Convert a pair of DNA masks to a color mask
|
||||||
|
extern uint8_t dnamasks2colormask[16][16];
|
||||||
|
|
||||||
|
/// Convert an ascii char to a color category. Categories are:
|
||||||
|
/// 0 -> invalid
|
||||||
|
/// 1 -> unambiguous 0, 1, 2 or 3
|
||||||
|
/// 2 -> ambiguous (not applicable for colors)
|
||||||
|
/// 3 -> unmatchable
|
||||||
|
extern uint8_t asc2colcat[];
|
||||||
|
/// Convert an ascii color char to a color code: '0'-'3' -> 0-3, '-' or '.' -> 4
|
||||||
|
extern uint8_t asc2col[];
|
||||||
|
/// Convert an ascii char to its DNA complement, including IUPACs
|
||||||
|
extern char asc2dnacomp[];
|
||||||
|
|
||||||
|
/// Convert a pair of 2-bit (and 4=N) encoded DNA bases to a color
|
||||||
|
extern uint8_t dinuc2color[5][5];
|
||||||
|
/// Convert a 2-bit nucleotide (and 4=N) and a color to the
|
||||||
|
/// corresponding 2-bit nucleotide
|
||||||
|
extern uint8_t nuccol2nuc[5][5];
|
||||||
|
/// Convert a 4-bit mask into an IUPAC code
|
||||||
|
extern char mask2iupac[16];
|
||||||
|
|
||||||
|
/// Convert an ascii color to an ascii dna char
|
||||||
|
extern char col2dna[];
|
||||||
|
/// Convert an ascii dna to a color char
|
||||||
|
extern char dna2col[];
|
||||||
|
/// Convert an ascii dna char (including IUPACs) to a string of possible colors
|
||||||
|
extern const char* dna2colstr[];
|
||||||
|
|
||||||
|
/// Convert bit encoded DNA char to its complement
|
||||||
|
extern int dnacomp[5];
|
||||||
|
|
||||||
|
/// String of all DNA and IUPAC characters
|
||||||
|
extern const char *iupacs;
|
||||||
|
|
||||||
|
/// Map from masks to their complement masks
|
||||||
|
extern int maskcomp[16];
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is a Dna character.
|
||||||
|
*/
|
||||||
|
static inline bool isDna(char c) {
|
||||||
|
return asc2dnacat[(int)c] > 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is a color character.
|
||||||
|
*/
|
||||||
|
static inline bool isColor(char c) {
|
||||||
|
return asc2colcat[(int)c] > 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an ambiguous Dna character.
|
||||||
|
*/
|
||||||
|
static inline bool isAmbigNuc(char c) {
|
||||||
|
return asc2dnacat[(int)c] == 2;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an ambiguous color character.
|
||||||
|
*/
|
||||||
|
static inline bool isAmbigColor(char c) {
|
||||||
|
return asc2colcat[(int)c] == 2;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an ambiguous character.
|
||||||
|
*/
|
||||||
|
static inline bool isAmbig(char c, bool color) {
|
||||||
|
return (color ? asc2colcat[(int)c] : asc2dnacat[(int)c]) == 2;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an unambiguous DNA character.
|
||||||
|
*/
|
||||||
|
static inline bool isUnambigNuc(char c) {
|
||||||
|
return asc2dnacat[(int)c] == 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the DNA complement of the given ASCII char.
|
||||||
|
*/
|
||||||
|
static inline char comp(char c) {
|
||||||
|
switch(c) {
|
||||||
|
case 'a': return 't';
|
||||||
|
case 'A': return 'T';
|
||||||
|
case 'c': return 'g';
|
||||||
|
case 'C': return 'G';
|
||||||
|
case 'g': return 'c';
|
||||||
|
case 'G': return 'C';
|
||||||
|
case 't': return 'a';
|
||||||
|
case 'T': return 'A';
|
||||||
|
default: return c;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return the complement of a bit-encoded nucleotide.
|
||||||
|
*/
|
||||||
|
static inline int compDna(int c) {
|
||||||
|
assert_leq(c, 4);
|
||||||
|
return dnacomp[c];
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an unambiguous Dna character.
|
||||||
|
*/
|
||||||
|
static inline bool isUnambigDna(char c) {
|
||||||
|
return asc2dnacat[(int)c] == 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff c is an unambiguous color character (0,1,2,3).
|
||||||
|
*/
|
||||||
|
static inline bool isUnambigColor(char c) {
|
||||||
|
return asc2colcat[(int)c] == 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Convert a pair of 2-bit (and 4=N) encoded DNA bases to a color
|
||||||
|
extern uint8_t dinuc2color[5][5];
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Decode a not-necessarily-ambiguous nucleotide.
|
||||||
|
*/
|
||||||
|
static inline void decodeNuc(char c , int& num, int *alts) {
|
||||||
|
switch(c) {
|
||||||
|
case 'A': alts[0] = 0; num = 1; break;
|
||||||
|
case 'C': alts[0] = 1; num = 1; break;
|
||||||
|
case 'G': alts[0] = 2; num = 1; break;
|
||||||
|
case 'T': alts[0] = 3; num = 1; break;
|
||||||
|
case 'M': alts[0] = 0; alts[1] = 1; num = 2; break;
|
||||||
|
case 'R': alts[0] = 0; alts[1] = 2; num = 2; break;
|
||||||
|
case 'W': alts[0] = 0; alts[1] = 3; num = 2; break;
|
||||||
|
case 'S': alts[0] = 1; alts[1] = 2; num = 2; break;
|
||||||
|
case 'Y': alts[0] = 1; alts[1] = 3; num = 2; break;
|
||||||
|
case 'K': alts[0] = 2; alts[1] = 3; num = 2; break;
|
||||||
|
case 'V': alts[0] = 0; alts[1] = 1; alts[2] = 2; num = 3; break;
|
||||||
|
case 'H': alts[0] = 0; alts[1] = 1; alts[2] = 3; num = 3; break;
|
||||||
|
case 'D': alts[0] = 0; alts[1] = 2; alts[2] = 3; num = 3; break;
|
||||||
|
case 'B': alts[0] = 1; alts[1] = 2; alts[2] = 3; num = 3; break;
|
||||||
|
case 'N': alts[0] = 0; alts[1] = 1; alts[2] = 2; alts[3] = 3; num = 4; break;
|
||||||
|
default: {
|
||||||
|
std::cerr << "Bad IUPAC code: " << c << ", (int: " << (int)c << ")" << std::endl;
|
||||||
|
throw std::runtime_error("Bad IUPAC code");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
extern void setIupacsCat(uint8_t cat);
|
||||||
|
|
||||||
|
#endif /*ALPHABETS_H_*/
|
294
alt.h
Normal file
@ -0,0 +1,294 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2015, Daehwan Kim <infphilo@gmail.com>
|
||||||
|
*
|
||||||
|
* This file is part of HISAT 2.
|
||||||
|
*
|
||||||
|
* HISAT 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* HISAT 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with HISAT 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ALT_H_
|
||||||
|
#define ALT_H_
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <limits>
|
||||||
|
#include "assert_helpers.h"
|
||||||
|
#include "word_io.h"
|
||||||
|
#include "mem_ids.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
enum ALT_TYPE {
|
||||||
|
ALT_NONE = 0,
|
||||||
|
ALT_SNP_SGL, // single nucleotide substitution
|
||||||
|
ALT_SNP_INS, // small insertion wrt reference genome
|
||||||
|
ALT_SNP_DEL, // small deletion wrt reference genome
|
||||||
|
ALT_SNP_ALT, // alternative sequence (to be implemented ...)
|
||||||
|
ALT_SPLICESITE,
|
||||||
|
ALT_EXON
|
||||||
|
};
|
||||||
|
|
||||||
|
template <typename index_t>
|
||||||
|
struct ALT {
|
||||||
|
ALT() {
|
||||||
|
reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
void reset() {
|
||||||
|
type = ALT_NONE;
|
||||||
|
pos = len = 0;
|
||||||
|
seq = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
ALT_TYPE type;
|
||||||
|
|
||||||
|
union {
|
||||||
|
index_t pos;
|
||||||
|
index_t left;
|
||||||
|
};
|
||||||
|
|
||||||
|
union {
|
||||||
|
index_t len;
|
||||||
|
index_t right;
|
||||||
|
};
|
||||||
|
|
||||||
|
union {
|
||||||
|
uint64_t seq; // used to store 32 bp, but it can be used to store a pointer to EList<uint64_t>
|
||||||
|
struct {
|
||||||
|
union {
|
||||||
|
bool fw;
|
||||||
|
bool reversed;
|
||||||
|
};
|
||||||
|
bool excluded;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
public:
|
||||||
|
// in order to support a sequence longer than 32 bp
|
||||||
|
|
||||||
|
bool snp() const { return type == ALT_SNP_SGL || type == ALT_SNP_DEL || type == ALT_SNP_INS; }
|
||||||
|
bool splicesite() const { return type == ALT_SPLICESITE; }
|
||||||
|
bool mismatch() const { return type == ALT_SNP_SGL; }
|
||||||
|
bool gap() const { return type == ALT_SNP_DEL || type == ALT_SNP_INS || type == ALT_SPLICESITE; }
|
||||||
|
bool deletion() const { return type == ALT_SNP_DEL; }
|
||||||
|
bool insertion() const { return type == ALT_SNP_INS; }
|
||||||
|
bool exon() const { return type == ALT_EXON; }
|
||||||
|
|
||||||
|
bool operator< (const ALT& o) const {
|
||||||
|
if(pos != o.pos) return pos < o.pos;
|
||||||
|
if(type != o.type) {
|
||||||
|
if(type == ALT_NONE || o.type == ALT_NONE) {
|
||||||
|
return type == ALT_NONE;
|
||||||
|
}
|
||||||
|
if(type == ALT_SNP_INS) return true;
|
||||||
|
else if(o.type == ALT_SNP_INS) return false;
|
||||||
|
return type < o.type;
|
||||||
|
}
|
||||||
|
if(len != o.len) return len < o.len;
|
||||||
|
if(seq != o.seq) return seq < o.seq;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool compatibleWith(const ALT& o) const {
|
||||||
|
if(pos == o.pos) return false;
|
||||||
|
|
||||||
|
// order the two ALTs by position: a comes first, b second
|
||||||
|
const ALT& a = (pos < o.pos ? *this : o);
|
||||||
|
const ALT& b = (pos < o.pos ? o : *this);
|
||||||
|
|
||||||
|
if(a.snp()) {
|
||||||
|
if(a.type == ALT_SNP_DEL || a.type == ALT_SNP_INS) {
|
||||||
|
if(b.pos <= a.pos + a.len) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else if(a.splicesite()) {
|
||||||
|
if(b.pos <= a.right + 2) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
assert(false);
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool isSame(const ALT& o) const {
|
||||||
|
if(type != o.type)
|
||||||
|
return false;
|
||||||
|
if(type == ALT_SNP_SGL) {
|
||||||
|
return pos == o.pos && seq == o.seq;
|
||||||
|
} else if(type == ALT_SNP_DEL || type == ALT_SNP_INS || type == ALT_SPLICESITE) {
|
||||||
|
if(type == ALT_SNP_INS) {
|
||||||
|
if(seq != o.seq)
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
if(reversed == o.reversed) {
|
||||||
|
return pos == o.pos && len == o.len;
|
||||||
|
} else {
|
||||||
|
if(reversed) {
|
||||||
|
return pos - len + 1 == o.pos && len == o.len;
|
||||||
|
} else {
|
||||||
|
return pos == o.pos - o.len + 1 && len == o.len;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
assert(false);
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
bool repOk() const {
|
||||||
|
if(type == ALT_SNP_SGL) {
|
||||||
|
if(len != 1) {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
if(seq > 3) {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
} else if(type == ALT_SNP_DEL) {
|
||||||
|
if(len <= 0) {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
if(seq != 0) {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
} else if(type == ALT_SNP_INS) {
|
||||||
|
if(len <= 0) {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
} else if(type == ALT_SPLICESITE) {
|
||||||
|
assert_lt(left, right);
|
||||||
|
assert_leq(fw, 1);
|
||||||
|
} else {
|
||||||
|
assert(false);
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
bool write(ofstream& f_out, bool bigEndian) const {
|
||||||
|
writeIndex<index_t>(f_out, pos, bigEndian);
|
||||||
|
writeU32(f_out, type, bigEndian);
|
||||||
|
writeIndex<index_t>(f_out, len, bigEndian);
|
||||||
|
writeIndex<uint64_t>(f_out, seq, bigEndian);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool read(ifstream& f_in, bool bigEndian) {
|
||||||
|
pos = readIndex<index_t>(f_in, bigEndian);
|
||||||
|
type = (ALT_TYPE)readU32(f_in, bigEndian);
|
||||||
|
assert_neq(type, ALT_SNP_ALT);
|
||||||
|
len = readIndex<index_t>(f_in, bigEndian);
|
||||||
|
seq = readIndex<uint64_t>(f_in, bigEndian);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
template <typename index_t>
|
||||||
|
struct Haplotype {
|
||||||
|
Haplotype() {
|
||||||
|
reset();
|
||||||
|
}
|
||||||
|
|
||||||
|
void reset() {
|
||||||
|
left = right = 0;
|
||||||
|
alts.clear();
|
||||||
|
}
|
||||||
|
|
||||||
|
index_t left;
|
||||||
|
index_t right;
|
||||||
|
EList<index_t, 1> alts;
|
||||||
|
|
||||||
|
bool operator< (const Haplotype& o) const {
|
||||||
|
if(left != o.left) return left < o.left;
|
||||||
|
if(right != o.right) return right < o.right;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool write(ofstream& f_out, bool bigEndian) const {
|
||||||
|
writeIndex<index_t>(f_out, left, bigEndian);
|
||||||
|
writeIndex<index_t>(f_out, right, bigEndian);
|
||||||
|
writeIndex<index_t>(f_out, alts.size(), bigEndian);
|
||||||
|
for(index_t i = 0; i < alts.size(); i++) {
|
||||||
|
writeIndex<index_t>(f_out, alts[i], bigEndian);
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool read(ifstream& f_in, bool bigEndian) {
|
||||||
|
left = readIndex<index_t>(f_in, bigEndian);
|
||||||
|
right = readIndex<index_t>(f_in, bigEndian);
|
||||||
|
assert_leq(left, right);
|
||||||
|
index_t num_alts = readIndex<index_t>(f_in, bigEndian);
|
||||||
|
alts.resizeExact(num_alts); alts.clear();
|
||||||
|
for(index_t i = 0; i < num_alts; i++) {
|
||||||
|
alts.push_back(readIndex<index_t>(f_in, bigEndian));
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
template <typename index_t>
|
||||||
|
class ALTDB {
|
||||||
|
public:
|
||||||
|
ALTDB() :
|
||||||
|
_snp(false),
|
||||||
|
_ss(false),
|
||||||
|
_exon(false)
|
||||||
|
{}
|
||||||
|
|
||||||
|
virtual ~ALTDB() {}
|
||||||
|
|
||||||
|
bool hasSNPs() const { return _snp; }
|
||||||
|
bool hasSpliceSites() const { return _ss; }
|
||||||
|
bool hasExons() const { return _exon; }
|
||||||
|
|
||||||
|
void setSNPs(bool snp) { _snp = snp; }
|
||||||
|
void setSpliceSites(bool ss) { _ss = ss; }
|
||||||
|
void setExons(bool exon) { _exon = exon; }
|
||||||
|
|
||||||
|
EList<ALT<index_t> >& alts() { return _alts; }
|
||||||
|
EList<string>& altnames() { return _altnames; }
|
||||||
|
EList<Haplotype<index_t> >& haplotypes() { return _haplotypes; }
|
||||||
|
EList<index_t>& haplotype_maxrights() { return _haplotype_maxrights; }
|
||||||
|
|
||||||
|
const EList<ALT<index_t> >& alts() const { return _alts; }
|
||||||
|
const EList<string>& altnames() const { return _altnames; }
|
||||||
|
const EList<Haplotype<index_t> >& haplotypes() const { return _haplotypes; }
|
||||||
|
const EList<index_t>& haplotype_maxrights() const { return _haplotype_maxrights; }
|
||||||
|
|
||||||
|
private:
|
||||||
|
bool _snp;
|
||||||
|
bool _ss;
|
||||||
|
bool _exon;
|
||||||
|
|
||||||
|
EList<ALT<index_t> > _alts;
|
||||||
|
EList<string> _altnames;
|
||||||
|
EList<Haplotype<index_t> > _haplotypes;
|
||||||
|
EList<index_t> _haplotype_maxrights;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
#endif /*ifndef ALT_H_*/
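
The ordering implemented by `ALT::operator<` in alt.h can be illustrated with a reduced sketch. `MiniAlt`, `MiniAltType`, and `sortedAlts` are hypothetical stand-ins introduced only for this illustration; they are not part of HISAT 2. The sketch assumes the same rule as the header: compare by position first, then by type (with "none" first and insertions ahead of the other types), then by length, then by sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reduced mirror of ALT_TYPE (same relative order of values).
enum MiniAltType { NONE = 0, SGL, INS, DEL, ALT_SEQ, SPLICESITE, EXON };

// Hypothetical reduced mirror of ALT<index_t>, without the unions.
struct MiniAlt {
    MiniAltType type;
    uint32_t    pos;
    uint32_t    len;
    uint64_t    seq;

    bool operator<(const MiniAlt& o) const {
        if(pos != o.pos) return pos < o.pos;           // 1. position
        if(type != o.type) {                           // 2. type
            if(type == NONE || o.type == NONE) return type == NONE;
            if(type == INS) return true;               //    insertions first
            if(o.type == INS) return false;
            return type < o.type;
        }
        if(len != o.len) return len < o.len;           // 3. length
        return seq < o.seq;                            // 4. sequence
    }
};

// Return a sorted copy so the resulting order is easy to inspect.
inline std::vector<MiniAlt> sortedAlts(std::vector<MiniAlt> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

With three entries at the same position, the insertion sorts ahead of the single-nucleotide substitution, which in turn sorts ahead of the deletion.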
|
279
assert_helpers.h
Normal file
@ -0,0 +1,279 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef ASSERT_HELPERS_H_
|
||||||
|
#define ASSERT_HELPERS_H_
|
||||||
|
|
||||||
|
#include <stdexcept>
|
||||||
|
#include <string>
|
||||||
|
#include <cassert>
|
||||||
|
#include <iostream>
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Exception thrown by release-enabled assertions
|
||||||
|
*/
|
||||||
|
class ReleaseAssertException : public std::runtime_error {
|
||||||
|
public:
|
||||||
|
ReleaseAssertException(const std::string& msg = "") : std::runtime_error(msg) {}
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Macros for release-enabled assertions, and helper macros to make
|
||||||
|
* all assertion error messages more helpful.
|
||||||
|
*/
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define ASSERT_ONLY(...) __VA_ARGS__
|
||||||
|
#else
|
||||||
|
#define ASSERT_ONLY(...)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert(b) \
|
||||||
|
if(!(b)) { \
|
||||||
|
std::cout << "rt_assert at " << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_msg(b,msg) \
|
||||||
|
if(!(b)) { \
|
||||||
|
std::cout << msg << " at " << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#define rt_assert_eq(ex,ac) \
|
||||||
|
if(!((ex) == (ac))) { \
|
||||||
|
std::cout << "rt_assert_eq: expected (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_eq_msg(ex,ac,msg) \
|
||||||
|
if(!((ex) == (ac))) { \
|
||||||
|
std::cout << "rt_assert_eq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_eq(ex,ac) \
|
||||||
|
if(!((ex) == (ac))) { \
|
||||||
|
std::cout << "assert_eq: expected (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_eq_msg(ex,ac,msg) \
|
||||||
|
if(!((ex) == (ac))) { \
|
||||||
|
std::cout << "assert_eq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_eq(ex,ac)
|
||||||
|
#define assert_eq_msg(ex,ac,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert_neq(ex,ac) \
|
||||||
|
if(!((ex) != (ac))) { \
|
||||||
|
std::cout << "rt_assert_neq: expected not (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_neq_msg(ex,ac,msg) \
|
||||||
|
if(!((ex) != (ac))) { \
|
||||||
|
std::cout << "rt_assert_neq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_neq(ex,ac) \
|
||||||
|
if(!((ex) != (ac))) { \
|
||||||
|
std::cout << "assert_neq: expected not (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_neq_msg(ex,ac,msg) \
|
||||||
|
if(!((ex) != (ac))) { \
|
||||||
|
std::cout << "assert_neq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_neq(ex,ac)
|
||||||
|
#define assert_neq_msg(ex,ac,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert_gt(a,b) \
|
||||||
|
if(!((a) > (b))) { \
|
||||||
|
std::cout << "rt_assert_gt: expected (" << (a) << ") > (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_gt_msg(a,b,msg) \
|
||||||
|
if(!((a) > (b))) { \
|
||||||
|
std::cout << "rt_assert_gt: " << msg << ": (" << (a) << ") > (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_gt(a,b) \
|
||||||
|
if(!((a) > (b))) { \
|
||||||
|
std::cout << "assert_gt: expected (" << (a) << ") > (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_gt_msg(a,b,msg) \
|
||||||
|
if(!((a) > (b))) { \
|
||||||
|
std::cout << "assert_gt: " << msg << ": (" << (a) << ") > (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_gt(a,b)
|
||||||
|
#define assert_gt_msg(a,b,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert_geq(a,b) \
|
||||||
|
if(!((a) >= (b))) { \
|
||||||
|
std::cout << "rt_assert_geq: expected (" << (a) << ") >= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_geq_msg(a,b,msg) \
|
||||||
|
if(!((a) >= (b))) { \
|
||||||
|
std::cout << "rt_assert_geq: " << msg << ": (" << (a) << ") >= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_geq(a,b) \
|
||||||
|
if(!((a) >= (b))) { \
|
||||||
|
std::cout << "assert_geq: expected (" << (a) << ") >= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_geq_msg(a,b,msg) \
|
||||||
|
if(!((a) >= (b))) { \
|
||||||
|
std::cout << "assert_geq: " << msg << ": (" << (a) << ") >= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_geq(a,b)
|
||||||
|
#define assert_geq_msg(a,b,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert_lt(a,b) \
|
||||||
|
if(!((a) < (b))) { \
|
||||||
|
std::cout << "rt_assert_lt: expected (" << (a) << ") < (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_lt_msg(a,b,msg) \
|
||||||
|
if(!((a) < (b))) { \
|
||||||
|
std::cout << "rt_assert_lt: " << msg << ": (" << (a) << ") < (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_lt(a,b) \
|
||||||
|
if(!((a) < (b))) { \
|
||||||
|
std::cout << "assert_lt: expected (" << (a) << ") < (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_lt_msg(a,b,msg) \
|
||||||
|
if(!((a) < (b))) { \
|
||||||
|
std::cout << "assert_lt: " << msg << ": (" << (a) << ") < (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_lt(a,b)
|
||||||
|
#define assert_lt_msg(a,b,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#define rt_assert_leq(a,b) \
|
||||||
|
if(!((a) <= (b))) { \
|
||||||
|
std::cout << "rt_assert_leq: expected (" << (a) << ") <= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(); \
|
||||||
|
}
|
||||||
|
#define rt_assert_leq_msg(a,b,msg) \
|
||||||
|
if(!((a) <= (b))) { \
|
||||||
|
std::cout << "rt_assert_leq: " << msg << ": (" << (a) << ") <= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
throw ReleaseAssertException(msg); \
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_leq(a,b) \
|
||||||
|
if(!((a) <= (b))) { \
|
||||||
|
std::cout << "assert_leq: expected (" << (a) << ") <= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#define assert_leq_msg(a,b,msg) \
|
||||||
|
if(!((a) <= (b))) { \
|
||||||
|
std::cout << "assert_leq: " << msg << ": (" << (a) << ") <= (" << (b) << ")" << std::endl; \
|
||||||
|
std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
|
||||||
|
assert(0); \
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_leq(a,b)
|
||||||
|
#define assert_leq_msg(a,b,msg)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_in(c, s) assert_in2(c, s, __FILE__, __LINE__)
|
||||||
|
static inline void assert_in2(char c, const char *str, const char *file, int line) {
|
||||||
|
const char *s = str;
|
||||||
|
while(*s != '\0') {
|
||||||
|
if(c == *s) return;
|
||||||
|
s++;
|
||||||
|
}
|
||||||
|
std::cout << "assert_in: (" << c << ") not in (" << str << ")" << std::endl;
|
||||||
|
std::cout << file << ":" << line << std::endl;
|
||||||
|
assert(0);
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_in(c, s)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef NDEBUG
|
||||||
|
#define assert_range(b, e, v) assert_range_helper(b, e, v, __FILE__, __LINE__)
|
||||||
|
template<typename T>
|
||||||
|
inline static void assert_range_helper(const T& begin,
|
||||||
|
const T& end,
|
||||||
|
const T& val,
|
||||||
|
const char *file,
|
||||||
|
int line)
|
||||||
|
{
|
||||||
|
if(val < begin || val > end) {
|
||||||
|
std::cout << "assert_range: (" << val << ") not in ["
|
||||||
|
<< begin << ", " << end << "]" << std::endl;
|
||||||
|
std::cout << file << ":" << line << std::endl;
|
||||||
|
assert(0);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
#define assert_range(b, e, v)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#endif /*ASSERT_HELPERS_H_*/
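
A comparison macro that tests `a < b` without parenthesizing its parameters can change meaning when an argument is itself an expression, because macro arguments are substituted textually. The sketch below shows the effect; `LT_UNSAFE`, `LT_SAFE`, `unsafeLess`, and `safeLess` are hypothetical names used only for this illustration and do not appear in assert_helpers.h.

```cpp
#include <cassert>

// Parameters pasted as-is: operator precedence of the argument leaks out.
#define LT_UNSAFE(a, b) (a < b)
// Parameters parenthesized: each argument is evaluated as a unit.
#define LT_SAFE(a, b)   ((a) < (b))

// Expands to (c ? 5 : 1 < 3), which parses as c ? 5 : (1 < 3),
// so the result is nonzero (true) for either value of c.
inline bool unsafeLess(bool c) { return LT_UNSAFE(c ? 5 : 1, 3); }

// Expands to ((c ? 5 : 1) < (3)), the intended comparison.
inline bool safeLess(bool c) { return LT_SAFE(c ? 5 : 1, 3); }
```

For `c == true` the intended comparison is `5 < 3`, which is false; only the parenthesized form reports that.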
|
27
banded.cpp
Normal file
@ -0,0 +1,27 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include "banded.h"
|
||||||
|
|
||||||
|
#ifdef MAIN_BANDED
|
||||||
|
int main(void) {
|
||||||
|
|
||||||
|
}
|
||||||
|
#endif
|
52
banded.h
Normal file
@ -0,0 +1,52 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef BANDED_H_
|
||||||
|
#define BANDED_H_
|
||||||
|
|
||||||
|
#include "sse_util.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Use SSE instructions to quickly find stretches with lots of matches, then
|
||||||
|
* resolve alignments.
|
||||||
|
*/
|
||||||
|
class BandedSseAligner {
|
||||||
|
|
||||||
|
public:
|
||||||
|
|
||||||
|
void init(
|
||||||
|
int *q, // query, maskized
|
||||||
|
size_t qi, // query start
|
||||||
|
size_t qf, // query end
|
||||||
|
int *r, // reference, maskized
|
||||||
|
size_t ri, // reference start
|
||||||
|
size_t rf) // reference end
|
||||||
|
{
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
void nextAlignment() {
|
||||||
|
}
|
||||||
|
|
||||||
|
protected:
|
||||||
|
|
||||||
|
EList_m128i mat_;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
102
binary_sa_search.h
Normal file
@ -0,0 +1,102 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef BINARY_SA_SEARCH_H_
|
||||||
|
#define BINARY_SA_SEARCH_H_
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <iostream>
|
||||||
|
#include <limits>
|
||||||
|
#include "alphabet.h"
|
||||||
|
#include "assert_helpers.h"
|
||||||
|
#include "ds.h"
|
||||||
|
#include "btypes.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Do a binary search using the suffix of 'host' beginning at offset
|
||||||
|
* 'qry' as the query and 'sa' as an already-lexicographically-sorted
|
||||||
|
* list of suffixes of host. 'sa' may be all suffixes of host or just
|
||||||
|
* a subset. Returns the index in sa of the smallest suffix of host
|
||||||
|
* that is larger than qry, or length(sa) if all suffixes of host are
|
||||||
|
* less than qry.
|
||||||
|
*
|
||||||
|
* We use the Manber and Myers optimization of maintaining a pair of
|
||||||
|
* counters for the longest lcp observed so far on the left- and right-
|
||||||
|
* hand sides and using the min of the two as a way of skipping over
|
||||||
|
* characters at the beginning of a new round.
|
||||||
|
*
|
||||||
|
* Returns std::numeric_limits<TIndexOffU>::max() if the query suffix
* matches an element of sa.
|
||||||
|
*/
|
||||||
|
template<typename TStr, typename TSufElt> inline
|
||||||
|
TIndexOffU binarySASearch(
|
||||||
|
const TStr& host,
|
||||||
|
TIndexOffU qry,
|
||||||
|
const EList<TSufElt>& sa)
|
||||||
|
{
|
||||||
|
TIndexOffU lLcp = 0, rLcp = 0; // greatest observed LCPs on left and right
|
||||||
|
TIndexOffU l = 0, r = (TIndexOffU)sa.size()+1; // binary-search window
|
||||||
|
TIndexOffU hostLen = (TIndexOffU)host.length();
|
||||||
|
while(true) {
|
||||||
|
assert_gt(r, l);
|
||||||
|
TIndexOffU m = (l+r) >> 1;
|
||||||
|
if(m == l) {
|
||||||
|
// Binary-search window has closed: we have an answer
|
||||||
|
if(m > 0 && sa[m-1] == qry) {
|
||||||
|
return std::numeric_limits<TIndexOffU>::max(); // qry matches
|
||||||
|
}
|
||||||
|
assert_leq(m, sa.size());
|
||||||
|
return m; // Return index of right-hand suffix
|
||||||
|
}
|
||||||
|
assert_gt(m, 0);
|
||||||
|
TIndexOffU suf = sa[m-1];
|
||||||
|
if(suf == qry) {
|
||||||
|
return std::numeric_limits<TIndexOffU>::max(); // query matches an elt of sa
|
||||||
|
}
|
||||||
|
TIndexOffU lcp = min(lLcp, rLcp);
|
||||||
|
#ifndef NDEBUG
|
||||||
|
if(sstr_suf_upto_neq(host, qry, host, suf, lcp)) {
|
||||||
|
assert(0);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
// Keep advancing lcp, but stop when query mismatches host or
|
||||||
|
// when the counter falls off either the query or the suffix
|
||||||
|
while(suf+lcp < hostLen && qry+lcp < hostLen && host[suf+lcp] == host[qry+lcp]) {
|
||||||
|
lcp++;
|
||||||
|
}
|
||||||
|
// Fell off the end of either the query or the sa elt?
|
||||||
|
bool fell = (suf+lcp == hostLen || qry+lcp == hostLen);
|
||||||
|
if((fell && qry+lcp == hostLen) || (!fell && host[suf+lcp] < host[qry+lcp])) {
|
||||||
|
// Query is greater than sa elt
|
||||||
|
l = m; // update left bound
|
||||||
|
lLcp = max(lLcp, lcp); // update left lcp
|
||||||
|
}
|
||||||
|
else if((fell && suf+lcp == hostLen) || (!fell && host[suf+lcp] > host[qry+lcp])) {
|
||||||
|
// Query is less than sa elt
|
||||||
|
r = m; // update right bound
|
||||||
|
rLcp = max(rLcp, lcp); // update right lcp
|
||||||
|
} else {
|
||||||
|
assert(false); // Must be one or the other!
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Shouldn't get here
|
||||||
|
assert(false);
|
||||||
|
return std::numeric_limits<TIndexOffU>::max();
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /*BINARY_SA_SEARCH_H_*/
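
The search strategy documented in binary_sa_search.h can be sketched on standard containers. `saInsertionPoint` is a hypothetical simplified analogue written for this illustration, not the HISAT 2 function. It assumes, as the branches in `binarySASearch` do, that a suffix which runs off the end of `host` compares greater than a longer suffix sharing the same prefix, and that `sa` is sorted under that same rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Returns how many suffixes listed in sa are smaller than the suffix of host
// starting at qry, or numeric_limits<size_t>::max() if qry itself is in sa.
inline size_t saInsertionPoint(const std::string& host, size_t qry,
                               const std::vector<size_t>& sa) {
    const size_t n = host.size();
    size_t lo = 0, hi = sa.size();   // the answer lies in [lo, hi]
    size_t lLcp = 0, rLcp = 0;       // LCP of qry with the left/right bounds
    while(lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        size_t suf = sa[mid];
        if(suf == qry) return std::numeric_limits<size_t>::max();
        // Every suffix in the window shares at least min(lLcp, rLcp)
        // characters with the query, so the comparison can start there.
        size_t lcp = std::min(lLcp, rLcp);
        while(suf + lcp < n && qry + lcp < n &&
              host[suf + lcp] == host[qry + lcp]) {
            ++lcp;
        }
        // Running off the end counts as "greater" (assumed convention).
        bool qryGreater = (qry + lcp == n) ||
                          (suf + lcp < n && host[suf + lcp] < host[qry + lcp]);
        if(qryGreater) { lo = mid + 1; lLcp = lcp; }
        else           { hi = mid;     rLcp = lcp; }
    }
    return lo;
}
```

For `host = "banana"` and `sa = {1, 5, 2}` (the suffixes "anana", "a", "nana" in that order under the assumed convention), a query at offset 3 ("ana") falls between "anana" and "a".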
|
315
bit_packed_array.cpp
Normal file
@ -0,0 +1,315 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2018, Chanhee Park <parkchanhee@gmail.com> and Daehwan Kim <infphilo@gmail.com>
|
||||||
|
*
|
||||||
|
* This file is part of HISAT 2.
|
||||||
|
*
|
||||||
|
* HISAT 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* HISAT 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with HISAT 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <vector>
|
||||||
|
#include <algorithm>
#include <cmath> // ceil() and log2() used by BitPackedArray::init()
|
||||||
|
#include "timer.h"
|
||||||
|
#include "aligner_sw.h"
|
||||||
|
#include "aligner_result.h"
|
||||||
|
#include "scoring.h"
|
||||||
|
#include "sstring.h"
|
||||||
|
|
||||||
|
#include "bit_packed_array.h"
|
||||||
|
|
||||||
|
TIndexOffU BitPackedArray::get(size_t index) const
|
||||||
|
{
|
||||||
|
assert_lt(index, cur_);
|
||||||
|
|
||||||
|
pair<size_t, size_t> addr = indexToAddress(index);
|
||||||
|
uint64_t *block = blocks_[addr.first];
|
||||||
|
pair<size_t, size_t> pos = columnToPosition(addr.second);
|
||||||
|
TIndexOffU val = getItem(block, pos.first, pos.second);
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#define write_fp(x) fp.write((const char *)&(x), sizeof((x)))
|
||||||
|
|
||||||
|
void BitPackedArray::writeFile(ofstream &fp)
|
||||||
|
{
|
||||||
|
size_t sz = 0;
|
||||||
|
|
||||||
|
write_fp(item_bit_size_);
|
||||||
|
write_fp(elm_bit_size_);
|
||||||
|
write_fp(items_per_block_bit_);
|
||||||
|
write_fp(items_per_block_bit_mask_);
|
||||||
|
write_fp(items_per_block_);
|
||||||
|
|
||||||
|
write_fp(cur_);
|
||||||
|
write_fp(sz_);
|
||||||
|
|
||||||
|
write_fp(block_size_);
|
||||||
|
|
||||||
|
// number of blocks
|
||||||
|
sz = blocks_.size();
|
||||||
|
write_fp(sz);
|
||||||
|
for(size_t i = 0; i < sz; i++) {
|
||||||
|
fp.write((const char *)blocks_[i], block_size_);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::writeFile(const char *filename)
|
||||||
|
{
|
||||||
|
ofstream fp(filename, std::ofstream::binary);
|
||||||
|
writeFile(fp);
|
||||||
|
fp.close();
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::writeFile(const string &filename)
|
||||||
|
{
|
||||||
|
writeFile(filename.c_str());
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#define read_fp(x) fp.read((char *)&(x), sizeof((x)))
|
||||||
|
|
||||||
|
void BitPackedArray::readFile(ifstream &fp)
|
||||||
|
{
|
||||||
|
size_t val_sz = 0;
|
||||||
|
|
||||||
|
read_fp(val_sz);
|
||||||
|
init_by_log2(val_sz);
|
||||||
|
//rt_assert_eq(val_sz, item_bit_size_);
|
||||||
|
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, elm_bit_size_);
|
||||||
|
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, items_per_block_bit_);
|
||||||
|
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, items_per_block_bit_mask_);
|
||||||
|
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, items_per_block_);
|
||||||
|
|
||||||
|
// read the saved cur_; it is restored after the blocks are loaded
|
||||||
|
size_t prev_cnt = 0;
|
||||||
|
read_fp(prev_cnt);
|
||||||
|
cur_ = 0;
|
||||||
|
|
||||||
|
// read the saved sz_; it is checked against the allocated capacity below
|
||||||
|
size_t prev_sz = 0;
|
||||||
|
read_fp(prev_sz);
|
||||||
|
sz_ = 0;
|
||||||
|
|
||||||
|
// block_size_
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, block_size_);
|
||||||
|
|
||||||
|
// alloc blocks
|
||||||
|
allocItems(prev_cnt);
|
||||||
|
rt_assert_eq(prev_sz, sz_);
|
||||||
|
|
||||||
|
// number of blocks
|
||||||
|
read_fp(val_sz);
|
||||||
|
rt_assert_eq(val_sz, blocks_.size());
|
||||||
|
for(size_t i = 0; i < blocks_.size(); i++) {
|
||||||
|
fp.read((char *)blocks_[i], block_size_);
|
||||||
|
}
|
||||||
|
cur_ = prev_cnt;
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::readFile(const char *filename)
|
||||||
|
{
|
||||||
|
ifstream fp(filename, std::ifstream::binary);
|
||||||
|
readFile(fp);
|
||||||
|
fp.close();
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::readFile(const string &filename)
|
||||||
|
{
|
||||||
|
readFile(filename.c_str());
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::put(size_t index, TIndexOffU val)
|
||||||
|
{
|
||||||
|
assert_lt(index, cur_);
|
||||||
|
|
||||||
|
pair<size_t, size_t> addr = indexToAddress(index);
|
||||||
|
uint64_t *block = blocks_[addr.first];
|
||||||
|
pair<size_t, size_t> pos = columnToPosition(addr.second);
|
||||||
|
|
||||||
|
setItem(block, pos.first, pos.second, val);
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::pushBack(TIndexOffU val)
|
||||||
|
{
|
||||||
|
if(cur_ == sz_) {
|
||||||
|
allocItems(items_per_block_);
|
||||||
|
}
|
||||||
|
|
||||||
|
put(cur_++, val);
|
||||||
|
|
||||||
|
assert_leq(cur_, sz_);
|
||||||
|
}
|
||||||
|
|
||||||
|
TIndexOffU BitPackedArray::getItem(uint64_t *block, size_t idx, size_t offset) const
|
||||||
|
{
|
||||||
|
size_t remains = item_bit_size_;
|
||||||
|
|
||||||
|
TIndexOffU val = 0;
|
||||||
|
|
||||||
|
while(remains > 0) {
|
||||||
|
size_t bits = min(elm_bit_size_ - offset, remains);
|
||||||
|
uint64_t mask = bitToMask(bits);
|
||||||
|
|
||||||
|
// get value from block
|
||||||
|
TIndexOffU t = (block[idx] >> offset) & mask;
|
||||||
|
val = val | (t << (item_bit_size_ - remains));
|
||||||
|
|
||||||
|
remains -= bits;
|
||||||
|
offset = 0;
|
||||||
|
idx++;
|
||||||
|
}
|
||||||
|
|
||||||
|
return val;
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::setItem(uint64_t *block, size_t idx, size_t offset, TIndexOffU val)
|
||||||
|
{
|
||||||
|
size_t remains = item_bit_size_;
|
||||||
|
|
||||||
|
while(remains > 0) {
|
||||||
|
size_t bits = min(elm_bit_size_ - offset, remains);
|
||||||
|
uint64_t mask = bitToMask(bits);
|
||||||
|
uint64_t dest_mask = mask << offset;
|
||||||
|
|
||||||
|
// get 'bits' lsb from val
|
||||||
|
uint64_t t = val & mask;
|
||||||
|
val >>= bits;
|
||||||
|
|
||||||
|
// save 't' to block[idx]
|
||||||
|
t <<= offset;
|
||||||
|
block[idx] &= ~(dest_mask); // clear
|
||||||
|
block[idx] |= t;
|
||||||
|
|
||||||
|
idx++;
|
||||||
|
remains -= bits;
|
||||||
|
offset = 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
pair<size_t, size_t> BitPackedArray::indexToAddress(size_t index) const
|
||||||
|
{
|
||||||
|
pair<size_t, size_t> addr;
|
||||||
|
|
||||||
|
addr.first = index >> items_per_block_bit_;
|
||||||
|
addr.second = index & items_per_block_bit_mask_;
|
||||||
|
|
||||||
|
return addr;
|
||||||
|
}
|
||||||
|
|
||||||
|
pair<size_t, size_t> BitPackedArray::columnToPosition(size_t col) const {
|
||||||
|
pair<size_t, size_t> pos;
|
||||||
|
|
||||||
|
pos.first = (col * item_bit_size_) / elm_bit_size_;
|
||||||
|
pos.second = (col * item_bit_size_) % elm_bit_size_;
|
||||||
|
return pos;
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::expand(size_t count)
|
||||||
|
{
|
||||||
|
if((cur_ + count) > sz_) {
|
||||||
|
allocItems(count);
|
||||||
|
}
|
||||||
|
|
||||||
|
cur_ += count;
|
||||||
|
|
||||||
|
assert_leq(cur_, sz_);
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::allocSize(size_t sz)
|
||||||
|
{
|
||||||
|
size_t num_block = (sz * sizeof(uint64_t) + block_size_ - 1) / block_size_;
|
||||||
|
|
||||||
|
for(size_t i = 0; i < num_block; i++) {
|
||||||
|
// block_size_ is in bytes, so allocate block_size_ / sizeof(uint64_t) words
uint64_t *ptr = new uint64_t[block_size_ / sizeof(uint64_t)];
|
||||||
|
blocks_.push_back(ptr);
|
||||||
|
sz_ += items_per_block_;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::allocItems(size_t count)
|
||||||
|
{
|
||||||
|
size_t sz = (count * item_bit_size_ + elm_bit_size_ - 1) / elm_bit_size_;
|
||||||
|
allocSize(sz);
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::init_by_log2(size_t ceil_log2)
|
||||||
|
{
|
||||||
|
item_bit_size_ = ceil_log2;
|
||||||
|
|
||||||
|
elm_bit_size_ = sizeof(uint64_t) * 8;
|
||||||
|
|
||||||
|
items_per_block_bit_ = 20; // 1M
|
||||||
|
items_per_block_ = 1ULL << (items_per_block_bit_);
|
||||||
|
items_per_block_bit_mask_ = items_per_block_ - 1;
|
||||||
|
|
||||||
|
block_size_ = (items_per_block_ * item_bit_size_ + elm_bit_size_ - 1) / elm_bit_size_ * sizeof(uint64_t);
|
||||||
|
|
||||||
|
cur_ = 0;
|
||||||
|
sz_ = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::init(size_t max_value)
|
||||||
|
{
|
||||||
|
init_by_log2((size_t)ceil(log2(max_value)));
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::dump() const
|
||||||
|
{
|
||||||
|
cerr << "item_bit_size_: " << item_bit_size_ << endl;
|
||||||
|
cerr << "block_size_: " << block_size_ << endl;
|
||||||
|
cerr << "items_per_block_: " << items_per_block_ << endl;
|
||||||
|
cerr << "cur_: " << cur_ << endl;
|
||||||
|
cerr << "sz_: " << sz_ << endl;
|
||||||
|
cerr << "number of blocks: " << blocks_.size() << endl;
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t BitPackedArray::getMemUsage() const
|
||||||
|
{
|
||||||
|
size_t tot = blocks_.size() * block_size_;
|
||||||
|
tot += blocks_.totalCapacityBytes();
|
||||||
|
return tot;
|
||||||
|
}
|
||||||
|
|
||||||
|
BitPackedArray::~BitPackedArray()
|
||||||
|
{
|
||||||
|
for(size_t i = 0; i < blocks_.size(); i++) {
|
||||||
|
uint64_t *ptr = blocks_[i];
|
||||||
|
delete [] ptr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void BitPackedArray::reset()
|
||||||
|
{
|
||||||
|
cur_ = 0;
|
||||||
|
sz_ = 0;
|
||||||
|
|
||||||
|
for(size_t i = 0; i < blocks_.size(); i++) {
|
||||||
|
uint64_t *ptr = blocks_[i];
|
||||||
|
delete [] ptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
blocks_.clear();
|
||||||
|
}
|
||||||
|
|
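The getItem()/setItem() loops above handle items that straddle two 64-bit words. A minimal standalone sketch of the same idea follows, assuming a plain `std::vector<uint64_t>` as storage; `lowMask`, `packItem` and `unpackItem` are hypothetical names used only for illustration and are not part of HISAT2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mask with the low `bits` bits set (bits in 1..64).
// The bits == 64 case is handled separately because shifting a 64-bit
// value by 64 is undefined behaviour in C++.
static inline uint64_t lowMask(size_t bits) {
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// Store the low `width` bits of `val` as item number `col` of `words`.
static void packItem(std::vector<uint64_t>& words, size_t col, size_t width, uint64_t val) {
    size_t idx = (col * width) / 64;     // word holding the item's first bit
    size_t offset = (col * width) % 64;  // bit position inside that word
    size_t remains = width;              // bits still to be written
    while(remains > 0) {
        size_t room = 64 - offset;
        size_t bits = room < remains ? room : remains;
        uint64_t mask = lowMask(bits);
        assert(idx < words.size());
        words[idx] &= ~(mask << offset);       // clear the destination bits
        words[idx] |= (val & mask) << offset;  // write the next chunk
        val = (bits == 64) ? 0 : (val >> bits);
        remains -= bits;
        offset = 0;                            // later chunks start at bit 0
        idx++;
    }
}

// Read item number `col` back from `words`.
static uint64_t unpackItem(const std::vector<uint64_t>& words, size_t col, size_t width) {
    size_t idx = (col * width) / 64;
    size_t offset = (col * width) % 64;
    size_t remains = width;
    uint64_t val = 0;
    while(remains > 0) {
        size_t room = 64 - offset;
        size_t bits = room < remains ? room : remains;
        assert(idx < words.size());
        uint64_t chunk = (words[idx] >> offset) & lowMask(bits);
        val |= chunk << (width - remains);     // place chunk above the bits read so far
        remains -= bits;
        offset = 0;
        idx++;
    }
    return val;
}
```

With 33-bit items, item 1 occupies bits 33..65 and therefore spans words 0 and 1, which is the case the chunked loop exists for.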
105
bit_packed_array.h
Normal file
@ -0,0 +1,105 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2018, Chanhee Park <parkchanhee@gmail.com> and Daehwan Kim <infphilo@gmail.com>
|
||||||
|
*
|
||||||
|
* This file is part of HISAT 2.
|
||||||
|
*
|
||||||
|
* HISAT 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* HISAT 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with HISAT 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef __HISAT2_BIT_PACKED_ARRAY_H
|
||||||
|
#define __HISAT2_BIT_PACKED_ARRAY_H
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <limits>
|
||||||
|
#include <map>
|
||||||
|
#include "assert_helpers.h"
|
||||||
|
#include "word_io.h"
|
||||||
|
#include "mem_ids.h"
|
||||||
|
#include "ds.h"
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
class BitPackedArray {
|
||||||
|
public:
|
||||||
|
BitPackedArray () {}
|
||||||
|
~BitPackedArray();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Return true iff there are no items
|
||||||
|
* @return
|
||||||
|
*/
|
||||||
|
inline bool empty() const { return cur_ == 0; }
|
||||||
|
inline size_t size() const { return cur_; }
|
||||||
|
|
||||||
|
TIndexOffU get(size_t idx) const;
|
||||||
|
|
||||||
|
inline TIndexOffU operator[](size_t i) const { return get(i); }
|
||||||
|
void pushBack(TIndexOffU val);
|
||||||
|
|
||||||
|
void init(size_t max_value);
|
||||||
|
void reset();
|
||||||
|
|
||||||
|
void writeFile(const char *filename);
|
||||||
|
void writeFile(const string& filename);
|
||||||
|
void writeFile(ofstream &fp);
|
||||||
|
|
||||||
|
void readFile(const char *filename);
|
||||||
|
void readFile(const string& filename);
|
||||||
|
void readFile(ifstream &fp);
|
||||||
|
|
||||||
|
void dump() const;
|
||||||
|
|
||||||
|
size_t getMemUsage() const;
|
||||||
|
|
||||||
|
private:
|
||||||
|
void init_by_log2(size_t ceil_log2);
|
||||||
|
|
||||||
|
void put(size_t index, TIndexOffU val);
|
||||||
|
inline uint64_t bitToMask(size_t bit) const
|
||||||
|
{
|
||||||
|
// a shift by 64 is undefined, so handle the full-width mask explicitly
return (bit >= 64) ? ~0ULL : (uint64_t) ((1ULL << bit) - 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
TIndexOffU getItem(uint64_t *block, size_t idx, size_t offset) const;
|
||||||
|
void setItem(uint64_t *block, size_t idx, size_t offset, TIndexOffU val);
|
||||||
|
|
||||||
|
pair<size_t, size_t> indexToAddress(size_t index) const;
|
||||||
|
pair<size_t, size_t> columnToPosition(size_t col) const;
|
||||||
|
|
||||||
|
|
||||||
|
void expand(size_t count = 1);
|
||||||
|
void allocSize(size_t sz);
|
||||||
|
void allocItems(size_t count);
|
||||||
|
|
||||||
|
|
||||||
|
private:
|
||||||
|
size_t item_bit_size_; // bits per item (e.g. 33)
|
||||||
|
|
||||||
|
size_t elm_bit_size_; // bits per storage word (64)
|
||||||
|
size_t items_per_block_bit_;
|
||||||
|
size_t items_per_block_bit_mask_;
|
||||||
|
size_t items_per_block_; // number of items in block
|
||||||
|
|
||||||
|
size_t cur_; // current item count
|
||||||
|
size_t sz_; // allocated capacity, in items
|
||||||
|
|
||||||
|
size_t block_size_; // block size in bytes
|
||||||
|
|
||||||
|
// List of packed blocks
|
||||||
|
EList<uint64_t *> blocks_;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
#endif //__HISAT2_BIT_PACKED_ARRAY_H
|
80
bitpack.h
Normal file
@ -0,0 +1,80 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef BITPACK_H_
|
||||||
|
#define BITPACK_H_
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "assert_helpers.h"
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Routines for marshalling 2-bit values into and out of 8-bit or
|
||||||
|
* 32-bit hosts
|
||||||
|
*/
|
||||||
|
|
||||||
|
static inline void pack_2b_in_8b(const int two, uint8_t& eight, const int off) {
|
||||||
|
assert_lt(two, 4);
|
||||||
|
assert_lt(off, 4);
|
||||||
|
eight |= (two << (off*2));
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline int unpack_2b_from_8b(const uint8_t eight, const int off) {
|
||||||
|
assert_lt(off, 4);
|
||||||
|
return ((eight >> (off*2)) & 0x3);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void pack_2b_in_32b(const int two, uint32_t& thirty2, const int off) {
|
||||||
|
assert_lt(two, 4);
|
||||||
|
assert_lt(off, 16);
|
||||||
|
thirty2 |= (two << (off*2));
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline int unpack_2b_from_32b(const uint32_t thirty2, const int off) {
|
||||||
|
assert_lt(off, 16);
|
||||||
|
return ((thirty2 >> (off*2)) & 0x3);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Routines for marshalling 1-bit values into and out of 8-bit or
|
||||||
|
* 32-bit hosts
|
||||||
|
*/
|
||||||
|
|
||||||
|
static inline void pack_1b_in_8b(const int one, uint8_t& eight, const int off) {
|
||||||
|
assert_lt(one, 2);
|
||||||
|
assert_lt(off, 8);
|
||||||
|
eight |= (one << off);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline int unpack_1b_from_8b(const uint8_t eight, const int off) {
|
||||||
|
assert_lt(off, 8);
|
||||||
|
return ((eight >> off) & 0x1);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void pack_1b_in_32b(const int one, uint32_t& thirty2, const int off) {
|
||||||
|
assert_lt(one, 2);
|
||||||
|
assert_lt(off, 32);
|
||||||
|
thirty2 |= (one << off);
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline int unpack_1b_from_32b(const uint32_t thirty2, const int off) {
|
||||||
|
assert_lt(off, 32);
|
||||||
|
return ((thirty2 >> off) & 0x1);
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /*BITPACK_H_*/
|
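A short usage sketch for the 2-bit routines in bitpack.h. It re-declares simplified copies with plain `assert()` so that it compiles on its own; `pack2bIn8b`, `unpack2bFrom8b` and `packFour` are hypothetical names, and the A=0, C=1, G=2, T=3 coding is assumed here only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified copy of pack_2b_in_8b: OR a 2-bit value into slot `off` (0..3).
static inline void pack2bIn8b(int two, uint8_t& eight, int off) {
    assert(two >= 0 && two < 4);
    assert(off >= 0 && off < 4);
    eight |= static_cast<uint8_t>(two << (off * 2));
}

// Simplified copy of unpack_2b_from_8b: read the 2-bit value in slot `off`.
static inline int unpack2bFrom8b(uint8_t eight, int off) {
    assert(off >= 0 && off < 4);
    return (eight >> (off * 2)) & 0x3;
}

// Hypothetical helper: pack four 2-bit codes into one byte, slot 0 first.
static uint8_t packFour(const int codes[4]) {
    // The pack routine only ORs bits in, so the target must start at zero;
    // packing into a byte that already holds data would merge the values.
    uint8_t b = 0;
    for(int i = 0; i < 4; i++) {
        pack2bIn8b(codes[i], b, i);
    }
    return b;
}
```

Because packing is a pure OR, overwriting a slot requires clearing it first; the header's routines leave that to the caller.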
1113
blockwise_sa.h
Normal file
File diff suppressed because it is too large
1237
bp_aligner.h
Normal file
File diff suppressed because it is too large
48
btypes.h
Normal file
@ -0,0 +1,48 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef BOWTIE_INDEX_TYPES_H
|
||||||
|
#define BOWTIE_INDEX_TYPES_H
|
||||||
|
|
||||||
|
#ifdef BOWTIE_64BIT_INDEX
|
||||||
|
#define OFF_MASK 0xffffffffffffffff
|
||||||
|
#define OFF_LEN_MASK 0xc000000000000000
|
||||||
|
#define LS_SIZE 0x100000000000000
|
||||||
|
#define OFF_SIZE 8
|
||||||
|
#define INDEX_MAX 0xffffffffffffffff
|
||||||
|
|
||||||
|
typedef uint64_t TIndexOffU;
|
||||||
|
typedef int64_t TIndexOff;
|
||||||
|
|
||||||
|
#else
|
||||||
|
#define OFF_MASK 0xffffffff
|
||||||
|
#define OFF_LEN_MASK 0xc0000000
|
||||||
|
#define LS_SIZE 0x10000000
|
||||||
|
#define OFF_SIZE 4
|
||||||
|
#define INDEX_MAX 0xffffffff
|
||||||
|
|
||||||
|
typedef uint32_t TIndexOffU;
|
||||||
|
typedef int TIndexOff;
|
||||||
|
|
||||||
|
#endif /* BOWTIE_64BIT_INDEX */
|
||||||
|
|
||||||
|
extern const std::string gfm_ext;
|
||||||
|
|
||||||
|
#endif /* BOWTIE_INDEX_TYPES_H */
|
80
ccnt_lut.cpp
Normal file
@ -0,0 +1,80 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
/* Generated by gen_lookup_tables.pl */
|
||||||
|
|
||||||
|
uint8_t cCntLUT_4[4][4][256];
|
||||||
|
uint8_t cCntLUT_4_rev[4][4][256];
|
||||||
|
uint8_t cCntBIT[8][256];
|
||||||
|
|
||||||
|
int countCnt(int by, int c, uint8_t str) {
|
||||||
|
int count = 0;
|
||||||
|
if(by == 0) by = 4;
|
||||||
|
while(by-- > 0) {
|
||||||
|
int c2 = str & 3;
|
||||||
|
str >>= 2;
|
||||||
|
if(c == c2) count++;
|
||||||
|
}
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
int countCnt_rev(int by, int c, uint8_t str) {
|
||||||
|
int count = 0;
|
||||||
|
if(by == 0) by = 4;
|
||||||
|
while(by-- > 0) {
|
||||||
|
int c2 = (str >> 6) & 3;
|
||||||
|
str <<= 2;
|
||||||
|
if(c == c2) count++;
|
||||||
|
}
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
void initializeCntLut() {
|
||||||
|
for(int by = 0; by < 4; by++) {
|
||||||
|
for(int c = 0; c < 4; c++) {
|
||||||
|
for(int str = 0; str < 256; str++) {
|
||||||
|
cCntLUT_4[by][c][str] = countCnt(by, c, str);
|
||||||
|
cCntLUT_4_rev[by][c][str] = countCnt_rev(by, c, str);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
int countBit(int b, uint8_t str) {
|
||||||
|
int count = 0;
|
||||||
|
if(b == 0) b = 8;
|
||||||
|
while(b-- > 0) {
|
||||||
|
if(str & 0x1) count++;
|
||||||
|
str >>= 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
void initializeCntBit() {
|
||||||
|
for(int b = 0; b < 8; b++) {
|
||||||
|
for(int str = 0; str < 256; str++) {
|
||||||
|
cCntBIT[b][str] = countBit(b, str);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
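The lookup tables in ccnt_lut.cpp trade a small loop for a single array read. The sketch below rebuilds one such table in isolation to show what an entry means; `countLow`, `lut` and `buildLut` are hypothetical names, and the layout is assumed to mirror `cCntLUT_4[by][c][str]`.

```cpp
#include <cassert>
#include <cstdint>

// Count how many of the first `by` 2-bit characters of `str` (starting at
// the low-order end) equal `c`. As in countCnt(), by == 0 means all four.
static int countLow(int by, int c, uint8_t str) {
    assert(by >= 0 && by < 4);
    assert(c >= 0 && c < 4);
    int count = 0;
    if(by == 0) by = 4;
    while(by-- > 0) {
        if((str & 3) == c) count++;
        str >>= 2;
    }
    return count;
}

// lut[by][c][str] caches countLow(by, c, str) for every byte value.
static uint8_t lut[4][4][256];

static void buildLut() {
    for(int by = 0; by < 4; by++) {
        for(int c = 0; c < 4; c++) {
            for(int str = 0; str < 256; str++) {
                lut[by][c][str] =
                    static_cast<uint8_t>(countLow(by, c, static_cast<uint8_t>(str)));
            }
        }
    }
}
```

For example, the byte 0x55 holds four characters with code 1, so the entry for `by = 0`, `c = 1` is 4, while the entry for `by = 2` only looks at the two low-order characters.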
117
diff_sample.cpp
Normal file
@ -0,0 +1,117 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
|
||||||
|
*
|
||||||
|
* This file is part of Bowtie 2.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* Bowtie 2 is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with Bowtie 2. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "diff_sample.h"
|
||||||
|
|
||||||
|
struct sampleEntry clDCs[16];
|
||||||
|
bool clDCs_calced = false; /// have clDCs been calculated?
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Entries 4-57 are transcribed from page 6 of Luk and Wong's paper
|
||||||
|
* "Two New Quorum Based Algorithms for Distributed Mutual Exclusion",
|
||||||
|
* which is also used and cited in Burkhardt and Karkkainen's
|
||||||
|
* papers on difference covers for sorting. These samples are optimal
|
||||||
|
* according to Luk and Wong.
|
||||||
|
*
|
||||||
|
* All other entries are generated via the exhaustive algorithm in
|
||||||
|
* calcExhaustiveDC().
|
||||||
|
*
|
||||||
|
* The 0 is stored at the end of the sample as an end-of-list marker,
|
||||||
|
* but 0 is also an element of each.
|
||||||
|
*
|
||||||
|
* Note that every difference cover has a 0 and a 1. Intuitively,
|
||||||
|
* any optimal difference cover sample can be oriented (i.e. rotated)
|
||||||
|
* such that it includes 0 and 1 as elements.
|
||||||
|
*
|
||||||
|
* All samples in this list have been verified to be complete covers.
|
||||||
|
*
|
||||||
|
* A value of 0xffffffff in the first column indicates that there is no
|
||||||
|
* sample for that value of v. We do not keep samples for values of v
|
||||||
|
* less than 3, since they are trivial (and the caller probably didn't
|
||||||
|
* mean to ask for it).
|
||||||
|
*/
|
||||||
|
uint32_t dc0to64[65][10] = {
|
||||||
|
{0xffffffff}, // 0
|
||||||
|
{0xffffffff}, // 1
|
||||||
|
{0xffffffff}, // 2
|
||||||
|
{1, 0}, // 3
|
||||||
|
{1, 2, 0}, // 4
|
||||||
|
{1, 2, 0}, // 5
|
||||||
|
{1, 3, 0}, // 6
|
||||||
|
{1, 3, 0}, // 7
|
||||||
|
{1, 2, 4, 0}, // 8
|
||||||
|
{1, 2, 4, 0}, // 9
|
||||||
|
{1, 2, 5, 0}, // 10
|
||||||
|
{1, 2, 5, 0}, // 11
|
||||||
|
{1, 3, 7, 0}, // 12
|
||||||
|
{1, 3, 9, 0}, // 13
|
||||||
|
{1, 2, 3, 7, 0}, // 14
|
||||||
|
{1, 2, 3, 7, 0}, // 15
|
||||||
|
{1, 2, 5, 8, 0}, // 16
|
||||||
|
{1, 2, 4, 12, 0}, // 17
|
||||||
|
{1, 2, 5, 11, 0}, // 18
|
||||||
|
{1, 2, 6, 9, 0}, // 19
|
||||||
|
{1, 2, 3, 6, 10, 0}, // 20
|
||||||
|
{1, 4, 14, 16, 0}, // 21
|
||||||
|
{1, 2, 3, 7, 11, 0}, // 22
|
||||||
|
{1, 2, 3, 7, 11, 0}, // 23
|
||||||
|
{1, 2, 3, 7, 15, 0}, // 24
|
||||||
|
{1, 2, 3, 8, 12, 0}, // 25
|
||||||
|
{1, 2, 5, 9, 15, 0}, // 26
|
||||||
|
{1, 2, 5, 13, 22, 0}, // 27
|
||||||
|
{1, 4, 15, 20, 22, 0}, // 28
|
||||||
|
{1, 2, 3, 4, 9, 14, 0}, // 29
|
||||||
|
{1, 2, 3, 4, 9, 19, 0}, // 30
|
||||||
|
{1, 3, 8, 12, 18, 0}, // 31
|
||||||
|
{1, 2, 3, 7, 11, 19, 0}, // 32
|
||||||
|
{1, 2, 3, 6, 16, 27, 0}, // 33
|
||||||
|
{1, 2, 3, 7, 12, 20, 0}, // 34
|
||||||
|
{1, 2, 3, 8, 12, 21, 0}, // 35
|
||||||
|
{1, 2, 5, 12, 14, 20, 0}, // 36
|
||||||
|
{1, 2, 4, 10, 15, 22, 0}, // 37
|
||||||
|
{1, 2, 3, 4, 8, 14, 23, 0}, // 38
|
||||||
|
{1, 2, 4, 13, 18, 33, 0}, // 39
|
||||||
|
{1, 2, 3, 4, 9, 14, 24, 0}, // 40
|
||||||
|
{1, 2, 3, 4, 9, 15, 25, 0}, // 41
|
||||||
|
{1, 2, 3, 4, 9, 15, 25, 0}, // 42
|
||||||
|
{1, 2, 3, 4, 10, 15, 26, 0}, // 43
|
||||||
|
{1, 2, 3, 6, 16, 27, 38, 0}, // 44
|
||||||
|
{1, 2, 3, 5, 12, 18, 26, 0}, // 45
|
||||||
|
{1, 2, 3, 6, 18, 25, 38, 0}, // 46
|
||||||
|
{1, 2, 3, 5, 16, 22, 40, 0}, // 47
|
||||||
|
{1, 2, 5, 9, 20, 26, 36, 0}, // 48
|
||||||
|
{1, 2, 5, 24, 33, 36, 44, 0}, // 49
|
||||||
|
{1, 3, 8, 17, 28, 32, 38, 0}, // 50
|
||||||
|
{1, 2, 5, 11, 18, 30, 38, 0}, // 51
|
||||||
|
{1, 2, 3, 4, 6, 14, 21, 30, 0}, // 52
|
||||||
|
{1, 2, 3, 4, 7, 21, 29, 44, 0}, // 53
|
||||||
|
{1, 2, 3, 4, 9, 15, 21, 31, 0}, // 54
|
||||||
|
{1, 2, 3, 4, 6, 19, 26, 47, 0}, // 55
|
||||||
|
{1, 2, 3, 4, 11, 16, 33, 39, 0}, // 56
|
||||||
|
{1, 3, 13, 32, 36, 43, 52, 0}, // 57
|
||||||
|
|
||||||
|
// Generated by calcExhaustiveDC()
|
||||||
|
{1, 2, 3, 7, 21, 33, 37, 50, 0}, // 58
|
||||||
|
{1, 2, 3, 6, 13, 21, 35, 44, 0}, // 59
|
||||||
|
{1, 2, 4, 9, 15, 25, 30, 42, 0}, // 60
|
||||||
|
{1, 2, 3, 7, 15, 25, 36, 45, 0}, // 61
|
||||||
|
{1, 2, 4, 10, 32, 39, 46, 51, 0}, // 62
|
||||||
|
{1, 2, 6, 8, 20, 38, 41, 54, 0}, // 63
|
||||||
|
{1, 2, 5, 14, 16, 34, 42, 59, 0} // 64
|
||||||
|
};
|
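The comment above says every sample in the table has been verified to be a complete cover. A minimal sketch of such a check follows, assuming rows in the same zero-terminated layout as `dc0to64`; `isDifferenceCover` is a hypothetical name and is not the verification routine in diff_sample.h. A set D is a difference cover modulo v when every residue 0..v-1 can be written as (a - b) mod v with a and b in D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return true iff the zero-terminated `row`, together with the implicit
// element 0, is a difference cover modulo v.
static bool isDifferenceCover(uint32_t v, const uint32_t* row) {
    assert(v > 0);
    std::vector<uint32_t> elts(1, 0);  // 0 is always a member of the sample
    for(size_t i = 0; row[i] != 0; i++) {
        assert(row[i] < v);
        elts.push_back(row[i]);
    }
    // Mark every difference (a - b) mod v over all ordered pairs.
    std::vector<bool> seen(v, false);
    for(size_t i = 0; i < elts.size(); i++) {
        for(size_t j = 0; j < elts.size(); j++) {
            seen[(elts[i] + v - elts[j]) % v] = true;
        }
    }
    // The sample is a cover only if no residue was missed.
    for(uint32_t d = 0; d < v; d++) {
        if(!seen[d]) return false;
    }
    return true;
}
```

The check is quadratic in the sample size, which is negligible here since each row holds at most ten entries.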
1000
diff_sample.h
Normal file
File diff suppressed because it is too large
9
docs/404.html
Normal file
@ -0,0 +1,9 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: 404 Not Found
|
||||||
|
permalink: 404.html
|
||||||
|
hide: true
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
Sorry, the requested page wasn't found on the server.
|
4
docs/Gemfile
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
source 'https://rubygems.org'
|
||||||
|
gem 'github-pages'
|
||||||
|
gem 'jekyll-feed'
|
||||||
|
gem 'jemoji'
|
21
docs/LICENSE
Normal file
@ -0,0 +1,21 @@
|
|||||||
|
The MIT License (MIT)
|
||||||
|
|
||||||
|
Copyright (c) 2014 Rohan Chandra
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
59
docs/README.md
Normal file
@ -0,0 +1,59 @@
|
|||||||
|
# jekyll-ttskch-theme
|
||||||
|
|
||||||
|
A simple and customizable theme for Jekyll.
|
||||||
|
|
||||||
|
> This theme was renamed from _jekyll-**qck**-theme_ to _jekyll-**tch**-theme_ on 2016.06.02.
|
||||||
|
> It was renamed again from _jekyll-**tch**-theme_ to _jekyll-**ttskch**-theme_ on 2016.09.23.
|
||||||
|
|
||||||
|
## Screenshot
|
||||||
|
|
||||||
|
![image](https://cloud.githubusercontent.com/assets/4360663/18776176/62611b38-81a2-11e6-875b-86a66aa8f15c.png)
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
* A lot of Markdown features (also GitHub Flavored Markdown)
|
||||||
|
* `:emoji:` ready :+1:
|
||||||
|
* Easy color-scheme customization
|
||||||
|
* Tags list page
|
||||||
|
* Monthly Archives page
|
||||||
|
* Search feature without any Jekyll plugins
|
||||||
|
* `<!--more-->` tag feature
|
||||||
|
* Anchor links for each heading
|
||||||
|
* Sticky side nav
|
||||||
|
* Responsive
|
||||||
|
* OGP ready
|
||||||
|
* Share buttons ready
|
||||||
|
|
||||||
|
## Getting started
|
||||||
|
|
||||||
|
1. [Fork me](https://github.com/ttskch/jekyll-ttskch-theme/fork)
|
||||||
|
2. Rename the repository from `jekyll-ttskch-theme` to `{username}.github.io` ([learn more](https://pages.github.com/))
|
||||||
|
3. Modify `_config.yml`
|
||||||
|
4. Modify `_sass/base/_variables.scss` if you need to change colors or font sizes
|
||||||
|
5. Add new posts into `_posts/` :smiley:
|
||||||
|
|
||||||
|
## Demo
|
||||||
|
|
||||||
|
You can see a live demo below:
|
||||||
|
|
||||||
|
* https://ttskch.github.io/jekyll-ttskch-theme/
|
||||||
|
|
||||||
|
## Thanks for using :wink:
|
||||||
|
|
||||||
|
* http://ttskch.github.io
|
||||||
|
* http://sitaramshelke.github.io
|
||||||
|
* http://jffourmond.github.io
|
||||||
|
* http://vbflash8.github.io
|
||||||
|
* http://luqitao.github.io
|
||||||
|
* http://harusametime.github.io
|
||||||
|
* http://gitzxon.github.io
|
||||||
|
* http://hutsonlu.github.io
|
||||||
|
* http://k0-1.github.io
|
||||||
|
* http://anthonygore.github.io
|
||||||
|
* http://getjsdojo.github.io
|
||||||
|
* http://georgezhuo.github.io
|
||||||
|
* http://neontapir.github.io
|
||||||
|
* https://sasukeh.github.io
|
||||||
|
* https://blog.guilhermegarnier.com
|
||||||
|
|
||||||
|
Please open a PR if you want to add your blog.
|
130
docs/_config.yml
Normal file
@ -0,0 +1,130 @@
|
|||||||
|
#
|
||||||
|
# Basic settings.
|
||||||
|
#
|
||||||
|
url: http://DaehwanKimLab.github.io
|
||||||
|
baseurl: /hisat2
|
||||||
|
title: HISAT2
|
||||||
|
description: graph-based alignment of next generation sequencing reads to a population of genomes
|
||||||
|
avatar: /assets/img/ogp.png
|
||||||
|
# favicon: /favicon.ico
|
||||||
|
favicon: /assets/img/ogp.png
|
||||||
|
# language: ja
|
||||||
|
language: en
|
||||||
|
|
||||||
|
#
|
||||||
|
# Icons
|
||||||
|
#
|
||||||
|
icons:
|
||||||
|
rss: true
|
||||||
|
email:
|
||||||
|
github: DaehwanKimLab
|
||||||
|
bitbucket:
|
||||||
|
twitter:
|
||||||
|
facebook:
|
||||||
|
google_plus:
|
||||||
|
tumblr:
|
||||||
|
behance:
|
||||||
|
dribbble:
|
||||||
|
flickr:
|
||||||
|
instagram:
|
||||||
|
linkedin: # full URL
|
||||||
|
pinterest:
|
||||||
|
reddit:
|
||||||
|
soundcloud:
|
||||||
|
stack_exchange: # full URL
|
||||||
|
steam:
|
||||||
|
wordpress:
|
||||||
|
youtube:
|
||||||
|
|
||||||
|
#
|
||||||
|
# default for front matter
|
||||||
|
#
|
||||||
|
defaults:
|
||||||
|
-
|
||||||
|
scope:
|
||||||
|
path: ""
|
||||||
|
values:
|
||||||
|
category: "main"
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#
|
||||||
|
# Prettify url.
|
||||||
|
#
|
||||||
|
permalink: pretty
|
||||||
|
|
||||||
|
#
|
||||||
|
# Scripts.
|
||||||
|
#
|
||||||
|
google_analytics: # e.g. UA-000000-01
|
||||||
|
disqus:
|
||||||
|
|
||||||
|
#
|
||||||
|
# Localizations.
|
||||||
|
#
|
||||||
|
str_next: Next
|
||||||
|
str_prev: Prev
|
||||||
|
str_read_more: Read more...
|
||||||
|
str_search: Search
|
||||||
|
str_recent_posts: Recent posts
|
||||||
|
str_show_all_posts: Show all posts
|
||||||
|
|
||||||
|
#
|
||||||
|
# Recent posts.
|
||||||
|
#
|
||||||
|
recent_posts_num: 10
|
||||||
|
|
||||||
|
#
|
||||||
|
# Pagination.
|
||||||
|
#
|
||||||
|
paginate: 10
|
||||||
|
paginate_path: page/:num
|
||||||
|
|
||||||
|
#
|
||||||
|
# Social.
|
||||||
|
#
|
||||||
|
share_buttons:
|
||||||
|
twitter: true
|
||||||
|
facebook: false # needs ogp.fb.app_id
|
||||||
|
hatena: false
|
||||||
|
ogp:
|
||||||
|
image_url: //ttskch.github.io/jekyll-ttskch-theme/assets/img/ogp.png
|
||||||
|
fb:
|
||||||
|
admin: # facebook admin id
|
||||||
|
app_id: # facebook application id
|
||||||
|
|
||||||
|
#
|
||||||
|
# Plugins.
|
||||||
|
#
|
||||||
|
gems:
|
||||||
|
- jekyll-paginate
|
||||||
|
- jekyll-feed
|
||||||
|
- jemoji
|
||||||
|
|
||||||
|
#
|
||||||
|
# Styles: see "_sass/base/_variables.scss"
|
||||||
|
#
|
||||||
|
|
||||||
|
#
|
||||||
|
# !! Danger zone !!
|
||||||
|
#
|
||||||
|
|
||||||
|
include: ["_pages"]
|
||||||
|
|
||||||
|
markdown: kramdown
|
||||||
|
kramdown:
|
||||||
|
input: GFM
|
||||||
|
syntax_highlighter: rouge
|
||||||
|
|
||||||
|
excerpt_separator: <!--more-->
|
||||||
|
|
||||||
|
sass:
|
||||||
|
sass_dir: _sass
|
||||||
|
style: :compressed # or :expanded
|
||||||
|
|
||||||
|
exclude:
|
||||||
|
- Gemfile
|
||||||
|
- Gemfile.lock
|
||||||
|
- LICENSE
|
||||||
|
- README.md
|
||||||
|
- vendor
|
6
docs/_data/collaborate.yml
Normal file
@ -0,0 +1,6 @
- name: Lyda Hill Department of Bioinformatics, The University of Texas Southwestern Medical Center
  url: https://www.utsouthwestern.edu/departments/bioinformatics
  logo: /assets/img/bioinformatics_utsw_logo.png
- name: Center for Computational Biology, Johns Hopkins University
  url: http://ccb.jhu.edu
  logo: /assets/img/ccb_jhu_logo_tmp.png
10
docs/_data/contributor.yml
Normal file
@ -0,0 +1,10 @
- name: Chanhee Park
  url: /chanhee.park/
- name: Ben Langmead
  url: http://www.langmead-lab.org/
- name: Yun (Leo) Zhang
  url: /leo.zhang/
- name: Steven Salzberg
  url: https://salzberg-lab.org/in-the-news/about-me/
- name: Daehwan Kim
  url: https://kim-lab.org/daehwan-kim-principal-investigator/
66
docs/_data/download-binary.yml
Normal file
@ -0,0 +1,66 @
latest_version: 2.2.1,2.2.0,2.1.0
release:
  - version: 2.2.1
    date: 7/24/2020
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/fE9QCsX3NH4QwBi/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/zMgEtnF6LjnjFrr/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/oTtGWbWjaxsQ2Ho/download
  - version: 2.2.0
    date: 2/6/2020
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-Linux_x86_64/download
  - version: 2.1.0
    date: 6/8/2017
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-Linux_x86_64/download
      Windows: http://www.di.fc.ul.pt/~afalcao/hisat2_windows.html
  - version: 2.0.5
    date: 11/4/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-Linux_x86_64/download
  - version: 2.0.4
    date: 5/18/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-Linux_x86_64/download
  - version: 2.0.3-beta
    date: 3/28/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-Linux_x86_64/download
  - version: 2.0.2-beta
    date: 3/17/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-Linux_x86_64/download
  - version: 2.0.1-beta
    date: 11/19/2015
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-Linux_x86_64/download
  - version: 2.0.0-beta
    date: 9/8/2015
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-Linux_x86_64/download
81
docs/_data/download-index.yml
Normal file
@ -0,0 +1,81 @
- organism: H. sapiens
  data:
    GRCh38:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz
      genome_snp:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snp.tar.gz
      genome_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
      genome_snp_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz
      genome_rep(above 2.2.0):
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_rep.tar.gz
      genome_snp_rep(above 2.2.0):
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snprep.tar.gz
    UCSC hg38:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/hg38_genome.tar.gz
      genome_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/hg38_tran.tar.gz
    GRCh37:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_genome.tar.gz
      genome_snp:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_snp.tar.gz
      genome_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_tran.tar.gz
      genome_snp_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_snptran.tar.gz
    UCSC hg19:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/hg19_genome.tar.gz
- organism: M. musculus
  data:
    GRCm38:
      genome:
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38/download
      genome_snp:
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_snp/download
      genome_tran:
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_tran/download
      genome_snp_tran:
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_snp_tran/download
    UCSC mm10:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/mm10_genome.tar.gz
- organism: R. norvegicus
  data:
    UCSC rn6:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/rn6_genome.tar.gz
- organism: D. melanogaster
  data:
    BDGP6:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/bdgp6.tar.gz
      genome_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/bdgp6_tran.tar.gz
    UCSC dm6:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/dm6.tar.gz
- organism: C. elegans
  data:
    WBcel235:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/wbcel235.tar.gz
      genome_tran:
        url: https://genome-idx.s3.amazonaws.com/hisat/wbcel235_tran.tar.gz
    UCSC ce10:
      genome:
        url: https://cloud.biohpc.swmed.edu/index.php/s/bbynxoY2TPpRNQb/download
- organism: S. cerevisiae
  data:
    R64-1-1:
      genome:
        url: https://cloud.biohpc.swmed.edu/index.php/s/JRSoKHD5cHfpCFE/download
      genome_tran:
        url: https://cloud.biohpc.swmed.edu/index.php/s/akeiMrGGtt5KoJY/download
    UCSC sacCer3:
      genome:
        url: https://cloud.biohpc.swmed.edu/index.php/s/Gsq4goLW4TDAz4E/download
5
docs/_includes/article-footer.html
Normal file
@ -0,0 +1,5 @
<footer>
{% if site.share_buttons and include.share != false %}
{% include share-buttons.html page=include.page %}
{% endif %}
</footer>
64
docs/_includes/article-header.html
Normal file
@ -0,0 +1,64 @
{% assign page = include.page %}

<header>
<div class="panel">
<h1>
{% if include.link %}
<a class="post-link" href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a>
{% else %}
{{ page.title }}
{% endif %}
</h1>

<ul class="tags">
{% assign tags_num = (page.tags | size) %}
{% if tags_num > 0 %}
<li><i class="fa fa-tags"></i></li>
{% endif %}
{% for tag in page.tags %}
<li>
<a class="tag" href="{{ '/search/?t=' | append: tag | prepend: site.baseurl }}">#{{ tag }}</a>
</li>
{% endfor %}
</ul>

<div class="clearfix">
<ul class="meta">
{% if page.date %}
<li>
<i class="fa fa-calendar"></i>
{{ page.date | date: "%Y-%m-%d" }}
</li>
{% endif %}

{% if page.author %}
<li>
<a href="{{ '/search/?a=' | append: page.author | prepend: site.baseurl }}">
<i class="fa fa-user"></i>
{{ page.author }}
</a>
</li>
{% if page.icons %}
<li>
<ul class="icons">
{% include icons.html icons=page.icons %}
</ul>
</li>
{% endif %}
{% endif %}
</ul>
</div>
</div>

{% if site.share_buttons and include.share != false %}
<div style="margin-top: 1em;">
{% include share-buttons.html page=page %}
</div>
{% endif %}

{% if include.eye_catch != false and page.eye_catch %}
<p style="text-align: center">
<img class="eye-catch" src="{{ page.eye_catch }}"/>
</p>
{% endif %}
</header>
10
docs/_includes/disqus.html
Normal file
@ -0,0 +1,10 @
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_shortname = '{{ site.disqus }}';
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
11
docs/_includes/fb-root.html
Normal file
@ -0,0 +1,11 @
<!-- Init Facebook SDK -->
{% if site.share_buttons.facebook %}
<div id="fb-root"></div>
<script>(function(d, s, id) {
var js, fjs = d.getElementsByTagName(s)[0];
if (d.getElementById(id)) return;
js = d.createElement(s); js.id = id;
js.src = "//connect.facebook.net/ja_JP/sdk.js#xfbml=1&version=v2.5&appId={{ site.ogp.fb.app_id }}";
fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>
{% endif %}
12
docs/_includes/google-analytics.html
Normal file
@ -0,0 +1,12 @
<!-- Google Analytics -->
{% if site.google_analytics %}
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', '{{ site.google_analytics }}', 'auto');
ga('send', 'pageview');
</script>
{% endif %}
161
docs/_includes/icons.html
Normal file
@ -0,0 +1,161 @
{% assign icons = include.icons %}

{% if icons.rss %}
<li>
<a href="{{ '/feed.xml' | prepend: site.baseurl }}">
<i class="fa fa-fw fa-rss"></i>
</a>
</li>
{% endif %}

{% if icons.email %}
<li>
<a href="mailto:{{ icons.email }}">
<i class="fa fa-fw fa-envelope"></i>
</a>
</li>
{% endif %}

{% if icons.github %}
<li>
<a href="https://github.com/{{ icons.github }}">
<i class="fa fa-fw fa-github"></i>
</a>
</li>
{% endif %}

{% if icons.bitbucket %}
<li>
<a href="https://bitbucket.org/{{ icons.bitbucket }}">
<i class="fa fa-fw fa-bitbucket"></i>
</a>
</li>
{% endif %}

{% if icons.twitter %}
<li>
<a href="https://twitter.com/{{ icons.twitter }}">
<i class="fa fa-fw fa-twitter"></i>
</a>
</li>
{% endif %}

{% if icons.facebook %}
<li>
<a href="https://www.facebook.com/{{ icons.facebook }}">
<i class="fa fa-fw fa-facebook"></i>
</a>
</li>
{% endif %}

{% if icons.google_plus %}
<li>
<a href="https://plus.google.com/{{ icons.google_plus }}">
<i class="fa fa-fw fa-google-plus"></i>
</a>
</li>
{% endif %}

{% if icons.tumblr %}
<li>
<a href="https://{{ icons.tumblr }}.tumblr.com/">
<i class="fa fa-fw fa-tumblr"></i>
</a>
</li>
{% endif %}

{% if icons.behance %}
<li>
<a href="https://www.behance.net/{{ icons.behance }}">
<i class="fa fa-fw fa-behance"></i>
</a>
</li>
{% endif %}

{% if icons.dribbble %}
<li>
<a href="https://dribbble.com/{{ icons.dribbble }}">
<i class="fa fa-fw fa-dribbble"></i>
</a>
</li>
{% endif %}

{% if icons.flickr %}
<li>
<a href="https://www.flickr.com/photos/{{ icons.flickr }}">
<i class="fa fa-fw fa-flickr"></i>
</a>
</li>
{% endif %}

{% if icons.instagram %}
<li>
<a href="http://instagram.com/{{ icons.instagram }}">
<i class="fa fa-fw fa-instagram"></i>
</a>
</li>
{% endif %}

{% if icons.linkedin %}
<li>
<a href="{{ icons.linkedin }}">
<i class="fa fa-fw fa-linkedin"></i>
</a>
</li>
{% endif %}

{% if icons.pinterest %}
<li>
<a href="http://www.pinterest.com/{{ icons.pinterest }}">
<i class="fa fa-fw fa-pinterest"></i>
</a>
</li>
{% endif %}

{% if icons.reddit %}
<li>
<a href="https://www.reddit.com/user/{{ icons.reddit }}">
<i class="fa fa-fw fa-reddit"></i>
</a>
</li>
{% endif %}

{% if icons.soundcloud %}
<li>
<a href="https://soundcloud.com/{{ icons.soundcloud }}">
<i class="fa fa-fw fa-soundcloud"></i>
</a>
</li>
{% endif %}

{% if icons.stack_exchange %}
<li>
<a href="{{ icons.stack_exchange }}">
<i class="fa fa-fw fa-stack-exchange"></i>
</a>
</li>
{% endif %}

{% if icons.steam %}
<li>
<a href="http://steamcommunity.com/id/{{ icons.steam }}">
<i class="fa fa-fw fa-steam"></i>
</a>
</li>
{% endif %}

{% if icons.wordpress %}
<li>
<a href="https://{{ icons.wordpress }}.wordpress.com/">
<i class="fa fa-fw fa-wordpress"></i>
</a>
</li>
{% endif %}

{% if icons.youtube %}
<li>
<a href="https://www.youtube.com/user/{{ icons.youtube }}">
<i class="fa fa-fw fa-youtube"></i>
</a>
</li>
{% endif %}
7
docs/_includes/page-url-resolver.html
Normal file
@ -0,0 +1,7 @
{% assign page = include.page %}

{% if page.canonical %}
{% assign url = page.canonical | prepend: site.baseurl | prepend: site.url %}
{% else %}
{% assign url = page.url | replace: 'index.html', '' | prepend: site.baseurl | prepend: site.url %}
{% endif %}
29
docs/_includes/paginator.html
Normal file
@ -0,0 +1,29 @
{% if paginator.total_pages > 1 %}
<div class="pagination">

{% if paginator.previous_page %}
<a class="btn" href="{{ paginator.previous_page_path | prepend: site.baseurl }}">
<i class="fa fa-chevron-left"></i>
{{ site.str_prev }}
</a>
{% else %}
<span class="btn disabled">
<i class="fa fa-chevron-left"></i>
{{ site.str_prev }}
</span>
{% endif %}

{% if paginator.next_page %}
<a class="btn" href="{{ paginator.next_page_path | prepend: site.baseurl }}">
{{ site.str_next }}
<i class="fa fa-chevron-right"></i>
</a>
{% else %}
<span class="btn disabled">
{{ site.str_next }}
<i class="fa fa-chevron-right"></i>
</span>
{% endif %}

</div>
{% endif %}
22
docs/_includes/share-buttons.html
Normal file
@ -0,0 +1,22 @
{% include page-url-resolver.html page=include.page %}
{% assign title = include.page.title | append: ' | ' | append: site.title %}
<div class="clearfix">
<div style="float: right !important;">
{% if site.share_buttons.twitter %}
<div style="margin-right: 5px !important; float: left !important;">
<a href="https://twitter.com/share" class="twitter-share-button"{count} data-url="{{ url }}" data-text="{{ title }}">Tweet</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
</div>
{% endif %}
{% if site.share_buttons.facebook %}
<div style="width: 93px !important; float: left !important;">
<div class="fb-like" data-href="{{ url }}" data-layout="button_count"></div>
</div>
{% endif %}
{% if site.share_buttons.hatena %}
<div style="float: left !important;">
<a href="http://b.hatena.ne.jp/entry/{{ url }}" class="hatena-bookmark-button" data-hatena-bookmark-title="{{ title }}" data-hatena-bookmark-layout="standard-balloon" data-hatena-bookmark-lang="ja" title="このエントリーをはてなブックマークに追加"><img src="https://b.st-hatena.com/images/entry-button/button-only@2x.png" alt="このエントリーをはてなブックマークに追加" width="20" height="20" style="border: none;" /></a><script type="text/javascript" src="https://b.st-hatena.com/js/bookmark_button.js" charset="utf-8" async="async"></script>
</div>
{% endif %}
</div>
</div>
194
docs/_layouts/default.html
Normal file
@ -0,0 +1,194 @
<!DOCTYPE html>
<html lang="{{ site.language }}">
<head>
{% capture title %}{% if page.title %}{{ page.title }} | {% endif %}{{ site.title }}{% endcapture %}

{% include page-url-resolver.html page=page %}

{% if page.excerpt %}
{% assign description = page.excerpt | strip_html | strip_newlines | truncate: 160 %}
{% else %}
{% assign description = site.description %}
{% endif %}

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>{{ title }}</title>

<meta name="description" content="{{ description }}">

<link rel="shortcut icon" href="{{ site.favicon | prepend: site.baseurl }}" type="image/x-icon">
<link rel="canonical" href="{{ url }}">
<link rel="alternate" type="application/atom+xml" title="{{ site.title }}" href="{{ '/feed.xml' | prepend: site.baseurl }}" />

{% if page.eye_catch %}
{% assign ogp_image_url = page.eye_catch %}
{% else %}
{% assign ogp_image_url = site.ogp.image_url %}
{% endif %}

<meta property="og:title" content="{{ title }}" />
<meta property="og:type" content="website" />
<meta property="og:image" content="{{ ogp_image_url }}" />
<meta property="og:url" content="{{ url }}" />
<meta property="og:site_name" content="{{ site.title }}" />
<meta property="fb:admins" content="{{ site.ogp.fb.admin }}" />
<meta property="fb:app_id" content="{{ site.ogp.fb.app_id }}" />
<meta property="og:description" content="{{ description }}" />

<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="https://use.fontawesome.com/1f5f360d80.js"></script>
<link href="//fonts.googleapis.com/css?family=Source+Sans+Pro:400,700,700italic,400italic" rel="stylesheet">

<link href="{{ '/assets/css/style.css' | prepend: site.baseurl }}" rel="stylesheet">
</head>
<body>

<header class="site-header">
<div class="inner clearfix">
{% if site.avatar %}
<a href="{{ '/' | prepend: site.baseurl }}">
<img class="avatar" src="{{ site.avatar | prepend: site.baseurl }}" alt=""/>
</a>
{% endif %}
<h1 class="clearfix">
<a class="title {% if site.avatar == null %}slim{% endif %}" href="{{ '/' | prepend: site.baseurl }}">{{ site.title }}</a>
<br><span class="description">{{ site.description }}</span>
</h1>
</div>
</header>

<div class="site-container">
<div class="site-content">
{{ content }}
</div>

<aside class="site-aside">
<div class="inner">
<div class="block">
<form action="{{ site.baseurl }}/search">
<input type="search" id="search" name="q" placeholder="{{ site.str_search }}" />
</form>
</div>

<div class="block">
<ul>
{% assign pages = site.pages | where: "category", "main" | sort: 'order' %}
{% for page in pages %}
{% if page.title and page.hide != true %}
<li><a class="page-link" href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a></li>
{% endif %}
{% endfor %}
</ul>
</div>
<!--
<ul class="icons">
{% include icons.html icons=site.icons %}
</ul>
<hr class="with-no-margin margin-bottom"/>
-->

<div class="block">
<h2>Funding</h2>
<br>
<div style="font-size: 0.8em">
This work was supported in part by the National Human Genome Research Institute under grants R01-HG006102 and R01-HG006677,
and NIH grants R01-LM06845 and R01-GM083873 and NSF grant CCF-0347992 to Steven L. Salzberg
and by the Cancer Prevention Research Institute of Texas under grant RR170068 and NIH grant R01-GM135341 to Daehwan Kim
</div>
</div>

<div class="block">
<h2>Getting Help</h2>
<br>
Please use <a href="mailto:hisat2.genomics@gmail.com">hisat2.genomics@gmail.com</a> for private communications only. Please do not email technical questions to HISAT2 contributors directly.
</div>

<div class="block">
<h2>Publications</h2>
<div style="font-size: 0.8em">
<ul>
<li>Kim, D., Paggi, J.M., Park, C. <i>et al.</i> <a class="publication" href="https://doi.org/10.1038/s41587-019-0201-4">Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.</a> <a class="publication" href="https://www.nature.com/nbt/"><i>Nat Biotechnol</i></a> <b>37</b>, 907–915 (2019).</li>
<li>Kim D, Langmead B and Salzberg SL. <a class="publication" href="https://doi.org/10.1038/nmeth.3317">HISAT: a fast spliced aligner with low memory requirements.</a> <a class="publication" href="https://www.nature.com/nmeth/"><i>Nature Methods</i></a> 2015</li>
<li>Pertea M, Kim D, Pertea G, Leek JT and Salzberg SL. <a class="publication" href="https://doi.org/10.1038/nprot.2016.095">Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.</a> <a class="publication" href="https://www.nature.com/nprot/"><i>Nature Protocols</i></a> 2016</li>
</ul>
</div>
</div>

<div class="block">
<h2>Contributors</h2>
<ul>
{% for item in site.data.contributor %}
<li>
{% if item.url contains "http://" or item.url contains "https://" %}
<a class="page-link" href="{{ item.url }}">{{ item.name }}</a>
{% else %}
<a class="page-link" href="{{ item.url | prepend: site.baseurl }}">{{ item.name }}</a>
{% endif %}
</li>
{% endfor %}
</ul>
</div>

{% if site.data.collaborate %}
<div class="block">
{% for item in site.data.collaborate %}
<ul style="text-align: center">
<a href="{{ item.url }}">
<img class="avatar" src="{{ item.logo | prepend: site.baseurl }}" alt="{{ item.name }}" />
</a>
</ul>
{% endfor %}
</div>
{% endif %}

<!--
<div class="block sticky">
<h2>{{ site.str_recent_posts }}</h2>
<ul>
{% assign posts = '' | split: '' %}
{% for post in site.posts %}
{% if post.hide != true %}
{% assign posts = posts | push: post %}
{% endif %}
{% endfor %}
{% assign posts = posts | sort: 'date' | reverse %}
{% for post in posts limit:site.recent_posts_num %}
<li><a href="{{ post.url | prepend: site.baseurl }}">{{ post.title }}</a></li>
{% endfor %}
</ul>
</div>
-->

</div>
</aside>
</div>

<footer class="site-footer">
<div class="inner">
<span>Powered by <a href="http://jekyllrb.com">Jekyll</a> with <a href="https://github.com/ttskch/jekyll-ttskch-theme">TtskchTheme</a></span>
</div>
</footer>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script src="{{ '/assets/lib/garand-sticky/jquery.sticky.js' | prepend: site.baseurl }}"></script>
<script src="{{ '/assets/js/script.js' | prepend: site.baseurl }}"></script>

{% if page.id %}
<script src="{{ '/assets/js/header-link.js' | prepend: site.baseurl }}"></script>
{% endif %}

{% if page.permalink == '/search/' %}
<script src="{{ '/assets/js/search.js' | prepend: site.baseurl }}"></script>
{% endif %}

{% include fb-root.html %}
{% include google-analytics.html %}

</body>
</html>
13
docs/_layouts/page.html
Normal file
@ -0,0 +1,13 @
---
layout: default
---

<div class="article-wrapper">
<article>
{% include article-header.html page=page link=false share=page.share %}
<section class="post-content">
{{ content }}
</section>
{% include article-footer.html page=page share=page.share %}
</article>
</div>
19
docs/_layouts/post.html
Normal file
@ -0,0 +1,19 @@
|
|||||||
|
---
|
||||||
|
layout: default
|
||||||
|
---
|
||||||
|
|
||||||
|
<div class="article-wrapper">
|
||||||
|
<article>
|
||||||
|
{% include article-header.html page=page link=false share=page.share %}
|
||||||
|
<section class="post-content">
|
||||||
|
{{ content }}
|
||||||
|
</section>
|
||||||
|
{% include article-footer.html page=page share=page.share %}
|
||||||
|
</article>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{% if site.disqus %}
|
||||||
|
<section class="comments">
|
||||||
|
{% include disqus.html %}
|
||||||
|
</section>
|
||||||
|
{% endif %}
|
9
docs/_pages/about.md
Normal file
@ -0,0 +1,9 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: About
|
||||||
|
permalink: /about/
|
||||||
|
order: 2
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
**HISAT2** is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. Based on an extension of BWT for graphs ([Sirén et al. 2014](http://dl.acm.org/citation.cfm?id=2674828)), we designed and implemented a graph FM index (GFM), an original approach and its first implementation. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome. These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
|
20
docs/_pages/archives-all.html
Normal file
@ -0,0 +1,20 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: All Posts
|
||||||
|
permalink: /archives/all/
|
||||||
|
hide: true
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
<div id="search-results">
|
||||||
|
<hr id="first-hr" class="with-no-margin"/>
|
||||||
|
|
||||||
|
{% for post in site.posts %}
|
||||||
|
<div class="article-wrapper">
|
||||||
|
<article>
|
||||||
|
{% include article-header.html page=post link=true share=false eye_catch=false %}
|
||||||
|
</article>
|
||||||
|
</div>
|
||||||
|
<hr class="with-no-margin"/>
|
||||||
|
{% endfor %}
|
||||||
|
</div>
|
35
docs/_pages/archives.html
Normal file
@ -0,0 +1,35 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Archives
|
||||||
|
permalink: /archives/
|
||||||
|
order: 3
|
||||||
|
share: false
|
||||||
|
hide: true
|
||||||
|
---
|
||||||
|
|
||||||
|
{% for post in site.posts %}
|
||||||
|
{% unless post.next %}
|
||||||
|
<h3>{{ post.date | date: '%Y' }}</h3>
|
||||||
|
<ul>
|
||||||
|
{% else %}
|
||||||
|
{% assign year = post.date | date: '%Y' %}
|
||||||
|
{% assign next_year = post.next.date | date: '%Y' %}
|
||||||
|
{% if year != next_year %}
|
||||||
|
</ul>
|
||||||
|
<h3>{{ post.date | date: '%Y' }}</h3>
|
||||||
|
<ul>
|
||||||
|
{% endif %}
|
||||||
|
{% endunless %}
|
||||||
|
|
||||||
|
{% assign month = post.date | date: '%m' %}
|
||||||
|
{% assign next_month = post.next.date | date: '%m' %}
|
||||||
|
{% if year != next_year or month != next_month %}
|
||||||
|
<li><a href="{{ '/search/?d=' | prepend: site.baseurl }}{{ post.date | date: '%Y-%m' }}">{{ post.date | date: '%Y/%m' }}</a></li>
|
||||||
|
{% endif %}
|
||||||
|
{% endfor %}
|
||||||
|
|
||||||
|
{% if site.posts %}
|
||||||
|
</ul>
|
||||||
|
{% endif %}
|
||||||
|
|
||||||
|
<a class="btn" href="{{ '/archives/all/' | prepend: site.baseurl }}">{{ site.str_show_all_posts }}</a>
|
12
docs/_pages/contributors/chanheepark.md
Normal file
@ -0,0 +1,12 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Chanhee Park
|
||||||
|
permalink: /chanhee.park/
|
||||||
|
order: 1
|
||||||
|
share: false
|
||||||
|
category: contributor
|
||||||
|
---
|
||||||
|
|
||||||
|
Chanhee Park is a Scientific Software Engineer in the Kim Lab at UTSW responsible for maintaining and improving HISAT2.
|
||||||
|
|
||||||
|
[Linkedin](https://www.linkedin.com/in/chanhee-park-97677297/)
|
12
docs/_pages/contributors/yunleozhang.md
Normal file
@ -0,0 +1,12 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Yun (Leo) Zhang
|
||||||
|
permalink: /leo.zhang/
|
||||||
|
order: 1
|
||||||
|
share: false
|
||||||
|
category: contributor
|
||||||
|
---
|
||||||
|
|
||||||
|
Yun (Leo) is a biomedical engineering graduate student at UT Southwestern Medical Center. His main research includes developing advanced alignment tools.
|
||||||
|
|
||||||
|
[Linkedin](https://www.linkedin.com/in/zhang-yun-a9565891/)
|
61
docs/_pages/download.md
Normal file
@ -0,0 +1,61 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Download
|
||||||
|
permalink: /download/
|
||||||
|
order: 5
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
Please cite:
|
||||||
|
>Kim, D., Paggi, J.M., Park, C. _et al._ Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. _Nat Biotechnol_ **37**, 907–915 (2019). <https://doi.org/10.1038/s41587-019-0201-4>
|
||||||
|
|
||||||
|
- TOC
|
||||||
|
{:toc}
|
||||||
|
|
||||||
|
## Index
|
||||||
|
HISAT2 indexes are hosted on AWS (Amazon Web Services), thanks to the AWS Public Datasets program. Click this [link](https://registry.opendata.aws/jhu-indexes/) for more details.
|
||||||
|
|
||||||
|
{% for item in site.data.download-index %}
|
||||||
|
### {{ item.organism }}
|
||||||
|
{% for data in item.data %}
|
||||||
|
<li>{{ data[0] }}</li>
|
||||||
|
<table style="border-collapse: collapse; border: none;">
|
||||||
|
{% for genome in data[1] %}
|
||||||
|
<tr style="border: none;"><td style="border: none;">{{ genome[0] }}</td>
|
||||||
|
<td style="border: none;">
|
||||||
|
{% for url in genome[1] %}
|
||||||
|
<a href="{{ url[1] }}">{{ url[1] }}</a><br/>
|
||||||
|
{% endfor %}
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
{% endfor %}
|
||||||
|
</table>
|
||||||
|
{% endfor %}
|
||||||
|
{% endfor %}
|
||||||
|
|
||||||
|
|
||||||
|
genome: HISAT2 index for reference
|
||||||
|
genome_snp: HISAT2 Graph index for reference plus SNPs
|
||||||
|
genome_tran: HISAT2 Graph index for reference plus transcripts
|
||||||
|
genome_snp_tran: HISAT2 Graph index for reference plus SNPs and transcripts
|
||||||
|
|
||||||
|
|
||||||
|
## Binaries
|
||||||
|
{: binaries }
|
||||||
|
|
||||||
|
{% assign targets = site.data.download-binary.latest_version | split: "," %}
|
||||||
|
{% for release in site.data.download-binary.release %}
|
||||||
|
{% assign version = release['version'] %}
|
||||||
|
{% if targets contains version or targets == null %}
|
||||||
|
{% assign name = release['name'] %}
|
||||||
|
### Version: {{name}} {{version}}
|
||||||
|
<table style="border-collapse: collapse; border: none;">
|
||||||
|
<tr style="border: none;"><td style="border: none;" colspan="2"><b>Release Date</b>: {{release['date']}}</td></tr>
|
||||||
|
{% for artifact in release['artifacts'] %}
|
||||||
|
{% assign type = artifact[0] %}
|
||||||
|
<tr style="border: none;"><td style="border: none;">{{type}}</td><td style="border: none;"><a href="{{artifact[1]}}">{{artifact[1]}}</a></td></tr>
|
||||||
|
{% endfor %}
|
||||||
|
</table>
|
||||||
|
{% endif %}
|
||||||
|
{% endfor %}
|
||||||
|
|
225
docs/_pages/hisat-3n.md
Normal file
@ -0,0 +1,225 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: HISAT-3N
|
||||||
|
permalink: /hisat-3n/
|
||||||
|
order: 4
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
HISAT-3N
|
||||||
|
============
|
||||||
|
|
||||||
|
Overview
|
||||||
|
-----------------
|
||||||
|
**HISAT-3N** (hierarchical indexing for spliced alignment of transcripts - 3 nucleotides)
|
||||||
|
is designed for nucleotide conversion sequencing technologies and implemented based on HISAT2.
|
||||||
|
There are two strategies for HISAT-3N to align nucleotide conversion sequencing reads: *standard mode* and *repeat mode*.
|
||||||
|
The standard mode aligns reads with the standard-3N index only, so it is fast and requires little memory (~9GB for human genome alignment).
|
||||||
|
The repeat mode aligns reads with both the standard-3N index and the repeat-3N index, then outputs up to 1,000 alignment results (the number can be adjusted with `--repeat-limit`).
|
||||||
|
The repeat mode can align nucleotide conversion reads more accurately,
|
||||||
|
and it is only 10% slower than the standard mode while using slightly more memory (about 10.5GB).
|
||||||
|
|
||||||
|
HISAT-3N is developed based on [HISAT2], which is particularly optimized for RNA sequencing technology.
|
||||||
|
HISAT-3N can be used for any base-converted sequencing reads, including [BS-seq], [SLAM-seq], [TAB-seq], [oxBS-seq], [TAPS], [scBS-seq], and [scSLAM-seq].
|
||||||
|
|
||||||
|
[HISAT2]:https://github.com/DaehwanKimLab/hisat2
|
||||||
|
[BS-seq]: https://en.wikipedia.org/wiki/Bisulfite_sequencing
|
||||||
|
[SLAM-seq]: https://www.nature.com/articles/nmeth.4435
|
||||||
|
[scBS-seq]: https://www.nature.com/articles/nmeth.3035
|
||||||
|
[scSLAM-seq]: https://www.nature.com/articles/s41586-019-1369-y
|
||||||
|
[TAPS]: https://www.nature.com/articles/s41587-019-0041-2
|
||||||
|
[TAB-seq]: https://doi.org/10.1016/j.cell.2012.04.027
|
||||||
|
[oxBS-seq]: https://science.sciencemag.org/content/336/6083/934
|
||||||
|
|
||||||
|
|
||||||
|
Getting started
|
||||||
|
============
|
||||||
|
HISAT-3N requires a 64-bit computer running either Linux or Mac OS X and at least 16 GB of RAM.
|
||||||
|
|
||||||
|
A few notes:
|
||||||
|
|
||||||
|
1. The repeat 3N index building process requires 256 GB of RAM.
|
||||||
|
2. The standard 3N index building requires no more than 16 GB of RAM.
|
||||||
|
3. The alignment process with either standard or repeat index requires no more than 16 GB of RAM.
|
||||||
|
4. [SAMtools] is required to sort SAM file for hisat-3n-table.
|
||||||
|
|
||||||
|
Install
|
||||||
|
------------
|
||||||
|
|
||||||
|
git clone https://github.com/DaehwanKimLab/hisat2.git
|
||||||
|
cd hisat2
|
||||||
|
git checkout -b hisat-3n origin/hisat-3n
|
||||||
|
make
|
||||||
|
|
||||||
|
|
||||||
|
Make sure that you are in the `hisat-3n` branch
|
||||||
|
|
||||||
|
|
||||||
|
Build a 3N index with `hisat-3n-build`
|
||||||
|
-----------
|
||||||
|
`hisat-3n-build` builds a 3N-index, which contains two hisat2 indexes, from a set of DNA sequences. For standard 3N-index,
|
||||||
|
each index contains 16 files with suffix `.3n.*.*.ht2`.
|
||||||
|
For repeat 3N-index, there are 16 more files in addition to the standard 3N-index, and they have the suffix
|
||||||
|
`.3n.*.rep.*.ht2`.
|
||||||
|
These files constitute the hisat-3n index and no other file is needed to align reads to the reference.
|
||||||
|
|
||||||
|
* Example for standard HISAT-3N index building:
|
||||||
|
`hisat-3n-build genome.fa genome`
|
||||||
|
|
||||||
|
* Example for repeat HISAT-3N index building (requires 256 GB of memory):
|
||||||
|
`hisat-3n-build --repeat-index genome.fa genome`
|
||||||
|
|
||||||
|
Optionally, you can build a graph index by adding SNP or splice site information to the index, which increases alignment accuracy.
|
||||||
|
For more detail, please check the [HISAT2 manual].
|
||||||
|
|
||||||
|
[HISAT2 manual]:https://daehwankimlab.github.io/hisat2/manual/
|
||||||
|
|
||||||
|
# Standard HISAT-3N integrated index with SNP information
|
||||||
|
hisat-3n-build --exons genome.exon genome.fa genome
|
||||||
|
|
||||||
|
# Standard HISAT-3N integrated index with splicing site information
|
||||||
|
hisat-3n-build --ss genome.ss genome.fa genome
|
||||||
|
|
||||||
|
# Repeat HISAT-3N integrated index with SNP information
|
||||||
|
hisat-3n-build --repeat-index --exons genome.exon genome.fa genome
|
||||||
|
|
||||||
|
# Repeat HISAT-3N integrated index with splicing site information
|
||||||
|
hisat-3n-build --repeat-index --ss genome.ss genome.fa genome
|
||||||
|
|
||||||
|
Alignment with `hisat-3n`
|
||||||
|
------------
|
||||||
|
After you build the HISAT-3N index, you are ready to use `hisat-3n` for alignment.
|
||||||
|
HISAT-3N accepts the HISAT2 arguments and adds a few of its own. Please check the [HISAT2 manual] for more detail.
|
||||||
|
|
||||||
|
For human genome reference, HISAT-3N requires about 9GB for alignment with standard 3N-index and 10.5 GB for repeat 3N-index.
|
||||||
|
|
||||||
|
* `--base-change <chr1,chr2>`
|
||||||
|
Provide which base is converted in the sequencing process to another base. Please enter
|
||||||
|
2 letters separated by ',' for this argument. The first letter (chr1) should be the converted base, and the second letter (chr2) should be
|
||||||
|
the base it is converted to. For example, during SLAM-seq, some 'T' bases are converted to 'C', so
|
||||||
|
please enter `--base-change T,C`. During bisulfite-seq, some 'C' bases are converted to 'T', so please enter `--base-change C,T`.
|
||||||
|
If you want to align non-converted reads to the regular HISAT2 index, do not use this option.
|
||||||
|
|
||||||
|
* `--index/-x <hisat-3n-idx>`
|
||||||
|
The basename of the HISAT-3N index, that is, the name of the index files up to but not including the suffix (`.3n.*.*.ht2`, etc.).
|
||||||
|
For example, if you built your index with the basename 'genome' using `hisat-3n-build`, please enter `--index genome`.
|
||||||
|
|
||||||
|
* `--repeat-limit <int>`
|
||||||
|
Set the number of alignments that will be checked for each repeat alignment. You may increase this number to let hisat-3n
|
||||||
|
output more alignments when a read maps to multiple locations. We suggest that the repeat limit for paired-end read alignment be no more
|
||||||
|
than 1,000,000. Default: 1000.
|
||||||
|
|
||||||
|
* `--unique-only`
|
||||||
|
Only output uniquely aligned reads.
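As an illustration of the `--base-change` rule described above, the following Python sketch parses the argument and applies the conversion to a sequence in silico. The helper names `parse_base_change` and `convert` are hypothetical and not part of HISAT-3N. The sketch only shows what a fully converted read would look like; it makes no claim about how HISAT-3N works internally.

```python
def parse_base_change(arg):
    """Split a rule such as 'C,T' into (converted_from, converted_to)."""
    parts = arg.upper().split(",")
    if len(parts) != 2 or any(p not in ("A", "C", "G", "T") for p in parts):
        raise ValueError("expected two bases separated by ',', e.g. 'C,T'")
    if parts[0] == parts[1]:
        raise ValueError("the two bases must differ")
    return parts[0], parts[1]


def convert(seq, rule):
    """Replace every occurrence of the converted base with its target base."""
    src, dst = parse_base_change(rule)
    return seq.upper().replace(src, dst)


print(convert("ACGTTCGA", "C,T"))  # bisulfite-style C-to-T conversion
print(convert("ACGTTCGA", "T,C"))  # SLAM-seq-style T-to-C conversion
```

In real data only some of the bases are converted, so this is an upper bound on how different a read can look from the reference.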
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
* Single-end slam-seq reads (T to C conversion) alignment with standard 3N-index:
|
||||||
|
`hisat-3n --index genome -f -U read.fa -S alignment_result.sam --base-change T,C`
|
||||||
|
|
||||||
|
* Paired-end bisulfite-seq reads (C to T conversion) alignment with repeat 3N-index:
|
||||||
|
`hisat-3n --index genome -f -1 read_1.fa -2 read_2.fa -S alignment_result.sam --base-change C,T`
|
||||||
|
|
||||||
|
* Single-end TAPS reads (C to T conversion) alignment with repeat 3N-index, outputting only uniquely aligned results:
|
||||||
|
`hisat-3n --index genome -q -U read.fq -S alignment_result.sam --base-change C,T --unique-only`
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#### Extra SAM tags generated by HISAT-3N:
|
||||||
|
|
||||||
|
* `Yf:i:<N>`: Number of conversions detected in the read.
|
||||||
|
|
||||||
|
* `YZ:A:<A>`: The value `+` or `-` indicates whether the read is mapped to REF-3N (`+`) or REF-RC-3N (`-`).
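A downstream script may want to read these two tags. The sketch below is one possible way to do that in Python, assuming a standard tab-separated SAM record whose optional fields start at column 12. The function name `hisat3n_tags` and the example record are made up for illustration.

```python
def hisat3n_tags(sam_line):
    """Return the Yf and YZ optional fields of one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        if tag == "Yf":
            tags["conversions"] = int(value)
        elif tag == "YZ":
            tags["mapped_to"] = "REF-3N" if value == "+" else "REF-RC-3N"
    return tags


# Made-up alignment record, used purely for illustration.
example = "\t".join([
    "read1", "0", "1", "11874", "60", "8M", "*", "0", "0",
    "ATGTTTGA", "FFFFFFFF", "Yf:i:2", "YZ:A:+",
])
print(hisat3n_tags(example))
```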
|
||||||
|
|
||||||
|
Generate a 3N-conversion-table with `hisat-3n-table`
|
||||||
|
------------
|
||||||
|
### Preparation
|
||||||
|
|
||||||
|
To generate a 3N-conversion-table, users need to sort the SAM file generated by `hisat-3n`.
|
||||||
|
[SAMtools] is required for this sorting process.
|
||||||
|
|
||||||
|
Use `samtools sort` to convert the SAM file to a sorted SAM file.
|
||||||
|
|
||||||
|
samtools sort alignment_result.sam -o sorted_alignment_result.sam -O sam
|
||||||
|
|
||||||
|
Generate 3N-conversion-table with `hisat-3n-table`:
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
hisat-3n-table [options]* --alignments <alignmentFile> --ref <refFile> --output-name <outputFile> --base-change <char1,char2>
|
||||||
|
|
||||||
|
#### Main arguments
|
||||||
|
* `--alignments <alignmentFile>`
|
||||||
|
SORTED SAM file. Please enter `-` for standard input.
|
||||||
|
|
||||||
|
* `--ref <refFile>`
|
||||||
|
The reference genome file (FASTA format) that was used to generate the HISAT-3N index.
|
||||||
|
|
||||||
|
* `--output-name <outputFile>`
|
||||||
|
Filename to write 3N-conversion-table (tsv format) to.
|
||||||
|
|
||||||
|
* `--base-change <char1,char2>`
|
||||||
|
The base-change rule. Use exactly the same `--base-change` argument that was given to hisat-3n.
|
||||||
|
For example, please enter `--base-change C,T` for bisulfite sequencing reads.
|
||||||
|
|
||||||
|
#### Input options
|
||||||
|
* `-u/--unique-only`
|
||||||
|
Only count uniquely aligned reads in the 3N-conversion-table.
|
||||||
|
|
||||||
|
* `-m/--multiple-only`
|
||||||
|
Only count multiply aligned reads in the 3N-conversion-table.
|
||||||
|
|
||||||
|
* `-c/--CG-only`
|
||||||
|
Only count CpG sites in the reference genome. This option is designed for bisulfite sequencing reads.
|
||||||
|
|
||||||
|
* `-p/--threads <int>`
|
||||||
|
Launch `int` parallel threads (default: 1) for table building.
|
||||||
|
|
||||||
|
* `-h/--help`
|
||||||
|
Print usage information and quit.
|
||||||
|
|
||||||
|
|
||||||
|
#### Examples:
|
||||||
|
* Generate 3N conversion table for bisulfite sequencing data:
|
||||||
|
`hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T`
|
||||||
|
|
||||||
|
* Generate 3N-conversion-table for TAPS data, counting only bases at CpG sites in uniquely aligned reads:
|
||||||
|
`hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T --CG-only --unique-only`
|
||||||
|
|
||||||
|
* Generate 3N conversion table for bisulfite sequencing data from sorted BAM file:
|
||||||
|
`samtools view -h sorted_alignment_result.bam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T`
|
||||||
|
|
||||||
|
* Generate 3N conversion table for bisulfite sequencing data from unsorted BAM file:
|
||||||
|
`samtools sort alignment_result.bam -O sam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T`
|
||||||
|
|
||||||
|
|
||||||
|
#### Note:
|
||||||
|
There are 7 columns in the 3N-conversion-table:
|
||||||
|
|
||||||
|
1. `ref`: the chromosome name.
|
||||||
|
2. `pos`: 1-based position in ref.
|
||||||
|
3. `strand`: '+' for forward strand. '-' for reverse strand.
|
||||||
|
4. `convertedBaseQualities`: the base qualities of the converted bases in read-level measurements. The length of this string is equal to
|
||||||
|
the number of converted bases in read-level measurements.
|
||||||
|
5. `convertedBaseCount`: number of distinct read positions where converted bases were found in read-level measurements.
|
||||||
|
This number should equal the length of `convertedBaseQualities`.
|
||||||
|
6. `unconvertedBaseQualities`: the base qualities of the unconverted bases in read-level measurements. The length of this string is equal to
|
||||||
|
the number of unconverted bases in read-level measurements.
|
||||||
|
7. `unconvertedBaseCount`: number of distinct read positions where unconverted bases were found in read-level measurements.
|
||||||
|
This number should equal the length of `unconvertedBaseQualities`.
|
||||||
|
|
||||||
|
##### Sample 3N-conversion-table:
|
||||||
|
ref pos strand convertedBaseQualities convertedBaseCount unconvertedBaseQualities unconvertedBaseCount
|
||||||
|
1 11874 + FFFFFB<BF<F 11 0
|
||||||
|
1 11877 - FFFFFF< 7 0
|
||||||
|
1 11878 + FFFBB//F/BB 11 0
|
||||||
|
1 11879 + 0 FFFBB//FB/ 10
|
||||||
|
1 11880 - F 1 FFFF/ 5
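To show how the seven columns can be consumed, here is a small Python sketch that reads a 3N-conversion-table and derives a per-position conversion fraction. The fraction `converted / (converted + unconverted)` is an assumption about what a user might want to compute; it is not a column that `hisat-3n-table` writes. The rows are re-typed from the sample table with tab separators, and the function name `conversion_rates` is hypothetical.

```python
import csv
import io

# Three rows from the sample table, re-typed with explicit tab separators.
SAMPLE = (
    "ref\tpos\tstrand\tconvertedBaseQualities\tconvertedBaseCount\t"
    "unconvertedBaseQualities\tunconvertedBaseCount\n"
    "1\t11874\t+\tFFFFFB<BF<F\t11\t\t0\n"
    "1\t11879\t+\t\t0\tFFFBB//FB/\t10\n"
    "1\t11880\t-\tF\t1\tFFFF/\t5\n"
)


def conversion_rates(handle):
    """Yield (ref, pos, strand, fraction_converted) for each covered position."""
    # QUOTE_NONE: quality strings may contain '"', which must not be
    # interpreted as a CSV quote character.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        converted = int(row["convertedBaseCount"])
        unconverted = int(row["unconvertedBaseCount"])
        total = converted + unconverted
        if total == 0:
            continue  # no read-level measurement at this position
        yield row["ref"], int(row["pos"]), row["strand"], converted / total


for record in conversion_rates(io.StringIO(SAMPLE)):
    print(record)
```

For a real run, replace `io.StringIO(SAMPLE)` with an open file handle on the table written by `--output-name`.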
|
||||||
|
[SAMtools]: http://samtools.sourceforge.net
|
||||||
|
|
||||||
|
Publication
|
||||||
|
============
|
||||||
|
|
||||||
|
* HISAT-3N paper
|
||||||
|
Zhang, Y., Park, C., Bennett, C., Thornton, M., & Kim, D. (2021). [Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N](https://doi.org/10.1101/gr.275193.120). Genome research, gr.275193.120. Advance online publication.
|
||||||
|
|
||||||
|
* HISAT2 paper
|
||||||
|
Kim, D., Paggi, J.M., Park, C. _et al._ [Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype](https://doi.org/10.1038/s41587-019-0201-4). _Nat Biotechnol_ **37**, 907–915 (2019)
|
135
docs/_pages/hisat2.md
Normal file
@ -0,0 +1,135 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Main
|
||||||
|
permalink: /main/
|
||||||
|
order: 1
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
**HISAT2** is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. Based on an extension of BWT for graphs ([Sirén et al. 2014](http://dl.acm.org/citation.cfm?id=2674828)), we designed and implemented a graph FM index (GFM), an original approach and its first implementation. In addition to using one global GFM index that represents a population of human genomes, **HISAT2** uses a large set of small GFM indexes that collectively cover the whole genome. These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
|
||||||
|
|
||||||
|
|
||||||
|
### Index files are moved to the AWS Public Dataset Program. 9/3/2020
|
||||||
|
|
||||||
|
We have moved HISAT2 index files to the AWS Public Dataset Program. See the [link](https://registry.opendata.aws/jhu-indexes/) for more details.
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT 2.2.1 release 7/24/2020
|
||||||
|
|
||||||
|
This patch version includes the following changes.
|
||||||
|
* Python3 support
|
||||||
|
* Remove the HISAT-genotype related scripts. HISAT-genotype moved to [http://daehwankimlab.github.io/hisat-genotype/](http://daehwankimlab.github.io/hisat-genotype/)
|
||||||
|
* Fixed bugs related to `--read-lengths` option
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT 2.2.0 release 2/6/2020
|
||||||
|
|
||||||
|
This major version update includes a new feature to handle “repeat” reads. Based on sets of 100-bp simulated and 101-bp real reads that we tested, we found that 2.6-3.4% and 1.4-1.8% of the reads were mapped to >5 locations and >100 locations, respectively. Attempting to report all alignments would likely consume a prohibitive amount of disk space. In order to address this issue, our repeat indexing and alignment approach directly aligns reads to repeat sequences, resulting in one repeat alignment per read. HISAT2 provides application programming interfaces (API) for C++, Python, and JAVA that rapidly retrieve genomic locations from repeat alignments for use in downstream analyses.
|
||||||
|
Other minor bug fixes are also included as follows:
|
||||||
|
|
||||||
|
* Fixed occasional sign (+ or -) issues of template lengths in SAM file
|
||||||
|
* Fixed duplicate read alignments in SAM file
|
||||||
|
* Skip a splice site if exon's last base or first base is ambiguous (N)
|
||||||
|
|
||||||
|
|
||||||
|
### Index files are moved to a different location. 8/30/2019
|
||||||
|
|
||||||
|
Due to a high volume of index downloads, we have moved HISAT2 index files to a different location in order to provide faster download speed. If you use wget or curl to download index files, then you may need to use the following commands to get the correct file name.
|
||||||
|
* `wget --content-disposition` *download_link*
|
||||||
|
* `curl -OJ` *download_link*
|
||||||
|
|
||||||
|
|
||||||
|
### [The HISAT2 paper](https://www.nature.com/articles/s41587-019-0201-4) is out in *Nature Biotechnology*. 8/2/2019
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT 2.1.0 release 6/8/2017
|
||||||
|
|
||||||
|
* This major version includes the first release of HISAT-genotype, which currently performs HLA typing,
|
||||||
|
DNA fingerprinting analysis, and CYP typing on whole genome sequencing (WGS) reads.
|
||||||
|
We plan to extend the system so that it can analyze not just a few genes, but a whole human genome.
|
||||||
|
Please refer to [the HISAT-genotype website](https://daehwankimlab.github.io/hisat-genotype) for more details.
|
||||||
|
* HISAT2 can be directly compiled and executed on Windows system using Visual Studio, thanks to [Nigel Dyer](http://www2.warwick.ac.uk/fac/sci/systemsbiology/staff/dyer/).
|
||||||
|
* Implemented `--new-summary` option to output a new style of alignment summary, which is easier to parse for programming purposes.
|
||||||
|
* Implemented `--summary-file` option to output alignment summary to a file in addition to the terminal (e.g. stderr).
|
||||||
|
* Fixed discrepancy in HISAT2’s alignment summary.
|
||||||
|
* Implemented `--no-templatelen-adjustment` option to disable automatic template length adjustment for RNA-seq reads.
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.5 release 11/4/2016
|
||||||
|
Version 2.0.5 is a minor release with the following changes.
|
||||||
|
* Due to a policy change (HTTP to HTTPS) in using SRA data (`--sra-acc` option), users are strongly encouraged to use this version. As of 11/9/2016, NCBI will begin a permanent redirect to HTTPS, which means that previous versions of HISAT2 will soon no longer work with the `--sra-acc` option.
|
||||||
|
* Implemented `-I` and `-X` options for specifying minimum and maximum fragment lengths. The options are valid only when used with `--no-spliced-alignment`, which is used for the alignment of DNA-seq reads.
|
||||||
|
* Fixed some cases where reads with SNPs on their 5' ends were not properly aligned.
|
||||||
|
* Implemented `--no-softclip` option to disable soft-clipping.
|
||||||
|
* Implemented `--max-seeds` to specify the maximum number of seeds that HISAT2 will try to extend to full-length alignments (see [the manual] for details).
|
||||||
|
|
||||||
|
|
||||||
|
### [HISAT, StringTie and Ballgown protocol](http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html) published at Nature Protocols 8/11/2016
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.4 Windows binary available [here](http://www.di.fc.ul.pt/~afalcao/hisat2_windows.html), thanks to [Andre Osorio Falcao](http://www.di.fc.ul.pt/~afalcao/) 5/24/2016
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.4 release 5/18/2016
|
||||||
|
Version 2.0.4 is a minor release with the following changes.
|
||||||
|
* Improved template length estimation (the 9th column of the SAM format) of RNA-seq reads by taking introns into account.
|
||||||
|
* Introduced two options, `--remove-chrname` and `--add-chrname`, to remove "chr" from reference names or add "chr" to reference names in the alignment output, respectively (the 3rd column of the SAM format).
|
||||||
|
* Changed the maximum of mapping quality (the 5th column of the SAM format) from 255 to 60. Note that 255 is an undefined value according to the SAM manual and some programs would not work with this value (255) properly.
|
||||||
|
* Fixed NH (number of hits) in the alignment output.
|
||||||
|
* HISAT2 now allows indels of any length, subject to the minimum alignment score (previously, the maximum length of indels was 3 bp).
|
||||||
|
* Fixed several cases in which alignments went beyond reference sequences.
|
||||||
|
* Fixed reporting duplicate alignments.
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.3-beta release 3/28/2016
|
||||||
|
Version 2.0.3-beta is a minor release with the following changes.
|
||||||
|
* Fixed graph index building when using both SNPs and transcripts. As a result, genome_snp_tran indexes here on the HISAT2 website have been rebuilt.
|
||||||
|
* Included some missing files needed to follow the small test example (see [the manual] for details).
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.2-beta release 3/17/2016
|
||||||
|
**Note (3/19/2016):** this version is slightly updated to handle reporting splice sites with the correct chromosome names.
|
||||||
|
Version 2.0.2-beta is a major release with the following changes.
|
||||||
|
* Memory-mapped IO (`--mm` option) works now.
|
||||||
|
* Building a linear index can now be done using multiple threads.
|
||||||
|
* Changed the minimum score for alignment in keeping with read lengths, so it's now `--score-min L,0.0,-0.2`, meaning a minimum score of -20 for 100-bp reads and -30 for 150-bp reads.
|
||||||
|
* Fixed a bug in which the same read was written into a file multiple times when `--un-conc` was used.
|
||||||
|
* Fixed another bug that caused reads to map beyond reference sequences.
|
||||||
|
* Introduced the `--haplotype` option in hisat2-build (index building), which is used together with the `--snp` option to incorporate those SNP combinations present in the human population. This option also prevents graph construction from exploding due to exponential combinations of SNPs in small genomic regions.
|
||||||
|
* Provided a new python script to extract SNPs and haplotypes from VCF files, <i>hisat2_extract_snps_haplotypes_VCF.py</i>
|
||||||
|
* Changed several Python script names as follows:
|
||||||
|
* *extract_splice_sites.py* to *hisat2_extract_splice_sites.py*
|
||||||
|
* *extract_exons.py* to *hisat2_extract_exons.py*
|
||||||
|
* *extract_snps.py* to *hisat2_extract_snps_haplotypes_UCSC.py*
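The `--score-min L,0.0,-0.2` setting mentioned in the list above describes a linear function of read length. A minimal Python sketch of that arithmetic, assuming the form `constant + coefficient * read_length` (which reproduces the -20 and -30 figures quoted above), could look like this. The function name `min_score` is hypothetical.

```python
def min_score(read_length, constant=0.0, coefficient=-0.2):
    """Linear (L) minimum-score function: constant + coefficient * read_length."""
    return constant + coefficient * read_length


# Minimum alignment scores for the two read lengths named in the release notes.
for length in (100, 150):
    print(length, min_score(length))
```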
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.1-beta release 11/19/2015
|
||||||
|
Version 2.0.1-beta is a maintenance release with the following changes.
|
||||||
|
* Fixed a bug that caused reads to map beyond reference sequences.
|
||||||
|
* Fixed a deadlock issue that happened very rarely.
|
||||||
|
* Fixed a bug that led to illegal memory access when reading SNP information.
|
||||||
|
* Fixed a system-specific bug related to popcount instruction.
|
||||||
|
|
||||||
|
|
||||||
|
### HISAT2 2.0.0-beta release 9/8/2015 - first release
|
||||||
|
We extended the BWT/FM index to incorporate genomic differences among individuals into the reference genome, while keeping memory requirements low enough to fit the entire index onto a desktop computer. Using this novel Hierarchical Graph FM index (HGFM) approach, we built a new alignment system, HISAT2, with an index that incorporates ~12.3M common SNPs from the dbSNP database. HISAT2 provides greater alignment accuracy for reads containing SNPs.
|
||||||
|
* HISAT2's index size for the human reference genome and 12.3 million common SNPs is 6.2GB (the memory footprint of HISAT2 is 6.7GB). The SNPs consist of 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. The insertions and deletions used in this index are small (usually <20bp).
|
||||||
|
* HISAT2 comes with several index types:
|
||||||
|
* Hierarchical FM index (HFM) for a reference genome (index base: <i>genome</i>)
|
||||||
|
* Hierarchical Graph FM index (HGFM) for a reference genome plus SNPs (index base: <i>genome_snp</i>)
|
||||||
|
* Hierarchical Graph FM index (HGFM) for a reference genome plus transcripts (index base: <i>genome_tran</i>)
|
||||||
|
* Hierarchical Graph FM index (HGFM) for a reference genome plus SNPs and transcripts (index base: <i>genome_snp_tran</i>)
|
||||||
|
* HISAT2 is a successor to both [HISAT](http://ccb.jhu.edu/software/hisat) and [TopHat2](http://ccb.jhu.edu/software/tophat). We recommend that HISAT and TopHat2 users switch to HISAT2.
|
||||||
|
* HISAT2 can be considered an enhanced version of HISAT with many improvements and bug fixes. The alignment speed and memory requirements of HISAT2 are virtually the same as those of HISAT when using the HFM index (<i>genome</i>).
|
||||||
|
* When using graph-based indexes (HGFM), HISAT2 is slightly slower than HISAT (30~80% additional CPU time).
|
||||||
|
* HISAT2 allows for mapping reads directly against transcripts, similar to that of TopHat2 (use <i>genome_tran</i> or <i>genome_snp_tran</i>).
|
||||||
|
* When reads contain SNPs, the SNP information is provided as an optional field in the SAM output of HISAT2 (e.g., **<code>Zs:Z:1|S|rs3747203,97|S|rs16990981</code>** - see [the manual] for details). This feature enables fast and sensitive genotyping in downstream analyses. Note that there is no alignment penalty for mismatches, insertions, and deletions if they correspond to known SNPs.
|
||||||
|
* HISAT2 provides options for transcript assemblers (e.g., StringTie and Cufflinks) to work better with the alignment from HISAT2 (see options such as `--dta` and `--dta-cufflinks`).
|
||||||
|
* Some slides about HISAT2 are available [here]({{ '/assets/data/HISAT2-first_release-Sept_8_2015.pdf' | prepend: site.baseurl }}), and we are preparing detailed documentation.
|
||||||
|
* We plan to incorporate a larger set of SNPs and structural variations (SV) into this index (e.g., long insertions/deletions, inversions, and translocations).
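
The `Zs:Z` optional field mentioned above packs several SNP entries into one string. As a minimal sketch, assuming each entry is a comma-separated `offset|type|id` triple as in the example value, the field can be split with standard tools. The helper name `parse_zs` is hypothetical and not part of HISAT2; see the manual for the exact meaning of each sub-field.

```shell
# Hypothetical helper: print one "offset type id" line per entry in a Zs:Z value.
parse_zs() {
  # Drop the "Zs:Z:" prefix, put each comma-separated entry on its own line,
  # then split each entry on "|".
  printf '%s\n' "${1#Zs:Z:}" | tr ',' '\n' | awk -F'|' '{print $1, $2, $3}'
}

# Example value taken from the text above.
parse_zs 'Zs:Z:1|S|rs3747203,97|S|rs16990981'
```

Each output line then carries one SNP, which is convenient for counting or joining against a SNP table in downstream scripts.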
|
||||||
|
|
||||||
|
[the manual]: {{ site.baseurl }}{% link _pages/manual.md %}
|
||||||
|
|
||||||
|
### The HISAT2 source code is available in a [public GitHub repository](https://github.com/DaehwanKimLab/hisat2) (5/30/2015).
|
||||||
|
|
||||||
|
|
78
docs/_pages/howto.md
Normal file
@ -0,0 +1,78 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: HowTo
|
||||||
|
permalink: /howto/
|
||||||
|
order: 6
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
## HOWTO
|
||||||
|
{: .no_toc}
|
||||||
|
|
||||||
|
- TOC
|
||||||
|
{:toc}
|
||||||
|
|
||||||
|
### Building indexes
|
||||||
|
Depending on your purpose, you need to download the reference sequence, gene annotation, and SNP files.
|
||||||
|
We also provide scripts to build indexes. [Download]({{ site.baseurl }}{% link _pages/download.md %})
|
||||||
|
|
||||||
|
#### Prepare data
|
||||||
|
1. Download reference
|
||||||
|
```
|
||||||
|
$ wget ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
|
||||||
|
$ gzip -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
|
||||||
|
$ mv Homo_sapiens.GRCh38.dna.primary_assembly.fa genome.fa
|
||||||
|
```
|
||||||
|
|
||||||
|
1. Download the GTF file and create the exon and splice-site files.
|
||||||
|
If you only want to build an HFM index, you can skip this step.
|
||||||
|
```
|
||||||
|
$ wget ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz
|
||||||
|
$ gzip -d Homo_sapiens.GRCh38.84.gtf.gz
|
||||||
|
$ mv Homo_sapiens.GRCh38.84.gtf genome.gtf
|
||||||
|
$ hisat2_extract_splice_sites.py genome.gtf > genome.ss
|
||||||
|
$ hisat2_extract_exons.py genome.gtf > genome.exon
|
||||||
|
```
|
||||||
|
|
||||||
|
1. Download SNP
|
||||||
|
If you only want to build an HFM index, you can skip this step.
|
||||||
|
```
|
||||||
|
$ wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp144Common.txt.gz
|
||||||
|
$ gzip -d snp144Common.txt.gz
|
||||||
|
```
|
||||||
|
|
||||||
|
Convert the UCSC chromosome names to the Ensembl naming convention
|
||||||
|
```
|
||||||
|
$ awk 'BEGIN{OFS="\t"} {if($2 ~ /^chr/) {$2 = substr($2, 4)}; if($2 == "M") {$2 = "MT"} print}' snp144Common.txt > snp144Common.txt.ensembl
|
||||||
|
```
|
||||||
|
|
||||||
|
Make the SNP and haplotype files
|
||||||
|
```
|
||||||
|
$ hisat2_extract_snps_haplotypes_UCSC.py genome.fa snp144Common.txt.ensembl genome
|
||||||
|
```
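
To see what the chromosome-name conversion above does before running it on the full SNP table, it can be tried on a few rows. This is a minimal sketch: the wrapper name `ucsc_to_ensembl` is hypothetical, and the three input rows are made up for illustration (they are not real dbSNP records and contain only the first three columns).

```shell
# Hypothetical helper wrapping the awk command from the conversion step.
ucsc_to_ensembl() {
  awk 'BEGIN{OFS="\t"} {if($2 ~ /^chr/) {$2 = substr($2, 4)}; if($2 == "M") {$2 = "MT"} print}' "$@"
}

# Made-up rows: bin, chromosome, start position.
printf '585\tchr1\t10019\n585\tchrM\t72\n585\tchrX\t100\n' | ucsc_to_ensembl
# The second column becomes 1, MT and X respectively.
```

The `chr` prefix is stripped from the second column, and the mitochondrial chromosome `M` is renamed to `MT` so that the names match the Ensembl `genome.fa`.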
|
||||||
|
|
||||||
|
#### Build HFM index
|
||||||
|
Building the index takes about 20 minutes (depending on hardware) and requires at least 6GB of memory.
|
||||||
|
```
|
||||||
|
$ hisat2-build -p 16 genome.fa genome
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build HGFM index with SNPs
|
||||||
|
```
|
||||||
|
$ hisat2-build -p 16 --snp genome.snp --haplotype genome.haplotype genome.fa genome_snp
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build HGFM index with transcripts
|
||||||
|
Building the index takes about 1 hour (depending on hardware) and requires at least 160GB of memory.
|
||||||
|
```
|
||||||
|
$ hisat2-build -p 16 --exon genome.exon --ss genome.ss genome.fa genome_tran
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Build HGFM index with SNPs and transcripts
|
||||||
|
|
||||||
|
```
|
||||||
|
$ hisat2-build -p 16 --snp genome.snp --haplotype genome.haplotype --exon genome.exon --ss genome.ss genome.fa genome_snp_tran
|
||||||
|
```
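
The four build commands above differ only in which optional inputs are passed. As a sketch, the choice can be wrapped in a small function that prints the command for a given index base name. The function name `hisat2_build_cmd` is hypothetical; it uses only the options and file names shown in the commands above and assumes those files sit in the current directory.

```shell
# Hypothetical helper: print the hisat2-build command for one index type.
# Only options that appear in the commands above are used.
hisat2_build_cmd() {
  index_type="$1"      # genome, genome_snp, genome_tran or genome_snp_tran
  threads="${2:-16}"   # value for -p; defaults to 16 as in the examples
  opts=""
  case "$index_type" in
    genome) ;;
    genome_snp) opts="--snp genome.snp --haplotype genome.haplotype" ;;
    genome_tran) opts="--exon genome.exon --ss genome.ss" ;;
    genome_snp_tran) opts="--snp genome.snp --haplotype genome.haplotype --exon genome.exon --ss genome.ss" ;;
    *) echo "unknown index type: $index_type" >&2; return 1 ;;
  esac
  # ${opts:+ $opts} avoids a double space when there are no extra options.
  echo "hisat2-build -p ${threads}${opts:+ $opts} genome.fa ${index_type}"
}

# Print the command for the transcript index; run it once it looks right.
hisat2_build_cmd genome_tran
```

Printing the command instead of running it makes it easy to check the inputs first, which matters for the graph indexes given their memory requirements.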
|
||||||
|
|
||||||
|
|
||||||
|
|
17
docs/_pages/links.md
Normal file
@ -0,0 +1,17 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Links
|
||||||
|
permalink: /links/
|
||||||
|
order: 7
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
* KimLab - <https://kim-lab.org>
|
||||||
|
* github - <https://github.com/DaehwanKimLab>
|
||||||
|
* hisat-genotype - <https://daehwankimlab.github.io/hisat-genotype>
|
||||||
|
* github for hisat-genotype - <https://github.com/DaehwanKimLab/hisat-genotype>
|
||||||
|
|
||||||
|
* Lyda Hill Department of Bioinformatics at UT Southwestern Medical Center - <https://www.utsouthwestern.edu/departments/bioinformatics>
|
||||||
|
|
||||||
|
* Center for Computational Biology at Johns Hopkins University - <http://www.ccb.jhu.edu>
|
||||||
|
|
1545
docs/_pages/manual.md
Normal file
File diff suppressed because it is too large
26
docs/_pages/search.html
Normal file
@ -0,0 +1,26 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Search Results
|
||||||
|
permalink: /search/
|
||||||
|
hide: true
|
||||||
|
share: false
|
||||||
|
---
|
||||||
|
|
||||||
|
<script>
|
||||||
|
var baseurl = "{{ site.baseurl }}";
|
||||||
|
</script>
|
||||||
|
|
||||||
|
<div id="search-results">
|
||||||
|
<hr id="first-hr" class="with-no-margin"/>
|
||||||
|
|
||||||
|
{% for post in site.posts %}
|
||||||
|
<div id="{{ post.id | replace: '/', '-' }}" style="display: none;">
|
||||||
|
<div class="article-wrapper">
|
||||||
|
<article>
|
||||||
|
{% include article-header.html page=post link=true share=false eye_catch=false %}
|
||||||
|
</article>
|
||||||
|
</div>
|
||||||
|
<hr class="with-no-margin"/>
|
||||||
|
</div>
|
||||||
|
{% endfor %}
|
||||||
|
</div>
|
14
docs/_pages/tags.html
Normal file
@ -0,0 +1,14 @@
|
|||||||
|
---
|
||||||
|
layout: page
|
||||||
|
title: Tags
|
||||||
|
permalink: /tags/
|
||||||
|
order: 2
|
||||||
|
share: false
|
||||||
|
hide: true
|
||||||
|
---
|
||||||
|
|
||||||
|
<ul class="inline">
|
||||||
|
{% for tag in site.tags %}
|
||||||
|
<li><a href="{{ '/search/?t=' | prepend: site.baseurl }}{{ tag[0] }}">#{{ tag[0] }}</a></li>
|
||||||
|
{% endfor %}
|
||||||
|
</ul>
|
13
docs/_posts/2000-01-01-kim.md
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
---
|
||||||
|
layout: post
|
||||||
|
title: Daehwan Kim
|
||||||
|
tags: daehwankim
|
||||||
|
eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
|
||||||
|
---
|
||||||
|
|
||||||
|
Daehwan Kim is an Assistant Professor at UT Southwestern and was the original designer who laid much of the groundwork for HISAT-genotype.
|
||||||
|
|
||||||
|
[Webpage](https://kim-lab.org/daehwan-kim-principal-investigator/)
|
||||||
|
|
||||||
|
|
||||||
|
|
11
docs/_posts/2000-01-02-salzberg.md
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
---
|
||||||
|
layout: post
|
||||||
|
title: Steven Salzberg
|
||||||
|
tags: stevensalzberg
|
||||||
|
eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
|
||||||
|
---
|
||||||
|
|
||||||
|
Steven Salzberg is the Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.
|
||||||
|
|
||||||
|
[Webpage](https://salzberg-lab.org/in-the-news/about-me/)
|
||||||
|
|
13
docs/_posts/2000-01-03-langmead.md
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
---
|
||||||
|
layout: post
|
||||||
|
title: Ben Langmead
|
||||||
|
tags: benlangmead
|
||||||
|
eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
|
||||||
|
---
|
||||||
|
|
||||||
|
Ben Langmead is an Associate Professor of Computer Science at Johns Hopkins University.
|
||||||
|
|
||||||
|
[Webpage](http://www.langmead-lab.org/)
|
||||||
|
|
||||||
|
|
||||||
|
|
10
docs/_posts/2019-07-28-park.md
Normal file
@ -0,0 +1,10 @@
|
|||||||
|
---
|
||||||
|
layout: post
|
||||||
|
title: Chanhee Park
|
||||||
|
tags: chanheepark
|
||||||
|
eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
|
||||||
|
---
|
||||||
|
|
||||||
|
Chanhee Park is a Scientific Software Engineer in the Kim Lab at UTSW responsible for maintaining and improving HISAT2, the core of HISAT-genotype.
|
||||||
|
|
||||||
|
[LinkedIn](https://www.linkedin.com/in/chanhee-park-97677297/)
|
Some files were not shown because too many files have changed in this diff