initial commit

2025-01-18 21:09:52 +08:00
commit 5e601d0401
428 changed files with 206785 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1 @@
 *.pbxproj binary merge=union
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,48 @@
 *~
 *.dSYM
 .DS_Store
 tags
 *-debug
 *-s
 *-l
 hisat2.xcodeproj/project.xcworkspace
 hisat2.xcodeproj/xcuserdata
 hisat2.xcodeproj/xcshareddata
 *.patch
 build_automaton
 build_index
 clean_alignment
 determinize
 gcsa_alignment
 gcsa_test
 hisat2-repeat
 hisat2_test/*.bt2
 hisat2_test/*.ht2
 hisat2_test/*.sam
 hisat2_test/paper_example.malignment.automaton
 hisat2_test/paper_example.malignment.backbone
 hisat2_test/paper_example.malignment.gcsa
 hisat2_test/kim_example*.malignment.automaton
 hisat2_test/kim_example*.malignment.backbone
 hisat2_test/kim_example*.malignment.gcsa
 hisat2_test/genome*
 hisat2_test/2*
 hisat2_test/snp142*
 hisat2_test/testset*
 .idea
 .vscode
 .ht2lib-obj*
 *.a
 *.so
 docs/_site
 docs/*.lock
 docs/.*-cache
 *.tar.gz
 *.ipynb
 *.pyc
 cmake*
--- a/29
+++ b/29
@@ -0,0 +1,29 @@
 Ben Langmead <langmea@cs.jhu.edu> wrote Bowtie 2, which is based partially on
 Bowtie.  Bowtie was written by Ben Langmead and Cole Trapnell.
  Bowtie & Bowtie 2:  http://bowtie-bio.sf.net
 A DLL from the pthreads for Win32 library is distributed with the Win32 version
 of Bowtie 2.  The pthreads for Win32 library and the GnuWin32 package have many
 contributors (see their respective web sites).
  pthreads for Win32: http://sourceware.org/pthreads-win32
  GnuWin32:           http://gnuwin32.sf.net
 The ForkManager.pm perl module is used in Bowtie 2's random testing framework,
 and is included as scripts/sim/contrib/ForkManager.pm.  ForkManager.pm is
 written by dLux (Szabo, Balazs), with contributions by others.  See the perldoc
 in ForkManager.pm for the complete list.
 The file ls.h includes an implementation of the Larsson-Sadakane suffix sorting
 algorithm.  The implementation is by N. Jesper Larsson and was adapted somewhat
 for use in Bowtie 2.
 TinyThreads is a portable thread implementation with a fairly compatible subset 
 of C++11 thread management classes written by Marcus Geelnard. For more info
 check http://tinythreadpp.bitsnbites.eu/ 
 Various users have kindly supplied patches, bug reports and feature requests
 over the years.  Many, many thanks go to them.
 September 2011
--- a/HISAT2-genotype.png
+++ b/HISAT2-genotype.png
--- a/1
+++ b/1
@@ -0,0 +1 @@
 2.2.1-3n-0.0.3
--- a/674
+++ b/674
@@ -0,0 +1,674 @@
                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007
 Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
                            Preamble
  The GNU General Public License is a free, copyleft license for
 software and other kinds of works.
  The licenses for most software and other practical works are designed
 to take away your freedom to share and change the works.  By contrast,
 the GNU General Public License is intended to guarantee your freedom to
 share and change all versions of a program--to make sure it remains free
 software for all its users.  We, the Free Software Foundation, use the
 GNU General Public License for most of our software; it applies also to
 any other work released this way by its authors.  You can apply it to
 your programs, too.
  When we speak of free software, we are referring to freedom, not
 price.  Our General Public Licenses are designed to make sure that you
 have the freedom to distribute copies of free software (and charge for
 them if you wish), that you receive source code or can get it if you
 want it, that you can change the software or use pieces of it in new
 free programs, and that you know you can do these things.
  To protect your rights, we need to prevent others from denying you
 these rights or asking you to surrender the rights.  Therefore, you have
 certain responsibilities if you distribute copies of the software, or if
 you modify it: responsibilities to respect the freedom of others.
  For example, if you distribute copies of such a program, whether
 gratis or for a fee, you must pass on to the recipients the same
 freedoms that you received.  You must make sure that they, too, receive
 or can get the source code.  And you must show them these terms so they
 know their rights.
  Developers that use the GNU GPL protect your rights with two steps:
 (1) assert copyright on the software, and (2) offer you this License
 giving you legal permission to copy, distribute and/or modify it.
  For the developers' and authors' protection, the GPL clearly explains
 that there is no warranty for this free software.  For both users' and
 authors' sake, the GPL requires that modified versions be marked as
 changed, so that their problems will not be attributed erroneously to
 authors of previous versions.
  Some devices are designed to deny users access to install or run
 modified versions of the software inside them, although the manufacturer
 can do so.  This is fundamentally incompatible with the aim of
 protecting users' freedom to change the software.  The systematic
 pattern of such abuse occurs in the area of products for individuals to
 use, which is precisely where it is most unacceptable.  Therefore, we
 have designed this version of the GPL to prohibit the practice for those
 products.  If such problems arise substantially in other domains, we
 stand ready to extend this provision to those domains in future versions
 of the GPL, as needed to protect the freedom of users.
  Finally, every program is threatened constantly by software patents.
 States should not allow patents to restrict development and use of
 software on general-purpose computers, but in those that do, we wish to
 avoid the special danger that patents applied to a free program could
 make it effectively proprietary.  To prevent this, the GPL assures that
 patents cannot be used to render the program non-free.
  The precise terms and conditions for copying, distribution and
 modification follow.
                       TERMS AND CONDITIONS
  0. Definitions.
  "This License" refers to version 3 of the GNU General Public License.
  "Copyright" also means copyright-like laws that apply to other kinds of
 works, such as semiconductor masks.
  "The Program" refers to any copyrightable work licensed under this
 License.  Each licensee is addressed as "you".  "Licensees" and
 "recipients" may be individuals or organizations.
  To "modify" a work means to copy from or adapt all or part of the work
 in a fashion requiring copyright permission, other than the making of an
 exact copy.  The resulting work is called a "modified version" of the
 earlier work or a work "based on" the earlier work.
  A "covered work" means either the unmodified Program or a work based
 on the Program.
  To "propagate" a work means to do anything with it that, without
 permission, would make you directly or secondarily liable for
 infringement under applicable copyright law, except executing it on a
 computer or modifying a private copy.  Propagation includes copying,
 distribution (with or without modification), making available to the
 public, and in some countries other activities as well.
  To "convey" a work means any kind of propagation that enables other
 parties to make or receive copies.  Mere interaction with a user through
 a computer network, with no transfer of a copy, is not conveying.
  An interactive user interface displays "Appropriate Legal Notices"
 to the extent that it includes a convenient and prominently visible
 feature that (1) displays an appropriate copyright notice, and (2)
 tells the user that there is no warranty for the work (except to the
 extent that warranties are provided), that licensees may convey the
 work under this License, and how to view a copy of this License.  If
 the interface presents a list of user commands or options, such as a
 menu, a prominent item in the list meets this criterion.
  1. Source Code.
  The "source code" for a work means the preferred form of the work
 for making modifications to it.  "Object code" means any non-source
 form of a work.
  A "Standard Interface" means an interface that either is an official
 standard defined by a recognized standards body, or, in the case of
 interfaces specified for a particular programming language, one that
 is widely used among developers working in that language.
  The "System Libraries" of an executable work include anything, other
 than the work as a whole, that (a) is included in the normal form of
 packaging a Major Component, but which is not part of that Major
 Component, and (b) serves only to enable use of the work with that
 Major Component, or to implement a Standard Interface for which an
 implementation is available to the public in source code form.  A
 "Major Component", in this context, means a major essential component
 (kernel, window system, and so on) of the specific operating system
 (if any) on which the executable work runs, or a compiler used to
 produce the work, or an object code interpreter used to run it.
  The "Corresponding Source" for a work in object code form means all
 the source code needed to generate, install, and (for an executable
 work) run the object code and to modify the work, including scripts to
 control those activities.  However, it does not include the work's
 System Libraries, or general-purpose tools or generally available free
 programs which are used unmodified in performing those activities but
 which are not part of the work.  For example, Corresponding Source
 includes interface definition files associated with source files for
 the work, and the source code for shared libraries and dynamically
 linked subprograms that the work is specifically designed to require,
 such as by intimate data communication or control flow between those
 subprograms and other parts of the work.
  The Corresponding Source need not include anything that users
 can regenerate automatically from other parts of the Corresponding
 Source.
  The Corresponding Source for a work in source code form is that
 same work.
  2. Basic Permissions.
  All rights granted under this License are granted for the term of
 copyright on the Program, and are irrevocable provided the stated
 conditions are met.  This License explicitly affirms your unlimited
 permission to run the unmodified Program.  The output from running a
 covered work is covered by this License only if the output, given its
 content, constitutes a covered work.  This License acknowledges your
 rights of fair use or other equivalent, as provided by copyright law.
  You may make, run and propagate covered works that you do not
 convey, without conditions so long as your license otherwise remains
 in force.  You may convey covered works to others for the sole purpose
 of having them make modifications exclusively for you, or provide you
 with facilities for running those works, provided that you comply with
 the terms of this License in conveying all material for which you do
 not control copyright.  Those thus making or running the covered works
 for you must do so exclusively on your behalf, under your direction
 and control, on terms that prohibit them from making any copies of
 your copyrighted material outside their relationship with you.
  Conveying under any other circumstances is permitted solely under
 the conditions stated below.  Sublicensing is not allowed; section 10
 makes it unnecessary.
  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
  No covered work shall be deemed part of an effective technological
 measure under any applicable law fulfilling obligations under article
 11 of the WIPO copyright treaty adopted on 20 December 1996, or
 similar laws prohibiting or restricting circumvention of such
 measures.
  When you convey a covered work, you waive any legal power to forbid
 circumvention of technological measures to the extent such circumvention
 is effected by exercising rights under this License with respect to
 the covered work, and you disclaim any intention to limit operation or
 modification of the work as a means of enforcing, against the work's
 users, your or third parties' legal rights to forbid circumvention of
 technological measures.
  4. Conveying Verbatim Copies.
  You may convey verbatim copies of the Program's source code as you
 receive it, in any medium, provided that you conspicuously and
 appropriately publish on each copy an appropriate copyright notice;
 keep intact all notices stating that this License and any
 non-permissive terms added in accord with section 7 apply to the code;
 keep intact all notices of the absence of any warranty; and give all
 recipients a copy of this License along with the Program.
  You may charge any price or no price for each copy that you convey,
 and you may offer support or warranty protection for a fee.
  5. Conveying Modified Source Versions.
  You may convey a work based on the Program, or the modifications to
 produce it from the Program, in the form of source code under the
 terms of section 4, provided that you also meet all of these conditions:
    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.
    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".
    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.
    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.
  A compilation of a covered work with other separate and independent
 works, which are not by their nature extensions of the covered work,
 and which are not combined with it such as to form a larger program,
 in or on a volume of a storage or distribution medium, is called an
 "aggregate" if the compilation and its resulting copyright are not
 used to limit the access or legal rights of the compilation's users
 beyond what the individual works permit.  Inclusion of a covered work
 in an aggregate does not cause this License to apply to the other
 parts of the aggregate.
  6. Conveying Non-Source Forms.
  You may convey a covered work in object code form under the terms
 of sections 4 and 5, provided that you also convey the
 machine-readable Corresponding Source under the terms of this License,
 in one of these ways:
    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.
    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.
    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.
    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.
    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.
  A separable portion of the object code, whose source code is excluded
 from the Corresponding Source as a System Library, need not be
 included in conveying the object code work.
  A "User Product" is either (1) a "consumer product", which means any
 tangible personal property which is normally used for personal, family,
 or household purposes, or (2) anything designed or sold for incorporation
 into a dwelling.  In determining whether a product is a consumer product,
 doubtful cases shall be resolved in favor of coverage.  For a particular
 product received by a particular user, "normally used" refers to a
 typical or common use of that class of product, regardless of the status
 of the particular user or of the way in which the particular user
 actually uses, or expects or is expected to use, the product.  A product
 is a consumer product regardless of whether the product has substantial
 commercial, industrial or non-consumer uses, unless such uses represent
 the only significant mode of use of the product.
  "Installation Information" for a User Product means any methods,
 procedures, authorization keys, or other information required to install
 and execute modified versions of a covered work in that User Product from
 a modified version of its Corresponding Source.  The information must
 suffice to ensure that the continued functioning of the modified object
 code is in no case prevented or interfered with solely because
 modification has been made.
  If you convey an object code work under this section in, or with, or
 specifically for use in, a User Product, and the conveying occurs as
 part of a transaction in which the right of possession and use of the
 User Product is transferred to the recipient in perpetuity or for a
 fixed term (regardless of how the transaction is characterized), the
 Corresponding Source conveyed under this section must be accompanied
 by the Installation Information.  But this requirement does not apply
 if neither you nor any third party retains the ability to install
 modified object code on the User Product (for example, the work has
 been installed in ROM).
  The requirement to provide Installation Information does not include a
 requirement to continue to provide support service, warranty, or updates
 for a work that has been modified or installed by the recipient, or for
 the User Product in which it has been modified or installed.  Access to a
 network may be denied when the modification itself materially and
 adversely affects the operation of the network or violates the rules and
 protocols for communication across the network.
  Corresponding Source conveyed, and Installation Information provided,
 in accord with this section must be in a format that is publicly
 documented (and with an implementation available to the public in
 source code form), and must require no special password or key for
 unpacking, reading or copying.
  7. Additional Terms.
  "Additional permissions" are terms that supplement the terms of this
 License by making exceptions from one or more of its conditions.
 Additional permissions that are applicable to the entire Program shall
 be treated as though they were included in this License, to the extent
 that they are valid under applicable law.  If additional permissions
 apply only to part of the Program, that part may be used separately
 under those permissions, but the entire Program remains governed by
 this License without regard to the additional permissions.
  When you convey a copy of a covered work, you may at your option
 remove any additional permissions from that copy, or from any part of
 it.  (Additional permissions may be written to require their own
 removal in certain cases when you modify the work.)  You may place
 additional permissions on material, added by you to a covered work,
 for which you have or can give appropriate copyright permission.
  Notwithstanding any other provision of this License, for material you
 add to a covered work, you may (if authorized by the copyright holders of
 that material) supplement the terms of this License with terms:
    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or
    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or
    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or
    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or
    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or
    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.
  All other non-permissive additional terms are considered "further
 restrictions" within the meaning of section 10.  If the Program as you
 received it, or any part of it, contains a notice stating that it is
 governed by this License along with a term that is a further
 restriction, you may remove that term.  If a license document contains
 a further restriction but permits relicensing or conveying under this
 License, you may add to a covered work material governed by the terms
 of that license document, provided that the further restriction does
 not survive such relicensing or conveying.
  If you add terms to a covered work in accord with this section, you
 must place, in the relevant source files, a statement of the
 additional terms that apply to those files, or a notice indicating
 where to find the applicable terms.
  Additional terms, permissive or non-permissive, may be stated in the
 form of a separately written license, or stated as exceptions;
 the above requirements apply either way.
  8. Termination.
  You may not propagate or modify a covered work except as expressly
 provided under this License.  Any attempt otherwise to propagate or
 modify it is void, and will automatically terminate your rights under
 this License (including any patent licenses granted under the third
 paragraph of section 11).
  However, if you cease all violation of this License, then your
 license from a particular copyright holder is reinstated (a)
 provisionally, unless and until the copyright holder explicitly and
 finally terminates your license, and (b) permanently, if the copyright
 holder fails to notify you of the violation by some reasonable means
 prior to 60 days after the cessation.
  Moreover, your license from a particular copyright holder is
 reinstated permanently if the copyright holder notifies you of the
 violation by some reasonable means, this is the first time you have
 received notice of violation of this License (for any work) from that
 copyright holder, and you cure the violation prior to 30 days after
 your receipt of the notice.
  Termination of your rights under this section does not terminate the
 licenses of parties who have received copies or rights from you under
 this License.  If your rights have been terminated and not permanently
 reinstated, you do not qualify to receive new licenses for the same
 material under section 10.
  9. Acceptance Not Required for Having Copies.
  You are not required to accept this License in order to receive or
 run a copy of the Program.  Ancillary propagation of a covered work
 occurring solely as a consequence of using peer-to-peer transmission
 to receive a copy likewise does not require acceptance.  However,
 nothing other than this License grants you permission to propagate or
 modify any covered work.  These actions infringe copyright if you do
 not accept this License.  Therefore, by modifying or propagating a
 covered work, you indicate your acceptance of this License to do so.
  10. Automatic Licensing of Downstream Recipients.
  Each time you convey a covered work, the recipient automatically
 receives a license from the original licensors, to run, modify and
 propagate that work, subject to this License.  You are not responsible
 for enforcing compliance by third parties with this License.
  An "entity transaction" is a transaction transferring control of an
 organization, or substantially all assets of one, or subdividing an
 organization, or merging organizations.  If propagation of a covered
 work results from an entity transaction, each party to that
 transaction who receives a copy of the work also receives whatever
 licenses to the work the party's predecessor in interest had or could
 give under the previous paragraph, plus a right to possession of the
 Corresponding Source of the work from the predecessor in interest, if
 the predecessor has it or can get it with reasonable efforts.
  You may not impose any further restrictions on the exercise of the
 rights granted or affirmed under this License.  For example, you may
 not impose a license fee, royalty, or other charge for exercise of
 rights granted under this License, and you may not initiate litigation
 (including a cross-claim or counterclaim in a lawsuit) alleging that
 any patent claim is infringed by making, using, selling, offering for
 sale, or importing the Program or any portion of it.
  11. Patents.
  A "contributor" is a copyright holder who authorizes use under this
 License of the Program or a work on which the Program is based.  The
 work thus licensed is called the contributor's "contributor version".
  A contributor's "essential patent claims" are all patent claims
 owned or controlled by the contributor, whether already acquired or
 hereafter acquired, that would be infringed by some manner, permitted
 by this License, of making, using, or selling its contributor version,
 but do not include claims that would be infringed only as a
 consequence of further modification of the contributor version.  For
 purposes of this definition, "control" includes the right to grant
 patent sublicenses in a manner consistent with the requirements of
 this License.
  Each contributor grants you a non-exclusive, worldwide, royalty-free
 patent license under the contributor's essential patent claims, to
 make, use, sell, offer for sale, import and otherwise run, modify and
 propagate the contents of its contributor version.
  In the following three paragraphs, a "patent license" is any express
 agreement or commitment, however denominated, not to enforce a patent
 (such as an express permission to practice a patent or covenant not to
 sue for patent infringement).  To "grant" such a patent license to a
 party means to make such an agreement or commitment not to enforce a
 patent against the party.
  If you convey a covered work, knowingly relying on a patent license,
 and the Corresponding Source of the work is not available for anyone
 to copy, free of charge and under the terms of this License, through a
 publicly available network server or other readily accessible means,
 then you must either (1) cause the Corresponding Source to be so
 available, or (2) arrange to deprive yourself of the benefit of the
 patent license for this particular work, or (3) arrange, in a manner
 consistent with the requirements of this License, to extend the patent
 license to downstream recipients.  "Knowingly relying" means you have
 actual knowledge that, but for the patent license, your conveying the
 covered work in a country, or your recipient's use of the covered work
 in a country, would infringe one or more identifiable patents in that
 country that you have reason to believe are valid.
  If, pursuant to or in connection with a single transaction or
 arrangement, you convey, or propagate by procuring conveyance of, a
 covered work, and grant a patent license to some of the parties
 receiving the covered work authorizing them to use, propagate, modify
 or convey a specific copy of the covered work, then the patent license
 you grant is automatically extended to all recipients of the covered
 work and works based on it.
  A patent license is "discriminatory" if it does not include within
 the scope of its coverage, prohibits the exercise of, or is
 conditioned on the non-exercise of one or more of the rights that are
 specifically granted under this License.  You may not convey a covered
 work if you are a party to an arrangement with a third party that is
 in the business of distributing software, under which you make payment
 to the third party based on the extent of your activity of conveying
 the work, and under which the third party grants, to any of the
 parties who would receive the covered work from you, a discriminatory
 patent license (a) in connection with copies of the covered work
 conveyed by you (or copies made from those copies), or (b) primarily
 for and in connection with specific products or compilations that
 contain the covered work, unless you entered into that arrangement,
 or that patent license was granted, prior to 28 March 2007.
  Nothing in this License shall be construed as excluding or limiting
 any implied license or other defenses to infringement that may
 otherwise be available to you under applicable patent law.
  12. No Surrender of Others' Freedom.
  If conditions are imposed on you (whether by court order, agreement or
 otherwise) that contradict the conditions of this License, they do not
 excuse you from the conditions of this License.  If you cannot convey a
 covered work so as to satisfy simultaneously your obligations under this
 License and any other pertinent obligations, then as a consequence you may
 not convey it at all.  For example, if you agree to terms that obligate you
 to collect a royalty for further conveying from those to whom you convey
 the Program, the only way you could satisfy both those terms and this
 License would be to refrain entirely from conveying the Program.
  13. Use with the GNU Affero General Public License.
  Notwithstanding any other provision of this License, you have
 permission to link or combine any covered work with a work licensed
 under version 3 of the GNU Affero General Public License into a single
 combined work, and to convey the resulting work.  The terms of this
 License will continue to apply to the part which is the covered work,
 but the special requirements of the GNU Affero General Public License,
 section 13, concerning interaction through a network will apply to the
 combination as such.
  14. Revised Versions of this License.
  The Free Software Foundation may publish revised and/or new versions of
 the GNU General Public License from time to time.  Such new versions will
 be similar in spirit to the present version, but may differ in detail to
 address new problems or concerns.
  Each version is given a distinguishing version number.  If the
 Program specifies that a certain numbered version of the GNU General
 Public License "or any later version" applies to it, you have the
 option of following the terms and conditions either of that numbered
 version or of any later version published by the Free Software
 Foundation.  If the Program does not specify a version number of the
 GNU General Public License, you may choose any version ever published
 by the Free Software Foundation.
  If the Program specifies that a proxy can decide which future
 versions of the GNU General Public License can be used, that proxy's
 public statement of acceptance of a version permanently authorizes you
 to choose that version for the Program.
  Later license versions may give you additional or different
 permissions.  However, no additional obligations are imposed on any
 author or copyright holder as a result of your choosing to follow a
 later version.
  15. Disclaimer of Warranty.
  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
 APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
 HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
 OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
 THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
 IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
 ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  16. Limitation of Liability.
  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
 WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
 THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
 GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
 USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
 DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
 PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
 EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGES.
  17. Interpretation of Sections 15 and 16.
  If the disclaimer of warranty and limitation of liability provided
 above cannot be given local legal effect according to their terms,
 reviewing courts shall apply local law that most closely approximates
 an absolute waiver of all civil liability in connection with the
 Program, unless a warranty or assumption of liability accompanies a
 copy of the Program in return for a fee.
                     END OF TERMS AND CONDITIONS
            How to Apply These Terms to Your New Programs
  If you develop a new program, and you want it to be of the greatest
 possible use to the public, the best way to achieve this is to make it
 free software which everyone can redistribute and change under these terms.
  To do so, attach the following notices to the program.  It is safest
 to attach them to the start of each source file to most effectively
 state the exclusion of warranty; and each file should have at least
 the "copyright" line and a pointer to where the full notice is found.
    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 Also add information on how to contact you by electronic and paper mail.
  If the program does terminal interaction, make it output a short
 notice like this when it starts in an interactive mode:
    <program>  Copyright (C) <year>  <name of author>
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.
 The hypothetical commands `show w' and `show c' should show the appropriate
 parts of the General Public License.  Of course, your program's commands
 might be different; for a GUI interface, you would use an "about box".
  You should also get your employer (if you work as a programmer) or school,
 if any, to sign a "copyright disclaimer" for the program, if necessary.
 For more information on this, and how to apply and follow the GNU GPL, see
 <http://www.gnu.org/licenses/>.
  The GNU General Public License does not permit incorporating your program
 into proprietary programs.  If your program is a subroutine library, you
 may consider it more useful to permit linking proprietary applications with
 the library.  If this is what you want to do, use the GNU Lesser General
 Public License instead of this License.  But first, please read
 <http://www.gnu.org/philosophy/why-not-lgpl.html>.
--- a/1467
+++ b/1467
--- a/MANUAL.markdown
+++ b/MANUAL.markdown
--- a/590
+++ b/590
@ -0,0 +1,590 @@
 #
 # Copyright 2015, Daehwan Kim <infphilo@gmail.com>
 #
 # This file is part of HISAT2.
 #
 # HISAT 2 is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
 #
 # HISAT 2 is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with HISAT.  If not, see <http://www.gnu.org/licenses/>.
 #
 #
 # Makefile for hisat2-align, hisat2-build, hisat2-inspect
 #
 INC =
 GCC_PREFIX = $(shell dirname `which gcc`)
 GCC_SUFFIX =
 CC = $(GCC_PREFIX)/gcc$(GCC_SUFFIX)
 CPP = $(GCC_PREFIX)/g++$(GCC_SUFFIX)
 CXX = $(CPP)
 HEADERS = $(wildcard *.h)
 BOWTIE_MM = 1
 BOWTIE_SHARED_MEM = 0
 # Detect Cygwin or MinGW
 WINDOWS = 0
 CYGWIN = 0
 MINGW = 0
 ifneq (,$(findstring CYGWIN,$(shell uname)))
 	WINDOWS = 1
 	CYGWIN = 1
 	# POSIX memory-mapped files not currently supported on Windows
 	BOWTIE_MM = 0
 	BOWTIE_SHARED_MEM = 0
 else
 	ifneq (,$(findstring MINGW,$(shell uname)))
 		WINDOWS = 1
 		MINGW = 1
 		# POSIX memory-mapped files not currently supported on Windows
 		BOWTIE_MM = 0
 		BOWTIE_SHARED_MEM = 0
 	endif
 endif
 MACOS = 0
 ifneq (,$(findstring Darwin,$(shell uname)))
 	MACOS = 1
 endif
 EXTRA_FLAGS += -DPOPCNT_CAPABILITY -std=c++11
 INC += -I. -I third_party 
 MM_DEF = 
 ifeq (1,$(BOWTIE_MM))
 	MM_DEF = -DBOWTIE_MM
 endif
 SHMEM_DEF = 
 ifeq (1,$(BOWTIE_SHARED_MEM))
 	SHMEM_DEF = -DBOWTIE_SHARED_MEM
 endif
 PTHREAD_PKG =
 PTHREAD_LIB = 
 ifeq (1,$(MINGW))
 	PTHREAD_LIB = 
 else
 	PTHREAD_LIB = -lpthread
 endif
 SEARCH_LIBS = 
 BUILD_LIBS = 
 INSPECT_LIBS =
 ifeq (1,$(MINGW))
 	BUILD_LIBS = 
 	INSPECT_LIBS = 
 endif
 USE_SRA = 0
 SRA_DEF =
 SRA_LIB =
 SEARCH_INC = 
 ifeq (1,$(USE_SRA))
 	SRA_DEF = -DUSE_SRA
 	SRA_LIB = -lncbi-ngs-c++-static -lngs-c++-static -lncbi-vdb-static -ldl
 	SEARCH_INC += -I$(NCBI_NGS_DIR)/include -I$(NCBI_VDB_DIR)/include
 	SEARCH_LIBS += -L$(NCBI_NGS_DIR)/lib64 -L$(NCBI_VDB_DIR)/lib64
 endif
 LIBS = $(PTHREAD_LIB)
 HT2LIB_DIR = hisat2lib
 HT2LIB_CPPS = $(HT2LIB_DIR)/ht2_init.cpp \
 			  $(HT2LIB_DIR)/ht2_repeat.cpp \
 			  $(HT2LIB_DIR)/ht2_index.cpp
 SHARED_CPPS = ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp \
 	edit.cpp gfm.cpp \
 	reference.cpp ds.cpp multikey_qsort.cpp limit.cpp \
 	random_source.cpp tinythread.cpp utility_3n.cpp
 SEARCH_CPPS = qual.cpp pat.cpp \
 	read_qseq.cpp aligner_seed_policy.cpp \
 	aligner_seed.cpp \
 	aligner_seed2.cpp \
 	aligner_sw.cpp \
 	aligner_sw_driver.cpp aligner_cache.cpp \
 	aligner_result.cpp ref_coord.cpp mask.cpp \
 	pe.cpp aln_sink.cpp dp_framer.cpp \
 	scoring.cpp presets.cpp unique.cpp \
 	simple_func.cpp \
 	random_util.cpp \
 	aligner_bt.cpp sse_util.cpp \
 	aligner_swsse.cpp outq.cpp \
 	aligner_swsse_loc_i16.cpp \
 	aligner_swsse_ee_i16.cpp \
 	aligner_swsse_loc_u8.cpp \
 	aligner_swsse_ee_u8.cpp \
 	aligner_driver.cpp \
 	splice_site.cpp \
 	alignment_3n.cpp \
 	position_3n.cpp \
 	$(HT2LIB_CPPS)
 BUILD_CPPS = diff_sample.cpp
 REPEAT_CPPS = \
 	mask.cpp \
 	qual.cpp \
 	aligner_bt.cpp \
 	scoring.cpp \
 	simple_func.cpp \
 	dp_framer.cpp \
 	aligner_result.cpp \
 	aligner_sw_driver.cpp \
 	aligner_sw.cpp \
 	aligner_swsse_ee_i16.cpp \
 	aligner_swsse_ee_u8.cpp \
 	aligner_swsse_loc_i16.cpp \
 	aligner_swsse_loc_u8.cpp \
 	aligner_swsse.cpp \
 	bit_packed_array.cpp \
 	repeat_builder.cpp
 THREE_N_HEADERS = \
 	position_3n_table.h \
 	alignment_3n_table.h \
 	utility_3n_table.h
 HISAT2_CPPS_MAIN = $(SEARCH_CPPS) hisat2_main.cpp
 HISAT2_BUILD_CPPS_MAIN = $(BUILD_CPPS) hisat2_build_main.cpp
 HISAT2_REPEAT_CPPS_MAIN = $(REPEAT_CPPS) $(BUILD_CPPS) hisat2_repeat_main.cpp
 SEARCH_FRAGMENTS = $(wildcard search_*_phase*.c)
 VERSION := $(shell cat HISAT2_VERSION)
 # Convert BITS=?? to a -m flag
 BITS=32
 ifeq (x86_64,$(shell uname -m))
 BITS=64
 endif
 # msys will always be 32 bit so look at the cpu arch instead.
 ifneq (,$(findstring AMD64,$(PROCESSOR_ARCHITEW6432)))
 	ifeq (1,$(MINGW))
 		BITS=64
 	endif
 endif
 BITS_FLAG =
 ifeq (32,$(BITS))
 	BITS_FLAG = -m32
 endif
 ifeq (64,$(BITS))
 	BITS_FLAG = -m64
 endif
 SSE_FLAG=-msse2
 DEBUG_FLAGS    = -O0 -g3 $(BITS_FLAG) $(SSE_FLAG)
 DEBUG_DEFS     = -DCOMPILER_OPTIONS="\"$(DEBUG_FLAGS) $(EXTRA_FLAGS)\""
 RELEASE_FLAGS  = -O3 $(BITS_FLAG) $(SSE_FLAG) -funroll-loops -g3
 RELEASE_DEFS   = -DCOMPILER_OPTIONS="\"$(RELEASE_FLAGS) $(EXTRA_FLAGS)\""
 NOASSERT_FLAGS = -DNDEBUG
 FILE_FLAGS     = -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 HT2LIB_FLAGS   = -DHISAT2_BUILD_LIB
 ifeq (1,$(USE_SRA))
 	ifeq (1, $(MACOS))
 		SRA_LIB += -stdlib=libc++
 		DEBUG_FLAGS += -mmacosx-version-min=10.10
 		RELEASE_FLAGS += -mmacosx-version-min=10.10
 	endif
 endif
 HISAT2_BIN_LIST = hisat2-build-s \
 	hisat2-build-l \
 	hisat2-align-s \
 	hisat2-align-l \
 	hisat2-inspect-s \
 	hisat2-inspect-l \
 	hisat2-repeat \
 	hisat-3n-table
 HISAT2_BIN_LIST_AUX = hisat2-build-s-debug \
 	hisat2-build-l-debug \
 	hisat2-align-s-debug \
 	hisat2-align-l-debug \
 	hisat2-inspect-s-debug \
 	hisat2-inspect-l-debug \
 	hisat2-repeat-debug
 HT2LIB_SRCS = $(SHARED_CPPS) \
              $(HT2LIB_CPPS)
 HT2LIB_OBJS = $(HT2LIB_SRCS:.cpp=.o)
 HT2LIB_DEBUG_OBJS = $(addprefix .ht2lib-obj-debug/,$(HT2LIB_OBJS))
 HT2LIB_RELEASE_OBJS = $(addprefix .ht2lib-obj-release/,$(HT2LIB_OBJS))
 HT2LIB_SHARED_DEBUG_OBJS = $(addprefix .ht2lib-obj-debug-shared/,$(HT2LIB_OBJS))
 HT2LIB_SHARED_RELEASE_OBJS = $(addprefix .ht2lib-obj-release-shared/,$(HT2LIB_OBJS))
 HT2LIB_PKG_SRC = \
 	$(HT2LIB_DIR)/ht2_init.cpp \
 	$(HT2LIB_DIR)/ht2_repeat.cpp \
 	$(HT2LIB_DIR)/ht2_index.cpp \
 	$(HT2LIB_DIR)/ht2.h \
 	$(HT2LIB_DIR)/ht2_handle.h \
 	$(HT2LIB_DIR)/java_jni/Makefile \
 	$(HT2LIB_DIR)/java_jni/ht2module.c \
 	$(HT2LIB_DIR)/java_jni/HT2Module.java \
 	$(HT2LIB_DIR)/java_jni/HT2ModuleExample.java \
 	$(HT2LIB_DIR)/pymodule/Makefile \
 	$(HT2LIB_DIR)/pymodule/ht2module.c \
 	$(HT2LIB_DIR)/pymodule/setup.py \
 	$(HT2LIB_DIR)/pymodule/ht2example.py
 GENERAL_LIST = $(wildcard scripts/*.sh) \
 	$(wildcard scripts/*.pl) \
 	$(wildcard *.py) \
 	$(wildcard example/index/*.ht2) \
 	$(wildcard example/reads/*.fa) \
 	example/reference/22_20-21M.fa \
 	example/reference/22_20-21M.snp \
 	$(PTHREAD_PKG) \
 	hisat2 \
 	hisat2-build \
 	hisat2-inspect \
 	AUTHORS \
 	LICENSE \
 	NEWS \
 	MANUAL \
 	MANUAL.markdown \
 	TUTORIAL \
 	HISAT2_VERSION
 ifeq (1,$(WINDOWS))
 	HISAT2_BIN_LIST := $(HISAT2_BIN_LIST) hisat2.bat hisat2-build.bat hisat2-inspect.bat 
 endif
 # This is helpful on Windows under MinGW/MSYS, where Make might go for
 # the Windows FIND tool instead.
 FIND=$(shell which find)
 SRC_PKG_LIST = $(wildcard *.h) \
 	$(wildcard *.hh) \
 	$(wildcard *.c) \
 	$(wildcard *.cpp) \
 	$(HT2LIB_PKG_SRC) \
 	Makefile \
 	CMakeLists.txt \
 	$(GENERAL_LIST)
 BIN_PKG_LIST = $(GENERAL_LIST)
 .PHONY: all allall both both-debug repeat repeat-debug
 all: $(HISAT2_BIN_LIST)
 allall: $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)
 both: hisat2-align-s hisat2-align-l hisat2-build-s hisat2-build-l
 both-debug: hisat2-align-s-debug hisat2-align-l-debug hisat2-build-s-debug hisat2-build-l-debug
 repeat: hisat2-repeat
 repeat-debug: hisat2-repeat-debug
 DEFS :=-fno-strict-aliasing \
     -DHISAT2_VERSION="\"`cat HISAT2_VERSION`\"" \
     -DBUILD_HOST="\"`hostname`\"" \
     -DBUILD_TIME="\"`date`\"" \
     -DCOMPILER_VERSION="\"`$(CXX) -v 2>&1 | tail -1`\"" \
     $(FILE_FLAGS) \
     $(PREF_DEF) \
     $(MM_DEF) \
     $(SHMEM_DEF)
 #
 # hisat-bp targets
 #
 hisat-bp-bin: hisat_bp.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT_CPPS_MAIN) \
 	$(LIBS) $(SEARCH_LIBS)
 hisat-bp-bin-debug: hisat_bp.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(DEBUG_FLAGS) \
 	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT_CPPS_MAIN) \
 	$(LIBS) $(SEARCH_LIBS)
 #
 # hisat2-repeat targets
 #
 hisat2-repeat: hisat2_repeat.cpp $(REPEAT_CPPS) $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_REPEAT_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 hisat2-repeat-debug: hisat2_repeat.cpp $(REPEAT_CPPS) $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_REPEAT_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 #
 # hisat2-build targets
 #
 hisat2-build-s: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall -DMASSIVE_DATA_RLCSA \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 hisat2-build-l: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 hisat2-build-s-debug: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -Wall -DMASSIVE_DATA_RLCSA \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 hisat2-build-l-debug: hisat2_build.cpp $(SHARED_CPPS) $(HEADERS)
 	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
 	$(INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_BUILD_CPPS_MAIN) \
 	$(LIBS) $(BUILD_LIBS)
 #
 # hisat2 targets
 #
 hisat2-align-s: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) $(SRA_DEF) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall \
 	$(INC) $(SEARCH_INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
 	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 hisat2-align-l: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) $(SRA_DEF) -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
 	$(INC) $(SEARCH_INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
 	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 hisat2-align-s-debug: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(DEBUG_FLAGS) \
 	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) $(SRA_DEF) -DBOWTIE2 -Wall \
 	$(INC) $(SEARCH_INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
 	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 hisat2-align-l-debug: hisat2.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
 	$(CXX) $(DEBUG_FLAGS) \
 	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) $(SRA_DEF) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
 	$(INC) $(SEARCH_INC) \
 	-o $@ $< \
 	$(SHARED_CPPS) $(HISAT2_CPPS_MAIN) \
 	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 #
 # hisat2-inspect targets
 #
 hisat2-inspect-s: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
 	$(CXX) $(RELEASE_FLAGS) \
 	$(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DHISAT2_INSPECT_MAIN -Wall \
 	$(INC) -I . \
 	-o $@ $< \
 	$(SHARED_CPPS) \
 	$(LIBS) $(INSPECT_LIBS)
 hisat2-inspect-l: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS)
 	$(CXX) $(RELEASE_FLAGS) \
 	$(RELEASE_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DHISAT2_INSPECT_MAIN -Wall \
 	$(INC) -I . \
 	-o $@ $< \
 	$(SHARED_CPPS) \
 	$(LIBS) $(INSPECT_LIBS)
 hisat2-inspect-s-debug: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS) 
 	$(CXX) $(DEBUG_FLAGS) \
 	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DHISAT2_INSPECT_MAIN -Wall \
 	$(INC) -I . \
 	-o $@ $< \
 	$(SHARED_CPPS) \
 	$(LIBS) $(INSPECT_LIBS)
 hisat2-inspect-l-debug: hisat2_inspect.cpp $(HEADERS) $(SHARED_CPPS) 
 	$(CXX) $(DEBUG_FLAGS) \
 	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
 	$(DEFS) -DBOWTIE2 -DBOWTIE_64BIT_INDEX -DHISAT2_INSPECT_MAIN -Wall \
 	$(INC) -I . \
 	-o $@ $< \
 	$(SHARED_CPPS) \
 	$(LIBS) $(INSPECT_LIBS)
 #
 # hisat-3n-table targets
 #
 hisat-3n-table: hisat_3n_table.cpp $(THREE_N_HEADERS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(NOASSERT_FLAGS) $(DEFS) -pthread -o $@ $<
 #
 # HT2LIB targets
 #
 ht2lib: libhisat2lib-debug.a libhisat2lib.a libhisat2lib-debug.so libhisat2lib.so
 libhisat2lib-debug.a: $(HT2LIB_DEBUG_OBJS)
 	ar rc $@ $(HT2LIB_DEBUG_OBJS) 
 libhisat2lib.a: $(HT2LIB_RELEASE_OBJS)
 	ar rc $@ $(HT2LIB_RELEASE_OBJS) 
 libhisat2lib-debug.so: $(HT2LIB_SHARED_DEBUG_OBJS)
 	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
 	-shared -o $@  $(HT2LIB_SHARED_DEBUG_OBJS) $(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 libhisat2lib.so: $(HT2LIB_SHARED_RELEASE_OBJS)
 	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall  $(INC) $(SEARCH_INC)\
 	-shared -o $@ $(HT2LIB_SHARED_RELEASE_OBJS) $(LIBS) $(SRA_LIB) $(SEARCH_LIBS)
 .ht2lib-obj-debug/%.o: %.cpp
 	@mkdir -p $(dir $@)/$(dir $<)
 	$(CXX) -fPIC $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
 	-c -o $@ $< 
 .ht2lib-obj-release/%.o: %.cpp
 	@mkdir -p $(dir $@)/$(dir $<)
 	$(CXX) -fPIC $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall $(INC) $(SEARCH_INC) \
 	-c -o $@ $< 
 .ht2lib-obj-debug-shared/%.o: %.cpp
 	@mkdir -p $(dir $@)/$(dir $<)
 	$(CXX) -fPIC $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 -Wall $(INC) $(SEARCH_INC) \
 	-c -o $@ $< 
 .ht2lib-obj-release-shared/%.o: %.cpp
 	@mkdir -p $(dir $@)/$(dir $<)
 	$(CXX) -fPIC $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) $(DEFS) $(SRA_DEF) $(HT2LIB_FLAGS) -DBOWTIE2 $(NOASSERT_FLAGS) -Wall $(INC) $(SEARCH_INC) \
 	-c -o $@ $< 
 #
 # repeatexp
 #
 repeatexp:
 	g++ -o repeatexp repeatexp.cpp -I hisat2lib libhisat2lib.a
 hisat2: ;
 hisat2.bat:
 	echo "@echo off" > hisat2.bat
 	echo "perl %~dp0/hisat2 %*" >> hisat2.bat
 hisat2-build.bat:
 	echo "@echo off" > hisat2-build.bat
 	echo "python %~dp0/hisat2-build %*" >> hisat2-build.bat
 hisat2-inspect.bat:
 	echo "@echo off" > hisat2-inspect.bat
 	echo "python %~dp0/hisat2-inspect %*" >> hisat2-inspect.bat
 .PHONY: hisat2-src
 hisat2-src: $(SRC_PKG_LIST)
 	chmod a+x scripts/*.sh scripts/*.pl
 	mkdir .src.tmp
 	mkdir .src.tmp/hisat2-$(VERSION)
 	zip tmp.zip $(SRC_PKG_LIST)
 	mv tmp.zip .src.tmp/hisat2-$(VERSION)
 	cd .src.tmp/hisat2-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
 	cd .src.tmp ; zip -r hisat2-$(VERSION)-source.zip hisat2-$(VERSION)
 	cp .src.tmp/hisat2-$(VERSION)-source.zip .
 	rm -rf .src.tmp
 .PHONY: hisat2-bin
 hisat2-bin: $(BIN_PKG_LIST) $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)
 	chmod a+x scripts/*.sh scripts/*.pl
 	rm -rf .bin.tmp
 	mkdir .bin.tmp
 	mkdir .bin.tmp/hisat2-$(VERSION)
 	if [ -f hisat2.exe ] ; then \
 		zip tmp.zip $(BIN_PKG_LIST) $(addsuffix .exe,$(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)) ; \
 	else \
 		zip tmp.zip $(BIN_PKG_LIST) $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX) ; \
 	fi
 	mv tmp.zip .bin.tmp/hisat2-$(VERSION)
 	cd .bin.tmp/hisat2-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
 	cd .bin.tmp ; zip -r hisat2-$(VERSION)-$(BITS).zip hisat2-$(VERSION)
 	cp .bin.tmp/hisat2-$(VERSION)-$(BITS).zip .
 	rm -rf .bin.tmp
 .PHONY: doc
 doc: doc/manual.inc.html MANUAL
 doc/manual.inc.html: MANUAL.markdown
 	pandoc -T "HISAT2 Manual" -o $@ \
 	 --from markdown --to HTML --toc $^
 	perl -i -ne \
 	 '$$w=0 if m|^</body>|;print if $$w;$$w=1 if m|^<body>|;' $@
 MANUAL: MANUAL.markdown
 	perl doc/strip_markdown.pl < $^ > $@
 .PHONY: clean
 clean:
 	rm -f $(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX) \
 	$(addsuffix .exe,$(HISAT2_BIN_LIST) $(HISAT2_BIN_LIST_AUX)) \
 	hisat2-src.zip hisat2-bin.zip
 	rm -f core.* .tmp.head
 	rm -rf *.dSYM
 	rm -rf .ht2lib-obj*
 	rm -f libhisat2lib*.a libhisat2lib*.so
 .PHONY: push-doc
 push-doc: doc/manual.inc.html
 	scp doc/*.*html doc/indexes.txt salz-dmz:/ccb/salz7-data/www/ccb.jhu.edu/html/software/hisat2/
--- a/16
+++ b/16
@ -0,0 +1,16 @@
 HISAT 2 NEWS
 =============
 HISAT 2 is now available for download from the project website,
 http://bowtie-bio.sf.net/bowtie2.  2.0.0-beta is the first version released to
 the public and 2.0.7 is the latest version.  HISAT 2 is licensed under
 the GPLv3 license.  See `LICENSE' file for details.
 Version Release History
 =======================
 Version 2.0.0-beta - August XX, 2015
   * Improved multithreading support so that Bowtie 2 now uses native Windows
     threads when compiled on Windows and uses a faster mutex.  Threading
     performance should improve on all platforms.
--- a/README.md
+++ b/README.md
@ -0,0 +1,247 @@
 HISAT-3N
 ============
 Overview
 -----------------
 HISAT-3N (hierarchical indexing for spliced alignment of transcripts - 3 nucleotides)
 is an ultrafast and memory-efficient sequence aligner designed for nucleotide conversion
 sequencing technologies. A HISAT-3N index contains two HISAT2 indexes and has a small memory footprint:
 for the human genome, it requires 9 GB for the standard 3N-index and 10.5 GB for the repeat 3N-index.
 The repeat 3N-index can be used to align one read to thousands of positions 3 times faster than the standard 3N-index.
 HISAT-3N is developed based on [HISAT2],
 which is particularly optimized for RNA sequencing technology. HISAT-3N supports both strand-specific and non-strand-specific reads.
 HISAT-3N can be used for any base-converted sequencing reads, including [BS-seq], [SLAM-seq], [scBS-seq], [scSLAM-seq], and [TAPS].
 See the [HISAT-3N] website for more information.
 [HISAT2]:https://github.com/DaehwanKimLab/hisat2
 [BS-seq]: https://en.wikipedia.org/wiki/Bisulfite_sequencing
 [SLAM-seq]: https://www.nature.com/articles/nmeth.4435
 [scBS-seq]: https://www.nature.com/articles/nmeth.3035
 [scSLAM-seq]: https://www.nature.com/articles/s41586-019-1369-y
 [TAPS]: https://www.nature.com/articles/s41587-019-0041-2
 [HISAT-3N]:https://daehwankimlab.github.io/hisat2/hisat-3n
 Getting started
 ============
 HISAT-3N requires a 64-bit computer running either Linux or Mac OS X and at least 16 GB of RAM.
 A few notes:
 1. Building the standard 3N index requires 16GB of RAM or less.
 2. Building the repeat 3N index requires 256GB of RAM.
 3. The alignment process using either the standard or repeat index requires less than 16GB of RAM.
 4. [SAMtools] is required to sort SAM files in order to generate a HISAT-3N table.
 Install
 ------------
    git clone https://github.com/DaehwanKimLab/hisat2.git hisat-3n
    cd hisat-3n
    git checkout -b hisat-3n origin/hisat-3n
    make
 Build a HISAT-3N index with `hisat-3n-build`
 -----------
 `hisat-3n-build` builds a 3N-index, which contains two HISAT2 indexes, from a set of DNA sequences. A standard 3N-index
 consists of 16 files with the suffix `.3n.*.*.ht2`.
 A repeat 3N-index has 16 more files in addition to those of the standard 3N-index, with the suffix
 `.3n.*.rep.*.ht2`.
 These files constitute the hisat-3n index; no other files are needed to align reads to the reference.
 * The `--base-change <chr1,chr2>` argument is required for `hisat-3n-build` and `hisat-3n`.   
  It specifies which base is converted to which other base during sequencing. Enter
  2 letters separated by ','. The first letter (chr1) is the original base that gets converted, and the second letter (chr2) is
  the base it is converted to. For example, in SLAM-seq some 'T's are converted to 'C's,
  so enter `--base-change T,C`. In bisulfite-seq some 'C's are converted to 'T's, so enter `--base-change C,T`.
 * Different conversion types may produce the same hisat-3n index; see the table below for details.
  For example, once you have built a hisat-3n index with C to T conversion (e.g., for BS-seq),
  you can align T to C conversion reads (e.g., SLAM-seq reads) with the same index.
 | Conversion Types                   | HISAT-3N index suffix         |
  |:----------------------------------:|:-----------------------------:|
 |C -> T<br>T -> C<br>A -> G<br>G -> A|.3n.CT.\*.ht2 <br>.3n.GA.\*.ht2|
 |A -> C<br>C -> A<br>G -> T<br>T -> G|.3n.AC.\*.ht2 <br>.3n.TG.\*.ht2|
 |A -> T<br>T -> A                    |.3n.AT.\*.ht2 <br>.3n.TA.\*.ht2|
 |C -> G<br>G -> C                    |.3n.CG.\*.ht2 <br>.3n.GC.\*.ht2|
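
The table can also be read as a lookup from a `--base-change` value to the pair of index tags it uses. The following Python sketch only transcribes the table rows above; the helper names (`parse_base_change`, `index_tags`, `index_suffix_patterns`) are hypothetical and are not part of HISAT-3N.

```python
VALID_BASES = "ACGT"

# Rows of the table above: each group of conversions shares one pair of index tags.
_TABLE = [
    ({"CT", "TC", "AG", "GA"}, ("CT", "GA")),
    ({"AC", "CA", "GT", "TG"}, ("AC", "TG")),
    ({"AT", "TA"}, ("AT", "TA")),
    ({"CG", "GC"}, ("CG", "GC")),
]


def parse_base_change(arg):
    """Split a --base-change value such as 'C,T' into (from_base, to_base)."""
    parts = arg.upper().split(",")
    if len(parts) != 2 or any(len(p) != 1 or p not in VALID_BASES for p in parts):
        raise ValueError(f"expected two bases separated by ',', got {arg!r}")
    if parts[0] == parts[1]:
        raise ValueError("the two bases must differ")
    return parts[0], parts[1]


def index_tags(arg):
    """Return the pair of index tags listed in the table for this conversion."""
    src, dst = parse_base_change(arg)
    for conversions, tags in _TABLE:
        if src + dst in conversions:
            return tags
    raise ValueError(f"no table row for {arg!r}")  # not reached for valid input


def index_suffix_patterns(arg):
    """Return the index suffix patterns from the table, e.g. '.3n.CT.*.ht2'."""
    return [f".3n.{tag}.*.ht2" for tag in index_tags(arg)]
```

For instance, `index_tags("T,C")` and `index_tags("C,T")` return the same pair, which is why a SLAM-seq run can reuse an index built for BS-seq.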
 #### Examples:
    # Build the standard HISAT-3N index (with C to T conversion):  
    hisat-3n-build --base-change C,T genome.fa genome
    # Build the repeat HISAT-3N index (with T to C conversion; requires 256 GB of memory for the human genome index):  
    hisat-3n-build --base-change T,C --repeat-index genome.fa genome
 Optionally, you can build a graph index and add SNP or splice site information to the index to increase alignment accuracy.
 Building the graph index may require more memory than building the linear index.
 For more detail, please check the [HISAT2 manual].
 [HISAT2 manual]:https://daehwankimlab.github.io/hisat2/manual/
 #### Examples:
    # Build the standard HISAT-3N integrated index with SNP information
    hisat-3n-build --base-change C,T --snp genome.snp genome.fa genome 
    # Build the standard HISAT-3N integrated index with splice site information
    hisat-3n-build --base-change C,T --ss genome.ss --exon genome.exon genome.fa genome 
    # Build the repeat HISAT-3N integrated index with SNP information
    hisat-3n-build --base-change C,T --repeat-index --snp genome.snp genome.fa genome 
    # Build the repeat HISAT-3N integrated index with splice site information
    hisat-3n-build --base-change C,T --repeat-index --ss genome.ss --exon genome.exon genome.fa genome 
 Alignment with `hisat-3n`
 ------------
 After building the HISAT-3N index, you are ready to use `hisat-3n` for alignment.
 HISAT-3N accepts the same parameters as HISAT2, plus some additional arguments. Please refer to the [HISAT2 manual] for more details.
 For the human reference genome, HISAT-3N requires about 9GB for alignment with the standard 3N-index and 10.5GB for the repeat 3N-index.
 * `--base-change <nt1,nt2>`  
  Specify the nucleotide conversion type (e.g., C to T in bisulfite-sequencing reads). The parameter option is two characters separated by ','.  Type the original nucleotide for the first character (nt1) and type the converted nucleotide as the second character (nt2). For example, if performing [SLAM-seq] where some 'T's are converted to 'C's, input `--base-change T,C`.
  As another example, if performing bisulfite-seq, where some 'C's are converted to 'T's, please input `--base-change C,T`.
  If you want to align non-converted reads to a regular HISAT2 index, omit this option.
 * `--index/-x <hisat-3n-idx>`  
  Specify the index file basename for HISAT-3N.  The basename is the name of the index files up to but not including the suffix (`.3n.*.*.ht2`, etc.).
  For example, if you built your index with basename 'genome' using `hisat-3n-build`, please input `--index genome`.
 * `--directional-mapping`  
  Perform directional mapping. Please use this option only if your sequencing reads are generated from a strand-specific library. 
  The directional mapping mode is about 2x faster than the standard (non-directional) mapping mode.
 * `--repeat-limit <int>`  
  Set the number of alignments to be checked for each repeat alignment. Increase this number to have hisat-3n
  output more alignments when a read has multiple mapping locations. We suggest limiting it to no more
  than 1,000,000 for paired-end read alignment. Default: 1000.
 * `--unique-only`  
  Only output uniquely aligned reads.
 #### Examples:
 * Single-end [SLAM-seq] read (T to C conversion) alignment with standard 3N-index:  
  `hisat-3n --index genome -f -U read.fa -S output.sam --base-change T,C`
 * Paired-end strand-specific bisulfite-seq read (C to T conversion) alignment with repeat 3N-index:   
  `hisat-3n --index genome -f -1 read_1.fa -2 read_2.fa -S output.sam --base-change C,T --directional-mapping`
 * Single-end TAPS read (C to T conversion) alignment with repeat 3N-index, outputting only uniquely aligned results:   
  `hisat-3n --index genome -q -U read.fq -S output.sam --base-change C,T --unique-only`
 #### Extra SAM tags generated by HISAT-3N:
 * `Yf:i:<N>`: Number of conversions detected in the read.
 * `Zf:i:<N>`: Number of unconverted bases detected in the read. Yf + Zf = total number of bases which can be converted in the read sequence.
 * `YZ:A:<A>`: The value `+` or `-` indicates the read is mapped to REF-3N (`+`) or REF-RC-3N (`-`), respectively.
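
As an illustration of how these tags might be consumed downstream, the Python sketch below pulls `Yf`, `Zf` and `YZ` out of one SAM alignment line and derives the converted fraction Yf / (Yf + Zf). The function names are hypothetical, and the SAM record in the usage note is made up for demonstration only.

```python
def parse_3n_tags(sam_line):
    """Extract the Yf, Zf and YZ tags from one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name in ("Yf", "Zf"):
            tags[name] = int(value)
        elif name == "YZ":
            tags[name] = value
    return tags


def conversion_fraction(tags):
    """Return Yf / (Yf + Zf), or None if the read has no convertible bases."""
    total = tags["Yf"] + tags["Zf"]
    return tags["Yf"] / total if total else None
```

Usage, with a fabricated record:

```python
line = "read1\t0\tchr22\t100\t60\t12M\t*\t0\t0\tACGTACGTACGT\tIIIIIIIIIIII\tYf:i:3\tZf:i:9\tYZ:A:+"
tags = parse_3n_tags(line)
fraction = conversion_fraction(tags)
```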
 Generate a 3N-conversion-table with `hisat-3n-table`
 ------------
 ### Preparation
 To generate a 3N-conversion-table, users need to sort the `hisat-3n` generated SAM alignment file.
 [SAMtools] is required for this sorting process.
 Use `samtools sort` to convert the SAM file into a sorted SAM file.
    samtools sort output.sam -o output_sorted.sam -O sam
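
Before running `hisat-3n-table`, it can be useful to confirm that the file really is sorted. A minimal Python sketch follows, assuming the sort order is recorded in the SAM `@HD` header line as `SO:coordinate` (which is what `samtools sort` writes by default); the function name is hypothetical.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if the SAM header declares coordinate sort order.

    `sam_lines` is any iterable of lines, for example an open file handle.
    Only the header (lines starting with '@') is inspected.
    """
    for line in sam_lines:
        if not line.startswith("@"):
            break  # reached the alignment records, header is over
        if line.startswith("@HD"):
            return "SO:coordinate" in line.rstrip("\n").split("\t")
    return False
```

Usage:

```python
with open("output_sorted.sam") as handle:
    if not is_coordinate_sorted(handle):
        raise SystemExit("SAM file is not coordinate-sorted; run samtools sort first")
```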
 Generate 3N-conversion-table with `hisat-3n-table`:
 ### Usage
    hisat-3n-table [options]* --alignments <alignmentFile> --ref <refFile> --base-change <char1,char2>
 #### Main arguments
 * `--alignments <alignmentFile>`   
  SORTED SAM file. Please enter `-` for standard input.
 * `--ref <refFile>`  
  The reference genome file (FASTA format) for generating HISAT-3N index.
 * `--output-name <outputFile>`  
  Filename to write 3N-conversion-table (tsv format) to.  By default, table is written to the “standard out” or “stdout” filehandle (i.e. the console).
 * `--base-change <char1,char2>`  
  The base-change rule. Enter exactly the same `--base-change` argument that was used for `hisat-3n`.
  For example, please enter `--base-change C,T` for bisulfite sequencing reads.
 #### Input options
 * `-u/--unique-only`  
  Count only uniquely aligned reads in the 3N-conversion-table.
 * `-m/--multiple-only`  
  Count only multiply aligned reads in the 3N-conversion-table.
 * `-c/--CG-only`  
  Only count the CpG sites in reference genome. This option is designed for bisulfite sequencing reads.
 * `--added-chrname`  
  Add this option if you used `--add-chrname` during `hisat-3n` alignment.
  With `--add-chrname`, `hisat-3n` adds the prefix "chr" to each chromosome name in its SAM output,
  so `hisat-3n-table` cannot find those names in the reference file. This option lets `hisat-3n-table`
  find the matching chromosome names in the reference file. The 3N-conversion-table reports the same chromosome names as the SAM file.
 * `--removed-chrname`  
  Add this option if you used `--remove-chrname` during `hisat-3n` alignment.
  With `--remove-chrname`, `hisat-3n` removes the prefix "chr" from each chromosome name in its SAM output,
  so `hisat-3n-table` cannot find those names in the reference file. This option lets `hisat-3n-table`
  find the matching chromosome names in the reference file. The 3N-conversion-table reports the same chromosome names as the SAM file.
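The name matching that `--added-chrname` and `--removed-chrname` describe can be pictured with the following sketch. It is not the `hisat-3n-table` implementation; the function `reference_name` and its handling of names without the expected prefix are assumptions made for illustration.

```python
def reference_name(sam_name, added_chrname=False, removed_chrname=False):
    """Map a chromosome name from the SAM file back to the reference FASTA name."""
    if added_chrname and removed_chrname:
        raise ValueError("choose at most one of the two options")
    if added_chrname:
        # The SAM file says "chr1" while the reference says "1".
        # Assumption: names without the prefix are passed through unchanged.
        return sam_name[3:] if sam_name.startswith("chr") else sam_name
    if removed_chrname:
        # The SAM file says "1" while the reference says "chr1".
        return "chr" + sam_name
    return sam_name


print(reference_name("chr1", added_chrname=True))
print(reference_name("1", removed_chrname=True))
```

In both cases the table itself would still report the name as it appears in the SAM file; the mapping is only needed to look up the reference sequence.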
 #### Other options:
 * `-p/--threads <int>`  
  Launch `int` parallel threads (default: 1) for table building.
 * `-h/--help`  
  Print usage information and quit.
 #### Examples:
    # Generate the 3N-conversion-table for bisulfite sequencing data:  
      hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T
    # Generate the 3N-conversion-table for TAPS data, counting only bases in CpG sites from uniquely aligned reads:  
      hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T --CG-only --unique-only
    # Generate the 3N-conversion-table for bisulfite sequencing data from sorted BAM file:  
      samtools view -h sorted_alignment_result.bam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T
    # Generate the 3N-conversion-table for bisulfite sequencing data from unsorted BAM file:  
      samtools sort alignment_result.bam -O sam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T
 #### Note:
 There are 7 columns in the 3N-conversion-table:
 1. `ref`: the chromosome name.
 2. `pos`: 1-based position in `ref`.
 3. `strand`: '+' for forward strand. '-' for reverse strand.
 4. `convertedBaseQualities`: the qualities of the converted bases in read-level measurements. The length of this string is equal to the number of converted bases.
 5. `convertedBaseCount`: the number of distinct read positions at which converted bases were found in read-level measurements.
   This number is equal to the length of `convertedBaseQualities`.
 6. `unconvertedBaseQualities`: the qualities of the unconverted bases in read-level measurements. The length of this string is equal to the number of unconverted bases.
 7. `unconvertedBaseCount`: the number of distinct read positions at which unconverted bases were found in read-level measurements.
   This number is equal to the length of `unconvertedBaseQualities`.
 ##### Sample 3N-conversion-table:
    ref    pos    strand    convertedBaseQualities    convertedBaseCount    unconvertedBaseQualities    unconvertedBaseCount
    1      11874  +         FFFFFB<BF<F               11                                                0
    1      11877  -         FFFFFF<                   7                                                 0
    1      11878  +         FFFBB//F/BB               11                                                0
    1      11879  +                                   0                     FFFBB//FB/                  10
    1      11880  -         F                         1                     FFFF/                       5
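A table like the one above can be summarised per position with a few lines of Python. The sketch below assumes the file is tab-separated and starts with the header row shown in the sample; the function name `conversion_levels` is made up for this example. Quoting is disabled in the CSV reader because base-quality strings may contain a `"` character.

```python
import csv
import io


def conversion_levels(tsv_handle):
    """Yield (ref, pos, strand, converted fraction) for each table row."""
    reader = csv.DictReader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        converted = int(row["convertedBaseCount"])
        unconverted = int(row["unconvertedBaseCount"])
        total = converted + unconverted
        # None marks positions without any counted base.
        level = converted / total if total else None
        yield row["ref"], int(row["pos"]), row["strand"], level


# Three rows of the sample table above, written with explicit tabs.
table = (
    "ref\tpos\tstrand\tconvertedBaseQualities\tconvertedBaseCount\t"
    "unconvertedBaseQualities\tunconvertedBaseCount\n"
    "1\t11874\t+\tFFFFFB<BF<F\t11\t\t0\n"
    "1\t11879\t+\t\t0\tFFFBB//FB/\t10\n"
    "1\t11880\t-\tF\t1\tFFFF/\t5\n"
)
for ref, pos, strand, level in conversion_levels(io.StringIO(table)):
    print(ref, pos, strand, level)
```

The counts are used rather than the lengths of the quality strings, since the two are described as equal and the counts are cheaper to read.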
 [SAMtools]:        http://samtools.sourceforge.net
 Publication
 ============
 * HISAT-3N   
  Zhang, Y., Park, C., Bennett, C., Thornton, M. and Kim, D. [Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N](https://doi.org/10.1101/gr.275193.120). _Genome Research_ **31(7)**: 1290-1295 (2021)
 * HISAT2   
  Kim, D., Paggi, J.M., Park, C. _et al._ [Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype](https://doi.org/10.1038/s41587-019-0201-4). _Nat Biotechnol_ **37**, 907–915 (2019)  
--- a/4
+++ b/4
@ -0,0 +1,4 @@
 See the section toward the end of MANUAL entitled "Getting started with HISAT2".  Or,
 for a tutorial for the latest HISAT2 version, visit:
 https://ccb.jhu.edu/software/hisat2/manual.shtml#getting-started-with-hisat2
--- a/_config.yml
+++ b/_config.yml
@ -0,0 +1 @@
 theme: jekyll-theme-time-machine
--- a/aligner_bt.cpp
+++ b/aligner_bt.cpp
--- a/aligner_bt.h
+++ b/aligner_bt.h
@ -0,0 +1,947 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_BT_H_
 #define ALIGNER_BT_H_
 #include <utility>
 #include <stdint.h>
 #include "aligner_sw_common.h"
 #include "aligner_result.h"
 #include "scoring.h"
 #include "edit.h"
 #include "limit.h"
 #include "dp_framer.h"
 #include "sse_util.h"
 /* Say we've filled in a DP matrix in a cost-only manner, not saving the scores
 * for each of the cells.  At the end, we obtain a list of candidate cells and
 * we'd like to backtrace from them.  The per-cell scores are gone, but we have
 * to re-create the correct path somehow.  Hopefully we can do this without
 * recreating most or all of the score matrix, since this takes too much memory.
 *
 * Approach 1: Naively refill the matrix.
 *
 *  Just refill the matrix, perhaps backwards starting from the backtrace cell.
 *  Since this involves recreating all or most of the score matrix, this is not
 *  a good approach.
 *
 * Approach 2: Naive backtracking.
 *
 *  Conduct a search through the space of possible backtraces, rooted at the
 *  candidate cell.  To speed things along, we can prioritize paths that have a
 *  high score and that align more characters from the read.
 *
 *  The approach is simple, but it's neither fast nor memory-efficient in
 *  general.
 *
 * Approach 3: Refilling with checkpoints.
 *
 *  Refill the matrix "backwards" starting from the candidate cell, but use
 *  checkpoints to ensure that only a series of relatively small triangles or
 *  rectangles need to be refilled.  The checkpoints must include elements from
 *  the H, E and F matrices; not just H.  After each refill, we backtrace
 *  through the refilled area, then discard/reuse the fill memory.  I call each
 *  such fill/backtrace a mini-fill/backtrace.
 *
 *  If there's only one path to be found, then this is O(m+n).  But what if
 *  there are many?  And what if we would like to avoid paths that overlap in
 *  one or more cells?  There are two ways we can make this more efficient:
 *
 *   1. Remember the re-calculated E/F/H values and try to retrieve them
 *   2. Keep a record of cells that have already been traversed
 *
 *  Legend:
 *
 *  1: Candidate cell
 *  2: Final cell from first mini-fill/backtrace
 *  3: Final cell from second mini-fill/backtrace (third not shown)
 *  +: Checkpointed cell
 *  *: Cell filled from first or second mini-fill/backtrace
 *  -: Unfilled cell
 *
 *        ---++--------++--------++----
 *        --++--------++*-------++-----
 *        -++--(etc)-++**------++------
 *        ++--------+3***-----++-------
 *        +--------++****----++--------
 *        --------++*****---++--------+
 *        -------++******--++--------++
 *        ------++*******-++*-------++-
 *        -----++********++**------++--
 *        ----++********2+***-----++---
 *        ---++--------++****----++----
 *        --++--------++*****---++-----
 *        -++--------++*****1--++------
 *        ++--------++--------++-------
 *
 * Approach 4: Backtracking with checkpoints.
 *
 *  Conduct a search through the space of possible backtraces, rooted at the
 *  candidate cell.  Use "checkpoints" to prune.  That is, when a backtrace
 *  moves through a cell with a checkpointed score, consider the score
 *  accumulated so far and the cell's saved score; abort if those two scores
 *  add to something less than a valid score.  Note we're only checkpointing H
 *  in this case (possibly; see "subtle point"), not E or F.
 *
 *  Subtle point: checkpoint scores are a result of moving forward through
 *  the matrix whereas backtracking scores result from moving backward.  This
 *  matters because the two paths that meet up at a cell might have both
 *  factored in a gap open penalty for the same gap, in which case we will
 *  underestimate the overall score and prune a good path.  Here are two ideas
 *  for how to resolve this:
 *
 *   Idea 1: when we combine the forward and backward scores to find an overall
 *   score, and our backtrack procedure *just* made a horizontal or vertical
 *   move, add in a "bonus" equal to the gap open penalty of the appropriate
 *   type (read gap open for horizontal, ref gap open for vertical). This might
 *   overcompensate, since
 *
 *   Idea 2: keep the E and F values for the checkpoints around, in addition to
 *   the H values.  When it comes time to combine the score from the forward
 *   and backward paths, we consider the last move we made in the backward
 *   backtrace.  If it's a read gap (horizontal move), then we calculate the
 *   overall score as:
 *
 *     max(Score-backward + H-forward, Score-backward + E-forward + read-open)
 *
 *   If it's a reference gap (vertical move), then we calculate the overall
 *   score as:
 *
 *     max(Score-backward + H-forward, Score-backward + F-forward + ref-open)
 *
 *   What does it mean to abort a backtrack?  If we're starting a new branch
 *   and there is a checkpoint in the bottommost cell of the branch, and the
 *   overall score is less than the target, then we can simply ignore the
 *   branch.  If the checkpoint occurs in the middle of a string of matches, we
 *   need to curtail the branch such that it doesn't include the checkpointed
 *   cell and we won't ever try to enter the checkpointed cell, e.g., on a
 *   mismatch.
 *
 * Approaches 3 and 4 seem reasonable, and could be combined.  For simplicity,
 * we implement only approach 4 for now.
 *
 * Checkpoint information is propagated from the fill process to the backtracer
 * via a 
 */
 enum {
 	BT_NOT_FOUND = 1,      // could not obtain the backtrace because it
 	                       // overlapped a previous solution
 	BT_FOUND,              // obtained a valid backtrace
 	BT_REJECTED_N,         // backtrace rejected because it had too many Ns
 	BT_REJECTED_CORE_DIAG  // backtrace rejected because it failed to overlap a
 	                       // core diagonal
 };
 /**
 * Parameters for a matrix of potential backtrace problems to solve.
 * Encapsulates information about:
 *
 * The problem given a particular reference substring:
 *
 * - The query string (nucleotides and qualities)
 * - The reference substring (incl. orientation, offset into overall sequence)
 * - Checkpoints (i.e. values of matrix cells)
 * - Scoring scheme and other thresholds
 *
 * The problem given a particular reference substring AND a particular row and
 * column from which to backtrace:
 *
 * - The row and column
 * - The target score
 */
 class BtBranchProblem {
 public:
 	/**
 	 * Create new uninitialized problem.
 	 */
 	BtBranchProblem() { reset(); }
 	/**
 	 * Initialize a new problem.
 	 */
 	void initRef(
 		const char          *qry,    // query string (along rows)
 		const char          *qual,   // query quality string (along rows)
 		size_t               qrylen, // query string (along rows) length
 		const char          *ref,    // reference string (along columns)
 		TRefOff              reflen, // in-rectangle reference string length
 		TRefOff              treflen,// total reference string length
 		TRefId               refid,  // reference id
 		TRefOff              refoff, // reference offset
 		bool                 fw,     // orientation of problem
 		const DPRect*        rect,   // dynamic programming rectangle filled out
 		const Checkpointer*  cper,   // checkpointer
 		const Scoring       *sc,     // scoring scheme
 		size_t               nceil)  // max # Ns allowed in alignment
 	{
 		qry_     = qry;
 		qual_    = qual;
 		qrylen_  = qrylen;
 		ref_     = ref;
 		reflen_  = reflen;
 		treflen_ = treflen;
 		refid_   = refid;
 		refoff_  = refoff;
 		fw_      = fw;
 		rect_    = rect;
 		cper_    = cper;
 		sc_      = sc;
 		nceil_   = nceil;
 	}
 	/**
 	 * Initialize a new problem.
 	 */
 	void initBt(
 		size_t   row,   // row
 		size_t   col,   // column
 		bool     fill,  // use a filling rather than a backtracking strategy
 		bool     usecp, // use checkpoints to short-circuit while backtracking
 		TAlScore targ)  // target score
 	{
 		row_    = row;
 		col_    = col;
 		targ_   = targ;
 		fill_   = fill;
 		usecp_  = usecp;
 		if(fill) {
 			assert(usecp_);
 		}
 	}
 	/**
 	 * Reset to uninitialized state.
 	 */
 	void reset() {
 		qry_ = qual_ = ref_ = NULL;
 		cper_ = NULL;
 		rect_ = NULL;
 		sc_ = NULL;
 		qrylen_ = reflen_ = treflen_ = refid_ = refoff_ = row_ = col_ = targ_ = nceil_ = 0;
 		fill_ = fw_ = usecp_ = false;
 	}
 	/**
 	 * Return true iff the BtBranchProblem has been initialized.
 	 */
 	bool inited() const {
 		return qry_ != NULL;
 	}
 #ifndef NDEBUG
 	/**
 	 * Sanity-check the problem.
 	 */
 	bool repOk() const {
 		assert_gt(qrylen_, 0);
 		assert_gt(reflen_, 0);
 		assert_gt(treflen_, 0);
 		assert_lt(row_, qrylen_);
 		assert_lt((TRefOff)col_, reflen_);
 		return true;
 	}
 #endif
 	size_t reflen() const { return reflen_; }
 	size_t treflen() const { return treflen_; }
 protected:
 	const char         *qry_;    // query string (along rows)
 	const char         *qual_;   // query quality string (along rows)
 	size_t              qrylen_; // query string (along rows) length
 	const char         *ref_;    // reference string (along columns)
 	TRefOff             reflen_; // in-rectangle reference string length
 	TRefOff             treflen_;// total reference string length
 	TRefId              refid_;  // reference id
 	TRefOff             refoff_; // reference offset
 	bool                fw_;     // orientation of problem
 	const DPRect*       rect_;   // dynamic programming rectangle filled out
 	size_t              row_;    // starting row
 	size_t              col_;    // starting column
 	TAlScore            targ_;   // target score
 	const Checkpointer *cper_;   // checkpointer
 	bool                fill_;   // use mini-fills
 	bool                usecp_;  // use checkpointing?
 	const Scoring      *sc_;     // scoring scheme
 	size_t              nceil_;  // max # Ns allowed in alignment
 	friend class BtBranch;
 	friend class BtBranchQ;
 	friend class BtBranchTracer;
 };
 /**
 * Encapsulates a "branch" which is a diagonal of cells (possibly of length 0)
 * in the matrix where all the cells are matches.  These stretches are linked
 * together by edits to form a full backtrace path through the matrix.  Lengths
 * are measured w/r/t the number of rows traversed by the path, so a branch
 * that represents a read gap extension could have length = 0.
 *
 * At the end of the day, the full backtrace path is represented as a list of
 * BtBranch's where each BtBranch represents a stretch of matching cells (and
 * up to one mismatching cell at its bottom extreme) ending in an edit (or in
 * the bottommost row, in which case the edit is uninitialized).  Each
 * BtBranch's row and col fields indicate the bottommost cell involved in the
 * diagonal stretch of matches, and the len_ field indicates the length of the
 * stretch of matches.  Note that the edits themselves also correspond to
 * movement through the matrix.
 *
 * A related issue is how we record which cells have been visited so that we
 * never report a pair of paths both traversing the same (row, col) of the
 * overall DP matrix.  This gets a little tricky because we have to take into
 * account the cells covered by *edits* in addition to the cells covered by the
 * stretches of matches.  For instance: imagine a mismatch.  That takes up a
 * cell of the DP matrix, but it may or may not be preceded by a string of
 * matches.  It's hard to imagine how to represent this unless we let the
 * mismatch "count toward" the len_ of the branch and let (row, col) refer to
 * the cell where the mismatch occurs.
 *
 * We need BtBranches to "live forever" so that we can make some BtBranches
 * parents of others using parent pointers.  For this reason, BtBranch's are
 * stored in an EFactory object in the BtBranchTracer class.
 */
 class BtBranch {
 public:
 	BtBranch() { reset(); }
 	BtBranch(
 		const BtBranchProblem& prob,
 		size_t parentId,
 		TAlScore penalty,
 		TAlScore score_en,
 		int64_t row,
 		int64_t col,
 		Edit e,
 		int hef,
 		bool root,
 		bool extend)
 	{
 		init(prob, parentId, penalty, score_en, row, col, e, hef, root, extend);
 	}
 	/**
 	 * Reset to uninitialized state.
 	 */
 	void reset() {
 		parentId_ = 0;
 		score_st_ = score_en_ = len_ = row_ = col_ = 0;
 		curtailed_ = false;
 		e_.reset();
 	}
 	/**
 	 * Caller gives us score_en, row and col.  We figure out score_st and len_
 	 * by comparing characters from the strings.
 	 */
 	void init(
 		const BtBranchProblem& prob,
 		size_t parentId,
 		TAlScore penalty,
 		TAlScore score_en,
 		int64_t row,
 		int64_t col,
 		Edit e,
 		int hef,
 		bool root,
 		bool extend);
 	/**
 	 * Return true iff this branch ends in a solution to the backtrace problem.
 	 */
 	bool isSolution(const BtBranchProblem& prob) const {
 		const bool end2end = prob.sc_->monotone;
 		return score_st_ == prob.targ_ && (!end2end || endsInFirstRow());
 	}
 	/**
 	 * Return true iff this branch could potentially lead to a valid alignment.
 	 */
 	bool isValid(const BtBranchProblem& prob) const {
 		int64_t scoreFloor = prob.sc_->monotone ? MIN_I64 : 0;
 		if(score_st_ < scoreFloor) {
 			// Dipped below the score floor
 			return false;
 		}
 		if(isSolution(prob)) {
 			// It's a solution, so it's also valid
 			return true;
 		}
 		if((int64_t)len_ > row_) {
 			// Went all the way to the top row
 			//assert_leq(score_st_, prob.targ_);
 			return score_st_ == prob.targ_;
 		} else {
 			int64_t match = prob.sc_->match();
 			int64_t bonusLeft = (row_ + 1 - len_) * match;
 			return score_st_ + bonusLeft >= prob.targ_;
 		}
 	}
 	/**
 	 * Return true iff this branch overlaps with the given branch.
 	 */
 	bool overlap(const BtBranchProblem& prob, const BtBranch& bt) const {
 		// Calculate this branch's diagonal
 		assert_lt(row_, (int64_t)prob.qrylen_);
 		size_t fromend = prob.qrylen_ - row_ - 1;
 		size_t diag = fromend + col_;
 		int64_t lo = 0, hi = row_ + 1;
 		if(len_ == 0) {
 			lo = row_;
 		} else {
 			lo = row_ - (len_ - 1);
 		}
 		// Calculate other branch's diagonal
 		assert_lt(bt.row_, (int64_t)prob.qrylen_);
 		size_t ofromend = prob.qrylen_ - bt.row_ - 1;
 		size_t odiag = ofromend + bt.col_;
 		if(diag != odiag) {
 			return false;
 		}
 		int64_t olo = 0, ohi = bt.row_ + 1;
 		if(bt.len_ == 0) {
 			olo = bt.row_;
 		} else {
 			olo = bt.row_ - (bt.len_ - 1);
 		}
 		int64_t losm = olo, hism = ohi;
 		if(hi - lo < ohi - olo) {
 			swap(lo, losm);
 			swap(hi, hism);
 		}
 		if((lo <= losm && hi > losm) || (lo <  hism && hi >= hism)) {
 			return true;
 		}
 		return false;
 	}
 	/**
 	 * Return true iff this branch is higher priority than the branch 'o'.
 	 */
 	bool operator<(const BtBranch& o) const {
 		// Prioritize uppermost above score
 		if(uppermostRow() != o.uppermostRow()) {
 			return uppermostRow() < o.uppermostRow();
 		}
 		if(score_st_ != o.score_st_) return score_st_ > o.score_st_;
 		if(row_      != o.row_)      return row_ < o.row_;
 		if(col_      != o.col_)      return col_ > o.col_;
 		if(parentId_ != o.parentId_) return parentId_ > o.parentId_;
 		assert(false);
 		return false;
 	}
 	/**
 	 * Return true iff the topmost cell involved in this branch is in the top
 	 * row.
 	 */
 	bool endsInFirstRow() const {
 		assert_leq((int64_t)len_, row_ + 1);
 		return (int64_t)len_ == row_+1;
 	}
 	/**
 	 * Return the uppermost row covered by this branch.
 	 */
 	size_t uppermostRow() const {
 		assert_geq(row_ + 1, (int64_t)len_);
 		return row_ + 1 - (int64_t)len_;
 	}
 	/**
 	 * Return the leftmost column covered by this branch.
 	 */
 	size_t leftmostCol() const {
 		assert_geq(col_ + 1, (int64_t)len_);
 		return col_ + 1 - (int64_t)len_;
 	}
 #ifndef NDEBUG
 	/**
 	 * Sanity-check this BtBranch.
 	 */
 	bool repOk() const {
 		assert(root_ || e_.inited());
 		assert_gt(len_, 0);
 		assert_geq(col_ + 1, (int64_t)len_);
 		assert_geq(row_ + 1, (int64_t)len_);
 		return true;
 	}
 #endif
 protected:
 	// ID of the parent branch.
 	size_t   parentId_;
 	// Penalty associated with the edit at the bottom of this branch (0 if
 	// there is no edit)
 	TAlScore penalty_;
 	// Score at the beginning of the branch
 	TAlScore score_st_;
 	// Score at the end of the branch (taking the edit into account)
 	TAlScore score_en_;
 	// Length of the branch.  That is, the total number of diagonal cells
 	// involved in all the matches and in the edit (if any).  Should always be
 	// > 0.
 	size_t   len_;
 	// The row of the final (bottommost) cell in the branch.  This might be the
 	// bottommost match if the branch has no associated edit.  Otherwise, it's
 	// the cell occupied by the edit.
 	int64_t  row_;
 	// The column of the final (bottommost) cell in the branch.
 	int64_t  col_;
 	// The edit at the bottom of the branch.  If this is the bottommost branch
 	// in the alignment and it does not end in an edit, then this remains
 	// uninitialized.
 	Edit     e_;
 	// True iff this is the bottommost branch in the alignment.  We can't just
 	// use row_ to tell us this because local alignments don't necessarily end
 	// in the last row.
 	bool     root_;
 	bool     curtailed_;  // true -> pruned at a checkpoint where we otherwise
 	                      // would have had a match
 friend class BtBranchQ;
 friend class BtBranchTracer;
 };
 /**
 * Instantiate and solve best-first branch-based backtraces.
 */
 class BtBranchTracer {
 public:
 	explicit BtBranchTracer() :
 		prob_(), bs_(), seenPaths_(DP_CAT), sawcell_(DP_CAT), doTri_() { }
 	/**
 	 * Add a branch to the queue.
 	 */
 	void add(size_t id) {
 		assert(!bs_[id].isSolution(prob_));
 		unsorted_.push_back(make_pair(bs_[id].score_st_, id));
 	}
 	/**
 	 * Add a branch to the list of solutions.
 	 */
 	void addSolution(size_t id) {
 		assert(bs_[id].isSolution(prob_));
 		solutions_.push_back(id);
 	}
 	/**
 	 * Given a potential branch to add to the queue, see if we can follow the
 	 * branch a little further first.  If it's still valid, or if we reach a
 	 * choice between valid outgoing paths, go ahead and add it to the queue.
 	 */
 	void examineBranch(
 		int64_t row,
 		int64_t col,
 		const Edit& e,
 		TAlScore pen,
 		TAlScore sc,
 		size_t parentId);
 	/**
 	 * Take all possible ways of leaving the given branch and add them to the
 	 * branch queue.
 	 */
 	void addOffshoots(size_t bid);
 	/**
 	 * Get the best branch and remove it from the priority queue.
 	 */
 	size_t best(RandomSource& rnd) {
 		assert(!empty());
 		flushUnsorted();
 		assert_gt(sortedSel_ ? sorted1_.size() : sorted2_.size(), cur_);
 		// Perhaps shuffle everyone who's tied for first?
 		size_t id = sortedSel_ ? sorted1_[cur_] : sorted2_[cur_];
 		cur_++;
 		return id;
 	}
 	/**
 	 * Return true iff there are no branches left to try.
 	 */
 	bool empty() const {
 		return size() == 0;
 	}
 	/**
 	 * Return the size, i.e. the total number of branches contained.
 	 */
 	size_t size() const {
 		return unsorted_.size() +
 		       (sortedSel_ ? sorted1_.size() : sorted2_.size()) - cur_;
 	}
 	/**
 	 * Return true iff there are no solutions left to try.
 	 */
 	bool emptySolution() const {
 		return sizeSolution() == 0;
 	}
 	/**
 	 * Return the size of the solution set so far.
 	 */
 	size_t sizeSolution() const {
 		return solutions_.size();
 	}
 	/**
 	 * Sort unsorted branches, merge them with master sorted list.
 	 */
 	void flushUnsorted();
 #ifndef NDEBUG
 	/**
 	 * Sanity-check the queue.
 	 */
 	bool repOk() const {
 		assert_lt(cur_, (sortedSel_ ? sorted1_.size() : sorted2_.size()));
 		return true;
 	}
 #endif
 	/**
 	 * Initialize the tracer with respect to a new read.  This involves
 	 * resetting all the state relating to the set of cells already visited
 	 */
 	void initRef(
 		const char*         rd,     // in: read sequence
 		const char*         qu,     // in: quality sequence
 		size_t              rdlen,  // in: read sequence length
 		const char*         rf,     // in: reference sequence
 		size_t              rflen,  // in: in-rectangle reference sequence length
 		TRefOff             trflen, // in: total reference sequence length
 		TRefId              refid,  // in: reference id
 		TRefOff             refoff, // in: reference offset
 		bool                fw,     // in: orientation
 		const DPRect       *rect,   // in: DP rectangle
 		const Checkpointer *cper,   // in: checkpointer
 		const Scoring&      sc,     // in: scoring scheme
 		size_t              nceil)  // in: N ceiling
 	{
 		prob_.initRef(rd, qu, rdlen, rf, rflen, trflen, refid, refoff, fw, rect, cper, &sc, nceil);
 		const size_t ndiag = rflen + rdlen - 1;
 		seenPaths_.resize(ndiag);
 		for(size_t i = 0; i < ndiag; i++) {
 			seenPaths_[i].clear();
 		}
 		// clear each of the per-column sets
 		if(sawcell_.size() < rflen) {
 			size_t isz = sawcell_.size();
 			sawcell_.resize(rflen);
 			for(size_t i = isz; i < rflen; i++) {
 				sawcell_[i].setCat(DP_CAT);
 			}
 		}
 		for(size_t i = 0; i < rflen; i++) {
 			sawcell_[i].setCat(DP_CAT);
 			sawcell_[i].clear(); // clear the set
 		}
 	}
 	/**
 	 * Initialize with a new backtrace.
 	 */
 	void initBt(
 		TAlScore       escore, // in: alignment score
 		size_t         row,    // in: start in this row
 		size_t         col,    // in: start in this column
 		bool           fill,   // in: use mini-filling?
 		bool           usecp,  // in: use checkpointing?
 		bool           doTri,  // in: triangle-shaped mini-fills?
 		RandomSource&  rnd)    // in: random gen, to choose among equal paths
 	{
 		prob_.initBt(row, col, fill, usecp, escore);
 		Edit e; e.reset();
 		unsorted_.clear();
 		solutions_.clear();
 		sorted1_.clear();
 		sorted2_.clear();
 		cur_ = 0;
 		nmm_ = 0;         // number of mismatches attempted
 		nnmm_ = 0;        // number of mismatches involving N attempted
 		nrdop_ = 0;       // number of read gap opens attempted
 		nrfop_ = 0;       // number of ref gap opens attempted
 		nrdex_ = 0;       // number of read gap extensions attempted
 		nrfex_ = 0;       // number of ref gap extensions attempted
 		nmmPrune_ = 0;    // number of mismatches pruned
 		nnmmPrune_ = 0;   // number of mismatches involving N pruned
 		nrdopPrune_ = 0;  // number of read gap opens pruned
 		nrfopPrune_ = 0;  // number of ref gap opens pruned
 		nrdexPrune_ = 0;  // number of read gap extensions pruned
 		nrfexPrune_ = 0;  // number of ref gap extensions pruned
 		row_ = row;
 		col_ = col;
 		doTri_ = doTri;
 		bs_.clear();
 		if(!prob_.fill_) {
 			size_t id = bs_.alloc();
 			bs_[id].init(
 				prob_,
 				0,     // parent id
 				0,     // penalty
 				0,     // starting score
 				row,   // row
 				col,   // column
 				e,
 				0,
 			    true,  // this is the root
 				true); // this should be extended with exact matches
 			if(bs_[id].isSolution(prob_)) {
 				addSolution(id);
 			} else {
 				add(id);
 			}
 		} else {
 			int64_t row = row_, col = col_;
 			TAlScore targsc = prob_.targ_;
 			int hef = 0;
 			bool done = false, abort = false;
 			size_t depth = 0;
 			while(!done && !abort) {
 				// Accumulate edits as we go.  We can do this by adding
 				// BtBranches to the bs_ structure.  Each step of the backtrace
 				// either involves an edit (thereby starting a new branch) or
 				// extends the previous branch by one more position.
 				//
 				// Note: if the BtBranches are in line, then trySolution can be
 				// used to populate the SwResult and check for various
 				// situations where we might reject the alignment (i.e. due to
 				// a cell having been visited previously).
 				if(doTri_) {
 					triangleFill(
 						row,          // row of cell to backtrace from
 						col,          // column of cell to backtrace from
 						hef,          // cell to bt from: H (0), E (1), or F (2)
 						targsc,       // score of cell to backtrace from
 						prob_.targ_,  // score of alignment we're looking for
 						rnd,          // pseudo-random generator
 						row,          // out: row we ended up in after bt
 						col,          // out: column we ended up in after bt
 						hef,          // out: H/E/F after backtrace
 						targsc,       // out: score up to cell we ended up in
 						done,         // out: finished tracing out an alignment?
 						abort);       // out: aborted b/c cell was seen before?
 				} else {
 					squareFill(
 						row,          // row of cell to backtrace from
 						col,          // column of cell to backtrace from
 						hef,          // cell to bt from: H (0), E (1), or F (2)
 						targsc,       // score of cell to backtrace from
 						prob_.targ_,  // score of alignment we're looking for
 						rnd,          // pseudo-random generator
 						row,          // out: row we ended up in after bt
 						col,          // out: column we ended up in after bt
 						hef,          // out: H/E/F after backtrace
 						targsc,       // out: score up to cell we ended up in
 						done,         // out: finished tracing out an alignment?
 						abort);       // out: aborted b/c cell was seen before?
 				}
 				if(depth >= ndep_.size()) {
 					ndep_.resize(depth+1);
 					ndep_[depth] = 1;
 				} else {
 					ndep_[depth]++;
 				}
 				depth++;
 				assert((row >= 0 && col >= 0) || done);
 			}
 		}
 		ASSERT_ONLY(seen_.clear());
 	}
 	/**
 	 * Get the next valid alignment given the backtrace problem.  Return false
 	 * if there is no valid solution, e.g., if 
 	 */
 	bool nextAlignment(
 		size_t maxiter,
 		SwResult& res,
 		size_t& off,
 		size_t& nrej,
 		size_t& niter,
 		RandomSource& rnd);
 	/**
 	 * Return true iff this tracer has been initialized
 	 */
 	bool inited() const {
 		return prob_.inited();
 	}
 	/**
 	 * Return true iff the mini-fills are triangle-shaped.
 	 */
 	bool doTri() const { return doTri_; }
 	/**
 	 * Fill in a triangle of the DP table and backtrace from the given cell to
 	 * a cell in the previous checkpoint, or to the terminal cell.
 	 */
 	void triangleFill(
 		int64_t rw,          // row of cell to backtrace from
 		int64_t cl,          // column of cell to backtrace from
 		int hef,             // cell to backtrace from is H (0), E (1), or F (2)
 		TAlScore targ,       // score of cell to backtrace from
 		TAlScore targ_final, // score of alignment we're looking for
 		RandomSource& rnd,   // pseudo-random generator
 		int64_t& row_new,    // out: row we ended up in after backtrace
 		int64_t& col_new,    // out: column we ended up in after backtrace
 		int& hef_new,        // out: H/E/F after backtrace
 		TAlScore& targ_new,  // out: score up to cell we ended up in
 		bool& done,          // out: finished tracing out an alignment?
 		bool& abort);        // out: aborted b/c cell was seen before?
 	/**
 	 * Fill in a square of the DP table and backtrace from the given cell to
 	 * a cell in the previous checkpoint, or to the terminal cell.
 	 */
 	void squareFill(
 		int64_t rw,          // row of cell to backtrace from
 		int64_t cl,          // column of cell to backtrace from
 		int hef,             // cell to backtrace from is H (0), E (1), or F (2)
 		TAlScore targ,       // score of cell to backtrace from
 		TAlScore targ_final, // score of alignment we're looking for
 		RandomSource& rnd,   // pseudo-random generator
 		int64_t& row_new,    // out: row we ended up in after backtrace
 		int64_t& col_new,    // out: column we ended up in after backtrace
 		int& hef_new,        // out: H/E/F after backtrace
 		TAlScore& targ_new,  // out: score up to cell we ended up in
 		bool& done,          // out: finished tracing out an alignment?
 		bool& abort);        // out: aborted b/c cell was seen before?
 protected:
 	/**
 	 * Get the next valid alignment given a backtrace problem.  Return false
 	 * if there is no valid solution.  Use a backtracking search to find the
 	 * solution.  This can be very slow.
 	 */
 	bool nextAlignmentBacktrace(
 		size_t maxiter,
 		SwResult& res,
 		size_t& off,
 		size_t& nrej,
 		size_t& niter,
 		RandomSource& rnd);
 	/**
 	 * Get the next valid alignment given a backtrace problem.  Return false
 	 * if there is no valid solution.  Use a triangle-fill backtrace to find
 	 * the solution.  This is usually fast (it's O(m + n)).
 	 */
 	bool nextAlignmentFill(
 		size_t maxiter,
 		SwResult& res,
 		size_t& off,
 		size_t& nrej,
 		size_t& niter,
 		RandomSource& rnd);
 	/**
 	 * Try all the solutions accumulated so far.  Solutions might be rejected
 	 * if they, for instance, overlap a previous solution, have too many Ns,
 	 * fail to overlap a core diagonal, etc.
 	 */
 	bool trySolutions(
 		bool lookForOlap,
 		SwResult& res,
 		size_t& off,
 		size_t& nrej,
 		RandomSource& rnd,
 		bool& success);
 	/**
 	 * See if a given solution branch works as a solution (i.e. doesn't overlap
 	 * another one, have too many Ns, fail to overlap a core diagonal, etc.)
 	 */
 	int trySolution(
 		size_t id,
 		bool lookForOlap,
 		SwResult& res,
 		size_t& off,
 		size_t& nrej,
 		RandomSource& rnd);
 	BtBranchProblem    prob_; // problem configuration
 	EFactory<BtBranch> bs_;   // global BtBranch factory
 	// already reported alignments going through these diagonal segments
 	ELList<std::pair<size_t, size_t> > seenPaths_;
 	ELSet<size_t> sawcell_; // cells already backtraced through
 	EList<std::pair<TAlScore, size_t> > unsorted_;  // unsorted list of as-yet-unfinished BtBranches
 	EList<size_t> sorted1_;   // list of BtBranch, sorted by score
 	EList<size_t> sorted2_;   // list of BtBranch, sorted by score
 	EList<size_t> solutions_; // list of solution branches
 	bool          sortedSel_; // true -> 1, false -> 2
 	size_t        cur_;       // cursor into sorted list to start from
 	size_t        nmm_;         // number of mismatches attempted
 	size_t        nnmm_;        // number of mismatches involving N attempted
 	size_t        nrdop_;       // number of read gap opens attempted
 	size_t        nrfop_;       // number of ref gap opens attempted
 	size_t        nrdex_;       // number of read gap extensions attempted
 	size_t        nrfex_;       // number of ref gap extensions attempted
 	size_t        nmmPrune_;    // 
 	size_t        nnmmPrune_;   // 
 	size_t        nrdopPrune_;  // 
 	size_t        nrfopPrune_;  // 
 	size_t        nrdexPrune_;  // 
 	size_t        nrfexPrune_;  // 
 	size_t        row_;         // row
 	size_t        col_;         // column
 	bool           doTri_;      // true -> fill in triangles; false -> squares
 	EList<CpQuad>  sq_;         // square to fill when doing mini-fills
 	ELList<CpQuad> tri_;        // triangle to fill when doing mini-fills
 	EList<size_t>  ndep_;       // # triangles mini-filled at various depths
 #ifndef NDEBUG
 	ESet<size_t>  seen_;        // seen branch ids; should never see same twice
 #endif
 };
 #endif /*ndef ALIGNER_BT_H_*/
--- a/aligner_cache.cpp
+++ b/aligner_cache.cpp
@ -0,0 +1,181 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include "aligner_cache.h"
 #include "tinythread.h"
 #ifdef ALIGNER_CACHE_MAIN
 #include <iostream>
 #include <getopt.h>
 #include <string>
 #include "random_source.h"
 using namespace std;
 enum {
 	ARG_TESTS = 256
 };
 static const char *short_opts = "vCt";
 static struct option long_opts[] = {
 	{(char*)"verbose",  no_argument, 0, 'v'},
 	{(char*)"tests",    no_argument, 0, ARG_TESTS},
 	{(char*)0,          0,           0, 0} // getopt_long requires an all-zero terminator
 };
 static void printUsage(ostream& os) {
 	os << "Usage: sawhi-cache [options]*" << endl;
 	os << "Options:" << endl;
 	os << "  --tests       run unit tests" << endl;
 	os << "  -v/--verbose  talkative mode" << endl;
 }
 int gVerbose = 0;
 static void add(
 	RedBlack<QKey, QVal>& t,
 	Pool& p,
 	const char *dna)
 {
 	QKey qk;
 	qk.init(BTDnaString(dna, true));
 	t.add(p, qk, NULL);
 }
 /**
 * Small tests for the AlignmentCache.
 */
 static void aligner_cache_tests() {
 	RedBlack<QKey, QVal> rb(1024);
 	Pool p(64 * 1024, 1024);
 	// Small test
 	add(rb, p, "ACGTCGATCGT");
 	add(rb, p, "ACATCGATCGT");
 	add(rb, p, "ACGACGATCGT");
 	add(rb, p, "ACGTAGATCGT");
 	add(rb, p, "ACGTCAATCGT");
 	add(rb, p, "ACGTCGCTCGT");
 	add(rb, p, "ACGTCGAACGT");
 	assert_eq(7, rb.size());
 	rb.clear();
 	p.clear();
 	// Another small test
 	add(rb, p, "ACGTCGATCGT");
 	add(rb, p, "CCGTCGATCGT");
 	add(rb, p, "TCGTCGATCGT");
 	add(rb, p, "GCGTCGATCGT");
 	add(rb, p, "AAGTCGATCGT");
 	assert_eq(5, rb.size());
 	rb.clear();
 	p.clear();
 	// Regression test (attempt to make it smaller)
 	add(rb, p, "CCTA");
 	add(rb, p, "AGAA");
 	add(rb, p, "TCTA");
 	add(rb, p, "GATC");
 	add(rb, p, "CTGC");
 	add(rb, p, "TTGC");
 	add(rb, p, "GCCG");
 	add(rb, p, "GGAT");
 	rb.clear();
 	p.clear();
 	// Regression test
 	add(rb, p, "CCTA");
 	add(rb, p, "AGAA");
 	add(rb, p, "TCTA");
 	add(rb, p, "GATC");
 	add(rb, p, "CTGC");
 	add(rb, p, "CATC");
 	add(rb, p, "CAAA");
 	add(rb, p, "CTAT");
 	add(rb, p, "CTCA");
 	add(rb, p, "TTGC");
 	add(rb, p, "GCCG");
 	add(rb, p, "GGAT");
 	assert_eq(12, rb.size());
 	rb.clear();
 	p.clear();
 	// Larger random test
 	EList<BTDnaString> strs;
 	char buf[5];
 	for(int i = 0; i < 4; i++) {
 		for(int j = 0; j < 4; j++) {
 			for(int k = 0; k < 4; k++) {
 				for(int m = 0; m < 4; m++) {
 					buf[0] = "ACGT"[i];
 					buf[1] = "ACGT"[j];
 					buf[2] = "ACGT"[k];
 					buf[3] = "ACGT"[m];
 					buf[4] = '\0';
 					strs.push_back(BTDnaString(buf, true));
 				}
 			}
 		}
 	}
 	// Add all of the 4-mers in several different random orders
 	RandomSource rand;
 	for(uint32_t runs = 0; runs < 100; runs++) {
 		rb.clear();
 		p.clear();
 		assert_eq(0, rb.size());
 		rand.init(runs);
 		EList<bool> used;
 		used.resize(256);
 		for(int i = 0; i < 256; i++) used[i] = false;
 		for(int i = 0; i < 256; i++) {
 			int r = rand.nextU32() % (256-i);
 			int unused = 0;
 			bool added = false;
 			for(int j = 0; j < 256; j++) {
 				if(!used[j] && unused == r) {
 					used[j] = true;
 					QKey qk;
 					qk.init(strs[j]);
 					rb.add(p, qk, NULL);
 					added = true;
 					break;
 				}
 				if(!used[j]) unused++;
 			}
 			assert(added);
 		}
 	}
 }
 /**
 * A way of feeding simple tests to the seed alignment infrastructure.
 */
 int main(int argc, char **argv) {
 	int option_index = 0;
 	int next_option;
 	do {
 		next_option = getopt_long(argc, argv, short_opts, long_opts, &option_index);
 		switch (next_option) {
 			case 'v':       gVerbose = true; break;
 			case ARG_TESTS: aligner_cache_tests(); return 0;
 			case -1: break;
 			default: {
 				cerr << "Unknown option: " << (char)next_option << endl;
 				printUsage(cerr);
 				exit(1);
 			}
 		}
 	} while(next_option != -1);
 }
 #endif
--- a/aligner_cache.h
+++ b/aligner_cache.h
--- a/aligner_driver.cpp
+++ b/aligner_driver.cpp
@ -0,0 +1,80 @@
 /*
 * Copyright 2012, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include "aligner_driver.h"
 void AlignerDriverRootSelector::select(
 	const Read& q,
 	const Read* qo,
 	bool nofw,
 	bool norc,
 	EList<DescentConfig>& confs,
 	EList<DescentRoot>& roots)
 {
 	// Calculate interval length for both mates
 	int interval = rootIval_.f<int>((double)q.length());
 	if(qo != NULL) {
 		// Boost interval length by 20% for paired-end reads
 		interval = (int)(interval * 1.2 + 0.5);
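 	}
 	// Guard against a non-positive interval, which would keep the root
 	// placement loops below from advancing
 	if(interval < 1) {
 		interval = 1;
 	}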
 	float pri = 0.0f;
 	for(int fwi = 0; fwi < 2; fwi++) {
 		bool fw = (fwi == 0);
 		if((fw && nofw) || (!fw && norc)) {
 			continue;
 		}
 		// Put down left-to-right roots w/r/t forward and reverse-complement reads
 		{
 			bool first = true;
 			size_t i = 0;
 			while(first || (i + landing_ <= q.length())) {
 				confs.expand();
 				confs.back().cons.init(landing_, consExp_);
 				roots.expand();
 				roots.back().init(
 					i,          // offset from 5' end
 					true,       // left-to-right?
 					fw,         // fw?
 					q.length(), // query length
 					pri);       // root priority
 				i += interval;
 				first = false;
 			}
 		}
 		// Put down right-to-left roots w/r/t forward and reverse-complement reads
 		{
 			bool first = true;
 			size_t i = 0;
 			while(first || (i + landing_ <= q.length())) {
 				confs.expand();
 				confs.back().cons.init(landing_, consExp_);
 				roots.expand();
 				roots.back().init(
 					q.length() - i - 1, // offset from 5' end
 					false,              // left-to-right?
 					fw,                 // fw?
 					q.length(),         // query length
 					pri);               // root priority
 				i += interval;
 				first = false;
 			}
 		}
 	}
 }
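 /*
 * Worked example of the root placement above (illustrative only; the numbers
 * are assumptions, not defaults).  Suppose q.length() == 50, landing_ == 20,
 * the read is unpaired and rootIval_ evaluates to an interval of 10.  For each
 * strand that is not filtered out by nofw/norc:
 *
 *   left-to-right roots at offsets  0, 10, 20, 30  (30 + 20 <= 50, 40 + 20 > 50)
 *   right-to-left roots at offsets 49, 39, 29, 19  (q.length() - i - 1)
 *
 * That is 8 roots per strand, or 16 roots and 16 DescentConfigs in total when
 * both strands are searched.
 */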
--- a/aligner_driver.h
+++ b/aligner_driver.h
@ -0,0 +1,247 @@
 /*
 * Copyright 2012, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 /*
 * aligner_driver.h
 *
 * REDUNDANT SEED HITS
 *
 * We say that two seed hits are redundant if they trigger identical
 * seed-extend dynamic programming problems.  Put another way, they both lie on
 * the same diagonal of the overall read/reference dynamic programming matrix.
 * Detecting redundant seed hits is simple when the seed hits are ungapped.  We
 * do this after offset resolution but before the offset is converted to genome
 * coordinates (see uses of the seenDiags1_/seenDiags2_ fields for examples).
 *
 * REDUNDANT ALIGNMENTS
 *
 * In an unpaired context, we say that two alignments are redundant if they
 * share any cells in the global DP table.  Roughly speaking, this is like
 * saying that two alignments are redundant if any read character aligns to the
 * same reference character (same reference sequence, same strand, same offset)
 * in both alignments.
 *
 * In a paired-end context, we say that two paired-end alignments are redundant
 * if the mate #1s are redundant and the mate #2s are redundant.
 *
 * How do we enforce this?  In the unpaired context, this is relatively simple:
 * the cells from each alignment are checked against a set containing all cells
 * from all previous alignments.  Given a new alignment, for each cell in the
 * new alignment we check whether it is in the set.  If there is any overlap,
 * the new alignment is rejected as redundant.  Otherwise, the new alignment is
 * accepted and its cells are added to the set.
 *
 * Enforcement in a paired context is a little trickier.  Consider the
 * following approaches:
 *
 * 1. Skip anchors that are redundant with any previous anchor or opposite
 *    alignment.  This is sufficient to ensure no two concordant alignments
 *    found are redundant.
 *
 * 2. Same as scheme 1, but with a "transitive closure" scheme for finding all
 *    concordant pairs in the vicinity of an anchor.  Consider the AB/AC
 *    scenario from the previous paragraph.  If B is the anchor alignment, we
 *    will find AB but not AC.  But under this scheme, once we find AB we then
 *    let B be a new anchor and immediately look for its opposites.  Likewise,
 *    if we find any opposite, we make them anchors and continue searching.  We
 *    don't stop searching until every opposite is used as an anchor.
 *
 * 3. Skip anchors that are redundant with any previous anchor alignment (but
 *    allow anchors that are redundant with previous opposite alignments).
 *    This isn't sufficient to avoid redundant concordant alignments.  To avoid
 *    redundant concordants, we need an additional procedure that checks each
 *    new concordant alignment one-by-one against a list of previous concordant
 *    alignments to see if it is redundant.
 *
 * We take approach 1.
 */
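 /*
 * Sketch of the unpaired redundancy check described above (illustrative only;
 * the names below are hypothetical and this is not the RedundantAlns
 * implementation).  A cell is simplified to a (row, column) pair; a real cell
 * would also have to identify the reference sequence and strand:
 *
 *   typedef std::pair<size_t, size_t> Cell;
 *   // Returns true and records the cells iff 'aln' shares no cell with any
 *   // previously accepted alignment
 *   bool acceptIfNovel(std::set<Cell>& seen, const std::vector<Cell>& aln) {
 *       for(size_t i = 0; i < aln.size(); i++) {
 *           if(seen.count(aln[i]) > 0) return false; // redundant; reject
 *       }
 *       seen.insert(aln.begin(), aln.end());
 *       return true;
 *   }
 */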
 #ifndef ALIGNER_DRIVER_H_
 #define ALIGNER_DRIVER_H_
 #include "aligner_seed2.h"
 #include "simple_func.h"
 #include "aln_sink.h"
 /**
 * Concrete subclass of DescentRootSelector.  Puts a root every 'ival' chars,
 * where 'ival' is determined by user-specified parameters.  A root is filtered
 * out if the end of the read is less than 'landing' positions away, in the
 * direction of the search.
 */
 class AlignerDriverRootSelector : public DescentRootSelector {
 public:
 	AlignerDriverRootSelector(
 		double consExp,
 		const SimpleFunc& rootIval,
 		size_t landing)
 	{
 		consExp_ = consExp;
 		rootIval_ = rootIval;
 		landing_ = landing;
 	}
 	virtual ~AlignerDriverRootSelector() { }
 	virtual void select(
 		const Read& q,                 // read that we're selecting roots for
 		const Read* qo,                // opposite mate, if applicable
 		bool nofw,                     // don't add roots for fw read
 		bool norc,                     // don't add roots for rc read
 		EList<DescentConfig>& confs,   // put DescentConfigs here
 		EList<DescentRoot>& roots);    // put DescentRoot here
 protected:
 	double consExp_;
 	SimpleFunc rootIval_;
 	size_t landing_;
 };
 /**
 * Return values from extendSeeds and extendSeedsPaired.
 */
 enum {
 	// Candidates were examined exhaustively
 	ALDRIVER_EXHAUSTED_CANDIDATES = 1,
 	// The policy does not need us to look any further
 	ALDRIVER_POLICY_FULFILLED,
 	// We stopped because we ran up against a limit on how much work we should
 	// do for one set of seed ranges, e.g. the limit on number of consecutive
 	// unproductive DP extensions
 	ALDRIVER_EXCEEDED_LIMIT
 };
 /**
 * This class is the glue between a DescentDriver and the dynamic programming
 * implementations in Bowtie 2.  The DescentDriver is used to find some very
 * high-scoring alignments, but is additionally used to rank partial alignments
 * so that they can be extended using dynamic programming.
 */
 template <typename index_t>
 class AlignerDriver {
 public:
 	AlignerDriver(
 		double consExp,
 		const SimpleFunc& rootIval,
 		size_t landing,
 		bool veryVerbose,
 		const SimpleFunc& totsz,
 		const SimpleFunc& totfmops) :
 		sel_(consExp, rootIval, landing),
 		alsel_(),
 		dr1_(veryVerbose),
 		dr2_(veryVerbose)
 	{
 		totsz_ = totsz;
 		totfmops_ = totfmops;
 	}
 	/**
 	 * Initialize driver with respect to a new read or pair.
 	 */
 	void initRead(
 		const Read& q1,
 		bool nofw,
 		bool norc,
 		TAlScore minsc,
 		TAlScore maxpen,
 		const Read* q2)
 	{
 		dr1_.initRead(q1, nofw, norc, minsc, maxpen, q2, &sel_);
 		red1_.init(q1.length());
 		paired_ = false;
 		if(q2 != NULL) {
 			dr2_.initRead(*q2, nofw, norc, minsc, maxpen, &q1, &sel_);
 			red2_.init(q2->length());
 			paired_ = true;
 		} else {
 			dr2_.reset();
 		}
 		size_t totsz = totsz_.f<size_t>(q1.length());
 		size_t totfmops = totfmops_.f<size_t>(q1.length());
 		stop_.init(
 			totsz,
 			0,
 			true,
 			totfmops);
 	}
 	/**
 	 * Start the driver.  The driver will begin by conducting a best-first,
 	 * index-assisted search through the space of possible full and partial
 	 * alignments.  This search may be followed up with a dynamic programming
 	 * extension step, taking a prioritized set of partial SA ranges found
 	 * during the search and extending each with DP.  The process might also be
 	 * iterated, with the search being occasionally halted so that DPs can be
 	 * tried, then restarted, etc.
 	 */
 	int go(
 		const Scoring& sc,
 		const GFM<index_t>& gfmFw,
 		const GFM<index_t>& gfmBw,
 		const BitPairReference& ref,
 		DescentMetrics& met,
 		WalkMetrics& wlm,
 		PerReadMetrics& prm,
 		RandomSource& rnd,
 		AlnSinkWrap<index_t>& sink);
 	/**
 	 * Reset state of all DescentDrivers.
 	 */
 	void reset() {
 		dr1_.reset();
 		dr2_.reset();
 		red1_.reset();
 		red2_.reset();
 	}
 protected:
 	AlignerDriverRootSelector sel_;   // selects where roots should go
 	DescentAlignmentSelector<index_t> alsel_;  // one selector can deal with >1 drivers
 	DescentDriver<index_t> dr1_;               // driver for mate 1/unpaired reads
 	DescentDriver<index_t> dr2_;               // driver for mate 2 (paired-end reads only)
 	DescentStoppingConditions stop_;  // when to pause index-assisted BFS
 	bool paired_;                     // current read is paired?
 	SimpleFunc totsz_;      // memory limit on best-first search data
 	SimpleFunc totfmops_;   // max # FM ops for best-first search
 	// For detecting redundant alignments
 	RedundantAlns  red1_;   // database of cells used for mate 1 alignments
 	RedundantAlns  red2_;   // database of cells used for mate 2 alignments
 	// For AlnRes::matchesRef
 	ASSERT_ONLY(SStringExpandable<char> raw_refbuf_);
 	ASSERT_ONLY(SStringExpandable<uint32_t> raw_destU32_);
 	ASSERT_ONLY(EList<bool> raw_matches_);
 	ASSERT_ONLY(BTDnaString tmp_rf_);
 	ASSERT_ONLY(BTDnaString tmp_rdseq_);
 	ASSERT_ONLY(BTString tmp_qseq_);
 	ASSERT_ONLY(EList<index_t> tmp_reflens_);
 	ASSERT_ONLY(EList<index_t> tmp_refoffs_);
 };
 #endif /* defined(ALIGNER_DRIVER_H_) */
--- a/aligner_metrics.h
+++ b/aligner_metrics.h
@ -0,0 +1,352 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_METRICS_H_
 #define ALIGNER_METRICS_H_
 #include <math.h>
 #include <iostream>
 #include "alphabet.h"
 #include "timer.h"
 #include "sstring.h"
 using namespace std;
 /**
 * Borrowed from http://www.johndcook.com/standard_deviation.html,
 * which in turn is borrowed from Knuth.
 */
 class RunningStat {
 public:
 	RunningStat() : m_n(0), m_tot(0.0), m_oldM(0.0), m_newM(0.0), m_oldS(0.0), m_newS(0.0) { }
 	void clear() {
 		m_n = 0;
 		m_tot = 0.0;
 	}
 	void push(float x) {
 		m_n++;
 		m_tot += x;
 		// See Knuth TAOCP vol 2, 3rd edition, page 232
 		if (m_n == 1) {
 			m_oldM = m_newM = x;
 			m_oldS = 0.0;
 		} else {
 			m_newM = m_oldM + (x - m_oldM)/m_n;
 			m_newS = m_oldS + (x - m_oldM)*(x - m_newM);
 			// set up for next iteration
 			m_oldM = m_newM;
 			m_oldS = m_newS;
 		}
 	}
 	int num() const {
 		return m_n;
 	}
 	double tot() const {
 		return m_tot;
 	}
 	double mean() const {
 		return (m_n > 0) ? m_newM : 0.0;
 	}
 	double variance() const {
 		return ( (m_n > 1) ? m_newS/(m_n - 1) : 0.0 );
 	}
 	double stddev() const {
 		return sqrt(variance());
 	}
 private:
 	int m_n;
 	double m_tot;
 	double m_oldM, m_newM, m_oldS, m_newS;
 };
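 /*
 * Usage sketch for RunningStat (illustrative only; not part of the original
 * source).  Values are pushed one at a time and the mean and sample variance
 * are updated incrementally, so no samples need to be stored:
 *
 *   RunningStat rs;
 *   rs.push(2.0f);
 *   rs.push(4.0f);
 *   rs.push(6.0f);
 *   // rs.num() == 3, rs.tot() == 12.0, rs.mean() == 4.0
 *   // rs.variance() == 4.0 (sample variance, divides by n - 1)
 *   // rs.stddev() == 2.0
 */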
 /**
 * Encapsulates a set of metrics that we would like an aligner to keep
 * track of, so that we can possibly use it to diagnose performance
 * issues.
 */
 class AlignerMetrics {
 public:
 	AlignerMetrics() :
 		curBacktracks_(0),
 		curBwtOps_(0),
 		first_(true),
 		curIsLowEntropy_(false),
 		curIsHomoPoly_(false),
 		curHadRanges_(false),
 		curNumNs_(0),
 		reads_(0),
 		homoReads_(0),
 		lowEntReads_(0),
 		hiEntReads_(0),
 		alignedReads_(0),
 		unalignedReads_(0),
 		threeOrMoreNReads_(0),
 		lessThanThreeNRreads_(0),
 		bwtOpsPerRead_(),
 		backtracksPerRead_(),
 		bwtOpsPerHomoRead_(),
 		backtracksPerHomoRead_(),
 		bwtOpsPerLoEntRead_(),
 		backtracksPerLoEntRead_(),
 		bwtOpsPerHiEntRead_(),
 		backtracksPerHiEntRead_(),
 		bwtOpsPerAlignedRead_(),
 		backtracksPerAlignedRead_(),
 		bwtOpsPerUnalignedRead_(),
 		backtracksPerUnalignedRead_(),
 		bwtOpsPer0nRead_(),
 		backtracksPer0nRead_(),
 		bwtOpsPer1nRead_(),
 		backtracksPer1nRead_(),
 		bwtOpsPer2nRead_(),
 		backtracksPer2nRead_(),
 		bwtOpsPer3orMoreNRead_(),
 		backtracksPer3orMoreNRead_(),
 		timer_(cout, "", false)
 		{ }
 	void printSummary() {
 		if(!first_) {
 			finishRead();
 		}
 		cout << "AlignerMetrics:" << endl;
 		cout << "  # Reads:             " << reads_ << endl;
 		float hopct = (reads_ > 0) ? (((float)homoReads_)/((float)reads_)) : (0.0f);
 		hopct *= 100.0f;
 		cout << "  % homo-polymeric:    " << (hopct) << endl;
 		float lopct = (reads_ > 0) ? ((float)lowEntReads_/(float)(reads_)) : (0.0f);
 		lopct *= 100.0f;
 		cout << "  % low-entropy:       " << (lopct) << endl;
 		float unpct = (reads_ > 0) ? ((float)unalignedReads_/(float)(reads_)) : (0.0f);
 		unpct *= 100.0f;
 		cout << "  % unaligned:         " << (unpct) << endl;
 		float npct = (reads_ > 0) ? ((float)threeOrMoreNReads_/(float)(reads_)) : (0.0f);
 		npct *= 100.0f;
 		cout << "  % with 3 or more Ns: " << (npct) << endl;
 		cout << endl;
 		cout << "  Total BWT ops:    avg: " << bwtOpsPerRead_.mean() << ", stddev: " << bwtOpsPerRead_.stddev() << endl;
 		cout << "  Total Backtracks: avg: " << backtracksPerRead_.mean() << ", stddev: " << backtracksPerRead_.stddev() << endl;
 		time_t elapsed = timer_.elapsed();
 		cout << "  BWT ops per second:    " << (bwtOpsPerRead_.tot()/elapsed) << endl;
 		cout << "  Backtracks per second: " << (backtracksPerRead_.tot()/elapsed) << endl;
 		cout << endl;
 		cout << "  Homo-poly:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPerHomoRead_.mean() << ", stddev: " << bwtOpsPerHomoRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPerHomoRead_.mean() << ", stddev: " << backtracksPerHomoRead_.stddev() << endl;
 		cout << "  Low-entropy:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPerLoEntRead_.mean() << ", stddev: " << bwtOpsPerLoEntRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPerLoEntRead_.mean() << ", stddev: " << backtracksPerLoEntRead_.stddev() << endl;
 		cout << "  High-entropy:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPerHiEntRead_.mean() << ", stddev: " << bwtOpsPerHiEntRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPerHiEntRead_.mean() << ", stddev: " << backtracksPerHiEntRead_.stddev() << endl;
 		cout << endl;
 		cout << "  Unaligned:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPerUnalignedRead_.mean() << ", stddev: " << bwtOpsPerUnalignedRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPerUnalignedRead_.mean() << ", stddev: " << backtracksPerUnalignedRead_.stddev() << endl;
 		cout << "  Aligned:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPerAlignedRead_.mean() << ", stddev: " << bwtOpsPerAlignedRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPerAlignedRead_.mean() << ", stddev: " << backtracksPerAlignedRead_.stddev() << endl;
 		cout << endl;
 		cout << "  0 Ns:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPer0nRead_.mean() << ", stddev: " << bwtOpsPer0nRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPer0nRead_.mean() << ", stddev: " << backtracksPer0nRead_.stddev() << endl;
 		cout << "  1 N:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPer1nRead_.mean() << ", stddev: " << bwtOpsPer1nRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPer1nRead_.mean() << ", stddev: " << backtracksPer1nRead_.stddev() << endl;
 		cout << "  2 Ns:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPer2nRead_.mean() << ", stddev: " << bwtOpsPer2nRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPer2nRead_.mean() << ", stddev: " << backtracksPer2nRead_.stddev() << endl;
 		cout << "  >2 Ns:" << endl;
 		cout << "    BWT ops:    avg: " << bwtOpsPer3orMoreNRead_.mean() << ", stddev: " << bwtOpsPer3orMoreNRead_.stddev() << endl;
 		cout << "    Backtracks: avg: " << backtracksPer3orMoreNRead_.mean() << ", stddev: " << backtracksPer3orMoreNRead_.stddev() << endl;
 		cout << endl;
 	}
 	/**
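 	 * Start tracking a new read.  If a previous read is in progress, its
 	 * statistics are committed first via finishRead().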
 	 */
 	void nextRead(const BTDnaString& read) {
 		if(!first_) {
 			finishRead();
 		}
 		first_ = false;
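 		// Entropy calculation is disabled, so 'ent' is always 0 and every read
 		// is currently classified as both low-entropy and homo-polymeric
 		//float ent = entropyDna5(read);
 		float ent = 0.0f;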
 		curIsLowEntropy_ = (ent < 0.75f);
 		curIsHomoPoly_ = (ent < 0.001f);
 		curHadRanges_ = false;
 		curBwtOps_ = 0;
 		curBacktracks_ = 0;
 		// Count Ns
 		curNumNs_ = 0;
 		const size_t len = read.length();
 		for(size_t i = 0; i < len; i++) {
 			if((int)read[i] == 4) curNumNs_++;
 		}
 	}
 	/**
 	 * Record that at least one range was reported for the current read.
 	 */
 	void setReadHasRange() {
 		curHadRanges_ = true;
 	}
 	/**
 	 * Commit the running statistics for the current read to the overall and
 	 * per-category counters and distributions.
 	 */
 	void finishRead() {
 		reads_++;
 		if(curIsHomoPoly_) homoReads_++;
 		else if(curIsLowEntropy_) lowEntReads_++;
 		else hiEntReads_++;
 		if(curHadRanges_) alignedReads_++;
 		else unalignedReads_++;
 		bwtOpsPerRead_.push((float)curBwtOps_);
 		backtracksPerRead_.push((float)curBacktracks_);
 		// Drill down by entropy
 		if(curIsHomoPoly_) {
 			bwtOpsPerHomoRead_.push((float)curBwtOps_);
 			backtracksPerHomoRead_.push((float)curBacktracks_);
 		} else if(curIsLowEntropy_) {
 			bwtOpsPerLoEntRead_.push((float)curBwtOps_);
 			backtracksPerLoEntRead_.push((float)curBacktracks_);
 		} else {
 			bwtOpsPerHiEntRead_.push((float)curBwtOps_);
 			backtracksPerHiEntRead_.push((float)curBacktracks_);
 		}
 		// Drill down by whether it aligned
 		if(curHadRanges_) {
 			bwtOpsPerAlignedRead_.push((float)curBwtOps_);
 			backtracksPerAlignedRead_.push((float)curBacktracks_);
 		} else {
 			bwtOpsPerUnalignedRead_.push((float)curBwtOps_);
 			backtracksPerUnalignedRead_.push((float)curBacktracks_);
 		}
 		if(curNumNs_ == 0) {
 			lessThanThreeNRreads_++;
 			bwtOpsPer0nRead_.push((float)curBwtOps_);
 			backtracksPer0nRead_.push((float)curBacktracks_);
 		} else if(curNumNs_ == 1) {
 			lessThanThreeNRreads_++;
 			bwtOpsPer1nRead_.push((float)curBwtOps_);
 			backtracksPer1nRead_.push((float)curBacktracks_);
 		} else if(curNumNs_ == 2) {
 			lessThanThreeNRreads_++;
 			bwtOpsPer2nRead_.push((float)curBwtOps_);
 			backtracksPer2nRead_.push((float)curBacktracks_);
 		} else {
 			threeOrMoreNReads_++;
 			bwtOpsPer3orMoreNRead_.push((float)curBwtOps_);
 			backtracksPer3orMoreNRead_.push((float)curBacktracks_);
 		}
 	}
 	// Running-total of the number of backtracks and BWT ops for the
 	// current read
 	uint32_t curBacktracks_;
 	uint32_t curBwtOps_;
 protected:
 	bool first_;
 	// true iff the current read is low entropy
 	bool curIsLowEntropy_;
 	// true if current read is all 1 char (or very close)
 	bool curIsHomoPoly_;
 	// true iff the current read has had one or more ranges reported
 	bool curHadRanges_;
 	// number of Ns in current read
 	int curNumNs_;
 	// # reads
 	uint32_t reads_;
 	// # homo-poly reads
 	uint32_t homoReads_;
 	// # low-entropy reads
 	uint32_t lowEntReads_;
 	// # high-entropy reads
 	uint32_t hiEntReads_;
 	// # reads with alignments
 	uint32_t alignedReads_;
 	// # reads without alignments
 	uint32_t unalignedReads_;
 	// # reads with 3 or more Ns
 	uint32_t threeOrMoreNReads_;
 	// # reads with < 3 Ns
 	uint32_t lessThanThreeNRreads_;
 	// Distribution of BWT operations per read
 	RunningStat bwtOpsPerRead_;
 	RunningStat backtracksPerRead_;
 	// Distribution of BWT operations per homo-poly read
 	RunningStat bwtOpsPerHomoRead_;
 	RunningStat backtracksPerHomoRead_;
 	// Distribution of BWT operations per low-entropy read
 	RunningStat bwtOpsPerLoEntRead_;
 	RunningStat backtracksPerLoEntRead_;
 	// Distribution of BWT operations per high-entropy read
 	RunningStat bwtOpsPerHiEntRead_;
 	RunningStat backtracksPerHiEntRead_;
 	// Distribution of BWT operations per read that "aligned" (for
 	// which a range was arrived at - range may not necessarily have
 	// led to an alignment)
 	RunningStat bwtOpsPerAlignedRead_;
 	RunningStat backtracksPerAlignedRead_;
 	// Distribution of BWT operations per read that didn't align
 	RunningStat bwtOpsPerUnalignedRead_;
 	RunningStat backtracksPerUnalignedRead_;
 	// Distribution of BWT operations/backtracks per read with no Ns
 	RunningStat bwtOpsPer0nRead_;
 	RunningStat backtracksPer0nRead_;
 	// Distribution of BWT operations/backtracks per read with one N
 	RunningStat bwtOpsPer1nRead_;
 	RunningStat backtracksPer1nRead_;
 	// Distribution of BWT operations/backtracks per read with two Ns
 	RunningStat bwtOpsPer2nRead_;
 	RunningStat backtracksPer2nRead_;
 	// Distribution of BWT operations/backtracks per read with three or
 	// more Ns
 	RunningStat bwtOpsPer3orMoreNRead_;
 	RunningStat backtracksPer3orMoreNRead_;
 	Timer timer_;
 };
 #endif /* ALIGNER_METRICS_H_ */
--- a/aligner_report.h
+++ b/aligner_report.h
@ -0,0 +1,35 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_REPORT_H_
 #define ALIGNER_REPORT_H_
 #include "aligner_cache.h"
 class Reporter {
 public:
 	/**
 	 *
 	 */
 	bool report(const AlignmentCacheIface<uint32_t>& cache, const QVal<uint32_t>& qv) {
 		return true; // don't retry
 	}
 };
 #endif /*ALIGNER_REPORT_H_*/
--- a/aligner_result.cpp
+++ b/aligner_result.cpp
--- a/aligner_result.h
+++ b/aligner_result.h
--- a/aligner_seed.cpp
+++ b/aligner_seed.cpp
@ -0,0 +1,530 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include "aligner_cache.h"
 #include "aligner_seed.h"
 #include "search_globals.h"
 #include "gfm.h"
 using namespace std;
 /**
 * Construct a constraint with no edits of any kind allowed.
 */
 Constraint Constraint::exact() {
 	Constraint c;
 	c.edits = c.mms = c.ins = c.dels = c.penalty = 0;
 	return c;
 }
 /**
 * Construct a constraint where the only constraint is a total
 * penalty constraint.
 */
 Constraint Constraint::penaltyBased(int pen) {
 	Constraint c;
 	c.penalty = pen;
 	return c;
 }
 /**
 * Construct a constraint where the only constraint is a total
 * penalty constraint related to the length of the read.
 */
 Constraint Constraint::penaltyFuncBased(const SimpleFunc& f) {
 	Constraint c;
 	c.penFunc = f;
 	return c;
 }
 /**
 * Construct a constraint where the only constraint is a limit on
 * the number of mismatches; no gaps or other edits are allowed.
 */
 Constraint Constraint::mmBased(int mms) {
 	Constraint c;
 	c.mms = mms;
 	c.edits = c.dels = c.ins = 0;
 	return c;
 }
 /**
 * Construct a constraint where the only constraint is a limit on
 * the total number of edits.
 */
 Constraint Constraint::editBased(int edits) {
 	Constraint c;
 	c.edits = edits;
 	c.dels = c.ins = c.mms = 0;
 	return c;
 }
 //
 // Some static methods for constructing some standard SeedPolicies
 //
 /**
 * Given a read, depth and orientation, extract a seed data structure
 * from the read and fill in the steps & zones arrays.  The Seed
 * contains the sequence and quality values.
 */
 bool
 Seed::instantiate(
 	const Read& read,
 	const BTDnaString& seq, // seed read sequence
 	const BTString& qual,   // seed quality sequence
 	const Scoring& pens,
 	int depth,
 	int seedoffidx,
 	int seedtypeidx,
 	bool fw,
 	InstantiatedSeed& is) const
 {
 	assert(overall != NULL);
 	int seedlen = len;
 	if((int)read.length() < seedlen) {
 		// Shrink seed length to fit read if necessary
 		seedlen = (int)read.length();
 	}
 	assert_gt(seedlen, 0);
 	is.steps.resize(seedlen);
 	is.zones.resize(seedlen);
 	// Fill in 'steps' and 'zones'
 	//
 	// The 'steps' list indicates which read character should be
 	// incorporated at each step of the search process.  Often we will
 	// simply proceed from one end to the other, in which case the
 	// 'steps' list is ascending or descending.  In some cases (e.g.
 	// the 2mm case), we might want to switch directions at least once
 	// during the search, in which case 'steps' will jump in the
 	// middle.  A negative element of 'steps' indicates a step that
 	// extends right-to-left; the read offset is abs(step)-1.
 	//
 	// The 'zones' list indicates which zone constraint is active at
 	// each step.  Each element of the 'zones' list is a pair; the
 	// first pair element indicates the applicable zone when
 	// considering either mismatch or delete (ref gap) events, while
 	// the second pair element indicates the applicable zone when
 	// considering insertion (read gap) events.  When either pair
 	// element is a negative number, that indicates that we are about
 	// to leave the zone for good, at which point we may need to
 	// evaluate whether we have reached the zone's budget.
 	//
 	switch(type) {
 		case SEED_TYPE_EXACT: {
 			for(int k = 0; k < seedlen; k++) {
 				is.steps[k] = -(seedlen - k);
 				// Zone 0 all the way
 				is.zones[k].first = is.zones[k].second = 0;
 			}
 			break;
 		}
 		case SEED_TYPE_LEFT_TO_RIGHT: {
 			for(int k = 0; k < seedlen; k++) {
 				is.steps[k] = k+1;
 				// Zone 0 from 0 up to ceil(len/2), then 1
 				is.zones[k].first = is.zones[k].second = ((k < (seedlen+1)/2) ? 0 : 1);
 			}
 			// Zone 1 ends at the RHS
 			is.zones[seedlen-1].first = is.zones[seedlen-1].second = -1;
 			break;
 		}
 		case SEED_TYPE_RIGHT_TO_LEFT: {
 			for(int k = 0; k < seedlen; k++) {
 				is.steps[k] = -(seedlen - k);
 				// Zone 0 from 0 up to floor(len/2), then 1
 				is.zones[k].first  = ((k < seedlen/2) ? 0 : 1);
 				// Inserts: Zone 0 for k in [0, ceil(len/2)], then 1
 				is.zones[k].second = ((k < (seedlen+1)/2+1) ? 0 : 1);
 			}
 			is.zones[seedlen-1].first = is.zones[seedlen-1].second = -1;
 			break;
 		}
 		case SEED_TYPE_INSIDE_OUT: {
 			// Zone 0 from ceil(N/4) up to N-floor(N/4)
 			int step = 0;
 			for(int k = (seedlen+3)/4; k < seedlen - (seedlen/4); k++) {
 				is.zones[step].first = is.zones[step].second = 0;
 				is.steps[step++] = k+1;
 			}
 			// Zone 1 from N-floor(N/4) up
 			for(int k = seedlen - (seedlen/4); k < seedlen; k++) {
 				is.zones[step].first = is.zones[step].second = 1;
 				is.steps[step++] = k+1;
 			}
 			// No Zone 1 if seedlen is short (like 2)
 			//assert_eq(1, is.zones[step-1].first);
 			is.zones[step-1].first = is.zones[step-1].second = -1;
 			// Zone 2 from ((seedlen+3)/4)-1 down to 0
 			for(int k = ((seedlen+3)/4)-1; k >= 0; k--) {
 				is.zones[step].first = is.zones[step].second = 2;
 				is.steps[step++] = -(k+1);
 			}
 			assert_eq(2, is.zones[step-1].first);
 			is.zones[step-1].first = is.zones[step-1].second = -2;
 			assert_eq(seedlen, step);
 			break;
 		}
 		default:
 			throw 1;
 	}
 	// Instantiate constraints
 	for(int i = 0; i < 3; i++) {
 		is.cons[i] = zones[i];
 		is.cons[i].instantiate(read.length());
 	}
 	is.overall = *overall;
 	is.overall.instantiate(read.length());
 	// Take a sweep through the seed sequence.  Consider where the Ns
 	// occur and how zones are laid out.  Calculate the maximum number
 	// of positions we can jump over initially (e.g. with the ftab) and
 	// perhaps set this function's return value to false, indicating
 	// that the arrangement of Ns prevents the seed from aligning.
 	bool streak = true;
 	is.maxjump = 0;
 	bool ret = true;
 	bool ltr = (is.steps[0] > 0); // true -> left-to-right
 	for(size_t i = 0; i < is.steps.size(); i++) {
 		assert_neq(0, is.steps[i]);
 		int off = is.steps[i];
 		off = abs(off)-1;
 		Constraint& cons = is.cons[abs(is.zones[i].first)];
 		int c = seq[off];  assert_range(0, 4, c);
 		int q = qual[off];
 		if(ltr != (is.steps[i] > 0) || // changed direction
 		   is.zones[i].first != 0 ||   // changed zone
 		   is.zones[i].second != 0)    // changed zone
 		{
 			streak = false;
 		}
 		if(c == 4) {
 			// Induced mismatch
 			if(cons.canN(q, pens)) {
 				cons.chargeN(q, pens);
 			} else {
 				// Seed disqualified due to arrangement of Ns
 				return false;
 			}
 		}
 		if(streak) is.maxjump++;
 	}
 	is.seedoff = depth;
 	is.seedoffidx = seedoffidx;
 	is.fw = fw;
 	is.s = *this;
 	return ret;
 }
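The layout produced by the SEED_TYPE_INSIDE_OUT case above is easier to see on a concrete seed length. The following is a minimal standalone sketch, not part of the original source: `SeedLayout` and `insideOutLayout` are hypothetical names standing in for the `steps` and `zones` members of `InstantiatedSeed`, and the sketch assumes a seed length of at least 2.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 'steps' and 'zones' members of
// InstantiatedSeed; not part of the original source.
struct SeedLayout {
	std::vector<int> steps;                  // +/-(offset+1); sign gives direction
	std::vector<std::pair<int, int> > zones; // (mismatch/delete zone, insert zone)
};

// Mirrors the loops of the SEED_TYPE_INSIDE_OUT case for seedlen >= 2.
SeedLayout insideOutLayout(int seedlen) {
	assert(seedlen >= 2);
	SeedLayout l;
	l.steps.resize(seedlen);
	l.zones.resize(seedlen);
	int step = 0;
	// Zone 0: middle of the seed, extended left-to-right
	for(int k = (seedlen + 3) / 4; k < seedlen - (seedlen / 4); k++) {
		l.zones[step].first = l.zones[step].second = 0;
		l.steps[step++] = k + 1;
	}
	// Zone 1: right end of the seed, still left-to-right
	for(int k = seedlen - (seedlen / 4); k < seedlen; k++) {
		l.zones[step].first = l.zones[step].second = 1;
		l.steps[step++] = k + 1;
	}
	// A negative zone marks the last step before the zone is left for good
	l.zones[step - 1].first = l.zones[step - 1].second = -1;
	// Zone 2: left end of the seed, extended right-to-left
	for(int k = ((seedlen + 3) / 4) - 1; k >= 0; k--) {
		l.zones[step].first = l.zones[step].second = 2;
		l.steps[step++] = -(k + 1);
	}
	l.zones[step - 1].first = l.zones[step - 1].second = -2;
	assert(step == seedlen);
	return l;
}
```

For a seed of length 8 this yields steps 3, 4, 5, 6, 7, 8, -2, -1 and zones 0, 0, 0, 0, 1, -1, 2, -2: the search starts at offset 2, runs to the right end, then jumps back and covers offsets 1 and 0 moving leftward.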
 /**
 * Return a set consisting of 1 seed encapsulating an exact matching
 * strategy.
 */
 void
 Seed::zeroMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
 	oall.init();
 	// Seed policy 1: exact-match search (no edits allowed)
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_EXACT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::exact();
 	pols.back().zones[2] = Constraint::exact(); // not used
 	pols.back().overall = &oall;
 }
 /**
 * Return a set of 2 seeds encapsulating a half-and-half 1mm strategy.
 */
 void
 Seed::oneMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
 	oall.init();
 	// Seed policy 1: left-to-right search
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_LEFT_TO_RIGHT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::mmBased(1);
 	pols.back().zones[2] = Constraint::exact(); // not used
 	pols.back().overall = &oall;
 	// Seed policy 2: right-to-left search
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_RIGHT_TO_LEFT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::mmBased(1);
 	pols.back().zones[1].mmsCeil = 0;
 	pols.back().zones[2] = Constraint::exact(); // not used
 	pols.back().overall = &oall;
 }
 /**
 * Return a set of 3 seeds encapsulating search roots for:
 *
 * 1. Starting from the left-hand side and searching toward the
 *    right-hand side allowing 2 mismatches in the right half.
 * 2. Starting from the right-hand side and searching toward the
 *    left-hand side allowing 2 mismatches in the left half.
 * 3. Starting (effectively) from the center and searching out toward
 *    both the left and right-hand sides, allowing one mismatch on
 *    either side.
 *
 * This is not exhaustive.  There are 2 mismatch cases missed; if you
 * imagine the seed as divided into four successive quarters A, B, C
 * and D, the cases we miss are when mismatches occur in A and C or B
 * and D.
 */
 void
 Seed::twoMmSeeds(int ln, EList<Seed>& pols, Constraint& oall) {
 	oall.init();
 	// Seed policy 1: left-to-right search
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_LEFT_TO_RIGHT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::mmBased(2);
 	pols.back().zones[2] = Constraint::exact(); // not used
 	pols.back().overall = &oall;
 	// Seed policy 2: right-to-left search
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_RIGHT_TO_LEFT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::mmBased(2);
 	pols.back().zones[1].mmsCeil = 1; // Must have used at least 1 mismatch
 	pols.back().zones[2] = Constraint::exact(); // not used
 	pols.back().overall = &oall;
 	// Seed policy 3: inside-out search
 	pols.expand();
 	pols.back().len = ln;
 	pols.back().type = SEED_TYPE_INSIDE_OUT;
 	pols.back().zones[0] = Constraint::exact();
 	pols.back().zones[1] = Constraint::mmBased(1);
 	pols.back().zones[1].mmsCeil = 0; // Must have used at least 1 mismatch
 	pols.back().zones[2] = Constraint::mmBased(1);
 	pols.back().zones[2].mmsCeil = 0; // Must have used at least 1 mismatch
 	pols.back().overall = &oall;
 }
 /**
 * Types of actions that can be taken by the SeedAligner.
 */
 enum {
 	SA_ACTION_TYPE_RESET = 1,
 	SA_ACTION_TYPE_SEARCH_SEED, // 2
 	SA_ACTION_TYPE_FTAB,        // 3
 	SA_ACTION_TYPE_FCHR,        // 4
 	SA_ACTION_TYPE_MATCH,       // 5
 	SA_ACTION_TYPE_EDIT         // 6
 };
 #define MIN(x, y) (((x) < (y)) ? (x) : (y))
 #ifdef ALIGNER_SEED_MAIN
 #include <getopt.h>
 #include <string>
 /**
 * Parse a base-10 integer out of 'arg'.  If 'arg' is not a well-formed
 * integer, then output the given error message and throw.
 */
 static int parseInt(const char *errmsg, const char *arg) {
 	long l;
 	char *endPtr = NULL;
 	l = strtol(arg, &endPtr, 10);
 	// strtol always sets endPtr, so a NULL check proves nothing; require
 	// that at least one digit was consumed and that nothing trails it
 	if (endPtr != arg && *endPtr == '\0') {
 		return (int32_t)l;
 	}
 	cerr << errmsg << endl;
 	throw 1;
 	return -1;
 }
 enum {
 	ARG_NOFW = 256,
 	ARG_NORC,
 	ARG_MM,
 	ARG_SHMEM,
 	ARG_TESTS,
 	ARG_RANDOM_TESTS,
 	ARG_SEED
 };
 static const char *short_opts = "vCt";
 static struct option long_opts[] = {
 	{(char*)"verbose",  no_argument,       0, 'v'},
 	{(char*)"color",    no_argument,       0, 'C'},
 	{(char*)"timing",   no_argument,       0, 't'},
 	{(char*)"nofw",     no_argument,       0, ARG_NOFW},
 	{(char*)"norc",     no_argument,       0, ARG_NORC},
 	{(char*)"mm",       no_argument,       0, ARG_MM},
 	{(char*)"shmem",    no_argument,       0, ARG_SHMEM},
 	{(char*)"tests",    no_argument,       0, ARG_TESTS},
 	{(char*)"random",   required_argument, 0, ARG_RANDOM_TESTS},
 	{(char*)"seed",     required_argument, 0, ARG_SEED},
 	{(char*)0,          0,                 0, 0} // all-zero terminator required by getopt_long
 };
 static void printUsage(ostream& os) {
 	os << "Usage: ac [options]* <index> <patterns>" << endl;
 	os << "Options:" << endl;
 	os << "  --mm                memory-mapped mode" << endl;
 	os << "  --shmem             shared memory mode" << endl;
 	os << "  --nofw              don't align forward-oriented read" << endl;
 	os << "  --norc              don't align reverse-complemented read" << endl;
 	os << "  -t/--timing         show timing information" << endl;
 	os << "  -C/--color          colorspace mode" << endl;
 	os << "  -v/--verbose        talkative mode" << endl;
 }
 bool gNorc = false;
 bool gNofw = false;
 bool gColor = false;
 int gVerbose = 0;
 int gGapBarrier = 1;
 bool gColorExEnds = true;
 int gSnpPhred = 30;
 bool gReportOverhangs = true;
 extern void aligner_seed_tests();
 extern void aligner_random_seed_tests(
 	int num_tests,
 	uint32_t qslo,
 	uint32_t qshi,
 	bool color,
 	uint32_t seed);
 /**
 * A way of feeding simple tests to the seed alignment infrastructure.
 */
 int main(int argc, char **argv) {
 	bool useMm = false;
 	bool useShmem = false;
 	bool mmSweep = false;
 	bool noRefNames = false;
 	bool sanity = false;
 	bool timing = false;
 	int option_index = 0;
 	int seed = 777;
 	int next_option;
 	do {
 		next_option = getopt_long(
 			argc, argv, short_opts, long_opts, &option_index);
 		switch (next_option) {
 			case 'v':       gVerbose = true; break;
 			case 'C':       gColor   = true; break;
 			case 't':       timing   = true; break;
 			case ARG_NOFW:  gNofw    = true; break;
 			case ARG_NORC:  gNorc    = true; break;
 			case ARG_MM:    useMm    = true; break;
 			case ARG_SHMEM: useShmem = true; break;
 			case ARG_SEED:  seed = parseInt("", optarg); break;
 			case ARG_TESTS: {
 				aligner_seed_tests();
 				aligner_random_seed_tests(
 					100,     // num references
 					100,   // queries per reference lo
 					400,   // queries per reference hi
 					false, // true -> generate colorspace reference/reads
 					18);   // pseudo-random seed
 				return 0;
 			}
 			case ARG_RANDOM_TESTS: {
 				seed = parseInt("", optarg);
 				aligner_random_seed_tests(
 					100,   // num references
 					100,   // queries per reference lo
 					400,   // queries per reference hi
 					false, // true -> generate colorspace reference/reads
 					seed); // pseudo-random seed
 				return 0;
 			}
 			case -1: break;
 			default: {
 				cerr << "Unknown option: " << (char)next_option << endl;
 				printUsage(cerr);
 				exit(1);
 			}
 		}
 	} while(next_option != -1);
 	char *reffn;
 	if(optind >= argc) {
 		cerr << "No reference; quitting..." << endl;
 		return 1;
 	}
 	reffn = argv[optind++];
 	if(optind >= argc) {
 		cerr << "No reads; quitting..." << endl;
 		return 1;
 	}
 	string gfmBase(reffn);
 	BitPairReference ref(
 		gfmBase,     // base path
 		gColor,      // whether we expect it to be colorspace
 		sanity,      // whether to sanity-check reference as it's loaded
 		NULL,        // fasta files to sanity check reference against
 		NULL,        // another way of specifying original sequences
 		false,       // true -> infiles (2 args ago) contains raw seqs
 		useMm,       // use memory mapping to load index?
 		useShmem,    // use shared memory (not memory mapping)
 		mmSweep,     // touch all the pages after memory-mapping the index
 		gVerbose,    // verbose
 		gVerbose);   // verbose but just for startup messages
 	Timer *t = new Timer(cerr, "Time loading fw index: ", timing);
 	GFM gfmFw(
 		gfmBase,
 		0,           // don't need entireReverse for fw index
 		true,        // index is for the forward direction
 		-1,          // offrate (irrelevant)
 		useMm,       // whether to use memory-mapped files
 		useShmem,    // whether to use shared memory
 		mmSweep,     // sweep memory-mapped files
 		!noRefNames, // load names?
 		false,       // load SA sample?
 		true,        // load ftab?
 		true,        // load rstarts?
 		NULL,        // reference map, or NULL if none is needed
 		gVerbose,    // whether to be talkative
 		gVerbose,    // talkative during initialization
 		false,       // handle memory exceptions, don't pass them up
 		sanity);
 	delete t;
 	t = new Timer(cerr, "Time loading bw index: ", timing);
 	GFM gfmBw(
 		gfmBase + ".rev",
 		1,           // need entireReverse
 		false,       // index is for the backward direction
 		-1,          // offrate (irrelevant)
 		useMm,       // whether to use memory-mapped files
 		useShmem,    // whether to use shared memory
 		mmSweep,     // sweep memory-mapped files
 		!noRefNames, // load names?
 		false,       // load SA sample?
 		true,        // load ftab?
 		false,       // load rstarts?
 		NULL,        // reference map, or NULL if none is needed
 		gVerbose,    // whether to be talkative
 		gVerbose,    // talkative during initialization
 		false,       // handle memory exceptions, don't pass them up
 		sanity);
 	delete t;
 	for(int i = optind; i < argc; i++) {
 	}
 }
 #endif
--- a/aligner_seed.h
+++ b/aligner_seed.h
--- a/aligner_seed2.cpp
+++ b/aligner_seed2.cpp
--- a/aligner_seed2.h
+++ b/aligner_seed2.h
--- a/aligner_seed_policy.cpp
+++ b/aligner_seed_policy.cpp
@ -0,0 +1,916 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <string>
 #include <iostream>
 #include <sstream>
 #include <limits>
 #include "ds.h"
 #include "aligner_seed_policy.h"
 #include "mem_ids.h"
 using namespace std;
 static int parseFuncType(const std::string& otype) {
 	string type = otype;
 	if(type == "C" || type == "Constant") {
 		return SIMPLE_FUNC_CONST;
 	} else if(type == "L" || type == "Linear") {
 		return SIMPLE_FUNC_LINEAR;
 	} else if(type == "S" || type == "Sqrt") {
 		return SIMPLE_FUNC_SQRT;
 	} else if(type == "G" || type == "Log") {
 		return SIMPLE_FUNC_LOG;
 	}
 	std::cerr << "Error: Bad function type '" << otype.c_str()
 	          << "'.  Should be C (constant), L (linear), "
 	          << "S (square root) or G (natural log)." << std::endl;
 	throw 1;
 }
 #define PARSE_FUNC(fv) { \
 	if(ctoks.size() >= 1) { \
 		fv.setType(parseFuncType(ctoks[0])); \
 	} \
 	if(ctoks.size() >= 2) { \
 		double co; \
 		istringstream tmpss(ctoks[1]); \
 		tmpss >> co; \
 		fv.setConst(co); \
 	} \
 	if(ctoks.size() >= 3) { \
 		double ce; \
 		istringstream tmpss(ctoks[2]); \
 		tmpss >> ce; \
 		fv.setCoeff(ce); \
 	} \
 	if(ctoks.size() >= 4) { \
 		double mn; \
 		istringstream tmpss(ctoks[3]); \
 		tmpss >> mn; \
 		fv.setMin(mn); \
 	} \
 	if(ctoks.size() >= 5) { \
 		double mx; \
 		istringstream tmpss(ctoks[4]); \
 		tmpss >> mx; \
 		fv.setMax(mx); \
 	} \
 }
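To make the token order handled by PARSE_FUNC concrete, here is a minimal standalone sketch of parsing and evaluating a function spec such as `L,0.0,0.15`. `FuncSpec` and `parseFuncSpec` are hypothetical names, not part of the original source. The evaluation rule (constant + coefficient * f(len), clamped to [min, max]) is an assumption about how SimpleFunc behaves, and the fifth token is treated as the maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for SimpleFunc; field meanings are assumptions.
struct FuncSpec {
	char   type;  // 'C' constant, 'L' linear, 'S' square root, 'G' natural log
	double konst; // constant term
	double coeff; // coefficient applied to f(x)
	double lo;    // lower clamp
	double hi;    // upper clamp
	FuncSpec() :
		type('C'), konst(0.0), coeff(0.0),
		lo(-std::numeric_limits<double>::max()),
		hi(std::numeric_limits<double>::max()) { }
	// Assumed rule: konst + coeff * f(x), clamped to [lo, hi]
	double eval(double x) const {
		double fx = 0.0;
		switch(type) {
			case 'L': fx = x;            break;
			case 'S': fx = std::sqrt(x); break;
			case 'G': fx = std::log(x);  break;
			default:  fx = 0.0;          break;
		}
		return std::max(lo, std::min(hi, konst + coeff * fx));
	}
};

// Token order follows PARSE_FUNC: type, constant, coefficient, min, max.
// Missing trailing tokens keep their defaults.
FuncSpec parseFuncSpec(const std::string& val) {
	std::vector<std::string> toks;
	std::istringstream ss(val);
	std::string tok;
	while(std::getline(ss, tok, ',')) toks.push_back(tok);
	FuncSpec f;
	if(!toks.empty() && !toks[0].empty()) f.type = toks[0][0];
	double* fields[] = { &f.konst, &f.coeff, &f.lo, &f.hi };
	for(size_t i = 1; i < toks.size() && i <= 4; i++) {
		std::istringstream ts(toks[i]);
		ts >> *fields[i - 1];
	}
	return f;
}
```

Under these assumptions, `parseFuncSpec("S,1.0,0.5").eval(100.0)` evaluates 1.0 + 0.5 * sqrt(100), and adding a fourth token of 8.0 raises the result to that minimum.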
 /**
 * Parse alignment policy when provided in this format:
 * <lab>=<val>;<lab>=<val>;<lab>=<val>...
 *
 * And label=value possibilities are:
 *
 * Bonus for a match
 * -----------------
 *
 * MA=xx (default: MA=0, or MA=2 if --local is set)
 *
 *    xx = Each position where equal read and reference characters match up
 *         in the alignment contributes this amount to the total score.
 *
 * Penalty for a mismatch
 * ----------------------
 *
 * MMP={Cxx|Q|RQ} (default: MMP=C6)
 *
 *   Cxx = Each mismatch costs xx.  If MMP=Cxx is specified, quality
 *         values are ignored when assessing penalties for mismatches.
 *   Q   = Each mismatch incurs a penalty equal to the mismatched base's
 *         quality value.
 *   R   = Each mismatch incurs a penalty equal to the mismatched base's
 *         rounded quality value.  Qualities are rounded off to the
 *         nearest 10, and qualities greater than 30 are rounded to 30.
 *
 * Penalty for position with N (in either read or reference)
 * ---------------------------------------------------------
 *
 * NP={Cxx|Q|RQ} (default: NP=C1)
 *
 *   Cxx = Each alignment position with an N in either the read or the
 *         reference costs xx.  If NP=Cxx is specified, quality values are
 *         ignored when assessing penalties for Ns.
 *   Q   = Each alignment position with an N in either the read or the
 *         reference incurs a penalty equal to the read base's quality
 *         value.
 *   R   = Each alignment position with an N in either the read or the
 *         reference incurs a penalty equal to the read base's rounded
 *         quality value.  Qualities are rounded off to the nearest 10,
 *         and qualities greater than 30 are rounded to 30.
 *
 * Penalty for a read gap
 * ----------------------
 *
 * RDG=xx,yy (default: RDG=5,3)
 *
 *   xx    = Read gap open penalty.
 *   yy    = Read gap extension penalty.
 *
 * Total cost incurred by a read gap = xx + (yy * gap length)
 *
 * Penalty for a reference gap
 * ---------------------------
 *
 * RFG=xx,yy (default: RFG=5,3)
 *
 *   xx    = Reference gap open penalty.
 *   yy    = Reference gap extension penalty.
 *
 * Total cost incurred by a reference gap = xx + (yy * gap length)
 *
 * Minimum score for valid alignment
 * ---------------------------------
 *
 * MIN=xx,yy (defaults: MIN=-0.6,-0.6, or MIN=0.0,0.66 if --local is set)
 *
 *   xx,yy = For a read of length N, the total score must be at least
 *           xx + (read length * yy) for the alignment to be valid.  The
 *           total score is the sum of all negative penalties (from
 *           mismatches and gaps) and all positive bonuses.  The minimum
 *           can be negative (and is by default in global alignment mode).
 *
 * Score floor for local alignment
 * -------------------------------
 *
 * FL=xx,yy (defaults: FL=-Infinity,0.0, or FL=0.0,0.0 if --local is set)
 *
 *   xx,yy = If a cell in the dynamic programming table has a score less
 *           than xx + (read length * yy), then no valid alignment can go
 *           through it.  Defaults are highly recommended.
 *
 * N ceiling
 * ---------
 *
 * NCEIL=xx,yy (default: NCEIL=0.0,0.15)
 *
 *   xx,yy = For a read of length N, the number of alignment
 *           positions with an N in either the read or the
 *           reference cannot exceed
 *           ceiling = xx + (read length * yy).  If the ceiling is
 *           exceeded, the alignment is considered invalid.
 *
 * Seeds
 * -----
 *
 * SEED=mm,len,ival (default: SEED=0,22)
 *
 *   mm   = Maximum number of mismatches allowed within a seed.
 *          Must be >= 0 and <= 2.  Note that 2-mismatch mode is
 *          not fully sensitive; i.e. some 2-mismatch seed
 *          alignments may be missed.
 *   len  = Length of seed.
 *   ival = Interval between seeds.  If not specified, seed
 *          interval is determined by IVAL.
 *
 * Seed interval
 * -------------
 *
 * IVAL={L|S|C},xx,yy (default: IVAL=S,1.0,0.0)
 *
 *   L  = let interval between seeds be a linear function of the
 *        read length.  xx and yy are the constant and linear
 *        coefficients respectively.  In other words, the interval
 *        equals xx + yy * len, where len is the read length.
 *        Intervals less than 1 are rounded up to 1.
 *   S  = let interval between seeds be a function of the square
 *        root of the read length.  xx and yy are the constant and
 *        square-root coefficients respectively.  In other words, the
 *        interval equals xx + yy * sqrt(len), where len is the read
 *        length.  Intervals less than 1 are rounded up to 1.
 *   C  = Like S but uses cube root of length instead of square
 *        root.
 *
 * Example 1:
 *
 *  SEED=1,10,5 and read sequence is TGCTATCGTACGATCGTACA:
 *
 *  The following seeds are extracted from the forward
 *  representation of the read and aligned to the reference
 *  allowing up to 1 mismatch:
 *
 *  Read:    TGCTATCGTACGATCGTACA
 *
 *  Seed 1+: TGCTATCGTA
 *  Seed 2+:      TCGTACGATC
 *  Seed 3+:           CGATCGTACA
 *
 *  ...and the following are extracted from the reverse-complement
 *  representation of the read and aligned to the reference allowing
 *  up to 1 mismatch:
 *
 *  Seed 1-: TACGATAGCA
 *  Seed 2-:      GATCGTACGA
 *  Seed 3-:           TGTACGATCG
 *
 * Example 2:
 *
 *  SEED=1,20,20 and read sequence is TGCTATCGTACGATC.  The seed
 *  length is 20 but the read is only 15 characters long.  In this
 *  case, Bowtie2 automatically shrinks the seed length to be equal
 *  to the read length.
 *
 *  Read:    TGCTATCGTACGATC
 *
 *  Seed 1+: TGCTATCGTACGATC
 *  Seed 1-: GATCGTACGATAGCA
 *
 * Example 3:
 *
 *  SEED=1,10,10 and read sequence is TGCTATCGTACGATC.  Only one seed
 *  fits on the read; a second seed would overhang the end of the read
 *  by 5 positions.  In this case, Bowtie2 extracts one seed.
 *
 *  Read:    TGCTATCGTACGATC
 *
 *  Seed 1+: TGCTATCGTA
 *  Seed 1-: TACGATAGCA
 */
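The three SEED examples in the comment above can be reproduced with a small standalone sketch. `revComp`, `SeedPair` and `extractSeeds` are hypothetical names, not part of the original source, and the rule used here (start at offset 0, advance by the interval, drop any seed that would overhang the read, shrink the seed to the read length if needed) is an assumption read off the examples rather than the aligner's actual extraction code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Reverse complement of a DNA string; characters other than A, C, G, T
// become N.
std::string revComp(const std::string& s) {
	std::string rc(s.rbegin(), s.rend());
	for(size_t i = 0; i < rc.size(); i++) {
		switch(rc[i]) {
			case 'A': rc[i] = 'T'; break;
			case 'C': rc[i] = 'G'; break;
			case 'G': rc[i] = 'C'; break;
			case 'T': rc[i] = 'A'; break;
			default:  rc[i] = 'N'; break;
		}
	}
	return rc;
}

// One seed in both orientations (hypothetical helper type).
struct SeedPair {
	std::string fw; // seed as it appears in the read
	std::string rc; // reverse complement of fw
};

// Extract seeds of length 'len' every 'ival' positions.  The seed length
// shrinks to the read length for short reads, and seeds that would
// overhang the end of the read are not extracted.
std::vector<SeedPair> extractSeeds(const std::string& read, size_t len, size_t ival) {
	std::vector<SeedPair> seeds;
	if(read.empty() || len == 0 || ival == 0) return seeds;
	if(len > read.size()) len = read.size();
	for(size_t off = 0; off + len <= read.size(); off += ival) {
		SeedPair p;
		p.fw = read.substr(off, len);
		p.rc = revComp(p.fw);
		seeds.push_back(p);
	}
	return seeds;
}
```

With the read TGCTATCGTACGATCGTACA, length 10 and interval 5, this produces the three forward seeds of Example 1 together with their reverse complements; the 15-character read of Examples 2 and 3 yields a single seed in both cases.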
 void SeedAlignmentPolicy::parseString(
                                      const       std::string& s,
                                      bool        local,
                                      bool        noisyHpolymer,
                                      bool        ignoreQuals,
                                      int&        bonusMatchType,
                                      int&        bonusMatch,
                                      int&        penMmcType,
                                      int&        penMmcMax,
                                      int&        penMmcMin,
                                      int&        penScMax,
                                      int&        penScMin,
                                      int&        penNType,
                                      int&        penN,
                                      int&        penRdExConst,
                                      int&        penRfExConst,
                                      int&        penRdExLinear,
                                      int&        penRfExLinear,
                                      SimpleFunc& costMin,
                                      SimpleFunc& nCeil,
                                      bool&       nCatPair,
                                      int&        multiseedMms,
                                      int&        multiseedLen,
                                      SimpleFunc& multiseedIval,
                                      size_t&     failStreak,
                                      size_t&     seedRounds,
                                      SimpleFunc* penCanIntronLen,
                                      SimpleFunc* penNoncanIntronLen)
 {
 	bonusMatchType    = local ? DEFAULT_MATCH_BONUS_TYPE_LOCAL : DEFAULT_MATCH_BONUS_TYPE;
 	bonusMatch        = local ? DEFAULT_MATCH_BONUS_LOCAL : DEFAULT_MATCH_BONUS;
 	penMmcType        = ignoreQuals ? DEFAULT_MM_PENALTY_TYPE_IGNORE_QUALS :
 	                                  DEFAULT_MM_PENALTY_TYPE;
 	penMmcMax         = DEFAULT_MM_PENALTY_MAX;
 	penMmcMin         = DEFAULT_MM_PENALTY_MIN;
 	penNType          = DEFAULT_N_PENALTY_TYPE;
 	penN              = DEFAULT_N_PENALTY;
    penScMax          = DEFAULT_SC_PENALTY_MAX;
    penScMin          = DEFAULT_SC_PENALTY_MIN;
 	const double DMAX = std::numeric_limits<double>::max();
    costMin.init(
 		local ? SIMPLE_FUNC_LOG : SIMPLE_FUNC_LINEAR,
 		local ? DEFAULT_MIN_CONST_LOCAL  : 0.0f,
 		local ? DEFAULT_MIN_LINEAR_LOCAL : -0.2f);
 	nCeil.init(
 		SIMPLE_FUNC_LINEAR, 0.0f, DMAX,
 		DEFAULT_N_CEIL_CONST, DEFAULT_N_CEIL_LINEAR);
 	multiseedIval.init(
 		DEFAULT_IVAL, 1.0f, DMAX,
 		DEFAULT_IVAL_B, DEFAULT_IVAL_A);
 	nCatPair          = DEFAULT_N_CAT_PAIR;
 	if(!noisyHpolymer) {
 		penRdExConst  = DEFAULT_READ_GAP_CONST;
 		penRdExLinear = DEFAULT_READ_GAP_LINEAR;
 		penRfExConst  = DEFAULT_REF_GAP_CONST;
 		penRfExLinear = DEFAULT_REF_GAP_LINEAR;
 	} else {
 		penRdExConst  = DEFAULT_READ_GAP_CONST_BADHPOLY;
 		penRdExLinear = DEFAULT_READ_GAP_LINEAR_BADHPOLY;
 		penRfExConst  = DEFAULT_REF_GAP_CONST_BADHPOLY;
 		penRfExLinear = DEFAULT_REF_GAP_LINEAR_BADHPOLY;
 	}
 	multiseedMms      = DEFAULT_SEEDMMS;
 	multiseedLen      = DEFAULT_SEEDLEN;
 	EList<string> toks(MISC_CAT);
 	string tok;
 	istringstream ss(s);
 	int setting = 0;
 	// Get each ;-separated token
 	while(getline(ss, tok, ';')) {
 		setting++;
 		EList<string> etoks(MISC_CAT);
 		string etok;
 		// Divide into tokens on either side of =
 		istringstream ess(tok);
 		while(getline(ess, etok, '=')) {
 			etoks.push_back(etok);
 		}
 		// Must be exactly 1 =
 		if(etoks.size() != 2) {
 			cerr << "Error parsing alignment policy setting " << setting
 			     << "; must be bisected by = sign" << endl
 				 << "Policy: " << s.c_str() << endl;
 			assert(false); throw 1;
 		}
 		// LHS is tag, RHS value
 		string tag = etoks[0], val = etoks[1];
 		// Separate value into comma-separated tokens
 		EList<string> ctoks(MISC_CAT);
 		string ctok;
 		istringstream css(val);
 		while(getline(css, ctok, ',')) {
 			ctoks.push_back(ctok);
 		}
 		if(ctoks.size() == 0) {
 			cerr << "Error parsing alignment policy setting " << setting
 			     << "; RHS must have at least 1 token" << endl
 				 << "Policy: " << s.c_str() << endl;
 			assert(false); throw 1;
 		}
 		for(size_t i = 0; i < ctoks.size(); i++) {
 			if(ctoks[i].length() == 0) {
 				cerr << "Error parsing alignment policy setting " << setting
 				     << "; token " << i+1 << " on RHS had length=0" << endl
 					 << "Policy: " << s.c_str() << endl;
 				assert(false); throw 1;
 			}
 		}
 		// Bonus for a match
 		// MA=xx (default: MA=0, or MA=2 if --local is set)
 		if(tag == "MA") {
 			if(ctoks.size() != 1) {
 				cerr << "Error parsing alignment policy setting " << setting
 				     << "; RHS must have 1 token" << endl
 					 << "Policy: " << s.c_str() << endl;
 				assert(false); throw 1;
 			}
 			string tmp = ctoks[0];
 			istringstream tmpss(tmp);
 			tmpss >> bonusMatch;
 		}
 		// Scoring for mismatches
 		// MMP={Cxx|Q|RQ}
 		//        Cxx = constant, where constant is integer xx
 		//        Qxx = equal to quality, scaled
 		//        R   = equal to maq-rounded quality value (rounded to nearest
 		//              10, can't be greater than 30)
 		else if(tag == "MMP") {
 			if(ctoks.size() > 3) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'"
 				     << "; RHS must have at most 3 tokens" << endl
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks[0][0] == 'C') {
 				string tmp = ctoks[0].substr(1);
 				// Parse constant penalty
 				istringstream tmpss(tmp);
 				tmpss >> penMmcMax;
 				penMmcMin = penMmcMax;
 				// Set type to constant
 				penMmcType = COST_MODEL_CONSTANT;
 			} else if(ctoks[0][0] == 'Q') {
 				if(ctoks.size() >= 2) {
 					string tmp = ctoks[1];
 					istringstream tmpss(tmp);
 					tmpss >> penMmcMax;
 				} else {
 					penMmcMax = DEFAULT_MM_PENALTY_MAX;
 				}
 				if(ctoks.size() >= 3) {
 					string tmp = ctoks[2];
 					istringstream tmpss(tmp);
 					tmpss >> penMmcMin;
 				} else {
 					penMmcMin = DEFAULT_MM_PENALTY_MIN;
 				}
 				if(penMmcMin > penMmcMax) {
 					cerr << "Error: Maximum mismatch penalty (" << penMmcMax
 					     << ") is less than minimum penalty (" << penMmcMin
 						 << ")" << endl;
 					throw 1;
 				}
 				// Set type to quality
 				penMmcType = COST_MODEL_QUAL;
 			} else if(ctoks[0][0] == 'R') {
 				// Set type to Maq-rounded quality
 				penMmcType = COST_MODEL_ROUNDED_QUAL;
 			} else {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'"
 				     << "; RHS must start with C, Q or R" << endl
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 		}
        else if(tag == "SCP") {
            if(ctoks.size() > 3) {
                cerr << "Error parsing alignment policy setting "
                << "'" << tag.c_str() << "'"
                << "; SCP must have at most 3 tokens" << endl
                << "Policy: '" << s.c_str() << "'" << endl;
                assert(false); throw 1;
            }
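            // Token 1 is the maximum penalty and token 2 the minimum; a
            // token that is absent leaves the current value unchanged
            if(ctoks.size() >= 2) {
                istringstream tmpMax(ctoks[1]);
                tmpMax >> penScMax;
            }
            if(ctoks.size() >= 3) {
                istringstream tmpMin(ctoks[2]);
                tmpMin >> penScMin;
            }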
            if(penScMin > penScMax) {
                cerr << "max (" << penScMax << ") should be >= min (" << penScMin << ")" << endl;
                assert(false); throw 1;
            }
            if(penScMin < 1) {
                cerr << "min (" << penScMin << ") should be greater than 0" << endl;
                assert(false); throw 1;
            }
        }
 		// Scoring for mismatches where read char=N
 		// NP={Cxx|Q|RQ}
 		//        Cxx = constant, where constant is integer xx
 		//        Q   = equal to quality
 		//        R   = equal to maq-rounded quality value (rounded to nearest
 		//              10, can't be greater than 30)
 		else if(tag == "NP") {
 			if(ctoks.size() != 1) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'"
 				     << "; RHS must have 1 token" << endl
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks[0][0] == 'C') {
 				string tmp = ctoks[0].substr(1);
 				// Parse constant penalty
 				istringstream tmpss(tmp);
 				tmpss >> penN;
 				// Record that the penalty is constant
 				penNType = COST_MODEL_CONSTANT;
 			} else if(ctoks[0][0] == 'Q') {
 				// Set type to quality
 				penNType = COST_MODEL_QUAL;
 			} else if(ctoks[0][0] == 'R') {
 				// Set type to Maq-rounded quality
 				penNType = COST_MODEL_ROUNDED_QUAL;
 			} else {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'"
 				     << "; RHS must start with C, Q or R" << endl
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 		}
 		// Scoring for read gaps
 		// RDG=xx,yy
 		//        xx = read gap penalty constant term (gap open)
 		//        yy = read gap penalty linear term (per-position extension);
 		//             defaults to DEFAULT_READ_GAP_LINEAR, or to
 		//             DEFAULT_READ_GAP_LINEAR_BADHPOLY when noisyHpolymer
 		//             is set
 		else if(tag == "RDG") {
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> penRdExConst;
 			} else {
 				penRdExConst = noisyHpolymer ?
 					DEFAULT_READ_GAP_CONST_BADHPOLY :
 					DEFAULT_READ_GAP_CONST;
 			}
 			if(ctoks.size() >= 2) {
 				istringstream tmpss(ctoks[1]);
 				tmpss >> penRdExLinear;
 			} else {
 				penRdExLinear = noisyHpolymer ?
 					DEFAULT_READ_GAP_LINEAR_BADHPOLY :
 					DEFAULT_READ_GAP_LINEAR;
 			}
 		}
 		// Scoring for reference gaps
 		// RFG=xx,yy
 		//        xx = ref gap penalty constant term (gap open)
 		//        yy = ref gap penalty linear term (per-position extension);
 		//             defaults to DEFAULT_REF_GAP_LINEAR, or to
 		//             DEFAULT_REF_GAP_LINEAR_BADHPOLY when noisyHpolymer
 		//             is set
 		else if(tag == "RFG") {
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> penRfExConst;
 			} else {
 				penRfExConst = noisyHpolymer ?
 					DEFAULT_REF_GAP_CONST_BADHPOLY :
 					DEFAULT_REF_GAP_CONST;
 			}
 			if(ctoks.size() >= 2) {
 				istringstream tmpss(ctoks[1]);
 				tmpss >> penRfExLinear;
 			} else {
 				penRfExLinear = noisyHpolymer ?
 					DEFAULT_REF_GAP_LINEAR_BADHPOLY :
 					DEFAULT_REF_GAP_LINEAR;
 			}
 		}
 		// Minimum score as a function of read length
 		// MIN=xx,yy
 		//        xx = constant coefficient
 		//        yy = linear coefficient
 		else if(tag == "MIN") {
 			PARSE_FUNC(costMin);
 		}
 		// Per-read N ceiling as a function of read length
 		// NCEIL=xx,yy
 		//        xx = N ceiling constant coefficient
 		//        yy = N ceiling linear coefficient (set to 0 if unspecified)
 		else if(tag == "NCEIL") {
 			PARSE_FUNC(nCeil);
 		}
 		/*
 		 * Seeds
 		 * -----
 		 *
 		 * SEED=mm,len (default: SEED=0,22)
 		 *
 		 *   mm   = Maximum number of mismatches allowed within a seed.
 		 *          Must be 0 or 1; any other value is rejected below.
 		 *   len  = Length of seed.  Defaults to DEFAULT_SEEDLEN if
 		 *          omitted.
 		 *
 		 * At most two tokens are accepted.  The interval between seeds
 		 * is determined by IVAL.
 		 */
 		else if(tag == "SEED") {
 			if(ctoks.size() > 2) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'; RHS must have 1 or 2 tokens, "
 					 << "had " << ctoks.size() << ".  "
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> multiseedMms;
 				if(multiseedMms > 1) {
 					cerr << "Error: -N was set to " << multiseedMms << ", but cannot be set greater than 1" << endl;
 					throw 1;
 				}
 				if(multiseedMms < 0) {
 					cerr << "Error: -N was set to a number less than 0 (" << multiseedMms << ")" << endl;
 					throw 1;
 				}
 			}
 			if(ctoks.size() >= 2) {
 				istringstream tmpss(ctoks[1]);
 				tmpss >> multiseedLen;
 			} else {
 				multiseedLen = DEFAULT_SEEDLEN;
 			}
 		}
 		// Seed length on its own
 		// SEEDLEN=len
 		else if(tag == "SEEDLEN") {
 			if(ctoks.size() > 1) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'; RHS must have 1 token, "
 					 << "had " << ctoks.size() << ".  "
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> multiseedLen;
 			}
 		}
 		// DPS=xx
 		//        xx = value stored in failStreak
 		else if(tag == "DPS") {
 			if(ctoks.size() > 1) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'; RHS must have 1 token, "
 					 << "had " << ctoks.size() << ".  "
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> failStreak;
 			}
 		}
 		// ROUNDS=xx
 		//        xx = value stored in seedRounds
 		else if(tag == "ROUNDS") {
 			if(ctoks.size() > 1) {
 				cerr << "Error parsing alignment policy setting "
 				     << "'" << tag.c_str() << "'; RHS must have 1 token, "
 					 << "had " << ctoks.size() << ".  "
 					 << "Policy: '" << s.c_str() << "'" << endl;
 				assert(false); throw 1;
 			}
 			if(ctoks.size() >= 1) {
 				istringstream tmpss(ctoks[0]);
 				tmpss >> seedRounds;
 			}
 		}
 		/*
 		 * Seed interval
 		 * -------------
 		 *
 		 * IVAL={L|S|C},a,b (default: IVAL=S,1.0,0.0)
 		 *
 		 *   L  = let interval between seeds be a linear function of the
 		 *        read length.  a and b are the constant and linear
 		 *        coefficients respectively.  In other words, the interval
 		 *        equals a + b * len, where len is the read length.
 		 *        Intervals less than 1 are rounded up to 1.
 		 *   S  = let interval between seeds be a function of the square
 		 *        root of the read length.  a and b are the constant and
 		 *        linear coefficients.  In other words, the interval equals
 		 *        a + b * sqrt(len), where len is the read length.
 		 *        Intervals less than 1 are rounded up to 1.
 		 *   C  = Like S but uses cube root of length instead of square
 		 *        root.
 		 */
 		else if(tag == "IVAL") {
 			PARSE_FUNC(multiseedIval);
 		}
        else if(tag == "CANINTRONLEN") {
            assert(penCanIntronLen != NULL);
 			PARSE_FUNC((*penCanIntronLen));
 		}
        else if(tag == "NONCANINTRONLEN") {
            assert(penNoncanIntronLen != NULL);
            PARSE_FUNC((*penNoncanIntronLen));
        }
 		else {
 			// Unknown tag
 			cerr << "Unexpected alignment policy setting "
 				 << "'" << tag.c_str() << "'" << endl
 				 << "Policy: '" << s.c_str() << "'" << endl;
 			assert(false); throw 1;
 		}
 	}
 }
 #ifdef ALIGNER_SEED_POLICY_MAIN
 int main() {
 	int bonusMatchType;
 	int bonusMatch;
 	int penMmcType;
 	int penMmc;
    int penScMax;
    int penScMin;
 	int penNType;
 	int penN;
 	int penRdExConst;
 	int penRfExConst;
 	int penRdExLinear;
 	int penRfExLinear;
 	SimpleFunc costMin;
 	SimpleFunc costFloor;
 	SimpleFunc nCeil;
 	bool nCatPair;
 	int multiseedMms;
 	int multiseedLen;
 	SimpleFunc msIval;
 	SimpleFunc posfrac;
 	SimpleFunc rowmult;
 	uint32_t mhits;
 	{
 		cout << "Case 1: Defaults 1 ... ";
 		const char *pol = "";
 		SeedAlignmentPolicy::parseString(
 			string(pol),
 			false,              // --local?
 			false,              // noisy homopolymers a la 454?
 			false,              // ignore qualities?
 			bonusMatchType,
 			bonusMatch,
 			penMmcType,
 			penMmc,
            penScMax,
            penScMin,
 			penNType,
 			penN,
 			penRdExConst,
 			penRfExConst,
 			penRdExLinear,
 			penRfExLinear,
 			costMin,
 			costFloor,
 			nCeil,
 			nCatPair,
 			multiseedMms,
 			multiseedLen,
 			msIval,
 			mhits);
 		assert_eq(DEFAULT_MATCH_BONUS_TYPE,   bonusMatchType);
 		assert_eq(DEFAULT_MATCH_BONUS,        bonusMatch);
 		assert_eq(DEFAULT_MM_PENALTY_TYPE,    penMmcType);
 		assert_eq(DEFAULT_MM_PENALTY_MAX,     penMmcMax);
 		assert_eq(DEFAULT_MM_PENALTY_MIN,     penMmcMin);
 		assert_eq(DEFAULT_N_PENALTY_TYPE,     penNType);
 		assert_eq(DEFAULT_N_PENALTY,          penN);
 		assert_eq(DEFAULT_MIN_CONST,          costMin.getConst());
 		assert_eq(DEFAULT_MIN_LINEAR,         costMin.getCoeff());
 		assert_eq(DEFAULT_FLOOR_CONST,        costFloor.getConst());
 		assert_eq(DEFAULT_FLOOR_LINEAR,       costFloor.getCoeff());
 		assert_eq(DEFAULT_N_CEIL_CONST,       nCeil.getConst());
 		assert_eq(DEFAULT_N_CAT_PAIR,         nCatPair);
 		assert_eq(DEFAULT_READ_GAP_CONST,     penRdExConst);
 		assert_eq(DEFAULT_READ_GAP_LINEAR,    penRdExLinear);
 		assert_eq(DEFAULT_REF_GAP_CONST,      penRfExConst);
 		assert_eq(DEFAULT_REF_GAP_LINEAR,     penRfExLinear);
 		assert_eq(DEFAULT_SEEDMMS,            multiseedMms);
 		assert_eq(DEFAULT_SEEDLEN,            multiseedLen);
 		assert_eq(DEFAULT_IVAL,               msIval.getType());
 		assert_eq(DEFAULT_IVAL_A,             msIval.getCoeff());
 		assert_eq(DEFAULT_IVAL_B,             msIval.getConst());
 		cout << "PASSED" << endl;
 	}
 	{
 		cout << "Case 2: Defaults 2 ... ";
 		const char *pol = "";
 		SeedAlignmentPolicy::parseString(
 			string(pol),
 			false,              // --local?
 			true,               // noisy homopolymers a la 454?
 			false,              // ignore qualities?
 			bonusMatchType,
 			bonusMatch,
 			penMmcType,
 			penMmc,
 			penNType,
 			penN,
 			penRdExConst,
 			penRfExConst,
 			penRdExLinear,
 			penRfExLinear,
 			costMin,
 			costFloor,
 			nCeil,
 			nCatPair,
 			multiseedMms,
 			multiseedLen,
 			msIval,
 			mhits);
 		assert_eq(DEFAULT_MATCH_BONUS_TYPE,   bonusMatchType);
 		assert_eq(DEFAULT_MATCH_BONUS,        bonusMatch);
 		assert_eq(DEFAULT_MM_PENALTY_TYPE,    penMmcType);
 		assert_eq(DEFAULT_MM_PENALTY_MAX,     penMmc);
 		assert_eq(DEFAULT_MM_PENALTY_MIN,     penMmc);
 		assert_eq(DEFAULT_N_PENALTY_TYPE,     penNType);
 		assert_eq(DEFAULT_N_PENALTY,          penN);
 		assert_eq(DEFAULT_MIN_CONST,          costMin.getConst());
 		assert_eq(DEFAULT_MIN_LINEAR,         costMin.getCoeff());
 		assert_eq(DEFAULT_FLOOR_CONST,        costFloor.getConst());
 		assert_eq(DEFAULT_FLOOR_LINEAR,       costFloor.getCoeff());
 		assert_eq(DEFAULT_N_CEIL_CONST,       nCeil.getConst());
 		assert_eq(DEFAULT_N_CAT_PAIR,         nCatPair);
 		assert_eq(DEFAULT_READ_GAP_CONST_BADHPOLY,  penRdExConst);
 		assert_eq(DEFAULT_READ_GAP_LINEAR_BADHPOLY, penRdExLinear);
 		assert_eq(DEFAULT_REF_GAP_CONST_BADHPOLY,   penRfExConst);
 		assert_eq(DEFAULT_REF_GAP_LINEAR_BADHPOLY,  penRfExLinear);
 		assert_eq(DEFAULT_SEEDMMS,            multiseedMms);
 		assert_eq(DEFAULT_SEEDLEN,            multiseedLen);
 		assert_eq(DEFAULT_IVAL,               msIval.getType());
 		assert_eq(DEFAULT_IVAL_A,             msIval.getCoeff());
 		assert_eq(DEFAULT_IVAL_B,             msIval.getConst());
 		cout << "PASSED" << endl;
 	}
 	{
 		cout << "Case 3: Defaults 3 ... ";
 		const char *pol = "";
 		SeedAlignmentPolicy::parseString(
 			string(pol),
 			true,               // --local?
 			false,              // noisy homopolymers a la 454?
 			false,              // ignore qualities?
 			bonusMatchType,
 			bonusMatch,
 			penMmcType,
 			penMmc,
 			penNType,
 			penN,
 			penRdExConst,
 			penRfExConst,
 			penRdExLinear,
 			penRfExLinear,
 			costMin,
 			costFloor,
 			nCeil,
 			nCatPair,
 			multiseedMms,
 			multiseedLen,
 			msIval,
 			mhits);
 		assert_eq(DEFAULT_MATCH_BONUS_TYPE_LOCAL,   bonusMatchType);
 		assert_eq(DEFAULT_MATCH_BONUS_LOCAL,        bonusMatch);
 		assert_eq(DEFAULT_MM_PENALTY_TYPE,    penMmcType);
 		assert_eq(DEFAULT_MM_PENALTY_MAX,     penMmcMax);
 		assert_eq(DEFAULT_MM_PENALTY_MIN,     penMmcMin);
 		assert_eq(DEFAULT_N_PENALTY_TYPE,     penNType);
 		assert_eq(DEFAULT_N_PENALTY,          penN);
 		assert_eq(DEFAULT_MIN_CONST_LOCAL,    costMin.getConst());
 		assert_eq(DEFAULT_MIN_LINEAR_LOCAL,   costMin.getCoeff());
 		assert_eq(DEFAULT_FLOOR_CONST_LOCAL,  costFloor.getConst());
 		assert_eq(DEFAULT_FLOOR_LINEAR_LOCAL, costFloor.getCoeff());
 		assert_eq(DEFAULT_N_CEIL_CONST,       nCeil.getConst());
 		assert_eq(DEFAULT_N_CEIL_LINEAR,      nCeil.getCoeff());
 		assert_eq(DEFAULT_N_CAT_PAIR,         nCatPair);
 		assert_eq(DEFAULT_READ_GAP_CONST,     penRdExConst);
 		assert_eq(DEFAULT_READ_GAP_LINEAR,    penRdExLinear);
 		assert_eq(DEFAULT_REF_GAP_CONST,      penRfExConst);
 		assert_eq(DEFAULT_REF_GAP_LINEAR,     penRfExLinear);
 		assert_eq(DEFAULT_SEEDMMS,            multiseedMms);
 		assert_eq(DEFAULT_SEEDLEN,            multiseedLen);
 		assert_eq(DEFAULT_IVAL,               msIval.getType());
 		assert_eq(DEFAULT_IVAL_A,             msIval.getCoeff());
 		assert_eq(DEFAULT_IVAL_B,             msIval.getConst());
 		cout << "PASSED" << endl;
 	}
 	{
 		cout << "Case 4: Simple string 1 ... ";
 		const char *pol = "MMP=C44;MA=4;RFG=24,12;FL=C,8;RDG=2;NP=C4;MIN=C,7";
 		SeedAlignmentPolicy::parseString(
 			string(pol),
 			true,               // --local?
 			false,              // noisy homopolymers a la 454?
 			false,              // ignore qualities?
 			bonusMatchType,
 			bonusMatch,
 			penMmcType,
 			penMmc,
 			penNType,
 			penN,
 			penRdExConst,
 			penRfExConst,
 			penRdExLinear,
 			penRfExLinear,
 			costMin,
 			costFloor,
 			nCeil,
 			nCatPair,
 			multiseedMms,
 			multiseedLen,
 			msIval,
 			mhits);
 		assert_eq(COST_MODEL_CONSTANT,        bonusMatchType);
 		assert_eq(4,                          bonusMatch);
 		assert_eq(COST_MODEL_CONSTANT,        penMmcType);
 		assert_eq(44,                         penMmc);
 		assert_eq(COST_MODEL_CONSTANT,        penNType);
 		assert_eq(4.0f,                       penN);
 		assert_eq(7,                          costMin.getConst());
 		assert_eq(DEFAULT_MIN_LINEAR_LOCAL,   costMin.getCoeff());
 		assert_eq(8,                          costFloor.getConst());
 		assert_eq(DEFAULT_FLOOR_LINEAR_LOCAL, costFloor.getCoeff());
 		assert_eq(DEFAULT_N_CEIL_CONST,       nCeil.getConst());
 		assert_eq(DEFAULT_N_CEIL_LINEAR,      nCeil.getCoeff());
 		assert_eq(DEFAULT_N_CAT_PAIR,         nCatPair);
 		assert_eq(2.0f,                       penRdExConst);
 		assert_eq(DEFAULT_READ_GAP_LINEAR,    penRdExLinear);
 		assert_eq(24.0f,                      penRfExConst);
 		assert_eq(12.0f,                      penRfExLinear);
 		assert_eq(DEFAULT_SEEDMMS,            multiseedMms);
 		assert_eq(DEFAULT_SEEDLEN,            multiseedLen);
 		assert_eq(DEFAULT_IVAL,               msIval.getType());
 		assert_eq(DEFAULT_IVAL_A,             msIval.getCoeff());
 		assert_eq(DEFAULT_IVAL_B,             msIval.getConst());
 		cout << "PASSED" << endl;
 	}
 }
 #endif /*def ALIGNER_SEED_POLICY_MAIN*/
--- a/aligner_seed_policy.h
+++ b/aligner_seed_policy.h
@ -0,0 +1,234 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_SEED_POLICY_H_
 #define ALIGNER_SEED_POLICY_H_
 #include "scoring.h"
 #include "simple_func.h"
 #define DEFAULT_SEEDMMS 0
 #define DEFAULT_SEEDLEN 22
 #define DEFAULT_IVAL SIMPLE_FUNC_SQRT
 #define DEFAULT_IVAL_A 1.15f
 #define DEFAULT_IVAL_B 0.0f
 #define DEFAULT_UNGAPPED_HITS 6
 /**
 * Encapsulates the set of all parameters that affect what the
 * SeedAligner does with reads.
 */
 class SeedAlignmentPolicy {
 public:
 	/**
 	 * Parse alignment policy when provided in this format:
 	 * <lab>=<val>;<lab>=<val>;<lab>=<val>...
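 	 *
 	 * For instance, a policy string combining several of the forms
 	 * described below might look like this (an illustrative value, not a
 	 * recommended setting):
 	 *
 	 *   MMP=C6;NP=C1;RDG=5,3;RFG=5,3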
 	 *
 	 * And label=value possibilities are:
 	 *
 	 * Bonus for a match
 	 * -----------------
 	 *
 	 * MA=xx (default: MA=0, or MA=2 if --local is set)
 	 *
 	 *    xx = Each position where equal read and reference characters match up
 	 *         in the alignment contributes this amount to the total score.
 	 *
 	 * Penalty for a mismatch
 	 * ----------------------
 	 *
 	 * MMP={Cxx|Q|RQ} (default: MMP=C6)
 	 *
 	 *   Cxx = Each mismatch costs xx.  If MMP=Cxx is specified, quality
 	 *         values are ignored when assessing penalties for mismatches.
 	 *   Q   = Each mismatch incurs a penalty equal to the mismatched base's
 	 *         quality value.
 	 *   R   = Each mismatch incurs a penalty equal to the mismatched base's
 	 *         rounded quality value.  Qualities are rounded off to the
 	 *         nearest 10, and qualities greater than 30 are rounded to 30.
 	 *
 	 * Penalty for position with N (in either read or reference)
 	 * ---------------------------------------------------------
 	 *
 	 * NP={Cxx|Q|RQ} (default: NP=C1)
 	 *
 	 *   Cxx = Each alignment position with an N in either the read or the
 	 *         reference costs xx.  If NP=Cxx is specified, quality values are
 	 *         ignored when assessing penalties for Ns.
 	 *   Q   = Each alignment position with an N in either the read or the
 	 *         reference incurs a penalty equal to the read base's quality
 	 *         value.
 	 *   R   = Each alignment position with an N in either the read or the
 	 *         reference incurs a penalty equal to the read base's rounded
 	 *         quality value.  Qualities are rounded off to the nearest 10,
 	 *         and qualities greater than 30 are rounded to 30.
 	 *
 	 * Penalty for a read gap
 	 * ----------------------
 	 *
 	 * RDG=xx,yy (default: RDG=5,3)
 	 *
 	 *   xx    = Read gap open penalty.
 	 *   yy    = Read gap extension penalty.
 	 *
 	 * Total cost incurred by a read gap = xx + (yy * gap length)
 	 *
 	 * Penalty for a reference gap
 	 * ---------------------------
 	 *
 	 * RFG=xx,yy (default: RFG=5,3)
 	 *
 	 *   xx    = Reference gap open penalty.
 	 *   yy    = Reference gap extension penalty.
 	 *
 	 * Total cost incurred by a reference gap = xx + (yy * gap length)
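 	 *
 	 * Worked example, taking the formula above at face value: with the
 	 * documented default RFG=5,3, a reference gap of length 3 costs
 	 * 5 + (3 * 3) = 14.  The same arithmetic applies to RDG.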
 	 *
 	 * Minimum score for valid alignment
 	 * ---------------------------------
 	 *
 	 * MIN=xx,yy (defaults: MIN=-0.6,-0.6, or MIN=0.0,0.66 if --local is set)
 	 *
 	 *   xx,yy = For a read of length N, the total score must be at least
 	 *           xx + (read length * yy) for the alignment to be valid.  The
 	 *           total score is the sum of all negative penalties (from
 	 *           mismatches and gaps) and all positive bonuses.  The minimum
 	 *           can be negative (and is by default in global alignment mode).
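 	 *
 	 * Worked example, assuming the documented non-local default
 	 * MIN=-0.6,-0.6 and ignoring any rounding the implementation may
 	 * apply: a 100-base read needs a total score of at least
 	 * -0.6 + (100 * -0.6) = -60.6.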
 	 *
 	 * N ceiling
 	 * ---------
 	 *
 	 * NCEIL=xx,yy (default: NCEIL=0.0,0.15)
 	 *
 	 *   xx,yy = For a read of length N, the number of alignment
 	 *           positions with an N in either the read or the
 	 *           reference cannot exceed
 	 *           ceiling = xx + (read length * yy).  If the ceiling is
 	 *           exceeded, the alignment is considered invalid.
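 	 *
 	 * Worked example, assuming the documented default NCEIL=0.0,0.15 and
 	 * ignoring any rounding: a 100-base read may have at most
 	 * 0.0 + (100 * 0.15) = 15 alignment positions containing an N.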
 	 *
 	 * Seeds
 	 * -----
 	 *
 	 * SEED=mm,len,ival (default: SEED=0,22)
 	 *
 	 *   mm   = Maximum number of mismatches allowed within a seed.
 	 *          Must be 0 or 1; the parser rejects any other value.
 	 *   len  = Length of seed.
 	 *   ival = Interval between seeds.  The parser accepts at most two
 	 *          SEED tokens, so in practice the interval is determined
 	 *          by IVAL; the examples below list it as a third value
 	 *          only for brevity.
 	 *
 	 * Seed interval
 	 * -------------
 	 *
 	 * IVAL={L|S|C},xx,yy (default: IVAL=S,1.0,0.0)
 	 *
 	 *   L  = let interval between seeds be a linear function of the
 	 *        read length.  xx and yy are the constant and linear
 	 *        coefficients respectively.  In other words, the interval
 	 *        equals xx + yy * len, where len is the read length.
 	 *        Intervals less than 1 are rounded up to 1.
 	 *   S  = let interval between seeds be a function of the square
 	 *        root of the read length.  xx and yy are the constant and
 	 *        linear coefficients.  In other words, the interval equals
 	 *        xx + yy * sqrt(len), where len is the read length.
 	 *        Intervals less than 1 are rounded up to 1.
 	 *   C  = Like S but uses cube root of length instead of square
 	 *        root.
 	 *
 	 * Example 1:
 	 *
 	 *  SEED=1,10,5 and read sequence is TGCTATCGTACGATCGTACA:
 	 *
 	 *  The following seeds are extracted from the forward
 	 *  representation of the read and aligned to the reference
 	 *  allowing up to 1 mismatch:
 	 *
 	 *  Read:    TGCTATCGTACGATCGTACA
 	 *
 	 *  Seed 1+: TGCTATCGTA
 	 *  Seed 2+:      TCGTACGATC
 	 *  Seed 3+:           CGATCGTACA
 	 *
 	 *  ...and the following are extracted from the reverse-complement
 	 *  representation of the read and align to the reference allowing
 	 *  up to 1 mismatch:
 	 *
 	 *  Seed 1-: TACGATAGCA
 	 *  Seed 2-:      GATCGTACGA
 	 *  Seed 3-:           TGTACGATCG
 	 *
 	 * Example 2:
 	 *
 	 *  SEED=1,20,20 and read sequence is TGCTATCGTACGATC.  The seed
 	 *  length is 20 but the read is only 15 characters long.  In this
 	 *  case, Bowtie2 automatically shrinks the seed length to be equal
 	 *  to the read length.
 	 *
 	 *  Read:    TGCTATCGTACGATC
 	 *
 	 *  Seed 1+: TGCTATCGTACGATC
 	 *  Seed 1-: GATCGTACGATAGCA
 	 *
 	 * Example 3:
 	 *
 	 *  SEED=1,10,10 and read sequence is TGCTATCGTACGATC.  Only one seed
 	 *  fits on the read; a second seed would overhang the end of the read
 	 *  by 5 positions.  In this case, Bowtie2 extracts one seed.
 	 *
 	 *  Read:    TGCTATCGTACGATC
 	 *
 	 *  Seed 1+: TGCTATCGTA
 	 *  Seed 1-: TACGATAGCA
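 	 *
 	 * In each of the three examples, the number of seeds per strand works
 	 * out to 1 + (read length - seed length) / interval, using integer
 	 * division and the seed length after any shrinking.  This is an
 	 * observation about the examples, not a description of the extraction
 	 * code:
 	 *
 	 *  Example 1: 1 + (20 - 10) / 5  = 3
 	 *  Example 2: 1 + (15 - 15) / 20 = 1
 	 *  Example 3: 1 + (15 - 10) / 10 = 1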
 	 */
 	static void parseString(
                            const       std::string& s,
                            bool        local,
                            bool        noisyHpolymer,
                            bool        ignoreQuals,
                            int&        bonusMatchType,
                            int&        bonusMatch,
                            int&        penMmcType,
                            int&        penMmcMax,
                            int&        penMmcMin,
                            int&        penScMax,
                            int&        penScMin,
                            int&        penNType,
                            int&        penN,
                            int&        penRdExConst,
                            int&        penRfExConst,
                            int&        penRdExLinear,
                            int&        penRfExLinear,
                            SimpleFunc& costMin,
                            SimpleFunc& nCeil,
                            bool&       nCatPair,
                            int&        multiseedMms,
                            int&        multiseedLen,
                            SimpleFunc& multiseedIval,
                            size_t&     failStreak,
                            size_t&     seedRounds,
                            SimpleFunc* penCanIntronLen = NULL,
                            SimpleFunc* penNoncanIntronLen = NULL);
 };
 #endif /*ndef ALIGNER_SEED_POLICY_H_*/
--- a/aligner_sw.cpp
+++ b/aligner_sw.cpp
--- a/aligner_sw.h
+++ b/aligner_sw.h
@ -0,0 +1,648 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 /*
 * aligner_sw.h
 *
 * Classes and routines for solving dynamic programming problems in aid of read
 * alignment.  Goals include the ability to handle:
 *
 * - Both end-to-end (global) alignment, where the whole query must align,
 *   and local alignment, where we seek a high-scoring alignment that need
 *   not involve the entire query.
 * - Situations where: (a) we've found a seed hit and are trying to extend it
 *   into a larger hit, (b) we've found an alignment for one mate of a pair and
 *   are trying to find a nearby alignment for the other mate, (c) we're
 *   aligning against an entire reference sequence.
 * - Caller-specified indicators for what columns of the dynamic programming
 *   matrix we are allowed to start in or end in.
 *
 * TODO:
 *
 * - A slicker way to filter out alignments that violate a ceiling placed on
 *   the number of Ns permitted in the reference portion of the alignment.
 *   Right now we accomplish this by masking out ending columns that correspond
 *   to *ungapped* alignments with too many Ns.  This results in false
 *   positives and false negatives for gapped alignments.  The margin of error
 *   (# of Ns by which we might miscount) is bounded by the number of gaps.
 */
 /**
 *  |-maxgaps-|
 *  ***********oooooooooooooooooooooo    -
 *   ***********ooooooooooooooooooooo    |
 *    ***********oooooooooooooooooooo    |
 *     ***********ooooooooooooooooooo    |
 *      ***********oooooooooooooooooo    |
 *       ***********ooooooooooooooooo read len
 *        ***********oooooooooooooooo    |
 *         ***********ooooooooooooooo    |
 *          ***********oooooooooooooo    |
 *           ***********ooooooooooooo    |
 *            ***********oooooooooooo    -
 *            |-maxgaps-|
 *  |-readlen-|
 *  |-------skip--------|
 */
 #ifndef ALIGNER_SW_H_
 #define ALIGNER_SW_H_
 #define INLINE_CUPS
 #include <stdint.h>
 #include <iostream>
 #include <limits>
 #include "threading.h"
 #include <emmintrin.h>
 #include "aligner_sw_common.h"
 #include "aligner_sw_nuc.h"
 #include "ds.h"
 #include "aligner_seed.h"
 #include "reference.h"
 #include "random_source.h"
 #include "mem_ids.h"
 #include "aligner_result.h"
 #include "mask.h"
 #include "dp_framer.h"
 #include "aligner_swsse.h"
 #include "aligner_bt.h"
 #define QUAL2(d, f) sc_->mm((int)(*rd_)[rdi_ + d], \
 							(int)  rf_ [rfi_ + f], \
 							(int)(*qu_)[rdi_ + d] - 33)
 #define QUAL(d)     sc_->mm((int)(*rd_)[rdi_ + d], \
 							(int)(*qu_)[rdi_ + d] - 33)
 #define N_SNP_PEN(c) (((int)rf_[rfi_ + c] > 15) ? sc_->n(30) : sc_->penSnp)
 /**
 * SwAligner
 * =========
 *
 * Encapsulates facilities for alignment using dynamic programming.  Handles
 * alignment of nucleotide reads against known reference nucleotides.
 *
 * The class is stateful.  First the user must call init() to initialize the
 * object with details regarding the dynamic programming problem to be solved.
 * Next, the user calls align() to fill the dynamic programming matrix and
 * calculate summaries describing the solutions.  Finally the user calls 
 * nextAlignment(...), perhaps repeatedly, to populate the SwResult object with
 * the next result.  Results are dispensed in best-to-worst, left-to-right
 * order.
 *
 * The class expects the read string, quality string, and reference string
 * provided by the caller live at least until the user is finished aligning and
 * obtaining alignments from this object.
 *
 * There is a design tradeoff between hiding/exposing details of the genome and
 * its strands to the SwAligner.  In a sense, a better design is to hide
 * details such as the id of the reference sequence aligned to, or whether
 * we're aligning the read in its original forward orientation or its reverse
 * complement.  But this means that any alignment results returned by SwAligner
 * have to be extended to include those details before they're useful to the
 * caller.  We opt for messy but expedient - the reference id and orientation
 * of the read are given to SwAligner, remembered, and used to populate
 * SwResults.
 *
 * LOCAL VS GLOBAL
 *
 * The dynamic programming aligner supports both local and global alignment,
 * and one option in between.  The modes differ in three settings: (a) whether
 * negative scores are allowed (i.e. scores are not clamped up to 0), (b)
 * whether rows other than the last row are checked for acceptable solutions,
 * and (c) whether a bonus is added to the score for matches.
 * 
 * For global alignment, we:
 *
 * (a) Allow negative scores
 * (b) Check only in the last row
 * (c) Either add a bonus for matches or not (doesn't matter)
 *
 * For local alignment, we:
 *
 * (a) Clamp scores to 0
 * (b) Check in any row for a sufficiently high score
 * (c) Add a bonus for matches
 *
 * An in-between solution is to allow alignments to be curtailed on the
 * right-hand side if a better score can be achieved thereby, but not on the
 * left.  For this, we:
 *
 * (a) Allow negative scores
 * (b) Check in any row for a sufficiently high score
 * (c) Either add a bonus for matches or not (doesn't matter)
 *
 * REDUNDANT ALIGNMENTS
 *
 * When are two alignments distinct and when are they redundant (not distinct)?
 * At one extreme, we might say the best alignment from any given dynamic
 * programming problem is redundant with all other alignments from that
 * problem.  At the other extreme, we might say that any two alignments with
 * distinct starting points and edits are distinct.  The former is probably too
 * conservative for mate-finding DP problems.  The latter is certainly too
 * permissive, since two alignments that differ only in how gaps are arranged
 * should not be considered distinct.
 *
 * Some in-between solutions are:
 *
 * (a) If two alignments share an end point on either end, they are redundant.
 *     Otherwise, they are distinct.
 * (b) If two alignments share *both* end points, they are redundant.
 * (c) If two alignments share any cells in the DP table, they are redundant.
 * (d) If either pair of corresponding end points lies within N positions of
 *     each other, the two alignments are redundant.
 * (e) Like (d), but both pairs of end points must be within N positions.
 * (f, g) Like (d) and (e), but where N is tied to maxgaps somehow.
 *
 * Why not (a)?  One reason is that it's possible for two alignments to have
 * different start & end positions but share many cells.  Consider alignments 1
 * and 2 below; their end-points are labeled.
 *
 *  1 2
 *  \ \
 *    -\
 *      \
 *       \
 *        \
 *        -\
 *        \ \
 *        1 2
 *
 * 1 and 2 are distinct according to (a) but they share many cells in common.
 *
 * Why not (f, g)?  It fixes the problem with (a) above by forcing the
 * alignments to be spread so far apart that they can't possibly share
 * diagonal cells.
 */
 class SwAligner {
 	typedef std::pair<size_t, size_t> SizeTPair;
 	// States that the aligner can be in
 	enum {
 		STATE_UNINIT,  // init() hasn't been called yet
 		STATE_INITED,  // init() has been called, but not align()
 		STATE_ALIGNED, // align() has been called
 	};
 	const static size_t ALPHA_SIZE = 5;
 public:
 	explicit SwAligner() :
 		sseU8fw_(DP_CAT),
 		sseU8rc_(DP_CAT),
 		sseI16fw_(DP_CAT),
 		sseI16rc_(DP_CAT),
 		state_(STATE_UNINIT),
 		initedRead_(false),
 		readSse16_(false),
 		initedRef_(false),
 		rfwbuf_(DP_CAT),
 		btnstack_(DP_CAT),
 		btcells_(DP_CAT),
 		btdiag_(),
 		btncand_(DP_CAT),
 		btncanddone_(DP_CAT),
 		btncanddoneSucc_(0),
 		btncanddoneFail_(0),
 		cper_(),
 		cperMinlen_(),
 		cperPerPow2_(),
 		cperEf_(),
 		cperTri_(),
 		colstop_(0),
 		lastsolcol_(0),
 		cural_(0)
 		ASSERT_ONLY(, cand_tmp_(DP_CAT))
 	{ }
 	/**
 	 * Prepare the dynamic programming driver with a new read and a new scoring
 	 * scheme.
 	 */
 	void initRead(
 		const BTDnaString& rdfw, // read sequence for fw read
 		const BTDnaString& rdrc, // read sequence for rc read
 		const BTString& qufw,    // read qualities for fw read
 		const BTString& qurc,    // read qualities for rc read
 		size_t rdi,              // offset of first read char to align
 		size_t rdf,              // offset of last read char to align (excl)
 		const Scoring& sc);      // scoring scheme
 	/**
 	 * Initialize with a new alignment problem.
 	 */
 	void initRef(
 		bool fw,               // whether the fw or revcomp read is being aligned
 		TRefId refidx,         // id of reference aligned against
 		const DPRect& rect,    // DP rectangle
 		char *rf,              // reference sequence
 		size_t rfi,            // offset of first reference char to align to
 		size_t rff,            // offset of last reference char to align to
 		TRefOff reflen,        // length of reference sequence
 		const Scoring& sc,     // scoring scheme
 		TAlScore minsc,        // minimum score
 		bool enable8,          // use 8-bit SSE if possible?
 		size_t cminlen,        // minimum length for using checkpointing scheme
 		size_t cpow2,          // interval b/t checkpointed diags; 1 << this
 		bool doTri,            // triangular mini-fills?
 		bool extend);          // true iff this is a seed extension
 	/**
 	 * Given a read, an alignment orientation, a range of characters in a
 	 * reference sequence, and a bit-encoded version of the reference,
 	 * set up the corresponding dynamic programming problem.
 	 *
 	 * Here we expect that the caller has already narrowed down the relevant
 	 * portion of the reference (e.g. using a seed hit) and all we do is
 	 * banded dynamic programming in the vicinity of that portion.  This is not
 	 * the function to call if we are trying to solve the whole alignment
 	 * problem with dynamic programming (that is TODO).
 	 *
 	 * Nothing is returned here; the number of Ns up to 'upto' is reported
 	 * through 'nsUpto'.  Call align() afterwards to fill the DP table.
 	 */
 	void initRef(
 		bool fw,               // whether the fw or revcomp read is being aligned
 		TRefId refidx,         // reference aligned against
 		const DPRect& rect,    // DP rectangle
 		const BitPairReference& refs, // Reference strings
 		TRefOff reflen,        // length of reference sequence
 		const Scoring& sc,     // scoring scheme
 		TAlScore minsc,        // minimum alignment score
 		bool enable8,          // use 8-bit SSE if possible?
 		size_t cminlen,        // minimum length for using checkpointing scheme
 		size_t cpow2,          // interval b/t checkpointed diags; 1 << this
 		bool doTri,            // triangular mini-fills?
 		bool extend,           // true iff this is a seed extension
 		size_t  upto,          // count the number of Ns up to this offset
 		size_t& nsUpto);       // output: the number of Ns up to 'upto'
 	/**
 	 * Given a read, an alignment orientation, a range of characters in a
 	 * reference sequence, and a bit-encoded version of the reference, set up
 	 * and execute the corresponding ungapped alignment problem.  There can
 	 * only be one solution.
 	 *
 	 * The caller has already narrowed down the relevant portion of the
 	 * reference using, e.g., the location of a seed hit, or the range of
 	 * possible fragment lengths if we're searching for the opposite mate in a
 	 * pair.
 	 */
 	int ungappedAlign(
 		const BTDnaString&      rd,     // read sequence (could be RC)
 		const BTString&         qu,     // qual sequence (could be rev)
 		const Coord&            coord,  // coordinate aligned to
 		const BitPairReference& refs,   // Reference strings
 		size_t                  reflen, // length of reference sequence
 		const Scoring&          sc,     // scoring scheme
 		bool                    ohang,  // allow overhang?
 		TAlScore                minsc,  // minimum score
 		SwResult&               res);   // put alignment result here
 	/**
 	 * Align read 'rd' to reference using read & reference information given
 	 * the last time initRead() and initRef() were called.  Uses dynamic
 	 * programming.
 	 */
 	bool align(RandomSource& rnd, TAlScore& best);
 	/**
 	 * Populate the given SwResult with information about the "next best"
 	 * alignment if there is one.  If there isn't one, false is returned.  Note
 	 * that false might be returned even though a call to done() would have
 	 * returned false.
 	 */
 	bool nextAlignment(
 		SwResult& res,
 		TAlScore minsc,
 		RandomSource& rnd);
 	/**
 	 * Print out an alignment result as an ASCII DP table.
 	 */
 	void printResultStacked(
 		const SwResult& res,
 		std::ostream& os)
 	{
 		res.alres.printStacked(*rd_, os);
 	}
 	/**
 	 * Return true iff there are no more solution cells to backtrace from.
 	 * Note that this may return false in situations where there are actually
 	 * no more solutions, but that hasn't been discovered yet.
 	 */
 	bool done() const {
 		assert(initedRead() && initedRef());
 		return cural_ == btncand_.size();
 	}
 	/**
 	 * Return true iff this SwAligner has been initialized with a reference to
 	 * align against.
 	 */
 	inline bool initedRef() const { return initedRef_; }
 	/**
 	 * Return true iff this SwAligner has been initialized with a read to align.
 	 */
 	inline bool initedRead() const { return initedRead_; }
 	/**
 	 * Reset, signaling that we're done with this dynamic programming problem
 	 * and won't be asking for any more alignments.
 	 */
 	inline void reset() { initedRef_ = initedRead_ = false; }
 #ifndef NDEBUG
 	/**
 	 * Check that aligner is internally consistent.
 	 */
 	bool repOk() const {
 		assert_gt(dpRows(), 0);
 		// Check btncand_
 		for(size_t i = 0; i < btncand_.size(); i++) {
 			assert(btncand_[i].repOk());
 			assert_geq(btncand_[i].score, minsc_);
 		}
 		return true;
 	}
 #endif
 	/**
 	 * Return the number of alignments given out so far by nextAlignment().
 	 */
 	size_t numAlignmentsReported() const { return cural_; }
 	/**
 	 * Merge tallies in the counters related to filling the DP table.
 	 */
 	void merge(
 		SSEMetrics& sseU8ExtendMet,
 		SSEMetrics& sseU8MateMet,
 		SSEMetrics& sseI16ExtendMet,
 		SSEMetrics& sseI16MateMet,
 		uint64_t&   nbtfiltst,
 		uint64_t&   nbtfiltsc,
 		uint64_t&   nbtfiltdo)
 	{
 		sseU8ExtendMet.merge(sseU8ExtendMet_);
 		sseU8MateMet.merge(sseU8MateMet_);
 		sseI16ExtendMet.merge(sseI16ExtendMet_);
 		sseI16MateMet.merge(sseI16MateMet_);
 		nbtfiltst += nbtfiltst_;
 		nbtfiltsc += nbtfiltsc_;
 		nbtfiltdo += nbtfiltdo_;
 	}
 	/**
 	 * Reset all the counters related to filling in the DP table to 0.
 	 */
 	void resetCounters() {
 		sseU8ExtendMet_.reset();
 		sseU8MateMet_.reset();
 		sseI16ExtendMet_.reset();
 		sseI16MateMet_.reset();
 		nbtfiltst_ = nbtfiltsc_ = nbtfiltdo_ = 0;
 	}
 	/**
 	 * Return the size of the DP problem.
 	 */
 	size_t size() const {
 		return dpRows() * (rff_ - rfi_);
 	}
 protected:
 	/**
 	 * Return the number of rows that will be in the dynamic programming table.
 	 */
 	inline size_t dpRows() const {
 		assert(initedRead_);
 		return rdf_ - rdi_;
 	}
 	/**
 	 * Align nucleotides from read 'rd' to the reference string 'rf' using
 	 * vector instructions.  Return the score of the best alignment found, or
 	 * the minimum integer if an alignment could not be found.  Flag is set to
 	 * 0 if an alignment is found, -1 if no valid alignment is found, or -2 if
 	 * the score saturated at any point during alignment.
 	 */
 	TAlScore alignNucleotidesEnd2EndSseU8(  // unsigned 8-bit elements
 		int& flag, bool debug);
 	TAlScore alignNucleotidesLocalSseU8(    // unsigned 8-bit elements
 		int& flag, bool debug);
 	TAlScore alignNucleotidesEnd2EndSseI16( // signed 16-bit elements
 		int& flag, bool debug);
 	TAlScore alignNucleotidesLocalSseI16(   // signed 16-bit elements
 		int& flag, bool debug);
 	/**
 	 * Aligns by filling a dynamic programming matrix with the SSE-accelerated,
 	 * banded DP approach of Farrar.  As it goes, it determines which cells we
 	 * might backtrace from and tallies the best (highest-scoring) N backtrace
 	 * candidate cells per diagonal.  Also returns the alignment score of the best
 	 * alignment in the matrix.
 	 *
 	 * This routine does *not* maintain a matrix holding the entire matrix worth of
 	 * scores, nor does it maintain any other dense O(mn) data structure, as this
 	 * would quickly exhaust memory for queries longer than about 10,000 kb.
 	 * Instead, in the fill stage it maintains two columns worth of scores at a
 	 * time (current/previous, or right/left) - these take O(m) space.  When
 	 * finished with the current column, it determines which cells from the
 	 * previous column, if any, are candidates we might backtrace from to find a
 	 * full alignment.  A candidate cell has a score that rises above the threshold
 	 * and isn't improved upon by a match in the next column.  The best N
 	 * candidates per diagonal are stored in an O(m + n) data structure.
 	 */
 	TAlScore alignGatherEE8(                // unsigned 8-bit elements
 		int& flag, bool debug);
 	TAlScore alignGatherLoc8(               // unsigned 8-bit elements
 		int& flag, bool debug);
 	TAlScore alignGatherEE16(               // signed 16-bit elements
 		int& flag, bool debug);
 	TAlScore alignGatherLoc16(              // signed 16-bit elements
 		int& flag, bool debug);
 	/**
 	 * Build query profile look up tables for the read.  The query profile look
 	 * up table is organized as a 1D array indexed by [i][j] where i is the
 	 * reference character in the current DP column (0=A, 1=C, etc), and j is
 	 * the segment of the query we're currently working on.
 	 */
 	void buildQueryProfileEnd2EndSseU8(bool fw);
 	void buildQueryProfileLocalSseU8(bool fw);
 	/**
 	 * Build query profile look up tables for the read.  The query profile look
 	 * up table is organized as a 1D array indexed by [i][j] where i is the
 	 * reference character in the current DP column (0=A, 1=C, etc), and j is
 	 * the segment of the query we're currently working on.
 	 */
 	void buildQueryProfileEnd2EndSseI16(bool fw);
 	void buildQueryProfileLocalSseI16(bool fw);
 	bool gatherCellsNucleotidesLocalSseU8(TAlScore best);
 	bool gatherCellsNucleotidesEnd2EndSseU8(TAlScore best);
 	bool gatherCellsNucleotidesLocalSseI16(TAlScore best);
 	bool gatherCellsNucleotidesEnd2EndSseI16(TAlScore best);
 	bool backtraceNucleotidesLocalSseU8(
 		TAlScore       escore, // in: expected score
 		SwResult&      res,    // out: store results (edits and scores) here
 		size_t&        off,    // out: store diagonal projection of origin
 		size_t&        nbts,   // out: # backtracks
 		size_t         row,    // start in this rectangle row
 		size_t         col,    // start in this rectangle column
 		RandomSource&  rand);  // random gen, to choose among equal paths
 	bool backtraceNucleotidesLocalSseI16(
 		TAlScore       escore, // in: expected score
 		SwResult&      res,    // out: store results (edits and scores) here
 		size_t&        off,    // out: store diagonal projection of origin
 		size_t&        nbts,   // out: # backtracks
 		size_t         row,    // start in this rectangle row
 		size_t         col,    // start in this rectangle column
 		RandomSource&  rand);  // random gen, to choose among equal paths
 	bool backtraceNucleotidesEnd2EndSseU8(
 		TAlScore       escore, // in: expected score
 		SwResult&      res,    // out: store results (edits and scores) here
 		size_t&        off,    // out: store diagonal projection of origin
 		size_t&        nbts,   // out: # backtracks
 		size_t         row,    // start in this rectangle row
 		size_t         col,    // start in this rectangle column
 		RandomSource&  rand);  // random gen, to choose among equal paths
 	bool backtraceNucleotidesEnd2EndSseI16(
 		TAlScore       escore, // in: expected score
 		SwResult&      res,    // out: store results (edits and scores) here
 		size_t&        off,    // out: store diagonal projection of origin
 		size_t&        nbts,   // out: # backtracks
 		size_t         row,    // start in this rectangle row
 		size_t         col,    // start in this rectangle column
 		RandomSource&  rand);  // random gen, to choose among equal paths
 	bool backtrace(
 		TAlScore       escore, // in: expected score
 		bool           fill,   // in: use mini-fill?
 		bool           usecp,  // in: use checkpoints?
 		SwResult&      res,    // out: store results (edits and scores) here
 		size_t&        off,    // out: store diagonal projection of origin
 		size_t         row,    // start in this rectangle row
 		size_t         col,    // start in this rectangle column
 		size_t         maxiter,// max # extensions to try
 		size_t&        niter,  // # extensions tried
 		RandomSource&  rnd)    // random gen, to choose among equal paths
 	{
 		bter_.initBt(
 			escore,              // in: alignment score
 			row,                 // in: start in this row
 			col,                 // in: start in this column
 			fill,                // in: use mini-fill?
 			usecp,               // in: use checkpoints?
 			cperTri_,            // in: triangle-shaped mini-fills?
 			rnd);                // in: random gen, to choose among equal paths
 		assert(bter_.inited());
 		size_t nrej = 0;
 		if(bter_.emptySolution()) {
 			return false;
 		} else {
 			return bter_.nextAlignment(maxiter, res, off, nrej, niter, rnd);
 		}
 	}
 	const BTDnaString  *rd_;     // read sequence
 	const BTString     *qu_;     // read qualities
 	const BTDnaString  *rdfw_;   // read sequence for fw read
 	const BTDnaString  *rdrc_;   // read sequence for rc read
 	const BTString     *qufw_;   // read qualities for fw read
 	const BTString     *qurc_;   // read qualities for rc read
 	TReadOff            rdi_;    // offset of first read char to align
 	TReadOff            rdf_;    // offset of last read char to align (excl)
 	bool                fw_;     // true iff read sequence is original fw read
 	TRefId              refidx_; // id of reference aligned against
 	TRefOff             reflen_; // length of entire reference sequence
 	const DPRect*       rect_;   // DP rectangle
 	char               *rf_;     // reference sequence
 	TRefOff             rfi_;    // offset of first ref char to align to
 	TRefOff             rff_;    // offset of last ref char to align to (excl)
 	size_t              rdgap_;  // max # gaps in read
 	size_t              rfgap_;  // max # gaps in reference
 	bool                enable8_;// enable 8-bit sse
 	bool                extend_; // true iff this is a seed-extend problem
 	const Scoring      *sc_;     // penalties for edit types
 	TAlScore            minsc_;  // minimum score for valid alignments
 	int                 nceil_;  // max # Ns allowed in ref portion of aln
 	bool                sse8succ_;  // whether 8-bit worked
 	bool                sse16succ_; // whether 16-bit worked
 	SSEData             sseU8fw_;   // buf for fw query, 8-bit score
 	SSEData             sseU8rc_;   // buf for rc query, 8-bit score
 	SSEData             sseI16fw_;  // buf for fw query, 16-bit score
 	SSEData             sseI16rc_;  // buf for rc query, 16-bit score
 	bool                sseU8fwBuilt_;   // built fw query profile, 8-bit score
 	bool                sseU8rcBuilt_;   // built rc query profile, 8-bit score
 	bool                sseI16fwBuilt_;  // built fw query profile, 16-bit score
 	bool                sseI16rcBuilt_;  // built rc query profile, 16-bit score
 	SSEMetrics			sseU8ExtendMet_;
 	SSEMetrics			sseU8MateMet_;
 	SSEMetrics			sseI16ExtendMet_;
 	SSEMetrics			sseI16MateMet_;
 	int                 state_;        // state
 	bool                initedRead_;   // true iff initialized with initRead
 	bool                readSse16_;    // true -> sse16 from now on for read
 	bool                initedRef_;    // true iff initialized with initRef
 	EList<uint32_t>     rfwbuf_;       // buffer for wordized ref stretches
 	EList<DpNucFrame>    btnstack_;    // backtrace stack for nucleotides
 	EList<SizeTPair>     btcells_;     // cells involved in current backtrace
 	NBest<DpBtCandidate> btdiag_;      // per-diagonal backtrace candidates
 	EList<DpBtCandidate> btncand_;     // cells we might backtrace from
 	EList<DpBtCandidate> btncanddone_; // candidates that we investigated
 	size_t              btncanddoneSucc_; // # investigated and succeeded
 	size_t              btncanddoneFail_; // # investigated and failed
 	BtBranchTracer       bter_;        // backtracer
 	Checkpointer         cper_;        // structure for saving checkpoint cells
 	size_t               cperMinlen_;  // minimum length for using checkpointer
 	size_t               cperPerPow2_; // checkpoint every 1 << perpow2 diags (& next)
 	bool                 cperEf_;      // store E and F in addition to H?
 	bool                 cperTri_;     // checkpoint for triangular mini-fills?
 	size_t              colstop_;      // bailed on DP loop after this many cols
 	size_t              lastsolcol_;   // last DP col with valid cell
 	size_t              cural_;        // index of next alignment to be given
 	uint64_t nbtfiltst_; // # candidates filtered b/c starting cell was seen
 	uint64_t nbtfiltsc_; // # candidates filtered b/c score uninteresting
 	uint64_t nbtfiltdo_; // # candidates filtered b/c dominated by other cell
 	ASSERT_ONLY(SStringExpandable<uint32_t> tmp_destU32_);
 	ASSERT_ONLY(BTDnaString tmp_editstr_, tmp_refstr_);
 	ASSERT_ONLY(EList<DpBtCandidate> cand_tmp_);
 };
 #endif /*ALIGNER_SW_H_*/
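The LOCAL VS GLOBAL notes in the SwAligner comment describe three modes that differ only in (a) clamping, (b) which rows are checked, and (c) the match bonus. The following is a minimal scalar sketch of that idea, not the SSE implementation in this file. The names `DpMode` and `bestScore`, the linear gap penalty, the free start at any reference column, and the penalty values are all assumptions made for illustration. Like the fill stage described above, it keeps only two rows of scores at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical knobs mirroring (a), (b) and (c) in the comment.
struct DpMode {
    bool clampToZero; // (a) true = clamp scores up to 0, false = allow negatives
    bool anyRow;      // (b) true = accept a solution ending in any row
    int  matchBonus;  // (c) bonus added for each match (0 = none)
};

// Illustrative penalties only; these are not the project's defaults.
const int MISMATCH_PEN = 3;
const int GAP_PEN      = 4;

// Best score for aligning 'read' (rows) against 'ref' (columns).  The
// alignment may begin at any reference column (an assumption of this sketch).
int bestScore(const std::string& read, const std::string& ref, const DpMode& m) {
    const std::size_t nrow = read.size(), ncol = ref.size();
    std::vector<int> prev(ncol + 1, 0), cur(ncol + 1, 0);
    int best = std::numeric_limits<int>::min();
    for (std::size_t i = 1; i <= nrow; i++) {
        // Column 0: every read char so far is opposite a gap
        int edge = -static_cast<int>(i) * GAP_PEN;
        cur[0] = m.clampToZero ? std::max(edge, 0) : edge;
        for (std::size_t j = 1; j <= ncol; j++) {
            bool match = (read[i - 1] == ref[j - 1]);
            int diag = prev[j - 1] + (match ? m.matchBonus : -MISMATCH_PEN);
            int up   = prev[j] - GAP_PEN;     // read char opposite a gap
            int left = cur[j - 1] - GAP_PEN;  // ref char opposite a gap
            int h = std::max(diag, std::max(up, left));
            if (m.clampToZero) h = std::max(h, 0);
            cur[j] = h;
            // Global mode only accepts cells in the last row
            if (m.anyRow || i == nrow) best = std::max(best, h);
        }
        std::swap(prev, cur);
    }
    return best;
}
```

With this sketch, `{false, false, 0}` corresponds to the global settings, `{true, true, 2}` to the local settings, and `{false, true, 0}` to the in-between option, which lets a read whose right-hand end does not match stop early instead of paying for the mismatches.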
--- a/aligner_sw_common.h
+++ b/aligner_sw_common.h
@ -0,0 +1,305 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_SW_COMMON_H_
 #define ALIGNER_SW_COMMON_H_
 #include "aligner_result.h"
 /**
 * Encapsulates the result of a dynamic programming alignment, including
 * colorspace alignments.  In our case, the result is a combination of:
 *
 * 1. All the nucleotide edits
 * 2. All the "edits" where an ambiguous reference char is resolved to
 *    an unambiguous char.
 * 3. All the color edits (if applicable)
 * 4. All the color miscalls (if applicable).  This is a subset of 3.
 * 5. The score of the best alignment
 * 6. The score of the second-best alignment
 *
 * Having scores for the best and second-best alignments gives us an
 * idea of where gaps may make reassembly beneficial.
 */
 struct SwResult {
 	SwResult() :
 		alres(),
 		sws(0),
 		swcups(0),
 		swrows(0),
 		swskiprows(0),
 		swskip(0),
 		swsucc(0),
 		swfail(0),
 		swbts(0)
 	{ }
 	/**
 	 * Clear all contents.
 	 */
 	void reset() {
 		sws = swcups = swrows = swskiprows = swskip = swsucc =
 		swfail = swbts = 0;
 		alres.reset();
 	}
 	/**
 	 * Reverse all edit lists.
 	 */
 	void reverse() {
 		alres.reverseEdits();
 	}
 	/**
 	 * Return true iff no result has been installed.
 	 */
 	bool empty() const {
 		return alres.empty();
 	}
 #ifndef NDEBUG
 	/**
 	 * Check that result is internally consistent.
 	 */
 	bool repOk() const {
 		assert(alres.repOk());
 		return true;
 	}
 	/**
 	 * Check that result is internally consistent w/r/t read.
 	 */
 	bool repOk(const Read& rd) const {
 		assert(alres.repOk(rd));
 		return true;
 	}
 #endif
 	AlnRes alres;
 	uint64_t sws;    // # DP problems solved
 	uint64_t swcups; // # DP cell updates
 	uint64_t swrows; // # DP row updates
 	uint64_t swskiprows; // # skipped DP row updates (b/c no valid alignments can go thru row)
 	uint64_t swskip; // # DP problems skipped by sse filter
 	uint64_t swsucc; // # DP problems resulting in alignment
 	uint64_t swfail; // # DP problems not resulting in alignment
 	uint64_t swbts;  // # DP backtrace steps
 	int nup;         // upstream decoded nucleotide; for colorspace reads
 	int ndn;         // downstream decoded nucleotide; for colorspace reads
 };
 /**
 * Encapsulates counters that measure how much work has been done by
 * the dynamic programming driver and aligner.
 */
 struct SwMetrics {
 	SwMetrics() : mutex_m() {
 	    reset();
 	}
 	void reset() {
 		sws = swcups = swrows = swskiprows = swskip = swsucc = swfail = swbts =
 		sws10 = sws5 = sws3 =
 		rshit = ungapsucc = ungapfail = ungapnodec = 0;
 		exatts = exranges = exrows = exsucc = exooms = 0;
 		mm1atts = mm1ranges = mm1rows = mm1succ = mm1ooms = 0;
 		sdatts = sdranges = sdrows = sdsucc = sdooms = 0;
 	}
 	void init(
 		uint64_t sws_,
 		uint64_t sws10_,
 		uint64_t sws5_,
 		uint64_t sws3_,
 		uint64_t swcups_,
 		uint64_t swrows_,
 		uint64_t swskiprows_,
 		uint64_t swskip_,
 		uint64_t swsucc_,
 		uint64_t swfail_,
 		uint64_t swbts_,
 		uint64_t rshit_,
 		uint64_t ungapsucc_,
 		uint64_t ungapfail_,
 		uint64_t ungapnodec_,
 		uint64_t exatts_,
 		uint64_t exranges_,
 		uint64_t exrows_,
 		uint64_t exsucc_,
 		uint64_t exooms_,
 		uint64_t mm1atts_,
 		uint64_t mm1ranges_,
 		uint64_t mm1rows_,
 		uint64_t mm1succ_,
 		uint64_t mm1ooms_,
 		uint64_t sdatts_,
 		uint64_t sdranges_,
 		uint64_t sdrows_,
 		uint64_t sdsucc_,
 		uint64_t sdooms_)
 	{
 		sws        = sws_;
 		sws10      = sws10_;
 		sws5       = sws5_;
 		sws3       = sws3_;
 		swcups     = swcups_;
 		swrows     = swrows_;
 		swskiprows = swskiprows_;
 		swskip     = swskip_;
 		swsucc     = swsucc_;
 		swfail     = swfail_;
 		swbts      = swbts_;
 		rshit      = rshit_;
 		ungapsucc  = ungapsucc_;
 		ungapfail  = ungapfail_;
 		ungapnodec = ungapnodec_;
 		// Exact end-to-end attempts
 		exatts     = exatts_;
 		exranges   = exranges_;
 		exrows     = exrows_;
 		exsucc     = exsucc_;
 		exooms     = exooms_;
 		// 1-mismatch end-to-end attempts
 		mm1atts    = mm1atts_;
 		mm1ranges  = mm1ranges_;
 		mm1rows    = mm1rows_;
 		mm1succ    = mm1succ_;
 		mm1ooms    = mm1ooms_;
 		// Seed attempts
 		sdatts     = sdatts_;
 		sdranges   = sdranges_;
 		sdrows     = sdrows_;
 		sdsucc     = sdsucc_;
 		sdooms     = sdooms_;
 	}
 	/**
 	 * Merge (add) the counters in the given SwResult object into this
 	 * SwMetrics object.
 	 */
 	void update(const SwResult& r) {
 		sws        += r.sws;
 		swcups     += r.swcups;
 		swrows     += r.swrows;
 		swskiprows += r.swskiprows;
 		swskip     += r.swskip;
 		swsucc     += r.swsucc;
 		swfail     += r.swfail;
 		swbts      += r.swbts;
 	}
 	/**
 	 * Merge (add) the counters in the given SwMetrics object into this
 	 * object.  This is the only safe way to update a SwMetrics shared
 	 * by multiple threads, and only when called with getLock = true;
 	 * the default (false) does not ask for the lock.
 	 */
 	void merge(const SwMetrics& r, bool getLock = false) {
 		ThreadSafe ts(&mutex_m, getLock);
 		sws        += r.sws;
 		sws10      += r.sws10;
 		sws5       += r.sws5;
 		sws3       += r.sws3;
 		swcups     += r.swcups;
 		swrows     += r.swrows;
 		swskiprows += r.swskiprows;
 		swskip     += r.swskip;
 		swsucc     += r.swsucc;
 		swfail     += r.swfail;
 		swbts      += r.swbts;
 		rshit      += r.rshit;
 		ungapsucc  += r.ungapsucc;
 		ungapfail  += r.ungapfail;
 		ungapnodec += r.ungapnodec;
 		exatts     += r.exatts;
 		exranges   += r.exranges;
 		exrows     += r.exrows;
 		exsucc     += r.exsucc;
 		exooms     += r.exooms;
 		mm1atts    += r.mm1atts;
 		mm1ranges  += r.mm1ranges;
 		mm1rows    += r.mm1rows;
 		mm1succ    += r.mm1succ;
 		mm1ooms    += r.mm1ooms;
 		sdatts     += r.sdatts;
 		sdranges   += r.sdranges;
 		sdrows     += r.sdrows;
 		sdsucc     += r.sdsucc;
 		sdooms     += r.sdooms;
 	}
 	void tallyGappedDp(size_t readGaps, size_t refGaps) {
 		size_t mx = max(readGaps, refGaps);
 		if(mx < 10) sws10++;
 		if(mx < 5)  sws5++;
 		if(mx < 3)  sws3++;
 	}
 	uint64_t sws;        // # DP problems solved
 	uint64_t sws10;      // # DP problems solved where max gaps < 10
 	uint64_t sws5;       // # DP problems solved where max gaps < 5
 	uint64_t sws3;       // # DP problems solved where max gaps < 3
 	uint64_t swcups;     // # DP cell updates
 	uint64_t swrows;     // # DP row updates
 	uint64_t swskiprows; // # skipped DP rows (b/c no valid alns go thru row)
 	uint64_t swskip;     // # DP problems skipped by sse filter
 	uint64_t swsucc;     // # DP problems resulting in alignment
 	uint64_t swfail;     // # DP problems not resulting in alignment
 	uint64_t swbts;      // # DP backtrace steps
 	uint64_t rshit;      // # DP problems avoided b/c seed hit was redundant
 	uint64_t ungapsucc;  // # ungapped alignment attempts that succeeded
 	uint64_t ungapfail;  // # ungapped alignment attempts that failed
 	uint64_t ungapnodec; // # ungapped alignment attempts with no decision
 	uint64_t exatts;     // total # attempts at exact-hit end-to-end aln
 	uint64_t exranges;   // total # ranges returned by exact-hit queries
 	uint64_t exrows;     // total # rows returned by exact-hit queries
 	uint64_t exsucc;     // exact-hit yielded non-empty result
 	uint64_t exooms;     // exact-hit offset memory exhausted
 	uint64_t mm1atts;    // total # attempts at 1mm end-to-end aln
 	uint64_t mm1ranges;  // total # ranges returned by 1mm-hit queries
 	uint64_t mm1rows;    // total # rows returned by 1mm-hit queries
 	uint64_t mm1succ;    // 1mm-hit yielded non-empty result
 	uint64_t mm1ooms;    // 1mm-hit offset memory exhausted
 	uint64_t sdatts;     // total # attempts to find seed alignments
 	uint64_t sdranges;   // total # seed-alignment ranges found
 	uint64_t sdrows;     // total # seed-alignment rows found
 	uint64_t sdsucc;     // # times seed alignment yielded >= 1 hit
 	uint64_t sdooms;     // # times an OOM occurred during seed alignment
 	MUTEX_T mutex_m;
 };
 // The various ways that one might backtrack from a later cell (either oall,
 // rdgap or rfgap) to an earlier cell
 enum {
 	SW_BT_OALL_DIAG,         // from oall cell to oall cell (diagonal move)
 	SW_BT_OALL_REF_OPEN,     // from oall cell to oall cell (ref gap open)
 	SW_BT_OALL_READ_OPEN,    // from oall cell to oall cell (read gap open)
 	SW_BT_RDGAP_EXTEND,      // from rdgap cell to rdgap cell
 	SW_BT_RFGAP_EXTEND       // from rfgap cell to rfgap cell
 };
 #endif /*def ALIGNER_SW_COMMON_H_*/
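The buckets filled by SwMetrics::tallyGappedDp are cumulative: a problem with fewer than 3 gaps is also counted among those with fewer than 5 and fewer than 10. The sketch below shows that behaviour together with a locked merge from per-thread tallies into a shared one. `GapTally` is a hypothetical stand-in for SwMetrics, and `std::mutex` is used here in place of the project's MUTEX_T and ThreadSafe, whose exact semantics are not shown in this header.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical, reduced version of SwMetrics holding only the gap buckets.
struct GapTally {
    uint64_t sws10 = 0; // # DP problems where max gaps < 10
    uint64_t sws5  = 0; // # DP problems where max gaps < 5
    uint64_t sws3  = 0; // # DP problems where max gaps < 3
    std::mutex mu;      // guards the counters when the tally is shared

    // Same bucketing as tallyGappedDp: buckets overlap, so one problem
    // can increment several counters.
    void tallyGappedDp(std::size_t readGaps, std::size_t refGaps) {
        std::size_t mx = std::max(readGaps, refGaps);
        if (mx < 10) sws10++;
        if (mx < 5)  sws5++;
        if (mx < 3)  sws3++;
    }

    // Add another tally into this one, taking the lock only on request.
    void merge(const GapTally& r, bool getLock) {
        std::unique_lock<std::mutex> lk(mu, std::defer_lock);
        if (getLock) lk.lock();
        sws10 += r.sws10;
        sws5  += r.sws5;
        sws3  += r.sws3;
    }
};
```

A typical pattern under these assumptions is for each worker thread to fill a private `GapTally` without locking and then call `merge(local, true)` on the shared instance once per batch, so the lock is taken rarely.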
--- a/aligner_sw_driver.cpp
+++ b/aligner_sw_driver.cpp
@ -0,0 +1,20 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
--- a/aligner_sw_driver.h
+++ b/aligner_sw_driver.h
--- a/aligner_sw_nuc.h
+++ b/aligner_sw_nuc.h
@ -0,0 +1,262 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_SW_NUC_H_
 #define ALIGNER_SW_NUC_H_
 #include <stdint.h>
 #include "aligner_sw_common.h"
 #include "aligner_result.h"
 /**
 * Encapsulates a backtrace stack frame.  Includes enough information that we
 * can "pop" back up to this frame and choose to make a different backtracking
 * decision.  The information included is:
 *
 * 1. The mask at the decision point.  When we first move through the mask and
 *    when we backtrack to it, we're careful to mask out the bit corresponding
 *    to the path we're taking.  When we move through it after removing the
 *    last bit from the mask, we're careful to pop it from the stack.
 * 2. The sizes of the edit lists.  When we backtrack, we resize the lists back
 *    down to these sizes to get rid of any edits introduced since the branch
 *    point.
 */
 struct DpNucFrame {
 	/**
 	 * Initialize a new DpNucFrame stack frame.
 	 */
 	void init(
 		size_t   nedsz_,
 		size_t   aedsz_,
 		size_t   celsz_,
 		size_t   row_,
 		size_t   col_,
 		size_t   gaps_,
 		size_t   readGaps_,
 		size_t   refGaps_,
 		AlnScore score_,
 		int      ct_)
 	{
 		nedsz    = nedsz_;
 		aedsz    = aedsz_;
 		celsz    = celsz_;
 		row      = row_;
 		col      = col_;
 		gaps     = gaps_;
 		readGaps = readGaps_;
 		refGaps  = refGaps_;
 		score    = score_;
 		ct       = ct_;
 	}
 	size_t   nedsz;    // size of the nucleotide edit list at branch (before
 	                   // adding the branch edit)
 	size_t   aedsz;    // size of ambiguous nucleotide edit list at branch
 	size_t   celsz;    // size of cell-traversed list at branch
 	size_t   row;      // row of cell where branch occurred
 	size_t   col;      // column of cell where branch occurred
 	size_t   gaps;     // number of gaps before branch occurred
 	size_t   readGaps; // number of read gaps before branch occurred
 	size_t   refGaps;  // number of ref gaps before branch occurred
 	AlnScore score;    // score where branch occurred
 	int      ct;       // table type (oall, rdgap or rfgap)
 };
 enum {
 	BT_CAND_FATE_SUCCEEDED = 1,
 	BT_CAND_FATE_FAILED,
 	BT_CAND_FATE_FILT_START,     // skipped b/c starting cell already explored
 	BT_CAND_FATE_FILT_DOMINATED, // skipped b/c it was dominated
 	BT_CAND_FATE_FILT_SCORE      // skipped b/c score not interesting anymore
 };
 /**
 * Encapsulates a cell that we might want to backtrace from.
 */
 struct DpBtCandidate {
 	DpBtCandidate() { reset(); }
 	DpBtCandidate(size_t row_, size_t col_, TAlScore score_) {
 		init(row_, col_, score_);
 	}
 	void reset() { init(0, 0, 0); }
 	void init(size_t row_, size_t col_, TAlScore score_) {
 		row = row_;
 		col = col_;
 		score = score_;
 		// 0 = invalid; this should be set later according to what happens
 		// before / during the backtrace
 		fate = 0; 
 	}
 	/**
 	 * Return true iff this candidate is (heuristically) dominated by the given
 	 * candidate.  Conceptually, candidate A dominates candidate B if (a) B is
 	 * within N rows and N columns of A, where N is an arbitrary number (SQ = 40
 	 * below), and (b) B's score is less than or equal to A's.  Note that this
 	 * function tests only the proximity condition (a); it does not compare
 	 * scores.
 	 */
 	inline bool dominatedBy(const DpBtCandidate& o) {
 		const size_t SQ = 40;
 		size_t rowhi = row;
 		size_t rowlo = o.row;
 		if(rowhi < rowlo) swap(rowhi, rowlo);
 		size_t colhi = col;
 		size_t collo = o.col;
 		if(colhi < collo) swap(colhi, collo);
 		return (colhi - collo) <= SQ &&
 		       (rowhi - rowlo) <= SQ;
 	}
 	/**
 	 * Return true if this candidate is "greater than" (should be considered
 	 * later than) the given candidate.
 	 */
 	bool operator>(const DpBtCandidate& o) const {
 		if(score < o.score) return true;
 		if(score > o.score) return false;
 		if(row   < o.row  ) return true;
 		if(row   > o.row  ) return false;
 		if(col   < o.col  ) return true;
 		if(col   > o.col  ) return false;
 		return false;
 	}
 	/**
 	 * Return true if this candidate is "less than" (should be considered
 	 * sooner than) the given candidate.
 	 */
 	bool operator<(const DpBtCandidate& o) const {
 		if(score > o.score) return true;
 		if(score < o.score) return false;
 		if(row   > o.row  ) return true;
 		if(row   < o.row  ) return false;
 		if(col   > o.col  ) return true;
 		if(col   < o.col  ) return false;
 		return false;
 	}
 	/**
 	 * Return true if this candidate equals the given candidate.
 	 */
 	bool operator==(const DpBtCandidate& o) const {
 		return row   == o.row &&
 		       col   == o.col &&
 			   score == o.score;
 	}
 	bool operator>=(const DpBtCandidate& o) const { return !((*this) < o); }
 	bool operator<=(const DpBtCandidate& o) const { return !((*this) > o); }
 #ifndef NDEBUG
 	/**
 	 * Check internal consistency.
 	 */
 	bool repOk() const {
 		assert(VALID_SCORE(score));
 		return true;
 	}
 #endif
 	size_t   row;   // cell row
 	size_t   col;   // cell column w/r/t LHS of rectangle
 	TAlScore score; // score of alignment
 	int      fate;  // flag indicating whether we succeeded, failed, skipped
 };
 template <typename T>
 class NBest {
 public:
 	NBest() { nelt_ = nbest_ = n_ = 0; }
 	bool inited() const { return nelt_ > 0; }
 	void init(size_t nelt, size_t nbest) {
 		nelt_ = nelt;
 		nbest_ = nbest;
 		elts_.resize(nelt * nbest);
 		ncur_.resize(nelt);
 		ncur_.fill(0);
 		n_ = 0;
 	}
 	/**
 	 * Add a new result to bin 'elt'.  Where it gets prioritized in the list of
 	 * results in that bin depends on the result of operator>.
 	 */
 	bool add(size_t elt, const T& o) {
 		assert_lt(elt, nelt_);
 		const size_t ncur = ncur_[elt];
 		assert_leq(ncur, nbest_);
 		n_++;
 		for(size_t i = 0; i < nbest_ && i <= ncur; i++) {
 			if(o > elts_[nbest_ * elt + i] || i >= ncur) {
 				// Insert it here
 				// Move everyone from here on down by one slot
 				for(int j = (int)ncur; j > (int)i; j--) {
 					if(j < (int)nbest_) {
 						elts_[nbest_ * elt + j] = elts_[nbest_ * elt + j - 1];
 					}
 				}
 				elts_[nbest_ * elt + i] = o;
 				if(ncur < nbest_) {
 					ncur_[elt]++;
 				}
 				return true;
 			}
 		}
 		return false;
 	}
 	/**
 	 * Return true iff there are no solutions.
 	 */
 	bool empty() const {
 		return n_ == 0;
 	}
 	/**
 	 * Dump all the items in our payload into the given EList.
 	 */
 	template<typename TList>
 	void dump(TList& l) const {
 		if(empty()) return;
 		for(size_t i = 0; i < nelt_; i++) {
 			assert_leq(ncur_[i], nbest_);
 			for(size_t j = 0; j < ncur_[i]; j++) {
 				l.push_back(elts_[i * nbest_ + j]);
 			}
 		}
 	}
 protected:
 	size_t        nelt_;
 	size_t        nbest_;
 	EList<T>      elts_;
 	EList<size_t> ncur_;
 	size_t        n_;     // total # results added
 };
 #endif /*def ALIGNER_SW_NUC_H_*/
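
The candidate ordering and the proximity test in DpBtCandidate can be tried out in isolation. The following is a minimal sketch, not the aligner's actual backtrace loop: Cand, nearby and prioritize are hypothetical names, and the way the caller combines sorting with dominatedBy() is an assumption, since that code is not part of this header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for DpBtCandidate: only the fields used for ordering.
struct Cand {
    std::size_t  row;
    std::size_t  col;
    std::int64_t score;
    // "Less than" means "try sooner": higher score first, then larger row,
    // then larger column, mirroring DpBtCandidate::operator<.
    bool operator<(const Cand& o) const {
        if(score != o.score) return score > o.score;
        if(row   != o.row  ) return row   > o.row;
        return col > o.col;
    }
};

// Proximity test in the spirit of dominatedBy(): true iff a and b are within
// sq rows and sq columns of each other.  Scores are not compared here.
inline bool nearby(const Cand& a, const Cand& b, std::size_t sq = 40) {
    assert(sq > 0);
    const std::size_t dr = a.row > b.row ? a.row - b.row : b.row - a.row;
    const std::size_t dc = a.col > b.col ? a.col - b.col : b.col - a.col;
    return dr <= sq && dc <= sq;
}

// Assumed usage: sort into backtrace order, then skip any candidate that lies
// near a candidate that was already kept (which sorts no later than it).
inline std::vector<Cand> prioritize(std::vector<Cand> cands, std::size_t sq = 40) {
    std::sort(cands.begin(), cands.end());
    std::vector<Cand> kept;
    for(const Cand& c : cands) {
        bool dominated = false;
        for(const Cand& k : kept) {
            if(nearby(c, k, sq)) { dominated = true; break; }
        }
        if(!dominated) kept.push_back(c);
    }
    return kept;
}
```

Because the list is sorted before filtering, every kept candidate scores at least as well as the ones it suppresses, which is how the score condition in the dominatedBy() comment could be satisfied without comparing scores inside the proximity test.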
--- a/aligner_swsse.cpp
+++ b/aligner_swsse.cpp
@ -0,0 +1,88 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <string.h>
 #include "aligner_sw_common.h"
 #include "aligner_swsse.h"
 /**
 * Given a number of rows (nrow), a number of columns (ncol), and the
 * number of words to fit inside a single __m128i vector, initialize the
 * matrix buffer to accommodate the needed configuration of vectors.
 */
 void SSEMatrix::init(
 	size_t nrow,
 	size_t ncol,
 	size_t wperv)
 {
 	nrow_ = nrow;
 	ncol_ = ncol;
 	wperv_ = wperv;
 	nvecPerCol_ = (nrow + (wperv-1)) / wperv;
 	// The +1 is so that we don't have to special-case the final column;
 	// instead, we just write off the end of the useful part of the table
 	// with pvEStore.
 	try {
 		matbuf_.resizeNoCopy((ncol+1) * nvecPerCell_ * nvecPerCol_);
 	} catch(exception& e) {
 		cerr << "Tried to allocate DP matrix with " << (ncol+1)
 		     << " columns, " << nvecPerCol_
 		     << " vectors per column, and " << nvecPerCell_
 		     << " vectors per cell" << endl;
 		throw; // rethrow the original exception without slicing it
 	}
 	assert(wperv_ == 8 || wperv_ == 16);
 	vecshift_ = (wperv_ == 8) ? 3 : 4;
 	nvecrow_ = (nrow + (wperv_-1)) >> vecshift_;
 	nveccol_ = ncol;
 	colstride_ = nvecPerCol_ * nvecPerCell_;
 	rowstride_ = nvecPerCell_;
 	inited_ = true;
 }
 /**
 * Initialize the matrix of masks and backtracking flags.
 */
 void SSEMatrix::initMasks() {
 	assert_gt(nrow_, 0);
 	assert_gt(ncol_, 0);
 	masks_.resize(nrow_);
 	reset_.resizeNoCopy(nrow_);
 	reset_.fill(false);
 }
 /**
 * Given a row, col and matrix (i.e. E, F or H), return the corresponding
 * element.
 */
 int SSEMatrix::eltSlow(size_t row, size_t col, size_t mat) const {
 	assert_lt(row, nrow_);
 	assert_lt(col, ncol_);
 	assert_leq(mat, 3);
 	// Move to beginning of column/row
 	size_t rowelt = row / nvecrow_;
 	size_t rowvec = row % nvecrow_;
 	size_t eltvec = (col * colstride_) + (rowvec * rowstride_) + mat;
 	if(wperv_ == 16) {
 		return (int)((uint8_t*)(matbuf_.ptr() + eltvec))[rowelt];
 	} else {
 		assert_eq(8, wperv_);
 		return (int)((int16_t*)(matbuf_.ptr() + eltvec))[rowelt];
 	}
 }
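
The index arithmetic in SSEMatrix::init and SSEMatrix::eltSlow can be checked without any SSE types. The sketch below only reproduces that arithmetic on plain integers; StripedIndex, stripedIndex and matrixVectors are hypothetical names introduced for illustration, and the layout (four vectors per cell, one extra column) is taken from the code above.

```cpp
#include <cassert>
#include <cstddef>

// Position of one element: which vector in the buffer, and which word
// ("lane") inside that vector.  Hypothetical helper type.
struct StripedIndex {
    std::size_t vec;
    std::size_t lane;
};

// Number of vectors allocated for the matrix, as in SSEMatrix::init:
// (ncol + 1) columns, 4 vectors per cell, ceil(nrow / wperv) cells per column.
inline std::size_t matrixVectors(std::size_t nrow, std::size_t ncol, std::size_t wperv) {
    assert(wperv == 8 || wperv == 16);
    const std::size_t nvecPerCell = 4;
    const std::size_t nvecPerCol  = (nrow + wperv - 1) / wperv;
    return (ncol + 1) * nvecPerCell * nvecPerCol;
}

// Map (row, col, mat) to a vector offset and lane, as in eltSlow.  The rows
// are striped: consecutive rows go to consecutive vectors, and the lane
// advances only after every vector in the column has received one row.
inline StripedIndex stripedIndex(
    std::size_t row, std::size_t col, std::size_t mat,
    std::size_t nrow, std::size_t wperv)
{
    assert(wperv == 8 || wperv == 16);
    assert(row < nrow);
    assert(mat < 4);
    const std::size_t nvecPerCell = 4;
    const std::size_t nvecrow     = (nrow + wperv - 1) / wperv;
    const std::size_t colstride   = nvecrow * nvecPerCell;
    const std::size_t rowstride   = nvecPerCell;
    StripedIndex idx;
    idx.lane = row / nvecrow;
    idx.vec  = col * colstride + (row % nvecrow) * rowstride + mat;
    return idx;
}
```

For a 20-row query with 8 words per vector there are 3 vectors per column, so rows 0, 1, 2 occupy lane 0 of vectors 0, 1, 2, row 3 wraps around to lane 1 of vector 0, and so on.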
--- a/aligner_swsse.h
+++ b/aligner_swsse.h
@ -0,0 +1,500 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNER_SWSSE_H_
 #define ALIGNER_SWSSE_H_
 #include "ds.h"
 #include "mem_ids.h"
 #include "random_source.h"
 #include "scoring.h"
 #include "mask.h"
 #include "sse_util.h"
 #include <string>
 struct SSEMetrics {
 	SSEMetrics():mutex_m() { reset(); }
 	void clear() { reset(); }
 	void reset() {
 		dp = dpsat = dpfail = dpsucc = 
 		col = cell = inner = fixup =
 		gathsol = bt = btfail = btsucc = btcell =
 		corerej = nrej = 0;
 	}
 	void merge(const SSEMetrics& o, bool getLock = false) {
 		ThreadSafe ts(&mutex_m, getLock);
 		dp       += o.dp;
 		dpsat    += o.dpsat;
 		dpfail   += o.dpfail;
 		dpsucc   += o.dpsucc;
 		col      += o.col;
 		cell     += o.cell;
 		inner    += o.inner;
 		fixup    += o.fixup;
 		gathsol  += o.gathsol;
 		bt       += o.bt;
 		btfail   += o.btfail;
 		btsucc   += o.btsucc;
 		btcell   += o.btcell;
 		corerej  += o.corerej;
 		nrej     += o.nrej;
 	}
 	uint64_t dp;       // DPs tried
 	uint64_t dpsat;    // DPs saturated
 	uint64_t dpfail;   // DPs failed
 	uint64_t dpsucc;   // DPs succeeded
 	uint64_t col;      // DP columns
 	uint64_t cell;     // DP cells
 	uint64_t inner;    // DP inner loop iters
 	uint64_t fixup;    // DP fixup loop iters
 	uint64_t gathsol;  // DP gather solution cells found
 	uint64_t bt;       // DP backtraces
 	uint64_t btfail;   // DP backtraces failed
 	uint64_t btsucc;   // DP backtraces succeeded
 	uint64_t btcell;   // DP backtrace cells traversed
 	uint64_t corerej;  // DP backtrace core rejections
 	uint64_t nrej;     // DP backtrace N rejections
 	MUTEX_T  mutex_m;
 };
 /**
 * Encapsulates matrix information calculated by the SSE aligner.
 *
 * Matrix memory is laid out as follows:
 *
 * - Elements (individual cell scores) are packed into __m128i vectors
 * - Vectors are packed into quartets, quartet elements correspond to: a vector
 *   from E, one from F, one from H, and one that's "reserved"
 * - Quartets are packed into columns, where the number of quartets is
 *   determined by the number of query characters divided by the number of
 *   elements per vector
 *
 * Regarding the "reserved" element of the vector quartet: we use it for two
 * things.  First, we use the first column of reserved vectors to stage the
 * initial column of H vectors.  Second, we use the "reserved" vectors during
 * the backtrace procedure to store information about (a) which cells have been
 * traversed, (b) whether the cell is "terminal" (in local mode), etc.
 */
 struct SSEMatrix {
 	// Each matrix element is a quartet of vectors.  These constants are used
 	// to identify members of the quartet.
 	const static size_t E   = 0;
 	const static size_t F   = 1;
 	const static size_t H   = 2;
 	const static size_t TMP = 3;
 	SSEMatrix(int cat = 0) : nvecPerCell_(4), matbuf_(cat) { }
 	/**
 	 * Return a pointer to the matrix buffer.
 	 */
 	inline __m128i *ptr() {
 		assert(inited_);
 		return matbuf_.ptr();
 	}
 	/**
 	 * Return a pointer to the E vector at the given row and column.  Note:
 	 * here row refers to rows of vectors, not rows of elements.
 	 */
 	inline __m128i* evec(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_lt(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + E;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Like evec, but it's allowed to ask for a pointer to one column after the
 	 * final one.
 	 */
 	inline __m128i* evecUnsafe(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_leq(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + E;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Return a pointer to the F vector at the given row and column.  Note:
 	 * here row refers to rows of vectors, not rows of elements.
 	 */
 	inline __m128i* fvec(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_lt(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + F;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Return a pointer to the H vector at the given row and column.  Note:
 	 * here row refers to rows of vectors, not rows of elements.
 	 */
 	inline __m128i* hvec(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_lt(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + H;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Return a pointer to the TMP vector at the given row and column.  Note:
 	 * here row refers to rows of vectors, not rows of elements.
 	 */
 	inline __m128i* tmpvec(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_lt(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + TMP;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Like tmpvec, but it's allowed to ask for a pointer to one column after
 	 * the final one.
 	 */
 	inline __m128i* tmpvecUnsafe(size_t row, size_t col) {
 		assert_lt(row, nvecrow_);
 		assert_leq(col, nveccol_);
 		size_t elt = row * rowstride() + col * colstride() + TMP;
 		assert_lt(elt, matbuf_.size());
 		return ptr() + elt;
 	}
 	/**
 	 * Given a number of rows (nrow), a number of columns (ncol), and the
 	 * number of words to fit inside a single __m128i vector, initialize the
 	 * matrix buffer to accommodate the needed configuration of vectors.
 	 */
 	void init(
 		size_t nrow,
 		size_t ncol,
 		size_t wperv);
 	/**
 	 * Return the number of __m128i's you need to skip over to get from one
 	 * cell to the cell one column over from it.
 	 */
 	inline size_t colstride() const { return colstride_; }
 	/**
 	 * Return the number of __m128i's you need to skip over to get from one
 	 * cell to the cell one row down from it.
 	 */
 	inline size_t rowstride() const { return rowstride_; }
 	/**
 	 * Given a row, col and matrix (i.e. E, F or H), return the corresponding
 	 * element.
 	 */
 	int eltSlow(size_t row, size_t col, size_t mat) const;
 	/**
 	 * Given a row, col and matrix (i.e. E, F or H), return the corresponding
 	 * element.
 	 */
 	inline int elt(size_t row, size_t col, size_t mat) const {
 		assert(inited_);
 		assert_lt(row, nrow_);
 		assert_lt(col, ncol_);
 		assert_lt(mat, 3);
 		// Move to beginning of column/row
 		size_t rowelt = row / nvecrow_;
 		size_t rowvec = row % nvecrow_;
 		size_t eltvec = (col * colstride_) + (rowvec * rowstride_) + mat;
 		assert_lt(eltvec, matbuf_.size());
 		if(wperv_ == 16) {
 			return (int)((uint8_t*)(matbuf_.ptr() + eltvec))[rowelt];
 		} else {
 			assert_eq(8, wperv_);
 			return (int)((int16_t*)(matbuf_.ptr() + eltvec))[rowelt];
 		}
 	}
 	/**
 	 * Return the element in the E matrix at element row, col.
 	 */
 	inline int eelt(size_t row, size_t col) const {
 		return elt(row, col, E);
 	}
 	/**
 	 * Return the element in the F matrix at element row, col.
 	 */
 	inline int felt(size_t row, size_t col) const {
 		return elt(row, col, F);
 	}
 	/**
 	 * Return the element in the H matrix at element row, col.
 	 */
 	inline int helt(size_t row, size_t col) const {
 		return elt(row, col, H);
 	}
 	/**
 	 * Return true iff the given cell has its reportedThru bit set.
 	 */
 	inline bool reportedThrough(
 		size_t row,          // current row
 		size_t col) const    // current column
 	{
 		return (masks_[row][col] & (1 << 0)) != 0;
 	}
 	/**
 	 * Set the given cell's reportedThru bit.
 	 */
 	inline void setReportedThrough(
 		size_t row,          // current row
 		size_t col)          // current column
 	{
 		masks_[row][col] |= (1 << 0);
 	}
 	/**
 	 * Return true iff the H mask has been set with a previous call to hMaskSet.
 	 */
 	bool isHMaskSet(
 		size_t row,          // current row
 		size_t col) const;   // current column
 	/**
 	 * Set the given cell's H mask.  This is the mask of remaining legal ways to
 	 * backtrack from the H cell at this coordinate.  It's 5 bits long and has
 	 * offset=2 into the 16-bit field.
 	 */
 	void hMaskSet(
 		size_t row,          // current row
 		size_t col,          // current column
 		int mask);
 	/**
 	 * Return true iff the E mask has been set with a previous call to eMaskSet.
 	 */
 	bool isEMaskSet(
 		size_t row,          // current row
 		size_t col) const;   // current column
 	/**
 	 * Set the given cell's E mask.  This is the mask of remaining legal ways to
 	 * backtrack from the E cell at this coordinate.  It's 2 bits long and has
 	 * offset=8 into the 16-bit field.
 	 */
 	void eMaskSet(
 		size_t row,          // current row
 		size_t col,          // current column
 		int mask);
 	/**
 	 * Return true iff the F mask has been set with a previous call to fMaskSet.
 	 */
 	bool isFMaskSet(
 		size_t row,          // current row
 		size_t col) const;   // current column
 	/**
 	 * Set the given cell's F mask.  This is the mask of remaining legal ways to
 	 * backtrack from the F cell at this coordinate.  It's 2 bits long and has
 	 * offset=11 into the 16-bit field.
 	 */
 	void fMaskSet(
 		size_t row,          // current row
 		size_t col,          // current column
 		int mask);
 	/**
 	 * Analyze a cell in the SSE-filled dynamic programming matrix.  Determine &
 	 * memorize ways that we can backtrack from the cell.  If there is at least one
 	 * way to backtrack, select one at random and return the selection.
 	 *
 	 * There are a few subtleties to keep in mind regarding which cells can be at
 	 * the end of a backtrace.  First of all: cells from which we can backtrack
 	 * should not be at the end of a backtrace.  But we have to distinguish between
 	 * cells whose masks eventually become 0 (we shouldn't end at those), from
 	 * those whose masks were 0 all along (we can end at those).
 	 */
 	void analyzeCell(
 		size_t row,          // current row
 		size_t col,          // current column
 		size_t ct,           // current cell type: E/F/H
 		int refc,
 		int readc,
 		int readq,
 		const Scoring& sc,   // scoring scheme
 		int64_t offsetsc,    // offset to add to each score
 		RandomSource& rand,  // rand gen for choosing among equal options
 		bool& empty,         // out: =true iff no way to backtrace
 		int& cur,            // out: =type of transition
 		bool& branch,        // out: =true iff we chose among >1 options
 		bool& canMoveThru,   // out: =true iff ...
 		bool& reportedThru); // out: =true iff ...
 	/**
 	 * Initialize the matrix of masks and backtracking flags.
 	 */
 	void initMasks();
 	/**
 	 * Return the number of rows in the dynamic programming matrix.
 	 */
 	size_t nrow() const {
 		return nrow_;
 	}
 	/**
 	 * Return the number of columns in the dynamic programming matrix.
 	 */
 	size_t ncol() const {
 		return ncol_;
 	}
 	/**
 	 * Prepare a row so we can use it to store masks.
 	 */
 	void resetRow(size_t i) {
 		assert(!reset_[i]);
 		masks_[i].resizeNoCopy(ncol_);
 		masks_[i].fillZero();
 		reset_[i] = true;
 	}
 	bool             inited_;      // initialized?
 	size_t           nrow_;        // # rows
 	size_t           ncol_;        // # columns
 	size_t           nvecrow_;     // # vector rows (<= nrow_)
 	size_t           nveccol_;     // # vector columns (<= ncol_)
 	size_t           wperv_;       // # words per vector
 	size_t           vecshift_;    // # bits to shift to divide by words per vec
 	size_t           nvecPerCol_;  // # vectors per column
 	size_t           nvecPerCell_; // # vectors per matrix cell (4)
 	size_t           colstride_;   // # vectors b/t adjacent cells in same row
 	size_t           rowstride_;   // # vectors b/t adjacent cells in same col
 	EList_m128i      matbuf_;      // buffer for holding vectors
 	ELList<uint16_t> masks_;       // buffer for masks/backtracking flags
 	EList<bool>      reset_;       // true iff row in masks_ has been reset
 };
 /**
 * All the data associated with the query profile and other data needed for SSE
 * alignment of a query.
 */
 struct SSEData {
 	SSEData(int cat = 0) : profbuf_(cat), mat_(cat) { }
 	EList_m128i    profbuf_;     // buffer for query profile & temp vecs
 	EList_m128i    vecbuf_;      // buffer for 2 column vectors (not using mat_)
 	size_t         qprofStride_; // stride for query profile
 	size_t         gbarStride_;  // gap barrier for query profile
 	SSEMatrix      mat_;         // SSE matrix for holding all E, F, H vectors
 	size_t         maxPen_;      // biggest penalty of all
 	size_t         maxBonus_;    // biggest bonus of all
 	size_t         lastIter_;    // which 128-bit striped word has final row?
 	size_t         lastWord_;    // which word within 128-bit word has final row?
 	int            bias_;        // all scores shifted up by this for unsigned
 };
 /**
 * Return true iff the H mask has been set with a previous call to hMaskSet.
 */
 inline bool SSEMatrix::isHMaskSet(
 	size_t row,          // current row
 	size_t col) const    // current column
 {
 	return (masks_[row][col] & (1 << 1)) != 0;
 }
 /**
 * Set the given cell's H mask.  This is the mask of remaining legal ways to
 * backtrack from the H cell at this coordinate.  It's 5 bits long and has
 * offset=2 into the 16-bit field.
 */
 inline void SSEMatrix::hMaskSet(
 	size_t row,          // current row
 	size_t col,          // current column
 	int mask)
 {
 	assert_lt(mask, 32);
 	// Clear the "set" flag (bit 1) and all 5 mask bits (bits 2-6)
 	masks_[row][col] &= ~(63 << 1);
 	masks_[row][col] |= (1 << 1 | mask << 2);
 }
 /**
 * Return true iff the E mask has been set with a previous call to eMaskSet.
 */
 inline bool SSEMatrix::isEMaskSet(
 	size_t row,          // current row
 	size_t col) const    // current column
 {
 	return (masks_[row][col] & (1 << 7)) != 0;
 }
 /**
 * Set the given cell's E mask.  This is the mask of remaining legal ways to
 * backtrack from the E cell at this coordinate.  It's 2 bits long and has
 * offset=8 into the 16-bit field.
 */
 inline void SSEMatrix::eMaskSet(
 	size_t row,          // current row
 	size_t col,          // current column
 	int mask)
 {
 	assert_lt(mask, 4);
 	masks_[row][col] &= ~(7 << 7);
 	masks_[row][col] |=  (1 << 7 | mask << 8);
 }
 /**
 * Return true iff the F mask has been set with a previous call to fMaskSet.
 */
 inline bool SSEMatrix::isFMaskSet(
 	size_t row,          // current row
 	size_t col) const    // current column
 {
 	return (masks_[row][col] & (1 << 10)) != 0;
 }
 /**
 * Set the given cell's F mask.  This is the mask of remaining legal ways to
 * backtrack from the F cell at this coordinate.  It's 2 bits long and has
 * offset=11 into the 16-bit field.
 */
 inline void SSEMatrix::fMaskSet(
 	size_t row,          // current row
 	size_t col,          // current column
 	int mask)
 {
 	assert_lt(mask, 4);
 	masks_[row][col] &= ~(7 << 10);
 	masks_[row][col] |=  (1 << 10 | mask << 11);
 }
 #define ROWSTRIDE_2COL 4
 #define ROWSTRIDE 4
 #endif /*ndef ALIGNER_SWSSE_H_*/
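
The per-cell 16-bit field used by masks_ can be exercised on its own. Reading the setters above, the layout appears to be: bit 0 reportedThru, bit 1 "H mask set", bits 2-6 H mask, bit 7 "E mask set", bits 8-9 E mask, bit 10 "F mask set", bits 11-12 F mask. The sketch below packs and unpacks that layout; the free functions and the getters (hMask, eMask, fMask) are hypothetical and are not part of SSEMatrix.

```cpp
#include <cassert>
#include <cstdint>

// Set the 5-bit H mask: clear the flag and all mask bits (bits 1-6), then
// store the flag in bit 1 and the mask in bits 2-6.
inline std::uint16_t setHMask(std::uint16_t cell, int mask) {
    assert(mask >= 0 && mask < 32);
    unsigned v = cell & ~(63u << 1);
    v |= (1u << 1) | (static_cast<unsigned>(mask) << 2);
    return static_cast<std::uint16_t>(v);
}

// Set the 2-bit E mask: flag in bit 7, mask in bits 8-9.
inline std::uint16_t setEMask(std::uint16_t cell, int mask) {
    assert(mask >= 0 && mask < 4);
    unsigned v = cell & ~(7u << 7);
    v |= (1u << 7) | (static_cast<unsigned>(mask) << 8);
    return static_cast<std::uint16_t>(v);
}

// Set the 2-bit F mask: flag in bit 10, mask in bits 11-12.
inline std::uint16_t setFMask(std::uint16_t cell, int mask) {
    assert(mask >= 0 && mask < 4);
    unsigned v = cell & ~(7u << 10);
    v |= (1u << 10) | (static_cast<unsigned>(mask) << 11);
    return static_cast<std::uint16_t>(v);
}

// Hypothetical getters for the three mask fields and their "set" flags.
inline int  hMask(std::uint16_t cell)    { return (cell >> 2)  & 31; }
inline int  eMask(std::uint16_t cell)    { return (cell >> 8)  & 3;  }
inline int  fMask(std::uint16_t cell)    { return (cell >> 11) & 3;  }
inline bool isHMaskSet(std::uint16_t c)  { return (c & (1u << 1))  != 0; }
inline bool isEMaskSet(std::uint16_t c)  { return (c & (1u << 7))  != 0; }
inline bool isFMaskSet(std::uint16_t c)  { return (c & (1u << 10)) != 0; }
```

Overwriting a field must clear every bit of the old value first; otherwise a stale high bit survives when a smaller mask is stored over a larger one, which is why the H setter clears six bits (the flag plus five mask bits).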
--- a/aligner_swsse_ee_i16.cpp
+++ b/aligner_swsse_ee_i16.cpp
--- a/aligner_swsse_ee_u8.cpp
+++ b/aligner_swsse_ee_u8.cpp
--- a/aligner_swsse_loc_i16.cpp
+++ b/aligner_swsse_loc_i16.cpp
--- a/aligner_swsse_loc_u8.cpp
+++ b/aligner_swsse_loc_u8.cpp
--- a/alignment_3n.cpp
+++ b/alignment_3n.cpp
@ -0,0 +1,193 @@
 /*
 * Copyright 2020, Yun (Leo) Zhang <imzhangyun@gmail.com>
 *
 * This file is part of HISAT-3N.
 *
 * HISAT-3N is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * HISAT-3N is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with HISAT-3N.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include "alignment_3n.h"
 #include "aln_sink.h"
 /**
 * Return true if the two locations are concordant.
 * Return false if they are not concordant or are too far apart (> maxPairDistance).
 */
 bool Alignment::isConcordant(long long int location1, bool &forward1, long long int readLength1, long long int location2, bool &forward2, long long int readLength2) {
    if (forward1 == forward2) // same direction
    {
        return false;
    }
    // adjust the location of the start of the read
    if (!forward1)
    {
        location1 = location1 + readLength1 - 1;
    }
    if (!forward2)
    {
        location2 = location2 + readLength2 - 1;
    }
    // return false if two reads are too far from each other
    if (abs(location1-location2) > maxPairDistance)
    {
        return false;
    }
    if (location1 == location2)
    {
        return true;
    }
    else if (location1 < location2)
    {
        if (forward1 && !forward2)
        {
            return true;
        }
    }
    else
    {
        if (!forward1 && forward2)
        {
            return true;
        }
    }
    return false;
 }
 /**
 * This is the basic function to calculate the DNA pair score.
 * If the distance between the two alignments is more than penaltyFreeDistance_DNA,
 * we reduce the score by distance/distancePenaltyFraction_DNA.
 * If the two alignments are concordant, we add concordantScoreBounce so that
 * the concordant pair is selected as the best pair.
 */
 int Alignment::calculatePairScore_DNA (long long int &location0, int& AS0, bool& forward0, long long int readLength0, long long int &location1, int &AS1, bool &forward1, long long int readLength1, bool& concordant) {
    int score = ASPenalty*AS0 + ASPenalty*AS1;
    int distance = abs(location0 - location1);
    if (distance > maxPairDistance) { return numeric_limits<int>::min(); }
    if (distance > penaltyFreeDistance_DNA) { score -= distance/distancePenaltyFraction_DNA; }
    concordant = isConcordant(location0, forward0, readLength0, location1, forward1, readLength1);
    if (concordant) { score += concordantScoreBounce; }
    return score;
 }
 /**
 * This is the basic function to calculate the RNA pair score.
 * If the distance between the two alignments is more than penaltyFreeDistance_RNA,
 * we reduce the score by distance/distancePenaltyFraction_RNA.
 * If the two alignments are concordant, we add concordantScoreBounce so that
 * the concordant pair is selected as the best pair.
 */
 int Alignment::calculatePairScore_RNA (long long int &location0, int& XM0, bool& forward0, long long int readLength0, long long int &location1, int &XM1, bool &forward1, long long int readLength1, bool& concordant) {
    // This is the basic function to calculate the pair score.
    // If the distance between the two alignments is more than 100,000, we reduce the score by distance/1000.
    // If the two alignments are concordant, we add 500,000 so that the concordant pair is selected as the best pair.
    int score = -ASPenalty*XM0 + -ASPenalty*XM1;
    int distance = abs(location0 - location1);
    if (distance > maxPairDistance) { return numeric_limits<int>::min(); }
    if (distance > penaltyFreeDistance_RNA) { score -= distance/distancePenaltyFraction_RNA; }
    concordant = isConcordant(location0, forward0, readLength0, location1, forward1, readLength1);
    if (concordant) { score += concordantScoreBounce; }
    return score;
 }
 /**
 * Calculate the pair score for a pair of alignment results. Outputs the pair score and the number of pairs.
 * Does not update the pairScore stored in either alignment.
 */
 int Alignment::calculatePairScore(Alignment *inputAlignment, int &nPair) {
    int pairScore = numeric_limits<int>::min();
    nPair = 0;
    if (pairSegment == inputAlignment->pairSegment){
        // when both alignment results come from the same pair segment, output the lowest score and set the number of pairs to zero.
        pairScore = numeric_limits<int>::min();
    } else if (!mapped && !inputAlignment->mapped) {
        // both unmapped.
        pairScore = numeric_limits<int>::min()/2 - 1;
    } else if (!mapped || !inputAlignment->mapped) {
        // one of the segment unmapped.
        pairScore = numeric_limits<int>::min()/2;
        nPair = 1;
    } else if ((!repeat && !inputAlignment->repeat)){
        // both mapped and (both non-repeat or not expand repeat)
        bool concordant;
        if (DNA) {
            pairScore = calculatePairScore_DNA(location,
                                               AS,
                                               forward,
                                               readSequence.length(),
                                               inputAlignment->location,
                                               inputAlignment->AS,
                                               inputAlignment->forward,
                                               inputAlignment->readSequence.length(),
                                               concordant);
        } else {
            pairScore = calculatePairScore_RNA(location,
                                               XM,
                                               forward,
                                               readSequence.length(),
                                               inputAlignment->location,
                                               inputAlignment->XM,
                                               inputAlignment->forward,
                                               inputAlignment->readSequence.length(),
                                               concordant);
        }
        setConcordant(concordant);
        inputAlignment->setConcordant(concordant);
        nPair = 1;
    }
    return pairScore;
 }
 void Alignments::reportStats_single(ReportingMetrics& met) {
    int nAlignment = alignmentPositions.nBestSingle;
    if (nAlignment == 0) {
        met.nunp_0++;
    } else {
        met.nunp_uni++;
        if (nAlignment == 1) { met.nunp_uni1++; }
        else { met.nunp_uni2++; }
    }
 }
 void Alignments::reportStats_paired(ReportingMetrics& met) {
    if (!alignmentPositions.concordantExist) {
        met.nconcord_0++;
        if (alignmentPositions.nBestPair == 0) {
            met.nunp_0_0 += 2;
            return;
        }
        if (alignmentPositions.bestPairScore == numeric_limits<int>::min()/2) {
            // one mate is unmapped, one mate is mapped
            met.nunp_0_0++;
            met.nunp_0_uni++;
            if (alignmentPositions.nBestPair == 1) { met.nunp_0_uni1++; }
            else { met.nunp_0_uni2++; }
        } else { // both mates are mapped
            if (alignmentPositions.nBestPair == 1) {
                met.ndiscord++;
                return;
            }
            else {
                met.nunp_0_uni += 2;
                met.nunp_0_uni2 += 2;
            }
        }
    } else {
        assert(alignmentPositions.nBestPair > 0);
        met.nconcord_uni++;
        if (alignmentPositions.nBestPair == 1) { met.nconcord_uni1++; }
        else { met.nconcord_uni2++; }
    }
 }
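
The pair-scoring rules in calculatePairScore_DNA and isConcordant can be followed on concrete numbers with a free-standing sketch. The version below keeps the same steps but takes its constants from a hypothetical PairParams struct; the field names and the values used in the usage note are illustrative assumptions, not the values HISAT-3N defines.

```cpp
#include <cassert>
#include <cstdlib>
#include <limits>

// Hypothetical bundle of the constants the scoring functions rely on.
struct PairParams {
    long long maxPairDistance;         // pairs farther apart than this are rejected
    long long penaltyFreeDistance;     // no distance penalty up to this distance
    long long distancePenaltyFraction; // penalty = distance / this value
    int       concordantBonus;         // added when the pair is concordant
    int       asPenalty;               // multiplier applied to each alignment score
};

// Mates are concordant if they are on opposite strands, close enough, and the
// forward mate starts at or before the (end-adjusted) reverse mate.
inline bool isConcordant(long long loc1, bool fwd1, long long len1,
                         long long loc2, bool fwd2, long long len2,
                         long long maxPairDistance)
{
    if(fwd1 == fwd2) return false;        // same strand
    if(!fwd1) loc1 += len1 - 1;           // use the read's own start position
    if(!fwd2) loc2 += len2 - 1;
    if(std::llabs(loc1 - loc2) > maxPairDistance) return false;
    if(loc1 == loc2) return true;
    return (loc1 < loc2) ? fwd1 : fwd2;   // leftmost mate must be forward
}

// Pair score following the steps of calculatePairScore_DNA.
inline int pairScoreDNA(long long loc0, int as0, bool fwd0, long long len0,
                        long long loc1, int as1, bool fwd1, long long len1,
                        const PairParams& p, bool& concordant)
{
    assert(p.distancePenaltyFraction > 0);
    concordant = false;
    int score = p.asPenalty * as0 + p.asPenalty * as1;
    const long long distance = std::llabs(loc0 - loc1);
    if(distance > p.maxPairDistance) return std::numeric_limits<int>::min();
    if(distance > p.penaltyFreeDistance) {
        score -= static_cast<int>(distance / p.distancePenaltyFraction);
    }
    concordant = isConcordant(loc0, fwd0, len0, loc1, fwd1, len1, p.maxPairDistance);
    if(concordant) score += p.concordantBonus;
    return score;
}
```

With the illustrative parameters {500000, 1000, 100, 500000, 2}, a forward mate at 1000 and a reverse mate at 6000 (both 100 bases, both with alignment score 0) are 5000 apart, so the pair pays a penalty of 50 and then gains the concordance bonus. The sketch keeps the distance in a long long rather than an int so that large coordinate differences are not truncated before the range check.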
--- a/alignment_3n.h
+++ b/alignment_3n.h
--- a/alignment_3n_table.h
+++ b/alignment_3n_table.h
@ -0,0 +1,287 @@
 /*
 * Copyright 2020, Yun (Leo) Zhang <imzhangyun@gmail.com>
 *
 * This file is part of HISAT-3N.
 *
 * HISAT-3N is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * HISAT-3N is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with HISAT-3N.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALIGNMENT_3N_TABLE_H
 #define ALIGNMENT_3N_TABLE_H
 #include <string>
 #include "utility_3n_table.h"
 extern bool uniqueOnly;
 extern bool multipleOnly;
 extern char convertFrom;
 extern char convertTo;
 extern char convertFromComplement;
 extern char convertToComplement;
 using namespace std;
 /**
 * the class to store information from one SAM line
 */
 class Alignment {
 public:
    string chromosome;
    long long int location;
    long long int mateLocation;
    int flag;
    bool mapped;
    char strand;
    string sequence;
    string quality;
    bool unique;
    string mapQ;
    int NH;
    vector<PosQuality> bases;
    CIGAR cigarString;
    MD_tag MD;
    unsigned long long readNameID;
    int sequenceCoveredLength; // sum of the segment lengths in cigarString
    bool overlap; // if the segment could overlap with the mate segment.
    bool paired;
    void initialize() {
        chromosome.clear();
        location = -1;
        mateLocation = -1;
        flag = -1;
        mapped = false;
        MD.initialize();
        cigarString.initialize();
        sequence.clear();
        quality.clear();
        unique = false;
        mapQ.clear();
        NH = -1;
        bases.clear();
        readNameID = 0;
        sequenceCoveredLength = 0;
        overlap = false;
        paired = false;
    }
    /**
     * Check whether inputLine contains the given tag at startPosition.
     */
    bool startWith(string* inputLine, int startPosition, string tag){
        // compare() clamps the length to the end of the string, so a field
        // shorter than the tag yields false instead of throwing as at() would.
        return inputLine->compare(startPosition, tag.size(), tag) == 0;
    }
    /**
     * generate a hash value for readName
     */
     void getNameHash(string& readName) {
        readNameID = 0;
        int a = 63689;
        for (size_t i = 0; i < readName.size(); i++) {
            readNameID = (readNameID * a) + (int)readName[i];
        }
     }
    /**
     * extract the information from SAM line to Alignment.
     */
     void parseInfo(string* line) {
        int startPosition = 0;
        int endPosition = 0;
        int count = 0;
        while ((endPosition = line->find("\t", startPosition)) != string::npos) {
            if (count == 0) {
                string readName = line->substr(startPosition, endPosition - startPosition);
                getNameHash(readName);
            } else if (count == 1) {
                flag = stoi(line->substr(startPosition, endPosition - startPosition));
                mapped = (flag & 4) == 0;
                paired = (flag & 1) != 0;
            } else if (count == 2) {
                chromosome = line->substr(startPosition, endPosition - startPosition);
            } else if (count == 3) {
                location = stoll(line->substr(startPosition, endPosition - startPosition));
            } else if (count == 4) {
                mapQ = line->substr(startPosition, endPosition - startPosition);
                if (mapQ == "1") {
                    unique = false;
                } else {
                    unique = true;
                }
            } else if (count == 5) {
                cigarString.loadString(line->substr(startPosition, endPosition - startPosition));
            } else if (count == 7) {
                mateLocation = stoll(line->substr(startPosition, endPosition - startPosition));
            } else if (count == 9) {
                sequence = line->substr(startPosition, endPosition - startPosition);
            } else if (count == 10) {
                quality = line->substr(startPosition, endPosition - startPosition);
            } else if (count > 10) {
                if (startWith(line, startPosition, "MD")) {
                    MD.loadString(line->substr(startPosition + 5, endPosition - startPosition - 5));
                } else if (startWith(line, startPosition, "NM")) {
                    NH = stoi(line->substr(startPosition + 5, endPosition - startPosition - 5));
                } else if (startWith(line, startPosition, "YZ")) {
                    strand = line->at(endPosition-1);
                }
            }
            startPosition = endPosition + 1;
            count++;
        }
        // Handle the last field, which has no trailing tab. endPosition is npos
        // here, so read from startPosition to the end of the line instead.
        if (startWith(line, startPosition, "MD")) {
            MD.loadString(line->substr(startPosition + 5));
        } else if (startWith(line, startPosition, "NM")) {
            NH = stoi(line->substr(startPosition + 5));
        } else if (startWith(line, startPosition, "YZ")) {
            strand = line->back();
        }
     }
     /**
      * Set overlap = true if the read is not uniquely mapped or if the read segment overlaps its mate.
      */
      void checkOverlap() {
          if (!unique) {
              overlap = true;
          } else {
              if (paired && (location + sequenceCoveredLength >= mateLocation)) {
                  overlap = true;
              } else {
                  overlap = false;
              }
          }
      }
    /**
     * parse the sam line to alignment information
     */
    void parse(string* line) {
        initialize();
        parseInfo(line);
        if ((uniqueOnly && !unique) || (multipleOnly && unique)) {
            return;
        }
        appendBase();
    }
    /**
     * Scan all bases in the read sequence and label those that qualify.
     */
    void appendBase() {
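        // Ignore unmapped reads and reads whose covered length exceeds 500,000.
        // Note: sequenceCoveredLength is only computed in adjustPos() below, so it
        // is still 0 (from initialize()) when this check runs via parse().
        if (!mapped || sequenceCoveredLength > 500000) {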
            return;
        }
        bases.reserve(sequence.size());
        for (int i = 0; i < sequence.size(); i++) {
            bases.emplace_back(i);
        }
        int pos = adjustPos();
        string match;
        while (MD.getNextSegment(match)) {
            if (isdigit(match.front())) { // a leading digit marks a run of matching bases
                int len = stoi(match);
                for (int i = 0; i < len; i++) {
                    while (bases[pos].remove) {
                        pos++;
                    }
                    if ((strand == '+' && sequence[pos] == convertFrom) ||
                        (strand == '-' && sequence[pos] == convertFromComplement)) {
                        bases[pos].setQual(quality[pos], false);
                    } else {
                        bases[pos].remove = true;
                    }
                    pos ++;
                }
            } else if (isalpha(match.front())) { // this is mismatch or conversion
                char refBase = match.front();
                // for + strand, it should have C->T change
                // for - strand, it should have G->A change
                while (bases[pos].remove) {
                    pos++;
                }
                if ((strand == '+' && refBase == convertFrom && sequence[pos] == convertTo) ||
                    (strand == '-' && refBase == convertFromComplement && sequence[pos] == convertToComplement)){
                    bases[pos].setQual(quality[pos], true);
                } else {
                    bases[pos].remove = true;
                }
                pos ++;
            } else { // deletion. do nothing.
            }
        }
    }
    /**
     * adjust the reference position in bases
     */
    int  adjustPos() {
        int readPos = 0;
        int returnPos = 0;
        int seqLength = sequence.size();
        char cigarSymbol;
        int cigarLen;
        sequenceCoveredLength = 0;
        while (cigarString.getNextSegment(cigarLen, cigarSymbol)) {
            sequenceCoveredLength += cigarLen;
            if (cigarSymbol == 'S') {
                if (readPos == 0) { // soft clip is at the beginning of the read
                    returnPos = cigarLen;
                    for (int i = cigarLen; i < seqLength; i++) {
                        bases[i].refPos -= cigarLen;
                    }
                } else { // soft clip is at the end of the read
                    // do nothing
                }
                readPos += cigarLen;
            } else if (cigarSymbol == 'N') {
                for (int i = readPos; i < seqLength; i++) {
                    bases[i].refPos += cigarLen;
                }
            } else if (cigarSymbol == 'M') {
                for (int i = readPos; i < readPos+cigarLen; i++) {
                    bases[i].remove = false;
                }
                readPos += cigarLen;
            } else if (cigarSymbol == 'I') {
                for (int i = readPos + cigarLen; i < seqLength; i++) {
                    bases[i].refPos -= cigarLen;
                }
                readPos += cigarLen;
            } else if (cigarSymbol == 'D') {
                for (int i = readPos; i < seqLength; i++) {
                    bases[i].refPos += cigarLen;
                }
            }
        }
        return returnPos;
    }
 };
 #endif //ALIGNMENT_3N_TABLE_H
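// Illustrative sketch, not part of HISAT-3N: the CIGAR walk in adjustPos()
// can be pictured as mapping every read base to an offset from the alignment
// start on the reference. The helper name referenceOffsets and the -1 marker
// for bases without a reference position are assumptions made for this
// example. It also accepts '=', 'X', 'H' and 'P', which adjustPos() does not
// handle.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: for each read base, return its 0-based offset from the
// alignment start on the reference, or -1 when the base is soft-clipped or
// inserted and therefore has no reference position.
std::vector<long long> referenceOffsets(const std::string& cigar) {
    std::vector<long long> offsets;
    long long refOffset = 0; // reference bases consumed so far
    long long len = 0;       // length of the segment being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            len = len * 10 + (c - '0');
            continue;
        }
        switch (c) {
            case 'M': case '=': case 'X': // consume read and reference
                for (long long i = 0; i < len; i++) offsets.push_back(refOffset++);
                break;
            case 'I': case 'S':           // consume read only
                for (long long i = 0; i < len; i++) offsets.push_back(-1);
                break;
            case 'D': case 'N':           // consume reference only
                refOffset += len;
                break;
            default:                      // 'H' and 'P' consume neither
                break;
        }
        len = 0;
    }
    assert(len == 0 && "CIGAR must end with an operation");
    return offsets;
}
```

// For the CIGAR "2S3M1I2M2D3M" the eight aligned bases receive offsets
// 0, 1, 2, 3, 4, 7, 8, 9, and the two clipped bases and the one inserted
// base receive -1.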
--- a/aln_sink.cpp
+++ b/aln_sink.cpp
@ -0,0 +1,785 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <iomanip>
 #include <limits>
 #include "aln_sink.h"
 #include "aligner_seed.h"
 #include "util.h"
 using namespace std;
 /**
 * Initialize state machine with a new read.  The state we start in depends
 * on whether it's paired-end or unpaired.
 */
 void ReportingState::nextRead(bool paired) {
 	paired_ = paired;
 	if(paired) {
 		state_ = CONCORDANT_PAIRS;
 		doneConcord_ = false;
 		doneDiscord_ = p_.discord ? false : true;
 		doneUnpair1_ = p_.mixed   ? false : true;
 		doneUnpair2_ = p_.mixed   ? false : true;
 		exitConcord_ = ReportingState::EXIT_DID_NOT_EXIT;
 		exitDiscord_ = p_.discord ?
 			ReportingState::EXIT_DID_NOT_EXIT :
 			ReportingState::EXIT_DID_NOT_ENTER;
 		exitUnpair1_ = p_.mixed   ?
 			ReportingState::EXIT_DID_NOT_EXIT :
 			ReportingState::EXIT_DID_NOT_ENTER;
 		exitUnpair2_ = p_.mixed   ?
 			ReportingState::EXIT_DID_NOT_EXIT :
 			ReportingState::EXIT_DID_NOT_ENTER;
 	} else {
 		// Unpaired
 		state_ = UNPAIRED;
 		doneConcord_ = true;
 		doneDiscord_ = true;
 		doneUnpair1_ = false;
 		doneUnpair2_ = true;
 		exitConcord_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
 		exitDiscord_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
 		exitUnpair1_ = ReportingState::EXIT_DID_NOT_EXIT;
 		exitUnpair2_ = ReportingState::EXIT_DID_NOT_ENTER; // not relevant
 	}
 	doneUnpair_ = doneUnpair1_ && doneUnpair2_;
 	done_ = false;
 	nconcord_ = ndiscord_ = nunpair1_ = nunpair2_ = 0;
    nunpairRepeat1_ = nunpairRepeat2_ = 0;
    concordBest_ = getMinScore();
 }
 /**
 * Caller uses this member function to indicate that one additional
 * concordant alignment has been found.
 */
 bool ReportingState::foundConcordant(TAlScore score) {
 	assert(paired_);
 	assert_geq(state_, ReportingState::CONCORDANT_PAIRS);
 	assert(!doneConcord_);
    if(score > concordBest_) {
        concordBest_ = score;
        nconcord_ = 0;
    }
 	nconcord_++;
    // DK CONCORDANT - debugging purposes
 	// areDone(nconcord_, doneConcord_, exitConcord_);
 	// No need to search for discordant alignments if there are one or more
 	// concordant alignments.
 	doneDiscord_ = true;
 	exitDiscord_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
 	if(doneConcord_) {
 		// If we're finished looking for concordant alignments, do we have to
 		// continue on to search for unpaired alignments?  Only if our exit
 		// from the concordant stage is EXIT_SHORT_CIRCUIT_M.  If it's
 		// EXIT_SHORT_CIRCUIT_k or EXIT_WITH_ALIGNMENTS, we can skip unpaired.
 		assert_neq(ReportingState::EXIT_NO_ALIGNMENTS, exitConcord_);
 		if(exitConcord_ != ReportingState::EXIT_SHORT_CIRCUIT_M) {
 			if(!doneUnpair1_) {
 				doneUnpair1_ = true;
 				exitUnpair1_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
 			}
 			if(!doneUnpair2_) {
 				doneUnpair2_ = true;
 				exitUnpair2_ = ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED;
 			}
 		}
 	}
 	updateDone();
 	return done();
 }
 /**
 * Caller uses this member function to indicate that one additional unpaired
 * mate alignment has been found for the specified mate.
 */
 bool ReportingState::foundUnpaired(bool mate1, bool repeat) {
 	assert_gt(state_, ReportingState::NO_READ);
 	// Note: it's not right to assert !doneUnpair1_/!doneUnpair2_ here.
 	// Even if we're done with finding 
 	if(mate1) {
 		nunpair1_++;
        if(repeat) {
            nunpairRepeat1_++;
        }
 		// Did we just finish with this mate?
 		if(!doneUnpair1_) {
 			areDone(nunpair1_, doneUnpair1_, exitUnpair1_);
 			if(doneUnpair1_) {
 				doneUnpair_ = doneUnpair1_ && doneUnpair2_;
 				updateDone();
 			}
 		}
 		if(nunpair1_ > 1) {
 			doneDiscord_ = true;
 			exitDiscord_ = ReportingState::EXIT_NO_ALIGNMENTS;
 		}
 	} else {
 		nunpair2_++;
        if(repeat) {
            nunpairRepeat2_++;
        }
 		// Did we just finish with this mate?
 		if(!doneUnpair2_) {
 			areDone(nunpair2_, doneUnpair2_, exitUnpair2_);
 			if(doneUnpair2_) {
 				doneUnpair_ = doneUnpair1_ && doneUnpair2_;
 				updateDone();
 			}
 		}
 		if(nunpair2_ > 1) {
 			doneDiscord_ = true;
 			exitDiscord_ = ReportingState::EXIT_NO_ALIGNMENTS;
 		}
 	}
 	return done();
 }
 /**
 * Called to indicate that the aligner has finished searching for
 * alignments.  This gives us a chance to finalize our state.
 *
 * TODO: Keep track of short-circuiting information.
 */
 void ReportingState::finish() {
 	if(!doneConcord_) {
 		doneConcord_ = true;
 		exitConcord_ =
 			((nconcord_ > 0) ?
 				ReportingState::EXIT_WITH_ALIGNMENTS :
 				ReportingState::EXIT_NO_ALIGNMENTS);
 	}
 	assert_gt(exitConcord_, EXIT_DID_NOT_EXIT);
 	if(!doneUnpair1_) {
 		doneUnpair1_ = true;
 		exitUnpair1_ =
 			((nunpair1_ > 0) ?
 				ReportingState::EXIT_WITH_ALIGNMENTS :
 				ReportingState::EXIT_NO_ALIGNMENTS);
 	}
 	assert_gt(exitUnpair1_, EXIT_DID_NOT_EXIT);
 	if(!doneUnpair2_) {
 		doneUnpair2_ = true;
 		exitUnpair2_ =
 			((nunpair2_ > 0) ?
 				ReportingState::EXIT_WITH_ALIGNMENTS :
 				ReportingState::EXIT_NO_ALIGNMENTS);
 	}
 	assert_gt(exitUnpair2_, EXIT_DID_NOT_EXIT);
 	if(!doneDiscord_) {
 		// Check if the unpaired alignments should be converted to a single
 		// discordant paired-end alignment.
 		assert_eq(0, ndiscord_);
 		if(nconcord_ == 0 && nunpair1_ == 1 && nunpair2_ == 1) {
 			convertUnpairedToDiscordant();
 		}
 		doneDiscord_ = true;
 		exitDiscord_ =
 			((ndiscord_ > 0) ?
 				ReportingState::EXIT_WITH_ALIGNMENTS :
 				ReportingState::EXIT_NO_ALIGNMENTS);
 	}
 	assert(!paired_ || exitDiscord_ > ReportingState::EXIT_DID_NOT_EXIT);
 	doneUnpair_ = done_ = true;
 	assert(done());
 }
 /**
 * Populate given counters with the number of various kinds of alignments
 * to report for this read.  Concordant alignments are preferable to (and
 * mutually exclusive with) discordant alignments, and paired-end
 * alignments are preferable to unpaired alignments.
 *
 * The caller also needs some additional information for the case where a
 * pair or unpaired read aligns repetitively.  If the read is paired-end
 * and the paired-end has repetitive concordant alignments, that should be
 * reported, and 'pairMax' is set to true to indicate this.  If the read is
 * paired-end, does not have any concordant alignments, but does have
 * repetitive alignments for one or both mates, then that should be
 * reported, and 'unpair1Max' and 'unpair2Max' are set accordingly.
 *
 * Note that it's possible in the case of a paired-end read for the read to
 * have repetitive concordant alignments, but for one mate to have a unique
 * unpaired alignment.
 */
 void ReportingState::getReport(
 	uint64_t& nconcordAln, // # concordant alignments to report
 	uint64_t& ndiscordAln, // # discordant alignments to report
 	uint64_t& nunpair1Aln, // # unpaired alignments for mate #1 to report
 	uint64_t& nunpair2Aln, // # unpaired alignments for mate #2 to report
    uint64_t& nunpairRepeat1Aln, // # unpaired repeat alignments for mate #1 to report
    uint64_t& nunpairRepeat2Aln, // # unpaired repeat alignments for mate #2 to report
 	bool& pairMax,         // repetitive concordant alignments
 	bool& unpair1Max,      // repetitive alignments for mate #1
 	bool& unpair2Max)      // repetitive alignments for mate #2
 	const
 {
 	nconcordAln = ndiscordAln = nunpair1Aln = nunpair2Aln = 0;
    nunpairRepeat1Aln = nunpairRepeat2Aln = 0;
 	pairMax = unpair1Max = unpair2Max = false;
 	assert_gt(p_.khits, 0);
 	assert_gt(p_.mhits, 0);
 	if(paired_) {
 		// Do we have 1 or more concordant alignments to report?
 		if(exitConcord_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
 			// k at random
 			assert_geq(nconcord_, (uint64_t)p_.khits);
 			nconcordAln = p_.khits;
 			return;
 		} else if(exitConcord_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
 			assert(p_.msample);
 			assert_gt(nconcord_, 0);
 			pairMax = true;  // repetitive concordant alignments
 			if(p_.mixed) {
 				unpair1Max = nunpair1_ > (uint64_t)p_.mhits;
 				unpair2Max = nunpair2_ > (uint64_t)p_.mhits;
 			}
 			// Not sure if this is OK
 			nconcordAln = 1; // 1 at random
 			return;
 		} else if(exitConcord_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
 			assert_gt(nconcord_, 0);
 			// <= k at random
            nconcordAln = min<uint64_t>(p_.khits, nconcord_);
 		}
 		assert(!p_.mhitsSet() || nconcord_ <= (uint64_t)p_.mhits+1);
 		// Do we have a discordant alignment to report?
 		if(exitDiscord_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
 			// Report discordant
 			assert(p_.discord);
 			ndiscordAln = 1;
 			return;
 		}
 	}
 	assert_neq(ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED, exitUnpair1_);
 	assert_neq(ReportingState::EXIT_SHORT_CIRCUIT_TRUMPED, exitUnpair2_);
 	if((paired_ && !p_.mixed) || nunpair1_ + nunpair2_ == 0) {
 		// Unpaired alignments either not reportable or nonexistent
 		return;
 	}
 	// Do we have 1 or more alignments for mate #1 to report?
 	if(exitUnpair1_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
 		// k at random
 		assert_geq(nunpair1_, (uint64_t)p_.khits);
 		nunpair1Aln = p_.khits;
 	} else if(exitUnpair1_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
 		assert(p_.msample);
 		assert_gt(nunpair1_, 0);
 		unpair1Max = true;  // repetitive alignments for mate #1
 		nunpair1Aln = 1; // 1 at random
 	} else if(exitUnpair1_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
 		assert_gt(nunpair1_, 0);
 		// <= k at random
 		nunpair1Aln = min<uint64_t>(nunpair1_, (uint64_t)p_.khits);
 	}
 	assert(!p_.mhitsSet() || paired_ || nunpair1_ <= (uint64_t)p_.mhits+1);
    if(p_.repeat) nunpairRepeat1Aln = nunpairRepeat1_;
 	// Do we have 1 or more alignments for mate #2 to report?
 	if(exitUnpair2_ == ReportingState::EXIT_SHORT_CIRCUIT_k) {
 		// k at random
 		nunpair2Aln = p_.khits;
 	} else if(exitUnpair2_ == ReportingState::EXIT_SHORT_CIRCUIT_M) {
 		assert(p_.msample);
 		assert_gt(nunpair2_, 0);
 		unpair2Max = true;  // repetitive alignments for mate #2
 		nunpair2Aln = 1; // 1 at random
 	} else if(exitUnpair2_ == ReportingState::EXIT_WITH_ALIGNMENTS) {
 		assert_gt(nunpair2_, 0);
 		// <= k at random
 		nunpair2Aln = min<uint64_t>(nunpair2_, (uint64_t)p_.khits);
 	}
 	assert(!p_.mhitsSet() || paired_ || nunpair2_ <= (uint64_t)p_.mhits+1);
    if(p_.repeat) nunpairRepeat2Aln = nunpairRepeat2_;
 }
 /**
 * Given the number of alignments in a category, check whether we
 * short-circuited out of the category.  Set the done and exit arguments to
 * indicate whether and how we short-circuited.
 */
 inline void ReportingState::areDone(
 	uint64_t cnt,    // # alignments in category
 	bool& done,      // out: whether we short-circuited out of category
 	int& exit) const // out: if done, how we short-circuited (-k? -m? etc)
 {
 	assert(!done);
 	// Have we reached the -k limit?
 	assert_gt(p_.khits, 0);
 	assert_gt(p_.mhits, 0);
 	if(cnt >= (uint64_t)p_.khits && !p_.mhitsSet()) {
 		done = true;
 		exit = ReportingState::EXIT_SHORT_CIRCUIT_k;
 	}
 	// Have we exceeded the -m or -M limit?
 	else if(p_.mhitsSet() && cnt > (uint64_t)p_.mhits) {
 		done = true;
 		assert(p_.msample);
 		exit = ReportingState::EXIT_SHORT_CIRCUIT_M;
 	}
 }
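// Illustrative sketch, not part of this file: the two thresholds tested by
// areDone() differ by one. Without an -m/-M ceiling the category is done as
// soon as the count reaches khits, whereas with a ceiling it is done only
// once the count exceeds mhits. ExitKind, classifyCount and the mCeilingSet
// flag (standing in for p_.mhitsSet(), whose definition is not shown here)
// are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the exit codes used by ReportingState.
enum class ExitKind { None, ShortCircuitK, ShortCircuitM };

// Classify a running alignment count using the same comparisons as areDone():
//  - no ceiling set: stop when cnt >= khits
//  - ceiling set:    stop when cnt >  mhits
ExitKind classifyCount(std::uint64_t cnt,
                       std::uint64_t khits,
                       std::uint64_t mhits,
                       bool mCeilingSet) {
    assert(khits > 0);
    if (!mCeilingSet && cnt >= khits) return ExitKind::ShortCircuitK;
    if (mCeilingSet && cnt > mhits)   return ExitKind::ShortCircuitM;
    return ExitKind::None;
}
```

// With khits = 2 and no ceiling, the second alignment ends the search.
// With mhits = 3 and the ceiling set, the search ends at the fourth.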
 #ifdef ALN_SINK_MAIN
 #include <iostream>
 bool testDones(
 	const ReportingState& st,
 	bool done1,
 	bool done2,
 	bool done3,
 	bool done4,
 	bool done5,
 	bool done6)
 {
 	assert(st.doneConcordant()    == done1);
 	assert(st.doneDiscordant()    == done2);
 	assert(st.doneUnpaired(true)  == done3);
 	assert(st.doneUnpaired(false) == done4);
 	assert(st.doneUnpaired()      == done5);
 	assert(st.done()              == done6);
 	assert(st.repOk());
 	return true;
 }
 int main(void) {
 	cerr << "Case 1 (simple unpaired 1) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			0,      // mhits
 			0,      // pengap
 			false,  // msample
 			false,  // discord
 			false); // mixed
 		ReportingState st(rp);
 		st.nextRead(false); // unpaired read
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, true, true, true, true));
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(0, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(2, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(2, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(!unpair1Max);
 		assert(!unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 2 (simple unpaired 1) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			3,      // mhits
 			0,      // pengap
 			false,  // msample
 			false,  // discord
 			false); // mixed
 		ReportingState st(rp);
 		st.nextRead(false); // unpaired read
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, false, true, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(0, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(0, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(unpair1Max);
 		assert(!unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 3 (simple paired 1) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			3,      // mhits
 			0,      // pengap
 			false,  // msample
 			false,  // discord
 			false); // mixed
 		ReportingState st(rp);
 		st.nextRead(true); // paired-end read
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(4, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(4, st.numUnpaired2());
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(4, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(4, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(pairMax);
 		assert(!unpair1Max); // because !mixed
 		assert(!unpair2Max); // because !mixed
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 4 (simple paired 2) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			3,      // mhits
 			0,      // pengap
 			false,  // msample
 			true,   // discord
 			true);  // mixed
 		ReportingState st(rp);
 		st.nextRead(true); // paired-end read
 		assert(testDones(st, false, false, false, false, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, false, false, false, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, false, false, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, false, false, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, false, false, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, false, false, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, false, false, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, false, false, false));
 		st.foundUnpaired(false);
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, false, true, true, true, true, false));
 		st.foundConcordant();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(4, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(4, st.numUnpaired2());
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(4, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(4, st.numUnpaired1());
 		assert_eq(4, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(pairMax);
 		assert(unpair1Max);
 		assert(unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 5 (potential discordant after concordant) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			3,      // mhits
 			0,      // pengap
 			false,  // msample
 			true,   // discord
 			true);  // mixed
 		ReportingState st(rp);
 		st.nextRead(true);
 		assert(testDones(st, false, false, false, false, false, false));
 		st.foundUnpaired(true);
 		st.foundUnpaired(false);
 		st.foundConcordant();
 		assert(testDones(st, false, true, false, false, false, false));
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(1, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(1, st.numUnpaired1());
 		assert_eq(1, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(1, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(!unpair1Max);
 		assert(!unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 6 (true discordant) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			2,      // khits
 			3,      // mhits
 			0,      // pengap
 			false,  // msample
 			true,   // discord
 			true);  // mixed
 		ReportingState st(rp);
 		st.nextRead(true);
 		assert(testDones(st, false, false, false, false, false, false));
 		st.foundUnpaired(true);
 		st.foundUnpaired(false);
 		assert(testDones(st, false, false, false, false, false, false));
 		st.finish();
 		assert(testDones(st, true, true, true, true, true, true));
 		assert_eq(0, st.numConcordant());
 		assert_eq(1, st.numDiscordant());
 		assert_eq(0, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		assert(st.repOk());
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(1, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(!unpair1Max);
 		assert(!unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 7 (unaligned pair & uniquely aligned mate, mixed-mode) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			1,      // khits
 			1,      // mhits
 			0,      // pengap
 			false,  // msample
 			true,   // discord
 			true);  // mixed
 		ReportingState st(rp);
 		st.nextRead(true); // paired-end read
 		// assert(st.doneConcordant()    == done1);
 		// assert(st.doneDiscordant()    == done2);
 		// assert(st.doneUnpaired(true)  == done3);
 		// assert(st.doneUnpaired(false) == done4);
 		// assert(st.doneUnpaired()      == done5);
 		// assert(st.done()              == done6);
 		st.foundUnpaired(true);
 		assert(testDones(st, false, false, false, false, false, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, false, false, false));
 		assert_eq(0, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(2, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		st.finish();
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(unpair1Max);
 		assert(!unpair2Max);
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 8 (unaligned pair & uniquely aligned mate, NOT mixed-mode) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			1,      // khits
 			1,      // mhits
 			0,      // pengap
 			false,  // msample
 			true,   // discord
 			false); // mixed
 		ReportingState st(rp);
 		st.nextRead(true); // paired read
 		// assert(st.doneConcordant()    == done1);
 		// assert(st.doneDiscordant()    == done2);
 		// assert(st.doneUnpaired(true)  == done3);
 		// assert(st.doneUnpaired(false) == done4);
 		// assert(st.doneUnpaired()      == done5);
 		// assert(st.done()              == done6);
 		st.foundUnpaired(true);
 		assert(testDones(st, false, false, true, true, true, false));
 		st.foundUnpaired(true);
 		assert(testDones(st, false, true, true, true, true, false));
 		assert_eq(0, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(2, st.numUnpaired1());
 		assert_eq(0, st.numUnpaired2());
 		st.finish();
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(0, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(!pairMax);
 		assert(!unpair1Max); // not really relevant
 		assert(!unpair2Max); // not really relevant
 	}
 	cerr << "PASSED" << endl;
 	cerr << "Case 9 (repetitive pair, only one mate repetitive) ... ";
 	{
 		uint64_t nconcord = 0, ndiscord = 0, nunpair1 = 0, nunpair2 = 0;
 		bool pairMax = false, unpair1Max = false, unpair2Max = false;
 		ReportingParams rp(
 			1,      // khits
 			1,      // mhits
 			0,      // pengap
 			true,   // msample
 			true,   // discord
 			true);  // mixed
 		ReportingState st(rp);
 		st.nextRead(true); // paired read
 		// assert(st.doneConcordant()    == done1);
 		// assert(st.doneDiscordant()    == done2);
 		// assert(st.doneUnpaired(true)  == done3);
 		// assert(st.doneUnpaired(false) == done4);
 		// assert(st.doneUnpaired()      == done5);
 		// assert(st.done()              == done6);
 		st.foundConcordant();
 		assert(st.repOk());
 		st.foundUnpaired(true);
 		assert(st.repOk());
 		st.foundUnpaired(false);
 		assert(st.repOk());
 		assert(testDones(st, false, true, false, false, false, false));
 		assert(st.repOk());
 		st.foundConcordant();
 		assert(st.repOk());
 		st.foundUnpaired(true);
 		assert(st.repOk());
 		assert(testDones(st, true, true, true, false, false, false));
 		assert_eq(2, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(2, st.numUnpaired1());
 		assert_eq(1, st.numUnpaired2());
 		st.foundUnpaired(false);
 		assert(st.repOk());
 		assert(testDones(st, true, true, true, true, true, true));		
 		assert_eq(2, st.numConcordant());
 		assert_eq(0, st.numDiscordant());
 		assert_eq(2, st.numUnpaired1());
 		assert_eq(2, st.numUnpaired2());
 		st.finish();
 		st.getReport(nconcord, ndiscord, nunpair1, nunpair2,
 		             pairMax, unpair1Max, unpair2Max);
 		assert_eq(1, nconcord);
 		assert_eq(0, ndiscord);
 		assert_eq(0, nunpair1);
 		assert_eq(0, nunpair2);
 		assert(pairMax);
 		assert(unpair1Max); // not really relevant
 		assert(unpair2Max); // not really relevant
 	}
 	cerr << "PASSED" << endl;
 }
 #endif /*def ALN_SINK_MAIN*/
--- a/aln_sink.h
+++ b/aln_sink.h
--- a/alphabet.cpp
+++ b/alphabet.cpp
@ -0,0 +1,536 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <stdint.h>
 #include <cassert>
 #include <string>
 #include "alphabet.h"
 using namespace std;
 /**
 * Mapping from ASCII characters to DNA categories:
 *
 * 0 = invalid - error
 * 1 = DNA
 * 2 = IUPAC (ambiguous DNA)
 * 3 = not an error, but unmatchable; alignments containing this
 *     character are invalid
 */
 uint8_t asc2dnacat[] = {
 	/*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,
 	       /*                                        - */
 	/*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  64 */ 0, 1, 2, 1, 2, 0, 0, 1, 2, 0, 0, 2, 0, 2, 2, 0,
 	       /*    A  B  C  D        G  H        K     M  N */
 	/*  80 */ 0, 0, 2, 2, 1, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0,
 	       /*       R  S  T     V  W  X  Y */
 	/*  96 */ 0, 1, 2, 1, 2, 0, 0, 1, 2, 0, 0, 2, 0, 2, 2, 0,
 	       /*    a  b  c  d        g  h        k     m  n */
 	/* 112 */ 0, 0, 2, 2, 1, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0,
 	       /*       r  s  t     v  w  x  y */
 	/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 // 5-bit pop count
 int mask2popcnt[] = {
 	0, 1, 1, 2, 1, 2, 2, 3,
 	1, 2, 2, 3, 2, 3, 3, 4,
 	1, 2, 2, 3, 2, 3, 3, 4,
 	2, 3, 3, 4, 3, 4, 4, 5
 };
 /**
 * Mapping from masks to ASCII characters for ambiguous nucleotides.
 */
 char mask2dna[] = {
 	'?', // 0
 	'A', // 1
 	'C', // 2
 	'M', // 3
 	'G', // 4
 	'R', // 5
 	'S', // 6
 	'V', // 7
 	'T', // 8
 	'W', // 9
 	'Y', // 10
 	'H', // 11
 	'K', // 12
 	'D', // 13
 	'B', // 14
 	'N', // 15 (inclusive N)
 	'N'  // 16 (exclusive N)
 };
 /**
 * Mapping from ASCII characters for ambiguous nucleotides into masks:
 */
 uint8_t asc2dnamask[] = {
 	/*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  64 */ 0, 1,14, 2,13, 0, 0, 4,11, 0, 0,12, 0, 3,15, 0,
 	       /*    A  B  C  D        G  H        K     M  N */
 	/*  80 */ 0, 0, 5, 6, 8, 0, 7, 9, 0,10, 0, 0, 0, 0, 0, 0,
 	       /*       R  S  T     V  W     Y */
 	/*  96 */ 0, 1,14, 2,13, 0, 0, 4,11, 0, 0,12, 0, 3,15, 0,
 	       /*    a  b  c  d        g  h        k     m  n */
 	/* 112 */ 0, 0, 5, 6, 8, 0, 7, 9, 0,10, 0, 0, 0, 0, 0, 0,
 	       /*       r  s  t     v  w     y */
 	/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
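 // Illustration only, not part of the original file. The mask values in
 // asc2dnamask appear to follow a one-bit-per-base scheme (A=1, C=2, G=4,
 // T=8), with each IUPAC code being the OR of the bases it stands for.
 // The sketch below rebuilds a mask from that assumption; maskFromBases
 // is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: OR together one bit per base (A=1, C=2, G=4, T=8)
// to get the 4-bit mask for the set of bases an IUPAC code stands for.
static uint8_t maskFromBases(const std::string& bases) {
	uint8_t mask = 0;
	for (size_t i = 0; i < bases.size(); i++) {
		switch (bases[i]) {
		case 'A': mask |= 1; break;
		case 'C': mask |= 2; break;
		case 'G': mask |= 4; break;
		case 'T': mask |= 8; break;
		default: assert(false && "expected one of A, C, G, T");
		}
	}
	return mask;
}
```

 // For example, maskFromBases("AG") gives 5, which is the table entry
 // for 'R', and maskFromBases("CGT") gives 14, the entry for 'B'.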
 /**
 * Convert a pair of DNA masks to a color mask.
 */
 uint8_t dnamasks2colormask[16][16] = {
 	         /* 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 */
 	/*  0 */ {  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 },
 	/*  1 */ {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
 	/*  2 */ {  0,  2,  1,  3,  8, 10,  9, 11,  4,  6,  5,  7, 12, 14, 13, 15 },
 	/*  3 */ {  0,  3,  3,  3, 12, 15, 15, 15, 12, 15, 15, 15, 12, 15, 15, 15 },
 	/*  4 */ {  0,  4,  8, 12,  1,  5,  9, 13,  2,  6, 10, 14,  3,  7, 11, 15 },
 	/*  5 */ {  0,  5, 10, 15,  5,  5, 15, 15, 10, 15, 10, 15, 15, 15, 15, 15 },
 	/*  6 */ {  0,  6,  9, 15,  9, 15,  9, 15,  6,  6, 15, 15, 15, 15, 15, 15 },
 	/*  7 */ {  0,  7, 11, 15, 13, 15, 15, 15, 14, 15, 15, 15, 15, 15, 15, 15 },
 	/*  8 */ {  0,  8,  4, 12,  2, 10,  6, 14,  1,  9,  5, 13,  3, 11,  7, 15 },
 	/*  9 */ {  0,  9,  6, 15,  6, 15,  6, 15,  9,  9, 15, 15, 15, 15, 15, 15 },
 	/* 10 */ {  0, 10,  5, 15, 10, 10, 15, 15,  5, 15,  5, 15, 15, 15, 15, 15 },
 	/* 11 */ {  0, 11,  7, 15, 14, 15, 15, 15, 13, 15, 15, 15, 15, 15, 15, 15 },
 	/* 12 */ {  0, 12, 12, 12,  3, 15, 15, 15,  3, 15, 15, 15,  3, 15, 15, 15 },
 	/* 13 */ {  0, 13, 14, 15,  7, 15, 15, 15, 11, 15, 15, 15, 15, 15, 15, 15 },
 	/* 14 */ {  0, 14, 13, 15, 11, 15, 15, 15,  7, 15, 15, 15, 15, 15, 15, 15 },
 	/* 15 */ {  0, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15 }
 };
 /**
 * Mapping from ASCII DNA characters (including IUPAC codes) to their
 * complements.
 */
 char asc2dnacomp[] = {
 	/*   0 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  16 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  32 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,'-',  0,  0,
 	/*  48 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  64 */ 0,'T','V','G','H',  0,  0,'C','D',  0,  0,'M',  0,'K','N',  0,
 	       /*    A   B   C   D           G   H           K       M   N */
 	/*  80 */ 0,  0,'Y','S','A',  0,'B','W',  0,'R',  0,  0,  0,  0,  0,  0,
 	       /*        R   S   T       V   W       Y */
 	/*  96 */ 0,'T','V','G','H',  0,  0,'C','D',  0,  0,'M',  0,'K','N',  0,
 	        /*   a   b   c   d           g   h           k       m   n */
 	/* 112 */ 0,  0,'Y','S','A',  0,'B','W',  0,'R',  0,  0,  0,  0,  0,  0,
 	       /*        r   s   t       v   w       y */
 	/* 128 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 144 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 160 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 176 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 192 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 208 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 224 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 240 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
 };
 /**
 * Mapping from ASCII color characters to ASCII DNA characters:
 * '0'=A, '1'=C, '2'=G, '3'=T, '4' and '.'=N; '-' maps to itself.
 */
 char col2dna[] = {
 	/*   0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  16 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  32 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,'-','N',  0,
 	       /*                                                     -   . */
 	/*  48 */'A','C','G','T','N',  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	       /* 0   1   2   3   4  */
 	/*  64 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  80 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  96 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 112 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 128 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 144 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 160 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 176 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 192 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 208 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 224 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 240 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
 };
 /**
 * Mapping from ASCII DNA characters to ASCII color characters:
 * A='0', C='1', G='2', T='3', N='4'; '-' maps to itself.
 */
 char dna2col[] = {
 	/*   0 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  16 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  32 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,'-',  0,  0,
 	/*  48 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/*  64 */ 0,'0',  0,'1',  0,  0,  0,'2',  0,  0,  0,  0,  0,  0,'4',  0,
 	       /*    A       C               G                           N */
 	/*  80 */ 0,  0,  0,  0,'3',  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	       /*                T */
 	/*  96 */ 0,'0',  0,'1',  0,  0,  0,'2',  0,  0,  0,  0,  0,  0,'4',  0,
 	       /*    a       c               g                           n */
 	/* 112 */ 0,  0,  0,  0,'3',  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	       /*                t */
 	/* 128 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 144 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 160 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 176 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 192 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 208 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 224 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
 	/* 240 */ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
 };
 /**
 * Mapping from ASCII DNA characters (including IUPAC codes) to strings
 * listing the color characters they can stand for, separated by '|'.
 */
 const char* dna2colstr[] = {
 	/*   0 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/*  16 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/*  32 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "-",  "?",  "?",
 	/*  48 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/*  64 */ "?",  "0","1|2|3","1","0|2|3","?",  "?",  "2","0|1|3","?",  "?", "2|3", "?", "0|1", ".",  "?",
 	/*               A     B     C     D                 G     H                 K           M     N */
 	/*  80 */ "?",  "?", "0|2","1|2", "3",  "?","0|1|2","0|3","?", "1|3", "?",  "?",  "?",  "?",  "?",  "?",
 	/*                     R     S     T           V     W           Y */
 	/*  96 */ "?",  "0","1|2|3","1","0|2|3","?",  "?",  "2","0|1|3","?",  "?", "2|3", "?", "0|1", ".",  "?",
 	/*               a     b     c     d                 g     h                 k           m     n */
 	/* 112 */ "?",  "?", "0|2","1|2", "3",  "?","0|1|2","0|3","?", "1|3", "?",  "?",  "?",  "?",  "?",  "?",
 	/*                     r     s     t           v     w           y */
 	/* 128 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 144 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 160 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 176 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 192 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 208 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 224 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",
 	/* 240 */ "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?",  "?"
 };
 /**
 * Mapping from ASCII characters to color categories:
 *
 * 0 = invalid - error
 * 1 = valid color
 * 2 = IUPAC (ambiguous DNA) - there is no such thing for colors to my
 *     knowledge
 * 3 = not an error, but unmatchable; alignments containing this
 *     character are invalid
 */
 uint8_t asc2colcat[] = {
 	/*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0,
 	       /*                                        -  . */
 	/*  48 */ 1, 1, 1, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	       /* 0  1  2  3  4  */
 	/*  64 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  80 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  96 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 112 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 /**
 * Set the category for all IUPAC codes.  By default they're in
 * category 2 (IUPAC), but sometimes we'd like to put them in category
 * 3 (unmatchable), for example.
 */
 void setIupacsCat(uint8_t cat) {
 	assert(cat < 4);
 	asc2dnacat[(int)'B'] = asc2dnacat[(int)'b'] =
 	asc2dnacat[(int)'D'] = asc2dnacat[(int)'d'] =
 	asc2dnacat[(int)'H'] = asc2dnacat[(int)'h'] =
 	asc2dnacat[(int)'K'] = asc2dnacat[(int)'k'] =
 	asc2dnacat[(int)'M'] = asc2dnacat[(int)'m'] =
 	asc2dnacat[(int)'N'] = asc2dnacat[(int)'n'] =
 	asc2dnacat[(int)'R'] = asc2dnacat[(int)'r'] =
 	asc2dnacat[(int)'S'] = asc2dnacat[(int)'s'] =
 	asc2dnacat[(int)'V'] = asc2dnacat[(int)'v'] =
 	asc2dnacat[(int)'W'] = asc2dnacat[(int)'w'] =
 	asc2dnacat[(int)'X'] = asc2dnacat[(int)'x'] =
 	asc2dnacat[(int)'Y'] = asc2dnacat[(int)'y'] = cat;
 }
 /// For converting from ASCII to the Dna5 code where A=0, C=1, G=2,
 /// T=3, N=4
 uint8_t asc2dna[] = {
        /*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
               /*    A     C           G                    N */
        /*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               /*             T */
        /*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
               /*    a     c           g                    n */
        /* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               /*             t */
        /* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 uint8_t asc2dna_3N[2][256] = {
        {
            /*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
                   /*    A     C           G                    N */
            /*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                   /*             T */
            /*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
                   /*    a     c           g                    n */
            /* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                   /*             t */
            /* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        },
        {
            /*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
            /*    A     C           G                    N */
            /*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*             T */
            /*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
            /*    a     c           g                    n */
            /* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /*             t */
            /* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            /* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        }
 };
 // this is only used in BASE_CHANGE case
 uint8_t asc2dna_1[] = {
        /*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
        /*    A     C           G                    N */
        /*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*             T */
        /*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
        /*    a     c           g                    n */
        /* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*             t */
        /* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 uint8_t asc2dna_2[] = {
        /*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  48 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
        /*    A     C           G                    N */
        /*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*             T */
        /*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
        /*    a     c           g                    n */
        /* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /*             t */
        /* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        /* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 /// Convert an ascii char representing a base or a color to a 2-bit
 /// code: 0=A,0; 1=C,1; 2=G,2; 3=T,3; 4=N,.
 uint8_t asc2dnaOrCol[] = {
 	/*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0,
 	/*                                               -  . */
 	/*  48 */ 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*        0  1  2  3 */
 	/*  64 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
 	/*           A     C           G                    N */
 	/*  80 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*                    T */
 	/*  96 */ 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 4, 0,
 	/*           a     c           g                    n */
 	/* 112 */ 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*                    t */
 	/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 /// For converting from ASCII color characters to color codes where
 /// '0'=0, '1'=1, '2'=2, '3'=3; '-' and '.' map to 4
 uint8_t asc2col[] = {
 	/*   0 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  16 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  32 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0,
 	       /*                                        -  . */
 	/*  48 */ 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	       /* 0  1  2  3 */
 	/*  64 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  80 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/*  96 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 112 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 128 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 144 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 160 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 176 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 192 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 208 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 224 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 240 */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 };
 /**
 * Convert a nucleotide and a color to the paired nucleotide.  Indexed
 * first by nucleotide then by color.  Note that this is exactly the
 * same as the dinuc2color array.
 */
 uint8_t nuccol2nuc[5][5] = {
 	/*       B  G  O  R  . */
 	/* A */ {0, 1, 2, 3, 4},
 	/* C */ {1, 0, 3, 2, 4},
 	/* G */ {2, 3, 0, 1, 4},
 	/* T */ {3, 2, 1, 0, 4},
 	/* N */ {4, 4, 4, 4, 4}
 };
 /**
 * Convert a pair of nucleotides to a color.
 */
 uint8_t dinuc2color[5][5] = {
 	/* A */ {0, 1, 2, 3, 4},
 	/* C */ {1, 0, 3, 2, 4},
 	/* G */ {2, 3, 0, 1, 4},
 	/* T */ {3, 2, 1, 0, 4},
 	/* N */ {4, 4, 4, 4, 4}
 };
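 // Illustration only, not part of the original file. For the codes 0..3
 // the entries of dinuc2color and nuccol2nuc equal the XOR of the row and
 // column indices, and any N (4) yields 4. The sketch below uses that
 // observation to decode a color read from a known first base; xorColor
 // and decodeColors are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: same values as dinuc2color[a][b] for a, b in 0..4,
// computed with XOR instead of a table lookup.
static uint8_t xorColor(uint8_t a, uint8_t b) {
	assert(a <= 4 && b <= 4);
	if (a > 3 || b > 3) return 4; // N in, N out
	return static_cast<uint8_t>(a ^ b);
}

// Hypothetical helper: given the first base and a list of colors, recover
// the base sequence. Each base is the previous base combined with the
// next color, which mirrors a nuccol2nuc[prev][color] lookup.
static std::vector<uint8_t> decodeColors(uint8_t first,
                                         const std::vector<uint8_t>& colors) {
	std::vector<uint8_t> nucs;
	nucs.push_back(first);
	for (size_t i = 0; i < colors.size(); i++) {
		uint8_t next = xorColor(nucs.back(), colors[i]);
		nucs.push_back(next);
	}
	return nucs;
}
```

 // Starting from A (0), the colors 1, 3, 1 decode to A, C, G, T. Once an
 // N appears, every later base in the decoded sequence is also N.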
 /// Convert bit encoded DNA char to its complement
 int dnacomp[5] = {
 	3, 2, 1, 0, 4
 };
 const char *iupacs = "!ACMGRSVTWYHKDBN!acmgrsvtwyhkdbn";
 char mask2iupac[16] = {
 	-1,
 	'A', // 0001
 	'C', // 0010
 	'M', // 0011
 	'G', // 0100
 	'R', // 0101
 	'S', // 0110
 	'V', // 0111
 	'T', // 1000
 	'W', // 1001
 	'Y', // 1010
 	'H', // 1011
 	'K', // 1100
 	'D', // 1101
 	'B', // 1110
 	'N', // 1111
 };
 int maskcomp[16] = {
 	0,  // 0000 (!) -> 0000 (!)
 	8,  // 0001 (A) -> 1000 (T)
 	4,  // 0010 (C) -> 0100 (G)
 	12, // 0011 (M) -> 1100 (K)
 	2,  // 0100 (G) -> 0010 (C)
 	10, // 0101 (R) -> 1010 (Y)
 	6,  // 0110 (S) -> 0110 (S)
 	14, // 0111 (V) -> 1110 (B)
 	1,  // 1000 (T) -> 0001 (A)
 	9,  // 1001 (W) -> 1001 (W)
 	5,  // 1010 (Y) -> 0101 (R)
 	13, // 1011 (H) -> 1101 (D)
 	3,  // 1100 (K) -> 0011 (M)
 	11, // 1101 (D) -> 1011 (H)
 	7,  // 1110 (B) -> 0111 (V)
 	15, // 1111 (N) -> 1111 (N)
 };
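 // Illustration only, not part of the original file. With A=0001, C=0010,
 // G=0100 and T=1000, complementing a mask swaps the A and T bits and the
 // C and G bits, which is the same as reversing the four bits. The sketch
 // below computes that reversal; reverseMask4 is a hypothetical helper
 // name.

```cpp
#include <cassert>

// Hypothetical helper: reverse the low four bits of m. For every m in
// 0..15 this matches the corresponding maskcomp entry listed above.
static int reverseMask4(int m) {
	assert(m >= 0 && m < 16);
	int r = 0;
	for (int i = 0; i < 4; i++) {
		if (m & (1 << i)) {
			r |= 1 << (3 - i);
		}
	}
	return r;
}
```

 // For example, reverseMask4(3) is 12 (M -> K) and reverseMask4(11) is
 // 13 (H -> D). The palindromic masks 6 (S), 9 (W) and 15 (N) map to
 // themselves.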
--- a/alphabet.h
+++ b/alphabet.h
@ -0,0 +1,199 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALPHABETS_H_
 #define ALPHABETS_H_
 #include <stdexcept>
 #include <string>
 #include <sstream>
 #include <stdint.h>
 #include "assert_helpers.h"
 using namespace std;
 /// Convert an ascii char to a DNA category.  Categories are:
 /// 0 -> invalid
 /// 1 -> unambiguous a, c, g or t
 /// 2 -> ambiguous
 /// 3 -> unmatchable
 extern uint8_t asc2dnacat[];
 /// Convert masks to ambiguous nucleotides
 extern char mask2dna[];
 /// Convert ambiguous ASCII nucleotide to mask
 extern uint8_t asc2dnamask[];
 /// Convert mask to # of alternatives in the mask
 extern int mask2popcnt[];
 /// Convert an ascii char to a 2-bit base: 0=A, 1=C, 2=G, 3=T, 4=N
 extern uint8_t asc2dna[];
 /// Convert an ascii char representing a base or a color to a 2-bit
 /// code: 0=A,0; 1=C,1; 2=G,2; 3=T,3; 4=N,.
 extern uint8_t asc2dnaOrCol[];
 /// Convert a pair of DNA masks to a color mask
 extern uint8_t dnamasks2colormask[16][16];
 /// Convert an ascii char to a color category.  Categories are:
 /// 0 -> invalid
 /// 1 -> unambiguous 0, 1, 2 or 3
 /// 2 -> ambiguous (not applicable for colors)
 /// 3 -> unmatchable
 extern uint8_t asc2colcat[];
 /// Convert an ascii color char to a color code: '0'=0, '1'=1, '2'=2,
 /// '3'=3; '-' and '.'=4
 extern uint8_t asc2col[];
 /// Convert an ascii char to its DNA complement, including IUPACs
 extern char asc2dnacomp[];
 /// Convert a pair of 2-bit (and 4=N) encoded DNA bases to a color
 extern uint8_t dinuc2color[5][5];
 /// Convert a 2-bit nucleotide (and 4=N) and a color to the
 /// corresponding 2-bit nucleotide
 extern uint8_t nuccol2nuc[5][5];
 /// Convert a 4-bit mask into an IUPAC code
 extern char mask2iupac[16];
 /// Convert an ascii color to an ascii dna char
 extern char col2dna[];
 /// Convert an ascii dna to a color char
 extern char dna2col[];
 /// Convert an ascii dna char (including IUPACs) to a string of its
 /// possible color chars
 extern const char* dna2colstr[];
 /// Convert bit encoded DNA char to its complement
 extern int dnacomp[5];
 /// String of all DNA and IUPAC characters
 extern const char *iupacs;
 /// Map from masks to their complement masks
 extern int maskcomp[16];
 /**
 * Return true iff c is a Dna character.
 */
 static inline bool isDna(char c) {
 	return asc2dnacat[(int)c] > 0;
 }
 /**
 * Return true iff c is a color character.
 */
 static inline bool isColor(char c) {
 	return asc2colcat[(int)c] > 0;
 }
 /**
 * Return true iff c is an ambiguous Dna character.
 */
 static inline bool isAmbigNuc(char c) {
 	return asc2dnacat[(int)c] == 2;
 }
 /**
 * Return true iff c is an ambiguous color character.
 */
 static inline bool isAmbigColor(char c) {
 	return asc2colcat[(int)c] == 2;
 }
 /**
 * Return true iff c is an ambiguous character.
 */
 static inline bool isAmbig(char c, bool color) {
 	return (color ? asc2colcat[(int)c] : asc2dnacat[(int)c]) == 2;
 }
 /**
 * Return true iff c is an unambiguous DNA character.
 */
 static inline bool isUnambigNuc(char c) {
 	return asc2dnacat[(int)c] == 1;
 }
 /**
 * Return the DNA complement of the given ASCII char.
 */
 static inline char comp(char c) {
 	switch(c) {
 	case 'a': return 't';
 	case 'A': return 'T';
 	case 'c': return 'g';
 	case 'C': return 'G';
 	case 'g': return 'c';
 	case 'G': return 'C';
 	case 't': return 'a';
 	case 'T': return 'A';
 	default: return c;
 	}
 }
 /**
 * Return the complement of a bit-encoded nucleotide.
 */
 static inline int compDna(int c) {
 	assert_leq(c, 4);
 	return dnacomp[c];
 }
 /**
 * Return true iff c is an unambiguous DNA character.
 */
 static inline bool isUnambigDna(char c) {
 	return asc2dnacat[(int)c] == 1;
 }
 /**
 * Return true iff c is an unambiguous color character (0,1,2,3).
 */
 static inline bool isUnambigColor(char c) {
 	return asc2colcat[(int)c] == 1;
 }
 /// Convert a pair of 2-bit (and 4=N) encoded DNA bases to a color
 extern uint8_t dinuc2color[5][5];
 /**
 * Decode a not-necessarily-ambiguous nucleotide.
 */
 static inline void decodeNuc(char c , int& num, int *alts) {
 	switch(c) {
 	case 'A': alts[0] = 0; num = 1; break;
 	case 'C': alts[0] = 1; num = 1; break;
 	case 'G': alts[0] = 2; num = 1; break;
 	case 'T': alts[0] = 3; num = 1; break;
 	case 'M': alts[0] = 0; alts[1] = 1; num = 2; break;
 	case 'R': alts[0] = 0; alts[1] = 2; num = 2; break;
 	case 'W': alts[0] = 0; alts[1] = 3; num = 2; break;
 	case 'S': alts[0] = 1; alts[1] = 2; num = 2; break;
 	case 'Y': alts[0] = 1; alts[1] = 3; num = 2; break;
 	case 'K': alts[0] = 2; alts[1] = 3; num = 2; break;
 	case 'V': alts[0] = 0; alts[1] = 1; alts[2] = 2; num = 3; break;
 	case 'H': alts[0] = 0; alts[1] = 1; alts[2] = 3; num = 3; break;
 	case 'D': alts[0] = 0; alts[1] = 2; alts[2] = 3; num = 3; break;
 	case 'B': alts[0] = 1; alts[1] = 2; alts[2] = 3; num = 3; break;
 	case 'N': alts[0] = 0; alts[1] = 1; alts[2] = 2; alts[3] = 3; num = 4; break;
 	default: {
 		std::cerr << "Bad IUPAC code: " << c << ", (int: " << (int)c << ")" << std::endl;
 		throw std::runtime_error("");
 	}
 	}
 }
 extern void setIupacsCat(uint8_t cat);
 #endif /*ALPHABETS_H_*/
--- a/alt.h
+++ b/alt.h
@ -0,0 +1,294 @@
 /*
 * Copyright 2015, Daehwan Kim <infphilo@gmail.com>
 *
 * This file is part of HISAT 2.
 *
 * HISAT 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * HISAT 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ALT_H_
 #define ALT_H_
 #include <iostream>
 #include <fstream>
 #include <limits>
 #include "assert_helpers.h"
 #include "word_io.h"
 #include "mem_ids.h"
 #include "ds.h" // EList, used by Haplotype and ALTDB below
 using namespace std;
 enum ALT_TYPE {
    ALT_NONE = 0,
    ALT_SNP_SGL,     // single nucleotide substitution
    ALT_SNP_INS,     // small insertion wrt reference genome
    ALT_SNP_DEL,     // small deletion wrt reference genome
    ALT_SNP_ALT,     // alternative sequence (to be implemented ...)
    ALT_SPLICESITE,
    ALT_EXON
 };
 template <typename index_t>
 struct ALT {
    ALT() {
        reset();
    }
    void reset() {
        type = ALT_NONE;
        pos = len = 0;
        seq = 0;
    }
    ALT_TYPE type;
    union {
        index_t pos;
        index_t left;
    };
    union {
        index_t len;
        index_t right;
    };
    union {
        uint64_t seq;  // used to store 32 bp, but it can be used to store a pointer to EList<uint64_t>
        struct {
            union {
                bool fw;
                bool reversed;
            };
            bool excluded;
        };
    };
 public:
    // in order to support a sequence longer than 32 bp
    bool snp() const { return type == ALT_SNP_SGL || type == ALT_SNP_DEL || type == ALT_SNP_INS; }
    bool splicesite() const { return type == ALT_SPLICESITE; }
    bool mismatch() const { return type == ALT_SNP_SGL; }
    bool gap() const { return type == ALT_SNP_DEL || type == ALT_SNP_INS || type == ALT_SPLICESITE; }
    bool deletion() const { return type == ALT_SNP_DEL; }
    bool insertion() const { return type == ALT_SNP_INS; }
    bool exon() const { return type == ALT_EXON; }
    bool operator< (const ALT& o) const {
        if(pos != o.pos) return pos < o.pos;
        if(type != o.type) {
            if(type == ALT_NONE || o.type == ALT_NONE) {
                return type == ALT_NONE;
            }
            if(type == ALT_SNP_INS) return true;
            else if(o.type == ALT_SNP_INS) return false;
            return type < o.type;
        }
        if(len != o.len) return len < o.len;
        if(seq != o.seq) return seq < o.seq;
        return false;
    }
    bool compatibleWith(const ALT& o) const {
        if(pos == o.pos) return false;
         // order the two ALTs by position
        const ALT& a = (pos < o.pos ? *this : o);
        const ALT& b = (pos < o.pos ? o : *this);
        if(a.snp()) {
            if(a.type == ALT_SNP_DEL || a.type == ALT_SNP_INS) {
                if(b.pos <= a.pos + a.len) {
                    return false;
                }
            }
        } else if(a.splicesite()) {
            if(b.pos <= a.right + 2) {
                return false;
            }
        } else {
            assert(false);
        }
        return true;
    }
    bool isSame(const ALT& o) const {
        if(type != o.type)
            return false;
        if(type == ALT_SNP_SGL) {
            return pos == o.pos && seq == o.seq;
        } else if(type == ALT_SNP_DEL || type == ALT_SNP_INS || type == ALT_SPLICESITE) {
            if(type == ALT_SNP_INS) {
                if(seq != o.seq)
                    return false;
            }
            if(reversed == o.reversed) {
                return pos == o.pos && len == o.len;
            } else {
                if(reversed) {
                    return pos - len + 1 == o.pos && len == o.len;
                } else {
                    return pos == o.pos - o.len + 1 && len == o.len;
                }
            }       
        } else {
            assert(false);
        }
        return true;
    }
 #ifndef NDEBUG
    bool repOk() const {
        if(type == ALT_SNP_SGL) {
            if(len != 1) {
                assert(false);
                return false;
            }
            if(seq > 3) {
                assert(false);
                return false;
            }
        } else if(type == ALT_SNP_DEL) {
            if(len <= 0) {
                assert(false);
                return false;
            }
            if(seq != 0) {
                assert(false);
                return false;
            }
        } else if(type == ALT_SNP_INS) {
            if(len <= 0) {
                assert(false);
                return false;
            }
        } else if(type == ALT_SPLICESITE) {
            assert_lt(left, right);
            assert_leq(fw, 1);
         } else {
            assert(false);
            return false;
        }
        return true;
    }
 #endif
    bool write(ofstream& f_out, bool bigEndian) const {
        writeIndex<index_t>(f_out, pos, bigEndian);
        writeU32(f_out, type, bigEndian);
        writeIndex<index_t>(f_out, len, bigEndian);
        writeIndex<uint64_t>(f_out, seq, bigEndian);
        return true;
    }
    bool read(ifstream& f_in, bool bigEndian) {
        pos = readIndex<index_t>(f_in, bigEndian);
        type = (ALT_TYPE)readU32(f_in, bigEndian);
        assert_neq(type, ALT_SNP_ALT);
        len = readIndex<index_t>(f_in, bigEndian);
        seq = readIndex<uint64_t>(f_in, bigEndian);
        return true;
    }
 };
 template <typename index_t>
 struct Haplotype {
    Haplotype() {
        reset();
    }
    void reset() {
        left = right = 0;
        alts.clear();
    }
    index_t left;
    index_t right;
    EList<index_t, 1> alts;
    bool operator< (const Haplotype& o) const {
        if(left != o.left) return left < o.left;
        if(right != o.right) return right < o.right;
        return false;
    }
    bool write(ofstream& f_out, bool bigEndian) const {
        writeIndex<index_t>(f_out, left, bigEndian);
        writeIndex<index_t>(f_out, right, bigEndian);
        writeIndex<index_t>(f_out, alts.size(), bigEndian);
        for(index_t i = 0; i < alts.size(); i++) {
            writeIndex<index_t>(f_out, alts[i], bigEndian);
        }
        return true;
    }
    bool read(ifstream& f_in, bool bigEndian) {
        left = readIndex<index_t>(f_in, bigEndian);
        right = readIndex<index_t>(f_in, bigEndian);
        assert_leq(left, right);
        index_t num_alts = readIndex<index_t>(f_in, bigEndian);
        alts.resizeExact(num_alts); alts.clear();
        for(index_t i = 0; i < num_alts; i++) {
            alts.push_back(readIndex<index_t>(f_in, bigEndian));
        }
        return true;
    }
 };
 template <typename index_t>
 class ALTDB {
 public:
    ALTDB() :
    _snp(false),
    _ss(false),
    _exon(false)
    {}
    virtual ~ALTDB() {}
    bool hasSNPs() const { return _snp; }
    bool hasSpliceSites() const { return _ss; }
    bool hasExons() const { return _exon; }
    void setSNPs(bool snp) { _snp = snp; }
    void setSpliceSites(bool ss) { _ss = ss; }
    void setExons(bool exon) { _exon = exon; }
    EList<ALT<index_t> >&       alts()       { return _alts; }
    EList<string>&              altnames()   { return _altnames; }
    EList<Haplotype<index_t> >& haplotypes() { return _haplotypes; }
    EList<index_t>&             haplotype_maxrights() { return _haplotype_maxrights; }
    const EList<ALT<index_t> >&       alts() const       { return _alts; }
    const EList<string>&              altnames() const   { return _altnames; }
    const EList<Haplotype<index_t> >& haplotypes() const { return _haplotypes; }
    const EList<index_t>&             haplotype_maxrights() const { return _haplotype_maxrights; }
 private:
    bool _snp;
    bool _ss;
    bool _exon;
    EList<ALT<index_t> >       _alts;
    EList<string>              _altnames;
    EList<Haplotype<index_t> > _haplotypes;
    EList<index_t>             _haplotype_maxrights;
 };
 #endif /*ifndef ALT_H_*/
--- a/assert_helpers.h
+++ b/assert_helpers.h
@ -0,0 +1,279 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef ASSERT_HELPERS_H_
 #define ASSERT_HELPERS_H_
 #include <stdexcept>
 #include <string>
 #include <cassert>
 #include <iostream>
 /**
 * Exception thrown when a release-enabled (rt_*) assertion fails
 */
 class ReleaseAssertException : public std::runtime_error {
 public:
 	ReleaseAssertException(const std::string& msg = "") : std::runtime_error(msg) {}
 };
 /**
 * Macros for release-enabled assertions, and helper macros to make
 * all assertion error messages more helpful.
 */
 #ifndef NDEBUG
 #define ASSERT_ONLY(...) __VA_ARGS__
 #else
 #define ASSERT_ONLY(...)
 #endif
 #define rt_assert(b)  \
 	if(!(b)) { \
 		std::cout << "rt_assert at " << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_msg(b,msg)  \
 	if(!(b)) { \
 		std::cout << msg <<  " at " << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #define rt_assert_eq(ex,ac)  \
 	if(!((ex) == (ac))) { \
 		std::cout << "rt_assert_eq: expected (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_eq_msg(ex,ac,msg)  \
 	if(!((ex) == (ac))) { \
 		std::cout << "rt_assert_eq: " << msg <<  ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_eq(ex,ac)  \
 	if(!((ex) == (ac))) { \
 		std::cout << "assert_eq: expected (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_eq_msg(ex,ac,msg)  \
 	if(!((ex) == (ac))) { \
 		std::cout << "assert_eq: " << msg <<  ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_eq(ex,ac)
 #define assert_eq_msg(ex,ac,msg)
 #endif
 #define rt_assert_neq(ex,ac)  \
 	if(!((ex) != (ac))) { \
 		std::cout << "rt_assert_neq: expected not (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_neq_msg(ex,ac,msg)  \
 	if(!((ex) != (ac))) { \
 		std::cout << "rt_assert_neq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_neq(ex,ac)  \
 	if(!((ex) != (ac))) { \
 		std::cout << "assert_neq: expected not (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_neq_msg(ex,ac,msg)  \
 	if(!((ex) != (ac))) { \
 		std::cout << "assert_neq: " << msg << ": (" << (ex) << ", 0x" << std::hex << (ex) << std::dec << ") got (" << (ac) << ", 0x" << std::hex << (ac) << std::dec << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_neq(ex,ac)
 #define assert_neq_msg(ex,ac,msg)
 #endif
 #define rt_assert_gt(a,b) \
 	if(!((a) > (b))) { \
 		std::cout << "rt_assert_gt: expected (" << (a) << ") > (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_gt_msg(a,b,msg) \
 	if(!((a) > (b))) { \
 		std::cout << "rt_assert_gt: " << msg << ": (" << (a) << ") > (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_gt(a,b) \
 	if(!((a) > (b))) { \
 		std::cout << "assert_gt: expected (" << (a) << ") > (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_gt_msg(a,b,msg) \
 	if(!((a) > (b))) { \
 		std::cout << "assert_gt: " << msg << ": (" << (a) << ") > (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_gt(a,b)
 #define assert_gt_msg(a,b,msg)
 #endif
 #define rt_assert_geq(a,b) \
 	if(!((a) >= (b))) { \
 		std::cout << "rt_assert_geq: expected (" << (a) << ") >= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_geq_msg(a,b,msg) \
 	if(!((a) >= (b))) { \
 		std::cout << "rt_assert_geq: " << msg << ": (" << (a) << ") >= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_geq(a,b) \
 	if(!((a) >= (b))) { \
 		std::cout << "assert_geq: expected (" << (a) << ") >= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_geq_msg(a,b,msg) \
 	if(!((a) >= (b))) { \
 		std::cout << "assert_geq: " << msg << ": (" << (a) << ") >= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_geq(a,b)
 #define assert_geq_msg(a,b,msg)
 #endif
 #define rt_assert_lt(a,b) \
 	if(!((a) < (b))) { \
 		std::cout << "rt_assert_lt: expected (" << (a) << ") < (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_lt_msg(a,b,msg) \
 	if(!((a) < (b))) { \
 		std::cout << "rt_assert_lt: " << msg << ": (" << (a) << ") < (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_lt(a,b) \
 	if(!((a) < (b))) { \
 		std::cout << "assert_lt: expected (" << (a) << ") < (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_lt_msg(a,b,msg) \
 	if(!((a) < (b))) { \
 		std::cout << "assert_lt: " << msg << ": (" << (a) << ") < (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_lt(a,b)
 #define assert_lt_msg(a,b,msg)
 #endif
 #define rt_assert_leq(a,b) \
 	if(!((a) <= (b))) { \
 		std::cout << "rt_assert_leq: expected (" << (a) << ") <= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(); \
 	}
 #define rt_assert_leq_msg(a,b,msg) \
 	if(!((a) <= (b))) { \
 		std::cout << "rt_assert_leq: " << msg << ": (" << (a) << ") <= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		throw ReleaseAssertException(msg); \
 	}
 #ifndef NDEBUG
 #define assert_leq(a,b) \
 	if(!((a) <= (b))) { \
 		std::cout << "assert_leq: expected (" << (a) << ") <= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #define assert_leq_msg(a,b,msg) \
 	if(!((a) <= (b))) { \
 		std::cout << "assert_leq: " << msg << ": (" << (a) << ") <= (" << (b) << ")" << std::endl; \
 		std::cout << __FILE__ << ":" << __LINE__ << std::endl; \
 		assert(0); \
 	}
 #else
 #define assert_leq(a,b)
 #define assert_leq_msg(a,b,msg)
 #endif
 #ifndef NDEBUG
 #define assert_in(c, s) assert_in2(c, s, __FILE__, __LINE__)
 static inline void assert_in2(char c, const char *str, const char *file, int line) {
 	const char *s = str;
 	while(*s != '\0') {
 		if(c == *s) return;
 		s++;
 	}
 	std::cout << "assert_in: (" << c << ") not in (" << str << ")" << std::endl;
 	std::cout << file << ":" << line << std::endl;
 	assert(0);
 }
 #else
 #define assert_in(c, s)
 #endif
 #ifndef NDEBUG
 #define assert_range(b, e, v) assert_range_helper(b, e, v, __FILE__, __LINE__)
 template<typename T>
 inline static void assert_range_helper(const T& begin,
                                       const T& end,
                                       const T& val,
                                       const char *file,
                                       int line)
 {
 	if(val < begin || val > end) {
 		std::cout << "assert_range: (" << val << ") not in ["
 		          << begin << ", " << end << "]" << std::endl;
 		std::cout << file << ":" << line << std::endl;
 		assert(0);
 	}
 }
 #else
 #define assert_range(b, e, v)
 #endif
 #endif /*ASSERT_HELPERS_H_*/
--- a/banded.cpp
+++ b/banded.cpp
@ -0,0 +1,27 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <iostream>
 #include "banded.h"
 #ifdef MAIN_BANDED
 int main(void) {
 }
 #endif
--- a/banded.h
+++ b/banded.h
@ -0,0 +1,52 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef BANDED_H_
 #define BANDED_H_
 #include "sse_util.h"
 /**
 * Use SSE instructions to quickly find stretches with lots of matches, then
 * resolve alignments.
 */
 class BandedSseAligner {
 public:
 	void init(
 		int    *q,      // query, maskized
 		size_t  qi,     // query start
 		size_t  qf,     // query end
 		int    *r,      // reference, maskized
 		size_t  ri,     // reference start
 		size_t  rf)     // reference end
 	{
 	}
 	void nextAlignment() {
 	}
 protected:
 	EList_m128i mat_;
 };
 #endif
--- a/binary_sa_search.h
+++ b/binary_sa_search.h
@ -0,0 +1,102 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef BINARY_SA_SEARCH_H_
 #define BINARY_SA_SEARCH_H_
 #include <stdint.h>
 #include <iostream>
 #include <limits>
 #include <algorithm> // std::min, std::max
 #include "alphabet.h"
 #include "assert_helpers.h"
 #include "ds.h"
 #include "btypes.h"
 /**
 * Do a binary search using the suffix of 'host' beginning at offset
 * 'qry' as the query and 'sa' as an already-lexicographically-sorted
 * list of suffixes of host.  'sa' may be all suffixes of host or just
 * a subset.  Returns the index in sa of the smallest suffix of host
 * that is larger than qry, or length(sa) if all suffixes of host are
 * less than qry.
 *
 * We use the Manber and Myers optimization of maintaining a pair of
 * counters for the longest lcp observed so far on the left- and right-
 * hand sides and using the min of the two as a way of skipping over
 * characters at the beginning of a new round.
 *
 * Returns maximum value if the query suffix matches an element of sa.
 */
 template<typename TStr, typename TSufElt> inline
 TIndexOffU binarySASearch(
 	const TStr& host,
 	TIndexOffU qry,
 	const EList<TSufElt>& sa)
 {
 	TIndexOffU lLcp = 0, rLcp = 0; // greatest observed LCPs on left and right
 	TIndexOffU l = 0, r = (TIndexOffU)sa.size()+1; // binary-search window
 	TIndexOffU hostLen = (TIndexOffU)host.length();
 	while(true) {
 		assert_gt(r, l);
 		TIndexOffU m = (l+r) >> 1;
 		if(m == l) {
 			// Binary-search window has closed: we have an answer
 			if(m > 0 && sa[m-1] == qry) {
 				return std::numeric_limits<TIndexOffU>::max(); // qry matches
 			}
 			assert_leq(m, sa.size());
 			return m; // Return index of right-hand suffix
 		}
 		assert_gt(m, 0);
 		TIndexOffU suf = sa[m-1];
 		if(suf == qry) {
 			return std::numeric_limits<TIndexOffU>::max(); // query matches an elt of sa
 		}
 		TIndexOffU lcp = std::min(lLcp, rLcp);
 #ifndef NDEBUG
 		if(sstr_suf_upto_neq(host, qry, host, suf, lcp)) {
 			assert(0);
 		}
 #endif
 		// Keep advancing lcp, but stop when query mismatches host or
 		// when the counter falls off either the query or the suffix
 		while(suf+lcp < hostLen && qry+lcp < hostLen && host[suf+lcp] == host[qry+lcp]) {
 			lcp++;
 		}
 		// Fell off the end of either the query or the sa elt?
 		bool fell = (suf+lcp == hostLen || qry+lcp == hostLen);
 		if((fell && qry+lcp == hostLen) || (!fell && host[suf+lcp] < host[qry+lcp])) {
 			// Query is greater than sa elt
 			l = m;                 // update left bound
 			lLcp = std::max(lLcp, lcp); // update left lcp
 		}
 		else if((fell && suf+lcp == hostLen) || (!fell && host[suf+lcp] > host[qry+lcp])) {
 			// Query is less than sa elt
 			r = m;                 // update right bound
 			rLcp = std::max(rLcp, lcp); // update right lcp
 		} else {
 			assert(false); // Must be one or the other!
 		}
 	}
 	// Shouldn't get here
 	assert(false);
 	return std::numeric_limits<TIndexOffU>::max();
 }
 #endif /*BINARY_SA_SEARCH_H_*/
--- a/bit_packed_array.cpp
+++ b/bit_packed_array.cpp
@ -0,0 +1,315 @@
 /*
 * Copyright 2018, Chanhee Park <parkchanhee@gmail.com> and Daehwan Kim <infphilo@gmail.com>
 *
 * This file is part of HISAT 2.
 *
 * HISAT 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * HISAT 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <iostream>
 #include <vector>
 #include <algorithm>
 #include <cmath> // ceil, log2 (used by BitPackedArray::init)
 #include "timer.h"
 #include "aligner_sw.h"
 #include "aligner_result.h"
 #include "scoring.h"
 #include "sstring.h"
 #include "bit_packed_array.h"
 TIndexOffU BitPackedArray::get(size_t index) const
 {
    assert_lt(index, cur_);
    pair<size_t, size_t> addr = indexToAddress(index);
    uint64_t *block = blocks_[addr.first];
    pair<size_t, size_t> pos = columnToPosition(addr.second);
    TIndexOffU val = getItem(block, pos.first, pos.second);
    return val;
 }
 #define write_fp(x) fp.write((const char *)&(x), sizeof((x)))
 void BitPackedArray::writeFile(ofstream &fp)
 {
    size_t sz = 0;
    write_fp(item_bit_size_);
    write_fp(elm_bit_size_);
    write_fp(items_per_block_bit_);
    write_fp(items_per_block_bit_mask_);
    write_fp(items_per_block_);
    write_fp(cur_);
    write_fp(sz_);
    write_fp(block_size_);
    // number of blocks
    sz = blocks_.size();
    write_fp(sz);
    for(size_t i = 0; i < sz; i++) {
        fp.write((const char *)blocks_[i], block_size_);
    }
 }
 void BitPackedArray::writeFile(const char *filename)
 {
    ofstream fp(filename, std::ofstream::binary);
    writeFile(fp);
    fp.close();
 }
 void BitPackedArray::writeFile(const string &filename)
 {
    writeFile(filename.c_str());
 }
 #define read_fp(x) fp.read((char *)&(x), sizeof((x)))
 void BitPackedArray::readFile(ifstream &fp)
 {
    size_t val_sz = 0;
    read_fp(val_sz);
    init_by_log2(val_sz);
    //rt_assert_eq(val_sz, item_bit_size_);
    read_fp(val_sz);
    rt_assert_eq(val_sz, elm_bit_size_);
    read_fp(val_sz);
    rt_assert_eq(val_sz, items_per_block_bit_);
    read_fp(val_sz);
    rt_assert_eq(val_sz, items_per_block_bit_mask_);
    read_fp(val_sz);
    rt_assert_eq(val_sz, items_per_block_);
    // skip cur_
    size_t prev_cnt = 0;
    read_fp(prev_cnt);
    cur_ = 0;
    // skip sz_
    size_t prev_sz = 0;
    read_fp(prev_sz);
    sz_ = 0;
    // block_size_
    read_fp(val_sz);
    rt_assert_eq(val_sz, block_size_);
    // alloc blocks
    allocItems(prev_cnt);
    rt_assert_eq(prev_sz, sz_);
    // number of blocks
    read_fp(val_sz);
    rt_assert_eq(val_sz, blocks_.size());
    for(size_t i = 0; i < blocks_.size(); i++) {
        fp.read((char *)blocks_[i], block_size_);
    }
    cur_ = prev_cnt;
 }
 void BitPackedArray::readFile(const char *filename)
 {
    ifstream fp(filename, std::ifstream::binary);
    readFile(fp);
    fp.close();
 }
 void BitPackedArray::readFile(const string &filename)
 {
    readFile(filename.c_str());
 }
 void BitPackedArray::put(size_t index, TIndexOffU val)
 {
    assert_lt(index, cur_);
    pair<size_t, size_t> addr = indexToAddress(index);
    uint64_t *block = blocks_[addr.first];
    pair<size_t, size_t> pos = columnToPosition(addr.second);
    setItem(block, pos.first, pos.second, val);
 }
 void BitPackedArray::pushBack(TIndexOffU val)
 {
    if(cur_ == sz_) {
        allocItems(items_per_block_);
    }
    put(cur_++, val);
    assert_leq(cur_, sz_);
 }
 TIndexOffU BitPackedArray::getItem(uint64_t *block, size_t idx, size_t offset) const
 {
    size_t remains = item_bit_size_;
    TIndexOffU val = 0;
    while(remains > 0) {
        size_t bits = min(elm_bit_size_ - offset, remains);
        uint64_t mask = bitToMask(bits);
        // get value from block
        TIndexOffU t = (block[idx] >> offset) & mask;
        val = val | (t << (item_bit_size_ - remains));
        remains -= bits;
        offset = 0;
        idx++;
    }
    return val;
 }
 void BitPackedArray::setItem(uint64_t *block, size_t idx, size_t offset, TIndexOffU val)
 {
    size_t remains = item_bit_size_;
    while(remains > 0) {
        size_t bits = min(elm_bit_size_ - offset, remains);
        uint64_t mask = bitToMask(bits);
        uint64_t dest_mask = mask << offset;
        // get 'bits' lsb from val
        uint64_t t = val & mask;
        val >>= bits;
        // save 't' to block[idx]
        t <<= offset;
        block[idx] &= ~(dest_mask); // clear
        block[idx] |= t;
        idx++;
        remains -= bits;
        offset = 0;
    }
 }
 pair<size_t, size_t> BitPackedArray::indexToAddress(size_t index) const
 {
    pair<size_t, size_t> addr;
    addr.first = index >> items_per_block_bit_;
    addr.second = index & items_per_block_bit_mask_;
    return addr;
 }
 pair<size_t, size_t> BitPackedArray::columnToPosition(size_t col) const {
    pair<size_t, size_t> pos;
    pos.first = (col * item_bit_size_) / elm_bit_size_;
    pos.second = (col * item_bit_size_) % elm_bit_size_;
    return pos;
 }
 void BitPackedArray::expand(size_t count)
 {
    if((cur_ + count) > sz_) {
        allocItems(count);
    }
    cur_ += count;
    assert_leq(cur_, sz_);
 }
 void BitPackedArray::allocSize(size_t sz)
 {
    size_t num_block = (sz * sizeof(uint64_t) + block_size_ - 1) / block_size_;
    for(size_t i = 0; i < num_block; i++) {
         // block_size_ is in bytes, so convert it to a count of 64-bit words
         uint64_t *ptr = new uint64_t[block_size_ / sizeof(uint64_t)];
        blocks_.push_back(ptr);
        sz_ += items_per_block_;
    }
 }
 void BitPackedArray::allocItems(size_t count)
 {
    size_t sz = (count * item_bit_size_ + elm_bit_size_ - 1) / elm_bit_size_;
    allocSize(sz);
 }
 void BitPackedArray::init_by_log2(size_t ceil_log2)
 {
    item_bit_size_ = ceil_log2;
    elm_bit_size_ = sizeof(uint64_t) * 8;
    items_per_block_bit_ = 20;  // 1M
    items_per_block_ = 1ULL << (items_per_block_bit_);
    items_per_block_bit_mask_ = items_per_block_ - 1;
    block_size_ = (items_per_block_ * item_bit_size_ + elm_bit_size_ - 1) / elm_bit_size_ * sizeof(uint64_t);
    cur_ = 0;
    sz_ = 0;
 }
 void BitPackedArray::init(size_t max_value)
 {
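     // ceil(log2(max_value)) bits hold every value in [0, max_value).
     // For an exact power of two (e.g. 8 -> 3 bits) max_value itself does not fit,
     // and max_value must be at least 2 to get a non-zero item width.
     init_by_log2((size_t)ceil(log2(max_value)));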
 }
 void BitPackedArray::dump() const
 {
    cerr << "item_bit_size_: " << item_bit_size_ << endl;
    cerr << "block_size_: " << block_size_ << endl;
    cerr << "items_per_block_: " << items_per_block_ << endl;
    cerr << "cur_: " << cur_ << endl;
    cerr << "sz_: " << sz_ << endl;
    cerr << "number of blocks: " << blocks_.size() << endl;
 }
 size_t BitPackedArray::getMemUsage() const
 {
    size_t tot = blocks_.size() * block_size_;
    tot += blocks_.totalCapacityBytes();
    return tot;
 }
 BitPackedArray::~BitPackedArray()
 {
    for(size_t i = 0; i < blocks_.size(); i++) {
        uint64_t *ptr = blocks_[i];
        delete [] ptr;
    }
 }
 void BitPackedArray::reset()
 {
    cur_ = 0;
    sz_ = 0;
    for(size_t i = 0; i < blocks_.size(); i++) {
        uint64_t *ptr = blocks_[i];
        delete [] ptr;
    }
    blocks_.clear();
 }
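The getItem()/setItem() pair above stores fixed-width items that may straddle a 64-bit word boundary. The following is a minimal standalone sketch of that idea, assuming a flat std::vector<uint64_t> in place of the block list; lowMask, packedSet and packedGet are hypothetical names used for illustration only and are not part of HISAT2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mask with the low `bits` bits set (bits in 1..64).
static inline uint64_t lowMask(size_t bits) {
    return bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
}

// Store the low `width` bits of `val` as item number `index`.
static void packedSet(std::vector<uint64_t>& words, size_t width,
                      size_t index, uint64_t val) {
    assert(width >= 1 && width <= 64);
    assert((index + 1) * width <= words.size() * 64);
    size_t bitpos = index * width;
    size_t idx = bitpos / 64, offset = bitpos % 64;
    size_t remains = width;
    while (remains > 0) {
        // Number of bits that still fit in the current word.
        size_t bits = std::min<size_t>(64 - offset, remains);
        uint64_t mask = lowMask(bits);
        words[idx] &= ~(mask << offset);       // clear the destination bits
        words[idx] |= (val & mask) << offset;  // write the next chunk
        val = bits >= 64 ? 0 : (val >> bits);
        remains -= bits;
        offset = 0;                            // later chunks start at bit 0
        idx++;
    }
}

// Read item number `index` back, reassembling chunks from low to high bits.
static uint64_t packedGet(const std::vector<uint64_t>& words, size_t width,
                          size_t index) {
    assert(width >= 1 && width <= 64);
    assert((index + 1) * width <= words.size() * 64);
    size_t bitpos = index * width;
    size_t idx = bitpos / 64, offset = bitpos % 64;
    size_t remains = width;
    uint64_t val = 0;
    while (remains > 0) {
        size_t bits = std::min<size_t>(64 - offset, remains);
        uint64_t chunk = (words[idx] >> offset) & lowMask(bits);
        val |= chunk << (width - remains);
        remains -= bits;
        offset = 0;
        idx++;
    }
    return val;
}
```

With a 33-bit width, item 1 starts at bit 33 of word 0 and spills its top two bits into word 1, which is the case the two-pass loop exists to handle.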
--- a/bit_packed_array.h
+++ b/bit_packed_array.h
@ -0,0 +1,105 @@
 /*
 * Copyright 2018, Chanhee Park <parkchanhee@gmail.com> and Daehwan Kim <infphilo@gmail.com>
 *
 * This file is part of HISAT 2.
 *
 * HISAT 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * HISAT 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with HISAT 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef __HISAT2_BIT_PACKED_ARRAY_H
 #define __HISAT2_BIT_PACKED_ARRAY_H
 #include <iostream>
 #include <fstream>
 #include <limits>
 #include <map>
 #include "assert_helpers.h"
 #include "word_io.h"
 #include "mem_ids.h"
 #include "ds.h"
 using namespace std;
 class BitPackedArray {
 public:
    BitPackedArray () {}
    ~BitPackedArray();
    /**
     * Return true iff there are no items.
     */
    inline bool empty() const { return cur_ == 0; }
    inline size_t size() const { return cur_; }
    TIndexOffU get(size_t idx) const;
    inline TIndexOffU operator[](size_t i) const { return get(i); }
    void pushBack(TIndexOffU val);
    void init(size_t max_value);
    void reset();
    void writeFile(const char *filename);
    void writeFile(const string& filename);
    void writeFile(ofstream &fp);
    void readFile(const char *filename);
    void readFile(const string& filename);
    void readFile(ifstream &fp);
    void dump() const;
    size_t getMemUsage() const;
 private:
    void init_by_log2(size_t ceil_log2);
    void put(size_t index, TIndexOffU val);
    inline uint64_t bitToMask(size_t bit) const
    {
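        // Shifting a 64-bit value by 64 is undefined, so handle full-width masks explicitly
        return (bit >= 64) ? ~0ULL : (uint64_t) ((1ULL << bit) - 1);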
    }
    TIndexOffU getItem(uint64_t *block, size_t idx, size_t offset) const;
    void setItem(uint64_t *block, size_t idx, size_t offset, TIndexOffU val);
    pair<size_t, size_t> indexToAddress(size_t index) const;
    pair<size_t, size_t> columnToPosition(size_t col) const;
    void expand(size_t count = 1);
    void allocSize(size_t sz);
    void allocItems(size_t count);
 private:
    size_t item_bit_size_;      // bits per item (e.g. 33)
    size_t elm_bit_size_;       // bits per storage word (64)
    size_t items_per_block_bit_;       // log2 of items_per_block_
    size_t items_per_block_bit_mask_;  // items_per_block_ - 1
    size_t items_per_block_;    // number of items in a block
    size_t cur_;                // current item count
    size_t sz_;                 // allocated capacity, in items
    size_t block_size_;         // block size in bytes
    // List of packed blocks
    EList<uint64_t *> blocks_;
 };
 #endif //__HISAT2_BIT_PACKED_ARRAY_H
--- a/bitpack.h
+++ b/bitpack.h
@ -0,0 +1,80 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef BITPACK_H_
 #define BITPACK_H_
 #include <stdint.h>
 #include "assert_helpers.h"
 /**
 * Routines for marshalling 2-bit values into and out of 8-bit or
 * 32-bit hosts
 */
 static inline void pack_2b_in_8b(const int two, uint8_t& eight, const int off) {
 	assert_lt(two, 4);
 	assert_lt(off, 4);
 	eight |= (two << (off*2));
 }
 static inline int unpack_2b_from_8b(const uint8_t eight, const int off) {
 	assert_lt(off, 4);
 	return ((eight >> (off*2)) & 0x3);
 }
 static inline void pack_2b_in_32b(const int two, uint32_t& thirty2, const int off) {
 	assert_lt(two, 4);
 	assert_lt(off, 16);
 	thirty2 |= (two << (off*2));
 }
 static inline int unpack_2b_from_32b(const uint32_t thirty2, const int off) {
 	assert_lt(off, 16);
 	return ((thirty2 >> (off*2)) & 0x3);
 }
 /**
 * Routines for marshalling 1-bit values into and out of 8-bit or
 * 32-bit hosts
 */
 static inline void pack_1b_in_8b(const int one, uint8_t& eight, const int off) {
    assert_lt(one, 2);
    assert_lt(off, 8);
    eight |= (one << off);
 }
 static inline int unpack_1b_from_8b(const uint8_t eight, const int off) {
    assert_lt(off, 8);
    return ((eight >> off) & 0x1);
 }
 static inline void pack_1b_in_32b(const int one, uint32_t& thirty2, const int off) {
    assert_lt(one, 2);
    assert_lt(off, 32);
    thirty2 |= (one << off);
 }
 static inline int unpack_1b_from_32b(const uint32_t thirty2, const int off) {
    assert_lt(off, 32);
    return ((thirty2 >> off) & 0x1);
 }
 #endif /*BITPACK_H_*/
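The pack_* routines above only OR bits into the host, so the host has to start out zeroed. The sketch below shows a round trip for a short DNA string under that constraint. It is self-contained rather than built on bitpack.h; packDna2b and unpackDna2b are hypothetical names, and the A=0, C=1, G=2, T=3 encoding is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: pack an ACGT string, four bases per byte,
// with the first base in the two least significant bits.
static std::vector<uint8_t> packDna2b(const std::string& seq) {
    // Zero-filled on purpose: packing only ORs bits in, it never clears them.
    std::vector<uint8_t> out((seq.size() + 3) / 4, 0);
    for (size_t i = 0; i < seq.size(); i++) {
        int code = 0;
        switch (seq[i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: assert(false && "unexpected base");
        }
        out[i / 4] |= static_cast<uint8_t>(code << ((i % 4) * 2));
    }
    return out;
}

// Hypothetical helper: recover the first `len` bases from a packed buffer.
static std::string unpackDna2b(const std::vector<uint8_t>& packed, size_t len) {
    static const char bases[] = "ACGT";
    assert(len <= packed.size() * 4);
    std::string seq;
    for (size_t i = 0; i < len; i++) {
        seq.push_back(bases[(packed[i / 4] >> ((i % 4) * 2)) & 0x3]);
    }
    return seq;
}
```

The length is passed separately to unpackDna2b because a partially filled final byte is indistinguishable from trailing 'A' bases.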
--- a/blockwise_sa.h
+++ b/blockwise_sa.h
--- a/bp_aligner.h
+++ b/bp_aligner.h
--- a/btypes.h
+++ b/btypes.h
@ -0,0 +1,48 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #ifndef BOWTIE_INDEX_TYPES_H
 #define	BOWTIE_INDEX_TYPES_H
 #ifdef BOWTIE_64BIT_INDEX
 #define OFF_MASK 0xffffffffffffffff
 #define OFF_LEN_MASK 0xc000000000000000
 #define LS_SIZE 0x100000000000000
 #define OFF_SIZE 8
 #define INDEX_MAX 0xffffffffffffffff
 typedef uint64_t TIndexOffU;
 typedef int64_t TIndexOff;
 #else
 #define OFF_MASK 0xffffffff
 #define OFF_LEN_MASK 0xc0000000
 #define LS_SIZE 0x10000000
 #define OFF_SIZE 4
 #define INDEX_MAX 0xffffffff
 typedef uint32_t TIndexOffU;
 typedef int TIndexOff;
 #endif /* BOWTIE_64BIT_INDEX */
 extern const std::string gfm_ext;
 #endif	/* BOWTIE_INDEX_TYPES_H */
--- a/ccnt_lut.cpp
+++ b/ccnt_lut.cpp
@ -0,0 +1,80 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include <stdint.h>
 /* Generated by gen_lookup_tables.pl */
 uint8_t cCntLUT_4[4][4][256];
 uint8_t cCntLUT_4_rev[4][4][256];
 uint8_t cCntBIT[8][256];
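 /* Count how many of the first 'by' 2-bit characters of 'str' (starting at the
  * least significant bits) equal 'c'; by == 0 means all four characters. */
 int countCnt(int by, int c, uint8_t str) {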
    int count = 0;
    if(by == 0) by = 4;
    while(by-- > 0) {
        int c2 = str & 3;
        str >>= 2;
        if(c == c2) count++;
    }
    return count;
 }
 /* Same as countCnt(), but scans from the most significant bits downward. */
 int countCnt_rev(int by, int c, uint8_t str) {
    int count = 0;
    if(by == 0) by = 4;
    while(by-- > 0) {
        int c2 = (str >> 6) & 3;
        str <<= 2;
        if(c == c2) count++;
    }
    return count;
 }
 void initializeCntLut() {
    for(int by = 0; by < 4; by++) {
        for(int c = 0; c < 4; c++) {
            for(int str = 0; str < 256; str++) {
                cCntLUT_4[by][c][str] = countCnt(by, c, str);
                cCntLUT_4_rev[by][c][str] = countCnt_rev(by, c, str);
            }
        }
    }
 }
 /* Count the set bits among the low 'b' bits of 'str'; b == 0 means all eight. */
 int countBit(int b, uint8_t str) {
    int count = 0;
    if(b == 0) b = 8;
    while(b-- > 0) {
        if(str & 0x1) count++;
        str >>= 1;
    }
    return count;
 }
 void initializeCntBit() {
    for(int b = 0; b < 8; b++) {
        for(int str = 0; str < 256; str++) {
            cCntBIT[b][str] = countBit(b, str);
        }
    }
 }
--- a/diff_sample.cpp
+++ b/diff_sample.cpp
@ -0,0 +1,117 @@
 /*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include "diff_sample.h"
 struct sampleEntry clDCs[16];
 bool clDCs_calced = false; /// have clDCs been calculated?
 /**
 * Entries 4-57 are transcribed from page 6 of Luk and Wong's paper
 * "Two New Quorum Based Algorithms for Distributed Mutual Exclusion",
 * which is also used and cited in Burkhardt and Karkkainen's
 * papers on difference covers for sorting.  These samples are optimal
 * according to Luk and Wong.
 *
 * All other entries are generated via the exhaustive algorithm in
 * calcExhaustiveDC().
 *
 * The 0 is stored at the end of the sample as an end-of-list marker,
 * but 0 is also an element of each.
 *
 * Note that every difference cover has a 0 and a 1.  Intuitively,
 * any optimal difference cover sample can be oriented (i.e. rotated)
 * such that it includes 0 and 1 as elements.
 *
 * All samples in this list have been verified to be complete covers.
 *
 * A value of 0xffffffff in the first column indicates that there is no
 * sample for that value of v.  We do not keep samples for values of v
 * less than 3, since they are trivial (and the caller probably didn't
 * mean to ask for it).
 */
 uint32_t dc0to64[65][10] = {
 	{0xffffffff},                     // 0
 	{0xffffffff},                     // 1
 	{0xffffffff},                     // 2
 	{1, 0},                           // 3
 	{1, 2, 0},                        // 4
 	{1, 2, 0},                        // 5
 	{1, 3, 0},                        // 6
 	{1, 3, 0},                        // 7
 	{1, 2, 4, 0},                     // 8
 	{1, 2, 4, 0},                     // 9
 	{1, 2, 5, 0},                     // 10
 	{1, 2, 5, 0},                     // 11
 	{1, 3, 7, 0},                     // 12
 	{1, 3, 9, 0},                     // 13
 	{1, 2, 3, 7, 0},                  // 14
 	{1, 2, 3, 7, 0},                  // 15
 	{1, 2, 5, 8, 0},                  // 16
 	{1, 2, 4, 12, 0},                 // 17
 	{1, 2, 5, 11, 0},                 // 18
 	{1, 2, 6, 9, 0},                  // 19
 	{1, 2, 3, 6, 10, 0},              // 20
 	{1, 4, 14, 16, 0},                // 21
 	{1, 2, 3, 7, 11, 0},              // 22
 	{1, 2, 3, 7, 11, 0},              // 23
 	{1, 2, 3, 7, 15, 0},              // 24
 	{1, 2, 3, 8, 12, 0},              // 25
 	{1, 2, 5, 9, 15, 0},              // 26
 	{1, 2, 5, 13, 22, 0},             // 27
 	{1, 4, 15, 20, 22, 0},            // 28
 	{1, 2, 3, 4, 9, 14, 0},           // 29
 	{1, 2, 3, 4, 9, 19, 0},           // 30
 	{1, 3, 8, 12, 18, 0},             // 31
 	{1, 2, 3, 7, 11, 19, 0},          // 32
 	{1, 2, 3, 6, 16, 27, 0},          // 33
 	{1, 2, 3, 7, 12, 20, 0},          // 34
 	{1, 2, 3, 8, 12, 21, 0},          // 35
 	{1, 2, 5, 12, 14, 20, 0},         // 36
 	{1, 2, 4, 10, 15, 22, 0},         // 37
 	{1, 2, 3, 4, 8, 14, 23, 0},       // 38
 	{1, 2, 4, 13, 18, 33, 0},         // 39
 	{1, 2, 3, 4, 9, 14, 24, 0},       // 40
 	{1, 2, 3, 4, 9, 15, 25, 0},       // 41
 	{1, 2, 3, 4, 9, 15, 25, 0},       // 42
 	{1, 2, 3, 4, 10, 15, 26, 0},      // 43
 	{1, 2, 3, 6, 16, 27, 38, 0},      // 44
 	{1, 2, 3, 5, 12, 18, 26, 0},      // 45
 	{1, 2, 3, 6, 18, 25, 38, 0},      // 46
 	{1, 2, 3, 5, 16, 22, 40, 0},      // 47
 	{1, 2, 5, 9, 20, 26, 36, 0},      // 48
 	{1, 2, 5, 24, 33, 36, 44, 0},     // 49
 	{1, 3, 8, 17, 28, 32, 38, 0},     // 50
 	{1, 2, 5, 11, 18, 30, 38, 0},     // 51
 	{1, 2, 3, 4, 6, 14, 21, 30, 0},   // 52
 	{1, 2, 3, 4, 7, 21, 29, 44, 0},   // 53
 	{1, 2, 3, 4, 9, 15, 21, 31, 0},   // 54
 	{1, 2, 3, 4, 6, 19, 26, 47, 0},   // 55
 	{1, 2, 3, 4, 11, 16, 33, 39, 0},  // 56
 	{1, 3, 13, 32, 36, 43, 52, 0},    // 57
 	// Generated by calcExhaustiveDC()
 	{1, 2, 3, 7, 21, 33, 37, 50, 0},  // 58
 	{1, 2, 3, 6, 13, 21, 35, 44, 0},  // 59
 	{1, 2, 4, 9, 15, 25, 30, 42, 0},  // 60
 	{1, 2, 3, 7, 15, 25, 36, 45, 0},  // 61
 	{1, 2, 4, 10, 32, 39, 46, 51, 0}, // 62
 	{1, 2, 6, 8, 20, 38, 41, 54, 0},  // 63
 	{1, 2, 5, 14, 16, 34, 42, 59, 0}  // 64
 };
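The comment above states that every sample in the table has been verified to be a complete cover. As a rough illustration of what such a check involves, the sketch below tests whether every residue modulo v can be written as a difference of two sample elements. isDifferenceCover is a hypothetical name and not a function from diff_sample.cpp; the sample is passed with 0 included as an ordinary element, without the end-of-list marker convention used in dc0to64.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical checker: true iff every r in [0, v) equals (a - b) mod v
// for some pair a, b drawn from `sample`.
static bool isDifferenceCover(uint32_t v, const std::vector<uint32_t>& sample) {
    assert(v > 0);
    std::vector<bool> covered(v, false);
    for (uint32_t a : sample) {
        for (uint32_t b : sample) {
            assert(a < v && b < v);
            covered[(a + v - b) % v] = true;  // add v first to avoid underflow
        }
    }
    for (uint32_t r = 0; r < v; r++) {
        if (!covered[r]) return false;
    }
    return true;
}
```

For example, the table row for v = 7 is {1, 3, 0}: its pairwise differences are 0, 1, 2, 3, 4 (0 - 3), 5 (1 - 3) and 6 (0 - 1) modulo 7, so the check passes, whereas dropping the 3 leaves only 0, 1 and 6 covered.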
--- a/diff_sample.h
+++ b/diff_sample.h
--- a/docs/404.html
+++ b/docs/404.html
@ -0,0 +1,9 @@
 ---
 layout: page
 title: 404 Not Found
 permalink: 404.html
 hide: true
 share: false
 ---
 Sorry, the requested page wasn't found on the server.
--- a/docs/Gemfile
+++ b/docs/Gemfile
@ -0,0 +1,4 @@
 source 'https://rubygems.org'
 gem 'github-pages'
 gem 'jekyll-feed'
 gem 'jemoji'
--- a/docs/LICENSE
+++ b/docs/LICENSE
@ -0,0 +1,21 @@
 The MIT License (MIT)
 Copyright (c) 2014 Rohan Chandra
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/docs/README.md
+++ b/docs/README.md
@ -0,0 +1,59 @@
 # jekyll-ttskch-theme
 A simple and customizable theme for Jekyll.
 > This theme was renamed from _jekyll-**qck**-theme_ to _jekyll-**tch**-theme_ on 2016.06.02.  
 > It was renamed again from _jekyll-**tch**-theme_ to _jekyll-**ttskch**-theme_ on 2016.09.23.
 ## Screen shot
 ![image](https://cloud.githubusercontent.com/assets/4360663/18776176/62611b38-81a2-11e6-875b-86a66aa8f15c.png)
 ## Features
 * A lot of Markdown features (also GitHub Flavored Markdown)
 * `:emoji:` ready :+1:
 * Easy color-scheme customization
 * Tags list page
 * Monthly Archives page
 * Search feature without any Jekyll plugins
 * `<!--more-->` tag feature
 * Anchor links for each heading
 * Sticky side nav
 * Responsive
 * OGP ready
 * Share buttons ready
 ## Getting started
 1. [Fork me](https://github.com/ttskch/jekyll-ttskch-theme/fork)
 2. Rename the repository from `jekyll-ttskch-theme` to `{username}.github.io` ([learn more](https://pages.github.com/))
 3. Modify `_config.yml`
 4. Modify `_sass/base/_variables.scss` if you need to change colors or font sizes
 5. Add new posts into `_posts/` :smiley:
 ## Demo
 A live demo is available at:
 * https://ttskch.github.io/jekyll-ttskch-theme/
 ## Thanks for using :wink:
 * http://ttskch.github.io
 * http://sitaramshelke.github.io
 * http://jffourmond.github.io
 * http://vbflash8.github.io
 * http://luqitao.github.io
 * http://harusametime.github.io
 * http://gitzxon.github.io
 * http://hutsonlu.github.io
 * http://k0-1.github.io
 * http://anthonygore.github.io
 * http://getjsdojo.github.io
 * http://georgezhuo.github.io
 * http://neontapir.github.io
 * https://sasukeh.github.io
 * https://blog.guilhermegarnier.com
 Please open a PR if you want to add your blog.
--- a/docs/_config.yml
+++ b/docs/_config.yml
@ -0,0 +1,130 @@
 #
 # Basic settings.
 #
 url: http://DaehwanKimLab.github.io
 baseurl: /hisat2
 title: HISAT2
 description: graph-based alignment of next generation sequencing reads to a population of genomes
 avatar: /assets/img/ogp.png
 # favicon: /favicon.ico
 favicon: /assets/img/ogp.png
 # language: ja
 language: en
 #
 # Icons
 #
 icons:
  rss: true
  email:
  github: DaehwanKimLab
  bitbucket:
  twitter: 
  facebook:
  google_plus:
  tumblr:
  behance:
  dribbble:
  flickr:
  instagram:
  linkedin: # full URL
  pinterest:
  reddit:
  soundcloud:
  stack_exchange: # full URL
  steam:
  wordpress:
  youtube:
 #
 # default for front matter
 #
 defaults:
  - 
    scope:
      path: ""
    values:
      category: "main"
 #
 # Prettify url.
 #
 permalink: pretty
 #
 # Scripts.
 #
 google_analytics: # e.g. UA-000000-01
 disqus:
 #
 # Localizations.
 #
 str_next: Next
 str_prev: Prev
 str_read_more: Read more...
 str_search: Search
 str_recent_posts: Recent posts
 str_show_all_posts: Show all posts
 #
 # Recent posts.
 #
 recent_posts_num: 10
 #
 # Pagination.
 #
 paginate: 10
 paginate_path: page/:num
 #
 # Social.
 #
 share_buttons:
  twitter: true
  facebook: false # needs ogp.fb.app_id
  hatena: false
 ogp:
  image_url: //ttskch.github.io/jekyll-ttskch-theme/assets/img/ogp.png
  fb:
    admin: # facebook admin id
    app_id: # facebook application id
 #
 # Plugins.
 #
 gems:
  - jekyll-paginate
  - jekyll-feed
  - jemoji
 #
 # Styles: see "_sass/base/_variables.scss"
 #
 #
 # !! Danger zone !!
 #
 include: ["_pages"]
 markdown: kramdown
 kramdown:
  input: GFM
  syntax_highlighter: rouge
 excerpt_separator: <!--more-->
 sass:
  sass_dir: _sass
  style: :compressed # or :expanded
 exclude:
  - Gemfile
  - Gemfile.lock
  - LICENSE
  - README.md
  - vendor
--- a/docs/_data/collaborate.yml
+++ b/docs/_data/collaborate.yml
@ -0,0 +1,6 @@
 - name: Lyda Hill Department of Bioinformatics, The University of Texas Southwestern Medical Center
  url: https://www.utsouthwestern.edu/departments/bioinformatics
  logo: /assets/img/bioinformatics_utsw_logo.png
 - name: Center for Computational Biology, Johns Hopkins University
  url: http://ccb.jhu.edu
  logo: /assets/img/ccb_jhu_logo_tmp.png
--- a/docs/_data/contributor.yml
+++ b/docs/_data/contributor.yml
@ -0,0 +1,10 @@
 - name: Chanhee Park
  url: /chanhee.park/
 - name: Ben Langmead
  url: http://www.langmead-lab.org/
 - name: Yun (Leo) Zhang
  url: /leo.zhang/
 - name: Steven Salzberg
  url: https://salzberg-lab.org/in-the-news/about-me/
 - name: Daehwan Kim
  url: https://kim-lab.org/daehwan-kim-principal-investigator/
--- a/docs/_data/download-binary.yml
+++ b/docs/_data/download-binary.yml
@ -0,0 +1,66 @@
 latest_version: 2.2.1,2.2.0,2.1.0
 release:
  - version: 2.2.1
    date: 7/24/2020
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/fE9QCsX3NH4QwBi/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/zMgEtnF6LjnjFrr/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/oTtGWbWjaxsQ2Ho/download
  - version: 2.2.0
    date: 2/6/2020
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-220-Linux_x86_64/download
  - version: 2.1.0
    date: 6/8/2017
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-210-Linux_x86_64/download
      Windows: http://www.di.fc.ul.pt/~afalcao/hisat2_windows.html
  - version: 2.0.5
    date: 11/4/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-205-Linux_x86_64/download
  - version: 2.0.4
    date: 5/18/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-204-Linux_x86_64/download
  - version: 2.0.3-beta
    date: 3/28/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-203-beta-Linux_x86_64/download
  - version: 2.0.2-beta
    date: 3/17/2016
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-202-beta-Linux_x86_64/download
  - version: 2.0.1-beta
    date: 11/19/2015
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-201-beta-Linux_x86_64/download
  - version: 2.0.0-beta
    date: 9/8/2015
    name: HISAT2
    artifacts:
      Source: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-source/download
      OSX_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-OSX_x86_64/download
      Linux_x86_64: https://cloud.biohpc.swmed.edu/index.php/s/hisat2-200-beta-Linux_x86_64/download
--- a/docs/_data/download-index.yml
+++ b/docs/_data/download-index.yml
@ -0,0 +1,81 @@
 - organism: H. sapiens
  data:
    GRCh38:
      genome:
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz
      genome_snp: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snp.tar.gz
      genome_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
      genome_snp_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz
      genome_rep(above 2.2.0): 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_rep.tar.gz
      genome_snp_rep(above 2.2.0): 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch38_snprep.tar.gz
    UCSC hg38:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/hg38_genome.tar.gz
      genome_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/hg38_tran.tar.gz
    GRCh37:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_genome.tar.gz
      genome_snp: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_snp.tar.gz
      genome_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_tran.tar.gz
      genome_snp_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/grch37_snptran.tar.gz
    UCSC hg19:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/hg19_genome.tar.gz
 - organism: M. musculus
  data:
    GRCm38:
      genome:
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38/download
      genome_snp: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_snp/download
      genome_tran: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_tran/download
      genome_snp_tran: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/grcm38_snp_tran/download
    UCSC mm10:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/mm10_genome.tar.gz
 - organism: R. norvegicus
  data:
    UCSC rn6:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/rn6_genome.tar.gz
 - organism: D. melanogaster
  data:
    BDGP6:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/bdgp6.tar.gz
      genome_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/bdgp6_tran.tar.gz
    UCSC dm6:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/dm6.tar.gz
 - organism: C. elegans
  data:
    WBcel235:
      genome: 
        url: https://genome-idx.s3.amazonaws.com/hisat/wbcel235.tar.gz
      genome_tran: 
        url: https://genome-idx.s3.amazonaws.com/hisat/wbcel235_tran.tar.gz
    UCSC ce10:
      genome: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/bbynxoY2TPpRNQb/download
 - organism: S. cerevisiae
  data: 
    R64-1-1:
      genome: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/JRSoKHD5cHfpCFE/download
      genome_tran:
        url: https://cloud.biohpc.swmed.edu/index.php/s/akeiMrGGtt5KoJY/download
    UCSC sacCer3:
      genome: 
        url: https://cloud.biohpc.swmed.edu/index.php/s/Gsq4goLW4TDAz4E/download
--- a/docs/_includes/article-footer.html
+++ b/docs/_includes/article-footer.html
@ -0,0 +1,5 @@
 <footer>
    {% if site.share_buttons and include.share != false %}
    {% include share-buttons.html page=include.page %}
    {% endif %}
 </footer>
--- a/docs/_includes/article-header.html
+++ b/docs/_includes/article-header.html
@ -0,0 +1,64 @@
 {% assign page = include.page %}
 <header>
    <div class="panel">
        <h1>
            {% if include.link %}
            <a class="post-link" href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a>
            {% else %}
            {{ page.title }}
            {% endif %}
        </h1>
        <ul class="tags">
            {% assign tags_num = page.tags | size %}
            {% if tags_num > 0 %}
            <li><i class="fa fa-tags"></i></li>
            {% endif %}
            {% for tag in page.tags %}
            <li>
                <a class="tag" href="{{ '/search/?t=' | append: tag | prepend: site.baseurl }}">#{{ tag }}</a>
            </li>
            {% endfor %}
        </ul>
        <div class="clearfix">
            <ul class="meta">
                {% if page.date %}
                <li>
                    <i class="fa fa-calendar"></i>
                    {{ page.date | date: "%Y-%m-%d" }}
                </li>
                {% endif %}
                {% if page.author %}
                <li>
                    <a href="{{ '/search/?a=' | append: page.author | prepend: site.baseurl }}">
                        <i class="fa fa-user"></i>
                        {{ page.author }}
                    </a>
                </li>
                {% if page.icons %}
                <li>
                    <ul class="icons">
                        {% include icons.html icons=page.icons %}
                    </ul>
                </li>
                {% endif %}
                {% endif %}
            </ul>
        </div>
    </div>
    {% if site.share_buttons and include.share != false %}
    <div style="margin-top: 1em;">
        {% include share-buttons.html page=page %}
    </div>
    {% endif %}
    {% if include.eye_catch != false and page.eye_catch %}
    <p style="text-align: center">
        <img class="eye-catch" src="{{ page.eye_catch }}"/>
    </p>
    {% endif %}
 </header>
--- a/docs/_includes/disqus.html
+++ b/docs/_includes/disqus.html
@ -0,0 +1,10 @@
 <div id="disqus_thread"></div>
 <script type="text/javascript">
    var disqus_shortname = '{{ site.disqus }}';
    (function() {
        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
        dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    })();
 </script>
 <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
--- a/docs/_includes/fb-root.html
+++ b/docs/_includes/fb-root.html
@ -0,0 +1,11 @@
 <!-- Init Facebook SDK -->
 {% if site.share_buttons.facebook %}
 <div id="fb-root"></div>
 <script>(function(d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) return;
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/ja_JP/sdk.js#xfbml=1&version=v2.5&appId={{ site.ogp.fb.app_id }}";
  fjs.parentNode.insertBefore(js, fjs);
 }(document, 'script', 'facebook-jssdk'));</script>
 {% endif %}
--- a/docs/_includes/google-analytics.html
+++ b/docs/_includes/google-analytics.html
@ -0,0 +1,12 @@
 <!-- Google Analytics -->
 {% if site.google_analytics %}
 <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
            m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
    ga('create', '{{ site.google_analytics }}', 'auto');
    ga('send', 'pageview');
 </script>
 {% endif %}
--- a/docs/_includes/icons.html
+++ b/docs/_includes/icons.html
@ -0,0 +1,161 @@
 {% assign icons = include.icons %}
 {% if icons.rss %}
 <li>
    <a href="{{ '/feed.xml' | prepend: site.baseurl }}">
        <i class="fa fa-fw fa-rss"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.email %}
 <li>
    <a href="mailto:{{ icons.email }}">
        <i class="fa fa-fw fa-envelope"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.github %}
 <li>
    <a href="https://github.com/{{ icons.github }}">
        <i class="fa fa-fw fa-github"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.bitbucket %}
 <li>
    <a href="https://bitbucket.org/{{ icons.bitbucket }}">
        <i class="fa fa-fw fa-bitbucket"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.twitter %}
 <li>
    <a href="https://twitter.com/{{ icons.twitter }}">
        <i class="fa fa-fw fa-twitter"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.facebook %}
 <li>
    <a href="https://www.facebook.com/{{ icons.facebook }}">
        <i class="fa fa-fw fa-facebook"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.google_plus %}
 <li>
    <a href="https://plus.google.com/{{ icons.google_plus }}">
        <i class="fa fa-fw fa-google-plus"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.tumblr %}
 <li>
    <a href="https://{{ icons.tumblr }}.tumblr.com/">
        <i class="fa fa-fw fa-tumblr"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.behance %}
 <li>
    <a href="https://www.behance.net/{{ icons.behance }}">
        <i class="fa fa-fw fa-behance"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.dribbble %}
 <li>
    <a href="https://dribbble.com/{{ icons.dribbble }}">
        <i class="fa fa-fw fa-dribbble"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.flickr %}
 <li>
    <a href="https://www.flickr.com/photos/{{ icons.flickr }}">
        <i class="fa fa-fw fa-flickr"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.instagram %}
 <li>
    <a href="http://instagram.com/{{ icons.instagram }}">
        <i class="fa fa-fw fa-instagram"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.linkedin %}
 <li>
    <a href="{{ icons.linkedin }}">
        <i class="fa fa-fw fa-linkedin"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.pinterest %}
 <li>
    <a href="http://www.pinterest.com/{{ icons.pinterest }}">
        <i class="fa fa-fw fa-pinterest"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.reddit %}
 <li>
    <a href="https://www.reddit.com/user/{{ icons.reddit }}">
        <i class="fa fa-fw fa-reddit"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.soundcloud %}
 <li>
    <a href="https://soundcloud.com/{{ icons.soundcloud }}">
        <i class="fa fa-fw fa-soundcloud"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.stack_exchange %}
 <li>
    <a href="{{ icons.stack_exchange }}">
        <i class="fa fa-fw fa-stack-exchange"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.steam %}
 <li>
    <a href="http://steamcommunity.com/id/{{ icons.steam }}">
        <i class="fa fa-fw fa-steam"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.wordpress %}
 <li>
    <a href="https://{{ icons.wordpress }}.wordpress.com/">
        <i class="fa fa-fw fa-wordpress"></i>
    </a>
 </li>
 {% endif %}
 {% if icons.youtube %}
 <li>
    <a href="https://www.youtube.com/user/{{ icons.youtube }}">
        <i class="fa fa-fw fa-youtube"></i>
    </a>
 </li>
 {% endif %}
--- a/docs/_includes/page-url-resolver.html
+++ b/docs/_includes/page-url-resolver.html
@ -0,0 +1,7 @@
 {% assign page = include.page %}
 {% if page.canonical %}
 {% assign url = page.canonical | prepend: site.baseurl | prepend: site.url %}
 {% else %}
 {% assign url = page.url | replace: 'index.html', '' | prepend: site.baseurl | prepend: site.url %}
 {% endif %}
--- a/docs/_includes/paginator.html
+++ b/docs/_includes/paginator.html
@ -0,0 +1,29 @@
 {% if paginator.total_pages > 1 %}
 <div class="pagination">
    {% if paginator.previous_page %}
    <a class="btn" href="{{ paginator.previous_page_path | prepend: site.baseurl }}">
        <i class="fa fa-chevron-left"></i>
        {{ site.str_prev }}
    </a>
    {% else %}
    <span class="btn disabled">
        <i class="fa fa-chevron-left"></i>
        {{ site.str_prev }}
    </span>
    {% endif %}
    {% if paginator.next_page %}
    <a class="btn" href="{{ paginator.next_page_path | prepend: site.baseurl }}">
        {{ site.str_next }}
        <i class="fa fa-chevron-right"></i>
    </a>
    {% else %}
    <span class="btn disabled">
        {{ site.str_next }}
        <i class="fa fa-chevron-right"></i>
    </span>
    {% endif %}
 </div>
 {% endif %}
--- a/docs/_includes/share-buttons.html
+++ b/docs/_includes/share-buttons.html
@ -0,0 +1,22 @@
 {% include page-url-resolver.html page=include.page %}
 {% assign title = include.page.title | append: ' | ' | append: site.title %}
 <div class="clearfix">
    <div style="float: right !important;">
        {% if site.share_buttons.twitter %}
        <div style="margin-right: 5px !important; float: left !important;">
            <a href="https://twitter.com/share" class="twitter-share-button"{count} data-url="{{ url }}" data-text="{{ title }}">Tweet</a>
            <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
        </div>
        {% endif %}
        {% if site.share_buttons.facebook %}
        <div style="width: 93px !important; float: left !important;">
            <div class="fb-like" data-href="{{ url }}" data-layout="button_count"></div>
        </div>
        {% endif %}
        {% if site.share_buttons.hatena %}
        <div style="float: left !important;">
            <a href="http://b.hatena.ne.jp/entry/{{ url }}" class="hatena-bookmark-button" data-hatena-bookmark-title="{{ title }}" data-hatena-bookmark-layout="standard-balloon" data-hatena-bookmark-lang="ja" title="このエントリーをはてなブックマークに追加"><img src="https://b.st-hatena.com/images/entry-button/button-only@2x.png" alt="このエントリーをはてなブックマークに追加" width="20" height="20" style="border: none;" /></a><script type="text/javascript" src="https://b.st-hatena.com/js/bookmark_button.js" charset="utf-8" async="async"></script>
        </div>
        {% endif %}
    </div>
 </div>
--- a/docs/_layouts/default.html
+++ b/docs/_layouts/default.html
@ -0,0 +1,194 @@
 <!DOCTYPE html>
 <html lang="{{ site.language }}">
 <head>
    {% capture title %}{% if page.title %}{{ page.title }} | {% endif %}{{ site.title }}{% endcapture %}
    {% include page-url-resolver.html page=page %}
    {% if page.excerpt %}
    {% assign description = page.excerpt | strip_html | strip_newlines | truncate: 160 %}
    {% else %}
    {% assign description = site.description %}
    {% endif %}
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>{{ title }}</title>
    <meta name="description" content="{{ description }}">
    <link rel="shortcut icon" href="{{ site.favicon | prepend: site.baseurl }}" type="image/x-icon">
    <link rel="canonical" href="{{ url }}">
    <link rel="alternate" type="application/atom+xml" title="{{ site.title }}" href="{{ '/feed.xml' | prepend: site.baseurl }}" />
    {% if page.eye_catch %}
    {% assign ogp_image_url = page.eye_catch %}
    {% else %}
    {% assign ogp_image_url = site.ogp.image_url %}
    {% endif %}
    <meta property="og:title" content="{{ title }}" />
    <meta property="og:type" content="website" />
    <meta property="og:image" content="{{ ogp_image_url }}" />
    <meta property="og:url" content="{{ url }}" />
    <meta property="og:site_name" content="{{ site.title }}" />
    <meta property="fb:admins" content="{{ site.ogp.fb.admin }}" />
    <meta property="fb:app_id" content="{{ site.ogp.fb.app_id }}" />
    <meta property="og:description" content="{{ description }}" />
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
    <script src="https://use.fontawesome.com/1f5f360d80.js"></script>
    <link href="//fonts.googleapis.com/css?family=Source+Sans+Pro:400,700,700italic,400italic" rel="stylesheet">
    <link href="{{ '/assets/css/style.css' | prepend: site.baseurl }}" rel="stylesheet">
 </head>
 <body>
 <header class="site-header">
    <div class="inner clearfix">
        {% if site.avatar %}
        <a href="{{ '/' | prepend: site.baseurl }}">
            <img class="avatar" src="{{ site.avatar | prepend: site.baseurl }}" alt=""/>
        </a>
        {% endif %}
        <h1 class="clearfix">
            <a class="title {% if site.avatar == null %}slim{% endif %}" href="{{ '/' | prepend: site.baseurl }}">{{ site.title }}</a>
            <br><span class="description">{{ site.description }}</span>
        </h1>
    </div>
 </header>
 <div class="site-container">
    <div class="site-content">
        {{ content }}
    </div>
    <aside class="site-aside">
        <div class="inner">
 			<div class="block">
                        <form action="{{ site.baseurl }}/search">
                            <input type="search" id="search" name="q" placeholder="{{ site.str_search }}" />
                        </form>
           </div>
 		   <div class="block">
            <ul>
                {% assign pages = site.pages | where: "category", "main" | sort: 'order' %}
                {% for page in pages %}
                {% if page.title and page.hide != true %}
                <li><a class="page-link" href="{{ page.url | prepend: site.baseurl }}">{{ page.title }}</a></li>
                {% endif %}
                {% endfor %}
            </ul>
 			</div>
            <!--
            <ul class="icons">
                {% include icons.html icons=site.icons %}
            </ul>
            <hr class="with-no-margin margin-bottom"/>
            -->
 			<div class="block">
 			<h2>Funding</h2>
 			<br>
 			<div style="font-size: 0.8em">
            This work was supported in part by the National Human Genome Research Institute under grants R01-HG006102 and R01-HG006677, 
 			and NIH grants R01-LM06845 and R01-GM083873 and NSF grant CCF-0347992 to Steven L. Salzberg 
 			and by the Cancer Prevention Research Institute of Texas under grant RR170068 and NIH grant R01-GM135341 to Daehwan Kim.
 			</div>
 			</div>
 			<div class="block">
 			<h2>Getting Help</h2>
 			<br>
 			Please use <a href="mailto:hisat2.genomics@gmail.com">hisat2.genomics@gmail.com</a> for private communications only. Please do not email technical questions to HISAT2 contributors directly.
 			</div>
 			<div class="block">
 			<h2>Publications</h2>
 			<div style="font-size: 0.8em">
 			<ul>
 			<li>Kim, D., Paggi, J.M., Park, C. <i>et al.</i> <a class="publication" href="https://doi.org/10.1038/s41587-019-0201-4">Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.</a> <a class="publication" href="https://www.nature.com/nbt/"><i>Nat Biotechnol</i></a> <b>37</b>, 907–915 (2019).</li>
 			<li>Kim D, Langmead B and Salzberg SL. <a class="publication" href="https://doi.org/10.1038/nmeth.3317">HISAT: a fast spliced aligner with low memory requirements.</a> <a class="publication" href="https://www.nature.com/nmeth/"><i>Nature Methods</i></a> 2015</li>
 			<li>Pertea M, Kim D, Pertea G, Leek JT and Salzberg SL. <a class="publication" href="https://doi.org/10.1038/nprot.2016.095">Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.</a> <a class="publication" href="https://www.nature.com/nprot/"><i>Nature Protocols</i></a> 2016</li>
 			</ul>
 			</div>
 			</div>
 			<div class="block">
 			<h2>Contributors</h2>
            <ul>
            {% for item in site.data.contributor %}
 			    <li>
 				{% if item.url contains "http://" or item.url contains "https://" %}
 				<a class="page-link" href="{{ item.url }}">{{ item.name }}</a>
 				{% else %}
 				<a class="page-link" href="{{ item.url | prepend: site.baseurl }}">{{ item.name }}</a>
 				{% endif %}
 				</li>
            {% endfor %}
            </ul>
 			</div>
            {% if site.data.collaborate %}
            <div class="block">
            {% for item in site.data.collaborate %}
                    <ul style="text-align: center">
                        <a href="{{ item.url }}">
                            <img class="avatar" src="{{ item.logo | prepend: site.baseurl }}" alt="{{ item.name }}" />
                        </a>
                    </ul>
            {% endfor %}
            </div>
            {% endif %}
 			<!--
            <div class="block sticky">
                <h2>{{ site.str_recent_posts }}</h2>
                <ul>
                    {% assign posts = '' | split: '' %}
                    {% for post in site.posts %}
                    {% if post.hide != true %}
                    {% assign posts = posts | push: post %}
                    {% endif %}
                    {% endfor %}
                    {% assign posts = posts | sort: 'date' | reverse %}
                    {% for post in posts limit:site.recent_posts_num %}
                    <li><a href="{{ post.url | prepend: site.baseurl }}">{{ post.title }}</a></li>
                    {% endfor %}
                </ul>
            </div>
 			-->
        </div>
    </aside>
 </div>
 <footer class="site-footer">
    <div class="inner">
        <span>Powered by <a href="http://jekyllrb.com">Jekyll</a> with <a href="https://github.com/ttskch/jekyll-ttskch-theme">TtskchTheme</a></span>
    </div>
 </footer>
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
 <script src="{{ '/assets/lib/garand-sticky/jquery.sticky.js' | prepend: site.baseurl }}"></script>
 <script src="{{ '/assets/js/script.js' | prepend: site.baseurl }}"></script>
 {% if page.id %}
 <script src="{{ '/assets/js/header-link.js' | prepend: site.baseurl }}"></script>
 {% endif %}
 {% if page.permalink == '/search/' %}
 <script src="{{ '/assets/js/search.js' | prepend: site.baseurl }}"></script>
 {% endif %}
 {% include fb-root.html %}
 {% include google-analytics.html %}
 </body>
 </html>
--- a/docs/_layouts/page.html
+++ b/docs/_layouts/page.html
@ -0,0 +1,13 @@
 ---
 layout: default
 ---
 <div class="article-wrapper">
    <article>
        {% include article-header.html page=page link=false share=page.share %}
        <section class="post-content">
            {{ content }}
        </section>
        {% include article-footer.html page=page share=page.share %}
    </article>
 </div>
--- a/docs/_layouts/post.html
+++ b/docs/_layouts/post.html
@ -0,0 +1,19 @@
 ---
 layout: default
 ---
 <div class="article-wrapper">
    <article>
        {% include article-header.html page=page link=false share=page.share %}
        <section class="post-content">
            {{ content }}
        </section>
        {% include article-footer.html page=page share=page.share %}
    </article>
 </div>
 {% if site.disqus %}
 <section class="comments">
    {% include disqus.html %}
 </section>
 {% endif %}
--- a/docs/_pages/about.md
+++ b/docs/_pages/about.md
@ -0,0 +1,9 @@
 ---
 layout: page
 title: About
 permalink: /about/
 order: 2
 share: false
 ---
 **HISAT2** is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. Based on an extension of BWT for graphs ([Sir&eacute;n et al. 2014](http://dl.acm.org/citation.cfm?id=2674828)), we designed and implemented a graph FM index (GFM), an original approach and its first implementation. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome. These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
--- a/docs/_pages/archives-all.html
+++ b/docs/_pages/archives-all.html
@ -0,0 +1,20 @@
 ---
 layout: page
 title: All Posts
 permalink: /archives/all/
 hide: true
 share: false
 ---
 <div id="search-results">
    <hr id="first-hr" class="with-no-margin"/>
    {% for post in site.posts %}
    <div class="article-wrapper">
        <article>
            {% include article-header.html page=post link=true share=false eye_catch=false %}
        </article>
    </div>
    <hr class="with-no-margin"/>
    {% endfor %}
 </div>
--- a/docs/_pages/archives.html
+++ b/docs/_pages/archives.html
@ -0,0 +1,35 @@
 ---
 layout: page
 title: Archives
 permalink: /archives/
 order: 3
 share: false
 hide: true
 ---
 {% for post in site.posts %}
    {% unless post.next %}
        <h3>{{ post.date | date: '%Y' }}</h3>
        <ul>
    {% else %}
        {% assign year = post.date | date: '%Y' %}
        {% assign next_year = post.next.date | date: '%Y' %}
        {% if year != next_year %}
        </ul>
        <h3>{{ post.date | date: '%Y' }}</h3>
        <ul>
        {% endif %}
    {% endunless %}
    {% assign month = post.date | date: '%m' %}
    {% assign next_month = post.next.date | date: '%m' %}
    {% if year != next_year or month != next_month %}
    <li><a href="{{ '/search/?d=' | prepend: site.baseurl }}{{ post.date | date: '%Y-%m' }}">{{ post.date | date: '%Y/%m' }}</a></li>
    {% endif %}
 {% endfor %}
 {% if site.posts %}
 </ul>
 {% endif %}
 <a class="btn" href="{{ '/archives/all/' | prepend: site.baseurl }}">{{ site.str_show_all_posts }}</a>
--- a/docs/_pages/contributors/chanheepark.md
+++ b/docs/_pages/contributors/chanheepark.md
@ -0,0 +1,12 @@
 ---
 layout: page
 title: Chanhee Park 
 permalink: /chanhee.park/
 order: 1
 share: false
 category: contributor 
 ---
 Chanhee Park is a Scientific Software Engineer in the Kim Lab at UTSW responsible for maintaining and improving HISAT2.
 [Linkedin](https://www.linkedin.com/in/chanhee-park-97677297/)
--- a/docs/_pages/contributors/yunleozhang.md
+++ b/docs/_pages/contributors/yunleozhang.md
@ -0,0 +1,12 @@
 ---
 layout: page
 title: Yun (Leo) Zhang
 permalink: /leo.zhang/
 order: 1
 share: false
 category: contributor 
 ---
 Yun (Leo) is a biomedical engineering graduate student at UT Southwestern Medical Center. His main research includes developing advanced alignment tools.
 [Linkedin](https://www.linkedin.com/in/zhang-yun-a9565891/)
--- a/docs/_pages/download.md
+++ b/docs/_pages/download.md
@ -0,0 +1,61 @@
 ---
 layout: page
 title: Download
 permalink: /download/
 order: 5
 share: false
 ---
 Please cite:  
 >Kim, D., Paggi, J.M., Park, C. _et al._ Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. _Nat Biotechnol_ **37**, 907–915 (2019). <https://doi.org/10.1038/s41587-019-0201-4>
 - TOC
 {:toc}
 ## Index
 HISAT2 indexes are hosted on AWS (Amazon Web Services), thanks to the AWS Public Datasets program. Click this [link](https://registry.opendata.aws/jhu-indexes/) for more details.
 {% for item in site.data.download-index %}
 ### {{ item.organism }}
  {% for data in item.data %}
 <li>{{ data[0] }}</li>
 <table style="border-collapse: collapse; border: none;">
 {% for genome in data[1] %}
 <tr style="border: none;"><td style="border: none;">{{ genome[0] }}</td>
  <td style="border: none;">
  {% for url in genome[1] %}
  <a href="{{ url[1] }}">{{ url[1] }}</a><br/>
  {% endfor %}
  </td>
 </tr>
 {% endfor %}
 </table>
 {% endfor %}
 {% endfor %}
    genome: HISAT2 index for reference
    genome_snp: HISAT2 Graph index for reference plus SNPs
    genome_tran: HISAT2 Graph index for reference plus transcripts
    genome_snp_tran: HISAT2 Graph index for reference plus SNPs and transcripts
 ## Binaries
 {: binaries }
 {% assign targets = site.data.download-binary.latest_version | split: "," %}
 {% for release in site.data.download-binary.release %}
 {% assign version = release['version'] %}
 {% if targets contains version or targets == null %}
 {% assign name = release['name'] %}
 ### Version: {{name}} {{version}}
 <table style="border-collapse: collapse; border: none;">
 <tr style="border: none;"><td style="border: none;" colspan="2"><b>Release Date</b>: {{release['date']}}</td></tr>
 {% for artifact in release['artifacts'] %}
 {% assign type = artifact[0] %}
 <tr style="border: none;"><td style="border: none;">{{type}}</td><td style="border: none;"><a href="{{artifact[1]}}">{{artifact[1]}}</a></td></tr>
 {% endfor %}
 </table>
 {% endif %}
 {% endfor %}
--- a/docs/_pages/hisat-3n.md
+++ b/docs/_pages/hisat-3n.md
@ -0,0 +1,225 @@
 ---
 layout: page
 title: HISAT-3N 
 permalink: /hisat-3n/
 order: 4
 share: false
 ---
 HISAT-3N
 ============
 Overview
 -----------------
 **HISAT-3N** (hierarchical indexing for spliced alignment of transcripts - 3 nucleotides)
 is designed for nucleotide conversion sequencing technologies and is implemented on top of HISAT2.
 HISAT-3N offers two strategies for aligning nucleotide conversion sequencing reads: *standard mode* and *repeat mode*.
 Standard mode aligns reads with the standard-3N index only, so it is fast and needs little memory (~9 GB for human genome alignment).
 Repeat mode aligns reads with both the standard-3N index and the repeat-3N index, then outputs up to 1,000 alignment results (the number can be adjusted with `--repeat-limit`).
 Repeat mode aligns nucleotide conversion reads more accurately,
 and it is only 10% slower than standard mode while using slightly more memory (about 10.5 GB).
 HISAT-3N is built on [HISAT2], which is particularly optimized for RNA sequencing.
 HISAT-3N can be used for any base-converted sequencing reads, including [BS-seq], [SLAM-seq], [TAB-seq], [oxBS-seq], [TAPS], [scBS-seq], and [scSLAM-seq].
 [HISAT2]:https://github.com/DaehwanKimLab/hisat2
 [BS-seq]: https://en.wikipedia.org/wiki/Bisulfite_sequencing
 [SLAM-seq]: https://www.nature.com/articles/nmeth.4435
 [scBS-seq]: https://www.nature.com/articles/nmeth.3035
 [scSLAM-seq]: https://www.nature.com/articles/s41586-019-1369-y
 [TAPS]: https://www.nature.com/articles/s41587-019-0041-2
 [TAB-seq]: https://doi.org/10.1016/j.cell.2012.04.027
 [oxBS-seq]: https://science.sciencemag.org/content/336/6083/934
 Getting started
 ============
 HISAT-3N requires a 64-bit computer running either Linux or Mac OS X and at least 16 GB of RAM. 
 A few notes:  
 1. The repeat 3N index building process requires 256 GB of RAM.
 2. The standard 3N index building requires no more than 16 GB of RAM.
 3. The alignment process with either standard or repeat index requires no more than 16 GB of RAM.
 4. [SAMtools] is required to sort the SAM file for `hisat-3n-table`.
 Install
 ------------
    git clone https://github.com/DaehwanKimLab/hisat2.git
    cd hisat2
    git checkout -b hisat-3n origin/hisat-3n
    make
 Make sure that you are on the `hisat-3n` branch.
 Build a 3N index with `hisat-3n-build`
 -----------
 `hisat-3n-build` builds a 3N-index, which contains two HISAT2 indexes, from a set of DNA sequences. For a standard 3N-index,
 each index contains 16 files with the suffix `.3n.*.*.ht2`.
 A repeat 3N-index adds 16 more files to the standard 3N-index, with the suffix
 `.3n.*.rep.*.ht2`.
 These files constitute the HISAT-3N index; no other files are needed to align reads to the reference.
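The file-name patterns above can be checked with a short script. The following is a minimal sketch in Python, not part of HISAT-3N: the helper name `list_3n_index_files` is hypothetical, and it relies only on the two suffixes stated above. Because the standard pattern also matches repeat files, the repeat pattern is tested first.

```python
import fnmatch
import os


def list_3n_index_files(directory, basename):
    """Return (standard_files, repeat_files) for a 3N index basename.

    Hypothetical helper, not part of HISAT-3N; it relies only on the
    file-name suffixes given in the text above.
    """
    standard_pattern = f"{basename}.3n.*.*.ht2"
    repeat_pattern = f"{basename}.3n.*.rep.*.ht2"
    standard, repeat = [], []
    for name in sorted(os.listdir(directory)):
        # The standard pattern also matches repeat files, so test repeat first.
        if fnmatch.fnmatchcase(name, repeat_pattern):
            repeat.append(name)
        elif fnmatch.fnmatchcase(name, standard_pattern):
            standard.append(name)
    return standard, repeat
```

Comparing the lengths of the two lists with the file counts stated above gives a quick sanity check before starting an alignment.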
 * Example for standard HISAT-3N index building:  
 `hisat-3n-build genome.fa genome`  
 * Example for repeat HISAT-3N index building (requires 256 GB of memory):  
 `hisat-3n-build --repeat-index genome.fa genome` 
 Optionally, you can build a graph index and add SNP or splice site information to the index to increase alignment accuracy.
 For more detail, please check the [HISAT2 manual].
 [HISAT2 manual]:https://daehwankimlab.github.io/hisat2/manual/
    # Standard HISAT-3N integrated index with exon information
    hisat-3n-build --exons genome.exon genome.fa genome
    # Standard HISAT-3N integrated index with splice site information
    hisat-3n-build --ss genome.ss genome.fa genome
    # Repeat HISAT-3N integrated index with exon information
    hisat-3n-build --repeat-index --exons genome.exon genome.fa genome
    # Repeat HISAT-3N integrated index with splice site information
    hisat-3n-build --repeat-index --ss genome.ss genome.fa genome
 Alignment with `hisat-3n`
 ------------
 After building the HISAT-3N index, you are ready to use `hisat-3n` for alignment.
 HISAT-3N accepts the HISAT2 arguments and adds a few of its own. Please check the [HISAT2 manual] for more detail.
 For the human reference genome, HISAT-3N requires about 9 GB of memory for alignment with the standard 3N-index and about 10.5 GB with the repeat 3N-index.
 * `--base-change <chr1,chr2>`  
    Specify which base is converted to which other base during sequencing. Enter
    two letters separated by ','. The first letter (chr1) is the base that gets converted, and the second letter (chr2) is
    the base it is converted to. For example, during SLAM-seq some 'T' bases are converted to 'C',
    so enter `--base-change T,C`. During bisulfite-seq some 'C' bases are converted to 'T', so enter `--base-change C,T`.
    If you want to align non-converted reads to the regular HISAT2 index, do not use this option.
 * `--index/-x <hisat-3n-idx>`  
    The basename of the HISAT-3N index, that is, the name of the index files up to but not including the suffix (`.3n.*.*.ht2`, etc.).
    For example, if you built your index with the basename 'genome' using `hisat-3n-build`, enter `--index genome`.
 * `--repeat-limit <int>` 
    Set the number of alignments to check for each repeat alignment. Increase this number to let `hisat-3n`
    output more alignments when a read maps to multiple locations. For paired-end reads, we suggest a repeat limit of no more
    than 1,000,000. Default: 1000.
 * `--unique-only` 
    Only output uniquely aligned reads.
 #### Examples:
 * Single-end slam-seq reads (T to C conversion) alignment with standard 3N-index:  
 `hisat-3n --index genome -f -U read.fa -S alignment_result.sam --base-change T,C`
 * Paired-end bisulfite-seq reads (C to T conversion) alignment with repeat 3N-index:   
 `hisat-3n --index genome -f -1 read_1.fa -2 read_2.fa -S alignment_result.sam --base-change C,T`
 * Single-end TAPS reads (C to T conversion) alignment with repeat 3N-index, outputting only uniquely aligned reads:   
 `hisat-3n --index genome -q -U read.fq -S alignment_result.sam --base-change C,T --unique-only`
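The example commands above can also be assembled programmatically. The following is a minimal sketch in Python that uses only options appearing in this document. The function name `build_hisat3n_command` and its parameters are hypothetical, and the check that `--base-change` names two different bases from A, C, G, T is an assumption rather than documented behavior.

```python
VALID_BASES = {"A", "C", "G", "T"}  # assumption: only these bases are accepted


def build_hisat3n_command(index, sam_out, base_change, reads=None, mates=None,
                          fasta=False, repeat_limit=None, unique_only=False):
    """Build a hisat-3n argument list (hypothetical helper, illustration only)."""
    converted_from, converted_to = base_change
    if converted_from not in VALID_BASES or converted_to not in VALID_BASES:
        raise ValueError("base_change must use the letters A, C, G, T")
    if converted_from == converted_to:
        raise ValueError("base_change must name two different bases")
    if (reads is None) == (mates is None):
        raise ValueError("give either single-end reads or a pair of mate files")

    # -f marks FASTA input and -q marks FASTQ input, as in the examples above.
    cmd = ["hisat-3n", "--index", index, "-f" if fasta else "-q"]
    if reads is not None:
        cmd += ["-U", reads]
    else:
        mate1, mate2 = mates
        cmd += ["-1", mate1, "-2", mate2]
    cmd += ["-S", sam_out, "--base-change", f"{converted_from},{converted_to}"]
    if repeat_limit is not None:
        cmd += ["--repeat-limit", str(repeat_limit)]
    if unique_only:
        cmd.append("--unique-only")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names.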
 #### Extra SAM tags generated by HISAT-3N:
 * `Yf:i:<N>`: Number of conversions detected in the read.
 * `YZ:A:<A>`: The value `+` or `-` indicates whether the read is mapped to REF-3N (`+`) or REF-RC-3N (`-`).
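As an illustration of how these two tags might be read downstream, here is a minimal Python sketch. The function name `parse_3n_tags` and the result keys are hypothetical. The sketch assumes the standard SAM layout of 11 mandatory tab-separated fields followed by optional `TAG:TYPE:VALUE` fields.

```python
REFERENCE_BY_SIGN = {"+": "REF-3N", "-": "REF-RC-3N"}


def parse_3n_tags(sam_line):
    """Extract the Yf and YZ tags from one SAM alignment line.

    Hypothetical helper for illustration; header lines (starting with '@')
    should be filtered out before calling it.
    """
    fields = sam_line.rstrip("\n").split("\t")
    result = {}
    # Optional fields start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        if tag == "Yf" and type_code == "i":
            result["conversions"] = int(value)
        elif tag == "YZ" and type_code == "A":
            result["reference"] = REFERENCE_BY_SIGN[value]
    return result
```

A record without these tags yields an empty dictionary, so callers can tell HISAT-3N output apart from other SAM input.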
 Generate a 3N-conversion-table with `hisat-3n-table`
 ------------
 ### Preparation
 To generate a 3N-conversion-table, first sort the SAM file generated by `hisat-3n`.
 [SAMtools] is required for this sorting process.
 Use `samtools sort` to convert the SAM file to a sorted SAM file.
    samtools sort alignment_result.sam -o sorted_alignment_result.sam -O sam
 Generate 3N-conversion-table with `hisat-3n-table`:
 ### Usage
    hisat-3n-table [options]* --alignments <alignmentFile> --ref <refFile> --output-name <outputFile> --base-change <char1,char2>
 #### Main arguments
 * `--alignments <alignmentFile>`   
  SORTED SAM file. Please enter `-` for standard input.
 * `--ref <refFile>`  
  The reference genome file (FASTA format) that was used to generate the HISAT-3N index.
 * `--output-name <outputFile>`  
  Filename to write 3N-conversion-table (tsv format) to.
 * `--base-change <char1,char2>`  
  The base-change rule. Enter exactly the same `--base-change` argument that was used with `hisat-3n`.
  For example, please enter `--base-change C,T` for bisulfite sequencing reads.
 #### Input options
 * `-u/--unique-only`  
  Only count uniquely aligned reads in the 3N-conversion-table.
 * `-m/--multiple-only`  
  Only count multiply aligned reads in the 3N-conversion-table.
 * `-c/--CG-only`  
  Only count CpG islands in the reference genome. This option is designed for bisulfite sequencing reads.
 * `-p/--threads <int>`  
  Launch `int` parallel threads (default: 1) for table building. 
 * `-h/--help`  
  Print usage information and quit.
 #### Examples:
 * Generate 3N conversion table for bisulfite sequencing data:  
 `hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T`
 * Generate a 3N-conversion-table for TAPS data, counting only bases in CpG islands from uniquely aligned reads:  
 `hisat-3n-table -p 16 --alignments sorted_alignment_result.sam --ref genome.fa --output-name output.tsv --base-change C,T --CG-only --unique-only`
 * Generate 3N conversion table for bisulfite sequencing data from sorted BAM file:  
 `samtools view -h sorted_alignment_result.bam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T`
 * Generate 3N conversion table for bisulfite sequencing data from unsorted BAM file:  
  `samtools sort alignment_result.bam -O sam | hisat-3n-table --ref genome.fa --alignments - --output-name output.tsv --base-change C,T`
 #### Note:
 There are 7 columns in the 3N-conversion-table:
 1. `ref`: the chromosome name.
 2. `pos`: 1-based position in ref.
 3. `strand`: '+' for forward strand. '-' for reverse strand.
 4. `convertedBaseQualities`: the qualities of the converted bases in read-level measurements. The length of this string equals
 the number of converted bases in read-level measurements.
 5. `convertedBaseCount`: the number of distinct read positions where converted bases in read-level measurements were found.
 This number should equal the length of `convertedBaseQualities`.
 6. `unconvertedBaseQualities`: the qualities of the unconverted bases in read-level measurements. The length of this string equals
 the number of unconverted bases in read-level measurements.
 7. `unconvertedBaseCount`: the number of distinct read positions where unconverted bases in read-level measurements were found.
 This number should equal the length of `unconvertedBaseQualities`.
 ##### Sample 3N-conversion-table:
    ref    pos    strand    convertedBaseQualities    convertedBaseCount    unconvertedBaseQualities    unconvertedBaseCount
    1      11874  +         FFFFFB<BF<F               11                                                0
    1      11877  -         FFFFFF<                   7                                                 0
    1      11878  +         FFFBB//F/BB               11                                                0
    1      11879  +                                   0                     FFFBB//FB/                  10
    1      11880  -         F                         1                     FFFF/                       5
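
A table like the sample above can be summarized per position. The following is a minimal Python sketch under stated assumptions: the file is tab-separated with the header row shown above, and the "converted fraction" (converted count divided by converted plus unconverted count) is an illustrative metric, not something `hisat-3n-table` reports. The function name `conversion_fractions` is hypothetical.

```python
import csv
import io


def conversion_fractions(tsv_text):
    """Return (ref, pos, strand, fraction) tuples from a 3N-conversion-table.

    Hypothetical helper. The fraction is None for positions with no
    counted bases.
    """
    # QUOTE_NONE keeps quality strings intact even if they contain '"'.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    results = []
    for row in reader:
        converted = int(row["convertedBaseCount"])
        unconverted = int(row["unconvertedBaseCount"])
        total = converted + unconverted
        fraction = converted / total if total else None
        results.append((row["ref"], int(row["pos"]), row["strand"], fraction))
    return results
```

For a real output file, the text can be read with `open(path).read()` and passed to the function. For large tables, iterating over the file object directly would use less memory.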
 [SAMtools]:        http://samtools.sourceforge.net
 Publication
 ============
 * HISAT-3N paper  
  Zhang, Y., Park, C., Bennett, C., Thornton, M., & Kim, D. (2021). [Rapid and accurate alignment of nucleotide conversion sequencing reads with HISAT-3N](https://doi.org/10.1101/gr.275193.120). Genome Research, gr.275193.120. Advance online publication.  
 * HISAT2 paper  
 Kim, D., Paggi, J.M., Park, C. _et al._ [Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype](https://doi.org/10.1038/s41587-019-0201-4). _Nat Biotechnol_ **37**, 907–915 (2019)  
--- a/docs/_pages/hisat2.md
+++ b/docs/_pages/hisat2.md
@ -0,0 +1,135 @@
 ---
 layout: page
 title: Main
 permalink: /main/
 order: 1
 share: false
 ---
 **HISAT2** is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. Based on an extension of BWT for graphs ([Sir&eacute;n et al. 2014](http://dl.acm.org/citation.cfm?id=2674828)), we designed and implemented a graph FM index (GFM), an original approach and its first implementation. In addition to using one global GFM index that represents a population of human genomes, **HISAT2** uses a large set of small GFM indexes that collectively cover the whole genome. These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
 ### Index files are moved to the AWS Public Dataset Program. 9/3/2020
 We have moved HISAT2 index files to the AWS Public Dataset Program. See the [link](https://registry.opendata.aws/jhu-indexes/) for more details.
 ### HISAT 2.2.1 release 7/24/2020
 This patch version includes the following changes.
 * Python3 support
 * Remove the HISAT-genotype related scripts. HISAT-genotype moved to [http://daehwankimlab.github.io/hisat-genotype/](http://daehwankimlab.github.io/hisat-genotype/)
 * Fixed bugs related to `--read-lengths` option
 ### HISAT 2.2.0 release 2/6/2020
 This major version update includes a new feature to handle “repeat” reads. Based on sets of 100-bp simulated and 101-bp real reads that we tested, we found that 2.6-3.4% and 1.4-1.8% of the reads were mapped to >5 locations and >100 locations, respectively. Attempting to report all alignments would likely consume a prohibitive amount of disk space. To address this issue, our repeat indexing and alignment approach directly aligns reads to repeat sequences, resulting in one repeat alignment per read. HISAT2 provides application programming interfaces (APIs) for C++, Python, and Java that rapidly retrieve genomic locations from repeat alignments for use in downstream analyses.  
 Other minor bug fixes are also included as follows:  
 * Fixed occasional sign (+ or -) issues of template lengths in SAM file
 * Fixed duplicate read alignments in SAM file
 * Skip a splice site if exon's last base or first base is ambiguous (N) 
 ### Index files are moved to a different location. 8/30/2019
 Due to a high volume of index downloads, we have moved HISAT2 index files to a different location in order to provide faster download speed. If you use wget or curl to download index files, then you may need to use the following commands to get the correct file name.
 * `wget --content-disposition` *download_link*
 * `curl -OJ` *download_link*
 ### [The HISAT2 paper](https://www.nature.com/articles/s41587-019-0201-4) is out in *Nature Biotechnology*. 8/2/2019
 ### HISAT 2.1.0 release 6/8/2017
 * This major version includes the first release of HISAT-genotype, which currently performs HLA typing,
  DNA fingerprinting analysis, and CYP typing on whole genome sequencing (WGS) reads. 
  We plan to extend the system so that it can analyze not just a few genes, but a whole human genome. 
  Please refer to [the HISAT-genotype website](https://daehwankimlab.github.io/hisat-genotype) for more details.
 * HISAT2 can be directly compiled and executed on Windows system using Visual Studio, thanks to [Nigel Dyer](http://www2.warwick.ac.uk/fac/sci/systemsbiology/staff/dyer/).
 * Implemented `--new-summary` option to output a new style of alignment summary, which is easier to parse for programming purposes.
 * Implemented `--summary-file` option to output alignment summary to a file in addition to the terminal (e.g. stderr).
 * Fixed discrepancy in HISAT2’s alignment summary.
 * Implemented `--no-templatelen-adjustment` option to disable automatic template length adjustment for RNA-seq reads.
 ### HISAT2 2.0.5 release 11/4/2016
 Version 2.0.5 is a minor release with the following changes.
 * Due to a policy change (HTTP to HTTPS) in using SRA data (`--sra-option`), users are strongly encouraged to use this version. As of 11/9/2016, NCBI will begin a permanent redirect to HTTPS, which means that previous versions of HISAT2 will soon no longer work with the `--sra-acc` option.
 * Implemented `-I` and `-X` options for specifying minimum and maximum fragment lengths.  The options are valid only when used with `--no-spliced-alignment`, which is used for the alignment of DNA-seq reads.
 * Fixed some cases where reads with SNPs on their 5' ends were not properly aligned.
 * Implemented `--no-softclip` option to disable soft-clipping.
 * Implemented `--max-seeds` to specify the maximum number of seeds that HISAT2 will try to extend to full-length alignments (see [the manual] for details).
 ### [HISAT, StringTie and Ballgown protocol](http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html) published at Nature Protocols 8/11/2016
 ### HISAT2 2.0.4 Windows binary available [here](http://www.di.fc.ul.pt/~afalcao/hisat2_windows.html), thanks to [Andre Osorio Falcao](http://www.di.fc.ul.pt/~afalcao/) 5/24/2016
 ### HISAT2 2.0.4 release 5/18/2016
 Version 2.0.4 is a minor release with the following changes.
 * Improved template length estimation (the 9th column of the SAM format) of RNA-seq reads by taking introns into account.
 * Introduced two options, `--remove-chrname` and `--add-chrname`, to remove "chr" from reference names or add "chr" to reference names in the alignment output, respectively (the 3rd column of the SAM format).
 * Changed the maximum of mapping quality (the 5th column of the SAM format) from 255 to 60. Note that 255 is an undefined value according to the SAM manual and some programs would not work with this value (255) properly.
 * Fixed NH (number of hits) in the alignment output.
 * HISAT2 now allows indels of any length, subject to the minimum alignment score (previously, the maximum length of indels was 3 bp).
 * Fixed several cases in which alignments extended beyond reference sequences.
 * Fixed reporting of duplicate alignments.
 ### HISAT2 2.0.3-beta release 3/28/2016
 Version 2.0.3-beta is a minor release with the following changes.
 * Fixed graph index building when using both SNPs and transcripts. As a result, genome_snp_tran indexes here on the HISAT2 website have been rebuilt.
 * Included some missing files needed to follow the small test example (see [the manual] for details).
 ### HISAT2 2.0.2-beta release 3/17/2016
 **Note (3/19/2016):** this version is slightly updated to handle reporting splice sites with the correct chromosome names.
 Version 2.0.2-beta is a major release with the following changes.
 * Memory-mapped I/O (`--mm` option) now works.
 * Linear indexes can now be built using multiple threads.
 * Changed the minimum score for alignment in keeping with read lengths, so it's now `--score-min L,0.0,-0.2`, meaning a minimum score of -20 for 100-bp reads and -30 for 150-bp reads.
 * Fixed a bug in which the same read was written to a file multiple times when `--un-conc` was used.
 * Fixed another bug that caused reads to map beyond reference sequences.
 * Introduced the `--haplotype` option in hisat2-build (index building), which is used together with the `--snp` option to incorporate those SNP combinations present in the human population.  This option also prevents graph construction from exploding due to exponential combinations of SNPs in small genomic regions.
 * Provided a new python script to extract SNPs and haplotypes from VCF files, <i>hisat2_extract_snps_haplotypes_VCF.py</i>
 * Changed several python script names as follows:
  * *extract_splice_sites.py* to *hisat2_extract_splice_sites.py*
  * *extract_exons.py* to *hisat2_extract_exons.py*
  * *extract_snps.py* to *hisat2_extract_snps_haplotypes_UCSC.py*
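 The `--score-min L,0.0,-0.2` setting listed above can be checked by hand. The sketch below assumes `L,a,b` is read as the linear function a + b × read length, which is consistent with the two figures quoted in that item; the helper name `min_score` is hypothetical and not part of HISAT2.
 ```shell
 # Hypothetical helper: evaluate a + b * read_length for the linear form L,a,b.
 # Defaults correspond to L,0.0,-0.2 from the release note.
 min_score() {
   awk -v len="$1" -v a="${2:-0.0}" -v b="${3:--0.2}" 'BEGIN { printf "%g\n", a + b * len }'
 }
 min_score 100   # prints -20 (100-bp reads)
 min_score 150   # prints -30 (150-bp reads)
 ```
 Passing a different intercept and slope as the second and third arguments shows how other `L,a,b` choices would move the threshold.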
 ### HISAT2 2.0.1-beta release 11/19/2015
 Version 2.0.1-beta is a maintenance release with the following changes.
 * Fixed a bug that caused reads to map beyond reference sequences.
 * Fixed a deadlock issue that happened very rarely.
 * Fixed a bug that led to illegal memory access when reading SNP information.
 * Fixed a system-specific bug related to popcount instruction.
 ### HISAT2 2.0.0-beta release 9/8/2015 - first release
 We extended the BWT/FM index to incorporate genomic differences among individuals into the reference genome, while keeping memory requirements low enough to fit the entire index onto a desktop computer. Using this novel Hierarchical Graph FM index (HGFM) approach, we built a new alignment system, HISAT2, with an index that incorporates ~12.3M common SNPs from the dbSNP database. HISAT2 provides greater alignment accuracy for reads containing SNPs.
 * HISAT2's index size for the human reference genome and 12.3 million common SNPs is 6.2GB (the memory footprint of HISAT2 is 6.7GB). The SNPs consist of 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. The insertions and deletions used in this index are small (usually <20bp).
 * HISAT2 comes with several index types:
  * Hierarchical FM index (HFM) for a reference genome (index base: <i>genome</i>)
  * Hierarchical Graph FM index (HGFM) for a reference genome plus SNPs (index base: <i>genome_snp</i>)
  * Hierarchical Graph FM index (HGFM) for a reference genome plus transcripts (index base: <i>genome_tran</i>)
  * Hierarchical Graph FM index (HGFM) for a reference genome plus SNPs and transcripts (index base: <i>genome_snp_tran</i>)
 * HISAT2 is a successor to both [HISAT](http://ccb.jhu.edu/software/hisat) and [TopHat2](http://ccb.jhu.edu/software/tophat). We recommend that HISAT and TopHat2 users switch to HISAT2.
  * HISAT2 can be considered an enhanced version of HISAT with many improvements and bug fixes. The alignment speed and memory requirements of HISAT2 are virtually the same as those of HISAT when using the HFM index (<i>genome</i>).
  * When using graph-based indexes (HGFM), HISAT2 runs somewhat slower than HISAT (30-80% additional CPU time).
  * HISAT2 allows for mapping reads directly against transcripts, similar to TopHat2 (use <i>genome_tran</i> or <i>genome_snp_tran</i>).
 * When reads contain SNPs, the SNP information is provided as an optional field in the SAM output of HISAT2 (e.g., **<code>Zs:Z:1|S|rs3747203,97|S|rs16990981</code>** - see [the manual] for details).  This feature enables fast and sensitive genotyping in downstream analyses. Note that there is no alignment penalty for mismatches, insertions, and deletions if they correspond to known SNPs.
 * HISAT2 provides options for transcript assemblers (e.g., StringTie and Cufflinks) to work better with the alignment from HISAT2 (see options such as `--dta` and `--dta-cufflinks`).
 * Some slides about HISAT2 are found [here]({{ '/assets/data/HISAT2-first_release-Sept_8_2015.pdf' | prepend: site.baseurl }}) and we are preparing detailed documentation.
 * We plan to incorporate a larger set of SNPs and structural variations (SV) into this index (e.g., long insertions/deletions, inversions, and translocations).
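 For downstream scripts, the `Zs:Z:` field shown in the list above can be split into one row per SNP. This is only a sketch: it assumes each comma-separated entry has three `|`-separated parts (read here as an offset, a variant type code, and a SNP ID), and the function name `parse_zs` is hypothetical. The manual remains the reference for the exact meaning of each part.
 ```shell
 # Hypothetical helper: strip the "Zs:Z:" prefix, put one entry per line,
 # then print the three "|"-separated parts as tab-separated columns.
 parse_zs() {
   printf '%s\n' "${1#Zs:Z:}" | tr ',' '\n' | awk -F'|' '{ print $1 "\t" $2 "\t" $3 }'
 }
 parse_zs 'Zs:Z:1|S|rs3747203,97|S|rs16990981'
 ```
 For the example value this yields two rows, one for rs3747203 and one for rs16990981.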
 [the manual]: {{ site.baseurl }}{% link _pages/manual.md %}
 ### The HISAT2 source code is available in a [public GitHub repository](https://github.com/DaehwanKimLab/hisat2) (5/30/2015).
--- a/docs/_pages/howto.md
+++ b/docs/_pages/howto.md
@ -0,0 +1,78 @@
 ---
 layout: page
 title: HowTo
 permalink: /howto/
 order: 6
 share: false
 ---
 ## HOWTO
 {: .no_toc}
 - TOC
 {:toc}
 ### Building indexes
 Depending on your purpose, you will need to download reference sequence, gene annotation, and SNP files.  
 We also provide scripts to build indexes. [Download]({{ site.baseurl }}{% link _pages/download.md %})
 #### Prepare data
 1. Download reference
 ```
 $ wget ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
 $ gzip -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
 $ mv Homo_sapiens.GRCh38.dna.primary_assembly.fa genome.fa
 ```
 1. Download the GTF and make the exon and splice site files.  
   If you only want to build an HFM index, you can skip this step.
 ```
 $ wget ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz  
 $ gzip -d Homo_sapiens.GRCh38.84.gtf.gz
 $ mv Homo_sapiens.GRCh38.84.gtf genome.gtf
 $ hisat2_extract_splice_sites.py genome.gtf > genome.ss
 $ hisat2_extract_exons.py genome.gtf > genome.exon
 ```
 1. Download SNPs  
   If you only want to build an HFM index, you can skip this step.  
 ```
 $ wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp144Common.txt.gz
 $ gzip -d snp144Common.txt.gz
 ```
   Convert chromosome names from the UCSC database convention to the Ensembl annotation convention
 ```
 $ awk 'BEGIN{OFS="\t"} {if($2 ~ /^chr/) {$2 = substr($2, 4)}; if($2 == "M") {$2 = "MT"} print}' snp144Common.txt > snp144Common.txt.ensembl
 ```
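   To see what the conversion does before running it on the full file, the same awk program can be applied to a couple of rows. The two input rows below are made up for illustration, and the wrapper name `ucsc_to_ensembl` is hypothetical.
 ```shell
 # Same awk program as above, wrapped in a hypothetical function that reads stdin.
 ucsc_to_ensembl() {
   awk 'BEGIN{OFS="\t"} {if($2 ~ /^chr/) {$2 = substr($2, 4)}; if($2 == "M") {$2 = "MT"} print}'
 }
 # Two made-up rows; only the second column (chromosome name) matters here.
 printf '585\tchr1\t10177\n585\tchrM\t150\n' | ucsc_to_ensembl
 ```
   The second column changes from `chr1` to `1` and from `chrM` to `MT`; the other columns pass through unchanged.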
   Make the SNP and haplotype files
 ```
 $ hisat2_extract_snps_haplotypes_UCSC.py genome.fa snp144Common.txt.ensembl genome
 ```
 #### Build HFM index
 Building the index takes about 20 minutes (depending on hardware) and requires at least 6GB of memory.
 ```
 $ hisat2-build -p 16 genome.fa genome
 ```
 #### Build HGFM index with SNPs
 ```
 $ hisat2-build -p 16 --snp genome.snp --haplotype genome.haplotype genome.fa genome_snp
 ```
 #### Build HGFM index with transcripts
 Building the index takes about 1 hour (depending on hardware) and requires at least 160GB of memory.
 ```
 $ hisat2-build -p 16 --exon genome.exon --ss genome.ss genome.fa genome_tran
 ```
 #### Build HGFM index with SNPs and transcripts
 ```
 $ hisat2-build -p 16 --snp genome.snp --haplotype genome.haplotype --exon genome.exon --ss genome.ss genome.fa genome_snp_tran
 ```
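 The four build commands above differ only in which optional inputs are passed. As a rough sketch, the hypothetical helper below (not shipped with HISAT2) prints, without running it, the `hisat2-build` command matching whichever of the files prepared earlier exist in the current directory. It uses only the options shown above.
 ```shell
 # Hypothetical helper: print (do not run) a hisat2-build command that uses
 # whichever optional input files exist and are non-empty in the current directory.
 hisat2_build_cmd() {
   cmd="hisat2-build -p 16"
   if [ -s genome.snp ] && [ -s genome.haplotype ]; then
     cmd="$cmd --snp genome.snp --haplotype genome.haplotype"
   fi
   if [ -s genome.exon ] && [ -s genome.ss ]; then
     cmd="$cmd --exon genome.exon --ss genome.ss"
   fi
   echo "$cmd genome.fa $1"
 }
 hisat2_build_cmd genome_index   # prints the command for review
 ```
 The argument is the index base name (here the placeholder `genome_index`); review the printed command against the memory requirements noted above before running it.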
--- a/docs/_pages/links.md
+++ b/docs/_pages/links.md
@ -0,0 +1,17 @@
 ---
 layout: page
 title: Links 
 permalink: /links/
 order: 7
 share: false
 ---
 * KimLab - <https://kim-lab.org>
  * github - <https://github.com/DaehwanKimLab>
 * hisat-genotype - <https://daehwankimlab.github.io/hisat-genotype>
  * github for hisat-genotype - <https://github.com/DaehwanKimLab/hisat-genotype>
 * Lyda Hill Department of Bioinformatics at UT Southwestern Medical Center - <https://www.utsouthwestern.edu/departments/bioinformatics>
 * Center for Computational Biology at Johns Hopkins University - <http://www.ccb.jhu.edu> 
--- a/docs/_pages/manual.md
+++ b/docs/_pages/manual.md
--- a/docs/_pages/search.html
+++ b/docs/_pages/search.html
@ -0,0 +1,26 @@
 ---
 layout: page
 title: Search Results
 permalink: /search/
 hide: true
 share: false
 ---
 <script>
    var baseurl = "{{ site.baseurl }}";
 </script>
 <div id="search-results">
    <hr id="first-hr" class="with-no-margin"/>
    {% for post in site.posts %}
    <div id="{{ post.id | replace: '/', '-' }}" style="display: none;">
        <div class="article-wrapper">
            <article>
                {% include article-header.html page=post link=true share=false eye_catch=false %}
            </article>
        </div>
        <hr class="with-no-margin"/>
    </div>
    {% endfor %}
 </div>
--- a/docs/_pages/tags.html
+++ b/docs/_pages/tags.html
@ -0,0 +1,14 @@
 ---
 layout: page
 title: Tags
 permalink: /tags/
 order: 2
 share: false
 hide: true
 ---
 <ul class="inline">
    {% for tag in site.tags %}
    <li><a href="{{ '/search/?t=' | prepend: site.baseurl }}{{ tag[0] }}">#{{ tag[0] }}</a></li>
    {% endfor %}
 </ul>
--- a/docs/_posts/2000-01-01-kim.md
+++ b/docs/_posts/2000-01-01-kim.md
@ -0,0 +1,13 @@
 ---
 layout: post
 title: Daehwan Kim
 tags: daehwankim
 eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
 ---
 Daehwan Kim is an Assistant Professor at UT Southwestern and was the original designer who laid much of the groundwork for HISAT-genotype.
 [Webpage](https://kim-lab.org/daehwan-kim-principal-investigator/)
--- a/docs/_posts/2000-01-02-salzberg.md
+++ b/docs/_posts/2000-01-02-salzberg.md
@ -0,0 +1,11 @@
 ---
 layout: post
 title: Steven Salzberg
 tags: stevensalzberg
 eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
 ---
 Steven Salzberg is the Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology. 
 [Webpage](https://salzberg-lab.org/in-the-news/about-me/)
--- a/docs/_posts/2000-01-03-langmead.md
+++ b/docs/_posts/2000-01-03-langmead.md
@ -0,0 +1,13 @@
 ---
 layout: post
 title: Ben Langmead
 tags: benlangmead
 eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
 ---
 Ben Langmead is an Associate Professor of Computer Science at Johns Hopkins University.
 [Webpage](http://www.langmead-lab.org/)
--- a/docs/_posts/2019-07-28-park.md
+++ b/docs/_posts/2019-07-28-park.md
@ -0,0 +1,10 @@
 ---
 layout: post
 title: Chanhee Park
 tags: chanheepark
 eye_catch: https://avatars0.githubusercontent.com/u/28678667?s=460&v=4
 ---
 Chanhee Park is a Scientific Software Engineer in the Kim Lab at UTSW responsible for maintaining and improving HISAT2, the core of HISAT-genotype.
 [Linkedin](https://www.linkedin.com/in/chanhee-park-97677297/)