tagaligner: Aligner for parallel text

What is Tagaligner?

Tagaligner is a program that segments two markup-language-based files containing mutual translations, aligns their corresponding sentences, and generates a TMX translation memory from them for use in computer-assisted translation.
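TMX is a standard XML format for exchanging translation memories. As a rough illustration only (not Tagaligner's own code), the C++17 sketch below serializes already-aligned sentence pairs as a minimal TMX 1.4 document with one `<tu>` per pair. The function names `xmlEscape` and `writeTmx` and the header attribute values are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Escape characters that may not appear verbatim in XML text or
// double-quoted attribute values.
std::string xmlEscape(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Serialize aligned (left, right) sentence pairs as a TMX 1.4 document:
// one translation unit (<tu>) per pair, one variant (<tuv>) per language.
std::string writeTmx(const std::vector<std::pair<std::string, std::string>>& pairs,
                     const std::string& leftLang, const std::string& rightLang) {
    std::ostringstream tmx;
    tmx << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        << "<tmx version=\"1.4\">\n"
        << "  <header creationtool=\"sketch\" creationtoolversion=\"0.1\""
        << " segtype=\"sentence\" o-tmf=\"none\" adminlang=\"en\""
        << " srclang=\"" << xmlEscape(leftLang) << "\" datatype=\"plaintext\"/>\n"
        << "  <body>\n";
    for (const auto& [left, right] : pairs) {
        tmx << "    <tu>\n"
            << "      <tuv xml:lang=\"" << xmlEscape(leftLang) << "\"><seg>"
            << xmlEscape(left) << "</seg></tuv>\n"
            << "      <tuv xml:lang=\"" << xmlEscape(rightLang) << "\"><seg>"
            << xmlEscape(right) << "</seg></tuv>\n"
            << "    </tu>\n";
    }
    tmx << "  </body>\n</tmx>\n";
    return tmx.str();
}
```

The sentences are escaped but otherwise copied as-is; a real tool would also preserve or strip inline markup inside each `<seg>` according to its own rules.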

Tagaligner uses the tag structure of webpages and other XML-based documents to improve on the results of classic geometric aligners. It has been tested with XHTML webpages preprocessed with the tidy program and encoded in ISO-8859-1 and UTF-8, but it may also work with HTML files and other encodings.

Who develops it?

Initial and ongoing development is mainly by Enrique Sánchez Villamil, with help from Sergio Ortiz-Rojas, Susana Santos-Antón and Mikel L. Forcada (members of the Transducens research group at the Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Spain).

In November 2007, after development had been stopped for a long time, Miquel Simon took over the project. Important changes were made to the project between November 2007 and July 2008.

Since September 2008, the project has been led by Miquel Esplà. Since then, TagAligner has been rewritten and redesigned. It now consists of two packages: the library LibTagAligner and the application itself, TagAligner. The configuration file has been improved and its format has changed, and stable versions have been released.

The program originated within the project "Finite-state translators based on bitexts harvested from the net" (2004–2006), which was funded by the now defunct Ministry of Science and Technology of Spain through grant number TIC2003-08681-C02.


Tagaligner 2.0rc1 release notes (26/06/2008):

Today we are announcing the first release of version 2.0 of Tag-Aligner. After a period of development, this version includes several features that improve the performance of the aligner.

The main breakthrough of this version is the introduction of an XML configuration file, in which users can set the values of the parameters used during the alignment process.

The aligner can now process any XML-like markup file, not only XHTML. The configuration file defines the set of tags the aligner takes into account, which lets you align documents from different sources: XHTML, XML, OpenOffice.org, DocBook, or any other XML-based format.
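To illustrate what restricting the aligner to a configured set of tags can mean in practice, here is a minimal C++17 sketch, assuming the relevant tag names have already been read from the configuration file into a `std::set`. It turns markup into a stream of tag and text tokens and drops every tag whose name is not in that set. `Token`, `tagName` and `tokenize` are hypothetical names, and this is not how Tagaligner actually parses its input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One element of the simplified document stream: a markup tag or a text run.
struct Token {
    bool isTag;
    std::string value;  // tag name such as "p" or "/p", or the text itself
};

// Extract the element name from the inside of a tag, discarding attributes:
// "p class=\"x\"" -> "p", "/p" -> "/p", "br/" -> "br".
std::string tagName(const std::string& inside) {
    std::size_t pos = 0;
    std::string prefix;
    if (!inside.empty() && inside[0] == '/') { prefix = "/"; pos = 1; }
    std::size_t end = pos;
    while (end < inside.size() &&
           !std::isspace(static_cast<unsigned char>(inside[end])) && inside[end] != '/')
        ++end;
    return prefix + inside.substr(pos, end - pos);
}

// Split markup into tag and text tokens, keeping only tags listed in
// `relevant` and dropping whitespace-only text runs. Comments or attribute
// values containing '>' are not handled by this simplified scanner.
std::vector<Token> tokenize(const std::string& markup, const std::set<std::string>& relevant) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < markup.size()) {
        if (markup[i] == '<') {
            std::size_t close = markup.find('>', i);
            if (close == std::string::npos) break;  // unterminated tag: stop
            std::string name = tagName(markup.substr(i + 1, close - i - 1));
            std::string bare = (!name.empty() && name[0] == '/') ? name.substr(1) : name;
            if (relevant.count(bare)) tokens.push_back({true, name});
            i = close + 1;
        } else {
            std::size_t next = markup.find('<', i);
            if (next == std::string::npos) next = markup.size();
            std::string text = markup.substr(i, next - i);
            if (text.find_first_not_of(" \t\r\n") != std::string::npos)
                tokens.push_back({false, text});
            i = next;
        }
    }
    return tokens;
}
```

With only `p` configured, inline tags such as `<b>` vanish from the stream while their text is kept, so two translations that differ only in inline formatting produce comparable token sequences.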

Other parameters, such as the edit costs, are also set in the configuration file, so users can adjust them to suit their needs.
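Tagaligner's actual cost model is not described here. As a hedged illustration of how configurable edit costs drive an alignment, the C++17 sketch below runs a classic edit-distance dynamic program over two lists of text segments. The `EditCosts` fields and their default values are hypothetical, and the length-ratio substitution cost is a simple stand-in in the spirit of length-based aligners, not necessarily what Tagaligner uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cost parameters; in Tagaligner such values come from the
// configuration file, but these field names are illustrative only.
struct EditCosts {
    double insertion = 1.0;     // right segment with no counterpart on the left
    double deletion = 1.0;      // left segment with no counterpart on the right
    double substitution = 2.0;  // scale applied to the length-mismatch ratio
};

// Cost of pairing two segments: 0 for equal lengths, growing with the
// relative difference in length (measured in bytes here).
double pairCost(const std::string& a, const std::string& b, const EditCosts& c) {
    double la = static_cast<double>(a.size()), lb = static_cast<double>(b.size());
    if (la + lb == 0) return 0.0;
    return c.substitution * std::fabs(la - lb) / (la + lb);
}

// Minimum-cost alignment by dynamic programming. Returns index pairs in
// document order; -1 marks a segment left unaligned on one side.
std::vector<std::pair<int, int>> align(const std::vector<std::string>& left,
                                       const std::vector<std::string>& right,
                                       const EditCosts& c) {
    const std::size_t n = left.size(), m = right.size();
    std::vector<std::vector<double>> d(n + 1, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 1; i <= n; ++i) d[i][0] = d[i - 1][0] + c.deletion;
    for (std::size_t j = 1; j <= m; ++j) d[0][j] = d[0][j - 1] + c.insertion;
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= m; ++j)
            d[i][j] = std::min({d[i - 1][j - 1] + pairCost(left[i - 1], right[j - 1], c),
                                d[i - 1][j] + c.deletion,
                                d[i][j - 1] + c.insertion});
    // Trace back from the bottom-right cell to recover which move was chosen.
    std::vector<std::pair<int, int>> path;
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            d[i][j] == d[i - 1][j - 1] + pairCost(left[i - 1], right[j - 1], c)) {
            path.emplace_back(static_cast<int>(i - 1), static_cast<int>(j - 1));
            --i; --j;
        } else if (i > 0 && d[i][j] == d[i - 1][j] + c.deletion) {
            path.emplace_back(static_cast<int>(i - 1), -1);
            --i;
        } else {
            path.emplace_back(-1, static_cast<int>(j - 1));
            --j;
        }
    }
    std::reverse(path.begin(), path.end());
    return path;
}
```

Raising `insertion` and `deletion` relative to `substitution` makes this aligner prefer pairing segments over leaving them unaligned, which is the kind of trade-off that exposing the costs in a configuration file lets users tune.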

In our tests, the results are at least as good as those of the previous version, and testing now also covers the new formats.

Tagaligner 2.0rc3 release notes (13/11/2008):

In this release, important problems in dynamic memory management have been solved. If you are using 2.0rc1, we advise you to upgrade to this new version of Tag Aligner 2.0.

Tagaligner 2.1rc1 release notes (3/12/2008):

From today, you can download the new version of Tag Aligner, 2.1rc1. In this version we have worked on fixing several functional defects, porting the code to the C++ STL containers, and optimizing operations to reduce their computational cost.

In addition, considerable work has gone into eliminating redundant code and introducing inheritance into the design, to make it easier to add new aligners to the current application.

Finally, note that from now on the path of the configuration file must be given with the -c option in every call to the application. Otherwise, the application assumes that a file named config.xml, from which the configuration parameters can be read, exists in the current directory.
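The lookup rule described above can be sketched in a few lines of C++17. This sketch assumes the option is written as `-c <path>` with the path as a separate argument; `configPath` is a hypothetical helper, not Tagaligner's actual argument parser, and the real application may treat malformed arguments differently.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Return the path given with "-c <path>", or "config.xml" (a relative path,
// so it resolves against the current working directory) when -c is absent.
// Returns std::nullopt if -c is the last argument and has no value.
std::optional<std::string> configPath(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-c") {
            if (i + 1 >= args.size()) return std::nullopt;
            return args[i + 1];
        }
    }
    return std::string("config.xml");
}
```

Returning a relative default rather than an absolute one keeps the documented behaviour: the file is looked up in whatever directory the application is launched from.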

Tagaligner 2.1rc1 released (11/12/2008):

Fixed minor errors in the alignment process.

Tagaligner 2.2.0 released (28/01/2008):

This new version of Tag Aligner includes the following changes:

TagAligner 3.0.0 released (16/03/2009):

In this new version of tag-aligner, some important changes have been made. For version 3.0 of the TagAligner project, the configuration file has been modified to make it more configurable and more compatible with other applications. From this version on, configuration files from older versions are no longer supported. These are the two most important changes to the configuration file:

This is also the first version that has been tested with unit tests, stress tests and integration tests.


The latest tagaligner package may be downloaded from the SourceForge page sf.net/projects/tag-aligner/. You can also access the project's SVN repository to get a snapshot of current development.


Building and Installing


tagaligner option left_file left_language right_file right_language [output_file]



This application is released under the GNU General Public License.

Developers welcome!

