
Kea is distributed under the GNU General Public License. The current version 5.0 allows free as well as controlled indexing. It uses the latest version of the Weka machine learning workbench.


  • easy to install and use, direct from your code or from the command line
  • free or controlled indexing, with any vocabulary in text or SKOS format
  • latest libraries, including Jena-2.4 and Weka-3.5.5
  • easily applicable to new languages and domains
  • distributed with sample vocabularies in 3 languages (en, es, fr)
  • contains sample documents in 3 languages for creating and testing models


Download Kea from its Google Code project page. It includes source code, required libraries, test data and documentation.

Also consider using Maui, an algorithm for topic indexing, which can be used for the same tasks as Kea, but offers additional features. Maui also allows indexing using Wikipedia as a controlled vocabulary.

Examples of controlled vocabularies that can be used with Kea (and Maui)
  • Documentation

    Free or Controlled Indexing?

    In free indexing, keyphrases are significant terms that appear in the document. Any phrase in the document is a potential keyphrase. The advantage of free indexing is that it can be applied to any document. The disadvantages are that the extracted phrases are of poorer quality than those from controlled indexing, and that the indexing is not consistent across documents.
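    To illustrate the idea that any phrase in the document is a potential keyphrase, here is a minimal sketch that simply enumerates short word sequences as candidates. It is not Kea's code: the function name `candidate_phrases` and the limit of three words are assumptions made for this example, and a real indexer would still need to filter and rank the candidates.

```python
import re

def candidate_phrases(text, max_words=3):
    """Return every run of 1..max_words consecutive words, lowercased.

    Illustrative only: this enumerates candidates and does not rank them.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases

candidates = candidate_phrases("Keyphrase extraction with machine learning")
```

    Because the candidates come straight from the text, the sketch works on any document, but two documents that use different words for the same concept get different candidates.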

    In controlled indexing, keyphrases are chosen from a controlled vocabulary (a dictionary, thesaurus, or list of terms). Its advantage is that all documents are indexed consistently, regardless of their wording. For example, two documents, one about "laptops" and the other about "notebooks", would be indexed with the same term: the preferred term that the controlled vocabulary uses for this concept.
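    The "laptops" and "notebooks" example can be sketched with a toy vocabulary that maps each known label to a preferred term. This is an illustration only, not Kea's implementation: the vocabulary contents, the preferred term "portable computers", and the function name `controlled_index` are all hypothetical.

```python
import re

# Hypothetical toy vocabulary: each label maps to the preferred term
# of its concept. A real vocabulary would be loaded from a file.
VOCABULARY = {
    "laptop": "portable computers",
    "laptops": "portable computers",
    "notebook": "portable computers",
    "notebooks": "portable computers",
}

def controlled_index(text, vocabulary=VOCABULARY):
    """Return the sorted preferred terms whose labels occur in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({vocabulary[w] for w in words if w in vocabulary})

doc_a = controlled_index("A review of budget laptops")
doc_b = controlled_index("Choosing between notebooks for students")
```

    Both documents receive the same index term even though they share no wording for the concept, while a document containing no vocabulary label receives no terms at all.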

    Older Versions

    Kea-4.1 (ZIP, 6.6 MB) -- controlled indexing only

    Kea-4.0 (ZIP, 1 MB) -- controlled indexing for agricultural documents only.

    Kea-3.0 (ZIP, 512 KB) -- free indexing only.

    Kea-3.0 is based on the original version, which has been re-implemented in Java. It additionally allows indexing German documents. Implementing further languages is straightforward.

    The oldest version of Kea is still available for download. It is implemented in Perl and Java (and a little C) for Unix systems. It is not straightforward to install; you will probably need to know a little about Perl and Java. We strongly recommend that you read the README file before you attempt it, so that you know what's in store.

    Here is a model for the old version of Kea that was trained on a collection of Computer Science Technical Reports and uses domain-specific keyphrase frequency information for better results.

    Other Resources

    KEA has also been integrated into the NLP workbench GATE (http://gate.ac.uk). Please send queries regarding the KEA plugin for GATE to the GATE support mailing list (http://gate.ac.uk/mail/index.html).

    There is an IKVM version of KEA 3.0 (for .NET/C#) developed by Enrico Lu. It is available on his website: http://enricolu.myweb.hinet.net/.


    • Version 5.0 - Kea now combines controlled and free indexing. Works with the latest version of Weka.
    • Version 4.1 - Kea now works with any controlled vocabulary in SKOS format.
    • Version 4.0 - Kea for agricultural documents
    • Version 3.0 - Kea now also works for German documents
    • Version 2.0 - Kea is now fully Java-based
    • Version 1.1.4 - Updated Kea-1.1.4-README.txt to cover building models, and added a count-lines.pl script for that purpose.
    • Version 1.1.3 - Moved Lynx command to script that checks for conditions that are likely to crash it.
    • Version 1.1.2 - Documentation, phrase length set at command-line.
    • Version 1.1.1 - Set output extension at command-line