Parts-of-Speech Tagging - POS Tagger in Apache OpenNLP

In this Apache OpenNLP tutorial, you will learn how to build a POS Tagger example in Java. The program tokenizes an input sentence, loads the OpenNLP POS model, tags each token with a Penn Treebank part-of-speech tag, and prints the probability for each predicted tag.

POS Tagger Example in Apache OpenNLP using Java

A POS tagger marks each word in a sentence with its word type, that is, its part of speech.

In natural language processing, part-of-speech tagging identifies whether a token is used as a noun, verb, adjective, number, punctuation mark, and so on. Apache OpenNLP provides the POSModel and POSTaggerME classes for this task.

In this tutorial, we will learn how to use the POS tagger in Apache OpenNLP for parts-of-speech tagging.

The following example shows the output of the POS tagger for a given input sentence.

Input to POS Tagger	John is 27 years old.
Output of POS Tagger	John_NNP is_VBZ 27_CD years_NNS old_JJ ._.

The tag attached to each word indicates its word type. These part-of-speech tags come from the Penn Treebank tag set.

Tag	Description
NNP	Proper Noun, Singular
VBZ	Verb, 3rd person singular present
CD	Cardinal Number
NNS	Noun, Plural
JJ	Adjective
.	Sentence-final punctuation

For a complete list of Penn Treebank part-of-speech tags, please refer to https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
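To show a readable description next to each tag, a small lookup table is enough. The sketch below is plain Java with no OpenNLP dependency. The class name PennTagDescriptions and the describe() helper are hypothetical. The map covers only the tags from the table above, not the full Penn Treebank set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PennTagDescriptions {

    // Descriptions for the Penn Treebank tags that appear in this tutorial's sample output.
    // This is a small subset of the full tag set, not a complete list.
    private static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();

    static {
        DESCRIPTIONS.put("NNP", "Proper noun, singular");
        DESCRIPTIONS.put("VBZ", "Verb, 3rd person singular present");
        DESCRIPTIONS.put("CD", "Cardinal number");
        DESCRIPTIONS.put("NNS", "Noun, plural");
        DESCRIPTIONS.put("JJ", "Adjective");
        DESCRIPTIONS.put(".", "Sentence-final punctuation");
    }

    // Returns a readable description, or a fallback for tags outside this subset.
    public static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "Unknown tag (see the full Penn Treebank list)");
    }

    public static void main(String[] args) {
        // Tags from the sample sentence "John is 27 years old."
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        for (String tag : tags) {
            System.out.println(tag + " -> " + describe(tag));
        }
    }
}
```

For real text, the map would need the remaining tags from the Penn Treebank list linked above. The fallback keeps unexpected tags from causing errors.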

How Apache OpenNLP POS Tagging Works in This Java Example

The OpenNLP POS tagger does not read a raw sentence directly in this example. First, the sentence is split into tokens using the tokenizer model. Then the token array is passed to POSTaggerME.tag(). The tagger returns one POS tag for each token in the same order.

For the sentence "John is 27 years old.", the tokenizer produces the tokens John, is, 27, years, old, and the final period (.). The POS tagger then labels them as a proper noun, a verb, a cardinal number, a plural noun, an adjective, and punctuation.
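The tagger returns one tag per token in the same order, so the tags can be matched to tokens purely by array index. The sketch below makes that invariant explicit. It is plain Java; the align() helper is hypothetical, and the arrays are copied from the sample sentence rather than produced by a live model.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenTagAlignment {

    // Hypothetical helper: pairs each token with the tag at the same index.
    // A length mismatch means the two arrays did not come from the same tagging call.
    public static List<String> align(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("Expected one tag per token, got "
                    + tokens.length + " tokens and " + tags.length + " tags");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(tokens[i] + " -> " + tags[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Values copied from the sample sentence in this tutorial
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        align(tokens, tags).forEach(System.out::println);
    }
}
```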

Steps to Use POS Tagger in OpenNLP

Following are the steps to obtain the tags programmatically in Java using Apache OpenNLP.

Step 1: Tokenize the given input sentence into tokens.

String sentence = "John is 27 years old.";
// tokenize the sentence
InputStream tokenModelIn = new FileInputStream("en-token.bin");
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
Tokenizer tokenizer = new TokenizerME(tokenModel);
String[] tokens = tokenizer.tokenize(sentence);

Step 2: Read the parts-of-speech maxent model, “en-pos-maxent.bin” into a stream.

InputStream posModelIn = new FileInputStream("en-pos-maxent.bin");

Step 3: Read the stream into parts-of-speech model, POSModel.

POSModel posModel = new POSModel(posModelIn);

Step 4: Load the model into the parts-of-speech tagger, POSTaggerME.

POSTaggerME posTagger = new POSTaggerME(posModel);

Step 5: Get the tags using POSTaggerME.tag(), and the probability of each assigned tag using POSTaggerME.probs().

String[] tags = posTagger.tag(tokens);
double[] probs = posTagger.probs();

Step 6: Finally, print what we got, the token, their respective tags and probabilities of the tags.
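With tab separators, the columns shift when a token is longer than one tab stop. One way to keep them aligned is fixed-width formatting with String.format. The sketch below is plain Java. The formatRow() helper and the column widths are hypothetical choices, and the sample values are copied from this tutorial's output.

```java
import java.util.Locale;

public class PrintTaggedTokens {

    // Hypothetical helper: formats one row with fixed-width columns so that
    // long tokens do not shift the Tag and Probability columns.
    public static String formatRow(String token, String tag, double prob) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale
        return String.format(Locale.ROOT, "%-8s : %-4s : %.4f", token, tag, prob);
    }

    public static void main(String[] args) {
        // Sample values taken from this tutorial's output; in the real program they come
        // from tokenizer.tokenize(), posTagger.tag() and posTagger.probs()
        String[] tokens = {"John", "is", "27"};
        String[] tags = {"NNP", "VBZ", "CD"};
        double[] probs = {0.9874932809932121, 0.9667574183085389, 0.9890000667325892};

        for (int i = 0; i < tokens.length; i++) {
            System.out.println(formatRow(tokens[i], tags[i], probs[i]));
        }
    }
}
```

Rounding to four decimal places is only for display. The full double value from probs() is still available if you need it.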

Example – POS Tagger in OpenNLP

In this example, we will implement all the steps mentioned above.

POSTaggerExample.java

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

/**
 * www.tutorialkart.com
 * POS Tagger Example in Apache OpenNLP using Java
 */
public class POSTaggerExample {

	public static void main(String[] args) {

		String sentence = "John is 27 years old.";

		// try-with-resources closes both model streams, even if an exception occurs
		try (InputStream tokenModelIn = new FileInputStream("en-token.bin");
				InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {

			// tokenize the sentence
			TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
			Tokenizer tokenizer = new TokenizerME(tokenModel);
			String[] tokens = tokenizer.tokenize(sentence);

			// Parts-Of-Speech Tagging
			// loading the parts-of-speech model from the stream
			POSModel posModel = new POSModel(posModelIn);
			// initializing the parts-of-speech tagger with the model
			POSTaggerME posTagger = new POSTaggerME(posModel);
			// tagging the tokens
			String[] tags = posTagger.tag(tokens);
			// getting the probabilities of the tags assigned to the tokens
			double[] probs = posTagger.probs();

			System.out.println("Token\t:\tTag\t:\tProbability\n---------------------------------------------");
			for (int i = 0; i < tokens.length; i++) {
				System.out.println(tokens[i] + "\t:\t" + tags[i] + "\t:\t" + probs[i]);
			}
		}
		catch (IOException e) {
			// model loading failed (for example, a model file was not found)
			e.printStackTrace();
		}
	}
}

When the above program is run, it prints the following output to the console.

Output

Token	:	Tag	:	Probability
---------------------------------------------
John	:	NNP	:	0.9874932809932121
is	:	VBZ	:	0.9667574183085389
27	:	CD	:	0.9890000667325892
years	:	NNS	:	0.979181322666035
old	:	JJ	:	0.9894752224172251
.	:	.	:	0.9923321017451305

The output contains three useful values for each token: the original token, the assigned POS tag, and the model probability for that tag. A higher probability means the model is more confident about the tag it assigned in that context.
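One practical use of these probabilities is to flag tags the model was less sure about, for example for manual review. The sketch below is plain Java. The belowThreshold() helper and the 0.97 threshold are hypothetical, because OpenNLP does not define a cut-off. The arrays are copied from the output above.

```java
import java.util.ArrayList;
import java.util.List;

public class LowConfidenceTags {

    // Hypothetical helper: returns "token_TAG" for every token whose tag probability
    // is below the threshold. The threshold is an application choice, not an OpenNLP setting.
    public static List<String> belowThreshold(String[] tokens, String[] tags, double[] probs, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(tokens[i] + "_" + tags[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Values copied from the sample output of POSTaggerExample
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        double[] probs = {0.9874932809932121, 0.9667574183085389, 0.9890000667325892,
                0.979181322666035, 0.9894752224172251, 0.9923321017451305};

        // prints [is_VBZ]
        System.out.println(belowThreshold(tokens, tags, probs, 0.97));
    }
}
```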

Apache OpenNLP POS Tagger Project Structure and Model Files

The structure of the project is shown below:

[Figure: Structure of the project - POS Tagger Example in Apache OpenNLP]

In this example, the model files en-pos-maxent.bin and en-token.bin are placed directly under the project folder. The models can be downloaded from http://opennlp.sourceforge.net/models-1.5/.

If the model files are stored in another folder, update the file path in new FileInputStream(...). For example, if you place the files inside a folder named models, use paths such as models/en-token.bin and models/en-pos-maxent.bin.

InputStream tokenModelIn = new FileInputStream("models/en-token.bin");
InputStream posModelIn = new FileInputStream("models/en-pos-maxent.bin");
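Relative paths such as models/en-token.bin are resolved against the working directory of the running Java process. When a FileNotFoundException appears, it can help to print that directory and check which model files are actually present. The sketch below uses only java.nio. The class ModelFileCheck and the missingModels() helper are hypothetical, and the folder name models is the example from the paragraph above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ModelFileCheck {

    // Hypothetical helper: returns the absolute paths of model files that do not exist
    // under the given folder. Relative folders are resolved against user.dir.
    public static List<Path> missingModels(String folder, String... fileNames) {
        List<Path> missing = new ArrayList<>();
        for (String name : fileNames) {
            Path path = Paths.get(folder, name).toAbsolutePath();
            if (!Files.isRegularFile(path)) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        List<Path> missing = missingModels("models", "en-token.bin", "en-pos-maxent.bin");
        if (missing.isEmpty()) {
            System.out.println("All model files found.");
        } else {
            missing.forEach(p -> System.out.println("Missing: " + p));
        }
    }
}
```

Printing the absolute path shows exactly where Java looked, which is usually enough to spot a wrong working directory in an IDE run configuration.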

Apache OpenNLP POS Tagger Classes Used in the Java Program

OpenNLP class or method	Purpose in this POS tagger example
`TokenizerModel`	Loads the tokenizer model from `en-token.bin`.
`TokenizerME`	Splits the input sentence into tokens.
`POSModel`	Loads the POS tagging model from `en-pos-maxent.bin`.
`POSTaggerME`	Assigns part-of-speech tags to the token array.
`tag(tokens)`	Returns the POS tag for each token.
`probs()`	Returns the probabilities for the tags assigned in the most recent tagging operation.

Printing OpenNLP POS Tags as token_tag Pairs

Many POS tagging examples show the result in the format word_TAG. After you get the tokens and tags arrays, you can combine the values using their index positions.

for (int i = 0; i < tokens.length; i++) {
    System.out.print(tokens[i] + "_" + tags[i] + " ");
}

For the same sample sentence, the output is:

John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
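The print loop above leaves a trailing space after the last pair. If the word_TAG string is stored or compared, a StringJoiner avoids that. The same string can also be split back into words and tags. The sketch below is plain Java with hypothetical join() and split() helpers. It splits at the last underscore, assuming that tags never contain an underscore (true for Penn Treebank tags).

```java
import java.util.StringJoiner;

public class TokenTagString {

    // Joins tokens and tags into "word_TAG" pairs separated by single spaces, with no trailing space.
    public static String join(String[] tokens, String[] tags) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            joiner.add(tokens[i] + "_" + tags[i]);
        }
        return joiner.toString();
    }

    // Splits one "word_TAG" pair at the LAST underscore, so a word that itself
    // contains '_' (for example "snake_case_NN") keeps its full text.
    public static String[] split(String pair) {
        int sep = pair.lastIndexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("Not a word_TAG pair: " + pair);
        }
        return new String[] {pair.substring(0, sep), pair.substring(sep + 1)};
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};

        String tagged = join(tokens, tags);
        System.out.println(tagged);

        for (String pair : tagged.split(" ")) {
            String[] parts = split(pair);
            System.out.println(parts[0] + " -> " + parts[1]);
        }
    }
}
```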

Using WhitespaceTokenizer with Apache OpenNLP POS Tagger

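If your input is already clean and separated by spaces, you can use WhitespaceTokenizer for a smaller demonstration. This avoids loading the tokenizer model, but it is less flexible than a trained tokenizer for normal text. WhitespaceTokenizer splits only at whitespace, so punctuation stays attached to the preceding word unless a space separates them. That is why the sentence below is written as "old ." instead of "old.".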

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;

public class SimplePOSTaggerExample {
    public static void main(String[] args) throws Exception {
        String sentence = "John is 27 years old .";
        String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize(sentence);

        try (InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {
            POSModel posModel = new POSModel(posModelIn);
            POSTaggerME posTagger = new POSTaggerME(posModel);

            String[] tags = posTagger.tag(tokens);

            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + " : " + tags[i]);
            }
        }
    }
}

Output

John : NNP
is : VBZ
27 : CD
years : NNS
old : JJ
. : .
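The effect of whitespace-only splitting on punctuation can be seen without OpenNLP. The sketch below uses String.split on runs of whitespace as a stand-in. It is an approximation of WhitespaceTokenizer, not its actual implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {

    // Approximates WhitespaceTokenizer: split on runs of whitespace only,
    // so punctuation is never separated from the word before it.
    public static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [John, is, 27, years, old.]  ("old." stays one token)
        System.out.println(Arrays.toString(splitOnWhitespace("John is 27 years old.")));
        // prints [John, is, 27, years, old, .]
        System.out.println(Arrays.toString(splitOnWhitespace("John is 27 years old .")));
    }
}
```

With the first sentence, the POS tagger would receive "old." as a single token, so it cannot give the period its own "." tag. The trained tokenizer model handles this case for normal text.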

Common Apache OpenNLP POS Tagger Errors and Fixes

Problem	Likely reason	Fix
`FileNotFoundException` for `en-pos-maxent.bin`	The model file is not in the working directory used by the Java program.	Place the model file in the correct folder or pass the correct relative or absolute path.
Different POS tags for a similar sentence	POS tagging depends on the trained model and the surrounding context.	Check the exact input sentence, tokenizer output, and model file used.
Punctuation is not tagged as expected	The sentence may not have been tokenized correctly before POS tagging.	Use the tokenizer model instead of simple string splitting for normal text.
`ClassNotFoundException` for OpenNLP classes	The OpenNLP tools library is not added to the Java project classpath.	Add the OpenNLP tools JAR or Maven/Gradle dependency to the project.
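For the classpath row, a quick runtime check can confirm whether the OpenNLP classes are visible before the models are loaded. The sketch below uses only Class.forName. The class ClasspathCheck and its isOnClasspath() helper are hypothetical, and the class names checked are the ones imported in this tutorial.

```java
public class ClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    // initialize=false avoids running the class's static initializers during the check;
    // LinkageError covers classes whose own dependencies are missing.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "opennlp.tools.postag.POSTaggerME",
            "opennlp.tools.tokenize.TokenizerME"
        };
        for (String name : required) {
            System.out.println(name + " : " + (isOnClasspath(name) ? "found" : "NOT found"));
        }
    }
}
```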

When to Train a Custom POS Tagger in Apache OpenNLP

The example in this tutorial uses a pre-trained English POS model. That is enough for learning the OpenNLP POS tagging API and for many general English examples. A custom POS tagger is useful when your text belongs to a special domain, uses unusual vocabulary, or follows a tagging convention that is different from the model you are using.

For a custom POS tagger, you need correctly tagged training data. The quality and consistency of this training data matter because the model learns tagging patterns from it.
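The Apache OpenNLP manual describes POS training data as one sentence per line, with whitespace-separated tokens each written as word_tag, the same shape as the token_tag output shown earlier. Please verify this against the documentation for your OpenNLP version. Because consistency matters, a simple pre-check can catch entries with a missing word or tag before training. The sketch below is plain Java. The class TrainingLineCheck and the malformedEntries() helper are hypothetical and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingLineCheck {

    // Hypothetical pre-check for one training sentence in the word_tag format
    // (whitespace-separated tokens). Returns the entries that lack a word or a tag.
    public static List<String> malformedEntries(String line) {
        List<String> bad = new ArrayList<>();
        for (String entry : line.trim().split("\\s+")) {
            int sep = entry.lastIndexOf('_');
            // sep <= 0: no underscore or empty word; sep == length - 1: empty tag
            if (sep <= 0 || sep == entry.length() - 1) {
                bad.add(entry);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String good = "John_NNP is_VBZ 27_CD years_NNS old_JJ ._.";
        String broken = "John_NNP is 27_CD years_ old_JJ ._.";
        System.out.println(malformedEntries(good));   // prints []
        System.out.println(malformedEntries(broken)); // prints [is, years_]
    }
}
```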

Apache OpenNLP POS Tagger FAQ

What is POS tagging in Apache OpenNLP?

POS tagging in Apache OpenNLP is the process of assigning a part-of-speech tag to each token in a sentence. For example, a word may be tagged as a noun, verb, adjective, number, or punctuation mark.

Which OpenNLP classes are used for POS tagging in Java?

The main classes used in this example are POSModel and POSTaggerME. The sentence is first tokenized using classes such as TokenizerModel and TokenizerME.

Why should a sentence be tokenized before POS tagging?

The POS tagger works on tokens, not directly on a raw sentence string. Tokenization separates the sentence into words, numbers, and punctuation so that the tagger can assign one tag to each token.

What does POSTaggerME.probs() return?

POSTaggerME.probs() returns the probabilities for the tags assigned in the most recent call to tag(). These values indicate the model’s confidence for the selected tags.

Can OpenNLP POS Tagger be trained with custom data?

Yes. Apache OpenNLP can be used to train a custom POS tagging model if you have properly tagged training data. This is useful for domain-specific text or custom tagging requirements.

Editorial QA Checklist for Apache OpenNLP POS Tagger Example

The Java example loads both en-token.bin and en-pos-maxent.bin before tagging the sentence.
The tutorial explains that tokenization must happen before POS tagging.
The sample output maps every token to exactly one POS tag.
The meaning of common Penn Treebank tags such as NNP, VBZ, CD, NNS, and JJ is shown.
The model file location and common file path errors are covered for Java project setup.

Apache OpenNLP POS Tagger Conclusion

In this Apache OpenNLP tutorial, we have seen how to tag the words in a sentence with their parts of speech using the POSModel and POSTaggerME classes of the Apache OpenNLP POS tagger API.

The important sequence is: tokenize the sentence, load the POS model, create a POSTaggerME object, call tag(tokens), and read probabilities with probs() when needed. For reliable results, use the correct model files and check the tokenization output before interpreting the POS tags.
