Positive? Negative? Neutral? Use the Stanford CoreNLP suite to analyze customer product reviews.
Sentiment analysis is a text classification task focused on identifying whether a piece of text is positive, negative, or neutral. For example, you might be interested in analyzing the sentiment of customer feedback on a certain product or in detecting the sentiment on a certain topic trending in social media.
This article illustrates how such tasks can be implemented in Java using the sentiment tool integrated into Stanford CoreNLP, an open source library for natural language processing.
The Stanford CoreNLP sentiment classifier
To perform sentiment analysis, you need a sentiment classifier, which is a tool that predicts the sentiment of text based on patterns learned from a training data set.
In Stanford CoreNLP, the sentiment classifier is built on top of a recursive neural network (RNN) deep learning model that is trained on the Stanford Sentiment Treebank (SST), a well-known data set for sentiment analysis.
In a nutshell, the SST data set is a corpus with sentiment labels for every syntactically plausible phrase in thousands of sentences, which makes it possible to capture the compositional effects of sentiment in text. In simple terms, this allows the model to identify sentiment based on how words compose the meaning of phrases rather than just by evaluating words in isolation.
To better understand the structure of the SST data set, you can examine the data set files downloadable from the Stanford CoreNLP sentiment analysis page.
In Java code, the Stanford CoreNLP sentiment classifier is used as follows.
To start, you build up a text processing pipeline by adding the annotators required to perform sentiment analysis, such as tokenize, ssplit, parse, and sentiment. In terms of Stanford CoreNLP, an annotator is an interface that operates on annotation objects, where the latter represent a span of text in a document. For example, the ssplit annotator is required to split a sequence of tokens into sentences.
Stanford CoreNLP calculates sentiment on a per-sentence basis, so the text must always be split into sentences before the sentiment annotator is applied.
Once the text has been broken into sentences, the parse annotator performs syntactic parsing, generating a parse tree for each sentence. Then, the sentiment annotator processes these trees, checking them against the underlying model to build a binarized tree with sentiment labels (annotations) for each sentence.
In simple terms, the nodes of the tree are determined by the tokens of the input sentence. Each node carries an annotation indicating the predicted class, out of five sentiment classes ranging from very negative to very positive, for the phrase it spans. Based on these predictions, the sentiment annotator calculates the sentiment of the entire sentence.
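The five classes are reported as number codes from 0 (very negative) to 4 (very positive). As a minimal sketch of that correspondence, the helper below maps a code to its label using only the standard library. The class name SentimentClasses and the method labelFor() are hypothetical names chosen for illustration; they are not part of Stanford CoreNLP.

```java
// Illustrative helper only; SentimentClasses and labelFor() are hypothetical
// names, not part of the Stanford CoreNLP API.
final class SentimentClasses {
    // Labels indexed by class code: 0 = very negative ... 4 = very positive.
    private static final String[] LABELS = {
        "Very negative", "Negative", "Neutral", "Positive", "Very positive"
    };

    /** Returns the label for a class code, or throws if the code is out of range. */
    public static String labelFor(int code) {
        if (code < 0 || code >= LABELS.length) {
            throw new IllegalArgumentException("Unknown sentiment class: " + code);
        }
        return LABELS[code];
    }

    public static void main(String[] args) {
        // Print every code next to its label.
        for (int code = 0; code < LABELS.length; code++) {
            System.out.println(code + "\t" + labelFor(code));
        }
    }
}
```

In practice, the pipeline supplies both the code and the label for you, so a lookup like this is useful mainly when you store only the number codes and need to display them later.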
Setting up Stanford CoreNLP
Before you can start using Stanford CoreNLP, you need to do the following setup:
◉ To run Stanford CoreNLP, you need Java 1.8 or later.
◉ Download the Stanford CoreNLP package and unzip the package in a local folder on your machine.
◉ Add the distribution directory to your CLASSPATH as follows:
export CLASSPATH=$CLASSPATH:/path/to/stanford-corenlp-4.3.0/*:
After completing the steps above, you are ready to create a Java program that runs a Stanford CoreNLP pipeline to process text.
In the following example, you implement a simple Java program that runs a Stanford CoreNLP pipeline for sentiment analysis on text containing several sentences.
To start, implement a class that provides a method to initialize the pipeline and a method that will use this pipeline to split a submitted text into sentences and then to classify the sentiment of each sentence. Here is what the implementation of this class might look like.
//nlpPipeline.java
import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;
public class nlpPipeline {
    static StanfordCoreNLP pipeline;

    // Creates the pipeline with the annotators that sentiment analysis requires.
    public static void init()
    {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    // Prints the sentiment name, its number code, and the text of each sentence.
    public static void estimatingSentiment(String text)
    {
        int sentimentInt;
        String sentimentName;
        Annotation annotation = pipeline.process(text);
        for(CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class))
        {
            // The tree holds the sentiment annotations for this sentence.
            Tree tree = sentence.get(SentimentAnnotatedTree.class);
            sentimentInt = RNNCoreAnnotations.getPredictedClass(tree);
            sentimentName = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            System.out.println(sentimentName + "\t" + sentimentInt + "\t" + sentence);
        }
    }
}
The init() method initializes the sentiment tool in the Stanford CoreNLP pipeline being created, and it also initializes the tokenizer, sentence splitter, and parser needed to use this sentiment tool. To initialize the pipeline, pass a Properties object with the corresponding list of annotators to the StanfordCoreNLP() constructor. This creates a customized pipeline that is ready to perform sentiment analysis on text.
In the estimatingSentiment() method of the nlpPipeline class, invoke the process() method of the pipeline object created previously, passing in text for processing. The process() method returns an annotation object that stores the analyses of the submitted text.
Next, iterate over the sentences stored in the annotation object, getting a sentence-level CoreMap object on each iteration. For each of these objects, obtain a Tree object containing the sentiment annotations used to determine the sentiment of the underlying sentence.
Pass the Tree object to the getPredictedClass() method of the RNNCoreAnnotations class to extract the number code of the predicted sentiment for the corresponding sentence. Then, obtain the name of the predicted sentiment and print the results.
To test the functionality above, implement a class with the main() method that invokes the init() method and then invokes the estimatingSentiment() method of the nlpPipeline class, passing sample text to the latter.
In the following implementation, the sample text is hardcoded in the program for simplicity. The sample sentences were designed to cover the entire spectrum of sentiment scores available with Stanford CoreNLP: very positive, positive, neutral, negative, and very negative.
//SentenceSentiment.java
public class SentenceSentiment
{
    public static void main(String[] args)
    {
        String text = "This is an excellent book. I enjoy reading it. I can read on Sundays. Today is only Tuesday. Can't wait for next Sunday. The working week is unbearably long. It's awful.";
        nlpPipeline.init();
        nlpPipeline.estimatingSentiment(text);
    }
}
Now, compile the nlpPipeline and SentenceSentiment classes and then run SentenceSentiment:
$ javac nlpPipeline.java
$ javac SentenceSentiment.java
$ java SentenceSentiment
Here is what the output should look like.
Very positive 4 This is an excellent book.
Positive 3 I enjoy reading it.
Neutral 2 I can read on Sundays.
Neutral 2 Today is only Tuesday.
Neutral 2 Can't wait for next Sunday.
Negative 1 The working week is unbearably long.
Very negative 0 It's awful.
The first column in the output above contains the name of the sentiment class predicted for a sentence. The second column contains the corresponding number code of the predicted class. The third column contains the sentence.
Analyzing online customer reviews
As you learned from the previous example, Stanford CoreNLP can return a sentiment for a sentence. There are many use cases, however, where there is a need to analyze the sentiment of many pieces of text, each of which may contain more than a single sentence. For example, you might want to analyze the sentiment of tweets or customer reviews from an ecommerce website.
To calculate the sentiment of a multisentence text sample with Stanford CoreNLP, you might use several different techniques.
When dealing with a tweet, you might analyze the sentiment of each sentence in the tweet. If some sentences are positive or negative, you could rank the entire tweet accordingly, ignoring the neutral sentences. If all (or almost all) of the sentences in a tweet are neutral, the tweet could be ranked neutral.
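One way to sketch this rule is shown below, assuming you have already collected the number code (0 to 4) of each sentence, for example from getPredictedClass(). The class TweetSentiment and the method overall() are hypothetical names, and the majority vote with ties resolved as neutral is only one possible reading of the rule, not something prescribed by Stanford CoreNLP.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; TweetSentiment and overall() are hypothetical names.
final class TweetSentiment {
    static final int NEUTRAL = 2;

    /**
     * Combines per-sentence class codes (0..4) into one label for the text.
     * Neutral sentences are ignored; the remaining sentences vote.
     */
    public static String overall(List<Integer> sentenceClasses) {
        int positive = 0;
        int negative = 0;
        for (int code : sentenceClasses) {
            if (code > NEUTRAL) {
                positive++;
            } else if (code < NEUTRAL) {
                negative++;
            }
        }
        if (positive > negative) {
            return "Positive";
        }
        if (negative > positive) {
            return "Negative";
        }
        // All sentences neutral, or an even split (an assumption of this sketch).
        return "Neutral";
    }

    public static void main(String[] args) {
        // Two neutral sentences and one positive sentence.
        System.out.println(overall(Arrays.asList(2, 3, 2)));
    }
}
```

A simple vote treats "very negative" and "negative" alike. If the strength of the sentiment matters for your use case, you might weight the codes instead of counting them.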
Sometimes, however, you don’t even have to analyze each sentence to estimate the sentiment of an entire text. For example, when analyzing customer reviews, you can rely on their titles, which often consist of a single sentence.
To work through the following example, you’ll need a set of customer reviews. You can use the reviews found in the NlpBookReviews.csv file accompanying this article. The file contains a set of actual reviews downloaded from an Amazon web page with the help of Amazon Review Export, a Google Chrome browser extension that allows you to download a product’s reviews with their titles and ratings to a comma-separated values (CSV) file. (You can use that tool to explore a different set of reviews for analysis.)
There is another requirement. Because the Java language lacks any native support for the efficient handling of CSV files, you’ll need a third-party library such as Opencsv, an easy-to-use CSV parser library for Java. You can download the Opencsv JAR and its dependencies. In particular, you will need to download the Apache Commons Lang library. Include them in the CLASSPATH as follows:
export CLASSPATH=$CLASSPATH:/path/to/opencsv/*:
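To see why a dedicated parser is worth the extra dependency, consider a review title that itself contains a comma. The sketch below is a simplified illustration, not a replacement for Opencsv: the class CsvLineSketch, the method parseLine(), and the sample line are all hypothetical, and the parser handles only single-line records with double-quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; CsvLineSketch and parseLine() are hypothetical names.
final class CsvLineSketch {
    /** Splits one CSV line, honoring double-quoted fields and "" as an escaped quote. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Good, but dated\",4"; // made-up sample row
        // A naive split breaks the quoted title into two pieces.
        System.out.println(line.split(",").length);
        // The quote-aware parser keeps the title as a single field.
        System.out.println(parseLine(line).size());
    }
}
```

Real CSV files can also contain quoted fields that span several lines, which this sketch does not handle. That is one more reason to rely on a tested library.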
Then, add the following method to the nlpPipeline class created in the previous section:
//nlpPipeline.java
...
    public static String findSentiment(String text) {
        String sentimentName = "NULL";
        // Skip null, empty, and whitespace-only input, which contains no sentence.
        if (text != null && !text.trim().isEmpty()) {
            Annotation annotation = pipeline.process(text);
            // Only the first sentence is classified.
            CoreMap sentence = annotation
                .get(CoreAnnotations.SentencesAnnotation.class).get(0);
            sentimentName = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
        }
        return sentimentName;
    }
As you might notice, the code above is similar to the code in the estimatingSentiment() method defined in the previous section. The only significant difference is that this time you don’t iterate over the sentences in the input text. Instead, you get only the first sentence, since in most cases a review’s title consists of a single sentence.
Create a ReviewsSentiment.java file with a main() method that reads the reviews from a CSV file and passes them to the newly created findSentiment() method for processing, as follows:
//ReviewsSentiment.java
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvException;
import java.io.FileReader;
import java.io.IOException;
public class ReviewsSentiment {
    public static void main(String[] args) throws IOException, CsvException {
        nlpPipeline.init();
        String fileName = "NlpBookReviews.csv";
        // Skip the first line of the file, then read one review per row.
        try (CSVReader reader = new CSVReaderBuilder(new FileReader(fileName)).withSkipLines(1).build())
        {
            String[] row;
            while ((row = reader.readNext()) != null) {
                // row[1] is the text passed to the classifier; row[4] is the rating.
                System.out.println("Review: " + row[1] + "\t" + " Amazon rating: " + row[4] + "\t" + " Sentiment: " + nlpPipeline.findSentiment(row[1]));
            }
        }
    }
}
You’ll need to recompile the nlpPipeline class and compile the newly created ReviewsSentiment class. After that, you can run ReviewsSentiment as follows:
$ javac nlpPipeline.java
$ javac ReviewsSentiment.java
$ java ReviewsSentiment
The output should look as follows:
Review: Old version of python useless Amazon rating: 1 Sentiment: Negative
Review: Excellent introduction to NLP and spaCy Amazon rating: 5 Sentiment: Positive
Review: Could not get spaCy on MacBook Amazon rating: 1 Sentiment: Negative
Review: Good introduction to SPACY for beginner. Amazon rating: 4 Sentiment: Positive
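With ratings and predicted sentiments side by side, a natural next step is to check how often they agree. The following is a minimal sketch under stated assumptions: ratings of 4 or 5 stars are treated as positive, 1 or 2 stars as negative, and 3 stars as neutral, which is a convention chosen for this example rather than anything defined by Amazon or Stanford CoreNLP. The class RatingAgreement and its methods are hypothetical names.

```java
import java.util.Locale;

// Illustrative sketch; RatingAgreement and its methods are hypothetical names.
final class RatingAgreement {
    /** Maps a 1-5 star rating to a coarse polarity (an assumption of this sketch). */
    public static String expectedPolarity(int stars) {
        if (stars >= 4) {
            return "Positive";
        }
        if (stars <= 2) {
            return "Negative";
        }
        return "Neutral";
    }

    /** Reduces a five-class label such as "Very positive" to its polarity. */
    public static String polarityOf(String predictedLabel) {
        String label = predictedLabel.toLowerCase(Locale.ROOT);
        if (label.endsWith("positive")) {
            return "Positive";
        }
        if (label.endsWith("negative")) {
            return "Negative";
        }
        return "Neutral";
    }

    /** True when the predicted label has the polarity the star rating suggests. */
    public static boolean agrees(int stars, String predictedLabel) {
        return expectedPolarity(stars).equals(polarityOf(predictedLabel));
    }

    public static void main(String[] args) {
        // Ratings and sentiments taken from the four rows of output shown above.
        int[] stars = {1, 5, 1, 4};
        String[] labels = {"Negative", "Positive", "Negative", "Positive"};
        int matches = 0;
        for (int i = 0; i < stars.length; i++) {
            if (agrees(stars[i], labels[i])) {
                matches++;
            }
        }
        System.out.println(matches + " of " + stars.length + " predictions agree with the rating");
    }
}
```

Four rows are far too few to draw conclusions from. Run over the full file, however, a count like this gives a rough sense of how well title sentiment tracks the star rating.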
Source: oracle.com