Vector semantics to capture information from texts

We are all familiar with Internet searches. After we type a few well-chosen words, a search engine returns a number of documents relevant to what we are looking for. At least we hope so, and often it all works well.

Here we are concerned with a subtly different problem. Instead of finding relevant documents, assume we already have the documents and wish to score them with regard to a theme or concept. For example, take the use of “we words”, which in standard English (with allowance for typos) are:

lets, let’s, our, ours, ourselves, us, we, we’d, we’ll, we’re, weve, we’ve

Together this vocabulary constitutes a representation of a concept, “we words”. In accepting this, we adopt the stance of vector semantics (the representation of concepts as collections of terms), and we call the vocabulary a concept vector for “we words”.
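To make the idea concrete, here is a minimal Python sketch of a concept vector as a set of terms, with a function that counts how many words in a text belong to it. The names `WE_WORDS`, `tokenize` and `count_concept_hits` are our own illustrative choices, and the simple regular-expression tokenizer is an assumption; any reasonable tokenizer would do.

```python
import re

# The we-words vocabulary listed above, with apostrophes normalized to ASCII.
WE_WORDS = frozenset({
    "lets", "let's", "our", "ours", "ourselves", "us", "we",
    "we'd", "we'll", "we're", "weve", "we've",
})

# Assumed tokenizer: runs of letters, optionally followed by 'letters.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and split into words."""
    return TOKEN_RE.findall(text.lower().replace("\u2019", "'"))


def count_concept_hits(text, concept=WE_WORDS):
    """Count the tokens in `text` that belong to the concept vector."""
    return sum(1 for token in tokenize(text) if token in concept)
```

Because the vocabulary is a set, testing whether a word belongs to the concept takes constant time, so scoring even a long document is a single pass over its words.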

This stance has proved productive and useful in extracting information from bodies of text (corpora). For example, Pennebaker in his book The Secret Life of Pronouns (2011, page 111) writes that

[W]e-words are used frequently when people are arrogant, emotionally distant and high in status. Males especially use we in a distancing or royal form: “We need to analyze that data” or “We aren’t going to put up with higher taxes.”

OK, what about, say, corporations? Well, we collected 23 years of 10-K (annual report) filings by IBM with the Securities and Exchange Commission (1994-2016) and applied the we-words concept vector to the documents.

Here’s what we got.

[Figure IBMwewords: we-words in IBM 10-K filings, 1994-2016]

Normalizing by the length of the documents, we see explosive growth in we-words starting about 2010. Why? One can conjecture. Simple text mining with concept vectors can hardly settle the matter, but it can clearly serve to find patterns of interest and to raise issues well worth following up.
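For readers who want to try this kind of length-normalized scoring, a rough sketch in Python follows. It is not the code we ran on the filings. The function name `we_word_rate`, the choice of hits per 1,000 words as the unit, the simple tokenizer, and the two toy sentences standing in for documents are all assumptions made for illustration.

```python
import re

# We-words concept vector, with apostrophes normalized to ASCII.
WE_WORDS = frozenset({
    "lets", "let's", "our", "ours", "ourselves", "us", "we",
    "we'd", "we'll", "we're", "weve", "we've",
})


def we_word_rate(text, per=1000):
    """Return we-word hits per `per` words, so long and short documents compare fairly."""
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty document
    hits = sum(1 for token in tokens if token in WE_WORDS)
    return per * hits / len(tokens)


# Hypothetical toy documents keyed by year; these are NOT the IBM filings.
toy_docs = {
    2009: "The company reported revenue growth in all segments.",
    2010: "We believe our strategy positions us well for growth.",
}

# One normalized score per year, ready to plot as a time series.
rates = {year: we_word_rate(text) for year, text in toy_docs.items()}
```

Dividing by document length matters here because annual reports vary in length from year to year, and a raw count would rise with length alone.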

— Christine Chou and Steven Kimbrough

 
