I want to write something that takes a sentence, identifies each word it contains, and determines what part of speech each word is.

For example,

Hello World, I am a sentence

would return this:

verb noun, pronoun verb adjective noun

Ideally, I'd eventually like to take it one step further: take a sentence, have the program understand what the sentence is trying to say, and maybe act on it.

So my question is: has anyone heard of something like this?

Vinny

This is called Natural Language Processing (NLP), and it's a huge, complex field. Something like what you describe would be a monumental achievement, and even the best systems, like IBM's Watson, are nowhere near perfect.

Things like this make it challenging: "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo"

a grammatically correct sentence in American English, used as an example of how homonyms and homophones can be used to create complicated linguistic constructs. It has been discussed in literature since 1972... It was also featured in Steven Pinker's 1994 book The Language Instinct as an example of a sentence that is "seemingly nonsensical" but grammatical...

The sentence's meaning becomes clearer when it's understood that it uses the city of Buffalo, New York and the somewhat-uncommon verb "to buffalo" (meaning "to bully or intimidate"), and when the punctuation and grammar is expanded so that the sentence reads as follows: "Buffalo buffalo that Buffalo buffalo buffalo, buffalo Buffalo buffalo." The meaning becomes even clearer when synonyms are used: "Buffalo bison that other Buffalo bison bully, themselves bully Buffalo bison."
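To make the ambiguity concrete, here is a small Python sketch. It assumes a hypothetical lexicon in which "buffalo" can take three tags (PROPN for the city, NOUN for the animal, VERB for "to bully"). A naive per-word dictionary lookup then offers 3 candidate tags for each of the 8 words, so 3^8 = 6561 possible tag sequences, and the intended reading explained above is just one of them.

```python
from itertools import product
from math import prod

# Hypothetical lexicon: every tag each (lowercased) word could take.
LEXICON = {
    "buffalo": ("PROPN", "NOUN", "VERB"),  # the city, the animal, "to bully"
}

def candidate_taggings(sentence):
    """Yield every tag sequence a word-by-word lookup would allow."""
    words = sentence.lower().split()
    return product(*(LEXICON[w] for w in words))

def count_taggings(sentence):
    """Number of candidate tag sequences (product of per-word options)."""
    return prod(len(LEXICON[w]) for w in sentence.lower().split())

sentence = "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo"
print(count_taggings(sentence))  # 6561

# The reading "Buffalo bison that other Buffalo bison bully, themselves bully Buffalo bison":
intended = ("PROPN", "NOUN", "PROPN", "NOUN", "VERB", "VERB", "PROPN", "NOUN")
print(intended in set(candidate_taggings(sentence)))  # True
```

Picking the right sequence out of thousands requires using context, which is exactly what plain dictionary lookup lacks.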

gnat
Ryathal

Splitting a sentence into words and determining each word's grammatical role (your first problem) is easier than your second problem, but it is still a challenge because of complexities like words that can be either verbs or nouns, gerunds such as "swimming" and "programming", and many other intricacies. See Morons' answer.
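Even the "splitting" step needs some care, because punctuation is attached to words ("World,"). A minimal Python sketch, assuming a simple regular expression is good enough; the `tokenize` name is hypothetical, and real tokenizers handle many more cases (abbreviations, hyphenation, numbers with separators):

```python
import re

# A word (optionally with one internal apostrophe, e.g. "don't")
# or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)

print(tokenize("Hello World, I am a sentence"))
# ['Hello', 'World', ',', 'I', 'am', 'a', 'sentence']
```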

As for your second problem: people have put huge effort into it, but a truly perfect "interpretation" algorithm is not practically achievable for any natural language like English, because there will always be variations that break your algorithm. This field, a hybrid of AI, computer science and linguistics, is known as NLP. Consider this: even Google Translate is not perfect when "interpreting" sentences.

Nevertheless, this is a very interesting field to dabble in.

yati sagade

I think you should start by reading this Wikipedia article:

http://en.wikipedia.org/wiki/Part-of-speech_tagging

(It is a research field, so don't expect an easy solution.)
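One classic approach covered in that literature is a hidden Markov model (HMM) tagger decoded with the Viterbi algorithm: it picks the tag sequence that maximizes the product of tag-to-tag transition probabilities and word-given-tag emission probabilities. The sketch below is a toy: the four tags and every probability are hand-picked assumptions for illustration, whereas a real tagger estimates them from a large annotated corpus. It shows how context can resolve a gerund like "swimming":

```python
TAGS = ("PRON", "AUX", "VERB", "NOUN")

# Hand-picked toy probabilities (assumptions, not estimated from data).
START = {"PRON": 0.6, "AUX": 0.05, "VERB": 0.05, "NOUN": 0.3}
TRANS = {  # TRANS[previous_tag][next_tag]
    "PRON": {"PRON": 0.05, "AUX": 0.4, "VERB": 0.5, "NOUN": 0.05},
    "AUX":  {"PRON": 0.1, "AUX": 0.1, "VERB": 0.7, "NOUN": 0.1},
    "VERB": {"PRON": 0.2, "AUX": 0.1, "VERB": 0.1, "NOUN": 0.6},
    "NOUN": {"PRON": 0.1, "AUX": 0.3, "VERB": 0.5, "NOUN": 0.1},
}
EMIT = {  # EMIT[tag][word]; unseen words get probability 0
    "PRON": {"i": 1.0},
    "AUX":  {"am": 1.0},
    "VERB": {"like": 0.5, "swimming": 0.5},
    "NOUN": {"swimming": 0.8, "like": 0.2},
}

def viterbi(words):
    """Return the most probable tag sequence for the given words."""
    if not words:
        return []
    words = [w.lower() for w in words]
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: START[t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] * TRANS[p][t])
            best[i][t] = best[i - 1][prev] * TRANS[prev][t] * EMIT[t].get(words[i], 0.0)
            back[i][t] = prev
    # Follow the back-pointers from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

print(viterbi("I am swimming".split()))    # ['PRON', 'AUX', 'VERB']
print(viterbi("I like swimming".split()))  # ['PRON', 'VERB', 'NOUN']
```

The same word gets a different tag because the preceding tag changes which transition is most likely; a bigram model like this only sees one tag of context, which is one reason real taggers use richer models.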

Morons
Doc Brown

A cheap way of doing this would be to set up a database of the dictionary (I'm almost positive that someone has done this).

You need two columns in the table: word and usage.

Split the phrase into an array of strings (one string per word) and, for each word independently, run:

SELECT usage FROM Dictionary WHERE word = ?;  -- bind the current word to the ? placeholder

It's a heavy solution, but one that I've used in the past.
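A runnable sketch of this lookup using Python's built-in sqlite3 module and an in-memory database. The table layout follows the answer (word, usage); the sample entries and the `build_dictionary`/`lookup_usages` helpers are hypothetical stand-ins for a real dictionary import:

```python
import sqlite3

def build_dictionary(entries):
    """Create an in-memory Dictionary table from (word, usage) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Dictionary (word TEXT NOT NULL, usage TEXT NOT NULL)")
    conn.executemany("INSERT INTO Dictionary (word, usage) VALUES (?, ?)", entries)
    return conn

def lookup_usages(conn, word):
    """Return every stored usage for a word (case-insensitive), sorted."""
    rows = conn.execute(
        "SELECT usage FROM Dictionary WHERE word = ? ORDER BY usage",
        (word.lower(),),  # parameter binding avoids SQL injection
    )
    return [usage for (usage,) in rows]

conn = build_dictionary([
    ("hello", "interjection"),
    ("world", "noun"),
    ("i", "pronoun"),
    ("am", "verb"),
    ("a", "article"),
    ("sentence", "noun"),
    ("sentence", "verb"),
])

for word in "Hello World I am a sentence".split():
    print(word, lookup_usages(conn, word))
```

Note that "sentence" comes back with two usages: the lookup lists candidates, but choosing between them still needs context.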

gnat