
This is sort of a follow-up to this question about NLG research directions in the linguistics field.

How do personal assistant tools such as Siri, Google Now, or Cortana perform Natural Language Generation (NLG)? Specifically, the sentence text generation part. I am not interested in the text-to-speech part, just the text generation part.

I'm not looking for exactly how each one does it, as that information is probably not available.

I am wondering what setup is required to implement sentence generation of that quality:

  • What kind of data would you need in a database (at a high level)?
    • Does it require having a dictionary of every possible word and its meaning, along with many annotated and statistically analyzed books/corpora added on top of that?
    • Does it require actually recording people talking naturally (for example on TV shows or podcasts), transcribing the recordings to text, and then somehow adding that to their "system" (to get really "human"-like sentences)?
    • Or are they just using simple syntax-based sentence patterns, with no gigantic semantic "meaning" database, where someone essentially wrote a bunch of regular-expression-style rules?
  • What are the algorithms that are used for such naturally written human-like sentences?

One reason for asking is that the NLG field seems very far from being able to do what Siri, Google Now, and the others are accomplishing. So what kind of thing are they doing (just for the sentence text generation part)?

2 Answers


Siri typically doesn't "generate" sentences. She parses what you say and 'recognizes' certain keywords, sure, and for common responses, she will use a template, such as I found [N] restaurants fairly close to you or I couldn't find [X] in your music, [Username].
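As a rough illustration, that kind of template response can be as simple as a lookup table of strings with slots filled in at runtime. A minimal sketch in Python follows; the template texts, intent names, and function are hypothetical, not Siri's actual internals:

```python
# Hypothetical response templates keyed by the intent the parser recognized.
# Slots in braces are filled from whatever the search/lookup step returned.
TEMPLATES = {
    "find_restaurants": "I found {count} restaurants fairly close to you.",
    "music_not_found": "I couldn't find {query} in your music, {username}.",
}

def respond(intent, **slots):
    """Pick the template for the recognized intent and fill in its slots."""
    return TEMPLATES[intent].format(**slots)

print(respond("find_restaurants", count=7))
print(respond("music_not_found", query="Free Bird", username="Dave"))
```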

But most of her responses are canned, depending on her interpretation of your speech, plus a random number generator to choose a creative answer to a flippant question. Simply asking Siri "How much wood can a woodchuck chuck?" or "What is the meaning of life?" will produce any of a variety of answers. There are numerous cultural references and jokes built in (and repeated verbatim) that show with relative certainty that Siri is not spontaneously generating most of her text, but pulling it from a database of some sort. It's likely that incoming questions are saved to a central server, where new responses to those questions can be created by Apple employees, allowing Siri to "learn".
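The canned-answer behaviour could be as simple as a table of stored responses per recognized question, with one picked at random, which would explain why repeating the same question cycles through a fixed set of verbatim replies. Again a hypothetical sketch, with made-up answer texts:

```python
import random

# Hypothetical pool of canned answers for one recognized "flippant" question.
CANNED = {
    "woodchuck": [
        "It depends on whether you mean African or European wood.",
        "42 cords. Everybody knows that.",
        "I suppose it could, if a woodchuck could chuck wood.",
    ],
}

def flippant_answer(question_key):
    # A random number generator chooses among the stored responses.
    return random.choice(CANNED[question_key])

print(flippant_answer("woodchuck"))
```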

Her text-to-speech part is good enough, however, that it sometimes makes it seem as though the answers are being generated...

Ayelis

If you have a so-called deep syntactic representation of what you want to generate, such as read(he,book), then it's relatively easy to generate its linear representation. One needs a formal grammar describing the syntax of the language and a morphological lexicon for inflected forms. Generation is an order of magnitude easier than analysis (since one is "creating ambiguity", not resolving it).
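As a toy illustration of that claim (a sketch only, not how a production realizer such as SimpleNLG works): given a predicate-argument structure like read(he, book), a tiny morphological lexicon and a hard-coded subject-verb-object order are already enough to produce a linear sentence.

```python
# Toy surface realizer: deep syntactic representation -> linear sentence.
# A tiny morphological "lexicon" of inflected forms (past tense only).
LEXICON = {
    "read": {"past": "read"},
    "buy":  {"past": "bought"},
}
PRONOUNS = {"he": "He", "she": "She"}

def realize(predicate, subject, obj, tense="past"):
    """Linearize predicate(subject, obj) as a simple S-V-O English sentence."""
    subj = PRONOUNS.get(subject, subject.capitalize())
    verb = LEXICON[predicate][tense]
    return f"{subj} {verb} a {obj}."

print(realize("read", "he", "book"))   # He read a book.
```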

If you have only a logical representation (say, in first-order logic), things get more complicated. Say you have buy(John, book) ∧ read(John, book). One could generate the two sentences John bought a book. John read a book, but that feels unnatural. A better output would be John bought a book. He read it. Better still would be a single compound sentence joined with and. The logical representation may look similar to the deep syntactic representation above, but it contains no pronouns, no clause boundaries, etc. The phase of translating a purely logical representation of what one wants to convey into something more "human-like" is called "sentence planning" (or, more broadly, "text planning"), and it is the harder task in the process.
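As a rough sketch of what such sentence planning might involve, the following toy planner aggregates propositions that share a subject and pronominalizes a repeated object. All names here are made up for illustration; a real planner would handle far more phenomena (tense, referring expressions, clause ordering, and so on).

```python
# Toy sentence planner: aggregate propositions sharing a subject and
# pronominalize repeated mentions of the same object.
PAST = {"buy": "bought", "read": "read"}

def plan_and_realize(propositions):
    """propositions: list of (predicate, subject, object) sharing one subject."""
    subject = propositions[0][1]
    clauses, seen_objects = [], set()
    for pred, _subj, obj in propositions:
        # Refer to an already-mentioned object with a pronoun.
        np = "it" if obj in seen_objects else f"a {obj}"
        seen_objects.add(obj)
        clauses.append(f"{PAST[pred]} {np}")
    return f"{subject} " + " and ".join(clauses) + "."

# buy(John, book) ∧ read(John, book)
print(plan_and_realize([("buy", "John", "book"), ("read", "John", "book")]))
# -> John bought a book and read it.
```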

Atamiri