The Architecture of a Natural Language Processor
Submitted on: 3/25/2002 5:56:39 AM
By: Patrick Ingle
Level: Intermediate
User Rating: Unrated
Compatibility: C++ (general)
Users have accessed this article 1583 times.
Describes the architecture of a natural language processor that will be written in C++. A natural language processor is used to translate grammatical English queries into machine queries.
Introduction
Natural Language Processing (NLP), also known as Natural Language Understanding (NLU), is the ability of a computer system to take grammatically correct sentences and reduce them to a machine-level representation for processing.
There are many NLP/NLU products on the market, most visibly the grammar checkers found in popular word processing software. There is little open source code for grammar checkers, or even a published explanation of how they function. Their operation is as proprietary as the design of golf balls. A major reason this information is scarce in the open source community is that it requires extensive study and understanding of grammar and grammatical structure, followed by a correct implementation. Note that a correct implementation of NLU is the basis for artificial intelligence (AI), yet the implementation presented here may not be truly AI-oriented.
There have been attempts to develop NLU processors, such as the ‘Alicebots’, but they are limited and rely on a set of predefined responses to known inputs.
By contrast, an NLU implementation needs to start from an understanding of how we, as humans, first learn our grammar and language. This is covered in the Background section that follows.
English grammar will be analyzed first, since English is among the most widely used languages and one of the more difficult to formalize.
Background
When we first began to learn our language, our parents, teachers and peers taught us words, and we relied on rote to remember and use our new ability to communicate. As we progressed we still relied on rote, not only to quickly recall known responses to an input or stimulus, but also to remember the grammatical rules from which we derive and create new responses.
Please note that our communication came out of the necessity of responding to stimuli within our environment. Without stimuli we would have no need for communication. This can make NLU either harder or simpler for a computer system to acquire, because the computer will only respond to stimulus until improvements in AI allow it to initiate intelligent stimulus of its own. The computer can prompt for a response, but this is not AI.
English grammar is based on rules and patterns. This is why your mother always corrected you when you used improper grammar at the dinner table. This is the easy part, so to speak, of implementing NLU in software. The communication input is checked against the rules for validity and against the known statements. If a match is found, the appropriate action is taken. Otherwise, the input is checked against known patterns to gain an understanding of the context of the statement. To do this, the computer breaks the input down further into manageable units. This part is more complex, but it needs to be generic enough to handle almost any communication input.
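The step of breaking the input into manageable units can be illustrated with a small tokenizer. This is only a sketch under assumptions that are not in the article: the function name tokenize is hypothetical, a unit is taken to be a single lower-cased word, and punctuation other than the apostrophe is treated as a separator.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split raw input into lower-case word tokens. Letters, digits and
// apostrophes are kept; any other character ends the current word.
// (Assumption: a "manageable unit" is a single word.)
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : input) {
        if (std::isalnum(ch) || ch == '\'') {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    // Flush the last word if the input did not end with a separator.
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

Lower-casing up front means later table lookups do not have to care whether a word started a sentence.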
When we rely on rote, we commit a statement and its associated action to memory. The same can be true for the computer, which uses rote in the form of database tables, where the rules and known inputs are stored.
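A rote table of this kind might look like the following sketch. The class name RoteMemory, its learn and recall methods, and the use of an in-memory std::map in place of a real database table are all assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a database table that pairs a
// known statement with the action to take when it is seen again.
class RoteMemory {
public:
    // Commit a statement and its associated action to memory.
    void learn(const std::string& statement, const std::string& action) {
        table_[statement] = action;
    }

    // Look the statement up. Returns true and fills `action` on an exact
    // match; returns false (leaving `action` untouched) so the caller can
    // fall back to pattern analysis.
    bool recall(const std::string& statement, std::string& action) const {
        std::map<std::string, std::string>::const_iterator it = table_.find(statement);
        if (it == table_.end()) {
            return false;
        }
        action = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> table_;
};
```

For example, after learn("what time is it", "REPORT_TIME"), a later recall of the same statement yields REPORT_TIME, while any statement that was never learned returns false.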
Now that we have a brief understanding of how we decipher communication input and form our response, we can look at some of the basic components of communication. Most of our communication is made in the form of sentences. We know sentences have rules and patterns. Every sentence has a verb, and a sentence contains a subject and a predicate. The basic unit of a sentence is a word, and a word can be classified as a noun, verb, adjective, adverb, preposition, interjection, conjunction or participle. Now do you see the importance of your English 1010 class in grammar school and college? Depending on the classification of the word, that word will have additional common grammatical rules. In addition, the sentence also has patterns, which identify a complete thought. This is important: a correct sentence will always identify a complete thought. Anything less is slang, is deemed a grammatical error, and will be responded to as such. The placement of the words, together with their classifications, forms a pattern. The known patterns hold regardless of which words are used.
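The idea that word classifications, taken in order, form a pattern can be sketched as below. The names WordClass, classCode, Lexicon, sentencePattern and hasVerb are hypothetical, as are the one-letter codes. The sketch also assumes each word has exactly one classification, which real English does not guarantee.

```cpp
#include <map>
#include <sstream>
#include <string>

// Word classes named in the text, plus Unknown for words not in the lexicon.
enum class WordClass {
    Noun, Verb, Adjective, Adverb, Preposition,
    Interjection, Conjunction, Participle, Unknown
};

// One-letter code per class so a sentence pattern can be written as a string.
char classCode(WordClass c) {
    switch (c) {
        case WordClass::Noun:         return 'N';
        case WordClass::Verb:         return 'V';
        case WordClass::Adjective:    return 'J';
        case WordClass::Adverb:       return 'R';
        case WordClass::Preposition:  return 'P';
        case WordClass::Interjection: return 'I';
        case WordClass::Conjunction:  return 'C';
        case WordClass::Participle:   return 'T';
        case WordClass::Unknown:      return '?';
    }
    return '?';
}

// Assumption: one classification per word.
typedef std::map<std::string, WordClass> Lexicon;

// Classify each whitespace-separated word and join the codes into a pattern.
std::string sentencePattern(const std::string& sentence, const Lexicon& lexicon) {
    std::istringstream words(sentence);
    std::string word;
    std::string pattern;
    while (words >> word) {
        Lexicon::const_iterator it = lexicon.find(word);
        pattern += classCode(it == lexicon.end() ? WordClass::Unknown : it->second);
    }
    return pattern;
}

// The text says every sentence has a verb; this checks only that one rule.
bool hasVerb(const std::string& pattern) {
    return pattern.find('V') != std::string::npos;
}
```

With "dogs" stored as a noun, "bark" as a verb and "loudly" as an adverb, the sentence "dogs bark loudly" produces the pattern NVR, and any other noun, verb, adverb sentence produces the same pattern regardless of the words used.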
Once we have identified the pattern, the process continues by identifying the context and meaning of the sentence. Here again, the known patterns have rules for deciphering the context and meaning. This is the most complex portion of the NLU processor, but it is what lets the computer respond appropriately once it has the meaning of a sentence.
The rules and the words with their classifications should be stored in database tables, and a generic engine should extract the words from a sentence and validate each word and the sentence pattern.
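One possible shape for such a table-driven engine is sketched here. Nothing in it comes from the article's eventual implementation: GrammarEngine, Validation and the method names are hypothetical, and the two tables are loaded from plain text rows as a stand-in for real database queries.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Result of checking one sentence against the tables.
struct Validation {
    bool ok;
    std::string pattern;   // class names joined by spaces, e.g. "noun verb"
    std::string problem;   // empty when ok is true
};

// Generic engine: all grammar knowledge lives in the two tables, so the
// code itself contains no words and no sentence patterns.
class GrammarEngine {
public:
    // Each row: "<word> <class>", one row per line.
    void loadWords(const std::string& rows) {
        std::istringstream in(rows);
        std::string word, wordClass;
        while (in >> word >> wordClass) {
            words_[word] = wordClass;
        }
    }

    // Each non-empty line is one accepted pattern, e.g. "noun verb adverb".
    void loadPatterns(const std::string& rows) {
        std::istringstream in(rows);
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty()) {
                patterns_.insert(line);
            }
        }
    }

    // Validate every word first, then the pattern the words form.
    Validation validate(const std::string& sentence) const {
        Validation result = {false, "", ""};
        std::istringstream in(sentence);
        std::string word;
        while (in >> word) {
            std::map<std::string, std::string>::const_iterator it = words_.find(word);
            if (it == words_.end()) {
                result.problem = "unknown word: " + word;
                return result;
            }
            if (!result.pattern.empty()) {
                result.pattern += ' ';
            }
            result.pattern += it->second;
        }
        if (patterns_.count(result.pattern) == 0) {
            result.problem = "unknown pattern: " + result.pattern;
            return result;
        }
        result.ok = true;
        return result;
    }

private:
    std::map<std::string, std::string> words_;
    std::set<std::string> patterns_;
};
```

Because the engine only reads rows, adding vocabulary or a new sentence pattern means inserting a row rather than changing code. The pattern rows must match exactly, with single spaces between class names.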
>>>> TO BE CONTINUED ...
Other User Comments
3/25/2002 10:10:53 AM: Blaine Sahazian
Plagiarism is great isn't it =P
3/25/2002 9:48:49 PM: nhsxth
Your lack of knowledge in the field of artificial intelligence is way too obvious. It is completely absurd to presume that natural language processing is the basis of AI! I have one nagging question: how do you expect to implement "NLP" when you can't even speak with proper grammar? Don't read this, it's a complete waste of your time.
3/26/2002 6:34:08 PM: Patrick Ingle
I never claimed to be an expert in AI. The technology to implement AI is still relatively new. This posting, though not complete and loaded with grammatical errors (obviously only a draft), is provided to promote interest, discussion and opinions. There is a lack of sufficient explanation and study within the open source community. Everyone is different; a waste for some is an inspiration to others. What is your experience with AI, NLP, grammar? Would you like to co-author this tutorial?
4/2/2002 3:04:45 PM: nhsxth
No, I would not like to co-author this. Sorry. The technology to implement AI is not new at all; it has been around since the inception of the computer. Read Hackers, by Levy. There is also an abundance of explanation and study in matters related to this. Sorry to be mean about it, but you have no idea what you're talking about.