Updated 2009-07-20 15:32:50 by AK

By vkvalli.

Why do we need to understand natural language? Natural languages evolved, and the ability to express something in a language correlates strongly with brain structure. Studies of natural language suggest that nearly all languages share a similar underlying structure/grammar and differ only superficially. As I remember, this was based on some of Chomsky's findings, but I don't have a reference at hand. The evolution of natural language was shaped by the cognitive limits of the human mind.

This insight may help in the design of programming languages, so that man-machine interaction does not cause too much cognitive load and the resulting stress.

Most man-computer interaction falls under the imperative statement type. Computer languages are typically classified as imperative or declarative; XML is more of a declarative language. The general thinking is that imperative languages are like procedural languages, and that object-oriented languages are different.

In imperative statements, there are three entities involved. Subject, Object, predicate. "John! close the door," is an example of an imperative statement. Here "John" is the subject. "door" is the object. "close" is the predicate.

More simply, the subject acts on the object, and the predicate denotes the action. Most of the time, the state of the object changes as a result of the action. One can see an analogy here: Security-Enhanced Linux gives more fine-grained access control in Linux. There, when writing authorisation rules, we talk of three entities: subject, object, and action. A user, joe, opens a file for reading. "joe" is the subject, the file is the object, and reading is the action.

Most of the time when we are programming, the subject is the computer. We ask the computer to do things. So instead of saying "computer, open this file" or "computer, close this file", we simply say "open this file", "close this file", etc.

The order in which subject, object, and predicate appear varies between natural languages. In English, it is "subject, predicate, object". In the Asian language Tamil, for example, it is "subject, object, predicate". What is important is that a language consistently adopts one style.

Now let us see how object orientation comes into natural language. It is instinctive to see that predicates are like verbs, and that the equivalent in programming languages is a "command" or "procedure". But there is a slight difference. A "procedure" maps to a unique sequence of actions or sub-actions; a natural language verb need not. To understand this, take the statements "open the door", "open the letter", and "open the computer". Say we are instructing a robot to do these things. The robot needs to know the sequence of actions for "open", but the sequences of actions for "open the door", "open the letter", and "open the computer" are very different. For clarity, we can consider two kinds of verbs: specific verbs and generic verbs. Specific verbs map to a unique sequence of actions; generic verbs do not. In this case, let us add the verbs "open_door", "open_letter", and "open_computer", each of which maps to a unique sequence for a door, a letter, or a computer. Now it is clear that "open" is a generic verb. To accomplish the task, the robot maps the generic verb to a specific verb based on the type of the object on which it has to act.

This introduces a new entity, the object-type. It is what happens implicitly in our minds when we speak. We use generic verbs. To carry out the action, the subject determines the object and its object-type, and maps the generic verb to the right sequence of actions (or specific verb).

This is why, although new products keep flooding the world, the number of verbs in the language does not explode. An imperative language like Tcl natively supports specific verbs through procedures (they map to a unique set of sub-actions). If it also supported generic verbs/commands, it would achieve the same effect as the object-oriented paradigm. To do this, the language needs to support object-types and a way to determine the object of a command.

To accomplish this, see OO libraries.
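The mapping from a generic verb to a specific verb can be sketched in plain Tcl. This is only an illustration under stated assumptions: the representation of a "thing" as a two-element list {type name}, the command name open_thing (chosen so as not to clobber Tcl's built-in open), and the result strings are all hypothetical and not taken from any OO library.

```tcl
# Specific verbs: each maps to one fixed sequence of actions.
# (The returned strings are placeholders for real work.)
proc open_door {name}     { return "turn the knob and push $name" }
proc open_letter {name}   { return "tear open the envelope of $name" }
proc open_computer {name} { return "unscrew the case of $name" }

# Generic verb: picks the specific verb from the object-type.
# Assumption: a "thing" is a list {type name}.
proc open_thing {thing} {
    lassign $thing type name
    set specific open_$type
    if {[llength [info commands ::$specific]] == 0} {
        error "don't know how to open a $type"
    }
    return [$specific $name]
}

foreach thing {{door front-door} {letter tax-bill} {computer laptop}} {
    puts [open_thing $thing]
}
```

Supporting a new kind of object then means adding one more open_<type> procedure; the generic verb itself does not change.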

Now, most OO languages seem to have gotten it wrong. They all try to simulate Smalltalk, which views things differently. Instead of a subject acting on an object, as in natural languages, in Smalltalk one passes messages to objects and the objects act on them. "door open": one requests the door to open. Basically, objects receive commands, act on them, and change their state accordingly. I believe the Smalltalk way of viewing things is inconsistent with our natural way of viewing them. In Smalltalk, there is no subject. (Or maybe you are the subject.)

I feel there is no need for Tcl to emulate the Smalltalk notion of objects. Tcl could emulate the natural language way of viewing objects: "open door", "open computer", etc. By doing this, Tcl can consistently maintain the "predicate-object" sequence, rather than "predicate-object" for non-OO and "object-predicate" for OO.

Lars H: An interesting argument, but I don't agree with the conclusion (that the language needs to support types), and indeed the argument is based on an incorrect premise. Nonetheless, it is interesting.

The error you make is that you implicitly assume that the objects of natural language grammars are equivalent to the objects of object-oriented programming, but OO-objects are in fact more like the subjects of natural languages. (In order to keep them apart, I'll write G-objects for the grammatical kind and CS-objects for the computer science kind.) The key difference is that of encapsulation -- CS-objects are supposed to hide their internals from outsiders, whereas the subject acting on a G-object needs both access to and understanding of this G-object. Since the only entity (in a pure object system) that is allowed access to the internals of a CS-object is that CS-object itself, that CS-object will grammatically be the subject that performs the action.

The examples "John, open the door!" and "John, open the letter!" illustrate this. For John to perform these actions, he needs both to understand doors and letters respectively (how to turn the doorknob or rip open the letter) and to have access to the object in question. While perfectly sensible, this is not the OO way. An OOified letter would instead have an "open" button, which when pressed would cause the letter to open itself. (Similarly, an OOified door would have an "open" button, so that its users need neither access to nor insight into such complicated machinery as locks, doorknobs, and hinges.) As it is the CS-object that is executing the predicate of the sentence, it is grammatically the subject, not the object.

From a grammatical point of view, the idea of OO can actually be seen as making programming more subjective, by introducing entities (the CS-objects) that can serve as subjects. Instead of having the programmer instruct "the computer" what to do, the programmer is instructing windows, documents, devices, etc. to do stuff, and by doing stuff they grammatically become subjects. Hence the usual "object method argument" order of OO programming is exactly the same as the English "subject predicate object"; the only difference is in the terminology.
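The "OOified door" with an "open" button can be sketched with TclOO, which is bundled with Tcl 8.6. This is a minimal illustration, not a recommendation of any particular OO system; the class name Door, the variable state, and the method names are assumptions made for the example.

```tcl
package require TclOO   ;# bundled with Tcl 8.6

oo::class create Door {
    variable state
    constructor {} {
        set state closed
    }
    # The "open" button: locks, knobs and hinges would be handled
    # in here, hidden from whoever presses the button.
    method open {} {
        set state open
        return $state
    }
    method state {} {
        return $state
    }
}

set front [Door new]
$front open
puts [$front state]   ;# prints "open"
```

The call reads "$front open": the CS-object comes first and executes the predicate itself, which is the "subject predicate" order described above.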

vkvalli Thanks for this clarification. In fact, after this post I dwelt on what a subject is in the Smalltalk world, and I came to more or less the same conclusion.

Here are some more thoughts on this. The following is with reference to natural languages and the real world. There are "things" in the world. A few "things" can take the role of "subjects", and most "things" take the role of "objects". The criterion is this: those "things" that are capable of inflicting state changes on themselves and others can be "subjects"; other "things" are "objects". Therefore we can communicate only with "subject"-able things.

In the real world, human beings, robots, and computers are "subject"-able things. Therefore they can be subjects in imperative statements. In the computer world, in a simple view, a computer, a running program, or a process can be a "subject", and the rest are objects. Hence SE-Linux categorizes users and processes as subjects in its authorization framework. (I am not sure whether, in the computer world, a hard boundary can be placed on what can be "subject"-able things.)

Now, in the Smalltalk world-view, everything is a subject. It looks at every object as a tiny computer or process. I remember Alan Kay phrasing something like this in his introductory paper on the Smalltalk system. This world-view is different from the real-world view and from the world-view of imperative languages. They say OO is a different paradigm because of this difference in world-view. Now the issue is whether this world-view is required to achieve OO benefits like data encapsulation and a type/subtype system. My feeling is that the world-view in which everything is a "subject" is not mandatory to achieve OO benefits. It is just Smalltalk's world-view.

By adopting a world-view in which not everything is a subject, but supporting typed objects, an imperative language can achieve OO benefits without a paradigm shift. For that, imperative languages need to support generic or polymorphic commands. (I am not very sure that polymorphic is the right term.)

Thoughts on Unified view of data and program

NEM There are several senses to the word polymorphism. In one sense it means having the same code operate on values of different "types". One way to do this is to have a universal type (in Tcl's case the string, in Java it is java.lang.Object mostly) which everything fits into. Another way is to have operations be parameterised by types (parametric polymorphism) as supported by several typed functional programming languages such as Haskell or ML. For instance, one such operation might be to find the length of a list. To do this, we do not need to know the types of the elements of the list, but only that the list is indeed a list. In Tcl this is done like (assuming we had no llength command):
 proc list_length list {
     set length 0
     # Count the elements without ever looking at their values.
     foreach item $list { incr length }
     return $length
 }

In Haskell this would be done as:
 list_length        :: [a] -> Int
 list_length []     = 0
 list_length (x:xs) = 1 + list_length xs

The first line above gives the type declaration, where [a] means a list of elements of any type, a. So, here we have polymorphism in that the same code can be called with several different types of arguments (lists of integers, lists of characters, or even lists of heterogeneous types in the Tcl version). This is also the sort of polymorphism offered by generics in Java or templates in C++.

A different sort of polymorphism is what is sometimes known as overloading or ad-hoc polymorphism. In this sort of polymorphism, each type can provide a separate implementation for the same operation. This is the sort of polymorphism that is usually referred to in OO programming. This is what $object method ... syntax supports quite well, as it allows the object to decide how to interpret the message depending on its type. For instance, in Snit you could do:
 package require snit
 snit::type Person {
    variable name
    constructor n { set name $n }
    method say msg { puts "$name says '$msg'" }
 }
 snit::type Dog {
    variable name
    constructor n { set name $n }
    # A dog ignores the message and barks instead.
    method say msg { puts "$name barks 'Woof! Woof!'" }
 }
 Person neil "Neil"
 Dog fido "Fido"
 foreach object {neil fido} {
     $object say "Hello, World!"
 }

Here the say method is overloaded for each type and acts differently depending on the type of the object it is called on.

Both types of polymorphism are useful. Tcl supports the first fine on its own, and supports the second through the convention of having objects be commands which you pass messages to. In some OO systems, such as TOOT, you can also explicitly call a particular implementation, via an ensemble or some other means, thus I might be able to do:
 Person say $neil "Some message"

What syntax you prefer is really quite a minor point compared with supporting the various semantic options for polymorphism.
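As a rough sketch of the "explicitly call a particular implementation" style, a namespace ensemble (Tcl 8.5 or later) can give the "Person say $neil ..." shape. This is not how TOOT itself is implemented; the dict-based value representation and the decision to return the string rather than print it are assumptions made for the example.

```tcl
# The type is an ensemble command; the value is plain data.
namespace eval Person {
    namespace export say
    namespace ensemble create

    # The "object" is passed explicitly as the first argument.
    proc say {self msg} {
        return "[dict get $self name] says '$msg'"
    }
}

set neil [dict create name Neil]
puts [Person say $neil "Some message"]   ;# prints "Neil says 'Some message'"
```

Here the caller names the implementation (Person) instead of letting the value choose it, so the word order is "type predicate object argument".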

One final point: it is not a straightforward claim that having a programming language be similar to a natural language decreases "cognitive load" on the person learning that language. Indeed, the whole move towards more formal languages (e.g. with notations for logic, maths, and now programming) has been motivated in part because natural languages are so ambiguous, thus increasing confusion when talking about very specific topics.

TJK Very interesting. I agree with NEM's last point, but putting that to one side, I've always thought OO research wasn't broad enough in its investigation of how to improve the human-computer interface. As an example, a lot of the really interesting things that computers can do have to do with time, yet no one has tried to draw the parallel between grammatical tense and action dispatching. For instance, wouldn't it be great to have these concepts available programmatically: uncertainty, frequency, completion, duration, possibility, and even whether information derives from experience or hearsay?

DKF: That last item tends to go by the name “provenance”[1]. I know people who work on this sort of thing.

[Alan] - 2009-07-19 14:08:52

Interesting misinterpretation here: that between human commonsense and science. The former (and the semantics of many natural languages) were formed before modern science, and think e.g. of rocks as inert things to be acted upon by actors. Modern physics sees cause and effect as the result of objects receiving messages which they respond to -- everything is local. One way to look at this is that cave humans can get confused if they are able to kick a rock -- they might get the wrong idea and be surprised that they can't just kick a boulder and have it move (the boulder "doesn't want to" -- thinking of the cave person as a "causer" is a very naive view). Smalltalk was set up to have a simple uniform semantics that could be like science at all levels. Polymorphisms can help to limit the number of concepts that have to be contemplated. And so forth.

Best wishes,

Alan Kay