41

Most programming languages (both dynamically and statically typed) have a special keyword and/or syntax for declaring functions that looks very different from declaring variables. I see declaring a function as just declaring another named entity.

For example in Python:

x = 2
y = addOne(x)
def addOne(number): 
  return number + 1

Why not:

x = 2
y = addOne(x)
addOne = (number) => 
  return number + 1

Similarly, in a language like Java:

int x = 2;
int y = addOne(x);

int addOne(int x) {
  return x + 1;
}

Why not:

int x = 2;
int y = addOne(x);
(int => int) addOne = (x) => {
  return x + 1;
}

This syntax seems a more natural way of declaring something (be it a function or a variable), and it needs one less keyword like def or function in some languages. And, IMO, it is more consistent (I look in the same place to understand the type of a variable or function) and probably makes the parser/grammar a little bit simpler to write.
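For reference, Python already accepts the proposed form for single-expression functions, and both spellings produce the same kind of object. A minimal sketch (the names add_one_def and add_one_lambda are made up for illustration):

```python
# Two ways to bind a name to a function in Python.
def add_one_def(number):
    return number + 1

add_one_lambda = lambda number: number + 1

# Both names refer to ordinary function objects...
same_type = type(add_one_def) is type(add_one_lambda)

# ...but only `def` records the name inside the object itself,
# which is what shows up in tracebacks and debuggers.
def_name = add_one_def.__name__        # "add_one_def"
lambda_name = add_one_lambda.__name__  # "<lambda>"
```

One practical difference is that the lambda form carries the generic name <lambda>, which is part of why PEP 8 recommends def over assigning a lambda to a name.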

I know that very few languages use this idea (CoffeeScript, Haskell), but most common languages have special syntax for functions (Java, C++, Python, JavaScript, C#, PHP, Ruby).

Even in Scala, which supports both ways (and has type inference), it is more common to write:

def addOne(x: Int) = x + 1

Rather than:

val addOne = (x: Int) => x + 1

IMO, at least in Scala, this is probably the most easily understandable version, but this idiom is seldom followed:

val x: Int = 1
val y: Int = addOne(x)
val addOne: (Int => Int) = x => x + 1

I am working on my own toy language, and I am wondering whether there are any pitfalls in designing it this way, and whether there are any historical or technical reasons this pattern is not widely followed.

gnat
pathikrit

12 Answers

59

It's because it's important for humans to recognize that functions are not just "another named entity". Sometimes it makes sense to manipulate them as such, but they should still be recognizable at a glance.

It doesn't really matter what the computer thinks about the syntax, as an incomprehensible blob of characters is fine for a machine to interpret, but that would be nigh-impossible for humans to understand and maintain.

It really is the same reason we have while and for loops, switch and if else, etc., even though all of them ultimately boil down to a compare and jump instruction. They are there for the benefit of the humans maintaining and understanding the code.

Having your functions as "another named entity" the way you are proposing will make your code harder to see, and thus harder to understand.

whatsisname
49

I think the reason is that most popular languages either come from or were influenced by the C family of languages as opposed to functional languages and their root, the lambda calculus.

And in these languages, functions are not just another value:

  • In C++, C# and Java, you can overload functions: you can have two functions with the same name, but different signature.

  • In C, C++, C# and Java, you can have values that represent functions, but function pointers, functors, delegates and functional interfaces all are distinct from functions themselves. Part of the reason is that most of those are not actually just functions, they are a function together with some (mutable) state.

  • Variables are mutable by default (you have to use const, readonly or final to forbid mutation), but functions can't be reassigned.

  • From a more technical perspective, code (which is composed of functions) and data are separate. They typically occupy different parts of memory, and they are accessed differently: code is loaded once and then only executed (but not read or written), whereas data is often constantly allocated and deallocated and is being written and read, but never executed.

    And since C was meant to be "close to the metal", it makes sense to mirror this distinction in the syntax of the language too.

  • The "function is just a value" approach that forms the basis of functional programming has gained traction in the common languages only relatively recently, as evidenced by the late introduction of lambdas in C++, C# and Java (2011, 2007, 2014).
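The overloading point can be seen from the other side in a language where a function name is an ordinary binding. In the following Python sketch (the function area is hypothetical), a second definition with a different signature replaces the first instead of overloading it:

```python
def area(side):
    return side * side

# This does not add an overload; it rebinds `area`,
# and the one-argument version is no longer reachable.
def area(width, height):
    return width * height

rectangle = area(2, 3)

try:
    area(2)
    one_arg_call = "ok"
except TypeError:
    one_arg_call = "TypeError"
```

A language that wants both overloading and "a function is just a named value" has to decide what a name bound to several values means, which is one reason the two rarely coexist.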

svick
11

You might be interested to learn that, way back in prehistoric times, a language called ALGOL 68 used a syntax close to what you propose. Recognising that function identifiers are bound to values just like other identifiers are, you could in that language declare a function (constant) using the syntax

function-type name = (parameter-list) result-type : body ;

Concretely your example would read

PROC (INT)INT add one = (INT n) INT: n+1;

Recognising the redundancy in that the initial type can be read off from the RHS of the declaration, and being a function type always starts with PROC, this could (and usually would) be contracted to

PROC add one = (INT n) INT: n+1;

but note that the = still comes before the parameter list. Also note that if you wanted a function variable (to which another value of the same function type could later be assigned), the = should be replaced by :=, giving either one of

PROC (INT)INT func var := (INT n) INT: n+1;
PROC func var := (INT n) INT: n+1;

However in this case both forms are in fact abbreviations; since the identifier func var designates a reference to a locally generated function, the fully expanded form would be

REF PROC (INT)INT func var = LOC PROC (INT)INT := (INT n) INT: n+1;

This particular syntactic form is easy to get used to, but it clearly did not have a large following in other programming languages. Even functional programming languages like Haskell prefer the style f n = n+1 with = following the parameter list. I guess the reason is mainly psychological; after all even mathematicians don't often prefer, as I do, f = n ↦ n + 1 over f(n) = n + 1.

By the way, the above discussion does highlight one important difference between variables and functions: function definitions usually bind a name to one specific function value, that cannot be later changed, whereas variable definitions usually introduce an identifier with an initial value, but one that can change later. (It is not an absolute rule; function variables and non-function constants do occur in most languages.) Moreover, in compiled languages the value bound in a function definition is usually a compile-time constant, so that calls to the function can be compiled using a fixed address in the code. In C/C++ this is even a requirement; the equivalent of the ALGOL 68

PROC (REAL) REAL f = IF mood=sunny THEN sin ELSE cos FI;

cannot be written in C++ without introducing a function pointer. Specific limitations of this kind justify using a different syntax for function definitions. But they depend on the language semantics, and the justification does not apply to all languages.
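For comparison, a language whose function names are ordinary run-time bindings can express the ALGOL 68 line directly. A rough Python equivalent, where mood is a hypothetical run-time condition:

```python
import math

mood = "sunny"  # hypothetical run-time condition

# The function bound to `f` is chosen at run time, so calls to `f`
# cannot be compiled to a fixed code address.
f = math.sin if mood == "sunny" else math.cos

result = f(0.0)
```

The price is exactly the one described above: every call through f is an indirect call.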

Marc van Leeuwen
8

You mentioned Java and Scala as examples. However, you overlooked an important fact: those aren't functions, those are methods. Methods and functions are fundamentally different. Functions are objects, methods belong to objects.

In Scala, which has both functions and methods, there are the following differences between methods and functions:

  • methods can be generic, functions can't
  • methods can have no, one or many parameter lists, functions always have exactly one parameter list
  • methods can have an implicit parameter list, functions can't
  • methods can have optional parameters with default arguments, functions can't
  • methods can have repeated parameters, functions can't
  • methods can have by-name parameters, functions can't
  • methods can be called with named arguments, functions can't
  • functions are objects, methods aren't

So, your proposed replacement simply doesn't work, at least for those cases.

Jörg W Mittag
3

The reasons that I can think of are:

  • It is easier for the compiler to know what we are declaring.
  • It's important for us to know (in a trivial way) whether this is a function or a variable. Functions are usually black boxes and we don't care about their internal implementation. I dislike type inference on return types in Scala, because I believe that it's easier to use a function that has a return type: it is often the only documentation provided.
  • And the most important one is the follow-the-crowd strategy that is used in designing programming languages. C++ was created to steal C programmers, Java was designed in a way that doesn't scare C++ programmers, and C# was designed to attract Java programmers. Even C#, which I think is a very modern language with an amazing team behind it, copied some mistakes from Java or even from C.
TRiG
2

Turning the question around, if one isn't interested in trying to edit source code on a machine which is extremely RAM-constrained, or minimize the time to read it off a floppy disk, what's wrong with using keywords?

Certainly it's nicer to read x=y+z than store the value of y plus z into x, but that doesn't mean that punctuation characters are inherently "better" than keywords. If variables i, j, and k are Integer, and x is Real, consider the following lines in Pascal:

k := i div j;
x := i/j;

The first line will perform a truncating integer division, while the second will perform a real-number division. The distinction can be made nicely because Pascal uses div as its truncating-integer-division operator, rather than trying to use a punctuation mark which already has another purpose (real-number division).

While there are a few contexts in which it can be helpful to make a function definition concise (e.g. a lambda which is used as part of another expression), functions are generally supposed to stand out and be easily visually recognizable as functions. While it might be possible to make the distinction much more subtle and use nothing but punctuation characters, what would be the point? Saying Function Foo(A,B: Integer; C: Real): String makes it clear what the function's name is, what parameters it expects, and what it returns. Maybe one could shorten it by six or seven characters by replacing Function with some punctuation characters, but what would be gained?

Another thing to note is that there is in most frameworks a fundamental difference between a declaration which will always associate a name with either a particular method or a particular virtual binding, and one which creates a variable which initially identifies a particular method or binding, but could be changed at runtime to identify another. Since these are semantically very different concepts in most procedural frameworks, it makes sense that they should have different syntax.

supercat
2

Well, the reason might be that those languages are not functional enough, so to say. In other words, you rather seldom define functions. Thus, the use of an extra keyword is acceptable.

In languages of the ML or Miranda heritage, OTOH, you define functions most of the time. Look at some Haskell code, for instance. It's literally mostly a sequence of function definitions, many of which have local functions and local functions of those local functions. Hence, a fun keyword in Haskell would be a mistake as great as requiring an assignment statement in an imperative language to start with assign, because assignment is probably by far the single most frequent statement.

Ingo
1

Personally, I see no fatal flaw in your idea; you may find that it's trickier than you expected to express certain things using your new syntax, and/or you may find that you need to revise it (adding various special cases and other features, etc), but I doubt you'll find yourself needing to abandon the idea entirely.

The syntax you've proposed looks more or less like a variant of some of the notation styles sometimes used to express functions or types of functions in mathematics. This means that, like all grammars, it will probably appeal more to some programmers than others. (As a mathematician, I happen to like it.)

However, you should note that in most languages, the def-style syntax (i.e. the traditional syntax) does behave differently from a standard variable assignment.

  • In the C and C++ family, functions aren't generally treated as "objects", i.e. chunks of typed data to be copied and put on the stack and whatnot. (Yes, you can have function pointers, but those still point at executable code, not at "data" in the typical sense.)
  • In most OO languages, there's special handling for methods (i.e. member functions); that is, they're not just functions declared inside the scope of a class definition. The most important difference is that the object on which the method is being called is typically passed as an implicit first parameter to the method. Python makes this explicit with self (which, by the way, is not actually a keyword; you could make any valid identifier the first argument of a method).
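The remark about self can be checked with a small Python sketch. The class Counter and the parameter name this are made up for illustration; the point is only that the receiver is an ordinary first parameter:

```python
class Counter:
    # `this` is used instead of `self` to show the name is only a convention.
    def __init__(this, start):
        this.value = start

    def increment(this):
        this.value += 1
        return this.value

counter = Counter(5)
via_method = counter.increment()           # receiver passed implicitly
via_function = Counter.increment(counter)  # receiver passed explicitly
```

Both calls run the same function; the method-call form just supplies the first argument for you.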

You need to consider whether your new syntax accurately (and hopefully intuitively) represents what the compiler or interpreter is actually doing. It may help to read up on, say, the difference between lambdas and methods in Ruby; this will give you an idea of how your functions-are-just-data paradigm differs from the typical OO/procedural paradigm.

1

For some languages functions are not values. In such a language to say that

val f = << i : int  ... >> ;

is a function definition, whereas

val a = 1 ;

declares a constant, is confusing because you are using one syntax to mean two things.

Other languages, such as ML, Haskell, and Scheme treat functions as 1st class values, but provide the user with a special syntax for declaring function-valued constants.* They are applying the rule that "usage shortens form". I.e. if a construct is both common and verbose, you should give the user a shorthand. It is inelegant to give the user two different syntaxes that mean exactly the same thing, but sometimes elegance should be sacrificed to utility.

If, in your language, functions are 1st class, then why not try to find a syntax that is concise enough that you won't be tempted to find a syntactic sugar?

-- Edit --

Another issue no one has brought up (yet) is recursion. If you allow

{ 
    val f = << i : int ... g(i-1) ... >> ;
    val g = << j : int ... f(j-1) ... >> ;
    f(x)
}

and you allow

{
    val a = 42 ;
    val b = a + 1 ;
    a
} ,

does it follow that you should allow

{
    val a = b + 1 ; 
    val b = a - 1 ;
    a
} ?

In a lazy language (like Haskell), there is no issue here. In a language with essentially no static checks (like LISP), there is no issue here. But in a statically-checked eager language, you have to be careful about how the rules of static checking are defined, if you want to allow the first two and forbid the last.
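The asymmetry can be observed in an eager language such as Python. Python is not statically checked, so the failure below shows up at run time rather than at compile time, but the underlying reason is the same: a function body is looked up late, while a value's right-hand side is evaluated immediately. The function names are hypothetical.

```python
def mutually_recursive_functions():
    # `is_odd` is not bound yet when `is_even` is created, but the name
    # is only looked up when the body runs, so this works.
    is_even = lambda n: True if n == 0 else is_odd(n - 1)
    is_odd = lambda n: False if n == 0 else is_even(n - 1)
    return is_even(10)

def mutually_recursive_values():
    # The right-hand side is evaluated immediately, so `b` is read
    # before it has been bound and Python raises a NameError subclass.
    try:
        a = b + 1
        b = a - 1
        return a
    except NameError as exc:
        return exc

functions_result = mutually_recursive_functions()
values_result = mutually_recursive_values()
```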

-- End of Edit --

*It might be argued that Haskell does not belong in this list. It provides two ways to declare a function, but both are, in a sense, generalizations of the syntax for declaring constants of other types.

0

This might be useful in dynamic languages where the type is not that important, but it's not that readable in statically typed languages, where you always want to know the type of your variable. Also, in object-oriented languages it's pretty important to know the type of your variable in order to know what operations it supports.

In your case, a function with 4 parameters would be:

(int, long, double, String => int) addOne = (x, y, z, s) => {
  return x + 1;
}

When I look at the function header I see (x, y, z, s), but I do not know the types of these variables. If I want to know the type of z, which is the third parameter, I'll have to look at the beginning of the function, count 1, 2, 3, and then see that the type is double. With the traditional syntax I look directly and see double z.
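The same readability trade-off shows up in Python's optional type hints. In this sketch (names are hypothetical, and float stands in for double), the variable-style form keeps parameter types and parameter names in two separate places, while the def form keeps them together:

```python
from typing import Callable

# Variable-style: parameter types sit in the type on the left,
# parameter names in the lambda on the right.
add_one_var: Callable[[int, int, float, str], int] = lambda x, y, z, s: x + 1

# Declaration-style: each parameter carries its own type.
def add_one_def(x: int, y: int, z: float, s: str) -> int:
    return x + 1

type_of_z = add_one_def.__annotations__["z"]      # float, found by name
lambda_annotations = add_one_var.__annotations__  # empty: a lambda cannot annotate its parameters
```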

Random42
0

There is a very simple reason to have such a distinction in most languages: there is a need to distinguish evaluation and declaration. Your example is good: why not like variables? Well, variable expressions are immediately evaluated.
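A small Python sketch makes the difference visible. The helper record is hypothetical; it only logs when an expression is actually evaluated:

```python
log = []

def record(label, value):
    """Append label to the log and pass the value through."""
    log.append(label)
    return value

x = record("x evaluated", 2)                         # runs now
add_one = lambda n: record("body evaluated", n + 1)  # body does not run yet

log_after_definitions = list(log)
y = add_one(x)                                       # body runs here
log_after_call = list(log)
```

After the two definitions only the variable's expression has run; the function body runs when it is called.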

Haskell has a special model where there is no distinction between evaluation and declaration, which is why there is no need for a special keyword.

0

Functions are declared differently from literals, objects, etc. in most languages because they are used differently, debugged differently, and pose different potential sources of error.

If a dynamic object reference or a mutable object is passed to a function, the function can change the value of the object as it runs. This kind of side effect can make it difficult to follow what a function will do if it is nested within a complex expression, and this is a common problem in languages like C++ and Java.
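The same hazard exists in any language with mutable arguments. A minimal Python sketch, with a hypothetical function total_and_clear that mutates the list it is given:

```python
def total_and_clear(items):
    """Return the sum of items, emptying the list as a side effect."""
    total = sum(items)
    items.clear()
    return total

numbers = [1, 2, 3]

# Python evaluates operands left to right, so len(numbers) sees the
# already-emptied list.
result = total_and_clear(numbers) + len(numbers)
```

Nothing in the expression itself hints that the second operand depends on the first having run.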

Consider debugging some sort of kernel module in Java, where every object has a toString() operation. While it may be expected that the toString() method should restore the object, it may need to disassemble and reassemble the object in order to translate its value to a String object. If you're trying to debug the methods that toString() will be calling (in a hook-and-template scenario) to do its work, and accidentally highlight the object in the variables window of most IDEs, it can crash the debugger. This is because the IDE will try to toString() the object, which calls the very code you're in the process of debugging. No primitive value ever does crap like this because the semantic meaning of primitive values is defined by the language, not the programmer.