14

In Python and JavaScript, semicolons are optional.

In PHP, quotes around array keys are optional ($_GET[key] vs $_GET['key']), although if you omit them PHP will first look for a constant by that name. PHP also allows two different styles for blocks (colon-delimited or brace-delimited).

I'm creating a programming language now, and I'm trying to decide how strict I should make it. There are a lot of cases where extra characters aren't really necessary and the code can be unambiguously interpreted due to precedence rules, but I'm wondering whether I should enforce them anyway to encourage consistency.

What do you think?


Okay, my language isn't so much a programming language as a fancy templating language. Kind of a cross between Haml and Django templates. Meant to be used with my C# web framework, and supposed to be very extensible.

mpen
  • 1,889

11 Answers

26

What I look for in a programming language (as opposed to a scripting language) is consistency and strong typing.

In many current programming languages it is possible to omit the semicolon in certain places without introducing ambiguity (after the last expression in a {} block, for instance). If a programming language allows you to omit characters in these cases, a programmer now has an extra problem: on top of the general language syntax, she now has to know in which cases it is allowed to omit parts of the syntax too.

This extra knowledge is no problem for the programmer writing the code, but it becomes a burden to anyone who has to interpret existing code at a later point in time (including the original author after a while).

Your PHP example opens the possibility for subtle bugs: if a constant named key is later defined in the same context, the meaning of $_GET[key] silently changes. The compiler has no way of knowing that is not what was meant, so the problem only becomes apparent at runtime instead of compile time.
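
The same compile-time-versus-runtime trade-off shows up in any stringly-typed lookup. A minimal C# analogy (C# being the asker's host language; the names here are purely illustrative):

  using System;
  using System.Collections.Generic;

  var query = new Dictionary<string, string> { ["key"] = "value" };

  // A misspelled string key compiles without complaint and only
  // fails when the line actually runs:
  // string v = query["kye"];    // throws KeyNotFoundException at runtime

  // A misspelled member on a declared shape is caught at compile time:
  var page = new { Key = "value" };
  // string w = page.Kye;        // error CS1061: no definition for 'Kye'
  Console.WriteLine(page.Key);   // prints "value"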

rsp
  • 1,139
18

Different types of languages have different uses, so the answer to this question really depends on what you're going to be using it for.

For example, Perl is a very loose language, and I find it very useful for writing quick fixup or number crunching scripts. For solid robust projects I use C#.

You need to get the balance right for the target usage. The stricter the language, the longer you spend writing the code, but in return you get greater robustness and reusability, and code that is easier to maintain.

BG100
  • 296
11

Every place where there's some ambiguity, the compiler needs to have some way to guess what the programmer really meant. Every time this happens, there's the chance that the programmer really meant something different, but didn't have the ambiguity-resolution rule down.

Writing logically correct code is hard enough already. Adding syntactical ambiguities may seem "friendly" on the surface, but it's an open invitation to introduce new, unexpected, hard-to-debug bugs into the codebase. Bottom line, be as strict as possible.

From your example, you mentioned that semicolons are optional in Python and JavaScript. For JavaScript, at least, this is not entirely true. Semicolons are just as required in JS as they are in any other C-family language, but the JavaScript parser is required by the language specification to insert missing semicolons under certain circumstances. This is widely regarded as a very bad thing, because it tends to get your intentions wrong and screw up your code.

Mason Wheeler
  • 83,213
6

The answer to how loose you should make your language is the answer to the question, asked in a Texas accent: "How lucky do you feel, punk?"

Henrik
  • 634
4

Everyone wouldn't have to work so hard at coding consistency if languages didn't allow so much variation. We don't like it when users make requests that unnecessarily increase complexity, so why should we ask that of our development languages?

JeffO
  • 36,956
2

My personal preference is for the ability to have just enough strictness to catch my typos, but with as little extra boilerplate as possible. I talk about this issue at http://www.perlmonks.org/?node_id=755790.

That said, you're designing your language for yourself. You should make it be whatever you want it to be.

btilly
  • 18,340
1

I would suggest that a good programming language should have strict rules, which implementations would be expected to enforce consistently, but the rules should be written so as to be helpful. I would further suggest designing a language to avoid cases where the "Hamming distance" between two substantially different programs is only one. Obviously one can't achieve this with numeric or string literals (if a programmer who meant 123 instead types 1223 or 13, the compiler can't very well know what was meant). On the other hand, if a language were to use := for assignment and == for equality comparison, and not use a single = for any legal purpose, that would greatly reduce the possibilities both for accidental assignments that were supposed to be comparisons and for accidental do-nothing comparisons that were supposed to be assignments.
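
C# (the asker's host language) reaches the same goal by a different rule: = stays the assignment operator, but an if condition must be of type bool, so the one-character slip is still caught. A minimal sketch of both the caught case and the remaining hole:

  int x = 0;

  // The classic C slip assigns 5 and then branches on it;
  // C#'s bool-only conditions reject it at compile time:
  // if (x = 5) { }       // error CS0029: cannot convert 'int' to 'bool'

  // The remaining hole is bool variables, where the assignment
  // still compiles, although the compiler warns:
  bool done = false;
  if (done = true) { }    // warning CS0665: assignment in conditional expression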

I would suggest that while there are places where it is useful for compilers to infer things, such inference is often most valuable in the simplest cases, and less valuable in the more complicated cases. For example, allowing the replacement of:

  Dictionary<complicatedType1, complicatedType2> item =
    new Dictionary<complicatedType1, complicatedType2>();

with

  var item = new Dictionary<complicatedType1, complicatedType2>();

does not require any complicated type inference, but makes the code vastly more readable. Among other things, reserving the more verbose syntax for the scenarios where it is actually needed (e.g. because the type of the storage location doesn't precisely match the type of the expression creating it) helps call extra attention to the places that require it.

One major difficulty of attempting more sophisticated type inference is that ambiguous situations may arise. I would suggest that a good language should allow a programmer to include information the compiler can use to resolve such ambiguities (e.g. by regarding some typecasts as preferable to others), to determine that they don't matter (e.g. because even though two possible overloads may execute different code, the programmer has indicated that they should behave identically in the cases where either could apply), or to flag those (and only those) which cannot be handled in either of the above ways.
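
For a concrete instance of the "flag only the genuinely ambiguous cases" idea, this is roughly how C# overload resolution already behaves (a small sketch; the class and method names are made up):

  class OverloadDemo
  {
      static void F(int x, long y) { }
      static void F(long x, int y) { }

      static void Main()
      {
          // Neither overload is strictly better for F(1, 1),
          // so the compiler flags the call instead of guessing:
          // F(1, 1);        // error CS0121: the call is ambiguous

          // An explicit cast supplies the disambiguating information:
          F(1, (long)1);     // unambiguously picks F(int, long)
      }
  }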

supercat
  • 8,629
1

To me, readability is most important.

To someone experienced with the language, a code fragment's meaning should be clear without having to analyze the context deeply.

The language should be able to flag mistakes as often as possible. If every random sequence of characters makes a syntactically correct program, that's not helpful. And if variables are automatically created the first time they are used, then misspelling client as cleint will not give you a compile error.
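
In a language with mandatory declarations, the cleint typo is caught immediately; a one-line C# illustration:

  using System;

  string client = "Acme";

  // With mandatory declarations, the typo cannot slip through:
  // Console.WriteLine(cleint);  // error CS0103: the name 'cleint' does not exist
  Console.WriteLine(client);     // fine: the name matches a declaration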

Besides the syntax, the language should have a clearly-defined semantics, and maybe that's even harder than deciding on a decent syntax...

Good examples:

  • In Java, "1" is a string, 1 is an int, 1.0 is a double, and 1L is a long. One look and you know what it is.

  • In Java, = is the assignment. It assigns the value for primitive types and the reference for reference types. It never copies complex data or compares.

  • In Java, calling a method needs parentheses and is thereby clearly distinguished from a variable; so if there are no parentheses, you don't need to search for a method definition, it's just reading data.
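
Those literal conventions happen to be identical in C# (the asker's host language), where var makes the inference visible:

  var s = "1";   // inferred as string
  var i = 1;     // inferred as int
  var d = 1.0;   // inferred as double
  var l = 1L;    // inferred as long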

Bad examples:

  • In Java, a symbol like client can be nearly anything: a package path element, a class or interface name, an inner class name, a field name, a method name, a local variable, and more. It's up to the user whether to introduce and obey naming conventions.

  • In Java, the dot . is overused. It can be a separator within a package name, a separator between package and class, a separator between outer and inner class, a connector between an instance expression and the method invoked on that instance, and more.

  • In many languages, the curly braces of if blocks are optional, leading to nasty mistakes if someone adds one more statement to the (not really existing) block; see the sketch after this list.

  • Infix operators: sometimes I have to stop at a numerical expression and think hard about what it means, step by step. We are all used to writing math expressions in infix notation like a * b / c * d + e. Most of the time we remember the precedence of multiplication and division over addition and subtraction (but did you realize that we're not dividing by c*d, but dividing only by c and then multiplying by d?). But there are so many additional infix operators, with their own precedence rules and, in some languages, overloading, that it's hard to keep track. Maybe enforcing the use of parentheses would have been a better approach...
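
The optional-braces pitfall above, sketched in C# (any C-family language behaves the same way; GrantAccess and LogAccess are made-up helpers):

  bool isAdmin = false;

  // Only the first indented statement belongs to the if;
  // the second runs unconditionally, whatever the indentation suggests:
  if (isAdmin)
      GrantAccess();
      LogAccess();          // executes even when isAdmin is false

  // And the infix example from above, fully parenthesized:
  // a * b / c * d + e  is  (((a * b) / c) * d) + e

  // Hypothetical helpers, only to make the sketch self-contained:
  static void GrantAccess() { }
  static void LogAccess() { }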

1

I generally tend to fall on the side of "what would make it easier for me as a programmer". Of course that can mean more than one thing. In JavaScript there is almost no type checking, which works great until you hit a weird bug. On the other hand, in Haskell there is a lot of type checking, which puts more of the work up front but clobbers some classes of bugs.

To be honest I would check out a bunch of languages to see what they do and try to find a niche that none of them hit!

I don't think there is one obviously right way to do it, or at least if there is, it's not something people have found a consensus on yet. So by creating languages with different type systems we are learning.

Good luck.

Zachary K
  • 10,413
1

I like my languages to do what I mean. Generally that leans pretty hard towards loose. I would also like to be able to tag an element or block as "strict" so I can debug/analyze that limited area.

Paul Nathan
  • 8,560
-2

You might consider an analogy with natural language. In email, are you a Grammar Nazi? Or are you okay with some grammatical errors, such as split infinitives, missing conjunctions, or misplaced modifiers? The answer boils down to personal preference.

emallove
  • 101