Dec 25th, 2008

(Non-)Total Functional Programming

I’ve been getting back into language design again. (I haven’t written anything about it since August!)

Not too long ago I was introduced to Total Functional Programming. The paper is very interesting, and at first, the concept sounds great: give up Turing-completeness, and in exchange, properties of your code become easier to prove. Who needs Turing-completeness anyway?

But after reading some comments on Lambda the Ultimate, something occurred to me. Maybe removing infinite looping, in an attempt to remove a certain kind of partial function — those that never return — is acceptable. But removing partial functions completely is not acceptable (to me).

To implement division, for example, you’d have to modify the definition of it, or change the type to make it impossible to type-check with a zero divisor. But in general, the compiler can’t prove that a runtime value is non-zero, so this forces the programmer to explicitly handle the case all the time. Maybe that’s desirable in some places, but is it always? I’m not sure.

What I take away from this is that (1) partial functions won’t go away completely, and (2) we have to accept some kind of undefined exceptional value.

Both could be eliminated by making your types über-complicated. But I think that’s the wrong way to go. An intuitive type for division is int -> int -> int, or perhaps Num n => n -> n -> n, not something artificial like (Num n, NonZero nz) => n -> nz -> n that exists just to make the function total.
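To make the trade-off concrete, here is a rough Python sketch of the two styles. The names div, NonZero, non_zero and total_div are made up for illustration, and Python only approximates what a static type-checker would enforce, so read it as a picture of the idea rather than a real total language.

```python
from dataclasses import dataclass
from typing import Optional


def div(n: int, d: int) -> int:
    """Partial: the intuitive type, undefined (raises) when d == 0."""
    return n // d  # ZeroDivisionError plays the "undefined exceptional value"


@dataclass(frozen=True)
class NonZero:
    """Hypothetical wrapper: holding one is meant as evidence that value != 0."""
    value: int


def non_zero(n: int) -> Optional[NonZero]:
    """The intended way to build a NonZero; the caller must handle None."""
    return NonZero(n) if n != 0 else None


def total_div(n: int, d: NonZero) -> int:
    """Total in spirit: the zero check has been pushed onto every caller."""
    return n // d.value


# Every call site of the total version pays for the check up front.
divisor = non_zero(4)
if divisor is not None:
    quotient = total_div(12, divisor)
```

The check has not disappeared in the second version. It has only moved from inside the function to every place that calls it.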

Aug 17th, 2008

Code Maintenance Tool Wish-List

At my day-job, I am forced to maintain code that I did not write, that I am not familiar with, and whose authors no longer work at the company. Did you say “documentation”? Hah! No, my friends… it is just me and the editor.

I am a designer. A prototyping programmer. I do my best work when given vague requirements, lots of freedom, and a blank editor. And in my experience, contrary to conventional wisdom, people do not know what they want… That’s a whole post in itself, so I’ll leave it at that.

Not only do I dislike maintenance, I am actually not that great at being a maintenance programmer, whose job is to modify or add to an existing (usually unfamiliar) code-base. However, as much as I dislike it, I have to reluctantly admit that I’ve learned a lot from the experience. I hate it so much that my mind is constantly looking for ways to make it easier. Here is what I wish I had.

Specifying and seeing scope more precisely. For example, I should be able to say these 2 functions can call this, or just the class Foo and this class’s testing module, or this is a project entry-point. Actually, what would be better is to have intended scope, and then have your tool tell you what actually refers to it. (To this end, it would help if dynamic lookup were restrictable. Perhaps whether a function can be looked up dynamically should be a metadata tag on the function.) Additionally, I don’t want to have to search every time I want to see the references to a code object. I want to be able to see this in real-time as I move my cursor, click, or even mouseover something in my editor. The way it is now, finding references feels expensive because you have to search and it takes a long time. As a result, I only do it when I feel like I really have to, which is not that often. But if it were cheap, I would think of it differently. Perhaps I would use it to learn how things are related; I’m really not sure. But I know I would use it differently.
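The "intended scope versus actual references" idea can be roughed out with Python's standard ast module. Everything here is hypothetical: the SOURCE snippet, the INTENDED table and the callers_of helper are invented for the sketch. It only sees direct calls by name, which is exactly why restricting dynamic lookup would help.

```python
import ast
from typing import Set

# A made-up module to analyze.
SOURCE = '''
def helper(x):
    return x + 1

def allowed_caller():
    return helper(1)

def surprise_caller():
    return helper(2)
'''

# Hypothetical declaration of intended scope: who is supposed to call what.
INTENDED = {"helper": {"allowed_caller"}}


def callers_of(source: str, target: str) -> Set[str]:
    """Names of top-level functions whose bodies directly call `target` by name."""
    tree = ast.parse(source)
    found = set()
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Name)
                        and inner.func.id == target):
                    found.add(node.name)
    return found


actual = callers_of(SOURCE, "helper")
unexpected = actual - INTENDED["helper"]  # references outside the intended scope
```

Here unexpected ends up containing surprise_caller, which is the kind of thing I would want my editor to flag as I move the cursor. Calls made through eval or getattr would slip past this analysis entirely.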

Comments and function names should be indistinguishable from log statements, or at least toggle-able. My editor should allow me to toggle logging of function arguments whenever the function is called. I should never have to type log("in myFunction x=" + x + ", y=" + y), yet somehow I always do. It should be as simple as log(x, y), and this could be solved with something as simple as macros that output the arguments that were evaluated, in addition to the values of those arguments.
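Short of editor support, something close to this can be approximated with a decorator. The following is a minimal Python sketch; logged, LOGGING_ENABLED and LOG_LINES are names I made up, and the log lines are collected in a list rather than printed just to keep the example easy to check.

```python
import functools
import inspect

LOGGING_ENABLED = True  # hypothetical global toggle an editor could flip
LOG_LINES = []          # collected here instead of printed, for the sketch


def logged(func):
    """Record each call as 'name(arg=value, ...)' while logging is enabled."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if LOGGING_ENABLED:
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()  # include defaulted parameters in the log
            rendered = ", ".join(
                f"{name}={value!r}" for name, value in bound.arguments.items()
            )
            LOG_LINES.append(f"{func.__name__}({rendered})")
        return func(*args, **kwargs)

    return wrapper


@logged
def my_function(x, y=2):
    return x * y


result = my_function(3)
```

After the call, LOG_LINES holds the single entry my_function(x=3, y=2), and nobody had to type the argument names twice. For one-off statements, Python 3.8 and later also accept f"{x=}, {y=}", which prints each expression's text along with its value.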

Inline, implicit functions instead of code “paragraphs” which are merely lines of code separated by blank lines. These implicit functions would show which variables the block is actually touching as parameters to the function (which can be inferred). These new kinds of function-paragraphs could easily be pulled out into actual functions, with the function application automatically generated in that spot of the code.

Viewing code fully inlined, highlighting patterns. Long functions, as opposed to many short functions, are actually easier to read if you’re not familiar with the code. For one, it is fewer levels of indirection. But also, you can think of functions in a module as a mini domain-specific language. For domain experts, it is infinitely easier to use this terminology. However, when you’re new to the field, you can’t understand anyone because they’re all using what seems like cryptic lingo.

Before the year 2000, who in the world knew what a hanging chad was? But the few experts who coined the term must have found it useful. The first time you heard “hanging chad”, you probably asked, “What the hell is a hanging chad? Pregnant chad??” Likewise, a maintenance programmer should be able to view any function calls he pleases inlined, basically saying, “Instead of using this term, tell me what you mean in the standard terminology that I already know.” Of course, the editor won’t literally inline the code by copying and pasting; it only displays it that way. After all, code is just data. We can view the code as if it were inlined without throwing away the fact that there is a pattern of shared code. The editor could use something akin to the implicit function-paragraphs I described to keep the patterns already created by the original programmer, and at the same time, slowly teach the maintenance programmer the domain lingo.

Oftentimes when a few classes have commonality, they are refactored into an abstract base class containing the common code and several subclasses which implement or re-implement dynamically-dispatched methods. When you understand the flow of all the classes, this is a great way to factor out patterns and reuse code. However, when you’re unfamiliar with the code, it’s damn-near impossible to look at the flow to try to modify it. A method in the base class is possibly used by the derived classes, but not necessarily. Some might use it, but some might not. Others could use it but also add to it by overriding and explicitly calling the overridden method. There is no adequate tool that I know of that will show me the collapsed view of code for a derived class, to see inherited and overridden methods and all — the final result of all the abstraction and reuse. Firebug does this for inspecting CSS.

Firebug shows you the rules that cascade together to style each element. Rules are sorted in the order of precedence, and properties that have been overridden are stricken out. Each rule has a link back to the file where it came from which you can click to jump to the line.

Any web developer will tell you this is the most useful thing in the world; why not do this for methods in an OO programming language…? With a function, at least you can apply it to get its result in specific cases. With a macro, you can expand it. But with abstract classes — abstractions over classes — you’re stuck having to imagine it all yourself. Sure, you can instantiate the class and then call the methods, but you’d have to do that for each method. Depending on your language, sometimes it’s not even clear what all the methods of a given object are, let alone the source code that created them.
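As a taste of what a Firebug-style view of a class could compute, here is a small Python sketch that walks a class's method resolution order and reports, for each method, which class wins and which definitions were overridden. Base, Derived and collapsed_view are invented for the example, and a real tool would have far more to handle (properties, mixins, metaclasses).

```python
import inspect


class Base:
    def load(self):
        return "base load"

    def save(self):
        return "base save"


class Derived(Base):
    def save(self):  # overrides Base.save and also calls it
        return "derived save; " + super().save()

    def export(self):
        return "derived export"


def collapsed_view(cls):
    """Map each method name to (winning class, [classes whose version is overridden])."""
    view = {}
    for klass in cls.__mro__:  # most-derived first, like cascade precedence
        if klass is object:
            continue
        for name, member in vars(klass).items():
            if not inspect.isfunction(member):
                continue
            if name not in view:
                view[name] = (klass.__name__, [])
            else:
                view[name][1].append(klass.__name__)  # the "stricken out" entry
    return view


view = collapsed_view(Derived)
```

For Derived, the view says that save comes from Derived with Base's version overridden, export comes from Derived, and load is inherited untouched from Base. The standard inspect.getsourcelines function could supply the link back to the file and line, whenever the source is available.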

Currently, in order to reuse a piece of code, you have to pull it out (of context) and name it so that the 2 places you want to use it can refer to it (whether it’s a class, a function, a variable, etc.). But pulling something out and having 2 things be semantically linked are separate things. The fact that our tools do not allow us to do one without the other (when sometimes I really do only want one) is an indication of inadequate tools.

There is a general pattern of solving programming problems with an extra level of indirection. However, too much indirection creates more problems, especially when working with code you aren’t familiar with. Or put another way, when someone else works with code you wrote. It is a limitation of the human mind that we must take into consideration.

Programs are written by people, and they must also be read by people. For practical reasons, simplicity and clarity — which amount to human readability — are the first things to be sacrificed when the only benchmark for code being shipped is whether it executes correctly. Like rushing the composition of an essay, which results in prose that can be interpreted by a reader but is not necessarily well-written, code that is rushed is often similarly poorly-written.

It has been said that programs should be written primarily for people, and secondarily for computers to execute. The more I learn and the more I experience, the more I agree with this. And our tools should help this cause.

May 19th, 2008

The Prototype-Production Knob

Once you’ve seen the progression that software goes through from birth as a hacker’s one-night-stand, to 3-man garage-startup’s baby, to Small Corp’s stubborn adolescent, to The-Next-Microsoft’s bloated 1000-developer software-engineering nightmare… you simply can’t ignore it and the programming language feature it seems to demand.

Hardening

In the beginning when you have an idea, you want a flexible medium to experiment with. You really don’t know where you’re going to end up, so you want your medium to just get out of the way and let your creative juices flow. It’s the same in every industry really, whether it be software engineering, architecture, painting, or writing. But once you have a product, and hundreds of other people besides you care about the outcome of every little detail, everything from which shade of gray its background is to what happens when you press the Tab key when the last textbox of the final dialog of your Import Wizard has the focus, you have to worry about things like quality assurance.[1]

Software, like concrete, hardens over time becoming a rigid unmovable mass. This happens as the original developers move on or simply forget about code they wrote and haven’t touched for a while. Code gets pushed down into layers of abstraction, becoming black boxes that no one ever looks into unless something goes wrong. This is the natural progression as new building blocks get created by combining the functionality of older building blocks. The fringes of development churn like mad, but over time, newer modules start depending on them, weighing them down by discouraging change.

On top of that, sheer code size prevents change. Once you have a massive software system built from thousands upon thousands of man-hours, you simply can’t throw it away and start from scratch. Maybe in an ideal world where you didn’t have to worry about paying rent… but if you intend to make a living off of software, it simply isn’t an option.

Once a software system has grown so large, you’re stuck with it. Steve Yegge talked about this in a blog post, but I think most people who skimmed it just voted it up on their favorite news site and moved on to the next article. This is so fundamental: size! Not some theoretical cyclomatic metric. Size! And part of the reason size is so important is that once you have a sufficiently large code-base, re-writing it is no longer an option. Which means changing it is no longer an option.

The code literally solidifies!

The Knob

Concrete naturally hardens over time. But what if your concrete were rigid even when you wanted to constantly mold it? Or what if it never completely hardened, even after you found the perfect form? That is what programming languages are like today. You have to choose the static language that’s too rigid to prototype with or the dynamic language that never completely hardens even in production.

HTML and PHP are good examples of languages that never completely harden. They were great at first; it was so easy to dive right in, and they blew up in popularity as a result. But years later we are stuck with large websites and code-bases which are living nightmares to maintain. Although this is partially the responsibility of the developers, as good developers can write good code in any language, the language itself should support this transition, not hinder it.

On the opposite side, we have languages like ML and Haskell whose type-systems are so strict that most people give up on them before writing a single useful program.[2] They are not flexible enough for constant molding. I, of all people, understand the benefits of static type-systems. But I’m beginning to realize that when you’re prototyping, it’s okay to have some runtime errors. In fact, it’s desirable, because prototypes are by nature underspecified. Any error that is caught statically must necessarily be determined by analyzing the source code alone, not its execution, which means that I must write more code to single out those error cases. Like the None branch in a case expression that “shouldn’t happen”, it is literally error-handling code that is required by the compiler.

Error-handling code is, by definition, code that deals with uncommon special cases. It’s common knowledge that most code paths don’t get exercised until perhaps years after being out in the wild. Why then should I care about catching them all statically in my prototype, or even in the first released version? It’s naive to think I even can catch them all.

And the problem with writing all this extra code is not that it takes longer to write the first time, but that it takes longer to change every time you touch it, which happens many, many times while you are still in the prototyping phase and the code is constantly churning. So the code-base starts out rigid and hardens even faster.

What we need is a dial, a knob, that can be turned toward whichever end we need: flexibility for a prototype or rigidity for a production app.

Breaking Things

The problem stems from the fact that when you modify code you didn’t write, you can’t see the big picture. You only have a local view of the code you’re modifying, so you don’t completely understand the ramifications of your changes.

People fail to respect the great differences between writing new code and {modifying or maintaining} code they didn’t write.

Sure, both require knowledge of programming, but they’re completely different activities. In the film industry, the corresponding activities are called completely different things: directing and editing. Both require knowledge of film making, and experience doing one can help improve skills in the other, but they are fundamentally different tasks. When I am writing code from scratch, I start with a blank editor and combine language constructs that I am already intimately familiar with. When I am modifying code that I am not familiar with, my biggest concern is: will this change break anything? And most of the time, that’s a difficult question to answer because I only have a local view of the code.[3] I don’t completely understand the entire system and can’t see the big picture of what the change will affect. So I usually end up being extremely conservative, inevitably creating cruft that is otherwise unnecessary. Done over and over again, this can be extremely harmful to a code-base.

Basically, if you’re modifying someone else’s code, it’s because that code can not, for one reason or another, be re-written. That code is more rigid, closer to the production end of the spectrum. Now… a lot of effort (and resources) goes into making sure that production code works. So when you’re adding to or modifying code written by someone else, you don’t want to change anything that already works and undo all that effort, nullifying the resources already spent on it.

Today’s PLs

It would be nice if our language allowed us to keep our code nimble as long as possible, and then, when we were ready to push code into an abstraction or let someone else maintain it, solidify the code on cue.

Perl’s use strict allows you to adjust the amount of static checking done on a program. However, no sane programmer that I know of ever turns this switch off for a program more than a few lines long. This seems to say that without the strict option enabled, the language is too flexible even for prototyping. Paul Graham even experimented with implicit variable declarations in Arc, a language designed specifically for prototyping, but decided against it.

The closest feature I know of that resembles what I’m thinking of is optional type declarations. Languages which allow programmers to omit types and optionally insert type-constraints when and where they please are a step in this direction. This allows for flexibility during the prototyping phase and a few more compiler-checked guarantees once the constraints are inserted. Additionally, it documents the code and allows the compiler to take advantage of type-directed performance optimizations, two things more valuable towards the production side of the spectrum. When an app is a prototype, performance usually isn’t as important as getting feedback on working features, and documentation is a waste because the code is more likely than not to change, rendering any documentation obsolete (and even misleading). Besides, you can always ask the developer who owns the code, as he’s still working on it and it’s fresh in his mind.
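Python's optional annotations give a feel for this. The interpreter itself ignores them, and an external static checker is what would normally give the compile-time guarantee. The sketch below fakes the knob at runtime instead; hardened and CHECK_TYPES are hypothetical names, and only plain classes such as int or str are supported.

```python
import functools
import inspect
from typing import get_type_hints

CHECK_TYPES = True  # hypothetical knob: off while prototyping, on when hardening


def hardened(func):
    """Check annotated arguments at call time; unannotated ones stay unchecked."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if CHECK_TYPES:
            bound = signature.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                expected = hints.get(name)
                # This sketch only understands plain classes such as int or str.
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(
                        f"{func.__name__}: {name} should be {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
        return func(*args, **kwargs)

    return wrapper


@hardened
def area(width: int, height):  # width is constrained, height is still free-form
    return width * height
```

Calling area(2, 3) works, and so does area(2, "ab") because height carries no constraint yet, while area("2", 3) is rejected. Constraints can be added one parameter at a time as the code settles.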

Lispers, I’m waiting for you to chime in right about now stating how Lisp has had this feature all along. And that’s great. But if people don’t understand why it’s so great, they won’t use or support it.

So how else can we tune a programming language from flexible to rigid? From dynamic to static?

Feature Flip-Flopping

I suppose that any feature that separates flexible languages from rigid ones is a candidate for being a knob in this regard. But I’m pretty sure this is fallow territory with lots of room for improvement.

For one thing, I think it would be useful to restrict which kinds of decisions are delayed until runtime. The more that is delayed until runtime, the more possibilities there are for errors that are uncatchable until the last moment, driving the cost of the errors up. If you can catch an error as early as compile-time, or even at edit-time with a little red squiggly underline directly in the editor, the cost is only a few moments of a developer’s time to fix it. But if that error is not caught until it’s being run by a client — heaven forbid, on a client’s 7-year-old desktop 273.1 miles away running Windows ME — not only is it extraordinarily difficult to reproduce and track down the error, but one of your paying customers is unhappy, and just might blog about how terrible your software is for all his friends to hear about it.

What kinds of decisions am I talking about? Ones that prevent reasoning about the code without executing it, like modifying the symbol table based on runtime values, calling eval, using reflection, or using dynamic dispatching. These things throw most, if not all, of your reasoning out the window. In general, it’s not possible to determine what the effect of a call to eval will be, so any guarantees are shot. With dynamic dispatching, it’s never quite clear at compile-time what code will be executed as a result of a function call, so again, just about anything could happen. All bets are off.

Again, these features are great for prototyping. They reduce the amount of code you have to write, reducing the amount of time you have to spend changing it while the code is still churning. Additionally, you are probably the one who wrote all the code, so there’s no issue of not being able to see the big picture to understand it.

However, at the same time, these features are bad for the maintainability of production code. It’s true that less code is easier to maintain than more code, as it is simply less that maintainers have to try to understand. But dynamic features actually make code more difficult to grok because they are more abstract. … Calculating offsets of fields. Generating code. Modifying code. Data-flow analysis. Code-transforming optimizations. All of these things are normal programming concepts. But if you add to the end of each phrase “in bed”… Sorry, I mean, if you add to the end of each phrase “at runtime”, they suddenly become horrors![4] In the same way that pointers are simply more abstract, so too are eval and dynamic features like it.

Am I suggesting that people should write code with eval and dynamic dispatching, and then when the code becomes stable, turn off those features and re-write the code without them? It does seem like the logical conclusion from the above observations.

This doesn’t sit right with me though. For one, it would mean re-writing code just when you wanted to solidify it, undoing all the testing effort that went into it.

The first thing that comes to my mind is: is there a way we can compile these features away when they’re switched off, perhaps by collecting data about runtime values and then generating static code which is functionally equivalent for the observed scenarios, explicitly triggering a runtime error otherwise? I honestly don’t know what the right thing to do should be, but I hope I’ve raised some interesting questions for others to consider.
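To show the shape of that last idea, here is a toy Python sketch. It does not generate code; it swaps in a frozen lookup table as a stand-in for "static code", and every name in it (Handlers, DynamicDispatcher, FrozenDispatcher) is invented for the example.

```python
class Handlers:
    def on_open(self, payload):
        return f"opened {payload}"

    def on_close(self, payload):
        return f"closed {payload}"

    def on_delete(self, payload):
        return f"deleted {payload}"


class DynamicDispatcher:
    """Prototype mode: resolve handlers by name at runtime and remember what was used."""

    def __init__(self, target):
        self.target = target
        self.observed = set()

    def dispatch(self, event, payload):
        handler = getattr(self.target, "on_" + event)  # decided only at runtime
        self.observed.add(event)
        return handler(payload)

    def freeze(self):
        """Build a fixed table containing only the handlers seen so far."""
        table = {event: getattr(self.target, "on_" + event) for event in self.observed}
        return FrozenDispatcher(table)


class FrozenDispatcher:
    """Production mode: a closed table; anything unobserved fails loudly."""

    def __init__(self, table):
        self._table = dict(table)

    def dispatch(self, event, payload):
        if event not in self._table:
            raise RuntimeError(f"event {event!r} was never observed while prototyping")
        return self._table[event](payload)


proto = DynamicDispatcher(Handlers())
proto.dispatch("open", "a.txt")
proto.dispatch("close", "a.txt")
frozen = proto.freeze()
```

The frozen dispatcher behaves the same for open and close, but asking it for delete triggers the explicit runtime error, even though the handler exists. Whether that is acceptable depends entirely on how well the prototype runs covered the real scenarios.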


1. Ironically, because you worry about quality assurance and have an entire process to ensure quality of a product before releasing it, you increase the time it takes for a release iteration, thus increasing the cost of bugs or missing requirements. And this is on top of whatever extra cost it took to QA in the first place. But I guess this is like car insurance.
2. One problem is that once a type-system becomes sufficiently complicated, it requires an intimate understanding of it to write programs in the language, which can be a barrier to learning at the least.
3. This is where unit- and regression-tests become imperative.
4. That is, they become horrors to all but compiler writers and metaprogrammers.
Apr 18th, 2008

PL What-Ifs

What if you compiled a source language to multiple target languages, gaining the benefit of more than one platform?

For example, what if you were creating a brand new language that you wanted to be type-safe with all the intricacies of Haskell’s type-system, but you also wanted to take advantage of libraries written in Ruby? And what if you created a compiler that first compiled your program to Haskell, ran it through ghc’s type-checker, and then, if it passed, compiled your program to Ruby? You’d get the benefit of Haskell’s type-checker and Ruby’s libraries.

What if a language wasn’t statically typed or dynamically typed, but instead had a knob that could be tuned in one direction or the other depending on the situation?

For example, what if you wanted the benefits of static type-checking, but accessing the symbol table or using eval in just one or two places in your code would make it infinitely simpler, at the cost of a possible runtime error? And no, this is not the same as implementing everything yourself with some sort of variant type, as all Turing-complete languages could. I’m thinking something more like Haskell’s IO monad that allows you to execute impure code in an otherwise pure setting. In the same way that the IO monad infects everything it touches, so too would the dynamically-typed-code “monad”. But that’s just one way of doing it. Another way would be to specifically declare something to be a variant type whose properly typed value was implicitly projected out.

What if you could visualize the dependency graph of language objects like functions, modules, etc.?

For example, I’ve noticed that projects whose sub-projects have dependencies in a stack (i.e. more like a linear chain) are much easier to grok than those whose dependencies form an intricate cyclic graph. Would seeing these dependency graphs help in spotting possible complexity hot-spots, and thus, possible bug hot-spots? Or would visualizing the dependencies alone help us to better understand them? I’d expect my compiler to generate these automatically, of course, because it’s already doing the dependency analysis anyway.
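Even without pictures, telling a stack from a tangle is cheap once the graph exists. Here is a minimal Python sketch using the standard graphlib module (Python 3.9 and later); the project names and the build_order helper are made up.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical sub-project dependencies: each key depends on the names in its set.
STACKED = {"app": {"ui"}, "ui": {"core"}, "core": set()}
TANGLED = {"app": {"ui"}, "ui": {"core"}, "core": {"app"}}


def build_order(deps):
    """Return a dependency-first order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


stacked_order = build_order(STACKED)  # a linear chain has exactly one valid order
tangled_order = build_order(TANGLED)  # the cycle means there is no valid order
```

The stacked project comes back ordered core, ui, app, while the tangled one has no valid order at all. A tool could start by flagging exactly those graphs as candidate complexity hot-spots.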

What if you could inline and un-inline function calls at will as you were editing the code?

For example, some people are good at thinking very abstractly and like to factor out commonalities as much as possible to reduce code. After a point though, returns diminish as the code becomes unintuitive or “unreadable”, for example when even the simplest definitions that are used only twice get moved to a separate file. Where that point lies differs from person to person, however. So what if a sufficiently capable code-editor (i.e. a viewer for data that happens to be code) allowed different users, in addition to choosing skins, to adjust how many levels deep functions get inlined? Said another way, what if your editor allowed you to macroexpand and un-macroexpand the code you were editing (inline, not in an output buffer somewhere) at the push of a button, arbitrary levels deep?

…Let us all keep asking questions. About programming and everything else.
