AI/ML News & Innovations Hub

(You need to know programming to understand this post. If you know what linked lists are, that’s enough to get the general point, but more knowledge would be more helpful.)

Within the Programming Languages community, there’s a subcommunity that thinks a lot about education, especially for introductory courses. Two main approaches are SICP approach and the HtDP approach (which addresses shortcomings in SICP). I have taught the SICP approach several times at Berkeley using Scheme, and have talked with people who have taught the HtDP approach several times. At Berkeley, the first thing we teach is the sentence data structure (from Simply Scheme), whereas HtDP starts with the linked list data structure. I want to talk about one advantage of the sentence approach (but note that I’m completely ignoring the main justification for HtDP’s approach, which is mostly orthogonal to this point).

First, an aside, to explain what these data structures are. With the linked list approach, students are only given cons, car, cdr, and the empty list '(). (You may know these as cons, first, and rest, or cons, item, and next.) They write standard list functions like list, append, and length themselves using recursion. Some examples:

> (cons 1 '())(1)> (define lst (cons 1 (cons 3 (cons 5 '())))> lst(1 3 5)> (car (cdr lst))3> (cons 7 lst)(7 1 3 5)> (cdr lst)(3 5)

Note in particular that the first argument to cons must be an element of the list, while the second argument must be a list. If you disobey this rule you get strange results (or possibly an error message, depending on what language you are using):

> (cons 1 3) (1 . 3) > (cons 5 (cons 6 7)) (5 6 . 7) > (cons lst lst) ((1 3 5) 1 3 5)

Sentences are similar, but they remove most of the restrictions on cons. Instead of cons, we have sentence, which can take any number of arguments, which can all be lists, or elements of the list. Instead of car, we have first. Instead of cdr, we have butfirst. In addition, we add selectors last and butlast. Sentences are typically made up of words and not numbers:

> (define request (sentence ‘pass ‘me ‘the ‘salt)) > request (pass me the salt) > (sentence ‘would ‘you request) (would you pass me the salt) > (define prefix (sentence ‘would ‘you)) > (sentence prefix request ‘please) (would you pass me the salt please) > (first request) pass > (butfirst request) (me the salt) > (last request) salt > (butlast request) (pass me the)

Sentences are implemented as linked lists that are forced to be flat (not nested). The sentence constructor looks at all of its arguments and smooshes them all together into a single flat list using combinations of cons, list, and append. first and butfirst are the same as car and cdr. last and butlast are implemented as recursive functions on lists. (This is a bit simplified — in reality these functions work on words too, but we can ignore that here.)

I am a fan of sentences because students don’t have to worry about the type and number of arguments that they give to sentence — it just works and does the obvious thing. On the other hand, with cons, you have to make sure that you only give it two arguments, the first of which is an element, and the second of which is a list. This is hard for students to keep track of in addition to all the other things they’re keeping track of. (Once we introduce cons to them, they have all sorts of bugs involving incorrect applications of cons.)

One proponent of the HtDP approach has suggested that linked lists are better because it’s easier to understand how it works. (Again, this is not the main argument for the HtDP approach.) cons is the simplest possible thing you can think of, all it does is “glue” any two things it is given together. On the other hand, sentence is fairly complicated — it’s not clear to introductory students how on earth it “knows” to distinguish between words and sentences and do the right thing with each of them. Students would have no idea what’s going on behind the scenes of the sentence abstraction.

One issue I have with this is that students don’t know what’s going on behind the scenes of cons either — “gluing two things together” is still an abstraction. Sure, sentences are a higher level of abstraction than linked lists, but I don’t see why that matters. Perhaps it’s easier to believe that cons works than to believe that sentence works.

But the real issue I see here is that this puts a focus on the operational semantics (how the thing works) instead of the denotational semantics (what the thing does). It seems to me that you should aim for the simplest possible denotations, not the simplest possible operations. (Use the data structure where it is easy to explain what it does, as opposed to how it does it.) One of the central tenets of programming is abstraction, where you don’t think (much) about how the components you are using work, but just know what they do so that you can build them into bigger components. The denotations of sentences are much easier to explain, understand and use than the denotations of linked lists (to introductory students, at least).

This reasoning suggests changes to the ways we teach other programming topics:

Data structures: It may be preferable to first simply list out all the data structures that the class will consider, list the methods that they support and their running times, and first teach the skill of how to pick data structures for the task at hand and later teach the skill of how to come up with new data structures and to implement them yourself.
Memory management: Instead of explaining how a C program has both a “stack” and “heap” and “pointers”, you could start working in a pedagogical language where there are no functions and memory is a single long (but not infinite) array. There are no registers and no calling conventions, but you could still have high-level primitive operations (such as drawing pictures), values in memory could have different sizes, and values could retain type information. You could then show how to build calling conventions, the stack, the heap, and the idea of pointers from there. (This is all assuming that students already understand arrays from some other higher level language.)
Security: Rather than delving into real-world security problems (SQL injection, cross-site scripting, buffer overflows), start with a pedagogical application with security flaws that don’t depend on an intimate understanding of the underlying programming language or technology or protocol that’s being used. For example, the application could have TOCTTOU vulnerabilities — perhaps when you click “sign up and get $25 added to your account”, it redirects you to a page where you have to confirm, after which the $25 is added. It only checks whether you’ve already signed up for a free trial on the first page, so you can click the button many times and confirm all of them, to get as much money added to your account as you want. It could allow you to ask queries about your friends, but simply assume that the name you enter is a friend instead of checking, leading to privacy violations (bad input validation is what caused Heartbleed and is typically the entry point for buffer overflow attacks). It could have a secret admin panel, that relies on security through obscurity of the URL. You could also build a simple interpreter for a programming language and show how using this language with user input allows for injection attacks. (It’s not clear that this is simpler than just showing SQL injection though.)

Teaching from Simple Abstractions