Week 2 Notes — Arrays

Saturday Jan. 9 to Friday, Jan. 15

Week 2 Video (duration 2:24:58)

Complete your reading Chapter 1 of Kernighan and Ritchie in parallel with watching this week’s lecture. Malan is using different examples than Kernighan and Ritchie, but you are going to see a lot of parallels in the exposition nonetheless. Then, when you get to Chapter 2 of Kernighan and Ritchie, you will have enough examples under your belt to appreciate the grammar of C. A programming language is a grammar, no more and no less. As the compiler recognizes statements that conform to the grammar, it produces some machine code that goes into the executable file — mario.c is turned into mario. This actually involves four steps(!) as you’ll hear in this lecture. If the compiler encounters code that doesn’t match C’s grammar, then it produces as many helpful errors as it can and terminates without creating an executable. Computers are terrible at making correct inferences when there are elisions or minor mistakes. If your program doesn’t conform to C’s grammar, mostly it will just tell you (in gory detail), where it hit the conforming program lines. It certainly won’t make a reasonable inference of what you meant and try to continue.

This is a good moment to mention something Malan deliberately glossed over in Week 1. When you type make mario, you aren’t actually running the compiler. You are running make. In turn, make runs the compiler, which nowadays is no longer cc on most systems (that is what it is called in Kernighan and Ritchie’s book), but a newer program called clang which is short for “C language compiler.” Although clang does the same thing as cc, it does it in a very versatile way that allows for support of the ridiculously large variety of modern computer systems and even other computer languages. Running clang has so many options that it is very useful to let make run it for you rather than running it directly. On the other hand, knowing what make and clang are actually doing can be useful when trying to understand the error messages that will inevitably be spewed by make, and so that is Malan’s first topic.

After you have done this week’s problem set is a good time to read Chapter 2 of Kernighan and Ritchie. The idea of Chapter 2 is to start thinking of C as a rigorously-defined grammar. You’ll still mostly think in terms of the examples Malan has shown you and that you have written, but knowing that there is a grammar and that all the examples you have seen conform to a grammar is perhaps the first beautiful thing that belongs to the subject of computer science. Practically-speaking, knowing what some of the grammars elements are — e.g., statement, expression, declaration, constant, etc. — will help you interpret compiler error messages.

Problem Set 2

Problem Set 2 is to write two programs. One is readability.c. The other is either caesar.c or substitution.c, both of which are very simple examples from the field of cryptography.

My readability.c output for the Harry Potter text:

readability.c output

My caesar.c output for miscellaneous inputs:

caesar.c output

Some comments on caesar.c

First, here is a cheesy, fast workaround

You don’t really need to make a copy of plaintext to hold the ciphertext. You can work through plaintext one character at a time, encipher it one character at a time, and print the enciphered character one at a time. No copy ever needed!

Second, here is the real answer that does not dodge your question

The fundamental thing to understand is what does

int[] some_numbers = {7, 5, 3, 4, 2};

actually do?

The compiler notes that you have five ints requiring four bytes each, and that the initializer lists five of them, and therefore 20 bytes of memory will have to be reserved.

some_numbers is not all the items in memory, nor is it even the first item in the memory. It is called a pointer. It just says where the first item is.

When you later refer to some_numbers[0], that gets the contents of the first location. If you use the jargon that some_numbers is a “reference” then some_numbers[0] “dereferences” the pointer.

Imagine you knew that Skippy was 10 items down aisle four in a Von’s. shelf_stock[4][10] gets you Skippy. But shelf_stock itself is a location. It is not to be confused with either shelf_stock[0][0] which might have Cheerios, or of all of the shelf stock. shelf_stock is just a starting point.

Next, what does

int[] sorted_numbers = some_numbers;

do?

It does not reserve 20 new bytes of memory! It just makes a copy of some_numbers which isn’t what you wanted! To use the Von’s example, the new copy you’ve now got if you did

product[] other_shelf_stock = shelf_stock;

in other_shelf_stock refers to the exact same starting location in the same grocery store. If the store manager decides that Adam’s peanut butter goes at shelf_stock[4][10] then Adam’s peanut butter will now also be at other_shelf_stock[4][10]

C is a pass-by-reference language. Rarely do copies get made. When you want an actual copy (because you are opening another Von’s and they might not have the same store layout, but perhaps you want to start with the same layout), you have to do what is called a deep copy. You don’t yet have the tools yet to do a deep copy. You will need the malloc() function and the memcpy() function, and I suspect both of those are coming next week.

C is an outrageously high-performance language. You can only get higher performance by hand coding the assembler yourself and human time is so costly compared to computer time that almost nobody does that anymore. C’s syntax rarely implies something expensive (like making a deep copy) when it could imply something cheap (like making a new reference reference from an existing reference).

In coming weeks watch for the buzzwords:

pass by reference pass by value pointer dereference deep copy

I’m sure Malan doesn’t intend to leave you in the dark about this for long.