Tuesday, April 8

The "boolean" disease

I was just reading the linux kernel coding style guide, and I am once again frustrated by how poorly people understand Boolean algebra.

Here's the offending quote:

Functions can return values of many different kinds, and one of the most common is a value indicating whether the function succeeded or failed.  Such a value can be represented as an error-code integer (-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure, non-zero = success).

Mixing up these two sorts of representations is a fertile source of
difficult-to-find bugs.  If the C language included a strong distinction between integers and booleans then the compiler would find these mistakes for us... but it doesn't.
 So what's wrong with that?

It's essentially a failure of the entire educational system that a statement like that goes unchallenged.

Here's an example of Boolean multiplication on integers:
┌──┬─────────────────────────────┐
│*.│ _4  _3 _2 _1 0  1  2   3   4│
├──┼─────────────────────────────┤
│_4│  4  12  4  4 0 _4 _4 _12  _4│
│_3│ 12   3  6  3 0 _3 _6  _3 _12│
│_2│  4   6  2  2 0 _2 _2  _6  _4│
│_1│  4   3  2  1 0 _1 _2  _3  _4│
0│  0   0  0  0 0  0  0   0   0│
1│ _4  _3 _2 _1 0  1  2   3   4│
│ 2│ _4  _6 _2 _2 0  2  2   6   4│
│ 3│_12  _3 _6 _3 0  3  6   3  12│
│ 4│ _4 _12 _4 _4 0  4  4  12   4│
└──┴─────────────────────────────┘

And here's an example of Boolean addition on integers:
┌──┬─────────────────────┐
│+.│_4 _3 _2 _1 0 1 2 3 4│
├──┼─────────────────────┤
│_4│ 4  1  2  1 4 1 2 1 4│
│_3│ 1  3  1  1 3 1 1 3 1│
│_2│ 2  1  2  1 2 1 2 1 2│
│_1│ 1  1  1  1 1 1 1 1 1│
0│ 4  3  2  1 0 1 2 3 4│
1│ 1  1  1  1 1 1 1 1 1│
│ 2│ 2  1  2  1 2 1 2 1 2│
│ 3│ 1  3  1  1 3 1 1 3 1│
│ 4│ 4  1  2  1 4 1 2 1 4│
└──┴─────────────────────┘

Notice how, if we restrict ourselves to the 0 and 1 we get "binary and" and "binary or".

Anyways, somewhere along the line, someone (probably Henry Sheffer) decided that "logical not" should be a part of boolean algebra. Essentially, it's "proof by vigorous assertion". Algebras existed before Henry Sheffer, and Boole's work existed before Henry. Henry's work proved an equivalence between logic and Boole's system - for the case where we restrict ourselves to 0 and 1.

But what do we gain from this?

Answer: not much.

Meanwhile, there's another way of formulating logic using 0 and 1 which gracefully incorporates the idea of logical negation:

Instead of using Boole's multiplication (which is defined on integers) we can instead use regular multiplication. And, for logical not, we can use 1-x. And that's all we need, we can use Henry Sheffer's approach to define logical or: x OR y is 1-((1-x) * (1-y)).

And what do we gain from this?

Answer: probability and logic can be seen as two facets of the same system.

Something like this:

┌───┬──────────────────────────────────────────────────┐
│*  │0 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1  │
├───┼──────────────────────────────────────────────────┤
│0  │0 0    0    0    0    0    0    0    0    0    0  │
│0.1│0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1│
│0.2│0 0.02 0.04 0.06 0.08 0.1  0.12 0.14 0.16 0.18 0.2│
│0.3│0 0.03 0.06 0.09 0.12 0.15 0.18 0.21 0.24 0.27 0.3│
│0.4│0 0.04 0.08 0.12 0.16 0.2  0.24 0.28 0.32 0.36 0.4│
│0.5│0 0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5│
│0.6│0 0.06 0.12 0.18 0.24 0.3  0.36 0.42 0.48 0.54 0.6│
│0.7│0 0.07 0.14 0.21 0.28 0.35 0.42 0.49 0.56 0.63 0.7│
│0.8│0 0.08 0.16 0.24 0.32 0.4  0.48 0.56 0.64 0.72 0.8│
│0.9│0 0.09 0.18 0.27 0.36 0.45 0.54 0.63 0.72 0.81 0.9│
│1  │0 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1  │
└───┴──────────────────────────────────────────────────┘

┌───┬──────────────────────────────────────────────────┐
│or │0   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1│
├───┼──────────────────────────────────────────────────┤
│0  │0   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1│
│0.1│0.1 0.19 0.28 0.37 0.46 0.55 0.64 0.73 0.82 0.91 1│
│0.2│0.2 0.28 0.36 0.44 0.52 0.6  0.68 0.76 0.84 0.92 1│
│0.3│0.3 0.37 0.44 0.51 0.58 0.65 0.72 0.79 0.86 0.93 1│
│0.4│0.4 0.46 0.52 0.58 0.64 0.7  0.76 0.82 0.88 0.94 1│
│0.5│0.5 0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95 1│
│0.6│0.6 0.64 0.68 0.72 0.76 0.8  0.84 0.88 0.92 0.96 1│
│0.7│0.7 0.73 0.76 0.79 0.82 0.85 0.88 0.91 0.94 0.97 1│
│0.8│0.8 0.82 0.84 0.86 0.88 0.9  0.92 0.94 0.96 0.98 1│
│0.9│0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1│
│1  │1   1    1    1    1    1    1    1    1    1    1│
└───┴──────────────────────────────────────────────────┘

So why the mixup?

In part: people are lazy. I know I'm pretty lazy about this issue, and sometimes careless of how I research it. But I'm certainly not alone. People mostly just don't care. And *that* says a lot about the value of education.

Education has a lot to do with satisfying the teacher, and not so much about historical accuracy. People like to refer to names that most people don't bother looking up (who has the time?). And of course, this suggests that educational literature is rife with mistakes - most of it is crud (but not all).

Still, there's the issue of naming. If "boolean" is the wrong name for the concept people are getting at when they talk about distinguishing between "booleans and integers" what are some good names?

One obvious name might be "logical". That's closer to accurate than "boolean". And that leads to "true" and "false" being satisfying names for variables of this type. In the above kernel style guide excerpt, I think it would be good to replace "boolean" with "logical value".

Another obvious name might be "bit". And a variable of type "bit" really ought to be able to represent the values 0 and 1.

But that won't stop fans of "proof by vigorous assertion", will it? For them, historical accuracy and utility take back seat to vehemence. It's actually kind of amusing to watch.




No comments:

Post a Comment