Tuesday, February 21, 2012

The Real Reason for Lexing

All too often I see introductory texts say that parsing is really a two phase process:
  1. lexing
  2. (actual) parsing
And they say that phase 1 is done for efficiency reasons. Which is right, but only in an indirect manner. The reality of the situation is that people need(ed) to do parsing using only 1 token of lookahead. If you don't split your stream into tokens, even trivial languages cannot be parsed deterministically: e.g. seeing an "i" you don't know (yet) if what follows is an if-then-else statement or the identifier "iamalongidentifier". However, you can hack around this problem by first tokenizing/scanning/lexing the input stream. Then the lookahead is the entire string "iamalongidentifier" (or "if"), which solves the determinacy problem. There you have it: the real reason for this (historical) split-up into phases.

Wednesday, February 8, 2012

Proving the Strong Induction Principle for Natural Numbers in Coq



The following following inference rule:

        P 0         forall n : nat, (P n -> P (n + 1))
(SI)    ----------------------------------------------
                       forall n : nat, P n


is the structural induction (SI) principle for natural numbers, where P is the predicate to be proven. It says that if you can prove a property P holds for 0 and also prove the implication P n -> P (n + 1) for any natural n, then P holds of every natural number.

In contrast, consider the following inference rule:

        P 0         forall n : nat, ((forall m <= n, P m) -> P (n + 1))
(CI)    ---------------------------------------------------------------
                               forall n : nat, P n


This rule is the complete induction (CI) principle for natural numbers, also called the strong induction principle or the course-of-value induction principle. The difference is that in the inductive case, you are allowed to use the inductive hypothesis for any value which is structurally smaller than the current value. This makes CI seem stronger than SI.

In Coq, the (automatically generated) induction principle for inductive types is structural induction. We can check this by (re)defining naturals:

Section mynat.

Inductive nat : Set :=
  | O : nat
  | S : nat -> nat.

The constructor O denotes the natural number 0 and the constructor S denotes the successor of a natural number. The structural induction principle generated by Coq:

Check nat_ind.

<pre> nat_ind : forall P : nat -> Prop, P O -> (forall n : nat, P n -> P (S n)) -> forall n : nat, P n </pre>

corresponds to the (SI) rule above.

End mynat.

Fortunately, the two induction principles can be obtained one from the other. This blogpost presents the formal proof of the strong induction principle from the structural induction principle for natural numbers.

Section InductionPrinciple.

Require Export Datatypes.

We first define the proposition to be proven by induction as a parameter of the section.
Parameter P : nat -> Prop.

We also need a technical lemma to perform case analysis of less-than-or-equal-to.
Lemma le_S :
  forall n m : nat,
    n <= S m -> n <= m \/ n = S m.
Proof.
  intros.
  inversion H.
  right. reflexivity.
  left. assumption.
Qed.

We obtain the strong induction principle by first applying structural induction to prove the predicate Q(n) = (forall m <= n, P m).
Theorem strongind_aux :
  P 0 ->
  (forall n, (forall m, m <= n -> P m) -> P (S n)) ->
  forall n, (forall m, ((m <= n) -> P m)).
Proof.
  induction n as [ | n' IHn' ].
    intros m H1.
    inversion H1.
    assumption.
    intros.
    assert (m <= n' \/ m = S n'). apply le_S. assumption.
    inversion H2.
    apply IHn'; assumption.
    rewrite H3.
    apply (H0 n'); assumption.
Qed.

It is then very easy to show that Q(n) implies P(n).
Lemma weaken :
  (forall n, (forall m, ((m <= n) -> P m))) -> (forall n, P n).
Proof.
  intros H n.
  apply (H n n).
  apply le_n.
Qed.

And we obtain a proof of the strong induction principle by applying strongind_aux to show Q(n) for all n and then apply weaken to obtain P(n) from Q(n).
Theorem strongind :
  P 0 ->
  (forall n, (forall m, m <= n -> P m) -> P (S n)) ->
  forall n, P n.
Proof.
  intros.
  apply weaken.
  apply strongind_aux; assumption.
Qed.

End InductionPrinciple.
¯