Notes on FDL Timeline

1. Foundations of Logical Libraries

Stuart Allen's notes on the structure of the Formal Digital Library, Notes on the Design and Purpose of the FDL, lay out a foundation for the notion of a logical library described in the project Goals section of the Web page. Work on this topic progresses as we approach implementing the capabilities described in Allen's notes.

2. Formal System Cooperation (see Goals -- Section 1.2, Technical Challenges)
2.1 Create model-sharing environment for the Logic of Events (LoE) (Nuprl, MetaPRL, PVS, JProver combined)

For critical infrastructure protection, distributed algorithms are very important. These algorithms are also investigated by the Software Engineering section at the Naval Research Laboratory using the PVS theorem prover in the Timed Automata Modeling Environment (TAME). We want to demonstrate the value of the FDL in algorithm development by combining the capabilities of PVS, MetaPRL and Nuprl through the Formal Digital Library.

2.2 Installing beta-version of sentinels in FDL

It is not possible to simply combine results from different theorem provers. The exact conditions under which a combination is possible is a fundamental matter for logical libraries such as the FDL. One approach to keeping track of logical dependencies is presented in our foundational work on the FDL. We plan to concretely illustrate this mechanism on various shared libraries such as number theory, lists, graphs, trees, and protocols.

2.3 Metamathematics FDL theory -- basis for relating theories

The basic logical results allowing sharing among Nuprl, HOL, and PVS have not been formally supported. We intend to lay the ground work for this by linking theoretical results to the sentinels mentioned above.

2.4 Formal Symbolic Algebra

Computer algebra systems are a substantial source of basic algorithmic mathematics. We are making a small effort to connect some of these algorithms to the provers. This is a major activity in Europe. Our approach is to use the module system of the MetaPRL logical framework to track the domains of a system such as Axiom. The module system and Kopylov's dependent records are both important for organizing the theory. The module system manages system content, like rules and tactics, and the records manage the formal parts like groups, rings, and fields.

2.5 Preparing to support Larch proofs

The corporation ATC-NY has allocated funds to transfer a large number of Larch theorems to the FDL. They are also interested in restoring their Larch prover within the FDL. They have done some preliminary work in this direction, and they plan more when we are ready.

Larch is a language with overloading so after parsing there is a "sort-checking" phase that assigns a sort to every term and chooses a signature for every function symbol. Before we can do any semantic analysis (in particular the checking of proofs) of the Larch theories we have to get them sort-checked and store in the library a version with all function symbols resolved. So the first tool we must build is the sort checker. This is non trivial because Larch has a complex syntax for specifying how traits include other traits with renaming, and that defines the possible signatures for the function symbols.

2.6 Verified decision procedure for abstract algebra and arithmetic

The MetaPRL system is building a very general arithmetic decision procedure. The procedure is being used to derive results in abstract algebra. As the algebra work progresses, it will be used to derive more general axioms for the decision procedure. Elements of this work will be included in the FDL and can be contributed to QPQ as well. Since the procedure generates primitive proofs, it does not need to be verified directly.

3. Collections (see Goals -- Section 2, Computational Content)
3.1 Automating PVS acquisition and posting

Currently the methods used for acquisition of PVS libraries have involved an amount of human supervision that is not practical for repeated efforts of acquiring PVS files for the FDL. The methods should be completed so that no human intervention is required beyond targeting the PVS files.

The acquisition of PVS proofs requires significantly more human intervention because the PVS proof engine often crashes during the process. We must either find a more reliable method of running PVS for proof acquisition, or develop a fault-tolerant method.

Currently although a large number of proofs have been collected into the FDL, they have not been included in our web presentations. Methods for presenting them have yet to be finished, and more incremental methods for managing them as data are required. We must also account for proofs we have been unable to acquire.

A final stage of this work is to verify that the automatic acquisition is correct by replaying the proofs in the FDL.

The entire acquisition, posting and verification work is very compute intensive.

3.2 Collecting formal metadata for PVS libraries

While establishing links between objects is fundamental to acquisition for the FDL, it is quite important to assemble data from those objects to forestall some of the web walking clients would otherwise have to do themselves. For example, it is of general interest among clients which objects refer directly or perhaps indirectly to a given object. In addition to adding such cross-referencing indices to the FDL itself, there are further issues of organization to make them surveyable by human browsers of the Web presentation, since the indices can become long.

Extending the indices for cross-referencing to provide elaborate larger organizations of objects is a further design and programming problem. This can involve the collection of related objects such as all lemmas needed for a proof.

It has been particularly challenging to collect metadata for PVS because there is no notion of primitive proof. We expect to encounter this problem with other proof systems as well.

3.3 Preparing to support Larch proofs

Larch is a language with overloading so after parsing there is a "sort-checking" phase that assigns a sort to every term and chooses a signature for every function symbol. Before we can do any semantic analysis (in particular the checking of proofs) of the Larch theories we have to get them sort-checked and store in the library a version with all function symbols resolved. So the first tool we must build is the sort checker. This is non trivial because Larch has a complex syntax for specifying how traits include other traits with renaming, and that defines the possible signatures for the function symbols.

3.4 Other libraries

We might be able to add HOL and Minlog libraries pending technical discussions over the summer, and we are alert for other opportunities, e.g. a possibility with Mizar.

4. Access to Content
Access Media
4.1 Posting to other presentation media (Helm, OMDoc, MathML)

We have connected Nuprl libraries to Helm and are discussing connections to OMDoc and MathML. We plan to make it possible to connect all FDL content to these presentation resources.

4.2 Dynamic FDL server architecture

Full access to the formal knowledge content is available only at the FDL server and only using the editors and navigators provided for the FDL. Here there is complete access to proofs as well as theorems, definitions, and algorithms. Execution of algorithms is principally in ML, Lisp, and OCaml.

There are currently two ways to access the FDL via web servers: (1) a limited demonstration utility showing that XML formatted objects can be retrieved from the FDL, and (2) projections of material from the FDL intended to show to readers the kind of content available. Three advances to be made are giving a full access to the FDL via a server; supplementing the precomputed projections for browsing with a companion server for dynamic browsing requests too expensive to precompute generally; dynamically tying the relation between the browser presented information and the active FDL more directly, i.e., making the browseable material more a "view" than a "projection."

Semantic Access
4.3 Semantic anchoring of expository text

We will produce examples of expository text that are "semantically anchored" in the formal proofs and definitions of the FDL that support it. The first example of a semantically anchored research paper will be one about Event Systems, to which great value can be added by linking it to the extensive FDL material corresponding to the subject matter of that paper.

4.4 Concise Annotation of FDL content

Work is underway for developing methods to add concise annotations to FDL objects. The purpose of the annotations is to provide brief paraphrases in words for formal concepts and entities of the FDL. These complement more expository texts for readers and also serve as a more focussed basis for word-based search since each concise annotation is about fewer subjects. Further, these concise annotations are considerably less expensive to produce by knowledgeable annotators than are broader expository texts; they are simpler technically to produce since they consist simply of ordinary text, and are therefore amenable to elementary creation through forms on the Web.

Although it is of little intellectual interest, we must provide some recommendations to readers as to which word searches are likely to bear fruit. This could be as simple as an index of prepared search buttons.

4.5 Continued exploration of formula-based search

Concise annotations (as well as semantically anchored texts) serve as entry points via word-search to the FDL contents. But once one has the formula structure in hand, one has a radically different means for specifying search criteria. For example one could specify what it means to be a theorem expressing distributivity of any binary operation over any other binary operation. Development of methods for specifying useful patterns is underway, with particular focus on automatically identifying operators that can be considered interchangeable for most purposes based upon mining theorems relating them.

5. Formal Content Creation (see Goals -- Section 2, Computational Content)
5.1 Create model-sharing environment for the Logic of Events (LoE) (Nuprl, MetaPRL, PVS, JProver combined)

For critical infrastructure protection, distributed algorithms are very important. These algorithms are also investigated by the Software Engineering section at the Naval Research Laboratory using the PVS theorem prover in the Timed Automata Modeling Environment (TAME). We want to demonstrate the value of the FDL in algorithm development by combining the capabilities of PVS, MetaPRL and Nuprl through the Formal Digital Library.

5.2 General content production

A significant part of the overall project effort is the production and integration of new content. Some of this arises as preexisting libraries are moved into the FDL, annotated, and posted. As part of this process, formal metadata is collected and displayed.

Other content arises as we expand the scope or depth of the existing collections, e.g. enhancing graph theory, adding new algorithms such as red/black trees, and adding new theories such as the logic of events. In all cases the libraries are integrated into the FDL, and they are used to test the acquisition and posting capabilities.

5.3 Verified decision procedure for abstract alg and arith

The MetaPRL system is building a very general arithmetic decision procedure. The procedure is being used to derive results in abstract algebra. As the algebra work progresses, it will be used to derive more general axioms for the decision procedure. Elements of this work will be included in the FDL and can be contributed to QPQ as well (see Community on the FDL home page). Since the procedure generates primitive proofs, it does not need to be verified directly.

6. Publication and Communictions
6.1 Papers, lectures and conference preparation

We are writing a variety of papers about the FDL, and R. Constable has again been invited to speak about this work in Europe. Much of June and July will be occupied in preparing articles and presentations, which will appear on the Web site in due course.

The Web site keeps a running account of the publications.

6.2 FDL technical meetings (OMDoc, Peer-to-peer, PVS automation)

We have plans to work with Michael Kohlhase on OMDoc and MathML over the summer. We will also work with C. Jechlitschek in Ithaca this summer. We hope to have a meeting with a PVS staff member to help us automate the PVS acquisition process.

7. Graduate Student Supervision

The professors and research assistants are heavily involved in supervising the work of the graduate students. In some cases involving implementation, this requires several hours per week.

8. System Support

Maintaining and improving the FDL involves a great deal of basic system support. It is necessary to operate several theorem provers, (JProver, MetaPRL, Nuprl, and PVS for now; others later) and relate them. We also support a VNC connection to the FDL.