Annual Report, 2002-2003

Building Interactive Digital Libraries of Formal Algorithmic Knowledge

Contract Number ONR N000014-01-1-0765
Reporting Period: May 1, 2002 -- April 30, 2003
Submission Date: June 1, 2003

Prepared by:

Robert L. Constable
Cornell University, 4130 Upson Hall, Ithaca, NY 14853

Summary

This is a project to design and create a software system for sharing formal algorithmic mathematics among theorem provers, and for making formal algorithmic mathematics accessible to people who value verified accounts of algorithms. The project is also committed to creating interesting specimens of formally explained algorithms. In the first year, we invested heavily in building a software infrastructure that includes a prototype Formal Digital Library (FDL) and procedures for storing formal content in it and procedures for presenting that content on the Web.

Our work enables a new approach to CIP/SW; we call it information-intensive infrastructure protection. We describe the rationale for this approach in this report.

This second year of the project has been a period during which the basic infrastructure and results established in the first year have borne visible fruit, and during which we have considerably enriched that infrastructure to support results anticipated in the third and critical year. During the next year (third) we expect to demonstrate our progress over two-and-a-half years and make a strong case for the optional two additional years of funding.

The most "visible" results are the collections of algorithmic knowledge posted on the Web from the Formal Digital Library (FDL). The collections are from three provers -- MetaPRL, Nuprl, and PVS. For the Nuprl and PVS collections, we have harvested formal metadata whose value is directly apparent.

We have also demonstrated new direct access to the FDL using a new navigator tool and VNC (Virtual Network Computing). Extensive documentation and user manuals are available at the FDL Web page.

Among the collections are important new verified algorithms such as Red/Black trees, a small collection of graph algorithms, and a linear arithmetic package used in theorem proving. These provide the basis for illustrative Web-based articles that are semantically anchored in the FDL collections.

The additions to the infrastructure and basic capabilities over this period are not yet as visible. We have written a considerable amount of code for harvesting formal metadata. This code will be critical as we work this year to automate the Web-posting process. We have also explored mechanisms for formula-based search and for automatically clustering the FDL objects based on the latent semantics of the link structure.

We have extensively studied and explained fundamentals of "Logical Libraries" and how one might effect them. The two theoretical issues are, first, how to build a repository of certified (digitally expressed) knowledge as opposed to mere information, and second, how to enable different, sometimes incompatible, methods of certifying such knowledge to be accommodated in the same repository. Different clients may have radically different criteria for what counts as verified knowledge, and yet may be able to agree on substantial aspects and so share large parts of the knowledge repository. These issues are resolved by maintaining records of certification and strict accounting for bases of knowledge.

The "openness" of such repositories practically entails that criteria for certification include calls to agents external to the repository; that methods for developing material for contribution to a logical library can be developed with the same accounting methods used in it; and that contributions not interfere with extant content (for example, name collision must be avoidable).

Issues of cognitive accessibility include the attachment of informal explanatory material to the logical material, provision of extra-logical organization of logical and other content, as well as utilities for exploiting the logical content in concert with the informal content. We continue to investigate search based upon formula patterns, complementing search for words leading to content via concise annotations (see section below on Goals).

Larger social issues are discussed, including the cooperation of distinct repositories in ways that respect their independence and account for integrity of content composed from different sources, and the practicality of independent implementation of repositories to avoid an intolerable dependency on a few institutions.

We have continued our investigations of the theoretical foundation of inter-theory sharing, and the foundations for presenting a class of algorithms that is especially relevant to protecting the nation's critical software infrastructure, namely distributed algorithms and protocols. These foundational results were selected to complement investments being made by the Naval Research Laboratory in software engineering for reliability of distributed systems.

We have made significant progress towards practical methods of reflecting syntactic and computational aspects of logics. The bulk of the methods pertain to reasoning about expressions abstractly and are intended to be applied to the abstract syntax of typical contributions to the FDL itself. Logics sharing the FDL as a medium will then be able to refer to themselves and each other. This may be expected to expedite metamathematical work relating multiple logics, as it is hoped that it will provide a practical medium for enhancing various logics by reflection.

We expect to bring all of these threads together in the third year to demonstrate the power of information-intensive critical infrastructure protection. Our efforts are fundamental and progressive. That they are fundamental can be seen through their ties to two other MURI projects -- SPYCE, and Language-Based Security -- as well as through ties to research at the NRL. The fact that they are progressive can be seen from the fact that we are the only such project in the U.S., and we are highly competitive with European efforts that are far more extensive and well funded.

Index of Accomplishments

Basic FDL
    Navigator manual
    VNC interface
    Additional collections
    XML interface
        PVS proofs
        MetaPRL proofs

Web-based Presentation of FDL Collections
    PVS standard libraries
    Graphs -- PVS, Nuprl
    Metadata harvesting (see Nuprl Library and PVS Library)
    Formal metadata (see Nuprl Library and PVS Library)

Creating New Content
    Red/Black trees
    Graph algorithms
    Linear arithmetic
    Distributed algorithms

Foundational
    Abstract object identifiers
    Certificates and sentinels
    Relating theories -- thesis work
    Reflection -- thesis work (to be demonstrated at LICS '03)

Relationship to CIP/SW

Goals

Publications