DINDE: Towards more sophisticated software environments for statistics.

R.W. Oldford and S.C. Peters
Massachusetts Institute of Technology

Abstract

A prototype statistical system we call DINDE is described. DINDE is aimed at the professional statistician and provides a statistical analysis environment that is more sophisticated than the current generation of systems. In particular, it allows the analyst to keep careful track of the entire analysis as it progresses. General design philosophy and some issues of implementation are described and an example session is presented for illustration.

1.0 Introduction.

DINDE is a computer "system" for performing data analysis and statistics. More precisely, DINDE is an enrichment of an extensive interactive programming environment. The programming environment is Interlisp-D with LOOPS, running on the Xerox 1109 personal workstation (Teitelman and Masinter, 1981; Stefik et al., 1983; Interlisp Reference Manual, 1983). DINDE provides the usual analytic, graphical, and data management tools, which are accessible from an interpreted language. In addition, DINDE organizes, tracks, and occasionally guides the use of these tools, all in a distinctive visual format.

A personal workstation provides the combination of high interaction, extensive graphics, and powerful dedicated computing required by the ambitious aims of DINDE. DINDE comprises many sophisticated interdependent procedures. We rely on the interactive programming environment to help us build, manage, and especially experiment with this complex system.

Why did we invent another statistical system? First of all, like McDonald and Pedersen (1986), we think data analysts and practical statisticians do a special kind of programming work, especially when they are developing new methods. This kind of programming is referred to as "experimental," "exploratory," or "improvisational" in the Computer Science community.
Interactive programming environments are the most appropriate and productive locales for doing experimental programming; this is borne out by our experience and is forcefully argued by Sheil (1983). We built DINDE to take advantage of the interactive programming environment available on the Xerox 1109. Our initial aim has been to facilitate our own research in data analysis and statistics. As we proceed we hope to organize our efforts and make them available to others.

Second, we have often been frustrated by working with existing interactive statistical systems. Especially vexing are the multiplicity of languages required to control them (these include command lines or an expression algebra, graphics subsystem commands, macros, "user function" or interface language, implementation language, pre-processors, etc.) and the paucity (often non-existence) of language tools for the lot (i.e., interpreters and compilers, debuggers, inspectors, performance meters). By designing DINDE as an enrichment of the Interlisp-LOOPS environment, all the facilities and tools of Interlisp and LOOPS are available to the user. In particular, the same language features, compilers, debugging aids, etc., used by the systems programmers to construct Interlisp and LOOPS, and which we used to develop DINDE, are also available to the DINDE user. We think we gain important leverage by embedding DINDE in this rich and painstakingly developed environment which allows us to focus our efforts on statistical programming.

Third, we want to explore and develop statistical software which is more sophisticated than programs currently available: we envision software which guides the choice of technique, interpretation of results, and management of the analysis. Further, we are interested in studying the strategies used in practical statistical analysis and we perceive that this investigation requires a novel kind of statistical system (see Oldford and Peters, 1985a).
DINDE is being constructed to meet what we anticipate might be the needs of such study.

The system we describe here is about a year old. Many details remain to be settled; others have been chosen by virtue of their expedience. DINDE is mostly a framework awaiting more hard work. Even so, the ease of use and usefulness of this approach are already becoming clear. In subsequent papers we plan to elaborate on the programming environment offered through DINDE. This paper introduces the novel user interface and some of the philosophy underlying the design of DINDE.

In building DINDE, we have been strongly influenced by the S data analysis and graphics system (Becker and Chambers, 1984) and often borrow ideas from it. The system design issues raised by David Donoho's DART (Donoho, 1983) have also shaped our work: we think many of Donoho's concerns are met by the programming environments now available on LISP machines. (DART stands for Data Analysis Research Tool; we regretted that the acronym had already been claimed and so had to settle for the self-referential "DINDE": DINDE Is Not DART Exactly. "DINDE" also has the virtue of being self-deprecatory, "dinde" being French for "turkey".)

The plotting capabilities of DINDE rely on a collection of graphical functions developed by Jan Pedersen of Stanford University and Xerox AI Systems, who agreed to let us try out early versions of his work. (Pedersen's graphics package will also be incorporated in IDL, a system for data analysis developed at Xerox by Kaplan et al. (1981).)

Recent software that shares some of the objectives of DINDE has been described by Carr et al. (1984); its underlying philosophy has been detailed in Nicholson et al. (1984). That software appears to be very closely related to a formalism of data analysis proposed recently by Thisted (1985), but differs in many aspects from DINDE.

This paper is organized as follows.
Section 2 is a very brief introduction to the characteristics of personal workstations, to the idea of an interactive programming environment, and to the "object oriented" approach to programming which pervades DINDE. Readers already familiar with the advantages of these modern computing technologies may skip this section. DINDE is then introduced in Section 3 with an example emphasizing the interactive, evolutionary nature of data analysis. The key themes underlying the implementation are developed in Section 4, while Section 5 relates DINDE to our current views on the study of statistical strategy. Some concluding remarks are given in the final section.