4.0 Underlying ideas.

In this section we describe the key ideas which underlie DINDE. The presentation is divided into three subsections. In the first of these we describe the contents of the toolbox, that collection of basic DINDE classes which one may use to build an analysis. The way analyses are built and recorded in DINDE is treated in Section 4.2. Section 4.3 discusses the manner in which we feel statistical expertise can reasonably be placed within a system like DINDE. The discussion here focusses on issues and philosophy of design; detailed documentation of the components appears elsewhere.

4.1 The toolbox.

The toolbox contains those classes of objects which can be created and used in a statistical analysis. Since the analysis is represented as a collection of networks whose nodes are these objects, it is critically important that the classes represent meaningful chunks of information. This is perhaps one of the most challenging aspects of creating a sophisticated system like DINDE. It requires an identification and grouping of the elements that are important in a statistical analysis, a statistical taxonomy of sorts. To this end we suggest a coarse partition of such elements into six basic element types: (i) Data, (ii) Graphics, (iii) Situations, (iv) Models, (v) Tables, and (vi) Designs. To date only the first three of these exist in DINDE. Only these were necessary to build a prototype bivariate regression analysis, but we anticipate that Models (such as parametric probability models), Tables (such as many-way contingency tables), and Designs (such as experimental and survey designs) will be required before long.

Object-oriented programming is pervasive in DINDE and nowhere is this more evident to the user than in the toolbox. Here a network is displayed which shows both the classes of objects available and their relationships to one another. It begins with the most general class, DINDEObject.
The immediate children, or specializations, of this class are shown to the right of DINDEObject and attached to it by a line segment, as below. (Far from serving as a simple place-holder, the class DINDEObject defines or "mixes in" a wide variety of important behaviors; among these are the ability to name and annotate objects, the basic linking operations, and certain mouse responses. The descendants of DINDEObject for the most part inherit these behaviors "orthogonally", that is, without further specialization.)

[Figure: DINDEObject and its immediate specializations.]

The Data factor of our classification is represented here by the class Array. The classes shown have further specializations which appear in the toolbox. We discuss each in turn.

The class Array has three different specializations corresponding to the three modes for data values that we have adopted: floating point numbers, character strings, and Boolean values. These subclasses are BooleanArray (an array of data which take on the values true, T, or false, NIL), FloatArray (an array containing floating point numbers), and StringArray (containing arbitrary character strings). Array is also specialized into the three familiar array shapes: Matrix, Vector, and Scalar. Each of these in turn has specialized subclasses, leading to the following network for the data.

[Figure: network of Data classes below Array.]

The inheritance in this network runs left to right, from parent to child. It allows us to collect together those methods and variables which classes share and to attach them to a common ancestor. For instance, all FloatScalars, FloatVectors and FloatMatrices should be able to take the logarithm of their elements. Hence, the method LOG is defined for a FloatArray and is inherited down through the hierarchy by the others. It is not defined for Array because it makes no sense to try to take the log of either BooleanArrays or StringArrays. Similarly, BooleanMatrix, FloatMatrix, and StringMatrix are also all children of Matrix. This lets us attach strictly matrix attributes (like transpose) to the single class Matrix.

(We now think that the Array hierarchy presents too much detail at too primitive a level. In a separate paper we describe new, higher-level data objects which address the special needs of statistics and data analysis (Oldford and Peters, 1986c).)

The network of classes below Graphic is the following.
[Figure: network of Graphic classes.]

At present, all DINDE graphics are based on simple one- and two-dimensional plots, as represented by 1DPlot and XYPlot, respectively. For illustration, consider XYPlot. This graphic class has all the information necessary to produce a two-dimensional scatterplot of points. Information like the X and Y coordinates of the points, how to plot the points, how to label them, and so on, is collected here. IndexPlots are specialized XYPlots that automatically provide X coordinates which are the numbers 1 to N, indices for the components of Y. Similarly, QQPlots require both X and Y to be sorted. ScatterPlots are just those XYPlots to be used in a statistical analysis and hence have slightly more information attached to them, like how to add to the plot a least-squares fitted line, or some simple smooths.

This brings up an interesting point which was not encountered with the purely data constructs like FloatVectors. With Graphics, behaviors and information are now attached to a given class which are of interest only in a particular statistical context. Adding a smooth to a scatterplot is a statistical procedure useful for exploring the apparent dependence of one variable on another; it is not a useful adjunct to all possible XYPlots. The additional ability to smooth positive and negative Y's separately is a very useful device when the Y coordinates represent residuals, hence the further specialization to ResidualScatter plots.
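The chain of plot specializations just described can be sketched in code. The fragment below is only an illustration in Python, which is not the language DINDE is written in. Every attribute and method name in it (points, least_squares_line, split_by_sign) is a hypothetical stand-in, and the behavior of each class is reduced to the single property the text attributes to it.

```python
from statistics import mean


class XYPlot:
    """Holds what is needed for a two-dimensional plot of points."""

    def __init__(self, x, y, labels=None):
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        self.x = list(x)
        self.y = list(y)
        self.labels = list(labels) if labels is not None else None

    def points(self):
        """Return the (x, y) pairs to be drawn."""
        return list(zip(self.x, self.y))


class IndexPlot(XYPlot):
    """XYPlot whose X coordinates are supplied automatically as 1..N."""

    def __init__(self, y, labels=None):
        super().__init__(range(1, len(y) + 1), y, labels)


class QQPlot(XYPlot):
    """XYPlot in which both X and Y are sorted before plotting."""

    def __init__(self, x, y):
        super().__init__(sorted(x), sorted(y))


class ScatterPlot(XYPlot):
    """XYPlot meant for statistical use; it can supply a fitted line."""

    def least_squares_line(self):
        """Return (intercept, slope) of the least-squares line of Y on X."""
        xbar, ybar = mean(self.x), mean(self.y)
        sxx = sum((xi - xbar) ** 2 for xi in self.x)
        sxy = sum((xi - xbar) * (yi - ybar)
                  for xi, yi in zip(self.x, self.y))
        slope = sxy / sxx
        return ybar - slope * xbar, slope


class ResidualScatter(ScatterPlot):
    """ScatterPlot whose Y coordinates are residuals."""

    def split_by_sign(self):
        """Separate the points with positive and negative Y's so that
        each group can be smoothed on its own (zeros are left out)."""
        positive = [(xi, yi) for xi, yi in self.points() if yi > 0]
        negative = [(xi, yi) for xi, yi in self.points() if yi < 0]
        return positive, negative
```

In this sketch a plain XYPlot has no least_squares_line method at all; the tool becomes available only at the level of the hierarchy where it is statistically meaningful.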
Specializations are thus created to sort out the pertinent statistical procedures and information. This results in having the tools most accessible when they are most needed.

A straightforward extension of these ideas is to provide a grouping of statistical concepts, information and tools, which could be perceived as representative of some typical stage or decision point in an analysis. We have called such groupings Situations. While these are presently few in number, examination of those now available in DINDE should illustrate the idea. The figure below shows the Situations presently available in DINDE as they would appear in the toolbox.

[Figure: network of Situation classes as displayed in the toolbox.]

The BivariateRegression Situation represents that point in the analysis where the analyst has decided to perform a bivariate regression of Y on X, and as such it must contain the minimal amount of information and set of tools required to make the next decision. It identifies the variables Y and X, contains some suggestions as to how to proceed, and offers easy access to typical next steps (like plotting the points or fitting a straight line).

If, at the BivariateRegression step, one elected to do a least-squares linear fit of Y on X, then a BivariateLeastSquares object would be produced. The BivariateLeastSquares Situation is a specialization of a BivariateLinearFit, which is itself a specialization of BivariateFit. All BivariateFits have pointers to the X and Y vectors on which they are based and contain vectors of the fitted values and the residuals from the fit. Further, they can take a number of relevant actions such as producing a variety of residual plots. Additionally, BivariateLinearFits contain the parameter values which define the fitted line.
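The layering of the fit Situations described so far can be sketched as follows. This is a minimal Python illustration under stated assumptions, not DINDE code: the attribute names (fitted, residuals, intercept, slope) and the methods least_squares_fit and residuals_vs_fitted are hypothetical, and the suggestions and plotting actions that a real Situation offers are left out.

```python
class BivariateFit:
    """Any fit of Y on X: refers to the data and holds the fitted
    values and the residuals."""

    def __init__(self, x, y, fitted):
        self.x = x  # a reference to the X vector, not a copy
        self.y = y  # a reference to the Y vector, not a copy
        self.fitted = list(fitted)
        self.residuals = [yi - fi for yi, fi in zip(y, self.fitted)]

    def residuals_vs_fitted(self):
        """Coordinates for one of the residual plots a fit can produce."""
        return list(zip(self.fitted, self.residuals))


class BivariateLinearFit(BivariateFit):
    """A fit that is a straight line; adds the parameters of the line."""

    def __init__(self, x, y, intercept, slope):
        self.intercept = intercept
        self.slope = slope
        super().__init__(x, y, [intercept + slope * xi for xi in x])


class BivariateLeastSquares(BivariateLinearFit):
    """The straight line chosen by least squares."""

    def __init__(self, x, y):
        n = len(x)
        xbar = sum(x) / n
        ybar = sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        super().__init__(x, y, ybar - slope * xbar, slope)


class BivariateRegression:
    """The decision point: the analyst will regress Y on X."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def least_squares_fit(self):
        """One typical next step; it produces a new Situation."""
        return BivariateLeastSquares(self.x, self.y)
```

A call such as BivariateRegression(x, y).least_squares_fit() returns an object that is at once a BivariateLeastSquares, a BivariateLinearFit and a BivariateFit, so anything written for the general fit applies to it unchanged.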
BivariateLeastSquares contains yet more information, such as variance estimates and t-statistics. This is distinguished from the BivariateResistantLine, representing the fit obtained by fitting the "resistant line" (e.g. see Velleman and Hoaglin (1983)), which has diagnostic information like the ratio of the half-slopes.

In Situations we see a need for much more work, both on those Situations we have created thus far and on new ones. Situations require the factorization, cataloguing, and bundling of many statistical concepts, tools, techniques, et cetera, and the identification of the relationships between them (i.e. how one usefully leads to another). As such, the creation of each must be carefully undertaken. Their constituent parts must be based on sound statistical theory and practice, and their appropriate interconnection is often an open research question. We expect this to become ever more acute as statistical situations more complex than simple bivariate regression are considered.

The complete collection of tools is made available to the user through a DINDE window which displays them in a network reflecting their familial relationships. This window, which we have been calling the toolbox, appears in its current entirety below. (For clarity, we have excluded the classes BooleanArray, FloatArray and StringArray.)

[Figure: the complete DINDE toolbox window.]

Short or long summaries of the class, a list of the components any instance of the class will require (e.g. BivariateRegressions must have a Y and an X variable) and references to the literature are all available by making the corresponding selection.
FindWhere identifies the parent from which any variable or method owned by that class was inherited, and InternalDescription gives a skeletal outline of the class in question showing its variables, methods and parents (superclasses). (A possible extension to this menu would be a series of examples concerning the usage of instances of the class.) This kind of information is available for all classes in DINDE and is directly accessible from the toolbox.
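The Data hierarchy of this section and the FindWhere query can be illustrated together. The Python sketch below is an assumption-laden illustration and not the DINDE implementation: the storage of an array as a list of rows, the lower-case method names log and transpose (standing in for LOG and transpose in the text), and the helper find_where are all hypothetical. The helper simply walks Python's method resolution order, which may differ from the inheritance rules DINDE actually uses.

```python
import math


class DINDEObject:
    """Most general class; here it only carries a name."""

    def __init__(self, name=None):
        self.name = name


class Array(DINDEObject):
    def __init__(self, rows, name=None):
        super().__init__(name)
        self.rows = [list(r) for r in rows]


class FloatArray(Array):
    def log(self):
        """Elementwise logarithm; meaningful only for floating point data."""
        return type(self)([[math.log(v) for v in row] for row in self.rows])


class StringArray(Array):
    pass


class Matrix(Array):
    def transpose(self):
        """A strictly matrix attribute, shared by matrices of every mode."""
        return type(self)(zip(*self.rows))


class FloatMatrix(FloatArray, Matrix):
    pass


class StringMatrix(StringArray, Matrix):
    pass


def find_where(cls, attribute):
    """Name the class from which cls obtains the given attribute."""
    for klass in cls.__mro__:
        if attribute in vars(klass):
            return klass.__name__
    raise AttributeError(f"{cls.__name__} has no attribute {attribute!r}")


print(find_where(FloatMatrix, "log"))         # prints FloatArray
print(find_where(FloatMatrix, "transpose"))   # prints Matrix
print(find_where(StringMatrix, "transpose"))  # prints Matrix
```

In this sketch FloatMatrix defines nothing itself: it obtains log from its mode parent and transpose from its shape parent, while StringMatrix has transpose but no log.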