The Architecture of Cirio

External Dependencies

Cirio is a Cedar program. Cirio is not a monolith (see the section ``User Interface vs. Functionality''). The following pieces import the following Cedar interfaces:

CirioBase: Atom, BasicTime, Breakpoint, BreakWorldArchitecture, CardTab, Commander, CommandTool, Convert, ConvertUnsafe, CountedVM, FamousPath, FileNames, FS, IO, List, MorePfsNames, NetworkStream, OneCasabaParser, PFS, PFSNames, Process, Random, Real, RealFns, RefTab, RefText, Rope, RopeHash, RopeParts, RopeSequence, RuntimeError, SafeStorage, SimpleFeedback, SourceFileOpsExtras, SPARCArchitecture, SunYPAgent, SymTab, SystemInterface, UXStrings, VersionMap, VersionMapDefaults, VM

VCirio: Atom, CirioBackstop, Commander, CommandTool, Containers, Convert, EditedStream, FileNames, IO, Labels, LocalCirio, MBQueue, PopUpButtons, Process, RemoteCirio, Rope, Rules, RuntimeError, SourceFileOpsExtras, StackCirio, TiogaAccess, TiogaOps, TypeScript, ViewerEvents, ViewerIO, ViewerOps, ViewerTools

CommandLineCirio: ...

The dependencies on FS, PFS, and MorePfsNames indicate that Cirio uses files through these Cedar interfaces. Cirio references files for the following reasons ... [each reason will get mentioned somewhere else, where it belongs in the overall structure of this document; should we also gather that info here?] ...

Debuggee vs. Debugger

There is a standard interface, called the DebugNub interface, that is exported by every debuggee and imported by the debugger. This interface is available over network streams, and also via procedure calls. Thus, Cirio can be used cross-world or same-world (remote or local).

The DebugNub interface

Enumerate and explain the items in the interface. Note differences between local and remote cases. Explain how to bind to a remote instance.

The CirioNubAccess interface

This is a Cedar version of the DebugNub interface; the debuggee can be either remote or local.
Debuggee Parameters

Currently, Cirio is essentially not parameterized with respect to the ways that debuggees might vary. For example, Cirio currently only works on big-endian, 32-bit pointer, 8-bit-byte-addressable machines, with SPARC programs loaded from (Sun-style) ``.o'' files. These assumptions are scattered through Cirio, and not (to my knowledge) organized in any way.

User Interface vs. Functionality

Cirio has several user interfaces. Each instance interfaces between a user and a debuggee world. The available user interfaces are: Remote Cirio Debug Tool (Viewers-based), Local Cirio Debug Tool (Viewers-based), and Command-Line Cirio (local or remote). The two Viewers-based interfaces are listed separately because they are implemented in separate modules. I don't know what the structure of the command-line interfaces is. A debuggee is represented by a (LocalCirio or RemoteCirio).Connection and a StackCirio.Stack. When no particular thread is focused on, a dummy Stack is used. The fact that there are two kinds of Connection may be why there are two separate modules implementing the Viewers interfaces.

User Interface Architectures

We should probably have such a branch.

Functionality: Worlds, Stacks, Frames, Languages, Values, Interpretation

A LocalCirio.Connection or a RemoteCirio.Connection represents a debuggee at a higher level of abstraction than a CirioNubAccess.Handle. Not all DebugNub operations are directly reflected in LocalCirio or RemoteCirio; instead, a Connection can produce a StackCirio.Stack for a particular thread, and most of the interesting stuff Cirio can do is done by stacks.
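The Connection/Stack relationship just described can be sketched as follows. This is a toy Python model, not Cirio code: every name and definition below is invented, standing in for the Cedar types LocalCirio.Connection (or RemoteCirio.Connection) and StackCirio.Stack.

```python
# Toy model (all definitions invented) of the relationship described above:
# a Connection represents a debuggee world and can produce a Stack for a
# particular thread; a dummy Stack stands in when no thread is focused on.
class Stack:
    def __init__(self, thread_id, frames):
        self.thread_id = thread_id      # None for the dummy Stack
        self.frames = frames            # frame summaries, innermost first
        self.current = 0                # the current frame

    def is_dummy(self):
        return self.thread_id is None

class Connection:
    """Stands in for a LocalCirio.Connection or RemoteCirio.Connection."""
    def __init__(self, threads):
        self.threads = threads          # thread id -> list of frames

    def stack_for_thread(self, thread_id):
        return Stack(thread_id, self.threads[thread_id])

    def dummy_stack(self):              # used when no thread is focused on
        return Stack(None, [])

conn = Connection({7: ["Inner", "Middle", "Outer"]})
s = conn.stack_for_thread(7)
assert s.frames[s.current] == "Inner" and not s.is_dummy()
assert conn.dummy_stack().is_dummy()
```

The point of the sketch is only the division of labor: the Connection models the whole debuggee world, while almost everything interesting happens through a Stack obtained from it.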
A stack:
    can list a summary of its frames;
    has a current frame;
    can move the current frame up and down the stack, either to an absolute frame number or by a relative number of frames, seeing and counting only Cedar frames or seeing and counting all frames;
    can produce various listings about the current frame, including one that shows all the variables visible from the frame and their values;
    has a desired language, which can be queried and set;
    can interpret a line of text in the context of the current frame in the desired language;
    can show the source position corresponding to the PC in the current frame;
    and can set, list, and clear breakpoints.

Every frame can, in general, be viewed through various languages; possible languages include: Cedar, SchemeXerox, Modula 3, C, and machine language. For example, one frame may simultaneously be a Cedar frame, a C frame, and a machine language frame, because the Cedar compiler emits C code, and the C compiler emits machine code. [In fact, it's worse than that: one Cedar procedure may be implemented by several C procedures --- and thus one Cedar frame may be spread out over several C frames. And inline procedure expansion in the Cedar compiler does the reverse: it crams several Cedar frames into one C frame. Cirio currently deals very poorly with these facts.] For each frame, for each applicable (and implemented) language, there is a node (see below) representing that frame in that language. Cirio must have a data type of its own to represent an arbitrary value in the debuggee. That data type is the CirioTypes.Node. Cirio also has a data type, CirioTypes.Type, that represents a type in the debuggee. Nodes and Types form a language-independent algebra. Of course, in general, each node and each type is useful in only a subset (possibly a singleton) of the supported languages. Every Node has a Type. Everything in the debuggee is representable by a Node.
That includes variables, constants, frames, stacks, procedures, signals, modules, and more. [I think some of the more esoteric ones of those (eg, frames and stacks) have bugs in them, and so Howard avoided using them and built other mechanisms.] In particular, evaluation contexts are represented by nodes. For each supported language, there is an interface that creates an evaluation context node from a frame. Cirio currently can interpret expressions, but not statements. Expression interpretation goes like this:

1. the text is compiled into code:
    1.1 the text is parsed into an abstract parse tree:
        1.1.1 the text is parsed into a source- and language-dependent form,
        1.1.2 which is then converted into an abstract parse tree;
    1.2 the parse tree is compiled;
2. the code is executed.

Language-dependent vs. Language-independent

From the above discussion, you can see that: Debuggee worlds are language-independent. Stacks are language-independent. Each stack frame can be viewed through one of a few applicable languages. Each Node represents a value that, in general, makes sense in a subset of the supported languages. Each expression is interpreted in exactly one language.

The Object-Oriented Structure, and Left-Side vs. Right-Side

Cirio is heavily object-oriented. Abstract parse trees are object-oriented, as are nodes and types. An abstract parse tree is represented in an object-oriented style: as a REF ANY plus the methods of the compilation protocol. The conversion from the source- and language-dependent form into the abstract parse tree is thus done by simply pairing the dependent form with the appropriate method suite. The knowledge of how to compile resides in that method suite. The code is in a fixed, but hidden, instruction set. The interfaces CedarCode and CedarCodeExtras have procedures for producing various kinds of instructions. The actual enumeration of the instructions and their meanings is in CedarCodeImpl [how chauvinistic!].
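The interpretation pipeline and the pair-the-data-with-a-method-suite representation can be sketched together. This is an invented Python miniature, not Cirio's code: the real parsing, compilation protocol, and instruction set live in the per-language modules and CedarCodeImpl.

```python
# Invented sketch of the pipeline above: parse to a language-dependent form,
# pair it with a method suite to get an abstract parse tree, compile the
# tree into a hidden instruction set, then execute the instructions.

# 1.1.1: a toy source- and language-dependent parse ("<n> + <n>" only)
def parse_cedar(text):
    left, _, right = text.partition("+")
    return ("cedar-add", int(left), int(right))

# 1.1.2: the abstract parse tree is just the dependent form paired with a
# method suite; the knowledge of how to compile lives in the suite
def make_tree(data, methods):
    return {"data": data, "methods": methods}

def compile_add(data, emit):
    _, a, b = data
    emit(("push", a)); emit(("push", b)); emit(("add",))

# 1.2: compile the tree into instructions
def compile_tree(tree):
    code = []
    tree["methods"]["compile"](tree["data"], code.append)
    return code

# 2: execute the instructions on a little value stack
def execute(code):
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

tree = make_tree(parse_cedar("2 + 3"), {"compile": compile_add})
assert execute(compile_tree(tree)) == 5
```

The design point the sketch preserves: the converter from dependent form to abstract parse tree does no real work — it only attaches the right method suite.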
Most instructions are implemented by asking one of their argument nodes to actually do the work. The class hierarchy for nodes has two levels (the story for types is very analogous). At the top level, there is the class Node. At the bottom level, there is a set of classes, defined in the interfaces CedarNumericTypes, CNumericTypes, CedarOtherPureTypes, AmpersandContext, Records, VariantRecords, RefTypes, Sequences, PointerTypes, Procedures, Arrays, Frames, Definitions, Types, Atoms, Addresses, Lists [some of these might not actually be used.] The classes are things like procedure, record, array, Cedar number, ... For every Node class, there is a corresponding Type class; each of the above interfaces generally defines both some node classes and some corresponding type classes. Howard conceptually divided Cirio into two large parts, which he called left side and right side [the choice of terminology is appalling: it does not in any way refer to the familiar `left-hand side' vs. `right-hand side' (...`of an assignment') distinction of imperative programming languages --- instead, it refers to a picture Howard drew on his whiteboard]. The left was roughly characterized as ``the compiler''; the right contains the knowledge of memory layout and other runtime representation details. The border between left and right runs right through the middle of those interfaces and implementations of the Node and Type classes. Each procedure that creates a Node or Type of a given class returns a ``whole'' object; the creation procedure generally takes as parameters a REF ANY and a suite of procedures that effectively constitute a shadow object that encapsulates the right half of the whole object. The implementation of the whole object's methods is considered to be on the left side of Cirio, and can be found in Cedar modules whose names are derived from the above interface names in the usual way (ie, by appending `Impl').
For some operations, no knowledge from the right side is needed, and none is used. For others, the left-hand code invokes operations of the shadow object, thus using knowledge from the right side. For some Type and some Node subclasses, no knowledge from the right side is needed by any method, and the creation procedure takes no right-side arguments. For example, look at the ``Arrays'' interface. The procedure CreateArrayIndirectNode takes an argument (info: ArrayIndirectNodeInfo) that is the right-half object. The left-side code in ArraysImpl knows that an array is an indexed collection of elements, that the elements are all of the same type, and so on. It does not know how to do the address calculation to produce an element location given an array location and an index; indeed, it doesn't even use addresses to describe arrays or elements --- it just uses Nodes. When an Array Node is called upon to deliver an element Node, the left-side code in ArraysImpl invokes a method of the right-half object (the `info: ArrayIndirectNodeInfo' parameter to CreateArrayIndirectNode) to actually do this work. Finding all the right-half subclasses of a given Node subclass (eg, finding all the right-half subclasses of Array Node) is not quite easy. Every client of Arrays that calls CreateArrayIndirectNode generally supplies an `info: ArrayIndirectNodeInfo' of a subclass unique to that client. Thus, every client must be found and examined. The clients of the above Node- and Type-class-defining interfaces are the language-specific modules that create evaluation context nodes from frames. There are currently two such modules: one for Cedar, and one for C. The Cedar module is described by a .config named CirioPCedarTW --- even though all the submodules' names begin with ``RMTW'' and the main interface is called ``NewRMTW''. The ``RMTW'' stands for Remote Mimosa Target World.
The C module is described by a .config named CirioCTW --- even though the abbreviation ``RCTW'' (Remote C Target World) figures prominently in its interface and submodule names.

Nodes and Types

The Node & Type Algebra

The way Howard formalized Cedar's TYPE algebra is a bit surprising. To be fair, it's true that Cedar's TYPE algebra is itself surprising. It's too bad there was no more general review of Howard's formalization. I don't understand it all, and will only point out what I've found to be the most surprising and confusing areas. My sources have been comments in the Cirio code, CirioDoc.tioga, and typeNodes.tioga in /Sturgis/CirioDocumentation (where /Sturgis is currently prefix-mapped to `-vux:/net/gharlane/aleut/defunct/sturgis', the directory where Sharon Johnson has temporarily restored Howard's files).

Indirects

Types come in pairs; each pair consists of an indirect type and a direct (aka target) type. The significance is this: an indirect node represents a location in the debuggee's memory where some other value is stored. The load operation is applied to an indirect node and produces a direct one, which represents the value that was stored at the location at the time the load was applied. The store operation stores a given direct node's value into the location represented by a given indirect node. There are no `indirect indirects'; indirect types are different from pointers. Note that in Cedar, if you've declared

y: INT _ 3;
x: POINTER TO INT = @y;

the expression `x+1' means pointer addition, whereas `y+1' means `the integer that's 1 greater than the INT currently stored in y'. This is why indirects are different from pointers and refs. Furthermore, dereferencing a pointer or ref produces an indirect, not the current contents of the address. This must be so because we can store through a dereferenced pointer, but not through an integer: `x^ _ 5' is legal but `3 _ 5' is not.
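The pointer/indirect distinction can be made concrete with a small model. This is invented Python (Cirio's real Nodes are Cedar objects); it mimics the x/y example just given.

```python
# Invented sketch of the distinction above: dereferencing a pointer yields
# an indirect (a location), not the stored value --- which is exactly what
# makes `x^ _ 5' meaningful and `3 _ 5' meaningless.
class Indirect:
    """A location in debuggee memory where some other value is stored."""
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
    def load(self):                   # produces the (direct) stored value
        return self.memory[self.addr]
    def store(self, value):           # storing goes through an indirect
        self.memory[self.addr] = value

def deref(memory, pointer_value):
    # dereferencing produces an indirect, not the current contents
    return Indirect(memory, pointer_value)

memory = {0x40: 3}          # y: INT _ 3, stored at address 0x40
y = Indirect(memory, 0x40)  # the variable y is bound to an indirect
x = 0x40                    # x: POINTER TO INT = @y (a direct pointer value)

assert y.load() + 1 == 4    # `y+1': load the INT, then add 1
assert x + 1 == 0x41        # `x+1': pointer arithmetic on the address
deref(memory, x).store(5)   # `x^ _ 5': store through the dereferenced pointer
assert y.load() == 5        # y now reads back the stored value
```

There is nothing to store through in `3 _ 5': the literal 3 is a direct value with no associated Indirect, which the model makes plain.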
In Cirio, if N1 is an indirect pointer-to-Foo node, Load(N1) produces N2, a direct pointer-to-Foo node. Dereferencing N2 produces N3, an indirect Foo node. Load(N3) produces N4, a direct Foo node. Such operations appear in the code produced by the compilation step. The basic idea that Howard started with is that each type t has a function from a contiguous stretch of memory to some abstract value (represented by a Node). He later decided that reality is more complicated than that, because global and local frames are not laid out in contiguous stretches of memory. He hacked Cirio into coping with that extra complexity by introducing the intermediate concept of a Node Schema. Unfortunately, he hacked Node Schemas only into CirioPCedarTW, not all of Cirio. When Hopcroft taught Cirio about C, creating CirioCTW, he cloned the NodeSchema TYPE of CirioPCedarTW into a new NodeSchema TYPE for CirioCTW. It is unclear whether NodeSchemas are necessary, and it's quite clear that having two different kinds is painful.

Indirects and Unions

One thing I think Howard did a good job on was clarifying the interactions of the UNION (eg, variant record) and REF type constructors. In particular, note that in the Cedar language, a REF to a variant RECORD type is not actually a REF to a UNION type: you cannot store through that REF a record with a different tag. Practically speaking, this is because the storage could have been allocated to be only big enough to hold some small variant, and you might be trying to store a variant that occupies more storage than was allocated. In Cedar, a REF to a variant record type is in fact a UNION of REF types: one for each variant of the record type. This is also consistent with the fact that you can discriminate a REF to a variant record type into a REF to a bound variant --- this is just discriminating amongst the REF types in the UNION. Also, a REF ANY is not a REF to a UNION over everything; instead, it is a UNION over every REF thing.
However, a REF REF Variant is a REF to a UNION (over other REFs).

Here is a formalization that is inspired by Howard's, but probably a little different. There is a universe of values. Some of those values are indirects. There is a set of ordinary types; these are ordinary in that they lack the extra structure of Cedar types. Each T in the set of Cedar types is a set of ordinary types {t1, t2, ... tn}. An ordinary type t has a set of values; the set of values of a Cedar type {t1, t2, ... tn} is the union of the values in the member types, and the ordinary type with exactly that value set is the Cedar type's ``mean type''. There is also a partial function CI amongst Cedar types that maps direct ones to indirect ones: CI({t1, t2, ... tn}) = {I(t1), I(t2), ... I(tn)}. Note that if i is an indirect of one of the types I(tk), then over time various executions of Load(i) will produce various t1s, or will produce various t2s, but not a mixture of t1s and t2s --- which executions of Load(j) could produce if j is an indirect to a union. The reason Cedar types have more structure than ordinary ones is to make it easy to explain why POINTER TO and REF, when given a union, sometimes produce a union of indirects and sometimes an indirect to a union. In the notes I've been able to find, Howard seemed to confuse indirects with pointers somewhat. Note that with the declarations

v: CedarNumericTypes.NumericDescriptor;
p: POINTER TO CedarNumericTypes.NumericDescriptor = ...;
r: REF CedarNumericTypes.NumericDescriptor = ...;

the assignment v _ [unsigned[16, full[]]]; is safe, but the assignments p^ _ [unsigned[16, full[]]]; r^ _ [unsigned[16, full[]]]; should be unsafe, because p or r might point to some space that was only allocated to be big enough to hold a NumericDescriptor.real. From here on, until told otherwise, forget the distinction between pointers and indirects.

Assignment Safety

We take the view that one always stores through an indirect. Thus, the assignment a _ exp is implemented by obtaining an indirect to the variable a, a value for the expression exp, and then storing the value through the indirect.
Further, r^ _ exp (where r is a REF) is implemented by obtaining the indirect contained in r, a value for the expression exp, and storing the value through the indirect. Now we come to the crucial point. Given the Cedar type of an indirect, the set of values which one may safely store through that indirect may be considerably smaller than the set of values that one might find when loading from it. Let T = {I(t1), I(t2), ...}. Then the set of values that one can safely store through an indirect of type T is contained in the intersection of the value sets of t1, t2, ...; however, the set of values which one might load through such an indirect includes their union. We define two type operators. LTarget(T) is intended to be the Cedar type of the values which may be safely stored through an indirect of Cedar type T, and RTarget(T) is intended to be the Cedar type of the values which might be obtained when loading from an indirect of Cedar type T. We define these operators for certain indirect types by:

RTarget(CI(T)) = T;
if |T| > 1, then LTarget(CI(T)) = {};
if |T| = 1, then LTarget(CI(T)) = T.

We can show that these definitions are suitable as follows. Suppose that u has been loaded through an indirect of Cedar type CI(T), where T = {t1, t2, ...}. Then u is a value of some member ti, so RTarget(CI(T)) = T is suitable. If |T| = 1, the values u that are safe to store through the indirect include the values of t1, and LTarget(CI(T)) = T is contained in this set. On the other hand, if |T| > 1, no value is safe to store, because the storage might have been allocated for a different member type, so LTarget(CI(T)) = {} is suitable.

Conformance

For Cedar types, define Conforms(T1, T2) by ... Observe that Conforms(T1, T2) is equivalent to ... Hence Conforms(T1, T2) <=> ... <=> ...

Note the following property of ordinary indirect types: If I(t1) ... [This follows from an argument that legal configurations must lead to legal configurations .... ???] [Is it even possible that it is a definition of I(t1) ...] [Do t1 ...] Given these definitions and assumptions, let us compute Conforms(CI(T1), CI(T2)).
Conforms(CI(T1), CI(T2)) <=> Conforms({...}, {...}) <=> ... <=> ...

We offer two special cases: (1) if both T1 and T2 are singletons (i.e., T1 = {t1} and T2 = {t2}), then this reduces to: <=> I(t1) ... <=> t1 ... <=> Conforms(T1, T2) and Conforms(T2, T1); (2) if T1 ... <=> Conforms(CI(T1), CI(T2)). How can such an asymmetric statement be correct? If it really is, that's because when |T2| > 1, we can't store through a CI(T2) (because LTarget(CI(T2)) = {}).

Records

We assume the record type constructor for ordinary types: <...>. We define the Cedar record type constructor <<...>> by: <<...>> = {<...> | ti ...} [This is probably wrong; consider a variant record nested within a variant record. That suggests <<...>> = {...}.] CODE(T) is an ordinary type containing a single integer value, the type code for T. We define RTT(T) to be the record type pairing the type code with the referent; RTT stands for Ref Target Type. When T is {t1, t2, ...}, RTT(T) is {<CODE({t1}), t1>, <CODE({t2}), t2>, ...}.

Variant Records

We consider exactly one class of variant records. It contains the individual types VR, red VR, big red VR, small red VR, and blue VR.

Some Cedar Types

Let Z32 be the ordinary type for 32-bit integers. Let big_red_VR (note underscores) be the ordinary type that consists of all big red VRs; similarly for the other fully discriminated variants of VR.

INT = {Z32}
VR = {big_red_VR, small_red_VR, blue_VR}
red VR = {big_red_VR, small_red_VR}
big red VR = {big_red_VR}
RTT(INT) = {<CODE(INT), Z32>}
RTT(VR) = {<CODE(big red VR), big_red_VR>, <CODE(small red VR), small_red_VR>, <CODE(blue VR), blue_VR>}
RTT(red VR) = {<CODE(big red VR), big_red_VR>, <CODE(small red VR), small_red_VR>}
RTT(big red VR) = {<CODE(big red VR), big_red_VR>}
RTT(ANY) = {<CODE(T), t> | any Cedar type T of the form {t}}
REF INT = CI(RTT(INT)) = {I(<CODE(INT), Z32>)}
REF VR = CI(RTT(VR)) = {I(<CODE(big red VR), big_red_VR>), I(<CODE(small red VR), small_red_VR>), I(<CODE(blue VR), blue_VR>)}
REF red VR = CI(RTT(red VR)) = {I(<CODE(big red VR), big_red_VR>), I(<CODE(small red VR), small_red_VR>)}
REF big red VR = CI(RTT(big red VR)) = {I(<CODE(big red VR), big_red_VR>)}
REF ANY = CI(RTT(ANY)) = {I(<CODE(T), t>) | any Cedar type T of the form {t}}
REF REF INT = CI(RTT(REF INT))
REF REF ANY = CI(RTT(REF ANY))

Sequences

A (possibly variant) record that ends with a SEQUENCE is treated like a variant record, with one variant for each possible length.

Some assignments

Given the above definitions we can consider the legality of some assignments.
Remember that v _ exp is safe when Conforms(type(exp), LTarget(CI(type(v)))) and r^ _ exp is safe when Conforms(type(exp), LTarget(type(r))). Or, in general, e1 _ e2 is safe when Conforms(type(e2), LTarget(type(e1))), where type(e) is the statically apparent type of expression e. Assume the following declarations:

i: INT;
ra: REF ANY;
ri: REF INT;
rvr: REF VR;
rbrvr: REF big red VR;
rra: REF REF ANY;
rri: REF REF INT;
vr: VR;
brvr: big red VR;

ra^ _ i could be shown safe if
    Conforms(INT, LTarget(REF ANY))
    <=> Conforms(INT, LTarget(CI(RTT(ANY))))
    <=> Conforms(INT, {})
which is false.

vr _ brvr is safe if
    Conforms(big red VR, LTarget(CI(VR)))
    <=> Conforms(big red VR, {})
which is false.

ra _ ri is safe if
    Conforms(REF INT, LTarget(CI(REF ANY)))
    <=> Conforms(REF INT, REF ANY) {because |REF ANY| = 1}
    <=> Conforms(CI(RTT(INT)), CI(RTT(ANY)))
which follows from RTT(INT) being contained in RTT(ANY).

rra _ rri could be shown safe if
    Conforms(REF REF INT, LTarget(CI(REF REF ANY)))
    <=> Conforms(REF REF INT, REF REF ANY) {because |REF REF ANY| = 1}
    <=> Conforms(CI(RTT(REF INT)), CI(RTT(REF ANY)))
    <=> [Conforms(RTT(REF INT), RTT(REF ANY)) and Conforms(RTT(REF ANY), RTT(REF INT))] {because RTT(REF INT) and RTT(REF ANY) are both singletons}
which requires at least Conforms(RTT(REF ANY), RTT(REF INT)), which requires at least Conforms(CODE(REF ANY), CODE(REF INT)), which fails.

Runtime Check Bug

The above does not account for the fact that the Compiler will accept some assignments and emit runtime checks.

Fixing Pointers/Refs vs. Indirects

OK, now remember that pointers and REFs are (syntactically) different from indirects. Currently, Cirio regards the constructor POINTER TO T as constructing a type that is slightly different from Indirect T: values of type POINTER TO T must first be dereferenced, which produces an Indirect to a T. REF T has the additional complexity of RTT: a value of type REF T must first be dereferenced, which yields an Indirect to a RTT(T).
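Parts of this safety calculus can be mimicked with a toy model. The Python below is entirely invented, and it models Conforms as containment of type sets --- one plausible reading, since Howard's actual definition has not survived in my sources --- so it reproduces only the verdicts that rest on the LTarget rule (the ones with a singleton or empty safe-to-store type), not the REF-chain derivations.

```python
# Toy model (invented) of the LTarget rule used above. Cedar types are
# frozensets of ordinary-type tokens; I tags a token as an indirect type.
# Conforms is modelled as set containment --- a reading, not Howard's text.
def CI(T):
    """Direct Cedar type -> indirect Cedar type, memberwise."""
    return frozenset(("I", t) for t in T)

def LTarget(T):
    """Safe-to-store type for an indirect Cedar type T: when T has more
    than one member, the storage may be too small, so nothing is safe."""
    targets = frozenset(t for tag, t in T)
    return targets if len(T) == 1 else frozenset()

def conforms(T1, T2):
    return T1 <= T2

INT  = frozenset({"Z32"})
VR   = frozenset({"big_red_VR", "small_red_VR", "blue_VR"})
brVR = frozenset({"big_red_VR"})
# stand-in for RTT(ANY): one record token per REF-able type
RTT_ANY = frozenset({"rec_INT", "rec_VR", "rec_red_VR"})

# ra^ _ i: |RTT(ANY)| > 1, so LTarget(CI(RTT(ANY))) = {} --- unsafe
assert not conforms(INT, LTarget(CI(RTT_ANY)))

# vr _ brvr: |VR| = 3 > 1, so LTarget(CI(VR)) = {} --- unsafe
assert not conforms(brVR, LTarget(CI(VR)))

# i _ 3-style store into a singleton type is allowed: |INT| = 1
assert conforms(INT, LTarget(CI(INT)))
```

The model makes the asymmetry noted earlier tangible: loading from a CI(VR) indirect can yield any variant, but the set of safely storable values collapses to empty as soon as the indirect type is ambiguous.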
Declaring a variable of type T binds the variable's name to an Indirect to a T. Declaring an identifier to be a constant of type T binds that identifier directly to a T value. REF and POINTER types are considered direct types, which is why there are no indirects of indirects. Cirio currently errs in treating the declaration v: T; as binding `v' to a CI(T); it should actually set `v' to have the Cedar type {I(...)}: given a POINTER TO T obtained by taking v's address, it wouldn't be correct to NARROW that POINTER TO T to a POINTER TO a bound variant of T! (v could take on other variant values.)

Field Lists

If you look at the actual Records, VariantRecords, and Sequences interfaces, you'll see that there are FieldLists, as well as Records. What are they? ...

AMNode

Crocky. What good way is there to compile `&1 _ 2+2' in a strongly-typed language? ...

Name Scopes

...

The Dance

The path from analyzing a TYPE from some source of information, such as a .mob file, to having a direct node in hand goes through several steps, and was referred to by Howard as ``the dance''; between steps things are at various ``times'': type time, structure time, and node time. The dance goes like this for Cedar (and similarly, but independently, for C). The information from the .mob is digested by some code in RMTWSomethingOrOther into an RMTWPrivate.AnalyzedSEH, which is a type-time abstraction and has a Type as one of its fields. I think the reason there are AnalyzedSEHs, as well as Types, is that Types know nothing about NodeSchemas, and AnalyzedSEHs know something about them. In particular, one of the fields of an AnalyzedSEH is a procedure to create a NodeSchema. That's the next step of the dance, and takes us to structure time. Creation of a NodeSchema requires an AnalyzedSEH and a BitFieldSchema. A BitFieldSchema is a structure-time abstraction that represents (in an object-oriented fashion that exports only the ability to do it) the ability to extract a sub-field of a stretch of memory.
With a BitFieldSchema in hand, you can: (1) ask how many bits long the sub-field is, and (2) create another BitFieldSchema that represents the ability to extract an even smaller sub-sub-field. The other things you can do with BitFieldSchemas are: (1) get the byte address of the beginning of the sub-field (what if it's not byte aligned?); (2) read the bits currently stored in the sub-field, and (3) write bits into the sub-field. However, to do these three things requires also having an RMTWPrivate.Mem in hand. A Mem is a node-time abstraction that represents an absolutely-positioned stretch of memory, from which BitFieldSchemas can extract sub-fields. As noted above, a Mem actually represents a structure of absolutely-positioned stretches of memory, to cope with the vagaries of global and local frame layout. In particular, a Mem consists of (see RCTW.mesa): a frame pointer; a stack pointer; a frame extension pointer; a global frame pointer; text, data, and bss segment pointers; and a distinguished absolute address. Each of these addresses can independently be effectively omitted (a RemoteAddress can declare itself invalid). Howard's original idea was that Mems would be opaque, and different Mems could have different structure. BitFieldSchemas and Mems must be implemented by code that knows the internal structure of both. That is, a BitFieldSchema represents a sub-field's relation to a particular containing structure, which is represented by an appropriate Mem. When asking a BitFieldSchema to read bits, you must supply a Mem describing a structure of memory of the kind the BitFieldSchema expects. 
This is still true, even though all Mems are now represented by a data structure published in RCTW.mesa: because different Mems will have meaningful addresses in different pointer slots, it is still imperative to supply an appropriate Mem when asking a BitFieldSchema to do memory operations or resolve itself to an absolute address; there are still bugs in Cirio that cause inappropriate Mems to be given. A NodeSchema is a structure-time abstraction. It knows little more than how to create an indirect Node. The operation to create the indirect Node takes a NodeSchema, a Mem, and, curiously, the direct Type that's the direct/indirect dual of the Type of the desired indirect Node. I do not understand why this Type is passed at this point: the NodeSchema came from an AnalyzedSEH, which already knows those Types. Whether there's ever a difference between the Type in the AnalyzedSEH and the Type given to the createIndirectNode operation, I don't know. With the creation of the indirect Node, we arrive at node time. The final step consists of invoking the Load operation of the indirect Node to produce the direct Node. It's at this time that the BitFieldSchema and the Mem collected at earlier steps are used to read memory, and the bits from memory are interpreted according to the type information collected at earlier steps. Failures tend to surface here, but their causes lie at earlier steps, where incorrect BitFieldSchemas, Mems, or type information is supplied --- and the calling procedures that are responsible have long since returned.

Code, Compilation, and Execution

...

Thread Controls

...

Breakpoints

...
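The dance described under ``The Dance'' above can be sketched end to end. AnalyzedSEH, NodeSchema, BitFieldSchema, and Mem are real Cirio names, but every Python definition below is invented: a one-field, one-pointer-slot miniature of the type-time, structure-time, node-time progression.

```python
# Invented miniature of the dance: .mob digest (type time) -> NodeSchema
# (structure time) -> indirect Node + Load (node time).

class Mem:
    """Node time: an absolutely-positioned structure of memory; the real
    Mem has many pointer slots, this toy keeps only a frame pointer."""
    def __init__(self, frame_pointer, words):
        self.frame_pointer, self.words = frame_pointer, words

class BitFieldSchema:
    """Structure time: how to find a sub-field, relative to a Mem."""
    def __init__(self, offset, bits):
        self.offset, self.bits = offset, bits
    def read(self, mem):
        # resolved against the Mem supplied *now*, at node time
        return mem.words[mem.frame_pointer + self.offset]

class NodeSchema:
    """Structure time: knows little more than how to make an indirect Node."""
    def __init__(self, field):
        self.field = field
    def create_indirect_node(self, mem, direct_type):
        return IndirectNode(self.field, mem, direct_type)

class IndirectNode:
    """Node time: the final Load reads memory and interprets the bits."""
    def __init__(self, field, mem, direct_type):
        self.field, self.mem, self.direct_type = field, mem, direct_type
    def load(self):
        return (self.direct_type, self.field.read(self.mem))

class AnalyzedSEH:
    """Type time: digested from the .mob; can create a NodeSchema."""
    def __init__(self, cedar_type, offset, bits):
        self.cedar_type = cedar_type
        self._offset, self._bits = offset, bits
    def make_node_schema(self):
        return NodeSchema(BitFieldSchema(self._offset, self._bits))

seh = AnalyzedSEH("INT", offset=2, bits=32)        # type time, from the .mob
schema = seh.make_node_schema()                    # structure time
mem = Mem(frame_pointer=0x10, words={0x12: 7})     # node time
node = schema.create_indirect_node(mem, seh.cedar_type)
assert node.load() == ("INT", 7)
```

The sketch also shows why the bugs mentioned above are hard to chase: the BitFieldSchema and Mem are merely collected during the early steps, and nothing checks their compatibility until Load finally combines them.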