CSLNotebookEntryToCedarusersDateMarch30,1983FromEdTaftLocationPaloAltoSubjectBTreepackageOrganizationPARC/CSLXEROXReleaseas[Indigo]Documentation>BTree.tiogaDraft[Indigo]Documentation>BTree.tiogaLasteditedbyTaft,March30,19839:42amAbstractThisCedarpackagemaintainsanorderedcollectionofobjectsasaBTree.Theobjectsmaybeofdifferentsizes,andtheremaybealargenumberofthem(tensorhundredsofthousands).TheamountofvirtualmemoryrequireddoesnotdependonthesizeoftheBTree,andthecostoffinding,inserting,anddeletingobjectsincreasesonlyveryslowlyastheBTreegetslarger.Thepackagemakesveryfewassumptionsabouttherepresentationoftheobjectsbeingstoredoraboutthepropertiesofthestorageitself.ThismemogivesanoverviewoftheoperationoftheBTreepackageandpresentstheresultsofsomeperformancemeasurements.IntroductionABTreeisadatastructureformaintaininganorderedcollectionofobjects(orentries)storedasatreeoffixed-sizepages.Theentriesareeachsmallerthanapageandarestoredmanyperpage;thus,eachinteriorpageofthetreehasmanybranches,soevenaverylargetreeisquiteshallow(adepthof3or4istypical).Thismeansthatfindinganarbitraryentryisrelativelyinexpensiveintermsofthenumberofpagesthatmustbeaccessed.Thecostofanupdateisalsoreasonableandhasanupperboundwhichislinearinthedepthofthetree.Finally,thetreehasverygoodlocality,soreferencestoconsecutiveentriesusuallyaccessthesamepage.ForfurtherinformationonthepropertiesofBTreesingeneral,pleasereadKnuth,TheArtofComputerProgramming,vol.3,section6.2.4.ThisBTreepackageisaCedartransliterationofonewritteninBCPLbyEdMcCreightandusedformaintainingthefiledirectoryinIFS.TheimportantpropertiesthatdistinguishitfromexistingMesaBTreepackagesare:GeneralitytheBTreepackagemakesnoassumptionsabouttherepresentationofentries;itsimplystoresuninterpretedblocksofwords.Itneedknownothingaboutwhatportionofanentrycontainsthekey''orthevalue''.Allknowledgeabouttherepresentationofentriesisvestedinclient-providedprocedures.ThepackagealsomakesnoassumptionsaboutthemeansusedtostoretheBTreepages,butinsteadcallsclient-providedprocedurestoaccessthem.Capacitythereisnopracticallimittothenumberofentriesthatcanbestored.Inparticular,thereisnorequirementthatallofthepagesoftheBTreeoccupyvirtualmemoryatthesametime.Theultimatelimittothesizeofthetreeis65,535pages,whichshouldbesufficienttostoreatleastamillionentriespa ?q ^r^ q+^r6^:k<q ZrZ%q+Zs6Z9q V4rV4q+V4 r6V4s:V4;V4t{Nq D sDr%3D&D q BrBsBBr&}B'.B q A, VsA,d!q >s>$.&+& 1N279-:J?/AF <  dV g"f#',L.156=> F ;V# 1#j%*,/P236-:=o?BDn 9  "%(-.175f8A="@EV 8  R/ &~(3*/(278<? EzG/ 6`2jQ `&O(&*025A9>AG/ 4;  p 0 r -$ ~GQW $&, 2I48u:-$;-$r>-$?-$CEF +| wR #Q(+E,r/2v48<?:BF^ )+Sl%'_*+.2c5-6:2?.@DFG (.  9MS!"(,+- 3| :<@WB Db & [}c r#5$&+R,/v 6:8;R=8A0E $ gV}Eo$&),K/i289 @}B: #8\|  2  #|%.)+u047u< =i ?>ACJ  rr  W3t&z(K+ /157:3A0CGR  v:!$+a 14 ;=G@E 1v  r!# *.0 9;F@AVE  mC"%*.168w:^=CCE Z   }" +-N137%8B0   "|&O(,0147A;U?iAF  s z) 6v r x v  V"$l)+[/2527' DH#  \<3!#',06@7:=AdD+ 5 6MS $(-_/T 5W7 :j<?0@^ETVm$2(evenmoreifpages''largerthan256wordsareused).Storageefficiencyoverheadconsistsofonlyonewordperentryandtwowordsperpage;andthepackagemaintainspagesbetween60and80percentfullontheaverage.Thepackagealsoattemptstokeeplargerentriesintheleafpagesandsmalleronesintheinteriorpagessoastominimizethetree'sdepth.Provenreliabilitysincethelast''bugwasfixedintheBCPLversionofthepackage,therehavebeenmanytensofthousandsofIFS-hoursofoperationwithnosuspicionofaBTree-relatedproblem.NochangesweremadetotheBTreemaintenancealgorithmsduringthetransliterationtoCedar.Itisintendedthatthispackagebesuitableforuseinmaintaininganyordereddatastructurethatmustbestoredexternally,i.e.,iseitherpermanentoris(potentially)toolargetokeepinvirtualmemory.Themostobviousapplicationisafiledirectory.Forsmallerapplicationsinvolvingtemporarydataonly,aninternaldatastructuresuchasaRedBlackTreeismoreefficienttoaccessandmaintain.InstructionsforuseTheprimarydocumentationfortheBTreepackageistheinterfaceBTree.mesa,obtainedviaBTree.df.Thefollowingisanoverviewofhowthepackageisintendedtobeused;consulttheinterfaceitselffordetails.Theclientmustprovidetwosetsofprocedures,calledtherepresentationprimitivesandthestorageprimitives.TheBTreepackageisobject-oriented,i.e.,asingleinstantiationofthepackagesufficestodealwithanarbitrarynumberofTreeobjects;andeachTreeobjectinstancecanhaveanindependentsetofrepresentationandstorageprimitivesassociatedwithit.TherepresentationprimitivesenablethepackagetofindouteverythingitneedstoknowaboutBTreeentriesandkeys.ThemostimportantareEntrySize,whichgivesthesizeofanentryinwords,andCompare,whichgivestheresultofcomparingakeywithanentry(less,equal,orgreater).ThepackagedoesnotdoanythingwithkeysbesidesComparethemtoentries.Additionally,thereareproceduresforconvertingbetweenREFandLONGPOINTER(i.e.,safeandunsafe)referencestoentries.ThestorageprimitivesarethemeansbywhichthepackageaccessesthepagesinwhichtheBTreeisstored.ThepackagegainsaccesstothecontentsofapagebycallingtheReferencePageprocedure,whichreturnsaLONGPOINTERtoablockofvirtualmemorycontainingacopyofthatpage.Whenthepackageisfinishedwiththepage,itcallsReleasePage;subsequently,thepagestorageimplementationmayrewritethepagetopermanentstorage(ifdirty)andreclaimthevirtualmemory.Thepackagecomesintwoparts.Themainpart,BTreeImpl.bcd,exportstheBTreeinterface.Additionally,thereisanexampleimplementationofthepagestorageprimitivescalledBTreeVMImpl,whichexportstheBTreeVMinterface.ThisimplementationisintendedforbuildingatemporaryorpermanentBTreeonrawPilotfiles(or,eventually,CedarNucleusfiles).ItmaintainsaVMcachewhosesizeisspecifiedbytheclient;anditreadsandwritesthefileexplicitly,usingSpace.CopyInandCopyOut.TheideaisthattheclientprogramfirstcallsBTreeVM.Open,passingafilecapabilityandobtainingastoragehandleandasetofstorageprimitives.These,alongwiththeclient'srepresentationprimitives,shouldthenbepassedtoBTree.Open,whichreturnsaTreehandleuponwhichBTreeoperationsmaybeperformed.ThereisnoexplicitCloseoperation.TheTreeandstoragehandlesarecollectibleobjects,whichvanishwhenthelastREFstothemdisappear.(Note:itisintendedeventuallythatthespaceoccupiedbytheBTreeVMcachewillbefreedwhenthestoragehandleisgivenup.Atpresent,however,thisisnotdone;socacheSize*filePagesPerPagepagesofVMarelost.)Performance2r b! ?!r%'v _j< r_j5_jW!#p&)-M/369x=@YDHG/ ]6~Q #t%b*T,.1]7I:)?sBHG \ I@u n#1(+?,/d4Y8/9;=rCE Ztv W r4WWyb".$(h*,057E9?QBF2 V !{#H),.467 @G- Toi)7Z &^ -"13<>` Q l ~)tn$&)R+ 25Q:{=COF' P  pP%*&(A /2T57Q:<8@F Nj Ms * %P(, 4i:uAD'G L &(/+1269p HR r D   Z$)+b-3 ;jA-C} C  PE #E()/1p3e7;>FCGR Ae > ! );-nu0 >0> 8 r?W>?>BJuD>Ej> = r=$=YA)2+-*12 9(:=zBG ;`  !')- 0M4n9 @T GR 0e q0e0er0et0eq0e?0e\r$0e%\0e'*-t2H 8:x -  <+!#'*Q/57;m=9AeCH# ,86M!U#)p+O,0 2.69? B *` q*`*`r(*`*`#%o)/2 57:`<>CG (DMv $ ,m 57:? ' =$&^*),148x $[ 2!%), 6<4?C " Hhn (*-605Z ;? ! 6 #&@ 0C17:?AG e\k! (,2697=>AEa  ~ rTW#%)+.5 4[7 @C9  V3q "& 05q69" ?qB7He `A=h. 'Q+/259 B u !&*, /R37;? F  [ UY+ #').g3p5 9;?GE'G/  o<M #X(+,.`255e7<BEG e ue7ereuewer!e"te%x'1* ,\p w) TVm$TheperformanceoftheBTreepackagedependsonmanyvariables:treesize,pagesize,accesspatterns,cachingbehaviorofthestorageprimitives,andothers.Afewgeneralstatementsaboutperformanceareofferedhere,followedbysomemeasuredresults.Itiscrucialthatthestorageimplementationincludeasubstantialamountofcaching,managedinanLRUfashion.Inmanysituations,theBTreepackagereferencesagivenpagemultipletimesinthecourseofasingleupdate.Evenignoringthis,inmostapplicationsitiscommonforclientstomakereferencestothesameorclosely-relatedentriesoverashortintervaloftime;theresultinglocalityofreferencehasaperformancebenefitonlywhenredundantstoragereadsareavoidedbyuseofacache.ThesizeofthecacherelativetothesizeoftheBTreecanhaveasubstantialeffectonperformance,astheresultsbelowshow.Additionally,itshouldbenotedthatforalargeBTreeitisdesirabletohavealargepagesize;givenareasonablylargeamountofspaceforacache,itisbettertohaveamodestnumberofrelativelylargepagesthanalargernumberofsmallpages.ThisisbecausethemaincostofperforminganoperationonalargeBTreestoredonadiskisdiskaccesstime(asopposedtotransfertime);andlargerpagesmeanmoreentriesperpage,ashallowertree,andfeweraccessesperoperation.WhenupdatingaBTree,atradeoffmustbemadebetweenthedesireforgoodperformanceandthedesiretomaintainaconsistentpermanentstateofthetree.Ifitisdesiredthatthepermanentstatebeconsistentbetweeneveryupdatethenmorewritesarerequired,thusreducingperformance.TheclientprogramcontrolsthistradeoffbyuseofthemaintainRecomputableStateargumentofBTree.OpenandtheStartLongUpdateandEndLongUpdateproceduresavailableinBTreeVM.Ofcourse,ifconsistencyisbeingmaintainedatalowerlevel,e.g.,bykeepingtheBTreeinanAlpinefileunderatransaction,thennoprovisionsneedbemadeattheBTreelevelformaintainingconsistency.ThemeasurementsbelowwereobtainedusingtheBTreeTesttool,whichisavailableasBTreeTest.bcdfromBTree.df.ThisprogramusesBTreeVMtomanageaBTreestoredinaPilottemporaryfile(whichisdeletedwhenthetoolisdestroyed).Itobtainsparametersfromtheuser,randomlyperformsvariousoperationsonthetree,anddisplaysanumberofperformancestatistics.RunningBTreeTestfromtheUserExecopensaviewercontainingacommandmenu,severaluser-adjustableparameters,andatableofresults.Theparametersare:DiskPages/BTreePagethenumberofPilotpages(256words)perlogicalBTreepage.Thelegalrangeis[1..16].CacheSizethenumberoflogicalBTreepageskeptinBTreeVM'scache[8..255].MaxTreeEntriesthemaximumnumberofentriesthetestprogramwilleverpermitthetreetocontain[100..65535].(Notethattheupperlimitisarestrictionofthetestprogram,notoneimposedbytheBTreepackageitself.)LongUpdatethisbooleanswitchcontrolswhetherornotBTreeVMistokeepthepermanentstateconsistentbetweenupdates.LongUpdate:yes''meansthatwritesareperformedonlywhennecessarytoobtainafreecacheentry.ValidateEveryUpdateturningonthisswitchcausestheentiretreetobecheckedaftereachupdate;thisisusefulonlyduringbasicdebuggingoftheBTreepackage,andisverycostly.AtestisstartedbyclickingStart,andisendedbyclickingStop.Theresultsareupdatedeveryfewsecondswhilethetestisrunning.ClickingInitTreeresetsthetreetoemptybeforethebeginningofatest.(ItisnecessarytodothisafterchangingDiskPages/BTreePageorMaxTreeEntries.)ResetStatsresetsthestatisticstozerobutdoesnotchangethetree.Theresultsare:Treesize,Levels,Entriesthesedescribethecurrentstateofthetree.ThetreesizeisinunitsoflogicalBTreepages.Operations,ms/opthesearethenumberofeachtypeofBTreeoperation(lookup,enumerate,insert,delete,replace)thathavebeenperformedandtheaveragenumberofmillisecondstakenbyeach.Inthecaseofenumerate,whatiscountedisthetotalnumberofentriesenumeratedandtheaveragetime3r b)  "Q')- 369<?C `HP $&+-?/4 ;_?< G; ^'[# \I  # %+,N 3B8q:O?EG Z  C"', 3648j;AEVG/ XZ2z/$m')Q, 457=O?DE WS TiI#g'+),o0 5%6:=0BG U_  4#N'-2Z58L=x?tACD S Go( #G% 't+.512 9=j?z G Qs M !^"'4)'- /23E6:<+=ClEHe O |9i l#)0+.1W268v: >,@CsD N$ 3!("T&B+p-,05w8:?%AEG L} 38v K$(*,/037:=3BD Jr1$5&*H+r147b;,@dB HD% W= $&/*/26B8<3 DWG/ F r #',(+V/013e86; =vDdG D K~!G$(+H184E9 BE CN!#hu%CN&CNr6CN7CN=+> F A P& $` +k1*2:6X   J"&*,O 3 ; c#&&)T0 3@7`8>@l : Nxl%',-1578<BE 8x \5S$ "3#(} /25W8>D 6  !2&^( 0 4?8jaQ% )+K0@ 79C@5D 2] s"#(+ 2u 0r=00 +!%,(,037^;?B~E ._ Xu +r+c+##&* +3 6u )< r)<)<E"$))+.36[9b=@9BD '  )#%&g ,.1X4: <?nEG/ %Qu #\ r.#\#\hI$*>,:.5799<?@F^ ! O %(+,{/V3d5<?C{   u |rN||!,%b)+/24G64;>AF  Xpn!'"%@)b.136. D ?c|uDDrDD"p#(;*Tu/D0dDr25D3D6;3=CF 9Hru$%r*T*.+03d59c=@)FHe  -`.T!ou'k(r5`6u7C8 r@A1uBCr N  z),"S&)Z,/4 u  r "%I* -9.1F47:j=>v@CE5 u ;r G U$%).,V.(2c8>6 El in$I').35 =ArCG 5 QF '"F#&)H.{0:4 AdFiw) TVm$/perentry.Aninsertisanupdatethatinsertsanentrynotpreviouslypresent,whileareplaceisanupdatethatreplacesanexistingentrywithanewonehavingthesamekey.Insertedandreplacemententrieshavesizesselectedatrandomfrom[2..33].CacheRefs,Hit%thetotalnumberofcallstoBTreeVM.ReferencePage,andthepercentageofthosecallsthataccessedapagealreadypresentinthecache.StorageReads,WritesthenumberoftimesBTreeVMtransferredaBTreepagebetweenthecacheandpermanentstorage.Writes/updatetheaveragenumberofstoragewritesperupdateoperation(insert,delete,orreplace).Totalelapsedtimethetotalnumberofsecondsthetesthasrun,excludingtestoverheadsuchasupdatingtheresultsintheviewer,butincludinganyconcurrentactivityelsewhereinCedar.%R+Wtime,ms/(RorW)thepercentageofthetotalelapsedtimespentwaitingforstoragereadsandwrites,andtheaveragenumberofmillisecondstakenbyeachone.Herearesomeresults,obtainedonaDorado:Testnumber1234567DiskPgs/BTreePg1111114CacheSize200200202020020050MaxTreeEntries2500250025002500600006000060000LongUpdateYesNoYesNoYesNoNoTreeSize190190190190436143601037Levels3333443Entries1961194919441940469134702747048ms/opLookup1.901.9138.826.272.855.040.1ms/opEnumerate0.080.081.821.682.752.990.84ms/opInsert4.7248.848.075.786.411971.9ms/opDelete6.6360.869.510112015485.9ms/opReplace3.9630.144.652.879.885.562.4Hit%1001008281818185Writes/update01.741.281.781.351.801.16%R+Wtime0879193949491ms/(RorW)21.717.018.727.428.328.8Afewthingsareworthnotingabouttheseresults.Test1usesacachelargeenoughtoholdtheentiretreeandneverwritesanypagestopermanentstorage(exceptattheendofthetest,whereitisn'tmeasured).ThisgivesagoodapproximationoftheCPUcostofoperationsperformedona3-levelBTree.Asisevidentfromcomparingtest1'sresultswiththeothers,normallymostofthetimerequiredforaBTreeoperationisspentwaitingforthedisk.Test2isthesameastest1exceptthatthepermanentstateiswrittentostorageattheendofeveryupdate.Tests3and4aresimilarto1and2exceptthatthecacheismuchsmallerthanthetree,somanymorereadsandwritesoccur.Interestingly,thechoiceofwhetherornottowritestorageaftereachupdatehasonlyamodesteffecton%R+Wtimeintests3and4.Thisispresumablybecausemostdirtypagesgetwrittenanywayasaresultofbeingdisplacedduringlaterreferencestootherpartsofthetree.(Thedecreaseinlookuptimefromtest3totest4isagoodindicationthatthisiswhatishappening.)Tests5through7areforamuchlargerBTree.Tests6and7showtheeffectofvaryingthepagesize,givenafixed-sizecache.Itisevidentthatthelargerpagesizeyieldsbetterperformanceinallrespects.Thevariationsintimeperreadorwriteacrossallthetestsarepartiallyexplainableasbeingrelatedtothesizeofthetree(alargertreemeansincreasedtimespentseekingbetweendifferentpagesofit).Variationsamongtestswithequal-sizetreesareprobablyPilotartifacts.Finally,itshouldbenotedthatthetestprogramprobestheBTreeatrandom.Realapplicationsmaybeexpectedtoexhibitmorelocalityintreereferences,andconsequentlyahigherhitrateandlesstime4r b) C?U #'x)-H/ 6;?A)F G `j!u$%(+0269?A ^\ $u \[ rG\[\[*r "%'7:= DE Z Cm#[% 'iu X5@rX5xX5!%X+ 238;I@CF Vu T rMTT   9!&*-17CEF KF]\A '*,/ IC _F u F rmF!-F(F0F8nF@/FGFu ErmE!-E(E0E8nE@/EGEu CqrmCq!-Cq(Cq0Cq8nCq@/CqGCqu A rmA!-A(A0A8nA@/AGAu @ rm@!-@(@0@8n@@/@G@u >vrm>v!->v(>v0>v8n>v@/>vG>vu <rm<!-<(<0<8n<@/<G<u ;#rm;#!-;#(;#0;#8n;#@/;#G;#u 9zrm9z!-9z(9z09z8n9z@/9zG9zu 7rm7!-7(7078n7@/7G7u 6'rm6'!-6'(6'06'8n6'@/6'G6'u 4~rm4~!-4~(4~04~8n4~@/4~G4~u 2rm2!-2(2028n2@/2G2u 1+rm1+!-1+(1+01+8n1+@/1+G1+u / rm/!-/(/0/8n/@//G/u -brm-!--(-0-8n-@/-G-u ,/2rm,/!-,/(,/0,/8n,/@/,/G,/ ) 'F $n),-0259_>P@C<E (    ',136/9; =@E F &a  &(+^.13 :ACE $:j!-#%)-/\39<>@C # ) Su !#&%  bk} ?#,%,/16Z8'<>A-DE /}M!n"%&+D.+0469>BDG E E<! ){+0]2579<=AFS  " "'*,/03e5Y8e9 AF2  7<|#c%(.3D6c <>BE}G/ O RuO8OrO(OW"8$%'S)*,+-90 69<=h@A  x%}"'+H,/<0t46d:4;@CAF )  s!R$ &*k-0]4H8@ @DAC  v"')+{.06b =?nCDG  sG(exb#)-L160;AEGh [ x) !$'$,0< 9j/"'+.T2m39= DG 5\ # *- 57@;=@CFiw) $TVm$Espentreadingandwritingstorage.5r b) Bw)TVm$ HELVETICA TIMESROMAN TIMESROMANLOGOLAUREL TIMESROMAN TIMESROMAN TIMESROMANY N # (/0j/31L BTree.tioga30-Mar-83 9:43:40