\input basic
% this is "newcycle.tex" of June 24, 1979 6:39 AM

\def\blackslug{\hjust{\hskip 10 pt
\vrule width 3pt height 6 pt depth 1.5 pt \hskip 1 pt}}

\def\boxit#1{\vjust{\hrule\hjust{\vrule\hskip 3pt
\vjust{\vskip 3pt #1 \vskip 3pt}\hskip 3pt\vrule}\hrule}}

\def\yskip{\penalty-50\vskip 3pt plus 6pt minus 2 pt}
\def\yyskip{\penalty-100\vskip 6pt plus 12pt minus 4 pt}
\def\sectskip{\vskip 15pt plus 3 pt minus 3pt}
\def\thmskip{\vskip 10pt plus 3 pt minus 3pt}
\def\lemskip{\vskip 8pt plus 3 pt minus 3pt}
\def\corskip{\vskip 8pt plus 3 pt minus 3pt}
\def\nobreak{\penalty 1000}

\font w=gacha10
\def\rm{\:p} \def\it{\:c} \def\bf{\:e} \def\fx{\:w}

\rm
\output{\hsize 5.75 truein\vsize 4.4 truein\save1\page}\eject
\output{\outa}
\def\outa{\output{\outb}\save1\page
\hsize 3.1875 truein\vsize 4.3 truein\jpar 1000}
\def\outb{\output{\outc}\save2\page}
\def\outc{\output{\outb}
\vjust{\vskip -0.25 truein\hjust{\hskip -0.15 truein
\vjust to 9.5 truein
{\baselineskip 0pt \lineskip 0pt
\hjust to 6.75 truein{\hfill\box1\hfill}
\vfill
\hjust to 6.75 truein{\box2\hfill\page}
\vskip 0.5 in\vskip -14 pt
\hjust to 6.75 truein{\hfill{\:p \count0}\hfill}
}}}
\advcount0
\vsize 9.0 truein}
\dispskip 8 pt plus 2 pt minus 5 pt
\baselineskip 14pt plus 1 pt minus 1 pt
\parindent 15pt
\parskip 5pt plus 1pt minus 1pt
\setcount0 1
{\small$\null$\hfill}\par
\ctrline{\hdr The Complexity of Finding Cycles in Periodic Functions}
\vskip 15pt
\ctrline{\vjust{\halign{\ctr{#}\qquad\qquad⊗\ctr{#}\qquad\qquad⊗\ctr{#}\cr
Robert Sedgewick*⊗Thomas G. Szymanski⊗Andrew C. Yao$↑\dag$\cr
{\it Brown University}⊗{\it Bell Laboratories}⊗{\it Stanford University}\cr
{\it Providence, Rhode Island}⊗{\it Murray Hill, New Jersey}⊗{\it Stanford, California}\cr}}}
\vskip 25pt
\ctrline{\bf Abstract}
\vskip 5pt
Given a function $f$ over a finite domain $D$ and an arbitrary starting point $x$, the sequence $x,f(x),f(f(x)), \ldots$ is ultimately periodic. Such sequences are typically used in the construction of random number generators. The {\it cycle problem} is to determine the first repeated element $f↑n(x)$ in the sequence. Previous algorithms for this problem have required $3n$ operations. In this paper we show that $n\biglp 1+O\biglp 1/\sqrt M\bigrp\bigrp$ steps are necessary and sufficient to solve the problem, if $M$ memory cells are available to store values of the function. Our treatment is novel in that we explicitly consider the performance of the algorithm as a function of the amount of memory available as well as the relative cost of evaluating $f$ and comparing sequence elements for equality.\par
\vskip 5pt
\ctrline{\vjust{\hrule width 4in}}\vfill
\eject
\botinsert{\vskip 10pt\hrule width 5pc \vskip 3pt\baselineskip 9pt
\hbox par size{\small * This work was done in part under support from the National Science Foundation, grant number MCS75-23738 and in part while this author was a Visiting Scientist at the Xerox Palo Alto Research Center, Palo Alto, California.}\vskip 3 pt\hbox par size{\small $\dag$ This work was done while this author was a Visiting Scientist at the Xerox Palo Alto Research Center, Palo Alto, California.}}
\noindent{\rm 1.\ \ I{\small NTRODUCTION}}\par
\nobreak
Suppose that we are given an arbitrary function $f$ which maps some finite domain $D$ into $D$. If we take an arbitrary element $x$ from $D$ and generate the infinite sequence $f↑0(x),f↑1(x),f↑2(x),\ldotss$, then we are guaranteed by the ‘‘pigeonhole principle’’ and the finiteness of $D$ that the sequence becomes cyclic. That is, for some $l$ and $c$ we have $l+c$ distinct values $f↑0(x),f↑1(x),\ldotss,f↑{l+c-1}(x)$ but $f↑{l+c}(x)=f↑l(x)$. This implies, in turn, that $f↑{i+c}(x)=f↑i(x)$ for all $i≥l$. The problem of finding this unique pair $(l,c)$ will be termed the {\it cycle problem} for $f$ and $x$. The integer $c$ is the {\it cycle length} of the sequence, and $l$ is termed the {\it leader length}. Similarly, the elements $f↑l(x),f↑{l+1}(x),\ldotss,f↑{l+c-1}(x)$ are said to form the {\it cycle} of $f$ on $x$ and $f↑0(x),f↑1(x),\ldotss,f↑{l-1}(x)$ are said to form the {\it leader} of $f$ on $x$. For notational convenience, the number $l+c$ of distinct values in the sequence will be denoted by $n$.\par
The cycle problem arises when analyzing the effectiveness of certain random number generators that produce successive ‘‘random’’ values by applying some function to the previous value in the sequence [1, Section 3.1]. Solving the cycle problem gives the number of distinct random numbers which are produced from a given seed. It is often possible to design functions that have null leaders and maximum cycles for all starting values, but such functions may be difficult to implement correctly. An algorithm for the cycle problem can be of use in checking the characteristics of an unknown random number generator.\par
One method for cycle detection has been given by Floyd [1, Exercise 3.1-7]. The idea is to have two variables taking on the values in sequence, one being advanced twice as fast as the other, as shown in the program of Figure 1.\par

\vskip 20pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$y←z←x\,$;\cr
{\bf repeat}\cr
\quad $y←f(y)\,$;\cr
\quad $z←f(f(z))\,$;\cr
{\bf until} $y=z\,$;\cr}}}
\vskip 5pt
\ctrline{{\bf Figure 1.} Floyd’s algorithm.}
\vskip 10pt
\noindent This algorithm stops with $y=f↑i(x)=f↑{2i}(x)=z$, where $i$ is the smallest positive multiple of $c$ which is greater than or equal to $l$. Each iteration of the loop performs three evaluations of $f$, so if $l=0$ then $3n$ function evaluations are performed (the loop stops with $i=c=n$), and if $c=1$ or $l=c+1$ then a total of $3(n-1)$ function evaluations are performed. This may be objectionable when the cost of evaluating $f$ is high relative to the cost of comparisons.\par
Another method, due to Gosper {\it et al.} [2], was designed to circumvent the overhead of advancing two independently operating ‘‘copies’’ of the generating function as required in Floyd’s method. Their method is to save certain values of the sequence in a small table and to search for each new value to see if it has previously been generated. The table update rule is to save the $i$th value generated in {\fx TABLE}[$j$], where $j$ is the number of trailing zeroes in the binary representation of $i$. This method can require as many as $l+2c$ function evaluations ($2n$ if $l=0$, and ${3 \over 2} n$ if $l=c$). Moreover, it requires an equal number of table searches, which is undesirable when the cost of comparisons is high relative to the cost of evaluating $f$.\par
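\yskip One possible rendering of this scheme, where $\nu(i)$ denotes the number of trailing zeroes in the binary representation of $i$, is the following (this is our reconstruction for illustration, not the actual program of [2]):\par
\vskip 10pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$y←x$;\cr
{\bf for} $i←1,2,3,\ldotss$ {\bf do}\cr
\quad {\bf begin}\cr
\qquad $y←f(y)$;\cr
\qquad {\bf if} $y$ occurs in {\fx TABLE} {\bf then stop};\cr
\qquad {\fx TABLE}[$\nu(i)$] $← y$;\cr
\quad {\bf end};\cr}}}
\vskip 10pt
\noindent Note that only $\lfloor\log↓2i\rfloor+1$ table positions are ever in use, but that every new value is compared against the entire table.\par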
These algorithms are suitable for detecting the existence of a cycle, and the value of $c$ can be found by proceeding around the cycle one additional time. Of course, this may be undesirable if $c$ is very large. Furthermore, neither algorithm has provision for directly finding $l$ except by starting back at the initial value.\par
In this paper, we develop an algorithm that solves the cycle problem using $n\biglp 1+O\biglp {1/\sqrt M}\bigrp \bigrp $ function evaluations in the worst case, where $M$ is the amount of memory available for storing generated function values. The number of memory operations used is $O\biglp {n/\sqrt M}+ M\log{{\textstyle n}\over {\textstyle M}}\bigrp $. The algorithm is developed in Section 2 in two parts: one algorithm detects the cycle and a companion algorithm recovers the values of $l$ and $c$. The worst case analysis is given in Section 3. In Section 4, we derive a lower bound which shows that no algorithm which uses the same fundamental operations can have better worst case performance. A generalization of the problem and some concluding remarks are offered in Section 5.\par
\sectskip\noindent{\rm 2.\ \ T{\small HE ALGORITHM}}\par
\nobreak
Any algorithm for the cycle problem which successively evaluates $f$ must have a running time exceeding $nt↓f$ where $t↓f$ is the (assumed constant) time to perform one evaluation of $f$. Clearly, an algorithm could be designed which achieves a running time of $nt↓f+O(n\log n)$ by employing, for example, a balanced tree scheme to save all the generated elements. Such an algorithm is unsatisfactory for at least two reasons. First, it is unrealistic to assume an unlimited supply of memory. Second, the analysis does not take into consideration the relative cost of evaluating $f$ and comparing two domain elements for equality. Let us therefore construct a framework in which these latter considerations can be addressed. We shall be particularly interested in the tradeoff between memory size and execution time.\par
Let {\fx TABLE} be an associative store capable of storing up to $M$ pairs $(y,i)$ of domain elements and integers. Both elements of the pair are {\it keys} in the sense that at most one pair can occur in {\fx TABLE} with a given first (or second) component. Let $t↓u$ be the time needed to insert or delete a pair from the {\fx TABLE}, i.e., to update the {\fx TABLE}, and let $t↓s$ be the time needed to search the {\fx TABLE} for a given key. Depending on the implementation of {\fx TABLE}, $t↓u$ and $t↓s$ might be constants, logarithmic functions of $M$ or even linear functions of $M$. (See Section 5.) As mentioned above, $t↓f$ is assumed to be a constant, and all other operations of the algorithm are assumed to be free.\par
Within this model, we are ready to consider an efficient algorithm for the cycle problem. The idea is to limit the number of operations performed on {\fx TABLE} by avoiding the store and search operations for most generated function values. Rather, these operations will be performed only sufficiently often to guarantee detection of the cycle. To this end, we shall introduce two parameters, $b$ and $g$. Figure 2 exhibits an algorithm which only stores every $b$th function value in {\fx TABLE} and which only does searches after every $g$ stores (at which point it does searches on $b$ consecutive values).\par

\vskip 20pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$i←0$;\cr
$y←x$;\cr
{\bf repeat}\cr
\qquad {\bf if} $i \eqv 0 \mod b$ {\bf then} {\it insert} $(y,i)$;\cr
\qquad $y←f(y)$;\cr
\qquad $i←i+1$;\cr
\qquad {\bf if} $i<b \mod {gb}$ {\bf then} {\it search} $(y,j)$;\cr
{\bf until} {\it found}\/;\cr}}}
\vskip 5pt
\ctrline{{\bf Figure 2.} Preliminary version of the algorithm.}
\vskip 10pt
\noindent Here the procedure {\it search} sets {\it found} to {\bf false} if $y$ is not in {\fx TABLE}; otherwise it sets {\it found} to {\bf true} and sets $j$ to the minimum value for which $(y,j)$ is in {\fx TABLE}. The procedure {\it insert} puts $(y,i)$ into {\fx TABLE} without checking to see if there is another entry with $y$ as the first component. The modulus computations in this program are used for clarity: an actual implementation would use simple counters instead.\par
The program must halt because once the cycle is reached, at least one value out of every block of $b$ consecutive values looked up must be in {\fx TABLE}. It is possible for the algorithm to overshoot the point at which the cycle first returns to itself, but the algorithm will always detect the cycle before the $(n+gb)$th evaluation of $f$. Since each iteration performs one evaluation of $f$, one iteration in every $b$ performs an {\it insert}, and $b$ iterations in every $gb$ perform a {\it search}, the running time is bounded by $$(n+gb)\left( t↓f+{t↓s\over g}+{t↓u\over b}\right). $$\par
It is interesting to note that a dual algorithm can be developed by interchanging the roles of {\it search} and {\it insert} in Figure 2. Many of the results of this paper can be carried through for the dual algorithm as well (with the roles of $b$ and $g$ interchanged). In fact, for many search strategies, {\it insert} might be implemented by doing a {\it search} first, so it is tempting to contemplate an algorithm which involves only one operation on {\fx TABLE}. However, the memory management and the analysis become quite complicated in this case, and it is convenient to keep the {\it search} and {\it insert} functions separated.\par
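\yskip For concreteness, the dual program mentioned above is obtained by exchanging the two conditional statements of Figure 2 (we sketch it here for illustration; it is not used in what follows):\par
\vskip 10pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$i←0$;\cr
$y←x$;\cr
{\bf repeat}\cr
\qquad {\bf if} $i<b \mod {gb}$ {\bf then} {\it insert} $(y,i)$;\cr
\qquad $y←f(y)$;\cr
\qquad $i←i+1$;\cr
\qquad {\bf if} $i \eqv 0 \mod b$ {\bf then} {\it search} $(y,j)$;\cr
{\bf until} {\it found}\/;\cr}}}
\vskip 10pt
\noindent Here blocks of $b$ consecutive values are stored and every $b$th value is looked up, so the roles played by the table update cost and the search cost in the running time bound are exchanged.\par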
\botskip 20pt\botinsert
{\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$i←0$;\cr
$y←x$;\cr
$b←1$;\cr
$m←0$;\cr
{\bf repeat}\cr
\qquad {\bf if} $i \eqv 0 \mod b$ {\bf and} $m=M$ {\bf then}\cr
\qquad\quad {\bf begin}\cr
\qquad\qquad {\it purge} $(b)$;\cr
\qquad\qquad $b←2b$;\cr
\qquad\qquad $m←m/2$;\cr
\qquad\quad {\bf end};\cr
\qquad {\bf if} $i \eqv 0 \mod b$ {\bf then}\cr
\qquad\quad {\bf begin}\cr
\qquad\qquad {\it insert} $(y,i)$;\cr
\qquad\qquad $m←m+1$;\cr
\qquad\quad {\bf end};\cr
\qquad $y←f(y)$;\cr
\qquad $i←i+1$;\cr
\qquad {\bf if} $i<b \mod {gb}$ {\bf then} {\it search} $(y,j)$;\cr
{\bf until} {\it found}\/;\cr}}}
\vskip 5pt
\ctrline{{\bf Figure 3.} The cycle detecting algorithm.}}
We could arrange to have the algorithm spend virtually all its time doing the (unavoidable) task of stepping $f$ by choosing $b$ and $g$ suitably, were it not for the fact that {\fx TABLE} will soon fill up. Accordingly, we introduce the following memory management mechanism: whenever {\fx TABLE} gets filled, remove every other entry from {\fx TABLE}, double $b$ and continue. This has the same effect as restarting the program from the beginning with the larger value of $b$, with no additional function evaluations being required. The algorithm thus adapts its behavior to the problem at hand. The final version of the algorithm is shown in Figure 3. Note that $b$ is now a variable of the algorithm, while $g$ is still a parameter. We shall see later how to best choose the value of $g$.\par
Memory management is controlled with a variable $m$, which counts the number of items currently in {\fx TABLE}, and a procedure {\it purge} $(b)$, which removes all entries $(z,j)$ with $j/b$ odd from {\fx TABLE}.\par
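\yskip In the notation of Figure 3, {\it purge} might be rendered as follows (this sketch is ours; every index $j$ in {\fx TABLE} is a multiple of the current $b$ when {\it purge} is called, so $j/b$ is an integer):\par
\vskip 10pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
{\bf procedure} {\it purge}$(b)$;\cr
\quad {\bf for each} pair $(z,j)$ in {\fx TABLE} {\bf do}\cr
\qquad {\bf if} $j/b$ is odd {\bf then} {\it delete} $(z,j)$;\cr}}}
\vskip 10pt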
This algorithm does not perform exactly as if the simple algorithm of Figure 2 had been run from the beginning with the final value of $b$, because of subtle interactions between the {\it search}, {\it insert}, and {\it purge} procedures. To make the algorithm perform properly, we need to introduce suitable restrictions on the choice of $g$, as shown in the following lemmas.\par
\lemskip\noindent{\bf Lemma 1.} The value of $i$ when {\it purge} is called satisfies $i=2↑kM$ for some integer $k≥0$.\par
\noindent{\it Proof:} By induction on the number of memory purges, it is clear that the $(k+1)$st call to {\it purge} happens with $i=2↑kM$, and changes $b$ from $2↑k$ to $2↑{k+1}$. That is, purges occur when $i=bM$.\blackslug\par
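\yskip For example, with $M=4$ the purges occur at $i=4,8,16,32,\ldotss$, changing $b$ from $1$ to $2$, from $2$ to $4$, and so on; in each case $i=bM$ at the moment of the call.\par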
\lemskip\noindent{\bf Lemma 2.} The value of $b$ when {\it search} is called is $i/M$ rounded up to the next power of $2$.\par
\noindent{\it Proof:} As in the proof of Lemma 1, we have $b=2↑{k+1}$ for $2↑kM<i≤2↑{k+1}M$.\blackslug\par
\lemskip\noindent{\bf Lemma 3.} Whenever {\it search}, {\it insert}, and {\it purge} are called during the execution of the cycle detecting algorithm, we have $(f↑j(x),j)\in$ {\fx TABLE} if and only if $0≤j<i$ and $j \eqv 0 \mod b$.\par
\noindent{\it Proof:} Obvious by the definition of {\it purge}. \blackslug\par
\lemskip\noindent{\bf Lemma 4.} If $M≥g+3$, then the cycle detecting algorithm terminates with $$n≤i<n+(g+3)b↓n,$$ where $b↓n$ is $n/M$ rounded up to the next power of $2$ (i.e., the value of $b$ when $i$ is $n$).\par
\noindent{\it Proof:} Since we must have $i>j$ and $f↑i(x)=f↑j(x)$ at termination, it is clear from the definition of $n$ that $i$ cannot be less than $n$.\par
Let $i↓0$ be the unique multiple of $gb↓n$ in the range $n≤i↓0<n+gb↓n$, and let $i↓1$ be the unique multiple of $2gb↓n$ in the range $n≤i↓1<n+2gb↓n$. Two cases now arise, depending on whether a {\it purge} occurs between the time $i=n$ and the time the algorithm terminates.\par
By Lemma 1, a {\it purge} cannot occur after $i=n$ until $i=b↓nM$. Therefore, if $i↓0+b↓n≤b↓nM$, then {\it search} is called for $i↓0,i↓0+1,\ldotss,i↓0+b↓n-1$. Since $i↓0≥n$, the values looked up are the same as the $b↓n$ values starting at $i↓0-c$ and, by Lemma 3, one of them must be in {\fx TABLE}, so the algorithm must halt with $i≤i↓0+b↓n<n+(g+1)b↓n$.\par
If $b↓nM<i↓0+b↓n$ then a {\it purge} will occur before the full block at $i↓0$ is checked for matches. But this {\it purge} will set $b$ to $2b↓n$, and {\it search} will be called for $i↓1,i↓1+1,\ldotss,i↓1+2b↓n-1$, provided that a second {\it purge} does not occur. As before, one of the values $i↓1-c,\ldotss,i↓1+2b↓n-1-c$ must be in {\fx TABLE}, causing the algorithm to halt with $i≤i↓1+2b↓n<n+(g+3)b↓n$. The condition $M≥g+3$ ensures that a second {\it purge} does not occur. \blackslug\par
These lemmas describe the performance of the algorithm well enough for us to calculate its worst case running time. Intuitively, we expect that the time required should decrease as $g$ and $M$ increase. A small search interval implies frequent {\it search}\/es but a short overshoot past the beginning of the cycle; a large interval limits the number of {\it search}es but at the risk of allowing a long overshoot.\par
Before proceeding to the detailed analysis we need to finish the solution to the problem: determine $l$ and $c$ once the cycle has been detected.\par
The cycle detecting algorithm halts as soon as it discovers a pair $i>j$ of integers for which $f↑i(x)=f↑j(x)$. This implies that $j≥l$ and $i\eqv j\mod c$, but we need to use our stored {\fx TABLE} values to find the exact values of $l$ and $c$. The variety of possible situations makes it necessary to design this part of the algorithm carefully. Figure 4 shows a companion algorithm to the algorithm of Figure 3 which recovers the solution $(l,c)$ once the cycle detecting algorithm has terminated.\par

\vskip 20pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$i↑\prime ←gb\lfloor i/gb\rfloor -gb$;\cr
{\bf if} $i↑\prime>j$\cr
\quad {\bf then} $c←i-j$;\cr
\quad {\bf else} $c←$smallest $c$ with $f↑j(x)=f↑{j+c}(x)$;\cr
{\bf if} $i↑\prime<c$ {\bf then} $i↑\prime←c$;\cr
$j↑\prime ←i↑\prime-c$;\cr
$l←j↑\prime+$smallest $l↑\prime$ with $f↑{j↑\prime +l↑\prime}(x) =f↑{i↑\prime +l↑\prime}(x)$;\cr}}}
\vskip 5pt
\ctrline{{\bf Figure 4.} The recovery algorithm.}
\vskip 10pt
\noindent The last statement in this program may require evaluating $f$ starting at a point which is not in {\fx TABLE}. This can be done with little extra overhead. For example, $f↑{j↑\prime}(x)$ can be found by doing a {\it search} for $f↑{b\lfloor j↑\prime/b\rfloor}(x)$ and then applying $f$ exactly $j↑\prime \mod b$ times. This takes at most $t↓s+2b↓nt↓f$ time because the final value of $b$ is either $b↓n$ or $2b↓n$.\par
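\yskip Since both components of a {\fx TABLE} pair serve as keys (Section 2), one way to carry this out is the following sketch:\par
\vskip 10pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
retrieve the pair $(z,b\lfloor j↑\prime/b\rfloor)$ from {\fx TABLE};\cr
{\bf repeat} $j↑\prime \mod b$ {\bf times:} $z←f(z)$;\cr
{\bf comment} now $z=f↑{j↑\prime}(x)$;\cr}}}
\vskip 10pt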
\lemskip\noindent{\bf Lemma 5.} The recovery algorithm correctly finds $c$.\par
\nobreak
\noindent{\it Proof:} Since $f↑i(x)=f↑j(x)$, we have $i\eqv j \mod c$, that is, the cycle size $c$ divides $i-j$. If $i↑\prime≤j$, then $i-j<gb+b≤(g+1)2b↓n$ and the algorithm proceeds by brute force. If $i↑\prime >j$, then an entire block of unsuccessful {\it search}\/es for $f↑{i↑\prime}(x),\ldotss,f↑{i↑\prime+b-1}(x)$ was performed between $j$ and $i$, and the cycle must have been traversed only once. Thus $c$ is exactly $i-j$.\blackslug\par
\lemskip\noindent{\bf Lemma 6.} The recovery algorithm correctly finds $l$.\par
\nobreak
\noindent{\it Proof:} Clearly $i↑\prime <l+c≤i$ or else the algorithm would have terminated earlier. Since $j↑\prime\eqv i↑\prime\mod c$, proceeding forward in synchronization from $j↑\prime$ and $i↑\prime$ will lead to the first duplicate value.\blackslug\par
It is possible to design faster recovery procedures for many situations. For example, $c$ could be found by applying ‘‘divide and conquer’’ to the prime decomposition of $i-j$, and $l$ could be found by a binary search procedure. However, the recovery time is heavily dominated by the cycle detection time, so such sophisticated implementations might not be worth the effort.\par
\sectskip\noindent{\rm 3.\ \ W{\small ORST CASE ANALYSIS}}\par
\nobreak
The algorithms of the previous section can provide an efficient solution to the cycle problem if the parameter $g$ is chosen intelligently. In this section, we shall analyze the running time of the algorithms to find the best choice of $g$. We shall concentrate on choosing a value of $g$ which minimizes the worst case running time.\par
To begin, we need to add up all the costs involved when the algorithms are run to produce the solution $(l,c)$ to a particular instance $(f,x)$ of the cycle problem.\par
\thmskip\noindent{\bf Theorem 1.} In the worst case, the running time of the algorithms to solve the cycle problem is $$n\left( 1+{2g+6\over M}\right) \left( t↓f+{t↓s\over g}\right) +t↓uM\log↓2{4\sqrt2n\over M},$$ provided that $M≥g+3$. The additional time required to find the values of $l$ and $c$ is at most $$n{4(3g+1)\over M}t↓f+t↓s.$$ \par
\noindent{\it Proof:} For the cycle detecting algorithm, the first term follows immediately from Lemma 4, since $b↓n$ could be as large as $2n/M$. For the second term, we need to count $t↓u$ for each element remaining in {\fx TABLE} (for its insertion) and $2t↓u$ for elements entered in {\fx TABLE} and subsequently removed in {\it purge}. The algorithm either performs $\log↓2b↓n+1$ {\it purge}\/s and finishes with a half full {\fx TABLE} for a total cost of $t↓u({1\over 2}M+M(\log↓2b↓n+1))$, or else it performs $\log↓2b↓n$ {\it purge}\/s and finishes with a {\fx TABLE} which is more than half full, for a total worst case cost of $t↓u(M+M\log↓2b↓n)$. The given bound follows from Lemma 4 as above. Note that the worst case is achieved for infinitely many values of $l$ and $c$.\par
In the recovery algorithm, $gb$ function evaluations could be required to find $c$, and $2gb$ function evaluations may be needed to find $l$, with an additional $b$ function evaluations and $1$ table search needed to find $f↑{j↑\prime}(x)$ or $f↑c(x)$. The final value of $b$ is either $b↓n$ or $2b↓n$, and is therefore bounded by $4n/M$.\blackslug\par
\corskip\noindent{\bf Corollary.} If $t↓f=O(1)$ and $t↓s=O(M)$ then the total running time of the algorithms to solve the cycle problem will be less than $$nt↓f\left( 1+O\left(\sqrt{t↓s\over M}\right) \right) ,$$if $g$ is chosen to be the closest power of two to $\sqrt {Mt↓s/14t↓f}$.\par
\noindent{\it Proof:} Calculating the total running times given in Theorem 1 to within $O(n/M)$ gives the expression $$n\left( {14gt↓f\over M}+{t↓s\over g}+O\left( {t↓s\over M}\right) \right).$$ Differentiating, we find that the given choice of $g$ minimizes the total running time to $$n\left( t↓f+2\sqrt{14t↓st↓f\over M}+O\left( {t↓s\over M}\right) \right),$$ which implies the stated result. The asymptotic result is not affected by restricting $g$ to be an integer.\blackslug\par
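\yskip As a rough numerical illustration (with $t↓s=t↓f$ and $M=2↑{14}$ assumed purely for concreteness), the rule of the corollary gives $\sqrt{Mt↓s/14t↓f}=\sqrt{16384/14}\approx 34$, so $g$ would be taken to be $32$, and the relative overhead $2\sqrt{14t↓st↓f/M}/t↓f$ comes to less than six percent.\par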
Note, in particular, that a balanced tree implementation will have $t↓s=O(\log M)$, and an address calculation (hashing) method could have $t↓s=O(1)$. In both cases, the worst case running time will approach $nt↓f$ as $M$ gets large. For the typical case where $l+c>>M$, the algorithms can be made to run in very nearly optimal time by increasing $M$. In fact, we shall see in the next section that the algorithms are optimal in a much more precise sense.\par
\sectskip\noindent{\rm 4.\ \ O{\small PTIMALITY}}\par
\nobreak
The cycle detecting algorithm given above can be thought of as a family of algorithms (parameterized by $g$) that exhibits a tradeoff phenomenon between the two types of basic operations. That is, in the range $1≤g≤M-3$, if we are willing to use $O(ng/M)$ extra function evaluations in addition to the $n$ required to evaluate $f↑n(x)$ then we need only use $O(n/g)$ table searches. The corollary to Theorem 1 shows that $g$ can be chosen to allow the cycle problem to be solved in time $nt↓f( 1+O(\sqrt{t↓s/ M}))$. In this section, we shall show these results to be optimal in the sense that no algorithm which does successive function evaluations and has a limited amount of memory to store generated function values can do better.\par
We shall consider the problem of cycle detection, i.e., finding $i≠j$ such that $f↑i(x)=f↑j(x)$. The lower bound results of course apply to the general cycle problem, because cycle detection is a weaker problem.\par
{\bf The Model.} An algorithm $A$ uses an array $T[1],\ldotss,T[M]$; each cell can store a value in the domain $D$. Initially, $T[1]=x$ and all other $T[i]$ are null. The algorithm can make two types of moves: an F-move picks $i,j$ (which may be equal) and sets $T[j]←f(T[i])$, and an S-move picks $i$ and tests ‘‘Is there a $j≠i$ such that $T[j]=T[i]$?’’. The computation executes one move at each time $t=1,2,\ldotss$, and halts as soon as an S-move results in a ‘‘yes’’ answer. The algorithm is equipped with a tape that can record the entire history of the computation (the values of $i,j$ in the F-moves, and the $i$ values and results in the S-moves), and the choice of the next move can depend on all the information recorded on the tape. Of course, no values from $D$ can be stored on the tape, and as we shall see, the other information will be of little use to the algorithm.\par
For any instance $(f,x)$ of the cycle problem, let $F↓{(f,x)}(A)$ and $S↓{(f,x)}(A)$ be the number of F-moves and S-moves used by $A$; the running time is of course $F↓{(f,x)}(A)t↓f+S↓{(f,x)}(A)t↓s$. In keeping with this, we shall use the notation $n↓{(f,x)}$ to denote the sum of the leader length and the cycle length for an instance $(f,x)$ of the cycle problem.\par
The following theorem gives explicit bounds on the tradeoff required between function evaluations and table searches for any algorithm for the cycle problem.\par
\thmskip\noindent{\bf Theorem 2.} Let $k,n↓0$ be fixed, and $A$ be an algorithm that satisfies $F↓{(f,x)}(A)≤(1+k)n↓{(f,x)}$ for all $(f,x)$ with $n↓{(f,x)}≥n↓0$. Then $$S↓{(f,x)}(A)≥{n↓{(f,x)}\over 4kM(1+2k)}$$ for all $(f,x)$ with $n↓{(f,x)}$ sufficiently large.\par
\noindent{\it Proof:} Consider the algorithm $A$ working on an input instance $(f,x)$ with $n↓{(f,x)}=∞$. Let $t↓m$ be the time when $f↑m(x)$ is first evaluated (stored in the array $T$), and let $s(m,m↑\prime)$ denote the number of S-moves made in the time interval $[t↓m,t↓{m↑\prime})$. The method of proof will be to bound $s(m,m↑\prime)$ for appropriate $m,m↑\prime$, then sum these results over a large range of intervals to prove the theorem.\par
First, we shall show that if $m≥n↓0$ and $m↑\prime>(1+k)m$ then we must have $$s(m,m↑\prime)≥{1\over 4(M-1)}{m↑2\over m↑\prime}.$$ To prove this, suppose that the algorithm $A$ has not yet halted at time $t↓{m↑\prime}$. Then the S-moves made so far must have given enough information to conclude that it is impossible to have $n↓{(f,x)}=m$; otherwise we would have $$F↓{(f,x)}(A)≥m↑\prime >(1+k)m=(1+k)n↓{(f,x)}.$$ All the S-moves contributing to this conclusion clearly are made in the time interval $[t↓m,t↓{m↑\prime})$, so it remains to count the number of S-moves necessary to discount all possible cycle sizes. The algorithm must ‘‘prove’’ that the cycle size is not $c$ for all $c$ in the range $0<c≤m$, otherwise it could not conclude that $n↓{(f,x)}≠m$. It will be convenient to deal only with those values of $c$ in the range $\alpha m≤c≤m$, with $0<\alpha<1$. For each such $c$, let $f↑{j↓c}(x)≠f↑{{j↓c}↑\prime}(x)$ with $j↓c<{j↓c}↑\prime<m↑\prime$ be the ‘‘proof’’ that it is impossible to have $n↓{(f,x)}=m$ and cycle size $c$. Then ${j↓c}↑\prime-j↓c = h↓cc$ for some integer $h↓c≥1$. Since ${j↓c}↑\prime<m↑\prime$ and $c≥\alpha m$, we must have $h↓c≤m↑\prime/ \alpha m$. Thus, each inequality $f↑{j↓c}(x)≠f↑{{j↓c}↑\prime}(x)$ can serve as the ‘‘proof’’ for at most $m↑\prime/\alpha m$ cycle sizes in the range $\alpha m≤c≤m$. Because each S-move can supply only $M-1$ inequalities, we must have $$(M-1){m↑\prime\over\alpha m}s(m,m↑\prime)≥(1-\alpha)m.$$ The given bound on $s(m,m↑\prime)$ follows by taking $\alpha = {1\over 2}$.\par
Next, we shall show that the number of S-moves made before time $t↓{n↓{(f,x)}}$ is already $n↓{(f,x)}/4kM(1+2k)$ or more, if $n↓{(f,x)}$ is large enough. Since we know that algorithm $A$ will not halt before $t↓{n↓{(f,x)}}$, this suffices to prove the theorem.\par
For $n↓{(f,x)}≥n↓0$, define $t$ to be the largest integer with $\lceil n↓{(f,x)}/(1+k)↑t \rceil≥n↓0$, and define $m↓0,m↓1,\ldotss,m↓t$ by $m↓i=\lceil n↓{(f,x)}/(1+k)↑{t-i} \rceil$. Note that $n↓0≤m↓0≤m↓1≤\ldots≤m↓t≤n↓{(f,x)}$. Clearly we must have $$\eqalign{S↓{(f,x)}(A)⊗≥\sum↓{0≤i<t}s(m↓i,m↓{i+1})\cr
⊗≥{1\over 4(M-1)}\sum↓{0≤i<t}{{m↓i}↑2\over m↓{i+1}}\cr
⊗≥{1\over 4(M-1)}{n↓{(f,x)}-n↓0\over 1+2k}.\cr}$$ The last inequality follows, for sufficiently large $n↓{(f,x)}$, from the inequalities $m↓{i+1}≤(1+2k)m↓i$ for $0≤i<t$ and $k(m↓0+m↓1+\ldots + m↓{t-1})≥n↓{(f,x)}-n↓0$. For sufficiently large $n↓{(f,x)}$, this implies that $$S↓{(f,x)}(A)≥{n↓{(f,x)}\over 4kM(1+2k)},$$ which completes the proof. \blackslug\par
\corskip\noindent{\bf Corollary 1.} Let $A$ be an algorithm for the cycle detection problem which uses a table of size $M$, and let $g$ be a parameter in the range $1≤g≤M$. Suppose that $A$ uses at most $n↓{(f,x)}(1+{g/M})$ function evaluations for all instances $(f,x)$ of the cycle problem with $n↓{(f,x)}$ sufficiently large. Then $A$ must do at least $n↓{(f,x)}/12g$ table searches for all instances $(f,x)$ of the cycle problem with $n↓{(f,x)}$ sufficiently large.\par
\noindent{\it Proof:} The result follows directly from the theorem by taking $k=g/M$.\blackslug\par
\corskip\noindent{\bf Corollary 2.} If $t↓f=O(1)$ and $t↓s=O(M)$ then the total running time of any algorithm to solve the cycle detection problem must be at least $$nt↓f\left( 1+\Omega\left(\sqrt{t↓s\over M}\right) \right)$$ for infinitely many $n$.\par
\noindent{\it Proof:} Let $c↓1,c↓2$ be constants such that $t↓f≤c↓1$ and $t↓s≤c↓2M$, and let $g=\sqrt{t↓sM/c↓2}$. For any cycle detecting algorithm $A$, there are two cases: (i) Algorithm $A$ uses at least $n(1+{g/M})$ F-moves for infinitely many $n$, which implies a running time exceeding $$nt↓f\left( 1+{1\over\sqrt{c↓2}}\sqrt{t↓s\over M}\right)$$ for infinitely many $n$. (ii) Algorithm $A$ uses at most $n(1+{g/M})$ F-moves for all large $n$, which implies, by Theorem 2, a running time exceeding $$nt↓f+{nt↓s\over 12g}≥nt↓f\left( 1+{\sqrt{c↓2}\over 12c↓1}\sqrt{t↓s\over M}\right)$$ for all instances of the problem with $n↓{(f,x)}$ sufficiently large. The stated result follows immediately. Note that the constant in the $\Omega$ term depends on the constants in $t↓f=O(1)$ and $t↓s=O(M)$.\blackslug\par
The above results are asymptotic statements about the performance of algorithms for instances $(f,x)$ of the cycle problem with $n↓{(f,x)}$ large. Of course, it is implicit in these statements that the size of the domain $D$ must also be large, since we have the constraint that $n↓{(f,x)}≤\leftv D\rightv$. If $\leftv D\rightv$ is known to be small, the algorithm can obviously make use of that information.\par
The lower bounds derived in this section shed some light on the possibility of extending further the algorithms in Section 2. The cycle detecting algorithm of that section has a running time of $$(1+c↓1{g\over M})nt↓f+c↓2{nt↓s\over g}$$ for small constants $c↓1,c↓2>0$, but only under the restrictions that $g$ is $O(M)$ and $\Omega (1)$. It is of interest to inquire whether the range for $g$ can be extended. For example, if we are willing to use many more, say, $Mn$ extra function evaluations, can this decrease the number of table searches to $O(n/M↑2)$? Theorem 2 provides part of the answer, since it says that, for $g≥M$, even with an extra $O(gn/M)$ function evaluations, one still has to use $\Omega (Mn/g↑2)$ table searches for large $n$. We do not know if this can be achieved.\par
For small $g$, the question of extendability has a more definite answer: a lower bound on the number of function evaluations can be derived which is independent of $g$. One can show that, for any algorithm $A$ which uses memory $M$ and any $n>4M$, there exists an instance $(f,x)$ of the cycle problem with $n↓{(f,x)}=n$ such that $$F↓{(f,x)}(A)≥(1+{1\over 2M})n-1.$$ This can be proved by considering the contents of the array $T$ at time $t↓n$ and suitably choosing the cycle length. We leave the details to the reader.\par
Furthermore, this last statement and Theorem 2 can be combined to prove that the tradeoff of Theorem 2 must hold in a slightly stronger sense: For any algorithm $A$ which uses memory $M$, there exists a constant $g>{1\over 2}-\epsilon$ such that, for infinitely many $n$, there is an instance $(f,x)$ of the cycle problem with $n↓{(f,x)}=n$ for which $A$ has a running time no less than $$nt↓f(1+{g\over M})+c↓{\epsilon}{nt↓s\over \dispstyle g(1+{g\over M})},$$ where $\epsilon$ is a small positive constant and $c↓{\epsilon}$ is a constant depending only on $\epsilon$. We omit the details of this proof. As mentioned above, it is an open question whether these bounds can be achieved when $g>>M$.\par
\sectskip\noindent{\rm 5.\ \ C{\small ONCLUDING REMARKS}}\par
\nobreak
We have dealt exclusively with algorithms with good worst case performance for solving a particular instance of the cycle problem on an unknown function. The problem is also interesting under other variations of the model.\par
If we take the function to be {\it random} in some sense, then we can talk about an average case measure of complexity. R. W. Floyd has pointed out that studying the probabilistic structure of random functions over $D$ could lead to savings on the average. For example, $l$ is relatively large in this case, so it may not be worthwhile to save or search for values at the beginning.\par
Another variation is to consider the case where the cost of computing $f↑j(x)$ is $t↓f$, independent of $j$ and $x$. A famous factoring algorithm due to Shanks [4] is based on this variant of the problem. Shanks’ solution, which is based on the additional knowledge that $l=0$ and that $n=c$ is bounded by some constant $N$, finds $n$ in time proportional to $\sqrt N$ using memory proportional to $\sqrt N$. The method is similar to the dual algorithm mentioned in Section 2: save the first $\sqrt N$ values generated, then do table searches at intervals of $\sqrt N$. The cycle detecting algorithm of Figure 3 also works in this case, but it runs slightly longer due to its need to discover that $l=0$ and its need to adapt to the value of $c$. A rough analysis for this case goes as follows. From the program, we see that the function evaluation cost will be about the same as the table update cost, so the expression for the total running time in Theorem 1 tells us that the best thing to do is to pick $g$ to be as large as possible (about $M$), for a running time of $$O\left({nt↓s\over M}+(t↓u+t↓f)M\log↓2{n\over M}\right).$$ This expression is minimized to $O(\sqrt{N\log↓2N})$ by choosing $M$ to be $O(\sqrt{N/\log↓2N})$ (if enough memory is available and we know that $n$ is bounded by $N$).\par
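\yskip By way of illustration, Shanks’ save-then-search idea can be sketched in the notation of Figure 2 (this rendering is ours, and it assumes $l=0$, $n=c≤N$, and that a power $f↑s$ can be applied at cost $t↓f$, as in this variant of the problem); here $s=\lceil\sqrt N\rceil$:\par
\vskip 10pt
\ctrline{\vjust{\baselineskip 11pt
\halign{\lft{#}\cr
$y←x$;\cr
{\bf for} $i←0,1,\ldotss,s-1$ {\bf do}\cr
\quad {\bf begin} {\it insert} $(y,i)$; $y←f(y)$; {\bf end};\cr
{\bf repeat}\cr
\quad {\it search} $(y,j)$;\cr
\quad {\bf if not} {\it found} {\bf then} $y←f↑s(y)$;\cr
{\bf until} {\it found}\/;\cr}}}
\vskip 10pt
\noindent The first successful {\it search} occurs with $y=f↑{ks}(x)=f↑j(x)$ for some $k≤\lceil c/s\rceil$, so $c$ divides $ks-j$; both the memory used and the number of operations are proportional to $\sqrt N$.\par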
In general, the domain $D$ can be partitioned by $f$ into disjoint sets with the property that all points in each set lead to the same cycle. Properties of the {\it cycle structure} (i.e. the number of sets, their sizes, or the sizes of the cycles) can be found by solving the cycle problem on all points of $D$. The algorithms of this paper can be adapted to avoid retraversing long cycles by maintaining versions of {\fx TABLE} for each cycle.\par
It is also possible to generalize the problem of finding cycles in the following way. As before, we are given a unary function $f$, only now we allow the domain of $f$ to be infinite. We suppose that we are also given a binary predicate $P$ on $D\times D$. The problem is to find the smallest $n$ for which there exists an $l<n$ such that $P(f↑n(x),f↑l(x))$. In the absence of any further information, it is easy to show that this problem requires ${n\choose 2}$ evaluations of $P$. However, if $P$ is {\it preserved} by $f$, that is, $P(a,b)$ implies $P(f(a),f(b))$, then the algorithms of this paper can be made to run in time $n(t↓f+O(t↓p))$, where $t↓p$ is the time needed to evaluate $P$. Since the domain $D$ may be infinite, the sequence need not be periodic, but the cycle will still be detected if one exists. It is interesting to note that the algorithms of [1] and [2] simply do not work for this problem.\par
An earlier version of this paper [3] left open the question of whether an algorithm exists for the cycle problem which uses a finite amount of memory and an optimal number of function evaluations. We have resolved this question in Section 4, with the somewhat surprising result that the algorithm given in Section 2 may be viewed as optimal for the range of the parameters of the problem for which it is applicable.\par
\def\ref \num#1\full#2\refend {\hbox to size{\hfill\hbox par 2.85truein{\hjust to 0.0in {$\hskip -0.3375truein \hjust{#1} \hskip 0pt plus 1000pt minus 1000pt$}#2}}}
\sectskip\noindent{\rm 6.\ \ R{\small EFERENCES}}\par
\nobreak
\vskip 15pt
\ref \num{[1]}\full{Knuth, D. E. {\it Seminumerical Algorithms}, (Vol. 2 of {\it The Art of Computer Programming}), Addison-Wesley, 1969.}\refend\par
\vskip 10pt
\ref \
num{[2]}\full{Gosper, R., {\it et al.}, ‘‘HAKMEM,’’ M.I.T. Artificial Intelligence Lab Report No. 239, Feb. 1972, item 132.}\refend\par
\vskip 10pt
\ref \
num{[3]}\full{Sedgewick, R. and Szymanski, T. G. ‘‘The Complexity of Finding Periods,’’ {\it Proceedings of the 11th Symposium on the Theory of Computing}, Atlanta, Georgia, 1979.}\refend\par
\vskip 10pt
\ref \
num{[4]}\full{Shanks, D. ‘‘Class Number, A Theory of Factorization, and Genera,’’ {\it Proceedings of Symposia in Pure Math.}, American Math. Society, Providence, R. I., 1970.}\refend\par
\vfill\eject\end