Fundamentals of Algorithm Analysis

Correctness of Algorithms

Definition

When we talk about the correctness of an algorithm, we actually mean the correctness with respect to its specification.

The specification expresses the task to be done by the algorithm. It consists of:

  • (optional) the name of the algorithm and the list of its arguments
  • Precondition (or initial condition): it specifies what counts as correct input data for the problem
  • Postcondition (or final condition): it specifies what the desired result of the algorithm is

Example:

  • name: $Sort(A)$
  • input: An array $A$ of $n$ integers
  • output: A permutation of that array $A$ that is sorted (monotonic); a checkable sketch of this specification follows.
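
To make the specification concrete, here is a minimal Python sketch (not part of the original notes) that checks an arbitrary sorting routine against the pre- and postcondition; `is_sorted` and `check_sort_spec` are illustrative helper names:

```python
from collections import Counter

def is_sorted(a):
    """Postcondition helper: is the list in monotonic (non-decreasing) order?"""
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))

def check_sort_spec(sort, a):
    """Check a sorting routine against the Sort(A) specification on one input."""
    # Precondition: the input is a sequence of integers.
    assert all(isinstance(x, int) for x in a), "precondition violated"
    result = sort(list(a))
    # Postcondition: the output is a sorted permutation of the input.
    assert is_sorted(result), "output not sorted"
    assert Counter(result) == Counter(a), "output not a permutation of the input"
```

For example, `check_sort_spec(sorted, [3, 1, 2])` passes silently, while a buggy sort would trip one of the assertions.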

Total correctness(完全正确性)

An algorithm is called totally correct for the given specification if and only if for any correct input data it:

  1. terminates
  2. returns correct output

Usually, while checking the correctness of an algorithm it is easier to separately:

  • Check whether the algorithm stops
  • Then check the remaining part; this remaining part of correctness is called the "Partial Correctness" of the algorithm

Partial correctness(部分正确性)

An algorithm is called partially correct if it satisfies the following condition: if the algorithm, given correct input data, stops, then its result is correct.

Note: partial correctness by itself does not guarantee that the algorithm stops.

Total correctness:

  • precondition: $x = 1$
  • algorithm: $y \coloneqq x$
  • postcondition: $y = 1$

Partial correctness:

  • precondition: $x = 1$
  • algorithm: `while (true) x := 0` (the loop never terminates, so the claim "if it stops, the result is correct" is vacuously true)
  • postcondition: $y = 1$

Neither partial nor total correctness:

  • precondition: $x = 1$
  • algorithm: $y \coloneqq x$
  • postcondition: $y = 2$

The proof of total correctness

A proof of total correctness of an algorithm usually consists of 2 separate steps:

  1. (to prove that) the algorithm always terminates for correct input data
  2. (to prove that) the algorithm is partially correct.

Different proof methods are typically used for these two steps:

  • Variants(变式) for "termination"
  • Invariants(不变式) for "partial correctness"

Termination is often much easier to prove.

Example: Insertion-Sort

```
# Procedure: Insertion-Sort(A)
# In:  An array A of n integers
# Out: A permutation of that array A that is sorted (monotonic)

for i := 2 to A.length
    key := A[i]
    # Insert A[i] into the sorted sequence A[1:i-1]
    j := i - 1
    while j > 0 and A[j] > key
        A[j + 1] := A[j]
        j := j - 1
    A[j + 1] := key
return A
```
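
For experimentation, a direct Python transcription of the pseudocode might look as follows (a sketch; note that Python lists are 0-indexed, so the outer loop starts at index 1):

```python
def insertion_sort(a):
    """Sort the list a in place (ascending) and return it."""
    for i in range(1, len(a)):           # corresponds to: for i := 2 to A.length
        key = a[i]
        # Insert a[i] into the already-sorted prefix a[0:i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]              # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```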

To prove the correctness of Insertion-Sort, we show that:

  1. The algorithm outputs the correct result on every instance (partial correctness).
  2. The algorithm terminates within finitely many steps on every instance (termination).

Step 1: Using loop invariant for partial correctness

General rules for loop invariant proofs:

  • Initialization: It is true prior to the first iteration of the loop.
  • Maintenance: If it is true before an iteration of the loop, it remains true before the next iteration.
  • Termination: When the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct.

The loop invariant of the outer for loop: by the end of the $i$-th iteration, the elements in subarray A[1:i] are in sorted order.

  • Initialization: prior to the first iteration ($i = 2$): A[1] is in sorted order.
  • Maintenance: Assume that by the end of the $i$-th iteration, the elements in subarray A[1:i] are in sorted order; then by the end of the $(i+1)$-th iteration, the elements in subarray A[1:i+1] are in sorted order.
  • Termination: After the iteration $i = n$, the loop invariant states that A is sorted.

Denote the loop invariant mentioned above by $IV_1$. Is there only one loop invariant? The answer is NO!

Here is another loop invariant: by the end of the $i$-th iteration, subarray A[1:i] contains exactly the elements that were originally in A[1:i] (possibly in a different order). Call this invariant $IV_2$.

$IV_2$ is weaker than $IV_1$, since there are more possible contents of A[1:i] that satisfy $IV_2$ but do not satisfy $IV_1$.

A good (strong) loop invariant must satisfy all three properties: Initialization, Maintenance, and Termination. Note that $IV_2$ does not satisfy the Termination property: knowing only $IV_2$ at loop exit does not let us conclude that A is sorted.
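
To make $IV_1$ tangible, one can instrument the Python version above and assert the invariant at the end of every outer iteration; this is an empirical sanity check on concrete inputs, not a substitute for the proof:

```python
def insertion_sort_with_invariant(a):
    """Insertion sort that asserts IV_1 after each outer iteration."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        # IV_1: the prefix a[0 : i+1] is in sorted order at this point.
        assert all(a[k] <= a[k + 1] for k in range(i)), "invariant IV_1 violated"
    return a
```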

How to find the loop invariant?

Generally, the answer is: We don't know.

For simple ones, there exist effective techniques.

Step 2: Using variant for termination

Using loop variant to prove the termination:

  • show that some quantity strictly decreases
  • show that it cannot decrease indefinitely (it is bounded!)

Well-ordered set(良序集)

An ordered set is well-ordered(良序) if each and every nonempty subset has a smallest or least element.

More on well-ordering can be found in the discrete mathematics notes.

A well-ordered set has no infinite descending sequences, which can be used to ensure the termination of algorithm.

Loop variant for the inner while loop: $j$. In each iteration $j$ strictly decreases, and $j$ is bounded below, since the loop runs only while $j > 0$.

Loop variant for the outer for loop: `A.length - i`. In each iteration `A.length - i` strictly decreases, and it is bounded below by $0$.
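
As a quick illustration (again a run-time check rather than a proof), one can record the successive values of the inner-loop variant $j$ and verify that they strictly decrease and never go negative; `shift_into_prefix` below is a hypothetical helper isolating the inner loop:

```python
def shift_into_prefix(a, i):
    """Inner loop of insertion sort for position i, recording the variant j."""
    key = a[i]
    j = i - 1
    trace = []                       # successive values of the variant j
    while j >= 0 and a[j] > key:
        trace.append(j)
        a[j + 1] = a[j]
        j -= 1
    a[j + 1] = key
    # The variant strictly decreases and is bounded below, so the loop terminates.
    assert all(x > y for x, y in zip(trace, trace[1:]))
    assert all(x >= 0 for x in trace)
    return a
```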

How to find the loop variant?

Again, generally, the answer is: We don't know.

However, generally speaking, it is very easy to identify. For example, the induction variable of the loop (or some linear transformation of it).

Some other strategies for correctness proofs: proof by cases, proof by contraposition, proof by contradiction, etc.

When loops and/or recursions are involved: often (if not always) use mathematical induction.

Efficiency of Algorithms

Complexity

  • Time complexity: how much time is needed before halting
  • Space complexity: how much memory (usually excluding the input) is required for successful execution

Other performance measures:

  • communication bandwidth(通信带宽)
  • energy consumption

Time complexity is typically more important than others in analysis.

An observation: larger inputs often demand more time. Therefore, the cost of an algorithm should be a function of the input size, say $T(n)$.

Given an algorithm and an input, when counting the cost with respect to the RAM model:

  • Each memory access takes constant time.
  • Each "primitive" operation takes constant time.
  • Compound operations should be decomposed.
  • Finally, count up the number of time units.

Time complexity of Insertion Sort

```
                                # Cost   # Times
# ----------------------------- # ----   # -----
for i := 2 to A.length          # c1     # n
    key := A[i]                 # c2     # n - 1
    j := i - 1                  # c3     # n - 1
    while j > 0 and A[j] > key  # c4     # sum_{i=2}^{n} t_i
        A[j + 1] := A[j]        # c5     # sum_{i=2}^{n} (t_i - 1)
        j := j - 1              # c6     # sum_{i=2}^{n} (t_i - 1)
    A[j + 1] := key             # c7     # n - 1
return A
```

The total cost of the algorithm is

$$
T(n) = c_1 n + c_2(n-1) + c_3(n-1) + c_4\sum_{i=2}^{n}t_i + c_5\sum_{i=2}^{n}(t_i-1) + c_6\sum_{i=2}^{n}(t_i-1) + c_7(n-1)
$$

$t_i$ depends on which input of size $n$ is given.

The time cost of insertion sort varies among inputs. How can we fairly evaluate an algorithm?

Enumerate the cost over all possible inputs? Not possible, since the input space is infinite! We could check representative inputs, but what are they?

Worst, Best and Average case

  • Worst-case time complexity: the maximum time cost among all possible inputs of size $n$.
    • $W(n) = \max\limits_{x \in \mathscr{X}_n} T(x)$
  • Best-case time complexity: the minimum time cost among all possible inputs of size $n$.
    • $B(n) = \min\limits_{x \in \mathscr{X}_n} T(x)$
  • Average-case time complexity: the average time cost among all possible inputs of size $n$.
    • $A(n) = \sum\limits_{x \in \mathscr{X}_n} \Pr(x)\,T(x) = \mathbb{E}[T(x)]$

Note: the average case needs an assumption about the statistical distribution of inputs.

We mainly focus on worst-case time complexity.

Exceptions: Some exponential-time algorithms are used widely in practice because the worst-case instances don't arise.
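
To get a feel for the three cases, one can count how many times the inner-loop body executes (the shifts, i.e. $\sum_i (t_i - 1)$) for a sorted, a reversed, and a random input; a rough empirical sketch, assuming the Python version given earlier:

```python
import random

def count_shifts(a):
    """Number of element shifts insertion sort performs on the input a."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts

n = 1000
print(count_shifts(range(n)))                    # best case: 0 shifts
print(count_shifts(range(n, 0, -1)))             # worst case: n(n-1)/2 = 499500
print(count_shifts(random.sample(range(n), n)))  # average: about n(n-1)/4, roughly 249750
```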

Best-case time complexity

Each $t_i$ is $1$, which means that the while loop condition is false the first time it is tested in every iteration, which in turn means that the array is already sorted at the beginning!

$$
B(n) = c_1 n + c_2(n-1) + c_3(n-1) + c_4(n-1) + c_7(n-1) = (c_1 + c_2 + c_3 + c_4 + c_7)n - (c_2 + c_3 + c_4 + c_7)
$$

Worst-case time complexity

Each $t_i$ is as large as it can be, which means that in every iteration the while loop condition remains true until $j$ reaches $0$: A[j] > key holds every time, which means that the array is reverse-sorted at the beginning! So $t_i = i$ for every $i$.

$$
\begin{aligned}
W(n) &= c_1 n + c_2(n-1) + c_3(n-1) + c_4\sum_{i=2}^{n}i + c_5\sum_{i=2}^{n}(i-1) + c_6\sum_{i=2}^{n}(i-1) + c_7(n-1) \\
&= \frac{c_4+c_5+c_6}{2}\,n^2 + \left(c_1+c_2+c_3+\frac{c_4}{2}-\frac{c_5+c_6}{2}+c_7\right)n - (c_2+c_3+c_4+c_7)
\end{aligned}
$$
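
The simplification above uses the standard closed forms of the two sums that appear:

$$
\sum_{i=2}^{n} i = \frac{n(n+1)}{2} - 1, \qquad \sum_{i=2}^{n} (i-1) = \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2}.
$$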

Average-case time complexity

What about the average case? Assume the elements of the input array are in uniformly random order.

Hint: the number of shifts (swaps) equals the number of inversions!

This amounts to computing the expected number of inversions in a randomly ordered list, which is $\dbinom{n}{2}\mathbb{E}(I) = \dfrac{1}{2}\dbinom{n}{2}$, where $I$ indicates whether a fixed pair of positions is inverted.
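
Spelling this out: in a uniformly random permutation each of the $\binom{n}{2}$ pairs is inverted with probability $\frac{1}{2}$, so by linearity of expectation

$$
\mathbb{E}[\text{number of inversions}] = \binom{n}{2}\cdot\frac{1}{2} = \frac{n(n-1)}{4},
$$

which means the average case performs about half as many shifts as the worst case and is still $\Theta(n^2)$.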

Asymptotic order of growth

Big $O$ notation

In practice, we usually don’t care about the unimportant details in the counted operations.

We need one more simplifying abstraction, which can give us an intuitive feeling of the cost of an algorithm.

The abstraction is this: it is the rate of growth, or order of growth, of the running time that really interests us; therefore, two factors are ignored:

  • Constant coefficients are not that important (when nn is large)
  • Lower-order terms are not that important (when nn is large).

Big $O$ notation(大 $O$ 符号)

Given a function $g(n)$, we denote by $O(g(n))$ the following set of functions:

$$
O(g(n)) = \left\lbrace f(n) \mid \exists c > 0,\, \exists n_0 > 0,\, \forall n \ge n_0\colon 0 \le f(n) \le c \cdot g(n) \right\rbrace
$$

Asymptotic upper bounds(渐近上界): when we say $f(n)$ is $O(g(n))$, we mean that $f(n)$ grows no faster than a certain rate (is asymptotically at most $g(n)$).

$O(g(n))$ is actually a set of functions, but computer scientists often write $f(n) = O(g(n))$ instead of $f(n) \in O(g(n))$.

Consider $f_1(n) = 5n^3$ and $f_2(n) = 3n^2$: we have $f_1(n) = O(n^3)$ and $f_2(n) = O(n^3)$. But we cannot conclude that $f_1(n) = f_2(n)$.

Therefore, the worst-case time complexity of Insertion Sort is $W(n) = O(n^2)$, which means that it is asymptotically at most $n^2$.
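
As a worked instance of the definition, take $f(n) = 3n^2 + 10n + 5$: for all $n \ge 1$,

$$
0 \le 3n^2 + 10n + 5 \le 3n^2 + 10n^2 + 5n^2 = 18n^2,
$$

so $c = 18$ and $n_0 = 1$ witness $f(n) = O(n^2)$ (any larger $c$ or $n_0$ would do as well).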

Multiple variables

$f(m, n)$ is $O(g(m, n))$ if there exist constants $c > 0,\, m_0 \ge 0,\, n_0 \ge 0$ such that $0 \le f(m, n) \le c \cdot g(m, n)$ for all $m \ge m_0,\, n \ge n_0$.

Ex. $f(m, n) = 32mn^2 + 17mn + 32n^3$ is both $O(mn^2 + n^3)$ and $O(mn^3)$, but is neither $O(n^3)$ nor $O(mn^2)$.

Big $\Omega$ notation

Big $\Omega$ notation(大 $\Omega$ 符号)

Given a function $g(n)$, we denote by $\Omega(g(n))$ the following set of functions:

$$
\Omega(g(n)) = \left\lbrace f(n) \mid \exists c > 0,\, \exists n_0 > 0,\, \forall n \ge n_0\colon f(n) \ge c \cdot g(n) \right\rbrace
$$

Asymptotic lower bounds(渐近下界): when we say $f(n)$ is $\Omega(g(n))$, we mean that $f(n)$ grows at least as fast as a certain rate (is asymptotically at least $g(n)$).

Big $\Theta$ notation

Big $\Theta$ notation(大 $\Theta$ 符号)

Given a function $g(n)$, we denote by $\Theta(g(n))$ the following set of functions:

$$
\Theta(g(n)) = \left\lbrace f(n) \mid \exists c_1 > 0,\, \exists c_2 > 0,\, \exists n_0 > 0,\, \forall n \ge n_0\colon c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n) \right\rbrace
$$

Asymptotically tight bounds(渐近紧确界): when we say $f(n)$ is $\Theta(g(n))$, we mean that $f(n)$ grows precisely at a certain rate (is asymptotically equal to $g(n)$).

Small $o$ and $\omega$ notation

Small $o$ notation(小 $o$ 符号)

Given a function $g(n)$, we denote by $o(g(n))$ the following set of functions:

$$
o(g(n)) = \left\lbrace f(n) \mid \forall c > 0,\, \exists n_0 > 0,\, \forall n \ge n_0\colon 0 \le f(n) < c \cdot g(n) \right\rbrace
$$

$f(n)$ is asymptotically (strictly) smaller than $g(n)$.

Small $\omega$ notation(小 $\omega$ 符号)

Given a function $g(n)$, we denote by $\omega(g(n))$ the following set of functions:

$$
\omega(g(n)) = \left\lbrace f(n) \mid \forall c > 0,\, \exists n_0 > 0,\, \forall n \ge n_0\colon f(n) > c \cdot g(n) \right\rbrace
$$

$f(n)$ is asymptotically (strictly) larger than $g(n)$.

There is no small $\theta$ notation.

Some properties of asymptotic notations

  • Reflexivity: $f(n) \in O(f(n))$; but $f(n) \notin o(f(n))$
  • Transitivity: if $f(n) \in O(g(n))$ and $g(n) \in O(h(n))$, then $f(n) \in O(h(n))$
  • Symmetry: $f(n) \in \Theta(g(n)) \iff g(n) \in \Theta(f(n))$
  • Transpose symmetry: $f(n) \in O(g(n)) \iff g(n) \in \Omega(f(n))$

Asymptotic bounds and limits

If a cost function is complex, it is hard to apply the definitions directly to obtain its asymptotic bounds. In this case, it is usually easier to apply the limit method.

  1. If $\lim\limits_{n\to \infty } \dfrac{f(n)}{g(n)} = c$ for some constant $0 < c < \infty$, then $f(n)$ is $\Theta(g(n))$.
  2. If $\lim\limits_{n\to \infty } \dfrac{f(n)}{g(n)} = 0$, then $f(n)$ is $O(g(n))$ (in fact, $f(n)$ is $o(g(n))$).
  3. If $\lim\limits_{n\to \infty } \dfrac{f(n)}{g(n)} = \infty$, then $f(n)$ is $\Omega(g(n))$ (in fact, $f(n)$ is $\omega(g(n))$).

Proof of 1.

By the definition of the limit, for any $\varepsilon > 0$, there exists $n_0$ such that $c - \varepsilon \le \dfrac{f(n)}{g(n)} \le c + \varepsilon$ for all $n \ge n_0$.

Choose $\varepsilon = \dfrac{c}{2} > 0$. Multiplying by $g(n)$ yields $\dfrac{1}{2}c \cdot g(n) \le f(n) \le \dfrac{3}{2} c \cdot g(n)$ for all $n \ge n_0$.

Thus, $f(n)$ is $\Theta(g(n))$ by definition, with $c_1 = \dfrac{1}{2}c$ and $c_2 = \dfrac{3}{2} c$.
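
If a computer algebra system is available, the limits can be evaluated mechanically; a small sketch using sympy (assuming it is installed), covering one example for each of the three rules:

```python
from sympy import symbols, limit, log, sqrt, oo

n = symbols('n', positive=True)

# Rule 1: finite nonzero limit => Theta. (5n^3 + 3n^2) / n^3 -> 5
print(limit((5*n**3 + 3*n**2) / n**3, n, oo))   # 5

# Rule 2: limit 0 => little-o. log(n) / sqrt(n) -> 0
print(limit(log(n) / sqrt(n), n, oo))           # 0

# Rule 3: limit infinity => little-omega. 2^n / n^10 -> oo
print(limit(2**n / n**10, n, oo))               # oo
```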

Asymptotic bounds for some common functions

  • Polynomials: $p(n)$ is $\Theta(n^d)$, where $d$ is the maximum degree of $p(n)$ (assuming a positive leading coefficient).
  • Logarithms: $\log_a n$ is $\Theta(\log_b n)$ for any $a, b > 1$.
    • $\log_a n$ is $O(n^d)$ for every $a > 1,\, d > 0$.
  • Exponentials: for $1 < a < b$, $a^n$ is $o(b^n)$; unlike logarithms, different bases give different orders of growth.
    • $n^d$ is $O(r^n)$ for every $r > 1,\, d > 0$.
  • Factorials: $n!$ is $o(n^n)$ and $\omega(2^n)$; $\log (n!)$ is $\Theta(n \log n)$ (a short derivation follows this list).
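
The factorial bound can be checked without Stirling's formula: since $(n/2)^{n/2} \le n! \le n^n$ (the largest $n/2$ factors of $n!$ are each at least $n/2$), taking logarithms gives

$$
\frac{n}{2}\log\frac{n}{2} \;\le\; \log(n!) \;\le\; n\log n,
$$

and both sides are $\Theta(n \log n)$, hence $\log(n!) = \Theta(n \log n)$.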

$$
\begin{aligned}
\text{tractable }&\begin{cases} \Theta(1), &\text{constant}\\ \Theta(\log n), &\text{logarithmic}\\ \Theta(n), &\text{linear}\\ \Theta(n \log n), &\text{linearithmic}\\ \Theta(n^c), &\text{polynomial} \end{cases}\\
\text{intractable }&\begin{cases} \Theta(2^n), &\text{exponential}\\ \Theta(n!), &\text{factorial} \end{cases}
\end{aligned}
$$

When one considers a brute-force algorithm for a problem, its running time is usually asymptotically exponential.

When an algorithm has a polynomial running time, we say it is efficient, and the corresponding problem is called easy or tractable. Such an algorithm typically exposes some crucial structure of the problem.

There are exceptions, though: some poly-time algorithms in the wild have galactic constants and/or huge exponents. Which would you prefer: $20n^{120}$ or $n^{1+\ln n}$?