平摊分析

之前的笔记中有相关的记录。

Amortized Analysis

Idea: Even when expensive operations must be performed, it is often possible to get away with performing them rarely, so that the average cost per operation is not so high.

Amortized analysis is different from what is commonly referred to as average case analysis, because it does not make any assumption about the distribution of the data values, whereas average case analysis assumes the data are not "bad".

That is, amortized analysis is a worst case analysis, but for a sequence of operations, rather than for individual operations.

Aggregate method for Amortized Analysis

One assumes that there is no need to distinguish between the different operations on the data structure.

We just use aggregate method: Add up the cost of all the operations and then divide by the number of operations.

aggregate cost=max amount of work done by any series of m operationsm\text{aggregate cost} = \dfrac{\text{max amount of work done by any series of $m$ operations}}{m}

Let N=2kN = 2^{k} for some constant kk. Then maximum copying cost of NN insertions in CircularArray is 1+2++2k1=2k1N1 + 2 + \dots + 2^{k-1} = 2^{k} - 1 \approx N.

Therefore, the aggregate cost is O(N+NN)=O(1)O\left(\frac{N + N}{N}\right) = O(1).

Definition

Note: Different operations may have different amortized costs.

Operation has amortized cost c^(n)\hat{c}(n) if for every kN+k \in \N^{+}, the total cost of any kk operations is at most i=1nc^(ni)\displaystyle \sum_{i=1}^n \hat{c}(n_i), where nin_i is the size of the data structure when applying the ii-th operation.

Consider a sequence operations:

  • cic_i is the actual cost of the ii-th operation
  • c^i\hat{c}_i is the amortized cost of the ii-th operation

For the amortized cost to be valid:

i=1kcii=1kc^ikN+\sum_{i=1}^{k} c_i \le \sum_{i=1}^{k} \hat{c}_i\quad \forall k \in \N^{+}

  • Total cost of kk operations i=1kc^i\le \displaystyle \sum_{i=1}^{k} \hat{c}_i
  • Average cost of kk operations i=1kc^ik\le \dfrac{\displaystyle \sum_{i=1}^{k}\hat{c}_i}{k}

Example: Consider CircularArray implementation of Queue

It is not enough to prove using just 6 operations.

Amortized Analysis Techniques

The Accounting Method(记账法)

Imagine you have a bank account BB with initial balance 00.

For the ii-th operation, you spend c^i\hat{c}_i money:

  • Recall that the actual cost of the ii-th operation is cic_i
  • If c^ici\hat{c}_i \ge c_i, pay cic_i for the operation and deposit c^ici\hat{c}_i - c_i into BB.
  • If c^i<ci\hat{c}_i < c_i, pay c^i\hat{c}_i for the operation and withdraw cic^ic_i - \hat{c}_i from BB.

Amortized analysis valid if

B=i=1k(c^ici)B = \sum_{i=1}^{k}(\hat{c}_i - c_i)

always 0\ge 0 for all kk.

Example: Queue with CircularArray

c^i=3\hat{c}_i = 3 if the ii-th operation is Insert and c^i=1\hat{c}_i = 1 if the ii-th operation is Remove.

  • Goal: Prove i=1kcii=1kc^i\displaystyle \sum_{i=1}^{k}c_i \le \sum_{i=1}^{k}\hat{c}_i for any kNk \in \N^{*} operations.
  • Strategy: account always non-negative via induction on kk.
  • Basis: Prior to 1st operation, account balance is 00.
  • Hypothesis: Prior to ii-th operation, account balance is always non-negative.

Inductive Step: Consider the ii-th operation

  • If it's Remove, then we make no change to account value.
  • c^ici=11=0\hat{c}_i - c_i = 1 - 1 = 0
  • If it's Insert without expansion, we add 22 to account value.
  • c^ici=31=2\hat{c}_i - c_i = 3 - 1 = 2
  • If it's Insert with expansion. Assume expand from nn to 2n2n, then we need to withdraw n1n - 1 value.
  • cic^i=n+12=n1c_i - \hat{c}_i = n + 1 - 2 = n - 1
  • Analysisstep:
  1. Last expand must be from n2\dfrac{n}{2} to nn.
  2. Since last expand, each Insert adds 22 and each Remove makes no change.
  3. Since last expand, there are at least n2\dfrac{n}{2} Insert operation.
  4. Account balance is at least n2×2=n\dfrac{n}{2} \times 2 = n.
  5. After the ii-th operation, account balance is at least n(n1)=1n - (n - 1) = 1.
  6. Account balance is always non-negative.

Example: Binary Counter

The disadvantage of the accounting method is that it is not always easy to find the right accounting scheme. The guess is always constant.

The Potential Method(势能法)

Design a potential function Φ\Phi that maps data structure status to real values:

  • Φ(D0)\Phi(D_0): Initial potential of the data structure, usually set to 00.
  • Φ(Di)\Phi(D_i): Potential of the data structure after ii-th operation.

Define

c^i=ci+Φ(Di)Φ(Di1)\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})

Amortized analysis valid if

Φ(Dk)Φ(D0)\Phi(D_{k}) \ge \Phi(D_0)

for all kk.

Potential is like the balance in account in Counting Method.

  • Potential slowly accumulates during cheap operations (deposit).
  • Potential is used to pay for expensive operations (withdraw).

Example: Binary Counter

The last line is because, in one operation, it will turn 00 to 11 at most 11 time.