tutORial:Bridge and Torch Problem

The Bridge and Torch Problem

From an OR/MS perspective, this famous puzzle can be regarded (with a bit of imagination) as a vehicle routing problem. We shall discuss this perspective in due course. First, however, let us have a look at the puzzle itself. So here it is.

Suppose that a group of n persons, P={1,2,...,m}, is at the southern bank of a river and plans to walk across a bridge to the northern side. It is dark, so they have to carry a torch with them whenever they cross the bridge. Needless to say, it is assumed that m > C > 1. Since no more than C < persons are allowed to be on the bridge at any given time, a number of crossings will be required.

The time required by a group of persons, say A, to cross the bridge is given by t(A):= max{t(p): p in A}, where t(p) is the time it takes person p to cross the bridge. That is, the crossing time of a group is equal to the crossing time of the slowest person in the group. Thus, if A = {1, 2, 3, 4 ,5}, then t(A) = max{t(1), t(2), t(3), t(4), t(5)} so for t(1)=2, t(2)=3, t(3)=8, t(4)=5, t(5)=2 we have t(A) = max{2,3,8,5} = 8. We assume that t(p) > 0 for p in P.

Observe that as long as there are people on the southern bank, it is necessary to return the torch from the northern bank so that the next group can cross the bridge from south to north. It is assumed - without a formal proof - that under an optimal policy a single person, rather than a group of people, return the torch from north to south. Thus, typically groups move from south to north and individuals move back from north to south to return the torch.

The question is then: what is the optimal crossing policy, assuming that the objective is to minimize the total time required by the group to cross the bridge?

In what follows we define the problem mathematically and derive a dynamic programming (DP) functional equation for this formulation. An on-line interactive module is also provided for experimentation with the model.

Mathematical Model

The conceptual framework we use to construct a mathematical model for the problem is sequential decision processes. That is, we regard the problem under consideration as a sequential decision problem. By this we mean that a sequence of decisions are to be made to achieve a certain goal, subject to a variety of constraints. In addition to the decision variables, the model also includes state variables, namely variables that are used to describe the state of the seuqential decision process. We assume that there will be k crossings, j=1,2,...,k.

Crossing j<k consists of two parts: a group is moved from the southern bank to the northern banks and then one person returns the torch from the northern bank to the southern bank. The last crossing, j=k, consists only of the first part, as there is no need to return the torch. Needless to say, the value of k is not known in advance.

Notation:

E = empty set, namely E={}.
A\B = set consisting of all elements of set A that are not elements of set B.
A = complement of set A with reppect to set P, namely A = P\A.
U = union operation, namely AUB is the union of set A and set B.
|A| = cardinality of set A. Hence, |P| = m, |E| = 0.

Decision variables:

Let

X_j := group of persons moving from south to north in the j-th crossing, j=1,2,...,k. (1)

y_j := person returning the torch to the southern bank after the j-th crossing, j=1,2,...,k-1. (2)

Note that it is not necessary to return the torch after the last crossing, hence there is no need for y_k. The use of X_j rather than the conventional x_j is intended to highlight the fact that this decision variable is a set.

State variables:

Let

S_j := group of persons on the southern bank just before the j-th crossing, j=1,2,...,k. (3)

By definition then, S₁ = P. Although there are only k crossings, we let S_k+1 denote the group on the southern bank after the k-th crossings. By definition of k we must have S_k+1 = {}.

State transition:

Given the above definitions, it follows that the dynamics of the state variables is governed by the following transition law:

S_j+1 = (S_j\X_j)U{y_j}, j=1,2,...,k. (4)

Objective function:

Let

T(X_j,y_j) := t(X_j) + t(y_j), j=1,2,...,k-1. (5)

then by definition T(X_j,y_j) is the time it takes to complete the j-th crossing - including the return of the torch to the southern bank. The duration of the last crossing is equal to t(X_k) as there is no need to return the torch to the southern bank.

It follows then that the total crossing time is

F(X,y) := T(X₁,y₁) + ... + T(X_k-1,y_k-1) + t(X_k) (6)

Feasible Decisions:

By definition, X_j must be a non-empty subset of S_j and y_j must be an element of S_jUX_j for j=1,2,...,k-1.

The problem under consideration is then as follows:

T^*: = max
X,y T(X₁,y₁) + ... + T(X_k-1,y_k-1) + t(X_k)

s.t.

S₁ = P, S_k+1 = {}.

X_j = subset of S_j, j=1,2,...,k-1.

y_j is an element of S_jUX_j, j=1,2,...,k-1.

|X_j| <= C, j=1,2,...,k.

S_j+1 = (S_j\X_j)U{y_j}, j=1,2,...,k.

(7)

It what follows we formulate a Dynamic Programming (DP) functional equation for this problem. The optimal policy is obtained by solving this equation. In anticipation of this, it is useful to examine a number properties possessed by at least one optimal policy.

Property 1:
The person who returns the torch to the southern bank is the fastest person on the northern bank when the torch is to be returned. That is, given that group S_j is currently on the northern bank, person p will return the torch to the southern bank where t(p)=min {t(i): i in S_j}. Let p(S_j,X_j) denote this person, observing that S_j is uniquely determined by S_j and X_j, namely S_j = (P\S_j)UX_j. Then in the context of (7) for j=1,2,...,k-1 we have y_j = p(S_j,X_j). It follows that

S_j+1 = Next(S_j,X_j) := (S_j\X_j)U{p(S_j,X_j)}, j=1,2,...,k. (8)

Property 2:
If there are more than C persons on the southern bank, then at least two persons will cross the bridge to the northern bank in the next crossing. That is, |X_j|=1 if and only if X_j=S_j, or equivalently |S_j| > 1 implies that |X_j| > 1. If there are C or less persons on the southern bank, then clearly all of them will cross the bridge in the next crossing.

Property 3:
We always try to make |X_j| as large as possible without slowing down the crossing time of group X_j. That is, if |X_j| < C, then t(X_j) <= t(p) for all p in S_j.

A proof that these properties are satisfied by at least one optimal policy is provided in the Appendix.

Let D(S_j) denote the set of feasible values of X_j. It follows then that we can set

D(S_j) = {S_j} , |S_j| <= C (9)

D(S_j) = {G: G is a subset of S_j such that 2 <= |G| <=C. If |G| < C, then t(G) <= t(p) for all p in S_j}, |S_j| > C (10)

Also, let

T(S_j,X_j) := t(X_j) + t(p(S_j,X_j), j=1,2,...,k-1. (11)

The problem under consideration can therefore be rewritten as follows:

T^*: = max
X T(X₁) + ... + T(X_k-1) + t(X_k)

s.t.

S₁ = P, S_k+1 = {}.

X_j = element of D(S_j), j=1,2,...,k.

S_j+1 = Next(S_j,X_j), j=1,2,...,k.

(12)

We now derive a dynamic programming (DP) functional equation for this problem. Solving this equation will yield an optimal policy for the problem. To accomplish this define the state of the process to be the group of persons on the southern bank, namely G. The initial state of the process is then G=P and the terminal state is G=E.

We create out of the given problem a family of related problems as follows:

Let

f(S) := minimal crossing time required to move group S from south to north, (S is a subset of P). (13)

Obviously, we are interested in the value of f(P). Note that by definition f(E)=0.

With this in mind, let

f(S,G) := minimal crossing time required to move group S from south to north, given that subgroup G is moved first , (S is a subset of P and G is a subset of S such that G is in D(S)). (14)

It follows then that

Lemma 1:

f(S) = min {f(S,G): G in D(S)} , S is a non-empty subset of P. (15)

Observe that in accordance with (13)-(14) we have

f(S,G) = T(S,G) + f(Next(S,G)), |S| > C. (16)

Thus,

Theorem 1:

f(S) = t(S) , |S| <= C (17)

f(S) = min {T(S,G) + f(Next(S,G)): G in D(S) }, |S| > C (18)

Observe that in accordance with Property 2, if |S| >C then |S| > |Next(S,G)| for any group G in D(S). The implication is that (18) can be solved iteratively in increasing order |S|, starting with |S|=C in which case f(S) is determined by (17).

Let D^*(S) denote the set of optimal values of G associated with f(S). That is, let

D^*(S) = {S} , 1 <= |S| <= C (19)

D^*(S) = {G^* in D(S): T(S,G^*) + f(Next(S,G^*)) = min {T(S,G) + f(Next(S,G)): G in D(S) }, |S| > C (20)

These sets will be useful in the recovery of optimal policies.

Example

Consider the case where C=2, m=4 and the crossing times {t(j)} are as follows:

j 1 2 3 4

t(j) 1 2 10 11

So from (17) we obtain:

S E {1} {2} {3} {4} {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}

f(S) 0 1 2 10 11 2 10 11 10 11 11

D^*(S) {{1}} {{2}} {{3}} {{4}} {{1,2}} {{1,3}} {{1,4}} {{2,3}} {{2,4}} {{3,4}}

We can now solve f(S) according to (18) for all S such that |S| = C + 1 = 3. The results are as follows:

S {1,2,3} {1,2,4} {1,3,4} {2,3,4}

f(S) 13 14 15 14

D^*(S) {{1,2},{1,3}} {{1,2},{1,4}} {{3,4}} {{3,4}}

We illustrate below in detail how this is done for S = {1,2,3}, for which S = P\S = {4}. Observe that since |S| > C, in accordance with Property 2 , |S| = 2 for all G in D(S), hence D({1,2,3}) = {{1,2},{1,3},{2,3}}.

f({1,2,3}) = min {t(G) + t(p({1,2,3},G)) + f(Next({1,2,3},G)): G in D({1,2,3}} (19)

G p(S,X) Next(S,G) t(G) t(p(S,G)) f(Next(S,G)) f(S,G)

{1,2} 1 {1,3} 2 1 10 13

{1,3} 1 {1,2} 10 1 2 13

{2,3} 2 {1,2} 10 2 2 14

It follows then that f({1,2,3}) = min {13,13,14} = 13. There are two optimal decisions for this subproblem, namely G={1,2} and G={1,3}, hence D^*({1,2,3})={{1,2},{1,3}}

We are ready now to deal with groups whose cardinality is 4. There is only one such group, namely S=P={1,2,3,4}. For this case, the set of feasible decisions is as follows:

D({1,2,3,4}) = {{1,2},{1,3},{1,4},{2,3},{2,4}, {3,4}} (21)

The following table explains how the value of f({1,2,3,4,}) is computed:

G p(S,G) Next(S,G) t(G) t(p(S,G)) f(Next(S,G)) f(S,G)

{1,2} 1 {1,3,4} 2 1 15 18

{1,3} 1 {1,2,4} 10 1 14 25

{1,4} 1 {1,2,3} 11 1 13 25

{2,3} 2 {1,2,4} 10 2 14 26

{2,4} 2 {1,2,3} 11 2 13 26

{3,4} 3 {1,2,3} 11 10 13 34

Thus, f(P) = f({1,2,3,4}) = 18 and the optimal decision is G={1,2}, hence D^*({1,2,3,4})={{1,2}}. This means that the next state of the process will be Next({1,2,3,4},{1,2}) = {1,3,4}. The optimal decision at this state is G={3,4}. This will change the state of the process to Next({1,3,4},{3,4}) = {1,2}. The optimal decision for this state is G'={1,2}. The state resulting from this decision will then be E={}. It follows then that the optimal policy is p={{1,2},{1},{3,4},{2},{1,2}} or in plain English:

Move {1,2} from south to north.
Move {1} from north to south.
Move {3,4} from south to north.
Move {2} from north to south.
Move {1,2} from south to north.

The (optimal) total crossing time is T^* = f(P) = 18.

Algorithms

The DP functional equation can be solved by a variety of algorithms including those designed for acyclic shortest path problems with positive arc length. Of particular interest is the Dijkstra's variety. The point is that the shortest path representation of the problem under consideration may possess many very long optimal subpaths which Dijkstra's type algorithms will be only too happy to ignore. So this is the good news.

The bad news, however, is that the DP functional equation for this problem is subjected to the Curse of Diemsnsionality. Observe that since f(S) is to be computed for all subsets S of P={1,2,...,m}, there are 2^m such values to compute. Furthermore, to compute f(S) it is necessary to generate all subsets G of S such that 2 <=|G| <= C, and there are plenty of those. Therefore, such algorithms will exhibit an exponential growth in time/space requirements as m increases.

You may now wish to experiment with our on-line interactive module. At this stage it does not provide a facility to compute the optimal policy. You can use it to experiment with your very own heuristic solutions to the problem. We do plan to complete soon a Dijkstra-based module that will generate optimal solution to the problem.

Appendix

In this appendix we provide proofs of the validity of the four properties mentioned above regarding the structure of an optimal policy. The situation is as follows: S is the group of persons on the southern bank and a group G is to be moved to the northern bank and then a person p will return the torch to the southern bank. The questions arises as to the optimal choices for G and p.

Property 1:
The person who returns the torch to the southern bank is the fastest person on the northern bank when the torch is to be returned. That is, given that group S is currently on the northern bank, person p will return the torch to the southern bank where t(p)=min {t(i): i in S}.

Proof: Let Plan A be any policy that sends a person q such that t(q) > t(p) to the southern bank. Let Plan B be a plan that is identical to Plan A except that the moves of person p and person q are swapped until their first meeting under Plan A. The immediate penalty for Plan A is r:=t(q)-t(p), namely the crossing time will increase by t(k) rather than t(p) when the torch is returned to the south bank.

There are two possibilities with regard to where persons q and p will meet for the first time under Plan A. We show that in both cases Plan B is at least as good as Plan A.

The first meeting is on the northern bank:
Clearly, in this case the total crossing time up to the first meeting under Plan A is at least r units longer than that under Plan B. Therefore, Plan B is superior to Plan A.

The first meeting is on the southern bank:
In this case the total crossing time up to the first meeting under Plan A is equal to that under Plan B.

It therefore follows that nothing can be gained by sending person q instead of person p back to the southern bank.

Property 2:
If there are C or more persons on the southern bank, then at least two persons will cross the bridge to the northern bank in the next crossing. That is, |G|=1 if and only if S=G, or equivalently |S| > C implies that |G| > 1.

Proof: Suppose that |S| > C and let Plan A be any policy such that |G|= 1 in the next crossing and let q be the sole element of G. Clearly, it is not necssary to consider the case where person q is faster than all the persons in S, because according to Property 1 person q will immediately return the torch to the southern bank. in this case a total of 2t(q) units of time will be wasted without any change in the state of the process. So assume that q is slower than the fastest person in S, call it p. Thus, Plan A will send q to the northern bank and then send back person p to the southern bank. The crossing time required for this swap is r=t(q)+t(p). Let plan B be the policy that is identical to Plan A except that it skips the above swap between person q and person p and in subsequent moves util q and p meet for the first time it interchanges the role of these two persons. We shall show that Plan B is superior to Plan A. We have to consider two cases regarding where q and p meets for the first time under Plan A:

They meet for the first time at the northern bank:
The most that Plan A can gain over Plan B is t(q)-t(p) time units. This is smaller than r=t(q)+t(p), the duration of the swap between p and q. Hence Plan B is better than Plan A.

They meet for the first time at the southern bank:
Plan B will save t(q)-t(p) time units over Plan A until the first meeting. Thus, clearly Plan B is better than Plan A.

We therefore conclude that Plan B is superior to Plan A.

Property 3:
We always try to make |G| as large as possible without slowing down the crossing time of G. That is, if |G| < C, then t(G) <= t(i) for all i in S\G.

Proof: Suppose that |G| < C and t(p) <= t(q) where q is the slowest person in G and p is some person in S\G. Then by including p in G we do not slow down G, yet we generate a non-inferior new state. That is, under the above conditions state S"=Next(S,G') is non-inferior to state S'=Next(S,G) where G'=GU{p}).

Just in case the term 'non-iferior state' needs clarification, here it is: a state S" is said to be non-inferior to state S' if S" is a subset of S'. The motivation for this terminology is obvious, namely:

Lemma 2

Let S' and S" be any two subsets of P such that S" is a subset of S'. Then f(S") <= f(S').

Proof: Exercise (Hint: Let Plan A be any optimal policy for state S'. Modify Plan A slightly so that the new policy will be feasible for state S" and will generate a total crossing time for state S" that is not longer than the total crossing time generated for state S' by Plan A.

Contributed by

Date created: January 15, 2002
Date last modified: February 15, 2002
Authorised by: Moshe Sniedovich
Maintained by: Moshe Sniedovich, Department of Mathematics and Statistics.
Email: m.sniedovich@ms.unimelb.edu.au

	D(S_j) = {S_j} , \|S_j\| <= C			(9)
	D(S_j) =	{G: G is a subset of S_j such that 2 <= \|G\| <=C. If \|G\| < C, then t(G) <= t(p) for all p in S_j}, \|S_j\| > C		(10)

	f(S) = t(S) , \|S\| <= C	(17)
	f(S) = min {T(S,G) + f(Next(S,G)): G in D(S) }, \|S\| > C	(18)

	D^*(S) = {S} , 1 <= \|S\| <= C	(19)
	D^(S) = {G^ in D(S): T(S,G^) + f(Next(S,G^)) = min {T(S,G) + f(Next(S,G)): G in D(S) }, \|S\| > C	(20)

S	E	{1}	{2}	{3}	{4}	{1,2}	{1,3}	{1,4}	{2,3}	{2,4}	{3,4}
f(S)	0	1	2	10	11	2	10	11	10	11	11
D^*(S)		{{1}}	{{2}}	{{3}}	{{4}}	{{1,2}}	{{1,3}}	{{1,4}}	{{2,3}}	{{2,4}}	{{3,4}}

S	{1,2,3}	{1,2,4}	{1,3,4}	{2,3,4}
f(S)	13	14	15	14
D^*(S)	{{1,2},{1,3}}	{{1,2},{1,4}}	{{3,4}}	{{3,4}}

G	p(S,X)	Next(S,G)	t(G)	t(p(S,G))	f(Next(S,G))	f(S,G)
{1,2}	1	{1,3}	2	1	10	13
{1,3}	1	{1,2}	10	1	2	13
{2,3}	2	{1,2}	10	2	2	14

G	p(S,G)	Next(S,G)	t(G)	t(p(S,G))	f(Next(S,G))	f(S,G)
{1,2}	1	{1,3,4}	2	1	15	18
{1,3}	1	{1,2,4}	10	1	14	25
{1,4}	1	{1,2,3}	11	1	13	25
{2,3}	2	{1,2,4}	10	2	14	26
{2,4}	2	{1,2,3}	11	2	13	26
{3,4}	3	{1,2,3}	11	10	13	34