Solutions to test of 2000

Problem 1.

Answer:UMA is most similar to PRAM.PRAM model has been developed for modeling idealized parallel computers with zero synchronization or memory access overhead.An n-processor PRAM has a globally addressable memory.Each processor can access any memory location in unit time.In a UMA multiprocessor model,the physical memory is uniformed shared by all processors.All processors have equal access time to all memory words.

Problem 2.

Answer:Assume n processors(Pi,i=1,2,...,n) want to read from the same memory unit M1.

Step 1:P1 read data X from M1 ,and then write X to M2 ,now there are two memory unit(M1,M2) has the same data.

Step 2:P1 and P2 concurrently read data X from M1,M2, it is legal because two Processors read from the different memory units.After reading,they write data X to other units M3,M4.It is also legal for they write the different memory units.Now there are four memory units has the same data(M1M2M3M4).

Step 3:Continue reading and writing as Step 2,each time with double processors.At last,we can get n memory units having the same data X.It is cost O(logN).

Step 4:N processors concurrently read data X from the n different memory units.It can finish in one unit time.

Ok,now we proved a READ operation in PRAM CRCW can be simulated in PRAM EREW in O(logN) steps with n processors.

Problem 3.

Answer:Multiprocessors are poor in scalability because the shared memory. (1).As the number of processors increases,conflicts will also increase because it is more possible for more than one processors read the same memory unit.(2).As memory increases, processors are harder to find the memory unit they need and the overhead will be bigger.Memory increases,address bits is longer,it is harder to compile the address.(3).Because the length of memory address bits limited.

Problem 4.

Answer:

Step1: We can divide N numbers to N/logN processors.Each processor have logN numbers.

Step2:Processors sum the datas of theirs simultaneously.The cost is O(logN).

Step3: Now each processor has one data,it is the sum of the logN datas abtained in step 2.And we can add up all of the data in O(log(N/logN)) with logN processors.

As we can see the total time is O(logN)+O(log(N/logN)).So we add up N numbers in O(logN) time with N/logN processors.

 

Problem 5.

Answer:

We assume the node of a n-dimension hypercube can be denoted by n bits.

Step 1 . 2-dimension:we can see hypecube is a square.It is easy to see it is Hamiltonian.

Step 2 . We assume n-dimension hypercube is also Hamiltonian.

Step 3. In this step,we try to prove n+1-dimension hypercube is Hamiltonian.We know a n+1 dimension hypercube can be constructed by two n-dimension hypercubes.These two n-dimension hypercyues are all Hamiltonian.

we can connect these two hypercubes in below way(in one hypercube we add a bit 0 to the head,to the other,we add a bit 1):

and then we remove the two lines with crossbar.

ok.now we can see a n+1-dimension hypercube is Hamiltonian.

Above all,we prove the hypercube is Hamiltonian.

Problem 6.

Answer:

(1)one processor: we can only add the number one by one.

T(1)=15 . S(1)=1 ; E(1)=1.

(2)four processors: we can use the method in problem 4 to do the work.

So T(4)=5 . S(4)=T(1)/T(4)=3 ; E(4)=S(4)/4=3/4.

(3)eight processors: we can get the result easily in 3 steps.

So T(8)=3 . S(8)=T(1)/T(8)=5 ; E(8)=S(8)/8=5/8.

We can see from above: if we want to obtain high speedup,8 procesors are needed;for efficiency,we can choose 1 processors;All things considered,we would choose 4 processors.

 

Problem 7.

Answer:

(1).Amdahl's law is for workload-fixed and is for time-critical.While Gustafson's law is for workload-scaled in fixed time for accuracy.

(2).Amdahl's law is Sn=1/(+n/(1-));

Gustafson's law Sn=+n(1-).

(3).In Amdahl's law,we can see,the speedup Sn is upper-bounded by 1/.The problem of sequential bottleneck cannot be solved just by increasing the number of porcessors in a system.The real problem lies in the existence of a sequential fraction of the code.Gustafson's law does support scalable performance as the machine size increase.The idea is to keep all processors busy by increasing the problem size.When the problem can scale to match available computing power,the sequential fraction is no longer a bottleneck.