We've implicitly assumed that each call to Compute_next_value() (or a similar function) requires roughly the same amount of work as the other calls, and our block partition of the calls among the cores relies on that assumption. How would you change your partition approach to the preceding question if call i = k requires k + 1 times as much work as the call with i = 0? For example, if the first call (i = 0) requires 2 milliseconds, the second call (i = 1) requires 4, the third (i = 2) requires 6, and so on. Show an approach that achieves better load balancing than block partition, using p = 5 cores and n = 23 calls (i = 0, 1, 2, ..., 22). (Hint: try to come up with a better, revised approach than simple cyclic assignment, and compare the total work assigned to each core.)

Try to write pseudo-code for the tree-structured global sum illustrated in the slides. We can use C's bitwise operators to implement the tree-structured global sum. To see how this works, it helps to write down the binary (base 2) representation of each core's rank and note the pairings during each stage. From the table we see that during the first stage each core is paired with the core whose rank differs in the rightmost, or first, bit. During the second stage, the cores that continue are paired with the core whose rank differs in the second bit, and during the third stage cores are paired with the core whose rank differs in the third bit. Thus, if we have a binary value bitmask that is 001₂ for the first stage, 010₂ for the second, and 100₂ for the third, we can get the rank of the core we're paired with by inverting the bit in our rank that is nonzero in bitmask. This can be done using the bitwise exclusive-or (^) operator. Implement this algorithm in pseudo-code using the bitwise exclusive-or and the left-shift (<<) operators.

Derive formulas for the number of receives and additions that core 0 carries out using:
a. The original pseudo-code for a global sum (all other cores send their partial sums to core 0, and core 0 does all the summation), and
b. The tree-structured global sum.
c. Make a table showing the numbers of receives and additions carried out by core 0 when the two sums are used with 2, 4, 8, ..., 1024 cores.
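Parts a and b can be answered with two short formulas. The symbols R_0(p) and A_0(p) (receives and additions performed by core 0 with p cores) are notation introduced here, and the counts assume core 0 adds each received value once to its running sum, ignoring the work of computing its own partial sum.

```latex
\begin{align*}
\text{(a) original:} \quad & R_0(p) = A_0(p) = p - 1 \\
\text{(b) tree:} \quad & R_0(p) = A_0(p) = \log_2 p \qquad (p = 2^k)
\end{align*}
```

In the original scheme core 0 receives exactly one value from each of the other p - 1 cores. In the tree scheme core 0 never sends, so it receives once in every stage, and there are log2 p stages because bitmask doubles from 1 until it reaches p (for p not a power of two, this becomes the ceiling of log2 p).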