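Following the earlier test of parallel Coulomb-force computation with multiprocessing, this post tests the same computation parallelized with threading. The conditions match the multiprocessing test: 4 cores and 15^3 particles.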
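For reference, the sum that the listing's inner loop appears to accumulate can be written as below. This assumes unit charges and a Coulomb constant of 1, since the code contains no charge or constant factors. The symbols $\mathbf{r}_i$, $\mathbf{F}_i$ and $N$ are notation introduced here, not names from the original.

$$
\mathbf{F}_i = \sum_{j \neq i} \frac{\mathbf{r}_i - \mathbf{r}_j}{\lvert \mathbf{r}_i - \mathbf{r}_j \rvert^{3}},
\qquad N = 15^3 = 3375,
\qquad \binom{N}{2} = \frac{N(N-1)}{2} = 5\,693\,625 \ \text{pairs}
$$

Each pair is visited once, and its contribution is added to particle $i$ and subtracted from particle $j$.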
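The computation took 14.36 [sec]. The multiprocessing version took 2.09 [sec], so threading is far slower; it is in fact slower than the earlier single-threaded runs (7.89 to 8.18 [sec]). The likely cause is CPython's global interpreter lock (GIL): the threads share memory, but only one thread can execute Python bytecode at a time, so this pure-Python loop cannot run in parallel across threads, and the threads also spend time contending for the lock. As a check, I set the thread count to 1 and the run took 6.92 [sec]. It was faster not to split the work across threads.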
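A minimal sketch for reproducing the comparison on a smaller problem, written for Python 3 (the listing in this post is Python 2). The names `pair_sum` and `run_threaded` are hypothetical, and the `1/(j - i)` term is only a stand-in for the Coulomb term. Timings depend on the interpreter and the machine, so no expected output is shown.

```python
import threading
import time


def pair_sum(start, stop, n):
    """CPU-bound pure-Python work: sum 1/(j - i) over pairs i < j, i in [start, stop)."""
    total = 0.0
    for i in range(start, stop):
        for j in range(i + 1, n):
            total += 1.0 / (j - i)
    return total


def run_threaded(n, num_threads):
    """Split the outer loop into contiguous chunks and run one thread per chunk."""
    results = [0.0] * num_threads
    bounds = [n * k // num_threads for k in range(num_threads + 1)]

    def worker(k):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[k] = pair_sum(bounds[k], bounds[k + 1], n)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    return sum(results), elapsed


if __name__ == "__main__":
    for num_threads in (1, 4):
        total, elapsed = run_threaded(600, num_threads)
        print("threads=%d  sum=%.6f  time=%.3f s" % (num_threads, total, elapsed))
```

On a standard CPython build with the GIL, I would expect the 4-thread run to be no faster than the 1-thread run, but that is an expectation rather than a measurement.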
■Results summary
Code type | Time [sec] |
---|---|
threading (4 threads) | 14.36 <-(new!) |
threading (1 thread) | 6.92 <-(new!) |
multiprocessing | 2.09 |
using itertools (no1) | 8.18 |
using range (no2) | 7.93 |
using xrange (no3) | 7.89 |
numpy in the inner loop (no4) | 78.46 |
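
To make the comparison concrete, the timings in the table can be expressed relative to the multiprocessing run. This is only arithmetic on the numbers above. The dictionary keys are labels chosen here.

```python
# Timings copied from the results table, in seconds.
timings = {
    "threading (4 threads)": 14.36,
    "threading (1 thread)": 6.92,
    "multiprocessing": 2.09,
    "xrange (no3)": 7.89,
}

baseline = timings["multiprocessing"]
ratios = {name: t / baseline for name, t in timings.items()}

# Print from fastest to slowest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print("%-24s %.2fx" % (name, ratio))
```

The 4-thread run comes out at about 6.9 times the multiprocessing time and about 2.1 times the 1-thread time.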
■Code used (4 threads)
```python
###########################
# threading test ##########
###########################
# Note: this is Python 2 code (xrange, the Queue module, scipy.misc.comb).
import random
import math
import itertools
import time
import scipy.misc as scm
import numpy as np
import threading
from Queue import Queue

random.seed(1)

# column indices into each particle's 9-element record
PX = 0; PY = 1; PZ = 2
VX = 3; VY = 4; VZ = 5
FX = 6; FY = 7; FZ = 8

# number of particles in a line
line_num = 15
# total particle num
PN = line_num * line_num * line_num
# 9 parameters per particle
# (PX, PY, PZ, VX, VY, VZ, FX, FY, FZ)
xyz = [[0 for i in range(9)] for j in range(PN)]
# number of pair combinations in the Coulomb force calculation
combinum = int(scm.comb(PN, 2))
# number of threads (the text and the results table use 4)
core = 4


def find_pair_sub(prep, pend, thread, q):
    """Accumulate pair forces for outer indices prep..pend-1 and put the result on q."""
    global xyz
    # local results array, so threads never write to the same list
    xyzF = [[0 for i in range(3)] for j in range(PN)]
    fx = 0; fy = 1; fz = 2
    for i in xrange(prep, pend):
        for j in xrange(i + 1, PN):
            dx = xyz[i][PX] - xyz[j][PX]
            dy = xyz[i][PY] - xyz[j][PY]
            dz = xyz[i][PZ] - xyz[j][PZ]
            r = math.sqrt(dx*dx + dy*dy + dz*dz)
            xyzF[i][fx] = xyzF[i][fx] + dx/(r*r*r)
            xyzF[i][fy] = xyzF[i][fy] + dy/(r*r*r)
            xyzF[i][fz] = xyzF[i][fz] + dz/(r*r*r)
            xyzF[j][fx] = xyzF[j][fx] - dx/(r*r*r)
            xyzF[j][fy] = xyzF[j][fy] - dy/(r*r*r)
            xyzF[j][fz] = xyzF[j][fz] - dz/(r*r*r)
    q.put(xyzF)


def find_pair():
    global PN
    global combinum
    q = Queue()
    # target number of pairs per thread
    pw = combinum // core
    pl = combinum % core
    localt = 0
    thread = 0
    pre = 0
    worklist = []
    ppp = pw
    # split the outer index into contiguous ranges of roughly pw pairs each
    for i in range(PN):
        if core == 1:
            worklist.append([pre, PN, thread])
            break
        localt = localt + (PN - i - 1)
        if localt >= ppp:
            worklist.append([pre, i, thread])
            ppp += pw
            thread += 1
            pre = i
    # extend the last range so that it runs to the final particle
    if i != pre:
        prep = worklist[thread-1][0]
        worklist[thread-1] = [prep, PN, thread-1]

    results = []
    threads = []
    for i in range(core):
        thread = threading.Thread(target=find_pair_sub,
                                  args=(worklist[i][0], worklist[i][1], worklist[i][2], q))
        thread.start()
        threads.append(thread)
    # Join the threads we started (threading.enumerate() would miss any
    # thread that has already finished), then take one result per thread.
    for thread in threads:
        thread.join()
        results.append(q.get())

    # sum the per-thread partial forces into the shared array
    for j in range(core):
        for i in range(PN):
            xyz[i][FX] += results[j][i][0]
            xyz[i][FY] += results[j][i][1]
            xyz[i][FZ] += results[j][i][2]


def init_lattice():
    global xyz
    pnum = 0
    while pnum < PN:
        xyz[pnum][PX] = random.uniform(-1, 1)
        xyz[pnum][PY] = random.uniform(-1, 1)
        xyz[pnum][PZ] = random.uniform(-1, 1)
        xyz[pnum][FX] = random.uniform(-1, 1)
        xyz[pnum][FY] = random.uniform(-1, 1)
        xyz[pnum][FZ] = random.uniform(-1, 1)
        pnum += 1


if __name__ == "__main__":
    init_lattice()
    find_pair()
```
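
The work split in `find_pair` is easier to see in isolation. Because the inner loop runs over `j > i`, early values of `i` carry more pairs than late ones, so equal-width index ranges would be unbalanced. The following is a minimal sketch of the same idea, written for Python 3. It is not a line-by-line reproduction of the listing, and `split_pair_ranges` and `pairs_in` are hypothetical names.

```python
def split_pair_ranges(n, parts):
    """Split outer indices 0..n-1 of a triangular pair loop (i < j) into
    `parts` contiguous [start, stop) ranges with roughly equal pair counts."""
    total_pairs = n * (n - 1) // 2
    ranges = []
    start = 0
    done = 0  # pairs covered by rows 0..i
    for i in range(n):
        done += n - i - 1  # row i contributes pairs (i, i+1) .. (i, n-1)
        # Cumulative share that the currently open range should reach.
        target = total_pairs * (len(ranges) + 1) // parts
        if len(ranges) < parts - 1 and done >= target:
            ranges.append((start, i + 1))
            start = i + 1
    ranges.append((start, n))  # the last range takes whatever is left
    while len(ranges) < parts:  # pad with empty ranges if n is very small
        ranges.append((n, n))
    return ranges


def pairs_in(start, stop, n):
    """Number of (i, j) pairs with start <= i < stop and i < j < n."""
    return sum(n - i - 1 for i in range(start, stop))


if __name__ == "__main__":
    n = 15 ** 3
    for start, stop in split_pair_ranges(n, 4):
        print(start, stop, pairs_in(start, stop, n))
```

The first range spans fewer indices than the last, yet the pair counts stay within a few rows' worth of each other.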
Following the previous parallel computation of the Coulomb force with multiprocessing, we test the same computation using threading. The conditions are the same as in the multiprocessing test: the number of cores was set to 4 and the number of particles to 15^3.
The calculation took 14.36 [sec]. The multiprocessing run took 2.09 [sec], so threading took much longer, and in fact longer than the single-threaded runs. The likely explanation is the global interpreter lock (GIL): the threads share memory, but only one of them can execute Python bytecode at a time, so the pure-Python loop does not run in parallel. With the thread count set to 1, the processing time was 6.92 [sec]. It was faster not to use multiple threads.