I am training a model with TensorFlow using data from two sources. For both sources the training and validation data have roughly the same shape, and the dtype is np.float32 throughout. However, TensorFlow is not using the GPU for one of the datasets, even though it does for a very similar one.

With the first dataset the GPUs on the machine are used; with the second dataset they are not.

Does anyone have suggestions on how to investigate this?
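
One standard first step (also suggested in the comments below) is to enable device placement logging and compare what the two runs report. A minimal TF 1.x sketch, assuming the model is built and run through an ordinary Session:

import tensorflow as tf

# Log every op's device assignment when the session is created.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training ops here and compare the placement output for both datasets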

print(s1_train_data.shape) 
print(s1_train_data.values) 
(1165032, 941) 
[[ 0.45031181 -0.99680316 0.63686389 ..., 0.22323072 -0.37929842 0.  ] 
[-0.40660214 0.34022757 -0.00710014 ..., -1.43051076 -0.14785887 1.  ] 
[ 0.03955967 -0.91227823 0.37887612 ..., 0.16451506 -1.02560401 0.  ] 
..., 
[ 0.11746094 -0.18229018 0.43319091 ..., 0.36532226 -0.48208624 0.  ] 
[ 0.110379 -1.07364404 0.42837444 ..., 0.74732345 0.92880726 0.  ] 
[-0.81027234 -1.04290771 -0.56407243 ..., 0.25084609 -0.1797282 1.  ]] 

print(s2_train_data.shape) 
print(s2_train_data.values) 
(559873, 941) 
[[ 0.   0.   0.   ..., -1.02008295 0.27371082 0.  ] 
[ 0.   0.   0.   ..., -0.74775815 0.18743835 0.  ] 
[ 0.   0.   0.   ..., 0.6469788 0.67864949 1.  ] 
..., 
[ 0.   0.   0.   ..., -0.88198501 -0.02421325 1.  ] 
[ 0.   0.   0.   ..., 0.28361112 -1.08478808 1.  ] 
[ 0.   0.   0.   ..., 0.22360609 0.50698668 0.  ]] 

Edit: Here is an excerpt from the log with log_device_placement = True.

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520 
major: 3 minor: 0 memoryClockRate (GHz) 0.797 
pciBusID 0000:00:03.0 
Total memory: 4.00GiB 
Free memory: 3.95GiB 
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x7578380 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties: 
name: GRID K520 
major: 3 minor: 0 memoryClockRate (GHz) 0.797 
pciBusID 0000:00:04.0 
Total memory: 4.00GiB 
Free memory: 3.95GiB 
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x7c54b10 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties: 
name: GRID K520 
major: 3 minor: 0 memoryClockRate (GHz) 0.797 
pciBusID 0000:00:05.0 
Total memory: 4.00GiB 
Free memory: 3.95GiB 
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x65bb1d0 
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties: 
name: GRID K520 
major: 3 minor: 0 memoryClockRate (GHz) 0.797 
pciBusID 0000:00:06.0 
Total memory: 4.00GiB 
Free memory: 3.95GiB 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 3 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 2 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: N N Y N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: N N N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GRID K520, pci bus id: 0000:00:04.0) 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GRID K520, pci bus id: 0000:00:05.0) 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GRID K520, pci bus id: 0000:00:06.0) 
Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus id: 0000:00:04.0 
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus id: 0000:00:05.0 
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus id: 0000:00:06.0 
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping: 
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0 
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus id: 0000:00:04.0 
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus id: 0000:00:05.0 
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus id: 0000:00:06.0 

WARNING:tensorflow:From tf.py:183 in get_session.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02. 
Instructions for updating: 
Use `tf.global_variables_initializer` instead. 
gradients_3/add_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/add_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0 
gradients_3/add_2_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/add_2_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0 
gradients_3/gradients_2/Mean_1_grad/Tile_grad/range: (Range): /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/Mean_1_grad/Tile_grad/range: (Range)/job:localhost/replica:0/task:0/gpu:0 
gradients_3/gradients_2/Mean_1_grad/truediv_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/Mean_1_grad/truediv_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0 
gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/Size: (Const): /job:localhost/replica:0/task:0/gpu:0 
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/Size: (Const)/job:localhost/replica:0/task:0/gpu:0 
gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/range: (Range): /job:localhost/replica:0/task:0/gpu:0 

So it does appear to be placing the ops on the GPU, but I still see almost exactly 0% GPU-Util in the nvidia-smi monitor.
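
One way to rule out a sampling artifact is to poll utilization continuously while the training loop runs. A small sketch using nvidia-smi's standard query flags (nothing here is specific to this model; run it in a separate terminal or process):

import subprocess, time

# Print per-GPU utilization once per second; stop with Ctrl-C.
while True:
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,utilization.gpu',
         '--format=csv,noheader,nounits'])
    print(out.decode().strip())
    time.sleep(1)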

The pandas DataFrames are, of course, held in memory. Is there any other I/O that could be affecting this process?

Edit 2: I captured log_device_placement logs for both the fast and the slow dataset, and they are identical, even though one dataset runs at 25% GPU utilization and the other at 0%. This is really making my head hurt...
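
Since the placement logs are identical, another way to localize the difference is to time a fixed number of identical training steps on each dataset and compare. A rough sketch; train_step, x, y and the data/label arrays are placeholders for whatever the actual model uses:

import time

def time_steps(sess, train_step, x, y, data, labels, batch_size=256, n_steps=20):
    # Run a fixed number of identical steps and return the mean time per step.
    start = time.time()
    for i in range(n_steps):
        batch = slice(i * batch_size, (i + 1) * batch_size)
        sess.run(train_step, feed_dict={x: data[batch], y: labels[batch]})
    return (time.time() - start) / n_steps

# Compare e.g. time_steps(sess, train_step, x, y, s1_features, s1_labels)
# against the same call with the s2 arrays.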

Can you enable [log_device_placement](https://www.tensorflow.org/how_tos/using_gpu/) and figure out which op placements differ? –

Thanks Allen, I'll try this as soon as I'm back in the office tomorrow – MarkNS

Answer

The cause of the slowdown was the memory layout of the ndarray backing the DataFrame. The s2 data was column-major, meaning that each row of features and target was not contiguous in memory. This operation changes the memory layout:

s2_train_data = s2_train_data.values.copy(order='C') 

and now the GPU is running at 26% utilization. Happy days :)
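
For anyone hitting the same issue: the layout can be checked directly on the array backing the DataFrame, and the copy is only needed when it is not already row-major. A small sketch (s2_train_data refers to the DataFrame above):

import numpy as np

arr = s2_train_data.values
print(arr.flags['C_CONTIGUOUS'])      # False for a column-major (Fortran-ordered) block

# Make each row (features + target) contiguous before feeding batches.
if not arr.flags['C_CONTIGUOUS']:
    arr = np.ascontiguousarray(arr)   # equivalent to .values.copy(order='C') here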
