float16 matmul is way slower than float32 matmul on CPU #24738

Open

dchatterjee172 opened this issue on 7 Jan 2019 · 1 comment

Comments


dchatterjee172 commented on 7 Jan 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): YES
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 1.12.0
  • Python version: 3.5.2

You can collect some of this information using our environment capture script
You can also obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior
float16 matmul is way slower than float32 matmul on CPU

Code to reproduce the issue

import tensorflow as tf
import time
from datetime import timedelta


# identical 768x768 matmuls, one pair in float16 and one in float32
a = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)
b = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)

c = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)
d = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)

e = tf.matmul(a, b)
f = tf.matmul(c, d)

# run on CPU only (no GPU devices)
config = tf.ConfigProto(
    intra_op_parallelism_threads=24,
    inter_op_parallelism_threads=24,
    allow_soft_placement=True,
    device_count={"GPU": 0},
)

# alternate between the float16 and float32 matmul and time each run
with tf.Session(config=config) as sess:
    for i in range(100):
        if i % 2:
            print("16bit -- ", end="")
            op = e
        else:
            print("32bit -- ", end="")
            op = f
        start = time.monotonic()
        sess.run(op)
        end = time.monotonic()
        print(i, timedelta(seconds=end - start))

Output

2019-01-07 16:06:19.698878: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
32bit -- 0 0:00:00.017297
16bit -- 1 0:00:00.275746
32bit -- 2 0:00:00.002908
16bit -- 3 0:00:00.261320
32bit -- 4 0:00:00.003028
16bit -- 5 0:00:00.253561
32bit -- 6 0:00:00.002849
16bit -- 7 0:00:00.256515
32bit -- 8 0:00:00.006011
16bit -- 9 0:00:00.255613
32bit -- 10 0:00:00.003996
16bit -- 11 0:00:00.242231
32bit -- 12 0:00:00.003338

@jvishnuvardhan jvishnuvardhan self-assigned this on 9 Jan 2019

@jvishnuvardhan jvishnuvardhan added type:support type:others comp:ops labels on 9 Jan 2019

@jvishnuvardhan jvishnuvardhan assigned rmlarsen and unassigned jvishnuvardhan on 9 Jan 2019


naisy commented on 9 Jan 2019

It is simple: the Intel architecture does not support native FP16 arithmetic (half precision is only a storage format and gets converted for computation), so the float16 matmul falls back to a much slower path.
See also:
https://stackoverflow.com/questions/49995594/half-precision-floating-point-arithmetic-on-intel-chips
https://stackoverflow.com/questions/15340781/python-numpy-data-types-performance
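
A common workaround, sketched below as an illustration rather than something proposed in this thread, is to keep the tensors in float16 but cast them to float32 for the matmul itself, so the computation goes through the fast float32 kernels:

import tensorflow as tf
import time
from datetime import timedelta

# Sketch of a workaround: store in float16, but cast to float32 for the
# matmul so the CPU can use its fast float32 kernels instead of the slow
# float16 fallback. (Illustrative only; not taken from this issue.)
a = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)
b = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)

# cast up, multiply in float32, cast the result back down to float16
g = tf.cast(tf.matmul(tf.cast(a, tf.float32), tf.cast(b, tf.float32)), tf.float16)

config = tf.ConfigProto(device_count={"GPU": 0})
with tf.Session(config=config) as sess:
    for i in range(5):
        start = time.monotonic()
        sess.run(g)
        end = time.monotonic()
        print("cast-to-32bit --", i, timedelta(seconds=end - start))

The extra cast ops add some overhead, but on CPUs without native FP16 arithmetic this usually lands close to the plain float32 timings shown above.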

