[Deep Learning from Scratch] Speed Comparison of 2D Convolution Functions (Round 2)

I once ran a speed comparison of 2D convolution functions.

blog.naver.com/madrabbit7/222241306389

Looking at it again, I see parts that are inaccurate or sloppy.

This time the goal is to compare _directConvolve2d against SciPy's signal.correlate2d. _directConvolve2d is a function I wrote in C, built as a shared library, and call from Python through ctypes. I'll try to be as accurate as possible.
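Calling C through ctypes means the C side only ever receives a raw pointer. ctypes cannot tell whether the NumPy array behind it has the right dtype or memory layout. One way to add a safety net is NumPy's `numpy.ctypeslib.ndpointer`, used as an `argtypes` entry. Below is a minimal sketch, assuming a signature shaped like `_directConvolve2d`. `Double2D`, `ARGTYPES` and `accepts` are hypothetical names for illustration. The sketch does not load the actual `.so`. It only exercises the same `from_param` check that ctypes runs at call time.

```python
import ctypes

import numpy as np
from numpy.ctypeslib import ndpointer

# An argtypes entry that accepts only 2-D, float64, C-contiguous ndarrays.
Double2D = ndpointer(dtype=np.float64, ndim=2, flags="C_CONTIGUOUS")

# Hypothetical argtypes for a C function with the signature
# void _directConvolve2d(double *src, int xh, int xw, double *f, int fh, int fw,
#                        double *dst, int stride);
# It would be assigned as c_lib._directConvolve2d.argtypes = ARGTYPES.
ARGTYPES = [Double2D, ctypes.c_int, ctypes.c_int,
            Double2D, ctypes.c_int, ctypes.c_int,
            Double2D, ctypes.c_int]


def accepts(arr):
    """Return True if ctypes would accept arr for a Double2D parameter."""
    try:
        Double2D.from_param(arr)  # the same check ctypes performs on each call
        return True
    except TypeError:
        return False


good = np.zeros((3, 4))                # float64, C-contiguous
print(accepts(good))                   # True
print(accepts(good.astype(np.int64)))  # False: wrong dtype
print(accepts(good.T))                 # False: a transposed view is not C-contiguous
print(accepts(np.zeros(4)))            # False: 1-D array
```

With `ndpointer` in `argtypes`, the arrays themselves are passed to the C function (no `.ctypes.data_as(...)` needed). A wrong dtype or a non-contiguous view raises a `TypeError` instead of silently feeding garbage to the C loop.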

_directConvolve2d can handle a stride, but correlate2d is meant for a slightly different purpose and doesn't support one.
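For reference, a strided "valid" correlation picks every `stride`-th position of the stride-1 result. The output size is `oh = 1 + (xh - fh) // stride`, the same formula the C code uses. So with correlate2d a stride could in principle be emulated by slicing its output, at the cost of computing every position first. A minimal NumPy-only sketch follows (`correlate2d_valid` is a hypothetical helper, not a SciPy function):

```python
import numpy as np


def correlate2d_valid(x, f, stride=1):
    """Direct 'valid' cross-correlation with a stride (hypothetical helper)."""
    xh, xw = x.shape
    fh, fw = f.shape
    oh = 1 + (xh - fh) // stride  # same output-size formula as the C code
    ow = 1 + (xw - fw) // stride
    out = np.zeros((oh, ow), dtype=np.result_type(x, f))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(x[r:r + fh, c:c + fw] * f)
    return out


x = np.arange(49, dtype=np.float64).reshape(7, 7)
f = np.array([[1, 0, -2], [1, 0, 1], [1, -1, 0]], dtype=np.float64)

full = correlate2d_valid(x, f, stride=1)   # shape (5, 5)
s2 = correlate2d_valid(x, f, stride=2)     # shape (3, 3)
print(np.array_equal(s2, full[::2, ::2]))  # True
```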

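hcorrelate2d, a function I wrote earlier, computes 2D convolution with plain Python loops. NumPy is used only for the multiply-and-sum of each window. It is painfully slow, but I want to show just how slow things get when you write it in Python alone.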

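Because this is plain 2D convolution, there's no slot for the batch size (N) or the number of channels (C). It's limited to processing a single grayscale image with a single filter.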
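Extending this to a batch with channels is mostly bookkeeping: every output map sums over all input channels and the filter's height and width. The sketch below assumes the NCHW layout, input `(N, C, H, W)` and filters `(FN, C, FH, FW)`, with stride 1 and no padding. `conv_nchw` and `conv_nchw_loop` are hypothetical names. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_nchw(x, w):
    """x: (N, C, H, W), w: (FN, C, FH, FW) -> (N, FN, OH, OW); stride 1, no padding."""
    fh, fw = w.shape[2], w.shape[3]
    # Read-only view of every window: (N, C, OH, OW, FH, FW), no data copied.
    windows = sliding_window_view(x, (fh, fw), axis=(2, 3))
    # For each filter f, sum over channels c and filter offsets k, l.
    return np.einsum("ncijkl,fckl->nfij", windows, w)


def conv_nchw_loop(x, w):
    """Slow reference with explicit loops, for checking conv_nchw."""
    n, _, h, wd = x.shape
    fn, _, fh, fw = w.shape
    out = np.zeros((n, fn, h - fh + 1, wd - fw + 1))
    for b in range(n):
        for o in range(fn):
            for i in range(h - fh + 1):
                for j in range(wd - fw + 1):
                    out[b, o, i, j] = np.sum(x[b, :, i:i + fh, j:j + fw] * w[o])
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 5))
w = rng.standard_normal((4, 3, 3, 3))
y = conv_nchw(x, w)
print(y.shape)                               # (2, 4, 4, 3)
print(np.allclose(y, conv_nchw_loop(x, w)))  # True
```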

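Anyway, let's test performance while gradually increasing the N by N input size (here N is the side length of the image, not the batch size).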

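When I tested before, hcorrelate2d, written only in Python, was hundreds of times slower than correlate2d, a SciPy function implemented in C. It probably still is, since not much has changed there. But now there's a new contender: a ctypes function that does the convolution in C. If I followed my temper, I'd write the convolution in assembly..... (Whoa, whoa, hold it... life is short... you've got lots to do... honestly, you were never that good at assembly. And you barely remember it anyway...)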

#!/usr/bin/env python3

import ctypes
import numpy as np
from scipy import signal
import sys, time

# I wrote this Python function, which implements the definition of convolution.
# It handles 2-dimensional data only. No stride or padding support.
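def hcorrelate2d(x, f):
    """
    Take a 2D data matrix and a 2D filter, compute the valid-mode convolution, and return it.
    x: data matrix
    f: filter matrix
    """
    if x.ndim != 2 or f.ndim != 2:
        raise ValueError(f"x.ndim ({x.ndim}) and f.ndim ({f.ndim}) should both be 2.")

    xh, xw = x.shape
    fh, fw = f.shape

    # Note: .astype(int) truncates non-integer results. That is harmless for the
    # integer-valued test data below, but use x.dtype for general inputs.
    output = np.zeros((xh-fh+1, xw-fw+1)).astype(int)
    oh, ow = output.shape

    for i in range(oh):
        for j in range(ow):
            window = x[i:i+fh, j:j+fw]  # the patch of x currently under the filter
            output[i, j] = np.sum(window * f)

    return output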

# I wrote the following code. It uses ctypes to call a C function named '_directConvolve2d'.
def directConvolve2dCtypes(x, filter, stride=1):
    if x.ndim != 2 or filter.ndim != 2:
        raise ValueError(f"x.ndim ({x.ndim}) and filter.ndim ({filter.ndim}) should be 2.")
    # The C code reads raw double buffers in row-major order, so guarantee that layout.
    x = np.ascontiguousarray(x, dtype=np.float64)
    filter = np.ascontiguousarray(filter, dtype=np.float64)
    xh, xw = x.shape
    fh, fw = filter.shape

    oh = 1 + (xh - fh) // stride
    ow = 1 + (xw - fw) // stride
    # out is zero-initialized here, so the C function doesn't need to zero it again.
    out = np.zeros((oh, ow), dtype=np.float64)

    c_lib = ctypes.CDLL("./directConvolve2d.so")
    # Declare the signature on the function object, not on the library object.
    c_func = c_lib._directConvolve2d
    c_func.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int, ctypes.c_int,
                       ctypes.POINTER(ctypes.c_double), ctypes.c_int, ctypes.c_int,
                       ctypes.POINTER(ctypes.c_double), ctypes.c_int]
    c_func.restype = None

    # Wrap the three double arrays as ctypes pointers (they're just pointers anyway).
    src = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    f = filter.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    dest = out.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

    # src: ctypes pointer to the source data; xh, xw: shape of the source data;
    # f: pointer to the filter; fh, fw: shape of the filter;
    # dest: pointer to the array that receives out; stride: the stride.
    # The C function name begins with '_' to set it apart; that's just my taste.
    c_func(src, xh, xw, f, fh, fw, dest, stride)

    # Once the call returns, the convolution result is stored in out.
    return out

# The following C code is _directConvolve2d.c, the source of 'directConvolve2d.so'.
# On Linux, I compiled it with the following command:
# gcc -shared -o abc.so -fPIC abc.c
# Replace 'abc' with the actual file name prefix.

"""
void _directConvolve2d(double *src, int xh, int xw, double *f, int fh, int fw, 
                    double *dst, int stride) {
    int oh = 1 + (xh-fh)/stride;
    int ow = 1 + (xw-fw)/stride;
    for (int i = 0; i < oh; i++) {
        for (int j = 0; j < ow; j++) {
            int start = i*stride * xw + j*stride;  // flat index of the window's top-left element
            for (int k = 0; k < fh; k++) {
                for (int l = 0; l < fw; l++) {
                    dst[i * ow + j] += src[start + l + xw * k] * f[k * fw + l];
                }
            }
        }
    }
}
""" 

#--------------Test Part --------------
x = np.arange(0, 1024*1024, dtype=np.float64)
x = x.reshape(1024, 1024)
filter = np.array([[1, 0, -2], [1, 0, 1], [1, -1, 0]], dtype=np.float64)                    

start = time.time()
out = signal.correlate2d(x, filter, mode='valid')
now = time.time()
print("-------signal.correlate2d -----------------")
print(out)
print("Taken time: ", now - start)

start = time.time()
out2 = directConvolve2dCtypes(x, filter, stride=1)
now = time.time()
print("-------directConvolve2dCtypes -----------------")
print(out2)
print("Taken time: ", now - start)

start = time.time()
out3 = hcorrelate2d(x, filter)
now = time.time()
print("-------hcorrelate2d -----------------")
print(out3)
print("Taken time: ", now - start)

The comments are in English because Spyder IDE on Linux doesn't support Korean input. Guess my English writing will improve... sigh.
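The C kernel above never sees a 2D array. It gets a pointer to a flat buffer and reaches element (r, c) at offset r * xw + c, so a window starting at output position (i, j) begins at i*stride*xw + j*stride. The line-by-line Python transcription below checks that arithmetic against ordinary 2D slicing. `direct_convolve2d_flat` is a hypothetical name for illustration, not part of the original code. The sketch also shows why the input buffer must be row-major (C-contiguous): the offsets only make sense for that layout.

```python
import numpy as np


def direct_convolve2d_flat(src, xh, xw, f, fh, fw, stride=1):
    """Python transcription of the C kernel's flat (row-major) index arithmetic."""
    oh = 1 + (xh - fh) // stride
    ow = 1 + (xw - fw) // stride
    dst = [0.0] * (oh * ow)  # zero-initialized, like out on the Python side
    for i in range(oh):
        for j in range(ow):
            start = i * stride * xw + j * stride  # flat offset of the window's top-left
            for k in range(fh):
                for l in range(fw):
                    dst[i * ow + j] += src[start + l + xw * k] * f[k * fw + l]
    return dst, oh, ow


x = np.arange(42, dtype=np.float64).reshape(6, 7)
f = np.array([[1, 0, -2], [1, 0, 1], [1, -1, 0]], dtype=np.float64)

# ravel() on a C-contiguous array yields exactly the row-major buffer the C code reads.
flat, oh, ow = direct_convolve2d_flat(x.ravel().tolist(), 6, 7,
                                      f.ravel().tolist(), 3, 3, stride=2)
ref = np.array([[np.sum(x[i * 2:i * 2 + 3, j * 2:j * 2 + 3] * f) for j in range(ow)]
                for i in range(oh)])
print((oh, ow))                                           # (2, 3)
print(np.allclose(np.array(flat).reshape(oh, ow), ref))   # True
```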

Output:

-------signal.correlate2d -----------------
[[   2045.    2046.    2047. ...    3064.    3065.    3066.]
[   3069.    3070.    3071. ...    4088.    4089.    4090.]
[   4093.    4094.    4095. ...    5112.    5113.    5114.]
...
[1045501. 1045502. 1045503. ... 1046520. 1046521. 1046522.]
[1046525. 1046526. 1046527. ... 1047544. 1047545. 1047546.]
[1047549. 1047550. 1047551. ... 1048568. 1048569. 1048570.]]
Taken time:  0.07471561431884766
-------directConvolve2dCtypes -----------------
[[   2045.    2046.    2047. ...    3064.    3065.    3066.]
[   3069.    3070.    3071. ...    4088.    4089.    4090.]
[   4093.    4094.    4095. ...    5112.    5113.    5114.]
...
[1045501. 1045502. 1045503. ... 1046520. 1046521. 1046522.]
[1046525. 1046526. 1046527. ... 1047544. 1047545. 1047546.]
[1047549. 1047550. 1047551. ... 1048568. 1048569. 1048570.]]
Taken time:  0.056177616119384766
-------hcorrelate2d -----------------
[[   2045    2046    2047 ...    3064    3065    3066]
[   3069    3070    3071 ...    4088    4089    4090]
[   4093    4094    4095 ...    5112    5113    5114]
...
[1045501 1045502 1045503 ... 1046520 1046521 1046522]
[1046525 1046526 1046527 ... 1047544 1047545 1047546]
[1047549 1047550 1047551 ... 1048568 1048569 1048570]]
Taken time:  14.364908695220947

Surprisingly, my ctypes function directConvolve2dCtypes is outrunning the SciPy function (0.056 s vs 0.075 s in this run).

Hooray~ Long live ctypes! My honorable, brilliant, smart... I'm gonna love you...

Now let's drop hcorrelate2d, which came out roughly 190 to 260 times slower in this run, and try again with about 100 times more data...

I set the shape to (10240, 10240). The results are below.

My function comes out marginally faster, but in 1 out of 4 runs the SciPy function was marginally faster.

So it's fair to say the two are about equally fast.

-------signal.correlate2d -----------------
[[2.04770000e+04 2.04780000e+04 2.04790000e+04 ... 3.07120000e+04
  3.07130000e+04 3.07140000e+04]
[3.07170000e+04 3.07180000e+04 3.07190000e+04 ... 4.09520000e+04
  4.09530000e+04 4.09540000e+04]
[4.09570000e+04 4.09580000e+04 4.09590000e+04 ... 5.11920000e+04
  5.11930000e+04 5.11940000e+04]
...
[1.04826877e+08 1.04826878e+08 1.04826879e+08 ... 1.04837112e+08
  1.04837113e+08 1.04837114e+08]
[1.04837117e+08 1.04837118e+08 1.04837119e+08 ... 1.04847352e+08
  1.04847353e+08 1.04847354e+08]
[1.04847357e+08 1.04847358e+08 1.04847359e+08 ... 1.04857592e+08
  1.04857593e+08 1.04857594e+08]]
Taken time:  6.099974632263184
-------directConvolve2dCtypes -----------------
[[2.04770000e+04 2.04780000e+04 2.04790000e+04 ... 3.07120000e+04
  3.07130000e+04 3.07140000e+04]
[3.07170000e+04 3.07180000e+04 3.07190000e+04 ... 4.09520000e+04
  4.09530000e+04 4.09540000e+04]
[4.09570000e+04 4.09580000e+04 4.09590000e+04 ... 5.11920000e+04
  5.11930000e+04 5.11940000e+04]
...
[1.04826877e+08 1.04826878e+08 1.04826879e+08 ... 1.04837112e+08
  1.04837113e+08 1.04837114e+08]
[1.04837117e+08 1.04837118e+08 1.04837119e+08 ... 1.04847352e+08
  1.04847353e+08 1.04847354e+08]
[1.04847357e+08 1.04847358e+08 1.04847359e+08 ... 1.04857592e+08
  1.04857593e+08 1.04857594e+08]]
Taken time:  5.801564931869507
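With gaps this small (6.10 s vs 5.80 s), a single `time.time()` measurement per function can flip from run to run, as the 1-in-4 result shows. A sketch of a slightly sturdier measurement repeats each call and reports the minimum and median. It uses the standard library's `timeit.repeat`, whose default timer is `time.perf_counter`. `bench` is a hypothetical helper name, and the demo times `np.dot` only so that the block runs on its own.

```python
import statistics
import timeit

import numpy as np


def bench(func, *args, repeat=5, **kwargs):
    """Call func(*args, **kwargs) `repeat` times; return (min, median) seconds."""
    times = timeit.repeat(lambda: func(*args, **kwargs), repeat=repeat, number=1)
    return min(times), statistics.median(times)


a = np.random.default_rng(0).standard_normal((300, 300))
best, med = bench(np.dot, a, a, repeat=3)
print(f"min {best:.4f} s, median {med:.4f} s")
```

In the script above this would be called as, for example, `bench(signal.correlate2d, x, filter, mode='valid')` and `bench(directConvolve2dCtypes, x, filter)`. The minimum is the least noisy estimate of what the code itself costs. The median shows what a typical run looks like.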

미친토끼의 가출일기 (Mad Rabbit's Runaway Diary)
