Python Self-Study!

Paying forward the Python I've learned

Reading values off a graph image with Python and exporting them to Excel

Python

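Here's the scenario: when comparing the performance of product A and product B, you often end up looking at their characteristic curves (graphs). The graphs are published in catalogs or on the web, but only as images, so you can't overlay them to compare, and you end up with a vague impression along the lines of "A looks a bit better, maybe?". Manufacturers rarely hand over the underlying numbers when asked. Well, fair enough.
So I wrote a program that reads the values from a graph image and exports them to Excel.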

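Results

I think the digitized values are good enough to use as reference data. The dilation/erosion steps and the line detection must introduce some positional error, so treat the output strictly as reference data.
The input image is on the left and the output is on the right. The values are exported to Excel, but turning them into a chart is a manual step.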
[Figure: input image (left) and output result (right)]
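
To get a feel for what the dilation/erosion step does, here is a toy sketch in plain Python. It is not the OpenCV code from the program: it treats a single pixel column as a list and uses two hypothetical helpers, `dilate` and `erode`, with an odd window of 3. The program itself calls `cv2.dilate` and `cv2.erode` with k = 2, which may behave a little differently at the edges of a line.

```python
def dilate(column, k):
    """Each pixel becomes the maximum of its k-wide neighbourhood (white grows)."""
    r = k // 2
    n = len(column)
    return [max(column[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def erode(column, k):
    """Each pixel becomes the minimum of its k-wide neighbourhood (white shrinks)."""
    r = k // 2
    n = len(column)
    return [min(column[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


W, B = 255, 0  # white background, black lines

# Hypothetical column: a 1-px grid line at index 3 and a 5-px data line at 8..12.
column = [W] * 3 + [B] + [W] * 4 + [B] * 5 + [W] * 3

# Dilation followed by erosion (a "closing" of the white background).
closed = erode(dilate(column, 3), 3)

expected = [W] * 8 + [B] * 5 + [W] * 3
print(closed == expected)
```

In this toy case the thin grid line is wiped out by the dilation and never comes back, while the thick data line shrinks and is then restored. That is also why the data line has to be the thicker of the two.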

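A few caveats:

Only first-quadrant graphs are supported. (Data containing negative values is not supported.)
Logarithmic graphs are not supported.
The data line in the input image must be thicker than the grid lines. Otherwise the image processing cannot erase the grid lines.
The border (frame) of the input graph image has to be erased beforehand, for example in Paint. The original image that the left-hand image above was made from is shown below.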

[Figure: original graph image before the border was erased]
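
The first-quadrant and no-log-axis caveats come from how pixel positions are turned into real values: the program uses a plain linear mapping. Below is a small sketch of that mapping on its own. The helper name `pixel_to_units` and the sample numbers are made up for illustration; the formulas follow the unit-conversion loop in the program.

```python
def pixel_to_units(px_x, px_y, width, peak_px, x_min, x_max, y_max):
    """Linearly map a pixel position to axis units.

    px_x    : column number counted from the left (1..width)
    px_y    : height in pixels measured from the bottom of the image
    peak_px : largest px_y found on the curve; it is mapped to y_max
    """
    x = px_x * (x_max - x_min) / width + x_min
    y = px_y * y_max / peak_px   # the bottom of the image is assumed to be Y = 0
    return x, y


# Hypothetical 400-px-wide image whose X axis runs from 350 to 750.
print(pixel_to_units(200, 150, 400, 300, 350, 750, 1))  # (550.0, 0.5)
```

Because Y is scaled from 0 at the bottom of the image, negative values have nowhere to go, and a logarithmic axis would need a different conversion than this straight proportion.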

Program

As usual, my lack of flair is frustrating... but it works, so I'll call it good.

# 29_ExtractChart_001.py
# python 3.8.1
# opencv-python 4.1.2.30
# coding: utf-8
#
import cv2
import numpy as np
import datetime
import openpyxl

#===================#
# Define parameters #
#===================#
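image = "f.png"     # Path to the input graph image
k = 2               # Side length (px) of the square kernel used for dilation/erosion
th = 100            # Binarization threshold (0-255)
y_max = 1           # Real Y value assigned to the highest detected point of the curve
x_min = 350         # Real X value at the left edge of the image
x_max = 750         # Real X value at the right edge of the image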

#=================#
# Define function #
#=================#
# Function for extracting XY data from an imaged chart.
def get_profile():
    ## Add closing and threshold to erase auxiliary lines on the chart.
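    img = cv2.imread(image, 0)
    if img is None:                     ## cv2.imread returns None when the file cannot be read.
        raise FileNotFoundError(image)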
    kernel = np.ones((k, k), np.uint8)
    ret,img = cv2.threshold(img, th, 255, cv2.THRESH_BINARY)
    img = cv2.dilate(img, kernel,iterations = 1)
    img = cv2.erode(img, kernel,iterations = 1)
#    img = cv2.dilate(img, kernel,iterations = 10)
#    img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
#    img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
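    cv2.imshow("output", img)
    cv2.waitKey(0)                      ## The window is only drawn while waitKey runs; press any key to continue.
    cv2.destroyAllWindows()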

    ## Get num of pixels of img. h:height  w:width
    h, w = img.shape
    
    ## Array for XY data. [X, Y].
    profile = []
    ## print(type(profile)) -> <class 'list'> Not an ndarray!

    ## Get the XY coordinate of the chart line in each pixel column.
    for i in range(w):
        line = img[:,i]
        edge = [j for j in range(h-1) if line[j] != line[j+1]]      ## Search for the two transition points (W to B and B to W) in a vertical column.
        if len(edge) < 1:
            val = 0
        elif len(edge) == 1:
            if edge[0] > h/2:
                val = 0
            else:
                val = h
        else:
            val = h - sum(edge[:2])/2   ## Midpoint of the first two edges, with Y reversed (top to bottom -> bottom to top).
        point = [i+1, val]              ## XY data [X, Y]
        profile.append(point)

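    ## Do unit conversion on X and Y coordinate.
    y = max(profile, key= lambda p:p[1])[1]     ## Get maximum value of Y data.
    if y == 0:                                  ## Avoid dividing by zero when no line was detected.
        raise ValueError("No chart line detected; check th and k.")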
    for q in range(w):
        profile[q][0] = profile[q][0] * (x_max - x_min) / w + x_min
        profile[q][1] = profile[q][1] * y_max / y

    return profile

# Export a 2D list to xlsx format.
def export_xlsx(arry):
    wb = openpyxl.Workbook()
    ws = wb.create_sheet(index= 0, title = "Line profile")  ## create_sheet returns the new worksheet.
    wb.active = wb.sheetnames.index("Line profile")         ## Show this sheet when the file is opened.

    ## Write X to column A and Y to column B, one point per row.
    for row, (x, y) in enumerate(arry, start=1):
        ws.cell(row=row, column=1, value=x)
        ws.cell(row=row, column=2, value=y)
    try:
        now = datetime.datetime.now()
        wb.save('LineProfile_{0:%Y%m%d%H%M%S}.xlsx'.format(now))
        print("Save completed")
    except Exception as e:
        print("Save failed!", e)
  
#======#
# Main #
#======#
if __name__ == "__main__":
    export_xlsx(get_profile())
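
The per-column step in get_profile() is easier to follow on a single hand-made column. The sketch below is a simplified illustration, not the program itself: `column_height` is a hypothetical helper that only handles the case where a column has at least two black/white transitions, and it returns None otherwise, whereas the program falls back to 0 or h.

```python
def column_height(column):
    """Return the height of the line centre measured from the bottom of the column."""
    h = len(column)
    # Indices where the pixel value changes between neighbours (W->B or B->W).
    edges = [j for j in range(h - 1) if column[j] != column[j + 1]]
    if len(edges) < 2:
        return None  # no complete line in this column (simplification)
    top, bottom = edges[0], edges[1]
    # Image rows count downward, so subtract from h to flip the Y direction.
    return h - (top + bottom) / 2


# Hypothetical 10-px column: white, a 2-px black line at rows 4..5, then white.
column = [255] * 4 + [0] * 2 + [255] * 4
print(column_height(column))  # 6.0
```

The transitions sit at rows 3 and 5, so the midpoint is row 4 from the top, which is 6 px up from the bottom of a 10-px column.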
