
This article is shared from the Huawei Cloud community post "Video Action Recognition" (視頻動作識別), author: HWCloudAI.

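Lab objectives

By working through this case you will learn:

How to train the C3D model and run inference with it, and how to run inference with the I3D model.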

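Notes

This case recommends TensorFlow-1.13.1 and must run on a GPU. See the "ModelArts JupyterLab Hardware Specification Guide" to learn how to switch hardware specifications.
If this is your first time using JupyterLab, see the "ModelArts JupyterLab User Guide" to learn how to use it.
If you run into errors while using JupyterLab, see "ModelArts JupyterLab FAQ and Solutions" to try to resolve them.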

Lab steps

Case overview

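Video action recognition analyzes a short video clip and determines which action the person in it is performing. It is related to image recognition but differs from it: image recognition classifies a single static image, whereas video action recognition must consider both the static content of each frame and the spatiotemporal relationships between frames. For example, given a single image of a person holding a half-open door, it is impossible to tell whether the door is being opened or closed.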

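Compared with image analysis, video analysis is a younger research field and a harder one. The first difficulty is compute: a video has to be decomposed into images for analysis, so the volume of data the model must process is very large. The second is time: the temporal order of an action matters, so the frames extracted from the video must be linked through time before a judgment can be made. The model therefore has to account for temporal structure, and adding a time dimension greatly increases the number of parameters.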

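Thanks to public datasets such as PASCAL VOC, ImageNet, and MS COCO, the image domain has produced many classic models. Are there classic models for video analysis as well? There are, and this case introduces the classic models for video action recognition and puts them into practice with code.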

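1. Prepare the source code and data

This step prepares the source code and data the case needs. The resources are stored in OBS; we download them locally with the ModelArts SDK and extract them into the current directory. After extraction, the current directory contains data, dataset_subset, and other directories and files, which hold the pretrained parameter files, the dataset, the code files, and so on.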

import os
import moxing as mox

if not os.path.exists('videos'):
    mox.file.copy("obs://ai-course-common-26-bj4-v2/video/video.tar.gz", "./video.tar.gz")
    # Extract the resource archive with tar
    os.system("tar xf ./video.tar.gz")
    # Delete the archive with rm
    os.system("rm ./video.tar.gz")

INFO:root:Using MoXing-v1.17.3-
INFO:root:Using OBS-Python-SDK-3.20.7

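The previous lesson introduced three commonly used datasets for video action recognition: HMDB51, UCF-101, and Kinetics. This case uses a subset of UCF-101 as its demonstration dataset. Next, we play a video from UCF-101: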

video_name = "./data/v_TaiChi_g01_c01.avi"

from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np

def arrayShow(img):
    # Encode the frame as JPEG so the notebook can display it
    _, ret = cv2.imencode('.jpg', img)
    return Image(data=ret)

cap = cv2.VideoCapture(video_name)
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            tmp = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            img = arrayShow(frame)
            display(img)
            time.sleep(0.05)
        else:
            break
    except KeyboardInterrupt:
        # Stop playback when the cell is interrupted
        cap.release()
        break
cap.release()

2. Introduction to video action recognition models

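In the image domain, ImageNet is a large-scale image recognition dataset. Since 2010, a steady stream of image algorithms has been trained on it, deep learning models have evolved from AlexNet to VGG-16 and on to more complex architectures, and model performance has kept improving. When classifying images into 1,000 categories, the error rates are as follows: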

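Models that perform well at image recognition can be reused for other image tasks: reusing the parameters of some of their layers improves training results. With ImageNet-based image models available, many models and tasks have a better starting point for training, for example object detection, instance segmentation, face detection, and face recognition.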

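Can image models that train so well also help train video models? Yes. Research has shown that in the video domain, reusing image model architectures, and even their parameters, helps video model training considerably. But how can an image model's architecture be reused? First we need to understand how video classification differs from image classification. If a video is viewed as a collection of images, with each frame being one image, then a video classification task must consider not only what each image shows but also the spatiotemporal relationships between images before it can classify the action.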

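To capture the spatiotemporal relationships between images, the I3D paper reviews three older video classification models and proposes a more effective one, Two-Stream Inflated 3D ConvNets (I3D for short). The four models are introduced one by one below; see the original paper for more detail.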

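Old model 1: ConvNet + LSTM

This model uses a well-trained image model: a convolutional network performs feature extraction, pooling, and prediction on each frame, and an LSTM (long short-term memory) layer is added at the end of the model, as shown in the figure below. This lets the model take temporal structure into account and connect features across context to decide on the action. The drawbacks are that it captures only large motions and recognizes small movements poorly, and that training takes a long time because every frame of the video has to pass through the network.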

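Old model 2: 3D ConvNet

3D convolution is similar to 2D convolution but brings temporal information into the convolution operation. Although this looks like a more natural way to process video, the extra kernel dimension increases the number of parameters and makes the model harder to train. This model does not reuse an image model; video data is fed directly into the 3D convolutional network for training.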
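To put a rough number on the parameter growth, the sketch below counts the weights of one convolution layer with a 3 × 3 kernel and with a 3 × 3 × 3 kernel. The layer sizes (64 input channels, 128 filters) are made up for illustration and the helper name is hypothetical.

```python
def conv_params(kernel_dims, in_channels, out_channels):
    """Number of weights, plus one bias per filter, in a convolution layer."""
    weights_per_filter = in_channels
    for k in kernel_dims:
        weights_per_filter *= k
    return (weights_per_filter + 1) * out_channels

# Hypothetical layer: 64 input channels, 128 output filters.
params_2d = conv_params((3, 3), 64, 128)
params_3d = conv_params((3, 3, 3), 64, 128)
print(params_2d, params_3d)  # 73856 221312
```

For the same channel counts, the extra kernel dimension roughly triples the number of weights in this layer.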

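Old model 3: Two-Stream network

The two streams of a Two-Stream network are a single RGB snapshot and a stack of 10 precomputed optical flow frames. Both streams pass through image convolutional networks pretrained on ImageNet. Optical flow has separate vertical and horizontal channels, so the flow stream's input has twice as many channels as it has flow frames. The model performs very well in both training and testing.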

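Optical flow video

Optical flow was mentioned above, so here is a short introduction. What is optical flow? The name sounds technical and unfamiliar, but we experience this visual phenomenon every day. On a high-speed train, the scenery outside the window rushes backward, and the faster the train goes, the more the scenery becomes a passing blur. This apparent direction and speed of motion of what we see is optical flow. Conceptually, optical flow is an observation of object motion: it uses the correlation between adjacent frames to find correspondences between them, computes the motion of objects between adjacent frames, and yields the instantaneous velocity of each pixel. A raw video has moving parts and a static background; usually we only need to judge the state of the moving parts, and optical flow is exactly the computed motion information for those parts.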
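As a toy illustration of "finding the correspondence between adjacent frames", the sketch below estimates one global displacement between two tiny synthetic frames by brute-force search. This is a deliberately simplified stand-in: it is not the algorithm used to produce the optical flow data in this case, real optical flow estimates a motion vector per pixel, and all function names here are hypothetical.

```python
def make_frame(height, width, top, left, size=3):
    """Black frame with one bright square whose top-left corner is (top, left)."""
    return [[1.0 if top <= y < top + size and left <= x < left + size else 0.0
             for x in range(width)] for y in range(height)]

def estimate_shift(prev, curr, max_shift=3):
    """Return the (dx, dy) that best maps prev onto curr (lowest mean squared error)."""
    h, w = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, count = 0.0, 0
            for y in range(h):
                for x in range(w):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        err += (curr[ny][nx] - prev[y][x]) ** 2
                        count += 1
            if err / count < best_err:
                best, best_err = (dx, dy), err / count
    return best

prev_frame = make_frame(12, 12, top=4, left=3)
curr_frame = make_frame(12, 12, top=5, left=5)  # square moved 2 px right, 1 px down
flow = estimate_shift(prev_frame, curr_frame)
print(flow)  # (2, 1)
```

The static background contributes nothing to the estimate; only the moving square determines the recovered displacement.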

Below are an original video and the optical flow video computed from it.

Original video

Optical flow video

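New model: Two-Stream Inflated 3D ConvNets

The new model makes the following structural improvements:

Inflate 2D convolutions into 3D. A mature image classification model is reused directly; the only change is that the network's 2D $N \times N$ filters and pooling kernels become $N \times N \times N$;

Initialize the 3D filters from the pretrained 2D filter parameters. The previous step already reuses the image classification network's architecture; the goal of this step is to reuse its pretrained parameters as well. The 2D filter parameters are copied N times along the third (time) dimension, and all parameter values are then divided by N;

Adjust the shape and size of the receptive field. The new model modifies the architecture of the Inception-v1 image classification model: the first two max-pooling layers use $1 \times 3 \times 3$ kernels with stride 1 in time, all other max-pooling layers keep symmetric kernels and strides, and the final average-pooling layer uses a $2 \times 7 \times 7$ kernel.

Keep the basic Two-Stream approach. Using a two-stream structure to capture the spatiotemporal relationships between images is still effective.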
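The "copy N times along time, then divide by N" initialization can be sketched in a few lines. The 3 × 3 kernel and image patch below are made-up values and the function names are hypothetical; the point is that on a "boring" clip made of one frame repeated in time, the inflated 3D filter gives the same response as the original 2D filter gives on that frame.

```python
def inflate_kernel(kernel_2d, n):
    """Repeat a 2D kernel n times along a new time axis and rescale by 1/n."""
    return [[[w / n for w in row] for row in kernel_2d] for _ in range(n)]

def response_2d(kernel_2d, patch):
    """Dot product of a 2D kernel with an image patch of the same size."""
    return sum(k * p for k_row, p_row in zip(kernel_2d, patch)
               for k, p in zip(k_row, p_row))

def response_3d(kernel_3d, clip):
    """Dot product of a 3D kernel with a stack of patches (one per time step)."""
    return sum(response_2d(k_slice, frame) for k_slice, frame in zip(kernel_3d, clip))

# Hypothetical pretrained 3x3 filter and a 3x3 image patch.
kernel_2d = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 2.0], [5.0, 9.0, 2.0]]

kernel_3d = inflate_kernel(kernel_2d, n=3)
boring_clip = [patch] * 3  # the same frame repeated in time

print(response_2d(kernel_2d, patch))          # 10.0
print(response_3d(kernel_3d, boring_clip))    # approximately 10.0
```

Dividing by N is what keeps the activations on the same scale as in the pretrained 2D network.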

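Finally, the overall structure of the new model is shown in the figure below:

So far we have covered the classic datasets and classic models for video action recognition. Next we run two of these models in code: the C3D model (3D ConvNet) and the I3D model (Two-Stream Inflated 3D ConvNets).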

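C3D model architecture

As explained in "Old model 2: 3D ConvNet" above, a 3D convolutional network is a fairly natural-looking way to process video. Although its accuracy is not great and its computational cost is high, its structure is simple, so a very simple network is enough to implement video action recognition. The figure below illustrates 3D convolution: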

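In a), a 2D convolution is applied to an image. In b), a 2D convolution is applied to a video, treating multiple frames as multiple channels. In c), a 3D convolution is applied to a video, bringing temporal information into the input signal.

In a) and b), the output is a 2D feature map, so regardless of whether the input carries temporal information, the output is a 2D feature map and the 2D convolution loses the temporal information. Only the 3D convolution preserves temporal information in its output. 2D and 3D pooling operations behave the same way.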
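The difference shows up directly in the output shapes. The sketch below computes them for a single filter applied without padding to a 16-frame, 112 × 112 clip; the helper names are hypothetical.

```python
def out_len(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

def conv2d_output(frames, height, width, k):
    # Frames are stacked as input channels, so the filter sums over all of them
    # and no frame axis is left in the output.
    return (out_len(height, k), out_len(width, k))

def conv3d_output(frames, height, width, d, k):
    # The kernel also slides along time, so a frame axis remains in the output.
    return (out_len(frames, d), out_len(height, k), out_len(width, k))

shape_2d = conv2d_output(16, 112, 112, k=3)
shape_3d = conv3d_output(16, 112, 112, d=3, k=3)
print(shape_2d)  # (110, 110)
print(shape_3d)  # (14, 110, 110)
```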

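The figure below shows a variant of the C3D network (for the original description, see Section 2.2 of the I3D paper):

The C3D architecture has 8 convolutional layers, 5 max-pooling layers, and 2 fully connected layers, followed by a softmax output layer.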

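All 3D convolution kernels are $3 \times 3 \times 3$ with stride 1. Training uses SGD with an initial learning rate of 0.003, which is divided by 2 every 150k iterations. Optimization stops at 1.9M iterations, about 13 epochs.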
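The step schedule described above can be written as a one-line function. This is only a sketch of that description (the function name is hypothetical), not the schedule used later in this case's own training code.

```python
def c3d_learning_rate(iteration, base_lr=0.003, step=150_000):
    """Halve the learning rate every `step` iterations."""
    return base_lr * 0.5 ** (iteration // step)

halvings_by_end = 1_900_000 // 150_000          # completed halvings at 1.9M iterations
final_lr = c3d_learning_rate(1_900_000 - 1)     # learning rate on the last iteration

print(c3d_learning_rate(0), c3d_learning_rate(150_000))
print(halvings_by_end, final_lr)
```

By the end of training the rate has been halved 12 times, so it is 0.003 / 4096.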

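During data processing, the extracted video frames have size $c \times l \times h \times w$, where $c$ is the number of channels, $l$ is the number of frames, $h$ is the frame height, and $w$ is the frame width. The 3D convolution and pooling kernels have size $d \times k \times k$, where $d$ is the temporal depth of the kernel and $k$ is its spatial size. The network takes the extracted video frames as input and predicts a class label. All video frames are resized to $128 \times 171$, roughly half the size of the frames in the UCF-101 dataset. Each video is split into non-overlapping 16-frame clips, which serve as the network's input. Finally the frames are cropped, so the input data is $16 \times 112 \times 112$.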
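The clip splitting and cropping arithmetic can be sketched as follows. The 100-frame video is hypothetical and the function names are made up; an exactly centred crop is one possible choice, assumed here for illustration (the test-time code in this case uses the fixed window rows 8:120, columns 30:142, which is one pixel off centre horizontally).

```python
def split_into_clips(num_frames, clip_len=16):
    """Return (start, end) frame index pairs for non-overlapping clips."""
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, clip_len)]

def center_crop_box(height=128, width=171, crop=112):
    """Return (top, bottom, left, right) of a centred crop x crop window."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, top + crop, left, left + crop

clips = split_into_clips(100)   # hypothetical 100-frame video
box = center_crop_box()

print(len(clips), clips[-1])    # 6 (80, 96): the last 4 frames are dropped
print(box)                      # (8, 120, 29, 141)
```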

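3. C3D model training

Next we train the C3D model. Training consists of data preprocessing and model training. We use the UCF-101 dataset. Because the C3D model takes individual video frames as input, we need to extract frames from the dataset's videos, that is, convert the videos to images, and then feed the image data to the model for training.

In this case we randomly sampled part of the UCF-101 dataset for the training demonstration; interested readers can download the full UCF-101 dataset for training.

UCF-101 download

The dataset is stored in the dataset_subset directory.

The following code uses the cv2 library to convert video files to image files.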

import cv2
import os

# Location of the video dataset
video_path = './dataset_subset/'
# Location where the generated image dataset is stored
save_path = './dataset/'
# Create the path if it does not exist
if not os.path.exists(save_path):
    os.mkdir(save_path)

# Get the list of actions
action_list = os.listdir(video_path)
# Iterate over all actions
for action in action_list:
    if action.startswith(".") == False:
        if not os.path.exists(save_path + action):
            os.mkdir(save_path + action)
        video_list = os.listdir(video_path + action)
        # Iterate over all videos
        for video in video_list:
            prefix = video.split('.')[0]
            if not os.path.exists(os.path.join(save_path, action, prefix)):
                os.mkdir(os.path.join(save_path, action, prefix))
            save_name = os.path.join(save_path, action, prefix) + '/'
            video_name = video_path + action + '/' + video
            # Open the video file; cap is the capture handle
            cap = cv2.VideoCapture(video_name)
            # Despite its name, fps holds the total number of frames
            fps = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            fps_count = 0
            for i in range(fps):
                ret, frame = cap.read()
                if ret:
                    # Write the frame to an image file
                    cv2.imwrite(save_name + str(10000 + fps_count) + '.jpg', frame)
                    fps_count += 1

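At this point the images converted frame by frame from the videos have been stored, ready for model training.

4. Model training

First, we build the model.

The C3D architecture was introduced earlier; here we build the model with Conv3D, MaxPool3D, ZeroPadding3D, and other functions provided by keras.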

from keras.layers import Dense, Dropout, Conv3D, Input, MaxPool3D, Flatten, Activation, ZeroPadding3D
from keras.regularizers import l2
from keras.models import Model, Sequential

# Input: 112x112 images, 16 frames, 3 channels
input_shape = (112, 112, 16, 3)
# Weight decay rate
weight_decay = 0.005
# Number of classes; we use UCF-101, so there are 101
nb_classes = 101

# Build the model
inputs = Input(input_shape)
x = Conv3D(64, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(inputs)
x = MaxPool3D((2, 2, 1), strides=(2, 2, 1), padding='same')(x)
x = Conv3D(128, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)
x = Conv3D(128, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)
x = Conv3D(256, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)
x = Conv3D(256, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(2048, activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(2048, activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(nb_classes, kernel_regularizer=l2(weight_decay))(x)
x = Activation('softmax')(x)
model = Model(inputs, x)

Using TensorFlow backend.
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

Use the summary() method provided by keras to print the model structure. It shows how the layers are built and what each layer takes in and puts out.

model.summary()

(The output is long and omitted here.)

The model's input attribute shows the input shape, which is (batch size, width, height, frames, channels).

model.input

<tf.Tensor 'input_1:0' shape=(?, 112, 112, 16, 3) dtype=float32>

Compared with an image model, the data this model handles has an extra frames dimension, reflecting the role of temporal relationships in video analysis.
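To see what happens to the frames dimension inside the network, the sketch below traces the (width, height, frames) shape through the five pooling layers of the Keras model built above. It assumes the usual rule that pooling with padding='same' yields ceil(size / stride) along each axis; the convolutions use stride 1 with 'same' padding, so they leave the shape unchanged. The helper name is hypothetical.

```python
import math

def pool_same(shape, pool):
    """Output shape of a pooling layer with padding='same' and stride equal to the pool size."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, pool))

shape = (112, 112, 16)  # (width, height, frames)
pools = [(2, 2, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]

trace = [shape]
for pool in pools:
    shape = pool_same(shape, pool)
    trace.append(shape)

# The last convolution has 256 filters, so Flatten sees this many values.
flatten_size = shape[0] * shape[1] * shape[2] * 256

for step in trace:
    print(step)
print(flatten_size)  # 4096
```

The first pooling layer leaves the 16 frames untouched, and the later ones halve the temporal axis until a single time step remains.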

Next, we convert the image files into the data format needed for training.

# Import the required libraries
from keras.optimizers import SGD, Adam
from keras.utils import np_utils
import numpy as np
import random
import cv2
import matplotlib.pyplot as plt
# Custom callbacks
from schedules import onetenth_4_8_12

INFO:matplotlib.font_manager:font search path ['/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/ttf', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/afm', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
INFO:matplotlib.font_manager:generated new fontManager

Parameter definitions

img_path = save_path  # Location of the image files
results_path = './results'  # Location where training results are saved
if not os.path.exists(results_path):
    os.mkdir(results_path)

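Split the dataset: randomly take 4/5 as the training set and use the rest as the validation set. The file information is stored in train_list and test_list, ready for training.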

cates = os.listdir(img_path)
train_list = []
test_list = []
# Iterate over all action classes
for cate in cates:
    videos = os.listdir(os.path.join(img_path, cate))
    length = len(videos) // 5
    # Training set size: randomly sample videos into the training set
    train = random.sample(videos, length * 4)
    train_list.extend(train)
    # Add the remaining videos to the validation set
    for video in videos:
        if video not in train:
            test_list.append(video)
print("Training set:")
print(train_list)
print("%d videos in total\n" % (len(train_list)))
print("Validation set:")
print(test_list)
print("%d videos in total" % (len(test_list)))

(The output is long and omitted here.)
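The same 4/5 split can be tried on a toy list without touching the file system. The 12 video names below are made up, the function name is hypothetical, and a seeded random.Random is used only so the sketch is repeatable.

```python
import random

def split_videos(videos, rng):
    """Put four fifths of the videos (by integer division) in train, the rest in validation."""
    fifth = len(videos) // 5
    train = rng.sample(videos, fifth * 4)
    chosen = set(train)
    val = [v for v in videos if v not in chosen]
    return train, val

videos = ["v_Demo_g%02d_c01" % i for i in range(1, 13)]  # 12 hypothetical video names
train, val = split_videos(videos, random.Random(0))

print(len(train), len(val))  # 8 4
```

Because the fifth is computed with integer division, a class with 12 videos contributes 8 to training and 4 to validation rather than an exact 80/20 split.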

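Next we start training the model.

First define the data reading method. process_data reads one batch of data, containing the image data for 16 frames along with the labels. While reading the images, it applies random cropping and flipping for data augmentation.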

def process_data(img_path, file_list, batch_size=16, train=True):
    batch = np.zeros((batch_size, 16, 112, 112, 3), dtype='float32')
    labels = np.zeros(batch_size, dtype='int')
    cate_list = os.listdir(img_path)

    def read_classes():
        # Map each class name to its id, as listed in classInd.txt
        path = "./classInd.txt"
        with open(path, "r+") as f:
            lines = f.readlines()
        classes = {}
        for line in lines:
            c_id = line.split()[0]
            c_name = line.split()[1]
            classes[c_name] = c_id
        return classes

    classes_dict = read_classes()
    for file in file_list:
        cate = file.split("_")[1]
        img_list = os.listdir(os.path.join(img_path, cate, file))
        img_list.sort()
        batch_img = []
        for i in range(batch_size):
            path = os.path.join(img_path, cate, file)
            label = int(classes_dict[cate]) - 1
            symbol = len(img_list) // 16
            if train:
                # Random crop offsets
                crop_x = random.randint(0, 15)
                crop_y = random.randint(0, 58)
                # Random horizontal flip
                is_flip = random.randint(0, 1)
                # Work in units of 16 frames
                for j in range(16):
                    img = img_list[symbol + j]
                    image = cv2.imread(path + '/' + img)
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    image = cv2.resize(image, (171, 128))
                    if is_flip == 1:
                        image = cv2.flip(image, 1)
                    batch[i][j][:][:][:] = image[crop_x:crop_x + 112, crop_y:crop_y + 112, :]
                symbol -= 1
                if symbol < 0:
                    break
                labels[i] = label
            else:
                for j in range(16):
                    img = img_list[symbol + j]
                    image = cv2.imread(path + '/' + img)
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    image = cv2.resize(image, (171, 128))
                    # Fixed crop at validation time
                    batch[i][j][:][:][:] = image[8:120, 30:142, :]
                symbol -= 1
                if symbol < 0:
                    break
                labels[i] = label
    return batch, labels

batch, labels = process_data(img_path, train_list)
print("Shape of each batch: %s" % (str(batch.shape)))
print("Shape of each label array: %s" % (str(labels.shape)))

Shape of each batch: (16, 16, 112, 112, 3)
Shape of each label array: (16,)

Define the data generators, which pass batches of data to the training function.

def generator_train_batch(train_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of data (the batch size is fixed at 16 here)
        x_train, x_labels = process_data(img_path, train_list, batch_size=16, train=True)
        x = preprocess(x_train)
        # Convert to the format the model input requires
        y = np_utils.to_categorical(np.array(x_labels), num_classes)
        x = np.transpose(x, (0, 2, 3, 1, 4))
        yield x, y

def generator_val_batch(test_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of data from the validation list
        # (the original passed train_list here, which validated on training data)
        y_test, y_labels = process_data(img_path, test_list, batch_size=16, train=False)
        x = preprocess(y_test)
        # Convert to the format the model input requires
        x = np.transpose(x, (0, 2, 3, 1, 4))
        y = np_utils.to_categorical(np.array(y_labels), num_classes)
        yield x, y

Define the preprocess method, which standardizes the image data fed to the model.

def preprocess(inputs):
    inputs[..., 0] -= 99.9
    inputs[..., 1] -= 92.1
    inputs[..., 2] -= 82.6
    inputs[..., 0] /= 65.8
    inputs[..., 1] /= 62.3
    inputs[..., 2] /= 60.3
    return inputs
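The constants in preprocess appear to be per-channel means and standard deviations (an assumption; the source only calls the step standardization). Under that reading, the sketch below applies the same arithmetic to a single RGB pixel and inverts it; the function names are hypothetical.

```python
MEANS = (99.9, 92.1, 82.6)  # values subtracted per channel in preprocess
STDS = (65.8, 62.3, 60.3)   # values divided by per channel in preprocess

def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one pixel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEANS, STDS))

def denormalize_pixel(norm):
    """Invert normalize_pixel."""
    return tuple(v * s + m for v, m, s in zip(norm, MEANS, STDS))

mean_pixel = normalize_pixel(MEANS)             # a pixel at the mean maps to zero
white = normalize_pixel((255.0, 255.0, 255.0))  # the brightest pixel
restored = denormalize_pixel(white)

print(mean_pixel)
print(white)
```

After this step the network sees values centred near zero, a few units wide, rather than raw 0 to 255 intensities.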

# Training one epoch takes about 4 minutes
# Number of classes
num_classes = 101
# Batch size
batch_size = 4
# Number of epochs
epochs = 1
# Learning rate
lr = 0.005
# Optimizer
sgd = SGD(lr=lr, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Start training
history = model.fit_generator(
    generator_train_batch(train_list, batch_size, num_classes, img_path),
    steps_per_epoch=len(train_list) // batch_size,
    epochs=epochs,
    callbacks=[onetenth_4_8_12(lr)],
    validation_data=generator_val_batch(test_list, batch_size, num_classes, img_path),
    validation_steps=len(test_list) // batch_size,
    verbose=1)
# Save the trained weights
model.save_weights(os.path.join(results_path, 'weights_c3d.h5'))

WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
20/20 [==============================] - 442s 22s/step - loss: 28.7099 - acc: 0.9344 - val_loss: 27.7600 - val_acc: 1.0000

5. Model testing

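Next we test the trained model. We pick a video file from UCF-101 as test data and extract its frames; every 16 frames are passed to the model for one action prediction, and the predicted action and its probability are drawn on the frame while the video plays.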
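The test loop in this section keeps a rolling buffer of 16 frames and drops the oldest frame after each prediction, so consecutive windows overlap. A minimal sketch of that buffering pattern, with frame indices standing in for images and a made-up stand-in for the model (the function name is hypothetical), looks like this:

```python
from collections import deque

def sliding_predictions(frames, predict, clip_len=16):
    """Call predict on every window of clip_len consecutive frames."""
    clip = deque(maxlen=clip_len)  # the oldest frame drops out automatically
    results = []
    for frame in frames:
        clip.append(frame)
        if len(clip) == clip_len:
            results.append(predict(list(clip)))
    return results

# Stand-in "model": report the first and last frame index of each window.
windows = sliding_predictions(range(20), lambda clip: (clip[0], clip[-1]))

print(windows)  # [(0, 15), (1, 16), (2, 17), (3, 18), (4, 19)]
```

A 20-frame video therefore produces 5 predictions, one for each frame from the 16th onward.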

First, import the relevant libraries.

from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np

Build the model and load the weights.

from models import c3d_model

model = c3d_model()
model.load_weights(os.path.join(results_path, 'weights_c3d.h5'), by_name=True)  # Load the weights just trained

Define the function arrayShow, which encodes an image array into a format the notebook can display.

def arrayShow(img):
    _, ret = cv2.imencode('.jpg', img)
    return Image(data=ret)

Preprocess the video and run prediction, draw the prediction on the frames, and finally play the result.

# Load all class names and ids
with open('./ucfTrainTestlist/classInd.txt', 'r') as f:
    class_names = f.readlines()

# Read the video file
video = './videos/v_Punch_g03_c01.avi'
cap = cv2.VideoCapture(video)
clip = []
# Feed the video frames to the model
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            tmp = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            clip.append(cv2.resize(tmp, (171, 128)))
            # Predict whenever 16 frames are buffered
            if len(clip) == 16:
                inputs = np.array(clip).astype(np.float32)
                inputs = np.expand_dims(inputs, axis=0)
                inputs[..., 0] -= 99.9
                inputs[..., 1] -= 92.1
                inputs[..., 2] -= 82.6
                inputs[..., 0] /= 65.8
                inputs[..., 1] /= 62.3
                inputs[..., 2] /= 60.3
                inputs = inputs[:, :, 8:120, 30:142, :]
                inputs = np.transpose(inputs, (0, 2, 3, 1, 4))
                # Get the prediction
                pred = model.predict(inputs)
                label = np.argmax(pred[0])
                # Draw the prediction on the frame
                cv2.putText(frame, class_names[label].split(' ')[-1].strip(), (20, 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 1)
                cv2.putText(frame, "prob: %.4f" % pred[0][label], (20, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 1)
                # Drop the oldest frame so the window slides forward
                clip.pop(0)
            # Play the annotated video
            lines, columns, _ = frame.shape
            frame = cv2.resize(frame, (int(columns), int(lines)))
            img = arrayShow(frame)
            display(img)
            time.sleep(0.02)
        else:
            break
    except KeyboardInterrupt:
        # Stop playback when the cell is interrupted
        break
cap.release()

6. I3D model

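We briefly introduced the I3D model earlier. The official I3D GitHub repository provides models pretrained on Kinetics along with prediction code. Next, we will see how the I3D model makes predictions on a video.

First, import the relevant packages.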

import numpy as np
import tensorflow as tf
import i3d

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Define the parameters.

# Input image size
_IMAGE_SIZE = 224
# Number of frames in the video
_SAMPLE_VIDEO_FRAMES = 79
# The input has two parts: RGB and optical flow
# Both were computed in advance
_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}
# Several sets of pretrained weights are available.
# The imagenet variants are inflated from 2D ImageNet weights; the others are pretrained on video data
_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}
# File that lists the class names
_LABEL_MAP_PATH = 'data/label_map.txt'
# Number of classes: 400
NUM_CLASSES = 400

Define the parameter:

imagenet_pretrained: if True, load the checkpoints whose weights were inflated from ImageNet (rgb_imagenet and flow_imagenet); if False, load the checkpoints pretrained on video data only (rgb and flow).

imagenet_pretrained = True

# Load the action classes
kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]
tf.logging.set_verbosity(tf.logging.INFO)

Build the RGB part of the model.

rgb_input = tf.placeholder(tf.float32, shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 3))
with tf.variable_scope('RGB', reuse=tf.AUTO_REUSE):
    rgb_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    rgb_logits, _ = rgb_model(rgb_input, is_training=False, dropout_keep_prob=1.0)

rgb_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable

rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)

Build the optical flow part of the model.

flow_input = tf.placeholder(tf.float32, shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
with tf.variable_scope('Flow', reuse=tf.AUTO_REUSE):
    flow_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    flow_logits, _ = flow_model(flow_input, is_training=False, dropout_keep_prob=1.0)

flow_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable

flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

Combine the two streams into the complete I3D model.

model_logits = rgb_logits + flow_logits
model_predictions = tf.nn.softmax(model_logits)

Run the model to get the action prediction for the video.
The inputs are the RGB and optical flow data provided at the start:

with tf.Session() as sess:
    feed_dict = {}
    if imagenet_pretrained:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb_imagenet'])  # Load the RGB stream model
    else:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
    tf.logging.info('RGB checkpoint restored')
    if imagenet_pretrained:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow_imagenet'])  # Load the flow stream model
    else:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow'])
    tf.logging.info('Flow checkpoint restored')

    start_time = time.time()
    rgb_sample = np.load(_SAMPLE_PATHS['rgb'])  # Load the RGB stream input
    tf.logging.info('RGB data loaded, shape=%s', str(rgb_sample.shape))
    feed_dict[rgb_input] = rgb_sample

    flow_sample = np.load(_SAMPLE_PATHS['flow'])  # Load the flow stream input
    tf.logging.info('Flow data loaded, shape=%s', str(flow_sample.shape))
    feed_dict[flow_input] = flow_sample

    out_logits, out_predictions = sess.run(
        [model_logits, model_predictions], feed_dict=feed_dict)

    out_logits = out_logits[0]
    out_predictions = out_predictions[0]
    sorted_indices = np.argsort(out_predictions)[::-1]

    print('Inference time in sec: %.3f' % float(time.time() - start_time))
    print('Norm of logits: %f' % np.linalg.norm(out_logits))
    print('\nTop classes and probabilities')
    for index in sorted_indices[:20]:
        print(out_predictions[index], out_logits[index], kinetics_classes[index])

WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_imagenet/model.ckpt
INFO:tensorflow:RGB checkpoint restored
INFO:tensorflow:Restoring parameters from data/checkpoints/flow_imagenet/model.ckpt
INFO:tensorflow:Flow checkpoint restored
INFO:tensorflow:RGB data loaded, shape=(1, 79, 224, 224, 3)
INFO:tensorflow:Flow data loaded, shape=(1, 79, 224, 224, 2)
Inference time in sec: 1.511
Norm of logits: 138.468643

Top classes and probabilities
1.0 41.813675 playing cricket
1.497162e-09 21.49398 hurling (sport)
3.8431236e-10 20.13411 catching or throwing baseball
1.549242e-10 19.22559 catching or throwing softball
1.1360187e-10 18.915354 hitting baseball
8.801105e-11 18.660116 playing tennis
2.4415466e-11 17.37787 playing kickball
1.153184e-11 16.627766 playing squash or racquetball
6.1318893e-12 15.996157 shooting goal (soccer)
4.391727e-12 15.662376 hammer throw
2.2134352e-12 14.9772005 golf putting
1.6307096e-12 14.67167 throwing discus
1.5456218e-12 14.618079 javelin throw
7.6690325e-13 13.917259 pumping fist
5.1929587e-13 13.527372 shot put
4.2681337e-13 13.331245 celebrating
2.7205462e-13 12.880901 applauding
1.8357015e-13 12.487494 throwing ball
1.6134511e-13 12.358444 dodgeball
1.1388395e-13 12.010078 tap dancing
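Each output row lists the softmax probability, the fused logit, and the class name. The two numeric columns are linked: under softmax, the ratio of two probabilities equals the exponential of the difference of their logits. A quick check on the first two rows, using only the values printed above:

```python
import math

top_logit = 41.813675     # "playing cricket"
second_logit = 21.49398   # "hurling (sport)"

# With softmax, p_second / p_top = exp(second_logit - top_logit).
ratio = math.exp(second_logit - top_logit)

print(ratio)  # close to the 1.497162e-09 reported for the second class
```

Since the top probability is essentially 1.0, this ratio is the second class's probability, which is why a logit gap of about 20 leaves the runner-up with almost no probability mass.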

