    A Brief Look at Using the TensorFlow Semantic Segmentation API (Training DeepLab on Cityscapes)

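    Installation tutorial:

    Cityscapes training:

    Pitfalls I ran into:

    1. Environment:

    - tensorflow 1.8 + CUDA 9.0 + cuDNN 7.0 + Anaconda3 + py3.5

    - The newer tensorflow 1.12 and 1.10 both failed for me, with an error about not being able to get a convolution algorithm (convolution algorithm...)

    2. Dataset conversion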

    # Exit immediately if a command exits with a non-zero status.
    set -e
    CURRENT_DIR=$(pwd)
    WORK_DIR="."
    # Root path for Cityscapes dataset.
    CITYSCAPES_ROOT="${WORK_DIR}/cityscapes"
    # Create training labels.
    python "${CITYSCAPES_ROOT}/cityscapesscripts/preparation/createTrainIdLabelImgs.py"
    # Build TFRecords of the dataset.
    # First, create output directory for storing TFRecords.
    OUTPUT_DIR="${CITYSCAPES_ROOT}/tfrecord"
    mkdir -p "${OUTPUT_DIR}"
    BUILD_SCRIPT="${CURRENT_DIR}/build_cityscapes_data.py"
    echo "Converting Cityscapes dataset..."
    python "${BUILD_SCRIPT}" \
      --cityscapes_root="${CITYSCAPES_ROOT}" \
      --output_dir="${OUTPUT_DIR}"

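    - First, install the cityscapesScripts module into the current conda environment; it has to support py3.5.

    - By default, cityscapesscripts/preparation/createTrainIdLabelImgs.py converts the JSON files in the test, train, and val folders under gtFine into labelTrainIds PNG images. However, about a dozen JSON files in the test folder have a broken encoding. Each one raised an error, and I removed the offending files one by one as the errors came up.

    - Then run build_cityscapes_data.py to convert the images and labels to TFRecord format.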
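Rather than hitting the broken JSON files one error at a time, they can be listed up front. The following is only a minimal sketch: it assumes the failures show up as decode or parse errors when a file is read as UTF-8 JSON, and both the helper name `find_unreadable_json` and the `cityscapes/gtFine/test` path are hypothetical, not part of cityscapesScripts.

```python
import json
from pathlib import Path


def find_unreadable_json(root):
    """Return (path, error) pairs for JSON files under `root` that cannot be read.

    Assumption: a "broken" annotation file is one that fails to decode as
    UTF-8 or fails to parse as JSON.
    """
    bad = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            bad.append((path, err))
    return bad


if __name__ == "__main__":
    # Hypothetical location; point this at wherever gtFine lives on your machine.
    for path, err in find_unreadable_json("cityscapes/gtFine/test"):
        print(f"{path}: {err}")
```

The script only reports the files; whether to delete them or move them aside before running createTrainIdLabelImgs.py is left to the reader.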

    3. Training code for Cityscapes

    - Put the training command in a script file: train_deeplab_cityscapes.sh

    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco
    
    PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt'
    PATH_TO_TRAIN_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'
    # From tensorflow/models/research/
    python "${WORK_DIR}"/train.py \
        --logtostderr \
        --training_number_of_steps=40000 \
        --train_split="train" \
        --model_variant="xception_65" \
        --atrous_rates=6 \
        --atrous_rates=12 \
        --atrous_rates=18 \
        --output_stride=16 \
        --decoder_output_stride=4 \
        --train_crop_size=513 \
        --train_crop_size=513 \
        --train_batch_size=1 \
        --fine_tune_batch_norm=False \
        --dataset="cityscapes" \
        --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
        --train_logdir=${PATH_TO_TRAIN_DIR} \
        --dataset_dir=${PATH_TO_DATASET}

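    Parameter notes:

    training_number_of_steps: the number of training iterations.

    train_crop_size: the crop size for training images. My GPU has only 8 GB of memory, so I set this to 513.

    train_batch_size: the training batch size. Also because of hardware limits, I kept it at 1.

    fine_tune_batch_norm=False: whether to fine-tune the batch norm parameters. The official recommendation is to set this to False when the training batch size is smaller than 12. This setting matters: otherwise training fails with an error at around step 2000.

    tf_initial_checkpoint: the pretrained initial checkpoint. This is the checkpoint downloaded earlier, ../research/deeplab/backbone/deeplabv3_cityscapes_train/model.ckpt.index (the script above passes the model.ckpt prefix).

    train_logdir: the directory where the training weights are saved. Note that it was already created when the project directory was first set up. Here it is set to "../research/deeplab/exp/train_on_train_set/train/"

    dataset_dir: the location of the dataset, i.e. the TFRecords directory created earlier. Here it is set to "../dataset/cityscapes/tfrecord"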
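Two quick sanity checks on the values above may help when changing them. This is a sketch under assumptions that do not come from this article: that DeepLab crop sizes are expected to have the form output_stride * k + 1 (as I understand the DeepLab FAQ), and that the Cityscapes fine train split has 2975 images. The helper names are hypothetical.

```python
def is_valid_crop_size(size, output_stride=16):
    """Check the assumed rule: crop size == output_stride * k + 1 for some k >= 1."""
    return size > 1 and (size - 1) % output_stride == 0


def approx_epochs(steps, batch_size, num_images):
    """Rough number of passes over the training set for a given step count."""
    return steps * batch_size / num_images


if __name__ == "__main__":
    # Crop sizes used in the training and evaluation scripts of this article.
    for size in (513, 1025, 2049):
        print(size, is_valid_crop_size(size), (size - 1) // 16)

    # 2975 is an assumption about the size of the Cityscapes fine train split.
    print(round(approx_epochs(40000, 1, 2975), 1))
```

With output_stride=16, the three crop sizes correspond to k = 32, 64, and 128. With a batch size of 1, 40000 steps amount to only about 13 passes over the training images, which is worth keeping in mind when reading the evaluation numbers.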

    4. Validation and testing

    - Evaluation script:

    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco
    PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/'
    PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_EVAL_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/eval/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'
    # From tensorflow/models/research/
    python "${WORK_DIR}"/eval.py \
        --logtostderr \
        --eval_split="val" \
        --model_variant="xception_65" \
        --atrous_rates=6 \
        --atrous_rates=12 \
        --atrous_rates=18 \
        --output_stride=16 \
        --decoder_output_stride=4 \
        --eval_crop_size=1025 \
        --eval_crop_size=2049 \
        --dataset="cityscapes" \
        --checkpoint_dir=${PATH_TO_INITIAL_CHECKPOINT} \
        --eval_logdir=${PATH_TO_EVAL_DIR} \
        --dataset_dir=${PATH_TO_DATASET}

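    - Result: model.ckpt-40000 is the model after 40000 training iterations starting from the initial model. Evaluating the initial model afterwards still gave a low miou_1.0, and I am not sure whether some parameter is set incorrectly.

    - Note: if you use the officially provided checkpoint as the initial model, the archive does not contain a checkpoint file, so you have to add one by hand.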

    INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/model.ckpt-40000
    INFO:tensorflow:Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    INFO:tensorflow:Starting evaluation at 2018-12-18-07:13:08
    INFO:tensorflow:Evaluation [50/500]
    INFO:tensorflow:Evaluation [100/500]
    INFO:tensorflow:Evaluation [150/500]
    INFO:tensorflow:Evaluation [200/500]
    INFO:tensorflow:Evaluation [250/500]
    INFO:tensorflow:Evaluation [300/500]
    INFO:tensorflow:Evaluation [350/500]
    INFO:tensorflow:Evaluation [400/500]
    INFO:tensorflow:Evaluation [450/500]
    miou_1.0[0.478293568]
    INFO:tensorflow:Waiting for new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/
    INFO:tensorflow:Found new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
    INFO:tensorflow:Graph was finalized.
    2018-12-18 15:18:05.210957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
    2018-12-18 15:18:05.211047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-12-18 15:18:05.211077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
    2018-12-18 15:18:05.211100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
    2018-12-18 15:18:05.211645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9404 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
    INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
    INFO:tensorflow:Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    INFO:tensorflow:Starting evaluation at 2018-12-18-07:18:06
    INFO:tensorflow:Evaluation [50/500]
    INFO:tensorflow:Evaluation [100/500]
    INFO:tensorflow:Evaluation [150/500]
    INFO:tensorflow:Evaluation [200/500]
    INFO:tensorflow:Evaluation [250/500]
    INFO:tensorflow:Evaluation [300/500]
    INFO:tensorflow:Evaluation [350/500]
    INFO:tensorflow:Evaluation [400/500]
    INFO:tensorflow:Evaluation [450/500]
    miou_1.0[0.496331513]
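For the missing checkpoint file mentioned above, a small helper can write one next to the extracted model files. This is a sketch, not the official procedure: it assumes the usual TensorFlow 1.x text format for the `checkpoint` state file and that the extracted weights use the prefix `model.ckpt`, as in the log above. The function name `write_checkpoint_file` is hypothetical.

```python
import os


def write_checkpoint_file(ckpt_dir, prefix="model.ckpt"):
    """Write a minimal `checkpoint` state file pointing at `prefix` in `ckpt_dir`.

    Assumption: the directory already holds files such as
    model.ckpt.index and model.ckpt.data-00000-of-00001.
    """
    path = os.path.join(ckpt_dir, "checkpoint")
    with open(path, "w") as f:
        f.write(f'model_checkpoint_path: "{prefix}"\n')
        f.write(f'all_model_checkpoint_paths: "{prefix}"\n')
    return path


# Example call, using the pretrained model directory from the scripts above:
# write_checkpoint_file(
#     "/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/"
# )
```

Once this file exists, pointing --checkpoint_dir at the directory should let eval.py pick up the pretrained weights, which matches the "Found new checkpoint" line in the log.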

    5. Visualization

    - Generate the segmentation result images in the vis directory

    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco
    
    PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_VIS_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/vis/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'
    
    # From tensorflow/models/research/
    python "${WORK_DIR}"/vis.py \
        --logtostderr \
        --vis_split="val" \
        --model_variant="xception_65" \
        --atrous_rates=6 \
        --atrous_rates=12 \
        --atrous_rates=18 \
        --output_stride=16 \
        --decoder_output_stride=4 \
        --vis_crop_size=1025 \
        --vis_crop_size=2049 \
        --dataset="cityscapes" \
        --colormap_type="cityscapes" \
        --checkpoint_dir=${PATH_TO_CHECKPOINT} \
        --vis_logdir=${PATH_TO_VIS_DIR} \
        --dataset_dir=${PATH_TO_DATASET}

    The above is my personal experience; I hope it serves as a useful reference.
