AI Study Notes | 曲径通幽处

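JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is plain text that people can read and write easily, and most programming languages can parse and generate it. JSON files store and transmit data; in web applications they typically carry data from a server to the user interface.

A JSON object is a set of key-value pairs, much like a JavaScript object. Every key is a string, and a value can be a string, a number, a boolean, null, an array, or another JSON object. An object can hold any number of key-value pairs, which makes JSON a good fit for structured data such as configuration files, logs, API responses, and database records.

String values can also hold information such as URLs.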
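
As a small illustration of the value types listed above, the sketch below parses a JSON string with Python's standard json module. The record and all of its field values (including the example.com URL) are invented for this example and are not taken from the real label file.

```python
import json

# A made-up record; every value here is a placeholder for illustration.
text = '''
{
  "id": 885,
  "file_name": "000000000885.jpg",
  "url": "http://example.com/000000000885.jpg",
  "is_crowd": false,
  "bbox": [10.5, 20.0, 100.0, 50.0],
  "license": null
}
'''

record = json.loads(text)           # JSON text -> Python dict
print(type(record).__name__)        # JSON objects become dicts
print(record["bbox"][2])            # JSON arrays become lists
print(record["license"] is None)    # JSON null becomes None

round_trip = json.dumps(record)     # Python dict -> JSON text
```

json.load and json.dump do the same for file objects instead of strings.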

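Task02: Reading and Visualizing a Dataset
I. Objective: understand how image datasets are stored, learn to read the data and convert it to another storage format, and visualize it.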

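II. Main tasks (using the label file of the COCO dataset as an example):

1. Read the label data: read the dataset label file "TestData_coco.json".

2. Download the images: download each image from the link given in the label data and name it after its image ID.

3. Convert the storage format: build one dictionary per image with keys such as {'image_ids', 'category_ids', 'bboxes', 'segmentations'}; each key maps to a list holding the object categories, bounding boxes, and segmentation labels found in that image.

4. Visualize the image with ID 1000 and its labels: show the image, the bounding boxes, and the segmentation, and print the category name at the top-left corner of each box, as in the example figure.

5. Save a subset as a new JSON file: pick the images with IDs '139', '724', '785', '885', and '1000' together with their bounding-box and segmentation-mask labels, and save them as a new label file in the same format as the original.

III. Requirements: any language is allowed; Python is recommended.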
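
To make task item 3 concrete, here is a minimal sketch of the target structure. It assumes the label file has top-level 'images', 'annotations', and 'categories' lists, which is how the scripts further down in these notes access it. The tiny coco_like dictionary is hand-written with invented values, and group_by_image is a hypothetical helper name.

```python
# A tiny hand-written stand-in for a COCO-style label file.
# All values below are invented for illustration.
coco_like = {
    "images": [
        {"id": 1000, "file_name": "000000001000.jpg",
         "coco_url": "http://example.com/000000001000.jpg"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1000, "category_id": 1,
         "bbox": [10.0, 20.0, 30.0, 40.0],  # [x, y, width, height]
         "segmentation": [[10.0, 20.0, 40.0, 20.0, 40.0, 60.0]]},
        {"id": 2, "image_id": 1000, "category_id": 2,
         "bbox": [50.0, 60.0, 20.0, 10.0],
         "segmentation": [[50.0, 60.0, 70.0, 60.0, 70.0, 70.0]]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "bicycle"},
    ],
}


def group_by_image(data):
    """Build one dictionary of lists per image, keyed by image id."""
    per_image = {
        img["id"]: {"image_ids": [], "category_ids": [],
                    "bboxes": [], "segmentations": []}
        for img in data["images"]
    }
    for ann in data["annotations"]:
        entry = per_image[ann["image_id"]]
        entry["image_ids"].append(ann["image_id"])
        entry["category_ids"].append(ann["category_id"])
        entry["bboxes"].append(ann["bbox"])
        entry["segmentations"].append(ann["segmentation"])
    return per_image


grouped = group_by_image(coco_like)
print(grouped[1000]["category_ids"])
```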

A Python script that converts an image into a dictionary:

from PIL import Image

# Open the image file
img = Image.open('000000000885.jpg')

# Get the image dimensions
width, height = img.size

# Create an empty dictionary
pixels_dict = {}

# Visit every pixel
for y in range(height):
    for x in range(width):
        # Read the color value of the current pixel
        pixel = img.getpixel((x, y))
        # Store it under its (x, y) coordinate
        pixels_dict[(x, y)] = pixel

# Save the dictionary to a local file
with open('pixels_dict.txt', 'w') as f:
    f.write(str(pixels_dict))
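
The script above writes the dictionary with str(), which is not JSON. If the pixel dictionary should be stored as JSON instead, the tuple keys have to be turned into strings first. The sketch below shows one possible encoding ("x,y" strings); the two-pixel pixels_dict is an invented stand-in and assumes RGB tuples as values.

```python
import json

# Stand-in for pixels_dict: a 2x1 image with invented RGB values.
pixels_dict = {(0, 0): (255, 0, 0), (1, 0): (0, 255, 0)}

# JSON object keys must be strings, so encode (x, y) as "x,y".
json_ready = {f"{x},{y}": list(rgb) for (x, y), rgb in pixels_dict.items()}
text = json.dumps(json_ready)

# Decode the keys and values again after loading.
restored = {
    tuple(int(v) for v in key.split(",")): tuple(rgb)
    for key, rgb in json.loads(text).items()
}
print(restored == pixels_dict)
```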

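Which tools do I need to extract bounding boxes and segmentation labels from an image?

To extract bounding boxes and segmentation labels from an image, you can use the following tools:

Image processing libraries: e.g., OpenCV, Pillow
Object detection models: e.g., Faster R-CNN, YOLO, SSD
Image segmentation models: e.g., Mask R-CNN (instance segmentation), FCN (semantic segmentation)

These tools identify and localize the objects in an image and produce the corresponding boxes and segmentation labels. A detection model finds the objects and returns a bounding box for each one; a segmentation model partitions the image into regions and assigns each region a label.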

How do I extract a bounding box with Pillow?

To extract a bounding box with Pillow, follow these steps:

Open the image file and load it into a Pillow Image object:

from PIL import Image

image = Image.open("image.jpg")

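Define a tuple (left, upper, right, lower) describing the target region, where left and upper are the x and y coordinates of the top-left corner and right and lower are those of the bottom-right corner.
Call Pillow's crop() method with that tuple to extract the region:

# Define the target region as a tuple
box = (100, 100, 300, 300)

# Extract the region
target = image.crop(box)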
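
The label file used later in these notes stores each box as [x, y, width, height] (the later scripts compute the far corner as bbox[0]+bbox[2] and bbox[1]+bbox[3]), while crop() expects (left, upper, right, lower). A minimal conversion sketch follows; xywh_to_crop_box is a hypothetical helper name, and rounding to whole pixels is a choice made here, not something the task prescribes.

```python
def xywh_to_crop_box(bbox):
    """Convert [x, y, width, height] to (left, upper, right, lower)."""
    x, y, w, h = bbox
    # Round to whole pixels so the resulting box is unambiguous.
    return (round(x), round(y), round(x + w), round(y + h))


box = xywh_to_crop_box([100.0, 100.0, 200.0, 200.0])
print(box)
```

The resulting tuple can be passed straight to image.crop(box).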

##PIL:

pip install Pillow

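How do I get right, lower, left, and upper?

You can use OpenCV's cv2.boundingRect(contour) function to get the bounding box of a contour. It returns four values: x, y, w, and h, where (x, y) is the top-left corner of the rectangle and w and h are its width and height. right, lower, left, and upper then follow from:

x, y, w, h = cv2.boundingRect(contour)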
right = x + w
lower = y + h
left = x
upper = y

This assumes contour is an OpenCV contour.

How do I get OpenCV contour objects for an image?

To find contours in an image with OpenCV and draw them, follow these steps:

Read the image and convert it to grayscale: load it with cv2.imread() and convert it with cv2.cvtColor().

import cv2

# Load the image
img = cv2.imread('image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Detect edges: run any edge detector (such as the Canny operator) on the grayscale image.

# Apply Canny edge detection
edges = cv2.Canny(gray, 100, 200)

Find and draw the contours: use cv2.findContours() to find the contours in the edge image and cv2.drawContours() to draw them on the original image.

# Find contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)

Complete code example:

import cv2

# Load the image
img = cv2.imread('image.jpg')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply Canny edge detection
edges = cv2.Canny(gray, 100, 200)

# Find contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)

# Display the result
cv2.imshow('Contours', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

This code opens a window showing the original image with the contours drawn on it.

##cv2:

pip install opencv-python

Code from my first, unsuccessful attempt:

import cv2
from PIL import Image

# Load the image
img = cv2.imread('000000000885.jpg')

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply Canny edge detection
edges = cv2.Canny(gray, 100, 200)

# Find contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw contours on the original image
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)

# Get bounding boxes of contours
bboxes = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    right = x + w
    lower = y + h
    left = x
    upper = y
    bboxes.append([left, upper, right, lower])

# Open the image for further processing
image = Image.open("000000000885.jpg")

# Define the target box coordinates as a tuple
box_coords = (100, 100, 300, 300)

# Extract the target box as a new image
target_image = image.crop(box_coords)

# Assume you have obtained the target class, bounding box, and segmentation label information and stored them in respective variables

# Create an empty list to store the entry for each object
annotations_list = []

# For each object, create a dictionary entry and add it to the list
num_objects = 3
image_id = 885
category_ids = ["human", "dog", "cat"]  # use list instead of string
segmentations = ["person", "dog", "cat"]  # use list instead of string
for i in range(num_objects):
    annotation_dict = {
        'image_id': image_id,
        'category_id': category_ids[i],
        'bbox': bboxes[i],
        'segmentation': segmentations[i]
    }
    annotations_list.append(annotation_dict)

# Create the final dictionary containing keys such as 'image_ids', 'category_ids', 'bboxes', and 'segmentations', and store it as a list
annotations_dict = {
    'image_ids': [image_id],
    'category_ids': [ann['category_id'] for ann in annotations_list],
    'bboxes': [ann['bbox'] for ann in annotations_list],
    'segmentations': [ann['segmentation'] for ann in annotations_list]
}

# Open the file for writing
with open("annotations.txt", "w") as f:
    # Write the dictionary to the file
    f.write(str(annotations_dict))

# The with block above already closes the file


########################################

# Define functions to get image and category information
def get_image_info(image_id):
    # Implementation of getting image info
    pass


def get_categories():
    # Implementation of getting categories
    pass


# Load the image using PIL
image_path = '000000000885.jpg'
image = Image.open(image_path)

# Get the image id, bounding boxes, and segmentation masks
image_id = 885
image_info = get_image_info(image_id)
categories = get_categories()

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Create a plot with the image and bounding boxes
fig, ax = plt.subplots()
ax.imshow(image)

for i in range(len(image_info['bboxes'])):
    bbox = image_info['bboxes'][i]
    category_id = image_info['category_ids'][i]
    category_name = categories[category_id]['name']
    seg_mask = image_info['segmentations'][i]

    # Add a rectangle for the bounding box
    rect = Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1], linewidth=1, edgecolor='r',
                     facecolor='none')
    ax.add_patch(rect)

    # Add the category name as text next to the bounding box
    ax.text(bbox[0], bbox[1], category_name, fontsize=8, color='r')

plt.show()

Second attempt:

import json
import urllib.request
from PIL import Image, ImageDraw

# Read the label data
with open('TestData_coco.json', 'r') as f:
    data = json.load(f)

# Download each image and name it after its image ID
for img in data['images']:
    url = img['coco_url']
    img_id = str(img['id']).zfill(12)
    filename = img_id + '.jpg'
    urllib.request.urlretrieve(url, filename)

# Build the per-image dictionaries
img_dict = {}
for img in data['images']:
    img_id = img['id']
    img_dict[img_id] = {'image_ids': [], 'category_ids': [], 'bboxes': [], 'segmentations': []}
for ann in data['annotations']:
    img_id = ann['image_id']
    img_dict[img_id]['image_ids'].append(img_id)
    img_dict[img_id]['category_ids'].append(ann['category_id'])
    img_dict[img_id]['bboxes'].append(ann['bbox'])
    img_dict[img_id]['segmentations'].append(ann['segmentation'])

# Visualize the image with ID 1000 and its labels
img_path = '000000001000.jpg'
img = Image.open(img_path)
draw = ImageDraw.Draw(img)
for ann in data['annotations']:
    if ann['image_id'] == 1000:
        bbox = ann['bbox']
        draw.rectangle([(bbox[0], bbox[1]), (bbox[0]+bbox[2], bbox[1]+bbox[3])], outline='red')
        cat_id = ann['category_id']
        cat_name = data['categories'][cat_id]['name']
        draw.text((bbox[0], bbox[1]-15), cat_name, fill='red')
img.show()

# Extract a subset and save it as a new JSON file
new_data = {'licenses': data['licenses'], 'info': data['info'], 'categories': data['categories'], 'images': [], 'annotations': []}
for img_id in [139, 724, 785, 885, 1000]:
    new_data['images'].append(data['images'][id])
    anns = [ann for ann in data['annotations'] if ann['image_id'] == id]
    new_data['annotations'].extend(anns)
with open('NewTestData_coco.json', 'w') as f:
    json.dump(new_data, f)

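The last part is broken: the loop variable is img_id, but the loop body uses id (Python's built-in function), and data['images'] is a list, so it cannot be looked up by image ID anyway. The program does manage to draw the boxes, but it labels the person as a bicycle (T_T). The likely cause is data['categories'][cat_id], which indexes the list by position: COCO category IDs start at 1, so ID 1 (person) picks up the entry at position 1, which is bicycle.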
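
The person-as-bicycle mislabeling is consistent with looking categories up by list position rather than by their 'id' field. A minimal sketch of an id-based lookup is below; the three-entry categories list is an invented stand-in for data['categories'], and build_category_names is a hypothetical helper name.

```python
def build_category_names(categories):
    """Map category id -> name instead of relying on list position."""
    return {cat["id"]: cat["name"] for cat in categories}


# Invented stand-in for data['categories']; ids start at 1.
categories = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bicycle"},
    {"id": 3, "name": "car"},
]

names = build_category_names(categories)
print(categories[1]["name"])  # positional lookup with category id 1
print(names[1])               # id-based lookup with category id 1
```

In the second attempt, cat_name = names[cat_id] would then replace the positional lookup.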

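I looked at someone else's solution, which used the pycocotools library, so I looked up what it does:

pycocotools is a Python library for working with the COCO (Common Objects in Context) dataset. COCO is a widely used computer vision dataset of images of common objects such as people, animals, and vehicles, photographed with occlusion and from varied viewpoints and lighting conditions; each object instance is annotated with a bounding box and a segmentation mask. The dataset supports tasks such as object detection, segmentation, and pose (keypoint) estimation.

pycocotools provides utility functions for parsing COCO annotations, evaluating model predictions, and visualizing annotations, which makes experiments and evaluation on COCO more convenient for people working on detection and segmentation.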

A working program:

import os
import requests
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from pycocotools.coco import COCO

# Task 1: load the COCO label file
def load_coco_data(json_path):
    coco = COCO(json_path)
    return coco

# Task 2: download the images and save them under their IDs in the given directory
def download_images(coco, output_dir):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for img_id in coco.imgs:
        img_info = coco.loadImgs(ids=img_id)[0]
        url = img_info['coco_url']
        response = requests.get(url)
        if response.status_code == 200:
            with open(os.path.join(output_dir, f'{img_id}.jpg'), 'wb') as f:
                f.write(response.content)

# Task 3: convert the data format
def transform_data(coco):
    transformed_data = {}
    for img_id in coco.imgs:
        ann_ids = coco.getAnnIds(imgIds=img_id)
        annotations = coco.loadAnns(ids=ann_ids)
        transformed_data[img_id] = {'image_ids': [img_id],
                                    'category_ids': [ann['category_id'] for ann in annotations],
                                    'bboxes': [ann['bbox'] for ann in annotations],
                                    'segmentations': [ann['segmentation'] for ann in annotations]}
    return transformed_data

# Task 4: visualize the image with ID 1000 and its labels
def visualize_image(image_id, data, image_dir, coco):
    img_info = coco.loadImgs(ids=image_id)[0]
    image = Image.open(os.path.join(image_dir, f"{image_id}.jpg"))

    plt.figure()
    plt.imshow(image)
    ax = plt.gca()

    # Color list used to give different labels different colors
    colors = ['r', 'g', 'b', 'y', 'c', 'm']

    for i, (category_id, bbox) in enumerate(zip(data[image_id]['category_ids'], data[image_id]['bboxes'])):
        category_name = coco.loadCats(ids=category_id)[0]['name']
        x, y, width, height = bbox
        rect = Rectangle((x, y), width, height, linewidth=2, edgecolor=colors[i % len(colors)], facecolor='none')
        ax.add_patch(rect)
        # Font size 8, black text, alpha 0.35
        plt.text(x, y, category_name, color='k', fontsize=8, backgroundcolor=colors[i % len(colors)], alpha=0.35)

    plt.show()

# Main program
if __name__ == '__main__':
    json_path = 'TestData_coco.json'
    image_dir = 'images'
    image_id_to_visualize = 1000

    # Load the COCO JSON file
    coco_data = load_coco_data(json_path)

    # Download the images
    download_images(coco_data, image_dir)

    # Convert the data format
    transformed_data = transform_data(coco_data)

    # Visualize the image with ID 1000 and its labels
    visualize_image(image_id_to_visualize, transformed_data, image_dir, coco_data)
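
The program above draws boxes and category names only, while task item 4 also asks for the segmentation. In COCO, a polygon segmentation is a list of flat coordinate lists [x1, y1, x2, y2, ...], and crowd regions use a run-length-encoded dictionary instead. The sketch below converts the polygon form into (x, y) point lists ready for drawing; polygons_to_points is a hypothetical helper name, and skipping the RLE case is a simplification of this sketch.

```python
def polygons_to_points(segmentation):
    """Turn COCO polygon lists [x1, y1, x2, y2, ...] into (x, y) pairs."""
    if not isinstance(segmentation, list):
        # RLE-encoded masks are dicts and are not handled by this sketch.
        return []
    polygons = []
    for flat in segmentation:
        xs = flat[0::2]  # even positions hold x coordinates
        ys = flat[1::2]  # odd positions hold y coordinates
        polygons.append(list(zip(xs, ys)))
    return polygons


# Invented triangle for illustration.
points = polygons_to_points([[10.0, 20.0, 40.0, 20.0, 40.0, 60.0]])
print(points)
```

Inside visualize_image, each point list could be wrapped in matplotlib.patches.Polygon(points, closed=True, fill=False) and added with ax.add_patch, the same way the rectangles are.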

Working program 2:

import json
import os
import requests
from pycocotools.coco import COCO
import numpy as np
import cv2

# Read the label data
dataDir = '.'
dataType = 'val2017'
annFile = '{}/TestData_coco.json'.format(dataDir)
coco = COCO(annFile)

# Download the images
save_dir = './images'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
for img_id in coco.getImgIds():
    img_info = coco.loadImgs(img_id)[0]
    img_url = img_info['coco_url']
    img_name = '{:012d}.jpg'.format(img_id)
    img_path = os.path.join(save_dir, img_name)
    if not os.path.exists(img_path):
        response = requests.get(img_url)
        with open(img_path, 'wb') as f:
            f.write(response.content)

# Convert the storage format
data = {}
for img_id in coco.getImgIds():
    img_info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id)
    anns = coco.loadAnns(ann_ids)
    categories = []
    bboxes = []
    segmentations = []
    for ann in anns:
        categories.append(ann['category_id'])
        bboxes.append(ann['bbox'])
        segmentations.append(ann['segmentation'])
    data[img_id] = {'image_ids': [img_id], 'category_ids': categories, 'bboxes': bboxes, 'segmentations': segmentations}

# Visualize the image with ID 1000 and its labels
img_id = 1000
img_info = coco.loadImgs(img_id)[0]
img_path = os.path.join(save_dir, img_info['file_name'])
img = cv2.imread(img_path)
ann_ids = coco.getAnnIds(imgIds=img_id)
anns = coco.loadAnns(ann_ids)
for ann in anns:
    bbox = ann['bbox']
    segmentation = ann['segmentation']
    category_id = ann['category_id']
    category_info = coco.loadCats(category_id)[0]
    category_name = category_info['name']
    color = tuple(np.random.randint(0, 256, 3).tolist())
    cv2.rectangle(img, (int(bbox[0]), int(bbox[1])), (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3])), color, 2)
    cv2.putText(img, category_name, (int(bbox[0]), int(bbox[1]) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    mask = coco.annToMask(ann)
    contour, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(img, contour, -1, color, 2)
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
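
Neither working program covers task item 5 (saving a subset as a new label file). A minimal sketch using only the standard library is below. It filters 'images' and 'annotations' by image ID and copies every other top-level key unchanged so the format stays the same. extract_subset is a hypothetical helper name, and the small data dictionary is invented for illustration.

```python
import json


def extract_subset(data, keep_ids):
    """Return label data restricted to the given image ids."""
    keep = set(keep_ids)
    # Copy every top-level key except the two lists that get filtered.
    subset = {key: value for key, value in data.items()
              if key not in ("images", "annotations")}
    subset["images"] = [img for img in data["images"] if img["id"] in keep]
    subset["annotations"] = [ann for ann in data["annotations"]
                             if ann["image_id"] in keep]
    return subset


# Invented stand-in for the loaded label file.
data = {
    "info": {"description": "toy example"},
    "licenses": [],
    "categories": [{"id": 1, "name": "person"}],
    "images": [{"id": 139}, {"id": 285}, {"id": 1000}],
    "annotations": [
        {"id": 1, "image_id": 139, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 2, "image_id": 285, "category_id": 1, "bbox": [1, 1, 5, 5]},
        {"id": 3, "image_id": 1000, "category_id": 1, "bbox": [2, 2, 5, 5]},
    ],
}

subset = extract_subset(data, [139, 1000])
print(json.dumps(subset, indent=2))
```

With the real file, data would come from json.load on 'TestData_coco.json', keep_ids would be [139, 724, 785, 885, 1000], and the result would be written with json.dump to a new file such as 'NewTestData_coco.json'.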