Downloading the FaceForensics++ Dataset (Detailed Tutorial)
If you do deepfake-related research, you will sooner or later need to download and preprocess a number of datasets in order to test your models or reproduce previous work. FaceForensics++ is a dataset consisting of thousands of videos manipulated with different face-forgery methods. It contains four manipulated sub-datasets, Deepfakes (DF), Face2Face (F2F), FaceSwap (FS) and NeuralTextures (NT), and the download script can additionally fetch the DeepFake Detection (DFD) and FaceShifter data.
Since the dataset is hosted abroad, all of the following steps need to go through a proxy. (If you cannot use a proxy, leave a comment and I will send the data to you.)
Obtain the download script and save it locally
The official page is the ondyari/FaceForensics GitHub repository: https://github.com/ondyari/FaceForensics. Following the instructions there, you fill in a Google form and the authors send the download script to you by email. I will not go through that process here; the script is attached at the end of this article, so you do not have to fill in the form yourself.
Downloading the dataset from the CMD window
Open a cmd window.
Paste the code given at the end of this article into a file and change the suffix to .py; make sure the file is named FaceForensics.py, which is the name used in the commands below.
cd your_dir  # change into the folder that contains the .py file
The cmd prompt will then show that you are inside the target folder. Next, enter the download command:
python FaceForensics.py <output path> -d <dataset type, e.g., Face2Face, original or all> -c <compression quality, e.g., c23 or raw> -t <file type, e.g., videos, masks or models>

A few notes on this command:
"python" refers to the python.exe of your own installation. It is not necessarily called "python": check what the executable of the Python you installed is actually named. For example, my Python executable is python3.11.exe, so I have to type python3.11 instead.
<output path> is the download location, i.e. where you want the dataset to be stored (make sure there is enough disk space).
-d selects the dataset. Use -d all to download the whole FaceForensics++ collection, or pick a single sub-dataset such as Face2Face.
-c selects the compression quality. Use -c raw for the original uncompressed data; I downloaded the c23 compressed version (-c c23).
-t selects the file type; -t videos downloads the deepfake videos.
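If you are not sure what your interpreter is called, or which options the script accepts, you can check from the same cmd window before starting the download. These are ordinary Windows/Python commands rather than anything specific to FaceForensics++; substitute your own executable name (e.g. python3.11) if "python" is not recognised:

where python                  # list the python executables found on PATH
python --version              # show which interpreter "python" points to
python FaceForensics.py -h    # print the script's full option list (argparse help)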
For example, to download all FaceForensics++ videos, compressed with c23, into the folder FaceForensics++ on the E drive, the command can be:
python3.11 FaceForensics.py E:/FaceForensics++ -d all -c c23 -t videos
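The script also accepts an -n / --num_videos option (see the argument list in the attached script) that limits how many videos are downloaded, which is handy for a quick test run before committing to the full dataset. For example, to fetch only the first 10 Face2Face videos under the same assumptions about interpreter name and output path as above:

python3.11 FaceForensics.py E:/FaceForensics++ -d Face2Face -c c23 -t videos -n 10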
Note: if a "502 Bad Gateway" error appears while the script is running, the default server may not be usable for you and you need to switch to a different one. The script offers three servers to choose from, EU, EU2 and CA, corresponding to Europe 1, Europe 2 and Canada; EU is used by default. The relevant part of the script looks like this:
parser.add_argument('--server', type=str, default='EU',
                    help='Server to download the data from. If you '
                         'encounter a slow download speed, consider '
                         'changing the server.',
                    choices=SERVERS)
args = parser.parse_args()

# URLs
server = args.server
if server == 'EU':
    server_url = 'http://canis.vc.in.tum.de:8100/'
elif server == 'EU2':
    server_url = 'http://kaldir.vc.in.tum.de/faceforensics/'
elif server == 'CA':
    server_url = 'http://falas.cmpt.sfu.ca:8100/'
else:
    raise Exception('Wrong server name. Choices: {}'.format(str(SERVERS)))
args.tos_url = server_url + 'webpage/FaceForensics_TOS.pdf'
args.base_url = server_url + 'v3/'
args.deepfakes_model_url = server_url + 'v3/manipulated_sequences/' + \
                           'Deepfakes/models/'
return args
So you can specify one of these servers in your cmd command. After hitting the error, I switched the server to EU2 and the download went through. The full command is:
python3.11 FaceForensics.py E:/FaceForensics++ --server EU2 -d all -c c23 -t videos
PS: --server EU2 simply specifies EU2 as the server.
The data will then start downloading (a progress bar shows the download status).
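Judging from how the script builds its output paths (join(output_path, dataset_path, compression, type) together with the DATASETS table), a full -d all -c c23 -t videos download should end up organised roughly as follows under the chosen output folder (illustrative sketch, not an official listing):

E:/FaceForensics++/
    original_sequences/youtube/c23/videos/
    original_sequences/actors/c23/videos/
    manipulated_sequences/Deepfakes/c23/videos/
    manipulated_sequences/DeepFakeDetection/c23/videos/
    manipulated_sequences/Face2Face/c23/videos/
    manipulated_sequences/FaceShifter/c23/videos/
    manipulated_sequences/FaceSwap/c23/videos/
    manipulated_sequences/NeuralTextures/c23/videos/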
Because everything goes through a proxy, the download may occasionally stop unexpectedly. If the program is interrupted, don't worry: make sure your network and proxy are connected properly and re-enter the download command. The script automatically skips files that have already been downloaded and continues with the remaining ones.
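The skipping comes from download_file() in the attached script: every file is first downloaded into a temporary file and only renamed to its final name once the transfer completes, and anything that already exists at the target path is skipped with a warning. A condensed excerpt (the full function is in the script below):

def download_file(url, out_file, report_progress=False):
    out_dir = os.path.dirname(out_file)
    if not os.path.isfile(out_file):
        # download into a temp file, then rename to out_file on success,
        # so an interrupted transfer never leaves a finished-looking file
        ...
    else:
        # already downloaded in a previous run -> skip
        tqdm.write('WARNING: skipping download of existing file ' + out_file)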
When downloading the dataset, try to keep your connection fast and your proxy stable (otherwise re-entering the command after every interruption gets tedious).
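If re-entering the command by hand after every interruption becomes tedious, a small wrapper can restart the download automatically. This is my own sketch, not part of the official release: it assumes the interpreter name, script name, server and paths used in the examples above (adjust CMD to your setup), and it answers the script's "press any key" terms-of-use prompt automatically, so only use it if you have already read and accepted the TOS linked by the script.

# retry_download.py -- re-runs the FaceForensics++ download command until it
# exits successfully; hypothetical helper, not part of the official script
import subprocess
import time

CMD = ['python3.11', 'FaceForensics.py', 'E:/FaceForensics++',
       '--server', 'EU2', '-d', 'all', '-c', 'c23', '-t', 'videos']

while True:
    # feed a newline to stdin so the TOS "press any key" prompt is answered
    result = subprocess.run(CMD, input='\n', text=True)
    if result.returncode == 0:
        print('Download finished.')
        break
    print('Download interrupted (exit code {}), retrying in 30 s ...'.format(
        result.returncode))
    time.sleep(30)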
Script
Attached: the FaceForensics++ script provided by email (copy and paste it into a .py file yourself):
#!/usr/bin/env python
""" Downloads FaceForensics++ and Deep Fake Detection public data release
Example usage:
    see -h or https://github.com/ondyari/FaceForensics
"""
# -*- coding: utf-8 -*-
import argparse
import os
import urllib
import urllib.request
import tempfile
import time
import sys
import json
import random
from tqdm import tqdm
from os.path import join

# URLs and filenames
FILELIST_URL = 'misc/filelist.json'
DEEPFEAKES_DETECTION_URL = 'misc/deepfake_detection_filenames.json'
DEEPFAKES_MODEL_NAMES = ['decoder_A.h5', 'decoder_B.h5', 'encoder.h5', ]

# Parameters
DATASETS = {
    'original_youtube_videos': 'misc/downloaded_youtube_videos.zip',
    'original_youtube_videos_info': 'misc/downloaded_youtube_videos_info.zip',
    'original': 'original_sequences/youtube',
    'DeepFakeDetection_original': 'original_sequences/actors',
    'Deepfakes': 'manipulated_sequences/Deepfakes',
    'DeepFakeDetection': 'manipulated_sequences/DeepFakeDetection',
    'Face2Face': 'manipulated_sequences/Face2Face',
    'FaceShifter': 'manipulated_sequences/FaceShifter',
    'FaceSwap': 'manipulated_sequences/FaceSwap',
    'NeuralTextures': 'manipulated_sequences/NeuralTextures'
    }
ALL_DATASETS = ['original', 'DeepFakeDetection_original', 'Deepfakes',
                'DeepFakeDetection', 'Face2Face', 'FaceShifter', 'FaceSwap',
                'NeuralTextures']
COMPRESSION = ['raw', 'c23', 'c40']
TYPE = ['videos', 'masks', 'models']
SERVERS = ['EU', 'EU2', 'CA']


def parse_args():
    parser = argparse.ArgumentParser(
        description='Downloads FaceForensics v2 public data release.',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('output_path', type=str, help='Output directory.')
    parser.add_argument('-d', '--dataset', type=str, default='all',
                        help='Which dataset to download, either pristine or '
                             'manipulated data or the downloaded youtube '
                             'videos.',
                        choices=list(DATASETS.keys()) + ['all'])
    parser.add_argument('-c', '--compression', type=str, default='raw',
                        help='Which compression degree. All videos '
                             'have been generated with h264 with a varying '
                             'codec. Raw (c0) videos are lossless compressed.',
                        choices=COMPRESSION)
    parser.add_argument('-t', '--type', type=str, default='videos',
                        help='Which file type, i.e. videos, masks, for our '
                             'manipulation methods, models, for Deepfakes.',
                        choices=TYPE)
    parser.add_argument('-n', '--num_videos', type=int, default=None,
                        help='Select a number of videos number to '
                             "download if you don't want to download the full"
                             ' dataset.')
    parser.add_argument('--server', type=str, default='EU',
                        help='Server to download the data from. If you '
                             'encounter a slow download speed, consider '
                             'changing the server.',
                        choices=SERVERS)
    args = parser.parse_args()

    # URLs
    server = args.server
    if server == 'EU':
        server_url = 'http://canis.vc.in.tum.de:8100/'
    elif server == 'EU2':
        server_url = 'http://kaldir.vc.in.tum.de/faceforensics/'
    elif server == 'CA':
        server_url = 'http://falas.cmpt.sfu.ca:8100/'
    else:
        raise Exception('Wrong server name. Choices: {}'.format(str(SERVERS)))
    args.tos_url = server_url + 'webpage/FaceForensics_TOS.pdf'
    args.base_url = server_url + 'v3/'
    args.deepfakes_model_url = server_url + 'v3/manipulated_sequences/' + \
                               'Deepfakes/models/'
    return args


def download_files(filenames, base_url, output_path, report_progress=True):
    os.makedirs(output_path, exist_ok=True)
    if report_progress:
        filenames = tqdm(filenames)
    for filename in filenames:
        download_file(base_url + filename, join(output_path, filename))


def reporthook(count, block_size, total_size):
    global start_time
    if count == 0:
        start_time = time.time()
        return
    duration = time.time() - start_time
    progress_size = int(count * block_size)
    speed = int(progress_size / (1024 * duration))
    percent = int(count * block_size * 100 / total_size)
    sys.stdout.write("\rProgress: %d%%, %d MB, %d KB/s, %d seconds passed" %
                     (percent, progress_size / (1024 * 1024), speed, duration))
    sys.stdout.flush()


def download_file(url, out_file, report_progress=False):
    out_dir = os.path.dirname(out_file)
    if not os.path.isfile(out_file):
        fh, out_file_tmp = tempfile.mkstemp(dir=out_dir)
        f = os.fdopen(fh, 'w')
        f.close()
        if report_progress:
            urllib.request.urlretrieve(url, out_file_tmp,
                                       reporthook=reporthook)
        else:
            urllib.request.urlretrieve(url, out_file_tmp)
        os.rename(out_file_tmp, out_file)
    else:
        tqdm.write('WARNING: skipping download of existing file ' + out_file)


def main(args):
    # TOS
    print('By pressing any key to continue you confirm that you have agreed '
          'to the FaceForensics terms of use as described at:')
    print(args.tos_url)
    print('***')
    print('Press any key to continue, or CTRL-C to exit.')
    _ = input('')

    # Extract arguments
    c_datasets = [args.dataset] if args.dataset != 'all' else ALL_DATASETS
    c_type = args.type
    c_compression = args.compression
    num_videos = args.num_videos
    output_path = args.output_path
    os.makedirs(output_path, exist_ok=True)

    # Check for special dataset cases
    for dataset in c_datasets:
        dataset_path = DATASETS[dataset]
        # Special cases
        if 'original_youtube_videos' in dataset:
            # Here we download the original youtube videos zip file
            print('Downloading original youtube videos.')
            if not 'info' in dataset_path:
                print('Please be patient, this may take a while (~40gb)')
                suffix = ''
            else:
                suffix = 'info'
            download_file(args.base_url + '/' + dataset_path,
                          out_file=join(output_path,
                                        'downloaded_videos{}.zip'.format(suffix)),
                          report_progress=True)
            return

        # Else: regular datasets
        print('Downloading {} of dataset "{}"'.format(c_type, dataset_path))

        # Get filelists and video lenghts list from server
        if 'DeepFakeDetection' in dataset_path or 'actors' in dataset_path:
            filepaths = json.loads(urllib.request.urlopen(
                args.base_url + '/' + DEEPFEAKES_DETECTION_URL
                ).read().decode("utf-8"))
            if 'actors' in dataset_path:
                filelist = filepaths['actors']
            else:
                filelist = filepaths['DeepFakesDetection']
        elif 'original' in dataset_path:
            # Load filelist from server
            file_pairs = json.loads(urllib.request.urlopen(
                args.base_url + '/' + FILELIST_URL
                ).read().decode("utf-8"))
            filelist = []
            for pair in file_pairs:
                filelist += pair
        else:
            # Load filelist from server
            file_pairs = json.loads(urllib.request.urlopen(
                args.base_url + '/' + FILELIST_URL
                ).read().decode("utf-8"))
            # Get filelist
            filelist = []
            for pair in file_pairs:
                filelist.append('_'.join(pair))
                if c_type != 'models':
                    filelist.append('_'.join(pair[::-1]))

        # Maybe limit number of videos for download
        if num_videos is not None and num_videos > 0:
            print('Downloading the first {} videos'.format(num_videos))
            filelist = filelist[:num_videos]

        # Server and local paths
        dataset_videos_url = args.base_url + '{}/{}/{}/'.format(
            dataset_path, c_compression, c_type)
        dataset_mask_url = args.base_url + '{}/{}/videos/'.format(
            dataset_path, 'masks', c_type)

        if c_type == 'videos':
            dataset_output_path = join(output_path, dataset_path, c_compression,
                                       c_type)
            print('Output path: {}'.format(dataset_output_path))
            filelist = [filename + '.mp4' for filename in filelist]
            download_files(filelist, dataset_videos_url, dataset_output_path)
        elif c_type == 'masks':
            dataset_output_path = join(output_path, dataset_path, c_type,
                                       'videos')
            print('Output path: {}'.format(dataset_output_path))
            if 'original' in dataset:
                if args.dataset != 'all':
                    print('Only videos available for original data. Aborting.')
                    return
                else:
                    print('Only videos available for original data. '
                          'Skipping original.\n')
                    continue
            if 'FaceShifter' in dataset:
                print('Masks not available for FaceShifter. Aborting.')
                return
            filelist = [filename + '.mp4' for filename in filelist]
            download_files(filelist, dataset_mask_url, dataset_output_path)
        # Else: models for deepfakes
        else:
            if dataset != 'Deepfakes' and c_type == 'models':
                print('Models only available for Deepfakes. Aborting')
                return
            dataset_output_path = join(output_path, dataset_path, c_type)
            print('Output path: {}'.format(dataset_output_path))

            # Get Deepfakes models
            for folder in tqdm(filelist):
                folder_filelist = DEEPFAKES_MODEL_NAMES

                # Folder paths
                folder_base_url = args.deepfakes_model_url + folder + '/'
                folder_dataset_output_path = join(dataset_output_path, folder)
                download_files(folder_filelist, folder_base_url,
                               folder_dataset_output_path,
                               report_progress=False)   # already done


if __name__ == "__main__":
    args = parse_args()
    main(args)