1.datasets.fetch_lfw_people源码阅读笔记
def fetch_lfw_people(data_home=None, funneled=True, resize=0.5,
min_faces_per_person=0, color=False,
slice_=(slice(70, 195), slice(78, 172)),
download_if_missing=True):
"""Loader for the Labeled Faces in the Wild (LFW) people dataset
This dataset is a collection of JPEG pictures of famous people
collected on the internet, all details are available on the
official website:
http://vis-www.cs.umass.edu/lfw/
Each picture is centered on a single face. Each pixel of each channel
(color in RGB) is encoded by a float in range 0.0 - 1.0.
The task is called Face Recognition (or Identification): given the
picture of a face, find the name of the person given a training set
(gallery).
The original images are 250 x 250 pixels, but the default slice and resize
arguments reduce them to 62 x 74.
Parameters
----------
data_home : optional, default: None
Specify another download and cache folder for the datasets. By default
all scikit learn data is stored in '~/scikit_learn_data' subfolders.
funneled : boolean, optional, default: True
Download and use the funneled variant of the dataset.
resize : float, optional, default 0.5
Ratio used to resize the each face picture.
min_faces_per_person : int, optional, default None
The extracted dataset will only retain pictures of people that have at
least `min_faces_per_person` different pictures.
color : boolean, optional, default False
Keep the 3 RGB channels instead of averaging them to a single
gray level channel. If color is True the shape of the data has
one more dimension than the shape with color = False.
slice_ : optional
Provide a custom 2D slice (height, width) to extract the
'interesting' part of the jpeg files and avoid use statistical
correlation from the background
download_if_missing : optional, True by default
If False, raise a IOError if the data is not locally available
instead of trying to download the data from the source site.
Returns
-------
dataset : dict-like object with the following attributes:
dataset.data : numpy array of shape (13233, 2914)
Each row corresponds to a ravelled face image of original size 62 x 47
pixels. Changing the ``slice_`` or resize parameters will change the
shape of the output.
dataset.images : numpy array of shape (13233, 62, 47)
Each row is a face image corresponding to one of the 5749 people in
the dataset. Changing the ``slice_`` or resize parameters will change
the shape of the output.
dataset.target : numpy array of shape (13233,)
Labels associated to each face image. Those labels range from 0-5748
and correspond to the person IDs.
dataset.DESCR : string
Description of the Labeled Faces in the Wild (LFW) dataset.
"""
2.运行错误问题:Please make sure that libjpeg is installed
在学习特征脸时,要加载lfw_people,代码如下
from sklearn.datasets import fetch_lfw_people
people = fetch_lfw_people()
首先,看一下下载的目录,我的是联想电脑,这个包(lfw_funneled.tgz)被下载到了这个文件夹下面:
C:\Users\Lenovo\scikit_learn_data\lfw_home
切到这个目录下面:
很可能原文件下载不全,于是把这个文件删除。(如果已经解压出lfw_funneled文件夹,也把这个文件夹删除)
同时复制这个网址 https://ndownloader.figshare.com/files/5976015 到迅雷中,下载到完整的lfwfunneded.tgz文件。
并把这个文件复制到/Users/your_name/scikit_learn_data/lfw_home/ 下面,并解压缩,好的,到了这一步就OK。
就像下面这样:
现在,lfw就能正常的使用了: