OpenCL: the evolution of cl::make_kernel

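This article presents a way to simplify OpenCL kernel execution with C++11 template features. An abstract class, memory_cl, manages the upload and download of memory objects, and a run_kernel function template builds on it so that invoking a kernel becomes a single concise call.

My earlier post, "opencl: Simplifying kernel execution code in C++ with cl::make_kernel", explained in detail how to use the cl::make_kernel functor provided by the OpenCL C++ bindings (cl.hpp) to simplify the code that executes a kernel.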

/* Scale an image (bilinear interpolation) */
gray_matrix_cl gray_matrix_cl::zoom(size_t dst_width, size_t dst_height, const facecl_context& context)const {
	gray_matrix_cl dst_matrix(dst_width, dst_height);
	auto command_queue = global_facecl_context.getCommandQueue();// get the cl::CommandQueue
	this->upload(command_queue);// upload the source image data to the OpenCL device
	cl_float widthNormalizationFactor = 1.0f / dst_width;
	cl_float heightNormalizationFactor = 1.0f / dst_height;
	// construct a cl::make_kernel object and execute the kernel
	cl::make_kernel<cl::Image2D,cl::Image2D,cl_float,cl_float>
		(context.getKernel(KERNEL_NAME(image_scaling)))// get the already-compiled cl::Kernel
		(cl::EnqueueArgs(command_queue,cl::NDRange( dst_width, dst_height )),
		cl_img,dst_matrix.cl_img,
		widthNormalizationFactor,
		heightNormalizationFactor);
	command_queue.finish(); // wait for the kernel to finish
	dst_matrix.download(command_queue);// download the result data from the OpenCL device
	return dst_matrix; // a local returned by name is moved (or elided) automatically
}

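That is the final simplified code from the previous post. Compared with the original code, every call that sets a kernel argument (setArg) is now wrapped by the cl::make_kernel functor, so the caller no longer needs to know those details. You simply invoke operator() on the cl::make_kernel object and pass the kernel's arguments inside the parentheses, in the order the kernel declares them. cl::make_kernel then sets the arguments and enqueues the kernel on the command_queue for execution.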
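
To make the idea concrete, here is a minimal sketch of how a functor of this kind could set arguments by position. The names mock_kernel, set_args and mock_make_kernel are hypothetical and exist only for this illustration. This is not how cl.hpp implements cl::make_kernel; it only shows the same pattern in miniature, with a stand-in kernel that records each setArg call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for cl::Kernel: it only records setArg calls.
struct mock_kernel {
    std::vector<std::string> log;
    template <typename T>
    void setArg(std::size_t index, const T& value) {
        log.push_back(std::to_string(index) + "=" + std::to_string(value));
    }
};

// End of recursion: no arguments left to set.
inline void set_args(mock_kernel&, std::size_t) {}

// Set the first argument at position `index`, then recurse on the rest.
template <typename First, typename... Rest>
void set_args(mock_kernel& k, std::size_t index, First&& first, Rest&&... rest) {
    k.setArg(index, first);
    set_args(k, index + 1, std::forward<Rest>(rest)...);
}

// Hypothetical functor in the spirit of cl::make_kernel: the template
// parameters fix the argument types, operator() sets them in order.
template <typename... Ts>
struct mock_make_kernel {
    mock_kernel& kernel;
    explicit mock_make_kernel(mock_kernel& k) : kernel(k) {}
    void operator()(Ts... args) {
        set_args(kernel, 0, args...);
        // A real functor would enqueue the kernel on a command queue here.
    }
};

// Usage: two arguments are set at positions 0 and 1.
inline std::vector<std::string> demo_make_kernel() {
    mock_kernel k;
    mock_make_kernel<int, long> call(k);
    call(7, 42L);
    assert(k.log.size() == 2);
    return k.log;
}
```

The caller only writes call(7, 42L); the position of each argument is derived from its place in the call, which is the convenience the paragraph above describes.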

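OK, that completes the recap of the previous post.
Can we improve further and make kernel execution even simpler?
Look at the code above again. Before the OpenCL kernel scales the image, we first have to call

this->upload(command_queue);// upload the source image data to the OpenCL device

and after the kernel finishes,

dst_matrix.download(command_queue);// download the result data from the OpenCL device

Once you have written your first kernel call and start on a second one, you will find that almost every kernel invocation needs these two steps. In summary:

  1. Before the kernel runs, if any kernel parameter is a pointer type or an image type, the data of the corresponding host-side cl::Memory object (its subclasses include cl::Image and cl::Buffer) must be uploaded to the device.
  2. After the kernel finishes, the output data it produced (also a cl::Memory object) may need to be downloaded to the host.

This is repetitive, near-identical code. If we abstract the two steps into a class (memory_cl), we can encapsulate them as well.

How memory_cl is implemented is covered later in this article. For now, assume we already have a memory_cl class that manages download and upload uniformly for every cl::Memory object.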
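
As a rough picture of what such a class has to track, the sketch below keeps an "already on the device" flag so that repeated kernel calls do not upload the same data twice. Everything here is a stand-in invented for illustration (mock_device_memory, tracked_buffer, mock_double_kernel); no OpenCL API is involved, and the real memory_cl shown later differs in detail.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device-side memory object (not an OpenCL type).
struct mock_device_memory {
    std::vector<int> data;
};

// Minimal sketch of the upload/download bookkeeping.
class tracked_buffer {
public:
    explicit tracked_buffer(std::vector<int> host) : host_(std::move(host)) {}

    // Copy host data to the mock device only if it is not there yet.
    void upload_if_need() const {
        if (!on_device_) {
            device_.data = host_;
            on_device_ = true;
            ++upload_count_;
        }
    }
    // Copy the mock device data back to the host.
    void download() { host_ = device_.data; }

    mock_device_memory& device() const { return device_; }
    const std::vector<int>& host() const { return host_; }
    int upload_count() const { return upload_count_; }

private:
    std::vector<int> host_;
    mutable mock_device_memory device_;
    mutable bool on_device_ = false;   // is the data already on the device?
    mutable int upload_count_ = 0;     // for the demo only
};

// "Kernel" stand-in: doubles every element held on the mock device.
inline void mock_double_kernel(const tracked_buffer& buf) {
    buf.upload_if_need();
    for (int& v : buf.device().data) v *= 2;
}

// Usage: two kernel runs, but only one upload.
inline int demo_lazy_upload() {
    tracked_buffer buf(std::vector<int>{1, 2, 3});
    mock_double_kernel(buf);
    mock_double_kernel(buf);  // data is already on the device, no second upload
    buf.download();
    assert(buf.host()[0] == 4 && buf.host()[2] == 12);  // doubled twice
    return buf.upload_count();
}
```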

From make_kernel to run_kernel

Using C++11 variadic templates, we can then write the following run_kernel function template:

template<typename IN_CL_TYPE // type of the kernel's input data argument (cl::Buffer, cl::Image)
		,typename OUT_CL_TYPE// type of the kernel's output data argument (cl::Buffer, cl::Image)
		,typename... Args    // types of the kernel's remaining scalar arguments; variadic, so any number is allowed
		>
void run_kernel(const cl::EnqueueArgs &queue_args // enqueue arguments
		,const cl::Kernel &kernel// kernel object
		,bool download           // download the result data to the host after the kernel finishes?
		,const memory_cl<IN_CL_TYPE> &in // input data object; memory_cl is my own OpenCL memory management class
		,memory_cl<OUT_CL_TYPE>&out// output data object; memory_cl is my own OpenCL memory management class
		,Args&&... args // remaining kernel arguments
		){
	// use the data state flag to decide whether an upload is needed; data already on the device is not uploaded again
	in.upload_if_need(queue_args.queue_);
	// execute the kernel
	cl::make_kernel<IN_CL_TYPE, OUT_CL_TYPE, Args...>k(kernel);// create the cl::make_kernel object
	k(queue_args,in.cl_mem_obj,out.cl_mem_obj, std::forward<Args>(args)...);// execute the kernel
	// if download is set, call memory_cl::download to copy the kernel's output data to the host
	if(download)
		out.download(queue_args.queue_);
}

With this run_kernel function template, the gray_matrix_cl::zoom function that scales an image can be rewritten as follows:

// round v up to a multiple of 2^a, then divide by 2^a, i.e. ceil(v / 2^a)
template<typename T>T get_align(T v,uint8_t a){return (T)((v+(T)((1<<a)-1))>>a);}
/* Scale an image (bilinear interpolation) */
gray_matrix_cl gray_matrix_cl::zoom(size_t dst_width, size_t dst_height, const facecl_context& context,bool download)const {
	gray_matrix_cl dst_matrix(dst_width, dst_height);
	auto command_queue = global_facecl_context.getCommandQueue();
	cl_float widthNormalizationFactor = 1.0f / dst_width;
	cl_float heightNormalizationFactor = 1.0f / dst_height;
	run_kernel(
			cl::EnqueueArgs(command_queue,{ 1, get_align(dst_height,4) })// enqueue arguments object
			,context.getKernel(KERNEL_NAME(image_scaling)) // kernel object to execute
			,download // whether to download the result data automatically (was hard-coded to true)
			,*this // input image
			,dst_matrix // output image
			,widthNormalizationFactor
			,heightNormalizationFactor
		);
	command_queue.finish(); // wait for the kernel to finish
	return dst_matrix; // a local returned by name is moved (or elided) automatically
}
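
The get_align helper is easy to misread, so here is a small worked check of its arithmetic: get_align(v, a) evaluates to ceil(v / 2^a). The version below repeats the same expression but is marked constexpr (my addition, so the values can be verified at compile time). Why the global work size uses get_align(dst_height, 4) is not stated in the text; presumably each work-item handles a group of 16 rows, but the kernel source is not shown, so treat that as a guess.

```cpp
#include <cstddef>
#include <cstdint>

// Same arithmetic as get_align above; constexpr is an addition for this sketch.
template <typename T>
constexpr T get_align(T v, std::uint8_t a) {
    return static_cast<T>((v + static_cast<T>((1 << a) - 1)) >> a);
}

// With a == 4 the divisor is 2^4 == 16.
static_assert(get_align<std::size_t>(16, 4) == 1, "(16 + 15) >> 4 == 1");
static_assert(get_align<std::size_t>(17, 4) == 2, "(17 + 15) >> 4 == 2");
static_assert(get_align<std::size_t>(100, 4) == 7, "(100 + 15) >> 4 == 7, i.e. ceil(100 / 16)");
```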

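And with that the code is simpler again. Done!

Evolving run_kernel

But when I tried to use this run_kernel for a second kernel function, a problem appeared.
Look at the run_kernel function above: it imposes requirements on the types and order of the kernel function's parameters:

  1. The first parameter must be the input data object.
  2. The second parameter must be the output data object.
  3. All other scalar arguments must come from the third position onward.

So its use is limited. My second kernel function takes only one data object parameter, which is both input and output, so this function is awkward to use (it can still be used, by passing the same argument twice).
When a kernel function has more than one input data object or more than one output data object, this function template cannot be used at all.

Can run_kernel be improved so that it accepts more than one input/output data object and no longer constrains the order of the kernel's parameters?
Yes, we can.
run_kernel has to evolve once more!

Below is the improved run_kernel function template: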

template<typename... Args>
inline void run_kernel_new(const cl::EnqueueArgs &queue_args// enqueue arguments object
		, const cl::Kernel &kernel // kernel object
		, bool download // download the result data after the kernel finishes?
		, Args&&... args //  kernel argument list
		){
	// upload the data of every cl::Memory object to the device where needed
	upload_args_if_need<1>(queue_args.queue_,std::forward<Args>(args)...);
	typename make_make_kernel<Args...>::type k(kernel);
	k(queue_args,std::forward<Args>(args)...); // execute the kernel
	// if download is set, download the data of every cl::Memory output object to the host
	download_args<1>(queue_args.queue_,download,std::forward<Args>(args)...);
}

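At first glance it looks much like the previous version of run_kernel,
but it really has evolved.

Evolution 1

The parameter list no longer contains the in and out parameters shown below, so callers no longer need to care about the order or the number of input/output arguments.

		,const memory_cl<IN_CL_TYPE> &in
		,memory_cl<OUT_CL_TYPE>&out

Evolution 2

Compared with the previous version of run_kernel, the first line, in.upload_if_need(queue_args.queue_);, has become upload_args_if_need<1>(queue_args.queue_,std::forward<Args>(args)...); and the last line, out.download(queue_args.queue_);, has become download_args<1>(queue_args.queue_,download,std::forward<Args>(args)...);

Wait, upload_args_if_need and download_args are function templates.
Right: they are recursive function templates that walk through the args parameter list and check the type of each argument. If an argument is a memory_cl, upload_args_if_need calls its upload_if_need function.
download_args works much the same way: if an argument is a memory_cl, it calls the download function of memory_cl according to the download flag.
The upload_args_if_need and download_args function templates are implemented as follows: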

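// Standard headers used by the code below
#include <type_traits> // std::enable_if, std::is_same, std::conditional
#include <utility>     // std::declval, std::forward
#include <sstream>     // std::stringstream
#include <stdexcept>   // std::invalid_argument
#include <string>
/* Type trait: checks whether T is memory_cl or a class derived from it */
template<typename T>
struct is_kind_of_memory_cl{
	template <typename CL_TYPE>
		static CL_TYPE  check(memory_cl<CL_TYPE>); // chosen when T converts to some memory_cl<CL_TYPE>
	static void check(...); // fallback for every other type
	using cl_type=decltype(check(std::declval<T>())); // the wrapped OpenCL type, or void
	enum{value=!std::is_same<cl_type,void>::value};
};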
/*
 * The upload_arg(s)_if_need and download_arg(s) families of function templates
 * walk through all variadic arguments of run_kernel and identify their types.
 * For memory_cl arguments, data is uploaded to the device before the kernel runs where needed,
 * and output data is downloaded to the host after the kernel runs where needed.
 * The template parameter N tells you which argument failed when debugging.
 *
 * */
// ARG is not a memory_cl type: return immediately and do nothing
template<int N,typename ARG>
typename std::enable_if<!is_kind_of_memory_cl<ARG>::value>::type
inline upload_arg_if_need(const cl::CommandQueue &command_queue,const ARG & arg){}
// ARG is a memory_cl type: upload the data where needed
template<int N,typename ARG>
typename std::enable_if<is_kind_of_memory_cl<ARG>::value>::type
inline upload_arg_if_need(const cl::CommandQueue &command_queue,const ARG & arg){
	const cl::Memory&m=arg.cl_mem_obj;
	auto mem_context=m.getInfo<CL_MEM_CONTEXT>();
	auto queue_context=command_queue.getInfo<CL_QUEUE_CONTEXT>();
	// check that the memory object in memory_cl and command_queue share the same context; throw if they do not
	if(mem_context()!=queue_context()){
		std::stringstream stream;
		stream<<":the arg No:"<<N;// index of the offending argument
		throw std::invalid_argument(std::string(SOURCE_AT).append(stream.str()).append(":mem_context()!=queue_context()"));
	}
	try{
		arg.upload_if_need(command_queue);// upload the data to the device
	}catch(cl::Error&e){
		std::stringstream stream;
		stream<<"the arg No:"<<N;// index of the offending argument
		throw face_cl_exception(SOURCE_AT,e,stream.str());
	}catch(face_exception&e){
		std::stringstream stream;
		stream<<"the arg No:"<<N<<e.what();// index of the offending argument
		throw face_cl_exception(SOURCE_AT,stream.str());
	}catch(std::exception&e){
		std::stringstream stream;
		stream<<"the arg No:"<<N;// index of the offending argument
		throw face_cl_exception(SOURCE_AT,e,stream.str());
	}catch(...){
		std::stringstream stream;
		stream<<"the arg No:"<<N<<":unknown exception";// index of the offending argument
		throw face_cl_exception(SOURCE_AT,stream.str());
	}
}
// Special case: the argument list is empty, so the recursion ends
template<int N>
inline void upload_args_if_need(const cl::CommandQueue &command_queue){
}
/* Recursively process every argument in Args.
 * If an argument is a memory_cl object, upload its data to the device.
 * */
template<int N,typename ARG1,typename... Args>
inline void upload_args_if_need(const cl::CommandQueue &command_queue,ARG1 && arg1,Args&&... args){
	upload_arg_if_need<N>	(command_queue,std::forward<ARG1>(arg1));// process the first argument
	upload_args_if_need<N+1>	(command_queue,std::forward<Args>(args)...);// recurse on the remaining arguments
}
// ARG is not a memory_cl type: empty function, returns immediately
template<int N,typename ARG>
typename std::enable_if<!is_kind_of_memory_cl<ARG>::value>::type
inline download_arg(const cl::CommandQueue &command_queue,bool download, const ARG & arg){}
// ARG is a memory_cl type: download the data to the host where needed
template<int N,typename ARG>
typename std::enable_if<is_kind_of_memory_cl<ARG>::value>::type
inline download_arg(const cl::CommandQueue &command_queue,bool download, const ARG & arg){
	if(download){
		try{
			const cl::Memory &m=arg.cl_mem_obj;
			auto flags=m.getInfo<CL_MEM_FLAGS>();
			// use CL_MEM_FLAGS to tell whether this is an output data object, and so whether a download is needed
			if(flags&(CL_MEM_WRITE_ONLY|CL_MEM_READ_WRITE)){
				const_cast<ARG&>(arg).download(command_queue);// download the data to the host
			}
		}catch(cl::Error&e){
			std::stringstream stream;
			stream<<"the arg No:"<<N;// index of the offending argument
			throw face_cl_exception(SOURCE_AT,e,stream.str());
		}catch(face_exception&e){
			std::stringstream stream;
			stream<<"the arg No:"<<N<<e.what();// index of the offending argument
			throw face_cl_exception(SOURCE_AT,stream.str());
		}catch(std::exception&e){
			std::stringstream stream;
			stream<<"the arg No:"<<N;// index of the offending argument
			throw face_cl_exception(SOURCE_AT,e,stream.str());
		}catch(...){
			std::stringstream stream;
			stream<<"the arg No:"<<N<<":unknown exception";// index of the offending argument
			throw face_cl_exception(SOURCE_AT,stream.str());
		}
	}
}
// Special case: the argument list is empty, so the recursion ends
template<int N>
inline void download_args(const cl::CommandQueue &command_queue,bool download){}
/* Recursively process every argument in Args.
 * If an argument is a memory_cl object, download its data to the host as directed by the download parameter.
 * */
template<int N,typename ARG1,typename... Args>
inline void download_args(const cl::CommandQueue &command_queue,bool download, ARG1 && arg1,Args&&... args){
	download_arg<N>(command_queue,download,std::forward<ARG1>(arg1));// process the first argument
	download_args<N+1>(command_queue,download,std::forward<Args>(args)...);// recurse on the remaining arguments
}
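
The listing above depends on OpenCL and on project-specific types, so it cannot be tried in isolation. The following toy version reproduces only the mechanism: a detection trait plus a recursive variadic function that acts on the "memory" arguments and skips the rest. All names (mock_memory, mock_image, is_mock_memory, upload_one, upload_all) are hypothetical stand-ins, and the trait uses a static constexpr bool instead of the enum, which is a small deviation from the original.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for memory_cl<CL_TYPE>; it only counts calls.
template <typename CL_TYPE>
struct mock_memory {
    mutable int uploads = 0;
    void upload_if_need() const { ++uploads; }
};
// A derived wrapper, playing the role of gray_matrix_cl.
struct mock_image : mock_memory<int> {};

// Same detection trick as is_kind_of_memory_cl: overload resolution picks
// the template when T converts to some mock_memory<CL_TYPE>.
template <typename T>
struct is_mock_memory {
    template <typename CL_TYPE>
    static CL_TYPE check(const mock_memory<CL_TYPE>&);
    static void check(...);
    using cl_type = decltype(check(std::declval<T>()));
    static constexpr bool value = !std::is_same<cl_type, void>::value;
};

// Non-memory argument: nothing to do.
template <typename ARG>
typename std::enable_if<!is_mock_memory<ARG>::value>::type
upload_one(const ARG&) {}
// Memory argument: trigger the upload.
template <typename ARG>
typename std::enable_if<is_mock_memory<ARG>::value>::type
upload_one(const ARG& arg) { arg.upload_if_need(); }

// End of recursion: empty argument list.
inline void upload_all() {}
// Handle the first argument, then recurse on the rest.
template <typename First, typename... Rest>
void upload_all(First&& first, Rest&&... rest) {
    upload_one(first);
    upload_all(std::forward<Rest>(rest)...);
}

static_assert(is_mock_memory<mock_image>::value, "derived wrapper is detected");
static_assert(is_mock_memory<mock_image&>::value, "references are detected too");
static_assert(!is_mock_memory<float>::value, "scalars are not memory objects");

// Usage: memory objects may appear at any position and in any number.
inline int demo_upload_all() {
    mock_image src, dst;
    float scale = 0.5f;
    upload_all(src, scale, dst, 3);
    assert(src.uploads == 1 && dst.uploads == 1);
    return src.uploads + dst.uploads;
}
```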

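Evolution 3

The old version instantiated a cl::make_kernel object directly:

	cl::make_kernel<IN_CL_TYPE, OUT_CL_TYPE, Args...>k(kernel);

The new version changes this to

typename make_make_kernel<Args...>::type k(kernel);

Here make_make_kernel is also a template (strictly, a struct template used as a metafunction) that produces the cl::make_kernel instantiation. Why do it this way?

Because every OpenCL memory object (cl::Buffer, cl::Image) passed to run_kernel is wrapped in my own memory_cl class, whereas cl::make_kernel needs the raw OpenCL memory object types (cl::Buffer, cl::Image) as its argument types when it executes. So when instantiating cl::make_kernel, each memory_cl type must be converted to the corresponding OpenCL memory object type.
That is what make_make_kernel does. Its implementation follows: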


/* Metafunction that yields the type cl::make_kernel needs for an argument:
 * for an ordinary type, the type itself;
 * for a class derived from memory_cl, memory_cl::cl_cpp_type
 *  */
template<typename ARG
		,typename ARG_TYPE=typename std::decay<ARG>::type
		,typename MEM_CL= is_kind_of_memory_cl<ARG>
		,typename K_TYPE=typename std::conditional<MEM_CL::value,typename MEM_CL::cl_type,ARG>::type
		>
struct kernel_type {
	using type= K_TYPE;
};

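/*
 * Metafunction
 * Builds the cl::make_kernel type from the template parameters.
 * Every template parameter is passed through the kernel_type metafunction
 * to obtain the type actually needed to instantiate cl::make_kernel
*/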
template <
   typename T0,   typename T1 = cl::detail::NullType,   typename T2 = cl::detail::NullType,
   typename T3 = cl::detail::NullType,   typename T4 = cl::detail::NullType,
   typename T5 = cl::detail::NullType,   typename T6 = cl::detail::NullType,
   typename T7 = cl::detail::NullType,   typename T8 = cl::detail::NullType,
   typename T9 = cl::detail::NullType,   typename T10 = cl::detail::NullType,
   typename T11 = cl::detail::NullType,   typename T12 = cl::detail::NullType,
   typename T13 = cl::detail::NullType,   typename T14 = cl::detail::NullType,
   typename T15 = cl::detail::NullType,   typename T16 = cl::detail::NullType,
   typename T17 = cl::detail::NullType,   typename T18 = cl::detail::NullType,
   typename T19 = cl::detail::NullType,   typename T20 = cl::detail::NullType,
   typename T21 = cl::detail::NullType,   typename T22 = cl::detail::NullType,
   typename T23 = cl::detail::NullType,   typename T24 = cl::detail::NullType,
   typename T25 = cl::detail::NullType,   typename T26 = cl::detail::NullType,
   typename T27 = cl::detail::NullType,   typename T28 = cl::detail::NullType,
   typename T29 = cl::detail::NullType,   typename T30 = cl::detail::NullType,
   typename T31 = cl::detail::NullType
>
struct make_make_kernel{
	using type=cl::make_kernel<
			typename kernel_type<T0>::type,		typename kernel_type<T1>::type,
			typename kernel_type<T2>::type,		typename kernel_type<T3>::type,
			typename kernel_type<T4>::type,		typename kernel_type<T5>::type,
			typename kernel_type<T6>::type,		typename kernel_type<T7>::type,
			typename kernel_type<T8>::type,		typename kernel_type<T9>::type,
			typename kernel_type<T10>::type,	typename kernel_type<T11>::type,
			typename kernel_type<T12>::type,	typename kernel_type<T13>::type,
			typename kernel_type<T14>::type,	typename kernel_type<T15>::type,
			typename kernel_type<T16>::type,	typename kernel_type<T17>::type,
			typename kernel_type<T18>::type,	typename kernel_type<T19>::type,
			typename kernel_type<T20>::type,	typename kernel_type<T21>::type,
			typename kernel_type<T22>::type,	typename kernel_type<T23>::type,
			typename kernel_type<T24>::type,	typename kernel_type<T25>::type,
			typename kernel_type<T26>::type,	typename kernel_type<T27>::type,
			typename kernel_type<T28>::type,	typename kernel_type<T29>::type,
			typename kernel_type<T30>::type,	typename kernel_type<T31>::type
			>;
};
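
The 32 explicit parameters mirror the fixed list of type parameters, each defaulting to cl::detail::NullType, that the code above passes to cl::make_kernel. If the target were a variadic template instead (an assumption made purely for this sketch), the same type mapping could be written as a single pack expansion. The names below (mock_memory, mock_image2d, mock_gray_matrix, unwrap_arg, mock_functor, make_mock_functor) are hypothetical, and unlike kernel_type above this variant also decays non-memory types.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for memory_cl, cl::Image2D and gray_matrix_cl.
template <typename CL_TYPE>
struct mock_memory { CL_TYPE cl_mem_obj; };
struct mock_image2d {};
struct mock_gray_matrix : mock_memory<mock_image2d> {};

// Maps a wrapper type to the type it wraps; other types map to themselves
// (decayed, which is a choice of this sketch, not of the original code).
template <typename T>
struct unwrap_arg {
    template <typename CL_TYPE>
    static CL_TYPE check(const mock_memory<CL_TYPE>&);
    static void check(...);
    using detected = decltype(check(std::declval<T>()));
    using type = typename std::conditional<
        std::is_same<detected, void>::value,
        typename std::decay<T>::type,
        detected>::type;
};

// Hypothetical variadic target, standing in for the kernel functor type.
template <typename... Ts>
struct mock_functor { using arg_types = std::tuple<Ts...>; };

// One pack expansion replaces the 32 spelled-out parameters.
template <typename... Args>
struct make_mock_functor {
    using type = mock_functor<typename unwrap_arg<Args>::type...>;
};

static_assert(std::is_same<
    make_mock_functor<mock_gray_matrix&, float&, const mock_gray_matrix&>::type,
    mock_functor<mock_image2d, float, mock_image2d>>::value,
    "wrappers map to the wrapped type, scalars stay scalars");
```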

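Summary

The evolved run_kernel is much more convenient to use: it places no restriction on the number or order of kernel arguments, and it uploads and downloads OpenCL memory object data automatically.
The amount of code has grown considerably, but the additions are mostly templates that do their work at compile time and add little run-time code.
The benefit is that when a project has many different kernel functions to execute, this design greatly reduces repetitive or near-identical code and makes the code more robust.

The magic memory_cl

So what is this memory_cl, mentioned again and again above, that wraps OpenCL memory objects? It is not complicated: it is simply an abstract base class. Below are the main implementation code and function declarations of the class. Every function used in the code above is declared here.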

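/*
 * Abstract model of OpenCL memory
 * memory_cl is an abstract interface; every OpenCL memory object (cl::Buffer, cl::Image and so on) is wrapped inside it
 * Its main job is exchanging data between host and device
 * All other classes in the project that involve OpenCL memory objects derive from this class
 * matrix_cl derives from memory_cl and is the abstract matrix class
 * integral_matrix derives from matrix_cl and is the integral image class
 * gray_matrix_cl derives from matrix_cl and is the grayscale image class
 * */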
template<typename CL_TYPE,
		typename ENABLE=typename std::enable_if<std::is_base_of<cl::Memory,CL_TYPE>::value>::type>
class memory_cl{
public:
	using cl_cpp_type=CL_TYPE;
private:
	mutable bool	on_device=false;	// flag: is the data already on the device?
public:
	cl_cpp_type cl_mem_obj; // OpenCL memory object
	/* If the data has not been uploaded to the device (on_device=false), upload the source matrix data to the OpenCL device;
	 * on success on_device is set to true
	 * */
	void upload_if_need(const cl::CommandQueue& command_queue=Null_Queue)const{
		if(!on_device){
			upload(command_queue);
		}
	}

	/* Virtual function: download the result data from the OpenCL device and set the on_device flag to true */
	virtual void download(const cl::CommandQueue& command_queue=Null_Queue){
		throw face_exception(SOURCE_AT,"sub class must implement the function "
				"by calling download_force(const cl::CommandQueue& command_queue,std::vector<E> &out)");
	}
	/* Virtual function: upload the source matrix data to the OpenCL device and set the on_device flag to true */
	virtual void upload(const cl::CommandQueue& command_queue=Null_Queue)const{
		throw face_exception(SOURCE_AT,
				"sub class must implement the function "
				"by calling upload_force(const cl::CommandQueue& command_queue,std::vector<E> &in) ");
	}
	/* upload_force uploads the cl::Memory object to the device; on success on_device is set to true.
	 * The project only uses cl::Buffer and cl::Image2D, so code is written only for cl::Buffer and cl::Image2D here.
	 * The same applies to download_force.
	 */
	template<typename E, typename _CL_TYPE = CL_TYPE>
	typename std::enable_if<std::is_base_of<cl::Buffer,_CL_TYPE>::value>::type
	upload_force(const std::vector<E> &in,const cl::CommandQueue& command_queue=Null_Queue) const;
	template<typename E,typename _CL_TYPE=CL_TYPE>
	typename std::enable_if<std::is_base_of<cl::Image2D,_CL_TYPE>::value>::type
	upload_force(const std::vector<E> &in,const cl::CommandQueue& command_queue=Null_Queue) const;
	/* Download data from the cl_mem_obj object into out; on success on_device is set to true */
	template<typename E, typename _CL_TYPE = CL_TYPE>
	typename std::enable_if<std::is_base_of<cl::Buffer,_CL_TYPE>::value>::type
	download_force(std::vector<E> &out, const cl::CommandQueue& command_queue=Null_Queue) const;
	template<typename E, typename _CL_TYPE = CL_TYPE>
	typename std::enable_if<std::is_base_of<cl::Image2D,_CL_TYPE>::value>::type
	download_force(std::vector<E> &out,size_t row_pitch=0,const cl::CommandQueue& command_queue=Null_Queue) const;
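	////////////////// constructors and assignment operators /////////////////
	// initializers are listed in declaration order (on_device is declared before cl_mem_obj)
	memory_cl(const CL_TYPE& cl_mem_obj,bool on_device):on_device(on_device),cl_mem_obj(cl_mem_obj){}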
	memory_cl(const memory_cl&)=default;
	memory_cl(memory_cl&&)=default;
	memory_cl()=default;
	memory_cl& operator=(const memory_cl&)=default;
	memory_cl& operator=(memory_cl&&rv){
		this->cl_mem_obj=std::move(rv.cl_mem_obj);
		this->on_device=rv.on_device;
		return *this;
	};
	/* conversion operators that return the OpenCL memory object */
	operator const cl_cpp_type& ()const{	return this->cl_mem_obj;	}
	operator cl_cpp_type&(){return this->cl_mem_obj;}
	virtual ~memory_cl()=default;
};
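
One detail in the class deserves a note: upload_force and download_force are declared with an extra template parameter that defaults to CL_TYPE. That extra parameter is what lets std::enable_if select the buffer or image overload, because SFINAE only applies to a member template's own parameters; a condition written directly on the class parameter would fail as soon as the class is instantiated. Below is a minimal sketch of the technique with hypothetical stand-in types (mock_memory_base, mock_buffer, mock_image2d, transfer_sketch). The parameter is named U here, since names such as _CL_TYPE (underscore followed by an uppercase letter) are reserved in C++.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for cl::Memory, cl::Buffer and cl::Image2D.
struct mock_memory_base {};
struct mock_buffer : mock_memory_base {};
struct mock_image2d : mock_memory_base {};

template <typename CL_TYPE>
class transfer_sketch {
public:
    // Participates in overload resolution only when CL_TYPE is buffer-like.
    template <typename U = CL_TYPE>
    typename std::enable_if<std::is_base_of<mock_buffer, U>::value, std::string>::type
    upload_force() const { return "buffer path"; }

    // Participates in overload resolution only when CL_TYPE is image-like.
    template <typename U = CL_TYPE>
    typename std::enable_if<std::is_base_of<mock_image2d, U>::value, std::string>::type
    upload_force() const { return "image path"; }
};

// Usage: the same call resolves to a different overload per wrapped type.
inline void demo_dispatch() {
    assert(transfer_sketch<mock_buffer>().upload_force() == "buffer path");
    assert(transfer_sketch<mock_image2d>().upload_force() == "image path");
}
```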
编译和安装 (.venv) PS E:\PyTorch_Build\pytorch\build> cmake --build . --config Release --target install -j 8 2>&1 | Tee-Object -FilePath "$buildDir\cmake_build.log" 閫傜敤浜?.NET Framework MSBuild 鐗堟湰 17.14.19+164abd434 MSBUILD : error MSB1009: 椤圭洰鏂囦欢涓嶅瓨鍦ㄣ€? 寮€鍏?install.vcxproj (.venv) PS E:\PyTorch_Build\pytorch\build> (.venv) PS E:\PyTorch_Build\pytorch\build> # 6. 安装 Python 绑定 (.venv) PS E:\PyTorch_Build\pytorch\build> Set-Location $sourceDir (.venv) PS E:\PyTorch_Build\pytorch> & "E:\Python310\python.exe" setup.py develop --cmake --no-deps Building wheel torch-2.9.0a0+git2d31c3d -- Building version 2.9.0a0+git2d31c3d -- Checkout nccl release tag: v2.27.5-1 cmake -GVisual Studio 17 2022 -Ax64 -Thost=x64 -DBUILD_ONLY_CORE=1 -DBUILD_PYTHON=True -DBUILD_SHARED_LIBS=OFF -DBUILD_TEST=True -DBUILD_TORCH=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS=-static -DCMAKE_GENERATOR=Visual Studio 17 2022 -DCMAKE_INSTALL_PREFIX=E:\PyTorch_Build\pytorch\torch -DCMAKE_PREFIX_PATH=E:\Python310\Lib\site-packages -DCMAKE_TOOLCHAIN_FILE=E:\PyTorch_Build\pytorch\cmake\modules\win_toolchain.cmake -DPython_EXECUTABLE=E:\Python310\python.exe -DPython_NumPy_INCLUDE_DIR=E:\Python310\lib\site-packages\numpy\core\include -DTORCH_BUILD_VERSION=2.9.0a0+git2d31c3d -DUSE_NINJA=0 -DUSE_NUMPY=True -DUSE_STATIC_CUDNN=ON -DUSE_STATIC_NCCL=ON -DUSE_STATIC_OPENMP=ON E:\PyTorch_Build\pytorch CMake Deprecation Warning at CMakeLists.txt:18 (cmake_policy): The OLD behavior for policy CMP0126 will be removed from a future version of CMake. The cmake-policies(7) manual explains that the OLD behaviors of all policies are deprecated and that a policy should be set to OLD only under specific short-term circumstances. Projects should be ported to the NEW behavior and not rely on setting a policy to OLD. -- Selecting Windows SDK version 10.0.26100.0 to target Windows . 
-- The CXX compiler identification is MSVC 19.44.35215.0 -- The C compiler identification is MSVC 19.44.35215.0 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe - skipped -- Detecting C compile features -- Detecting C compile features - done -- Not forcing any particular BLAS to be found CMake Warning at CMakeLists.txt:425 (message): TensorPipe cannot be used on Windows. Set it to OFF CMake Warning at CMakeLists.txt:427 (message): KleidiAI cannot be used on Windows. Set it to OFF CMake Warning at CMakeLists.txt:439 (message): Libuv is not installed in current conda env. Set USE_DISTRIBUTED to OFF. Please run command 'conda install -c conda-forge libuv=1.39' to install libuv. -- Performing Test C_HAS_AVX_1 -- Performing Test C_HAS_AVX_1 - Success -- Performing Test C_HAS_AVX2_1 -- Performing Test C_HAS_AVX2_1 - Success -- Performing Test C_HAS_AVX512_1 -- Performing Test C_HAS_AVX512_1 - Success -- Performing Test CXX_HAS_AVX_1 -- Performing Test CXX_HAS_AVX_1 - Success -- Performing Test CXX_HAS_AVX2_1 -- Performing Test CXX_HAS_AVX2_1 - Success -- Performing Test CXX_HAS_AVX512_1 -- Performing Test CXX_HAS_AVX512_1 - Success -- Current compiler supports avx2 extension. Will build perfkernels. -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed -- Could not find hardware support for NEON on this machine. 
-- No OMAP3 processor on this machine. -- No OMAP4 processor on this machine. -- Compiler does not support SVE extension. Will not build perfkernels. CMake Warning at CMakeLists.txt:845 (message): x64 operating system is required for FBGEMM. Not compiling with FBGEMM. Turn this warning off by USE_FBGEMM=OFF. -- Performing Test HAS/UTF_8 -- Performing Test HAS/UTF_8 - Success -- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR) (found version "13.0") CMake Warning at cmake/public/cuda.cmake:31 (message): PyTorch: CUDA cannot be found. Depending on whether you are building PyTorch or a PyTorch dependent library, the next warning / error will give you more info. Call Stack (most recent call first): cmake/Dependencies.cmake:44 (include) CMakeLists.txt:873 (include) CMake Warning at cmake/Dependencies.cmake:76 (message): Not compiling with CUDA. Suppress this warning with -DUSE_CUDA=OFF. Call Stack (most recent call first): CMakeLists.txt:873 (include) CMake Warning at cmake/Dependencies.cmake:95 (message): Not compiling with XPU. Could NOT find SYCL. Suppress this warning with -DUSE_XPU=OFF. Call Stack (most recent call first): CMakeLists.txt:873 (include) -- Building using own protobuf under third_party per request. -- Use custom protobuf build. CMake Warning at cmake/ProtoBuf.cmake:37 (message): Ancient protobuf forces CMake compatibility Call Stack (most recent call first): cmake/ProtoBuf.cmake:87 (custom_protobuf_find) cmake/Dependencies.cmake:107 (include) CMakeLists.txt:873 (include) CMake Deprecation Warning at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. 
-- -- 3.13.0.0 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - not found -- Found Threads: TRUE -- Caffe2 protobuf include directory: $<BUILD_INTERFACE:E:/PyTorch_Build/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include> -- Trying to find preferred BLAS backend of choice: MKL -- MKL_THREADING = OMP -- Looking for sys/types.h -- Looking for sys/types.h - found -- Looking for stdint.h -- Looking for stdint.h - found -- Looking for stddef.h -- Looking for stddef.h - found -- Check size of void* -- Check size of void* - done -- MKL_THREADING = OMP CMake Warning at cmake/Dependencies.cmake:213 (message): MKL could not be found. Defaulting to Eigen Call Stack (most recent call first): CMakeLists.txt:873 (include) CMake Warning at cmake/Dependencies.cmake:279 (message): Preferred BLAS (MKL) cannot be found, now searching for a general BLAS library Call Stack (most recent call first): CMakeLists.txt:873 (include) -- MKL_THREADING = OMP -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_sequential - mkl_core] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - libiomp5md - pthread] -- Library mkl_intel: not found -- Checking for 
[mkl_intel_lp64 - mkl_core - pthread] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - pthread] -- Library mkl_intel: not found -- Checking for [mkl - guide - pthread - m] -- Library mkl: not found -- MKL library not found -- Checking for [blis] -- Library blis: BLAS_blis_LIBRARY-NOTFOUND -- Checking for [Accelerate] -- Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND -- Checking for [vecLib] -- Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND -- Checking for [flexiblas] -- Library flexiblas: BLAS_flexiblas_LIBRARY-NOTFOUND -- Checking for [openblas] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [openblas - pthread - m] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [openblas - pthread - m - gomp] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [libopenblas] -- Library libopenblas: BLAS_libopenblas_LIBRARY-NOTFOUND -- Checking for [goto2 - gfortran] -- Library goto2: BLAS_goto2_LIBRARY-NOTFOUND -- Checking for [goto2 - gfortran - pthread] -- Library goto2: BLAS_goto2_LIBRARY-NOTFOUND -- Checking for [acml - gfortran] -- Library acml: BLAS_acml_LIBRARY-NOTFOUND -- Checking for [blis] -- Library blis: BLAS_blis_LIBRARY-NOTFOUND -- Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY) -- Checking for [ptf77blas - atlas - gfortran] -- Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND -- Checking for [] -- Looking for sgemm_ -- Looking for sgemm_ - not found -- Cannot find a library with BLAS API. Not using BLAS. -- Using pocketfft in directory: E:/PyTorch_Build/pytorch/third_party/pocketfft/ CMake Warning at cmake/Dependencies.cmake:351 (message): Target architecture "" is not supported in {Q/X}NNPACK. Supported architectures are x86, x86-64, ARM, and ARM64. Turn this warning off by USE_{Q/X}NNPACK=OFF. 
Call Stack (most recent call first): CMakeLists.txt:873 (include) CMake Deprecation Warning at third_party/cpuinfo/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. CMake Warning at third_party/cpuinfo/CMakeLists.txt:93 (MESSAGE): Target processor architecture is not specified. cpuinfo will compile, but cpuinfo_initialize() will always fail. -- Found Git: E:/Program Files/Git/cmd/git.exe (found version "2.51.0.windows.1") -- Google Benchmark version: v1.9.3, normalized to 1.9.3 -- Looking for shm_open in rt -- Looking for shm_open in rt - not found -- Performing Test HAVE_CXX_FLAG_WX -- Performing Test HAVE_CXX_FLAG_WX - Success -- Cross-compiling to test HAVE_STD_REGEX CMake Warning at third_party/benchmark/cmake/CXXFeatureCheck.cmake:49 (message): If you see build failures due to cross compilation, try setting HAVE_STD_REGEX to 0 Call Stack (most recent call first): third_party/benchmark/CMakeLists.txt:311 (cxx_feature_check) -- Performing Test HAVE_STD_REGEX -- success -- Cross-compiling to test HAVE_GNU_POSIX_REGEX -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile -- Cross-compiling to test HAVE_POSIX_REGEX -- Performing Test HAVE_POSIX_REGEX -- failed to compile -- Cross-compiling to test HAVE_STEADY_CLOCK CMake Warning at third_party/benchmark/cmake/CXXFeatureCheck.cmake:49 (message): If you see build failures due to cross compilation, try setting HAVE_STEADY_CLOCK to 0 Call Stack (most recent call first): third_party/benchmark/CMakeLists.txt:322 (cxx_feature_check) -- Performing Test HAVE_STEADY_CLOCK -- success -- Cross-compiling to test HAVE_PTHREAD_AFFINITY -- Performing Test HAVE_PTHREAD_AFFINITY -- failed to compile CMake Warning at cmake/Dependencies.cmake:749 
(message): FP16 is only cmake-2.8 compatible Call Stack (most recent call first): CMakeLists.txt:873 (include) CMake Deprecation Warning at third_party/FP16/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. CMake Deprecation Warning at third_party/psimd/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. -- Using third party subdirectory Eigen. -- Found Python: E:\Python310\python.exe (found version "3.10.10") found components: Interpreter Development.Module NumPy -- Using third_party/pybind11. -- pybind11 include dirs: E:/PyTorch_Build/pytorch/cmake/../third_party/pybind11/include -- Could NOT find OpenTelemetryApi (missing: OpenTelemetryApi_INCLUDE_DIRS) -- Using third_party/opentelemetry-cpp. -- opentelemetry api include dirs: E:/PyTorch_Build/pytorch/cmake/../third_party/opentelemetry-cpp/api/include -- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS) -- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS) -- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND) CMake Warning at cmake/Dependencies.cmake:894 (message): Not compiling with MPI. 
Suppress this warning with -DUSE_MPI=OFF Call Stack (most recent call first): CMakeLists.txt:873 (include) -- MKL_THREADING = OMP -- Check OMP with lib C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/lib/x64/libomp.lib and flags -openmp:experimental -- MKL_THREADING = OMP -- Check OMP with lib C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/lib/x64/libomp.lib and flags -openmp:experimental -- Found OpenMP_C: -openmp:experimental -- Found OpenMP_CXX: -openmp:experimental -- Found OpenMP: TRUE -- Adding OpenMP CXX_FLAGS: -openmp:experimental -- Will link against OpenMP libraries: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/lib/x64/libomp.lib -- Found nvtx3: E:/PyTorch_Build/pytorch/third_party/NVTX/c/include -- ROCM_PATH environment variable is not set and C:/opt/rocm does not exist. Building without ROCm support. -- Found Python3: E:\Python310\python.exe (found version "3.10.10") found components: Interpreter -- ONNX_PROTOC_EXECUTABLE: $<TARGET_FILE:protobuf::protoc> -- Protobuf_VERSION: Protobuf_VERSION_NOTFOUND Generated: E:/PyTorch_Build/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch-ml.proto Generated: E:/PyTorch_Build/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch-ml.proto Generated: E:/PyTorch_Build/pytorch/build/third_party/onnx/onnx/onnx-data_onnx_torch.proto -- -- ******** Summary ******** -- CMake version : 4.1.0 -- CMake command : E:/Python310/Lib/site-packages/cmake/data/bin/cmake.exe -- System : Windows -- C++ compiler : C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe -- C++ compiler version : 19.44.35215.0 -- CXX flags : /DWIN32 /D_WINDOWS /EHsc /Zc:__cplusplus /MP /bigobj /FS /utf-8 /EHsc /wd26812 -- Build type : Release -- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1 -- CMAKE_PREFIX_PATH : E:\Python310\Lib\site-packages -- CMAKE_INSTALL_PREFIX : 
E:/PyTorch_Build/pytorch/torch -- CMAKE_MODULE_PATH : E:/PyTorch_Build/pytorch/cmake/Modules;E:/PyTorch_Build/pytorch/cmake/public/../Modules_CUDA_fix -- -- ONNX version : 1.18.0 -- ONNX NAMESPACE : onnx_torch -- ONNX_USE_LITE_PROTO : OFF -- USE_PROTOBUF_SHARED_LIBS : OFF -- ONNX_DISABLE_EXCEPTIONS : OFF -- ONNX_DISABLE_STATIC_REGISTRATION : OFF -- ONNX_WERROR : OFF -- ONNX_BUILD_TESTS : OFF -- BUILD_SHARED_LIBS : OFF -- -- Protobuf compiler : $<TARGET_FILE:protobuf::protoc> -- Protobuf includes : -- Protobuf libraries : -- ONNX_BUILD_PYTHON : OFF -- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor -- Adding -DNDEBUG to compile flags CMake Warning at cmake/Dependencies.cmake:1418 (message): Not compiling with MAGMA. Suppress this warning with -DUSE_MAGMA=OFF. Call Stack (most recent call first): CMakeLists.txt:873 (include) -- Could not find hardware support for NEON on this machine. -- No OMAP3 processor on this machine. -- No OMAP4 processor on this machine. -- MKL_THREADING = OMP -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_intel_thread - mkl_core] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_sequential - mkl_core] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - libiomp5md - pthread] -- Library mkl_intel: not found -- Checking for [mkl_intel_lp64 - mkl_core - pthread] -- Library mkl_intel_lp64: not found -- Checking for [mkl_intel - mkl_core - pthread] -- Library mkl_intel: not found -- Checking for [mkl 
- guide - pthread - m] -- Library mkl: not found -- MKL library not found -- Checking for [blis] -- Library blis: BLAS_blis_LIBRARY-NOTFOUND -- Checking for [Accelerate] -- Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND -- Checking for [vecLib] -- Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND -- Checking for [flexiblas] -- Library flexiblas: BLAS_flexiblas_LIBRARY-NOTFOUND -- Checking for [openblas] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [openblas - pthread - m] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [openblas - pthread - m - gomp] -- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND -- Checking for [libopenblas] -- Library libopenblas: BLAS_libopenblas_LIBRARY-NOTFOUND -- Checking for [goto2 - gfortran] -- Library goto2: BLAS_goto2_LIBRARY-NOTFOUND -- Checking for [goto2 - gfortran - pthread] -- Library goto2: BLAS_goto2_LIBRARY-NOTFOUND -- Checking for [acml - gfortran] -- Library acml: BLAS_acml_LIBRARY-NOTFOUND -- Checking for [blis] -- Library blis: BLAS_blis_LIBRARY-NOTFOUND -- Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY) -- Checking for [ptf77blas - atlas - gfortran] -- Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND -- Checking for [] -- Cannot find a library with BLAS API. Not using BLAS. -- LAPACK requires BLAS -- Cannot find a library with LAPACK API. Not using LAPACK. disabling CUDA because NOT USE_CUDA is set disabling ROCM because NOT USE_ROCM is set -- MIOpen not found. 
Compiling without MIOpen support disabling MKLDNN because USE_MKLDNN is not set -- {fmt} version: 11.2.0 -- Build type: Release -- Using CPU-only version of Kineto -- Configuring Kineto dependency: -- KINETO_SOURCE_DIR = E:/PyTorch_Build/pytorch/third_party/kineto/libkineto -- KINETO_BUILD_TESTS = OFF -- KINETO_LIBRARY_TYPE = static CMake Deprecation Warning at third_party/kineto/libkineto/CMakeLists.txt:7 (cmake_minimum_required): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. CMake Warning (dev) at third_party/kineto/libkineto/CMakeLists.txt:15 (find_package): Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules are removed. Run "cmake --help-policy CMP0148" for policy details. Use the cmake_policy command to set the policy and suppress this warning. This warning is for project developers. Use -Wno-dev to suppress it. 
-- Found PythonInterp: E:/Python310/python.exe (found version "3.10.10") -- CUDA_SOURCE_DIR = -- ROCM_SOURCE_DIR = -- CUPTI unavailable or disabled - not building GPU profilers -- Kineto: FMT_SOURCE_DIR = E:/PyTorch_Build/pytorch/third_party/fmt -- Kineto: FMT_INCLUDE_DIR = E:/PyTorch_Build/pytorch/third_party/fmt/include -- CUPTI_INCLUDE_DIR = /extras/CUPTI/include -- ROCTRACER_INCLUDE_DIR = /include/roctracer -- DYNOLOG_INCLUDE_DIR = E:/PyTorch_Build/pytorch/third_party/kineto/libkineto/third_party/dynolog/ -- IPCFABRIC_INCLUDE_DIR = E:/PyTorch_Build/pytorch/third_party/kineto/libkineto/third_party/dynolog//dynolog/src/ipcfabric/ -- Configured Kineto (CPU) -- Performing Test HAS/WD4624 -- Performing Test HAS/WD4624 - Success -- Performing Test HAS/WD4068 -- Performing Test HAS/WD4068 - Success -- Performing Test HAS/WD4067 -- Performing Test HAS/WD4067 - Success -- Performing Test HAS/WD4267 -- Performing Test HAS/WD4267 - Success -- Performing Test HAS/WD4661 -- Performing Test HAS/WD4661 - Success -- Performing Test HAS/WD4717 -- Performing Test HAS/WD4717 - Success -- Performing Test HAS/WD4244 -- Performing Test HAS/WD4244 - Success -- Performing Test HAS/WD4804 -- Performing Test HAS/WD4804 - Success -- Performing Test HAS/WD4273 -- Performing Test HAS/WD4273 - Success -- Performing Test HAS_WNO_STRINGOP_OVERFLOW -- Performing Test HAS_WNO_STRINGOP_OVERFLOW - Failed -- -- Note: when building with Visual Studio the build type is specified when building. -- For example: 'cmake --build . 
--config=Release -- Architecture: x64 -- Use the C++ compiler to compile (MI_USE_CXX=ON) -- -- Library name : mimalloc -- Version : 2.2.4 -- Build type : release -- C++ Compiler : C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe -- Compiler flags : /Zc:__cplusplus -- Compiler defines : MI_CMAKE_BUILD_TYPE=release;MI_BUILD_RELEASE -- Link libraries : psapi;shell32;user32;advapi32;bcrypt -- Build targets : static -- CMake Error at CMakeLists.txt:1264 (add_subdirectory): add_subdirectory given source "torch/headeronly" which is not an existing directory. -- don't use NUMA -- Looking for backtrace -- Looking for backtrace - not found -- Could NOT find Backtrace (missing: Backtrace_LIBRARY Backtrace_INCLUDE_DIR) -- headers outputs: torch\csrc\inductor\aoti_torch\generated\c_shim_cuda.h not found torch\csrc\inductor\aoti_torch\generated\c_shim_cpu.h not found torch\csrc\inductor\aoti_torch\generated\c_shim_aten.h not found -- sources outputs: -- declarations_yaml outputs: -- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT -- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Failed -- Using ATen parallel backend: OMP disabling CUDA because USE_CUDA is set false -- Check size of long double -- Check size of long double - done -- Performing Test COMPILER_SUPPORTS_FLOAT128 -- Performing Test COMPILER_SUPPORTS_FLOAT128 - Failed -- Found OpenMP_C: -openmp:experimental (found version "2.0") -- Found OpenMP_CXX: -openmp:experimental (found version "2.0") -- Found OpenMP: TRUE (found version "2.0") -- Performing Test COMPILER_SUPPORTS_OPENMP -- Performing Test COMPILER_SUPPORTS_OPENMP - Success -- Performing Test COMPILER_SUPPORTS_OMP_SIMD -- Performing Test COMPILER_SUPPORTS_OMP_SIMD - Failed -- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES -- Performing Test COMPILER_SUPPORTS_WEAK_ALIASES - Failed -- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH -- Performing Test COMPILER_SUPPORTS_BUILTIN_MATH - Failed -- Performing 
Test COMPILER_SUPPORTS_SYS_GETRANDOM -- Performing Test COMPILER_SUPPORTS_SYS_GETRANDOM - Failed -- Configuring build for SLEEF-v3.8.0 Target system: Windows Target processor: Host system: Windows-10.0.26100 Host processor: AMD64 Detected C compiler: MSVC @ C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe CMake: 4.1.0 Make program: C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe Crosscompiling SLEEF. Native build dir: -- Using option `/D_CRT_SECURE_NO_WARNINGS /D_CRT_NONSTDC_NO_DEPRECATE ` to compile libsleef -- Building shared libs : OFF -- Building static test bins: OFF -- MPFR : LIB_MPFR-NOTFOUND -- GMP : LIBGMP-NOTFOUND -- RT : -- FFTW3 : LIBFFTW3-NOTFOUND -- OPENSSL : -- SDE : SDE_COMMAND-NOTFOUND -- COMPILER_SUPPORTS_OPENMP : FALSE AT_INSTALL_INCLUDE_DIR include/ATen/core core header install: E:/PyTorch_Build/pytorch/build/aten/src/ATen/core/aten_interned_strings.h core header install: E:/PyTorch_Build/pytorch/build/aten/src/ATen/core/enum_tag.h core header install: E:/PyTorch_Build/pytorch/build/aten/src/ATen/core/TensorBody.h CMake Error: File E:/PyTorch_Build/pytorch/torch/_utils_internal.py does not exist. CMake Error at caffe2/CMakeLists.txt:241 (configure_file): configure_file Problem configuring file CMake Error: File E:/PyTorch_Build/pytorch/torch/csrc/api/include/torch/version.h.in does not exist. CMake Error at caffe2/CMakeLists.txt:246 (configure_file): configure_file Problem configuring file CMake Error at caffe2/CMakeLists.txt:1398 (add_subdirectory): The source directory E:/PyTorch_Build/pytorch/torch does not contain a CMakeLists.txt file. CMake Warning at CMakeLists.txt:1285 (message): Generated cmake files are only fully tested if one builds with system glog, gflags, and protobuf. Other settings may generate files that are not well tested. 
CMake Warning at CMakeLists.txt:1349 (message): Generated cmake files are only available when building shared libs. -- -- ******** Summary ******** -- General: -- CMake version : 4.1.0 -- CMake command : E:/Python310/Lib/site-packages/cmake/data/bin/cmake.exe -- System : Windows -- C++ compiler : C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe -- C++ compiler id : MSVC -- C++ compiler version : 19.44.35215.0 -- Using ccache if found : OFF -- CXX flags : /DWIN32 /D_WINDOWS /EHsc /Zc:__cplusplus /MP /bigobj /FS /utf-8 -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -- Shared LD flags : /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 -- Static LD flags : /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 -- Module LD flags : /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 -- Build type : Release -- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;_CRT_SECURE_NO_DEPRECATE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;EXPORT_AOTI_FUNCTIONS;WIN32_LEAN_AND_MEAN;_UCRT_LEGACY_INFINITY;NOMINMAX;USE_MIMALLOC -- CMAKE_PREFIX_PATH : E:\Python310\Lib\site-packages -- CMAKE_INSTALL_PREFIX : E:/PyTorch_Build/pytorch/torch -- USE_GOLD_LINKER : OFF -- -- TORCH_VERSION : 2.9.0 -- BUILD_STATIC_RUNTIME_BENCHMARK: OFF -- BUILD_BINARY : OFF -- BUILD_CUSTOM_PROTOBUF : ON -- Protobuf compiler : -- Protobuf includes : -- Protobuf libraries : -- BUILD_PYTHON : True -- Python version : 3.10.10 -- Python executable : E:\Python310\python.exe -- Python library : E:/Python310/libs/python310.lib -- Python includes : E:/Python310/include -- Python site-package : E:\Python310\Lib\site-packages -- BUILD_SHARED_LIBS : OFF -- CAFFE2_USE_MSVC_STATIC_RUNTIME : ON -- BUILD_TEST : True -- BUILD_JNI : OFF -- BUILD_MOBILE_AUTOGRAD : OFF -- 
BUILD_LITE_INTERPRETER: OFF -- INTERN_BUILD_MOBILE : -- TRACING_BASED : OFF -- USE_BLAS : 0 -- USE_LAPACK : 0 -- USE_ASAN : OFF -- USE_TSAN : OFF -- USE_CPP_CODE_COVERAGE : OFF -- USE_CUDA : OFF -- USE_XPU : OFF -- USE_ROCM : OFF -- BUILD_NVFUSER : -- USE_EIGEN_FOR_BLAS : ON -- USE_EIGEN_FOR_SPARSE : OFF -- USE_FBGEMM : OFF -- USE_KINETO : ON -- USE_GFLAGS : OFF -- USE_GLOG : OFF -- USE_LITE_PROTO : OFF -- USE_PYTORCH_METAL : OFF -- USE_PYTORCH_METAL_EXPORT : OFF -- USE_MPS : OFF -- CAN_COMPILE_METAL : -- USE_MKL : OFF -- USE_MKLDNN : OFF -- USE_UCC : OFF -- USE_ITT : OFF -- USE_XCCL : OFF -- USE_NCCL : OFF -- Found NVSHMEM : -- USE_NNPACK : OFF -- USE_NUMPY : ON -- USE_OBSERVERS : ON -- USE_OPENCL : OFF -- USE_OPENMP : ON -- USE_MIMALLOC : ON -- USE_MIMALLOC_ON_MKL : OFF -- USE_VULKAN : OFF -- USE_PROF : OFF -- USE_PYTORCH_QNNPACK : OFF -- USE_XNNPACK : OFF -- USE_DISTRIBUTED : OFF -- Public Dependencies : -- Private Dependencies : Threads::Threads;cpuinfo;fp16;caffe2::openmp;fmt::fmt-header-only;kineto -- Public CUDA Deps. : -- Private CUDA Deps. : -- USE_COREML_DELEGATE : OFF -- BUILD_LAZY_TS_BACKEND : ON -- USE_ROCM_KERNEL_ASSERT : OFF -- Performing Test HAS_WMISSING_PROTOTYPES -- Performing Test HAS_WMISSING_PROTOTYPES - Failed -- Performing Test HAS_WERROR_MISSING_PROTOTYPES -- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Failed -- Configuring incomplete, errors occurred! (.venv) PS E:\PyTorch_Build\pytorch> (.venv) PS E:\PyTorch_Build\pytorch> # 7. 
验证安装 (.venv) PS E:\PyTorch_Build\pytorch> & "E:\Python310\python.exe" -c @" >> import torch >> print(f'PyTorch版本: {torch.__version__}') >> print(f'CUDA可用: {torch.cuda.is_available()}') >> print(f'CUDA版本: {torch.version.cuda}') >> print(f'cuDNN版本: {torch.backends.cudnn.version()}') >> "@ Traceback (most recent call last): File "<string>", line 1, in <module> File "E:\Python310\lib\site-packages\torch-2.9.0a0+git2d31c3d-py3.10-win-amd64.egg\torch\__init__.py", line 281, in <module> _load_dll_libraries() File "E:\Python310\lib\site-packages\torch-2.9.0a0+git2d31c3d-py3.10-win-amd64.egg\torch\__init__.py", line 277, in _load_dll_libraries raise err OSError: [WinError 126] 找不到指定的模块。 Error loading "E:\Python310\lib\site-packages\torch-2.9.0a0+git2d31c3d-py3.10-win-amd64.egg\torch\lib\aoti_custom_ops.dll" or one of its dependencies. (.venv) PS E:\PyTorch_Build\pytorch> Set-Content -Path manual_build.ps1 -Value (Get-Content manual_build.ps1) Get-Content: Cannot find path 'E:\PyTorch_Build\pytorch\manual_build.ps1' because it does not exist. (.venv) PS E:\PyTorch_Build\pytorch> .\manual_build.ps1 (.venv) PS E:\PyTorch_Build\pytorch> # 安装 Docker Desktop (.venv) PS E:\PyTorch_Build\pytorch> Invoke-WebRequest -Uri "https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe" -OutFile "$env:TEMP\docker_install.exe" (.venv) PS E:\PyTorch_Build\pytorch> Start-Process -FilePath "$env:TEMP\docker_install.exe" -ArgumentList "install --quiet" -Wait “