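node-llama-cpp: an open-source package that uses node.js bindings for llama.cpp to run AI models locally on your machine and enforce a JSON schema on the model output at the generation level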

Article link: https://blog.youkuaiyun.com/struggle2025/article/details/148354948

1. Software Introduction

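node-llama-cpp is an open-source package that uses node.js bindings for llama.cpp to run AI models locally on your machine. It can also enforce a JSON schema on the model output at the generation level.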

2. Features

Run LLMs locally on your machine
Metal, CUDA and Vulkan support
Pre-built binaries are provided, with a fallback to building from source without node-gyp or Python
Adapts to your hardware automatically, no need to configure anything
A complete suite of everything you need to use LLMs in your projects
Use the CLI to chat with a model without writing any code
Up-to-date with the latest llama.cpp. Download and compile the latest release with a single CLI command
Force a model to generate output in a parseable format, like JSON, or even to follow a specific JSON schema
Provide a model with functions it can call on demand to retrieve information or perform actions
Embedding and reranking support
Safe against special token injection attacks
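
The JSON schema feature in the list above can be sketched roughly as follows. This is a minimal sketch, assuming the `createGrammarForJsonSchema`, `prompt` (with a `grammar` option) and `grammar.parse` calls behave as described in the node-llama-cpp documentation. `promptForJson` and `sentimentSchema` are hypothetical names made up for illustration, and `llama` and `session` are assumed to be a `getLlama()` result and a `LlamaChatSession`.

```javascript
// Hypothetical helper (not part of node-llama-cpp): asks a question and
// returns an object that follows the given JSON schema.
// Assumes `llama` comes from getLlama() and `session` is a LlamaChatSession.
async function promptForJson(llama, session, schema, question) {
    // Build a grammar that restricts token generation to the schema
    const grammar = await llama.createGrammarForJsonSchema(schema);

    // Generate the answer under that grammar
    const text = await session.prompt(question, {grammar});

    // Turn the generated JSON text into a JavaScript object
    return grammar.parse(text);
}

// Illustrative schema only; the property names are assumptions
const sentimentSchema = {
    type: "object",
    properties: {
        sentiment: {enum: ["positive", "negative", "neutral"]},
        score: {type: "number"}
    }
};
```

Because the constraint is applied while tokens are generated, the helper does not need a retry loop for malformed JSON. A call might look like `await promptForJson(llama, session, sentimentSchema, "How do you feel about Mondays?")`.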

3. Try It Without Installing

Chat with a model in your terminal using a single command:

npx -y node-llama-cpp chat

4. Installation

npm install node-llama-cpp

This package comes with pre-built binaries for macOS, Linux and Windows.

If binaries are not available for your platform, it falls back to downloading a release of llama.cpp and building it from source with cmake. To disable this behavior, set the environment variable NODE_LLAMA_CPP_SKIP_DOWNLOAD to true.
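
One way to set that variable from a Node.js script is sketched below. This is a minimal sketch: `withSkipDownload` is a hypothetical helper name, and the only detail taken from the text above is the variable name NODE_LLAMA_CPP_SKIP_DOWNLOAD and its value true.

```javascript
// Hypothetical helper: returns a copy of an environment object with the
// download-and-build fallback disabled for node-llama-cpp.
// The original object is left untouched.
function withSkipDownload(env) {
    return {...env, NODE_LLAMA_CPP_SKIP_DOWNLOAD: "true"};
}

// Environment to hand to an install step started from this script
const installEnv = withSkipDownload(process.env);
console.log(installEnv.NODE_LLAMA_CPP_SKIP_DOWNLOAD); // prints "true"
```

The resulting object can be passed as the `env` option when the install command is started with `child_process.spawnSync`. In a shell, exporting the variable before running `npm install node-llama-cpp` has the same effect.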

5. Usage

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Resolve this file's directory (ES modules have no built-in __dirname)
const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Load the llama.cpp bindings, then a local GGUF model file
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")
});

// Create a context and start a chat session on one of its sequences
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});


// First turn
const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);


// Second turn: the same session is reused, so the model can refer to its previous answer
const q2 = "Summarize what you said";
console.log("User: " + q2);

const a2 = await session.prompt(q2);
console.log("AI: " + a2);
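
The two-turn pattern above generalizes to any number of questions. The following is a minimal sketch, assuming only that `session.prompt(text)` resolves to the answer string as in the example above. `runConversation` is a hypothetical helper, not part of node-llama-cpp.

```javascript
// Hypothetical helper: sends each question to the session in order and
// collects the question/answer pairs.
async function runConversation(session, questions) {
    const transcript = [];

    for (const question of questions) {
        // Wait for each answer before sending the next question,
        // so the turns reach the session in the intended order
        const answer = await session.prompt(question);
        transcript.push({question, answer});
    }

    return transcript;
}
```

With the `session` created above, `await runConversation(session, ["Hi there, how are you?", "Summarize what you said"])` would return both exchanges, which can then be printed in the same "User: " and "AI: " format.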