client_side_validations

This post introduces Client Side Validations, a solid front-end validation plugin for Rails 3, covering installation, initialization, and basic usage, and shows how the validations behave in the browser.


client_side_validations is a solid front-end validation plugin for Rails 3. It turns your model validations into client-side (JavaScript) validations, supports custom validators, and works with form builders such as SimpleForm and Formtastic.

Installation

Add the following line to your Gemfile:

gem 'client_side_validations', '~> 3.0.2'  

Then run bundle install.

Initialization

rails g client_side_validations:install  

This command copies two files into the project:

config/initializers/client_side_validations.rb
public/javascripts/rails.validations.js

Usage

Include rails.validations.js in your layout:

<%= javascript_include_tag 'jquery', 'rails.validations' -%>

For example, given a Book model with the following server-side validation:

class Book < ActiveRecord::Base
  validates :name, :presence => true
end
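
The gem also mirrors the other standard Rails validators, such as length and numericality, on the client side. A minimal sketch extending the same model (the price attribute is just for illustration):

class Book < ActiveRecord::Base
  validates :name,  :presence => true, :length => { :maximum => 100 }
  validates :price, :numericality => true
end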

Turn on validation for the form and the JavaScript validation takes effect:

<%= semantic_form_for @book, :validate => true do |form| -%>
	<%= form.inputs do %>
		<%= form.input :name %>
	<% end %>

	<%= form.buttons do %>
		<%= form.commit_button true %>
	<% end %>
<% end %>  
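
The example above uses Formtastic's semantic_form_for, but the same :validate => true option also works with the plain Rails form builder. A minimal sketch:

<%= form_for @book, :validate => true do |f| %>
  <%= f.label :name %>
  <%= f.text_field :name %>
  <%= f.submit %>
<% end %>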

If you don't want client-side validation for a particular form, just set :validate => false.
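
Custom validators (mentioned at the top) are written as ordinary ActiveModel::EachValidator classes on the Ruby side; the gem additionally expects a matching JavaScript validator to be registered alongside rails.validations.js, which is not shown here. A sketch of the Ruby half only, using a hypothetical EmailValidator:

# app/validators/email_validator.rb -- hypothetical example, not part of the gem
class EmailValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    unless value =~ /\A[^@\s]+@[^@\s]+\z/
      record.errors.add(attribute, options[:message] || "is not a valid email address")
    end
  end
end

# in the model
validates :email, :email => true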

That's all there is to it. For more detailed documentation, see the project on GitHub: https://github.com/bcardarella/client_side_validations


Reposted from http://www.thoughtrails.com/gems/client_side_validations
