Sentences repository

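This article examines how program performance is affected by resource sharing and input data, and introduces methods for predicting performance through analysis. It also discusses fundamental data-structure concepts in depth, including different kinds of tree structures and their use in algorithms.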

In an environment where resources are being shared, even the same program can have varying performance characteristics at two different times. Second, many programs are extremely sensitive to their input data, and performance might fluctuate wildly depending on the input.

one may run much more efficiently on one particular kind of input and the second may run efficiently under other circumstances.

we can often use approximate analytic results in conjunction with empirical studies to predict performance accurately.

writing programs that exclusively process bits would be tiresome indeed

All of our data structures are comprised of objects and references to objects
All of our data structures consist of objects and references to objects

Our primary goal is to lay the groundwork for the development

the higher-level constructs that will serve as the basis for most of the algorithms

It is customary to refer to these basic types by their Java names

we defer consideration of character data to Section 3.6

We use a fixed number of bits to represent numbers

choosing from among the types int, long, short, or byte for integers and from among float or double for floating-point numbers

we think of the type of the data more in terms of the needs of the program than the capabilities of the machine

the operations that we perform on them

a list of sets of values (primitive or other types) and associated operations (methods).

Many of the operations associated with standard data types (for example, the arithmetic operations) are built into the Java language
All the methods associated with this interface are used to operate on Integer data.

These diagrams show examples of a binary tree (top left), a ternary tree (top right), a rooted tree (bottom left), and a free tree (bottom right).

A rooted tree is one where we designate one node as the root of a tree

even though this convention seems unnatural at first

father children sibling

That is, a binary tree is a special type of ordered tree, an ordered tree is a special type of rooted tree, and a rooted tree is a special type of free tree

The different types of trees arise naturally in various applications

This definition makes it plain that ...

we are working with just one concrete realization of that abstraction

This alternative is analogous to a doubly linked list

The binary-tree representation depicted in ...

Any one of these conditions is necessary and sufficient to prove the other three

knowing different tree abstractions is often an essential ingredient in finding an efficient algorithm and corresponding data structure for a given problem

we also often profit from working with the proper tree abstraction

it serves as the basis for many basic algorithms for processing graphs

inserting the element into the vacated position
 
 E-MAIL CONFIDENTIALITY NOTICE: This e-mail, including attachments, may include confidential patient health information and/or proprietary information, and is intended only for the use of the individual or entity to which it is addressed. If the reader of this e-mail is not the intended recipient or his or her authorized agent, the reader is hereby notified that any dissemination, distribution or copying of this e-mail is prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and delete this e-mail immediately.

but there is a hidden cost that is interesting to consider

we defer considering the implementation in detail until Chapter 10

The main drawback to CGI is that it must run a new copy of the CGI-aware program
for each request. This is a relatively expensive process that can bog down
high-volume sites where thousands of requests are serviced per minute. Another
drawback is that CGI programs tend to be platform dependent. A CGI program
written for one operating system may not run on another.

Sun’s Java Servlet platform directly addresses the two main drawbacks of CGI programs.
First, servlets offer better performance and utilization of resources than
conventional CGI programs. Second, the write-once, run-anywhere nature of Java
means that servlets are portable between operating systems that have a Java Virtual
Machine (JVM).

Cookies and URL rewriting are two common ways to keep track of users between requests.

Once the servlet has been created, using it for additional requests incurs
very little overhead.

While Java servlets are a big step up from CGI programs, they are not a panacea.

This chapter explores the Struts framework in depth and highlights the benefits
Struts can bring to your development efforts.

Choosing a web application framework should not be a casual decision. Many
people will use this book, and especially this chapter, as part of evaluating Struts
for their project. Accordingly, we conclude this chapter with a candid look at the
strengths and weaknesses of the Struts framework and address concerns regarding
overall performance. Struts is designed for professional developers. To make
informed decisions, professionals need to be aware of both a tool’s capabilities
and its limitations.

Developing for the web, while rewarding, brings its own set of
challenges. Let’s take a quick look at what makes web development so challenging.

Input may even be hostile and contrived to harm the application.


Since there are so many obstacles
to writing robust web applications, using a framework is vital, lest your application
become an endless series of workarounds and kluges.

drawback shortcoming weakness

Struts has already outgrown its mailing list.

This tells us that the official reference copy of this document’s DTD can be found
at the indicated URL. 

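### Processing sentences with `Tokenizer`

In natural language processing (NLP), a `Tokenizer` is a key tool for converting text data into a numerical representation. This step matters because machine learning models cannot interpret raw text directly; they need numeric input.

#### Text preprocessing and tokenization

To prepare text for further analysis or modeling, some preliminary operations are usually applied first, such as removing stop words and punctuation and normalizing case[^5]. Next comes tokenization itself: splitting a continuous stream of text into independent units (tokens). Different scenarios may suit different kinds of tokenizers:

- **Rule-based methods**: for example, TreebankWordTokenizer or PunktWordTokenizer in the NLTK library, which split text using predefined patterns.
- **Regular-expression-driven approaches**: for complex cases, the regexp_tokenize function lets you define custom matching rules for finer control[^1].
- **Features built into deep learning frameworks**: for example, the tf.keras.preprocessing.text.Tokenizer class provided in TensorFlow, which offers a more flexible and efficient interface for managing data conversion over large corpora.

The following example shows how to encode a set of sentences with the Keras Tokenizer object:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
    "The cat sat on the mat.",
    "Dogs are friendly animals."
]

# Create and configure the Tokenizer instance
tokenizer = Tokenizer(num_words=100)

# Build the vocabulary
tokenizer.fit_on_texts(sentences)

# Convert each sentence to a sequence of integers
sequences = tokenizer.texts_to_sequences(sentences)

print(sequences)
```

The snippet above creates a simple Tokenizer and applies it to a few sample sentences. Calling fit_on_texts lets the object build its internal dictionary from the given material; texts_to_sequences then maps each sentence to a new list made up of the corresponding word indices.

#### Next steps

Once tokenization is complete, other techniques can extend the NLP pipeline further, for example padding or truncating so that all samples have the same length, or using an Embedding layer to obtain distributed feature representations[^4].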
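The padding and truncation step mentioned above can be sketched in plain Python. This is only an illustrative sketch: the helper name `pad_sequences_simple` and the sample integer sequences are hypothetical, not output of the tokenizer example. It pads and truncates at the front of each sequence, which is assumed here to match the default "pre" behavior of Keras's own `pad_sequences` utility.

```python
def pad_sequences_simple(sequences, maxlen, value=0):
    """Pad or truncate each sequence at the front so all have length maxlen.

    Hypothetical helper for illustration; not part of any library.
    """
    padded = []
    for seq in sequences:
        # Keep only the last maxlen items (truncate from the front).
        trimmed = seq[-maxlen:] if maxlen > 0 else []
        # Fill the missing positions at the front with the padding value.
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded


# Made-up integer sequences standing in for tokenizer output
sequences = [[1, 2, 3, 4, 5], [6, 7]]
padded = pad_sequences_simple(sequences, maxlen=4)
print(padded)
```

Reserving 0 as the padding value is a common choice because word indices typically start at 1, so padded positions cannot be confused with real tokens.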