Plain English explanation of Big O

Understanding Big O notation
This article explains Big O notation and its use in analysing algorithm complexity in plain, accessible terms, walking through concrete examples of linear, logarithmic, quadratic and factorial complexity.

Quick note: this is almost certainly confusing Big O notation (which is an upper bound) with Theta notation (which is a two-sided bound). In my experience this is actually typical of discussions in non-academic settings. Apologies for any confusion caused.

The simplest definition I can give for Big-O notation is this:

Big-O notation is a relative representation of the complexity of an algorithm.

There are some important and deliberately chosen words in that sentence:

  • relative: you can only compare apples to apples. You can't compare an algorithm to do arithmetic multiplication to an algorithm that sorts a list of integers. But two algorithms that do arithmetic operations (one multiplication, one addition) will tell you something meaningful;
  • representation: Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if comparison is cheap but swapping is expensive? It changes the comparison; and
  • complexity: if it takes me one second to sort 10,000 elements how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.

Come back and reread the above when you've read the rest.

The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:

  • addition;
  • subtraction;
  • multiplication; and
  • division.

Each of these is an operation or a problem. A method of solving these is called an algorithm.

Addition is the simplest. You line the numbers up (to the right) and add the digits in a column, writing the last digit of that addition in the result. The 'tens' part of that number is carried over to the next column.

Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.

See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
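
Here's a rough Python sketch of that column addition, working on lists of digits stored least-significant-first (the digit lists and the helper name are just for illustration). The single pass over the digits is the O(n) in action:

```python
# A minimal sketch of column addition on digit lists (least significant digit first).
# One pass over the digits, so the work grows linearly with n, the number of digits: O(n).
def add_digits(a, b):
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        total = da + db + carry
        result.append(total % 10)   # write the last digit of the column sum
        carry = total // 10         # carry the 'tens' part to the next column
    if carry:
        result.append(carry)
    return result

# 123456 + 789012 = 912468 (digits stored least significant first)
print(add_digits([6, 5, 4, 3, 2, 1], [2, 1, 0, 9, 8, 7]))  # [8, 6, 4, 2, 1, 9]
```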

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one million digit numbers we need to do one trillion (10^12) multiplications and two million adds.

As the algorithm scales with n-squared, this is O(n^2) or quadratic complexity. This is a good time to introduce another important concept:

We only care about the most significant portion of complexity.

The astute may have realized that we could express the number of operations as: n^2 + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).
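
And a rough sketch of schoolbook long multiplication, using the same digit-list convention as above (again, purely illustrative). The nested loops perform roughly n × n single-digit multiplications, which is where the O(n^2) comes from:

```python
# A minimal sketch of schoolbook long multiplication on digit lists
# (least significant digit first). The nested loops do about n * n
# single-digit multiplications: O(n^2).
def multiply_digits(a, b):
    result = [0] * (len(a) + len(b))
    for i, db in enumerate(b):          # each digit of the bottom number...
        carry = 0
        for j, da in enumerate(a):      # ...is multiplied against every digit of the top number
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(a)] += carry
    # trim leading zeros (stored at the end because digits are LSB-first)
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result

# 123456 * 789012 = 97408265472
print(multiply_digits([6, 5, 4, 3, 2, 1], [2, 1, 0, 9, 8, 7]))
```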

The Telephone Book

The next best example I can think of is the telephone book, normally called the White Pages or similar but it'll vary from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.

Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess roughly where the S's start (let's assume you can't), what would you do?

A typical implementation might be to open up to the middle, take the 500,000th name and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "John Smith" will be before or after that name. If it's after we then divide the last half of the phone book in half and repeat. If it's before then we divide the first half of the phone book in half and repeat. And so on.

This is called a binary search and is used every day in programming whether you realize it or not.

So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.

For a phone book of 3 names it takes 2 comparisons (at most).
For 7 it takes at most 3.
For 15 it takes 4.
...
For 1,000,000 it takes 20.

That is staggeringly good, isn't it?

In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log10, log2 or some other base. It doesn't matter; it's still O(log n), just like O(2n^2) and O(100n^2) are still both O(n^2).
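
Here's a rough sketch of that search in Python, assuming the names are already sorted (the names themselves are made up). It counts the comparisons as it goes, and for a million entries the count never gets past about 20:

```python
# A minimal sketch of binary search over a sorted list of names,
# counting comparisons along the way.
def binary_search(sorted_names, target):
    low, high, comparisons = 0, len(sorted_names) - 1, 0
    while low <= high:
        mid = (low + high) // 2        # open the book to the middle of the current range
        comparisons += 1
        if sorted_names[mid] == target:
            return mid, comparisons
        elif sorted_names[mid] < target:
            low = mid + 1              # target is in the later half
        else:
            high = mid - 1             # target is in the earlier half
    return -1, comparisons

# Hypothetical phone book of a million sorted names
names = sorted(f"Name{i:07d}" for i in range(1_000_000))
print(binary_search(names, "Name0567890"))  # found in at most ~20 comparisons
```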

It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:

  • Best Case: In the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;
  • Expected Case: As discussed above this is O(log n); and
  • Worst Case: This is also O(log n).

Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.

Back to the telephone book.

What if you have a phone number and want to find a name? The police have a reverse phone book but such lookups are denied to the general public. Or are they? Technically you can reverse lookup a number in an ordinary phone book. How?

You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).

So to find a name:

  • Best Case: O(1);
  • Expected Case: O(n) (for 500,000); and
  • Worst Case: O(n) (for 1,000,000).
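
A rough sketch of that reverse lookup (the entries are hypothetical): one comparison per entry, scanning until the number matches or the book runs out, which is what makes it O(n):

```python
# A minimal sketch of the reverse lookup: an unordered scan through the book,
# one comparison per entry, so the work grows linearly with n: O(n).
def reverse_lookup(phone_book, wanted_number):
    for name, number in phone_book:       # start at the first entry...
        if number == wanted_number:       # ...and compare each number in turn
            return name
    return None                           # worst case: every one of the n entries was checked

# Hypothetical entries for illustration
book = [("Smith, John", "555-1234"), ("Jones, Ann", "555-9876")]
print(reverse_lookup(book, "555-9876"))   # "Jones, Ann"
```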

The Travelling Salesman

This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Travelling Salesman problem is to find the shortest tour that visits every town.

Sounds simple? Think again.

If you have 3 towns A, B and C with roads between all pairs then you could go:

A -> B -> C
A -> C -> B
B -> C -> A
B -> A -> C
C -> A -> B
C -> B -> A

Well, actually there are fewer than that because some of these are equivalent (A -> B -> C and C -> B -> A are equivalent, for example, because they use the same roads, just in reverse).

In actuality there are 3 possibilities.

Take this to 4 towns and you have (iirc) 12 possibilities. With 5 it's 60. 6 becomes 360.

This is a function of a mathematical operation called a factorial. Basically:

5! = 5 * 4 * 3 * 2 * 1 = 120
6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
7! = 7 * 6 * 5 * 4 * 3 * 2 * 1 = 5040
...
25! = 25 * 24 * ... * 2 * 1 = 15,511,210,043,330,985,984,000,000
...
50! = 50 * 49 * ... * 2 * 1 = 3.04140932... × 10^64 

So the Big-O of the Travelling Salesman problem is O(n!) or factorial or combinatorial complexity.
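
To make that growth concrete, here's a rough brute-force sketch in Python: it simply tries every permutation of the towns, which is exactly n! orderings. The town names and road lengths are made up for illustration.

```python
# A minimal brute-force sketch: trying every ordering of the towns is
# what drives the n! growth.
from itertools import permutations

def shortest_tour(towns, distance):
    """distance[(a, b)] gives the road length between towns a and b."""
    best_tour, best_length = None, float("inf")
    for tour in permutations(towns):              # n! orderings to check
        length = sum(distance[(tour[i], tour[i + 1])] for i in range(len(tour) - 1))
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length

# Hypothetical 3-town example with symmetric road lengths
d = {("A", "B"): 5, ("B", "A"): 5,
     ("A", "C"): 9, ("C", "A"): 9,
     ("B", "C"): 3, ("C", "B"): 3}
print(shortest_tour(["A", "B", "C"], d))  # (('A', 'B', 'C'), 8)
```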

By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.

Something to think about.

Polynomial Time

Another point I wanted to make quick mention of is that any algorithm that has a complexity of O(n^a) is said to have polynomial complexity or is solvable in polynomial time.

Traditional computers can solve polynomial-time problems. Certain things in the world rely on this distinction between what is and isn't polynomial. Public Key Cryptography is a prime example: it is computationally hard to find the two prime factors of a very large number. If it weren't, we couldn't use the public key systems we use.
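
As a very rough illustration (nothing like the factoring algorithms used in practice), here's a trial-division sketch in Python. Finding a factor this way can take on the order of 10^(n/2) steps for an n-digit number, which grows far faster than any polynomial in n:

```python
# A minimal trial-division sketch: for an n-digit number the loop may run
# up to about 10^(n/2) times, which is not polynomial in n.
def smallest_factor(num):
    candidate = 2
    while candidate * candidate <= num:   # only need to try up to sqrt(num)
        if num % candidate == 0:
            return candidate
        candidate += 1
    return num                            # num itself is prime

# 2_147_483_647 is prime, so the smallest factor of the product is 3
print(smallest_factor(3 * 2_147_483_647))  # 3
```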

Anyway, that's it for my (hopefully plain English) explanation of Big O (revised).
