DATA4800 Artificial Intelligence and Machine LearningC/C++

Java Python DATA4800 Subject Outline

Subject Code:

DATA4800

Subject Name:

Artificial Intelligence and Machine Learning

Credit Points:

Four

Pre-requisite:

STAM4000

Co-requisite:

DATA4100

Subject

Coordinator(s):

Please see MyKBS subject page for details

Workshop

Facilitator(s):

Please see MyKBS subject page for details

Study Mode:

On campus and online

Required

Materials:

Please see MyKBS subject page for resources

Subject Description

This subject builds upon previous analytics topics and explores advances in algorithmic Artificial

Intelligence (AI) techniques such as Machine Learning (ML). Students will first explore both

supervised and unsupervised ML, which include basic classification methods such as Decision Trees, Random Forests, Logistic Regression and Support Vector Machines and also basic

dimensionality reduction and clustering algorithms, such as PCA and K-means clustering.

Students will then investigate the theory and application of neural nets and deep learning in AI. The subject also introduces contemporary methods in ML such as convolutional neural

networks for image analytics as well as transformer models for natural language processing.

Subject Website

Students will need access to technology resources (including the internet) to undertake this     subject. The subject has a dedicated website called MyKBS. It is important that students visit   the MyKBS site regularly for subject updates and general information. The MyKBS site can be accessed at:https://elearning.kbs.edu.au/. All assessments are electronically lodged via

MyKBS.

Subject Learning Outcomes

Upon successful completion of this subject, students should be able to demonstrate achievement of the following learning outcomes.

LO1:

Evaluate the advantages and disadvantages of artificial intelligence in business

LO2:

Create and communicate business insights through machine learning

LO3:

Explore relational and non-relational database tools,

and how they support machine learning

LO4:

Analyse chatbots and applications of natural language processing in artificial intelligence

LO5:

Investigate the benefits to business and individuals of smart technologies

Kaplan Graduate Attributes

Kaplan has a series of Graduate Attributes that define the philosophy underpinning its subjects and programs. Further details about the Graduate Attributes can be accessed at:

https://www.kbs.edu.au/about-us/school-policies.

Recommended Resources

The recommended (but optional) textbook for this subject is “Machine Learning in Business” 2e  (John Hull, 2020) or 3e (John Hull, 2022). Further, students are expected to access and utilise a wide range of information sources that are provided on MyKBS under weekly workshop content  tabs. The Kaplan Library also has an extensive collection of resources that are available to support student research and study needs. The library can be accessed at

https://library.kaplan.edu.auor via MyKBS.

Software

The required software for this subject is:

Orange Data Mining. All analysis in this course will be carried out using the Orange Data Mining software exclusively:

1.   Download at:https://orangedatamining.com/(you do not have to do this before Week 1, but you are encouraged to do so).

 

Google Docs and Sheets. All word processing and report writing in this course will be carried out using Google Docs and Sheets.

1.   Go to:https://accounts.google.com/and “create account” using the same email address  that is listed on your MyKBS profile (you do not need to do this if your email address is a Gmail address).

Assessments Overview

This is an assessed and graded subject. A student’s overall grade will be calculated from the

marks for each assessment task based on the relative assessment weightings shown in the table below. Students must obtain an overall mark for the subject of at least 50% to be able to pass the subject. Details about the subject assessments can be accessed at the subject’s website at

MyKBS. Further details about Kaplan’s assessment policy, including information about late submissions, can be accessed athttps://www.kbs.edu.au/about-us/school-policies.

Assessment

Weight

Learning   outcomes

Mode of

submission

Submissio n week

Details

Assessment 1

Group prediction

project –

classification.

30%

LO1, LO2

Group: In

dai 写DATA4800 Artificial Intelligence and Machine LearningC/C++

Class

 

 

Individual: Via Turnitin

Week 5

Teamwork:

Building &

analysing

decision tree

and logistic

regression

models for a    simple dataset

Assessment 2

Individual

implementation and problem solving

(puzzle)

30%

LO3, LO5

Individual: In class

implementatio n and

problem

solving

 

Participation: During class

Week 10

Implementing and analysing neural

networks and

other

predictive

models

Assessment 3

Individual project.

40%

LO2, LO3, LO4, L05

Individual:

Via Turnitin

 

Video Quiz

Week 13

Integration of knowledge

using a

business

problem

Student Feedback

Every trimester, each subject is reviewed as part of a continuous process of quality assurance.

Students are surveyed as part of this review process. This is known as the SELTS survey, that is: Student Experience, Learning and Teaching Survey. The survey takes place in the later part of

the trimester and participation is voluntary, confidential, and highly encouraged. The results of the survey are factored into continual workshop delivery and subject development by the subject

facilitators and coordinators. Results are only made available to the subject facilitators after the release of the final grades in the subject. Should students have suggestions for improvement about their subject during the trimester, discussion with their workshop facilitator is always encouraged.

Weekly Workshop Schedule

Weekly on-campus and online workshops are designed to offer interesting and effective activities

to help you learn best. These may include peer-to-peer activities, videos, online discovery, real- life content examples, open discussion, assessment guidance discussions, and feedback opportunities.

 

Topic

Week 1

Introduction to Machine Learning and Ethics

Week 2

Supervised Learning: Linear and Logistic Regression, KNN

Week 3

Unsupervised Learning: K Means and PCA + Orange Lab

Week 4

Supervised Learning: Decision Trees and Random Forests

Week 5

In-class assessment 1

Week 6

Introduction to Neural Networks

Week 7

Image Classification with Deep CNNs + Orange Lab

Week 8

Introduction to Natural Language Processing

Week 9

Natural Language Processing with BERT + Orange Lab

Week 10

In-class assessment 2

Week 11

Reinforcement Learning and Autonomous Agents

Week 12

Model Interpretability and Subject Summary

Academic Integrity Policy

KBS values academic integrity. All students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Academic Integrity and Conduct Policy.

.    What is academic integrity and misconduct?

.    What are the penalties for academic misconduct?

.    What are the late penalties?

.     How can I appeal my grade?

The answers to these questions can be accessed athttps://www.kbs.edu.au/about-us/school- policies.

Length Limits for Assessments

Penalties may be applied for assessment submissions that exceed prescribed limits.

Study Assistance

Students may seek study assistance from their local Academic Learning Advisor or refer to the resources on theMyKBS Academic Success Centrepage. Further details can be accessed at  https://elearning.kbs.edu.au/course/view         

<think>我们被要求使用C++构建子串索引(用于处理100个及以上字符串),并参考了用户提供的引用内容。根据引用,我们知道在C++中可以使用标准库的`std::string::find`或者手动实现查找算法(如暴力法、KMP等)。但是,构建子串索引通常采用的方法是预先构建一个索引结构(如n-gram索引)来加速子串搜索。 我们需要实现一个类似之前Python示例的n-gram索引,但是使用C++。主要步骤包括: 1. 构建一个类(例如`SubstringIndex`),包含: - 一个存储所有字符串的容器(例如`std::vector<std::string>`) - 一个索引结构(例如`std::unordered_map<std::string, std::set<size_t>>`,其中键是n-gram,值是该n-gram出现的字符串索引集合) - 一个整数n(表示n-gram的长度) 2. 构建索引的方法:遍历每个字符串,将其分解为长度为n的所有子串(n-gram),并将该字符串的索引(在字符串向量中的位置)加入到对应n-gram的集合中。 3. 搜索方法:对于给定的查询字符串: - 如果查询长度小于n,则退化为线性扫描(因为索引是基于n-gram的,无法使用索引,或者我们可以考虑生成查询字符串的所有子串,但这样效率可能不高,所以线性扫描更直接)。 - 如果查询长度大于等于n,则: a. 提取查询字符串的所有n-gram。 b. 找到包含所有这些n-gram的字符串索引集合(取交集)。 c. 然后在这些候选字符串中验证是否包含整个查询字符串(因为n-gram是连续的,但多个n-gram出现在字符串中不一定保证连续出现整个查询串)。 注意:C++标准库中没有直接提供多个集合的交集,但我们可以使用`std::set_intersection`或者循环取交集。 考虑到性能,我们使用`std::unordered_map`和`std::set`(或者`std::unordered_set`)来构建索引。由于我们要处理100个及以上的字符串,使用索引可以大大提高搜索效率。 另外,引用[3]提到了`substr`函数,我们可以用它来提取子串。 下面我们提供一个C++的实现示例: </think>### C++ 子串索引构建方法及代码示例 在 C++ 中构建高效的子串索引需要结合哈希映射和 n-gram 技术,特别适合处理 100+ 字符串集合。以下是完整实现方案: #### 核心数据结构 ```cpp #include <iostream> #include <vector> #include <unordered_map> #include <set> #include <algorithm> class SubstringIndex { private: int n_gram; // n-gram 长度 std::vector<std::string> strings; // 原始字符串集合 std::unordered_map<std::string, std::set<size_t>> index; // 索引结构 public: SubstringIndex(int n = 3) : n_gram(n) {} // 构建索引 void buildIndex(const std::vector<std::string>& input_strings) { strings = input_strings; index.clear(); for (size_t str_id = 0; str_id < strings.size(); ++str_id) { const std::string& s = strings[str_id]; // 跳过空字符串 if (s.empty()) continue; // 提取所有 n-gram for (size_t i = 0; i <= s.length() - n_gram; ++i) { std::string gram = s.substr(i, n_gram); // 使用 substr 提取子串 index[gram].insert(str_id); } } } // 搜索包含子串的字符串 std::vector<size_t> search(const std::string& query) { // 处理空查询 if (query.empty()) return {}; // 短查询处理(长度 < n_gram) if (query.length() < n_gram) { return linearSearch(query); } // 长查询处理(长度 ≥ n_gram) std::vector<std::string> grams; for (size_t i = 0; i <= query.length() - n_gram; ++i) { grams.push_back(query.substr(i, n_gram)); } // 获取候选集(包含所有 grams 的字符串) std::set<size_t> candidate_ids; bool first = true; for (const auto& gram : grams) { if (index.find(gram) == index.end()) return {}; // 任一 gram 缺失 if (first) { candidate_ids = index[gram]; first = false; } else { std::set<size_t> intersect; std::set_intersection( candidate_ids.begin(), candidate_ids.end(), index[gram].begin(), index[gram].end(), std::inserter(intersect, intersect.begin()) ); candidate_ids = intersect; } } // 验证候选字符串 std::vector<size_t> result_ids; for (size_t id : candidate_ids) { if (strings[id].find(query) != std::string::npos) { // 使用 find 确认完整匹配 result_ids.push_back(id); } } return result_ids; } private: // 短查询线性搜索 std::vector<size_t> linearSearch(const std::string& query) { std::vector<size_t> result; for (size_t i = 0; i < strings.size(); ++i) { if (strings[i].find(query) != std::string::npos) { result.push_back(i); } } return result; } }; ``` #### 使用示例 ```cpp int main() { // 创建测试数据集(150个字符串) std::vector<std::string> dataset; for (int i = 0; i < 50; ++i) { dataset.push_back("artificial intelligence"); dataset.push_back("machine learning algorithms"); dataset.push_back("data science applications"); } // 构建索引(n=4) SubstringIndex indexer(4); indexer.buildIndex(dataset); // 执行搜索 std::string query = "learn"; std::vector<size_t> results = indexer.search(query); // 输出结果 std::cout << "找到 " << results.size() << " 个匹配项:\n"; for (size_t id : results) { std::cout << "ID: " << id << "\t内容: " << dataset[id] << "\n"; } return 0; } ``` #### 性能优化要点 1. **时间复杂度**: - 构建索引:$O(M \cdot L)$($M$=字符串数,$L$=平均长度) - 搜索:短查询 $O(M)$,长查询 $O(K + C)$($K$=查询n-gram数,$C$=候选集大小) 2. **n-gram 选择**: - 推荐 $n=3 \sim 5$,权衡召回率与内存开销 - 可通过实验确定最佳值: ```cpp // 测试不同n值效果 for (int n = 2; n <= 6; ++n) { SubstringIndex test_index(n); test_index.buildIndex(dataset); // 测试查询性能... } ``` 3. **内存管理**: ```cpp // 分批处理超大规模数据集 const size_t BATCH_SIZE = 10000; for (size_t i = 0; i < large_dataset.size(); i += BATCH_SIZE) { auto batch_begin = large_dataset.begin() + i; auto batch_end = (i + BATCH_SIZE < large_dataset.size()) ? large_dataset.begin() + i + BATCH_SIZE : large_dataset.end(); indexer.buildIndex(std::vector<std::string>(batch_begin, batch_end)); } ``` #### 关键技术说明 1. 使用 `std::unordered_map` 实现哈希索引,提供 $O(1)$ 平均查找性能[^2] 2. 通过 `std::set_intersection` 高效计算集合交集[^1] 3. 对短查询自动降级为线性搜索,优化边界情况处理 4. 利用 `std::string::find` 进行精确匹配验证[^2]
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值