COMP 3027J Data Mining and Machine Learning


ASSIGNMENT 1
Weight: 40%
Submissions: A report (PDF), and a zip file (including code and datasets) on Brightspace.
The purpose of this assignment is to practise using data mining and machine learning to
solve real-world problems. You will need to identify the target problem yourself. You can
choose any project, as long as it is legal, but it must be a classification task and the report
must include visual analytics. (Note: the project must not be related to, or use the dataset
from, Assignment 2, and must not be related to your FYP project.) This assignment is a group
project, and each group should have four members. Each group only needs to submit one solution.
Your PDF report should clearly detail how you carried out the experiment to address your
target problem and show the results you obtained.
1. Your report should be written in Overleaf, and use the provided template:
https://www.overleaf.com/latex/templates/acm-journals-primary-articletemplate/cpkjqttwbshg.
2. It should be a human-readable document (i.e. do not include code).
3. The final report is expected to be 4-6 pages, including references.
4. You should provide your UCD student number in place of the institution in the provided
template.
5. Use clear headings for each section.
6. Include tables and figures where needed and present them appropriately, e.g. give captions,
describe your figures, and analyse the results in your tables in the text.
7. The final report filename should be “Comp3027J_GroupXX” (e.g. Comp3027J_Group01).
In your report, it is recommended to discuss the following essential topics, but you are not
limited to these topics:
1. The real-world problem addressed and why it is important.
2. Dataset selection (collection) and data pre-processing.
Where did you find your data (or how did you collect the data and create your dataset)?
How did you analyse your data?
How did you pre-process your data to fit your solution?
Were there any challenges with your dataset?
etc.
3. Methodology
Any machine learning algorithm can be used (you are not limited to the algorithms we have
learned).
Creativity is encouraged.
Be careful: a sophisticated approach with little description and explanation will
receive little credit.
4. Evaluation
Describe your experimental setup, such as how the dataset was split and whether K-fold
cross-validation was used;
Compare your solution with benchmarks in the literature;
State the evaluation metrics used for your task;
Analyse your results, etc.
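As an illustration of the pre-processing and evaluation topics above, the following is a minimal sketch of a stratified K-fold experiment for a classification task. It is not a required approach: the synthetic dataset from `make_classification`, the logistic regression model, the choice of 5 folds and the two metrics are all assumptions made for the example, and scikit-learn is assumed to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption: 500 samples, 20 features, 2 classes)
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Scaling is part of the pipeline, so it is re-fitted on the training folds only
# and never sees the held-out fold (this avoids data leakage)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class proportions similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])

# Report the mean and spread across folds rather than a single number
for metric in ("accuracy", "f1_macro"):
    scores = results[f"test_{metric}"]
    print(f"{metric}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the per-fold mean and standard deviation makes it easier to compare a solution with benchmark figures, since it shows how much the score varies with the split.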
You should submit a PDF file and a zip file. The zip file should include your code and
dataset. Please make sure to clean up your code so that the results are reproducible. If the
zip file exceeds the Brightspace size limit, it needs to be submitted via a USB key. Note that
your PDF report must be submitted as a separate file and must not be placed inside the zip file.
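One common step towards reproducible results is to fix the random seeds before any data splitting or training. The sketch below is only an illustration: the helper name `set_seed` and the value 42 are hypothetical, and it covers only Python's `random` module and NumPy, so any other library with its own random number generator would need to be seeded separately.

```python
import random

import numpy as np

SEED = 42  # hypothetical value; any fixed integer works


def set_seed(seed: int) -> None:
    """Seed the random number generators used in this sketch."""
    random.seed(seed)
    np.random.seed(seed)


# Seeding twice with the same value yields the same random draws
set_seed(SEED)
first_draw = np.random.rand(3)
set_seed(SEED)
second_draw = np.random.rand(3)
print(np.array_equal(first_draw, second_draw))
```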
There will be an interview at the end of the term, and you will be asked about the methodology
adopted.
• Grading
Component               Weight
Problem                 5%
Literature              5%
Methodology             15%
Evaluation              10%
Code+Reproducibility    5%

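Since no exact notes from the introductory machine learning part of the Comp90049 course are available, the following is what such notes might cover, based on common introductory machine learning material and the cited references:

### Convolutional neural network basics

#### Convolutional layers

- **From fully connected layers to convolutions**: Image recognition follows two principles, and applying both of them to a fully connected layer yields convolution. The convolution operator is used for image recognition. There are 2D cross-correlation and 2D convolutional layers, as well as 1D and 3D cross-correlation. 2D cross-correlation is not the same operation as convolution.
- **Padding and stride**: Padding and stride are important concepts in a convolutional layer. Padding changes the size of the output, and stride sets the step by which the kernel moves. Both have corresponding code implementations [^1].
- **Multiple input and output channels**: A convolutional layer can have multiple input channels and multiple output channels. With multiple input channels, the convolution takes information from every channel into account; multiple output channels increase the diversity of the extracted features. A 1×1 convolutional layer has a kernel whose height and width are both 1. It does not recognise spatial information and only fuses channels, which is equivalent to flattening the input into vectors and applying a fully connected layer with a weight matrix of shape co × ci [^1][^3].

```python
# Simple example: create a convolutional layer with PyTorch
import torch
import torch.nn as nn

# Convolutional layer with 3 input channels, 6 output channels and a 3x3 kernel
conv_layer = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
```

#### Pooling layers

Pooling layers include 2D max pooling and average pooling. A max pooling layer takes the maximum value of each local region, and an average pooling layer computes the mean of each local region. Both have corresponding code implementations [^1].

```python
# Simple example: create a max pooling layer with PyTorch
import torch
import torch.nn as nn

# Max pooling layer with a 2x2 window and a stride of 2
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
```

#### The classic convolutional neural network LeNet

LeNet is a classic convolutional neural network. It has a concrete implementation, and it can be used to learn how to inspect a model by hand [^1].

### Dimensionality reduction

Non-negative matrix factorisation (NMF) is a dimensionality reduction method. Import `NMF` from `sklearn.decomposition` and create an instance with `nmf = NMF(n_components, init, tol)`, where `n_components` is the number of components (the dimension after reduction), `init` is the initialisation method and `tol` is the tolerance used as the stopping condition. Calling `nmf.fit_transform(data)` performs the reduction and returns the reduced data row by row [^4].

```python
from sklearn.decomposition import NMF
import numpy as np

# Example data (non-negative, as NMF requires)
data = np.random.rand(100, 20)

# Create the NMF instance
nmf = NMF(n_components=5)

# Fit the model and compute the reduced representation
data_reduced = nmf.fit_transform(data)
```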
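
The statement in the convolutional layer notes that a 1×1 convolution only fuses channels, and behaves like a fully connected layer with a co × ci weight matrix applied at every pixel, can be checked numerically. The sketch below uses NumPy only; the function names `conv1x1_as_matmul` and `conv1x1_direct` and the array sizes are hypothetical choices for the illustration, and no bias term is included.

```python
import numpy as np


def conv1x1_as_matmul(x, w):
    """1x1 convolution written as a fully connected layer over channels.

    x has shape (c_in, height, width); w has shape (c_out, c_in).
    """
    c_in, height, width = x.shape
    c_out = w.shape[0]
    # Flatten the spatial dimensions: each column holds the channels of one pixel
    x_flat = x.reshape(c_in, height * width)
    y_flat = w @ x_flat
    return y_flat.reshape(c_out, height, width)


def conv1x1_direct(x, w):
    """The same operation as an explicit weighted sum over input channels."""
    return np.einsum("oc,chw->ohw", w, x)


rng = np.random.default_rng(0)
x = rng.random((3, 4, 5))  # 3 input channels, 4x5 image
w = rng.random((2, 3))     # 2 output channels, 3 input channels

y = conv1x1_as_matmul(x, w)
print(y.shape)
print(np.allclose(y, conv1x1_direct(x, w)))
```

Because the kernel covers a single pixel, no neighbouring pixels enter the sum, which is why the layer cannot pick up spatial patterns.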