CA Assignment 2
Clustering Algorithms
Assignment Number: 2 (of 2)
Weighting: 15%
Assignment Circulated: 10.03.2025
Deadline: 27.03.2025
Submission Mode: Electronic via Canvas
Purpose of assessment: The purpose of this assignment is to demonstrate (1) the understanding of KMeans, (2) the
understanding of KMeans++, and (3) the understanding of evaluation metrics for clustering.
Learning outcome assessed: A critical awareness of current problems and research issues in data mining. (3) The
ability to consistently apply knowledge concerning current data mining research issues in an original manner
and produce work which is at the forefront of current developments in the sub-discipline of data mining.
1. (20) Implement the k-means clustering algorithm and cluster the dataset provided using it. Vary the value of k
from 1 to 9 and compute the Silhouette coefficient for each set of clusters. Plot k on the horizontal axis
and the Silhouette coefficient on the vertical axis in the same plot.
2. (10) Generate synthetic data of the same size (i.e. the same number of data points) as the dataset provided and
cluster this data using k-means. Plot k on the horizontal axis and the Silhouette coefficient on the vertical
axis in the same plot.
3. (20) Implement the k-means++ clustering algorithm and cluster the dataset provided using it. Vary the value
of k from 1 to 9 and compute the Silhouette coefficient for each set of clusters. Plot k on the horizontal
axis and the Silhouette coefficient on the vertical axis in the same plot.
4. (20) Implement the Bisecting k-Means algorithm to compute a hierarchy of clusterings that refines the initial
single cluster to 9 clusters. For each s from 1 to 9, extract from the hierarchy of clusterings the clustering
with s clusters and compute the Silhouette coefficient for this clustering. Plot s on the horizontal axis
and the Silhouette coefficient on the vertical axis in the same plot.
5. (20) Compute the confusion matrix, macro-averaged Precision, Recall, and F-score for the clustering shown
in Figure 1.
Figure 1: Outcome of a Clustering Algorithm
6. (10) For the same clusters as in Figure 1, compute B-CUBED Precision, Recall, and F-score.
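As a rough illustration of the B-CUBED measures named in Question 6, the sketch below computes them for a small made-up labelling. The function name bcubed and the toy labels are assumptions for illustration only; they are not the data of Figure 1. The F-score is taken here as the harmonic mean of the averaged precision and recall, which is one common convention.

```python
from collections import Counter


def bcubed(clusters, classes):
    """Input: cluster ids and true class labels, one per item, in the same order.
    Output: (precision, recall, f_score) averaged over all items (B-CUBED)."""
    if len(clusters) != len(classes) or not clusters:
        raise ValueError("need two non-empty label lists of equal length")
    cluster_size = Counter(clusters)             # items per cluster
    class_size = Counter(classes)                # items per true class
    pair_size = Counter(zip(clusters, classes))  # items sharing cluster AND class
    n = len(clusters)
    # Per-item precision: share of the item's cluster that has the item's class.
    precision = sum(pair_size[(c, g)] / cluster_size[c]
                    for c, g in zip(clusters, classes)) / n
    # Per-item recall: share of the item's class that sits in the item's cluster.
    recall = sum(pair_size[(c, g)] / class_size[g]
                 for c, g in zip(clusters, classes)) / n
    # Assumed convention: F-score from the averaged precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy labelling (NOT the clustering of Figure 1).
toy_clusters = [1, 1, 1, 2, 2, 2]
toy_classes = ["a", "a", "b", "b", "b", "a"]
p, r, f = bcubed(toy_clusters, toy_classes)
print(p, r, f)
```

In this toy case every cluster holds two items of one class and one of the other, so precision and recall both come out at 5/9.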
Important Notes
1. No credit will be given for implementing any other type of clustering algorithm or for using an existing
library for clustering instead of implementing it yourself. However, you are allowed to use
• the numpy library (any function);
• the random module;
• matplotlib for plotting; and
• pandas.read_csv, csv.reader, or similar modules only for reading data from the files.
However, it is not a requirement of the assignment to use any of those modules.
2. Your program
• should run and produce all results for Questions 1, 2, 3 and 4 in one click without requiring any
changes to the code;
• should output only the required data in a clearly structured way; it should NOT output any
intermediate steps;
• should assume that the input file is named ‘dataset’ and is located in the same folder as the
program; in particular, it should NOT use absolute paths.
3. Programs that do not run will result in a mark of zero!
4. Your code should be as clear as possible and should contain only the functionality needed to answer the
questions. Provide as many comments as needed to make sure that the logic of the code is clear enough
to a marker. Marks may be deducted if the code is obscure, implements unnecessary functionality, or
is overly complicated.
5. If you use the random module for any random choices, use a fixed seed value so that your program
always produces the same output.
6. The answers to Questions 1 to 4 should be in the form of .py files, and the answers to Questions 5 and 6
should be in PDF format.
7. The python code of the implementation of the algorithms should be included in the .py file, and not
in the report.
8. You may use or (re)use any portion of the function that calculates the Silhouette coefficient from the
solution to the tasks in Lab 6.
9. For Question 1, the name of the coding file should be KMeans.py.
10. For Question 2 the name of the coding file should be KMeansSynthetic.py.
11. For Question 3 the name of the coding file should be KMeansplusplus.py.
12. For Question 4 the name of the coding file should be BisectingKMeans.py.
13. For Questions 1 to 4, markers will run python filename.py. This should be able to generate the
corresponding plot in the current directory.
14. There will be a load_dataset function for Questions 1, 3 and 4. This function will be used to process the
dataset provided.
15. For Questions 1 to 4, the following functions should be defined in your code:
• a function called plot_silhouttee that plots the number of clusters against the Silhouette
coefficient values;
• a function called ComputeDistance that computes the distance between two points;
• a function called initialSelection that chooses the initial cluster representatives or clusters;
• a function called clustername(x,k), where x is the data and k is the number of clusters.
16. For Questions 1 to 3, the following functions should also be defined:
• a function named assignClusterIds that assigns a cluster id to each data point;
• a function named computeClusterRepresentatives that computes the cluster representatives.
17. For Question 4, there should be a computeSumfSquare function that computes the sum of squared distances
within a cluster.
18. You can use the KMeans function implemented for Question 1 in Questions 2 and 4.
19. Each function should have a comment. Each comment should describe input, output and what the
function does.
20. Edge cases should be handled (e.g. file not given, file corrupted, only 1 data point in the
file).
21. Your submission should be your own work. Do not copy or share! Make sure that you clearly understand
the severity of penalties for academic misconduct.
22. Plotting should generate the plot in the marker's current folder.
23. You’re free to include as many functions in your program as you need. Nevertheless, you should have
at least the functions specified earlier.
24. A sample program structure for KMeans is given below for illustration purposes only. You can follow
a different program structure, provided it has the same functions.
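
To make a few of the notes above concrete, here is a hypothetical sketch (not the sample structure referred to in note 24) of how ComputeDistance and initialSelection might look with a fixed seed, together with a plain Silhouette helper. It uses only the standard library. The helper name silhouette, the seed value 0, the choice of Euclidean distance, and the convention of scoring a single cluster as 0.0 are all assumptions.

```python
import math
import random

random.seed(0)  # assumed seed value; any fixed value makes runs repeatable


def ComputeDistance(a, b):
    """Input: two points as equal-length sequences of numbers.
    Output: their Euclidean distance (assumed metric) as a float."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def initialSelection(x, k):
    """Input: data x (list of points) and the number of clusters k.
    Output: k distinct points of x chosen at random as initial representatives."""
    if k < 1 or k > len(x):
        raise ValueError("k must be between 1 and the number of data points")
    return random.sample(x, k)


def silhouette(x, ids):
    """Input: data x and one cluster id per point (hypothetical helper).
    Output: mean Silhouette coefficient; 0.0 for a single cluster (assumed)."""
    clusters = {}
    for point, cid in zip(x, ids):
        clusters.setdefault(cid, []).append(point)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for point, cid in zip(x, ids):
        own = clusters[cid]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(ComputeDistance(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(ComputeDistance(point, q) for q in members) / len(members)
                for other, members in clusters.items() if other != cid)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(x)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
score = silhouette(points, [0, 0, 1, 1])
print(round(score, 3))
```

Well-separated toy clusters like these score close to 1, while putting every point in one cluster returns 0.0 under the convention assumed here.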
