32130 Data exploration and preparation


Assessment Task 2: Data exploration and preparation

Task details

This assessment will give you practical experience in data visualisation, exploration, and preparation (preprocessing and transformation) for data analytics. This assignment is individual work. Each of you will work with an individual dataset that you can download from the link below.

Objectives:

This assessment task addresses the following subject learning objectives (SLOs): 2 & 4

This assessment task contributes to the development of the following Course Intended Learning Outcomes (CILOs): D.1

Scenario

Nowadays, the Internet of Things (IoT) concept plays a pivotal role in society and brings new capabilities to different industries. The number of IoT solutions in areas such as transportation and healthcare is increasing, and new services are under development. In the last decade, society has experienced a drastic increase in IoT connections, and this growth is expected to continue across different areas in the coming years. However, several challenges still need to be addressed to enable secure operation. Thus, efforts have been made to produce datasets composed of attacks against IoT devices. The main goal of this project is to foster the development of security analytics applications in real IoT operations.

In this task, the Head of the Analytics Unit asks you to use the collected dataset to do a 3-class (Mirai-greip_flood, Recon-OSScan, DictionaryBruteForce) intrusion type classification to help understand the behaviour of attacks. As you will see, this dataset is highly complicated and includes a lot of features that make this problem more challenging.

Your tasks include:

understanding the specifics of the dataset;

extracting information about each of the attributes, possible associations between them, and any other specifics of the dataset.

The tasks in the assignment are specified below.

Datasets

For this dataset, you only have the attribute headings (here) and a paper that describes a larger version of this dataset (sensors-23-05941-v2.pdf). Each student is assigned an individual table with the actual values of these attributes. You will find your individual dataset in the link below. Your dataset is the one with your student ID in the file name.

Individual Student Datasets: Student Datasets

Tasks

1A. Initial data exploration

1. Identify the attribute type of each attribute in your dataset. If it's not clear, you may need to justify why you chose the type.

2. For each attribute, identify the values of its summarising properties, including frequency, location, and spread (e.g. value ranges, frequency of values, distributions, medians, means, variances, percentiles, and the other statistics covered in the lectures and materials given). Note that not all of these summary statistics will make sense for all attribute types, so use your judgement! Where necessary, use proper visualisations for the corresponding statistics.
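If you use Python, pandas makes these summaries direct. The sketch below assumes pandas is available and uses placeholder values and column names ("Duration", "Protocol Type"), not the real attribute headings of your assigned dataset; note how nominal attributes get frequency counts while numeric ones get location and spread statistics.

```python
import pandas as pd

# Toy frame standing in for the assigned dataset (values and column
# names are placeholders, not the real attribute headings).
df = pd.DataFrame({"Duration": [0.2, 1.5, 3.1, 0.9, 2.4, 15.0],
                   "Protocol Type": ["TCP", "UDP", "TCP", "ICMP", "TCP", "UDP"]})

# Numeric attribute: location and spread in one pass.
stats = df["Duration"].agg(["min", "max", "mean", "median", "var"])
q1, q3 = df["Duration"].quantile([0.25, 0.75])

# Nominal attribute: frequencies instead of means/variances.
freq = df["Protocol Type"].value_counts()

print(stats)
print("Q1:", q1, "Q3:", q3)
print(freq)
```

A histogram (`df["Duration"].plot.hist()`) or a bar chart of `freq` would then be the matching visualisation for each attribute type.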

3. Using KNIME or Python, explore the relationships among multiple attributes of your dataset, and identify any outliers, clusters of similar instances, "interesting" attributes, and specific values of those attributes. Note that you may need to temporarily recode attributes from nominal to numeric or from numeric to nominal. The report should include the corresponding snapshots from the tools and an explanation of what has been identified there.

Present your findings in the assignment report.

1B. Data preprocessing

Perform each of the following data preparation tasks (each task applies to the original data) using your choice of tool:

1. Use the following binning techniques to smooth the values of the following two attributes:

- Protocol Type

- Duration

For each attribute, you must apply:

I. Equi-width binning

II. Equi-depth binning

In the assignment report, for each of these techniques, you need to illustrate your steps. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet. Use your judgement in choosing the appropriate number of bins - and justify this in the report.
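If you choose Python as your tool, the two binning techniques correspond to `pd.cut` (equi-width) and `pd.qcut` (equi-depth), and smoothing by bin means follows from a group-wise transform. This is a minimal sketch with placeholder values and an assumed choice of 3 bins; pick and justify your own bin count.

```python
import pandas as pd

# Placeholder values standing in for the Duration attribute.
duration = pd.Series([0.5, 1.2, 2.8, 3.3, 4.1, 5.9, 7.0, 8.8, 9.5])

# Equi-width: 3 bins spanning equal value ranges.
equi_width = pd.cut(duration, bins=3)

# Equi-depth (equal-frequency): 3 bins with roughly equal counts.
equi_depth = pd.qcut(duration, q=3)

# Smoothing by bin means: each value replaced by its bin's mean.
smoothed = duration.groupby(equi_width, observed=True).transform("mean")

print(equi_width.value_counts())
print(equi_depth.value_counts())
print(smoothed)
```

The bin edges printed by `value_counts()` are the steps to illustrate in the report; the `smoothed` column is what goes into the Excel workbook.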

2. Use the following techniques to normalise the following attribute:

- Weight

For this attribute, you must apply:

I. min-max normalisation to transform the values onto the range [0.0, 1.0].

II. z-score normalisation to transform the values.

The assignment report provides an explanation of each of the applied techniques. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
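Both normalisations are one-line vectorised expressions in Python. The sketch below uses placeholder values for the Weight attribute and the sample standard deviation (pandas' default); state in the report which standard deviation you used.

```python
import pandas as pd

# Placeholder values standing in for the Weight attribute.
weight = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalisation onto [0.0, 1.0]:
#   v' = (v - min) / (max - min)
min_max = (weight - weight.min()) / (weight.max() - weight.min())

# Z-score normalisation (zero mean, unit sample standard deviation):
#   v' = (v - mean) / std
z_score = (weight - weight.mean()) / weight.std()

print(min_max.tolist())
print(z_score.round(3).tolist())
```

After z-scoring, the transformed column has mean 0 and standard deviation 1, which is an easy sanity check to report.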

3. Discretise the flow_duration attribute into the following categories:

Small [0, 1]

Medium (1, 10,000)

Large [10,000, inf)

Provide the frequency of each category in your dataset.

Your assignment report should provide an explanation of each of the applied techniques. In your Excel workbook file place the results in a separate column in the corresponding spreadsheet.
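Because Large is closed at 10,000 while Medium is open at both ends, an explicit mapping function states the boundary handling more plainly than fixed-edge cutting. A minimal sketch with placeholder flow_duration values:

```python
import pandas as pd

# Placeholder flow_duration values (illustrative only).
flow_duration = pd.Series([0.0, 0.5, 1.0, 1.5, 9999.0, 10000.0, 250000.0])

def discretise(v):
    """Small [0, 1], Medium (1, 10,000), Large [10,000, inf)."""
    if v <= 1.0:
        return "Small"
    if v < 10000.0:
        return "Medium"
    return "Large"

category = flow_duration.map(discretise)

# Frequency of each category, as required for the report.
print(category.value_counts())
```

Note that the boundary values 1 and 10,000 land in Small and Large respectively, matching the bracket notation above.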

4. Binarise the Header_Length variable so that it takes the values "0" or "1".

Your assignment report should provide an explanation of the applied binarisation technique. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
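One common binarisation maps zero to 0 and any non-zero value to 1; this sketch assumes that convention and placeholder Header_Length values, so state your own threshold choice in the report if it differs.

```python
import pandas as pd

# Placeholder Header_Length values (illustrative only).
header_length = pd.Series([0, 20, 0, 60, 32])

# Binarise: 0 stays 0, any non-zero value becomes 1.
binary = (header_length > 0).astype(int)

print(binary.tolist())
```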

1C. Summary

At the end of the report, include a summary section in which you summarise your findings. The summary is not a narrative of what you have done, but a condensed, informative section of what you have found about the data that you should report to the Head of the Analytics Unit. The summary may include the most important findings (specific characteristics or values of some attributes, important information about the distributions, some clusters identified visually that you propose to examine, associations found that should be investigated more rigorously, etc.).

Deliverables

The deliverables are:

A report, for which the structure should follow the tasks of the assignment, and

An Excel workbook file with individual spreadsheets for each task (spreadsheets should be labelled according to the task names, for example, "1A"). Each of the results of parts 1 to 4 in task 1B should be presented in a separate sheet (and, respectively, a table in the assignment report).

In the report, include a section (starting with a section title) for each of the tasks in the assignment.
