MCD2080 Business Statistics

Trimester 2, 2024

Group Assignment

Problem background: Glassdoor.com

Glassdoor is a free digital platform that gathers information and reviews from current and former employees about companies, salaries, and even job openings.

The dataset used for this group assignment contains a random sample of job advertisements from Glassdoor.com. It is used to analyse the current job trends in the data science field based on job positions, company size, software skills, etc.

Refer to the workbook “Job Advertisements.xlsx” in the Group Assignment section on Moodle. These data can be used to understand the software skill requirements and other factors in job advertisements for Data Analysts, Data Engineers and Data Scientists. In this assignment, your task is to investigate and report on how the expected salary is associated with factors such as job type and software skill requirements.

Data definition:

In the file “Job Advertisements.xlsx”, you are provided with both numeric and categorical data. Note that the data have already been cleaned for you, and any incomplete records have been removed. The following table contains the data definitions.

| Column | Column Name | Data Definition |
|---|---|---|
| A | Advertisement ID | The unique identifier for the job posting |
| B | Job Type | A simplified job title |
| C | Company Name | Full name of the company the advertisement is posted for |
| D | Company Size | Range of the number of employees in the company |
| E | Ownership Type | The company's type of ownership (8 ownership types provided) |
| F | Industry | The industry to which the organisation belongs |
| G | Min Salary | Minimum expected salary ($000 per year) for the job |
| H | Expected Salary | Average expected salary ($000 per year) for the job |
| I | Python | Binary indicator of whether the job requires Python knowledge/skills (1: Yes, 0: No) |
| J | AWS | Binary indicator of whether the job requires AWS knowledge/skills (1: Yes, 0: No) |
| K | Excel | Binary indicator of whether the job requires Excel knowledge/skills (1: Yes, 0: No) |

Purpose:

We wish to explore the relationships between the expected salary and the other (independent) variables using the following statistical tools:

1. Pivot Tables and Charts
2. Summary Statistics
3. Confidence Intervals
4. Hypothesis Testing
5. Regression Analysis

Assignment questions:

Answer all questions.

Week 4 Checkpoint: Do question 1

1 a). Discuss and compare the average expected salary for Data Engineers and Data Analysts using the following factors:

Ownership

Industry

Construct appropriate charts to support your discussion. Keep your discussion succinct.

Your answer to this question should not be longer than 1-2 pages.
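The averages in part (a) are normally read off an Excel pivot table with "Average of Expected Salary" as the value field. As a minimal sketch of the same calculation, assuming the worksheet rows have already been loaded into Python as dictionaries keyed by the column names in the data definition table, the hypothetical helper `average_by_group` below returns the mean Expected Salary for each combination of grouping columns. The job type and ownership labels in the sample rows are invented for illustration and may not match the workbook's actual labels.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, group_keys, value_key="Expected Salary"):
    """Mean of value_key for each combination of group_keys
    (the same summary a pivot table with 'Average' produces)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in group_keys)  # e.g. ("Data Analyst", "Private")
        groups[key].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical rows for illustration only (salaries in $000 per year)
rows = [
    {"Job Type": "Data Analyst", "Ownership Type": "Private", "Expected Salary": 70.0},
    {"Job Type": "Data Analyst", "Ownership Type": "Private", "Expected Salary": 80.0},
    {"Job Type": "Data Engineer", "Ownership Type": "Public", "Expected Salary": 110.0},
]
print(average_by_group(rows, ["Job Type", "Ownership Type"]))
```

Swapping "Ownership Type" for "Industry" in the second argument gives the industry breakdown.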

b). We wish to compare the distribution of expected salary between Data Analysts and Data Engineers.

Generate summary statistics and histograms, and use them to compare the distributions. In your discussion, include measures of central tendency, variability and shape.

When discussing, include contextual interpretations of the measures used.

Your answer to this question should not be longer than 2 pages. (14 marks)
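The measures in part (b) can be cross-checked with a short script. This is a sketch only. `summary_stats` is a hypothetical helper that assumes one job type's Expected Salary values are passed in as a list. It uses the sample standard deviation (n − 1 denominator, as Excel's STDEV.S does), inclusive quartiles, and the adjusted sample skewness n/((n − 1)(n − 2)) × Σ((x − x̄)/s)³, which is the formula Excel's SKEW uses. It needs at least three values.

```python
from statistics import mean, median, quantiles, stdev


def summary_stats(values):
    """Central tendency, variability and shape for one group of salaries."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values for skewness")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # linear interpolation
    # Adjusted sample skewness: > 0 means a longer right tail
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    return {
        "n": n,
        "mean": m,
        "median": median(values),
        "std_dev": s,
        "coef_of_variation_pct": 100 * s / m,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
        "skewness": skew,
    }


# Hypothetical salaries ($000 per year) for illustration only
print(summary_stats([62.5, 70.0, 75.0, 81.0, 140.0]))
```

A mean well above the median, together with positive skewness, indicates a right-skewed salary distribution. The histograms should show the same pattern.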

Week 7 Checkpoint: Do questions 2 & 3.

2. We will now compare the expected salaries of Data Analysts and Data Engineers.

a). Calculate the 95% confidence interval estimate of the true average expected salary for Data Analysts and for Data Engineers. Report your results using the table below.

Confidence Interval Estimate of Average Expected Salary for Job Types

| Job Type | Lower Boundary / Limit | Upper Boundary / Limit |
|---|---|---|
| Data Analysts | | |
| Data Engineers | | |
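Each interval here is x̄ ± t × s/√n, where t is the two-tailed 95% critical value with n − 1 degrees of freedom. Python's standard library has no t distribution, so this minimal sketch takes the critical value as an argument, for example from Excel's T.INV.2T(0.05, n-1). `mean_confidence_interval` is a hypothetical helper, and the data in the usage line are toy values, not salaries from the workbook.

```python
import math
from statistics import mean, stdev


def mean_confidence_interval(values, t_crit):
    """Confidence interval for a population mean: x_bar +/- t_crit * s / sqrt(n).

    t_crit is the two-tailed critical value for n - 1 degrees of freedom.
    """
    n = len(values)
    x_bar = mean(values)
    std_err = stdev(values) / math.sqrt(n)  # estimated standard error of the mean
    margin = t_crit * std_err               # margin of error
    return x_bar - margin, x_bar + margin


# Toy data with n = 10, so df = 9 and T.INV.2T(0.05, 9) is about 2.262
lower, upper = mean_confidence_interval(list(range(1, 11)), t_crit=2.262)
print(round(lower, 3), round(upper, 3))  # 3.334 7.666
```

The same helper applies to each subgroup in part (b), for example Data Engineers with Excel = 1, as long as the critical value is recomputed for that subgroup's sample size.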

b). Calculate the 95% confidence interval estimate of the true average expected salary for Data Analysts and for Data Engineers, according to whether the job requires each of the following software skills:

• Excel
• Python
• AWS

For each skill and job type, report your results in the format of the examples below.

Confidence Interval Estimate of Average Expected Salary of Data Analysts requiring Excel Skills

| Excel Skills | Lower Boundary / Limit | Upper Boundary / Limit |
|---|---|---|
| 0 (No) | | |
| 1 (Yes) | | |

Confidence Interval Estimate of Average Expected Salary of Data Engineers requiring Excel Skills

| Excel Skills | Lower Boundary / Limit | Upper Boundary / Limit |
|---|---|---|
| 0 (No) | | |
| 1 (Yes) | | |

(Please use a similar format for Python and AWS)

c). Discuss the results obtained in (a) and (b), covering every table you produced.

For part (c) only, your answer should be less than one page long. (20 marks)

3. We wish to disentangle the relationship between expected salary and Excel skills in each job type.

Use your knowledge in Hypothesis Testing to answer the following questions.

a). Do a majority/minority of data analyst roles require Excel skills?

b). Do a majority/minority of data analyst roles require Python skills?

c). Do a majority/minority of data engineer roles require Excel skills?

d). Do a majority/minority of data engineer roles require Python skills?

Hint: For each test, state the hypotheses, p-value and conclusion in the context of the question. (6 marks)
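Each question in this part can be framed as a one-sample test of a proportion with H0: p = 0.5. Use H1: p > 0.5 for "majority" or H1: p < 0.5 for "minority", and reject H0 at the 5% level when the p-value is below 0.05. The sketch below uses the normal-approximation z-test through `statistics.NormalDist`. It assumes the common rule of thumb that n·p0 and n(1 − p0) are both at least 5. `proportion_z_test` is a hypothetical helper, and the counts in the example are invented, not taken from the workbook.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5, alternative="greater"):
    """One-sample z-test for a population proportion (normal approximation).

    H0: p = p0. alternative="greater" tests H1: p > p0 (a majority when p0 = 0.5);
    alternative="less" tests H1: p < p0 (a minority when p0 = 0.5).
    Returns (sample proportion, z statistic, one-tailed p-value).
    """
    if min(n * p0, n * (1 - p0)) < 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    std_err = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_err
    if alternative == "greater":
        p_value = 1 - NormalDist().cdf(z)
    elif alternative == "less":
        p_value = NormalDist().cdf(z)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return p_hat, z, p_value


# Hypothetical counts: 60 of 100 sampled data analyst ads have Excel = 1
p_hat, z, p_value = proportion_z_test(60, 100, alternative="greater")
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
# p-hat = 0.60, z = 2.00, p-value = 0.0228
```

Here the count of successes is simply the number of ads of that job type with the skill indicator equal to 1.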

Week 11 Final presentation and report submission: Do questions 4 & 5.

4. Estimate a multiple regression model to analyse the relationship between:

Expected Salary and the other variables, such as the three software skills, the two job types (Data Analysts and Data Engineers) and the minimum salary. You are required to produce one multiple regression output.

This section should include an analysis of the statistical significance of the various factors in the model. Highlight the key factors that the multiple regression identifies as drivers of Expected Salary.

Your answer to this question should be approximately 1 to 1.5 pages.    (15 marks)
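The regression output itself, including standard errors and p-values, would normally come from Excel's Data Analysis ToolPak (Regression). To show what the coefficients represent, here is a minimal sketch that fits ordinary least squares by solving the normal equations (X'X)b = X'y. It only returns coefficients and R², not p-values. `fit_ols` and `r_squared` are hypothetical helpers. The assumed predictor columns are Min Salary, Python, AWS, Excel, and 0/1 dummies for Data Analyst and Data Engineer. Because the background says the data also cover Data Scientists, that group would presumably act as the baseline category, which avoids the dummy-variable trap. The toy data below are constructed, not taken from the workbook.

```python
def fit_ols(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows (one per advertisement), each a list of predictor
    values; an intercept column of 1s is prepended. Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            # A (numerically) zero pivot signals perfectly collinear predictors
            raise ValueError("predictors are collinear; check the job-type dummies")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(X, y, coefs):
    """Share of the variation in y explained by the fitted model."""
    y_bar = sum(y) / len(y)
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy data built as y = 1 + 2*x1 + 3*x2, so the fit should recover [1, 2, 3]
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs], round(r_squared(X, y, coefs), 6))
```

Each fitted coefficient on a 0/1 skill indicator estimates the difference in expected salary ($000 per year) associated with that skill requirement, holding the other predictors constant.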

5. Based on the statistical analysis and results in questions 1 to 4, draw conclusions on the following:

a). All factors associated with Expected Salary.

b). The importance of software skills for different job types.

c). Recommendations for job seekers to improve their ability to obtain higher-paying employment.

Your answer to this question should be approximately 1 to 1.5 pages.   (20 marks)

Assignment marks

The maximum total mark for the assignment is 175. Your total score will be composed of two parts:

• Final assignment report (Questions 1-5): a maximum mark of 75.

• Presentation: a maximum mark of 100

(i). Week 4 checkpoint: 20 (staff: 10 & peer-to-peer evaluation: 10)

(ii). Week 7 checkpoint: 30 (staff: 15 & peer-to-peer evaluation: 15)

(iii). Week 11 checkpoint: 40 (staff: 20 & peer-to-peer evaluation: 20)

Please note that any group member who does not give feedback to other group members will be awarded zero marks.

You will be required to fill in the peer evaluation on Teammates to be eligible for this component.

Please note that the Unit Leader reserves the right to adjust individual report marks based on the peer evaluation. Should the feedback indicate that an individual did not contribute to the group assignment, that individual's report mark will be adjusted to zero, meaning the group assignment will contribute 0% to their final grade.

Report requirements:

● All answers should be in 12 pt font with 1.5 line spacing.

●     Plots and tables must be legible, with appropriate labels to aid readers.

●     Statistical results need to be summarised in succinct table formats.

●     You will lose marks for poor presentation.

Presentation:

Use PowerPoint or a cloud-based app such as Google Slides, Prezi or Visme.

Week 11 Final Assignment submission guidelines

• The submission link is set up using an Assignment tool on Moodle. Please submit the group report/answers as a Word document or PDF.

•     If the question has sub-parts, for example, (a), (b) …, please indicate the labels for each part clearly.

•     DO NOT click on "submit all and finish" before you finish all questions.

ONLY one attempt is allowed for the assignment. Group members should appoint one member to submit on behalf of the group.
