Loop, data, and MapReduce

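This article discusses the importance and ubiquity of loop structures (including recursion) in computer programs. It looks at how loops are used to model real-world phenomena and discusses two main uses of loops: processing data sets and numerical integration. It also introduces the idea of parallel processing and how Google's MapReduce exploits it.
Copyright (c) prototype, all rights reserved.
You are welcome to repost this article freely, provided that the original content (including the author information) is left unchanged.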

There is an old story about a job interview. I have forgotten who first wrote or told it, but I still remember the story clearly after many years. In it, the interviewee, who was applying for a programming job, was asked a "basic" interview question: "Tell me, what kind of programs are you good at writing?" He pondered for a few seconds and then answered: "I am good at writing loops..."

 

The loop (including recursion, in the general sense) is a fundamental structure in computer programs. No matter what problem you try to solve with a program in a general-purpose language, you can hardly avoid writing loops! Unless... well, let me take a small step back: unless you are one of the few lucky or unlucky people who write code at a very high level or in a strange language (e.g., Makefile :-).

 

It is interesting to ask: why are loops so ubiquitous in computer programs? What I came up with is that, first of all, iteration shows up in all kinds of real-world phenomena that programs try to model. Every day the Sun rises and sets, and we do the same things again and again... The Earth spins on its axis and orbits the Sun, the solar system rotates, the Milky Way spins (I don't know whether it also revolves around something else)... Why does the whole universe have to rotate? That is quite a profound question that came to my mind. I don't have an answer; if you do, let me know. Anyway, it is hard to think of a better way to model all this in programs than loop structures. The second reason is perhaps rooted in a gap between human beings and machines. Machines do not understand; they just carry out instructions. Therefore, to get a machine to work, a human's idea must be translated into machine instructions, and this translation cannot be performed without the concept of iteration. If you have ever studied "algorithms", you know what I am talking about.

 

What on earth can loops do, then? I think there are two fundamental things they typically do. One is to process a set of data (PASD). For example, in linear algebra, to add two vectors we go through one vector element by element and add to each element the corresponding element of the other vector. In code, it is simply this:

 

for (int i = 0; i < vector_size; i++) {
    vector1[i] += vector2[i];
}

 

The other one is to integrate functions (IF). For example, to calculate the integral of a given function numerically, we could write a loop like this:

 

float x = start;
float dx = step;
float integral = 0.0;
/* Stop before x reaches end; x <= end would add one extra slice beyond end. */
while (x < end) {
    integral += func(x, integral) * dx;
    x += dx;
}

 

Note that I deliberately wrote func so that its current value depends not only on x but also on integral. This point matters because it exposes a fundamental difference between PASD and IF: in the former, each cycle is independent of all the others, whereas in the latter, each cycle depends on at least the previous cycle.
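
To make the dependence in IF concrete, here is a minimal sketch in C. The names growth and integrate_steps, the starting value y0, and the fixed step count are my own assumptions for illustration; they are not part of the snippet above. The point is only that each cycle reads the value produced by the previous one, so the cycles cannot be reordered or run independently.

```c
#include <assert.h>

/* Hypothetical integrand: the rate of change equals the running value y. */
double growth(double x, double y) {
    (void)x;  /* x is unused in this particular example */
    return y;
}

/* Accumulate func over [start, end] in n_steps equal steps, starting from y0. */
double integrate_steps(double (*func)(double, double),
                       double start, double end,
                       int n_steps, double y0) {
    assert(n_steps > 0);
    double dx = (end - start) / n_steps;
    double y = y0;
    for (int i = 0; i < n_steps; i++) {
        double x = start + i * dx;
        y += func(x, y) * dx;  /* uses y left behind by the previous cycle */
    }
    return y;
}
```

Counting steps with an integer, instead of comparing a floating-point x with end, is a design choice made here to keep the number of cycles exact.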

 

Once we realize that each cycle in PASD is an independent calculation, we can answer this question: do we have to use a loop to do PASD? No, provided that we have as many processors as data items, so that each processor deals with just one datum. Now you see where I am heading. Yes, PASD is parallelizable.
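
As a rough sketch of what this can look like in practice, the vector addition can be handed to several threads with an OpenMP directive. OpenMP is my choice for illustration, not something this article depends on, and vector_add is a hypothetical name. When the compiler is not asked to enable OpenMP (for GCC or Clang, the -fopenmp flag), the pragma is ignored and the loop simply runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Add src into dst element by element. Every iteration touches only its own
 * index i, so the iterations can be divided among threads in any order. */
void vector_add(double *dst, const double *src, int n) {
    assert(dst != NULL && src != NULL && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```

The same directive could not be applied to the integration loop as written, because there each cycle needs the value produced by the one before it.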

 

Google's MapReduce (MR) is a product of this kind of thinking. The computing model of MR is very simple yet quite general: first, transform the raw data into a useful form (mapping); then compute on the transformed data to generate the final result (reducing). Mapping is exactly PASD and can be implemented in parallel. MR's main selling point is basically a (good) implementation of this parallelism.
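
The two phases can be sketched in a few lines of single-process C. This is only a toy illustration of the map-then-reduce shape, not Google's MapReduce API; map_apply, reduce_fold, square, add, and sum_of_squares are all hypothetical names chosen for this example.

```c
#include <assert.h>

/* Hypothetical per-element transform used in the map phase. */
long square(long v) { return v * v; }

/* Hypothetical combining step used in the reduce phase. */
long add(long acc, long v) { return acc + v; }

/* Map phase: each output depends only on its own input (PASD). */
void map_apply(const long *in, long *out, int n, long (*fn)(long)) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        out[i] = fn(in[i]);
    }
}

/* Reduce phase: fold the mapped values into a single result. */
long reduce_fold(const long *in, int n, long init, long (*fn)(long, long)) {
    long acc = init;
    for (int i = 0; i < n; i++) {
        acc = fn(acc, in[i]);  /* each cycle uses the previous accumulator */
    }
    return acc;
}

/* Map then reduce: the sum of the squares of the input values. */
long sum_of_squares(const long *in, long *scratch, int n) {
    map_apply(in, scratch, n, square);
    return reduce_fold(scratch, n, 0, add);
}
```

In this sketch the map loop has independent cycles, while the fold carries an accumulator from cycle to cycle, which mirrors the PASD versus IF distinction drawn earlier.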

 

Let me examine the limitations of the MR computing model: (1) it is a simple computing model and does not fit every problem; (2) it may bring no benefit when the data set is small, because the overhead of parallelism is then comparatively large; (3) it may bring no benefit when the calculation in PASD is trivial, so that the mapping step is not the rate-limiting one.

 

The most important lesson I learned from this thought journey is: weight shifting! Shift the weight from whatever other parts of your code to the data-processing part, since the latter is (more) parallelizable. One important way to shift is to transform the data from a complicated form into a simpler one.

 
