2010-12-26

December 22, 2011
  [b]Unit 1[/b]
  III vocabulary
  1. 1) substituted 2) analogy 3) represented 4) associated 5) challenge
  6) converted 7) concept 8) reduced 9) image 10) bundles
  11) choose 12) pointed 13) instead 14) various
  [b]IV. Usage[/b]
  1. took 2. go 3. take 4. go 5. go
  [b]V. Structure[/b]
  1. What caused the fire
  2. What size of shoes my father wears
  3. What looked like a ball
  4. What our family and friends do for us
  5. What she had bought for his birthday
  [b]VI. Translation[/b]
  1. What the boy likes to do most is putting together building blocks.
  2. In terms of previous working experience, John is the best choice for this position.
  3. My physics teacher often uses analogy to explain some difficult concepts.
  4. With the help of his family and friends, Tom built up his publishing business bit by bit.
  5. Linda was not able to go to that famous college, but she planned to start all over again rather than give up the challenge.
  6. This company has a very good public image. People always associate its product with high quality and good service.
  [b]TEXT B[/b]
  [b]Exercise I[/b]
  1. recognized 2. later on 3. fall back on 4. slightest 5. alternative
  6. figure out 7. convinced 8. complicated 9. struck terror into
  10. oral 11. sound 12. follow 13. doubts 14. master
  UNIT 2
  TEXT A
  III. Vocabulary activities
  1. 1) back and forth 2) destination 3) terminals 4) distinction
  5) are not supposed to 6) bet 7) rotten 8) racial
  9) board 10) inexpensive 11) delight 12) ride
  13) pretended 14) increasing 15) valuable
  2. 1) come up with 2) turned out 3) hold on to 4) take over
  5) picked up speed 6) head for
  IV. Word formation
  1. reboarded 2. invaluable 3. inarguably 4. interracial 5. unlikely
  V. Structure
  1. On the table were some flowers.
  2. Many a time have I climbed that hill.
  3. Only in this way can we solve the problem.
  4. So many times has Mother told you to go to bed before 10 o’clock.
  5. No better person can I think of for this job.
  VI. Translation
  1. The children are pretty annoyed that their parents won’t allow them to play around the railway track.
  2. I bet if I pick up a little speed I will reach the destination sooner than they do.
  3. You don’t want to go out in such rotten weather. It’s better for you to stay home and stretch your legs and do physical exercises.
  4. It’s half past ten, and you’re not supposed to be sleeping! It’s time to head for the airport to pick up your cousin!
  5. Who came up with the idea to ask Mike to take over the project?
  6. The school makes no distinction in treating students from different racial backgrounds.
  TEXT B
  Exercise I
  1. oversleeping 2. disappointed 3. grades 4. parental
  5. figure 6. fuss 7. have it in for 8. sack
  9. attitude 10. come by 11. changed
  [b]VI. Answers to the Translation[/b]
  1. He is a qualified mechanic, but he winds up with a job in international trade.
  2. He enrolled in an elementary computer-training program in his spare time but failed to get through.
  3. After the interviews, the principal chose several outstanding university graduates to work as teachers.
  4. This contract is very important to our company. The more concrete it is, the better. I need to talk it over with my colleagues.
  5. The boy suffers from severe leukemia and has to be transferred to a big hospital for further treatment.
  6. When he heard that the school where his father worked had closed down, tears rolled down his cheeks.
  [b]Text B[/b]
  [b] A Teacher’s Story[/b]
  [b]Difficult Sentences:[/b]
  1. Mrs. Thompson would actually take delight in marking his papers with a broad red pen, making bold X’s and then putting a big “F” at the top of his papers.
  实际上,汤姆森夫人常常用一支粗粗的红笔来批改他的试卷,划上大大的叉,在试卷上端写下大大的“F(不及格)”,并以此为乐。
  2. At the school where Mrs. Thompson taught, she was required to review each child’s past records.
  按照汤姆森夫人所在学校的规定,她必须查阅每个孩子的档案记录。
  3. She felt even worse when her students brought her Christmas presents, wrapped in beautiful ribbons and bright paper, except for Teddy’s.
  学生给她带来了圣诞礼物,这些礼物都用鲜艳的纸包着,还扎上美丽的丝带,只有泰迪的除外。这时她感到更难受了。
  4. Some of the children started to laugh when she found a bracelet with some of the stones missing and a bottle that was one quarter full of perfume.
  她看到是一个手链,上面的一些宝石已经脱落了,还有一瓶香水,里面的香水只剩下四分之一。一些孩子见此情形开始大笑起来。
  Unit 3
  III
  1) mechanic 2) confused 3) qualified 4) interview 5) scarce 6) outstanding 7) severe 8) license 9) elementary 10) transferred 11) praise 12) arrive 13) is settled 14) officially 15) roll
  1) close down 2) mop up 3) get by 4) out of the question 5) talk … over 6) get in the way
  text B
  I
  1. clumsily
  2. exclaimed
  3. took delight in
  4. slumped
  5. spray
  6. bold
  7. took pains
  8. has affected
  9. wrap
  10. withdrawn
  11. noticed
  12. required
  13. ashamed
  14. was missing
  Unit 4
  III vocabulary
  1) apology 2) responded 3) relationship 4) abilities 5) argument 6) spilled 7) mood 8) tradition 9) tremendous 10) rudely 11) touched 12) failed
  1) figure out 2) pull … through 3) bring out 4) added to 5) light up 6) at least 7) a touch of
  [b]VI. Answers to the Translation[/b]
  1. Mr. Bruce made an apology for his disrespect for the local traditions.
  2. At the meeting the two parties exchanged their opinions on the relationship between the two countries.
  3. He is in such a mood that it is not appropriate for him to appear in public.
  4. You should at least try not to spill the water when carrying it.
  5. The argument of this scientist received tremendous support from academic circles.
  6. No matter what difficulties you may come across, we will pull you through.
  [b]Text B[/b]
  [b] Dear Daughter [/b]
  [b]Difficult Sentences:[/b]
  [b]1. I will miss seeing how your eyes sparkle when you are excited about something and want to share it with me.[/b]
  [b]当你为某事兴奋不已并希望和我分享这事的时候,你的眼睛里就闪烁着喜悦的光芒,我会怀念你这样的神情。[/b]
  [b]2. It all came to a dead stop the night the car crashed into the wall.[/b]
  [b]在汽车撞上墙壁的那个夜晚,一切都戛然而止。[/b]
  [b]3. Just an empty shell where our beautiful baby girl used to live.[/b]
  [b]我们美丽的宝贝如今只剩下一个失去了灵魂的躯壳。[/b]
  [b]4. Wisdom is learning to face those temptations, think about what God and your parents would have you do, and then just say “no”.[/b]
  [b]智慧就是学会面对这些诱惑,想想上帝和父母会让你怎样做,然后说“不”。[/b]
  [b]Unit 5[/b]
  [b]III. [/b]
  [b]1. [/b][b]event[/b]
  [b]2. [/b][b]routine [/b]
  [b]3. [/b][b]message [/b]
  [b]4. [/b][b]had detected [/b]
  [b]5. [/b][b]manner[/b]
  [b]6. [/b][b]devices[/b]
  [b]7. [/b][b]essential[/b]
  [b]8. [/b][b]impact[/b]
  [b]9. [/b][b]enables[/b]
  [b]10. [/b][b]regular[/b]
  [b]11. [/b][b]research[/b]
  [b] [/b]
  [b] [/b]
  [b]2 [/b]
  [b]1. for instance[/b]
  [b]2. are equipped with[/b]
  [b]3. switch off[/b]
  [b]4. be on the lookout for[/b]
  [b]5. took the hint[/b]
  [b]6. speed up[/b]
  [b]7. based on[/b]
  [b] [/b]
  [b]VI. Answers to the Translation[/b]
  1. Will mobile communication edge out fixed lines as the most frequently used means of communication?
  2. I pointed at the clock on the wall. My daughter took the hint and sped up dressing herself.
  3. Please switch off your cell phone or set it in silent mode during the meeting.
  4. Meteorologists are on the lookout for the progress of the typhoon so as to alert the public in time.
  5. Xiao Zhang cited the steady economic growth in China and its bright future to explain his decision to return to work here.
  [b]Text B Cell Phone Etiquette[/b]
  Difficult Sentences
  1. 手机礼仪其实只是普通礼节
  2. 社会必须制定礼貌使用新设备的规则。而我们的手机礼仪还在演变中。
  3. 在葬礼、婚礼上和其他类似场合,最好把你的手机留在家里,或者至少留在车上。
  4. 通常,你在同他人会面之初就应告知对方你在等一个重要的电话并请他们允许你接听。这样做是符合礼仪的。
Experiment 3: Counting the items each buyer has favorited on an e-commerce site

An e-commerce site has data on its users' favorited items, recording the ID of each favorited item and the date it was favorited. The dataset is named buyer_favorite1 and contains three fields: buyer ID, item ID and favorite date. Fields are separated by "\t". The sample data and format are as follows:

buyer id	item id	favorite date
10181	1000481	2010-04-04 16:54:31
20001	1001597	2010-04-07 15:07:52
20001	1001560	2010-04-07 15:08:27
20042	1001368	2010-04-08 08:20:30
20067	1002061	2010-04-08 16:45:33
20056	1003289	2010-04-12 10:50:55
20056	1003290	2010-04-12 11:57:35
20056	1003292	2010-04-12 12:05:29
20054	1002420	2010-04-14 15:24:12
20055	1001679	2010-04-14 19:46:04
20054	1010675	2010-04-14 15:23:53
20054	1002429	2010-04-14 17:52:45
20076	1002427	2010-04-14 19:35:39
20054	1003326	2010-04-20 12:54:44
20056	1002420	2010-04-15 11:24:49
20064	1002422	2010-04-15 11:35:54
20056	1003066	2010-04-15 11:43:01
20056	1003055	2010-04-15 11:43:06
20056	1010183	2010-04-15 11:45:24
20056	1002422	2010-04-15 11:45:49
20056	1003100	2010-04-15 11:45:54
20056	1003094	2010-04-15 11:45:57
20056	1003064	2010-04-15 11:46:04
20056	1010178	2010-04-15 16:15:20
20076	1003101	2010-04-15 16:37:27
20076	1003103	2010-04-15 16:37:05
20076	1003100	2010-04-15 16:37:18
20076	1003066	2010-04-15 16:37:31
20054	1003103	2010-04-15 16:40:14
20054	1003100	2010-04-15 16:40:16

Write a MapReduce program that counts the number of items each buyer has favorited, and write up a lab report. Just give the complete code.
03-28
<think>Okay, I need to help the user with their lab exercise: counting how many items each buyer on an e-commerce site has favorited, using a MapReduce program. First I should make sure I understand the requirement.

The sample data has three tab-separated fields: buyer ID, item ID and favorite date. A buyer might favorite the same item more than once, so should duplicates be removed? The task asks for the "number of favorited items", which usually means distinct items rather than favorite events. In the sample, buyer 20056 favorites many items on 2010-04-15 and the item IDs all look different, so a plain record count might be enough. To be safe, though, I will count distinct items, so that a buyer who favorites the same item several times is counted once for it.

Next, the processing logic. One option is to emit a composite key (buyer ID, item ID) from the mapper, so that each distinct pair appears only once as a key on the reduce side, and then count the pairs per buyer. The other is to emit the buyer ID as the key and the item ID as the value, and deduplicate in the reducer with a Set. The composite-key approach may need a Combiner or an extra counting step, while the Set approach keeps all of a buyer's item IDs in memory, which can be a problem on large data, for example when one buyer has favorited millions of items. A Bloom filter or a similar structure could be considered in that case, but for this lab the data is assumed to be small, so a HashSet is fine. I will use the buyer ID as the key and deduplicate in the reducer.

Mapper logic: read each line, split it into buyer ID, item ID and date, and emit the buyer ID as the key and the item ID as the value. Malformed lines could be handled as exceptions, but the lab probably assumes clean data.

Reducer logic: for each buyer ID, add all item IDs to a HashSet and emit the buyer ID together with the size of the set.

Types: the mapper emits Text (buyer ID) and Text (item ID); the reducer emits Text (buyer ID) and IntWritable (count). Because the mapper's output types differ from the reducer's, the driver has to set them explicitly: setOutputKeyClass and setOutputValueClass describe the reducer output (Text, IntWritable), and setMapOutputKeyClass and setMapOutputValueClass describe the mapper output (Text, Text).

The user asked for complete code, so the import statements and the enclosing class need to be included along with the mapper, the reducer and the driver. The user can then compile the code, package it as a JAR and run it on a Hadoop cluster with the input and output paths as arguments. The result is the deduplicated number of favorited items for each buyer ID.</think>

Here is the complete MapReduce program for counting the items each buyer has favorited:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BuyerFavoriteCount {

    // Emits <buyer ID, item ID> for every well-formed input line.
    public static class FavoriteMapper extends Mapper<Object, Text, Text, Text> {
        private final Text buyerId = new Text();
        private final Text itemId = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            // Skip lines that do not have all three tab-separated fields.
            if (fields.length >= 3) {
                buyerId.set(fields[0].trim());
                itemId.set(fields[1].trim());
                context.write(buyerId, itemId);
            }
        }
    }

    // Counts the distinct item IDs received for each buyer ID.
    public static class FavoriteReducer extends Reducer<Text, Text, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The set drops repeated item IDs for the same buyer.
            Set<String> uniqueItems = new HashSet<>();
            for (Text val : values) {
                uniqueItems.add(val.toString());
            }
            result.set(uniqueItems.size());
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Buyer Favorite Count");
        job.setJarByClass(BuyerFavoriteCount.class);
        job.setMapperClass(FavoriteMapper.class);
        job.setReducerClass(FavoriteReducer.class);

        // Mapper output types differ from the final output types,
        // so both pairs are set explicitly.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

### Code notes

1. **Mapper phase**:
   - Input format: `<line offset, text line>`
   - Parses each line and extracts the buyer ID and item ID
   - Emits key-value pairs: `<buyer ID, item ID>`
2. **Reducer phase**:
   - Deduplicates item IDs with a HashSet
   - Counts the unique items
   - Output format: `<buyer ID, number of favorited items>`
3. **Program characteristics**:
   - Input data format: TSV text file
   - Skips records with fewer than three tab-separated fields
   - Counts unique items exactly (duplicates are removed)

### How to run

1. Package the code as a JAR file.
2. Run the following command on the Hadoop cluster:

```shell
hadoop jar BuyerFavoriteCount.jar /input/path /output/path
```

This form assumes the JAR's manifest names `BuyerFavoriteCount` as its main class; otherwise, pass the class name right after the JAR file name.

### Sample output

```
10181	1
20001	2
20042	1
20054	6
20055	1
...
```

Each buyer ID is followed by the number of unique items that buyer has favorited.
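
The reducer's deduplication step can be checked without a cluster. The following is a minimal in-memory sketch, not part of the answer above: the class `DistinctFavoriteCounter`, the method `countDistinct`, the header text `buyer_id`/`item_id`/`fav_date` and the repeated record are all hypothetical and only serve the illustration. It also assumes that buyer IDs are purely numeric, which is what lets it skip a header row if the file has one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: repeats the mapper's parsing and the reducer's
// distinct count in memory, so the logic can be tried without Hadoop.
class DistinctFavoriteCounter {

    // Returns buyer ID -> number of distinct item IDs, sorted by buyer ID.
    static Map<String, Integer> countDistinct(List<String> lines) {
        Map<String, Set<String>> itemsByBuyer = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            // Same guard as the mapper: need all three tab-separated fields.
            if (fields.length < 3) {
                continue;
            }
            String buyerId = fields[0].trim();
            // Assumption: buyer IDs are numeric, so a header row is skipped.
            if (!buyerId.matches("\\d+")) {
                continue;
            }
            itemsByBuyer.computeIfAbsent(buyerId, k -> new HashSet<>())
                        .add(fields[1].trim());
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : itemsByBuyer.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "buyer_id\titem_id\tfav_date",            // hypothetical header row
            "10181\t1000481\t2010-04-04 16:54:31",
            "20001\t1001597\t2010-04-07 15:07:52",
            "20001\t1001560\t2010-04-07 15:08:27",
            "20001\t1001560\t2010-04-09 09:00:00",    // hypothetical repeat of an item
            "malformed line");
        // Prints {10181=1, 20001=2}
        System.out.println(countDistinct(sample));
    }
}
```

The repeated record for buyer 20001 is counted once, which is the behavior the HashSet gives the reducer. Whether a header row needs skipping depends on the actual input file.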