amuseme_lu - 优快云 Blog

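Original: Intelligent World 2030

Analyzes how future computing scenarios will develop, covering smarter AI, more inclusive AI, deeper perception, experiences beyond reality, more precise exploration, and more efficient data-driven innovation. Summarizes the vision and key technical features of Computing 2030: physical-layer breakthroughs, green and intensive computing, intelligent cognition, multi-dimensional collaboration, diversified computing, and intrinsic security. Summarizes the vision and key technical features of Communications Network 2030: cubic ultra-broadband networks, deterministic experience, AI-native design, integrated communication and sensing, security and trustworthiness, and green, low-carbon operation. Analyzes the direction of the new round of energy transformation: the cost competitiveness of wind and photovoltaic power, power electronics as a safeguard for energy system transformation, and digital technology enabling the energy system's evolution toward intelligence.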

2024-02-24 11:04:11 1145

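Original: A Reading of the Implementation Opinions on Deepening the "East Data, West Computing" Project and Accelerating the Construction of a Nationally Integrated Computing Power Network

In addition, applications built on the computing infrastructure need to reach industry more efficiently, idle existing computing resources need to be put to use, and overall utilization needs to rise. The rise of large-model technology in 2023 lifted the whole ecosystem around the computing power industry. In 2024, foundation models will consolidate further, while innovative large-model applications will flourish and reach into every industry, improving enterprise management and service efficiency, marketing and operations efficiency, personal IoT smart devices, and many other aspects of business and personal life. The document also proposes five integrated, coordinated plans: from building hybrid computing power, to coordinating computing power between central and western China, to integrating computing power, data, and algorithms, and finally to establishing green computing centers and the corresponding security mechanisms.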

2024-01-03 23:21:28 1382

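Original: [AI Infrastructure] Resource Management Systems for Intelligent Computing and Future Outlook

First, to clarify two concepts: high-performance computing (HPC) and intelligent computing. HPC mainly targets scenarios such as weather forecasting, biological computing, and materials computing, while intelligent computing, which has been very popular in recent years, mainly targets AI workloads such as speech recognition, image recognition, and autonomous driving. The two share a common foundational component: a distributed resource management and job scheduling and execution service. In HPC the most widely used are Slurm and LSF; in big data, Yarn and K8s are more common. In the future, however, intelligent computing, and even HPC and big data (data lake) workloads, will converge on K8s. Mixing long-running GPU tasks (inference) with short tasks, among other techniques, raises overall resource utilization and lowers the cost of computing.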

2023-12-17 21:47:47 610

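Original: [Big Data] Some Views on the Future of the Hadoop Ecosystem

These three papers opened the big data era in industry and are known as Google's "troika". Overall, in the post-Hadoop era, big data technologies will gradually become stable software, much like an operating system, and their adoption among companies will keep growing. Whether companies use public cloud services or mature commercial products, the cost of adoption will fall, which also drives the pace and depth of enterprise digital transformation. So in the future we should focus more on the value these tools bring to the business, much like asking how many apps that truly create value for enterprises and individuals can be built on a mature operating system. That is what we need to focus on.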

2023-12-12 23:19:46 719

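Original: [AI Computing Power] Some Research and Analysis on Domestic Computing Power

To narrow the gap between domestic and foreign GPU cards, domestic companies need to strengthen independent R&D and innovation and improve their technical R&D capability. They also need to build out the ecosystem, improving their coverage of hardware, software, development tools, and application scenarios. Huawei and Sugon represent the first tier; Biren, Enflame, Cambricon, and others represent the second tier. Scenarios range from image recognition to large-model training and inference, with deployments in finance, security, smart vehicles, IoT, intelligent customer service, and other industries. 1. Technology gap: foreign GPU cards lead in R&D, with higher compute performance and energy efficiency. 3. User ecosystem: the integrator, user, software, talent, industry, and scenario ecosystems all need to be improved.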

2023-12-11 23:38:05 2346

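Original: Some Thoughts on Plugins and Agents for Large Language Models (LLMs)

A plugin is a software add-on installed on a program to enhance its functionality; at its core it forms an ecosystem around the base software. The same logic holds for large language models: plugins extend a model's capability boundary with functions such as real-time stock lookup and knowledge base search, forming an ecosystem around the model's base capabilities so that solutions can be built for different scenarios. (1) Left: at the individual level. For a single agent, other agents are also part of the environment. In the in-vehicle scenario an agent resembles an in-car chauffeur (autonomous driving); in the smart home scenario, a smart butler; in the customer service scenario, a personal service representative; in smart healthcare, a personal doctor; and so on. In the future, different agents in our lives will complete specific tasks in specific scenarios.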

2023-12-05 23:10:38 1006

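Original: Some Views on the Recent Contraction of the Internet Industry

The global economic recovery still faces persistent challenges, and there are growing signs that global economic activity is losing momentum. Growth in advanced economies is slowing markedly and is expected to fall from 2.7% in 2022 to 1.5% in 2023. US growth is forecast at 1.8% for 2023, slowing to 1.0% in 2024, while the euro area is forecast to grow 0.9% this year and 1.5% next year. Also take an active part in external discussions on technology and products to build your personal brand outside the company. 6. Find your interests and goals and keep improving: look for what truly interests you, which may or may not be your current job, and keep getting better at it; your interest might one day become an asset in your life. 7. To be continued...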

2023-12-04 22:43:59 224

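Original: Some Views on the Open-Sourcing of Large Language Models (LLMs) in Industry

In addition, different contributors may add different model libraries or components, which can make the model library large and hard to maintain. 3. Licensing: open-source large models are usually released under various open-source licenses, which may restrict how the models can be used and distributed. 1. Raising the technical level: open-sourcing domestic large language models lets more researchers, engineers, and students take part in developing and improving them, raising the level of domestic natural language processing technology. 6. Ecosystem development: an ecosystem forms around open-source foundation models, datasets, and computing power, aided by large-model training tools. In short, open-sourcing domestic large language models brings many benefits and helps advance academia, industry, society, the ecosystem, and talent development.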

2023-12-03 21:41:07 755

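Original: 2021 Big Data White Paper (CAICT)

From an emerging technology industry, big data is becoming a factor, resource, driving force, and mindset integrated into every field of economic and social development. Big data is developing rapidly along six dimensions: policy, law, technology, management, process, and security. Policy: China's big data strategy is deepening further, and unlocking the potential of data as a production factor and accelerating the market-based development of data have become core topics. Law: from basic laws and sectoral administrative regulations to local legislation, the framework of China's data legal system has been preliminarily built. Technology: the big data technology stack, aiming to improve efficiency, empower business, strengthen security, and promote circulation, is spreading rapidly into every field and has formed a complete tool set supporting the development of data as a factor. Management: data asset management practice is accelerating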

2022-04-05 10:18:22 12035

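Original: Big Data Technology Collection (continuously updated)

0. History of big data technology (from CAICT) 1. Origins: Google's troika 1.1 GFS 1.2 BigTable 1.3 MapReduce 2. Open-source big data storage: file, object, columnar, and others 2.1 File systems: HDFS, Ceph, GlusterFS 2.2 Distributed caching file systems: Alluxio, JuiceFS 2.3 Object storage: OZone, SeaweedFS, MinIO 3. Columnar databases: HBase, Cassandra 3.1 OLAP databases: Hive, Doris, Kylin, Clickhouse 3.2 HTAP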

2022-04-05 09:52:25 2647

Original: I'm Back, and It All Feels Familiar

I'm back.

2022-03-31 22:50:41 1745

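Original: The Blog Has Moved

A friend gave me some hosting space, so I am moving the original content of this blog to its new home at www.lemolu.com. This blog will no longer be updated. Thank you all for your support and care over the years.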

2012-11-08 14:32:37 3578

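Original: A Brief Analysis of the Nutch 2.0 Crawl Flow

Introduction to the Nutch 2.0 crawl flow. 1. Overall flow: InjectorJob => GeneratorJob => FetcherJob => ParserJob => DbUpdaterJob => SolrIndexerJob. InjectorJob: reads a batch of seed pages from a file and puts them into the crawl database. Generator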

2012-07-23 23:41:26 10570 3

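Original: Nutch 2.0: Introduction to Apache Gora MR

Nutch 2.0: Introduction to Apache Gora MR. 1. Introduction: Apache Gora has built-in support for Apache Hadoop, and a Gora dataStore can serve as the input and output for InputFormat and OutputFormat. The output objects are all serialized, which Gora implements by extending Avro's DatumWriters. 2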

2012-07-21 15:05:50 5159

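Original: Nutch 2.0: Introduction to Apache Gora

Nutch 2.0: Introduction to Apache Gora. 1. What is Apache Gora? Apache Gora is an open-source ORM framework that provides an in-memory data model and data persistence for big data. Gora currently supports storing column data, key-value data, document data, and RDBMS data, and also supports analyzing big data with Apache Hadoop. 2. Why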

2012-07-20 22:43:20 20967

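Original: Nutch 2.0 Is Finally Here

Nutch 2.0 is finally here. Carrying the expectations of most people, Nutch 2.0 has finally been released. It makes fairly large changes on top of Nutch 1.x, mainly in the abstraction of its storage layer. The plan for Nutch 2.0 ultimately grew out of users' complaints about Nutch's lack of NoSQL support. The earliest version was called NutchBase and was developed by Dogacan Guney; because the initial version depended too heavily on H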

2012-07-17 00:07:35 4876

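Original: Xapian Study Notes 4: Faceted Search

Xapian Study Notes 4: Faceted Search. 1. What is faceted search? Faceted search lets users dynamically aggregate the documents matched by their query on specific attributes. It is used in many places, especially in e-commerce: the user enters a query, and the server returns category information for the matched documents. For example, if the user searches for "computer", the server returns all documents matching the keyword "computer" and groups them by type, such as tablet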

2012-06-01 15:13:48 6166
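
The faceted search idea in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the concept, not Xapian's API: the in-memory `DOCS` list, the `search` and `facet_counts` helpers, and the `category` field are all hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical in-memory "index"; a real search engine would store these.
DOCS = [
    {"id": 1, "title": "tablet computer", "category": "tablet"},
    {"id": 2, "title": "laptop computer", "category": "laptop"},
    {"id": 3, "title": "desktop computer", "category": "desktop"},
    {"id": 4, "title": "tablet computer case", "category": "tablet"},
    {"id": 5, "title": "office chair", "category": "furniture"},
]


def search(query, docs=DOCS):
    """Return the documents whose title contains the query term."""
    return [d for d in docs if query in d["title"].split()]


def facet_counts(hits, field):
    """Aggregate the matched documents by one attribute (the facet)."""
    return Counter(d[field] for d in hits)


hits = search("computer")
facets = facet_counts(hits, "category")
for value, count in facets.most_common():
    print(value, count)
```

The key point is that the counts are computed over the matched documents only, so the facet values change with every query.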

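Original: Xapian Study Notes 3: Sorting on Relevant Fields

Xapian Study Notes 3: Sorting on Relevant Fields. In Xapian, matched documents are sorted by relevance in descending order. When two documents have the same relevance, they are ordered by ascending document id. You can make that descending by calling enquire.set_docid_order(enquire.DESCENDING), or declare that you do not care about document id order with enquire.set_docid_order(enquire.DONT_CARE). Of course this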

2012-05-31 17:31:29 6462
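
The ordering rule described in the excerpt above (relevance descending, ties broken by document id) can be illustrated with a plain Python sort. This sketch does not call Xapian; the `rank` function, its `docid_descending` flag, and the sample hits are hypothetical stand-ins for the ascending and descending docid orders.

```python
def rank(hits, docid_descending=False):
    """Sort by score descending; break ties by docid (ascending by default)."""
    sign = -1 if docid_descending else 1
    return sorted(hits, key=lambda h: (-h["score"], sign * h["docid"]))


# Hypothetical match results: documents 2 and 3 have equal relevance.
hits = [
    {"docid": 3, "score": 0.5},
    {"docid": 1, "score": 0.9},
    {"docid": 2, "score": 0.5},
]

ascending = [h["docid"] for h in rank(hits)]
descending = [h["docid"] for h in rank(hits, docid_descending=True)]
print(ascending)
print(descending)
```

Only the tied documents swap places between the two orders; the most relevant document stays first in both.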

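Original: Xapian Study Notes 2: Related Concepts

Xapian Study Notes 2: Some Concepts. 1. Synchronization: Xapian does not explicitly support multithreading. To avoid unnecessary thread deadlocks, Xapian does not use any global variables, so you can safely use Xapian objects in a multithreaded application. However, some Xapian objects are related internally; for example, the object returned by Xapian::Database::get_document(), Xapian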

2012-05-30 13:34:35 7042

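Original: Xapian Study Notes 1: Introduction

Introduction to Xapian. 1. Overview: Xapian is an open-source search engine library written in C++ and licensed under the GPL (http://www.opensource.org/licenses/gpl-license.php). It currently has bindings for Perl, Python, PHP, Java, and other languages. Like Lucene, Xapian is only a search engine library; users can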

2012-05-24 15:38:39 9628

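Original: Nutch 1.3 Study Notes, Side Story: Extending a Nutch Plugin to Add Custom Index Fields

Extending a Nutch plugin to add custom index fields. 1. Using Nutch with Solr. 1.1 Basic configuration: add the http.agent.name property to conf/nutch-site.xml. Create a seed directory with mkdir -p urls, create a seed file in it, and write a URL into that file, such as http://nutch.apache.org/. Edit the file conf/regex-urlfilter.txt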

2012-04-25 10:23:49 6725

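Original: How to Deserialize a JSON String with MongoDB's Bundled JSON Library

Requirement: while parsing field values in MongoDB, I found that one value was a JSON string that needed to be deserialized. Solution: my first thought was to look for a suitable C++ library at http://www.json.org/json-zh.html. I tried jsoncpp and JSON Spirit. jsoncpp builds with scons; I installed it, but could not get it working after compiling, so I gave up on it. Then I tried JSON Spirit (http://www.cod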

2012-03-23 17:46:18 11974

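Original: Installing RabbitMQ on CentOS

Installing RabbitMQ on CentOS. 1. Requirement: our project needs a message queue. After comparing ActiveMQ and RabbitMQ, we chose RabbitMQ as our messaging system. ActiveMQ is good in both efficiency and scalability, but many people online report that it crashes often and that connections frequently become slow as the number of concurrent messages grows. The server system I am currently testing on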

2011-12-15 15:35:44 14117

Original: Sorting a File by a Specific Field with awk

Sorting a file by a specific field with awk. 1. Problem definition: we want to sort the following file by a specific field. lemo@debian:~/Testspace/awk$ cat file Name Sex Salary Lemo man 4000 Jok woman 3000 Job man 6000 P

2011-11-17 23:47:48 13222
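
The sorting task in the excerpt above can be illustrated in Python rather than awk. This is a sketch under assumptions: the header row is taken to be `Name Sex Salary`, only the three complete rows visible in the excerpt are used, and `sort_by_field` is a hypothetical helper, not the article's awk solution.

```python
# Sample data from the excerpt; the header row is an assumption.
TEXT = """Name Sex Salary
Lemo man 4000
Jok woman 3000
Job man 6000"""


def sort_by_field(text, field, numeric=True, reverse=False):
    """Sort whitespace-separated rows by the named column, keeping the header first."""
    header, *rows = [line.split() for line in text.splitlines() if line.strip()]
    idx = header.index(field)
    if numeric:
        key = lambda row: float(row[idx])
    else:
        key = lambda row: row[idx]
    return [header] + sorted(rows, key=key, reverse=reverse)


result = sort_by_field(TEXT, "Salary")
for row in result:
    print(" ".join(row))
```

Converting the field to a number before comparing matters: a plain string sort would place "10000" before "3000".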

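Original: Hadoop: Introduction to Secondary Sort

Hadoop: Introduction to Secondary Sort. We know that before the reduce phase, the MR framework sorts the pairs it receives by K. For a given K, however, its list of values is not sorted; that is, the Vs are unordered because they come from different map tasks. Many applications do not depend on the order of the list for a K, but some applications do depend on the order of the Vs for the same K and also need them grouped together,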

2011-11-10 14:41:09 12428
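
The secondary sort idea in the excerpt above can be sketched outside Hadoop. The usual approach is to sort on a composite key (K, V) and then group on K alone, so each group's values arrive already ordered. The Python below only simulates that on an in-memory list; the `secondary_sort` helper and the sample records are hypothetical, and a real job would use a custom partitioner and grouping comparator instead.

```python
from itertools import groupby

# Hypothetical (K, V) pairs as they might arrive from different map tasks.
records = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]


def secondary_sort(pairs):
    """Sort by the composite key (K, V), then group by K only."""
    ordered = sorted(pairs, key=lambda kv: (kv[0], kv[1]))
    return {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    }


grouped = secondary_sort(records)
for key, values in grouped.items():
    print(key, values)
```

Each key ends up with its values in ascending order without the reducer having to buffer and sort them itself.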

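Original: A Brief Introduction to Emacs Magit

A brief introduction to Emacs Magit. 1. What is Magit? Before introducing Magit, let us first look at what Git is. Git is open-source version control software that Linus Torvalds developed to help manage Linux® kernel development. It is a fast, scalable, distributed version control system with an extremely rich command set that provides both high-level operations and full access to internals. Here, Ma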

2011-11-03 13:45:03 10888

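Original: Introduction to Using Boost Tokenizer

Introduction to using Boost Tokenizer. 1. Introduction: Boost Tokenizer provides a way to break a character sequence into a series of tokens. You can also define a TokenizerFunction to customize how the sequence is split; if none is specified, the default splits on whitespace and drops punctuation. 2. A few simple examples. Here is a simple example: // simple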

2011-11-03 11:13:28 5297
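
The default behavior described in the excerpt above (split on whitespace, drop punctuation) can be mimicked in Python to show what the tokens look like. This is only an analogue for illustration and does not use Boost; the `tokenize` function and the sample sentence are hypothetical.

```python
import string

# Treat both whitespace and ASCII punctuation as delimiters that are dropped.
DELIMITERS = set(string.whitespace) | set(string.punctuation)


def tokenize(text):
    """Split text into tokens, discarding whitespace and punctuation."""
    tokens, current = [], []
    for ch in text:
        if ch in DELIMITERS:
            if current:  # close the token in progress, if any
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


tokens = tokenize("This is,  a test")
print(tokens)
```

Consecutive delimiters produce no empty tokens, because a token is emitted only when at least one non-delimiter character has been collected.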

Original: Nutch 1.3 Study Notes 12: Main Changes in Nutch 2.0

Some of the main changes in Nutch 2.0. 1. Storage Abstraction: initially with back end implementations for HBase and HDFS; extend it to other storages l

2011-09-20 14:27:13 5289 1

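Translation: Nutch 1.3 Study Notes 11-2: Page Scoring with LinkRank

I just tried Google Translate, and the result does not feel quite good enough. Below is Google's translation of http://wiki.apache.org/nutch/NewScoring, which describes Nutch's new link scoring algorithm, somewhat similar to Google's PageRank. An example of it running is at http://wi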

2011-09-20 13:50:42 8277

Original: Nutch 1.3 Study Notes 11-1: Page Scoring with OPIC

Nutch 1.3 Study Notes 11-1: Page Scoring with OPIC. 1. Page scoring in Nutch 1.3: Nutch 1.3 still uses OPIC as its default page scoring algorithm, but since then it has introduced Pa

2011-09-20 13:46:47 7036

Original: Nutch 1.3 Study Notes 10-3: Plugin Mechanism Analysis

Nutch 1.3 Study Notes 10-3: Plugin Mechanism Analysis. 1. Key objects. PluginRepository: a registry that stores all plugin descriptor objects (PluginDescriptor), plugin extension points

2011-09-18 00:19:08 4843

Original: Nutch 1.3 Study Notes 10-2: Extending Plugins

Nutch 1.3 Study Notes 10-2: Extending Plugins. 1. Writing a simple plugin of your own: here we write a Nutch URLFilter plugin called MyURLFilter. 1.1 Create a package

2011-09-15 22:43:51 2867

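Original: Nutch 1.3 Study Notes 10-1: A Brief Introduction to the Nutch Plugin Mechanism

Nutch 1.3 Study Notes 10-1: A Brief Introduction to the Nutch Plugin Mechanism. In Nutch, many of the extensible parts are implemented as plugins, such as choosing the protocol used to download pages, parsing different types of pages, URL filtering and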

2011-09-15 00:00:10 3167

Original: Nutch 1.3 Study Notes 9: SolrIndexer

Nutch 1.3 Study Notes 9: SolrIndexer. The new Nutch uses Solr as its back-end indexing service. Nutch is working toward more convenient integration with Solr and handles the coupling with Solr well, treating Solr as a

2011-09-01 23:50:23 4875 8

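Original: Lucene 3.3 Study Notes 1: Introduction

Lucene 3.3 Study Notes 1: Package Architecture. 1. About Lucene: Lucene's author has said that Lucene is just an efficient full-text search engine library, not a platform. It provides a very simple API so that it can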

2011-08-30 23:05:39 1884

Original: Nutch 1.3 Study Notes 8: LinkDb

Nutch 1.3 Study Notes 8: LinkDb. This note analyzes org.apache.nutch.crawl.LinkDb, which is mainly used to compute inverted links. 1. Command: bin/nutch invertlink

2011-08-29 22:02:25 3971

Original: Nutch 1.3 Study Notes 7: CrawlDb - updatedb

Nutch 1.3 Study Notes 7: CrawlDb - updatedb. This note looks at updatedb in CrawlDb, which is mainly used to update the CrawlDb database. 1. bin/nutch updatedb

2011-08-28 23:33:15 4245 2

Original: Nutch 1.3 Study Notes 6: ParseSegment

Nutch 1.3 Study Notes 6: ParseSegment. 1. bin/nutch parse: this command parses the fetched content, analyzing outlinks, computing scores, and so on. This parsing can already be done during fetching

2011-08-28 22:11:40 3514 2

Original: Nutch 1.3 Study Notes 5-1: FetchThread

Nutch 1.3 Study Notes 5-1: FetchThread. The previous section covered the main classes in Fetcher; this section analyzes the consumer they use, FetcherThread, to see what it does. 1.

2011-08-27 22:54:44 3869 1

Original: Nutch 1.3 Study Notes 5: Fetcher Flow

Nutch 1.3 Study Notes 5: Fetcher. 1. A brief introduction to the Fetcher module: in Nutch, the Fetcher module is implemented in its own package, org.apache.nutch.fetcher, which contains Fetch

2011-08-27 15:18:39 4771 1

Effective.STL.pdf

A book on how to use the STL effectively; I found it fairly good.

2008-10-21

lucene in action

Contents Preface Chapter 1 Meet Lucene Chapter 2 Indexing Chapter 3 Adding search to your application Chapter 4 Analysis Chapter 5 Advanced search techniques Chapter 6 Extending search Chapter 7 Parsing common document formats Chapter 8 Tools and extensions Chapter 9 Lucene ports Chapter 10 Administration and performance tuning Chapter 11 Case studies Appendix A Installing Lucene Appendix B Lucene index format Appendix C Resources Appendix D Using the benchmark (contrib) framework

2010-03-04

expert c programming

Introduction C code. C code run. Run code run...please! —Barbara Ling All C programs do the same thing: look at a character and do nothing with it. —Peter Weinberger Have you ever noticed that there are plenty of C books with suggestive names like C Traps and Pitfalls, or The C Puzzle Book, or Obfuscated C and Other Mysteries, but other programming languages don't have books like that? There's a very good reason for this! C programming is a craft that takes years to perfect. A reasonably sharp person can learn the basics of C quite quickly. But it takes much longer to master the nuances of the language and to write enough programs, and enough different programs, to become an expert. In natural language terms, this is the difference between being able to order a cup of coffee in Paris, and (on the Metro) being able to tell a native Parisienne where to get off. This book is an advanced text on the ANSI C programming language. It is intended for people who are already writing C programs, and who want to quickly pick up some of the insights and techniques of experts. Expert programmers build up a tool kit of techniques over the years; a grab-bag of idioms, code fragments, and deft skills. These are acquired slowly over time, learned from looking over the shoulders of more experienced colleagues, either directly or while maintaining code written by others. Other lessons in C are self-taught. Almost every beginning C programmer independently rediscovers the mistake of writing:

2010-03-05

object-oriented programming with ANSI c

Object-oriented programming is the current cure-all -- although it has been around for much more than ten years. At the core, there is little more to it than finally applying the good programming principles which we have been taught for more than twenty years. C++ (Eiffel, Oberon-2, Smalltalk ... take your pick) is the New Language because it is object-oriented -- although you need not use it that way if you do not want to (or know how to), and it turns out that you can do just as well with plain ANSI-C. Only object-orientation permits code reuse between projects -- although the idea of subroutines is as old as computers and good programmers always carried their toolkits and libraries with them. This book is not going to praise object-oriented programming or condemn the Old Way. We are simply going to use ANSI-C to discover how object-oriented programming is done, what its techniques are, why they help us solve bigger problems, and how we harness generality and program to catch mistakes earlier. Along the way we encounter all the jargon -- classes, inheritance, instances, linkage, methods, objects, polymorphisms, and more -- but we take it out of the realm of magic and see how it translates into the things we have known and done all along. I had fun discovering that ANSI-C is a full-scale object-oriented language. To share this fun you need to be reasonably fluent in ANSI-C to begin with -- feeling comfortable with structures, pointers, prototypes, and function pointers is a must. Working through the book you will encounter all the newspeak -- according to Orwell and Webster a language "designed to diminish the range of thought" -- and I will try to demonstrate how it merely combines all the good programming principles that you always wanted to employ into a coherent approach. As a result, you may well become a more proficient ANSI-C programmer. The first six chapters develop the foundations of object-oriented programming with ANSI-C.
We start with a careful information hiding technique for abstract data types, add generic functions based on dynamic linkage and inherit code by judicious lengthening of structures. Finally, we put it all together in a class hierarchy that makes code much easier to maintain. Programming takes discipline. Good programming takes a lot of discipline, a large number of principles, and standard, defensive ways of doing things right. Programmers use tools. Good programmers make tools to dispose of routine tasks once and for all. Object-oriented programming with ANSI-C requires a fair amount of immutable code -- names may change but not the structures. Therefore, in chapter seven we build a small preprocessor to create the boilerplate required. It looks like yet another new object-oriented dialect language (yanoodl perhaps?) but it should not be viewed as such -- it gets the dull parts out of the way and lets us concentrate on the creative aspects of problem solving with better techniques. ooc

2010-03-05

programming erlang

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf and the linking g device are trademarks of The Pragmatic Programmers, LLC. Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein. Our Pragmatic courses, workshops, and other products can help you and your team create better software and have more fun. For more information, as well as the latest Pragmatic titles, please visit us at

2010-03-05

mastering unix shell scripting

The information that I gathered together in this book is the result of working with some of the most talented UNIX professionals on the topic. I have enjoyed every minute of my association with these UNIX gurus and it has been my pleasure to have the opportunity to gain so much knowledge from the pros. I want to thank every one of these experts for asking and answering questions over the last fifteen years. If my brother, Jim, had not kept telling me, "you should write a book," after querying me for UNIX details on almost a weekly basis, I doubt this book would have ever been written. So, thanks Jim! I especially want to thank Jack Renfro at Daimler/Chrysler Corporation for giving me my first shell scripting project so long ago. I had to start with the man pages, but that is how I learned to dig deep to get an answer. Since then I have been on a mission to automate, through shell scripting, everything on every system that I come in contact with. I certainly value the years that I was able to work with Jack. I must also thank the talented people at Wiley Publishing. Margaret Eldridge started me on this project by letting me do my own thing, and Carol Long kept me going. Scott Amerman kept me on schedule, and Angela Smith did the edits that make my writing flow with ease. It has been a valuable experience for me to work with such a fine group of professionals at Wiley. I also want to thank Carole McClendon at Waterside Productions for all of the support on this project. Carole is the best Agent that anyone could ever ask for. She is a true professional with the highest ethics. Of course my family had a lot to do with my success on this and every project. I want to thank Mom, Gene, Jim, Marcia, Rusty, Mallory, and Anica. I want to thank my Wife Robin for her understanding and support. The girls, Andrea and Ana, always keep a smile on my face, and Steve is always on my mind. 
I could not have written this book without the support of all of these people and the many others that remain unnamed. It has been an honor!

2010-03-05

scaling_mongodb

In the Terminator movies, an artificial intelligence called Skynet wages war on humans, chugging along for decades creating robots and killing off humanity. This is the dream of most ops people—not to destroy humanity, but to build a distributed system that will work long-term without relying on people carrying pagers. Skynet is still a pipe dream, unfortunately, because distributed systems are very difficult, both to design well and to keep running. A single database server has a couple of basic states: it’s either up or down. If you add another machine and divide your data between the two, you now have some sort of dependency between the servers. How does it affect one machine if the other goes down? Can your application handle either (or both) machines going down? What if the two machines are up, but can’t communicate? What if they can communicate, but only very, very, slowly? As you add more nodes, these problems just become more numerous and complex: what happens if entire parts of your cluster can’t communicate with other parts? What happens if one subset of machines crashes? What happens if you lose an entire data center? Suddenly, even taking a backup becomes difficult: how do you take a consistent snapshot of many terabytes of data across dozens of machines without freezing out the application trying to use the data? If you can get away with a single server, it is much simpler. However, if you want to store a large volume of data or access it at a rate higher than a single server can handle, you’ll need to set up a cluster. On the plus side, MongoDB tries to take care of a lot of the issues listed above. Keep in mind that this isn’t as simple as setting up a single mongod (then again, what is?). This book shows you how to set up a robust cluster and what to expect every step of the way.

2012-05-31

refactoring improving the design of existing code

Your class library works, but could it be better? Refactoring: Improving the Design of Existing Code shows how refactoring can make object-oriented code simpler and easier to maintain. Today refactoring requires considerable design know-how, but once tools become available, all programmers should be able to improve their code using refactoring techniques. Besides an introduction to refactoring, this handbook provides a catalog of dozens of tips for improving code. The best thing about Refactoring is its remarkably clear presentation, along with excellent nuts-and-bolts advice, from object expert Martin Fowler. The author is also an authority on software patterns and UML, and this experience helps make this a better book, one that should be immediately accessible to any intermediate or advanced object-oriented developer. (Just like patterns, each refactoring tip is presented with a simple name, a "motivation," and examples using Java and UML.) Early chapters stress the importance of testing in successful refactoring. (When you improve code, you have to test to verify that it still works.) After the discussion on how to detect the "smell" of bad code, readers get to the heart of the book, its catalog of over 70 "refactorings"--tips for better and simpler class design. Each tip is illustrated with "before" and "after" code, along with an explanation. Later chapters provide a quick look at refactoring research. Like software patterns, refactoring may be an idea whose time has come. This groundbreaking title will surely help bring refactoring to the programming mainstream. With its clear advice on a hot new topic, Refactoring is sure to be essential reading for anyone who writes or maintains object-oriented software. --Richard Dragan Topics Covered: Refactoring, improving software code, redesign, design tips, patterns, unit testing, refactoring research, and tools. Book News, Inc. 
A guide to refactoring, the process of changing a software system so that it does not alter the external behavior of the code yet improves its internal structure, for professional programmers. Early chapters cover general principles, rationales, examples, and testing. The heart of the book is a catalog of refactorings, organized in chapters on composing methods, moving features between objects, organizing data, simplifying conditional expressions, and dealing with generalizations

2010-03-05

Introduction to Information Retrieval

Introduction to Information Retrieval is the first textbook with a coherent treatment of classical and web information retrieval, including web search and the related areas of text classification and text clustering. Written from a computer science perspective, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents and of methods for evaluating systems, along with an introduction to the use of machine learning methods on text collections. Designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also interest researchers and professionals. A complete set of lecture slides and exercises that accompany the book are available on the web. Christopher D. Manning is Associate Professor of Computer Science and Linguistics at Stanford University. Prabhakar Raghavan is Head of Yahoo! Research and a Consulting Professor of Computer Science at Stanford University. Hinrich Schütze is Chair of Theoretical Computational Linguistics at the Institute for Natural Language Processing, University of Stuttgart.

2010-03-05

Mining the Web

Mining the Web

2009-03-03

The c programming language 2nd

Preface The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities that rival mainframes of a decade ago. During this time, C has changed too, although only modestly, and it has spread far beyond its origins as the language of the UNIX operating system. The growing popularity of C, the changes in the language over the years, and the creation of compilers by groups not involved in its design, combined to demonstrate a need for a more precise and more contemporary definition of the language than the first edition of this book provided. In 1983, the American National Standards Institute (ANSI) established a committee whose goal was to produce ``an unambiguous and machine-independent definition of the language C'', while still retaining its spirit. The result is the ANSI standard for C. The standard formalizes constructions that were hinted but not described in the first edition, particularly structure assignment and enumerations. It provides a new form of function declaration that permits cross-checking of definition with use. It specifies a standard library, with an extensive set of functions for performing input and output, memory management, string manipulation, and similar tasks. It makes precise the behavior of features that were not spelled out in the original definition, and at the same time states explicitly which aspects of the language remain machine-dependent. This Second Edition of The C Programming Language describes C as defined by the ANSI standard. Although we have noted the places where the language has evolved, we have chosen to write exclusively in the new form. For the most part, this makes no significant difference; the most visible change is the new form of function declaration and definition. Modern compilers already support most features of the standard. We have tried to retain the brevity of the first edition. 
C is not a big language, and it is not well served by a big book. We have improved the exposition of critical features, such as pointers, that are central to C programming. We have refined the original examples, and have added new examples in several chapters. For instance, the treatment of complicated declarations is augmented by programs that convert declarations into words and vice versa. As before, all examples have been tested directly from the text, which is in machine-readable form. Appendix A, the reference manual, is not the standard, but our attempt to convey the essentials of the standard in a smaller space. It is meant for easy comprehension by programmers, but not as a definition for compiler writers -- that role properly belongs to the standard itself. Appendix B is a summary of the facilities of the

2010-03-05

linux shell

about linux shell and unix shell

2008-09-26
