What Factors Justify the Use of Apache Hadoop?

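This article examines two major criticisms of Hadoop: that most users do not need to process big data, and that MapReduce is overly complex. It also covers how Hadoop's supporters respond to these criticisms, and discusses Hadoop's advantages for medium-scale data processing and the growth of its ecosystem.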

Relational database authors and advocates have two criticisms of Hadoop. First, that most users have little need for Big Data. Second, that MapReduce is more complex than traditional SQL queries.

Both of these criticisms are valid.

In a post entitled “Terabytes is not big data, petabytes is,” Henrik Ingo argued that the gigabytes and terabytes I referenced as Big Data did not justify that term. He is correct. Further, it is true that the number of enterprises worldwide with petabyte-scale data management challenges is limited.

MapReduce, for its part, is in fact challenging. It is challenging enough that two separate projects (Hive and Pig) add higher-level, SQL-like languages as a complement to the core Hadoop MapReduce functionality. Besides being more accessible, SQL skills are an order of magnitude more common from a resource availability standpoint.
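To make the complexity point concrete, here is a minimal, single-process Python sketch of the MapReduce programming model applied to a word count. It is only an illustration of the map, shuffle, and reduce phases. It is not Hadoop's API, which is written in Java around its own Mapper and Reducer classes plus job configuration, all omitted here. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group emitted values by key, the step a framework performs
    # between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: collapse one key's list of values into a single result.
    return (key, sum(values))


def map_reduce_word_count(lines: Iterable[str]) -> dict[str, int]:
    # Wire the three phases together over all input lines.
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "medium data is data"]
    print(map_reduce_word_count(docs))  # {'big': 2, 'data': 3, 'is': 2, 'medium': 1}
    # The declarative equivalent is roughly one line, in the spirit of
    # SELECT word, COUNT(*) FROM words GROUP BY word:
    print(dict(Counter(w for line in docs for w in line.lower().split())))
```

The gap between the two `print` calls is the gap Hive and Pig set out to close. The user states what to compute and the front end decides how to split it into map and reduce steps.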

Hadoop supporters, meanwhile, counter both of those concerns.

It was Hadoop sponsor Cloudera, in fact, that originally coined the term “Medium Data” as an acknowledgement that data complexity was not purely a function of volume. As Bradford Cross put it:

Companies do not have to be at Google scale to have data issues. Scalability issues occur with less than a terabyte of data. If a company works with relational databases and SQL, they can drown in complex data transformations and calculations that do not fit naturally into sequences of set operations. In that sense, the “big data” mantra is misguided at times…The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data. The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.

Big Data, like NoSQL, has become a liability in most contexts. Setting aside the lack of a consistent definition, the term is of little utility because it is single-dimensional. Larger dataset sizes present unique computational challenges. But the structure, workload, accessibility and even location of the data may prove equally challenging.

We use Hadoop at RedMonk, for example, to attack unstructured and semi-structured datasets without the overhead of an ETL step to insert them into a traditional relational database. From CSV to XML, we can load in a single step and begin querying.
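The "load in a single step and begin querying" approach is often called schema-on-read. Raw files are stored as-is, and structure is applied only when a query runs. The following is a minimal Python sketch of that idea, not how Hadoop or Hive implement it. The sample CSV and XML data, field names, and function names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical raw inputs, kept in their original formats with no ETL step.
RAW_CSV = "name,stars\nhadoop,120\nhive,45\n"
RAW_XML = (
    "<projects>"
    "<project name='pig' stars='30'/>"
    "<project name='hbase' stars='60'/>"
    "</projects>"
)


def read_csv(text: str) -> list[dict[str, str]]:
    # Field names are taken from the header row at read time.
    return list(csv.DictReader(io.StringIO(text)))


def read_xml(text: str) -> list[dict[str, str]]:
    # Each <project> element's attributes become one record at read time.
    root = ET.fromstring(text)
    return [dict(el.attrib) for el in root.iter("project")]


def projects_with_min_stars(records: list[dict[str, str]], min_stars: int) -> list[str]:
    # Values stay strings until a query needs them; the int conversion
    # happens here rather than in a separate load step.
    return [r["name"] for r in records if int(r["stars"]) >= min_stars]


if __name__ == "__main__":
    records = read_csv(RAW_CSV) + read_xml(RAW_XML)
    print(projects_with_min_stars(records, 50))  # ['hadoop', 'hbase']
```

The trade-off is that type errors surface at query time rather than at load time. For exploratory work on heterogeneous files, that is often acceptable.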

There are a variety of options for data mining at the scale we practice it. From basic grep to the Perl CPAN modules Henrik points to, many tools would provide us with similar capabilities. Why Hadoop? Because the ecosystem is growing, the documentation is generally excellent, it suits the unstructured nature of our datasets and, yes, it can attack Big Data. While our datasets, at least individually, do not constitute Big Data, they are growing rapidly.

Nor have we had to learn MapReduce. The Hadoop ecosystem is already rich enough that we have a variety of front-end options available, from visual spreadsheet metaphors (BigSheets) to SQL-style queries (Hive) with a web UI (Beeswax). No Java necessary.

Brian Aker’s comparison of MapReduce to an SUV in Henrik’s piece is apt, curiously, whether you’re a supporter of Hadoop or not. Brian is obviously correct that a majority of users will use a minority of its capabilities, much like SUVs and their owners.

The overkill of an SUV, however, comes with obvious costs in fuel and size; the downside to Hadoop usage is less apparent. Its single-node performance is merely adequate and its front ends are immature relative to the tooling available in the relational database world, but the build-out around the core is improving by the day.

When is Hadoop justified? For petabyte-scale workloads, certainly. But the versatility of the tool makes it appropriate for a variety of workloads beyond quote-unquote Big Data. It’s not going to replace your database, but your database isn’t likely to replace Hadoop either.

Different tools for different jobs, as ever.

Disclosure: Cloudera is a RedMonk customer.
