MongoDB Hadoop Connector 1.0 Released

10gen has just released version 1.0 of the MongoDB Hadoop Connector, a middleware product that connects MongoDB with Hadoop, making it easy for MongoDB users to tap into Hadoop's distributed computing capability. The connector's main workflow is to have Hadoop read raw data from MongoDB and, once the computation completes, write the results back into MongoDB; the reads and writes may target the same MongoDB deployment or two different ones (a minimal job sketch appears after this section). The primary goal is to let MongoDB users work with Hadoop as directly as possible.

The MongoDB Hadoop Connector has already been integrated with several components of the Hadoop ecosystem, and more comprehensive integration will follow based on user feedback. Concretely:

- Data can be written into MongoDB through Pig.
- Raw log data can be imported into MongoDB through the distributed logging system Flume.
- With Hadoop Streaming, MapReduce functions can be written in Python.

The MongoDB Hadoop Connector currently supports MongoDB 2.0 and later (1.8.x is also largely supported). The project is open source and is hosted on GitHub as mongo-hadoop.
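To make the read-compute-write loop concrete, here is a minimal word-count job sketch against the connector's MapReduce API, using the MongoInputFormat, MongoOutputFormat, and MongoConfigUtil classes from the mongo-hadoop project's com.mongodb.hadoop packages. The connection URIs and the "text" field name are placeholder assumptions, not values from the article:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

// Word count over documents stored in MongoDB: Hadoop reads raw documents
// from one collection and writes the aggregated counts to another.
public class MongoWordCount {

    public static class TokenizerMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, BSONObject doc, Context context)
                throws IOException, InterruptedException {
            Object text = doc.get("text"); // "text" is a placeholder field name
            if (text == null) {
                return;
            }
            StringTokenizer tokens = new StringTokenizer(text.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input and output may point at the same MongoDB or at different ones.
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost/test.in");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost/test.out");

        Job job = Job.getInstance(conf, "mongo word count");
        job.setJarByClass(MongoWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The setInputURI and setOutputURI calls point the job at its source and destination collections, which, as noted above, may live in the same MongoDB deployment or in different ones.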
Here is the content of my XML file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>spark-mongodb-processor</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <!-- Unified version configuration -->
        <spark.version>3.5.0</spark.version>
        <scala.binary.version>2.12</scala.binary.version>
        <scala.version>2.12.18</scala.version>
        <mongodb.connector.version>10.2.1</mongodb.connector.version>
        <mongodb.driver.version>4.11.1</mongodb.driver.version>
        <hadoop.version>3.3.6</hadoop.version>
        <log4j.version>2.20.0</log4j.version>
        <shade.plugin.version>3.5.1</shade.plugin.version>
        <scala.maven.plugin.version>4.8.1</scala.maven.plugin.version>
        <compiler.plugin.version>3.11.0</compiler.plugin.version>
    </properties>

    <dependencies>
        <!-- Spark core dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- Spark SQL dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- MongoDB Spark Connector -->
        <dependency>
            <groupId>org.mongodb.spark</groupId>
            <artifactId>mongo-spark-connector_${scala.binary.version}</artifactId>
            <version>${mongodb.connector.version}</version>
            <!-- Simplified exclusion configuration -->
            <exclusions>
                <exclusion>
                    <groupId>org.mongodb</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- MongoDB driver dependency, unified version -->
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>${mongodb.driver.version}</version>
        </dependency>
        <!-- Scala language dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Hadoop client, updated to a newer version -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- Logging -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <!-- Add the SLF4J bridge to avoid logging conflicts -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j-impl</artifactId>
            <version>${log4j.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Scala compiler plugin -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>${scala.maven.plugin.version}</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
            <!-- Java compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${compiler.plugin.version}</version>
                <configuration>
                    <source>${maven.compiler.source}</source>
                    <target>${maven.compiler.target}</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- Shade packaging plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${shade.plugin.version}</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>shaded</shadedClassifierName>
                            <createDependencyReducedPom>true</createDependencyReducedPom>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- Please confirm this main class path is correct -->
                                    <mainClass>EmotionAnalysis.EmoAnalysis</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>module-info.class</exclude>
                                        <!-- Exclude unnecessary license files -->
                                        <exclude>LICENSE</exclude>
                                        <exclude>NOTICE</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <!-- Optimized include configuration -->
                            <artifactSet>
                                <includes>
                                    <include>org.mongodb:*</include>
                                    <include>org.mongodb.spark:*</include>
                                    <include>org.apache.logging.log4j:*</include>
                                </includes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
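For completeness, here is a minimal driver program that matches the pom above. It is a sketch, not the project's actual EmotionAnalysis.EmoAnalysis class: the connection URIs and the database, collection, and field names are hypothetical, and the filter is only a stand-in for real processing. It uses the "mongodb" data source and the spark.mongodb.read/write.connection.uri settings that the 10.x mongo-spark-connector recognizes:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal read-transform-write pipeline with mongo-spark-connector 10.x.
// URI, database, collection, and field names below are placeholders.
public class MongoSparkExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-mongodb-processor")
                // Connector 10.x reads its connection settings from these keys.
                .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
                .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
                .getOrCreate();

        // Load a collection as a DataFrame via the "mongodb" data source.
        Dataset<Row> docs = spark.read()
                .format("mongodb")
                .option("database", "test")
                .option("collection", "in")
                .load();

        // Stand-in transformation: keep only documents with a "text" field.
        Dataset<Row> filtered = docs.filter(docs.col("text").isNotNull());

        // Write the results back to a (possibly different) collection.
        filtered.write()
                .format("mongodb")
                .mode("append")
                .option("database", "test")
                .option("collection", "out")
                .save();

        spark.stop();
    }
}
```

Because spark-core and spark-sql are declared as provided in the pom, a class like this is meant to run under spark-submit, with the shaded jar supplying the connector, MongoDB driver, and Log4j classes that the artifactSet bundles.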