JanusGraph: Production Environment Installation

Article link: https://blog.youkuaiyun.com/ProgrammerThreeStone/article/details/80688219

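This article describes the environment preparation JanusGraph needs in production, including the installation steps for Elasticsearch and Cassandra. Elasticsearch is installed first, following the Chinese edition of the definitive guide; Cassandra is installed next, with the Thrift protocol enabled. The article then resolves the connection problem between JanusGraph and Cassandra by adjusting the Cassandra configuration file, and deals with connection timeouts and the connection to Elasticsearch.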


Part 1 Preparing the Components JanusGraph Requires

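###### Installing Elasticsearch
JanusGraph depends on Cassandra and Elasticsearch, so both have to be installed and prepared on the server first. We start with Elasticsearch. The Chinese edition of the definitive guide is a useful reference here: [Elasticsearch: The Definitive Guide] https://es.xiaoleilu.com/010_Intro/10_Installing_ES.html
Elasticsearch can be installed from an RPM package. I used the 6.2.4 RPM; the installer printed the following output.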

Preparing...                          ################################# [100%]
Creating elasticsearch group... OK
Creating elasticsearch user... OK
Updating / installing...
   1:elasticsearch-0:6.2.4-1          ################################# [100%]

### NOT starting on installation, please execute the following statements to configure elasticsearch service to start automatically using systemd
 sudo systemctl daemon-reload
 sudo systemctl enable elasticsearch.service
### You can start elasticsearch service by executing
 sudo systemctl start elasticsearch.service

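Note what the output says: Elasticsearch has to be started manually, and it has to be enabled explicitly if it should start at boot. Note also that recent Elasticsearch packages create the user and group that ES needs during installation.
Run the installation as root. I first tried from an account using sudo and the installation failed; after switching to root it succeeded. Once it is installed, test the result with curl: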

	curl 'http://localhost:9200/?pretty'
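The curl check above only shows that something answers on port 9200. A minimal sketch of a stricter check follows, assuming the root endpoint returns JSON containing a `"number"` field with the version string; the function names `es_version_from_json` and `check_es_version` are hypothetical helpers, not part of Elasticsearch.

```shell
# Extract the "number" field from the JSON that Elasticsearch returns at "/".
# Reads JSON on stdin and prints the first version number found.
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds only if the node at the given URL reports the expected version.
# Both defaults are illustrative: the URL matches the curl test in the
# article, and 6.2.4 is the RPM version the author installed.
check_es_version() {
  local url="${1:-http://localhost:9200/}" expected="${2:-6.2.4}"
  local actual
  # If curl fails, "actual" stays empty and the comparison below fails.
  actual="$(curl -fsS "$url" | es_version_from_json)"
  [ "$actual" = "$expected" ]
}
```

Calling `check_es_version` with no arguments after starting the service returns a zero exit status only when the local node reports 6.2.4, which makes it usable in a provisioning script.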

An optional step is to install the monitoring plugin Marvel for Elasticsearch. Stop the Elasticsearch service before installing it:

sudo systemctl stop elasticsearch.service

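Next, install Marvel. The detailed steps are at the link below (in 6.x the monitoring features formerly known as Marvel ship as part of X-Pack, which is what the linked page covers):
[Marvel installation steps]
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/installing-xpack-es.html
I ran into problems while installing Marvel and rolled back to the state of the fresh RPM installation of Elasticsearch; I will go through the logs by hand later. With that, the Elasticsearch installation is finished for now.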

After installation, you can set the following in /etc/elasticsearch/elasticsearch.yml:

network.host: 0.0.0.0 # allow access to ES from other hosts
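The setting above can also be applied from a script. The following is a minimal sketch, assuming the config file either contains a `network.host:` line (possibly commented out) or none at all; `set_network_host` is a hypothetical helper name.

```shell
# Set network.host in an Elasticsearch config file.
# $1 = path to the config file, $2 = address to bind to.
set_network_host() {
  local file="$1" host="$2"
  if grep -Eq '^[[:space:]]*#?[[:space:]]*network\.host:' "$file"; then
    # Rewrite the existing (possibly commented-out) line in place.
    # sed keeps the previous version of the file as "$file.bak".
    sed -i.bak -E "s|^[[:space:]]*#?[[:space:]]*network\.host:.*|network.host: ${host}|" "$file"
  else
    # No such line yet: append one.
    printf 'network.host: %s\n' "$host" >> "$file"
  fi
}
```

For the file named above this would be run as root, for example `set_network_host /etc/elasticsearch/elasticsearch.yml 0.0.0.0`, followed by a restart of the Elasticsearch service so the change takes effect.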

###### Installing Cassandra

Installing Cassandra from an RPM repository is recommended and very simple. [Official Cassandra installation guide] http://cassandra.apache.org/download/

sudo yum install cassandra  # example; the RPM repository must be configured first
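The command above only works once a yum repository for Cassandra has been configured, which the article does not show. The following is a minimal sketch of writing such a repo file. The function name `write_cassandra_repo` is hypothetical, and the base URL and GPG key URL are deliberately left as parameters: take them from the official download page for the Cassandra release line you intend to install.

```shell
# Write a yum repository definition for Cassandra.
# $1 = target file, $2 = repository base URL, $3 = GPG key URL.
# The URLs are not hard-coded here; they are assumptions the caller
# must supply from the official Cassandra download page.
write_cassandra_repo() {
  local target="$1" baseurl="$2" gpgkey="$3"
  cat > "$target" <<EOF
[cassandra]
name=Apache Cassandra
baseurl=${baseurl}
gpgcheck=1
repo_gpgcheck=1
gpgkey=${gpgkey}
EOF
}
```

On a real host the target would typically be a file under /etc/yum.repos.d/ (written as root), after which `sudo yum install cassandra` can resolve the package.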

The installation can be very slow, so be patient...
After it finishes, reload the systemd units and start the Cassandra service:

systemctl daemon-reload # reload systemd unit files
systemctl start cassandra.service # start Cassandra

Because JanusGraph relies on Thrift for RPC at the storage layer, Cassandra's Thrift protocol has to be enabled here:

./bin/nodetool enablethrift # enable Thrift
# Log output printed when enabling Thrift:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apps/janusgraph-0.2.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apps/janusgraph-0.2.0-hadoop2/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
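Enabling Thrift through nodetool is a runtime change, so it is worth checking the state before and after. The following is a minimal sketch, assuming a Cassandra release that still ships Thrift, where `nodetool statusthrift` prints `running` or `not running`; `ensure_thrift` is a hypothetical helper name.

```shell
# Enable Thrift only when nodetool does not already report it as running.
# $1 is the nodetool executable to use (defaults to the one on PATH).
ensure_thrift() {
  local nodetool="${1:-nodetool}"
  local state
  # Discard stderr so that warnings such as the SLF4J binding messages
  # shown above do not end up in the captured state.
  state="$("$nodetool" statusthrift 2>/dev/null)"
  if [ "$state" = "running" ]; then
    echo "thrift already running"
  else
    "$nodetool" enablethrift && echo "thrift enabled"
  fi
}
```

Because a change made with nodetool does not survive a Cassandra restart, a persistent setup would additionally set `start_rpc: true` in cassandra.yaml; treat that as a suggestion to verify against the cassandra.yaml of your installed version rather than something the article prescribes.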

##### Configuring and Starting JanusGraph
Problem 1: JanusGraph cannot connect to Cassandra

Solution

Enable Cassandra's Thrift connection:

nodetool enablethrift # run from Cassandra's bin directory

Configure cassandra.yaml so that Cassandra broadcasts its own host address. The contents of my cassandra.yaml are pasted below for reference.

# Cassandra storage config YAML

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'Test Cluster'

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to 
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: 256

# Triggers automatic allocation of num_tokens tokens for this node. The allocation
# algorithm attempts to choose tokens in a way that optimizes replicated load over
# the nodes in the datacenter for the replication strategy used by the specified
# keyspace.
#
# The load assigned to each node will be close to proportional to its number of
# vnodes.
#
# Only supported with the Murmur3Partitioner.
# allocate_tokens_for_keyspace: KEYSPACE

# initial_token allows you to specify tokens manually.  While you can use it with
# vnodes (num_tokens > 1, above) -- in which case you should provide a 
# comma-separated list -- it's primarily used when adding nodes to legacy clusters 
# that do not have vnodes enabled.
# initial_token:

# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally
hinted_handoff_enabled: true

# When hinted_handoff_enabled is true, a black list of data centers that will not
# perform hinted handoff
# hinted_handoff_disabled_datacenters:
#    - DC1
#    - DC2

# this defines the maximum amount of time a dead host will have hints
# generated.  After it has been dead this long, new hints for it will not be
# created until it has been seen alive and gone down again.
max_hint_window_in_ms: 10800000 # 3 hours

# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024

# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 2

# Directory where Cassandra should store hints.
# If not set, the default directory is $CASSANDRA_HOME/data/hints.
# hints_directory: /var/lib/cassandra/hints

# How often hints should be flushed from the internal buffers to disk.
# Will *not* trigger fsync.
hints_flush_period_in_ms: 10000

# Maximum size for a single hints file, in megabytes.
max_hints_file_size_in_mb: 128

# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -

# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
batchlog_replay_throttle_in_kb: 1024

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.roles table.
#   Please increase system_auth keyspace replication factor if you use this authenticator.
#   If using PasswordAuthenticator, CassandraRoleManager must also be used (see below)
authenticator: AllowAllAuthenticator

# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.role_permissions table. Please
#   increase system_auth keyspace replication factor if you use this authorizer.
authorizer: AllowAllAuthorizer

# Part of the Authentication & Authorization backend, implementing IRoleManager; used
# to maintain grants and memberships between roles.
# Out of the box, Cassandra provides org.apache.cassandra.auth.CassandraRoleManager,
# which stores role information in the system_auth keyspace. Most functions of the
# IRoleManager require an authenticated login, so unless the configured IAuthenticator
# actually implements authentication, most of this functionality will be unavailable.
#
# - CassandraRoleManager stores role data in the system_auth keyspace. Please
#   increase system_auth keyspace replication factor if you use this role manager.
role_manager: CassandraRoleManager

# Validity period for roles cache (fetching granted roles can be an expensive
# operation depending on the role manager, CassandraRoleManager is one example)
# Granted roles are cached for authenticated sessions in AuthenticatedUser and
# after the period specified here, become eligible for (async) reload.
# Defaults to 2000, set to 0 to disable caching entirely.
# Will be disabled automatically for AllowAllAuthenticator.
roles_validity_in_ms: 2000

# Refresh interval for roles cache (if enabled).
# After this interval, cache entries become eligible for refresh. Upon next
# access, an async reload is scheduled and the old value returned until it
# completes. If roles_validity_in_ms is non-zero, then this must be
# also.
# Defaults to the same value as roles_validity_in_ms.
# roles_update_interval_in_ms: 2000

# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.
permissions_validity_in_ms: 2000

# Refresh interval for permissions cache (if enabled).
# After this interval, cache entries become eligible for refresh. Upon next
# access, an async reload is scheduled and the old value returned until it
# completes. If permissions_validity_in_ms is non-zero, then this must be
# also.
# Defaults to the same value as permissions_validity_in_ms.
# permissions_update_interval_in_ms: 2000

# Validity period for credentials cache. This cache is tightly coupled to
# the provided PasswordAuthenticator implementation of IAuthenticator. If
# another IAuthenticator implementation is configured, this cache will not
# be automatically used and so the following settings will have no effect.
# Please note, credentials are cached in their encrypted form, so while
# activating this cache may reduce the number of queries made to the
# underlying table, it may not  bring a significant reduction in the
# latency of individual authentication attempts.
# Defaults to 2000, set to 0 to disable credentials caching.
credentials_validity_in_ms: 2000

# Refresh interval for credentials cache (if enabled).
# After this interval, cache entries become eligible for refresh. Upon next
# access, an async reload is scheduled and the old value returned until it
# completes. If credentials_validity_in_ms is non-zero, then this must be
# also.
# Defaults to the same value as credentials_validity_in_ms.
# credentials_update_interval_in_ms: 2000

# The partitioner is responsible for distributing groups of rows (by
# partition key) across nodes in the cluster.  You should leave this
# alone for new clusters.  The partitioner can NOT be changed without
# reloading all data, so when upgrading you should set this to the
# same partitioner you were already using.
#
# Besides Murmur3Partitioner, partitioners included for backwards
# compatibility include RandomPartitioner, ByteOrderedPartitioner, and
# OrderPreservingPartitioner.
#
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

# Directories where Cassandra should store data on disk.  Cassandra
# will spread data evenly across them, subject to the granularity of
# the configured compaction strategy.
# If not set, the default directory is $CASSANDRA_HOME/data/data.
# data_file_directories:
#     - /var/lib/cassandra/data

# commit log.  when running on magnetic HDD, this should be a
# separate spindle than the data directories.
# If not set, the default directory is $CASSANDRA_HOME/data/commitlog.
# commitlog_directory: /var/lib/cassandra/commitlog

# Enable / disable CDC functionality on a per-node basis. This modifies the logic used
# for write path allocation rejection (standard: never reject. cdc: reject Mutation
# containing a CDC-enabled table if at space limit in cdc_raw_directory).
cdc_enabled: false

# CommitLogSegments are moved to this directory on flush if cdc_enabled: true and the
# segment contains mutations for a CDC-enabled table. This should be placed on a
# separate spindle than the data directories. If not set, the default directory is
# $CASSANDRA_HOME/data/cdc_raw.
# cdc_raw_directory: /var/lib/cassandra/cdc_raw

# Policy for data disk failures:
#
# die
#   shut down gossip and client transports and kill the JVM for any fs errors or
#   single-sstable errors, so the node can be replaced.
#
# stop_paranoid
#   shut down gossip and client transports even for single-sstable errors,
#   kill the JVM for errors during startup.
#
# stop
#   shut down gossip and client transports, leaving the node effectively dead, but
#   can still be inspected via JMX, kill the JVM for errors during startup.
#
# best_effort
#    stop using the failed disk and respond to requests based on
#    remaining available sstables.  This means you WILL see obsole