通知设置 新通知
python redis.StrictRedis.from_url 连接redis
李魔佛 发表了文章 • 0 个评论 • 5711 次浏览 • 2019-08-23 16:43
用url的方式连接redis
r=redis.StrictRedis.from_url(url)
url为以下的格式:
redis://[:password]@localhost:6379/0
rediss://[:password]@localhost:6379/0
unix://[:password]@/path/to/socket.sock?db=0
原创文章,转载请注明出处:
http://30daydo.com/article/527
查看全部
用url的方式连接redis
r=redis.StrictRedis.from_url(url)
url为以下的格式:
redis://[:password]@localhost:6379/0
rediss://[:password]@localhost:6379/0
unix://[:password]@/path/to/socket.sock?db=0
原创文章,转载请注明出处:
http://30daydo.com/article/527
mongodb 判断列表字段不为空
李魔佛 发表了文章 • 0 个评论 • 7803 次浏览 • 2019-08-20 11:08
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[1,2,3,4,5]})
db.test_tab.insert({array:[1,2,3,4,5,6]})
使用以下命令判断列表不为空:
db.getCollection("example").find({array:{$exists:true,$ne:[]}}); # 字段不为0 查看全部
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[]})
db.test_tab.insert({array:[1,2,3,4,5]})
db.test_tab.insert({array:[1,2,3,4,5,6]})
使用以下命令判断列表不为空:
db.getCollection("example").find({array:{$exists:true,$ne:[]}}); # 字段不为0
redis health_check_interval 参数无效
李魔佛 发表了文章 • 0 个评论 • 6996 次浏览 • 2019-08-09 16:13
# helper
class RedisHelp(object):
def __init__(self,channel):
# self.pool = redis.ConnectionPool('10.18.6.46',port=6379)
# self.conn = redis.Redis(connection_pool=self.pool)
# 上面的方式无法使用订阅者 发布者模式
self.conn = redis.Redis(host='10.18.6.46')
self.publish_channel = channel
self.subscribe_channel = channel
def publish(self,msg):
self.conn.publish(self.publish_channel,msg) # 1. 渠道名 ,2 信息
def subscribe(self):
self.pub = self.conn.pubsub()
self.pub.subscribe(self.subscribe_channel)
self.pub.parse_response()
print('initial')
return self.pub
helper = RedisHelp('cuiqingcai')
# 订阅者
if sys.argv[1]=='s':
print('in subscribe mode')
pub = helper.subscribe()
while 1:
print('waiting for publish')
pubsub.check_health()
msg = pub.parse_response()
s=str(msg[2],encoding='utf-8')
print(s)
if s=='exit':
break
# 发布者
elif sys.argv[1]=='p':
print('in publish mode')
msg = sys.argv[2]
print(f'msg -> {msg}')
helper.publish(msg)
而官网的文档说使用参数:
health_check_interval=30 # 30s心跳检测一次
但实际上这个参数在最新的redis 3.3以上是被去掉了。 所以是无办法使用 self.conn = redis.Redis(host='10.18.6.46',health_check_interval=30)
这点在作者的github页面里面也得到了解释。
https://github.com/andymccurdy/redis-py/issues/1199
所以要改成
data = client.blpop('key', timeout=300)
300s后超时,data为None,重新监听。
查看全部
# helper
class RedisHelp(object):
def __init__(self,channel):
# self.pool = redis.ConnectionPool('10.18.6.46',port=6379)
# self.conn = redis.Redis(connection_pool=self.pool)
# 上面的方式无法使用订阅者 发布者模式
self.conn = redis.Redis(host='10.18.6.46')
self.publish_channel = channel
self.subscribe_channel = channel
def publish(self,msg):
self.conn.publish(self.publish_channel,msg) # 1. 渠道名 ,2 信息
def subscribe(self):
self.pub = self.conn.pubsub()
self.pub.subscribe(self.subscribe_channel)
self.pub.parse_response()
print('initial')
return self.pub
helper = RedisHelp('cuiqingcai')
# 订阅者
if sys.argv[1]=='s':
print('in subscribe mode')
pub = helper.subscribe()
while 1:
print('waiting for publish')
pubsub.check_health()
msg = pub.parse_response()
s=str(msg[2],encoding='utf-8')
print(s)
if s=='exit':
break
# 发布者
elif sys.argv[1]=='p':
print('in publish mode')
msg = sys.argv[2]
print(f'msg -> {msg}')
helper.publish(msg)
而官网的文档说使用参数:
health_check_interval=30 # 30s心跳检测一次
但实际上这个参数在最新的redis 3.3以上是被去掉了。 所以是无办法使用 self.conn = redis.Redis(host='10.18.6.46',health_check_interval=30)
这点在作者的github页面里面也得到了解释。
https://github.com/andymccurdy/redis-py/issues/1199
所以要改成
data = client.blpop('key', timeout=300)
300s后超时,data为None,重新监听。
mongodb 修改嵌套字典字典的字段名
李魔佛 发表了文章 • 0 个评论 • 4919 次浏览 • 2019-08-05 13:55
db.test.update({},{$rename:{'旧字段':'新字段'}},true,true)
比如下面的例子:db.getCollection('example').update({},{$rename:{'corp':'企业'}})
上面就是把字段corp改为企业。
如果是嵌套字段呢?
比如 corp字典是一个字典,里面是 { 'address':'USA', 'phone':'12345678' }
那么要修改里面的address为地址:
db.getCollection('example').update({},{$rename:{'corp.address':'corp.地址'}})
原创文章,转载请注明出处
原文连接:http://30daydo.com/article/521
查看全部
db.test.update({},{$rename:{'旧字段':'新字段'}},true,true)
比如下面的例子:
db.getCollection('example').update({},{$rename:{'corp':'企业'}})
上面就是把字段corp改为企业。
如果是嵌套字段呢?
比如 corp字典是一个字典,里面是 { 'address':'USA', 'phone':'12345678' }
那么要修改里面的address为地址:
db.getCollection('example').update({},{$rename:{'corp.address':'corp.地址'}})
原创文章,转载请注明出处
原文连接:http://30daydo.com/article/521
mongodb motor 异步操作比同步操作的时间要慢?
量化投机者 回复了问题 • 2 人关注 • 1 个回复 • 4185 次浏览 • 2019-08-03 09:01
mongodb find得到的数据顺序每次都是一样的
李魔佛 发表了文章 • 0 个评论 • 2471 次浏览 • 2019-07-26 09:00
Django 版本不兼容报错 AuthenticationMiddleware
李魔佛 发表了文章 • 0 个评论 • 5818 次浏览 • 2019-07-04 15:43
?: (admin.E408) 'django.contrib.auth.middleware.AuthenticationMiddleware' must be in MIDDLEWARE in order to use the admin application.
在之前的版本上没有问题,更新后就出错。
降级Django
pip install django=2.1.7
PS: 这个django的版本兼容的确是个大问题,哪天升级了下django版本,不经过严格的测试就带来灾难性的后果。 查看全部
ERRORS:
?: (admin.E408) 'django.contrib.auth.middleware.AuthenticationMiddleware' must be in MIDDLEWARE in order to use the admin application.
在之前的版本上没有问题,更新后就出错。
降级Django
pip install django=2.1.7
PS: 这个django的版本兼容的确是个大问题,哪天升级了下django版本,不经过严格的测试就带来灾难性的后果。
使用pymongo中的find_one_and_update出错:需要分片键
李魔佛 发表了文章 • 0 个评论 • 4593 次浏览 • 2019-06-10 17:13
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Query for sharded findAndModify must contain the shard key
2019-06-10 16:14:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-10 16:14:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
需要在查询语句中把分片键也添加进去。
因为findOneModify只会找一个记录,但是到底在哪个分片的记录呢? 因为不确定,所以才需要把shard加上去。
参考官方:
Targeted Operations vs. Broadcast Operations
Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.
For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.
Broadcast Operations
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.
After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.
Multi-update operations are always broadcast operations.
The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.
Targeted Operations
mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.
For example, if the shard key is:
copy
{ a: 1, b: 1, c: 1 }
The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
copy
{ a: 1 }
{ a: 1, b: 1 }
All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.
All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.
Index Use
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill most efficiently.
Sharded Cluster Security
Use Internal Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.
See Deploy Sharded Cluster with Keyfile Access Control for a tutorial on deploying a secured shardedcluster.
Cluster Users
Sharded clusters support Role-Based Access Control (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Internal Authentication for inter-cluster security also enables user access controls via RBAC.
With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.
Each cluster has its own cluster users. These users cannot be used to access individual shards.
See Enable Access Control for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment. 查看全部
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Query for sharded findAndModify must contain the shard key
2019-06-10 16:14:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-10 16:14:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
需要在查询语句中把分片键也添加进去。
因为findOneModify只会找一个记录,但是到底在哪个分片的记录呢? 因为不确定,所以才需要把shard加上去。
参考官方:
Targeted Operations vs. Broadcast Operations
Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.
For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.
Broadcast Operations
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.
After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.
Multi-update operations are always broadcast operations.
The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.
Targeted Operations
mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.
For example, if the shard key is:
copy
{ a: 1, b: 1, c: 1 }
The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
copy
{ a: 1 }
{ a: 1, b: 1 }
All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.
All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.
Index Use
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill most efficiently.
Sharded Cluster Security
Use Internal Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.
See Deploy Sharded Cluster with Keyfile Access Control for a tutorial on deploying a secured shardedcluster.
Cluster Users
Sharded clusters support Role-Based Access Control (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Internal Authentication for inter-cluster security also enables user access controls via RBAC.
With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.
Each cluster has its own cluster users. These users cannot be used to access individual shards.
See Enable Access Control for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.
Warning: unable to run listCollections, attempting to approximate collection
李魔佛 发表了文章 • 0 个评论 • 18284 次浏览 • 2019-06-07 17:35
Warning: unable to run listCollections, attempting to approximate collection names by parsing connectionStatus
那是因为设置了密码,但是没有进行认证导致的错误。这个错误为啥不直接说明原因呢。汗
直接: db.auth('admin','密码')
认证成功返回1, 然后重新执行show tables就可以看到所有的表了。 查看全部
Warning: unable to run listCollections, attempting to approximate collection names by parsing connectionStatus
那是因为设置了密码,但是没有进行认证导致的错误。这个错误为啥不直接说明原因呢。汗
直接: db.auth('admin','密码')
认证成功返回1, 然后重新执行show tables就可以看到所有的表了。
python连接mongodb集群 cluster
李魔佛 发表了文章 • 0 个评论 • 3557 次浏览 • 2019-06-03 15:55
连接方法如下:import pymongo
db = pymongo.MongoClient('mongodb://10.18.6.46,10.18.6.26,10.18.6.102')上面默认的端口do都是27017,如果是其他端口,需要这样修改:db = pymongo.MongoClient('mongodb://10.18.6.46:8888,10.18.6.26:9999,10.18.6.102:7777')
然后就可以正常读写数据库:
读:coll=db['testdb']['testcollection'].find()
for i in coll:
print(i)输出内容:{'_id': ObjectId('5cf4c7981ee9edff72e5c503'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c504'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c505'), 'username': 'hello'}
{'_id': ObjectId('5cf4c79a1ee9edff72e5c506'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7b21ee9edff72e5c507'), 'username': 'hello world'}
写:collection = db['testdb']['testcollection']
for i in range(10):
collection.insert({'username':'huston{}'.format(i)})
原创文章,转载请注明出处:
http://30daydo.com/article/494
查看全部
连接方法如下:
import pymongo上面默认的端口do都是27017,如果是其他端口,需要这样修改:
db = pymongo.MongoClient('mongodb://10.18.6.46,10.18.6.26,10.18.6.102')
db = pymongo.MongoClient('mongodb://10.18.6.46:8888,10.18.6.26:9999,10.18.6.102:7777')
然后就可以正常读写数据库:
读:
coll=db['testdb']['testcollection'].find()输出内容:
for i in coll:
print(i)
{'_id': ObjectId('5cf4c7981ee9edff72e5c503'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c504'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c505'), 'username': 'hello'}
{'_id': ObjectId('5cf4c79a1ee9edff72e5c506'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7b21ee9edff72e5c507'), 'username': 'hello world'}
写:
collection = db['testdb']['testcollection']
for i in range(10):
collection.insert({'username':'huston{}'.format(i)})
原创文章,转载请注明出处:
http://30daydo.com/article/494
ElasticSearch查看已经存在的文档保存在哪个分片
李魔佛 发表了文章 • 0 个评论 • 3084 次浏览 • 2019-05-26 12:54
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "XxyrM2kBVzdNcvl_GHv2",
"_score" : 1.0,
"_source" : {
"name" : "Shiled",
"twitter" : "Sonny sql is awesome",
"date" : "2018-12-27",
"id" : 1240,
"tags" : [
"red",
"shit"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YByrM2kBVzdNcvl_tnvm",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 12357,
"tags" : [
"red"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "7777",
"_score" : 1.0,
"_source" : {
"name" : "Rocky Chen",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 9999
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YhzDN2kBVzdNcvl_enuT",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YxzDN2kBVzdNcvl_u3th",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
}
]
}
}
如果我想看看id是 "_id" : "YxzDN2kBVzdNcvl_u3th",
这个文档是保存在哪个分片,如何查看?
引用:
路由一个文档到一个分片中编辑
当索引一个文档的时候,文档会被存储到一个主分片中。 Elasticsearch 如何知道一个文档应该存放到哪个分片中呢?当我们创建文档时,它如何决定这个文档应当被存储在分片 1 还是分片 2 中呢?
首先这肯定不会是随机的,否则将来要获取文档的时候我们就不知道从何处寻找了。实际上,这个过程是根据下面这个公式决定的:
shard = hash(routing) % number_of_primary_shards
routing 是一个可变值,默认是文档的 _id ,也可以设置成一个自定义的值。 routing 通过 hash 函数生成一个数字,然后这个数字再除以 number_of_primary_shards (主分片的数量)后得到 余数 。这个分布在 0 到 number_of_primary_shards-1 之间的余数,就是我们所寻求的文档所在分片的位置。
这就解释了为什么我们要在创建索引的时候就确定好主分片的数量 并且永远不会改变这个数量:因为如果数量变化了,那么所有之前路由的值都会无效,文档也再也找不到了。
那么可以使用
GET test/_search_shards?routing=ID号 来查看你要查询的id所在的分片
得到的结果:
{
"nodes" : {
"yl-qYmh1SXqzJsfI4d1ddw" : {
"name" : "node-3",
"ephemeral_id" : "UsJ9rFELTiCW07oHE9YMdg",
"transport_address" : "10.18.6.26:9300",
"attributes" : {
"ml.machine_memory" : "6088101888",
"rack" : "r1",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
}
},
"wT7wUd3iTkujYUsbVNVv-w" : {
"name" : "node-1",
"ephemeral_id" : "fP-vgSb0SdemnHDyaJUsWw",
"transport_address" : "10.18.6.102:9300",
"attributes" : {
"ml.machine_memory" : "8256720896",
"rack" : "r1",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
}
}
},
"indices" : {
"test" : { }
},
"shards" : [
[
{
"state" : "STARTED",
"primary" : true,
"node" : "wT7wUd3iTkujYUsbVNVv-w",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "k-8E4dL7QmGgwcsNsUCP6Q"
}
},
{
"state" : "STARTED",
"primary" : false,
"node" : "yl-qYmh1SXqzJsfI4d1ddw",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "lvOpQIKgRUibkulr3nRfEw"
}
}
]
]
}
只需要关注shards字段就可以,从上面可以看到,该文档存在shard 1 分片上。 分别在node1和node3节点,一个是主分片,一个是副本分片 查看全部
比如我有以下的文档:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "XxyrM2kBVzdNcvl_GHv2",
"_score" : 1.0,
"_source" : {
"name" : "Shiled",
"twitter" : "Sonny sql is awesome",
"date" : "2018-12-27",
"id" : 1240,
"tags" : [
"red",
"shit"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YByrM2kBVzdNcvl_tnvm",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 12357,
"tags" : [
"red"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "7777",
"_score" : 1.0,
"_source" : {
"name" : "Rocky Chen",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 9999
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YhzDN2kBVzdNcvl_enuT",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YxzDN2kBVzdNcvl_u3th",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
}
]
}
}
如果我想看看id是 "_id" : "YxzDN2kBVzdNcvl_u3th",
这个文档是保存在哪个分片,如何查看?
引用:
路由一个文档到一个分片中编辑
当索引一个文档的时候,文档会被存储到一个主分片中。 Elasticsearch 如何知道一个文档应该存放到哪个分片中呢?当我们创建文档时,它如何决定这个文档应当被存储在分片 1 还是分片 2 中呢?
首先这肯定不会是随机的,否则将来要获取文档的时候我们就不知道从何处寻找了。实际上,这个过程是根据下面这个公式决定的:
shard = hash(routing) % number_of_primary_shards
routing 是一个可变值,默认是文档的 _id ,也可以设置成一个自定义的值。 routing 通过 hash 函数生成一个数字,然后这个数字再除以 number_of_primary_shards (主分片的数量)后得到 余数 。这个分布在 0 到 number_of_primary_shards-1 之间的余数,就是我们所寻求的文档所在分片的位置。
这就解释了为什么我们要在创建索引的时候就确定好主分片的数量 并且永远不会改变这个数量:因为如果数量变化了,那么所有之前路由的值都会无效,文档也再也找不到了。
那么可以使用
GET test/_search_shards?routing=ID号 来查看你要查询的id所在的分片
得到的结果:
{
"nodes" : {
"yl-qYmh1SXqzJsfI4d1ddw" : {
"name" : "node-3",
"ephemeral_id" : "UsJ9rFELTiCW07oHE9YMdg",
"transport_address" : "10.18.6.26:9300",
"attributes" : {
"ml.machine_memory" : "6088101888",
"rack" : "r1",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
}
},
"wT7wUd3iTkujYUsbVNVv-w" : {
"name" : "node-1",
"ephemeral_id" : "fP-vgSb0SdemnHDyaJUsWw",
"transport_address" : "10.18.6.102:9300",
"attributes" : {
"ml.machine_memory" : "8256720896",
"rack" : "r1",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
}
}
},
"indices" : {
"test" : { }
},
"shards" : [
[
{
"state" : "STARTED",
"primary" : true,
"node" : "wT7wUd3iTkujYUsbVNVv-w",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "k-8E4dL7QmGgwcsNsUCP6Q"
}
},
{
"state" : "STARTED",
"primary" : false,
"node" : "yl-qYmh1SXqzJsfI4d1ddw",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "lvOpQIKgRUibkulr3nRfEw"
}
}
]
]
}
只需要关注shards字段就可以,从上面可以看到,该文档存在shard 1 分片上。 分别在node1和node3节点,一个是主分片,一个是副本分片
elasticsearch在match查询里面使用了type字段 报错
李魔佛 发表了文章 • 0 个评论 • 11611 次浏览 • 2019-05-26 00:26
{
"query":
{
"match": {
"name": {
"type":"phrase",
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
报错:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
}
],
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
},
"status": 400
}
在6.x已经不支持在math里面使用type,
可以修改为以下语法:
POST get-together/_search
{
"query":
{
"match_phrase": {
"name": {
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
得到的效果是一致的:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.3243701,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3243701,
"_source" : {
"name" : "Enterprise search London get-together"
}
}
]
}
} 查看全部
POST get-together/_search
{
"query":
{
"match": {
"name": {
"type":"phrase",
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
报错:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
}
],
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
},
"status": 400
}
在6.x已经不支持在math里面使用type,
可以修改为以下语法:
POST get-together/_search
{
"query":
{
"match_phrase": {
"name": {
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
得到的效果是一致的:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.3243701,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3243701,
"_source" : {
"name" : "Enterprise search London get-together"
}
}
]
}
}
elasticsearch 更新文档的坑
李魔佛 发表了文章 • 2 个评论 • 9242 次浏览 • 2019-05-24 22:46
{
"doc":{
"title":"中国操作系统"
}
}
那个body里面的”doc" 不能少
不然会报错:
{
"error": {
"root_cause": [
{
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
}
],
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
},
"status": 400
} 查看全部
POST cnbeta/doc/cUxO42oB9O-zF2ru-rs-/_update
{
"doc":{
"title":"中国操作系统"
}
}
那个body里面的”doc" 不能少
不然会报错:
{
"error": {
"root_cause": [
{
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
}
],
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
},
"status": 400
}
ElasticSearch配置集群无法发现节点问题【已解决】
李魔佛 发表了文章 • 0 个评论 • 3518 次浏览 • 2019-05-05 10:00
修改文件elasticsearch.yml 文件中的timeout参数,改成原来值得10倍就可以了。 查看全部
修改文件elasticsearch.yml 文件中的timeout参数,改成原来值得10倍就可以了。
版本不兼容会增加学习的成本和挫败感-致ElasticSearch和Django
李魔佛 发表了文章 • 0 个评论 • 2404 次浏览 • 2019-04-27 21:59
看的书或者网上的教程,一步一步下来,发现要一路google。 2018年8月的书,到2019年上机,书上代码已经无法正常运行了。 报的错误就是新版ElasticSearch或者Django已经不支持这个api了。 真是一万字草泥码奔腾而过。
查看全部
看的书或者网上的教程,一步一步下来,发现要一路google。 2018年8月的书,到2019年上机,书上代码已经无法正常运行了。 报的错误就是新版ElasticSearch或者Django已经不支持这个api了。 真是一万字草泥码奔腾而过。
Fielddata is disabled on text fields by default. Set fielddata=true
李魔佛 发表了文章 • 0 个评论 • 4938 次浏览 • 2019-04-24 15:37
{ "size":0,
"aggs":
{
"color":
{
"terms":{
"field":"color"
}
}
}
}
创建数据如下:
curl -X POST "10.18.6.102:9200/cars/transactions/_bulk" -H 'Content-Type: application/json' -d'
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
'
那么运行聚合操作会报错,官方的说法是text是会分词,如果text中一个文本为New York,那么就会被分成2个桶,一个New桶,一个York桶,那么显然不能聚合操作,要么你把该类型替换成keyword类型,因为keyword类型是不会分词的,可以用来做聚合操作。
如果实在是想要用text做聚合操作,那么可以手工修改其mapping
PUT my_index/_mapping/_doc
{
"properties": {
"my_field": {
"type": "text",
"fielddata": true
}
}
}上面语句可以在已有d的mapping上修改。
修改完成后就可以正常聚合操作了。
查看全部
{ "size":0,
"aggs":
{
"color":
{
"terms":{
"field":"color"
}
}
}
}
创建数据如下:
curl -X POST "10.18.6.102:9200/cars/transactions/_bulk" -H 'Content-Type: application/json' -d'
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
'
那么运行聚合操作会报错,官方的说法是text是会分词,如果text中一个文本为New York,那么就会被分成2个桶,一个New桶,一个York桶,那么显然不能聚合操作,要么你把该类型替换成keyword类型,因为keyword类型是不会分词的,可以用来做聚合操作。
如果实在是想要用text做聚合操作,那么可以手工修改其mapping
PUT my_index/_mapping/_doc上面语句可以在已有d的mapping上修改。
{
"properties": {
"my_field": {
"type": "text",
"fielddata": true
}
}
}
修改完成后就可以正常聚合操作了。
python操作kafka报错:return '<SimpleProducer batch=%s>' % self.async
李魔佛 发表了文章 • 0 个评论 • 17677 次浏览 • 2019-04-08 16:59
C:\ProgramData\Anaconda3\python.exe C:/git/base_function/kafka_usage.py
Traceback (most recent call last):
File "C:/git/base_function/kafka_usage.py", line 6, in <module>
from kafka import KafkaProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\__init__.py", line 23, in <module>
from kafka.producer import KafkaProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\producer\__init__.py", line 4, in <module>
from .simple import SimpleProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\producer\simple.py", line 54
return '<SimpleProducer batch=%s>' % self.async
^
SyntaxError: invalid syntax
因为py3.7里面async已经变成了关键字。所以导致了不兼容。
解决办法:
使用最新的kafka版本,但是pyPI上的kafka还没有被替换成最新的,可以使用下面的方法升级kafka python
pip install kafka-python
然后问题就解决了。 查看全部
C:\ProgramData\Anaconda3\python.exe C:/git/base_function/kafka_usage.py
Traceback (most recent call last):
File "C:/git/base_function/kafka_usage.py", line 6, in <module>
from kafka import KafkaProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\__init__.py", line 23, in <module>
from kafka.producer import KafkaProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\producer\__init__.py", line 4, in <module>
from .simple import SimpleProducer
File "C:\ProgramData\Anaconda3\lib\site-packages\kafka\producer\simple.py", line 54
return '<SimpleProducer batch=%s>' % self.async
^
SyntaxError: invalid syntax
因为py3.7里面async已经变成了关键字。所以导致了不兼容。
解决办法:
使用最新的kafka版本,但是pyPI上的kafka还没有被替换成最新的,可以使用下面的方法升级kafka python
pip install kafka-python
然后问题就解决了。
postman使用_analyze端点 ElasticSearch
李魔佛 发表了文章 • 0 个评论 • 3390 次浏览 • 2019-04-01 15:31
ES 6.x如何使用_analyze端点
因为使用curl编辑查询语句很不方便。平时用postman最多,故平时查询ES经常使用postman查询。
_analyze端点是用于查询分析器的分析效果。
文档中使用如下方法查询
curl -XPOST 'localhost:9200/_analyze?analyzer=standard' -d 'I love Bears and Fish.'
只是奇怪,为何post的内容 'I love Bears and Fish.'不需要字段名?
试验了几次后,发现在6.x上,该字段的字段名是text
所以请求body应该是这样的
可以使用get方法来使用_analyze端点 查看全部
ES 6.x如何使用_analyze端点
因为使用curl编辑查询语句很不方便。平时用postman最多,故平时查询ES经常使用postman查询。
_analyze端点是用于查询分析器的分析效果。
文档中使用如下方法查询
curl -XPOST 'localhost:9200/_analyze?analyzer=standard' -d 'I love Bears and Fish.'
只是奇怪,为何post的内容 'I love Bears and Fish.'不需要字段名?
试验了几次后,发现在6.x上,该字段的字段名是text
所以请求body应该是这样的
可以使用get方法来使用_analyze端点
Elasticsearch : Failed to obtain node lock, is the following location writable
李魔佛 发表了文章 • 0 个评论 • 3421 次浏览 • 2019-02-25 18:35
看了下权限,没有问题,可以写。
后来发现后台的ES进程没有的得到释放,使用kill命令杀掉
ps -axu | grep 'java'
找到对应的进程ID,然后杀掉
kill ID号
然后重新调用./elasticsearch 就可以了。
查看全部
看了下权限,没有问题,可以写。
后来发现后台的ES进程没有的得到释放,使用kill命令杀掉
ps -axu | grep 'java'
找到对应的进程ID,然后杀掉
kill ID号
然后重新调用./elasticsearch 就可以了。
修改Logstash中的sql_last_value值
李魔佛 发表了文章 • 0 个评论 • 10017 次浏览 • 2019-02-20 19:37
这个文件在logstash的第一层目录底下。
不然每次都是从这个最后的值开始执行的。
这个文件在logstash的第一层目录底下。
不然每次都是从这个最后的值开始执行的。