Query takes a long time Solr 6.1.0

Topic created by: vishal patel (posted 2019/06/05 20:35)

We have 2 shards and 2 replicas in production, and also multiple collections. We are performing heavy search and update operations.

-> I have attached some queries that take a long time to execute. Why do they take so much time? Is it due to the query length?

-> Sometimes a replica goes into recovery mode, and from the logs we cannot identify the issue, but GC pause time is 15 to 20 seconds. Ideally, what should the GC pause time be? Does GC pause time increase due to indexing or searching documents?

My Solr live data:

Collection        Total documents   shard1 (GB)   shard2 (GB)
documents         20419967          117           99.4
commentdetails    18305485          6.47          6.83
documentcontent   8810482           191           102
forms             4316563           80.1          76.4

Regards,
Vishal

Reply from: Shawn Heisey (posted 2019/06/05 21:42)

On 6/5/2019 5:35 AM, vishal patel wrote:

We have 2 shards and 2 replicas in production, and also multiple collections.
We are performing heavy search and update operations.

There is no information here about how many servers are serving those
four shard replicas.

-> I have attached some queries that take a long time to execute. Why do
they take so much time? Is it due to the query length?

No attachments made it to the list. Attachments rarely make it --
you'll need to find some other way to share content.

-> Sometimes a replica goes into recovery mode, and from the logs we cannot
identify the issue, but GC pause time is 15 to 20 seconds. Ideally, what
should the GC pause time be? Does GC pause time increase due to indexing or
searching documents?

Individual GC pauses that are long are caused by having a large heap
that undergoes a full collection. Long pauses from multiple collections
are typically caused by a heap that's too small. When the heap is
properly sized and GC is tuned well, full collections will be very rare,
and the generation-specific collections will typically be very fast.

My Solr live data:

This indicates that your total size for shard1 is almost 400 gigabytes,
and your total size for shard2 is almost 300 gigabytes.

If you have 400 or 700 GB of data on one server, then you will need a
SIGNIFICANT amount of memory in that server, with most of it NOT
allocated to the heap for Solr.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn

Reply from: vishal patel (posted 2019/06/05 22:08)

I attached a RAR file, but it did not come through properly. I have now attached a txt file instead.

For 2 shards and 2 replicas we have 2 servers, each with 256 GB RAM and 1 TB storage. Each server hosts one shard and the replica of the other shard.


Attached file:
Reply from: Shawn Heisey (posted 2019/06/05 22:40)

On 6/5/2019 7:08 AM, vishal patel wrote:

I attached a RAR file, but it did not come through properly. I have now attached a txt file instead.

For 2 shards and 2 replicas we have 2 servers, each with 256 GB RAM
and 1 TB storage. Each server hosts one shard and the replica of the other shard.

You got lucky. Even text files usually don't make it to the list --
yours did this time. Use a file sharing website in the future.

That is a massive query. The primary reason that Lucene defaults to a
maxBooleanClauses value of 1024, which you are definitely exceeding
here, is that queries with that many clauses tend to be slow and consume
massive levels of resources. It might not be possible to improve the
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the
terms query parser, which has better performance than a boolean query
with thousands of "OR" clauses.
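For the simple case where the clauses really are just a long ID list, the difference in query shape can be sketched as follows (a Python illustration of the two query syntaxes; the field name `doc_id` and the ID values are hypothetical):

```python
# Sketch: the same ID list expressed as a boolean OR query vs. the
# terms query parser ({!terms}), which avoids per-clause boolean
# scoring overhead for large lists of values.

ids = ["D1001", "D1002", "D1003"]

# Boolean form: one clause per value, counted against maxBooleanClauses.
boolean_q = "doc_id:(" + " OR ".join(ids) + ")"

# Terms query parser form: a single comma-separated value list.
terms_q = "{!terms f=doc_id}" + ",".join(ids)

print(boolean_q)  # doc_id:(D1001 OR D1002 OR D1003)
print(terms_q)    # {!terms f=doc_id}D1001,D1002,D1003
```

With thousands of IDs, the terms form stays a single query clause, which is why it performs better than the equivalent boolean query when it is applicable.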

How much index data is on one server with 256GB of memory? What is the
max heap size on the Solr instance? Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am
looking for. Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message.
That will require a file sharing website.

Thanks,
Shawn

Reply from: vishal patel (posted 2019/06/06 20:45)

Thanks for your reply.

How much index data is on one server with 256GB of memory? What is the
max heap size on the Solr instance? Is there only one Solr instance?

One server (256GB RAM) has the two Solr instances below, plus another application:
1) shard1 (80GB heap, 790GB storage, 449GB indexed data)
2) replica of shard2 (80GB heap, 895GB storage, 337GB indexed data)

The second server (256GB RAM and 1TB storage) has the two Solr instances below, plus another application:
1) shard2 (80GB heap, 790GB storage, 338GB indexed data)
2) replica of shard1 (80GB heap, 895GB storage, 448GB indexed data)

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Note: On average, about 40GB of heap is used in each Solr instance under normal load. When a replica goes down, disk I/O is high and GC pause time is above 15 seconds. From the logs we cannot identify the exact cause of the replica recovering or going down. Is it due to GC pauses, high disk I/O, time-consuming queries, or heavy indexing?

Regards,
Vishal



Reply from: vishal patel (posted 2019/06/08 00:04)

Is anyone looking into my issue?


Reply from: David Hastings (posted 2019/06/08 00:07)

There isn't anything wrong here aside from your query being poorly thought out.

On Fri, Jun 7, 2019 at 11:04 AM vishal patel vishalpatel200928@outlook.com
wrote:

Is anyone looking into my issue?


Reply from: Shawn Heisey (posted 2019/06/08 00:30)

On 6/6/2019 5:45 AM, vishal patel wrote:

One server (256GB RAM) has the two Solr instances below, plus another application:
1) shard1 (80GB heap, 790GB storage, 449GB indexed data)
2) replica of shard2 (80GB heap, 895GB storage, 337GB indexed data)

The second server (256GB RAM and 1TB storage) has the two Solr instances below, plus another application:
1) shard2 (80GB heap, 790GB storage, 338GB indexed data)
2) replica of shard1 (80GB heap, 895GB storage, 448GB indexed data)

An 80GB heap is ENORMOUS. And you have two of those per server. Do you
know that you need a heap that large? You only have 50 million
documents total, two instances that each have 80GB seems completely
unnecessary. I would think that one instance with a much smaller heap
would handle just about anything you could throw at 50 million documents.

With 160GB taken by heaps, you're leaving less than 100GB of memory to
cache over 700GB of index. This is not going to work well, especially
if your index doesn't have many fields that are stored. It will cause a
lot of disk I/O.
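The memory arithmetic behind this point can be sketched with the figures quoted in this thread (a rough Python illustration, not an exact model of how Lucene uses the page cache):

```python
# Rough memory-budget sketch for one of the servers described above.
# Solr/Lucene relies on the OS page cache for index data; whatever the
# JVM heaps consume is unavailable for caching the index.

total_ram_gb = 256
heap_gb_per_instance = 80
instances = 2
index_gb = 449 + 337  # shard1 plus the replica of shard2 on this server

os_cache_gb = total_ram_gb - heap_gb_per_instance * instances
cache_ratio = os_cache_gb / index_gb

print(f"RAM left for the OS page cache: {os_cache_gb} GB")
print(f"Fraction of the index that can be cached: {cache_ratio:.0%}")
```

Under a tenth to an eighth of the index fitting in cache is consistent with the heavy disk I/O reported later in the thread.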

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Unless you have changed the DirectoryFactory to something that's not
default, your process listing does not reflect over 700GB of index data.
If you have changed the DirectoryFactory, then I would strongly
recommend removing that part of your config and letting Solr use its
default.

Note: On average, about 40GB of heap is used in each Solr instance under normal load. When a replica goes down, disk I/O is high and GC pause time is above 15 seconds. From the logs we cannot identify the exact cause of the replica recovering or going down. Is it due to GC pauses, high disk I/O, time-consuming queries, or heavy indexing?

With an 80GB heap, I'm not really surprised you're seeing GC pauses
above 15 seconds. I have seen pauses that long with a heap that's only 8GB.

GC pauses lasting that long will cause problems with SolrCloud. Nodes
going into recovery is common.

Thanks,
Shawn

Reply from: vishal patel (posted 2019/06/10 18:24)

An 80GB heap is ENORMOUS. And you have two of those per server. Do you
know that you need a heap that large? You only have 50 million
documents total, two instances that each have 80GB seems completely
unnecessary. I would think that one instance with a much smaller heap
would handle just about anything you could throw at 50 million documents.

With 160GB taken by heaps, you're leaving less than 100GB of memory to
cache over 700GB of index. This is not going to work well, especially
if your index doesn't have many fields that are stored. It will cause a
lot of disk I/O.

We have 27 collections, each with many schema fields. In production a great many search and index (create & update) requests come in, and most of the search requests involve sorting, faceting, grouping, and long queries.
Since roughly 40GB of heap is used on average, we allocated 80GB.

Unless you have changed the DirectoryFactory to something that's not
default, your process listing does not reflect over 700GB of index data.
If you have changed the DirectoryFactory, then I would strongly
recommend removing that part of your config and letting Solr use its
default.

Our directoryFactory in solrconfig.xml:

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

Here are our schema file, solrconfig.xml, and GC log; please review them. Is anything wrong, or do you have suggestions for improvement?
https://drive.google.com/drive/folders/1wV9bdQ5-pP4s4yc8jrYNz77YYVRmT7FG

GC log:

2019-06-06T11:55:37.729+0100: 1053781.828: [GC (Allocation Failure) 1053781.828: [ParNew
Desired survivor size 3221205808 bytes, new threshold 8 (max 8)
- age   1:  268310312 bytes,  268310312 total
- age   2:  220271984 bytes,  488582296 total
- age   3:   75942632 bytes,  564524928 total
- age   4:   76397104 bytes,  640922032 total
- age   5:  126931768 bytes,  767853800 total
- age   6:   92672080 bytes,  860525880 total
- age   7:    2810048 bytes,  863335928 total
- age   8:   11755104 bytes,  875091032 total
: 15126407K->1103229K(17476288K), 15.7272287 secs] 45423308K->31414239K(80390848K), 15.7274518 secs] [Times: user=212.05 sys=16.08, real=15.73 secs]
Heap after GC invocations=68829 (full 187):
 par new generation   total 17476288K, used 1103229K [0x0000000080000000, 0x0000000580000000, 0x0000000580000000)
  eden space 13981056K,  0% used [0x0000000080000000, 0x0000000080000000, 0x00000003d5560000)
  from space 3495232K, 31% used [0x00000004aaab0000, 0x00000004ee00f508, 0x0000000580000000)
  to   space 3495232K,  0% used [0x00000003d5560000, 0x00000003d5560000, 0x00000004aaab0000)
 concurrent mark-sweep generation total 62914560K, used 30311010K [0x0000000580000000, 0x0000001480000000, 0x0000001480000000)
 Metaspace       used 50033K, capacity 50805K, committed 53700K, reserved 55296K
}
2019-06-06T11:55:53.456+0100: 1053797.556: Total time for which application threads were stopped: 42.4594545 seconds, Stopping threads took: 26.7301882 seconds

For what reason did GC pause for 42 seconds?
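One quick way to see how often pauses like this occur is to scan the full GC log for the stopped-time lines (a Python sketch; it assumes the log was produced with -XX:+PrintGCApplicationStoppedTime, which matches the line above):

```python
import re

# Sketch: find stop-the-world pauses above a threshold in a HotSpot GC
# log that contains "Total time for which application threads were
# stopped" lines (-XX:+PrintGCApplicationStoppedTime output).

PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"([\d.]+) seconds"
)

def long_pauses(log_text, threshold_secs=10.0):
    """Return all pause durations (in seconds) above threshold_secs."""
    return [
        float(m.group(1))
        for m in PAUSE_RE.finditer(log_text)
        if float(m.group(1)) > threshold_secs
    ]

sample = ("2019-06-06T11:55:53.456+0100: 1053797.556: Total time for which "
          "application threads were stopped: 42.4594545 seconds, "
          "Stopping threads took: 26.7301882 seconds")
print(long_pauses(sample))  # [42.4594545]
```

Counting how many such pauses appear per day, and when, helps separate GC trouble from disk I/O or query load as the cause of replicas dropping out.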

There is heavy searching and indexing (create & update) in our SolrCloud.
Should we split the cloud across the 27 collections? Should we add one more shard?


Reply from: Shawn Heisey (posted 2019/06/10 23:30)

On 6/10/2019 3:24 AM, vishal patel wrote:

We have 27 collections, each with many schema fields. In production a great many search and index (create & update) requests come in, and most of the search requests involve sorting, faceting, grouping, and long queries.
Since roughly 40GB of heap is used on average, we allocated 80GB.

Unless you've been watching an actual graph of heap usage over a
significant amount of time, you can't learn anything useful from it.

And it's very possible that you can't get anything useful even from a
graph, unless that graph is generated by analyzing a lengthy garbage
collection log.

our directory in solrconfig.xml


When using MMAP, one of the memory columns should show a total that's
approximately equal to the max heap plus the size of all indexes being
handled by Solr. None of the columns in your Resource Monitor memory
screenshot show numbers over 400GB, which is what I would expect based
on what you said about the index size.

MMapDirectoryFactory is a decent choice, but Solr's default of
NRTCachingDirectoryFactory is probably better. Switching to NRT will
not help whatever is causing your performance problems, though.

Here are our schema file, solrconfig.xml, and GC log; please review them. Is anything wrong, or do you have suggestions for improvement?
https://drive.google.com/drive/folders/1wV9bdQ5-pP4s4yc8jrYNz77YYVRmT7FG

That GC log covers a grand total of three and a half minutes, so it's
useless. Heap usage is nearly constant for the full time at about 30GB.
Without a much more comprehensive log, I cannot offer any useful
advice. I'm looking for logs that last several hours; a few DAYS
would be better.

Your caches are commented out, so that is not contributing to heap
usage. Another reason to drop the heap size, maybe.

2019-06-06T11:55:53.456+0100: 1053797.556: Total time for which application threads were stopped: 42.4594545 seconds, Stopping threads took: 26.7301882 seconds

Part of the problem here is that stopping threads took 26 seconds. I
have never seen anything that high before. It should only take a
small fraction of a second to stop all threads. Something seems to be
going very wrong here. One thing that it might be is something called
"the four month bug", which is fixed by adding -XX:+PerfDisableSharedMem
to the JVM options. Here's a link to the blog post about that problem:

https://www.evanjones.ca/jvm-mmap-pause.html

It's not clear whether the 42 seconds includes the 26 seconds, or
whether there was 42 seconds of pause AFTER the threads were stopped. I
would imagine that the larger number includes the smaller number. Might
need to ask Oracle engineers. Pause times like this do not surprise me
with a heap this big, but 26 seconds to stop threads sounds like a major
issue, and I am not sure about what might be causing it. My guess about
the four month bug above is a shot in the dark that might be completely
wrong.

Thanks,
Shawn
