The test environment
pgsql-ha
service name | cpu | memory | disk |
---|---|---|---|
pgpool | 2 | 4 | HHD 200G |
pg-master | 8 | 16 | HHD 500G |
pg-slave1 | 8 | 16 | HHD 500G |
pg-slave2 | 8 | 16 | HHD 500G |
diagram
Citus-ha
master
service name | cpu | memory | disk |
---|---|---|---|
pgpool | 2 | 4 | HHD 200G |
pg-master | 8 | 16 | HHD 500G |
pg-slave1 | 8 | 16 | HHD 500G |
pg-slave2 | 8 | 16 | HHD 500G |
worker-node * 2
service name | cpu | memory | disk |
---|---|---|---|
pgpool & pg-master | 4 | 8 | HHD 500G |
pg-slave1 | 4 | 8 | HHD 500G |
pg-slave2 | 4 | 8 | HHD 500G |
diagram
report
320W | pgsql-ha | citus-shard8 | citus-shard12 | citus-shard16 | citus-shard20 | citus-shard24 | citus-shard28 | citus-shard32 |
---|---|---|---|---|---|---|---|---|
no index | 10781.747 ms | 2437.940 ms | 2213.936 ms | 2194.031 ms | 2022.280 ms | 2094.108 ms | 2040.045 ms | 2138.645 ms |
index | 8669.755 ms | 1717.051 ms | 1667.949 ms | 1634.968 ms | 1447.810 ms | 1303.432 ms | 1306.975 ms | 1314.583 ms |
sql
explain analyse
SELECT
"col0" AS diffcol0,
count('*') AS count
FROM
"stream_620392b78a6fdca19ee6ba54_300w" AS tab0,
(
SELECT
tab0._id,
c_unnest_taxonomy_with_null("f_fxi_bc70_t_labels$textual") AS "col0"
FROM
"stream_620392b78a6fdca19ee6ba54_300w" as tab0
) AS tab1
WHERE
(
(
"f_sxi_f158_ngrams$textual" && (ARRAY ['汽车_之家', '销售_顾问']::text [])
)
AND "tab1"."_id" = "tab0"."_id"
)
GROUP BY
"diffcol0"
ORDER BY
"diffcol0" ASC NULLS FIRST
LIMIT
200 OFFSET 0;
Tips
distributed
You can create indexes for distributed first. If you create an index first and then distribute, you will wait a long time
undistributed
You can clear the index first and undistribute it. If you have index in your table and undistributed it, you will be waiting too long and may cause unknown errors
ERROR
Controls are used to query on client-endpoint, An error.
SELECT
topic_id, token_name, topic_weight, mp_mn_arr, (
( first_text ).doc_id, ( first_text ).pos, ( first_text ).neg, content_fields, ( first_text ).score
) :: first_text, row_number_by_topic, bytopic_token_count
FROM
(
SELECT
topic_id, token_name, topic_weight, mp_mn_arr, first_text, row_number_by_topic, COUNT ( * ) OVER ( PARTITION BY topic_id ) AS bytopic_token_count
FROM
(
WITH topic_unwind AS (
SELECT UNNEST
( "f_xxi_bd74_topics$numerical" ) AS topic_id, tab0._id AS ID, NAME AS token_name, "f_rxi_f161_s_positive$numerical" AS positive, "f_pxi_f161_s_negative$numerical" AS negative, analysis.weight AS arr_weight
FROM
"stream_620392b78a6fdca19ee6ba54" AS tab0, "analysis_620395c98a6fdca19ee6bd74" AS analysis
WHERE
tab0._id = analysis._id
AND "f_xxi_bd74_topics$numerical" != ARRAY [ NULL ]:: FLOAT []
) SELECT
topic_id, token_name, c_sentiment_score_aggregation ( positive, negative ) AS mp_mn_arr, c_best_topic_score_aggregation (
ID, positive, negative, arr_weight [ topic_id + 1 ]
) AS first_text, ROW_NUMBER ( ) OVER (
PARTITION BY topic_id
ORDER BY
SUM ( arr_weight [ topic_id + 1 ] ) DESC
) AS row_number_by_topic, SUM ( arr_weight [ topic_id + 1 ] ) AS topic_weight
FROM
topic_unwind
GROUP BY
topic_id,
token_name,
arr_weight
) AS to_reg_token
WHERE
row_number_by_topic < 21
) filtered_tb
LEFT JOIN LATERAL (
SELECT
array_to_string( ARRAY [ "f_pwi_ba54_顾问$textual" ], ' \' ) AS content_fields
FROM
"stream_620392b78a6fdca19ee6ba54" s
WHERE
s._id = ( first_text ).doc_id
) f2 ON TRUE
ORDER BY
topic_id
```
```sql
[ ERROR ][dev-chinarc][2022-02-14 08:05:06 - handler.linked.relations-34-139732885411584] - Runtime error: direct joins between distributed and local tables are not supported
HINT: Use CTE's or subqueries to select from local tables and use them in joins
Traceback (most recent call last):
File "client_endpoint/handlers/actions/analyses.py", line 339, in client_endpoint.handlers.actions.analyses.AnalysesActions.POST_generate_views
File "aggregation/controllers_sql/unstructure_controller.py", line 154, in aggregation.controllers_sql.unstructure_controller.UnstructureWidgetController.generate_packet
File "aggregation/controllers_sql/unstructure_widgets.py", line 854, in aggregation.controllers_sql.unstructure_widgets.TopicWheelWidget.generate_packet
File "aggregation/controllers_sql/unstructure_widgets.py", line 190, in aggregation.controllers_sql.unstructure_widgets.PSQLQueryExecutor.__enter__
File "aggregation/storage/postgres.py", line 917, in aggregation.storage.postgres.RenewCursor.execute
File "aggregation/storage/postgres.py", line 994, in aggregation.storage.postgres.RenewCursor.__execute
psycopg2.errors.FeatureNotSupported: direct joins between distributed and local tables are not supported
HINT: Use CTE's or subqueries to select from local tables and use them in joins
Try to distribute the Analysis table. An error.
[ ERROR ][dev-chinarc][2022-02-14 08:10:42 - handler.linked.relations-32-139733829428992] - Runtime error: cannot pushdown the subquery
DETAIL: Complex subqueries and CTEs cannot be in the outer part of the outer join
,
Traceback (most recent call last):
File "client_endpoint/handlers/actions/analyses.py", line 339, in client_endpoint.handlers.actions.analyses.AnalysesActions.POST_generate_views
File "aggregation/controllers_sql/unstructure_controller.py", line 154, in aggregation.controllers_sql.unstructure_controller.UnstructureWidgetController.generate_packet
File "aggregation/controllers_sql/unstructure_widgets.py", line 854, in aggregation.controllers_sql.unstructure_widgets.TopicWheelWidget.generate_packet
File "aggregation/controllers_sql/unstructure_widgets.py", line 190, in aggregation.controllers_sql.unstructure_widgets.PSQLQueryExecutor.__enter__
File "aggregation/storage/postgres.py", line 917, in aggregation.storage.postgres.RenewCursor.execute
File "aggregation/storage/postgres.py", line 994, in aggregation.storage.postgres.RenewCursor.__execute
psycopg2.errors.FeatureNotSupported: cannot pushdown the subquery
DETAIL: Complex subqueries and CTEs cannot be in the outer part of the outer join