Release Notes for BDR 3.7
BDR 3.7.22 (2023 Aug 31)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.22 for resolved issues that affect BDR as well.
Resolved issues
- Changed the `bdr.autopartition_drop_partition()` signature to use `text`. Autopartition now drops the partition only if it exists, which helps recovery from cases when duplicate drop_partition work items are created (see the sketch after this list).
- Fixed a memory leak in `bdr.sequence_alloc` by modifying the missing catalog signature.
- Prevented the superuser check when a GUC was specified on the PG command line.
- Fixed the check for malformed connection strings to prevent failure in `bdr.create_node()`. (RT95453)
- Backported the `bdr.accept_connections` GUC.
- Fixed a memory leak in `bdr.sequence_alloc`.
- Removed the txn_config entry from the ReorderBuffer hash table.
- Ignored the global_lock check in repset functions when SDW is enabled.
- Added a check for conflicting node names.
- Fixed a crash that occurred when the BDR extension is used together with pgaudit.
- Allowed a logical join of a node even when there are foreign key constraint violations. (RT91745)
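A minimal sketch of the new `text`-based call, assuming a single partition-name argument; the partition name shown is hypothetical:

```sql
-- Hedged sketch: drop the named partition only if it exists.
-- 'public.measurement_2023_01' is a hypothetical partition name.
SELECT bdr.autopartition_drop_partition('public.measurement_2023_01');
```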
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.21 (2023 May 16)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.21 for resolved issues that affect BDR as well.
Resolved issues
- Fixed a memory leak in the consensus process (RT91830). The memory consumed per request is just 32 bytes, but when the consensus worker handles hundreds of requests per second, sustained for hours, the memory builds up. We saw 47% of memory consumed by the consensus worker when used with HARP, which executes `bdr.consensus_kv_fetch()` at a rate of 600 times per second.
- Fixed an issue where a node can be inconsistent with the group after rejoining. If a node was part of a subgroup, parted, and then rejoined to the group, it might be inconsistent with the group. The changes from some nodes of the group would be replayed from a wrong starting point, resulting in potential data loss.
- Fixed join and replication when SDW and standby_slot_names are set (RT89702, RT89536).
- Fixed upgrades for nodes with CRDTs.
- Fixed replication for subscriber-only nodes (RT89814).
- Fixed a WARNING message in `bdr.raft_leadership_transfer()` (RT92180).
- Fixed a segfault where a conflict_slot was being used (RT76439, RT92180) while using `synchronize_structures = 'none'` during join. Prevented reuse of the slot after release during multi-insert (COPY).
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.20 (2023 Feb 14)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.20 for resolved issues that affect BDR as well.
Note
This version is required for EDB Postgres Advanced Server versions 12.14.18, 13.10.14, and later.
Resolved issues
- Fix watermark handling on clusters with multiple sub-groups. The watermark is used to ensure data consistency during join. Previously, this didn't work correctly in the presence of multiple data sub-groups.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.19 (2022 Dec 13)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.19 for resolved issues that affect BDR as well.
Resolved issues
- Fix a timeout issue related to global lock handling (BDR-2836). Raft-maintained tables are now correctly locked when needed.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.18 (2022 Nov 16)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.18 for resolved issues that affect BDR as well.
Resolved issues
- Don't wait for ADD CONSTRAINT progress if DDL replication is off (BDR-2645, RT86043). Constraint validation from all nodes is not needed if we don't replicate the DDL, nor from any node that is PARTED or STANDBY.
- Fix Raft snapshot read/write routines for sequences (BDR-2666, RT86246). Adjust joining to older BDR 3.6 version nodes while using galloc sequences.
- Fix a rare segfault in `bdr.drop_node()`. Check for null values in the results from all the other nodes when trying to drop a node.
- Fix hangs in multiple concurrent joins (RT82977). Various lock corrections for functions and Raft requests that reduce the probability of distributed deadlocks.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.17 (2022 Aug 23)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.17 for resolved issues that affect BDR as well.
Resolved issues
- Fix spurious segmentation faults when conflicts are logged to bdr.conflict_history (BDR-2403, RT83436, RT83928). When conflicts are logged to the catalog `bdr.conflict_history`, the pglogical writer process could crash because of a segmentation fault due to an invalid pointer being used. This usage is now fixed.
- Clean up the replication slot when bdr_init_physical fails (BDR-2364, RT74789). If `bdr_init_physical` aborts without being able to join the node, it leaves behind an inactive replication slot. Such an inactive replication slot is now removed before an irregular exit.
Improvements
- Allow consumption of the reserved galloc sequence slot (BDR-2367, RT83437, RT68255). The galloc sequence slot reserved for future use by the background allocator can now be consumed in the presence of consensus failures.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.16 (2022 May 17)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.16 for resolved issues that affect BDR as well.
Resolved issues
- Make ALTER TABLE lock the underlying relation only once (RT80204). This avoids the ALTER TABLE operation falling behind in the queue when it released the lock between internal operations. With this fix, concurrent transactions trying to acquire the same lock after the ALTER TABLE command properly wait for the ALTER TABLE to finish.
- Show a proper wait event for CAMO / Eager confirmation waits (BDR-1899, RT75900). Show the correct "BDR Prepare Phase" / "BDR Commit Phase" in `bdr.stat_activity` instead of the default "unknown wait event".
- Correct `bdr.monitor_local_replslots` for down nodes (BDR-2080). This function mistakenly returned an okay result for down nodes before.
- Reduce logging for bdr.run_on_nodes (BDR-2153, RT80973). Don't log when setting `bdr.ddl_replication` to off if it's done with the "run_on_nodes" variants of the function. This eliminates the flood of logs from monitoring functions.
- Correct an SDW decoder restart edge case (BDR-2109). Internal testing revealed a possible error during WAL decoder recovery about a mismatch involving the confirmed_flush LSN of the WAL decoder slot, also stating: "some LCR segments might be missing". This could happen if the WAL decoder exited immediately after processing a "Standby" WAL record other than "RUNNING_XACTS", and would lead to a halt of replication with the decoder processes continuing to restart.
Improvements
- Use 64 bits for calculating lag size in bytes (BDR-2215)
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.15 (2022 Feb 15)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.15 for resolved issues that affect BDR as well.
Improvements
- Performance of COPY replication, including the initial COPY during join, has been greatly improved for partitioned tables (BDR-1479). For large tables this can improve load times by an order of magnitude or more.
- Back-port `bdr.run_on_nodes()` and `bdr.run_on_group()` from BDR 4.0 (BDR-1433). These functions behave the same as `bdr.run_on_all_nodes()` but allow running SQL on a specific group or set of nodes rather than all nodes (see the sketch after this list).
- Add the execute_locally option to bdr.replicate_ddl_command (RT73533). This allows optional queueing of DDL commands for replication to other groups without executing them locally.
- Don't ERROR on consensus issues during JOIN. The reporting of these transient errors was confusing, as they were shown in bdr.worker_errors. They are now reported as WARNINGs.
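A minimal sketch of the back-ported function, assuming a signature analogous to `bdr.run_on_all_nodes()` (node-name array plus query string, jsonb result); the node names are hypothetical:

```sql
-- Hedged sketch: run a query on two specific nodes rather than all of them.
-- 'node_a' and 'node_b' are hypothetical node names; the jsonb result
-- carries per-node status and output.
SELECT jsonb_pretty(
         bdr.run_on_nodes(ARRAY['node_a', 'node_b'],
                          'SELECT count(*) FROM bdr.node_summary')
       );
```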
Resolved issues
- The WAL decoder confirms the end LSN of the running-transactions record (BDR-1264). Confirm the end LSN of the running-transactions record processed by the WAL decoder so that the WAL decoder slot remains up to date and WAL senders get the candidate in a timely manner.
- Improve handling of node name reuse during parallel join (RT74789). Nodes now have a generation number, so it's easier to identify name reuse even if the node record is received as part of a snapshot.
- Fix locking and snapshot use during node management in the BDR manager process (RT74789). When processing multiple actions in the state machine, we make sure to reacquire the lock on the processed node and update the snapshot so that any updates happening through consensus are taken into account.
- Improve cleanup of catalogs on local node drop. Drop all groups, not only the primary one, and drop all node state history info as well.
- Don't wait for autopartition tasks to complete on parting nodes (BDR-1867). When a node has started the parting process, it makes no sense to wait for autopartition tasks on such a node to finish, since it's not part of the group anymore.
- Ensure loss of CAMO partner connectivity switches to Local Mode immediately. This prevents a disconnected partner from being reported as CAMO ready.
- Fix the cleanup of `bdr.node_pre_commit` for async CAMO configurations (BDR-1808). Previously, the periodic cleanup of commit decisions on the CAMO partner checked the readiness of its partner, rather than the origin node. This is the same node for symmetric CAMO configurations, so those were not affected. This release corrects the check for asymmetric CAMO pairings.
- Improve error checking for the join request in bdr_init_physical. Previously bdr_init_physical would simply wait forever when there was any issue with the consensus request; now we do the same checking as a logical join does.
- Improve handling of various timeouts and sleeps in consensus. This reduces the number of new consensus votes needed when processing many consensus requests or time-consuming consensus requests, for example during the join of a new node.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.14 (2021 Dec 15)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.14 for resolved issues that affect BDR as well.
Improvements
- Reduce the frequency of CAMO partner connection attempts (EE). In case of a failure to connect to a CAMO partner to verify its configuration and check the status of transactions, do not retry immediately (leading to a fully busy pglogical manager process), but throttle down repeated reconnection attempts and checks to once per minute.
- Ensure the CAMO configuration is checked again after a reconnect (EE).
- Add dummy CAMO configuration catalogs and Raft support (BDR-1676). This is just to ease rolling upgrades from BDR 3.7 to 4.0.x on CAMO-enabled installations.
- Avoid unnecessary LCR segment reads (BDR-1426). We now only attempt to read new LCR segments when some are available. This should reduce I/O load when the decoding worker is enabled.
Resolved issues
- Switch from CAMO to Local Mode only after timeouts (EE, RT74892). Do not use the `catchup_interval` estimate when switching from CAMO protected to Local Mode, as that could induce inadvertent switching due to load spikes. Use the estimate only when switching from Local Mode back to CAMO protected (to prevent toggling back and forth due to lag on the CAMO partner).
- Prevent duplicate values generated locally by a galloc sequence in high-concurrency situations when a new chunk is used (RT76528). The galloc sequence could temporarily produce duplicate values when switching which chunk is used locally (but not across nodes) if there were multiple sessions waiting for the new value. This is now fixed.
- Ensure that the group slot is moved forward when there is only one node in the BDR group. This prevents disk exhaustion due to WAL accumulation when the group is left running with just a single BDR node for a prolonged period of time. This is not a recommended setup, but the WAL accumulation was not intentional.
- Advance the Raft protocol version when there is only one node in the BDR group. Single-node clusters would otherwise always stay on the oldest supported protocol until another node was added. This could limit the feature set available on that single node.
Other changes
- Add CAMO configuration infrastructure needed for upgrade to BDR4 (BDR-1676). Add the dummy CAMO configuration infrastructure, the bdr.camo_pairs table and the bdr.add_camo_pair()/bdr.remove_camo_pair() functions, to be able to upgrade a CAMO-enabled cluster to BDR4.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.29 and higher
BDR 3.7.13.1 (2021 Nov 19)
This is a hotfix release for BDR 3.7.13.
Resolved issues
- Fix a potential FATAL error when using global DML locking with CAMO (BDR-1675, BDR-1655)
- Fix lag calculation for CAMO Local Mode delay (BDR-1681)
BDR 3.7.13
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.13 for resolved issues that affect BDR as well.
Improvements
- Use a separate replication origin for the BDR consensus process (BDR-1613). For Eager transactions that need to COMMIT PREPARED from the consensus process, use a dedicated replication origin; this way the consensus does not conflict with writer origins.
- Improve documentation of the backup/restore procedure (RT72503, BDR-1340). Recommend against dropping the extension with CASCADE, because it may drop user columns that are using CRDT types and break the sequences. It's better to use the `drop_node` function instead.
- Add the function `bdr.get_decoding_worker_stat()` (BDR-1302). If the Decoding Worker is enabled, this function shows information about the state of the Decoding Worker associated with the current database. This also provides more granular information about Decoding Worker progress than is available via `pg_replication_slots` (see the sketch after this list).
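A minimal usage sketch of the new function, using only the name given above:

```sql
-- Hedged sketch: inspect Decoding Worker progress on the current database.
-- Only returns meaningful data when the Decoding Worker is enabled.
SELECT * FROM bdr.get_decoding_worker_stat();
```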
Resolved issues
- Fix a subscriber-side memory leak when bulk-inserting into a partitioned table (BDR-1473). This improves memory usage during node join when partitioned tables are present.
- Fix `bdr.alter_sequence_set_kind` to accept a bigint as a start value (RT74294). The function was casting the value to an `int`, thus getting bogus values when `bigint` was used.
- Fix a memory leak in the consensus worker of the Raft leader (RT74769). The tracing context was leaked, causing growing memory usage in the consensus worker; on BDR groups with many nodes, this could cause memory exhaustion.
- Enable async conflict resolution for explicit 2PC (BDR-1666, RT71298). Continue applying the transaction using async conflict resolution for explicit two-phase commit.
- Fix a potential crash if `bdr.receive_lcr` is "false" (BDR-1620). Adjust the Single Decoding Worker feature to automatically disable itself if `bdr.receive_lcr` is "false". This prevents a crash when starting replication from a peer in the cluster (on restart, or new join) with `bdr.receive_lcr` disabled and `enable_wal_decoder` enabled.
Other changes
- Add a deprecation hint for `bdr.group_max_connections` (BDR-1596). Allow the `bdr.group_max_connections` option, but make sure it's properly marked as deprecated in favor of `bdr.raft_group_max_connections`. This GUC will be removed in BDR 4.0.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.28
BDR 3.7.12 (2021 Sep 21)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.12 for resolved issues that affect BDR as well.
Improvements
- Tweak Single Decoding performance with caching and better locking (BDR-1311, BDR-1312). Add caching for BDR-internal catalog information about the Decoding Worker. Split a single global lock into multiple locks (one per WAL sender) for access to internal status information of the WAL sender. This improves performance, especially with many concurrent WAL sender processes.
- Add a new view bdr.replication_status (BDR-1412). This is similar to the view `pglogical.replication_status` and shows information about the replication status of the local node with respect to all other BDR nodes in the cluster.
- Add the function bdr.wal_sender_stats(). This provides information about whether the WAL sender is using LCRs emitted by a Decoding Worker, and if so, the name of the LCR file currently being read from (see the sketch after this list).
- Prevent CAMO from being used in combination with the Decoding Worker (BDR-792). These features cannot currently work in combination. This release prevents enabling them both in many cases. This is just a best-effort strategy to prevent misconfiguration.
- Allow specifying a postgresql.auto.conf file for `bdr_init_physical` (RT72989, BDR-1400). Add a command line argument to `bdr_init_physical` allowing a custom file to be used for `postgresql.auto.conf`.
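A minimal usage sketch of the new statistics function, using only the name given above:

```sql
-- Hedged sketch: check whether WAL senders consume LCRs from the
-- Decoding Worker and, if so, which LCR file each one currently reads.
SELECT * FROM bdr.wal_sender_stats();
```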
Resolved issues
- Fix a potential data loss issue with bdr_init_physical (RT71888). When reusing a slot name, previous state was not properly cleaned up in all cases. This caused potential data loss during physical join, as the slot is created ahead of time by `bdr_init_physical` with the same name. The transition from physical to logical replication could miss part of the replication stream, as this drops and recreates the slot. This release properly cleans slot information when dropped and thereby prevents data loss.
- Fix `bdr.camo_local_mode_delay` to really kick in (BDR-1352). This artificial delay allows throttling a CAMO node that is not currently connected to its CAMO partner, to prevent it from producing transactions faster than the CAMO partner can possibly apply. In previous versions, it did not properly kick in after `bdr.global_commit_timeout` amount of lag, but only 1000 times later (due to erroneously comparing seconds to milliseconds).
- Prevent a segfault in combination with third-party output plugins (BDR-1424, RT72006). Adjust handling of logical WAL messages specific to BDR's Eager All-Node Replication mode for output plugins unrelated to BDR. This allows, for example, Debezium's decoderbufs output plugin to work alongside BDR.
- Improve compatibility with Postgres 13 (BDR-1396). Adjust to an API change in ReplicationSlotAcquire that may have led to unintended blocking when non-blocking was requested, and vice versa. This version of pglogical eliminates this potential problem, which has not been observed on production systems so far.
- Fix serialization of Raft snapshots including commit decisions (CAMO, BDR-1454). A possible mismatch in the number of tuples could lead to serialization or deserialization errors for a Raft snapshot taken after transactions using CAMO or Eager All-Node Replication were used recently and stored their commit decisions.
- Fix the `--recovery-conf` option of `bdr_init_physical`.
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.27
BDR 3.7.11 (2021 Aug 18)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Also check the release notes for pglogical 3.7.11 for resolved issues that affect BDR as well.
Improvements
- Reduce debug logging of the decoding worker (BDR-1236, BDR-1239)
- Allow configuration of the maximum connections for consensus (BDR-1005). This allows setting up very large clusters.
Resolved issues
- Fix snapshot handling in autopartition and the executor, for compatibility with the latest version of PostgreSQL.
- Fix deadlock handling in CAMO. This solves extremely slow resolution of conflicts in cross-CAMO setups.
- Get a copy of the slot tuple when logging a conflict (BDR-734). Otherwise we could materialize the row early, causing a wrong update in the presence of additional columns on the downstream.
- Improve the LCR segment removal logic (BDR-1180, BDR-1183, BDR-993, BDR-1181). Make sure we keep LCR segments for all LSNs up to the smaller of the group slot LSN and the decoding worker slot LSN.
- Fix handling of concurrent attach to the internal connection pooler while the pool owner (consensus worker) is restarting (BDR-1113).
Upgrades
This release supports upgrading from the following versions of BDR:
- 3.7.9 and higher
- 3.6.27
BDR 3.7.10 (2021 Jul 20)
This is a maintenance release for BDR 3.7 that includes minor improvements as well as fixes for issues identified in previous versions.
Improvements
- Check Raft quorum in `bdr.monitor_group_raft()` (BDR-960). Return "CRITICAL" status in `bdr.monitor_group_raft()` if at least half of the voting nodes are unreachable (see the sketch after this list).
- Allow the `bdr_monitor` role to read additional informational views (BDR-732):
  - bdr.group_camo_details
  - bdr.group_versions_details
  - bdr.group_raft_details
  - bdr.group_replslots_details
  - bdr.group_subscription_summary
- Add `is_decoder_slot` to `bdr.node_slots` to differentiate slots used by the Decoding Worker.
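A minimal monitoring sketch, using only the function name given above:

```sql
-- Hedged sketch: reports "CRITICAL" when at least half of the voting
-- nodes are unreachable, per the item above.
SELECT * FROM bdr.monitor_group_raft();
```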
Resolved issues
- Make the consensus worker always exit if the postmaster dies (BDR-1063, RT70024)
- Fix the starting LSN of the Decoding Worker after a restart (BDR-876, RT71345). When the Decoding Worker restarts, it scans the existing LCR segments to find the LSN up to which transactions are completely decoded. If this LSN is higher than the slot's confirmed LSN, it updates the slot before decoding any transactions. This avoids transactions being decoded and replicated multiple times.
- Do not synchronize the Decoding Worker's replication slot on a physical standby (BDR-738). When the WAL decoder starts for the first time, the Decoding Worker's slot needs to be behind all the WAL sender slots so that it decodes the WAL required by the WAL senders. But the slot on the primary has moved ahead of all WAL senders, so synchronizing it is not useful. It is created anew after the physical standby is promoted.
- Improve join performance when the Decoding Worker is enabled (BDR-1160, RT71345). With `fsync = on`, joining a new node to a cluster took much longer with the Decoding Worker enabled, and WAL buildup was observed on the node used as the source of the join. This was because the Decoding Worker synced the LCR segments too frequently. The issue is fixed by reducing the frequency.
- Fix TOAST handling for UPDATE/UPDATE conflicts when the Decoding Worker is used.
- Fix filtering of additional origins when the Decoding Worker is used. This mostly affects mixing BDR with the Decoding Worker and a separate pglogical replication.
- Eliminate a potential hang in `bdr.raft_leadership_transfer` (BDR-1039). In combination with `wait_for_completion`, the best-effort approach led to an infinite loop in case the original request was submitted properly but the actual leadership transfer still failed.
- Do not throw an error when the PGL manager cannot start a worker (RT71345). If the PGL manager throws an error, it is restarted. Since it's responsible for maintaining node states and other BDR management tasks, restarting it on such errors affects the EDB Postgres Distributed cluster's health. Instead, log a WARNING.
- Make the repset configuration handling during join more deterministic (RT71021). The `autoadd_tables` option might not have been respected in all cases before.
- Deprecate `pub_repsets` and `sub_repsets` in bdr.node_summary (BDR-702, RT70743). They now always show `NULL` rather than bogus info, and will be removed completely in the next major version.
- Show node and group info in `bdr.node_slots` when the origin and target node are in different groups.
- Make sure `bdr.monitor_local_replslots()` understands standby nodes and the subscriber-only group configuration, and does not check for slots that are not needed in these situations (BDR-720).
- Fix the internal connection pooler potentially not reusing free connection slots (BDR-1068).
- Fix the reported schema name in the missing-column error message (BDR-759).
BDR 3.7.9 (2021 Jun 15)
Improvements
- Add the `bdr.local_group_slot_name()` function, which returns the group slot name (BDR-931). Useful primarily for monitoring.
- Add the `bdr.workers` view, which shows additional information about BDR workers (BDR-725). Helps with monitoring of BDR-specific activity. Useful especially when joined with `bdr.stat_activity`.
- Allow Parallel Apply on logical standbys for forwarded transactions (BDR-852). Previously, parallel apply could be used only for changes replicated directly from the upstream of the logical standby, not for changes coming from another node.
- Introduce the `bdr.batch_inserts` configuration variable (RT71004, RT70727). This sets after how many consecutive `INSERT`s into the same table (in the same transaction) BDR switches to the multi-insert strategy (see the sketch after this list). This normally improves performance of replication of large data loads, be it via `INSERT`s or the `COPY` command. However, BDR 3.7.8 would always try to use this strategy, which resulted in performance degradation in workloads that do many single-row inserts only.
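A minimal sketch of setting the new variable; the threshold value shown is arbitrary and only for illustration:

```sql
-- Hedged sketch: switch to the multi-insert strategy after 4 consecutive
-- INSERTs into the same table within one transaction.
SET bdr.batch_inserts = 4;
```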
Resolved issues
- Destroy WAL decoder infrastructure on node part/drop (BDR-1107). This ensures that the WAL decoder infrastructure is removed when a node is parted from the cluster. We remove the LCR directory as well as the decoder slot. This allows the node to cleanly join the cluster again later, if need be.
- Do not start the WAL decoder on a subscriber-only node (BDR-821). The subscriber-only node doesn't send changes to any other nodes in the cluster, so it doesn't require the WAL decoder infrastructure or the WAL decoder process itself. Fixing this also ensures that subscriber-only nodes do not hold back WAL because of an unused slot.
- Start the WAL decoder only after reaching the PROMOTE state (BDR-1051). We used to create the WAL decoder infrastructure when a node started the join process. That's too early and can lead to WAL accumulation for logical standbys. Instead, we now create the WAL decoder infrastructure only when the node reaches the PROMOTE state. That's the state when other nodes may start connecting to the node and hence need the WAL decoder.
- Fix group slot advance on subscriber-only nodes (BDR-916, BDR-925, RT71182). This solves excessive WAL log retention on subscriber-only nodes.
- Use the correct slot name when joining a subscriber-only node using `bdr_init_physical` (BDR-895, BDR-898, RT71124). `bdr_init_physical` used to create the wrong slot, which resulted in two slots existing on the join source node when a subscriber-only node was joined using this method. This would result in excessive WAL retention on the join source node.
- Fix group monitoring views to allow more than one row per node (BDR-848). Group monitoring views would previously truncate the information from any node reporting more than one row of information. This would result in, for example, slots missing in `bdr.group_replslots_details`.
- Correct commit cancellation for CAMO (BDR-962). This again corrects CAMO behaviour when a user cancels a query.
- Restore global lock counter state after receiver restart (BDR-958). We already restored the locks themselves, but not the counters, which could cause deadlocks during global locking when using parallel apply.
- Fix handling of the `skip_transaction` conflict resolver when there are multiple changes in the transaction after the one that caused the `skip_transaction` (BDR-886).
- Fix Raft snapshot creation for autopartitioned tables (RT71178, BDR-955). Previously the Raft snapshot didn't take into account the state of autopartition tasks on all nodes when writing the information. This could result in some nodes skipping partition creation after a prolonged period of downtime.
- Adjust transaction and snapshot handling in autopartition (BDR-903). This ensures a valid snapshot is used during autopartition processing at all times. The previous approach would cause problems in a future point release of PostgreSQL.
- Fix KSUUID column detection in autopartition.
- Fix misreporting of node status by the `bdr.drop_node()` function.
- Ensure that the correct sequence type is always set in the global galloc sequence state.
- Fix DDL replication and locking management of several commands (BDR-874). `ANALYZE`, `CHECKPOINT`, `CLUSTER`, `PREPARE` / `COMMIT` / `ABORT TRANSACTION`, `MOVE`, `RELEASE`, and `ROLLBACK` were documented as replicated, and some of these even tried to take a DDL lock, which they should not.
- Reduce logging of some unreplicated utility commands (BDR-874). `PREPARE` and `EXECUTE` don't need to spam the log about not being replicated, as nobody expects that they would be.
- Fix global locking of `ALTER TABLE ... SET` (BDR-653). It should not take a global DML lock.
- Fix documentation about how the `TRUNCATE` command is replicated (BDR-874). While `TRUNCATE` can acquire global locks, it's not replicated the way other DDL commands are; it's replicated like DML, according to replication set settings.
- Document that CAMO and Eager currently don't work with the Decoding Worker (BDR-584).
- Multiple typo and grammar fixes in the docs.
BDR 3.7.8 (2021 May 18)
This is the first stable release of BDR 3.7. It includes both new major features and fixes for problems identified in 3.7.7.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications. See Upgrades for details.
Upgrades are supported from BDR 3.6.25 and 3.7.7 in this release.
The highlights of BDR 3.7
- Support for PostgreSQL 11, 12, and 13.
- Support for EDB Advanced Server. Both Standard Edition and Enterprise Edition are now available for use with EDB Advanced Server.
- Parallel Apply. Allows configuring the number of parallel writers that apply the replication stream. This feature is supported in Enterprise Edition only.
- AutoPartition. Allows automatic management of partitioned tables, with automated creation, automated cleanup with configurable retention periods, and more.
- Introduce an option to separate the BDR WAL decoding worker. This allows using a single decoding process on each node, regardless of the number of subscriptions connected to it. The decoded information is stored in logical change record (LCR) files, which are streamed to the other nodes in a way similar to traditional WAL. This optional separation of decoding from the walsender is supported in Enterprise Edition only.
- Implement the concept of subscriber-only nodes. These are wholly joined nodes, but they never send replication changes to other BDR nodes in the cluster. They do receive changes from all nodes in the cluster (except, of course, the other subscriber-only nodes). They do not participate in the Raft voting protocol, and hence their presence (or absence) does not determine Raft leader election. We don't need to create any replication slots on these nodes, since they don't send replication changes. Similarly, we don't need to create any subscriptions for these nodes on other BDR nodes.
- Support the `CREATE TABLE ... AS` and `SELECT INTO` statements. This feature is supported in Enterprise Edition only.
- New ability to define BDR sub-groups in order to better represent the physical configuration of the EDB Postgres Distributed cluster. This also simplifies configurations where the EDB Postgres Distributed cluster is spread over multiple data centers and only part of the database is replicated across data centers, as each sub-group automatically has a new default replication set assigned to it.
- Multiple new monitoring views. Focused primarily on group-level monitoring and in-progress monitoring on the apply side.
- Conflicts are now logged by default to `bdr.conflict_history`. Logging goes to a partitioned table with row-level security to allow easier access to conflicts for application users.
- New conflict types `multiple_unique_conflicts` and `apply_error_ddl`. These allow continuing replication in more edge-case situations.
- Reduced lock levels for some DDL statements. Also, documented workarounds that help with reducing lock levels for multiple other DDL statements.
- Use the best available index when applying updates and deletes. This can drastically improve performance for `REPLICA IDENTITY FULL` tables which don't have a primary key.
The following are the changes since 3.7.7.
Improvements
- Support Parallel Apply in EDB Advanced Server (EE)
- Increase progress reporting frequency when needed (BDR-436, BDR-522). This helps speed up the performance of VALIDATE CONSTRAINT without DML locking.
- Change all BDR configuration options that are settable from a SQL session to be settable by `bdr_superuser` rather than only the Postgres superuser.
- Set bdr.ddl_replication to off in `bdr.run_on_all_nodes()` (BDR-445). It's usually not desirable to replicate any DDL executed using the `bdr.run_on_all_nodes()` function, as it already runs it on all nodes.
- Improve monitoring of transactions that are in progress on the apply side (BDR-690, BDR-691). Add a query to pg_stat_activity when applying DDL, and several additional fields to the `bdr.subscription_summary` view which show the LSN of the latest received change, the LSN of the latest received commit, the applied commit LSN, the flushed LSN, and the applied timestamp. This helps monitoring of replication progress, especially when it comes to large transactions.
- Add the view `bdr.stat_activity`, similar to `pg_stat_activity` but showing BDR-specific wait states.
- Allow batching inserts outside of the initial data sync. Improves performance of big data loads into an existing BDR group.
- Reduce the global lock level obtained by DROP INDEX from DML Global Lock to DDL Global Lock (BDR-652)
Resolved issues
- Fix replication settings of several DDL commands. In general, make sure that actual behavior and documented behavior match for what's allowed, what's replicated, and what locks are held during DDL replication. For example, TABLESPACE-related commands should not be replicated.
- Fix a race condition in concurrent join (BDR-644, BDR-645). Always create an initially enabled subscription if the local node has already crossed the PROMOTING state.
- Set the group leader for an already held lock (BDR-418, BDR-291). This solves "canceling statement due to global lock timeout" during some DDL operations when the writer already had the table open before. This was especially a problem when partitioning or parallel apply is involved.
- Progress the WAL sender's slot based on WAL decoder input (BDR-567). Without this, the server could eventually stop working with a single decoding worker.
- Switch to TEMPORARY replication slots in `bdr_init_physical` (BDR-191). This ensures they are properly cleaned up after `bdr_init_physical` is done.
- Clean up XID progress records that are no longer required (BDR-436, BDR-532). Reduces the size of the XID progress snapshot.
- Track applied_timestamp correctly in the BDR writer (BDR-609). It was not updated in 3.7.7.
- Fix creation of BDR stream triggers on EPAS (BDR-581). They used to be created with the wrong trigger type.
- Improve error handling when options stored in an LCR file and those passed to the walsender differ (BDR-551).
- Enable the WAL decoder config only for the top node group (BDR-566). We only allow group configuration changes for the top node group in general.
- Use "C" collation or the "name" type for specific BDR catalog columns (BDR-561). This solves potential index collation issues for BDR catalogs.
- Correct commit cancellation for CAMO. This fixes CAMO behavior when a user cancels a query.
- Fix autopartition handling of tables with already existing partitions (BDR-668).
- Don't cache relations with no remote id in the BDR writer (BDR-620). Fixes replication breakage after some forms of the TRUNCATE command.
- Craft the upstream decoder slot name considering the upstream dbname in the WAL decoder (BDR-460). Fixes slot names used by the WAL decoder.
- Use the correct BDR output options in the WAL decoder and WAL senders using LCRs (BDR-714).
- Fix a crash of monitor functions on a broken cluster (BDR-580, BDR-696).
- Don't show nonexistent slots for PARTED nodes in the bdr.node_slots view.
- Drop stream triggers when dropping a node (BDR-692). This enables use of `bdr_init_physical` with stream triggers.
- Ensure we don't segfault while handling a SIGUSR2 signal. Signals can come at any point in process lifetime, so don't make any assumptions about the current state.
- Handle concurrent drop of a table, which could lead to a missing autopartition rule.
- Make sure we don't crash when we get an ERROR during handling of a different ERROR.
- Don't send a global xid to the client if we are in a background worker. There is nobody to send this to.
Other changes
- Allow session-level bdr.xact_replication = off when bdr.permit_unsafe_commands is on. Helps when using `pg_restore` to manually populate the database (see the sketch after this list).
- Various larger documentation improvements.
- Throw a nicer error when removing a table from a replication set if the table is not in the repset already (BDR-562).
- Allow the `check_constraints` option again, but make sure it's properly marked as deprecated (BDR-26). It will be removed in BDR 4.0.
- Move the management of WAL senders when the WAL decoder is enabled/disabled to the manager process (BDR-612). Managing them in the consensus worker could negatively affect responsiveness of the consensus subsystem.
- Check for interrupts in more places. This should reduce the chance of runaway loops.
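A minimal sketch of the session-level setting described in the first item above, for example before manually replaying a `pg_restore` dump:

```sql
-- Hedged sketch: disable transaction replication for this session only.
-- Per the item above, this requires bdr.permit_unsafe_commands to be on.
SET bdr.permit_unsafe_commands = on;
SET bdr.xact_replication = off;
```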
BDR 3.7.7 (2021 Apr 08)
This is a beta release of BDR 3.7. It includes both new major features and fixes for problems identified in 3.7.6.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production - for application test only
Upgrades are supported from BDR 3.6.25 and 3.7.6 in this release.
Improvements
- Support Enterprise Edition features on EDB Advanced Server. This notably excludes CAMO and Eager replication.
- Support most of the EDB Advanced Server DDL commands (EBC-45). Note that DDL related to queues is replicated, but the contents of queues are not replicated.
- Adjust DDL replication handling to follow the command level rather than the internal representation (BDR-275). This mainly makes filtering and documentation easier.
- Allow the SELECT INTO statement in Enterprise Edition (BDR-306).
- Handle BDR sequences in COPY FROM (BDR-466). COPY FROM does its own processing of column defaults, which does not get caught by the query planner hook as it only uses the expression planner. Sadly, the expression planner has no hook, so we need to process the actual COPY FROM command itself.
- Improve bdr.run_on_all_nodes (BDR-326, BDR-303). Change the return type to jsonb, always return the status of each command, and improve error reporting by returning the actual error message received from the remote server.
- Add more info to conflict_history (BDR-440). This adds a couple of new fields to the conflict history table for easier identification of tuples without having to look at the actual data. The first is origin_node_id, which points to the origin of the change, which can be different from the origin of the subscription because in some situations we forward changes from different original nodes. The second is change_nr, which represents the number of the change (based on a counter) in the transaction. One change represents one row, not one original command. These are also added to the conflict history summary table.
- Add local_time to bdr.conflict_history_summary. local_time is the partition key of bdr.conflict_history, which we need to allow monitoring queries to execute efficiently.
- Add a --node-group-name option to bdr_init_physical. Same as node_group_name in bdr.join_node_group; allows joining a sub-group of a node.
- Store LCRs under a directory named after the WAL decoder slot (BDR-60). Pglogical stores LCRs in a directory named after the replication slot used to produce them.
- Various improvements in WAL decoder/sender coordination (BDR-232, BDR-335, BDR-342). We now expose information about the WALDecoder waitlsn and let the WALSender use that information to wait and signal the WALDecoder when the required WAL is available. This avoids unnecessary polling and improves coordination between the two.
- Single Decoding Worker GUC option changes (BDR-222). Changed `bdr.receive_logical_change_records` to `bdr.receive_lcr` and `bdr.logical_change_records_cleanup_interval` to `bdr.lcr_cleanup_interval`.
- Move most of the CAMO/Eager code into BDR (BDR-330). Makes CAMO and Eager All-Node less dependent on Postgres patches.
- Support parallelization of the initial sync. When parallel apply is enabled, the initial sync during logical join is parallelized as well.
- Deprecate bdr.set_ddl_replication and bdr.set_ddl_locking.
Resolved issues
- Fix logic in `bdr_stop_wal_decoder_senders()` (BDR-232). Increase the period for which bdr_stop_wal_decoder_senders() waits before checking the status of the WAL sender again.
- Disallow running ALTER TABLE .. ADD FOREIGN KEY in some cases (EBC-38, BDR-155). If the current user does not have permission to read the referenced table, disallow the ALTER TABLE ADD FOREIGN KEY to such a table.
- Improve detection of queries which mix temporary and permanent objects. These need to be disallowed, otherwise they could break replication.
- Fix the EXPLAIN statement when using the INTO TABLE clause.
- Fix a bdr.run_on_all_nodes() crash on mixed utility commands and DMLs (BDR-305).
- Fix CTAS handling on older minor versions of EPAS.
- Consolidate table definition checks (BDR-24). This fixes several hidden bugs where we'd miss the check or the creation of extra objects.
- Fix REINDEX and DROP INDEX on an invalid index (BDR-155, EBC-41). REINDEX throws an error if the index is invalid. Users can drop invalid indexes using DROP INDEX ... IF EXISTS.
- Improve checks for local node group membership (BDR-271). A couple of functions, namely `bdr_wait_for_apply_queue` and `bdr_resynchronize_table_from_node`, didn't do this check, potentially causing a crash.
- Correct a misleading CTAS ERROR. In case of an underlying unsupported or non-replicated utility, we should error out and mention the underlying utility.
- Fixes and improvements around enabling the WAL decoder (BDR-272, BDR-427).
- Fix pglogical manager's WAL decoder infrastructure removal (BDR-484).
BDR 3.7.6 (2021 Feb 23)
This is a beta release of BDR 3.7. It includes both new major features and fixes for problems identified in 3.7.5.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production - for application test only
Upgrades are supported from BDR 3.6.25 in this release.
Improvements
- Introduce an option to separate the BDR WAL decoding worker (RM18868, BDR-51, BDR-58). This allows using a single decoding process on each node, regardless of the number of subscriptions connected to it. The decoded information is stored in logical change record (LCR) files, which are streamed to the other nodes in a way similar to traditional WAL.
- Enable parallel apply for CAMO and Eager (RM17858).
- Rework relation caching in the BDR writer. This fixes missed invalidations that happened between our cache lookup and table opening. We also reduced the number of hash table lookups (improving performance).
- Don't allow mixing temporary and permanent objects in a single DDL command (BDR-93). It's important not to try to replicate DDL that works with temporary objects, as such DDL is sure to break replication.
- Add bdr.alter_subscription_skip_changes_upto() (BDR-76). Allows skipping replication changes up to a given LSN for a specified subscription (see the sketch after this list). A similar function already exists in pglogical.
- Make the snapshot entry handler lookup more robust (BDR-86). This should make it harder to introduce future bugs with consensus snapshot handling.
- Add bdr.consensus_snapshot_verify() (BDR-124). Can be used to verify that a consensus snapshot is correct before passing it to bdr.consensus_snapshot_import().
- Add support for most DDL commands that are specific to EDB Postgres Advanced Server (EBC-39, EBC-40).
- Reduce WARNING spam on non-replicated commands that are not expected to be replicated in the first place (like VACUUM).
- Improve warnings and hints around CAMO configuration.
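A minimal sketch of the new function, assuming a pglogical-style signature (subscription name, target LSN); both values shown are hypothetical:

```sql
-- Hedged sketch: skip replication changes up to the given LSN for one
-- subscription. Subscription name and LSN are hypothetical examples.
SELECT bdr.alter_subscription_skip_changes_upto('bdr_mydb_mygroup_node_b',
                                                '0/3A157D60');
```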
Resolved issues
- Make sure we have an xid assigned before opening a relation in the writer. This should improve deadlock detection for parallel apply.
- Check the table oid in the function drop_trigger (BDR-35). Fixes a crash when an invalid oid was passed to the function.
- Fix application of older consensus snapshots (BDR-231). We used to not handle a missing group UUID correctly, resulting in a 3.7 node not being able to join a 3.6 cluster.
- Readjust default truncate handling (BDR-25). Don't take a lock by default. While this can cause potential out-of-order truncation, it provides better backwards compatibility.
- Fix a crash when the OPTIONS clause is used in a CREATE FOREIGN TABLE statement (EBC-37).
- Ensure that we don't send extra data while talking to a node with an old consensus protocol (BDR-135).
- Read the kv_data part of the consensus snapshot in mixed-version groups (BDR-130). Both BDR 3.6 and 3.7 write this part of the consensus snapshot, but BDR 3.7 would only read it if the snapshot was also written by 3.7.
- Move bdr.constraint to the EE script (EBC-36). It's an Enterprise Edition-only feature, so the catalog should only be installed with Enterprise Edition.
- Don't try to replicate GRANT/REVOKE commands on TABLESPACE and Large Objects. These objects are not replicated, so trying to replicate GRANT and REVOKE would break replication.
- Make sure CAMO does not block replay progress (RT69493).
- Fix failed CAMO connection handling (RT69493, RM19924). Correct the state machine to properly clean up and recover from this failure and reset to the UNUSED & IDLE state.
- Don't accept Raft requests from unknown nodes. The consensus leader should not accept Raft requests from nodes it does not know.
- Don't try to negotiate the consensus protocol on unknown node progress (RT69779). When a node is forcefully dropped, we might still receive a progress message from it. We have to gracefully ignore such a message, otherwise consensus could break in such a situation.
Other changes
- Remove code for unsupported consensus protocols (BDR-86)
BDR 3.7.5 (2021 Jan 19)
This is a beta release of BDR 3.7. It includes both new major features and fixes for problems identified in 3.7.4.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production - for application test only
Upgrades are supported from BDR 3.6.22 in this release.
Improvements
Reduce "now supports consensus protocols" log spam. (RT69557)
Extend
bdr.drop_node
with anode_state
check. (RM19280)
Adds a new argument 'force' tobdr.drop_node
, defaulting to false, in which case the following additional check is performed: Viabdr.run_on_all_nodes
, the currentnode_state
of the node to be dropped is queried. If the node to be parted is not fully parted on all nodes, this now yields an error. The force argument allows to ignore this check. This feature also removes the "force" behavior thatcascade
had before, now we have two distinct options, one to skip sanity checks (force) and one to cascade to dependent objects (cascade).Deprecate
pg2q.enable_camo
(RM19942, RT69521)
The parameter has been changed in 3.7 to the newbdr.enable_camo
.Add new parameter
detector_args
tobdr.alter_table_conflict_detection
(RT69677)
Allow additional parameters for individual detectors. Currently just adds atttype for row_version which allows using smallint and bigint, not just the default integer for the column type.Add
bdr.raft_leadership_transfer
(RM20159)
Promote a specific node as the Raft leader. Per Raft paper, transferring leadership to a specific node can be done by the following steps:- the current leader stops accepting new requests
- the current leader sends all pending append entries to the designated leader
- the current leader then forces an election timeout on the designated leader, giving it a better chance to become the next leader
The feature pretty much follows that outline. Instead of sending append entries just to the designated leader, we send it to all nodes as that also acts as a heartbeat. That should ensure that no other node times out while the current leader delegating power to the designated node. We also check status of the designated node and don't accept the request if the node is not an active node or if it doesn't have voting rights.
Implement the concept of
subscriber-only
nodes
These are wholly joined nodes, but they don't ever send replication changes to other BDR nodes in the cluster. But they do receive changes from all nodes in the cluster (except, of course the other subscriber-only nodes). They do not participate in the Raft voting protocol, and hence their presence (or absence) does not determine Raft leader election. We don't need to create any replication slots on these nodes since they don't send replication changes. Similarly, we don't need to create any subscriptions for these nodes on other BDR nodes. We implement this by defining a new type of BDR node group, called "subscriber-only" group. Any node supposed to be a subscriber-only node should join this node group instead of the top level BDR group. Of course, someone needs to create the subscriber-only BDR nodegroup first. The feature does not attempt to create it automatically.Improve DDL replication support for PostgreSQL 13
TheALTER STATISTICS
andALTER TYPE ... SET
commands are now supported.
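A minimal sketch of the extended call, assuming named-argument notation for the cascade and force options described above; the node name is hypothetical:

```sql
-- Hedged sketch: skip the node_state sanity check with force, while
-- leaving cascade off. 'failed_node' is a hypothetical node name.
SELECT bdr.drop_node('failed_node', cascade := false, force := true);
```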
Resolved issues
- Relax the safety check in `bdr.drop_node` (RT69639). If a node is already dropped on any peer node, that peer does not know the status of the node to drop. It must still be okay to drop that node.
- Do not re-insert a deleted autopartition rule. When an autopartition rule is dropped by one node and, while the action is being replicated on some other node, that node executes one or more pending tasks for the table, we might accidentally re-insert the rule just being dropped. That leads to problems, as we then fail to drop the table on the remote node because the dependency check on autopartition rules fails.
- Fix the definition of the `node_summary` and `local_node_summary` views (RT69564). While the underlying pglogical catalogs support multiple interfaces per node, BDR will only ever use one, the one that's named the same as the node. These views didn't reflect that and showed wrong information: if the node had multiple interfaces, the node_summary view would show multiple results, and the local_node_summary would not necessarily pick the correct one from those either.
- Fix `bdr.node_log_config` (RM20318). Adjust the view `bdr.node_log_config` to correctly return the conflict resolution.
- Fix table access statistics reporting inside the writer. This should fix PostgreSQL monitoring views that show access and I/O statistics for tables, which were broken in previous betas.
- Fix the partitioning of `bdr.conflict_history` after an upgrade from 3.6. Previously we'd keep the 3.6 definition; now we do the automatic partitioning the same way as fresh 3.7 installs.
- Fix node name reuse for nodes that get initialized from a snapshot (RM20111). These nodes previously missed initial state info, which could cause the catchup phase of the join process to be skipped, with the new node missing concurrently written data as a result. This now works correctly.
- Fix a potential crash on table rewrite (`VACUUM FULL`) on Standard Edition (EBC-34). The check for triggers on Standard Edition could previously cause a crash on table rewrite.
- Don't try to drop Enterprise Edition objects when removing a node in Standard Edition (RM19581).
- Improve documentation language.
BDR 3.7.4 (2020 Nov 05)
This is a beta release of BDR 3.7. It includes both new major features and fixes for problems identified in 3.7.3.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production - for application test only
Upgrades are supported from BDR 3.6.22 in this release.
Improvements
- Add support for PostgreSQL 13.
- Extend `bdr.get_node_sub_receive_lsn` with an optional `committed` argument. The default behaviour has been corrected to return only the last received LSN for a committed transaction to apply (filtered), which is the original intent and use of the function (e.g. by HARP). Passing `false` lets this function return the unfiltered most recent LSN received, matching the previous version's behavior (see the sketch after this list). This change is related to the hang in `bdr.wait_for_apply_queue` mentioned below.
- Error out if INCREMENT BY is more than the galloc chunk range (RM18519). The smallint, int, and bigint galloc sequences get 1000, 1000000, and 1000000000 values allocated in each chunk, respectively. We error out if the INCREMENT value is more than these ranges.
- Add support for validating constraints without a global DML lock (RM12646). The DDL operation ALTER TABLE ... ADD CONSTRAINT can take quite some time due to the validation to be performed. BDR now allows deferring the validation and running the ALTER TABLE ... VALIDATE CONSTRAINT part without holding the DML lock during the lengthy validation period. See the section "Adding a CONSTRAINT" in the "DDL Replication" chapter of the documentation for more details.
- ALTER TABLE ... VALIDATE CONSTRAINTS waits for completion. Instead of expecting the user to explicitly wait for completion of this DDL operation, BDR now checks progress and waits for completion automatically.
- Add a new conflict kind `apply_error_ddl` and resolver `skip_transaction` (RM19351). Can be used to skip transactions where DDL replication would cause an `ERROR`, for example when the same DDL was applied manually on multiple nodes.
- Add new statistics to `bdr.stat_subscription` (RM18548):
  - nabort - how many aborts the writer got
  - how many errors the writer has seen (currently same as above)
  - nskippedtx - how many transactions the writer skipped (using the `skip_transaction` conflict resolver)
  - nretries - how many times the writer retried without a restart/reconnect
- Improve SystemTap integration, especially for global locking.
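A minimal sketch of the extended function, using a hypothetical node name:

```sql
-- Hedged sketch: the default returns the last received LSN of a committed
-- transaction; committed := false returns the unfiltered most recent LSN
-- received, as described above. 'node_a' is a hypothetical node name.
SELECT bdr.get_node_sub_receive_lsn('node_a', committed := false);
```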
Resolved issues
Correct a hang in
bdr.wait_for_apply_queue
(RM11416, also affects CAMO)
Keepalive messages possibly move the LSN forward. In an otherwise quiescent system (without any transactions processed), this may have led to a hang inbdr.wait_for_apply_queue
, because there may not be anything to apply for the corresponding PGL writer, so theapply_lsn
doesn't ever reach thereceive_lsn
. A proper CAMO client implementation usesbdr.logical_transaction_status
, which in turn uses the affected function internally. Thus a CAMO switch- or fail-over could also have led to a hang. This release prevents the hang by discarding LSN increments for which there is nothing to apply on the subscriber.Allow consensus protocol version upgrades despite parted nodes (RM19041)
Exclude already parted nodes from the consensus protocol version negotiation, as such nodes do not participate in the consensus protocol any more. Ensures the newest protocol version among the set of active nodes is used.Numerous fixes for galloc sequences (RM18519, RM18512) The "nextval" code for galloc sequences had numerous issues:
- Large INCREMENT BY values (+ve or -ve) were not working correctly
- Large CACHE values were not handled properly
- MINVAL/MAXVAL not honored in some cases The crux of the issue was that large increments or cache calls would need to make multiple Raft fetch calls. This caused the loop retry code to be invoked multiple times. The various variables to track the loops needed adjustment.
Fix tracking of the last committed LSN for CAMO and Eager transactions (RM13509)
The GUCbdr.last_committed_lsn
was only updated for standard asynchronous BDR transactions, not for CAMO or Eager ones.Fix a problem with NULL values in
bdr.ddl_epoch
catalog (RM19046, RM19072)
Release 3.7 added a newepoch_consumed_lsn
column tobdr.ddl_epoch
catalog. Adding a new column would set the column value to NULL in all existing rows in the table. But the code failed to handle the NULL values properly. This could lead to reading garbage values or even memory access errors. The garbage values can potentially lead to global lock timeouts as a backend may wait on a LSN which is far into the future.We fix this by updating all NULL values to '0/0' LSN, which is an invalid value representation for LSN. The column is marked NOT NULL explicitly and the code is fixed to never generate new NULL values for the column.
Corrections for upgrading from BDR 3.6.22
Properly migrate the subscription writer and conflict handlers from pglogical, where this information used to live in BDR 3.6, and ensure bdr.conflict_history is handled properly after an upgrade.
Fix JOINING state handling on consensus request timeout (RT69076)
A timeout during JOINING state handling could leave a node unable to join the BDR group. The retry logic now handles this state correctly.
Validate inputs to replication_set_remove_table (RT69248, RM19620)
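For reference, the validated call looks roughly like this; the replication set and table names are hypothetical, and the argument order shown follows the pglogical convention, so check the function's documentation for the exact signature.

```sql
-- Invalid inputs (e.g. a nonexistent replication set or relation)
-- are now rejected with a clean ERROR instead of misbehaving.
SELECT bdr.replication_set_remove_table('myrepset', 'public.accounts');
```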
Handle a missing column gracefully for ALTER COLUMN TYPE (RM19389, RT69114)
Throw the standard ERROR rather than crashing when this happens.
Fix memory handling of a tuple slot during conflict lookup (RM18543)
No longer crashes when the found tuple is logged into the conflict log table.
Fix local node cache invalidation handling (RM13821)
Previously, BDR might not notice node creation or node drop due to race conditions and would choose the wrong behavior inside user backends.
BDR 3.7.3 (2020 Aug 06)
This is a beta release of BDR 3.7. It includes both major new features and fixes for problems identified in 3.7.2.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes, some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production; it is for application testing only.
Upgrading from 3.6 is not yet supported in this release.
Improvements
Parallel Apply (RM6503)
Using the new infrastructure in pglogical 3.7.3, add support for parallel writers. The defaults are controlled by the same pglogical configuration options, so this feature is currently off by default. The number of parallel writers can be changed per group using the num_writers parameter of the bdr.alter_node_group_config() administration interface.
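For example, something along these lines should enable four writers for a group; the group name is hypothetical, and named-argument notation is assumed since the function has many optional parameters.

```sql
-- Enable four parallel writers for the group 'mygroup'.
SELECT bdr.alter_node_group_config('mygroup', num_writers := 4);
```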
resynchronize_table_from_node() works with generated columns (RM14876)
It copies all columns except the generated ones from the remote node and computes the generated column values locally.
resynchronize_table_from_node() freezes the table on the target node (RM15987)
When this function is used, the target table is truncated first and then copied into on the destination node. The copied tuples are additionally frozen during the resync, which avoids the large amount of hint-bit-related I/O and WAL activity that would otherwise occur later on this destination node.
Allow use of CRDTs on databases with the BDR extension installed but without any node (RM17470)
Previously, restoring CRDT values on a node with the BDR extension but without any node would fail with an ERROR, because the CRDT data types query the node identifier. This is now fixed by storing an InvalidOid value when the node identifier is not available. If the node is subsequently added to a BDR cluster, the InvalidOid is replaced by a proper node identifier the next time the CRDT value is updated.
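As an illustration of the kind of column involved, here is a hedged sketch using bdr.crdt_gcounter, one of BDR's CRDT data types; the table is hypothetical.

```sql
-- A grow-only counter CRDT column; with this fix, such values can
-- be restored even before the database has a BDR node defined.
CREATE TABLE page_hits (
    page text PRIMARY KEY,
    hits bdr.crdt_gcounter NOT NULL DEFAULT 0
);
UPDATE page_hits SET hits = hits + 1 WHERE page = '/home';
```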
Add a consistent KV store implementation for use by the HARP project (RM17825)
This is not meant for direct user consumption, but it enables HARP to work with BDR without additional consensus setup.
Resolved issues
Re-add the "local_only" replication origin (RT68021)
Usingbdr_init_physical
may have inadvertently removed it due to a bug that existing up until release 3.6.19. This release ensures to recreate it, if it's missing.Handle NULL arguments to bdr.alter_node_set_log_config() gracefully (RT68375, RM17994)
The function caused a segmentation fault when its first argument was NULL. It now reports an appropriate error message instead.
Fix MAXVALUE and MINVALUE with galloc sequences (RM14596)
While fetching values in advance, the sequence could reach its limit; now only the values fetched before reaching the limit are used.
Optionally wait for replication of changes triggered by the prior epoch (RM17594, RM17802)
This improves handling of multiple concurrent DDL operations across the BDR group, which previously could result in a global lock timeout but are now allowed to pass as long as the replication lag between nodes is not too large.
resynchronize_table_from_node() now correctly checks membership of the resynchronized table in the replication sets subscribed by the target node (RM17621)
This is important so that unprivileged users cannot copy tables they don't otherwise have the ability to access.
Allow a new group creation request to work after a previous attempt has failed (RM17482)
Previously, if the initial group creation failed, new requests would in some setups always fail until BDR was completely removed from the node and reinstalled.
Lower the CPU consumption of the consensus worker when the AutoPartition feature is used (RM18002)
Fix memory leak during initial data synchronization (RM17668)
Fix update_recently_deleted conflict detection (RM16471)
This conflict was not detected correctly in 3.7.2.
Check the options when altering a galloc sequence (RM18301, RT68470)
Galloc sequences do not accept some modifications; the user is now warned when disallowed options are used.
Make sure bdr_wait_slot_confirm_lsn waits for all slots (RM17478)
This function used to skip some of the slots when checking whether the downstream had replicated everything.
Improve PART_CATCHUP node state handling (RM17418)
Resolves cases where a node's state would stay PART_CATCHUP forever due to a race condition between nodes.
Make the consensus process more resilient when there are missing parted nodes
Don't fail when trying to update a node's state to PARTED if the node no longer exists.
Remove the --recovery-conf argument from bdr_init_physical (RM17196)
It didn't work previously anyway, and PostgreSQL 12 no longer has recovery.conf.
Other improvements
Enable bdr.truncate_locking by default
This is needed for TRUNCATE operations to always produce consistent results when there is concurrent DML happening in the BDR group. This was missed by the previous beta.
Create a virtual sequence record on other nodes (RM16008)
If a galloc sequence is created and its value is used in the same transaction block, it used to error out with "could not fetch next sequence chunk" on the other nodes, because the sequence did not exist there yet. This is solved by creating a virtual record on the other nodes.
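The pattern that now works is roughly the following; the names are hypothetical, and the conversion again assumes bdr.alter_sequence_set_kind().

```sql
BEGIN;
CREATE SEQUENCE order_ids;
SELECT bdr.alter_sequence_set_kind('order_ids'::regclass, 'galloc');
-- Previously this could fail on other nodes with
-- "could not fetch next sequence chunk"; the virtual
-- record on the other nodes now prevents that.
SELECT nextval('order_ids');
COMMIT;
```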
Significant improvements to the language in the documentation.
BDR 3.7.2 (2020 Jun 01)
This is a beta release of BDR 3.7.
Important notes
BDR 3.7 introduces several major new features as well as architectural changes, some of which affect backward compatibility with existing applications. See Upgrades for details.
Beta software is not supported in production; it is for application testing only.
Upgrading from 3.6 is not yet supported in this release.
The highlights of BDR 3.7
Parallel Apply
Allows configuring the number of parallel writers that apply the replication stream.
AutoPartition
See AutoPartition for details.
Support the CREATE TABLE ... AS statement (RM9696)
This feature is supported in the Enterprise Edition only.
New ability to define BDR sub-groups in order to better represent the physical configuration of the EDB Postgres Distributed cluster
This also simplifies configurations where the EDB Postgres Distributed cluster is spread over multiple datacenters and only part of the database is replicated across datacenters, as each subgroup automatically gets a new default replication set assigned to it.
Conflicts are now logged by default to bdr.conflict_history
Conflicts are logged to a partitioned table with row-level security, allowing application users easier access to the conflicts that concern them.
New conflict type multiple_unique_conflicts
Allows resolution of complex conflicts involving multiple UNIQUE constraints for both INSERT and UPDATE.
Merge the views bdr.node_replication_rates and bdr.node_estimate into bdr.node_replication_rates; bdr.node_estimate has been removed (RM13523)
Don't replicate the REINDEX command, which is now treated as a maintenance command
Various other changes to default settings
Other improvements
Optional monitoring tables for describing node connections and geographical distribution
Add the bdr.resynchronize_table_from_node function (RM13565, RM14875)
This function resynchronizes the relation from a remote node. It acquires a global DML lock on the relation, truncates the relation locally, and copies data into it from the remote node. The relation must exist on both nodes with the same name and definition.
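A minimal invocation might look like this; the node and table names are hypothetical, and the exact argument types should be checked against the function's documentation.

```sql
-- Rebuild the local copy of public.accounts from node 'node1'
-- (takes a global DML lock and truncates the local table first).
SELECT bdr.resynchronize_table_from_node('node1', 'public.accounts');
```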
Add a function bdr.trigger_get_origin_node_id to be used in conflict triggers (RM15105, RT67601)
This enables users to define their conflict triggers such that a trusted node always wins in case of DML conflicts.
Extend bdr.wait_for_apply_queue to wait for a specific LSN (RM11059, RT65827)
Add committed LSN reporting via bdr.last_committed_lsn (RM11059, RT65827)
BDR now also accepts URIs in connection strings (RM14588)
The connection string can now also be specified in the URI format "postgresql://...".
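For example, a URI-format DSN could be passed wherever a connection string is accepted, such as when creating a node; the names below are hypothetical.

```sql
-- URI-format connection string instead of key/value format.
SELECT bdr.create_node(
    'node1',
    'postgresql://bdr_user@host1:5432/bdrdb'
);
```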
Resolved issues
Resilience against idle_in_transaction_session_timeout (RM13649, RT67029, RT67688)
Set idle_in_transaction_session_timeout to 0 internally, so that a user setting cannot close the connection and invalidate the snapshot.
Correct parsing of BDR WAL messages (RT67662)
In rare cases, a DDL that is replicated across an EDB Postgres Distributed cluster and requires a global lock could cause errors such as "invalid memory alloc request size" or "insufficient data left in message" due to incorrect parsing of direct WAL messages. The code has been fixed to parse and handle such WAL messages correctly.
Fix locking in ALTER TABLE with multiple subcommands (RM14771)
Multiple ALTER TABLE subcommands should honor the locking requirements of the overall set: if one subcommand needs the locks, the entire ALTER TABLE command needs them as well.
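To make the rule concrete, consider a multi-subcommand statement like the following (hypothetical table): if any subcommand requires the stronger lock, the whole statement now acquires it.

```sql
-- The entire statement honors the strongest locking requirement
-- among its subcommands.
ALTER TABLE accounts
    ADD COLUMN note text,
    ALTER COLUMN balance SET NOT NULL;
```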