This is probably our number one reliability issue at the moment, and has been a problem for a very long time.
The existing CHECKPOINT_LOCK would prevent new connections being created when we are checkpointing or about to checkpoint. However, in many cases we obtain the db connection early on and then don't perform any queries until later. An example would be in synchronization, where the connection is obtained at the start of the process and then retained throughout the sync round. My suspicion is that we were encountering this series of events:
1. Open connection to database
2. Call maybeCheckpoint() and confirm there are no active transactions
3. An existing connection starts a new transaction
4. Checkpointing is performed, but deadlocks due to the in-progress transaction
This potential fix includes preparedStatement.execute() in the CHECKPOINT_LOCK, to block any new transactions being started when we are locked for checkpointing. It is fairly high risk so we need to build some confidence in this before releasing it.
This is probably the most efficient way to process the data on the fly, but it's still not very scalable. A better approach would be to pre-process the HTML when building the file structure, and then serve them completely statically (i.e. using a standard webserver rather than via application memory). But it makes sense to keep it this way for development and maybe early beta testing.
Rename to zh_SC for better distinguish between zh_SC (Simple Chinese)and zh_TC(Traditional Chinese)
Rephrase some of the words for better understanding.
This can be used to preview a site before signing a transaction and announcing it to the network. The response will need reworking to return JSON (along with most of the other new APIs)
This fixes an NPE when trying to send a file that doesn't exist. It also removes the caching, which we can add again later if it turns out to be needed.
Now that we aren't disconnecting mid sync, we can get away with more frequent disconnections. This brings the average connection length to around 9 mins.
Connection limits are defined in settings (denominated in seconds):
"minPeerConnectionTime": 120,
"maxPeerConnectionTime": 3600
Peers will disconnect after a randomly chosen amount of time between the min and the max. The default range is 2 minutes to 1 hour, as above.
This encourages nodes to connect to a wider range of peers across the course of each day, rather than staying connected to an "island" of peers for an extended period of time. Hopefully this will reduce the amount of parallel chains that can form due to permanently connected clusters of peers.
We may find that we need to reduce the defaults to get optimal results, however it is best to do this incrementally, with the option for reducing further via each node's settings. Being too aggressive here could cause some of the earlier problems (e.g. 20% missing blocks minted) to reappear. We can re-evaluate this in the next version. Note that if defaults are reduced significantly, we may need to add code to prevent this from happening mid-sync. With higher defaults, this is less of an issue.
Thanks to @szisti for supplying some base code for this commit, and also to @CWDSYSTEMS for diagnosing the original problem.