Game of Trees Repos

Commits

Commit:: db9b9b1c2b70d98419e70b05e7283b2284bedbec
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue Jun 14 20:26:15 2022 UTC

let got-read-pack be explicit about whether it could enumerate all objects

This allows the main process to avoid looping over all object IDs again in case the pack file used for enumeration is complete.

ok op@


Commit:: eb7b30a1caf056832bec7619ececf88efa18f6bd
From:: Stefan Sperling <stsp@stsp.name>
Date:: Mon Jun 13 17:13:59 2022 UTC

fix error handling in find_pack_for_enumeration(); pointed out by op@


Commit:: 0ab4c95723904e176687f5edc131bdf422dd261a
From:: Stefan Sperling <stsp@stsp.name>
Date:: Mon Jun 13 17:13:59 2022 UTC

Bring back object enumeration inside got-read-pack as a fast path.

The problem that was found in the earlier version has been fixed.

ok op@


Commit:: e44d939152693c16e95d2855b539ad6b30e81b15
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue Jun 7 19:20:01 2022 UTC

revert object enumeration in got-read-pack for now; needs more work

This implementation marked commits and trees as enumerated before all trees which they depend on were enumerated. This behaviour leads to incomplete pack files when a tree is only partially packed and got-read-pack hits a missing tree entry as a result.

The algorithm must be reworked such that packed leaf nodes are marked enumerated first, and the enumerated state then bubbles up to their parents.

Found by op@
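The "leaves first, then bubble up" ordering described above can be illustrated with a post-order walk. This is only a sketch under stated assumptions, not the rework that later went into got-read-pack: `struct node_sketch`, `MAX_CHILDREN`, and `enumerate()` are hypothetical names, and the in-memory tree stands in for packed commits and trees.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4	/* hypothetical fixed fan-out for the sketch */

/* Hypothetical stand-in for a commit or tree object. */
struct node_sketch {
	bool packed;		/* object is present in the pack file */
	bool enumerated;	/* set only once the whole subtree is known */
	size_t nchildren;
	struct node_sketch *children[MAX_CHILDREN];
};

/*
 * Post-order walk: visit every child first, then mark the node
 * itself. A node counts as enumerated only if it is packed and
 * all of its children were enumerated.
 */
static bool
enumerate(struct node_sketch *n)
{
	bool complete = n->packed;
	size_t i;

	for (i = 0; i < n->nchildren; i++) {
		if (!enumerate(n->children[i]))
			complete = false;
	}
	n->enumerated = complete;
	return complete;
}
```

With this ordering, a parent whose subtree contains a missing object is never marked enumerated, which is the property the reverted implementation lacked.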


Commit:: 9f4f302a43f7e186910d59f9dbe0f839b6f2d565
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue Jun 7 16:04:15 2022 UTC

free id and path in load_packed_tree_ids() on error, else they would leak

pointed out by op@


Commit:: cee6a7ea556f9f3ae0f50df959c2acd8cb59bf80
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue Jun 7 15:56:46 2022 UTC

implement object enumeration support in got-read-pack

ok op@


Commit:: ce2bf7b7c9058374563c6db8608dbab9df2bba7d
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun May 29 17:51:33 2022 UTC

fix a bug in findtwixt() which caused pack files with missing parent commits

The 'nskip' variable is supposed to reflect commits which are waiting on the queue and have the 'skip' color. Only increment 'nskip' when adding such commits to the queue.

Problem observed with got send -T and a tag pointing to a deleted branch. Test to reproduce the bug written by op@.


Commit:: d6a28ffe187127e3247254d7e242bb52d66eb26b
From:: Omar Polo <op@omarpolo.com>
Date:: Fri May 20 21:21:42 2022 UTC

use random seeds for murmurhash2

change the three hardcoded seeds to fresh ones generated on demand via arc4random.

Suggested/fixed by and ok stsp@


Commit:: 17cfdba68dcb4432269af930abb1f9fb9ee48e97
From:: Omar Polo <op@omarpolo.com>
Date:: Fri May 20 21:19:30 2022 UTC

include header


Commit:: 411cbec1f714f639184814306c5c88454521e289
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri May 20 09:31:25 2022 UTC

shrink struct got_pack_meta a bit by removing the have_reused_delta flag

This flag can be expressed as m->reused_delta_offset != 0 because all deltas in valid pack files will be written at a non-zero offset. We allocate a huge number of these structs during packing, so every little bit helps.
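A minimal sketch of this change, assuming a trimmed-down stand-in for the real structure: `struct pack_meta_sketch` and `has_reused_delta()` are hypothetical names, and only `reused_delta_offset` comes from the log message.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down stand-in for struct got_pack_meta. */
struct pack_meta_sketch {
	off_t reused_delta_offset;	/* 0 means "no reused delta" */
	/* ... other per-object fields would follow ... */
};

/*
 * A pack file begins with a header, so no delta can be stored at
 * offset 0. Zero is therefore free to serve as the "not set"
 * sentinel, and a separate boolean flag is unnecessary.
 */
static bool
has_reused_delta(const struct pack_meta_sketch *m)
{
	return m->reused_delta_offset != 0;
}
```

Dropping the flag saves at least one byte per object, and possibly more once struct padding is taken into account.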


Commit:: adb4bbb29d6a1407355e47e71716ca7f40c6dd67
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri May 20 08:40:46 2022 UTC

reduce the amount of memory used for caching deltas during deltification

With files sorted properly for deltification we produce better deltas but end up consuming more memory and risk running into OpenBSD ulimits during packing. To compensate, reduce the threshold for the amount of delta data we store in memory, spooling more deltas into the cache file.

ok op@


Commit:: f8174ca59ba426ea9c475fd15d2db770f8595b5e
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri May 20 08:40:46 2022 UTC

store a path hash instead of a verbatim path in pack meta data

This reduces memory use by gotadmin pack. The goal is to sort files which share a path next to each other for deltification. A hash of the path is good enough for this purpose and consumes less memory than a verbatim copy of the path. Git does something similar.

ok op@
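The idea can be sketched as follows. Everything here is an assumption made for illustration: `struct meta_sketch`, `hash_path()`, and `sort_for_deltification()` are hypothetical names, and FNV-1a is used only because it is short, not because got uses it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-object record: a 32-bit hash replaces the path string. */
struct meta_sketch {
	uint32_t path_hash;
	int id;			/* stand-in for the object ID */
};

/* FNV-1a, chosen here for brevity only. */
static uint32_t
hash_path(const char *path)
{
	uint32_t h = 2166136261u;

	for (; *path != '\0'; path++) {
		h ^= (unsigned char)*path;
		h *= 16777619u;
	}
	return h;
}

/* Explicit comparison; never subtract unsigned hashes. */
static int
cmp_path_hash(const void *a, const void *b)
{
	const struct meta_sketch *ma = a, *mb = b;

	if (ma->path_hash < mb->path_hash)
		return -1;
	if (ma->path_hash > mb->path_hash)
		return 1;
	return 0;
}

/* Sort so that objects sharing a path end up next to each other. */
static void
sort_for_deltification(struct meta_sketch *meta, size_t n)
{
	qsort(meta, n, sizeof(*meta), cmp_path_hash);
}
```

A hash collision merely places two unrelated files next to each other, which costs some delta quality but not correctness. That is why a hash is good enough for this purpose.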


Commit:: 3e6ceea0bd8a65737eb2231ce18d0e591dfb92ff
From:: Stefan Sperling <stsp@stsp.name>
Date:: Fri May 20 08:40:46 2022 UTC

fix paths stored in pack meta data, improving file deltification

The old code was broken and stored an empty path or filenames, instead of a repository-relative path. Which means we didn't sort files for deltification as was intended.

Fixing this provides much better deltas in large pack files written by gotadmin pack -a. In my test case, pack size changed from 2GB to 1.5GB.

ok op@


Commit:: 17259bfa94068499f61aec3129c47ae2671bd531
From:: Stefan Sperling <stsp@stsp.name>
Date:: Thu May 19 09:26:13 2022 UTC

plug a small memleak on error in got_pack_create()


Commit:: e93fb944fe7b32d96fafd4726043a28062563b54
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue May 10 11:34:16 2022 UTC

map delta cache file into memory if possible while writing a pack file

with a fix from + ok op@


Commit:: dc3fe1bf10950c200037f0198f2445ea176f8cc3
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue May 10 11:24:12 2022 UTC

fix load_object_ids() such that packing tags works if zero commits are packed

reported by jrick and op


Commit:: fae7e03842e8618973f4d4910a86a52d881ab2ab
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sat May 7 11:50:56 2022 UTC

run the search for deltas to reuse in got-read-pack

This significantly speeds up the deltification step of packing by avoiding imsg traffic. gotadmin no longer requests individual raw deltas from got-read-pack to check whether it can reuse them. Instead, got-read-pack obtains a list of objects we want to pack, and hands back the list of all deltas in its pack file which can be reused. Messages are now batched such that imsg buffers are filled as much as possible.

Another advantage is that deltas we are not going to reuse will no longer be written to the delta cache file, saving disk space. Before this patch, any raw delta candidate was written to the delta cache file by got-read-pack, and the decision whether to reuse the delta happened afterwards in the gotadmin process.

Code for reading individual raw deltas is now unused and could be removed at some point.

ok op@


Commit:: 2f8438b006e9015401b93f55cea57b36b021ce56
From:: Stefan Sperling <stsp@stsp.name>
Date:: Wed May 4 15:39:15 2022 UTC

avoid 'remove unused' loop by storing excluded objects in a separate set

ok op@


Commit:: f5e78e05ae4eb8c5e5909841ee696fa5b6d0dfea
From:: Stefan Sperling <stsp@stsp.name>
Date:: Wed May 4 15:39:15 2022 UTC

avoid loop over the ID set which removes object IDs with reused deltas

ok op@


Commit:: 2d9e6abf243a0a1895786fa9002b28d69a0f6fea
From:: Stefan Sperling <stsp@stsp.name>
Date:: Wed May 4 13:43:24 2022 UTC

store deltas in compressed form while packing, both in memory and cache file

This reduces memory and disk space consumption during packing.

with tweaks + memleak on error fix from op@
ok op@


Commit:: 611e8e319eae000b4d691f6c188af95e4de294a7
From:: Stefan Sperling <stsp@stsp.name>
Date:: Sun May 1 11:47:21 2022 UTC

avoid subtraction of values larger than int in qsort(3) comparison callbacks

tweak + ok tb@
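The pitfall this commit addresses is a general one and can be sketched without reference to got's own callbacks. The element type `uint64_t` and the name `cmp_u64()` are assumptions chosen for illustration; the log message does not say which callbacks or types were affected.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Problematic pattern: the 64-bit difference is converted to int,
 * so the sign of the result can be wrong.
 *
 *	return *(const uint64_t *)a - *(const uint64_t *)b;
 */

/* Safe pattern: compare explicitly and return -1, 0, or 1. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	if (x < y)
		return -1;
	if (x > y)
		return 1;
	return 0;
}
```

For example, comparing 0x100000000 with 0 by subtraction and narrowing the result to a 32-bit int typically yields 0, which would tell qsort(3) the two values are equal.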


Commit:: d7b5a0e827bb38e5c8502f0ba8d7838fedaba19b
From:: Stefan Sperling <stsp@stsp.name>
Date:: Wed Apr 20 14:00:12 2022 UTC

inline struct got_object_id in struct got_object_qid

Saves us from doing a malloc/free call for every item on the list.

ok op@
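The effect of inlining can be sketched with simplified stand-ins. The names `object_id_sketch`, `qid_before`, `qid_after`, and `qid_alloc()` are hypothetical, and the plain `next` pointer replaces whatever list linkage the real struct got_object_qid uses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical stand-in for struct got_object_id. */
struct object_id_sketch {
	uint8_t sha1[SHA1_LEN];
};

/* Before: each list entry pointed at a separately allocated ID. */
struct qid_before {
	struct qid_before *next;
	struct object_id_sketch *id;	/* extra malloc/free per entry */
};

/* After: the ID lives inside the entry, so one allocation suffices. */
struct qid_after {
	struct qid_after *next;
	struct object_id_sketch id;
};

/* Allocate a list entry and copy the ID into it. */
static struct qid_after *
qid_alloc(const struct object_id_sketch *id)
{
	struct qid_after *qid = calloc(1, sizeof(*qid));

	if (qid == NULL)
		return NULL;
	memcpy(&qid->id, id, sizeof(qid->id));
	return qid;
}
```

Freeing an entry is then a single free() call, and there is no error path on which the entry could be allocated while its ID is not.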


Commit:: cbc287dcbb29ad321dca5cd14c31998279205243
From:: Stefan Sperling <stsp@stsp.name>
Date:: Tue Apr 19 20:08:41 2022 UTC

reimplement object-ID set data structure on top of a hash table

Siphash suggested by jrick as a better alternative to murmurhash for this use case.

with small fixes from and ok op@


Commit:: 70f8f24dc5b689abd6265a71e787dfef79ba40cf
From:: Stefan Sperling <stsp@stsp.name>
Date:: Thu Apr 14 15:05:19 2022 UTC

speed up initial stage of packing by adding a "skip" commit color

The skip color marks boundary commits and their ancestors. Boundary commits are reachable both via references which we want to exclude from the pack, and via references which we want to include in the pack. We continue processing commit history up to the point we are left with only skip commits on the queue.

This can speed up findtwixt() significantly and avoids wrong results produced by the old algorithm which made no distinction between "drop" and "skip".

This idea was first implemented by Michael Forney for git9:
https://git.9front.org/plan9front/plan9front/2e47badb88312c5c045a8042dc2ef80148e5ab47/commit.html

Michael's log message for git9 is reproduced below:

  git/query: refactor graph painting algorithm (findtwixt, lca)

  We now keep track of 3 sets during traversal:
  - keep: commits we've reached from head commits
  - drop: commits we've reached from tail commits
  - skip: ancestors of commits in both 'keep' and 'drop'

  Commits in 'keep' and/or 'drop' may be added later to the 'skip' set if we discover later that they are part of a common subgraph of the head and tail commits.

  From these sets we can calculate the commits we are interested in:
  lca commits are those in 'keep' and 'drop', but not in 'skip'.
  findtwixt commits are those in 'keep', but not in 'drop' or 'skip'.

  The "LCA" commit returned is a common ancestor such that there are no other common ancestors that can reach that commit. Although there can be multiple commits that meet this criteria, where one is technically lower on the commit-graph than the other, these cases only happen in complex merge arrangements and any choice is likely a decent merge base.

  Repainting is now done in paint() directly. When we find a boundary commit, we switch our paint color to 'skip'. 'skip' painting does not stop when it hits another color; we continue until we are left with only 'skip' commits on the queue.

  This fixes several mishandled cases in the current algorithm:
  1. If we hit the common subgraph from tail commits first (if the tail commit was newer than the head commit), we ended up traversing the entire commit graph. This is because we couldn't distinguish between 'drop' commits that were part of the common subgraph, and those that were still looking for it.
  2. If we traversed through an initial part of the common subgraph from head commits before reaching it from tail commits, these commits were returned from findtwixt even though they were also reachable from tail commits.
  3. In the same case as 2, we might end up choosing an incorrect commit as the LCA, which is an ancestor of the real LCA.
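The final selection rules stated in the quoted git9 message can be sketched as two predicates. Representing set membership as bit flags is an assumption made for this sketch; neither git9 nor got necessarily stores the colors this way, and the `PAINT_*` constants, `is_twixt()`, and `is_lca()` are hypothetical names. The sketch covers only the classification after traversal, not the painting itself.

```c
#include <stdbool.h>

/* Hypothetical bit flags recording which sets a commit belongs to. */
enum {
	PAINT_KEEP = 1 << 0,	/* reached from head commits */
	PAINT_DROP = 1 << 1,	/* reached from tail commits */
	PAINT_SKIP = 1 << 2	/* ancestor of a commit in both sets */
};

/* findtwixt result: in 'keep', but not in 'drop' or 'skip'. */
static bool
is_twixt(unsigned int paint)
{
	return (paint & PAINT_KEEP) != 0 &&
	    (paint & (PAINT_DROP | PAINT_SKIP)) == 0;
}

/* LCA candidate: in both 'keep' and 'drop', but not in 'skip'. */
static bool
is_lca(unsigned int paint)
{
	unsigned int both = PAINT_KEEP | PAINT_DROP;

	return (paint & both) == both && (paint & PAINT_SKIP) == 0;
}
```

A commit carrying only the 'keep' flag belongs in the pack, while one that also carries 'drop' or 'skip' is already reachable from the excluded references and is left out.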


Commit:: bb6672b6aba1cb45a10d52bec828e68379e9ad61
From:: Theo Buehler <tb@openbsd.org>
Date:: Thu Apr 14 09:51:32 2022 UTC

make sure callers of got_object_idset_add() free data.

