Commits
- Commit:
fcece7180725bba9a781eaa892af379b1986208b
- From:
- Omar Polo <op@omarpolo.com>
- Date:
attempt to speed up the deltification for big files
The current hash table performs poorly on big files due to a small
resize step that pushes the table to its limits continuously.
Instead, to have both a better performing hash table and keep the
memory consumption low, save the blocks in an array and use the
hash table as index. Then, use a more generous resizing scheme
that guarantees the good properties of the hash table.
To avoid having to rebuild the table when the array is resized,
save the indexes in the table, and to further reduce the memory
consumption use 32 bit indices. On amd64 this means that each slot
is 4 bytes instead of 8 for a pointer or 24 for a struct
got_deltify_block.
ok stsp@
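
The layout described in this commit message can be sketched as follows. This is a minimal illustration, not the code from deltify.c: struct blk, struct blk_table, table_insert() and table_lookup() are hypothetical names, the doubling policy and the 3/4 load limit are assumptions, and overflow checks are omitted. Blocks live in a dense array, and the hash table stores only 32-bit indices into that array, so reallocating the array never invalidates the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block record; not the real struct got_deltify_block. */
struct blk {
	uint64_t hash;
	uint64_t off;
	uint32_t len;
};

struct blk_table {
	struct blk *blocks;	/* dense array of blocks */
	uint32_t nblocks, cap;
	uint32_t *slots;	/* index: 0 = empty, else block index + 1 */
	uint32_t nslots;	/* always zero or a power of two */
};

/* Double the index and re-insert every block index (assumed policy). */
int
index_grow(struct blk_table *t)
{
	uint32_t i, s, n = t->nslots ? t->nslots * 2 : 64;
	uint32_t *slots = calloc(n, sizeof(*slots));

	if (slots == NULL)
		return -1;
	for (i = 0; i < t->nblocks; i++) {
		s = t->blocks[i].hash & (n - 1);
		while (slots[s] != 0)		/* linear probing */
			s = (s + 1) & (n - 1);
		slots[s] = i + 1;
	}
	free(t->slots);
	t->slots = slots;
	t->nslots = n;
	return 0;
}

int
table_insert(struct blk_table *t, uint64_t hash, uint64_t off, uint32_t len)
{
	uint32_t s;

	if (t->nblocks == t->cap) {
		/* Growing the array leaves the stored indices valid. */
		uint32_t cap = t->cap ? t->cap * 2 : 64;
		struct blk *b = realloc(t->blocks, cap * sizeof(*b));
		if (b == NULL)
			return -1;
		t->blocks = b;
		t->cap = cap;
	}
	/* Keep the load factor at or below 3/4 (an assumed threshold). */
	if ((uint64_t)(t->nblocks + 1) * 4 > (uint64_t)t->nslots * 3) {
		if (index_grow(t) == -1)
			return -1;
	}
	t->blocks[t->nblocks] = (struct blk){ hash, off, len };
	s = hash & (t->nslots - 1);
	while (t->slots[s] != 0)
		s = (s + 1) & (t->nslots - 1);
	t->slots[s] = ++t->nblocks;	/* store index + 1 */
	return 0;
}

const struct blk *
table_lookup(const struct blk_table *t, uint64_t hash)
{
	uint32_t s;

	if (t->nslots == 0)
		return NULL;
	s = hash & (t->nslots - 1);
	while (t->slots[s] != 0) {
		assert(t->slots[s] <= t->nblocks);
		if (t->blocks[t->slots[s] - 1].hash == hash)
			return &t->blocks[t->slots[s] - 1];
		s = (s + 1) & (t->nslots - 1);
	}
	return NULL;
}
```

Each slot is a uint32_t, which matches the 4 bytes per slot mentioned above; a real implementation would also compare block contents on a hash match.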
- Commit:
0f2e686eec562e28977521d25101acfa4396b47a
- From:
- Omar Polo <op@omarpolo.com>
- Date:
bump the deltify table resize step
By incrementing the resize step from 64 to 256, deltifying takes
less time on modestly sized files; the resize is still a small
number instead of a fraction of the current table size (which would
be more usual for a hash table) since this code is also used in
gotd.
ok stsp
- Commit:
04aed1557bf2e67bfef8d3a991fd54526142c8a8
- From:
- Christian Weisgerber <naddy@mips.inka.de>
- Date:
fix off_t type mismatches
off_t is a signed type and depending on the platform, it can be
"long" or "long long", so cast to long long for printf().
ok stsp
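
The cast described in this commit message looks roughly like the sketch below. format_offset() is a hypothetical helper written for illustration, not a function from the got source tree.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: off_t may be "long" or "long long" depending on
 * the platform, so cast it to long long and always print it with %lld.
 */
int
format_offset(char *buf, size_t len, off_t off)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "offset %lld", (long long)off);
}
```

Casting to long long keeps the format string correct on every platform and preserves the sign of negative offsets.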
- Commit:
d6a28ffe187127e3247254d7e242bb52d66eb26b
- From:
- Omar Polo <op@omarpolo.com>
- Date:
use random seeds for murmurhash2
change the three hardcoded seeds to fresh ones generated on demand via
arc4random. Suggested/fixed by and ok stsp@
- Commit:
d58ddaf3fc10239711ae7a88664e3a100567ba3c
- From:
- Christian Weisgerber <naddy@mips.inka.de>
- Date:
const-ify tables
ok thomas_adam millert
- Commit:
f6027426102430eb80a6df7ce1bf2e31d15cf85d
- From:
- Christian Weisgerber <naddy@mips.inka.de>
- Date:
consistently match size of hash variables to that returned by murmurhash
ok millert stsp
- Commit:
2b474c2514b417c6ead14e07c19c19c97dcbf7ff
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
use murmurhash instead of sha1 for deltification blocks; suggested by ori
- Commit:
64a8571e126da3ef8c0488551727b87e4509b50d
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
map raw object files into memory while packing if possible
- Commit:
2d467c6d020f635039e8a2fadf1b6ea7f7a18a9e
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
fix wrong function in error string of emitdelta()
- Commit:
f736d93a2da5b433c03766eee9f631af9dec2318
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
link to the FastCDC paper from deltify.c; suggested by Ori some time ago
- Commit:
6eab69f730c8340837a82452cf8797251b3e69c2
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
make the number of elements in deltify's geartab explicit
- Commit:
5de743f8fddcaaf2912ffc92dce239aa6227d6d0
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
fix seek to incorrect offset in the delta base when creating deltas
The stretchblk() function needs to compare data located after the block
which has just been matched. However, upon entry it was resetting the
file pointer of the delta base to the beginning(!) of the block.
The other file is correctly positioned after the block.
In many cases the data won't match and stretchblk() will not stretch
the matched block. But when the data did happen to match this resulted
in a bogus delta, and wrong file contents when the delta was applied.
Fix this by setting the delta base file pointer to the end of the block.
Problem reported by naddy after our server refused a pack file which
was sent by 'got send'. I could reproduce the issue by running the
'gotadmin pack' command on a copy of naddy's repository.
ok naddy
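
The positioning problem described in this commit message can be illustrated with a simplified sketch. stretch_len() is a hypothetical stand-in for stretchblk(), with an assumed signature and byte-wise comparison chosen for brevity; it is not the code from deltify.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical simplification: count how many bytes following an already
 * matched block are identical in the delta base and in the other file.
 * "f" is assumed to be positioned just after the matched block.
 * Returns -1 if seeking fails.
 */
long
stretch_len(FILE *basefile, long base_blk_off, long blk_len, FILE *f)
{
	long n = 0;
	int c1, c2;

	assert(basefile != NULL && f != NULL);
	/*
	 * Seek to the END of the matched block in the delta base.
	 * Seeking to base_blk_off (the block's beginning) was the bug:
	 * it compared the block itself against the data after the block.
	 */
	if (fseek(basefile, base_blk_off + blk_len, SEEK_SET) == -1)
		return -1;
	for (;;) {
		c1 = getc(basefile);
		c2 = getc(f);
		if (c1 == EOF || c2 == EOF || c1 != c2)
			break;
		n++;
	}
	/* Note: both streams are now past the stretch; callers must reposition. */
	return n;
}
```

With a base of "AAAABBBBCCCC", a matched block "BBBB" at offset 4, and the other file continuing with "CCXX", the stretch is 2 bytes; seeking to the start of the block instead would compare "BBBB" against "CCXX" and miss it, or in unlucky cases match the wrong data.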
- Commit:
0af64e86449b8d836b04b25ece0bbc5543a75238
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
plug a memory leak in an error path of got_deltify()
- Commit:
dd29967c8be9311a99ae3310d49789c65989498e
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
make got_deltify() reallocate the deltas array less often
- Commit:
9a8dc2b3ec216fd01b3c33137eb92d98ddadb63e
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
fix deltas with trailing data that is smaller than the minimum chunk size
- Commit:
740bba1c3179a597c83f7dd3a23bffb50a494bdf
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
allow the delta base file to lose its header between deltify_init and deltify
This simplifies pack file creation. A delta base could be read from a
loose object, a packfile, or it might be available in a temporary file.
All these cases can now be handled the same way. We may need to open,
close, and re-open a given delta base multiple times while packing.
- Commit:
7550e799ee994b0b74689a6895f84d8aaec86f49
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
check for errors from emitdelta() in got_deltify()
- Commit:
aa51f4a4acac901a4f1bf4062664644ce95d3e8c
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
handle fseek in got_deltify() instead of in stretchblk(); simplifies the code
- Commit:
f34b169e54fc4d4960f06b804cabe1aeec70e07d
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
Allow for skipping the base object header in got_deltify().
- Commit:
0d15f6dcf929ae42606d3ca046621aee79e45890
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
in addblk(), only read data into buffer1 if we will compare it to buffer2
suggested by and ok naddy@
- Commit:
68bdcdc2f5d3c37d918f85368c2537a8aa7d90eb
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
addblk() may seek in its input file; reposition the file pointer afterwards
- Commit:
a893025fd207950945eed1482170223a2d3b9ce3
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
addblk: iterate over the correct number of entries after growing the array
ok naddy
- Commit:
e89540a95a268f47ef2d1b24c41fbb72a1f0bdc9
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
addblk: be more careful about expanding the blocks array when we outgrow it
fixes + ok naddy
- Commit:
51a494da48acb57ed84501a6d10f39ed624c711e
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
check a block's hash as well as its length before expensive comparisons
suggested by + ok naddy, and Ori agrees
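
The idea of checking cheap properties first can be sketched as below. struct blk_meta and blocks_equal() are hypothetical names, and the block data is assumed to be in memory here, whereas the real code reads block contents from files into buffers before comparing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory block description, for illustration only. */
struct blk_meta {
	uint32_t hash;
	size_t len;
	const unsigned char *data;
};

/*
 * Return 1 if the blocks are identical, 0 otherwise.
 * The length and hash comparisons are cheap and reject most candidates,
 * so the byte-wise comparison only runs for likely matches.
 */
int
blocks_equal(const struct blk_meta *a, const struct blk_meta *b)
{
	assert(a != NULL && b != NULL);
	if (a->len != b->len || a->hash != b->hash)
		return 0;
	return memcmp(a->data, b->data, a->len) == 0;
}
```

The final memcmp() is still required, since two different blocks can share both length and hash.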
- Commit:
dbbf4a5f0cfb712c5970dcb79a65c5dd2e62b19a
- From:
- Stefan Sperling <stsp@stsp.name>
- Date:
allow got_deltify_free(NULL); will be needed by 'gotadmin pack'