Sat, 24 Jun 2006 0:07:42
Copy and Checksum Large Files
Last week I wrote a short example of how to use checksum ports,
and last year I gave an example of how to use the /seek refinement to deal with large files.
The code below combines these two concepts in a function that copies a file, even if the file is larger than memory (e.g. MPG, MP3, WAV). It will also compute and return the checksum of the file's data.
This is a robust "commercial quality" file copy function that you can use in your applications. If you find a bug, please let me know and I will correct it here.
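REBOL [
    Title: "Copy File with Optional Checksum"
    Author: "Carl Sassenrath"
]

copy-file: func [
    "Copy a file. Return WORD for failure or return optional checksum."
    from [file!]
    dest [file!]
    /sum "checksum the data"
    /local
    ff ; from file port
    tf ; to file port
    data path
][
    path: split-path dest
    ; Each step is paired with the word returned if that step fails.
    foreach [block err-word] [
        [make-dir/deep path/1] dir-failed
        [ff: open/binary/read/seek from] read-failed
        [tf: open/binary/write dest] write-failed
        [if sum [sum: open [scheme: 'checksum]]] sum-failed
        [
            while [not tail? ff] [
                ;print index? ff
                data: copy/part ff 100000
                insert tail tf data
                if sum [insert sum data]
                ff: skip ff length? data
                ;print index? ff
            ]
        ] copy-failed ; word name assumed; it is missing from the listing
    ][
        if error? try block [
            ; Close whatever was opened before reporting the failure.
            if port? sum [close sum]
            if tf [close tf]
            if ff [close ff]
            return err-word
        ]
    ]
    data: none ; so that none is returned when /sum is not used
    if sum [
        update sum ; finish the checksum before reading it
        data: copy sum
        close sum
    ]
    close tf
    close ff
    data ; checksum value or none
]

print copy-file/sum %movie.mpg %movie2.mpg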
#The code has only been tested on REBOL 2.6.2. The code requires a newer REBOL that supports the /seek refinement (Core 2.6).
#If you are new to REBOL, note the way the foreach is used to perform error checking for each step and return the appropriate error word for failures.
#The make-dir line is correct as written. If you do a source on make-dir you will see that it becomes a no-op if the dir exists. Adding an additional exists? check is not needed.
#The "from file" (ff) is opened with /read access so that an error occurs if the file cannot be opened. Without /read, the file will open as an empty file, even if it does not exist.
#The checksum port defaults to the SHA1 (secure hash) algorithm.
#The code remembers to close the ports if an error occurs.
#File data are copied in chunks of 100000 bytes. This number is arbitrary, and you can set it to whatever buffer size you prefer. Smaller numbers may slow the transfer. Larger numbers will require more memory.
#Uncomment the print lines if you want to see it working. You could also modify those lines to show a progress bar.
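For readers new to REBOL, here is a minimal sketch of the foreach error-checking idiom from the notes above, pulled out on its own. The function name run-steps and the step words (add-failed, divide-failed, print-failed) are made up for this illustration and are not part of copy-file.

```rebol
REBOL [Title: "Step-wise error checking sketch"]

; Hypothetical helper: runs step blocks in order and stops at the first error.
run-steps: func [
    "Run each step block. Return its error word on failure, else none."
    steps [block!] "Pairs of: step block, error word"
][
    foreach [block err-word] steps [
        ; try evaluates the block and hands back an error value if it fails
        if error? try block [return err-word]
    ]
    none
]

; The second step divides by zero, so its word is returned
; and the third step never runs.
print run-steps [
    [1 + 1] add-failed
    [1 / 0] divide-failed
    [print "not reached"] print-failed
]
```

Because each step carries its own word, the caller can tell which stage failed without inspecting the error object itself. In copy-file the same loop also closes any open ports before returning the word.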
More here: http://www.rebol.net/article/0281.html