Carl's Blog Alert - REBOL Community
June 24th, 2006
01:30 am


Carl's Blog Alert
Sat, 24 Jun 2006 0:07:42
Copy and Checksum Large Files
Last week I wrote a short example of how to use checksum ports, and last year I gave an example of how to use the /seek refinement to deal with large files.
The code below combines these two concepts in a function that copies a file, even if the file is larger than memory (e.g. MPG, MP3, WAV). With the /sum refinement, it also computes and returns the checksum of the file's data.

This is a robust "commercial quality" file copy function that you can use in your applications. If you find a bug, please let me know and I will correct it here.

REBOL [
    Title: "Copy File with Optional Checksum"
    Author: "Carl Sassenrath"
    License: 'MIT
]

copy-file: func [
    "Copy a file. Return WORD for failure or return optional checksum."
    from [file!]
    dest [file!]
    /sum "checksum the data"
    /local path data
        ff ; from file port
        tf ; to file port
][
    path: split-path dest

    ; Each step is paired with the word to return if that step fails.
    foreach [block err-word] [
        [make-dir/deep path/1] dir-failed
        [ff: open/binary/read/seek from] read-failed
        [tf: open/binary/write dest] write-failed
        [if sum [sum: open [scheme: 'checksum]]] sum-failed
        [
            while [not tail? ff] [
                ;print index? ff
                data: copy/part ff 100000
                insert tail tf data
                if sum [insert sum data]
                ff: skip ff length? data
                ;print index? ff
            ]
        ] copy-failed
    ][
        if error? try block [
            ; Close whatever was opened before reporting the failure.
            if port? sum [close sum]
            if tf [close tf]
            if ff [close ff]
            return err-word
        ]
    ]

    data: none
    if sum [
        update sum
        data: copy sum
        close sum
    ]
    close tf
    close ff
    data ; checksum value or none
]

print copy-file/sum %movie.mpg %movie2.mpg
ask "done"
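
The error handling in copy-file pairs each step with the word to return if that step fails. As a minimal sketch of that pattern on its own, the script below uses a hypothetical helper named run-steps (not part of the original code) with arithmetic steps standing in for the file operations.

```rebol
REBOL [Title: "Error words per step (illustrative sketch)"]

; run-steps is a hypothetical helper, not part of the article's code.
run-steps: func [
    "Evaluate steps in order; return the error word of the first failure, else none."
    steps [block!] "Pairs of: [code block] error-word"
][
    foreach [block err-word] steps [
        ; try returns an error value if the block fails
        if error? try block [return err-word]
    ]
    none
]

; The second step divides by zero, so the third step never runs.
print run-steps [
    [1 + 1]  add-failed
    [1 / 0]  divide-failed
    [2 * 2]  multiply-failed
]
; prints: divide-failed
```

Because the function returns a word on failure, a caller can tell failure from success by testing the result with word?.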


- The code has only been tested on REBOL 2.6.2. It requires a REBOL version that supports the /seek refinement (Core 2.6).

- If you are new to REBOL, note the way the foreach is used to perform error checking for each step and return the appropriate error word for failures.

- The make-dir line is correct as written. If you do a source on make-dir you will see that it becomes a no-op if the dir exists. An additional exists? check is not needed.

- The "from file" (ff) is opened with /read access. This is done to cause an error if the file cannot be opened. Without it, the file will open as an empty file, even if it does not exist.

- The checksum port defaults to the SHA1 (secure hash) algorithm.

- The code closes any open ports if an error occurs.

- File data are copied in chunks of 100000 bytes. This number is arbitrary, and you can set it to whatever buffer size you prefer. Smaller numbers may slow the transfer. Larger numbers will require more memory.

- Uncomment the print lines if you want to see it working. You could also modify those lines to show a progress bar.
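
To try the checksum port from the notes without copying a file, a small sketch along these lines may help. It uses the same port calls as copy-file (open, insert, update, copy, close) on two strings. The comparison at the end assumes that checksum/secure is the one-shot SHA1 form, so it should agree with the port only if the port really defaults to SHA1.

```rebol
REBOL [Title: "Checksum port on a string (illustrative sketch)"]

port: open [scheme: 'checksum]   ; same port spec as in copy-file
insert port "this is a"          ; data can arrive in several pieces
insert port " test"
update port                      ; finish the computation
sum: copy port                   ; read the result, as copy-file does
close port

print sum

; Assumption: checksum/secure computes SHA1 in one call, so this
; comparison is expected to hold if the port default is SHA1.
print sum = checksum/secure "this is a test"
```

Feeding the data in pieces mirrors what copy-file does with each 100000-byte chunk.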

More here: http://www.rebol.net/article/0281.html
