I recently had a need to convert a PDF file into a Base64-encoded string, quickly — as in, within the next few minutes.
I had recently been writing some scripts in Python, both to accomplish some actual work, and to teach myself Python. I like a lot about Python, and the “batteries included” nature of its standard library makes it great for scripting.
So I launched TextMate, started typing a script, and, after a few quick Google searches, whipped this up:
import binascii
# Binary modes ('rb', 'wb') keep the PDF bytes intact and let this run on Python 3 too
sourceFile = open('documents.pdf', 'rb')
targetFile = open('output.txt', 'wb')
targetFile.write(binascii.b2a_base64(sourceFile.read()))
That worked out really well. A couple of lines of code, a few quick searches, and after a few minutes, I’m done. Nice!
The next day, it occurred to me that I probably could have just as easily accomplished my task with Groovy, another one of my favorite languages/platforms. Like Python, Groovy is intended to support scripting and application development equally well, so its out-of-the-box capabilities are quite extensive.
Another couple of quick Google searches, a new TextMate window, and I had a Groovy script that does exactly the same thing as the Python script:
sourceFile = new File("documents.pdf")
targetFile = new File("output.txt")
targetFile.write(sourceFile.readBytes().encodeBase64().toString())
Some observations:
- The Groovy version is a little higher-level than the Python code. I kinda like that, particularly for scripting.
- The Groovy version needs one less line — no need to import anything. That’s nice, but no big deal.
- Both scripts could have done the work in a single line, without creating the variables sourceFile and targetFile. But I think using those variables made the scripts easier to write, and makes them easier to read as well.
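For the curious, here is a rough sketch of what the single-line approach might look like in Python. It is not the script I actually used: the function name encode_file_to_base64 is made up for illustration, and it assumes Python 3.5 or later, where pathlib offers read_bytes() and write_bytes(). It also swaps in base64.b64encode, which, unlike binascii.b2a_base64, does not append a trailing newline.

```python
import base64
from pathlib import Path


def encode_file_to_base64(source_path, target_path):
    """Write the Base64 encoding of source_path to target_path (hypothetical helper)."""
    # The whole job is one expression: read_bytes() and write_bytes()
    # open and close the files themselves, so no variables are needed.
    Path(target_path).write_bytes(base64.b64encode(Path(source_path).read_bytes()))
```

Calling encode_file_to_base64('documents.pdf', 'output.txt') should produce the same encoded text as the script above, minus the final newline. It is compact, but I still find the version with named variables easier to follow.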
And that’s it for today. I hope this has been of some value to readers interested in Python, Groovy, or scripting.
Chris Dary Said:
This little terminal one liner to encode/decode is also pretty neat:
http://www.macosxhints.com/article.php?story=20030721010526390
openssl also supports a lot of different encodings/digests (man dgst and man enc for more)
Avi Flax Said:
Good tip Chris! Thanks!
It’s also possible to run Python and Groovy scripts directly from the command line:
python -c "import binascii;open('output.txt', 'wb').write(binascii.b2a_base64(open('documents.pdf', 'rb').read()))"
groovy -e 'new File("output.txt").write(new File("documents.pdf").readBytes().encodeBase64().toString())'
Matt Williams Said:
Python and Groovy are nice, but for “in the next few minutes”, don’t forget your resident php devs!
Matt Williams Said:
Apparently, php tags cause all following text to be stripped…
file_put_contents(
$argv[2],
file_get_contents($argv[1])
);
Matt Williams Said:
Wow, don’t post while distracted by the television, kids :P
One more try…
file_put_contents(
$argv[2],
base64_encode(file_get_contents($argv[1]))
);
Avi Flax Said:
Thanks Matt! Good stuff.
I think the only language from the Arc90 family that we’re missing at this point is Ruby… I wonder how long it’ll be before Nir or Dan post up a Ruby script for this purpose. And what’s after that? LOLcode?
Matt Williams Said:
Couldn’t resist…
Ruby script:
require "base64"
# Read the input in binary mode; Base64.encode64 returns the encoded string
File.open(ARGV[0], "rb") do |input|
  File.open(ARGV[1], "w") do |output|
    output.write(Base64.encode64(input.read))
  end
end
Ruby command line:
ruby -e 'print [IO.read(File.join(Dir.pwd, ARGV[0]))].pack("m")' input.pdf > output.txt
No luck on the LOLcode…
Joel Potischman Said:
Let’s not forget C#:
File.WriteAllText("output.txt", Convert.ToBase64String(File.ReadAllBytes("input.pdf")));