From a Little Spark May Burst a Flame | Home | Moby Dick Annotated with Arc90's Sidenotes
Groovy and Python for Quick Scripting Wins
Posted on December 19, 2008 by Avi Flax
I recently had a need to convert a PDF file into a Base64-encoded string, quickly — as in, within the next few minutes.
I had recently been writing some scripts in Python, both to accomplish some actual work, and to teach myself Python. I like a lot about Python, and the "batteries included" nature of its standard library makes it great for scripting.
So I launched TextMate, started typing a script, and, after a few quick Google searches, whipped this up:
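import binascii
# Open both files in binary mode: the PDF is raw bytes, and b2a_base64 returns bytes in Python 3
sourceFile = open('documents.pdf', 'rb')
targetFile = open('output.txt', 'wb')
targetFile.write(binascii.b2a_base64(sourceFile.read()))
sourceFile.close()
targetFile.close()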
That worked out really well. A couple of lines of code, a few quick searches, and after a few minutes, I'm done. Nice!
The next day, it occurred to me that I probably could have just as easily accomplished my task with Groovy, another one of my favorite languages/platforms. Like Python, Groovy is intended to support scripting and application development equally well, so its out-of-the-box capabilities are quite extensive.
Another couple of quick Google searches, a new TextMate window, and I had a Groovy script which does the same exact thing as the Python script:
sourceFile = new File("documents.pdf")
targetFile = new File("output.txt")
targetFile.write(sourceFile.readBytes().encodeBase64().toString())
Some observations:
- The Groovy version is a little higher-level than the Python code. I kinda like that, particularly for scripting.
- The Groovy version needs one less line — no need to import anything. That's nice, but no big deal.
- Both scripts could have done the work with a single line, instead of creating the variables sourceFile and targetFile. But I think using those variables made the scripts easier to write, and makes them easier to read as well.
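When the encoded output is headed somewhere else, a quick round trip is a cheap sanity check. Here is a minimal sketch, assuming Python's standard base64 module rather than the binascii call used in the script above; the function names encode_file_to_base64 and decode_base64_file are hypothetical, made up for illustration.

```python
import base64


def encode_file_to_base64(source_path, target_path):
    """Write the Base64 encoding of source_path to target_path and return it."""
    # Read the source as raw bytes so binary formats like PDF survive intact.
    with open(source_path, 'rb') as source_file:
        encoded = base64.b64encode(source_file.read())
    # b64encode returns bytes, so the target is opened in binary mode too.
    with open(target_path, 'wb') as target_file:
        target_file.write(encoded)
    return encoded


def decode_base64_file(encoded_path):
    """Read a Base64-encoded file and return the original bytes."""
    with open(encoded_path, 'rb') as encoded_file:
        return base64.b64decode(encoded_file.read())
```

Calling encode_file_to_base64('documents.pdf', 'output.txt') and then comparing decode_base64_file('output.txt') against the bytes of the original PDF should show whether anything was lost along the way.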
And that's it for today. I hope this has been of some value to readers interested in Python, Groovy, or scripting.
Comments
Good tip Chris! Thanks!
It's also possible to run Python and Groovy scripts directly from the command line:
python -c "import binascii;open('output.txt', 'wb').write(binascii.b2a_base64(open('documents.pdf', 'rb').read()))"
groovy -e 'new File("output.txt").write(new File("documents.pdf").readBytes().encodeBase64().toString())'
Posted on December 19, 2008 12:21 PM by Avi Flax
Python and Groovy are nice, but for "in the next few minutes", don't forget your resident php devs!
Posted on December 25, 2008 12:06 AM by Matt Williams
Apparently, php tags cause all following text to be stripped...
file_put_contents(
$argv[2],
file_get_contents($argv[1])
);
Posted on December 25, 2008 12:08 AM by Matt Williams
Wow, don't post while distracted by the television, kids :P
One more try...
file_put_contents(
$argv[2],
base64_encode(file_get_contents($argv[1]))
);
Posted on December 25, 2008 12:10 AM by Matt Williams
Thanks Matt! Good stuff.
I think the only language from the Arc90 family that we're missing at this point is Ruby... I wonder how long it'll be before Nir or Dan post up a Ruby script for this purpose. And what's after that? LOLcode?
Posted on December 25, 2008 9:58 AM by Avi Flax
Couldn't resist...
Ruby script:
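require "base64"
# Read the input as binary. Base64.encode64 returns the encoded string;
# the older Base64.b64encode printed to stdout and was removed in Ruby 1.9.
File.open(ARGV[0], 'rb') do |input|
  File.open(ARGV[1], 'w') do |output|
    output.write(Base64.encode64(input.read))
  end
end
Ruby command line: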
ruby -e 'print [IO.read(File.join(Dir.pwd, ARGV[0]))].pack("m")' input.pdf > output.txt
No luck on the LOLcode...
Posted on December 25, 2008 1:11 PM by Matt Williams
Let's not forget C#:
File.WriteAllText("output.txt", Convert.ToBase64String(File.ReadAllBytes("input.pdf")));
Posted on January 15, 2009 11:03 AM by Joel Potischman

This little terminal one liner to encode/decode is also pretty neat:
http://www.macosxhints.com/article.php?story=20030721010526390
openssl also supports a lot of different encodings/digests (man dgst and man enc for more)
Posted on December 19, 2008 12:09 PM by Chris Dary