
In the following example I add German descriptions of chess openings to a file "eco.pgn" that contains the corresponding English chess opening codes. For each opening, the file eco.pgn contains a line like the following:

[ECO "B08"]

B08 denotes the code of the chess opening. The script searches for this code in the first column of an HTML table in Wikipedia, proceeds to the third column and extracts the German description located there. This description is then inserted into eco.pgn as a line:
[White87 "German description"]
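
The table lookup described above can be sketched in Python. This is only an illustration under assumptions: SAMPLE_HTML is made-up placeholder data rather than the real Wikipedia markup, the cells are assumed to be plain <td>...</td> elements without attributes, and lookup_description is a hypothetical helper name.

```python
import re

# Made-up sample table; the real Wikipedia markup will differ.
SAMPLE_HTML = """
<table>
<tr><td>B07</td><td>(moves)</td><td>Beschreibung B07</td></tr>
<tr><td>B08</td><td>(moves)</td><td><i>Beschreibung B08</i></td></tr>
</table>
"""

def lookup_description(html, code):
    """Return the text of the third column in the row whose first column is `code`."""
    start = html.find('<td>%s</td>' % code)
    if start == -1:
        return None  # code not present in the table
    # All cells from the code cell onwards, in document order.
    cells = re.findall(r'<td>(.*?)</td>', html[start:], re.S)
    if len(cells) < 3:
        return None  # row is shorter than expected
    # cells[0] is the code, cells[2] the third column; drop nested HTML tags.
    return re.sub(r'<[^>]+>', '', cells[2]).strip()

description = lookup_description(SAMPLE_HTML, 'B08')
```

With the sample data, description holds the third cell of the B08 row with the <i> tags removed. A real page would probably need a proper HTML parser instead of these regular expressions.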

Doing that in Python is tedious because one needs to write a needlessly complex regular expression

r'\[ECO \"(?P<code>[A-E0-9]*)\"\]'

that not only matches the line [ECO "B08"] but also makes it possible to access the three-character code B08. I thought it would be great to have a syntax for this purpose that is readable and therefore less error-prone. The obvious candidate is:

 [ECO "{code}"]

The variable {code} is set to the content found in the file eco.pgn and can be accessed in subsequent commands.
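
To make the relation between the two notations concrete, here is a minimal Python sketch that translates such a template into a regular expression with named groups. It is not taken from babelscript and does not claim to show how babelscript works internally; template_to_regex is a hypothetical helper, and the assumption is that every {name} placeholder should match as little text as possible.

```python
import re

def template_to_regex(template):
    """Compile a template such as '[ECO "{code}"]' into a regular expression.

    Literal text is escaped; every {name} becomes a non-greedy named group.
    """
    # Splitting on a capturing group keeps the placeholder names:
    # even indices are literal text, odd indices are names.
    parts = re.split(r'\{(\w+)\}', template)
    pattern = ''
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += '(?P<%s>.*?)' % part
    return re.compile(pattern)

match = template_to_regex('[ECO "{code}"]').search('[Event "x"]\n[ECO "B08"]\n')
code = match.group('code')
```

Here code ends up as the string B08. Note that a placeholder at the very end of a template would match the empty string under this scheme, so a real implementation would need a rule for that case.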

babelscript
load eco.pgn as {pgn}
load http://de.wikipedia.org/w/index.php?title=ECO-Codes as {wikipedia}

loop
  find [ECO "{code}"] in {pgn} // this one line already does more than the whole Python code below

  findfirst <td>{code}</td> in {wikipedia} // find the code on the Wikipedia page
  find <td> in {wikipedia} // jump to next column
  find <td>{description}</td> in {wikipedia} // get German description

  striphtml {description} // free description from any html tags
  utf8toascii {description} // convert description to plain ASCII

  insert {newline}[White87 "{description}"] into {pgn} // insert description
repeat

save {pgn} as ECO_out.pgn

Python (incomplete source code)
import urllib.request
import fileinput
import re

# download the Wikipedia page and decode it to text
wikipedia = urllib.request.urlopen("http://de.wikipedia.org/w/index.php?title=ECO-Codes").read().decode("utf-8")
pgn = open("eco.pgn").read()

# print the (start, end) position of every ECO code in the file
for eco in re.finditer(r'\[ECO \"(?P<code>[A-E0-9]*)\"\]', pgn):
    print(eco.span('code'))

This loop searches for [ECO "..."] and prints, as a tuple, the start and end positions of the text between the quotes (the named group code). We still need to extract the actual text, search for it in the Wikipedia page, extract the corresponding German description and insert it into pgn. As we cannot change pgn while the above loop is iterating over it, we have to make a list of all text positions and insert the German descriptions afterwards in a second pass.
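
The second pass might look roughly like the following sketch. It assumes the German descriptions have already been collected into a dictionary (here the hypothetical argument descriptions, filled with placeholder values), and insert_descriptions is an illustrative name, not part of the original script. Working from the last match to the first keeps the positions of the earlier matches valid.

```python
import re

def insert_descriptions(pgn, descriptions):
    """Insert a [White87 "..."] line after every [ECO "..."] tag with a known code."""
    # First pass: collect all matches before changing anything.
    matches = list(re.finditer(r'\[ECO "(?P<code>[A-E0-9]*)"\]', pgn))
    # Second pass: insert back to front so earlier offsets do not shift.
    for m in reversed(matches):
        text = descriptions.get(m.group('code'))
        if text is None:
            continue  # no description known for this code
        line = '\n[White87 "%s"]' % text
        pgn = pgn[:m.end()] + line + pgn[m.end():]
    return pgn

sample_pgn = '[ECO "B07"]\n1. e4\n\n[ECO "B08"]\n1. e4\n'
result = insert_descriptions(sample_pgn, {'B07': 'X', 'B08': 'Y'})
```

In result, each ECO line is followed directly by its White87 line, while the rest of the text is unchanged.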

Aside from the fact that the above regular expression is unreadable, the whole task is much simpler in babelscript and is done in a single pass.