| babelscript | Python (incomplete source code) |
|---|---|
| `load eco.pgn as {pgn}`<br>`load http://de.wikipedia.org/w/index.php?title=ECO-Codes as {wikipedia}`<br>`loop`<br>`find [ECO "{code}"] in {pgn} // at this point we are already further than all of the Python code on the right`<br>`findfirst <td>{code}</td> in {wikipedia} // find the code on the Wikipedia page`<br>`find <td> in {wikipedia} // jump to the next column`<br>`find <td>{description}</td> in {wikipedia} // get the German description`<br>`striphtml {description} // strip any HTML tags from the description`<br>`utf8toascii {description} // convert the description to plain ASCII`<br>`insert {newline}[White87 "{description}"] into {pgn} // insert the description`<br>`repeat`<br>`save {pgn} as ECO_out.pgn` | `import urllib`<br>`import fileinput`<br>`import re`<br>`wikipedia = urllib.urlopen("http://de.wikipedia.org/w/index.php?title=ECO-Codes").read()`<br>`pgn = open("eco.pgn").read()`<br>`for eco in re.finditer(r'\[ECO \"(?P<code>[A-E0-9]*)\"\]', pgn):`<br>&nbsp;&nbsp;&nbsp;&nbsp;`print eco.span('code')`<br><br>This loop searches for `[ECO "..."]` and prints the start and end positions of the text between the quotes, that is, the span of the group `code`, as a tuple. We still need to extract the actual text, look it up in the Wikipedia page, extract the corresponding German description, and insert it into `pgn`. Because `pgn` cannot be changed inside the loop above, we have to collect all text positions first and insert the German descriptions afterwards in a second pass. Apart from the fact that the regular expression above is unreadable, the whole task is much simpler in babelscript and takes only one pass. |
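
For readers who want to see where the Python column would have to go, the following is a minimal sketch of the remaining steps, not the original author's code. It is written for Python 3 and makes several assumptions that are not in the source: the helper names `strip_html`, `to_ascii`, `find_description` and `annotate` are hypothetical, ECO codes are assumed to be one letter A to E followed by two digits, and the description is assumed to sit in the second table cell after the code cell, mirroring the `findfirst` / `find` / `find` sequence in the babelscript column. The tag name `White87` is copied unchanged from the babelscript example. The sketch uses `re.sub` with a callback as one way to do the insertion without keeping a separate list of positions.

```python
import html
import re
import unicodedata

# Assumption: an ECO code is one letter A-E followed by two digits.
ECO_TAG = re.compile(r'\[ECO "(?P<code>[A-E][0-9]{2})"\]')


def strip_html(fragment):
    """Remove HTML tags and decode entities such as &auml;."""
    return html.unescape(re.sub(r"<[^>]+>", "", fragment)).strip()


def to_ascii(text):
    """Reduce text to plain ASCII (umlauts lose their dots, sharp s becomes ss)."""
    text = text.replace("\u00df", "ss")  # sharp s has no Unicode decomposition
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def find_description(code, page):
    """Return the cleaned description for an ECO code, or None if absent.

    Assumption: the description is the second cell after the code cell.
    """
    pattern = re.compile(
        r"<td>\s*" + re.escape(code) + r"\s*</td>"   # the cell holding the code
        r".*?<td[^>]*>.*?</td>"                      # skip the next column
        r".*?<td[^>]*>(?P<description>.*?)</td>",    # capture the description
        re.DOTALL,
    )
    match = pattern.search(page)
    if match is None:
        return None
    return to_ascii(strip_html(match.group("description")))


def annotate(pgn, page, tag="White87"):
    """Insert a [tag "description"] line after every [ECO "..."] tag."""
    def add_description(match):
        description = find_description(match.group("code"), page)
        if description is None:
            return match.group(0)  # unknown code: leave the tag untouched
        return f'{match.group(0)}\n[{tag} "{description}"]'

    return ECO_TAG.sub(add_description, pgn)
```

To use it on real data, one would read `eco.pgn` with `open(...).read()`, fetch the Wikipedia page (for example with `urllib.request.urlopen`), decode the response to a string, pass both strings to `annotate`, and write the result to `ECO_out.pgn`. The sketch does not escape double quotes inside descriptions and depends on the exact table layout of the page, so it should be checked against the actual HTML before use.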