The Italian Parliament annoys me tremendously. Not for substantive reasons (though it might annoy me for those too), but for technical ones.
They have some nicely formatted XML files for the resoconti (minutes) of each parliamentary sitting.
But their voting information is stuck in crappy PDFs.
Grrr.
So, I have to
- download all the PDF files using a horrible bash script;
- convert them to XML (for file in *.pdf; do pdftohtml -xml "$file"; done);
- examine the XML file to find out where the column breaks are;
- write a Perl script to parse the files using this information;
…and then merge them.
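A minimal sketch of the column-break step, assuming the XML comes from pdftohtml -xml, which writes each text fragment as a <text> element carrying a left="..." pixel offset. The function name left_offsets is made up for illustration, and this is not the script actually used here. It tallies how often each left offset occurs:

```shell
# Hypothetical helper (name is an assumption): tally the left="..." offsets
# of <text> elements in a file produced by `pdftohtml -xml`.
# Prints one "offset count" pair per line, sorted by offset.
left_offsets() {
    # Keep only <text ...> tags, so the left="0" on <page> is ignored.
    grep -o '<text [^>]*left="[0-9]*"' "$1" \
        | sed 's/.*left="\([0-9]*\)"/\1/' \
        | sort -n \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Running left_offsets on one of the converted files should show a handful of offsets that recur on almost every row. Those are plausible column starts, though this is worth checking by eye against the PDF before relying on it in the parser.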
Dear Chris,
may I ask where on the Italian Parliament’s website you found the XML files mentioned in your post? Thanks a lot.
Link | September 16th, 2009 at 10:12 am
Sure: go to http://www.camera.it -> Documenti -> Resoconti, then pick a month and a seduta from the drop-down menu. When you click on a seduta, you’ll see “Resoconto in formato XML” in the left-hand bar. If you want to download them all (and have a Linux-ish system), run:
for i in $(seq 1 213); do wget "http://documenti.camera.it/apps/resoconto/getXmlStenografico.aspx?idNumero=$i&idLegislatura=16"; done
Link | September 16th, 2009 at 1:02 pm
Hi,
what information are you looking for exactly?
Do you know about this project?
http://parlamento.openpolis.it
They seem to have voting details and some way to make them accessible.
my 2c,
Andrea
Link | September 20th, 2009 at 10:08 pm