Shell script to connect to a Shibboleth-protected web app with curl

Here’s a shell script I’ve created (reusing one meant for CAS-protected resources), which allows connecting to a web application protected by the Shibboleth SSO mechanism.
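In outline, such a script drives the usual SP → IdP → SP round trip using curl’s cookie jar. Below is a minimal sketch of that flow, not the script itself; all URLs and form field names are placeholders to adapt to your IdP (j_username/j_password are common in Shibboleth IdP login forms but not universal), and the sed extraction assumes the form attributes sit on single lines:

#! /bin/sh
# Sketch of a curl-based Shibboleth login (placeholders throughout)

url="https://app.example.org/protected/"
jar=$(mktemp)

# 1. Request the protected URL; follow redirects to the IdP login page
curl -s -L -c "$jar" -b "$jar" -o login.html "$url"

# 2. Post the credentials to the IdP (URL and field names to adapt)
curl -s -L -c "$jar" -b "$jar" -o samlform.html \
  --data-urlencode "j_username=myuser" \
  --data-urlencode "j_password=mypassword" \
  "https://idp.example.org/idp/Authn/UserPassword"

# 3. The IdP answers with a self-posting form; extract SAMLResponse
#    and RelayState and post them back to the SP's assertion consumer
#    service (the values may need HTML-entity decoding in practice)
action=$(sed -n 's/.*<form[^>]*action="\([^"]*\)".*/\1/p' samlform.html)
saml=$(sed -n 's/.*name="SAMLResponse" value="\([^"]*\)".*/\1/p' samlform.html)
relay=$(sed -n 's/.*name="RelayState" value="\([^"]*\)".*/\1/p' samlform.html)
curl -s -L -c "$jar" -b "$jar" -o page.html \
  --data-urlencode "SAMLResponse=$saml" \
  --data-urlencode "RelayState=$relay" \
  "$action"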

Offline backup/mirror of a Moodle course, using httrack

I haven’t found many details online on how to mirror a Moodle course with httrack so that it can be browsed offline.

This could be useful both for backup purposes and for distance learners with connectivity issues.

In my case, there’s a login/password dialog that grants access to the Moodle platform, which httrack can handle by capturing the POST form results using the “catchurl” option.
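Roughly, the capture goes like this (a sketch; the exact prompts depend on the httrack version):

# Start httrack's URL catcher: it opens a temporary local proxy and
# prints its address (e.g. localhost:8080)
httrack --catchurl
# Point the browser's HTTP proxy at that address, submit the Moodle
# login form once, and httrack records the captured POST data as the
# start URL of the mirror.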

The strategy I’ve used is to add filters so that everything is excluded by default, and only explicitly whitelisted patterns are mirrored. This allows the backup to run as a user with high privileges, while avoiding getting lost in loops or in the complex links generated by the UI rendering variants of Moodle’s interface.
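As an illustration of this exclude-everything-then-whitelist approach (host, course id and allowed paths are placeholders, not the real course):

httrack "https://moodle.example.org/course/view.php?id=1234" \
  -O /home/me/backup-moodle \
  "-*" \
  "+moodle.example.org/course/view.php?id=1234*" \
  "+moodle.example.org/mod/*/view.php*" \
  "+moodle.example.org/pluginfile.php/*" \
  "+*.css" "+*.js" "+*.png" "+*.gif"

The leading “-*” rejects every URL by default; each subsequent “+” pattern re-allows one family of links.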

Offline backup of MediaWiki with httrack

I recently needed to restore the contents of a wiki running MediaWiki. Unfortunately there were no backups, and my only option was to restore from an outdated version available in Google’s cache.

The problem was that I only had the HTML “output” version, and copy-pasting it into the wiki sources at restore time lost all formatting and links.

Thus I’ve come up with the following script, which is cron-ed to make systematic backups in the background, both of an offline-viewable version of the wiki, as static HTML pages, and of the wiki pages’ sources, for eventual restoration.
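For instance, scheduled nightly from the crontab (the time and script path here are just an example, not my actual setup):

# m h dom mon dow  command
0 3 * * *  /home/me/bin/backup-wiki.sh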

It uses the marvelous httrack and wget tools.

Here we go:

#! /bin/sh

# Wiki to back up, and the local directory to back it up into
site=wiki.my.site
topurl=http://$site

backupdir=/home/me/backup-websites/$site

# Mirror the whole wiki with httrack, starting from Special:Allpages
# (which links to every article). The filters exclude everything that
# would make the crawl explode: Special: pages (except Recentchanges),
# user/talk/help pages (French namespaces here), old revisions
# (oldid), edit forms and paginated history listings; stylesheets and
# each page's plain action=history view are kept.
httrack -%i -w $topurl/index.php/Special:Allpages \
-O "$backupdir" -%P -N0 -s0 -p7 -S -a -K0 -%k -A25000 \
-F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F '' \
-%s -x -%x -%u \
"-$site/index.php/Special:*" \
"-$site/index.php?title=Special:*" \
"+$site/index.php/Special:Recentchanges" \
"-$site/index.php/Utilisateur:*" \
"-$site/index.php/Discussion_Utilisateur:*" \
"-$site/index.php/Aide:*" \
"+*.css" \
"-$site/index.php?title=*&oldid=*" \
"-$site/index.php?title=*&action=edit" \
"-$site/index.php?title=*&curid=*" \
"+$site/index.php?title=*&action=history" \
"-$site/index.php?title=*&action=history&*" \
"-$site/index.php?title=*&curid=*&action=history*" \
"-$site/index.php?title=*&limit=*&action=history"

# Extract the list of mirrored pages from httrack's log, and fetch
# each page's raw wikitext source next to its HTML copy (note the
# title= parameter: MediaWiki's action=raw expects it)
for page in $(grep "link updated: $site/index.php/" $backupdir/hts-log.txt | sed "s,^.*link updated: $site/index.php/,," | sed 's/ ->.*//' | grep -v Special:)
do
  wget -nv -O $backupdir/$site/index.php/${page}_raw.txt "$topurl/index.php?title=$page&action=raw"
done
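To put one of the saved _raw.txt sources back into a wiki, one option is MediaWiki’s maintenance/edit.php script, which reads the new page text from stdin; a sketch, assuming shell access to the wiki server and the standard maintenance directory (check edit.php’s options for your MediaWiki version):

php maintenance/edit.php -s "restore from offline backup" Some_Page \
  < Some_Page_raw.txt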

Hope this helps,