Experimenting with Linked Open Data about FLOSS projects : matching Debian upstream projects

I’ve been experimenting with Linked Open Data about FLOSS projects harvested from different sources of DOAP or ADMS.SW descriptions. I’ve tried and match upstream projects of Debian packages with upstream projects hosted at Apache, Gnome, or Alioth.debian.org, or catalogued on Pypi.

I’m matching them on identical values of the Homepage field (comparing the Homepage Control field set by Debian packagers with the doap:homepage meta-data in the RDF documents harvested from the upstream project catalogues).

Here are initial results of my little experiment, for number of matched projects, and results on project name’s similarity :

Upstream catalogue Total matching projs Exact same project name Same project name (case independant)
apache 31 0 (0 %) 0 (0 %)
alioth 16 13 (81 %) 13 (81 %)
pypi 439 217 (49 %) 273 (62 %)
gnome 21 0 (0 %) 7 (33 %)
Total 507 230 (45%) 293 (58 %)

The data set contains tens of thousands of projects, with probably many duplicates, but from all of these, only 507 have common homepages.

As you can see, in some cases, the Debian source package names match the upstream project name (sometimes with lower/upper case variants), but in general, the project names aren’t identical, so it is interesting to try and match them by homepage.

For the curious ones, the Apache, Gnome and Pypi project catalogues use to provide RDF meta-data for quite some time. More recently have we introduced ADMS.SW meta-data for Debian source packages, and even more recently for the Alioth projects (through the ADMS.SW exporter plugin for FusionForge).

There are still some ways for improvements, for instance to normalize homepage URLs which tend to vary (trailing slashes, or different HTTP/HTTPS schemes).

Stay tuned for more details.

Debian Package Tracking System now produces RDF description of source packages

Here’s a second post on the subject of RDF descriptions for Debian source packages (see the previous post for some context).

From now on, the Debian Package Tracking System (PTS) will produce, alongside HTML pages meant for humans, RDF pages meant for Linked Data / Semantic Web aware applications.

Every Debian source package, which used to have an HTML page like http://packages.qa.debian.org/packagename now has a corresponding RDF/XML document available provided that the application/rdf+xml content-type is required (the HTTP client being redirected to the proper HTML or RDF document).
Continue reading “Debian Package Tracking System now produces RDF description of source packages”

Generating RDF description of Debian package sources with ADMS.SW

Edit : I’ve now managed to roll out my contribution which is now in production on packages.qa.debian.org. See a later post I’ve made on the subject, and beware that the generated RDF has changed a bit also.

ADMS.SW proposes specifications for description of software present in software catalogues.

I’ve tried and apply it to the contents of the Debian Package Tracking System (PTS), using transformation of the information known by the PTS to RDF+XML.

The result is not yet in production, but here’s an example of what can be done, using the Turtle syntax (more readable) :
Continue reading “Generating RDF description of Debian package sources with ADMS.SW”

First draft of Helios_bt bug ontology : request for comment

We’ve been working on modeling bug reports properties for quite a long time, but never managed to stabilize an ontology specification. Better late than never, here’s our first proposed draft and our request for comments :

More details in the helios project’s blog :
http://sourceforge.net/apps/wordpress/heliosplatform/2010/06/04/first-draft-of-helios_bt-bug-ontology-request-for-comment/