CVS edit link for milesw

I've developed a module called Linked Data Import that can be used to import remote Linked Data content (using either content negotiation or SPARQL) and create nodes from that content. The module depends heavily on the Feeds module and is primarily a set of plugins for Feeds.

The README and INSTALL files with the module contain further information.

Sandbox project: http://drupal.org/sandbox/milesw/1085078

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

milesw’s picture

Status: Postponed (maintainer needs more info) » Needs review
FileSize
17.79 KB

Adding tarball with latest module code from GitHub.

apaderno’s picture

Status: Needs review » Fixed
Issue tags: +Module review

Hello, and thank you for applying for a CVS account.

As per requirements, the motivation needs to include a description of the module features, and a comparison with the existing solutions; the module feature description should be longer than two sentences.

apaderno’s picture

Status: Fixed » Needs work

I set the wrong status.

milesw’s picture

Status: Needs work » Needs review

Sorry, I thought it would be easier just to point to GitHub and the README where those questions are answered. I'll paste it here...

RDFIMPORTER MODULE

RDFimporter is primarily a set of plugins for the Feeds module that can be used to import remote RDF resources and map their content to Drupal nodes. It can be used for both one-time imports and periodic data imports.

Most of the Drupal functionality comes from the Feeds module itself. All RDF fetching and parsing is handled by the ARC2 RDF framework. (Both great pieces of software)

This module is the result of work being done at Cornell University's Mann Library to bring content from Cornell's VIVO database into Drupal. Our hope is that other groups and institutions will be able use this module to bring VIVO content into their Drupal sites. We also hope this code will be useful to others as RDF becomes more available on the Web.

GOALS

- Make it easy for site builders, including non-developers, to import remote RDF content
- Allow RDF to be mapped to standard Drupal structures (CCK, Taxonomy)
- Provide a way to update nodes manually or automatically

FEATURES

- Support for Linked Data and SPARQL endpoints (via ARC2)
- Allows users to define mappings between RDF properties and Drupal node properties
- Requires very little knowledge about RDF and SPARQL
- Automatically fetches human-readable labels for object properties
- Automatically filters out content in languages different from the site (when language is explicitly set in RDF)
- Provides hooks to 'preprocess' retrieved data and hooks to expose additional node properties

OTHER MODULES

There are some great modules available that provide similar functionality, including SPARQL Views and Linked Data. One major difference is that RDFimporter can work without a SPARQL endpoint if RDF is available via Linked Data. However, compared to those modules, RDFimporter does things quite lazily. It requires far less familiarity with RDF or SPARQL, but this ultimately makes it inefficient in some cases. If the data you're after is available from a SPARQL endpoint and you can write SPARQL queries, you might be better off with something like SPARQL Views + Feeds View Parser.

zzolo’s picture

Component: Miscellaneous » miscellaneous
Status: Needs review » Postponed

Hi. Please read all the following and the links provided as this is very important information about your CVS Application:

Drupal.org has moved from CVS to Git! This is a very significant change for the Drupal community and for your application. Please read the following documentation on how this affects and benefits you and the application process:
Migrating from CVS Applications to (Git) Full Project Applications

  • The status of this application will be put to "postponed" and by following the instructions in the above link, you will be able to reopen it.
  • Or if your application has been "needs work" for more than 5 weeks, your application will be marked as "closed (won't fix)". You can still reopen it, by reading the instructions above.
milesw’s picture

Status: Postponed » Needs review

The module is now located in this Git sandbox:

http://drupal.org/sandbox/milesw/1085078

jthorson’s picture

Project: Drupal.org CVS applications » Drupal.org security advisory coverage applications
Component: miscellaneous » module

Please note step 5 at the Migrating CVS Applications Link.

sreynen’s picture

Title: milesw [milesw] » RDFimporter

I'm updating the title on this to match the project, hopefully making it a little easier for interested reviewers to find.

Anonymous’s picture

Status: Needs review » Needs work

Wow, this has lingered in the queue for a long time.

I am reviewing the 7.x version of the module as found on GitHub.

This module is pretty solid in a lot of ways:

  • Does not duplicate functionality. There are no Feeds parsers that allow such sophisticated RDF parsing and SPARQL queries.
  • Handles it's third party code in the appropriate manner - directing the user to download it to the correct location (though you will want to update the library path based on a change in RDFx)
  • It makes proper use of core systems like the theme system
  • It leverages an established contrib system

Now to the work...

Style issues:

  1. There is currently a Notice when installing.
  2. There is commented out code, including whole functions. Commented out code (except for demonstration purposes) should never be committed, though that rule isn't always followed.
  3. Drupal 7 standardized on "Implements" instead of "Implementation of"
  4. Need a new line at the end of some files. All should end in a new line.
  5. Lines should be broken at 80 characters. For example, * Gather sample data for all an individual's properties and save it in the batch. is 85 characters. In Komodo IDE (or Komodo Edit), there is a faint line at 80 chars, maybe there is in your editor too.

Some potential security issues:

  1. I may have missed it, but it doesn't seem like you filter URI values for dangerous protocols. I was looking for it around line 169 in RdfImporterParser.inc
  2. This code is commented out, but does the query in RdfImporterNodeProcessor.inc need a node_access tag? I'm not familiar enough with the feeds_node_item table and the data it stores.

Looking forward to reviewing again, we'll make sure it doesn't take a year this time ;)

sreynen’s picture

linclark, it's great to have a topic expert doing reviews, thanks! If you do another round after these updates, please look at the sandbox project rather than the Github version, since that's what will eventually become the full project. I'm guessing they're the same right now, but maybe not.

milesw, make sure you're doing your updates in the sandbox project. If you want to keep Github updated, that's great too.

I just added the sandbox link to the issue summary to make it a little easier to find.

milesw’s picture

Title: RDFimporter » Linked Data Import
Status: Needs work » Needs review

Thanks, linclark, for the great code review. And thanks jthorson and sreynen for nudging this along.

Based on a suggestion from linclark, I've renamed the project from "rdfimporter" to "ld_import" (Linked Data Import) to better represent the features of the module.

I've committed a number of changes to the sandbox repository based on the comments in #9...

Style issues:

  1. Fixed. Wasn't seeing the notice, even with E_STRICT, until I switched to PHP 5.2. Weird.
  2. Cleared out commented code.
  3. Replace all instances of "Implementation of..."
  4. Added new lines at end of files.
  5. Changed all documentation lines to break at 80. Found a nice feature to do this in TextMate.

Security issues:

  1. As I understand it, the only time you need to filter URIs is when they're being rendered somewhere. I've gone ahead and made sure that theme functions are using check_plain() for any data that's been imported. However, I don't think the parser needs to do anything special.
  2. The Feeds module doesn't seem to be using a node_access tag for any of its queries. The commented query in question from LinkedDataImportNodeProcessor::existingEntityId() was for D6, and has been replaced with the same query used by Feeds' own FeedsProcessor::existingEntityId().

Hope I've covered all the issues. I did check the ARC2 path in the latest RDFx module and it appears to be the same as what I'm using.

milesw’s picture

Issue summary: View changes

Added link to sandbox.

klausi’s picture

Status: Needs review » Needs work
  • README.markdown should be README.txt
  • same for INSTALL.markdown
  • text in those files shouls not exceed 80 characters
  • Remove all old CVS $Id tags, not needed anymore
  • info file: "php = 5.2" is not needed, Drupal 7 core already requires PHP 5.2
  • files that contain classes should be listed in the info file to leverage the Druapl registry autoloading
  • do not implement hook_init() just to include the library. hook_init() is executed on literally every page request and is bad for performance. Include code in places where you actually need it.
  • class properties should use lowerCamel naming, see http://drupal.org/node/608152
MiSc’s picture

@milesw has been contacted to ask if the application is abandoned.

After ten weeks with a status of needs work: the applicant may be contacted by a reviewer to determine whether the application was indeed abandoned. The action taken by the reviewer should be documented in the project application issue.

http://drupal.org/node/894256

milesw’s picture

Status: Needs work » Closed (won't fix)

Yup, closing the application. Not really abandoned, but probably not enough interest for Drupal.org, so no need to waste any more time reviewing. Thanks to those who already have.

milesw’s picture

Issue summary: View changes

Changed module name, removed GitHub link to avoid confusion.

apaderno’s picture

Issue summary: View changes
Issue tags: -Module review