I am assigning myself this task. I want to give back to all those that give. I know you are working very hard and your efforts are very much appreciated.

One of the issues with getting started in nutch/drupal is learning to use the nutch-site.xml. I am going to write a patch that adds editing of that file through the Drupal interface, and possibly allows inserting code snippets directly from nutch-default.xml, with a small 'library' of property tags.
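
For anyone new to the file: an override in nutch-site.xml is a property tag copied from nutch-default.xml with its value changed. A minimal sketch, assuming the `http.agent.name` property; the value shown is a made-up placeholder, not something from this module:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Copied from nutch-default.xml; only the value is changed. -->
  <property>
    <name>http.agent.name</name>
    <!-- Placeholder value (assumption): use your own crawler name. -->
    <value>my-drupal-crawler</value>
    <description>HTTP 'User-Agent' name sent by the crawler.</description>
  </property>
</configuration>
```

Anything not listed in nutch-site.xml keeps the value from nutch-default.xml, which is why having both files side by side is useful.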

Take a look at the attached file. Feedback welcomed.

Comments

broncomania

Yo, that's cool! Just make it simple. Thumbs up. Where is the code?

maxmmize

I will be pushing it up here within a week. I wanted feedback while I was at it, because law school midterms are coming up in three weeks and I will be out of the game for a bit, starting shortly, until they are over.

robertdouglass

Insofar as we want to be editing files from within Drupal, I think the interface shown is highly superior to the text area -> runbot -> write file workflow that currently exists. In other words, if we go this route (which I can see the need for) can we also do it this way for seed/urls and the regexp filters?

One of the hard things about nutch is knowing how to enable/disable plugins, and then what files to edit in order to configure them. I suppose this is a step towards baking that domain knowledge into the Drupal module which will be helpful to some people.
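
As an illustration of that domain knowledge: plugins are switched on and off through the `plugin.includes` property, whose value is a regular expression of plugin IDs. A rough sketch of an override in nutch-site.xml, assuming a shortened plugin list chosen for illustration (it is not the shipped default):

```xml
<property>
  <name>plugin.includes</name>
  <!-- Illustrative subset (assumption): plugins left out of this regex are disabled. -->
  <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|scoring-opic</value>
</property>
```

Each enabled plugin may then need its own file under conf/ edited, for example the regexp filter rules for urlfilter-regex.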

On the other hand, these problems are sometimes best addressed with documentation - which is sorely lacking in the nutch world, and maybe we should just be teaching people how to do this configuration?

maxmmize

Indeed, the nuts and bolts are lacking. I am not sure how much you want others to start dredging through the current documentation. If nobody is opposed, I guess a good project while I am studying for my midterms, after I get these two things handled, would be to collect everything we have into an RTF file and then reorder, re-engineer and reformat it to make it clearer what path a new person should follow.

That way we can all discuss the changes, agree and have them implemented. A basic restructuring on what we have will give us a better idea of what we need.

But yes, the interface is huge if we plan to develop a local nutch/solr community. You figure for every 100 or so people, one will actively participate and grow the community.

maxmmize

Status: Active » Needs review

Sorry, figuring out CVS was too much during Real Property class...and Robert wasn't up yet or I would have bugged him.

In nutch.module, at line 71, add a new tab (nutch.module version // $Id: nutch.module,v 1.3 2010/09/30 11:21:31 dstuart):

    $items['admin/settings/nutch/conf'] = array(
      'title'            => 'Nutch Site Config',
      'page callback'    => 'drupal_get_form',
      'page arguments'   => array('nutch_admin_conf'),
      'access arguments' => array('administer nutch'),
      'type'             => MENU_LOCAL_TASK,
      'file'             => 'nutch.admin.inc',
      'weight'           => 4,
    );

Then add this to the end of nutch.admin.inc (version // $Id: nutch.admin.inc,v 1.1 2010/04/11 23:40:26 dstuart Exp $):

    /**
     * Prints the archive path for a log file and stops the request.
     */
    function nutch_admin_logs_archive($log_file) {
      $path = explode('/', $log_file);
      $file = array_pop($path);
      $path[] = 'history';
      echo implode('/', $path) . '/' . date('mdY-His') . $file;
      die;
    }

    /**
     * Form builder: edit nutch-site.xml next to nutch-default.xml for reference.
     */
    function nutch_admin_conf() {
      $conf_dir = variable_get('nutch_nutch_dir', '/usr/local/nutch') . '/conf';

      // Initialise both values so the form still renders if a file is missing.
      $site_default = '';
      $default_file = $conf_dir . '/nutch-default.xml';
      if (!is_readable($default_file)) {
        drupal_set_message(t('Cannot access %default: either it does not exist or it is not readable by the webserver.', array('%default' => $default_file)), 'error');
      }
      else {
        // Load the raw XML. Drupal escapes textarea contents when rendering,
        // and decoding entities here would corrupt values such as &amp;.
        $site_default = file_get_contents($default_file);
      }

      $site_config = '';
      $config_file = $conf_dir . '/nutch-site.xml';
      if (!is_writable($config_file)) {
        drupal_set_message(t('Cannot access %config: either it does not exist or it is not writable by the webserver.', array('%config' => $config_file)), 'error');
      }
      else {
        $site_config = file_get_contents($config_file);
      }

      // The file path is not sent to the browser; the submit handler
      // rebuilds it, so a posted value cannot redirect the write.
      $form = array();
      $form['open'] = array('#value' => '<div style="width:50%;float:left;">');
      $form['site_config'] = array(
        '#type' => 'textarea',
        '#title' => t('Site Config'),
        // #default_value (not #value) lets the edited text reach
        // $form_state['values'] on submit.
        '#default_value' => $site_config,
        '#rows' => 30,
      );
      $form['middle'] = array('#value' => '</div><div style="width:50%;float:left;">');
      $form['site_default'] = array(
        '#type' => 'textarea',
        '#title' => t('Site Default'),
        '#value' => $site_default,
        '#rows' => 30,
      );
      $form['end'] = array('#value' => '</div><div style="clear:both;"></div>');
      $form['controls']['save'] = array('#type' => 'submit', '#value' => t('Save Config'));
      return $form;
    }

    /**
     * Submit handler: write the edited configuration back to nutch-site.xml.
     */
    function nutch_admin_conf_submit($form, &$form_state) {
      // Rebuild the path on the server instead of trusting a posted value.
      $path = variable_get('nutch_nutch_dir', '/usr/local/nutch') . '/conf/nutch-site.xml';
      $content = $form_state['values']['site_config'];

      if (!file_exists($path)) {
        drupal_set_message(t('Failed to save site configuration file: %path does not exist.', array('%path' => $path)), 'error');
        return;
      }
      if (!is_writable($path)) {
        drupal_set_message(t('Failed to save site configuration file: %path is not writable.', array('%path' => $path)), 'error');
        return;
      }
      // file_put_contents() returns FALSE on failure; writing 0 bytes is not an error.
      if (file_put_contents($path, $content) === FALSE) {
        drupal_set_message(t('Failed to save site configuration file for an unknown reason; path was %path.', array('%path' => $path)), 'error');
        return;
      }
      drupal_set_message(t('Site configuration has been saved.'));
    }

I promise I will figure out CVS sometime after midterms or get with another Drupalite on Skype to work with me to get it into a format you all prefer.

broncomania

Cool, it works after setting the right write permissions on the nutch-site.xml file.
Frank

maxmmize

Can you port it for me?

broncomania

What do you mean by port? Making a patch of your code?

maxmmize

Yeah, sorry, can you make a patch? I guess Dave would port it if he thought it was something useful.

broncomania

Sure, I can try. I will post it here in the next few days.

broncomania

Attachment: status new, size 3.99 KB

So here is the patch. Hope it works.

broncomania

Attachment: status new, size 1.25 KB

Ah, I forgot the module file patch.

avpaderno

Status: Needs review » Closed (outdated)

I am closing this issue, since Drupal 6 isn't supported anymore.