Swish-e and backslashes

gilf - February 21, 2008 - 22:09
Project: Swish-E Indexer
Version: 5.x-1.x-dev
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: active
Description

Hi,

When I (or cron.php) run an index, I see this in the Drupal log:

Indexing Data Source: "File-System"
Indexing "/var/www/drupal5/files"

Checking dir "/var/www/drupal5/files"...
PBSProERS_71.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
SCM-080114.doc - Using DEFAULT (HTML2) parser - (no words indexed)
PBSProUG_7.1.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
KnowledgeTreeUserManua.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
PBSProQS_7.1.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
PBSProAG_7.1.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
BiografX.pdf - Using DEFAULT (HTML2) parser - (no words indexed)
ADMEnsa.TXT - Using DEFAULT (HTML2) parser - (68 words)
DRAFT EPIX Pharmaceuticals AI License Agreement 31 Oct 2007.doc - Using DEFAULT (HTML2) parser - (no words indexed)
Software_inventory_2008-02-15-2.xls - Using DEFAULT (HTML2) parser - (no words indexed)

The indexing is not done, and the Apache error log shows:

Error: Couldn't open file '/var/www/drupal5/files/PBSProERS\_71\.pdf'
catdoc: No such file or directory
Error: Couldn't open file '/var/www/drupal5/files/PBSProUG\_7\.1\.pdf'
Error: Couldn't open file '/var/www/drupal5/files/KnowledgeTreeUserManua\.pdf'
Error: Couldn't open file '/var/www/drupal5/files/PBSProQS\_7\.1\.pdf'
Error: Couldn't open file '/var/www/drupal5/files/PBSProAG\_7\.1\.pdf'
Error: Couldn't open file '/var/www/drupal5/files/BiografX\.pdf'
catdoc: No such file or directory
/var/www/drupal5/files/Software\_inventory\_2008\-02\-15\-2\.xls: No such file or directory
Error: Couldn't open file 'files/KnowledgeTreeUserManual_2007-05-17.pdf'

I cannot find when and where those backslashes are added, or whether they are indeed the reason the files are not indexed.
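One way to check whether the backslashes are the cause is to take a path exactly as it appears in the error log, strip the backslashes, and see which of the two versions exists on disk. The following is only a diagnostic sketch for a Unix-like system: the helper names `unescape_path` and `diagnose` are made up for illustration and are not part of Swish-E or the Drupal module.

```python
import os
import re
import tempfile


def unescape_path(path: str) -> str:
    """Drop each backslash that precedes another character (undo shell-style escaping)."""
    return re.sub(r"\\(.)", r"\1", path)


def diagnose(logged_path: str) -> dict:
    """Report whether the path as logged, or its unescaped form, exists on disk."""
    plain = unescape_path(logged_path)
    return {
        "logged": logged_path,
        "unescaped": plain,
        "logged_exists": os.path.exists(logged_path),
        "unescaped_exists": os.path.exists(plain),
    }


if __name__ == "__main__":
    # Demo in a scratch directory; on a real site, paste a path from the error log instead.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "PBSProERS_71.pdf"), "w").close()
        logged = os.path.join(scratch, "PBSProERS\\_71\\.pdf")
        print(diagnose(logged))
```

If only the unescaped form exists, the escaped name is what the converter was given, and the escaping is very likely why the file could not be opened.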

#1

populist - March 7, 2008 - 19:41

I suspect this is an issue with Swish-E escaping the file names when it is pulling the files into the search index.
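To illustrate why escaping could produce these errors: a backslash-escaped name is only valid if a shell removes the backslashes before the program sees it. If the escaped string reaches a converter directly, the converter looks for a file whose name literally contains backslashes. The sketch below assumes a Unix-like system and uses a small Python one-liner as a stand-in for a converter such as catdoc or pdftotext; `backslash_escape` is a hypothetical escaper that merely mimics the pattern visible in the error log, not code taken from Swish-E or the module.

```python
import os
import re
import subprocess
import sys
import tempfile

# Stand-in for a converter: it only tries to open the file named in argv[1].
OPEN_FILE = "import sys; open(sys.argv[1]).close()"


def backslash_escape(path: str) -> str:
    """Hypothetical escaper: backslash before every character except letters, digits and '/'."""
    return re.sub(r"([^A-Za-z0-9/])", r"\\\1", path)


def converter_can_open(path: str) -> bool:
    """Hand the path to the stand-in converter as one argv element, with no shell involved."""
    result = subprocess.run([sys.executable, "-c", OPEN_FILE, path], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        real = os.path.join(scratch, "PBSProERS_71.pdf")
        open(real, "w").close()
        escaped = os.path.join(scratch, backslash_escape("PBSProERS_71.pdf"))
        print("plain name:  ", converter_can_open(real))
        print("escaped name:", converter_can_open(escaped))
```

In this sketch the plain name opens and the escaped one does not, which matches the "Couldn't open file" lines above. Whether the module actually calls its converters this way would need to be confirmed in its source.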

#2

lienty - March 8, 2008 - 08:45

I have the same escaping problem.

When I index the files, those whose names contain escaped characters are not opened, and I get this message:

catdoc: No such file or directory
catdoc: No such file or directory
pdftotext version 3.02
Copyright 1996-2007 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -layout           : maintain original physical layout
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory
catdoc: No such file or directory

I have no problem with files whose names are not escaped.

 
 
