When editing Biblio items (imported from EndNote XML, originally from PubMed), saving often fails with an error such as the following:
"Keywords cannot be longer than 255 characters but is currently 481 characters long."
This applies to the Keywords field towards the end of the form, not the one that is linked to taxonomy.
A fix would be to change the Keywords field from a varchar field to a text field, unless the issue is related to parsing the keyword list into individual keywords. In recent biomedical publications the limit is exceeded about a third of the time, so this fix is badly needed.
Thanks :-)

Attachment: reference library.txt (160.53 KB), posted in comment #3 by pimousse98

Comments

pimousse98

This got even stranger... I had pasted the content into "custom 1" so as not to lose it, then tried pasting it back into the Keywords field, and without any change in the data I was able to save successfully. To reproduce: if you (1) edit a node with long keywords and (2) try to save, you get the error. But if you (1) cut and paste the keyword data into another field (I used custom 1), (2) save the node, (3) edit the node and cut and paste from custom 1 back into Keywords, and (4) save the node again, it works.

rjerome

Could you post either one of these keyword strings or the bit of the xml file containing it or (preferably) both?

Also, could you check to see what your keyword separator is set to on the admin/settings/biblio page.

Ron.

pimousse98


Hi,
The keyword separator is set to ",". Some of my keywords include commas, though, which might be part of the problem; an example is "Genes, Viral". I am attaching the EndNote XML.

Unfortunately, after finding that "fix" (saving with an empty keyword field, then pasting the keywords back in and saving again), I went through and fixed all of the problem nodes. I checked my log for two that I most likely fixed; here are their keyword lists (these come straight from PubMed and are full of * and /):

Animals, Humans, Immunohistochemistry, Mice, Cricetinae, *Genetic Vectors, Transfection, Gene Expression Regulation, Antineoplastic Agents/*pharmacology, Apoptosis/*drug effects, Factor VIII/analysis, SCID, Enzymologic/drug effects, Gene Expression

Animals, Humans, Mice, Amino Acid Sequence, Molecular Sequence Data, Laminin/*metabolism, Cell Adhesion/drug effects, Melanoma/*pathology, Peptides/*pharmacology, *Protein Precursors, Melanoma, Experimental/pathology, *Neoplastic Cells, Circulating, *Rece

Hope this helps,

Delphine

rjerome

Hi Delphine,

I don't know how you are going from PubMed to EndNote, but I think somewhere in that process you need to have the keyword lists split on the commas. EndNote typically exports a single keyword per <keyword> tag in the XML file, but as you can see, some of them contain comma-separated lists. I didn't run into any errors when importing the file you attached above, but I guess you already fixed the problem?
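To illustrate the "one keyword per <keyword> tag" point, here is a minimal Python sketch that reads keywords from an EndNote XML record without splitting on commas, so MeSH headings such as "Cell Line, Tumor" stay whole. The record layout (`<record>`, `<keywords>`, `<keyword>` with a nested `<style>` element) is an assumption modelled loosely on EndNote exports, and `record_keywords` is a hypothetical helper, not part of Biblio.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified EndNote XML record; real exports may differ in detail.
SAMPLE = """<xml><records><record>
  <keywords>
    <keyword><style face="normal" font="default" size="100%">Animals</style></keyword>
    <keyword><style face="normal" font="default" size="100%">Cell Line, Tumor</style></keyword>
    <keyword><style face="normal" font="default" size="100%">Genes, Viral</style></keyword>
  </keywords>
</record></records></xml>"""


def record_keywords(record):
    """Return one keyword per <keyword> element, keeping embedded commas intact."""
    keywords = []
    for kw in record.iter("keyword"):
        # itertext() collects the text even when it is wrapped in <style> elements.
        text = "".join(kw.itertext()).strip()
        if text:
            keywords.append(text)
    return keywords


root = ET.fromstring(SAMPLE)
for record in root.iter("record"):
    print(record_keywords(record))  # → ['Animals', 'Cell Line, Tumor', 'Genes, Viral']
```

Reading per element rather than splitting the text on "," is what keeps MeSH qualifiers attached to their heading.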

For what it's worth, I should have direct PubMed import shortly. I just finished the DOI lookups / import, so PubMed should follow the same framework. Basically I just need to write the PubMed XML parser.

Ron.

pimousse98

Hi Ron,
Importing from PubMed would be awesome! I had imported the references into EndNote 9 using its connector/search tool (Tools > Connect > PubMed (NLM)).
These keywords are MeSH keywords - many of them are supposed to include a comma (as in "Cell line, tumor"). Here is the address of the keyword browser:
http://www.nlm.nih.gov/cgi/mesh/2009/MB_cgi

The keywords with commas in them are imported into taxonomy correctly (within quotes) but are split when imported into the Keywords field (so we end up with a "cell line" and a "tumor" keyword, which is not the desired outcome). I looked in the biblio_keywords table, and for some reason some keywords get split on the comma (History, 20th century) while others do not (RNA, antisense).

I revisited the references where the keywords were "too long". I believe the copy and paste operation only kept the first 255 characters or so, since those nodes end up with more keywords in the taxonomy list than in the Keywords list.

The taxonomy keywords end up mostly accurate, except when a keyword contains an ampersand: for some reason those are not enclosed in quotes the way keywords with a comma are.

These issues would probably be fixed by systematically enclosing all keywords in quotation marks and increasing the maximum length allowed for the Keywords field.

Thanks!

Delphine

pimousse98

Oh, a few more details: I think some of the keywords you took for comma-separated lists are meant to be that way. The commas indicate a qualifier, so all the terms should stay together. For example, "chromosomes, artificial, bacterial" is a single item (see the MeSH page).
Also, I did not get any error messages upon importing, but got them afterward when I was uploading file attachments to each node. When saving after editing, I would get the "length exceeded" error. At that point the biblio & taxonomy keywords were already created, so nothing should have been over 255 chars.

rjerome

OK, this has uncovered a small bug in the keyword handling: if keywords had embedded separators (as yours did), the separators were lost on editing and saving. I have now fixed the problem, and in subsequent releases keywords with embedded separators will be wrapped in double quotes to preserve the separator.
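The quoting approach described above can be sketched in Python as a round trip: keywords containing the separator are wrapped in double quotes when the edit-form string is built, and the string is parsed back with a quote-aware reader so embedded commas survive an edit/save cycle. This is a sketch under assumptions, not Biblio's PHP code: `join_keywords` and `split_keywords` are hypothetical names, the separator is assumed to be ",", and embedded double quotes are escaped CSV-style by doubling them.

```python
import csv

SEPARATOR = ","  # assumed to match the separator set on admin/settings/biblio


def join_keywords(keywords, sep=SEPARATOR):
    """Build the single edit-form string, quoting keywords that contain the separator."""
    out = []
    for kw in keywords:
        if sep in kw or '"' in kw:
            # CSV-style escaping: wrap in quotes and double any embedded quotes.
            kw = '"' + kw.replace('"', '""') + '"'
        out.append(kw)
    return (sep + " ").join(out)


def split_keywords(text, sep=SEPARATOR):
    """Split the edit-form string back into keywords, honouring double quotes."""
    if not text.strip():
        return []
    # skipinitialspace lets a quote that follows ", " open a quoted field.
    row = next(csv.reader([text], delimiter=sep, skipinitialspace=True))
    return [kw.strip() for kw in row if kw.strip()]


keywords = ["Animals", "Cell Line, Tumor", "Genes, Viral"]
form_value = join_keywords(keywords)
print(form_value)                  # → Animals, "Cell Line, Tumor", "Genes, Viral"
print(split_keywords(form_value) == keywords)  # → True

# Without quoting, a plain split reproduces the "cell line" / "tumor" problem:
print([k.strip() for k in "Animals, Cell Line, Tumor".split(",")])  # → ['Animals', 'Cell Line', 'Tumor']
```

Quoting only the keywords that need it keeps the common case readable in the form, while the quote-aware parser makes the round trip lossless.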

As for the field width, although the keywords are displayed as a single long string separated by commas, they are stored separately in the database. So we don't need a wider column in the database; we need a "wider" input box on the form, because it's the input form that limits the combined keyword string to 255 characters. I have now increased that to 1000 characters, so hopefully you won't run into this issue again.
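A short sketch of why the input form, rather than the database column, was the bottleneck: the combined string shown in the form grows with the number of keywords, while each individually stored keyword stays short. The 255 and 1000 form limits come from the comment above; the 255-character per-keyword column width, the ", " display separator, and the `check_keywords` helper are assumptions for illustration, and the keyword list is made up.

```python
OLD_FORM_LIMIT = 255    # previous limit on the combined keyword string in the form
NEW_FORM_LIMIT = 1000   # new limit on the combined keyword string in the form
COLUMN_LIMIT = 255      # assumed width of the per-keyword database column


def check_keywords(keywords, form_limit, column_limit=COLUMN_LIMIT):
    """Compare the combined form string and each stored keyword against their limits."""
    combined = ", ".join(keywords)  # assumed display separator
    longest = max((len(k) for k in keywords), default=0)
    return {
        "combined_length": len(combined),
        "fits_form": len(combined) <= form_limit,
        "longest_keyword": longest,
        "fits_column": longest <= column_limit,
    }


# Made-up list in PubMed/MeSH style, repeated to mimic a long keyword set.
sample = ["Antineoplastic Agents/*pharmacology", "Apoptosis/*drug effects"] * 8
print(check_keywords(sample, OLD_FORM_LIMIT))
print(check_keywords(sample, NEW_FORM_LIMIT))
```

With a list like this, the combined string overflows the old form limit even though every single keyword fits easily in its own column.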

Ron.

pimousse98

Hi Ron,
This is great! I can't wait to get to use the new version. Thank you so much for all your hard work on this module.
Delphine

pimousse98

Hi Ron,
Another question: have these changes been committed to the -dev version? Would you advise using the latest -dev version, or do you know when the next stable release will be out?
Thanks,
Delphine

rjerome

Yes, they have been committed to -dev. You can use it if you like and then switch back to the release version when it comes out. I had hoped to have the 1.3 version out by now, but there have been a few little issues (like this one) popping up which I thought I would address prior to making another release.

Ron.

pimousse98

Okay, I just installed today's -dev release. The import works fine, and there is no more error message upon saving! :-)
There is still an issue with the "&" character when importing with keywords => taxonomy. A keyword containing "&" (such as "Genetic Vectors/*administration & dosage/immunology") stays together correctly in the Keywords field, but is split into 3 parts (including a lonely "&") when imported into taxonomy. I'm assuming anything containing an "&" should be enclosed in quotes and parsed as a single term for the taxonomy import.
Thanks!

Delphine

rjerome

Everything going to the taxonomy module is already wrapped in double quotes by default, so I suspect it's the taxonomy module's parsing of the string that is doing this.

I'll see if I can track down where this is happening.

Ron.

rjerome

Status: Active » Fixed

The bad news is that I couldn't pin this bug on anyone else but myself :-(

The good news is that some other work I was doing on keyword/taxonomy integration fixed the issue anyway, so it should be a non-issue with the very latest -dev version (available within 12 hours from now).

Ron.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.