I've noticed over the past several months that quite a few test book pages are still being entered on drupal.org, many without any text at all.

It may help to set a minimum character limit on book pages to discourage these test posts.

Thoughts?

Comments

silverwing’s picture

+1

I've deleted many, many of these myself and I would be all for making it harder to post test pages.

greggles’s picture

Title: Minimum character setting on handbook pages » Minimum character/word setting on handbook pages

We could do 20 words as a minimum number?

However, I wanted to see what number might be the right size and found a lot of nodes with low word counts (my test for "word count" isn't great, but it mostly works).
mysql> SELECT concat('http://drupal.org/node/', n.nid), LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) FROM node_revisions nr inner join node n on n.nid = nr.nid AND n.vid = nr.vid where type = 'book' order by LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) asc limit 50;
+------------------------------------------+-----------------------------------------------+
| concat('http://drupal.org/node/', n.nid) | LENGTH(body) - LENGTH(REPLACE(body, ' ', '')) |
+------------------------------------------+-----------------------------------------------+
| http://drupal.org/node/402290 | 0 |
| http://drupal.org/node/417694 | 0 |
| http://drupal.org/node/206130 | 0 |
| http://drupal.org/node/31595 | 0 |
| http://drupal.org/node/396570 | 0 |
| http://drupal.org/node/384880 | 0 |
| http://drupal.org/node/557370 | 0 |
| http://drupal.org/node/46641 | 0 |
| http://drupal.org/node/443536 | 0 |
| http://drupal.org/node/380194 | 0 |
| http://drupal.org/node/480892 | 0 |
| http://drupal.org/node/553824 | 0 |
| http://drupal.org/node/386314 | 0 |
| http://drupal.org/node/394118 | 0 |
| http://drupal.org/node/591710 | 0 |
| http://drupal.org/node/385144 | 0 |
| http://drupal.org/node/390568 | 0 |
| http://drupal.org/node/99612 | 0 |
| http://drupal.org/node/394126 | 0 |
| http://drupal.org/node/379716 | 0 |
| http://drupal.org/node/401780 | 0 |
| http://drupal.org/node/400818 | 0 |
| http://drupal.org/node/410500 | 0 |
| http://drupal.org/node/421616 | 0 |
| http://drupal.org/node/46651 | 0 |
| http://drupal.org/node/628292 | 0 |
| http://drupal.org/node/393506 | 0 |
| http://drupal.org/node/394450 | 0 |
| http://drupal.org/node/417188 | 0 |
| http://drupal.org/node/416698 | 0 |
| http://drupal.org/node/400856 | 0 |
| http://drupal.org/node/451238 | 0 |
| http://drupal.org/node/46635 | 0 |
| http://drupal.org/node/389228 | 0 |
| http://drupal.org/node/411734 | 0 |
| http://drupal.org/node/506068 | 0 |
| http://drupal.org/node/111022 | 0 |
| http://drupal.org/node/557322 | 1 |
| http://drupal.org/node/44895 | 1 |
| http://drupal.org/node/415078 | 1 |
| http://drupal.org/node/23192 | 1 |
| http://drupal.org/node/227210 | 1 |
| http://drupal.org/node/573150 | 1 |
| http://drupal.org/node/22288 | 1 |
| http://drupal.org/node/262 | 1 |
| http://drupal.org/node/448456 | 1 |
| http://drupal.org/node/630552 | 1 |
| http://drupal.org/node/405796 | 1 |
| http://drupal.org/node/88197 | 2 |
| http://drupal.org/node/420786 | 2 |
+------------------------------------------+-----------------------------------------------+
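A minimal Python sketch of why the "word count" in the query above is only approximate: the SQL expression counts space characters, not words. The function names are hypothetical, and treating a word as a whitespace-separated token is an assumption.

```python
def space_count(body: str) -> int:
    # Mirrors LENGTH(body) - LENGTH(REPLACE(body, ' ', '')):
    # the number of space characters in the body.
    return len(body) - len(body.replace(" ", ""))


def word_count(body: str) -> int:
    # Assumption: a "word" is any whitespace-separated token.
    return len(body.split())


for sample in ["", "test", "This is a test", "two  spaces"]:
    print(repr(sample), space_count(sample), word_count(sample))
```

So a score of 0 in the table covers both empty bodies and one-word bodies, and repeated spaces inflate the score.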

greggles’s picture

Another query based on character count:

mysql> SELECT concat('http://drupal.org/node/', n.nid), LENGTH(body) FROM node_revisions nr inner join node n on n.nid = nr.nid AND n.vid = nr.vid where type = 'book' order by LENGTH(body) asc limit 50;
+------------------------------------------+--------------+
| concat('http://drupal.org/node/', n.nid) | LENGTH(body) |
+------------------------------------------+--------------+
| http://drupal.org/node/506068 | 0 |
| http://drupal.org/node/111022 | 0 |
| http://drupal.org/node/553824 | 0 |
| http://drupal.org/node/386314 | 0 |
| http://drupal.org/node/394118 | 0 |
| http://drupal.org/node/416698 | 0 |
| http://drupal.org/node/379716 | 0 |
| http://drupal.org/node/401780 | 0 |
| http://drupal.org/node/394126 | 0 |
| http://drupal.org/node/385144 | 0 |
| http://drupal.org/node/46635 | 0 |
| http://drupal.org/node/591710 | 0 |
| http://drupal.org/node/557370 | 0 |
| http://drupal.org/node/443536 | 0 |
| http://drupal.org/node/46641 | 0 |
| http://drupal.org/node/396570 | 0 |
| http://drupal.org/node/400818 | 0 |
| http://drupal.org/node/410500 | 0 |
| http://drupal.org/node/390568 | 0 |
| http://drupal.org/node/411734 | 0 |
| http://drupal.org/node/400856 | 0 |
| http://drupal.org/node/480892 | 0 |
| http://drupal.org/node/628292 | 0 |
| http://drupal.org/node/380194 | 0 |
| http://drupal.org/node/394450 | 0 |
| http://drupal.org/node/417188 | 0 |
| http://drupal.org/node/46651 | 0 |
| http://drupal.org/node/421616 | 0 |
| http://drupal.org/node/417694 | 0 |
| http://drupal.org/node/448456 | 1 |
| http://drupal.org/node/22288 | 1 |
| http://drupal.org/node/227210 | 1 |
| http://drupal.org/node/23192 | 1 |
| http://drupal.org/node/7176 | 3 |
| http://drupal.org/node/393506 | 3 |
| http://drupal.org/node/451238 | 4 |
| http://drupal.org/node/384880 | 5 |
| http://drupal.org/node/557322 | 6 |
| http://drupal.org/node/262 | 11 |
| http://drupal.org/node/630552 | 12 |
| http://drupal.org/node/415078 | 13 |
| http://drupal.org/node/69725 | 14 |
| http://drupal.org/node/639994 | 17 |
| http://drupal.org/node/573150 | 17 |
| http://drupal.org/node/575276 | 18 |
| http://drupal.org/node/489662 | 19 |
| http://drupal.org/node/299562 | 20 |
| http://drupal.org/node/299563 | 20 |
| http://drupal.org/node/299564 | 20 |
| http://drupal.org/node/299565 | 20 |
+------------------------------------------+--------------+

The reason I did word count first is that a minimum word count is a feature of Drupal core that we could turn on now, while a minimum character count would require altering drupalorg.module.
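One caveat on the character-count query: MySQL's LENGTH() returns the length in bytes, while CHAR_LENGTH() returns the length in characters. A rough Python sketch of the difference, assuming the body is stored as UTF-8 (an assumption, as are the function names):

```python
def byte_length(body: str, encoding: str = "utf-8") -> int:
    # Comparable to MySQL LENGTH(): size of the encoded string in bytes.
    return len(body.encode(encoding))


def char_length(body: str) -> int:
    # Comparable to MySQL CHAR_LENGTH(): number of characters.
    return len(body)


for sample in ["welkom", "h\u00e9llo"]:
    print(repr(sample), byte_length(sample), char_length(sample))
```

For plain ASCII bodies the two numbers agree, so the ranking above is only affected for pages with non-ASCII text.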

vm’s picture

My apologies, I actually meant word count and not character count as I knew there was a core feature that would cover this issue.

I suppose what has to be asked is: is a document that only has 20 to 30 words worth adding?
Old documentation wouldn't be affected by this change, only new documentation, correct?

As a measuring stick, take this comment, which has over 30 words: could useful documentation be created in under 30 words?

sepeck’s picture

Assigned: Unassigned » sepeck

Choices are 0, 1, 10, 25, 50, 75, 100, 125, 150, 175 and 200.

I set it to 10 for right now. That will prevent blank pages and 'This is a test' messages.
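A rough sketch of what a 10-word minimum accepts and rejects. This is not Drupal's own code; the function name is hypothetical, and counting whitespace-separated tokens is an assumption about how words are counted.

```python
def meets_minimum_words(body: str, minimum: int = 10) -> bool:
    # Assumption: words are whitespace-separated tokens.
    return len(body.split()) >= minimum


print(meets_minimum_words(""))                               # blank page
print(meets_minimum_words("This is a test"))                 # 4 words
print(meets_minimum_words("aa bb cc dd ee ff gg hh ii jj"))  # 10 filler words
```

The first two are rejected, but the third passes: a word minimum stops blank and very short pages, not filler text.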

sepeck’s picture

Status: Active » Needs review

Changing status. Let me know if it needs to be removed or go higher.

WorldFallz’s picture

Assigned: sepeck » Unassigned
Status: Needs review » Active

I clean these up a lot too. This is a great idea! The shortest legit pages I know of are the ones that merely link to a screencast somewhere (e.g. http://drupal.org/node/289310), and even those have more than 20 words. It's really hard to think of a valid page that would only have 20 words.

WorldFallz’s picture

Status: Active » Needs review

Sorry, looks like we crossposted.

avpaderno’s picture

Assigned: Unassigned » sepeck
Status: Needs review » Active

The problem is that now it is not possible to delete a book page if the body text doesn't contain at least 10 words. I was deleting all those test pages, but I had to enter 10 words of two characters (I entered random text, really).

Wouldn't it be better to change the minimum number of words after deleting the test pages?

vm’s picture

Assigned: sepeck » Unassigned
Status: Active » Needs review

I just took a stroll through greggles' list of 0-word-count nodes. Now that there is a word limit, one must enter the required number of words before a node can be deleted.

It may be worth running a query to delete them?

avpaderno’s picture

Assigned: Unassigned » sepeck

vm’s picture

Assigned: sepeck » Unassigned

heh, and now I crossposted over Kiam.

vm’s picture

Running a query may not be the best idea, as it looks like a few of the pages could be landing pages which only offer links to child pages.

Sepeck, if you can switch this back to 0, Kiam and I can clean up greggles' list, and then we can switch it back.

Kiam, you start at the top and I'll start at the bottom?

avpaderno’s picture

It seems that this report is the most active, at the moment. :-)

avpaderno’s picture

I have already started from the top; thank God you didn't propose the reverse. :-)

Ok, then. Let the dance start!

vm’s picture

I've just copied ten words from my comment and am pasting them into the body of each page so I can delete it.

Sepeck, never mind switching the limit.

vm’s picture

Kiam, I think we're overlapping, which must mean we got all of those which should have been deleted.

I did not remove pages that had no documentation but were blank parent pages linking to child pages.
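The cleanup rule used here (delete empty pages, but keep blank parents that exist only to hold child pages) can be sketched as a filter. This is an illustration, not a query that was run: the BookPage fields, the function name, and the node IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BookPage:
    nid: int          # hypothetical node ID
    body: str
    child_count: int  # number of child pages in the book outline


def deletion_candidates(pages, max_words=0):
    # Keep only pages at or under the word threshold that have no children;
    # blank landing pages with child pages are left alone.
    return [
        p for p in pages
        if len(p.body.split()) <= max_words and p.child_count == 0
    ]


pages = [
    BookPage(nid=1, body="", child_count=0),            # blank test page
    BookPage(nid=2, body="", child_count=4),            # blank landing page
    BookPage(nid=3, body="Real docs here", child_count=0),
]
print([p.nid for p in deletion_candidates(pages)])
```

Even with such a filter, a manual look at each candidate is safer than a bulk delete, which is what was done in this issue.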

avpaderno’s picture

We have gone through the list greggles gave. The remaining book pages are the ones with child pages.

vm’s picture

Status: Needs review » Fixed

^5 Kiam

Thanks greggles and sepeck. Let's see how the word limit works out over the next few days. Will reopen if needed.

webernet’s picture

greggles’s picture

Maybe the long term solution is better spam fighting tools.

avpaderno’s picture

Is there any reason why Mollom is used on g.d.o, and not on d.o?

greggles’s picture

re #22 - killes really dislikes black boxes.

To expand on my comment in #21, we could create a view or a page on d.o that shows book nodes in a table sortable by the number of characters and the date posted.

avpaderno’s picture

The Views solution is good. Rather than being limited to book pages, it could cover all content types and have an exposed filter that narrows the list by content type.

Actually, it could be good to have a more generic view that would make it easy to catch spam posts. I am not sure how this could be achieved, but we could start with a view whose filters have default values that catch some spam messages (e.g. the content is no longer than X characters, the post was submitted in the last 10 minutes, etc.).
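A minimal sketch of the filter logic such a view might apply, written in Python rather than as an actual Views configuration. The Post fields, the function name, and the default thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Post:
    nid: int            # hypothetical node ID
    node_type: str
    body: str
    created: datetime


def spam_candidates(posts, now, max_chars=100,
                    max_age=timedelta(minutes=10),
                    node_type: Optional[str] = None):
    # Short, recent posts, optionally limited to one content type
    # (the "exposed filter"), sorted by length and then by date.
    hits = [
        p for p in posts
        if len(p.body) <= max_chars
        and now - p.created <= max_age
        and (node_type is None or p.node_type == node_type)
    ]
    return sorted(hits, key=lambda p: (len(p.body), p.created))


now = datetime(2009, 11, 20, 12, 0)  # arbitrary example time
posts = [
    Post(1, "book", "test", now - timedelta(minutes=2)),
    Post(2, "forum", "", now - timedelta(minutes=5)),
    Post(3, "book", "test", now - timedelta(hours=3)),
]
print([p.nid for p in spam_candidates(posts, now)])
```

Sorting by length and then date matches the earlier suggestion of a table sortable by character count and date posted.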

vm’s picture

I'm not at all opposed to a view maintainers can pull up which may have a word count or something that would help keep up with the test pages being posted.

After reading the discussion in #361106: Reduce the minimum word limit?, I understand the arguments both Lee and Webchick make about feeling that they have to use more nodes than necessary, especially when, by design, they want a blank parent menu item with child menu items.

Finding a middle ground for all involved would certainly be the prudent thing to do.

WorldFallz’s picture

See #421676: Implement view for orphan book pages for a related effort aimed at orphan book pages.

vm’s picture

A view of orphaned pages helps too, but some of these 0-word test pages were added arbitrarily as child pages.

silverwing’s picture

Certainly not perfect :) Someone just posted a test page that read:

welkom op de site van stokcar nouws.... test test test

vm’s picture

Yeah, I guess test pages will be an issue regardless of the word limit set. Users can just paste Lorem Ipsum to get around any setting. Not sure how to handle this going forward, but at least, with the aid of greggles' list, we were able to get the pages that had been missed in the past.

sepeck’s picture

Immediately block that user. There is a warning message regarding test posts on book pages.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.