It would be very helpful if I could run read-only queries against prod. I am working on an automated spam solution, and being able to pipe in some real data would help train the filters. Read access to prod data would also help with issues like #1689784: Mass block profile spammers.
Comments
Comment #1
cweagans commented: Tagging and setting to NR (needs review).
Comment #2
killes@www.drop.org commented: How exactly would you want to use read-only access?
Comment #3
cweagans commented: My plan was to track down a bunch of spam posts in the full Drupal.org database and use them for some initial training of the Bayesian filter. I have a utility, written on my computer at home, that loops through every comment on a Drupal site and evaluates its spamminess (using the spam module). If the score is very high, it assumes spam; if it is medium, it prompts for a label (spam or ham); and if it is low, it moves on to the next comment. It keeps doing this until the filter reaches a high accuracy rate (above 92%), at which point it is trained well enough for an initial deployment.
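The triage loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual utility: `score`, `ask`, and `train` stand in for the spam module's scoring, the interactive spam/ham prompt, and filter training; scores are assumed to be normalized to [0, 1]; and the `high`, `low`, and `window` values are made up, with only the 92% target taken from the comment.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def triage(
    comments: Iterable[str],
    score: Callable[[str], float],       # hypothetical: spamminess normalized to [0, 1]
    ask: Callable[[str], bool],          # hypothetical: human answers True for spam
    train: Callable[[str, bool], None],  # hypothetical: feeds a label back into the filter
    high: float = 0.9,                   # assumed: at or above this score, assume spam
    low: float = 0.1,                    # assumed: at or below this score, skip
    target: float = 0.92,                # stop once recent accuracy exceeds this
    window: int = 100,                   # assumed: number of recent prompts to measure over
) -> Tuple[List[Tuple[str, bool]], float]:
    labeled: List[Tuple[str, bool]] = []
    recent: deque = deque(maxlen=window)  # 1 = filter agreed with the human, 0 = it did not
    accuracy = 0.0
    for text in comments:
        s = score(text)
        if s >= high:
            # Very spammy: label it spam without asking.
            labeled.append((text, True))
            train(text, True)
        elif s > low:
            # Uncertain: ask a human and record whether the filter's own guess was right.
            is_spam = ask(text)
            recent.append(int((s >= 0.5) == is_spam))
            accuracy = sum(recent) / len(recent)
            labeled.append((text, is_spam))
            train(text, is_spam)
            if len(recent) == window and accuracy > target:
                break  # accurate enough for an initial deployment
        # Otherwise the score is low: move on to the next comment.
    return labeled, accuracy
```

Measuring accuracy only on prompted comments, over a sliding window, is one possible design choice: it reflects how the filter currently does on the cases it is unsure about, instead of averaging in its early mistakes. Auto-labeling high scores as spam without review can reinforce false positives, so `high` would need to be conservative.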
I also have some sample data from a few friends: they purposely did not delete spam comments on their sites, so I can use them to train the Bayesian filter.
Step 1) Pipe in a batch of known spam messages (easily identified in the test data I've been given).
Step 2) Run as many Drupal.org comments as possible through my utility until the filter correctly identifies spam the majority of the time.
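The two training steps can be sketched with a minimal multinomial naive Bayes classifier, one common kind of Bayesian spam filter. This is not the spam module's implementation; the class name, tokenizer, and sample messages are hypothetical, and the sketch assumes a few known-good comments are available alongside the known spam, since the filter needs examples of both classes.

```python
import math
import re
from collections import Counter
from typing import List

TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase words and numbers only.
    return TOKEN.findall(text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter (illustrative sketch)."""

    def __init__(self) -> None:
        self.words = {True: Counter(), False: Counter()}  # token counts per class
        self.docs = {True: 0, False: 0}                   # messages seen per class

    def train(self, text: str, is_spam: bool) -> None:
        self.words[is_spam].update(tokenize(text))
        self.docs[is_spam] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.words[True]) | set(self.words[False])
        total_docs = self.docs[True] + self.docs[False]
        log_scores = {}
        for label in (True, False):
            # Laplace-smoothed log prior plus log likelihood of each token.
            log_p = math.log((self.docs[label] + 1) / (total_docs + 2))
            denom = sum(self.words[label].values()) + len(vocab) + 1
            for tok in tokenize(text):
                log_p += math.log((self.words[label][tok] + 1) / denom)
            log_scores[label] = log_p
        # P(spam | text) = 1 / (1 + exp(log_ham - log_spam)), computed without overflow.
        diff = log_scores[False] - log_scores[True]
        if diff > 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


# Step 1 (hypothetical sample data): seed with known spam plus a little known ham.
f = NaiveBayesFilter()
for msg in ["cheap pills buy now", "buy cheap watches now", "cheap viagra pills"]:
    f.train(msg, True)
for msg in ["patch fixes the views test", "needs review for the views module"]:
    f.train(msg, False)
```

Laplace smoothing keeps a token never seen in one class from driving that class's probability to zero, and working in log space avoids underflow on long comments. Step 2 would then keep calling `train` with labels gathered from real comments.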
As I mentioned in the issue body, I'd also like to be able to point to data on prod when trying to identify spammers for issues like this: https://drupal.org/node/1689784
Comment #4
killes@www.drop.org commented: Could this utility run on, e.g., util.drupal.org? We will not open outside access in any case.
Comment #5
adams.garfield commented: Yeah, I wanted to know the same. I guess it should work.
Comment #6
killes@www.drop.org commented: I think this is no longer needed.