Using session_api with flag for anonymous users on Pressflow, everything is working beautifully after #785292, except that all anonymous sessions are being cleared on cron. This makes sense given the cron task:

function session_api_cron() {
  // Fetch list of outdated sids.
  $result = db_query("SELECT sap.sid FROM {session_api} sap LEFT JOIN {sessions} s ON (sap.session_id = s.sid) WHERE s.sid IS NULL");
  $outdated_sids = array();
  while ($session = db_fetch_object($result)) {
    $outdated_sids[] = $session->sid;
  }
  if (!empty($outdated_sids)) {
    module_invoke_all('session_api_cleanup', $outdated_sids);
    db_query('DELETE FROM {session_api} WHERE sid IN (' . implode(',', $outdated_sids) . ')');
  }
}

Basically, since a "faux session" is started without an entry in the sessions table, all of these entries are cleared on cron. It seems like there could be some expiration logic in the session_api table to combat this clearing. I realize this is an edge case, but it's a pretty big issue for Pressflow users.

Comments

jide’s picture

Encountered this issue too with d7.
Moreover, unless I'm missing something, session_api and sessions tables are never left joined ? And the condition for the sid to be null is even more obscure to me. Something wrong here ?

jide’s picture

Title: Session API + Pressflow anonymous sessions cleared on cron » Use an expiration logic when clearing sessions on cron
Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Active » Needs review
Issue tags: +cron
StatusFileSize
new3.58 KB

Okay, here is my attempt at a patch for this. TBH, I'm not sure I understood what the original intent of the author was, so I may be wrong with this, but I can't see how this could be different.

jide’s picture

StatusFileSize
new3.56 KB

Still not used to git patches. Here we go.

jide’s picture

Ouch that

-
+$sid = NULL;

Shouldn't be here, must reroll the patch.

jide’s picture

StatusFileSize
new3.32 KB

Corrected the patch.

rbayliss’s picture

StatusFileSize
new2.61 KB

That's pretty much exactly what I ended up with for D6/Pressflow as well. Just to make it clear for those coming later,

This is the Drupal 6 Patch. The Drupal 7 patch is in #5.

jide’s picture

Nice, we should sync those two patches so that d6 and d7 versions are similar.
One thing I don't get though :

+      //Set a created date so we know when to clear the record.
+      //If a session exists for this user, use a creation time of 0 so 
+      //core session handling has control.
+      $rec->created = isset($_SESSION) ? 0 : time();
+  $result = db_query("SELECT sap.sid FROM {session_api} sap LEFT JOIN {sessions} s ON (sap.session_id = s.sid) WHERE s.sid IS NULL AND sap.created < %d", time() - variable_get('session_api_cookie_expire_time', 2592000));

How would core sessions handling have any impact on this ? Am I missing something ?

jide’s picture

BTW, any clue on this join ? :

LEFT JOIN {sessions} s ON (sap.session_id = s.sid)

And this condition :

WHERE s.sid IS NULL

Since session_id is never NULL, I can't see what this JOIN and this WHERE purpose is.

rbayliss’s picture

The issue occurs when anonymous users do something that requires Session API, but not a core session. Since both Pressflow D6 and D7 try to avoid starting a session for anonymous users, this happens quite a bit. In my case, it is anonymous flagging that causes a Session API cookie to be set and a Session API record to be written to the database, but no core session to be created (rightly so).

Before this patch, every cron run would clear these Session API sessions out of the database since the WHERE s.sid IS NULL part of the query was matched. So to answer your question, core has some control over Session API sessions in that as long as a sessions table entry exists, the matching Session API record will not be deleted. IMO, this is good and expected behavior to let core expire sessions (and with it Session API sessions) wherever possible, which is the reason I'm suggesting setting a 0 created date for any Session API records that match up to the sessions table:

//If isset($_SESSION) is true, we know we have or will have a session table entry.  
+      $rec->created = isset($_SESSION) ? 0 : time(); 

I admit this is semantically confusing to have a created column with a bunch of 0 values though, so maybe there's a better way to do this. Just to be clear, I'm suggesting we add expiration logic for the Session-API-only sessions, not every session API record.

rbayliss’s picture

Since session_id is never NULL, I can't see what this JOIN and this WHERE purpose is.

Right, session_api.session_id is never NULL, but sessions.sid will be NULL if no matching core session exists.

chaps2’s picture

Moreover, unless I'm missing something, session_api and sessions tables are never left joined ? And the condition for the sid to be null is even more obscure to me. Something wrong here ?

I don't get this either. When will the session_id in session_api ever match the sid in the session table? They are created independently and are not the same. As far as I can tell, the left join will never be satisfied so all rows in the session_api table will be deleted with every cron run.

Is the original code in the cron hook just plain wrong or am I missing something too?

rbayliss’s picture

StatusFileSize
new2.98 KB

Ah yeah, I see what you mean. Guess I missed that change. Seems like the intent of this module (prior to #785292: Incompatible with Pressflow) was to have the session_api.session_id column be the core session ID if one is set, which is why the left join is there to begin with. This patch just moves back to using the core session ID if one exists so that left join makes sense again. Otherwise, it should be the same as above.

jhedstrom’s picture

Priority: Normal » Critical
jhedstrom’s picture

I'm afraid the approach in 12 may revert the logic added for Pressflow, since the session_id() function is used to "get and/or set the current session id", but I haven't had time to test it.

The patch for 7.x is looking good, but I'd love to add a test here to make certain that sessions aren't cleared out on cron that shouldn't be.

jhedstrom’s picture

Status: Needs review » Needs work

changing status.

ju.ri’s picture

has anyone got this to work? I just tried the patch in #12 and got lot of duplicate entries errors :

User warning: Duplicate entry '2cb29cdbc3bafe0e...' for key 'session_id' query: INSERT INTO session_api (session_id, created)

jaydub’s picture

I've attempted to sync up the patches in #5 and #6. I avoided #12 since as noted in #14 that would seem to create issues with Pressflow which is the environment we are testing session_api under.

I added an index to the timestamp field since the core sessions table has one and in theory since the queries that operate on session_api table with these patches have where clauses on the timestamp field then it should be indexed. Feel free to strip that out.

jhedstrom’s picture

Status: Needs work » Needs review

Changing status--I'd love some feedback on how the patches in #17 are working for people before I commit them.

jaydub’s picture

//If isset($_SESSION) is true, we know we have or will have a session table entry.  
+      $rec->created = isset($_SESSION) ? 0 : time(); 

There's definitely an issue with a simple check of isset($_SESSION) as a proxy for a drupal {sessions} table entry.

In our case we have Pressflow in place which does not write a row to the sessions table unless a data is explicitly added to the $_SESSION array and is meant to persist beyond the page request. There are other parts of Drupal that can add to the $_SESSION array but that will not persist beyond the page request. An example is a drupal_set_message() message. The $_SESSION array will be populated with messages during the building of the page and then emptied out when the page is rendered.

So in our case we have a flag that uses a confirmation form upon flagging. So for an anonymous user the following happens:

* User clicks flag link
* User is taken to page with flag confirmation form
* Upon confirmation, the flag is set and a session is created in session_api
** In the session_api.module code in session_api_get_sid(), this results in a row in session_api and since $_SESSION is not set at this point, the timestamp used for the session_api record is time().
** A message is set using drupal_set_message() to tell the user that the content was flagged and so $_SESSION is now populated (but after the record to session_api is added
* The form redirects back to the page that the user first flagged from.
** In the session_api.module code the session_api_get_sid() call now is an update. The same test for isset($_SESSION) is now true because the drupal_set_message() string is in the $_SESSION array but there still is not a row in the sessions table (Pressflow here).
** This results in a $row->timestamp value of 0 which is then used in the update to the row in session_api.

Now when the session_api_cron() runs, since this row has a timestamp value of 0 and there still is no row in the sessions table, this session data will be deleted from session_api instead of persisting until (timestamp + session_api_cookie_expire_time).

I'm looking for a reason why $row->timestamp shouldn't just always be set to time() in session_api_get_sid() so please correct me if I'm wrong here...

jerry’s picture

Subscribing.

jerry’s picture

Changing status--I'd love some feedback on how the patches in #17 are working for people before I commit them.

FWIW, the 6.x patch didn't help in my non-Pressflow installation. The symptom was that anonymous node flags were no longer visible to Views after a cron run, though the node itself still indicated that it was flagged. The behavior was unchanged post-patch.

I'm looking for a reason why $row->timestamp shouldn't just always be set to time() in session_api_get_sid() so please correct me if I'm wrong here...

I noted in the sessions table that the timestamp for my anonymous flag sessions was consistently being set to 0. Changing the timestamp to unconditionally use time() as above appeared to solve the Views problem described. I haven't had time to look at this closely enough to be confident that this is the right approach.

jessehs’s picture

StatusFileSize
new3.02 KB

I re-rolled the D6 patch in #17, except I removed the logic that would save the timestamp as 0 when the $_SESSION variable is not set, following jerry's comment in #22 above.

I think the problem occurred (at least for me) when a session_api row was updated while the $_SESSION variable was set, as this set the timestamp to 0.

Rolled against 6.x-1.x head. Also note that this adds the $outdated_sids array back to the query in the hook_cron function, which apparently got removed in the 6.x branch.

mattcasey’s picture

The patch in #22 worked for me on my site -- running Cron did not clear my flags this time. Using Sessions with Flag 6.x-2.0-beta6

trrroy’s picture

The D7 patch in #17 is working for me.

ashokpola’s picture

The D7 patch #17 is working fine for me. But still getting Duplicate entry errors in log. Is there any way of get rid off them?

ashokpola’s picture

Version: 7.x-1.x-dev » 7.x-1.0-beta2

I posted a patch to fix duplicate errors, please review it & let me know if any issues are found.
http://drupal.org/node/483176#comment-5699138

jhedstrom’s picture

Version: 7.x-1.0-beta2 » 7.x-1.x-dev
Status: Needs review » Patch (to be ported)

I committed a slight modification of the patch in #17 to the 7.x branch (the db_add_table and db_add_index calls in 7.x no longer have the $ret parameter). Thanks all!

jhedstrom’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
jaydub’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Needs review
StatusFileSize
new622 bytes

The patch first submitted in #5 and carried through in subsequent patches (including my own) for d7 added a JOIN on the wrong column in moving from the d6 SQL to the d7 DBTNG SQL:

-  $result = db_query("SELECT sap.sid FROM {session_api} sap LEFT JOIN {sessions} s ON (sap.session_id = s.sid) WHERE s.sid IS NULL");
-  $outdated_sids = array();
-  foreach ($result as $session) {
-    $outdated_sids[] = $session->sid;
-  }
+  $query = db_select('session_api', 'sap');
+  $query->leftJoin('sessions', 's', 'sap.sid = s.sid');
+  $query->fields('sap', array('sid'));
+  $query->condition('sap.timestamp', REQUEST_TIME - variable_get('session_api_cookie_expire_time', 2592000), '<');
+  $outdated_sids = $query->execute()->fetchCol();

The join should be on sap.session_id = s.sid on: + $query->leftJoin('sessions', 's', 'sap.sid = s.sid');

I think this is responsible for the issues #1401000: session_api_cron() not valid for PostgreSQL and #1476828: Postgresql and possibly other d7 related issues.

Patch is attached.

olarin’s picture

Status: Needs review » Reviewed & tested by the community

seems simple enough - caused no problems for me with latest dev version (although i'm not using pressflow nor postgresql anyway). any reason not to commit?

jm.federico’s picture

Status: Reviewed & tested by the community » Needs work

Hi Guys

Sorry to move this needs work
But i'm still unsure on why we are doing the Join.

As I can see we are only fetching one field

<?php
  $query->fields('sap', array('sid'));

If so, the LEFT JOIN is useless as it will always return the column in sap (session_api) regardless of what we have in session.

And a second problem is that we assume here that we have sessions in the SQL database, but what if we are using say mongodb for session storage?

I don't think we should use the join, I see no point in it.

If I missed something and the join ias needed, please let me know.

Cheers

jibus’s picture

#29 resolve my problem on PostgreSQL

tic2000’s picture

I have to say the the Pressflow patch was not very well thought through.

Before that patch there was a "link" between session_api and session tables. Now that link is totally gone. If that broken link is a good or a bad thing is not for me to decide.

Given that the left join is not wrong because it's on the wrong columns, is wrong because it was left there. sap_session_id and s.sid will NEVER be the same. The first is md5(ip_address() . time() . drupal_get_private_key()) the second is session_id().

That left join has to go because it does absolutely nothing.

Someone was concerned that session_id() not only gets, but also sets a session id and that would brake the pressflow patch. session_id() sets the session id only if an argument is passed to it AND if that is done before session_start(). Without an argument the function is just a getter and if no session id exists it returns "".

ju.ri’s picture

Hi, I have tested the code in #22 for Drupal 6 and it seems to work well.
Will this be committed, aka good to use on a live site? There's a database change, that's why I'm asking.
The unresolved issues seem to be in D7?

gold’s picture

As it stands in #29 this fixes my issue with posgres. Reading further up though, that doesn't appear to be the original problem. I arrived here from #1401000. I'm going to steal the patch in #29 and apply it there. The patch at #29 does address the issue at #1401000.

a.ross’s picture

StatusFileSize
new937 bytes

#31 and #33 are correct and I came to that same conclusion. Here's a patch to remove the join, I also took the opportunity to use db_delete instead of db_query.

a.ross’s picture

Status: Needs work » Needs review
a.ross’s picture

StatusFileSize
new937 bytes
a.ross’s picture

StatusFileSize
new937 bytes

Re-uploading for testbot

a.ross’s picture

Issue tags: -cron +Needs followup

I guess I'll commit this as it's labeled critical. I'd like some test coverage for the cron run though.

a.ross’s picture

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)
Issue tags: -Needs followup

Automatically closed -- issue fixed for 2 weeks with no activity.