Python and recovering Firefox sessions

First let me confess, I am a tab-aholic…

I tend to have several Firefox windows open at once, with several tabs in each. And I leave them open for… well… weeks. And usually that’s fine – it restores everything if I have to reboot or restart Firefox, and all is well. Except when it doesn’t.

Sometimes, for whatever reason, the sessions aren’t restored, and all of those tabs, as in “hey, I wasn’t quite done with that!”, are lost.

When this happened to me last night, I took a look in my Firefox profile directory and discovered something about the way Firefox stores its sessions. I’m sure it’s well-known to many, but it was news to me.

When I looked at sessionstore.bak (the automatically created backup of sessionstore.js) I saw something like:

({"windows":[{"tabs":[{"entries":[{"url":"http://stoa.canterburyschool.org:9080
/stoa/courses/Faculty/TechnologyGuides/VoIPandSkypeinForeignLanguage
/document_edit_form","title":": VoIP and Skype in Foreign Language","ID":2528274837}...

Hmmmm… I know that’s JavaScript, but it also looks a lot like a Python dictionary. Why not see if Python could eval() it to a dictionary? My first try failed on the unknown identifiers “true” and “false”, but after defining true and false as equal to Python’s True and False, it worked. Now I had a nested set of dictionaries containing all of my previous session’s info. Not bad.

[Edit: Not being a regular JSON user, and since the file extension was .js, not .json, I never considered that this might be JSON. It turns out that it's not QUITE JSON and the Python JSON library won't decode it (without at least stripping off the leading and trailing parentheses). Check out this discussion for more info.]
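
Here is a minimal sketch of that parentheses workaround. It assumes the only non-JSON part of the file is the single pair of wrapping parentheses, which may not hold for every Firefox version. The function name parse_session_text and the sample string are made up for illustration.

```python
import json


def parse_session_text(text):
    """Parse sessionstore-style text: JSON wrapped in one pair of parentheses."""
    text = text.strip()
    # Remove only the outermost wrapper; parentheses inside titles stay intact.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return json.loads(text)


# Hypothetical sample in the same shape as the snippet shown above.
sample = ('({"windows":[{"tabs":[{"entries":[{"url":"http://example.com/",'
          '"title":"Example (test)"}]}]}]})')
session = parse_session_text(sample)
print(session["windows"][0]["tabs"][0]["entries"][0]["title"])
```

If json.loads() still raises an error on a real file, the file contains something else that is not valid JSON, and this sketch will not help with that.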

From there it was easy to create some code that would grab the URLs from an unrecoverable session and put them into an HTML page for easy access later.

Warning: this code uses eval() on arbitrary strings loaded from an arbitrary file, so it is not secure; use with great caution. If it wipes your hard drive, sells your house and absconds with the money, and kicks your dog on the way out, you have been warned.

In any case, below is my session recovery code, which I’ve already used a couple of times.


""" this code is released to the public domain and meant only for illustration. It is not
    warranted as being safe or suitable for any particular purpose at all.

    in particular, this program evaluates (and potentially executes) arbitrary code,
    so use with caution.
"""
import sys


def get_session(infilename, outfilename):
    # the session file uses JavaScript's true/false, so give eval() matching names
    true = True
    false = False
    with open(infilename) as infile:
        sessionstring = infile.read()
    # WARNING: eval() will execute whatever code the file contains
    session = eval(sessionstring)

    with open(outfilename, "w") as outfile:
        for window in session['windows']:
            outfile.write("---------------Window-----------------<p>\n")
            outfile.write("<ul>\n")
            for tab in window['tabs']:
                # only the first history entry of each tab is written out
                entry = tab['entries'][0]
                outfile.write('<li><a href="%s">%s</a> - %s</li> \n' %
                              (entry['url'], entry['url'], entry['title']))
            outfile.write("</ul>\n")


if __name__ == '__main__':
    # usage: script.py <session file> [output html file]
    infile = sys.argv[1]
    if len(sys.argv) == 3:
        outfile = sys.argv[2]
    else:
        outfile = "sessionsave.html"

    get_session(infile, outfile)
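
One thing the code above does not do is escape the URLs and titles before writing them into HTML, so a title containing < or & could garble the output page. The following is a rough sketch of one way to handle that, assuming Python 3 (for the html module) and a session that has already been parsed into nested dictionaries. The name render_session_html and the demo data are invented for illustration, and this is not the code I actually ran.

```python
from html import escape


def render_session_html(session):
    """Return an HTML fragment listing the first history entry of every tab."""
    lines = []
    for window in session.get("windows", []):
        lines.append("<p>---------------Window-----------------</p>")
        lines.append("<ul>")
        for tab in window.get("tabs", []):
            entries = tab.get("entries") or []
            if not entries:
                continue  # skip tabs with no recorded entries
            url = entries[0].get("url", "")
            title = entries[0].get("title", "")
            # escape() turns &, <, > and quotes into HTML entities
            lines.append('<li><a href="%s">%s</a> - %s</li>'
                         % (escape(url), escape(url), escape(title)))
        lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical data: one tab with awkward characters, one tab with no entries.
demo = {"windows": [{"tabs": [
    {"entries": [{"url": "http://example.com/?a=1&b=2", "title": "A <b> title"}]},
    {"entries": []},
]}]}
html_text = render_session_html(demo)
print(html_text)
```

Using .get() with defaults also means a tab without a title or without any entries is skipped or left blank instead of stopping the whole run with a KeyError or IndexError.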

18 Responses to Python and recovering Firefox sessions

  1. Lucas Wiman says:

    I haven’t tried this on that file specifically, but that looks like valid JSON. If you’re using Python 2.5, you should use the simplejson package, or in 2.6 and later, use the json package that’s in the standard library. In either case, the loads function will convert the JSON string into a Python object. Additionally, if you’re using 2.6 you shouldn’t use eval unless you absolutely need it. For most applications where you’re dealing with Python literals, use ast.literal_eval, which is secure (and only evaluates Python literals like dicts, tuples, strings etc).

  2. Dirkjan Ochtman says:

    It’s called JSON, it’s great. You should really use the json module from the stdlib (2.6+) or simplejson (the external version of the stdlib’s json).

  3. Thomas Herve says:

    Nice trick! Simply using simplejson.loads instead of eval will remove the security problem (and the need of true/false).

  4. voyager says:

    Wouldn’t using the JSON libraries have made more sense to avoid the eval? (Just a little nitpick.)

    http://docs.python.org/library/json.html

  5. philikon says:

    How to make it secure :)

    1. easy_install simplejson
    2. import simplejson
    3. s/eval/simplejson.loads

  6. You might find that you could import the data structure more safely using a JSON library, since it looks like Firefox is only using simple JavaScript constants of the sort that parse cleanly as JSON.

  7. Vern Ceder says:

    Thanks for all the suggestions. I had not really thought of it being JSON, since it doesn’t use the .json extension (which the Firefox bookmark backups do). Also, I have to admit that I’m not normally a JSON user, so I didn’t recognize the similarity. But…

    In fact, sessionstore.js (and sessionstore.bak) apparently is NOT strictly JSON and the solutions suggested all fail on my files. I did find a discussion of the problem, with the same results as mine, here: http://mail.python.org/pipermail/python-list/2009-April/1202439.html

    So I’m afraid the insecure way is still the easiest way.

  8. ulrik says:

    My sessionstore parses perfectly as JSON if I only remove the enclosing parentheses

    import json
    sess = "/path/to/sessionstore.js"
    dataobj = json.loads(open(sess).read().strip("()"))

    • Vern Ceder says:

      Yeah, my understanding is that the parentheses are the main problem… of course, that means that other parens in the titles, etc will be stripped out. That’s probably not the end of the world though.

      • Jason says:

        strip(“()”) only removes ‘(‘ and ‘)’ from the beginning and end of the string, so any other parentheses in the JSON should be fine (since it’ll stop at the braces).

      • Vern Ceder says:

        Doh! You’re right, of course…

  9. A kludge of a script we wrote at work for end users that both clears lockfiles from firefox & thunderbird as well as fixes firefox profiles (two issues we’ve run into when a user’s NFS mounted home directory is disconnected while firefox/thunderbird is open) parses firefox’s profiles.ini with ConfigParser, copies user data to a temporary directory, blows away the old profile, recreates it via ‘firefox -CreateProfile,’ and copies the user data back.

    It works like a charm, unless a stale nfs file handle breaks shutil.rmtree. My name should link to it: http://math.umn.edu/~ben/lockhunter.py

  10. Taras Di says:

    Nice, worked perfectly!

    Did you figure out what the structure of the JSON dictionaries was (in particular the children entries)? That was throwing me off when I was trying to do it myself. Maybe it’s storing tab history?

    • Vern Ceder says:

      Glad it helped!

      Actually, I didn’t bother to go into the children entries, but they seem not to be the browser history, but maybe elements embedded or linked in the pages… At least that’s what it seems like on a guess. In mine I see a fair number of ads and counters that I know I didn’t load directly.

