Sunday, May 07, 2006

Getting off my Lazy Arse

Dear Lazyweb:

Previously I have blogged about the woeful state of many GNOME applications with respect to UIs that block on I/O. It seems that no-one (who matters) really gives a shit about the agony of those of us with 57k modem connections, so I have decided to try to attack this problem myself.

But I don't know where to start! I have never written a GNOME application, or a single line of Python code. (It seems that many of the apps that have this problem are written in Python. Coincidence?) So if I want to fix yumex, pup, pirut, gaim etc. so that they don't become unresponsive (and stop repainting their windows) while waiting for the network, do I look at/learn about:


  1. Python
  2. PyGTK
  3. GMainLoop
  4. Something else
  5. All of the above


???

I am not a student, and am not eligible for Summer of Code funding (plus there is a reasonable probability that I will fail in my mission ...), but any mentoring would be greatly appreciated.

Luv ya,

John

7 comments:

Stephen Thorne said...

Non-blocking asynchronous event-driven code is reasonably hard to write, especially when integrating both GUI events and networking I/O. This seems to be the root of the problem for GUIs that block unnecessarily.

Twisted is a Python non-blocking async event-based framework that supports both GTK GUIs and networking, at the same time, in the same main loop.

Jump on #twisted on irc.freenode.net if you want to pursue some advice, or email me privately.

James "Doc" Livingston said...

I was actually writing up some stuff about this earlier today, so people writing Rhythmbox plugins don't block the UI. Here's a draft of what I had:


The first step is finding where an application is blocking. There is a simple method that usually works:

1) run the application under gdb
2) when it is blocking the UI, switch to the terminal with gdb and type Control-C to interrupt the app
3) enter "thread apply all bt"
4) find the thread that has the mainloop, and copy its stack trace somewhere
5) enter "cont" to continue the application
6) type Control-C a short time later while it's still blocking and go to (3) to get a few more stack traces


Next you need to look at the traces, and figure out where it's blocking, and what it's doing. How to fix it depends on what it's doing.


If it's blocking doing IO:
Make it use asynchronous IO, so you get a callback once it's finished instead of blocking. e.g. using gnome_vfs_async_*


If it's doing a computation involving a loop:
Change it to do small amounts of work in an idle callback, instead of blocking.

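    while (running)
        do_stuff (data);

should be replaced with:

    g_idle_add (idle_do_stuff, data);

    /* Called by the main loop whenever it is idle; does one unit of work. */
    static gboolean idle_do_stuff (gpointer data) {
        do_stuff (data);
        return running;  /* TRUE: call again later; FALSE: remove the idle source */
    }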


It's harder when the function calling the blocking operation can't return until it's done, or you are using a library which only has blocking operations. For those, you basically have to use a thread. This means the blocking operation must be thread-safe; the code after it doesn't need to be, as it can still be run in the main thread.


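If you're calling a library which blocks, with code like:

    result = blocking_operation (data);
    process_result (result);

replace it with:

    g_thread_create (operation_thread, data, FALSE, NULL);

    /* Runs in the new thread; only the blocking call happens here. */
    gpointer operation_thread (gpointer data)
    {
        gpointer result = blocking_operation (data);
        /* process_result runs later in the main thread; as an idle
           callback it must return FALSE so that it only runs once. */
        g_idle_add (process_result, result);
        return NULL;
    }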
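
Since the apps in the post are Python, here is a rough sketch of the same hand-off using only the standard library. It is not PyGTK code: blocking_operation is a hypothetical slow call, and run_main_loop is a made-up stand-in for the GUI main loop, with a queue playing the role of g_idle_add for getting the result back to the main thread.

```python
import queue
import threading
import time


def blocking_operation(data):
    """Hypothetical slow call, standing in for a blocking library function."""
    time.sleep(0.05)
    return data.upper()


def operation_thread(data, done):
    # Runs in the worker thread: only the blocking call happens here.
    done.put(blocking_operation(data))


def run_main_loop(done):
    """Stand-in for the GUI main loop: keeps iterating while the worker runs."""
    iterations = 0
    while True:
        iterations += 1  # a real loop would repaint and handle input here
        try:
            # The result is picked up (and would be processed) in this thread.
            return done.get(timeout=0.01), iterations
        except queue.Empty:
            continue


done = queue.Queue()
threading.Thread(target=operation_thread, args=("hello", done), daemon=True).start()
result, iterations = run_main_loop(done)
```

The point is the same as in the C version: the thread never touches anything but the blocking call, so only that call has to be thread-safe.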


And if the function can't return until the operation has been done, use:

    g_thread_create (operation_thread, data, FALSE, NULL);
    gtk_main ();  /* nested main loop: keeps the UI alive until process_idle quits it */

    gpointer operation_thread (gpointer data) {
        gpointer result = blocking_operation (data);
        g_idle_add (process_idle, result);
        return NULL;
    }

    gboolean process_idle (gpointer result) {
        process_result (result);
        gtk_main_quit ();
        return FALSE;
    }

Unknown said...

Problem is many developers don't abuse the GNOME-VFS asynchronous functions and the GObject timeout functions enough. :-) Anyway, I'll have a look at Twisted again. I thought it was for network stuff alone.

Anonymous said...

The deal with the Python applications you listed (yumex, pup, pirut, and in fact all of the Fedora configuration tools) is:

a) they are generally not developed by desktop developers, so certain usability issues are secondary to "does it work"

b) most of the work is done by librpm or other wrapped libraries which do not integrate into GNOME's main loop (which is the prereq for getting single-threaded apps to not block their UIs)

Solutions:

Thread the app - may not be an option if the libraries are not thread safe

Fix the library - not too hard but may take a significant amount of time to code.


In response to Stephen's "Non-blocking asynchronous event-driven code is reasonably hard to write, especially when integrating both GUI events and networking I/O." comment:

It isn't actually that hard. I just did it for the CUPS code which is in GTK+ right now. It just takes a bit of time breaking up the blocking code into smaller units of work and creating a state machine to manage when each gets called. In fact most of it was just copy and paste from the CUPS library.

Anonymous said...

We've run into these issues a lot in Quod Libet. A lot of times it's external modules being a pain, as J5 said.

For Python in particular: Python's "default" network handlers, urllib and urllib2, are old, crufty, and hard to integrate properly with PyGTK. Python tries pretty hard to hide the real socket from you. Which is good when you're writing Python, but bad when you'd like to hand it off to a GIOChannel handler for asynchronous I/O. Threading is also a bit pickier than in just C since you have competing thread implementations.
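
To make the "hand the socket to a watcher" idea concrete, here is a minimal sketch using Python's standard selectors module in place of a GIOChannel watch. That substitution is mine, not what PyGTK does: on_readable is a hypothetical callback, and socketpair() stands in for a real network connection. The callback only fires once the socket has data, so recv() never stalls the loop.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []


def on_readable(sock):
    """Called by the loop only when sock is readable, so recv() will not block."""
    chunk = sock.recv(4096)
    if chunk:
        received.append(chunk)
    else:
        # An empty read means the peer closed the connection.
        sel.unregister(sock)
        sock.close()


# socketpair() stands in for a real network connection.
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, on_readable)

writer.sendall(b"hello ")
writer.sendall(b"world")
writer.close()

# Minimal event loop: wait for readiness, then dispatch the stored callback.
while sel.get_map():
    for key, _events in sel.select(timeout=1.0):
        key.data(key.fileobj)

sel.close()
message = b"".join(received)
```

This only works if you can get at the real socket, which is exactly what urllib and urllib2 make awkward.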

And of course, there are just a lot of programs that don't bother to use timeout_add, idle_add, or io_add_watch at all. With Python generators it's fairly easy to make a state machine for idle_add or timeout_add, which works great when the issue is long computation.
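
A rough sketch of the generator trick, runnable without PyGTK. MiniLoop is a hypothetical stand-in for the real main loop that copies the idle_add contract (a callback is called again for as long as it returns True); checksum_in_chunks and step are made-up names for illustration.

```python
from collections import deque


class MiniLoop:
    """Hypothetical stand-in for the GLib main loop (not a PyGTK API)."""

    def __init__(self):
        self._idle = deque()

    def idle_add(self, callback, *args):
        # Same contract as an idle source: the callback is called again
        # for as long as it returns True.
        self._idle.append((callback, args))

    def run(self):
        # A real main loop would also dispatch repaints and input events
        # between idle callbacks; this one only runs the idle queue.
        while self._idle:
            callback, args = self._idle.popleft()
            if callback(*args):
                self._idle.append((callback, args))


def checksum_in_chunks(data, chunk_size, results):
    """Long computation written as a generator: one chunk per step."""
    total = 0
    for start in range(0, len(data), chunk_size):
        total += sum(data[start:start + chunk_size])
        yield  # hand control back to the main loop
    results.append(total)


def step(gen):
    """Idle callback: advance the generator once, report whether more remains."""
    try:
        next(gen)
        return True
    except StopIteration:
        return False


loop = MiniLoop()
results = []
ticks = []
loop.idle_add(step, checksum_in_chunks(list(range(1000)), 100, results))
# Second idle source standing in for other UI work; it runs five times.
loop.idle_add(lambda: ticks.append(1) or len(ticks) < 5)
loop.run()
```

The two idle sources take turns, which is the whole point: the long computation never holds the loop for more than one chunk. In a PyGTK program the same step callback would presumably be handed to the real idle_add instead of MiniLoop.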

Jon Dowland said...

For C and similar programs, this technique might be useful. It avoids introducing threads, which is the approach most people think of at first. See also another code example.

Anonymous said...

Take a look at:
http://www.async.com.br/faq/pygtk/index.py?req=show&file=faq20.006.htp