Does JayDeBeApi work with multithreading?

Asked by Luke Miner

I'm trying to run a query via JayDeBeApi using a ThreadPoolExecutor from futures. Even when I only use one thread, I run into a segmentation fault error. Here's what gets dumped; I can get you a more complete dump if you'd like.

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe425b83bfe, pid=9440, tid=140617424148224
#
# JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 1.7.0_45-b18)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [_jpype.so+0x37bfe] JPJavaEnv::NewString(unsigned short const*, int)+0x3e

Question information

Language: English
Status: Solved
For: JayDeBeApi
Solved by: Luke Miner
Bastian (baztian) said :
#1

How are you using JayDeBeApi exactly? Can you please post some code that shows how you instantiate JayDeBeApi and how you trigger your threads?

Luke Miner (lminer) said :
#2

Sure, thanks a lot! Here's a snippet:

import csv
from concurrent import futures

import jaydebeapi

def foo(connection, sql, filename):
    cursor = None
    try:
        with open(filename, 'a') as f:
            writer = csv.writer(f, delimiter='\t')
            cursor = connection.cursor()
            cursor.execute(sql)

            # Loop through table at 5000 row increments and write to file
            result = cursor.fetchmany(size=5000)
            while result:
                writer.writerows(result)
                result = cursor.fetchmany(size=5000)
    except Exception as e:
        print("error: %s" % e)
    finally:
        # cursor is still None if open() or connection.cursor() failed
        if cursor is not None:
            cursor.close()

con = jaydebeapi.connect('driver', [url, 'user', 'password'], 'driver_path')

service = futures.ThreadPoolExecutor(max_workers=1)

future = service.submit(foo, con, sql) # crashes here

future.result()

Luke Miner (lminer) said :
#3

whoops, second to last line should read:

future = service.submit(foo, con, sql, filename) # crashes here

Bastian (baztian) said :
#4

Thank you for your code. I wanted to check whether you were instantiating a new connection in each thread, which you aren't.

Actually, I can't really tell whether JayDeBeApi is thread safe. I haven't tested it much in that regard. But there shouldn't be any code in JayDeBeApi that would prevent the use of concurrency. I don't know how well JPype (the underlying framework I use to connect to Java classes) works in concurrency scenarios.

Another important aspect is whether your JDBC driver is designed with concurrency in mind. Please check the database's JDBC driver documentation for that. Maybe you're better off creating a connection for each thread, with or without a connection pool. But be careful with that: you might have to initialize JPype yourself before invoking JayDeBeApi, and then you shouldn't supply the driver_path as you do now. Feel free to ask again if you need help with that.

Maybe I should think about enhancing JayDeBeApi to provide some kind of automatic connection pooling or specific thread support. But I don't really know what is needed, so please report back on your success or failure.
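
For illustration, the one-connection-per-thread idea could be sketched roughly like this. It is not part of JayDeBeApi and has not been run against a real driver: PerThreadConnections and connect_fn are hypothetical names, and connect_fn stands for whatever zero-argument callable opens a connection (for example, a function wrapping jaydebeapi.connect(...)).

```python
import threading


class PerThreadConnections(object):
    """Hand out one connection per thread (hypothetical helper, not a JayDeBeApi API)."""

    def __init__(self, connect_fn):
        # connect_fn: zero-argument callable that opens a new connection,
        # e.g. a function wrapping jaydebeapi.connect(...) (assumption).
        self._connect_fn = connect_fn
        self._local = threading.local()

    def get(self):
        # Attributes on a threading.local object are separate for each thread,
        # so a connection is created on a thread's first call and reused afterwards.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect_fn()
            self._local.conn = conn
        return conn
```

Each worker would then call get() instead of receiving a shared connection. Whether this is needed at all depends on what the JDBC driver documents about sharing a connection between threads.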

Luke Miner (lminer) said :
#5

Thanks a ton for the help on this, I'll be sure to report back!

I've used JayDeBeApi with sqlalchemy's connection pool and that seems to work fine. The driver definitely supports concurrency; this project is just a port of a multithreaded Java program.

Looking through the JPype documents, I saw the following. Not sure if this might explain things:

For the most part, python threads based on OS level threads (i.e posix threads), will work without problem. The only thing to remember is to call jpype.attachThreadToJVM() in the thread body to make the JVM usable from that thread. For threads that you do not start yourself, you can call isThreadAttachedToJVM() to check.

http://jpype.sourceforge.net/doc/user-guide/userguide.html#python_threads

How would I go about initializing JPype and invoking JayDeBeApi? Do I need to make this call to jpype.attachThreadToJVM() ?

Bastian (baztian) said :
#6

Thanks for the pointer to the JPype threading information. With this, the solution should be very simple:
import jpype
def foo(connection, sql, filename):
    try:
        if not jpype.isThreadAttachedToJVM():
            jpype.attachThreadToJVM()
        with open(filename, 'a') as f:
            ...

The rest of the code can stay unchanged. I haven't tested this.

I don't want to require JayDeBeApi users to know JPype. So if this works, I would favor a solution inside JayDeBeApi: either an explicit method like
connection.attach_to_thread() # calls isThreadAttachedToJVM() and attachThreadToJVM()
or implicit calls in connect(...) and/or connection.cursor(), or even in every method of JayDeBeApi. I don't know whether there is a performance impact.
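
As a rough sketch of what an implicit attach might look like, here is the same check written as a decorator. It is untested against a real JVM and is not an existing JayDeBeApi feature: attaches_thread is a hypothetical name, and jvm is assumed to be the jpype module, passed in as an argument so the sketch does not import it.

```python
import functools


def attaches_thread(jvm):
    """Return a decorator that attaches the calling thread to the JVM first.

    jvm is assumed to be the jpype module, or any object exposing
    isThreadAttachedToJVM() and attachThreadToJVM().
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Same check as in the snippet above, run before every call.
            if not jvm.isThreadAttachedToJVM():
                jvm.attachThreadToJVM()
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage (assumes jpype is installed):
#
#     import jpype
#
#     @attaches_thread(jpype)
#     def foo(connection, sql, filename):
#         ...
```

The check runs on every call, so any performance impact would come from isThreadAttachedToJVM() itself; that cost has not been measured here.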

Bastian (baztian) said :
#7

I'm curious how you are using sqlalchemy connection pool together with JayDeBeApi. Could you post some more details about that? And just for statistical information: What Database are you using?

Luke Miner (lminer) said :
#8

Your code worked perfectly at least on an individual thread! Haven't had a chance to check on multiple threads yet. Will report back.

The database that I'm using is proprietary, created and employed solely by the company I'm working for so not sure how useful it is for stats. In terms of the sqlalchemy code, I just followed the directions on their website: http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html

Here's an example:

import sqlalchemy.pool as pool
import jaydebeapi

def getconn():
    c = jaydebeapi.connect('com.location.of.driver', ['127.0.0.1', username, password], '/path/to/driver/')
    return c

mypool = pool.QueuePool(getconn, max_overflow=10, pool_size=5)

# get a connection
conn = mypool.connect()

# use it
cursor = conn.cursor()
cursor.execute("select foo")
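
To use a pool like this from worker threads, something along the following lines might work. This is only a sketch under a few assumptions: fetch_all and run_queries are hypothetical helpers, pool is anything with a connect() method (such as the QueuePool above), and jvm is the jpype module, passed in so each worker thread can run the attach check from reply #6 before touching the connection.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(pool, sql, jvm=None):
    """Check a connection out of the pool, run one query, and give it back."""
    # Attach this worker thread to the JVM first (see reply #6).
    if jvm is not None and not jvm.isThreadAttachedToJVM():
        jvm.attachThreadToJVM()
    conn = pool.connect()
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        # Closing a connection checked out of a QueuePool returns it to the pool.
        conn.close()


def run_queries(pool, queries, jvm=None, max_workers=4):
    """Run each query on a worker thread; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda sql: fetch_all(pool, sql, jvm), queries))
```

With the pool above, a call might look like run_queries(mypool, ["select foo"], jvm=jpype).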

Luke Miner (lminer) said :
#9

As an update: I managed to test this with multiple threads and it went off without a hitch.

Bastian (baztian) said :
#10

Awesome, thank you!

Mart Sillence (martin-luminoussheep) said :
#11

Hi,

I'm hitting the same problem with woosh; it might be how we are connecting. I added the lines:
        if not jpype.isThreadAttachedToJVM():
            jpype.attachThreadToJVM()

to where we get the connection, and all is good now. Not sure if you want to consider putting this in for all calls to the API?