Skip to content Skip to sidebar Skip to footer

How To Scale Psycopg2 Insert And Select With Single Process In Python?

It takes average of about 0.300095081329 for my insert to go through to finish commit to postgres. Here is my table pattern id_table latest_update_id (primary index) produc

Solution 1:

Do a single insert query in instead of 3. Notice the triple quotes and dictionary parameter passing:

insert_query = """
    with i as (
        insert into id_table (product_id, publish_date) 
        values (%(product_id)s, %(publish_date)s)
        returning latest_update_id
    )
    insert into product_table (
        latest_update_id,
        product_id,
        note_related_info1,
        note_related_info2
    ) values (
        (select latest_update_id from i),
        %(product_id)s, %(note_related_info1)s, %(note_related_info2)s
    )
    returning *
"""

db_cursor.execute(insert_query, my_dict)

Solution 2:

Followup on my network comment.

Say you have 100ms roundtrip (like the time for SELECT 1).

If you want to chain queries, then you will have no other choice than to do INSERT... with tons of values to amortize the roundtrip time.

This is cumbersome, as you then will have to sort through the returned ids, to insert the dependent rows. Also, if your bandwidth is low, you will saturate it, and it won't be that fast anyway.

If your bandwidth is high enough but your ping is slow, you may be tempted to multithread... but this creates another problem...

Instead of having, say 1-2 server process churning through queries very fast, you'll have 50 processes sitting there doing nothing except waste valuable server RAM while they wait for the queries to come over the slow network.

Also, concurrency and lock issues may arise. You won't do just INSERTs... You're going to do some SELECT FOR UPDATE which grabs a lock...

...and then other processes pile up to acquire that lock while your next query crawls over the network...

This feels like using MyISAM in a concurrent write-intensive scenario. Locks should be held for the shortest time possible... fast pings help, putting the whole chain of queries from lock acquisition to release lock inside a stored proc is even better, so it is held for only a very short time.

So, consider executing your python script on the DB server, or on a server on the same LAN.

Post a Comment for "How To Scale Psycopg2 Insert And Select With Single Process In Python?"