
Why Is It So Much Slower To Export My Data To .xlsx Than To .xls Or .csv?

I have a dataframe that I'm exporting to Excel, and people want it in .xlsx. I use to_excel, but when I change the extension from .xls to .xlsx, the exporting step takes about 9 seconds.

Solution 1:

Pandas defaults to using OpenPyXL for writing xlsx files, which can be slower than the xlwt module used for writing xls files.

Try it instead with XlsxWriter as the xlsx output engine:

df.to_excel('file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')

It should be as fast as the xls engine.
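You can check this on your own data with a quick timing sketch. This is illustrative only (it assumes both the openpyxl and xlsxwriter packages are installed alongside pandas; the absolute numbers depend on your machine and data size):

```python
# Time the same export under both xlsx engines (illustrative sketch;
# assumes pandas plus the openpyxl and xlsxwriter packages are installed).
import timeit
import pandas as pd

df = pd.DataFrame({"a": range(20_000), "b": range(20_000)})

for engine in ("openpyxl", "xlsxwriter"):
    t = timeit.timeit(
        lambda: df.to_excel("bench.xlsx", sheet_name="Sheet1", engine=engine),
        number=1,
    )
    print(f"{engine}: {t:.2f}s")
```

Whichever engine wins, passing it explicitly via `engine=` avoids depending on whatever pandas happens to pick as its default.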

Solution 2:

As per a benchmark of different Python-to-Excel modules, pyexcelerate has better performance. The code below copies sqlite table data into sheets of an xlsx file. A table is only stored in the xlsx file if it has fewer than 1,000,000 rows; larger tables are stored in a compressed csv file instead.

def passfile(datb, tables):
    """Copy tables from query results to xlsx or csv files."""
    import sqlite3
    import pandas as pd
    import timeit
    import csv
    from pyexcelerate import Workbook
    from pathlib import Path
    from datetime import date
    dat_dir = Path("C:/XML")
    db_path = dat_dir / datb
    start_time = timeit.default_timer()
    conn = sqlite3.connect(db_path)                 # database connection
    c = conn.cursor()
    today = date.today()
    tablist = []
    with open(tables, 'r') as csv_file:             # file listing the tables to collect
        csv_reader = csv.DictReader(csv_file)
        for line in csv_reader:
            tablist.append(line['table'])           # 'table' column header
    xls_file = "Param" + today.strftime("%y%m%d") + ".xlsx"
    xls_path = dat_dir / xls_file                   # xlsx file path-name
    csv_path = dat_dir / "csv"                      # csv path to store big data
    wb = Workbook()                                 # pyexcelerate workbook init
    for line in tablist:
        try:
            df = pd.read_sql_query("select * from " + line + ";", conn)  # pandas dataframe from sqlite
            if len(df) > 1000000:                   # above Excel's row limit
                print('save to csv')
                csv_loc = line + today.strftime("%y%m%d") + '.csv.gz'    # compressed csv file name
                df.to_csv(csv_path / csv_loc, compression='gzip')
            else:
                data = [[""] + df.columns.tolist()]                      # header row (blank cell over the index)
                data += [[index] + row for index, row in zip(df.index, df.values.tolist())]
                wb.new_sheet(line, data=data)
        except sqlite3.Error as error:              # sqlite error handling
            print('SQLite error: %s' % ' '.join(error.args))
    print("saving workbook")
    wb.save(xls_path)
    end_time = timeit.default_timer()
    delta = round(end_time - start_time, 2)
    print("Took " + str(delta) + " secs")
    c.close()
    conn.close()


passfile("20200522_sqlite.db", "tablesSQL.csv")
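The row-limit fallback in the code above is worth keeping even outside this script, since an xlsx sheet holds at most 1,048,576 rows. A minimal standalone sketch of the same idea (the function name, paths, and exact threshold here are illustrative, not part of the original answer):

```python
# Hypothetical helper: export a DataFrame to xlsx, falling back to
# compressed csv when it exceeds Excel's sheet row limit.
# Assumes pandas with a working xlsx engine (e.g. openpyxl) installed.
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # hard limit for one xlsx worksheet

def export_frame(df, name):
    if len(df) > EXCEL_MAX_ROWS:
        path = f"{name}.csv.gz"                    # too big for Excel
        df.to_csv(path, compression="gzip")
    else:
        path = f"{name}.xlsx"
        df.to_excel(path, sheet_name="Sheet1")
    return path

print(export_frame(pd.DataFrame({"x": [1, 2, 3]}), "small"))  # small.xlsx
```

This keeps the decision in one place, so every export in a pipeline gets the same guard instead of failing mid-write on an oversized sheet.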
