Why Is It So Much Slower To Export My Data To .xlsx Than To .xls Or .csv?
I have a dataframe that I'm exporting to Excel, and people want it in .xlsx. I use to_excel, but when I change the extension from .xls to .xlsx, the exporting step takes about 9 seconds.
Solution 1:
Pandas defaults to using OpenPyXL for writing xlsx files, which can be slower than the xlwt module used for writing xls files.
Try it instead with XlsxWriter as the xlsx output engine:
df.to_excel('file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
It should be as fast as the xls engine.
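As a minimal sketch of the tip above (the file name and sample DataFrame are made up for illustration), you can also pass the engine through pd.ExcelWriter, which additionally lets you write several sheets into one workbook:

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# engine='xlsxwriter' makes pandas use XlsxWriter instead of the
# default openpyxl engine for .xlsx output.
with pd.ExcelWriter("file.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
```

This requires the xlsxwriter package to be installed (`pip install xlsxwriter`).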
Solution 2:
As per a benchmark of different Python-to-Excel modules, pyexcelerate has better performance. The code below copies SQLite table data into the datasheets of an xlsx file. A table is only stored in the xlsx file if its raw size is less than 1,000,000 rows; otherwise its data is stored in a compressed csv file.
def passfile(datb, tables):
    """Copy tables from query results to xlsx or csv files."""
    import sqlite3
    import pandas as pd
    import timeit
    import csv
    from pyexcelerate import Workbook
    from pathlib import Path
    from datetime import date

    dat_dir = Path("C:/XML")
    db_path = dat_dir / datb
    start_time = timeit.default_timer()
    conn = sqlite3.connect(db_path)  # database connection
    c = conn.cursor()
    today = date.today()
    tablist = []
    with open(tables, 'r') as csv_file:  # file listing the tables to be collected
        csv_reader = csv.DictReader(csv_file)
        for line in csv_reader:
            tablist.append(line['table'])  # 'table' column header
    xls_file = "Param" + today.strftime("%y%m%d") + ".xlsx"
    xls_path = dat_dir / xls_file  # xlsx file path-name
    csv_path = dat_dir / "csv"  # csv path to store big data
    wb = Workbook()  # pyexcelerate workbook init
    for line in tablist:
        try:
            df = pd.read_sql_query("select * from " + line + ";", conn)  # pandas dataframe from sqlite
            if len(df) > 1000000:  # not supported by excel
                print('save to csv')
                csv_loc = line + today.strftime("%y%m%d") + '.csv.gz'  # compressed csv file name
                df.to_csv(csv_path / csv_loc, compression='gzip')
            else:
                # header row (with an empty index cell), then one row per
                # record with its index value in the first column
                data = [[""] + df.columns.tolist()] + [
                    [index] + row for index, row in zip(df.index, df.values.tolist())
                ]
                wb.new_sheet(line, data=data)
        except sqlite3.Error as error:  # sqlite error handling
            print('SQLite error: %s' % (' '.join(error.args)))
    print("saving workbook")
    wb.save(xls_path)
    end_time = timeit.default_timer()
    delta = round(end_time - start_time, 2)
    print("Took " + str(delta) + " secs")
    c.close()
    conn.close()

passfile("20200522_sqlite.db", "tablesSQL.csv")