Importing massive amounts of data
Now that our environment is ready, we can begin working with bigger datasets. Let's start by profiling the import process and then optimizing it. We will start with our small geocaching dataset and, once the code is optimized, move on to bigger sets.
In your geodata_app.py file, edit the if __name__ == '__main__': block to call the profiler:

    import cProfile  # add this at the top of the file if it is not already there

    if __name__ == '__main__':
        profile = cProfile.Profile()
        profile.enable()
        import_initial_data("../data/geocaching.gpx", 'geocaching')
        profile.disable()
        profile.print_stats(sort='cumulative')
Run the code and see the results. Don't worry about duplicated entries in the database for now; we will clean them up later. (I removed some information from the following output for space reasons.)
    Importing geocaching...
    112 features.
    Done!
             1649407 function calls (1635888 primitive calls) in 5.858 seconds

       cumtime  percall filename:lineno(function)
         5.863    5.863 geodata_app.py:24(import_initial_data)
         5.862    5.862 ...
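If you prefer to keep the profiling results around instead of printing them straight to the console, the standard-library pstats module can load a saved profile and let you inspect it later. The following is a minimal sketch, not code from the chapter: the file name import_profile.stats is arbitrary, and import_initial_data is assumed to be the function defined earlier in geodata_app.py.

    import cProfile
    import pstats

    if __name__ == '__main__':
        profile = cProfile.Profile()
        profile.enable()
        import_initial_data("../data/geocaching.gpx", 'geocaching')
        profile.disable()

        # Dump the raw stats so they can be re-examined without re-running the import.
        profile.dump_stats("import_profile.stats")

        # Load the dump, sort by cumulative time, and show only the ten slowest calls.
        stats = pstats.Stats("import_profile.stats")
        stats.sort_stats("cumulative").print_stats(10)

Dumping the stats is handy once the imports start taking minutes: you can compare profiles from before and after an optimization without having to re-import the data each time.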