I released a new tool this week: google-drive-to-sqlite. It’s a CLI utility for fetching metadata about files in your Google Drive and writing them to a local SQLite database.
It’s pretty fun!
Here’s how to create a SQLite database of every file you’ve started in your Google Drive, including both files created in Google Docs/Sheets and files you’ve uploaded to your drive:
% pip install google-drive-to-sqlite
% google-drive-to-sqlite auth
Visit the following URL to authenticate with Google Drive
https://accounts.google.com/o/oauth2/v2/auth?access_type=offline&...
Then return here and paste in the resulting code:
Paste code here:
# Authentication is now complete, so run:
% google-drive-to-sqlite files starred.db --starred
% ls -lah starred.db
-rw-r--r--@ 1 simon staff 40K Feb 20 14:14 starred.db
The OAuth client ID it is using hasn’t been verified by Google yet, which I think means that only the first 100 people to use it will be able to authenticate. If you need to you can work around that by creating your own client ID, as described in the README.
Having created that starred.db
file you can explore the resulting database using Datasette or Datasette Desktop:
datasette starred.db
# or if you have the Datasette Desktop macOS app installed:
open starred.db
Here’s Datasette running against one of my larger metadata collections:
Why build this?
I recently got involved with a participatory journalism project, where a team of reporters have used FOIA requests to gather a huge corpus of thousands of files. The files are in a complex folder hierarchy a Google Drive. I wanted to start getting a feel for what’s in there.
Pulling the metadata—file names, sizes, file types, file owners, creation dates—into a SQLite database felt like a great way to start understanding the size and scope of what had been collected so far.
Outside of that project, there’s something very exciting to me about being able to use Google Drive to collate all kinds of different data and then tie it into the larger Datasette and Dogsheep ecosystems. I think there’s a lot of potential here for all kinds of interesting projects.
How it works
The tool is written in Python using Click (based on my click-app template) and sqlite-utils. It works by calling the Google Drive API.
The auth
command needs to get hold of an OAuth access token scoped to make read-only calls to the user’s Google Drive contents.
This took a bit of figuring out. I wrote up what I learned in this TIL: Google OAuth for a CLI application
Notably, the end result of that flow is a JSON response containing both an access_token
and a refresh_token
.
The access token can be used to make authenticated API calls, but it expires