The deep learning community relies on powerful libraries whose mathematical
capabilities go beyond anything I could dream of.
Back in the day, I worked on an artificial neural network project where we
implemented the derivatives by hand wherever we needed them. Seeing these
libraries made me want to toy around with their capabilities on other models,
not necessarily artificial neural networks.
We are going to try to fit a toy econometric model using gradient descent.
A single dependency: JAX. JAX is a numerical computing library with a
NumPy-compatible API that can differentiate functions automatically and
leverage your GPU for learning… It is probably overkill here.
pipenv --python 3.10 install jax
So here are the imports of our main.py:
import jax
import jax.numpy as numpy
import random
Let’s fake some training data
raw_knowledge_base = [
    # (sector, surface, parkings), the property price is added below
    (0, 100, 1),
    (0, 50, 0),
    (0, 120, 1),
    (0, 120, 2),
    (0, 110, 1),
    (0, 220, 3),
    (0, 100, 2),
    (0, 90, 0),
    (1, 40, 1),
    (1, 50, 0),
    (1, 80, 0),
    (1, 120, 2),
    (1, 110, 0),
    (1, 30, 0),
    (1, 140, 1),
    (1, 100, 1),
    (1, 40, 1),
]
def get_mock_price(sector: int, surface: float, parkings: int) -> float:
    """
    This function is just here to generate our learning data,
    in the real world we would use the known price for each row above
    """
    standard_surface_price = 10000 * surface
    standard_parking_price = 40000 * parkings
    # 20% surface premium for sector 1
    sector_surface_premium = standard_surface_price * sector * 0.2
    # 10% parking premium for sector 1
    sector_parking_premium = standard_parking_price * sector * 0.1
    # Parentheses let the sum span two lines
    raw_price = (
        standard_surface_price + standard_parking_price
        + sector_surface_premium + sector_parking_premium
    )
    random_factor = 1 - (random.random() - 0.5) / 10  # from -5% to +5%
    return round(raw_price * random_factor)
# Add a price to each row above
knowledge_base = [
    (sector, surface, parkings, get_mock_price(sector, surface, parkings))
    for (sector, surface, parkings) in raw_knowledge_base
]
At this point, we only have some fake data on which we are going to try to fit
our model.
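To make the pricing rule concrete, here is a small sketch that works out the price of one row before the random factor is applied. The helper `noise_free_price` is hypothetical (it is not part of main.py); it only repeats the arithmetic of get_mock_price without the noise.

```python
def noise_free_price(sector: int, surface: float, parkings: int) -> float:
    """Hypothetical helper: same rule as get_mock_price, minus the random factor."""
    standard_surface_price = 10000 * surface
    standard_parking_price = 40000 * parkings
    sector_surface_premium = standard_surface_price * sector * 0.2
    sector_parking_premium = standard_parking_price * sector * 0.1
    return (
        standard_surface_price + standard_parking_price
        + sector_surface_premium + sector_parking_premium
    )


# Sector 1, 100 sqm, 1 parking:
# 1,000,000 (surface) + 40,000 (parking) + 200,000 + 4,000 (sector 1 premiums)
base = noise_free_price(sector=1, surface=100, parkings=1)

# The random factor then moves the price by at most 5% either way
low, high = base * 0.95, base * 1.05
print(base, low, high)
```

The same flat in sector 0 gets no premium, so its noise-free price is just the surface price plus the parking price.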
Our model
To make it a bit more readable, we are splitting the model parameters from the
model input itself.
def get_model_price(params, x):
    """
    params:
    * [0] average price / sqm
    * [1] price / sqm premium in sector 0
    * [2] price / sqm premium in sector 1
    * [3] price per parking
    """
    surface_value = (params[0] + params[1] * (1 - x[0]) + params[2] * x[0]) * x[1]
    parking_value = params[3] * x[2]
    return surface_value + parking_value
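As a quick sanity check, the model can be evaluated by hand before any fitting. This is only a sketch: the parameter values below are hypothetical, picked manually to resemble the mock pricing rule, and are not the result of any optimization.

```python
def get_model_price(params, x):
    # Same formula as above: x = (sector, surface, parkings)
    surface_value = (params[0] + params[1] * (1 - x[0]) + params[2] * x[0]) * x[1]
    parking_value = params[3] * x[2]
    return surface_value + parking_value


# Hypothetical, hand-picked parameters (not fitted):
# 10,000 / sqm, no premium in sector 0, +2,000 / sqm in sector 1, 40,000 per parking
params = (10000.0, 0.0, 2000.0, 40000.0)

price_sector_0 = get_model_price(params, (0, 100, 1))
price_sector_1 = get_model_price(params, (1, 100, 1))
print(price_sector_0, price_sector_1)

# Moving 1,000 from both premiums into the average price gives the same prices
shifted = (11000.0, -1000.0, 1000.0, 40000.0)
print(get_model_price(shifted, (0, 100, 1)), get_model_price(shifted, (1, 100, 1)))
```

The last two lines suggest that this parametrization is redundant: only the sums params[0] + params[1] and params[0] + params[2] affect the price, so several parameter sets can describe the same model.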
The error
Our game is now to find the model parameters that minimize the error over
every entry of our knowledge_base. So the first thing is to be able to
associate an error with a set of model parameters.
def get_model_error(params):
    error = 0.0
    for sector, surface, parkings, expected_price in knowledge_base:
        x = (sector, surface, parkings)
        model_price = get_model_price(params, x)
        delta = expected_price - model_price
        # sqrt(delta ** 2) is the absolute error |delta|
        error += numpy.sqrt(numpy.power(delta, 2))
    return error
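Because the error is written with jax.numpy operations, JAX can differentiate it with respect to the parameters. Here is a minimal sketch of that idea, assuming a hypothetical two-row `tiny_knowledge_base` with fixed prices (so the numbers are reproducible) and using the alias `jnp` instead of `numpy` to avoid confusion with the real NumPy.

```python
import jax
import jax.numpy as jnp

# Hypothetical two-row knowledge base with fixed prices (no random noise)
tiny_knowledge_base = [
    # (sector, surface, parkings, price)
    (0, 100, 1, 1_040_000.0),
    (1, 50, 0, 600_000.0),
]


def get_model_price(params, x):
    surface_value = (params[0] + params[1] * (1 - x[0]) + params[2] * x[0]) * x[1]
    parking_value = params[3] * x[2]
    return surface_value + parking_value


def get_model_error(params):
    error = 0.0
    for sector, surface, parkings, expected_price in tiny_knowledge_base:
        delta = expected_price - get_model_price(params, (sector, surface, parkings))
        error += jnp.sqrt(jnp.power(delta, 2))  # absolute error |delta|
    return error


# Start with every parameter at zero: the model predicts a price of 0 everywhere
params = jnp.zeros(4)
error = get_model_error(params)

# jax.grad builds a function returning d(error) / d(params)
gradient = jax.grad(get_model_error)(params)
print(error)
print(gradient)
```

With all parameters at zero, every prediction is too low, so each gradient component is negative: raising any parameter reduces the error. Gradient descent would move the parameters in the direction opposite to this gradient.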