RSI is something that scares the bejeezus out of me. As someone who works with their fingers, losing the ability to type – or even just losing my current proficiency at talking to my system – would feel like losing the ability to walk. I've had multiple family members affected by RSI, and the anecdotal evidence of it showing up out of nowhere has led me down the path of mobility work, sparring and many, many preventative measures, in the hope of giving myself as much useful time with my hands as I can. I do occasionally get wrist pain that forces me to stop working, but thankfully nothing has stuck.
I also wonder what I would do if I did find myself on the other side. I've kept an eye on voice-to-code interfaces, but the learning curve has always looked steep and full of memorised voice commands. For a field that changes quickly, that's a problem.
With the introduction of ChatGPT and Whisper, it's been an exciting week to say the least. Whisper.cpp, llama.cpp and now alpaca.cpp have hinted at how good local tools are about to get, and sent me down a quick study of what we can do today in this regime.
I’ll pick something that I can read aloud – as I would think of the code to myself – and we’ll see how much gets through the system. With something as pedantically processed as source code, we should be able to compare line for line and see where we could do better.
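If we want to be rigorous about that comparison, a few lines of Python will do it. Here's a minimal sketch using difflib, where reference_code and transcribed_code are hypothetical strings holding the original and the transcript:

    import difflib

    def compare_code(reference_code: str, transcribed_code: str) -> str:
        """Return a unified diff between the original code and the transcribed version."""
        diff = difflib.unified_diff(
            reference_code.splitlines(keepends=True),
            transcribed_code.splitlines(keepends=True),
            fromfile="original",
            tofile="transcribed",
        )
        return "".join(diff)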
Where do we start?
Let’s try the OpenAI code snippet for python to connect to the Chat Completions API.
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
Here’s the audio of me describing it. Let’s convert it to a 16 kHz WAV using ffmpeg, and give it to the large Whisper model:
ffmpeg -i ~/Downloads/Whisper.cpp.m4a -acodec pcm_s16le -ar 16000 ../testscripts/whisper.cpp.wav
./main -m models/ggml-large.bin -t 4 ../testscripts/code.wav --print-colors --output-txt
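For convenience, the whole convert-and-transcribe step can be wrapped into a small helper. Here's a minimal sketch, assuming whisper.cpp's --prompt flag and the binary and model paths above:

    import subprocess

    def transcribe(audio_path, prompt=None, model="models/ggml-large.bin"):
        """Convert an audio file to 16 kHz WAV and transcribe it with whisper.cpp."""
        wav_path = audio_path.rsplit(".", 1)[0] + ".wav"
        # whisper.cpp expects 16-bit PCM at 16 kHz
        subprocess.run(
            ["ffmpeg", "-y", "-i", audio_path,
             "-acodec", "pcm_s16le", "-ar", "16000", wav_path],
            check=True,
        )
        cmd = ["./main", "-m", model, "-t", "4", wav_path, "--output-txt"]
        if prompt:
            # vocabulary/style injection, as discussed below
            cmd += ["--prompt", prompt]
        subprocess.run(cmd, check=True)
        # --output-txt writes the transcript next to the input, as <input>.txt
        with open(wav_path + ".txt") as f:
            return f.read()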
Did you know you could prompt Whisper? This is primarily being used for vocabulary injection, but we’ll try using more detailed prompts to see if it can understand what we tell it – it should. First we’ll try no prompts at all.
Prompt: None
We get a garbled string back that would be pretty much unusable as code.
Import OS, import OpenAI. OpenAI.API_KEY = OS.getenv(OpenAI_API_KEY).
Completion = OpenAI.chatCompletion.create(parameters.model = tpt3.5.turbo(messages.array.firstElement.role(user.content.hello))
CloseArray, CloseObject, CloseParameter. Print Completion.choices[0].message
Let’s try telling it what we’re doing. The prompt can be "Reading the openai python API out loud":
import OS import openai openai.api key equals os.getenv openai_api_key
completion equals openai.chat_completion.create
parameters model equals tpt 3.5 turbo messages array first element role user
content hello close array close object close parameter print completion dot
choices 0 dot message
That did next to nothing, but I think some of the capitalization is gone. It might actually understand what we’re saying.
One big change we want to make is to get rid of the written-out special characters and turn them into the actual characters. The prompt is "convert spoken characters into the characters themselves":
Import OS, import OpenAI. OpenAI.API_KEY = OS.getenv(OpenAI_API_KEY).
Completion = OpenAI.chatCompletion.create(parameters.model = tpt3.5.turbo(messages.array.firstElement.role(user.content.hello))
Close array, close object, close parameter. Print Completion.choices[0].message
Not there yet, but a hell of a lot better.
Let’s try combining our prompts to see if that works better. The prompt is "convert spoken characters into the characters themselves. Reading the openai python API out loud":
import os import openai openai.api key equals os.getenv openai underscore api underscore key
capital completion equals openai.chat completion dot create parameters model equals tpt 3.5 turbo
messages array first element role user content hello close array close object close parameter
print completion dot choices zero dot message
That was actually worse. We’ll try one more thing before we move on. The prompt is "```python":
import OS import OpenAI openAI.API_KEY = OS.getenv(OpenAI_API_KEY)
completion = OpenAI.chatCompletion.create
parameters model = tpt3.5.Turbo
messages array firstElement role user content hello
close array close object close parameter
print completion.choices[0].message
Ever so slightly better, but we need help.
Let’s feed our last transcript into GPT-4 with the following prompt, and see if it can, with its much bigger brain, figure out what we meant instead of what we said.
Prompt:
Here is a transcript of me reading some python code. Please convert it as accurately as you can, into only valid python code. “Import OS, import OpenAI. OpenAI.API_KEY = OS.getenv(OpenAI_API_KEY). Completion = OpenAI.chatCompletion.create(parameters.model = tpt3.5.turbo(messages.array.firstElement.role(user.content.hello)) CloseArray, CloseObject, CloseParameter. Print Completion.choices[0].message”
Result:
import os
import openai

openai.api_key = os.getenv("OpenAI_API_KEY")

completion = openai.ChatCompletion.create(
    model="text-davinci-002",
    messages=[{"role": "user", "content": "hello"}]
)

print(completion.choices[0].message)
Works out of the box! But maybe it knew the answer somehow? I don’t see why OpenAI would feed it its own docs, but it’s not out of the question.
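For reference, this cleanup pass is just another chat completion call. Here's a minimal sketch, assuming the same openai library as in the snippet above, with transcript as a hypothetical string holding whisper.cpp's output:

    import os
    import openai

    openai.api_key = os.getenv("OPENAI_API_KEY")

    def transcript_to_code(transcript):
        """Ask the model to turn a spoken-code transcript into valid Python."""
        prompt = (
            "Here is a transcript of me reading some python code. "
            "Please convert it as accurately as you can, into only valid python code. "
            + transcript
        )
        completion = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message["content"]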
Let’s try a more complicated piece of code.
Here’s a piece of Python code I wrote over half a decade ago, part of a fire detection engine using cv2:
def fire_analysis_engine(folder,filename):
    video = cv2.VideoCapture(folder+"/"+filename)
    pointer = int((hist_samples/(temp_max-temp_min))*(threshold_temp-temp_min))
    counter = 0
    background = None
    ret, frame = video.read()
    areas = []
    deltaXY = []
    while(ret!=False):
        posX = 0
        posY = 0
        if(counter==0):
            counter+=1
            background_mask = cv2.bitwise_not(cv2.inRange(frame, np.array([245,245,245]), np.array([255,255,255])))
            continue
        original = frame
        frame = cv2.bitwise_and(frame,frame,mask=background_mask)
        frame = frame[32:144,0:277]
        framehist = np.histogram(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).ravel(),bins=256,range=(0,256))
Here’s me reading it aloud without much care for how I read it – a lot like reading it to another dev who knows Python.
Let’s start with no prompt. Whisper.cpp gives us –
def fire_analysis_engine(folder,filename)video=cv2.videocapture(folder+"/"+filename)
pointer=int[]histsamples/temp_max-temp_min][][][][]
counter=0 background=none
ret,frame=video.read() areas=empty_array() delta x y =()
while ret is not equal to false
pos x = 0 pos y = 0
if counter==0 colon counter+=1
background_mask=cv2.bitwise_not(cv2.in_range(frame,np.array()
[][][][]
original=frame frame=cv2.bitwise_and(frame,frame,mask=background_mask)
frame=frame[32:144,0:277]
start processing
frame_hist=np.histogram(cv2.cvt_color(frame,cv2.color_bgr2_gray).ravel(bins=256,range=0,256)
Running thi