Let me be clear up front about what I mean by working with data. Generally, it means that you have data in some form and you run some kind of analysis on it. Once I have clean, structured data available, the analysis becomes easy.
But what if the data is not clean and sits in files or databases? Then I have to read the files, clean them up and organise the contents into data structures that suit my needs (is THIS called parsing?). This preprocessing part is what I am asking about.
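To make the preprocessing step concrete, here is a minimal sketch of the kind of thing I mean, using only the Python standard library. The column names (`sensor`, `value`), the `Reading` record and the sample text are all made up for illustration, and silently skipping bad rows is just one possible cleaning policy, not necessarily the right one.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical target structure for one cleaned row."""
    sensor: str
    value: float


# Made-up messy input: stray whitespace, a missing value, a non-numeric value.
RAW = """sensor,value
 A1 , 3.5
b2,
A1,not_a_number
 c3,7
"""


def parse_readings(text):
    """Read CSV text, clean each row, and return a list of Reading records.

    Rows with a missing sensor name, a missing value, or a value that is
    not a number are dropped.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Normalise the sensor name: trim whitespace, upper-case it.
        sensor = (row.get("sensor") or "").strip().upper()
        raw_value = (row.get("value") or "").strip()
        if not sensor or not raw_value:
            continue  # incomplete row
        try:
            value = float(raw_value)
        except ValueError:
            continue  # value is not numeric
        records.append(Reading(sensor, value))
    return records


readings = parse_readings(RAW)
for r in readings:
    print(r)
```

In a real batch job the text would come from files on disk rather than a string, and I would probably want to log the rejected rows instead of dropping them.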
What CS or programming subjects should I study to become somewhat of an expert in data cleaning, preprocessing and structuring large numbers of files in batches?
I am also interested in the second part of the pipeline, where I analyse the data and produce output, both as good visualisations and as output data to be stored in files.
Any books, courses or pointers to other resources would be appreciated.
P.S.: Files can be anything. They are just streams of bytes. Images, audio, video, text, csv.