Super vague AI question for MIR 

Super vague AI question:

I have a pile of wav files and I would like some AI to come in and do an unsupervised analysis of them where it comes up with some ways in which they're distinct from each other and give them a score of 1-5 on how close they are to each pole of whatever it decides makes them distinct. So it decides on some pronounced x-factor and then scores them accordingly.

My questions are:
1. What's the terminology to describe this process?

2. Is there code lurking around (preferably in python or SuperCollider) that can do the machine learning/analysis/(see question 1)?

3. Are the results of it likely to be something that humans can perceive? (Or is there a way to encourage that?)

@celesteh not unsupervised but sounds like a 'K-nearest neighbour' could help with the classification?

Super vague AI question for MIR, not an answer 

@rra @celesteh en.wikipedia.org/wiki/Self-org is a nice technique to get a low dimensional map from a high dimensional set of points. The main problem is that different random seeds can give different maps, so the (e.g.) 2D coordinates aren't meaningful in themselves: however (this is the point of it) nearby 2D points should have similar high-dimensional characteristics.

I used it in one project:
1. calculate energy per octave of each WAV section (over 10 octaves or so)
2. make a 2D SOM out of the 10D data
3. analyzing a long WAV, make a Markov chain out of the 2D SOM coordinates ("nearest SOM node" is a discrete space)
4. run the Markov chain probabilistically to generate sound
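
Steps 2 to 4 could be sketched roughly like this. This is a minimal illustration, not the code from that project: it assumes the per-section octave energies (step 1) are already computed as 10-number lists, uses random stand-in data instead, and all function names, the 4x4 map size, and the learning-rate schedule are made up for the example.

```python
import math
import random
from collections import defaultdict

def nearest_node(nodes, vec):
    """Return the (x, y) grid coordinate whose weights are closest to vec."""
    return min(nodes, key=lambda k: sum((a - b) ** 2 for a, b in zip(nodes[k], vec)))

def train_som(data, width, height, epochs=20, seed=0):
    """Train a tiny SOM; returns {(x, y): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(x, y): [rng.random() for _ in range(dim)]
             for x in range(width) for y in range(height)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for vec in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            rate = 0.5 * (1.0 - frac)  # learning rate decays towards 0
            radius = max(width, height) / 2 * (1.0 - frac) + 0.5
            bx, by = nearest_node(nodes, vec)
            for (x, y), w in nodes.items():
                d2 = (x - bx) ** 2 + (y - by) ** 2
                pull = rate * math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (vec[i] - w[i])
            step += 1
    return nodes

def build_markov(states):
    """Count transitions between consecutive discrete states."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return counts

def run_markov(counts, start, length, seed=0):
    """Walk the chain, sampling each next state in proportion to its count."""
    rng = random.Random(seed)
    state, out = start, [start]
    for _ in range(length - 1):
        nxt = counts.get(state)
        if nxt:
            state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        else:  # dead end: jump to a random state that has outgoing transitions
            state = rng.choice(list(counts))
        out.append(state)
    return out

# Stand-in for step 1: 200 sections, 10 "octave energies" each (random here).
data_rng = random.Random(1)
features = [[data_rng.random() for _ in range(10)] for _ in range(200)]

som = train_som(features, 4, 4)
states = [nearest_node(som, f) for f in features]  # one SOM node per section
chain = build_markov(states)
path = run_markov(chain, states[0], 16)  # sequence of SOM nodes to sonify
```

Each entry of `path` is a SOM grid coordinate; turning that back into sound (for example by playing a section that mapped to that node) is left out here.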

@mathr @rra

That is a cool project!

What I want to do is use the original audio files with the scores. So I've got a playback engine that picks a starting point and a target and interpolates between them over time. On the way, it plays audio files with scores near its current state. This is a way to organise what files should be played when and impose musical values on a relatively random collection of sounds.
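
A rough sketch of that selection logic, assuming each file already has a score vector of 1-5 values. The file names, the two-dimensional scores, and the function names are all hypothetical, and the real engine may work quite differently.

```python
import math

def interpolate(start, target, t):
    """Linear interpolation between two score vectors, t in [0, 1]."""
    return [s + (e - s) * t for s, e in zip(start, target)]

def pick_nearest(scores, state, exclude=()):
    """Name of the file whose scores are closest to the current state."""
    candidates = [n for n in scores if n not in exclude] or list(scores)
    return min(candidates, key=lambda n: math.dist(scores[n], state))

def plan_playback(scores, start, target, steps):
    """Move from start to target, picking the nearest file at each step."""
    playlist = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        state = interpolate(start, target, t)
        recent = playlist[-1:]  # avoid playing the same file twice in a row
        playlist.append(pick_nearest(scores, state, exclude=recent))
    return playlist

# Hypothetical scores on two axes, each in the range 1-5.
scores = {
    "a.wav": [1, 5],
    "b.wav": [2, 4],
    "c.wav": [3, 3],
    "d.wav": [4, 2],
    "e.wav": [5, 1],
}
plan = plan_playback(scores, start=[1, 5], target=[5, 1], steps=5)
print(plan)  # ['a.wav', 'b.wav', 'c.wav', 'd.wav', 'e.wav']
```

Euclidean distance is just the simplest choice here; whether it matches what a listener hears as "near" depends entirely on how the scores were produced.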

@celesteh
Background: The better an Algo is at solving certain problems, the worse it is at the rest of the possible problems (No Free Lunch Theorems).

And for an Algo to treat it as sound or music you need to tell it it's music and it needs to understand what to do with that. For example, you could submit the wave file to an image Algo and get interesting results, but to your question 3, not necessarily something a human would perceive as similarity.

@celesteh
#1: you're looking for a distance or similarity metric that matches human perception of sound or music. If you get a model that's already trained, it may have been trained with supervision, but since you don't train it yourself, from your perspective it's unsupervised. You can then organize and re-distance the files with a higher-dimensional model or k-means clustering as suggested (both are forms of unsupervised training that fit the model to your data).
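
For the k-means part, a minimal pure-Python sketch might look like the following. It assumes each file has already been reduced to a feature vector by some perceptually motivated model; the two-number vectors below are invented for the example, and the farthest-first initialisation is just a simple deterministic choice, not something from this thread.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain k-means; returns (labels, centroids)."""
    # Farthest-first initialisation: start at the first point, then keep
    # adding the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented feature vectors for six files: two obvious groups.
points = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that this gives group membership rather than a 1-5 score; the distance from a file to each centroid is one possible starting point for a graded score, but that is a further assumption.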

@celesteh
#2 IDK what's out there sorry!
#3 expanding on #1: why you need something that models human perception of sound / music. Consider a fugue. A human can recognize that as many layers and repetitions of the same theme at different tempos, pitches, etc. However, pitch and tempo are both obscured in a wave file. So, it depends what the Algo looks for. Hence a music Algo beats an image Algo here. And I'd bet a temporal analysis would fall somewhere in between.

post.lurk.org