Artificial intelligence has learned to write piano music that people can no longer tell apart from human work


Just a few years ago, music created by artificial intelligence was easily recognized. It sounded mechanical, formulaic and devoid of emotion. Even an unprepared listener could almost immediately tell that it was written by an algorithm, not a person. However, new research in the field of music generation has shown that this is no longer the case. The study doesn't just describe a new model; it answers a much more important question: what does “success” even mean in AI music generation?

Music Turing test: what is the point

The classic Turing test asks whether a computer can pass for a human in conversation. The musical version is arranged similarly: participants in the experiment listen to piano pieces and are asked to determine whether each was created by a person or an algorithm. If listeners cannot reliably distinguish one from the other, the system has approached the human level of perception.

It was this approach that became central to the study. The researchers were interested not in formal metrics or the mathematical accuracy of note prediction, but in how people actually perceive the music. The result surprised even the authors: for the best model, listeners identified the source correctly only about 50% of the time. In other words, they were essentially guessing, with no reliable cues to go on.
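Why does 50% mean "guessing"? Because a coin flip would score the same. A standard way to check this is to compare listener accuracy against chance with a binomial test. The sketch below is illustrative (the trial counts are hypothetical, not from the paper) and uses a normal approximation to the binomial distribution:

```python
import math

def chance_level_test(correct: int, total: int) -> tuple[float, float]:
    """Listener accuracy plus a z-score against 50% chance
    (normal approximation to the binomial distribution)."""
    accuracy = correct / total
    # Standard error of a proportion under the null hypothesis p = 0.5
    se = math.sqrt(0.5 * 0.5 / total)
    z = (accuracy - 0.5) / se
    return accuracy, z

# Hypothetical numbers: 104 correct answers out of 200 trials
acc, z = chance_level_test(104, 200)
print(f"accuracy = {acc:.2f}, z = {z:.2f}")
```

If |z| stays below 1.96, the result is statistically indistinguishable from guessing at the usual 5% significance level, which is exactly the situation the study describes.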

Why conventional metrics are misleading

For a long time, the quality of musical models was assessed by quantitative indicators, such as how accurately the algorithm predicts the next note. Such metrics are convenient: they are easy to measure, compare and publish. But they have a serious drawback: they say almost nothing about how a person perceives music.

Music that strictly follows patterns may be technically “correct” but also boring. Human composers often subvert expectations to create tension and surprise. From a statistical point of view, such moments look like prediction errors, but for the listener they are what makes the music come alive. The research showed that optimizing for the wrong metrics produces models that look good in reports but perform poorly in practice.
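The gap between prediction accuracy and musical quality is easy to demonstrate. In the toy sketch below (not from the paper), a trivial "model" that always predicts the most frequent note so far scores a high next-note accuracy on a repetitive piece, even though following its predictions would produce a monotone:

```python
from collections import Counter

# Toy "piece": MIDI note numbers where one note dominates
piece = [60, 60, 62, 60, 64, 60, 60, 62, 60, 60]

def next_note_accuracy(notes):
    """Accuracy of a trivial baseline that always predicts
    the most common note seen so far."""
    correct = 0
    for i in range(1, len(notes)):
        prediction = Counter(notes[:i]).most_common(1)[0][0]
        correct += prediction == notes[i]
    return correct / (len(notes) - 1)

print(round(next_note_accuracy(piece), 2))  # → 0.67
```

Two thirds of the notes "correctly" predicted by a model that would only ever play middle C: a high score by the metric, dead music by the ear.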

Three key success factors

The authors of the work did not try to find a “magic” architecture. Instead, they systematically examined the influence of three factors: model size, data volume and quality, and training strategy.

The first factor is the scale of the model. Transformers with 155 to 950 million parameters were tested. Larger models did perform better, but the gains showed diminishing returns: each further increase in size yielded less and less improvement.

The second factor is data. The researchers compared two fundamentally different sets: MAESTRO, a small, carefully curated dataset of high-quality piano music, and Aria-Deduped, a huge collection of roughly 80,000 MIDI files spanning genres, styles and quality levels. Despite its “chaotic” nature, it was the second set that produced the better listening results.

The third factor is training strategy. Models that were first pretrained on the large, diverse dataset and then fine-tuned on the small expert dataset significantly outperformed those trained from scratch.
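The two-stage schedule can be sketched with a deliberately tiny stand-in for real training. Here "training" is just counting note bigrams, and the dataset contents and fine-tuning weight are invented for illustration; the point is only the order of the stages, not the paper's actual code:

```python
from collections import defaultdict

def train(counts, dataset, weight=1.0):
    """'Training' here is just counting note bigrams: a stand-in for
    gradient updates, enough to show the two-stage schedule."""
    for notes in dataset:
        for prev, nxt in zip(notes, notes[1:]):
            counts[prev][nxt] += weight
    return counts

# Stage 1: pretrain on a large, diverse corpus (stand-in for Aria-Deduped)
diverse_corpus = [[60, 62, 64, 65], [48, 50, 52], [72, 71, 69, 67]]
# Stage 2: fine-tune on a small expert corpus (stand-in for MAESTRO)
expert_corpus = [[60, 64, 67, 72]]

model = defaultdict(lambda: defaultdict(float))
model = train(model, diverse_corpus)             # broad patterns first
model = train(model, expert_corpus, weight=3.0)  # then specialize

# After note 60, the fine-tuned continuation (64) now outweighs 62
print(max(model[60], key=model[60].get))  # → 64
```

The design choice mirrors the study's finding: the broad first stage supplies coverage the expert set lacks, while the second stage shifts the model toward the expert style without erasing what it learned before.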

Why data diversity is more important than “ideal quality”

One of the main findings of the study is that the diversity of data is often more important than its perfect purity. The large Aria set included music from different eras, genres and styles. This allowed the model to capture the fundamental patterns of piano music, rather than learning one specific performance format.

An analogy can be drawn with language models. They learn not from perfectly edited texts, but from millions of real statements. It is diversity that helps build intuition. The same turned out to be true for music.

Where the benefits of scaling up the model end

The study also showed that endlessly increasing the number of parameters makes no sense without a corresponding amount of data. A large model trained on a small dataset begins to memorize rather than generalize. As a result, resources are wasted while quality barely improves.

The practical conclusion is simple: it is better to invest effort in collecting and expanding training data than in endlessly increasing the size of the neural network.

What real people hear

The final and most important stage was the listening test. Participants evaluated fragments of music without knowing who created them. In a number of genres, the AI reached a level where the difference from human composition practically disappears. In other styles the difference was still audible, which correlates directly with the makeup of the training data.

It is also important that listeners bring their own biases. Some tend to attribute unusual music to AI; others take it as a sign of human genius. Against that backdrop, 50% accuracy in such a test is actually a serious achievement.

The main conclusion of the study

The breakthrough occurred not because of one successful idea, but thanks to a systematic approach. Moderate model size, maximizing data diversity, and a pre-training strategy followed by fine-tuning proved to be the key to success.

This conclusion applies not only to music. It is relevant for all tasks that require the generation of complex sequences: text, code, speech and other forms of digital creativity. Artificial intelligence is getting closer to the point where the difference between machine and human creativity will no longer be obvious.

