• kindred@lemmy.dbzer0.com
    97 upvotes · 3 months ago

    This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.

    Does this mean the MusicBrainz database will soon go from 5 million to 186 million tracks?

    • zingo@sh.itjust.works
      14 upvotes · 3 months ago

      That’s exactly what I was wondering too.

      Acquiring high quality music is already easy enough in most cases.

      What I am interested in is the metadata. Accurate tagging of all my files is of high interest.

  • massive_bereavement@fedia.io
    76 upvotes · 3 months ago

    I’d strongly suggest they take out all the cheap AI-generated music from this “backup” and save themselves some space.

    • AnarchistArtificer@slrpnk.net
      19 upvotes · 3 months ago

      I’m not sure how they would go about doing that at scale without also getting some false positives and removing human-made music too.

      • cheesybuddha@lemmy.world
        6 upvotes · 3 months ago

        You could cut off your search around the time AI tracks started to appear. Not sure when that was, maybe 2023. You’d miss a lot of recent stuff, but you’d filter out a lot of spam too.
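
        A minimal sketch of the date-cutoff idea, assuming each track record carries a release date. The `Track` type, its fields, and the 1 January 2023 cutoff are hypothetical; the comment only guesses "maybe 2023". Everything released on or after the cutoff is dropped, human-made or not, which is exactly the tradeoff the comment mentions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Track:
    # Hypothetical record: illustrative fields, not any real service's schema.
    title: str
    release_date: date


# Hypothetical cutoff; the comment only guesses "maybe 2023".
AI_CUTOFF = date(2023, 1, 1)


def before_cutoff(tracks: list[Track], cutoff: date = AI_CUTOFF) -> list[Track]:
    """Keep only tracks released strictly before the cutoff.

    Every later release is dropped, whether or not it is AI-generated, so the
    filter trades recent human-made music for less spam.
    """
    return [t for t in tracks if t.release_date < cutoff]


tracks = [
    Track("Older human-made song", date(2019, 5, 1)),
    Track("Recent upload", date(2024, 2, 10)),
]
print([t.title for t in before_cutoff(tracks)])
```

        Because the filter only looks at dates, it cannot tell a 2024 human recording from a 2024 AI track; both are dropped.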

        • AnarchistArtificer@slrpnk.net
          4 upvotes · 3 months ago

          I see your point, but as you say, there would still be the tradeoff of missing more recent stuff. That might only mean missing a couple of years’ worth of music now, but AI isn’t going away any time soon, so an increasing amount of human-made music would go unarchived. One of the things I like about Anna’s Archive is that they seem to approach this problem in a long-term, informational-infrastructure kind of way, so I imagine they wouldn’t be keen on stopping the archive at 2023.

          It seems they’ve opted for a different tradeoff instead: lower-popularity songs are archived at a lower bitrate, and even the higher-popularity stuff has some compression. Some archives go for quality and prioritise high-quality FLACs; Anna’s Archive is aiming to fill a different niche. I can respect that.

    • nibbler@discuss.tchncs.de
      8 upvotes, 9 downvotes · 3 months ago

      Do you have any numbers on the AI share? I doubt it’s more than 2%, so I assume you are just virtue signalling on a completely unrelated topic here :-)

  • helpImTrappedOnline@lemmy.world
    55 upvotes · edited · 3 months ago

    The data they compiled is really cool.

    If I’m reading the chart right, the genre with the most artists is opera.

    Even if they didn’t have the music files, the analysis on the metadata is insane.

    Publicly admitting they are the origin of the torrents is definitely a risky and insane move. I don’t think they want Sony going after them, but also fuck Sony for locking art behind shitty contracts that force these kinds of projects to exist.

    • JensSpahnpasta@feddit.org (OP)
      31 upvotes, 1 downvote · 3 months ago

      “Publicly admitting they are the origin of the torrents is definitely a risky and insane move. I don’t think they want Sony going after them”

      Let’s be honest: everybody is trying to go after Anna’s Archive. Every book publisher wants to get them, and so does the US government, so it really doesn’t matter if every music publisher wants them too. I hope they are based in a country where the Western systems can’t get them.

      • Tangent5280@lemmy.world
        5 upvotes · 3 months ago

        I hope (and also assume, since it hasn’t been taken down yet) that it’s more of a decentralised deal, with servers in many places and backups in every nation under the sun.

    • douglasg14b@lemmy.world
      22 upvotes · 3 months ago

      Yeah, it’s a wild move to admit here that they are the source of this pirated music.

      We don’t need Anna’s Archive to go under as a result of Sony going after them because of this…

      • rainwall@piefed.social
        11 upvotes · 3 months ago

        They have had a dozen or more lawsuits/police actions against them. They are already enemy #1 in piracy terms, so I expect they are okay leaning into it and doing more good for the world.

  • lietuva@lemmy.world
    46 upvotes · edited · 3 months ago

    There’s definitely gonna be some crazy guy who will put this on their server and stream it to their phones lol

  • arcterus@piefed.blahaj.zone
    42 upvotes · 3 months ago

    1. Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.
    • We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).
    • For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. Metadata was added without reencoding the audio (and an archive of diff files is available to reconstruct the original files from Spotify, as well as a metadata file with original hashes and checksums).
    • For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s — sounding the same to most people, but noticeable to an expert.

    Perhaps I’m reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn’t the focus be on getting the least popular music first?
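
    The quoted tiering boils down to one branch on the popularity score plus some bitrate arithmetic. A minimal sketch, assuming popularity is a non-negative integer; the function names are hypothetical, and only the two formats and bitrates come from the quote. It also shows why the low tier saves space: at 75 kbit/s a minute of audio takes about 0.56 MB versus 1.2 MB at 160 kbit/s (decimal megabytes, ignoring container overhead), i.e. under half the size.

```python
# Hypothetical helpers; only the formats and bitrates come from the quoted post.


def pick_format(popularity: int) -> tuple[str, int]:
    """Return (codec, bitrate in kbit/s) for a track with this popularity score."""
    if popularity > 0:
        # popularity > 0: the original audio stream is kept, no re-encoding
        return ("OGG Vorbis", 160)
    # popularity == 0: re-encoded at a lower bitrate to save space
    return ("OGG Opus", 75)


def megabytes_per_minute(kbit_per_s: int) -> float:
    """Approximate audio payload per minute in decimal MB (ignores container overhead)."""
    bytes_per_second = kbit_per_s * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


for pop in (57, 0):
    codec, rate = pick_format(pop)
    print(f"popularity={pop}: {codec} @ {rate} kbit/s, ~{megabytes_per_minute(rate):.2f} MB/min")
```

    The ratio is simply 75/160 ≈ 0.47, so every popularity=0 track stored this way costs a bit less than half of what the original file would.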

    • JensSpahnpasta@feddit.org (OP)
      28 upvotes, 1 downvote · 3 months ago

      It depends on what your goal is. If you want to preserve the music that is important to most people or to the era, you should start with the most popular stuff. And Spotify has a big spam problem: everybody who thinks he is a DJ wants his music on there, and there is so much AI music flooding the scene. So it does make sense to back up what people are actually listening to, rather than AI-generated spam nobody cares about.

      • arcterus@piefed.blahaj.zone
        10 upvotes · 3 months ago

        I mean, they say earlier that music is actually well preserved, but that it’s disproportionately popular music. If the goal is to preserve everything, I’d expect them to go for stuff that isn’t likely to be in some random audiophile’s collection or whatever.

      • mrdown@lemmy.world
        1 upvote, 1 downvote · 3 months ago

        I am pretty sure the major labels are already preserving the most mainstream artists. Maybe it should be sorting by the most popular independent artists.

    • UltraMagnus@startrek.website
      13 upvotes · 3 months ago

      The politics of preservation is definitely an interesting one. I suppose one argument in favor of preserving more popular music is that there are going to be fewer popular tracks than unpopular tracks - and they’re already at 300TB, which is nothing to sneeze at, especially since it’s a third the size of their existing library of ebooks.

    • dustyData@lemmy.world
      5 upvotes · 3 months ago

      If we were talking about the ethnic music of an extinct tribe that used a language at risk of disappearing, sure, you would be right.

      But think about it for a bit longer. These are just commercial productions that had no cultural impact on a population. They are still getting preserved in a format whose quality degradation is imperceptible to the human ear. That’s usually enough. Audiophiles are usually overzealous about fidelity preservation, but the efforts are often misguided, and discussions abound on technical topics that ultimately don’t matter.

    • Techlos@lemmy.dbzer0.com
      5 upvotes · 3 months ago

      If you want that long tail, Bandcamp and SoundCloud are better sources. The barrier to entry is low on those, and there’s a plethora of small, niche artists just doing their own thing.

      For a representative snapshot of music though, it’s pretty amazing. It shows what a massive percentage of the planet listens to, preserved hopefully across many seeds, and historians will love shit like this in the future.

      • Prunebutt@slrpnk.net
        8 upvotes · 3 months ago

        AFAIK: Yes. But it’s supposedly a pain to set up, so I’ll never know the difference.

        • nymnympseudonym@piefed.social
          12 upvotes, 1 downvote · 3 months ago

          TBH I plan to migrate off Funkwhale to something more featureful, and yeah, it was a bit of a complex setup. Props to the devs though: it’s open source, stable, and does what it says on the tin.

      • Wolf314159@startrek.website
        4 upvotes · 3 months ago

        No. Soulseek is old-school P2P. All you need to do is run the client software and set a local shared folder, and you are client and server in one. Funkwhale is more like running your own Lemmy instance and building a community. The difference between them is like the difference between using AirDrop or Syncthing to share files and hosting your own domain and server.

      • three@lemmy.zip (banned from community)
        4 upvotes, 2 downvotes · 3 months ago

        Oh no, around here we mention esoteric software but we will never include any extra information in the post. If you know you know.

      • nymnympseudonym@piefed.social
        2 upvotes, 1 downvote · 3 months ago

        As far as I can tell, Soulseek requires dedicated clients. The Subsonic standard is supported by more and more mobile/PC apps; I wish it were supported.

  • exu@feditown.com
    5 upvotes · 3 months ago

    Oo, I’ll have to check those when they release. I follow some artists that only upload to YouTube and Spotify, neither of which is ideal.

  • Agility0971@lemmy.world
    1 upvote · 3 months ago

    So hear me out. Stremio can stream video from various sources, including torrents. So it should be possible to create a music frontend that accesses the library in a similar way, right? There is probably someone building this right now.