• ElectroNeutrino@lemmy.world
    174 upvotes · 1 downvote · 1 year ago

    How about just not auto-convert everything and keep the integrity of the data unless specifically asked to? Is that so hard?

    • Chais@sh.itjust.works
      113 upvotes · 1 downvote · 1 year ago

      Microsoft assumes their users are complete idiots, even when they (the users) are actively trying to convince them (Microsoft) otherwise. No matter how advanced the feature may be, they’ll assume you found instructions somewhere to do something entirely unrelated and they constantly have to save you from yourself. As a result you constantly have to fight the OS for access and control to get it to do what you want.
      If you’re even a bit of a power user that is, of course.

      But more often than not Microsoft’s assumption is probably spot on.

      • WhatAmLemmy@lemmy.world
        7 upvotes · 1 year ago

        That assumption is perfectly good for a default. Not a mandatory feature that power users have to live with.

        • Chais@sh.itjust.works
          3 upvotes · 1 year ago

          As a default, sure. Should be one that’s easily changed, though. Repeatedly fighting the machine that’s supposed to do your bidding and make your life easier gets old rather quickly. A machine you own and administrate, let’s not forget that.

    • Black616Angel@feddit.de
      25 upvotes · 1 year ago

      Excel is inherently flawed in its design.

      The thing is, Excel already has half of what it would take to really fix this bug. The missing half would be a field in each cell where the original text can be kept.

      An Excel workbook (.xlsx) is just a bunch of XML files zipped up in a specific structure. You can unzip one and look for yourself.
      Each worksheet is its own file, and each cell is subdivided into the value and the formula that generated this value (or nothing, if there is no formula).
      Excel could easily fix this issue by adding another possible cell attribute like “original” or “plain” that, when set, allows you to roll back any conversion.
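
      A minimal Python sketch (standard library only) for peeking at that structure. It assumes the worksheet is stored at `xl/worksheets/sheet1.xml`, which is the usual name but not guaranteed to be the first sheet, and the helper names are hypothetical. For text cells (`t="s"`) the stored value is only an index into `xl/sharedStrings.xml`, which this sketch does not resolve. Each cell element offers slots for a value and a formula, but nothing that records the text as originally typed.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet XML inside .xlsx files
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_cells(sheet_xml: bytes) -> dict:
    """Hypothetical helper: map each cell ref (e.g. "A1") to its type, value and formula."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for c in root.iterfind(".//m:c", NS):
        v = c.find("m:v", NS)  # stored (or cached) value
        f = c.find("m:f", NS)  # formula, if the cell has one
        cells[c.get("r")] = {
            "type": c.get("t", "n"),  # "n" (number) is the default when t is absent
            "value": v.text if v is not None else None,
            "formula": f.text if f is not None else None,
        }
    return cells


def read_sheet1(xlsx) -> dict:
    """Hypothetical helper: open an .xlsx (path or binary file) and parse sheet1.xml."""
    with zipfile.ZipFile(xlsx) as zf:
        return read_cells(zf.read("xl/worksheets/sheet1.xml"))


# Usage with a tiny hand-built archive; a real workbook contains more parts
SHEET = (
    b'<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<sheetData><row r="1">'
    b'<c r="A1"><v>2</v></c>'
    b'<c r="B1"><f>A1*2</f><v>4</v></c>'
    b'</row></sheetData></worksheet>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", SHEET)
buf.seek(0)
print(read_sheet1(buf))
```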

      But no, they go a half assed way as always and screw up even more.

      • RunningInRVA@lemmy.world
        15 upvotes · 1 year ago

        In order to do that I think they would first have to ratify a standards change to the Excel format, which is open.

        • Black616Angel@feddit.de
          1 upvote · 1 year ago

          Uh, I mean kinda…

          Excel implements two Microsoft file format standards:

          • ECMA-376
          • ISO/IEC 29500

          Those are not the same and are even incompatible in parts. It’s true that Microsoft is trying to use ISO/IEC 29500 more, but most files (2007) are still ECMA-376.

          But yes, they kinda would have to change their shitty, ISO-incompatible ISO “standard” to fix this issue this way.

          Or use the formula field, idk. 😅

    • sndrtj@feddit.nl
      1 upvote · 1 year ago

      Excel is never ever going to break backwards compatibility. In fact, quite a few “features” in Excel exist only to stay bug-for-bug compatible with existing systems.

      Example: Excel stores dates internally as a floating-point number called the serial date; you can see it by formatting a cell that contains a date as General or Number. It is supposed to be the number of days since 1 January 1900. However, early Excel versions had to be compatible with Lotus 1-2-3, and that included a bug in Lotus 1-2-3: it erroneously treated 1900 as a leap year. In addition, the indexing is off by one (1 January 1900 is serial 1, not 0). So the effective zero epoch of an Excel serial date is 30 December 1899 for all dates from 1 March 1900 onward.
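
      A minimal Python sketch of the conversion described above, assuming the default 1900 date system (workbooks can also use the 1904 system, which this ignores) and whole-number serials (the fractional part, which holds the time of day, is not handled). The function name `excel_serial_to_date` is hypothetical, not part of any library.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Hypothetical helper: whole-number Excel serial (1900 date system) to a date."""
    if serial < 1:
        raise ValueError("the 1900 date system starts at serial 1 (1 January 1900)")
    if serial == 60:
        # Serial 60 is the phantom 29 February 1900 inherited from Lotus 1-2-3
        raise ValueError("serial 60 is 29 February 1900, a date that never existed")
    if serial < 60:
        # Before the phantom leap day, serial N is N days after 31 December 1899
        return date(1899, 12, 31) + timedelta(days=serial)
    # From 1 March 1900 (serial 61) on, the extra day shifts the epoch to 30 December 1899
    return date(1899, 12, 30) + timedelta(days=serial)


print(excel_serial_to_date(59))     # 1900-02-28
print(excel_serial_to_date(61))     # 1900-03-01
print(excel_serial_to_date(44927))  # 2023-01-01
```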

  • MelodiousFunk@kbin.social
    114 upvotes · 1 downvote · 1 year ago

    Me before reading the article: It’s got to be dates. Excel thinks everything is a date.

    Me after reading the article: Even the workaround is halfhearted. Jeebus.

    • TwinHaelix@reddthat.com
      13 upvotes · 1 year ago

      Microsoft’s blog adds caveats, such as that Excel avoids the conversion by saving the data as text, which means the data may not work for calculations later. There’s also a known issue where you can’t disable the conversions when running macros.

  • Artyom@lemm.ee
    67 upvotes · 4 downvotes · 1 year ago

    The idea that any scientist is doing data analysis in Excel is honestly terrifying on every level.

    • griffinsklow@feddit.de
      18 upvotes · 1 year ago

      I remember when a biologist asked us for help: Excel crashed while processing his 700 MB tables. It took some time, and ChatGPT, to convince him to do the analysis in R. It worked out in the end, and he now recommends this approach to his colleagues, which is nice.

    • Blackmist@feddit.uk
      5 upvotes · 1 year ago

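      Flashback to the time the UK government lost nearly 16,000 positive COVID test results because the data went through the old .xls format, which caps a worksheet at 65,536 rows (the modern .xlsx limit is about a million).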

      If only there were better ways of storing large amounts of records with a fixed structure. Maybe the future will provide such technology…
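
      Whatever writes the file, the dangerous part is that an overflow can go unnoticed, so it is worth checking row counts before exporting. A minimal Python sketch, assuming per-worksheet limits of 65,536 rows for legacy .xls and 1,048,576 rows for .xlsx; the helper `split_for_excel` is hypothetical.

```python
# Per-worksheet row limits: legacy .xls (Excel 97-2003) vs. .xlsx (Excel 2007+)
MAX_ROWS = {"xls": 65_536, "xlsx": 1_048_576}


def split_for_excel(rows: list, fmt: str = "xlsx", header_rows: int = 1) -> list:
    """Hypothetical helper: split rows into chunks that each fit on one worksheet."""
    limit = MAX_ROWS[fmt] - header_rows  # leave room for the header row(s)
    if limit <= 0:
        raise ValueError("header_rows leaves no room for data")
    return [rows[i:i + limit] for i in range(0, len(rows), limit)]


chunks = split_for_excel(list(range(70_000)), fmt="xls")
print(len(chunks), [len(c) for c in chunks])  # 2 [65535, 4465]
```

      Writing each chunk to its own sheet, or refusing to export at all, makes the limit visible instead of silently dropping records.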

    • Evotech@lemmy.world
      11 upvotes · 7 downvotes · 1 year ago

      Excel is excellent at data analysis… Python integrations and everything

        • filcuk@lemmy.zip
          10 upvotes · 1 downvote · 1 year ago

          Because every scientist is also a programmer?
          Especially if they struggle to use Excel properly, no chance.

    • Wooshock@lemmy.world
      2 upvotes · 10 downvotes · 1 year ago

      What the hell else is there? Good luck getting universities using OpenOffice

      • asdfasdfasdf@lemmy.world
        15 upvotes · 4 downvotes · 1 year ago

        Scientists should be using programming languages like R or Python. They are both extremely popular in this field, much more than Excel.

          • Hawk@lemmynsfw.com
            5 upvotes · 3 downvotes · 1 year ago

            Except every scientist and analyst. Stats, data science, and ML are done in R and Python, be it astro, health data, or genomics.

            If someone has been taught stats in spreadsheet software, they have been taught wrong, period.

            Also, programming is a very strong term. We’re talking about stats in a scripting language, not software development in C++.

              • atzanteol@sh.itjust.works
                2 upvotes · 1 downvote · 1 year ago

                Programming in R or Python isn’t a lot harder than learning how to get Excel to do what you want. I’d wager it’s easier since you don’t have to fight your tools.

                Excel has its place for simple quick calculations. But at some point it’s simply the wrong tool.

          • isles@lemmy.world
            3 upvotes · 2 downvotes · 1 year ago

            Research projects almost exclusively have more than one person working on them.

  • neuropean@kbin.social
    49 upvotes · 1 year ago

    Thank god! You have no idea how awful this is for scientists. Need to paste some gene names down? Better hope it’s not MARCHF8 or in the Septin gene family, otherwise you have to convert columns to text then import the data. Seems like a simple fix, but many wet lab biologists are technologically challenged.

  • chepox@sopuli.xyz
    41 upvotes · 1 downvote · 1 year ago

    "Microsoft’s blog adds caveats, such as that Excel avoids the conversion by saving the data as text, which means the data may not work for calculations later. There’s also a known issue where you can’t disable the conversions when running macros. "

    This sounds very half assed…

  • JoBo@feddit.uk
    39 upvotes · 1 year ago

    It’s no good having this as part of the user options. It should be a sheet characteristic and the default should be “keep cells exactly as entered regardless of data type”.

    • kalleboo@lemmy.world
      11 upvotes · 6 downvotes · 1 year ago

      Changing the default would break the workflows of tens of thousands of people in business.

      Scientists should be using something like MATLAB, not Excel.

      • RheingoldRiver@kbin.social
        3 upvotes · 1 year ago

        You could make a new filetype, default new versions to it, & not break compatibility. Wouldn’t do anything for existing workbooks, and keep xlsx an option, but “it would break compatibility” is not a be-all end-all argument against this.

      • JoBo@feddit.uk
        4 upvotes · 1 downvote · 1 year ago

        They’re not doing their analysis in Excel. MATLAB solves no problems here?

  • AutoTL;DR@lemmings.world (bot)
    32 upvotes · 1 year ago

    This is the best summary I could come up with:


    In 2020, scientists decided just to rework the alphanumeric symbols they used to represent genes rather than try to deal with an Excel feature that was interpreting their names as dates and (un)helpfully reformatting them automatically.

    Yesterday, a member of the Excel team posted that the company is rolling out an update on Windows and macOS to fix that.

    Excel’s automatic conversions are intended to make it easier and faster to input certain types of commonly entered data — numbers and dates, for instance.

    But for scientists using quick shorthand to make things legible, it could ruin published, peer-reviewed data, as a 2016 study found.

    Microsoft detailed the update in a blog post this week, adding a checkbox labeled “Convert continuous letters and numbers to a date.” You can probably guess what that toggles.

    The update builds on the Automatic Data Conversions settings the company added last year, which included the option for Excel to warn you when it’s about to get extra helpful and let you load your file without automatic conversion so you can ensure nothing will be screwed up by it.


    The original article contains 225 words, the summary contains 184 words. Saved 18%. I’m a bot and I’m open source!

    • JackGreenEarth@lemm.ee
      19 upvotes · 60 downvotes · 1 year ago

      Why are scientists using a paid service such as Excel anyway? Shouldn’t they be using something like LibreOffice?

      • kaitco@lemmy.world
        50 upvotes · 8 downvotes · 1 year ago

        Many scientists are based out of corporations or universities who contract with Microsoft, so Excel would be the default solution for working with spreadsheets.

        Also, when it comes to “office” applications, there is no real substitute for Excel. Word processing, presentations, email, notes: there are many open- and closed-source alternatives that do as well as, if not better than, the MS Office applications. Excel, however, is the exception.

        LibreOffice Calc, G-Sheets, Apple’s Numbers, or the myriad of competitor office solutions have never matched Excel for in-depth analyses or overall function. For just basic features, one could limp by with most alternatives, but doing real analytical work within spreadsheets requires Excel.

        • themeatbridge@lemmy.world
          31 upvotes · 18 downvotes · 1 year ago

          “Real analytical work” shouldn’t be done in spreadsheets at all. You should use a database. Basic spreadsheet features are all you should ever use spreadsheet software to do anyway.

          • kaitco@lemmy.world
            38 upvotes · 1 year ago

            While you will commonly hear that you shouldn’t use Excel as a database, it happens all the time.

            Excel is generally more accessible than something like Access or other proprietary database applications, and given that a lot of initial data originally lives in a spreadsheet, it’s the simplest solution that doesn’t require something like SQL coding knowledge to access.

            Basic spreadsheet features are all you should ever use spreadsheet software to do anyway.

            It depends on what you mean when you say “basic”. A spreadsheet with filters or maybe some pivot tables? A spreadsheet connecting to 12 others with refreshes created using VBA code so that end users just need to click a button and see their data? A spreadsheet that connects to a database, runs several queries, and spits out data in an easy to read form? There are folks who consider pivot tables and the use of any code to be “advanced” use of Excel. There are also folks who have made full-on applications with Excel and consider those to be made with only “intermediate” grade knowledge.

            I’ve found that the more you know about an application like Excel, the more you realize what you don’t know.

            • radix@lemmy.world
              18 upvotes · 1 downvote · 1 year ago

              Excel does 1000 different things, and for 998 of them, there’s at least one better option.

              The two things Excel does best: 1) be accessible to everyone from the greenest high schooler to the most senior IT admin. 2) do those 1000 different things at least somewhat competently.

              • captainlezbian@lemmy.world
                8 upvotes · 1 year ago

                Exactly. Personally, I’d rather use LibreOffice for data entry, spit out a CSV, and slap that into an R-based analyzer, but that’s because I have an irrational hate for Excel’s graphs compared to ggplot2. I do use Excel a lot in my job, though, because fuck it, it just works for basically everything.

          • gcheliotis@lemmy.world
            15 upvotes · 1 year ago

            “Real analytical work” (I will take that to mean work people actually care about and may even pay good money for) is done with whatever does the job within the given timeframe and whatever the analyst, researcher, or team is comfortable with. That may well be Excel. Or not, depending on the task and the people. But your audience will always care more about whether your analytical approach suits the audience and task, and of course about your results, than about the tools you used to get there. Of course, spreadsheets have limitations, and one would do well to know them.

            • LogarithmicCamel@feddit.uk
              2 upvotes · 1 downvote · 1 year ago

              I have already seen data having to be thrown away because the researcher copied and pasted it incorrectly from multiple spreadsheets and no one could tell what the correct data was anymore. No one should be doing this if they are responsibly doing “real analytical work”.

          • stifle867@programming.dev
            6 upvotes · 1 year ago

            As a user you don’t always have access to the database. It’s much easier to work out of Excel than to find the right person to ask in the corporate hierarchy just for them to say no.

        • jasory@programming.dev
          3 upvotes · 1 downvote · 1 year ago

          Gnumeric is superior for numerical evaluation.

          Also, any analysis at scale will use a proper programming language, often C or Fortran, since Excel is simply far too slow.

        • Zeth0s@lemmy.world
          5 upvotes · 13 downvotes · 1 year ago

          No one does real analytical work with Excel… If one is using Excel, they are doing basic analytical work that pretty much any spreadsheet software can do.

          It is just habit. People are used to Excel and are not competent enough to use more advanced tools for real analytical work. And that’s fine. Being good in a lab doesn’t mean one needs to be good at data science.

        • dream_weasel@sh.itjust.works
          5 upvotes · 14 downvotes · 1 year ago

          “Real analytical work” is the ultimate power-tools-injure scenario with Excel, and that’s why this article exists.

          Programmers using actual databases and crafting custom analysis do not have this problem. There is a time and a place for Excel, and this ain’t it; leave it to secretaries and people trying to copy data into Word documents. I like a pivot table as much as the next guy, but JFC, learn to program, learn Git, write in LaTeX, publish science.

          • captainlezbian@lemmy.world
            7 upvotes · 1 year ago

            I’m as much of an R fangirl as the next lady, but scientists still come from all sorts of technical skill sets. Hardcore analytics is probably gonna flounder in Excel, but if you can’t convince IT to let you have something better, you can throw together a chi-square test or an ANOVA to get an analysis of your data. And often that will be enough.

            • dream_weasel@sh.itjust.works
              2 upvotes · 1 year ago

              Excel just isn’t a database (evidenced by the fact that Microsoft also has Access), and it also just isn’t a one-stop shop for analytics. Having spent 17 years in academia, I’m well aware that people are resistant to learning new things, and also aware that sometimes you NEED to.

              Sure, you can do some line fitting and sensitivity analysis stuff, and that is great for preliminary work, but Excel is just not the one-stop shop people want it to be. PowerPoint is also Turing-complete, but just because you can doesn’t mean you should program with it.

              The fact that the month rename problem is killing scientific data is just a smell: sometimes you’ve got to stop and ask yourself “what am I trying to do?” and “what tool should I be using to do it?”

              IMO Excel should be left to the MBAs and management: if you are smart enough to set up an analysis of variance, run a t-test, or have an intelligent discussion about p-values, you SHOULD NOT be dependent on Excel.

      • driving_crooner@lemmy.eco.br
        16 upvotes · 1 year ago

        In college a professor gave us some homework to be done in Excel, and, nerd that I am, I asked if LibreOffice was OK because I use Linux and have no access to Excel. The professor was like, well, in that case everyone do the homework in R or Python. My classmates were really mad at me for that.

      • Zeth0s@lemmy.world
        14 upvotes · 1 downvote · 1 year ago

        In my experience, being a scientist doesn’t mean one is the smartest guy in the room, just that one has the passion, luck, and luxury to pursue that passion.

        Many use alternatives to Excel (R, Python, MATLAB, LibreOffice).

        For others, installing software is challenging enough that they use whatever IT provides.

        The rest don’t give a sh*t; they are too busy exploiting or being exploited. No time to think about what is better.

      • Tavarin@lemmy.ca
        6 upvotes · 1 downvote · 1 year ago

        I’ve had the same copy of Excel since high school, and it’s done a damn fine job processing experimental data through undergrad, my PhD, and 6 years as a working researcher.

        It’s also the software pretty much everyone has, so you can easily share data with collaborators and other researchers. And it has a ton of functionality so you can process and analyze data easily, and create the visuals for papers very easily.

      • LogarithmicCamel@feddit.uk
        4 upvotes · 1 year ago

        You are completely right, and the Open Science movement is catching on. The idea is to give everyone access to the (anonymised) data and use only tools that are freely accessible, even to scientists from developing countries without Microsoft licenses, so that they too can rerun your analyses and verify your results. You shouldn’t be getting downvoted.

        • emergencyfood@sh.itjust.works
          2 upvotes · 1 downvote · 1 year ago

          In science, it is important to have verifiable and replicable results. This means everything you use - from ingredients to software - should be transparent. We can’t examine Excel’s source code, so we don’t know if it is working as it claims to be. Most scientific disciplines are moving towards open source, open access etc., and you can’t use Excel in fields like physics or mathematical biology. But molecular biology is a bit of a holdout.

  • macrocephalic@lemmy.world
    33 upvotes · 2 downvotes · 1 year ago

    Now if only it would stop dropping leading zeros unless you ask it, and we got rid of the MM/DD/yyyy date format entirely.
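
    Both complaints come down to a value's type being guessed. A short Python illustration with made-up values: once a ZIP-code-like identifier is parsed as a number, its leading zero is gone, and an MM/DD/YYYY string can't be told apart from DD/MM/YYYY, whereas ISO 8601 (YYYY-MM-DD) is unambiguous.

```python
from datetime import date

zip_code = "02134"         # illustrative identifier that merely looks numeric
as_number = int(zip_code)  # what a spreadsheet does when it guesses "number"
print(as_number)           # 2134: the leading zero is gone
print(f"{as_number:05d}")  # 02134: restoring it requires knowing the width was 5

d = date(2024, 3, 5)           # illustrative date: 5 March 2024
print(d.strftime("%m/%d/%Y"))  # 03/05/2024: 5 March or 3 May?
print(d.isoformat())           # 2024-03-05: unambiguous, and sorts correctly as text
```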

    • theparadox@lemmy.world
      4 upvotes · 1 year ago

      Now if only it would stop dropping leading zeros unless you ask it

      That appears to actually be a feature.

        • theparadox@lemmy.world
          1 upvote · 1 year ago

          Apparently our typical installer for Visio 2016 and our 365 license use “incompatible installers” so it is going to be a pain in the ass for me to have both installed at the same time. Thankfully I’m trusted by IT so I might be able to just do it myself.

  • MonkderZweite@feddit.ch
    27 upvotes · 1 downvote · 1 year ago

    20 years after the problem was first reported.

    Meaning there’s still hope for XDG support in Firefox?

  • Kethal@lemmy.world
    25 upvotes · 1 downvote · 1 year ago

    Microsoft fixes one of the Excel features that wreck scientific data.

  • CatLikeLemming@lemmy.blahaj.zone
    23 upvotes · 4 downvotes · 1 year ago

    This isn’t a fix. Excel wasn’t meant for this. While I do understand it’s convenient as a database, unless you’re doing something small and unimportant, you really should use something proper. And even now that this “problem” is gone, I am certain there are still more things that cause trouble. You cannot satisfy everyone, and Excel was just… not made for storing gene info.

    Even if you don’t want to use anything outside Microsoft Office, Office comes with Microsoft Access, which is a proper database management system. It’s literally in the same software package, so why do people refuse to use it?

    • zalgotext@sh.itjust.works
      36 upvotes · 1 downvote · 1 year ago

      Why would you need a full blown (shitty) relational database management system to store gene info? Excel should be just fine for storing data in arbitrary tables. It shouldn’t make assumptions about your data by default, and changing values that look like they’re in a specific format should be opt-in, not default behavior.

      • CatLikeLemming@lemmy.blahaj.zone
        5 upvotes · 2 downvotes · 1 year ago

        That is not what it was made for. It was made to do shenanigans with values like doing math on them and plotting graphs. If you merely want data storage, use a table. I agree, a database is overkill for most things, but that doesn’t change the fact that Excel is the wrong tool for the job. Maybe if they added a table mode where it’s basically just a frontend for a csv it’d work, but right now I’d still say it’s better to use a scalpel than a hammer, even if scissors do the trick just fine.

      • Hawk@lemmynsfw.com
        2 upvotes · 1 year ago

        SQLite and DuckDB are great; I don’t know about shitty.

        You don’t get the visual feedback, but the query language, reliability, and Python interface are all top-notch.
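
        As a rough illustration of the SQLite route, here is a Python sketch using the standard-library `sqlite3` module with an in-memory database; the table layout and descriptions are made up for the example. Declaring the column as `TEXT` matters: under SQLite's type-affinity rules, a column with `NUMERIC` affinity would convert a well-formed numeric string such as `2310009E13` into a number, the same class of problem as in Excel.

```python
import sqlite3

# Illustrative rows: gene symbols of the kind Excel has historically mangled
ROWS = [
    ("SEPT2", "Septin 2"),
    ("MARCH1", "Membrane-associated ring finger 1"),
    ("2310009E13", "RIKEN identifier"),
]

conn = sqlite3.connect(":memory:")
# TEXT affinity: symbols are stored exactly as inserted, never reinterpreted
conn.execute("CREATE TABLE genes (symbol TEXT PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO genes (symbol, description) VALUES (?, ?)", ROWS)
conn.commit()

for symbol, storage in conn.execute("SELECT symbol, typeof(symbol) FROM genes ORDER BY symbol"):
    print(symbol, storage)  # each symbol comes back unchanged, stored as "text"
```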

      • CatLikeLemming@lemmy.blahaj.zone
        3 upvotes · 1 downvote · 1 year ago

        I’ve never used Access personally, so I don’t know if it’s any good or not, I’m just frustrated by people using spreadsheets for data storage.

        • Evotech@lemmy.world
          2 upvotes · 1 downvote · 1 year ago

          It’s been years since I used it tbh. But “access bad” is a meme for a reason

    • Echo Dot@feddit.uk
      5 upvotes · 3 downvotes · 1 year ago

      I’m so sick of people using Excel for things it’s not supposed to be used for.

      As a general rule if you’re not actually making use of the formula tool, you probably don’t need to be using Excel.

  • Deebster@programming.dev
    16 upvotes · 1 year ago

    It’s too late though, scientists already had to rename the genes. Although of course there are other things that can trigger it, not just in science.

  • detalferous@lemm.ee
    14 upvotes · 1 year ago

    From the article:

    The problem of Excel software (Microsoft Corp., Redmond, WA, USA) inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004 [1]. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. ‘SEPT2’ converted to ‘2006/09/02’). This suggests that gene name errors continue to be a problem in supplementary files accompanying articles.
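
    For auditing existing supplementary files, the corrupted forms quoted above can be flagged with a few regular expressions. A minimal Python sketch; the patterns cover only the examples in the quote ('2-Sep'/'1-Mar', '2006/09/02', '2.31E+13') and are an assumption, not an exhaustive list of what Excel produces. The helper `flag_mangled` is hypothetical.

```python
import re

# Patterns for the corrupted forms quoted above (illustrative, not exhaustive)
MANGLED = [
    re.compile(r"\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)", re.IGNORECASE),  # '2-Sep'
    re.compile(r"\d{4}/\d{2}/\d{2}"),                   # '2006/09/02'
    re.compile(r"\d(\.\d+)?E[+-]\d+", re.IGNORECASE),   # '2.31E+13'
]


def flag_mangled(symbols):
    """Hypothetical helper: return entries that look like Excel-converted gene symbols."""
    return [s for s in symbols if any(p.fullmatch(s.strip()) for p in MANGLED)]


print(flag_mangled(["SEPT2", "2-Sep", "MARCH1", "1-Mar", "2006/09/02", "2.31E+13", "TP53"]))
# ['2-Sep', '1-Mar', '2006/09/02', '2.31E+13']
```

    Intact symbols such as SEPT2, MARCH1, or the raw accession 2310009E13 are not flagged, since none of them fully matches a pattern.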

  • Etterra@lemmy.world
    5 upvotes · 2 downvotes · 1 year ago

    LibreOffice is free, and modern MS Office UIs look like dog dookie. LibreOffice can also save in Excel format if you want.

    Hey look at that, I found a solution that didn’t require they change their entire process or have to wait for Microsloughed to get their act together.

    • Moneo@lemmy.world
      1 upvote · 1 year ago

      Libre calc is one of the worst UXs I have ever had the displeasure of using. I can’t imagine anyone recommending it is using it as their main work application.