Many software organizations have a deployment process that takes several hours, and they deploy a new release once a week at most. Many orgs also avoid deploying on Fridays because they need the next day to fix whatever issues come up after the release.
I used to believe this was just how things needed to be done. That changed when I was at a company that started off in this state and ended up deploying 6+ times a day. There were a number of initiatives that got us there, but the biggest one was automated testing.

When most people think about automated testing, they tend to think of unit testing and code coverage. I can promise you that 99% unit test coverage provides nearly zero confidence that a release is safe to deploy. High code coverage feels good and gives someone in management a number to brag about, but it is a vanity metric. The reasoning is simple. Unit testing is effective when you have a fairly complex algorithm: parsing text out of an email, say, or running complex mathematical formulas like Black-Scholes. Odds are that you are working on a web or mobile application where the vast majority of the product relies heavily on writing data to and reading data from a database. Those database interactions are "mocked" in unit tests, leaving them untested.
There is a valid reason to mock out database interactions in unit tests. The idea of a unit test is that you can run it in isolation as many times as you'd like and get a consistent result. If a real database were involved, every test run would add new data, and that alone destroys the conditions for a consistent result.
The problem is that for a production system at scale, the likely errors are issues with database interactions, such as problematic queries on big tables, or issues with the interactions between subsystems. These are often hard to debug, hard to fix, and deliver a terrible experience to the user.
Example 1: Your code writes to the database. You accidentally reverse two fields in the query. Your unit tests will pass because they mocked out the database. Your manual testing may still pass if there's a heavy overlap of valid values for those two columns (e.g. primary keys of 1, 2, and 3). Your users will eventually encounter unfortunate errors.
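To make this concrete, here is a minimal sketch of how a unit test sails right past that bug. The `saveUser` function, its query, and the mocked database client are all hypothetical:

```typescript
// Hypothetical data-access function with the bug from Example 1: the query
// expects (name, email) but the parameters are passed as (email, name).
async function saveUser(
  db: { query: (sql: string, params: unknown[]) => Promise<void> },
  name: string,
  email: string
): Promise<void> {
  await db.query(
    'INSERT INTO users (name, email) VALUES ($1, $2)',
    [email, name] // <-- bug: arguments swapped
  );
}

test('saveUser inserts a user', async () => {
  // The database is mocked, so the swapped parameters are never validated.
  const db = { query: jest.fn().mockResolvedValue(undefined) };

  await saveUser(db, 'Ada Lovelace', 'ada@example.com');

  // This only checks that *a* query was issued, not that the data lands in the
  // right columns. The test passes; production data ends up wrong.
  expect(db.query).toHaveBeenCalledTimes(1);
});
```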
Example 2: A common tactic to get better scalability is to send only write queries to a primary database. All read queries then get sent to a read replica. In a dev, QA, or staging environment the replication from primary to read replica takes ~2ms or less. This may even be the case for the production system at first. Then your marketing team runs a big promotion and traffic spikes. The replication lag from primary to read replica can jump to several seconds, maybe even a minute. That means if you create a new item in the database and then redirect the user to a web page that includes the new item's primary key in the URL, they're going to see a 404 because the item hasn't been replicated to the read replica yet.
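The pattern that breaks looks roughly like the sketch below. The Express handlers and the `primaryDb`/`replicaDb` connections are hypothetical; the point is only where the write goes versus where the read goes:

```typescript
import express from 'express';
// Hypothetical connections: writes go to the primary, reads go to the replica.
import { primaryDb, replicaDb } from './db';

const app = express();
app.use(express.json());

app.post('/items', async (req, res) => {
  // The write goes to the primary...
  const { rows } = await primaryDb.query(
    'INSERT INTO items (name) VALUES ($1) RETURNING id',
    [req.body.name]
  );
  // ...and the user is immediately redirected to a page keyed by the new id.
  res.redirect(`/items/${rows[0].id}`);
});

app.get('/items/:id', async (req, res) => {
  // The read hits the replica. Under load, replication can lag by seconds, so
  // the row may not exist here yet and the user sees a 404.
  const { rows } = await replicaDb.query(
    'SELECT * FROM items WHERE id = $1',
    [req.params.id]
  );
  if (rows.length === 0) {
    return res.status(404).send('Not found');
  }
  res.json(rows[0]);
});
```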
Example 3: I was on a team that built a feature that required adding a new column to a database table. Only recent data mattered, so we made it a nullable column and defaulted all the old rows to null.

The problem was that the table had 70 million rows and we created an index on this new column. At 70 million rows, something came into play that we never had to think about during development and that our initial tests never exercised: cardinality. When a lot of rows have the same value for an index, the cardinality is low, and MySQL becomes unpredictable at low cardinality. A search for an exact value on that table worked. A search with a "WHERE IN ()" with 5 values worked. A search with a "WHERE IN ()" with 6+ values gave us seemingly random rows that did not match the query. The query also took several seconds, likely because MySQL gave up on using the index and did a full table scan.
Doing more than unit tests gets really complicated really fast, though. No one has "the answer" to doing it well because the approach will be bespoke to the type of system being built. I will go over what has worked for me and the situations those testing patterns were used in.
A common way of getting better tests is to use Selenium or Playwright and test the system as a whole. That's what we did at the company I mentioned earlier. The logic behind this type of testing is simple and easy to understand. You set up an actual environment with everything you need to run the system: database, web servers, message queues, etc. You then have automation that loads your application in a real browser and interacts with it the way a real user would. To get consistency in test results, any datastores, such as a Postgres database or an Elasticsearch cluster, have all their data deleted to give the next test run a blank slate. This is as realistic as an automated test can get.
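As a rough sketch, such a test might look like the following. This uses Playwright; the URL, the selectors, and the `resetDatastores` helper (which would truncate the database, clear the search index, and so on) are all hypothetical:

```typescript
import { test, expect } from '@playwright/test';
// Hypothetical helper that wipes every datastore so each test starts from a blank slate.
import { resetDatastores } from './helpers';

test.beforeEach(async () => {
  await resetDatastores();
});

test('user can sign up and create an item', async ({ page }) => {
  // A real browser drives the real frontend, web server, database, and queues.
  await page.goto('https://staging.example.com/signup');
  await page.fill('input[name="email"]', 'e2e@example.com');
  await page.fill('input[name="password"]', 'correct horse battery staple');
  await page.click('button[type="submit"]');

  await page.click('text=New item');
  await page.fill('input[name="name"]', 'First item');
  await page.click('button:has-text("Save")');

  await expect(page.locator('.item-list')).toContainText('First item');
});
```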
You may already have thought of the issues with this kind of test, though, as there are many. The first is speed. Loading a browser and clicking through your app is really slow. If you want a comprehensive set of tests, they could take hours to run. To make the suite faster, the tests must be broken up into groups and run concurrently. That leads to problem two. Unit tests may not be enough, but the logic behind them is still valid: some of your tests may affect the data used by other tests, giving you a different result with every test run. You could try to coordinate the test runs, but then you'll likely find yourself spending more time fixing tests than doing anything else.
The solution here is to create an entirely separate environment for each test group. It works, but now you need a lot more infrastructure for your tests. That could raise your costs beyond a reasonable level. It's also not trivial to maintain that infrastructure, so those costs include human labor. You could use one of the many SaaS services that provide this, but they also cost quite a bit.
These problems lead a lot of teams to give up on this type of testing. In theory it gives us a high degree of confidence, but the theory doesn't hold up if people decide not to write tests. Fortunately, we have other options.
Let's start with a simple application: a single page application frontend, a web server, and a database. For most systems, the bottleneck in that setup should be the overhead of making actual web requests. Unless you redesign how the internet works, that overhead isn't going away. There's only one way to remove it from our tests: mock out all calls to the web server. I know I talked about how mocks reduce confidence, but our choice now isn't more confidence vs. less confidence. A lot of people seem to be giving up on end to end tests entirely these days. That makes our choice less confidence vs. not having the tests at all.
We can keep everything else about our frontend testing the same. With Jest and Enzyme, you can effectively render the entire frontend application. You can still simulate clicking through various buttons or links. Some things are even easier to test this way than with a true end to end test: you can manually control the order in which "server" responses return, which lets you test potential concurrency issues. The important thing is that you can still test realistic user interactions and ensure that the frontend logic works as intended.
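Here is a minimal sketch of what that looks like, assuming a React app tested with Jest and Enzyme. The `ItemList` component and its `/api/items` endpoint are hypothetical, and the global `fetch` is replaced with a mock so no real web server is involved:

```typescript
import React from 'react';
import { mount } from 'enzyme';
// Hypothetical component under test; it fetches /api/items and renders a list.
import { ItemList } from './ItemList';

test('renders items returned by the mocked server', async () => {
  // Replace fetch with a mock. Because we control when (and in what order)
  // these promises resolve, we can also simulate out-of-order responses to
  // probe concurrency bugs.
  global.fetch = jest.fn().mockResolvedValue({
    ok: true,
    json: async () => [{ id: 1, name: 'First item' }],
  }) as unknown as typeof fetch;

  const wrapper = mount(React.createElement(ItemList));

  // Let the mocked response resolve, then re-render.
  await new Promise(resolve => setTimeout(resolve, 0));
  wrapper.update();

  // The same user-level assertion an end to end test would make, minus the
  // real browser and the real HTTP round trip.
  expect(wrapper.find('li').first().text()).toContain('First item');
});
```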
The backend tests are in most cases perfectly realistic. Start a server and make calls to API endpoints. A relational database is easy to deal with because you can use a real one: every test run just clears out all the tables and recreates the schema. In cases where your users' data is isolated, e.g. an accounting app where every client's data is isolated from the others, you can simply create a new user named after your test. That will prevent any data from one test affecting another test.
For example, if I have a test called "test_loan_amortization_calculation", I would create a user called "test_loan_amortization_calculation user" and a client called "test_loan_amortization_calculation client". So long as all my tests have unique names, I should never see a collision in test data.
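Put together, a backend test in this style might look like the sketch below. It assumes Jest, the supertest library, and a hypothetical Express `app` export; the endpoints, fields, and status codes are made up for illustration:

```typescript
import request from 'supertest';
// Hypothetical Express application that talks to a real database.
import { app } from './app';

test('test_loan_amortization_calculation', async () => {
  // Name the test data after the test so it can never collide with other tests.
  const user = await request(app)
    .post('/api/users')
    .send({ name: 'test_loan_amortization_calculation user' })
    .expect(201);

  const client = await request(app)
    .post('/api/clients')
    .send({ name: 'test_loan_amortization_calculation client', userId: user.body.id })
    .expect(201);

  // Exercise the real endpoint, which runs real queries against the real database.
  const res = await request(app)
    .post(`/api/clients/${client.body.id}/loans`)
    .send({ principal: 100000, annualRate: 0.05, termMonths: 360 })
    .expect(200);

  // $536.82/month is the standard amortization payment for a $100,000 loan
  // at 5% over 30 years.
  expect(res.body.monthlyPayment).toBeCloseTo(536.82, 2);
});
```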
Global data, such as a top 10 list on an e-commerce site, is a bit more complex. For these you may need to create several databases and use a separate database for each test to prevent data collisions. You should not need more hardware, as most relational databases will let you create multiple databases on the same machine. In this situation you can even put the database on the same machine as the web server, which removes some more overhead and allows faster test execution.
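A sketch of that approach, assuming Jest and the `pg` Postgres client; the connection details and the `createTestDatabase` helper are hypothetical:

```typescript
import { Client } from 'pg';

// Create (or recreate) a database dedicated to a single test.
async function createTestDatabase(testName: string): Promise<Client> {
  const admin = new Client({ host: 'localhost', user: 'postgres', database: 'postgres' });
  await admin.connect();

  // Database names must be simple identifiers, so derive one from the test name.
  const dbName = testName.toLowerCase().replace(/[^a-z0-9_]/g, '_');
  await admin.query(`DROP DATABASE IF EXISTS ${dbName}`);
  await admin.query(`CREATE DATABASE ${dbName}`);
  await admin.end();

  const db = new Client({ host: 'localhost', user: 'postgres', database: dbName });
  await db.connect();
  // Run migrations / load the schema here before returning.
  return db;
}

test('top_10_products_list', async () => {
  const db = await createTestDatabase('top_10_products_list');
  // ...insert a known set of products, then assert on the top 10 query...
  await db.end();
});
```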
A similar strategy can be used for other datastores. With message queues, you can create a new queue unique to a given test. With file storage, you can give each test its own unique "folder" to act as the base directory (folder is in quotes because there are no actual folders if you're using something like S3). Most distributed caches also have some mechanism that lets you isolate data to a given test.
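For example, here is a sketch using the AWS SDK, where the bucket name, the key prefix convention, and the queue naming scheme are all assumptions:

```typescript
import { S3Client, PutObjectCommand, ListObjectsV2Command } from '@aws-sdk/client-s3';
import { SQSClient, CreateQueueCommand, DeleteQueueCommand } from '@aws-sdk/client-sqs';

const s3 = new S3Client({});
const sqs = new SQSClient({});

test('test_invoice_upload', async () => {
  const testName = 'test_invoice_upload';

  // File storage: every key this test writes lives under its own prefix ("folder").
  await s3.send(new PutObjectCommand({
    Bucket: 'my-test-bucket',
    Key: `${testName}/invoices/2024-01.pdf`,
    Body: 'fake pdf bytes',
  }));
  const listed = await s3.send(new ListObjectsV2Command({
    Bucket: 'my-test-bucket',
    Prefix: `${testName}/`,
  }));
  expect(listed.KeyCount).toBe(1);

  // Message queue: create a queue unique to this test, then tear it down.
  const queue = await sqs.send(new CreateQueueCommand({ QueueName: `${testName}-queue` }));
  // ...point the code under test at queue.QueueUrl and assert on its behavior...
  await sqs.send(new DeleteQueueCommand({ QueueUrl: queue.QueueUrl! }));
});
```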
Things get a bit more gnarly when attempting to test issues that usually only occur in production. You can configure a MySQL or Postgres server to replicate with an artificial delay. You're not necessarily going to want every test to use that setup, because it forces the test to wait, which increases execution time. Yet the issues caused by replication delay always happen when you least expect them. There's no easy answer here on where the line should be. This is where experience, institutional knowledge, and intuition come into play. You have to think critically about your situation and make a judgment call.
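When you do decide a scenario is worth covering, such a test might look like the sketch below, again using supertest against a hypothetical app. It assumes the environment's replica has been configured with an artificial delay, and the behavior being asserted (reading freshly written rows from the primary instead of returning a 404) is a hypothetical design choice, not a given:

```typescript
import request from 'supertest';
// Hypothetical app deployed against a primary plus a deliberately delayed replica.
import { app } from './app';

test('newly created item is readable despite replica lag', async () => {
  const created = await request(app)
    .post('/items')
    .send({ name: 'Flash sale item' })
    .expect(302);

  // Immediately follow the redirect. With a delayed replica, a naive
  // implementation returns 404 here; the behavior under test is that fresh
  // rows are served from the primary until the replica catches up.
  await request(app).get(created.headers.location).expect(200);
});
```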
Things get worse if you have a lot of third party API integrations. How hard that ends up being depends on whether they have a testing sandbox, how strict their API rate limits are, the performance of their API, etc. The nice thing about most third party APIs is that they rarely change their return values in a way that is not backwards compatible; that would force every developer using the API to update their code, and the provider likely doesn't want that. This makes mocking a call to a third party a bit more trustworthy than a mock of your own database.
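Here is a sketch of mocking at that boundary with Jest, where the `payment-provider-sdk` module, its `chargeCard` function, and the `checkout` function under test are all hypothetical stand-ins for whatever vendor SDK and business logic you actually have:

```typescript
// Hypothetical vendor SDK and hypothetical code under test.
import { chargeCard } from 'payment-provider-sdk';
import { checkout } from './checkout';

// Mock the third party at the module boundary; its documented response shape
// rarely changes in a backwards-incompatible way, so this mock stays trustworthy.
jest.mock('payment-provider-sdk', () => ({
  chargeCard: jest.fn(),
}));

test('checkout charges the card and records the order', async () => {
  (chargeCard as jest.Mock).mockResolvedValue({ id: 'ch_123', status: 'succeeded' });

  const order = await checkout({ amountCents: 4999, cardToken: 'tok_visa' });

  expect(chargeCard).toHaveBeenCalledWith(
    expect.objectContaining({ amountCents: 4999 })
  );
  expect(order.status).toBe('paid');
});
```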
There are many more situations that make testing difficult, but it doesn't change the fact that you should be able to get >95% of the confidence full end to end tests give you by testing your frontend and backend separately. Doing so will increase the speed of those tests significantly. I have test suites that would likely take hours if they were full end to end tests, but run in less than a minute using this method. That's a pretty good tradeoff in my opinion.