Oracle GoldenGate 23ai: The Replicat Process
In this episode, Lois Houston and Nikita Abraham, along with Nick Wagner, Senior Director of Product Management, dive into the Replicat process in Oracle GoldenGate 23ai. They discuss how Replicat applies changes to the target database, highlighting the different types: Classic, Coordinated, and Parallel Replicat. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ---------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Lois: Hello and welcome to another episode of the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! If you’ve been listening to us these last few weeks, you’ll know we’ve been discussing the fundamentals of GoldenGate 23ai. Today is going to be all about the Replicat process. Again, this is something we’ve discussed briefly in earlier episodes, but just to recap, the Replicat process applies changes from the source database to the target database. It's responsible for reading trail files and applying the changes to the target system. 01:04 Lois: That’s right, Niki. And we’ll be chatting with Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. Hi Nick! Thanks for joining us again today. Let’s get straight into it. Can you give us an overview of the Replicat process? Nick: One thing that's very important is the Replicat is extremely chatty with that target database. So it's going to be going in and trying to make lots of little transactions on that system. The Replicat process only issues single row DML. So if you can imagine a source database that's generating hundreds of thousands of changes per second, we're going to have to have a Replicat process that can do 100,000 changes per second on that target site. That means that it's going to have to send a lot of little one record commands. And so we've got a lot of ways to optimize that. But in all situations you're really going to want very, very low ping time between that Replicat process and that target database. This often means that if you're going to be running GoldenGate in a cloud, you're going to want the Cloud GoldenGate environment to be running in that target data center, wherever that target database is. 02:06 Lois: What are the key characteristics of the process, Nick? Nick: Replicat process is going to read the changes from the trail file and then apply them to the target system, just like any database user would. It's not doing anything special where it's going under the covers and trying to apply directly to the database blocks. It's just applying regular standard insert, update, delete, and DDL statements to that target database. A single trail file does support high volume of data replication activity depending on the type of Replicat. Replicats do preserve the boundary of their transactions. So in the situations, by default, a transaction that's on the source, let's say five inserts followed by a commit will remain five inserts followed by a commit on the target site. There are some operations and changes that do affect this, but they're not turned on by default. There are things like group transactions that allows you to group multiple transactions into a single commit. This one could actually improve performance in some cases. We also have batch SQL that can change the boundaries of a transaction as well. And then in a Parallel Replicat, you actually have the ability to split a large transaction into multiple chunks and apply those chunks in Parallel. So again, by default, it's going to preserve the boundaries, but there are ways to change that. And then the Replicats use a checkpoint table to help with recovery and to know where they're applying data and what they've done. The other thing in here is, like an Extract process can write to multiple trails and write subsets of data to each one, a Replicat can only process a single set of trail files at once. So it's going to be attached to a specific trail file like trail file AB, and will only be able to read changes from trail file AB. If I have multiple trails that need to be applied into a target system, then I have to set up multiple Replicats to handle that. 03:54 Nikita: So, what are the different Replicat types, Nick? Nick: We have three types in the product today. We have Classic Replicat, which should really only be used for testing purposes or in environments that don't support any of the other specialized Replicats. We have Coordinated Replicat, which is a high speed apply mechanism to apply data into a target system. It does have some parallelism in it, but it's user defined parallelism. And then we have our flagship and that's Parallel Replicat. And this is the most performant lowest latency Replicat that we have. 04:25 Lois: Ok. Let’s dive a little deeper into each of them, starting with the Classic Replicat. How does it work? Nick: It's pretty straightforward. You're going to have a process that reads the trail files, and then in a single threaded fashion it's going to take the trail file logical change record, convert it to an insert, update, or delete, and then apply it into that target database. Each transaction that it does is preceded by a change to the checkpoint table. So when the transaction that the Replicat is currently doing is committed, that checkpoint table update also gets committed. That way when the Replicat restarts, it knows exactly what transaction it left off and how it last applied the record. And all the Replicats work the same way with regards to checkpoint tables. They each have their own little method of ensuring that the transaction they're applying is also reflected within the checkpoint table so that when it restarts, it knows exactly where it happened. That way, if a Replicat dies in the middle of a transaction, it can be restarted without any duplicate data or without missing data. 05:29 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from multicloud, database, networking, and security to artificial intelligence and machine learning, all free for our subscribers. So, what are you waiting for? Pick a topic, head over to mylearn.oracle.com, and get started. 05:53 Nikita: Welcome back! Moving on, what about Coordinated Replicat? Nick: The Coordinated Replicat is going to read from a set of trail files. It's going to have multiple threads that do this. So you have your base thread, your coordinated thread that's going to be thread 1. It's going to process the data and apply it into that target database. You then have thread 2, 4, 5, 6, and so on. When you set up your Replicat parameter file for a Coordinated Replicat, the map commands that maps from one table on the source to a table on the target has an additional option. So you'll have an option called a range or thread range. With the range and thread range option, you can actually tell which table to go into which thread. 06:38 Lois: Can you give us an example of this? Nick: So I could say map Scott.M into thread 1 and I want Scott.Dept into thread 2. Well, this is fantastic until you realize that Scott.M and Scott.Dept have a foreign key between them or a child dependencies, parent-child relationships. What that means is that now I'm going to have to disable that foreign key on the target site, because there's no way for GoldenGate to coordinate the changes in one thread to another thread. And so you really have to be careful on how you pair your tables together. If you don't have any referential integrity on that target database, then you can use parallel coordinated Replicat to really high degrees of parallelism, and you get some very good performance out of it. Let's say that you have a table that's really got too much data for even a single thread to process, that's where the thread range comes in. And thread range command will use something like the table's primary key to split transactions on that table across multiple threads. So I can say, hey, take my table Scott.M and I want to spread transactions across threads 10, 11, 12, 13, and 14 and then spread them evenly based on the primary key. And Coordinated Replicat will do that. So you can get some very high performance numbers out of it and you can really fine tune the tables, especially if you know the amount of data coming into each one. While this does work great, we observed that a lot of customers really don't know their applications to that level of detail, and so we needed a different method to push data into that target database, where we could define the parallelism based on the database expectations. So instead of the customer having to try and figure out what are the parent-child relationships, why can't GoldenGate do it for me? And that led to Parallel Replicat. 08:26 Nikita: And what are the benefits and features of the Parallel Replicat process? Nick: So Parallel Replicat has been around for quite a few years now. It supports most targets, it was Oracle initially, but now it's been expanded out to a lot of the non-Oracle targets and even some of the nonrelational database targets. It has absolutely the best performance of any Replicat process out there. You can use it to split large transactions as well. So if all of a sudden you have a batch job that does a million inserts followed by a single commit, I can split that across 10 threads, each thread doing 100,000 inserts. And it's aware of your transaction dependencies, that's the cool thing. So in Coordinated Replicat, you had to worry about how to split your tables up, in Parallel Replicat, we do it for you. 09:11 Lois: And how does Parallel Replicat work? Nick: So there's three main processes to the Parallel Replicat. You have your first is the mapper process. This is going to be responsible for taking the data out of the trail files and putting them into kind of our collator and scheduler box. As transactions go from the trail file, they get put into this box in memory where they're processed. There's a collator process that will look at these processes and go, OK, as they're coming in, let me read some of the data in them to determine how they can be applied in Parallel or not. And so the collator process understands the foreign key dependencies on that target database. And it's able to say, hey, I know that my two tables are these two tables, have a parent-child relationship, I need to make sure that changes on those tables go in the correct order. And so if all of a sudden you see an insert using the parent record and then another insert into the child record and they're mapped together, GoldenGate will ensure that those two transactions go serially and not parallel where they could get applied out of order. There's then a scheduler process that's going to look at this and say, OK, now that I'm taking transactions from the collator process, who's already identified whether or not transactions can be applied in parallel or serial, and I'm going to feed them off to applier processes that are ready and waiting for me to apply those changes into the database. And then the applier process is waiting for the scheduler process to send its transactions and say, OK, what's my next one? Where's the next transaction I should be working on and applying? And then the applier process is the one actually applying the changes into that target database, again, just using standard DML operations. So there's a lot of benefits to this one. You don't need to worry about your foreign key dependencies, you can leave all your foreign keys enabled. The collator process will actually use information within the trail file to determine which transactions can be applied in parallel, and which one needs to be applied serially. 11:13 Lois: Thank you, Nick, for this insightful conversation. There’s loads more to discover about the Replicat process, and you can do that by heading over to mylearn.oracle.com and searching for the Oracle GoldenGate 23ai: Fundamentals course. Nikita: In our next episode, Nick will take us through managing Extract Trails and Files. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 11:37 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.