Podcasts EducaciónOracle University Podcast

Escucha gratis este podcast en la aplicación:

radio.net

Sleep timer

Favoritos

Descarga gratuita en la App Store

Oracle University Podcast

Oracle Corporation

Educación Tecnología

Oracle University Podcast

Último episodio

Episodios disponibles

5 de 122

Oracle GoldenGate 23ai: Managing Extract Trails and Files
In this episode of the Oracle University Podcast, Lois Houston and Nikita Abraham explore the intricacies of trail files in Oracle GoldenGate 23ai with Nick Wagner, Senior Director of Product Management. They delve into how trail files store committed operations, preserving the order of transactions and capturing essential metadata. Nick explains that trail files are self-describing, containing database and table definition records, making them easier to work with. The episode also covers trail file management, including the purge trail task and the ability to download trail files directly from the web UI, providing flexibility in various deployment scenarios. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ------------------------------------------------------------- Podcast Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Nikita: Welcome back to another episode of the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and I’m joined by Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we discussed the Replicat process. That was a good introduction, and you should give it a listen if you’re interested in the fundamentals of GoldenGate 23ai. 00:49 Nikita: Nick Wagner, Senior Director of Product Management for Oracle GoldenGate, is back with us today to talk about how to manage Extract Trails and Files. Hi Nick, it’s a pleasure to have you with us. So, we’ve spoken about trail files in our earlier episodes. But can you tell us about the kind of information that’s actually stored in these files? Nick: The trail files contain committed operations only. In an Oracle environment, the extract process is actually able to understand and read both committed and uncommitted transactions. It holds the uncommitted activity and the cache manager associated settings. As a transaction is committed, it's then flushing that information to the trail file. All this information in the transaction is preserved, so we have not only the transaction itself, but the order of the operations within that transaction. All the changed columns, including the primary key and any scheduling columns are also captured, and this is controlled by the log or sub calls parameter and other parameters within the extract process. The data captured depends on settings in the extract file and you can include additional information, including tokens. The trail files also contain metadata information, where the trail files are what we call self-describing, which means that as we start reading in new objects, we start writing the definition of those objects into the trail file themselves. 02:11 Lois: Nick, what does the structure of a trail file look like? Nick: The trail files have a header information, which simply keeps information about what version of trail file this is, where GoldenGate is handling it, information about that trail file itself. You'll also have three different types of records. You'll have a data record, which contains the actual before and after images, the table update statement, the type of operations. You have a database definition record, which includes information about the database that GoldenGate is capturing from, and then you'll also have a table definition record. As GoldenGate starts up and creates a trail file for the first time, it's always going to write the trail file header and associated database definition record, and then it's going to start reading data out of the source database. As it encounters a new table for the first time in that trail file, it's going to write the metadata for that object as well. This makes it very easy. This means that within a single trail file, any data records I have in there, that trail file also contains the associated table definition record for that table. 03:20 Nikita: Let’s talk about compatibility between different versions of GoldenGate. How do the trail files fit into that? Nick: The GoldenGate trail files themselves have information built into them to help understand what they're compatible with as far as GoldenGate releases. If I'm replicating from a new version of GoldenGate to an older version of GoldenGate, I can set the format release value to tell the extract process to write these trail files in an older version. In this case, I can simply say format release 19 and it'll write the trail files in the 19C version. If you're going from an older version to a newer version of GoldenGate, it's automatically able to process the old version trail file structure without having to change anything. 04:02 Nikita: Now, GoldenGate is constantly generating these trail files as it runs. So, how do we manage them over time. What’s the cleanup process like? Nick: Within the GoldenGate microservices architecture, the web UI has a way to manage your trail files and clean them up. So there's a purge trail task that allows you to go in and set up rules on how long to keep the trail files around for before they're purged. We have customers that want to reposition extract and so you'll want to make sure that you keep trail files around long enough so that you can handle any reposition that you intend to do. Trail files will always be kept around even past their purge rules if they're still needed for GoldenGate recovery. Also new to GoldenGate 23ai is the ability to download trail files directly from the web UI. This is extremely helpful if you're using OCI GoldenGate or you don't have OS access on the machine where GoldenGate is running. 04:56 Lois: What if we want to look inside these trail files and actually see the individual records? How can we examine them directly? Nick: Well, that can be seen using a tool called Logdump. Logdumps utility, that's installed in your ogghome/bindirectory. It has online help as well as full documentation. 05:14 Lois: And how do you use Logdump? Nick: So to use Logdump, the first thing you'll do is launch the service and then you'll open a trail file. You would specify the full path of the trail file along with the path name and the sequence number of that trail file. Once you've set it up, you'll position into that location within that trail file. Normally people position at record 0 and then they'll do a next, which allows them to get the next information. There's a couple other commands in there, such as POS, which allows you to set the position, scan for header, allows you to scan to the next record if you position within the middle of a record. So, when you first run Logdump, it's not going to have very much information available for you. So, you'll want to turn on a couple of settings. You'll want to enable File Header, GHDR, and Detail to be able to see more information about what's going on within that record within the trail. Logdump also has the ability to show you the actual ASCII values as opposed to the text value. This is very useful for dealing with multibyte data as well as unprintable characters. You can also specify the length of the record to show for each Logdump record. And this is in the reclen parameter, 280 is a rough number and it will usually show about enough that'll fit on a single page. 06:40 Join the Oracle University Learning Community and tap into a vibrant network of over 1 million members, including Oracle experts and fellow learners. This dynamic community is the perfect place to grow your skills, connect with likeminded learners, and celebrate your successes. As a MyLearn subscriber, you have access to engage with your fellow learners and participate in activities in the community. Visit community.oracle.com/ou to check things out today! 07:12 Nikita: Welcome back! Nick, earlier you mentioned data records in trail files. What kind of information do these records contain? Nick: When we start looking at data records within the trail file, we're going to see a little bit different format. It's going to give us information about what type of operation this was, the before, after indicator, is this an after image or a before image? It's going to give us the time information. It's going to tell us what table this record was on and the values within that record. We can also count the number of records in a trail using the count option that tells us how many records in the trail, the average size, and then the operation type breakdown. We can also get some additional details on that count, including having it broken out by table and operation within those tables. This is really useful if you're trying to track down a missing record or an out of sync condition and you want to make sure that GoldenGate is appropriately capturing all the changes. We can also use an option within Logdump called scan for metadata. The shorthand for this command is sfmd, it allows you to scan for something like a database definition record. You may have multiple database definition records versions within the same trail file. It tells us what type of database this was, the character set, which is important because this information is used by the replica when it goes to apply changes into the target database. We can also scan for metadata to get table definition records. The data types are numeric values that are associated with an internal GoldenGate data type. 08:43 Lois: Thank you, Nick, for your insights. There’s a lot more you can find in the Oracle GoldenGate 23ai: Fundamentals course on MyLearn. So, make sure you check that out by visiting mylearn.oracle.com. Nikita: Join us next week for a discussion on parameters, data selection, filtering, and transformation. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 09:07 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
--------
9:36
Oracle GoldenGate 23ai: The Replicat Process
In this episode, Lois Houston and Nikita Abraham, along with Nick Wagner, Senior Director of Product Management, dive into the Replicat process in Oracle GoldenGate 23ai. They discuss how Replicat applies changes to the target database, highlighting the different types: Classic, Coordinated, and Parallel Replicat. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ---------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Lois: Hello and welcome to another episode of the Oracle University Podcast. I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! If you’ve been listening to us these last few weeks, you’ll know we’ve been discussing the fundamentals of GoldenGate 23ai. Today is going to be all about the Replicat process. Again, this is something we’ve discussed briefly in earlier episodes, but just to recap, the Replicat process applies changes from the source database to the target database. It's responsible for reading trail files and applying the changes to the target system. 01:04 Lois: That’s right, Niki. And we’ll be chatting with Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. Hi Nick! Thanks for joining us again today. Let’s get straight into it. Can you give us an overview of the Replicat process? Nick: One thing that's very important is the Replicat is extremely chatty with that target database. So it's going to be going in and trying to make lots of little transactions on that system. The Replicat process only issues single row DML. So if you can imagine a source database that's generating hundreds of thousands of changes per second, we're going to have to have a Replicat process that can do 100,000 changes per second on that target site. That means that it's going to have to send a lot of little one record commands. And so we've got a lot of ways to optimize that. But in all situations you're really going to want very, very low ping time between that Replicat process and that target database. This often means that if you're going to be running GoldenGate in a cloud, you're going to want the Cloud GoldenGate environment to be running in that target data center, wherever that target database is. 02:06 Lois: What are the key characteristics of the process, Nick? Nick: Replicat process is going to read the changes from the trail file and then apply them to the target system, just like any database user would. It's not doing anything special where it's going under the covers and trying to apply directly to the database blocks. It's just applying regular standard insert, update, delete, and DDL statements to that target database. A single trail file does support high volume of data replication activity depending on the type of Replicat. Replicats do preserve the boundary of their transactions. So in the situations, by default, a transaction that's on the source, let's say five inserts followed by a commit will remain five inserts followed by a commit on the target site. There are some operations and changes that do affect this, but they're not turned on by default. There are things like group transactions that allows you to group multiple transactions into a single commit. This one could actually improve performance in some cases. We also have batch SQL that can change the boundaries of a transaction as well. And then in a Parallel Replicat, you actually have the ability to split a large transaction into multiple chunks and apply those chunks in Parallel. So again, by default, it's going to preserve the boundaries, but there are ways to change that. And then the Replicats use a checkpoint table to help with recovery and to know where they're applying data and what they've done. The other thing in here is, like an Extract process can write to multiple trails and write subsets of data to each one, a Replicat can only process a single set of trail files at once. So it's going to be attached to a specific trail file like trail file AB, and will only be able to read changes from trail file AB. If I have multiple trails that need to be applied into a target system, then I have to set up multiple Replicats to handle that. 03:54 Nikita: So, what are the different Replicat types, Nick? Nick: We have three types in the product today. We have Classic Replicat, which should really only be used for testing purposes or in environments that don't support any of the other specialized Replicats. We have Coordinated Replicat, which is a high speed apply mechanism to apply data into a target system. It does have some parallelism in it, but it's user defined parallelism. And then we have our flagship and that's Parallel Replicat. And this is the most performant lowest latency Replicat that we have. 04:25 Lois: Ok. Let’s dive a little deeper into each of them, starting with the Classic Replicat. How does it work? Nick: It's pretty straightforward. You're going to have a process that reads the trail files, and then in a single threaded fashion it's going to take the trail file logical change record, convert it to an insert, update, or delete, and then apply it into that target database. Each transaction that it does is preceded by a change to the checkpoint table. So when the transaction that the Replicat is currently doing is committed, that checkpoint table update also gets committed. That way when the Replicat restarts, it knows exactly what transaction it left off and how it last applied the record. And all the Replicats work the same way with regards to checkpoint tables. They each have their own little method of ensuring that the transaction they're applying is also reflected within the checkpoint table so that when it restarts, it knows exactly where it happened. That way, if a Replicat dies in the middle of a transaction, it can be restarted without any duplicate data or without missing data. 05:29 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You’ll find training on everything from multicloud, database, networking, and security to artificial intelligence and machine learning, all free for our subscribers. So, what are you waiting for? Pick a topic, head over to mylearn.oracle.com, and get started. 05:53 Nikita: Welcome back! Moving on, what about Coordinated Replicat? Nick: The Coordinated Replicat is going to read from a set of trail files. It's going to have multiple threads that do this. So you have your base thread, your coordinated thread that's going to be thread 1. It's going to process the data and apply it into that target database. You then have thread 2, 4, 5, 6, and so on. When you set up your Replicat parameter file for a Coordinated Replicat, the map commands that maps from one table on the source to a table on the target has an additional option. So you'll have an option called a range or thread range. With the range and thread range option, you can actually tell which table to go into which thread. 06:38 Lois: Can you give us an example of this? Nick: So I could say map Scott.M into thread 1 and I want Scott.Dept into thread 2. Well, this is fantastic until you realize that Scott.M and Scott.Dept have a foreign key between them or a child dependencies, parent-child relationships. What that means is that now I'm going to have to disable that foreign key on the target site, because there's no way for GoldenGate to coordinate the changes in one thread to another thread. And so you really have to be careful on how you pair your tables together. If you don't have any referential integrity on that target database, then you can use parallel coordinated Replicat to really high degrees of parallelism, and you get some very good performance out of it. Let's say that you have a table that's really got too much data for even a single thread to process, that's where the thread range comes in. And thread range command will use something like the table's primary key to split transactions on that table across multiple threads. So I can say, hey, take my table Scott.M and I want to spread transactions across threads 10, 11, 12, 13, and 14 and then spread them evenly based on the primary key. And Coordinated Replicat will do that. So you can get some very high performance numbers out of it and you can really fine tune the tables, especially if you know the amount of data coming into each one. While this does work great, we observed that a lot of customers really don't know their applications to that level of detail, and so we needed a different method to push data into that target database, where we could define the parallelism based on the database expectations. So instead of the customer having to try and figure out what are the parent-child relationships, why can't GoldenGate do it for me? And that led to Parallel Replicat. 08:26 Nikita: And what are the benefits and features of the Parallel Replicat process? Nick: So Parallel Replicat has been around for quite a few years now. It supports most targets, it was Oracle initially, but now it's been expanded out to a lot of the non-Oracle targets and even some of the nonrelational database targets. It has absolutely the best performance of any Replicat process out there. You can use it to split large transactions as well. So if all of a sudden you have a batch job that does a million inserts followed by a single commit, I can split that across 10 threads, each thread doing 100,000 inserts. And it's aware of your transaction dependencies, that's the cool thing. So in Coordinated Replicat, you had to worry about how to split your tables up, in Parallel Replicat, we do it for you. 09:11 Lois: And how does Parallel Replicat work? Nick: So there's three main processes to the Parallel Replicat. You have your first is the mapper process. This is going to be responsible for taking the data out of the trail files and putting them into kind of our collator and scheduler box. As transactions go from the trail file, they get put into this box in memory where they're processed. There's a collator process that will look at these processes and go, OK, as they're coming in, let me read some of the data in them to determine how they can be applied in Parallel or not. And so the collator process understands the foreign key dependencies on that target database. And it's able to say, hey, I know that my two tables are these two tables, have a parent-child relationship, I need to make sure that changes on those tables go in the correct order. And so if all of a sudden you see an insert using the parent record and then another insert into the child record and they're mapped together, GoldenGate will ensure that those two transactions go serially and not parallel where they could get applied out of order. There's then a scheduler process that's going to look at this and say, OK, now that I'm taking transactions from the collator process, who's already identified whether or not transactions can be applied in parallel or serial, and I'm going to feed them off to applier processes that are ready and waiting for me to apply those changes into the database. And then the applier process is waiting for the scheduler process to send its transactions and say, OK, what's my next one? Where's the next transaction I should be working on and applying? And then the applier process is the one actually applying the changes into that target database, again, just using standard DML operations. So there's a lot of benefits to this one. You don't need to worry about your foreign key dependencies, you can leave all your foreign keys enabled. The collator process will actually use information within the trail file to determine which transactions can be applied in parallel, and which one needs to be applied serially. 11:13 Lois: Thank you, Nick, for this insightful conversation. There’s loads more to discover about the Replicat process, and you can do that by heading over to mylearn.oracle.com and searching for the Oracle GoldenGate 23ai: Fundamentals course. Nikita: In our next episode, Nick will take us through managing Extract Trails and Files. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 11:37 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
--------
12:06
Oracle GoldenGate: Distribution Path, Target Initiated Path, Receiver Server, and Initial Load
In this episode, Lois Houston and Nikita Abraham dive into key components of Oracle GoldenGate 23ai with expert insights from Nick Wagner, Senior Director of Product Management. They break down the Distribution Service, explaining how it moves trail files between environments, replaces the classic extract pump, and ensures secure data transfer. Nick also introduces Target Initiated Paths, a method for connecting less secure environments to more secure ones, and discusses how the Receiver Service simplifies monitoring and management. The episode wraps up with a look into Initial Load, covering different methods for syncing source and target databases without downtime. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ----------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hey there! Last week, we spoke about the Extract process and today we’re going to spend time discussing the Distribution Path, Target Initiated Path, Receiver Server, and Initial Load. These are all critical components of the GoldenGate architecture, and understanding how they work together is essential for successful data replication. 00:58 Nikita: To help us navigate these topics, we’ve got Nick Wagner joining us again. Nick is a Senior Director of Product Management for Oracle GoldenGate. Hi Nick! Thanks for being with us today. To kick things off, can you tell us what the distribution service is and how it works? Nick: A distribution path is used when we need to send trail files between two different GoldenGate environments. The distribution service replaces the extract pump that was used in GoldenGate classic architecture. And so the distribution service will send the trail files as they're being created to that receiver service and it will write the trail files over on the target system. The distribution service works in a kind of a streaming fashion, so it's constantly pulling the trail files that the extract is creating to see if there's any new data. As soon as it sees new data, it'll packet it up and send it across the network to the receiver service. It can use a couple of different methods to do this. The most secure and recommended method is using a WebSocket secure connection or WSS. If you're going between a microservices and a classic architecture, you can actually tell the distribution service to send it using the classic architecture method. In that case, it's the OGG option when you're configuring the distribution service. There's also some unsecured methods that would send the trail files in plain text. The receiver service is then responsible for taking that data and rewriting it into the trail file on the target site. 02:23 Lois: Nick, what are some of the key features and responsibilities of the distribution service? Nick: It's responsible for command deployment. So any time that you're going to actually make a command to the distribution service, it gets handled there directly. It can handle multiple commands concurrently. It's going to dispatch trail files to one or more receiver servers so you can actually have a single distribution path, send trail files to multiple targets. It can provide some lightweight filtering so you can decide which tables get sent to the target system. And it also is integrated in with our data streams, our pub and subscribe model that we've added in GoldenGate 23ai. 03:01 Lois: Interesting. And are there any protocols to remember when using the distribution service? Nick: We always recommend a secure WebSocket. You also have proxy support for use within cloud environments. And then if you're going to a classic architecture GoldenGate, you would use the Oracle GoldenGate protocol. So in order to communicate with the distribution service and send it commands, you can communicate directly from any web browser, client software-- installation is not required-- or you can also do it through the admin client if necessary, but you can do it directly through browsers. 03:33 Nikita: Ok, let's move on to the target initiated path. Nick, what is it and what does it do essentially? Nick: This is used when you're communicating from a less secure environment to a more secure environment. Often, this requires going through some sort of DMZ. In these situations, a connection cannot be established from the less secure environment into the more secure environment. It actually needs to be established from the more secure environment out. And so if we need to replicate data into a more secure environment, we need to actually have the target GoldenGate environment initiate that connection so that it can be established. And that's what a target-initiated path does. 04:12 Lois: And how do you set it up? Nick: It's pretty straightforward to set up. You actually don't even need to worry about it on the source side. You actually set it up and configure it from the target. The receiver service is responsible for receiving the trail file data and writing it to the local trail file. In this situation, we have a target-initiated path created. And so that receiver service is going to write the trail files locally and the replicat is going to apply that data into that target system. 04:37 Nikita: I also want to ask you about the Receiver service. What is it really? Nick: Receiver service is pretty straightforward. It's a centrally controlled service. It allows you to view the status of your distribution path and replaces target side collectors that were available in the classic architecture of GoldenGate. You can also get statistics about the receiver service directly from the web UI. You can get detailed information about these paths by going into the receiver service and identifying information like network details, transfer protocols, how many bytes it's received, how many bytes it's sent out. If you need to issue commands from the admin client to the receiver service, you can use the info command to get details about it. Info all will tell you everything that's running. And you can see that your receiver service is up and running. 05:28 Are you working towards an Oracle Certification this year? Join us at one of our certification prep live events in the Oracle University Learning Community. Get insider tips from seasoned experts and learn from others who have already taken their certifications. Go to community.oracle.com/ou to jump-start your journey towards certification today! 05:53 Nikita: Welcome back. In the last section of today’s episode, we’ll cover what Initial Load is. Nick, can you break down the basics for us? Nick: So, the initial load is really used when you need to synchronize the source and target systems. Because GoldenGate is designed for 24/7 environments, we need to be able to do that initial load without taking downtime on the source. And so all the methods that we talk about do not require any downtime for that source database. 06:18 Lois: How do you do the initial load? Nick: So there's a couple of different ways to do the initial load. And it really depends on what your topology is. If I'm doing like-to-like replication in a homogeneous environment, we'll say Oracle-to-Oracle, the best options are to use something that's integrated with GoldenGate, some sort of precise instantiation method that does not require HandleCollisions. That's something like a database backup and restoring it to a specific SDN or CSN value using a Database Snapshot. Or in some cases, we can use Oracle Data Pump integration with GoldenGate. There are some less precise instantiation options, which do require HandleCollisions. We also have dissimilar initial load methods. And this is typically when you're going between heterogeneous environments. When my source and target databases don't match and there isn't any kind of fast unload or fast load utility that I could use between those two databases. In almost all cases, this does require HandleCollisions to be used. 07:16 Nikita: Got it. So, with so many options available, are there any advantages to using GoldenGate's own initial load method? Nick: While some databases do have very good fast load and unload utilities, there are some advantages to using GoldenGate's own initial load method. One, it supports heterogeneous replication environments. So if I'm going from Postgres to Oracle, it'll do all the data type transformation, character set transformation for me. It doesn't require any downtime, if certain conditions are met. It actually performs transformation as the data is loaded, too, as well as filtering. And so any transformation that you would be doing in your normal transaction log replication or CDC replication can also go through the same transformation for the initial load process. GoldenGate's initial load process does read directly from the source tables. And it fetches the data in arrays. It also uses parallel processing to speed up the replication. It does also handle activity on the source tables during the initial load process, so you do not need to worry about quiescing that source database. And a lot of the initial load methods directly built into GoldenGate support distributed application analytics targets, including things like Databricks, Snowflake, BigQuery. 08:28 Lois: And what about its limitations? Or to put it differently, when should users consider using different methods? Nick: So the first thing to consider is system proximity. We want to make sure that the two systems we're working with are close together. Or if not, how are we going to send the data across? One thing to keep in mind, when we do the initial load, the source database is not quiesced. So if it takes an hour to do the initial load or 10 hours, it really doesn't matter to GoldenGate. So that's something to keep in mind. Even though we talk about performance of this, the performance really isn't as critical as one might suspect. So the important thing about data system proximity is the proximity to the extract and replicat processes that are going to be pulling the data out and pushing it across. And then how much data is generated? Are we talking about a database that's just a couple of gigabytes? Or are we talking about a database that's hundreds of terabytes? Do we want to consider outage time? Would it be faster to take a little bit of outage and use some other method to move the data across? What kind of outage or downtime windows do we have for these environments? And then another consideration is disk space. As we're pulling the data out of that source database, we need to have somewhere to store it. And if we don't have enough disk space, we need to run to temporary space or to use multiple external drives to be able to support it. So these are all different considerations. 09:50 Nikita: I think we can wind up our episode with that. Thanks, Nick, for giving us your insights. Lois: If you'd like to learn more about the topics we covered today, head over to mylearn.oracle.com and check out the Oracle GoldenGate 23ai: Fundamentals course. Nikita: In our next episode, Nick will take us through the Replicat process. Until then, this is Nikita Abraham… Lois: And, Lois Houston signing off! 10:14 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
--------
10:43
Oracle GoldenGate 23ai: The Extract Process
The Extract process is the heart of Oracle GoldenGate 23ai, capturing data changes with precision. In this episode, Lois Houston and Nikita Abraham sit down with Nick Wagner, Senior Director of Product Management, to break down Extract’s role, architecture, and best practices. Learn how Extract works across different setups, from running on source databases to using a Hub model for greater flexibility. Additionally, understand how trail files, parameter files, and naming conventions impact performance. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast! I’m Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we spoke about installing GoldenGate and today, we’re diving into the Extract process. We’ve discussed it briefly in an earlier episode, but to recap, the Extract process captures changes from the source database and writes them to a trail file. 00:54 Lois: Joining us again is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. Hi Nick! Before we get into the Extract process, can you walk us through the different architecture options available for GoldenGate. Let’s start with when GoldenGate is installed on the same server as the source database. What are the benefits of this architecture? Nick: There's a couple of advantages to this. It means that GoldenGate can use the same resources on that source database. It means that you don't need another host to support the GoldenGate environment. It also means that GoldenGate can use a bequeathed connection to connect from the Extract process into the source database to make it run faster. The restrictions on this are that the Replicat process is highly communicative with the target database. What that really means is that the Replicat process is constantly doing lots of little transactions. And so the network latency between the Replicat process and the target database should really be around 4 milliseconds or less for optimal performance. So that means that a lot of people can't really run GoldenGate on the source system, even though it's an option, because they need that Replicat latency performance. And so they'll often install GoldenGate on the same server as the target database. In this case, they can use the Replicat to connect using a bequeath connection to that target system, you know that it's going to be highly performant and that latency is not going to be an issue. This works really well because the Extract process has actually been optimized to do remote capture. And so it's actually able to handle 80 milliseconds round trip ping time or less between the actual Extract process and the source database itself. And so a lot of customers will opt for this method, where they're actually running GoldenGate away from the target, or excuse me, away from the source database. 02:44 Nikita: Interesting. And is there an option where you don’t need to install GoldenGate on the actual source or target database? Nick: We also have another architecture pattern called a Hub model. And this is what you would see in something like OCI GoldenGate or OCI Marketplace, or even in third party clouds environments where you don't have the ability to install GoldenGate on the actual source or target database. In these cases, GoldenGate is just going to run on a virtual machine or an environment that you have set up specifically for GoldenGate. Now, this GoldenGate Hub doesn't need to have any database software installed. It doesn't need to have any database information on it. It's simply working as a client. So GoldenGate Extract process is a client connecting into the source database and the Replicat is a client connecting into the target database. And this really gives you a lot of flexibility. However, in some cases, there may be too much of a distance, so you won't be able to get both less than 80 milliseconds on the source side in less than 4 milliseconds on the round trip on the target side. And so in that case, you can have multiple GoldenGate Hubs. And so you would have a Hub on the Extract side and another Hub on the Replicat side. And all these are fully accessible. In this case, you'll actually use the distribution service to send the trail files from one system to another. 04:00 Lois: So, coming to the Extract process, what does it actually do? Nick: The Extract process is configured to capture changes from that source database. In different terminology, it can subscribe to a topic if we're pulling data out of a Kafka queue or a topic or some messaging system like a JMS queue and relational database language, we're pulling database from the database transaction logs. There's a lot of different sources and targets. You can always use the GoldenGate Certification Matrix to determine which sources and targets are supported, and where we can extract data from. The capture process also connects to the source table for initial loads. When we do the initial load, instead of reading from the transaction logs, GoldenGate is actually going to do a select star on that table to get the information it needs for that load. 04:49 Lois: And what about the Extract process group? Nick: The process group is kind of a grouping of the process itself, which is either going to be my Extract or Replicat and associated files. So in an Extract environment, we have our parameter file and a report file and our checkpoint files. The parameter file, the .prm file, is going to list out which objects we're going to capture and how we're going to capture that data. It also controls what we're going to be writing to the trail file and where that trail file exists. The report file is really just a log of what's going on in that Extract process, how it's working, what tables it's encountered. It's used for any troubleshooting to make sure everything is running smoothly. And then you also have the checkpoint files. The checkpoint files and report files should not be modified by the user, the parameter file can be. The checkpoint files are going to include information about where that process is reading from, where it's writing to, and any open transaction that it's tracking as part of the bounded recovery or cache manager functionality. 05:54 Nikita: How do you go about creating an Extract group? Nick: The Extract group can be created by doing an Add Extract command or through the UI. Each Extract must also have a unique name. On the Extract process side, there is an eight-character hard limit for the name itself. And so, you can’t have an Extract process called my Extract for today is called Nick. More than eight characters. 06:17 Lois: Nick, I was wondering, is there a simple way to identify what an Extract or Replicat is doing? Nick: If you need something to help identify what that Extract or Replicat is doing or the description of it, we do have a description field. So when you do the Add Extract or Add Replicat, there is a DESC field that allows you to add more details in. And this is really key because it allows you to put a lot more information that’s going to show up in all the log files at the service manager level. And any time you do an info on the service it’ll also bring up that description field so you can see what’s going on. That way, if you get an alert, a watch, you need to keep track of something you can easily identify what that process is doing and what it’s replicating for. 07:06 Adopting a multicloud strategy is a big step towards future-proofing your business and we’re here to help you navigate this complex landscape. With our suite of courses, you'll gain insights into network connectivity, security protocols, and the considerations of working across different cloud platforms. Start your journey to multicloud today by visiting mylearn.oracle.com. 07:32 Nikita: Welcome back! Before the break, we were talking about the description field, which helps identify what the Extract is doing. Nick, are there any best practices to keep in mind when naming a group? Nick: You also don't want to use any special characters when naming the group, especially you know things like slashes or dashes. You don't want to use spaces in them, just really stick to alphanumeric characters only. The group names are also case insensitive, so EDEPT, all capitalized is the same as edept lowercase. The other thing that you don't want to do and this isn't a hard restriction, it's just more of a friendly reminder is don't end your group with a numeric value. The report files themselves end in numeric values, so you'll have a report file, 0123456789, and so on. If you were to end your group name with a numeric value, then it can often be confused for a report file. And so you don't want to really do that. But otherwise you're free to call it whatever you want. 08:39 Lois: Got it. What about naming conventions? Are there any rules that apply? Nick: You can use whatever naming convention you want, but again, try and follow these best practices. No strange characters and don't end your process names with a numeric value. 08:53 Nikita: Can you explain the role of parameter and trail files in the Extract process? Nick: The parameter files is really where GoldenGate is doing all of its hard work. And there's two parameter files, there's the GLOBALS file that's configured at the service manager level, and then you have your actual process parameter files that are configured at your Extract, Replicat, distribution service levels. We also have our trail files. We've talked a little bit about the trail file as they're the continuous source of information for GoldenGate. They can exist on the source or target or intermediate system. An Extract process can write to multiple trail files, but a trail file can only be written to by one process at a time. A trail file also consists of a two-letter abbreviation, and then we have that nine-digit sequence number. And then the processes that read a trail file include the Distribution Path or Target Initiated Path, as well as the Replicat. And again, just as a quick reminder, all the trail files are stored in the ogg_var_home/libdata directory. This is important because they do grow pretty rapidly, and so if you need to keep track of the space used in that directory, this is an important thing to do. 10:05 Nikita: How does GoldenGate handle data extraction for different database systems? Nick: You have a couple of different options. There's for non-Oracle databases, you have a classic extract. And these read directly from the transaction logs themselves or they call an API within that target database. For example, within DB2, we actually use the DB2 API to pull data out of the transaction logs. In something like Postgres, we're getting data directly out of the transaction logs by reading it. And that's put in there by the test decoding plugin. With Oracle, it's a little bit different. Because it's actually integrated in with the database kernel itself, all the log mining is done inside the database engine. All GoldenGate is doing is connecting to that log mining area and getting the data from it. We also have the option of doing standby redo logs or downstream capture, which allows GoldenGate to read data from the standby redo logs. It's still an integrated extract. So GoldenGate isn't directly reading the standby redo logs, but it allows you to set it up in a different way. And then we have an initial load extract, which is to pull out the base data out of the tables by doing a select star on the tables. And this does not include any change data. So the initial loads and the what we call tran log extracts are separate. 11:24 Lois: Before we wrap up, can you quickly walk us through the key steps to configure an extract? Nick: So the first thing we want to do is set up a restart profile so that if the process fails, it'll automatically restart. Next thing we do is we create the extract parameter file. Then we'll go ahead and register the extract with the database. This is important in Oracle and also with a couple of other databases that we support where we actually need to tell the database engine that, hey, we're going to be setting up a process that's going to pull data out of that environment. Then we go ahead and add the extract process. We add the trail file so that it knows where to write to. And then we can go ahead and start the extract process and we'll be on our way to configuring replication. If necessary, we can configure a distribution service path as well. 12:09 Nikita: Thanks, Nick, for telling us about the Extract process. If you'd like to learn more about what we discussed today, head over to mylearn.oracle.com and check out the Oracle GoldenGate 23ai: Fundamentals course. Lois: You'll find detailed instructions to help you get started. In the next episode, we’ll go through the Distribution Path, Target Initiated Path, Receiver Server, and Initial Load. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 12:37 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
--------
13:06
Oracle GoldenGate Installation
Installing Oracle GoldenGate 23ai is more than just running a setup file—it’s about preparing your system for efficient, reliable data replication. In this episode, Lois Houston and Nikita welcome back Nick Wagner to break down system requirements, storage considerations, and best practices for installing GoldenGate. You’ll learn how to optimize disk space, manage trail files, and configure network settings to ensure a smooth installation. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we’ll bring you foundational training on the most popular Oracle technologies. Let’s get started! 00:25 Nikita: Hello and welcome to Oracle University Podcast! I’m Nikita Abraham, Team Lead of Editorial Services with Oracle University, and I’m joined by Lois Houston, Director of Innovation Programs. Lois: Hi there! Last week, we took a close look at the security strategies of Oracle GoldenGate 23ai. In this episode, we’ll discuss all aspects of installing GoldenGate. 00:48 Nikita: That’s right, Lois. And back with us today is Nick Wagner, Senior Director of Product Management for GoldenGate at Oracle. Hi Nick! I’m going to get straight into it. What are the system requirements for a typical GoldenGate installation? Nick: As far as system requirements, we're going to split that into two sections. We've got an operating system requirements and a storage requirements. So with memory and disk, and I know that this isn't the answer you want, but the answer is that it varies. With GoldenGate, the amount of CPU usage that is required depends on the number of extracts and replicats. It also depends on the number of threads that you're going to be using for those replicats. Same thing with RAM and disk usage. That's going to vary on the transaction sizes and the number of long running transactions. 01:35 Lois: And how does the recovery process in GoldenGate impact system resources? Nick: You've got two things that help the extract recovery. You've got the bonded recovery that will store transactions over a certain length of time to disk. It also has a cache manager setting that determines what gets written to disk as part of open transactions. It's not just the simple answer as, oh, it needs this much space. GoldenGate also needs to store trail files for the data that it's moving across. So if there's network latency, or if you expect a certain network outage, or you have certain SLAs for the target database that may not be met, you need to make sure that GoldenGate has enough room to store its trail files as it's writing them. The good news about all this is that you can track it. You can use parameters to set them. And we do have some metrics that we'll provide to you on how to size these environments. So a couple of things on the disk usage. The actual installation of GoldenGate is about 1 to 1.5 gig in size, depending on which version of GoldenGate you're going to be using and what database. The trail files themselves, they default to 500 megabytes apiece. A lot of customers keep them on disk longer than they're necessary, and so there's all sorts of purging options available in GoldenGate. But you can set up purge rules to say, hey, I want to get rid of my trail files as soon as they're not needed anymore. But you can also say, you know what? I want to keep my trail files around for x number of days, even if they're not needed. That way they can be rebuilt. I can restore from any previous point in time. 03:15 Nikita: Let’s talk a bit more about trail files. How do these files grow and what settings can users adjust to manage their storage efficiently? Nick: The trail files grow at about 30% to 35% of the generated redo log data. So if I'm generating 100 gigabytes of redo an hour, then you can expect the trail files to be anywhere from 30 to 35 gigabytes an hour of generated data. And this is if you're replicating everything. Again, GoldenGate's got so many different options. There's so many different ways to use it. In some cases, if you're going to a distributed applications and analytics environment, like a Databricks or a Snowflake, you might want to write more information to the trail file than what's necessary. Maybe I want additional information, such as when this change happened, who the user was that made that change. I can add specific token data. You can also tell GoldenGate to log additional records or additional columns to the trail file that may not have been changed. So I can always say, hey, GoldenGate, replicate and store the entire before and after image of every single row change to your trail file, even if those columns didn't change. And so there's a lot of different ways to do it there. But generally speaking, the default settings, you're looking at about 30% to 35% of the generated redo log value. System swap can fill up quickly. You do want this as a dedicated disk as well. System swap is often used for just handling of the changes, as GoldenGate flushes data from memory down to disk. These are controlled by a couple of parameters. So because GoldenGate is only writing committed activity to the trail file, the log reader inside the database is actually giving GoldenGate not only committed activity but uncommitted activity, too. And this is so it can stay very high speed and very low latency. 05:17 Lois: So, what are the parameters? Nick: There's a cache manager overall feature, and there's a cache directory. That directory controls where that data is actually stored, so you can specify the location of the uncommitted transactions. You can also specify the cache size. And there's not only memory settings here, but there's also disk settings. So you can say, hey, once a cache size exceeds a certain memory usage, then start flushing to disk, which is going to be slower. This is for systems that maybe have less memory but more high-speed disk. You can optimize these parameters as necessary. 05:53 Nikita: And how does GoldenGate adjust these parameters? Nick: For most environments, you're just going to leave them alone. They're automatically configured to look at the system memory available on that system and not use it all. And then as soon as necessary, it'll overflow to disk. There's also intelligent settings built within these parameters and within the cache manager itself that if it starts seeing a lull in activity or your traditional OLTP type responses to actually free up the memory that it has allocated. Or if it starts seeing more activity around data warehousing type things where you're doing large transactions, it'll actually hold on to memory a little bit longer. So it kinda learns as it goes through your environment and starts replicating data. 06:37 Lois: Is there anything else you think we should talk about before we move on to installing GoldenGate? Nick: There's a couple additional things you need to think of with the network as well. So when you're deploying GoldenGate, you definitely want it to use the fastest network. GoldenGate can also use a reverse proxy, especially important with microservices. Reverse proxy, typically we recommend Nginx. And it allows you to access any of the GoldenGate microservices using a single port. GoldenGate also needs either host names or IP addresses to do its communication and to ensure the system is available. It does a lot of communication through TCP and IP as well as WSS. And then it also handles firewalls. So you want to make sure that the firewalls are open for ingress and egress for GoldenGate, too. There's a couple of different privileges that GoldenGate needs when you go to install it. You'll want to make sure that GoldenGate has the ability to write to the home where you're installing it. That's kind of obvious, but we need to say it anyways. There's a utility called oggca.sh. That's the GoldenGate Configuration Assistant that allows you to set up your first deployments and manage additional deployments. That needs permissions to write to the directories where you're going to be creating the new deployments. The extract process needs connection and permissions to read the transaction logs or backups. This is not important for Oracle, but for non-Oracle it is. And then we also recommend a dedicated database user for the extract and replicat connections. 08:15 Are you keen to stay ahead in today's fast-paced world? We’ve got your back! Each quarter, Oracle rolls out game-changing updates to its Fusion Cloud Applications. And to make sure you’re always in the know, we offer New Features courses that give you an insider’s look at all of the latest advancements. Don't miss out! Head over to mylearn.oracle.com to get started. 08:41 Nikita: Welcome back! So Nick, how do we get started with the installation? Nick: So when we go to the install, the first thing you're going to do is go ahead and go to Oracle's website and download the software. Because of the way that GoldenGate works, there's only a couple moving parts. You saw the microservices. There's five or six of them. You have your extract, your replicat, your distribution service, trail files. There's not a lot of moving components. So if something does go wrong, usually it affects multiple customers. And so it's very important that when you go to install GoldenGate, you're using the most recent bundle patch. And you can find this within My Oracle Support. It's not always available directly from OTN or from the Oracle e-delivery website. You can still get them there, but we really want people going to My Oracle Support to download the most recent version. There's a couple of environment variables and certificates that you'll set up as well. And then you'll run the Configuration Assistant to create your deployments. 09:44 Lois: Thanks, Nick, for taking us though the installation of GoldenGate. Because these are highly UI-driven topics, we recommend that you take a deep dive into the GoldenGate 23ai Fundamentals course, available on mylearn.oracle.com. Nikita: In our next episode, we’ll talk about the Extract process. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 10:08 That’s all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We’d also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
--------
10:37

Más podcasts de Educación

Podcasts a la moda de Educación

Acerca de Oracle University Podcast

Oracle University Podcast delivers convenient, foundational training on popular Oracle technologies such as Oracle Cloud Infrastructure, Java, Autonomous Database, and more to help you jump-start or advance your career in the cloud.

Sitio web del podcast

Educación Tecnología

Escucha Oracle University Podcast, Miss Honey: Slow English Podcast y muchos más podcasts de todo el mundo con la aplicación de radio.net

Descarga la app gratuita: radio.net

Añadir radios y podcasts a favoritos
Transmisión por Wi-Fi y Bluetooth
Carplay & Android Auto compatible
Muchas otras funciones de la app

Descarga la app gratuita: radio.net

Añadir radios y podcasts a favoritos
Transmisión por Wi-Fi y Bluetooth
Carplay & Android Auto compatible
Muchas otras funciones de la app

Oracle University Podcast

Escanea el código,
Descarga la app,
Escucha.

Oracle University Podcast: Podcasts del grupo

Empresa

Información legal

Servicio

Aplicaciones

Redes sociales

v7.18.7 | © 2007-2025 radio.de GmbH

Generated: 6/25/2025 - 4:23:00 PM