A site about Talend
Salesforce does not provide any formal Change Data Capture Data (CDC) mechanism, as you may find with a Relational Databases (RDBMS). If you need to extract changed data from Salesforce, then I would recommend using the SystemModstamp field; which can be found on every Salesforce object. By using this date and time value, and by recording the previous date and time that data was extracted, you are able to incrementally select new, modified and (optionally) deleted rows from Salesforce. I would recommend incorporating an overlap in the date and time values, to mitigate the risk of lost changes due to longer running transactions.
The following table shows a sample series of extracts for the Account object.
Object | Extract Start | Last Extract Start | Overlap (seconds) | Query (Where) |
---|---|---|---|---|
Account | 2014-01-31T12:00:00.000Z | 60 | ||
Account | 2014-01-31T13:00:00.000Z | 2014-01-31T12:00:00.000Z | 60 | where SystemModstamp >= 2014-01-31T11:59:00.000Z |
Account | 2014-01-31T14:00:00.000Z | 2014-01-31T13:00:00.000Z | 60 | where SystemModstamp >= 2014-01-31T12:59:00.000Z |
In the above scenario, we can reasonably expect (you may need to review the overlap value) to capture all changes to the Salesforce object. We will not be able to reliable determine the nature of the change (in respect to how our Data Warehouse or other receiving system will perceive the change); that is, determining if the change is an Insert, Update, Delete or a restoration from he Recycle Bin. The onus is, therefore, on the receiving system to determine the nature of the change, for example, if the row has been seen before.
The following table shows the key fields that are available for tracking the modification history of a row.
Field | Description |
---|---|
CreatedDate | The date and time that the row was created, for example, 2014-01-31T12:00:00.000Z |
CreatedId | The User.ID of the user that created the row. |
LastModifiedDate | The date and time that the row was created, last modified, deleted or restored for example, 2014-01-31T12:00:00.000Z |
LastModifiedBy | The User.ID of the user that created, last modified, deleted or restored the row. |
SystemModstamp | The date and time that the row was created, last modified, deleted, restored or other system events. |
IsDeleted | A boolean value to show if a row is in the Recycle Bin. Unless requested, these rows are not available for query. |
The following table shows typical values for these fields, based on the specified events.
Action | CreatedDate | LastModifiedDate | SystemModStamp | IsDeleted |
---|---|---|---|---|
Insert Row | 2014-01-31T12:00:00.000Z | 2014-01-31T12:00:00.000Z | 2014-01-31T12:00:00.000Z | false |
Update Row | 2014-01-31T12:00:00.000Z | 2014-01-31T12:15:00.000Z | 2014-01-31T12:15:00.000Z | false |
Delete Row | 2014-01-31T12:00:00.000Z | 2014-01-31T12:30:00.000Z | 2014-01-31T12:30:00.000Z | true |
Restore Row | 2014-01-31T12:00:00.000Z | 2014-01-31T12:45:00.000Z | 2014-01-31T12:45:00.000Z | false |
Most, but not all, Salesforce Objects allow rows to be deleted. Deleted rows are moved to a Recycle Bin; however, they do not remain there indefinitely. If a row is deleted, then the field isDeleted will be set to true. When querying Salesforce Objects, you must specify if you want deleted rows to be selected.