xtdb:database
xtdb:database [2022/04/05 16:47] – created James Sentman | xtdb:database [2023/02/13 14:52] (current) – external edit 127.0.0.1 | ||
The data stored in XTdb is fairly simple and always in chronological order, which means that flat files can be used for both data and indexes. These are potentially much faster to access and much less resource intensive than an SQL database. The combination of indexing and disk caching that XTdb uses is very fast at retrieving data and building reports. If disk corruption does occur, it is easy to rebuild both the data files and the indexes from the remaining data and to filter out any garbage data.
====Using the Database Window:====
By default a database entry is created for every unit in XTension. You can see this list and run queries against the entries from the Database window. Open this window by going to the Database menu and selecting “Show Database Window”.

All the Units are listed here along with the current number of records stored

**Enable/

**Set Reap Timer:** Opens a popup with some common reap timer values. These control how long data is stored

**Set Clear IgnoreOffs:

A red marker at the beginning of a Unit’s name indicates a Unit that has an entry in XTdb but no longer exists in XTension (if XTension is not running, all Units will appear that way). See the section on managing Units no longer in XTension
-----
====Database Unit Detail Window:====

Double click a Unit in the list to open its Database Unit Detail window. The top information section shows some statistics about this Unit in the database: the number of records, the dates of the first and last records, the data file size and the index file size, and the number of index nodes.

**Enabled:

**Save data for:** The reap timer for this Unit, or how long to save data before it gets rolled off the back of the file. Popup values are 30, 60 or 90 days, 6 months, 1 year, 5 years or forever. The default for new Units is Forever. This can also be set in the XTdb tab of the Edit Unit dialog in XTension.

**Ignore Offs:** Does not save Off events when they are of no use in generating reports or graphs. For devices like motion sensors, only the On events may be useful, so you can ignore the Offs.

**Save All Datapoints:

**Ignore Values Above/

The query section and actions at the bottom of the window are documented in the next sections.
-----
====Running A Query:====
At the bottom of the Unit Detail window is an interface for running a query. By default the start and end fields are filled

The query results window can take some time to populate, the window

Select Save from the File menu to save the output as a CSV (comma-separated values) file. This is easy to import into other applications for graphing

You can copy lines out as regular text as well.

**Deleting Individual Records:** By highlighting lines in the output and selecting Delete from the Edit menu you can remove the selected lines from the output. This will cause the data files on disk to be rewritten without the selected records. This is useful

It is also possible to run queries via AppleScript,

-----
====Delete, Validate, Rebuild, Filter:====
At the bottom of the Database Unit Detail window are several buttons

===Delete:===
This will delete all the data saved for a Unit. If the Unit is still in XTension it is not possible

===Validate:===
If you suspect that data is either missing or not showing up in graphs

===Rebuild:===
If the data file has become corrupted, or has been written out of order in spite of all the checks to prevent that, it may be necessary to rebuild the data file and indexes. This button will do both. The data file is opened and loaded entirely into memory, re-sorted by date and rewritten, and then the index is re-created. This can be a time-consuming and memory-intensive process for large datasets. A background processing window

===Filter:===

The Filter button brings

-----

====Managing Units No Longer in XTension:====
XTdb will not delete a Unit from its database when you delete the Unit in XTension. The data is kept until you specifically delete it. You can do this manually from each Unit’s Database Unit Detail window, or you can use the Manage Units No Longer in XTension window. Any Units that are no longer found in the feed from XTension are marked in the Database list window with a red marker ahead of their name.

From the Database menu select “Manage Units No Longer In XTension”.

This window contains a list of all “orphaned” Units still in the database but no longer found in the feed from XTension. All data for all Units in the list will be deleted when you click the Delete button, NOT just the ones selected in the list. To edit the list, highlight Units and use the “Remove From List” button. This does not remove the data; it only removes those Units from the list that will be deleted when you click the Delete button.

If you have deleted more Units while the window was open, or if you have removed items from the list that you do wish to delete, you can use the Refresh List button to rebuild the list.

Note that all Units that you haven’t removed from the list are deleted when you press the Delete button, NOT just any that you have selected in the list. Remove any Units whose data you wish to keep from the list before pressing the Delete button.
If you have a setting other than “forever” for an individual unit, then its data file is scanned during this process. If there are records at the beginning of the file that predate your cutoff, a temporary file is created to replace it. The data is read and each record earlier than the cutoff is ignored; once data within the time frame you wish to keep is reached, it is transferred to the new file. When complete, the old file is swapped with the temporary file and then deleted. The index file is rebuilt to represent the new offsets of the data, and all disk caches and memory arrays are invalidated so that they reload from the actual data the next time they are needed.
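The reap pass described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not XTdb's actual code: the big-endian byte order, the `RECORD` layout constant and the `reap` function name are all hypothetical, and the sketch always writes a temporary file rather than first checking whether any records predate the cutoff. Rebuilding the index and invalidating caches are left out.

```python
import os
import struct
import tempfile

# Assumption: each record is two big-endian doubles (date in seconds, value).
RECORD = struct.Struct(">dd")

def reap(path, cutoff_seconds):
    """Drop records older than cutoff_seconds; return how many were removed."""
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
        while True:
            chunk = src.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break                  # end of file (or a trailing partial record)
            seconds, _value = RECORD.unpack(chunk)
            if seconds < cutoff_seconds:
                removed += 1           # predates the cutoff: skip it
            else:
                dst.write(chunk)       # within the window: copy it across
    os.replace(tmp_path, path)         # swap the temporary file into place
    return removed
```

Writing to a temporary file in the same directory and swapping it in at the end means the original data file stays intact until the new one is complete.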

-----

====Database File Format:====
There are 2 files for each unit storing data in the XTdb database. They are named “unit ” followed by the unique ID of the unit as assigned in XTension. This is a unique numerical ID and not really human readable. If you need to find the files for a specific unit you can highlight it in the Database list window and then select “Reveal Unit Data File In The Finder” from the Database menu.

The data file has the suffix “XTdbd” and the index file has the suffix “XTdbi”. The index file is less important and can always be rebuilt by XTdb if necessary. The data file uses 16 bytes for each entry. The first 8 bytes are a “double precision” number containing the total seconds of the date of the event. The second 8 bytes contain a double precision number which is the value of the Unit from XTension when the value changed. XTension uses double precision values for all unit values so that they can contain fractions and not just whole integers. There are limitations to the double precision numerical format that you should be aware of by reading the [[https://
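As a rough illustration of the 16-byte record layout described above, the following Python sketch packs and decodes records. The byte order is not stated here, so big-endian is purely an assumption, and the names `RECORD`, `pack_record` and `read_records` are hypothetical.

```python
import struct

# Assumption: big-endian doubles. The text only says each record is two
# 8-byte double precision numbers (date in total seconds, then value).
RECORD = struct.Struct(">dd")

def pack_record(seconds, value):
    """Return the 16 bytes for one (date, value) entry."""
    return RECORD.pack(seconds, value)

def read_records(raw):
    """Decode bytes holding whole records into (seconds, value) tuples."""
    usable = len(raw) - (len(raw) % RECORD.size)  # ignore a trailing partial record
    return [RECORD.unpack_from(raw, off) for off in range(0, usable, RECORD.size)]
```

Because every record has the same fixed size, record number n always starts at byte offset n × 16, which is what makes a flat file with a simple index practical.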

The index file contains 8 bytes for each date record, again in double precision format. A date is written to the index file for every 500 data entries in the data file, making it much faster to get to within 500 records of the correct position in the data file. This index interval may change at any time, however, as I continue to optimize the system. At the moment all index arrays are loaded into memory at startup and maintained there, as well as on disk, for faster searching. This may change in the future if I need to switch to caching only the most often used indexes due to memory usage or something similar.

When performing a query XTdb finds the nearest start date in the data file by performing a [[https://

After finding the nearest start date the data is read sequentially until the dates equal or exceed the stop date for the query.
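The lookup described above can be sketched with in-memory lists standing in for the data and index files. This is an illustration only: the function names are hypothetical, a binary search over the index is used via Python's `bisect` module, and the choice to index records 0, 500, 1000 and so on is an assumption about where the index nodes fall.

```python
import bisect

INDEX_INTERVAL = 500  # one index entry per 500 records, per the text; may change

def build_index(dates, interval=INDEX_INTERVAL):
    """Keep the date of every interval-th record (assumed: records 0, 500, ...)."""
    return dates[::interval]

def query(dates, values, index, start, stop, interval=INDEX_INTERVAL):
    """Return (date, value) pairs with start <= date < stop."""
    # Binary search the small index for the last node at or before start.
    node = max(bisect.bisect_right(index, start) - 1, 0)
    pos = node * interval
    results = []
    # Read sequentially from that node until the dates reach the stop date.
    while pos < len(dates) and dates[pos] < stop:
        if dates[pos] >= start:
            results.append((dates[pos], values[pos]))
        pos += 1
    return results
```

The index search lands within one interval of the first matching record, so at most a few hundred records are skipped before the results begin.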

-----
====Appending New Datapoints:====
When the value for a unit changes, or when an on/off event happens that needs to be stored in the database, that information is sent to XTdb. In most cases the new activity date coming with the data is after the last data point already saved, so the new date and value can be written to the file as is. If the new value puts the record count over the next index point then the date is also appended to the index file and to the in-memory array of date offsets.

If the new value or event from a unit has an activity date that is before the last value already in the file, that data is considered to be in error and is thrown out. It would be theoretically possible to find the correct place in the file for the new data and insert it, recreating the rest of the file and the indexes after doing so, but there seems no point in adding this ability since data should only ever come in with increasing dates and not randomly forwards and backwards.

The exception to that is an update received with the same date as the last data point. Since the dates use the total seconds value as the resolution of the database, it is limited to one value or event in any given second. This seems perfectly fine for almost any data that regular home automation software should need to deal with. It is possible, however rarely, for something to send repeats or for a sensor or other device to send more than one update in a second. If an update is received with an activity date in the same second as the last value saved, the value is compared to the last value. If it is the same, the update is discarded as it wouldn’t change anything. If it is a different value, the file position is moved back by 8 bytes and the new value is written in place of the original one. This is an expensive operation CPU-wise and should be avoided whenever possible.
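The three cases above (newer date, older date, same second) can be summarised in a short sketch. It works on an in-memory list of `(seconds, value)` tuples rather than the real data file, assumes dates have already been reduced to whole seconds, and uses the hypothetical name `apply_update`.

```python
def apply_update(records, seconds, value):
    """Apply one incoming (date, value) update to a list of records, in place.

    Returns "appended", "discarded" or "replaced".
    """
    if not records or seconds > records[-1][0]:
        records.append((seconds, value))   # normal case: strictly newer date
        return "appended"
    last_seconds, last_value = records[-1]
    if seconds < last_seconds:
        return "discarded"                 # out-of-order data is treated as an error
    if value == last_value:
        return "discarded"                 # same second, same value: nothing to change
    records[-1] = (seconds, value)         # same second, new value: overwrite the last record
    return "replaced"
```

On disk, the "replaced" case corresponds to seeking back over the last 8-byte value and writing the new one, which is why it costs more than a plain append.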

In order to preserve the limited write cycles of modern SSD drives, new data is written out to disk at most every few seconds, so that writes aren’t happening with every data point and thereby wearing out the drive. This does mean that if the program crashes or the computer hangs or panics it is possible to lose the last few seconds of data. This seems a decent tradeoff against using up expensive SSD drives, some of which are internal nowadays and not replaceable. An option to disable this delayed disk writing is not currently available but is possible; if anyone has a need for such a thing please let me know. Also let me know why you would want that, as there might be a bigger problem that I could solve if the program is crashing regularly or something like that.
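A minimal sketch of this kind of delayed writing is shown below. It is not XTdb's implementation: the `DelayedWriter` class, its 5 second default interval and the injectable `clock` parameter are all assumptions made for illustration, and a real version would also flush on a timer and at shutdown rather than only when new data arrives.

```python
import time

class DelayedWriter:
    """Hold new records in memory and write them out at most once per interval."""

    def __init__(self, sink, interval=5.0, clock=time.monotonic):
        self._sink = sink            # any object with a write(bytes) method
        self._interval = interval    # hypothetical; the text only says "every few seconds"
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, record_bytes):
        """Queue one record; flush if the interval has elapsed since the last write."""
        self._buffer.append(record_bytes)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        """Write everything buffered so far in a single call."""
        if self._buffer:
            self._sink.write(b"".join(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()
```

Anything still sitting in the buffer when the process dies is lost, which is the "last few seconds of data" tradeoff described above.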

Once the database has been updated, the event is sent on to any graph or gauge that includes that unit. Since Graphs and Gauges can contain many units, the redrawing and distributing of new renders is not done inline with the data reception. Instead the new value is added to the cached query data for the graph and the graph is added to a redraw queue. This way, if several units get updates close to each other, the graph is redrawn a single time with all the new data rather than once for each unit that updates, with multiple updates sent immediately after each other. As of this writing there are 3 rendering threads that look for any graphs in the queue, try to render them and send updates to the graph image, and then go to sleep for half a second before looking again. This way, if a large graph takes a long time to render, other graphs can begin rendering while it is still reading data or drawing.
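The coalescing behaviour of the redraw queue can be sketched as below. This is a single-threaded illustration with hypothetical names (`RedrawQueue`, `mark_dirty`, `drain`); the real system uses several rendering threads that sleep between passes, which is not modelled here.

```python
class RedrawQueue:
    """Collect graphs needing a redraw so each is rendered once per pass."""

    def __init__(self):
        self._pending = []       # graph names in first-queued order
        self._queued = set()     # fast membership check for coalescing

    def mark_dirty(self, graph):
        """Queue a graph for redraw; repeated calls before a drain are merged."""
        if graph not in self._queued:
            self._queued.add(graph)
            self._pending.append(graph)

    def drain(self):
        """Return the graphs to render now and clear the queue."""
        batch, self._pending = self._pending, []
        self._queued.clear()
        return batch
```

If three units on the same graph update within one pass, `mark_dirty` is called three times but `drain` returns that graph only once.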
xtdb/database.1649177258.txt.gz · Last modified: 2023/02/13 14:51 (external edit)