KEYWORDS: Data modeling, Education and training, Performance modeling, Systems modeling, Head, Transform theory, Reflection, Engineering, System integration, Statistical analysis
Temporal knowledge graphs (TKGs) abstract the temporal information of entities and relations in the real world. If unknown temporal quadruples can be inferred, the long-term development of events can be predicted. However, current TKG reasoning methods struggle to model the relative temporal relations between quadruples, and they also suffer from insufficient reasoning information. We therefore propose a TKG reasoning model named TKBK, which combines formalized temporal knowledge and generative background knowledge. TKBK retrieves temporal knowledge from the TKG and generates background knowledge with large language models (LLMs). It trains a pre-trained language model with a masking strategy, transforming the complex reasoning task into a masked token prediction task. We evaluated the proposed model on two datasets. The results show that TKBK outperforms the baseline models on most metrics, demonstrating its effectiveness on TKG reasoning tasks.
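To illustrate the masked-token-prediction formulation described in the abstract, the following is a minimal sketch of how a query quadruple, retrieved temporal facts, and LLM-generated background text might be serialized into a single masked input sequence. The function name build_masked_input, the [MASK] placeholder, and the textual template are assumptions for illustration only, not the authors' TKBK implementation.

```python
# Minimal sketch: serializing a TKG query into a masked-token-prediction input.
# All names and the textual template are illustrative assumptions, not the
# original TKBK implementation.

def build_masked_input(query, temporal_facts, background_text, mask_token="[MASK]"):
    """Build one masked input sequence for a pre-trained language model.

    query          : (subject, relation, timestamp) with the object to predict
    temporal_facts : list of (subject, relation, object, timestamp) retrieved from the TKG
    background_text: free-form background knowledge generated by an LLM
    """
    subject, relation, timestamp = query

    # Formalized temporal knowledge: verbalize each retrieved quadruple.
    fact_lines = [f"At {t}, {s} {r} {o}." for (s, r, o, t) in temporal_facts]

    # The unknown object entity is replaced by the mask token; the model is
    # trained to recover it, turning reasoning into masked token prediction.
    query_line = f"At {timestamp}, {subject} {relation} {mask_token}."

    return " ".join(fact_lines + [background_text, query_line])


if __name__ == "__main__":
    facts = [("Germany", "negotiated with", "France", "2014-05-01"),
             ("Germany", "signed agreement with", "France", "2014-06-10")]
    background = "The two countries maintained close diplomatic ties during this period."
    print(build_masked_input(("Germany", "met with", "2014-07-02"), facts, background))
```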
Imitation learning aims to learn a policy from expert demonstrations. Compared to reinforcement learning, which learns by trial and error, imitation learning is not limited by or dependent on a reward function. Therefore, a growing body of research uses imitation learning to help agents explore and learn, especially in reward-sparse environments. Most existing work in this area assumes that expert demonstrations include both state and action information. However, in many cases only state-only demonstrations are available, which can degrade policy performance. In this paper, we use state-only demonstrations to guide agent learning in a reward-sparse environment. We propose a policy optimization from observation (POfO) method. First, we reshape the rewards by enforcing occupancy measure matching between the current policy and the demonstrations, which effectively guides agent learning. Second, we train an inverse dynamics model (IDM) to infer and complete the missing actions in the state-only demonstrations. Finally, we accelerate policy learning using the demonstrations completed by the IDM. Experimental results show that our method performs comparably to methods that use complete demonstrations and significantly outperforms other methods of the same type.
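The IDM component mentioned in the abstract can be illustrated with a minimal PyTorch sketch: a small network that predicts the action linking consecutive states, fitted on transitions gathered by the agent's own policy and then used to label the state-only demonstrations. The architecture, hyperparameters, and synthetic data below are assumptions for illustration, not the POfO implementation.

```python
# Minimal sketch of an inverse dynamics model (IDM): predicts the action that
# links consecutive states (s_t, s_{t+1}). Architecture and hyperparameters are
# illustrative assumptions, not the POfO implementation.
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        # Concatenate consecutive states and regress the (continuous) action.
        return self.net(torch.cat([s, s_next], dim=-1))


def train_idm(idm, s, a, s_next, epochs=200, lr=1e-3):
    """Fit the IDM on (s, a, s') transitions collected by the current policy."""
    opt = torch.optim.Adam(idm.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(idm(s, s_next), a)
        loss.backward()
        opt.step()
    return idm


if __name__ == "__main__":
    state_dim, action_dim = 4, 2
    # Synthetic transitions stand in for real environment interaction here.
    s = torch.randn(256, state_dim)
    a = torch.randn(256, action_dim)
    s_next = s + 0.1 * a.repeat(1, state_dim // action_dim)
    idm = train_idm(InverseDynamicsModel(state_dim, action_dim), s, a, s_next)

    # Label state-only demonstrations with inferred actions.
    demo_s, demo_s_next = torch.randn(32, state_dim), torch.randn(32, state_dim)
    inferred_actions = idm(demo_s, demo_s_next)
    print(inferred_actions.shape)  # torch.Size([32, 2])
```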