How to extract data from XML nodes in Scala

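This article shows how to use the methods of Scala's Elem and NodeSeq classes to extract data from XML. It covers commonly used Elem methods, such as searching for child elements and retrieving attribute values and child nodes, and demonstrates each with examples.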


Problem: In a Scala application, you want to extract information from XML you receive, so you can use the data in your application.

Solution

Use the methods of the Scala Elem and NodeSeq classes to extract the data. The most commonly used methods of the Elem class are shown here:

Commonly used methods of the Elem class

Method                 Description
------                 -----------

x \ "div"              Searches the XML literal x for elements of type <div>. 
                       Only searches immediate child nodes (no grandchild or “descendant” nodes).

x \\ "div"             Searches the XML literal x for elements of type <div>. Returns matching 
                       elements from child nodes at any depth of the XML tree.

x.attribute("class")   Returns the value of the given attribute in the current node.
                       <a x="10" y="20">foo</a>.attribute("x")   // returns Some(10).

x.attributes           Returns all attributes of the current node, prefixed and unprefixed, 
                       in no particular order.
                       scala> <a x="10" y="20">foo</a>.attributes
                       res0: scala.xml.MetaData =  x="10" y="20"

x.child                Returns the children of the current node.
                       <a><b>foo</b></a>.child   // returns <b>foo</b>.

x.copy(...)            Returns a copy of the element, letting you replace data during the 
                       copy process.

x.label                The name of the current element. 
                       <a><b>foo</b></a>.label   // returns a.

x.text                 Returns a concatenation of text(n) for each child n.

x.toString             Emits the XML literal as a String. 
                       Use scala.xml.PrettyPrinter to format the output, if desired.

Examples

The following examples demonstrate most of the methods just shown. Given this XML literal:

scala> val x = <div class="content"><p>Hello</p><p>world</p></div>
x: scala.xml.Elem = <div class="content"><p>Hello</p><p>world</p></div>

you can search for and extract subelements with the XPath-like \ and \\ methods:

scala> x \ "p"
res0: scala.xml.NodeSeq = NodeSeq(<p>Hello</p>, <p>world</p>)

scala> x \\ "p"
res1: scala.xml.NodeSeq = NodeSeq(<p>Hello</p>, <p>world</p>)

Both calls return the same result here because every <p> element is an immediate child of <div>. These methods will be demonstrated more in subsequent recipes.
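To see where \ and \\ differ, it helps to nest one of the elements. The following is a minimal sketch, assuming the scala-xml module is on the classpath; the XPathDemo object and the <section> wrapper are made up for illustration and are not part of the original recipe:

```scala
import scala.xml.Elem

object XPathDemo {
  // Hypothetical document: one <p> is a direct child of <div>,
  // the other is nested one level deeper inside <section>.
  val doc: Elem = <div><p>top</p><section><p>nested</p></section></div>

  // \ looks only at the immediate children of doc
  val direct = doc \ "p"

  // \\ looks at descendants at any depth
  val all = doc \\ "p"
}
```

Here XPathDemo.direct should hold only the <p>top</p> element, while XPathDemo.all should hold both <p> elements.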

The label method returns the name of the current element. A <p> tag returns p, a <div> tag returns div, etc.:

scala> x.label
res2: String = div

scala> <name>Joe</name>.label
res3: String = name

The text method returns the text from all subelements, which the Scaladoc describes as, “a concatenation of all text(n) for each child n”:

scala> x.text
res4: String = Helloworld

Later examples will demonstrate how to improve on this result.

Element attributes are extracted with the attribute or attributes methods. The following examples demonstrate how to call these methods, and the values they return:

scala> x.attribute("class")
res5: Option[Seq[scala.xml.Node]] = Some(content)

scala> x.attributes("class")
res6: Seq[scala.xml.Node] = content

scala> x.attributes.get("class")
res7: Option[Seq[scala.xml.Node]] = Some(content)

The following examples demonstrate how those same method calls behave when you search for an attribute that doesn’t exist:

scala> x.attribute("foo")
res8: Option[Seq[scala.xml.Node]] = None

scala> x.attributes("foo")
res9: Seq[scala.xml.Node] = null

scala> x.attributes.get("foo")
res10: Option[Seq[scala.xml.Node]] = None

scala> x.attributes.get("foo").getOrElse("N/A")
res11: Object = N/A
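Note that res11 is typed as Object because getOrElse has to unify a Seq[Node] with a String, and that attributes("foo") returns null rather than an Option. One way to get a plain String with a default is sketched below, assuming scala-xml is on the classpath; attrOrElse is a hypothetical helper name, not a library method:

```scala
import scala.xml.Elem

object AttributeDemo {
  val x: Elem = <div class="content"><p>Hello</p><p>world</p></div>

  // Hypothetical helper: returns the attribute's text, or the given
  // default when the attribute is missing. Staying inside Option
  // avoids the null returned by x.attributes("foo").
  def attrOrElse(e: Elem, name: String, default: String): String =
    e.attribute(name).map(_.text).getOrElse(default)
}
```

With this helper, AttributeDemo.attrOrElse(AttributeDemo.x, "class", "N/A") yields the attribute value as a String, and the same call with "foo" yields the default.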

To demonstrate more ways to work with element attributes, let’s create a new element:

scala> val w = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
w: scala.xml.Elem = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />

These examples show how attribute and attributes work with multiple attributes:

scala> w.attribute("day")
res0: Option[Seq[scala.xml.Node]] = Some(Thu)

scala> w.attributes("day")
res1: Seq[scala.xml.Node] = Thu

scala> w.attributes
res2: scala.xml.MetaData =  day="Thu" date="10 Nov 2011" low="37" high="58"

These examples show how to iterate over a set of attributes, and how to convert them to a Map with asAttrMap:

scala> for (a <- w.attributes) println(s"key: ${a.key}, value: ${a.value}")
key: day, value: Thu
key: date, value: 10 Nov 2011
key: low, value: 37
key: high, value: 58

scala> w.attributes.asAttrMap
res3: Map[String,String] = Map(low -> 37, date -> 10 Nov 2011, 
      day -> Thu, high -> 58)
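Because asAttrMap returns a Map[String,String], numeric attributes still need to be converted before use. The sketch below shows one possible way to do that, assuming scala-xml is on the classpath; the Forecast case class and the parse function are hypothetical names introduced only for this example:

```scala
import scala.util.Try
import scala.xml.Elem

object ForecastDemo {
  // Hypothetical model type, not part of the original recipe
  final case class Forecast(day: String, low: Int, high: Int)

  val w: Elem = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />

  // Returns None if an attribute is missing or is not a valid Int
  def parse(e: Elem): Option[Forecast] = {
    val attrs = e.attributes.asAttrMap
    for {
      day  <- attrs.get("day")
      low  <- attrs.get("low").flatMap(s => Try(s.toInt).toOption)
      high <- attrs.get("high").flatMap(s => Try(s.toInt).toOption)
    } yield Forecast(day, low, high)
  }
}
```

Wrapping toInt in Try keeps a malformed value from throwing a NumberFormatException; whether a None or an error is the better response to bad input depends on the application.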

Child elements

The child method returns all child nodes of the current element. To demonstrate this, let’s create a new XML variable:

scala> val p = <person><name>Ken</name><age>23</age></person>
p: scala.xml.Elem = <person><name>Ken</name><age>23</age></person>

The child method returns immediate child nodes:

scala> p.child
res0: Seq[scala.xml.Node] = ArrayBuffer(<name>Ken</name>, <age>23</age>)

You can use child to iterate over all the children:

scala> for (n <- p.child) println(n)
<name>Ken</name>
<age>23</age>

Because child returns a sequence, you can also access the child elements like this:

scala> p.child(0)
res1: scala.xml.Node = <name>Ken</name>

scala> p.child(0).label
res2: String = name

scala> p.child(0).text
res3: String = Ken

scala> p.child(1)
res4: scala.xml.Node = <age>23</age>

scala> p.child(1).text.toInt
res5: Int = 23
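Indexing into child by position only works when the XML contains no whitespace between elements, because whitespace becomes text nodes that are also children. The following sketch, assuming scala-xml is on the classpath, uses a hypothetical multi-line version of the same <person> element to show two ways around that: keeping only Elem children, or looking fields up by label:

```scala
import scala.xml.Elem

object ChildDemo {
  // Same data as above, but formatted across several lines, so the
  // line breaks and indentation become whitespace text nodes.
  val p: Elem =
    <person>
      <name>Ken</name>
      <age>23</age>
    </person>

  // Keep only element children, dropping the whitespace text nodes
  val elems = p.child.collect { case e: Elem => e }

  // Look fields up by label instead of by position
  val name: String = (p \ "name").text
  val age: Int = (p \ "age").text.toInt
}
```

In this version p.child(0) would be a whitespace text node rather than <name>, which is why the positional approach is fragile for hand-formatted XML.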

Text and strings

The toString method returns the XML structure as a String:

scala> p.toString
res6: String = <person><name>Ken</name><age>23</age></person>

You can improve this result with the PrettyPrinter class.
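As a rough illustration of PrettyPrinter, the sketch below formats the same element. It assumes scala-xml is on the classpath; the width of 80 and indentation step of 2 are arbitrary choices, and PrettyDemo is a made-up object name:

```scala
import scala.xml.{Elem, PrettyPrinter}

object PrettyDemo {
  val p: Elem = <person><name>Ken</name><age>23</age></person>

  // Arguments are the maximum line width and the indentation step;
  // 80 and 2 are arbitrary values for this sketch.
  val printer = new PrettyPrinter(80, 2)

  // format returns the indented XML as a String
  val formatted: String = printer.format(p)
}
```

Printing PrettyDemo.formatted should show <name> and <age> on their own indented lines inside <person>.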

The following approach shows another way to extract the text from the elements:

scala> for (n <- p.child) yield n.text
res7: Seq[String] = ArrayBuffer(Ken, 23)

There are more ways to tackle these problems using XPath methods, which will be shown in subsequent chapters.

As a word of caution, be careful with the text method. It returns different results depending on how the XML is formatted, which can be a particular problem when extracting XHTML data. To demonstrate this, the following examples show the output when there is a space before the <br> tag, and when there is no space:

scala> <div><p>Hello, world, <br/>it's me.</p></div>.text
res0: String = Hello, world, it's me.

scala> <div><p>Hello, world,<br/>it's me.</p></div>.text
res1: String = Hello, world,it's me.

In the next examples the same XML, formatted in different ways, yields different results:

scala> <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>.text
res2: String = Is 2 > 1?Why do you ask?

scala> <div>
     | <p>Is 2 > 1?</p>
     | <p>Why do you ask?</p>
     | </div>.text
res3: String = 
"
Is 2 > 1?
Why do you ask?
"

If you need to extract text in this manner, a workaround is to extract the text components individually into a sequence, and then re-combine the sequence as desired. The following example demonstrates how to accomplish this with the child, label, and text methods. Given this XML literal:

val xml = <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>

the child method returns the elements as a sequence:

scala> xml.child
res0: Seq[scala.xml.Node] = 
  ArrayBuffer(<p>Is 2 > 1?</p>, <p>Why do you ask?</p>)

This lets you write the following code, which creates a sequence of strings from the <p> tags:

val strings = for {
  e <- xml.child
  if e.label == "p"
} yield e.text

The REPL shows that the resulting variable strings has the following type and data:

strings: Seq[String] = ArrayBuffer(Is 2 > 1?, Why do you ask?)
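Once the text is in a sequence, it can be recombined with whatever separator is needed. The sketch below, assuming scala-xml is on the classpath, wraps the same filter in a hypothetical paragraphs function and applies it to both the compact and the multi-line formatting shown earlier; the label filter also discards the whitespace text nodes in the multi-line version:

```scala
import scala.xml.Elem

object TextDemo {
  val compact: Elem = <div><p>Is 2 > 1?</p><p>Why do you ask?</p></div>

  val spaced: Elem =
    <div>
      <p>Is 2 > 1?</p>
      <p>Why do you ask?</p>
    </div>

  // Hypothetical helper: collects the text of each <p> child and joins
  // the pieces with a single space. Whitespace text nodes do not have
  // the label "p", so the filter drops them.
  def paragraphs(e: Elem): String =
    e.child.filter(_.label == "p").map(_.text.trim).mkString(" ")
}
```

Both TextDemo.paragraphs(TextDemo.compact) and TextDemo.paragraphs(TextDemo.spaced) should produce the same string, regardless of how the source XML was formatted.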

In the XPath recipes in this chapter you’ll see how to accomplish some of the same tasks using the \ and \\ methods.


Example data sets and REPL memory errors

If you want to test these commands against large data sets, this URL maintains a nice collection of sample XML data:

The NASA data set is 23 MB, and causes the Scala REPL to crash with a Java heap space error:

scala> val xml = scala.xml.XML.loadFile("nasa.xml")
java.lang.OutOfMemoryError: Java heap space ...

To get around this problem, you can allocate more heap space when starting the REPL with this command:

$ scala -J-Xms256m -J-Xmx512m

or this command:

$ env JAVA_OPTS="-Xms256m -Xmx512m" scala
